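feat: add project configuration files and documentation so the project runs on other machines
- Add requirements.txt for pip dependency management
- Add setup.py for package installation
- Add QUICKSTART.md quick start guide
- Add check_environment.py environment check script
- Update README.md with detailed installation steps
- Update .gitignore to ignore model and data files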
This commit is contained in:
parent a9c049132b
commit 937544e7b6
17
.gitignore
vendored
@@ -21,12 +21,17 @@ ENV/
 *.swo
 *~
 
-# Large files
-*.joblib
-*.pkl
-*.h5
-*.hdf5
-*.pb
+# Large files - model files (generated by training)
+models/*.joblib
+models/*.pkl
+models/*.h5
+models/*.hdf5
+models/*.pb
+
+# Large files - data files (must be downloaded separately)
+data/*.csv
+data/*.zip
+data/*.parquet
 
 # Test coverage
 .coverage
225
QUICKSTART.md
Normal file
@@ -0,0 +1,225 @@
# Quick Start Guide

This guide helps you get the credit card fraud detection system running in about 5 minutes.

## Prerequisites

- Python 3.10 or later
- pip (the Python package manager)

## Installation

### 1. Clone the repository

```bash
git clone <repository-url>
cd Credit-Card-Fraud-Detection
```

### 2. Create a virtual environment (recommended)

**Windows:**
```bash
python -m venv venv
venv\Scripts\activate
```

**Linux/Mac:**
```bash
python3 -m venv venv
source venv/bin/activate
```
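
To confirm that the activation step worked before installing anything, a short check can be run with the interpreter that is now on the `PATH`. This is a minimal sketch: the helper name `in_virtual_environment` is an illustration, not part of the project. It relies on the standard-library behaviour that `sys.prefix` and `sys.base_prefix` differ inside an environment created by `python -m venv`.

```python
import sys


def in_virtual_environment() -> bool:
    """Return True when the interpreter runs inside a venv-style environment."""
    # Inside a venv, sys.prefix points at the environment directory while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    if in_virtual_environment():
        print(f"Virtual environment active: {sys.prefix}")
    else:
        print("No virtual environment detected; pip would install into the base interpreter.")
```

If the second message appears, re-run the activation command for your platform in the same shell session.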

### 3. Install dependencies

**Option 1: requirements.txt (recommended)**
```bash
pip install -r requirements.txt
```

**Option 2: setup.py**
```bash
pip install -e .
```

**Option 3: uv (install uv first)**
```bash
pip install uv
uv sync
```

### 4. Prepare the data

Make sure `data/creditcard.csv` exists. If it does not:

1. Download the dataset from Kaggle: https://www.kaggle.com/mlg-ulb/creditcardfraud
2. Place the downloaded `creditcard.csv` in the `data/` directory
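
A quick way to verify that the downloaded file is the expected one is to inspect its header row. The sketch below assumes the column layout of the Kaggle file (`Time`, `V1` to `V28`, `Amount`, and the `Class` label); the helper name `missing_columns` is hypothetical and does not exist in the project.

```python
import csv
from pathlib import Path

# Assumed column layout of the Kaggle file: Time, V1..V28, Amount, Class.
EXPECTED_COLUMNS = ["Time"] + [f"V{i}" for i in range(1, 29)] + ["Amount", "Class"]


def missing_columns(csv_path):
    """Return the expected columns that are absent from the CSV header."""
    with Path(csv_path).open(newline="", encoding="utf-8") as handle:
        # Only the first row is read, so this stays fast on a large file.
        header = next(csv.reader(handle), [])
    present = {name.strip() for name in header}
    return [name for name in EXPECTED_COLUMNS if name not in present]


if __name__ == "__main__":
    path = Path("data") / "creditcard.csv"
    if not path.exists():
        print(f"{path} not found; download it from Kaggle first.")
    else:
        missing = missing_columns(path)
        print("Header looks complete." if not missing else f"Missing columns: {missing}")
```

Run it from the project root so that the relative `data/` path resolves correctly.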

### 5. Train the models

```bash
python src/train.py
```

When training finishes, the model files are saved in the `models/` directory:
- `random_forest_model.joblib` - random forest model
- `logistic_regression_model.joblib` - logistic regression model
- `scaler.joblib` - feature scaler

### 6. Run the application

**Option 1: agent_app.py (recommended)**
```bash
python src/agent_app.py
```

This starts the web interface and opens it in your browser automatically.

**Option 2: run Streamlit directly**
```bash
streamlit run src/streamlit_app.py
```

## Usage

### Using the web interface

1. **Choose an input method**
   - Upload a CSV file: process transactions in batch
   - Manual entry: enter the 30 feature values

2. **Enter the features**
   - Time: transaction time in seconds
   - V1-V28: PCA-transformed features
   - Amount: transaction amount

3. **Click the "检测欺诈" (Detect Fraud) button**
   - The system displays the prediction
   - Review the feature explanations
   - Get recommended actions
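
The 30 values entered manually correspond to one feature vector. The sketch below shows one way to assemble and validate such a vector before passing it on. It assumes the order `Time`, `V1` to `V28`, `Amount`, which matches the sample transaction used elsewhere in this guide; `build_feature_vector` and `FEATURE_NAMES` are hypothetical names, not part of the project code.

```python
# Assumed feature order: Time, V1..V28, Amount (30 values in total).
FEATURE_NAMES = ["Time"] + [f"V{i}" for i in range(1, 29)] + ["Amount"]


def build_feature_vector(time_seconds, pca_features, amount):
    """Assemble one transaction as [Time, V1..V28, Amount]."""
    pca = [float(value) for value in pca_features]
    if len(pca) != 28:
        raise ValueError(f"expected 28 PCA features (V1-V28), got {len(pca)}")
    return [float(time_seconds), *pca, float(amount)]


if __name__ == "__main__":
    vector = build_feature_vector(0, [0.0] * 28, 149.62)
    # Pair each value with its name to make manual review easier.
    for name, value in zip(FEATURE_NAMES, vector):
        print(f"{name:>6}: {value}")
```

Validating the length up front gives a clearer error than a shape mismatch raised later by the scaler or model.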

### Using it from Python

```python
from src.agent_app import create_agent

agent = create_agent()

# One transaction: 30 values (Time, V1-V28, Amount)
transaction = [
    0, -1.3598071336738, -0.0727811733098497, 2.53634673796914, 1.37815522427443,
    -0.338320769942518, 0.462387777762292, 0.239598554061257, 0.0986979012610507,
    0.363786969611213, 0.0907941719789316, -0.551599533260813, -0.617800855762348,
    -0.991389847235408, -0.311169353699879, 1.46817697209427, -0.470400525259478,
    0.207971241929242, 0.0257905801985591, 0.403992960255733, 0.251412098239705,
    -0.018306777944153, 0.277837575558899, -0.110473910188767, 0.0669280749146731,
    0.128539358273528, -0.189114843888824, 0.133558376740387, -0.0210530534538215,
    149.62
]

result = agent.process_transaction(transaction)
print(f"Predicted class: {result.evaluation.class_name}")
print(f"Fraud probability: {result.evaluation.fraud_probability:.4f}")
```

## Running the tests

```bash
# Run all tests
pytest tests/

# Run a specific test file
pytest tests/test_data.py

# Generate a coverage report (the --cov options require the pytest-cov plugin)
pytest tests/ --cov=src --cov-report=html
```

## FAQ

### Q1: Data file not found

**Error message**: `FileNotFoundError: data/creditcard.csv`

**Solution**:
1. Make sure the data file exists in the `data/` directory
2. Download the dataset from Kaggle and put it in that directory
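
A relative path such as `data/creditcard.csv` is resolved against the current working directory, so this error may also appear when the file exists but the command is started from another directory. The sketch below resolves the path against an explicit project root instead and raises an error that says where the file was expected. `locate_data_file` is a hypothetical helper, and whether the project's own loader uses relative paths is an assumption.

```python
from pathlib import Path


def locate_data_file(project_root, name="creditcard.csv"):
    """Return the path to data/<name> under project_root, or raise a descriptive error."""
    path = Path(project_root) / "data" / name
    if not path.is_file():
        raise FileNotFoundError(
            f"{path} not found; download {name} from Kaggle and place it in {path.parent}"
        )
    return path


if __name__ == "__main__":
    try:
        # Assumes this script is started from the repository root.
        print(f"Using data file: {locate_data_file(Path.cwd())}")
    except FileNotFoundError as error:
        print(error)
```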

### Q2: Model files are missing

**Error message**: `RuntimeError: 模型或缩放器加载失败` (the model or scaler failed to load)

**Solution**:
```bash
# Train the models first
python src/train.py
```

### Q3: Dependency installation fails

**Error message**: `pip install` fails

**Solution**:
1. Make sure you are using Python 3.10+
2. Upgrade pip: `pip install --upgrade pip`
3. Use a PyPI mirror in mainland China: `pip install -r requirements.txt -i https://pypi.tuna.tsinghua.edu.cn/simple`

### Q4: Streamlit does not start

**Error message**: `streamlit not found`

**Solution**:
```bash
pip install streamlit
```

### Q5: Port already in use

**Error message**: `Address already in use`

**Solution**:
```bash
# Start on a different port
streamlit run src/streamlit_app.py --server.port 8502
```
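
Rather than guessing a port, you can probe for a free one first. This is a best-effort sketch using only the standard library: it starts at 8501 (Streamlit's default port) and tries to connect to each candidate on the loopback interface. The function names are illustrative, and another process could still claim the port between the check and the launch.

```python
import socket


def port_is_free(port, host="127.0.0.1"):
    """Return True if nothing accepts connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(0.5)
        # connect_ex returns 0 only when a listener accepts the connection.
        return sock.connect_ex((host, port)) != 0


def first_free_port(start=8501, attempts=20):
    """Return the first free port in [start, start + attempts)."""
    for port in range(start, start + attempts):
        if port_is_free(port):
            return port
    raise RuntimeError(f"no free port found in {start}-{start + attempts - 1}")


if __name__ == "__main__":
    port = first_free_port()
    print(f"streamlit run src/streamlit_app.py --server.port {port}")
```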

## Project structure

```
Credit-Card-Fraud-Detection/
├── data/                                  # Data directory
│   └── creditcard.csv                     # Credit card transaction data
├── models/                                # Model directory
│   ├── random_forest_model.joblib
│   ├── logistic_regression_model.joblib
│   └── scaler.joblib
├── src/                                   # Source code
│   ├── agent_app.py                       # Agent entry point (recommended)
│   ├── streamlit_app.py                   # Streamlit web application
│   ├── train.py                           # Model training
│   ├── infer.py                           # Inference interface
│   ├── data.py                            # Data processing
│   └── features.py                        # Feature definitions
├── tests/                                 # Tests
├── requirements.txt                       # Python dependencies
├── setup.py                               # Installation script
├── QUICKSTART.md                          # Quick start guide (this file)
└── README.md                              # Detailed documentation
```

## Next steps

- Read [README.md](README.md) for project details
- Browse the source code under [src/](src/)
- Run the tests to confirm everything works
- Start using the system for fraud detection

## Support

If you run into problems:
1. Check the FAQ section of this guide
2. See [README.md](README.md) for more information
3. Open an issue in the project repository

## License

MIT License
83
README.md
@@ -129,21 +129,90 @@ ml_course_design/
 - **Web application**: Streamlit
 - **Dependency management**: uv
 
-## Requirements
+## Quick Start
 
-- Python 3.10+
-- uv (for dependency management)
+### Prerequisites
 
-## Install dependencies
+- Python 3.10 or later
+- pip (the Python package manager)
+
+### Installation
+
+#### 1. Clone the repository
 
 ```bash
-# Install dependencies with uv (recommended)
-uv sync
+git clone <repository-url>
+cd Credit-Card-Fraud-Detection
+```
 
-# Or use pip
+#### 2. Create a virtual environment (recommended)
+
+**Windows:**
+```bash
+python -m venv venv
+venv\Scripts\activate
+```
+
+**Linux/Mac:**
+```bash
+python3 -m venv venv
+source venv/bin/activate
+```
+
+#### 3. Install dependencies
+
+**Option 1: requirements.txt (recommended)**
+```bash
+pip install -r requirements.txt
+```
+
+**Option 2: setup.py**
+```bash
 pip install -e .
 ```
 
+**Option 3: uv (install uv first)**
+```bash
+pip install uv
+uv sync
+```
+
+#### 4. Prepare the data
+
+Make sure `data/creditcard.csv` exists. If it does not:
+
+1. Download the dataset from Kaggle: https://www.kaggle.com/mlg-ulb/creditcardfraud
+2. Place the downloaded `creditcard.csv` in the `data/` directory
+
+#### 5. Train the models
+
+```bash
+python src/train.py
+```
+
+When training finishes, the model files are saved in the `models/` directory:
+- `random_forest_model.joblib` - random forest model
+- `logistic_regression_model.joblib` - logistic regression model
+- `scaler.joblib` - feature scaler
+
+#### 6. Run the application
+
+**Option 1: agent_app.py (recommended)**
+```bash
+python src/agent_app.py
+```
+
+This starts the web interface and opens it in your browser automatically.
+
+**Option 2: run Streamlit directly**
+```bash
+streamlit run src/streamlit_app.py
+```
+
+### Detailed usage guide
+
+See [QUICKSTART.md](QUICKSTART.md) for detailed usage instructions and answers to common questions.
+
 ## Running the tests
 
 ```bash
193
check_environment.py
Normal file
@@ -0,0 +1,193 @@
"""
Environment check script.

Verifies that the project's dependencies and environment are set up correctly.
Paths are relative to the current working directory, so run it from the project root.
"""

import sys
import importlib
from pathlib import Path


def check_python_version():
    """Check the Python version."""
    print("=" * 60)
    print("Checking Python version...")
    print("=" * 60)

    version = sys.version_info
    print(f"Current Python version: {version.major}.{version.minor}.{version.micro}")

    # Tuple comparison also accepts a later major version, unlike
    # checking major == 3 and minor >= 10 separately.
    if version >= (3, 10):
        print("✓ Python version meets the requirement (>= 3.10)")
        return True
    else:
        print("✗ Python version does not meet the requirement; 3.10 or later is needed")
        return False


def check_dependencies():
    """Check that the required packages are installed."""
    print("\n" + "=" * 60)
    print("Checking dependencies...")
    print("=" * 60)

    # Maps the import name to the package name used by pip.
    required_packages = {
        'numpy': 'numpy',
        'polars': 'polars',
        'sklearn': 'scikit-learn',
        'imblearn': 'imbalanced-learn',
        'matplotlib': 'matplotlib',
        'seaborn': 'seaborn',
        'joblib': 'joblib',
        'pydantic': 'pydantic',
        'streamlit': 'streamlit',
    }

    missing_packages = []

    for module_name, package_name in required_packages.items():
        try:
            module = importlib.import_module(module_name)
            version = getattr(module, '__version__', 'unknown')
            print(f"✓ {package_name:20s} - version: {version}")
        except ImportError:
            print(f"✗ {package_name:20s} - not installed")
            missing_packages.append(package_name)

    if missing_packages:
        print(f"\nMissing {len(missing_packages)} package(s)")
        print(f"Run: pip install {' '.join(missing_packages)}")
        return False
    else:
        print("\n✓ All dependencies are installed")
        return True


def check_data_files():
    """Check that the data file exists."""
    print("\n" + "=" * 60)
    print("Checking data files...")
    print("=" * 60)

    data_dir = Path("data")
    creditcard_csv = data_dir / "creditcard.csv"

    if creditcard_csv.exists():
        file_size = creditcard_csv.stat().st_size / (1024 * 1024)  # MB
        print(f"✓ data/creditcard.csv exists (size: {file_size:.2f} MB)")
        return True
    else:
        print("✗ data/creditcard.csv does not exist")
        print("Download the dataset from:")
        print("https://www.kaggle.com/mlg-ulb/creditcardfraud")
        print("and place creditcard.csv in the data/ directory")
        return False


def check_model_files():
    """Check that the trained model files exist."""
    print("\n" + "=" * 60)
    print("Checking model files...")
    print("=" * 60)

    models_dir = Path("models")
    required_models = [
        "random_forest_model.joblib",
        "logistic_regression_model.joblib",
        "scaler.joblib"
    ]

    missing_models = []

    for model_file in required_models:
        model_path = models_dir / model_file
        if model_path.exists():
            file_size = model_path.stat().st_size / 1024  # KB
            print(f"✓ {model_file:35s} (size: {file_size:.2f} KB)")
        else:
            print(f"✗ {model_file:35s} - missing")
            missing_models.append(model_file)

    if missing_models:
        print(f"\nMissing {len(missing_models)} model file(s)")
        print("Run: python src/train.py to train the models")
        return False
    else:
        print("\n✓ All model files are present")
        return True


def check_source_files():
    """Check that the source files exist."""
    print("\n" + "=" * 60)
    print("Checking source files...")
    print("=" * 60)

    src_dir = Path("src")
    required_files = [
        "__init__.py",
        "data.py",
        "features.py",
        "train.py",
        "infer.py",
        "agent_app.py",
        "streamlit_app.py"
    ]

    missing_files = []

    for file_name in required_files:
        file_path = src_dir / file_name
        if file_path.exists():
            print(f"✓ src/{file_name}")
        else:
            print(f"✗ src/{file_name} - missing")
            missing_files.append(file_name)

    if missing_files:
        print(f"\nMissing {len(missing_files)} source file(s)")
        return False
    else:
        print("\n✓ All source files are present")
        return True


def run_all_checks():
    """Run every check and print a summary."""
    print("\n" + "=" * 60)
    print("Credit Card Fraud Detection System - Environment Check")
    print("=" * 60)

    results = {
        "Python version": check_python_version(),
        "Dependencies": check_dependencies(),
        "Data file": check_data_files(),
        "Model files": check_model_files(),
        "Source files": check_source_files(),
    }

    print("\n" + "=" * 60)
    print("Summary")
    print("=" * 60)

    for check_name, result in results.items():
        status = "✓ passed" if result else "✗ failed"
        print(f"{check_name:15s}: {status}")

    all_passed = all(results.values())

    print("\n" + "=" * 60)
    if all_passed:
        print("✓ All checks passed! You can now run the system")
        print("\nRun:")
        print("  python src/agent_app.py")
    else:
        print("✗ Some checks failed; follow the hints above to fix them")
        print("\nQuick fixes:")
        if not results["Dependencies"]:
            print("  1. Install dependencies: pip install -r requirements.txt")
        if not results["Data file"]:
            print("  2. Download data: get creditcard.csv from Kaggle and put it in data/")
        if not results["Model files"]:
            print("  3. Train the models: python src/train.py")
    print("=" * 60)

    return all_passed


if __name__ == "__main__":
    success = run_all_checks()
    # Exit code 0 on success and 1 on failure, so scripts can gate on the result.
    sys.exit(0 if success else 1)
9
requirements.txt
Normal file
@@ -0,0 +1,9 @@
numpy>=1.24.0
polars>=0.19.0
scikit-learn>=1.3.0
imbalanced-learn>=0.11.0
matplotlib>=3.7.0
seaborn>=0.12.0
joblib>=1.3.0
pydantic>=2.0.0
streamlit>=1.28.0
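
The lower bounds above can be compared against what is actually installed. The sketch below is a simplified illustration and not part of the commit: it handles only `name>=X.Y.Z` lines and compares numeric release segments, ignoring pre-release and epoch rules from PEP 440 (the `packaging` library covers those). All helper names are hypothetical; `importlib.metadata.version` is the standard-library call that reads the installed version.

```python
import re
from importlib import metadata


def parse_requirement(line):
    """Split 'name>=1.2.3' into ('name', (1, 2, 3)); return None for other lines."""
    match = re.fullmatch(r"\s*([A-Za-z0-9_.-]+)\s*>=\s*([0-9]+(?:\.[0-9]+)*)\s*", line)
    if match is None:
        return None
    name, version = match.groups()
    return name, tuple(int(part) for part in version.split("."))


def release_tuple(version):
    """Leading numeric release of a version string, e.g. '2.1.0rc1' gives (2, 1, 0)."""
    match = re.match(r"[0-9]+(?:\.[0-9]+)*", version)
    return tuple(int(part) for part in match.group().split(".")) if match else ()


def at_least(installed, minimum):
    """Compare release tuples after zero-padding, so (1, 24) equals (1, 24, 0)."""
    width = max(len(installed), len(minimum))
    pad = lambda parts: parts + (0,) * (width - len(parts))
    return pad(installed) >= pad(minimum)


def check_minimums(lines):
    """Yield (name, installed_version_or_None, ok) for each 'name>=version' line."""
    for line in lines:
        parsed = parse_requirement(line)
        if parsed is None:
            continue
        name, minimum = parsed
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            yield name, None, False
            continue
        yield name, installed, at_least(release_tuple(installed), minimum)


if __name__ == "__main__":
    pins = ["numpy>=1.24.0", "polars>=0.19.0", "scikit-learn>=1.3.0"]
    for name, installed, ok in check_minimums(pins):
        print(f"{'OK ' if ok else 'BAD'} {name}: {installed or 'not installed'}")
```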
25
setup.py
Normal file
@@ -0,0 +1,25 @@
from setuptools import setup, find_packages

setup(
    name="creditcard-fraud-detection",
    version="0.1.0",
    description="Credit card fraud detection system",
    packages=find_packages(),
    python_requires=">=3.10",
    install_requires=[
        "numpy>=1.24.0",
        "polars>=0.19.0",
        "scikit-learn>=1.3.0",
        "imbalanced-learn>=0.11.0",
        "matplotlib>=3.7.0",
        "seaborn>=0.12.0",
        "joblib>=1.3.0",
        "pydantic>=2.0.0",
        "streamlit>=1.28.0",
    ],
    entry_points={
        # Installs a `train` command that calls src.train.train_and_evaluate
        "console_scripts": [
            "train=src.train:train_and_evaluate",
        ],
    },
)
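
The `console_scripts` entry uses the form `command=module:attribute`. The sketch below illustrates how such a string maps to a callable; it is a simplified stand-in for what the installer-generated script does, and `resolve_entry_point` is a hypothetical helper. The demonstration uses `json:dumps` from the standard library so that it runs without the project installed.

```python
import importlib


def resolve_entry_point(spec):
    """Resolve 'name=package.module:attribute' to (name, object)."""
    name, _, target = spec.partition("=")
    module_name, _, attr_path = target.partition(":")
    obj = importlib.import_module(module_name.strip())
    # The attribute part may be dotted, e.g. 'SomeClass.method'.
    for part in attr_path.strip().split("."):
        obj = getattr(obj, part)
    return name.strip(), obj


if __name__ == "__main__":
    command, func = resolve_entry_point("dumps=json:dumps")
    print(command, func({"ok": True}))
```

With the entry from `setup.py`, the same logic would import `src.train` and look up `train_and_evaluate`, which requires `src` to be importable as a package.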