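feat: add project configuration files and documentation so the project runs on other machines

- Add requirements.txt for pip dependency management
- Add setup.py for package installation
- Add QUICKSTART.md quick start guide
- Add check_environment.py environment check script
- Update README.md with detailed installation steps
- Update .gitignore to ignore model and data files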
2026-01-15 21:34:25 +08:00 · 2026-01-15 21:34:25 +08:00 · 937544e7b6
commit 937544e7b6
parent a9c049132b
6 changed files with 539 additions and 13 deletions
--- a/.gitignore
+++ b/.gitignore
@ -21,12 +21,17 @@ ENV/
 *.swo
 *~
-# Large files
+# Large files - model files (generated by training)
-*.joblib
+models/*.joblib
-*.pkl
+models/*.pkl
-*.h5
+models/*.h5
-*.hdf5
+models/*.hdf5
-*.pb
+models/*.pb
 # Large files - data files (must be downloaded separately)
 data/*.csv
 data/*.zip
 data/*.parquet
 # Test coverage
 .coverage
--- a/QUICKSTART.md
+++ b/QUICKSTART.md
@ -0,0 +1,225 @@
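 # Quick Start Guide
 This guide helps you get the credit card fraud detection system running in about 5 minutes.
 ## Prerequisites
 - Python 3.10 or later
 - pip (the Python package manager)
 ## Installation
 ### 1. Clone the repository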
 ```bash
 git clone <repository-url>
 cd Credit-Card-Fraud-Detection
 ```
 ### 2. Create a virtual environment (recommended)
 **Windows:**
 ```bash
 python -m venv venv
 venv\Scripts\activate
 ```
 **Linux/Mac:**
 ```bash
 python3 -m venv venv
 source venv/bin/activate
 ```
 ### 3. Install dependencies
 **Option 1: use requirements.txt (recommended)**
 ```bash
 pip install -r requirements.txt
 ```
 **Option 2: use setup.py**
 ```bash
 pip install -e .
 ```
 **Option 3: use uv (uv must be installed first)**
 ```bash
 pip install uv
 uv sync
 ```
 ### 4. Prepare the data
 Make sure `data/creditcard.csv` exists. If it does not:
 1. Download the dataset from Kaggle: https://www.kaggle.com/mlg-ulb/creditcardfraud
 2. Put the downloaded `creditcard.csv` file in the `data/` directory
 ### 5. Train the models
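A quick way to confirm the file is in place and looks right is to inspect its header before training. The sketch below assumes the usual Kaggle layout (`Time`, `V1` to `V28`, `Amount`, plus a `Class` label column); the helper name `missing_columns` is hypothetical and not part of this repository.

```python
import csv
from pathlib import Path

# Assumed layout of the Kaggle file: 30 feature columns plus the Class label.
EXPECTED_COLUMNS = ["Time"] + [f"V{i}" for i in range(1, 29)] + ["Amount", "Class"]


def missing_columns(csv_path):
    """Return the expected columns that are absent from the CSV header."""
    with Path(csv_path).open(newline="", encoding="utf-8") as handle:
        header = next(csv.reader(handle), [])
    present = {name.strip() for name in header}
    return [name for name in EXPECTED_COLUMNS if name not in present]


if __name__ == "__main__":
    path = Path("data/creditcard.csv")
    if not path.exists():
        print(f"{path} not found; download it from Kaggle first")
    else:
        print("missing columns:", missing_columns(path))
```

An empty list means every expected column is present; only the header row is read, so the check stays fast even on the full dataset.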
 ```bash
 python src/train.py
 ```
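 After training, the model files are saved in the `models/` directory:
 - `random_forest_model.joblib` - random forest model
 - `logistic_regression_model.joblib` - logistic regression model
 - `scaler.joblib` - feature scaler
 ### 6. Run the application
 **Option 1: use agent_app.py (recommended)**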
 ```bash
 python src/agent_app.py
 ```
 This starts the web interface and opens it in your browser automatically.
 **Option 2: run Streamlit directly**
 ```bash
 streamlit run src/streamlit_app.py
 ```
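 ## Usage
 ### Using the web interface
 1. **Choose an input method**
   - Upload a CSV file: process transactions in batch
   - Manual entry: enter the 30 feature values
 2. **Enter the features**
   - Time: transaction time (in seconds)
   - V1-V28: PCA-transformed features
   - Amount: transaction amount
 3. **Click the "Detect Fraud" button**
   - The system displays the prediction
   - Review the feature explanations
   - Get recommended actions
 ### Using it from Python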
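The 30 inputs are easy to get out of order when entered by hand. The sketch below pairs a flat list of values with feature names and rejects lists of the wrong length. It assumes the order `Time`, `V1` to `V28`, `Amount`; `FEATURE_NAMES` and `to_named_features` are hypothetical names for illustration and are not taken from `src/features.py`.

```python
# Assumed feature order: Time, then V1..V28, then Amount (30 values in total).
FEATURE_NAMES = ["Time"] + [f"V{i}" for i in range(1, 29)] + ["Amount"]


def to_named_features(values):
    """Pair a flat list of 30 numbers with feature names in the assumed order."""
    if len(values) != len(FEATURE_NAMES):
        raise ValueError(f"expected {len(FEATURE_NAMES)} values, got {len(values)}")
    return {name: float(value) for name, value in zip(FEATURE_NAMES, values)}


if __name__ == "__main__":
    # Placeholder values only: zeros for Time and V1..V28, then an amount.
    example = to_named_features([0.0] * 29 + [149.62])
    print(example["Time"], example["Amount"])
```

Failing early on a wrong-length list is cheaper than debugging a prediction made on misaligned features.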
 ```python
 from src.agent_app import create_agent
 agent = create_agent()
 transaction = [
    0, -1.3598071336738, -0.0727811733098497, 2.53634673796914, 1.37815522427443,
    -0.338320769942518, 0.462387777762292, 0.239598554061257, 0.0986979012610507,
    0.363786969611213, 0.0907941719789316, -0.551599533260813, -0.617800855762348,
    -0.991389847235408, -0.311169353699879, 1.46817697209427, -0.470400525259478,
    0.207971241929242, 0.0257905801985591, 0.403992960255733, 0.251412098239705,
    -0.018306777944153, 0.277837575558899, -0.110473910188767, 0.0669280749146731,
    0.128539358273528, -0.189114843888824, 0.133558376740387, -0.0210530534538215,
    149.62
 ]
 result = agent.process_transaction(transaction)
 print(f"Predicted class: {result.evaluation.class_name}")
 print(f"Fraud probability: {result.evaluation.fraud_probability:.4f}")
 ```
 ## Running the tests
 ```bash
 # Run all tests
 pytest tests/
 # Run a specific test file
 pytest tests/test_data.py
 # Generate a test coverage report
 pytest tests/ --cov=src --cov-report=html
 ```
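 ## FAQ
 ### Q1: Data file not found
 **Error message**: `FileNotFoundError: data/creditcard.csv`
 **Solution**:
 1. Make sure the data file exists in the `data/` directory
 2. Download the dataset from Kaggle and put it in the right place
 ### Q2: Model files are missing
 **Error message**: `RuntimeError: 模型或缩放器加载失败` (model or scaler failed to load)
 **Solution**: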
 ```bash
 # Train the models first
 python src/train.py
 ```
 ### Q3: Dependency installation fails
 **Error message**: `pip install` fails
 **Solution**:
 1. Make sure you are using Python 3.10+
 2. Upgrade pip: `pip install --upgrade pip`
 3. Use a mirror in mainland China: `pip install -r requirements.txt -i https://pypi.tuna.tsinghua.edu.cn/simple`
 ### Q4: Streamlit does not start
 **Error message**: `streamlit not found`
 **Solution**:
 ```bash
 pip install streamlit
 ```
 ### Q5: Port already in use
 **Error message**: `Address already in use`
 **Solution**:
 ```bash
 # Start on a different port
 streamlit run src/streamlit_app.py --server.port 8502
 ```
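If it is unclear which port is free, a small probe can pick one before launching. This is a minimal sketch, not part of the repository: `find_free_port` is a hypothetical helper, and it starts scanning at 8501 on the assumption that Streamlit is using its default port.

```python
import socket


def find_free_port(start=8501, attempts=20):
    """Return the first port in [start, start + attempts) that can be bound locally."""
    for port in range(start, start + attempts):
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
            try:
                sock.bind(("127.0.0.1", port))
            except OSError:
                continue  # port is in use, try the next one
            return port
    raise RuntimeError("no free port found in the scanned range")


if __name__ == "__main__":
    port = find_free_port()
    print(f"streamlit run src/streamlit_app.py --server.port {port}")
```

Another process could still claim the port between the probe and the actual launch, so treat the result as a hint rather than a guarantee.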
 ## Project structure
 ```
 Credit-Card-Fraud-Detection/
 ├── data/                    # Data directory
 │   └── creditcard.csv      # Credit card transaction data
 ├── models/                  # Model directory
 │   ├── random_forest_model.joblib
 │   ├── logistic_regression_model.joblib
 │   └── scaler.joblib
 ├── src/                     # Source code
 │   ├── agent_app.py        # Agent entry point (recommended)
 │   ├── streamlit_app.py    # Streamlit web app
 │   ├── train.py            # Model training
 │   ├── infer.py            # Inference interface
 │   ├── data.py             # Data processing
 │   └── features.py         # Feature definitions
 ├── tests/                   # Tests
 ├── requirements.txt        # Python dependencies
 ├── setup.py                # Installation script
 ├── QUICKSTART.md           # Quick start guide (this file)
 └── README.md               # Detailed documentation
 ```
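 ## Next steps
 - Read [README.md](README.md) for project details
 - Browse the source code under [src/](src/)
 - Run the tests to confirm everything works
 - Start using the system for fraud detection
 ## Support
 If you run into problems:
 1. Check the "FAQ" section of this guide
 2. See [README.md](README.md) for more information
 3. Open an issue in the project repository
 ## License
 MIT License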
--- a/README.md
+++ b/README.md
@ -129,21 +129,90 @@ ml_course_design/
 - **Web app**: Streamlit
 - **Dependency management**: uv
-## Requirements
+## Quick Start
- Python 3.10+
+### Prerequisites
 - uv (for dependency management)
-## Install dependencies
+- Python 3.10 or later
 - pip (the Python package manager)
 ### Installation
 #### 1. Clone the repository
 ```bash
-# Install dependencies with uv (recommended)
+git clone <repository-url>
-uv sync
+cd Credit-Card-Fraud-Detection
 ```
-# Or use pip
+#### 2. Create a virtual environment (recommended)
 **Windows:**
 ```bash
 python -m venv venv
 venv\Scripts\activate
 ```
 **Linux/Mac:**
 ```bash
 python3 -m venv venv
 source venv/bin/activate
 ```
 #### 3. Install dependencies
 **Option 1: use requirements.txt (recommended)**
 ```bash
 pip install -r requirements.txt
 ```
 **Option 2: use setup.py**
 ```bash
 pip install -e .
 ```
 **Option 3: use uv (uv must be installed first)**
 ```bash
 pip install uv
 uv sync
 ```
 #### 4. Prepare the data
 Make sure `data/creditcard.csv` exists. If it does not:
 1. Download the dataset from Kaggle: https://www.kaggle.com/mlg-ulb/creditcardfraud
 2. Put the downloaded `creditcard.csv` file in the `data/` directory
 #### 5. Train the models
 ```bash
 python src/train.py
 ```
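 After training, the model files are saved in the `models/` directory:
 - `random_forest_model.joblib` - random forest model
 - `logistic_regression_model.joblib` - logistic regression model
 - `scaler.joblib` - feature scaler
 #### 6. Run the application
 **Option 1: use agent_app.py (recommended)**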
 ```bash
 python src/agent_app.py
 ```
 This starts the web interface and opens it in your browser automatically.
 **Option 2: run Streamlit directly**
 ```bash
 streamlit run src/streamlit_app.py
 ```
 ### Detailed usage guide
 See [QUICKSTART.md](QUICKSTART.md) for detailed usage instructions and answers to common questions.
 ## Running the tests
 ```bash
--- a/check_environment.py
+++ b/check_environment.py
@ -0,0 +1,193 @@
 """
 Environment check script.
 Verifies that the project's dependencies and environment are set up correctly.
 """
 import sys
 import importlib
 from pathlib import Path
 def check_python_version():
    """Check the Python version."""
    print("=" * 60)
    print("Checking Python version...")
    print("=" * 60)
    version = sys.version_info
    print(f"Current Python version: {version.major}.{version.minor}.{version.micro}")
    # Tuple comparison also accepts any future major version.
    if version >= (3, 10):
        print("✓ Python version meets the requirement (>= 3.10)")
        return True
    else:
        print("✗ Python version is too old; 3.10 or later is required")
        return False
 def check_dependencies():
    """Check that the required packages can be imported."""
    print("\n" + "=" * 60)
    print("Checking dependencies...")
    print("=" * 60)
    # Maps import name -> pip package name.
    required_packages = {
        'numpy': 'numpy',
        'polars': 'polars',
        'sklearn': 'scikit-learn',
        'imblearn': 'imbalanced-learn',
        'matplotlib': 'matplotlib',
        'seaborn': 'seaborn',
        'joblib': 'joblib',
        'pydantic': 'pydantic',
        'streamlit': 'streamlit',
    }
    missing_packages = []
    for module_name, package_name in required_packages.items():
        try:
            module = importlib.import_module(module_name)
            version = getattr(module, '__version__', 'unknown')
            print(f"✓ {package_name:20s} - version: {version}")
        except ImportError:
            print(f"✗ {package_name:20s} - not installed")
            missing_packages.append(package_name)
    if missing_packages:
        print(f"\n{len(missing_packages)} required package(s) missing")
        print(f"Run: pip install {' '.join(missing_packages)}")
        return False
    else:
        print("\n✓ All dependencies are installed")
        return True
 def check_data_files():
    """Check that the data file exists."""
    print("\n" + "=" * 60)
    print("Checking data files...")
    print("=" * 60)
    data_dir = Path("data")
    creditcard_csv = data_dir / "creditcard.csv"
    if creditcard_csv.exists():
        file_size = creditcard_csv.stat().st_size / (1024 * 1024)  # MB
        print(f"✓ data/creditcard.csv exists (size: {file_size:.2f} MB)")
        return True
    else:
        print("✗ data/creditcard.csv does not exist")
        print("Download the dataset from:")
        print("https://www.kaggle.com/mlg-ulb/creditcardfraud")
        print("and put creditcard.csv in the data/ directory")
        return False
 def check_model_files():
    """Check that the trained model files exist."""
    print("\n" + "=" * 60)
    print("Checking model files...")
    print("=" * 60)
    models_dir = Path("models")
    required_models = [
        "random_forest_model.joblib",
        "logistic_regression_model.joblib",
        "scaler.joblib"
    ]
    missing_models = []
    for model_file in required_models:
        model_path = models_dir / model_file
        if model_path.exists():
            file_size = model_path.stat().st_size / 1024  # KB
            print(f"✓ {model_file:35s} (size: {file_size:.2f} KB)")
        else:
            print(f"✗ {model_file:35s} - missing")
            missing_models.append(model_file)
    if missing_models:
        print(f"\n{len(missing_models)} model file(s) missing")
        print("Run: python src/train.py to train the models")
        return False
    else:
        print("\n✓ All model files are present")
        return True
 def check_source_files():
    """Check that the source files exist."""
    print("\n" + "=" * 60)
    print("Checking source files...")
    print("=" * 60)
    src_dir = Path("src")
    required_files = [
        "__init__.py",
        "data.py",
        "features.py",
        "train.py",
        "infer.py",
        "agent_app.py",
        "streamlit_app.py"
    ]
    missing_files = []
    for file_name in required_files:
        file_path = src_dir / file_name
        if file_path.exists():
            print(f"✓ src/{file_name}")
        else:
            print(f"✗ src/{file_name} - missing")
            missing_files.append(file_name)
    if missing_files:
        print(f"\n{len(missing_files)} source file(s) missing")
        return False
    else:
        print("\n✓ All source files are present")
        return True
 def run_all_checks():
    """Run every check and print a summary."""
    print("\n" + "=" * 60)
    print("Credit Card Fraud Detection System - Environment Check")
    print("=" * 60)
    results = {
        "Python version": check_python_version(),
        "Dependencies": check_dependencies(),
        "Data files": check_data_files(),
        "Model files": check_model_files(),
        "Source files": check_source_files(),
    }
    print("\n" + "=" * 60)
    print("Summary")
    print("=" * 60)
    for check_name, result in results.items():
        status = "✓ passed" if result else "✗ failed"
        print(f"{check_name:15s}: {status}")
    all_passed = all(results.values())
    print("\n" + "=" * 60)
    if all_passed:
        print("✓ All checks passed. You can now run the system")
        print("\nRun:")
        print("  python src/agent_app.py")
    else:
        print("✗ Some checks failed; follow the hints above to fix them")
        print("\nQuick fixes:")
        if not results["Dependencies"]:
            print("  1. Install dependencies: pip install -r requirements.txt")
        if not results["Data files"]:
            print("  2. Download the data: get creditcard.csv from Kaggle into data/")
        if not results["Model files"]:
            print("  3. Train the models: python src/train.py")
    print("=" * 60)
    return all_passed
 if __name__ == "__main__":
    success = run_all_checks()
    sys.exit(0 if success else 1)
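The dependency check above reads each module's `__version__` attribute and falls back to `'unknown'` when a package does not define one. A possible alternative, sketched here under the assumption that looking up the installed distribution by its pip name is acceptable, is the standard library's `importlib.metadata`. The helper `installed_version` is hypothetical and not part of this commit.

```python
from importlib import metadata


def installed_version(distribution):
    """Return the installed version string, or None if the distribution is absent."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Distribution names as they appear in requirements.txt, not import names.
    for name in ("numpy", "scikit-learn", "imbalanced-learn"):
        print(f"{name:20s} {installed_version(name) or 'not installed'}")
```

This looks up package metadata without importing the package, so it reports a version even for packages that define no `__version__`, but it does not prove that the import itself succeeds.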
--- a/requirements.txt
+++ b/requirements.txt
@ -0,0 +1,9 @@
 numpy>=1.24.0
 polars>=0.19.0
 scikit-learn>=1.3.0
 imbalanced-learn>=0.11.0
 matplotlib>=3.7.0
 seaborn>=0.12.0
 joblib>=1.3.0
 pydantic>=2.0.0
 streamlit>=1.28.0
--- a/setup.py
+++ b/setup.py
@ -0,0 +1,25 @@
 from setuptools import setup, find_packages
 setup(
    name="creditcard-fraud-detection",
    version="0.1.0",
    description="Credit card fraud detection system",
    packages=find_packages(),
    python_requires=">=3.10",
    install_requires=[
        "numpy>=1.24.0",
        "polars>=0.19.0",
        "scikit-learn>=1.3.0",
        "imbalanced-learn>=0.11.0",
        "matplotlib>=3.7.0",
        "seaborn>=0.12.0",
        "joblib>=1.3.0",
        "pydantic>=2.0.0",
        "streamlit>=1.28.0",
    ],
    entry_points={
        "console_scripts": [
            "train=src.train:train_and_evaluate",
        ],
    },
 )
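`setup.py` repeats the same nine requirements listed in `requirements.txt`, so the two lists can drift apart. One possible way to keep a single source is to read the file from `setup.py`. This is a minimal sketch: `read_requirements` is a hypothetical helper, not part of this commit, and it only handles plain specifier lines (no `-r` includes or other pip options).

```python
from pathlib import Path


def read_requirements(path="requirements.txt"):
    """Return requirement specifiers, skipping blank lines and comments."""
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    stripped = (line.strip() for line in lines)
    return [line for line in stripped if line and not line.startswith("#")]
```

With this in place, the `setup()` call could pass `install_requires=read_requirements()` instead of the hard-coded list.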