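feat: complete Data Extractor and Converter project

- Add MDF file export
- Integrate Aliyun (Alibaba Cloud) large-model OCR
- Add Baidu AI Cloud photo scoring
- Integrate DeepSeek large-model creative copywriting
- Improve documentation and configuration management
- Use uv for modern dependency management
- Add a complete .gitignore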
2026-01-08 20:25:49 +08:00 · 2026-01-08 20:25:49 +08:00 · 2ec2c0a1ab
commit 2ec2c0a1ab
34 changed files with 10908 additions and 0 deletions
--- a/.env.example
+++ b/.env.example
@@ -0,0 +1,35 @@
+# Data Extractor and Converter - example environment configuration
+
+# Flask application secret key (change this in production)
+SECRET_KEY=your-secret-key-here
+
+# Tesseract OCR path (must be set on Windows)
+TESSERACT_PATH=C:\\Program Files\\Tesseract-OCR\\tesseract.exe
+
+# Database connection (optional)
+DATABASE_URI=sqlite:///data.db
+
+# Example MySQL configuration
+# DATABASE_URI=mysql+pymysql://username:password@localhost/database_name
+
+# Aliyun OCR configuration
+ALIYUN_ACCESS_KEY_ID=your-aliyun-access-key-id
+ALIYUN_ACCESS_KEY_SECRET=your-aliyun-access-key-secret
+
+# Baidu AI Cloud configuration (image analysis)
+BAIDU_API_KEY=your-baidu-api-key
+BAIDU_SECRET_KEY=your-baidu-secret-key
+
+# DeepSeek large-model configuration (creative copywriting)
+DEEPSEEK_API_KEY=your-deepseek-api-key
+
+# Aliyun DashScope configuration (fallback copywriting)
+DASHSCOPE_API_KEY=your-dashscope-api-key
+
+# Photo advice generation
+PHOTO_ADVICE_ENABLED=true
+
+# Application settings
+DEBUG=false
+HOST=0.0.0.0
+PORT=5000
--- a/.gitignore
+++ b/.gitignore
@@ -0,0 +1,81 @@
+# Python
+__pycache__/
+*.py[cod]
+*$py.class
+*.so
+.Python
+build/
+develop-eggs/
+dist/
+downloads/
+eggs/
+.eggs/
+lib/
+lib64/
+parts/
+sdist/
+var/
+wheels/
+*.egg-info/
+.installed.cfg
+*.egg
+MANIFEST
+
+# Environment variables
+.env
+.env.local
+.env.development.local
+.env.test.local
+.env.production.local
+
+# IDE
+.vscode/
+.idea/
+*.swp
+*.swo
+*~
+
+# OS
+.DS_Store
+Thumbs.db
+
+# Logs
+*.log
+logs/
+
+# Database
+*.db
+*.sqlite
+*.sqlite3
+
+# Temporary files
+temp/
+tmp/
+
+# Uploads
+uploads/
+
+# Streamlit
+.streamlit/
+
+# UV
+.venv/
+venv/
+ENV/
+
+# Package files
+*.tar.gz
+*.whl
+
+# Test coverage
+.coverage
+htmlcov/
+.pytest_cache/
+
+# Jupyter
+.ipynb_checkpoints
+
+# Documentation
+_site/
+.sass-cache/
+.jekyll-metadata
--- a/ALIYUN_OCR_SETUP.md
+++ b/ALIYUN_OCR_SETUP.md
@@ -0,0 +1,130 @@
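+# Aliyun OCR Setup Guide
+
+## 📋 Overview
+
+Data Extractor and Converter now supports image text recognition through Aliyun (Alibaba Cloud) AI large-model OCR, which offers higher accuracy and better Chinese support than traditional OCR.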
+
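+## 🔑 Getting an Aliyun AccessKey
+
+### 1. Register an Aliyun account
+- Visit: https://www.aliyun.com
+- Register and complete real-name verification
+
+### 2. Enable the OCR service
+- Sign in to the Aliyun console
+- Search for "OCR" or visit: https://www.aliyun.com/product/ocr
+- Enable the "通用文字识别" (General Text Recognition) service
+
+### 3. Get the AccessKey
+1. Open the console → hover over your avatar → AccessKey Management
+2. Create an AccessKey (or use an existing one)
+3. Record the following:
+   - AccessKey ID
+   - AccessKey Secret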
+
+## ⚙️ Configure environment variables
+
+Add the Aliyun settings to your `.env` file:
+
+```env
+# Aliyun OCR configuration
+ALIYUN_ACCESS_KEY_ID=<your AccessKey ID>
+ALIYUN_ACCESS_KEY_SECRET=<your AccessKey Secret>
+ALIYUN_OCR_ENDPOINT=ocr-api.cn-hangzhou.aliyuncs.com
+```
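
A minimal sketch of how these three variables might be read and checked at startup. The helper name `load_aliyun_settings`, its return shape, and the fallback to the endpoint shown above are assumptions for illustration; the project's own `utils/aliyun_ocr.py` may work differently.

```python
import os

# Hypothetical helper, not part of the project's code.
# The default endpoint is the value shown in the .env example above.
DEFAULT_ENDPOINT = "ocr-api.cn-hangzhou.aliyuncs.com"


def load_aliyun_settings(env=None):
    """Return (ok, message, settings) for the Aliyun OCR variables."""
    env = os.environ if env is None else env
    key_id = env.get("ALIYUN_ACCESS_KEY_ID", "").strip()
    key_secret = env.get("ALIYUN_ACCESS_KEY_SECRET", "").strip()
    endpoint = env.get("ALIYUN_OCR_ENDPOINT", "").strip() or DEFAULT_ENDPOINT

    # Collect the names of required variables that are empty or unset.
    required = (
        ("ALIYUN_ACCESS_KEY_ID", key_id),
        ("ALIYUN_ACCESS_KEY_SECRET", key_secret),
    )
    missing = [name for name, value in required if not value]
    if missing:
        return False, "missing: " + ", ".join(missing), {}

    settings = {
        "access_key_id": key_id,
        "access_key_secret": key_secret,
        "endpoint": endpoint,
    }
    return True, "ok", settings
```

Passing a plain dict instead of `os.environ` keeps the check easy to exercise without touching the real environment.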
+
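+## 💰 Pricing
+
+### Free quota
+- New users usually receive a free call quota
+- See the Aliyun OCR product page for the exact quota
+
+### Billing
+- Billed per call
+- See Aliyun's official pricing for current rates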
+
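+## 🎯 Feature comparison
+
+| Feature | Traditional OCR (Tesseract) | AI large-model OCR (Aliyun) |
+|------|-------------------|---------------------|
+| **Installation effort** | Moderate (software install required) | Low (only a key to configure) |
+| **Recognition accuracy** | Fair | Very high |
+| **Chinese support** | Good | Excellent |
+| **Complex images** | Poor | Excellent |
+| **Cost** | Free | Billed per call |
+| **Processing speed** | Fast | Moderate (network-dependent) |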
+
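+## 🔧 Troubleshooting
+
+### Common problems
+
+**1. "阿里云AccessKey未配置" (Aliyun AccessKey not configured)**
+- Check that `ALIYUN_ACCESS_KEY_ID` and `ALIYUN_ACCESS_KEY_SECRET` are set in the `.env` file
+- Make sure the AccessKey values are correct
+
+**2. "权限不足" (insufficient permissions)**
+- Confirm the OCR service is enabled
+- Check that the AccessKey has permission for the OCR service
+
+**3. "网络连接失败" (network connection failed)**
+- Check your network connection
+- Confirm that no firewall is blocking the requests
+
+**4. "额度不足" (quota exhausted)**
+- Check your Aliyun account balance
+- Check whether the free quota has been used up
+
+### Test the configuration
+
+Run the following from the project root to test the Aliyun OCR configuration:
+
+```bash
+cd path/to/data-extractor-converter
+uv run python -c "from utils.aliyun_ocr import check_aliyun_config; print(check_aliyun_config())"
+```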
+
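+## 🚀 Usage
+
+### In the app
+
+1. Open the app → select the "🖼️ 图片OCR" (Image OCR) feature
+2. Select the "AI大模型OCR (阿里云)" mode
+3. Upload an image file
+4. Click "识别文字" (Recognize text) or an export button
+
+### Supported image formats
+- JPG/JPEG
+- PNG
+- GIF
+- BMP
+
+### Recognition types
+- **General text recognition** - text in ordinary images
+- **Table recognition** - table data extraction
+- **Advanced recognition** - text in complex scenes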
+
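+## 💡 Best practices
+
+### Image preparation
+1. **Sharpness**: make sure the image is sharp and the text is legible
+2. **Resolution**: 300 dpi or higher is recommended
+3. **Background**: prefer a plain, single-color background
+4. **Angle**: keep the text horizontal
+
+### Cost control
+1. **Batch processing**: process images in batches where possible
+2. **Preprocessing**: crop and optimize images first
+3. **Monitor usage**: review your Aliyun usage regularly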
+
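+## 📚 Resources
+
+- [Aliyun OCR documentation](https://help.aliyun.com/product/30419.html)
+- [AccessKey management](https://ram.console.aliyun.com/manage/ak)
+- [OCR pricing](https://www.aliyun.com/price/product#/ocr/detail)
+
+## ⚠️ Notes
+
+1. **Security**: never commit your AccessKey to version control
+2. **Cost**: monitor usage to avoid unexpected charges
+3. **Network**: AI OCR needs a stable network connection
+4. **Fallback**: for important data, keep traditional OCR as a backup option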
--- a/BAIDU_AI_SETUP.md
+++ b/BAIDU_AI_SETUP.md
@@ -0,0 +1,166 @@
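+# Baidu AI Cloud Photo Scoring Setup Guide
+
+## 📋 Overview
+
+Data Extractor and Converter now supports photo quality scoring and content analysis through Baidu AI Cloud (百度智能云) large-model services, giving your photos an automated, structured assessment.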
+
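+## 🔑 Getting Baidu AI Cloud API keys
+
+### 1. Register a Baidu AI Cloud account
+- Visit: https://cloud.baidu.com
+- Register and complete real-name verification
+
+### 2. Enable the image analysis service
+1. Sign in to the Baidu AI Cloud console
+2. Search for "图像分析" (image analysis) or visit: https://cloud.baidu.com/product/imageprocess.html
+3. Enable the "图像分析" (image analysis) or "图像识别" (image recognition) service
+
+### 3. Create an application to get the API keys
+1. Open the console → Products & Services → Image Analysis
+2. Create a new application
+3. Record the following:
+   - API Key
+   - Secret Key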
+
+## ⚙️ Configure environment variables
+
+Add the Baidu AI Cloud settings to your `.env` file:
+
+```env
+# Baidu AI Cloud configuration (image analysis)
+BAIDU_API_KEY=<your API Key>
+BAIDU_SECRET_KEY=<your Secret Key>
+```
+
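+## 💰 Pricing
+
+### Free quota
+- New users usually receive a free call quota
+- See the Baidu AI Cloud product page for the exact quota
+
+### Billing
+- Billed per call
+- See Baidu AI Cloud's official pricing for current rates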
+
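+## 🎯 Features
+
+### 1. **Photo quality scoring** 📊
+- **Overall score**: a combined quality score from 0 to 100
+- **Quality dimensions**: sharpness, brightness, contrast, color balance
+- **Improvement suggestions**: targeted optimization advice
+
+### 2. **Photo content analysis** 🔍
+- **Object recognition**: automatically identifies objects and scenes in the photo
+- **Content summary**: generates a description of the photo's content
+- **Baidu Baike**: detailed information about recognized objects
+
+### 3. **Photo aesthetic scoring** 🎨
+- **Aesthetic score**: composition, color, lighting, and other aesthetic dimensions
+- **Aesthetic advice**: suggestions for making the photo more appealing
+- **Artistic guidance**: photography technique and composition tips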
+
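+## 🔧 Troubleshooting
+
+### Common problems
+
+**1. "百度智能云API密钥未配置" (Baidu AI Cloud API keys not configured)**
+- Check that `BAIDU_API_KEY` and `BAIDU_SECRET_KEY` are set in the `.env` file
+- Make sure the API keys are correct
+
+**2. "权限不足" (insufficient permissions)**
+- Confirm the image analysis service is enabled
+- Check that the API keys have permission for that service
+
+**3. "网络连接失败" (network connection failed)**
+- Check your network connection
+- Confirm that no firewall is blocking the requests
+
+**4. "额度不足" (quota exhausted)**
+- Check your Baidu AI Cloud account balance
+- Check whether the free quota has been used up
+
+### Test the configuration
+
+Run the following from the project root to test the Baidu AI Cloud configuration:
+
+```bash
+cd path/to/data-extractor-converter
+uv run python -c "from utils.baidu_image_analysis import check_baidu_config; print(check_baidu_config())"
+```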
+
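+## 🚀 Usage
+
+### In the app
+
+1. Open the app → select the "📸 AI照片评分" (AI photo scoring) feature
+2. Upload a photo
+3. Choose an analysis type:
+   - **质量评分 (quality scoring)**: assesses the photo's technical quality
+   - **内容分析 (content analysis)**: identifies what the photo shows
+   - **美学评分 (aesthetic scoring)**: assesses the photo's artistic value
+
+### Supported image formats
+- JPG/JPEG
+- PNG
+- GIF
+- BMP
+
+### Analysis types in detail
+
+#### Quality scoring 📊
+- **Use cases**: technical quality assessment, photo optimization
+- **Output**: overall score, per-dimension analysis, improvement suggestions
+- **Recommended for**: judging a photo's technical quality
+
+#### Content analysis 🔍
+- **Use cases**: content recognition, scene understanding
+- **Output**: recognized objects, content summary, Baike information
+- **Recommended for**: understanding a photo's content and scene
+
+#### Aesthetic scoring 🎨
+- **Use cases**: artistic assessment, learning photography
+- **Output**: aesthetic score, composition analysis, artistic advice
+- **Recommended for**: judging a photo's artistic value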
+
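+## 💡 Best practices
+
+### Photo preparation
+1. **Sharpness**: make sure the photo is sharp and not blurred
+2. **Lighting**: use natural light and avoid under- or over-exposure
+3. **Composition**: follow the rule of thirds and keep the frame balanced
+4. **Format**: use high-quality JPG or PNG
+
+### Cost control
+1. **Batch processing**: analyze photos in batches where possible
+2. **Selective analysis**: run only the analysis types you need
+3. **Monitor usage**: review your usage statistics regularly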
+
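+## 📚 Resources
+
+- [Baidu AI Cloud image analysis documentation](https://cloud.baidu.com/doc/IMAGEPROCESS/s/ck3h6yf8e)
+- [API key management](https://console.bce.baidu.com/iam/#/iam/accesslist)
+- [Pricing](https://cloud.baidu.com/product/imageprocess.html#pricing)
+
+## ⚠️ Notes
+
+1. **Security**: never commit your API keys to version control
+2. **Cost**: monitor usage to avoid unexpected charges
+3. **Network**: AI analysis needs a stable network connection
+4. **Privacy**: avoid uploading photos that contain sensitive information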
+
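+## 🌟 Use cases
+
+### Personal
+- Assess the quality of phone photos
+- Learn photography techniques
+- Optimize images for social media
+
+### Education
+- Assess photography coursework
+- Learn image processing
+- Guide artistic work
+
+### Professional
+- Assess a photographer's portfolio
+- Monitor image quality
+- Content recognition and analysis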
--- a/BAIDU_API_GUIDE.md
+++ b/BAIDU_API_GUIDE.md
@@ -0,0 +1,124 @@
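+# Getting the Correct Baidu AI Cloud API Keys
+
+## 🔍 Diagnosis
+
+The `unknown client id` error you are seeing indicates that the configured API key is in the wrong format. A Baidu AI Cloud API key for this service should be purely alphanumeric, unlike the value configured previously.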
+
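+## ✅ Steps to get the correct API keys
+
+### 1. **Open the Baidu AI Cloud console**
+- Open: https://console.bce.baidu.com/
+- Sign in with your Baidu account
+
+### 2. **Enable the image analysis service**
+1. Enter "图像分析" (image analysis) in the console search bar
+2. Select the "图像分析" or "图像识别" (image recognition) service
+3. Click "立即使用" (Use now) to enable it
+
+### 3. **Create an application to get the API keys**
+1. Open the console → Products & Services → Image Analysis
+2. Click "创建应用" (Create application)
+3. Fill in the application details:
+   - **Application name**: 数据提取与转换器
+   - **Application type**: Tools
+   - **Description**: Photo quality scoring tool
+4. Check the service permissions you need
+5. Click "立即创建" (Create now)
+
+### 4. **Get the correct API keys**
+Once the application is created, you will see information like this:
+
+```
+AppID: 12345678
+API Key: xxxxxxxxxxxxxxxx
+Secret Key: xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
+```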
+
+**Example of the correct format:**
+```
+API Key: "AbCdEfGhIjKlMnOp"  (16 alphanumeric characters)
+Secret Key: "AbCdEfGhIjKlMnOpQrStUvWxYz012345"  (32 alphanumeric characters)
+```
+
+## ⚠️ Common wrong formats
+
+**Wrong format (do not use):**
+```
+# This format is wrong! (values redacted; never paste real credentials into documentation)
+BAIDU_API_KEY=bce-v3/ALTAK-xxxxxxxx/xxxxxxxxxxxxxxxx
+BAIDU_SECRET_KEY=xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
+```
+
+**Correct format:**
+```
+# This format is correct!
+BAIDU_API_KEY=AbCdEfGhIjKlMnOp
+BAIDU_SECRET_KEY=AbCdEfGhIjKlMnOpQrStUvWxYz012345
+```
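
The format rule above can be checked before any network call is made. This is a minimal sketch that encodes the lengths exactly as this guide states them (16 and 32 alphanumeric characters). Those lengths are the guide's claim and have not been verified against Baidu's documentation here, and `check_key_format` is a hypothetical helper, not part of the project.

```python
import re

# Lengths taken from this guide; adjust them if your console shows otherwise.
API_KEY_RE = re.compile(r"[A-Za-z0-9]{16}")
SECRET_KEY_RE = re.compile(r"[A-Za-z0-9]{32}")


def check_key_format(api_key, secret_key):
    """Return a list of format problems; an empty list means both keys look plausible."""
    problems = []
    if not API_KEY_RE.fullmatch(api_key):
        problems.append("BAIDU_API_KEY is not 16 alphanumeric characters")
    if not SECRET_KEY_RE.fullmatch(secret_key):
        problems.append("BAIDU_SECRET_KEY is not 32 alphanumeric characters")
    return problems
```

A passing format check only rules out obvious paste mistakes such as a `bce-v3/...` value; it says nothing about whether the key pair is actually valid.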
+
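+## 🔧 Configuration steps
+
+### 1. **Update the .env file**
+Add the correct API keys to your `.env` file:
+
+```env
+# Baidu AI Cloud configuration (image analysis)
+BAIDU_API_KEY=<your correct API Key>
+BAIDU_SECRET_KEY=<your correct Secret Key>
+```
+
+### 2. **Restart the app**
+The app must be restarted to load the new environment variables.
+
+### 3. **Verify the configuration**
+Run the following from the project root to check the configuration:
+
+```bash
+cd path/to/data-extractor-converter
+uv run python -c "from utils.baidu_image_analysis import check_baidu_config; print(check_baidu_config())"
+```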
+
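+## 🎯 What success looks like
+
+If the configuration is correct, the result carries a `True` status and a success message along these lines: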
+```
+配置状态: True
+详细信息: 百度智能云配置正确
+```
+
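+## 💡 Troubleshooting
+
+### If the problem persists
+
+1. **Check that the service is enabled**
+   - Confirm the image analysis service is enabled
+   - Check that the application has the required permissions
+
+2. **Verify the API key format**
+   - API Key: should be 16 alphanumeric characters
+   - Secret Key: should be 32 alphanumeric characters
+
+3. **Check the network connection**
+   - Make sure the Baidu AI Cloud API is reachable
+   - Check your firewall settings
+
+4. **Read the error details**
+   - If an error remains, read the full error message
+   - Use it to narrow down the cause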
+
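+## 📞 Getting help
+
+If you still cannot resolve the problem:
+
+1. **Baidu AI Cloud documentation**: https://cloud.baidu.com/doc/IMAGEPROCESS/s/ck3h6yf8e
+2. **Technical support**: submit a ticket in the Baidu AI Cloud console
+3. **Community support**: search the relevant technical forums
+
+## 🚀 Next steps
+
+With the correct API keys configured, you can use the following features:
+- 📊 Photo quality scoring
+- 🔍 Photo content analysis
+- 🎨 Photo aesthetic scoring
+
+Good luck with your setup!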
--- a/BAIDU_API_KEY_DETAILED_GUIDE.md
+++ b/BAIDU_API_KEY_DETAILED_GUIDE.md
@@ -0,0 +1,187 @@
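+# Baidu AI Cloud API Key: Detailed Guide
+
+## 📋 Steps at a glance
+
+1. **Register a Baidu AI Cloud account**
+2. **Enable the image analysis service**
+3. **Create an application to get the API Key**
+4. **Configure the key in the app**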
+
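+## 🔑 Step 1: Register a Baidu AI Cloud account
+
+### 1.1 Visit the website
+- Open: https://cloud.baidu.com/
+- Click "注册" (Register) in the top-right corner
+
+### 1.2 Complete registration
+- Register with a Baidu account or a phone number
+- Complete real-name verification (an ID card is required)
+- Verify your phone number and email address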
+
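+## 🚀 Step 2: Enable the image analysis service
+
+### 2.1 Sign in to the console
+- Visit: https://console.bce.baidu.com/
+- Sign in with the account you registered
+
+### 2.2 Find the service
+In the search bar on the console home page, enter one of these keywords:
+- **"图像分析"** (image analysis)
+- **"图像识别"** (image recognition)
+- **"Image Analysis"**
+
+### 2.3 Select the service
+Click the "图像分析" service in the search results, then click "立即使用" (Use now).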
+
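+## 📱 Step 3: Create an application to get the API Key
+
+### 3.1 Open application management
+1. After signing in, click "产品服务" (Products & Services) in the left-hand menu
+2. Find "图像分析" or "图像识别"
+3. Click through to the service page
+
+### 3.2 Create a new application
+1. Click the "创建应用" (Create application) button
+2. Fill in the application details:
+
+**Example application details:**
+```
+Application name: 数据提取与转换器
+Application type: Tools
+Description: Photo quality scoring and content analysis tool
+Industry: Tools / Office software
+```
+
+### 3.3 Select service permissions
+When creating the application, make sure the following permissions are checked:
+- ✅ Image analysis (图像分析)
+- ✅ Image recognition (图像识别)
+- ✅ Image quality assessment (图像质量评估)
+
+### 3.4 Get the API Key
+Once the application is created, you will see information like this:
+
+```
+App ID: 12345678
+API Key: AbCdEfGhIjKlMnOp
+Secret Key: AbCdEfGhIjKlMnOpQrStUvWxYz012345
+```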
+
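+## 🔍 Step 4: Recognize the correct API Key format
+
+### 4.1 What a correct API Key looks like
+```
+✅ API Key: AbCdEfGhIjKlMnOp        (16 alphanumeric characters)
+✅ Secret Key: AbCdEfGhIjKlMnOpQrStUvWxYz012345 (32 alphanumeric characters)
+```
+
+### 4.2 Incorrect API Key formats (do not use)
+```
+❌ Date-time format: 20260108183311
+❌ Compound format: bce-v3/ALTAK-xxx/xxx
+❌ ALTAK-prefixed access key: ALTAKxxxxxxxxxxxxxxxxxxxxx
+```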
+
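+## ⚙️ Step 5: Configure the key in the app
+
+### 5.1 Update the .env file
+Add the correct API Key to your `.env` file:
+
+```env
+# Baidu AI Cloud configuration (image analysis)
+BAIDU_API_KEY=AbCdEfGhIjKlMnOp
+BAIDU_SECRET_KEY=AbCdEfGhIjKlMnOpQrStUvWxYz012345
+```
+
+### 5.2 Restart the app
+The app must be restarted to load the new environment variables.
+
+### 5.3 Verify the configuration
+Run the following from the project root to check the configuration:
+
+```bash
+cd path/to/data-extractor-converter
+uv run python -c "from utils.baidu_image_analysis import check_baidu_config; print(check_baidu_config())"
+```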
+
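+## 🎯 What success looks like
+
+If the configuration is correct, the result carries a `True` status and a success message along these lines: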
+```
+配置状态: True
+详细信息: 百度智能云配置正确
+```
+
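+## 💡 Common questions
+
+### Q1: What if I cannot find the "图像分析" service?
+- Try searching for "图像识别" (image recognition)
+- Check that real-name verification is complete for your account
+- Check whether an enterprise account is required (personal accounts may be restricted)
+
+### Q2: What if the API Key format is wrong?
+- Make sure it is purely alphanumeric
+- Do not use a date-time style value
+- Do not use a value that contains special characters
+
+### Q3: What if creating the application reports insufficient permissions?
+- Check the real-name verification status of your account
+- Check your account balance or credit
+- Contact Baidu AI Cloud customer support
+
+### Q4: What if the test still fails?
+- Check your network connection
+- Verify that the API Key and Secret Key belong to the same application
+- Confirm that the service is enabled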
+
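+## 📞 Getting help
+
+### Official documentation
+- Image analysis documentation: https://cloud.baidu.com/doc/IMAGEPROCESS/s/ck3h6yf8e
+- API reference: https://cloud.baidu.com/doc/IMAGEPROCESS/s/Ek3h6xze3
+
+### Technical support
+- Submit a ticket in the console
+- Customer service phone: 4008-777-818
+- Official QQ group: search for "百度智能云技术支持"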
+
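+## 🚀 Feature preview
+
+Once configured, the following AI photo scoring features are available:
+
+### 1. Quality scoring 📊
+- Sharpness assessment
+- Brightness analysis
+- Contrast detection
+- Color balance scoring
+
+### 2. Content analysis 🔍
+- Object recognition
+- Scene understanding
+- Content summary generation
+- Baidu Baike lookups
+
+### 3. Aesthetic scoring 🎨
+- Composition analysis
+- Color harmony
+- Lighting assessment
+- Artistic guidance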
+
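+## ⚠️ Notes
+
+1. **Security**: never commit your API Key to Git or any other version control system
+2. **Cost**: monitor usage to avoid unexpected charges
+3. **Network**: make sure your network connection is stable
+4. **Privacy**: avoid uploading photos that contain sensitive information
+
+## 💰 Pricing
+
+### Free quota
+- New users usually receive a free call quota
+- See the product page for the exact quota
+
+### Billing
+- Billed per call
+- See the official pricing for current rates
+
+Good luck with your setup! If you run into problems, see the common questions above or contact technical support.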
--- a/README.md
+++ b/README.md
@@ -0,0 +1,279 @@
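+## Team Members and Contributions
+
+| Name | Student ID | Main contributions (specific responsibilities) |
+|------|------|-------------------|
+| 郭昊 (Guo Hao) | 2412111209 | (Team lead) Core logic development, prompt writing |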
+
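+# Data Extractor and Converter (数据提取与转换器)
+
+🚀 **Multi-purpose AI data extraction and conversion tool**
+
+A data processing tool with integrated AI large-model capabilities. It supports PDF extraction, image OCR, format conversion, web scraping, and database export, plus AI photo scoring and copywriting.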
+
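+## ✨ Core features
+
+### 📄 Document processing
+- **PDF text/table extraction** - extract text and table data from PDF documents
+- **Multiple formats** - supports PDF, Word, Excel, and other document formats
+
+### 🖼️ Image processing and AI recognition
+- **Traditional OCR** - image text recognition with Tesseract
+- **AI large-model OCR** - Aliyun AI large-model integration for high-accuracy Chinese recognition
+- **AI photo scoring** - quality, content, and aesthetic assessment through Baidu AI Cloud
+- **AI creative copywriting** - copy in several styles generated from the photo's content
+
+### 🔄 Data format conversion
+- **Excel/CSV/JSON conversion** - convert between these data formats in either direction
+- **Data cleaning and processing** - automatic format detection and conversion
+
+### 🌐 Web data collection
+- **Web scraping** - scrape page data from a given URL or keyword
+- **Content extraction** - automatically identifies page structure and content
+
+### 🗄️ Database management
+- **Database export** - export SQLite/MySQL databases to Excel and other formats
+- **MDF support** - export from SQL Server MDF files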
+
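+## 🎯 AI feature highlights
+
+### 📸 AI photo scoring
+- **Quality scoring** 📊 - sharpness, brightness, contrast, and color balance
+- **Content analysis** 🔍 - recognizes objects and scenes in the photo
+- **Aesthetic scoring** 🎨 - composition, lighting, and subject presentation
+- **Detailed improvement advice** 💡 - targeted photography guidance
+
+### ✍️ AI creative copywriting
+- **Multiple styles** - literary, social media, formal, marketing, and more
+- **Style recommendation** - suggests the best-fitting style from the photo's content
+- **Multiple options** - generates 3 options in different styles per request
+- **Easy copy** - copy the text to the clipboard in one click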
+
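+## 🛠️ Technical architecture
+
+### Dependency management
+- **Managed with `uv`** - a modern Python package manager
+- **Virtual environment isolation** - keeps the dependency environment clean
+- **Fast installs** - parallel download and installation
+
+### AI service integration
+- **Aliyun OCR** - strong Chinese OCR recognition
+- **Baidu AI Cloud** - image analysis and recognition services
+- **Aliyun DashScope** - AI large-model copywriting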
+
+## 🚀 Quick start
+
+### Requirements
+- Python 3.8+
+- uv (recommended)
+
+### Installation
+
+1. **Clone the project**
+```bash
+git clone <repository-url>
+cd data-extractor-converter
+```
+
+2. **Install dependencies**
+```bash
+# Install dependencies with uv
+uv sync
+```
+
+3. **Configure environment variables**
+Copy `.env.example` to `.env` and fill in the API keys:
+```env
+# Aliyun OCR configuration (AI large-model recognition)
+ALIYUN_ACCESS_KEY_ID=your-access-key-id
+ALIYUN_ACCESS_KEY_SECRET=your-access-key-secret
+ALIYUN_OCR_ENDPOINT=ocr-api.cn-hangzhou.aliyuncs.com
+
+# Baidu AI Cloud configuration (image analysis)
+BAIDU_API_KEY=your-baidu-api-key
+BAIDU_SECRET_KEY=your-baidu-secret-key
+
+# DashScope configuration (AI copywriting)
+DASHSCOPE_API_KEY=your-dashscope-api-key
+```
+
+4. **Start the app**
+```bash
+uv run streamlit run app.py
+```
+
+5. **Open the app**
+Open http://localhost:8501 in your browser
+
+## 📁 Project structure
+
+```
+data-extractor-converter/
+├── app.py                         # Main application
+├── pyproject.toml                 # Project configuration and dependencies
+├── .env.example                   # Example environment variables
+├── utils/                         # Utility modules
+│   ├── __init__.py
+│   ├── pdf_extractor.py          # PDF extraction
+│   ├── ocr_processor.py          # OCR processing
+│   ├── aliyun_ocr.py             # Aliyun AI OCR
+│   ├── baidu_image_analysis.py   # Baidu AI Cloud image analysis
+│   ├── ai_copywriter.py          # AI copywriting
+│   ├── photo_advice_generator.py # Photo scoring advice generation
+│   ├── format_converter.py       # Format conversion
+│   ├── web_scraper.py            # Web scraping
+│   └── database_exporter.py      # Database export
+├── uploads/                       # Uploaded files
+└── docs/                          # Documentation
+    ├── ALIYUN_OCR_SETUP.md       # Aliyun OCR setup guide
+    ├── BAIDU_AI_SETUP.md         # Baidu AI Cloud setup guide
+    └── SQL_SERVER_SETUP.md       # SQL Server setup guide
+```
+
+## 🔧 Setup guides
+
+### Aliyun OCR
+See: [ALIYUN_OCR_SETUP.md](docs/ALIYUN_OCR_SETUP.md)
+
+### Baidu AI Cloud
+See: [BAIDU_AI_SETUP.md](docs/BAIDU_AI_SETUP.md)
+
+### SQL Server
+See: [SQL_SERVER_SETUP.md](docs/SQL_SERVER_SETUP.md)
+
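+## 💡 Usage examples
+
+### 1. AI photo scoring
+1. Select the "📸 AI照片评分" (AI photo scoring) feature
+2. Upload a photo
+3. Click "质量评分" (quality), "内容分析" (content), or "美学评分" (aesthetics)
+4. Review the detailed scores and improvement advice
+
+### 2. AI copywriting
+1. On the photo scoring page, click "AI写文案" (AI copywriting)
+2. The app analyzes the photo's content automatically
+3. Choose the copy style and length you prefer
+4. Copy the generated text
+
+### 3. PDF processing
+1. Select the "📄 PDF处理" (PDF processing) feature
+2. Upload a PDF file
+3. Choose an extraction mode (text or table)
+4. Download the result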
+
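+## 🎨 Interface
+
+- **Modern design** - a clean, intuitive user interface
+- **Responsive layout** - adapts to different screen sizes
+- **Immediate feedback** - progress and results are shown as operations run
+- **Chinese-language UI** - the interface and prompts are fully in Chinese
+
+## 🔒 Security
+
+- **Local processing** - sensitive data is processed locally where possible; the cloud AI features (Aliyun OCR, Baidu AI Cloud analysis, AI copywriting) necessarily send the uploaded content to those services
+- **Environment variables** - API keys are supplied through environment variables
+- **File isolation** - uploads are handled in a temporary directory and cleaned up automatically
+
+## 📈 Performance
+
+- **Asynchronous processing** - large files are handled asynchronously
+- **Caching** - results of repeated operations are cached
+- **Progress display** - long-running operations show a progress indicator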
+
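+## 🤝 Contributing
+
+Issues and pull requests that improve the project are welcome!
+
+### Development setup
+```bash
+# Install development dependencies
+uv sync --dev
+
+# Run the tests
+uv run pytest
+
+# Format the code
+uv run black .
+uv run isort .
+```
+
+## 📄 License
+
+This project is released under the MIT License; see the [LICENSE](LICENSE) file.
+
+## 🙏 Acknowledgements
+
+Thanks to the following services for the AI capabilities they provide:
+- [Aliyun](https://www.aliyun.com/) - OCR and AI large-model services
+- [Baidu AI Cloud](https://cloud.baidu.com/) - image analysis services
+- [Streamlit](https://streamlit.io/) - web application framework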
+
+
+
+### How to run
+1. **Install dependencies**: `uv sync`
+2. **Configure keys**: copy `.env.example` to `.env` and fill in your keys
+3. **Start**: `uv run streamlit run app.py`
+
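+## 💭 Development reflections
+
+### Choosing the topic: why build this, and whose pain does it relieve?
+
+As a student, I know how painful it is to handle data in many different formats while studying and doing research. Extracting text from PDF papers, recognizing text in images, converting between data formats: each step can eat up a lot of time. Writing creative copy for a photo is its own chore, with repeated revisions and no expert guidance.
+
+This project exists to address those pain points. It is more than a collection of tools; it is an AI-assisted helper that lets us:
+- Quickly extract key information from academic papers
+- Recognize the text in images
+- Convert data files between formats in one step
+- Get a photo quality assessment and creative copy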
+
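+### Working with AI
+
+#### What was it like to write code with AI for the first time?
+
+The first time I used AI-assisted programming, I was struck by how convenient it was. The AI could quickly generate a basic code skeleton, which greatly improved development speed. As the project went on, I found the AI did well in these areas:
+
+1. **Rapid prototyping**: the AI quickly generates the basic skeleton of a feature module
+2. **Optimization advice**: the AI suggests refactoring and performance improvements
+3. **Debugging**: the AI quickly locates potential problems in the code
+
+#### Which prompt made you say "awesome"? Which one made you want to smash your keyboard?
+
+
+
+**The most frustrating prompt:**
+"修复百度智能云API连接错误" (Fix the Baidu AI Cloud API connection error)
+
+This seemingly simple prompt took several rounds of debugging. The AI could not work out that the real problem was the API key format, so it only offered generic troubleshooting advice, and the detailed debugging had to be done by hand.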
+
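+### Self-reflection: in the AI era, what is my core strength as a programmer?
+
+Building this project made it clear to me that a programmer's core strengths have shifted fundamentally in the AI era:
+
+#### 1. **Defining and decomposing problems**
+AI is good at carrying out concrete tasks, but it needs a person to define the problem and break down complex requirements. My value lies in turning user needs into specific tasks the AI can understand.
+
+#### 2. **System architecture design**
+AI can generate code fragments, but the overall architecture, module boundaries, and interface definitions still depend on human judgment.
+
+#### 3. **Quality control and debugging**
+AI-generated code can contain hidden problems, so a person has to test, debug, and optimize it rigorously.
+
+#### 4. **Creative thinking and domain understanding**
+AI learns from existing data, while people can think creatively about a specific business scenario and propose original solutions.
+
+#### 5. **Ethics and responsibility**
+Using AI raises ethical questions such as data privacy and algorithmic fairness, and that responsibility stays with people.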
+
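+### Summary
+
+This project showed me that AI is not a replacement for programmers but a powerful tool and collaborator. Programmers going forward need:
+- **AI collaboration skills**: fluency with AI tools to work faster
+- **Systems thinking**: designing solutions from the whole-system view
+- **Domain understanding**: a deep grasp of user needs and business scenarios
+- **Continuous learning**: keeping pace with the technology
+
+Through this project I gained a practical skill and, more importantly, a new way of thinking about working alongside AI. In the AI era, our value lies less in repetitive coding and more in creative problem solving and system design.
+
+---
+
+**Data Extractor and Converter** - making data processing simpler and smarter! 🚀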
--- a/SQL_SERVER_SETUP.md
+++ b/SQL_SERVER_SETUP.md
@@ -0,0 +1,137 @@
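+# SQL Server MDF Export Setup Guide
+
+## 📋 Overview
+
+Data Extractor and Converter now supports exporting SQL Server database files (`.mdf`). Because an `.mdf` file can only be read through a SQL Server instance, follow the steps below to set one up.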
+
+## 🔧 System requirements
+
+### Required components
+1. **SQL Server**, Express/Developer/Standard/Enterprise edition
+2. **SQL Server Native Client** or **ODBC Driver for SQL Server**
+3. **The Python `pyodbc` library** (installed automatically)
+
+### Recommended setup
+- SQL Server 2019 Express (free edition)
+- ODBC Driver 17 for SQL Server
+
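+## 🚀 Installation
+
+### 1. Install SQL Server (if not already installed)
+
+**Download SQL Server Express (free):**
+- Visit: https://www.microsoft.com/en-us/sql-server/sql-server-downloads
+- Download: SQL Server 2019 Express
+- Choose the "Basic" installation type
+
+**Installation notes:**
+- Remember the `sa` password you set
+- Choose "Mixed Mode" authentication
+- Note the instance name (`MSSQLSERVER` for a default instance; Express editions typically install a named instance, `SQLEXPRESS`)
+
+### 2. Install the ODBC driver
+
+**Download ODBC Driver 17 for SQL Server:**
+- Visit: https://docs.microsoft.com/en-us/sql/connect/odbc/download-odbc-driver-for-sql-server
+- Download and install the latest version
+
+### 3. Verify the installation
+
+**Check the SQL Server service:**
+1. Open the Services manager (`services.msc`)
+2. Make sure the "SQL Server (MSSQLSERVER)" service is running (the name in parentheses is your instance name)
+
+**Test the connection:**
+```bash
+# Test the connection with sqlcmd
+sqlcmd -S localhost -U sa -P your_password
+```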
+
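+## ⚙️ Application configuration
+
+### Default connection parameters
+The app uses these connection parameters by default:
+- **Server**: localhost
+- **User name**: sa
+- **Instance**: MSSQLSERVER
+
+### Custom configuration
+To change the connection parameters, add the following to your `.env` file:
+```env
+# SQL Server configuration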
+MSSQL_SERVER=localhost
+MSSQL_USERNAME=sa
+MSSQL_PASSWORD=your_password
+MSSQL_INSTANCE=MSSQLSERVER
+```
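
A minimal sketch of how these variables could be turned into an ODBC connection string. `build_mssql_connection_string` is a hypothetical helper, and the driver name and instance handling are assumptions; the project's `utils/database_exporter.py` may build its connection differently. The sketch does not escape special characters such as `;` in the password.

```python
import os


def build_mssql_connection_string(env=None, driver="ODBC Driver 17 for SQL Server"):
    """Build an ODBC connection string from the MSSQL_* variables (hypothetical helper)."""
    env = os.environ if env is None else env
    server = env.get("MSSQL_SERVER", "localhost")
    instance = env.get("MSSQL_INSTANCE", "MSSQLSERVER")
    username = env.get("MSSQL_USERNAME", "sa")
    password = env.get("MSSQL_PASSWORD", "")

    # A default instance (MSSQLSERVER) is addressed by host name alone;
    # a named instance such as SQLEXPRESS is addressed as HOST\INSTANCE.
    if instance and instance.upper() != "MSSQLSERVER":
        server = f"{server}\\{instance}"

    parts = [
        f"DRIVER={{{driver}}}",  # renders as DRIVER={ODBC Driver 17 for SQL Server}
        f"SERVER={server}",
        f"UID={username}",
        f"PWD={password}",
    ]
    return ";".join(parts)
```

The resulting string is the kind of value that `pyodbc.connect()` accepts as its connection-string argument.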
+
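+## 📁 MDF processing flow
+
+### Automatic database attach
+The app performs these steps automatically:
+1. Connect to the SQL Server instance
+2. Check whether the database already exists
+3. If it does not, attach the `.mdf` file
+4. Read the table structure and data
+5. Export to the chosen format
+6. Detach the database (optional)
+
+### Supported operations
+- ✅ Export all tables to Excel (one sheet per table)
+- ✅ Export selected tables
+- ✅ Export to CSV
+- ✅ Export to JSON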
+
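+## 🔍 Troubleshooting
+
+### Common problems
+
+**1. "无法连接到SQL Server" (cannot connect to SQL Server)**
+- Check that the SQL Server service is running
+- Verify the connection string parameters
+- Check your firewall settings
+
+**2. "附加数据库失败" (failed to attach database)**
+- Make sure the `.mdf` file is not in use by another process
+- Check the file permissions
+- Try attaching the database manually
+
+**3. "ODBC驱动未找到" (ODBC driver not found)**
+- Install the ODBC Driver for SQL Server
+- Check the system PATH environment variable
+
+### Attaching the database manually
+
+If the automatic attach fails, you can attach the file by hand:
+```sql
+-- Run in SQL Server Management Studio
+CREATE DATABASE [YourDatabaseName]
+ON (FILENAME = 'C:\path\to\your\file.mdf')
+FOR ATTACH;
+```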
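
The attach statement above can also be generated in code. This is a minimal sketch assuming the database name and file path come from user input and therefore need escaping; `build_attach_statement` is a hypothetical helper, not the project's implementation.

```python
def build_attach_statement(database_name, mdf_path):
    """Return a T-SQL CREATE DATABASE ... FOR ATTACH statement (hypothetical helper)."""
    # Inside [brackets], a closing bracket is escaped by doubling it.
    safe_name = database_name.replace("]", "]]")
    # Inside a single-quoted T-SQL string, a quote is escaped by doubling it.
    # Backslashes are not escape characters in T-SQL, so the path is used as-is.
    safe_path = mdf_path.replace("'", "''")
    return (
        f"CREATE DATABASE [{safe_name}]\n"
        f"ON (FILENAME = N'{safe_path}')\n"
        "FOR ATTACH;"
    )


print(build_attach_statement("Demo", r"C:\data\demo.mdf"))
```

The `N'...'` prefix marks the path as a Unicode string, which matters for paths that contain non-ASCII characters.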
+
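+## 🎯 Usage examples
+
+### Basic use
+1. Start the app
+2. Select the "🗄️ 数据库导出" (Database export) feature
+3. Upload the `.mdf` file
+4. Choose an export format
+5. Click "开始导出" (Start export)
+
+### Advanced options
+- Table names: export only specific tables
+- Custom connection: change the connection parameters in the `.env` file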
+
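+## 📚 Resources
+
+- [SQL Server documentation](https://docs.microsoft.com/en-us/sql/)
+- [ODBC driver documentation](https://docs.microsoft.com/en-us/sql/connect/odbc/)
+- [pyodbc documentation](https://github.com/mkleehammer/pyodbc)
+
+## 💡 Notes
+
+1. **Security**: use a strong password in production
+2. **Performance**: large files may take a long time to process
+3. **Compatibility**: supports SQL Server 2008 and later
+4. **Permissions**: make sure the app has sufficient database permissions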
--- a/app.py
+++ b/app.py
@@ -0,0 +1,795 @@
+import streamlit as st
+import os
+import uuid
+import tempfile
+from pathlib import Path
+from dotenv import load_dotenv
+
+# Load environment variables
+load_dotenv()
+
+# Import utility modules
+from utils.pdf_extractor import extract_text_from_pdf, pdf_to_excel
+from utils.ocr_processor import extract_text_from_image, image_to_excel, image_to_text_file
+from utils.format_converter import (
+    excel_to_csv, csv_to_excel, json_to_excel, 
+    excel_to_json, csv_to_json, json_to_csv
+)
+from utils.web_scraper import scrape_webpage, web_to_excel
+from utils.database_exporter import export_sqlite_to_excel, database_to_csv, database_to_json
+
+# Page configuration
+st.set_page_config(
+    page_title="数据提取与转换器",
+    page_icon="🔧",
+    layout="wide",
+    initial_sidebar_state="expanded"
+)
+
+# Custom CSS styles
+st.markdown("""
+<style>
+    .main-header {
+        text-align: center;
+        background: linear-gradient(135deg, #667eea 0%, #764ba2 100%);
+        color: white;
+        padding: 2rem;
+        border-radius: 10px;
+        margin-bottom: 2rem;
+    }
+    .feature-card {
+        background: #f8f9fa;
+        padding: 1.5rem;
+        border-radius: 10px;
+        border-left: 4px solid #3498db;
+        margin-bottom: 1rem;
+    }
+    .success-box {
+        background: #d4edda;
+        color: #155724;
+        padding: 1rem;
+        border-radius: 5px;
+        border: 1px solid #c3e6cb;
+    }
+    .error-box {
+        background: #f8d7da;
+        color: #721c24;
+        padding: 1rem;
+        border-radius: 5px;
+        border: 1px solid #f5c6cb;
+    }
+</style>
+""", unsafe_allow_html=True)
+
+# Page title
+st.markdown("""
+<div class="main-header">
+    <h1>🔧 数据提取与转换器</h1>
+    <p>多功能数据处理工具</p>
+</div>
+""", unsafe_allow_html=True)
+
+# Sidebar navigation
+st.sidebar.title("功能导航")
+page = st.sidebar.radio("选择功能", [
+    "📄 PDF处理", 
+    "🖼️ 图片OCR", 
+    "📸 AI照片评分",
+    "🔄 格式转换", 
+    "🌐 网页抓取", 
+    "🗄️ 数据库导出"
+])
+
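+# File upload helper
+def save_uploaded_file(uploaded_file, file_type):
+    """Save the uploaded file to a temporary file and return its path (None on failure)."""
+    try:
+        # Create a temporary file that keeps the original extension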
+        suffix = Path(uploaded_file.name).suffix
+        with tempfile.NamedTemporaryFile(delete=False, suffix=suffix) as tmp_file:
+            tmp_file.write(uploaded_file.getvalue())
+            return tmp_file.name
+    except Exception as e:
+        st.error(f"文件保存失败: {str(e)}")
+        return None
+
+# PDF processing page
+if page == "📄 PDF处理":
+    st.header("📄 PDF文本/表格提取")
+    
+    uploaded_file = st.file_uploader("选择PDF文件", type=['pdf'])
+    
+    if uploaded_file is not None:
+        file_path = save_uploaded_file(uploaded_file, 'pdf')
+        
+        col1, col2 = st.columns(2)
+        
+        with col1:
+            if st.button("提取文本内容", use_container_width=True):
+                with st.spinner("正在提取文本..."):
+                    try:
+                        text = extract_text_from_pdf(file_path)
+                        st.subheader("提取的文本内容")
+                        st.text_area("文本内容", text, height=300)
+                        st.success("文本提取完成！")
+                    except Exception as e:
+                        st.error(f"提取失败: {str(e)}")
+        
+        with col2:
+            if st.button("导出为Excel", use_container_width=True):
+                with st.spinner("正在转换为Excel..."):
+                    try:
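+                        # Derive the output name from the path itself, so an upper-case ".PDF"
+                        # suffix cannot leave output_path equal to file_path
+                        output_path = str(Path(file_path).with_suffix('')) + '_converted.xlsx'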
+                        pdf_to_excel(file_path, output_path)
+                        
+                        with open(output_path, "rb") as file:
+                            st.download_button(
+                                label="下载Excel文件",
+                                data=file,
+                                file_name=Path(output_path).name,
+                                mime="application/vnd.openxmlformats-officedocument.spreadsheetml.sheet"
+                            )
+                        st.success("PDF转换完成！")
+                    except Exception as e:
+                        st.error(f"转换失败: {str(e)}")
+
+# AI photo scoring page
+elif page == "📸 AI照片评分":
+    st.header("📸 AI照片质量评分")
+    
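+    # Check whether the Baidu AI Cloud feature is available
+    try:
+        from utils.baidu_image_analysis import check_baidu_config
+        baidu_available, baidu_message = check_baidu_config()
+    except Exception:
+        # Catch Exception rather than using a bare except, so that
+        # KeyboardInterrupt and SystemExit are not swallowed
+        baidu_available = False
+        baidu_message = "百度智能云未配置"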
+    
+    # Show the status
+    if baidu_available:
+        st.success("✅ 百度智能云AI照片评分可用")
+    else:
+        st.warning(f"⚠️ 百度智能云AI照片评分: {baidu_message}")
+    
+    if not baidu_available:
+        st.info("""
+        **百度智能云配置说明:**
+        
+        1. **注册百度智能云账号**: https://cloud.baidu.com
+        2. **开通图像分析服务**: 在控制台搜索"图像分析"或"图像识别"
+        3. **获取API密钥**: 创建应用并获取API Key和Secret Key
+        4. **在.env文件中配置**:
+           ```
+           BAIDU_API_KEY=您的API Key
+           BAIDU_SECRET_KEY=您的Secret Key
+           ```
+        """)
+    
+    uploaded_file = st.file_uploader("选择照片文件", type=['jpg', 'jpeg', 'png', 'gif', 'bmp'])
+    
+    if uploaded_file is not None:
+        file_path = save_uploaded_file(uploaded_file, 'image')
+        
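+        # Check whether the AI copywriting feature is available
+        try:
+            from utils.ai_copywriter import check_copywriter_config
+            copywriter_available, copywriter_message = check_copywriter_config()
+        except Exception:
+            # Catch Exception rather than using a bare except, so that
+            # KeyboardInterrupt and SystemExit are not swallowed
+            copywriter_available = False
+            copywriter_message = "AI文案生成未配置"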
+        
+        # Show the AI copywriting status
+        if copywriter_available:
+            st.success("✅ AI文案生成可用")
+        else:
+            st.warning(f"⚠️ AI文案生成: {copywriter_message}")
+        
+        col1, col2, col3, col4 = st.columns(4)
+        
+        with col1:
+            if st.button("质量评分", use_container_width=True, disabled=not baidu_available):
+                with st.spinner("正在分析照片质量..."):
+                    try:
+                        from utils.baidu_image_analysis import analyze_image_quality
+                        from utils.photo_advice_generator import get_quality_improvement_advice
+                        
+                        quality_result = analyze_image_quality(file_path)
+                        
+                        st.subheader("📊 照片质量评分")
+                        
+                        # Show the overall score
+                        score = quality_result['score']
+                        st.metric("总体评分", f"{score}/100", f"{score - 75}")
+                        
+                        # Show the quality dimensions
+                        st.subheader("质量维度分析")
+                        quality_scores = {}
+                        for dimension, info in quality_result['dimensions'].items():
+                            col_dim1, col_dim2 = st.columns([1, 3])
+                            with col_dim1:
+                                st.progress(info['score'] / 100)
+                            with col_dim2:
+                                st.write(f"**{dimension}**: {info['comment']} ({info['score']}/100)")
+                            quality_scores[dimension] = info['score']
+                        
+                        # Generate detailed improvement advice
+                        advice_result = get_quality_improvement_advice(quality_scores)
+                        
+                        # Show the overall advice
+                        st.subheader("💡 总体改进建议")
+                        for suggestion in advice_result.get('overall', []):
+                            st.info(f"📌 {suggestion}")
+                        
+                        # Show the priority advice
+                        if advice_result.get('priority'):
+                            st.subheader("🎯 优先级改进")
+                            for priority in advice_result['priority']:
+                                st.warning(f"⚠️ {priority}")
+                        
+                        # Show the per-dimension advice
+                        st.subheader("🔧 具体改进措施")
+                        for dimension, suggestions in advice_result.get('specific', {}).items():
+                            with st.expander(f"{dimension}改进建议"):
+                                for i, suggestion in enumerate(suggestions, 1):
+                                    st.write(f"{i}. {suggestion}")
+                        
+                        # Show the technical advice
+                        st.subheader("📚 技术学习建议")
+                        from utils.photo_advice_generator import get_technical_advice
+                        tech_advice = get_technical_advice()
+                        
+                        for category, suggestions in tech_advice.items():
+                            with st.expander(f"{category}技术建议"):
+                                for i, suggestion in enumerate(suggestions[:3], 1):
+                                    st.write(f"{i}. {suggestion}")
+                        
+                        st.success("照片质量分析完成！已生成详细改进建议")
+                    except Exception as e:
+                        st.error(f"质量评分失败: {str(e)}")
+        
+        with col2:
+            if st.button("内容分析", use_container_width=True, disabled=not baidu_available):
+                with st.spinner("正在分析照片内容..."):
+                    try:
+                        from utils.baidu_image_analysis import analyze_image_content
+                        content_result = analyze_image_content(file_path)
+                        
+                        st.subheader("🔍 照片内容分析")
+                        
+                        if content_result['objects']:
+                            st.write("**识别到的对象:**")
+                            for i, obj in enumerate(content_result['objects'][:5], 1):
+                                st.write(f"{i}. **{obj['name']}** (置信度: {obj['confidence']:.2%})")
+                                if obj.get('baike_info'):
+                                    st.write(f"   描述: {obj['baike_info'].get('description', '无描述')}")
+                        
+                        if content_result['summary']:
+                            st.write(f"**内容摘要:** {content_result['summary']}")
+                        
+                        st.success("照片内容分析完成！")
+                    except Exception as e:
+                        st.error(f"内容分析失败: {str(e)}")
+        
+        with col3:
+            if st.button("美学评分", use_container_width=True, disabled=not baidu_available):
+                with st.spinner("正在评估照片美学..."):
+                    try:
+                        from utils.baidu_image_analysis import get_image_aesthetic_score
+                        from utils.photo_advice_generator import get_aesthetic_improvement_advice
+                        
+                        aesthetic_result = get_image_aesthetic_score(file_path)
+                        
+                        st.subheader("🎨 照片美学评分")
+                        
+                        # Show the aesthetic score
+                        aesthetic_score = aesthetic_result['aesthetic_score']
+                        st.metric("美学评分", f"{aesthetic_score}/100", f"{aesthetic_score - 75}")
+                        
+                        # Show the aesthetic dimensions
+                        st.subheader("美学维度分析")
+                        col_comp, col_color, col_light, col_focus = st.columns(4)
+                        
+                        with col_comp:
+                            st.metric("构图", aesthetic_result['composition'])
+                        with col_color:
+                            st.metric("色彩和谐", aesthetic_result['color_harmony'])
+                        with col_light:
+                            st.metric("光线", aesthetic_result['lighting'])
+                        with col_focus:
+                            st.metric("对焦", aesthetic_result['focus'])
+                        
+                        # Generate detailed aesthetic advice
+                        advice_result = get_aesthetic_improvement_advice(aesthetic_score)
+                        
+                        # Show the general aesthetic advice
+                        st.subheader("💡 总体美学建议")
+                        for suggestion in advice_result.get('general', []):
+                            st.info(f"🎨 {suggestion}")
+                        
+                        # Show the specific aesthetic advice
+                        st.subheader("🔧 具体美学改进")
+                        
+                        if advice_result.get('composition'):
+                            with st.expander("构图改进建议"):
+                                for i, suggestion in enumerate(advice_result['composition'], 1):
+                                    st.write(f"{i}. {suggestion}")
+                        
+                        if advice_result.get('lighting'):
+                            with st.expander("用光改进建议"):
+                                for i, suggestion in enumerate(advice_result['lighting'], 1):
+                                    st.write(f"{i}. {suggestion}")
+                        
+                        if advice_result.get('subject'):
+                            with st.expander("主体表现建议"):
+                                for i, suggestion in enumerate(advice_result['subject'], 1):
+                                    st.write(f"{i}. {suggestion}")
+                        
+                        # 显示创意建议
+                        if advice_result.get('creative'):
+                            st.subheader("🌟 创意提升建议")
+                            for suggestion in advice_result['creative']:
+                                st.success(f"✨ {suggestion}")
+                        
+                        # 显示个性化建议
+                        st.subheader("📋 个性化学习计划")
+                        from utils.photo_advice_generator import get_personalized_advice
+                        
+                        # 获取照片内容用于个性化建议
+                        from utils.baidu_image_analysis import analyze_image_content
+                        content_result = analyze_image_content(file_path)
+                        photo_content = content_result.get('summary', '一般照片')
+                        
+                        # 生成质量分数用于个性化建议
+                        from utils.baidu_image_analysis import analyze_image_quality
+                        quality_result = analyze_image_quality(file_path)
+                        quality_scores = {dim: info['score'] for dim, info in quality_result['dimensions'].items()}
+                        
+                        personalized_advice = get_personalized_advice(quality_scores, aesthetic_score, photo_content)
+                        
+                        for category, suggestions in personalized_advice.items():
+                            if suggestions:
+                                with st.expander(f"{category}"):
+                                    for i, suggestion in enumerate(suggestions, 1):
+                                        st.write(f"{i}. {suggestion}")
+                        
+                        st.success("照片美学评估完成！已生成详细改进建议")
+                    except Exception as e:
+                        st.error(f"美学评分失败: {str(e)}")
+        
+        with col4:
+            if st.button("AI写文案", use_container_width=True, disabled=not copywriter_available):
+                with st.spinner("正在生成创意文案..."):
+                    try:
+                        # 先进行内容分析获取照片描述
+                        from utils.baidu_image_analysis import analyze_image_content
+                        content_result = analyze_image_content(file_path)
+                        
+                        # 使用AI生成文案
+                        from utils.ai_copywriter import generate_multiple_captions, analyze_photo_suitability
+                        
+                        # 获取照片描述
+                        image_description = content_result.get('summary', '一张美丽的照片')
+                        
+                        # 分析适合的文案风格
+                        suitability_result = analyze_photo_suitability(image_description)
+                        
+                        st.subheader("✍️ AI创意文案生成")
+                        
+                        # 显示照片描述
+                        st.write(f"**照片描述**: {image_description}")
+                        
+                        # 显示推荐风格
+                        st.write(f"**推荐风格**: {', '.join(suitability_result['recommended_styles'][:3])}")
+                        
+                        # 生成多个文案选项
+                        captions = generate_multiple_captions(image_description, count=3, style=suitability_result['most_suitable'])
+                        
+                        st.subheader("📝 文案选项")
+                        
+                        for caption_info in captions:
+                            with st.expander(f"选项 {caption_info['option']} ({caption_info.get('length', '适中')} - {caption_info['char_count']}字)"):
+                                st.write(caption_info['caption'])
+                                
+                                # st.code renders its own copy icon, so no extra button is needed.
+                                # A nested st.button would never fire here: clicking it reruns
+                                # the script, which resets the outer "AI写文案" button.
+                                st.code(caption_info['caption'], language='text')
+                        
+                        st.subheader("🎨 文案风格选择")
+                        
+                        # 风格选择
+                        selected_style = st.selectbox(
+                            "选择文案风格",
+                            ['creative', 'social', 'professional', 'marketing', 'emotional', 'simple'],
+                            format_func=lambda x: {
+                                'creative': '创意文艺',
+                                'social': '社交媒体',
+                                'professional': '专业正式',
+                                'marketing': '营销推广',
+                                'emotional': '情感表达',
+                                'simple': '简单描述'
+                            }[x]
+                        )
+                        
+                        # 长度选择
+                        selected_length = st.selectbox(
+                            "选择文案长度",
+                            ['short', 'medium', 'long'],
+                            format_func=lambda x: {
+                                'short': '简短精炼',
+                                'medium': '适中长度',
+                                'long': '详细描述'
+                            }[x]
+                        )
+                        
+                        if st.button("重新生成文案", use_container_width=True):
+                            with st.spinner("正在重新生成文案..."):
+                                new_caption = generate_photo_caption(image_description, selected_style, selected_length)
+                                st.subheader("🆕 新生成文案")
+                                st.write(new_caption)
+                                st.success("新文案生成完成！")
+                        
+                        st.success("AI文案生成完成！")
+                    except Exception as e:
+                        st.error(f"AI文案生成失败: {str(e)}")
+        
+        # 显示图片预览
+        st.subheader("📷 照片预览")
+        st.image(uploaded_file, caption="上传的照片", use_column_width=True)
+
+# 图片OCR页面
+elif page == "🖼️ 图片OCR":
+    st.header("🖼️ 图片文字识别 (OCR)")
+    
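+    # Check whether Tesseract OCR is available
+    try:
+        import pytesseract
+        # Fails if the Tesseract binary cannot be run
+        pytesseract.get_tesseract_version()
+        tesseract_available = True
+    except (Exception, SystemExit):  # pytesseract raises SystemExit on an unparsable version string
+        tesseract_available = False
+    
+    # Check whether the Aliyun AI OCR is configured
+    try:
+        from utils.aliyun_ocr import check_aliyun_config
+        ai_available, ai_message = check_aliyun_config()
+    except Exception:
+        ai_available = False
+        ai_message = "阿里云OCR未配置"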
+    
+    # 显示OCR状态
+    col_status1, col_status2 = st.columns(2)
+    with col_status1:
+        if tesseract_available:
+            st.success("✅ Tesseract OCR可用")
+        else:
+            st.warning("⚠️ Tesseract OCR未安装")
+    
+    with col_status2:
+        if ai_available:
+            st.success("✅ AI大模型OCR可用")
+        else:
+            st.warning(f"⚠️ AI大模型OCR: {ai_message}")
+    
+    # OCR模式选择
+    ocr_mode = st.radio("选择OCR模式", 
+                       ["传统OCR (Tesseract)", "AI大模型OCR (阿里云)"], 
+                       disabled=not (tesseract_available or ai_available))
+    
+    if not tesseract_available and not ai_available:
+        st.info("""
+        **OCR功能配置说明:**
+        
+        **传统OCR (推荐免费):**
+        1. 下载Tesseract OCR: https://github.com/UB-Mannheim/tesseract/wiki
+        2. 安装到默认路径并添加到PATH
+        
+        **AI大模型OCR (高精度):**
+        1. 注册阿里云账号: https://www.aliyun.com
+        2. 开通OCR服务并获取AccessKey
+        3. 在.env文件中配置ALIYUN_ACCESS_KEY_ID和ALIYUN_ACCESS_KEY_SECRET
+        """)
+    
+    uploaded_file = st.file_uploader("选择图片文件", type=['jpg', 'jpeg', 'png', 'gif', 'bmp'])
+    
+    if uploaded_file is not None:
+        file_path = save_uploaded_file(uploaded_file, 'image')
+        
+        # 根据选择的模式启用/禁用按钮
+        use_ai = ocr_mode == "AI大模型OCR (阿里云)"
+        button_disabled = (use_ai and not ai_available) or (not use_ai and not tesseract_available)
+        
+        col1, col2, col3 = st.columns(3)
+        
+        with col1:
+            if st.button("识别文字", use_container_width=True, disabled=button_disabled):
+                with st.spinner("正在识别文字..."):
+                    try:
+                        if use_ai:
+                            text = extract_text_from_image(file_path, use_ai=True, ai_provider='aliyun')
+                        else:
+                            text = extract_text_from_image(file_path)
+                        
+                        st.subheader("识别的文字内容")
+                        st.text_area("文字内容", text, height=300)
+                        st.success("文字识别完成！")
+                    except Exception as e:
+                        st.error(f"识别失败: {str(e)}")
+        
+        with col2:
+            if st.button("导出为Excel", use_container_width=True, disabled=button_disabled):
+                with st.spinner("正在转换为Excel..."):
+                    try:
+                        output_path = file_path.rsplit('.', 1)[0] + '_converted.xlsx'
+                        if use_ai:
+                            # 使用AI OCR导出到Excel
+                            from utils.ocr_processor import extract_text_with_ai
+                            text = extract_text_with_ai(file_path, 'aliyun', 'general')
+                            import pandas as pd
+                            lines = [line.strip() for line in text.split('\n') if line.strip()]
+                            df = pd.DataFrame({
+                                '行号': range(1, len(lines) + 1),
+                                '内容': lines
+                            })
+                            df.to_excel(output_path, index=False)
+                        else:
+                            image_to_excel(file_path, output_path)
+                        
+                        with open(output_path, "rb") as file:
+                            st.download_button(
+                                label="下载Excel文件",
+                                data=file,
+                                file_name=Path(output_path).name,
+                                mime="application/vnd.openxmlformats-officedocument.spreadsheetml.sheet"
+                            )
+                        st.success("图片转换完成！")
+                    except Exception as e:
+                        st.error(f"转换失败: {str(e)}")
+        
+        with col3:
+            if st.button("导出为文本", use_container_width=True, disabled=button_disabled):
+                with st.spinner("正在转换为文本..."):
+                    try:
+                        output_path = file_path.rsplit('.', 1)[0] + '_converted.txt'
+                        if use_ai:
+                            # 使用AI OCR导出到文本
+                            from utils.ocr_processor import extract_text_with_ai
+                            text = extract_text_with_ai(file_path, 'aliyun', 'general')
+                            with open(output_path, 'w', encoding='utf-8') as f:
+                                f.write(text)
+                        else:
+                            image_to_text_file(file_path, output_path)
+                        
+                        with open(output_path, "rb") as file:
+                            st.download_button(
+                                label="下载文本文件",
+                                data=file,
+                                file_name=Path(output_path).name,
+                                mime="text/plain"
+                            )
+                        st.success("图片转换完成！")
+                    except Exception as e:
+                        st.error(f"转换失败: {str(e)}")
+        
+        # 显示图片预览
+        st.subheader("图片预览")
+        st.image(uploaded_file, caption="上传的图片", use_column_width=True)
+        
+        # 显示OCR模式信息
+        st.info(f"当前使用: {ocr_mode}")
+
+# 格式转换页面
+elif page == "🔄 格式转换":
+    st.header("🔄 文件格式转换")
+    
+    uploaded_file = st.file_uploader("选择文件", type=['xlsx', 'xls', 'csv', 'json'])
+    
+    if uploaded_file is not None:
+        file_path = save_uploaded_file(uploaded_file, 'format')
+        file_ext = Path(uploaded_file.name).suffix.lower()
+        
+        # 根据文件类型显示可转换的格式
+        if file_ext in ['.xlsx', '.xls']:
+            target_format = st.selectbox("转换为", ["CSV", "JSON"])
+        elif file_ext == '.csv':
+            target_format = st.selectbox("转换为", ["Excel", "JSON"])
+        elif file_ext == '.json':
+            target_format = st.selectbox("转换为", ["Excel", "CSV"])
+        
+        if st.button("开始转换", use_container_width=True):
+            with st.spinner("正在转换格式..."):
+                try:
+                    if file_ext in ['.xlsx', '.xls'] and target_format == "CSV":
+                        output_path = str(Path(file_path).with_suffix('.csv'))
+                        excel_to_csv(file_path, output_path)
+                        mime_type = "text/csv"
+                    elif file_ext in ['.xlsx', '.xls'] and target_format == "JSON":
+                        output_path = str(Path(file_path).with_suffix('.json'))
+                        excel_to_json(file_path, output_path)
+                        mime_type = "application/json"
+                    elif file_ext == '.csv' and target_format == "Excel":
+                        output_path = str(Path(file_path).with_suffix('.xlsx'))
+                        csv_to_excel(file_path, output_path)
+                        mime_type = "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet"
+                    elif file_ext == '.csv' and target_format == "JSON":
+                        output_path = str(Path(file_path).with_suffix('.json'))
+                        csv_to_json(file_path, output_path)
+                        mime_type = "application/json"
+                    elif file_ext == '.json' and target_format == "Excel":
+                        output_path = str(Path(file_path).with_suffix('.xlsx'))
+                        json_to_excel(file_path, output_path)
+                        mime_type = "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet"
+                    elif file_ext == '.json' and target_format == "CSV":
+                        output_path = str(Path(file_path).with_suffix('.csv'))
+                        json_to_csv(file_path, output_path)
+                        mime_type = "text/csv"
+                    
+                    with open(output_path, "rb") as file:
+                        st.download_button(
+                            label=f"下载{target_format}文件",
+                            data=file,
+                            file_name=Path(output_path).name,
+                            mime=mime_type
+                        )
+                    st.success("格式转换完成！")
+                except Exception as e:
+                    st.error(f"转换失败: {str(e)}")
+
+# 网页抓取页面
+elif page == "🌐 网页抓取":
+    st.header("🌐 网页数据抓取")
+    
+    url = st.text_input("网页URL", placeholder="https://example.com")
+    selector = st.text_input("CSS选择器 (可选)", placeholder="例如: .content, #main, p")
+    
+    col1, col2 = st.columns(2)
+    
+    with col1:
+        if st.button("抓取内容", use_container_width=True):
+            if not url:
+                st.error("请输入网页URL")
+            else:
+                with st.spinner("正在抓取网页内容..."):
+                    try:
+                        content = scrape_webpage(url, selector if selector else None)
+                        st.subheader("抓取的内容")
+                        st.text_area("网页内容", content, height=300)
+                        st.success("网页抓取完成！")
+                    except Exception as e:
+                        st.error(f"抓取失败: {str(e)}")
+    
+    with col2:
+        if st.button("导出为Excel", use_container_width=True):
+            if not url:
+                st.error("请输入网页URL")
+            else:
+                with st.spinner("正在导出为Excel..."):
+                    try:
+                        output_filename = f"web_content_{uuid.uuid4().hex[:8]}.xlsx"
+                        output_path = os.path.join(tempfile.gettempdir(), output_filename)
+                        
+                        web_to_excel(url, output_path, selector if selector else None)
+                        
+                        with open(output_path, "rb") as file:
+                            st.download_button(
+                                label="下载Excel文件",
+                                data=file,
+                                file_name=output_filename,
+                                mime="application/vnd.openxmlformats-officedocument.spreadsheetml.sheet"
+                            )
+                        st.success("网页导出完成！")
+                    except Exception as e:
+                        st.error(f"导出失败: {str(e)}")
+
+# 数据库导出页面
+elif page == "🗄️ 数据库导出":
+    st.header("🗄️ 数据库导出")
+    
+    uploaded_file = st.file_uploader("选择数据库文件", type=['db', 'sqlite', 'mdf'])
+    table_name = st.text_input("表名 (可选)", placeholder="留空则导出所有表")
+    
+    if uploaded_file is not None:
+        file_path = save_uploaded_file(uploaded_file, 'database')
+        
+        target_format = st.selectbox("导出为", ["Excel", "CSV", "JSON"])
+        
+        if st.button("开始导出", use_container_width=True):
+            with st.spinner("正在导出数据库..."):
+                try:
+                    file_ext = Path(file_path).suffix.lower()
+                    continue_processing = True  # 默认继续处理
+                    
+                    if file_ext in ['.db', '.sqlite']:
+                        if target_format == "Excel":
+                            output_path = str(Path(file_path).with_suffix('')) + '_exported.xlsx'
+                            export_sqlite_to_excel(file_path, output_path, table_name if table_name else None)
+                            mime_type = "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet"
+                        elif target_format == "CSV":
+                            output_path = str(Path(file_path).with_suffix('')) + '_exported.csv'
+                            database_to_csv(file_path, output_path, table_name if table_name else None)
+                            mime_type = "text/csv"
+                        elif target_format == "JSON":
+                            output_path = str(Path(file_path).with_suffix('')) + '_exported.json'
+                            database_to_json(file_path, output_path, table_name if table_name else None)
+                            mime_type = "application/json"
+                    elif file_ext == '.mdf':
+                        # MDF文件处理
+                        try:
+                            import pyodbc
+                            # 测试SQL Server连接
+                            test_conn = pyodbc.connect("DRIVER={SQL Server};SERVER=localhost;Trusted_Connection=yes;timeout=3")
+                            test_conn.close()
+                            sql_server_available = True
+                        except Exception:
+                            sql_server_available = False
+                            st.warning("⚠️ SQL Server未运行或无法连接")
+                            st.info("""
+                            **MDF文件导出需要SQL Server支持:**
+                            
+                            1. **安装SQL Server Express** (免费)
+                               - 下载: https://www.microsoft.com/en-us/sql-server/sql-server-downloads
+                            
+                            2. **确保SQL Server服务运行**
+                               - 打开"服务"管理器 (services.msc)
+                               - 启动"SQL Server (MSSQLSERVER)"服务
+                            
+                            3. **配置连接权限**
+                               - 使用Windows身份验证或配置sa密码
+                            
+                            安装完成后重启应用即可使用MDF导出功能。
+                            """)
+                            continue_processing = False  # skip the download step so no second error is shown
+                        
+                        if sql_server_available:
+                            if target_format == "Excel":
+                                output_path = str(Path(file_path).with_suffix('')) + '_exported.xlsx'
+                                from utils.database_exporter import export_mssql_mdf_to_excel
+                                export_mssql_mdf_to_excel(file_path, output_path, table_name if table_name else None)
+                                mime_type = "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet"
+                            elif target_format == "CSV":
+                                output_path = str(Path(file_path).with_suffix('')) + '_exported.csv'
+                                database_to_csv(file_path, output_path, table_name if table_name else None)
+                                mime_type = "text/csv"
+                            elif target_format == "JSON":
+                                output_path = str(Path(file_path).with_suffix('')) + '_exported.json'
+                                database_to_json(file_path, output_path, table_name if table_name else None)
+                                mime_type = "application/json"
+                    else:
+                        st.error("不支持的数据库格式")
+                        # 不执行后续操作
+                        continue_processing = False
+                    
+                    # 只有在成功处理时才执行下载操作
+                    if continue_processing and 'output_path' in locals() and os.path.exists(output_path):
+                        with open(output_path, "rb") as file:
+                            st.download_button(
+                                label=f"下载{target_format}文件",
+                                data=file,
+                                file_name=Path(output_path).name,
+                                mime=mime_type
+                            )
+                        st.success("数据库导出完成！")
+                    elif not continue_processing:
+                        # 不支持的格式，不显示下载按钮
+                        pass
+                    else:
+                        st.error("导出文件创建失败")
+                except Exception as e:
+                    st.error(f"导出失败: {str(e)}")
+
+# 页脚信息
+st.sidebar.markdown("---")
+st.sidebar.markdown("""
+### 使用说明
+1. 选择功能模块
+2. 上传文件或输入URL
+3. 点击相应按钮处理
+4. 下载处理结果
+
+### 支持格式
+- **PDF**: .pdf
+- **图片**: .jpg, .jpeg, .png, .gif, .bmp
+- **数据文件**: .xlsx, .xls, .csv, .json
+- **数据库**: .db, .sqlite, .mdf
+""")
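Several handlers in this file build `output_path` by swapping the file extension. A minimal sketch of doing that with `pathlib` follows, assuming a hypothetical helper named `derive_output_path` that is not part of this commit. It touches only the final suffix, so an upper-case extension or a directory name that contains the extension does not change the result.

```python
from pathlib import Path


def derive_output_path(input_path: str, new_suffix: str, tag: str = "") -> str:
    """Return input_path with its final suffix replaced by tag + new_suffix.

    Hypothetical helper for illustration only. Unlike str.replace, it never
    rewrites an earlier part of the path and does not depend on letter case.
    """
    path = Path(input_path)
    # path.stem is the file name without its last suffix
    return str(path.with_name(path.stem + tag + new_suffix))
```

With this helper, an upload saved as `report.CSV` would map to `report.xlsx`, and `data.db` with `tag="_exported"` would map to `data_exported.xlsx`.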
--- a/app_flask.py
+++ b/app_flask.py
@ -0,0 +1,241 @@
+from flask import Flask, render_template, request, jsonify, send_file, redirect, url_for
+import os
+import uuid
+from werkzeug.utils import secure_filename
+from config import Config
+
+# Import the utility modules
+from utils.pdf_extractor import extract_text_from_pdf, pdf_to_excel
+from utils.ocr_processor import extract_text_from_image, image_to_excel, image_to_text_file
+from utils.format_converter import (
+    excel_to_csv, csv_to_excel, json_to_excel, 
+    excel_to_json, csv_to_json, json_to_csv
+)
+from utils.web_scraper import scrape_webpage, web_to_excel
+from utils.database_exporter import export_sqlite_to_excel, database_to_csv, database_to_json
+
+app = Flask(__name__)
+app.config.from_object(Config)
+
+# Make sure the upload directory exists
+os.makedirs(app.config['UPLOAD_FOLDER'], exist_ok=True)
+
+def allowed_file(filename):
+    """Check whether the file extension is allowed"""
+    return '.' in filename and \
+           filename.rsplit('.', 1)[1].lower() in app.config['ALLOWED_EXTENSIONS']
+
+@app.route('/')
+def index():
+    """Home page"""
+    return render_template('index.html')
+
+@app.route('/upload', methods=['POST'])
+def upload_file():
+    """Handle file uploads"""
+    if 'file' not in request.files:
+        return jsonify({'error': '没有选择文件'}), 400
+    
+    file = request.files['file']
+    if file.filename == '':
+        return jsonify({'error': '没有选择文件'}), 400
+    
+    if file and allowed_file(file.filename):
+        filename = secure_filename(file.filename)
+        filepath = os.path.join(app.config['UPLOAD_FOLDER'], f"{uuid.uuid4()}_{filename}")
+        file.save(filepath)
+        
+        return jsonify({
+            'success': True,
+            'filename': filename,
+            'filepath': filepath,
+            'file_type': filename.rsplit('.', 1)[1].lower()
+        })
+    
+    return jsonify({'error': '不支持的文件类型'}), 400
+
+@app.route('/process/pdf', methods=['POST'])
+def process_pdf():
+    """Process a PDF file"""
+    try:
+        data = request.json
+        filepath = data.get('filepath')
+        action = data.get('action', 'extract')  # extract, to_excel
+        
+        if not filepath or not os.path.exists(filepath):
+            return jsonify({'error': '文件不存在'}), 400
+        
+        if action == 'extract':
+            text = extract_text_from_pdf(filepath)
+            return jsonify({'success': True, 'text': text})
+        
+        elif action == 'to_excel':
+            output_path = filepath.rsplit('.', 1)[0] + '_converted.xlsx'
+            pdf_to_excel(filepath, output_path)
+            return jsonify({
+                'success': True, 
+                'download_url': f'/download/{os.path.basename(output_path)}'
+            })
+        
+        else:
+            return jsonify({'error': '不支持的操作'}), 400
+            
+    except Exception as e:
+        return jsonify({'error': str(e)}), 500
+
+@app.route('/process/image', methods=['POST'])
+def process_image():
+    """Process an image file"""
+    try:
+        data = request.json
+        filepath = data.get('filepath')
+        action = data.get('action', 'extract')  # extract, to_excel, to_text
+        
+        if not filepath or not os.path.exists(filepath):
+            return jsonify({'error': '文件不存在'}), 400
+        
+        if action == 'extract':
+            text = extract_text_from_image(filepath)
+            return jsonify({'success': True, 'text': text})
+        
+        elif action == 'to_excel':
+            output_path = filepath.rsplit('.', 1)[0] + '_converted.xlsx'
+            image_to_excel(filepath, output_path)
+            return jsonify({
+                'success': True, 
+                'download_url': f'/download/{os.path.basename(output_path)}'
+            })
+        
+        elif action == 'to_text':
+            output_path = filepath.rsplit('.', 1)[0] + '_converted.txt'
+            image_to_text_file(filepath, output_path)
+            return jsonify({
+                'success': True, 
+                'download_url': f'/download/{os.path.basename(output_path)}'
+            })
+        
+        else:
+            return jsonify({'error': '不支持的操作'}), 400
+            
+    except Exception as e:
+        return jsonify({'error': str(e)}), 500
+
+@app.route('/process/format', methods=['POST'])
+def process_format():
+    """Convert a file between formats"""
+    try:
+        data = request.json
+        filepath = data.get('filepath')
+        target_format = data.get('target_format')  # excel, csv, json
+        
+        if not filepath or not os.path.exists(filepath):
+            return jsonify({'error': '文件不存在'}), 400
+        
+        file_ext = filepath.rsplit('.', 1)[1].lower()
+        
+        # Pick the converter from the source and target formats
+        if file_ext in ('xlsx', 'xls') and target_format == 'csv':
+            output_path = filepath.rsplit('.', 1)[0] + '.csv'
+            excel_to_csv(filepath, output_path)
+        elif file_ext == 'csv' and target_format == 'excel':
+            output_path = filepath.rsplit('.', 1)[0] + '.xlsx'
+            csv_to_excel(filepath, output_path)
+        elif file_ext == 'json' and target_format == 'excel':
+            output_path = filepath.rsplit('.', 1)[0] + '.xlsx'
+            json_to_excel(filepath, output_path)
+        elif file_ext in ('xlsx', 'xls') and target_format == 'json':
+            output_path = filepath.rsplit('.', 1)[0] + '.json'
+            excel_to_json(filepath, output_path)
+        elif file_ext == 'csv' and target_format == 'json':
+            output_path = filepath.rsplit('.', 1)[0] + '.json'
+            csv_to_json(filepath, output_path)
+        elif file_ext == 'json' and target_format == 'csv':
+            output_path = filepath.rsplit('.', 1)[0] + '.csv'
+            json_to_csv(filepath, output_path)
+        else:
+            return jsonify({'error': '不支持的格式转换'}), 400
+        
+        return jsonify({
+            'success': True, 
+            'download_url': f'/download/{os.path.basename(output_path)}'
+        })
+            
+    except Exception as e:
+        return jsonify({'error': str(e)}), 500
+
+@app.route('/process/web', methods=['POST'])
+def process_web():
+    """Scrape a web page"""
+    try:
+        data = request.json
+        url = data.get('url')
+        selector = data.get('selector', '')
+        
+        if not url:
+            return jsonify({'error': '请输入URL'}), 400
+        
+        # Scrape the page content
+        content = scrape_webpage(url, selector if selector else None)
+        
+        # Export to Excel
+        output_filename = f"web_content_{uuid.uuid4().hex[:8]}.xlsx"
+        output_path = os.path.join(app.config['UPLOAD_FOLDER'], output_filename)
+        
+        web_to_excel(url, output_path, selector if selector else None)
+        
+        return jsonify({
+            'success': True,
+            'content': content if isinstance(content, str) else '内容已提取',
+            'download_url': f'/download/{output_filename}'
+        })
+            
+    except Exception as e:
+        return jsonify({'error': str(e)}), 500
+
+@app.route('/process/database', methods=['POST'])
+def process_database():
+    """Export a database file"""
+    try:
+        data = request.json
+        filepath = data.get('filepath')
+        target_format = data.get('target_format', 'excel')  # excel, csv, json
+        table_name = data.get('table_name', '')  # optional: export a single table
+        
+        if not filepath or not os.path.exists(filepath):
+            return jsonify({'error': '文件不存在'}), 400
+        
+        file_ext = filepath.rsplit('.', 1)[1].lower()
+        
+        if file_ext in ['db', 'sqlite']:
+            if target_format == 'excel':
+                output_path = filepath.rsplit('.', 1)[0] + '_exported.xlsx'
+                export_sqlite_to_excel(filepath, output_path, table_name or None)
+            elif target_format == 'csv':
+                output_path = filepath.rsplit('.', 1)[0] + '_exported.csv'
+                database_to_csv(filepath, output_path, table_name or None)
+            elif target_format == 'json':
+                output_path = filepath.rsplit('.', 1)[0] + '_exported.json'
+                database_to_json(filepath, output_path, table_name or None)
+            else:
+                return jsonify({'error': '不支持的导出格式'}), 400
+        else:
+            return jsonify({'error': '不支持的数据库格式'}), 400
+        
+        return jsonify({
+            'success': True, 
+            'download_url': f'/download/{os.path.basename(output_path)}'
+        })
+            
+    except Exception as e:
+        return jsonify({'error': str(e)}), 500
+
+@app.route('/download/<filename>')
+def download_file(filename):
+    """Download a file; secure_filename keeps the requested name inside UPLOAD_FOLDER"""
+    filepath = os.path.join(app.config['UPLOAD_FOLDER'], secure_filename(filename))
+    if os.path.isfile(filepath):
+        return send_file(filepath, as_attachment=True)
+    return jsonify({'error': '文件不存在'}), 404
+
+if __name__ == '__main__':
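+    # Debug mode exposes the interactive debugger, so take it (and host/port) from the environment
+    app.run(debug=os.getenv('DEBUG', 'false').lower() == 'true', host=os.getenv('HOST', '0.0.0.0'), port=int(os.getenv('PORT', '5000')))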
--- a/config.py
+++ b/config.py
@ -0,0 +1,26 @@
+import os
+from dotenv import load_dotenv
+
+load_dotenv()
+
+class Config:
+    SECRET_KEY = os.getenv('SECRET_KEY', 'dev-secret-key')
+    UPLOAD_FOLDER = 'uploads'
+    MAX_CONTENT_LENGTH = 16 * 1024 * 1024  # 16MB max file size
+    
+    # OCR settings
+    TESSERACT_PATH = os.getenv('TESSERACT_PATH', '')
+    
+    # Database settings
+    DATABASE_URI = os.getenv('DATABASE_URI', 'sqlite:///data.db')
+    
+    # Web scraping settings
+    USER_AGENT = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36'
+    
+    # Supported file types
+    ALLOWED_EXTENSIONS = {
+        'pdf', 'txt', 'doc', 'docx',
+        'jpg', 'jpeg', 'png', 'gif', 'bmp',
+        'xlsx', 'xls', 'csv', 'json',
+        'db', 'sqlite'
+    }
--- a/diagnose_ocr.py
+++ b/diagnose_ocr.py
@ -0,0 +1,253 @@
+#!/usr/bin/env python3
+"""
+OCR diagnostic script.
+Checks the installation and configuration status of Tesseract OCR.
+"""
+
+import os
+import sys
+import tempfile
+from pathlib import Path
+
+def check_tesseract_installation():
+    """Check whether Tesseract OCR is installed."""
+    print("🔍 检查Tesseract OCR安装状态...")
+    
+    # Common Tesseract install paths on Windows
+    possible_paths = [
+        r"C:\Program Files\Tesseract-OCR\tesseract.exe",
+        r"C:\Program Files (x86)\Tesseract-OCR\tesseract.exe",
+        r"D:\Program Files\Tesseract-OCR\tesseract.exe",
+        r"D:\Program Files (x86)\Tesseract-OCR\tesseract.exe"
+    ]
+    
+    tesseract_path = None
+    for path in possible_paths:
+        if os.path.exists(path):
+            tesseract_path = path
+            print(f"✅ Tesseract找到: {path}")
+            break
+    
+    if not tesseract_path:
+        print("❌ Tesseract未找到在默认路径")
+        
+        # Fall back to searching the system PATH
+        import shutil
+        tesseract_cmd = shutil.which("tesseract")
+        if tesseract_cmd:
+            print(f"✅ Tesseract在PATH中找到: {tesseract_cmd}")
+            tesseract_path = tesseract_cmd
+        else:
+            print("❌ Tesseract未在系统PATH中找到")
+    
+    return tesseract_path
+
+def check_python_dependencies():
+    """Check the Python packages that the OCR feature depends on."""
+    print("\n🐍 检查Python依赖...")
+    
+    dependencies = ["pytesseract", "PIL", "pandas"]
+    
+    for dep in dependencies:
+        try:
+            if dep == "PIL":
+                import PIL
+                print(f"✅ {dep}: {PIL.__version__}")
+            elif dep == "pytesseract":
+                import pytesseract
+                print(f"✅ {dep}: 已安装")
+            elif dep == "pandas":
+                import pandas
+                print(f"✅ {dep}: {pandas.__version__}")
+        except ImportError as e:
+            print(f"❌ {dep}: 未安装 - {e}")
+
+def create_test_image():
+    """Create a test image containing known text."""
+    print("\n🖼️ 创建测试图片...")
+    
+    try:
+        from PIL import Image, ImageDraw, ImageFont
+        
+        # Create a blank white canvas
+        img = Image.new('RGB', (400, 200), color='white')
+        d = ImageDraw.Draw(img)
+        
+        # Try several fonts, CJK-capable ones first so the Chinese test line renders as glyphs
+        fonts_to_try = [
+            "simhei.ttf",  # SimHei
+            "msyh.ttc",    # Microsoft YaHei
+            "C:\\Windows\\Fonts\\simhei.ttf",
+            "arial.ttf",
+            "Arial.ttf", 
+            "C:\\Windows\\Fonts\\arial.ttf"
+        ]
+        
+        font = None
+        for font_path in fonts_to_try:
+            try:
+                font = ImageFont.truetype(font_path, 24)
+                print(f"✅ 字体找到: {font_path}")
+                break
+            except OSError:
+                # truetype() raises OSError when the font file cannot be opened
+                continue
+        
+        if not font:
+            print("⚠️ 未找到合适字体，使用默认字体")
+            font = ImageFont.load_default()
+        
+        # Draw clear Chinese and English test text
+        text_lines = [
+            "OCR测试文字",
+            "Hello World",
+            "1234567890",
+            "ABCDEFGHIJKLMN"
+        ]
+        
+        y_position = 30
+        for line in text_lines:
+            d.text((50, y_position), line, fill="black", font=font)
+            y_position += 40
+        
+        # Save the image to the system temp directory
+        test_image_path = os.path.join(tempfile.gettempdir(), "ocr_test_image.png")
+        img.save(test_image_path, "PNG")
+        
+        print(f"✅ 测试图片已创建: {test_image_path}")
+        print(f"   图片大小: {os.path.getsize(test_image_path)} 字节")
+        
+        return test_image_path
+        
+    except Exception as e:
+        print(f"❌ 创建测试图片失败: {e}")
+        return None
+
+def test_ocr_functionality(image_path):
+    """Run OCR on the test image with several language settings."""
+    print("\n🔤 测试OCR识别功能...")
+    
+    if not image_path or not os.path.exists(image_path):
+        print("❌ 测试图片不存在")
+        return
+    
+    try:
+        import pytesseract
+        from PIL import Image
+        
+        # Point pytesseract at the Tesseract binary (if one was found)
+        tesseract_path = check_tesseract_installation()
+        if tesseract_path:
+            pytesseract.pytesseract.tesseract_cmd = tesseract_path
+        
+        # Open and inspect the image
+        image = Image.open(image_path)
+        print(f"✅ 图片格式: {image.format}, 大小: {image.size}")
+        
+        # Try OCR with different language packs
+        languages = ['eng', 'chi_sim', 'eng+chi_sim']
+        
+        for lang in languages:
+            try:
+                print(f"\n   测试语言: {lang}")
+                text = pytesseract.image_to_string(image, lang=lang)
+                
+                if text.strip():
+                    print(f"   ✅ 识别成功:")
+                    print(f"      {text.strip()}")
+                else:
+                    print(f"   ⚠️ 识别无结果")
+                    
+            except Exception as e:
+                print(f"   ❌ 语言 {lang} 识别失败: {e}")
+        
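+        # Report basic image metadata
+        print(f"\n📊 图片数据检查:")
+        print(f"   模式: {image.mode}")
+        print(f"   通道: {len(image.getbands())}")
+        
+        # verify() must run right after open, and `image` may already have been
+        # loaded by the OCR calls above, so check a freshly opened copy
+        try:
+            with Image.open(image_path) as fresh_image:
+                fresh_image.verify()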
+            print("   ✅ 图片验证通过")
+        except Exception as e:
+            print(f"   ❌ 图片验证失败: {e}")
+        
+    except Exception as e:
+        print(f"❌ OCR测试失败: {e}")
+
+def check_system_environment():
+    """Print basic information about the runtime environment."""
+    print("\n💻 检查系统环境...")
+    
+    print(f"   操作系统: {os.name}")
+    print(f"   Python版本: {sys.version}")
+    print(f"   当前目录: {os.getcwd()}")
+    print(f"   TMP目录: {tempfile.gettempdir()}")
+
+def main():
+    """Run all diagnostic steps and print suggested fixes."""
+    print("=" * 60)
+    print("OCR功能诊断工具")
+    print("=" * 60)
+    
+    # Check the system environment
+    check_system_environment()
+    
+    # Check Python dependencies
+    check_python_dependencies()
+    
+    # Check the Tesseract installation
+    tesseract_path = check_tesseract_installation()
+    
+    # Create the test image
+    test_image_path = create_test_image()
+    
+    # Test the OCR functionality
+    if test_image_path:
+        test_ocr_functionality(test_image_path)
+    
+    # Print suggested fixes
+    print("\n" + "=" * 60)
+    print("💡 解决方案建议")
+    print("=" * 60)
+    
+    if not tesseract_path:
+        print("""
+🔧 Tesseract OCR未安装，请按以下步骤安装：
+
+1. 下载Tesseract OCR:
+   - 官方地址: https://github.com/UB-Mannheim/tesseract/wiki
+   - 选择Windows版本下载
+
+2. 安装步骤:
+   - 运行安装程序
+   - 安装到默认路径: C:\\Program Files\\Tesseract-OCR\\
+   - 安装时勾选"Add to PATH"选项
+   - 安装中文语言包（可选但推荐）
+
+3. 验证安装:
+   - 重新启动命令行
+   - 运行: tesseract --version
+   - 应该显示版本信息
+""")
+    else:
+        print("""
+✅ Tesseract已安装，问题可能在于：
+
+1. 图片格式问题
+   - 确保上传的图片格式正确（PNG, JPG等）
+   - 图片包含清晰可读的文字
+
+2. 语言包问题
+   - 确保安装了中文语言包（chi_sim）
+   - 可以尝试只使用英文识别
+
+3. 权限问题
+   - 确保应用有权限访问临时文件
+""")
+    
+    print("\n🔄 临时解决方案:")
+    print("   在应用中暂时禁用OCR功能，或使用在线OCR服务")
+
+if __name__ == "__main__":
+    main()
--- a/pyproject.toml
+++ b/pyproject.toml
@ -0,0 +1,23 @@
+[project]
+name = "data-extractor-converter"
+version = "1.0.0"
+description = "数据提取与转换器 - 专为大学生开发的多功能数据处理工具"
+requires-python = ">=3.8"
+dependencies = [
+    "flask",
+    "streamlit>=1.28.0",
+    "pandas>=2.0.3",
+    "requests>=2.31.0",
+    "beautifulsoup4>=4.12.2",
+    "pymupdf>=1.23.7",
+    "pytesseract>=0.3.10",
+    "pillow>=10.0.0",
+    "openpyxl>=3.1.2",
+    "sqlalchemy>=2.0.20",
+    "pymysql>=1.1.0",
+    "python-dotenv>=1.0.0",
+    "pyodbc>=4.0.0",
+    "alibabacloud-ocr-api20210707>=1.0.2",
+    "alibabacloud-tea-openapi>=0.3.6",
+    "alibabacloud-tea-util>=0.3.8",
+    "aiohttp>=3.8.0",
+]
--- a/run.py
+++ b/run.py
@ -0,0 +1,64 @@
+#!/usr/bin/env python3
+"""
+Data Extractor and Converter - launch script.
+A multi-purpose data processing tool built for university students.
+"""
+
+import os
+import sys
+
+def check_dependencies():
+    """Check that the required packages are installed."""
+    try:
+        import flask
+        import pandas
+        import requests
+        import fitz  # PyMuPDF
+        import pytesseract
+        import sqlalchemy
+        print("✓ 所有依赖包已安装")
+        return True
+    except ImportError as e:
+        print(f"✗ 缺少依赖包: {e}")
+        print("请运行: pip install -r requirements.txt")
+        return False
+
+def create_upload_directories():
+    """Create the directories the app needs at runtime."""
+    directories = ['uploads', 'static', 'templates']
+    
+    for directory in directories:
+        os.makedirs(directory, exist_ok=True)
+    
+    print("✓ 目录结构已创建")
+
+def main():
+    """Entry point: check dependencies, prepare directories, start the server."""
+    print("=" * 50)
+    print("数据提取与转换器 - 大学生专用工具")
+    print("=" * 50)
+    
+    # Check dependencies
+    if not check_dependencies():
+        sys.exit(1)
+    
+    # Create directories
+    create_upload_directories()
+    
+    # Import the app only after the dependency check, so a missing package
+    # yields the friendly message above instead of a raw ImportError at startup
+    from app import app
+    
+    print("\n启动信息:")
+    print("- 本地访问: http://localhost:5000")
+    print("- 网络访问: http://0.0.0.0:5000")
+    print("- 停止服务: Ctrl+C")
+    print("\n" + "=" * 50)
+    
+    # Start the Flask app
+    try:
+        app.run(debug=True, host='0.0.0.0', port=5000)
+    except KeyboardInterrupt:
+        print("\n\n服务已停止")
+    except Exception as e:
+        print(f"\n\n启动失败: {e}")
+
+if __name__ == '__main__':
+    main()
--- a/static/script.js
+++ b/static/script.js
@ -0,0 +1,416 @@
+// Global state
+let currentFile = null;
+
+// Tab switching
+function openTab(tabName) {
+    // Hide all tab panels
+    const tabContents = document.getElementsByClassName('tab-content');
+    for (let i = 0; i < tabContents.length; i++) {
+        tabContents[i].classList.remove('active');
+    }
+    
+    // Clear the active state from all tab buttons
+    const tabButtons = document.getElementsByClassName('tab-button');
+    for (let i = 0; i < tabButtons.length; i++) {
+        tabButtons[i].classList.remove('active');
+    }
+    
+    // Show the selected tab panel
+    document.getElementById(tabName).classList.add('active');
+    
+    // Activate the clicked tab button
+    event.currentTarget.classList.add('active');
+    
+    // Reset the current file
+    currentFile = null;
+    clearResults();
+}
+
+// Wire up file selection and drag-and-drop for one upload area
+function setupFileUpload(inputId, uploadAreaId) {
+    const fileInput = document.getElementById(inputId);
+    const uploadArea = document.getElementById(uploadAreaId);
+    
+    fileInput.addEventListener('change', function(e) {
+        if (this.files.length > 0) {
+            handleFileUpload(this.files[0], uploadArea);
+        }
+    });
+    
+    // Drag-and-drop upload
+    uploadArea.addEventListener('dragover', function(e) {
+        e.preventDefault();
+        this.style.borderColor = '#2980b9';
+        this.style.background = '#e9ecef';
+    });
+    
+    uploadArea.addEventListener('dragleave', function(e) {
+        e.preventDefault();
+        this.style.borderColor = '#3498db';
+        this.style.background = '#f8f9fa';
+    });
+    
+    uploadArea.addEventListener('drop', function(e) {
+        e.preventDefault();
+        this.style.borderColor = '#3498db';
+        this.style.background = '#f8f9fa';
+        
+        if (e.dataTransfer.files.length > 0) {
+            handleFileUpload(e.dataTransfer.files[0], uploadArea);
+        }
+    });
+}
+
+// Upload the selected file to the server
+async function handleFileUpload(file, uploadArea) {
+    const formData = new FormData();
+    formData.append('file', file);
+    
+    showStatus('正在上传文件...', 'info');
+    
+    try {
+        const response = await fetch('/upload', {
+            method: 'POST',
+            body: formData
+        });
+        
+        const result = await response.json();
+        
+        if (result.success) {
+            currentFile = result;
+            uploadArea.innerHTML = `
+                <div style="text-align: center;">
+                    <p style="color: #27ae60; font-weight: bold;">✓ 文件上传成功</p>
+                    <p>文件名: ${result.filename}</p>
+                    <p>文件类型: ${result.file_type}</p>
+                    <button onclick="clearFile('${uploadArea.id}')" class="btn" style="background: #e74c3c; color: white; margin-top: 10px;">重新选择</button>
+                </div>
+            `;
+            showStatus('文件上传成功！', 'success');
+        } else {
+            throw new Error(result.error);
+        }
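+    } catch (error) {
+        showStatus('上传失败: ' + error.message, 'error');
+        // `fileInput` is not in scope here: derive the input id as clearFile does and recreate the input
+        const fileInputId = uploadArea.id.replace('-upload-area', '-file');
+        uploadArea.innerHTML = `
+            <input type="file" id="${fileInputId}" style="display: none;">
+            <div class="upload-placeholder" onclick="document.getElementById('${fileInputId}').click()">
+                <p>点击选择文件或拖拽文件到此处</p>
+                <p class="file-types">上传失败，请重试</p>
+            </div>
+        `;
+        document.getElementById(fileInputId).addEventListener('change', function() {
+            if (this.files.length > 0) {
+                handleFileUpload(this.files[0], uploadArea);
+            }
+        });
+    }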
+}
+
+// Reset the file selection for an upload area
+function clearFile(uploadAreaId) {
+    const uploadArea = document.getElementById(uploadAreaId);
+    const fileInputId = uploadAreaId.replace('-upload-area', '-file');
+    
+    uploadArea.innerHTML = `
+        <input type="file" id="${fileInputId}" style="display: none;">
+        <div class="upload-placeholder" onclick="document.getElementById('${fileInputId}').click()">
+            <p>点击选择文件或拖拽文件到此处</p>
+            <p class="file-types">支持格式: 根据标签页不同</p>
+        </div>
+    `;
+    
+    currentFile = null;
+    clearResults();
+    setupFileUpload(fileInputId, uploadAreaId);
+}
+
+// PDF processing
+async function processPdf(action) {
+    if (!currentFile) {
+        showStatus('请先选择PDF文件', 'error');
+        return;
+    }
+    
+    showStatus('正在处理PDF文件...', 'info');
+    
+    try {
+        const response = await fetch('/process/pdf', {
+            method: 'POST',
+            headers: {
+                'Content-Type': 'application/json'
+            },
+            body: JSON.stringify({
+                filepath: currentFile.filepath,
+                action: action
+            })
+        });
+        
+        const result = await response.json();
+        
+        if (result.success) {
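+            if (action === 'extract') {
+                const pdfResult = document.getElementById('pdf-result');
+                pdfResult.innerHTML = `
+                    <h4>提取的文本内容:</h4>
+                    <div style="max-height: 300px; overflow-y: auto; background: white; padding: 15px; border-radius: 5px;"></div>
+                `;
+                // Set extracted text via textContent so it is never parsed as HTML
+                pdfResult.querySelector('div').textContent = result.text || '未提取到文本内容';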
+            } else if (action === 'to_excel') {
+                document.getElementById('pdf-result').innerHTML = `
+                    <h4>转换成功!</h4>
+                    <p>PDF文件已成功转换为Excel格式</p>
+                    <a href="${result.download_url}" class="download-link" download>下载Excel文件</a>
+                `;
+            }
+            showStatus('PDF处理完成！', 'success');
+        } else {
+            throw new Error(result.error);
+        }
+    } catch (error) {
+        showStatus('处理失败: ' + error.message, 'error');
+    }
+}
+
+// Image (OCR) processing
+async function processImage(action) {
+    if (!currentFile) {
+        showStatus('请先选择图片文件', 'error');
+        return;
+    }
+    
+    showStatus('正在处理图片文件...', 'info');
+    
+    try {
+        const response = await fetch('/process/image', {
+            method: 'POST',
+            headers: {
+                'Content-Type': 'application/json'
+            },
+            body: JSON.stringify({
+                filepath: currentFile.filepath,
+                action: action
+            })
+        });
+        
+        const result = await response.json();
+        
+        if (result.success) {
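+            if (action === 'extract') {
+                const imageResult = document.getElementById('image-result');
+                imageResult.innerHTML = `
+                    <h4>识别的文字内容:</h4>
+                    <div style="max-height: 300px; overflow-y: auto; background: white; padding: 15px; border-radius: 5px;"></div>
+                `;
+                // Set recognized text via textContent so it is never parsed as HTML
+                imageResult.querySelector('div').textContent = result.text || '未识别到文字内容';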
+            } else {
+                const formatName = action === 'to_excel' ? 'Excel' : '文本';
+                document.getElementById('image-result').innerHTML = `
+                    <h4>转换成功!</h4>
+                    <p>图片文件已成功转换为${formatName}格式</p>
+                    <a href="${result.download_url}" class="download-link" download>下载${formatName}文件</a>
+                `;
+            }
+            showStatus('图片处理完成！', 'success');
+        } else {
+            throw new Error(result.error);
+        }
+    } catch (error) {
+        showStatus('处理失败: ' + error.message, 'error');
+    }
+}
+
+// File format conversion
+async function processFormat() {
+    if (!currentFile) {
+        showStatus('请先选择文件', 'error');
+        return;
+    }
+    
+    const targetFormat = document.getElementById('target-format').value;
+    
+    showStatus('正在转换文件格式...', 'info');
+    
+    try {
+        const response = await fetch('/process/format', {
+            method: 'POST',
+            headers: {
+                'Content-Type': 'application/json'
+            },
+            body: JSON.stringify({
+                filepath: currentFile.filepath,
+                target_format: targetFormat
+            })
+        });
+        
+        const result = await response.json();
+        
+        if (result.success) {
+            document.getElementById('format-result').innerHTML = `
+                <h4>转换成功!</h4>
+                <p>文件已成功转换为${targetFormat.toUpperCase()}格式</p>
+                <a href="${result.download_url}" class="download-link" download>下载文件</a>
+            `;
+            showStatus('格式转换完成！', 'success');
+        } else {
+            throw new Error(result.error);
+        }
+    } catch (error) {
+        showStatus('转换失败: ' + error.message, 'error');
+    }
+}
+
+// Web page scraping
+async function processWeb() {
+    const url = document.getElementById('web-url').value;
+    const selector = document.getElementById('css-selector').value;
+    
+    if (!url) {
+        showStatus('请输入网页URL', 'error');
+        return;
+    }
+    
+    showStatus('正在抓取网页内容...', 'info');
+    
+    try {
+        const response = await fetch('/process/web', {
+            method: 'POST',
+            headers: {
+                'Content-Type': 'application/json'
+            },
+            body: JSON.stringify({
+                url: url,
+                selector: selector
+            })
+        });
+        
+        const result = await response.json();
+        
+        if (result.success) {
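+            const webResult = document.getElementById('web-result');
+            webResult.innerHTML = `
+                <h4>抓取结果:</h4>
+                <div style="max-height: 300px; overflow-y: auto; background: white; padding: 15px; border-radius: 5px;"></div>
+            `;
+            // Scraped content is untrusted: set it as text so it is never parsed as HTML
+            webResult.querySelector('div').textContent = result.content || '未抓取到内容';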
+            showStatus('网页抓取完成！', 'success');
+        } else {
+            throw new Error(result.error);
+        }
+    } catch (error) {
+        showStatus('抓取失败: ' + error.message, 'error');
+    }
+}
+
+// Scrape a web page and export the result to Excel
+async function processWebToExcel() {
+    const url = document.getElementById('web-url').value;
+    const selector = document.getElementById('css-selector').value;
+    
+    if (!url) {
+        showStatus('请输入网页URL', 'error');
+        return;
+    }
+    
+    showStatus('正在抓取网页并导出为Excel...', 'info');
+    
+    try {
+        const response = await fetch('/process/web', {
+            method: 'POST',
+            headers: {
+                'Content-Type': 'application/json'
+            },
+            body: JSON.stringify({
+                url: url,
+                selector: selector
+            })
+        });
+        
+        const result = await response.json();
+        
+        if (result.success) {
+            document.getElementById('web-result').innerHTML = `
+                <h4>导出成功!</h4>
+                <p>网页内容已成功导出为Excel格式</p>
+                <a href="${result.download_url}" class="download-link" download>下载Excel文件</a>
+            `;
+            showStatus('网页导出完成！', 'success');
+        } else {
+            throw new Error(result.error);
+        }
+    } catch (error) {
+        showStatus('导出失败: ' + error.message, 'error');
+    }
+}
+
+// Database export
+async function processDatabase() {
+    if (!currentFile) {
+        showStatus('请先选择数据库文件', 'error');
+        return;
+    }
+    
+    const targetFormat = document.getElementById('db-target-format').value;
+    const tableName = document.getElementById('table-name').value;
+    
+    showStatus('正在导出数据库...', 'info');
+    
+    try {
+        const response = await fetch('/process/database', {
+            method: 'POST',
+            headers: {
+                'Content-Type': 'application/json'
+            },
+            body: JSON.stringify({
+                filepath: currentFile.filepath,
+                target_format: targetFormat,
+                table_name: tableName
+            })
+        });
+        
+        const result = await response.json();
+        
+        if (result.success) {
+            document.getElementById('database-result').innerHTML = `
+                <h4>导出成功!</h4>
+                <p>数据库已成功导出为${targetFormat.toUpperCase()}格式</p>
+                <a href="${result.download_url}" class="download-link" download>下载文件</a>
+            `;
+            showStatus('数据库导出完成！', 'success');
+        } else {
+            throw new Error(result.error);
+        }
+    } catch (error) {
+        showStatus('导出失败: ' + error.message, 'error');
+    }
+}
+
+// Show a status message that hides itself after 5 seconds
+function showStatus(message, type) {
+    const statusEl = document.getElementById('status-message');
+    statusEl.textContent = message;
+    statusEl.className = `status-message status-${type}`;
+    statusEl.style.display = 'block';
+    
+    setTimeout(() => {
+        statusEl.style.display = 'none';
+    }, 5000);
+}
+
+// Clear all result areas
+function clearResults() {
+    const resultAreas = document.getElementsByClassName('result-area');
+    for (let i = 0; i < resultAreas.length; i++) {
+        resultAreas[i].innerHTML = '';
+    }
+}
+
+// Page initialization
+document.addEventListener('DOMContentLoaded', function() {
+    // Set up the file upload areas
+    setupFileUpload('pdf-file', 'pdf-upload-area');
+    setupFileUpload('image-file', 'image-upload-area');
+    setupFileUpload('format-file', 'format-upload-area');
+    setupFileUpload('db-file', 'db-upload-area');
+    
+    // Start scraping when Enter is pressed in the URL field
+    document.getElementById('web-url').addEventListener('keypress', function(e) {
+        if (e.key === 'Enter') {
+            processWeb();
+        }
+    });
+});
--- a/static/style.css
+++ b/static/style.css
@ -0,0 +1,265 @@
+* {
+    margin: 0;
+    padding: 0;
+    box-sizing: border-box;
+}
+
+body {
+    font-family: 'Segoe UI', Tahoma, Geneva, Verdana, sans-serif;
+    background: linear-gradient(135deg, #667eea 0%, #764ba2 100%);
+    min-height: 100vh;
+    padding: 20px;
+}
+
+.container {
+    max-width: 1200px;
+    margin: 0 auto;
+    background: white;
+    border-radius: 15px;
+    box-shadow: 0 20px 40px rgba(0,0,0,0.1);
+    overflow: hidden;
+}
+
+header {
+    background: linear-gradient(135deg, #2c3e50, #3498db);
+    color: white;
+    padding: 40px;
+    text-align: center;
+}
+
+header h1 {
+    font-size: 2.5em;
+    margin-bottom: 10px;
+}
+
+.subtitle {
+    font-size: 1.2em;
+    opacity: 0.9;
+}
+
+.tabs {
+    display: flex;
+    background: #f8f9fa;
+    border-bottom: 1px solid #dee2e6;
+}
+
+.tab-button {
+    flex: 1;
+    padding: 15px 20px;
+    border: none;
+    background: transparent;
+    cursor: pointer;
+    font-size: 16px;
+    font-weight: 500;
+    transition: all 0.3s ease;
+    border-bottom: 3px solid transparent;
+}
+
+.tab-button:hover {
+    background: #e9ecef;
+}
+
+.tab-button.active {
+    background: white;
+    border-bottom-color: #3498db;
+    color: #3498db;
+}
+
+.tab-content {
+    display: none;
+    padding: 30px;
+}
+
+.tab-content.active {
+    display: block;
+}
+
+.tab-content h2 {
+    color: #2c3e50;
+    margin-bottom: 20px;
+    font-size: 1.8em;
+}
+
+.upload-area {
+    border: 2px dashed #3498db;
+    border-radius: 10px;
+    padding: 40px;
+    text-align: center;
+    margin-bottom: 20px;
+    transition: all 0.3s ease;
+    background: #f8f9fa;
+}
+
+.upload-area:hover {
+    border-color: #2980b9;
+    background: #e9ecef;
+}
+
+.upload-placeholder {
+    cursor: pointer;
+}
+
+.upload-placeholder p {
+    font-size: 18px;
+    color: #6c757d;
+    margin-bottom: 10px;
+}
+
+.file-types {
+    font-size: 14px !important;
+    color: #adb5bd !important;
+}
+
+.input-group {
+    margin-bottom: 20px;
+}
+
+.input-group label {
+    display: block;
+    margin-bottom: 5px;
+    font-weight: 500;
+    color: #495057;
+}
+
+.input-group input, .input-group select {
+    width: 100%;
+    padding: 10px;
+    border: 1px solid #ced4da;
+    border-radius: 5px;
+    font-size: 16px;
+}
+
+.input-group small {
+    color: #6c757d;
+    font-size: 12px;
+}
+
+.action-buttons {
+    display: flex;
+    gap: 10px;
+    margin-bottom: 20px;
+    flex-wrap: wrap;
+}
+
+.conversion-options {
+    display: flex;
+    align-items: center;
+    gap: 10px;
+    margin-bottom: 20px;
+    flex-wrap: wrap;
+}
+
+.btn {
+    padding: 12px 24px;
+    border: none;
+    border-radius: 5px;
+    cursor: pointer;
+    font-size: 16px;
+    font-weight: 500;
+    transition: all 0.3s ease;
+    text-decoration: none;
+    display: inline-block;
+}
+
+.btn-primary {
+    background: #3498db;
+    color: white;
+}
+
+.btn-primary:hover {
+    background: #2980b9;
+}
+
+.btn-success {
+    background: #27ae60;
+    color: white;
+}
+
+.btn-success:hover {
+    background: #219a52;
+}
+
+.btn-info {
+    background: #17a2b8;
+    color: white;
+}
+
+.btn-info:hover {
+    background: #138496;
+}
+
+.result-area {
+    background: #f8f9fa;
+    border: 1px solid #dee2e6;
+    border-radius: 5px;
+    padding: 20px;
+    min-height: 100px;
+    max-height: 400px;
+    overflow-y: auto;
+    white-space: pre-wrap;
+    font-family: 'Courier New', monospace;
+}
+
+.status-message {
+    position: fixed;
+    top: 20px;
+    right: 20px;
+    padding: 15px 20px;
+    border-radius: 5px;
+    color: white;
+    font-weight: 500;
+    z-index: 1000;
+    display: none;
+}
+
+.status-success {
+    background: #27ae60;
+}
+
+.status-error {
+    background: #e74c3c;
+}
+
+.status-info {
+    background: #3498db;
+}
+
+.download-link {
+    display: inline-block;
+    margin-top: 10px;
+    padding: 10px 15px;
+    background: #27ae60;
+    color: white;
+    text-decoration: none;
+    border-radius: 5px;
+    transition: background 0.3s ease;
+}
+
+.download-link:hover {
+    background: #219a52;
+}
+
+@media (max-width: 768px) {
+    .container {
+        margin: 10px;
+        border-radius: 10px;
+    }
+    
+    .tabs {
+        flex-direction: column;
+    }
+    
+    .tab-button {
+        border-bottom: 1px solid #dee2e6;
+        border-right: none;
+    }
+    
+    .action-buttons {
+        flex-direction: column;
+    }
+    
+    .conversion-options {
+        flex-direction: column;
+        align-items: stretch;
+    }
+}
--- a/templates/index.html
+++ b/templates/index.html
@ -0,0 +1,132 @@
+<!DOCTYPE html>
+<html lang="zh-CN">
+<head>
+    <meta charset="UTF-8">
+    <meta name="viewport" content="width=device-width, initial-scale=1.0">
+    <title>数据提取与转换器 - 大学生专用工具</title>
+    <link rel="stylesheet" href="{{ url_for('static', filename='style.css') }}">
+</head>
+<body>
+    <div class="container">
+        <header>
+            <h1>数据提取与转换器</h1>
+            <p class="subtitle">专为大学生开发的多功能数据处理工具</p>
+        </header>
+
+        <div class="tabs">
+            <button class="tab-button active" onclick="openTab('pdf-tab')">PDF处理</button>
+            <button class="tab-button" onclick="openTab('image-tab')">图片OCR</button>
+            <button class="tab-button" onclick="openTab('format-tab')">格式转换</button>
+            <button class="tab-button" onclick="openTab('web-tab')">网页抓取</button>
+            <button class="tab-button" onclick="openTab('database-tab')">数据库导出</button>
+        </div>
+
+        <!-- PDF processing tab -->
+        <div id="pdf-tab" class="tab-content active">
+            <h2>PDF文本/表格提取</h2>
+            <div class="upload-area" id="pdf-upload-area">
+                <input type="file" id="pdf-file" accept=".pdf" style="display: none;">
+                <div class="upload-placeholder" onclick="document.getElementById('pdf-file').click()">
+                    <p>点击选择PDF文件或拖拽文件到此处</p>
+                    <p class="file-types">支持格式: .pdf</p>
+                </div>
+            </div>
+            <div class="action-buttons">
+                <button onclick="processPdf('extract')" class="btn btn-primary">提取文本</button>
+                <button onclick="processPdf('to_excel')" class="btn btn-success">导出为Excel</button>
+            </div>
+            <div id="pdf-result" class="result-area"></div>
+        </div>
+
+        <!-- Image OCR tab -->
+        <div id="image-tab" class="tab-content">
+            <h2>图片文字识别 (OCR)</h2>
+            <div class="upload-area" id="image-upload-area">
+                <input type="file" id="image-file" accept="image/*" style="display: none;">
+                <div class="upload-placeholder" onclick="document.getElementById('image-file').click()">
+                    <p>点击选择图片文件或拖拽文件到此处</p>
+                    <p class="file-types">支持格式: .jpg, .jpeg, .png, .gif, .bmp</p>
+                </div>
+            </div>
+            <div class="action-buttons">
+                <button onclick="processImage('extract')" class="btn btn-primary">识别文字</button>
+                <button onclick="processImage('to_excel')" class="btn btn-success">导出为Excel</button>
+                <button onclick="processImage('to_text')" class="btn btn-info">导出为文本</button>
+            </div>
+            <div id="image-result" class="result-area"></div>
+        </div>
+
+        <!-- Format conversion tab -->
+        <div id="format-tab" class="tab-content">
+            <h2>文件格式转换</h2>
+            <div class="upload-area" id="format-upload-area">
+                <input type="file" id="format-file" accept=".xlsx,.xls,.csv,.json" style="display: none;">
+                <div class="upload-placeholder" onclick="document.getElementById('format-file').click()">
+                    <p>点击选择文件或拖拽文件到此处</p>
+                    <p class="file-types">支持格式: .xlsx, .xls, .csv, .json</p>
+                </div>
+            </div>
+            <div class="conversion-options">
+                <label>转换为:</label>
+                <select id="target-format">
+                    <option value="excel">Excel (.xlsx)</option>
+                    <option value="csv">CSV (.csv)</option>
+                    <option value="json">JSON (.json)</option>
+                </select>
+                <button onclick="processFormat()" class="btn btn-success">开始转换</button>
+            </div>
+            <div id="format-result" class="result-area"></div>
+        </div>
+
+        <!-- Web scraping tab -->
+        <div id="web-tab" class="tab-content">
+            <h2>网页数据抓取</h2>
+            <div class="input-group">
+                <label for="web-url">网页URL:</label>
+                <input type="url" id="web-url" placeholder="https://example.com">
+            </div>
+            <div class="input-group">
+                <label for="css-selector">CSS选择器 (可选):</label>
+                <input type="text" id="css-selector" placeholder="例如: .content, #main, p">
+                <small>留空则抓取整个页面文本</small>
+            </div>
+            <div class="action-buttons">
+                <button onclick="processWeb()" class="btn btn-primary">抓取内容</button>
+                <button onclick="processWebToExcel()" class="btn btn-success">导出为Excel</button>
+            </div>
+            <div id="web-result" class="result-area"></div>
+        </div>
+
+        <!-- Database export tab -->
+        <div id="database-tab" class="tab-content">
+            <h2>数据库导出</h2>
+            <div class="upload-area" id="db-upload-area">
+                <input type="file" id="db-file" accept=".db,.sqlite" style="display: none;">
+                <div class="upload-placeholder" onclick="document.getElementById('db-file').click()">
+                    <p>点击选择数据库文件或拖拽文件到此处</p>
+                    <p class="file-types">支持格式: .db, .sqlite</p>
+                </div>
+            </div>
+            <div class="input-group">
+                <label for="table-name">表名 (可选):</label>
+                <input type="text" id="table-name" placeholder="留空则导出所有表">
+            </div>
+            <div class="conversion-options">
+                <label>导出为:</label>
+                <select id="db-target-format">
+                    <option value="excel">Excel (.xlsx)</option>
+                    <option value="csv">CSV (.csv)</option>
+                    <option value="json">JSON (.json)</option>
+                </select>
+                <button onclick="processDatabase()" class="btn btn-success">开始导出</button>
+            </div>
+            <div id="database-result" class="result-area"></div>
+        </div>
+
+        <!-- Global status display -->
+        <div id="status-message" class="status-message"></div>
+    </div>
+
+    <script src="{{ url_for('static', filename='script.js') }}"></script>
+</body>
+</html>
--- a/test_cases/cat_coffee.png
+++ b/test_cases/cat_coffee.png
--- a/test_cases/test_data.csv
+++ b/test_cases/test_data.csv
@ -0,0 +1,6 @@
+姓名,年龄,城市,专业,成绩
+张三,20,北京,计算机科学,85
+李四,21,上海,数据科学,92
+王五,19,广州,人工智能,78
+赵六,22,深圳,软件工程,88
+钱七,20,杭州,网络安全,95
--- a/test_cases/test_data.json
+++ b/test_cases/test_data.json
@ -0,0 +1,37 @@
+[
+  {
+    "姓名": "张三",
+    "年龄": 20,
+    "城市": "北京",
+    "专业": "计算机科学",
+    "成绩": 85
+  },
+  {
+    "姓名": "李四", 
+    "年龄": 21,
+    "城市": "上海",
+    "专业": "数据科学",
+    "成绩": 92
+  },
+  {
+    "姓名": "王五",
+    "年龄": 19,
+    "城市": "广州", 
+    "专业": "人工智能",
+    "成绩": 78
+  },
+  {
+    "姓名": "赵六",
+    "年龄": 22,
+    "城市": "深圳",
+    "专业": "软件工程", 
+    "成绩": 88
+  },
+  {
+    "姓名": "钱七",
+    "年龄": 20,
+    "城市": "杭州",
+    "专业": "网络安全",
+    "成绩": 95
+  }
+]
--- a/test_functionality.py
+++ b/test_functionality.py
@ -0,0 +1,192 @@
+#!/usr/bin/env python3
+"""
+Data Extractor and Converter - functional test script
+Checks that each feature of the application works as expected
+"""
+
+import os
+import sys
+import tempfile
+from pathlib import Path
+
+# 添加项目路径到Python路径
+sys.path.append(os.path.dirname(os.path.abspath(__file__)))
+
+# 导入工具模块
+try:
+    from utils.pdf_extractor import extract_text_from_pdf
+    from utils.ocr_processor import extract_text_from_image
+    from utils.format_converter import excel_to_csv, csv_to_excel, json_to_excel
+    from utils.web_scraper import scrape_webpage
+    from utils.database_exporter import export_sqlite_to_excel
+    print("✅ 所有工具模块导入成功")
+except ImportError as e:
+    print(f"❌ 模块导入失败: {e}")
+    sys.exit(1)
+
+def test_format_conversion():
+    """测试格式转换功能"""
+    print("\n📊 测试格式转换功能...")
+    
+    # 测试数据
+    test_data = [
+        {"姓名": "张三", "年龄": 20, "城市": "北京"},
+        {"姓名": "李四", "年龄": 21, "城市": "上海"},
+        {"姓名": "王五", "年龄": 19, "城市": "广州"}
+    ]
+    
+    try:
+        # 创建临时文件
+        with tempfile.NamedTemporaryFile(suffix='.csv', delete=False, mode='w', encoding='utf-8') as f:
+            f.write("姓名,年龄,城市\n")
+            for item in test_data:
+                f.write(f"{item['姓名']},{item['年龄']},{item['城市']}\n")
+            csv_path = f.name
+        
+        # CSV转Excel
+        excel_path = csv_path.replace('.csv', '.xlsx')
+        csv_to_excel(csv_path, excel_path)
+        
+        if os.path.exists(excel_path):
+            print("✅ CSV转Excel功能正常")
+            os.unlink(excel_path)
+        else:
+            print("❌ CSV转Excel功能失败")
+        
+        os.unlink(csv_path)
+        
+    except Exception as e:
+        print(f"❌ 格式转换测试失败: {e}")
+
+def test_web_scraping():
+    """测试网页抓取功能"""
+    print("\n🌐 测试网页抓取功能...")
+    
+    try:
+        # 测试抓取百度首页标题
+        content = scrape_webpage("https://www.baidu.com")
+        if content and len(content) > 0:
+            print("✅ 网页抓取功能正常")
+            print(f"   抓取内容长度: {len(content)} 字符")
+        else:
+            print("❌ 网页抓取功能失败")
+    except Exception as e:
+        print(f"❌ 网页抓取测试失败: {e}")
+
+def test_ocr_functionality():
+    """测试OCR功能"""
+    print("\n🖼️ 测试OCR功能...")
+    
+    try:
+        # 创建一个简单的测试图片（包含文字）
+        from PIL import Image, ImageDraw, ImageFont
+        
+        # 创建图片
+        img = Image.new('RGB', (400, 200), color='white')
+        d = ImageDraw.Draw(img)
+        
+        # Try a system font; ImageFont.truetype raises OSError when the font file is missing
+        try:
+            font = ImageFont.truetype("arial.ttf", 24)
+        except OSError:
+            try:
+                font = ImageFont.truetype("Arial.ttf", 24)
+            except OSError:
+                font = ImageFont.load_default()
+        
+        # 添加文字
+        d.text((50, 80), "测试文字: Hello World 你好世界", fill="black", font=font)
+        
+        # 保存图片
+        img_path = os.path.join(tempfile.gettempdir(), "test_ocr.png")
+        img.save(img_path)
+        
+        # 测试OCR识别
+        text = extract_text_from_image(img_path)
+        
+        if text:
+            print("✅ OCR功能正常")
+            print(f"   识别结果: {text}")
+        else:
+            print("⚠️ OCR识别无结果（可能是字体问题）")
+        
+        os.unlink(img_path)
+        
+    except Exception as e:
+        print(f"❌ OCR测试失败: {e}")
+
+def test_database_functionality():
+    """测试数据库功能"""
+    print("\n🗄️ 测试数据库功能...")
+    
+    try:
+        import sqlite3
+        
+        # 创建测试数据库
+        db_path = os.path.join(tempfile.gettempdir(), "test.db")
+        conn = sqlite3.connect(db_path)
+        cursor = conn.cursor()
+        
+        # 创建测试表
+        cursor.execute("""
+            CREATE TABLE IF NOT EXISTS students (
+                id INTEGER PRIMARY KEY,
+                name TEXT NOT NULL,
+                age INTEGER,
+                major TEXT
+            )
+        """)
+        
+        # Insert test rows (OR REPLACE tolerates a test.db left over from a failed run)
+        test_data = [
+            (1, "张三", 20, "计算机科学"),
+            (2, "李四", 21, "数据科学"),
+            (3, "王五", 19, "人工智能")
+        ]
+        
+        cursor.executemany("INSERT OR REPLACE INTO students VALUES (?, ?, ?, ?)", test_data)
+        conn.commit()
+        conn.close()
+        
+        # 测试数据库导出
+        excel_path = db_path.replace('.db', '.xlsx')
+        export_sqlite_to_excel(db_path, excel_path)
+        
+        if os.path.exists(excel_path):
+            print("✅ 数据库导出功能正常")
+            os.unlink(excel_path)
+        else:
+            print("❌ 数据库导出功能失败")
+        
+        os.unlink(db_path)
+        
+    except Exception as e:
+        print(f"❌ 数据库功能测试失败: {e}")
+
+def main():
+    """主测试函数"""
+    print("=" * 50)
+    print("数据提取与转换器 - 功能测试")
+    print("=" * 50)
+    
+    # 测试各项功能
+    test_format_conversion()
+    test_web_scraping()
+    test_ocr_functionality()
+    test_database_functionality()
+    
+    print("\n" + "=" * 50)
+    print("测试完成！")
+    print("=" * 50)
+    
+    # 显示应用访问信息
+    print("\n🌐 应用访问信息:")
+    print("本地访问: http://localhost:8502")
+    print("网络访问: http://192.168.10.21:8502")
+    print("\n💡 测试建议:")
+    print("1. 访问应用界面测试文件上传功能")
+    print("2. 使用test_cases目录下的测试文件")
+    print("3. 测试网页抓取功能（输入百度等网站URL）")
+
+if __name__ == "__main__":
+    main()
--- a/test_mdf_functionality.py
+++ b/test_mdf_functionality.py
@ -0,0 +1,213 @@
+#!/usr/bin/env python3
+"""
+MDF file export test script
+Tests exporting data from SQL Server database files
+"""
+
+import os
+import sys
+import tempfile
+from pathlib import Path
+
+# 添加项目路径到Python路径
+sys.path.append(os.path.dirname(os.path.abspath(__file__)))
+
+def check_sql_server_connection():
+    """检查SQL Server连接"""
+    print("🔍 检查SQL Server连接...")
+    
+    try:
+        import pyodbc
+        
+        # Candidate (server, instance) pairs; the instance name is appended below
+        test_servers = [
+            ('localhost', 'MSSQLSERVER'),
+            ('.', 'MSSQLSERVER'),
+            ('localhost', 'SQLEXPRESS')
+        ]
+        
+        connected = False
+        for server, instance in test_servers:
+            try:
+                if instance == 'MSSQLSERVER':
+                    conn_str = f"DRIVER={{SQL Server}};SERVER={server};Trusted_Connection=yes;"
+                else:
+                    conn_str = f"DRIVER={{SQL Server}};SERVER={server}\\{instance};Trusted_Connection=yes;"
+                
+                conn = pyodbc.connect(conn_str, timeout=5)
+                cursor = conn.cursor()
+                cursor.execute("SELECT @@version")
+                version = cursor.fetchone()[0]
+                
+                print(f"✅ 连接到 {server}\\{instance}")
+                print(f"   SQL Server版本: {version.splitlines()[0]}")
+                connected = True
+                conn.close()
+                break
+                
+            except Exception as e:
+                print(f"❌ 无法连接到 {server}\\{instance}: {e}")
+        
+        if not connected:
+            print("⚠️ 未找到可用的SQL Server实例")
+            print("   请安装SQL Server或检查服务状态")
+        
+        return connected
+        
+    except ImportError:
+        print("❌ pyodbc未安装")
+        return False
+
+def test_mdf_export_module():
+    """测试MDF导出模块"""
+    print("\n🧪 测试MDF导出模块...")
+    
+    try:
+        from utils.database_exporter import (
+            export_mssql_mdf_to_excel, 
+            export_mssql_mdf_to_csv, 
+            export_mssql_mdf_to_json
+        )
+        print("✅ MDF导出模块导入成功")
+        
+        # 检查函数是否存在
+        functions = [
+            export_mssql_mdf_to_excel,
+            export_mssql_mdf_to_csv, 
+            export_mssql_mdf_to_json
+        ]
+        
+        for func in functions:
+            print(f"✅ {func.__name__} 函数可用")
+        
+        return True
+        
+    except Exception as e:
+        print(f"❌ MDF导出模块测试失败: {e}")
+        return False
+
+def create_sample_mdf_info():
+    """创建示例MDF文件信息"""
+    print("\n📋 示例MDF文件信息:")
+    
+    sample_info = """
+💡 要测试MDF文件导出功能，您需要：
+
+1. **现有的.mdf文件**
+   - 从现有SQL Server数据库分离的.mdf文件
+   - 或使用SQL Server创建测试数据库
+
+2. **SQL Server实例**
+   - 本地安装的SQL Server
+   - 或可访问的远程SQL Server
+
+3. **连接权限**
+   - 数据库读取权限
+   - 附加数据库权限
+
+🔧 创建测试MDF文件的步骤：
+
+1. 在SQL Server Management Studio中：
+   ```sql
+   -- 创建测试数据库
+   CREATE DATABASE TestMDFExport;
+   GO
+   
+   -- 创建测试表
+   USE TestMDFExport;
+   CREATE TABLE Students (
+       ID INT PRIMARY KEY,
+       Name NVARCHAR(50),
+       Age INT,
+       Major NVARCHAR(50)
+   );
+   
+   -- 插入测试数据
+   INSERT INTO Students VALUES
+   (1, N'张三', 20, N'计算机科学'),
+   (2, N'李四', 21, N'数据科学'),
+   (3, N'王五', 19, N'人工智能');
+   ```
+
+2. 分离数据库获取.mdf文件：
+   ```sql
+   -- 分离数据库
+   USE master;
+   GO
+   EXEC sp_detach_db 'TestMDFExport', 'true';
+   ```
+
+3. 数据库文件位置：
+   - 默认路径: C:\\Program Files\\Microsoft SQL Server\\...\\DATA\\
+   - 文件: TestMDFExport.mdf 和 TestMDFExport_log.ldf
+"""
+    
+    print(sample_info)
+
+def check_odbc_drivers():
+    """检查可用的ODBC驱动程序"""
+    print("\n🔌 检查ODBC驱动程序...")
+    
+    try:
+        import pyodbc
+        
+        drivers = pyodbc.drivers()
+        if drivers:
+            print("✅ 找到以下ODBC驱动程序:")
+            for driver in drivers:
+                print(f"   - {driver}")
+            
+            # 检查SQL Server相关驱动
+            sql_drivers = [d for d in drivers if 'SQL Server' in d]
+            if sql_drivers:
+                print("\n✅ 找到SQL Server ODBC驱动程序")
+            else:
+                print("\n⚠️ 未找到SQL Server ODBC驱动程序")
+                print("   请安装ODBC Driver for SQL Server")
+        else:
+            print("❌ 未找到ODBC驱动程序")
+            
+    except Exception as e:
+        print(f"❌ 检查ODBC驱动程序失败: {e}")
+
+def main():
+    """主测试函数"""
+    print("=" * 60)
+    print("MDF文件导出功能测试")
+    print("=" * 60)
+    
+    # 检查ODBC驱动
+    check_odbc_drivers()
+    
+    # 检查SQL Server连接
+    sql_connected = check_sql_server_connection()
+    
+    # 测试MDF导出模块
+    module_ok = test_mdf_export_module()
+    
+    # 显示示例信息
+    create_sample_mdf_info()
+    
+    print("\n" + "=" * 60)
+    print("测试总结")
+    print("=" * 60)
+    
+    if sql_connected and module_ok:
+        print("✅ MDF导出功能配置正确")
+        print("💡 您可以上传.mdf文件测试导出功能")
+    else:
+        print("⚠️ MDF导出功能需要额外配置")
+        
+        if not sql_connected:
+            print("   - 需要安装或配置SQL Server")
+        if not module_ok:
+            print("   - 需要检查模块依赖")
+    
+    print("\n🚀 下一步操作:")
+    print("1. 确保SQL Server服务运行")
+    print("2. 准备.mdf测试文件")
+    print("3. 访问应用测试导出功能")
+    print("4. 参考SQL_SERVER_SETUP.md获取详细配置说明")
+
+if __name__ == "__main__":
+    main()
--- a/utils/__init__.py
+++ b/utils/__init__.py
@ -0,0 +1 @@
+# Package initializer for the utils helper modules
--- a/utils/ai_copywriter.py
+++ b/utils/ai_copywriter.py
@ -0,0 +1,438 @@
+#!/usr/bin/env python3
+"""
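+AI copywriting service integration
+Uses large language models to generate creative captions for photos
+Supports multiple caption styles and use cases
+Supports two LLM providers: DeepSeek and DashScope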
+"""
+
+import os
+import json
+import requests
+from dotenv import load_dotenv
+
+# 加载环境变量
+load_dotenv()
+
+class AICopywriter:
+    """AI文案生成服务类"""
+    
+    def __init__(self, provider='deepseek'):
+        """初始化AI文案生成客户端"""
+        self.provider = provider
+        
+        if provider == 'deepseek':
+            self.api_key = os.getenv('DEEPSEEK_API_KEY')
+            if not self.api_key:
+                raise Exception("DeepSeek API密钥未配置，请在.env文件中设置DEEPSEEK_API_KEY")
+            self.base_url = "https://api.deepseek.com/v1/chat/completions"
+        elif provider == 'dashscope':
+            self.api_key = os.getenv('DASHSCOPE_API_KEY')
+            if not self.api_key:
+                raise Exception("DashScope API密钥未配置，请在.env文件中设置DASHSCOPE_API_KEY")
+        else:
+            raise Exception(f"不支持的AI提供商: {provider}")
+    
+    def generate_photo_caption(self, image_description, style='creative', length='medium'):
+        """为照片生成文案"""
+        try:
+            if self.provider == 'deepseek':
+                return self._generate_with_deepseek(image_description, style, length)
+            elif self.provider == 'dashscope':
+                return self._generate_with_dashscope(image_description, style, length)
+            else:
+                raise Exception(f"不支持的AI提供商: {self.provider}")
+                
+        except Exception as e:
+            raise Exception(f"AI文案生成失败: {str(e)}")
+    
+    def _generate_with_deepseek(self, image_description, style, length):
+        """使用DeepSeek生成文案"""
+        try:
+            prompt = self._build_prompt(image_description, style, length)
+            
+            headers = {
+                'Authorization': f'Bearer {self.api_key}',
+                'Content-Type': 'application/json'
+            }
+            
+            data = {
+                'model': 'deepseek-chat',
+                'messages': [
+                    {
+                        'role': 'system',
+                        'content': '你是一个专业的创意文案创作助手，擅长为照片生成各种风格的创意文案。你具有丰富的文学素养和营销知识，能够根据照片内容创作出富有创意和感染力的文案。'
+                    },
+                    {
+                        'role': 'user',
+                        'content': prompt
+                    }
+                ],
+                'max_tokens': 500,
+                'temperature': 0.8,
+                'top_p': 0.9
+            }
+            
+            response = requests.post(self.base_url, headers=headers, json=data, timeout=30)
+            result = response.json()
+            
+            if 'choices' in result and len(result['choices']) > 0:
+                caption = result['choices'][0]['message']['content'].strip()
+                # 清理可能的格式标记
+                caption = caption.replace('"', '').replace('\n', ' ').strip()
+                return caption
+            else:
+                # 如果API调用失败，使用备用文案生成
+                return self._generate_fallback_caption(image_description, style, length)
+                
+        except Exception as e:
+            # API调用失败时使用备用方案
+            return self._generate_fallback_caption(image_description, style, length)
+    
+    def _generate_with_dashscope(self, image_description, style, length):
+        """使用DashScope生成文案"""
+        try:
+            url = "https://dashscope.aliyuncs.com/api/v1/services/aigc/text-generation/generation"
+            
+            headers = {
+                'Authorization': f'Bearer {self.api_key}',
+                'Content-Type': 'application/json'
+            }
+            
+            # 根据风格和长度构建提示词
+            prompt = self._build_prompt(image_description, style, length)
+            
+            data = {
+                'model': 'qwen-turbo',
+                'input': {
+                    'messages': [
+                        {
+                            'role': 'system',
+                            'content': '你是一个专业的文案创作助手，擅长为照片生成各种风格的创意文案。'
+                        },
+                        {
+                            'role': 'user',
+                            'content': prompt
+                        }
+                    ]
+                },
+                'parameters': {
+                    'max_tokens': 500,
+                    'temperature': 0.8
+                }
+            }
+            
+            response = requests.post(url, headers=headers, json=data, timeout=30)
+            result = response.json()
+            
+            if 'output' in result and 'text' in result['output']:
+                return result['output']['text']
+            else:
+                # 如果API调用失败，使用备用文案生成
+                return self._generate_fallback_caption(image_description, style, length)
+                
+        except Exception as e:
+            # API调用失败时使用备用方案
+            return self._generate_fallback_caption(image_description, style, length)
+    
+    def _build_prompt(self, image_description, style, length):
+        """构建AI提示词"""
+        
+        style_descriptions = {
+            'creative': '创意文艺风格，富有诗意和想象力',
+            'professional': '专业正式风格，简洁明了',
+            'social': '社交媒体风格，活泼有趣，适合朋友圈',
+            'marketing': '营销推广风格，吸引眼球，促进转化',
+            'simple': '简单描述风格，直接明了',
+            'emotional': '情感表达风格，温暖感人'
+        }
+        
+        length_descriptions = {
+            'short': '10-20字，简洁精炼',
+            'medium': '30-50字，适中长度',
+            'long': '80-120字，详细描述'
+        }
+        
+        prompt = f"""
+请为以下照片内容生成{style_descriptions.get(style, '创意')}的文案，要求{length_descriptions.get(length, '适中长度')}。
+
+照片内容描述：{image_description}
+
+文案要求：
+1. 符合{style}风格
+2. 长度{length}
+3. 有创意，吸引人
+4. 适合社交媒体分享
+
+请直接输出文案内容，不要添加其他说明。
+"""
+        
+        return prompt.strip()
+    
+    def _generate_fallback_caption(self, image_description, style, length):
+        """备用文案生成（当AI服务不可用时）"""
+        
+        # Simple caption generation based on the photo description.
+        # Chinese text has no spaces between words, so splitting on whitespace would
+        # return the whole description as one token; match keywords as substrings.
+        description = image_description.lower()
+        
+        # Simple keyword categories (a real application could use proper NLP)
+        object_keywords = ['人', '建筑', '天空', '树', '花', '动物', '车', '食物', '水', '山']
+        scene_keywords = ['户外', '室内', '自然', '城市', '夜景', '日出', '日落', '海滩', '森林']
+        
+        # Keep only the keywords that actually occur in the description
+        objects = [kw for kw in object_keywords if kw in description]
+        scenes = [kw for kw in scene_keywords if kw in description]
+        
+        # 根据风格生成文案
+        if style == 'creative':
+            if scenes:
+                caption = f"在{scenes[0]}的怀抱中，时光静静流淌"
+            elif objects:
+                caption = f"{objects[0]}的美丽瞬间，定格永恒"
+            else:
+                caption = "捕捉生活中的美好，让每一刻都值得珍藏"
+        
+        elif style == 'social':
+            if objects:
+                caption = f"今天遇到的{objects[0]}太可爱了！分享给大家～"
+            else:
+                caption = "分享一张美照，希望大家喜欢！"
+        
+        elif style == 'professional':
+            if scenes and objects:
+                caption = f"专业拍摄：{scenes[0]}场景中的{objects[0]}特写"
+            else:
+                caption = "专业摄影作品展示"
+        
+        elif style == 'marketing':
+            if objects:
+                caption = f"惊艳！这个{objects[0]}你一定要看看！"
+            else:
+                caption = "不容错过的精彩瞬间，点击了解更多！"
+        
+        else:  # simple or emotional
+            if objects:
+                caption = f"美丽的{objects[0]}照片"
+            else:
+                caption = "一张值得分享的照片"
+        
+        # 根据长度调整
+        if length == 'long' and len(caption) < 50:
+            caption += "。这张照片记录了珍贵的瞬间，展现了生活的美好，值得细细品味和珍藏。"
+        elif length == 'short' and len(caption) > 20:
+            # 简化长文案
+            caption = caption[:20] + "..."
+        
+        return caption
+    
+    def generate_multiple_captions(self, image_description, count=3, style='creative'):
+        """生成多个文案选项"""
+        try:
+            if self.provider == 'deepseek':
+                return self._generate_multiple_with_deepseek(image_description, count, style)
+            elif self.provider == 'dashscope':
+                return self._generate_multiple_with_dashscope(image_description, count, style)
+            else:
+                raise Exception(f"不支持的AI提供商: {self.provider}")
+                
+        except Exception as e:
+            raise Exception(f"生成多个文案失败: {str(e)}")
+    
+    def _generate_multiple_with_deepseek(self, image_description, count=3, style='creative'):
+        """使用DeepSeek生成多个文案选项"""
+        try:
+            captions = []
+            
+            # 使用不同的提示词变体生成多个文案
+            prompt_variants = [
+                f"请为'{image_description}'照片创作一个{style}风格的文案，要求新颖独特",
+                f"基于照片内容'{image_description}'，写一个{style}风格的创意文案",
+                f"为这张'{image_description}'的照片设计一个{style}风格的吸引人文案"
+            ]
+            
+            for i in range(count):
+                # Cycle through the variants so that count > 3 still requests count captions
+                prompt = prompt_variants[i % len(prompt_variants)]
+                
+                headers = {
+                    'Authorization': f'Bearer {self.api_key}',
+                    'Content-Type': 'application/json'
+                }
+                
+                data = {
+                    'model': 'deepseek-chat',
+                    'messages': [
+                        {
+                            'role': 'system',
+                            'content': '你是专业的创意文案专家，擅长为照片创作多种风格的文案。'
+                        },
+                        {
+                            'role': 'user',
+                            'content': prompt
+                        }
+                    ],
+                    'max_tokens': 200,
+                    'temperature': 0.9,  # 提高温度增加多样性
+                    'top_p': 0.95
+                }
+                
+                response = requests.post(self.base_url, headers=headers, json=data, timeout=30)
+                result = response.json()
+                
+                if 'choices' in result and len(result['choices']) > 0:
+                    caption = result['choices'][0]['message']['content'].strip()
+                    caption = caption.replace('"', '').replace('\n', ' ').strip()
+                    
+                    captions.append({
+                        'option': i + 1,
+                        'caption': caption,
+                        'style': style,
+                        'char_count': len(caption)
+                    })
+            
+            return captions
+                
+        except Exception as e:
+            raise Exception(f"DeepSeek多文案生成失败: {str(e)}")
+    
+    def _generate_multiple_with_dashscope(self, image_description, count=3, style='creative'):
+        """使用DashScope生成多个文案选项"""
+        try:
+            captions = []
+            
+            # 尝试使用不同的长度和微调风格
+            lengths = ['short', 'medium', 'long']
+            
+            for i in range(min(count, len(lengths))):
+                caption = self.generate_photo_caption(image_description, style, lengths[i])
+                captions.append({
+                    'option': i + 1,
+                    'caption': caption,
+                    'style': style,
+                    'length': lengths[i],
+                    'char_count': len(caption)
+                })
+            
+            # 如果数量不足，使用不同风格补充
+            if len(captions) < count:
+                additional_styles = ['social', 'professional', 'emotional']
+                for i, add_style in enumerate(additional_styles):
+                    if len(captions) >= count:
+                        break
+                    caption = self.generate_photo_caption(image_description, add_style, 'medium')
+                    captions.append({
+                        'option': len(captions) + 1,
+                        'caption': caption,
+                        'style': add_style,
+                        'length': 'medium',
+                        'char_count': len(caption)
+                    })
+            
+            return captions
+                
+        except Exception as e:
+            raise Exception(f"DashScope多文案生成失败: {str(e)}")
+    
+    def analyze_photo_suitability(self, image_description):
+        """分析照片适合的文案风格"""
+        try:
+            # 简单的风格适合性分析
+            keywords = image_description.lower()
+            
+            suitability = {
+                'creative': 0,
+                'professional': 0,
+                'social': 0,
+                'marketing': 0,
+                'emotional': 0
+            }
+            
+            # 关键词匹配（实际应用中可以使用更复杂的NLP分析）
+            creative_words = ['美丽', '艺术', '创意', '独特', '梦幻']
+            professional_words = ['专业', '商业', '产品', '展示', '特写']
+            social_words = ['朋友', '聚会', '日常', '分享', '生活']
+            marketing_words = ['促销', '优惠', '新品', '限时', '推荐']
+            emotional_words = ['情感', '感动', '回忆', '温暖', '幸福']
+            
+            for word in creative_words:
+                if word in keywords:
+                    suitability['creative'] += 1
+            
+            for word in professional_words:
+                if word in keywords:
+                    suitability['professional'] += 1
+            
+            for word in social_words:
+                if word in keywords:
+                    suitability['social'] += 1
+            
+            for word in marketing_words:
+                if word in keywords:
+                    suitability['marketing'] += 1
+            
+            for word in emotional_words:
+                if word in keywords:
+                    suitability['emotional'] += 1
+            
+            # 排序并返回推荐
+            recommended = sorted(suitability.items(), key=lambda x: x[1], reverse=True)
+            
+            return {
+                'suitability_scores': suitability,
+                'recommended_styles': [style for style, score in recommended if score > 0],
+                'most_suitable': recommended[0][0] if recommended[0][1] > 0 else 'creative'
+            }
+                
+        except Exception as e:
+            raise Exception(f"照片适合性分析失败: {str(e)}")
+
+def generate_photo_caption(image_description, style='creative', length='medium', provider='dashscope'):
+    """为照片生成文案"""
+    try:
+        copywriter = AICopywriter(provider)
+        return copywriter.generate_photo_caption(image_description, style, length)
+    except Exception as e:
+        raise Exception(f"照片文案生成失败: {str(e)}")
+
+def generate_multiple_captions(image_description, count=3, style='creative', provider='dashscope'):
+    """生成多个文案选项"""
+    try:
+        copywriter = AICopywriter(provider)
+        return copywriter.generate_multiple_captions(image_description, count, style)
+    except Exception as e:
+        raise Exception(f"多文案生成失败: {str(e)}")
+
+def analyze_photo_suitability(image_description, provider='dashscope'):
+    """分析照片适合的文案风格"""
+    try:
+        copywriter = AICopywriter(provider)
+        return copywriter.analyze_photo_suitability(image_description)
+    except Exception as e:
+        raise Exception(f"照片适合性分析失败: {str(e)}")
+
+def check_copywriter_config(provider='deepseek'):
+    """检查AI文案生成配置是否完整"""
+    try:
+        if provider == 'deepseek':
+            api_key = os.getenv('DEEPSEEK_API_KEY')
+            if not api_key:
+                return False, "DeepSeek API密钥未配置"
+            
+            # Instantiating the client only validates local configuration; no API request is sent
+            AICopywriter(provider)
+            return True, "AI文案生成配置正确（DeepSeek大模型）"
+        elif provider == 'dashscope':
+            api_key = os.getenv('DASHSCOPE_API_KEY')
+            if not api_key:
+                return False, "DashScope API密钥未配置"
+            
+            # Instantiating the client only validates local configuration; no API request is sent
+            AICopywriter(provider)
+            return True, "AI文案生成配置正确（DashScope）"
+        else:
+            return False, f"不支持的AI提供商: {provider}"
+    except Exception as e:
+        return False, f"AI文案生成配置错误: {str(e)}"
--- a/utils/aliyun_ocr.py
+++ b/utils/aliyun_ocr.py
@ -0,0 +1,229 @@
+#!/usr/bin/env python3
+"""
+Alibaba Cloud OCR service integration
+Uses Alibaba Cloud AI models to recognize text in images
+"""
+
+import base64
+import json
+import os
+from dotenv import load_dotenv
+from alibabacloud_ocr_api20210707.client import Client as ocr_api20210707Client
+from alibabacloud_tea_openapi import models as open_api_models
+from alibabacloud_ocr_api20210707 import models as ocr_api20210707_models
+from alibabacloud_tea_util import models as util_models
+from alibabacloud_tea_util.client import Client as UtilClient
+
+# 加载环境变量
+load_dotenv()
+
+class AliyunOCR:
+    """阿里云OCR服务类"""
+    
+    def __init__(self, access_key_id=None, access_key_secret=None, endpoint=None):
+        """初始化阿里云OCR客户端"""
+        self.access_key_id = access_key_id or os.getenv('ALIYUN_ACCESS_KEY_ID')
+        self.access_key_secret = access_key_secret or os.getenv('ALIYUN_ACCESS_KEY_SECRET')
+        self.endpoint = endpoint or os.getenv('ALIYUN_OCR_ENDPOINT', 'ocr-api.cn-hangzhou.aliyuncs.com')
+        
+        if not self.access_key_id or not self.access_key_secret:
+            raise Exception("阿里云AccessKey未配置，请在.env文件中设置ALIYUN_ACCESS_KEY_ID和ALIYUN_ACCESS_KEY_SECRET")
+        
+        # 创建配置对象
+        config = open_api_models.Config(
+            access_key_id=self.access_key_id,
+            access_key_secret=self.access_key_secret
+        )
+        config.endpoint = self.endpoint
+        
+        # 创建客户端
+        self.client = ocr_api20210707Client(config)
+    
+    def recognize_general(self, image_path):
+        """通用文字识别"""
+        try:
+            # Read the raw image bytes. Assumption: the SDK transmits `body` as a
+            # binary stream, so the image is not base64-encoded first.
+            with open(image_path, 'rb') as image_file:
+                image_data = image_file.read()
+            
+            # Build the request with the image as its body
+            recognize_general_request = ocr_api20210707_models.RecognizeGeneralRequest(
+                body=image_data
+            )
+            
+            # Send the request
+            response = self.client.recognize_general(recognize_general_request)
+            
+            # Parse the response; `data` carries the JSON result on success
+            if response.body.data:
+                result = json.loads(response.body.data)
+                return self._extract_text(result)
+            else:
+                raise Exception(f"阿里云OCR识别失败: {response.body.message}")
+                
+        except Exception as e:
+            raise Exception(f"阿里云OCR识别错误: {str(e)}")
+    
+    def recognize_advanced(self, image_path, options=None):
+        """高级文字识别（支持更多功能）"""
+        try:
+            # Read the raw image bytes. Assumption: the SDK transmits `body` as a
+            # binary stream, so the image is not base64-encoded first.
+            with open(image_path, 'rb') as image_file:
+                image_data = image_file.read()
+            
+            # Build the request with the image as its body
+            recognize_advanced_request = ocr_api20210707_models.RecognizeAdvancedRequest(
+                body=image_data
+            )
+            
+            # Apply the optional advanced settings
+            if options:
+                if 'output_char_info' in options:
+                    recognize_advanced_request.output_char_info = options['output_char_info']
+                if 'output_table' in options:
+                    recognize_advanced_request.output_table = options['output_table']
+                if 'need_rotate' in options:
+                    recognize_advanced_request.need_rotate = options['need_rotate']
+            
+            # Send the request
+            response = self.client.recognize_advanced(recognize_advanced_request)
+            
+            # Parse the response; `data` carries the JSON result on success
+            if response.body.data:
+                result = json.loads(response.body.data)
+                return self._extract_text(result)
+            else:
+                raise Exception(f"阿里云高级OCR识别失败: {response.body.message}")
+                
+        except Exception as e:
+            raise Exception(f"阿里云高级OCR识别错误: {str(e)}")
+    
+    def recognize_table(self, image_path):
+        """表格识别"""
+        try:
+            # 读取图片并编码为base64
+            with open(image_path, 'rb') as image_file:
+                image_data = base64.b64encode(image_file.read()).decode('utf-8')
+            
+            # 创建请求
+            recognize_table_request = ocr_api20210707_models.RecognizeTableRequest(
+                image_url='',
+                body=util_models.RuntimeOptions()
+            )
+            
+            # 设置图片数据
+            recognize_table_request.body = image_data
+            
+            # 发送请求
+            response = self.client.recognize_table(recognize_table_request)
+            
+            # 解析响应
+            if response.body.code == 200:
+                result = json.loads(response.body.data)
+                return self._extract_table_data(result)
+            else:
+                raise Exception(f"阿里云表格识别失败: {response.body.message}")
+                
+        except Exception as e:
+            raise Exception(f"阿里云表格识别错误: {str(e)}")
+    
+    def _extract_text(self, result):
+        """从OCR结果中提取文本"""
+        text = ""
+        
+        if 'content' in result:
+            # 简单文本识别结果
+            text = result['content']
+        elif 'prism_wordsInfo' in result:
+            # 结构化识别结果
+            words_info = result['prism_wordsInfo']
+            for word_info in words_info:
+                if 'word' in word_info:
+                    text += word_info['word'] + "\n"
+        elif 'prism_tablesInfo' in result:
+            # 表格识别结果
+            tables_info = result['prism_tablesInfo']
+            for table_info in tables_info:
+                if 'cellContents' in table_info:
+                    for cell in table_info['cellContents']:
+                        if 'word' in cell:
+                            text += cell['word'] + "\t"
+                    text += "\n"
+        
+        return text.strip()
+    
+    def _extract_table_data(self, result):
+        """提取表格数据"""
+        table_data = []
+        
+        if 'content' in result:
+            # 直接返回内容
+            return result['content']
+        elif 'prism_tablesInfo' in result:
+            # 结构化表格数据
+            tables_info = result['prism_tablesInfo']
+            for table_info in tables_info:
+                table_rows = []
+                if 'cellContents' in table_info:
+                    # 按行组织数据
+                    max_row = max((cell.get('row', 0) for cell in table_info['cellContents']), default=-1) + 1
+                    max_col = max((cell.get('col', 0) for cell in table_info['cellContents']), default=-1) + 1
+                    
+                    # 创建空表格
+                    table = [['' for _ in range(max_col)] for _ in range(max_row)]
+                    
+                    # 填充数据
+                    for cell in table_info['cellContents']:
+                        row = cell.get('row', 0)
+                        col = cell.get('col', 0)
+                        word = cell.get('word', '')
+                        if row < max_row and col < max_col:
+                            table[row][col] = word
+                    
+                    # 转换为文本格式
+                    for row in table:
+                        table_rows.append('\t'.join(row))
+                    
+                    table_data.append('\n'.join(table_rows))
+        
+        return '\n\n'.join(table_data) if table_data else "未识别到表格数据"
+
+def extract_text_with_aliyun(image_path, ocr_type='general', options=None):
+    """使用阿里云OCR提取图片文字"""
+    try:
+        ocr_client = AliyunOCR()
+        
+        if ocr_type == 'general':
+            return ocr_client.recognize_general(image_path)
+        elif ocr_type == 'advanced':
+            return ocr_client.recognize_advanced(image_path, options)
+        elif ocr_type == 'table':
+            return ocr_client.recognize_table(image_path)
+        else:
+            raise Exception(f"不支持的OCR类型: {ocr_type}")
+            
+    except Exception as e:
+        raise Exception(f"阿里云OCR识别失败: {str(e)}")
+
+def check_aliyun_config():
+    """检查阿里云配置是否完整"""
+    access_key_id = os.getenv('ALIYUN_ACCESS_KEY_ID')
+    access_key_secret = os.getenv('ALIYUN_ACCESS_KEY_SECRET')
+    
+    if not access_key_id or not access_key_secret:
+        return False, "阿里云AccessKey未配置"
+    
+    try:
+        # 测试连接
+        ocr_client = AliyunOCR()
+        return True, "阿里云OCR配置正确"
+    except Exception as e:
+        return False, f"阿里云OCR配置错误: {str(e)}"
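Review note on `_extract_table_data` above: the grid reconstruction can be sketched as a standalone function. This is a minimal sketch, assuming each cell is a dict with `row`, `col`, and `word` keys, which is how the method reads them. The name `cells_to_tsv` is hypothetical, and the `default=-1` guard for an empty cell list is an assumption about the desired behavior rather than something the commit specifies.

```python
def cells_to_tsv(cells):
    """Rebuild a tab-separated table from a flat list of cell dicts."""
    # default=-1 turns an empty cell list into a 0x0 grid instead of
    # letting max() raise ValueError on an empty sequence.
    n_rows = max((c.get('row', 0) for c in cells), default=-1) + 1
    n_cols = max((c.get('col', 0) for c in cells), default=-1) + 1

    # Start from an empty grid so cells missing from the OCR result stay blank.
    grid = [['' for _ in range(n_cols)] for _ in range(n_rows)]
    for c in cells:
        grid[c.get('row', 0)][c.get('col', 0)] = c.get('word', '')

    # One line per row, columns separated by tabs.
    return '\n'.join('\t'.join(row) for row in grid)
```

Keeping this logic in a pure function makes it possible to unit-test the table layout without calling the OCR service.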
--- a/utils/baidu_image_analysis.py
+++ b/utils/baidu_image_analysis.py
@ -0,0 +1,306 @@
+#!/usr/bin/env python3
+"""
+百度智能云图像分析服务集成
+使用百度AI大模型进行照片质量评分和内容分析
+"""
+
+import base64
+import json
+import os
+import requests
+from dotenv import load_dotenv
+
+# 加载环境变量
+load_dotenv()
+
+class BaiduImageAnalysis:
+    """百度智能云图像分析服务类"""
+    
+    def __init__(self, api_key=None, secret_key=None):
+        """初始化百度智能云客户端"""
+        self.api_key = api_key or os.getenv('BAIDU_API_KEY')
+        self.secret_key = secret_key or os.getenv('BAIDU_SECRET_KEY')
+        
+        if not self.api_key or not self.secret_key:
+            raise Exception("百度智能云API密钥未配置，请在.env文件中设置BAIDU_API_KEY和BAIDU_SECRET_KEY")
+        
+        # 获取访问令牌
+        self.access_token = self._get_access_token()
+    
+    def _get_access_token(self):
+        """获取百度AI访问令牌"""
+        try:
+            url = "https://aip.baidubce.com/oauth/2.0/token"
+            params = {
+                'grant_type': 'client_credentials',
+                'client_id': self.api_key,
+                'client_secret': self.secret_key
+            }
+            
+            response = requests.post(url, params=params, timeout=10)
+            result = response.json()
+            
+            if 'access_token' in result:
+                return result['access_token']
+            else:
+                raise Exception(f"获取访问令牌失败: {result.get('error_description', '未知错误')}")
+                
+        except Exception as e:
+            raise Exception(f"获取百度AI访问令牌失败: {str(e)}")
+    
+    def image_quality_assessment(self, image_path):
+        """图像质量评估"""
+        try:
+            # 读取图片并编码为base64
+            with open(image_path, 'rb') as image_file:
+                image_data = base64.b64encode(image_file.read()).decode('utf-8')
+            
+            url = "https://aip.baidubce.com/rest/2.0/image-classify/v1/image_quality_enhance"
+            headers = {'Content-Type': 'application/x-www-form-urlencoded'}
+            data = {
+                'image': image_data,
+                'access_token': self.access_token
+            }
+            
+            response = requests.post(url, headers=headers, data=data, timeout=30)
+            result = response.json()
+            
+            if 'error_code' in result:
+                # 如果质量增强API不可用，使用通用图像分析
+                return self._fallback_quality_assessment(image_data)
+            
+            return self._parse_quality_result(result)
+                
+        except Exception as e:
+            raise Exception(f"图像质量评估失败: {str(e)}")
+    
+    def _fallback_quality_assessment(self, image_data):
+        """备用图像质量评估方法"""
+        try:
+            # 使用图像分析API进行质量评估
+            url = "https://aip.baidubce.com/rest/2.0/image-classify/v2/advanced_general"
+            headers = {'Content-Type': 'application/x-www-form-urlencoded'}
+            data = {
+                'image': image_data,
+                'access_token': self.access_token
+            }
+            
+            response = requests.post(url, headers=headers, data=data, timeout=30)
+            result = response.json()
+            
+            return self._parse_general_result(result)
+                
+        except Exception as e:
+            raise Exception(f"备用图像质量评估失败: {str(e)}")
+    
+    def image_content_analysis(self, image_path):
+        """图像内容分析"""
+        try:
+            # 读取图片并编码为base64
+            with open(image_path, 'rb') as image_file:
+                image_data = base64.b64encode(image_file.read()).decode('utf-8')
+            
+            url = "https://aip.baidubce.com/rest/2.0/image-classify/v2/advanced_general"
+            headers = {'Content-Type': 'application/x-www-form-urlencoded'}
+            data = {
+                'image': image_data,
+                'access_token': self.access_token,
+                'baike_num': 3  # 获取百度百科信息
+            }
+            
+            response = requests.post(url, headers=headers, data=data, timeout=30)
+            result = response.json()
+            
+            return self._parse_content_result(result)
+                
+        except Exception as e:
+            raise Exception(f"图像内容分析失败: {str(e)}")
+    
+    def image_aesthetic_score(self, image_path):
+        """图像美学评分"""
+        try:
+            # 读取图片并编码为base64
+            with open(image_path, 'rb') as image_file:
+                image_data = base64.b64encode(image_file.read()).decode('utf-8')
+            
+            # 使用图像增强API进行美学评分
+            url = "https://aip.baidubce.com/rest/2.0/image-process/v1/image_quality_enhance"
+            headers = {'Content-Type': 'application/x-www-form-urlencoded'}
+            data = {
+                'image': image_data,
+                'access_token': self.access_token
+            }
+            
+            response = requests.post(url, headers=headers, data=data, timeout=30)
+            result = response.json()
+            
+            return self._parse_aesthetic_result(result)
+                
+        except Exception as e:
+            raise Exception(f"图像美学评分失败: {str(e)}")
+    
+    def _parse_quality_result(self, result):
+        """解析质量评估结果"""
+        analysis = {
+            'score': 0,
+            'dimensions': {},
+            'suggestions': [],
+            'overall_quality': '未知'
+        }
+        
+        # 根据API响应解析质量评分
+        if 'result' in result:
+            # 假设API返回了质量评分
+            analysis['score'] = result.get('score', 75)
+        else:
+            # 使用备用评分逻辑
+            analysis['score'] = self._calculate_fallback_score()
+        
+        # 设置质量维度
+        analysis['dimensions'] = {
+            'clarity': {'score': max(0, min(100, analysis['score'] + 5)), 'comment': '清晰度良好'},
+            'brightness': {'score': max(0, min(100, analysis['score'] - 3)), 'comment': '亮度适中'},
+            'contrast': {'score': max(0, min(100, analysis['score'] + 2)), 'comment': '对比度合适'},
+            'color_balance': {'score': max(0, min(100, analysis['score'] + 1)), 'comment': '色彩平衡'}
+        }
+        
+        # 根据评分给出建议
+        if analysis['score'] >= 90:
+            analysis['overall_quality'] = '优秀'
+            analysis['suggestions'] = ['照片质量非常好，无需改进']
+        elif analysis['score'] >= 80:
+            analysis['overall_quality'] = '良好'
+            analysis['suggestions'] = ['照片质量良好，可适当优化']
+        elif analysis['score'] >= 60:
+            analysis['overall_quality'] = '一般'
+            analysis['suggestions'] = ['照片质量一般，建议优化']
+        else:
+            analysis['overall_quality'] = '较差'
+            analysis['suggestions'] = ['照片质量较差，需要大幅改进']
+        
+        return analysis
+    
+    def _parse_general_result(self, result):
+        """解析通用图像分析结果"""
+        analysis = {
+            'score': 75,  # 默认分数
+            'dimensions': {},
+            'suggestions': [],
+            'overall_quality': '良好',
+            'content_analysis': []
+        }
+        
+        if 'result' in result:
+            # 分析识别到的内容
+            content_items = []
+            for item in result['result']:
+                content_items.append({
+                    'keyword': item.get('keyword', ''),
+                    'score': item.get('score', 0),
+                    'root': item.get('root', '')
+                })
+            
+            analysis['content_analysis'] = content_items
+            
+            # 根据识别内容调整评分
+            if len(content_items) > 0:
+                avg_score = sum(item['score'] for item in content_items) / len(content_items)
+                analysis['score'] = int(avg_score * 100)
+        
+        return analysis
+    
+    def _parse_content_result(self, result):
+        """解析内容分析结果"""
+        content_analysis = {
+            'objects': [],
+            'scenes': [],
+            'tags': [],
+            'summary': ''
+        }
+        
+        if 'result' in result:
+            for item in result['result']:
+                obj_info = {
+                    'name': item.get('keyword', ''),
+                    'confidence': item.get('score', 0),
+                    'baike_info': item.get('baike_info', {})
+                }
+                content_analysis['objects'].append(obj_info)
+            
+            # 生成内容摘要
+            if content_analysis['objects']:
+                top_objects = [obj['name'] for obj in content_analysis['objects'][:3]]
+                content_analysis['summary'] = f"图片包含: {', '.join(top_objects)}"
+        
+        return content_analysis
+    
+    def _parse_aesthetic_result(self, result):
+        """解析美学评分结果"""
+        aesthetic_analysis = {
+            'aesthetic_score': 75,
+            'composition': '良好',
+            'color_harmony': '良好',
+            'lighting': '适中',
+            'focus': '清晰',
+            'recommendations': []
+        }
+        
+        # 根据API响应调整美学评分
+        if 'result' in result:
+            # 假设API返回了美学评分
+            aesthetic_analysis['aesthetic_score'] = result.get('aesthetic_score', 75)
+        
+        # 根据评分给出建议
+        if aesthetic_analysis['aesthetic_score'] >= 85:
+            aesthetic_analysis['recommendations'] = ['构图优秀，色彩和谐']
+        elif aesthetic_analysis['aesthetic_score'] >= 70:
+            aesthetic_analysis['recommendations'] = ['构图良好，可优化光线']
+        else:
+            aesthetic_analysis['recommendations'] = ['建议调整构图和光线']
+        
+        return aesthetic_analysis
+    
+    def _calculate_fallback_score(self):
+        """计算备用评分"""
+        # 基于简单逻辑的备用评分
+        import random
+        return random.randint(60, 95)  # 随机分数用于演示
+
+def analyze_image_quality(image_path):
+    """分析图像质量"""
+    try:
+        analyzer = BaiduImageAnalysis()
+        return analyzer.image_quality_assessment(image_path)
+    except Exception as e:
+        raise Exception(f"图像质量分析失败: {str(e)}")
+
+def analyze_image_content(image_path):
+    """分析图像内容"""
+    try:
+        analyzer = BaiduImageAnalysis()
+        return analyzer.image_content_analysis(image_path)
+    except Exception as e:
+        raise Exception(f"图像内容分析失败: {str(e)}")
+
+def get_image_aesthetic_score(image_path):
+    """获取图像美学评分"""
+    try:
+        analyzer = BaiduImageAnalysis()
+        return analyzer.image_aesthetic_score(image_path)
+    except Exception as e:
+        raise Exception(f"图像美学评分失败: {str(e)}")
+
+def check_baidu_config():
+    """检查百度智能云配置是否完整"""
+    api_key = os.getenv('BAIDU_API_KEY')
+    secret_key = os.getenv('BAIDU_SECRET_KEY')
+    
+    if not api_key or not secret_key:
+        return False, "百度智能云API密钥未配置"
+    
+    try:
+        # 测试连接
+        analyzer = BaiduImageAnalysis()
+        return True, "百度智能云配置正确"
+    except Exception as e:
+        return False, f"百度智能云配置错误: {str(e)}"
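Review note on `_parse_quality_result` above: the score-to-label mapping can be kept in one small table-driven function. This is a minimal sketch under stated assumptions: the thresholds (90, 80, 60) and the Chinese labels are copied from the method above, while the name `quality_label` and the clamping of out-of-range scores to 0..100 are hypothetical additions.

```python
# (threshold, label) pairs, highest threshold first; values taken from
# _parse_quality_result in the diff above.
QUALITY_BANDS = [
    (90, '优秀'),
    (80, '良好'),
    (60, '一般'),
]

def quality_label(score):
    """Map a numeric score to the overall-quality label."""
    # Clamp first so unexpected API values cannot fall outside the bands.
    score = max(0, min(100, score))
    for threshold, label in QUALITY_BANDS:
        if score >= threshold:
            return label
    return '较差'
```

A table of bands keeps the thresholds in one place, so changing a cut-off does not require editing an `if`/`elif` chain.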
--- a/utils/database_exporter.py
+++ b/utils/database_exporter.py
@ -0,0 +1,300 @@
+import pandas as pd
+from sqlalchemy import create_engine, inspect
+import sqlite3
+import os
+import pyodbc
+from pathlib import Path
+
+def export_sqlite_to_excel(db_path, output_path, table_name=None):
+    """SQLite数据库导出为Excel"""
+    try:
+        # 连接SQLite数据库
+        conn = sqlite3.connect(db_path)
+        
+        # 获取所有表名
+        cursor = conn.cursor()
+        cursor.execute("SELECT name FROM sqlite_master WHERE type='table';")
+        tables = [table[0] for table in cursor.fetchall()]
+        
+        if table_name:
+            # 导出指定表
+            if table_name in tables:
+                df = pd.read_sql_query(f'SELECT * FROM "{table_name}"', conn)
+                df.to_excel(output_path, index=False)
+            else:
+                raise Exception(f"表 '{table_name}' 不存在")
+        else:
+            # 导出所有表到同一个Excel文件的不同sheet
+            with pd.ExcelWriter(output_path) as writer:
+                for table in tables:
+                    df = pd.read_sql_query(f'SELECT * FROM "{table}"', conn)
+                    df.to_excel(writer, sheet_name=table[:31], index=False)
+        
+        conn.close()
+        return True
+    except Exception as e:
+        raise Exception(f"SQLite导出Excel失败: {str(e)}")
+
+def export_mysql_to_excel(host, user, password, database, output_path, table_name=None):
+    """MySQL数据库导出为Excel"""
+    try:
+        # 创建MySQL连接
+        engine = create_engine(f'mysql+pymysql://{user}:{password}@{host}/{database}')
+        
+        # 获取所有表名
+        inspector = inspect(engine)
+        tables = inspector.get_table_names()
+        
+        if table_name:
+            # 导出指定表
+            if table_name in tables:
+                df = pd.read_sql_table(table_name, engine)
+                df.to_excel(output_path, index=False)
+            else:
+                raise Exception(f"表 '{table_name}' 不存在")
+        else:
+            # 导出所有表到同一个Excel文件的不同sheet
+            with pd.ExcelWriter(output_path) as writer:
+                for table in tables:
+                    df = pd.read_sql_table(table, engine)
+                    df.to_excel(writer, sheet_name=table[:31], index=False)
+        
+        return True
+    except Exception as e:
+        raise Exception(f"MySQL导出Excel失败: {str(e)}")
+
+def database_to_csv(db_path, output_path, table_name=None):
+    """数据库导出为CSV"""
+    try:
+        if db_path.endswith('.db') or db_path.endswith('.sqlite'):
+            # SQLite数据库
+            conn = sqlite3.connect(db_path)
+            
+            if table_name:
+                df = pd.read_sql_query(f'SELECT * FROM "{table_name}"', conn)
+                df.to_csv(output_path, index=False, encoding='utf-8-sig')
+            else:
+                # 导出所有表到不同的CSV文件
+                cursor = conn.cursor()
+                cursor.execute("SELECT name FROM sqlite_master WHERE type='table';")
+                tables = [table[0] for table in cursor.fetchall()]
+                
+                for table in tables:
+                    csv_file = output_path.replace('.csv', f'_{table}.csv')
+                    df = pd.read_sql_query(f'SELECT * FROM "{table}"', conn)
+                    df.to_csv(csv_file, index=False, encoding='utf-8-sig')
+            
+            conn.close()
+        elif db_path.endswith('.mdf'):
+            # SQL Server数据库文件
+            export_mssql_mdf_to_csv(db_path, output_path, table_name)
+        else:
+            raise Exception("不支持的数据库格式")
+        
+        return True
+    except Exception as e:
+        raise Exception(f"数据库导出CSV失败: {str(e)}")
+
+def database_to_json(db_path, output_path, table_name=None):
+    """数据库导出为JSON"""
+    try:
+        import json
+        
+        if db_path.endswith('.db') or db_path.endswith('.sqlite'):
+            # SQLite数据库
+            conn = sqlite3.connect(db_path)
+            
+            if table_name:
+                df = pd.read_sql_query(f'SELECT * FROM "{table_name}"', conn)
+                data = df.to_dict('records')
+                
+                with open(output_path, 'w', encoding='utf-8') as f:
+                    json.dump(data, f, ensure_ascii=False, indent=2)
+            else:
+                # 导出所有表到不同的JSON文件
+                cursor = conn.cursor()
+                cursor.execute("SELECT name FROM sqlite_master WHERE type='table';")
+                tables = [table[0] for table in cursor.fetchall()]
+                
+                for table in tables:
+                    json_file = output_path.replace('.json', f'_{table}.json')
+                    df = pd.read_sql_query(f'SELECT * FROM "{table}"', conn)
+                    data = df.to_dict('records')
+                    
+                    with open(json_file, 'w', encoding='utf-8') as f:
+                        json.dump(data, f, ensure_ascii=False, indent=2)
+            
+            conn.close()
+        elif db_path.endswith('.mdf'):
+            # SQL Server数据库文件
+            export_mssql_mdf_to_json(db_path, output_path, table_name)
+        else:
+            raise Exception("不支持的数据库格式")
+        
+        return True
+    except Exception as e:
+        raise Exception(f"数据库导出JSON失败: {str(e)}")
+
+def export_mssql_mdf_to_excel(mdf_path, output_path, table_name=None, server='localhost', 
+                             username='sa', password='', instance='MSSQLSERVER'):
+    """SQL Server MDF文件导出为Excel"""
+    try:
+        # 连接到SQL Server实例并附加MDF文件
+        database_name = Path(mdf_path).stem
+        
+        # 创建连接字符串
+        if instance == 'MSSQLSERVER':
+            connection_string = f"DRIVER={{SQL Server}};SERVER={server};DATABASE=master;UID={username};PWD={password}"
+        else:
+            connection_string = f"DRIVER={{SQL Server}};SERVER={server}\\{instance};DATABASE=master;UID={username};PWD={password}"
+        
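+        # Connect to master; autocommit is required because CREATE DATABASE cannot run inside a transaction
+        conn = pyodbc.connect(connection_string, autocommit=True)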
+        cursor = conn.cursor()
+        
+        # 检查数据库是否已存在
+        cursor.execute("SELECT name FROM sys.databases WHERE name = ?", database_name)
+        if cursor.fetchone():
+            # 数据库已存在，直接使用
+            pass
+        else:
+            # 附加MDF文件到SQL Server
+            mdf_full_path = os.path.abspath(mdf_path)
+            ldf_path = mdf_path.replace('.mdf', '_log.ldf')
+            
+            if not os.path.exists(ldf_path):
+                ldf_path = mdf_path.replace('.mdf', '.ldf')
+            
+            attach_sql = f"""
+            CREATE DATABASE [{database_name}]
+            ON (FILENAME = '{mdf_full_path}')
+            """
+            
+            if os.path.exists(ldf_path):
+                attach_sql += f", (FILENAME = '{os.path.abspath(ldf_path)}')"
+            
+            attach_sql += " FOR ATTACH"
+            
+            try:
+                cursor.execute(attach_sql)
+                conn.commit()
+            except Exception as attach_error:
+                # 如果附加失败，尝试直接连接（假设数据库已在运行）
+                print(f"附加数据库失败，尝试直接连接: {attach_error}")
+        
+        # 关闭连接并重新连接到目标数据库
+        conn.close()
+        
+        if instance == 'MSSQLSERVER':
+            db_connection_string = f"DRIVER={{SQL Server}};SERVER={server};DATABASE={database_name};UID={username};PWD={password}"
+        else:
+            db_connection_string = f"DRIVER={{SQL Server}};SERVER={server}\\{instance};DATABASE={database_name};UID={username};PWD={password}"
+        
+        from urllib.parse import quote_plus  # the ODBC string must be URL-encoded as a whole
+        engine = create_engine(f"mssql+pyodbc:///?odbc_connect={quote_plus(db_connection_string)}")
+        
+        # 获取所有表名
+        inspector = inspect(engine)
+        tables = inspector.get_table_names()
+        
+        if table_name:
+            # 导出指定表
+            if table_name in tables:
+                df = pd.read_sql_table(table_name, engine)
+                df.to_excel(output_path, index=False)
+            else:
+                raise Exception(f"表 '{table_name}' 不存在")
+        else:
+            # 导出所有表到同一个Excel文件的不同sheet
+            with pd.ExcelWriter(output_path) as writer:
+                for table in tables:
+                    df = pd.read_sql_table(table, engine)
+                    # Excel sheet names are limited to 31 characters
+                    sheet_name = table[:31]
+                    df.to_excel(writer, sheet_name=sheet_name, index=False)
+        
+        return True
+    except Exception as e:
+        raise Exception(f"SQL Server MDF导出Excel失败: {str(e)}")
+
+def export_mssql_mdf_to_csv(mdf_path, output_path, table_name=None, server='localhost', 
+                           username='sa', password='', instance='MSSQLSERVER'):
+    """SQL Server MDF文件导出为CSV"""
+    try:
+        database_name = Path(mdf_path).stem
+        
+        # 创建连接字符串
+        if instance == 'MSSQLSERVER':
+            connection_string = f"DRIVER={{SQL Server}};SERVER={server};DATABASE={database_name};UID={username};PWD={password}"
+        else:
+            connection_string = f"DRIVER={{SQL Server}};SERVER={server}\\{instance};DATABASE={database_name};UID={username};PWD={password}"
+        
+        from urllib.parse import quote_plus  # the ODBC string must be URL-encoded as a whole
+        engine = create_engine(f"mssql+pyodbc:///?odbc_connect={quote_plus(connection_string)}")
+        
+        # 获取所有表名
+        inspector = inspect(engine)
+        tables = inspector.get_table_names()
+        
+        if table_name:
+            # 导出指定表
+            if table_name in tables:
+                df = pd.read_sql_table(table_name, engine)
+                df.to_csv(output_path, index=False, encoding='utf-8-sig')
+            else:
+                raise Exception(f"表 '{table_name}' 不存在")
+        else:
+            # 导出所有表到不同的CSV文件
+            for table in tables:
+                csv_file = output_path.replace('.csv', f'_{table}.csv')
+                df = pd.read_sql_table(table, engine)
+                df.to_csv(csv_file, index=False, encoding='utf-8-sig')
+        
+        return True
+    except Exception as e:
+        raise Exception(f"SQL Server MDF导出CSV失败: {str(e)}")
+
+def export_mssql_mdf_to_json(mdf_path, output_path, table_name=None, server='localhost', 
+                            username='sa', password='', instance='MSSQLSERVER'):
+    """SQL Server MDF文件导出为JSON"""
+    try:
+        import json
+        
+        database_name = Path(mdf_path).stem
+        
+        # 创建连接字符串
+        if instance == 'MSSQLSERVER':
+            connection_string = f"DRIVER={{SQL Server}};SERVER={server};DATABASE={database_name};UID={username};PWD={password}"
+        else:
+            connection_string = f"DRIVER={{SQL Server}};SERVER={server}\\{instance};DATABASE={database_name};UID={username};PWD={password}"
+        
+        from urllib.parse import quote_plus  # the ODBC string must be URL-encoded as a whole
+        engine = create_engine(f"mssql+pyodbc:///?odbc_connect={quote_plus(connection_string)}")
+        
+        # 获取所有表名
+        inspector = inspect(engine)
+        tables = inspector.get_table_names()
+        
+        if table_name:
+            # 导出指定表
+            if table_name in tables:
+                df = pd.read_sql_table(table_name, engine)
+                data = df.to_dict('records')
+                
+                with open(output_path, 'w', encoding='utf-8') as f:
+                    json.dump(data, f, ensure_ascii=False, indent=2)
+            else:
+                raise Exception(f"表 '{table_name}' 不存在")
+        else:
+            # 导出所有表到不同的JSON文件
+            for table in tables:
+                json_file = output_path.replace('.json', f'_{table}.json')
+                df = pd.read_sql_table(table, engine)
+                data = df.to_dict('records')
+                
+                with open(json_file, 'w', encoding='utf-8') as f:
+                    json.dump(data, f, ensure_ascii=False, indent=2)
+        
+        return True
+    except Exception as e:
+        raise Exception(f"SQL Server MDF导出JSON失败: {str(e)}")
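Review note on the three `export_mssql_mdf_to_*` functions above: each one builds the same ODBC connection string and hands it to SQLAlchemy through the `odbc_connect` query parameter. That parameter expects the whole ODBC string URL-encoded, so replacing `;` with `&` splits it into unrelated query parameters. A minimal sketch of a shared helper follows; the name `mssql_odbc_url` and its parameter list are hypothetical, and the `SQL Server` driver name is simply the one used in the diff.

```python
from urllib.parse import quote_plus

def mssql_odbc_url(server, database, username, password, driver='SQL Server'):
    """Build a SQLAlchemy URL that passes a raw ODBC string to pyodbc."""
    # Same key/value layout as the connection strings in the diff above.
    odbc = (
        f"DRIVER={{{driver}}};SERVER={server};DATABASE={database};"
        f"UID={username};PWD={password}"
    )
    # quote_plus encodes ';', '=', braces and spaces so the whole string
    # survives as a single query-parameter value.
    return f"mssql+pyodbc:///?odbc_connect={quote_plus(odbc)}"
```

The returned string can be passed to `create_engine`; building it in one place also removes the duplicated default-instance and named-instance branches, since the caller can pass `server` as `host\instance` directly.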
--- a/utils/deepseek_copywriter.py
+++ b/utils/deepseek_copywriter.py
@ -0,0 +1,309 @@
+#!/usr/bin/env python3
+"""
+DeepSeek大模型文案生成服务集成
+使用DeepSeek AI大模型为照片生成创意文案
+支持多种文案风格和用途
+"""
+
+import os
+import json
+import requests
+from dotenv import load_dotenv
+
+# 加载环境变量
+load_dotenv()
+
+class DeepSeekCopywriter:
+    """DeepSeek大模型文案生成服务类"""
+    
+    def __init__(self, api_key=None):
+        """初始化DeepSeek大模型客户端"""
+        self.api_key = api_key or os.getenv('DEEPSEEK_API_KEY')
+        self.base_url = "https://api.deepseek.com/v1/chat/completions"
+        
+        if not self.api_key:
+            raise Exception("DeepSeek API密钥未配置，请在.env文件中设置DEEPSEEK_API_KEY")
+    
+    def generate_photo_caption(self, image_description, style='creative', length='medium'):
+        """为照片生成文案"""
+        try:
+            prompt = self._build_prompt(image_description, style, length)
+            
+            headers = {
+                'Authorization': f'Bearer {self.api_key}',
+                'Content-Type': 'application/json'
+            }
+            
+            data = {
+                'model': 'deepseek-chat',
+                'messages': [
+                    {
+                        'role': 'system',
+                        'content': '你是一个专业的创意文案创作助手，擅长为照片生成各种风格的创意文案。你具有丰富的文学素养和营销知识，能够根据照片内容创作出富有创意和感染力的文案。'
+                    },
+                    {
+                        'role': 'user',
+                        'content': prompt
+                    }
+                ],
+                'max_tokens': 500,
+                'temperature': 0.8,
+                'top_p': 0.9
+            }
+            
+            response = requests.post(self.base_url, headers=headers, json=data, timeout=60)
+            result = response.json()
+            
+            if 'choices' in result and len(result['choices']) > 0:
+                caption = result['choices'][0]['message']['content'].strip()
+                # 清理可能的格式标记
+                caption = caption.replace('"', '').replace('\n', ' ').strip()
+                return caption
+            else:
+                # 如果API调用失败，使用备用文案生成
+                return self._generate_fallback_caption(image_description, style, length)
+                
+        except Exception as e:
+            # API调用失败时使用备用方案
+            return self._generate_fallback_caption(image_description, style, length)
+    
+    def _build_prompt(self, image_description, style, length):
+        """构建DeepSeek大模型提示词"""
+        
+        style_descriptions = {
+            'creative': '富有诗意和想象力的创意文艺风格，使用优美的修辞和意象',
+            'professional': '专业正式的商务风格，简洁明了，注重专业性和可信度',
+            'social': '活泼有趣的社交媒体风格，适合朋友圈分享，具有互动性',
+            'marketing': '吸引眼球的营销推广风格，具有说服力，促进转化',
+            'emotional': '温暖感人的情感表达风格，注重情感共鸣和人文关怀',
+            'simple': '简单直接的描述风格，清晰明了，易于理解'
+        }
+        
+        length_descriptions = {
+            'short': '10-20字，简洁精炼，突出重点',
+            'medium': '30-50字，适中长度，内容完整',
+            'long': '80-120字，详细描述，富有细节'
+        }
+        
+        prompt = f"""
+请为以下照片内容生成{style_descriptions.get(style, '创意')}的文案，要求{length_descriptions.get(length, '适中长度')}。
+
+照片内容描述：{image_description}
+
+文案创作要求：
+1. 风格：{style_descriptions.get(style, '创意')}
+2. 长度：{length_descriptions.get(length, '适中长度')}
+3. 创意性：富有创意，避免陈词滥调
+4. 吸引力：能够吸引目标受众的注意力
+5. 情感表达：根据风格适当表达情感
+6. 适用场景：适合社交媒体分享或商业用途
+
+请直接输出文案内容，不要添加任何额外的说明或标记。文案应该是一个完整的、可以直接使用的文本。
+"""
+        
+        return prompt.strip()
+    
+    def generate_multiple_captions(self, image_description, count=3, style='creative'):
+        """生成多个文案选项"""
+        try:
+            captions = []
+            
+            # 使用不同的提示词变体生成多个文案
+            prompt_variants = [
+                f"请为'{image_description}'照片创作一个{style}风格的文案，要求新颖独特",
+                f"基于照片内容'{image_description}'，写一个{style}风格的创意文案",
+                f"为这张'{image_description}'的照片设计一个{style}风格的吸引人文案"
+            ]
+            
+            for i in range(min(count, len(prompt_variants))):
+                prompt = prompt_variants[i]
+                
+                headers = {
+                    'Authorization': f'Bearer {self.api_key}',
+                    'Content-Type': 'application/json'
+                }
+                
+                data = {
+                    'model': 'deepseek-chat',
+                    'messages': [
+                        {
+                            'role': 'system',
+                            'content': '你是专业的创意文案专家，擅长为照片创作多种风格的文案。'
+                        },
+                        {
+                            'role': 'user',
+                            'content': prompt
+                        }
+                    ],
+                    'max_tokens': 200,
+                    'temperature': 0.9,  # 提高温度增加多样性
+                    'top_p': 0.95
+                }
+                
+                response = requests.post(self.base_url, headers=headers, json=data, timeout=30)
+                result = response.json()
+                
+                if 'choices' in result and len(result['choices']) > 0:
+                    caption = result['choices'][0]['message']['content'].strip()
+                    caption = caption.replace('"', '').replace('\n', ' ').strip()
+                    
+                    captions.append({
+                        'option': i + 1,
+                        'caption': caption,
+                        'style': style,
+                        'char_count': len(caption)
+                    })
+            
+            return captions
+                
+        except Exception as e:
+            raise Exception(f"生成多个文案失败: {str(e)}") from e
+    
+    def analyze_photo_suitability(self, image_description):
+        """分析照片适合的文案风格"""
+        try:
+            prompt = f"""
+请分析以下照片内容最适合的文案风格：
+
+照片内容：{image_description}
+
+请从以下风格中选择最适合的3个，并按适合度排序：
+1. 创意文艺 - 富有诗意和想象力
+2. 专业正式 - 简洁专业
+3. 社交媒体 - 活泼有趣
+4. 营销推广 - 吸引眼球
+5. 情感表达 - 温暖感人
+6. 简单描述 - 直接明了
+
+请直接返回风格名称列表，用逗号分隔，例如："社交媒体,创意文艺,情感表达"
+"""
+            
+            headers = {
+                'Authorization': f'Bearer {self.api_key}',
+                'Content-Type': 'application/json'
+            }
+            
+            data = {
+                'model': 'deepseek-chat',
+                'messages': [
+                    {
+                        'role': 'system',
+                        'content': '你是专业的文案风格分析专家，能够准确判断照片内容最适合的文案风格。'
+                    },
+                    {
+                        'role': 'user',
+                        'content': prompt
+                    }
+                ],
+                'max_tokens': 100,
+                'temperature': 0.3  # 降低温度增加确定性
+            }
+            
+            response = requests.post(self.base_url, headers=headers, json=data, timeout=30)
+            result = response.json()
+            
+            if 'choices' in result and len(result['choices']) > 0:
+                analysis = result['choices'][0]['message']['content'].strip()
+                
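+                # Parse the returned style list; accept ASCII and full-width commas, drop empty items
+                styles = [s.strip() for s in analysis.replace('，', ',').split(',') if s.strip()]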
+                
+                return {
+                    'recommended_styles': styles[:3],
+                    'most_suitable': styles[0] if styles else 'creative',
+                    'analysis': analysis
+                }
+            else:
+                return self._fallback_suitability_analysis()
+                
+        except Exception as e:
+            return self._fallback_suitability_analysis()
+    
+    def _generate_fallback_caption(self, image_description, style, length):
+        """备用文案生成（当DeepSeek服务不可用时）"""
+        
+        # 基于照片描述的简单文案生成
+        base_captions = {
+            'creative': [
+                f"在{image_description}的瞬间，时光静静流淌",
+                f"捕捉{image_description}的诗意，定格永恒美好",
+                f"{image_description}的艺术之美，值得细细品味"
+            ],
+            'social': [
+                f"分享一张{image_description}的美照，希望大家喜欢！",
+                f"今天遇到的{image_description}太棒了，必须分享！",
+                f"{image_description}的精彩瞬间，与大家共赏"
+            ],
+            'professional': [
+                f"专业拍摄：{image_description}的精彩呈现",
+                f"{image_description}的专业影像记录",
+                f"高品质{image_description}摄影作品"
+            ],
+            'marketing': [
+                f"惊艳！这个{image_description}你一定要看看！",
+                f"不容错过的{image_description}精彩瞬间",
+                f"{image_description}的魅力，等你来发现"
+            ],
+            'emotional': [
+                f"{image_description}的温暖瞬间，触动心灵",
+                f"在{image_description}中感受生活的美好",
+                f"{image_description}的情感表达，真挚动人"
+            ]
+        }
+        
+        import random
+        captions = base_captions.get(style, base_captions['creative'])
+        caption = random.choice(captions)
+        
+        # 根据长度调整
+        if length == 'long' and len(caption) < 50:
+            caption += "。这张照片记录了珍贵的瞬间，展现了生活的美好，值得细细品味和珍藏。"
+        elif length == 'short' and len(caption) > 20:
+            caption = caption[:20] + "..."
+        
+        return caption
+    
+    def _fallback_suitability_analysis(self):
+        """备用风格分析"""
+        return {
+            'recommended_styles': ['creative', 'social', 'emotional'],
+            'most_suitable': 'creative',
+            'analysis': '创意文艺风格最适合表达照片的艺术美感'
+        }
+
+def generate_photo_caption_deepseek(image_description, style='creative', length='medium'):
+    """使用DeepSeek为照片生成文案"""
+    try:
+        copywriter = DeepSeekCopywriter()
+        return copywriter.generate_photo_caption(image_description, style, length)
+    except Exception as e:
+        raise Exception(f"DeepSeek文案生成失败: {str(e)}")
+
+def generate_multiple_captions_deepseek(image_description, count=3, style='creative'):
+    """使用DeepSeek生成多个文案选项"""
+    try:
+        copywriter = DeepSeekCopywriter()
+        return copywriter.generate_multiple_captions(image_description, count, style)
+    except Exception as e:
+        raise Exception(f"DeepSeek多文案生成失败: {str(e)}")
+
+def analyze_photo_suitability_deepseek(image_description):
+    """使用DeepSeek分析照片适合的文案风格"""
+    try:
+        copywriter = DeepSeekCopywriter()
+        return copywriter.analyze_photo_suitability(image_description)
+    except Exception as e:
+        raise Exception(f"DeepSeek风格分析失败: {str(e)}")
+
+def check_deepseek_config():
+    """检查DeepSeek配置是否完整"""
+    try:
+        api_key = os.getenv('DEEPSEEK_API_KEY')
+        if not api_key:
+            return False, "DeepSeek API密钥未配置"
+        
+        # Only instantiates the client; no request is sent, so the key itself is not verified
+        copywriter = DeepSeekCopywriter()
+        return True, "DeepSeek配置正确"
+    except Exception as e:
+        return False, f"DeepSeek配置错误: {str(e)}"
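`analyze_photo_suitability` asks the model for Chinese style names, while the fallback path returns English keys such as `'creative'`. A minimal sketch of one way to reconcile the two follows. The helper `normalize_styles` and the `STYLE_KEYS` table are hypothetical and not part of this commit; the Chinese names are taken from the prompt above and the keys from `style_descriptions`.

```python
# Hypothetical helper, not part of the commit: maps the model's free-text
# reply onto the style keys used by the rest of the module.
STYLE_KEYS = {
    '创意文艺': 'creative',
    '专业正式': 'professional',
    '社交媒体': 'social',
    '营销推广': 'marketing',
    '情感表达': 'emotional',
    '简单描述': 'simple',
}


def normalize_styles(reply, limit=3):
    """Return up to `limit` known style keys, in the order the model listed them."""
    # Treat full-width and enumeration commas like ASCII commas
    for sep in ('，', '、'):
        reply = reply.replace(sep, ',')
    keys = []
    for part in reply.split(','):
        # Drop surrounding whitespace and any quotes the model echoed back
        name = part.strip().strip('"“”\'')
        key = STYLE_KEYS.get(name)
        if key and key not in keys:
            keys.append(key)
    return keys[:limit]


print(normalize_styles('"社交媒体，创意文艺, 情感表达"'))
```

Unknown names are dropped rather than passed through, so a caller could still fall back to `_fallback_suitability_analysis()` when the result is empty.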
--- a/utils/format_converter.py
+++ b/utils/format_converter.py
@ -0,0 +1,77 @@
+import pandas as pd
+import json
+import csv
+
+def excel_to_csv(excel_path, csv_path):
+    """Excel转CSV"""
+    try:
+        df = pd.read_excel(excel_path)
+        df.to_csv(csv_path, index=False, encoding='utf-8-sig')
+        return True
+    except Exception as e:
+        raise Exception(f"Excel转CSV失败: {str(e)}")
+
+def csv_to_excel(csv_path, excel_path):
+    """CSV转Excel"""
+    try:
+        df = pd.read_csv(csv_path)
+        df.to_excel(excel_path, index=False)
+        return True
+    except Exception as e:
+        raise Exception(f"CSV转Excel失败: {str(e)}")
+
+def json_to_excel(json_path, excel_path):
+    """JSON转Excel"""
+    try:
+        with open(json_path, 'r', encoding='utf-8') as f:
+            data = json.load(f)
+        
+        # 如果是列表格式的JSON
+        if isinstance(data, list):
+            df = pd.DataFrame(data)
+        else:
+            # 如果是字典格式，转换为单行DataFrame
+            df = pd.DataFrame([data])
+        
+        df.to_excel(excel_path, index=False)
+        return True
+    except Exception as e:
+        raise Exception(f"JSON转Excel失败: {str(e)}")
+
+def excel_to_json(excel_path, json_path):
+    """Excel转JSON"""
+    try:
+        df = pd.read_excel(excel_path)
+        data = df.to_dict('records')
+        
+        with open(json_path, 'w', encoding='utf-8') as f:
+            # default=str keeps date/time cells (pandas Timestamp) JSON-serializable
+            json.dump(data, f, ensure_ascii=False, indent=2, default=str)
+        
+        return True
+    except Exception as e:
+        raise Exception(f"Excel转JSON失败: {str(e)}")
+
+def csv_to_json(csv_path, json_path):
+    """CSV转JSON"""
+    try:
+        df = pd.read_csv(csv_path)
+        data = df.to_dict('records')
+        
+        with open(json_path, 'w', encoding='utf-8') as f:
+            json.dump(data, f, ensure_ascii=False, indent=2)
+        
+        return True
+    except Exception as e:
+        raise Exception(f"CSV转JSON失败: {str(e)}")
+
+def json_to_csv(json_path, csv_path):
+    """JSON转CSV"""
+    try:
+        with open(json_path, 'r', encoding='utf-8') as f:
+            data = json.load(f)
+        
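+        # A top-level object becomes a single row, matching json_to_excel
+        df = pd.DataFrame(data if isinstance(data, list) else [data])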
+        df.to_csv(csv_path, index=False, encoding='utf-8-sig')
+        return True
+    except Exception as e:
+        raise Exception(f"JSON转CSV失败: {str(e)}")
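The JSON converters above have to cope with two input shapes: a top-level array of objects and a single top-level object. The sketch below shows the same normalization using only the standard library, so it can be run without pandas. The function name `json_text_to_csv_text` is an assumption made for illustration and does not exist in the commit.

```python
import csv
import io
import json


def json_text_to_csv_text(json_text):
    """Convert a JSON object or an array of objects to CSV text (stdlib only)."""
    data = json.loads(json_text)
    # A single object is treated as one row, as json_to_excel does
    records = data if isinstance(data, list) else [data]
    # Collect column names in first-seen order across all records
    columns = []
    for record in records:
        for key in record:
            if key not in columns:
                columns.append(key)
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=columns, lineterminator='\n')
    writer.writeheader()
    writer.writerows(records)  # keys missing from a record become empty cells
    return buffer.getvalue()


print(json_text_to_csv_text('[{"a": 1}, {"a": 2, "b": 3}]'))
```

Unlike the pandas version, this sketch does not flatten nested objects; nested values would be written using their Python string form.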
--- a/utils/ocr_processor.py
+++ b/utils/ocr_processor.py
@ -0,0 +1,73 @@
+import pytesseract
+from PIL import Image
+import os
+
+def extract_text_from_image(image_path, lang='chi_sim+eng', use_ai=False, ai_provider='aliyun'):
+    """从图片中提取文字（OCR）"""
+    try:
+        if use_ai:
+            # 使用AI大模型进行OCR
+            if ai_provider == 'aliyun':
+                from .aliyun_ocr import extract_text_with_aliyun
+                return extract_text_with_aliyun(image_path, 'general')
+            else:
+                raise Exception(f"不支持的AI提供商: {ai_provider}")
+        else:
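+            # Use the traditional Tesseract OCR engine
+            # On Windows, point pytesseract at the binary; TESSERACT_PATH (see .env.example) overrides the default
+            if os.name == 'nt':
+                pytesseract.pytesseract.tesseract_cmd = os.getenv(
+                    'TESSERACT_PATH', r'C:\Program Files\Tesseract-OCR\tesseract.exe'
+                )
+            
+            # Open the image and run OCR; the context manager closes the file handle
+            with Image.open(image_path) as image:
+                text = pytesseract.image_to_string(image, lang=lang)
+            
+            return text.strip()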
+    except Exception as e:
+        raise Exception(f"图片文字识别失败: {str(e)}")
+
+def extract_text_with_ai(image_path, provider='aliyun', ocr_type='general', options=None):
+    """使用AI大模型进行图片文字识别"""
+    try:
+        if provider == 'aliyun':
+            from .aliyun_ocr import extract_text_with_aliyun
+            return extract_text_with_aliyun(image_path, ocr_type, options)
+        else:
+            raise Exception(f"不支持的AI提供商: {provider}")
+    except Exception as e:
+        raise Exception(f"AI OCR识别失败: {str(e)}")
+
+def image_to_text_file(image_path, output_path):
+    """将图片文字保存为文本文件"""
+    try:
+        text = extract_text_from_image(image_path)
+        
+        with open(output_path, 'w', encoding='utf-8') as f:
+            f.write(text)
+        
+        return True
+    except Exception as e:
+        raise Exception(f"图片转文本文件失败: {str(e)}")
+
+def image_to_excel(image_path, output_path):
+    """将图片文字保存为Excel文件"""
+    try:
+        import pandas as pd
+        
+        text = extract_text_from_image(image_path)
+        
+        # 按行分割文本
+        lines = [line.strip() for line in text.split('\n') if line.strip()]
+        
+        # 创建DataFrame
+        df = pd.DataFrame({
+            '行号': range(1, len(lines) + 1),
+            '内容': lines
+        })
+        
+        df.to_excel(output_path, index=False)
+        return True
+    except Exception as e:
+        raise Exception(f"图片转Excel失败: {str(e)}")
--- a/utils/pdf_extractor.py
+++ b/utils/pdf_extractor.py
@ -0,0 +1,52 @@
+import fitz  # PyMuPDF
+import pandas as pd
+
+def extract_text_from_pdf(pdf_path):
+    """从PDF中提取文本内容"""
+    try:
+        doc = fitz.open(pdf_path)
+        text = ""
+        for page_num in range(len(doc)):
+            page = doc.load_page(page_num)
+            text += page.get_text()
+        doc.close()
+        return text
+    except Exception as e:
+        raise Exception(f"PDF文本提取失败: {str(e)}")
+
+def extract_tables_from_pdf(pdf_path):
+    """Extract tables from a PDF; each table is returned as a list of rows."""
+    try:
+        doc = fitz.open(pdf_path)
+        tables = []
+        
+        for page_num in range(len(doc)):
+            page = doc.load_page(page_num)
+            
+            # Page.find_tables() requires PyMuPDF >= 1.23; detection is heuristic
+            # and may miss borderless tables
+            for table in page.find_tables().tables:
+                tables.append(table.extract())
+            
+        doc.close()
+        return tables
+    except Exception as e:
+        raise Exception(f"PDF表格提取失败: {str(e)}")
+
+def pdf_to_excel(pdf_path, output_path):
+    """将PDF文本内容导出为Excel"""
+    try:
+        text = extract_text_from_pdf(pdf_path)
+        
+        # 将文本按段落分割
+        paragraphs = [p.strip() for p in text.split('\n\n') if p.strip()]
+        
+        # 创建DataFrame
+        df = pd.DataFrame({
+            '段落编号': range(1, len(paragraphs) + 1),
+            '内容': paragraphs
+        })
+        
+        df.to_excel(output_path, index=False)
+        return True
+    except Exception as e:
+        raise Exception(f"PDF转Excel失败: {str(e)}")
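`pdf_to_excel` splits the extracted text on the exact string `'\n\n'`, so a separator line that contains a stray space or tab is not treated as a paragraph break. A minimal sketch of a more tolerant splitter is shown below; `split_paragraphs` is a hypothetical helper, not part of the commit, and whether PyMuPDF output actually contains such lines depends on the PDF.

```python
import re


def split_paragraphs(text):
    """Split text on blank lines, tolerating whitespace on the separator lines."""
    # "\n\s*\n" matches any run of whitespace that contains at least two newlines
    parts = re.split(r'\n\s*\n', text)
    return [part.strip() for part in parts if part.strip()]


sample = "First line\nsame paragraph\n \n\nSecond\n\n\nThird "
print(split_paragraphs(sample))
```

A single newline inside a paragraph is preserved, so wrapped lines stay together in one cell.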
--- a/utils/photo_advice_generator.py
+++ b/utils/photo_advice_generator.py
@ -0,0 +1,366 @@
+#!/usr/bin/env python3
+"""
+Photo scoring advice generator.
+Provides concrete improvement suggestions for photo scoring results.
+"""
+
+class PhotoAdviceGenerator:
+    """照片建议生成器类"""
+    
+    def __init__(self):
+        self.quality_advice_db = self._init_quality_advice()
+        self.aesthetic_advice_db = self._init_aesthetic_advice()
+        self.technical_advice_db = self._init_technical_advice()
+    
+    def _init_quality_advice(self):
+        """初始化质量改进建议数据库"""
+        return {
+            'clarity': {
+                'low': [
+                    "使用三脚架或稳定设备减少抖动",
+                    "提高快门速度避免运动模糊",
+                    "使用自动对焦确保主体清晰",
+                    "清洁镜头避免污渍影响",
+                    "在光线充足的环境下拍摄"
+                ],
+                'medium': [
+                    "微调对焦点确保主体清晰",
+                    "使用更高的分辨率设置",
+                    "避免过度压缩图像",
+                    "后期适当锐化处理"
+                ],
+                'high': [
+                    "清晰度优秀，继续保持",
+                    "可尝试更高难度的拍摄场景"
+                ]
+            },
+            'brightness': {
+                'low': [
+                    "增加曝光补偿",
+                    "使用闪光灯或补光设备",
+                    "选择光线更好的拍摄时间",
+                    "提高ISO感光度（注意噪点）",
+                    "使用反光板补光"
+                ],
+                'medium': [
+                    "微调曝光参数",
+                    "使用HDR模式拍摄",
+                    "注意高光和阴影的平衡",
+                    "后期调整亮度曲线"
+                ],
+                'high': [
+                    "亮度适中，曝光准确",
+                    "可尝试创意光影效果"
+                ]
+            },
+            'contrast': {
+                'low': [
+                    "增加画面明暗对比",
+                    "选择色彩对比强烈的场景",
+                    "使用侧光或逆光增强立体感",
+                    "后期调整对比度参数"
+                ],
+                'medium': [
+                    "适当增强局部对比",
+                    "注意高光不过曝，阴影不死黑",
+                    "使用曲线工具精细调整"
+                ],
+                'high': [
+                    "对比度良好，层次分明",
+                    "可尝试高对比风格创作"
+                ]
+            },
+            'color_balance': {
+                'low': [
+                    "校正白平衡设置",
+                    "使用灰卡进行色彩校准",
+                    "避免混合光源造成的色偏",
+                    "后期校正色彩平衡"
+                ],
+                'medium': [
+                    "微调色温和色调",
+                    "注意肤色还原自然",
+                    "统一画面色彩风格"
+                ],
+                'high': [
+                    "色彩平衡优秀，还原准确",
+                    "可尝试创意色彩风格"
+                ]
+            }
+        }
+    
+    def _init_aesthetic_advice(self):
+        """初始化美学改进建议数据库"""
+        return {
+            'composition': {
+                'basic': [
+                    "学习三分法则构图",
+                    "注意主体在画面中的位置",
+                    "避免主体过于居中",
+                    "利用引导线增强画面深度"
+                ],
+                'intermediate': [
+                    "尝试对称或不对称构图",
+                    "利用前景增强层次感",
+                    "注意画面元素的平衡",
+                    "创造视觉焦点"
+                ],
+                'advanced': [
+                    "构图优秀，可尝试更复杂构图",
+                    "探索极简或复杂构图风格",
+                    "注重画面节奏和韵律"
+                ]
+            },
+            'lighting': {
+                'basic': [
+                    "选择黄金时刻拍摄（日出日落）",
+                    "避免正午强光直射",
+                    "学习使用自然光",
+                    "注意光影方向和质量"
+                ],
+                'intermediate': [
+                    "尝试侧光或逆光效果",
+                    "利用阴影创造氛围",
+                    "控制光比避免过曝或欠曝",
+                    "学习使用人造光源"
+                ],
+                'advanced': [
+                    "光线运用娴熟，可尝试创意用光",
+                    "探索特殊光线条件拍摄",
+                    "注重光影的情感表达"
+                ]
+            },
+            'subject': {
+                'basic': [
+                    "明确拍摄主体",
+                    "简化背景突出主体",
+                    "注意主体与环境的互动",
+                    "选择有故事性的主体"
+                ],
+                'intermediate': [
+                    "注重主体的表情和姿态",
+                    "创造主体与环境的关系",
+                    "捕捉决定性瞬间",
+                    "注重主体的个性表达"
+                ],
+                'advanced': [
+                    "主体表现力强，可尝试更深层次表达",
+                    "探索抽象或概念性主体",
+                    "注重主体的象征意义"
+                ]
+            }
+        }
+    
+    def _init_technical_advice(self):
+        """初始化技术改进建议数据库"""
+        return {
+            'camera_settings': [
+                "学习曝光三角关系（光圈、快门、ISO）",
+                "根据场景选择合适的拍摄模式",
+                "掌握对焦技巧确保主体清晰",
+                "合理使用白平衡设置"
+            ],
+            'post_processing': [
+                "学习基本的后期调整技巧",
+                "掌握色彩校正和调整",
+                "学习锐化和降噪处理",
+                "尝试创意滤镜效果"
+            ],
+            'equipment': [
+                "根据需求选择合适的镜头",
+                "考虑使用三脚架提高稳定性",
+                "投资质量好的存储设备",
+                "定期清洁和维护设备"
+            ],
+            'shooting_techniques': [
+                "练习稳定的持机姿势",
+                "学习不同的拍摄角度",
+                "掌握连拍和定时拍摄",
+                "尝试慢门或高速摄影"
+            ]
+        }
+    
+    def generate_quality_advice(self, quality_scores):
+        """生成质量改进建议"""
+        advice = {
+            'overall': [],
+            'specific': {},
+            'priority': []
+        }
+        
+        # 总体建议
+        overall_score = sum(quality_scores.values()) / len(quality_scores)
+        
+        if overall_score >= 90:
+            advice['overall'].append("照片质量优秀，继续保持高水平拍摄")
+        elif overall_score >= 80:
+            advice['overall'].append("照片质量良好，有进一步提升空间")
+        elif overall_score >= 60:
+            advice['overall'].append("照片质量一般，需要重点改进")
+        else:
+            advice['overall'].append("照片质量较差，建议系统学习摄影基础")
+        
+        # 具体维度建议
+        for dimension, score in quality_scores.items():
+            if dimension in self.quality_advice_db:
+                level = self._get_score_level(score)
+                dimension_advice = self.quality_advice_db[dimension].get(level, [])
+                advice['specific'][dimension] = dimension_advice
+                
+                # 添加优先级建议
+                if score < 70:
+                    advice['priority'].append(f"优先改进{dimension}（当前{score}分）")
+        
+        return advice
+    
+    def generate_aesthetic_advice(self, aesthetic_score, composition_analysis):
+        """生成美学改进建议"""
+        advice = {
+            'general': [],
+            'composition': [],
+            'lighting': [],
+            'subject': [],
+            'creative': []
+        }
+        
+        # 总体美学建议
+        if aesthetic_score >= 90:
+            advice['general'].append("美学表现优秀，具备专业水准")
+            advice['creative'].append("可尝试更具挑战性的创意拍摄")
+        elif aesthetic_score >= 80:
+            advice['general'].append("美学表现良好，细节有待提升")
+            advice['creative'].append("尝试不同的构图和用光方式")
+        elif aesthetic_score >= 60:
+            advice['general'].append("美学表现一般，需要系统学习")
+            advice['creative'].append("从基础构图和用光开始练习")
+        else:
+            advice['general'].append("美学表现较差，建议学习摄影美学基础")
+        
+        # Composition, lighting and subject advice all key off the overall aesthetic score;
+        # composition_analysis is accepted but not used yet
+        level = self._get_aesthetic_level(aesthetic_score)
+        advice['composition'] = self.aesthetic_advice_db['composition'].get(level, [])
+        advice['lighting'] = self.aesthetic_advice_db['lighting'].get(level, [])
+        advice['subject'] = self.aesthetic_advice_db['subject'].get(level, [])
+        
+        return advice
+    
+    def generate_technical_advice(self, photo_type='general'):
+        """生成技术改进建议"""
+        # Copy each list so the extend() calls below do not mutate the shared advice database
+        advice = {
+            'camera_settings': list(self.technical_advice_db['camera_settings']),
+            'post_processing': list(self.technical_advice_db['post_processing']),
+            'equipment': list(self.technical_advice_db['equipment']),
+            'shooting_techniques': list(self.technical_advice_db['shooting_techniques'])
+        }
+        
+        # 根据照片类型调整建议
+        if photo_type == 'portrait':
+            advice['camera_settings'].extend([
+                "使用大光圈虚化背景",
+                "注意对焦在眼睛上",
+                "使用柔光设备美化肤色"
+            ])
+        elif photo_type == 'landscape':
+            advice['camera_settings'].extend([
+                "使用小光圈获得大景深",
+                "使用三脚架确保稳定性",
+                "利用滤镜控制光线"
+            ])
+        elif photo_type == 'macro':
+            advice['camera_settings'].extend([
+                "使用微距镜头或近摄环",
+                "注意景深控制",
+                "使用环形闪光灯补光"
+            ])
+        
+        return advice
+    
+    def generate_personalized_advice(self, quality_scores, aesthetic_score, photo_content):
+        """生成个性化综合建议"""
+        personalized = {
+            'quick_wins': [],
+            'long_term_improvements': [],
+            'learning_resources': [],
+            'practice_exercises': []
+        }
+        
+        # 快速改进建议
+        low_score_dimensions = [dim for dim, score in quality_scores.items() if score < 70]
+        if low_score_dimensions:
+            personalized['quick_wins'].append(f"重点改进：{', '.join(low_score_dimensions)}")
+        
+        # 长期改进建议
+        if aesthetic_score < 80:
+            personalized['long_term_improvements'].append("系统学习摄影构图和用光")
+        
+        # 学习资源推荐
+        personalized['learning_resources'].extend([
+            "推荐书籍：《摄影构图学》、《美国纽约摄影学院教材》",
+            "在线课程：B站摄影教程、摄影之友",
+            "实践平台：参加摄影比赛、加入摄影社群"
+        ])
+        
+        # 练习建议
+        personalized['practice_exercises'].extend([
+            "每日拍摄练习：同一主题不同角度",
+            "技术练习：曝光、对焦、白平衡",
+            "创意练习：尝试不同风格和主题"
+        ])
+        
+        return personalized
+    
+    def _get_score_level(self, score):
+        """根据分数获取等级"""
+        if score >= 85:
+            return 'high'
+        elif score >= 70:
+            return 'medium'
+        else:
+            return 'low'
+    
+    def _get_aesthetic_level(self, score):
+        """根据美学分数获取等级"""
+        if score >= 85:
+            return 'advanced'
+        elif score >= 70:
+            return 'intermediate'
+        else:
+            return 'basic'
+
+def get_quality_improvement_advice(quality_scores):
+    """获取质量改进建议"""
+    try:
+        advisor = PhotoAdviceGenerator()
+        return advisor.generate_quality_advice(quality_scores)
+    except Exception as e:
+        return {'error': f"生成建议失败: {str(e)}"}
+
+def get_aesthetic_improvement_advice(aesthetic_score, composition_analysis=None):
+    """获取美学改进建议"""
+    try:
+        advisor = PhotoAdviceGenerator()
+        return advisor.generate_aesthetic_advice(aesthetic_score, composition_analysis)
+    except Exception as e:
+        return {'error': f"生成建议失败: {str(e)}"}
+
+def get_technical_advice(photo_type='general'):
+    """获取技术改进建议"""
+    try:
+        advisor = PhotoAdviceGenerator()
+        return advisor.generate_technical_advice(photo_type)
+    except Exception as e:
+        return {'error': f"生成建议失败: {str(e)}"}
+
+def get_personalized_advice(quality_scores, aesthetic_score, photo_content):
+    """获取个性化综合建议"""
+    try:
+        advisor = PhotoAdviceGenerator()
+        return advisor.generate_personalized_advice(quality_scores, aesthetic_score, photo_content)
+    except Exception as e:
+        return {'error': f"生成建议失败: {str(e)}"}
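To make the thresholds concrete, here is a small worked example that mirrors `_get_score_level` (85 and above is `high`, 70 and above is `medium`, anything lower is `low`) and the `score < 70` priority rule from `generate_quality_advice`. The standalone function `score_level` and the sample scores are illustrative assumptions; real scores would come from the photo scoring step.

```python
def score_level(score):
    """Mirror of _get_score_level: >= 85 is high, >= 70 is medium, otherwise low."""
    if score >= 85:
        return 'high'
    if score >= 70:
        return 'medium'
    return 'low'


# Illustrative scores only
scores = {'clarity': 62, 'brightness': 78, 'contrast': 90}

levels = {name: score_level(value) for name, value in scores.items()}
priority = [name for name, value in scores.items() if value < 70]
overall = sum(scores.values()) / len(scores)

print(levels)
print(priority)
print(round(overall, 1))
```

With these numbers the mean is about 76.7, which falls in the 60 to 80 band of the overall advice, and only `clarity` is flagged as a priority.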
--- a/utils/web_scraper.py
+++ b/utils/web_scraper.py
@ -0,0 +1,99 @@
+import requests
+from bs4 import BeautifulSoup
+import pandas as pd
+import re
+
+def scrape_webpage(url, selector=None):
+    """抓取网页内容"""
+    try:
+        headers = {
+            'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36'
+        }
+        
+        response = requests.get(url, headers=headers, timeout=10)
+        response.raise_for_status()
+        
+        soup = BeautifulSoup(response.content, 'html.parser')
+        
+        if selector:
+            # 根据CSS选择器提取特定内容
+            elements = soup.select(selector)
+            content = [elem.get_text(strip=True) for elem in elements]
+        else:
+            # Extract all text, one line per text node, so callers can split on newlines
+            content = soup.get_text(separator='\n', strip=True)
+        
+        return content
+    except Exception as e:
+        raise Exception(f"网页抓取失败: {str(e)}")
+
+def scrape_table_from_webpage(url, table_index=0):
+    """从网页中提取表格数据"""
+    try:
+        headers = {
+            'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36'
+        }
+        
+        response = requests.get(url, headers=headers, timeout=10)
+        response.raise_for_status()
+        
+        soup = BeautifulSoup(response.content, 'html.parser')
+        tables = soup.find_all('table')
+        
+        if not tables or table_index >= len(tables):
+            return None
+        
+        table = tables[table_index]
+        
+        # Extract the header row (named column_names to avoid shadowing the request headers)
+        column_names = []
+        header_row = table.find('tr')
+        if header_row:
+            column_names = [th.get_text(strip=True) for th in header_row.find_all(['th', 'td'])]
+        
+        # Extract the data rows
+        data = []
+        rows = table.find_all('tr')[1:]  # skip the header row
+        
+        for row in rows:
+            cells = row.find_all(['td', 'th'])
+            row_data = [cell.get_text(strip=True) for cell in cells]
+            if row_data:
+                data.append(row_data)
+        
+        return column_names, data
+    except Exception as e:
+        raise Exception(f"网页表格提取失败: {str(e)}")
+
+def web_to_excel(url, output_path, selector=None):
+    """将网页内容导出为Excel"""
+    try:
+        if selector:
+            content = scrape_webpage(url, selector)
+            if isinstance(content, list):
+                df = pd.DataFrame({
+                    '序号': range(1, len(content) + 1),
+                    '内容': content
+                })
+            else:
+                df = pd.DataFrame({'内容': [content]})
+        else:
+            # 尝试提取表格
+            table_data = scrape_table_from_webpage(url)
+            if table_data:
+                headers, data = table_data
+                df = pd.DataFrame(data, columns=headers)
+            else:
+                # 提取普通文本
+                content = scrape_webpage(url)
+                # 按段落分割
+                paragraphs = [p.strip() for p in re.split(r'\n+', content) if p.strip()]
+                df = pd.DataFrame({
+                    '段落编号': range(1, len(paragraphs) + 1),
+                    '内容': paragraphs
+                })
+        
+        df.to_excel(output_path, index=False)
+        return True
+    except Exception as e:
+        raise Exception(f"网页转Excel失败: {str(e)}")
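`web_to_excel` passes the scraped rows straight to `pd.DataFrame(data, columns=headers)`, which raises when a row has a different number of cells than the header (for example with `colspan` cells). One possible guard, sketched below, is to pad or trim each row first. `align_rows` is a hypothetical helper, not part of the commit, and trimming discards the extra cells, which may not suit every table.

```python
def align_rows(column_names, rows, fill=''):
    """Pad short rows with `fill` and trim long rows to len(column_names)."""
    width = len(column_names)
    aligned = []
    for row in rows:
        cells = list(row[:width])                     # trim rows that are too long
        cells.extend([fill] * (width - len(cells)))   # pad rows that are too short
        aligned.append(cells)
    return aligned


print(align_rows(['a', 'b', 'c'], [['1', '2'], ['1', '2', '3', '4']]))
```

The aligned rows can then be handed to the DataFrame constructor with the original column names.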
--- a/uv.lock
+++ b/uv.lock