fix: improve LLM prompt to use actual rubric score ranges

The previous prompt had hardcoded 0-3 score examples, which misled the LLM.
The prompt now instructs the LLM to read max_score from the rubric for each criterion.
sit002 2025-12-02 14:21:41 +08:00
parent 1afc2eae48
commit c25bcfddd0


@@ -36,19 +36,25 @@ def read_file_or_string(value):
 PROMPT_TEMPLATE = """你是严格且一致的助教,按提供的评分量表为学生的简答题评分。
-- 只依据量表,不做主观延伸;允许多样表述。
-- 不输出任何解释性文本,只输出 JSON,包含:
-{{
-  "total": number(0-10, 两位小数),
-  "criteria": [
-    {{"id":"accuracy","score":0-3,"reason":"要点式一句话"}},
-    {{"id":"coverage","score":0-3,"reason":""}},
-    {{"id":"clarity","score":0-3,"reason":""}}
-  ],
-  "flags": [],
-  "confidence": number(0-1)
-}}
-如果答案与题目无关,total=0,并加 flag "need_review"。
+- 只依据量表中各评分项的 max_score 和 scoring_guide 进行评分。
+- 每个评分项的分数范围是 0 到该项的 max_score。
+- 不输出任何解释性文本,只输出 JSON。
+
+输出格式:
+{{
+  "total": number (各项分数之和,保留两位小数),
+  "criteria": [
+    {{"id": "评分项id", "score": number(0到该项max_score), "reason": "简短评语"}},
+    ...
+  ],
+  "flags": [],
+  "confidence": number(0-1, 评分置信度)
+}}
+
+重要:
+- 每个评分项的 score 必须在 0 到该项 max_score 范围内。
+- total 必须等于所有 criteria 的 score 之和。
+- 如果答案与题目无关或为空,total=0,并加 flag "need_review"。
+
 题目:
 <<<{question}>>>
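
The new prompt assumes each rubric criterion carries a max_score (and a scoring_guide) and asks the model to keep every score within that range and to make total the sum of the per-criterion scores. Below is a minimal sketch of how a caller could check the model's JSON reply against those constraints; the rubric shape, the validate_grading name, and the tolerance are assumptions for illustration, not code from this repository.

# Sketch only: validates an LLM grading reply against per-criterion max_score.
# The rubric layout and all names here are hypothetical, not from this repo.
import json


def validate_grading(rubric: dict, llm_output: str) -> dict:
    """Parse the model's JSON reply and enforce the constraints stated in the prompt."""
    result = json.loads(llm_output)
    max_by_id = {c["id"]: c["max_score"] for c in rubric["criteria"]}

    for item in result.get("criteria", []):
        max_score = max_by_id.get(item["id"])
        if max_score is None:
            raise ValueError(f"unknown criterion id: {item['id']}")
        if not 0 <= item["score"] <= max_score:
            raise ValueError(
                f"score {item['score']} for {item['id']} is outside 0..{max_score}"
            )

    # Recompute the total rather than trusting the model's arithmetic.
    expected_total = round(sum(i["score"] for i in result.get("criteria", [])), 2)
    if abs(result.get("total", 0) - expected_total) > 0.01:
        result["total"] = expected_total

    return result


if __name__ == "__main__":
    rubric = {
        "criteria": [
            {"id": "accuracy", "max_score": 5, "scoring_guide": "..."},
            {"id": "coverage", "max_score": 3, "scoring_guide": "..."},
        ]
    }
    reply = json.dumps({
        "total": 6.0,
        "criteria": [
            {"id": "accuracy", "score": 4, "reason": "ok"},
            {"id": "coverage", "score": 2, "reason": "ok"},
        ],
        "flags": [],
        "confidence": 0.8,
    })
    print(validate_grading(rubric, reply))

A check like this complements the prompt change: even with the corrected instructions, a downstream guard keeps out-of-range scores or mismatched totals from reaching stored grades.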