fix: improve LLM prompt to use actual rubric score ranges

The previous prompt had hardcoded 0-3 score examples, which misled the LLM.
The prompt now instructs the LLM to read max_score from the rubric for each criterion.
sit002 2025-12-02 14:21:41 +08:00
parent 1afc2eae48
commit c25bcfddd0


@@ -36,19 +36,25 @@ def read_file_or_string(value):
 PROMPT_TEMPLATE = """你是严格且一致的助教,按提供的评分量表为学生的简答题评分。
-- 只依据量表,不做主观延伸;允许多样表述。
-- 不输出任何解释性文本,只输出 JSON,包含:
-{{
-  "total": number(0-10, 两位小数),
-  "criteria": [
-    {{"id":"accuracy","score":0-3,"reason":"要点式一句话"}},
-    {{"id":"coverage","score":0-3,"reason":""}},
-    {{"id":"clarity","score":0-3,"reason":""}}
-  ],
-  "flags": [],
-  "confidence": number(0-1)
-}}
-如果答案与题目无关:total=0,并加 flag "need_review"。
+- 只依据量表中各评分项的 max_score 和 scoring_guide 进行评分。
+- 每个评分项的分数范围是 0 到该项的 max_score。
+- 不输出任何解释性文本,只输出 JSON。
+输出格式:
+{{
+  "total": number (各项分数之和,保留两位小数),
+  "criteria": [
+    {{"id": "评分项id", "score": number(0到该项max_score), "reason": "简短评语"}},
+    ...
+  ],
+  "flags": [],
+  "confidence": number(0-1, 评分置信度)
+}}
+重要:
+- 每个评分项的 score 必须在 0 到该项 max_score 范围内。
+- total 必须等于所有 criteria score 之和。
+- 如果答案与题目无关或为空:total=0,并加 flag "need_review"。
 题目:
 <<<{question}>>>
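The new prompt shifts the score bounds from hardcoded values to the rubric's per-criterion max_score, so the caller can now enforce the same contract on the model's reply. A minimal sketch of such a check (the function name `validate_grading` and the exact rubric shape are assumptions for illustration, not part of this commit):

```python
import json

def validate_grading(raw: str, rubric: dict) -> dict:
    """Parse the model's JSON reply and enforce the prompt's stated constraints:
    each criterion score lies in [0, max_score], and total equals their sum."""
    result = json.loads(raw)
    # Assumed rubric shape: {"criteria": [{"id": ..., "max_score": ...}, ...]}
    max_scores = {c["id"]: c["max_score"] for c in rubric["criteria"]}
    total = 0.0
    for item in result["criteria"]:
        hi = max_scores[item["id"]]
        if not 0 <= item["score"] <= hi:
            raise ValueError(f"{item['id']}: score {item['score']} outside 0-{hi}")
        total += item["score"]
    if round(total, 2) != round(result["total"], 2):
        raise ValueError("total does not equal sum of criteria scores")
    return result
```

Replies that violate either bound (a score above its criterion's max_score, or a mismatched total) raise and can be routed to the same "need_review" path the prompt reserves for irrelevant answers.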