Linter docs as a training set

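It just occurred to me that linters like Ruff ship very detailed documentation for every rule, which could make a good source of SFT and DPO data. I plan to experiment with this.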

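For SFT, the training set could be built like this:

Input the description and the do not use example, and have the model output the use version
Input the description and the examples, and have the model output the risks
Input do not use and use, and have the model output its preference and the reason
Input do not use plus a prompt like "please improve this", and have the model output the use version, optionally with the reason as part of the output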
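The four SFT recipes above can be sketched as prompt/completion pairs. This is only a rough sketch: `RuleDoc`, `build_sft_records`, and the prompt wording are hypothetical names and choices of mine, and treating the "Why is this bad?" text as the "risks" and "reason" is an assumption.

```python
from typing import TypedDict


class RuleDoc(TypedDict):
    """Hypothetical container for the parts extracted from one rule's docs."""

    description: str  # text under "What it does"
    reason: str       # text under "Why is this bad?" (assumed to cover the risks)
    bad: str          # the "do not use" snippet
    good: str         # the "use" snippet


def build_sft_records(doc: RuleDoc) -> list[dict[str, str]]:
    """Build one prompt/completion pair for each of the four recipes."""
    bad, good = doc["bad"], doc["good"]
    return [
        # 1. description + do not use -> use
        {
            "prompt": f"{doc['description']}\n\nFix this code:\n{bad}",
            "completion": good,
        },
        # 2. description + examples -> risks
        {
            "prompt": (
                f"{doc['description']}\n\nBefore:\n{bad}\n\nAfter:\n{good}"
                "\n\nWhat are the risks of the first version?"
            ),
            "completion": doc["reason"],
        },
        # 3. do not use + use -> preference and reason
        {
            "prompt": f"Option A:\n{bad}\n\nOption B:\n{good}\n\nWhich is better, and why?",
            "completion": f"Option B is better. {doc['reason']}",
        },
        # 4. do not use + "please improve" -> reason and use
        {
            "prompt": f"Please improve this code:\n{bad}",
            "completion": f"{doc['reason']}\n\n{good}",
        },
    ]
```

Each rule then yields four records, so the dataset grows roughly fourfold relative to the number of rules that have both snippets.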

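For DPO, the training set could be built like this:

Input an explanation of what the code is for (this has to be generated), and contrast do not use with use
Contrast the model-generated explanation with the official explanation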
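Both DPO recipes boil down to preference pairs. Below is a minimal sketch using the prompt/chosen/rejected record shape that is a common convention for preference datasets. The function names and prompt wording are hypothetical, and always treating the official explanation as the preferred side is an assumption.

```python
def build_code_pair(purpose: str, bad: str, good: str) -> dict[str, str]:
    """Recipe 1: given a (generated) purpose, prefer "use" over "do not use"."""
    return {
        "prompt": f"Write Python code for the following purpose:\n{purpose}",
        "chosen": good,
        "rejected": bad,
    }


def build_explanation_pair(code: str, official: str, generated: str) -> dict[str, str]:
    """Recipe 2: prefer the official explanation over a model-generated one.

    Assumption: the official text is always the better one.
    """
    return {
        "prompt": f"Explain what is wrong with this code:\n{code}",
        "chosen": official,
        "rejected": generated,
    }
```

Pairs where the generated explanation happens to match the official one carry no preference signal, so they would probably need to be filtered out before training.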

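When I have time, I plan to open a repo and build a workflow for this: periodically generate training sets from ruff's docs, publish them to Hugging Face, and fine-tune a few models myself to see how it goes.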

Preliminary results:

ruff rule --all --output-format json

This produces one big JSON list, with a schema roughly like this:

from typing import TypedDict

class Rule(TypedDict):
    name: str
    code: str
    linter: str
    summary: str
    message_formats: list[str]
    fix: str
    explanation: str
    preview: bool
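As a rough sketch, the command output can be loaded from Python like this. `parse_rules`, `load_rules_from_ruff`, and `explanations_by_code` are hypothetical helper names, and the sketch assumes `ruff` is on the PATH and that the output has the shape described above.

```python
import json
import subprocess


def parse_rules(raw: str) -> list[dict]:
    """Parse the JSON text into a list of rule dicts."""
    rules = json.loads(raw)
    if not isinstance(rules, list):
        raise ValueError("expected a JSON list of rules")
    return rules


def load_rules_from_ruff() -> list[dict]:
    """Run the ruff CLI and parse its output (assumes ruff is installed)."""
    result = subprocess.run(
        ["ruff", "rule", "--all", "--output-format", "json"],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_rules(result.stdout)


def explanations_by_code(rules: list[dict]) -> dict[str, str]:
    """Map each rule code to its markdown explanation."""
    return {rule["code"]: rule["explanation"] for rule in rules}
```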

Of these fields, only explanation is really useful to us. It is a markdown-formatted string, so I tried parsing it:

After parsing the explanation markdown of every ruff rule into structured data, you can see that it comes down to these three fields:

What: What it does
Why: Why is this bad
How: Example, Use instead

The next step is to extract these three parts, which can then serve as a dataset for training the ability to write good code.
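Here is a minimal sketch of that extraction step. It assumes each section starts with a level-2 markdown heading (a line beginning with `## `) and that the examples are fenced code blocks. Both assumptions should be checked against the real data, and `split_sections` and `code_blocks` are hypothetical names.

```python
import re

# Built this way so that this example can itself sit inside a fenced block.
FENCE = "`" * 3


def split_sections(explanation: str) -> dict[str, str]:
    """Split an explanation into {heading: body}, ignoring '## ' inside code."""
    sections: dict[str, list[str]] = {}
    current = None
    in_code = False
    for line in explanation.splitlines():
        if line.lstrip().startswith(FENCE):
            in_code = not in_code  # a fence line opens or closes a code block
        if not in_code and line.startswith("## "):
            current = line[3:].strip()
            sections.setdefault(current, [])
        elif current is not None:
            sections[current].append(line)
    return {title: "\n".join(body).strip() for title, body in sections.items()}


def code_blocks(section: str) -> list[str]:
    """Return the contents of every fenced code block in a section, in order."""
    pattern = re.compile(FENCE + r"[^\n]*\n(.*?)" + FENCE, re.DOTALL)
    return [block.strip() for block in pattern.findall(section)]
```

Under these assumptions, the first code block of the Example section would be the do not use snippet and the second one the use snippet. Rules with only one block, or with more than two, would need separate handling.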

(As of ruff 0.9.7, there are 915 rules in total)

I ran a quick count after all, and there are actually more than these three headings. In total:

Why is this bad? (915)
What it does (914)
Example (831)
References (517)
Options (135)
Fix safety (104)
Examples (65)
Known problems (37)
Known issues (33)
Formatter compatibility (16)
Removed (10)
Notebook behavior (6)
See also (6)
Typing stub files (.pyi) (5)
Details (3)
Note (3)
Removal (3)
Fix availability (3)
Fix safety and availability (2)
Preview (2)
Error suppression (2)
Use instead: (1)
Examples: (1)
Known deviations (1)
Fix behaviour and safety (1)
Preview-mode behaviour (1)
What it does? (1)
Limitations (1)
Fix Safety (1)
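A tally like the one above can be produced with a `Counter`. This is a sketch under the assumption that headings are lines starting with `## ` outside of code fences. The helper names are hypothetical, and the optional normalization (dropping a trailing `?` or `:` and ignoring case) is my own addition to show how the stray variants would merge.

```python
from collections import Counter
from typing import Iterable, Iterator

# Built this way so that this example can itself sit inside a fenced block.
FENCE = "`" * 3


def iter_headings(explanation: str) -> Iterator[str]:
    """Yield every level-2 heading that is not inside a fenced code block."""
    in_code = False
    for line in explanation.splitlines():
        if line.lstrip().startswith(FENCE):
            in_code = not in_code
        elif not in_code and line.startswith("## "):
            yield line[3:].strip()


def normalize(title: str) -> str:
    """Fold variants such as 'What it does?' or 'Fix Safety' together."""
    return title.rstrip("?:").strip().casefold()


def count_headings(explanations: Iterable[str], normalized: bool = False) -> Counter:
    """Count heading occurrences across all rule explanations."""
    counts: Counter = Counter()
    for text in explanations:
        titles = iter_headings(text)
        counts.update(normalize(t) for t in titles) if normalized else counts.update(titles)
    return counts


def format_counts(counts: Counter) -> list[str]:
    """Render the counts as 'Heading (n)' lines, most frequent first."""
    return [f"{title} ({n})" for title, n in counts.most_common()]
```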

You can tell they don't keep these very consistent. Some are even missing punctuation 😂 Could it be that Astral is a slapdash operation too?

Update, February 25, 2025: to my surprise, https://github.com/astral-sh/ruff/pull/16364 actually got merged