In LLM post-training, SFT (supervised fine-tuning) usually handles cold start, behavior formatting, and task-protocol learning. But running SFT longer is not always better, and neither is a lower loss. Judging whether SFT is “done” hinges not on how well the model fits the training set, but on whether continued supervised imitation still...