I've been trying to do a lot of both. I'm not entirely sure how it's going. I do really feel for what Nolan Lawson said here:
The Gervais Principle, Or The Office According to “The Office”
Our model balances thinking and non-thinking performance: on average, the default “mixed-reasoning” behavior is more accurate than forcing either thinking or non-thinking mode. Forcing a specific mode improves performance in only a few cases (MathVerse and MMU_val for thinking, ScreenSpot_v2 for non-thinking). Compared to recent popular open-weight models, our model provides a desirable trade-off between accuracy and cost (as a function of inference-time compute and output tokens), as discussed previously.
highlighting themes—was, for me, an extraordinary waste of time. ↩︎
In the bright, mild spring weather, egrets wheel over the shores of Yundang Lake in Xiamen, Fujian, and the vegetation is lush.