Introduction
In early April, we introduced CodeQwen1.5, which garnered significant attention from the community. Since then, we have been working to enhance the coding model. Today, we are excited to announce the release of the next generation of open-source coding models, Qwen2.5-Coder, and officially rename CodeQwen to Qwen-Coder. We think “Coder” is more human-like and agile, reflecting our vision of it becoming a true coding partner in the future. Qwen2.5-Coder is part of the Qwen2.5 series, available in three model sizes: 1.5B, 7B, and a 32B version (coming soon).
This update focuses on two main improvements: scaling up the code training data and enhancing coding capabilities while maintaining strong performance in other core areas like math and general tasks.
💻 Code More: Qwen2.5-Coder builds on the strong Qwen2.5 and continues training on a larger scale of code data, including source code, text-code grounding data, and synthetic data, totaling 5.5 trillion tokens. This leads to significant improvements in code-related tasks.
📚 Learn More: While enhancing coding abilities, we aimed to retain the base model's strengths in math and general capabilities. Therefore, Qwen2.5-Coder incorporates additional data on mathematics and general abilities, providing a comprehensive foundation for real-world applications such as Code Agent.
Qwen2.5-Coder: Base Models
Qwen2.5-Coder supports up to 128K tokens of context, covers 92 programming languages, and achieves remarkable improvements across various code-related evaluation tasks, including code generation, multi-programming-language code generation, code completion, and code repair. Notably, the open-source 7B version of Qwen2.5-Coder even outperforms larger models such as DeepSeek-Coder-V2-Lite and Codestral-22B, making it one of the most powerful base code models available. Beyond code tasks, Qwen2.5-Coder also demonstrates competitive math capabilities on evaluations such as GSM8K and MATH. For general tasks, evaluations on MMLU and ARC show that Qwen2.5-Coder retains the general-ability performance of Qwen2.5.
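For readers who want to try code completion with the base model, the sketch below uses Hugging Face transformers; the Hub ID Qwen/Qwen2.5-Coder-7B, the example prompt, and the generation settings are our assumptions for illustration, not an official recipe.

```python
# Minimal completion sketch (assumptions: Hub ID, prompt, and generation settings).
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen2.5-Coder-7B"  # assumed Hub ID for the 7B base model
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

# Plain left-to-right completion: the base model simply continues the code prompt.
prompt = "def quicksort(arr):\n"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=128, do_sample=False)

# Print only the newly generated continuation.
print(tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```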
Qwen2.5-Coder-Instruct: Instruction-Tuned Models
Building on Qwen2.5-Coder, we fine-tuned it with instruction data, creating Qwen2.5-Coder-Instruct. This instruction-tuned model not only further improves task performance but also demonstrates exceptional generalization across various benchmarks.
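As a quick illustration of using the instruction-tuned model, the sketch below drives it through the tokenizer's chat template in transformers; the Hub ID Qwen/Qwen2.5-Coder-7B-Instruct and the example prompts are assumptions for illustration rather than an official usage guide.

```python
# Minimal chat sketch for the instruction-tuned model (assumed Hub ID and prompts).
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen2.5-Coder-7B-Instruct"  # assumed Hub ID for the instruct model
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

messages = [
    {"role": "system", "content": "You are a helpful coding assistant."},
    {"role": "user", "content": "Write a Python function that checks whether a string is a palindrome."},
]
# The chat template renders the conversation into the model's expected prompt format.
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=256)

# Decode only the assistant's reply.
print(tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```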
Qwen2.5-Coder-Instruct excels in several key areas:
- Outstanding Multi-programming Expert: We expanded the multi-language evaluations using McEval, covering more than 40 programming languages. The results show that Qwen2.5-Coder-Instruct performs remarkably well across many languages, including niche ones.
- Code Reasoning: We believe code reasoning is closely tied to general reasoning skills. We used CRUXEval as a benchmark, and the results show Qwen2.5-Coder-Instruct excels in code reasoning tasks. Interestingly, as code reasoning improves, the model’s ability to follow complex instructions also gets better, encouraging us to further explore how code can enhance general skills.
- Math Reasoning: Math and code are often discussed together: math is the foundation of code, and code is a key tool for math. Qwen2.5-Coder-Instruct shines in both code and math tasks, proving itself a capable "science student".
| Model | MATH | GSM8K | GaoKao2023en | OlympiadBench | CollegeMath | AIME24 |
|---|---|---|---|---|---|---|
| DeepSeek-Coder-V2-Lite-Instruct | 61.0 | 87.6 | 56.1 | 26.4 | 39.8 | 6.7 |
| Qwen2.5-Coder-7B-Instruct | 66.8 | 86.7 | 60.5 | 29.8 | 43.5 | 10.0 |
- Basic Capabilities: We also assessed general capabilities, and the results indicate that Qwen2.5-Coder-Instruct retains the advantages of Qwen2.5 on these benchmarks.
| Model | AMC23 | MMLU | MMLU-Pro | IFEval | CEval | GPQA |
|---|---|---|---|---|---|---|
| DeepSeek-Coder-V2-Lite-Instruct | 40.4 | 42.5 | 60.6 | 38.6 | 60.1 | 27.6 |
| Qwen2.5-Coder-7B-Instruct | 42.5 | 45.6 | 68.7 | 58.6 | 61.4 | 35.6 |
License
Qwen2.5-Coder is released under the Apache 2.0 license. We hope this increased openness will accelerate its application in code intelligence.
What’s Next for Qwen2.5-Coder?
We are preparing the 32B version of Qwen2.5-Coder, aiming to challenge proprietary models. Stay tuned—it’s coming soon! Additionally, we’re exploring powerful code-centric reasoning models to push the boundaries of code intelligence.
Citation
@article{hui2024qwen2,
title={Qwen2.5-Coder Technical Report},
author={Hui, Binyuan and Yang, Jian and Cui, Zeyu and Yang, Jiaxi and Liu, Dayiheng and Zhang, Lei and Liu, Tianyu and Zhang, Jiajun and Yu, Bowen and Dang, Kai and others},
journal={arXiv preprint arXiv:2409.12186},
year={2024}
}
@article{yang2024qwen2,
title={Qwen2 technical report},
author={Yang, An and Yang, Baosong and Hui, Binyuan and Zheng, Bo and Yu, Bowen and Zhou, Chang and Li, Chengpeng and Li, Chengyuan and Liu, Dayiheng and Huang, Fei and others},
journal={arXiv preprint arXiv:2407.10671},
year={2024}
}