Introduction

At the end of January this year, we launched the Qwen2.5-VL series of models, which received widespread attention and positive feedback from the community. Building on that series, we continued to optimize the model with reinforcement learning and have open-sourced a new VL model at the popular 32B parameter scale under the Apache 2.0 license: Qwen2.5-VL-32B-Instruct. Compared with the previously released Qwen2.5-VL models, this 32B VL model offers the following improvements:

  • Responses More Aligned with Human Preferences: Adjusted the output style to provide more detailed, better-formatted answers that align more closely with human preferences.
  • Mathematical Reasoning: Significant improvement in the accuracy of solving complex mathematical problems.
  • Fine-grained Image Understanding and Reasoning: Enhanced accuracy and detailed analysis in tasks such as image parsing, content recognition, and visual logic deduction.

Performance

In extensive benchmarking against state-of-the-art (SoTA) models of comparable scale, Qwen2.5-VL-32B-Instruct outperforms baselines such as Mistral-Small-3.1-24B and Gemma-3-27B-IT, and even surpasses the larger Qwen2-VL-72B-Instruct. Notably, it achieves significant advantages on multimodal benchmarks such as MMMU, MMMU-Pro, and MathVista, which focus on complex, multi-step reasoning. On MM-MT-Bench, a benchmark emphasizing subjective user experience, Qwen2.5-VL-32B-Instruct outperforms its predecessor Qwen2-VL-72B-Instruct by a substantial margin.

In addition to excelling in visual capabilities, Qwen2.5-VL-32B-Instruct has also achieved top-tier performance in pure text capabilities at the same scale.

Demo Cases

Fine-grained Image Understanding and Reasoning
User
I am driving a large truck on this road, and it is now 12 o’clock. Can I reach a place 110 kilometers away before 13 o’clock? [image]
Qwen2.5-VL-32B

To determine whether you can reach a place 110 kilometers away before 13:00, we need to analyze the information provided:

  1. Current Time: It is 12:00.
  2. Distance to Destination: 110 kilometers.
  3. Speed Limit for Trucks: The speed limit sign indicates that the maximum speed for trucks is 100 km/h.

Step-by-Step Analysis:

1. Calculate the Required Travel Time

The formula to calculate travel time is: \[ \text{Time} = \frac{\text{Distance}}{\text{Speed}} \] Given:

  • Distance = 110 km
  • Maximum Speed = 100 km/h

Substitute the values: \[ \text{Time} = \frac{110 \text{ km}}{100 \text{ km/h}} = 1.1 \text{ hours} \]

2. Convert Time to Hours and Minutes

1.1 hours is equivalent to: \[ 1 \text{ hour} + 0.1 \times 60 \text{ minutes} = 1 \text{ hour} + 6 \text{ minutes} = 1 \text{ hour and } 6 \text{ minutes} \]

3. Determine Arrival Time

If you start at 12:00 and it takes 1 hour and 6 minutes to travel 110 kilometers, your arrival time will be: \[ 12:00 + 1 \text{ hour and } 6 \text{ minutes} = 13:06 \]

4. Compare with Target Time

You want to arrive before 13:00. However, based on the calculations, you will arrive at 13:06, which is after 13:00.

Conclusion:

Given the speed limit of 100 km/h for trucks, it is not possible to reach a place 110 kilometers away before 13:00.

Thus, the final answer is: \[ \boxed{\text{No}} \]
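
As a quick sanity check on the arithmetic in the response above, here is a minimal Python sketch. The function name `arrival_time` and the calendar date are illustrative assumptions and not part of the demo. The sketch also assumes the truck holds the 100 km/h limit for the entire trip, which is the best case.

```python
from datetime import datetime, timedelta


def arrival_time(start: datetime, distance_km: float, speed_kmh: float) -> datetime:
    """Return the arrival time when driving distance_km at a constant speed_kmh."""
    travel_hours = distance_km / speed_kmh  # Time = Distance / Speed
    return start + timedelta(hours=travel_hours)


# The date is arbitrary (an assumption); only the clock time matters here.
start = datetime(2025, 1, 1, 12, 0)
deadline = start.replace(hour=13)

arrival = arrival_time(start, distance_km=110, speed_kmh=100)

print(arrival.strftime("%H:%M"))  # 13:06
print(arrival < deadline)         # False: the deadline cannot be met
```

Because 100 km/h is the maximum legal speed for trucks, any realistic average speed would only push the arrival later than 13:06.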

Next Step

Qwen2.5-VL-32B has focused on optimizing subjective experience and mathematical reasoning through reinforcement learning, operating within the paradigm of “fast thinking”. Our next research direction will prioritize long and effective reasoning processes, to push the boundaries of visual models on highly complex, multi-step visual reasoning tasks.

Citation

If you find our model helpful, feel free to cite it:

@article{Qwen2.5-VL,
  title={Qwen2.5-VL Technical Report},
  author={Bai, Shuai and Chen, Keqin and Liu, Xuejing and Wang, Jialin and Ge, Wenbin and Song, Sibo and Dang, Kai and Wang, Peng and Wang, Shijie and Tang, Jun and Zhong, Humen and Zhu, Yuanzhi and Yang, Mingkun and Li, Zhaohai and Wan, Jianqiang and Wang, Pengfei and Ding, Wei and Fu, Zheren and Xu, Yiheng and Ye, Jiabo and Zhang, Xi and Xie, Tianbao and Cheng, Zesen and Zhang, Hang and Yang, Zhibo and Xu, Haiyang and Lin, Junyang},
  journal={arXiv preprint arXiv:2502.13923},
  year={2025}
}