
Introduction

After the release of Qwen2.5, we heard the community's call for handling longer sequences. Since then, we have made many optimizations to both the model's long-sequence capability and its inference efficiency on long sequences. Today, we are proud to introduce the new Qwen2.5-Turbo, which features:

  1. Longer context support: We have extended the model's context length from 128k to 1M tokens for the first time. This is roughly 1 million English words or 1.5 million Chinese characters, equivalent to 10 full-length novels, 150 hours of speech transcripts, or 30,000 lines of code. The model achieves 100% accuracy on the 1M-token Passkey Retrieval task and scores 93.1 on the long-text benchmark RULER, surpassing GPT-4's 91.6 and GLM4-9B-1M's 89.9. Moreover, it remains highly competitive on short sequences, on par with GPT-4o-mini.
  2. Faster inference speed: Using a sparse attention mechanism, we reduced the time to first token for 1M-token contexts from 4.9 minutes to 68 seconds, a 4.3x speedup.
  3. Lower price: The price remains ¥0.3 / 1M tokens. At the same cost, Qwen2.5-Turbo can process 3.6 times as many tokens as GPT-4o-mini (see the note below).
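A quick back-of-the-envelope check on item 3: assuming GPT-4o-mini's input price of $0.15 per 1M tokens and an exchange rate of roughly 7.2 CNY/USD (both figures are our assumptions here), GPT-4o-mini costs about ¥1.08 per 1M tokens, and ¥1.08 / ¥0.3 = 3.6, consistent with the figure above.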

You can now try it out via the API service on Alibaba Cloud Model Studio, or through the HuggingFace Demo or the ModelScope Demo.

Demos

We have prepared several demos to showcase the new use cases that Qwen2.5-Turbo enables, covering deep understanding of long novels, repository-level code assistance, reading multiple papers at once, and more.

Example: Understanding Long Novels
User

[Upload the three Chinese novels of the Three-Body Problem trilogy, which total 690k tokens.]

Provide a summary of the plots of these three novels in English

Qwen2.5-Turbo

The three novels you mentioned from the “Earth’s Past” series are interconnected and form a trilogy that delves into the complex interactions between Earth and an alien civilization known as the Trisolarans.

First Novel: “Earth’s Past” In the first novel, the narrative focuses on the development of a top-secret project called “Red Coast” (红岸) initiated by the Chinese military. The protagonist, Ye Wenjie, is a scientist who discovers that the Trisolarans have begun listening to the universe. Her actions, particularly her decision to respond to the Trisolarans, drastically alter the course of human history. She sets up a powerful signal to send to the Trisolarans using the sun as an amplifier. This action marks the beginning of a new era where humanity faces a significant existential threat.

Second Novel: “Dark Forest” The second novel, titled “Dark Forest,” delves into the concept of the “Dark Forest Theory.” According to this theory, the universe is a vast battlefield where civilizations must remain silent to avoid detection and subsequent destruction. The Trisolarans and humans engage in a complex strategic game, with the Trisolarans deploying advanced weapons known as “water drops” (水滴). These water drops are capable of destroying entire planets by disrupting their ecosystems. Meanwhile, humans develop their own strategies, including the “Wall-Bearing Plan” (面壁计划), which involves four individuals who devise plans to counter the Trisolaran threat without revealing their intentions to the outside world. The story follows one of these face-bearers, Luo Ji, as he struggles with the ethical implications of his mission.

Third Novel: “Death’s End” In “Death’s End,” Luo Ji becomes the ultimate guardian of the Dark Forest Theory. He holds the key to the survival of both human and Trisolaran civilizations. However, his position as the guardian (执剑人) brings him immense personal struggle and psychological turmoil. As Luo Ji’s time as the guardian nears its end, he must make a critical decision about the future of humanity. Meanwhile, the Trisolarans continue their efforts to colonize and assimilate other worlds. The novel explores the consequences of this decision and the broader impact on the universe. It also introduces a new element: the creation of micro-black holes (黑域) by humans, which could potentially isolate them from the rest of the universe. However, the effectiveness of this strategy is questioned when six of these micro-black holes begin to move and accelerate towards Earth. The trilogy as a whole explores themes of survival, technological advancement, and the ethical dilemmas faced by humanity in the face of extraterrestrial threats. Each novel builds upon the previous one, revealing more about the intricate relationship between the two civilizations and the universe they inhabit.

How to Use the API

Using the latest Qwen2.5-Turbo with 1M-token support is identical to the standard usage of the Qwen API, and it is compatible with the OpenAI API. Below is a simple Python example (note: you need to set the environment variable YOUR_API_KEY to your API key; for more details, please refer to the Alibaba Cloud Model Studio quick start guide):

import os

from openai import OpenAI

# Read the long text file to summarize
with open("example.txt", "r", encoding="utf-8") as f:
    text = f.read()
user_input = text + "\n\nSummarize the above text."

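# The DashScope endpoint is OpenAI-compatible, so the standard OpenAI client works as-is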
client = OpenAI(
    api_key=os.getenv("YOUR_API_KEY"),
    base_url="https://dashscope.aliyuncs.com/compatible-mode/v1",
)

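# Send the long document plus the instruction as a single user message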
completion = client.chat.completions.create(
    model="qwen-turbo-latest",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": user_input},
    ],
)

print(completion.choices[0].message.content)

Model Performance

In this section, we evaluate Qwen2.5-Turbo on a range of benchmarks and report its progress in inference speed.

Passkey Retrieval

We first ran experiments on the 1M-token Passkey Retrieval ("needle in a haystack") task. The results show that Qwen2.5-Turbo can perfectly retrieve all of the hidden numbers from 1M tokens of irrelevant text, demonstrating its ability to pick out fine-grained details from extremely long contexts.
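To make the task concrete, below is a minimal sketch of how such a passkey-retrieval prompt can be generated. The filler text, passkey format, and word budget are illustrative assumptions, not the exact setup used in our evaluation:

import random

FILLER = ("The grass is green. The sky is blue. "
          "The sun shines. Here we go, there and back again. ")

def make_passkey_prompt(target_words: int = 750_000) -> tuple[str, str]:
    """Hide a random 5-digit passkey at a random depth inside long,
    irrelevant filler text, then ask the model to retrieve it."""
    passkey = f"{random.randint(0, 99_999):05d}"
    needle = f"The pass key is {passkey}. Remember it. "
    n_chunks = target_words // len(FILLER.split())
    chunks = [FILLER] * n_chunks
    chunks.insert(random.randint(0, n_chunks), needle)  # random depth
    prompt = "".join(chunks) + "\nWhat is the pass key?"
    return prompt, passkey

The returned prompt can be sent as user_input in the API example above; the model's answer should contain the returned passkey.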

More Complex Long-Text Tasks

We also evaluated the model on several long-text datasets, including:

  • RULER: an extension of the needle-in-a-haystack setup, with tasks such as finding multiple "needles" in irrelevant context, answering multiple questions, or identifying the most or least frequent words in the context. The maximum context length is 128K.
  • LV-Eval: a benchmark that requires understanding numerous evidence fragments simultaneously. We adjusted the evaluation metrics of the original LV-Eval to avoid false negatives caused by overly strict matching rules (see the sketch after this list). The maximum context length is 256K.
  • LongBench-Chat: a dataset that evaluates human-preference alignment in long-text tasks. The maximum context length is 100K.
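As a rough illustration of the LV-Eval metric adjustment mentioned above, the sketch below replaces exact string matching with token-level recall against the reference answer. The threshold and the English-only tokenization are assumptions for illustration; the actual adjusted metric may differ:

import re

def _tokens(text: str) -> set[str]:
    """Lowercase and split on non-alphanumeric characters."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def lenient_match(prediction: str, answer: str, threshold: float = 0.5) -> bool:
    """Count a prediction as correct if it covers enough of the
    reference answer's tokens, instead of requiring an exact match."""
    ans = _tokens(answer)
    if not ans:
        return False
    recall = len(ans & _tokens(prediction)) / len(ans)
    return recall >= threshold

# An exact-match rule would reject this correct but differently
# phrased prediction; token-level recall accepts it.
print(lenient_match("The treaty was signed in 1648 in Westphalia.",
                    "1648, Westphalia"))  # True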

The results show that Qwen2.5-Turbo holds an advantage across these long-text tasks:

  • On the RULER benchmark, Qwen2.5-Turbo scores 93.1, surpassing not only GPT-4o-mini but also GPT-4, demonstrating its excellent capability on long-text tasks.
  • On more realistic long-text tasks such as LV-Eval and LongBench-Chat, Qwen2.5-Turbo outperforms GPT-4o-mini on most dimensions, and it can further scale to problems with contexts beyond 128K tokens.

Short-Text Tasks

Beyond the gains on long-sequence tasks, we also care about the model's performance on short-text tasks. Existing context-length extension methods often cause a substantial drop in short-text performance, so we paid particular attention to this issue while building Qwen2.5-Turbo, ensuring that extending the context length barely affects short-text capability.

Results on standard short-text benchmarks show that Qwen2.5-Turbo significantly outperforms previous open-source models with 1M-token context on most tasks; compared with GPT-4o-mini and Qwen2.5-14B-Instruct, Qwen2.5-Turbo delivers similar short-text performance while supporting a context 8 times as long (1M vs. 128K tokens).

Inference Speed

We measured the time to first token (TTFT) for inputs of various lengths under multiple model architectures. On 1M-token sequences, our sparse attention mechanism compresses the attention computation by roughly a factor of 12.5, achieving speedups of 3.2x to 4.3x across different hardware configurations.
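For reference, here is a minimal client-side sketch for measuring time to first token against the same OpenAI-compatible endpoint used in the API example above. It captures end-to-end latency, including network and queuing overhead, so its numbers will not match server-side figures exactly:

import os
import time

from openai import OpenAI

client = OpenAI(
    api_key=os.getenv("YOUR_API_KEY"),
    base_url="https://dashscope.aliyuncs.com/compatible-mode/v1",
)

def time_to_first_token(user_input: str, model: str = "qwen-turbo-latest") -> float:
    """Stream a completion and return the seconds elapsed until the
    first non-empty content chunk arrives."""
    start = time.monotonic()
    stream = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": user_input}],
        stream=True,
    )
    for chunk in stream:
        if chunk.choices and chunk.choices[0].delta.content:
            return time.monotonic() - start
    return float("nan")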

What's Next

While we are excited to finally extend Qwen2.5-Turbo's context to 1M tokens, we are well aware that the current model does not always solve real-world long-sequence tasks satisfactorily. Many challenges remain: for example, the model's performance on long-sequence tasks can be unstable, and inference cost makes it hard to serve larger models. Going forward, we will keep exploring human-preference alignment on long sequences, optimize inference efficiency to reduce compute time, and work toward larger and stronger long-context models. We hope to share new progress on long-context models with you soon. Stay tuned!