
Introduction

After the release of Qwen2.5, we heard the community's call for handling longer sequences. Since then, we have made many optimizations to both the model's long-sequence capability and its inference efficiency on long sequences. Today, we are proud to introduce the new Qwen2.5-Turbo, which features:

  1. Longer context support: We have extended the model's context length from 128k to 1M tokens for the first time. This is roughly 1 million English words or 1.5 million Chinese characters, equivalent to 10 full-length novels, 150 hours of speech transcripts, or 30,000 lines of code. The model achieves 100% accuracy on the 1M-token Passkey Retrieval task and scores 93.1 on the long-text benchmark RULER, surpassing GPT-4's 91.6 and GLM4-9B-1M's 89.9. Moreover, it remains highly competitive on short sequences, on par with GPT-4o-mini.
  2. Faster inference speed: Using a sparse attention mechanism, we reduced the time to first token for 1M-token contexts from 4.9 minutes to 68 seconds, a 4.3x speedup.
  3. Lower price: The price remains ¥0.3 / 1M tokens. At the same cost, Qwen2.5-Turbo can process 3.6 times as many tokens as GPT-4o-mini (see the note below).
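A quick back-of-the-envelope check on item 3: assuming GPT-4o-mini's input price of $0.15 per 1M tokens and an exchange rate of roughly 7.2 CNY/USD (both figures are our assumptions here), GPT-4o-mini costs about ¥1.08 per 1M tokens, and ¥1.08 / ¥0.3 = 3.6, consistent with the figure above.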

You can now try it out via the API service on Alibaba Cloud Model Studio, or through the HuggingFace Demo or the ModelScope Demo.

Demos

We have prepared several demos to showcase the new use cases that Qwen2.5-Turbo enables, covering deep understanding of long novels, repository-level code assistance, reading multiple papers at once, and more.

Example: Understanding Long Novels
User

[Upload the three Chinese novels of the Three-Body Problem trilogy, which total 690k tokens.]

Provide a summary of the plots of these three novels in English

Qwen2.5-Turbo

The three novels you mentioned from the “Earth’s Past” series are interconnected and form a trilogy that delves into the complex interactions between Earth and an alien civilization known as the Trisolarans.

First Novel: “Earth’s Past” In the first novel, the narrative focuses on the development of a top-secret project called “Red Coast” (红岸) initiated by the Chinese military. The protagonist, Ye Wenjie, is a scientist who discovers that the Trisolarans have begun listening to the universe. Her actions, particularly her decision to respond to the Trisolarans, drastically alter the course of human history. She sets up a powerful signal to send to the Trisolarans using the sun as an amplifier. This action marks the beginning of a new era where humanity faces a significant existential threat.

Second Novel: “Dark Forest” The second novel, titled “Dark Forest,” delves into the concept of the “Dark Forest Theory.” According to this theory, the universe is a vast battlefield where civilizations must remain silent to avoid detection and subsequent destruction. The Trisolarans and humans engage in a complex strategic game, with the Trisolarans deploying advanced weapons known as “water drops” (水滴). These water drops are capable of destroying entire planets by disrupting their ecosystems. Meanwhile, humans develop their own strategies, including the “Wall-Bearing Plan” (面壁计划), which involves four individuals who devise plans to counter the Trisolaran threat without revealing their intentions to the outside world. The story follows one of these face-bearers, Luo Ji, as he struggles with the ethical implications of his mission.

Third Novel: “Death’s End” In “Death’s End,” Luo Ji becomes the ultimate guardian of the Dark Forest Theory. He holds the key to the survival of both human and Trisolaran civilizations. However, his position as the guardian (执剑人) brings him immense personal struggle and psychological turmoil. As Luo Ji’s time as the guardian nears its end, he must make a critical decision about the future of humanity. Meanwhile, the Trisolarans continue their efforts to colonize and assimilate other worlds. The novel explores the consequences of this decision and the broader impact on the universe. It also introduces a new element: the creation of micro-black holes (黑域) by humans, which could potentially isolate them from the rest of the universe. However, the effectiveness of this strategy is questioned when six of these micro-black holes begin to move and accelerate towards Earth. The trilogy as a whole explores themes of survival, technological advancement, and the ethical dilemmas faced by humanity in the face of extraterrestrial threats. Each novel builds upon the previous one, revealing more about the intricate relationship between the two civilizations and the universe they inhabit.

How to Use the API

Using the latest Qwen2.5-Turbo with 1M-token support is identical to the standard usage of the Qwen API, and it is compatible with the OpenAI API. Below is a simple Python example (note: you need to set the environment variable YOUR_API_KEY to your API key; for more details, please refer to the Alibaba Cloud Model Studio quick start guide):

import os

from openai import OpenAI

# Read the long text file to summarize
with open("example.txt", "r", encoding="utf-8") as f:
    text = f.read()
user_input = text + "\n\nSummarize the above text."

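# The DashScope endpoint is OpenAI-compatible, so the standard OpenAI client works as-is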
client = OpenAI(
    api_key=os.getenv("YOUR_API_KEY"),
    base_url="https://dashscope.aliyuncs.com/compatible-mode/v1",
)

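# Send the long document plus the instruction as a single user message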
completion = client.chat.completions.create(
    model="qwen-turbo-latest",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": user_input},
    ],
)

print(completion.choices[0].message.content)

Model Performance

In this section, we evaluate Qwen2.5-Turbo on a range of benchmarks and report its progress in inference speed.

Passkey Retrieval

We first ran experiments on the 1M-token Passkey Retrieval ("needle in a haystack") task. The results show that Qwen2.5-Turbo can perfectly retrieve all of the hidden numbers from 1M tokens of irrelevant text, demonstrating its ability to pick out fine-grained details from extremely long contexts.
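To make the task concrete, below is a minimal sketch of how such a passkey-retrieval prompt can be generated. The filler text, passkey format, and word budget are illustrative assumptions, not the exact setup used in our evaluation:

import random

FILLER = ("The grass is green. The sky is blue. "
          "The sun shines. Here we go, there and back again. ")

def make_passkey_prompt(target_words: int = 750_000) -> tuple[str, str]:
    """Hide a random 5-digit passkey at a random depth inside long,
    irrelevant filler text, then ask the model to retrieve it."""
    passkey = f"{random.randint(0, 99_999):05d}"
    needle = f"The pass key is {passkey}. Remember it. "
    n_chunks = target_words // len(FILLER.split())
    chunks = [FILLER] * n_chunks
    chunks.insert(random.randint(0, n_chunks), needle)  # random depth
    prompt = "".join(chunks) + "\nWhat is the pass key?"
    return prompt, passkey

The returned prompt can be sent as user_input in the API example above; the model's answer should contain the returned passkey.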

More Complex Long-Text Tasks

We also evaluated the model on several long-text datasets, including:

  • RULER: an extension of the needle-in-a-haystack setup, with tasks such as finding multiple "needles" in irrelevant context, answering multiple questions, or identifying the most or least frequent words in the context. The maximum context length is 128K.
  • LV-Eval: a benchmark that requires understanding numerous evidence fragments simultaneously. We adjusted the evaluation metrics of the original LV-Eval to avoid false negatives caused by overly strict matching rules (see the sketch after this list). The maximum context length is 256K.
  • LongBench-Chat: a dataset that evaluates human-preference alignment in long-text tasks. The maximum context length is 100K.
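As a rough illustration of the LV-Eval metric adjustment mentioned above, the sketch below replaces exact string matching with token-level recall against the reference answer. The threshold and the English-only tokenization are assumptions for illustration; the actual adjusted metric may differ:

import re

def _tokens(text: str) -> set[str]:
    """Lowercase and split on non-alphanumeric characters."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def lenient_match(prediction: str, answer: str, threshold: float = 0.5) -> bool:
    """Count a prediction as correct if it covers enough of the
    reference answer's tokens, instead of requiring an exact match."""
    ans = _tokens(answer)
    if not ans:
        return False
    recall = len(ans & _tokens(prediction)) / len(ans)
    return recall >= threshold

# An exact-match rule would reject this correct but differently
# phrased prediction; token-level recall accepts it.
print(lenient_match("The treaty was signed in 1648 in Westphalia.",
                    "1648, Westphalia"))  # True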

The results show that Qwen2.5-Turbo holds an advantage across these long-text tasks:

  • On the RULER benchmark, Qwen2.5-Turbo scores 93.1, surpassing not only GPT-4o-mini but also GPT-4, demonstrating its excellent capability on long-text tasks.
  • On more realistic long-text tasks such as LV-Eval and LongBench-Chat, Qwen2.5-Turbo outperforms GPT-4o-mini on most dimensions, and it can further scale to problems with contexts beyond 128K tokens.

Short-Text Tasks

Beyond the gains on long-sequence tasks, we also care about the model's performance on short-text tasks. Existing context-length extension methods often cause a substantial drop in short-text performance, so we paid particular attention to this issue while building Qwen2.5-Turbo, ensuring that extending the context length barely affects short-text capability.

Results on standard short-text benchmarks show that Qwen2.5-Turbo significantly outperforms previous open-source models with 1M-token context on most tasks; compared with GPT-4o-mini and Qwen2.5-14B-Instruct, Qwen2.5-Turbo delivers similar short-text performance while supporting a context 8 times as long (1M vs. 128K tokens).

Inference Speed

We measured the time to first token (TTFT) for inputs of various lengths under multiple model architectures. On 1M-token sequences, our sparse attention mechanism compresses the attention computation by roughly a factor of 12.5, achieving speedups of 3.2x to 4.3x across different hardware configurations.
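For reference, here is a minimal client-side sketch for measuring time to first token against the same OpenAI-compatible endpoint used in the API example above. It captures end-to-end latency, including network and queuing overhead, so its numbers will not match server-side figures exactly:

import os
import time

from openai import OpenAI

client = OpenAI(
    api_key=os.getenv("YOUR_API_KEY"),
    base_url="https://dashscope.aliyuncs.com/compatible-mode/v1",
)

def time_to_first_token(user_input: str, model: str = "qwen-turbo-latest") -> float:
    """Stream a completion and return the seconds elapsed until the
    first non-empty content chunk arrives."""
    start = time.monotonic()
    stream = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": user_input}],
        stream=True,
    )
    for chunk in stream:
        if chunk.choices and chunk.choices[0].delta.content:
            return time.monotonic() - start
    return float("nan")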

What's Next

While we are excited to finally extend Qwen2.5-Turbo's context to 1M tokens, we are well aware that the current model does not always solve real-world long-sequence tasks satisfactorily. Many challenges remain: for example, the model's performance on long-sequence tasks can be unstable, and inference cost makes it hard to serve larger models. Going forward, we will keep exploring human-preference alignment on long sequences, optimize inference efficiency to reduce compute time, and work toward larger and stronger long-context models. We hope to share new progress on long-context models with you soon. Stay tuned!