Qwen-Agent Schema Documentation

Overview

The qwen-agent schema provides a structured, type-safe messaging system to support advanced capabilities such as multimodal conversations, function calling, and reasoning chains. Built on Pydantic, it ensures data integrity during construction, validation, and serialization while enabling flexible representation of multimodal content (text, images, files, audio, and video).

Design Goals

  • Type Safety: Enforced validation via Pydantic models.
  • Multimodal Support: Messages can include heterogeneous media types.
  • Compatibility: Aligns with OpenAI-style message formats while extending support for arbitrary metadata via extra.
  • Developer Experience: Offers dictionary-like access (__getitem__, .get()), automatic exclusion of None fields during serialization, and intuitive debugging representations.

Core Constants

    # Role types
    SYSTEM = 'system'
    USER = 'user'
    ASSISTANT = 'assistant'
    FUNCTION = 'function'

    # Content types (used in ContentItem)
    TEXT = 'text'
    IMAGE = 'image'
    FILE = 'file'
    AUDIO = 'audio'
    VIDEO = 'video'

    # Message field names
    ROLE = 'role'
    CONTENT = 'content'
    REASONING_CONTENT = 'reasoning_content'
    NAME = 'name'

Base Class: BaseModelCompatibleDict

All schema classes inherit from BaseModelCompatibleDict, which extends pydantic.BaseModel with dictionary-like behavior.

Features

  • Dictionary-style access:
    msg = Message(role='user', content='Hello')
    print(msg['role'])  # → 'user'
  • Safe retrieval:
    msg.get('non_existent_key', 'default') # → 'default'
  • Clean serialization (automatically omits None fields):
    msg.model_dump() # fields with value=None are excluded by default
  • Readable string representation: str(msg) returns the string form of model_dump(), so None fields are omitted there as well.
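
To make the behavior listed above concrete, here is a minimal standard-library sketch of a dictionary-compatible record. It is an illustration only: DictLikeRecord is a hypothetical stand-in built on dataclasses, not the Pydantic-based BaseModelCompatibleDict, and its method bodies are assumptions about one simple way to get the same surface behavior.

```python
from dataclasses import asdict, dataclass
from typing import Any, Optional


@dataclass
class DictLikeRecord:
    """Hypothetical stand-in; not the qwen-agent implementation."""
    role: str
    content: str = ''
    name: Optional[str] = None

    def __getitem__(self, key: str) -> Any:
        # record['role'] is forwarded to ordinary attribute lookup.
        return getattr(self, key)

    def get(self, key: str, default: Any = None) -> Any:
        # Unknown keys fall back to the default instead of raising.
        return getattr(self, key, default)

    def model_dump(self) -> dict:
        # Drop every field whose value is None.
        return {k: v for k, v in asdict(self).items() if v is not None}

    def __str__(self) -> str:
        # The string form is the string form of the dumped dict.
        return str(self.model_dump())


msg = DictLikeRecord(role='user', content='Hello')
print(msg['role'])                             # → user
print(msg.get('non_existent_key', 'default'))  # → default
print(msg)                                     # → {'role': 'user', 'content': 'Hello'}
```

Because name is None, it does not appear in the dumped dict or in the printed form.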

Core Schema Classes

1. FunctionCall

Represents a function invocation proposed or executed by the model.

    class FunctionCall(BaseModelCompatibleDict):
        name: str
        arguments: str  # JSON-encoded string

Example:

    fc = FunctionCall(name='get_weather', arguments='{"city": "Beijing"}')
    print(fc.name)       # → 'get_weather'
    print(fc.arguments)  # → '{"city": "Beijing"}'
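
Since arguments is a JSON-encoded string rather than a dict, code that consumes a function call usually has to decode it first. The sketch below works on the plain-dict form of the FunctionCall shown above and uses only the standard library; parse_arguments is a hypothetical helper name, not part of qwen-agent.

```python
import json

# Plain-dict form of the FunctionCall from the example above.
function_call = {'name': 'get_weather', 'arguments': '{"city": "Beijing"}'}


def parse_arguments(raw: str) -> dict:
    """Hypothetical helper: decode a JSON-encoded arguments string."""
    try:
        parsed = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f'arguments is not valid JSON: {exc}') from exc
    if not isinstance(parsed, dict):
        # Keyword arguments need a JSON object, not a list or scalar.
        raise ValueError('arguments must decode to a JSON object')
    return parsed


args = parse_arguments(function_call['arguments'])
print(args['city'])  # → Beijing
```

Validating at this boundary is a design choice: model-generated argument strings can be malformed, so failing early with a clear error tends to be easier to debug than a failure inside the called function.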

2. ContentItem

Represents a single piece of multimodal content. Exactly one of its fields (text, image, file, audio, video) must be provided.

    class ContentItem(BaseModelCompatibleDict):
        text: Optional[str] = None
        image: Optional[str] = None  # e.g., base64-encoded data or URL
        file: Optional[str] = None   # file path or URL
        audio: Optional[Union[str, dict]] = None
        video: Optional[Union[str, list]] = None

Properties

  • .type → 'text' | 'image' | 'file' | 'audio' | 'video'
  • .value → the associated value, typically a string (e.g., base64 data, URL, file path); per the field types above, audio may also hold a dict and video a list

Validation

  • Mutual Exclusivity: Exactly one field must be non-None. Providing no field, or more than one, raises a ValueError.

Examples:

    # Valid
    txt = ContentItem(text='Hello')
    img = ContentItem(image='https://example.jpg')
    img = ContentItem(image='data:image/png;base64,...')

    # Invalid (raises ValueError)
    bad = ContentItem(text='Hi', image='...')  # ❌ "Exactly one ... must be provided"
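
The exactly-one rule and the .type / .value pair can be sketched for dict-form content items as follows. This is a standard-library illustration of the rule as described above, not the library's validator; type_and_value and CONTENT_FIELDS are hypothetical names.

```python
from typing import Any, Dict, Tuple

CONTENT_FIELDS = ('text', 'image', 'file', 'audio', 'video')


def type_and_value(item: Dict[str, Any]) -> Tuple[str, Any]:
    """Hypothetical helper: return (type, value) for a dict-form content item."""
    # Collect every known field that is present and not None.
    provided = [(k, item[k]) for k in CONTENT_FIELDS if item.get(k) is not None]
    if len(provided) != 1:
        raise ValueError(
            f'Exactly one of {CONTENT_FIELDS} must be provided, got {len(provided)}')
    return provided[0]


print(type_and_value({'text': 'Hello'}))  # → ('text', 'Hello')

try:
    type_and_value({'text': 'Hi', 'image': '...'})
except ValueError as exc:
    print('rejected:', exc)
```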

3. Message

Represents a single message in a conversation, supporting multimodal content and function calls.

    class Message(BaseModelCompatibleDict):
        role: Literal['system', 'user', 'assistant', 'function']
        content: Union[str, List[ContentItem]]
        reasoning_content: Optional[Union[str, List[ContentItem]]] = None
        name: Optional[str] = None
        function_call: Optional[FunctionCall] = None
        extra: Optional[dict] = None

Field Descriptions

| Field | Type | Description |
| --- | --- | --- |
| role | str | Must be one of: 'system', 'user', 'assistant', 'function' |
| content | str or List[ContentItem] | Primary message content: plain text or a list of multimodal items |
| reasoning_content | Optional | Stores the model's reasoning trace (e.g., chain-of-thought), same format as content |
| name | Optional str | When role == 'function', identifies the called function |
| function_call | Optional FunctionCall | When role == 'assistant', indicates a suggested function to invoke |
| extra | Optional dict | Arbitrary metadata (e.g., token counts, logs, custom annotations) |

Construction Notes

  • If content is None, it is automatically set to an empty string ''.
  • The role field is validated to ensure it matches one of the allowed values.
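
For code that handles messages as plain dicts, the two construction rules above can be mirrored with a small normalizer. This is a sketch under the assumption that dict-form messages use the same field names as Message; normalize_message and ALLOWED_ROLES are hypothetical names, not qwen-agent APIs.

```python
ALLOWED_ROLES = ('system', 'user', 'assistant', 'function')


def normalize_message(raw: dict) -> dict:
    """Hypothetical helper: apply the two construction rules to a dict message."""
    role = raw.get('role')
    if role not in ALLOWED_ROLES:
        raise ValueError(f'role must be one of {ALLOWED_ROLES}, got {role!r}')
    msg = dict(raw)  # copy so the caller's dict is not mutated
    if msg.get('content') is None:
        msg['content'] = ''  # missing or None content becomes an empty string
    return msg


print(normalize_message({'role': 'assistant', 'content': None}))
# → {'role': 'assistant', 'content': ''}
```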

Examples:

Plain text message:

msg = Message(role='user', content='What is the weather in Tokyo?')

Multimodal input (text + image):

    content = [
        ContentItem(text='Describe this image:'),
        ContentItem(image='https://example.com/cat.jpg')
    ]
    msg = Message(role='user', content=content)
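
Because content may be either a plain string or a list of items, consumers generally need to handle both shapes. The following standard-library sketch extracts the text portion from dict-form content; extract_text is a hypothetical helper, and joining text items with newlines is an arbitrary choice made for the illustration.

```python
from typing import Any, Dict, List, Union

Content = Union[str, List[Dict[str, Any]]]


def extract_text(content: Content) -> str:
    """Hypothetical helper: return only the text portion of a message's content."""
    if isinstance(content, str):
        return content
    # Keep the text items and skip images, files, audio, and video.
    return '\n'.join(item['text'] for item in content if item.get('text') is not None)


multimodal = [
    {'text': 'Describe this image:'},
    {'image': 'https://example.com/cat.jpg'},
]
print(extract_text('What is the weather in Tokyo?'))  # → What is the weather in Tokyo?
print(extract_text(multimodal))                       # → Describe this image:
```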

Assistant initiating a function call:

    msg = Message(
        role='assistant',
        content='',
        function_call=FunctionCall(name='get_weather', arguments='{"city": "Tokyo"}')
    )

Function response:

    msg = Message(
        role='function',
        name='get_weather',
        content='{"temperature": 25, "unit": "Celsius"}'
    )
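
The assistant call and the function response above form a round trip. A rough sketch of that loop, using dict-form messages and only the standard library, might look like this. The get_weather implementation, the TOOLS registry, and run_function_call are all hypothetical and return canned data; qwen-agent's own tool dispatch is not shown here.

```python
import json
from typing import Any, Callable, Dict


def get_weather(city: str) -> dict:
    """Hypothetical local tool; returns canned data for illustration."""
    return {'temperature': 25, 'unit': 'Celsius'}


# Hypothetical registry mapping function names to callables.
TOOLS: Dict[str, Callable[..., Any]] = {'get_weather': get_weather}


def run_function_call(assistant_msg: dict) -> dict:
    """Execute the suggested call and wrap the result as a 'function' message."""
    call = assistant_msg['function_call']
    func = TOOLS[call['name']]
    result = func(**json.loads(call['arguments']))  # arguments is a JSON string
    return {'role': 'function', 'name': call['name'], 'content': json.dumps(result)}


assistant_msg = {
    'role': 'assistant',
    'content': '',
    'function_call': {'name': 'get_weather', 'arguments': '{"city": "Tokyo"}'},
}
reply = run_function_call(assistant_msg)
print(reply['content'])  # → {"temperature": 25, "unit": "Celsius"}
```

The result is serialized back to a JSON string because content on a function message is text, matching the function response example above.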

Response with reasoning trace:

    msgs = [
        Message(
            role='assistant',
            content='',
            reasoning_content='Step 1: Identify the city. Step 2: Fetch weather data...'
        ),
        Message(
            role='assistant',
            content='It is 25 degrees Celsius',
        )
    ]

Compatibility Notes

  • Qwen-Agent receives and returns a list of messages. Within a response, reasoning_content, content, and function_call are stored in separate messages.
  • When calling the model service, Qwen-Agent converts the messages into the format that service expects (such as the OpenAI Chat Completions format).
  • The return type mirrors the input type: if messages are passed to an agent or LLM in JSON (dict) format, the response is returned in the same format; if they are passed as Pydantic models, the response is returned as Pydantic models.
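
Since a response arrives as a list in which reasoning, answer text, and function calls sit in separate messages, a consumer typically walks the list and sorts the pieces. The sketch below does this for dict-form messages with string-valued content; split_response is a hypothetical helper and the sample response is made up for illustration.

```python
from typing import Any, Dict, List


def split_response(messages: List[Dict[str, Any]]) -> Dict[str, list]:
    """Hypothetical helper: group a response's messages by what they carry."""
    parts: Dict[str, list] = {'reasoning': [], 'answers': [], 'function_calls': []}
    for msg in messages:
        if msg.get('reasoning_content'):
            parts['reasoning'].append(msg['reasoning_content'])
        if msg.get('function_call'):
            parts['function_calls'].append(msg['function_call'])
        if msg.get('content'):  # empty strings are skipped
            parts['answers'].append(msg['content'])
    return parts


response = [
    {'role': 'assistant', 'content': '', 'reasoning_content': 'Step 1: Identify the city.'},
    {'role': 'assistant', 'content': 'It is 25 degrees Celsius'},
]
parts = split_response(response)
print(parts['reasoning'])  # → ['Step 1: Identify the city.']
print(parts['answers'])    # → ['It is 25 degrees Celsius']
```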
