# Followup Suggestions
Qwen Code can predict what you want to type next and show it as ghost text in the input area. This feature uses an LLM call to analyze the conversation context and generate a natural next step suggestion.
This feature works end-to-end in the CLI. In the WebUI, the hook and UI plumbing are available, but host applications must trigger suggestion generation and wire the followup state for suggestions to appear.
## How It Works
After Qwen Code finishes responding, a suggestion appears as dimmed text in the input area after a short delay (~300ms). For example, after fixing a bug, you might see:

```
> run the tests
```

The suggestion is generated by sending the conversation history to the model, which predicts what you would naturally type next. If the response contains an explicit tip (e.g., "Tip: type post comments to publish findings"), the suggested action is extracted automatically.
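Tip extraction can be pictured as a simple pattern match over the response text. The following is a minimal sketch, not Qwen Code's actual implementation; the function name and regex are illustrative assumptions:

```typescript
// Illustrative sketch: pull the suggested action out of an explicit
// "Tip: type <action> to <outcome>" sentence in a model response.
// The pattern and function name are hypothetical, not Qwen Code's real code.
function extractTip(response: string): string | null {
  const match = response.match(/Tip:\s*type\s+(.+?)\s+to\s+\w+/i);
  return match ? match[1].trim() : null;
}
```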
## Accepting Suggestions
| Key | Action |
|---|---|
| Tab | Accept the suggestion and fill it into the input |
| Enter | Accept the suggestion and submit it immediately |
| Right Arrow | Accept the suggestion and fill it into the input |
| Any typing | Dismiss the suggestion and type normally |
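The key bindings above amount to a three-way dispatch. This sketch is purely illustrative (the type and function names are assumptions, not Qwen Code internals):

```typescript
// Hypothetical dispatch for ghost-text suggestion key handling.
type SuggestionAction = "fill" | "submit" | "dismiss";

function handleSuggestionKey(key: string): SuggestionAction {
  switch (key) {
    case "tab":
    case "right": // Right Arrow behaves like Tab
      return "fill"; // accept into the input without sending
    case "enter":
      return "submit"; // accept and send immediately
    default:
      return "dismiss"; // any other typing dismisses the suggestion
  }
}
```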
## When Suggestions Appear
Suggestions are generated when all of the following conditions are met:
- The model has completed its response (not during streaming)
- At least 2 model turns have occurred in the conversation
- There are no errors in the most recent response
- No confirmation dialogs are pending (e.g., shell confirmation, permissions)
- The approval mode is not set to `plan`
- The feature is enabled in settings (enabled by default)
Suggestions will not appear in non-interactive mode (e.g., headless/SDK mode).
Suggestions are automatically dismissed when:
- You start typing
- A new model turn begins
- The suggestion is accepted
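Taken together, the conditions above form a single eligibility predicate. Here is a minimal sketch of that check; the interface and field names are illustrative assumptions, not Qwen Code's actual state shape:

```typescript
// Hypothetical session state for the suggestion eligibility check.
interface SessionState {
  isStreaming: boolean; // model response still in progress
  modelTurns: number; // completed model turns in the conversation
  lastResponseHadError: boolean;
  pendingConfirmation: boolean; // e.g. shell confirmation, permissions
  approvalMode: "default" | "auto-edit" | "plan";
  suggestionsEnabled: boolean; // ui.enableFollowupSuggestions
}

function canSuggest(s: SessionState): boolean {
  return (
    !s.isStreaming &&
    s.modelTurns >= 2 &&
    !s.lastResponseHadError &&
    !s.pendingConfirmation &&
    s.approvalMode !== "plan" &&
    s.suggestionsEnabled
  );
}
```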
## Fast Model
By default, suggestions use the same model as your main conversation. For faster and cheaper suggestions, configure a dedicated fast model:
### Via command
```
/model --fast qwen3-coder-flash
```

Or use `/model --fast` (without a model name) to open a selection dialog.
### Via settings.json
```json
{
  "fastModel": "qwen3-coder-flash"
}
```

The fast model is used for prompt suggestions and speculative execution. When not configured, the main conversation model is used as a fallback.
Thinking/reasoning mode is automatically disabled for all background tasks (suggestion generation and speculation), regardless of your main model’s thinking configuration. This avoids wasting tokens on internal reasoning that isn’t needed for these tasks.
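The fallback behavior described above can be sketched in a few lines. This is an illustrative sketch only; the function name and settings shape are assumptions:

```typescript
// Hypothetical settings fragment; only the field relevant here is shown.
interface Settings {
  fastModel?: string; // e.g. "qwen3-coder-flash"; empty or unset means no fast model
}

// Resolve which model handles background tasks (suggestions, speculation):
// use fastModel if configured, otherwise fall back to the main model.
function resolveSuggestionModel(settings: Settings, mainModel: string): string {
  return settings.fastModel?.trim() || mainModel;
}
```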
## Configuration
These settings can be configured in settings.json:
| Setting | Type | Default | Description |
|---|---|---|---|
| ui.enableFollowupSuggestions | boolean | true | Enable or disable followup suggestions |
| ui.enableCacheSharing | boolean | true | Use cache-aware forked queries to reduce cost (experimental) |
| ui.enableSpeculation | boolean | false | Speculatively execute suggestions before submission (experimental) |
| fastModel | string | "" | Model for prompt suggestions and speculative execution |
### Example
```json
{
  "fastModel": "qwen3-coder-flash",
  "ui": {
    "enableFollowupSuggestions": true,
    "enableCacheSharing": true
  }
}
```

## Monitoring
Suggestion model usage appears in /stats output, showing tokens consumed by the fast model for suggestion generation.
The fast model is also shown in /about output under “Fast Model”.
## Suggestion Quality
Suggestions go through quality filters to ensure they are useful:
- Must be 2-12 words (CJK: 2-30 characters), under 100 characters total
- Cannot be evaluative (“looks good”, “thanks”)
- Cannot use AI voice (“Let me…”, “I’ll…”)
- Cannot be multiple sentences or contain formatting (markdown, newlines)
- Cannot be meta-commentary (“nothing to suggest”, “silence”)
- Cannot be error messages or prefixed labels (“Suggestion: …”)
- Single-word suggestions are only allowed for common commands (yes, commit, push, etc.)
- Slash commands (e.g., `/commit`) are always allowed as single-word suggestions
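A partial sketch of such a filter is shown below. It covers only a subset of the rules listed above, and every pattern, threshold check, and name in it is an illustrative assumption, not Qwen Code's actual filter:

```typescript
// Hypothetical, partial quality filter for candidate suggestions.
// Covers: length cap, markdown/newlines, prefixed labels, AI voice,
// word count, and the single-word allowances. Other rules are omitted.
const COMMON_SINGLE_WORDS = new Set(["yes", "commit", "push"]);

function passesQualityFilter(s: string): boolean {
  if (s.length >= 100) return false; // must be under 100 characters total
  if (/[\n*_`#]/.test(s)) return false; // no markdown or newlines
  if (/^(suggestion|tip):/i.test(s)) return false; // no prefixed labels
  if (/^(let me|i'll|i will)\b/i.test(s)) return false; // no AI voice
  const words = s.trim().split(/\s+/);
  if (words.length === 1) {
    // Single words only for slash commands or common commands
    return s.startsWith("/") || COMMON_SINGLE_WORDS.has(s.toLowerCase());
  }
  return words.length <= 12; // 2-12 words
}
```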