
Followup Suggestions

Qwen Code can predict what you want to type next and show it as ghost text in the input area. This feature uses an LLM call to analyze the conversation context and generate a natural next step suggestion.

This feature works end-to-end in the CLI. In the WebUI, the hook and UI plumbing are available, but host applications must trigger suggestion generation and wire the followup state for suggestions to appear.

How It Works

After Qwen Code finishes responding, a suggestion appears as dimmed text in the input area after a short delay (~300ms). For example, after fixing a bug, you might see:

> run the tests

The suggestion is generated by sending the conversation history to the model, which predicts what you would naturally type next. If the response contains an explicit tip (e.g., “Tip: type post comments to publish findings”), the suggested action is extracted from the tip automatically.

Accepting Suggestions

| Key | Action |
| --- | --- |
| Tab | Accept the suggestion and fill it into the input |
| Enter | Accept the suggestion and submit it immediately |
| Right Arrow | Accept the suggestion and fill it into the input |
| Any typing | Dismiss the suggestion and type normally |

When Suggestions Appear

Suggestions are generated when all of the following conditions are met:

  • The model has completed its response (not during streaming)
  • At least 2 model turns have occurred in the conversation
  • There are no errors in the most recent response
  • No confirmation dialogs are pending (e.g., shell confirmation, permissions)
  • The approval mode is not set to plan
  • The feature is enabled in settings (enabled by default)

Suggestions will not appear in non-interactive mode (e.g., headless/SDK mode).
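The gating conditions above amount to a single conjunction over session state. The sketch below mirrors them for clarity; the `SessionState` field names are hypothetical and chosen for illustration, not taken from the Qwen Code source.

```python
from dataclasses import dataclass

@dataclass
class SessionState:
    # Hypothetical field names, one per condition listed above.
    streaming: bool            # model response still in progress
    model_turns: int           # completed model turns so far
    last_response_errored: bool
    dialog_pending: bool       # e.g., shell confirmation, permissions
    approval_mode: str         # e.g., "default" or "plan"
    suggestions_enabled: bool  # settings toggle (default: enabled)
    interactive: bool          # False in headless/SDK mode

def should_suggest(s: SessionState) -> bool:
    """True only when every documented condition is satisfied."""
    return (
        not s.streaming
        and s.model_turns >= 2
        and not s.last_response_errored
        and not s.dialog_pending
        and s.approval_mode != "plan"
        and s.suggestions_enabled
        and s.interactive
    )
```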

Suggestions are automatically dismissed when:

  • You start typing
  • A new model turn begins
  • The suggestion is accepted

Fast Model

By default, suggestions use the same model as your main conversation. For faster and cheaper suggestions, configure a dedicated fast model:

Via command

/model --fast qwen3-coder-flash

Or use /model --fast (without a model name) to open a selection dialog.

Via settings.json

{ "fastModel": "qwen3-coder-flash" }

The fast model is used for prompt suggestions and speculative execution. When no fast model is configured, the main conversation model is used as a fallback.
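The fallback rule above is simple: prefer `fastModel` when set, otherwise use the main model. A minimal sketch, assuming the settings are available as a plain dict with a hypothetical `model` key for the main conversation model:

```python
def pick_background_model(settings: dict) -> str:
    """Choose the model for suggestions and speculation.

    'fastModel' is the documented setting; an empty or missing value
    falls back to the main conversation model (key name assumed here).
    """
    return settings.get("fastModel") or settings["model"]
```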

Thinking/reasoning mode is automatically disabled for all background tasks (suggestion generation and speculation), regardless of your main model’s thinking configuration. This avoids wasting tokens on internal reasoning that isn’t needed for these tasks.

Configuration

These settings can be configured in settings.json:

| Setting | Type | Default | Description |
| --- | --- | --- | --- |
| ui.enableFollowupSuggestions | boolean | true | Enable or disable followup suggestions |
| ui.enableCacheSharing | boolean | true | Use cache-aware forked queries to reduce cost (experimental) |
| ui.enableSpeculation | boolean | false | Speculatively execute suggestions before submission (experimental) |
| fastModel | string | "" | Model for prompt suggestions and speculative execution |

Example

{
  "fastModel": "qwen3-coder-flash",
  "ui": {
    "enableFollowupSuggestions": true,
    "enableCacheSharing": true
  }
}

Monitoring

Suggestion model usage appears in /stats output, showing tokens consumed by the fast model for suggestion generation.

The fast model is also shown in /about output under “Fast Model”.

Suggestion Quality

Suggestions go through quality filters to ensure they are useful:

  • Must be 2-12 words (CJK: 2-30 characters), under 100 characters total
  • Cannot be evaluative (“looks good”, “thanks”)
  • Cannot use AI voice (“Let me…”, “I’ll…”)
  • Cannot be multiple sentences or contain formatting (markdown, newlines)
  • Cannot be meta-commentary (“nothing to suggest”, “silence”)
  • Cannot be error messages or prefixed labels (“Suggestion: …”)
  • Single-word suggestions are only allowed for common commands (yes, commit, push, etc.)
  • Slash commands (e.g., /commit) are always allowed as single-word suggestions
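The filters above can be summarized as a single predicate. The sketch below approximates them for illustration; the exact rules, word lists, and CJK handling in Qwen Code differ, and the `COMMON_SINGLE_WORDS` set here is only a subset of the examples given above.

```python
COMMON_SINGLE_WORDS = {"yes", "commit", "push"}  # illustrative subset only

def passes_quality_filters(text: str) -> bool:
    """Approximate the documented quality filters (not the real code)."""
    if len(text) >= 100:
        return False                      # under 100 characters total
    if "\n" in text or "`" in text or "#" in text:
        return False                      # no formatting or newlines
    lowered = text.lower()
    if lowered.startswith(("let me", "i'll", "suggestion:")):
        return False                      # no AI voice or prefixed labels
    words = text.split()
    if len(words) == 1:
        # Single words: only slash commands or common commands.
        return text.startswith("/") or lowered in COMMON_SINGLE_WORDS
    return 2 <= len(words) <= 12          # 2-12 word range
```

For example, `passes_quality_filters("run the tests")` is accepted, while `"Let me check the logs"` is rejected for AI voice and a bare `"deploy"` is rejected as an uncommon single word.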