
Prompt Suggestion (NES) Design

Predicts what the user would naturally type next after the AI completes a response, showing it as ghost text in the input prompt.

Implementation status: prompt-suggestion-implementation.md. Speculation engine: speculation-design.md.

Overview

A prompt suggestion (Next-step Suggestion / NES) is a short prediction (2-12 words) of the user’s next input, generated by an LLM call after each AI response. It appears as ghost text in the input prompt. The user can accept it with Tab/Enter/Right Arrow or dismiss it by typing.

Architecture

```
┌─────────────────────────────────────────────────────────────┐
│ AppContainer (CLI)                                          │
│                                                             │
│   Responding → Idle transition                              │
│        │                                                    │
│        ▼                                                    │
│   ┌─────────────────────────────────────────────────────┐   │
│   │ Guard Conditions (11 categories)                    │   │
│   │ settings, interactive, sdk, plan mode, dialogs,     │   │
│   │ elicitation, API error                              │   │
│   └────────────────────┬────────────────────────────────┘   │
│                        │                                    │
│                        ▼                                    │
│   ┌─────────────────────────────────────────────────────┐   │
│   │ generatePromptSuggestion()                          │   │
│   │                                                     │   │
│   │   ┌─── CacheSafeParams available? ───┐              │   │
│   │   │                                  │              │   │
│   │   ▼ YES                           NO ▼              │   │
│   │ runForkedQuery()     BaseLlmClient.generateJson()   │   │
│   │ (cache-aware)        (standalone fallback)          │   │
│   │                                                     │   │
│   │ ──── SUGGESTION_PROMPT ────                         │   │
│   │ ──── 12 filter rules ──────                         │   │
│   │ ──── getFilterReason() ────                         │   │
│   └────────────────────┬────────────────────────────────┘   │
│                        │                                    │
│                        ▼                                    │
│   ┌─────────────────────────────────────────────────────┐   │
│   │ FollowupController (framework-agnostic)             │   │
│   │ 300ms delay → show as ghost text                    │   │
│   │                                                     │   │
│   │   Tab   → accept (fill input)                       │   │
│   │   Enter → accept + submit                           │   │
│   │   Right → accept (fill input)                       │   │
│   │   Type  → dismiss + abort speculation               │   │
│   └─────────────────────────────────────────────────────┘   │
│                                                             │
│   ┌─────────────────────────────────────────────────────┐   │
│   │ Telemetry (PromptSuggestionEvent)                   │   │
│   │ outcome, accept_method, timing, similarity,         │   │
│   │ keystroke, focus, suppression reason, prompt_id     │   │
│   └─────────────────────────────────────────────────────┘   │
└─────────────────────────────────────────────────────────────┘
```

Suggestion Generation

LLM Prompt

```
[SUGGESTION MODE: Suggest what the user might naturally type next.]

FIRST: Read the LAST FEW LINES of the assistant's most recent message —
that's where next-step hints, tips, and actionable suggestions usually appear.
Then check the user's recent messages and original request.

Your job is to predict what THEY would type - not what you think they should do.

THE TEST: Would they think "I was just about to type that"?

PRIORITY: If the assistant's last message contains a tip or hint like
"Tip: type X to ..." or "type X to ...", extract X as the suggestion.
These are explicit next-step hints.

EXAMPLES:
  Assistant says "Tip: type post comments to publish findings" → "post comments"
  Assistant says "type /review to start" → "/review"
  User asked "fix the bug and run tests", bug is fixed → "run the tests"
  After code written → "try it out"
  Task complete, obvious follow-up → "commit this" or "push it"

Format: 2-12 words, match the user's style. Or nothing.
Reply with ONLY the suggestion, no quotes or explanation.
```

Filter Rules (12)

| Rule | Example blocked |
| --- | --- |
| done | "done" |
| meta_text | "nothing found", "no suggestion", "silence" |
| meta_wrapped | "(silence)", "[no suggestion]" |
| error_message | "api error: 500" |
| prefixed_label | "Suggestion: commit" |
| too_few_words | "hmm" (but allows "yes", "commit", "push" etc.) |
| too_many_words | > 12 words |
| too_long | >= 100 chars |
| multiple_sentences | "Run tests. Then commit." |
| has_formatting | newlines, markdown bold |
| evaluative | "looks good", "thanks" (with \b word boundaries) |
| ai_voice | "Let me…", "I'll…", "Here's…" |
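A hedged sketch of how `getFilterReason()` could apply these rules in order, returning the first matching rule name (used as the `reason` field in suppressed telemetry) or `null` when the suggestion passes. The regexes, the rule ordering, and the short-word allowlist are illustrative assumptions; only the rule names come from the table above.

```typescript
// Words exempt from the too_few_words rule (illustrative allowlist).
const SHORT_ALLOWLIST = new Set(['yes', 'no', 'ok', 'commit', 'push']);

// Returns the name of the first matching filter rule, or null if the
// suggestion is clean and can be shown as ghost text.
function getFilterReason(text: string): string | null {
  const t = text.trim();
  const words = t.split(/\s+/).filter(Boolean);

  if (/^done[.!]?$/i.test(t)) return 'done';
  if (/\b(nothing found|no suggestion|silence)\b/i.test(t)) return 'meta_text';
  if (/^[\[(].*[\])]$/.test(t)) return 'meta_wrapped';
  if (/\bapi error\b/i.test(t)) return 'error_message';
  if (/^suggestion:/i.test(t)) return 'prefixed_label';
  if (words.length < 2 && !SHORT_ALLOWLIST.has(t.toLowerCase())) {
    return 'too_few_words';
  }
  if (words.length > 12) return 'too_many_words';
  if (t.length >= 100) return 'too_long';
  if (/[.!?]\s+\S/.test(t)) return 'multiple_sentences';
  if (/\n|\*\*/.test(t)) return 'has_formatting';
  if (/\b(looks good|thanks)\b/i.test(t)) return 'evaluative';
  if (/^(let me|i'll|here's)\b/i.test(t)) return 'ai_voice';
  return null;
}
```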

Guard Conditions

AppContainer useEffect (13 checks in code):

| Guard | Check |
| --- | --- |
| Settings toggle | `enableFollowupSuggestions` |
| Non-interactive | `config.isInteractive()` |
| SDK mode | `!config.getSdkMode()` |
| Streaming transition | Responding → Idle (2 checks) |
| API error (history) | `historyManager.history[last]?.type !== 'error'` |
| API error (pending) | `!pendingGeminiHistoryItems.some(type === 'error')` |
| Confirmation dialogs | shell + general + loop detection (3 checks) |
| Permission dialog | `isPermissionsDialogOpen` |
| Elicitation | `settingInputRequests.length === 0` |
| Plan mode | `ApprovalMode.PLAN` |
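The guard block above reduces to a single conjunction. The sketch below is a hypothetical standalone helper (in the real code these checks live inline in the `AppContainer` useEffect); the `ctx` shape is an assumption made so the predicate is self-contained.

```typescript
// Illustrative context object: each field mirrors one row of the guard table.
interface GuardContext {
  enableFollowupSuggestions: boolean; // settings toggle
  isInteractive: boolean;             // config.isInteractive()
  sdkMode: boolean;                   // config.getSdkMode()
  transitionedToIdle: boolean;        // Responding → Idle edge detected
  lastHistoryType: string | undefined;
  pendingHistoryTypes: string[];
  confirmationDialogOpen: boolean;    // shell + general + loop detection
  isPermissionsDialogOpen: boolean;
  settingInputRequests: unknown[];    // elicitation
  approvalMode: string;               // 'plan' blocks suggestions
}

// True only when every guard passes and a suggestion may be generated.
function shouldGenerateSuggestion(ctx: GuardContext): boolean {
  return (
    ctx.enableFollowupSuggestions &&
    ctx.isInteractive &&
    !ctx.sdkMode &&
    ctx.transitionedToIdle &&
    ctx.lastHistoryType !== 'error' &&
    !ctx.pendingHistoryTypes.includes('error') &&
    !ctx.confirmationDialogOpen &&
    !ctx.isPermissionsDialogOpen &&
    ctx.settingInputRequests.length === 0 &&
    ctx.approvalMode !== 'plan'
  );
}
```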

Inside generatePromptSuggestion():

| Guard | Check |
| --- | --- |
| Early conversation | `modelTurns < 2` |

Separate feature flags (not in guard block):

| Flag | Controls |
| --- | --- |
| enableCacheSharing | Whether to use forked query or fall back to generateJson |
| enableSpeculation | Whether to start speculation on suggestion display |
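Per the architecture diagram, the generation path branches on whether cache-safe parameters are available; `enableCacheSharing` gates the cache-aware branch entirely. A hypothetical helper making that routing explicit (the actual branch lives inside `generatePromptSuggestion()`, not in a named function):

```typescript
type GenerationPath = 'forked_query' | 'base_llm';

// Decide which generation path to take for the suggestion call.
// Falls back to the standalone BaseLlmClient.generateJson() path when
// cache sharing is disabled or no CacheSafeParams are available.
function chooseGenerationPath(
  enableCacheSharing: boolean,
  cacheSafeParamsAvailable: boolean,
): GenerationPath {
  return enableCacheSharing && cacheSafeParamsAvailable
    ? 'forked_query'
    : 'base_llm';
}
```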

State Management

FollowupState

```typescript
interface FollowupState {
  suggestion: string | null;
  isVisible: boolean;
  shownAt: number; // timestamp for telemetry
}
```

FollowupController

Framework-agnostic controller shared by CLI (Ink) and WebUI (React):

  • setSuggestion(text) — 300ms delayed show, null clears immediately
  • accept(method) — clears state, fires onAccept via microtask, 100ms debounce lock
  • dismiss() — clears state, logs ignored telemetry
  • clear() — hard reset all state + timers
  • Object.freeze(INITIAL_FOLLOWUP_STATE) prevents accidental mutation
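A minimal sketch of the controller contract described above, assuming a simple observer-style `onChange`/`onAccept` wiring (the real controller's constructor signature and telemetry hooks may differ; the 300ms show delay, microtask-deferred accept, and 100ms debounce lock come from this document):

```typescript
type FollowupState = {
  suggestion: string | null;
  isVisible: boolean;
  shownAt: number; // timestamp for telemetry
};

const INITIAL_FOLLOWUP_STATE: FollowupState = Object.freeze({
  suggestion: null,
  isVisible: false,
  shownAt: 0,
});

type AcceptMethod = 'tab' | 'enter' | 'right';

class FollowupController {
  private state: FollowupState = INITIAL_FOLLOWUP_STATE;
  private showTimer: ReturnType<typeof setTimeout> | null = null;
  private acceptLockedUntil = 0;

  constructor(
    private onChange: (s: FollowupState) => void,
    private onAccept: (text: string, method: AcceptMethod) => void,
  ) {}

  // 300ms delayed show; null clears immediately.
  setSuggestion(text: string | null): void {
    this.cancelTimer();
    if (text === null) {
      this.update(INITIAL_FOLLOWUP_STATE);
      return;
    }
    this.showTimer = setTimeout(() => {
      this.update({ suggestion: text, isVisible: true, shownAt: Date.now() });
    }, 300);
  }

  // Clears state, fires onAccept via microtask, 100ms debounce lock.
  accept(method: AcceptMethod): void {
    const now = Date.now();
    if (now < this.acceptLockedUntil || !this.state.suggestion) return;
    this.acceptLockedUntil = now + 100;
    const text = this.state.suggestion;
    this.update(INITIAL_FOLLOWUP_STATE);
    queueMicrotask(() => this.onAccept(text, method));
  }

  // Clears state (ignored-telemetry logging omitted in this sketch).
  dismiss(): void {
    this.cancelTimer();
    this.update(INITIAL_FOLLOWUP_STATE);
  }

  // Hard reset of all state and timers.
  clear(): void {
    this.cancelTimer();
    this.acceptLockedUntil = 0;
    this.update(INITIAL_FOLLOWUP_STATE);
  }

  private cancelTimer(): void {
    if (this.showTimer !== null) {
      clearTimeout(this.showTimer);
      this.showTimer = null;
    }
  }

  private update(next: FollowupState): void {
    this.state = next;
    this.onChange(next);
  }
}
```

Deferring `onAccept` to a microtask keeps the state update and the accept side effect (filling the input, possibly submitting) from interleaving within the same render pass.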

Keyboard Interaction

| Key | CLI | WebUI |
| --- | --- | --- |
| Tab | Fill input (no submit) | Fill input (no submit) |
| Enter | Fill + submit | Fill + submit (`explicitText` param) |
| Right Arrow | Fill input (no submit) | Fill input (no submit) |
| Typing | Dismiss + abort speculation | Dismiss |
| Paste | Dismiss + abort speculation | Dismiss |

Key Binding Note

The Tab handler uses key.name === 'tab' explicitly (not ACCEPT_SUGGESTION matcher) because ACCEPT_SUGGESTION also matches Enter, which must fall through to the SUBMIT handler.
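The ordering constraint above can be sketched as a small dispatch function. This is an illustrative reduction, not the real handler: key names follow Ink's convention (`'return'` for Enter), and the action labels are placeholders.

```typescript
type KeyAction = 'accept' | 'accept_and_submit' | 'submit' | 'dismiss' | 'none';

// Tab and Right are matched by key name so that Enter is never consumed
// by a generic accept-suggestion matcher and always reaches submit logic.
function dispatchKey(keyName: string, suggestionVisible: boolean): KeyAction {
  if (suggestionVisible && (keyName === 'tab' || keyName === 'right')) {
    return 'accept'; // fill input, no submit
  }
  if (keyName === 'return') {
    // With a suggestion showing, Enter accepts AND submits; otherwise
    // it falls through to the normal SUBMIT handler.
    return suggestionVisible ? 'accept_and_submit' : 'submit';
  }
  if (suggestionVisible) return 'dismiss'; // any other typing dismisses
  return 'none';
}
```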

Telemetry

PromptSuggestionEvent

| Field | Type | Description |
| --- | --- | --- |
| outcome | accepted / ignored / suppressed | Final outcome |
| prompt_id | string | Default: 'user_intent' |
| accept_method | tab / enter / right | How user accepted |
| time_to_accept_ms | number | Time from shown to accept |
| time_to_ignore_ms | number | Time from shown to dismiss |
| time_to_first_keystroke_ms | number | Time to first keystroke while shown |
| suggestion_length | number | Character count |
| similarity | number | 1.0 for accept, 0.0 for ignore |
| was_focused_when_shown | boolean | Terminal had focus |
| reason | string | For suppressed: filter rule name |
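The field table above can be expressed as a type. Treating the method, timing, and reason fields as optional is an assumption (only the fields relevant to a given outcome would be populated); the actual event class may declare them differently.

```typescript
// Type sketch assembled from the PromptSuggestionEvent field table.
type PromptSuggestionEvent = {
  outcome: 'accepted' | 'ignored' | 'suppressed';
  prompt_id: string; // default: 'user_intent'
  accept_method?: 'tab' | 'enter' | 'right';
  time_to_accept_ms?: number;
  time_to_ignore_ms?: number;
  time_to_first_keystroke_ms?: number;
  suggestion_length: number;
  similarity: number; // 1.0 for accept, 0.0 for ignore
  was_focused_when_shown: boolean;
  reason?: string; // filter rule name, for suppressed outcomes
};
```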

SpeculationEvent

| Field | Type | Description |
| --- | --- | --- |
| outcome | accepted / aborted / failed | Speculation result |
| turns_used | number | API round-trips |
| files_written | number | Files in overlay |
| tool_use_count | number | Tools executed |
| duration_ms | number | Wall-clock time |
| boundary_type | string | What stopped speculation |
| had_pipelined_suggestion | boolean | Next suggestion generated |

Feature Flags and Settings

| Setting | Type | Default | Description |
| --- | --- | --- | --- |
| enableFollowupSuggestions | boolean | true | Master toggle for prompt suggestions |
| enableCacheSharing | boolean | true | Use cache-aware forked queries |
| enableSpeculation | boolean | false | Predictive execution engine |
| fastModel (top-level) | string | "" | Model for all background tasks (empty = use main model). Set via /model --fast |

Internal Prompt ID Filtering

Background operations use dedicated prompt IDs (INTERNAL_PROMPT_IDS in utils/internalPromptIds.ts) to prevent their API traffic and tool calls from appearing in the user-visible UI:

| Prompt ID | Used by |
| --- | --- |
| prompt_suggestion | Suggestion generation |
| forked_query | Cache-aware forked queries |
| speculation | Speculation engine |

Filtering applied:

  • loggingContentGenerator — skips logApiRequest and OpenAI interaction logging for internal IDs
  • logApiResponse / logApiError — skips chatRecordingService.recordUiTelemetryEvent
  • logToolCall — skips chatRecordingService.recordUiTelemetryEvent
  • uiTelemetryService.addEvent — not filtered (ensures /stats token tracking works)
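A sketch of the internal-ID check the filtering hinges on, assuming `INTERNAL_PROMPT_IDS` is a set of the three IDs in the table above (the actual export in `utils/internalPromptIds.ts` may have a different shape):

```typescript
// IDs whose API traffic and tool calls are hidden from the user-visible UI.
const INTERNAL_PROMPT_IDS = new Set([
  'prompt_suggestion',
  'forked_query',
  'speculation',
]);

function isInternalPromptId(promptId: string): boolean {
  return INTERNAL_PROMPT_IDS.has(promptId);
}

// Example gate, as used by the logging paths listed above: UI-visible
// recording is skipped for internal IDs, while aggregate token tracking
// (uiTelemetryService.addEvent) deliberately bypasses this check.
function shouldRecordInUi(promptId: string): boolean {
  return !isInternalPromptId(promptId);
}
```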

Thinking Mode

Thinking/reasoning is explicitly disabled (thinkingConfig: { includeThoughts: false }) for all background task paths:

  • Forked query path (createForkedChat) — overrides thinkingConfig in the cloned generationConfig, covering both suggestion generation and speculation
  • BaseLlm fallback path (generateViaBaseLlm) — per-request config overrides base content generator’s thinking settings

This is safe because:

  • Cache prefix is determined by systemInstruction + tools + history, not thinkingConfig — cache hits are unaffected
  • All backends (Gemini, OpenAI-compatible, Anthropic) handle includeThoughts: false by omitting the thinking field — no API errors on models without thinking support
  • Suggestion generation and speculation don’t benefit from reasoning tokens
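The override applied by both background paths can be sketched as a pure config transform. The `GenerationConfig` shape here is a minimal stand-in (only `thinkingConfig` is taken from this document; other fields are placeholders):

```typescript
interface GenerationConfig {
  temperature?: number;
  maxOutputTokens?: number;
  thinkingConfig?: { includeThoughts: boolean };
}

// Clone the parent generation config but force thinking off, as both
// the forked-query path and the BaseLlm fallback path do for background
// tasks. Cache hits are unaffected because the cache prefix ignores
// thinkingConfig.
function withThinkingDisabled(cfg: GenerationConfig): GenerationConfig {
  return { ...cfg, thinkingConfig: { includeThoughts: false } };
}
```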