Prompt Suggestion (NES) Design
Predicts what the user would naturally type next after the AI completes a response, showing it as ghost text in the input prompt.
Implementation status: prompt-suggestion-implementation.md. Speculation engine: speculation-design.md.
Overview
A prompt suggestion (Next-step Suggestion / NES) is a short prediction (2-12 words) of the user’s next input, generated by an LLM call after each AI response. It appears as ghost text in the input prompt. The user can accept it with Tab/Enter/Right Arrow or dismiss it by typing.
Architecture
┌─────────────────────────────────────────────────────────────┐
│ AppContainer (CLI) │
│ │
│ Responding → Idle transition │
│ │ │
│ ▼ │
│ ┌─────────────────────────────────────────────────────┐ │
│ │ Guard Conditions (11 categories) │ │
│ │ settings, interactive, sdk, plan mode, dialogs, │ │
│ │ elicitation, API error │ │
│ └────────────────────┬────────────────────────────────┘ │
│ │ │
│ ▼ │
│ ┌─────────────────────────────────────────────────────┐ │
│ │ generatePromptSuggestion() │ │
│ │ │ │
│ │ ┌─── CacheSafeParams available? ───┐ │ │
│ │ │ │ │ │
│ │ ▼ YES NO ▼ │ │
│ │ runForkedQuery() BaseLlmClient.generateJson() │ │
│ │ (cache-aware) (standalone fallback) │ │
│ │ │ │
│ │ ──── SUGGESTION_PROMPT ──── │ │
│ │ ──── 12 filter rules ────── │ │
│ │ ──── getFilterReason() ──── │ │
│ └────────────────────┬────────────────────────────────┘ │
│ │ │
│ ▼ │
│ ┌─────────────────────────────────────────────────────┐ │
│ │ FollowupController (framework-agnostic) │ │
│ │ 300ms delay → show as ghost text │ │
│ │ │ │
│ │ Tab → accept (fill input) │ │
│ │ Enter → accept + submit │ │
│ │ Right → accept (fill input) │ │
│ │ Type → dismiss + abort speculation │ │
│ └─────────────────────────────────────────────────────┘ │
│ │
│ ┌─────────────────────────────────────────────────────┐ │
│ │ Telemetry (PromptSuggestionEvent) │ │
│ │ outcome, accept_method, timing, similarity, │ │
│ │ keystroke, focus, suppression reason, prompt_id │ │
│ └─────────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────┘
Suggestion Generation
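The fork-vs-fallback decision shown in the diagram can be sketched as follows. This is an illustrative sketch only: `CacheSafeParams`, `SuggestionFn`, and `pickGenerationPath` are hypothetical names standing in for the real types around `generatePromptSuggestion()`.

```typescript
// Hypothetical sketch of the fork-vs-fallback decision inside
// generatePromptSuggestion(); names are illustrative, not the real API.
interface CacheSafeParams {
  systemInstruction: string;
  tools: unknown[];
  history: unknown[];
}

type SuggestionFn = (prompt: string) => Promise<string | null>;

function pickGenerationPath(
  cacheSafeParams: CacheSafeParams | null,
  runForkedQuery: SuggestionFn,       // cache-aware path
  generateJsonFallback: SuggestionFn, // standalone BaseLlmClient path
): SuggestionFn {
  // Prefer the forked query when cache-safe params are available, so the
  // suggestion call can reuse the main conversation's cache prefix.
  return cacheSafeParams ? runForkedQuery : generateJsonFallback;
}
```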
LLM Prompt
[SUGGESTION MODE: Suggest what the user might naturally type next.]
FIRST: Read the LAST FEW LINES of the assistant's most recent message — that's where
next-step hints, tips, and actionable suggestions usually appear. Then check the user's
recent messages and original request.
Your job is to predict what THEY would type - not what you think they should do.
THE TEST: Would they think "I was just about to type that"?
PRIORITY: If the assistant's last message contains a tip or hint like "Tip: type X to ..."
or "type X to ...", extract X as the suggestion. These are explicit next-step hints.
EXAMPLES:
Assistant says "Tip: type post comments to publish findings" → "post comments"
Assistant says "type /review to start" → "/review"
User asked "fix the bug and run tests", bug is fixed → "run the tests"
After code written → "try it out"
Task complete, obvious follow-up → "commit this" or "push it"
Format: 2-12 words, match the user's style. Or nothing.
Reply with ONLY the suggestion, no quotes or explanation.
Filter Rules (12)
| Rule | Example blocked |
|---|---|
| done | "done" |
| meta_text | "nothing found", "no suggestion", "silence" |
| meta_wrapped | "(silence)", "[no suggestion]" |
| error_message | "api error: 500" |
| prefixed_label | "Suggestion: commit" |
| too_few_words | "hmm" (but allows "yes", "commit", "push", etc.) |
| too_many_words | > 12 words |
| too_long | >= 100 chars |
| multiple_sentences | "Run tests. Then commit." |
| has_formatting | newlines, markdown bold |
| evaluative | "looks good", "thanks" (with \b word boundaries) |
| ai_voice | "Let me…", "I'll…", "Here's…" |
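A minimal sketch of a `getFilterReason()`-style validator covering a subset of the rules in the table above. The regexes, the allow-list contents, and the rule ordering are assumptions for illustration; the real implementation may differ.

```typescript
// Illustrative validator for a subset of the 12 filter rules above.
// Returns the name of the first rule that fires, or null if the
// suggestion passes. Patterns and thresholds are assumptions.
const ALLOWED_SINGLE_WORDS = new Set(['yes', 'commit', 'push']); // assumed allow-list

function getFilterReason(suggestion: string): string | null {
  const s = suggestion.trim();
  const words = s.split(/\s+/);
  if (/^done$/i.test(s)) return 'done';
  if (/^(nothing found|no suggestion|silence)$/i.test(s)) return 'meta_text';
  if (/^[([].*[)\]]$/.test(s)) return 'meta_wrapped';
  if (/^api error/i.test(s)) return 'error_message';
  if (/^suggestion:/i.test(s)) return 'prefixed_label';
  if (words.length < 2 && !ALLOWED_SINGLE_WORDS.has(s.toLowerCase()))
    return 'too_few_words';
  if (words.length > 12) return 'too_many_words';
  if (s.length >= 100) return 'too_long';
  if (/[.!?]\s+\S/.test(s)) return 'multiple_sentences';
  if (/\n|\*\*/.test(s)) return 'has_formatting';
  if (/\b(looks good|thanks)\b/i.test(s)) return 'evaluative';
  if (/^(let me|i['’]ll|here['’]s)\b/i.test(s)) return 'ai_voice';
  return null; // passes all rules
}
```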
Guard Conditions
AppContainer useEffect (13 checks in code):
| Guard | Check |
|---|---|
| Settings toggle | enableFollowupSuggestions |
| Non-interactive | config.isInteractive() |
| SDK mode | !config.getSdkMode() |
| Streaming transition | Responding → Idle (2 checks) |
| API error (history) | historyManager.history[last]?.type !== 'error' |
| API error (pending) | !pendingGeminiHistoryItems.some(type === 'error') |
| Confirmation dialogs | shell + general + loop detection (3 checks) |
| Permission dialog | isPermissionsDialogOpen |
| Elicitation | settingInputRequests.length === 0 |
| Plan mode | ApprovalMode.PLAN |
Inside generatePromptSuggestion():
| Guard | Check |
|---|---|
| Early conversation | modelTurns < 2 |
Separate feature flags (not in guard block):
| Flag | Controls |
|---|---|
| enableCacheSharing | Whether to use the forked query or fall back to generateJson |
| enableSpeculation | Whether to start speculation when a suggestion is displayed |
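The guard tables above can be condensed into a single predicate. This is a hypothetical sketch: the field names are illustrative stand-ins for the real checks, and some multi-part checks (streaming transition, dialog checks) are collapsed into one flag each.

```typescript
// Hypothetical condensed form of the AppContainer guard block plus the
// early-conversation check inside generatePromptSuggestion().
interface GuardContext {
  enableFollowupSuggestions: boolean;
  isInteractive: boolean;
  sdkMode: boolean;
  transitionedToIdle: boolean; // the Responding → Idle edge (2 checks in code)
  lastHistoryIsError: boolean; // history or pending items contain an API error
  anyDialogOpen: boolean;      // shell / general / loop detection / permissions
  pendingElicitations: number; // settingInputRequests.length
  approvalMode: 'DEFAULT' | 'PLAN';
  modelTurns: number;
}

function shouldGenerateSuggestion(ctx: GuardContext): boolean {
  return (
    ctx.enableFollowupSuggestions &&
    ctx.isInteractive &&
    !ctx.sdkMode &&
    ctx.transitionedToIdle &&
    !ctx.lastHistoryIsError &&
    !ctx.anyDialogOpen &&
    ctx.pendingElicitations === 0 &&
    ctx.approvalMode !== 'PLAN' &&
    ctx.modelTurns >= 2 // early-conversation guard
  );
}
```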
State Management
FollowupState
```ts
interface FollowupState {
  suggestion: string | null;
  isVisible: boolean;
  shownAt: number; // timestamp for telemetry
}
```
FollowupController
Framework-agnostic controller shared by CLI (Ink) and WebUI (React):
- setSuggestion(text) — 300ms delayed show; null clears immediately
- accept(method) — clears state, fires onAccept via microtask, 100ms debounce lock
- dismiss() — clears state, logs ignored telemetry
- clear() — hard reset of all state + timers
- Object.freeze(INITIAL_FOLLOWUP_STATE) prevents accidental mutation
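The controller behavior described above can be sketched as a small class. This is a simplified, assumed shape (the real FollowupController carries more state and telemetry hooks); the class and callback names here are illustrative.

```typescript
// Minimal sketch of the FollowupController contract: delayed show,
// microtask-deferred accept with a debounce lock, immediate clear.
type AcceptMethod = 'tab' | 'enter' | 'right';
type Listener = (s: { suggestion: string | null; isVisible: boolean }) => void;

class FollowupControllerSketch {
  private suggestion: string | null = null;
  private visible = false;
  private showTimer: ReturnType<typeof setTimeout> | null = null;
  private acceptLocked = false;

  constructor(
    private onAccept: (text: string, method: AcceptMethod) => void,
    private onChange: Listener,
  ) {}

  setSuggestion(text: string | null): void {
    if (this.showTimer) clearTimeout(this.showTimer);
    if (text === null) { // null clears immediately
      this.clear();
      return;
    }
    this.suggestion = text;
    this.showTimer = setTimeout(() => { // 300ms delayed show
      this.visible = true;
      this.emit();
    }, 300);
  }

  accept(method: AcceptMethod): void {
    if (!this.visible || this.acceptLocked || this.suggestion === null) return;
    const text = this.suggestion;
    this.clear();
    this.acceptLocked = true;
    setTimeout(() => { this.acceptLocked = false; }, 100); // debounce lock
    queueMicrotask(() => this.onAccept(text, method));     // fire via microtask
  }

  dismiss(): void { this.clear(); } // real controller also logs `ignored` telemetry

  clear(): void { // hard reset of state + timers
    if (this.showTimer) clearTimeout(this.showTimer);
    this.showTimer = null;
    this.suggestion = null;
    this.visible = false;
    this.emit();
  }

  private emit(): void {
    this.onChange({ suggestion: this.suggestion, isVisible: this.visible });
  }
}
```

Deferring `onAccept` to a microtask keeps the state reset synchronous while letting the accept side effect (filling the input) run after the current event handler completes.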
Keyboard Interaction
| Key | CLI | WebUI |
|---|---|---|
| Tab | Fill input (no submit) | Fill input (no submit) |
| Enter | Fill + submit | Fill + submit (explicitText param) |
| Right Arrow | Fill input (no submit) | Fill input (no submit) |
| Typing | Dismiss + abort speculation | Dismiss |
| Paste | Dismiss + abort speculation | Dismiss |
Key Binding Note
The Tab handler uses key.name === 'tab' explicitly (not ACCEPT_SUGGESTION matcher) because ACCEPT_SUGGESTION also matches Enter, which must fall through to the SUBMIT handler.
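A sketch of the ordering concern the note describes, with illustrative types (the real Ink handler shape differs): Tab is matched by name so that Enter never gets consumed by the accept path before it reaches SUBMIT.

```typescript
// Illustrative key dispatch: Tab accepts without submitting; Enter accepts
// (if a suggestion is visible) and always falls through to SUBMIT.
interface Key { name: string }
interface Followup { isVisible: boolean; accept(m: 'tab' | 'enter' | 'right'): void }

function handleKey(key: Key, followup: Followup, submit: () => void): string {
  if (key.name === 'tab' && followup.isVisible) {
    followup.accept('tab'); // fill input, no submit
    return 'accepted';
  }
  if (key.name === 'return') {
    if (followup.isVisible) followup.accept('enter'); // fill input first
    submit(); // Enter must always reach the SUBMIT handler
    return 'submitted';
  }
  return 'passthrough';
}
```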
Telemetry
PromptSuggestionEvent
| Field | Type | Description |
|---|---|---|
| outcome | accepted/ignored/suppressed | Final outcome |
| prompt_id | string | Default: 'user_intent' |
| accept_method | tab/enter/right | How user accepted |
| time_to_accept_ms | number | Time from shown to accept |
| time_to_ignore_ms | number | Time from shown to dismiss |
| time_to_first_keystroke_ms | number | Time to first keystroke while shown |
| suggestion_length | number | Character count |
| similarity | number | 1.0 for accept, 0.0 for ignore |
| was_focused_when_shown | boolean | Terminal had focus |
| reason | string | For suppressed: filter rule name |
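The field table above maps naturally onto a TypeScript shape. This is an assumed sketch of the event type and one constructor path, not the real class; which fields are optional is an inference from the table (timing fields only apply to their respective outcomes).

```typescript
// Hypothetical shape for PromptSuggestionEvent, derived from the field table.
type PromptSuggestionOutcome = 'accepted' | 'ignored' | 'suppressed';

interface PromptSuggestionEvent {
  outcome: PromptSuggestionOutcome;
  prompt_id: string;                         // defaults to 'user_intent'
  accept_method?: 'tab' | 'enter' | 'right';
  time_to_accept_ms?: number;
  time_to_ignore_ms?: number;
  time_to_first_keystroke_ms?: number;
  suggestion_length: number;
  similarity?: number;                       // 1.0 for accept, 0.0 for ignore
  was_focused_when_shown: boolean;
  reason?: string;                           // filter rule name when suppressed
}

// Illustrative constructor for the accepted case.
function makeAcceptedEvent(
  suggestion: string,
  method: 'tab' | 'enter' | 'right',
  shownAt: number, // FollowupState.shownAt
  now: number,
): PromptSuggestionEvent {
  return {
    outcome: 'accepted',
    prompt_id: 'user_intent',
    accept_method: method,
    time_to_accept_ms: now - shownAt,
    suggestion_length: suggestion.length,
    similarity: 1.0,
    was_focused_when_shown: true, // assumption for this sketch
  };
}
```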
SpeculationEvent
| Field | Type | Description |
|---|---|---|
| outcome | accepted/aborted/failed | Speculation result |
| turns_used | number | API round-trips |
| files_written | number | Files in overlay |
| tool_use_count | number | Tools executed |
| duration_ms | number | Wall-clock time |
| boundary_type | string | What stopped speculation |
| had_pipelined_suggestion | boolean | Next suggestion generated |
Feature Flags and Settings
| Setting | Type | Default | Description |
|---|---|---|---|
| enableFollowupSuggestions | boolean | true | Master toggle for prompt suggestions |
| enableCacheSharing | boolean | true | Use cache-aware forked queries |
| enableSpeculation | boolean | false | Predictive execution engine |
| fastModel (top-level) | string | "" | Model for all background tasks (empty = use main model). Set via /model --fast |
Internal Prompt ID Filtering
Background operations use dedicated prompt IDs (INTERNAL_PROMPT_IDS in utils/internalPromptIds.ts) to prevent their API traffic and tool calls from appearing in the user-visible UI:
| Prompt ID | Used by |
|---|---|
| prompt_suggestion | Suggestion generation |
| forked_query | Cache-aware forked queries |
| speculation | Speculation engine |
Filtering applied:
- loggingContentGenerator — skips logApiRequest and OpenAI interaction logging for internal IDs
- logApiResponse / logApiError — skip chatRecordingService.recordUiTelemetryEvent
- logToolCall — skips chatRecordingService.recordUiTelemetryEvent
- uiTelemetryService.addEvent — not filtered (ensures /stats token tracking works)
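The filtering pattern can be sketched as a membership check plus a gate. The ID values come from the table above; the export name and the `recordToolCall` helper are assumptions for illustration (the real constant lives in utils/internalPromptIds.ts).

```typescript
// Assumed shape of the internal-ID set from utils/internalPromptIds.ts.
const INTERNAL_PROMPT_IDS = new Set(['prompt_suggestion', 'forked_query', 'speculation']);

function isInternalPrompt(promptId: string): boolean {
  return INTERNAL_PROMPT_IDS.has(promptId);
}

// Illustrative gate mirroring the filtering list: UI-visible recording is
// skipped for internal IDs, while stats accounting always runs.
function recordToolCall(
  promptId: string,
  recordUi: () => void,      // e.g. chatRecordingService.recordUiTelemetryEvent
  addStatsEvent: () => void, // e.g. uiTelemetryService.addEvent (never filtered)
): void {
  if (!isInternalPrompt(promptId)) recordUi(); // hidden for internal traffic
  addStatsEvent(); // keeps /stats token tracking accurate
}
```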
Thinking Mode
Thinking/reasoning is explicitly disabled (thinkingConfig: { includeThoughts: false }) for all background task paths:
- Forked query path (createForkedChat) — overrides thinkingConfig in the cloned generationConfig, covering both suggestion generation and speculation
- BaseLlm fallback path (generateViaBaseLlm) — per-request config overrides the base content generator's thinking settings
This is safe because:
- Cache prefix is determined by systemInstruction + tools + history, not thinkingConfig — cache hits are unaffected
- All backends (Gemini, OpenAI-compatible, Anthropic) handle includeThoughts: false by omitting the thinking field — no API errors on models without thinking support
- Suggestion generation and speculation don't benefit from reasoning tokens
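The override on the forked-query path can be sketched as a shallow clone that pins thinkingConfig. The `GenerationConfig` shape and the clone helper name are assumptions; only the `thinkingConfig: { includeThoughts: false }` override comes from the text above.

```typescript
// Illustrative clone-and-override for background task calls.
interface GenerationConfig {
  temperature?: number;
  thinkingConfig?: { includeThoughts: boolean };
}

function cloneForBackgroundTask(base: GenerationConfig): GenerationConfig {
  return {
    ...base,
    // Explicitly disable reasoning for suggestion generation / speculation.
    // Cache hits are unaffected since the prefix ignores thinkingConfig.
    thinkingConfig: { includeThoughts: false },
  };
}
```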