Prompt Suggestion (NES) Design
Predicts what the user would naturally type next after the AI completes a response, showing it as ghost text in the input prompt.
Implementation status: prompt-suggestion-implementation.md. Speculation engine: speculation-design.md.
Overview
A prompt suggestion (Next-step Suggestion / NES) is a short prediction (2-12 words) of the user’s next input, generated by an LLM call after each AI response. It appears as ghost text in the input prompt. The user can accept it with Tab/Enter/Right Arrow or dismiss it by typing.
Architecture
┌─────────────────────────────────────────────────────────────┐
│ AppContainer (CLI) │
│ │
│ Responding → Idle transition │
│ │ │
│ ▼ │
│ ┌─────────────────────────────────────────────────────┐ │
│ │ Guard Conditions (11 categories) │ │
│ │ settings, interactive, sdk, plan mode, dialogs, │ │
│ │ elicitation, API error │ │
│ └────────────────────┬────────────────────────────────┘ │
│ │ │
│ ▼ │
│ ┌─────────────────────────────────────────────────────┐ │
│ │ generatePromptSuggestion() │ │
│ │ │ │
│ │ ┌─── CacheSafeParams available? ───┐ │ │
│ │ │ │ │ │
│ │ ▼ YES NO ▼ │ │
│ │ runForkedQuery() BaseLlmClient.generateJson() │ │
│ │ (cache-aware) (standalone fallback) │ │
│ │ │ │
│ │ ──── SUGGESTION_PROMPT ──── │ │
│ │ ──── 12 filter rules ────── │ │
│ │ ──── getFilterReason() ──── │ │
│ └────────────────────┬────────────────────────────────┘ │
│ │ │
│ ▼ │
│ ┌─────────────────────────────────────────────────────┐ │
│ │ FollowupController (framework-agnostic) │ │
│ │ 300ms delay → show as ghost text │ │
│ │ │ │
│ │ Tab → accept (fill input) │ │
│ │ Enter → accept + submit │ │
│ │ Right → accept (fill input) │ │
│ │ Type → dismiss + abort speculation │ │
│ └─────────────────────────────────────────────────────┘ │
│ │
│ ┌─────────────────────────────────────────────────────┐ │
│ │ Telemetry (PromptSuggestionEvent) │ │
│ │ outcome, accept_method, timing, similarity, │ │
│ │ keystroke, focus, suppression reason, prompt_id │ │
│ └─────────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────┘
Suggestion Generation
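The fork-vs-fallback decision shown in the diagram can be sketched as follows. This is an illustrative sketch only: `CacheSafeParams`, `SuggestionFn`, and `pickGenerationPath` are hypothetical names standing in for the real types around `generatePromptSuggestion()`.

```typescript
// Hypothetical sketch of the fork-vs-fallback decision inside
// generatePromptSuggestion(); names are illustrative, not the real API.
interface CacheSafeParams {
  systemInstruction: string;
  tools: unknown[];
  history: unknown[];
}

type SuggestionFn = (prompt: string) => Promise<string | null>;

function pickGenerationPath(
  cacheSafeParams: CacheSafeParams | null,
  runForkedQuery: SuggestionFn,       // cache-aware path
  generateJsonFallback: SuggestionFn, // standalone BaseLlmClient path
): SuggestionFn {
  // Prefer the forked query when cache-safe params are available, so the
  // suggestion call can reuse the main conversation's cache prefix.
  return cacheSafeParams ? runForkedQuery : generateJsonFallback;
}
```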
LLM Prompt
[SUGGESTION MODE: Suggest what the user might naturally type next.]
FIRST: Read the LAST FEW LINES of the assistant's most recent message — that's where
next-step hints, tips, and actionable suggestions usually appear. Then check the user's
recent messages and original request.
Your job is to predict what THEY would type - not what you think they should do.
THE TEST: Would they think "I was just about to type that"?
PRIORITY: If the assistant's last message contains a tip or hint like "Tip: type X to ..."
or "type X to ...", extract X as the suggestion. These are explicit next-step hints.
EXAMPLES:
Assistant says "Tip: type post comments to publish findings" → "post comments"
Assistant says "type /review to start" → "/review"
User asked "fix the bug and run tests", bug is fixed → "run the tests"
After code written → "try it out"
Task complete, obvious follow-up → "commit this" or "push it"
Format: 2-12 words, match the user's style. Or nothing.
Reply with ONLY the suggestion, no quotes or explanation.
Filter Rules (12)
| Rule | Example blocked |
|---|---|
| done | "done" |
| meta_text | "nothing found", "no suggestion", "silence" |
| meta_wrapped | "(silence)", "[no suggestion]" |
| error_message | "api error: 500" |
| prefixed_label | "Suggestion: commit" |
| too_few_words | "hmm" (but allows "yes", "commit", "push", etc.) |
| too_many_words | > 12 words |
| too_long | >= 100 chars |
| multiple_sentences | "Run tests. Then commit." |
| has_formatting | newlines, markdown bold |
| evaluative | "looks good", "thanks" (with \b word boundaries) |
| ai_voice | "Let me…", "I'll…", "Here's…" |
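A minimal sketch of a `getFilterReason()`-style validator covering a subset of the rules in the table above. The regexes, the allow-list contents, and the rule ordering are assumptions for illustration; the real implementation may differ.

```typescript
// Illustrative validator for a subset of the 12 filter rules above.
// Returns the name of the first rule that fires, or null if the
// suggestion passes. Patterns and thresholds are assumptions.
const ALLOWED_SINGLE_WORDS = new Set(['yes', 'commit', 'push']); // assumed allow-list

function getFilterReason(suggestion: string): string | null {
  const s = suggestion.trim();
  const words = s.split(/\s+/);
  if (/^done$/i.test(s)) return 'done';
  if (/^(nothing found|no suggestion|silence)$/i.test(s)) return 'meta_text';
  if (/^[([].*[)\]]$/.test(s)) return 'meta_wrapped';
  if (/^api error/i.test(s)) return 'error_message';
  if (/^suggestion:/i.test(s)) return 'prefixed_label';
  if (words.length < 2 && !ALLOWED_SINGLE_WORDS.has(s.toLowerCase()))
    return 'too_few_words';
  if (words.length > 12) return 'too_many_words';
  if (s.length >= 100) return 'too_long';
  if (/[.!?]\s+\S/.test(s)) return 'multiple_sentences';
  if (/\n|\*\*/.test(s)) return 'has_formatting';
  if (/\b(looks good|thanks)\b/i.test(s)) return 'evaluative';
  if (/^(let me|i['’]ll|here['’]s)\b/i.test(s)) return 'ai_voice';
  return null; // passes all rules
}
```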
Guard Conditions
AppContainer useEffect (13 checks in code):
| Guard | Check |
|---|---|
| Settings toggle | enableFollowupSuggestions |
| Non-interactive | config.isInteractive() |
| SDK mode | !config.getSdkMode() |
| Streaming transition | Responding → Idle (2 checks) |
| API error (history) | historyManager.history[last]?.type !== 'error' |
| API error (pending) | !pendingGeminiHistoryItems.some(type === 'error') |
| Confirmation dialogs | shell + general + loop detection (3 checks) |
| Permission dialog | isPermissionsDialogOpen |
| Elicitation | settingInputRequests.length === 0 |
| Plan mode | ApprovalMode.PLAN |
Inside generatePromptSuggestion():
| Guard | Check |
|---|---|
| Early conversation | modelTurns < 2 |
Separate feature flags (not in guard block):
| Flag | Controls |
|---|---|
| enableCacheSharing | Whether to use the forked query or fall back to generateJson |
| enableSpeculation | Whether to start speculation when a suggestion is displayed |
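The guard tables above can be condensed into a single predicate. This is a hypothetical sketch: the field names are illustrative stand-ins for the real checks, and some multi-part checks (streaming transition, dialog checks) are collapsed into one flag each.

```typescript
// Hypothetical condensed form of the AppContainer guard block plus the
// early-conversation check inside generatePromptSuggestion().
interface GuardContext {
  enableFollowupSuggestions: boolean;
  isInteractive: boolean;
  sdkMode: boolean;
  transitionedToIdle: boolean; // the Responding → Idle edge (2 checks in code)
  lastHistoryIsError: boolean; // history or pending items contain an API error
  anyDialogOpen: boolean;      // shell / general / loop detection / permissions
  pendingElicitations: number; // settingInputRequests.length
  approvalMode: 'DEFAULT' | 'PLAN';
  modelTurns: number;
}

function shouldGenerateSuggestion(ctx: GuardContext): boolean {
  return (
    ctx.enableFollowupSuggestions &&
    ctx.isInteractive &&
    !ctx.sdkMode &&
    ctx.transitionedToIdle &&
    !ctx.lastHistoryIsError &&
    !ctx.anyDialogOpen &&
    ctx.pendingElicitations === 0 &&
    ctx.approvalMode !== 'PLAN' &&
    ctx.modelTurns >= 2 // early-conversation guard
  );
}
```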
State Management
FollowupState
```ts
interface FollowupState {
  suggestion: string | null;
  isVisible: boolean;
  shownAt: number; // timestamp for telemetry
}
```
FollowupController
Framework-agnostic controller shared by CLI (Ink) and WebUI (React):
- setSuggestion(text) — 300ms delayed show; null clears immediately
- accept(method) — clears state, fires onAccept via microtask, 100ms debounce lock
- dismiss() — clears state, logs ignored telemetry
- clear() — hard reset of all state + timers
- Object.freeze(INITIAL_FOLLOWUP_STATE) prevents accidental mutation
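The controller behavior described above can be sketched as a small class. This is a simplified, assumed shape (the real FollowupController carries more state and telemetry hooks); the class and callback names here are illustrative.

```typescript
// Minimal sketch of the FollowupController contract: delayed show,
// microtask-deferred accept with a debounce lock, immediate clear.
type AcceptMethod = 'tab' | 'enter' | 'right';
type Listener = (s: { suggestion: string | null; isVisible: boolean }) => void;

class FollowupControllerSketch {
  private suggestion: string | null = null;
  private visible = false;
  private showTimer: ReturnType<typeof setTimeout> | null = null;
  private acceptLocked = false;

  constructor(
    private onAccept: (text: string, method: AcceptMethod) => void,
    private onChange: Listener,
  ) {}

  setSuggestion(text: string | null): void {
    if (this.showTimer) clearTimeout(this.showTimer);
    if (text === null) { // null clears immediately
      this.clear();
      return;
    }
    this.suggestion = text;
    this.showTimer = setTimeout(() => { // 300ms delayed show
      this.visible = true;
      this.emit();
    }, 300);
  }

  accept(method: AcceptMethod): void {
    if (!this.visible || this.acceptLocked || this.suggestion === null) return;
    const text = this.suggestion;
    this.clear();
    this.acceptLocked = true;
    setTimeout(() => { this.acceptLocked = false; }, 100); // debounce lock
    queueMicrotask(() => this.onAccept(text, method));     // fire via microtask
  }

  dismiss(): void { this.clear(); } // real controller also logs `ignored` telemetry

  clear(): void { // hard reset of state + timers
    if (this.showTimer) clearTimeout(this.showTimer);
    this.showTimer = null;
    this.suggestion = null;
    this.visible = false;
    this.emit();
  }

  private emit(): void {
    this.onChange({ suggestion: this.suggestion, isVisible: this.visible });
  }
}
```

Deferring `onAccept` to a microtask keeps the state reset synchronous while letting the accept side effect (filling the input) run after the current event handler completes.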
Keyboard Interaction
| Key | CLI | WebUI |
|---|---|---|
| Tab | Fill input (no submit) | Fill input (no submit) |
| Enter | Fill + submit | Fill + submit (explicitText param) |
| Right Arrow | Fill input (no submit) | Fill input (no submit) |
| Typing | Dismiss + abort speculation | Dismiss |
| Paste | Dismiss + abort speculation | Dismiss |
Key Binding Note
The Tab handler uses key.name === 'tab' explicitly (not ACCEPT_SUGGESTION matcher) because ACCEPT_SUGGESTION also matches Enter, which must fall through to the SUBMIT handler.
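A sketch of the ordering concern the note describes, with illustrative types (the real Ink handler shape differs): Tab is matched by name so that Enter never gets consumed by the accept path before it reaches SUBMIT.

```typescript
// Illustrative key dispatch: Tab accepts without submitting; Enter accepts
// (if a suggestion is visible) and always falls through to SUBMIT.
interface Key { name: string }
interface Followup { isVisible: boolean; accept(m: 'tab' | 'enter' | 'right'): void }

function handleKey(key: Key, followup: Followup, submit: () => void): string {
  if (key.name === 'tab' && followup.isVisible) {
    followup.accept('tab'); // fill input, no submit
    return 'accepted';
  }
  if (key.name === 'return') {
    if (followup.isVisible) followup.accept('enter'); // fill input first
    submit(); // Enter must always reach the SUBMIT handler
    return 'submitted';
  }
  return 'passthrough';
}
```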
Telemetry
PromptSuggestionEvent
| Field | Type | Description |
|---|---|---|
| outcome | accepted/ignored/suppressed | Final outcome |
| prompt_id | string | Default: 'user_intent' |
| accept_method | tab/enter/right | How user accepted |
| time_to_accept_ms | number | Time from shown to accept |
| time_to_ignore_ms | number | Time from shown to dismiss |
| time_to_first_keystroke_ms | number | Time to first keystroke while shown |
| suggestion_length | number | Character count |
| similarity | number | 1.0 for accept, 0.0 for ignore |
| was_focused_when_shown | boolean | Terminal had focus |
| reason | string | For suppressed: filter rule name |
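The field table above maps naturally onto a TypeScript shape. This is an assumed sketch of the event type and one constructor path, not the real class; which fields are optional is an inference from the table (timing fields only apply to their respective outcomes).

```typescript
// Hypothetical shape for PromptSuggestionEvent, derived from the field table.
type PromptSuggestionOutcome = 'accepted' | 'ignored' | 'suppressed';

interface PromptSuggestionEvent {
  outcome: PromptSuggestionOutcome;
  prompt_id: string;                         // defaults to 'user_intent'
  accept_method?: 'tab' | 'enter' | 'right';
  time_to_accept_ms?: number;
  time_to_ignore_ms?: number;
  time_to_first_keystroke_ms?: number;
  suggestion_length: number;
  similarity?: number;                       // 1.0 for accept, 0.0 for ignore
  was_focused_when_shown: boolean;
  reason?: string;                           // filter rule name when suppressed
}

// Illustrative constructor for the accepted case.
function makeAcceptedEvent(
  suggestion: string,
  method: 'tab' | 'enter' | 'right',
  shownAt: number, // FollowupState.shownAt
  now: number,
): PromptSuggestionEvent {
  return {
    outcome: 'accepted',
    prompt_id: 'user_intent',
    accept_method: method,
    time_to_accept_ms: now - shownAt,
    suggestion_length: suggestion.length,
    similarity: 1.0,
    was_focused_when_shown: true, // assumption for this sketch
  };
}
```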
SpeculationEvent
| Field | Type | Description |
|---|---|---|
| outcome | accepted/aborted/failed | Speculation result |
| turns_used | number | API round-trips |
| files_written | number | Files in overlay |
| tool_use_count | number | Tools executed |
| duration_ms | number | Wall-clock time |
| boundary_type | string | What stopped speculation |
| had_pipelined_suggestion | boolean | Next suggestion generated |
Feature Flags and Settings
| Setting | Type | Default | Description |
|---|---|---|---|
| enableFollowupSuggestions | boolean | true | Master toggle for prompt suggestions |
| enableCacheSharing | boolean | true | Use cache-aware forked queries |
| enableSpeculation | boolean | false | Predictive execution engine |
| fastModel (top-level) | string | "" | Model for all background tasks (empty = use main model). Set via /model --fast |
Internal Prompt ID Filtering
Background operations use dedicated prompt IDs (INTERNAL_PROMPT_IDS in utils/internalPromptIds.ts) to prevent their API traffic and tool calls from appearing in the user-visible UI:
| Prompt ID | Used by |
|---|---|
| prompt_suggestion | Suggestion generation |
| forked_query | Cache-aware forked queries |
| speculation | Speculation engine |
Filtering applied:
- loggingContentGenerator — skips logApiRequest and OpenAI interaction logging for internal IDs
- logApiResponse / logApiError — skip chatRecordingService.recordUiTelemetryEvent
- logToolCall — skips chatRecordingService.recordUiTelemetryEvent
- uiTelemetryService.addEvent — not filtered (ensures /stats token tracking works)
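The filtering pattern can be sketched as a membership check plus a gate. The ID values come from the table above; the export name and the `recordToolCall` helper are assumptions for illustration (the real constant lives in utils/internalPromptIds.ts).

```typescript
// Assumed shape of the internal-ID set from utils/internalPromptIds.ts.
const INTERNAL_PROMPT_IDS = new Set(['prompt_suggestion', 'forked_query', 'speculation']);

function isInternalPrompt(promptId: string): boolean {
  return INTERNAL_PROMPT_IDS.has(promptId);
}

// Illustrative gate mirroring the filtering list: UI-visible recording is
// skipped for internal IDs, while stats accounting always runs.
function recordToolCall(
  promptId: string,
  recordUi: () => void,      // e.g. chatRecordingService.recordUiTelemetryEvent
  addStatsEvent: () => void, // e.g. uiTelemetryService.addEvent (never filtered)
): void {
  if (!isInternalPrompt(promptId)) recordUi(); // hidden for internal traffic
  addStatsEvent(); // keeps /stats token tracking accurate
}
```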
Thinking Mode
Thinking/reasoning is explicitly disabled (thinkingConfig: { includeThoughts: false }) for all background task paths:
- Forked query path (createForkedChat) — overrides thinkingConfig in the cloned generationConfig, covering both suggestion generation and speculation
- BaseLlm fallback path (generateViaBaseLlm) — per-request config overrides the base content generator's thinking settings
This is safe because:
- Cache prefix is determined by systemInstruction + tools + history, not thinkingConfig — cache hits are unaffected
- All backends (Gemini, OpenAI-compatible, Anthropic) handle includeThoughts: false by omitting the thinking field — no API errors on models without thinking support
- Suggestion generation and speculation don't benefit from reasoning tokens
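The override on the forked-query path can be sketched as a shallow clone that pins thinkingConfig. The `GenerationConfig` shape and the clone helper name are assumptions; only the `thinkingConfig: { includeThoughts: false }` override comes from the text above.

```typescript
// Illustrative clone-and-override for background task calls.
interface GenerationConfig {
  temperature?: number;
  thinkingConfig?: { includeThoughts: boolean };
}

function cloneForBackgroundTask(base: GenerationConfig): GenerationConfig {
  return {
    ...base,
    // Explicitly disable reasoning for suggestion generation / speculation.
    // Cache hits are unaffected since the prefix ignores thinkingConfig.
    thinkingConfig: { includeThoughts: false },
  };
}
```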