| @@ -2,6 +2,27 @@ | ||
| Notable changes to TreeTrace. The format follows Keep a Changelog, and the project uses semantic versioning. | ||
| + | ## 0.8.0 - 2026-06-18 | |
| + | ||
| + | ### Added | |
| + | ||
| + | - Typed rejection, refusal, and decline capture (schema v0.3). TreeTrace now records six classes of steering event that previously vanished from the lineage: a user declining a proposed tool action (`user_declined_tool`), a user interrupt (`user_interrupt`), a typed decline such as "stop, don't do that" (`user_text_decline`), a tool execution error (`tool_execution_error`), an environment permission denial (`permission_denied`), and a model refusal (`model_refusal`, captured from `stop_reason: "refusal"` at 0.95 confidence and from refusal text at 0.7). Each rejection carries kind, source, confidence, tool-use id, tool name, timestamp, and redacted evidence. Native Claude Code JSONL is fully wired; the other adapters gain an `addRejection` helper in `adapters/shared.js` so per-source wiring can land in later releases. Detection patterns are named, individually testable regex pieces composed at load time, following the v0.7.0 precedent set by security-intent and risky-command detection. | |
| + | - `--rejections` CLI flag. Mirrors `--failures` / `--lessons` / `--security` and writes `.treetrace/rejections.json`, a flattened, timestamp-sorted ledger of every captured rejection with a `byKind` summary. Each entry joins back to its source node id so consumers can locate it in the tree. | |
| + | - Read-only MCP `rejections_summary` tool. The MCP server gains a sixth read-only tool that returns the same rejection view as `--rejections`, so an agent can ask "what did the human reject in this session?" without leaving the protocol. Like the existing five tools, it takes no arguments, mutates nothing, and is gated by the redaction shadow scan. | |
| + | - Four new failure types derived from rejections: `user_rejected_action`, `tool_execution_failed`, `model_refused`, `permission_denied`. Each generates a lesson and an eval candidate: `user_rejected_action` and `permission_denied` map to `tool_permission_regression`, `tool_execution_failed` to `tool_error_recovery`, and `model_refused` to `refusal_handling`. The failure-to-eval-to-handoff loop that already existed for security and scope drift now covers rejections too. | |
| + | - `rejection` as a new PromptNode `kind` for synthetic nodes that exist only to carry a rejection signal (e.g. a tool-result rejection that arrived before any text prompt). Such nodes have empty `text`, a derived `title`, and one or more entries in `rejections`. | |
| + | ||
| + | ### Changed | |
| + | ||
| + | - Schema version bumped from `0.2` to `0.3`. Additive only: consumers that only understand v0.2 can keep reading `nodes` and `edges` and ignore `rejections`, as long as they tolerate the new `rejection` node kind and `rejects` edge type. The bump is centralized in `src/config.js` and propagates to every writer (per the v0.7.0 single-source change). | |
| + | - The redaction gate now scans `node.rejections[].evidence` alongside prompt text and action bodies, and applies the same redaction decisions to it. A secret in a tool_result error message or refusal text is now caught before it reaches any written artifact. Covered by a regression test. | |
| + | - The CLI `--from claude` value is now honored explicitly instead of falling through to the "unknown tool" error. The `TOOLS` array has always advertised `claude`; the advertised value now works end to end. | |
| + | - `flattenUserContent` now returns tool_result contents (`toolResults: [{ toolUseId, isError, content, contentType }]`) instead of just a count, so rejection classification has the text it needs. | |
| + | ||
| + | ### Performance | |
| + | ||
| + | - Rejection surfacing stays linear: one pass costs O(N × R), for N nodes and R rejections per node, with R bounded by the tool blocks in a single turn. The pass deliberately does not call `nearestCorrectionAfter` / `nearestAcceptedAfter` for rejection-derived failures; each of those is O(N), so calling them per rejection would reintroduce, on rejection-heavy sessions, the quadratic scaling that v0.7.0 eliminated. A rejection IS the failure event; its resolution is implicit in the next accepted turn rather than something we chase. Covered by a 5000-node × 3-rejection regression test that finishes well under the 15s threshold. | |
| + | ||
| ## 0.7.0 - 2026-06-18 | ||
| ### Added |
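The 0.8.0 capture bullet says detection patterns are named, individually testable regex pieces composed at load time, and gives the two `model_refusal` confidences (0.95 from `stop_reason`, 0.7 from text). A minimal sketch of that shape, assuming hypothetical piece names and patterns (the real ones live in `parse.js`, which is truncated in this view), a pre-flattened `text` field on the message, an assumed 0.8 confidence for typed declines, and the `text` / `text_heuristic` source values for the two text paths:

```javascript
// Hypothetical named pieces. Each is a regex source string, so a unit test
// can compile and exercise one piece without the others.
const DECLINE_PIECES = {
  leadingStop: String.raw`^\s*stop\b`,
  dontDoThat: String.raw`\bdon'?t do (?:that|this|it)\b`,
  noDont: String.raw`^\s*no,?\s+don'?t\b`,
  leadingCancel: String.raw`^\s*cancel\b`,
};
const REFUSAL_PIECES = {
  cantHelp: String.raw`\bI can(?:no|')t help with\b`,
  wontAssist: String.raw`\bI won'?t (?:be able to )?assist\b`,
};

// Compose each family once, at module load, into a single alternation.
function compose(pieces) {
  return new RegExp(Object.values(pieces).map((src) => `(?:${src})`).join('|'), 'i');
}
const DECLINE_RE = compose(DECLINE_PIECES);
const REFUSAL_RE = compose(REFUSAL_PIECES);

// Compile one piece on its own, e.g. for a focused unit test.
function pieceMatches(pieces, name, text) {
  return new RegExp(pieces[name], 'i').test(text);
}

function detectTextDecline(text, ts = null) {
  if (!DECLINE_RE.test(text)) return null;
  // Confidence 0.8 is an assumption; the changelog does not state it.
  return { kind: 'user_text_decline', source: 'text', confidence: 0.8, toolUseId: null, ts, evidence: text.slice(0, 120) };
}

// Assumption: the structured stop_reason wins over the text heuristic.
function detectModelRefusal(message) {
  if (message.stop_reason === 'refusal') {
    return { kind: 'model_refusal', source: 'stop_reason', confidence: 0.95, evidence: null };
  }
  const text = typeof message.text === 'string' ? message.text : '';
  if (REFUSAL_RE.test(text)) {
    return { kind: 'model_refusal', source: 'text_heuristic', confidence: 0.7, evidence: text.slice(0, 120) };
  }
  return null;
}
```

Anchoring `stop` and `cancel` to the start of the message keeps ordinary phrases such as "don't stop now" from reading as declines.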
| @@ -1,8 +1,8 @@ | ||
| - | # TreeTrace lineage schema v0.2 | |
| + | # TreeTrace lineage schema v0.3 | |
| `.treetrace/tree.json` is an open, vendor-neutral format for prompt lineage and agent-regression analysis in AI-assisted projects. | ||
| - | TreeTrace records the human steering layer: what was asked, what changed direction, what was corrected, what was abandoned, what future agents should remember, and which failures should become evals. | |
| + | TreeTrace records the human steering layer: what was asked, what changed direction, what was corrected, what was abandoned, what was rejected, what future agents should remember, and which failures should become evals. | |
| ## Layering | ||
| @@ -11,7 +11,7 @@ TreeTrace records the human steering layer: what was asked, what changed directi | ||
| | Code attribution | Agent Trace | which lines were AI-generated, by which model, linked to which conversation | | ||
| | Runtime telemetry | OpenTelemetry `gen_ai` | per-call spans for operators | | ||
| | Build integrity | SLSA / in-toto | signed provenance of build artifacts | | ||
| - | | Human steering | TreeTrace | prompt lineage, corrections, abandoned paths, lessons, eval candidates | | |
| + | | Human steering | TreeTrace | prompt lineage, corrections, abandoned paths, rejections, lessons, eval candidates | | |
| Agent Trace answers "which code came from AI?" TreeTrace answers "how did the human have to steer the agent?" | ||
| @@ -19,15 +19,15 @@ Agent Trace answers "which code came from AI?" TreeTrace answers "how did the hu | ||
| ```jsonc | ||
| { | ||
| - | "schemaVersion": "0.2", | |
| - | "generator": { "name": "treetrace", "version": "0.2.0", "url": "..." }, | |
| + | "schemaVersion": "0.3", | |
| + | "generator": { "name": "treetrace", "version": "0.8.0", "url": "..." }, | |
| "project": { "name": "...", "generatedAt": "ISO-8601", "sourceType": "claude-code-jsonl" }, | ||
| - | "stats": { "prompts": 41, "sessions": 6, "days": 9, "corrections": 3 }, | |
| + | "stats": { "prompts": 41, "sessions": 6, "days": 9, "corrections": 3, "rejections": 4 }, | |
| "analysis": { | ||
| - | "failureSignals": 7, | |
| + | "failureSignals": 11, | |
| "correctionChains": 3, | ||
| - | "evalCandidates": 4, | |
| - | "lessons": 4 | |
| + | "evalCandidates": 6, | |
| + | "lessons": 7 | |
| }, | ||
| "sessions": [ { "id": "...", "title": "...", "firstTs": "...", "lastTs": "...", "promptCount": 7 } ], | ||
| "nodes": [ /* PromptNode */ ], | ||
| @@ -38,7 +38,7 @@ Agent Trace answers "which code came from AI?" TreeTrace answers "how did the hu | ||
| } | ||
| ``` | ||
| - | All v0.2 additions are optional and additive. Consumers that only understand v0.1 can keep reading `nodes` and `edges`. | |
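| + | All v0.3 additions are optional and additive. Consumers that only understand v0.2 can keep reading `nodes` and `edges` and ignore `rejections`, provided they treat the new `rejection` node kind and `rejects` edge type as unknown labels rather than errors. | |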
| ## PromptNode | ||
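The compatibility note says v0.2 consumers can keep reading `nodes` and `edges`. A minimal sketch of a reader written that way, assuming `tree` is the already-parsed `tree.json`; `readLineage`, its accepted versions, and its output shape are hypothetical, not part of TreeTrace:

```javascript
// Hypothetical consumer-side reader; not part of TreeTrace.
const SUPPORTED_SCHEMAS = new Set(['0.2', '0.3']);
const V02_KINDS = new Set(['root', 'direction', 'correction', 'scope-change', 'checkpoint', 'question']);

function readLineage(tree) {
  if (!SUPPORTED_SCHEMAS.has(tree.schemaVersion)) {
    throw new Error(`unsupported schemaVersion: ${tree.schemaVersion}`);
  }
  const nodes = (tree.nodes || []).map((n) => ({
    id: n.id,
    parentId: n.parentId ?? null,
    title: n.title,
    // Keep kinds this reader does not know (such as v0.3 'rejection') as
    // advisory labels instead of failing on them.
    kind: V02_KINDS.has(n.kind) ? n.kind : `advisory:${n.kind}`,
  }));
  // Fields this reader does not know, such as node.rejections, are dropped.
  const edges = Array.isArray(tree.edges) ? tree.edges : [];
  return { nodes, edges };
}
```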
| @@ -47,7 +47,7 @@ All v0.2 additions are optional and additive. Consumers that only understand v0. | ||
| | `id` | string | stable within the file (`node_001`, etc.) | | ||
| | `parentId` | string \| null | lineage parent (null = root) | | ||
| | `role` | `"user"` | reserved for future system/developer nodes | | ||
| - | | `kind` | enum | `root`, `direction`, `correction`, `scope-change`, `checkpoint`, `question` | | |
| + | | `kind` | enum | `root`, `direction`, `correction`, `scope-change`, `checkpoint`, `question`, `rejection` | | |
| | `title` | string | first-sentence distillation | | ||
| | `text` | string | full prompt text after redaction | | ||
| | `status` | enum | `accepted`, `abandoned` | | ||
| @@ -58,8 +58,11 @@ All v0.2 additions are optional and additive. Consumers that only understand v0. | ||
| | `failureSignals` | FailureSignal[] | optional v0.2 failure labels attached to this node | | ||
| | `evalCandidate` | boolean | whether this node contributes to an eval candidate | | ||
| | `lessonIds` | string[] | lessons derived from this node | | ||
| + | | `rejections` | Rejection[] | optional v0.3 typed rejection/refusal/decline events captured on this turn | | |
| | `sourceEventIds` | string[] | local transcript record UUIDs; raw transcripts are never exported | | ||
| + | The `rejection` kind (v0.3) is assigned to synthetic nodes that exist only to carry a rejection signal, e.g. a tool-result rejection that arrived before any human-typed prompt. Such nodes have empty `text`, a `title` derived from the rejection kind(s), and one or more entries in `rejections`. | |
| + | ||
| ## FailureSignal | ||
| ```jsonc | ||
| @@ -86,9 +89,42 @@ Initial `type` values: | ||
| - `format_violation` | ||
| - `user_frustration` | ||
| - `abandoned_path` | ||
| + | - `user_rejected_action` (v0.3) | |
| + | - `tool_execution_failed` (v0.3) | |
| + | - `model_refused` (v0.3) | |
| + | - `permission_denied` (v0.3) | |
| The enum may gain values. Consumers should treat unknown values as advisory labels. | ||
| + | ## Rejection (v0.3) | |
| + | ||
| + | ```jsonc | |
| + | { | |
| + | "kind": "user_declined_tool", | |
| + | "source": "tool_result", | |
| + | "confidence": 1.0, | |
| + | "toolUseId": "toolu_0123ABC", | |
| + | "tool": "Bash", | |
| + | "ts": "2026-06-18T12:34:56.789Z", | |
| + | "evidence": "The user doesn't want to proceed with this tool use..." | |
| + | } | |
| + | ``` | |
| + | ||
| + | `kind` enum: | |
| + | ||
| + | - `user_declined_tool` - human rejected a proposed tool action (Claude Code canonical "user doesn't want to proceed" text) | |
| + | - `user_interrupt` - human pressed Esc / interrupt mid-response | |
| + | - `user_text_decline` - human typed an explicit decline (`no, don't`, `stop`, `cancel`) | |
| + | - `tool_execution_error` - tool ran and returned `is_error: true` for a non-decline reason | |
| + | - `permission_denied` - environment denied the action (`permission denied`, `EACCES`, `Operation cancelled`) | |
| + | - `model_refusal` - the model declined the request (`stop_reason: "refusal"` or refusal text) | |
| + | ||
| + | `source` enum: `tool_result`, `text`, `stop_reason`, `text_heuristic`. | |
| + | ||
| + | `confidence` follows the same banding as FailureSignal: 0.95+ verified, 0.8+ high, 0.65+ confirmed, else inferred. | |
| + | ||
| + | `evidence` is truncated and redacted; it carries enough context to disambiguate the rejection class. `null` when only the structured signal (e.g. `stop_reason`) is available. | |
| + | ||
| ## Edge | ||
| ```jsonc | ||
| @@ -102,6 +138,7 @@ The enum may gain values. Consumers should treat unknown values as advisory labe | ||
| - `expands` | ||
| - `checkpoints` | ||
| - `asks` | ||
| + | - `rejects` (v0.3, from `kind: "rejection"`) | |
| ## CorrectionChain | ||
| @@ -160,6 +197,9 @@ Initial eval `type` values: | ||
| - `privacy_boundary_preservation` | ||
| - `handoff_quality` | ||
| - `tool_choice_regression` | ||
| + | - `tool_permission_regression` (v0.3) | |
| + | - `tool_error_recovery` (v0.3) | |
| + | - `refusal_handling` (v0.3) | |
| ## Separate Analysis Artifacts | ||
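The Rejection section fixes the `kind` and `source` enums and reuses the confidence banding. A minimal consumer-side check of one record against those rules, assuming the record was parsed from `tree.json`; `validateRejection` and `tierFor` are illustrative names (`tierFor` uses the same thresholds as `tierForRejection` in `analyze.js`):

```javascript
const REJECTION_KINDS = new Set([
  'user_declined_tool', 'user_interrupt', 'user_text_decline',
  'tool_execution_error', 'permission_denied', 'model_refusal',
]);
const REJECTION_SOURCES = new Set(['tool_result', 'text', 'stop_reason', 'text_heuristic']);

// 0.95+ verified, 0.8+ high, 0.65+ confirmed, else inferred.
function tierFor(confidence) {
  if (confidence >= 0.95) return 'verified';
  if (confidence >= 0.8) return 'high';
  if (confidence >= 0.65) return 'confirmed';
  return 'inferred';
}

// Returns a list of problems; an empty list means the record is well formed.
// A lenient consumer may downgrade "unknown kind" to a warning.
function validateRejection(r) {
  const problems = [];
  if (!REJECTION_KINDS.has(r.kind)) problems.push(`unknown kind: ${r.kind}`);
  if (r.source != null && !REJECTION_SOURCES.has(r.source)) problems.push(`unknown source: ${r.source}`);
  if (typeof r.confidence !== 'number' || !(r.confidence >= 0 && r.confidence <= 1)) {
    problems.push('confidence must be a number in [0, 1]');
  }
  if (r.evidence != null && typeof r.evidence !== 'string') problems.push('evidence must be a string or null');
  return problems;
}
```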
| @@ -8,6 +8,8 @@ export function emptyStats() { | ||
| inputTokens: 0, | ||
| outputTokens: 0, | ||
| interruptions: 0, | ||
| + | rejections: 0, | |
| + | rejectionsByKind: Object.create(null), | |
| }; | ||
| } | ||
| @@ -34,6 +36,7 @@ export function newSession(path, sessionId) { | ||
| export function finalizeSession(session) { | ||
| session.stats.models = [...session.stats.models]; | ||
| session.stats.filesTouched = [...session.stats.filesTouched]; | ||
| + | session.stats.rejectionsByKind = { ...session.stats.rejectionsByKind }; | |
| if (session.customTitle) session.title = session.customTitle; | ||
| return session; | ||
| } | ||
| @@ -63,6 +66,7 @@ export function pushTurn(session, idx, text, ts, { hasImage = false, hadToolResu | ||
| afterInterruption: false, | ||
| actions: [], | ||
| thinking: 0, | ||
| + | rejections: [], | |
| }; | ||
| session.prompts.push(prompt); | ||
| session._currentPrompt = prompt; | ||
| @@ -78,6 +82,19 @@ export function addThinking(session, n = 1) { | ||
| if (session._currentPrompt) session._currentPrompt.thinking += n; | ||
| } | ||
| + | // Adapter-side helper mirroring addAction/addThinking. Adapters that read from | |
| + | // sources which surface tool errors or permission denials (Codex rollouts, | |
| + | // Cursor toolCalls[].error, Gemini finishReason:SAFETY, etc.) can call this so | |
| + | // their rejections flow through the same schema path as native Claude Code. | |
| + | export function addRejection(session, rejection) { | |
| + | if (!session._currentPrompt || !rejection || typeof rejection.kind !== 'string') return; | |
| + | if (!Array.isArray(session._currentPrompt.rejections)) session._currentPrompt.rejections = []; | |
| + | session._currentPrompt.rejections.push(rejection); | |
| + | session.stats.rejections = (session.stats.rejections || 0) + 1; | |
| + | session.stats.rejectionsByKind = session.stats.rejectionsByKind || Object.create(null); | |
| + | session.stats.rejectionsByKind[rejection.kind] = (session.stats.rejectionsByKind[rejection.kind] || 0) + 1; | |
| + | } | |
| + | ||
| export function flattenParts(parts) { | ||
| if (typeof parts === 'string') return parts; | ||
| if (!Array.isArray(parts)) { |
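Adapters are meant to call `addRejection` once they have turned a source event into a Rejection record. A minimal usage sketch, with the helper copied from the hunk above so it runs standalone; the session object carries only the fields the helper touches, and the records (ids, confidences, evidence strings) are made up for illustration:

```javascript
// Copied from addRejection in the hunk above.
function addRejection(session, rejection) {
  if (!session._currentPrompt || !rejection || typeof rejection.kind !== 'string') return;
  if (!Array.isArray(session._currentPrompt.rejections)) session._currentPrompt.rejections = [];
  session._currentPrompt.rejections.push(rejection);
  session.stats.rejections = (session.stats.rejections || 0) + 1;
  session.stats.rejectionsByKind = session.stats.rejectionsByKind || Object.create(null);
  session.stats.rejectionsByKind[rejection.kind] = (session.stats.rejectionsByKind[rejection.kind] || 0) + 1;
}

// Hypothetical minimal session holding only the fields addRejection touches.
const session = {
  _currentPrompt: { text: 'run the migration', rejections: [] },
  stats: { rejections: 0, rejectionsByKind: Object.create(null) },
};

// Illustrative records an adapter might produce after classifying events.
addRejection(session, { kind: 'permission_denied', source: 'tool_result', confidence: 0.9, toolUseId: 'call_1', tool: 'Bash', ts: null, evidence: 'EACCES: permission denied' });
addRejection(session, { kind: 'tool_execution_error', source: 'tool_result', confidence: 0.8, toolUseId: 'call_2', tool: 'Bash', ts: null, evidence: 'exit code 1' });
addRejection(session, { source: 'text' }); // ignored: no string kind

// finalizeSession later copies the null-prototype tally into a plain object.
const byKind = { ...session.stats.rejectionsByKind };
```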
| @@ -15,8 +15,33 @@ const FAILURE_TYPES = new Set([ | ||
| 'format_violation', | ||
| 'user_frustration', | ||
| 'abandoned_path', | ||
| + | // v0.3 rejection-derived failure types | |
| + | 'user_rejected_action', | |
| + | 'tool_execution_failed', | |
| + | 'model_refused', | |
| + | 'permission_denied', | |
| ]); | ||
| + | // Maps a Rejection.kind (v0.3) to a failure type. user_* rejections all funnel | |
| + | // into user_rejected_action because they are variants of the same human-steering | |
| + | // event: the agent proposed or did something and the human stopped it. | |
| + | const REJECTION_KIND_TO_FAILURE_TYPE = { | |
| + | user_declined_tool: 'user_rejected_action', | |
| + | user_interrupt: 'user_rejected_action', | |
| + | user_text_decline: 'user_rejected_action', | |
| + | tool_execution_error: 'tool_execution_failed', | |
| + | permission_denied: 'permission_denied', | |
| + | model_refusal: 'model_refused', | |
| + | }; | |
| + | ||
| + | // Derive a tier from a rejection confidence. Matches the security-signal banding. | |
| + | function tierForRejection(confidence) { | |
| + | if (confidence >= 0.95) return 'verified'; | |
| + | if (confidence >= 0.8) return 'high'; | |
| + | if (confidence >= 0.65) return 'confirmed'; | |
| + | return 'inferred'; | |
| + | } | |
| + | ||
| const CORRECTION_HINT = | ||
| /\b(no|stop|scrap|not that|you forgot|you ignored|that's wrong|that is wrong|i said|instead|redo|re do|go back|wrong|doesn'?t work|didn'?t work|still (failing|broken|wrong|bad)|not what i (asked|wanted|meant))\b/i; | ||
| const FRUSTRATION_HINT = | ||
| @@ -402,6 +427,36 @@ export function analyzeTree(tree) { | ||
| const securityNodeIds = new Set(); | ||
| tree.nodes.forEach((node, index) => { | ||
| + | // v0.3: rejection surfacing pass. Each captured rejection becomes a failure | |
| + | // signal of the mapped type. Rejection failures do not call | |
| + | // nearestCorrectionAfter / nearestAcceptedAfter (each O(N), which would | |
| + | // regress the v0.7.0 O(N) assembly guarantee on rejection-heavy sessions): | |
| + | // a rejection IS the failure event, and its resolution is implicit in the | |
| + | // next accepted turn rather than something we need to chase. Single pass, | |
| + | // O(N) over nodes times O(R) over rejections per node, where R is bounded | |
| + | // by the number of tool blocks per turn. Identical failure type on the same | |
| + | // node merges into the existing record via addFailure's dedup-by-key path. | |
| + | if (Array.isArray(node.rejections) && node.rejections.length) { | |
| + | for (const r of node.rejections) { | |
| + | const type = REJECTION_KIND_TO_FAILURE_TYPE[r.kind]; | |
| + | if (!type) continue; | |
| + | const tier = tierForRejection(r.confidence || 0.7); | |
| + | const ev = r.evidence | |
| + | ? `${r.kind} (${r.source || 'tool_result'}): "${quote(r.evidence)}"` | |
| + | : `${r.kind} (${r.source || 'stop_reason'})`; | |
| + | addFailure({ | |
| + | type, | |
| + | confidence: r.confidence || 0.7, | |
| + | tier, | |
| + | failureNode: node, | |
| + | correctionNode: null, | |
| + | resolvedNode: null, | |
| + | evidence: ev, | |
| + | summary: summarizeRejection(r, node), | |
| + | }); | |
| + | } | |
| + | } | |
| + | ||
| const secActs = securityActions(node); | ||
| if (secActs.length) { | ||
| // P1: corroborating co-signals -- surface class on a touched file, and a human | ||
| @@ -567,6 +622,48 @@ export function renderFailuresJson(tree, opts = {}) { | ||
| }; | ||
| } | ||
| + | // v0.3: flattened rejection view for --rejections CLI flag and MCP tool. Walks | |
| + | // nodes once (O(N) over nodes times O(R) over rejections per node) and joins | |
| + | // each rejection back to its source node id so consumers can locate it in the | |
| + | // tree. The failure-signal view in renderFailuresJson already includes the | |
| + | // derived failures; this view is the raw rejection ledger. | |
| + | export function renderRejectionsJson(tree, opts = {}) { | |
| + | analyzeTree(tree); | |
| + | const out = []; | |
| + | const byKind = Object.create(null); | |
| + | for (const node of tree.nodes) { | |
| + | if (!Array.isArray(node.rejections) || !node.rejections.length) continue; | |
| + | for (const r of node.rejections) { | |
| + | out.push({ | |
| + | nodeId: node.id, | |
| + | kind: r.kind, | |
| + | source: r.source || null, | |
| + | confidence: r.confidence, | |
| + | toolUseId: r.toolUseId || null, | |
| + | tool: r.tool || null, | |
| + | ts: r.ts || node.ts || null, | |
| + | evidence: r.evidence || null, | |
| + | }); | |
| + | byKind[r.kind] = (byKind[r.kind] || 0) + 1; | |
| + | } | |
| + | } | |
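| + | out.sort((a, b) => { | |
| + | const ta = a.ts ? Date.parse(a.ts) : NaN; | |
| + | const tb = b.ts ? Date.parse(b.ts) : NaN; | |
| + | const hasA = Number.isFinite(ta); | |
| + | const hasB = Number.isFinite(tb); | |
| + | if (hasA && hasB && ta !== tb) return ta - tb; | |
| + | // Undated entries sort after dated ones so the comparator stays consistent. | |
| + | if (hasA !== hasB) return hasA ? -1 : 1; | |
| + | return (a.nodeId || '').localeCompare(b.nodeId || ''); | |
| + | }); | |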
| + | return { | |
| + | schemaVersion: SCHEMA_VERSION, | |
| + | project: projectBlock(opts), | |
| + | summary: { | |
| + | total: out.length, | |
| + | byKind: { ...byKind }, | |
| + | }, | |
| + | rejections: out, | |
| + | }; | |
| + | } | |
| + | ||
| export function renderLessonsMarkdown(tree, opts = {}) { | ||
| const analysis = analyzeTree(tree); | ||
| const lines = ['# Lessons', '']; | ||
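`renderRejectionsJson` joins each rejection back to its source `nodeId`. A consumer can reverse that join to show each rejection beside the turn it happened on, and can recount `byKind` as a consistency check. A minimal sketch, assuming `tree` and `ledger` are the parsed `tree.json` and `rejections.json`; `annotateLedger` and `countByKind` are hypothetical helpers:

```javascript
// Attach node context to each ledger entry; unknown node ids get nulls.
function annotateLedger(tree, ledger) {
  const byId = new Map(tree.nodes.map((n) => [n.id, n]));
  return ledger.rejections.map((entry) => {
    const node = byId.get(entry.nodeId);
    return { ...entry, nodeTitle: node ? node.title : null, nodeKind: node ? node.kind : null };
  });
}

// Recount by kind so the result can be compared with ledger.summary.byKind.
function countByKind(entries) {
  const counts = {};
  for (const e of entries) counts[e.kind] = (counts[e.kind] || 0) + 1;
  return counts;
}
```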
| @@ -1047,6 +1144,10 @@ function lessonFor(type, { evidence = '', summary = '' } = {}) { | ||
| format_violation: 'Preserve requested output formats', | ||
| user_frustration: 'Escalate when user frustration appears', | ||
| abandoned_path: 'Avoid abandoned paths unless explicitly revived', | ||
| + | user_rejected_action: 'Confirm proposed actions before executing', | |
| + | tool_execution_failed: 'Validate tool inputs before executing', | |
| + | model_refused: 'Rephrase refused requests instead of repeating them', | |
| + | permission_denied: 'Pre-flight check filesystem and shell permissions', | |
| }; | ||
| const guidance = { | ||
| ignored_constraint: 'Future agents should carry explicit user constraints forward as high-priority requirements.', | ||
| @@ -1062,6 +1163,10 @@ function lessonFor(type, { evidence = '', summary = '' } = {}) { | ||
| format_violation: 'Future agents should preserve requested output formats exactly unless the user approves a change.', | ||
| user_frustration: 'Future agents should treat frustration as a signal to slow down, verify assumptions, and correct course.', | ||
| abandoned_path: 'Future agents should avoid resurrecting abandoned branches unless the user explicitly asks for them.', | ||
| + | user_rejected_action: 'Future agents should not retry a tool action the user just declined without first explaining why the action is still worth taking.', | |
| + | tool_execution_failed: 'Future agents should validate command inputs and surface expected errors before running shell or write tools, instead of discovering failures after execution.', | |
| + | model_refused: 'Future agents should treat a refusal as a signal to rephrase or descope, not to retry the same request verbatim; if the user confirms the request is legitimate, surface the refusal reason.', | |
| + | permission_denied: 'Future agents should pre-flight check that required files, commands, or resources are accessible before attempting an action that needs them.', | |
| }; | ||
| const base = guidance[type] || 'Future agents should preserve this correction.'; | ||
| const concrete = String(evidence || summary || '').replace(/\s+/g, ' ').trim(); | ||
| @@ -1077,6 +1182,9 @@ function evalTypeFor(type) { | ||
| if (type === 'ignored_constraint' || type === 'format_violation') return 'constraint_preservation'; | ||
| if (type === 'wrong_tool_choice' || type === 'dependency_or_environment_mismatch') return 'tool_choice_regression'; | ||
| if (type === 'abandoned_path') return 'correction_adherence'; | ||
| + | if (type === 'user_rejected_action' || type === 'permission_denied') return 'tool_permission_regression'; | |
| + | if (type === 'tool_execution_failed') return 'tool_error_recovery'; | |
| + | if (type === 'model_refused') return 'refusal_handling'; | |
| return 'instruction_following_regression'; | ||
| } | ||
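Taken together, `REJECTION_KIND_TO_FAILURE_TYPE`, `tierForRejection`, and `evalTypeFor` turn each rejection into one failure signal and one eval type in a single pass. A minimal standalone sketch of that pass, assuming a plain Map keyed by node id and failure type as a stand-in for `addFailure`'s dedup-by-key path (its real key and merge policy are not shown in this diff) and a keep-the-strongest-signal merge:

```javascript
// Copied from REJECTION_KIND_TO_FAILURE_TYPE above.
const KIND_TO_FAILURE = {
  user_declined_tool: 'user_rejected_action',
  user_interrupt: 'user_rejected_action',
  user_text_decline: 'user_rejected_action',
  tool_execution_error: 'tool_execution_failed',
  permission_denied: 'permission_denied',
  model_refusal: 'model_refused',
};
// Failure type -> eval type, following the evalTypeFor branches above.
const FAILURE_TO_EVAL = {
  user_rejected_action: 'tool_permission_regression',
  permission_denied: 'tool_permission_regression',
  tool_execution_failed: 'tool_error_recovery',
  model_refused: 'refusal_handling',
};

function tierFor(confidence) {
  if (confidence >= 0.95) return 'verified';
  if (confidence >= 0.8) return 'high';
  if (confidence >= 0.65) return 'confirmed';
  return 'inferred';
}

// One pass over nodes and their rejections: O(N * R). Nothing scans forward
// for a later correction, so the cost stays linear in N.
function surfaceRejections(nodes) {
  const failures = new Map(); // key: `${nodeId}|${type}` (assumed shape)
  for (const node of nodes) {
    for (const r of node.rejections || []) {
      const type = KIND_TO_FAILURE[r.kind];
      if (!type) continue; // unmapped kinds are skipped, as in analyzeTree
      const confidence = typeof r.confidence === 'number' ? r.confidence : 0.7;
      const key = `${node.id}|${type}`;
      const existing = failures.get(key);
      if (existing) {
        // Assumed merge policy: count the repeat, keep the strongest signal.
        existing.count += 1;
        existing.confidence = Math.max(existing.confidence, confidence);
        existing.tier = tierFor(existing.confidence);
      } else {
        failures.set(key, { nodeId: node.id, type, evalType: FAILURE_TO_EVAL[type], confidence, tier: tierFor(confidence), count: 1 });
      }
    }
  }
  return [...failures.values()];
}
```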
| @@ -1084,6 +1192,11 @@ function evalTaskFor(type) { | ||
| if (type === 'security_or_privacy_risk') return 'Continue development while preserving privacy and redaction boundaries.'; | ||
| if (type === 'scope_drift') return 'Continue development without drifting outside the corrected scope.'; | ||
| if (type === 'format_violation') return 'Continue development while preserving the requested output format.'; | ||
| + | if (type === 'user_rejected_action' || type === 'permission_denied') { | |
| + | return 'Continue development without re-attempting tool actions the user or environment has just rejected.'; | |
| + | } | |
| + | if (type === 'tool_execution_failed') return 'Continue development while validating tool inputs before execution.'; | |
| + | if (type === 'model_refused') return 'Continue development by rephrasing refused requests rather than repeating them.'; | |
| return 'Continue development while preserving the corrected direction from the session lineage.'; | ||
| } | ||
| @@ -1100,6 +1213,26 @@ function failureModeFor(type) { | ||
| return `Agent repeats ${type.replace(/_/g, ' ')} despite prior correction.`; | ||
| } | ||
| + | function summarizeRejection(r, node) { | |
| + | const subject = truncate(node && node.title ? node.title : 'a previous turn', 90); | |
| + | switch (r.kind) { | |
| + | case 'user_declined_tool': | |
| + | return `The user declined a proposed tool action near "${subject}".`; | |
| + | case 'user_interrupt': | |
| + | return `The user interrupted the agent mid-response near "${subject}".`; | |
| + | case 'user_text_decline': | |
| + | return `The user explicitly told the agent to stop or not proceed near "${subject}".`; | |
| + | case 'tool_execution_error': | |
| + | return `A tool execution returned an error near "${subject}".`; | |
| + | case 'permission_denied': | |
| + | return `A tool action was denied by the environment (permission denied) near "${subject}".`; | |
| + | case 'model_refusal': | |
| + | return `The model refused to proceed near "${subject}".`; | |
| + | default: | |
| + | return `A ${r.kind || 'rejection'} was captured near "${subject}".`; | |
| + | } | |
| + | } | |
| + | ||
| function confidenceLabel(score) { | ||
| if (score >= 0.8) return 'high'; | ||
| if (score >= 0.65) return 'medium'; |
| @@ -14,6 +14,7 @@ import { renderReportMarkdown, renderTerminalSummary } from './report.js'; | ||
| import { | ||
| analyzeTree, | ||
| renderFailuresJson, | ||
| + | renderRejectionsJson, | |
| renderLessonsMarkdown, | ||
| renderEvalsJsonl, | ||
| renderMemoryMarkdown, | ||
| @@ -36,6 +37,7 @@ Usage: | ||
| treetrace --report write all artifacts and print the human report | ||
| treetrace --handoff print an agent-ready handoff brief to stdout | ||
| treetrace --failures write and print failure-analysis JSON | ||
| + | treetrace --rejections write and print rejection/refusal/decline JSON (v0.3) | |
| treetrace --lessons write and print lessons Markdown | ||
| treetrace --evals write and print eval JSONL | ||
| treetrace --memory write and print compact agent memory | ||
| @@ -52,6 +54,7 @@ Options: | ||
| --report-file <file> human report output path (default: TREETRACE_REPORT.md) | ||
| --json also print lineage JSON to stdout | ||
| --analysis write failure, lesson, eval, and memory artifacts | ||
| + | --rejections write and print .treetrace/rejections.json (v0.3) | |
| --titles-only omit full prompt texts from the markdown tree | ||
| --security print a security-focused report and write hallucinations.json | ||
| --mcp start a read-only MCP server over stdio (same as: treetrace mcp) | ||
| @@ -268,6 +271,15 @@ export async function loadRedactedTree(opts, projectDir, projectName, log = () = | ||
| if (typeof action[field] === 'string') findings.push(...scanText(action[field])); | ||
| } | ||
| } | ||
| + | // v0.3: rejection evidence is rendered in reports, rejections.json, and the | |
| + | // MCP tool, so it must pass the same redaction gate as prompt text and | |
| + | // action bodies. A secret in a tool_result or refusal text would otherwise | |
| + | // reach written artifacts. | |
| + | if (Array.isArray(node.rejections)) { | |
| + | for (const r of node.rejections) { | |
| + | if (typeof r.evidence === 'string') findings.push(...scanText(r.evidence)); | |
| + | } | |
| + | } | |
| } | ||
| const interactive = !forceAuto && process.stdin.isTTY && process.stderr.isTTY && !opts.redactAuto; | ||
| @@ -305,6 +317,14 @@ export async function loadRedactedTree(opts, projectDir, projectName, log = () = | ||
| } | ||
| } | ||
| } | ||
| + | // v0.3: apply the same redaction decisions to rejection evidence. | |
| + | if (Array.isArray(node.rejections)) { | |
| + | for (const r of node.rejections) { | |
| + | if (typeof r.evidence === 'string') { | |
| + | r.evidence = applyDecisions(r.evidence, findings, decisions); | |
| + | } | |
| + | } | |
| + | } | |
| } | ||
| analyzeTree(tree); | ||
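The gate scans every field first and only then applies decisions, so one decision about a secret covers every place it appears, rejection evidence included. A minimal sketch of that two-phase shape, assuming a hypothetical AWS-access-key detector and an always-redact decision in place of the real `scanText` / `applyDecisions` and the interactive prompt; for brevity it covers prompt text and rejection evidence but not action bodies:

```javascript
// Hypothetical detector: AWS access key ids ("AKIA" + 16 uppercase letters or digits).
const SECRET_RE = /AKIA[0-9A-Z]{16}/g;

function scanSecrets(text) {
  return text.match(SECRET_RE) || [];
}

// Phase 1 collects findings from every field; phase 2 applies one decision
// (here: always redact) to every field, so a secret first seen in a prompt
// is also removed from rejection evidence and vice versa.
function redactTree(tree) {
  const fields = [];
  for (const node of tree.nodes) {
    fields.push([node, 'text']);
    for (const r of node.rejections || []) fields.push([r, 'evidence']);
  }
  const findings = new Set();
  for (const [obj, key] of fields) {
    if (typeof obj[key] === 'string') scanSecrets(obj[key]).forEach((f) => findings.add(f));
  }
  for (const [obj, key] of fields) {
    if (typeof obj[key] !== 'string') continue;
    let value = obj[key];
    for (const secret of findings) value = value.split(secret).join('[REDACTED]');
    obj[key] = value;
  }
  return [...findings];
}
```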
| @@ -373,6 +393,11 @@ function analysisArtifacts(ttDir, tree, renderOpts, projectDir) { | ||
| path: join(ttDir, 'failures.json'), | ||
| text: JSON.stringify(renderFailuresJson(tree, renderOpts), null, 2), | ||
| }, | ||
| + | rejections: { | |
| + | label: 'rejections.json', | |
| + | path: join(ttDir, 'rejections.json'), | |
| + | text: JSON.stringify(renderRejectionsJson(tree, renderOpts), null, 2), | |
| + | }, | |
| hallucinations: { | ||
| label: 'hallucinations.json', | ||
| path: join(ttDir, 'hallucinations.json'), | ||
| @@ -399,6 +424,7 @@ function analysisArtifacts(ttDir, tree, renderOpts, projectDir) { | ||
| function requestedArtifacts(opts, artifacts) { | ||
| const requested = []; | ||
| if (opts.failures) requested.push(artifacts.failures); | ||
| + | if (opts.rejections) requested.push(artifacts.rejections); | |
| if (opts.lessons) requested.push(artifacts.lessons); | ||
| if (opts.evals) requested.push(artifacts.evals); | ||
| if (opts.memory) requested.push(artifacts.memory); | ||
| @@ -513,6 +539,7 @@ export function parseArgs(argv) { | ||
| json: false, | ||
| analysis: false, | ||
| failures: false, | ||
| + | rejections: false, | |
| lessons: false, | ||
| evals: false, | ||
| memory: false, | ||
| @@ -556,6 +583,7 @@ export function parseArgs(argv) { | ||
| case '--json': opts.json = true; break; | ||
| case '--analysis': opts.analysis = true; break; | ||
| case '--failures': opts.failures = true; break; | ||
| + | case '--rejections': opts.rejections = true; break; | |
| case '--lessons': opts.lessons = true; break; | ||
| case '--evals': opts.evals = true; break; | ||
| case '--memory': opts.memory = true; break; |
| @@ -1,4 +1,4 @@ | ||
| export const REPO_URL = | ||
| process.env.TREETRACE_REPO_URL || 'https://github.com/TreeTraceTool/TreeTrace'; | ||
| - | export const SCHEMA_VERSION = '0.2'; | |
| + | export const SCHEMA_VERSION = '0.3'; |
| @@ -7,6 +7,7 @@ const KIND = { | ||
| SCOPE: 'scope-change', | ||
| CHECKPOINT: 'checkpoint', | ||
| QUESTION: 'question', | ||
| + | REJECTION: 'rejection', | |
| }; | ||
| const CORRECTION_STRONG_OPENERS = | ||
| @@ -44,6 +45,34 @@ export function classifyPrompts(sessions) { | ||
| const text = prompt.text; | ||
| const words = text.split(/\s+/).filter(Boolean); | ||
| + | // v0.3: synthetic rejection-only prompts (text === '', isRejectionOnly flag | |
| + | // set by parse.js when a tool-result rejection arrives before any text | |
| + | // prompt). Route to kind:'rejection' directly, bypass dup/rerun/nudge | |
| + | // folding, and keep them as their own nodes so the rejection signal is | |
| + | // visible in the lineage instead of disappearing into a sibling. | |
| + | if (prompt.isRejectionOnly) { | |
| + | const node = { | |
| + | id: null, | |
| + | uuid: prompt.uuid, | |
| + | parentUuid: prompt.parentUuid, | |
| + | sessionId: session.sessionId, | |
| + | ts: prompt.ts, | |
| + | text: '', | |
| + | title: makeRejectionTitle(prompt.rejections), | |
| + | kind: KIND.REJECTION, | |
| + | status: 'accepted', | |
| + | nudges: 0, | |
| + | afterInterruption: prompt.afterInterruption, | |
| + | actions: prompt.actions || [], | |
| + | thinking: prompt.thinking || 0, | |
| + | rejections: prompt.rejections || [], | |
| + | chars: 0, | |
| + | }; | |
| + | nodes.push(node); | |
| + | prevNode = node; | |
| + | continue; | |
| + | } | |
| + | ||
| if (prevNode && isDupOf(prevNode.text, text)) { | ||
| if (text.length > prevNode.text.length) { | ||
| prevNode.text = text; | ||
| @@ -90,6 +119,7 @@ export function classifyPrompts(sessions) { | ||
| afterInterruption: prompt.afterInterruption, | ||
| actions: prompt.actions || [], | ||
| thinking: prompt.thinking || 0, | ||
| + | rejections: prompt.rejections || [], | |
| chars: text.length, | ||
| } : { | ||
| id: null, | ||
| @@ -105,6 +135,7 @@ export function classifyPrompts(sessions) { | ||
| afterInterruption: prompt.afterInterruption, | ||
| actions: prompt.actions || [], | ||
| thinking: prompt.thinking || 0, | ||
| + | rejections: prompt.rejections || [], | |
| chars: text.length, | ||
| }; | ||
| if (node.kind === KIND.ROOT) rootAssigned = true; | ||
| @@ -115,6 +146,16 @@ export function classifyPrompts(sessions) { | ||
| return nodes; | ||
| } | ||
| + | function makeRejectionTitle(rejections) { | |
| + | if (!Array.isArray(rejections) || !rejections.length) return '[agent action rejected]'; | |
| + | const kinds = [...new Set(rejections.map((r) => r.kind))]; | |
| + | if (kinds.length === 1) { | |
| + | const k = kinds[0].replace(/_/g, ' '); | |
| + | return `Agent action rejected (${k})`; | |
| + | } | |
| + | return `Agent action rejected (${kinds.length} kinds)`; | |
| + | } | |
| + | ||
| function isDupOf(a, b) { | ||
| const na = a.replace(/\s+/g, ' ').trim(); | ||
| const nb = b.replace(/\s+/g, ' ').trim(); | ||
| @@ -158,6 +199,10 @@ function mergeActions(node, prompt) { | ||
| node.actions = node.actions || []; | ||
| if (prompt.actions && prompt.actions.length) node.actions.push(...prompt.actions); | ||
| if (prompt.thinking) node.thinking = (node.thinking || 0) + prompt.thinking; | ||
| + | if (Array.isArray(prompt.rejections) && prompt.rejections.length) { | |
| + | node.rejections = node.rejections || []; | |
| + | node.rejections.push(...prompt.rejections); | |
| + | } | |
| } | ||
| export { KIND }; |
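A rejection-only prompt skips duplicate and nudge folding and becomes its own node with empty text and a title derived from its rejection kinds. A minimal sketch of that conversion, with `makeRejectionTitle` copied from above and a hypothetical `toRejectionNode` that keeps a subset of the fields the `isRejectionOnly` branch sets:

```javascript
// Copied from makeRejectionTitle above.
function makeRejectionTitle(rejections) {
  if (!Array.isArray(rejections) || !rejections.length) return '[agent action rejected]';
  const kinds = [...new Set(rejections.map((r) => r.kind))];
  if (kinds.length === 1) {
    const k = kinds[0].replace(/_/g, ' ');
    return `Agent action rejected (${k})`;
  }
  return `Agent action rejected (${kinds.length} kinds)`;
}

// Hypothetical helper mirroring the isRejectionOnly branch of classifyPrompts.
function toRejectionNode(prompt, sessionId) {
  return {
    id: null,
    uuid: prompt.uuid,
    parentUuid: prompt.parentUuid,
    sessionId,
    ts: prompt.ts,
    text: '',
    title: makeRejectionTitle(prompt.rejections),
    kind: 'rejection',
    status: 'accepted',
    rejections: prompt.rejections || [],
    chars: 0,
  };
}
```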
| @@ -2,7 +2,7 @@ import { createInterface } from 'node:readline'; | ||
| import { resolve } from 'node:path'; | ||
| import { parseArgs, loadRedactedTree, detectProjectName, assertClean } from './cli.js'; | ||
| import { renderHandoff } from './handoff.js'; | ||
| - | import { renderLessonsMarkdown, analyzeTree } from './analyze.js'; | |
| + | import { renderLessonsMarkdown, analyzeTree, renderRejectionsJson } from './analyze.js'; | |
| import { renderSecurityReport } from './security-report.js'; | ||
| import { renderHallucinationsJson } from './hallucinate.js'; | ||
| import { renderJson } from './render-json.js'; | ||
| @@ -38,6 +38,11 @@ const TOOL_DEFS = [ | ||
| description: 'Full prompt-lineage tree as canonical JSON (nodes, stats, analysis). The structured counterpart to the Markdown reports. Read only.', | ||
| inputSchema: { type: 'object', properties: {}, additionalProperties: false }, | ||
| }, | ||
| + | { | |
| + | name: 'rejections_summary', | |
| + | description: 'Typed rejection / refusal / decline events captured on the session (tool declines, interrupts, permission denials, tool errors, model refusals). Read only.', | |
| + | inputSchema: { type: 'object', properties: {}, additionalProperties: false }, | |
| + | }, | |
| ]; | ||
| export async function startMcpServer({ argv, version }, io = {}) { | ||
| @@ -185,6 +190,8 @@ function renderTool(name, tree, renderOpts) { | ||
| } | ||
| case 'tree': | ||
| return JSON.stringify(renderJson(tree, renderOpts), null, 2); | ||
| + | case 'rejections_summary': | |
| + | return JSON.stringify(renderRejectionsJson(tree, renderOpts), null, 2); | |
| default: | ||
| return ''; | ||
| } |
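With `rejections_summary` registered in `TOOL_DEFS` and routed in `renderTool`, a client reaches it through an ordinary MCP `tools/call`. This is a minimal client-side sketch, assuming the stdio transport (newline-delimited JSON-RPC 2.0). `buildToolsCall` and `readToolText` are hypothetical helpers, not part of TreeTrace. The simulated reply also assumes the server wraps the rendered JSON in a standard `{ type: 'text' }` content part.

```javascript
// Hypothetical client-side helpers; not part of TreeTrace.
// MCP over stdio exchanges newline-delimited JSON-RPC 2.0 messages.
function buildToolsCall(id, name) {
  const request = {
    jsonrpc: '2.0',
    id,
    method: 'tools/call',
    // rejections_summary declares an empty input schema with
    // additionalProperties: false, so an empty object is the only valid input.
    params: { name, arguments: {} },
  };
  return JSON.stringify(request) + '\n';
}

// Pull the concatenated text content out of a tools/call response.
function readToolText(response) {
  const content = (response.result && response.result.content) || [];
  return content
    .filter((part) => part.type === 'text')
    .map((part) => part.text)
    .join('');
}

const line = buildToolsCall(7, 'rejections_summary');
// Simulated server reply (assumed shape): pretty-printed JSON returned as text.
const reply = {
  jsonrpc: '2.0',
  id: 7,
  result: {
    content: [{ type: 'text', text: JSON.stringify({ summary: { total: 0, byKind: {} }, rejections: [] }, null, 2) }],
  },
};
const view = JSON.parse(readToolText(reply));
console.log(line.trim());
console.log(view.summary.total); // 0
```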
| @@ -1,9 +1,52 @@ | ||
| import { createReadStream } from 'node:fs'; | ||
| import { createInterface } from 'node:readline'; | ||
| + | import { truncate } from './util.js'; | |
| import { TreetraceError, ExitCode } from './util.js'; | ||
| const DAG_TYPES = new Set(['user', 'assistant', 'system', 'attachment']); | ||
| + | // --- Rejection / refusal / decline detection (v0.3) --- | |
| + | // Named, individually testable regexes, following the v0.7.0 precedent for | |
| + | // security intent and risky-command detection. Each pattern maps to one | |
| + | // Rejection.kind. Check order in classifyToolResultRejection matters: the | |
| + | // first match wins, so the more specific user_declined_tool test runs before | |
| + | // the less specific permission_denied test, and tool_execution_error is the | |
| + | // fallback. | |
| + | ||
| + | const USER_DECLINED_TOOL_RE = | |
| + | /\bthe user (?:doesn'?t|does not|didn'?t|did not) want to proceed with this tool use\b|\bthe user (?:wants?|wanted) (?:you|me|the agent) to\b|\buser (?:rejected|declined|cancelled|canceled) (?:this|the) tool(?: use)?\b|\buser chose to reject\b/i; | |
| + | ||
| + | const PERMISSION_DENIED_RE = | |
| + | /\bpermission denied\b|\boperation not permitted\b|\bEACCES\b|\bEPERM\b|\bcommand not found\b|\bOperation cancelled\b|\baccess is denied\b|\brequires? elevation\b/i; | |
| + | ||
| + | const REFUSAL_TEXT_RE = | |
| + | /\b(?:i (?:can(?:'|no)t|am (?:unable|not able|not permitted) to|won['’]?t|cannot|do not|don['’]?t (?:think i (?:should|can)|feel comfortable))|i['’]?m not (?:able|allowed|going) to|(?:sorry|apolog(?:y|ies|ize))[,.]? i (?:can(?:'|no)t|am unable|won['’]?t|cannot)|as (?:an? )?(?:ai|language model|assistant)[, ]+(?:i |we )?(?:can(?:'|no)t|cannot|am unable|won['’]?t)|i'?m programmed (?:to decline|not to)|against my (?:guidelines|policies|programming))\b/i; | |
| + | ||
| + | const USER_TEXT_DECLINE_RE = | |
| + | /^(?:no(?:pe)?\s*[,.)]?\s+|stop\s*[,.)]?\s+|cancel\s*[,.)]?\s+|don'?t\s+|do not\s+|don'?t do (?:that|this|it)\b|stop (?:that|this|it|doing)\b|not that one\b|scratch that\b|nevermind\b|never mind\b)/i; | |
| + | ||
| + | // tool_result rejection classifier. Always returns { kind, confidence, evidence }, never null; tool_execution_error is the fallback. | |
| + | function classifyToolResultRejection(content) { | |
| + | const text = typeof content === 'string' ? content : ''; | |
| + | if (!text) return { kind: 'tool_execution_error', confidence: 0.85, evidence: null }; | |
| + | if (USER_DECLINED_TOOL_RE.test(text)) { | |
| + | return { kind: 'user_declined_tool', confidence: 1.0, evidence: truncate(text, 160) }; | |
| + | } | |
| + | if (PERMISSION_DENIED_RE.test(text)) { | |
| + | return { kind: 'permission_denied', confidence: 0.85, evidence: truncate(text, 160) }; | |
| + | } | |
| + | return { kind: 'tool_execution_error', confidence: 0.9, evidence: truncate(text, 160) }; | |
| + | } | |
| + | ||
| + | function looksLikeRefusal(text) { | |
| + | return typeof text === 'string' && text.length <= 4000 && REFUSAL_TEXT_RE.test(text); | |
| + | } | |
| + | ||
| + | function looksLikeUserTextDecline(text) { | |
| + | const t = typeof text === 'string' ? text.trim() : ''; | |
| + | if (!t || t.length > 240) return false; | |
| + | return USER_TEXT_DECLINE_RE.test(t); | |
| + | } | |
| + | ||
| export async function parseSessionFile(path, sessionMeta = {}) { | ||
| const session = { | ||
| sessionId: sessionMeta.sessionId || null, | ||
| @@ -28,6 +71,8 @@ export async function parseSessionFile(path, sessionMeta = {}) { | ||
| inputTokens: 0, | ||
| outputTokens: 0, | ||
| interruptions: 0, | ||
| + | rejections: 0, | |
| + | rejectionsByKind: Object.create(null), | |
| }, | ||
| isContinuation: false, | ||
| _usageByMsgId: new Map(), | ||
| @@ -63,6 +108,7 @@ export async function parseSessionFile(path, sessionMeta = {}) { | ||
| if (session.customTitle) session.title = session.customTitle; | ||
| session.stats.models = [...session.stats.models]; | ||
| session.stats.filesTouched = [...session.stats.filesTouched]; | ||
| + | session.stats.rejectionsByKind = { ...session.stats.rejectionsByKind }; | |
| return session; | ||
| } | ||
| @@ -123,6 +169,36 @@ function indexDagNode(session, rec, { parentOverride } = {}) { | ||
| if (!rec.isSidechain) session.leafUuid = rec.uuid; | ||
| } | ||
| + | // Attach a rejection to the current prompt. If no current prompt exists (e.g. | |
| + | // a tool-result rejection arrives before any text prompt), synthesize a | |
| + | // rejection-only prompt so the signal is never lost. O(1) per call. | |
| + | function attachRejection(session, rejection) { | |
| + | if (!rejection || typeof rejection.kind !== 'string') return; | |
| + | let prompt = session._currentPrompt; | |
| + | if (!prompt) { | |
| + | prompt = { | |
| + | uuid: null, | |
| + | parentUuid: session.leafUuid || null, | |
| + | ts: rejection.ts || null, | |
| + | text: '', | |
| + | hasImage: false, | |
| + | hadToolResultContext: true, | |
| + | afterInterruption: false, | |
| + | actions: [], | |
| + | thinking: 0, | |
| + | rejections: [], | |
| + | isRejectionOnly: true, | |
| + | }; | |
| + | session.prompts.push(prompt); | |
| + | session._currentPrompt = prompt; | |
| + | } | |
| + | if (!Array.isArray(prompt.rejections)) prompt.rejections = []; | |
| + | prompt.rejections.push(rejection); | |
| + | session.stats.rejections = (session.stats.rejections || 0) + 1; | |
| + | session.stats.rejectionsByKind = session.stats.rejectionsByKind || Object.create(null); | |
| + | session.stats.rejectionsByKind[rejection.kind] = (session.stats.rejectionsByKind[rejection.kind] || 0) + 1; | |
| + | } | |
| + | ||
| function ingestUser(session, rec) { | ||
| if (rec.isSidechain || rec.agentId) return; | ||
| @@ -140,14 +216,63 @@ function ingestUser(session, rec) { | ||
| if (rec.origin && rec.origin.kind === 'task-notification') return; | ||
| const msg = rec.message || {}; | ||
| - | const { text, hasImage, hasToolResult, hasOnlyToolResult } = flattenUserContent(msg.content); | |
| - | if (hasOnlyToolResult) return; | |
| + | const { text, hasImage, hasToolResult, hasOnlyToolResult, toolResults } = flattenUserContent(msg.content); | |
| + | ||
| + | // Mine every is_error tool_result block for a rejection (user decline, tool | |
| + | // error, permission denied). Synthetic tool-result echoes from the harness | |
| + | // carry no is_error and produce no rejection. In a mixed text + tool_result | |
| + | // record, the rejections attach to the current prompt (the preceding turn) | |
| + | // before this record's text is classified below. | |
| + | for (const tr of toolResults) { | |
| + | if (!tr || !tr.isError) continue; | |
| + | const cls = classifyToolResultRejection(tr.content); | |
| + | attachRejection(session, { | |
| + | kind: cls.kind, | |
| + | source: 'tool_result', | |
| + | confidence: cls.confidence, | |
| + | toolUseId: tr.toolUseId || null, | |
| + | tool: null, | |
| + | ts: rec.timestamp || null, | |
| + | evidence: cls.evidence, | |
| + | }); | |
| + | } | |
| + | ||
| + | // Tool-result-only records were previously dropped silently. They are still | |
| + | // skipped as non-prompts, but only after their rejections are recorded. | |
| + | if (hasOnlyToolResult) return; | |
| let trimmed = (text || '').trim(); | ||
| if (/^\[Request interrupted by user/i.test(trimmed)) { | ||
| session.stats.interruptions++; | ||
| session._pendingInterruption = true; | ||
| + | attachRejection(session, { | |
| + | kind: 'user_interrupt', | |
| + | source: 'text', | |
| + | confidence: 1.0, | |
| + | toolUseId: null, | |
| + | tool: null, | |
| + | ts: rec.timestamp || null, | |
| + | evidence: truncate(trimmed, 160) || '[Request interrupted by user]', | |
| + | }); | |
| return; | ||
| } | ||
| @@ -171,6 +296,16 @@ function ingestUser(session, rec) { | ||
| if (!trimmed && hasImage) trimmed = '[image-only prompt: screenshot/annotated feedback]'; | ||
| if (!trimmed) return; | ||
| + | // Text-decline rejection: detect after we know trimmed is non-empty and is a | |
| + | // real prompt (not meta/command/compact). The placeholder this pushes doubles | |
| + | // as the canonical prompt for this turn (it already carries the rejection), | |
| + | // so we return immediately to avoid pushing a second prompt below. | |
| + | if (looksLikeUserTextDecline(trimmed)) { | |
| + | attachRejectionToText(session, rec, trimmed, 'user_text_decline', 'text', 0.8, { hasImage, hasToolResult }); | |
| + | session._pendingInterruption = false; | |
| + | return; | |
| + | } | |
| + | ||
| const prompt = { | ||
| uuid: rec.uuid || null, | ||
| parentUuid: rec.parentUuid || null, | ||
| @@ -181,12 +316,42 @@ function ingestUser(session, rec) { | ||
| afterInterruption: Boolean(session._pendingInterruption), | ||
| actions: [], | ||
| thinking: 0, | ||
| + | rejections: [], | |
| }; | ||
| session.prompts.push(prompt); | ||
| session._currentPrompt = prompt; | ||
| session._pendingInterruption = false; | ||
| } | ||
| + | // Variant of attachRejection that links the rejection to the prompt we are | |
| + | // about to create. The placeholder is a complete prompt record: it is pushed | |
| + | // as _currentPrompt so attachRejection finds it, and it then stands in as the | |
| + | // canonical prompt for this turn. | |
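| + | // flags: optional { hasImage, hasToolResult } from the source record, so the | |
| + | // placeholder keeps the same context a regular prompt would record. | |
| + | function attachRejectionToText(session, rec, text, kind, source, confidence, flags = {}) { | |
| + | const placeholder = { | |
| + | uuid: rec.uuid || null, | |
| + | parentUuid: rec.parentUuid || null, | |
| + | ts: rec.timestamp || null, | |
| + | text, | |
| + | hasImage: Boolean(flags.hasImage), | |
| + | hadToolResultContext: Boolean(flags.hasToolResult), | |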
| + | afterInterruption: Boolean(session._pendingInterruption), | |
| + | actions: [], | |
| + | thinking: 0, | |
| + | rejections: [], | |
| + | }; | |
| + | session.prompts.push(placeholder); | |
| + | session._currentPrompt = placeholder; | |
| + | attachRejection(session, { | |
| + | kind, | |
| + | source, | |
| + | confidence, | |
| + | toolUseId: null, | |
| + | tool: null, | |
| + | ts: rec.timestamp || null, | |
| + | evidence: truncate(text, 160), | |
| + | }); | |
| + | } | |
| + | ||
| function ingestAssistant(session, rec) { | ||
| if (rec.isSidechain || rec.agentId) return; | ||
| indexDagNode(session, rec); | ||
| @@ -203,9 +368,25 @@ function ingestAssistant(session, rec) { | ||
| const current = session._currentPrompt; | ||
| const content = Array.isArray(msg.content) ? msg.content : []; | ||
| + | let refusedByText = false; | |
| for (const block of content) { | ||
| if (!block) continue; | ||
| - | if (block.type === 'tool_use') { | |
| + | if (block.type === 'text') { | |
| + | // Refusal heuristic on assistant text. Lower confidence than stop_reason | |
| + | // because phrasing overlap with normal hedging is possible. | |
| + | if (!refusedByText && looksLikeRefusal(block.text)) { | |
| + | refusedByText = true; | |
| + | attachRejection(session, { | |
| + | kind: 'model_refusal', | |
| + | source: 'text_heuristic', | |
| + | confidence: 0.7, | |
| + | toolUseId: null, | |
| + | tool: null, | |
| + | ts: rec.timestamp || null, | |
| + | evidence: truncate(typeof block.text === 'string' ? block.text : '', 160), | |
| + | }); | |
| + | } | |
| + | } else if (block.type === 'tool_use') { | |
| session.stats.toolUses++; | ||
| const input = block.input || {}; | ||
| const file = input.file_path || input.notebook_path || null; | ||
| @@ -223,6 +404,21 @@ function ingestAssistant(session, rec) { | ||
| if (current) current.thinking++; | ||
| } | ||
| } | ||
| + | ||
| + | // API-level refusal signal. Higher confidence than the text heuristic because | |
| + | // it is the provider's structured verdict, not a phrase match. If both fire, | |
| + | // both rejections are kept; downstream de-duplication collapses them by kind. | |
| + | if (msg.stop_reason === 'refusal') { | |
| + | attachRejection(session, { | |
| + | kind: 'model_refusal', | |
| + | source: 'stop_reason', | |
| + | confidence: 0.95, | |
| + | toolUseId: null, | |
| + | tool: null, | |
| + | ts: rec.timestamp || null, | |
| + | evidence: null, | |
| + | }); | |
| + | } | |
| } | ||
| const INPUT_CAP = 300; | ||
| @@ -262,13 +458,13 @@ function compactJson(value) { | ||
| function flattenUserContent(content) { | ||
| if (typeof content === 'string') { | ||
| - | return { text: content, hasImage: false, hasToolResult: false, hasOnlyToolResult: false }; | |
| + | return { text: content, hasImage: false, hasToolResult: false, hasOnlyToolResult: false, toolResults: [] }; | |
| } | ||
| if (!Array.isArray(content)) { | ||
| - | return { text: '', hasImage: false, hasToolResult: false, hasOnlyToolResult: false }; | |
| + | return { text: '', hasImage: false, hasToolResult: false, hasOnlyToolResult: false, toolResults: [] }; | |
| } | ||
| let text = ''; | ||
| - | let toolResults = 0; | |
| + | const toolResults = []; | |
| let others = 0; | ||
| let images = 0; | ||
| for (const block of content) { | ||
| @@ -277,7 +473,26 @@ function flattenUserContent(content) { | ||
| text += (text ? '\n' : '') + block.text; | ||
| others++; | ||
| } else if (block.type === 'tool_result') { | ||
| - | toolResults++; | |
| + | // Coerce tool_result content into a flat string. Claude Code shapes it | |
| + | // either as a string or as an array of {type:"text", text} blocks. | |
| + | const raw = block.content; | |
| + | let blockText = ''; | |
| + | if (typeof raw === 'string') blockText = raw; | |
| + | else if (Array.isArray(raw)) { | |
| + | for (const part of raw) { | |
| + | if (part && typeof part === 'object' && typeof part.text === 'string') { | |
| + | blockText += (blockText ? '\n' : '') + part.text; | |
| + | } else if (typeof part === 'string') { | |
| + | blockText += (blockText ? '\n' : '') + part; | |
| + | } | |
| + | } | |
| + | } | |
| + | toolResults.push({ | |
| + | toolUseId: typeof block.tool_use_id === 'string' ? block.tool_use_id : null, | |
| + | isError: block.is_error === true, | |
| + | content: blockText, | |
| + | contentType: typeof raw === 'string' ? 'string' : Array.isArray(raw) ? 'array' : 'other', | |
| + | }); | |
| } else if (block.type === 'image') { | ||
| images++; | ||
| } else { | ||
| @@ -287,8 +502,9 @@ function flattenUserContent(content) { | ||
| return { | ||
| text, | ||
| hasImage: images > 0, | ||
| - | hasToolResult: toolResults > 0, | |
| - | hasOnlyToolResult: toolResults > 0 && others === 0 && images === 0, | |
| + | hasToolResult: toolResults.length > 0, | |
| + | hasOnlyToolResult: toolResults.length > 0 && others === 0 && images === 0, | |
| + | toolResults, | |
| }; | ||
| } | ||
| @@ -380,7 +596,7 @@ export function parsePlainTranscript(text, label = 'pasted-transcript') { | ||
| gitBranch: null, | ||
| firstTs: null, | ||
| lastTs: null, | ||
| - | prompts: prompts.map((p) => ({ ...p, text: p.text.trim(), actions: [], thinking: 0 })), | |
| + | prompts: prompts.map((p) => ({ ...p, text: p.text.trim(), actions: [], thinking: 0, rejections: [] })), | |
| index: new Map(), | ||
| leafUuid: null, | ||
| activeLeafUuid: null, | ||
| @@ -393,6 +609,8 @@ export function parsePlainTranscript(text, label = 'pasted-transcript') { | ||
| inputTokens: 0, | ||
| outputTokens: 0, | ||
| interruptions: 0, | ||
| + | rejections: 0, | |
| + | rejectionsByKind: {}, | |
| }, | ||
| isContinuation: false, | ||
| }; |
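The first-match-wins ordering in `classifyToolResultRejection` can be checked against the fixture's `tool_result` strings. This minimal sketch copies `USER_DECLINED_TOOL_RE` and `PERMISSION_DENIED_RE` verbatim from the parser hunk. `classify` is a hypothetical trimmed-down stand-in that drops the `evidence` field, so it needs no `truncate`.

```javascript
// Copied from the parser hunk above (regexes unchanged).
const USER_DECLINED_TOOL_RE =
  /\bthe user (?:doesn'?t|does not|didn'?t|did not) want to proceed with this tool use\b|\bthe user (?:wants?|wanted) (?:you|me|the agent) to\b|\buser (?:rejected|declined|cancelled|canceled) (?:this|the) tool(?: use)?\b|\buser chose to reject\b/i;

const PERMISSION_DENIED_RE =
  /\bpermission denied\b|\boperation not permitted\b|\bEACCES\b|\bEPERM\b|\bcommand not found\b|\bOperation cancelled\b|\baccess is denied\b|\brequires? elevation\b/i;

// Hypothetical stand-in for classifyToolResultRejection; evidence omitted.
function classify(content) {
  const text = typeof content === 'string' ? content : '';
  if (!text) return { kind: 'tool_execution_error', confidence: 0.85 };
  // Most specific first: an explicit user decline beats any OS error text.
  if (USER_DECLINED_TOOL_RE.test(text)) return { kind: 'user_declined_tool', confidence: 1.0 };
  if (PERMISSION_DENIED_RE.test(text)) return { kind: 'permission_denied', confidence: 0.85 };
  return { kind: 'tool_execution_error', confidence: 0.9 };
}

const samples = [
  "The user doesn't want to proceed with this tool use.",                  // fixture toolu-0001
  "mkdir: cannot create directory '/root/.config/forbidden': File exists", // fixture toolu-0003
  'sudo: permission denied; user is not in the sudoers file.',             // fixture toolu-0004
  // Hypothetical: contains both signals, so first match wins and it is a decline.
  'User rejected this tool use: permission denied',
  '', // empty error body
];
for (const s of samples) console.log(classify(s).kind);
```

Neither regex carries the `g` flag, so repeated `.test` calls keep no `lastIndex` state between samples.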
| @@ -7,6 +7,7 @@ const RELATIONSHIP_BY_KIND = { | ||
| 'scope-change': 'expands', | ||
| checkpoint: 'checkpoints', | ||
| question: 'asks', | ||
| + | rejection: 'rejects', | |
| root: 'refines', | ||
| }; | ||
| @@ -32,6 +33,8 @@ export function renderJson(tree, opts = {}) { | ||
| scopeChanges: stats.scopeChanges, | ||
| checkpoints: stats.checkpoints, | ||
| abandonedBranches: stats.abandonedBranches, | ||
| + | rejections: stats.rejections || 0, | |
| + | rejectionsByKind: stats.rejectionsByKind || {}, | |
| toolUses: stats.toolUses, | ||
| filesTouched: stats.filesTouched, | ||
| models: stats.models, | ||
| @@ -69,6 +72,7 @@ export function renderJson(tree, opts = {}) { | ||
| failureSignals: n.failureSignals || [], | ||
| evalCandidate: Boolean(n.evalCandidate), | ||
| lessonIds: n.lessonIds || [], | ||
| + | rejections: n.rejections || [], | |
| sourceEventIds: n.uuid ? [n.uuid] : [], | ||
| })), |
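The per-node `rejections` field exported above is what a consumer flattens into the `--rejections` ledger, which the changelog describes as timestamp-sorted, joined back to node ids, and carrying a `byKind` summary. This is a sketch under that description. `buildRejectionLedger` is hypothetical. Its `{ summary: { total, byKind }, rejections }` shape is inferred from how report.js reads `renderRejectionsJson`, whose real implementation is not in this diff and may differ.

```javascript
// Hypothetical stand-in for the flattened ledger; not the real renderRejectionsJson.
function buildRejectionLedger(nodes) {
  const rejections = [];
  for (const node of nodes) {
    for (const r of node.rejections || []) {
      // Join each entry back to the node it was captured on.
      rejections.push({ nodeId: node.id ?? null, ...r });
    }
  }
  // Oldest first; entries without a timestamp sort last. ISO-8601 strings in
  // one format compare chronologically, and Array#sort is stable, so ties keep
  // their tree order.
  rejections.sort((a, b) => {
    if (a.ts === b.ts) return 0;
    if (!a.ts) return 1;
    if (!b.ts) return -1;
    return a.ts < b.ts ? -1 : 1;
  });
  const byKind = {};
  for (const r of rejections) byKind[r.kind] = (byKind[r.kind] || 0) + 1;
  return { summary: { total: rejections.length, byKind }, rejections };
}

const ledger = buildRejectionLedger([
  { id: 'n1', rejections: [{ kind: 'user_interrupt', ts: '2026-06-18T10:00:25.000Z' }] },
  { id: 'n2', rejections: [] },
  { id: 'n3', rejections: [
    { kind: 'model_refusal', ts: null },
    { kind: 'user_declined_tool', ts: '2026-06-18T10:00:10.000Z' },
  ] },
]);
console.log(ledger.rejections.map((r) => `${r.nodeId}:${r.kind}`).join(' '));
// n3:user_declined_tool n1:user_interrupt n3:model_refusal
```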
| @@ -1,4 +1,4 @@ | ||
| - | import { analyzeTree, latestByTime } from './analyze.js'; | |
| + | import { analyzeTree, latestByTime, renderRejectionsJson } from './analyze.js'; | |
| import { plural, truncate, escapeMd } from './util.js'; | ||
| import { REPO_URL } from './config.js'; | ||
| @@ -35,6 +35,14 @@ export function renderReportMarkdown(tree, opts = {}) { | ||
| ); | ||
| if (tree.stats.corrections) lines.push(`- Corrections: ${tree.stats.corrections}`); | ||
| if (tree.stats.abandonedBranches) lines.push(`- Abandoned branches: ${tree.stats.abandonedBranches}`); | ||
| + | if (tree.stats.rejections) { | |
| + | const byKind = tree.stats.rejectionsByKind || {}; | |
| + | const breakdown = Object.entries(byKind) | |
| + | .sort((a, b) => b[1] - a[1]) | |
| + | .map(([k, v]) => `${k.replace(/_/g, ' ')}: ${v}`) | |
| + | .join(', '); | |
| + | lines.push(`- Rejections: ${tree.stats.rejections}${breakdown ? ` (${breakdown})` : ''}`); | |
| + | } | |
| if (analysis.summary.models && analysis.summary.models.length) { | ||
| lines.push(`- Models seen: ${analysis.summary.models.join(', ')}`); | ||
| } | ||
| @@ -53,6 +61,7 @@ export function renderReportMarkdown(tree, opts = {}) { | ||
| lines.push('| `PROMPT_TREE.md` | prompt lineage + replay pack |'); | ||
| lines.push('| `.treetrace/tree.json` | canonical schema |'); | ||
| lines.push('| `.treetrace/failures.json` | labels + correction chains |'); | ||
| + | lines.push('| `.treetrace/rejections.json` | typed rejections/refusals/declines (v0.3) |'); | |
| lines.push('| `.treetrace/hallucinations.json` | unresolved references |'); | ||
| lines.push('| `.treetrace/lessons.md` | correction memory |'); | ||
| lines.push('| `.treetrace/evals.jsonl` | regression eval cases |'); | ||
| @@ -92,6 +101,31 @@ export function renderReportMarkdown(tree, opts = {}) { | ||
| lines.push(''); | ||
| } | ||
| + | const rejectionsView = renderRejectionsJson(tree, opts); | |
| + | if (rejectionsView.summary.total) { | |
| + | lines.push('## Rejections'); | |
| + | lines.push(''); | |
| + | lines.push('Typed rejection / refusal / decline events captured in the session. Each one is also surfaced as a failure signal of the mapped type.'); | |
| + | lines.push(''); | |
| + | const byKind = rejectionsView.summary.byKind || {}; | |
| + | const breakdown = Object.entries(byKind) | |
| + | .sort((a, b) => b[1] - a[1]) | |
| + | .map(([k, v]) => `${k.replace(/_/g, ' ')} (${v})`) | |
| + | .join(', '); | |
| + | lines.push(`- Total: ${rejectionsView.summary.total}${breakdown ? ` - ${breakdown}` : ''}`); | |
| + | lines.push(''); | |
| + | for (const r of rejectionsView.rejections.slice(0, 12)) { | |
| + | const nodeId = r.nodeId ? ` [${r.nodeId}]` : ''; | |
| + | const pct = `${Math.round((r.confidence || 0) * 100)}%`; | |
| + | const ev = r.evidence ? ` - ${escapeMd(truncate(r.evidence, 160))}` : ''; | |
| + | lines.push(`- (${r.kind}, ${pct})${nodeId}${ev}`); | |
| + | } | |
| + | if (rejectionsView.rejections.length > 12) { | |
| + | lines.push(`- ... ${rejectionsView.rejections.length - 12} more in .treetrace/rejections.json`); | |
| + | } | |
| + | lines.push(''); | |
| + | } | |
| + | ||
| lines.push('## Artifacts'); | ||
| lines.push(''); | ||
| lines.push('See: `PROMPT_TREE.md` · `.treetrace/lessons.md` · `.treetrace/agent-memory.md` · handoff: run `treetrace --handoff`'); |
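The stats line added to report.js compresses `byKind` into a count-descending breakdown, and the Rejections section renders each ledger entry as a bullet with a whole-percent confidence. This sketch reproduces that formatting in isolation. `formatBreakdown` and `formatRejectionLine` are hypothetical names, and the line formatter skips the `escapeMd` / `truncate` pass from util.js.

```javascript
// Breakdown string, as built for the "- Rejections:" stats line.
function formatBreakdown(byKind) {
  return Object.entries(byKind || {})
    .sort((a, b) => b[1] - a[1]) // most frequent kind first
    .map(([k, v]) => `${k.replace(/_/g, ' ')}: ${v}`)
    .join(', ');
}

// One ledger bullet, minus the escapeMd/truncate pass the real renderer applies.
function formatRejectionLine(r) {
  const nodeId = r.nodeId ? ` [${r.nodeId}]` : '';
  const pct = `${Math.round((r.confidence || 0) * 100)}%`;
  const ev = r.evidence ? ` - ${r.evidence}` : '';
  return `- (${r.kind}, ${pct})${nodeId}${ev}`;
}

console.log(`- Rejections: 3 (${formatBreakdown({ user_interrupt: 1, model_refusal: 2 })})`);
// - Rejections: 3 (model refusal: 2, user interrupt: 1)
console.log(formatRejectionLine({ kind: 'permission_denied', confidence: 0.85, nodeId: 'n4', evidence: 'sudo: permission denied' }));
// - (permission_denied, 85%) [n4] - sudo: permission denied
```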
| @@ -108,12 +108,20 @@ function computeStats(sessions, nodes) { | ||
| const filesTouched = new Set(); | ||
| let toolUses = 0; | ||
| let interruptions = 0; | ||
| + | let rejections = 0; | |
| + | const rejectionsByKind = Object.create(null); | |
| const timestamps = []; | ||
| for (const s of sessions) { | ||
| for (const m of s.stats.models) models.add(m); | ||
| for (const f of s.stats.filesTouched) filesTouched.add(f); | ||
| toolUses += s.stats.toolUses; | ||
| interruptions += s.stats.interruptions; | ||
| + | rejections += s.stats.rejections || 0; | |
| + | if (s.stats.rejectionsByKind) { | |
| + | for (const [k, v] of Object.entries(s.stats.rejectionsByKind)) { | |
| + | rejectionsByKind[k] = (rejectionsByKind[k] || 0) + v; | |
| + | } | |
| + | } | |
| if (s.firstTs) timestamps.push(s.firstTs); | ||
| if (s.lastTs) timestamps.push(s.lastTs); | ||
| } | ||
| @@ -127,6 +135,8 @@ function computeStats(sessions, nodes) { | ||
| corrections: byKind['correction'] || 0, | ||
| scopeChanges: byKind['scope-change'] || 0, | ||
| checkpoints: byKind['checkpoint'] || 0, | ||
| + | rejections, | |
| + | rejectionsByKind: { ...rejectionsByKind }, | |
| abandonedBranches: abandonedRoots.length, | ||
| nudges: nodes.reduce((acc, n) => acc + n.nudges, 0), | ||
| interruptions, |
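`computeStats` folds each session's rejection counters into tree-level totals. This standalone sketch shows that fold. `sumRejectionsByKind` is a hypothetical name, and the `{ stats: … }` session objects are trimmed to the two fields it reads.

```javascript
// Sum per-kind counters across sessions, tolerating sessions that carry no
// rejection counters at all (e.g. produced before v0.3).
function sumRejectionsByKind(sessions) {
  // Prototype-less accumulator: a kind string can never collide with an
  // inherited Object.prototype key such as "toString".
  const acc = Object.create(null);
  let total = 0;
  for (const s of sessions) {
    total += s.stats.rejections || 0;
    for (const [k, v] of Object.entries(s.stats.rejectionsByKind || {})) {
      acc[k] = (acc[k] || 0) + v;
    }
  }
  // Spread into a plain object (as computeStats does), so strict deep-equality
  // checks against ordinary object literals pass.
  return { rejections: total, rejectionsByKind: { ...acc } };
}

const stats = sumRejectionsByKind([
  { stats: { rejections: 2, rejectionsByKind: { user_interrupt: 1, model_refusal: 1 } } },
  { stats: {} }, // session with no rejection counters
  { stats: { rejections: 1, rejectionsByKind: { model_refusal: 1 } } },
]);
console.log(JSON.stringify(stats));
// {"rejections":3,"rejectionsByKind":{"user_interrupt":1,"model_refusal":2}}
```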
| @@ -0,0 +1,15 @@ | ||
| + | {"type":"user","message":{"role":"user","content":"Build a thing in this repo."},"uuid":"u-0001","parentUuid":null,"timestamp":"2026-06-18T10:00:00.000Z","sessionId":"rejections-fixture","cwd":"/tmp/rejections-fixture","gitBranch":"main","version":"2.0.0"} | |
| + | {"type":"assistant","message":{"id":"msg-0001","role":"assistant","model":"claude-3-opus","content":[{"type":"text","text":"I will run a shell command to inspect the repo."},{"type":"tool_use","id":"toolu-0001","name":"Bash","input":{"command":"ls -la /"}}],"stop_reason":"tool_use","usage":{"input_tokens":120,"output_tokens":40}},"uuid":"a-0001","parentUuid":"u-0001","timestamp":"2026-06-18T10:00:05.000Z","sessionId":"rejections-fixture"} | |
| + | {"type":"user","message":{"role":"user","content":[{"type":"tool_result","tool_use_id":"toolu-0001","content":"The user doesn't want to proceed with this tool use. The user wants you to answer a different question instead.","is_error":true}]},"uuid":"u-0002","parentUuid":"a-0001","timestamp":"2026-06-18T10:00:10.000Z","sessionId":"rejections-fixture"} | |
| + | {"type":"user","message":{"role":"user","content":"Use the Edit tool to add a README instead."},"uuid":"u-0003","parentUuid":"u-0002","timestamp":"2026-06-18T10:00:15.000Z","sessionId":"rejections-fixture"} | |
| + | {"type":"assistant","message":{"id":"msg-0002","role":"assistant","model":"claude-3-opus","content":[{"type":"text","text":"Sure, editing README now."},{"type":"tool_use","id":"toolu-0002","name":"Edit","input":{"file_path":"README.md","new_string":"# rejections-fixture"}}],"stop_reason":"tool_use","usage":{"input_tokens":150,"output_tokens":50}},"uuid":"a-0002","parentUuid":"u-0003","timestamp":"2026-06-18T10:00:20.000Z","sessionId":"rejections-fixture"} | |
| + | {"type":"user","message":{"role":"user","content":"[Request interrupted by user]"},"uuid":"u-0004","parentUuid":"a-0002","timestamp":"2026-06-18T10:00:25.000Z","sessionId":"rejections-fixture"} | |
| + | {"type":"user","message":{"role":"user","content":"Try writing a new file via the Write tool."},"uuid":"u-0005","parentUuid":"u-0004","timestamp":"2026-06-18T10:00:30.000Z","sessionId":"rejections-fixture"} | |
| + | {"type":"assistant","message":{"id":"msg-0003","role":"assistant","model":"claude-3-opus","content":[{"type":"text","text":"Running the write now."},{"type":"tool_use","id":"toolu-0003","name":"Bash","input":{"command":"mkdir -p /root/.config/forbidden"}}],"stop_reason":"tool_use","usage":{"input_tokens":170,"output_tokens":35}},"uuid":"a-0003","parentUuid":"u-0005","timestamp":"2026-06-18T10:00:35.000Z","sessionId":"rejections-fixture"} | |
| + | {"type":"user","message":{"role":"user","content":[{"type":"tool_result","tool_use_id":"toolu-0003","content":"mkdir: cannot create directory '/root/.config/forbidden': File exists","is_error":true}]},"uuid":"u-0006","parentUuid":"a-0003","timestamp":"2026-06-18T10:00:40.000Z","sessionId":"rejections-fixture"} | |
| + | {"type":"assistant","message":{"id":"msg-0004","role":"assistant","model":"claude-3-opus","content":[{"type":"text","text":"Let me try forcing it with sudo."},{"type":"tool_use","id":"toolu-0004","name":"Bash","input":{"command":"sudo rm -rf /root/.config/forbidden"}}],"stop_reason":"tool_use","usage":{"input_tokens":190,"output_tokens":45}},"uuid":"a-0004","parentUuid":"u-0006","timestamp":"2026-06-18T10:00:45.000Z","sessionId":"rejections-fixture"} | |
| + | {"type":"user","message":{"role":"user","content":[{"type":"tool_result","tool_use_id":"toolu-0004","content":"sudo: permission denied; user is not in the sudoers file. This incident will be reported.","is_error":true}]},"uuid":"u-0007","parentUuid":"a-0004","timestamp":"2026-06-18T10:00:50.000Z","sessionId":"rejections-fixture"} | |
| + | {"type":"user","message":{"role":"user","content":"stop, don't do that"},"uuid":"u-0008","parentUuid":"u-0007","timestamp":"2026-06-18T10:00:55.000Z","sessionId":"rejections-fixture"} | |
| + | {"type":"assistant","message":{"id":"msg-0005","role":"assistant","model":"claude-3-opus","content":[{"type":"text","text":"Understood."}],"stop_reason":"refusal","usage":{"input_tokens":200,"output_tokens":10}},"uuid":"a-0005","parentUuid":"u-0008","timestamp":"2026-06-18T10:01:00.000Z","sessionId":"rejections-fixture"} | |
| + | {"type":"user","message":{"role":"user","content":"Can you at least tell me what would have happened?"},"uuid":"u-0009","parentUuid":"a-0005","timestamp":"2026-06-18T10:01:05.000Z","sessionId":"rejections-fixture"} | |
| + | {"type":"assistant","message":{"id":"msg-0006","role":"assistant","model":"claude-3-opus","content":[{"type":"text","text":"I can't help with that request. It would require me to describe how to bypass filesystem permissions, which I am programmed not to do."}],"stop_reason":"end_turn","usage":{"input_tokens":220,"output_tokens":60}},"uuid":"a-0006","parentUuid":"u-0009","timestamp":"2026-06-18T10:01:10.000Z","sessionId":"rejections-fixture"} |
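The fixture's typed decline (`stop, don't do that`) is caught by a prefix match on the trimmed prompt. This minimal sketch copies `USER_TEXT_DECLINE_RE` and `looksLikeUserTextDecline` from the parser hunk and runs them over two fixture prompts plus three hypothetical ones. The last two illustrate edge cases of the pattern as written and are not documented behavior.

```javascript
// Copied from the parser hunk (pattern and 240-char cap unchanged).
const USER_TEXT_DECLINE_RE =
  /^(?:no(?:pe)?\s*[,.)]?\s+|stop\s*[,.)]?\s+|cancel\s*[,.)]?\s+|don'?t\s+|do not\s+|don'?t do (?:that|this|it)\b|stop (?:that|this|it|doing)\b|not that one\b|scratch that\b|nevermind\b|never mind\b)/i;

function looksLikeUserTextDecline(text) {
  const t = typeof text === 'string' ? text.trim() : '';
  if (!t || t.length > 240) return false;
  return USER_TEXT_DECLINE_RE.test(t);
}

const prompts = [
  "stop, don't do that",                        // fixture u-0008
  'Use the Edit tool to add a README instead.', // fixture u-0003
  'Now run the tests',            // "No" only counts when whitespace follows it
  'stop',                         // every stop branch needs trailing text
  "Don't forget the changelog",   // a leading "don't" matches regardless of intent
];
for (const p of prompts) console.log(looksLikeUserTextDecline(p), JSON.stringify(p));
```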
| @@ -17,6 +17,7 @@ import { renderReportMarkdown, renderTerminalSummary } from '../src/report.js'; | ||
| import { | ||
| analyzeTree, | ||
| renderFailuresJson, | ||
| + | renderRejectionsJson, | |
| renderLessonsMarkdown, | ||
| renderEvalsJsonl, | ||
| renderMemoryMarkdown, | ||
| @@ -343,7 +344,7 @@ test('renderers: markdown, json, handoff are consistent and footer-credited', as | ||
| assert.ok(md.includes('[treetrace]')); | ||
| const json = renderJson(tree, { projectName: 'demo' }); | ||
| - | assert.equal(json.schemaVersion, '0.2'); | |
| + | assert.equal(json.schemaVersion, '0.3'); | |
| assert.equal(json.nodes.length, tree.nodes.length); | ||
| assert.equal(json.edges.length, tree.nodes.filter((n) => n.parent).length); | ||
| assert.ok(json.nodes.every((n) => n.id && n.kind && typeof n.text === 'string')); | ||
| @@ -377,7 +378,7 @@ test('rendering: markdown footer stamps the tool version when provided', async ( | ||
| test('analysis renderers produce failures, lessons, evals, and memory', async () => { | ||
| const { tree } = await fixtureTree(); | ||
| const failures = renderFailuresJson(tree, { projectName: 'demo', generatedAt: '2026-01-01T00:00:00.000Z' }); | ||
| - | assert.equal(failures.schemaVersion, '0.2'); | |
| + | assert.equal(failures.schemaVersion, '0.3'); | |
| assert.ok(failures.failures.length >= 1); | ||
| assert.ok(failures.correctionChains.length >= 1); | ||
| @@ -685,7 +686,7 @@ test('cli: default run writes analysis artifacts with redaction', async () => { | ||
| assert.ok(existsSync(join(dir, file)), `${file} missing`); | ||
| } | ||
| const failures = JSON.parse(readFileSync(join(dir, '.treetrace/failures.json'), 'utf8')); | ||
| - | assert.equal(failures.schemaVersion, '0.2'); | |
| + | assert.equal(failures.schemaVersion, '0.3'); | |
| assert.ok(failures.failures.length >= 1); | ||
| const evalLine = readFileSync(join(dir, '.treetrace/evals.jsonl'), 'utf8').trim().split('\n')[0]; | ||
| @@ -1086,7 +1087,7 @@ test('mcp: initialize, tools/list, and tools/call return well-formed JSON-RPC', | ||
| const list = responses.find((r) => r.id === 2); | ||
| const names = list.result.tools.map((t) => t.name).sort(); | ||
| - | assert.deepEqual(names, ['eval_candidates', 'handoff', 'lessons', 'security_summary', 'tree']); | |
| + | assert.deepEqual(names, ['eval_candidates', 'handoff', 'lessons', 'rejections_summary', 'security_summary', 'tree']); | |
| const call = responses.find((r) => r.id === 3); | ||
| assert.ok(call.result && Array.isArray(call.result.content), 'tools/call must return content array'); | ||
| @@ -1940,3 +1941,306 @@ test('cli: --graph writes PROMPT_TREE_GRAPH.md with a mermaid flowchart', async | ||
| rmSync(dir, { recursive: true, force: true }); | ||
| } | ||
| }); | ||
| + | ||
| + | // --- v0.3 rejection / refusal / decline capture --- | |
| + | // Fixture: test/fixtures/claude-code-rejections.jsonl | |
| + | // All six rejection classes represented in one Claude Code JSONL session. | |
| + | ||
| + | const REJECTIONS_FIXTURE = join(dirname(fileURLToPath(import.meta.url)), 'fixtures', 'claude-code-rejections.jsonl'); | |
| + | ||
| + | async function loadRejectionsFixture() { | |
| + | return parseSessionFile(REJECTIONS_FIXTURE, { sessionId: 'rejections-fixture' }); | |
| + | } | |
| + | ||
| + | test('rejections: user_declined_tool captured from canonical tool_result text', async () => { | |
| + | const session = await loadRejectionsFixture(); | |
| + | const all = session.prompts.flatMap((p) => p.rejections || []); | |
| + | const declined = all.filter((r) => r.kind === 'user_declined_tool'); | |
| + | assert.equal(declined.length, 1, 'one user_declined_tool must be captured'); | |
| + | assert.equal(declined[0].source, 'tool_result'); | |
| + | assert.equal(declined[0].confidence, 1.0); | |
| + | assert.equal(declined[0].toolUseId, 'toolu-0001'); | |
| + | assert.ok(declined[0].evidence && declined[0].evidence.includes("doesn't want to proceed")); | |
| + | }); | |
| + | ||
| + | test('rejections: user_interrupt typed as a rejection AND counter still increments', async () => { | |
| + | const session = await loadRejectionsFixture(); | |
| + | assert.ok(session.stats.interruptions >= 1, 'interruption counter must still increment'); | |
| + | const interrupts = session.prompts.flatMap((p) => p.rejections || []).filter((r) => r.kind === 'user_interrupt'); | |
| + | assert.equal(interrupts.length, 1); | |
| + | assert.equal(interrupts[0].confidence, 1.0); | |
| + | assert.equal(interrupts[0].source, 'text'); | |
| + | }); | |
| + | ||
| + | test('rejections: tool_execution_error captured from is_error tool_result', async () => { | |
| + | const session = await loadRejectionsFixture(); | |
| + | const errs = session.prompts.flatMap((p) => p.rejections || []).filter((r) => r.kind === 'tool_execution_error'); | |
| + | assert.equal(errs.length, 1); | |
| + | assert.equal(errs[0].toolUseId, 'toolu-0003'); | |
| + | assert.ok(errs[0].evidence.includes('cannot create directory')); | |
| + | }); | |
| + | ||
| + | test('rejections: permission_denied captured from is_error tool_result with OS denial text', async () => { | |
| + | const session = await loadRejectionsFixture(); | |
| + | const denied = session.prompts.flatMap((p) => p.rejections || []).filter((r) => r.kind === 'permission_denied'); | |
| + | assert.equal(denied.length, 1); | |
| + | assert.equal(denied[0].toolUseId, 'toolu-0004'); | |
| + | assert.equal(denied[0].confidence, 0.85); | |
| + | assert.ok(/permission denied/i.test(denied[0].evidence)); | |
| + | }); | |
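The three tool_result tests above pin a precedence. The canonical decline text yields `user_declined_tool` at 1.0. An `is_error` result carrying OS denial text yields `permission_denied` at 0.85. Any other `is_error` result is a `tool_execution_error`. Below is a minimal sketch of that branching, using a hypothetical `classifyToolResult` helper that is not TreeTrace's parser. The regexes, the 200-character evidence cap, the `source` on the two error kinds, and the 1.0 confidence for `tool_execution_error` are all assumptions, because the tests do not pin them.

```javascript
// Hypothetical sketch: type one Claude Code tool_result block as a rejection.
// Confidences for declines (1.0) and permission denials (0.85) come from the tests;
// the regexes, evidence cap, and the 1.0 for tool_execution_error are assumptions.
const DECLINE_RE = /The user doesn't want to proceed with this tool use/;
const PERMISSION_RE = /permission denied|operation not permitted|EACCES/i;

function toolResultText(content) {
  // tool_result content may be a plain string or an array of content blocks.
  if (typeof content === 'string') return content;
  return (content || []).map((c) => (typeof c.text === 'string' ? c.text : '')).join('\n');
}

function classifyToolResult(block, ts = null) {
  if (!block || block.type !== 'tool_result') return null;
  const text = toolResultText(block.content);
  const base = { source: 'tool_result', toolUseId: block.tool_use_id, ts, evidence: text.slice(0, 200) };
  // Check the decline text first: the fixtures mark declines with is_error: true too,
  // so testing is_error first would misfile them as tool errors.
  if (DECLINE_RE.test(text)) return { kind: 'user_declined_tool', confidence: 1.0, ...base };
  if (!block.is_error) return null;
  if (PERMISSION_RE.test(text)) return { kind: 'permission_denied', confidence: 0.85, ...base };
  return { kind: 'tool_execution_error', confidence: 1.0, ...base };
}
```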
| + | ||
| + | test('rejections: model_refusal captured from stop_reason: "refusal" at 0.95 confidence', async () => { | |
| + | const session = await loadRejectionsFixture(); | |
| + | const stop = session.prompts.flatMap((p) => p.rejections || []).filter( | |
| + | (r) => r.kind === 'model_refusal' && r.source === 'stop_reason' | |
| + | ); | |
| + | assert.equal(stop.length, 1); | |
| + | assert.equal(stop[0].confidence, 0.95); | |
| + | }); | |
| + | ||
| + | test('rejections: model_refusal captured from text heuristic at 0.7 confidence', async () => { | |
| + | const session = await loadRejectionsFixture(); | |
| + | const text = session.prompts.flatMap((p) => p.rejections || []).filter( | |
| + | (r) => r.kind === 'model_refusal' && r.source === 'text_heuristic' | |
| + | ); | |
| + | assert.equal(text.length, 1); | |
| + | assert.equal(text[0].confidence, 0.7); | |
| + | assert.ok(/can'?t help/i.test(text[0].evidence)); | |
| + | }); | |
| + | ||
| + | test('rejections: user_text_decline captured when prompt opens with "stop, don\'t do that"', async () => { | |
| + | const session = await loadRejectionsFixture(); | |
| + | const declines = session.prompts.flatMap((p) => p.rejections || []).filter((r) => r.kind === 'user_text_decline'); | |
| + | assert.equal(declines.length, 1); | |
| + | assert.equal(declines[0].confidence, 0.8); | |
| + | // The decline prompt must still flow through as a real prompt with text preserved. | |
| + | const declinePrompt = session.prompts.find((p) => (p.rejections || []).some((r) => r.kind === 'user_text_decline')); | |
| + | assert.ok(declinePrompt, 'decline prompt must exist in session.prompts'); | |
| + | assert.ok(/stop, don'?t do that/i.test(declinePrompt.text), 'text is preserved on the prompt'); | |
| + | }); | |
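The v0.8.0 changelog describes detection patterns as named, individually testable regex pieces that are composed at load time. The sketch below applies that approach to the two text heuristics tested above: `user_text_decline` at 0.8 and text-heuristic `model_refusal` at 0.7. It also covers the `stop_reason: "refusal"` path at 0.95. The piece names and patterns are hypothetical. So is `source: 'text'` on the decline, because that test leaves the source unpinned.

```javascript
// Hypothetical named pieces; each is testable on its own, and compose() joins them.
const DECLINE_PIECES = {
  stopDontDoThat: /^\s*stop[,.!]?\s+don'?t\s+do\s+that/i,
  noDont: /^\s*no[,.!]?\s+don'?t\b/i,
};
const REFUSAL_PIECES = {
  cantHelp: /\bI\s+can'?t\s+help\s+with\b/i,
  wontAssist: /\bI\s+won'?t\s+be\s+able\s+to\s+assist\b/i,
};

// Compose at load time: wrap each piece's source in a non-capturing group and alternate.
function compose(pieces, flags = 'i') {
  return new RegExp(Object.values(pieces).map((re) => `(?:${re.source})`).join('|'), flags);
}

const USER_DECLINE_RE = compose(DECLINE_PIECES);
const MODEL_REFUSAL_RE = compose(REFUSAL_PIECES);

function detectTextRejections({ role, text = '', stopReason = null }) {
  if (role === 'user' && USER_DECLINE_RE.test(text)) {
    return [{ kind: 'user_text_decline', source: 'text', confidence: 0.8 }];
  }
  if (role === 'assistant') {
    // Prefer the structured signal; use the text heuristic only when it is absent.
    if (stopReason === 'refusal') return [{ kind: 'model_refusal', source: 'stop_reason', confidence: 0.95 }];
    if (MODEL_REFUSAL_RE.test(text)) return [{ kind: 'model_refusal', source: 'text_heuristic', confidence: 0.7 }];
  }
  return [];
}
```

Returning only the `stop_reason` hit for a refused turn keeps one turn from counting twice. That is a design choice of this sketch.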
| + | ||
| + | test('rejections: session.stats.rejections count and rejectionsByKind breakdown are populated', async () => { | |
| + | const session = await loadRejectionsFixture(); | |
| + | const expectedKinds = { | |
| + | user_declined_tool: 1, | |
| + | user_interrupt: 1, | |
| + | tool_execution_error: 1, | |
| + | permission_denied: 1, | |
| + | model_refusal: 2, // stop_reason + text_heuristic | |
| + | user_text_decline: 1, | |
| + | }; | |
| + | const expectedTotal = Object.values(expectedKinds).reduce((a, b) => a + b, 0); | |
| + | assert.equal(session.stats.rejections, expectedTotal, 'session.stats.rejections counts every captured rejection'); | |
| + | assert.deepEqual(session.stats.rejectionsByKind, expectedKinds); | |
| + | }); | |
| + | ||
| + | test('rejections: rejection-only synthetic prompt is created when a tool_result rejection arrives with no current text prompt', async () => { | |
| + | // Write a fresh session whose very first record is a tool_result rejection. | |
| + | // parse.js must synthesize a rejection-only prompt (text:'', isRejectionOnly:true) | |
| + | // so the signal is never silently lost. This mirrors the case where the user | |
| + | // opens the agent and immediately rejects a proposed action. | |
| + | const { parseSessionFile: parse } = await import('../src/parse.js'); | |
| + | const tmp = mkdtempSync(join(tmpdir(), 'rej-synth-')); | |
| + | const path = join(tmp, 'synth.jsonl'); | |
| + | writeFileSync( | |
| + | path, | |
| + | JSON.stringify({ | |
| + | type: 'user', | |
| + | message: { role: 'user', content: [{ type: 'tool_result', tool_use_id: 'toolu-x', content: "The user doesn't want to proceed with this tool use. The user wants you to do something else.", is_error: true }] }, | |
| + | uuid: 'u-synth-1', | |
| + | parentUuid: null, | |
| + | timestamp: '2026-06-18T11:00:00.000Z', | |
| + | sessionId: 'synth', | |
| + | }) + '\n' | |
| + | ); | |
| + | try { | |
| + | const s = await parse(path, { sessionId: 'synth' }); | |
| + | const synth = s.prompts.find((p) => p.isRejectionOnly); | |
| + | assert.ok(synth, 'a synthetic rejection-only prompt must be created'); | |
| + | assert.equal(synth.text, ''); | |
| + | assert.equal(synth.rejections.length, 1); | |
| + | assert.equal(synth.rejections[0].kind, 'user_declined_tool'); | |
| + | } finally { | |
| + | rmSync(tmp, { recursive: true, force: true }); | |
| + | } | |
| + | }); | |
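The rule under test is that a rejection is never dropped just because there is no prompt to attach it to. Here is a minimal sketch of that rule. It assumes a hypothetical parser state of `{ current, prompts }`, a hypothetical `attachRejection` helper, and the prompt field names used elsewhere in these tests.

```javascript
// Hypothetical sketch: attach a rejection to the current prompt, or synthesize a
// rejection-only prompt (text: '', isRejectionOnly: true) when there is none yet.
function attachRejection(state, rejection, record) {
  if (!state.current) {
    state.current = {
      uuid: record.uuid,
      parentUuid: record.parentUuid ?? null,
      ts: record.timestamp ?? null,
      text: '',
      isRejectionOnly: true,
      actions: [],
      rejections: [],
    };
    state.prompts.push(state.current);
  }
  state.current.rejections.push(rejection);
  return state.current;
}
```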
| + | ||
| + | test('rejections: rejection-only synthetic prompts get kind:"rejection" downstream', async () => { | |
| + | const { parseSessionFile: parse } = await import('../src/parse.js'); | |
| + | const tmp = mkdtempSync(join(tmpdir(), 'rej-kind-')); | |
| + | const path = join(tmp, 'k.jsonl'); | |
| + | writeFileSync( | |
| + | path, | |
| + | JSON.stringify({ | |
| + | type: 'user', | |
| + | message: { role: 'user', content: [{ type: 'tool_result', tool_use_id: 'toolu-y', content: "The user doesn't want to proceed with this tool use.", is_error: true }] }, | |
| + | uuid: 'u-kind-1', | |
| + | parentUuid: null, | |
| + | timestamp: '2026-06-18T12:00:00.000Z', | |
| + | sessionId: 'kindsession', | |
| + | }) + '\n' | |
| + | ); | |
| + | try { | |
| + | const session = await parse(path, { sessionId: 'kindsession' }); | |
| + | const nodes = classifyPrompts([session]); | |
| + | assert.equal(nodes.length, 1); | |
| + | assert.equal(nodes[0].kind, 'rejection', 'synthetic rejection-only node gets kind:"rejection", not root'); | |
| + | assert.ok(nodes[0].title && /rejected/i.test(nodes[0].title), 'title describes the rejection'); | |
| + | assert.equal(nodes[0].rejections.length, 1); | |
| + | } finally { | |
| + | rmSync(tmp, { recursive: true, force: true }); | |
| + | } | |
| + | }); | |
| + | ||
| + | test('rejections: each rejection becomes a failure signal of the mapped type', async () => { | |
| + | const session = await loadRejectionsFixture(); | |
| + | const nodes = classifyPrompts([session]); | |
| + | const tree = buildTree([session], nodes); | |
| + | analyzeTree(tree); | |
| + | const types = new Set(tree.analysis.failures.map((f) => f.type)); | |
| + | assert.ok(types.has('user_rejected_action'), 'user_declined_tool/user_interrupt/user_text_decline -> user_rejected_action'); | |
| + | assert.ok(types.has('tool_execution_failed'), 'tool_execution_error -> tool_execution_failed'); | |
| + | assert.ok(types.has('permission_denied'), 'permission_denied -> permission_denied'); | |
| + | assert.ok(types.has('model_refused'), 'model_refusal -> model_refused'); | |
| + | // The fixture has two model_refusal rejections. Failures are deduplicated by failureNode id, | |
| + | // so assert only that at least one model_refused failure exists, not an exact count. | |
| + | const refusedCount = tree.analysis.failures.filter((f) => f.type === 'model_refused').length; | |
| + | assert.ok(refusedCount >= 1, 'model_refused failure signal is present'); | |
| + | }); | |
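The assertions above fix the mapping from rejection kind to failure type. The sketch below implements that lookup together with a hypothetical `rejectionFailures` pass. The pass deduplicates on a `type:nodeId` key, which is an assumption because TreeTrace's exact dedup key is not shown in this diff. Tracking seen keys in a `Set` keeps the pass O(N*R) over nodes and rejections, and that linear bound is the property the O(N) test later in this file guards.

```javascript
// Kind -> failure-type mapping, as asserted by the test above.
const FAILURE_TYPE_BY_KIND = new Map([
  ['user_declined_tool', 'user_rejected_action'],
  ['user_interrupt', 'user_rejected_action'],
  ['user_text_decline', 'user_rejected_action'],
  ['tool_execution_error', 'tool_execution_failed'],
  ['permission_denied', 'permission_denied'],
  ['model_refusal', 'model_refused'],
]);

// Hypothetical surfacing pass: one failure per (type, node), found in O(N*R).
function rejectionFailures(nodes) {
  const seen = new Set(); // `${type}:${nodeId}` keys; O(1) membership instead of rescanning failures
  const failures = [];
  for (const node of nodes) {
    for (const r of node.rejections || []) {
      const type = FAILURE_TYPE_BY_KIND.get(r.kind);
      if (!type) continue; // unknown kinds are ignored in this sketch
      const key = `${type}:${node.id}`;
      if (seen.has(key)) continue;
      seen.add(key);
      failures.push({ type, nodeId: node.id, confidence: r.confidence });
    }
  }
  return failures;
}
```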
| + | ||
| + | test('rejections: lessons and eval candidates are generated for rejection-derived failures', async () => { | |
| + | const session = await loadRejectionsFixture(); | |
| + | const nodes = classifyPrompts([session]); | |
| + | const tree = buildTree([session], nodes); | |
| + | analyzeTree(tree); | |
| + | const lessonTitles = new Set(tree.analysis.lessons.map((l) => l.title)); | |
| + | assert.ok(lessonTitles.has('Confirm proposed actions before executing'), 'user_rejected_action lesson is generated'); | |
| + | assert.ok(lessonTitles.has('Rephrase refused requests instead of repeating them'), 'model_refused lesson is generated'); | |
| + | const evalTypes = new Set(tree.analysis.evalCandidates.map((e) => e.type)); | |
| + | assert.ok(evalTypes.has('tool_permission_regression'), 'tool_permission_regression eval is generated'); | |
| + | assert.ok(evalTypes.has('refusal_handling'), 'refusal_handling eval is generated'); | |
| + | }); | |
| + | ||
| + | test('rejections: renderRejectionsJson returns a flattened, sorted, byKind-summarized view', async () => { | |
| + | const session = await loadRejectionsFixture(); | |
| + | const nodes = classifyPrompts([session]); | |
| + | const tree = buildTree([session], nodes); | |
| + | const view = renderRejectionsJson(tree, { projectName: 'rejections-fixture' }); | |
| + | assert.equal(view.schemaVersion, '0.3'); | |
| + | assert.equal(view.summary.total, 7); | |
| + | assert.equal(view.summary.byKind.model_refusal, 2); | |
| + | assert.equal(view.summary.byKind.user_declined_tool, 1); | |
| + | assert.ok(Array.isArray(view.rejections)); | |
| + | assert.equal(view.rejections.length, 7); | |
| + | // Every entry has a nodeId pointing back into the tree. | |
| + | assert.ok(view.rejections.every((r) => typeof r.nodeId === 'string')); | |
| + | // Sorted by ts ascending. | |
| + | const ts = view.rejections.map((r) => Date.parse(r.ts)).filter(Number.isFinite); | |
| + | const sorted = [...ts].sort((a, b) => a - b); | |
| + | assert.deepEqual(ts, sorted); | |
| + | }); | |
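This test checks three properties of the view: it is a flat list, every entry carries a `nodeId`, and entries are in ascending `ts` order. It also checks the `byKind` tally. All of that can be sketched as a single pass followed by a sort. The sketch assumes nodes shaped `{ id, rejections }` and uses a hypothetical `renderRejectionsLedger` name; it is not TreeTrace's `renderRejectionsJson`. Entries with missing or unparseable timestamps go last. That is one reasonable choice, because the test only checks the order of finite timestamps.

```javascript
// Hypothetical sketch of a flattened, ts-sorted rejection ledger with a byKind summary.
function renderRejectionsLedger(nodes, schemaVersion = '0.3') {
  const rejections = [];
  const byKind = {};
  for (const node of nodes) {
    for (const r of node.rejections || []) {
      rejections.push({ ...r, nodeId: node.id }); // join back to the source node
      byKind[r.kind] = (byKind[r.kind] || 0) + 1;
    }
  }
  // Missing or unparseable timestamps sort last; Array.prototype.sort is stable,
  // so ties keep tree order.
  const key = (r) => {
    const t = Date.parse(r.ts);
    return Number.isFinite(t) ? t : Infinity;
  };
  rejections.sort((a, b) => {
    const ka = key(a);
    const kb = key(b);
    return ka === kb ? 0 : ka < kb ? -1 : 1;
  });
  return { schemaVersion, summary: { total: rejections.length, byKind }, rejections };
}
```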
| + | ||
| + | test('rejections: O(N) preserved - the rejection surfacing pass does not regress to quadratic scaling', async () => { | |
| + | // Build a synthetic tree with N nodes, each carrying R rejections. If the | |
| + | // surfacing pass is O(N*R), it stays under the 15s threshold asserted below | |
| + | // even at N=5000; a quadratic regression would blow far past it. | |
| + | const N = 5000; | |
| + | const R = 3; | |
| + | const session = { | |
| + | sessionId: 'perf', | |
| + | prompts: [], | |
| + | firstTs: null, | |
| + | lastTs: null, | |
| + | stats: { models: [], filesTouched: [], rejections: 0, rejectionsByKind: {}, interruptions: 0 }, | |
| + | }; | |
| + | for (let i = 0; i < N; i++) { | |
| + | const rejections = []; | |
| + | for (let j = 0; j < R; j++) { | |
| + | rejections.push({ kind: 'user_declined_tool', source: 'tool_result', confidence: 1.0, toolUseId: `t-${i}-${j}`, tool: null, ts: null, evidence: `evidence ${i}-${j}` }); | |
| + | } | |
| + | session.prompts.push({ | |
| + | uuid: `p-${i}`, | |
| + | parentUuid: i === 0 ? null : `p-${i - 1}`, | |
| + | ts: new Date(i * 1000).toISOString(), | |
| + | text: `prompt ${i}`, | |
| + | hasImage: false, | |
| + | hadToolResultContext: false, | |
| + | afterInterruption: false, | |
| + | actions: [], | |
| + | thinking: 0, | |
| + | rejections, | |
| + | }); | |
| + | } | |
| + | const start = Date.now(); | |
| + | const nodes = classifyPrompts([session]); | |
| + | const tree = buildTree([session], nodes); | |
| + | analyzeTree(tree); | |
| + | const elapsed = Date.now() - start; | |
| + | // Threshold rationale: a quadratic regression at this scale would take | |
| + | // hours (5000x slower than linear). 15s is well above realistic linear cost | |
| + | // (~0.7ms per addFailure) and well below the quadratic danger zone. | |
| + | assert.ok(elapsed < 15000, `analyzeTree on ${N} nodes x ${R} rejections must complete in under 15s (got ${elapsed}ms)`); | |
| + | // Spot-check that rejections actually surfaced. | |
| + | assert.ok(tree.analysis.failures.length >= N, 'every node produced at least one failure signal'); | |
| + | }); | |
| + | ||
| + | test('rejections: redaction gate at the CLI layer catches secrets in rejection evidence', async () => { | |
| + | // Rejection evidence can carry anything the user or shell returned, including | |
| + | // a leaked secret. parse.js captures the evidence verbatim (truncated), and | |
| + | // the renderer does not redact. The CLI's redaction gate (applyDecisions + | |
| + | // shadow scan) must catch it before .treetrace/rejections.json is written. | |
| + | const tmp = mkdtempSync(join(tmpdir(), 'rej-redact-')); | |
| + | const path = join(tmp, 'r.jsonl'); | |
| + | writeFileSync( | |
| + | path, | |
| + | JSON.stringify({ | |
| + | type: 'user', | |
| + | message: { role: 'user', content: [{ type: 'tool_result', tool_use_id: 'toolu-s', content: "The user doesn't want to proceed with this tool use. The value was sk-ant-api03-FAKEFAKEFAKEFAKEFAKEFAKE1234.", is_error: true }] }, | |
| + | uuid: 'u-r-1', | |
| + | parentUuid: null, | |
| + | timestamp: '2026-06-18T13:00:00.000Z', | |
| + | sessionId: 'redact', | |
| + | }) + '\n' | |
| + | ); | |
| + | const dir = mkdtempSync(join(tmpdir(), 'rej-redact-out-')); | |
| + | try { | |
| + | await main(['--file', path, '--dir', dir, '--rejections', '--redact-auto', '--quiet']); | |
| + | const out = readFileSync(join(dir, '.treetrace', 'rejections.json'), 'utf8'); | |
| + | assert.ok(!out.includes('sk-ant-api03-FAKEFAKEFAKEFAKEFAKEFAKE1234'), 'raw secret must not appear in the written rejections.json'); | |
| + | assert.ok(out.includes('[REDACTED'), 'a redacted placeholder must appear in its place'); | |
| + | } finally { | |
| + | rmSync(tmp, { recursive: true, force: true }); | |
| + | rmSync(dir, { recursive: true, force: true }); | |
| + | } | |
| + | }); | |
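The gate this test exercises has two layers. First, string fields are redacted on the way out. Second, the final serialized bytes are scanned again, and the write is refused if any detector still fires. The sketch below shows that pattern with a single hypothetical detector for `sk-ant-` keys and a hypothetical `[REDACTED:<name>]` placeholder. This diff does not show TreeTrace's real detector set or placeholder format; it only shows the `[REDACTED` prefix that the test checks.

```javascript
// Hypothetical detector list; a real gate would carry many more secret formats.
const SECRET_PATTERNS = [
  { name: 'anthropic_api_key', re: /sk-ant-[A-Za-z0-9_-]{20,}/g },
];

function redact(text) {
  let out = text;
  for (const { name, re } of SECRET_PATTERNS) out = out.replace(re, `[REDACTED:${name}]`);
  return out;
}

// Shadow scan: run every detector again over the exact bytes about to be written.
function shadowScan(serialized) {
  for (const { name, re } of SECRET_PATTERNS) {
    re.lastIndex = 0; // global regexes keep state across .test() calls
    if (re.test(serialized)) throw new Error(`shadow scan: unredacted ${name} in output`);
  }
  return serialized;
}

function serializeRedacted(value) {
  // The replacer redacts every string value (not object keys) during serialization.
  return shadowScan(JSON.stringify(value, (_key, v) => (typeof v === 'string' ? redact(v) : v), 2));
}
```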
| + | ||
| + | test('rejections: cli --rejections writes .treetrace/rejections.json and prints to stdout', async () => { | |
| + | const dir = mkdtempSync(join(tmpdir(), 'treetrace-rej-cli-')); | |
| + | try { | |
| + | await main(['--file', REJECTIONS_FIXTURE, '--dir', dir, '--rejections', '--redact-auto', '--quiet']); | |
| + | const p = join(dir, '.treetrace', 'rejections.json'); | |
| + | assert.ok(existsSync(p), '.treetrace/rejections.json must be written'); | |
| + | const text = readFileSync(p, 'utf8'); | |
| + | const parsed = JSON.parse(text); | |
| + | assert.equal(parsed.schemaVersion, '0.3'); | |
| + | assert.equal(parsed.summary.total, 7); | |
| + | assert.equal(parsed.summary.byKind.model_refusal, 2); | |
| + | } finally { | |
| + | rmSync(dir, { recursive: true, force: true }); | |
| + | } | |
| + | }); | |
| + | ||
| + | test('rejections: --from claude works as an explicit --from value (Phase 0 false-advertising fix)', async () => { | |
| + | // The TOOLS array has always advertised 'claude', but the adapter switch never | |
| + | // handled it explicitly. ingestFile routes --from claude through parseSessionFile; | |
| + | // this end-to-end check confirms main() accepts the value and runs to completion. | |
| + | const dir = mkdtempSync(join(tmpdir(), 'treetrace-claude-from-')); | |
| + | try { | |
| + | await main(['--from', 'claude', '--file', REJECTIONS_FIXTURE, '--dir', dir, '--json', '--redact-auto', '--quiet']); | |
| + | // No assertion on stdout: success means no USAGE error. If --from claude | |
| + | // were rejected (as it would be for unknown --from values) main() would | |
| + | // throw with ExitCode.USAGE before reaching this line. | |
| + | } finally { | |
| + | rmSync(dir, { recursive: true, force: true }); | |
| + | } | |
| + | }); |
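The comment in this last test traces the bug to an advertised `--from` value that had no handler. One way to make that class of mismatch fail fast is to dispatch through a registry, then check at startup or in a test that every advertised name resolves. The sketch below uses hypothetical `makeResolver` and `assertToolsCovered` helpers. Its `code: 'USAGE'` tag stands in for TreeTrace's `ExitCode.USAGE`, whose value is not shown here.

```javascript
// Hypothetical registry-based dispatch for --from values (adapters: Map<string, Function>).
function makeResolver(adapters) {
  return function resolveAdapter(from) {
    const adapter = adapters.get(from);
    if (typeof adapter !== 'function') {
      const err = new Error(`unknown --from value "${from}"; expected one of: ${[...adapters.keys()].join(', ')}`);
      err.code = 'USAGE'; // stand-in for a usage exit code
      throw err;
    }
    return adapter;
  };
}

// Fails when a tool is advertised (e.g. in a TOOLS list) without a registered handler.
function assertToolsCovered(tools, adapters) {
  const missing = tools.filter((t) => typeof adapters.get(t) !== 'function');
  if (missing.length > 0) throw new Error(`advertised without an adapter: ${missing.join(', ')}`);
}
```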