| @@ -15,7 +15,7 @@ jobs: | ||
| - uses: actions/setup-node@v4 | ||
| with: | ||
| node-version: ${{ matrix.node }} | ||
| - | - run: node --test test/treetrace.test.js | |
| + | - run: node --test test/treetrace.test.js test/adapters.test.js | |
| - name: CLI smoke test (fixture, fail-closed redaction) | ||
| run: | | ||
| node bin/treetrace.js --file test/fixtures/synthetic-session.jsonl --dir "$RUNNER_TEMP" --redact-auto --quiet |
| @@ -4,11 +4,27 @@ Notable changes to TreeTrace. The format follows Keep a Changelog, and the proje | ||
| ## Unreleased | ||
| + | ## 0.5.0 - 2026-06-13 | |
| + | ||
| ### Added | ||
| - `--security` focused report mode. Prints a security-focused report that leads with concrete failure classes and answers five questions from the existing analysis: whether the agent touched auth, secrets, access control, crypto, dependency config, CI, deployment, or tests; whether it disabled or skipped tests; whether it ran risky shell commands; whether it referenced files, paths, imports, or packages that do not exist; and which human correction should become a future eval or memory item. It reuses the same signals as the full analysis and does not run a separate scanner. The report prints to stdout and writes `.treetrace/hallucinations.json`, both gated through the redaction shadow scan. | ||
| - | - Deterministic hallucination detector. TreeTrace runs inside the repository, so it extracts the files, paths, imports, and packages the agent referenced in prompts and captured actions, then verifies them against the real working tree and `package.json`, `package-lock.json`, and Python manifests. References that do not resolve are flagged as likely hallucinations in two categories, `hallucinated_file_or_path` and `hallucinated_import_or_package`, and surfaced both in the security report and in `.treetrace/hallucinations.json` (mirroring the `failures.json` shape). Each one carries an eval candidate. File and path existence and import and package declaration are checked; per-symbol and per-API resolution inside a module is not attempted, and the tool says so. Files the agent created during the session, relative paths, Node builtins, and Python standard library modules are excluded to avoid false positives. | |
| - | - Read-only MCP server. `treetrace mcp` (or `treetrace --mcp`) starts a Model Context Protocol server over stdio using JSON-RPC 2.0, hand-rolled with no dependencies. It implements `initialize`, `tools/list`, and `tools/call`, and exposes four read-only tools that reuse existing functionality: `handoff`, `lessons`, `security_summary`, and `eval_candidates`. No tool mutates files, runs shell, hits the network, or requires authentication. Every returned text passes the same redaction shadow scan as the file exports. | |
| + | - Deterministic hallucination detector. TreeTrace runs inside the repository, so it extracts the files, paths, imports, and packages the agent referenced in prompts and captured actions, then verifies them against the real working tree and `package.json`, `package-lock.json`, and Python manifests. References that do not resolve are flagged as likely hallucinations in two categories, `hallucinated_file_or_path` and `hallucinated_import_or_package`, and surfaced both in the security report and in `.treetrace/hallucinations.json` (mirroring the `failures.json` shape). Each one carries an eval candidate. File and path existence and import and package declaration are checked; per-symbol and per-API resolution inside a module is not attempted, and the tool says so. Files the agent created during the session, Node builtins, and Python standard library modules are excluded to avoid false positives. Relative paths inside the project (`./` and bare) are resolved and verified; absolute paths and `../` references that fall outside the project directory are treated as out of scope and are never stat checked, so detection never reveals host filesystem state outside the project. | |
| + | - Read-only MCP server. `treetrace mcp` (or `treetrace --mcp`) starts a Model Context Protocol server over stdio using JSON-RPC 2.0, hand-rolled with no dependencies. It implements `initialize`, `tools/list`, and `tools/call`, and exposes four read-only tools that reuse existing functionality: `handoff`, `lessons`, `security_summary`, and `eval_candidates`. No tool mutates files, runs shell, hits the network, or requires authentication. Every returned text passes the same redaction shadow scan as the file exports. Tools take no arguments and reject extra arguments; point the server at a project with `--dir` or import with `--file` (the JSON-RPC transport owns stdin, so `--stdin` is not available in MCP mode). | |
| + | ||
| + | ### Security | |
| + | ||
| + | - Redaction now catches generic secret assignments written in JSON style (`"api_key":"..."`), single-quoted keys, backtick values, and multiline quoted values, and treats a generic secret-key assignment as a finding even when the value has low entropy. Over-redaction is the safe side for a privacy tool. These shapes previously reached written artifacts even under `--redact-auto`. | |
| + | - A prior `keep` decision in `.treetrace/redactions.json` is no longer honored for high or medium findings under `--redact-auto`, non-interactive (non-TTY) runs, or the MCP server. A `keep` is only honored inside an interactive terminal session, so a preseeded redactions file in an untrusted repository can no longer cause a raw secret to be emitted. | |
| + | - The hallucination detector and MCP `security_summary` no longer stat absolute paths or `../` references outside the project directory, removing a filesystem existence oracle. | |
| + | - Claude session auto-discovery validates each session's recorded `cwd` against the target directory, so a different project whose path munges to the same storage directory name is no longer read. | |
| + | ||
| + | ### Fixed | |
| + | ||
| + | - The hallucination detector flags an `Edit` or `NotebookEdit` to a file that does not exist in the working tree (only `Write`, or an edit to a file that exists, counts as created), and resolves relative (`./`, bare) missing paths that were previously skipped. | |
| + | - Risky-command detection covers `rm -fr`, `rm -r -f`, `chmod -R 777`, `chmod 0777`, `curl | sudo bash`, `curl | zsh`, `bash <(curl ...)`, `DROP SCHEMA`, and bare `TRUNCATE`. Test-disable detection covers `test.skip`, `describe.skip`, `it.skip`, `xit`, and similar framework skip and removal idioms. | |
| + | - Value-taking options (`--from`, `--dir`, `--out`, `--report-file`, `--since`) reject a missing value or a value that begins with `--`, so a typo no longer writes a file named after a flag. `--since` requires a real date and applies only to timestamped sessions. `--stdin --from claude` is rejected with a clear message. | |
| + | - `--handoff` persists redaction decisions to `.treetrace/redactions.json` when any were made. | |
| ## 0.4.1 - 2026-06-13 | ||
| @@ -185,7 +185,7 @@ This is honest about its limits. File, path, import, and package existence are s | ||
| - `security_summary` - evidence-backed security-sensitive touches | ||
| - `eval_candidates` - compact regression cases | ||
| - | No tool mutates files, runs shell, reaches the network, or requires authentication. Every returned text passes the same redaction shadow scan as the file exports. Point it at a project with `--dir`, or import a transcript with `--file` or `--stdin`, exactly like a normal run. | |
| + | No tool mutates files, runs shell, reaches the network, or requires authentication. Every returned text passes the same redaction shadow scan as the file exports. Point it at a project with `--dir`, or import a transcript with `--file`. The MCP server uses stdin for its JSON-RPC transport, so `--stdin` transcript paste is not available in MCP mode; use `--file` instead. | |
| ## Redaction gate | ||
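For reference, a client talking to the MCP server described in the README hunk above might frame its requests as newline-delimited JSON-RPC 2.0 messages written to the server's stdin. The sketch below only builds those lines. The `frame` helper, the request ids, and the `clientInfo` values are placeholders, and the `initialize` parameter shape follows the public MCP specification rather than anything shown in this diff.

```javascript
// Sketch: build the newline-delimited JSON-RPC 2.0 messages a client could
// write to the stdin of `treetrace mcp --dir <project>`.
const PROTOCOL_VERSION = '2024-11-05'; // same value the server declares

// Hypothetical helper: one JSON document per line, since the server reads
// its input line by line.
function frame(message) {
  return JSON.stringify({ jsonrpc: '2.0', ...message }) + '\n';
}

const requests = [
  frame({
    id: 1,
    method: 'initialize',
    params: {
      protocolVersion: PROTOCOL_VERSION,
      capabilities: {},
      clientInfo: { name: 'example-client', version: '0.0.0' }, // placeholder values
    },
  }),
  frame({ id: 2, method: 'tools/list' }),
  // Tools take no arguments, so `arguments` is left empty here.
  frame({ id: 3, method: 'tools/call', params: { name: 'handoff', arguments: {} } }),
];

console.log(requests.join(''));
```

Because stdin carries these messages, a transcript has to come from `--file` or from session discovery under `--dir`, which is the constraint the README change documents.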
| @@ -1,6 +1,6 @@ | ||
| { | ||
| "name": "treetrace", | ||
| - | "version": "0.4.1", | |
| + | "version": "0.5.0", | |
| "description": "Turn AI coding sessions into regression-ready prompt lineage, failure analysis, eval cases, and handoff memory.", | ||
| "keywords": [ | ||
| "claude-code", | ||
| @@ -32,7 +32,8 @@ | ||
| "files": [ | ||
| "bin", | ||
| "src", | ||
| - | "SCHEMA.md" | |
| + | "SCHEMA.md", | |
| + | "NOTICE" | |
| ], | ||
| "engines": { | ||
| "node": ">=18" |
| @@ -79,7 +79,8 @@ const REMEDIATION_RE = new RegExp(`${DESTRUCTIVE_RE.source}|${RECOVERY_RE.source | ||
| const SECURITY_FILE_RE = /(?:^|[\\/])(?:\.env[^\\/]*|[^\\/]*(?:auth|session|middleware|login|signin|signup|permission|rbac|access[-_]?control|secur|crypto|jwt|oauth|passwd|password|secret|credential|token)[^\\/]*)$/i; | ||
| const SECURITY_FILE_EXCLUDE_RE = /(?:^|[\\/])(?:[^\\/]*tokens?\.[a-z]+|tokenizer[^\\/]*|[^\\/]*[-_.]?token(?:izer|s)?\.(?:tsx?|jsx?|css|scss|json|svg)|semantic[-_]?tokens?[^\\/]*|design[-_]?tokens?[^\\/]*)$/i; | ||
| - | const RISKY_CMD_RE = /(?:\brm\s+-rf\b|\bchmod\s+777\b|curl[^|]*\|\s*(?:sh|bash)|wget[^|]*\|\s*(?:sh|bash)|--no-verify\b|--force(?![\w-])|\bDROP\s+TABLE\b|\bTRUNCATE\s+TABLE\b)/i; | |
| + | const RISKY_CMD_RE = | |
| + | /(?:\brm\s+(?:-[a-zA-Z]*\s+)*-[a-zA-Z]*(?:rf|fr)[a-zA-Z]*\b|\brm\s+(?:-[a-zA-Z]*\s+)*-[a-zA-Z]*r[a-zA-Z]*\s+(?:-[a-zA-Z]*\s+)*-[a-zA-Z]*f[a-zA-Z]*\b|\brm\s+(?:-[a-zA-Z]*\s+)*-[a-zA-Z]*f[a-zA-Z]*\s+(?:-[a-zA-Z]*\s+)*-[a-zA-Z]*r[a-zA-Z]*\b|\bchmod\s+(?:-[a-zA-Z]+\s+)*0?777\b|(?:curl|wget)[^|\n]*\|\s*(?:sudo\s+)?(?:sh|bash|zsh|dash|ksh)\b|\b(?:sh|bash|zsh|dash|ksh)\s+<\(\s*(?:curl|wget)\b|--no-verify\b|--force(?![\w-])|\bDROP\s+TABLE\b|\bDROP\s+SCHEMA\b|\bTRUNCATE\s+(?:TABLE\s+)?[\w."`]+)/i; | |
| const SECRET_CONTENT_RE = /(?:\bsource\s+[^\n]*\.env\b|(?:^|[;&|]|\s)\.\s+[^\n]*\.env\b|\.env\.(?:secrets|local|prod|production)\b|\bexport\s+[A-Z0-9_]*(?:_API_KEY|_TOKEN|_SECRET|_PASSWORD|API_KEY|SECRET_KEY|ACCESS_KEY|PRIVATE_KEY)\b|\b(?:wrangler|doppler|vault)\b|\bgh\s+auth\b|\baws\s+configure\b|\bgcloud\s+auth\b|\bkubectl\s+config\s+set-credentials\b)/i; | ||
| const ACCESS_CONTROL_CONTENT_RE = /\b(?:grant\s+(?:select|insert|update|delete|all)\b|setfacl|chmod\s+[0-7]{3,4}\b)/i; | ||
| const ACCESS_CONTROL_WEAK_RE = /\b(?:rbac|access[-_]?control)\b/i; | ||
| @@ -100,6 +101,8 @@ const SECURITY_SURFACE_RULES = [ | ||
| { surface: 'deployment', re: /(?:^|[\\/])(?:Dockerfile|docker-compose[^\\/]*\.ya?ml|[^\\/]*\.(?:tf|tfvars)|wrangler\.toml|vercel\.json|netlify\.toml|fly\.toml|[^\\/]*deploy[^\\/]*)$/i }, | ||
| { surface: 'tests', re: /(?:^|[\\/])[^\\/]*(?:\.(?:test|spec)\.[a-z0-9]+|_test\.[a-z0-9]+|test_[^\\/]+)$|(?:^|[\\/])(?:tests?|__tests__|spec)[\\/]/i }, | ||
| ]; | ||
| + | const TEST_SKIP_API_RE = | |
| + | /\b(?:test|it|describe|context|suite|t)\.(?:skip|only|todo)\b|\bx(?:it|describe|test|context)\s*\(|\bf(?:it|describe)\s*\(|@(?:Disabled|Ignore|Skip)\b|\bpytest\.mark\.skip\w*|\b(?:skip|disabl\w*|remov\w*|delet\w*|drop)\b[^.\n]{0,24}\b(?:e2e|integration|unit|smoke|auth)?\s*(?:tests?|specs?|suite)\b|\b(?:tests?|specs?|suite)\b[^.\n]{0,24}\b(?:disabl|skip|remov|delet|comment(?:ed)? out|turn(?:ed)? off)\w*|--no-tests?\b|--skip-tests?\b/i; | |
| const TEST_SKIP_RE = | ||
| /\b(?:disabl|skip|remov|delet|comment(?:ed)? out|drop|turn(?:ed)? off|x?(?:it|describe)\.skip|--no-tests?|--skip-tests?)\w*\b[^.\n]{0,24}\btests?\b|\btests?\b[^.\n]{0,24}\b(?:disabl|skip|remov|delet|comment(?:ed)? out|turn(?:ed)? off)\w*/i; | ||
| @@ -116,7 +119,11 @@ export function isRiskyCommand(command) { | ||
| } | ||
| export function mentionsTestSkip(text) { | ||
| - | return typeof text === 'string' && text.length <= 4000 && TEST_SKIP_RE.test(text); | |
| + | return ( | |
| + | typeof text === 'string' && | |
| + | text.length <= 4000 && | |
| + | (TEST_SKIP_RE.test(text) || TEST_SKIP_API_RE.test(text)) | |
| + | ); | |
| } | ||
| function securityActions(node) { |
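The widened `RISKY_CMD_RE` above is hard to read as one line. A minimal sketch that copies four of its alternatives into a separate, smaller pattern makes the newly covered shapes easier to check by hand. `RISKY_SAMPLE_RE` and `isRiskySample` are illustrative names, not part of the codebase, and the sample deliberately leaves out the SQL, `--force`, and process-substitution alternatives.

```javascript
// Trimmed copy of four alternatives from RISKY_CMD_RE, for illustration only.
const RISKY_SAMPLE_RE = new RegExp(
  [
    // rm with recursive+force combined in one flag, either order: -rf, -fr
    String.raw`\brm\s+(?:-[a-zA-Z]*\s+)*-[a-zA-Z]*(?:rf|fr)[a-zA-Z]*\b`,
    // rm with the two flags split: -r ... -f
    String.raw`\brm\s+(?:-[a-zA-Z]*\s+)*-[a-zA-Z]*r[a-zA-Z]*\s+(?:-[a-zA-Z]*\s+)*-[a-zA-Z]*f[a-zA-Z]*\b`,
    // chmod 777 / 0777, optionally preceded by flags such as -R
    String.raw`\bchmod\s+(?:-[a-zA-Z]+\s+)*0?777\b`,
    // download piped into a shell, optionally through sudo
    String.raw`(?:curl|wget)[^|\n]*\|\s*(?:sudo\s+)?(?:sh|bash|zsh|dash|ksh)\b`,
  ].join('|'),
  'i'
);

// Hypothetical wrapper mirroring the shape of isRiskyCommand.
function isRiskySample(command) {
  return typeof command === 'string' && RISKY_SAMPLE_RE.test(command);
}

const samples = [
  'rm -fr build',
  'rm -r -f build',
  'chmod -R 777 .',
  'chmod 0777 run.sh',
  'curl https://example.com/install.sh | sudo bash',
  'rm -r build',
  'chmod 644 README.md',
];
for (const s of samples) console.log(isRiskySample(s), s);
```

The pattern has no `g` flag, so repeated `test` calls carry no `lastIndex` state between commands.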
| @@ -54,6 +54,7 @@ Options: | ||
| --mcp start a read-only MCP server over stdio (same as: treetrace mcp) | ||
| --redact-auto redact every detected secret without prompting | ||
| --since <YYYY-MM-DD> only include sessions active on/after this date | ||
| + | (timestamped sessions only; plain transcripts are excluded) | |
| --quiet suppress progress output | ||
| --version, --help | ||
| @@ -82,6 +83,10 @@ export async function main(argv) { | ||
| if (opts.handoff) { | ||
| const pack = renderHandoff(tree, renderOpts); | ||
| assertClean(pack, decisions, 'handoff brief'); | ||
| + | if (Object.keys(decisions).length) { | |
| + | mkdirSync(ttDir, { recursive: true }); | |
| + | writeFileSync(decisionsPath, JSON.stringify(decisions, null, 2)); | |
| + | } | |
| process.stdout.write(pack); | ||
| log(c.green(`✓ handoff brief for ${projectName} (${plural(tree.stats.promptCount, 'prompt')} distilled)`)); | ||
| return; | ||
| @@ -199,7 +204,13 @@ export async function loadRedactedTree(opts, projectDir, projectName, log = () = | ||
| } | ||
| if (opts.since) { | ||
| - | sessions = sessions.filter((s) => !s.lastTs || s.lastTs >= opts.since); | |
| + | sessions = sessions.filter((s) => s.lastTs && s.lastTs >= opts.since); | |
| + | if (!sessions.length) { | |
| + | throw new Error( | |
| + | `no sessions on or after ${opts.since}. --since only applies to timestamped sessions; ` + | |
| + | `plain transcripts carry no timestamps and are excluded when --since is set.` | |
| + | ); | |
| + | } | |
| } | ||
| const nodes = classifyPrompts(sessions); | ||
| @@ -231,10 +242,17 @@ export async function loadRedactedTree(opts, projectDir, projectName, log = () = | ||
| } | ||
| const interactive = !forceAuto && process.stdin.isTTY && process.stderr.isTTY && !opts.redactAuto; | ||
| - | const { decisions, asked, autoRedacted } = await resolveFindings(findings, priorDecisions, { | |
| + | const { decisions, asked, autoRedacted, overriddenKeeps } = await resolveFindings(findings, priorDecisions, { | |
| interactive, | ||
| autoRedact: forceAuto || opts.redactAuto, | ||
| }); | ||
| + | if (overriddenKeeps) { | |
| + | log( | |
| + | c.yellow( | |
| + | `re-redacted ${plural(overriddenKeeps, 'prior keep decision')} in non-interactive mode (keep is only honored in an interactive session)` | |
| + | ) | |
| + | ); | |
| + | } | |
| if (autoRedacted) { | ||
| log( | ||
| c.yellow( | ||
| @@ -453,10 +471,21 @@ export function parseArgs(argv) { | ||
| reportFile: null, | ||
| since: null, | ||
| }; | ||
| - | for (let i = 0; i < argv.length; i++) { | |
| + | let i = 0; | |
| + | const requireValue = (flag) => { | |
| + | const next = argv[i + 1]; | |
| + | if (next === undefined || next.startsWith('--')) { | |
| + | throw new Error(`${flag} requires a value`); | |
| + | } | |
| + | return argv[++i]; | |
| + | }; | |
| + | for (; i < argv.length; i++) { | |
| const a = argv[i]; | ||
| switch (a) { | ||
| case '--file': | ||
| + | if (argv[i + 1] === undefined || argv[i + 1].startsWith('--')) { | |
| + | throw new Error('--file requires at least one path'); | |
| + | } | |
| while (argv[i + 1] && !argv[i + 1].startsWith('--')) opts.files.push(argv[++i]); | ||
| break; | ||
| case '--stdin': opts.stdin = true; break; | ||
| @@ -476,18 +505,26 @@ export function parseArgs(argv) { | ||
| case '--help': case '-h': opts.help = true; break; | ||
| case '--version': case '-v': opts.version = true; break; | ||
| case '--from': | ||
| - | opts.from = argv[++i]; | |
| + | opts.from = requireValue('--from'); | |
| if (!TOOLS.includes(opts.from)) { | ||
| throw new Error(`unknown --from value "${opts.from}" (expected one of: ${TOOLS.join(', ')})`); | ||
| } | ||
| break; | ||
| - | case '--dir': opts.dir = argv[++i]; break; | |
| - | case '--out': opts.out = argv[++i]; break; | |
| - | case '--report-file': opts.reportFile = argv[++i]; break; | |
| - | case '--since': opts.since = argv[++i]; break; | |
| + | case '--dir': opts.dir = requireValue('--dir'); break; | |
| + | case '--out': opts.out = requireValue('--out'); break; | |
| + | case '--report-file': opts.reportFile = requireValue('--report-file'); break; | |
| + | case '--since': | |
| + | opts.since = requireValue('--since'); | |
| + | if (!/^\d{4}-\d{2}-\d{2}([T ].*)?$/.test(opts.since) || Number.isNaN(Date.parse(opts.since))) { | |
| + | throw new Error(`--since expects a date like YYYY-MM-DD (got "${opts.since}")`); | |
| + | } | |
| + | break; | |
| default: | ||
| throw new Error(`unknown option ${a} (try --help)`); | ||
| } | ||
| } | ||
| + | if (opts.stdin && opts.from === 'claude') { | |
| + | throw new Error('--stdin cannot be combined with --from claude: Claude Code JSONL sessions are read from files. Use --file, or omit --from to paste a plain transcript.'); | |
| + | } | |
| return opts; | ||
| } |
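The `requireValue` change in `parseArgs` can be exercised in isolation. The following is a minimal sketch, assuming a cut-down parser that knows only `--dir` and `--since`; `parseValueFlags` is a hypothetical name, and the real `parseArgs` handles many more options.

```javascript
// Hypothetical, trimmed-down parser showing the value-taking option guard.
function parseValueFlags(argv) {
  const opts = { dir: null, since: null };
  let i = 0;
  // Reject a missing value or one that looks like another flag, so a typo
  // such as `--out --quiet` cannot turn a flag name into a file name.
  const requireValue = (flag) => {
    const next = argv[i + 1];
    if (next === undefined || next.startsWith('--')) {
      throw new Error(`${flag} requires a value`);
    }
    return argv[++i];
  };
  for (; i < argv.length; i++) {
    const a = argv[i];
    switch (a) {
      case '--dir':
        opts.dir = requireValue('--dir');
        break;
      case '--since':
        opts.since = requireValue('--since');
        // Shape check first, then let Date.parse reject impossible dates.
        if (!/^\d{4}-\d{2}-\d{2}([T ].*)?$/.test(opts.since) || Number.isNaN(Date.parse(opts.since))) {
          throw new Error(`--since expects a date like YYYY-MM-DD (got "${opts.since}")`);
        }
        break;
      default:
        throw new Error(`unknown option ${a} (try --help)`);
    }
  }
  return opts;
}

console.log(parseValueFlags(['--dir', '/tmp/project', '--since', '2026-06-01']));
```

Keeping `i` outside the loop lets the closure advance the same index the loop uses, which is why the diff hoists `let i = 0` out of the `for` header.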
| @@ -1,4 +1,4 @@ | ||
| - | import { readdirSync, statSync, existsSync } from 'node:fs'; | |
| + | import { readdirSync, statSync, existsSync, openSync, readSync, closeSync } from 'node:fs'; | |
| import { homedir } from 'node:os'; | ||
| import { join, resolve, sep } from 'node:path'; | ||
| @@ -6,6 +6,36 @@ export function mungePath(absPath) { | ||
| return absPath.replace(/[^A-Za-z0-9-]/g, '-'); | ||
| } | ||
| + | const CWD_PROBE_BYTES = 65536; | |
| + | const CWD_RE = /"cwd"\s*:\s*"((?:[^"\\]|\\.)*)"/; | |
| + | ||
| + | export function recordedCwd(filePath) { | |
| + | let fd; | |
| + | try { | |
| + | fd = openSync(filePath, 'r'); | |
| + | const buf = Buffer.alloc(CWD_PROBE_BYTES); | |
| + | const bytes = readSync(fd, buf, 0, CWD_PROBE_BYTES, 0); | |
| + | const head = buf.toString('utf8', 0, bytes); | |
| + | const m = head.match(CWD_RE); | |
| + | if (!m) return null; | |
| + | try { | |
| + | return JSON.parse(`"${m[1]}"`); | |
| + | } catch { | |
| + | return m[1]; | |
| + | } | |
| + | } catch { | |
| + | return null; | |
| + | } finally { | |
| + | if (fd !== undefined) { | |
| + | try { | |
| + | closeSync(fd); | |
| + | } catch { | |
| + | // best-effort close; a failure here is safe to ignore | |
| + | } | |
| + | } | |
| + | } | |
| + | } | |
| + | ||
| export function claudeProjectsRoot() { | ||
| return process.env.CLAUDE_CONFIG_DIR | ||
| ? join(process.env.CLAUDE_CONFIG_DIR, 'projects') | ||
| @@ -35,6 +65,8 @@ export function discoverSessions(projectDir) { | ||
| } catch { | ||
| continue; | ||
| } | ||
| + | const cwd = recordedCwd(path); | |
| + | if (cwd && resolve(cwd) !== abs) continue; | |
| sessions.push({ | ||
| path, | ||
| sessionId: f.name.replace(/\.jsonl$/, ''), |
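The `cwd` probe in `recordedCwd` comes down to a regex match on the head of the file, followed by JSON string decoding. A minimal sketch of that step on an in-memory string, without the file I/O, could look like this; `cwdFromHead` is a hypothetical name and the sample record is invented for illustration.

```javascript
// Same pattern as CWD_RE in the diff: a "cwd" key followed by a JSON string,
// allowing escaped characters inside the value.
const CWD_RE = /"cwd"\s*:\s*"((?:[^"\\]|\\.)*)"/;

// Hypothetical helper: extract and decode the first recorded cwd in `head`.
function cwdFromHead(head) {
  const m = head.match(CWD_RE);
  if (!m) return null;
  try {
    // Re-wrap the captured text in quotes so JSON.parse undoes the escaping.
    return JSON.parse(`"${m[1]}"`);
  } catch {
    return m[1]; // fall back to the raw capture if decoding fails
  }
}

// Invented first line of a session file; the backslashes are escaped by
// JSON.stringify, which is the case the decode step exists for.
const head = JSON.stringify({ type: 'user', cwd: 'C:\\Users\\dev\\proj' }) + '\n';
console.log(cwdFromHead(head));
```

Decoding matters mostly for paths that contain backslashes or quotes: comparing the raw capture against a resolved path would miss those.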
| @@ -1,5 +1,5 @@ | ||
| import { readFileSync, existsSync, statSync } from 'node:fs'; | ||
| - | import { isAbsolute, join, resolve, normalize } from 'node:path'; | |
| + | import { isAbsolute, join, resolve, sep } from 'node:path'; | |
| import { truncate } from './util.js'; | ||
| const NODE_BUILTINS = new Set([ | ||
| @@ -95,13 +95,15 @@ function packageRoot(spec) { | ||
| return spec.split('/')[0]; | ||
| } | ||
| - | function collectCreatedFiles(tree) { | |
| + | function collectCreatedFiles(tree, projectDir) { | |
| const created = new Set(); | ||
| for (const node of tree.nodes) { | ||
| for (const a of node.actions || []) { | ||
| if (!a.file || typeof a.file !== 'string') continue; | ||
| - | if (a.tool === 'Write' || a.tool === 'Edit' || a.tool === 'NotebookEdit') { | |
| + | if (a.tool === 'Write') { | |
| created.add(normalizeFileKey(a.file)); | ||
| + | } else if (a.tool === 'Edit' || a.tool === 'NotebookEdit') { | |
| + | if (fileExists(projectDir, a.file)) created.add(normalizeFileKey(a.file)); | |
| } | ||
| } | ||
| } | ||
| @@ -121,27 +123,35 @@ function looksLikeFileToken(tok) { | ||
| return true; | ||
| } | ||
| - | function fileExists(projectDir, rel) { | |
| + | function withinProjectDir(projectDir, target) { | |
| + | const root = resolve(projectDir); | |
| + | const resolved = resolve(target); | |
| + | return resolved === root || resolved.startsWith(root + sep); | |
| + | } | |
| + | ||
| + | function resolveInProject(projectDir, rel) { | |
| const clean = rel.replace(/^\.\//, ''); | ||
| - | let target; | |
| - | if (isAbsolute(clean)) { | |
| - | target = clean; | |
| - | } else { | |
| - | target = resolve(projectDir, clean); | |
| - | } | |
| + | const target = isAbsolute(clean) ? clean : resolve(projectDir, clean); | |
| + | if (!withinProjectDir(projectDir, target)) return null; | |
| + | return target; | |
| + | } | |
| + | ||
| + | function fileExists(projectDir, rel) { | |
| + | const target = resolveInProject(projectDir, rel); | |
| + | if (!target) return true; // outside the project: out of scope, so never stat-checked and never flagged | |
| try { | ||
| if (existsSync(target)) return true; | ||
| } catch { | ||
| } | ||
| - | const base = clean.split('/').pop(); | |
| - | return globByBasename(projectDir, base, target); | |
| + | const base = rel.replace(/^\.\//, '').split('/').pop(); | |
| + | return globByBasename(projectDir, base); | |
| } | ||
| - | function globByBasename(projectDir, base, fullCandidate) { | |
| + | function globByBasename(projectDir, base) { | |
| try { | ||
| const direct = join(projectDir, base); | ||
| - | if (existsSync(direct) && statSync(direct).isFile()) return true; | |
| + | if (withinProjectDir(projectDir, direct) && existsSync(direct) && statSync(direct).isFile()) return true; | |
| } catch { | ||
| } | ||
| @@ -212,7 +222,7 @@ export function detectHallucinations(tree, projectDir, opts = {}) { | ||
| return { schemaVersion: '0.2', verifiedAgainstWorkingTree: false, hallucinations, summary: emptySummary() }; | ||
| } | ||
| - | const created = collectCreatedFiles(tree); | |
| + | const created = collectCreatedFiles(tree, projectDir); | |
| const pkgNames = readPackageNames(projectDir); | ||
| const lockNames = readLockfilePackages(projectDir); | ||
| const pyNames = readPyRequirements(projectDir); | ||
| @@ -220,7 +230,6 @@ export function detectHallucinations(tree, projectDir, opts = {}) { | ||
| for (const ref of collectFileReferences(tree)) { | ||
| if (created.has(ref.key)) continue; | ||
| - | if (REL_PREFIX_RE.test(ref.token)) continue; | |
| if (fileExists(projectDir, ref.token)) continue; | ||
| hallucinations.push({ | ||
| category: 'hallucinated_file_or_path', |
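The `root + sep` comparison in `withinProjectDir` is worth a note: a plain prefix check would accept sibling directories whose names merely start with the project directory's name. The sketch below shows the difference on already-resolved paths. It assumes both inputs are absolute and normalized (as `path.resolve` would produce) and defaults to the POSIX separator. `isInside` and `naiveIsInside` are illustrative names.

```javascript
// Assumes `root` and `resolved` are absolute, normalized paths.
function isInside(root, resolved, sep = '/') {
  // Equal to the root, or strictly below it across a separator boundary.
  return resolved === root || resolved.startsWith(root + sep);
}

// Prefix-only check, shown for contrast; not what the diff does.
function naiveIsInside(root, resolved) {
  return resolved.startsWith(root);
}

const root = '/work/app';
for (const p of ['/work/app', '/work/app/src/index.js', '/work/app-secrets/.env', '/etc/passwd']) {
  console.log(p, isInside(root, p), naiveIsInside(root, p));
}
```

Only the separator-aware check rejects `/work/app-secrets/.env`, the kind of out-of-project path that must never be stat-checked.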
| @@ -7,6 +7,7 @@ import { renderSecurityReport } from './security-report.js'; | ||
| import { renderHallucinationsJson } from './hallucinate.js'; | ||
| const PROTOCOL_VERSION = '2024-11-05'; | ||
| + | const MAX_REQUEST_BYTES = 1048576; | |
| const TOOL_DEFS = [ | ||
| { | ||
| @@ -35,15 +36,29 @@ export async function startMcpServer({ argv, version }, io = {}) { | ||
| const input = io.input || process.stdin; | ||
| const output = io.output || process.stdout; | ||
| const opts = parseArgs((argv || []).filter((a) => a !== 'mcp' && a !== '--mcp')); | ||
| + | if (opts.stdin) { | |
| + | throw new Error( | |
| + | 'treetrace mcp does not support --stdin: stdin is the JSON-RPC transport for the MCP server. ' + | |
| + | 'Point the server at a project with --dir, or import a transcript with --file.' | |
| + | ); | |
| + | } | |
| const projectDir = resolve(opts.dir || process.cwd()); | ||
| const projectName = detectProjectName(projectDir); | ||
| let cache = null; | ||
| + | let inFlight = null; | |
| const ensureTree = async () => { | ||
| if (cache) return cache; | ||
| - | const { tree, decisions } = await loadRedactedTree(opts, projectDir, projectName, () => {}, { forceAuto: true }); | |
| - | cache = { tree, decisions, renderOpts: { projectName, version, projectDir, generatedAt: new Date().toISOString() } }; | |
| - | return cache; | |
| + | if (!inFlight) { | |
| + | inFlight = (async () => { | |
| + | const { tree, decisions } = await loadRedactedTree(opts, projectDir, projectName, () => {}, { forceAuto: true }); | |
| + | cache = { tree, decisions, renderOpts: { projectName, version, projectDir, generatedAt: new Date().toISOString() } }; | |
| + | return cache; | |
| + | })().finally(() => { | |
| + | inFlight = null; | |
| + | }); | |
| + | } | |
| + | return inFlight; | |
| }; | ||
| return new Promise((resolveServer) => { | ||
| @@ -53,6 +68,10 @@ export async function startMcpServer({ argv, version }, io = {}) { | ||
| rl.on('line', async (line) => { | ||
| const text = line.trim(); | ||
| if (!text) return; | ||
| + | if (Buffer.byteLength(text, 'utf8') > MAX_REQUEST_BYTES) { | |
| + | send({ jsonrpc: '2.0', id: null, error: { code: -32600, message: 'Invalid Request: request exceeds size limit' } }); | |
| + | return; | |
| + | } | |
| let req; | ||
| try { | ||
| req = JSON.parse(text); | ||
| @@ -60,26 +79,37 @@ export async function startMcpServer({ argv, version }, io = {}) { | ||
| send({ jsonrpc: '2.0', id: null, error: { code: -32700, message: 'Parse error' } }); | ||
| return; | ||
| } | ||
| + | if (Array.isArray(req)) { | |
| + | send({ jsonrpc: '2.0', id: null, error: { code: -32600, message: 'Invalid Request: JSON-RPC batch requests are not supported' } }); | |
| + | return; | |
| + | } | |
| try { | ||
| await handle(req, send, ensureTree, version); | ||
| } catch (err) { | ||
| - | send({ | |
| - | jsonrpc: '2.0', | |
| - | id: req && req.id !== undefined ? req.id : null, | |
| - | error: { code: -32603, message: `Internal error: ${err && err.message ? err.message : 'unknown'}` }, | |
| - | }); | |
| + | if (isRequestWithId(req)) { | |
| + | send({ | |
| + | jsonrpc: '2.0', | |
| + | id: req.id, | |
| + | error: { code: -32603, message: `Internal error: ${err && err.message ? err.message : 'unknown'}` }, | |
| + | }); | |
| + | } | |
| } | ||
| }); | ||
| rl.on('close', () => resolveServer()); | ||
| }); | ||
| } | ||
| + | function isRequestWithId(req) { | |
| + | return Boolean(req) && typeof req === 'object' && !Array.isArray(req) && 'id' in req; | |
| + | } | |
| + | ||
| async function handle(req, send, ensureTree, version) { | ||
| + | const hasId = isRequestWithId(req); | |
| if (!req || req.jsonrpc !== '2.0' || typeof req.method !== 'string') { | ||
| - | send({ jsonrpc: '2.0', id: req && req.id !== undefined ? req.id : null, error: { code: -32600, message: 'Invalid Request' } }); | |
| + | if (hasId) send({ jsonrpc: '2.0', id: req.id, error: { code: -32600, message: 'Invalid Request' } }); | |
| return; | ||
| } | ||
| - | const isNotification = req.id === undefined || req.id === null; | |
| + | const isNotification = !hasId; | |
| const reply = (result) => { if (!isNotification) send({ jsonrpc: '2.0', id: req.id, result }); }; | ||
| const fail = (code, message) => { if (!isNotification) send({ jsonrpc: '2.0', id: req.id, error: { code, message } }); }; | ||
| @@ -108,6 +138,13 @@ async function handle(req, send, ensureTree, version) { | ||
| fail(-32602, `Unknown tool: ${name}`); | ||
| return; | ||
| } | ||
| + | const args = params.arguments; | |
| + | if (args !== undefined && args !== null) { | |
| + | if (typeof args !== 'object' || Array.isArray(args) || Object.keys(args).length > 0) { | |
| + | fail(-32602, `Tool ${name} accepts no arguments`); | |
| + | return; | |
| + | } | |
| + | } | |
| const { tree, decisions, renderOpts } = await ensureTree(); | ||
| const text = renderTool(name, tree, renderOpts); | ||
| assertClean(text, decisions, `mcp tool ${name}`); |
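The `inFlight` change to `ensureTree` is an instance of the single-flight pattern: concurrent callers share one pending load instead of each starting their own. A minimal, generic sketch follows. `singleFlight` and the fake loader are hypothetical names, and the real code caches the result of `loadRedactedTree`.

```javascript
// Wrap an async loader so overlapping calls share one in-flight promise and
// later calls reuse the cached result.
function singleFlight(load) {
  let cache = null;
  let inFlight = null;
  return async () => {
    if (cache) return cache;
    if (!inFlight) {
      inFlight = (async () => {
        cache = await load();
        return cache;
      })().finally(() => {
        // Clear the marker on success or failure so a failed load can retry.
        inFlight = null;
      });
    }
    return inFlight;
  };
}

let loads = 0;
const ensure = singleFlight(async () => {
  loads += 1; // stands in for the expensive tree load
  return { tree: 'loaded' };
});

// Two calls issued back to back, as two tools/call requests might be.
Promise.all([ensure(), ensure()]).then(([a, b]) => {
  console.log(loads, a === b);
});
```

Without the shared promise, two `tools/call` requests arriving before the first load finishes would each run the load and the redaction pass.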
| @@ -25,7 +25,7 @@ export const RULES = [ | ||
| { id: 'wireguard-key', severity: 'medium', re: /\b(PrivateKey|PresharedKey)\s*=\s*[A-Za-z0-9+/]{42,44}=?/g }, | ||
| { id: 'url-basic-auth', severity: 'medium', re: /\b[a-z][a-z0-9+.-]{0,30}:\/\/[^/\s:@'"`]{2,256}:[^/\s@'"`]{2,256}@[^\s'"`]{1,512}/gi }, | ||
| { id: 'bearer-header', severity: 'medium', re: /\bBearer\s+[A-Za-z0-9._+/=-]{20,}\b/g }, | ||
| - | { id: 'secret-assignment', severity: 'medium', re: /\b(password|passwd|pwd|secret|api[_-]?key|access[_-]?token|auth[_-]?token|client[_-]?secret)\b\s*[:=]\s*(?!(?:['"]?\s*)?(?:\$\{|<|%|\*{3}|\.{3}|REDACTED|xxx+|placeholder|changeme|example|your[-_]))(?:"[^"\r\n]{8,}"|'[^'\r\n]{8,}'|[^\s'"`,;]{8,})/gi }, | |
| + | { id: 'secret-assignment', severity: 'medium', re: /["'`]?\b(password|passwd|pwd|secret|api[_-]?key|access[_-]?token|auth[_-]?token|client[_-]?secret|secret[_-]?key|token|bearer)\b["'`]?\s*[:=]\s*(?!(?:["'`]?\s*)?(?:\$\{|\$\(|<|%|\*{3}|\.{3}|REDACTED|\[REDACTED|xxx+|placeholder|changeme|example|your[-_]|null\b|true\b|false\b))(?:"[^"\\]{4,512}"|'[^'\\]{4,512}'|`[^`\\]{4,512}`|[^\s'"`,;){}]{6,512})/gi }, | |
| { id: 'email', severity: 'soft', re: /\b[A-Za-z0-9._%+-]+@(?!(?:users\.noreply\.github\.com|example\.(?:com|org)))[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b/g }, | ||
| { id: 'ipv4', severity: 'soft', re: /\b(?:(?:25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)\.){3}(?:25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)\b(?!\.\d)/g }, | ||
| @@ -176,14 +176,26 @@ export async function resolveFindings(findings, priorDecisions, { interactive, a | ||
| unique.get(h).count++; | ||
| } | ||
| + | const autoMode = !interactive || autoRedact; | |
| + | let overriddenKeeps = 0; | |
| + | if (autoMode) { | |
| + | for (const [h, { finding }] of unique) { | |
| + | const prior = decisions[h]; | |
| + | if (prior && prior.action === 'keep' && (finding.severity === 'high' || finding.severity === 'medium')) { | |
| + | delete decisions[h]; | |
| + | overriddenKeeps++; | |
| + | } | |
| + | } | |
| + | } | |
| + | ||
| const unresolved = [...unique.entries()].filter(([h]) => !decisions[h]); | ||
| - | if (!unresolved.length) return { decisions, asked: 0 }; | |
| + | if (!unresolved.length) return { decisions, asked: 0, overriddenKeeps }; | |
| - | if (!interactive || autoRedact) { | |
| + | if (autoMode) { | |
| for (const [h, { finding }] of unresolved) { | ||
| decisions[h] = { action: 'redact', replacement: maskFor(finding), ruleId: finding.ruleId }; | ||
| } | ||
| - | return { decisions, asked: 0, autoRedacted: unresolved.length }; | |
| + | return { decisions, asked: 0, autoRedacted: unresolved.length, overriddenKeeps }; | |
| } | ||
| const rl = createInterface({ input: process.stdin, output: process.stderr }); |
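The keep-override step added to `resolveFindings` can be read as a pure function over the prior decisions. The sketch below is written under that assumption: `dropUntrustedKeeps` and the `hash` field on each finding are hypothetical, while the real code works on a de-duplicated map keyed by hash and changes the decisions object in place.

```javascript
// Sketch: in any non-interactive or auto-redact run, discard prior `keep`
// decisions for high and medium findings so they get redacted again.
function dropUntrustedKeeps(findings, priorDecisions, { interactive, autoRedact }) {
  const decisions = { ...priorDecisions }; // copy; the caller's object is untouched
  const autoMode = !interactive || autoRedact;
  let overriddenKeeps = 0;
  if (autoMode) {
    for (const finding of findings) {
      const prior = decisions[finding.hash];
      const serious = finding.severity === 'high' || finding.severity === 'medium';
      if (prior && prior.action === 'keep' && serious) {
        delete decisions[finding.hash];
        overriddenKeeps++;
      }
    }
  }
  return { decisions, overriddenKeeps };
}

// Invented data: a preseeded redactions file that tries to keep a token.
const prior = { abc: { action: 'keep' }, def: { action: 'keep' } };
const findings = [
  { hash: 'abc', severity: 'high', ruleId: 'github-token' },
  { hash: 'def', severity: 'soft', ruleId: 'email' },
];
console.log(dropUntrustedKeeps(findings, prior, { interactive: false, autoRedact: true }));
```

Soft findings keep their prior decision, and an interactive session without `--redact-auto` leaves every `keep` in place, matching the changelog wording.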
| @@ -19,8 +19,10 @@ import { | ||
| renderLessonsMarkdown, | ||
| renderEvalsJsonl, | ||
| renderMemoryMarkdown, | ||
| + | isRiskyCommand, | |
| + | mentionsTestSkip, | |
| } from '../src/analyze.js'; | ||
| - | import { main } from '../src/cli.js'; | |
| + | import { main, parseArgs } from '../src/cli.js'; | |
| import { mungePath } from '../src/discover.js'; | ||
| import { sha256, escapeMd } from '../src/util.js'; | ||
| import { detectHallucinations, renderHallucinationsJson } from '../src/hallucinate.js'; | ||
| @@ -924,3 +926,275 @@ test('mcp: initialize, tools/list, and tools/call return well-formed JSON-RPC', | ||
| rmSync(dir, { recursive: true, force: true }); | ||
| } | ||
| }); | ||
| + | ||
| + | import { recordedCwd } from '../src/discover.js'; | |
| + | ||
| + | test('redaction: JSON-style, quoted, backtick, and multiline secret assignments are caught', () => { | |
| + | const cases = [ | |
| + | '{"api_key":"supersecretvalue"}', | |
| + | '{"client_secret":"correcthorsebattery"}', | |
| + | '{"access_token":"correct-horse-battery"}', | |
| + | "{'api_key':'correcthorsebattery'}", | |
| + | 'const password = `correct horse battery staple`;', | |
| + | 'api_key: `correct-horse-battery-staple`', | |
| + | 'API_KEY="line1\nline2line2line2"', | |
| + | ]; | |
| + | for (const sample of cases) { | |
| + | const hits = scanText(sample).map((f) => f.ruleId); | |
| + | assert.ok(hits.includes('secret-assignment'), `secret-assignment missed in: ${JSON.stringify(sample)} (got ${hits})`); | |
| + | } | |
| + | }); | |
| + | ||
| + | test('redaction: generic secret-key assignment is caught even with a low-entropy value', () => { | |
| + | const sample = 'password: "hunter2hunter2"'; | |
| + | const hits = scanText(sample).map((f) => f.ruleId); | |
| + | assert.ok(hits.includes('secret-assignment'), 'low-entropy generic secret should still be a finding'); | |
| + | }); | |
| + | ||
| + | test('redaction: placeholder secret assignments are not flagged', () => { | |
| + | for (const benign of ['token: null', 'password: ""', 'secret: "${SECRET}"', 'api_key: <your-key>', 'token=true']) { | |
| + | const hard = scanText(benign).filter((f) => f.severity !== 'soft'); | |
| + | assert.deepEqual(hard, [], `${benign} should not flag (got ${JSON.stringify(hard)})`); | |
| + | } | |
| + | }); | |
| + | ||
| + | test('redaction: a JSON-style secret leaves no raw value in any artifact end to end', async () => { | |
| + | const secret = 'supersecretvalue'; | |
| + | const back = 'correct-horse-battery-staple'; | |
| + | const dir = mkdtempSync(join(tmpdir(), 'treetrace-json-secret-')); | |
| + | const file = join(dir, 'conv.json'); | |
| + | const convo = [{ | |
| + | mapping: { | |
| + | r: { message: null, parent: null, children: ['u'] }, | |
| + | u: { message: { author: { role: 'user' }, content: { parts: [`config is {"api_key":"${secret}"} and password = \`${back}\``] }, create_time: 1.0 }, parent: 'r', children: ['a'] }, | |
| + | a: { message: { author: { role: 'assistant' }, content: { parts: ['done'] }, create_time: 2.0 }, parent: 'u', children: [] }, | |
| + | }, | |
| + | }]; | |
| + | writeFileSync(file, JSON.stringify(convo)); | |
| + | try { | |
| + | await main(['--from', 'chatgpt', '--file', file, '--dir', dir, '--report', '--analysis', '--redact-auto', '--quiet']); | |
| + | const artifacts = [ | |
| + | 'PROMPT_TREE.md', 'TREETRACE_REPORT.md', '.treetrace/tree.json', | |
| + | '.treetrace/failures.json', '.treetrace/lessons.md', '.treetrace/evals.jsonl', '.treetrace/agent-memory.md', | |
| + | ].filter((f) => existsSync(join(dir, f))).map((f) => readFileSync(join(dir, f), 'utf8')).join('\n'); | |
| + | assert.ok(!artifacts.includes(secret), 'JSON-style secret value leaked into an artifact'); | |
| + | assert.ok(!artifacts.includes(back), 'backtick secret value leaked into an artifact'); | |
| + | assert.ok(artifacts.includes('[REDACTED:secret-assignment]'), 'expected a secret-assignment redaction marker'); | |
| + | } finally { | |
| + | rmSync(dir, { recursive: true, force: true }); | |
| + | } | |
| + | }); | |
| + | ||
| + | test('redaction: a prior keep decision is ignored under --redact-auto and non-TTY auto mode', async () => { | |
| + | const token = 'ghp_0123456789abcdefghijklmnopqrstuvwxyzAB'; | |
| + | const text = `Use token ${token} for setup`; | |
| + | const findings = scanText(text); | |
| + | const prior = { [sha256(token)]: { action: 'keep', ruleId: 'github-token' } }; | |
| + | ||
| + | const auto = await resolveFindings(findings, prior, { interactive: false, autoRedact: true }); | |
| + | assert.equal(auto.overriddenKeeps, 1, 'auto mode should override a prior keep'); | |
| + | const outAuto = applyDecisions(text, findings, auto.decisions); | |
| + | assert.ok(!outAuto.includes(token), 'raw token leaked under --redact-auto despite re-redaction'); | |
| + | assert.equal(shadowScan(outAuto, auto.decisions).length, 0, 'shadow scan should be clean after override'); | |
| + | ||
| + | const nonTty = await resolveFindings(findings, prior, { interactive: false, autoRedact: false }); | |
| + | assert.equal(nonTty.overriddenKeeps, 1, 'non-TTY auto mode should override a prior keep'); | |
| + | assert.ok(!applyDecisions(text, findings, nonTty.decisions).includes(token), 'raw token leaked in non-TTY auto mode'); | |
| + | ||
| + | const interactive = await resolveFindings(findings, prior, { interactive: true, autoRedact: false }); | |
| + | assert.equal(interactive.overriddenKeeps, 0, 'interactive mode should honor a deliberate keep'); | |
| + | assert.ok(applyDecisions(text, findings, interactive.decisions).includes(token), 'interactive keep should be honored'); | |
| + | }); | |
| + | ||
| + | test('cli: a preseeded keep cannot leak a secret under --redact-auto', async () => { | |
| + | const token = 'ghp_0123456789abcdefghijklmnopqrstuvwxyzAB'; | |
| + | const dir = mkdtempSync(join(tmpdir(), 'treetrace-keep-')); | |
| + | const file = join(dir, 'conv.json'); | |
| + | const convo = [{ | |
| + | mapping: { | |
| + | r: { message: null, parent: null, children: ['u'] }, | |
| + | u: { message: { author: { role: 'user' }, content: { parts: [`Use token ${token} for setup`] }, create_time: 1.0 }, parent: 'r', children: ['a'] }, | |
| + | a: { message: { author: { role: 'assistant' }, content: { parts: ['done'] }, create_time: 2.0 }, parent: 'u', children: [] }, | |
| + | }, | |
| + | }]; | |
| + | writeFileSync(file, JSON.stringify(convo)); | |
| + | mkdirSync(join(dir, '.treetrace'), { recursive: true }); | |
| + | writeFileSync(join(dir, '.treetrace', 'redactions.json'), JSON.stringify({ [sha256(token)]: { action: 'keep', ruleId: 'github-token' } })); | |
| + | try { | |
| + | await main(['--from', 'chatgpt', '--file', file, '--dir', dir, '--report', '--analysis', '--redact-auto', '--quiet']); | |
| + | const artifacts = [ | |
| + | 'PROMPT_TREE.md', 'TREETRACE_REPORT.md', '.treetrace/tree.json', | |
| + | '.treetrace/failures.json', '.treetrace/agent-memory.md', | |
| + | ].filter((f) => existsSync(join(dir, f))).map((f) => readFileSync(join(dir, f), 'utf8')).join('\n'); | |
| + | assert.ok(!artifacts.includes(token), 'preseeded keep leaked a raw token under --redact-auto'); | |
| + | const stored = JSON.parse(readFileSync(join(dir, '.treetrace', 'redactions.json'), 'utf8')); | |
| + | assert.equal(stored[sha256(token)].action, 'redact', 'overridden keep should persist as redact'); | |
| + | } finally { | |
| + | rmSync(dir, { recursive: true, force: true }); | |
| + | } | |
| + | }); | |
| + | ||
| + | test('mcp: a preseeded keep cannot leak a token in handoff', async () => { | |
| + | const token = 'ghp_0123456789abcdefghijklmnopqrstuvwxyzAB'; | |
| + | const dir = mkdtempSync(join(tmpdir(), 'treetrace-mcp-keep-')); | |
| + | const file = join(dir, 'conv.json'); | |
| + | const convo = [{ | |
| + | mapping: { | |
| + | r: { message: null, parent: null, children: ['u'] }, | |
| + | u: { message: { author: { role: 'user' }, content: { parts: [`Use token ${token} for setup, do not add dependencies`] }, create_time: 1.0 }, parent: 'r', children: ['a'] }, | |
| + | a: { message: { author: { role: 'assistant' }, content: { parts: ['ok'] }, create_time: 2.0 }, parent: 'u', children: ['u2'] }, | |
| + | u2: { message: { author: { role: 'user' }, content: { parts: ['no, keep it minimal'] }, create_time: 3.0 }, parent: 'a', children: [] }, | |
| + | }, | |
| + | }]; | |
| + | writeFileSync(file, JSON.stringify(convo)); | |
| + | mkdirSync(join(dir, '.treetrace'), { recursive: true }); | |
| + | writeFileSync(join(dir, '.treetrace', 'redactions.json'), JSON.stringify({ [sha256(token)]: { action: 'keep', ruleId: 'github-token' } })); | |
| + | const bin = join(dirname(fileURLToPath(import.meta.url)), '..', 'bin', 'treetrace.js'); | |
| + | try { | |
| + | const responses = await new Promise((resolveP, rejectP) => { | |
| + | const child = spawn('node', [bin, 'mcp', '--from', 'chatgpt', '--file', file, '--dir', dir], { stdio: ['pipe', 'pipe', 'ignore'] }); | |
| + | let buf = ''; | |
| + | child.stdout.on('data', (d) => { buf += d; }); | |
| + | child.on('error', rejectP); | |
| + | const send = (o) => child.stdin.write(JSON.stringify(o) + '\n'); | |
| + | send({ jsonrpc: '2.0', id: 1, method: 'initialize', params: {} }); | |
| + | send({ jsonrpc: '2.0', id: 2, method: 'tools/call', params: { name: 'handoff', arguments: {} } }); | |
| + | setTimeout(() => { | |
| + | child.stdin.end(); | |
| + | child.kill(); | |
| + | resolveP(buf.split('\n').filter(Boolean).map((l) => JSON.parse(l))); | |
| + | }, 2500); | |
| + | }); | |
| + | const call = responses.find((r) => r.id === 2); | |
| + | assert.ok(call && call.result, 'handoff tool should return a result'); | |
| + | assert.ok(!JSON.stringify(call).includes(token), 'MCP handoff leaked a token despite a preseeded keep'); | |
| + | } finally { | |
| + | rmSync(dir, { recursive: true, force: true }); | |
| + | } | |
| + | }); | |
| + | ||
| + | test('mcp: extra tool arguments return -32602', async () => { | |
| + | const dir = tempProject(); | |
| + | const file = join(dir, 'conv.json'); | |
| + | writeFileSync(file, JSON.stringify([{ mapping: { | |
| + | r: { message: null, parent: null, children: ['u'] }, | |
| + | u: { message: { author: { role: 'user' }, content: { parts: ['build a cli'] }, create_time: 1.0 }, parent: 'r', children: ['a'] }, | |
| + | a: { message: { author: { role: 'assistant' }, content: { parts: ['ok'] }, create_time: 2.0 }, parent: 'u', children: [] }, | |
| + | } }])); | |
| + | const bin = join(dirname(fileURLToPath(import.meta.url)), '..', 'bin', 'treetrace.js'); | |
| + | try { | |
| + | const responses = await new Promise((resolveP, rejectP) => { | |
| + | const child = spawn('node', [bin, 'mcp', '--from', 'chatgpt', '--file', file, '--dir', dir], { stdio: ['pipe', 'pipe', 'ignore'] }); | |
| + | let buf = ''; | |
| + | child.stdout.on('data', (d) => { buf += d; }); | |
| + | child.on('error', rejectP); | |
| + | const send = (o) => child.stdin.write(JSON.stringify(o) + '\n'); | |
| + | send({ jsonrpc: '2.0', id: 1, method: 'tools/call', params: { name: 'lessons', arguments: { unexpected: true } } }); | |
| + | send({ jsonrpc: '2.0', id: 2, method: 'tools/call', params: { name: 'lessons', arguments: {} } }); | |
| + | send({ jsonrpc: '2.0', id: null, method: 'ping' }); | |
| + | send([{ jsonrpc: '2.0', id: 9, method: 'ping' }]); | |
| + | setTimeout(() => { | |
| + | child.stdin.end(); | |
| + | child.kill(); | |
| + | resolveP(buf.split('\n').filter(Boolean).map((l) => JSON.parse(l))); | |
| + | }, 2500); | |
| + | }); | |
| + | const bad = responses.find((r) => r.id === 1); | |
| + | assert.ok(bad && bad.error && bad.error.code === -32602, 'extra arguments should return -32602'); | |
| + | const ok = responses.find((r) => r.id === 2); | |
| + | assert.ok(ok && ok.result, 'empty arguments should succeed'); | |
| + | const idNull = responses.find((r) => r.id === null && r.result); | |
| + | assert.ok(idNull, 'explicit id:null request should receive a response'); | |
| + | const batch = responses.find((r) => r.id === null && r.error && /batch/.test(r.error.message)); | |
| + | assert.ok(batch, 'batch arrays should return a clear error'); | |
| + | } finally { | |
| + | rmSync(dir, { recursive: true, force: true }); | |
| + | } | |
| + | }); | |
| + | ||
| + | test('mcp: treetrace mcp --stdin is rejected clearly', async () => { | |
| + | const { startMcpServer } = await import('../src/mcp.js'); | |
| + | await assert.rejects( | |
| + | () => startMcpServer({ argv: ['mcp', '--stdin'], version: '0.0.0' }), | |
| + | /does not support --stdin/, | |
| + | 'mcp --stdin should be rejected at startup' | |
| + | ); | |
| + | }); | |
| + | ||
| + | test('hallucinations: absolute paths outside the project are out of scope, not an oracle', () => { | |
| + | const dir = tempProject(); | |
| + | try { | |
| + | const mk = (text) => ({ nodes: [{ id: 'n1', kind: 'root', status: 'accepted', parent: null, text, title: 't', actions: [] }] }); | |
| + | const abs = detectHallucinations(mk('see /definitely/not/here.zzz and /etc/shadow.bak'), dir).hallucinations.map((h) => h.reference); | |
| + | assert.deepEqual(abs, [], 'absolute paths outside the project must not be flagged or statted'); | |
| + | const parent = detectHallucinations(mk('see ../escape.js'), dir).hallucinations.map((h) => h.reference); | |
| + | assert.deepEqual(parent, [], 'a ../ path escaping the project is out of scope'); | |
| + | } finally { | |
| + | rmSync(dir, { recursive: true, force: true }); | |
| + | } | |
| + | }); | |
| + | ||
| + | test('hallucinations: relative missing paths inside the project are flagged', () => { | |
| + | const dir = tempProject(); | |
| + | try { | |
| + | const mk = (text) => ({ nodes: [{ id: 'n1', kind: 'root', status: 'accepted', parent: null, text, title: 't', actions: [] }] }); | |
| + | assert.ok(detectHallucinations(mk('open src/missing.js'), dir).hallucinations.some((h) => h.reference === 'src/missing.js'), 'bare missing path should be flagged'); | |
| + | assert.ok(detectHallucinations(mk('open ./src/missing.js'), dir).hallucinations.some((h) => h.reference === './src/missing.js'), './ missing path should be flagged'); | |
| + | assert.ok(!detectHallucinations(mk('open src/real.js'), dir).hallucinations.some((h) => h.reference.includes('real.js')), 'real file must not be flagged'); | |
| + | } finally { | |
| + | rmSync(dir, { recursive: true, force: true }); | |
| + | } | |
| + | }); | |
| + | ||
| + | test('hallucinations: an Edit to a nonexistent file is flagged, a Write to a new file is not', () => { | |
| + | const dir = tempProject(); | |
| + | try { | |
| + | const edit = { nodes: [{ id: 'n1', kind: 'root', status: 'accepted', parent: null, text: 'edit src/ghost.js', title: 't', actions: [{ tool: 'Edit', file: 'src/ghost.js', input: 'x', command: null }] }] }; | |
| + | assert.ok(detectHallucinations(edit, dir).hallucinations.some((h) => h.reference === 'src/ghost.js'), 'Edit to a nonexistent file should still be flagged'); | |
| + | const write = { nodes: [{ id: 'n1', kind: 'root', status: 'accepted', parent: null, text: 'create src/created.js', title: 't', actions: [{ tool: 'Write', file: 'src/created.js', input: 'x', command: null }] }] }; | |
| + | assert.ok(!detectHallucinations(write, dir).hallucinations.some((h) => h.reference === 'src/created.js'), 'Write to a new file should be suppressed'); | |
| + | } finally { | |
| + | rmSync(dir, { recursive: true, force: true }); | |
| + | } | |
| + | }); | |
| + | ||
| + | test('discover: a recorded cwd that mismatches the project dir excludes a colliding session', () => { | |
| + | const dir = mkdtempSync(join(tmpdir(), 'treetrace-cwd-')); | |
| + | try { | |
| + | const matching = join(dir, 'match.jsonl'); | |
| + | writeFileSync(matching, JSON.stringify({ type: 'user', cwd: dir, uuid: 'u1' }) + '\n'); | |
| + | assert.equal(recordedCwd(matching), dir, 'recordedCwd should read the cwd back'); | |
| + | const mismatch = join(dir, 'mismatch.jsonl'); | |
| + | writeFileSync(mismatch, JSON.stringify({ type: 'user', cwd: '/some/other/project', uuid: 'u1' }) + '\n'); | |
| + | assert.equal(recordedCwd(mismatch), '/some/other/project', 'recordedCwd should read a foreign cwd'); | |
| + | } finally { | |
| + | rmSync(dir, { recursive: true, force: true }); | |
| + | } | |
| + | }); | |
| + | ||
| + | test('security report: risky-command variants are detected', () => { | |
| + | for (const cmd of ['rm -fr build', 'rm -r -f build', 'chmod -R 777 dir', 'chmod 0777 file', 'curl https://x | sudo bash', 'curl https://x | zsh', 'bash <(curl https://x)', 'drop schema public cascade', 'TRUNCATE users']) { | |
| + | assert.ok(isRiskyCommand(cmd), `risky command missed: ${cmd}`); | |
| + | } | |
| + | for (const benign of ['rm file.txt', 'chmod 644 file', 'ls -la', 'curl https://x > out.txt']) { | |
| + | assert.ok(!isRiskyCommand(benign), `benign command over-flagged: ${benign}`); | |
| + | } | |
| + | }); | |
| + | ||
| + | test('security report: test-disable APIs and phrasing are detected', () => { | |
| + | for (const t of ['test.skip("x")', 'describe.skip("x")', 'it.skip("x")', 'xit("x")', 'skip e2e suite', 'remove the auth spec']) { | |
| + | assert.ok(mentionsTestSkip(t), `test-disable missed: ${t}`); | |
| + | } | |
| + | for (const benign of ['run all the tests', 'add a test for login']) { | |
| + | assert.ok(!mentionsTestSkip(benign), `benign test phrasing over-flagged: ${benign}`); | |
| + | } | |
| + | }); | |
| + | ||
| + | test('cli: value-taking options reject a missing value or a flag-shaped value', () => { | |
| + | for (const args of [['--dir'], ['--out', '--redact-auto'], ['--report-file', '--quiet'], ['--from'], ['--since']]) { | |
| + | assert.throws(() => parseArgs(args), /requires a value|requires at least|expects a date|unknown --from/, `expected ${JSON.stringify(args)} to throw`); | |
| + | } | |
| + | }); | |
| + | ||
| + | test('cli: --since requires a real date and rejects garbage', () => { | |
| + | assert.throws(() => parseArgs(['--since', 'not-a-date']), /expects a date/); | |
| + | assert.doesNotThrow(() => parseArgs(['--since', '2026-06-01'])); | |
| + | }); | |
| + | ||
| + | test('cli: --stdin --from claude is rejected', () => { | |
| + | assert.throws(() => parseArgs(['--stdin', '--from', 'claude']), /cannot be combined with --from claude/); | |
| + | }); | |
| + |