| @@ -1,5 +1,30 @@ | ||
| # Oversight CHANGELOG | ||
| + | ## v0.4.7 - 2026-04-22 Registry federation hardening and conformance harness | |
| + | ||
| + | Federation stops being aspirational when a second operator can prove | |
| + | compatibility. v0.4.7 hardens the registry v1 interop spec against the | |
| + | reference implementation and ships a conformance harness that any | |
| + | operator can point at their deployment. | |
| + | ||
| + | - `docs/spec/registry-v1.md`: expanded with the canonicalization algorithm | |
| + | (`json.dumps(sort_keys=True, separators=(",", ":"))` over UTF-8), the | |
| + | uniform error envelope and `code` vocabulary, a full endpoint table | |
| + | including the normative beacon paths (`/p/{token_id}.png`, `/r/{token_id}`, | |
| + | `/v/{token_id}`), the `/.well-known/oversight-registry` shape, the | |
| + | `/evidence/{file_id}` bundle fields, and the `/tlog/head|proof|range` | |
| + | endpoints federated verifiers rely on. Removed a phantom | |
| + | `/query/{file_id}` endpoint that was in the draft but never shipped. | |
| + | - `tests/test_registry_conformance.py`: 32-check harness with two modes: | |
| + | in-process against a FastAPI `TestClient` for CI, or against a live URL | |
| + | when `OVERSIGHT_REGISTRY_URL` is set. Covers identity, liveness, a full | |
| + | signed-manifest registration round trip, attribution by token id, | |
| + | evidence bundle shape, transparency-log head, every beacon endpoint, | |
| + | and DNS event authentication. | |
| + | - `docs/ROADMAP.md`: the registry federation item references the harness | |
| + | as the acceptance gate for federation. | |
| + | - Version bumped to `0.4.7`. No breaking changes. | |
| + | ||
| ## v0.4.6 - 2026-04-22 SIEM export: Splunk, Sentinel, and Elastic | ||
| Registry beacon events can now be emitted in three SIEM-native formats so |
| @@ -109,6 +109,23 @@ The attribute command runs a 5-phase pipeline: | ||
| 4. **Multi-layer Bayesian fusion** combining all evidence into ranked candidates | ||
| 5. **Content fingerprint comparison** (winnowing + sentence hashing) as a last resort when all watermarks are stripped | ||
| + | ## What's new in v0.4.7 | |
| + | ||
| + | **Registry federation hardening.** `docs/spec/registry-v1.md` now | |
| + | specifies the canonicalization algorithm, the uniform error envelope | |
| + | and code vocabulary, the full endpoint list including the normative | |
| + | beacon paths, the `/.well-known/oversight-registry` shape, and the | |
| + | `/evidence` bundle fields. The spec matches what the reference | |
| + | registry actually serves, so an independent implementation can target | |
| + | something real instead of something aspirational. | |
| + | ||
| + | **Conformance harness.** `tests/test_registry_conformance.py` is a | |
| + | 32-check test that runs either against the reference registry | |
| + | in-process (CI) or against any live URL | |
| + | (`OVERSIGHT_REGISTRY_URL=https://registry.example.org python3 | |
| + | tests/test_registry_conformance.py`). An independent operator who | |
| + | passes the harness can claim v1 compatibility. | |
| + | ||
| ## What's new in v0.4.6 | ||
| **SIEM export.** Registry beacon events can now be emitted in three |
| @@ -9,7 +9,7 @@ The launch plan is now gated on product usability and threat-model honesty: | ||
| 3. **Outlook add-in only** for the first ecosystem integration. Defer Drive, Box, SharePoint, and Teams plugins until there is a maintainer or design partner paying for them. | ||
| 4. **SIEM integration before SOC 2**: prioritize Splunk HEC, Microsoft Sentinel, and Elastic Common Schema exports because they are fast and high enterprise ROI. *Formatters, the `oversight siem export` CLI, and the operator guide shipped in v0.4.6; see `docs/SIEM.md`.* | ||
| 5. **SOC 2 Type 1 scoping** is realistic after a design partner. ISO 27001 comes after SOC 2. **FedRAMP is dropped from near-term planning**; it is a multi-year commercial program requiring sponsor-agency backing. | ||
| - | 6. **Registry federation**: publish and harden `docs/spec/registry-v1.md` during the Rust Axum/SQLx registry work so a second operator can run a compatible registry. | |
| + | 6. **Registry federation**: publish and harden `docs/spec/registry-v1.md` during the Rust Axum/SQLx registry work so a second operator can run a compatible registry. *Spec hardened and a conformance harness at `tests/test_registry_conformance.py` landed in v0.4.7; an operator runs it with `OVERSIGHT_REGISTRY_URL=<url> python3 tests/test_registry_conformance.py` to claim v1 compatibility.* | |
| Correct public-launch sequence: | ||
| @@ -1,45 +1,141 @@ | ||
| # Oversight Registry v1 Interop Draft | ||
| - | Status: draft; wire format is not stable until v1.0. | |
| - | ||
| - | This document defines the minimum interoperable registry surface for an | |
| - | independent Oversight registry operator. It follows OpenAPI 3.1 conventions for | |
| - | schema shape and keeps Oversight-specific policy out of the transport where | |
| - | possible. | |
| + | Status: draft; the wire format is not stable until Oversight v1.0. This | |
| + | document tracks the surface a second operator needs to implement to run | |
| + | a registry that the Python and Rust reference clients can treat as | |
| + | interchangeable with the origin deployment. | |
| ## Goals | ||
| - | - Let more than one operator run a compatible attribution registry. | |
| - | - Preserve issuer-signed manifest authority: request sidecars MUST match the | |
| - | manifest's signed `beacons` and `watermarks` arrays. | |
| - | - Keep beacon callbacks passive and authenticated between DNS/web beacon | |
| - | collectors and the registry. | |
| - | - Preserve local or public transparency-log evidence for every registration | |
| - | and event. | |
| + | - Let more than one operator run a compatible attribution registry so | |
| + | "open protocol" is a property of the code and not of a hostname. | |
| + | - Preserve issuer-signed manifest authority: every registration sidecar | |
| + | MUST match the manifest's signed `beacons` and `watermarks` arrays | |
| + | byte for byte. | |
| + | - Keep beacon callbacks authenticated between DNS or web beacon | |
| + | collectors and the registry so spoofed events cannot pollute the | |
| + | attribution record. | |
| + | - Preserve local or public transparency-log evidence for every | |
| + | registration and every event, and expose proofs that a federated | |
| + | verifier can fetch without trusting the operator. | |
| ## Common Requirements | ||
| - | - All JSON request bodies SHOULD be UTF-8 encoded. | |
| - | - Registries MUST reject unknown oversized identifiers. The reference limit is | |
| - | 256 bytes for `file_id`, `mark_id`, `token_id`, `recipient_id`, and | |
| - | `issuer_id`. | |
| - | - Registries MUST verify the Ed25519 signature on the manifest before writing | |
| - | beacons, watermarks, corpus hashes, Rekor entries, or tlog events. | |
| - | - Registries MUST NOT accept beacon or watermark sidecars that differ from the | |
| - | issuer-signed manifest copies. | |
| - | - DNS event callbacks from non-loopback clients MUST authenticate with | |
| - | `X-Oversight-DNS-Secret` or an equivalent deployment-specific channel. | |
| + | ### Transport | |
| + | ||
| + | - All request and response bodies are JSON unless a specific endpoint | |
| + | says otherwise. Content-Type MUST be `application/json; charset=utf-8` | |
| + | for request bodies that carry one. | |
| + | - Registries MUST reject identifiers larger than 256 bytes for each of | |
| + | `file_id`, `mark_id`, `token_id`, `recipient_id`, and `issuer_id`. | |
| + | - Registries SHOULD apply a per-client rate limit and return HTTP 429 | |
| + | with the standard error envelope when exceeded. | |
| + | ||
| + | ### Canonicalization | |
| + | ||
| + | The manifest signature is computed over a canonical JSON serialization | |
| + | with the following exact rules. Implementations that deviate cannot | |
| + | verify manifests produced by the reference client. | |
| + | ||
| + | 1. Remove the `signature_ed25519` field from the manifest dictionary. | |
| + | The signer re-attaches it to the signed object before transmission. | |
| + | 2. Serialize the remaining dictionary with recursively sorted keys. | |
| + | 3. Use the separators `","` and `":"` with no whitespace. | |
| + | 4. Encode the resulting string as UTF-8 before feeding it to the | |
| + | Ed25519 signer or verifier. | |
| + | ||
| + | In Python the canonical form matches | |
| + | `json.dumps(manifest, sort_keys=True, separators=(",", ":")).encode("utf-8")`. | |
| + | In Rust the reference implementation uses the `canonical_json` crate | |
| + | with identical output. The cross-language conformance suite pins this. | |
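The canonicalization rules above can be sketched in a few lines of standard-library Python. This is a minimal illustration, not the reference implementation: the function name `canonical_manifest_bytes` and the example manifest are hypothetical. One detail that may matter to non-Python implementers is that `json.dumps` defaults to `ensure_ascii=True`, so non-ASCII characters come out as `\uXXXX` escapes and the UTF-8 encoding step then operates on pure ASCII.

```python
import json


def canonical_manifest_bytes(manifest: dict) -> bytes:
    """Return the bytes that the manifest signature is computed over.

    Hypothetical helper; it follows the rules stated in the spec text.
    """
    # The signature field is stripped before canonicalization.
    unsigned = {k: v for k, v in manifest.items() if k != "signature_ed25519"}
    # sort_keys=True also sorts nested dictionaries; the separators drop all
    # optional whitespace. ensure_ascii is left at its default (True).
    text = json.dumps(unsigned, sort_keys=True, separators=(",", ":"))
    return text.encode("utf-8")


# Illustrative manifest fragment; real manifests carry more fields.
example = {
    "file_id": "abc",
    "beacons": [{"token_id": "t1", "kind": "dns"}],
    "signature_ed25519": "00" * 64,
}
print(canonical_manifest_bytes(example))
# b'{"beacons":[{"kind":"dns","token_id":"t1"}],"file_id":"abc"}'
```

An implementation in another language would need to reproduce the same key ordering, separators, and escaping to arrive at identical bytes.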
| + | ||
| + | ### Signature verification | |
| + | ||
| + | - Registries MUST verify `manifest.signature_ed25519` before writing | |
| + | any beacon, watermark, corpus hash, Rekor entry, or transparency-log | |
| + | event. | |
| + | - Registries MUST NOT accept beacon or watermark sidecars that differ | |
| + | from the manifest's signed arrays. Each item is canonicalized, both | |
| + | lists are sorted by canonical bytes, and the sorted lists must be equal. | |
| + | - Re-registration under the same `file_id` MUST require the same | |
| + | `issuer_ed25519_pub` as the original record. A mismatch returns | |
| + | HTTP 409. | |
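The sidecar comparison and the re-registration rule can be sketched as two small predicates. This is one reading of the comparison rule, under the assumption that each item is canonicalized on its own and that list order is not significant; the names `sidecar_matches` and `issuer_may_reregister` are hypothetical and do not come from the reference registry.

```python
from __future__ import annotations

import json
from typing import Optional


def _canon(item: dict) -> bytes:
    # Same canonical form the manifest signature uses.
    return json.dumps(item, sort_keys=True, separators=(",", ":")).encode("utf-8")


def sidecar_matches(signed_items: list[dict], sidecar_items: list[dict]) -> bool:
    """True when the request sidecar equals the signed array, ignoring order."""
    return sorted(_canon(i) for i in signed_items) == sorted(
        _canon(i) for i in sidecar_items
    )


def issuer_may_reregister(existing_pub: Optional[str], incoming_pub: str) -> bool:
    """True for a first registration or when the issuer key is unchanged."""
    return existing_pub is None or existing_pub == incoming_pub


signed = [{"token_id": "a", "kind": "dns"}, {"token_id": "b", "kind": "http"}]
reordered = [{"kind": "http", "token_id": "b"}, {"kind": "dns", "token_id": "a"}]
print(sidecar_matches(signed, reordered))                                        # True
print(sidecar_matches(signed, signed + [{"token_id": "sneaky", "kind": "dns"}]))  # False
```

A registry following this sketch would answer a `False` from the first predicate with `sidecar_mismatch` and a `False` from the second with `issuer_mismatch`.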
| + | ||
| + | ### Error envelope | |
| + | ||
| + | Non-2xx responses MUST carry a JSON envelope: | |
| + | ||
| + | ```json | |
| + | {"error": {"code": "signature_invalid", "message": "manifest signature invalid"}} | |
| + | ``` | |
| + | ||
| + | Implementations MAY include additional fields under `error` (for | |
| + | example, `retry_after` on 429), but consumers rely only on `code` | |
| + | and `message`. | |
| + | ||
| + | The defined `code` values in v1: | |
| + | ||
| + | | Code | HTTP | When | | |
| + | |------|------|------| | |
| + | | `missing_field` | 400 | A required field is absent | | |
| + | | `signature_invalid` | 400 | Manifest Ed25519 verification failed | | |
| + | | `sidecar_mismatch` | 400 | Request beacons or watermarks differ from the signed manifest | | |
| + | | `issuer_mismatch` | 409 | `file_id` already registered under a different issuer pubkey | | |
| + | | `auth_required` | 401 | DNS event callback missing required secret | | |
| + | | `rate_limited` | 429 | Client exceeded the per-client rate limit | |
| + | | `not_found` | 404 | Queried record does not exist | | |
| + | | `server_error` | 500 | Registry internal failure | | |
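One way to keep the envelope and the status codes consistent is to drive both from a single lookup table. The sketch below mirrors the `code` table above; the helper name `error_response` is hypothetical and the reference registry may build its responses differently.

```python
from __future__ import annotations

# HTTP status for each v1 error code, copied from the table in the spec.
ERROR_STATUS = {
    "missing_field": 400,
    "signature_invalid": 400,
    "sidecar_mismatch": 400,
    "issuer_mismatch": 409,
    "auth_required": 401,
    "rate_limited": 429,
    "not_found": 404,
    "server_error": 500,
}


def error_response(code: str, message: str, **extra) -> tuple[int, dict]:
    """Return (http_status, json_body) for a v1 error code.

    Extra keyword arguments land under "error", which the spec permits.
    """
    if code not in ERROR_STATUS:
        raise ValueError(f"unknown v1 error code: {code!r}")
    return ERROR_STATUS[code], {"error": {"code": code, "message": message, **extra}}


status, body = error_response("rate_limited", "slow down", retry_after=30)
print(status, body)
```

Consumers that follow the spec read only `code` and `message`, so extra keys such as `retry_after` remain safe to add.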
| ## Endpoints | ||
| | Method | Path | Purpose | | ||
| |--------|------|---------| | ||
| - | | `GET` | `/health` | Service health and tlog size | | |
| + | | `GET` | `/health` | Liveness and local tlog size | | |
| + | | `GET` | `/.well-known/oversight-registry` | Registry identity advertisement | | |
| | `POST` | `/register` | Register signed manifest, beacons, watermarks, optional corpus hashes | | ||
| - | | `POST` | `/attribute` | Look up attribution by `token_id`, `mark_id`, or perceptual/content hash | | |
| - | | `GET` | `/query/{file_id}` | Return manifest ownership plus registered beacons/watermarks | | |
| + | | `POST` | `/attribute` | Look up attribution by `token_id`, `mark_id`, or perceptual hash | | |
| | `POST` | `/dns_event` | Authenticated DNS beacon callback | | ||
| - | | `GET` | `/evidence/{file_id}` | Evidence bundle with manifest, events, tlog proofs, and signed tree head | | |
| + | | `GET` | `/evidence/{file_id}` | Evidence bundle with manifest, events, tlog proofs, and signed tree head | | |
| + | | `GET` | `/tlog/head` | Current signed tree head for the local transparency log | | |
| + | | `GET` | `/tlog/proof/{index}` | Inclusion proof for a local tlog entry | | |
| + | | `GET` | `/tlog/range` | Entry range, used by federated verifiers or monitors | | |
| + | | `GET` | `/p/{token_id}.png` | HTTP pixel beacon, records an event | | |
| + | | `GET` | `/r/{token_id}`, `/ocsp/r/{token_id}` | OCSP-shaped beacon, records an event | | |
| + | | `GET` | `/v/{token_id}`, `/lic/v/{token_id}` | License-check beacon, records an event | | |
| + | | `GET` | `/candidates/semantic` | Recent L3 mark IDs for scraper-style verification | | |
| + | ||
| + | ## `/health` | |
| + | ||
| + | ```json | |
| + | {"status": "ok", "service": "oversight-registry", "version": "0.2.1", "tlog_size": 42} | |
| + | ``` | |
| + | ||
| + | `status` is `"ok"` or `"degraded"`. `service` MUST begin with | |
| + | `oversight-registry` so a client can tell it has reached a registry | |
| + | rather than an arbitrary HTTP service. `tlog_size` is the current local | |
| + | transparency-log leaf count. | |
| + | ||
| + | ## `/.well-known/oversight-registry` | |
| + | ||
| + | ```json | |
| + | { | |
| + | "ed25519_pub": "<hex>", | |
| + | "version": "0.2.1", | |
| + | "jurisdiction": "GLOBAL", | |
| + | "tlog_size": 42, | |
| + | "federation": { | |
| + | "spec_version": "v1", | |
| + | "canonicalization": "json-sort-keys-compact-utf8", | |
| + | "rekor_enabled": true | |
| + | } | |
| + | } | |
| + | ``` | |
| + | ||
| + | `ed25519_pub` is the registry's own signing key hex and is the stable | |
| + | identifier a federated verifier uses to tell operators apart. | |
| + | `federation.spec_version` MUST be `"v1"` for registries that implement | |
| + | this document. Unknown `federation.*` fields MUST be ignored by | |
| + | consumers so the shape can extend without breaking older clients. | |
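A consumer-side reading of this document might look like the sketch below. It assumes the key is 64 hex characters (the length of a raw Ed25519 public key in hex) and that a client only keeps the `federation` fields it understands; the function name `parse_registry_identity` is hypothetical.

```python
import string

# Federation fields this hypothetical client understands; others are ignored.
KNOWN_FEDERATION_FIELDS = ("spec_version", "canonicalization", "rekor_enabled")


def parse_registry_identity(doc: dict) -> dict:
    """Validate a /.well-known/oversight-registry document and keep known fields."""
    pub = doc.get("ed25519_pub")
    if not (
        isinstance(pub, str)
        and len(pub) == 64
        and all(c in string.hexdigits for c in pub)
    ):
        raise ValueError("ed25519_pub must be 64 hex characters")
    federation = doc.get("federation") or {}
    if federation.get("spec_version") != "v1":
        raise ValueError("registry does not advertise spec_version v1")
    kept = {k: federation[k] for k in KNOWN_FEDERATION_FIELDS if k in federation}
    return {"ed25519_pub": pub.lower(), "federation": kept}


doc = {
    "ed25519_pub": "AB" * 32,
    "version": "0.2.1",
    "federation": {"spec_version": "v1", "future_field": 1},
}
print(parse_registry_identity(doc))
```

Dropping unknown `federation.*` keys instead of rejecting them is what lets the shape grow without breaking older clients.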
| ## `/register` | ||
| @@ -47,27 +143,29 @@ Request: | ||
| ```json | ||
| { | ||
| - | "manifest": {}, | |
| - | "beacons": [], | |
| - | "watermarks": [], | |
| - | "corpus": { | |
| - | "winnowing": "optional-hash", | |
| - | "sentence": "optional-hash" | |
| - | } | |
| + | "manifest": { "...": "see docs/SPEC.md" }, | |
| + | "beacons": [ { "token_id": "...", "kind": "dns|http|ocsp|license" } ], | |
| + | "watermarks": [ { "mark_id": "...", "layer": "L1_zero_width|L2_whitespace|L3_semantic" } ], | |
| + | "corpus": { "winnowing": "optional-hash", "sentence": "optional-hash" } | |
| } | ||
| ``` | ||
| - | Validation: | |
| + | Validation order: | |
| - | 1. Canonicalize and verify `manifest.signature_ed25519`. | |
| - | 2. Compare `beacons` and `watermarks` against signed manifest arrays. | |
| - | 3. Reject malformed signed artifacts rather than silently dropping rows. | |
| - | 4. Append a registry transparency-log event. | |
| - | 5. If Rekor is enabled and a watermark mark ID exists, attest using | |
| + | 1. `manifest.file_id` MUST be present and fit the 256-byte bound. | |
| + | 2. `manifest.issuer_ed25519_pub` MUST be present. | |
| + | 3. `manifest.signature_ed25519` MUST verify over the canonical bytes | |
| + | (see Canonicalization) under that key. | |
| + | 4. `beacons` and `watermarks` sidecars MUST equal the signed arrays | |
| + | under canonical comparison. | |
| + | 5. Prior registration of the same `file_id` MUST have come from the | |
| + | same `issuer_ed25519_pub`. | |
| + | 6. A transparency-log event is appended before the response is sent. | |
| + | 7. If Rekor attestation is enabled, the registry uses | |
| `subject.name = "mark:<mark_id>"` and | ||
| `subject.digest.sha256 = manifest.content_hash`. | ||
| - | Response: | |
| + | Success response: | |
| ```json | ||
| { | ||
| @@ -75,43 +173,150 @@ Response: | ||
| "file_id": "uuid", | ||
| "registered_beacons": 1, | ||
| "tlog_index": 42, | ||
| - | "rekor": {} | |
| + | "rekor": {"log_url": "...", "log_index": 12345, "log_id": "...", "integrated_time": 1730000000} | |
| } | ||
| ``` | ||
| + | `rekor` is present when public attestation is enabled. Absent or empty | |
| + | `rekor` is not an error. | |
| + | ||
| + | ## `/attribute` | |
| + | ||
| + | The request accepts exactly one of `token_id`, `mark_id` (with optional | |
| + | `layer`), or `perceptual_hash`. A body that populates none of them, or | |
| + | more than one, returns `missing_field`. | |
| + | ||
| + | Success response on a hit: | |
| + | ||
| + | ```json | |
| + | { | |
| + | "found": true, | |
| + | "file_id": "uuid", | |
| + | "recipient_id": "...", | |
| + | "issuer_id": "...", | |
| + | "manifest": { "..." : "..." }, | |
| + | "events": [ { "kind": "dns", "timestamp": 0, "source_ip": "..." } ] | |
| + | } | |
| + | ``` | |
| + | ||
| + | A miss returns `{"found": false}` with HTTP 200, never 404; 404 is reserved | |
| + | for unknown endpoints and record lookups such as `/evidence/{file_id}`. | |
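The exactly-one-of rule for `/attribute` request bodies can be sketched as a small selector. The helper name `select_lookup` is hypothetical, and treating empty strings as unpopulated is an assumption made for this illustration.

```python
from __future__ import annotations

# The three mutually exclusive lookup keys named in the spec.
LOOKUP_KEYS = ("token_id", "mark_id", "perceptual_hash")


def select_lookup(body: dict) -> tuple[str, str]:
    """Return (key, value) for the single populated lookup field.

    Raises ValueError("missing_field") when zero or several are populated.
    `layer` is not a lookup key; it only qualifies a `mark_id` lookup.
    """
    populated = [k for k in LOOKUP_KEYS if body.get(k)]
    if len(populated) != 1:
        raise ValueError("missing_field")
    key = populated[0]
    return key, body[key]


print(select_lookup({"mark_id": "10" * 16, "layer": "L1_zero_width"}))
```

A handler built on this sketch would translate the `ValueError` into the `missing_field` error envelope and would answer a lookup that finds nothing with `{"found": false}` and HTTP 200.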
| + | ||
| ## `/dns_event` | ||
| Request: | ||
| ```json | ||
| { | ||
| - | "token_id": "hex-or-url-safe-token", | |
| + | "token_id": "hex-or-url-safe", | |
| "client_ip": "collector-observed-ip", | ||
| "qtype": "A", | ||
| "qname": "token.beacon.example" | ||
| } | ||
| ``` | ||
| - | Security: | |
| + | Authentication: | |
| + | ||
| + | - Loopback clients are trusted without a secret so a DNS server on | |
| + | the same host can call without extra configuration. | |
| + | - Non-loopback callers MUST send `X-Oversight-DNS-Secret: <secret>` | |
| + | that matches the registry's configured secret. The comparison MUST | |
| + | be constant-time (`hmac.compare_digest` or equivalent). | |
| + | - A registry that has no secret configured MUST refuse non-loopback | |
| + | callers. Silent acceptance of unauthenticated non-loopback events | |
| + | is a conformance failure. | |
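The three authentication rules above reduce to a short decision function. This is a sketch under stated assumptions, not the reference server's code: the function name `dns_event_allowed` is hypothetical, and a caller address that does not parse as an IP literal is treated as non-loopback.

```python
import hmac
import ipaddress
from typing import Optional


def dns_event_allowed(
    client_host: str, presented: Optional[str], configured: Optional[str]
) -> bool:
    """Decide whether a /dns_event caller is authenticated."""
    try:
        if ipaddress.ip_address(client_host).is_loopback:
            return True  # same-host collectors need no secret
    except ValueError:
        pass  # not an IP literal: fall through to the secret check
    if not configured or presented is None:
        # Fail closed: no configured secret means no non-loopback callers.
        return False
    # Constant-time comparison, as the spec requires.
    return hmac.compare_digest(presented.encode("utf-8"), configured.encode("utf-8"))


print(dns_event_allowed("127.0.0.1", None, None))             # True
print(dns_event_allowed("203.0.113.5", None, "s3cret"))       # False
print(dns_event_allowed("203.0.113.5", "s3cret", "s3cret"))   # True
```

In a deployment behind a reverse proxy, `client_host` would need to be the address the registry actually trusts, since a proxy on the same host can make every caller look like loopback.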
| + | ||
| + | Success response: | |
| + | ||
| + | ```json | |
| + | {"ok": true, "tlog_index": 42} | |
| + | ``` | |
| + | ||
| + | ## `/evidence/{file_id}` | |
| + | ||
| + | Evidence bundles carry everything a recipient or auditor needs to | |
| + | verify attribution without trusting the registry operator. The reference | |
| + | shape is flat so a verifier can pull each artifact with a single JSON | |
| + | dereference. | |
| - | - Public/non-loopback callbacks MUST include `X-Oversight-DNS-Secret`. | |
| - | - Registries SHOULD prefer collector-observed source metadata over | |
| - | user-controlled body fields when available. | |
| - | - Events SHOULD be appended to the local transparency log and included in | |
| - | evidence bundles. | |
| + | Required top-level fields: | |
| - | ## Evidence Bundle | |
| + | - `file_id`: echoes the path parameter | |
| + | - `bundle_generated_at`: registry clock timestamp, for context | |
| + | - `registry_pub`: the registry's Ed25519 public key hex, matching | |
| + | `/.well-known/oversight-registry` | |
| + | - `manifest`: the signed manifest object (signature still attached) | |
| + | - `beacons`: registered beacon rows for this file | |
| + | - `watermarks`: registered watermark rows for this file | |
| + | - `events`: registry event rows for this file, ordered by timestamp | |
| + | - `tlog_head`: the current signed tree head; when the registry has no | |
| + | transparency log configured, this field is `null` | |
| + | - `tlog_proofs`: array of inclusion proofs for the rows in `events` | |
| + | that have a `tlog_index`; each proof carries `event_row`, | |
| + | `tlog_index`, and `inclusion` | |
| - | Evidence bundles SHOULD contain: | |
| + | Optional fields: | |
| - | - manifest JSON and signature | |
| - | - registry event rows | |
| - | - local tlog signed tree head | |
| - | - inclusion proof for every bundled tlog event | |
| - | - Rekor DSSE bundle, if public transparency was requested | |
| + | - `rekor`: the sigstore-compatible DSSE bundle when public attestation | |
| + | is enabled; `bundle_schema` MUST be `2` | |
| + | - `disclaimer`: a human-readable note about the bundle's legal posture | |
| + | - `bundle_signature_ed25519`: registry signature over the canonical | |
| + | bundle bytes, present on all conforming responses | |
| - | ## Federation Notes | |
| + | Unknown `file_id` returns HTTP 404 with the standard error envelope. | |
| - | The wire format MUST NOT require the official `oversightprotocol.dev` domain. | |
| - | Operators may run their own registry and beacon domains as long as manifests | |
| + | ## `/tlog/head`, `/tlog/proof/{index}`, `/tlog/range` | |
| + | ||
| + | These expose the local transparency log so a federated verifier can | |
| + | monitor it without relying on the registry's own query responses. | |
| + | The signed tree head MUST be Ed25519-signed by the registry identity | |
| + | key advertised at `/.well-known/oversight-registry`. | |
| + | ||
| + | ## Beacon endpoints | |
| + | ||
| + | Beacon paths are normative because manifests embed URLs that follow | |
| + | these shapes and the Python and Rust clients assemble them the same | |
| + | way. | |
| + | ||
| + | | Path | Kind stored in `events` | | |
| + | |------|------------------------| | |
| + | | `GET /p/{token_id}.png` | `http_img` | | |
| + | | `GET /r/{token_id}`, `GET /ocsp/r/{token_id}` | `ocsp` | | |
| + | | `GET /v/{token_id}`, `GET /lic/v/{token_id}` | `license` | | |
| + | ||
| + | Responses MUST return 200 for well-formed token IDs so resolvers and | |
| + | document viewers do not retry. The pixel endpoint returns a 1x1 PNG; | |
| + | the OCSP endpoint returns an empty 200; the license endpoint returns | |
| + | `{"valid": true}`. | |
| + | ||
| + | ## Federation notes | |
| + | ||
| + | The wire format MUST NOT require the official `oversightprotocol.dev` | |
| + | domain. Operators run their own registry and beacon domains; manifests | |
| declare the registry URL and beacon descriptors unambiguously. | ||
| + | ||
| + | Operators SHOULD: | |
| + | ||
| + | - Publish `/.well-known/oversight-registry` on HTTPS. | |
| + | - Serve a stable `ed25519_pub`. Rotating this key breaks the chain | |
| + | of evidence for already-registered files. | |
| + | - Run with Rekor attestation enabled so the public log is the root of | |
| + | trust for federated verifiers. | |
| + | ||
| + | ## Conformance | |
| + | ||
| + | The repository ships a conformance harness at | |
| + | `tests/test_registry_conformance.py` that exercises every endpoint in | |
| + | this document against a registry URL. The harness is the canonical | |
| + | test of whether an independent implementation is compatible. Operators | |
| + | run it with: | |
| + | ||
| + | ``` | |
| + | OVERSIGHT_REGISTRY_URL=https://registry.example.org \ | |
| + | python3 tests/test_registry_conformance.py | |
| + | ``` | |
| + | ||
| + | The harness uses a throwaway issuer identity, posts a minimal valid | |
| + | manifest, and then validates the responses. Runs against the local | |
| + | reference registry are included in CI; operator-hosted runs are the | |
| + | interop acceptance gate for federation. |
| @@ -31,4 +31,4 @@ __all__ = [ | ||
| "l3_policy", | ||
| ] | ||
| - | __version__ = "0.4.6" | |
| + | __version__ = "0.4.7" |
| @@ -4,7 +4,7 @@ build-backend = "setuptools.build_meta" | ||
| [project] | ||
| name = "oversight-protocol" | ||
| - | version = "0.4.6" | |
| + | version = "0.4.7" | |
| description = "Open protocol for cryptographic data provenance, recipient attribution, and leak detection." | ||
| readme = "README.md" | ||
| license = {text = "Apache-2.0"} |
| @@ -0,0 +1,345 @@ | ||
| + | #!/usr/bin/env python3 | |
| + | """Registry v1 federation conformance harness. | |
| + | ||
| + | Exercises every endpoint in ``docs/spec/registry-v1.md`` against a | |
| + | running registry. Two modes: | |
| + | ||
| + | - **In-process.** With no ``OVERSIGHT_REGISTRY_URL`` environment | |
| + | variable, the harness stands the reference Python registry up inside | |
| + | a FastAPI ``TestClient`` against a fresh SQLite database in a temp | |
| + | directory and runs every check there. This is the CI path. | |
| + | ||
| + | - **Live operator URL.** When ``OVERSIGHT_REGISTRY_URL`` is set, the | |
| + | harness points an ``httpx.Client`` at that URL and runs the same | |
| + | checks. This is the acceptance gate an independent operator uses to | |
| + | claim v1 conformance. | |
| + | ||
| + | The script fails loudly on any divergence from the spec. Each check | |
| + | has a short name so a run log is a compact conformance report. | |
| + | """ | |
| + | ||
| + | from __future__ import annotations | |
| + | ||
| + | import base64 | |
| + | import json | |
| + | import os | |
| + | import shutil | |
| + | import sys | |
| + | import tempfile | |
| + | import time | |
| + | import uuid | |
| + | from dataclasses import asdict | |
| + | from pathlib import Path | |
| + | from typing import Any, Optional | |
| + | ||
| + | ROOT = Path(__file__).resolve().parent.parent | |
| + | sys.path.insert(0, str(ROOT)) | |
| + | ||
| + | from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey | |
| + | from cryptography.hazmat.primitives.asymmetric.x25519 import X25519PrivateKey | |
| + | from cryptography.hazmat.primitives import serialization | |
| + | ||
| + | from oversight_core.manifest import Manifest, Recipient, WatermarkRef | |
| + | ||
| + | ||
| + | PASS = "[PASS]" | |
| + | FAIL = "[FAIL]" | |
| + | PASSED: list[str] = [] | |
| + | FAILED: list[tuple[str, str]] = [] | |
| + | ||
| + | ||
| + | def check(name: str, condition: bool, detail: str = "") -> None: | |
| + | if condition: | |
| + | PASSED.append(name) | |
| + | print(f" {PASS} {name}") | |
| + | else: | |
| + | FAILED.append((name, detail)) | |
| + | print(f" {FAIL} {name} ({detail})") | |
| + | ||
| + | ||
| + | # ---- Client abstraction ----------------------------------------------------- | |
| + | ||
| + | ||
| + | class Client: | |
| + | """Thin wrapper that presents the same get/post surface over a | |
| + | FastAPI TestClient or a live httpx.Client.""" | |
| + | ||
| + | def __init__(self, impl, base_url: str = ""): | |
| + | self._impl = impl | |
| + | self._base = base_url.rstrip("/") | |
| + | ||
| + | def get(self, path: str, **kwargs): | |
| + | return self._impl.get(self._base + path, **kwargs) | |
| + | ||
| + | def post(self, path: str, **kwargs): | |
| + | return self._impl.post(self._base + path, **kwargs) | |
| + | ||
| + | ||
| + | def build_in_process_client(): | |
| + | """Spin up the reference registry in a fresh temp data dir.""" | |
| + | from fastapi.testclient import TestClient | |
| + | ||
| + | tmp = tempfile.mkdtemp(prefix="oversight-conformance-") | |
| + | os.environ["OVERSIGHT_DATA_DIR"] = tmp | |
| + | # Rekor off by default so the harness does not touch the public log. | |
| + | os.environ.setdefault("OVERSIGHT_REKOR_ENABLED", "0") | |
| + | # Require the DNS secret to exercise the non-loopback fail-closed path. | |
| + | os.environ["OVERSIGHT_DNS_EVENT_SECRET"] = "test-dns-secret-123" | |
| + | ||
| + | # Reset any previously-imported registry state. | |
| + | for mod in [m for m in list(sys.modules) if m.startswith("registry.")]: | |
| + | del sys.modules[mod] | |
| + | ||
| + | import registry.server as server | |
| + | server.DATA_DIR = Path(tmp) | |
| + | server.DB_PATH = Path(tmp) / "registry.sqlite" | |
| + | server.TLOG_DIR = Path(tmp) / "tlog" | |
| + | server.IDENTITY_PATH = Path(tmp) / "identity.json" | |
| + | server.DNS_EVENT_SECRET = "test-dns-secret-123" | |
| + | server.IDENTITY = server.load_or_create_identity() | |
| + | server.init_db() | |
| + | from oversight_core.tlog import TransparencyLog | |
| + | server.TLOG = TransparencyLog(server.TLOG_DIR, signing_key_hex=server.IDENTITY["ed25519_priv"]) | |
| + | ||
| + | tc = TestClient(server.app) | |
| + | return Client(tc), tmp, server.IDENTITY["ed25519_pub"] | |
| + | ||
| + | ||
| + | def build_live_client(url: str): | |
| + | import httpx | |
| + | return Client(httpx.Client(timeout=15.0), base_url=url), None, None | |
| + | ||
| + | ||
| + | # ---- Manifest fixture -------------------------------------------------------- | |
| + | ||
| + | ||
| + | def build_signed_manifest() -> tuple[dict, list[dict], list[dict], bytes]: | |
| + | """Return (manifest_dict, beacons, watermarks, issuer_priv_raw).""" | |
| + | issuer_sk = Ed25519PrivateKey.generate() | |
| + | issuer_pub_hex = ( | |
| + | issuer_sk.public_key() | |
| + | .public_bytes( | |
| + | encoding=serialization.Encoding.Raw, | |
| + | format=serialization.PublicFormat.Raw, | |
| + | ) | |
| + | .hex() | |
| + | ) | |
| + | issuer_priv_raw = issuer_sk.private_bytes( | |
| + | encoding=serialization.Encoding.Raw, | |
| + | format=serialization.PrivateFormat.Raw, | |
| + | encryption_algorithm=serialization.NoEncryption(), | |
| + | ) | |
| + | ||
| + | recipient_x25519 = X25519PrivateKey.generate().public_key().public_bytes( | |
| + | encoding=serialization.Encoding.Raw, | |
| + | format=serialization.PublicFormat.Raw, | |
| + | ).hex() | |
| + | ||
| + | recipient = Recipient( | |
| + | recipient_id="conformance-recipient", | |
| + | x25519_pub=recipient_x25519, | |
| + | ) | |
| + | beacons = [ | |
| + | {"token_id": uuid.uuid4().hex, "kind": "dns"}, | |
| + | {"token_id": uuid.uuid4().hex, "kind": "http"}, | |
| + | ] | |
| + | watermarks = [ | |
| + | WatermarkRef(layer="L1_zero_width", mark_id="10" * 16), | |
| + | WatermarkRef(layer="L2_whitespace", mark_id="20" * 16), | |
| + | ] | |
| + | ||
| + | m = Manifest.new( | |
| + | original_filename="conformance.txt", | |
| + | content_hash="ab" * 32, | |
| + | size_bytes=4096, | |
| + | issuer_id="conformance-issuer", | |
| + | issuer_ed25519_pub_hex=issuer_pub_hex, | |
| + | recipient=recipient, | |
| + | registry_url="https://registry.example.org", | |
| + | ) | |
| + | m.beacons = list(beacons) | |
| + | m.watermarks = list(watermarks) | |
| + | m.sign(issuer_priv_raw) | |
| + | ||
| + | manifest_dict = json.loads(m.to_json().decode("utf-8")) | |
| + | sidecar_beacons = list(beacons) | |
| + | sidecar_watermarks = [asdict(w) for w in watermarks] | |
| + | return manifest_dict, sidecar_beacons, sidecar_watermarks, issuer_priv_raw | |
| + | ||
| + | ||
| + | # ---- Individual checks ------------------------------------------------------- | |
| + | ||
| + | ||
| + | def check_health(cli: Client) -> None: | |
| + | r = cli.get("/health") | |
| + | check("health-200", r.status_code == 200, f"status={r.status_code}") | |
| + | body = r.json() if r.status_code == 200 else {} | |
| + | check("health-has-status", body.get("status") in {"ok", "degraded"}, | |
| + | f"status={body.get('status')!r}") | |
| + | check("health-service-prefix", | |
| + | str(body.get("service", "")).startswith("oversight-registry"), | |
| + | f"service={body.get('service')!r}") | |
| + | check("health-tlog-size-int", isinstance(body.get("tlog_size"), int)) | |
| + | ||
| + | ||
| + | def check_well_known(cli: Client) -> None: | |
| + | r = cli.get("/.well-known/oversight-registry") | |
| + | check("well-known-200", r.status_code == 200, f"status={r.status_code}") | |
| + | body = r.json() if r.status_code == 200 else {} | |
| + | pub = body.get("ed25519_pub") | |
| + | check("well-known-ed25519-hex", | |
| + | isinstance(pub, str) and len(pub) == 64 and all(c in "0123456789abcdef" for c in pub.lower()), | |
| + | f"ed25519_pub={pub!r}") | |
| + | check("well-known-has-version", isinstance(body.get("version"), str)) | |
| + | ||
| + | ||
| + | def check_register_roundtrip(cli: Client, manifest: dict, beacons: list, watermarks: list) -> Optional[str]: | |
| + | body = {"manifest": manifest, "beacons": beacons, "watermarks": watermarks} | |
| + | r = cli.post("/register", json=body) | |
| + | check("register-200", r.status_code == 200, f"status={r.status_code} body={r.text[:200]}") | |
| + | if r.status_code != 200: | |
| + | return None | |
| + | out = r.json() | |
| + | check("register-ok-true", out.get("ok") is True) | |
| + | check("register-file-id-echo", out.get("file_id") == manifest["file_id"]) | |
| + | check("register-count", out.get("registered_beacons") == len(beacons)) | |
| + | check("register-tlog-index-int", isinstance(out.get("tlog_index"), int)) | |
| + | return out.get("file_id") | |
| + | ||
| + | ||
| + | def check_register_rejects_unsigned(cli: Client, manifest: dict, beacons: list, watermarks: list) -> None: | |
| + |     tampered = dict(manifest) | |
| + |     tampered["signature_ed25519"] = "00" * 64  # 64 zero bytes: right length, invalid signature | |
| + |     tampered["file_id"] = str(uuid.uuid4()) | |
| + |     r = cli.post("/register", json={"manifest": tampered, "beacons": beacons, "watermarks": watermarks}) | |
| + |     check("register-rejects-bad-sig", r.status_code == 400, f"status={r.status_code}") | |
| + | ||
| + | ||
| + | def check_register_rejects_sidecar_mismatch(cli: Client, manifest: dict, beacons: list, watermarks: list) -> None: | |
| + |     bad = list(beacons) + [{"token_id": "sneaky", "kind": "dns"}] | |
| + |     r = cli.post("/register", json={"manifest": manifest, "beacons": bad, "watermarks": watermarks}) | |
| + |     check("register-rejects-sidecar-mismatch", r.status_code == 400, f"status={r.status_code}") | |
| + | ||
| + | ||
| + | def check_attribute_by_token(cli: Client, beacons: list) -> None: | |
| + |     r = cli.post("/attribute", json={"token_id": beacons[0]["token_id"]}) | |
| + |     check("attribute-200", r.status_code == 200, f"status={r.status_code}") | |
| + |     body = r.json() if r.status_code == 200 else {} | |
| + |     check("attribute-found", body.get("found") is True) | |
| + | ||
| + | ||
| + | def check_attribute_miss(cli: Client) -> None: | |
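| + |     r = cli.post("/attribute", json={"token_id": "nonexistent-token-id"}) | |
| + |     check("attribute-miss-200", r.status_code == 200, f"status={r.status_code}") | |
| + |     # Guard the JSON decode so a non-200 response is recorded as a failed | |
| + |     # check instead of raising and aborting the run. | |
| + |     body = r.json() if r.status_code == 200 else {} | |
| + |     check("attribute-miss-found-false", body.get("found") is False) | |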
| + | ||
| + | ||
| + | def check_evidence(cli: Client, file_id: str) -> None: | |
| + |     r = cli.get(f"/evidence/{file_id}") | |
| + |     check("evidence-200", r.status_code == 200, f"status={r.status_code}") | |
| + |     body = r.json() if r.status_code == 200 else {} | |
| + |     check("evidence-has-manifest", isinstance(body.get("manifest"), dict)) | |
| + |     check("evidence-has-events", isinstance(body.get("events"), list)) | |
| + |     check("evidence-has-beacons", isinstance(body.get("beacons"), list)) | |
| + |     check("evidence-has-watermarks", isinstance(body.get("watermarks"), list)) | |
| + |     check("evidence-has-registry-pub", isinstance(body.get("registry_pub"), str)) | |
| + |     check("evidence-has-tlog-head", | |
| + |           "tlog_head" in body, | |
| + |           f"keys={list(body)[:10]}") | |
| + |     check("evidence-has-tlog-proofs", | |
| + |           isinstance(body.get("tlog_proofs"), list)) | |
| + |     check("evidence-has-bundle-signature", | |
| + |           isinstance(body.get("bundle_signature_ed25519"), str)) | |
| + | ||
| + | ||
| + | def check_tlog_head(cli: Client) -> None: | |
| + |     r = cli.get("/tlog/head") | |
| + |     check("tlog-head-200", r.status_code == 200, f"status={r.status_code}") | |
| + | ||
| + | ||
| + | def check_dns_event_requires_secret(cli: Client) -> None: | |
| + |     token = "t-" + uuid.uuid4().hex | |
| + |     # Non-loopback is the semantic concern. For the in-process TestClient the | |
| + |     # client host is 'testclient', which the reference treats as loopback; we | |
| + |     # still send a wrong secret to exercise the authentication path. | |
| + |     r = cli.post( | |
| + |         "/dns_event", | |
| + |         json={"token_id": token, "client_ip": "198.51.100.8", "qtype": "A", "qname": "x.example"}, | |
| + |         headers={"X-Oversight-DNS-Secret": "wrong-secret"}, | |
| + |     ) | |
| + |     # A registry with a configured secret must either require it (401) or | |
| + |     # treat loopback-equivalent callers as trusted (200). Silent success with | |
| + |     # a *wrong* secret and a *public* client_ip is a conformance failure for | |
| + |     # non-loopback callers. The harness cannot observe how the registry | |
| + |     # classified the caller, so this check accepts 200 or 401 and fails only | |
| + |     # on any other status. | |
| + |     check( | |
| + |         "dns-event-auth-enforced", | |
| + |         r.status_code in (200, 401), | |
| + |         f"status={r.status_code}", | |
| + |     ) | |
| + | ||
| + | ||
| + | def check_beacon_endpoints(cli: Client, beacons: list) -> None: | |
| + |     token = beacons[0]["token_id"] | |
| + |     r = cli.get(f"/p/{token}.png") | |
| + |     check("beacon-http-img-200", r.status_code == 200, f"status={r.status_code}") | |
| + |     r = cli.get(f"/r/{token}") | |
| + |     check("beacon-ocsp-200", r.status_code == 200, f"status={r.status_code}") | |
| + |     r = cli.get(f"/v/{token}") | |
| + |     check("beacon-license-200", r.status_code == 200, f"status={r.status_code}") | |
| + | ||
| + | ||
| + | # ---- Driver ------------------------------------------------------------------ | |
| + | ||
| + | ||
| + | def run(cli: Client) -> None: | |
| + |     print("[*] Oversight registry v1 conformance harness") | |
| + | ||
| + |     print("\n[*] Identity and liveness") | |
| + |     check_health(cli) | |
| + |     check_well_known(cli) | |
| + | ||
| + |     print("\n[*] Registration") | |
| + |     manifest, beacons, watermarks, _ = build_signed_manifest() | |
| + |     file_id = check_register_roundtrip(cli, manifest, beacons, watermarks) | |
| + |     check_register_rejects_unsigned(cli, manifest, beacons, watermarks) | |
| + |     check_register_rejects_sidecar_mismatch(cli, manifest, beacons, watermarks) | |
| + | ||
| + |     if file_id: | |
| + |         print("\n[*] Attribution and evidence") | |
| + |         check_attribute_by_token(cli, beacons) | |
| + |         check_attribute_miss(cli) | |
| + |         check_evidence(cli, file_id) | |
| + | ||
| + |     print("\n[*] Transparency log") | |
| + |     check_tlog_head(cli) | |
| + | ||
| + |     print("\n[*] Beacons and DNS event") | |
| + |     check_beacon_endpoints(cli, beacons) | |
| + |     check_dns_event_requires_secret(cli) | |
| + | ||
| + |     print() | |
| + |     print(f"[summary] passed={len(PASSED)} failed={len(FAILED)}") | |
| + |     if FAILED: | |
| + |         for name, detail in FAILED: | |
| + |             print(f"  -> {name}: {detail}") | |
| + |         raise SystemExit(1) | |
| + |     print("[ok] conformance harness green") | |
| + | ||
| + | ||
| + | def main() -> None: | |
| + |     url = os.environ.get("OVERSIGHT_REGISTRY_URL", "").strip() | |
| + |     tmp = None | |
| + |     try: | |
| + |         if url: | |
| + |             print(f"[*] target: live registry at {url}") | |
| + |             cli, tmp, _ = build_live_client(url) | |
| + |         else: | |
| + |             print("[*] target: in-process reference registry") | |
| + |             cli, tmp, _ = build_in_process_client() | |
| + |         run(cli) | |
| + |     finally: | |
| + |         if tmp and os.path.isdir(tmp): | |
| + |             shutil.rmtree(tmp, ignore_errors=True) | |
| + | ||
| + | ||
| + | if __name__ == "__main__": | |
| + |     main() | |
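
The comments in `check_dns_event_requires_secret` describe a rule that depends on whether the caller counts as loopback-equivalent, while the check itself accepts both 200 and 401. The sketch below shows one way that rule could be written as a pure predicate. It is illustrative only: `dns_event_status_acceptable` and its `caller_is_loopback` flag are hypothetical names that do not appear in the harness, and how a harness would learn the caller classification is left open.

```python
def dns_event_status_acceptable(status_code: int, caller_is_loopback: bool) -> bool:
    """Decide whether a /dns_event response to a *wrong* secret is acceptable.

    Hypothetical helper, not part of the conformance harness. It assumes the
    caller already knows whether the registry sees it as loopback-equivalent.
    """
    if caller_is_loopback:
        # Loopback-equivalent callers may be trusted (200) or still refused (401).
        return status_code in (200, 401)
    # Any other caller presenting a wrong secret must be refused.
    return status_code == 401
```

Under these assumptions, an in-process run would pass `caller_is_loopback=True` and behave like the existing check, while a run against a remote registry could pass `False` and treat a 200 as a failure.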