Compare commits

...

6 Commits

| SHA1 | Message | Date |
|------|---------|------|
| 34235c605d | chore(deps): update Python version and project documentation | 2026-03-12 12:53:13 +08:00 |
| b39ee74e99 | feat(benchmark): add multi-object retrieval benchmark with SAM segmentation | 2026-03-12 12:52:51 +08:00 |
| 2466ab28cd | feat(serena): Add Serena project configuration | 2026-03-08 15:19:26 +08:00 |
| 4da08dc3d3 | refactor(compressors): Simplify module by removing SAM/DINO separation code | 2026-03-07 22:55:13 +08:00 |
| c8dc5f9301 | docs: update project documentation and configuration | 2026-03-07 15:45:28 +08:00 |
| bf02a05ffc | feat(opsx): add OpenSpec workflow commands and skills | 2026-03-07 15:02:08 +08:00 |

Commit body of 4da08dc3d3:
- Remove dino_compressor.py and segament_compressor.py
- Rewrite pipeline.py to inline DINO into HashPipeline
- Maintain backward compatibility: SAMHashPipeline alias
- Update tests and benchmark.py
40 changed files with 4298 additions and 1740 deletions


@@ -0,0 +1,152 @@
---
name: "OPSX: Apply"
description: Implement tasks from an OpenSpec change (Experimental)
category: Workflow
tags: [workflow, artifacts, experimental]
---
Implement tasks from an OpenSpec change.
**Input**: Optionally specify a change name (e.g., `/opsx:apply add-auth`). If omitted, check if it can be inferred from conversation context. If the request is vague or ambiguous, you MUST prompt the user with the available changes.
**Steps**
1. **Select the change**
If a name is provided, use it. Otherwise:
- Infer from conversation context if the user mentioned a change
- Auto-select if only one active change exists
- If ambiguous, run `openspec list --json` to get available changes and use the **AskUserQuestion tool** to let the user select
Always announce: "Using change: <name>" and how to override (e.g., `/opsx:apply <other>`).
2. **Check status to understand the schema**
```bash
openspec status --change "<name>" --json
```
Parse the JSON to understand:
- `schemaName`: The workflow being used (e.g., "spec-driven")
- Which artifact contains the tasks (typically "tasks" for spec-driven, check status for others)
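As a sketch, extracting the schema name might look like the following; the sample JSON stands in for the actual `openspec status --change "<name>" --json` output, and the field names are taken from the description above, so treat the exact shape as an assumption:

```shell
# Sample JSON standing in for `openspec status --change "<name>" --json`;
# field names follow the description above (assumption, not the real CLI output).
status_json='{"schemaName":"spec-driven","artifacts":[{"id":"tasks","status":"ready"}]}'

# Extract the workflow schema with jq
schema=$(printf '%s' "$status_json" | jq -r '.schemaName')
echo "Schema: $schema"   # prints: Schema: spec-driven
```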
3. **Get apply instructions**
```bash
openspec instructions apply --change "<name>" --json
```
This returns:
- Context file paths (varies by schema)
- Progress (total, complete, remaining)
- Task list with status
- Dynamic instruction based on current state
**Handle states:**
- If `state: "blocked"` (missing artifacts): show message, suggest using `/opsx:continue`
- If `state: "all_done"`: congratulate, suggest archive
- Otherwise: proceed to implementation
4. **Read context files**
Read the files listed in `contextFiles` from the apply instructions output.
The files depend on the schema being used:
- **spec-driven**: proposal, specs, design, tasks
- Other schemas: follow the contextFiles from CLI output
5. **Show current progress**
Display:
- Schema being used
- Progress: "N/M tasks complete"
- Remaining tasks overview
- Dynamic instruction from CLI
6. **Implement tasks (loop until done or blocked)**
For each pending task:
- Show which task is being worked on
- Make the code changes required
- Keep changes minimal and focused
- Mark task complete in the tasks file: `- [ ]` → `- [x]`
- Continue to next task
**Pause if:**
- Task is unclear → ask for clarification
- Implementation reveals a design issue → suggest updating artifacts
- Error or blocker encountered → report and wait for guidance
- User interrupts
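Flipping a task checkbox in the tasks file can be sketched with sed; "Write unit tests" is a placeholder task description, and GNU `sed -i` is assumed (on macOS/BSD use `sed -i ''`):

```shell
# Mark a specific task as complete in tasks.md ("- [ ]" -> "- [x]").
# The task text below is a hypothetical placeholder.
task="Write unit tests"
sed -i "s/- \[ \] $task/- [x] $task/" tasks.md
```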
7. **On completion or pause, show status**
Display:
- Tasks completed this session
- Overall progress: "N/M tasks complete"
- If all done: suggest archive
- If paused: explain why and wait for guidance
**Output During Implementation**
```
## Implementing: <change-name> (schema: <schema-name>)
Working on task 3/7: <task description>
[...implementation happening...]
✓ Task complete
Working on task 4/7: <task description>
[...implementation happening...]
✓ Task complete
```
**Output On Completion**
```
## Implementation Complete
**Change:** <change-name>
**Schema:** <schema-name>
**Progress:** 7/7 tasks complete ✓
### Completed This Session
- [x] Task 1
- [x] Task 2
...
All tasks complete! You can archive this change with `/opsx:archive`.
```
**Output On Pause (Issue Encountered)**
```
## Implementation Paused
**Change:** <change-name>
**Schema:** <schema-name>
**Progress:** 4/7 tasks complete
### Issue Encountered
<description of the issue>
**Options:**
1. <option 1>
2. <option 2>
3. Other approach
What would you like to do?
```
**Guardrails**
- Keep going through tasks until done or blocked
- Always read context files before starting (from the apply instructions output)
- If task is ambiguous, pause and ask before implementing
- If implementation reveals issues, pause and suggest artifact updates
- Keep code changes minimal and scoped to each task
- Update task checkbox immediately after completing each task
- Pause on errors, blockers, or unclear requirements - don't guess
- Use contextFiles from CLI output, don't assume specific file names
**Fluid Workflow Integration**
This skill supports the "actions on a change" model:
- **Can be invoked anytime**: Before all artifacts are done (if tasks exist), after partial implementation, interleaved with other actions
- **Allows artifact updates**: If implementation reveals design issues, suggest updating artifacts - not phase-locked, work fluidly


@@ -0,0 +1,157 @@
---
name: "OPSX: Archive"
description: Archive a completed change in the experimental workflow
category: Workflow
tags: [workflow, archive, experimental]
---
Archive a completed change in the experimental workflow.
**Input**: Optionally specify a change name after `/opsx:archive` (e.g., `/opsx:archive add-auth`). If omitted, check if it can be inferred from conversation context. If the request is vague or ambiguous, you MUST prompt the user with the available changes.
**Steps**
1. **If no change name provided, prompt for selection**
Run `openspec list --json` to get available changes. Use the **AskUserQuestion tool** to let the user select.
Show only active changes (not already archived).
Include the schema used for each change if available.
**IMPORTANT**: Do NOT guess or auto-select a change. Always let the user choose.
2. **Check artifact completion status**
Run `openspec status --change "<name>" --json` to check artifact completion.
Parse the JSON to understand:
- `schemaName`: The workflow being used
- `artifacts`: List of artifacts with their status (`done` or other)
**If any artifacts are not `done`:**
- Display warning listing incomplete artifacts
- Prompt user for confirmation to continue
- Proceed if user confirms
3. **Check task completion status**
Read the tasks file (typically `tasks.md`) to check for incomplete tasks.
Count tasks marked with `- [ ]` (incomplete) vs `- [x]` (complete).
**If incomplete tasks found:**
- Display warning showing count of incomplete tasks
- Prompt user for confirmation to continue
- Proceed if user confirms
**If no tasks file exists:** Proceed without task-related warning.
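The incomplete/complete count can be sketched with `grep -c` (which prints 0 but exits non-zero on no match, hence the `|| true`):

```shell
# Count incomplete vs complete checkbox tasks in tasks.md
incomplete=$(grep -c -- '- \[ \]' tasks.md || true)
complete=$(grep -c -- '- \[x\]' tasks.md || true)
echo "$complete complete, $incomplete incomplete"
```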
4. **Assess delta spec sync state**
Check for delta specs at `openspec/changes/<name>/specs/`. If none exist, proceed without sync prompt.
**If delta specs exist:**
- Compare each delta spec with its corresponding main spec at `openspec/specs/<capability>/spec.md`
- Determine what changes would be applied (adds, modifications, removals, renames)
- Show a combined summary before prompting
**Prompt options:**
- If changes needed: "Sync now (recommended)", "Archive without syncing"
- If already synced: "Archive now", "Sync anyway", "Cancel"
If user chooses sync, use Task tool (subagent_type: "general-purpose", prompt: "Use Skill tool to invoke openspec-sync-specs for change '<name>'. Delta spec analysis: <include the analyzed delta spec summary>"). Proceed to archive regardless of choice.
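The per-capability comparison can be sketched with a `diff` loop over the layout described above; `$name` is a hypothetical change name, and in practice the diff output feeds the combined summary rather than being shown raw:

```shell
# Show pending differences between each delta spec and its main spec.
name="add-auth"  # hypothetical change name
for delta in openspec/changes/"$name"/specs/*/spec.md; do
  [ -e "$delta" ] || continue          # skip if no delta specs exist
  cap=$(basename "$(dirname "$delta")")  # capability directory name
  echo "== $cap =="
  diff -u "openspec/specs/$cap/spec.md" "$delta" || true  # diff exits 1 on differences
done
```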
5. **Perform the archive**
Create the archive directory if it doesn't exist:
```bash
mkdir -p openspec/changes/archive
```
Generate target name using current date: `YYYY-MM-DD-<change-name>`
**Check if target already exists:**
- If yes: Fail with error, suggest renaming existing archive or using different date
- If no: Move the change directory to archive
```bash
mv openspec/changes/<name> openspec/changes/archive/YYYY-MM-DD-<name>
```
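The steps above can be combined into one defensive sketch (`date +%F` emits `YYYY-MM-DD`; the change name is a hypothetical placeholder):

```shell
# Date-stamped archive move with an existence check
name="add-auth"  # hypothetical change name
target="openspec/changes/archive/$(date +%F)-$name"
mkdir -p openspec/changes/archive
if [ -e "$target" ]; then
  echo "Archive failed: $target already exists" >&2
else
  mv "openspec/changes/$name" "$target"
fi
```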
6. **Display summary**
Show archive completion summary including:
- Change name
- Schema that was used
- Archive location
- Spec sync status (synced / sync skipped / no delta specs)
- Note about any warnings (incomplete artifacts/tasks)
**Output On Success**
```
## Archive Complete
**Change:** <change-name>
**Schema:** <schema-name>
**Archived to:** openspec/changes/archive/YYYY-MM-DD-<name>/
**Specs:** ✓ Synced to main specs
All artifacts complete. All tasks complete.
```
**Output On Success (No Delta Specs)**
```
## Archive Complete
**Change:** <change-name>
**Schema:** <schema-name>
**Archived to:** openspec/changes/archive/YYYY-MM-DD-<name>/
**Specs:** No delta specs
All artifacts complete. All tasks complete.
```
**Output On Success With Warnings**
```
## Archive Complete (with warnings)
**Change:** <change-name>
**Schema:** <schema-name>
**Archived to:** openspec/changes/archive/YYYY-MM-DD-<name>/
**Specs:** Sync skipped (user chose to skip)
**Warnings:**
- Archived with 2 incomplete artifacts
- Archived with 3 incomplete tasks
- Delta spec sync was skipped (user chose to skip)
Review the archive if this was not intentional.
```
**Output On Error (Archive Exists)**
```
## Archive Failed
**Change:** <change-name>
**Target:** openspec/changes/archive/YYYY-MM-DD-<name>/
Target archive directory already exists.
**Options:**
1. Rename the existing archive
2. Delete the existing archive if it's a duplicate
3. Wait until a different date to archive
```
**Guardrails**
- Always prompt for change selection if not provided
- Use artifact graph (openspec status --json) for completion checking
- Don't block archive on warnings - just inform and confirm
- Preserve .openspec.yaml when moving to archive (it moves with the directory)
- Show clear summary of what happened
- If sync is requested, use the Skill tool to invoke `openspec-sync-specs` (agent-driven)
- If delta specs exist, always run the sync assessment and show the combined summary before prompting


@@ -0,0 +1,173 @@
---
name: "OPSX: Explore"
description: "Enter explore mode - think through ideas, investigate problems, clarify requirements"
category: Workflow
tags: [workflow, explore, experimental, thinking]
---
Enter explore mode. Think deeply. Visualize freely. Follow the conversation wherever it goes.
**IMPORTANT: Explore mode is for thinking, not implementing.** You may read files, search code, and investigate the codebase, but you must NEVER write code or implement features. If the user asks you to implement something, remind them to exit explore mode first and create a change proposal. You MAY create OpenSpec artifacts (proposals, designs, specs) if the user asks—that's capturing thinking, not implementing.
**This is a stance, not a workflow.** There are no fixed steps, no required sequence, no mandatory outputs. You're a thinking partner helping the user explore.
**Input**: The argument after `/opsx:explore` is whatever the user wants to think about. Could be:
- A vague idea: "real-time collaboration"
- A specific problem: "the auth system is getting unwieldy"
- A change name: "add-dark-mode" (to explore in context of that change)
- A comparison: "postgres vs sqlite for this"
- Nothing (just enter explore mode)
---
## The Stance
- **Curious, not prescriptive** - Ask questions that emerge naturally, don't follow a script
- **Open threads, not interrogations** - Surface multiple interesting directions and let the user follow what resonates. Don't funnel them through a single path of questions.
- **Visual** - Use ASCII diagrams liberally when they'd help clarify thinking
- **Adaptive** - Follow interesting threads, pivot when new information emerges
- **Patient** - Don't rush to conclusions, let the shape of the problem emerge
- **Grounded** - Explore the actual codebase when relevant, don't just theorize
---
## What You Might Do
Depending on what the user brings, you might:
**Explore the problem space**
- Ask clarifying questions that emerge from what they said
- Challenge assumptions
- Reframe the problem
- Find analogies
**Investigate the codebase**
- Map existing architecture relevant to the discussion
- Find integration points
- Identify patterns already in use
- Surface hidden complexity
**Compare options**
- Brainstorm multiple approaches
- Build comparison tables
- Sketch tradeoffs
- Recommend a path (if asked)
**Visualize**
```
┌─────────────────────────────────────────┐
│ Use ASCII diagrams liberally │
├─────────────────────────────────────────┤
│ │
│ ┌────────┐ ┌────────┐ │
│ │ State │────────▶│ State │ │
│ │ A │ │ B │ │
│ └────────┘ └────────┘ │
│ │
│ System diagrams, state machines, │
│ data flows, architecture sketches, │
│ dependency graphs, comparison tables │
│ │
└─────────────────────────────────────────┘
```
**Surface risks and unknowns**
- Identify what could go wrong
- Find gaps in understanding
- Suggest spikes or investigations
---
## OpenSpec Awareness
You have full context of the OpenSpec system. Use it naturally, don't force it.
### Check for context
At the start, quickly check what exists:
```bash
openspec list --json
```
This tells you:
- If there are active changes
- Their names, schemas, and status
- What the user might be working on
If the user mentioned a specific change name, read its artifacts for context.
### When no change exists
Think freely. When insights crystallize, you might offer:
- "This feels solid enough to start a change. Want me to create a proposal?"
- Or keep exploring - no pressure to formalize
### When a change exists
If the user mentions a change or you detect one is relevant:
1. **Read existing artifacts for context**
- `openspec/changes/<name>/proposal.md`
- `openspec/changes/<name>/design.md`
- `openspec/changes/<name>/tasks.md`
- etc.
2. **Reference them naturally in conversation**
- "Your design mentions using Redis, but we just realized SQLite fits better..."
- "The proposal scopes this to premium users, but we're now thinking everyone..."
3. **Offer to capture when decisions are made**
| Insight Type | Where to Capture |
|--------------|------------------|
| New requirement discovered | `specs/<capability>/spec.md` |
| Requirement changed | `specs/<capability>/spec.md` |
| Design decision made | `design.md` |
| Scope changed | `proposal.md` |
| New work identified | `tasks.md` |
| Assumption invalidated | Relevant artifact |
Example offers:
- "That's a design decision. Capture it in design.md?"
- "This is a new requirement. Add it to specs?"
- "This changes scope. Update the proposal?"
4. **The user decides** - Offer and move on. Don't pressure. Don't auto-capture.
---
## What You Don't Have To Do
- Follow a script
- Ask the same questions every time
- Produce a specific artifact
- Reach a conclusion
- Stay on topic if a tangent is valuable
- Be brief (this is thinking time)
---
## Ending Discovery
There's no required ending. Discovery might:
- **Flow into a proposal**: "Ready to start? I can create a change proposal."
- **Result in artifact updates**: "Updated design.md with these decisions"
- **Just provide clarity**: User has what they need, moves on
- **Continue later**: "We can pick this up anytime"
When things crystallize, you might offer a summary - but it's optional. Sometimes the thinking IS the value.
---
## Guardrails
- **Don't implement** - Never write code or implement features. Creating OpenSpec artifacts is fine, writing application code is not.
- **Don't fake understanding** - If something is unclear, dig deeper
- **Don't rush** - Discovery is thinking time, not task time
- **Don't force structure** - Let patterns emerge naturally
- **Don't auto-capture** - Offer to save insights, don't just do it
- **Do visualize** - A good diagram is worth many paragraphs
- **Do explore the codebase** - Ground discussions in reality
- **Do question assumptions** - Including the user's and your own


@@ -0,0 +1,106 @@
---
name: "OPSX: Propose"
description: Propose a new change - create it and generate all artifacts in one step
category: Workflow
tags: [workflow, artifacts, experimental]
---
Propose a new change - create the change and generate all artifacts in one step.
I'll create a change with artifacts:
- proposal.md (what & why)
- design.md (how)
- tasks.md (implementation steps)
When ready to implement, run /opsx:apply
---
**Input**: The argument after `/opsx:propose` is the change name (kebab-case), OR a description of what the user wants to build.
**Steps**
1. **If no input provided, ask what they want to build**
Use the **AskUserQuestion tool** (open-ended, no preset options) to ask:
> "What change do you want to work on? Describe what you want to build or fix."
From their description, derive a kebab-case name (e.g., "add user authentication" → `add-user-auth`).
**IMPORTANT**: Do NOT proceed without understanding what the user wants to build.
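Deriving the kebab-case name can be sketched with a small pipeline; note this mechanical version does not abbreviate words (`add-user-authentication` rather than `add-user-auth`), so shortening remains a judgment call:

```shell
# Lowercase, replace runs of non-alphanumerics with "-", trim edge dashes
desc="add user authentication"
name=$(printf '%s' "$desc" \
  | tr '[:upper:]' '[:lower:]' \
  | tr -cs 'a-z0-9' '-' \
  | sed 's/^-//; s/-$//')
echo "$name"   # prints: add-user-authentication
```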
2. **Create the change directory**
```bash
openspec new change "<name>"
```
This creates a scaffolded change at `openspec/changes/<name>/` with `.openspec.yaml`.
3. **Get the artifact build order**
```bash
openspec status --change "<name>" --json
```
Parse the JSON to get:
- `applyRequires`: array of artifact IDs needed before implementation (e.g., `["tasks"]`)
- `artifacts`: list of all artifacts with their status and dependencies
4. **Create artifacts in sequence until apply-ready**
Use the **TodoWrite tool** to track progress through the artifacts.
Loop through artifacts in dependency order (artifacts with no pending dependencies first):
a. **For each artifact that is `ready` (dependencies satisfied)**:
- Get instructions:
```bash
openspec instructions <artifact-id> --change "<name>" --json
```
- The instructions JSON includes:
- `context`: Project background (constraints for you - do NOT include in output)
- `rules`: Artifact-specific rules (constraints for you - do NOT include in output)
- `template`: The structure to use for your output file
- `instruction`: Schema-specific guidance for this artifact type
- `outputPath`: Where to write the artifact
- `dependencies`: Completed artifacts to read for context
- Read any completed dependency files for context
- Create the artifact file using `template` as the structure
- Apply `context` and `rules` as constraints - but do NOT copy them into the file
- Show brief progress: "Created <artifact-id>"
b. **Continue until all `applyRequires` artifacts are complete**
- After creating each artifact, re-run `openspec status --change "<name>" --json`
- Check if every artifact ID in `applyRequires` has `status: "done"` in the artifacts array
- Stop when all `applyRequires` artifacts are done
c. **If an artifact requires user input** (unclear context):
- Use **AskUserQuestion tool** to clarify
- Then continue with creation
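The apply-ready check in step 4b can be sketched with jq; the sample JSON stands in for the real `openspec status --change "<name>" --json` output, and the field names (`applyRequires`, `artifacts`, `id`, `status`) follow the description above, so treat the exact shape as an assumption:

```shell
# Sample status JSON (assumption: shape follows the fields described above)
status_json='{"applyRequires":["tasks"],"artifacts":[{"id":"tasks","status":"done"},{"id":"design","status":"ready"}]}'

# true only if every applyRequires artifact has status "done"
ready=$(printf '%s' "$status_json" | jq '[.applyRequires[] as $id
        | .artifacts[] | select(.id == $id) | .status == "done"] | all')
echo "$ready"   # prints: true
```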
5. **Show final status**
```bash
openspec status --change "<name>"
```
**Output**
After completing all artifacts, summarize:
- Change name and location
- List of artifacts created with brief descriptions
- What's ready: "All artifacts created! Ready for implementation."
- Prompt: "Run `/opsx:apply` to start implementing."
**Artifact Creation Guidelines**
- Follow the `instruction` field from `openspec instructions` for each artifact type
- The schema defines what each artifact should contain - follow it
- Read dependency artifacts for context before creating new ones
- Use `template` as the structure for your output file - fill in its sections
- **IMPORTANT**: `context` and `rules` are constraints for YOU, not content for the file
- Do NOT copy `<context>`, `<rules>`, `<project_context>` blocks into the artifact
- These guide what you write, but should never appear in the output
**Guardrails**
- Create ALL artifacts needed for implementation (as defined by schema's `apply.requires`)
- Always read dependency artifacts before creating a new one
- If context is critically unclear, ask the user - but prefer making reasonable decisions to keep momentum
- If a change with that name already exists, ask if user wants to continue it or create a new one
- Verify each artifact file exists after writing before proceeding to next


@@ -0,0 +1,156 @@
---
name: openspec-apply-change
description: Implement tasks from an OpenSpec change. Use when the user wants to start implementing, continue implementation, or work through tasks.
license: MIT
compatibility: Requires openspec CLI.
metadata:
author: openspec
version: "1.0"
generatedBy: "1.2.0"
---
Implement tasks from an OpenSpec change.
**Input**: Optionally specify a change name. If omitted, check if it can be inferred from conversation context. If the request is vague or ambiguous, you MUST prompt the user with the available changes.
**Steps**
1. **Select the change**
If a name is provided, use it. Otherwise:
- Infer from conversation context if the user mentioned a change
- Auto-select if only one active change exists
- If ambiguous, run `openspec list --json` to get available changes and use the **AskUserQuestion tool** to let the user select
Always announce: "Using change: <name>" and how to override (e.g., `/opsx:apply <other>`).
2. **Check status to understand the schema**
```bash
openspec status --change "<name>" --json
```
Parse the JSON to understand:
- `schemaName`: The workflow being used (e.g., "spec-driven")
- Which artifact contains the tasks (typically "tasks" for spec-driven, check status for others)
3. **Get apply instructions**
```bash
openspec instructions apply --change "<name>" --json
```
This returns:
- Context file paths (varies by schema - could be proposal/specs/design/tasks or spec/tests/implementation/docs)
- Progress (total, complete, remaining)
- Task list with status
- Dynamic instruction based on current state
**Handle states:**
- If `state: "blocked"` (missing artifacts): show message, suggest using openspec-continue-change
- If `state: "all_done"`: congratulate, suggest archive
- Otherwise: proceed to implementation
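The state branching above can be sketched as a `case` on the `state` field; the sample JSON stands in for the real `openspec instructions apply ... --json` output, and the `state` values are taken from the list above:

```shell
# Sample instructions JSON (assumption: shape follows the states described above)
instructions_json='{"state":"all_done"}'
state=$(printf '%s' "$instructions_json" | jq -r '.state')

case "$state" in
  blocked)  echo "Missing artifacts - try openspec-continue-change" ;;
  all_done) echo "All tasks complete - suggest archive" ;;
  *)        echo "Proceeding to implementation" ;;
esac
```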
4. **Read context files**
Read the files listed in `contextFiles` from the apply instructions output.
The files depend on the schema being used:
- **spec-driven**: proposal, specs, design, tasks
- Other schemas: follow the contextFiles from CLI output
5. **Show current progress**
Display:
- Schema being used
- Progress: "N/M tasks complete"
- Remaining tasks overview
- Dynamic instruction from CLI
6. **Implement tasks (loop until done or blocked)**
For each pending task:
- Show which task is being worked on
- Make the code changes required
- Keep changes minimal and focused
- Mark task complete in the tasks file: `- [ ]` → `- [x]`
- Continue to next task
**Pause if:**
- Task is unclear → ask for clarification
- Implementation reveals a design issue → suggest updating artifacts
- Error or blocker encountered → report and wait for guidance
- User interrupts
7. **On completion or pause, show status**
Display:
- Tasks completed this session
- Overall progress: "N/M tasks complete"
- If all done: suggest archive
- If paused: explain why and wait for guidance
**Output During Implementation**
```
## Implementing: <change-name> (schema: <schema-name>)
Working on task 3/7: <task description>
[...implementation happening...]
✓ Task complete
Working on task 4/7: <task description>
[...implementation happening...]
✓ Task complete
```
**Output On Completion**
```
## Implementation Complete
**Change:** <change-name>
**Schema:** <schema-name>
**Progress:** 7/7 tasks complete ✓
### Completed This Session
- [x] Task 1
- [x] Task 2
...
All tasks complete! Ready to archive this change.
```
**Output On Pause (Issue Encountered)**
```
## Implementation Paused
**Change:** <change-name>
**Schema:** <schema-name>
**Progress:** 4/7 tasks complete
### Issue Encountered
<description of the issue>
**Options:**
1. <option 1>
2. <option 2>
3. Other approach
What would you like to do?
```
**Guardrails**
- Keep going through tasks until done or blocked
- Always read context files before starting (from the apply instructions output)
- If task is ambiguous, pause and ask before implementing
- If implementation reveals issues, pause and suggest artifact updates
- Keep code changes minimal and scoped to each task
- Update task checkbox immediately after completing each task
- Pause on errors, blockers, or unclear requirements - don't guess
- Use contextFiles from CLI output, don't assume specific file names
**Fluid Workflow Integration**
This skill supports the "actions on a change" model:
- **Can be invoked anytime**: Before all artifacts are done (if tasks exist), after partial implementation, interleaved with other actions
- **Allows artifact updates**: If implementation reveals design issues, suggest updating artifacts - not phase-locked, work fluidly


@@ -0,0 +1,114 @@
---
name: openspec-archive-change
description: Archive a completed change in the experimental workflow. Use when the user wants to finalize and archive a change after implementation is complete.
license: MIT
compatibility: Requires openspec CLI.
metadata:
author: openspec
version: "1.0"
generatedBy: "1.2.0"
---
Archive a completed change in the experimental workflow.
**Input**: Optionally specify a change name. If omitted, check if it can be inferred from conversation context. If the request is vague or ambiguous, you MUST prompt the user with the available changes.
**Steps**
1. **If no change name provided, prompt for selection**
Run `openspec list --json` to get available changes. Use the **AskUserQuestion tool** to let the user select.
Show only active changes (not already archived).
Include the schema used for each change if available.
**IMPORTANT**: Do NOT guess or auto-select a change. Always let the user choose.
2. **Check artifact completion status**
Run `openspec status --change "<name>" --json` to check artifact completion.
Parse the JSON to understand:
- `schemaName`: The workflow being used
- `artifacts`: List of artifacts with their status (`done` or other)
**If any artifacts are not `done`:**
- Display warning listing incomplete artifacts
- Use **AskUserQuestion tool** to confirm user wants to proceed
- Proceed if user confirms
3. **Check task completion status**
Read the tasks file (typically `tasks.md`) to check for incomplete tasks.
Count tasks marked with `- [ ]` (incomplete) vs `- [x]` (complete).
**If incomplete tasks found:**
- Display warning showing count of incomplete tasks
- Use **AskUserQuestion tool** to confirm user wants to proceed
- Proceed if user confirms
**If no tasks file exists:** Proceed without task-related warning.
4. **Assess delta spec sync state**
Check for delta specs at `openspec/changes/<name>/specs/`. If none exist, proceed without sync prompt.
**If delta specs exist:**
- Compare each delta spec with its corresponding main spec at `openspec/specs/<capability>/spec.md`
- Determine what changes would be applied (adds, modifications, removals, renames)
- Show a combined summary before prompting
**Prompt options:**
- If changes needed: "Sync now (recommended)", "Archive without syncing"
- If already synced: "Archive now", "Sync anyway", "Cancel"
If user chooses sync, use Task tool (subagent_type: "general-purpose", prompt: "Use Skill tool to invoke openspec-sync-specs for change '<name>'. Delta spec analysis: <include the analyzed delta spec summary>"). Proceed to archive regardless of choice.
5. **Perform the archive**
Create the archive directory if it doesn't exist:
```bash
mkdir -p openspec/changes/archive
```
Generate target name using current date: `YYYY-MM-DD-<change-name>`
**Check if target already exists:**
- If yes: Fail with error, suggest renaming existing archive or using different date
- If no: Move the change directory to archive
```bash
mv openspec/changes/<name> openspec/changes/archive/YYYY-MM-DD-<name>
```
6. **Display summary**
Show archive completion summary including:
- Change name
- Schema that was used
- Archive location
- Whether specs were synced (if applicable)
- Note about any warnings (incomplete artifacts/tasks)
**Output On Success**
```
## Archive Complete
**Change:** <change-name>
**Schema:** <schema-name>
**Archived to:** openspec/changes/archive/YYYY-MM-DD-<name>/
**Specs:** ✓ Synced to main specs (or "No delta specs" or "Sync skipped")
All artifacts complete. All tasks complete.
```
**Guardrails**
- Always prompt for change selection if not provided
- Use artifact graph (openspec status --json) for completion checking
- Don't block archive on warnings - just inform and confirm
- Preserve .openspec.yaml when moving to archive (it moves with the directory)
- Show clear summary of what happened
- If sync is requested, use openspec-sync-specs approach (agent-driven)
- If delta specs exist, always run the sync assessment and show the combined summary before prompting


@@ -0,0 +1,288 @@
---
name: openspec-explore
description: Enter explore mode - a thinking partner for exploring ideas, investigating problems, and clarifying requirements. Use when the user wants to think through something before or during a change.
license: MIT
compatibility: Requires openspec CLI.
metadata:
author: openspec
version: "1.0"
generatedBy: "1.2.0"
---
Enter explore mode. Think deeply. Visualize freely. Follow the conversation wherever it goes.
**IMPORTANT: Explore mode is for thinking, not implementing.** You may read files, search code, and investigate the codebase, but you must NEVER write code or implement features. If the user asks you to implement something, remind them to exit explore mode first and create a change proposal. You MAY create OpenSpec artifacts (proposals, designs, specs) if the user asks—that's capturing thinking, not implementing.
**This is a stance, not a workflow.** There are no fixed steps, no required sequence, no mandatory outputs. You're a thinking partner helping the user explore.
---
## The Stance
- **Curious, not prescriptive** - Ask questions that emerge naturally, don't follow a script
- **Open threads, not interrogations** - Surface multiple interesting directions and let the user follow what resonates. Don't funnel them through a single path of questions.
- **Visual** - Use ASCII diagrams liberally when they'd help clarify thinking
- **Adaptive** - Follow interesting threads, pivot when new information emerges
- **Patient** - Don't rush to conclusions, let the shape of the problem emerge
- **Grounded** - Explore the actual codebase when relevant, don't just theorize
---
## What You Might Do
Depending on what the user brings, you might:
**Explore the problem space**
- Ask clarifying questions that emerge from what they said
- Challenge assumptions
- Reframe the problem
- Find analogies
**Investigate the codebase**
- Map existing architecture relevant to the discussion
- Find integration points
- Identify patterns already in use
- Surface hidden complexity
**Compare options**
- Brainstorm multiple approaches
- Build comparison tables
- Sketch tradeoffs
- Recommend a path (if asked)
**Visualize**
```
┌─────────────────────────────────────────┐
│ Use ASCII diagrams liberally │
├─────────────────────────────────────────┤
│ │
│ ┌────────┐ ┌────────┐ │
│ │ State │────────▶│ State │ │
│ │ A │ │ B │ │
│ └────────┘ └────────┘ │
│ │
│ System diagrams, state machines, │
│ data flows, architecture sketches, │
│ dependency graphs, comparison tables │
│ │
└─────────────────────────────────────────┘
```
**Surface risks and unknowns**
- Identify what could go wrong
- Find gaps in understanding
- Suggest spikes or investigations
---
## OpenSpec Awareness
You have full context of the OpenSpec system. Use it naturally, don't force it.
### Check for context
At the start, quickly check what exists:
```bash
openspec list --json
```
This tells you:
- If there are active changes
- Their names, schemas, and status
- What the user might be working on
### When no change exists
Think freely. When insights crystallize, you might offer:
- "This feels solid enough to start a change. Want me to create a proposal?"
- Or keep exploring - no pressure to formalize
### When a change exists
If the user mentions a change or you detect one is relevant:
1. **Read existing artifacts for context**
- `openspec/changes/<name>/proposal.md`
- `openspec/changes/<name>/design.md`
- `openspec/changes/<name>/tasks.md`
- etc.
2. **Reference them naturally in conversation**
- "Your design mentions using Redis, but we just realized SQLite fits better..."
- "The proposal scopes this to premium users, but we're now thinking everyone..."
3. **Offer to capture when decisions are made**
| Insight Type | Where to Capture |
|--------------|------------------|
| New requirement discovered | `specs/<capability>/spec.md` |
| Requirement changed | `specs/<capability>/spec.md` |
| Design decision made | `design.md` |
| Scope changed | `proposal.md` |
| New work identified | `tasks.md` |
| Assumption invalidated | Relevant artifact |
Example offers:
- "That's a design decision. Capture it in design.md?"
- "This is a new requirement. Add it to specs?"
- "This changes scope. Update the proposal?"
4. **The user decides** - Offer and move on. Don't pressure. Don't auto-capture.
---
## What You Don't Have To Do
- Follow a script
- Ask the same questions every time
- Produce a specific artifact
- Reach a conclusion
- Stay on topic if a tangent is valuable
- Be brief (this is thinking time)
---
## Handling Different Entry Points
**User brings a vague idea:**
```
User: I'm thinking about adding real-time collaboration
You: Real-time collab is a big space. Let me think about this...
           COLLABORATION SPECTRUM
════════════════════════════════════════════
 Awareness        Coordination        Sync
     │                 │                │
     ▼                 ▼                ▼
 ┌────────┐       ┌────────┐       ┌────────┐
 │Presence│       │Cursors │       │  CRDT  │
 │   "3   │       │ Multi  │       │Conflict│
 │ online"│       │ select │       │  free  │
 └────────┘       └────────┘       └────────┘
     │                 │                │
  trivial          moderate         complex
Where's your head at?
```
**User brings a specific problem:**
```
User: The auth system is a mess
You: [reads codebase]
┌─────────────────────────────────────────────┐
│              CURRENT AUTH FLOW              │
└─────────────────────────────────────────────┘
                      │
        ┌─────────────┼─────────────┐
        ▼             ▼             ▼
   ┌─────────┐   ┌─────────┐   ┌─────────┐
   │ Google  │   │ GitHub  │   │  Email  │
   │  OAuth  │   │  OAuth  │   │  Magic  │
   └────┬────┘   └────┬────┘   └────┬────┘
        │             │             │
        └─────────────┼─────────────┘
                      │
                ┌───────────┐
                │  Session  │
                └─────┬─────┘
                      │
                ┌───────────┐
                │   Perms   │
                └───────────┘
I see three tangles. Which one's burning?
```
**User is stuck mid-implementation:**
```
User: /opsx:explore add-auth-system
The OAuth integration is more complex than expected
You: [reads change artifacts]
You're on task 4: "Implement OAuth flow"
Let me trace what's involved...
[draws diagram, explores options, suggests paths]
Want to update the design to reflect this?
Or add a spike task to investigate?
```
**User wants to compare options:**
```
User: Should we use Postgres or SQLite?
You: Generic answer is boring. What's the context?
User: A CLI tool that tracks local dev environments
You: That changes everything.
┌─────────────────────────────────────────────────┐
│              CLI TOOL DATA STORAGE              │
└─────────────────────────────────────────────────┘
Key constraints:
• No daemon running
• Must work offline
• Single user

                SQLite          Postgres
Deployment      embedded ✓      needs server ✗
Offline         yes ✓           no ✗
Single file     yes ✓           no ✗
SQLite. Not even close.
Unless... is there a sync component?
```
---
## Ending Discovery
There's no required ending. Discovery might:
- **Flow into a proposal**: "Ready to start? I can create a change proposal."
- **Result in artifact updates**: "Updated design.md with these decisions"
- **Just provide clarity**: User has what they need, moves on
- **Continue later**: "We can pick this up anytime"
When it feels like things are crystallizing, you might summarize:
```
## What We Figured Out
**The problem**: [crystallized understanding]
**The approach**: [if one emerged]
**Open questions**: [if any remain]
**Next steps** (if ready):
- Create a change proposal
- Keep exploring: just keep talking
```
But this summary is optional. Sometimes the thinking IS the value.
---
## Guardrails
- **Don't implement** - Never write code or implement features. Creating OpenSpec artifacts is fine, writing application code is not.
- **Don't fake understanding** - If something is unclear, dig deeper
- **Don't rush** - Discovery is thinking time, not task time
- **Don't force structure** - Let patterns emerge naturally
- **Don't auto-capture** - Offer to save insights, don't just do it
- **Do visualize** - A good diagram is worth many paragraphs
- **Do explore the codebase** - Ground discussions in reality
- **Do question assumptions** - Including the user's and your own


@@ -0,0 +1,110 @@
---
name: openspec-propose
description: Propose a new change with all artifacts generated in one step. Use when the user wants to quickly describe what they want to build and get a complete proposal with design, specs, and tasks ready for implementation.
license: MIT
compatibility: Requires openspec CLI.
metadata:
author: openspec
version: "1.0"
generatedBy: "1.2.0"
---
Propose a new change - create the change and generate all artifacts in one step.
I'll create a change with artifacts:
- proposal.md (what & why)
- design.md (how)
- tasks.md (implementation steps)
When ready to implement, run /opsx:apply
---
**Input**: The user's request should include a change name (kebab-case) OR a description of what they want to build.
**Steps**
1. **If no clear input provided, ask what they want to build**
Use the **AskUserQuestion tool** (open-ended, no preset options) to ask:
> "What change do you want to work on? Describe what you want to build or fix."
From their description, derive a kebab-case name (e.g., "add user authentication" → `add-user-auth`).
**IMPORTANT**: Do NOT proceed without understanding what the user wants to build.
2. **Create the change directory**
```bash
openspec new change "<name>"
```
This creates a scaffolded change at `openspec/changes/<name>/` with `.openspec.yaml`.
3. **Get the artifact build order**
```bash
openspec status --change "<name>" --json
```
Parse the JSON to get:
- `applyRequires`: array of artifact IDs needed before implementation (e.g., `["tasks"]`)
- `artifacts`: list of all artifacts with their status and dependencies
4. **Create artifacts in sequence until apply-ready**
Use the **TodoWrite tool** to track progress through the artifacts.
Loop through artifacts in dependency order (artifacts with no pending dependencies first):
a. **For each artifact that is `ready` (dependencies satisfied)**:
- Get instructions:
```bash
openspec instructions <artifact-id> --change "<name>" --json
```
- The instructions JSON includes:
- `context`: Project background (constraints for you - do NOT include in output)
- `rules`: Artifact-specific rules (constraints for you - do NOT include in output)
- `template`: The structure to use for your output file
- `instruction`: Schema-specific guidance for this artifact type
- `outputPath`: Where to write the artifact
- `dependencies`: Completed artifacts to read for context
- Read any completed dependency files for context
- Create the artifact file using `template` as the structure
- Apply `context` and `rules` as constraints - but do NOT copy them into the file
- Show brief progress: "Created <artifact-id>"
b. **Continue until all `applyRequires` artifacts are complete**
- After creating each artifact, re-run `openspec status --change "<name>" --json`
- Check if every artifact ID in `applyRequires` has `status: "done"` in the artifacts array
- Stop when all `applyRequires` artifacts are done
c. **If an artifact requires user input** (unclear context):
- Use **AskUserQuestion tool** to clarify
- Then continue with creation
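The apply-ready check in step 4b can be sketched as follows. The JSON here is a hypothetical stand-in for `openspec status --json` output, using only the field names described above (`applyRequires`, `artifacts`, `status`); the real schema may differ, including the key used for the artifact ID.

```python
import json

# Hypothetical status output for a change mid-creation
status_json = """
{
  "schemaName": "spec-driven",
  "applyRequires": ["tasks"],
  "artifacts": [
    {"id": "proposal", "status": "done"},
    {"id": "design",   "status": "done"},
    {"id": "tasks",    "status": "in-progress"}
  ]
}
"""

status = json.loads(status_json)
done = {a["id"] for a in status["artifacts"] if a["status"] == "done"}

# Apply-ready only when every required artifact is done
apply_ready = all(aid in done for aid in status["applyRequires"])
print(apply_ready)  # → False (tasks is still in progress)
```

Re-running this check after each artifact is created is what decides when the loop in step 4 stops.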
5. **Show final status**
```bash
openspec status --change "<name>"
```
**Output**
After completing all artifacts, summarize:
- Change name and location
- List of artifacts created with brief descriptions
- What's ready: "All artifacts created! Ready for implementation."
- Prompt: "Run `/opsx:apply` (or just ask me to implement) to start working on the tasks."
**Artifact Creation Guidelines**
- Follow the `instruction` field from `openspec instructions` for each artifact type
- The schema defines what each artifact should contain - follow it
- Read dependency artifacts for context before creating new ones
- Use `template` as the structure for your output file - fill in its sections
- **IMPORTANT**: `context` and `rules` are constraints for YOU, not content for the file
- Do NOT copy `<context>`, `<rules>`, `<project_context>` blocks into the artifact
- These guide what you write, but should never appear in the output
**Guardrails**
- Create ALL artifacts needed for implementation (as defined by schema's `apply.requires`)
- Always read dependency artifacts before creating a new one
- If context is critically unclear, ask the user - but prefer making reasonable decisions to keep momentum
- If a change with that name already exists, ask if user wants to continue it or create a new one
- Verify each artifact file exists after writing before proceeding to next

.gitignore

@@ -211,9 +211,11 @@ datasets/
data/
deps/
outputs/
# Vibe Coding
.sisyphus
.claude/
CLAUDE.md
.claude/settings.local.json
openspec/changes/
# Devenv
.devenv*


@@ -1,13 +0,0 @@
activate:
micromamba activate ./.venv
update-venv:
micromamba env export --no-builds | grep -v "prefix" > venv.yaml
download-test:
python -m habitat_sim.utils.datasets_download --uids habitat_test_scenes --data-path data/
python -m habitat_sim.utils.datasets_download --uids habitat_test_pointnav_dataset --data-path data/
python -m habitat_sim.utils.datasets_download --uids replica_cad_dataset --data-path data/
python -m habitat_sim.utils.datasets_download --uids rearrange_dataset_v2 --data-path data/
python -m habitat_sim.utils.datasets_download --uids hab_fetch --data-path data/
python -m habitat_sim.utils.datasets_download --uids ycb --data-path data/


@@ -0,0 +1,149 @@
---
description: Implement tasks from an OpenSpec change (Experimental)
---
Implement tasks from an OpenSpec change.
**Input**: Optionally specify a change name (e.g., `/opsx-apply add-auth`). If omitted, check if it can be inferred from conversation context. If vague or ambiguous, you MUST prompt for available changes.
**Steps**
1. **Select the change**
If a name is provided, use it. Otherwise:
- Infer from conversation context if the user mentioned a change
- Auto-select if only one active change exists
- If ambiguous, run `openspec list --json` to get available changes and use the **AskUserQuestion tool** to let the user select
Always announce: "Using change: <name>" and how to override (e.g., `/opsx-apply <other>`).
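The selection rule above (auto-select only when exactly one active change exists, otherwise prompt) can be sketched like this; the list JSON is a hypothetical shape for `openspec list --json`, not its documented output.

```python
import json

# Hypothetical `openspec list --json` output with one active change
list_json = '[{"name": "add-auth", "schema": "spec-driven", "status": "active"}]'
changes = json.loads(list_json)

if len(changes) == 1:
    selected = changes[0]["name"]   # unambiguous: auto-select
    print(f"Using change: {selected}")
else:
    selected = None                 # ambiguous: prompt via AskUserQuestion
```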
2. **Check status to understand the schema**
```bash
openspec status --change "<name>" --json
```
Parse the JSON to understand:
- `schemaName`: The workflow being used (e.g., "spec-driven")
- Which artifact contains the tasks (typically "tasks" for spec-driven, check status for others)
3. **Get apply instructions**
```bash
openspec instructions apply --change "<name>" --json
```
This returns:
- Context file paths (varies by schema)
- Progress (total, complete, remaining)
- Task list with status
- Dynamic instruction based on current state
**Handle states:**
- If `state: "blocked"` (missing artifacts): show message, suggest using `/opsx-continue`
- If `state: "all_done"`: congratulate, suggest archive
- Otherwise: proceed to implementation
4. **Read context files**
Read the files listed in `contextFiles` from the apply instructions output.
The files depend on the schema being used:
- **spec-driven**: proposal, specs, design, tasks
- Other schemas: follow the contextFiles from CLI output
5. **Show current progress**
Display:
- Schema being used
- Progress: "N/M tasks complete"
- Remaining tasks overview
- Dynamic instruction from CLI
6. **Implement tasks (loop until done or blocked)**
For each pending task:
- Show which task is being worked on
- Make the code changes required
- Keep changes minimal and focused
- Mark task complete in the tasks file: `- [ ]` → `- [x]`
- Continue to next task
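The checkbox flip in the loop above is a plain text substitution; a minimal sketch, with a hypothetical task name and file body:

```python
# Two tasks, one already complete (hypothetical tasks.md content)
tasks_md = "- [x] Add login route\n- [ ] Wire session store\n"

# Mark the just-finished task complete: "- [ ]" → "- [x]"
done_task = "Wire session store"
tasks_md = tasks_md.replace(f"- [ ] {done_task}", f"- [x] {done_task}", 1)
print(tasks_md)
```

Updating the file immediately after each task keeps the progress count accurate if the session is interrupted.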
**Pause if:**
- Task is unclear → ask for clarification
- Implementation reveals a design issue → suggest updating artifacts
- Error or blocker encountered → report and wait for guidance
- User interrupts
7. **On completion or pause, show status**
Display:
- Tasks completed this session
- Overall progress: "N/M tasks complete"
- If all done: suggest archive
- If paused: explain why and wait for guidance
**Output During Implementation**
```
## Implementing: <change-name> (schema: <schema-name>)
Working on task 3/7: <task description>
[...implementation happening...]
✓ Task complete
Working on task 4/7: <task description>
[...implementation happening...]
✓ Task complete
```
**Output On Completion**
```
## Implementation Complete
**Change:** <change-name>
**Schema:** <schema-name>
**Progress:** 7/7 tasks complete ✓
### Completed This Session
- [x] Task 1
- [x] Task 2
...
All tasks complete! You can archive this change with `/opsx-archive`.
```
**Output On Pause (Issue Encountered)**
```
## Implementation Paused
**Change:** <change-name>
**Schema:** <schema-name>
**Progress:** 4/7 tasks complete
### Issue Encountered
<description of the issue>
**Options:**
1. <option 1>
2. <option 2>
3. Other approach
What would you like to do?
```
**Guardrails**
- Keep going through tasks until done or blocked
- Always read context files before starting (from the apply instructions output)
- If task is ambiguous, pause and ask before implementing
- If implementation reveals issues, pause and suggest artifact updates
- Keep code changes minimal and scoped to each task
- Update task checkbox immediately after completing each task
- Pause on errors, blockers, or unclear requirements - don't guess
- Use contextFiles from CLI output, don't assume specific file names
**Fluid Workflow Integration**
This skill supports the "actions on a change" model:
- **Can be invoked anytime**: Before all artifacts are done (if tasks exist), after partial implementation, interleaved with other actions
- **Allows artifact updates**: If implementation reveals design issues, suggest updating artifacts - not phase-locked, work fluidly


@@ -0,0 +1,154 @@
---
description: Archive a completed change in the experimental workflow
---
Archive a completed change in the experimental workflow.
**Input**: Optionally specify a change name after `/opsx-archive` (e.g., `/opsx-archive add-auth`). If omitted, check if it can be inferred from conversation context. If vague or ambiguous, you MUST prompt for available changes.
**Steps**
1. **If no change name provided, prompt for selection**
Run `openspec list --json` to get available changes. Use the **AskUserQuestion tool** to let the user select.
Show only active changes (not already archived).
Include the schema used for each change if available.
**IMPORTANT**: Do NOT guess or auto-select a change. Always let the user choose.
2. **Check artifact completion status**
Run `openspec status --change "<name>" --json` to check artifact completion.
Parse the JSON to understand:
- `schemaName`: The workflow being used
- `artifacts`: List of artifacts with their status (`done` or other)
**If any artifacts are not `done`:**
- Display warning listing incomplete artifacts
- Prompt user for confirmation to continue
- Proceed if user confirms
3. **Check task completion status**
Read the tasks file (typically `tasks.md`) to check for incomplete tasks.
Count tasks marked with `- [ ]` (incomplete) vs `- [x]` (complete).
**If incomplete tasks found:**
- Display warning showing count of incomplete tasks
- Prompt user for confirmation to continue
- Proceed if user confirms
**If no tasks file exists:** Proceed without task-related warning.
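The count described in this step amounts to tallying checkbox prefixes; a sketch with a hypothetical tasks file body:

```python
# Hypothetical tasks.md content
tasks_md = """\
- [x] Add login route
- [ ] Wire session store
- [ ] Update docs
"""

lines = tasks_md.splitlines()
incomplete = sum(1 for line in lines if line.startswith("- [ ]"))
complete = sum(1 for line in lines if line.startswith("- [x]"))
print(f"{complete} complete, {incomplete} incomplete")  # → 1 complete, 2 incomplete
```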
4. **Assess delta spec sync state**
Check for delta specs at `openspec/changes/<name>/specs/`. If none exist, proceed without sync prompt.
**If delta specs exist:**
- Compare each delta spec with its corresponding main spec at `openspec/specs/<capability>/spec.md`
- Determine what changes would be applied (adds, modifications, removals, renames)
- Show a combined summary before prompting
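Mapping each delta spec to its main counterpart follows directly from the path convention quoted above; a sketch, with a hypothetical change and capability name:

```python
from pathlib import Path

# One delta spec found under the change directory (hypothetical path)
deltas = [Path("openspec/changes/add-auth/specs/auth/spec.md")]

# The capability name is the delta's parent directory; the main spec
# lives at openspec/specs/<capability>/spec.md
pairs = [(d, Path("openspec/specs") / d.parent.name / "spec.md") for d in deltas]
print(pairs[0][1])
```

Each (delta, main) pair is what gets diffed to build the combined summary shown before prompting.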
**Prompt options:**
- If changes needed: "Sync now (recommended)", "Archive without syncing"
- If already synced: "Archive now", "Sync anyway", "Cancel"
If user chooses sync, use Task tool (subagent_type: "general-purpose", prompt: "Use Skill tool to invoke openspec-sync-specs for change '<name>'. Delta spec analysis: <include the analyzed delta spec summary>"). Proceed to archive regardless of choice.
5. **Perform the archive**
Create the archive directory if it doesn't exist:
```bash
mkdir -p openspec/changes/archive
```
Generate target name using current date: `YYYY-MM-DD-<change-name>`
**Check if target already exists:**
- If yes: Fail with error, suggest renaming existing archive or using different date
- If no: Move the change directory to archive
```bash
mv openspec/changes/<name> openspec/changes/archive/YYYY-MM-DD-<name>
```
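The target-naming and collision check in this step can be sketched as follows; "add-auth" is a hypothetical change name:

```python
from datetime import date
from pathlib import Path

name = "add-auth"  # hypothetical change name
target = Path("openspec/changes/archive") / f"{date.today():%Y-%m-%d}-{name}"

if target.exists():
    # Fail rather than overwrite an existing archive
    print(f"Archive failed: {target} already exists")
else:
    print(f"Would move openspec/changes/{name} -> {target}")
```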
6. **Display summary**
Show archive completion summary including:
- Change name
- Schema that was used
- Archive location
- Spec sync status (synced / sync skipped / no delta specs)
- Note about any warnings (incomplete artifacts/tasks)
**Output On Success**
```
## Archive Complete
**Change:** <change-name>
**Schema:** <schema-name>
**Archived to:** openspec/changes/archive/YYYY-MM-DD-<name>/
**Specs:** ✓ Synced to main specs
All artifacts complete. All tasks complete.
```
**Output On Success (No Delta Specs)**
```
## Archive Complete
**Change:** <change-name>
**Schema:** <schema-name>
**Archived to:** openspec/changes/archive/YYYY-MM-DD-<name>/
**Specs:** No delta specs
All artifacts complete. All tasks complete.
```
**Output On Success With Warnings**
```
## Archive Complete (with warnings)
**Change:** <change-name>
**Schema:** <schema-name>
**Archived to:** openspec/changes/archive/YYYY-MM-DD-<name>/
**Specs:** Sync skipped (user chose to skip)
**Warnings:**
- Archived with 2 incomplete artifacts
- Archived with 3 incomplete tasks
- Delta spec sync was skipped (user chose to skip)
Review the archive if this was not intentional.
```
**Output On Error (Archive Exists)**
```
## Archive Failed
**Change:** <change-name>
**Target:** openspec/changes/archive/YYYY-MM-DD-<name>/
Target archive directory already exists.
**Options:**
1. Rename the existing archive
2. Delete the existing archive if it's a duplicate
3. Wait until a different date to archive
```
**Guardrails**
- Always prompt for change selection if not provided
- Use artifact graph (openspec status --json) for completion checking
- Don't block archive on warnings - just inform and confirm
- Preserve .openspec.yaml when moving to archive (it moves with the directory)
- Show clear summary of what happened
- If sync is requested, use the Skill tool to invoke `openspec-sync-specs` (agent-driven)
- If delta specs exist, always run the sync assessment and show the combined summary before prompting


@@ -0,0 +1,170 @@
---
description: Enter explore mode - think through ideas, investigate problems, clarify requirements
---
Enter explore mode. Think deeply. Visualize freely. Follow the conversation wherever it goes.
**IMPORTANT: Explore mode is for thinking, not implementing.** You may read files, search code, and investigate the codebase, but you must NEVER write code or implement features. If the user asks you to implement something, remind them to exit explore mode first and create a change proposal. You MAY create OpenSpec artifacts (proposals, designs, specs) if the user asks—that's capturing thinking, not implementing.
**This is a stance, not a workflow.** There are no fixed steps, no required sequence, no mandatory outputs. You're a thinking partner helping the user explore.
**Input**: The argument after `/opsx-explore` is whatever the user wants to think about. Could be:
- A vague idea: "real-time collaboration"
- A specific problem: "the auth system is getting unwieldy"
- A change name: "add-dark-mode" (to explore in context of that change)
- A comparison: "postgres vs sqlite for this"
- Nothing (just enter explore mode)
---
## The Stance
- **Curious, not prescriptive** - Ask questions that emerge naturally, don't follow a script
- **Open threads, not interrogations** - Surface multiple interesting directions and let the user follow what resonates. Don't funnel them through a single path of questions.
- **Visual** - Use ASCII diagrams liberally when they'd help clarify thinking
- **Adaptive** - Follow interesting threads, pivot when new information emerges
- **Patient** - Don't rush to conclusions, let the shape of the problem emerge
- **Grounded** - Explore the actual codebase when relevant, don't just theorize
---
## What You Might Do
Depending on what the user brings, you might:
**Explore the problem space**
- Ask clarifying questions that emerge from what they said
- Challenge assumptions
- Reframe the problem
- Find analogies
**Investigate the codebase**
- Map existing architecture relevant to the discussion
- Find integration points
- Identify patterns already in use
- Surface hidden complexity
**Compare options**
- Brainstorm multiple approaches
- Build comparison tables
- Sketch tradeoffs
- Recommend a path (if asked)
**Visualize**
```
┌─────────────────────────────────────────┐
│      Use ASCII diagrams liberally       │
├─────────────────────────────────────────┤
│                                         │
│   ┌────────┐         ┌────────┐         │
│   │ State  │────────▶│ State  │         │
│   │   A    │         │   B    │         │
│   └────────┘         └────────┘         │
│                                         │
│   System diagrams, state machines,      │
│   data flows, architecture sketches,    │
│   dependency graphs, comparison tables  │
│                                         │
└─────────────────────────────────────────┘
```
**Surface risks and unknowns**
- Identify what could go wrong
- Find gaps in understanding
- Suggest spikes or investigations
---
## OpenSpec Awareness
You have full context of the OpenSpec system. Use it naturally, don't force it.
### Check for context
At the start, quickly check what exists:
```bash
openspec list --json
```
This tells you:
- If there are active changes
- Their names, schemas, and status
- What the user might be working on
If the user mentioned a specific change name, read its artifacts for context.
### When no change exists
Think freely. When insights crystallize, you might offer:
- "This feels solid enough to start a change. Want me to create a proposal?"
- Or keep exploring - no pressure to formalize
### When a change exists
If the user mentions a change or you detect one is relevant:
1. **Read existing artifacts for context**
- `openspec/changes/<name>/proposal.md`
- `openspec/changes/<name>/design.md`
- `openspec/changes/<name>/tasks.md`
- etc.
2. **Reference them naturally in conversation**
- "Your design mentions using Redis, but we just realized SQLite fits better..."
- "The proposal scopes this to premium users, but we're now thinking everyone..."
3. **Offer to capture when decisions are made**
| Insight Type | Where to Capture |
|--------------|------------------|
| New requirement discovered | `specs/<capability>/spec.md` |
| Requirement changed | `specs/<capability>/spec.md` |
| Design decision made | `design.md` |
| Scope changed | `proposal.md` |
| New work identified | `tasks.md` |
| Assumption invalidated | Relevant artifact |
Example offers:
- "That's a design decision. Capture it in design.md?"
- "This is a new requirement. Add it to specs?"
- "This changes scope. Update the proposal?"
4. **The user decides** - Offer and move on. Don't pressure. Don't auto-capture.
---
## What You Don't Have To Do
- Follow a script
- Ask the same questions every time
- Produce a specific artifact
- Reach a conclusion
- Stay on topic if a tangent is valuable
- Be brief (this is thinking time)
---
## Ending Discovery
There's no required ending. Discovery might:
- **Flow into a proposal**: "Ready to start? I can create a change proposal."
- **Result in artifact updates**: "Updated design.md with these decisions"
- **Just provide clarity**: User has what they need, moves on
- **Continue later**: "We can pick this up anytime"
When things crystallize, you might offer a summary - but it's optional. Sometimes the thinking IS the value.
---
## Guardrails
- **Don't implement** - Never write code or implement features. Creating OpenSpec artifacts is fine, writing application code is not.
- **Don't fake understanding** - If something is unclear, dig deeper
- **Don't rush** - Discovery is thinking time, not task time
- **Don't force structure** - Let patterns emerge naturally
- **Don't auto-capture** - Offer to save insights, don't just do it
- **Do visualize** - A good diagram is worth many paragraphs
- **Do explore the codebase** - Ground discussions in reality
- **Do question assumptions** - Including the user's and your own


@@ -0,0 +1,103 @@
---
description: Propose a new change - create it and generate all artifacts in one step
---
Propose a new change - create the change and generate all artifacts in one step.
I'll create a change with artifacts:
- proposal.md (what & why)
- design.md (how)
- tasks.md (implementation steps)
When ready to implement, run /opsx-apply
---
**Input**: The argument after `/opsx-propose` is the change name (kebab-case), OR a description of what the user wants to build.
**Steps**
1. **If no input provided, ask what they want to build**
Use the **AskUserQuestion tool** (open-ended, no preset options) to ask:
> "What change do you want to work on? Describe what you want to build or fix."
From their description, derive a kebab-case name (e.g., "add user authentication" → `add-user-auth`).
**IMPORTANT**: Do NOT proceed without understanding what the user wants to build.
2. **Create the change directory**
```bash
openspec new change "<name>"
```
This creates a scaffolded change at `openspec/changes/<name>/` with `.openspec.yaml`.
3. **Get the artifact build order**
```bash
openspec status --change "<name>" --json
```
Parse the JSON to get:
- `applyRequires`: array of artifact IDs needed before implementation (e.g., `["tasks"]`)
- `artifacts`: list of all artifacts with their status and dependencies
4. **Create artifacts in sequence until apply-ready**
Use the **TodoWrite tool** to track progress through the artifacts.
Loop through artifacts in dependency order (artifacts with no pending dependencies first):
a. **For each artifact that is `ready` (dependencies satisfied)**:
- Get instructions:
```bash
openspec instructions <artifact-id> --change "<name>" --json
```
- The instructions JSON includes:
- `context`: Project background (constraints for you - do NOT include in output)
- `rules`: Artifact-specific rules (constraints for you - do NOT include in output)
- `template`: The structure to use for your output file
- `instruction`: Schema-specific guidance for this artifact type
- `outputPath`: Where to write the artifact
- `dependencies`: Completed artifacts to read for context
- Read any completed dependency files for context
- Create the artifact file using `template` as the structure
- Apply `context` and `rules` as constraints - but do NOT copy them into the file
- Show brief progress: "Created <artifact-id>"
b. **Continue until all `applyRequires` artifacts are complete**
- After creating each artifact, re-run `openspec status --change "<name>" --json`
- Check if every artifact ID in `applyRequires` has `status: "done"` in the artifacts array
- Stop when all `applyRequires` artifacts are done
c. **If an artifact requires user input** (unclear context):
- Use **AskUserQuestion tool** to clarify
- Then continue with creation
5. **Show final status**
```bash
openspec status --change "<name>"
```
**Output**
After completing all artifacts, summarize:
- Change name and location
- List of artifacts created with brief descriptions
- What's ready: "All artifacts created! Ready for implementation."
- Prompt: "Run `/opsx-apply` to start implementing."
**Artifact Creation Guidelines**
- Follow the `instruction` field from `openspec instructions` for each artifact type
- The schema defines what each artifact should contain - follow it
- Read dependency artifacts for context before creating new ones
- Use `template` as the structure for your output file - fill in its sections
- **IMPORTANT**: `context` and `rules` are constraints for YOU, not content for the file
- Do NOT copy `<context>`, `<rules>`, `<project_context>` blocks into the artifact
- These guide what you write, but should never appear in the output
**Guardrails**
- Create ALL artifacts needed for implementation (as defined by schema's `apply.requires`)
- Always read dependency artifacts before creating a new one
- If context is critically unclear, ask the user - but prefer making reasonable decisions to keep momentum
- If a change with that name already exists, ask if user wants to continue it or create a new one
- Verify each artifact file exists after writing before proceeding to next


@@ -0,0 +1,82 @@
/**
* Memorix — Cross-Agent Memory Bridge Plugin for OpenCode
*
* Automatically captures session context and tool usage,
* piping events to `memorix hook` for cross-agent memory persistence.
*
* Generated by: memorix installHooks('opencode', projectRoot)
* Docs: https://github.com/AVIDS2/memorix
*/
export const MemorixPlugin = async ({ project, client, $, directory, worktree }) => {
console.log('[memorix] plugin loaded, directory:', directory);
/** Pipe event JSON to memorix hook via temp file (Windows .cmd stdin workaround) */
async function runHook(payload) {
const tmpDir = Bun.env.TEMP || Bun.env.TMP || '/tmp';
const tmpPath = `${tmpDir}/memorix-hook-${Date.now()}.json`;
try {
const data = JSON.stringify(payload);
await Bun.write(tmpPath, data);
// cat | pipe works through .cmd wrappers; < redirect does NOT
await $`cat ${tmpPath} | memorix hook`.quiet().nothrow();
console.log('[memorix] hook fired:', payload.hook_event_name);
} catch (err) {
console.log('[memorix] hook error:', err?.message ?? err);
} finally {
try { const { unlinkSync } = await import('node:fs'); unlinkSync(tmpPath); } catch {}
}
}
return {
/** Catch-all event handler for session lifecycle + file events */
event: async ({ event }) => {
if (event.type === 'session.created') {
await runHook({
agent: 'opencode',
hook_event_name: 'session.created',
cwd: directory,
});
} else if (event.type === 'session.idle') {
await runHook({
agent: 'opencode',
hook_event_name: 'session.idle',
cwd: directory,
});
} else if (event.type === 'file.edited') {
await runHook({
agent: 'opencode',
hook_event_name: 'file.edited',
file_path: event.properties?.path ?? '',
cwd: directory,
});
} else if (event.type === 'command.executed') {
await runHook({
agent: 'opencode',
hook_event_name: 'command.executed',
command: event.properties?.command ?? '',
cwd: directory,
});
}
},
/** Record tool usage after execution (hook, not event) */
'tool.execute.after': async (input, output) => {
await runHook({
agent: 'opencode',
hook_event_name: 'tool.execute.after',
tool_name: input.tool,
tool_input: input.args,
cwd: directory,
});
},
/** Inject memorix context into compaction prompt */
'experimental.session.compacting': async (input, output) => {
output.context.push(
'## Memorix Cross-Agent Memory\n' +
'Before compacting, use memorix_store to save important discoveries, decisions, and gotchas.\n' +
'After compacting, use memorix_session_start to reload session context, then memorix_search for specific topics.'
);
},
};
};


@@ -0,0 +1,156 @@
---
name: openspec-apply-change
description: Implement tasks from an OpenSpec change. Use when the user wants to start implementing, continue implementation, or work through tasks.
license: MIT
compatibility: Requires openspec CLI.
metadata:
author: openspec
version: "1.0"
generatedBy: "1.2.0"
---
Implement tasks from an OpenSpec change.
**Input**: Optionally specify a change name. If omitted, check if it can be inferred from conversation context. If vague or ambiguous, you MUST prompt for available changes.
**Steps**
1. **Select the change**
If a name is provided, use it. Otherwise:
- Infer from conversation context if the user mentioned a change
- Auto-select if only one active change exists
- If ambiguous, run `openspec list --json` to get available changes and use the **AskUserQuestion tool** to let the user select
Always announce: "Using change: <name>" and how to override (e.g., `/opsx-apply <other>`).
2. **Check status to understand the schema**
```bash
openspec status --change "<name>" --json
```
Parse the JSON to understand:
- `schemaName`: The workflow being used (e.g., "spec-driven")
- Which artifact contains the tasks (typically "tasks" for spec-driven, check status for others)
3. **Get apply instructions**
```bash
openspec instructions apply --change "<name>" --json
```
This returns:
- Context file paths (varies by schema - could be proposal/specs/design/tasks or spec/tests/implementation/docs)
- Progress (total, complete, remaining)
- Task list with status
- Dynamic instruction based on current state
**Handle states:**
- If `state: "blocked"` (missing artifacts): show message, suggest using openspec-continue-change
- If `state: "all_done"`: congratulate, suggest archive
- Otherwise: proceed to implementation
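The state handling above can be sketched as a small dispatcher. This is an illustrative sketch only: the `state` values come from the description above, but the exact JSON payload shape of `openspec instructions apply --json` is an assumption.

```python
import json

def next_action(instructions_json: str) -> str:
    """Map the `state` field of the apply-instructions JSON to the
    skill's next step (payload shape assumed for illustration)."""
    data = json.loads(instructions_json)
    state = data.get("state")
    if state == "blocked":
        return "suggest openspec-continue-change"
    if state == "all_done":
        return "congratulate and suggest archive"
    return "proceed to implementation"
```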
4. **Read context files**
Read the files listed in `contextFiles` from the apply instructions output.
The files depend on the schema being used:
- **spec-driven**: proposal, specs, design, tasks
- Other schemas: follow the contextFiles from CLI output
5. **Show current progress**
Display:
- Schema being used
- Progress: "N/M tasks complete"
- Remaining tasks overview
- Dynamic instruction from CLI
6. **Implement tasks (loop until done or blocked)**
For each pending task:
- Show which task is being worked on
- Make the code changes required
- Keep changes minimal and focused
- Mark task complete in the tasks file: `- [ ]` → `- [x]`
- Continue to next task
**Pause if:**
- Task is unclear → ask for clarification
- Implementation reveals a design issue → suggest updating artifacts
- Error or blocker encountered → report and wait for guidance
- User interrupts
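The checkbox update in the loop above (`- [ ]` → `- [x]`) can be sketched as a one-line substitution. This is a minimal sketch assuming a standard markdown task list; real tasks files may nest or annotate tasks differently.

```python
import re

def mark_first_pending(tasks_md: str) -> str:
    """Flip the first unchecked `- [ ]` task to `- [x]`,
    leaving later pending tasks untouched."""
    return re.sub(r"- \[ \]", "- [x]", tasks_md, count=1)
```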
7. **On completion or pause, show status**
Display:
- Tasks completed this session
- Overall progress: "N/M tasks complete"
- If all done: suggest archive
- If paused: explain why and wait for guidance
**Output During Implementation**
```
## Implementing: <change-name> (schema: <schema-name>)
Working on task 3/7: <task description>
[...implementation happening...]
✓ Task complete
Working on task 4/7: <task description>
[...implementation happening...]
✓ Task complete
```
**Output On Completion**
```
## Implementation Complete
**Change:** <change-name>
**Schema:** <schema-name>
**Progress:** 7/7 tasks complete ✓
### Completed This Session
- [x] Task 1
- [x] Task 2
...
All tasks complete! Ready to archive this change.
```
**Output On Pause (Issue Encountered)**
```
## Implementation Paused
**Change:** <change-name>
**Schema:** <schema-name>
**Progress:** 4/7 tasks complete
### Issue Encountered
<description of the issue>
**Options:**
1. <option 1>
2. <option 2>
3. Other approach
What would you like to do?
```
**Guardrails**
- Keep going through tasks until done or blocked
- Always read context files before starting (from the apply instructions output)
- If task is ambiguous, pause and ask before implementing
- If implementation reveals issues, pause and suggest artifact updates
- Keep code changes minimal and scoped to each task
- Update task checkbox immediately after completing each task
- Pause on errors, blockers, or unclear requirements - don't guess
- Use contextFiles from CLI output, don't assume specific file names
**Fluid Workflow Integration**
This skill supports the "actions on a change" model:
- **Can be invoked anytime**: Before all artifacts are done (if tasks exist), after partial implementation, interleaved with other actions
- **Allows artifact updates**: If implementation reveals design issues, suggest updating artifacts - not phase-locked, work fluidly


@@ -0,0 +1,114 @@
---
name: openspec-archive-change
description: Archive a completed change in the experimental workflow. Use when the user wants to finalize and archive a change after implementation is complete.
license: MIT
compatibility: Requires openspec CLI.
metadata:
author: openspec
version: "1.0"
generatedBy: "1.2.0"
---
Archive a completed change in the experimental workflow.
**Input**: Optionally specify a change name. If omitted, check if it can be inferred from conversation context. If vague or ambiguous you MUST prompt for available changes.
**Steps**
1. **If no change name provided, prompt for selection**
Run `openspec list --json` to get available changes. Use the **AskUserQuestion tool** to let the user select.
Show only active changes (not already archived).
Include the schema used for each change if available.
**IMPORTANT**: Do NOT guess or auto-select a change. Always let the user choose.
2. **Check artifact completion status**
Run `openspec status --change "<name>" --json` to check artifact completion.
Parse the JSON to understand:
- `schemaName`: The workflow being used
- `artifacts`: List of artifacts with their status (`done` or other)
**If any artifacts are not `done`:**
- Display warning listing incomplete artifacts
- Use **AskUserQuestion tool** to confirm user wants to proceed
- Proceed if user confirms
3. **Check task completion status**
Read the tasks file (typically `tasks.md`) to check for incomplete tasks.
Count tasks marked with `- [ ]` (incomplete) vs `- [x]` (complete).
**If incomplete tasks found:**
- Display warning showing count of incomplete tasks
- Use **AskUserQuestion tool** to confirm user wants to proceed
- Proceed if user confirms
**If no tasks file exists:** Proceed without task-related warning.
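The completion count described above can be sketched as simple marker counting. A minimal sketch, assuming lowercase `x` in completed checkboxes as shown in this document.

```python
def task_progress(tasks_md: str) -> tuple[int, int]:
    """Return (complete, incomplete) counts based on
    `- [x]` and `- [ ]` markdown task markers."""
    complete = tasks_md.count("- [x]")
    incomplete = tasks_md.count("- [ ]")
    return complete, incomplete
```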
4. **Assess delta spec sync state**
Check for delta specs at `openspec/changes/<name>/specs/`. If none exist, proceed without sync prompt.
**If delta specs exist:**
- Compare each delta spec with its corresponding main spec at `openspec/specs/<capability>/spec.md`
- Determine what changes would be applied (adds, modifications, removals, renames)
- Show a combined summary before prompting
**Prompt options:**
- If changes needed: "Sync now (recommended)", "Archive without syncing"
- If already synced: "Archive now", "Sync anyway", "Cancel"
If user chooses sync, use Task tool (subagent_type: "general-purpose", prompt: "Use Skill tool to invoke openspec-sync-specs for change '<name>'. Delta spec analysis: <include the analyzed delta spec summary>"). Proceed to archive regardless of choice.
5. **Perform the archive**
Create the archive directory if it doesn't exist:
```bash
mkdir -p openspec/changes/archive
```
Generate target name using current date: `YYYY-MM-DD-<change-name>`
**Check if target already exists:**
- If yes: Fail with error, suggest renaming existing archive or using different date
- If no: Move the change directory to archive
```bash
mv openspec/changes/<name> openspec/changes/archive/YYYY-MM-DD-<name>
```
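The target-naming and collision rules above can be sketched as follows. A hedged sketch: the `YYYY-MM-DD-<name>` convention and fail-fast behavior are from the steps above, while the function name and error message are illustrative.

```python
from datetime import date
from pathlib import Path

def archive_target(change: str, root: str = "openspec/changes") -> Path:
    """Build the date-stamped archive path and fail fast if it
    already exists, mirroring the check described above."""
    target = Path(root) / "archive" / f"{date.today():%Y-%m-%d}-{change}"
    if target.exists():
        raise FileExistsError(f"{target} exists; rename it or use a different date")
    return target
```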
6. **Display summary**
Show archive completion summary including:
- Change name
- Schema that was used
- Archive location
- Whether specs were synced (if applicable)
- Note about any warnings (incomplete artifacts/tasks)
**Output On Success**
```
## Archive Complete
**Change:** <change-name>
**Schema:** <schema-name>
**Archived to:** openspec/changes/archive/YYYY-MM-DD-<name>/
**Specs:** ✓ Synced to main specs (or "No delta specs" or "Sync skipped")
All artifacts complete. All tasks complete.
```
**Guardrails**
- Always prompt for change selection if not provided
- Use artifact graph (openspec status --json) for completion checking
- Don't block archive on warnings - just inform and confirm
- Preserve .openspec.yaml when moving to archive (it moves with the directory)
- Show clear summary of what happened
- If sync is requested, use openspec-sync-specs approach (agent-driven)
- If delta specs exist, always run the sync assessment and show the combined summary before prompting


@@ -0,0 +1,288 @@
---
name: openspec-explore
description: Enter explore mode - a thinking partner for exploring ideas, investigating problems, and clarifying requirements. Use when the user wants to think through something before or during a change.
license: MIT
compatibility: Requires openspec CLI.
metadata:
author: openspec
version: "1.0"
generatedBy: "1.2.0"
---
Enter explore mode. Think deeply. Visualize freely. Follow the conversation wherever it goes.
**IMPORTANT: Explore mode is for thinking, not implementing.** You may read files, search code, and investigate the codebase, but you must NEVER write code or implement features. If the user asks you to implement something, remind them to exit explore mode first and create a change proposal. You MAY create OpenSpec artifacts (proposals, designs, specs) if the user asks—that's capturing thinking, not implementing.
**This is a stance, not a workflow.** There are no fixed steps, no required sequence, no mandatory outputs. You're a thinking partner helping the user explore.
---
## The Stance
- **Curious, not prescriptive** - Ask questions that emerge naturally, don't follow a script
- **Open threads, not interrogations** - Surface multiple interesting directions and let the user follow what resonates. Don't funnel them through a single path of questions.
- **Visual** - Use ASCII diagrams liberally when they'd help clarify thinking
- **Adaptive** - Follow interesting threads, pivot when new information emerges
- **Patient** - Don't rush to conclusions, let the shape of the problem emerge
- **Grounded** - Explore the actual codebase when relevant, don't just theorize
---
## What You Might Do
Depending on what the user brings, you might:
**Explore the problem space**
- Ask clarifying questions that emerge from what they said
- Challenge assumptions
- Reframe the problem
- Find analogies
**Investigate the codebase**
- Map existing architecture relevant to the discussion
- Find integration points
- Identify patterns already in use
- Surface hidden complexity
**Compare options**
- Brainstorm multiple approaches
- Build comparison tables
- Sketch tradeoffs
- Recommend a path (if asked)
**Visualize**
```
┌─────────────────────────────────────────┐
│ Use ASCII diagrams liberally │
├─────────────────────────────────────────┤
│ │
│ ┌────────┐ ┌────────┐ │
│ │ State │────────▶│ State │ │
│ │ A │ │ B │ │
│ └────────┘ └────────┘ │
│ │
│ System diagrams, state machines, │
│ data flows, architecture sketches, │
│ dependency graphs, comparison tables │
│ │
└─────────────────────────────────────────┘
```
**Surface risks and unknowns**
- Identify what could go wrong
- Find gaps in understanding
- Suggest spikes or investigations
---
## OpenSpec Awareness
You have full context of the OpenSpec system. Use it naturally, don't force it.
### Check for context
At the start, quickly check what exists:
```bash
openspec list --json
```
This tells you:
- If there are active changes
- Their names, schemas, and status
- What the user might be working on
### When no change exists
Think freely. When insights crystallize, you might offer:
- "This feels solid enough to start a change. Want me to create a proposal?"
- Or keep exploring - no pressure to formalize
### When a change exists
If the user mentions a change or you detect one is relevant:
1. **Read existing artifacts for context**
- `openspec/changes/<name>/proposal.md`
- `openspec/changes/<name>/design.md`
- `openspec/changes/<name>/tasks.md`
- etc.
2. **Reference them naturally in conversation**
- "Your design mentions using Redis, but we just realized SQLite fits better..."
- "The proposal scopes this to premium users, but we're now thinking everyone..."
3. **Offer to capture when decisions are made**
| Insight Type | Where to Capture |
|--------------|------------------|
| New requirement discovered | `specs/<capability>/spec.md` |
| Requirement changed | `specs/<capability>/spec.md` |
| Design decision made | `design.md` |
| Scope changed | `proposal.md` |
| New work identified | `tasks.md` |
| Assumption invalidated | Relevant artifact |
Example offers:
- "That's a design decision. Capture it in design.md?"
- "This is a new requirement. Add it to specs?"
- "This changes scope. Update the proposal?"
4. **The user decides** - Offer and move on. Don't pressure. Don't auto-capture.
---
## What You Don't Have To Do
- Follow a script
- Ask the same questions every time
- Produce a specific artifact
- Reach a conclusion
- Stay on topic if a tangent is valuable
- Be brief (this is thinking time)
---
## Handling Different Entry Points
**User brings a vague idea:**
```
User: I'm thinking about adding real-time collaboration
You: Real-time collab is a big space. Let me think about this...
COLLABORATION SPECTRUM
════════════════════════════════════════════
Awareness Coordination Sync
│ │ │
▼ ▼ ▼
┌────────┐ ┌────────┐ ┌────────┐
│Presence│ │Cursors │ │ CRDT │
│ "3 │ │ Multi │ │Conflict│
│online" │ │ select │ │ free │
└────────┘ └────────┘ └────────┘
│ │ │
trivial moderate complex
Where's your head at?
```
**User brings a specific problem:**
```
User: The auth system is a mess
You: [reads codebase]
┌─────────────────────────────────────────────┐
│ CURRENT AUTH FLOW │
└─────────────────────────────────────────────┘
┌─────────────┼─────────────┐
▼ ▼ ▼
┌─────────┐ ┌─────────┐ ┌─────────┐
│ Google │ │ GitHub │ │ Email │
│ OAuth │ │ OAuth │ │ Magic │
└────┬────┘ └────┬────┘ └────┬────┘
│ │ │
└─────────────┼─────────────┘
┌───────────┐
│ Session │
└─────┬─────┘
┌───────────┐
│ Perms │
└───────────┘
I see three tangles. Which one's burning?
```
**User is stuck mid-implementation:**
```
User: /opsx-explore add-auth-system
The OAuth integration is more complex than expected
You: [reads change artifacts]
You're on task 4: "Implement OAuth flow"
Let me trace what's involved...
[draws diagram, explores options, suggests paths]
Want to update the design to reflect this?
Or add a spike task to investigate?
```
**User wants to compare options:**
```
User: Should we use Postgres or SQLite?
You: Generic answer is boring. What's the context?
User: A CLI tool that tracks local dev environments
You: That changes everything.
┌─────────────────────────────────────────────────┐
│ CLI TOOL DATA STORAGE │
└─────────────────────────────────────────────────┘
Key constraints:
• No daemon running
• Must work offline
• Single user
SQLite Postgres
Deployment embedded ✓ needs server ✗
Offline yes ✓ no ✗
Single file yes ✓ no ✗
SQLite. Not even close.
Unless... is there a sync component?
```
---
## Ending Discovery
There's no required ending. Discovery might:
- **Flow into a proposal**: "Ready to start? I can create a change proposal."
- **Result in artifact updates**: "Updated design.md with these decisions"
- **Just provide clarity**: User has what they need, moves on
- **Continue later**: "We can pick this up anytime"
When it feels like things are crystallizing, you might summarize:
```
## What We Figured Out
**The problem**: [crystallized understanding]
**The approach**: [if one emerged]
**Open questions**: [if any remain]
**Next steps** (if ready):
- Create a change proposal
- Keep exploring: just keep talking
```
But this summary is optional. Sometimes the thinking IS the value.
---
## Guardrails
- **Don't implement** - Never write code or implement features. Creating OpenSpec artifacts is fine, writing application code is not.
- **Don't fake understanding** - If something is unclear, dig deeper
- **Don't rush** - Discovery is thinking time, not task time
- **Don't force structure** - Let patterns emerge naturally
- **Don't auto-capture** - Offer to save insights, don't just do it
- **Do visualize** - A good diagram is worth many paragraphs
- **Do explore the codebase** - Ground discussions in reality
- **Do question assumptions** - Including the user's and your own


@@ -0,0 +1,110 @@
---
name: openspec-propose
description: Propose a new change with all artifacts generated in one step. Use when the user wants to quickly describe what they want to build and get a complete proposal with design, specs, and tasks ready for implementation.
license: MIT
compatibility: Requires openspec CLI.
metadata:
author: openspec
version: "1.0"
generatedBy: "1.2.0"
---
Propose a new change - create the change and generate all artifacts in one step.
I'll create a change with artifacts:
- proposal.md (what & why)
- design.md (how)
- tasks.md (implementation steps)
When ready to implement, run /opsx-apply
---
**Input**: The user's request should include a change name (kebab-case) OR a description of what they want to build.
**Steps**
1. **If no clear input provided, ask what they want to build**
Use the **AskUserQuestion tool** (open-ended, no preset options) to ask:
> "What change do you want to work on? Describe what you want to build or fix."
From their description, derive a kebab-case name (e.g., "add user authentication" → `add-user-auth`).
**IMPORTANT**: Do NOT proceed without understanding what the user wants to build.
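The name derivation above can be sketched naively. Note that unlike the example (`add user authentication` → `add-user-auth`), this sketch does not abbreviate words; shortening is left to judgment.

```python
import re

def kebab_name(description: str) -> str:
    """Lowercase the description, strip punctuation, and
    hyphenate the remaining words (no abbreviation)."""
    words = re.findall(r"[a-z0-9]+", description.lower())
    return "-".join(words)
```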
2. **Create the change directory**
```bash
openspec new change "<name>"
```
This creates a scaffolded change at `openspec/changes/<name>/` with `.openspec.yaml`.
3. **Get the artifact build order**
```bash
openspec status --change "<name>" --json
```
Parse the JSON to get:
- `applyRequires`: array of artifact IDs needed before implementation (e.g., `["tasks"]`)
- `artifacts`: list of all artifacts with their status and dependencies
4. **Create artifacts in sequence until apply-ready**
Use the **TodoWrite tool** to track progress through the artifacts.
Loop through artifacts in dependency order (artifacts with no pending dependencies first):
a. **For each artifact that is `ready` (dependencies satisfied)**:
- Get instructions:
```bash
openspec instructions <artifact-id> --change "<name>" --json
```
- The instructions JSON includes:
- `context`: Project background (constraints for you - do NOT include in output)
- `rules`: Artifact-specific rules (constraints for you - do NOT include in output)
- `template`: The structure to use for your output file
- `instruction`: Schema-specific guidance for this artifact type
- `outputPath`: Where to write the artifact
- `dependencies`: Completed artifacts to read for context
- Read any completed dependency files for context
- Create the artifact file using `template` as the structure
- Apply `context` and `rules` as constraints - but do NOT copy them into the file
- Show brief progress: "Created <artifact-id>"
b. **Continue until all `applyRequires` artifacts are complete**
- After creating each artifact, re-run `openspec status --change "<name>" --json`
- Check if every artifact ID in `applyRequires` has `status: "done"` in the artifacts array
- Stop when all `applyRequires` artifacts are done
c. **If an artifact requires user input** (unclear context):
- Use **AskUserQuestion tool** to clarify
- Then continue with creation
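The stop condition in step 4b can be sketched as a set check. The field names `applyRequires`, `artifacts`, and `status` come from the steps above; the artifact `id` key and exact JSON shape are assumptions for illustration.

```python
import json

def apply_ready(status_json: str) -> bool:
    """True once every artifact listed in `applyRequires` has
    status "done" in the `artifacts` array (shape assumed)."""
    data = json.loads(status_json)
    done = {a.get("id") for a in data.get("artifacts", []) if a.get("status") == "done"}
    return set(data.get("applyRequires", [])) <= done
```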
5. **Show final status**
```bash
openspec status --change "<name>"
```
**Output**
After completing all artifacts, summarize:
- Change name and location
- List of artifacts created with brief descriptions
- What's ready: "All artifacts created! Ready for implementation."
- Prompt: "Run `/opsx-apply` or ask me to implement to start working on the tasks."
**Artifact Creation Guidelines**
- Follow the `instruction` field from `openspec instructions` for each artifact type
- The schema defines what each artifact should contain - follow it
- Read dependency artifacts for context before creating new ones
- Use `template` as the structure for your output file - fill in its sections
- **IMPORTANT**: `context` and `rules` are constraints for YOU, not content for the file
- Do NOT copy `<context>`, `<rules>`, `<project_context>` blocks into the artifact
- These guide what you write, but should never appear in the output
**Guardrails**
- Create ALL artifacts needed for implementation (as defined by schema's `apply.requires`)
- Always read dependency artifacts before creating a new one
- If context is critically unclear, ask the user - but prefer making reasonable decisions to keep momentum
- If a change with that name already exists, ask if user wants to continue it or create a new one
- Verify each artifact file exists after writing before proceeding to next


@@ -1 +1 @@
-3.10
+3.13

2
.serena/.gitignore vendored Normal file

@@ -0,0 +1,2 @@
/cache
/project.local.yml

138
.serena/project.yml Normal file

@@ -0,0 +1,138 @@
# the name by which the project can be referenced within Serena
project_name: "Mini-Nav"
# list of languages for which language servers are started; choose from:
# al bash clojure cpp csharp
# csharp_omnisharp dart elixir elm erlang
# fortran fsharp go groovy haskell
# java julia kotlin lua markdown
# matlab nix pascal perl php
# php_phpactor powershell python python_jedi r
# rego ruby ruby_solargraph rust scala
# swift terraform toml typescript typescript_vts
# vue yaml zig
# (This list may be outdated. For the current list, see values of Language enum here:
# https://github.com/oraios/serena/blob/main/src/solidlsp/ls_config.py
# For some languages, there are alternative language servers, e.g. csharp_omnisharp, ruby_solargraph.)
# Note:
# - For C, use cpp
# - For JavaScript, use typescript
# - For Free Pascal/Lazarus, use pascal
# Special requirements:
# Some languages require additional setup/installations.
# See here for details: https://oraios.github.io/serena/01-about/020_programming-languages.html#language-servers
# When using multiple languages, the first language server that supports a given file will be used for that file.
# The first language is the default language and the respective language server will be used as a fallback.
# Note that when using the JetBrains backend, language servers are not used and this list is correspondingly ignored.
languages:
- python
- yaml
# the encoding used by text files in the project
# For a list of possible encodings, see https://docs.python.org/3.11/library/codecs.html#standard-encodings
encoding: "utf-8"
# The language backend to use for this project.
# If not set, the global setting from serena_config.yml is used.
# Valid values: LSP, JetBrains
# Note: the backend is fixed at startup. If a project with a different backend
# is activated post-init, an error will be returned.
language_backend:
# whether to use project's .gitignore files to ignore files
ignore_all_files_in_gitignore: true
# list of additional paths to ignore in this project.
# Same syntax as gitignore, so you can use * and **.
# Note: global ignored_paths from serena_config.yml are also applied additively.
ignored_paths: []
# whether the project is in read-only mode
# If set to true, all editing tools will be disabled and attempts to use them will result in an error
# Added on 2025-04-18
read_only: false
# list of tool names to exclude. We recommend not excluding any tools, see the readme for more details.
# Below is the complete list of tools for convenience.
# To make sure you have the latest list of tools, and to view their descriptions,
# execute `uv run scripts/print_tool_overview.py`.
#
# * `activate_project`: Activates a project by name.
# * `check_onboarding_performed`: Checks whether project onboarding was already performed.
# * `create_text_file`: Creates/overwrites a file in the project directory.
# * `delete_lines`: Deletes a range of lines within a file.
# * `delete_memory`: Deletes a memory from Serena's project-specific memory store.
# * `execute_shell_command`: Executes a shell command.
# * `find_referencing_code_snippets`: Finds code snippets in which the symbol at the given location is referenced.
# * `find_referencing_symbols`: Finds symbols that reference the symbol at the given location (optionally filtered by type).
# * `find_symbol`: Performs a global (or local) search for symbols with/containing a given name/substring (optionally filtered by type).
# * `get_current_config`: Prints the current configuration of the agent, including the active and available projects, tools, contexts, and modes.
# * `get_symbols_overview`: Gets an overview of the top-level symbols defined in a given file.
# * `initial_instructions`: Gets the initial instructions for the current project.
# Should only be used in settings where the system prompt cannot be set,
# e.g. in clients you have no control over, like Claude Desktop.
# * `insert_after_symbol`: Inserts content after the end of the definition of a given symbol.
# * `insert_at_line`: Inserts content at a given line in a file.
# * `insert_before_symbol`: Inserts content before the beginning of the definition of a given symbol.
# * `list_dir`: Lists files and directories in the given directory (optionally with recursion).
# * `list_memories`: Lists memories in Serena's project-specific memory store.
# * `onboarding`: Performs onboarding (identifying the project structure and essential tasks, e.g. for testing or building).
# * `prepare_for_new_conversation`: Provides instructions for preparing for a new conversation (in order to continue with the necessary context).
# * `read_file`: Reads a file within the project directory.
# * `read_memory`: Reads the memory with the given name from Serena's project-specific memory store.
# * `remove_project`: Removes a project from the Serena configuration.
# * `replace_lines`: Replaces a range of lines within a file with new content.
# * `replace_symbol_body`: Replaces the full definition of a symbol.
# * `restart_language_server`: Restarts the language server, may be necessary when edits not through Serena happen.
# * `search_for_pattern`: Performs a search for a pattern in the project.
# * `summarize_changes`: Provides instructions for summarizing the changes made to the codebase.
# * `switch_modes`: Activates modes by providing a list of their names
# * `think_about_collected_information`: Thinking tool for pondering the completeness of collected information.
# * `think_about_task_adherence`: Thinking tool for determining whether the agent is still on track with the current task.
# * `think_about_whether_you_are_done`: Thinking tool for determining whether the task is truly completed.
# * `write_memory`: Writes a named memory (for future reference) to Serena's project-specific memory store.
excluded_tools:
- read_file
- create_text_file
- execute_shell_command
- list_dir
# list of tools to include that would otherwise be disabled (particularly optional tools that are disabled by default)
included_optional_tools: []
# fixed set of tools to use as the base tool set (if non-empty), replacing Serena's default set of tools.
# This cannot be combined with non-empty excluded_tools or included_optional_tools.
fixed_tools: []
# list of mode names that are always to be included in the set of active modes
# The full set of modes to be activated is base_modes + default_modes.
# If the setting is undefined, the base_modes from the global configuration (serena_config.yml) apply.
# Otherwise, this setting overrides the global configuration.
# Set this to [] to disable base modes for this project.
# Set this to a list of mode names to always include the respective modes for this project.
base_modes:
# list of mode names that are to be activated by default.
# The full set of modes to be activated is base_modes + default_modes.
# If the setting is undefined, the default_modes from the global configuration (serena_config.yml) apply.
# Otherwise, this overrides the setting from the global configuration (serena_config.yml).
# This setting can, in turn, be overridden by CLI parameters (--mode).
default_modes:
- planning
- editing
- interactive
- onboarding
# initial prompt for the project. It will always be given to the LLM upon activating the project
# (contrary to the memories, which are loaded on demand).
initial_prompt: ""
# time budget (seconds) per tool call for the retrieval of additional symbol information
# such as docstrings or parameter information.
# This overrides the corresponding setting in the global configuration; see the documentation there.
# If null or missing, use the setting from the global configuration.
symbol_info_budget: 30
# list of regex patterns which, when matched, mark a memory entry as readonly.
# Extends the list from the global configuration, merging the two lists.
read_only_memory_patterns: []

104
AGENTS.md Normal file

@@ -0,0 +1,104 @@
# Memorix — Automatic Memory Rules
You have access to Memorix memory tools. Follow these rules to maintain persistent context across sessions.
## RULE 1: Session Start — Load Context
At the **beginning of every conversation**, BEFORE responding to the user:
1. Call `memorix_session_start` to get the previous session summary and key memories (this is a direct read, not a search — no fragmentation risk)
2. Then call `memorix_search` with a query related to the user's first message for additional context
3. If search results are found, use `memorix_detail` to fetch the most relevant ones
4. Reference relevant memories naturally — the user should feel you "remember" them
## RULE 2: Store Important Context
**Proactively** call `memorix_store` when any of the following happen:
### What MUST be recorded:
- Architecture/design decisions → type: `decision`
- Bug identified and fixed → type: `problem-solution`
- Unexpected behavior or gotcha → type: `gotcha`
- Config changed (env vars, ports, deps) → type: `what-changed`
- Feature completed or milestone → type: `what-changed`
- Trade-off discussed with conclusion → type: `trade-off`
### What should NOT be recorded:
- Simple file reads, greetings, trivial commands (ls, pwd, git status)
### Use topicKey for evolving topics:
For decisions, architecture docs, or any topic that evolves over time, ALWAYS use the `topicKey` parameter.
This ensures the memory is UPDATED instead of creating duplicates.
Use `memorix_suggest_topic_key` to generate a stable key.
Example: `topicKey: "architecture/auth-model"` — subsequent stores with the same key update the existing memory.
### Track progress with the progress parameter:
When working on features or tasks, include the `progress` parameter:
```json
{
"progress": {
"feature": "user authentication",
"status": "in-progress",
"completion": 60
}
}
```
Status values: `in-progress`, `completed`, `blocked`
## RULE 3: Resolve Completed Memories
When a task is completed, a bug is fixed, or information becomes outdated:
1. Call `memorix_resolve` with the observation IDs to mark them as resolved
2. Resolved memories are hidden from default search, preventing context pollution
This is critical — without resolving, old bug reports and completed tasks will keep appearing in future searches.
## RULE 4: Session End — Store Decision Chain Summary
When the conversation is ending, create a **decision chain summary** (not just a checklist):
1. Call `memorix_store` with type `session-request` and `topicKey: "session/latest-summary"`:
**Required structure:**
```
## Goal
[What we were working on — specific, not vague]
## Key Decisions & Reasoning
- Chose X because Y. Rejected Z because [reason].
- [Every architectural/design decision with WHY]
## What Changed
- [File path] — [what changed and why]
## Current State
- [What works now, what's pending]
- [Any blockers or risks]
## Next Steps
- [Concrete next actions, in priority order]
```
**Critical: Include the "Key Decisions & Reasoning" section.** Without it, the next AI session will lack the context to understand WHY things were done a certain way and may suggest conflicting approaches.
2. Call `memorix_resolve` on any memories for tasks completed in this session
## RULE 5: Compact Awareness
Memorix automatically compacts memories on store:
- **With LLM API configured:** Smart dedup — extracts facts, compares with existing, merges or skips duplicates
- **Without LLM (free mode):** Heuristic dedup — uses similarity scores to detect and merge duplicate memories
- **You don't need to manually deduplicate.** Just store naturally and compact handles the rest.
- If you notice excessive duplicate memories, call `memorix_deduplicate` for batch cleanup.
## Guidelines
- **Use concise titles** (~5-10 words) and structured facts
- **Include file paths** in filesModified when relevant
- **Include related concepts** for better searchability
- **Always use topicKey** for recurring topics to prevent duplicates
- **Always resolve** completed tasks and fixed bugs
- **Always include reasoning** — "chose X because Y" is 10x more valuable than "did X"
- Search defaults to `status="active"` — use `status="all"` to include resolved memories

CLAUDE.md Normal file
@@ -0,0 +1,125 @@
# Project Spec & Rules
## Code Standards
### Google-style code
For details see https://raw.githubusercontent.com/shendeguize/GooglePythonStyleGuideCN/refs/heads/master/README.md
### Coding principles
- Concise, easy to read, minimal implementation
- Conditional or loop nesting must not exceed three levels; return early to reduce branching
- Fully comment variables and conditional/loop branches
- No backward compatibility required; avoid adding excess features
- Write the test suite first, then implement the code
- After implementing the test suite, ask for the user's opinion; continue only after the user confirms
- Do not write benchmark code unless the user requests it
- Comments in English, documentation in Chinese
- After finishing the code, update the docs (e.g. CLAUDE.md) without changing their overall structure
### Test-writing principles
- Lean, clean, and fast
- Core logic and key algorithms must be tested
- Separate tests that need to load transformer models from tests that do not
- Cases that need no tests:
  - UI-related code
  - Overly complex or time-consuming logic
  - Benchmark-related code
### Keywords
- 确认 (confirm): the user approves the current implementation plan or test suite; work may begin
- 继续 (continue): the user wants you to re-read the context and resume unfinished work
### Documentation updates
Update the directory section of this document only when the project layout changes.
To modify any other section, ask first and edit only after approval.
## Project Overview
The project is managed with UV; pytest is used for testing, a justfile for shortcut commands, and Jujutsu for version control.
### Directory layout
**Core modules**
- mini-nav/main.py — CLI entry point (Typer)
- mini-nav/database.py — LanceDB singleton manager for vector storage and retrieval
- mini-nav/feature_retrieval.py — DINOv2 image feature extraction and retrieval
**Source directories (mini-nav/)**
- mini-nav/configs/ — configuration management (Pydantic + YAML)
- mini-nav/commands/ — CLI commands (train, benchmark, visualize, generate)
- mini-nav/compressors/ — feature compression algorithms
  - hash_compressor.py — hash compressor and training loss
  - pipeline.py — compression pipeline (DINO feature extraction inlined)
  - train.py — compressor training script
- mini-nav/data_loading/ — data loading and synthesis
  - loader.py — data loaders
  - insdet_scenes.py — InsDet scene dataset loading
  - synthesizer.py — scene synthesizer
- mini-nav/utils/ — utility functions
  - feature_extractor.py — feature extraction helpers
  - sam.py — SAM 2.1 segmentation helpers
- mini-nav/tests/ — pytest test suite
- mini-nav/benchmarks/ — benchmarks (recall@k)
  - tasks/
    - multi_object_retrieval.py — multi-object retrieval benchmark task
- mini-nav/visualizer/ — Dash + Plotly visualization app
**Data directories**
- datasets/ — dataset directory
- outputs/ — default output directory (databases, model weights, etc.)
### Python libraries
See pyproject.toml or run `uv pip list` for details; build features on the currently installed libraries.
To add a new library, ask first; only after the user confirms may you run `uv add <package>`.
## Version Control (Jujutsu specifics)
This project uses Jujutsu (jj) for version control, with Memorix MCP as the persistence hub for architecture decisions and reasoning traces.
- Skill invocation: always use the jujutsu tool skills for branching, committing, and describe operations; never run verbose Git-compatible commands directly through the shell.
- Description conventions (jj desc):
  - When running jj desc, the first line must be a concise change title.
  - After one blank line, record only the core business points of the change.
  - Write descriptions in English.
  - Forbidden: never pack complex algorithm logic or lengthy design decisions into jj descriptions.
- Memory linkage (Memorix first):
  - For any architecture change, algorithm decision, or refactoring logic, call memorix_store (or the corresponding add method) before running jj desc.
  - Linking marker: every Memorix record must include the jj change ID of the current change, giving a clean mapping from code changes to the chain of reasoning.
  - Retrieval logic: for tasks that require deep contextual understanding, proactively call memorix_search to retrieve decisions tied to related historical change IDs.
- Frictionless-recording principle:
  - Never generate standalone change_log.md files or other AI-generated docs in the project tree.
  - All knowledge about "why this was changed" should flow into atomic jj commit descriptions or the Memorix knowledge graph.
### Description example
```text
refactor(compressors): Simplify module by removing SAM/DINO separation code
- Remove dino_compressor.py and segament_compressor.py
- Rewrite pipeline.py to inline DINO into HashPipeline
- Maintain backward compatibility: SAMHashPipeline alias
- Update tests and benchmark.py
```
### Commit steps
- Run `jj diff --no-pager` to collect all current changes
- Summarize the changes together with the related openspec documents, focusing on what changed and the decision logic behind it
- Call the memory facility (e.g. Memorix) to record the summary
- Describe the change with jj, following the description conventions
- Run `jj new` to start a new change
## Memory Management (Memorix MCP)
This project uses Memorix as the core context engine for storing architecture decisions, complex logical relationships, and the historical reasons behind refactorings.
### Memory-writing rules
- Record proactively: call `memorix.store` after
  - core architecture changes confirmed by the user (e.g. the LanceDB indexing strategy);
  - complex bug-fix logic (record why it was fixed this way, to prevent rollbacks);
  - explicit preferences the user expresses in conversation (e.g. a dislike of a particular Python library);
  - code changes and their decision logic (e.g. changes driven by specific user requirements).
- Structured storage: store entries in the `[Category: Topic] Description` format to keep retrieval efficient.
### Memory-retrieval rules
- Cold-start retrieval: at the start of every new conversation, or when switching tasks, first call `memorix.search` with keywords (e.g. "project_architecture", "database_schema") to avoid drifting from the existing design.
- Hallucination prevention: when unsure about the implementation details of an old feature, search memory first; never guess.
### Memory and redundancy control
- Concise entries: information stored in Memorix must be concise; never store whole code blocks, only the "logic description" and the "decision rationale".
- Cleanup logic: when old entries in the memory store conflict with the current code, proactively prompt the user to update or overwrite them.


@@ -0,0 +1,356 @@
"""Multi-object retrieval benchmark task.
This benchmark evaluates retrieval accuracy using multiple objects from a cropped
scene region. It uses SAM for object segmentation, DINO+Hash pipeline for feature
extraction, and LanceDB for vector storage with scene-level score aggregation.
"""
import random
from typing import Any
import lancedb
import numpy as np
import pyarrow as pa
from benchmarks.base import BaseBenchmarkTask
from benchmarks.tasks.registry import RegisterTask
from configs.models import BenchmarkTaskConfig
from rich.progress import track
from torch import nn
from torch.utils.data import DataLoader
from transformers import BitImageProcessorFast
from utils.feature_extractor import extract_single_image_feature
from utils.sam import load_sam_model, segment_image
from utils.common import get_device
def _build_object_schema(vector_dim: int) -> pa.Schema:
"""Build PyArrow schema for object-level vectors.
Args:
vector_dim: Feature vector dimension.
Returns:
PyArrow schema with id, image_id, object_id, category, and vector fields.
"""
return pa.schema(
[
pa.field("id", pa.int32()),
pa.field("image_id", pa.string()),
pa.field("object_id", pa.string()),
pa.field("category", pa.string()),
pa.field("vector", pa.list_(pa.float32(), vector_dim)),
]
)
def _compute_scene_score(
query_object_ids: list[str],
retrieved_results: dict[str, list[tuple[float, str]]],
gamma: float,
) -> dict[str, float]:
"""Compute scene-level scores using co-occurrence penalty.
Args:
query_object_ids: List of query object IDs.
retrieved_results: Dict mapping image_id to list of (distance, object_id) results.
gamma: Co-occurrence penalty exponent.
Returns:
Dict mapping image_id to computed scene score.
"""
scene_scores: dict[str, float] = {}
for image_id, results in retrieved_results.items():
# Build a set of retrieved object IDs for this scene
retrieved_ids = {obj_id for _, obj_id in results}
# Count how many query objects are found in this scene
matched_count = sum(1 for q_id in query_object_ids if q_id in retrieved_ids)
if matched_count == 0:
scene_scores[image_id] = 0.0
continue
# Sum of best similarities (using distance as similarity: smaller = better)
# We use 1/(1+distance) to convert distance to similarity
similarities = []
for dist, obj_id in results:
if obj_id in query_object_ids:
sim = 1.0 / (1.0 + dist)
similarities.append(sim)
sum_similarity = sum(similarities) if similarities else 0.0
# Hit rate: ratio of matched objects
hit_rate = matched_count / len(query_object_ids)
# Final score: sum_similarity * (hit_rate)^gamma
score = sum_similarity * (hit_rate ** gamma)
scene_scores[image_id] = score
return scene_scores
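As a sanity check on the aggregation above, here is a standalone re-implementation of the same formula on a toy case (not part of the module; names are illustrative):

```python
# score(scene) = sum(1 / (1 + d) over results matching a query object)
#              * (matched_queries / num_queries) ** gamma
def scene_score(query_ids, results, gamma=1.0):
    retrieved = {obj_id for _, obj_id in results}
    matched = sum(1 for q in query_ids if q in retrieved)
    if matched == 0:
        return 0.0
    sum_sim = sum(1.0 / (1.0 + d) for d, obj_id in results if obj_id in query_ids)
    return sum_sim * (matched / len(query_ids)) ** gamma

# Scene A contains both query objects at distances 0.0 and 1.0:
# sum_sim = 1.0 + 0.5 = 1.5, hit_rate = 1.0, so the score is 1.5.
print(scene_score(["q1", "q2"], [(0.0, "q1"), (1.0, "q2")]))  # 1.5
# Scene B retrieves neither query object, so it scores 0.0.
print(scene_score(["q1", "q2"], [(0.5, "x")]))  # 0.0
```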
@RegisterTask("multi-object-retrieval")
class MultiObjectRetrievalTask(BaseBenchmarkTask):
"""Multi-object retrieval benchmark task."""
def __init__(self, **kwargs: Any):
"""Initialize multi-object retrieval task.
Args:
**kwargs: Configuration parameters from BenchmarkTaskConfig.
"""
# Use config from kwargs or load default config
if kwargs:
config_dict = kwargs
else:
config = BenchmarkTaskConfig(type="multi-object-retrieval")
config_dict = config.model_dump()
super().__init__(**config_dict)
self.config = BenchmarkTaskConfig(**config_dict)
        # SAM settings from ModelConfig (passed via kwargs or use defaults).
        # Stored as sam_model_name so the assignment does not collide with the
        # read-only sam_model property below (writing to a property raises).
        self.sam_model_name = kwargs.get("sam_model", "facebook/sam2.1-hiera-large")
        self.min_mask_area = kwargs.get("sam_min_mask_area", 32 * 32)
        self.max_masks_per_image = kwargs.get("sam_max_masks", 5)
        # Lazy-loaded resources
        self._sam_model = None
        self._mask_generator = None
    @property
    def sam_model(self) -> Any:
        """Lazy-load SAM model."""
        if self._sam_model is None:
            self._sam_model, self._mask_generator = load_sam_model(
                model_name=self.sam_model_name,
                device=str(get_device()),
            )
        return self._sam_model
    @property
    def mask_generator(self) -> Any:
        """Lazy-load mask generator."""
        if self._mask_generator is None:
            self._sam_model, self._mask_generator = load_sam_model(
                model_name=self.sam_model_name,
                device=str(get_device()),
            )
        return self._mask_generator
def build_database(
self,
model: nn.Module,
processor: BitImageProcessorFast,
train_dataset: Any,
table: lancedb.table.Table,
batch_size: int,
) -> None:
"""Build the evaluation database with object-level vectors.
Args:
model: Feature extraction model.
processor: Image preprocessor.
train_dataset: Training dataset.
table: LanceDB table to store features.
batch_size: Batch size for DataLoader.
"""
# Infer vector dimension from a sample
sample = train_dataset[0]
sample_image = sample["image"]
# Get vector dimension by running a forward pass
vector_dim = self._infer_vector_dim(processor, model, sample_image)
expected_schema = _build_object_schema(vector_dim)
# Check schema compatibility
if table.schema != expected_schema:
raise ValueError(
f"Table schema mismatch. Expected: {expected_schema}, "
f"Got: {table.schema}"
)
# Build database: segment each image, extract features per object
record_id = 0
records = []
for idx in track(range(len(train_dataset)), description="Building object database"):
item = train_dataset[idx]
image = item["image"]
image_id = item.get("image_id", f"image_{idx}")
# Segment image using SAM
masks = segment_image(
self.mask_generator,
image,
min_area=self.min_mask_area,
max_masks=self.max_masks_per_image,
)
if not masks:
continue
# Extract features for each mask
for mask_idx, mask_info in enumerate(masks):
# Extract masked region
masked_image = self._apply_mask(image, mask_info["segment"])
# Extract feature vector
vector = extract_single_image_feature(processor, model, masked_image)
# Create object ID
object_id = f"{image_id}_obj_{mask_idx}"
category = mask_info.get("category", "unknown")
records.append({
"id": record_id,
"image_id": image_id,
"object_id": object_id,
"category": category,
"vector": vector,
})
record_id += 1
# Add all records to table
if records:
table.add(records)
def evaluate(
self,
model: nn.Module,
processor: BitImageProcessorFast,
test_dataset: Any,
table: lancedb.table.Table,
batch_size: int,
) -> dict[str, Any]:
"""Evaluate the model on the test dataset.
Args:
model: Feature extraction model.
processor: Image preprocessor.
test_dataset: Test dataset.
table: LanceDB table to search against.
batch_size: Batch size for DataLoader.
Returns:
Dictionary containing evaluation results with keys:
- accuracy: Recall@K accuracy (0.0 ~ 1.0)
- correct: Number of correct predictions
- total: Total number of test samples
- top_k: The K value used
"""
top_k = self.config.top_k_per_object
correct = 0
total = 0
for idx in track(range(len(test_dataset)), description=f"Evaluating Recall@{top_k}"):
item = test_dataset[idx]
image = item["image"]
target_image_id = item.get("image_id", f"image_{idx}")
# Segment query image
masks = segment_image(
self.mask_generator,
image,
min_area=self.min_mask_area,
max_masks=self.max_masks_per_image,
)
if not masks:
continue
# Randomly sample query objects
num_query = min(self.config.num_query_objects, len(masks))
query_masks = random.sample(masks, num_query)
# Extract features and search for each query object
retrieved_results: dict[str, list[tuple[float, str]]] = {}
for mask_info in query_masks:
# Extract masked region
masked_image = self._apply_mask(image, mask_info["segment"])
# Extract feature vector
vector = extract_single_image_feature(processor, model, masked_image)
# Search in LanceDB
results = (
table.search(vector)
.select(["image_id", "object_id", "_distance"])
.limit(top_k)
.to_polars()
)
# Aggregate results by scene
                for row in results.iter_rows(named=True):
image_id = row["image_id"]
object_id = row["object_id"]
distance = row["_distance"]
if image_id not in retrieved_results:
retrieved_results[image_id] = []
retrieved_results[image_id].append((distance, object_id))
            # Compute scene scores. Note: these query IDs only match database
            # object IDs if the masks carry matching "object_id" annotations.
            query_object_ids = [m.get("object_id", f"query_obj_{i}") for i, m in enumerate(query_masks)]
scene_scores = _compute_scene_score(
query_object_ids,
retrieved_results,
self.config.gamma,
)
# Rank scenes by score
ranked_scenes = sorted(scene_scores.items(), key=lambda x: x[1], reverse=True)
# Check if target is in top-K
top_k_scenes = [scene_id for scene_id, _ in ranked_scenes[:top_k]]
if target_image_id in top_k_scenes:
correct += 1
total += 1
accuracy = correct / total if total > 0 else 0.0
return {
"accuracy": accuracy,
"correct": correct,
"total": total,
"top_k": top_k,
}
def _infer_vector_dim(
self,
processor: BitImageProcessorFast,
model: nn.Module,
sample_image: Any,
) -> int:
"""Infer vector dimension from model output."""
vector = extract_single_image_feature(processor, model, sample_image)
return len(vector)
def _apply_mask(self, image: Any, mask: np.ndarray) -> Any:
"""Apply mask to image and return masked image.
Args:
image: PIL Image.
mask: Binary mask as numpy array.
Returns:
Masked PIL Image.
"""
        from PIL import Image  # Local import; numpy is already imported at module level.
image_np = np.array(image.convert("RGB"))
# Ensure mask is the right shape
if mask.shape != image_np.shape[:2]:
from skimage.transform import resize
mask_resized = resize(mask, image_np.shape[:2], order=0, anti_aliasing=False)
else:
mask_resized = mask
# Apply mask
masked_np = image_np * mask_resized[:, :, np.newaxis]
return Image.fromarray(masked_np.astype(np.uint8))
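The masking step in `_apply_mask` reduces to NumPy broadcasting; a minimal sketch of the same-shape case (PIL conversion and resizing omitted):

```python
import numpy as np

# Broadcast a binary [H, W] mask over an [H, W, 3] image, as _apply_mask does:
# pixels outside the mask are zeroed, pixels inside are kept unchanged.
image_np = np.full((4, 4, 3), 255, dtype=np.uint8)
mask = np.zeros((4, 4), dtype=np.uint8)
mask[1:3, 1:3] = 1
masked_np = image_np * mask[:, :, np.newaxis]
print(masked_np[0, 0].tolist())  # [0, 0, 0] -> background removed
print(masked_np[1, 1].tolist())  # [255, 255, 255] -> object kept
```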


@@ -1,4 +1,4 @@
from typing import cast
from typing import Any, Optional, cast
import typer
from commands import app
@@ -7,15 +7,15 @@ from commands import app
@app.command()
def benchmark(
ctx: typer.Context,
model_path: str = typer.Option(
model_path: Optional[str] = typer.Option(
None, "--model", "-m", help="Path to compressor model weights"
),
):
import torch
import torch.nn.functional as F
from benchmarks import run_benchmark
from compressors import DinoCompressor
from configs import cfg_manager
from transformers import AutoImageProcessor, BitImageProcessorFast
from transformers import AutoImageProcessor, AutoModel, BitImageProcessorFast
from utils import get_device
config = cfg_manager.get()
@@ -29,7 +29,12 @@ def benchmark(
AutoImageProcessor.from_pretrained(model_cfg.dino_model, device_map=device),
)
model = DinoCompressor().to(device)
# Load DINO model for feature extraction
dino = AutoModel.from_pretrained(model_cfg.dino_model, device_map=device)
dino.eval()
# Optional hash compressor
compressor = None
if model_path:
from compressors import HashCompressor
@@ -38,7 +43,31 @@ def benchmark(
hash_bits=model_cfg.compression_dim,
)
compressor.load_state_dict(torch.load(model_path))
model.compressor = compressor
compressor.to(device)
compressor.eval()
# Create wrapper with extract_features method
class DinoFeatureExtractor:
def __init__(self, dino, compressor=None):
self.dino = dino
self.compressor = compressor
def extract_features(self, images: list) -> torch.Tensor:
inputs = processor(images, return_tensors="pt").to(device)
with torch.no_grad():
outputs = self.dino(**inputs)
features = outputs.last_hidden_state.mean(dim=1)
features = F.normalize(features, dim=-1)
return features
        def encode(self, images: list) -> torch.Tensor:
            if self.compressor is None:
                return self.extract_features(images)
            inputs = processor(images, return_tensors="pt").to(device)
            with torch.no_grad():
                tokens = self.dino(**inputs).last_hidden_state
                _, _, bits = self.compressor(tokens)
            return bits
model = DinoFeatureExtractor(dino, compressor)
run_benchmark(
model=model,


@@ -1,18 +1,15 @@
from .common import BinarySign, bits_to_hash, hamming_distance, hamming_similarity, hash_to_bits
from .dino_compressor import DinoCompressor
from .hash_compressor import HashCompressor, HashLoss, VideoPositiveMask
from .pipeline import SAMHashPipeline, create_pipeline_from_config
from .segament_compressor import SegmentCompressor
from .pipeline import HashPipeline, SAMHashPipeline, create_pipeline_from_config
from .train import train
__all__ = [
"train",
"DinoCompressor",
"HashCompressor",
"HashLoss",
"VideoPositiveMask",
"SegmentCompressor",
"SAMHashPipeline",
"HashPipeline",
"SAMHashPipeline", # Backward compatibility alias
"create_pipeline_from_config",
"BinarySign",
"hamming_distance",


@@ -1,105 +0,0 @@
from typing import Optional
import torch
import torch.nn as nn
import torch.nn.functional as F
from PIL import Image
from transformers import AutoImageProcessor, AutoModel
class DinoCompressor(nn.Module):
"""DINOv2 feature extractor with optional hash compression.
When compressor is None: returns normalized DINO embeddings.
When compressor is provided: returns binary hash bits for CAM storage.
Supports both PIL Image input and pre-extracted tokens.
"""
def __init__(
self,
model_name: str = "facebook/dinov2-large",
compressor: Optional[nn.Module] = None,
device: Optional[str] = None,
):
"""Initialize DINOv2 extractor.
Args:
model_name: HuggingFace model name
compressor: Optional hash compressor for producing binary codes
device: Device to load model on
"""
super().__init__()
# Auto detect device
if device is None:
device = "cuda" if torch.cuda.is_available() else "cpu"
self.device = torch.device(device)
self.model_name = model_name
self.processor = AutoImageProcessor.from_pretrained(model_name)
self.dino = AutoModel.from_pretrained(model_name).to(self.device)
self.dino.eval()
self.compressor = compressor
def forward(self, inputs):
teacher_tokens = self.dino(**inputs).last_hidden_state # [B,N,1024]
teacher_embed = teacher_tokens.mean(dim=1)
teacher_embed = F.normalize(teacher_embed, dim=-1) # [B,1024]
if self.compressor is None:
return teacher_embed
# HashCompressor returns (logits, hash_codes, bits)
_, _, bits = self.compressor(teacher_tokens)
return bits # [B, 512] binary bits for CAM
def extract_features(self, images: list[Image.Image]) -> torch.Tensor:
"""Extract DINO features from a list of cropped object images.
Args:
images: List of PIL Images (cropped objects)
Returns:
DINO features [N, feature_dim], normalized
"""
if len(images) == 0:
return torch.empty(0, self.dino.config.hidden_size, device=self.device)
# Process batch of images
inputs = self.processor(images, return_tensors="pt").to(self.device)
with torch.no_grad():
outputs = self.dino(**inputs)
# Pool tokens to get global representation
features = outputs.last_hidden_state.mean(dim=1) # [N, 1024]
features = F.normalize(features, dim=-1)
return features
def encode(self, images: list[Image.Image]) -> torch.Tensor:
"""Extract features from images and optionally compress to hash codes.
Args:
images: List of PIL Images
Returns:
If compressor is None: DINO features [N, 1024]
If compressor is set: Binary hash bits [N, 512]
"""
if self.compressor is None:
return self.extract_features(images)
# Extract features first
features = self.extract_features(images) # [N, 1024]
# Add sequence dimension for compressor (expects [B, N, dim])
features = features.unsqueeze(1) # [N, 1, 1024]
# Compress to hash codes
_, _, bits = self.compressor(features)
return bits


@@ -1,79 +1,65 @@
"""Complete pipeline for SAM + DINO + HashCompressor.
"""Hash compression pipeline with DINO feature extraction.
This pipeline extracts object masks from images using SAM2.1,
crops the objects, extracts features using DINOv2,
and compresses them to binary hash codes using HashCompressor.
This pipeline extracts features using DINOv2 and compresses them
to binary hash codes using HashCompressor.
"""
from pathlib import Path
from typing import Optional
import torch
import torch.nn as nn
import torch.nn.functional as F
from PIL import Image
from .dino_compressor import DinoCompressor
from .hash_compressor import HashCompressor
from .segament_compressor import SegmentCompressor
from transformers import AutoImageProcessor, AutoModel
def create_pipeline_from_config(config) -> "SAMHashPipeline":
"""Create SAMHashPipeline from a config object.
def create_pipeline_from_config(config) -> "HashPipeline":
"""Create HashPipeline from a config object.
Args:
config: Configuration object with model settings
Returns:
Initialized SAMHashPipeline
Initialized HashPipeline
"""
return SAMHashPipeline(
sam_model=config.model.sam_model,
dino_model=config.model.name,
return HashPipeline(
dino_model=config.model.dino_model,
hash_bits=config.model.compression_dim,
sam_min_mask_area=config.model.sam_min_mask_area,
sam_max_masks=config.model.sam_max_masks,
compressor_path=config.model.compressor_path,
device=config.model.device if config.model.device != "auto" else None,
)
class SAMHashPipeline(nn.Module):
"""Complete pipeline: SAM segmentation + DINO features + Hash compression.
class HashPipeline(nn.Module):
"""Pipeline: DINO features + Hash compression.
Pipeline flow:
Image -> SAM (extract masks) -> Crop objects -> DINO (features) -> Hash (binary codes)
PIL Image -> DINO (features) -> Hash (binary codes)
Usage:
# Initialize with config
pipeline = SAMHashPipeline(
sam_model="facebook/sam2.1-hiera-large",
pipeline = HashPipeline(
dino_model="facebook/dinov2-large",
hash_bits=512,
)
# Process image
image = Image.open("path/to/image.jpg")
hash_codes = pipeline(image) # [N, 512] binary bits
hash_bits = pipeline(image) # [1, 512] binary bits
"""
def __init__(
self,
sam_model: str = "facebook/sam2.1-hiera-large",
dino_model: str = "facebook/dinov2-large",
hash_bits: int = 512,
sam_min_mask_area: int = 100,
sam_max_masks: int = 10,
compressor_path: Optional[str] = None,
device: Optional[str] = None,
):
"""Initialize the complete pipeline.
"""Initialize the pipeline.
Args:
sam_model: SAM model name from HuggingFace
dino_model: DINOv2 model name from HuggingFace
hash_bits: Number of bits in hash code
sam_min_mask_area: Minimum mask area threshold
sam_max_masks: Maximum number of masks to keep
compressor_path: Optional path to trained HashCompressor weights
device: Device to run models on
"""
@@ -84,87 +70,101 @@ class SAMHashPipeline(nn.Module):
device = "cuda" if torch.cuda.is_available() else "cpu"
self.device = torch.device(device)
# Initialize components
self.segmentor = SegmentCompressor(
model_name=sam_model,
min_mask_area=sam_min_mask_area,
max_masks=sam_max_masks,
device=device,
)
self.dino_model = dino_model
# HashCompressor expects DINO features (1024 dim for dinov2-large)
dino_dim = 1024 if "large" in dino_model else 768
self.hash_compressor = HashCompressor(
input_dim=dino_dim, hash_bits=hash_bits
).to(device)
# Initialize DINO processor and model
self.processor = AutoImageProcessor.from_pretrained(dino_model)
self.dino = AutoModel.from_pretrained(dino_model).to(self.device)
self.dino.eval()
# Determine DINO feature dimension
self.dino_dim = 1024 if "large" in dino_model else 768
# Initialize HashCompressor
self.hash_compressor = nn.Module() # Placeholder, will be replaced
self._init_hash_compressor(hash_bits, compressor_path)
def _init_hash_compressor(
self, hash_bits: int, compressor_path: Optional[str] = None
):
"""Initialize the hash compressor module.
This is called during __init__ but we need to replace it properly.
"""
# Import here to avoid circular imports
from .hash_compressor import HashCompressor
compressor = HashCompressor(input_dim=self.dino_dim, hash_bits=hash_bits).to(
self.device
)
# Load pretrained compressor if provided
if compressor_path is not None:
self.hash_compressor.load_state_dict(
torch.load(compressor_path, map_location=device)
compressor.load_state_dict(
torch.load(compressor_path, map_location=self.device)
)
print(f"[OK] Loaded HashCompressor from {compressor_path}")
self.dino = DinoCompressor(
model_name=dino_model,
compressor=self.hash_compressor,
device=device,
)
# Replace the placeholder
self.hash_compressor = compressor
@property
def hash_bits(self):
"""Return the number of hash bits."""
return self.hash_compressor.hash_bits
def forward(self, image: Image.Image) -> torch.Tensor:
"""Process a single image through the complete pipeline.
"""Process a single image through the pipeline.
Args:
image: Input PIL Image
Returns:
Binary hash codes [N, hash_bits] where N is number of detected objects
Binary hash codes [1, hash_bits] as int32
"""
# Step 1: SAM - extract and crop objects
cropped_objects = self.segmentor(image)
# Extract DINO features
inputs = self.processor(image, return_tensors="pt").to(self.device)
if len(cropped_objects) == 0:
# No objects detected, return empty tensor
return torch.empty(
0, self.hash_compressor.hash_bits, dtype=torch.int32, device=self.device
)
with torch.no_grad():
outputs = self.dino(**inputs)
tokens = outputs.last_hidden_state # [1, N, dim]
# Step 2: DINO - extract features from cropped objects
# Step 3: HashCompressor - compress features to binary codes
hash_codes = self.dino.encode(cropped_objects)
# Compress to hash codes
_, _, bits = self.hash_compressor(tokens)
return hash_codes
return bits
def extract_features(
self, image: Image.Image, use_hash: bool = False
) -> torch.Tensor:
"""Extract features from image with optional hash compression.
def encode(self, image: Image.Image) -> torch.Tensor:
"""Encode an image to binary hash bits.
Args:
image: Input PIL Image
use_hash: If True, return binary hash codes; else return DINO features
Returns:
Features [N, dim] where dim is 1024 (DINO) or 512 (hash)
"""
cropped_objects = self.segmentor(image)
if len(cropped_objects) == 0:
dim = self.hash_compressor.hash_bits if use_hash else 1024
return torch.empty(0, dim, device=self.device)
if use_hash:
return self.dino.encode(cropped_objects)
else:
return self.dino.extract_features(cropped_objects)
def extract_masks(self, image: Image.Image) -> list[torch.Tensor]:
"""Extract only masks without full processing (for debugging).
Alias for forward().
Args:
image: Input PIL Image
Returns:
List of binary masks [H, W]
Binary hash codes [1, hash_bits] as int32
"""
return self.segmentor.extract_masks(image)
return self.forward(image)
def extract_features(self, image: Image.Image) -> torch.Tensor:
"""Extract DINO features from an image.
Args:
image: Input PIL Image
Returns:
DINO features [1, dino_dim], normalized
"""
inputs = self.processor(image, return_tensors="pt").to(self.device)
with torch.no_grad():
outputs = self.dino(**inputs)
features = outputs.last_hidden_state.mean(dim=1) # [1, dim]
features = F.normalize(features, dim=-1)
return features
# Backward compatibility alias
SAMHashPipeline = HashPipeline


@@ -1,180 +0,0 @@
"""Segment Anything 2 feature extractor with mask filtering and image cropping.
Extracts object masks from images using SAM2.1, filters by area and confidence,
then crops the original image to obtain individual object regions.
"""
from typing import Optional
import numpy as np
import torch
import torch.nn as nn
from PIL import Image
from transformers import AutoModelForMaskGeneration, AutoProcessor
class SegmentCompressor(nn.Module):
"""SAM2.1 based segmenter with mask filtering.
Extracts object masks from images, filters by area and confidence,
and crops the original image to produce individual object patches.
"""
def __init__(
self,
model_name: str = "facebook/sam2.1-hiera-large",
min_mask_area: int = 100,
max_masks: int = 10,
device: Optional[str] = None,
):
"""Initialize SAM2.1 segmenter.
Args:
model_name: HuggingFace model name for SAM2.1
min_mask_area: Minimum mask pixel area threshold
max_masks: Maximum number of masks to keep
device: Device to load model on (auto-detect if None)
"""
super().__init__()
self.model_name = model_name
self.min_mask_area = min_mask_area
self.max_masks = max_masks
# Auto detect device
if device is None:
device = "cuda" if torch.cuda.is_available() else "cpu"
self.device = torch.device(device)
# Load SAM model and processor
self.processor = AutoProcessor.from_pretrained(model_name)
self.model = AutoModelForMaskGeneration.from_pretrained(model_name).to(
self.device
)
self.model.eval()
def forward(self, image: Image.Image) -> list[Image.Image]:
"""Extract object masks and crop object regions.
Args:
image: Input PIL Image
Returns:
List of cropped object images (one per valid mask)
"""
# Run SAM inference
inputs = self.processor(image, return_tensors="pt").to(self.device)
with torch.no_grad():
outputs = self.model(**inputs)
# Post-process masks
masks = self.processor.post_process_masks(
outputs.pred_masks,
inputs["original_sizes"],
inputs["reshaped_input_sizes"],
)[0]
# Filter masks by area and confidence
valid_masks = self._filter_masks(masks)
if len(valid_masks) == 0:
return []
# Crop object regions from original image
cropped_objects = self._crop_objects(image, valid_masks)
return cropped_objects
def _filter_masks(self, masks: torch.Tensor) -> list[dict]:
"""Filter masks by area and keep top-N.
Args:
masks: Predicted masks [N, H, W]
Returns:
List of mask dictionaries with 'mask' and 'area'
"""
valid_masks = []
for mask in masks:
# Calculate mask area
area = mask.sum().item()
# Filter by minimum area
if area < self.min_mask_area:
continue
valid_masks.append({"mask": mask, "area": area})
# Sort by area (descending) and keep top-N
valid_masks = sorted(valid_masks, key=lambda x: x["area"], reverse=True)
valid_masks = valid_masks[: self.max_masks]
return valid_masks
def _crop_objects(
self, image: Image.Image, masks: list[dict]
) -> list[Image.Image]:
"""Crop object regions from image using masks.
Args:
image: Original PIL Image
masks: List of mask dictionaries
Returns:
List of cropped object images
"""
# Convert PIL to numpy for processing
image_np = np.array(image)
h, w = image_np.shape[:2]
cropped_objects = []
for mask_info in masks:
mask = mask_info["mask"].cpu().numpy()
# Find bounding box from mask
rows = mask.any(axis=1)
cols = mask.any(axis=0)
if not rows.any() or not cols.any():
continue
y_min, y_max = rows.argmax(), h - rows[::-1].argmax() - 1
x_min, x_max = cols.argmax(), w - cols[::-1].argmax() - 1
# Add small padding
pad = 5
x_min = max(0, x_min - pad)
y_min = max(0, y_min - pad)
x_max = min(w, x_max + pad)
y_max = min(h, y_max + pad)
# Crop
cropped = image.crop((x_min, y_min, x_max, y_max))
cropped_objects.append(cropped)
return cropped_objects
@torch.no_grad()
def extract_masks(self, image: Image.Image) -> list[torch.Tensor]:
"""Extract only masks without cropping (for debugging).
Args:
image: Input PIL Image
Returns:
List of binary masks [H, W]
"""
inputs = self.processor(image, return_tensors="pt").to(self.device)
outputs = self.model(**inputs)
masks = self.processor.post_process_masks(
outputs.pred_masks,
inputs["original_sizes"],
inputs["reshaped_input_sizes"],
)[0]
valid_masks = self._filter_masks(masks)
return [m["mask"] for m in valid_masks]


@@ -118,6 +118,17 @@ class BenchmarkTaskConfig(BaseModel):
type: str = Field(default="retrieval", description="Task type")
top_k: int = Field(default=10, gt=0, description="Top K for recall evaluation")
# Multi-object retrieval specific settings
gamma: float = Field(
default=1.0, ge=0, description="Co-occurrence penalty exponent"
)
top_k_per_object: int = Field(
default=50, gt=0, description="Top K results per object query"
)
num_query_objects: int = Field(
default=3, gt=0, description="Number of objects to sample from query image"
)
class BenchmarkConfig(BaseModel):
"""Configuration for benchmark evaluation."""


@@ -0,0 +1,64 @@
"""InsDet Scenes dataset for multi-object retrieval benchmark."""
from pathlib import Path
from typing import Any
from benchmarks.base import BaseDataset
from data_loading.loader import load_val_dataset
class InsDetScenesDataset(BaseDataset):
"""InsDet-FULL/Scenes dataset with easy/hard splits.
This dataset provides scene images with object annotations from the
Instance Detection (InsDet) dataset, supporting easy and hard splits.
"""
def __init__(
self,
scenes_dir: Path | str,
split: str = "easy",
):
"""Initialize InsDet Scenes dataset.
Args:
scenes_dir: Path to the InsDet-FULL/Scenes directory.
split: Scene split to use ('easy' or 'hard').
"""
self.scenes_dir = Path(scenes_dir)
self.split = split
self._dataset = load_val_dataset(self.scenes_dir, split)
def get_train_split(self) -> Any:
"""Get training split (same as test for this dataset).
Returns:
HuggingFace Dataset for training.
"""
return self._dataset
def get_test_split(self) -> Any:
"""Get test/evaluation split.
Returns:
HuggingFace Dataset for testing.
"""
return self._dataset
def __len__(self) -> int:
"""Get dataset length."""
return len(self._dataset)
def __getitem__(self, idx: int) -> dict[str, Any]:
"""Get a single item from the dataset.
Args:
idx: Index of the item.
Returns:
Dictionary containing:
- image: PIL Image
- image_id: Scene identifier
- objects: dict with bbox, category, area, id
"""
return self._dataset[idx]

View File

@@ -1,13 +1,13 @@
-"""Tests for compressor modules (SAM, DINO, HashCompressor, Pipeline)."""
+"""Tests for compressor modules (HashCompressor, Pipeline)."""
 import pytest
 import torch
 from compressors import (
     BinarySign,
-    DinoCompressor,
     HashCompressor,
+    HashPipeline,
     SAMHashPipeline,
-    SegmentCompressor,
+    VideoPositiveMask,
     bits_to_hash,
     create_pipeline_from_config,
     hamming_distance,
@@ -124,87 +124,105 @@ class TestHammingMetrics:
         assert sim.item() == 512  # Max similarity
 
 
-class TestSegmentCompressor:
-    """Test suite for SegmentCompressor."""
+class TestHashLoss:
+    """Test suite for HashLoss."""
 
-    @pytest.fixture
-    def mock_image(self):
-        """Create a mock PIL image."""
-        img = Image.new("RGB", (224, 224), color="red")
-        return img
+    def test_hash_loss_init(self):
+        """Verify HashLoss initializes with correct parameters."""
+        from compressors import HashLoss
 
-    def test_segment_compressor_init(self):
-        """Verify SegmentCompressor initializes with correct parameters."""
-        segmentor = SegmentCompressor(
-            model_name="facebook/sam2.1-hiera-large",
-            min_mask_area=100,
-            max_masks=10,
+        loss_fn = HashLoss(
+            contrastive_weight=1.0,
+            distill_weight=0.5,
+            quant_weight=0.01,
+            temperature=0.2,
         )
-        assert segmentor.model_name == "facebook/sam2.1-hiera-large"
-        assert segmentor.min_mask_area == 100
-        assert segmentor.max_masks == 10
+        assert loss_fn.contrastive_weight == 1.0
+        assert loss_fn.distill_weight == 0.5
+        assert loss_fn.quant_weight == 0.01
+        assert loss_fn.temperature == 0.2
 
-    def test_filter_masks(self):
-        """Verify mask filtering logic."""
-        # Create segmentor to get default filter params
-        segmentor = SegmentCompressor()
+    def test_hash_loss_forward(self):
+        """Verify HashLoss computes loss correctly."""
+        from compressors import HashLoss
 
-        # Create mock masks tensor with different areas
-        # Masks shape: [N, H, W]
-        masks = []
-        for area in [50, 200, 150, 300, 10]:
-            mask = torch.zeros(100, 100)
-            mask[:1, :area] = 1  # Create mask with specific area
-            masks.append(mask)
+        loss_fn = HashLoss()
 
-        masks_tensor = torch.stack(masks)  # [5, 100, 100]
-        valid = segmentor._filter_masks(masks_tensor)
+        batch_size = 4
+        hash_bits = 512
+        logits = torch.randn(batch_size, hash_bits)
+        hash_codes = torch.sign(logits)
+        teacher_embed = torch.randn(batch_size, 1024)
+        positive_mask = torch.eye(batch_size, dtype=torch.bool)
 
-        # Should filter out 50 and 10 (below min_mask_area=100)
-        # Then keep top 3 (max_masks=10)
-        assert len(valid) == 3
-        # Verify sorted by area (descending)
-        areas = [v["area"] for v in valid]
-        assert areas == sorted(areas, reverse=True)
+        total_loss, components = loss_fn(
+            logits=logits,
+            hash_codes=hash_codes,
+            teacher_embed=teacher_embed,
+            positive_mask=positive_mask,
+        )
+        assert "contrastive" in components
+        assert "distill" in components
+        assert "quantization" in components
+        assert "total" in components
 
 
-class TestDinoCompressor:
-    """Test suite for DinoCompressor."""
+class TestVideoPositiveMask:
+    """Test suite for VideoPositiveMask."""
 
-    def test_dino_compressor_init(self):
-        """Verify DinoCompressor initializes correctly."""
-        dino = DinoCompressor()
+    def test_from_frame_indices(self):
+        """Verify positive mask generation from frame indices."""
+        mask_gen = VideoPositiveMask(temporal_window=2)
 
-        assert dino.model_name == "facebook/dinov2-large"
+        frame_indices = torch.tensor([0, 1, 3, 5])
 
-    def test_dino_compressor_with_compressor(self):
-        """Verify DinoCompressor with HashCompressor."""
-        hash_compressor = HashCompressor(input_dim=1024, hash_bits=512)
-        dino = DinoCompressor(compressor=hash_compressor)
+        mask = mask_gen.from_frame_indices(frame_indices)
 
-        assert dino.compressor is hash_compressor
+        assert mask.shape == (4, 4)
+        # Frame 0 and 1 should be positive (distance 1 <= 2)
+        assert mask[0, 1] == True
+        # Frame 0 and 3 should be negative (distance 3 > 2)
+        assert mask[0, 3] == False
+
+    def test_from_video_ids(self):
+        """Verify positive mask generation from video IDs and frame indices."""
+        mask_gen = VideoPositiveMask(temporal_window=2)
+
+        video_ids = torch.tensor([0, 0, 1, 1])
+        frame_indices = torch.tensor([0, 1, 0, 1])
+
+        mask = mask_gen.from_video_ids(video_ids, frame_indices)
+
+        assert mask.shape == (4, 4)
+        # Same video and temporally close
+        assert mask[0, 1] == True  # video 0, frames 0,1
+        # Different video
+        assert mask[0, 2] == False  # video 0 vs 1
 
 
-class TestSAMHashPipeline:
-    """Test suite for SAMHashPipeline."""
+class TestHashPipeline:
+    """Test suite for HashPipeline."""
 
     def test_pipeline_init(self):
         """Verify pipeline initializes all components."""
-        pipeline = SAMHashPipeline(
-            sam_model="facebook/sam2.1-hiera-large",
+        pipeline = HashPipeline(
             dino_model="facebook/dinov2-large",
             hash_bits=512,
         )
-        assert isinstance(pipeline.segmentor, SegmentCompressor)
-        assert isinstance(pipeline.dino, DinoCompressor)
-        assert isinstance(pipeline.hash_compressor, HashCompressor)
+        assert pipeline.dino_model == "facebook/dinov2-large"
+        assert pipeline.dino_dim == 1024
 
     def test_pipeline_hash_bits(self):
         """Verify pipeline uses correct hash bits."""
-        pipeline = SAMHashPipeline(hash_bits=256)
-        assert pipeline.hash_compressor.hash_bits == 256
+        pipeline = HashPipeline(hash_bits=256)
+        assert pipeline.hash_bits == 256
+
+    def test_pipeline_alias(self):
+        """Verify SAMHashPipeline is alias for HashPipeline."""
+        assert SAMHashPipeline is HashPipeline
 
 
 class TestConfigIntegration:
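The Hamming metrics exercised in `TestHammingMetrics` above reduce to bit comparisons over ±1 codes; a self-contained sketch (the project's `hamming_distance` may differ in signature and dtype handling):

```python
import torch


def hamming_distance(a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
    """Number of differing bits between two hash codes (last dimension)."""
    return (a != b).sum(dim=-1)


def hamming_similarity(a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
    """hash_bits minus the Hamming distance: identical 512-bit codes score 512."""
    return a.shape[-1] - hamming_distance(a, b)


code = torch.ones(512)
assert hamming_similarity(code, code).item() == 512  # max similarity
assert hamming_distance(code, -code).item() == 512   # all bits differ
```

On packed binary codes this is usually done with XOR and popcount instead, but the tensor form above matches the float ±1 codes used in these tests.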
@@ -216,25 +234,21 @@ class TestConfigIntegration:
         pipeline = create_pipeline_from_config(config)
 
-        assert isinstance(pipeline, SAMHashPipeline)
-        assert pipeline.hash_compressor.hash_bits == config.model.compression_dim
+        assert isinstance(pipeline, HashPipeline)
+        assert pipeline.hash_bits == config.model.compression_dim
 
-    def test_config_sam_settings(self):
-        """Verify config contains SAM settings."""
+    def test_config_settings(self):
+        """Verify config contains required settings."""
         config = cfg_manager.load()
-        assert hasattr(config.model, "sam_model")
-        assert hasattr(config.model, "sam_min_mask_area")
-        assert hasattr(config.model, "sam_max_masks")
-        assert config.model.sam_model == "facebook/sam2.1-hiera-large"
-        assert config.model.sam_min_mask_area == 100
-        assert config.model.sam_max_masks == 10
+        assert hasattr(config.model, "dino_model")
+        assert hasattr(config.model, "compression_dim")
 
 
+@pytest.mark.slow
 class TestPipelineIntegration:
     """Integration tests for full pipeline (slow, requires model downloads)."""
 
-    @pytest.mark.slow
     def test_pipeline_end_to_end(self):
         """Test full pipeline with actual models (slow test)."""
         # Skip if no GPU
@@ -245,54 +259,32 @@ class TestPipelineIntegration:
         image = Image.new("RGB", (640, 480), color=(128, 128, 128))
 
         # Initialize pipeline (will download models on first run)
-        pipeline = SAMHashPipeline(
-            sam_model="facebook/sam2.1-hiera-large",
+        pipeline = HashPipeline(
             dino_model="facebook/dinov2-large",
             hash_bits=512,
-            sam_min_mask_area=100,
-            sam_max_masks=5,
         )
 
         # Run pipeline
-        hash_codes = pipeline(image)
+        hash_bits = pipeline(image)
 
         # Verify output shape
-        assert hash_codes.dim() == 2
-        assert hash_codes.shape[1] == 512
-        assert torch.all((hash_codes == 0) | (hash_codes == 1))
+        assert hash_bits.dim() == 2
+        assert hash_bits.shape[1] == 512
+        assert torch.all((hash_bits == 0) | (hash_bits == 1))
 
-    @pytest.mark.slow
-    def test_extract_features_without_hash(self):
-        """Test feature extraction without hash compression."""
+    def test_extract_features(self):
+        """Test feature extraction."""
         if not torch.cuda.is_available():
             pytest.skip("Requires CUDA")
 
         image = Image.new("RGB", (640, 480), color=(128, 128, 128))
 
-        pipeline = SAMHashPipeline(
-            sam_model="facebook/sam2.1-hiera-large",
+        pipeline = HashPipeline(
             dino_model="facebook/dinov2-large",
         )
-        features = pipeline.extract_features(image, use_hash=False)
+        features = pipeline.extract_features(image)
 
         # Should return DINO features (1024 for large)
         assert features.dim() == 2
         assert features.shape[1] == 1024
-
-    @pytest.mark.slow
-    def test_extract_masks_only(self):
-        """Test mask extraction only."""
-        if not torch.cuda.is_available():
-            pytest.skip("Requires CUDA")
-
-        image = Image.new("RGB", (640, 480), color=(128, 128, 128))
-
-        pipeline = SAMHashPipeline(
-            sam_model="facebook/sam2.1-hiera-large",
-        )
-        masks = pipeline.extract_masks(image)
-
-        # Should return a list of masks
-        assert isinstance(masks, list)
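The `VideoPositiveMask` behavior asserted above (positive iff same video and within `temporal_window` frames) can be sketched as a standalone function — a hypothetical reference under those assumptions, not the project's implementation:

```python
import torch


def positive_mask(
    video_ids: torch.Tensor,
    frame_indices: torch.Tensor,
    temporal_window: int = 2,
) -> torch.Tensor:
    """A pair (i, j) is positive iff both frames share a video id and are
    at most temporal_window frames apart."""
    same_video = video_ids[:, None] == video_ids[None, :]
    close = (frame_indices[:, None] - frame_indices[None, :]).abs() <= temporal_window
    return same_video & close


# Single-video case, mirroring test_from_frame_indices:
frames = torch.tensor([0, 1, 3, 5])
mask = positive_mask(torch.zeros_like(frames), frames)
assert mask[0, 1] and not mask[0, 2]  # frame distance 1 <= 2; distance 3 > 2

# Two videos, mirroring test_from_video_ids:
mask = positive_mask(torch.tensor([0, 0, 1, 1]), torch.tensor([0, 1, 0, 1]))
assert mask[0, 1] and not mask[0, 2]  # same video vs different video
```

Broadcasting over `[:, None]` / `[None, :]` builds the full N×N pair matrix in one step, which is the usual way such masks feed a contrastive loss.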

View File

@@ -0,0 +1,238 @@
"""Integration tests for multi-object retrieval benchmark pipeline.
These tests verify the end-to-end functionality of the multi-object retrieval
benchmark, including schema building, database population, and evaluation.
"""
import numpy as np
import pytest
from unittest.mock import Mock, patch, MagicMock
from PIL import Image
class TestMultiObjectRetrievalIntegration:
"""Integration tests for multi-object retrieval benchmark."""
@pytest.fixture
def mock_model_processor(self):
"""Create mock model and processor."""
mock_model = Mock()
mock_processor = Mock()
# Mock the feature extraction to return a fixed-size vector
def mock_extract(processor, model, image):
return [0.1] * 256 # 256-dim vector
mock_processor.images = mock_extract
return mock_model, mock_processor
@pytest.fixture
def mock_dataset(self):
"""Create a mock dataset with images and annotations."""
# Create mock items
items = []
for i in range(3):
item = {
"image": Image.new("RGB", (224, 224), color=(i * 50, 100, 150)),
"image_id": f"scene_{i}",
"objects": {
"bbox": [[10, 10, 50, 50], [60, 60, 40, 40]],
"category": ["object_a", "object_b"],
"area": [2500, 1600],
"id": [0, 1],
},
}
items.append(item)
# Mock supports dunder methods like __getitem__ only via MagicMock
mock_dataset = MagicMock()
mock_dataset.__len__.return_value = len(items)
mock_dataset.__getitem__.side_effect = lambda idx: items[idx]
mock_dataset.with_format = lambda fmt: mock_dataset
return mock_dataset
def test_build_object_schema(self):
"""Test that object schema is built correctly."""
from benchmarks.tasks.multi_object_retrieval import _build_object_schema
import pyarrow as pa
vector_dim = 256
schema = _build_object_schema(vector_dim)
assert isinstance(schema, pa.Schema)
assert "id" in schema.names
assert "image_id" in schema.names
assert "object_id" in schema.names
assert "category" in schema.names
assert "vector" in schema.names
# Check vector field has correct dimension
vector_field = schema.field("vector")
assert isinstance(vector_field.type, pa.ListType)
assert vector_field.type.value_type == pa.float32()
@patch("benchmarks.tasks.multi_object_retrieval.load_sam_model")
@patch("benchmarks.tasks.multi_object_retrieval.segment_image")
def test_build_database_with_mocked_sam(
self,
mock_segment,
mock_load_sam,
mock_model_processor,
mock_dataset,
):
"""Test database building with mocked SAM segmentation."""
from benchmarks.tasks.multi_object_retrieval import (
MultiObjectRetrievalTask,
_build_object_schema,
)
mock_model, mock_processor = mock_model_processor
# Mock SAM
mock_load_sam.return_value = (Mock(), Mock())
mock_segment.return_value = [
{
"segment": np.ones((224, 224), dtype=bool),
"area": 50000,
"bbox": [0, 0, 224, 224],
}
]
# Create task with config
task = MultiObjectRetrievalTask(
sam_model="facebook/sam2.1-hiera-large",
min_mask_area=1024,
max_masks_per_image=5,
gamma=1.0,
top_k_per_object=50,
num_query_objects=3,
)
# Create mock table
mock_table = Mock()
mock_table.schema = _build_object_schema(256)
# Build database (this should not raise)
task.build_database(mock_model, mock_processor, mock_dataset, mock_table, batch_size=1)
# Verify table.add was called
assert mock_table.add.called
@patch("benchmarks.tasks.multi_object_retrieval.load_sam_model")
@patch("benchmarks.tasks.multi_object_retrieval.segment_image")
def test_evaluate_with_mocked_sam(
self,
mock_segment,
mock_load_sam,
mock_model_processor,
mock_dataset,
):
"""Test evaluation with mocked SAM segmentation."""
from benchmarks.tasks.multi_object_retrieval import (
MultiObjectRetrievalTask,
_build_object_schema,
)
mock_model, mock_processor = mock_model_processor
# Mock SAM
mock_load_sam.return_value = (Mock(), Mock())
mock_segment.return_value = [
{
"segment": np.ones((224, 224), dtype=bool),
"area": 50000,
"bbox": [0, 0, 224, 224],
"object_id": "query_obj_0",
}
]
# Create mock table with search results
mock_table = Mock()
mock_table.schema = _build_object_schema(256)
# Mock search to return matching result
mock_result = Mock()
mock_result.to_polars.return_value = {
"image_id": ["scene_0"],
"object_id": ["scene_0_obj_0"],
"_distance": [0.1],
}
mock_table.search.return_value.select.return_value.limit.return_value = mock_result
# Create task
task = MultiObjectRetrievalTask(
sam_model="facebook/sam2.1-hiera-large",
min_mask_area=1024,
max_masks_per_image=5,
gamma=1.0,
top_k_per_object=50,
num_query_objects=1,
)
# Evaluate
results = task.evaluate(mock_model, mock_processor, mock_dataset, mock_table, batch_size=1)
# Verify results structure
assert "accuracy" in results
assert "correct" in results
assert "total" in results
assert "top_k" in results
assert results["top_k"] == 50
def test_task_initialization_with_config(self):
"""Test task initialization with custom config."""
from benchmarks.tasks.multi_object_retrieval import MultiObjectRetrievalTask
task = MultiObjectRetrievalTask(
sam_model="facebook/sam2.1-hiera-small",
min_mask_area=500,
max_masks_per_image=3,
gamma=0.5,
top_k_per_object=100,
num_query_objects=5,
)
assert task.sam_model == "facebook/sam2.1-hiera-small"
assert task.min_mask_area == 500
assert task.max_masks_per_image == 3
assert task.config.gamma == 0.5
assert task.config.top_k_per_object == 100
assert task.config.num_query_objects == 5
def test_task_initialization_defaults(self):
"""Test task initialization with default config."""
from benchmarks.tasks.multi_object_retrieval import MultiObjectRetrievalTask
task = MultiObjectRetrievalTask()
# Check defaults from BenchmarkTaskConfig
assert task.config.gamma == 1.0
assert task.config.top_k_per_object == 50
assert task.config.num_query_objects == 3
# SAM settings from ModelConfig defaults
assert task.sam_model == "facebook/sam2.1-hiera-large"
assert task.min_mask_area == 1024
assert task.max_masks_per_image == 5
class TestInsDetScenesDataset:
"""Tests for InsDetScenesDataset class."""
def test_dataset_class_exists(self):
"""Test that InsDetScenesDataset can be imported."""
from data_loading.insdet_scenes import InsDetScenesDataset
assert InsDetScenesDataset is not None
@patch("data_loading.insdet_scenes.load_val_dataset")
def test_dataset_loads_correct_split(self, mock_load):
"""Test dataset loads correct split."""
from data_loading.insdet_scenes import InsDetScenesDataset
mock_load.return_value = Mock()
dataset = InsDetScenesDataset("/path/to/scenes", split="easy")
mock_load.assert_called_once_with("/path/to/scenes", "easy")
assert dataset.split == "easy"

168
mini-nav/tests/test_sam.py Normal file
View File

@@ -0,0 +1,168 @@
"""Tests for SAM segmentation utilities.
Note: These tests mock the SAM model loading since SAM requires
heavy model weights. The actual SAM integration should be tested
separately in integration tests.
"""
import numpy as np
import pytest
from unittest.mock import Mock, patch
from PIL import Image
class TestSAMSegmentation:
"""Test suite for SAM segmentation utilities."""
def test_segment_image_empty_masks(self):
"""Test segment_image returns empty list when no masks generated."""
from utils.sam import segment_image
# Create mock mask generator that returns empty list
mock_generator = Mock()
mock_generator.generate.return_value = []
result = segment_image(mock_generator, Image.new("RGB", (100, 100)))
assert result == []
def test_segment_image_filters_small_masks(self):
"""Test segment_image filters masks below min_area threshold."""
from utils.sam import segment_image
# Create mock masks with different areas
small_mask = {
"segment": np.zeros((10, 10), dtype=bool),
"area": 50, # Below 32*32 = 1024
"bbox": [0, 0, 10, 10],
"predicted_iou": 0.9,
"stability_score": 0.8,
}
large_mask = {
"segment": np.ones((100, 100), dtype=bool),
"area": 10000, # Above threshold
"bbox": [0, 0, 100, 100],
"predicted_iou": 0.95,
"stability_score": 0.9,
}
mock_generator = Mock()
mock_generator.generate.return_value = [small_mask, large_mask]
result = segment_image(
mock_generator,
Image.new("RGB", (100, 100)),
min_area=32 * 32,
max_masks=5,
)
# Should only return the large mask
assert len(result) == 1
assert result[0]["area"] == 10000
def test_segment_image_limits_max_masks(self):
"""Test segment_image limits to max_masks largest masks."""
from utils.sam import segment_image
# Create 10 masks with different areas
masks = [
{
"segment": np.ones((i + 1, i + 1), dtype=bool),
"area": (i + 1) * (i + 1),
"bbox": [0, 0, i + 1, i + 1],
"predicted_iou": 0.9,
"stability_score": 0.8,
}
for i in range(10)
]
mock_generator = Mock()
mock_generator.generate.return_value = masks
result = segment_image(
mock_generator,
Image.new("RGB", (100, 100)),
min_area=1,
max_masks=3,
)
# Should only return top 3 largest masks
assert len(result) == 3
# Check they are sorted by area (largest first)
areas = [m["area"] for m in result]
assert areas == sorted(areas, reverse=True)
def test_segment_image_sorted_by_area(self):
"""Test segment_image returns masks sorted by area descending."""
from utils.sam import segment_image
# Create masks with known areas (unordered)
mask1 = {"segment": np.ones((5, 5), dtype=bool), "area": 25, "bbox": [0, 0, 5, 5]}
mask2 = {"segment": np.ones((10, 10), dtype=bool), "area": 100, "bbox": [0, 0, 10, 10]}
mask3 = {"segment": np.ones((3, 3), dtype=bool), "area": 9, "bbox": [0, 0, 3, 3]}
mock_generator = Mock()
mock_generator.generate.return_value = [mask1, mask2, mask3]
result = segment_image(
mock_generator,
Image.new("RGB", (100, 100)),
min_area=1,
max_masks=10,
)
# Should be sorted by area descending
assert result[0]["area"] == 100
assert result[1]["area"] == 25
assert result[2]["area"] == 9
class TestExtractMaskedRegion:
"""Test suite for extracting masked regions from images."""
def test_extract_masked_region_binary(self):
"""Test extracting masked region with binary mask."""
from utils.sam import extract_masked_region
# Create a simple image
image = Image.new("RGB", (10, 10), color=(255, 0, 0))
# Create a binary mask (half kept, half masked)
mask = np.zeros((10, 10), dtype=bool)
mask[:, :5] = True
result = extract_masked_region(image, mask)
# Check that left half is red, right half is black
result_np = np.array(result)
left_half = result_np[:, :5, :]
right_half = result_np[:, 5:, :]
assert np.all(left_half == [255, 0, 0])
assert np.all(right_half == [0, 0, 0])
def test_extract_masked_region_all_masked(self):
"""Test extracting when entire image is masked."""
from utils.sam import extract_masked_region
image = Image.new("RGB", (10, 10), color=(255, 0, 0))
mask = np.ones((10, 10), dtype=bool)
result = extract_masked_region(image, mask)
result_np = np.array(result)
# Entire image should be preserved
assert np.all(result_np == [255, 0, 0])
def test_extract_masked_region_all_zero_mask(self):
"""Test extracting when mask is all zeros."""
from utils.sam import extract_masked_region
image = Image.new("RGB", (10, 10), color=(255, 0, 0))
mask = np.zeros((10, 10), dtype=bool)
result = extract_masked_region(image, mask)
result_np = np.array(result)
# Entire image should be black
assert np.all(result_np == [0, 0, 0])

View File

@@ -0,0 +1,121 @@
"""Tests for scene scoring algorithm in multi-object retrieval."""
import pytest
from benchmarks.tasks.multi_object_retrieval import _compute_scene_score
class TestSceneScoringAlgorithm:
"""Test suite for scene scoring with co-occurrence penalty."""
def test_scene_score_basic(self):
"""Test basic scene scoring with single match."""
query_object_ids = ["obj_1", "obj_2", "obj_3"]
# Scene A has obj_1
retrieved_results = {
"scene_A": [("distance_1", "obj_1")],
}
scores = _compute_scene_score(query_object_ids, retrieved_results, gamma=1.0)
# Hit rate = 1/3, similarity = 1/(1+distance_1)
assert "scene_A" in scores
assert scores["scene_A"] > 0
def test_scene_score_no_match(self):
"""Test scene scoring when no objects match."""
query_object_ids = ["obj_1", "obj_2", "obj_3"]
retrieved_results = {
"scene_A": [("distance_1", "other_obj")],
}
scores = _compute_scene_score(query_object_ids, retrieved_results, gamma=1.0)
assert scores["scene_A"] == 0.0
def test_scene_score_multiple_scenes(self):
"""Test scoring across multiple scenes."""
query_object_ids = ["obj_1", "obj_2"]
retrieved_results = {
"scene_A": [("0.1", "obj_1")],
"scene_B": [("0.1", "obj_2")],
"scene_C": [("0.1", "other")],
}
scores = _compute_scene_score(query_object_ids, retrieved_results, gamma=1.0)
# Scenes with matches should have positive scores
assert scores["scene_A"] > 0
assert scores["scene_B"] > 0
# Scene C has no match, score should be 0
assert scores["scene_C"] == 0.0
def test_scene_score_gamma_zero(self):
"""Test scoring with gamma=0 (no penalty)."""
query_object_ids = ["obj_1", "obj_2", "obj_3", "obj_4", "obj_5"]
retrieved_results = {
"scene_A": [("0.1", "obj_1")],
}
scores_gamma_0 = _compute_scene_score(query_object_ids, retrieved_results, gamma=0.0)
scores_gamma_1 = _compute_scene_score(query_object_ids, retrieved_results, gamma=1.0)
# With gamma=0, hit_rate^0 = 1, so score = similarity
# With gamma=1, hit_rate^1 = 1/5, so score = similarity * 1/5
# scores_gamma_0 should be larger
assert scores_gamma_0["scene_A"] > scores_gamma_1["scene_A"]
def test_scene_score_multiple_matches(self):
"""Test scoring when scene has multiple matching objects."""
query_object_ids = ["obj_1", "obj_2"]
retrieved_results = {
"scene_A": [("0.1", "obj_1"), ("0.2", "obj_2")],
}
scores = _compute_scene_score(query_object_ids, retrieved_results, gamma=1.0)
# Both objects match, hit_rate = 2/2 = 1.0
# Score = (1/(1+0.1) + 1/(1+0.2)) * 1.0
expected_similarity = 1.0 / 1.1 + 1.0 / 1.2
assert abs(scores["scene_A"] - expected_similarity) < 0.01
def test_scene_score_distance_to_similarity(self):
"""Test that smaller distance yields higher score."""
query_object_ids = ["obj_1"]
retrieved_results = {
"scene_close": [("0.01", "obj_1")],
"scene_far": [("10.0", "obj_1")],
}
scores = _compute_scene_score(query_object_ids, retrieved_results, gamma=1.0)
# Closer scene should have higher score
assert scores["scene_close"] > scores["scene_far"]
def test_scene_score_empty_results(self):
"""Test scoring with empty retrieved results."""
query_object_ids = ["obj_1", "obj_2"]
retrieved_results = {}
scores = _compute_scene_score(query_object_ids, retrieved_results, gamma=1.0)
assert scores == {}
def test_scene_score_empty_query(self):
"""Test scoring with empty query objects."""
query_object_ids = []
retrieved_results = {
"scene_A": [("0.1", "obj_1")],
}
scores = _compute_scene_score(query_object_ids, retrieved_results, gamma=1.0)
# With empty query, no scenes should have positive score
assert all(score == 0.0 for score in scores.values())
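The behaviors these tests pin down (per-match similarity 1/(1+distance), a hit-rate penalty raised to `gamma`, zero for no matches or an empty query) fit the following standalone sketch. This is an assumed reconstruction, not the project's `_compute_scene_score`, and it assumes distances parse as numbers:

```python
def compute_scene_score(query_object_ids, retrieved_results, gamma=1.0):
    """Score = (sum of 1/(1+distance) over matched objects) * hit_rate**gamma."""
    scores = {}
    for scene_id, hits in retrieved_results.items():
        matched = [
            (float(distance), object_id)
            for distance, object_id in hits
            if object_id in query_object_ids
        ]
        if not query_object_ids or not matched:
            scores[scene_id] = 0.0
            continue
        similarity = sum(1.0 / (1.0 + d) for d, _ in matched)
        hit_rate = len({obj for _, obj in matched}) / len(query_object_ids)
        scores[scene_id] = similarity * hit_rate**gamma
    return scores


scores = compute_scene_score(
    ["obj_1", "obj_2"],
    {"scene_A": [(0.1, "obj_1"), (0.2, "obj_2")], "scene_C": [(0.1, "other")]},
)
assert abs(scores["scene_A"] - (1 / 1.1 + 1 / 1.2)) < 1e-9  # full hit rate
assert scores["scene_C"] == 0.0
```

With `gamma=0` the hit-rate factor collapses to 1, so a scene matching one of five query objects scores as highly as a full match — which is exactly the contrast `test_scene_score_gamma_zero` checks.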

100
mini-nav/utils/sam.py Normal file
View File

@@ -0,0 +1,100 @@
"""SAM (Segment Anything Model) utilities for object segmentation."""
from pathlib import Path
from typing import Any
import numpy as np
import torch
from PIL import Image
from sam2.build_sam import build_sam2_hf
from sam2.automatic_mask_generator import SAM2AutomaticMaskGenerator
def load_sam_model(
model_name: str = "facebook/sam2.1-hiera-large",
device: str = "cuda",
checkpoint_dir: Path | None = None,
) -> tuple[Any, Any]:
"""Load SAM 2.1 model and mask generator.
Args:
model_name: SAM model name (currently supports facebook/sam2.1-hiera-*).
device: Device to load model on (cuda or cpu).
checkpoint_dir: Optional directory for model checkpoint cache.
Returns:
Tuple of (sam_model, mask_generator).
"""
if device == "cuda" and not torch.cuda.is_available():
device = "cpu"
# Build SAM2 model; build_sam2_hf resolves Hugging Face hub ids such as
# facebook/sam2.1-hiera-large, while build_sam2 expects a local config path
sam_model = build_sam2_hf(model_name, device=device)
# Create automatic mask generator
mask_generator = SAM2AutomaticMaskGenerator(sam_model)
return sam_model, mask_generator
def segment_image(
mask_generator: Any,
image: Image.Image,
min_area: int = 32 * 32,
max_masks: int = 5,
) -> list[dict[str, Any]]:
"""Segment image using SAM to extract object masks.
Args:
mask_generator: SAM2AutomaticMaskGenerator instance.
image: PIL Image to segment.
min_area: Minimum mask area threshold in pixels.
max_masks: Maximum number of masks to return.
Returns:
List of mask dictionaries with keys:
- segment: Binary mask (numpy array)
- area: Mask area in pixels
- bbox: Bounding box [x, y, width, height]
- predicted_iou: Model's confidence in the mask
- stability_score: Stability score for the mask
"""
# Convert PIL Image to numpy array
image_np = np.array(image.convert("RGB"))
# Generate masks
masks = mask_generator.generate(image_np)
if not masks:
return []
# Filter by minimum area
filtered_masks = [m for m in masks if m["area"] >= min_area]
if not filtered_masks:
return []
# Sort by area (largest first) and limit to max_masks
sorted_masks = sorted(filtered_masks, key=lambda x: x["area"], reverse=True)
return sorted_masks[:max_masks]
def extract_masked_region(
image: Image.Image,
mask: np.ndarray,
) -> Image.Image:
"""Extract masked region from image.
Args:
image: Original PIL Image.
mask: Binary mask as numpy array (True = keep).
Returns:
PIL Image with only the masked region visible.
"""
image_np = np.array(image.convert("RGB"))
# Apply mask
masked_np = image_np * mask[:, :, np.newaxis]
return Image.fromarray(masked_np.astype(np.uint8))

33
openspec/config.yaml Normal file
View File

@@ -0,0 +1,33 @@
schema: spec-driven
# Project context (optional)
# This is shown to AI when creating artifacts.
# Add your tech stack, conventions, style guides, domain knowledge, etc.
context: |
Tech stack: Python 3.10+, PyTorch, DINOv2, SAM 2.1, LanceDB, Typer, Dash, Plotly
Dependencies: transformers, torch, torchvision, lancedb, polars, dash, typer, pydantic
Build tools: UV (package manager), pytest (testing), justfile (tasks), jujutsu (Version Control)
Conventions:
- Google Python Style Guide
- TDD: Write tests before implementation
- Single-layer nesting max in conditionals/loops
- English comments only
Domain: Vision-language navigation and image feature retrieval
- Feature extraction using DINOv2 (facebook/dinov2-large)
- Image segmentation using SAM 2.1 (facebook/sam2.1-hiera-large)
- Vector storage and retrieval with LanceDB
- Feature compression for efficient storage
# Per-artifact rules (optional)
# Add custom rules for specific artifacts.
rules:
proposal:
- Keep proposals under 500 words
- Always include a "Non-goals" section
- Focus on incremental changes
- Use Chinese to write Markdown
tasks:
- Break tasks into chunks of max 2 hours
- Write tests before implementation code

View File

@@ -3,7 +3,7 @@ name = "mini-nav"
 version = "0.1.0"
 description = "Add your description here"
 readme = "README.md"
-requires-python = ">=3.10"
+requires-python = ">=3.13"
 dependencies = [
     "accelerate>=1.12.0",
     "dash>=3.4.0",

1270
uv.lock generated

File diff suppressed because it is too large Load Diff