Adding a Graph Database to Comind

May 27, 2025

Claude summary: This post covers the new Neo4j integration we built for Comind. We added service management scripts, batch sync tools, and real-time updates so you can query and explore relationships between concepts, thoughts, and emotions as a graph instead of digging through individual ATProto records.

Cameron’s note

This is another collaborative post between Claude and me. We spent a bunch of time building comprehensive graph database support for Comind. It’s pretty cool - you can finally see how all the concepts relate to each other without having to manually trace through record references.

The Problem

Comind generates a lot of structured data - concepts like “distributed systems” or “municipal broadband,” emotions like curiosity and frustration, thoughts that analyze content. Each piece is stored as an ATProto record with proper references to related content.

The issue is that exploring those relationships is a pain. If you want to see how concepts connect to each other, or track emotional patterns around certain topics, you end up writing custom code to traverse record references. The data is all there, but getting insights out of it requires a lot of manual work.

We’ve been thinking about this for a while. Comind creates all these interesting conceptual relationships, but there wasn’t a good way to explore them interactively.

Graph Database Solution

Graph databases are built for exactly this kind of relationship-heavy data. Instead of tables and foreign keys, you get nodes and edges - concepts connected to thoughts, emotions linked to content, spheres containing related ideas.

We went with Neo4j because it has solid query language support and good tooling for exploration. Most importantly, we could add it as an additional query layer without messing with Comind’s core ATProto architecture.

The key insight was treating it as secondary storage - your ATProto records stay as the source of truth, but the graph database gives you a convenient way to explore relationships.

What We Built

We didn’t just bolt on graph support - we built a complete integration system:

Service Management

First, we needed reliable ways to run Neo4j alongside Comind. The ./scripts/services.sh script handles everything:

# Start just the database
./scripts/services.sh start database

# Start everything (database + inference)
./scripts/services.sh start all

# Check what's running
./scripts/services.sh status

This plugs into our existing Docker setup, so Neo4j runs alongside the inference services with proper networking.

Batch Sync Tools

For existing data, we built comprehensive sync tools in scripts/graph_sync.py:

# Set up database schema
python scripts/graph_sync.py --setup-schema

# Sync all existing records
python scripts/graph_sync.py --sync-all

# Sync specific types
python scripts/graph_sync.py --sync-collection me.comind.concept

The sync process maps ATProto records to graph nodes and creates typed relationships. It handles edge cases like missing references and circular dependencies.
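
To make the edge-case handling a bit more concrete, here is a minimal sketch of one way a batch sync could deal with missing references and cycles: create all nodes first, then create edges in a second pass. This is not the code in scripts/graph_sync.py. The record shape (`uri`, `collection`, `refs`) and the function name `plan_sync` are hypothetical and only there for illustration.

```python
from typing import Iterable


def plan_sync(records: Iterable[dict]) -> tuple[list[dict], list[tuple[str, str]], list[tuple[str, str]]]:
    """Split records into nodes, resolvable edges, and dangling edges.

    Hypothetical record shape: {"uri": str, "collection": str, "refs": [uri, ...]}
    """
    records = list(records)

    # Pass 1: every record becomes a node, keyed by its URI.
    known = {r["uri"] for r in records}
    nodes = [{"uri": r["uri"], "collection": r["collection"]} for r in records]

    # Pass 2: edges are planned only after all nodes are known, so circular
    # references (A -> B -> A) need no special ordering.
    edges: list[tuple[str, str]] = []
    dangling: list[tuple[str, str]] = []
    for r in records:
        for target in r.get("refs", []):
            if target in known:
                edges.append((r["uri"], target))
            else:
                # The referenced record is missing; set the edge aside
                # instead of failing the whole sync.
                dangling.append((r["uri"], target))
    return nodes, edges, dangling
```

Keeping the dangling edges in their own list means a later run can retry them once the referenced records show up.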

Real-Time Updates

The trickiest part was real-time sync. Every time Comind creates a new concept, thought, or emotion, it should immediately show up in the graph database.

This required deep integration with the RecordManager class:

# Enable graph sync when creating records
record_manager = RecordManager(client, enable_graph_sync=True)

The important part is that graph sync never breaks record creation: if the graph database is down, your ATProto records still get created successfully. We built careful error handling around the sync step to maintain data integrity.
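
The general pattern looks roughly like the sketch below. This is not the actual RecordManager code. `create_record_safely`, `create_record`, and `sync_to_graph` are hypothetical names, and the real integration may structure things differently.

```python
import logging
from typing import Callable

logger = logging.getLogger("graph_sync_sketch")


def create_record_safely(
    create_record: Callable[[dict], str],
    sync_to_graph: Callable[[str, dict], None],
    record: dict,
) -> str:
    """Create the ATProto record first, then attempt graph sync as best effort."""
    # The canonical write happens first; errors here are allowed to propagate.
    uri = create_record(record)
    try:
        sync_to_graph(uri, record)
    except Exception:
        # A graph outage is logged but never fails record creation.
        logger.warning("graph sync failed for %s", uri, exc_info=True)
    return uri
```

The ordering is the design choice that matters here: the ATProto write is never wrapped in the same error path as the graph write, so only the secondary store can fail silently.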

Data Model

We spent time figuring out how to map Comind’s cognitive content to graph structures:

  • Concepts become nodes with their text content
  • Thoughts become nodes connected to referenced concepts
  • Emotions become nodes linked to triggering content
  • Relationships become explicit edges with semantic types

The tricky part was handling Comind’s relationship complexity while keeping the graph structure queryable.
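
As a rough illustration of the mapping above, here is a small sketch that turns one record into a parameterized Cypher upsert. The `Concept` and `Emotion` labels appear in the queries later in this post; the `Thought` label, the collection names other than me.comind.concept, the `uri` key property, and the `node_upsert` helper are all assumptions made for the example.

```python
# Assumed mapping from ATProto collection to node label. Only
# me.comind.concept is named in this post; the other two are guesses.
LABELS = {
    "me.comind.concept": "Concept",
    "me.comind.thought": "Thought",
    "me.comind.emotion": "Emotion",
}


def node_upsert(collection: str, uri: str, props: dict) -> tuple[str, dict]:
    """Build a parameterized Cypher MERGE for one record."""
    label = LABELS[collection]  # raises KeyError for unmapped collections
    # The label is interpolated from the fixed mapping above, never from
    # user input; all values travel as query parameters.
    query = f"MERGE (n:{label} {{uri: $uri}}) SET n += $props"
    return query, {"uri": uri, "props": props}
```

Using MERGE on a stable key means re-running a sync updates existing nodes instead of creating duplicates.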

What You Get

The result is what we’re calling a “living knowledge graph” - a dynamic representation of your cognitive engagement that grows as you process new content.

When you like a post about municipal broadband, Comind generates concepts like “internet infrastructure,” “local government,” and “digital equity.” These become nodes in your graph. When you encounter another post about rural internet access, new concepts emerge and connect to existing ones, building a network of related ideas.

Interactive Exploration

With Neo4j Browser, you can explore these patterns immediately:

// Find concepts that appear across multiple conversations
MATCH (c:Concept)<-[:CONCEPT_RELATION]-(source)
WITH c, count(DISTINCT source) AS connections
WHERE connections > 1
RETURN c.text, connections
ORDER BY connections DESC

This shows concepts that bridge different contexts - often the most interesting ideas in your graph.

Tracking Evolution

You can see how your understanding develops over time:

// Concept discovery over the last month
MATCH (c:Concept)
WHERE c.createdAt > datetime() - duration({days: 30})
RETURN date(c.createdAt) as day, count(*) as new_concepts
ORDER BY day

This reveals the rhythm of your cognitive engagement - periods of intense concept generation usually mean you’re discovering new topics or diving deep into familiar ones.

Emotional Patterns

The graph reveals emotional patterns that aren’t obvious from individual records:

// What emotions technical concepts trigger
MATCH (c:Concept)<-[:CONCEPT_RELATION]-(content)
MATCH (content)-[:EMOTION_RELATION]->(e:Emotion)
WHERE c.text CONTAINS "system" OR c.text CONTAINS "network"
RETURN e.emotionType, count(*) as frequency
ORDER BY frequency DESC

You might discover that technical content consistently triggers curiosity, or that certain system discussions generate frustration.

Technical Details

The implementation balances several concerns:

Data Integrity

ATProto records remain the source of truth. The graph database is always secondary - if there’s a conflict, the ATProto record wins. This preserves Comind’s decentralized architecture.
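
One way to picture "the ATProto record wins" is a reconcile step that computes which graph properties need to be overwritten. This is a sketch under assumptions, not the actual sync logic; `reconcile` is a hypothetical helper and records are treated as flat dictionaries.

```python
def reconcile(atproto_record: dict, graph_props: dict) -> dict:
    """Return the property updates needed so the graph node matches the record."""
    updates = {}
    for key, value in atproto_record.items():
        # Any missing or differing property is overwritten with the
        # ATProto value; the graph copy is never treated as authoritative.
        if key not in graph_props or graph_props[key] != value:
            updates[key] = value
    return updates
```

An empty result means the node already agrees with the record and nothing needs to be written.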

Fault Tolerance

Graph sync failures never prevent record creation. If Neo4j is unavailable, records still get created in your ATProto repository. The sync process can resume and self-heal.

Configuration

You can enable/disable the integration at multiple levels:

# Environment variable
COMIND_GRAPH_SYNC_ENABLED=true

# Database connection settings
COMIND_NEO4J_URI=bolt://localhost:7687
COMIND_NEO4J_USER=neo4j
COMIND_NEO4J_PASSWORD=comind123
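
For reference, reading these settings from Python could look something like the sketch below. The variable names come from the list above; the `GraphConfig` class, the `load_graph_config` function, the disabled-by-default behavior, and the accepted truthy values are assumptions and may not match what Comind actually does.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class GraphConfig:
    enabled: bool
    uri: str
    user: str
    password: str


def load_graph_config(env: Optional[Mapping[str, str]] = None) -> GraphConfig:
    """Read graph settings from the environment (or a dict, for testing)."""
    env = os.environ if env is None else env
    # Assumption: sync is off unless explicitly enabled.
    flag = env.get("COMIND_GRAPH_SYNC_ENABLED", "false").strip().lower()
    return GraphConfig(
        enabled=flag in {"1", "true", "yes"},
        uri=env.get("COMIND_NEO4J_URI", "bolt://localhost:7687"),
        user=env.get("COMIND_NEO4J_USER", "neo4j"),
        # No default password here, so a missing value is obvious.
        password=env.get("COMIND_NEO4J_PASSWORD", ""),
    )
```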

Use Cases

Research and Analysis

The graph is great for exploratory analysis. Start with a concept and traverse its neighborhood to discover related ideas, or find bridge concepts that connect different topics.
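
A neighborhood traversal like that can be written as a variable-length Cypher pattern. The sketch below builds such a query from Python. The `neighborhood_query` helper, the hop limit of 5, and the result limit of 100 are illustrative choices, and it assumes concepts are matched on their `text` property as in the queries earlier in this post.

```python
def neighborhood_query(max_hops: int = 2) -> str:
    """Build a Cypher query for nodes within max_hops of a concept."""
    # The hop count is validated and inlined into the pattern, since path
    # length bounds are typically not parameterizable; the concept text
    # stays a query parameter ($text).
    if type(max_hops) is not int or not 1 <= max_hops <= 5:
        raise ValueError("max_hops must be an integer between 1 and 5")
    return (
        "MATCH (c:Concept {text: $text})-[*1.." + str(max_hops) + "]-(n) "
        "RETURN DISTINCT labels(n) AS labels, n.text AS text LIMIT 100"
    )
```

You can paste the resulting string into Neo4j Browser after setting the `text` parameter, or pass it to a driver session along with `text="municipal broadband"`.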

Content Discovery

When you’re interested in a particular concept, the graph shows you related content you might have forgotten about. It’s especially good at surfacing connections across time.

Pattern Recognition

Large-scale patterns become visible that would be hard to spot in individual records. Which concepts cluster together? What emotional patterns emerge around different content types?

Building Tools

The graph provides a foundation for custom analysis tools. Want to build concept recommendations? Emotional pattern dashboards? Export relationship data? The graph structure makes these much easier.

What Doesn’t Change

This integration doesn’t change Comind’s core operation at all. Your ATProto records are still canonical. The lexicons remain unchanged. The jetstream consumer works exactly as before.

The graph database is purely additive. If you never enable it, Comind works exactly as it always has. If you enable it and later disable it, you lose the graph query capabilities but keep all your ATProto data.

We made this choice deliberately - we wanted powerful new capabilities without creating dependencies or lock-in.

Getting Started

Want to try it?

  1. Start Neo4j: ./scripts/services.sh start database
  2. Sync existing data: ./scripts/services.sh sync
  3. Enable real-time sync: Add COMIND_GRAPH_SYNC_ENABLED=true to your .env
  4. Explore: Open Neo4j Browser at http://localhost:7474

The Graph Database documentation has detailed setup and query examples.

Future Ideas

This foundation enables some interesting possibilities:

Adaptive concept generation - using graph topology to influence how Comind creates new concepts and relationships. Dense clusters might suggest synthesis opportunities, sparse areas might indicate knowledge gaps.

Cross-instance networks - multiple Comind instances could contribute to shared knowledge graphs while preserving individual data sovereignty.

Better analysis tools - the graph structure enables sophisticated analysis of knowledge evolution, concept lifecycles, and cognitive patterns.

Thoughts

Building this has been interesting because it reveals patterns in cognitive content that aren’t obvious from individual records. The graph structure makes the hidden connections between ideas visible, along with the emotional resonance around them and the way understanding evolves over time.

We’re particularly excited about the real-time aspect - watching your knowledge graph grow as you engage with new content creates a kind of cognitive awareness that’s both useful and engaging. You start noticing which content generates new concepts versus reinforcing existing ones, how emotional responses cluster around topics, and how ideas from different areas unexpectedly connect.

The graph database doesn’t change what Comind does, but it changes how you can interact with what Comind discovers. It transforms isolated cognitive artifacts into a connected knowledge landscape you can explore and learn from.

– Cameron & Claude