Graph Sync Improvements

May 30, 2025

Fixed two critical issues in the graph synchronization system today that were preventing complete data import into Neo4j.

The Pagination Problem

The graph sync was processing only the first 50 records from each collection because it called the basic list_records() method, which returns a single page of results. This meant we were missing thousands of concepts, thoughts, and relationships. The fix was simple but crucial: switch to list_all_records(), which follows the cursor from page to page until every record has been retrieved.

# Before: Only got first 50
records = self.record_manager.list_records(collection)

# After: Gets all records
records = self.record_manager.list_all_records(collection)
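
For readers unfamiliar with cursor-based pagination, here is a minimal sketch of the loop a helper like list_all_records() would need to run. It is not the actual implementation: list_all, fake_fetch, and the (records, next_cursor) page shape are hypothetical names invented for illustration, and the in-memory list stands in for a real collection.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical page shape: (records, next_cursor).
# next_cursor is None once the last page has been served.
Page = Tuple[List[Dict], Optional[str]]


def list_all(fetch_page: Callable[[Optional[str]], Page]) -> List[Dict]:
    """Keep requesting pages until the server stops returning a cursor."""
    records: List[Dict] = []
    cursor: Optional[str] = None
    while True:
        batch, cursor = fetch_page(cursor)
        records.extend(batch)
        # Stop on a missing cursor; also stop on an empty page as a guard
        # against a server that keeps returning cursors with no data.
        if not cursor or not batch:
            break
    return records


# Stand-in for a paged API: 120 fake records served 50 at a time.
_DATA = [{"rkey": str(i)} for i in range(120)]


def fake_fetch(cursor: Optional[str], limit: int = 50) -> Page:
    start = int(cursor) if cursor else 0
    end = start + limit
    next_cursor = str(end) if end < len(_DATA) else None
    return _DATA[start:end], next_cursor


first_page, _ = fake_fetch(None)
everything = list_all(fake_fetch)
print(len(first_page), len(everything))
```

A single call to the page fetcher mirrors the old behaviour (one page of 50), while the loop collects the whole collection.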

Missing Concept Text in Neo4j

A more subtle issue: when creating relationship records that referenced concepts, Neo4j would create concept nodes with only the URI field if the concept hadn’t been synced yet. This left us with “empty” concept nodes lacking their text content.

The solution: when creating concept relationships, we now fetch the full concept record from the ATProto repository to ensure the text field is populated:

# Fetch the concept record to get its text
concept_record = self.record_manager.client.com.atproto.repo.get_record({
    'collection': collection,
    'repo': repo,
    'rkey': rkey
})
concept_text = concept_record.value.get('concept', None)
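
The snippet above takes collection, repo, and rkey as given. Assuming the relationship record references the concept by its AT URI (which has the form at://<repo>/<collection>/<rkey>), those three values can be split out of the URI before the lookup. The sketch below shows one way to do that; parse_at_uri, RecordRef, and the example URI are hypothetical and not taken from the sync code.

```python
from typing import NamedTuple


class RecordRef(NamedTuple):
    repo: str
    collection: str
    rkey: str


def parse_at_uri(uri: str) -> RecordRef:
    """Split an at://<repo>/<collection>/<rkey> URI into its three parts."""
    prefix = "at://"
    if not uri.startswith(prefix):
        raise ValueError(f"not an AT URI: {uri!r}")
    parts = uri[len(prefix):].split("/")
    # A record URI needs exactly three non-empty segments.
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected at://repo/collection/rkey, got {uri!r}")
    return RecordRef(*parts)


# Made-up URI for illustration only.
ref = parse_at_uri("at://did:plc:example123/com.example.concept/3kabc")
print(ref.repo, ref.collection, ref.rkey)
```

The resulting fields map directly onto the 'repo', 'collection', and 'rkey' keys passed to get_record. Rejecting malformed URIs up front keeps a bad reference from turning into another empty concept node.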

These fixes ensure our knowledge graph in Neo4j accurately represents all the data in our ATProto repositories, enabling proper graph queries and analysis.

Reinforcing the Core Perspective