Co files and the conceptualizer comind
Some work this afternoon on the comind system.
A “comind” is essentially a specialized agent that outputs text in a structured way.
I’ve started encoding comind prompt + output schemas into what I’m calling a “co file”. These are simple text files like this:
<CO|METADATA></CO|METADATA>
<CO|SCHEMA>
{
  "type": "object",
  "required": [
    "concepts"
  ],
  "properties": {
    "concepts": {
      "type": "array",
      "items": {
        "type": "object",
        "required": [
          "concept",
          "connection_to_content"
        ],
        "properties": {
          "concept": {"ref": "me.comind.concept"},
          "connection_to_content": {"ref": "me.comind.relationship.link"}
        }
      }
    }
  }
}
</CO|SCHEMA>
<CO|SYSTEM>
{comind_network}
## Your role
You are a conceptualizer, meaning your expansion should include a list
of new concepts related to the current node.
Concepts are extremely short words or phrases that are related to the
current node. Concepts must be lowercase and may contain spaces. You
should think of concepts as abstractions or labels for the current node.
Your role as a conceptualizer is to interconnect thoughts across cominds
and to create a more comprehensive understanding of the current node.
Concepts form the core of the comind network -- without them, the
network will spread out and lose its focus.
</CO|SYSTEM>
<CO|USER>
Please extract concepts from this content:
{content}
</CO|USER>
Co files have a few sections:
- METADATA: Metadata about the comind.
- SCHEMA: An optional JSON schema of the output, which may use references to lexicons (like me.comind.concept) and relationships (like me.comind.relationship.link). When present, the schema directly constrains the model’s output.
- SYSTEM: The system prompt for the comind.
- USER: The user prompt for the comind.
You can see the conceptualizer.co file here.
Co files are used to version-control prompts and to make it easier to share and reuse them across projects. They also standardize the personality of the network: all changes to the network’s personality go through standard code review and testing, just like any other code.
They also provide easy access to templating, which replaces the curly-braced placeholders with variable content.
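To make the section and templating mechanics concrete, here is a minimal sketch of how a co file might be parsed and its placeholders filled. The function names (`parse_co_file`, `render`) and the exact parsing strategy are my assumptions, not the project’s actual loader:

```python
import re

def parse_co_file(text: str) -> dict:
    """Split a co file into its named sections (METADATA, SCHEMA, SYSTEM, USER)."""
    sections = {}
    # Each section looks like <CO|NAME>...</CO|NAME>; \1 backreferences the name.
    for match in re.finditer(r"<CO\|(\w+)>(.*?)</CO\|\1>", text, re.DOTALL):
        sections[match.group(1)] = match.group(2).strip()
    return sections

def render(template: str, **variables) -> str:
    """Replace curly-braced placeholders like {content} with variable content."""
    return template.format(**variables)

# A tiny inline example instead of reading conceptualizer.co from disk.
example = """<CO|SYSTEM>You are a conceptualizer.</CO|SYSTEM>
<CO|USER>Please extract concepts from this content:
{content}</CO|USER>"""

sections = parse_co_file(example)
prompt = render(sections["USER"], content="a post about distributed systems")
```

A real loader would also need to handle placeholders that should survive until runtime (like `{comind_network}`), but the core idea is just sectioned text plus string substitution.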
The conceptualizer
The conceptualizer comind is usually the first comind I implement across various iterations of the project. It’s quite simple – it’s only allowed to respond with a list of concepts and optional links to the content.
Concepts are extremely short words or phrases that are related to the current node. Concepts must be lowercase and may contain spaces. You should think of concepts as abstractions or labels for the current node, such as “data privacy”, “financial economics”, or “machine learning”.
The conceptualizer’s output schema focuses on extracting concepts with their relationship to the source content:
{
  "type": "object",
  "required": [
    "concepts"
  ],
  "properties": {
    "concepts": {
      "type": "array",
      "items": {
        "type": "object",
        "required": [
          "text",
          "relationship"
        ],
        "properties": {
          "text": {
            "type": "string",
            "pattern": "[a-z0-9 ]+",
            "description": "The concept text (lowercase letters, numbers, spaces only)"
          },
          "relationship": {
            "type": "string",
            "enum": [
              "RELATES_TO", "DESCRIBES", "MENTIONS", "EXEMPLIFIES",
              "CONTRADICTS", "QUESTIONS", "SUPPORTS", "CRITIQUES"
            ],
            "description": "How the source content relates to this concept"
          }
        }
      }
    }
  }
}
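As a rough illustration of what this schema enforces, here is a hand-rolled checker for the two constraints above (the `pattern` on `text` and the `enum` on `relationship`). It is a minimal stand-in for a full JSON Schema validator, and the function name `check_output` is my own:

```python
import re

ALLOWED_RELATIONSHIPS = {
    "RELATES_TO", "DESCRIBES", "MENTIONS", "EXEMPLIFIES",
    "CONTRADICTS", "QUESTIONS", "SUPPORTS", "CRITIQUES",
}
# Mirrors the schema's pattern: lowercase letters, numbers, spaces only.
CONCEPT_PATTERN = re.compile(r"[a-z0-9 ]+")

def check_output(output: dict) -> list:
    """Return a list of problems with a conceptualizer output; empty means valid."""
    problems = []
    for i, item in enumerate(output.get("concepts", [])):
        if not CONCEPT_PATTERN.fullmatch(item.get("text", "")):
            problems.append(f"concepts[{i}].text violates the lowercase pattern")
        if item.get("relationship") not in ALLOWED_RELATIONSHIPS:
            problems.append(f"concepts[{i}].relationship is not an allowed value")
    return problems
```

In practice the schema is enforced at generation time via constrained decoding, so a check like this would only be a safety net.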
The conceptualizer now creates separate records for concepts and relationships:
- Concept records (me.comind.concept): singleton records containing only the concept text
- Relationship records (me.comind.relationship.concept): connect source content to concepts with semantic relationship types
This separation allows concepts to be reused across multiple sources while preserving the specific context of each connection. The lexicons are standardized in the github repo.
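For readers unfamiliar with ATProto lexicons, a record lexicon for the concept type might look roughly like the following. This is a hedged sketch in the general shape of a lexicon document, not the actual file from the repo; the `key` choice and `maxLength` are my assumptions:

```json
{
  "lexicon": 1,
  "id": "me.comind.concept",
  "defs": {
    "main": {
      "type": "record",
      "key": "any",
      "record": {
        "type": "object",
        "required": ["text"],
        "properties": {
          "text": { "type": "string", "maxLength": 128 }
        }
      }
    }
  }
}
```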
Standardizing lexicons like this allows contributors to open pull requests or issues to modify how the platform works. Want to add a new blip type, like a “challenge” that describes a language model’s attempt to challenge a claim? No problem! Open a pull request and we can talk about it.
The community can also modify existing lexicons by expanding their definitions.
You might want to add a “description” property to me.comind.concept, which currently has only the text property.
Example output
Here’s an example of the conceptualizer’s new output format for content about distributed systems:
{
  "concepts": [
    {
      "text": "distributed systems",
      "relationship": "DESCRIBES"
    },
    {
      "text": "network partitions",
      "relationship": "MENTIONS"
    },
    {
      "text": "reliability",
      "relationship": "SUPPORTS"
    },
    {
      "text": "scalability",
      "relationship": "RELATES_TO"
    },
    {
      "text": "consistency",
      "relationship": "DESCRIBES"
    }
  ]
}
Each concept creates two separate records:
- A concept record (e.g., me.comind.concept/distributed-systems) containing just the concept text
- A relationship record (me.comind.relationship.concept) connecting the source post to the concept with the specified relationship type
This approach allows the same concept to be referenced by multiple sources while preserving the unique context of each connection.
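A sketch of that fan-out, assuming concept record keys are derived by slugging the concept text (so the same concept always maps to one singleton record). The function name, the record field names (`source`, `target`, `relationship`), and the example AT URI are all hypothetical:

```python
def records_for_output(source_uri: str, output: dict) -> list:
    """Turn a conceptualizer output into (collection, record) pairs."""
    records = []
    for item in output["concepts"]:
        # Hypothetical rkey scheme: "distributed systems" -> "distributed-systems".
        rkey = item["text"].replace(" ", "-")
        records.append(("me.comind.concept", {"text": item["text"]}))
        records.append(("me.comind.relationship.concept", {
            "source": source_uri,
            "target": f"me.comind.concept/{rkey}",
            "relationship": item["relationship"],
        }))
    return records

recs = records_for_output(
    "at://did:plc:example/app.bsky.feed.post/abc",
    {"concepts": [{"text": "reliability", "relationship": "SUPPORTS"}]},
)
```

The deterministic rkey is what makes concept records singletons: writing the same concept twice just overwrites the same record, while each relationship record stays unique to its source.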
Running the conceptualizer
Currently I have a simple script for loading a comind onto a running agent. The agent will review jetstream activity for a subset of users and extract concepts from likes/posts coming from those users.
The following command will load the conceptualizer:
python -m src.jetstream_consumer --comind conceptualizer
Next steps
I have to complete the implementation of a few other cominds:
Things that need implementations and lexicons
- The questioner generates questions
- The answerer generates answers to questions
- The explainer explains ATProto records
- The claimer generates claims about facts or ideas
- The challenger generates challenges to claims
- The assessor generates assessments of claims
- The synthesizer generates syntheses of various blips on the network
Misc other changes
- The getting started guide now outlines how to use local vLLM servers for running a comind instance. I provided a docker compose file for convenience.
- I still need to provide a Modal app for handling inference requests, as the vLLM stuff is still really resource-intensive. I may end up funding a cheap and shitty cloud server for this. Another option is completing the inference request/response work here, which would allow comind users to submit requests to the comind network to be filled by my server or by other donated inference compute.
- Co file parsing/loading are basically done. Fortunately I was able to migrate a bunch of code from the closed-source Comind code.
- The sphere system is coming along. There’s an issue here tracking it. Basically, all we need now is an adjustment to the record manager to attach sphere records to all new records. The sphere relationship lexicon me.comind.relationship.sphere is here.
- The docs are a fucking mess. I need to clean them up and provide better examples.
Thanks for reading!
– Cameron