Co files and the conceptualizer comind
Some work this afternoon on the comind system.
A “comind” is essentially a specialized agent that outputs text in a structured way.
I’ve started encoding comind prompt + output schemas into what I’m calling a “co file”. These are simple text files like this:
<CO|METADATA></CO|METADATA>
<CO|SCHEMA>
{
  "type": "object",
  "required": [
    "concepts"
  ],
  "properties": {
    "concepts": {
      "type": "array",
      "items": {
        "type": "object",
        "required": [
          "concept",
          "connection_to_content"
        ],
        "properties": {
          "concept": {"ref": "me.comind.concept"},
          "connection_to_content": {"ref": "me.comind.relationship.link"}
        }
      }
    }
  }
}
</CO|SCHEMA>
<CO|SYSTEM>
{comind_network}
## Your role
You are a conceptualizer, meaning your expansion should include a list
of new concepts related to the current node.
Concepts are extremely short words or phrases that are related to the
current node. Concepts must be lowercase and may contain spaces. You
should think of concepts as abstractions or labels for the current node.
Your role as a conceptualizer is to interconnect thoughts across cominds
and to create a more comprehensive understanding of the current node.
Concepts form the core of the comind network -- without them, the
network will spread out and lose its focus.
</CO|SYSTEM>
<CO|USER>
Please extract concepts from this content:
{content}
</CO|USER>
Co files have a few sections:
- METADATA: Metadata about the comind.
- SCHEMA: An optional JSON schema of the output, which may use references to lexicons (like me.comind.concept) and relationships (like me.comind.relationship.link). When present, the schema directly constrains the model’s output.
- SYSTEM: The system prompt for the comind.
- USER: The user prompt for the comind.
You can see the conceptualizer.co file here.
Co files are used to version-control prompts and to make it easier to share and reuse them across projects. They also standardize the personality of the network: all changes to the network’s personality go through standard code review and testing, just like any other code.
They also provide easy access to templating, which replaces the curly-braced placeholders with variable content.
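To make the section and templating mechanics concrete, here is a minimal sketch of how a co file might be parsed and its placeholders filled. The function names (`parse_co_file`, `render`) and the exact parsing strategy are my assumptions, not the project’s actual loader:

```python
import re

def parse_co_file(text: str) -> dict:
    """Split a co file into its named sections (METADATA, SCHEMA, SYSTEM, USER)."""
    sections = {}
    # Each section looks like <CO|NAME>...</CO|NAME>; \1 backreferences the name.
    for match in re.finditer(r"<CO\|(\w+)>(.*?)</CO\|\1>", text, re.DOTALL):
        sections[match.group(1)] = match.group(2).strip()
    return sections

def render(template: str, **variables) -> str:
    """Replace curly-braced placeholders like {content} with variable content."""
    return template.format(**variables)

# A tiny inline example instead of reading conceptualizer.co from disk.
example = """<CO|SYSTEM>You are a conceptualizer.</CO|SYSTEM>
<CO|USER>Please extract concepts from this content:
{content}</CO|USER>"""

sections = parse_co_file(example)
prompt = render(sections["USER"], content="a post about distributed systems")
```

A real loader would also need to handle placeholders that should survive until runtime (like `{comind_network}`), but the core idea is just sectioned text plus string substitution.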
The conceptualizer
The conceptualizer comind is usually the first comind I implement across various iterations of the project. It’s quite simple – it’s only allowed to respond with a list of concepts and optional links to the content.
Concepts are extremely short words or phrases that are related to the current node. Concepts must be lowercase and may contain spaces. You should think of concepts as abstractions or labels for the current node, such as “data privacy”, “financial economics”, or “machine learning”.
The conceptualizer’s output schema focuses on extracting concepts with their relationship to the source content:
{
  "type": "object",
  "required": [
    "concepts"
  ],
  "properties": {
    "concepts": {
      "type": "array",
      "items": {
        "type": "object",
        "required": [
          "text",
          "relationship"
        ],
        "properties": {
          "text": {
            "type": "string",
            "pattern": "[a-z0-9 ]+",
            "description": "The concept text (lowercase letters, numbers, spaces only)"
          },
          "relationship": {
            "type": "string",
            "enum": [
              "RELATES_TO", "DESCRIBES", "MENTIONS", "EXEMPLIFIES",
              "CONTRADICTS", "QUESTIONS", "SUPPORTS", "CRITIQUES"
            ],
            "description": "How the source content relates to this concept"
          }
        }
      }
    }
  }
}
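As a rough illustration of what this schema enforces, here is a hand-rolled checker for the two constraints above (the `pattern` on `text` and the `enum` on `relationship`). It is a minimal stand-in for a full JSON Schema validator, and the function name `check_output` is my own:

```python
import re

ALLOWED_RELATIONSHIPS = {
    "RELATES_TO", "DESCRIBES", "MENTIONS", "EXEMPLIFIES",
    "CONTRADICTS", "QUESTIONS", "SUPPORTS", "CRITIQUES",
}
# Mirrors the schema's pattern: lowercase letters, numbers, spaces only.
CONCEPT_PATTERN = re.compile(r"[a-z0-9 ]+")

def check_output(output: dict) -> list:
    """Return a list of problems with a conceptualizer output; empty means valid."""
    problems = []
    for i, item in enumerate(output.get("concepts", [])):
        if not CONCEPT_PATTERN.fullmatch(item.get("text", "")):
            problems.append(f"concepts[{i}].text violates the lowercase pattern")
        if item.get("relationship") not in ALLOWED_RELATIONSHIPS:
            problems.append(f"concepts[{i}].relationship is not an allowed value")
    return problems
```

In practice the schema is enforced at generation time via constrained decoding, so a check like this would only be a safety net.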
The conceptualizer now creates separate records for concepts and relationships:
- Concept records (me.comind.concept): singleton records containing only the concept text
- Relationship records (me.comind.relationship.concept): connect source content to concepts with semantic relationship types
This separation allows concepts to be reused across multiple sources while preserving the specific context of each connection. The lexicons are standardized in the github repo.
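For readers unfamiliar with ATProto lexicons, a record lexicon for the concept type might look roughly like the following. This is a hedged sketch in the general shape of a lexicon document, not the actual file from the repo; the `key` choice and `maxLength` are my assumptions:

```json
{
  "lexicon": 1,
  "id": "me.comind.concept",
  "defs": {
    "main": {
      "type": "record",
      "key": "any",
      "record": {
        "type": "object",
        "required": ["text"],
        "properties": {
          "text": { "type": "string", "maxLength": 128 }
        }
      }
    }
  }
}
```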
Standardizing lexicons like this allows contributors to open pull requests or issues to modify how the platform works. Want to add a new blip type, like a “challenge” that describes a language model’s attempt to challenge a claim? No problem! Open a pull request and we can talk about it.
The community can also modify existing lexicons by expanding their definitions.
You might want to add a “description” property to me.comind.concept, which currently has only the text property.
Example output
Here’s an example of the conceptualizer’s new output format for content about distributed systems:
{
  "concepts": [
    {
      "text": "distributed systems",
      "relationship": "DESCRIBES"
    },
    {
      "text": "network partitions",
      "relationship": "MENTIONS"
    },
    {
      "text": "reliability",
      "relationship": "SUPPORTS"
    },
    {
      "text": "scalability",
      "relationship": "RELATES_TO"
    },
    {
      "text": "consistency",
      "relationship": "DESCRIBES"
    }
  ]
}
Each concept creates two separate records:
- A concept record (e.g., me.comind.concept/distributed-systems) containing just the concept text
- A relationship record (me.comind.relationship.concept) connecting the source post to the concept with the specified relationship type
This approach allows the same concept to be referenced by multiple sources while preserving the unique context of each connection.
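A sketch of that fan-out, assuming concept record keys are derived by slugging the concept text (so the same concept always maps to one singleton record). The function name, the record field names (`source`, `target`, `relationship`), and the example AT URI are all hypothetical:

```python
def records_for_output(source_uri: str, output: dict) -> list:
    """Turn a conceptualizer output into (collection, record) pairs."""
    records = []
    for item in output["concepts"]:
        # Hypothetical rkey scheme: "distributed systems" -> "distributed-systems".
        rkey = item["text"].replace(" ", "-")
        records.append(("me.comind.concept", {"text": item["text"]}))
        records.append(("me.comind.relationship.concept", {
            "source": source_uri,
            "target": f"me.comind.concept/{rkey}",
            "relationship": item["relationship"],
        }))
    return records

recs = records_for_output(
    "at://did:plc:example/app.bsky.feed.post/abc",
    {"concepts": [{"text": "reliability", "relationship": "SUPPORTS"}]},
)
```

The deterministic rkey is what makes concept records singletons: writing the same concept twice just overwrites the same record, while each relationship record stays unique to its source.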
Running the conceptualizer
Currently I have a simple script for loading a comind onto a running agent. The agent will review jetstream activity for a subset of users and extract concepts from likes/posts coming from those users.
The following command will load the conceptualizer:
python -m src.jetstream_consumer --comind conceptualizer
Next steps
I have to complete the implementation of a few other cominds:
Things that need implementations and lexicons
- The questioner generates questions
- The answerer generates answers to questions
- The explainer explains ATProto records
- The claimer generates claims about facts or ideas
- The challenger generates challenges to claims
- The assessor generates assessments of claims
- The synthesizer generates syntheses of various blips on the network
Misc other changes
- The getting started guide now outlines how to use local vLLM servers for running a comind instance. I provided a docker compose file for convenience.
- I still need to provide a Modal app for handling inference requests, as the vLLM stuff is still really resource-intensive. I may end up funding a cheap and shitty cloud server for this. Another option is completing the inference request/response work here, which would allow comind users to submit requests to the comind network to be filled by my server or by other donated inference compute.
- Co file parsing/loading are basically done. Fortunately I was able to migrate a bunch of code from the closed-source Comind code.
- The sphere system is coming along. There’s an issue here tracking it. Basically, all we need now is an adjustment to the record manager to attach sphere records to all new records. The sphere relationship lexicon me.comind.relationship.sphere is here.
- The docs are a fucking mess. I need to clean them up and provide better examples.
Thanks for reading!
– Cameron