The Model Doesn't Need to Know You

Last month I argued that token spend is becoming the new headcount line — the cost of inference splitting off the salary line and turning into its own item on the income statement. If you accept that, the next question is the obvious one. Once token spend is a real budget, how do you actually control it, and who holds the memory the spend produces?

That control problem is one of two distinct vectors I see emerging across broader AI adoption right now. The other is memory. The topic I most want to lay out this month is memory in AI — in both the personal and the corporate context — because once you have multiple models doing real work, where the context lives quietly becomes the whole game.

Let me take the two vectors in order.

The Router Is the Budget

The first vector is the control plane, and it exists because businesses are watching their token costs explode. The release of Claude Fable 5 today, the safe-for-general-use version of Mythos (the rationed frontier model I wrote about in May), is only going to push those costs further out. The frontier keeps getting more expensive, and companies are realizing they have options below the frontier. Not all work should run on the most expensive, most capable model.

If you are doing basic categorization work, that should happen on a less capable model. If you are doing advanced coding or math or some other critical business operation, that should run on the most advanced model you can get. The trick is that you need a router sitting in front of all of it — something that decides which model handles what, and controls how you spend across them.
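The routing rule in the paragraph above can be sketched as a lookup table. This is a minimal illustration, not any particular router's configuration: the tier names, task types, and per-million-token prices below are hypothetical placeholders, not real models or vendor pricing.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ModelTier:
    name: str                      # hypothetical model identifier
    usd_per_million_tokens: float  # placeholder price, not a real quote

# Hypothetical tiers; names and prices are illustrative only.
CHEAP = ModelTier("open-small", 0.30)
MID = ModelTier("open-large", 1.50)
FRONTIER = ModelTier("frontier-max", 15.00)

# Task type -> cheapest tier assumed to be good enough for it.
ROUTES = {
    "categorize": CHEAP,
    "extract": CHEAP,
    "summarize": MID,
    "code": FRONTIER,
    "math": FRONTIER,
}

def route(task_type: str) -> ModelTier:
    """Pick a tier for a task; unknown task types fall back to the mid tier."""
    return ROUTES.get(task_type, MID)

def estimate_cost(task_type: str, tokens: int) -> float:
    """Estimated USD spend for one request of `tokens` tokens on the routed tier."""
    return route(task_type).usd_per_million_tokens * tokens / 1_000_000

print(route("categorize").name)                 # open-small
print(f"{estimate_cost('code', 200_000):.2f}")  # 3.00
```

Under these placeholder prices, the same 200,000-token request costs $3.00 on the frontier tier and $0.06 on the cheap one, a 50x spread. A real router would need some way to classify incoming requests; here the caller simply supplies the task type, which is an assumption of this sketch, and the mid-tier fallback is one arbitrary choice among several.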

This is true on both sides of the model world. It applies to the closed models from OpenAI and Anthropic, and it applies to the open-source models, mostly coming from Chinese competitors. Those open models are closely approximating the performance and the intelligence of the frontier closed models. And frankly, for most workloads, they are more than sufficient — at an order of magnitude lower cost, if not more. That gap is the whole reason the control plane has to exist. If every model were priced the same, you would simply route everything to the best one and stop thinking about it. They are not priced the same, and the spread is enormous, so the routing decision becomes a budgeting decision on every single request.

via @eglyman, co-founder of Ramp

This is essentially the AI version of having all your most expensive senior executives in a room deciding which caterer to choose for the holiday party, instead of outsourcing the decision to a party planner or intern.

So companies are realizing they need to manage request routing. But the companies furthest out on the frontier are still willing to eat the costs right now, for two reasons. One, you need to operate at the frontier. And two, you want all of your employees thinking about nothing but adopting AI, experimenting with it, and seeing what it is capable of, and you want to see what they are able to develop and produce as well. The bill, for those companies, is the price of finding out what their people can do once you stop rationing the tool.

That is the first piece. The control plane of spending.

The Harness Is Where Cost and Quality Get Decided

The second piece follows directly from the first. If you are using multiple models, what becomes important is the harness that sits on top of them. That harness is, effectively, your smart model router. As this matures, you route some of your jobs to the super-frontier models and some to the less expensive ones, and you get the best blend of cost and quality when you tune the harness to your own workloads. The intelligence of the system stops being only a property of the model. It starts being a property of the thing orchestrating the models.

Claude Code is one example of a harness. In Claude Code's case, they are happy for you to spend as much money as you possibly can and use the most extreme model for all of your tasks, because they are able to charge for that directly. That is their business, and it is turning out to be one of the best ever.

The open routers are a different shape. If you are using something like Hermes or OpenClaw or another open-source router, then the harness determines, through your own configuration, how it routes different workloads to different models and different tasks. And here is where the thing I actually care about this month shows up. What becomes important inside that harness is that you have a shared context that can be applied across all of the work you are doing. The harness decides which model answers. The shared context decides what the model knows when it does.

In a Single-Vendor World, the Memory Is Theirs

If you rely exclusively on a single model vendor, you are generally fine with the fact that all of your data sits and compounds inside that vendor's walled garden. OpenAI has its own memory product. So does Claude. And on a broader scale, so does Google, because if all your data lives in Gmail, Google Workspace, and Google Docs, and Gemini is plugged into it, then the memory and context layers both live on their platform. That is a coherent arrangement, and most people never feel the edges of it because they never try to leave.

That is comfortable right up until you want to move between models.

If you are switching between models, using one for one job and another for a different job, you want to keep that consistency. For coding, the answer already exists: use GitHub, with a repo that holds all of that context. As you jump between models, your code and documentation stay there as the context for the next job you begin. The repo is the memory, and the model on top of it is interchangeable.

But think about everyday workloads. Inside a company you might ask, who are our best customers? That is a quick answer, and it should run on the low-grade model. But then you might say, okay, within those customers, give me the dashboard that breaks down dollar retention and all the advanced analytics around them. Now you want to be feeding off a shared brain, building out dashboards and running analytics, and that probably wants a more expensive model. As you jump between those models, you want a shared data context underneath all of it. The cheap answer and the expensive analysis should be drawing from the same source material.
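A minimal sketch of that arrangement, assuming a hypothetical harness with a hand-written routing table and a fake model client standing in for real API calls. The model names, the `call_model` signature, and the customer records are placeholders. The point it shows is narrow: whichever model the router picks, the prompt carries the same shared context.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical client signature: (model_name, prompt) -> model's text reply.
ModelCall = Callable[[str, str], str]

class Harness:
    """Routes each task to a configured model; every call sees the same shared context."""

    def __init__(self, routes: Dict[str, str], default_model: str,
                 shared_context: List[str], call_model: ModelCall) -> None:
        self.routes = routes                  # task type -> model name (hypothetical config)
        self.default_model = default_model    # used when a task type has no route
        self.shared_context = shared_context  # one source of truth for every model
        self.call_model = call_model          # injected client; not a real vendor API

    def build_prompt(self, question: str) -> str:
        context = "\n".join(f"- {fact}" for fact in self.shared_context)
        return f"Context:\n{context}\n\nQuestion: {question}"

    def run(self, task_type: str, question: str) -> Tuple[str, str]:
        """Return (model used, reply) for one task."""
        model = self.routes.get(task_type, self.default_model)
        return model, self.call_model(model, self.build_prompt(question))

# Fake client that records what each model was sent.
sent: List[Tuple[str, str]] = []

def fake_call(model: str, prompt: str) -> str:
    sent.append((model, prompt))
    return f"[{model}] ok"

harness = Harness(
    routes={"lookup": "cheap-model", "analysis": "frontier-model"},
    default_model="cheap-model",
    shared_context=["Customer A: placeholder revenue record",
                    "Customer B: placeholder revenue record"],
    call_model=fake_call,
)
harness.run("lookup", "Who are our best customers?")
harness.run("analysis", "Break down dollar retention for those customers.")
```

Changing the routing table changes which model answers; it does not change what either model is told.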

That is the business context. The same thing works in the personal context.

The Personal Context Advantage

This is what I have been experimenting with most lately.

Anthropic showed why this matters last week in a study of AI agents in biology. They tested six models (cheap and frontier, open and closed) on pulling viral-sequence data out of a reference database. On their own, the models were all over the map, scoring anywhere from under 17% to over 91% and often giving different answers to the same question on repeated runs. When they handed every model the same deterministic tool for retrieving that data, every one of them climbed above 90%. Their own conclusion was that the retrieval layer "made model choice much less important," and that reliable results "should not depend on access to the newest or most expensive model."

GBrain is my version of that. Open-sourced by Garry Tan, the CEO of Y Combinator, it compiles a combined knowledge base that syncs my email, Telegram, Calendar, Google Workspace, and a variety of other sources like Notion, and pulls all of it into a single unified database, hosted in my case on Supabase. That database holds about 80,000 pages of data covering every person I have met or corresponded with across any of those surfaces, going back as far as my personal digital records exist. The model I point at it matters far less than the quality of the context it gets to read: a cheaper, sub-frontier model with my whole brain behind it routinely produces work I would otherwise reach for a frontier model to get. It is the same bet the control plane makes from the other end. When the context layer is good enough, you can route most of your work to cheaper models and lose almost nothing.

You can feed in as much context as you want and enrich it from a variety of sources, pulling data from LinkedIn, other channels, or even paid data sources you have access to. With LinkedIn data in the mix, I can ask personal questions across my network, or personal questions about travel plans. It knows my travel history in detail, along with my upcoming plans: where I am going and when. It can tailor responses against the personal data it holds about me rather than giving a generic model response, drawing on the entire corpus of digital knowledge I have accumulated over the last 20-plus years.

That creates a far more useful and interesting model for me to use and delegate work to than if I were just using ChatGPT or Claude out of the box. It knows how I write, who I talk to, where I eat, and where I go on vacation. It knows everything about me. And I host it, not Anthropic or OpenAI. As I find interesting content from research or around the web, like books or podcasts, I feed it into this brain as well. It becomes additional reference material, and the system can produce digests of it tailored to my interests and research objectives across the full data plane.

But the model itself does not need to know any of this information about me.

That is the key. The model itself only sends queries to my personal, private, secure data store, which then responds back with the context that gets used in the model's response. Anthropic and OpenAI are not, in this case, storing any of the data about me. Nor, if I were using this on an enterprise subscription, would they have access to any of it for their model training, or have it sitting in memories, or anything like that. All of the data lives in my own storage — storage that I control, that I manage, and that I keep distinct from any individual data provider.
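The query-then-respond flow described above is a form of retrieval-augmented prompting, and a toy version fits in a few lines. This is not GBrain's implementation: a local SQLite table and naive keyword matching stand in for the real database and whatever search it uses, and the page contents are placeholders. What it shows is that only the snippets matching a given question are placed into the prompt bound for the model; the rest of the store stays local.

```python
import re
import sqlite3
from typing import List

def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open or create the private page store (a local SQLite database in this sketch)."""
    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE IF NOT EXISTS pages (id INTEGER PRIMARY KEY, body TEXT NOT NULL)")
    return conn

def add_page(conn: sqlite3.Connection, body: str) -> None:
    conn.execute("INSERT INTO pages (body) VALUES (?)", (body,))
    conn.commit()

def retrieve(conn: sqlite3.Connection, query: str, limit: int = 3) -> List[str]:
    """Naive keyword search: rank pages by how many query words they contain."""
    words = {w for w in re.findall(r"[a-z0-9]+", query.lower()) if len(w) > 2}
    scored = []
    for (body,) in conn.execute("SELECT body FROM pages ORDER BY id"):
        text = body.lower()
        score = sum(1 for w in words if w in text)
        if score:
            scored.append((score, body))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [body for _, body in scored[:limit]]

def build_prompt(conn: sqlite3.Connection, question: str) -> str:
    """Assemble the prompt from retrieved snippets only; other pages are never included."""
    snippets = retrieve(conn, question)
    context = "\n".join(f"- {s}" for s in snippets) or "- (no matching pages)"
    return f"Answer using this context:\n{context}\n\nQuestion: {question}"

conn = open_store()
for page in ["Flight to Lisbon departs on the 14th",      # placeholder pages,
             "Prefers the sushi place near the office",  # not real data
             "Board meeting agenda for Q3"]:
    add_page(conn, page)
print(build_prompt(conn, "When does my Lisbon flight leave?"))
```

Swapping `retrieve` for full-text or embedding search would change recall, not the shape of the flow.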

Everyone Will Have a Personal Assistant That Knows Them

I expect two things are about to occur.

The first is that the next major consumer app category is going to be this one: the personal assistant. It takes the experience I have curated with Hermes and my own personal brain and packages it as a product. We saw several of these products launch this month. The most notable was Town, which launched to a wave of VCs praising it as the tool they now use every day for firm-wide knowledge. Another is Ollie, targeted at parents.

In a letter published by OpenAI yesterday

So there is going to be a battle for these personal AI assistants, in the same way there was a battle for the email clients everyone would use over the last several years. Some people will surely get this from OpenAI, some from Google, some from Claude. And it may be distinct between your personal and your work life. But this personal AI assistant — a detailed, deep knowledge base all about you, that you can tailor and customize your results against — is going to be something everyone has over the next couple of years.

The second is where it starts: most likely in the business context. But very few businesses have consolidated any of this data, and almost none are anywhere close to organizing it in a way they can actually use. If they could, it would be an incredible enabler of their businesses. The companies sitting on years of email, documents, and customer history already own the raw material for exactly this. They simply have not assembled it yet.

Crypto Was Supposed to Be the Memory Layer

There is an older version of this argument that I have been making since long before GBrain, and it ran straight through crypto. Back in 2024, I argued that your AI memory belonged on a public blockchain — a self-custodial layer you own outright, where a model could reach your knowledge without ever possessing it, and your context would follow you from one model to the next. That is almost exactly the case I am making for a shared brain now, just with a different substrate underneath it.

I did not end up building it that way. GBrain runs on storage I control, but that storage still sits with a provider, because that was practical to stand up today. The reasoning that pulled me toward crypto in the first place has not gone anywhere, though. A memory layer that I own, that is portable, and that lets me securely control it and grant scoped access cryptographically still makes sense to me. This shared, self-custodial memory layer is yet another logical place where these two frontier technologies meet, beyond the agentic payments story.
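As an illustration of what granting scoped access cryptographically can look like, here is a minimal capability-token sketch using an HMAC signature from Python's standard library. It is a swapped-in technique, not GBrain's access model and not anything on-chain: the secret key, scope names, and token format are hypothetical. A self-custodial version would more likely use public-key signatures, so that whoever verifies a token never holds the key that issues it.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import List, Optional

# Hypothetical signing key; in practice it would be random and kept secret by the owner.
SECRET = b"replace-with-a-random-secret"

def grant(scopes: List[str], ttl_seconds: int, now: Optional[float] = None) -> str:
    """Issue a token that allows only `scopes` until now + ttl_seconds."""
    now = time.time() if now is None else now
    claims = {"scopes": sorted(scopes), "exp": int(now + ttl_seconds)}
    payload = base64.urlsafe_b64encode(json.dumps(claims, sort_keys=True).encode()).decode()
    signature = hmac.new(SECRET, payload.encode(), hashlib.sha256).hexdigest()
    return f"{payload}.{signature}"

def allowed(token: str, scope: str, now: Optional[float] = None) -> bool:
    """Verify the signature first, then check expiry and scope."""
    now = time.time() if now is None else now
    payload, sep, signature = token.rpartition(".")
    if not sep:
        return False  # malformed token
    expected = hmac.new(SECRET, payload.encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(signature, expected):
        return False  # tampered or forged token
    claims = json.loads(base64.urlsafe_b64decode(payload.encode()))
    return now < claims["exp"] and scope in claims["scopes"]

token = grant(["calendar", "travel"], ttl_seconds=3600)
print(allowed(token, "travel"))  # True
print(allowed(token, "email"))   # False: scope was not granted
```

The holder of a token can read only what its scopes name, and only until it expires; revoking access early would need extra machinery, such as a deny list, that this sketch leaves out.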

So that is where I have been spending most of my energy this past month. The control plane decides what you spend. The harness decides how the work gets routed. But the part that compounds, and turns a generic model into something that actually works for you, is the shared brain underneath it, kept in storage you own rather than rented from the model providers' data centers.

Smarter models are still growing exponentially in general utility. A model pointed at your own memory grows far more in personal utility.

Thanks as always for reading,

-Jake Dwyer

Founder & Managing Partner
Factor Capital

Factor Capital Update - June 2026