fidelitas.io

Reciprocal knowledge elicitation: when the interview goes both ways

2026-05-18T23:30:00+02:00

A few years ago I sat in on a knowledge transfer session that was, on paper, going to be the answer to every team’s prayer.

The senior engineer we were going to lose to retirement (eight months’ notice, a clean handover, every manager’s dream) had agreed to spend an hour a week with a junior. They had a Confluence page open. There was a structured template. There was a coffee. The template had sections for “decision rationale”, “known edge cases”, and “institutional history”. Everyone was very pleased with how organised it all was.

Eight months later he retired. The Confluence page was forty pages long. About six of those forty pages were genuinely useful. The other thirty-four were a mixture of obvious things, things we already had runbooks for, and an unsettling amount of “you’ll know it when you see it” hedging.

The useful six pages were the ones where the junior had pushed back. “Wait, but why?” “What if the queue’s full?” “What did you do that time in 2019 when prod crashed at 3am?” Those questions had pulled out the knowledge that wasn’t in his head as words — it was in his head as patterns, and it only came out when something tugged at the pattern from the right angle.

That experience was my first real encounter with the gap between “asking an expert what they know” and “actually getting what they know out of them”. I didn’t have a name for it at the time. There is, now, an emerging body of research on closing that gap. It calls the approach reciprocal knowledge elicitation, and I think it’s going to matter more than most people in the field currently realise.

What is reciprocal knowledge elicitation?

The short version: reciprocal knowledge elicitation is an approach where the human and the AI both contribute to extracting, refining, and validating knowledge during the same interaction, instead of the AI passively collecting whatever the human happens to volunteer.

In a traditional setup — a structured interview, a survey, a knowledge capture form — the expert is the source, the system is the recorder. Information flows one way. The system’s only job is to not lose what it’s given.

In a reciprocal setup, the system has a job during the conversation. It asks targeted follow-ups. It summarises what it thinks it just heard. It surfaces internal contradictions. It points at gaps the expert didn’t realise were gaps. It can, when configured well, even challenge the expert’s assumptions: “you said the timeout is 60 seconds, but earlier you mentioned the upstream times out at 45 — which one wins?”

That isn’t just nicer UX. It’s a substantively different epistemic process. Done right, reciprocal elicitation does for knowledge what peer review does for a paper, or what code review does for a pull request — it pressure-tests the artifact through structured disagreement, in a way the artifact’s author cannot do alone.

The most active research community working on this is gathering around a workshop series called RKEHAC (“Reciprocal Knowledge Elicitation for Human-Agent Collaboration”), which is co-located with the Hybrid Human-AI Intelligence conference; the 2026 edition runs in Brussels in July. Earlier iterations are written up in the ACM Digital Library. This is a small but growing field, and it’s worth paying attention to.

Why one-way elicitation keeps failing

The deeper issue with traditional knowledge elicitation is that most of the knowledge worth capturing isn’t propositional — it isn’t a set of facts the expert could write down if you just gave them the right form. It’s tacit.

Tacit knowledge is the kind that:

Lives in pattern-matching rather than facts (“this looks like the thing we saw in Q3 of 2022”)
Surfaces only in context (“if the queue is full and it’s a Monday, the right move is X; on any other day it’s Y”)
Includes the negative space (“we tried that in 2019, it almost worked, here’s why we never did it again”)
Doesn’t compress well into bullet points without losing what makes it useful

The retired engineer in my opening story knew thousands of small rules of that shape. He didn’t think of them as rules. He couldn’t enumerate them on demand. They came out when the situation matched.

This is, by the way, the same reason new hires don’t become useful by reading the wiki. The wiki has the facts. The job has the patterns. APQC’s work on knowledge elicitation has been making this case for years — that the value of structured elicitation is mostly in surfacing the patterns the expert doesn’t think to mention. What’s new is that we now have the tooling, in the form of capable language models, to actually do this at scale without booking a forty-hour series of interviews.

How a reciprocal loop actually works

The skeleton of a working reciprocal elicitation session, stripped to its essentials, looks something like this:

1. The human describes a task, a decision, or an exception. Loose. Conversational. Not a form field. Whatever shape they want to describe it in.
2. The AI asks one, and only one, targeted follow-up. Not five. Not a clarification form. A single question that pushes on whatever was vague, incomplete, or surprising.
3. The AI summarises its current understanding back to the human. In the human’s own vocabulary, preferably. With the unknowns labeled as unknowns.
4. The human corrects, extends, or rejects. “That’s mostly right but you’ve inverted the failure mode.” “You’re missing the case where the upstream is degraded but not down.”
5. The refined version gets stored, with a trace of why it reached that final shape. The trace is the part that gets skipped, and the trace is the part that actually matters six months later.

Steps 2 through 4 loop until the conversation runs out of pressure, not until a timer expires. That last detail is more important than it sounds. Most of the bad versions of “AI interviewer” I’ve seen in the wild stop too early — they capture the first plausible answer and move on, because their underlying architecture treats the conversation as a sequential form-fill rather than an iterative search. The good ones keep going until they can’t find a new angle to push from.
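
A minimal Python sketch of that loop, to make the stopping rule concrete. It is not any product’s implementation: ask_follow_up, summarise, and human are hypothetical callbacks standing in for two model calls and a chat UI, and max_turns is only a safety cap, not the stopping condition.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Session:
    """The negotiated summary plus the trace of how it got there."""
    summary: str = ""
    trace: list = field(default_factory=list)


def run_session(
    opening: str,
    ask_follow_up: Callable[[str], Optional[str]],  # hypothetical model call; None means "no new angle"
    summarise: Callable[[str, str], str],           # hypothetical model call: (current summary, new input)
    human: Callable[[str], str],                    # hypothetical UI hook: show text, return the reply
    max_turns: int = 20,                            # safety cap only
) -> Session:
    session = Session(summary=opening)
    session.trace.append(("human", opening))
    for _ in range(max_turns):
        question = ask_follow_up(session.summary)   # step 2: exactly one question
        if question is None:                        # out of pressure, so stop
            break
        answer = human(question)
        session.trace.append(("ai_question", question))
        session.trace.append(("human", answer))
        draft = summarise(session.summary, answer)  # step 3: play the understanding back
        verdict = human("Summary so far, please correct anything wrong:\n" + draft)
        session.trace.append(("ai_summary", draft))
        session.trace.append(("human", verdict))
        # step 4: a "yes" accepts the draft; anything else is treated as a correction
        if verdict.strip().lower().startswith("yes"):
            session.summary = draft
        else:
            session.summary = summarise(draft, verdict)
    return session
```

The loop ends when ask_follow_up has nothing left to push on, which is the difference from a form-fill that ends when the fields run out.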

The thing this loop produces, when it works, is qualitatively different from a transcript. It’s closer to a negotiated artifact than a recording. The human contributed the raw material. The AI contributed structure, scepticism, and the discipline of not letting ambiguity slide. Both ended up with something neither would have produced alone.

That’s why the word “reciprocal” matters. Not because the AI is “learning” in any deep sense during the session (most of the time it isn’t — that’s a separate question about how the stored knowledge gets fed back into the model). The reciprocity is in the immediate exchange: both parties are doing intellectual work, and both walk away with more than they brought.

Where it’s actually being used

A non-exhaustive list of places I’ve seen this approach show real results in the last eighteen months:

Requirements engineering. This is where reciprocal elicitation fits most naturally. Stakeholders are notoriously bad at writing down what they want, partly because they don’t know what they want until somebody pushes back. An AI that asks “you said you want the report to be ‘fast’. What’s fast in this context — under a second, under five seconds, under a minute?” will, on a long enough timeline, extract better requirements than any survey ever has. This is, incidentally, why I’ve spent enough time recently with Eliciteer to recommend it: it’s built specifically around this pattern. You write a briefing, share a link, and the system runs the interview against whoever you sent the link to, asking adaptive follow-ups along the way. It is, in spirit, exactly the kind of tool the RKEHAC line of work is arguing for.

Incident post-mortems. Most post-mortems get done in a meeting where the loudest person remembers wrong, and the quiet engineer who actually saw the failure at 3am doesn’t get a word in. A reciprocal interview, async, against each participant separately, produces strictly better source material. You then merge the versions and look at where they disagree. Where they disagree is where the post-mortem’s real lessons live.
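
The “merge the versions and look at where they disagree” step can be sketched in a few lines of Python. This is an illustration under a strong assumption: each participant’s interview has already been reduced to short answers under shared question labels. The labels, names, and answers below are made up, and a real pipeline would need to normalise free text first.

```python
from collections import defaultdict


def disagreements(accounts):
    """accounts maps participant -> {question: answer}.

    Returns only the questions where participants gave different answers,
    with each participant's answer kept side by side.
    """
    by_question = defaultdict(dict)
    for person, account in accounts.items():
        for question, answer in account.items():
            by_question[question][person] = answer
    return {
        question: answers
        for question, answers in by_question.items()
        if len(set(answers.values())) > 1
    }


# Hypothetical accounts from two separate async interviews.
accounts = {
    "alice": {"first_alert": "03:02", "root_cause": "queue was full"},
    "bob": {"first_alert": "03:02", "root_cause": "bad deploy"},
}
conflicts = disagreements(accounts)
```

Here conflicts holds only the root_cause entry, which is the part worth spending the meeting on.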

Knowledge transfer before someone leaves. The exact scenario I opened this post with. The thing that pulled the useful six pages out of forty was the junior asking “wait, but why?” That role can be played, fairly competently, by a model that’s been briefed to ask exactly that kind of question. I would have killed for one in 2019.

Domain expert onboarding for AI products. If you’re building anything that needs domain knowledge — legal, medical, industrial, financial — you will, sooner or later, need to interview experts. Doing it reciprocally compresses fifty hours of human interviewer time into something an expert can do in three hours of their own, at their own pace. The output also tends to be more honest, because the expert isn’t performing for another human; they’re talking to a system that doesn’t judge.

Compliance and audit trails. This is the underrated use case. The trace of why a knowledge artifact reached its final shape — which questions were asked, which answers were accepted, which were rejected and why — turns out to be exactly the kind of evidence a serious audit wants to see. If you’ve ever tried to defend a decision to a regulator with “the SME just kind of knew” as your documentation, you’ll understand why this matters.

Design principles I’d defend

If you’re designing one of these, or evaluating a tool that claims to do it, here are the things I’d insist on. None of these are novel; most of them have been in the academic literature for years. The reason to write them down anyway is that most products that claim to be doing reciprocal elicitation are, in fact, doing sequential form-fill with extra steps.

One question at a time. Not three. Not “first, please tell us about X, Y, and Z.” If you find yourself stacking sub-questions into a single prompt, you’ve turned your interview back into a form and the depth advantage is gone.

Distinguish facts from judgments from assumptions. “The pod restarts every 12 hours” is a fact. “The pod should restart every 12 hours” is a judgment. “I think someone set that to 12 hours because Kubernetes was being weird in 2021” is an assumption. Treating them as the same kind of statement is how knowledge bases end up indistinguishable from urban legend.
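
One way to keep the three apart is to make the distinction part of the stored record. A small Python sketch follows. The type and field names are hypothetical, chosen for illustration, and the statements are the examples from the paragraph above.

```python
from dataclasses import dataclass
from enum import Enum


class Kind(Enum):
    FACT = "fact"              # observed or measurable
    JUDGMENT = "judgment"      # what someone thinks should be the case
    ASSUMPTION = "assumption"  # an unverified belief about why or how


@dataclass(frozen=True)
class Statement:
    text: str
    kind: Kind
    source: str  # who said it


statements = [
    Statement("The pod restarts every 12 hours", Kind.FACT, "sme"),
    Statement("The pod should restart every 12 hours", Kind.JUDGMENT, "sme"),
    Statement(
        "Someone set that to 12 hours because Kubernetes was being weird in 2021",
        Kind.ASSUMPTION,
        "sme",
    ),
]


def needs_verification(items):
    """Assumptions are the statements to chase up before anyone builds on them."""
    return [s for s in items if s.kind is Kind.ASSUMPTION]
```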

Use examples and counterexamples aggressively. “Tell me about a time the system did the right thing here, and a time it didn’t.” This is the single most reliable prompt for unlocking pattern-based knowledge. It works in interviews. It works in interrogations. It works on a 6-year-old trying to explain a story they made up. The mechanism is universal.

Let the human review the AI’s summary before storing. This is non-negotiable. If the artifact in the knowledge base wasn’t explicitly signed off on by the human, it isn’t their knowledge anymore — it’s the model’s interpretation of their knowledge, and those two things diverge faster than anyone wants to admit.

Keep the trace. Not just the final answer. The path to it. Which questions were asked. Which were skipped. Where the human pushed back, and what they pushed back against. This is the part that pays for itself the day someone asks “who decided this and why?” and you don’t have to guess.
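
These last two principles are easy to enforce in code. The sketch below is a minimal illustration, not a reference design: Artifact, store, and the shape of the trace entries are all hypothetical, and the knowledge base is just a list of JSON strings.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class Artifact:
    summary: str
    approved_by: str = ""                      # stays empty until the human signs off
    trace: list = field(default_factory=list)  # questions, answers, pushback, in order


def store(artifact: Artifact, knowledge_base: list) -> None:
    """Refuse to store a summary the human has not explicitly approved."""
    if not artifact.approved_by:
        raise ValueError("summary has not been reviewed by the human")
    # The trace is serialised with the summary, so the path is never separated from the answer.
    knowledge_base.append(json.dumps(asdict(artifact)))
```

Storing the trace in the same record as the summary is a deliberate choice here: it cannot be dropped later without someone noticing.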

Don’t lie about your epistemic state. If the system isn’t sure what it just heard, it should say so. “I think you’re saying X, but that contradicts what you said earlier about Y. Which one stands?” This is awkward. It is also the difference between a knowledge artifact you can build on and one you can’t.

A worked example: bug triage

Let me ground this in something concrete, because abstract design principles without a use case are how blog posts end up doing nothing useful.

Say you’ve got a user-reported bug. The user is on the other side of the planet, they don’t have the patience for a meeting, and your support engineer is going to have to reconstruct what happened from a single sentence in a ticket.

The form-based version of this is the standard “browser, OS, reproduction steps” template. About 40% of tickets get filled in properly. The other 60% say things like “doesn’t work on Mac” and that’s it.

The reciprocal version, in practice, looks something like this:

User:   The export button doesn't work on my computer.

AI:     Got it. One detail would help me reproduce. When
        you click "Export", does anything happen at all — a
        loading spinner, an error message, a frozen tab — or
        does the button just stay there with no response?

User:   It spins for a bit then nothing.

AI:     OK, that's useful. The spinner running and then stopping
        suggests the request started but didn't complete.
        Roughly how long does it spin before stopping?

User:   Maybe 30 seconds? Feels like a timeout.

AI:     That matches a timeout pattern. Last one for now — does
        this happen for every export, or only for certain ones?
        (For example: a particular date range, a particular
        report type, a particular project.)

User:   Only on the big quarterly report. The monthly works fine.

AI:     Summary so far, please correct anything wrong:
        - Export only fails for the quarterly report
        - Monthly export works
        - Failure pattern: button shows a spinner for ~30s,
          then nothing happens (no error message)
        - Likely cause based on this pattern: request timeout
          on the server, possibly related to report size

User:   Yes, exactly.

Compare that to “doesn’t work on Mac”. The reciprocal version got to a probable root cause (server-side timeout on large report generation) in four exchanges, and it has a trace that the support engineer can paste into the ticket. The human did about twenty seconds of typing. The AI did the work of pulling the shape of the problem out of them.

You could do this with a human interviewer too, of course. You could do it with a human interviewer better, in fact. The problem is your support team has six people and three hundred tickets a day, and the math doesn’t work. The reciprocal version scales. That’s the actual pitch: not “the AI is smarter” but “the discipline of a structured interview, run cheaply enough to apply to every ticket”.

Doing it without building it yourself

The practical question most teams will land on after reading the above is whether they have to build this themselves. Building a reciprocal elicitation pipeline from scratch (managing the conversation state, the follow-up policy, the summary loop, the trace storage) is a substantial piece of engineering, and the academic literature is honest about how easy it is to do it badly.

The honest answer is that you have three options, in increasing order of effort:

1. Use an existing product. Eliciteer is the one I happen to know best: it implements the core reciprocal loop (briefing, async interview, adaptive follow-ups, structured summary) as a hosted service, with a CLI and Python SDK for piping results into your own systems. It’s the closest thing I’ve seen to an off-the-shelf RKEHAC implementation. There are others worth looking at, but most of the ones I’ve evaluated stop at “AI-generated survey questions” without the adaptive depth that makes this approach work.
2. Build a thin wrapper on top of a frontier model. If your use case is narrow and you have specific reasons to control the prompt logic, building your own loop on top of an LLM API is feasible. Expect to spend more time on the conversation policy (when to ask, when to summarise, when to stop) than on the model itself.
3. Build it properly from scratch. Only worth doing if you have unusual requirements (regulated industries, on-prem, custom fine-tunes). In that case, look at the RKEHAC workshop proceedings and budget six months.

For most teams, option one is the right answer. I say this not because I want to sell you anything — I have no financial relationship with any of the products I named — but because the specific failure mode of “we built our own AI interviewer” tends to be six months of engineering effort that produces a worse result than something already on the shelf. There are interesting problems in this space worth doing yourself. The bare conversation loop is not one of them.

Frequently asked questions

Is reciprocal knowledge elicitation the same as a chatbot interview? No. A chatbot interview, in the way the term is normally used, follows a scripted decision tree. A reciprocal elicitation system adapts its questioning based on what’s already been said, surfaces contradictions actively, and produces a negotiated artifact rather than a transcript. The key difference is whether the system is pushing on the answers it gets, or just collecting them.

Doesn’t this just push the bias into the AI? A fair concern. The honest answer: yes, some bias is shifted from the human side (memory, salience, social desirability) to the model side (training data, prompt design, hallucination). The mitigation is the trace and the human review step. If you can see why the model went where it went, and the human signs off on the artifact before it’s stored, you’ve at least got auditable bias instead of invisible bias. That’s a real improvement, but it isn’t a fix.

How long should one of these sessions be? The literature suggests diminishing returns past about 45 minutes of interaction time, and frustration sets in past 60. The practical advice is: design for 15–30 minute sessions, with the option to come back. Long async sessions, split across days, tend to produce richer artifacts than single sittings — because the human has time to think about the questions in between.

Can I use this to extract knowledge from junior engineers, not just senior ones? Yes, and you probably should more often than you do. Junior engineers see things senior engineers have stopped seeing because they’ve normalised them. A reciprocal session aimed at “what confused you in your first month?” produces some of the most useful onboarding documentation a team can have. I have, more than once, learned things about my own systems by reading the output of such sessions.

What’s the difference between this and just having a smart interviewer? A smart interviewer is, almost by definition, better — for the one interview they’re doing. They don’t scale. The point of reciprocal elicitation isn’t to beat the best human interviewer; it’s to do better than the median form-fill at near-zero marginal cost.

Why I think this matters

Two reasons.

The first is mundane: institutional knowledge is leaving most organisations faster than they can capture it. Retirements, attrition, layoffs, reorgs. Forms haven’t worked. Wiki pages haven’t worked. Recorded interviews haven’t really worked either, because nobody reads them. A method that produces negotiated structured knowledge, at scale, asynchronously, at near-zero incremental cost per expert, is the first new tool we’ve had for this problem in about thirty years. That alone makes it worth paying attention to.

The second is less mundane. A reciprocal elicitation loop is, if you squint, a small instance of the broader pattern we’re all going to have to figure out: how do humans and AI agents think together, in a way that takes both their strengths seriously? The research community has called this “hybrid human-AI intelligence”, and it’s still mostly aspirational. Reciprocal knowledge elicitation is one of the first concrete, well-defined problems where you can actually see what the answer might look like in practice. It’s a tractable special case of a much larger question.

That, more than the immediate productivity wins, is why I think RKEHAC and the work coming out of it deserve more attention than they’re currently getting. It’s not just a better way to do expert interviews. It’s a small, well-instrumented dress rehearsal for the kind of collaboration that’s about to become the default.

Rust vs. Go in 2026 — a calmer take

2026-05-18T16:00:00+02:00

The Rust-vs-Go discourse online is, charitably, not great.

If you spend any time on Hacker News or the relevant subreddits you will get the impression that this is a fundamentally tribal question. One camp believes the borrow checker is the second coming and anyone still writing garbage-collected code is committing professional malpractice. The other camp believes Rust is a self-indulgent academic exercise and real engineers ship Go. Both camps tend to have very strong opinions on what kind of person prefers their opponent’s language.

I have, for the last five years, shipped meaningful production code in both. For most of that time my answer to “which should I use?” was a cowardly “it depends,” which is the answer that gets the most nods at meetups and the fewest fights, and is, on careful inspection, correct but useless.

In 2026 I think we have enough collective scar tissue to be a bit more specific. So here’s my attempt at a calmer version. Not the hot take. The cooler, slower take, where I try to commit to actual positions and explain them.

The TL;DR table

If you only have thirty seconds:

Workload	What I’d reach for first	Why
HTTP/gRPC service, normal CRUD	Go	`net/http`, generics, and a 4-second compile beat everything else in this niche
Latency-critical, p99.9 budget under 1ms	Rust	No GC tail. The first time you see a 30ms Go GC pause in a hot path you’ll understand
CLI tool you want people to install	Rust, narrowly	Cargo + static binaries is the most painless distribution story going
Embedded, firmware, WASM	Rust	Go’s runtime is too heavy. This isn’t a close call.
Internal batch jobs and data pipelines	Go	Channels are the right shape for the work, throughput rarely the bottleneck
Library you want consumed from C, Python, Ruby, Node	Rust	C ABI + cbindgen is the lingua franca. Go’s FFI story is a journey nobody enjoys
Anything where the spec changes weekly for the next year	Go	The “blank page to production” gradient is gentler
Anything that has to be correct for the next five years	Rust	The type system compounds

If your case isn’t in there, read on. If it is, you can probably stop reading and go to lunch.

What’s changed since 2021

It’s worth pausing on this because a lot of online discourse is still arguing about the 2021 versions of both languages.

Rust, in 2026

Compile times are not the deal-breaker they were. They are still slow. They are not “go make coffee” slow anymore. With cargo check, workspace splitting, sccache, and mold as your linker, an incremental rebuild on a medium-sized service is comfortably under ten seconds. It is still slower than Go’s equivalent, and probably always will be. You will live.
Async is, mostly, fine. Tokio won and is the default. async fn in traits works without async-trait ceremony. The Send bound and pinning footguns are still there but they’re documented now and most of the widely used libraries handle them for you.
The ecosystem is enormous. The hard problem stopped being “is there a crate for this?” and became “which of the eleven crates is actively maintained?” This is, on balance, a better problem to have.

Go, in 2026

Generics happened, and the sky didn’t fall. The stdlib has absorbed them tastefully (slices, maps, cmp). Go code in 2026 is materially more expressive than it was in 2021 without being less readable, which is a trick I would not have predicted.
The runtime got quietly, dramatically faster. GC pauses, goroutine scheduling, escape analysis: all materially better. Rebuild a 2021 Go service with a 2026 toolchain, change no source, and it is meaningfully faster.
Modules stopped being a meme. go.mod, go.sum, and go.work finally feel like a coherent story.
The language has stayed deliberately small. This is the thing the Go team gets the least credit for, and it is, in my view, the most important property the language has.

Both languages, in other words, are noticeably better than the languages people are still arguing about online. Both are also more themselves — Rust has gotten Rustier, Go has gotten Go-ier. Which is to say: the trade-off between them is sharper now, not fuzzier.

Where Rust earns its keep

Three cases. I will name them.

Case 1: tail latency that actually has to hold. If your SLO is “p99.9 under a millisecond” and you can’t afford to burn entire cores on amortising GC, Go will, sooner or later, surprise you. Rust won’t. I’ve been part of two metrics-ingestion rewrites where the math came out in Rust’s favour by a factor of roughly 2.5x on throughput and an order of magnitude on tail latency. Those numbers are real and they were not the result of micro-optimising Go for six months first — we tried that, it helped some, the GC was still the GC.
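
To see why a p99.9 target is so much less forgiving than a p99 one, here is a small Python sketch with invented numbers: 10,000 requests, of which 20 land on a 30ms pause. The sample data and the nearest-rank helper are assumptions for illustration, not measurements from either rewrite.

```python
import math


def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample with at least p% of values at or below it."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(p / 100 * len(ordered)))
    return ordered[min(rank, len(ordered)) - 1]


# Invented data: 9,980 fast requests (0.2ms) and 20 that hit a 30ms pause.
latencies_ms = [0.2] * 9980 + [30.0] * 20

p99 = percentile(latencies_ms, 99)     # 0.2ms: the pauses are rare enough to stay hidden
p999 = percentile(latencies_ms, 99.9)  # 30.0ms: the pauses now define the number
```

Only 0.2% of requests are slow, so a p99 dashboard looks healthy while a sub-millisecond p99.9 SLO is blown by a factor of thirty.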

Case 2: memory pressure you can’t predict. Streaming pipelines, parsers, anything that fans in a million small allocations per second. Rust’s lack of GC isn’t free — you pay for it in ownership annotations and 'a lifetimes that occasionally make grown engineers cry — but the worst case is predictable, which is precisely what you need when there is no worst case in your test suite that survives contact with reality.

Case 3: software you can’t easily redeploy. Firmware, agents in customer data centers, browser-side WASM, kernel modules, anything where “we’ll just push a hotfix” is not a sentence you get to say. Rust’s “if it compiles, it tends to work” property is, in this class of problem, the difference between a quiet on-call rotation and a recurring nightmare. I have lived both. I know which one I prefer.

A worked example, badly anonymised: a metrics ingestion service my team ran absorbed ~3.2M data points per second per pod, peak. It started in Go. It worked, but cost us 80 pods and uncomfortable p99.9 latency. We rewrote it in Rust over about ten weeks. Throughput went up by a factor of 2.4. p99.9 ingest latency dropped from ~4ms to ~700µs. Pod count went from 80 to 32. The rewrite paid for itself, conservatively, in seven weeks of saved cloud spend.
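
The figures in that example hang together, which is quick to check. A Python sketch using only the numbers quoted above (the variable names are illustrative):

```python
pods_before = 80
pods_after = 32
throughput_gain = 2.4    # per-pod throughput, after vs before
p999_before_us = 4000    # ~4ms
p999_after_us = 700      # ~700µs

# If each pod handles 2.4x the load, the same traffic needs 1/2.4 as many pods.
pods_predicted = pods_before / throughput_gain        # about 33.3, close to the 32 observed
latency_improvement = p999_before_us / p999_after_us  # about 5.7x for this service
fleet_reduction = 1 - pods_after / pods_before        # 0.6, i.e. 60% fewer pods
```

The payback period cannot be checked the same way, because it depends on per-pod and engineering costs that the example does not give.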

That kind of math is uncommon. When it shows up, it is very loud.

Where Go earns its keep

Three cases here too. Different cases.

Case 1: CRUD that’s shaped like CRUD. A frankly enormous fraction of internal services do this: take a request, call two other services, write to a database, return JSON. Go does this without ceremony. The stdlib does the boring parts. The code that comes out is approximately the code anyone on the team would write, in approximately the same way. That last property — everyone writes Go the same way — is genuinely valuable at scale and undervalued in language debates.

Case 2: teams that change a lot. A new hire is productive in Go in days. A new hire is productive in Rust in… longer. If your org has rotations, contractors, “we’re seconding one engineer for a quarter” arrangements, or anything else that churns the people on the codebase, Go’s onboarding gradient is straightforwardly friendlier. Rust pays back over years. If you don’t have years, you don’t get the payback.

Case 3: pipelines and orchestration. Goroutines plus channels remain the cleanest concurrency primitives in any mainstream language I’ve used. Tokio is great. It is not as natural for “a thousand small workers reading from a queue” as Go’s primitives are, and pretending otherwise is something I think the Rust community could be slightly more honest about.

The thing nobody mentions in these debates: the median Go program written by the median Go programmer is strikingly better than the median Rust program written by the median Rust programmer. Not because Rust is worse — it’s better, on raw expressiveness — but because Rust gives you more rope, and Go won’t let you go looking for rope at all. There’s a real virtue in a language that limits how clever you can be.

How I actually decide, in practice

When the technical answer is “either of these works”, which is honestly most of the time, here’s how I break the tie:

Who is going to maintain this in three years? If the answer is “we have no idea, probably people we haven’t hired yet, possibly a contractor in a region with a thin Rust community” — lean Go. If the answer is “the same five people, with maybe one rotation” — lean Rust.
What’s the cost of a runtime panic? Low — either works. High, in the way that costs money or trust or shows up in the news — Rust’s type system pays for itself.
How much is the spec going to change in the next year? A lot — lean Go. The cost of rewriting Rust code under a moving spec is, in my experience, materially higher than the cost of rewriting Go code. The Rust pays off when the spec calms down. Until it does, you pay the tax for nothing.
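
For what it’s worth, the three questions can be written down as a tally. This Python sketch is only one reading of the list above: the function name and the equal weighting of the questions are assumptions, and in practice one answer (say, a very costly panic) may reasonably outweigh the other two.

```python
def tie_break(stable_team: bool, panic_is_costly: bool, spec_is_volatile: bool) -> str:
    """Tally the three tie-break questions with equal weight (the weighting is an assumption)."""
    rust_votes = int(stable_team) + int(panic_is_costly)
    go_votes = int(not stable_team) + int(spec_is_volatile)
    if rust_votes == go_votes:
        return "either"
    return "rust" if rust_votes > go_votes else "go"


# Same five maintainers, expensive failures, settled spec.
choice = tie_break(stable_team=True, panic_is_costly=True, spec_is_volatile=False)
```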

Things I will not argue about

Some statements I see repeated online and have stopped engaging with, because they are either wrong, outdated, or bad-faith:

“Rust is too complex to ship in a team.” Demonstrably false. Many teams ship Rust every week, including some of the largest infrastructure organisations on the planet. Pick a different argument.
“Go isn’t a real systems language.” It runs Kubernetes. It runs Docker. It runs most of the cloud-native ecosystem your Rust service is deployed inside. It is, by any sane definition, a systems language.
“GC is always a problem at scale.” It isn’t. Plenty of high-throughput services run garbage-collected runtimes. The question is whether your workload is GC-pressure-bound. Most workloads aren’t.
“The borrow checker is just a fancy linter.” It is, and that linter prevents an entire category of CVEs. The CVE-prevention business is, depending on your industry, worth quite a lot of money.

What I’d actually do, on a Tuesday

If I were starting a new project tomorrow:

A new product where everything’s in motion: Go. No hesitation.
A service in our hot path with a meaningful latency budget: Rust.
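A CLI tool I plan to distribute to other engineers: Rust. A release build is a single self-contained binary (fully static if you target musl), and cargo install covers everyone who already has a toolchain. It is the single best distribution story in any language ecosystem in 2026 and I am tired of pretending otherwise.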
A library that other teams in other languages will consume: Rust, exposed via a C ABI. This is, again, not a close call.
An internal Lambda or Cloud Function: Go, because cold start still matters and Go’s binaries are small enough that nobody has to think about it.

I no longer feel guilty using both in the same organisation. The trick is being honest about which problem you have, not which language matches your team’s identity. Far too many teams choose their language because of identity (“we’re a Rust shop”) rather than fit, and that’s how you end up with a monorepo full of Rust config-loading scripts that take eight seconds to start, and Go services rewritten in Rust because it sounded cool, not because the math worked.

Pick the boring tool for the boring problem. Pick the precise tool for the precise problem. Be willing to be wrong; be willing to revise. That’s the entire thing. It just sounds less impressive than tribalism does.

A love letter to cURL — the world’s most underrated debugger

2026-05-17T10:15:00+02:00

I want to start with a confession. For about a decade of my career, I used curl the way most engineers do: to copy-paste the contents of the “Copy as cURL” button from Chrome DevTools into a shell, run it once, see what came back, and never think about the tool again.

It’s a confession because, looking back, that was an embarrassing waste of curl. Sort of like owning a piano and only playing the C-major scale with one finger. There’s a whole instrument in there. Most engineers I know (including, until recently, me) don’t realise it.

This is a long-overdue thank-you post to the most underrated debugger in our profession.

A brief, biased argument

I have nothing against GUI HTTP clients. Postman has, on net, probably made the world a better place for the average web developer. Insomnia has a nicer UI than Postman. Bruno is open source and ships faster than either. Hoppscotch runs in a browser. These are all good tools and they all have their place.

Their place is exploration. Building a request from scratch. Sharing a collection with the rest of the team. Onboarding a new hire. Generating client code from a spec. Those are real, valuable workflows and a GUI is the right shape for them.

The place a GUI is wrong for is debugging. Debugging happens late, on someone else’s box, over an SSH session, when something is already on fire and the GUI is not installed and you don’t have the patience to install it. In that mode, the tool that wins is the one that’s already there.

curl is already there. It is, in fact, the most “already there” piece of software on the modern internet. There is curl in your container. There is curl in your Raspberry Pi. There is, somewhat absurdly, curl running on Mars right now. It will outlive you. You should probably learn it properly.

The six flags that get you 80% of the way

Memorise these. They go together.

curl -sS \
     -i \
     -X POST \
     -H 'Content-Type: application/json' \
     -H 'Authorization: Bearer eyJhbGciOi...' \
     --data '{"hello":"world"}' \
     https://api.example.com/v1/things

What’s going on:

-s makes curl shut up about the progress meter. -S puts errors back on stderr. The combination — -sS — is what you want approximately always. Without -S, a silent failure is genuinely silent. You will, at some point, lose half an hour to this. Save yourself.
-i includes the response headers in stdout. They are, more often than not, the part of the response you actually care about. Run curl without -i and you’ll spend the next ten minutes wondering why your authenticated endpoint returns an empty body. It’s because it’s returning a 307 and you can’t see it.
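-X POST sets the method. Strictly speaking it’s redundant once you pass --data, which already implies POST, but spelling it out keeps the command self-documenting. -H adds a request header; repeat it once per header. -d/--data is the body. There’s also --data-raw, --data-binary, --data-urlencode, and a couple of others, each for a slightly different shape of payload. The mnemonic I use: --data is for inline JSON, --data-binary is for the moment you discover that --data @file has been stripping the newlines out of your payload.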

That’s the boring 80%. The fun starts past it.

When you need to know what actually went over the wire

The single most useful flag in curl is one almost nobody talks about:

curl --trace-ascii /tmp/trace.txt https://api.example.com/healthz

This writes every byte sent and received, in a readable form, to /tmp/trace.txt. Not “the headers”. Every byte. Including the bytes that are causing you to lose your mind.

A non-exhaustive list of bugs I have personally solved by cat-ing a --trace-ascii output and squinting at it:

A reverse proxy that was quietly stripping Transfer-Encoding: chunked and breaking streaming uploads in production. The proxy config was managed by another team. The bug had been live for about six weeks. We found it in twenty minutes.
An HTTP client library appending an extra \r\n to a JSON body, which the server’s body parser silently treated as “end of body, but the body is invalid, return 400 with no detail.”
A load balancer rejecting HTTP/2 in production but accepting it in staging. Root cause: a misconfigured ALPN protocol list, six letters different. The “X-Service-Version” header was right. The TLS handshake metadata was not.
An “invalid signature” error caused by a single zero-width space pasted from someone’s Slack message into a config file. I am not making that up. I wish I were.

I want to be clear here: I could not have caught any of these in Postman. Postman would have shown me “400 Bad Request” and a body that said “invalid request”. --trace-ascii showed me the bytes. The bytes were lying. The bytes are always lying. You just have to look at them.
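For the zero-width-space class of bug specifically, you can also check a suspect value directly once you know where to look. The following is a minimal sketch, not something from the original incident: it assumes the offending character is a Unicode format character (category Cf, which covers zero-width spaces and joiners), and the function name and sample string are made up for illustration.

```python
import unicodedata


def invisible_characters(text):
    """Return (index, code point, Unicode name) for each format character in text."""
    return [
        (i, f"U+{ord(ch):04X}", unicodedata.name(ch, "UNKNOWN"))
        for i, ch in enumerate(text)
        # Category "Cf" is "format": zero-width space, zero-width joiner, BOM, ...
        if unicodedata.category(ch) == "Cf"
    ]


# Hypothetical secret with a zero-width space pasted into the middle.
secret = "abc123\u200bdef"
print(invisible_characters(secret))
# prints [(6, 'U+200B', 'ZERO WIDTH SPACE')]
```

This will not catch every invisible troublemaker (a non-breaking space, for instance, is in a different category), so treat it as a first pass rather than a guarantee.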

The timing breakdown

This is the one I tell every junior engineer about and they look at me like I just handed them a cheat code, which, in fairness, I sort of did. Add this to your ~/.curlrc:

# ~/.curlrc
# Config-file values go in double quotes on a single line; \n is an escape.
-w "\nnamelookup:     %{time_namelookup}s\nconnect:        %{time_connect}s\nappconnect:     %{time_appconnect}s\npretransfer:    %{time_pretransfer}s\nstarttransfer:  %{time_starttransfer}s\ntotal:          %{time_total}s\nhttp_code:      %{http_code}\nsize_download:  %{size_download} bytes\n"

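Now every curl invocation tells you, for free, exactly where the time went. Every value is cumulative from the start of the request, so you read them by subtraction. DNS slow? That’s namelookup. TLS handshake slow? That’s appconnect minus connect. Server slow? That’s starttransfer minus pretransfer. Download slow? That’s total minus starttransfer.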
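Because curl reports these timings as cumulative seconds since the request started, the per-phase numbers fall out of a few subtractions. Here is a small sketch of that arithmetic. The function name, the phase labels, and the sample values are mine, not curl’s, and the numbers are invented for illustration.

```python
def phase_durations(t):
    """Turn curl's cumulative -w timings (in seconds) into per-phase durations."""
    # For plain HTTP there is no TLS handshake and curl reports appconnect as 0,
    # so fall back to the connect time to get a TLS duration of zero.
    tls_done = t["appconnect"] if t["appconnect"] > 0 else t["connect"]
    return {
        "dns": t["namelookup"],
        "tcp": t["connect"] - t["namelookup"],
        "tls": tls_done - t["connect"],
        "server": t["starttransfer"] - t["pretransfer"],
        "download": t["total"] - t["starttransfer"],
    }


# Made-up sample timings, purely for illustration.
sample = {
    "namelookup": 0.012,
    "connect": 0.034,
    "appconnect": 0.210,
    "pretransfer": 0.211,
    "starttransfer": 0.290,
    "total": 0.305,
}

for phase, seconds in phase_durations(sample).items():
    print(f"{phase:9s}{seconds * 1000:7.1f} ms")
```

In this invented sample the TLS handshake accounts for 176 ms of a 305 ms request, which is the “cold connection” pattern described in the next paragraph.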

If you’ve ever spent half an hour in a meeting arguing about whether your latency problem is “the network” or “the application”, and the answer turned out to be “TLS handshake on a cold connection”, this is the tool that would have ended that meeting in 30 seconds.

Replaying traffic, badly, with shell

Most teams I’ve worked with have at least once gone through the ritual of “we need a load testing tool”. They evaluate k6, locust, vegeta, gatling, sometimes JMeter (don’t), pick one, document something on Confluence, and then nobody ever uses it again because running it requires too much setup.

curl plus xargs plus a directory of captured requests will, in my experience, cover 70% of the use cases of an actual load testing tool, with zero setup:

# Replay 1000 captured requests at staging, 20 in parallel,
# count response codes.
printf '%s\n' captured/*.json | head -n 1000 |
  xargs -I{} -P20 \
    curl -sS -o /dev/null -w '%{http_code}\n' \
         -H 'Content-Type: application/json' \
         --data-binary @{} \
         https://staging.api.example.com/v1/things |
  sort | uniq -c | sort -rn

I have used variants of that one-liner to:

Confirm a new validation schema rejects ~0.03% of historical traffic, none of which integration tests had ever caught.
Bisect which day of which release introduced a sudden 403 spike (it was a Tuesday, the spike was a customer with a leading whitespace in their API key, the customer’s lawyers were involved).
Smoke-test a new region in a CDN before swinging DNS, by combining the above with --resolve (see next section).

I’m not saying don’t use a real load tester for real load testing. I’m saying: a surprising amount of “real” load testing is actually this, with a fancier wrapper.

TLS is the rest of curl’s superpower

There is a whole second curl hiding inside curl, and it’s about TLS. A non-exhaustive tour:

# What certificate is the server actually serving? -v prints the TLS
# handshake and certificate details on stderr, on lines starting with '*'.
curl -vI https://api.example.com 2>&1 | grep -E '^\*'

# Force a particular TLS version, to see if the new version is the
# problem or the old version is.
curl --tls-max 1.2 https://api.example.com

# Mutual TLS, in two flags.
curl --cert client.pem --key client.key https://api.example.com/private

# Pin a hostname to a specific IP, bypassing DNS, GeoDNS, Anycast,
# and whatever other routing voodoo is between you and the server.
# This is the trick.
curl --resolve api.example.com:443:203.0.113.42 https://api.example.com

That last one — --resolve — is the killer feature. The first time you use it to hit one specific edge node in a CDN, in a specific region, bypassing every routing layer between you and that one box, you will wonder how you ever debugged a CDN issue any other way. (Answer: very, very slowly, while arguing with support.)

The real argument

You could do all of the above in a GUI. You could also, in principle, drive a screw with a butter knife. The question is whether the tool fits the situation.

Debugging fits a specific situation. The situation is: tired, late, on someone else’s machine, with no patience for friction. In that situation, the tool that wins is the one that’s already installed, scriptable, pipeable, honest about what’s happening at the byte level, and old enough that the answer to any question you have is on Stack Overflow.

curl is all five. It has been all five since 1998. It will still be all five in 2050. It works in a Docker FROM scratch image (well, with the static build). It works over SSH on a box that hasn’t been updated since 2014. It works on Mars.

Spend a weekend with man curl. Yes, the whole thing — it’s not that long, and most of it is genuinely useful. Take notes. Build yourself a cheat sheet, preferably on actual paper. The next time something is on fire at 3am, you’ll save half an hour and possibly your sanity.

Daniel Stenberg, if you’re reading this: thank you. Genuinely. We all owe you several beers.

Performance engineering as a habit, not a project

2026-05-16T18:00:00+02:00

Most performance work in the industry happens in one of two modes.

The first is heroics. Somebody — usually a tenured engineer with a reputation for caring about this stuff — disappears for three weeks with a stack of profilers, comes back with a deck full of flamegraphs, deletes a toString() somewhere, drops the cloud bill by 18%, and is celebrated at the next all-hands. Within four months the bill is back where it was. The engineer is now annoyed. Nobody can pinpoint exactly when the regression came back. It just sort of… did.

The second is reactive. A latency dashboard turns red. PagerDuty goes off. A small task force is convened in a war room called something like #perf-tigers-q3. The fires get put out. The task force dissolves. Within four months the dashboard turns red again, sometimes for the same reason.

Both modes work, sort of. They also share a fatal property: they treat performance as a project. Projects end. The regression doesn’t care that the project ended. It just patiently waits for the next quarter and starts climbing again.

The teams I’ve seen ship genuinely fast systems do something else. They don’t have a “performance initiative” with a project lead and an OKR. They have a habit. The habit is boring. The boringness is the point.

The habit, in three rules

If I had to compress eight years of doing this to a single page:

Measure on every PR. No exceptions, no opt-outs, no “we’ll add benchmarks once the feature stabilises.”
Compare against yesterday, not against an SLO. SLOs catch fires. Diffs catch the people lighting the matches.
Make regression a build failure, not a Slack ping somebody gets around to reading on Friday.

That’s it. That’s most of the trick. Everything below is plumbing, diff-formatting, and politics. Particularly the politics.

Why measuring every PR matters more than you think

Almost no production performance regression comes from a single bad commit. They come from twenty 0.6% regressions, none of which is big enough to be worth arguing about in code review, stacked over a quarter. By the time anyone notices, the original cause is buried in a git log nobody is going to bisect.
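To put a number on “stacked over a quarter”, here is the arithmetic as a tiny sketch. It assumes each regression multiplies the previous latency (they compound rather than add), which is my modelling assumption, and the function name is made up.

```python
def compounded(per_change, count):
    """Total slowdown factor after `count` changes that each add `per_change`."""
    return (1 + per_change) ** count


total = compounded(0.006, 20)
print(f"twenty 0.6% regressions: +{(total - 1) * 100:.1f}%")
# prints: twenty 0.6% regressions: +12.7%
```

No single one of those twenty changes would survive an argument in code review. Together they are a double-digit regression.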

The defence is a benchmark suite that runs on every PR. Same shape as your test suite. Not a separate thing. Not a “performance team” thing. Just part of CI.

It doesn’t have to be elaborate. It has to be:

Reliable. Hermetic runner, pinned hardware, no noisy neighbours. If your CI runs on shared compute in a hyperscaler, do the perf runs on a dedicated machine somewhere quiet. I’ve used a literal Mac Mini under a desk for this. It worked great.
Statistically honest. One run is not a benchmark, it’s a coin flip. I aim for at least eight runs with benchstat (Go) or criterion (Rust) or pytest-benchmark (Python) doing the statistical heavy lifting.
Boringly visible. The diff has to show up in the PR. Slack doesn’t count. Email doesn’t count. Anything that requires a human to go look at it will, over a long enough timeline, be forgotten.

For a Go service, the bones of it look like:

package bench

import (
    "context"
    "testing"
)

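// newTestServer and validOrder are test helpers defined elsewhere in this package.
func BenchmarkCreateOrder(b *testing.B) {
    srv := newTestServer(b)
    b.ReportAllocs()
    b.ResetTimer() // exclude server setup from the measurement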

    for i := 0; i < b.N; i++ {
        if _, err := srv.CreateOrder(context.Background(), validOrder()); err != nil {
            b.Fatal(err)
        }
    }
}

Five lines, plus setup. Run it on every PR. Store the results. That’s it.

Compare against yesterday, not against an SLO

SLOs are great. SLOs catch fires, page on-call, force prioritization, all the things they’re supposed to do.

SLOs are also completely useless at catching the kind of regression I’m describing. A 4% bump in p99 latency does not violate any reasonable SLO. Repeat it over fifty deploys, though, and it compounds to roughly seven times your original tail latency (1.04^50 ≈ 7.1). You will not catch this with thresholds. You will always catch it with diffs.

The single most useful artefact in a perf-focused team’s day is not a Grafana dashboard. It’s a comment in a PR that looks like this:

benchmark                  before     after      delta
---------------------------------------------------------
BenchmarkCreateOrder       412 µs     438 µs     +6.3%   *
BenchmarkListOrders         91 µs      93 µs     +2.2%
BenchmarkAuth               18 µs      19 µs     +5.6%   *
BenchmarkSerializeOrder    7.2 µs     7.3 µs     +1.4%

* indicates statistically significant (n=8, p<0.01)

Posted by a bot. Before human review. Every PR.

The point of this is not really the numbers. The point is social. Once “+6.3%” is sitting in the review thread, the conversation changes. The author justifies it (“we added a required signature check, this is expected”), or they back it out, or they file a follow-up. Either way: the regression is seen. Seen regressions get fixed. Invisible ones don’t.

This is, deeply, a sociotechnical fix wearing a technical hat. The tooling is the easy part.

Make regression a build failure

The first time you turn this on, CI will be on fire for a week.

This is good. You are paying down debt you’d otherwise pay in production, in a smaller and more controlled way. The week sucks. The quarters that follow are noticeably calmer.

A reasonable opening policy (yours should get stricter over time):

Block PRs with a >10% regression on hot-path benchmarks.
Warn but don’t block for 3–10%.
Allow opt-out — with a written justification in the PR description.

That last bit is the important one. “Migrating to a new crypto library, +14% but mandated for FIPS, follow-up tracked at JIRA-1234” is a perfectly fine justification. “Refactor, will fix later” is exactly how regressions get accepted, and it’s the thing the policy exists to stop. The justification doesn’t have to be approved by anyone, it just has to exist. The act of writing it forces the author to think about whether the regression is actually fine.
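The policy itself is small enough to sketch in a few lines. What follows is an illustration under stated assumptions, not the real enforcement script: the function names are invented, every benchmark is treated as hot-path, and I have chosen to downgrade a justified regression from “block” to “warn” so it stays visible. Parsing benchstat output is deliberately left out; the input here is already a list of (name, before, after) tuples.

```python
def classify(before, after, justified=False, block_above=0.10, warn_above=0.03):
    """Classify one benchmark as 'block', 'warn', or 'pass'."""
    delta = (after - before) / before
    if delta > block_above:
        # A written justification downgrades a block to a warning (my choice).
        return "warn" if justified else "block"
    if delta >= warn_above:
        return "warn"
    return "pass"


def enforce(results, justified=False):
    """Return per-benchmark verdicts and whether the build should fail."""
    verdicts = {
        name: classify(before, after, justified)
        for name, before, after in results
    }
    return verdicts, "block" in verdicts.values()


# Timings in microseconds from the example PR comment earlier in this post,
# plus one invented +14% row to exercise the blocking path.
results = [
    ("BenchmarkCreateOrder", 412, 438),
    ("BenchmarkListOrders", 91, 93),
    ("BenchmarkAuth", 18, 19),
    ("BenchmarkSerializeOrder", 7.2, 7.3),
    ("BenchmarkSignRequest", 100, 114),
]

verdicts, failed = enforce(results)
for name, verdict in verdicts.items():
    print(f"{name:26s}{verdict}")
print("build fails" if failed else "build passes")
```

In a real pipeline the `justified` flag would come from checking the PR description for a justification, which is out of scope for this sketch.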

A skeleton GitHub Actions job:

name: perf
on: [pull_request]

jobs:
  bench:
    runs-on: self-hosted-perf  # pinned hardware, not a hyperscaler runner
    steps:
      - uses: actions/checkout@v4
        with: { fetch-depth: 2 }

      - name: Bench base
        run: |
          git checkout ${{ github.event.pull_request.base.sha }}
          go test -bench=. -benchmem -count=8 ./... | tee /tmp/base.txt

      - name: Bench head
        run: |
          git checkout ${{ github.event.pull_request.head.sha }}
          go test -bench=. -benchmem -count=8 ./... | tee /tmp/head.txt

      - name: Compare and enforce
        run: |
          benchstat /tmp/base.txt /tmp/head.txt | tee /tmp/diff.txt
          ./scripts/enforce-budget.py /tmp/diff.txt --max-regression 0.10

You can buy fancier versions of this off the shelf. They are not, in my experience, better. The simple version above has shipped to production three times for me and worked all three times.

The part that’s actually hard

Tools are 20% of this. The other 80% is the team agreeing, out loud, that performance is a feature. Not a nice-to-have. Not a “non-functional requirement”, which is a phrase invented by people who didn’t want to do the work. A feature. Same priority as “can the user log in”.

I will spare you the inspirational paragraph. Here are the small rituals that, in practice, calcify the habit:

A 30-minute weekly perf review. Fixed dashboard. What got faster this week, what got slower, what’s the spend trend. Cancel it exactly zero times in the first three months even if there’s “nothing to discuss”.
Performance budgets in design docs. “This endpoint must serve p99 < 80ms at 1000 RPS on a standard pod” should be a sentence in the doc before any code gets written. If it isn’t, the doc isn’t done.
Profile every release. Save the flamegraph. Diff against last release. You’d be amazed what shows up.

The last one is my favourite, and the one that gets dropped first. Don’t drop it.

When you know it worked

The signal that the habit has taken hold is that nobody mentions performance anymore. Bench diffs are part of every review the way green checkmarks are. Regressions get reverted the same day. The CFO stops being surprised by the cloud bill, which means they stop asking about it, which means you get to spend your one-on-ones on something else.

Performance, done well, is invisible. Performance, done badly, is a permanent low-grade crisis with a different name every quarter. Pick one.

Your event-driven architecture is a distributed monolith — and that’s fine, if you measure it

2026-05-15T12:30:00+02:00

So you rewrote the monolith into events.

Every team now owns a service. Every service owns its own database. The architecture diagram has 47 arrows on it and a legend that says “asynchronous, fire-and-forget”. Standups are shorter. Deploys are calmer. The principal engineer who pushed for this rewrite has, in fact, been promoted.

And then on a Tuesday around 3pm somebody from billing pings you on Slack and asks why the dashboard team’s deploy is causing refund events to vanish.

Welcome to the distributed monolith. The good news is you’re not alone: this is, in my completely unscientific estimate, what about three out of every four “event-driven” architectures look like in practice. The bad news is the people who built it almost never realise it until something breaks loudly. And the worse news is that the usual fix — “let’s just move some of these back to synchronous calls” — is, in my experience, the wrong fix.

This post is about the right fix, which is mostly boring and mostly about discipline.

What actually happened

The pitch for going event-driven was: services are decoupled, deploys are independent, blast radius is small. That pitch is sometimes true. It is true when the contract between services is explicit, versioned, and validated. It is almost never the default state.

What actually happens, more often:

order-service ──▶  orders.created  ─┐
                                    ├──▶ inventory-service
                                    ├──▶ shipping-service
                                    ├──▶ billing-service
                                    ├──▶ analytics
                                    └──▶ that one Lambda nobody owns

The bus is technically decoupling those services. The schema of the event is not. Inventory, shipping, billing, analytics, and the mystery Lambda all parse the same JSON blob, and they all parse it in slightly different ways. Field gets renamed. Field gets repurposed. Field gets a new value that the old consumers interpret as “unknown, proceed with defaults”. Defaults are wrong. Refunds vanish.

You didn’t decouple anything. You hid the coupling behind a message broker, where nobody can see it without grep and a war room.

The thing about message brokers

A message broker is a transport. It is not a contract, it is not a schema, it is not an interface. It is a pipe. If you stick a typed RPC call into a pipe and call the result “decoupled”, you’ve changed the shape of the coupling from “synchronous and observable” to “asynchronous and silent”. The total amount of coupling didn’t go down. It just got harder to find.

If you only take one thing from this post: the bus is not the contract.

Step one: schemas are not folklore

A schema is folklore if the only way to find out what an orders.created event looks like in 2026 is to ask the engineer who originally wrote the producer. A schema is not folklore if you can pip install it, or at least open a JSON Schema file in a registry and diff two versions.

The smallest possible registry that works is a directory of files in a repo every service depends on:

// schemas/orders.created/v2.json
{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "title": "orders.created.v2",
  "type": "object",
  "required": ["order_id", "customer_id", "created_at", "total_cents", "currency"],
  "properties": {
    "order_id":    { "type": "string", "format": "uuid" },
    "customer_id": { "type": "string", "format": "uuid" },
    "created_at":  { "type": "string", "format": "date-time" },
    "total_cents": { "type": "integer", "minimum": 0 },
    "currency":    { "type": "string", "pattern": "^[A-Z]{3}$" }
  },
  "additionalProperties": false
}

Two important things in there.

The first is additionalProperties: false. Yes, really. I know this is unfashionable. I know “be liberal in what you accept” is the default advice. It is also the reason nobody can ever delete a field from a payload, and the reason your bus carries 14kB of legacy junk per event seven years after launch. Set it to false. Bump the version when you need a new field. Pay the cost up front.

The second thing is that the schema lives in a repo, not in a wiki. Wikis go stale. Repos have CI. You want a CI job that fails when a producer ships a payload that doesn’t validate against the registered schema, and you want that job to be louder than a green checkmark.

This is unglamorous work. It is the single most valuable hour-per-week you can spend on an event-driven system, and it almost never gets prioritized until after the first big incident. Prioritize it before.
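To make the CI idea concrete, here is a deliberately tiny sketch of the check. It is not a JSON Schema implementation: it only enforces the `required` list and `additionalProperties: false`, and ignores types, formats, and patterns. A real job would hand the schema file to a proper validator library. The function name and the sample payloads are invented for illustration.

```python
# A trimmed copy of the orders.created v2 schema: just the parts this sketch checks.
SCHEMA = {
    "required": ["order_id", "customer_id", "created_at", "total_cents", "currency"],
    "properties": {
        "order_id": {},
        "customer_id": {},
        "created_at": {},
        "total_cents": {},
        "currency": {},
    },
    "additionalProperties": False,
}


def contract_violations(payload, schema):
    """List the ways payload breaks the required / additionalProperties rules."""
    problems = []
    for field in schema["required"]:
        if field not in payload:
            problems.append(f"missing required field: {field}")
    if schema.get("additionalProperties") is False:
        for field in payload:
            if field not in schema["properties"]:
                problems.append(f"unexpected field: {field}")
    return problems


# Hypothetical producer payload that sneaks in an unregistered field.
payload = {
    "order_id": "7f9c2e0a-1b3d-4c5e-8f6a-0123456789ab",
    "customer_id": "0a1b2c3d-4e5f-4a6b-8c7d-ba9876543210",
    "created_at": "2026-05-15T12:30:00+02:00",
    "total_cents": 4200,
    "currency": "EUR",
    "coupon_code": "SPRING",
}

print(contract_violations(payload, SCHEMA))
# prints ['unexpected field: coupon_code']
```

The CI job then fails whenever that list is non-empty, which is exactly the “new field without a version bump” case the strict setting exists to catch.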

Step two: subscriptions are not folklore either

Question: which services consume orders.created?

If you can’t answer that without grep-ing across every repo in your org, the answer is “you have no idea, and neither does anyone else”. Which means you also have no idea what breaks when you change the event.

The fix is some mechanism — any mechanism — that makes consumer subscriptions explicit and discoverable. I have, at various employers, shipped three versions of this:

A subscriptions.yaml at the root of every service repo. Cheap, ugly, works. The downside is everyone forgets to update it.
A decorator on the handler function that registers it in a tiny internal library. Less forgetful, but only works inside one language ecosystem.
A handler-naming convention strict enough to grep for with ripgrep. Honestly? My favourite of the three. The constraint is the documentation.

Whatever you pick, run a scheduled job that crawls the org, extracts the declarations, and writes them into a graph. Render that graph somewhere visible. I once had a printed version of this graph on the wall outside my desk. People stopped asking me questions about ownership. Highly recommend.
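The “write them into a graph” step is less work than it sounds. Once the crawler has collected each service’s declared topics, the graph is one inversion away. This is a rough sketch under assumptions: the input shape (a dict of service name to topic list) and the orders.refunded topic are hypothetical, and the crawling itself is not shown.

```python
from collections import defaultdict


def consumers_by_topic(declarations):
    """Invert {service: [topics]} into {topic: sorted list of consuming services}."""
    graph = defaultdict(list)
    for service, topics in declarations.items():
        for topic in topics:
            graph[topic].append(service)
    return {topic: sorted(services) for topic, services in graph.items()}


# What a crawl of each repo's declarations might produce (illustrative data).
declarations = {
    "inventory-service": ["orders.created"],
    "shipping-service": ["orders.created"],
    "billing-service": ["orders.created", "orders.refunded"],
    "analytics": ["orders.created", "orders.refunded"],
}

for topic, services in sorted(consumers_by_topic(declarations).items()):
    print(f"{topic}: {', '.join(services)}")
```

With that mapping in hand, “which services consume orders.created?” is a dictionary lookup rather than an afternoon of grep.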

Step three: trace context, on every message, no exceptions

The thing that made the monolith debuggable was the call stack. The thing that makes a distributed system debuggable is distributed tracing. There is no third option. There is, in particular, no “we’ll just check the logs in each service” option — anyone who has tried to reconstruct a multi-service flow from interleaved Loki queries at 2am can tell you exactly how that goes.

Every message has to carry trace context end-to-end. W3C trace context, OpenTelemetry baggage, whatever your stack speaks natively. The wrapper is about ten lines:

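# producer side
from opentelemetry import propagate, trace

# `bus` and `process` are your own broker client and message handler.
tracer = trace.get_tracer(__name__)

def publish(topic, payload):
    headers = {}
    propagate.inject(headers)  # writes the current trace context into the dict
    bus.publish(topic, payload, headers=headers)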

# consumer side
def handle(msg):
    ctx = propagate.extract(msg.headers)
    with tracer.start_as_current_span(f"handle.{msg.topic}", context=ctx):
        process(msg.payload)

Once that’s deployed everywhere, the question “what happens when I publish orders.created?” becomes a single query in your tracing tool, and you can stop being the team’s human DAG diagram.

If you only have time to do one of the three things in this post, do this one. Schemas help you ship safely. Subscriptions help you plan. Trace context is what saves your weekend.

The fan-out problem

When you finally have visibility, you will notice a thing that may make you uncomfortable: the average user request, traced end-to-end, produces approximately 40 spans. Sometimes 80. Once, on a particularly memorable Black Friday morning, I watched a single “add to cart” produce 312.

This isn’t inherently a problem. Fan-out is fine. Fan-out is sometimes the whole reason to use events. But unbounded, unwatched fan-out is how your tail latency budget evaporates without anyone making a decision about it.

A rough rubric I use, with no claim to it being scientific:

fan-out per request    what I do
----------------------------------------------------------------------------
under 10               nothing, ship it
10–30                  sample traces, watch the p99.9 tail
30–80                  start asking who’s publishing inside a loop
80+                    treat as an incident, even if nothing’s on fire yet

The biggest culprit, in basically every postmortem I’ve been part of, is a service that publishes events inside a loop iterating over a collection. The author insists this is fine, because “we’re just emitting domain events”. You are not. You are emitting 4,000 domain events, each of which fans out to seven consumers, each of which writes a row to a database. The math doesn’t care about your intent.

If you find one of these in code review, kill it. There is almost always a single “batch” event that captures the same intent. If there genuinely isn’t, that’s a design smell worth pulling on for half an hour.
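The difference between the two shapes is easy to see with a counting stand-in for the bus. This is a sketch with made-up names: `CountingBus` is not a real broker client, and the orders.created.batch topic and the seven-consumer figure are taken as assumptions from the example above.

```python
class CountingBus:
    """Stand-in for a real broker client: it only records published messages."""

    def __init__(self):
        self.published = []

    def publish(self, topic, payload):
        self.published.append((topic, payload))


def publish_per_item(bus, order_ids):
    # The pattern to kill in review: one event per element of a collection.
    for order_id in order_ids:
        bus.publish("orders.created", {"order_id": order_id})


def publish_batch(bus, order_ids):
    # One event that carries the whole collection.
    bus.publish("orders.created.batch", {"order_ids": list(order_ids)})


def deliveries(bus, consumers=7):
    """Handler invocations downstream if every message reaches every consumer."""
    return len(bus.published) * consumers


order_ids = range(4000)

loop_bus = CountingBus()
publish_per_item(loop_bus, order_ids)

batch_bus = CountingBus()
publish_batch(batch_bus, order_ids)

print(deliveries(loop_bus), deliveries(batch_bus))
# prints 28000 7
```

The consumers still have 4,000 orders to deal with in the batch case, but they can do it in one handler invocation and one bulk write each, instead of 4,000 separate ones.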

The thing nobody tells you

Here’s the part the “events are decoupled!” pitch leaves out: an event-driven system is strictly more work to operate than a synchronous one. Not less. More.

It has more failure modes. It has worse default error messages. It needs idempotent consumers. It needs replay tooling. It needs schema discipline, subscription discipline, fan-out discipline, trace context discipline, dead-letter discipline. Most of those words are unglamorous and none of them ship features.

In exchange, you get a system where teams can deploy independently without coordinating, where you can replay a day of traffic to recover from a bug, and where the blast radius of any one component failing is contained. Those are real wins. They are not free.

The mistake is going event-driven and then pretending you’re not running a distributed system. You are. Pay the operational tax, keep paying it, and the architecture will keep paying you back. Skip it and, well. See you in the war room.

Welcome to fidelitas.io — a manifesto, of sorts

2026-05-12T09:00:00+02:00

OK so. I have, against my better judgment, started another technology blog.

There are already approximately ten million of them. Most are dead. The ones that aren’t dead are mostly either thinly-veiled marketing for whatever consultancy the author runs, or LinkedIn-grade “I’ve been thinking about leadership” takes from people whose actual job is to ship code. I read maybe four programming blogs a week and find one of them useful per quarter.

So why this one. Honestly, mostly for me. The job I do for a living involves writing a lot of internal documents that nobody outside the company will ever read — incident reviews, design docs, that one wiki page about the queue that keeps coming up in interviews — and at some point I noticed that the documents I’d put real effort into were the ones I’d want to read myself if I worked somewhere else. So I figured: why not just write them somewhere else.

That’s the whole pitch, really. There’s no business model. There is, as of this evening, no audience either.

What this is going to look like

Roughly four shapes of post, in descending order of how likely I am to finish them:

Field notes. Short. Something broke in production this week, here is the dumb reason, here is the slightly less dumb fix. The kind of post you wish you’d read on Monday and didn’t see until Friday’s incident review.

Tooling. Opinionated reviews of stuff on my $PATH. Sometimes these will be about a CLI that’s older than the internet. Sometimes they’ll be about whatever is currently being hyped on Hacker News. No affiliate links — I make zero money from this and would prefer to keep it that way.

Deep dives. Longer, slower, occasionally pedantic. When something is worth explaining properly, I’d rather explain it properly than turn it into a Twitter thread.

Essays. Once in a while. Nothing on the calendar. These tend to write themselves at 11pm after a hard week.

What this is not going to look like

A few promises, mostly to keep me honest later:

No tracking. No analytics, no fingerprinting, no Google Tag Manager hiding inside a “performance monitoring” library. The simplest way for me to find out if you’re reading is to ask, and I don’t intend to ask.
No newsletter. I don’t want your email and I really don’t want a Substack revenue line item. Subscribe via RSS or don’t.
No popups, no cookie banners, no GDPR consent modal. The site has nothing to consent to. (If you live in a jurisdiction where this paragraph is, technically, the cookie banner, then welcome.)
No AI-generated filler. I use tooling to spellcheck and occasionally argue with me about a paragraph. I do not ship words I didn’t write and stand behind.
No drive-by hot takes. If a post is going to be strong, it’d better earn it.

That last one is going to be the hardest. I have a lot of hot takes. Roughly 60% of them, when I sit down and try to write them out, dissolve into “actually it depends.” This is, on reflection, mostly a good thing.

What’s already in the queue

I have three pieces drafted, in varying states of readiness:

Why your event-driven architecture quietly became a distributed monolith, and the small handful of things that bring it back from the dead.
Performance engineering as a habit, not a project. (This one’s been in the drafts folder for over a year.)
A love letter to curl. Long overdue.

And one I haven’t started but keep thinking about: Rust vs. Go, but calmer. The conversation about those two has gotten so dumb online that I’d like to attempt a sober version. We’ll see if I have the energy.

Housekeeping

If you want to follow along, the RSS feed is the only mechanism. There’s no comments section, which means there’s no place to argue with me directly. That’s deliberate. Disagreement is fine, even welcome, but it should live on your own blog where you have to take responsibility for it. (See also: the entire history of Twitter.)

Right. That’s enough preamble. Let’s write something.