Reciprocal knowledge elicitation: when the interview goes both ways

Traditional knowledge elicitation is a one-way street: an expert talks, a system listens. A new wave of human-agent research is rewriting that contract. Here's what reciprocal knowledge elicitation actually means, why it matters for capturing tacit knowledge, and how to design a reciprocal session without losing the plot.

A few years ago I sat in on a knowledge transfer session that was, on paper, going to be the answer to every team’s prayers.

The senior engineer we were going to lose to retirement (eight months’ notice, a clean handover, every manager’s dream) had agreed to spend an hour a week with a junior. They had a Confluence page open. There was a structured template. There was a coffee. The template had sections for “decision rationale”, “known edge cases”, and “institutional history”. Everyone was very pleased with how organised it all was.

Eight months later he retired. The Confluence page was forty pages long. About six of those forty pages were genuinely useful. The other thirty-four were a mixture of obvious things, things we already had runbooks for, and an unsettling amount of “you’ll know it when you see it” hedging.

The useful six pages were the ones where the junior had pushed back. “Wait, but why?” “What if the queue’s full?” “What did you do that time in 2019 when prod crashed at 3am?” Those questions had pulled out the knowledge that wasn’t in his head as words — it was in his head as patterns, and it only came out when something tugged at the pattern from the right angle.

That experience was my first real encounter with the gap between “asking an expert what they know” and “actually getting what they know out of them”. I didn’t have a name for it at the time. There is now an emerging body of research on how to close that gap. It calls the approach reciprocal knowledge elicitation, and I think it’s going to matter more than most people in the field currently realise.

What is reciprocal knowledge elicitation?

The short version: reciprocal knowledge elicitation is an approach where the human and the AI both contribute to extracting, refining, and validating knowledge during the same interaction, instead of the AI passively collecting whatever the human happens to volunteer.

In a traditional setup — a structured interview, a survey, a knowledge capture form — the expert is the source, the system is the recorder. Information flows one way. The system’s only job is to not lose what it’s given.

In a reciprocal setup, the system has a job during the conversation. It asks targeted follow-ups. It summarises what it thinks it just heard. It surfaces internal contradictions. It points at gaps the expert didn’t realise were gaps. It can, when configured well, even challenge the expert’s assumptions: “you said the timeout is 60 seconds, but earlier you mentioned the upstream times out at 45 — which one wins?”

That isn’t just nicer UX. It’s a substantively different epistemic process. Done right, reciprocal elicitation does for knowledge what peer review does for a paper, or what code review does for a pull request — it pressure-tests the artifact through structured disagreement, in a way the artifact’s author cannot do alone.

The most active research community working on this is gathering around a workshop series called RKEHAC (“Reciprocal Knowledge Elicitation for Human-Agent Collaboration”), which is co-located with the Hybrid Human-AI Intelligence conference; the 2026 edition runs in Brussels in July. Earlier iterations are written up in the ACM Digital Library. This is a small but growing field, and it’s worth paying attention to.

Why one-way elicitation keeps failing

The deeper issue with traditional knowledge elicitation is that most of the knowledge worth capturing isn’t propositional — it isn’t a set of facts the expert could write down if you just gave them the right form. It’s tacit.

Tacit knowledge is the kind that:

  • Lives in pattern-matching rather than facts (“this looks like the thing we saw in Q3 of 2022”)
  • Surfaces only in context (“if the queue is full and it’s a Monday, the right move is X; on any other day it’s Y”)
  • Includes the negative space (“we tried that in 2019, it almost worked, here’s why we never did it again”)
  • Doesn’t compress well into bullet points without losing what makes it useful

The retired engineer in my opening story knew thousands of small rules of that shape. He didn’t think of them as rules. He couldn’t enumerate them on demand. They came out when the situation matched.

This is, by the way, the same reason new hires don’t become useful by reading the wiki. The wiki has the facts. The job has the patterns. APQC’s work on knowledge elicitation has been making this case for years — that the value of structured elicitation is mostly in surfacing the patterns the expert doesn’t think to mention. What’s new is that we now have the tooling, in the form of capable language models, to actually do this at scale without booking a forty-hour series of interviews.

How a reciprocal loop actually works

The skeleton of a working reciprocal elicitation session, stripped to its essentials, looks something like this:

  1. The human describes a task, a decision, or an exception. Loose. Conversational. Not a form field. Whatever shape they want to describe it in.
  2. The AI asks one — and only one — targeted follow-up. Not five. Not a clarification form. A single question that pushes on whatever was vague, incomplete, or surprising.
  3. The AI summarises its current understanding back to the human. In the human’s own vocabulary, preferably. With the unknowns labelled as unknowns.
  4. The human corrects, extends, or rejects. “That’s mostly right but you’ve inverted the failure mode.” “You’re missing the case where the upstream is degraded but not down.”
  5. The refined version gets stored, with a trace of why it reached that final shape. The trace is the part that gets skipped, and the trace is the part that actually matters six months later.

Steps 2 through 4 loop until the conversation runs out of pressure, not until a timer expires. That last detail is more important than it sounds. Most of the bad versions of “AI interviewer” I’ve seen in the wild stop too early — they capture the first plausible answer and move on, because their underlying architecture treats the conversation as a sequential form-fill rather than an iterative search. The good ones keep going until they can’t find a new angle to push from.
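As a rough illustration, the five steps can be sketched in a few lines of Python. Everything here is hypothetical: `Turn`, `Session`, `run_session` and the three callables stand in for whatever model calls and interface you actually use. Note that `max_rounds` is only a safety cap. The real stopping rule is the question policy returning `None` because it has no new angle to push from.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Turn:
    role: str  # "human" or "ai"
    kind: str  # "description", "question", "answer", "summary", "feedback"
    text: str


@dataclass
class Session:
    trace: list[Turn] = field(default_factory=list)
    artifact: Optional[str] = None


def run_session(
    description: str,
    next_question: Callable[[list[Turn]], Optional[str]],
    summarise: Callable[[list[Turn]], str],
    human_reply: Callable[[str], str],
    max_rounds: int = 20,
) -> Session:
    session = Session()
    session.trace.append(Turn("human", "description", description))  # step 1
    for _ in range(max_rounds):
        question = next_question(session.trace)  # step 2: exactly one question
        if question is None:  # out of pressure, so stop
            break
        session.trace.append(Turn("ai", "question", question))
        session.trace.append(Turn("human", "answer", human_reply(question)))
        summary = summarise(session.trace)  # step 3
        session.trace.append(Turn("ai", "summary", summary))
        feedback = human_reply(summary)  # step 4: correct, extend, or reject
        session.trace.append(Turn("human", "feedback", feedback))
    session.artifact = summarise(session.trace)  # step 5: kept alongside the trace
    return session


# Toy run with scripted stand-ins instead of a real model and a real person.
questions = iter(["What happens if the queue is full?"])
session = run_session(
    "We restart the worker when it stalls.",
    next_question=lambda trace: next(questions, None),
    summarise=lambda trace: " / ".join(t.text for t in trace if t.role == "human"),
    human_reply=lambda prompt: "Then we drain it first.",
)
print(len(session.trace), session.artifact)
```

The interesting design work lives in `next_question`, which this sketch deliberately leaves as a plug-in: a form-fill system would hand it a fixed list, whereas a reciprocal one derives each question from the trace so far.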

The thing this loop produces, when it works, is qualitatively different from a transcript. It’s closer to a negotiated artifact than a recording. The human contributed the raw material. The AI contributed structure, scepticism, and the discipline of not letting ambiguity slide. Both ended up with something neither would have produced alone.

That’s why the word “reciprocal” matters. Not because the AI is “learning” in any deep sense during the session (most of the time it isn’t — that’s a separate question about how the stored knowledge gets fed back into the model). The reciprocity is in the immediate exchange: both parties are doing intellectual work, and both walk away with more than they brought.

Where it’s actually being used

A non-exhaustive list of places I’ve seen this approach show real results in the last eighteen months:

Requirements engineering. This is where reciprocal elicitation fits most naturally. Stakeholders are notoriously bad at writing down what they want, partly because they don’t know what they want until somebody pushes back. An AI that asks “you said you want the report to be ‘fast’. What’s fast in this context — under a second, under five seconds, under a minute?” will, on a long enough timeline, extract better requirements than any survey ever has. This is, incidentally, why I’ve spent enough time recently with Eliciteer to recommend it: it’s built specifically around this pattern. You write a briefing, share a link, and the system runs the interview against whoever you sent the link to, asking adaptive follow-ups along the way. It is, in spirit, exactly the kind of tool the RKEHAC line of work is arguing for.

Incident post-mortems. Most post-mortems get done in a meeting where the loudest person remembers wrong, and the quiet engineer who actually saw the failure at 3am doesn’t get a word in. A reciprocal interview, run asynchronously with each participant separately, tends to produce better source material. You then merge the versions and look at where they disagree. Where they disagree is where the post-mortem’s real lessons live.
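The merge step is simple enough to sketch. This assumes each participant's interview has already been reduced to a flat set of named fields, which is my simplification; the field names and values below are invented for illustration.

```python
def disagreements(accounts: dict[str, dict[str, str]]) -> dict[str, dict[str, str]]:
    """Return, per field, who said what, for fields where the accounts differ."""
    fields = sorted({name for account in accounts.values() for name in account})
    conflicts = {}
    for name in fields:
        answers = {
            who: account[name] for who, account in accounts.items() if name in account
        }
        if len(set(answers.values())) > 1:
            conflicts[name] = answers
    return conflicts


# Hypothetical accounts from two separate interviews about the same incident.
accounts = {
    "on_call": {"first_alert": "03:02", "trigger": "queue full"},
    "team_lead": {"first_alert": "03:20", "trigger": "queue full"},
}
print(disagreements(accounts))
# {'first_alert': {'on_call': '03:02', 'team_lead': '03:20'}}
```

The fields that come back are the agenda for the follow-up conversation. The ones that agree can go straight into the write-up.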

Knowledge transfer before someone leaves. The exact scenario I opened this post with. The thing that pulled the useful six pages out of forty was the junior asking “wait, but why?” That role can be played, fairly competently, by a model that’s been briefed to ask exactly that kind of question. I would have killed for one in 2019.

Domain expert onboarding for AI products. If you’re building anything that needs domain knowledge — legal, medical, industrial, financial — you will, sooner or later, need to interview experts. Doing it reciprocally compresses fifty hours of human interviewer time into something an expert can do in three hours of their own, at their own pace. The output also tends to be more honest, because the expert isn’t performing for another human; they’re talking to a system that doesn’t judge.

Compliance and audit trails. This is the underrated use case. The trace of why a knowledge artifact reached its final shape — which questions were asked, which answers were accepted, which were rejected and why — turns out to be exactly the kind of evidence a serious audit wants to see. If you’ve ever tried to defend a decision to a regulator with “the SME just kind of knew” as your documentation, you’ll understand why this matters.

Design principles I’d defend

If you’re designing one of these, or evaluating a tool that claims to do it, here are the things I’d insist on. None of these are novel; most of them have been in the academic literature for years. The reason to write them down anyway is that most products that claim to be doing reciprocal elicitation are, in fact, doing sequential form-fill with extra steps.

One question at a time. Not three. Not “first, please tell us about X, Y, and Z.” If you find yourself stacking sub-questions into a single prompt, you’ve turned your interview back into a form and the depth advantage is gone.

Distinguish facts from judgments from assumptions. “The pod restarts every 12 hours” is a fact. “The pod should restart every 12 hours” is a judgment. “I think someone set that to 12 hours because Kubernetes was being weird in 2021” is an assumption. Treating them as the same kind of statement is how knowledge bases end up indistinguishable from urban legend.
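One way to make that distinction stick is to tag each stored statement with its kind. The sketch below is a minimal illustration, not a standard schema: `Kind` and `Statement` are names I made up, and the three statements are the examples from the paragraph above.

```python
from dataclasses import dataclass
from enum import Enum


class Kind(Enum):
    FACT = "fact"              # observed and checkable
    JUDGMENT = "judgment"      # what someone thinks should be the case
    ASSUMPTION = "assumption"  # an unverified explanation


@dataclass(frozen=True)
class Statement:
    text: str
    kind: Kind


statements = [
    Statement("The pod restarts every 12 hours", Kind.FACT),
    Statement("The pod should restart every 12 hours", Kind.JUDGMENT),
    Statement(
        "Someone set that to 12 hours because Kubernetes was being weird in 2021",
        Kind.ASSUMPTION,
    ),
]

# Assumptions are the ones to chase up before anyone builds on them.
to_verify = [s.text for s in statements if s.kind is Kind.ASSUMPTION]
print(to_verify)
```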

Use examples and counterexamples aggressively. “Tell me about a time the system did the right thing here, and a time it didn’t.” This is the single most reliable prompt for unlocking pattern-based knowledge. It works in interviews. It works in interrogations. It works on a six-year-old trying to explain a story they made up. The mechanism is universal.

Let the human review the AI’s summary before storing. This is non-negotiable. If the artifact in the knowledge base wasn’t explicitly signed off on by the human, it isn’t their knowledge anymore — it’s the model’s interpretation of their knowledge, and those two things diverge faster than anyone wants to admit.

Keep the trace. Not just the final answer. The path to it. Which questions were asked. Which were skipped. Where the human pushed back, and what they pushed back against. This is the part that pays for itself the day someone asks “who decided this and why?” and you don’t have to guess.
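These last two principles, human sign-off and keeping the trace, fit naturally into the storage layer. A minimal sketch, assuming a list of JSON strings as the "knowledge base"; `TraceEvent`, `Artifact` and `store` are hypothetical names, and the example content is invented.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Optional


@dataclass
class TraceEvent:
    kind: str  # e.g. "asked", "skipped", "pushback", "accepted"
    text: str


@dataclass
class Artifact:
    summary: str
    trace: list[TraceEvent] = field(default_factory=list)
    signed_off_by: Optional[str] = None


def store(artifact: Artifact, knowledge_base: list[str]) -> None:
    """Persist the summary together with its trace, but only after sign-off."""
    if artifact.signed_off_by is None:
        raise ValueError("refusing to store a summary the human has not reviewed")
    knowledge_base.append(json.dumps(asdict(artifact)))


kb: list[str] = []
artifact = Artifact(
    summary="Restart interval is 12 hours; original reason unknown.",
    trace=[
        TraceEvent("asked", "Why 12 hours?"),
        TraceEvent("pushback", "It is not a hard rule, it was a workaround."),
    ],
)
try:
    store(artifact, kb)  # rejected: nobody has reviewed it yet
except ValueError as err:
    print(err)

artifact.signed_off_by = "the expert"
store(artifact, kb)  # accepted, with the trace attached
```

Storing the trace in the same record as the summary is a deliberate choice in this sketch: it makes it hard for the two to drift apart or for the trace to be quietly dropped.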

Don’t lie about your epistemic state. If the system isn’t sure what it just heard, it should say so. “I think you’re saying X, but that contradicts what you said earlier about Y. Which one stands?” This is awkward. It is also the difference between a knowledge artifact you can build on and one you can’t.
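In code, the crudest possible version of this is a check that refuses to silently overwrite an earlier claim. Real systems would need something far smarter than string comparison to spot a contradiction; this sketch only shows the shape of the behaviour, and `check_claim` is a hypothetical helper.

```python
from typing import Optional


def check_claim(known: dict[str, str], topic: str, value: str) -> Optional[str]:
    """Record a claim, or return a clarifying question if it conflicts."""
    earlier = known.get(topic)
    if earlier is not None and earlier != value:
        # Do not overwrite: surface the conflict and let the human decide.
        return (
            f"I think you're saying the {topic} is {value}, but earlier "
            f"you said {earlier}. Which one stands?"
        )
    known[topic] = value
    return None


known: dict[str, str] = {}
first = check_claim(known, "timeout", "60 seconds")   # nothing to compare against
second = check_claim(known, "timeout", "45 seconds")  # conflicts with the first
print(second)
```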

A worked example: bug triage

Let me ground this in something concrete, because abstract design principles without a use case are how blog posts end up doing nothing useful.

Say you’ve got a user-reported bug. The user is on the other side of the planet, they don’t have the patience for a meeting, and your support engineer is going to have to reconstruct what happened from a single sentence in a ticket.

The form-based version of this is the standard “browser, OS, reproduction steps” template. About 40% of tickets get filled in properly. The other 60% say things like “doesn’t work on Mac” and that’s it.

The reciprocal version, in practice, looks something like this:

User:   The export button doesn't work on my computer.

AI:     Got it. One detail would help me reproduce this. When
        you click "Export", does anything happen at all — a
        loading spinner, an error message, a frozen tab — or
        does the button just stay there with no response?

User:   It spins for a bit then nothing.

AI:     OK, that's useful. The spinner running and then stopping
        suggests the request started but didn't complete. Next
        question: roughly how long does it spin before stopping?

User:   Maybe 30 seconds? Feels like a timeout.

AI:     That matches a timeout pattern. Last one for now — does
        this happen for every export, or only for certain ones?
        (For example: a particular date range, a particular
        report type, a particular project.)

User:   Only on the big quarterly report. The monthly works fine.

AI:     Summary so far, please correct anything wrong:
        - Export only fails for the quarterly report
        - Monthly export works
        - Failure pattern: button shows a spinner for ~30s,
          then nothing happens (no error message)
        - Likely cause based on this pattern: request timeout
          on the server, possibly related to report size

User:   Yes, exactly.

Compare that to “doesn’t work on Mac”. The reciprocal version got to a probable root cause (server-side timeout on large report generation) in four exchanges, and it has a trace that the support engineer can paste into the ticket. The human did about twenty seconds of typing. The AI did the work of pulling the shape of the problem out of them.
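To make "paste into the ticket" concrete, here is one way the final summary could be held as structured data and rendered as a comment. The class and its fields are my own illustration, not any product's format. The confirmed items and the hypothesis come from the conversation above; the unknowns are simply things the conversation never asked about.

```python
from dataclasses import dataclass, field


@dataclass
class TriageSummary:
    confirmed: list[str]
    hypothesis: str
    unknown: list[str] = field(default_factory=list)
    confirmed_by_user: bool = False

    def to_ticket_comment(self) -> str:
        header = "Confirmed by user:" if self.confirmed_by_user else "Unconfirmed summary:"
        lines = [header]
        lines += [f"- {item}" for item in self.confirmed]
        # Keep the guess visibly separate from what the user actually said.
        lines.append(f"Hypothesis (not verified): {self.hypothesis}")
        lines += [f"Unknown: {item}" for item in self.unknown]
        return "\n".join(lines)


summary = TriageSummary(
    confirmed=[
        "Export fails only for the quarterly report",
        "Monthly export works",
        "Spinner for ~30s, then nothing (no error message)",
    ],
    hypothesis="server-side request timeout, possibly related to report size",
    unknown=["browser and OS", "exact time of the failed attempt"],
    confirmed_by_user=True,
)
print(summary.to_ticket_comment())
```

Keeping the hypothesis and the unknowns in separate fields means the support engineer can tell at a glance which parts the user confirmed and which parts the system inferred.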

You could do this with a human interviewer too, of course. You could do it with a human interviewer better, in fact. The problem is your support team has six people and three hundred tickets a day, and the math doesn’t work. The reciprocal version scales. That’s the actual pitch: not “the AI is smarter” but “the discipline of a structured interview, run cheaply enough to apply to every ticket”.

Doing it without building it yourself

This is the practical question most teams will land on after reading the above. Building a reciprocal elicitation pipeline from scratch — managing the conversation state, the follow-up policy, the summary loop, the trace storage — is a substantial piece of engineering, and the academic literature is honest about how easy it is to do it badly.

The honest answer is that you have three options, in increasing order of effort:

  1. Use an existing product. Eliciteer is the one I happen to know best. It implements the core reciprocal loop (briefing, async interview, adaptive follow-ups, structured summary) as a hosted service, with a CLI and Python SDK for piping results into your own systems. It’s the closest thing I’ve seen to an off-the-shelf implementation of the ideas in the RKEHAC line of work. There are others worth looking at, but most of the ones I’ve evaluated stop at “AI-generated survey questions” without the adaptive depth that makes this approach work.
  2. Build a thin wrapper on top of a frontier model. If your use case is narrow and you have specific reasons to control the prompt logic, building your own loop on top of an LLM API is feasible. Expect to spend more time on the conversation policy — when to ask, when to summarise, when to stop — than on the model itself.
  3. Build it properly from scratch. Only worth doing if you have unusual requirements (regulated industries, on-prem, custom fine-tunes). In that case, look at the RKEHAC workshop proceedings and budget six months.
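If you do go with option two, the conversation policy is where the time goes, so it helps to isolate it as a small pure function you can test without a model in the loop. The sketch below is one possible shape, under assumptions of my own: `State`, `Action` and the `summarise_every` threshold are hypothetical, and how you count "open gaps" is the genuinely hard part that this leaves out.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    ASK = "ask"
    SUMMARISE = "summarise"
    STOP = "stop"


@dataclass
class State:
    open_gaps: int              # vague or missing points still worth pushing on
    answers_since_summary: int  # answers received since the last summary
    summary_confirmed: bool     # has the human signed off on the latest summary?


def next_action(state: State, summarise_every: int = 3) -> Action:
    """Decide whether to ask, summarise, or stop. No timers involved."""
    if state.open_gaps == 0:
        # Nothing left to push on: stop, but only after a confirmed summary.
        return Action.STOP if state.summary_confirmed else Action.SUMMARISE
    if state.answers_since_summary >= summarise_every:
        return Action.SUMMARISE
    return Action.ASK


print(next_action(State(open_gaps=2, answers_since_summary=0, summary_confirmed=False)))
print(next_action(State(open_gaps=0, answers_since_summary=1, summary_confirmed=False)))
print(next_action(State(open_gaps=0, answers_since_summary=0, summary_confirmed=True)))
```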

For most teams, option one is the right answer. I say this not because I want to sell you anything — I have no financial relationship with any of the products I named — but because the specific failure mode of “we built our own AI interviewer” tends to be six months of engineering effort that produces a worse result than something already on the shelf. There are interesting problems in this space worth doing yourself. The bare conversation loop is not one of them.

Frequently asked questions

Is reciprocal knowledge elicitation the same as a chatbot interview? No. A chatbot interview, in the way the term is normally used, follows a scripted decision tree. A reciprocal elicitation system adapts its questioning based on what’s already been said, surfaces contradictions actively, and produces a negotiated artifact rather than a transcript. The key difference is whether the system is pushing on the answers it gets, or just collecting them.

Doesn’t this just push the bias into the AI? A fair concern. The honest answer: yes, some bias is shifted from the human side (memory, salience, social desirability) to the model side (training data, prompt design, hallucination). The mitigation is the trace and the human review step. If you can see why the model went where it went, and the human signs off on the artifact before it’s stored, you’ve at least got auditable bias instead of invisible bias. That’s a real improvement, but it isn’t a fix.

How long should one of these sessions be? The literature suggests diminishing returns past about 45 minutes of interaction time, and frustration sets in past 60. The practical advice is: design for 15–30 minute sessions, with the option to come back. Long async sessions, split across days, tend to produce richer artifacts than single sittings — because the human has time to think about the questions in between.

Can I use this to extract knowledge from junior engineers, not just senior ones? Yes, and you probably should more often than you do. Junior engineers see things senior engineers have stopped seeing because they’ve normalised them. A reciprocal session aimed at “what confused you in your first month?” produces some of the most useful onboarding documentation a team can have. I have, more than once, learned things about my own systems by reading the output of such sessions.

What’s the difference between this and just having a smart interviewer? A smart interviewer is, almost by definition, better — for the one interview they’re doing. They don’t scale. The point of reciprocal elicitation isn’t to beat the best human interviewer; it’s to do better than the median form-fill at near-zero marginal cost.

Why I think this matters

Two reasons.

The first is mundane: institutional knowledge is leaving most organisations faster than they can capture it. Retirements, attrition, layoffs, reorgs. Forms haven’t worked. Wiki pages haven’t worked. Recorded interviews haven’t really worked either, because nobody reads them. A method that produces negotiated structured knowledge, at scale, asynchronously, at near-zero incremental cost per expert, is the first new tool we’ve had for this problem in about thirty years. That alone makes it worth paying attention to.

The second is less mundane. A reciprocal elicitation loop is, if you squint, a small instance of the broader pattern we’re all going to have to figure out: how do humans and AI agents think together, in a way that takes both their strengths seriously? The research community has called this “hybrid human-AI intelligence”, and it’s still mostly aspirational. Reciprocal knowledge elicitation is one of the first concrete, well-defined problems where you can actually see what the answer might look like in practice. It’s a tractable special case of a much larger question.

That, more than the immediate productivity wins, is why I think RKEHAC and the work coming out of it deserve more attention than they’re currently getting. It’s not just a better way to do expert interviews. It’s a small, well-instrumented dress rehearsal for the kind of collaboration that’s about to become the default.