If you create long-form content, you have probably tried using AI to turn it into social posts — and noticed the output sounds nothing like you. Brand voice extraction fixes this by reading your actual writing and building a structured profile of how you sound.
And right now, it matters more than ever. LinkedIn's 2026 algorithm actively downranks generic AI content. Human-written content gets 5.44x more traffic. One creator on Reddit put it bluntly: "One post I wrote by hand earned me more than the 143 I created using AI."
We built a brand voice extraction pipeline at Sembra to solve this problem. Here is how we built it, what broke, and what we learned.
Why "Pick Your Tone" Fails
Before we could build the right solution, we had to understand why the obvious one does not work. The content marketing community has started calling the problem "Voice Drift" — AI repurposing tools strip away the creator's personality, leaving behind neutral, generic output. Most tools try to fix this by asking you to self-describe your voice. Pick from a dropdown: formal or casual, playful or serious, authoritative or approachable. This fails for a reason that might sound counterintuitive: people are genuinely terrible at describing how they write.
I asked one of our test writers if he hedges. He said no. Then I ran his writing through the extraction pipeline and counted 15 instances of "I think," "maybe," and "tends to." He hedges constantly — he just does not experience it that way. Self-perception and linguistic reality diverge consistently, and a generation model built on self-perception produces content that sounds like who the writer thinks they are, not who they actually are on the page.
The specificity problem is maybe worse. "Professional but friendly" describes half the internet. It gives a generation model nothing concrete to work with. What a model actually needs is: this writer hedges at medium frequency using "rather," "might," and "possibly"; uses em dashes and semicolons but never exclamation points; writes sentences averaging 23 words with high variance; and addresses the reader directly but rarely uses inclusive "we." That is a usable voice profile. A dropdown is not.
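Concretely, a profile like that can be serialized as a small structured object the generation model consumes directly. The field names below are illustrative, not Sembra's actual schema:

```python
# Illustrative voice profile. Field names and structure are hypothetical,
# not Sembra's real schema -- they mirror the features described above.
voice_profile = {
    "stance": {
        "hedging": {"frequency": "medium", "markers": ["rather", "might", "possibly"]},
    },
    "punctuation": {"em_dash": True, "semicolon": True, "exclamation": False},
    "sentences": {"mean_length_words": 23, "length_variance": "high"},
    "engagement": {"direct_address": "high", "inclusive_we": "rare"},
}

# A generation prompt can then cite concrete constraints instead of vibes.
hedges = ", ".join(voice_profile["stance"]["hedging"]["markers"])
constraints = [f"hedge with {hedges}"]
```

The point of the structure is that every field is specific enough to act on: a prompt built from it says "hedge with rather, might, possibly," not "be professional but friendly."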
Reading the Writer Instead
So rather than asking writers to describe themselves, we built a pipeline that reads their actual content and extracts a structured voice profile — a machine-readable representation of how someone genuinely writes.
We needed a foundation more rigorous than intuition, so we grounded the schema in Ken Hyland's metadiscourse framework — an empirically validated model from academic linguistics. The framework describes two dimensions of writing: stance (how the writer positions themselves relative to their own claims — hedging, boosting, attitude, self-reference) and engagement (how they connect with readers — direct address, questions, directives, shared knowledge). Hyland's framework includes curated lists of marker words — the specific words that signal hedging, boosting, and attitude. But those lists were built from academic writing — which uses completely different language than blogs and newsletters. Words like "really" and "think" barely appear in academic prose but show up constantly in informal writing. We adapted the lists using research on how informal writing actually works.
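As a rough sketch, the adapted marker lists can live in code as plain sets grouped by Hyland's two dimensions. The specific words below are examples for illustration, not the research-derived lists we actually ship:

```python
# Hyland-style marker lists, informally adapted for blogs and newsletters.
# Word choices here are illustrative examples, not the full curated lists.
STANCE_MARKERS = {
    "hedges":   {"think", "maybe", "might", "possibly", "rather", "tends"},
    "boosters": {"really", "definitely", "clearly", "certainly", "show"},
    "attitude": {"surprisingly", "remarkably", "importantly", "interesting"},
    "self_reference": {"i", "my", "me"},
}
ENGAGEMENT_MARKERS = {
    "direct_address":   {"you", "your"},
    "directives":       {"consider", "note", "remember", "stop"},
    "shared_knowledge": {"everyone knows", "as we all know", "of course"},
}
```

Note that "really" and "think" appear here precisely because the informal adaptation pulled them in; the original academic lists would have skipped both.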
The pipeline splits work between traditional NLP and an LLM. Everything countable — word frequencies, sentence lengths, pronoun ratios, punctuation marks — is extracted deterministically. The LLM only handles what requires judgment: classifying tone, identifying structural patterns, assessing rhetorical function. And crucially, the LLM receives the deterministic results as context, so its qualitative assessments are grounded in what the numbers actually show.
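The deterministic layer is deliberately boring. A minimal sketch of the countable features, using only the standard library (the real extractor covers far more features than this):

```python
import re
from statistics import mean, pstdev

def countable_features(text: str) -> dict:
    """Deterministic layer: everything here is a count or ratio, no LLM.
    Simplified sketch -- the production extractor covers more features."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    lengths = [len(s.split()) for s in sentences]
    words = re.findall(r"[A-Za-z']+", text.lower())
    total = max(len(words), 1)
    return {
        "sentence_count": len(sentences),
        "mean_sentence_length": round(mean(lengths), 1) if lengths else 0,
        "sentence_length_stdev": round(pstdev(lengths), 1) if lengths else 0,
        "em_dashes": text.count("—"),
        "semicolons": text.count(";"),
        "exclamations": text.count("!"),
        "you_ratio": round(words.count("you") / total, 3),
        "we_ratio": round(words.count("we") / total, 3),
    }
```

This dict is what the LLM receives as context, so when it claims a writer "favors short, punchy sentences," the claim has to square with `mean_sentence_length` and `sentence_length_stdev`.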
The pipeline ends with a personality reveal — not a settings page. Each writer gets a name and a short portrait in second person. One of our test writers came back as "The Curious Bridge-Builder": you give direct instructions — "stop," "reconsider" — while softening them with shared knowledge appeals like "you've been waiting to feel ready." Your attitude runs on "genuinely" and "interesting," and you address the reader as a peer, not a student. She read it and recognized herself immediately. That is the validation that mattered most: not that the numbers were correct, but that the writer saw her own voice reflected back at her.
When Words Lie: The Context Problem
With the pipeline scaffolded, we tested it on five writers with genuinely different voices — and immediately hit a wall. The initial extraction used flat word lists: if a word appeared on our "hedge" list, it got counted as a hedge. The problem is that common English words change meaning depending on how they are used.
"I think this matters" — that is a hedge; the writer is expressing uncertainty. "People think AI is magic" — that is not a hedge; the writer is stating what others believe. Same word, completely different function. The same problem showed up with boosters: "Research shows that X" sounds assertive, but the writer is citing an external source, not personally claiming anything. Compare that with "I'll show you that this works" — now the writer is personally asserting. A writer who cites research often is not the same as a writer who personally asserts often, even though both use the word "shows."
The fix: instead of just matching words, we look at who is saying it. Linguistics research confirmed what our false positives were showing — these words only count as voice markers when the writer themselves is the subject. "I think" is a hedge; "people think" is not. "I'll show" is a booster; "research shows" is not. That single rule eliminated false positives across the board, and every writer's profile became more accurate.
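A minimal version of that subject filter uses a naive previous-token check. This is a sketch: the production version has to be more robust, for example handling adverbs between subject and verb ("I really think"), which this simplification misses:

```python
import re

# Simplified sketch of the subject filter. A real implementation would use
# dependency parsing (e.g. spaCy) rather than an adjacent-token check,
# which misses "I really think" and similar intervening words.
FIRST_PERSON = {"i", "i'll", "i'd", "i've", "we"}
HEDGE_VERBS = {"think", "believe", "guess", "suspect"}

def first_person_hedges(text: str) -> list[str]:
    """Count a hedge verb only when the writer is the subject:
    'I think' counts; 'people think' does not."""
    tokens = re.findall(r"[A-Za-z']+", text.lower())
    return [
        f"{prev} {word}"
        for prev, word in zip(tokens, tokens[1:])
        if word in HEDGE_VERBS and prev in FIRST_PERSON
    ]
```

Run against the examples above, "I think this matters" produces a hit and "People think AI is magic" produces none, which is exactly the distinction the flat word lists could not make.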
The Propose-Prove Pattern
Attitude markers — the evaluative language that gives writing its emotional texture — posed a genuinely different kind of problem. Our initial approach used a curated list of evaluative words from the linguistics literature: "surprisingly," "remarkably," "importantly." This worked for some writers but completely missed others.
One test writer's entire emotional register lives in words like "startling," "astonishing," and "genuine breakthroughs" — none of which were on our list. We could have expanded the list, but that is whack-a-mole; English has hundreds of evaluative words, and different writers use completely different ones. No finite list would provide coverage.
We hit a similar wall with shared knowledge — phrases where the writer assumes common ground with the reader ("Everyone knows that evals are important"). The LLM could identify these naturally, but it also hallucinated plausible-sounding phrases that were not actually in the text. Tighter prompting helped but could not fully solve the hallucination problem.
The solution — and the most important pattern we developed — is what we call the Propose-Prove Pattern. The LLM reads the text and proposes candidate markers or phrases. Then a deterministic function proves each candidate against the source text: does this exact phrase actually appear in the writing? Any candidate the LLM hallucinated gets zero matches and is automatically dropped.
Linguistics research has long documented that curated word lists break down for open-vocabulary features like evaluative language — there are simply too many ways writers express attitude for any fixed list to capture. That is precisely why we needed a different approach for these fields: the LLM has essentially unlimited vocabulary, and the deterministic proof guarantees precision. It can identify "spookily good" for one writer and "brutal reality" for another without either appearing in any predefined list. And if the LLM invents a marker that is not in the text, the validation catches it automatically. We applied the Propose-Prove Pattern to both attitude markers and shared knowledge, and it solved both problems identically.
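The proof step is simple enough to sketch in full. The matching details below (case-insensitive, whole-word) are illustrative choices, but the shape is the pattern itself: exact verification against the source text, with hallucinations falling out automatically:

```python
import re

def prove_candidates(candidates: list[str], source: str) -> dict[str, int]:
    """Propose-Prove, proof half: keep only LLM-proposed phrases that
    literally occur in the source (case-insensitive, whole-word match).
    Hallucinated phrases get zero matches and are silently dropped."""
    proven = {}
    for phrase in candidates:
        pattern = r"\b" + re.escape(phrase.lower()) + r"\b"
        count = len(re.findall(pattern, source.lower()))
        if count:
            proven[phrase] = count
    return proven
```

If the LLM proposes "startling," "genuine breakthroughs," and a hallucinated "mind-bending" for a text containing only the first two, the hallucination simply never makes it into the profile — no prompt engineering required.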
What Surprised Me
The most surprising finding was that contradictions in a voice profile are often the most distinctive feature. One writer's schema shows both medium-to-high hedging and high directives — she hedges her claims while telling readers exactly what to do. Both signals are true. The tension between them is her voice. A schema that tried to resolve this into a single "confidence score" would lose the most interesting signal.
The Full Pipeline
The extraction runs in 12-20 seconds per document. Traditional NLP handles the countable features in under 2.5 seconds; the LLM qualitative analysis takes 4-8 seconds. For multi-document analysis (the production path uses up to 3 writing samples), the system pools text, runs detection once, and uses a final LLM pass to reconcile qualitative fields across documents.
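In sketch form, the multi-document path looks roughly like this. `llm_reconcile` stands in for the actual model call, and the feature dict is truncated to a placeholder:

```python
def analyze_documents(docs: list[str], llm_reconcile) -> dict:
    """Sketch of the multi-document path: pool up to 3 writing samples,
    run deterministic detection once over the pooled text, then one LLM
    pass reconciles the qualitative fields. `llm_reconcile` is a stand-in
    for the real model call; the feature dict is a placeholder."""
    pooled = "\n\n".join(docs[:3])  # production path caps at 3 samples
    counts = {"words": len(pooled.split())}  # placeholder for the full feature set
    return {"counts": counts, "qualitative": llm_reconcile(pooled, counts)}
```

Pooling before detection means the deterministic pass runs once instead of once per document, which is part of why the countable layer stays under 2.5 seconds.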
The architectural decision to handle everything countable with traditional NLP rather than sending it all to the LLM is not just about accuracy — it also keeps the pipeline fast and cost-efficient at scale.
Why This Changes Your Content Strategy
All of the work described above — the context filtering, the Propose-Prove Pattern, the linguistics foundation — exists to solve one problem: when Sembra turns your blog post into 15-25 social posts, every one of them needs to sound like you wrote it. Self-description dropdowns were never going to get us there. Extraction was the only viable path.
We have since fed these voice profiles into the generation pipeline — and the results are genuinely surprising. The generated posts do not just avoid sounding like AI; they pick up the writer's specific patterns. The hedging words, the punctuation habits, the sentence rhythm. It works. Brand voice is not a setting you configure — it is a signal you extract.
Join the Sembra waitlist — voice-matched content amplification is coming.