On the format

Voice vs. text: what speech transmits that typing strips out

By Cody, Founder of CallByrd · June 3, 2026 · 7 min read

Updated June 8, 2026

Grounded in the research cited below. Clinical review by a licensed practitioner is being added.

Most AI companion products are chat-first: text in, text out. For research tasks, drafting, and information retrieval, this format works well. But a growing body of users has adopted chat-based AI for something different — for conversation itself, for thinking out loud, for processing a feeling. The research on communication channels suggests this use case is structurally mismatched to the medium. Voice carries information that text cannot, and the mechanism by which voice helps process experience operates differently from typing.

What does the research say about voice versus text?

The empirical comparison of voice and text communication has produced consistent findings across two decades of social psychology research. Among the most-cited studies are those from Nicholas Epley's lab at the University of Chicago Booth School of Business.

In a 2017 study published in Psychological Science, Schroeder, Kardas, and Epley had participants deliver the same substantive arguments either by voice (audio recordings or calls) or by text. Listeners and readers then rated the communicator on dimensions including thoughtfulness, rationality, and emotional presence. Voice consistently produced higher ratings — for the same words and the same content. The authors titled the paper The Humanizing Voice.

A 2021 follow-up by Kumar and Epley, published in the Journal of Experimental Psychology: General, examined the gap between predicted and actual experience of phone calls versus text exchanges. Participants predicted text and voice would produce similar feelings of connection, and that voice would feel more awkward. The actual results were the opposite: phone calls produced significantly stronger feelings of connection than text. The paper's title summarizes the finding: It's Surprisingly Nice to Hear You.

The pattern across these studies: voice carries information that text strips out, and most people do not predict this in advance.

Why typing and talking aren't equivalent processes

When a person types, the sentence on the screen is typically not the first version that came to mind. There is an editing window (a cursor, a backspace, the option to revise) between the original thought and the final message. The user selects for clarity, for tone, for self-presentation. The text that gets sent is often the third or fourth pass.

For tasks where the user wants polish (writing email, composing arguments), this editing capacity is useful. For tasks where the user is trying to process a thought or feeling, to figure out what they actually think, the same editing capacity becomes an obstacle. The user's own editor intercepts the thought before it is fully articulated.

Speech operates differently. There is no backspace. A spoken sentence is committed before its ending is fully formed; the ending arrives because the beginning has already been uttered. This produces an articulation effect that text allows the user to bypass.

How does articulation help?

James Pennebaker's foundational 1997 paper in Psychological Science, which summarizes his research on expressive writing, reported that articulating emotional experience (putting it into language) measurably reduces its psychological weight. The mechanism Pennebaker identified was not catharsis (emotional release) but synthesis: the cognitive process of translating an internal experience into a structured linguistic form.

Pennebaker's original research studied writing as the modality of articulation. Subsequent work suggests that the underlying mechanism, translating experience into language, does not depend on the channel. Speaking aloud accomplishes the same synthesis as writing, with one difference: speech has no composition delay. The speaker cannot quietly edit out the parts of the thought that feel uncomfortable to commit. This makes speech a faster and less filtered articulation surface than writing.

What this implies for AI conversation tools

The implications for AI conversation products follow from the research:

  1. Text-based AI is well-suited to tasks. For research, drafting, planning, coding, and any case where the response is an artifact, text is the appropriate medium.
  2. Voice-based AI is structurally better matched to conversation as the goal. Where the user wants to process a feeling, work out a thought, or have a conversation that is not about producing an artifact, voice removes the editing filter that text allows.
  3. The two are not substitutes for one another, and neither replaces human contact. A 2024 study by Maples and colleagues on AI conversational support emphasizes that AI conversation tools serve specific stress-management functions; they are not substitutes for human relationships or professional care.

Voice-based AI conversation tools — including CallByrd, a phone-based AI designed for unstructured conversation — fit the second use case. They are appropriate when the conversation itself is the goal, not the artifact produced by the conversation.

The bottom line

The research literature on voice versus text communication is consistent: voice transmits information that text strips out, and most people underestimate the difference. For conversation-as-conversation — processing, thinking out loud, connection — voice is the structurally appropriate medium. Speaking aloud produces a faster, less filtered articulation than typing, and the listener (or AI listener) receives information that does not survive transmission as text.

Common questions

Is voice communication actually more effective than text?

For tasks where the conversation is the point (connection, emotional processing, working out a feeling), the research supports voice over text. Schroeder, Kardas, and Epley (2017) found listeners rate the same arguments as more thoughtful when delivered by voice rather than text. Kumar and Epley (2021) found that phone calls produce significantly stronger feelings of connection than text exchanges, even when participants predicted otherwise. For tasks where the artifact is the point (research, drafting, summarization), text remains the right medium.

Why does talking aloud help process feelings?

Pennebaker's foundational 1997 work on expressive disclosure found that putting emotional experience into language reduces its psychological weight through synthesis, not catharsis. The same mechanism applies whether the articulation is written or spoken, but speech has the advantage of immediacy: there is no composition delay, no opportunity to edit before the thought is uttered. This forces a synthesis that text allows the user to bypass.

Is talking to an AI the same as talking to a person?

No, and current evidence does not support that framing. A 2024 study by Maples and colleagues found AI conversation can serve specific stress-management functions when used as a complement to, not a replacement for, human relationships and professional support. Voice AI is a meaningfully different kind of company than text AI, but neither substitutes for human relationships.

Should I use ChatGPT for emotional conversations?

ChatGPT is text-first and was built for tasks where the artifact is the response. Using it for company is widespread but works against the medium's strengths. For talk-based interaction (processing a feeling, thinking out loud, conversation without an artifact goal), voice-based AI is structurally better matched.

What about people in crisis or with chronic mental health concerns?

Neither voice AI nor text AI is a substitute for clinical care. Anyone in crisis should call or text 988 (U.S. Suicide & Crisis Lifeline). Anyone experiencing persistent symptoms of depression, anxiety, or functional impairment should consult a licensed clinician.

Sources

  1. Schroeder, J., Kardas, M., & Epley, N. (2017). The Humanizing Voice: Speech Reveals, and Text Conceals, a More Thoughtful Mind in the Midst of Disagreement. Psychological Science, 28(12), 1745–1762.
  2. Kumar, A., & Epley, N. (2021). It's Surprisingly Nice to Hear You: Misunderstanding the Impact of Communication Media Can Lead to Suboptimal Choices of How to Connect with Others. Journal of Experimental Psychology: General, 150(3), 595–607.
  3. Pennebaker, J. W. (1997). Writing About Emotional Experiences as a Therapeutic Process. Psychological Science, 8(3), 162–166.
  4. Maples, B., Cerit, M., Vishwanath, A., & Pea, R. (2024). Loneliness and Suicide Mitigation for Students Using GPT3-Enabled Chatbots. npj Mental Health Research.

If we ever cite something you can't verify, tell us at hello@callbyrd.com.