The case for voice-first AI companions: research and design trade-offs
By Cody, Founder of CallByrd · May 18, 2026 · 7 min read
Updated June 8, 2026
Grounded in the research cited below. Clinical review by a licensed practitioner is being added.
Most AI companion products built between 2020 and 2025 share a common architecture: a chat thread inside an app. Users open the app, see a scrollable history of past messages, type or tap to interact, and receive a typed reply. A small but growing category of AI products departs from this pattern by using the phone call as the interaction surface instead. Understanding why this design choice exists, and what it implies, requires looking at the underlying communication research and at the structural differences between the two formats.
What does the research say about communication mediums?
The empirical comparison of voice and text channels has produced consistent findings across roughly a decade of social psychology research. The central work comes from Nicholas Epley's research group at the University of Chicago Booth School of Business.
In a 2017 study published in Psychological Science, Schroeder, Kardas, and Epley demonstrated that listeners rate the same substantive arguments as more thoughtful, more rational, and as coming from a more capable mind when those arguments are delivered by voice rather than text. The effect held across both audio recordings and live calls. The authors titled the paper The Humanizing Voice.
A 2021 follow-up by Kumar and Epley compared the predicted and actual experience of phone calls relative to text exchanges. Participants predicted that a text exchange would leave them feeling about as connected as a call, and that the call would feel more awkward. The actual results showed the opposite: phone calls produced significantly stronger feelings of connection, with no awkwardness penalty. The paper's title summarizes the finding: It's Surprisingly Nice to Hear You.
The combined pattern is that voice produces a qualitatively different perception of, and connection with, a speaker than text does, even when the words are identical and the participants do not predict the difference in advance.
How engagement-based design affects user experience
Most consumer apps, including most AI companion products, are designed around engagement metrics: session length, daily active users, return frequency, retention curves. These metrics are not neutral. Tristan Harris and a now-substantial body of literature on attention-economy business models have documented that design patterns optimized for engagement (infinite scroll, push notifications, streaks, badges, intermittent reinforcement) are structurally in tension with user wellbeing.
For an AI companion product specifically, engagement design creates an incentive to extend conversation length and return frequency beyond what is useful for the user. The business model rewards sessions that do not end. The user outcome that matches good practice — using the product when wanted, putting it down when finished — runs counter to the business model.
Phone calls are structurally bounded. A call has a beginning, a middle, and an end. The format does not afford infinite scroll, push notifications, or streaks. When the call ends, the relationship pauses. This is a property of the medium, not a deliberate restraint applied to a chat-based product.
The role of the screen in AI companion criticism
Much of the public discussion of AI companion products focuses on concerns about dependence, parasocial attachment, and constant accessibility. On closer examination, a substantial share of these critiques target properties of screens and apps, not of AI per se. Avatar customization, scrollable history, push notifications, and high-frequency micro-interactions are design choices made at the product level, not necessary features of an AI companion.
When the conversation occurs over a phone call rather than inside an app, several of these patterns become structurally unavailable. There is no avatar to look at. There is no thread to scroll. There is no notification designed to pull the user back in mid-evening. The conversation occurs, ends, and does not follow the user out of the call.
What voice-first formats trade off
Voice-first AI is not strictly superior to text-first AI. The two formats afford different things, and the appropriate choice depends on the user's purpose:
- No searchable artifact. A phone call is not a scrollable thread. Users who want to reference what was said earlier, copy responses, or share the conversation with someone else are better served by text-based products.
- Less parallel processing. A user on a call is mostly on the call. For interactions where concurrent activity is wanted (responding while doing something else on a screen), chat is more practical.
- No fine-grained editing. Spoken messages are committed in real time. For tasks requiring careful composition (writing, code, plans), text is the appropriate medium.
- Hands-free and screen-free. In exchange, voice-first AI is usable while driving, walking, or otherwise occupied — including by users for whom an app interface is a barrier (older users, users without smartphones).
How voice-first AI fits the use case
For users whose purpose is conversation itself — processing a feeling, thinking out loud, ordinary catching up — the affordances of a phone call match the use case. The conversation is the point, not an artifact about the conversation. The boundary at the end of the call matches the natural shape of an episode of human contact.
Voice-first AI conversation tools — including CallByrd, a phone-based AI designed for unstructured conversation — fit this category. They are not appropriate for tasks where the artifact is the point, and they are not substitutes for human relationships or for professional care.
The bottom line
The choice between voice-first and text-first AI companion design is not a matter of one being inherently better. It is a matter of fit to use case. Communication research supports voice as the appropriate medium for conversation-as-goal; app-based chat is the appropriate medium for artifact-producing tasks. The growing population of users who treat AI as company rather than as a search tool fits the first use case, which is what voice-first products are built to serve.
Common questions
- What is a voice-first AI companion?
- A voice-first AI companion is a conversational AI product whose primary or sole interface is the spoken phone call rather than text-based chat. Examples include CallByrd, which is reached by dialing a phone number rather than opening an app. The category is distinct from text-based AI companions like Replika or Character.AI, which present chat as the primary interaction.
- Why does the medium of an AI companion matter?
- Research on communication channels demonstrates that the medium shapes both what is transmitted and how the relationship is perceived. Schroeder, Kardas, and Epley (2017) found that the same content is rated as more thoughtful, more rational, and as coming from a more capable mind when delivered by voice rather than text. Kumar and Epley (2021) found phone calls produce stronger feelings of connection than text exchanges. The choice of medium is not neutral.
- Aren't AI companion apps designed to maximize engagement?
- Most consumer apps — including most chat-based AI companions — are designed around engagement metrics such as session length and daily active users. This creates a structural tension between app design incentives and user wellbeing. Phone calls, by their nature, end. There is no scrolling, no infinite session, no notification designed to pull the user back in. Voice-first AI products that adopt the phone-call format inherit this boundary by default.
- What are the trade-offs of using a phone call instead of an app?
- Phone calls lack the artifact properties of text: no searchable history, no scrollable thread, no easy multitasking. For users who want to reference what was said earlier or interact in parallel with other activities, text is more practical. For users for whom the conversation itself is the point, the absence of a screen and a thread is the feature, not a limitation.
- Is a voice-first AI companion appropriate for everyone?
- No single AI product is appropriate for all use cases or all users. Voice-first AI suits conversation-as-conversation; text-first AI suits research and artifact-producing tasks. Neither is a substitute for human relationships or for professional care. Anyone experiencing a mental health crisis should contact 988 (U.S. Suicide & Crisis Lifeline). Anyone with persistent symptoms should consult a licensed clinician.
Sources
- Schroeder, J., Kardas, M., & Epley, N. (2017). The Humanizing Voice: Speech Reveals, and Text Conceals, a More Thoughtful Mind in the Midst of Disagreement. Psychological Science, 28(12), 1745–1762.
- Kumar, A., & Epley, N. (2021). It's Surprisingly Nice to Hear You: Misunderstanding the Impact of Communication Media Can Lead to Suboptimal Choices of How to Connect with Others. Journal of Experimental Psychology: General, 150(3), 595–607.
- Maples, B., Cerit, M., Vishwanath, A., & Pea, R. (2024). Loneliness and Suicide Mitigation for Students Using GPT3-Enabled Chatbots. npj Mental Health Research.
If we ever cite something you can't verify, tell us at hello@callbyrd.com.