AI Voice Cloning in Podcasts: Where the Line Is and Where Producers Are Crossing It


The technical question of whether AI voice cloning works in podcasts was answered some time ago. It works. Given a short reference sample, modern systems can produce a voice that’s nearly indistinguishable from the original speaker’s. The interesting question now is not technical but editorial: when should you use it, when shouldn’t you, and what do you have to disclose?

I’ve been talking to producers and hosts across the Australian podcast industry about how they’re working through this. The consensus is that there isn’t yet a consensus, and that’s actually the most useful thing to write down.

Where AI voice is being used legitimately

A few applications have become broadly accepted across the industry.

Fixing flubs and minor re-records. A host fluffs a word, gets a fact wrong, or stumbles on a name. Rather than scheduling a re-record session, the producer uses a clone of the host’s voice, trained on the show’s existing recordings, to insert a corrected phrase. The host knows about it, has approved the practice in general terms, and listens to the final product. This is now standard practice at many independent productions and at some networks.

Multi-language editions. Producing a Spanish or Mandarin version of an English-language show used to require either dubbing by a native speaker (changing the voice cast) or full re-recording by the host in the new language (rarely feasible). Voice cloning across languages is now good enough that the host can sound like themselves in languages they don’t speak. The ethical issues are interesting but the application is genuinely valuable for accessibility.

Continuity in episodic content. Long-running shows whose hosts have left, retired, or, in unfortunate cases, died have used voice cloning sparingly to maintain narrative continuity. This is more common in true crime and narrative podcasts than in conversational ones. The estate of the original voice owner is usually involved and the use is typically disclosed.

Audiobook and narrative production. Audiobooks read by AI clones of the author have become commercially significant. The author consents, gets paid, and the production cost drops dramatically. This is changing publishing economics in ways that haven’t fully played out.

Where the line gets fuzzy

Several uses are common enough to be talked about openly but uncomfortable enough that the industry hasn’t settled on norms.

Substantial generative content. Some shows produce significant amounts of content using a clone of the host’s voice, so the host never actually spoke the words. In some cases the host has approved the script in advance; in others the host hasn’t even seen the script and has only approved the practice in general. This crosses from “production tool” into something closer to ghostwriting performed by an AI, and listeners don’t know.

Promoted episodes that include host-voiced ad reads the host didn’t record. This is widespread and not transparent. The host has agreed to host-voiced advertising in general, but the specific ad copy was written and voiced without the host’s involvement. The advertiser pays for what sounds like a host endorsement. The host endorses the practice in principle. The listener gets the impression of a personal endorsement that didn’t quite happen.

“Bonus content” generated from the existing show corpus. Some platforms are experimenting with AI-generated supplementary content that uses cloned host voices to discuss material outside the original episodes. The audience doesn’t know whether they’re hearing real opinions or generated ones.

Where it’s not okay

A few uses are broadly agreed to be wrong, even if they’re occurring.

Voice cloning a host or guest without their knowledge or consent. This shouldn’t need saying. It happens anyway, particularly in lower-tier production environments where rights and clearances are informal.

Generating content that the original speaker would object to. Even with general consent to clone the voice for production purposes, putting words in their mouth that contradict their views or place them in compromising positions is a betrayal of the agreement.

Using a deceased speaker’s voice without estate consent. The legal and ethical framework around this is still developing, but the principle that the estate must consent is widely accepted in the industry.

Misrepresenting AI-generated content as authentic recording in journalism or documentary contexts. Listeners reasonably trust that the voice they’re hearing is the voice that was actually recorded saying those words. Documentary and journalism depend on that trust more than any other format, and violating it is a serious breach.

The disclosure question

The Australian podcast industry has been slow to develop disclosure norms around AI voice use. The Australian Communications and Media Authority regulates broadcasting, but podcasting falls into a gap. Industry bodies like the Australian Podcast Awards have begun touching on the topic but haven’t published binding standards.

The current state is that some producers disclose AI voice use in show notes or in the episode itself; many don’t. Disclosure is more common in narrative and documentary podcasts than in conversational ones. There’s no consensus on what triggers a need for disclosure: a corrected sentence? Substantial generative content? Ad reads?

A reasonable position, which a number of industry voices have argued for, is that any material use of AI-generated voice in a production should be disclosed in show notes, and any material use that’s not obvious from context should be disclosed in the episode itself. The challenge is what counts as “material.”
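Written down as a decision rule, that position looks something like the sketch below. This is only an illustration: the names `VoiceUse`, `Disclosure`, and `required_disclosure` are hypothetical, and the `material` flag is deliberately left as a human editorial judgment, since deciding what counts as material is exactly the part the industry hasn’t settled.

```python
from dataclasses import dataclass
from enum import Enum


class Disclosure(Enum):
    NONE = "none"
    SHOW_NOTES = "show notes"
    SHOW_NOTES_AND_EPISODE = "show notes and in-episode"


@dataclass(frozen=True)
class VoiceUse:
    """One use of AI-generated voice in an episode (hypothetical structure)."""
    description: str
    material: bool              # editorial judgment; the rule itself does not define it
    obvious_from_context: bool  # would a listener already realise the voice is synthetic?


def required_disclosure(use: VoiceUse) -> Disclosure:
    """Apply the rule: material uses go in show notes, and material uses
    that are not obvious from context are also disclosed in the episode."""
    if not use.material:
        return Disclosure.NONE
    if use.obvious_from_context:
        return Disclosure.SHOW_NOTES
    return Disclosure.SHOW_NOTES_AND_EPISODE


# Example: the flag values here are assumptions a producer would have to make.
ad_read = VoiceUse("cloned host ad read", material=True, obvious_from_context=False)
print(required_disclosure(ad_read).value)  # show notes and in-episode
```

The code is trivial on purpose. All of the difficulty sits in who sets `material` and on what basis, which is an editorial policy question rather than a technical one.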

What producers should be thinking about

A few practical points for anyone making podcasts in this environment.

Explicit consent agreements with hosts and guests. Cover voice cloning specifically, including what it can and can’t be used for. General consent to be recorded doesn’t extend to voice cloning by default and shouldn’t be treated as if it does.

Documentation of use. Keep records of when AI voice has been used and for what purpose, even if disclosure to the audience is limited. The norms will tighten over time and being able to demonstrate considered practice will matter.

Clear internal editorial policies. Producers, editors, and ad operations all need to know what’s allowed and what isn’t. The interesting failures tend to happen at the seam between editorial and commercial sides of a show, where assumptions diverge.

Tooling discipline. Track what voice models exist, where they’re stored, and who has access. A cloned voice model of a recognisable host is a sensitive asset and should be treated as such. There have been quiet incidents of voice models being mishandled or used outside their intended scope.
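For the documentation and tooling points, even a very small registry covers most of what matters: which models exist, where they live, who may use them, what the consent agreement allows, and a log of every use. The sketch below assumes Python 3.9 or later, and every name in it (`VoiceModel`, `UsageRecord`, `VoiceRegistry`, the field names, the example purposes) is hypothetical rather than taken from any real tool. A spreadsheet with the same columns would do the same job.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class VoiceModel:
    """A cloned voice treated as a tracked asset (hypothetical fields)."""
    model_id: str
    speaker: str
    storage_location: str
    permitted_uses: set[str]    # purposes the speaker has consented to
    authorised_users: set[str]  # people allowed to generate audio with it


@dataclass
class UsageRecord:
    """One logged use of a voice model."""
    model_id: str
    episode: str
    purpose: str
    used_by: str
    used_on: date
    disclosed_to_audience: bool


@dataclass
class VoiceRegistry:
    models: dict[str, VoiceModel] = field(default_factory=dict)
    log: list[UsageRecord] = field(default_factory=list)

    def register(self, model: VoiceModel) -> None:
        self.models[model.model_id] = model

    def record_use(self, record: UsageRecord) -> None:
        """Log a use, refusing anything outside the agreed scope."""
        model = self.models.get(record.model_id)
        if model is None:
            raise KeyError(f"unknown voice model: {record.model_id}")
        if record.used_by not in model.authorised_users:
            raise PermissionError(f"{record.used_by} is not authorised for {model.model_id}")
        if record.purpose not in model.permitted_uses:
            raise PermissionError(f"'{record.purpose}' is outside the consented uses")
        self.log.append(record)


# Example with made-up data.
registry = VoiceRegistry()
registry.register(VoiceModel(
    model_id="host-a-v1",
    speaker="Host A",
    storage_location="encrypted project drive",
    permitted_uses={"flub correction"},
    authorised_users={"lead producer"},
))
registry.record_use(UsageRecord(
    model_id="host-a-v1",
    episode="Episode 42",
    purpose="flub correction",
    used_by="lead producer",
    used_on=date(2025, 1, 15),
    disclosed_to_audience=False,
))
```

Checking scope at the moment of logging is one design choice among several. It means an ad operations request to use the same model for an ad read fails loudly instead of quietly going through, which is the editorial and commercial seam where the failures tend to happen.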

Several podcast production companies have brought in outside expertise to set up appropriate workflows around AI tools. The right partner combines understanding of audio production with serious AI engineering — a few Australian consultancies do this kind of work, including the Team400 AI agency and a couple of others focused on creative production. The pattern that works is the same as it is everywhere: scope the engagement around a workflow outcome, not a technology purchase.

The longer view

Voice cloning is going to keep getting easier and better. The asymmetry between production capability and editorial discipline will widen rather than narrow over the next few years. The industry will have to develop norms that protect both creators and listeners. The alternative is a scandal that forces external regulation on the industry faster, and less thoughtfully, than it could regulate itself.

For now, the people I most respect in podcast production are using AI voice carefully, disclosing where it’s material, and treating the tools as serious responsibilities rather than just productivity wins. That’s a reasonable model. It’s also, importantly, sustainable. The producers who treat AI voice as a free productivity boost without thinking about the editorial implications will eventually be the source of the incident that ratchets the whole industry’s freedom of action down a notch.