DeepZen has released for purchase the first AI-narrated audiobook.

From Short and Sweet AI, I’m Dr. Peper, and today I’m talking about AI audiobooks.

In a previous flash talk, I discussed how we’re entering a voice-first future. With smart assistants leading the way, we will request and consume information by speaking rather than typing or reading from a screen. We will type less on our laptops and smartphones and communicate more with voice. And as a result, people will consume more audiobooks.

Text to Speech

There are about one million books published each year in the US. Despite this, only 40,000 books are recorded as audiobooks because of the cost. Recording an audiobook is time-consuming and can cost up to $5,000 per book. Not surprisingly, then, companies have focused on perfecting AI that turns text into speech through deep learning-based systems. And there’s a whole history of machine learning breakthroughs over the last few years that has led to progressive improvement in the natural language processing algorithms. One of the biggest hurdles has been that AI-generated voices sound flat and without emotion, in an almost comical way. Remember the Ben Bernanke video on YouTube about the financial crisis? Well, all that’s changed.


DeepZen, a London-based AI company, released examples of its latest AI text-to-speech technology, and they sound really good. The DeepZen team trained their algorithms on thousands of hours of narrator speech. As a result, the algorithm produces human-sounding, highly emotive audio recordings using text from a book. Judge for yourself. Here’s a snippet of the audiobook, The Metamorphosis by Franz Kafka, generated by DeepZen’s text-to-speech technology.

Isn’t that fantastic? This is an audio recording generated by a machine from the text of a book. Because of this AI technology, it’ll be easy and cost-effective to make an audio recording of any book out there, eventually in all different languages.

Emotion AI

DeepZen, and other companies like it, are at work on translating human emotion through machine learning or deep learning for things besides recording audiobooks. This is the field of emotion AI, which allows machines to determine a person’s mood by the sound of their voice and will create more human-like interactions between machines and man. We can talk about that in the next Short and Sweet AI. I’m Dr. Peper.
