Piper is our new voice for the Open Home

Welcome to the May edition of the Open Home newsletter, the place to learn about the latest and greatest things for your smart home that improve its privacy, choice, and durability.

May was a big voice month for the Open Home. Home Assistant held its Year of the Voice - Chapter 2 event and launched all the building blocks people need to start building their own voice assistants. And oh, did that happen!

Note: Home Assistant does not currently support wake words (i.e., "Hey Google"). We are working on hardware to make this possible.

Home Assistant 2023.5 includes a super fast and high quality voice assistant speaking over 130 languages and dialects. It is available to people who support the development of Home Assistant by subscribing to Home Assistant Cloud.

You can also go fully local, and set up a voice assistant powered by OpenAI's Whisper (Speech-to-Text) and our own, brand-new Text-to-Speech system called Piper.

Screenshot of the Home Assistant interface to manage voice assistants.
In Home Assistant you can configure as many voice assistants as you want.

In today's newsletter we're going to focus on Piper, because it's new technology that we built and it has already had a lot of impact since its introduction.

If you want to learn more about all the different new voice features and how to start using them, including how to get the World's Most Private Voice Assistant, check out our other coverage:

Piper is high quality Text-to-Speech for everyone

Piper is a new Text-to-Speech system that is developed in-house at Nabu Casa by Mike Hansen, PhD. What makes Piper unique is that it's a neural network that is optimized to perform well on a Raspberry Pi 4. It takes 1 second to generate 1.6 seconds of speech audio at a medium quality level. This means that we can generate audio faster than the time it takes to play it.
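One way to read the speed figure above is as a real-time factor (RTF): the time spent synthesizing divided by the length of the audio produced. The Python sketch below simply does that arithmetic with the numbers quoted in this post. The function name is our own, and the 10-second clip is a hypothetical example, not a measured result.

```python
def real_time_factor(synthesis_seconds: float, audio_seconds: float) -> float:
    """Compute time divided by audio length; below 1.0 means faster than real time."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return synthesis_seconds / audio_seconds


# Figures quoted above: 1 s of compute produces 1.6 s of medium-quality audio on a Pi 4.
rtf = real_time_factor(1.0, 1.6)
print(f"RTF = {rtf:.3f}")                    # RTF = 0.625
print("faster than real time:", rtf < 1.0)   # faster than real time: True

# Hypothetical 10-second announcement: estimated synthesis time at the same rate.
print(f"{rtf * 10.0:.2f} s to synthesize 10 s of audio")  # 6.25 s to synthesize 10 s of audio
```

An RTF of 0.625 means audio comes out about 1.6 times faster than it plays back, which is why playback never has to wait on the engine.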

Piper achieves this speed without sacrificing quality:

Piper sample: Ryan, medium quality, English (US)

(click for more Piper samples including other languages)

Piper is trained on open datasets and currently supports 19 languages, all of which generate speech audio 100% locally. Home Assistant users can install Piper as an add-on with a single click. Piper is not limited to running inside Home Assistant; it can also be run as a Docker container.

The Piper logo.

The impact of Piper

Last month I wrote about our goals for voice in the Open Home. These goals include that we want to create an active voice community and build voice technology that is easy to extend and adapt. We've been working on this the whole year, but with the release of Piper we're starting to see real world impact.

Home Assistant is for everyone and Home Assistant is by everyone. Home Assistant is not just an application, it's a community, it's the Open Home, it's a movement. Launching Piper gave us a building block for the Open Home. The next step is to work with our community on improving it and building on top of it.

To create a Piper voice, you don't need an army of volunteers. You need a native speaker of a language to create a dataset, which is then used to train a Piper model that the native speaker verifies. To help interested people contribute languages, we have set up the email address voice@nabucasa.com.

Are you missing a language? Email us if you want to help out!

One of the first to approach us was the Language and Voice Lab at Reykjavík University. They took existing Icelandic datasets, trained voices on them, and contributed 4 Icelandic voices to Piper 🇮🇸

Piper sample: Ugla, medium quality, Icelandic

They have not been the only ones: our inbox is full of people who want to help out. Here are a few of the messages we received 🥰

Me and my boyfriend are using Home Assistant at home. We noticed that Hungarian is missing from piper. I'm interested in helping to create the Hungarian version of it.
Hi, I’m super excited about this project. I’ve been using HA and Nabu for about a year now, still relatively new, but keen to help integrate southern Scottish English into the library because we’re notoriously hard for the standard models to interpret!
Hi there, what is done in homeassistant till now is outstanding . Sometimes when I read the new features I'm speechless. I read about the year of the voice article and interested in contributing with my arabic native language If still there is a room for arabic contributers.

We're working with these people to help them contribute their languages to Piper. Since launch, people have already contributed new voices for Brazilian Portuguese, Spanish and English 🚀

The Creator of Piper

Mike Hansen, PhD is the mastermind behind Piper and the Rhasspy open-source project. From his house in Iowa, in the middle of the United States, he works at Nabu Casa and leads the Year of the Voice at Home Assistant.

Self-portrait by Mike Hansen

Mike started Rhasspy in 2018 as a hobby side project to make open source voice assistant technology easier to use. Five years later, it is his full-time job and is having real-world impact.

One of the things that sets his work apart is that he has always put a lot of effort into making the technology work for underrepresented languages. Instead of focusing only on English or German, Mike spends time working on languages like Kazakh, Nepali, Vietnamese and Ukrainian. The GPUs in his basement are constantly busy training models to improve language support.

With Home Assistant we want to make sure anyone in the world can enjoy a smart home that focuses on privacy and local control. With Mike's leadership, we are making sure a voice assistant is part of that.

The Kazakh voice for Piper

Although Piper was created with the smart home in mind, it is a generic Text-to-Speech system. It can benefit people in many situations.

For the Kazakh language, Mike worked together with the Institute of Smart Systems and Artificial Intelligence from the Nazarbayev University in Kazakhstan. They have been conducting research into providing image captioning for the visually impaired and blind for low-resource languages (languages with few publicly available datasets).

Piper is optimized for low-powered devices like the Raspberry Pi 4 and generates audio faster than real time. This allowed the researchers to build a system that demonstrates real-time operation with low latency.

They wrote up their findings and a pre-print version of their academic paper is available here.

A table from their academic paper showing the time it takes for each step of the process, including Piper for Text-to-Speech.
Piper's speed allows for real-time operation (source)

Piper for screen readers

Musharraf Omer has made a Piper add-on for the open source screen reader software NVDA. This will allow any user of NVDA to leverage all of the Piper voices to navigate the content on their screen.

Because Piper is optimized for low-powered devices, it is fast, and for a screen reader speed is essential: with a slower Text-to-Speech engine, every element on your screen takes longer to be read to the user.
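To make the screen-reader argument concrete, here is a rough Python sketch that estimates how long a user waits before each element starts being spoken. It assumes the engine synthesizes a whole clip before playback begins (real screen readers may stream audio in smaller chunks), reuses the 1 s per 1.6 s ratio quoted earlier for Piper, and compares it with a hypothetical engine that needs 2 seconds of compute per second of audio. The element names and durations are made up for illustration.

```python
def speech_delay(audio_seconds: float, rtf: float) -> float:
    """Seconds of silence before playback, assuming the whole clip is synthesized first."""
    return audio_seconds * rtf


# Hypothetical screen elements and the length of their spoken audio, in seconds.
elements = {"button label": 0.8, "menu item": 1.2, "paragraph": 6.0}

piper_rtf = 1.0 / 1.6  # ratio quoted earlier for Piper on a Raspberry Pi 4
slow_rtf = 2.0         # hypothetical engine that is twice as slow as real time

for name, seconds in elements.items():
    fast = speech_delay(seconds, piper_rtf)
    slow = speech_delay(seconds, slow_rtf)
    print(f"{name}: waits {fast:.2f} s vs {slow:.2f} s")
```

Because the delay scales with the length of each element, the gap between a fast and a slow engine grows with every item a screen reader user moves through.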

Musharraf felt that more speed was possible and he has started porting Piper to Rust. This should make it even faster and make it easier to port to other platforms in the future.

Support our work

The Nabu Casa logo.

Ten years ago I started Home Assistant. Five years later we started Nabu Casa to make the development of Home Assistant, the 2nd most active project on GitHub, sustainable. We have never raised any money and our work is fully funded by people like you, who like what we do and want to support our mission of building the Open Home.

If you want to help out too, subscribe to Home Assistant Cloud by Nabu Casa.

Your subscription funds the development of Home Assistant, ESPHome, Year of the Voice, Python Matter Server, Z-Wave JS, Zigpy (Zigbee) and many other projects.

Subscribers also get access to end-to-end encrypted remote access and high quality voices for 130+ different languages and dialects for the Home Assistant Voice Assistant.

Shelly Button is the first product to launch with BTHome

Alright, shifting gears a little, let's talk Bluetooth. Last year in September, on the 9th anniversary of Home Assistant, we released BTHome. BTHome is an open standard for broadcasting sensor data and button presses over Bluetooth LE. It allows devices to be discovered and integrated in smart home platforms, like Home Assistant, out of the box.
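To show what "broadcasting sensor data" looks like on the wire, here is a minimal Python sketch of decoding an unencrypted BTHome v2 advertisement payload (the service data carried under the BTHome service UUID). It assumes the v2 layout as we understand it: a device-information byte (bit 0 is the encryption flag, bits 5-7 the version) followed by object-ID/value pairs in little-endian. Only a handful of object IDs are included, and their IDs and scaling are our reading of the BTHome object table; check the official BTHome specification before relying on them.

```python
import struct

# Assumed subset of the BTHome v2 object table: id -> (name, struct format, divisor).
OBJECTS = {
    0x00: ("packet_id", "<B", 1),
    0x01: ("battery", "<B", 1),        # percent
    0x02: ("temperature", "<h", 100),  # signed, 0.01 °C per step
    0x03: ("humidity", "<H", 100),     # 0.01 % per step
    0x3A: ("button", "<B", 1),         # button event code
}


def decode_bthome_v2(service_data: bytes) -> dict:
    """Decode an unencrypted BTHome v2 payload into {name: value}."""
    if not service_data:
        raise ValueError("empty payload")
    device_info = service_data[0]
    version = device_info >> 5             # bits 5-7: BTHome version
    encrypted = bool(device_info & 0x01)   # bit 0: encryption flag
    if version != 2:
        raise ValueError(f"unsupported BTHome version {version}")
    if encrypted:
        raise ValueError("encrypted payloads are not handled in this sketch")

    result = {}
    i = 1
    while i < len(service_data):
        object_id = service_data[i]
        if object_id not in OBJECTS:
            # Each object has a fixed size, so an unknown ID means we cannot find the next one.
            raise ValueError(f"unknown object id 0x{object_id:02x}")
        name, fmt, divisor = OBJECTS[object_id]
        (raw,) = struct.unpack_from(fmt, service_data, i + 1)
        result[name] = raw if divisor == 1 else raw / divisor
        i += 1 + struct.calcsize(fmt)
    return result


# 0x40 = version 2, unencrypted; 0x02 CA 09 = temperature 0x09CA (2506 -> 25.06 °C);
# 0x01 64 = battery 100 %.
print(decode_bthome_v2(bytes([0x40, 0x02, 0xCA, 0x09, 0x01, 0x64])))
```

Because every object is self-describing, a receiver like Home Assistant can add a new device as soon as it hears its first broadcast, without a vendor-specific integration.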

Shelly BLU Button1 works out of the box with Home Assistant

Which brings us to the new Shelly BLU Button1, the first mass-market product that runs BTHome out of the box. Press the button and it will be discovered by Home Assistant. Configure it and subsequent button presses fire an event that can be the trigger for any automation!
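The "press fires an event, the event triggers an automation" flow can be sketched in a few lines of Python. This is a hypothetical illustration, not Home Assistant's automation engine or API: the class and handler names are our own, and the event codes are our reading of the BTHome button object (0x01 press, 0x02 double press, and so on), so verify them against the BTHome specification.

```python
from typing import Callable, Dict, List

# Assumed BTHome button event codes (subset).
BUTTON_EVENTS = {
    0x00: "none",
    0x01: "press",
    0x02: "double_press",
    0x03: "triple_press",
    0x04: "long_press",
}

Handler = Callable[[str], str]


class ButtonAutomations:
    """Hypothetical dispatcher: one button, different actions per event type."""

    def __init__(self) -> None:
        self._handlers: Dict[str, List[Handler]] = {}

    def on(self, event: str, handler: Handler) -> None:
        # Register an action to run whenever this event type arrives.
        self._handlers.setdefault(event, []).append(handler)

    def handle(self, code: int) -> List[str]:
        event = BUTTON_EVENTS.get(code, "unknown")
        # Events without a registered handler (e.g. "none") return an empty list.
        return [handler(event) for handler in self._handlers.get(event, [])]


automations = ButtonAutomations()
automations.on("press", lambda event: "toggle hallway light")
automations.on("double_press", lambda event: "turn off all lights")

print(automations.handle(0x01))  # ['toggle hallway light']
print(automations.handle(0x02))  # ['turn off all lights']
print(automations.handle(0x04))  # []
```

Keying actions by event type is what lets a single physical button drive several automations.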

Extra shout-out: Shelly has registered the BTHome service with the Bluetooth SIG to make the standard official and has donated the registration back to the BTHome project. A license is available on the website that allows any manufacturer to make products using this standard. Great way of giving back 👏

Shelly BLU Button 1

Screenshot of a discovered BTHome device in Home Assistant.
Press the button and Home Assistant will automatically discover the Shelly BLU Button1 (Bluetooth integration required)

Community highlights

Dr. Zzs joins the voice party and shows how to add custom sentences to your Home Assistant voice assistant and try out alternative hardware.

Christos created an ESP32 project that recognizes faces and shares the result with Home Assistant via BTHome.

Technithusiast controls his home via Telegram and ChatGPT.
If you don't use zones much, check out all the different ways James uses them.

In other news

When everything can become a voice assistant, everything will become a voice assistant. (Pierre, GitHub)