The state of smart speaker voice search in 2018

In the first part of this series, we looked closely at voice search on mobile. Once the dominant type of voice search, it has ceded some ground in the past couple of years to smart speaker voice search, thanks to the huge and growing consumer popularity of devices like the Amazon Echo and the Google Home. But because of the omnipresence and convenience of smartphones, mobile voice search is still the most widespread type of voice search in 2018, and has a certain critical mass on its side.

How does smart speaker voice search measure up by comparison? Despite its growth, is it really worth optimising for in 2018 – and how much do optimisation techniques differ from mobile voice search? Let’s dive in.

The state of voice on smart speakers

Smart speakers have been the hottest new thing in voice-controlled technology ever since the Amazon Echo’s meteoric rise to popularity in 2016, quickly followed by Google developing a rival device, the Google Home.

Now, the smart speaker market is a veritable battlefield of the tech giants, with Apple, Baidu, Facebook, Samsung and Microsoft all in the process of either launching or developing their own smart speakers with voice-controlled assistants.

At the same time, Amazon and Google have been busy launching more affordable versions of their respective devices, the Echo Dot and the Google Home Mini, taking them from a high-end tech accessory to a device that more or less anyone can own. Combine this with the amount of choice shortly to be available for consumers, and it seems likely that marketers will have a massive potential audience on their hands very soon.

According to eMarketer, in 2018, more than 18% of the U.S. population and 21.9% of internet users worldwide will use a smart speaker at least once a month. But how many of these people are using it for voice search?

On the Amazon Echo, the main means of carrying out tasks, from checking the weather to listening to the radio or news headlines, is through Alexa Skills. Google Home has an equivalent, Actions on Google, which claims to allow Google Home owners to accomplish more than a million different things with their smart speaker device.


Anyone can build and submit a Skill or an Action for use on Amazon or Google’s smart speakers, presenting a clear opportunity for brands to have a presence on these devices. But as far as voice search goes, these uses of a smart speaker fall under the umbrella of “voice commands” rather than “voice search”. There’s no search engine involved in activating an Alexa Skill or an Action on Google.

So how much of a voice search opportunity is there to optimise for, now and in the near future?

What volume of voice search is being carried out on smart speakers?

The good news is that early research indicates smart speaker owners use voice on their devices much more frequently than smartphone owners – unsurprising really, because voice is the sole means of controlling a smart speaker, whereas there are many other ways to use a smartphone.

Research by voicebot.ai and Verto Analytics found that smart speaker owners used their devices on average 2.79 times per day in 2017, compared with smartphone owners who interacted with their voice assistants only 0.33 times per day on average.

Further research by voicebot.ai estimates that 47.3 million people, or nearly one in five adults, in the United States currently own a smart speaker. (If you’re wondering about the equivalent figure for the UK, research by YouGov recently found that one in ten Brits owns a smart speaker, which works out to about 6.6 million people.)

With an average of 2.79 smart speaker uses per day, that equals close to 132 million daily interactions with a smart speaker voice assistant, if we use the US figures. If we then imagine – as we did for mobile voice search – that about 20-25% of those interactions are actual voice searches for local businesses, information, or other things brands can rank for, we end up with a rough ballpark figure of 26.4 million smart speaker voice searches per day at the conservative (20%) end of that range, or about 33 million at the upper (25%) end.
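For anyone who wants to rerun the estimate with their own assumptions, here is a minimal sketch of the arithmetic in Python. The owner and usage figures are the ones quoted above; the 20% and 25% search shares are this article's guesses rather than measured values, and the function name is purely illustrative.

```python
# Figures quoted in the article (US only).
US_SMART_SPEAKER_OWNERS = 47.3e6   # voicebot.ai estimate
USES_PER_DAY = 2.79                # voicebot.ai / Verto Analytics average

# Total daily interactions with a smart speaker voice assistant.
daily_interactions = US_SMART_SPEAKER_OWNERS * USES_PER_DAY


def estimated_daily_searches(search_share: float) -> float:
    """Scale daily interactions by the assumed share that are true voice searches."""
    return daily_interactions * search_share


# The 20-25% share is an assumption, not a measured figure.
low_estimate = estimated_daily_searches(0.20)
high_estimate = estimated_daily_searches(0.25)

print(f"Daily interactions: {daily_interactions / 1e6:.1f} million")
print(f"Voice searches per day: {low_estimate / 1e6:.1f} to {high_estimate / 1e6:.1f} million")
```

Changing the search share is the quickest way to see how sensitive the ballpark figure is to that one assumption.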

This is far less than our (fairly conservative) estimate of the number of mobile voice searches being carried out (which we estimated in part one of this series). However, the figures show that this is an audience that is engaged with its devices (using them multiple times per day) and is still growing – whereas smartphone sales have been shown to be tapering off as the market for smartphones becomes saturated.

Meeker’s Internet Trends report in 2017 highlighted the slowing growth of both smartphone shipments and the smartphone installed base worldwide.

Figures from Statista forecast that unit sales for smart speakers will grow through 2019, tapering off in 2020 and 2021, but still with more than 30 million units shipped each year. Taken together, the figures predict nearly 150 million additional smart speakers being shipped from 2018 to 2021 in the US alone.

The growth of smart speakers, as purely voice-controlled devices, can also be linked directly to the growth of voice in a way that smartphones can’t, and their introduction to the market may be getting consumers used to using voice controls in a way that will transfer across to other devices like smartphones and wearables.

Who is using voice search on smart speakers, and what for?

With that said, is it worth optimising specifically for a userbase who are carrying out voice search on their smart speakers? The growth of smart speakers is typically cited by industry figures as an additional reason to optimise for voice, but is this the same audience, and should you cater to it in the same way?

Research by CapTech Consulting has provided some insight into the demographics of smart speaker owners, as well as what they typically use their devices for. It comes as little surprise that the majority of smart speaker owners are aged between 18 and 36, with this demographic making up 53% of smart speaker owners. A further 32% are aged 37 to 52, leaving only 15% of smart speaker owners who are aged 53 and over.

More interesting is the fact that, according to CapTech’s research, nearly three-quarters (73.5%) of smart speaker owners are married, and 77% own their own home. Fifty-eight percent of smart speaker owners are also well-off, earning more than $75,000 per year (which tallies with the home ownership).

This makes the profile of a “typical” smart speaker owner a young, married, home-owning couple with relatively high income, possibly having just started or about to start a family.

If you’re wondering about equivalent statistics for smartphone users, we have surprisingly little information, but research by Google in 2014 found that teenagers were more likely than adults to use mobile voice search more than once per day: 55% of teens were found to use voice search multiple times per day, versus 41% of adults.

Thus, it’s safe to assume in both instances that if you optimise for voice search, you’ll be dealing with a younger audience. However, the scenarios in which smart speaker owners use search on their devices are likely to be quite different.

In the last part of this series, we discovered that people were most likely to use voice search on their smartphones while driving, with close to 60% of respondents to a survey by Stone Temple Consulting reporting that they are most likely to use voice search in the car.

By contrast, smart speakers are largely restricted to use in the home – even if Amazon has been making efforts to bring its Alexa assistant to cars.

Smart speakers are still likely to come in useful in other scenarios where mobile voice search is frequently used, such as when a user’s hands are full or dirty, or their phone is out of reach. However, as a smart speaker is confined to one room of the house, it is of limited use during activities such as DIY or gardening.

A 2017 study by IFTTT (If This, Then That) discovered that most smart speakers are located in either the living room (61% of respondents), the bedroom (46%) or the kitchen (46%), making them likely candidates for questions about cooking and food, information on current events, general knowledge, pets and children, and potentially some shopping or price comparison queries.

Infographic by IFTTT

CapTech Consulting also found that inquiries and information gathering are the second-most popular use for smart speakers (reported by 42% of smart speaker owners), behind playing music (reported by 82%). Although some of these inquiries will be made straight to a relevant Alexa Skill or Google Home Action (such as Flash Briefing), Skills and Actions require a very specific phrasing to activate – something which can frustrate smart speaker owners, as reported anecdotally by Slate in an article on life with an Amazon Echo after the novelty wears off.

This may make voice search a preferred or more convenient way to access the same information.

How to optimise for smart speaker voice search: best practice tips

The volume of smart speaker voice searches in 2018 is not vast, but it could grow, as the presence of smart speakers in consumers’ homes becomes increasingly commonplace, and people become accustomed to turning to their voice assistants for information and updates.

If you’ve decided that optimising for voice search on smart speakers is a bandwagon worth jumping onto, here are some best practice tips. The good news is that many of these also overlap with techniques for optimising for voice search on mobile, so if you’ve already taken steps to optimise for a mobile audience using voice search, you can cater to smart speaker users without much additional effort.

More to the point, these are all best practices for SEO as a whole in 2018, and so can benefit your ranking and search presence across all channels.

Keep your content concise and readable

This is both a common-sense consideration for voice search and something which can help you rank. I mentioned last time that it’s best to avoid lengthy paragraphs of text when optimising for voice, as your content will be read aloud as a search result, and there’s no way for the user to quickly skim an audio answer to pick out key information. Indeed, Google’s own blog post on its Search Quality Rating Guidelines for voice results states that:

When a displayed answer is too long, users can quickly scan it visually and locate the relevant information. For voice answers, that is not possible. It is much more important to ensure that we provide a helpful amount of information, hopefully not too much or too little.

The statistics also back this up: a study by Backlinko which analysed 10,000 Google Home search results found that the average voice search result is just 29 words long. Readability is also key, as the typical Google voice search result is written at a 9th grade level (age 14-15).

With that said, it doesn’t mean that your content has to be thin or simplistic. In fact, Backlinko found that the average voice search result page is 2,312 words long – so Google will happily draw from long-form content as long as it’s written in an accessible way.
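As a rough way to sanity-check a candidate answer against that 29-word average, a short Python sketch like the one below may help. The tolerance of 10 words either side is an arbitrary assumption of mine, not something from the Backlinko study, and the helper names and sample answer are hypothetical.

```python
TARGET_WORDS = 29  # average voice search result length reported by Backlinko


def word_count(text: str) -> int:
    """Count whitespace-separated words in a passage."""
    return len(text.split())


def check_answer_length(text: str, target: int = TARGET_WORDS, tolerance: int = 10) -> dict:
    """Report the word count and whether it falls within target +/- tolerance.

    The tolerance is an illustrative assumption, not a published threshold.
    """
    count = word_count(text)
    return {"words": count, "within_range": abs(count - target) <= tolerance}


# Hypothetical answer paragraph lifted from a longer page.
sample_answer = (
    "A smart speaker is a voice-controlled device, such as the Amazon Echo "
    "or Google Home, that answers questions, plays music and runs tasks "
    "through a built-in voice assistant."
)

print(check_answer_length(sample_answer))
```

This only measures length; it says nothing about reading level, which would need a separate readability formula.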

Mark up your content with schema.org

This is another general voice search tactic which works equally well for mobile and smart speakers. Schema.org markup, a type of structured data, helps search engines determine which parts of a page contain which types of information, such as reviews, opening hours, prices or contact details.

This means that when looking for a single, definitive answer to a voice search query, the search engine will know exactly which information your voice assistant needs to read aloud.

Schema.org markup also increases your chances of getting rich results in desktop search and on mobile, making it a good move for SEO all round.
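To make this more concrete, here is a minimal sketch in Python that builds schema.org markup in the JSON-LD format for a local business. The business details are invented for illustration, and the function name is hypothetical; the types and properties used (LocalBusiness, telephone, openingHours, AggregateRating) are part of the schema.org vocabulary, but which ones suit your pages is something to check against the schema.org documentation.

```python
import json


def local_business_jsonld(name: str, telephone: str, opening_hours: str,
                          rating_value: float, review_count: int) -> str:
    """Return a JSON-LD string describing a local business with schema.org terms."""
    data = {
        "@context": "https://schema.org",
        "@type": "LocalBusiness",
        "name": name,
        "telephone": telephone,
        "openingHours": opening_hours,
        "aggregateRating": {
            "@type": "AggregateRating",
            "ratingValue": rating_value,
            "reviewCount": review_count,
        },
    }
    return json.dumps(data, indent=2)


# Invented example business; replace with real details.
markup = local_business_jsonld(
    name="Example Bakery",
    telephone="+44 20 7946 0000",
    opening_hours="Mo-Fr 09:00-17:30",
    rating_value=4.6,
    review_count=128,
)

# JSON-LD is usually embedded in the page inside a script tag like this.
print(f'<script type="application/ld+json">\n{markup}\n</script>')
```

Generating the markup from the same data that feeds the visible page is one way to keep the two from drifting apart.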


Focus on speed

Google prizes fast pages. Site speed has been a ranking factor on desktop search since 2010, and as of this month, it’s also a ranking factor on mobile. With that in mind, it’s no surprise that speed can also give you a boost in voice search: Backlinko found that the typical loading speed of a page from which a voice search result is sourced tends to be significantly faster than the average webpage.

According to Backlinko, the average Time to First Byte (which measures the time between the request for a webpage being made and the first byte of the page being received by the user’s browser) for webpages that rank in voice search is just 0.54 seconds – much shorter than the worldwide average of 2.1 seconds.

Infographic by Backlinko

On top of that, the overall load time for voice search result pages is 4.6 seconds, compared with 8.8 seconds for the average webpage. So the takeaway is: it pays to be fast with voice.
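If you want a quick feel for where your own pages sit against those Time to First Byte figures, a rough Python sketch follows. It uses only the standard library and times how long it takes for the first byte of the response body to arrive, which includes DNS lookup and connection setup, so treat it as an approximation rather than the method Backlinko used. The function names are hypothetical.

```python
import time
import urllib.request

# Benchmarks quoted from the Backlinko study, in seconds.
VOICE_RESULT_TTFB = 0.54
WORLDWIDE_AVERAGE_TTFB = 2.1


def measure_ttfb(url: str, timeout: float = 10.0) -> float:
    """Approximate Time to First Byte: seconds from request to first body byte."""
    start = time.perf_counter()
    with urllib.request.urlopen(url, timeout=timeout) as response:
        response.read(1)  # wait for the first byte of the body
        return time.perf_counter() - start


def describe_ttfb(seconds: float) -> str:
    """Place a measurement relative to the two quoted benchmarks."""
    if seconds <= VOICE_RESULT_TTFB:
        return "at or below the voice result average"
    if seconds <= WORLDWIDE_AVERAGE_TTFB:
        return "slower than the voice result average, faster than the worldwide average"
    return "slower than the worldwide average"
```

Calling `describe_ttfb(measure_ttfb("https://www.example.com/"))` would give a one-line verdict for a page; a single measurement is noisy, so averaging several runs is sensible.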

Establish your authority and credibility

Domain authority is integral to SEO, as it tells Google how trusted your website is as a source of information, and determines how likely you are to rank for certain keywords.

On desktop and mobile, a user carrying out a search will be presented with a list of results to choose from; domain authority helps to determine who ranks highest, but even when a featured snippet, answer box or quick answer is present, there’s no one “be-all end-all” result presented as the definitive answer. Users can easily scroll past the top results to choose a link lower down.

Not so with voice search, where – as Backlinko points out – “Google needs to be extremely confident that they’re giving you accurate information”.

This leads to domain authority being an even stronger signal in voice search: Backlinko found that the average Domain Rating (a metric developed by Ahrefs to evaluate website-level link authority) for a voice search result was 76.8, which is very high.

Other than building your domain authority, how can you give your website additional credibility in voice search results? Positive reviews and mentions from trusted sources are two other factors that can contribute immensely to your overall brand authority, which carries a lot of weight in search both with voice and elsewhere.

In the next part of this series, I’ll examine how far we’ve come in conversing with search engines: the state of natural language search, and what it means for voice.

Read the final two parts of this series: