It’s OK to be human in a machine-learned world

What do you do when an AB test can’t tell you what to do with your ML product? Nhi Ngo, Senior User Researcher at Spotify, tells the story behind Shortcuts and why a human understanding of how other humans think, see and experience the world can help build better ML products. Spotify are our 2020 Gold Partner.

Data, data,…humans?

I have always been a believer in data. As a social psychologist, I spent many years quantifying emotion through behavioral experiments. As a mixed-method user researcher, I always base product recommendations on insights from data, from qualitative (e.g. user interviews, surveys) to quantitative (e.g. AB test results). Working at Google, Netflix, and now Spotify, I have learned that data-informed decisions work especially well for products that rely heavily on personalization algorithms: most often, the models we build are so complex that data is the only way we can make decisions with confidence.

And yet, I recently learned that we can become over-reliant on data, even when it doesn’t give us the full picture. We can build better ML products, ones that are impactful for the business, beneficial to users, and delightful to use, if we draw on something other than data alone: a human understanding of how other humans think, see and experience the world.

I hesitate to call this approach “intuition”, which implies it lacks reasoning, or even “empathy”; remember that empathy is difficult, sometimes impossible, and most often overrated. Making decisions based on interpersonal human understanding, or cognitive and affective perspective taking, is rational, complements a data-driven approach, and is increasingly important in a world where metrics dominate conversations. I learned a lot about this somewhat counterintuitive process of making decisions that are not based entirely on data through my recent work at Spotify with my team, Home Personalization.

ML-powered Shortcuts on Spotify Home

Earlier this year, we launched a feature, internally called Shortcuts, on the Spotify Home tab. It’s a dedicated space that showcases what Spotify’s algorithms have learned to be the user’s current favorites. You can read more about how this feature was developed here.

Product Insights partnered closely with engineering, product and design throughout the entire product development cycle. My data scientist partner and I used one- to two-week longitudinal user studies and AB tests to evaluate the usability of the design, the quality of the recommendations, and the overall value of this space for Spotify users. Among the questions we answered: How much gain did the ML model offer compared to the best heuristic model? Are these gains valuable and recognizable to users? Is the grid format, which differs from the usual carousel format on Home, a better user experience?

Throughout this development cycle, qualitative data inspired hypotheses for AB tests and then helped contextualize their results. It felt as though the data we collected always held the answer we needed to make the decisions we believed were best for the product and the users.

Until we tested the user-facing name of this feature.

What’s in a name?

Among all our evaluations, this was the “simpler” one: no complex model to deal with. However, it was crucial in helping users understand what Shortcuts is and what to use it for. In early user testing, we found that users could not always explain why Shortcuts recommended what it did, nor predict how these recommendations would change over time. This created confusion and undermined the value proposition of Shortcuts. That is not surprising: explainable recommendations improve the effectiveness of, and user satisfaction with, recommendation systems, so a lack of explainability hurts them.

Alongside the model improvements we identified through research to address the explainability problem, we also wanted to help users with a “recsplanation”: an explanation for our recommendations in the Shortcuts space. The candidates we tested included “Listen Now” (the objective the model optimizes for), “Shortcuts” (the user-facing functionality), “Quick Access” (a UX goal of this space), and, last but not least, a daypart greeting, “Good morning” (which would change to “Good afternoon” or “Good evening” depending on the time of day).
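Mechanically, the daypart greeting is the simplest candidate on that list. The sketch below is purely illustrative: the function name `daypart_greeting` and the hour cutoffs are hypothetical, since the post does not say where Spotify draws the lines between morning, afternoon, and evening.

```python
from datetime import datetime

# Hypothetical local-time cutoffs; the post does not specify Spotify's boundaries.
MORNING_START = 5
AFTERNOON_START = 12
EVENING_START = 18


def daypart_greeting(now: datetime) -> str:
    """Return a daypart greeting for the user's local time (illustrative only)."""
    hour = now.hour
    if MORNING_START <= hour < AFTERNOON_START:
        return "Good morning"
    if AFTERNOON_START <= hour < EVENING_START:
        return "Good afternoon"
    # Evenings and the small hours both fall through to the evening greeting.
    return "Good evening"


if __name__ == "__main__":
    print(daypart_greeting(datetime.now()))
```

The simplicity is the point: the greeting itself is trivial to compute, but, as the rest of this story shows, it sets expectations that the recommendations then have to live up to.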

We had a few debates about these names, as there were branding and technical constraints we had to consider, and each key member of the team also had their own preferences. We were counting on the AB test to help us make this important decision.

The test returned neutral.

In practice, this meant that over our typical two-week AB test period, we did not observe any significant change in our consumption metrics, the success criteria we had set in advance for all our tests. It was “too close to call”, and, surprisingly, this happens very, very often in the product world. As much as we rely on AB tests, they don’t always give us the answer, or any answer at all!
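To make “neutral” concrete, here is a textbook two-proportion z-test on a hypothetical binary per-user metric (for example, whether a user played something from Home). The counts are made up, not Spotify data, and the post does not say which statistical method Spotify’s experimentation setup uses; the function name is likewise hypothetical.

```python
import math


def two_proportion_ztest(success_a, n_a, success_b, n_b, z_crit=1.96):
    """Two-sided two-proportion z-test for variant B vs. control A.

    Returns the z statistic, the two-sided p-value, and an approximate
    95% confidence interval for the difference in rates (B minus A).
    """
    p_a = success_a / n_a
    p_b = success_b / n_b
    diff = p_b - p_a

    # Pooled rate and standard error under the null hypothesis of no difference.
    p_pool = (success_a + success_b) / (n_a + n_b)
    se_pool = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = diff / se_pool

    # Two-sided p-value: 2 * (1 - Phi(|z|)) equals erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))

    # Unpooled standard error for the Wald confidence interval.
    se = math.sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    ci = (diff - z_crit * se, diff + z_crit * se)
    return z, p_value, ci


if __name__ == "__main__":
    # Made-up counts: users in each arm who met the hypothetical metric.
    z, p, (lo, hi) = two_proportion_ztest(4000, 10_000, 4050, 10_000)
    print(f"z={z:.2f}  p={p:.3f}  95% CI=({lo:+.4f}, {hi:+.4f})")
    # A confidence interval that straddles zero cannot tell the variants apart.
    print("neutral" if lo < 0 < hi else "significant")
```

With these counts, a half-point lift yields a p-value far above 0.05 and a confidence interval that includes zero, which is what “too close to call” looks like in numbers. Reliably detecting an effect that small would take a much larger sample or a longer test.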

Our designer recommended going with the daypart name, despite my reservations: in user research sessions, we had observed that participants really needed a straightforward explanation of what this highly visible space at the top of Home is. However, our product designer was passionate in his belief that “Good morning” would create a more human and personal experience for users. The daypart greeting would be a strong signal to users that this is their Home and, in particular, that Shortcuts is theirs: a personalized space we created just for them.

Indeed, in our interview sessions, participants were most often pleasantly surprised whenever they opened their phone and saw the greeting, even though that surprise was followed by confusion about the recommendations. Convinced by our designer’s humanistic approach, and recognizing the intangible benefit of the joy of being “greeted by Spotify” that we observed in these sessions, we decided to trust our perspective taking, as humans understanding other humans, and chose the daypart name.

A human decision by humans, for humans

What we didn’t realize at the time was that choosing this name would force us to work harder to make the recommendations more intuitive and explainable for users. Participants in follow-up studies expected the recommendations to change with the daypart greeting and the time of day: obviously, if Spotify says “Good morning” and shows a sleep music playlist, that does not make a good impression. So the team incorporated more time-based features into the model, which helped make the content more time-sensitive for users with time-dependent habits.
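The post does not describe which time-based features the team added. As a generic illustration only, one common way to feed time of day into a model is cyclical sine/cosine encoding, so that 23:00 and 01:00 end up close together rather than 22 units apart. The function name `cyclical_time_features` and the feature names are hypothetical.

```python
import math
from datetime import datetime
from typing import Dict


def cyclical_time_features(ts: datetime) -> Dict[str, float]:
    """Map local time onto the unit circle so time is treated as cyclic.

    Hour 23 and hour 1 land close together, which a raw 0-23 integer
    feature would not capture.
    """
    hour_angle = 2 * math.pi * (ts.hour + ts.minute / 60) / 24
    weekday_angle = 2 * math.pi * ts.weekday() / 7  # Monday == 0
    return {
        "hour_sin": math.sin(hour_angle),
        "hour_cos": math.cos(hour_angle),
        "weekday_sin": math.sin(weekday_angle),
        "weekday_cos": math.cos(weekday_angle),
    }


if __name__ == "__main__":
    print(cyclical_time_features(datetime(2020, 6, 1, 7, 30)))
```

Features like these let a model learn, for instance, that a user’s morning listening differs from their late-night listening, without the artificial jump at midnight that a plain hour number introduces.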

We also improved the recommendation models with the vision that Shortcuts, being the only “fixed” space on Home, would become a space of Shortcuts that give you Quick Access to the content you want to Listen Now, without us having to tell you. Indeed, after launch we ran an evaluative diary study for over a month and observed that users not only came to understand Shortcuts very quickly, but also became emotionally attached to, and frequently delighted by, the daypart greetings. Six months after launch, Shortcuts is by all accounts a resounding success, both quantitatively and qualitatively.

This experience was a sobering reminder for me as a researcher that data doesn’t always tell the whole story, especially since we most often test, both qualitatively and quantitatively, on a much shorter time scale than people live with our products. The decision to go with a more human “recsplanation” that didn’t win a test or follow a strict convention of explainability has implications beyond choosing the right name for a feature. We need to think humanistically to be better researchers, designers, and product owners, simply because a human perspective pushes us to build better products and experiences that benefit both the business and the users in the long term. It’s an intentional attitude we adopt when designing interfaces and recommendation systems at Spotify.

When data can’t give you a definitive answer, it is OK to be human and make a human decision. Prioritize user joy; treat users as you would any human in your life.

And with that, I bid you Good morning, Good afternoon, and Good night.

Nhi Ngo is a Senior User Researcher at Spotify, working in the Personalization Mission with the Home team, where Spotify surfaces personalized recommendations for users. Nhi used to be a social-cognitive psychologist studying which contextual factors shape emotional perception. In the past few years, Nhi has worked on how content influences the user experience of Google Maps, and how creative assets contextualize Netflix recommendations.