Case Study: Connected’s In-House 360 Audio Platform
August 18, 2020
Written by Nenad Kozul, Ryan McLeod, Rajiv Silva, and Eren Livingstone.
This article takes you through a project by Connected Labs—Connected’s dedicated R&D function—that built a 360 audio platform aimed at reducing “Zoom fatigue” and increasing collaboration.
As we’re powering through into the second half of 2020, it’s easy to forget that major disruptions—like a global pandemic—are anything but new. What is new, however, is our ability to endure and stay productive by working from home.
Projecting one’s presence over the internet has been an option for quite some time, but now that it’s mandatory many have found that this arrangement comes with a set of new challenges—call fatigue chief among them. With this in mind, our dedicated R&D function, Connected Labs, dove head-first into the journey of finding ways to make communication and collaboration easier, both in and out of Connected.
The spark of an idea we initially had was “spatial audio”. That’s it. The implementation details, designs, requirements, and timelines were non-existent, but that’s exactly what Labs is for, and sometimes the quickest way to validate an idea is to just build it. We couldn’t learn whether spatial audio would help us solve the usual WFH problems without being able to try it out first, so we stuck to light up-front research with the goal of diving into prototyping as quickly as we could.
What is Spatial Audio?
Spatial Audio is best described as true 3D sound. But wait, don’t we already have 3D sound? Yes we do. Both 5.1 and 7.1 surround sound are fairly established in home theater, while stereo rules the world of music. But these solutions all share the same flaw: the sound sits on a single plane, and the listener is assumed to be static.
Spatial audio is the next evolution in audio formats. It lets the listener fully immerse themselves in sounds coming from all directions. More advanced simulations also track the listener’s movement and alter the audio streams so they remain realistic from the listener’s new perspective. The only equipment needed to experience it is a pair of headphones.
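At its simplest, spatializing a sound for headphones means deriving a left/right gain and a distance attenuation from where the source sits relative to the listener. The sketch below is only illustrative (it is not the algorithm our platform uses; real engines apply HRTF filtering for full 3D), but it captures the core idea of direction and distance shaping what each ear hears:

```javascript
// Minimal spatial-audio sketch: derive per-ear gains from a source's
// position relative to the listener. Real engines are far more
// sophisticated; this only models panning plus distance falloff.
function spatialGains(listener, source) {
  const dx = source.x - listener.x;
  const dy = source.y - listener.y;
  const distance = Math.hypot(dx, dy);
  // Azimuth: 0 = straight ahead, +PI/2 = hard right, -PI/2 = hard left.
  const azimuth = Math.atan2(dx, dy);
  // Equal-power panning between the two ears.
  const pan = Math.sin(azimuth); // -1 (full left) .. +1 (full right)
  const left = Math.cos(((pan + 1) * Math.PI) / 4);
  const right = Math.sin(((pan + 1) * Math.PI) / 4);
  // Inverse-distance attenuation, clamped so nearby sources don't blow up.
  const attenuation = 1 / Math.max(1, distance);
  return { left: left * attenuation, right: right * attenuation };
}

// A source directly to the listener's right is louder in the right ear:
const g = spatialGains({ x: 0, y: 0 }, { x: 2, y: 0 });
console.log(g.right > g.left); // true
```

Even this crude model is enough to make two simultaneous voices separable, which is the effect we were chasing.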
And now to our project…
The first thing we wanted to learn was what work had already been done in this space. We found quite a few usages of spatial audio, and some that even combined it with teleconferencing. However, none of these products fit our desired use case.
By far the most common reason for this was that the majority of products similar to our concept also employ virtual reality (VR). VR is great (and fun), but it’s very obtrusive and the devices are clunky. Many tasks are significantly more difficult if not impossible while wearing a VR headset. They also increase fatigue, which is the opposite of what we wanted to accomplish. Even for products where not wearing a VR headset is an option, such as Mozilla Hubs, a fully rendered 3D environment was a much more complex user experience than we desired.
Interestingly, the closest prior work we found was an IEEE paper published in 2010, entitled “Placing the Participants of a Spatial Audio Conference Call”. What the authors did was extend the functionality of an open source SIP client called Ekiga to allow spatial placement of call participants. Of particular interest to us, they also performed user experiments with different spatial placements of participants in a virtual room in order to determine which ones were better based on how understandable and how locatable the speakers were to the listener. As far as we could determine, this research did not directly result in a product intended for end users, but they did conclude that the teleconferencing experience was enhanced by incorporating spatial audio. This gave us some validation of the product concept before we even started building.
Growing up, many of us were told to eat our vegetables, sit up straight, say “please” and “thank you”, but also to not stare, not interrupt others, and refrain from utilizing complicated setup processes for engaging in conferencing sessions. (I may have made the last one up.)
I, for one, have upped my vegetable intake as I’ve grown older, but when it comes to conference meetings, why are we still doing the last three? They dominate our daily workplace rituals, and they’re astonishingly draining.
Of all the examples I can think of, the best one would be an iconic Bugs Bunny quote from 1946: “Did you ever have the feeling you was being watched?” Indeed, I have. And as you rummage through your Zoom meeting participants, you’ll notice others likely share the same sentiment, which is why they promptly switch off their webcams as soon as the situation permits.
And why wouldn’t you? Your co-workers are always seemingly directing their eyes in your direction, even though you’re not even talking. To drive the point home, in most video conferencing software these days, you also get a constant mirror image of yourself staring right back. It’s hard to concentrate on the task at hand if you’re inclined to overanalyze your COVID haircut, all while wondering whether you should have dusted your bookshelf in the background.
This induces “Zoom fatigue”, a newly coined term for the phenomenon where even a simple conversation can leave video conference participants feeling drained and more tired than usual, long after they have left the call altogether.
Taking this into consideration, we decided to hold off on implementing any video features for this project. Audio only. This decision pairs well with other collaboration tools, such as online whiteboarding or screen-sharing apps, where talking heads on a screen are not all that important. After all, it’s the task at hand that matters.
Many a train of thought has been derailed spectacularly due to innocent interruptions, whether they be valid questions, or just a random fire truck passing by outside. Encouraging an always-muted policy is great, but at some point other people must unmute in order to form a dialogue. While we can’t prevent the sound of cracking open a soda from being transmitted to everyone in a conference, the least we can do is make it sound less annoying and intrusive.
With spatial audio, the distracting sound comes from only one direction, as opposed to being broadcast to both ears at the same time in a grandiose fashion. In our testing, we found that a sound coming from a specific direction is much easier to disregard when it’s inconvenient or intrusive.
When we gave it a bit more thought, we realized nobody actually talks directly in our faces at the office, or anywhere else for that matter. We perceive such behaviour as rude, or even anxiety-inducing, so the perception of distance was a welcome addition to our daily conversations.
Make it quick, make it simple
No matter how tech-savvy you are, nobody likes setting up a conferencing environment after a calendar notification has popped up. This is why we insisted on making the user interface dead simple, while minimizing all entrance formalities. The environment is web-based, much like Google Meet, meaning all you need is a modern HTML5 browser and any device on hand, as long as it has a microphone.
Creating a new room and sending it over Slack or e-mail in the form of a link is quick and easy.
The First Prototype
We experimented with several different possible tech stacks, but quickly realized the simplest approach was to build our own web-based, audio-only calling app using WebRTC. To apply spatial audio to the different participants on the call, we used Google’s Resonance Audio SDK.
The first prototype was not pretty, but we completed it in a matter of days, at which point we had our first spatial audio call as a team. The difference was obvious to us all immediately. Especially when multiple participants were making sounds at the same time, having participants spatially located made it much easier to focus on one speaker and understand what they were saying.
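A natural placement scheme, and the kind of layout the 2010 paper experimented with, is to spread the remote participants evenly across an arc in front of the listener. The sketch below is illustrative rather than our exact production logic; the arc width and distance are arbitrary choices:

```javascript
// Illustrative placement: spread N remote participants evenly across a
// 120-degree arc in front of the listener, at a fixed conversational
// distance. x is left/right, y is forward; units are metres.
function placeParticipants(count, { arcDegrees = 120, radius = 1.5 } = {}) {
  const positions = [];
  const arc = (arcDegrees * Math.PI) / 180;
  for (let i = 0; i < count; i++) {
    // A lone participant sits dead ahead; otherwise spread across the arc.
    const t = count === 1 ? 0.5 : i / (count - 1);
    const azimuth = -arc / 2 + t * arc; // radians, 0 = straight ahead
    positions.push({
      x: radius * Math.sin(azimuth),
      y: radius * Math.cos(azimuth),
    });
  }
  return positions;
}

// Three participants end up left, centre, and right of the listener.
console.log(placeParticipants(3));
```

Each position is then fed to the spatializer for that participant’s audio stream, which is what makes concurrent speakers easy to tell apart.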
Thirty-two years ago, Microsoft manager Paul Maritz sent Brian Valentine an email with the memorable subject line “Eating our own Dogfood”, asking him to increase internal usage of a product the company was developing at the time. The idea behind it: “If your product is so good, why aren’t you using it?” So, we did!
Our daily stand-up rituals were conducted through the 360 Audio platform, and some of us even held our 1:1s on it. This allowed us to weed out problems that only surface during regular use, and to note the pros and cons of the platform. Oddly enough, our daily rituals dragged on longer than usual because they were so comfortable compared to a regular video meeting.
While video conferencing has its time and place, we feel this solution is much better suited to longer sessions, especially ones that involve collaboration. Some examples would include:
- Whiteboarding new ideas
- Debugging code
- Pairing on a project
Even a plain casual conversation over this platform is quite rewarding. The spatialized audio tones down the atmosphere of urgency that is often present in other communication tools we have today. We have seen more organic, real-world kinds of conversation and collaboration as a result of building this tool.
During these unprecedented times, there are some precedents which need not be disregarded.
Human nature and the way our minds process sound are not things that can be engineered away. Instead, they need to be cared for and accommodated. Just because a tool does its job does not make it comfortable for daily use, and prolonged exposure can take a toll on anyone’s well-being. Reducing the pressure on our senses allows for greater productivity and less fatigue at the end of the day.
As we continue to evolve this tool to improve life for Connectors, we can’t help but wonder what impact this could have on employees across the world…
To learn more about this project and to discuss a walkthrough of the product, please reach out to email@example.com.