The Importance of Spatial Audio

An interview with HEAR360’s Founders: Matt Marrin and Greg Morgenstein

1. How did you go from being engineers and producers to launching HEAR360 and creating audio technology?

Matt: Greg and I have been working in the music business for over 20 years. I started working in recording studios in Los Angeles and moved up through a traditional system. I started as a runner, taking food and drink orders, and then moved up the ranks as an assistant to the engineers in recording sessions. If you’re fortunate enough, you start working with those engineers and start producing and engineering records on your own. I met Jimmy Jam and Terry Lewis and started working with them exclusively, recording a lot of their projects – Janet Jackson among many others.

After you are in this world for a long time, people start to recognize that you have a lot of experience with sound and audio and a number of opportunities come up. That’s what happened for Greg as well. New start-ups and companies in the audio technology world are looking for experienced audio professionals that can evaluate, critique and help develop their technology.

Around 2007, Greg and I started consulting for a spatial audio company that was developing software for headphones and soundbars. That led us to testing and tuning prototypes like sound systems for cars with spatial audio. I consulted for SONOS evaluating the sound of their products when they were in early and late-stage development, and later started consulting for DTS. They were looking to expand their spatial headphone system and needed people with mixing experience to put together demos, and to evaluate and make observations on improving those systems.

Through the path of consulting and talking to companies about how to improve their products, we started coming up with our own product concepts and realized we could do something on our own. We launched HEAR360 and decided to make those concepts a reality.

2. Briefly, why is spatial audio important and how does it work?

Matt: There are different ways to experience immersive content – in a virtual reality (VR) scenario where you’re in a head-mounted display (HMD), watching something on a mobile device in 360°, or watching on your laptop in 360°. When you’re watching an immersive video you can look around, experience different aspects of it and control your experience. If the audio doesn’t have a head-tracking component, then you’re not really getting a fully immersive experience. You’re only getting 50% of it. In these scenarios, in order to sell the consumer on the experience, you have to deliver 100%; otherwise they won’t buy it.

People often overlook sound and consider it last. Most people pay attention to the visuals and think they’re the most important part. We think video is only half of the experience. Even though a user may not be able to articulate what immersive sound is or how important it is, they instantly recognize when they hear something that makes them feel like they are really inside an environment.

Greg: We provide those components that complete the whole sensory connection between the visual and auditory. When that is completed the experience becomes believable, real, and engaging.

Spatial audio is achieved by replicating the way our ears hear sound. Imagine someone trying to record a stereoscopic static video. They would set up two camera lenses in close proximity to our eyes so that they can see something through the lenses the same way that we do. We take that same approach to recording sound. It’s really important to capture sound with a pair of microphones from a perspective that replicates how our ears hear that sound.

Matt Marrin, Greg Morgenstein and Saul Laufer, Hear360

3. What are the other ways of replicating 3D sound?

Greg: We set out to create something that captures audio with a very natural-sounding result. The outcome is that you feel like you’re really there. Our current approach represents how the human ears work together in capturing scenarios. Another way of capturing spatial audio or creating a spatial audio experience is to synthesize it. This is where you capture a sound signal and create an experience in post based on that sound field, similar to ambisonic capture or ambisonic deliverables. It’s another method that a lot of companies in this industry have adopted. We’ve created a lot of those tools and built a ton of prototypes of those tools along the way, but we think that our first products should be really natural sounding, so we decided to start by pushing the binaural experience. We believe binaural has a very important place in VR and immersive content.

4. Explain the difference between binaural and ambisonic.

Matt: We can separate spatial audio capture into two main categories: binaural capture and delivery, versus ambisonic capture and delivery. One difference between the two formats is that ambisonic recordings require synthesis to be spatialized – the audio needs to be converted to B-Format and then fed through HRTF (head-related transfer function) filters in order for a listener to hear the spatial component. The problem here is that you can’t control how a platform’s HRTF filters sound – meaning that you lose control of how your recordings sound at the delivery stage. There are other problems with ambisonic recording, including phase and frequency response issues that are inherent in the decode process. We love our omni-binaural recording solution because it is not synthesized. All the spatial audio cues are baked into the original recording in high resolution. Coming from a background of over 20 years of recording, mixing, playing, and producing music, as an audio engineer you’re drawn to capturing something, hearing it the way you captured it and understanding how it sounds. We are obsessed with resolution and how we record things. We’re fascinated with our playback monitors, and when we mix we listen in all sorts of different environments including our cars, homes, laptops, etc. We do that for one reason – because we want to make sure our mix sounds great wherever it’s played and we want to have a firm understanding of that.
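For readers who want to see what the ambisonic path in this answer involves, here is a minimal Python sketch of the first two steps: encoding one mono sample into first-order B-Format and rotating the sound field to follow the listener's head. It is an illustration only, not HEAR360's or any platform's implementation. The function names, the traditional W-channel scaling of 1/√2, and the axis convention (X forward, Y left, azimuth counterclockwise from the front) are assumptions made for this example, and the platform-specific HRTF filtering stage is left out entirely.

```python
import math


def encode_b_format(sample, azimuth_deg, elevation_deg=0.0):
    """Encode one mono sample into first-order B-Format (W, X, Y, Z)."""
    az = math.radians(azimuth_deg)
    el = math.radians(elevation_deg)
    w = sample / math.sqrt(2.0)                # omnidirectional component
    x = sample * math.cos(az) * math.cos(el)   # front-back component
    y = sample * math.sin(az) * math.cos(el)   # left-right component
    z = sample * math.sin(el)                  # up-down component
    return w, x, y, z


def rotate_for_head_yaw(b_format, head_yaw_deg):
    """Rotate the sound field to compensate for a head turn (yaw only)."""
    w, x, y, z = b_format
    # Turning the head by +yaw makes the scene appear rotated by -yaw.
    a = math.radians(-head_yaw_deg)
    x_rot = x * math.cos(a) - y * math.sin(a)
    y_rot = x * math.sin(a) + y * math.cos(a)
    return w, x_rot, y_rot, z


# A source directly to the left; the listener turns 90 degrees to face it.
field = encode_b_format(1.0, azimuth_deg=90.0)
facing = rotate_for_head_yaw(field, head_yaw_deg=90.0)
```

After the rotation the source sits on the forward (X) axis. The rotated channels would then still need to be decoded through HRTF filters, which is the stage Matt says the content creator cannot control.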

5. What was the motivation or inspiration for developing a spatial microphone? Did it have to do with creative challenges or developments in 360° or VR?

Greg: We wanted to have an end-to-end solution where we control the quality of it from capture to delivery. We felt like there was a place for an omni-binaural capture because we wanted to have something that sounded very natural and sounded like the space and the performance that was played in the initial environment. We liked that you could capture sound that way, without the need to change it or affect it if you didn’t want to. So we sought to develop a microphone that could accomplish that. From there, we had to create all of the tools to edit the captured content, mix it with non-spatial audio, or things that you wanted to become spatial, and finally deliver it encoded for your preferred content platform.

Matt: Before the 360° video and VR world evolved, another big motivation for us was how cool binaural audio is. We had been working in a space where static binaural was something that had been around for years, but not a lot of people had experienced it. When you hear it, it really changes the way you think about experiencing recorded sound. When you realize you can experience recorded sound in three dimensions, it’s kind of mind-blowing. Now there’s an opportunity where you can interact and move around in a sound environment, making the experience even more believable. When VR and 360° video came into play, immersive audio was a neglected area. At the time, it was more of an idea than anything. For us, it was all about the excitement of delivering something new and sharing that excitement with other people.

Vic Mensa, AT&T and Direct TV

6. Technically, how does the 8ball microphone work when capturing sound? Is it easy to use when it comes to production? Who should buy the 8ball microphone — who is your ideal customer?

Greg: Technically, it basically mimics four human heads facing in four opposing directions – front, left, rear and right. Every human head has two ears, each with a pinna, and the curvature of the head creates acoustical shadowing. Those are the basic aspects of a capture. When your ears are separated, you have a time delay between what they capture, and when you introduce a pinna you have filtering or tone colourization based on whether the sound arrives at the pinna from the front or the rear. When you combine all these things – including the curvature of the sphere, acoustic shadowing, time delay, level differences based on the directionality of the oncoming signal and omni-directional capture – you get a spatial capture. When you do it in every direction you can head-track that capture. Since it’s based on the human head, as in a standard binaural capture, you get a very natural sound.
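One common way to head-track a four-direction binaural capture like the one described above is to crossfade between the two ear pairs nearest to where the listener is looking. The Python sketch below illustrates that idea only. The interview does not say how HEAR360's own tools do it, so the pair names, the yaw angles, and the equal-power (cosine) crossfade are all assumptions for this example.

```python
import math

# Hypothetical layout: four binaural (left ear, right ear) pairs facing
# front, left, rear and right. Yaw is measured counterclockwise in degrees.
PAIR_YAWS_DEG = {"front": 0.0, "left": 90.0, "rear": 180.0, "right": 270.0}


def pair_gains(head_yaw_deg):
    """Return an equal-power gain for each pair given the listener's yaw."""
    gains = {}
    for name, pair_yaw in PAIR_YAWS_DEG.items():
        # Smallest angular distance between head and pair, in 0..180 degrees.
        diff = abs((head_yaw_deg - pair_yaw + 180.0) % 360.0 - 180.0)
        # Pairs 90 degrees or more away are silent; nearer pairs fade in.
        gains[name] = math.cos(math.radians(diff)) if diff < 90.0 else 0.0
    return gains


def mix_sample(pairs, head_yaw_deg):
    """Mix one (left, right) sample from each pair into one stereo sample."""
    gains = pair_gains(head_yaw_deg)
    left = sum(gains[name] * pairs[name][0] for name in pairs)
    right = sum(gains[name] * pairs[name][1] for name in pairs)
    return left, right
```

With the head at 45°, for example, the front and left pairs each get the same gain and the rear and right pairs are muted. The cosine curve keeps the summed power constant as the head turns, which avoids a loudness dip between directions.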

When it comes to production, we designed 8ball to be point and shoot. There are eight channels that you’re capturing. You’ll need a multi-channel audio recorder – Sound Devices, Zoom or any other hardware device that records eight channels of audio will work – and then you just point and shoot. We’ve designed all of our post tools to make it easy to manipulate the recording if you’ve captured something wrong. In that case, you can recalibrate with our calibration tools. If you don’t want to do anything in post, you can just multiplex (MUX) it with video using our encode tool and deliver to whatever platform you see fit. You can also convert it to an ambisonic deliverable for Facebook and YouTube.

Ideally, the people who purchase 8ball use it for a lot of music-related capture, reporting, documentary, cinematic, or live-streaming work because of how natural it sounds. It’s simple to use and streamlined, so it doesn’t require a lot of encoding or manipulation. Another good example is capturing natural background environments such as a park, jungle or street for VR gaming experiences and 360° experiences. It’s eerie how real it sounds and it’s great to put that into a virtual experience.

HEAR360 8ball

7. Do you need special software to decode captured audio from an 8ball microphone? Do your tools integrate with traditional audio post-production tools?

Greg: Yes, it’s all very easy to use. You don’t have to learn anything new to mix and integrate with traditional audio post-production tools. We built our workflow into all the standard digital audio workstation (DAW) systems. The first one we supported was Pro Tools because that’s basically the industry standard for mixing and editing. A lot of people use Reaper because it’s very flexible, so we created virtual studio technology (VST) plugins. These tools can be utilized in other DAW systems as well. We are working on creating some limited tools for video editors like Premiere and Final Cut Pro, where they can take an 8ball capture and put it into these systems, calibrate the capture, edit, encode, and deliver it.

8. What about playback of spatial audio, why not focus on this as well? Wouldn’t the ultimate experience of immersive sound be through a high-end amplifier or expensive speakers? What is the best way to hear spatial audio?

Greg: Most people experience VR content in a head-mounted display (HMD), and every HMD requires a headphone component. So that’s the first thing we wanted to support. When you’re wearing headphones, you’re separating your left ear from your right ear. During spatial audio playback, it’s easier to separate them because you don’t have crosstalk. For playback over speakers, we’ve created tools that include crosstalk cancellation – so when you play information from the left speaker your right ear won’t pick it up, and vice versa. You can steer that information and work with delay to manipulate things in order to mimic a headphone experience and create a compelling spatial audio experience. It’s something that we’ve been working on since we started the company, but we don’t yet see a large use of it in the industry. Most people are wearing headphones, so we are focusing on headphones first. However, we have created tools for beamforming, crosstalk cancellation, and tools to deploy a spatial audio experience over speakers for future products.
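The crosstalk cancellation idea mentioned above can be shown with a small textbook-style model at a single frequency. Each ear hears its own speaker directly, plus a quieter, delayed copy of the opposite speaker. Inverting that 2x2 mixing gives speaker feeds that arrive at the ears as the intended left and right signals. This Python sketch is not HEAR360's tooling: the function names and the crosstalk gain and delay are made-up illustrative values, and a real system would work across all frequencies with measured responses.

```python
import cmath
import math


def speaker_to_ear_paths(freq_hz, cross_gain=0.7, cross_delay_s=0.0002):
    """Return (direct, cross) path responses at one frequency.

    cross_gain and cross_delay_s are illustrative values for the path from
    a speaker to the opposite ear, relative to the same-side path.
    """
    direct = 1.0 + 0.0j
    cross = cross_gain * cmath.exp(-2j * math.pi * freq_hz * cross_delay_s)
    return direct, cross


def canceller_feeds(left_in, right_in, direct, cross):
    """Invert the symmetric 2x2 path matrix to get the two speaker feeds."""
    det = direct * direct - cross * cross  # non-zero while |cross| < |direct|
    spk_left = (direct * left_in - cross * right_in) / det
    spk_right = (direct * right_in - cross * left_in) / det
    return spk_left, spk_right


def at_the_ears(spk_left, spk_right, direct, cross):
    """Model the acoustic mixing: each ear hears both speakers."""
    ear_left = direct * spk_left + cross * spk_right
    ear_right = cross * spk_left + direct * spk_right
    return ear_left, ear_right


# A left-only signal at 1 kHz: after cancellation the right ear hears nothing.
direct, cross = speaker_to_ear_paths(1000.0)
feeds = canceller_feeds(1.0, 0.0, direct, cross)
ears = at_the_ears(feeds[0], feeds[1], direct, cross)
```

In this idealised model the left ear receives the full signal and the right ear receives none of it, which mimics the separation that headphones provide. Without the canceller, the right ear would pick up the left speaker at the crosstalk gain.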

The 8ball in action: capturing 360 audio at the Madrid Open

9. Can you live-stream spatial audio?

Matt: Yes, but only if you build very specific tools, which we have. In order to do this, we built our own streaming server to live-stream multi-channel audio. We set it up so that our web player could render live 4K video with head-trackable spatial audio over LTE.

There are a few people who are attempting to live-stream with spatial audio; however, it’s a pretty difficult process to pull off. It’s not as simple as what we’ve created – our solution is more point and shoot. Our solution also gives you the ability to scale up into an advanced live broadcast and the ability to MUX non-spatial content with spatial captures. We’ve created all the tools to allow engineers to do this in real-time, with 4K video and head-tracking over our servers.

10. What creative possibilities do you see in the future of spatial audio?

Matt: A wider range of flexibility will happen over mobile and the delivery format will open up so you can experience VR in your home, or watch it on television more easily with fully interactive sound. Also, live experiences with spatial audio like car racing, horse racing, and other sporting events will become more readily available and expected by consumers who will demand increasing levels of interactivity with the content they consume.

Spatial audio is very important for live events as it completes the transformation of the user feeling like they are truly inside the experience. The difference between headphones and speakers will start to blur and you’ll be able to experience spatial qualities on any playback system. Augmented Reality (AR), increased computing power, personalized HRTFs, and object-based spatial audio solutions will allow us to be truly interactive with sound through normal everyday activities.