Filming and editing in virtual reality—creating a new language of imagery and sound

Ever since Google got into virtual reality, the company has strived to make the technology accessible to everyone. Google Cardboard, for example, needs no sophisticated headset: it produces a rich, immersive VR experience with just, you guessed it, cardboard and a few pieces of plastic, combined with a smartphone. But the technology required to create VR environments remained out of reach for many.

To simplify VR filming and editing, a team of Google engineers was tasked with making VR content easier to create, so that VR users would have more experiences to choose from. The team decided to build a system that would make producing stereoscopic 360, or 360-3D, video more efficient. After all, it cost VR filmmakers weeks of time and thousands of dollars to painstakingly stitch a series of images into a 360 frame. “Creators had to manually mark correspondences in order to stitch images together,” explains software engineer Robert Anderson. “It tended to be very slow and painstaking.”

Robert and his team got to work on prototyping algorithms that would speed up stitching and on building a camera rig that could work with the software to expedite the process. Armed with a 3D-printed rig that held 16 GoPro cameras, the team went outside to film some test footage. “I think the first video was of one of our team members, Richard, unicycling down the canal outside the office,” Robert recalls. “We shot some video and stitched it together and it actually kind of worked! The first video we got off the prototype was way better than we expected!”

Indeed, the video test proved they were on their way to creating what would become Jump, an ecosystem for creating 360-3D virtual reality content. It consisted of three parts: a 16-camera rig called the GoPro Odyssey, a player (YouTube), and the groundbreaking Jump Assembler, cloud-based software that converts the inputs from the multiple cameras into one seamless, three-dimensional 360-degree video.

Eager to present the GoPro Odyssey and Jump Assembler at 2015’s Google I/O, the team knew they needed compelling footage to demonstrate just how exciting this breakthrough technology was. They enlisted Jessica Brillhart, formerly a Google filmmaker, to go out into the field and create a short film.

The film, named World Tour, was an odyssey that took Jessica and the team to California, Puerto Rico, Iceland, and Japan. Along the way, cameras overheated and failed, SD cards got mixed in the shuffle (there were 16 after all), and syncing all of the prototype’s cameras to shoot at the same time proved challenging. “We spent many late nights on the phone with Jessica, troubleshooting and commiserating,” says Christopher Hoover, a senior software engineer who oversaw the camera rig. “It was a lot of flying by the seat of our pants, trying to get something that was actually watchable.”

In the end, they got the footage, and the Jump Assembler got to shine, automating the process of stitching stereoscopic 360 video into a VR-ready experience in a matter of hours.

Now, when a user uploads their raw footage into the Jump Assembler, algorithms analyze and process the footage using a combination of computational photography (the stitching together of the individual images from each of the 16 cameras to create one 360-degree image) and computer vision (algorithms that allow the computers to see the images and make decisions about how they should fit together). The result is a seamless and clear 360 video that has depth, so that objects nearby look close and objects in the background look far away.
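As a rough illustration of what finding correspondences automatically can mean, the toy sketch below slides one row of pixel brightness values across another and keeps the offset where the overlapping samples agree best. It is not the Jump Assembler's algorithm, which this article does not describe in detail; the function name `best_overlap_offset`, the `min_overlap` parameter, and the sample values are all invented for the example.

```python
def best_overlap_offset(left_row, right_row, min_overlap=3):
    """Find the shift s where right_row[i] best matches left_row[s + i].

    Both rows are lists of brightness values from two neighboring cameras
    (hypothetical data). Returns (shift, mean_squared_difference).
    """
    best_shift, best_cost = None, float("inf")
    # Try every shift that still leaves at least min_overlap shared samples.
    for shift in range(0, len(left_row) - min_overlap + 1):
        overlap = min(len(left_row) - shift, len(right_row))
        if overlap < min_overlap:
            continue
        # Mean squared difference over the overlapping region.
        cost = sum(
            (left_row[shift + i] - right_row[i]) ** 2 for i in range(overlap)
        ) / overlap
        if cost < best_cost:
            best_shift, best_cost = shift, cost
    return best_shift, best_cost


# A made-up strip of scene brightness seen by two overlapping cameras.
scene = [10, 12, 50, 80, 30, 20, 90, 60, 15, 40]
left_camera = scene[0:7]    # sees the first seven samples
right_camera = scene[4:10]  # starts four samples further along

shift, cost = best_overlap_offset(left_camera, right_camera)
print(shift, cost)
```

A real stitcher has to do far more than this, working on two-dimensional images from many cameras at once, but the core idea of scoring candidate alignments instead of marking them by hand is the same.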

Recording sound that surrounds

Software engineers Dillon Cower and Brian O’Toole were tasked with converting the shoot’s audio files into sound that could fill a 360-degree space. “If you have a 360 video without spatial audio, as you look around, the sound isn’t changing with respect to your point of view,” Dillon explains. “Spatial audio unlocks the ability to rotate sound as you’re hearing it, so that when you turn your head to hear a sound coming from the left, you actually hear what you’re looking at.”

Needing to learn on the fly, Dillon and Brian dug deep into 1970s research on Ambisonics, an early surround sound technology, to help engineer software that could convert traditional audio to spatial audio. “The filmmaker would give us audio, and we’d convert it to spatial audio. Then we had to listen to all the clips to decide which ones matched the video best,” says Dillon. After combining the video and audio, they’d send it back for feedback. Piece by piece, the audio was engineered to match the video.
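The head-turning behavior Dillon describes can be sketched with first-order Ambisonics, where a sound field is stored as four signals (W, X, Y, Z) and turning the listener's head amounts to a small rotation of the X and Y signals. This is a minimal sketch under stated assumptions, not the team's software: it assumes X points forward, Y points left, Z points up, W is left unscaled, and angles are measured counterclockwise from straight ahead. The function names are hypothetical.

```python
import math


def encode_first_order(sample, azimuth_rad):
    """Encode one mono sample arriving from a horizontal direction.

    Returns (W, X, Y, Z). Assumes X = front, Y = left, Z = up, and an
    unscaled W channel (an assumption; real formats differ in scaling).
    """
    return (
        sample,                          # W: omnidirectional part
        sample * math.cos(azimuth_rad),  # X: front/back part
        sample * math.sin(azimuth_rad),  # Y: left/right part
        0.0,                             # Z: up/down part (source is level)
    )


def rotate_for_head_yaw(bformat, yaw_rad):
    """Re-express the sound field relative to a head turned left by yaw_rad.

    Turning the head left by some angle is the same as rotating the sound
    field right by that angle, so X and Y mix while W and Z stay unchanged.
    """
    w, x, y, z = bformat
    cos_yaw, sin_yaw = math.cos(yaw_rad), math.sin(yaw_rad)
    x_rot = x * cos_yaw + y * sin_yaw
    y_rot = -x * sin_yaw + y * cos_yaw
    return (w, x_rot, y_rot, z)


# A sound directly to the listener's left (90 degrees).
from_left = encode_first_order(1.0, math.radians(90))

# The listener turns their head 90 degrees to the left to face it.
facing_it = rotate_for_head_yaw(from_left, math.radians(90))
print(facing_it)
```

After the rotation, nearly all of the directional energy sits in the X (front) signal, which is the "you actually hear what you're looking at" effect: the sound that was on the left is now straight ahead.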

Looking forward

The team’s groundbreaking technology debuted at I/O ’15 as planned, expanding the world of VR filmmaking. “It was a crazy, mad rush up to I/O,” says Robert, who wrote algorithms for the Jump Assembler. “We saw it coming together in bits and pieces, so it was really cool to see the final piece and people’s reaction to it.”

While some of the team have moved on to other projects, they are all eager to see what the future brings for VR filmmaking. “At the moment we only have a couple hundred cameras out there, which puts a fairly hard limit on what can be produced,” says Robert. “I’m looking forward to extending, and getting Jump into the hands of more filmmakers.”

Christopher agrees. “There’s a whole language of filmmaking that is starting to develop in VR. We need to figure out how to fully utilize VR for education and teleportation. How do we make it so that you can buy a ticket to a VR experience of a concert that is happening live? These are questions we want to answer.”

In the meantime, stitching algorithms and spatial audio have room to improve. “You could think of the current form of spatial audio we use today as being similar to low-res video,” Dillon explains. “We’re working on a higher-resolution aspect called ‘high-order ambisonics’ that requires a lot more processing power. There are multiple teams working on it—it’s a really hard problem to solve.”
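One way to see why higher orders demand more processing: a full-sphere ambisonic signal of order N carries (N + 1)² channels, so the amount of audio to store and process grows quickly as the order rises. The helper name below is hypothetical, and the article does not say which orders the team is targeting.

```python
def ambisonic_channel_count(order):
    """Number of channels in a full-sphere ambisonic signal of a given order."""
    if order < 0:
        raise ValueError("order must be non-negative")
    return (order + 1) ** 2


# First order uses 4 channels; each step up adds noticeably more.
for order in range(1, 4):
    print(order, ambisonic_channel_count(order))
```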

But that’s the beauty of working at Google, adds Dillon. “There are a ton of unsolved problems at Google, especially in VR,” he says. “My favorite thing about working here is that people trust you and give you the independence to figure things out for yourself and independently solve problems.”