Innovation Forums: Video for VR/AR

  • The forum will be held from 16:00 to 18:00 on Friday, 03/29

  • For AR/VR experiences to be immersive, the requirements on image quality are demanding, e.g., high resolution, wide field of view, high frame rate, 6-DoF user perspective, etc. While the lack of some of these features leads to a loss of immersion, the lack of others can cause discomfort. Recent developments in image capture and processing enable us to create media-based immersive experiences involving real-world scenes. However, to build practical systems, many engineering decisions need to be made on content creation, representation, interaction, and rendering, and these may require a fundamental redesign of the media pipeline in the context of AR/VR. In this panel, we will discuss some of the challenges and trade-offs in building AR/VR media systems.

    Moderator: Haricharan Lakshman
    Research Scientist, Facebook

    Hari Lakshman is currently a Research Scientist at Facebook. Prior to this, he was a Research Architect at Dolby Labs, a Visiting Assistant Professor at Stanford University, and a Researcher at Fraunhofer HHI, Germany. His interests are in image processing, video coding, and AR/VR, with a focus on creating immersive visual experiences. A central theme of his recent work is establishing a media pipeline from novel capture devices that model the plenoptic function to display devices such as VR and AR head-mounted displays.

    Sam Tsai
    Applied Research Scientist, Facebook

    Sam Tsai is an Applied Research Scientist at Facebook, developing machine learning-based computer vision solutions to enable intelligent cameras. Prior to Facebook, he worked at Amazon’s subsidiary A9, developing visual search solutions, and at Realtek, developing embedded multimedia systems solutions.

    Shlomo Dubnov
    Professor, UCSD
    Talk Title: Improvised Video - Machine Learning for Interactive Procedural Content Generation

    Machine Learning holds the promise of producing cheaper and more realistic content for VR/AR. The main challenges for such applications are understanding the environment for seamless integration of synthetic content with the real world, understanding user actions in order to trigger appropriate machine responses, and finally acting as the content generator itself. Recently, generative Deep Learning methods have shown very promising results for image and video synthesis. Moreover, sequence models may capture the dynamics of movement in synthetic agents. In the talk, I will compare generative methods with query-based image concatenative methods inspired by speech and music synthesis. We will discuss issues of quality and the problems of producing credible autonomous responses for improvisational video generation.

    Shlomo Dubnov is a Professor of Music and Computer Science and Engineering and a founding member of the Halıcıoğlu Data Science Institute at UCSD. He is a graduate of the prestigious Israel Defence Forces (IDF) Talpiot program and of the Rubin Music Academy in Jerusalem in composition, and holds a doctorate in computer science from the Hebrew University of Jerusalem. Prior to joining UCSD, he served as a researcher at the world-renowned Institute for Research and Coordination in Acoustics/Music (IRCAM) in Paris and headed the multimedia track of the Department of Communication Systems Engineering at Ben-Gurion University in Israel. He has been a visiting Professor at Keio University in Japan and the University of Bordeaux, France. Currently, he serves as director of the Center for Research in Entertainment and Learning at UCSD’s Qualcomm Institute and teaches in the Interdisciplinary Computing in the Arts program.

    Phil Chou
    Senior Staff Research Scientist, Google Daydream
    Talk Title: Compression and Streaming of Volumetric Media

    Volumetric images and video, popularly known as holograms, are the most significant new immersive media since the emergence of natural images (as photographs in 1838), audio (as telephony in 1875), and video (as television in 1926). In this talk, I argue that volumetric video (unlike 360 video) is the natural medium for the 3D world, and hence is the natural medium for Augmented Reality. Thus, volumetric media will be needed to drive tomorrow's AR computing systems, just as ordinary images and video drive today's mobile/web computing systems. However, the state of the art in volumetric compression and streaming today is still primitive, analogous to the state of the art in video compression and streaming in the late 1980s and 1990s, respectively. Hence, more work on the compression and communication of volumetric media is needed to get natural, immersive content ready for AR. [These are my personal views and do not necessarily reflect those of Google.]

    Philip A. Chou is currently a senior staff research scientist in the Google Daydream group. Prior to Google, he was head of compression at 8i, partner research manager at Microsoft Research, compression group manager at VXtreme, member of research staff at Xerox PARC, and member of technical staff at AT&T Bell Laboratories. He has also been on the affiliate faculty at Stanford University, the University of Washington, and the Chinese University of Hong Kong. He has been an active member of MPEG. Dr. Chou has been responsible for seminal work in rate-distortion optimization, multiple reference frame video coding, streaming video on demand over the Internet, decision tree and DAG design, practical network coding with random linear codes, wireless network coding, and a host of other work including point cloud compression. He has over 250 publications, 120 patents (some pending), and 85 standards contributions (including instigation of the MP4 file format). He is an IEEE Fellow.

    Adam Gaige
    Senior Software Engineer, JauntXR
    Talk Title: Distributing VR/AR Video at Scale

    Given large data rates, few standards, and a growing number of headsets that require high resolution, this talk presents methods for building dynamic encoding pipelines that deliver VR/AR video at scale. The discussion details how streaming video formats are used, covers compression techniques for both 360 video and volumetric assets, and examines the challenges of bringing VR/AR video pipelines to production.

    With a background in software engineering for 3D animation and 10+ years of industry experience at technology companies that produce creative media, Adam has enjoyed building software products that empower creative content to be made and distributed. His background includes studying CS and Electronic Media at Rensselaer Polytechnic Institute, working at PDI/DreamWorks Animation producing in-house tools and plugins, and most recently working with Jaunt on software for generating and distributing VR/AR video.