Project description

The advent of 3-D and holographic display technologies has made devices and applications that use them, such as 3DTV, video games and virtual reality systems, available to end-users outside the confines of research labs. The reproduction of spatial audio is essential for these systems to achieve a higher level of realism and immersiveness. 3-D and spatial audio systems can be categorised into two types. Systems of the first type aim to reproduce, using multiple loudspeakers, a sound field that is physically indistinguishable from an actual sound field within a prescribed temporal and spatial frequency bandwidth. Systems of the second type, which use multiple loudspeakers or headphones, are based on psychoacoustics and aim for perceptual sufficiency of the reproduced sound rather than physical accuracy. Wave field synthesis (WFS) and Ambisonics are two examples of the first type. Examples of the second type are binaural audio, in which audio is reproduced over a pair of headphones, and multichannel audio, in which audio is reproduced over multiple loudspeakers. Systems of the first type are mostly used in virtual reality applications and high-end simulators due to their demanding equipment and hardware requirements. Systems of the second type are mostly used (subject to equipment constraints) in computer games, teleconferencing applications, and movies distributed on media formats such as DVD or Blu-ray discs.

One of the most important problems of current spatial audio systems is terminal interoperability. Consider two users on different sides of a multimedia network sharing a virtual environment. Assume that one user has a 5.1 multichannel audio system and the other a binaural audio system. For these two parties to share a similar auditory experience, the multichannel audio signals generated for the first user have to be converted to binaural signals, and the binaural signals generated for the second user have to be converted to multichannel signals. This requires the design and execution of computationally complex and costly transcoding algorithms. If we consider a third user sharing the same virtual environment but using an Ambisonics system, we would need to design and run six transcoding algorithms: binaural to Ambisonics, multichannel to Ambisonics, Ambisonics to multichannel, binaural to multichannel, Ambisonics to binaural, and multichannel to binaural. In general, N(N-1) transcoding algorithms have to be run concurrently for N different types of spatial audio systems.
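The N(N-1) count above follows from needing one dedicated transcoder for every ordered (source, target) pair of formats. A minimal sketch of that counting argument (the format names are just the three examples from the text, not any actual codec identifiers):

```python
from itertools import permutations

# The three spatial audio formats used as examples in the text.
formats = ["binaural", "multichannel", "Ambisonics"]

# One transcoder is needed per ordered (source, target) pair,
# since e.g. binaural -> Ambisonics and Ambisonics -> binaural differ.
transcoders = list(permutations(formats, 2))

n = len(formats)
print(len(transcoders))  # N(N-1) = 3 * 2 = 6
for src, dst in transcoders:
    print(f"{src} -> {dst}")
```

With a fourth format the count would already jump to 12, which is why the project targets a single terminal-independent representation instead of pairwise transcoding.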

Considering that some users will be on low-end computational terminals such as tablet PCs, it becomes apparent that this kind of bidirectional transcoding is not feasible. Another important issue with spatial audio systems is that system-specific recordings in general do not allow (interactive) modification and editing of the spatial properties of the recorded audio scene (the directions and distances of sound sources, the reverberation properties of the recording venue, etc.). For instance, it is not possible to selectively increase the level of the singer and decrease the level of a single instrument in a 5.1 multichannel recording of a concert. This is because the interchannel magnitude and phase relationships in spatial audio recordings are essential for encoding the spatial information of the recorded sound scene correctly. It is therefore very hard, if not impossible, to edit the sound signals to recompose a recorded sound scene.

The aim of this project is to process audio signals from special microphone arrays in order to separate the sound scene into sound objects and acoustical features, allowing spatial audio to be recorded and coded in a terminal-independent format. The project is being carried out at the METU Graduate School of Informatics and is funded by the Scientific and Technological Research Council of Turkey (TUBITAK) under the 1001 Support Program for Scientific and Technological Research Projects, contract number 113E531. Please use the navigation bar on the right hand side for more information.
