Interview with José Luis Díaz

Sound in film production is a crucial component that complements the impact that images have on the viewer. The difference is easy to notice when we watch a movie through a fine audio system instead of through the internal speakers of the TV.

Recording audio for audiovisual productions has its challenges. Sometimes you can place microphones on set, but often it is necessary to recreate the sound in a post-production studio. Beyond the capture of sound, production criteria and subsequent audio mixes are items that require skill and experience.
José Luis Díaz is a professional with extensive experience in sound design. Listing all the movies he has worked on would take too long, so I will highlight his work on “El secreto de sus ojos”, the Argentine film that won the Oscar for best foreign film in 2010, and “Relatos salvajes”, the renowned film directed by Damián Szifron.

In the interview, José Luis offers very interesting comments on the process of recording, editing and mixing, which help us to understand all the work that goes into an audiovisual production.

How did you start your career in sound for film and TV?

José L. Diaz: I have a technical background from high school. I graduated in Servomechanisms and as a Control Mechanisms Technician. Then, when I was already studying film, I noticed that none of my classmates, who came from other, more humanistic subjects, had the ability or interest to perform some of the technical tasks in our exercises, such as the sound on our shorts. Naturally, I took care of that stage in almost all cases. At that point, I decided to work in film and not for Xerox (the company where I worked as a photocopier repair technician). At Alex Laboratory I met Nerio Barberis, who was working at that time on the sound for “Nazareno Cruz y el Lobo” by Leonardo Favio. I was a kind of intern at his side. At one point I was offered the job of sound assistant on a film by Leopoldo Torre Nilsson called “Piedra Libre”. At that time, I quit Xerox, and I have never stopped working in sound for film.

Many films coming from Hollywood surprise the audience in the cinema with their sound; however, these usually do not receive awards or prizes. From your perspective, how can you tell when a movie has good sound?

José L. Diaz: It is very difficult to explain. Often, films that receive sound recognition by the Academy of Motion Picture Arts and Sciences are those of the “super production” type. That is, those films which have large amounts of sound special effects; war films, disaster, science fiction, etc.

Regardless of the high quality of the sound work in these cases, the Academy tends to reward films that sustain its industry, or that bring an increased workload to that industry. It rarely rewards work that is financially modest but nonetheless very well done or innovative in the expressive use of sound. But there are exceptions. Whiplash is an example. This is a film which was shot in just 18 days in a single location, but with excellent capture of direct sound and very intelligent and sensitive sound post production. Well, this film won the Oscar for Best Achievement in Sound Mixing. But to be clear, all movies that receive an Oscar in one of the sound categories have excellent professional work behind them and do not receive these awards capriciously.

But back to your question: we can say that a film has good sound when its story is magnified emotionally by what is heard. Sound is able to hypnotise viewers, to subdue them without their being aware of it. It is capable of great expressive power with few elements. I would say good sound helps to immerse the viewer in an atmosphere in a way that no other discipline can do as effectively and with so little.

Let’s say you have two takes of the same scene. One sounds better than the other, but its sync is not so good. What would you do in such a case?

José L. Diaz: Here, too, there is a lot one could write. First, we have several pieces of software that help us to solve this problem by stretching or compressing the audio material without changing its pitch. They can trace the timing of the original take and transfer it to another take, for example. But it should be analysed case by case, because there are thousands of possible solutions depending on what the material is like.

Considering how clean Foley recordings usually are, how do you take care during mixing not to exaggerate certain sounds?

José L. Diaz: If a sound element is free from noise (hum, clicks, distortion, etc.), that should not lead to its “overuse”. During mixing we work, in general, on the principle that the sound sticks to the screen. We are not looking to sound bigger or smaller than what you see. If the footsteps are perfect from every point of view, that does not mean that they should have more weight than the story calls for (or than we think it calls for). Those footsteps, clothes, or Foley generally, should not be mixed at a higher volume than the story needs. We must not distract the viewer with sounds that do not help tell the story. Everything we put forward, sonically speaking, has to contribute to the story. Nothing should distract.

During the mixing of the scenes, is it usual for some of the original sound from the shoot to be retained, or is everything recorded and produced after the actors finish their work on location?

José L. Diaz: The direct sound, which is captured synchronously with the image during shooting, is in general essential. During filming the actors interact under the supervision of the Director, wearing the same clothes, making the physical efforts that are seen in the film, in climatic conditions that are perceived in a certain manner. If you had to re-record some of these dialogues later in a dubbing studio, it would be quite difficult for the actress or actor to replicate the crying, laughing or physical effort made three or four months earlier, without the presence of her or his scene partner. The direct sound must be recorded very well. The set must have the minimum amount of noise possible, and the actors should be as comfortable as possible. Dubbing should be avoided as much as possible, unless there are good expressive reasons, for example if we know that we will need to move progressively away from the ambient noise as a conversation progresses, ending with just the words and music. The locomotives and train horns of Retiro station fade away as he tries to declare his love for her and cannot.

In a case like this it is clear that the direct sound is not feasible, or rather that it is better to dub the scene. But in general, one must always preserve the voices of the actors against all odds.

So what elements of the sound of a movie are captured at the moment of shooting?

José L. Diaz: The voices of the actors, and occasionally some extra elements that, for production reasons, would be inconvenient to record later. For example, if the production team hired 300 extras to fill a theatre and clap rhythmically, the Production Sound Mixer should record a Wild Track (sound that is recorded specifically for use in sound post production, while the camera is not filming) of that clapping. Why? Because those 300 extras are already in situ, in the same theatre as in the film, with the same sonic characteristics as the dialogue that was recorded as live sound. The integration of claps recorded in the same place with the dialogue is guaranteed. Now, what if the Production Sound Mixer does not record those claps (because he did not realise, or because, when he asked, the Assistant Director refused since the shooting schedule was very ambitious and running late, for example)? In sound post production, we would ask to be provided with 300 extras to record this clapping (and shouts, etc.). The cost of this is very high. They would not give us 300 but 30. And clearly, even if we layer the recording several times, those 30 are never going to sound like 300.

In addition, even if they give us the same theatre where the scene was filmed to make this recording, the acoustics of the theatre are no longer the same as when they filmed. Why? Because the 300 extras (plus the 30 or 40 technicians present during the shooting), the curtains and the decorations are no longer there to absorb sound. The unit of absorption is the sabin. 300 human bodies add many sabins to a room. When they give us only 30 extras, the acoustic box that is the theatre hall will sound empty, with a much greater reverberation time than when the theatre was full. Therefore, that opportunity was lost during filming, and the movie will have a lower sound quality, much lower than it might have had otherwise.
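The effect of the missing audience on reverberation time can be roughed out with Sabine's formula, RT60 ≈ 0.161 · V / A, where V is the room volume in cubic metres and A the total absorption in metric sabins. The sketch below is only an illustration: the hall volume, the absorption of the empty hall and the absorption per person are assumed figures, not values from the interview or from any real theatre.

```python
# Rough Sabine estimate of reverberation time for a full vs. nearly empty hall.
# All numeric inputs below are hypothetical, chosen only for illustration.

SABINE_CONSTANT = 0.161  # metric form of Sabine's formula (seconds per metre)


def rt60_sabine(volume_m3, absorption_sabins):
    """Return the estimated RT60 in seconds for a room of the given volume."""
    if absorption_sabins <= 0:
        raise ValueError("total absorption must be positive")
    return SABINE_CONSTANT * volume_m3 / absorption_sabins


def hall_absorption(empty_hall_sabins, people, sabins_per_person=0.45):
    """Total absorption: the empty hall plus an assumed amount per person."""
    return empty_hall_sabins + people * sabins_per_person


VOLUME_M3 = 3000.0         # assumed hall volume
EMPTY_HALL_SABINS = 250.0  # assumed absorption of the hall with nobody in it

# Shooting day: 300 extras plus roughly 40 technicians.
rt_full = rt60_sabine(VOLUME_M3, hall_absorption(EMPTY_HALL_SABINS, 340))
# Post-production session: only 30 extras.
rt_sparse = rt60_sabine(VOLUME_M3, hall_absorption(EMPTY_HALL_SABINS, 30))

print(f"RT60 with 340 people: {rt_full:.2f} s")
print(f"RT60 with 30 people:  {rt_sparse:.2f} s")
```

With any plausible set of inputs the second figure comes out longer than the first, which is the point made above: the same hall with fewer bodies in it is a more reverberant room.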

Recording dialogue outdoors surely presents many challenges. What are the problems you face at the time of the mix to get what the director is looking for?

José L. Diaz: Well, “outdoors” is a very vague word. It ranges from a desert to a canyon (Colorado), an alley with high parallel walls, the Bombonera, etc. Each of these locations has a unique response. One problem to be solved is making everything sound as if it belongs there. During post production we add sound elements that were not present when the live sound was captured (sheets of newspaper blown by the wind, a rusty metal sheet swaying in the wind on a fence, the howling of the wind against some wires, distant car horns or sirens, etc.). The challenge is for these exogenous elements to be integrated into the soundscape, so that the resulting environment has unity and nobody can detect that they were added in the laboratory.

What resources do you use to convey the feeling of the sound of the scene to an audience that is in a room with a different acoustic response?

José L. Diaz: It is with reverberation processors that we give cohesion to the sound elements, so that they integrate with each other and adhere to the screen. Not too much and not too little; a reverb neither too long nor too short. Our hearing system is very clever. It can imagine what an environment might be like just by hearing the reverb tail that returns from a given set. Furthermore, cinemas have their own reverberation, typically 1 second (or more). So, we must understand that a percentage of the reverb that we add will be hidden (inaudible to the audience) in the natural reverberation of the room where the film is projected. It is, therefore, very important to mix a movie in a cinema mix room that is acoustically designed to emulate real theatres. If instead we mix in a small studio, where the reverberation time of the Control Room is very short (0.3 seconds, for example), we will tend to add too little reverb. We very easily detect if there is too little reverb. A mix made in these conditions can sound very dry in a real movie theatre.

What mix formats are used for cinema productions?

José L. Diaz: 5.1 or 7.1. The Dolby Digital Print Master process is no longer performed. 35 mm copies of movies are no longer made. Everything ends up as a DCP. That is, we deliver PCM files without data compression. Dolby Digital is a format in which the audio information is compressed. The audio channels are converted into something like a good quality mp3 file. The reason is that the physical medium, the 35 mm optical print, cannot accommodate all the data, the entire volume of bits of 6 discrete channels, 5 of them full range (20 Hz to 20 kHz) and one more limited (20 to 120 Hz). The solution that was found required compressing the information psychoacoustically. Existing algorithms were adapted to compress the information and to provide the redundancy needed to recover lost or corrupted data, so that screenings are reliable. Well, with DCP it is not necessary to compress the information. The audio is in its “natural” state, in PCM format, in 5.1, 7.1, Auro or Atmos.

Do you use any personal scheme to arrange the elements of a mix within 5.1 or 7.1 environments?

José L. Diaz: Editing the ambiences for a film whose final mix format is 7.1 requires more attention than if it were 5.1. Having another pair of surround channels means one must place the vaguest ambiences in the rear surrounds. In a 5.1 mix we place the most active and most defined ambiences in the front, that is, in the channels behind the screen, and we put the more diffuse ambiences in the two surround channels. In 7.1 format there are 4 surround channels, 2 on the sides of the room and 2 on the back wall. So you have to determine the degree of precision, of focus, of the ambiences and send them to their final geographical destination depending on their activity, proximity, diffusion, etc. So, we would place ambiences with clear voices behind the screen, more diffuse voices on the sides, and distant, unintelligible ones on the channels at the back of the room.
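The placement rule described above can be written down as a small lookup table. This is only a sketch of the idea: the three focus labels ("defined", "diffuse", "distant") and the function name are invented here for illustration, and the channel abbreviations (Ls/Rs for 5.1 surrounds, Lss/Rss for side surrounds, Lrs/Rrs for rear surrounds) are common shorthand assumed for the example rather than anything stated in the interview.

```python
# Illustrative routing of ambiences by how focused they are.
# Labels and channel names are assumptions made for this sketch.

FRONT = ["L", "C", "R"]  # channels behind the screen

ROUTING = {
    # 5.1: one surround pair takes everything that is not clearly defined.
    "5.1": {
        "defined": FRONT,
        "diffuse": ["Ls", "Rs"],
        "distant": ["Ls", "Rs"],
    },
    # 7.1: side surrounds take diffuse material, rear surrounds the vaguest.
    "7.1": {
        "defined": FRONT,
        "diffuse": ["Lss", "Rss"],
        "distant": ["Lrs", "Rrs"],
    },
}


def route_ambience(focus, layout="7.1"):
    """Return the list of channels suggested for an ambience of this focus."""
    try:
        return list(ROUTING[layout][focus])
    except KeyError:
        raise ValueError(f"unknown layout/focus: {layout!r}, {focus!r}")


for focus in ("defined", "diffuse", "distant"):
    print(focus, "->", route_ambience(focus, "5.1"), "|", route_ambience(focus, "7.1"))
```

In practice the decision is a matter of listening rather than a fixed table; the sketch only shows why a 7.1 edit needs one more sorting decision than a 5.1 edit.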

How do you place the music within a multichannel mix? What mix format do you send to the composer?

José L. Diaz: Regarding the dynamic placement of music (that is, its volume), there is a hierarchical pyramid in any mix. At the top of the pyramid is dialogue. In the middle is music, and in the lower part we place noises and ambiences. When one of these elements is absent, the elements below it rise to take its place. So, without dialogue, music can take the leading role. Without music, ambiences can ascend. And so on. However, the rules can be broken. There must be a good reason to break them, but when they are broken successfully it is something glorious.

The placement of the music, in my school, is as follows. The melody goes to the centre. Rhythmic and accompanying elements are distributed across the front. Reverberation can go to all channels, especially the surrounds. The strings, if they are continuous, go to the surrounds. Any continuous pad can go to the surrounds. Virtually no instrument should go to the LFE. But these rules can be broken. There must be a good reason, but they can be broken. For example, you should never put percussion in the surrounds, or anything “spicy”, i.e., sounds that have many transients. But if the percussion is rich and appears to have a dialogue between its various components, and if its phrasing is very short, and if the pad is important, it can be distributed very well around the room. That way, this dialogue happens between the front and the back of the room. But the limit is distraction. It is forbidden to distract the viewer with fireworks that have nothing to do with the story. If, by putting items in the back, we distract the viewer, or make the viewer pay attention to what we put in the surrounds, we are in trouble. We have lost the viewer. The person has to be immersed in the story. We should not distract her or him with silly things. It is fine for the viewer to be immersed in a magical environment; but to distract her or him from the story? NEVER!

We usually work in 5.1 or 7.1, but almost no one can give us something like that, because there are virtually no music mixing studios with monitoring in 5.1, and far fewer in 7.1. Moreover, musicians have been trained throughout their lives in stereo. Everyone has two speakers in the living room, all our cars have a stereo playback system, CDs hold only two channels, and virtually all headphones are built for stereo playback. In ordinary life hardly anyone experiences listening in 5.1, still less in 7.1, outside of a movie theatre. So, it is reasonable that they (the musicians) are disoriented when we ask them to deliver music in 5.1 or 7.1. They don’t know what to put in the centre channel, what to do with the surrounds, or what use to give to the LFE. When this happens, which is most of the time, we ask them to deliver stems. Stems are channels, usually stereo, grouped by category: a strings stem, a brass stem, a percussion stem, etc. In turn, there may be two strings stems (one more agile and one more continuous), or two or more percussion stems (timpani on one, triangles on another, and snares on another), etc. We ask that these stems be balanced with each other. That is, at unity gain (all faders at 0 dB) the mix should sound as the composer intends. This gives us a good starting point. With these elements on the mixer we can begin placing the music according to the criteria explained above. We usually ask the musicians not to overdo the reverb. We prefer to deal with this ourselves, because then it will be better integrated.

What are the differences between a mix for film and one for TV? Has your work changed because of content for streaming platforms?

José L. Diaz: Mixes for film can use a much wider dynamic range than mixes for TV or those with a final destination like YouTube, for example. TV dynamic range is restricted to a few dB. That is, the distance between a whisper and a scream is very small in terms of decibels. So you have to compress dynamically. Monitoring on TV speakers is not a bad idea when the programme will be broadcast in that medium. Each streaming platform (YouTube, Apple Music, Spotify, Tidal, Vimeo, etc.) has its own rules on dynamics. There are several articles online about this. This is one of them:

Fabio García
ZioGiorgio Network

© 2001 – 2016 NRG30 srl. All rights reserved
