Tonal and Dynamic Masking in the Mix
Every musical mix is a unique balance in which the hierarchy of elements and the management of tonal and dynamic masking play crucial roles. This article will guide you through the art of prioritising elements, optimising their tonal fit and enhancing each track to create a clear, cohesive and musically effective mix.
Hierarchy of mix elements
The elements of the mix take on a hierarchical importance, which differs from mix to mix.
This hierarchy depends on many factors, including the nature of the production to be mixed: in hip-hop, for example, the 'beat' and the vocals are generally the most important elements.
In jazz, the ride cymbal is 'more important' than the kick, while 'space effects' are an important element in ambient music and merely incidental in other genres.
The bass drum is a central element in dance music, but it is somewhat less important than the snare in pop music in general.
And so many other examples could be cited.
To reason more clearly, we should consider the nature of each element and its role in the overall musical context, thus imagining a specific hierarchy for each piece of music.
Lead vocals, for instance, are always of primary importance, but the lyrics can matter just as much: a voice that 'narrates' an 'important' text (e.g. in songwriting) will have to emerge more clearly than in other contexts, to ensure maximum intelligibility, while in other cases the lead vocal may remain more immersed in the mix.
The relative importance of the individual elements will influence the way we mix them, be it levels, frequencies, effects, panning or depth.
Clarifying this hierarchy can improve workflow by minimising less important processes, such as spending too much time refining the sound of a synth chord pad to be used at a very low volume in a single verse.
This priority becomes even more important in fixed-budget productions, characterised by a pre-established number of hours to devote to the mix; consequently, we may need to plan quite strictly the time to devote to each element according to this hierarchy and to the difficulty of the process, for example: one hour for the drum treatment, 15 minutes for the bass, an hour and a half for the lead vocals, and so on (obviously, any overall plan with binding, pre-established times will have to leave ample room for contingencies and second thoughts).
This is why it can be decisive to be able to define, when performing a specific operation, how important it is in relation to the total economy of the piece.
By speaking of hierarchy, I certainly do not mean to say that the trumpet is more important than the guitar or other such things; instead, I mean to state that every piece of arranged and orchestrated music has within it musical parts that are more or less important in correspondence with their role, regardless of which instruments or voices perform them.
There are specific pieces and genres of music, for example, in which the harmonic support is more important than the rhythmic support and vice versa, but in general, the criteria defining the hierarchy of musical parts are the same for all.
Apart from the practical reasons mentioned above, why establish a hierarchy?
In the orchestration of a piece of music, there are numerous sound sources which, tonally and dynamically, compete with each other to 'conquer a space of audibility' within the piece, to the partial detriment of the other sources.
Consequently, while avoiding distorting the individual sources, a more or less profound intervention in dynamics and EQ is required, in order to favour a good fit rather than mere superimposition.
To this end, two different operational paths open up:
- tweaking all the sources a little, proportionally
- giving maximum sound quality to the most important elements, and consequently subjecting the less important ones to deeper interventions so that they adapt to the former
Of the two, I usually prefer the second approach, which allows the 'essential' elements of the mix to be respected and enhanced to the highest degree, leaving the other ingredients the less essential task of 'dressing' it up.
It is obvious that between the two criteria there will exist infinite intermediate gradations, more or less adoptable according to the sound content of the specific piece being worked on.
Here is an example of a hierarchical criterion which can serve as inspiration for some areas of pop music and related genres:
Primary elements
To be considered in order of importance:
- the main melodic elements (any soloist: voice, sax, lead guitar, piano, etc.)
- the beat-marking and low-end supporting elements: kick drum, snare drum, hi-hat, bass guitar
- the main rhythmic-harmonic element (only one, e.g. piano, acoustic guitar, etc.)
- the remaining pieces of the drum kit or the other main percussive elements (e.g. congas and bongos)
Secondary elements
To be considered in order of importance:
- secondary melodic phrasing (backing vocals, 'obbligato' phrases of woodwinds, strings, etc.)
- harmonic elements (e.g. string or wind chords, or keyboards, etc.)
- secondary rhythmic elements (percussion, rhythm guitars, etc.)
- other musical elements
- non-musical effects
The above is by no means a rigid criterion, as in fact every song has its own 'recipe' with a specific preponderance of ingredients, just as every musical genre outside rock-pop may require quite different criteria of 'hierarchy'.
It should be considered that these criteria will only be fully applicable when the recording has been made in overdubs, or in any case when the sonic independence between the audio tracks is sufficiently high, as with direct line (DI) recordings or acoustic recordings made in well-isolated booths or rooms.
When there is strong sonic bleed between tracks, however, one cannot proceed in a strictly hierarchical manner, but must seek an overall balance along an almost obligatory path that will probably lead the mix closer to the proportions of the original takes, in order to safeguard the tonal integrity of all elements.
The hierarchical criterion is especially applicable in the pop sector, where recordings are often made as overdubs, in isolated rooms, or via direct line.
In such cases, it is advisable, before proceeding towards a final mix, to focus on obtaining an essential mix, i.e. one made up of a few primary elements with which an almost complete and convincing result can already be achieved.
Only then can the new ingredients be added, proceeding with the necessary care so as not to upset the balance previously achieved.
Masking in the Mix
Masking is the ability of one sound element to partially cover another.
Louder sounds mask weaker ones, so the louder the volume of one element in the mix, the more clearly it will tend to be perceived, but this will be at the expense of the others.
Tonal masking
This occurs when the masking elements are mainly expressed in the same tonal range as the masked ones.
Competition for the same tonal space is thus the basis of mutual masking.
The problem can be mitigated by adjusting the elements of the mix in a complementary manner, i.e.:
- bringing out specific tonal ranges in each of them
- attenuating the other tonal bands in order to free up space for the other elements in the mix.
This will make each element appear more defined and clear.
Dynamic masking
Percussive instruments come and go, and their peaks are short in duration; a kick, for example, will generally have little or no sound content between one 'hit' and the next. It is therefore unlikely that a 'short' percussive sound, however loud, can mask long sounds: it makes its way with each 'hit' for a very short time, in which it manifests a dynamic preponderance (volume) that allows it to emerge when needed.
Percussion instruments compete for tonal space in various time-limited instants, while other instruments sustain sound for much longer periods and thus constantly struggle to gain tonal space.
A synth pad, the harmonisations of woodwinds, strings or a choir, but also the solo phrasing of voices, strings and woodwinds, and indeed any other sustained sound source at high volume, will all require more attention than percussive ones, because their level, panning and equalisation settings will have a greater impact on the ensemble due to their persistence over time.
Raising the volume of a pad will certainly cause greater masking problems than raising the volume of a snare drum, which, even if it masked the pad, would only do so for very short, negligible instants that would not destroy the continuity of the pad's perceived musical line; if, on the other hand, a loud pad masked a low-volume snare drum, it would do so constantly and cause a serious problem.
In this sense, pitched percussion and plucked string instruments, whose decay is generally less rapid than that of drums, sit somewhere in between.
Due to their characteristics, the piano and sometimes acoustic guitars (both in accompaniment function) can be subjected to a process of dynamic expansion that brings out more of the parts with the most incisive performance while attenuating the others.
In this way, their masking power is reduced at many points, making it easier for the other elements of the mix to stand out.
Interlocking to define
A good tonal interlock will thus allow maximum definition of the musical parts of the arrangement and of the overall sound of the mix.
Let us clarify one point: the masking of an element could, in principle, be resolved simply by raising the volume of the element we wish to bring out; proceeding this way alone, however, risks masking the other elements even more. Consequently, tonal masking will have to be resolved partly by managing volumes and partly by using tone controls.
One can also try to give a hint of harmonic saturation to the element to be enhanced, so as to create harmonics in an otherwise deficient tonal range. This tends to work especially with sources in a low or low-mid register (bass, electric guitar chords, low synths).
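To make the idea concrete, here is a minimal Python sketch of this kind of harmonic saturation (purely illustrative, not any specific plug-in; it assumes numpy, and the drive and mix values are arbitrary placeholders). A symmetric waveshaper such as tanh adds odd harmonics above the fundamental, which is exactly the 'missing' energy we are trying to create:

```python
import numpy as np

def soft_saturate(x, drive=2.0, mix=0.3):
    """Blend a tanh-saturated copy of the signal with the dry signal.

    x     : mono track as a float numpy array in the range -1..1
    drive : how hard the signal is pushed into the tanh curve
    mix   : 0 = dry only, 1 = saturated only
    """
    wet = np.tanh(drive * x) / np.tanh(drive)   # normalised so full-scale peaks stay near +/-1
    return (1.0 - mix) * x + mix * wet
```

Fed with a 100 Hz sine, this adds components at 300 Hz, 500 Hz and so on, reinforcing a bass or low guitar chord in a range where it was previously weak.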
How can tonal interlocking be improved?
Masking analysis
As we have seen, we must first distinguish between impulsive, short sounds (such as percussion) and soft, long sounds (such as voices and strings).
The snare drum, for example, might have an essential tonal range very close to that of the lead vocal, but its short duration will not allow it to mask the latter to any appreciable extent.
A guitar, piano and keyboard pad playing together continuously throughout the song with sustained parts and using roughly the same octave range, on the other hand, would certainly compete with each other to carve out a defined tonal space in the mix.
In general, tonal overlap in the high ranges creates less masking and confusion than in the lower ranges.
Let us analyse what happens in the various bands.
In the lowest band (between 20 and 80 Hz), fortunately, few sound elements are expressed: in rock-pop contexts, for example, we basically find the bass guitar (a long sound with a soft initial peak) and the kick drum (a short sound with an impulsive peak), plus the occasional incursion of the floor toms; consequently, in this potentially critical band, in rock-pop and related contexts it will suffice to obtain a good tonal fit between the kick and the bass.
The tonal range between 80 and 500 Hz is perhaps the one most prone to tonal masking problems, as it retains much of the criticality of the low end but is packed with 'competing' sound sources.
Leaving impulsive sounds aside and considering only long, sustained ones, in this range we find:
- the fundamentals of certain bass notes and their most important natural harmonics
- the low and medium-low notes of instruments such as guitars (electric and acoustic), piano and the 'pads' of keyboards and strings
- the fundamentals and first harmonics of soloists such as voice, sax, lead guitar
The tonal range between 500 Hz and 5,000 Hz is also subject to the same tonal overlap problem, albeit to a lesser extent.
Finally, the top end suffers somewhat less, not least because few elements are expressed massively between 5,000 and 20,000 Hz; this area, while still crowded with the natural harmonics and overtones of all the sound elements (which in certain tracks we can even drop altogether), will essentially be occupied only by very bright, subtle elements such as drum cymbals, the triangle and similar sources.
Tonal interlocking
Methods for optimising the tonal interlock are all those aimed at freeing up useful tonal space for the other sound sources.
Here, then, is an operational decalogue:
- Eliminating the tonal bands below the element's lowest fundamental
This is achieved by means of a high-pass filter (HPF) or its functional counterpart, a low-shelving EQ.
It must be understood that, at times, below the fundamental there will also be noise components that contribute to the body of the sound; these can be attenuated with a standard slope of 6-12 dB/oct in a sparse mix, or even eliminated with a drastic slope of 24-60 dB/oct in a mix that is dense in the low and/or low-mid range.
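As a minimal offline sketch of such a filter (my own illustration, assuming Python with numpy/scipy; the track name and cutoff values are hypothetical), a Butterworth high-pass gains roughly 6 dB/oct of slope per filter order:

```python
from scipy.signal import butter, sosfilt

def high_pass(x, fs, cutoff_hz, slope_db_oct=12):
    """Butterworth high-pass; each filter order adds roughly 6 dB/oct of slope."""
    order = max(1, slope_db_oct // 6)          # e.g. 12 dB/oct -> 2nd order, 48 dB/oct -> 8th order
    sos = butter(order, cutoff_hz, btype="highpass", fs=fs, output="sos")
    return sosfilt(sos, x)

# e.g. clean up a bass guitar whose lowest fundamental is E1 (about 41 Hz):
# fs = 48_000
# cleaned = high_pass(bass_track, fs, cutoff_hz=35, slope_db_oct=24)
```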
- Eliminating the higher tonal bands
This practice is risky in that it will cut off some of the natural harmonics and other overtones of the sources, so this expedient should only be used on 'dark' elements that are not functionally expressed in those bands, such as the kick drum, bass guitar, toms and a few others.
The cutoff frequency should be chosen case by case, also considering the tonal crowding of the super-high zone, by applying a low-pass filter with a gentle to medium slope (6 to 18 dB/oct) and a cutoff between 5 and 12 kHz. In any case, it is good practice to cut frequencies above 20,000 Hz on every track and every bus, even with a drastic slope of 48 dB/oct, for example.
This helps to drastically reduce the risk of aliasing, i.e. the generation of unwanted components in the low and mid range when content exceeds half the sampling frequency of the DAW session (for 48 kHz sampling, the cut should sit well below 24 kHz, so a 20 kHz cut will do just fine).
At higher sampling frequencies (e.g. 192 kHz), of course, the aliasing problem becomes much less relevant and the ultrasonic cut becomes a negligible practice.
Bear in mind that aliased components infiltrating the low end cause a slight timbral distortion and a certain harshness; eliminating them therefore maintains greater sonic cleanliness and definition, helping to limit the causes of masking.
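The Nyquist arithmetic and the ultrasonic cut can be sketched in the same way (again a sketch of my own; the 20 kHz cutoff and 48 dB/oct slope simply mirror the figures quoted above):

```python
from scipy.signal import butter, sosfilt

def ultrasonic_cut(x, fs, cutoff_hz=20_000, slope_db_oct=48):
    """Low-pass every track/bus well below the Nyquist frequency (fs / 2)."""
    nyquist = fs / 2                            # e.g. 48,000 Hz sampling -> 24,000 Hz Nyquist
    cutoff = min(cutoff_hz, 0.95 * nyquist)     # keep the cutoff safely below Nyquist
    order = max(1, slope_db_oct // 6)           # 48 dB/oct -> 8th-order Butterworth
    sos = butter(order, cutoff, btype="lowpass", fs=fs, output="sos")
    return sosfilt(sos, x)
```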
- Attenuating the low end of polyphonic accompaniment instruments
We are talking about the instruments that accompany a soloist, such as guitars, piano, a keyboard pad or a string harmonisation.
Their range often reaches down into the frequencies where the bass fundamentals are expressed (these mostly lie between 30 and 170 Hz, occasionally reaching 200-240 Hz).
In order to avoid excessive overlapping in the bass range, a slight but progressive attenuation of the overlapping frequencies will often be appropriate, carried out by means of a low-shelving EQ set between 150 and 300 Hz, with an attenuation slope of 6 dB/oct or even more.
The slope and frequency can be set by ear, but they will largely depend on the actual range of the bass part in the specific track being worked on (if, for example, the bass line sits between D at 74 Hz and B at 124 Hz, it is advisable to choose a suitably low corner frequency so as not to leave a frequency band uncovered, creating a tonal 'hole').
The use of static EQ to diminish mutual masking between a piano (left) and a guitar (right). Both instruments were previously optimised by means of preliminary equalisation (carried out 'upstream' with other equalisers), so only the subsequent de-masking operations carried out during mix equalisation are visible here. The musical parts of the two instruments were played at the same time, both in a medium register, so they tended to partially mask each other. Frequencies below 80 Hz and above 20 kHz were first cut, and frequencies below 300 Hz were attenuated slightly and progressively. Subsequently the 1,200 Hz band was boosted in the piano and attenuated in the guitar; similarly, the 4,400 Hz band was boosted in the guitar and attenuated in the piano; the latter finally received a brightness boost around 7 kHz. To complete the de-masking, the piano and guitar were panned symmetrically to opposite sides of the stereo field (40% L and 40% R).
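A progressive low-shelf attenuation of this kind can be approximated with a first-order complementary split (a sketch of my own, again assuming numpy/scipy; the 200 Hz corner and -6 dB depth are placeholder values within the 150-300 Hz / 6 dB/oct region suggested above):

```python
import numpy as np
from scipy.signal import butter, sosfilt

def low_shelf_cut(x, fs, corner_hz=200, cut_db=-6.0):
    """Simple first-order low shelf: split at the corner and scale only the low band.

    x must be a float numpy array. First-order low- and high-pass halves sum back
    to the original signal, so the region above corner_hz is left untouched
    (6 dB/oct transition into the attenuated low band).
    """
    lo = sosfilt(butter(1, corner_hz, btype="lowpass", fs=fs, output="sos"), x)
    hi = x - lo                                 # exact complementary high band
    gain = 10 ** (cut_db / 20)                  # -6 dB -> about 0.5 in linear terms
    return hi + gain * lo
```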
- Distributing sources of similar tonal range differently across the stereo field
If, for example, we have a stereo synth pad, a piano and an acoustic guitar all accompanying a soloist on the same frequency band, a typical solution is to assign each of these elements its own distinct position on the stereo front.
We could, for example:
- keep the pad centred by panning its two channels L and R to opposite sides, with maximum width at 100% L and 100% R;
- place the guitar at 50% on the left and, symmetrically, the piano at 50% on the right.
The pan positions of the piano and guitar could be pushed further, up to 85-90% towards L or R (with further improvements in the tonal interlock), but such sources would then need generous stereo reverberation to distribute the ambience of the 'cornered' element across the entire stereo front.
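For those who like to see the arithmetic behind the pan-pot, here is a minimal constant-power panning sketch (my own illustration; a given DAW may use a different pan law):

```python
import numpy as np

def pan(x, position):
    """Constant-power pan: position -1.0 = hard left, 0.0 = centre, +1.0 = hard right."""
    theta = (position + 1.0) * np.pi / 4         # map [-1, 1] onto [0, pi/2]
    return np.cos(theta) * x, np.sin(theta) * x  # (left, right)

# e.g. the guitar at 50% left and the piano at 50% right, as in the example above:
# gtr_l, gtr_r = pan(guitar, -0.5)
# pno_l, pno_r = pan(piano, +0.5)
```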
- Attenuating and enhancing tonal bands in a complementary manner
Let's take as an example a piano and a guitar playing an accompaniment line in parallel: in one of these instruments it might be useful to boost (for example) the upper-mid range and attenuate the mid range, and then perform the exact opposite operation on the other source, boosting the mid range and attenuating the upper mids.
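A pair of complementary peaking filters of this kind could be sketched with the well-known 'Audio EQ Cookbook' (RBJ) biquad (my own illustration; the 1,200 Hz and 4,400 Hz bands and the ±3 dB amounts simply echo the piano/guitar caption above and are not prescriptive):

```python
import numpy as np
from scipy.signal import lfilter

def peaking_eq(x, fs, f0, gain_db, q=1.0):
    """RBJ peaking filter: boost (gain_db > 0) or cut (gain_db < 0) around f0."""
    a = 10 ** (gain_db / 40)
    w0 = 2 * np.pi * f0 / fs
    alpha = np.sin(w0) / (2 * q)
    b = np.array([1 + alpha * a, -2 * np.cos(w0), 1 - alpha * a])
    den = np.array([1 + alpha / a, -2 * np.cos(w0), 1 - alpha / a])
    return lfilter(b / den[0], den / den[0], x)

# complementary treatment, with placeholder bands:
# piano  = peaking_eq(peaking_eq(piano,  fs, 1200, +3), fs, 4400, -3)
# guitar = peaking_eq(peaking_eq(guitar, fs, 1200, -3), fs, 4400, +3)
```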
- Using a multi-band compressor
In place of, or in addition to, the static EQ described above, it is often preferable to operate with a dynamic EQ (in the form of a multi-band compressor) to achieve a more effective result without distorting the original sounds.
It suffices to identify the critical tonal band shared by several elements and to limit it, in each element independently, whenever it exceeds a certain threshold.
By using a multi-band compressor, the tonal band can be attenuated only at its peak and only to the extent necessary, without changing the perfect tonal balance obtained during the preliminary equalisation of each element.
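A rough single-band sketch of the idea follows (my own illustration, assuming numpy/scipy; a real multi-band compressor splits the signal into several bands and exposes attack/release controls, and the threshold, ratio and band limits here are placeholders):

```python
import numpy as np
from scipy.signal import butter, sosfilt

def dynamic_band_cut(x, fs, lo_hz, hi_hz, thresh_db=-24.0, ratio=3.0, release_ms=20.0):
    """Attenuate one frequency band only when it exceeds a threshold (single-band dynamic EQ)."""
    band = sosfilt(butter(2, [lo_hz, hi_hz], btype="bandpass", fs=fs, output="sos"), x)
    rest = x - band                                     # the remainder of the signal, left untouched

    # envelope follower on the band: instant attack, smoothed release
    coef = np.exp(-1.0 / (fs * release_ms / 1000.0))
    env = np.empty_like(band)
    level = 0.0
    for i, s in enumerate(np.abs(band)):
        level = max(s, coef * level)
        env[i] = level

    env_db = 20 * np.log10(np.maximum(env, 1e-9))
    over_db = np.maximum(env_db - thresh_db, 0.0)       # how far the band exceeds the threshold
    gain = 10 ** (-over_db * (1.0 - 1.0 / ratio) / 20)  # reduce only the excess, only in this band
    return rest + band * gain
```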
- Carving out a tonal band for the soloist
This expedient reserves more tonal space for the soloist, allowing it to remain more immersed in the mix without losing definition, which in turn frees up a lot of space for the other, secondary elements.
Specifically, it is a matter of attenuating, in the elements that disturb the soloist, the zone corresponding to its fundamentals and first harmonics (200-1,000 Hz, depending on the type of voice and the individual case), or to the 'vocal formant' (around 2,500 Hz).
When hollowing out an element on a mid-band to free up space for another element, in many cases one will feel the need to compensate by enhancing an adjacent band of the attenuated element.
De-masking intervention for a quartet (female vocals accompanied by acoustic guitar, electric bass and drums). Prior to this intervention, the two tracks underwent an initial balancing with the usual preliminary equalisation. On the guitar (left EQ) the bass below 75 Hz and the treble above 20 kHz were cut. The midrange, with a wide Q (0.50), was also attenuated to free up space for the voice (right EQ), which was boosted on the same band. On the vocals, the 350 Hz region, which sounded a bit muddy, was attenuated, also to make room for the very pleasant natural bass of the guitar that emerged after digging out its mids. After this reduction the guitar sounded a little muffled, but this created a 'special' magic together with the voice. The voice was also given brilliance by boosting 10 kHz while cutting above 15 kHz, the same tonal range where the guitar received a boost to compensate for its mid-high attenuation. The result was an excellent tonal fit that allowed the vocal to sit well within the guitar without being overpowered even in the softest passages, and that facilitated dynamic control in the subsequent mastering.
- Containing secondary sources with a side-chain compressor
This expedient is particularly effective when the key (trigger) input of the side-chain compressor is fed by the soloist of the song; it allows the volume of the secondary elements that disturb the soloist to be reduced, creating more dynamic space for it only in the moments in which it is active.
This also allows the soloist's average volume to be kept lower, in turn freeing up tonal space for the other elements.
In order to avoid a pumping effect on the attenuated element (i.e. an excessively rapid rise in volume after compression), it is necessary to contain the action of the compressor within a range of about 2 dB (3 at most) and to dose the attack and release times so as to obtain maximum effectiveness without drawing attention to the artifice; to begin with, one can try an attack of 50 ms and a release of 100 ms, then vary these values until the most natural result is obtained.
Side-chaining with a multi-band compressor could further improve effectiveness and limit the artefacts generated by the compression, by setting the plug-in to obtain, for example, a maximum attenuation of 3 dB in the specific tonal band concerned (generally around 3,000 Hz) and a smaller reduction (or none at all) in the other bands.
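Outside a DAW, the behaviour can be sketched as a simple key-triggered gain envelope (my own illustration; max_cut_db, the attack/release times and the detection threshold mirror the starting values above and are all adjustable assumptions):

```python
import numpy as np

def sidechain_duck(track, key, fs, max_cut_db=2.0, attack_ms=50.0, release_ms=100.0, key_thresh=0.05):
    """Duck `track` (same length as `key`) whenever the key signal, e.g. the lead vocal, is active."""
    atk = np.exp(-1.0 / (fs * attack_ms / 1000.0))
    rel = np.exp(-1.0 / (fs * release_ms / 1000.0))
    floor = 10 ** (-max_cut_db / 20)               # gain never drops below this (about 2 dB of cut)

    gain = np.ones_like(track)
    g = 1.0
    for i, k in enumerate(np.abs(key)):
        target = floor if k > key_thresh else 1.0  # duck while the key is above its threshold
        coef = atk if target < g else rel          # move down at attack speed, recover at release speed
        g = coef * g + (1.0 - coef) * target
        gain[i] = g
    return track * gain
```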
- Diversifying elements in depth
When a solo element is partly masked by a secondary element, one can also try to 'move' the latter away from the up-front presence zone, using all the parameters already suggested for this purpose, i.e. decreasing its direct volume, increasing the level of its echoes and reverbs (sometimes even exaggerating their stereo width a little), rolling off the highs and lows a little, softening the transients with a compressor, and so on.
In this way, the element whose volume has been sacrificed to let the other stand out will still be perceived in 'diffusion', thanks to the stereo reverb, while leaving the 'presence' space free.
- Using interlocking arrangements
This is not a mix gimmick, but I wanted to mention it to emphasise an important concept: a well-written arrangement would require the musical parts to be written 'interlocking', i.e. appropriately interspersing the rhythm of phrases and accents and using different ranges for the overlapping elements (e.g. different octaves, according to the dictates of good orchestral ranking).
By proceeding in this way, the definition of the parts would already be obtained at the source, using only the tools of musical writing, as the parts would then always remain distinct without the need for corrective contrivances in the mix.
Unfortunately, these techniques are only known and mastered by composers, orchestrators and arrangers with a high musical culture, and are therefore often in short supply in pop, in which, along with some very good musicians, too many producers with little quality musical background try their hand.
However, it must be acknowledged that even in genres of popular derivation (well-crafted rock, for example), 'orchestration' concepts, customs and conventions have gradually established themselves, so that in the best and most mature productions, and with the evolution of the style, the sound elements have reached interlocking criteria that are satisfactory and functional for the expression of that particular musical genre.
Dynamic interlocking
To achieve a lively and interesting mix, the dynamic expressions of the performances should be preserved to the highest degree.
It has to be said that dynamic expression should stem, at its base, from dynamically coherent and richly expressive performances, if possible conceived with an interlocking criterion.
If, for example, the simultaneous piano and guitar parts, while both insisting on the middle register, were designed to alternate (rather than overlap) the performance accents, the dynamic accents of the two instruments would manifest themselves at different points, contributing greatly to the definition of the parts without using too much space in the mix.
Unfortunately, in overdubbed performances the expressive dynamics are often deadened by the lack of interplay between the musicians, unless they are particularly experienced or guided by a good artistic director.
When mixing, we may therefore find ourselves in front of tracks performed with good expressive dynamics or in front of dynamically 'flat' tracks, to which it will be very difficult (if not impossible) to restore a hint of dynamic liveliness.
Respect for dynamics
Gone are the days of the 'loudness-war' when people rushed to exaggeratedly compress tracks, busses and masters in a 'perverse' attempt to impose the volume of their tracks on CD and radio compilations.
Now, in the age of streaming, this extreme tendency no longer makes much sense and, in any case, it is appropriate to leave it to the mastering stage, among its other tasks, to finalise the loudness of the master appropriately.
However, the practice of compressing masters to the max persists among many operators, with results that I personally find almost always regrettable.
The task of compression
Nowadays, compression is therefore no longer required to 'give volume' to the track within the audio medium, but rather it must meet a few but far more important requirements:
Modelling purpose
draw at will the intensity ratio between the transient and the sustain of sounds, enhancing or reducing one with respect to the other; it goes without saying that this practice is not absolutely indispensable, but it can respond to specific creative needs aimed at making certain performances softer or, on the contrary, more aggressive; another modelling use of the compressor, as we shall see, consists in compressing a parallel clone of a track to an extreme degree and then dosing it against the original, so as to obtain a harder, denser and richer sound in which the ambient colours of the take, pushed to an 'in-your-face' level by this type of compression, can be blended in to taste.
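The parallel (clone) variant can be sketched as follows (my own illustration; the threshold, ratio, release and blend values are arbitrary starting points, and a real compressor offers far more control):

```python
import numpy as np

def parallel_compress(x, fs, thresh_db=-30.0, ratio=8.0, release_ms=80.0, blend=0.3):
    """Heavily compress a clone of the track and blend it back with the dry signal."""
    rel = np.exp(-1.0 / (fs * release_ms / 1000.0))
    env = np.empty_like(x)
    level = 0.0
    for i, s in enumerate(np.abs(x)):
        level = max(s, rel * level)                           # instant attack, smoothed release
        env[i] = level
    env_db = 20 * np.log10(np.maximum(env, 1e-9))
    over_db = np.maximum(env_db - thresh_db, 0.0)
    wet = x * 10 ** (-over_db * (1.0 - 1.0 / ratio) / 20)     # the heavily compressed clone
    return (1.0 - blend) * x + blend * wet                    # blend = 0 dry only, 1 clone only
```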
Levelling purpose
better define the underlying musical parts, especially in very dense mixes; compression should assist the levelling of volumes already carried out with the volume faders; this levelling is achieved on the one hand by containing excessive peaks and on the other by reinforcing sound emissions that are too weak; in order to maintain maximum dynamic expressiveness, however, interventions should be kept to the minimum possible extent.
For best results it will very often be preferable to perform detail work by careful volume management with faders rather than abusing compression.
A judicious use of the above-mentioned anti-masking techniques will often allow levelling compression to be dispensed with or limited, while preserving the original dynamic expressiveness.
Bonding purpose
create a tonal and dynamic 'glue' between the various sources, leading to a greater sonic compactness to be dosed according to the musical genre being worked on.
Each musical genre will in fact require more or less of this glue: the demand is highest in dance and hip-hop, quite high in rock and pop in general, moderate in expressive modern genres such as fusion and in modern folk and jazz, and minimal or even non-existent in purist genres such as classical music and traditional jazz.
This gluing should be handled mainly in the mastering phase, but in certain cases it can be arranged on specific groups of correlated tracks, in order to pre-determine in them a specific sound identity, functional to the 'sound' of the track.
It is therefore not a question of 'giving volume' to the song by means of compression, but rather of dosing its action in the various phases, in order to obtain a specific sound, which is functional to the context of the musical genre.
Any excess compression, in fact, will diminish the pathos of the performances, to the point of creating a flat and 'boring' mix; it is for this reason that mastering must also be conducted ensuring that the target loudness required by the music industry is reached without destroying dynamic expressiveness.
In certain cases (e.g. in the presence of excessively flat performances, or of recordings made with heavy compression on the way in), as will be seen, it will even be necessary to attempt the opposite process of expansion, in order to create or recover a dimension of greater dynamic vivacity.
As will be seen, this process is almost impossible when applied to a mix, whereas it can often succeed in revitalising a single track or a not too dense percussive ensemble.
An expander, adjusted to restore some liveliness to an uninspiring congas & bongos performance. The dynamics of the performance were slightly restrained at the unaccented points, just enough to bring about a more incisive delivery without revealing audible artefacts. With the threshold at 0 dB and an immediate attack, the entire dynamic range of the track was expanded; the attenuation was contained within 4 dB thanks to the range control.
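As a final sketch, the kind of downward expansion described in the caption above could look like this (my own illustration; the ratio and release are assumptions, while the 0 dB threshold and 4 dB range mirror the caption):

```python
import numpy as np

def downward_expand(x, fs, thresh_db=0.0, ratio=1.5, range_db=4.0, release_ms=120.0):
    """Downward expander: attenuate the signal below the threshold to widen its dynamics."""
    rel = np.exp(-1.0 / (fs * release_ms / 1000.0))
    env = np.empty_like(x)
    level = 0.0
    for i, s in enumerate(np.abs(x)):
        level = max(s, rel * level)                           # immediate attack, smoothed release
        env[i] = level
    env_db = 20 * np.log10(np.maximum(env, 1e-9))
    under_db = np.maximum(thresh_db - env_db, 0.0)            # how far below the threshold we are
    cut_db = np.minimum(under_db * (ratio - 1.0), range_db)   # the range control caps the attenuation
    return x * 10 ** (-cut_db / 20)
```

With the threshold at 0 dB (digital full scale) the whole track falls below it, so the entire dynamic range is expanded, and the range parameter keeps the attenuation within 4 dB, as in the caption.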