Audio Watermarking Techniques

Roberta Eklund

Independent Consultant



The digital age is here. One consequence of the digital era is that copies of digital media are identical to the original. This fact has led to great concern over media theft via illegal duplication, and the need to trace ownership of artistic works has spawned the digital data hiding technique known as watermarking. Watermarking is defined as the process of electronically attaching the identity of the copyright owner of an artistic work in such a way that it is difficult to erase. This chapter overviews the requirements of audio watermarking and the current techniques investigated by industry and academia. The current status of these techniques, in terms of meeting the requirements of digital audio watermarking, is presented. Applications of digital watermarking and future directions are also reviewed.



Originally, a watermark was a translucent mark impressed in the paper of official documents, visible when one holds the page up to a light. Digital watermarking is the process of attaching the identity of a copyright owner to a digital audio work in a way that is difficult to remove. The technique is analogous to a stain on a dress that is impossible to get out, or the mark of floodwater upon a building after the tide has receded. The goal of digital audio watermarking is to give the owner of the artistic work the capability to prove the work is theirs.

There are many challenges to audio watermarking. The technical requirements are that the watermark must not add distortion, must not be audible in the signal, must be robust against further audio processing, must not be removable by attacks designed to illegally separate the watermark from the audio, and must be easily identifiable by a decoding mechanism.

Watermark extraction, or the ability to detect and recover the watermark reliably, is the most important aspect of any watermarking technique. If the copyright owner cannot be reliably established, there is of course little reason to hide data in the artistic work in the first place. This is analogous to a thief who can never find their own buried treasure.


Watermarks can have other useful purposes beyond copyright protection. For instance, one could include metadata and other information hidden in the original signal, bound with the audio. Metadata are textual descriptors, such as artist name, copyright, et cetera, that provide additional information about the artistic work. For example, I could send an Internet web site address (URL) in my compressed audio stream that tells the purchaser where to download the album art for this particular piece of music.

Figure 1. Overview of an audio watermarking system (Neubauer and Herre, 1998)


Any audio watermarking system has some basic components: an encoder to insert the audio watermark, and a decoder to extract the audio watermark and identify the original owner. Furthermore, transactional watermarking, or unique watermarking per copy of an artistic work, must be done in real time or, at a minimum, very rapidly (Webreference, 1998).


1.1 Requirements of Audio Watermarking

When first examining a problem, one needs to identify the objectives that define a solution. This section discusses the requirements of audio watermarking in order to meet the goal of digitally binding the owner to their original artistic work (Linnart, 1997).

Another fundamental idea is that detection of the watermark should not require the original cleartext artistic work. Cleartext means there is no watermark in the signal.

Table 1. Necessary conditions of audio watermarking algorithms


Survives Additional Audio Processing

Survives Bitstream and Postprocessing Audio Editing

Support for Multiple Applications

Bound to the Original Digital Audio

Easily Extracted by Detection Algorithm

Destroys Artistic Work upon Attack


1.1.1 Inaudible

Artifacts, noise and other distortions should not be introduced by the watermark. A marketing statement on this attribute might be, "listen, you can't hear it!" as was used by Zhao (1998).


1.1.2 Robustness to Current and Future Audio Compression Technologies

A watermark should survive perceptual audio coding, lossless coding and tandeming, since these technologies are deployed in a variety of digital audio applications today.

Lossless audio compression is the compression of an audio signal with no loss of data. Audio watermarking will automatically survive this technique by the definition of lossless audio compression.

Perceptual audio coding is a lossy compression technology. The original audio signal is analyzed to see which parts, if removed, will not be perceived as missing by the listener. The signal is then compressed and decompressed. The resulting decompressed signal contains only those parts of the original audio work that the listener requires to psychoacoustically reconstruct the piece and perceive it as the original. Therefore, the audio watermark should possess the same properties as the "important" parts of the original audio so that it will not be removed.

Some applications also deploy the use of tandeming, or multiple encodings and decodings of the original audio signal. Audio watermarking should survive this process. If the audio watermark is removed during tandeming, then the audio quality should also be degraded.

Finally, there is a realm of audio and speech compression whose quality level is below the threshold of imperceptible difference between the original and the compressed-decompressed signal. This is the realm of "AM quality", "FM quality" or "magnetic tape quality", i.e. it is "good enough". In this realm, audio watermarking might also be desirable. We have all seen bootlegged copies of famous artists' rock concerts, for example, that are certainly not of the quality level of their officially released works, yet these performances are stolen and sold to adoring fans. Audio watermarking in this case could also assist in tracking these sorts of thefts and give artists more control over what they wish to release to the public and what they do not intend their public to experience.


1.1.3 Audio Signal Processing Robustness

Audio authoring tools, such as digital audio workstations, perform pre- and post-processing of digital audio to ensure the best audio quality for the application. Such processing includes dynamic range compression, limiting, equalization, A/D and D/A conversion, time scaling, pitch scaling, sample rate conversion, linear and non-linear filtering, mixing, addition of noise to the signal and, finally, noise reduction algorithms such as Dolby "C".

Also involved is the editing and cross-fading of the audio work for such applications as commercials, movies, "dance versions" of a song, et cetera.

All of these sorts of manipulations of digital audio data imply that the digital audio watermark must be extractable and readable after this sort of processing has been done.


1.1.4 Binding the Watermark to the Original Signal

The audio watermark must be inextricably bound to the audio signal. This implies that the statistics of the audio watermark should be similar to those of the original audio signal; otherwise a thief could analyze the signal for "musical" statistics and "watermark" patterns and then remove the hidden data. It also implies that the watermark should be embedded in data that is critical to the final audio signal. For example, one would not place the audio watermark in a file header, which could easily be stripped out while the actual audio within that file format is still recovered.


1.1.5 Overall Strategy of Watermark Placement

From the previous sections, we see that there is a certain strategy to audio watermark placement, similar to hiding one's valuables in a hotel room. One should not place the watermark in places in the audio signal that are easy to detect and remove, such as the LSB (least significant bit) of a 24-bit audio sample, or a single frequency bin of a Fast Fourier Transform. One should place the audio watermark where the perceptually critical aspects of the audio signal reside. The reason is that most audio processing applications and lossy compression technologies will leave only the significant portions of the audio signal intact; otherwise these processing technologies would simply destroy the original audio signal. The goal is to bind the audio watermark so that, if the watermark is removed, the music is also destroyed beyond recognition of the original artistic work.
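The fragility of naive LSB placement is easy to demonstrate. The sketch below is a hypothetical illustration, not a published scheme: it embeds one bit per sample in the least significant bit of 16-bit PCM and shows that even inaudible unit-level dither wipes the mark out.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical frame of 16-bit PCM samples.
audio = rng.integers(-2000, 2000, size=64, dtype=np.int16)
watermark_bits = rng.integers(0, 2, size=64)

# Embed: overwrite the least significant bit of every sample.
marked = (audio & ~1) | watermark_bits.astype(np.int16)

# The mark reads back perfectly from the untouched signal...
assert np.array_equal(marked & 1, watermark_bits)

# ...but adding inaudible +/-1 dither scrambles the LSB plane.
attacked = (marked + rng.integers(-1, 2, size=64)).astype(np.int16)
error_rate = float(np.mean((attacked & 1) != watermark_bits))
```

Any dither value of +1 or -1 flips the sample's LSB, so roughly two thirds of the embedded bits are destroyed by processing far below audibility.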


1.2 Watermarking Detection

There are two ways that a pirate can defeat a watermarking scheme. The first is to manipulate the audio signal to make all watermarks undetectable by any recovery mechanism. The second is to create a situation where the watermarking detection algorithm generates false results with the same probability as true results (Boney, et al., 1996).

The detection of the watermarking signal is the most important aspect of the entire watermarking process, for if one cannot easily and reliably extract the actual data that was inserted into the original signal, it matters little what exotic techniques were used to perform the insertion. Watermark extraction must occur in the presence of jamming signals and the harsh real-life audio conditions described above.


1.3 What Watermarking Should Not Be

From the criteria in section 1.1, there are certain watermarking techniques that should not be used. For example, any watermarking that always places the data into one frequency bin should not be utilized, because a pirate can easily design a low-pass, high-pass or band-reject filter that simply removes the watermark. It is also apparent that this sort of watermark automatically degrades the original audio signal.

Another poor technique is to place the watermark at moments in time where its audibility does not matter (so much), for example between songs on a CD, or in a place where the quality of the original work is so poor that the watermarked audio quality is as bad as the song. Again, this is an unacceptable technique: simple audio editing tools could easily remove the watermark.


2 Techniques

Audio watermarking is a newly established engineering field, with ongoing research and standardization efforts currently in process. Four different areas in which audio watermarking has been attempted have been identified: time-domain audio watermarking, audio watermarking in the frequency domain, audio watermarking in the compressed domain and, finally, combinations of the previous three techniques. Some techniques combine audio compression technologies and others apply to cleartext, or PCM, data. PCM, or pulse code modulation, is the standard representation of an audio signal in the digital realm; PCM samples are the bits stored on a music CD.

2.1 Watermarking in the Time Domain

Manipulation of an audio signal in the time domain means that the work is done on the PCM, or time-based, data; the watermark is not inserted in the frequency domain. Frequency-based techniques can still be used to analyze the signal for proper placement of the digital audio watermark.


2.1.1 Changing the Least Significant Bits of Each Sample

In Bassia's technique (Bassia, 1998), a watermark is inserted by adding the watermark to the original audio signal x(i). The function f() is a noise-floor calculation so that the watermark is not heard. The random sequence w(i) is the watermark signal to be inserted, giving the watermarked signal y(i):

    y(i) = x(i) + f(x(i)) w(i)    (1)
To extract the watermark, we first form the sum of the final watermarked signal times the original watermark that was inserted. N is fixed and represents at least one second of sampled audio data:

    r = Σ_{i=1..N} y(i) w(i)    (2)
Then, by substitution, we replace y(i) in equation 2 with its definition from equation 1. The result is equation 3. N is the same for all equations in this section:

    r = Σ_{i=1..N} x(i) w(i) + Σ_{i=1..N} f(x(i)) w(i)^2    (3)
By breaking up equation 3 into subsections, Bassia exploits the fact that w(i), the watermark signal, is basically Gaussian white noise; thus the mean of an equally distributed audio signal multiplied by Gaussian white noise is zero. Sometimes, however, the watermarked signal is not equally distributed random data. Therefore, one represents this portion of the watermarked signal by the sum from 1 to D_w. The rest of the watermarked signal, represented by the first sum in equation 4, is assumed to have a mean of zero.




Then, since N is fixed, one can rewrite the second sum in equation 4 as equation 5.




First, if the signal is not watermarked, equation 4 reduces to equation 5. If the signal is watermarked, the equation reduces to the last two sum terms of equation 4. Notice that the equation still relies on x(i), the original audio signal; this implies that we must possess the cleartext audio, which is undesirable.

Here is the fudge: we approximate x(i) by replacing it, in the last sum term of equation 4, with y(i). The detection value, denoted by r, thus becomes




The normalized detection value does contain inaccuracies, due to approximating the cleartext signal with the audio signal under test.
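The additive embedding and correlation detection described above can be sketched as follows. This is a simplified illustration, not Bassia's implementation: the noise-floor function f() is replaced by an assumed small constant amplitude, and the host audio is stand-in Gaussian noise.

```python
import numpy as np

rng = np.random.default_rng(1)
N = 44100  # roughly one second of samples, as the text requires

x = rng.normal(0.0, 0.2, N)        # stand-in for the host audio x(i)
w = rng.choice([-1.0, 1.0], N)     # bipolar pseudorandom watermark w(i)

def f(signal):
    # Assumed placeholder for the noise-floor function f(): a small
    # fixed amplitude instead of a real psychoacoustic estimate.
    return 0.01

# Additive embedding: y(i) = x(i) + f(x(i)) w(i)
y = x + f(x) * w

def detect(signal, w):
    # Correlate with w and normalize; the x(i)w(i) cross term
    # averages toward zero over a long enough window.
    return float(np.mean(signal * w))

r_marked = detect(y, w)   # near the embedded amplitude, 0.01
r_clean = detect(x, w)    # near zero for unmarked audio
```

With a marked signal the normalized correlation sits near the embedded amplitude, while an unmarked signal yields a value near zero; thresholding between the two is the detection decision.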


2.1.2 Psychoacoustic Models to Perform Cleartext Marking

Many watermarking techniques use psychoacoustic modeling to properly place the watermark. Often the watermark itself is represented by a shift-keying technique, such as amplitude-shift keying or frequency-shift keying (Lathi, 1983). Usually, the sinusoidally represented data is placed where the masking property of psychoacoustic models can be utilized.

The psychoacoustic model (Tiiki and Beex, 1996) for the MPEG-4 natural audio coder calculates the maximum distortion energy, or masking threshold, that will still not be perceived by the listener (Neubauer, et al., 1998). The basic steps of this calculation are described in more detail by Bosi, et al. (1996).


2.1.3 Echo Data Hiding

Echo data hiding is the idea of placing data very close to the original signal in the time domain, i.e. within a millisecond (Bender, et al., 1996). Echo data hiding uses three parameters: initial amplitude, decay rate and offset. Figure 2 illustrates the elements of the technique. The delay plus offset is set at 1/1000 second, or 1 ms, which is below the smallest known delay at which the human ear can hear two signals as separate. The delta is an additional time that is also below perceived signal separation. Thus neither a zero nor a one will be perceived as separate from the original signal, assuming that the amplitude of the added data is below the amplitude of the original signal.


Figure 2. Parameters of echo data hiding (Bender, et. al, 1996)


If one draws a line connecting the original signal, the zero and the one, a decay envelope is created. This decay slightly smears the original signal; the connecting line is what the human ear perceives as one signal. Because this entire process is below the threshold of human perception, the signal is still perceived as the original.


Figure 3. The auditory effect (Bender, et. al, 1996)

A sequence is encoded by representing a "zero" with offset and amplitude x, and a "one" with delta plus offset and amplitude y. To encode more than one bit in an entire signal, one simply breaks the audio signal down into small segments of length N. Each audio segment is then convolved with the data. A delta function (which is essentially what the "one" or "zero" is), convolved with the original signal, produces a shifted copy of that signal at the position of the "zero" or "one" on the time axis.


Figure 4. Small segments convolved with data (Bender, et. al, 1996)

The segments of length N are then put back together sequentially. This is the data sequence to be inserted into the original signal.


Figure 5. Reconstructed sequence delayed by "one", "zero" (Bender, et. al, 1996)

Next, create a copy of the original signal shifted by a "one", and another copy of the original audio sequence shifted by a "zero". Multiply the "one"-shifted sequence by the binary data sequence (one wherever a one actually occurs in that segment of N), multiply the "zero"-shifted sequence by the inverted binary sequence, and then add the two resulting signals together. The overall amplitude will be the same as the original signal, and this final mixing step makes the distortion less noticeable.

Decoding the signal to extract the binary bit stream requires that one knows the delay of the "one" and the delay plus offset of the "zero". The location is derived by taking the autocorrelation of the complex cepstrum of the original signal and the echoed version. The cepstrum is the inverse Fourier transform of the logarithm of the power spectrum. The basic property used here is that the power spectrum of a signal with an echo possesses an additive periodic component; thus the Fourier transform of the logarithm of the power spectrum exhibits a peak exactly at the echo delay (Oppenheim and Schafer, 1989). Further details can be found in (Bender, et al., 1996).
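A single-bit version of echo embedding and cepstral detection can be sketched as follows. This is a simplified illustration under assumed parameters (white-noise stand-in audio, delays of 40 and 50 samples, echo amplitude 0.4); it skips the segmenting and mixing steps described above.

```python
import numpy as np

rng = np.random.default_rng(2)
fs = 44100
x = rng.normal(0.0, 1.0, fs)   # one second of stand-in audio

# Illustrative parameters (not Bender's exact values): a "zero" is an
# echo delayed by d0 samples, a "one" by d1 samples, both near 1 ms.
d0, d1, amp = 40, 50, 0.4

def embed_bit(signal, bit):
    d = d1 if bit else d0
    y = signal.copy()
    y[d:] += amp * signal[:-d]          # one attenuated echo
    return y

def cepstrum(signal):
    # Inverse transform of the log power spectrum; an echo shows up
    # as a peak at its delay ("quefrency").
    power = np.abs(np.fft.rfft(signal)) ** 2
    return np.fft.irfft(np.log(power + 1e-12))

def decode_bit(signal):
    c = cepstrum(signal)
    return 1 if c[d1] > c[d0] else 0

y1 = embed_bit(x, 1)
y0 = embed_bit(x, 0)
```

The echo contributes an additive cosine ripple to the log spectrum, so the cepstrum peaks at the echo delay and the larger of the two candidate bins identifies the bit.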


2.2 Watermarking in the Frequency Domain

For purposes of arbitrary categorization, this chapter defines frequency-domain watermarking by the insertion point of the watermark shown in Figure 6.

Figure 6. Frequency domain based watermarking (Cox, et al., 1995)


2.2.1 Phase Coding

Phase coding can be an exceptional coding method. When the phase differential between the original signal and the modified one is kept small, it maintains a signal-to-perceived-noise ratio close to that of the original un-watermarked audio (Bender, et al., 1996).

The idea behind phase coding is to hide the data by exchanging the original phase of the signal with the phase of the binary watermark plus the differential of the original audio phase. In a way, it is differential coding plus a binary phase offset. This works because a randomly generated binary stream is a square wave, and a square wave always possesses a phase of π/2 or -π/2.

The steps to perform phase coding are illustrated through the following figures. These figures were taken from (Bender, et al., 1996).

The first step is to break up the sound sequence into short segments, as shown in figure 7. Each segment possesses a fixed length N.

Figure 7. Original signal and signal divided into segments (Bender, et al., 1996)

A discrete Fourier transform (Oppenheim and Schafer, 1983) is computed on each segment of length N. For each ith element of phase data resulting from the DFT, up to N-1 values, compute the phase differential between adjacent phase elements.

Figure 8. Magnitude and phase plots of DFT (Bender, et al., 1996)

The absolute phase of the watermark data signal is added to the differential, or Δφ, of the last step. The result of this action is figure 10.

Figure 9. Data to be inserted, set this value as absolute phase (Bender, et al., 1996)

Figure 10. Adding watermark absolute phase to Δφ of original (Bender, et al., 1996)


Next, perform the inverse DFT on each segment with the original magnitude and the new modified phase value. The results should look like figure 11.

Figure 11. Resulting watermarking signal with phase coding (Bender, et al., 1996)

Phase coding relies on the fact that humans are much more sensitive to relative phase differences in an audio signal than to an absolute phase reference. Decoding is performed by synchronization; the initial phase value is detected as a zero or one.
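The core embed/decode loop can be sketched for a single segment. This simplified illustration sets the absolute phase of a few chosen bins and omits the differential phase adjustment across segments; the bin choices and segment length are assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)
N = 1024
x = rng.normal(0.0, 1.0, N)      # one segment of stand-in audio

bits = [1, 0, 1, 1, 0]
bins = range(2, 2 + len(bits))   # illustrative low-frequency bins

# Embed: keep each chosen bin's magnitude but replace its phase with
# +pi/2 for a "one" and -pi/2 for a "zero" (the square-wave phases).
X = np.fft.rfft(x)
for k, bit in zip(bins, bits):
    phase = np.pi / 2 if bit else -np.pi / 2
    X[k] = np.abs(X[k]) * np.exp(1j * phase)
y = np.fft.irfft(X, N)

# Decode: recompute the DFT and read back each bin's phase sign.
Y = np.fft.rfft(y)
decoded = [1 if np.angle(Y[k]) > 0 else 0 for k in bins]
```

Because the magnitudes are untouched, the audible change is small; only the phase carries the data.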


2.2.2 Spread Spectrum

Spread spectrum is the technique of spreading data over the frequency spectrum of the original signal. There are two categories of spread spectrum: frequency hopping and direct sequence. Both require that the transmitter and receiver are synchronized. Frequency hopping moves about the spectrum in a pseudorandom pattern: the frequency spectrum is divided into bands, and the signal to be hidden is moved from one band to another in rapid succession. Direct sequence is the multiplication of the original audio signal by a binary sequence in the encoder; direct sequence spread spectrum is often referred to as DSSS.


Figure 12. Basic DSSS spread spectrum (Bender, et al., 1996)

In figure 12, the chip is the name given to the pseudorandom sequence, which is modulated by the carrier rate. The chip rate has its own sampling frequency. Pseudorandom noise has properties similar to white noise, i.e. a flat spectrum across all frequencies, a Gaussian distribution and zero mean (Ziemer and Tranter, 1990). Another interesting property of pseudorandom sequences is that the autocorrelation of the PNS peaks when the delay is zero and is minimal, or approximately zero, for all other delays. As a refresher, the autocorrelation function is defined as:

    R_x(τ) = E[ x(t) x(t + τ) ]    (7)
The chip rate must be many times greater than the data rate; equivalently, the chip period must be many times shorter than the data period. This property is illustrated in equation 8, where T represents the period of each signal:

    T_data >> T_chip    (8)
Figure 13 illustrates the DSSS encoding technique. The chip is basically the key needed by both the encoder and decoder to modulate the data sequence. The binary code is multiplied by the carrier wave and also by the chip, or pseudorandom binary sequence. In this case, the carrier wave represents the frequency band that the data is being spread over. The result is then attenuated and added to the original audio signal. The effect is similar to adding white noise, due to the multiplication of the data by the pseudorandom sequence.


Figure 13. Spread spectrum process (Bender et al., 1996)


To decode the hidden data, one must phase-lock to the chip frequency and also know the start of the chip, or pseudorandom noise sequence. A PLL, or phase-locked loop, detects the phase of the incoming signal and locks onto it (Ziemer and Tranter, 1990). The data rate must also be known, in order to synchronize to the data of the received signal.

By its nature this technique adds noise to the signal; thus spread spectrum may be at odds with advances in compression technology at low bit rates.
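The direct sequence variant can be sketched at baseband, omitting the carrier multiplication and assuming a perfectly synchronized receiver; all amplitudes and rates below are illustrative.

```python
import numpy as np

rng = np.random.default_rng(4)
x = rng.normal(0.0, 0.3, 8000)        # stand-in host audio

data = np.array([1, 0, 1, 1, 0, 0, 1, 0])
chips_per_bit = 1000                  # chip rate >> data rate (eq. 8)
chip = rng.choice([-1.0, 1.0], chips_per_bit * len(data))

# Spread: hold each data bit over many chips, multiply by the PN
# sequence, attenuate, and add to the host signal.
bipolar = np.repeat(2 * data - 1, chips_per_bit).astype(float)
alpha = 0.05                          # watermark attenuation
y = x + alpha * bipolar * chip

# Despread: multiply by the synchronized chip sequence and integrate
# over each bit period; the host audio averages toward zero.
despread = (y * chip).reshape(len(data), chips_per_bit).mean(axis=1)
decoded = (despread > 0).astype(int)
```

Despreading integrates over 1000 chips per bit, so the host audio's contribution shrinks by roughly the square root of that factor while the watermark term adds coherently, which is why the low-amplitude mark remains decodable.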


2.3 Watermarking in the Compressed Domain

Compressed domain watermarking means manipulation of the bitstream, without changing the original bitstream syntax. The example given by (Lacy, et al., 1998) is to apply a data envelope to an MPEG-2 Advanced Audio Coding (AAC) bitstream. There are other techniques to hide data in the bitstream, as well as MIDI data hiding techniques. The main point is to make decoding the bitstream dependent on the watermark existing in it.


2.4 Combinations of Watermarking Techniques


2.4.1 Psychoacoustic Models and Spread Spectrum

Neubauer and Herre (1998) approached audio watermarking with a multi-dimensional technique. This work is an extension of the previous work of Boney (1996). Figure 14 illustrates the encoding process. There are three main sections: modulation, signal conditioning and the input/output audio process.


Figure 14. Psychoacoustic model and DSSS encoding (Neubauer, et al., 1998)


The masking model used here is the same process described in section 2.1.2. The main idea is to determine where one can place the data, now in the form of a modulated signal, within the audio. One must note that it is possible for the data to be removed by a psychoacoustic model; a bare data carrier is certainly susceptible to one.

The spectral weighting function is designed to scale the frequency-domain representation of the modulated watermark signal so that it is always hidden below the masking threshold of the audio. For each critical band, the energy of the spread spectrum signal is weighted to the energy computed by the psychoacoustic model. This does imply that the watermark could be lost during periods of silence or of extremely low energy, but even this is not much of a problem, as we shall see in the decoder description.

Modulation is very similar to the spread spectrum technique described in section 2.2.2. There are five main components in this diagram:


    • PNS source - pseudo noise sequence
    • Data source - watermark data to be inserted into the audio signal
    • BPSK Spreader - binary phase shift keying (Proakis, 1995)
    • Multiplication process
    • Carrier wave - cos(wt)


This technique is called DSSS-BPSK modulation (Proakis, 1995). In this technique the BPSK spreader is more complex than that of the spread spectrum technique of (Bender, 1996), where the BPSK spreader is simply the second multiplication in equation 9. Neubauer (1998) implies that the BPSK spreader reduces to a multiply, because the data and the PNS take bipolar values, i.e. -1 and +1, for all time. The pseudo noise sequence is a bipolar, length-N maximal-length sequence, or m-sequence. As with the PNS property discussed in the spread spectrum section, an m-sequence has the two-valued autocorrelation function:

    R(j) = N     for j = 0 (mod N)
    R(j) = -1    otherwise
The decoder uses this property of the m-sequence to detect the watermark. The matched filter, seen in figure 15, time-reverses the data sequence, i.e. 1000 becomes 0001; this reversed sequence becomes the matched filter's coefficients. Note that by correlating the original sequence against its time-reversed copy in this way, one is computing the autocorrelation at j = 0. Thus, when the gate closes, the threshold decision is true upon receiving the value N. The synchronization unit is there to count the N length of the sequence. Since the energy of the watermarked signal has been scaled, this aspect of the algorithm is particularly important to ensure that proper sequencing occurs.
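The m-sequence properties the decoder relies on are easy to verify. The sketch below generates a length-31 m-sequence with a small LFSR (the feedback taps are an illustrative primitive choice, not taken from Neubauer), checks the two-valued autocorrelation, and shows the matched-filter peak of N at alignment.

```python
import numpy as np

def m_sequence(taps, nbits):
    # Fibonacci LFSR; this tap choice gives a primitive degree-5
    # feedback polynomial, so the period is maximal: 2^5 - 1 = 31.
    state = [1] * nbits
    seq = []
    for _ in range(2 ** nbits - 1):
        seq.append(state[-1])
        feedback = 0
        for t in taps:
            feedback ^= state[t - 1]
        state = [feedback] + state[:-1]
    return np.array(seq)

chip = 2 * m_sequence([5, 3], 5) - 1     # bipolar m-sequence, N = 31
N = len(chip)

# Two-valued periodic autocorrelation: N at zero lag, -1 elsewhere.
R = np.array([np.sum(chip * np.roll(chip, j)) for j in range(N)])

# Matched filter: convolve with the time-reversed sequence; the
# output peaks at exactly N where the sequence aligns.
received = np.concatenate([np.zeros(10), chip, np.zeros(10)])
out = np.convolve(received, chip[::-1].astype(float), mode="valid")
peak = int(np.argmax(out))
```

The sharp, two-valued autocorrelation is exactly why the threshold decision in figure 15 can fire on the value N with little ambiguity.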


Figure 15. Decoder (Neubauer, et al., 1998)


2.4.2 Frequency Domain Shaping and Time Domain Weighting

Laurence Boney, et al., introduce a technique utilizing the psychoacoustic masking ability of the MPEG psychoacoustic model (MPEG, 1993). Boney creates a frequency-domain masking filter applied to the PN sequence representing the data to be inserted, and then a time-domain, energy-based weighting to ensure that the inserted watermark is above the MPEG audio compression algorithm's quantization level.


Figure 16. Generator for masking filter data insertion (Boney, et. al, 1996)



The window length used is 512 samples, weighted with a Hanning window, with 50% overlap between transforms. The masking threshold is approximated with a 10th-order all-pole filter M(w), using a least-squares approximation technique (MPEG, 1993). The PN sequence S(w) is then filtered with the masking filter M(w) to ensure that the watermark PN-sequence spectrum is below the masking threshold for the particular block of audio being processed. As one can see from figure 16, the watermark is then additionally weighted with a scale factor, multiplied by the weighted Hanning window and finally inserted into the audio spectrum before proceeding to the quantization phase of the encoding process.
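The spectral shaping step can be sketched as follows. The masking threshold here is a crude stand-in (an assumed envelope 20 dB below the block's own spectrum) rather than the MPEG psychoacoustic model and its 10th-order all-pole fit; only the 512-sample Hanning window matches the text.

```python
import numpy as np

rng = np.random.default_rng(5)
N = 512
block = rng.normal(0.0, 1.0, N) * np.hanning(N)  # one windowed block
pn = rng.choice([-1.0, 1.0], N)                  # PN watermark block

# Assumed toy masking threshold: 20 dB below the block's own
# magnitude spectrum. A real system would run the psychoacoustic
# model and fit the all-pole masking filter M(w).
X = np.fft.rfft(block)
threshold = np.abs(X) * 10 ** (-20 / 20)

# Shape the PN spectrum so no bin exceeds the threshold.
S = np.fft.rfft(pn)
scale = np.minimum(1.0, threshold / (np.abs(S) + 1e-12))
wm = np.fft.irfft(S * scale, N)

marked = block + wm
shaped = np.abs(np.fft.rfft(wm))   # per-bin watermark magnitude
```

Bin by bin, the shaped watermark spectrum sits at or below the threshold envelope, which is the property the masking filter M(w) enforces in the real system.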




2.4.3 Psychoacoustic Masking and Bitstream Watermarking

AT&T Research Labs presented a technique that incorporates bitstream watermarking with psychoacoustic masking. This technique is described only in conjunction with MPEG-2 AAC audio. The tools used are the psychoacoustic model, rate control, quantization and noiseless (Huffman) coding blocks. In this system, the audio watermarking procedure is to choose scale factor bands to be marked via the perceptual model of the MPEG-2 audio coder. These scale factor bands, from the spectral lines of AAC, are then assigned a quantizer step size and Huffman table. Lacy implies that the Huffman table representing null Huffman codes, or noise, should not be watermarked, which suggests that one is hiding the data via some sort of sinusoidal masking concept. This particular table is not transmitted, as there is no spectral data to send.


Figure 17. Block diagram of AT&T's combined technique (Lacy, et al., 1998)


The original quantizers, or scale factors, of MPEG-2 AAC are divided by a set of multipliers. An offset vector, representing the watermark data, is added to the newly scaled quantizers. There are three methods to perform this operation, as presented by Lacy, et al. (1998).

In order to find the beginning of the sequence, Lacy used the frame boundary of the AAC bitstream. The initial scale factors are modified in the LSB in order to contain a series of synchronization codes.

One of the more interesting aspects of this placement is that randomly flipping a scale factor's LSB will produce artifacts, unlike manipulating the LSB of a time-domain signal. Due to the conservation of bits, i.e. compression, all data matters.



3 Applications and Evaluation of Techniques
The applications of watermarking are vast, and watermarks often perform different functions based upon the application. These three functions are: identification of the origin of the content, tracing illegally distributed copies of the content, and disabling unauthorized access to the content (Lacy, et al., 1997).

In this section, the advantages and disadvantages of each technique are listed in table form. Audio quality issues are also discussed. No algorithm to date has been deemed perfect; hence, the quest for better algorithms and more secure networks is ongoing.


3.1 Time Domain

The issue with this technique is computational complexity, and I must know the actual watermark. If I do not know the watermark, I must search 2^x combinations of bits, where x is the frame size, to actually identify the watermark. Thus, for transactional watermarking, where the decoder is identifying the client, this technique is not applicable.

Table 2. Advantages and disadvantages of least significant bit insertion

Too many Assumptions


Highly Susceptible to Attack


Requires no Decoder


Potentially Reverse Engineered by Player for Removal



Psychoacoustic models and data placement fall short when the audio is later compressed with a perceptual coder. The psychoacoustic model in the compression algorithm performs the same task as the watermark encoder: it seeks redundant portions of the audio signal that will not be perceived. Since one of these redundancies happens to be your carefully placed watermark, voila! it is now removed.


Table 3. Advantages and disadvantages of psychoacoustic models and cleartext

Too Complex for Transactional


Susceptible to Advances in Perceptual Models


Survives all Processed Generations


Potentially Reverse Engineered by Player for Removal



The most difficult aspect of echo hiding is computing the complex cepstrum to decode the watermark. It does not require the original watermark for detection. It is also susceptible to psychoacoustic models removing the watermark.


Table 4. Advantages and disadvantages of echo data hiding

Too complex for transactional


Watermark detection difficult


Not robust for compression


Not robust for resonance audio processing



3.2 Frequency Domain

The main difficulty with spread spectrum is that the added noise will eventually be at odds with any perceptual coder. Thus, keeping the watermark inaudible at all bitrates is the challenge of this technique. The synchronization methods do require that the PNS data rate is known, but they do not require the actual watermark.


Table 5. Advantages and disadvantages of spread spectrum





Bound to audio


Reliable Detection



Phase coding is also a candidate to run at odds with lossy compression technology and further audio processing. Many audio processes manipulate the phase of the audio signal, which may interfere with detecting the watermark.


Table 6. Advantages and disadvantages of phase coding

Advantages:
Watermark Not Needed
Low Distortion

Disadvantages:
Data Interval


3.3 Compressed Domain

Hiding information within a bitstream is highly desirable: there is little processing involved, and the information is easy to parse. Unfortunately, bitstreams are more easily hacked than the original audio. Certainly one can be incredibly clever and create data dependencies between the bitstream and the watermark, but unfortunately, hackers are usually quite clever as well. The watermark is also not audible.


Table 7. Advantages and disadvantages of compressed domain

Advantages:
Low Complexity
Supports All Bitrates

Disadvantages:
Cannot Survive D/A Conversion
Not Robust against Attack




3.4 Combined Techniques

Combined techniques are the most recently developed. As research has progressed, it has become apparent in the perceptual audio coding world that binding the watermark to the actual encoding process is a much better way to ensure the watermark is not audible. Bitrates are now in the 8 kbps range for audio coding, hence there is very little redundant information behind which to hide the watermark. By combining techniques, one has better assurance that the watermark will be inaudible at all bitrates of a compression technology.

Table 8. Frequency and time domain shaping and weighting

Advantages:
Robust to Processing
Reliable Detection

Disadvantages:
Access to Original PN Sequence Required



This technique has low computational complexity and is thus ideal for transactional, or per-sale, watermarking. It also does not degrade the audio quality. Its issues are its susceptibility to attack, and that the watermark is lost upon D/A conversion and once the bitstream is decoded.

Table 9. Psychoacoustic modeling and bitstream marking

Advantages:
Low Computational Complexity
Reliable Detection

Disadvantages:
Bitrate of Encoded Audio



The use of a watermark algorithm bound to the compression technology is interesting. It avoids the "at odds" problem of adding data to a process designed to extract redundant audio information. Yet experimental results reveal that this technique still creates some noticeable audio distortion. The technique is robust against attack and also survives D/A conversion and other signal processing.


Table 10. Psychoacoustic modeling and spread spectrum

Advantages:
Low Computational Complexity
Reliable Detection

Disadvantages:
Bitrate of Encoded Audio





Currently, many companies are promoting their own audio watermarking solutions. From various corporate web sites, the author ascertained which techniques each company is using. Table 11 lists these overall approaches.


Table 11. Some corporate techniques

Solana Technology: Frequency Hopping Spread Spectrum

Aris Technologies: Direct Sequence Phase Keying

Spread Spectrum and Psychoacoustic Masking

Perceptual Convolution and FFT Analysis

Spread Spectrum and Psychoacoustic Masking



The ISO MPEG-4 group has rejected standardization of audio watermarking in the multimedia standard. If one standardizes watermarking, one must also publish, in detail, how watermarking is done, in order for all interested parties to build such a device. This fact defeats the purpose of watermarking, for it gives the solution to all who can read the standard and run the verification model code provided by the MPEG committee. Also, one may not necessarily desire one particular watermarking method in certain applications. Therefore, convergence on one methodology will most likely occur in product consortiums, such as the DVD audio group.

Currently, the Copy Protection Technical Working Group is asking industry to meet contradictory goals: ultra-low complexity in watermark detection, yet the best system industry can provide. The Data Hiding SubGroup, a cross-industry body, has listed 13 goals deemed essential to the protection scheme (Yoshida, 1998b). These goals are listed in Table 12.


Table 12. DHSG digital-video watermarking technology goals

Low-cost digital detection

Digital-detection domain

Generation copy control for one copy

Low false-positive detection

Reliable detection

Watermark will survive normal video processing in consumer use

Licensable under reasonable terms

Export/import status

Technical maturity

Data payload: watermark system should carry at least 8 bits of information

Minimum impact on content preparation

Data rate: minimum 11.08 Mbytes/s to 25 Mbytes/s, to 270 Mbytes/s for video



The International Federation of the Phonographic Industry (IFPI, 1998) is currently searching for audio watermarking algorithms for various audio applications. This activity is ongoing in the MUSE project. Currently, there is no activity to merge multimedia watermarking into one solution (Yoshida, 1998).

Table 13. MUSE project test conditions

Two Successive D/A and A/D Conversions

Steady-State Time Compression and Time Expansion of 10 Percent

Data Reduction Coding Techniques such as:
MPEG (all versions)
Adaptive Transform Coding (ATRAC)
Adaptive Subband Coding
Digital Audio Broadcasting (DAB)
Dolby AC-2 and AC-3 Systems

Multiband Non-linear Amplitude Compression (applied to broadcast systems)

Additive and Multiplicative Noise

Add Second Embedded Signal, Using the Same System, to a Single Program Fragment

Frequency Response Distortion (Bass, Mid, Treble, ±15 dB)

Group-Delay Distortions According to:
Frequency (500 Hz, 1 kHz, 2 kHz, 4 kHz, 8 kHz)
Group Delay

Frequency Notches with Frequency-Hopping Placement




Audio watermarking is becoming a critical strategic technology to ensure that theft of artistic works does not occur. The business model that relied on technology's inability to adequately reproduce copies of original works is no longer valid. Several watermarking techniques have been presented, yet the application and widespread use of digital audio watermarking is still under standardization and development. Certainly, for the music and film industries to continue allowing artists to receive payment for their contributions to humanity, digital watermarking must be incorporated into existing products. Otherwise, we return to the days when artists were not properly acknowledged and supported for their enhancement of the human condition.



Bassia, P.; Pitas, I. 1998. Robust Audio Watermarking in the time-domain, to appear in Proc. of EUSIPCO'98, Rhodes, Greece.

Neubauer, C.; Herre, J. 1998. Digital Watermarking and its Influence on Audio Quality. 105th AES Convention, San Francisco, California. Preprint 4823.

Neubauer, C.; Herre, J.; Brandenburg, K. 1998. Continuous Steganographic Data Transmission Using Uncompressed Audio. Workshop on Information Data Hiding. Portland, Oregon.

Lacy, J.; Quackenbush R.; Reibman A.; Shur D.; Snyder J. 1998. On Combining Watermarking with Perceptual Coding. ICASSP Seattle, Washington. MMSP1.9.

Lacy, J.; Reibman, A.; Snyder, J. 1997. Watermarking as a Protection Mechanism for IPR in MPEG-4. Fribourg, Switzerland ,ISO/IEC JTC1/SC29/WG11/M2829.

Boney, L.; Tewfik, A. H.; Hamdy, K. N. 1996. Digital Watermarks for Audio Signals. EUSIPCO-96, VIII European Signal Proc. Conf., Trieste, Italy. (Patent pending)

Bender, W.; Gruhl, D.; Morimoto N.; Lu A. 1996. Techniques for Data Hiding. IBM Systems Journal, Vol. 35, Nos. 3&4.

Cox, I. J.; Kilian, J.; Leighton, T.; Shamoon, T. 1995. Secure Spread Spectrum Watermarking for Multimedia. NEC Research Institute, Technical Report 95-10.

Moskowitz, S. 1998. So This is Convergence? Technical, Economic, Legal, Cryptographic, and Philosophical Considerations for Secure Implementations of Digital Watermarking. Available via WWW: <URL:>.

Bosi, M.; Brandenburg, K.; Quackenbush, S.; Dietz, M.; Johnston, J.; Herre, J.; Fuchs, H.; Oikawa, Y.; Akagiri, K.; Coleman, M.; Iwadare, M.; Lueck, C.; Gbur, U.; Teichmann, B. 1996. IS 13818-7 (MPEG-2 Advanced Audio Coding, AAC).

Yoshida, J. 1998. Watermark Scheme Seeks Contradictory Goals, EETimes, p.50.

Yoshida, J. 1998b. Digital World Divided on Watermark Specs, EETimes, p. 1.

Linnartz, J. 1998. Fingerprinting and Watermarking. Available via WWW: <URL: http://diva.EECS.Berkeley.EDU:80/~linnartz/>.

IFPI 1998. International Federation of the Phonographic Industry. Available via WWW: <URL:>.

ISO/IEC JTC1/SC29/WG11 1993. MPEG, International Standard IS 11172-3 Coding of Moving Pictures and Associated Audio for Digital Storage Media up to 1.5 Mb/s, Part 3: Audio.

Webreference, 1998. Watermarking Companies and Articles. Available via WWW: <URL:>.

Zhao, J. 1997. Look, It's Not There: Digital Watermarking. Byte Magazine, January. Available via WWW: <URL:>.

Tilki, J. F.; Beex, A. A. 1996. Encoding a Hidden Digital Signature onto an Audio Signal Using Psychoacoustic Masking, 7th International Conference on Signal Processing Applications and Technology, Boston MA, pp. 476-480, 7-10.

Lathi, B. P. 1998. Modern Digital and Analog Communication Systems, 3rd Edition, Oxford, pp. 179-182.

Oppenheim, A. V.; Schafer, R. W. 1989. Discrete-Time Signal Processing, Prentice-Hall, Inc., Englewood Cliffs, New Jersey.

Proakis, J. G. 1988. Digital Communications, 3rd Edition, McGraw-Hill, New York.

Ziemer, R.; Tranter, W. 1990. Principles of Communications, 3rd Edition, Houghton Mifflin, Boston.

Kahrs, M.; Brandenburg, K. 1998. Applications of Digital Signal Processing to Audio and Acoustics, Kluwer, Boston.