Method and System for Data-Hiding Within Audio Transmissions

20190189135 · 2019-06-20

Abstract

A method for hiding data within cover audio uses a set of sample codebook waveforms that are each assigned a unique representative digit value. A hidden data sequence representing the data is formed from the waveforms by concatenation of the waveforms assigned to the digit values of the data. The sequence is superimposed upon segments of the cover audio at a fractional amplitude. After transmission, the received signal is decompressed if necessary, the hidden data sequence is recovered from the cover audio, and the data is recovered from the hidden data sequence. This may be done by recovering the locations of the codebook waveforms and interpolating the time markers of the locations. The recovered data may be cleaned up by using estimated distances between successive cross-correlations to discard extraneous correlation peaks and sequence recurrence to probabilistically delete overlapping correlation peaks.

Claims

1. A method for hiding data within cover audio, comprising the steps of: choosing a set of sample codebook waveforms; assigning a unique representative digit value to each codebook waveform in the set; based on the codebook waveform representative digit values, forming, from the codebook waveforms, a hidden data sequence representing the data; and repeatedly superimposing the hidden data sequence upon segments of cover audio at a fraction of the amplitude of the cover audio.

2. The method of claim 1, further comprising the steps of: transmitting the cover audio with superimposed hidden data sequence; receiving the transmitted cover audio with superimposed hidden data sequence; recovering the hidden data sequence from the received cover audio with superimposed hidden data sequence; and recovering the data from the hidden data sequence.

3. The method of claim 2, wherein the step of recovering further comprises: recovering the locations of the codebook waveforms; and interpolating the time markers of the locations to determine the transmitted data sequence.

4. The method of claim 3, wherein the locations of the codebook waveforms are recovered by matched filtering.

5. The method of claim 2, further comprising the step of cleaning up the recovered data by using estimated distances between successive cross-correlations to discard extraneous correlation peaks and sequence recurrence to probabilistically delete overlapping correlation peaks.

6. The method of claim 1, further comprising the steps of: repeatedly segmenting the cover audio to match the size of the hidden data sequence for the step of superimposing; and reconstructing the cover audio as a continuous stream prior to transmission.

7. The method of claim 1, wherein each codebook waveform in the set is a short spoken word.

8. The method of claim 1, wherein the hidden data sequence is formed by concatenation of the codebook waveforms for the representative digit values of the data.

9. The method of claim 2, further comprising the steps of: compressing the cover audio with superimposed hidden data sequence prior to the step of transmitting; and decompressing the received compressed cover audio with superimposed hidden data sequence prior to the step of recovering.

10. A system for sending hidden data within cover audio, comprising: a codebook waveform selection application configured to select a set of codebook waveforms and assign a representative data value to each codebook waveform; a hidden data sequence generator configured to form a hidden data sequence by concatenating codebook waveforms according to their associated representative data value to represent the data to be hidden; a cover audio with superimposed hidden data sequence signal generator configured to repeatedly superimpose the hidden data sequence upon segments of cover audio at a fraction of the amplitude of the cover audio; and a hidden data recovery application configured to recover the hidden data sequence from the cover audio with superimposed hidden data sequence and to recover the data to be hidden from the hidden data sequence.

11. The system of claim 10, further comprising: a transmitter configured for transmitting the cover audio with superimposed hidden data sequence; and a receiver configured for receiving the transmitted cover audio with superimposed hidden data sequence.

12. The system of claim 10, wherein the hidden data recovery application is further configured to recover the locations of the codebook waveforms and interpolate the time markers of the locations to determine the transmitted hidden data sequence.

13. The system of claim 12, wherein the hidden data recovery application is configured to recover the locations of the codebook waveforms by matched filtering.

14. The system of claim 12, wherein the hidden data recovery application is further configured to clean up the recovered data by using estimated distances between successive cross-correlations to discard extraneous correlation peaks and sequence recurrence to probabilistically delete overlapping correlation peaks.

15. The system of claim 10, wherein the cover audio with superimposed hidden data sequence signal generator is further configured to repeatedly segment the cover audio to match the size of the hidden data sequence for superimposition and to reconstruct the cover audio as a continuous stream prior to transmission.

16. The system of claim 10, wherein each codebook waveform in the set is a short spoken word.

17. The system of claim 10, further comprising applications configured for compressing the cover audio with superimposed hidden data sequence prior to transmission and decompressing the received compressed cover audio with superimposed hidden data sequence prior to recovery of the hidden data sequence.

Description

BRIEF DESCRIPTION OF THE DRAWINGS

[0013] Other aspects, advantages and novel features of the invention will become more apparent from the following detailed description of the invention when considered in conjunction with the accompanying drawings, wherein:

[0014] FIG. 1 is a simplified flow chart of a preferred embodiment of a method according to one aspect of the invention.

[0015] FIG. 2 is an illustration of an example implementation of the first part of the method of FIG. 1, according to one aspect of the invention.

[0016] FIG. 3 is an illustration of an example implementation of the second part of the method of FIG. 1, according to one aspect of the invention.

[0017] FIG. 4 is a graph of the relationship between the raw bit error of a recovered sample byte of data and the ratio between the amplitudes of the encoded code words and the cover audio across various lengths of the code waveforms, as fractions of their original length, for an example implementation of the invention.

[0018] FIG. 5 is a graph demonstrating the performance of a variety of sample codewords, chosen across several languages, against English language cover samples, for an example implementation of the invention.

[0019] FIG. 6 is a simplified block diagram of a preferred embodiment of a system for sending hidden data within cover audio, according to one aspect of the invention.

[0020] FIG. 7 is a schematic representation of an overview of an example implementation of infrastructure for employing the data hiding technique according to the invention in an emergency services context.

DETAILED DESCRIPTION

[0021] In the present invention, the task of embedding data within cover phone audio to be transmitted and recovered by a receiving party is treated as a steganography problem, but with a critical difference. Phone audio must undergo compression via GSM standard speech codecs, and the data embedding must be capable of surviving the compression protocol [Sun, Lingfen, et al., Speech Compression, Guide to Voice and Video over IP, Springer London, 2013, pp. 17-51; Hanzo, Lajos, F. Clare A. Somerville, and Jason Woodard, Voice and audio compression for wireless communications, John Wiley and Sons, 2008].

[0022] While most standard data hiding techniques fail in the face of speech compression, the invention presents a simple but effective alternative: using voice itself as the medium for embedding and recovering critical data. The method operates on three unique principles: 1) it relaxes the constraint on inaudibility, while still not impeding the quality of the transmitted cover audio; 2) it operates independently of the internal specifications of standard speech codecs, treating speech compression as a black box; and 3) it capitalizes on the most important behavioral component of speech codecs: that they are designed to preserve only what appears to be speech.

[0023] In order for data to be exchanged via a representation that is distinct from its original form, common information is required by both the transmitting and receiving parties. For example, both compression codecs and popular coding techniques require the notion of a codebook, an established agreement on both sides about the meaning of the signals chosen to be communicated. The present method is a simple adaptation of this concept into a previously unexplored space, one that specifically uses human speech samples as the code.

[0024] A preferred embodiment of the method of the invention uses speech itself as a medium for data embedding. The four basic steps of this embodiment comprise:

[0025] Step 1. Sample waveforms of short spoken words, in English or any other language, are chosen as codebook waveforms. These waveforms are chosen ahead of transmission and are agreed upon at both the transmitting and receiving ends of the channel.

[0026] Step 2. The codebook waveforms are assigned representative digit values (such as, but not limited to, 0, 1, and 2 in a base 3 sequence) and the sequence representing the hidden data intended to be transmitted is then formed by concatenation.
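Steps 1-2 can be sketched in a few lines. In the sketch below, distinct sinusoid bursts stand in for recorded spoken words, and a hypothetical six-digit base-3 encoding of a single byte is used; the codebook contents, sample rate, and digit encoding are all illustrative assumptions, not part of the method as claimed.

```python
import numpy as np

# Hypothetical stand-in codebook: in practice each entry would be a recorded
# spoken word; sinusoid bursts are used here only for illustration.
fs = 8000                          # narrowband phone-audio sample rate
t = np.arange(int(0.2 * fs)) / fs  # 0.2 s codewords
codebook = {d: np.sin(2 * np.pi * 440 * (d + 1) * t) for d in range(3)}

def byte_to_base3(value):
    """One byte (0-255) as six base-3 digits, most significant first."""
    digits = []
    for _ in range(6):
        digits.append(value % 3)
        value //= 3
    return digits[::-1]

def encode(digits, codebook):
    """Form the hidden data sequence by concatenating codebook waveforms."""
    return np.concatenate([codebook[d] for d in digits])

digits = byte_to_base3(165)
print(digits)                     # [0, 2, 0, 0, 1, 0]
hidden = encode(digits, codebook)
print(len(hidden) == 6 * len(t))  # True
```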

[0027] Step 3. The concatenated sequence from Step 2 is repeatedly superimposed upon segments of speech or noise that are being additionally transmitted through the audio channel, at a fraction of the amplitude of this cover audio. The cover audio is repeatedly segmented to match the size of the hidden data sequence for the purpose of superimposition and then reconstructed as a continuous stream prior to being fed to the compression codec for transmission.
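A minimal sketch of the superimposition in Step 3 follows. The scaling convention (hidden-sequence amplitude as a fraction of the cover's peak amplitude) and the choice to leave any trailing partial segment unmodified are assumptions made for illustration.

```python
import numpy as np

def embed(cover, hidden, alpha=0.25):
    """Repeatedly superimpose the hidden data sequence on consecutive,
    equal-sized segments of the cover audio at a fraction `alpha` of the
    cover's peak amplitude, returning one continuous stream.
    Any trailing partial segment is left unmodified (an assumption)."""
    n = len(hidden)
    out = cover.astype(float).copy()
    scale = alpha * np.max(np.abs(cover)) / np.max(np.abs(hidden))
    for start in range(0, len(cover) - n + 1, n):
        out[start:start + n] += scale * hidden
    return out

rng = np.random.default_rng(0)
cover = rng.standard_normal(10_000)  # stand-in cover audio
hidden = np.sin(2 * np.pi * 440 * np.arange(1600) / 8000)
tx = embed(cover, hidden, alpha=0.25)
print(len(tx) == len(cover))         # True: still one continuous stream
```

The returned stream would then be fed to the compression codec for transmission, as described above.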

[0028] Step 4. On the receiving end, the locations of the codebook waveforms in the data stream are recovered by matched filtering, and the time markers of the locations are interpolated to determine the transmitted data sequence. Given a priori knowledge of the length of the data sequence, the interpolation uses iterative peak finding to search for the minimum number of required digits. The recovered data sequence is then cleaned-up by using the estimated distances between successive cross-correlations to discard extraneous correlation peaks, and sequence recurrence is used to probabilistically delete overlapping correlation peaks.
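A toy end-to-end sketch of the matched-filtering recovery in Step 4 is given below. For brevity it assumes the codeword slot boundaries are known, standing in for the time-marker interpolation and correlation-peak clean-up described above; the sinusoid codebook is again an illustrative stand-in for spoken words.

```python
import numpy as np

rng = np.random.default_rng(0)
fs = 8000
t = np.arange(int(0.2 * fs)) / fs
codebook = {d: np.sin(2 * np.pi * 440 * (d + 1) * t) for d in range(3)}

def recover_digits(received, codebook, n_digits, step):
    """Matched-filter each slot against every codeword and keep the best
    match. Assumes slot boundaries are known; the method above instead
    locates codewords by interpolating correlation-peak time markers."""
    digits = []
    for k in range(n_digits):
        seg = received[k * step:(k + 1) * step]
        # Normalise by codeword energy so different codewords are comparable.
        best = max(codebook, key=lambda d: float(
            np.correlate(seg, codebook[d], mode='valid')[0]
            / np.linalg.norm(codebook[d])))
        digits.append(best)
    return digits

# Toy transmission: concatenated codewords at 25% of the cover's peak amplitude.
sent = [0, 2, 1]
hidden = np.concatenate([codebook[d] for d in sent])
cover = rng.standard_normal(len(hidden))
received = cover + 0.25 * np.max(np.abs(cover)) * hidden
print(recover_digits(received, codebook, len(sent), len(t)))  # [0, 2, 1]
```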

[0029] FIG. 1 is a simplified flow chart of a preferred embodiment of a method for transmitting hidden data according to one aspect of the invention. As shown in FIG. 1, sample waveforms of short, spoken words are chosen 110 as codebook waveforms. The selected codebook waveforms are assigned 120 representative digit values and the sequence representing the hidden data is formed. The hidden data sequence is repeatedly superimposed 130 upon segments of cover audio, at a fraction of the amplitude of the cover audio, and transmitted. Once the transmitted data is received, the locations of the codebook waveforms are recovered and the time markers of the locations are interpolated to recover 140 the transmitted data sequence, which is then cleaned up.

[0030] FIG. 2 is an illustration of an example implementation of Steps 1-2 of the method according to the invention. In the base 3 sequence example shown in FIG. 2, waveforms 210, 215, 220 of corresponding spoken words 230, 235, 240 across different languages are respectively mapped to representative digits 250, 255, 260, and are then concatenated 270 in the order that matches the final data sequence intended to be transmitted.

[0031] FIG. 3 is an illustration of an example implementation of Steps 3-4 of the method according to the invention, as applied to the example of FIG. 2. As shown in FIG. 3, the concatenated audio sequence 310 is superimposed upon the cover audio 320, forming the combined signal 330 to be transmitted. This signal is compressed 340 and decompressed 345 via the speech codec 350 (Adaptive Multi-Rate in this example) to obtain the cross-correlation post compression 360, 365, 370 for each digit value. The locations of individual samples are then obtained by matched filtering to reconstruct the original data sequence.

[0032] This approach has several important properties. First, the audio superimposition and cross-correlation are simple signal processing operations that can be implemented in software at either end of the transmission and receiving networks, entirely independent of existing infrastructure. Second, it requires fairly low rate data embedding for robust recovery. As shown in FIG. 2, the chosen samples are on the order of one second in length, although this parameter can be varied. Additionally, the plots of FIGS. 2 and 3 demonstrate variability in performance between the codewords; for example, the word chosen to represent the digit 2 in FIG. 3 has a poorer signal-to-noise ratio than the codewords representing digits 0 and 1, as indicated by the cross-correlation signals. This raises a question pertaining to the choice of effective codewords. Finally, the superimposition of one source of audio upon another in this manner almost guarantees perceptibility. However, many parameters pertaining to the codebook waveforms themselves, such as amplitude, pitch, and length, can be varied to minimize perceptibility or distinct identification of the chosen codewords.

[0033] It is important to note that the method, as presented, does not include any higher-order Error Correcting Code (ECC) as might be used in other transmission protocols; such codes can be applied to improve recovery accuracy but are not a required component of the approach delineated here. It is clear, however, that the use of error correcting codes in conjunction with the present invention is within the ability of one of skill in the art and may be advantageously applied to the present invention.

[0034] Primary Characterization.

[0035] In order to study the methodology of the invention, particularly to understand the trade-off between perceptibility and accuracy, software simulations of the entire pipeline were developed and tested. For the purpose of demonstration, the Adaptive Multi-Rate (AMR) Codec standard was chosen for the compression process, and recordings of the Harvard Sentence Set from the PN/NC corpus database [McCloy, D. R., Souza, P. E., Wright, R. A., Haywood, J., Gehani, N., and Rudolph, S., The PN/NC corpus, Version 1.0, 2013] were chosen as cover speech samples.

[0036] An initial experiment sheds light on the relationship between the fractional amplitude of an embedded data byte and the bitwise accuracy of its recovery after AMR compression, as well as the relationship between the fractional lengths of the codewords used and the resulting bitwise accuracy, as shown in FIG. 4. To generate this data, ten codeword samples were used to embed a single byte across thirty cover speech samples at each fractional amplitude value, and the results were averaged. FIG. 4 demonstrates the relationship between the raw bit error 410 of a recovered sample byte of data and the ratio between the amplitudes of the encoded code words and the cover audio (fraction of amplitude of cover speech) 420 for samples 430, 440, 450, 460, 470 of length 0.1, 0.3, 0.5, 0.8, and 1.0, respectively. This relationship is shown across various lengths of the code waveforms, as fractions of their original length.

[0037] As expected, the greater the data amplitude, the higher the recovery accuracy. Without any form of higher-level Error-Correcting Code, the figure indicates that the system can operate with code words embedded at roughly 20-30 percent of the amplitude of the cover audio while achieving raw bit recovery accuracies of more than 80 percent. The plot in FIG. 4 also illustrates the tradeoff between data rate and perceptibility: using shorter segments of the code waveforms allows for a greater data rate, but a greater amplitude is necessary to maintain recovery accuracy. Moreover, the plot in FIG. 4 demonstrates that data can be embedded at amplitudes as low as 20 to 30 percent of the cover speech amplitude, with greater than 50 percent of the original code waveform used, to obtain a raw bitwise accuracy of at least 80 percent for a single byte.

[0038] Parameter Optimization.

[0039] Choosing codewords. The method according to the invention is extremely broad in scope, and exposes several parameters that can be optimized in light of the aforementioned constraints, including what words should be chosen as the codewords. FIG. 5 is a graph demonstrating the performance of a variety of sample codewords, chosen across five languages, against English language cover samples. Codewords sampled from five different languages, namely English, Arabic, Mandarin, Tamizh, and French, were tested against thirty cover speech samples from the Harvard Sentence Set [McCloy, D. R., Souza, P. E., Wright, R. A., Haywood, J., Gehani, N., and Rudolph, S., The PN/NC corpus, Version 1.0, 2013] through the developed simulation pipeline. Apart from words in English with uncommon sounds, such as words containing the letter z, foreign language codewords outperformed English language codewords against English cover speech.

[0040] Reducing Perceptibility.

[0041] The notion of perceptibility assigned to a string of codewords, or the degree to which the data embedding inhibits understanding of the cover speech, is determined by their amplitude in relation to the cover speech, their pitch, and their length. Shortening a set of chosen code words arbitrarily makes them less intelligible; lowering or raising their pitch in relation to the cover speech might make them appear like background noise or indistinct chatter; and lowering their amplitude makes them less observable. In order to choose optimal values for these parameters as part of a complete presentation of this technique, a function level optimization utilizing Powell's method was run on a base two data embedding scheme simulation [Gershenfeld, Neil A. The nature of mathematical modeling. Cambridge university press, 1999]. That is, two of the highest performing waveforms from the optimization experiment above were chosen to represent a 0 value bit and a 1 value bit, and a cost function negatively weighting amplitude, pitch, and length while positively weighting system accuracy was optimized. The cost function is:


f = w_acc * A(p_0, p_1, e_0, e_1, l_0, l_1) - w_p[p_0 + p_1] - w_e[e_0 + e_1] - w_l[l_0 + l_1]

where p_i, e_i, and l_i represent the unitless fractional parameter values for pitch, amplitude, and length of the respective code waveforms; A represents the resulting bitwise recovery accuracy as a function of the parameters p, e, and l; and w_acc, w_p, w_e, and w_l represent the variable weights assigned to the system accuracy and these parameters, respectively, in the cost function.
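Under the sign convention stated above (accuracy rewarded; pitch, amplitude, and length penalised), the cost function can be sketched as follows; whether the optimiser maximises f or minimises -f is an implementation detail, and the example parameter values are illustrative only.

```python
def cost(accuracy, p, e, l, w_acc, w_p, w_e, w_l):
    """Cost f for the base-2 embedding optimisation. p, e, l are
    (value_0, value_1) pairs of unitless fractional pitch, amplitude
    (energy), and length parameters for the two code waveforms."""
    return (w_acc * accuracy
            - w_p * (p[0] + p[1])
            - w_e * (e[0] + e[1])
            - w_l * (l[0] + l[1]))

# Example: perfect accuracy, heavily accuracy-weighted (w_acc = 0.7).
f = cost(1.0, (0.5, 0.5), (0.2, 0.2), (0.3, 0.3), 0.7, 0.1, 0.1, 0.1)
print(round(f, 3))  # 0.5
```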

[0042] Evaluating the optimizations for varying combinations of parameter weights w_x permits examination of the performance of the system under different desired conditions. For example, the recovery of a single byte using parameters optimized for a weighting of w_acc=0.7 and w_p=w_e=w_l=0.1 results in 100 percent bitwise recovery accuracy, whereas a weighting of w_acc=0.1, w_p=w_l=0.4, and w_e=0.1 results in a 60 percent bitwise recovery accuracy. Table 1 presents example optimal parameter value results for sample weight combinations.

TABLE 1

  #  Accuracy  Pitch   Energy  Length  Cost      Pitch    Pitch    Energy   Energy   Length
     Weight    Weight  Weight  Weight            Value 0  Value 1  Value 0  Value 1  Value 0
  0  0.3       0.2     0.4     0.1     0.573586  1.000000 1.000000 0.100004 0.100001 0.464122
  1  0.7       0.1     0.1     0.1     0.835804  1.000000 1.000000 0.100000 0.100000 0.441956
  2  0.3       0.3     0.3     0.1     0.799304  1.000000 1.000000 0.100000 0.100004 0.406945
  3  0.1       0.4     0.1     0.4     0.793253  0.998777 0.999517 0.100000 0.104530 0.112127
  4  0.9       0.0     0.1     0.0     0.878487  0.623762 0.427608 0.100000 0.115133 0.588911
  5  0.5       0.1     0.3     0.1     0.599322  1.000000 0.999997 0.100000 0.100000 0.406772

[0043] Physical Implementation.

[0044] The method of the invention provides a simple data hiding technique for the low-rate transmission of critical information in phone channel audio, by using voice samples as a medium for embedding and recovery. The method is not sophisticated or infrastructurally demanding; it should be easily implementable by one of skill in the art having a knowledge of software development and audio signal processing.

[0045] FIG. 6 is a simplified block diagram of an example embodiment of a system for sending hidden data within cover audio. Shown in FIG. 6 are input data to be hidden 605, input cover audio 610, codebook waveform selection application 615, hidden data sequence generator 620, cover audio with superimposed hidden data sequence signal generator 625, transmitter 640, receiver 650, hidden data recovery application 660, output cover audio 680 and recovered hidden data 685.

[0046] FIG. 7 is an overview of an example implementation of infrastructure for employing the data hiding technique according to the invention in an emergency services context. In FIG. 7, victim 705 uses mobile phone 710 to call (voice audio 715) for help. Data-hiding application 720 on phone 710 retrieves location information from GPS application 725 and embeds it within audio transmission 740 to emergency receiving unit 750. Data Recovering application 755 on emergency receiving unit 750 breaks received transmission 740 into voice audio 760, which is sent to responder 765, and recovered location data 770, which is sent to responder 775. While a specific arrangement and division of components is shown in the example of FIG. 7, it will be clear to one of skill in the art of the invention that many other arrangements and divisions are suitable for employment with and in the invention.

[0047] While preferred embodiments of the invention are disclosed in the attached materials, many other implementations will occur to one of ordinary skill in the art and are all within the scope of the invention. Each of the various embodiments described may be combined with other described embodiments in order to provide multiple features. Furthermore, while the attached materials describe a number of separate embodiments of the apparatus and method of the present invention, what has been described is merely illustrative of the application of the principles of the present invention. Other arrangements, methods, modifications, and substitutions by one of ordinary skill in the art are therefore also considered to be within the scope of the present invention.