Method and System for Data-Hiding Within Audio Transmissions
20190189135 · 2019-06-20
CPC classification: G10L19/018 (PHYSICS)
Abstract
A method for hiding data within cover audio uses a set of sample codebook waveforms that are each assigned a unique representative digit value. A hidden data sequence representing the data is formed from the waveforms by concatenation of the waveforms assigned to the digit values of the data. The sequence is superimposed upon segments of the cover audio at a fractional amplitude. After transmission, the received signal is decompressed if necessary, the hidden data sequence is recovered from the cover audio, and the data is recovered from the hidden data sequence. This may be done by recovering the locations of the codebook waveforms and interpolating the time markers of the locations. The recovered data may be cleaned up by using estimated distances between successive cross-correlations to discard extraneous correlation peaks and sequence recurrence to probabilistically delete overlapping correlation peaks.
Claims
1. A method for hiding data within cover audio, comprising the steps of: choosing a set of sample codebook waveforms; assigning a unique representative digit value to each codebook waveform in the set; based on the codebook waveform representative digit values, forming, from the codebook waveforms, a hidden data sequence representing the data; and repeatedly superimposing the hidden data sequence upon segments of cover audio at a fraction of the amplitude of the cover audio.
2. The method of claim 1, further comprising the steps of: transmitting the cover audio with superimposed hidden data sequence; receiving the transmitted cover audio with superimposed hidden data sequence; recovering the hidden data sequence from the received cover audio with superimposed hidden data sequence; and recovering the data from the hidden data sequence.
3. The method of claim 2, wherein the step of recovering further comprises: recovering the locations of the codebook waveforms; and interpolating the time markers of the locations to determine the transmitted data sequence.
4. The method of claim 3, wherein the locations of the codebook waveforms are recovered by matched filtering.
5. The method of claim 2, further comprising the step of cleaning up the recovered data by using estimated distances between successive cross-correlations to discard extraneous correlation peaks and sequence recurrence to probabilistically delete overlapping correlation peaks.
6. The method of claim 1, further comprising the steps of: repeatedly segmenting the cover audio to match the size of the hidden data sequence for the step of superimposing; and reconstructing the cover audio as a continuous stream prior to transmission.
7. The method of claim 1, wherein each codebook waveform in the set is a short spoken word.
8. The method of claim 1, wherein the hidden data sequence is formed by concatenation of the codebook waveforms for the representative digit values of the data.
9. The method of claim 2, further comprising the steps of: compressing the cover audio with superimposed hidden data sequence prior to the step of transmitting; and decompressing the received compressed cover audio with superimposed hidden data sequence prior to the step of recovering.
10. A system for sending hidden data within cover audio, comprising: a codebook waveform selection application configured to select a set of codebook waveforms and assign a representative data value to each codebook waveform; a hidden data sequence generator configured to form a hidden data sequence by concatenating codebook waveforms according to their associated representative data value to represent the data to be hidden; a cover audio with superimposed hidden data sequence signal generator configured to repeatedly superimpose the hidden data sequence upon segments of cover audio at a fraction of the amplitude of the cover audio; and a hidden data recovery application configured to recover the hidden data sequence from the cover audio with superimposed hidden data sequence and to recover the data to be hidden from the hidden data sequence.
11. The system of claim 10, further comprising: a transmitter configured for transmitting the cover audio with superimposed hidden data sequence; and a receiver configured for receiving the transmitted cover audio with superimposed hidden data sequence.
12. The system of claim 10, wherein the hidden data recovery application is further configured to recover the locations of the codebook waveforms and interpolate the time markers of the locations to determine the transmitted hidden data sequence.
13. The system of claim 12, wherein the hidden data recovery application is configured to recover the locations of the codebook waveforms by matched filtering.
14. The system of claim 12, wherein the hidden data recovery application is further configured to clean up the recovered data by using estimated distances between successive cross-correlations to discard extraneous correlation peaks and sequence recurrence to probabilistically delete overlapping correlation peaks.
15. The system of claim 10, wherein the cover audio with superimposed hidden data sequence signal generator is further configured to repeatedly segment the cover audio to match the size of the hidden data sequence for the step of superimposing and to reconstruct the cover audio as a continuous stream prior to transmission.
16. The system of claim 10, wherein each codebook waveform in the set is a short spoken word.
17. The system of claim 10, further comprising applications configured for compressing the cover audio with superimposed hidden data sequence prior to transmission and decompressing the received compressed cover audio with superimposed hidden data sequence prior to recovery of the hidden data sequence.
Description
BRIEF DESCRIPTION OF THE DRAWINGS
[0013] Other aspects, advantages and novel features of the invention will become more apparent from the following detailed description of the invention when considered in conjunction with the accompanying drawings, wherein:
DETAILED DESCRIPTION
[0021] In the present invention, the task of embedding data within cover phone audio to be transmitted and recovered by a receiving party is treated as a steganography problem, but with a critical difference. Phone audio must undergo compression via GSM standard speech codecs, and the data embedding must be capable of surviving the compression protocol [Sun, Lingfen, et al., Speech Compression, Guide to Voice and Video over IP, Springer London, 2013, pp. 17-51; Hanzo, Lajos, F. Clare A. Somerville, and Jason Woodard, Voice and audio compression for wireless communications, John Wiley and Sons, 2008].
[0022] While most standard data hiding techniques fail in the face of speech compression, the invention presents a simple but effective alternative: using voice itself as the medium for embedding and recovering critical data. The method operates on three unique principles: 1) it relaxes the constraint on inaudibility while still not impeding the quality of the transmitted cover audio; 2) it operates independently of the internal specifications of standard speech codecs, treating speech compression as a black box; and 3) it capitalizes on the most important behavioral component of speech codecs: that they are designed to preserve only what appears to be speech.
[0023] In order for data to be exchanged via a representation that is distinct from its original form, common information is required by both the transmitting and receiving parties. For example, both compression codecs and popular coding techniques require the notion of a codebook, an established agreement on both sides about the meaning of the signals chosen to be communicated. The present method is a simple adaptation of this concept into a previously unexplored space, one that specifically uses human speech samples as the code.
[0024] A preferred embodiment of the method of the invention uses speech itself as a medium for data embedding. The four basic steps of this embodiment comprise:
[0025] Step 1. Sample waveforms of short, spoken words, belonging to the English language or any other, are chosen as codebook waveforms. These waveforms are chosen ahead of transmission and are agreed upon on both the transmitting and receiving ends of the channel.
[0026] Step 2. The codebook waveforms are assigned representative digit values (such as, but not limited to, 0, 1, and 2 in a base 3 sequence) and the sequence representing the hidden data intended to be transmitted is then formed by concatenation.
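The digit assignment and concatenation of Step 2 can be sketched as follows. This is a minimal illustration, not the patented implementation: the fixed six-digit base-3 grouping per byte and the tiny four-sample codebook entries in the test are illustrative assumptions (real codebook waveforms would be recordings of short spoken words).

```python
import numpy as np

def bytes_to_base3_digits(data: bytes) -> list[int]:
    """Convert the data to a sequence of base-3 digits, most significant
    digit first. Six digits suffice per byte since 3**6 = 729 > 255."""
    digits = []
    for byte in data:
        for power in range(5, -1, -1):
            digits.append((byte // 3**power) % 3)
    return digits

def form_hidden_sequence(data: bytes, codebook: dict[int, np.ndarray]) -> np.ndarray:
    """Form the hidden data sequence by concatenating the codebook
    waveform assigned to each digit value of the data."""
    return np.concatenate([codebook[d] for d in bytes_to_base3_digits(data)])
```

A larger codebook (more digit values per waveform) shortens the sequence at the cost of harder discrimination between codewords on the receiving end.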
[0027] Step 3. The concatenated sequence from Step 2 is repeatedly superimposed upon segments of speech or noise that are being additionally transmitted through the audio channel, at a fraction of the amplitude of this cover audio. The cover audio is repeatedly segmented to match the size of the hidden data sequence for the purpose of superimposition and then reconstructed as a continuous stream prior to being fed to the compression codec for transmission.
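The segmentation and superimposition of Step 3 can be sketched as below; the default amplitude fraction of 0.25 is an illustrative assumption (the experiments discussed later suggest roughly 20-30 percent), and the compression codec stage is omitted.

```python
import numpy as np

def superimpose(cover: np.ndarray, hidden_seq: np.ndarray,
                alpha: float = 0.25) -> np.ndarray:
    """Repeatedly superimpose the hidden data sequence upon segments of
    the cover audio at a fraction (alpha) of the cover's peak amplitude,
    returning the cover as one continuous stream."""
    out = cover.astype(float).copy()
    n = len(hidden_seq)
    peak = np.max(np.abs(hidden_seq))
    scale = alpha * np.max(np.abs(cover)) / peak if peak > 0 else 0.0
    # Each full-length segment of the cover receives one copy of the
    # sequence; any trailing partial segment is left unmodified.
    for start in range(0, len(cover) - n + 1, n):
        out[start:start + n] += scale * hidden_seq
    return out
```

Because the sequence is repeated across every segment, the receiver gets many chances to recover each codeword even when compression damages some repetitions.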
[0028] Step 4. On the receiving end, the locations of the codebook waveforms in the data stream are recovered by matched filtering, and the time markers of the locations are interpolated to determine the transmitted data sequence. Given a priori knowledge of the length of the data sequence, the interpolation uses iterative peak finding to search for the minimum number of required digits. The recovered data sequence is then cleaned-up by using the estimated distances between successive cross-correlations to discard extraneous correlation peaks, and sequence recurrence is used to probabilistically delete overlapping correlation peaks.
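The matched filtering and clean-up passes of Step 4 can be sketched as follows. The detection threshold and minimum spacing are illustrative assumptions, and the interpolation of time markers into the final digit sequence is omitted.

```python
import numpy as np

def correlation_peaks(received: np.ndarray, codeword: np.ndarray,
                      threshold: float = 0.5) -> list[int]:
    """Matched filtering: cross-correlate the received stream with one
    codebook waveform and keep local maxima above the threshold as
    candidate time markers for that codeword."""
    corr = np.correlate(received, codeword, mode="valid")
    corr = corr / (np.linalg.norm(codeword) ** 2)  # unity at a clean match
    return [i for i in range(1, len(corr) - 1)
            if corr[i] > threshold
            and corr[i] >= corr[i - 1] and corr[i] >= corr[i + 1]]

def discard_close_peaks(peaks: list[int], min_spacing: int) -> list[int]:
    """Clean-up pass: discard extraneous correlation peaks that fall
    closer together than the estimated distance between successive
    codewords."""
    kept = []
    for p in sorted(peaks):
        if not kept or p - kept[-1] >= min_spacing:
            kept.append(p)
    return kept
```

Running `correlation_peaks` once per codebook waveform and merging the surviving, time-sorted markers yields the candidate digit sequence that the interpolation step then resolves.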
[0032] This approach has several important properties. First, the audio superimposition and cross-correlation are simple signal processing operations that can be implemented in software at either end of the transmitting and receiving networks, entirely independent of existing infrastructure. Second, it requires only fairly low-rate data embedding for robust recovery. As shown in
[0033] It is important to note that the method, as presented, does not include any higher-order Error Correcting Code (ECC) as might be used in other transmission protocols: such codes can be applied to improve recovery accuracy, but they are not a required component of the approach delineated here. It is clear, however, that use of error correcting codes in conjunction with the present invention is within the ability of one of skill in the art and may be advantageously applied to the present invention.
[0034] Primary Characterization.
[0035] In order to study the methodology of the invention, particularly to understand the trade-off between perceptibility and accuracy, software simulations of the entire pipeline were developed and tested. For the purpose of demonstration, the Adaptive Multi-Rate (AMR) Codec standard was chosen for the compression process, and recordings of the Harvard Sentence Set from the PN/NC corpus database [McCloy, D. R., Souza, P. E., Wright, R. A., Haywood, J., Gehani, N., and Rudolph, S., The PN/NC corpus, Version 1.0, 2013] were chosen as cover speech samples.
[0036] An initial experiment sheds light on the relationship between the fractional amplitude of an embedded data byte and the bitwise accuracy of its recovery after AMR compression, as well as the relationship between the fractional lengths of the codewords used and the resulting bitwise accuracy, as shown in
[0037] As expected, the greater the data amplitude, the higher the recovery accuracy. Without any form of higher level Error-Correcting Code, the figure indicates that the system can operate with code words embedded at roughly 20-30 percent of the amplitude of the cover audio, while achieving raw bit recovery accuracies of more than 80 percent. The plot in
[0038] Parameter Optimization.
[0039] Choosing codewords. The method according to the invention is extremely broad in scope, and exposes several parameters that can be optimized in light of the aforementioned constraints, including what words should be chosen as the codewords.
[0040] Reducing Perceptibility.
[0041] The perceptibility of a string of codewords, or the degree to which the data embedding inhibits understanding of the cover speech, is determined by their amplitude in relation to the cover speech, their pitch, and their length. Shortening a set of chosen codewords arbitrarily makes them less intelligible; lowering or raising their pitch in relation to the cover speech can make them resemble background noise or indistinct chatter; and lowering their amplitude makes them less observable. In order to choose optimal values for these parameters as part of a complete presentation of this technique, a function-level optimization utilizing Powell's method was run on a base-two data embedding scheme simulation [Gershenfeld, Neil A., The Nature of Mathematical Modeling, Cambridge University Press, 1999]. That is, two of the highest performing waveforms from the optimization experiment above were chosen to represent a 0-value bit and a 1-value bit, and a cost function negatively weighting amplitude, pitch, and length while positively weighting system accuracy was optimized. The cost function is:
f=w.sub.accA(p.sub.0, p.sub.1, e.sub.0, e.sub.1, l.sub.0, l.sub.1)-w.sub.p[p.sub.0+p.sub.1]-w.sub.e[e.sub.0+e.sub.1]-w.sub.l[l.sub.0+l.sub.1]
where p.sub.i, e.sub.i, and l.sub.i represent the unitless fractional parameter values for pitch, amplitude, and length of the respective code waveforms; A represents the resulting bitwise recovery accuracy as a function of parameters p, e, and l, and w.sub.acc, w.sub.p, w.sub.e, w.sub.l represent the variable weights assigned to the system accuracy and these parameters respectively in the cost function.
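The optimization above can be sketched with SciPy's implementation of Powell's method. The accuracy function here is a toy stand-in (a real run would measure A by simulating the full embed-compress-recover pipeline against the AMR codec), and the weights are the w.sub.acc=0.7 example discussed below.

```python
import numpy as np
from scipy.optimize import minimize

# Example weights: the w_acc = 0.7, w_p = w_e = w_l = 0.1 case.
w_acc, w_p, w_e, w_l = 0.7, 0.1, 0.1, 0.1

def accuracy(params):
    """Toy stand-in for the simulated bitwise recovery accuracy
    A(p0, p1, e0, e1, l0, l1): here accuracy simply improves with
    codeword energy, loosely mirroring the amplitude experiments."""
    p0, p1, e0, e1, l0, l1 = params
    return 1.0 - 0.45 * (1.0 - e0) - 0.45 * (1.0 - e1)

def cost(params):
    """Negate f so that minimizing the cost maximizes
    f = w_acc*A - w_p(p0+p1) - w_e(e0+e1) - w_l(l0+l1)."""
    p0, p1, e0, e1, l0, l1 = params
    return -(w_acc * accuracy(params)
             - w_p * (p0 + p1) - w_e * (e0 + e1) - w_l * (l0 + l1))

# Fractional parameters constrained to [0.1, 1.0], as in Table 1.
result = minimize(cost, x0=np.full(6, 0.5), method="Powell",
                  bounds=[(0.1, 1.0)] * 6)
```

Re-running this with different weight vectors reproduces the kind of trade-off survey summarized in Table 1, with each row being one optimization under one weighting.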
[0042] Evaluating the optimizations for varying combinations of parameter weights w.sub.x permits examination of the performance of the system under different desired conditions. For example, the recovery of a single byte using parameters optimized for a weighting of w.sub.acc=0.7 and w.sub.p=w.sub.e=w.sub.l=0.1 results in 100 percent bitwise recovery accuracy; whereas a weighting of w.sub.acc=0.1, w.sub.p=w.sub.l=0.4, and w.sub.e=0.1 results in a 60 percent bitwise recovery accuracy. Table 1 presents example optimal parameter value results for sample weight combinations.
TABLE 1

   Accuracy  Pitch   Energy  Length            Pitch     Pitch     Energy    Energy    Length
   Weight    Weight  Weight  Weight  Cost      Value 0   Value 1   Value 0   Value 1   Value 0
0  0.3       0.2     0.4     0.1     0.573586  1.000000  1.000000  0.100004  0.100001  0.464122
1  0.7       0.1     0.1     0.1     0.835804  1.000000  1.000000  0.100000  0.100000  0.441956
2  0.3       0.3     0.3     0.1     0.799304  1.000000  1.000000  0.100000  0.100004  0.406945
3  0.1       0.4     0.1     0.4     0.793253  0.998777  0.999517  0.100000  0.104530  0.112127
4  0.9       0.0     0.1     0.0     0.878487  0.623762  0.427608  0.100000  0.115133  0.588911
5  0.5       0.1     0.3     0.1     0.599322  1.000000  0.999997  0.100000  0.100000  0.406772
[0043] Physical Implementation.
[0044] The method of the invention provides a simple data hiding technique for the low-rate transmission of critical information in phone channel audio, by using voice samples as a medium for embedding and recovery. The method is not sophisticated or infrastructurally demanding; it should be easily implementable by one of skill in the art having a knowledge of software development and audio signal processing.
[0047] While preferred embodiments of the invention are disclosed in the attached materials, many other implementations will occur to one of ordinary skill in the art and are all within the scope of the invention. Each of the various embodiments described may be combined with other described embodiments in order to provide multiple features. Furthermore, while the attached materials describe a number of separate embodiments of the apparatus and method of the present invention, what has been described is merely illustrative of the application of the principles of the present invention. Other arrangements, methods, modifications, and substitutions by one of ordinary skill in the art are therefore also considered to be within the scope of the present invention.