Distributed audience measurement systems and methods

Abstract

Systems and methods are disclosed for customizing, distributing and processing audio signature data. Examples disclosed herein include waking a portable device from an inactive state in response to an activation signal. Disclosed examples also include generating a first signature based on audio from a microphone of the portable device during a period of time specified by the activation signal. Disclosed examples further include comparing the first signature with a second signature obtained from the activation signal to determine a match score, and communicating a match result based on the match score to a server via a network.

Claims

1. A portable device comprising: a microphone; and a processor to execute computer readable instructions, the processor to at least: wake the portable device from an inactive state in response to an activation signal; generate a first signature based on audio from the microphone during a period of time specified by the activation signal; compare the first signature with a second signature obtained from the activation signal to determine a match score; and communicate a match result based on the match score to a server via a network.

2. The portable device of claim 1, wherein the activation signal is received by the portable device from the server via the network.

3. The portable device of claim 2, wherein the processor is to communicate an acknowledgement signal to the server via the network in response to the activation signal.

4. The portable device of claim 1, wherein the activation signal includes executable content to be executed by the processor to wake the portable device from the inactive state.

5. The portable device of claim 1, further including memory, wherein the processor is to store the second signature obtained from the activation signal in the memory.

6. The portable device of claim 1, wherein to compare the first signature with the second signature, the processor is to determine a difference between first characteristics of the first signature and second characteristics of the second signature to determine the match score.

7. The portable device of claim 6, wherein the processor is to compare the match score to a threshold to determine the match result.

8. A computer readable storage device comprising computer readable instructions that, when executed, cause a processor of a portable device to at least: wake the portable device from an inactive state in response to an activation signal; generate a first signature based on audio from a microphone of the portable device during a period of time specified by the activation signal; compare the first signature with a second signature obtained from the activation signal to determine a match score; and communicate a match result based on the match score to a server via a network.

9. The computer readable storage device of claim 8, wherein the activation signal is received by the portable device from the server via the network.

10. The computer readable storage device of claim 9, wherein the instructions cause the processor to communicate an acknowledgement signal to the server via the network in response to the activation signal.

11. The computer readable storage device of claim 8, wherein the activation signal includes executable content to be executed by the processor to wake the portable device from the inactive state.

12. The computer readable storage device of claim 8, wherein the instructions cause the processor to store the second signature obtained from the activation signal in memory of the portable device.

13. The computer readable storage device of claim 8, wherein to compare the first signature with the second signature, the instructions cause the processor to determine a difference between first characteristics of the first signature and second characteristics of the second signature to determine the match score.

14. The computer readable storage device of claim 13, wherein the instructions cause the processor to compare the match score to a threshold to determine the match result.

15. A method comprising: waking a portable device from an inactive state in response to an activation signal; generating, by executing an instruction with a processor of the portable device, a first signature based on audio from a microphone of the portable device during a period of time specified by the activation signal; comparing, by executing an instruction with the processor, the first signature with a second signature obtained from the activation signal to determine a match score; and communicating a match result based on the match score to a server via a network.

16. The method of claim 15, wherein the activation signal is received by the portable device from the server via the network, and further including communicating an acknowledgement signal to the server via the network in response to the activation signal.

17. The method of claim 15, wherein the activation signal includes executable content to be executed by the processor to wake the portable device from the inactive state.

18. The method of claim 15, further including storing the second signature obtained from the activation signal in memory of the portable device.

19. The method of claim 15, wherein the comparing of the first signature with the second signature includes determining a difference between first characteristics of the first signature and second characteristics of the second signature to determine the match score.

20. The method of claim 15, further including comparing the match score to a threshold to determine the match result.

Description

BRIEF DESCRIPTION OF THE DRAWINGS

(1) FIG. 1 is a block diagram illustrating an exemplary system for extracting and matching audio fingerprints;

(2) FIG. 2 is a block diagram illustrating an example configuration for generating an audio fingerprint;

(3) FIG. 3 is a block diagram illustrating an example configuration for extracting features from an audio fingerprint;

(4) FIG. 4 is a block diagram illustrating an example system for distributing audio fingerprint processing; and

(5) FIG. 5 is an example flowchart illustrating a process for distributing audio fingerprint processing.

DETAILED DESCRIPTION

(6) FIG. 1 illustrates an example system for audio fingerprint extraction and matching for a media source 101 that communicates audio to one or more portable devices 102. Portable device 102 may be a cell phone, Personal Digital Assistant (PDA), media player/reader, computer laptop, tablet PC, or any other processor-based device that is known in the art, including a desktop PC and computer workstation. During operation, portable device 102 receives an activation signal 114 and a fingerprint for known audio content. Upon receiving the activation signal 114, portable device 102 becomes operative to sample room audio for a period of time specified by the activation signal. In some examples, the samples are obtained through a microphone (not shown) coupled to the portable device 102. Preferably, the activation signal 114 is tailored so that the period of time of activation corresponds to the length of the known media content.

(7) As an example, known media content may comprise a 20-second commercial. Typically, when commercials are communicated or transmitted, they are scheduled for set times during a programmer's normal content. If these times are known or estimated, an activation signal may be sent to the portable device 102 in advance (e.g., one minute before the commercial is scheduled to air). Activation signal 114 would also be accompanied by a pre-recorded fingerprint of the known content, which could be received before, after, or simultaneously with activation signal 114. In the example of FIG. 1, the pre-recorded fingerprint would be stored in memory 112 associated with portable device 102. Activation signal 114 preferably includes executable content which forces portable device 102 to “wake up” if it is inactive, and/or otherwise prepare for sampling ambient audio prior to the scheduled commercial.

(8) Continuing with the example, once the portable device 102 downloads activation signal 114, the device begins sampling ambient sound at regular intervals, starting a predetermined time prior to the airing of the commercial (e.g., 10 seconds) and continuing for a predetermined time after the airing of the commercial (e.g., 10 seconds), for a total sampling time period of 40 seconds (10 sec. + 20 sec. + 10 sec.). Once the total sampling time period expires, portable device 102 stops sampling, records the sampled data in memory 112, and forwards the sampled data to fingerprint formation module 103 of portable device 102.
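By way of illustration only, the sampling-window arithmetic of the preceding example may be sketched as follows. The `ActivationSignal` class and all of its field names are hypothetical; the disclosure states only that the activation signal specifies the period of time during which audio is sampled.

```python
from dataclasses import dataclass


@dataclass
class ActivationSignal:
    # Hypothetical fields, chosen for illustration only.
    content_start_s: float    # scheduled air time, in seconds on some shared clock
    content_length_s: float   # e.g. 20 for a 20-second commercial
    lead_in_s: float = 10.0   # sampling begins this long before air time
    lead_out_s: float = 10.0  # sampling continues this long after the content ends


def sampling_window(sig):
    """Return (start, stop, total) of the sampling period, in seconds."""
    start = sig.content_start_s - sig.lead_in_s
    stop = sig.content_start_s + sig.content_length_s + sig.lead_out_s
    return start, stop, stop - start


# A 20-second commercial airing at t = 60 s gives a 40-second window.
start, stop, total = sampling_window(ActivationSignal(60.0, 20.0))
print(start, stop, total)
```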

(9) Fingerprint formation module 103 comprises an audio conversion module 104 and fingerprint modeling module 105. Audio conversion module 104 performs front-end processing on the sampled audio to convert the audio signal into a series of relevant features which are then forwarded to fingerprint modeling module 105. Module 105 performs audio modeling processing to define the final fingerprint representation such as a vector, a trace of vectors, a codebook, a sequence of indexes to Hidden Markov Model (HMM) sound classes, a sequence of error correcting words or musically meaningful high-level attributes. Further details regarding formation module 103 and modeling module 105 are discussed below.

(10) Once the fingerprint is formed, a matching module 106 determines whether a match exists between the signature formed from module 103 and the pre-recorded signature stored in memory 112. Look-up module 107 preferably comprises a similarity portion 108 and a search portion 109. Similarity portion 108 measures characteristics of the fingerprint formed from module 103 against the pre-recorded fingerprint using a variety of techniques. One such technique includes using a correlation metric for comparing vector sequences in the fingerprint. When the vector feature sequences are quantized, a Manhattan distance may be measured. In cases where the vector feature sequence quantization is binary, a Hamming distance may be more appropriate. Other techniques may be appropriate, such as a “Nearest Neighbor” classification using a cross-entropy estimation, or an “Exponential Pseudo Norm” (EPN) metric to better distinguish between close and distant values of the fingerprints. Further details regarding these metrics and decoding may be found in U.S. Pat. No. 6,973,574, titled “Recognizer of Audio Content In Digital Signals”, filed Apr. 24, 2001, and U.S. Pat. No. 6,963,975, titled “System and Method for Audio Fingerprinting”, filed Aug. 10, 2001, both of which are incorporated by reference in their entirety herein.
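As a non-limiting sketch, three of the similarity measures named above (correlation, Manhattan distance and Hamming distance) may be computed as shown below. For simplicity, each fingerprint is assumed to be a flat, equal-length sequence of numbers; the function names are illustrative and do not appear in the disclosure.

```python
import math


def manhattan(a, b):
    """Sum of absolute differences, suited to quantized feature values."""
    return sum(abs(x - y) for x, y in zip(a, b))


def hamming(a, b):
    """Number of differing positions, suited to binary feature values."""
    return sum(1 for x, y in zip(a, b) if x != y)


def correlation(a, b):
    """Pearson correlation in [-1, 1]; returns 0.0 if either input is constant."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)


print(manhattan([1, 2, 3], [2, 2, 5]))      # 3
print(hamming([1, 0, 1, 1], [1, 1, 1, 0]))  # 2
```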

(11) In some examples, pre-recorded fingerprints may have pre-computed distances established among fingerprints registered in a central repository (not shown). By pre-computing distances among fingerprints registered in the repository, a data structure may be built to reduce or simplify evaluations made from signatures in memory 112 at the portable device end. Alternately, sets of equivalent classes may be pre-recorded for a given fingerprint, and forwarded to memory 112, where the portable device calculates similarities in order to discard certain classes, while performing more in-depth search for the remaining classes.

(12) Once a fingerprint is located from look-up module 107, confirmation module 110 “scores” the match to confirm that a correct identification has occurred. The matching “score” would relate to a threshold that the match would exceed in order to be determined as a “correct” match. The specific threshold would depend on the fingerprint model used, the discriminative information, and the similarity of the fingerprints in the memory 112. In some examples, memory 112 would be loaded with a limited database that would correlate to the fingerprints and/or other data received at the time of activation signal 114. Accordingly, unlike conventional fingerprinting systems, the matching and threshold configurations can be significantly simplified. Once a determination is made from matching module 106, the result (i.e., match/no match) is communicated from output 113 of portable device 102 to a central repository for storage and subsequent analysis.
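One possible way of scoring a match against a threshold is sketched below. It assumes binary fingerprints and defines the score as one minus the fraction of differing bits; both that definition and the default threshold of 0.65 are assumptions made for illustration, since the disclosure leaves the threshold dependent on the fingerprint model used.

```python
def match_score(sampled_bits, reference_bits):
    """Score in [0, 1]: 1.0 means identical, 0.0 means every bit differs."""
    if not reference_bits or len(sampled_bits) != len(reference_bits):
        raise ValueError("fingerprints must be non-empty and of equal length")
    differing = sum(1 for s, r in zip(sampled_bits, reference_bits) if s != r)
    return 1.0 - differing / len(reference_bits)


def match_result(score, threshold=0.65):
    """True (match) only if the score exceeds the threshold."""
    return score > threshold


score = match_score([1, 0, 1, 1], [1, 0, 1, 0])
print(score, match_result(score))  # 0.75 True
```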

(13) Turning to FIG. 2, the example audio conversion module 104 described above in connection with FIG. 1 is illustrated in greater detail. As audio is received in fingerprint formation module 103 after activation, the audio is digitized (if necessary) in pre-processing module 120 and converted to a suitable format, for example, by using pulse-code modulation (PCM), in which the magnitude of the audio is sampled regularly at uniform intervals (e.g., 5-44.1 kHz) and then quantized to a series of symbols in a numeric (binary) code. Additional filtering and normalization may also be performed on the audio.
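The quantization step of PCM may be sketched as follows, assuming the sampled magnitudes have already been scaled to the range -1.0 to 1.0 and are to be mapped to signed 16-bit codes. The function name, the bit depth and the clipping behavior are illustrative assumptions.

```python
def pcm_quantize(samples, bits=16):
    """Map floats in [-1.0, 1.0] to signed integer codes of the given width."""
    max_code = 2 ** (bits - 1) - 1  # 32767 for 16 bits
    codes = []
    for s in samples:
        s = max(-1.0, min(1.0, s))  # clip out-of-range magnitudes
        codes.append(int(round(s * max_code)))
    return codes


print(pcm_quantize([0.0, 1.0, -1.0, 2.0]))  # [0, 32767, -32767, 32767]
```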

(14) Next, the processed audio is forwarded to framing module 121, where the audio is divided into frames using a particular frame size (e.g., 10-100 ms), and where the number of frames processed is determined according to a specified frame rate. An overlap function should also be applied to the frames to make them robust to shifting in cases where the input audio data is not properly aligned to the original audio used for generating the fingerprint.
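A minimal sketch of overlapped framing follows. The frame length and hop size are expressed in samples; a hop smaller than the frame length produces the overlap described above. The function and parameter names are hypothetical, and trailing samples that do not fill a whole frame are simply dropped in this sketch.

```python
def frame_signal(samples, frame_len, hop):
    """Split samples into frames of frame_len, advancing by hop each time."""
    if frame_len <= 0 or hop <= 0:
        raise ValueError("frame_len and hop must be positive")
    frames = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frames.append(samples[start:start + frame_len])
    return frames


# Frames of 4 samples with a hop of 2 overlap by 50 percent.
for frame in frame_signal(list(range(10)), frame_len=4, hop=2):
    print(frame)
```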

(15) The transform module 122 of FIG. 2 then performs further signal processing on the audio to transform audio data from the time domain to the frequency domain to create new data point features for further processing. One particularly suitable transform function is the Fast Fourier Transform (FFT), which is performed periodically with or without temporal overlap to produce successive frequency bins each having a frequency width. Other techniques are available for segregating the frequency components of the audio signals, such as a wavelet transform, discrete Walsh Hadamard transform, discrete Hadamard transform, discrete cosine transform (DCT), Modulated Complex Lapped Transform (MCLT), as well as various digital filtering techniques.
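For illustration, the time-to-frequency transformation of a single frame may be sketched with a direct discrete Fourier transform, shown below. Note that this is not an FFT: it yields the same frequency bins but at quadratic cost, and is used here only to keep the sketch self-contained. The function name is an assumption.

```python
import cmath
import math


def power_spectrum(frame):
    """Power in bins 0..n/2 of a real-valued frame, via a direct DFT."""
    n = len(frame)
    spectrum = []
    for k in range(n // 2 + 1):
        acc = 0j
        for t, x in enumerate(frame):
            acc += x * cmath.exp(-2j * cmath.pi * k * t / n)
        spectrum.append(abs(acc) ** 2)
    return spectrum


# A cosine completing two cycles in an 8-sample frame peaks in bin 2.
frame = [math.cos(2 * math.pi * 2 * t / 8) for t in range(8)]
power = power_spectrum(frame)
print(power.index(max(power)))  # 2
```

Each bin k corresponds to a frequency of k times the sample rate divided by the frame length, so the bin width follows directly from the frame size chosen in the framing step.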

(16) Once transformed, the audio data undergoes feature extraction 123 in order to generate the final acoustic vectors for the audio fingerprint. Further details of feature extraction module 123 will be discussed below in connection with FIG. 3. Once features are extracted, post-processing 124 may be performed on the audio data to provide quantization and normalization and to reduce distortions in the audio. Suitable post-processing techniques include Cepstrum Mean Normalization (CMN), mean subtraction and component-wise variance normalization. Once post-processing is completed, the audio data undergoes modeling 105, discussed above in connection with FIG. 1, which generates an audio fingerprint 125 that will subsequently be used for matching.
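As one example of the post-processing mentioned above, cepstral mean normalization may be sketched as subtracting, from every frame, the per-coefficient mean taken over all frames. The function name and the list-of-lists representation of the feature frames are assumptions.

```python
def cepstral_mean_normalize(frames):
    """Subtract the per-coefficient mean (over all frames) from each frame."""
    if not frames:
        return []
    n = len(frames)
    dims = len(frames[0])
    means = [sum(f[d] for f in frames) / n for d in range(dims)]
    return [[f[d] - means[d] for d in range(dims)] for f in frames]


print(cepstral_mean_normalize([[1.0, 10.0], [3.0, 14.0]]))
# [[-1.0, -2.0], [1.0, 2.0]]
```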

(17) FIG. 3 illustrates example feature extraction models for module 123. As a practical matter, only a single model is selected at a given time, where the type of feature extraction is dependent upon the type of audio conversion being performed in module 104. Regardless of the type used, the feature extraction should be optimized to reduce the dimensionality and audio variance attributable to distortion. After undergoing transformation (e.g., FFT) from transform module 122, the audio spectrum scale bands are processed in module 150. One option shown in FIG. 3 involves mel-frequency cepstrum (MFC), which is a representation of the short-term power spectrum of a sound, based on a linear cosine transform of a log power spectrum on a nonlinear mel scale of frequency. Mel-frequency cepstral coefficients (MFCCs) are coefficients that collectively make up an MFC. They are derived from a type of cepstral representation of the audio clip (a nonlinear “spectrum-of-a-spectrum”). As shown in FIG. 3, the MFCCs would be derived by taking the FFT of the signal and mapping the powers of the spectrum obtained above onto the mel scale, using overlapping windows (150). Next, the logs (151) of the powers at each of the mel frequencies are recorded, and a DCT of the list of log powers is taken to obtain the MFCCs (158) representing the amplitudes of the resulting spectrum.
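The MFCC derivation just described (mel-scale mapping with overlapping windows, logarithm, then DCT) may be sketched as follows for a single frame's power spectrum. The mel conversion formula with constants 2595 and 700 is one commonly used convention and is an assumption here, as are the triangular window shape, the filter and coefficient counts, and all function names.

```python
import math


def hz_to_mel(f):
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10 ** (m / 2595.0) - 1.0)


def mel_filterbank(num_filters, num_bins, sample_rate):
    """Overlapping triangular windows, equally spaced on the mel scale."""
    nyquist = sample_rate / 2.0
    top = hz_to_mel(nyquist)
    edges = [mel_to_hz(top * i / (num_filters + 1)) for i in range(num_filters + 2)]
    bin_hz = [nyquist * b / (num_bins - 1) for b in range(num_bins)]
    bank = []
    for m in range(1, num_filters + 1):
        lo, mid, hi = edges[m - 1], edges[m], edges[m + 1]
        row = []
        for f in bin_hz:
            if lo < f <= mid:
                row.append((f - lo) / (mid - lo))   # rising slope
            elif mid < f < hi:
                row.append((hi - f) / (hi - mid))   # falling slope
            else:
                row.append(0.0)
        bank.append(row)
    return bank


def mfcc(power, sample_rate, num_filters=8, num_coeffs=5):
    """power: power spectrum covering 0 Hz to the Nyquist frequency."""
    bank = mel_filterbank(num_filters, len(power), sample_rate)
    # Log of the power collected by each mel window (small floor avoids log(0)).
    log_e = [math.log(sum(w * p for w, p in zip(row, power)) + 1e-10) for row in bank]
    # DCT-II of the list of log powers.
    return [
        sum(e * math.cos(math.pi * k * (j + 0.5) / num_filters)
            for j, e in enumerate(log_e))
        for k in range(num_coeffs)
    ]


coeffs = mfcc([1.0] * 33, sample_rate=8000)
print(len(coeffs))  # 5
```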

(18) Another option involves the use of spectral flatness measure (SFM) to digitally process signals to characterize an audio spectrum. Spectral flatness is measured 152 and calculated by dividing the geometric mean of the power spectrum by the arithmetic mean of the power spectrum. A high spectral flatness (e.g., “1”) indicates that the spectrum has a similar amount of power in all spectral bands—this would sound similar to white noise, and the graph of the spectrum would appear relatively flat and smooth. A low spectral flatness (e.g., “0”) indicates that the spectral power is concentrated in a relatively small number of bands—this would typically sound like a mixture of sine waves, and the spectrum would appear spiky. A tonality vector could then be generated, where the vector would be the collection of tonality coefficients for a single frame. More specifically, the tonality vector contains a tonality coefficient for each critical band.
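The spectral flatness calculation described above, i.e., the geometric mean of the power spectrum divided by its arithmetic mean, may be sketched as follows. The function name and the handling of zero-valued bins are assumptions.

```python
import math


def spectral_flatness(power):
    """Geometric mean divided by arithmetic mean of a power spectrum."""
    if not power or any(p < 0 for p in power):
        raise ValueError("power spectrum must be non-empty and non-negative")
    if any(p == 0 for p in power):
        return 0.0  # a single empty bin drives the geometric mean to zero
    n = len(power)
    geometric = math.exp(sum(math.log(p) for p in power) / n)
    arithmetic = sum(power) / n
    return geometric / arithmetic


flat = spectral_flatness([2.0, 2.0, 2.0, 2.0])           # close to 1: noise-like
spiky = spectral_flatness([1000.0, 0.001, 0.001, 0.001])  # close to 0: tonal
print(flat, spiky)
```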

(19) Alternately, band representative vectors 160 may be generated from the feature extraction by carrying out peak-based band selection 153. Here, the audio data is represented by vectors containing positions of bands in which significant amplitude peaks take place. Thus, for a particular time frame, significant peaks can be identified for a particular frequency band. Also, filter bank energies 161 may be measured by taking the energy 154 in every band in the filtered spectrum and storing the energy in a vector for each time frame. Also, the sign of the changes of energy differences of adjacent bark-spaced frequency bands and time derivative(s) may be measured 155 to form a hash string 162 for the fingerprint. Yet another technique for feature extraction for the audio fingerprint involves modulation frequency estimation 164. This approach describes the time varying behavior of bark-spaced frequency bands by calculating the envelope 156 of the spectrum for each band over a certain number of time frames. This way, a modulation frequency 163 can be estimated for each interval. The geometric mean of these frequencies for each band is used to obtain a compact signature of the audio material.
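One common reading of the hash-string technique above is sketched below: for each pair of adjacent bands, a bit is set when the energy difference between the two bands has increased relative to the previous frame. This interpretation, the function name, and the assumption that band energies have already been computed per frame are all illustrative and are not stated in this form in the disclosure.

```python
def hash_bits(prev_energies, cur_energies):
    """One bit per adjacent band pair, from the sign of the change in band difference."""
    if len(prev_energies) != len(cur_energies):
        raise ValueError("frames must have the same number of bands")
    bits = []
    for m in range(len(cur_energies) - 1):
        cur_diff = cur_energies[m] - cur_energies[m + 1]     # difference across bands
        prev_diff = prev_energies[m] - prev_energies[m + 1]  # same, one frame earlier
        bits.append(1 if cur_diff - prev_diff > 0 else 0)    # sign of change over time
    return bits


print(hash_bits([1, 1, 1, 1], [4, 2, 3, 3]))  # [1, 0, 0]
```

Because only signs are retained, a hash of this kind is insensitive to overall changes in volume, which is one reason binary fingerprints pair naturally with a Hamming distance.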

(20) FIG. 4 illustrates an example of distributing and collecting fingerprints from portable devices such as a smart phone 406 and laptop 405, both of which have internal memory storages 408 and 409, respectively. Central site 400 comprises one or more servers 401, and a mass storage device 403. Central site 400 also may comprise a wireless transmitter 402 for communicating with portable device 406, as well as other devices (such as laptop 405). Central site 400 is coupled to a network 404, such as the Internet, which in turn couples one or more devices (405, 406) together.

(21) Broadcaster 407 emits an acoustic audio signal that is received at portable device 406 and laptop 405 (shown as dotted arrow in FIG. 4). In some examples, an acoustic signal is received at each device using a microphone that picks up ambient sound for subsequent processing and recording. The broadcast format may be in any known form, including, but not limited to, radio, television and/or computer network-based communication. As discussed above, certain audio items, such as commercials, announcements, pre-recorded songs, etc. are known ahead of time by the broadcaster, and are broadcast according to a certain schedule. Prior to broadcast, one or more audio items undergo processing according to any of the techniques discussed above (see FIGS. 1-3 and related text) to produce audio fingerprints. Each of the fingerprints is stored in mass storage device 403.

(22) In addition to storing audio fingerprints, central site 400 also may store panelist data, household data and datasets that pertain to the owners of the portable devices, especially when the system is being utilized for audience-measurement applications. In some examples, household-level data representing media exposure, media usage and/or consumer behavior may be converted to person-level data, and vice-versa. In some examples, data about panelists is gathered relating to one or more of the following: panelist demographics; exposure to various media including television, radio, outdoor advertising, newspapers and magazines; retail store visits; purchases; internet usage; and consumer beliefs and opinions relating to consumer products and services. This list is merely an example, and other data relating to consumers may also be gathered.

(23) Various datasets may be produced by different organizations, in different manners, at different levels of granularity, regarding different data, pertaining to different timeframes, and so on. Some examples integrate data from different datasets. Some examples convert, transform or otherwise manipulate the data of one or more datasets. In some examples, datasets providing data relating to the behavior of households are converted to data relating to behavior of persons within those households. In some examples, data from datasets are utilized as “targets” and other data utilized as “behavior.” In some examples, datasets are structured as one or more relational databases. In some examples, data representative of respondent behavior is weighted.

(24) For each of the examples described herein, datasets are provided from one or more sources. Examples of datasets that may be utilized include the following: datasets produced by Arbitron Inc. (hereinafter “Arbitron”) pertaining to broadcast, cable or radio (or any combination thereof); data produced by Arbitron's Portable People Meter System; Arbitron datasets on store and retail activity; the Scarborough retail survey; the JD Power retail survey; issue specific print surveys; average audience print surveys; various competitive datasets produced by TNS-CMR or Monitor Plus (e.g., National and cable TV; Syndication and Spot TV); Print (e.g., magazines, Sunday supplements); Newspaper (weekday, Sunday, FSI); Commercial Execution; TV national; TV local; Print; AirCheck radio dataset; datasets relating to product placement; TAB outdoor advertising datasets; demographic datasets (e.g., from Arbitron; Experian; Axiom, Claritas, Spectra); Internet datasets (e.g., Comscore; NetRatings); car purchase datasets (e.g., JD Power); purchase datasets (e.g., IRI; UPC dictionaries).

(25) Datasets, such as those mentioned above and others, provide data pertaining to individual behavior or provide data pertaining to household behavior. Currently, various types of measurements are collected only at the household level, and other types of measurements are collected at the person level. For example, measurements made by certain electronic devices (e.g., barcode scanners) often only reflect household behavior. Advertising and media exposure, on the other hand, usually are measured at the person level, although sometimes advertising and media exposure are also measured at the household level. When there is a need to cross-analyze a dataset containing person level data and a dataset containing household level data, the existing common practice is to convert the dataset containing person level data into data reflective of the household usage, that is, person data is converted to household data. The datasets are then cross-analyzed. The resultant information reflects household activity.

(26) Currently, databases that provide data pertaining to Internet related activity, such as data that identifies websites visited and other potentially useful information, generally include data at the household level. That is, it is common for a database reflecting Internet activity not to include behavior of individual participants (i.e., persons). Similarly, databases reflective of shopping activity, such as consumer purchases, generally include household data. Examples of such databases are those provided by IRI, HomeScan, NetRatings and Comscore. Additional information and techniques for collecting and correlating panelist and household data may be found in U.S. patent application Ser. No. 12/246,225, titled “Gathering Research Data” and U.S. patent application Ser. No. 12/425,127, titled “Cross-Media Interactivity Metrics”, both of which are incorporated by reference in their entirety herein.

(27) Once panelist and/or household data is established, operators of central site 400 may tailor fingerprint distribution to targeted devices (e.g., single males, age 18-24, annual household income exceeding $50K). FIG. 5 illustrates an example in which, at the start of the process 500, audio content is identified 501 and fingerprinted 502 as discussed above. The audio content is then coordinated with the broadcaster to determine a schedule 503 of the times at which the content will be communicated. In other examples, the broadcaster site 407 may have a dedicated connection with central site 400 in order to send an alert message, indicating that the content is about to be communicated.

(28) In addition to identifying audio content, central site 400 would also correlate the content to panelist and/or household data to determine the most effective audience for polling. Once identified, central site 400 messages each portable device associated with the panelist and/or household data 504, where each message comprises an activation signal. Additionally, the activation signal would be accompanied by the pre-recorded audio fingerprint, which may be communicated before, after, or simultaneously with the communication of the activation signal. After the message, activation signal and fingerprint are communicated to the devices, an acknowledgement signal is received at the central site, indicating whether the devices received the information, and whether the devices were responsive (i.e., the portable device activated in response to the message). If the portable device was unresponsive, this information is communicated back to the central site 509.
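A sketch of how a central site might select targeted devices and assemble the messages sent to them is given below. The `Panelist` record, its fields, the message layout and the identifiers such as "ad-42" are all hypothetical; the targeting rule mirrors the demographic example given earlier but omits marital status for brevity.

```python
from dataclasses import dataclass


@dataclass
class Panelist:
    # Hypothetical panelist record; the disclosure does not define a schema.
    device_id: str
    sex: str
    age: int
    household_income: int


def select_devices(panelists, predicate):
    """Return the device identifiers of panelists satisfying the targeting rule."""
    return [p.device_id for p in panelists if predicate(p)]


def build_activation_messages(device_ids, content_id, start_s, duration_s, fingerprint):
    """One message per device, carrying the activation period and the fingerprint."""
    return [
        {
            "device_id": device_id,
            "content_id": content_id,
            "activation": {"start_s": start_s, "duration_s": duration_s},
            "fingerprint": list(fingerprint),
        }
        for device_id in device_ids
    ]


panel = [
    Panelist("dev-1", "M", 22, 60000),
    Panelist("dev-2", "F", 30, 80000),
    Panelist("dev-3", "M", 19, 40000),
]


def target(p):
    return p.sex == "M" and 18 <= p.age <= 24 and p.household_income > 50000


ids = select_devices(panel, target)
messages = build_activation_messages(ids, "ad-42", 50.0, 40.0, [1, 0, 1, 1])
print(ids)  # ['dev-1']
```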

(29) Once the portable device is activated 506, the device prepares for reception of the audio content by activating a microphone or other recording device just prior to the actual communication of the audio content, remains on during the period of time in which the content is communicated, and deactivates at a predetermined time thereafter. During the time in which the audio content is communicated, the portable device records the audio and forms an audio fingerprint as described above. After the audio fingerprint is formed in the portable device, the portable device performs fingerprint matching locally 510. The matching compares the recorded fingerprint against the pre-recorded fingerprint received at the time of messaging to see if there is a match 510. Whether a match 511 or no match 513 result is obtained, the result is marked and forwarded to central site 400 in step 512. The matching result message should preferably contain identification information of the portable device, identification information of the audio content and/or fingerprint, and a message portion that indicates the results of the match. The message portion may simply be a binary “1” indicating that a match has occurred, or a binary “0” indicating that there was no match. Other message formats are also possible and may be specifically tailored for the specific hardware platform being used.
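The matching result message described above may be sketched as a small JSON object. JSON, the key names and the identifiers shown are assumptions made for illustration; as noted, other message formats are possible.

```python
import json


def build_result_message(device_id, content_id, matched):
    """Serialize the device identity, content identity and a 1/0 match flag."""
    return json.dumps(
        {
            "device_id": device_id,
            "content_id": content_id,
            "match": 1 if matched else 0,
        },
        sort_keys=True,
    )


print(build_result_message("dev-1", "ad-42", True))
# {"content_id": "ad-42", "device_id": "dev-1", "match": 1}
```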

(30) For streaming content, an activation message would be sent to a panel of computers or smart phones causing them to wake up shortly before a specific time to open a particular stream. At this point, the devices would collect audio, matching fingerprints over a period spanning the length of the content, and determine if there is a match. The information sent back to the central site would then comprise yes/no decisions for each content item analyzed. One advantage of the configurations described above is that they greatly simplify the design and implementation of the central site, since it would no longer require substantial processing to match fingerprints for a collective group of devices. This in turn provides greater freedom in customizing, distributing and processing audio fingerprint data, and allows for scaling to enormous panel sizes.

(31) Although various examples and/or embodiments of the present invention have been described with reference to a particular arrangement of parts, features and the like, these are not intended to exhaust all possible arrangements or features, and indeed many other examples, embodiments, modifications and/or variations will be ascertainable to those of ordinary skill in the art.

Cpc classification

G06Q30/0201: PHYSICS
G10L25/72: PHYSICS
G10L19/018: PHYSICS
H04H60/46: ELECTRICITY
G10L25/51: PHYSICS
H04H60/31: ELECTRICITY
G06Q30/0204: PHYSICS
G10L25/48: PHYSICS
H04H60/33: ELECTRICITY
G06Q30/0203: PHYSICS
H04H60/58: ELECTRICITY
Y02D30/70: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
G10L25/81: PHYSICS

International classification

H04H60/31: ELECTRICITY
G06Q30/0201: PHYSICS
G06Q30/0203: PHYSICS
G06Q30/0204: PHYSICS
G10L19/018: PHYSICS
G10L25/48: PHYSICS
G10L25/51: PHYSICS
G10L25/72: PHYSICS
G10L25/81: PHYSICS
H04H60/33: ELECTRICITY
H04H60/46: ELECTRICITY
H04H60/58: ELECTRICITY