Brand sonification

11392975 · 2022-07-19

Assignee

Inventors

CPC classification

International classification

Abstract

A mobile device comprising a software application configured to detect the sound of a product use event; provide a user reward using said software application in response to said detection; capture data relating to said product use event; and provide said captured data to a remote computer system for analysis.

Claims

1. A mobile device comprising: a microphone; memory storing machine-readable instructions; and a processor configured to access the memory and execute the machine-readable instructions, the machine-readable instructions when executed cause the processor to: receive audio data obtained from a sound in an environment that is captured by the microphone, wherein the sound is generated by a product as a result of a use of said product in said environment, and wherein the generated sound is a characteristic of said product; process said audio data to determine a product use event associated with the use of said product; determine a brand of the product that generated the sound based at least on said processing of said audio data; provide a user reward in response to said determination of said brand of said product; capture data associated with said product use event; and transmit said captured data associated with the product use event to a remote computer system for analysis.

2. A mobile device as claimed in claim 1 wherein the sound generated by the product during said use of said product comprises a sound of opening a package of said product.

3. A mobile device as claimed in claim 2 wherein the sound generated by the product during said use of said product comprises a sound of a can ring-pull/tab opening event on a can of pressurized beverage.

4. A mobile device as claimed in claim 2 wherein the sound generated by the product during said use of said product comprises a sound of a screw cap twisting and opening event on a beverage bottle.

5. A mobile device as claimed in claim 1 wherein said user reward is only delivered when said processor has detected a specific number of product use events.

6. A non-transitory data carrier carrying processor control code which when executed on a processor of a mobile device: receives audio data obtained from a sound in an environment that is captured by a microphone of the mobile device, wherein the sound is generated by a product as a result of a use of said product in said environment, and wherein the generated sound is a characteristic of said product; processes said audio data to determine a product use event associated with the use of said product; processes at least said audio data to determine a brand of said product that generated the sound; provides a user reward in response to said determination of said brand of said product; captures data associated with said product use event; and transmits said captured data associated with the product use event to a remote computer system for analysis.

7. A mobile device as claimed in claim 1 wherein said use of said product comprises one or more of: using said product, activating said product, opening said product, and consuming said product.

8. A mobile device as claimed in claim 1 wherein the sound is generated by an audio generation device of the product.

9. A mobile device as claimed in claim 1 wherein the processor is configured to provide said captured data to said remote computer system for analysis in response to the determination of said brand of said product.

10. A mobile device as claimed in claim 1 wherein determining said brand of the product is further based on a user input.

11. A mobile device comprising: memory storing machine-readable instructions; a processor configured to access the memory and execute the machine readable instructions, the machine readable instructions when executed cause the processor to: receive audio data obtained from a sound captured in an environment, wherein the sound is generated by a product as a result of a use of said product in said environment, said use of said product comprising one or more of: using said product, activating said product, opening said product, and consuming said product; and wherein the generated sound is characteristic of said product; process said audio data to determine a product use event associated with the use of said product; determine a brand of the product that generated the sound based at least on said processing of said audio data; provide a user reward in response to said determination of said brand of said product; capture data associated with said product use event; and transmit said captured data associated with said product use event to a remote computer system for analysis.

Description

BRIEF DESCRIPTION OF THE DRAWINGS

(1) The invention is diagrammatically illustrated, by way of example, in the accompanying drawings, in which:

(2) FIG. 1 shows a process to detect brand sonification in an embodiment of the invention;

(3) FIG. 2 shows a process to detect brand sonification via the internet in an embodiment of the invention;

(4) FIG. 3 shows a process to detect brand sonification on a local computing device in an embodiment of the invention; and

(5) FIG. 4a illustrates the mechanism of opening a typical pressurized can and FIG. 4b illustrates an example can modification to increase product sound distinctiveness.

DETAILED DESCRIPTION OF THE DRAWINGS

(6) 1. Brand Sonification

(7) FIG. 1 shows a general process to detect brand sonification in an embodiment of the invention. A sound is generated by a product when it is used, activated or opened. For example, the opening of a pressurized beverage or the opening of a packet of crisps/chips may generate particular sounds that are characteristic of a particular brand. The sound is generated by an audio generation device of the product. For instance, on a pressurized beverage can, the pulling of a ring-pull or pull-tab on the lid generates a sound as a scored part of the lid comes away from the rest of the can lid. In another example, the pulling apart of a crisp/chip packet when opening may generate audio. In a further example, the consuming of the crisps/chips within the packet may in itself generate audio which is characteristic of a particular brand, e.g. the crunching sound.

(8) In the illustrated embodiment, the brand sonification detection is performed using a mobile device comprising a software application. The mobile device may be a mobile telephone, a smartphone, a tablet computer etc. The software application is configured to detect the sound, capture data relating to the detected sound, provide a user with a 'reward' and forward the captured data to a remote computer system for analysis; these processes are described in more detail below.

(9) Generally, the sound is received by a microphone located within the mobile device via acoustic coupling (e.g. using an acoustic coupler or similar device within the mobile device). The software application may be configured to activate when the microphone detects a sound. The microphone sends the signal to a processing unit. As is described below in more detail, in particular embodiments, the processing unit within the mobile device may perform the sound identification analysis, while in alternative embodiments, an external processing unit may perform the analysis.

(10) The processing unit and software (either within the mobile device or external to it) determine whether the received sound matches particular stored sound models. In general, the sound models are generated in two ways:

(11) By reducing thousands of similar sounds into their constituent parts to enable a model to be generated for a particular class of sounds. This is achieved by collecting thousands of hours of audio recordings, in a wide range of environments using a wide variety of different recording equipment. For example, for brand sonification, the audio recordings may be of different products being used, activated or opened (e.g. beverage containers being opened, crisps/chips being eaten, software applications being initialized) in different environments (e.g. home, office, in a park, in a café etc.). This sound data allows product use to be identified in the presence of a range of different background sounds. Received audio data is compared to the stored sound models to determine if the received audio data has constituent parts that match those of a particular model.

(12) By using a closed-loop system to update and improve existing sound models based on audio data received from users of mobile devices. Although the sound models may have been created using thousands of audio recordings, the audio data may not represent all possible ways in which a product may be used or the different environments the product may be used in. For example, the sound a pressurized beverage container makes may depend on how the user holds the can or the specific position of the user's fingers on the ring-pull. In a further example, the sounds of some consumables being used or opened may differ under different pressures. Thus, the actual sounds generated by use of a product by users can be used to help improve existing models.

(13) More details on how sound identification models are determined and how new sounds are identified or matched to models can be found respectively in sections 3 and 4 below. If the processing unit establishes that the sound matches a known model, the received sound is considered a 'product use event'. Data associated with the event is then transmitted to one or more further systems located externally to the mobile device. The event data may be the location of the mobile device and the time and date when the product was used (i.e. when the product sound was detected). The location of the mobile device may be determined using the GPS-capability of the mobile device itself. The time and date may be logged by the mobile device's processing unit on receipt of the signal from the microphone. In embodiments, the location and time/date information of the product use event may be transmitted to a system run by the owner of the identified brand, in order to provide them with precise information on the usage of their products. Such information may, for example, enable a brand owner to determine that one of their products is typically used by consumers on weekdays at lunchtime.
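By way of a hedged illustration, the event data described above (brand, location, time and date of the product use event) might be captured and serialised for transmission as in the sketch below. The field names, the `ProductUseEvent` class and the JSON payload format are illustrative assumptions, not taken from this document.

```python
import json
import time
import uuid
from dataclasses import dataclass, asdict

@dataclass
class ProductUseEvent:
    """Hypothetical record of one detected product use event."""
    brand: str
    latitude: float    # from the device's GPS capability
    longitude: float
    timestamp: float   # logged on receipt of the microphone signal

    def to_payload(self) -> str:
        """Serialise the event for transmission to the brand owner's system."""
        record = asdict(self)
        record["event_id"] = str(uuid.uuid4())  # illustrative unique ID
        return json.dumps(record)

event = ProductUseEvent(brand="ExampleCola", latitude=51.5, longitude=-0.1,
                        timestamp=time.time())
payload = event.to_payload()
```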

(14) In embodiments, the event data may be transmitted to a further system to request additional content from an online service. Preferably, the event data is transmitted to a further system which is configured to deliver content or additional functionality to the mobile device, i.e. a ‘reward’. The reward may be, for example, access to an exclusive promotional video linked to the brand, or a monetary reward such as a percentage discount off the user's next purchase of the branded product. The reward may be delivered on a conditional basis, such as when a user has opened/consumed a specific number of the branded beverages. The further system is configured to communicate with the mobile device's operating system to deliver the reward in the appropriate manner (e.g. causing a video to be played or a money-off coupon to be downloaded).
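The conditional reward delivery described above (e.g. a coupon released only after a specific number of branded beverages have been opened or consumed) can be sketched as follows; the threshold value and the coupon text are illustrative assumptions.

```python
from typing import Optional

class RewardTracker:
    """Counts detected product use events and releases a reward when due."""

    def __init__(self, threshold: int):
        self.threshold = threshold   # events required per reward (illustrative)
        self.event_count = 0

    def record_event(self) -> Optional[str]:
        """Register one product use event; return a reward if the count is met."""
        self.event_count += 1
        if self.event_count % self.threshold == 0:
            return "10% discount coupon"   # e.g. a money-off coupon
        return None

tracker = RewardTracker(threshold=3)
results = [tracker.record_event() for _ in range(3)]
```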

(15) Turning now to FIG. 2, this shows a process to detect brand sonification via the internet and cloud computing in an embodiment of the invention. Here, the local computing device (i.e. the user's mobile device) receives a sound for identification, but rather than perform the sound identification analysis on the local processing unit, the sound data is transmitted for analysis to an external processing unit (e.g. to ‘the cloud’) via a wired or wireless communication channel. Currently, the computational power of many mobile devices limits the ability to perform the sound analysis on the local processing unit. Thus, advantageously, in the embodiment of FIG. 2 the sound analysis is performed via ‘the cloud’. The audio data transmitted to ‘the cloud’ for processing may be a coarser representation of the sound than the original audio, e.g. a Fast Fourier Transform of the original data. This coarser representation is generated by the internal processing unit of the mobile device.
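A minimal sketch of producing such a coarser representation on the device before upload, assuming plain per-frame FFT magnitudes as the reduced form; the frame length and the magnitude-only reduction are illustrative choices, not specified by the text.

```python
import numpy as np

def coarse_representation(samples: np.ndarray, frame_len: int = 1024) -> np.ndarray:
    """Return per-frame FFT magnitudes instead of the raw audio samples."""
    n_frames = len(samples) // frame_len
    frames = samples[: n_frames * frame_len].reshape(n_frames, frame_len)
    # Keep only the magnitude spectrum: sufficient for model matching,
    # but much smaller than (and not invertible back to) the original audio.
    return np.abs(np.fft.rfft(frames, axis=1))

audio = np.random.randn(44100)           # one second of audio at 44.1 kHz
features = coarse_representation(audio)  # one magnitude spectrum per frame
```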

(16) If the received audio data matches a sound model, event data is delivered to a further system as previously outlined with reference to FIG. 1. During the analysis of the audio data, the external processing unit may store the audio data in an audio database. One or more sound models may be stored within ‘the cloud’ or elsewhere, such that the sound models are accessible to the external processing unit. As mentioned above, the sound models may be updated based on received audio data in order to improve the original models. Thus, the audio data stored within the audio database may be sent to a system for updating the sound models. Preferably, for efficiency, new audio data is only transmitted to the model updating system when a certain number of new audio recordings have been received. Thus, preferably, audio recordings are sent to the model updating system in batches (e.g. after a specific number of new audio recordings is received), and the sound models are updated as described in section 4 below.

(17) FIG. 3 illustrates a process to detect brand sonification on a mobile device in another embodiment of the invention. Here, the mobile device processor may have the processing capability required to perform the sound identification analysis. In FIG. 3, the local processing unit is shown to have access to locally stored sound models (i.e. stored on the mobile device), but it may be possible that the sound models are stored externally (e.g. in 'the cloud') and accessed by the local processing unit when performing the sound analysis. The local processing unit may, on successful identification of brand sonification, create and deliver an interactive experience for the user on the mobile device.

(18) In FIG. 3, even though the local processing unit performs the sound analysis, the local processing unit does not perform any updates to the sound models itself. This is because the sound models are ideally updated using audio data collected from multiple users, and the local processing unit does not have access to an audio database. Rather, the received audio data may be transmitted to an externally located server, audio database and model updating system, e.g. located in ‘the cloud’. Additionally or alternatively, the updating task may be distributed over a number of different systems, devices or processors, e.g. using a training process distributed in ‘the cloud’. New models and updated models created using the user audio data are sent back to user mobile devices in order to enable the local processing unit to perform the sound analysis the next time a sound is received by the device.

(19) As shown in FIG. 3, in one embodiment, the local processing unit may itself create a local interactive experience for the user. Additionally or alternatively, the local processing unit may transmit event data to a further, external system which is configured to create and deliver an interactive experience or reward for the user, as described with reference to FIG. 1. The event data may be delivered to a further system run by the brand owner, in order to provide them with precise information on the usage of their products, as outlined above.

(20) 2. Product Modification

(21) As described above, brand owners may be keen to identify when and where their branded products are used or activated and the brand sonification identification process enables them to obtain this information. In the above described embodiments, it may in certain circumstances be difficult to determine the brand owner from the audio data alone. For example, many drinks manufacturers use similar or standard containers for their drinks, which make the same or similar sounds when they are opened. In this situation, the user may be required to input further information into their mobile device before a ‘reward’ is delivered. For instance, once the sound identification process has determined that the sound represented the opening of a pressurized beverage can, the user may be prompted to input the brand owner or select the brand owner from a list provided by the local processing unit. Thus, brand owners may wish to modify their product packaging in order to achieve brand sonification.

(22) Turning now to FIG. 4a, this shows a series of pictures illustrating the process of opening a typical pressurized beverage can 10 which comprises a ring pull 12 (also known as a pull-tab) attached at a rivet to the lid of the can. The lid is perforated or scored to define a tab 14 in the lid. As shown in FIG. 4a, pulling the ring pull 12 causes the tab-end of the ring pull 12 to exert a downwards force on the tab 14, and eventually causes the tab 14 to be pushed into the can to create an opening. Many drinks manufacturers use the same ring pull design and use the same can shape and size for their beverages, and thus, it can be difficult to distinguish between brands based on the sound created upon pulling the ring pull and opening the can. However, brand owners may be able to achieve brand sonification by modifying the ring pull. For example, FIG. 4b shows a modified tab 14 on a can 10. The tab 14 comprises one or more ridges 16 which can cause a specific, distinctive sound to be created when the ring pull is pulled to open the can. The ring pull 12 may be modified (not shown) so that it interacts with the ridges when pulled. For instance, the tab-end of the ring pull 12 may be extended so that it runs over the one or more ridges 16 when it is operated in the usual way. Or, the ring pull and rivet may be configured to enable the tab-end of the ring pull to slide over the ridges 16, such that as the ring pull is moved from the horizontal position (on the left-hand side of FIG. 4a) into the vertical/upright position (on the right-hand side of FIG. 4a), the ring pull 12 is successively brought into contact with the ridges 16 to cause a series of "clicks". Such modifications may enable a specific sound to be created when the can is opened, which can then be more easily determined as brand sonification by the sound analysis process.

(23) 3. Sound Identification

(24) The applicant's PCT application WO2010/070314, which is incorporated by reference in its entirety, describes in detail various methods to identify sounds. Broadly speaking, an input sample sound is processed by decomposition into frequency bands, and optionally de-correlated, for example using PCA/ICA, and then this data is compared to one or more Markov models to generate log likelihood ratio (LLR) data for the input sound to be identified. A (hard) confidence threshold may then be employed to determine whether or not a sound has been identified; if a "fit" is detected to two or more stored Markov models, then preferably the system picks the most probable. A sound is "fitted" to a model by effectively comparing the sound to be identified with the expected frequency domain data predicted by the Markov model. False positives are reduced by correcting/updating the means and variances in the model based on interference (including background) noise.
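The decision rule just described (score the input against each stored model, apply a hard confidence threshold to the LLR, and pick the most probable model when several fit) can be sketched as follows; the model names, scores and threshold value are illustrative stand-ins for the per-model LLRs the Markov models would produce.

```python
LLR_THRESHOLD = 0.0   # hard confidence threshold on the LLR (illustrative value)

def identify(llr_scores: dict) -> str:
    """Return the best-fitting model name, or 'unidentified' if none fits."""
    # Keep only models whose LLR clears the hard confidence threshold.
    fits = {name: llr for name, llr in llr_scores.items() if llr > LLR_THRESHOLD}
    if not fits:
        return "unidentified"
    # If two or more models fit, pick the most probable one.
    return max(fits, key=fits.get)

result = identify({"can_opening": 3.2, "packet_rustle": 1.1})
```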

(25) There are several practical considerations when trying to detect sounds from compressed audio formats in a robust and scalable manner. Where the sound stream is uncompressed to PCM (pulse code modulated) format and then passed to a classification system, the first stage of an audio analysis system may be to perform a frequency analysis on the incoming uncompressed PCM audio data. However, the compressed form of the audio may already contain a detailed frequency description of the audio, for example where the audio is stored as part of a lossy compression system. By directly utilising this frequency information in the compressed form, i.e. by scanning the sub-bands of the compressed representation, a considerable computational saving may be achieved, since the audio need not be uncompressed and then frequency analysed. This may mean a sound can be detected with a significantly lower computational requirement. Further advantageously, this may make the application of a sound detection system more scalable and enable it to operate on devices with limited computational power on which other techniques could not operate.

(26) The digital sound identification system may comprise discrete cosine transform (DCT) or modified DCT coefficients. The compressed audio data stream may be an MPEG standard data stream, in particular an MPEG 4 standard data stream.

(27) The sound identification system may work with compressed or uncompressed audio. For example, the time-frequency matrix for a 44.1 kHz signal might be a 1024-point FFT with a 512 overlap. This is approximately a 20 millisecond window with a 10 millisecond overlap. The resulting 512 frequency bins are then grouped into sub-bands, for example quarter-octave bands ranging between 62.5 and 8000 Hz, giving 30 sub-bands.
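As a concrete illustration, the decomposition just described (1024-point FFT frames at 44.1 kHz with 512-sample overlap, grouped into sub-bands between 62.5 Hz and 8 kHz) together with the square-root-of-average-power normalisation described in the following paragraph might be sketched as below. This is a simplification: the bands are geometrically spaced and each FFT bin is assigned wholly to one band, rather than split proportionally via the lookup table the text describes.

```python
import numpy as np

SR, NFFT, HOP = 44100, 1024, 512   # 44.1 kHz, ~20 ms window, ~10 ms hop
N_BANDS = 30
# 30 geometrically spaced band edges between 62.5 Hz and 8 kHz, approximating
# the quarter-octave grouping described in the text.
edges = np.geomspace(62.5, 8000.0, N_BANDS + 1)
freqs = np.fft.rfftfreq(NFFT, d=1.0 / SR)

def subband_frames(samples: np.ndarray) -> np.ndarray:
    """Return an (n_frames, 30) matrix of normalised sub-band powers."""
    n_frames = 1 + (len(samples) - NFFT) // HOP
    out = np.zeros((n_frames, N_BANDS))
    for i in range(n_frames):
        frame = samples[i * HOP : i * HOP + NFFT]
        power = np.abs(np.fft.rfft(frame * np.hanning(NFFT))) ** 2
        for b in range(N_BANDS):
            # Assign each FFT bin wholly to one band (simplified mapping).
            in_band = (freqs >= edges[b]) & (freqs < edges[b + 1])
            out[i, b] = power[in_band].sum()
    # Normalise each frame by the square root of its average sub-band power.
    avg_power = out.mean(axis=1, keepdims=True)
    return out / np.sqrt(avg_power + 1e-12)

features = subband_frames(np.random.randn(SR))   # one second of audio
```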

(28) A lookup table is used to map from the compressed or uncompressed frequency bands to the new sub-band representation bands. For the sample rate and STFT size given in the example, the array might comprise a (bin size ÷ 2)×6 array for each sampling-rate/bin-number pair supported. The rows correspond to the bin number (centre), i.e. the STFT size or number of frequency coefficients. The first two columns give the lower and upper quarter-octave bin index numbers. The following four columns give the proportion of the bin's magnitude that should be placed in the corresponding quarter-octave bins, from the lower quarter-octave bin defined in the first column to the upper quarter-octave bin defined in the second column. For example, if the bin overlaps two quarter-octave ranges, the third and fourth columns will have proportional values that sum to 1 and the fifth and sixth columns will have zeros. If a bin overlaps more than two sub-bands, more columns will have proportional magnitude values. This example models the critical bands in the human auditory system. This reduced time/frequency representation is then processed by the normalization method outlined. The process is repeated for all frames, incrementally moving the frame position by a hop size of 10 ms. The overlapping window (hop size not equal to window size) improves the time resolution of the system. This is taken as an adequate representation of the frequencies of the signal, which can be used to summarise the perceptual characteristics of the sound. The normalization stage then takes each frame in the sub-band decomposition and divides it by the square root of the average power in each sub-band. The average is calculated as the total power in all frequency bands divided by the number of frequency bands. This normalised time-frequency matrix is then passed to the next section of the system, where its means, variances and transitions can be generated to fully characterize the sound's frequency distribution and temporal trends.
The next stage of the sound characterization requires further definitions. A continuous hidden Markov model is used to obtain the mean, variance and transitions needed for the model. A Markov model can be completely characterized by λ=(A, B, Π) where A is the state transition probability matrix, B is the observation probability matrix and Π is the state initialization probability matrix. In more formal terms:
A = [a_ij] where a_ij ≡ P(q_{t+1} = S_j | q_t = S_i)
B = [b_j(m)] where b_j(m) ≡ P(O_t = v_m | q_t = S_j)
Π = [π_i] where π_i ≡ P(q_1 = S_i)

(29) where q is the state value and O is the observation value. A state in this model is actually a frequency distribution characterised by a set of mean and variance data; the formal definitions for this are introduced later. Generating the model parameters is a matter of maximizing the probability of an observation sequence. The Baum-Welch algorithm is an expectation-maximization procedure that has been used for doing just that. It is an iterative algorithm where each iteration is made up of two parts, the expectation and the maximization. In the expectation part, ε_t(i, j) and γ_t(i) are computed given λ, the current model values, and then in the maximization step λ is recalculated. These two steps alternate until convergence occurs. It has been shown that during this alternation process, P(O|λ) never decreases. Assume indicator variables z_i^t and z_ij^t as follows:

(30) Expectation:

ε_t(i, j) = α_t(i) a_ij b_j(O_{t+1}) β_{t+1}(j) / Σ_k Σ_l α_t(k) a_kl b_l(O_{t+1}) β_{t+1}(l)

γ_t(i) = Σ_{j=1}^{N} ε_t(i, j)

with E[z_i^t] = γ_t(i) and E[z_ij^t] = ε_t(i, j), where the indicator variables are

z_i^t = 1 if q_t = S_i, and 0 otherwise
z_ij^t = 1 if q_t = S_i and q_{t+1} = S_j, and 0 otherwise

Maximization:

â_ij = Σ_{k=1}^{K} Σ_{t=1}^{T_k−1} ε_t^k(i, j) / Σ_{k=1}^{K} Σ_{t=1}^{T_k−1} γ_t^k(i)

b̂_j(m) = Σ_{k=1}^{K} Σ_{t=1}^{T_k−1} γ_t^k(j) 1(O_t^k = v_m) / Σ_{k=1}^{K} Σ_{t=1}^{T_k−1} γ_t^k(j)

π̂_i = (1/K) Σ_{k=1}^{K} γ_1^k(i)
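As an illustrative sketch only, the expectation and maximisation updates can be written out for a discrete-observation HMM with a single training sequence (K = 1) and no numerical scaling; the patent's models use continuous Gaussian observations, so this is a deliberate simplification of the technique.

```python
import numpy as np

def baum_welch_step(A, B, pi, obs):
    """One EM update of (A, B, pi) for one integer observation sequence."""
    obs = np.asarray(obs)
    N, T = A.shape[0], len(obs)
    # E-step, part 1: forward and backward variables.
    alpha = np.zeros((T, N))
    beta = np.zeros((T, N))
    alpha[0] = pi * B[:, obs[0]]
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]
    beta[-1] = 1.0
    for t in range(T - 2, -1, -1):
        beta[t] = A @ (B[:, obs[t + 1]] * beta[t + 1])
    # E-step, part 2: eps_t(i, j) and gamma_t(i) as in the expectation formulas.
    eps = np.zeros((T - 1, N, N))
    for t in range(T - 1):
        num = alpha[t][:, None] * A * B[:, obs[t + 1]] * beta[t + 1]
        eps[t] = num / num.sum()
    gamma = eps.sum(axis=2)                       # gamma_t(i) for t < T-1
    last = alpha[-1] * beta[-1]
    gamma_full = np.vstack([gamma, last / last.sum()])
    # M-step: re-estimate A, B and pi from the expectations.
    A_new = eps.sum(axis=0) / gamma.sum(axis=0)[:, None]
    B_new = np.zeros_like(B)
    for m in range(B.shape[1]):
        B_new[:, m] = gamma_full[obs == m].sum(axis=0)
    B_new /= gamma_full.sum(axis=0)[:, None]
    return A_new, B_new, gamma_full[0]

A = np.array([[0.7, 0.3], [0.4, 0.6]])
B = np.array([[0.9, 0.1], [0.2, 0.8]])
pi = np.array([0.5, 0.5])
A2, B2, pi2 = baum_welch_step(A, B, pi, [0, 1, 0, 0, 1])
```

The re-estimated parameters remain valid probability distributions (each row of A2 and B2 sums to 1), which is a quick sanity check on the update.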

(31) Gaussian mixture models can be used to represent the continuous frequency values, and expectation maximization equations can then be derived for the component parameters (with suitable regularization to keep the number of parameters in check) and the mixture proportions. Assume a scalar continuous frequency value O_t ∈ ℝ with a normal distribution

p(O_t | q_t = S_j, λ) ~ N(μ_j, σ_j²)

(32) This implies that in state S_j, the frequency distribution is drawn from a normal distribution with mean μ_j and variance σ_j². The maximization step equations are then

(33) μ̂_j = Σ_t γ_t(j) O_t / Σ_t γ_t(j)

σ̂_j² = Σ_t γ_t(j) (O_t − μ̂_j)² / Σ_t γ_t(j)

(34) The use of Gaussians enables the characterization of the time-frequency matrix's features. In the case of a single Gaussian per state, the Gaussians become the states. The transition matrix of the hidden Markov model can be obtained using the Baum-Welch algorithm to characterize how the frequency distribution of the signal changes over time.

(35) The Gaussians can be initialized using K-means, with the starting points for the clusters being random frequency distributions chosen from the sample data.

(36) 4. Matching New Sounds to Model(s)

(37) To classify new sounds and adapt to changes in the acoustic conditions, a forward algorithm can be used to determine the most likely state path of an observation sequence and produce a probability, in terms of a log likelihood, that can be used to classify an incoming signal. The forward and backward procedures can be used to obtain this value from the previously calculated model parameters; in fact, only the forward part is needed. The forward variable α_t(i) is defined as the probability of observing the partial sequence {O_1 . . . O_t} until time t and being in S_i at time t, given the model λ.
α_t(i) ≡ P(O_1 . . . O_t, q_t = S_i | λ)

(38) This can be calculated by accumulating results and has two steps, initialization and recursion. α_t(i) explains the first t observations and ends in state S_i. This is multiplied by the transition probability a_ij of moving to state S_j, and because there are N possible previous states, there is a need to sum over all such possible previous states S_i. The term b_j(O_{t+1}) is then the probability of generating the next observation, a frequency distribution, while in state S_j at time t+1. With these variables it is then straightforward to calculate the probability of a frequency distribution sequence.

(39) P(O | λ) = Σ_{i=1}^{N} α_T(i)
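A minimal sketch of the forward procedure follows, again for a discrete-observation HMM (the text's models emit continuous frequency distributions), with per-step scaling so that the log likelihood of long sequences does not underflow.

```python
import numpy as np

def forward_loglik(A, B, pi, obs):
    """Log of P(O | lambda) via the scaled forward recursion."""
    alpha = pi * B[:, obs[0]]                        # initialization
    loglik = 0.0
    for t in range(1, len(obs)):
        scale = alpha.sum()                          # running scale factor
        loglik += np.log(scale)
        alpha = ((alpha / scale) @ A) * B[:, obs[t]]  # recursion
    return loglik + np.log(alpha.sum())              # log of sum_i alpha_T(i)

A = np.array([[0.7, 0.3], [0.4, 0.6]])
B = np.array([[0.9, 0.1], [0.2, 0.8]])
pi = np.array([0.5, 0.5])
obs = [0, 0, 1, 0]
ll = forward_loglik(A, B, pi, obs)   # higher value = better model fit
```

For classification, the same sequence is scored under each stored model and the log likelihoods are compared, as described in section 3.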

(40) Computing α_t(i) has order O(N²T) and avoids the complexity issues of calculating the probability of the sequence directly. The models will operate in many different acoustic conditions, and as it is practically restrictive to present examples that are representative of all the acoustic conditions the system will come into contact with, internal adjustment of the models will be performed to enable the system to operate in all these different acoustic conditions. Many different methods can be used for this update. For example, the method may comprise taking an average value for the sub-bands, e.g. the quarter-octave frequency values, over the last T seconds. These averages are added to the model values to update the internal model of the sound in that acoustic environment.

(41) No doubt many other effective alternatives will occur to the skilled person. It will be understood that the invention is not limited to the described embodiments and encompasses modifications apparent to those skilled in the art lying within the spirit and scope of the claims appended hereto.