Patent classifications
G10L19/00
SPEECH RECOGNITION APPARATUS, CONTROL METHOD, AND NON-TRANSITORY STORAGE MEDIUM
A speech recognition apparatus (2000) includes a first model (10) and a second model (20). The first model (10) is trained on training data that has an audio frame as input data and, as correct answer data, compressed character string data acquired by encoding the character string data represented by the audio frame. The second model (20) is a trained decoder (44) acquired by training an autoencoder (40) that consists of an encoder (42), which converts input character string data into compressed character string data, and the decoder (44), which converts the compressed character string data output from the encoder back into character string data. The speech recognition apparatus (2000) inputs an audio frame to the first model (10), inputs the compressed character string data output from the first model (10) to the second model (20), and thereby generates character string data corresponding to the audio frame.
Encoder, decoder and methods for encoding and decoding data segments representing a time-domain data stream
An apparatus decodes data segments representing a time-domain data stream. A data segment is encoded either in the time domain or in the frequency domain, and a data segment encoded in the frequency domain has successive blocks of data representing successive and overlapping blocks of time-domain data samples. The apparatus includes a time-domain decoder for decoding a data segment encoded in the time domain, and a processor for processing the data segment encoded in the frequency domain together with the output data of the time-domain decoder to obtain overlapping time-domain data blocks. The apparatus further includes an overlap/add-combiner for combining the overlapping time-domain data blocks to obtain a decoded data segment of the time-domain data stream.
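The overlap/add combining step named in this abstract can be illustrated with a minimal sketch. It covers only the combiner, not the time-domain decoder or the frequency-domain processing, and everything specific in it is an assumption made for illustration: the function names, the sine window, the 50% overlap, and the shortcut of applying the analysis and synthesis windows together as a squared window.

```python
import math


def sine_window(size):
    """Sine window; its square sums to 1 across two blocks at 50% overlap."""
    return [math.sin(math.pi * (i + 0.5) / size) for i in range(size)]


def split_into_windowed_blocks(signal, size, hop):
    """Cut the signal into overlapping blocks and weight each by window**2.

    Assumption: the squared window stands in for an analysis window followed
    by a matching synthesis window.
    """
    window = sine_window(size)
    return [
        [signal[start + i] * window[i] ** 2 for i in range(size)]
        for start in range(0, len(signal) - size + 1, hop)
    ]


def overlap_add(blocks, hop):
    """Sum equal-length blocks, each starting `hop` samples after the last."""
    if not blocks:
        return []
    size = len(blocks[0])
    out = [0.0] * (hop * (len(blocks) - 1) + size)
    for index, block in enumerate(blocks):
        for i, sample in enumerate(block):
            out[index * hop + i] += sample
    return out


signal = [float(n) for n in range(16)]
blocks = split_into_windowed_blocks(signal, size=8, hop=4)
decoded = overlap_add(blocks, hop=4)
```

In this sketch the samples covered by two blocks (indices 4 to 11) are recovered exactly up to floating-point error, while the first and last four samples are covered by a single block and stay attenuated.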
System, method, and apparatus to interactively broadcast value
A system, method, apparatus, and computer program product for interactively broadcasting value via a sound, a Wi-Fi, a Bluetooth, an email, a messaging, an RFID, or an NFC to incentivize direct consumer engagement. The system produces verifiable data on customer preferences, habits, and purchasing while using secure digital cryptographic technology to convey an item of value. The system is configured to transact a digital token normally carried on a blockchain ledger in an off-ledger condition. The digital token is embedded with a media content. A mobile computing device having a representation of a digital wallet is configured to detect the digital token, decode the digital token, and transfer the digital token back to the blockchain ledger. When validated by the blockchain ledger, the digital token is transacted to a digital wallet on the blockchain ledger.
Reconstruction of audio scenes from a downmix
Audio objects are associated with positional metadata. A received downmix signal comprises downmix channels that are linear combinations of one or more audio objects and are associated with respective positional locators. In a first aspect, the downmix signal, the positional metadata and frequency-dependent object gains are received. An audio object is reconstructed by applying the object gain to an upmix of the downmix signal in accordance with coefficients based on the positional metadata and the positional locators. In a second aspect, audio objects have been encoded together with at least one bed channel positioned at a positional locator of a corresponding downmix channel. The decoding system receives the downmix signal and the positional metadata of the audio objects. A bed channel is reconstructed by suppressing the content representing audio objects from the corresponding downmix channel on the basis of the positional locator of the corresponding downmix channel.
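The first aspect of this abstract (upmix the downmix with position-based coefficients, then apply a frequency-dependent object gain) can be sketched as follows. The abstract does not say how the coefficients are derived from the positional metadata and the positional locators, so the inverse-distance weighting used here is purely a hypothetical stand-in, as are the function names and the representation of each downmix channel as a list of per-band values.

```python
import math


def upmix_coefficients(object_position, channel_locators, eps=1e-9):
    """Hypothetical rule: weight each downmix channel by inverse distance
    between the object position and the channel's positional locator,
    normalized so the weights sum to 1."""
    weights = [
        1.0 / (eps + math.dist(object_position, locator))
        for locator in channel_locators
    ]
    total = sum(weights)
    return [w / total for w in weights]


def reconstruct_object(downmix, channel_locators, object_position, object_gains):
    """Upmix the per-band downmix channels, then apply the per-band gain.

    downmix[c][b] is the value of downmix channel c in frequency band b
    (an assumed representation, not taken from the source).
    """
    coeffs = upmix_coefficients(object_position, channel_locators)
    upmix = [
        sum(c * channel[band] for c, channel in zip(coeffs, downmix))
        for band in range(len(object_gains))
    ]
    return [gain * value for gain, value in zip(object_gains, upmix)]


# Two downmix channels, two frequency bands, object midway between locators.
locators = [(0.0, 0.0), (1.0, 0.0)]
downmix = [[2.0, 4.0], [4.0, 8.0]]
reconstructed = reconstruct_object(downmix, locators, (0.5, 0.0), [1.0, 0.5])
```

With the object equidistant from both locators, each channel contributes equally to the upmix, and the second band is then halved by its object gain.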
Apparatus and method for encoding/decoding audio signal using information of previous frame
Disclosed are an apparatus and a method for encoding/decoding an audio signal using information of a previous frame. An audio signal encoding method includes: generating a current latent vector by reducing the dimension of a current frame of an audio signal; generating a concatenation vector by concatenating a previous latent vector, generated by reducing the dimension of a previous frame of the audio signal, with the current latent vector; and encoding and quantizing the concatenation vector.
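The three encoding steps listed in this abstract can be sketched with toy stand-ins. The abstract does not specify how the dimension is reduced or how the concatenation vector is encoded, so this sketch assumes block averaging in place of the dimension reduction, a uniform scalar quantizer in place of the encode-and-quantize stage, and an all-zero previous latent vector for the first frame. All function names and parameters are hypothetical.

```python
def reduce_dimension(frame, factor=4):
    """Stand-in for dimension reduction: average each group of `factor`
    samples. Assumes len(frame) is a multiple of `factor`."""
    return [
        sum(frame[i:i + factor]) / factor
        for i in range(0, len(frame), factor)
    ]


def quantize(vector, step=0.25):
    """Uniform scalar quantizer: map each value to an integer index."""
    return [round(value / step) for value in vector]


def encode_stream(frames, factor=4, step=0.25):
    """For each frame, concatenate the previous latent vector with the
    current one and quantize the result."""
    latent_dim = len(frames[0]) // factor
    previous = [0.0] * latent_dim  # assumption: zeros before the first frame
    codes = []
    for frame in frames:
        current = reduce_dimension(frame, factor)
        concatenation = previous + current  # previous latent, then current
        codes.append(quantize(concatenation, step))
        previous = current
    return codes


frames = [[1.0] * 8, [2.0] * 8]
codes = encode_stream(frames)
```

Each code vector here is twice the latent dimension, since it carries the previous frame's latent vector alongside the current one.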