G10L19/018

Generation and detection of watermark for real-time voice conversion
11538485 · 2022-12-27

A method watermarks speech data by using a generator to generate speech data including a watermark. The generator is trained to generate the speech data including the watermark. The training process generates first speech data from the generator. The first speech data is configured to represent speech and includes a candidate watermark. The training also produces an inconsistency message as a function of at least one difference between the first speech data and at least authentic speech data. The training further includes transforming the first speech data, including the candidate watermark, using a watermark robustness module to produce transformed speech data including a transformed candidate watermark. The training further produces a watermark-detectability message, using a watermark detection machine learning system, relating to one or more desirable watermark features of the transformed candidate watermark.
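The two training signals named in this abstract can be illustrated with a toy sketch. Nothing here is learned and nothing is taken from the patent itself: the generator, robustness module and detector are hypothetical stand-ins (an additive key pattern, an attenuate-plus-noise transform, and a correlation score), chosen only to show that the inconsistency message and the watermark-detectability message pull in opposite directions as the candidate watermark gets stronger.

```python
import math
import random

PATTERN = [1, -1, 1, 1, -1, -1, 1, -1]  # hypothetical watermark key (sums to zero)


def generator(authentic, strength):
    """Stand-in generator: speech data plus a candidate watermark."""
    return [x + strength * PATTERN[i % len(PATTERN)]
            for i, x in enumerate(authentic)]


def inconsistency(first, authentic):
    """Stand-in inconsistency message: mean squared difference."""
    return sum((a - b) ** 2 for a, b in zip(first, authentic)) / len(first)


def robustness_module(first, rng):
    """Stand-in transform: attenuate and add a little noise."""
    return [0.8 * x + rng.gauss(0, 0.01) for x in first]


def detectability(transformed):
    """Stand-in detector score: correlation with the key pattern."""
    total = sum(x * PATTERN[i % len(PATTERN)]
                for i, x in enumerate(transformed))
    return total / len(transformed)


def training_messages(authentic, strength, rng):
    """Compute both messages for one candidate watermark strength."""
    first = generator(authentic, strength)
    transformed = robustness_module(first, rng)
    return inconsistency(first, authentic), detectability(transformed)


# Synthetic "authentic speech": a slow sine, 800 samples (assumed test data).
authentic = [0.5 * math.sin(2 * math.pi * n / 100) for n in range(800)]
inc_none, det_none = training_messages(authentic, 0.0, random.Random(0))
inc_some, det_some = training_messages(authentic, 0.05, random.Random(0))
```

In this toy setting a stronger watermark raises the detectability score after the transform but also raises the inconsistency with the authentic data; a trained generator would have to balance the two.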

INFORMATION PROCESSING SYSTEM, INFORMATION PROCESSING DEVICE, INFORMATION PROCESSING METHOD, AND PROGRAM
20220406306 · 2022-12-22

Provided is an information processing system including: an information processing device (20) and a playback device (10), the information processing device including: a first detection unit (204) that detects, from collected sound, audio processing superimposed on the sound by the playback device; a specifying unit (206) that specifies an utterance subject of the sound on the basis of the audio processing that has been detected; and a determination unit (208) that determines whether or not to execute a command included in the sound on the basis of a result of the specification.
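The detect, specify, determine chain in this abstract can be sketched as below. The sketch rests on assumptions that are not in the abstract: the "audio processing" is taken to be a marker tone at a hypothetical frequency (MARKER_HZ), detection uses the Goertzel recurrence, and the policy is to refuse commands whose utterance subject is the playback device. All function names are illustrative.

```python
import math

FS = 8000          # sample rate in Hz (assumed)
MARKER_HZ = 3500   # hypothetical marker tone added by the playback device


def goertzel_power(samples, freq):
    """Power of one frequency component via the Goertzel recurrence."""
    coeff = 2 * math.cos(2 * math.pi * freq / FS)
    s1 = s2 = 0.0
    for x in samples:
        s1, s2 = x + coeff * s1 - s2, s1
    return s1 * s1 + s2 * s2 - coeff * s1 * s2


def detect_marker(samples, ratio=0.01):
    """First detection unit: is the marker a noticeable share of the energy?"""
    total = sum(x * x for x in samples) * len(samples) / 2
    return total > 0 and goertzel_power(samples, MARKER_HZ) / total > ratio


def specify_subject(marker_found):
    """Specifying unit: who produced the sound?"""
    return "playback_device" if marker_found else "person"


def should_execute(samples):
    """Determination unit: only act on sound attributed to a person (assumed policy)."""
    return specify_subject(detect_marker(samples)) == "person"


def tone(freq, amplitude, n=800):
    """Helper that synthesises a test tone."""
    return [amplitude * math.sin(2 * math.pi * freq * i / FS) for i in range(n)]


live = tone(400, 1.0)                                      # stands in for a person speaking
played = [a + b for a, b in zip(live, tone(MARKER_HZ, 0.2))]  # same sound with the marker added
```

With these inputs, `should_execute(live)` allows the command and `should_execute(played)` rejects it.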

CONFERENCE TERMINAL AND EMBEDDING METHOD OF AUDIO WATERMARKS
20220406317 · 2022-12-22

A conference terminal and an embedding method of audio watermarks are provided. In the method, a first speech signal and a first audio watermark signal are received respectively. The first speech signal relates to a speaker corresponding to another conference terminal, and the first audio watermark signal corresponds to the other conference terminal. The first speech signal is assigned to a host path to output a second speech signal. The first audio watermark signal is assigned to an offload path to output a second audio watermark signal. The host path provides more digital signal processing (DSP) effects than the offload path. The second speech signal and the second audio watermark signal are synthesized to output a synthesized audio signal. The synthesized audio signal is adapted for audio playback. A completed audio watermark signal is outputted accordingly.
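The two-path routing in this abstract can be sketched as follows. The specific effects are assumptions made for illustration: the host path applies a gain and a one-pole low-pass, while the offload path is a plain pass-through. The abstract only states that the host path provides more DSP effects than the offload path; the function names are hypothetical.

```python
def host_path(speech):
    """Host path: several DSP effects (here a 1.5x gain, then a one-pole low-pass)."""
    out, prev = [], 0.0
    for x in speech:
        prev = 0.5 * prev + 0.5 * (1.5 * x)
        out.append(prev)
    return out


def offload_path(watermark):
    """Offload path: fewer effects (here none), so the watermark passes unchanged."""
    return list(watermark)


def synthesize(speech, watermark):
    """Sum the two path outputs into the signal sent to playback."""
    return [s + w for s, w in zip(host_path(speech), offload_path(watermark))]


speech = [0.0, 1.0, 1.0, 0.0]
watermark = [0.01, -0.01, 0.01, -0.01]
mixed = synthesize(speech, watermark)
```

Because the watermark bypasses the speech effects in this sketch, subtracting the processed speech from `mixed` recovers the watermark samples unchanged.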

METHOD AND SYSTEM FOR ENCODING AND DECODING DATA IN AUDIO
20220406322 · 2022-12-22

Methods and systems for encoding and decoding data in an audio channel are provided. At least one notch attribute for each of a set of notches to be applied to a source audio channel corresponding to data to be encoded is determined. The data is encoded by applying notch-filtering to the source audio channel to create a modified audio channel having the set of notches having the at least one notch attribute. The notches in the modified audio channel are then analyzed to determine at least one characteristic of each of the notches, and data is then decoded from the at least one characteristic of each of the notches.
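The encode and decode round trip described in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the claimed method: the notch attribute is taken to be presence or absence at a fixed frequency slot, the source channel is a synthetic multi-tone signal, the notch is a standard second-order IIR (biquad) filter, and the decoder compares each slot against a pilot tone that is never notched. All names and constants (FS, N, PILOT_HZ, BIT_HZ, notch, encode, decode) are hypothetical.

```python
import cmath
import math

FS = 8000        # sample rate in Hz (assumed)
N = 4000         # samples per block, 0.5 s (assumed)
PILOT_HZ = 300   # reference tone that is never notched (assumed)
BIT_HZ = [600, 900, 1200, 1500, 1800, 2100, 2400, 2700]  # one slot per bit


def make_source():
    """Multi-tone stand-in for the source audio channel."""
    freqs = [PILOT_HZ] + BIT_HZ
    return [sum(math.sin(2 * math.pi * f * n / FS) for f in freqs)
            for n in range(N)]


def notch(samples, f0, q=30.0):
    """Second-order IIR notch centred on f0."""
    w0 = 2 * math.pi * f0 / FS
    alpha = math.sin(w0) / (2 * q)
    a0 = 1 + alpha
    b0, b1, b2 = 1 / a0, -2 * math.cos(w0) / a0, 1 / a0
    a1, a2 = -2 * math.cos(w0) / a0, (1 - alpha) / a0
    out = []
    x1 = x2 = y1 = y2 = 0.0
    for x in samples:
        y = b0 * x + b1 * x1 + b2 * x2 - a1 * y1 - a2 * y2
        out.append(y)
        x2, x1 = x1, x
        y2, y1 = y1, y
    return out


def tone_power(samples, f):
    """Squared magnitude of the signal's correlation with frequency f."""
    w = 2 * math.pi * f / FS
    acc = sum(x * cmath.exp(-1j * w * n) for n, x in enumerate(samples))
    return abs(acc) ** 2


def encode(samples, bits):
    """Apply a notch in every slot whose bit is 1."""
    for bit, f in zip(bits, BIT_HZ):
        if bit:
            samples = notch(samples, f)
    return samples


def decode(samples, threshold=0.5):
    """Read a 1 wherever a slot is much weaker than the pilot tone."""
    ref = tone_power(samples, PILOT_HZ)
    return [1 if tone_power(samples, f) < threshold * ref else 0
            for f in BIT_HZ]


message = [1, 0, 1, 1, 0, 0, 1, 0]
received = decode(encode(make_source(), message))
```

Real programme material does not have equal energy in every slot, so a practical decoder would need a more careful estimate of what the un-notched spectrum should look like; the pilot tone is only a convenience for this sketch.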

AUDIOMETRIC RECEIVER SYSTEM TO DETECT AND PROCESS AUDIO SIGNALS

In an approach for detecting and processing multiple audio signals simultaneously, an audiometric receiver system comprises a transmitter, wherein the transmitter comprises a digital signal processor, and wherein the digital signal processor comprises a quality check component, an amplifier or attenuator component, a mixer component, a modulator component, and an encrypter component; and a receiver, wherein the receiver comprises a decrypter component, a demodulator component, a splitter component, and a second amplifier or attenuator component.

Frequency pairing for device synchronization
11522619 · 2022-12-06

A device may include a processor, a receiver, and a transmitter. The receiver may be configured to receive a content signal. The transmitter may be configured to transmit the content signal. The transmitter may be configured to transmit an associated inaudible signal. The content signal, the associated inaudible signal, or both, may be transmitted to one or more electronic devices. Each of the one or more electronic devices may be configured with audio interfaces. The receiver may be configured to receive a respective message from each of the one or more electronic devices. Each respective message may be based on the associated inaudible signal. Each respective message may include a respective electronic device identifier. The transmitter may be configured to transmit one of the respective messages.

Encoding machine-learning models and determining ownership of machine-learning models
11521121 · 2022-12-06

Methods, systems, and non-transitory computer readable storage media are disclosed for generating a machine-learning model and encoding ownership information in the machine-learning model. For example, the disclosed system can generate parameters of a machine-learning model utilizing digital content items modified by a filter. The disclosed system can then process digital content items modified by the filter to generate first outputs based on the digital content items being modified by the filter. The disclosed system can also process digital content items unmodified by the filter to generate second outputs based on the digital content items not being modified by the filter. The disclosed system can determine that the second outputs are degraded relative to the first outputs. Accordingly, the disclosed system can determine ownership of the machine-learning model based on detecting that information about the filter is embedded in parameters of the machine-learning model.
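The ownership check described in this abstract, comparing outputs on filtered content with outputs on unfiltered content, can be sketched as below. The sketch assumes labelled test items, uses accuracy as the measure of output quality, and introduces a hypothetical `margin` for how much degradation counts as evidence; the toy filter and the two toy models are illustrative only.

```python
def accuracy(model, items, labels):
    """Fraction of items for which the model output matches the label."""
    hits = sum(1 for x, y in zip(items, labels) if model(x) == y)
    return hits / len(items)


def indicates_ownership(model, items, labels, filter_fn, margin=0.2):
    """True if outputs on unfiltered items are clearly worse than on filtered ones."""
    filtered = accuracy(model, [filter_fn(x) for x in items], labels)
    unfiltered = accuracy(model, items, labels)
    return filtered - unfiltered >= margin


def owner_filter(x):
    """Hypothetical owner's filter: shifts every item by a fixed offset."""
    return x + 10.0


def owner_model(x):
    """Toy model fitted to filtered items."""
    return int(x > 10.5)


def other_model(x):
    """Toy model fitted to unfiltered items."""
    return int(x > 0.5)


items = [0.1, 0.2, 0.8, 0.9]
labels = [0, 0, 1, 1]
owned = indicates_ownership(owner_model, items, labels, owner_filter)
not_owned = indicates_ownership(other_model, items, labels, owner_filter)
```

Here the model fitted to filtered items scores 1.0 on filtered input and 0.5 on unfiltered input, so the check flags it; the other model shows the opposite pattern and is not flagged.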

Wearable audio device with user own-voice recording

Various implementations include wearable audio devices configured to record a user's voice without recording other ambient acoustic signals, such as others talking nearby. In some particular aspects, a wearable audio device includes: a frame for contacting a head of a user; an electro-acoustic transducer within the frame and configured to output audio signals; at least one microphone; a voice activity detection (VAD) accelerometer; and a controller coupled with the electro-acoustic transducer, the at least one microphone and the VAD accelerometer, the controller configured in a first mode to: detect that the user is speaking; and record a voice of the user solely with signals from the VAD accelerometer in response to detecting that the user is speaking.
