G10L17/10

SIGNAL EXTRACTION SYSTEM, SIGNAL EXTRACTION LEARNING METHOD, AND SIGNAL EXTRACTION LEARNING PROGRAM

A neural network input unit 81 inputs a neural network in which a first network and a second network are combined. The first network has a layer for inputting an anchor signal belonging to a predetermined class together with a mixed signal including a target signal belonging to that class, and a layer for outputting, as an estimation result, a reconstruction mask indicating the time-frequency domain in which the target signal is present in the mixed signal. The second network has a layer for inputting the target signal extracted by applying the reconstruction mask to the mixed signal, and a layer for outputting a result obtained by classifying the input target signal into a predetermined class. A reconstruction mask estimation unit 82 applies the anchor signal and the mixed signal to the first network to estimate the reconstruction mask of the class to which the anchor signal belongs. A signal classification unit 83 applies the estimated reconstruction mask to the mixed signal to extract the target signal, and applies the extracted target signal to the second network to classify it into the class.
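
The combination of an anchor-conditioned mask-estimation network with a downstream classifier over the extracted target can be sketched roughly as below. This is a minimal illustration in PyTorch, not the patented implementation; the layer sizes, spectrogram dimensions, and class count are assumptions.

```python
# Minimal sketch (not the patented implementation): a mask-estimation network
# conditioned on an anchor signal, combined with a classifier over the masked
# (extracted) target. Shapes and layer sizes are illustrative assumptions.
import torch
import torch.nn as nn

N_BINS = 257    # frequency bins per frame (assumed STFT size)
N_CLASSES = 10  # number of signal classes (assumed)

class MaskEstimator(nn.Module):
    """First network: anchor + mixture spectrogram -> reconstruction mask."""
    def __init__(self):
        super().__init__()
        self.anchor_enc = nn.GRU(N_BINS, 128, batch_first=True)
        self.mix_enc = nn.GRU(N_BINS, 128, batch_first=True)
        self.mask_out = nn.Linear(256, N_BINS)

    def forward(self, anchor, mixture):
        # anchor, mixture: (batch, frames, N_BINS) magnitude spectrograms
        _, h_anchor = self.anchor_enc(anchor)           # summary of the anchor
        mix_feats, _ = self.mix_enc(mixture)            # per-frame mixture features
        anchor_ctx = h_anchor[-1].unsqueeze(1).expand(-1, mix_feats.size(1), -1)
        mask = torch.sigmoid(self.mask_out(torch.cat([mix_feats, anchor_ctx], dim=-1)))
        return mask                                     # values in [0, 1] per T-F bin

class TargetClassifier(nn.Module):
    """Second network: extracted target spectrogram -> class logits."""
    def __init__(self):
        super().__init__()
        self.enc = nn.GRU(N_BINS, 128, batch_first=True)
        self.cls = nn.Linear(128, N_CLASSES)

    def forward(self, target):
        _, h = self.enc(target)
        return self.cls(h[-1])

mask_net, cls_net = MaskEstimator(), TargetClassifier()
anchor = torch.randn(2, 50, N_BINS).abs()
mixture = torch.randn(2, 100, N_BINS).abs()

mask = mask_net(anchor, mixture)        # estimate the mask for the anchor's class
extracted = mask * mixture              # apply the mask to extract the target
logits = cls_net(extracted)             # classify the extracted target
```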

ADAPTIVE MANAGEMENT OF CASTING REQUESTS AND/OR USER INPUTS AT A RECHARGEABLE DEVICE

Implementations set forth herein relate to management of casting requests and user inputs at a rechargeable device, which provides access to an automated assistant and is capable of rendering data that is cast from a separate device. Casting requests can be handled by the rechargeable device despite a device SoC of the rechargeable device operating in a sleep mode. Furthermore, spoken utterances provided by a user for invoking the automated assistant can also be adaptively managed by the rechargeable device in order to mitigate idle power consumption by the device SoC. Such spoken utterances can be initially processed by a digital signal processor (DSP), and, based on one or more features (e.g., voice characteristic, conformity to a particular invocation phrase, etc.) of the spoken utterance, the device SoC can be initialized for an amount of time that is selected based on the features of the spoken utterance.
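
As a rough illustration of the kind of policy the abstract describes, the sketch below maps DSP-derived utterance features to an SoC wake duration. The feature names, thresholds, and durations are invented for the example, not values from the patent.

```python
# Illustrative sketch only: how a wake-duration policy might map utterance
# features detected by a DSP to a time budget for the main SoC. Thresholds,
# feature names, and durations are assumptions, not values from the patent.
from dataclasses import dataclass

@dataclass
class UtteranceFeatures:
    invocation_score: float   # confidence the utterance matches the invocation phrase
    voice_match: bool         # speaker resembles an enrolled voice profile
    followed_by_speech: bool  # speech continues after the invocation phrase

def soc_wake_seconds(f: UtteranceFeatures) -> float:
    """Return how long to keep the SoC initialized; 0 means stay asleep."""
    if f.invocation_score < 0.4:
        return 0.0                       # likely noise: do not wake the SoC
    if f.invocation_score < 0.7 and not f.voice_match:
        return 2.0                       # marginal match: brief wake to verify
    if f.followed_by_speech:
        return 10.0                      # full query expected: longer budget
    return 5.0                           # invocation only: wait briefly for a query

if __name__ == "__main__":
    print(soc_wake_seconds(UtteranceFeatures(0.85, True, True)))    # 10.0
    print(soc_wake_seconds(UtteranceFeatures(0.30, False, False)))  # 0.0
```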

Deleting user data using keys

Described are techniques for tracking associations between known keys and internal keys related to user data received at a natural language processing system and shared with target systems. The system can receive a request to delete data associated with a user or device, and determine one or more known keys related to the request. The system can retrieve previously stored associations between known keys and internal keys, and use the associations to generate a delete command containing relevant internal keys to be sent to the target systems, which in turn can delete data associated with the internal keys.
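
A minimal sketch of the described bookkeeping, assuming a simple in-memory registry: associations between known keys and internal keys are recorded as data is shared, and a deletion request is later expanded into per-system delete commands. The class and field names are illustrative, not from the patent.

```python
# Rough sketch of the described bookkeeping: remember which internal keys were
# shared with which target systems under which known key, then turn a deletion
# request into per-system delete commands. All names are illustrative.
from collections import defaultdict
from dataclasses import dataclass, field

@dataclass
class DeleteCommand:
    target_system: str
    internal_keys: list

@dataclass
class KeyRegistry:
    # known key (e.g., user or device identifier) -> target system -> internal keys
    _assoc: dict = field(default_factory=lambda: defaultdict(lambda: defaultdict(set)))

    def record_share(self, known_key: str, target_system: str, internal_key: str):
        """Track that data keyed by internal_key was shared with target_system."""
        self._assoc[known_key][target_system].add(internal_key)

    def build_delete_commands(self, known_keys):
        """Resolve known keys to internal keys and emit per-system delete commands."""
        per_system = defaultdict(set)
        for k in known_keys:
            for system, internal in self._assoc.get(k, {}).items():
                per_system[system].update(internal)
        return [DeleteCommand(s, sorted(keys)) for s, keys in per_system.items()]

registry = KeyRegistry()
registry.record_share("user-123", "skill-A", "int-9f2")
registry.record_share("device-7", "skill-B", "int-41c")
print(registry.build_delete_commands(["user-123", "device-7"]))
```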

AUTHENTICATING A USER SUBVOCALIZING A DISPLAYED TEXT

A computing device (200) for authenticating a user (110) is provided. The computing device is operative to display a first text (131) to the user, acquire a representation of the user subvocalizing a part of the first text, derive a user phrasing signature from the acquired representation, and authenticate the user in response to determining that the user phrasing signature and a reference phrasing signature fulfil a similarity condition. Optionally, the computing device may be further operative to determine if the user is authorized to read the first text. Further optionally, the computing device may be operative to reveal obfuscated parts of the first text in response to determining that the user is authorized to read the first text, or to discontinue displaying the first text, or to obfuscate at least part of the first text, in response to determining that the user is not authorized to read the first text.
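
The final comparison step can be illustrated as follows, assuming the phrasing signature is a fixed-length feature vector derived elsewhere (e.g., from sensing of the subvocalization) and that the similarity condition is a cosine-similarity threshold; both are assumptions, not the patent's definitions.

```python
# Hand-wavy sketch of the comparison step only: signatures are assumed to be
# fixed-length vectors, and the similarity condition is modeled as a
# cosine-similarity threshold (an assumption made for illustration).
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def authenticate(user_signature: np.ndarray,
                 reference_signature: np.ndarray,
                 threshold: float = 0.9) -> bool:
    """Authenticate the user if the two phrasing signatures are similar enough."""
    return cosine_similarity(user_signature, reference_signature) >= threshold

rng = np.random.default_rng(0)
reference = rng.normal(size=64)
same_user = reference + rng.normal(scale=0.05, size=64)   # small deviation
impostor = rng.normal(size=64)

print(authenticate(same_user, reference))   # True (very similar)
print(authenticate(impostor, reference))    # False (dissimilar)
```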

METHODS AND SYSTEMS FOR SPEECH DETECTION
20220189483 · 2022-06-16

Methods and systems for processing user input to a computing system are disclosed. The computing system has access to an audio input and a visual input such as a camera. Face detection is performed on an image from the visual input, and if a face is detected this triggers the recording of audio and making the audio available to a speech processing function. Further verification steps can be combined with the face detection step for a multi-factor verification of user intent to interact with the system.
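
The gating behaviour can be sketched as below, with the face detector and any additional verification factors treated as opaque callables; the interfaces and the dictionary-based frames are assumed for illustration only.

```python
# Simplified sketch of the gating logic: audio chunks are forwarded to speech
# processing only while a face is detected, optionally requiring extra
# verification factors. Detector and verifier interfaces are assumptions.
from typing import Callable, Iterable

def gate_audio(frames: Iterable,
               audio_chunks: Iterable,
               detect_face: Callable[[object], bool],
               extra_checks: Iterable[Callable[[object], bool]] = ()):
    """Yield audio chunks only when a face (and all extra factors) are present."""
    for frame, chunk in zip(frames, audio_chunks):
        if detect_face(frame) and all(check(frame) for check in extra_checks):
            yield chunk            # hand this chunk to the speech recognizer

# Toy usage with stand-in detectors: frame payloads are plain dicts here.
frames = [{"face": True, "gaze": True}, {"face": False, "gaze": False}]
audio = ["chunk-0", "chunk-1"]
passed = list(gate_audio(frames, audio,
                         detect_face=lambda f: f["face"],
                         extra_checks=[lambda f: f["gaze"]]))
print(passed)   # ['chunk-0']
```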

Authenticating received speech
11341974 · 2022-05-24

A speech signal is received by a device comprising first and second transducers, and the first transducer comprises a microphone. A method comprises performing a first voice biometric process on speech contained in a first part of a signal received by the microphone, in order to determine whether the speech is the speech of an enrolled user. A first correlation is determined between the first part of the signal received by the microphone and the corresponding part of the signal received by the second transducer. A second correlation is determined between a second part of the signal received by the microphone and the corresponding part of the signal received by the second transducer. It is then determined whether the first correlation and the second correlation satisfy a predetermined condition. If it is determined that the speech contained in the first part of the received signal is the speech of an enrolled user and that the first correlation and the second correlation satisfy the predetermined condition, the received speech signal is authenticated.
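
One possible reading of the correlation test is sketched below, using zero-lag normalized cross-correlation as the correlation measure and a shared threshold as the predetermined condition; both choices are assumptions made for illustration.

```python
# Sketch of the correlation test described in the abstract, using zero-lag
# normalized cross-correlation and a "both above a threshold" rule as the
# predetermined condition; both choices are assumptions.
import numpy as np

def normalized_correlation(x: np.ndarray, y: np.ndarray) -> float:
    x = x - x.mean()
    y = y - y.mean()
    return float(np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y) + 1e-12))

def authenticate(speaker_is_enrolled: bool,
                 mic_part1: np.ndarray, alt_part1: np.ndarray,
                 mic_part2: np.ndarray, alt_part2: np.ndarray,
                 threshold: float = 0.5) -> bool:
    """Accept only if the biometric check passes and both correlations satisfy the condition."""
    c1 = normalized_correlation(mic_part1, alt_part1)
    c2 = normalized_correlation(mic_part2, alt_part2)
    return speaker_is_enrolled and c1 >= threshold and c2 >= threshold

rng = np.random.default_rng(1)
speech = rng.normal(size=1000)
# A second transducer picking up the same live speech should correlate with the
# microphone signal; a replayed recording typically would not reach it the same way.
print(authenticate(True,
                   speech[:500], speech[:500] + rng.normal(scale=0.1, size=500),
                   speech[500:], speech[500:] + rng.normal(scale=0.1, size=500)))
```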

Using voice biometrics for trade of financial instruments

Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for receiving first voice data from a user, receiving behavior data associated with the user, and determining that the user is authentic at least partially based on voice recognition of the first voice data and behavior analysis of the behavior data, and subsequently: prompting one or more spoken commands from the user, receiving second voice data representative of at least one spoken command of the user to trade at least one financial instrument, providing trade data for executing the trade of the at least one financial instrument, and determining that the trade data is valid at least partially based on the second voice data, and in response, initiating execution of the trade.
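
A very rough sketch of the overall flow, with stand-in scoring functions, illustrative thresholds, and a toy consistency check between the spoken command and the trade data:

```python
# Very rough sketch of the described flow: authenticate on voice plus behavior,
# then accept a spoken trade command and validate the trade data against that
# second utterance before executing. All scoring functions are stand-ins.
from dataclasses import dataclass

@dataclass
class Trade:
    instrument: str
    quantity: int
    side: str  # "buy" or "sell"

def authenticate(voice_score: float, behavior_score: float) -> bool:
    # Combine both factors; the 0.8 / 0.6 thresholds are illustrative only.
    return voice_score >= 0.8 and behavior_score >= 0.6

def validate_trade(trade: Trade, spoken_command: str) -> bool:
    # Trade data is considered valid if it is consistent with the spoken command.
    text = spoken_command.lower()
    return (trade.side in text
            and trade.instrument.lower() in text
            and str(trade.quantity) in text)

def execute_if_valid(voice_score, behavior_score, spoken_command, trade):
    if not authenticate(voice_score, behavior_score):
        return "authentication failed"
    if not validate_trade(trade, spoken_command):
        return "trade data inconsistent with spoken command"
    return f"executing {trade.side} {trade.quantity} {trade.instrument}"

print(execute_if_valid(0.92, 0.7, "buy 100 shares of ACME", Trade("ACME", 100, "buy")))
```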

SYSTEM AND METHODS FOR INTELLIGENT TRAINING OF VIRTUAL VOICE ASSISTANT

Embodiments of the present invention provide systems and methods for using machine learning to analyze and infer the contextual significance of conversational language in order to proactively engage with one or more users in a familiar manner via a virtual voice assistant. As such, the systems and methods reduce redundancy of process steps for the user in accessing relevant information or initiating certain resource activities via disparate channels of communication by creating a continuity of conversational tone and substance.