Patent classifications
G10L17/22
Electronic device and control method thereof
Disclosed is an electronic device. The electronic device comprises: a microphone comprising circuitry; a speaker comprising circuitry; and a processor electrically connected to the microphone and the speaker. When a first user's voice is input through the microphone, the processor identifies the user who uttered the first user's voice and provides, through the speaker, a first response sound obtained by inputting the first user's voice to an artificial intelligence model trained through an artificial intelligence algorithm. When a second user's voice is input through the microphone, the processor identifies the user who uttered the second user's voice and, if that user is the same as the user who uttered the first user's voice, provides, through the speaker, a second response sound obtained by inputting the second user's voice and utterance history information to the artificial intelligence model. In particular, at least some of the methods of providing a response sound to a user's voice may use an artificial intelligence model trained in accordance with at least one of a machine learning, neural network, or deep learning algorithm.
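The same-speaker rule in this abstract can be illustrated with a minimal Python sketch. The names `DialogSession`, `respond`, and `toy_model` are hypothetical and do not come from the patent; the model is a stand-in function, and clearing the history when a different user speaks is an assumption, since the abstract only states what happens when the user is the same.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class DialogSession:
    """Tracks the last identified speaker and their utterances (hypothetical)."""
    last_speaker: Optional[str] = None
    history: List[str] = field(default_factory=list)


def respond(session: DialogSession, speaker_id: str, utterance: str,
            model: Callable[[str, List[str]], str]) -> str:
    """Pass utterance history to the model only when the speaker is unchanged."""
    if speaker_id == session.last_speaker:
        context = list(session.history)
    else:
        # Assumption: a new speaker starts with an empty history.
        session.history.clear()
        context = []
    reply = model(utterance, context)
    session.last_speaker = speaker_id
    session.history.append(utterance)
    return reply


def toy_model(utterance: str, context: List[str]) -> str:
    # Stand-in for the trained model: reports how much context it received.
    return f"reply to {utterance!r} with {len(context)} prior utterance(s)"
```

Calling `respond` twice with the same `speaker_id` gives the model one prior utterance on the second call, while a call from a different speaker gives it none.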
INFORMATION PROVIDING DEVICE, INFORMATION PROVIDING METHOD, AND INFORMATION PROVIDING PROGRAM
The present invention provides an information provision device and the like capable of reliably providing necessary guidance information while reducing the amount of audio information that is necessary for processing.
The information provision device is provided with: a guidance voice output control unit 1 and a speaker 10, which output a guidance voice; and a microphone 11 and a collected sound transmission unit 2, which collect sounds made after output of the guidance voice starts. The guidance voice output control unit 1 and the speaker 10 output a re-guidance voice corresponding to the guidance voice on the basis of a determination of whether the collected sounds include a specific sound.
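The re-guidance decision can be sketched in Python as follows. This is only an illustration under stated assumptions: sounds are represented as text labels rather than audio, the function names are hypothetical, and the sketch assumes that the presence of a specific sound triggers re-guidance, which the abstract does not spell out.

```python
from typing import Iterable, List, Set


def includes_specific_sound(collected: Iterable[str], specific: Set[str]) -> bool:
    """Return True if any collected sound label is one of the specific sounds."""
    return any(label in specific for label in collected)


def guidance_outputs(guidance: str, collected: Iterable[str],
                     specific: Set[str]) -> List[str]:
    """Return the guidance, followed by a re-guidance if one is warranted."""
    outputs = [guidance]
    # Assumption: a detected specific sound means the guidance should be repeated.
    if includes_specific_sound(collected, specific):
        outputs.append(f"(repeat) {guidance}")
    return outputs
```

Only the labels collected after the guidance starts are examined, which is in line with the stated aim of reducing the amount of audio information that has to be processed.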
Systems and methods for identifying and storing a portion of a media asset
Systems and methods are described herein for a media guidance application that can cause a specific portion of a media asset to be stored based on a user command. For example, if the user requests the closing scene from a given movie, the media guidance application may detect the command, determine that it comprises an instruction to store a portion of a media asset, identify a source of the portion of the media asset, and cause the portion of the media asset to be stored. The media guidance application may also cause the entirety of the media asset to be stored and initiate playback at the start of the requested portion. This may allow users to store and watch portions of particular interest without requiring that the users seek through the entire media asset on their own.
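The command-handling steps described above (detect the command, decide that it is a store instruction, identify a source, and schedule storage) can be sketched roughly in Python. Everything here is hypothetical: the command grammar, the `CATALOG` lookup table, the movie title, and the timing values are illustrative assumptions, not details from the patent.

```python
import re
from typing import Dict, NamedTuple, Optional, Tuple


class StoreRequest(NamedTuple):
    portion: str
    title: str


# Assumed command grammar: "<record|save|store> the <portion> <of|from> <title>".
PATTERN = re.compile(
    r"^(?:record|save|store) the (?P<portion>.+?) (?:of|from) (?P<title>.+)$",
    re.IGNORECASE,
)

# Hypothetical catalog: (title, portion) -> (source, start second, end second).
CATALOG: Dict[Tuple[str, str], Tuple[str, int, int]] = {
    ("Example Movie", "closing scene"): ("channel-7", 6900, 7200),
}


def parse_store_command(command: str) -> Optional[StoreRequest]:
    """Return a StoreRequest if the command asks to store a portion, else None."""
    match = PATTERN.match(command.strip())
    if match is None:
        return None
    return StoreRequest(match.group("portion").lower(), match.group("title"))


def plan_recording(command: str) -> Optional[dict]:
    """Identify the source and time range of the requested portion."""
    request = parse_store_command(command)
    if request is None:
        return None
    entry = CATALOG.get((request.title, request.portion))
    if entry is None:
        return None
    source, start_s, end_s = entry
    return {"source": source, "start_s": start_s, "end_s": end_s}
```

A command that does not match the assumed grammar, or a portion missing from the catalog, yields `None` rather than a recording plan.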
Audio verification
Techniques for enabling a system to verify operations or transactions as being associated with a user account are described. A system receives message data associated with an unverified operation or an unverified transaction. The system generates first audio data that includes a representation of a first digital signature based on at least a first verification code. The system sends a message including second message data and the ability to output the first audio data. After a first device plays the first audio data within earshot of a second device, the system receives, from the second device, second audio data that represents the first audio data. The system determines that the second audio data includes an audio representation of a second digital signature based on at least the first verification code. The system verifies the unverified operation and associates the operation with the user account to indicate that the operation is a verified operation.
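The signature check at the heart of this flow can be sketched in Python. This sketch makes several assumptions that are not in the abstract: it substitutes an HMAC-SHA256 tag for the digital signature, it uses a hex string as a stand-in for the audio encoding, and all function names and the shared `key` are hypothetical.

```python
import hashlib
import hmac


def sign_code(verification_code: str, key: bytes) -> bytes:
    """Derive a tag from the verification code (HMAC used in place of a signature)."""
    return hmac.new(key, verification_code.encode("utf-8"), hashlib.sha256).digest()


def encode_as_audio_payload(signature: bytes) -> bytes:
    # Stand-in for modulating the signature into audio: plain hex text.
    return signature.hex().encode("ascii")


def verify_captured_audio(captured: bytes, verification_code: str, key: bytes) -> bool:
    """Check that the captured payload carries a tag for the same verification code."""
    try:
        received = bytes.fromhex(captured.decode("ascii"))
    except (UnicodeDecodeError, ValueError):
        return False  # the payload could not be decoded at all
    expected = sign_code(verification_code, key)
    # Constant-time comparison avoids leaking how many bytes matched.
    return hmac.compare_digest(received, expected)
```

In this sketch, the operation would be marked as verified only when `verify_captured_audio` returns `True` for the payload received from the second device.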
Enrollment with an automated assistant
Techniques are described herein for dialog-based enrollment of individual users for single- and/or multi-modal recognition by an automated assistant, as well as determining how to respond to a particular user's request based on the particular user being enrolled and/or recognized. Rather than requiring operation of a graphical user interface for individual enrollment, dialog-based enrollment enables users to enroll themselves (or others) by way of a human-to-computer dialog with the automated assistant.
MULTI-REGISTER-BASED SPEECH DETECTION METHOD AND RELATED APPARATUS, AND STORAGE MEDIUM
This application discloses a multi-sound area-based speech detection method and related apparatus, and a storage medium, which are applied to the field of artificial intelligence. The method includes: obtaining sound area information corresponding to each sound area in N sound areas; using each sound area as a target detection sound area, and generating a control signal corresponding to the target detection sound area according to the sound area information corresponding to the target detection sound area; processing a speech input signal corresponding to the target detection sound area by using the control signal corresponding to the target detection sound area, to obtain a speech output signal corresponding to the target detection sound area; and generating a speech detection result of the target detection sound area according to the speech output signal corresponding to the target detection sound area. Speech signals in different directions are processed in parallel based on a plurality of sound areas, so that in a multi-sound source scenario, the speech signals in different directions may be retained or suppressed by a control signal, to separate and enhance speech of a target detection user in real time, thereby improving the accuracy of speech detection.
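The per-area steps of the method (control signal, signal processing, detection result) can be sketched in Python. This is a simplified illustration under assumptions of my own: the control signal is a binary gain derived from a hypothetical `user_present` flag, detection is a mean-energy threshold, and the areas are handled in a loop rather than truly in parallel.

```python
from typing import Dict, List, Sequence


def control_signal(area_info: Dict[str, bool]) -> float:
    # Assumption: retain (1.0) areas with a user present, suppress (0.0) the rest.
    return 1.0 if area_info.get("user_present") else 0.0


def process_area(samples: Sequence[float], gain: float) -> List[float]:
    """Apply the control signal to the speech input signal of one sound area."""
    return [gain * x for x in samples]


def detect_speech(samples: Sequence[float], threshold: float = 0.01) -> bool:
    """Crude detector: mean signal energy above a threshold counts as speech."""
    if not samples:
        return False
    energy = sum(x * x for x in samples) / len(samples)
    return energy > threshold


def detect_per_area(area_infos: Sequence[Dict[str, bool]],
                    inputs: Sequence[Sequence[float]],
                    threshold: float = 0.01) -> List[bool]:
    """Produce one speech detection result per sound area."""
    results = []
    for info, samples in zip(area_infos, inputs):
        output = process_area(samples, control_signal(info))
        results.append(detect_speech(output, threshold))
    return results
```

A suppressed area always yields `False` here, however loud its input, which mirrors the idea of retaining or suppressing speech by direction.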