Patent classifications
G10L15/10
INFORMATION PROCESSING APPARATUS, INFORMATION PROCESSING METHOD, AND STORAGE MEDIUM
An information processing apparatus performs control for: detecting a person from an image captured by an image capturing unit; detecting a first direction based on a gesture performed by the person; specifying, as an indicated region, a background information region that includes background information in an image captured by the image capturing unit, in a case where the background information region and the first direction intersect; and adjusting an angle of view of the image capturing unit such that the person and the indicated region are included in the angle of view. In a case where a plurality of background information regions in the image intersect the first direction, the indicated region is specified as the background information region that fulfills a predetermined condition from among the plurality of background information regions.
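The region selection described above can be illustrated with a small geometric sketch. Everything here is an assumption made for illustration: regions are modeled as axis-aligned rectangles in image coordinates, the first direction as a 2D ray, and the "predetermined condition" as "the region the ray enters first". The abstract does not say which condition is used, and the names `Region`, `ray_hit_distance`, `pick_indicated_region`, and `framing_box` are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass(frozen=True)
class Region:
    """Hypothetical axis-aligned rectangle in image coordinates."""
    name: str
    x_min: float
    y_min: float
    x_max: float
    y_max: float


def ray_hit_distance(origin: Tuple[float, float],
                     direction: Tuple[float, float],
                     region: Region) -> Optional[float]:
    """Slab test: distance along the ray at which it enters the region, or None."""
    t_near, t_far = 0.0, float("inf")
    axes = (
        (origin[0], direction[0], region.x_min, region.x_max),
        (origin[1], direction[1], region.y_min, region.y_max),
    )
    for o, d, lo, hi in axes:
        if d == 0.0:
            # Ray is parallel to this axis: it must already lie inside the slab.
            if o < lo or o > hi:
                return None
            continue
        t1, t2 = (lo - o) / d, (hi - o) / d
        if t1 > t2:
            t1, t2 = t2, t1
        t_near = max(t_near, t1)
        t_far = min(t_far, t2)
        if t_near > t_far:
            return None
    return t_near


def pick_indicated_region(origin: Tuple[float, float],
                          direction: Tuple[float, float],
                          regions: Sequence[Region]) -> Optional[Region]:
    """Assumed condition: among intersected regions, pick the one hit first."""
    best: Optional[Region] = None
    best_t = float("inf")
    for region in regions:
        t = ray_hit_distance(origin, direction, region)
        if t is not None and t < best_t:
            best, best_t = region, t
    return best


def framing_box(person: Region, target: Region) -> Tuple[float, float, float, float]:
    """Smallest box containing both the person and the indicated region."""
    return (min(person.x_min, target.x_min), min(person.y_min, target.y_min),
            max(person.x_max, target.x_max), max(person.y_max, target.y_max))


regions = [
    Region("poster", 2.0, -1.0, 3.0, 1.0),
    Region("whiteboard", 5.0, -1.0, 6.0, 1.0),
    Region("clock", 2.0, 2.0, 3.0, 3.0),
]
person = Region("person", -1.0, -1.0, 0.0, 1.0)
indicated = pick_indicated_region((0.0, 0.0), (1.0, 0.0), regions)
```

In this sketch the ray crosses both the poster and the whiteboard, and the nearest-hit rule resolves the ambiguity. A different condition (largest region, most recently referenced, and so on) would only change the comparison inside `pick_indicated_region`.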
METHOD AND APPARATUS FOR INTELLIGENT VOICE RECOGNITION
Methods and systems are described for recognizing, based on a voice input, a user and/or a voice command. An algorithm is described herein that processes data associated with a voice input. The data may indicate characteristics of the voice, such as a gender, age, or accent associated with the voice, along with other metadata. For example, the system may process the data and determine the gender of a voice. The determined characteristics may be used as an input to a voice recognition engine to improve the accuracy of identifying the user who spoke the voice input and of identifying a voice command associated with the voice input. For example, the determined gender may be used as a parameter to improve the accuracy of an identified user (e.g., the speaker) or command. The algorithm may adjust, based on gender, parameters such as the confidence thresholds used to match voices and voice commands.
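As a minimal sketch of the threshold adjustment mentioned above, assume the recognition engine has already produced a match score per enrolled user and a detected voice characteristic. The threshold values, the labels, and the function name `identify_speaker` are hypothetical and chosen only to show the mechanism; the abstract does not give concrete numbers.

```python
from typing import Dict, Optional

# Hypothetical values: the abstract only says thresholds may be adjusted.
DEFAULT_THRESHOLD = 0.80
THRESHOLD_BY_CHARACTERISTIC: Dict[str, float] = {
    "female": 0.75,
    "male": 0.85,
}


def identify_speaker(scores: Dict[str, float],
                     characteristic: Optional[str]) -> Optional[str]:
    """Return the best-matching user if the score clears the adjusted threshold."""
    if not scores:
        return None
    # Fall back to the default when the characteristic is unknown or missing.
    threshold = THRESHOLD_BY_CHARACTERISTIC.get(characteristic, DEFAULT_THRESHOLD)
    best_user = max(scores, key=lambda user: scores[user])
    return best_user if scores[best_user] >= threshold else None


scores = {"alice": 0.78, "bob": 0.60}
```

With these illustrative numbers, the same score of 0.78 is accepted under the lower threshold and rejected under the default or higher one, which is the effect the abstract attributes to the adjusted parameter.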
Anomalous sound detection training apparatus, acoustic feature extraction apparatus, anomalous sound sampling apparatus, and methods and programs for the same
An anomalous sound detection training apparatus includes: a first acoustic feature extraction unit that extracts an acoustic feature of normal sound based on training data for normal sound by using an acoustic feature extractor; a normal sound model updating unit that updates a normal sound model by using the acoustic feature extracted; a second acoustic feature extraction unit that extracts an acoustic feature of anomalous sound based on simulated anomalous sound and extracts the acoustic feature of normal sound based on the training data for normal sound by using the acoustic feature extractor; and an acoustic feature extractor updating unit that updates the acoustic feature extractor by using the acoustic feature of anomalous sound and the acoustic feature of normal sound that have been extracted, in which processing by the units is repeatedly performed.
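The alternating structure of this training procedure can be sketched with a toy stand-in. This is not the patented method: the "acoustic feature extractor" is assumed to be a unit-length 2D projection vector, the "normal sound model" is assumed to be just the mean of the normal features, and the extractor update is assumed to be one gradient-ascent step that pushes simulated anomalous features away from that mean while pulling normal features toward it. All names and data are hypothetical.

```python
from typing import List, Sequence, Tuple

Vec = Tuple[float, float]


def extract(w: Vec, x: Vec) -> float:
    """Toy feature extractor: project the input onto w."""
    return w[0] * x[0] + w[1] * x[1]


def mean_feature(w: Vec, data: Sequence[Vec]) -> float:
    return sum(extract(w, x) for x in data) / len(data)


def train(normal: Sequence[Vec], simulated_anomalous: Sequence[Vec],
          w: Vec = (0.8, 0.6), steps: int = 10, lr: float = 0.1) -> Tuple[Vec, float]:
    for _ in range(steps):
        # Steps 1 and 2: extract normal features, then update the normal model.
        mu = mean_feature(w, normal)
        # Steps 3 and 4: extract features of anomalous and normal data, then
        # update the extractor by ascending
        #   mean_anomalous((f - mu)^2) - mean_normal((f - mu)^2).
        grad: List[float] = [0.0, 0.0]
        for data, sign in ((simulated_anomalous, 1.0), (normal, -1.0)):
            for x in data:
                err = extract(w, x) - mu
                for i in range(2):
                    grad[i] += sign * 2.0 * err * x[i] / len(data)
        w0, w1 = w[0] + lr * grad[0], w[1] + lr * grad[1]
        norm = (w0 * w0 + w1 * w1) ** 0.5
        w = (w0 / norm, w1 / norm)  # keep the extractor at unit length
    # Refresh the normal model for the final extractor.
    return w, mean_feature(w, normal)


def anomaly_score(w: Vec, mu: float, x: Vec) -> float:
    """Squared distance of the feature from the normal-model mean."""
    return (extract(w, x) - mu) ** 2


# Normal data varies only along the first axis; anomalies are shifted along the second.
normal_data = [(-1.0, 0.0), (1.0, 0.0), (-2.0, 0.0), (2.0, 0.0)]
anomalous_data = [(-1.0, 3.0), (1.0, 3.0), (-2.0, 3.0), (2.0, 3.0)]
w_final, mu_final = train(normal_data, anomalous_data)
```

On this toy data the extractor rotates toward the second axis, the only direction in which normal and simulated anomalous inputs differ, so normal inputs score close to zero and shifted inputs score high.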
Method and terminal for reconstructing speech signal, and computer storage medium
The present disclosure describes a method performed at a terminal for reconstructing a speech signal, and a computer storage medium, and relates to the field of speech recognition. The method includes: collecting, by the terminal, a plurality of sound signals through a plurality of sensors of a microphone array; determining, by the terminal, a first speech signal in the plurality of sound signals; performing, by the terminal, signal separation on the first speech signal to obtain a second speech signal; and performing, by the terminal, reconstruction on the second speech signal through a distortion recovery model to obtain a reconstructed speech signal, the distortion recovery model being obtained by training based on a clean speech signal and a distorted speech signal. The embodiments of the present disclosure improve the accuracy of speech recognition results.
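The order of the stages above can be shown as a small pipeline skeleton. The stage implementations are placeholders, not the disclosed techniques: picking the highest-energy channel stands in for determining the first speech signal, removing the DC offset stands in for signal separation, and a fixed gain stands in for the trained distortion recovery model. All function names are hypothetical.

```python
from typing import Callable, List, Sequence

Signal = List[float]


def energy(signal: Signal) -> float:
    return sum(sample * sample for sample in signal)


def reconstruct_speech(channels: Sequence[Signal],
                       separate: Callable[[Signal], Signal],
                       recover: Callable[[Signal], Signal]) -> Signal:
    """Run the three stages in the order the abstract lists them."""
    # Placeholder for "determining a first speech signal": highest-energy channel.
    first_speech = max(channels, key=energy)
    second_speech = separate(first_speech)
    return recover(second_speech)


def remove_dc_offset(signal: Signal) -> Signal:
    """Placeholder separation step: subtract the mean."""
    mean = sum(signal) / len(signal)
    return [sample - mean for sample in signal]


def fixed_gain_recovery(signal: Signal) -> Signal:
    """Placeholder for a trained distortion recovery model."""
    return [2.0 * sample for sample in signal]


microphone_channels = [
    [0.0, 0.1, 0.0, -0.1],
    [1.5, 0.5, 1.5, 0.5],
]
reconstructed = reconstruct_speech(microphone_channels, remove_dc_offset, fixed_gain_recovery)
```

Passing the separation and recovery stages in as callables keeps the skeleton independent of how each stage is actually realized, for example by a beamformer and a trained neural model.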
Method and apparatus for outputting analysis abnormality information in spoken language understanding
Embodiments of the present disclosure provide a method and apparatus for outputting analysis abnormality information in spoken language understanding. The method includes: acquiring first spoken language information and a first analysis result for the first spoken language information; determining whether the first analysis result includes intent information; in response to determining that the first analysis result does not include the intent information, determining whether the first spoken language information includes a predefined word slot value; and in response to determining that the first spoken language information includes the predefined word slot value, outputting information indicating that an abnormality exists in the first analysis result.
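The two-step check can be sketched in a few lines. The representation is assumed for illustration: the analysis result is a dictionary with an optional `"intent"` key, the utterance is plain text, and slot values are matched by substring. The function name `analysis_is_abnormal` and the sample slot values are hypothetical.

```python
from typing import Dict, Set


def analysis_is_abnormal(utterance: str,
                         analysis: Dict[str, str],
                         predefined_slot_values: Set[str]) -> bool:
    """Flag an analysis that found no intent although a known slot value was spoken."""
    # Step 1: an analysis that already carries intent information is not flagged.
    if analysis.get("intent"):
        return False
    # Step 2: no intent was found, so check for any predefined word slot value.
    text = utterance.lower()
    return any(value.lower() in text for value in predefined_slot_values)


slot_values = {"central station", "airport"}
```

For example, `analysis_is_abnormal("take me to Central Station", {}, slot_values)` returns `True`: the utterance mentions a known slot value, yet the analysis produced no intent, which suggests the understanding step missed something.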
Artificial intelligence device capable of controlling other devices based on device information
An artificial intelligence device includes a microphone configured to receive a voice command of a user, a communication unit configured to communicate wirelessly with a first slave artificial intelligence device and a second slave artificial intelligence device, and a processor configured to acquire an intention of the voice command and to determine which device will perform an operation corresponding to the intention, based on the acquired intention and on information on each of the artificial intelligence device, the first slave artificial intelligence device, and the second slave artificial intelligence device.
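A minimal sketch of the device selection follows. The abstract does not say what the device information contains, so this example assumes each device reports the intents it supports and whether it is powered on, and that a powered-on device is preferred. `DeviceInfo`, `choose_device`, and the sample devices are hypothetical.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional, Sequence


@dataclass(frozen=True)
class DeviceInfo:
    """Hypothetical device information record."""
    name: str
    supported_intents: FrozenSet[str]
    powered_on: bool


def choose_device(intent: str, devices: Sequence[DeviceInfo]) -> Optional[DeviceInfo]:
    """Pick a device that supports the intent, preferring one that is powered on."""
    candidates = [d for d in devices if intent in d.supported_intents]
    if not candidates:
        return None
    for device in candidates:
        if device.powered_on:
            return device
    # No capable device is on: fall back to the first capable one.
    return candidates[0]


devices = [
    DeviceInfo("master speaker", frozenset({"play_music"}), powered_on=True),
    DeviceInfo("slave tv", frozenset({"play_video", "play_music"}), powered_on=False),
    DeviceInfo("slave air purifier", frozenset({"clean_air"}), powered_on=True),
]
```

The master device appears in the candidate list alongside the slaves, reflecting that the decision uses information on all three devices rather than only on the slaves.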