Patent classifications
G10L15/25
Display control device, communication device, display control method, and recording medium
The disclosure includes: a moving image acquisition unit configured to acquire moving image data obtained through moving image capturing of at least a mouth part of an utterer; a lip detection unit configured to detect a lip part from the moving image data and detect motion of the lip part; a moving image processing unit configured to generate a moving image enhanced to increase the motion of the lip part detected by the lip detection unit; and a display control unit configured to control a display panel to display the moving image generated by the moving image processing unit.
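The motion-enhancement step can be illustrated with a toy sketch. Everything here is an assumption: the abstract does not specify how lips are represented or how motion is "increased"; the sketch assumes the lip detection unit yields 2-D landmark coordinates per frame and simply amplifies each frame's deviation from the mean lip pose.

```python
import numpy as np

def enhance_lip_motion(frames, gain=2.0):
    """Amplify lip-landmark motion around the mean pose.

    frames: array of shape (n_frames, n_landmarks, 2), the lip landmark
    coordinates per frame (a stand-in for the lip detection unit's output).
    Returns landmarks whose deviation from the mean pose is scaled by
    `gain`, emulating the "enhanced motion" moving image in the abstract.
    """
    frames = np.asarray(frames, dtype=float)
    mean_pose = frames.mean(axis=0)              # neutral lip shape
    return mean_pose + gain * (frames - mean_pose)

# Two toy frames of a two-point "lip": nearly closed vs. slightly open.
frames = [[[0.0, 0.0], [0.0, 1.0]],
          [[0.0, -0.1], [0.0, 1.1]]]
enhanced = enhance_lip_motion(frames, gain=2.0)
```

With gain 2.0 the mouth-opening distance varies twice as much across frames as in the input, which is the effect a viewer lip-reading the displayed moving image would benefit from.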
Voice commands recognition method and system based on visual and audio cues
A method and system for voice command recognition. The system comprises a video camera and a microphone producing an audio/video recording of a user issuing vocal commands, and at least one processor connected to the video camera and the microphone. The at least one processor has an associated memory having stored therein processor-executable code causing the processor to perform the steps of: obtaining the audio/video recording from the video camera and the microphone; extracting video features from the audio/video recording and storing the result in a first matrix; extracting audio features from the audio/video recording and storing the result in a second matrix; applying a speech-to-text engine to the audio portion of the audio/video recording and storing the resulting syllables in a text file; and identifying, via a neural network, the vocal commands of the user based on the first matrix, the second matrix and the text file.
TRANSLATION SYSTEM, TRANSLATION APPARATUS, TRANSLATION METHOD, AND TRANSLATION PROGRAM
The present invention contributes to reducing the burden on a user while preventing speeches translated into a plurality of languages from interfering with each other. A translation system comprises a camera that obtains surroundings information; a directional speaker that is movable so as to output sound toward a specified position; a directional microphone that is movable so as to receive sound from a specified position; and a translation apparatus that determines a location of a user from the surroundings information obtained by the camera, moves the directional speaker and the directional microphone toward the location of the user, identifies the language of a speech received by the directional microphone, translates the speech into another language and outputs the translation from another directional speaker, and retranslates the translation back into the original language to output the retranslated speech from the user's directional speaker.
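The translate/retranslate loop can be sketched with a toy round trip. The word tables are invented for illustration; a real system would call a machine-translation engine, and the retranslation serves as a confirmation played back to the original speaker through their own directional speaker.

```python
# Toy stand-in for the translate/retranslate loop of the abstract.
# The vocabulary tables are invented, not from any real MT engine.
EN_TO_JA = {"hello": "konnichiwa", "goodbye": "sayonara"}
JA_TO_EN = {v: k for k, v in EN_TO_JA.items()}

def translate(words, table):
    """Word-by-word lookup; unknown words pass through unchanged."""
    return [table.get(w, w) for w in words]

def round_trip(utterance):
    """Translate for the listener's directional speaker, then retranslate
    back so the original speaker's directional speaker can play a
    confirmation of what was conveyed."""
    forward = translate(utterance, EN_TO_JA)   # to the other user's speaker
    back = translate(forward, JA_TO_EN)        # confirmation to the speaker
    return forward, back
```

Pairing each user with their own directional speaker/microphone is what keeps the two language streams from interfering acoustically.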
INFORMATION PROCESSING APPARATUS AND COMMAND PROCESSING METHOD
An acoustic feature detection unit (31) detects acoustic features of voice that is input discretely and separately from a command instructing movement of an operation target. A movement control unit (32) controls the movement of the operation target instructed by the command on the basis of the acoustic features detected by the acoustic feature detection unit (31).
Determining input for speech processing engine
A method of presenting a signal to a speech processing engine is disclosed. According to an example of the method, an audio signal is received via a microphone. A portion of the audio signal is identified, and a probability is determined that the portion comprises speech directed by a user of the speech processing engine as input to the speech processing engine. In accordance with a determination that the probability exceeds a threshold, the portion of the audio signal is presented as input to the speech processing engine. In accordance with a determination that the probability does not exceed the threshold, the portion of the audio signal is not presented as input to the speech processing engine.
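The gating logic reduces to a probability estimate plus a threshold test. The probability estimator below is a toy stand-in (fraction of samples above an energy floor); the disclosure would use a trained model over acoustic and possibly other cues, so only the threshold-and-forward structure mirrors the abstract.

```python
def directedness_probability(portion):
    """Toy stand-in for the estimated probability that this audio portion
    is speech directed by the user at the speech processing engine: here,
    just the fraction of samples above a small energy floor."""
    voiced = sum(1 for s in portion if abs(s) > 0.01)
    return voiced / len(portion)

def present_to_engine(portion, threshold=0.5):
    """Forward the portion to the speech processing engine only when the
    directedness probability exceeds the threshold; otherwise drop it."""
    p = directedness_probability(portion)
    return portion if p > threshold else None
```

Dropping low-probability portions keeps background chatter and self-talk out of the engine's input, which is the stated purpose of the threshold.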