G10L17/26

EMOTION RECOGNITION SYSTEM AND EMOTION RECOGNITION METHOD
20230141724 · 2023-05-11

An emotion recognition system includes an input unit configured to input first speech data and second speech data, and a processing unit configured to input the first speech data and the second speech data to a differential-emotion recognition model that infers a differential emotion between two pieces of speech data, and acquire, from the differential-emotion recognition model, differential-emotion information indicating a differential emotion between the first speech data and the second speech data.
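As a hedged sketch of the claimed flow, the input unit supplies two utterances and the differential-emotion model reports how the emotion shifts from the first to the second. The feature dimensions (valence, arousal) and the subtraction-based "model" below are illustrative assumptions, not the patented implementation; a real system would extract acoustic features from the raw speech.

```python
# Illustrative sketch: an input unit supplies two utterances, and a
# differential-emotion model infers the emotion change between them.
# Feature extraction is stubbed out; the per-utterance scores stand in
# for acoustic features (pitch, energy, spectral statistics).

from dataclasses import dataclass

@dataclass
class SpeechData:
    """Stand-in for a recorded utterance: per-dimension emotion scores."""
    valence: float   # negative (-1) .. positive (+1)
    arousal: float   # calm (0) .. excited (1)

def differential_emotion(first: SpeechData, second: SpeechData) -> dict:
    """Toy 'differential-emotion model': reports the change in each
    emotion dimension from the first utterance to the second."""
    d_valence = second.valence - first.valence
    d_arousal = second.arousal - first.arousal
    if d_valence > 0:
        label = "more positive"
    elif d_valence < 0:
        label = "more negative"
    else:
        label = "unchanged"
    return {"d_valence": d_valence, "d_arousal": d_arousal, "label": label}

# Processing unit: feed both utterances to the model, acquire the
# differential-emotion information.
info = differential_emotion(SpeechData(-0.2, 0.4), SpeechData(0.5, 0.6))
print(info["label"])  # → more positive
```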

Audio collection system and method for sound capture, broadcast, analysis, and presentation
11646039 · 2023-05-09

At least a system or a method is provided for remote delivery or collection of a device such as an audio collection device. For example, a device comprising an aperture collection and retrieval pin is provided. An apparatus is provided having an aperture receiver, an aperture drive gear and a drive motor. The drive motor is configured to drive the aperture drive gear to open or close the aperture receiver of the apparatus for retrieving or releasing the device comprising the aperture collection pin.

Detection of calls from voice assistants

Embodiments described herein provide for automatically classifying the types of devices that place calls to a call center. A call center system can detect whether an incoming call originated from a voice assistant device using trained classification models received from a call analysis service. Embodiments described herein provide for methods and systems in which a computer executes machine learning algorithms that programmatically train (or otherwise generate) global or tailored classification models based on various features of an audio signal and call data. A classification model is deployed to one or more call centers, where the model is used by call center computers executing classification processes for determining whether incoming telephone calls originated from a voice assistant device, such as Amazon Alexa® or Google Home®, or from another type of device (e.g., cellular/mobile phone, landline phone, VoIP).
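The deployed classification step might look like the sketch below, assuming a call center computer extracts features from the incoming audio signal and call data and applies the received model. The feature names and the rule-based "model" are illustrative stand-ins for whatever classifier the call analysis service actually trains.

```python
# Hypothetical sketch of the call-center classification process: extract
# features from an incoming call's audio and metadata, then apply a
# trained model to decide whether the call came from a voice assistant.

def extract_features(audio_samples, call_data):
    """Toy feature extraction: mean signal energy plus one metadata flag."""
    energy = sum(s * s for s in audio_samples) / max(len(audio_samples), 1)
    return {"energy": energy, "sip_via_gateway": call_data.get("gateway", False)}

def classify_device(features, threshold=0.25):
    """Stand-in classification model; in deployment this would be a model
    trained by the call analysis service and shipped to the call center."""
    score = features["energy"] + (0.3 if features["sip_via_gateway"] else 0.0)
    return "voice_assistant" if score >= threshold else "other_device"

feats = extract_features([0.1, 0.2, 0.3], {"gateway": True})
print(classify_device(feats))  # → voice_assistant
```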

VOICE EVALUATION SYSTEM, VOICE EVALUATION METHOD, AND COMPUTER PROGRAM
20230138068 · 2023-05-04

A voice evaluation system includes: an acquisition unit that obtains voice uttered by a group of a plurality of persons; a detection unit that detects an element corresponding to a feeling from the obtained voice; and an evaluation unit that evaluates the obtained voice on the basis of the detected element. According to such a voice evaluation system, it is possible to properly evaluate the voice uttered by the group. For example, it is possible to properly evaluate the feelings of the group as a whole by using the group's voice.
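The three claimed units can be sketched as below, with the detection unit stubbed as a per-utterance "excitement" score and the evaluation unit as a group average. Both rules are assumptions for illustration, not the patented method.

```python
# Illustrative sketch of the three units: acquire utterances from a group,
# detect an emotion-bearing element in each, evaluate the group as a whole.

def detect_element(utterance_features: dict) -> float:
    """Detection unit: map one speaker's features to an excitement score."""
    return min(1.0, utterance_features["loudness"] * utterance_features["pitch_var"])

def evaluate_group(group_features: list) -> float:
    """Evaluation unit: average the detected elements over the group."""
    scores = [detect_element(f) for f in group_features]
    return sum(scores) / len(scores)

# Acquisition unit output: one feature dict per speaker in the group.
group = [{"loudness": 0.8, "pitch_var": 0.5},
         {"loudness": 0.6, "pitch_var": 1.0}]
print(round(evaluate_group(group), 2))  # → 0.5
```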

EFFICIENT EMPIRICAL DETERMINATION, COMPUTATION, AND USE OF ACOUSTIC CONFUSABILITY MEASURES
20230206914 · 2023-06-29

A computer-implemented method includes generating an empirically derived acoustic confusability measure by processing example utterances and iterating from an initial estimate of the acoustic confusability measure to improve it. The method can further include using the acoustic confusability measure to selectively limit the phrases made recognizable by a speech recognition application.
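One way to picture the iterative idea: start from an initial estimate of how confusable two phonemes are, then refine it by counting confusions observed in example utterances (true phoneme vs. decoded phoneme). The smoothed-count scheme below is a simplification assumed for illustration, not the empirical procedure the patent claims.

```python
# Hedged sketch: refine an acoustic confusability estimate P(decoded | true)
# from example (true, decoded) phoneme pairs, smoothed toward the initial
# estimate so unseen pairs keep some prior mass.

from collections import Counter

def refine_confusability(initial, examples, smoothing=1.0):
    """One refinement iteration over the example utterances."""
    counts = Counter(examples)                 # observed (true, decoded) pairs
    totals = Counter(t for t, _ in examples)   # occurrences of each true phoneme
    refined = {}
    for (t, d), prior in initial.items():
        refined[(t, d)] = (counts[(t, d)] + smoothing * prior) / (totals[t] + smoothing)
    return refined

initial = {("b", "b"): 0.9, ("b", "p"): 0.1}          # initial estimate
examples = [("b", "p"), ("b", "b"), ("b", "p"), ("b", "b")]  # decoder output
conf = refine_confusability(initial, examples)
print(round(conf[("b", "p")], 2))  # → 0.42
```

A speech application could then drop candidate phrases whose confusability with an already-recognizable phrase exceeds a threshold, matching the "selectively limit phrases" use described above.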

AUTOMATIC IN-GAME SUBTITLES AND CLOSED CAPTIONS
20230201717 · 2023-06-29 ·

An approach is provided for a gaming overlay application to provide automatic in-game subtitles and/or closed captions for video game applications. The overlay application accesses an audio stream and a video stream generated by an executing game application. The overlay application processes the audio stream through a text conversion engine to generate at least one subtitle. The overlay application determines a display position to associate with the at least one subtitle. The overlay application generates a subtitle overlay comprising the at least one subtitle located at the associated display position. The overlay application causes a portion of the video stream to be displayed with the subtitle overlay.
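The overlay flow above can be sketched as follows, with the text conversion engine stubbed out (a real overlay application would run speech-to-text on the game's audio stream). The function names, the fixed dialogue line, and the bottom-center positioning rule are illustrative assumptions.

```python
# Minimal sketch of the overlay application's pipeline: convert an audio
# chunk to a subtitle, pick a display position, and build the overlay that
# is composited onto the game's video stream.

def text_conversion_engine(audio_chunk: bytes) -> str:
    """Stub: pretend the audio chunk decodes to a fixed line of dialogue."""
    return "Reinforcements inbound!"

def display_position(frame_w: int, frame_h: int) -> tuple:
    """Positioning rule: bottom-center, 10% above the frame's lower edge."""
    return frame_w // 2, int(frame_h * 0.9)

def build_subtitle_overlay(audio_chunk: bytes, frame_w: int, frame_h: int) -> dict:
    subtitle = text_conversion_engine(audio_chunk)
    return {"text": subtitle, "anchor": display_position(frame_w, frame_h)}

overlay = build_subtitle_overlay(b"\x00" * 1024, 1920, 1080)
print(overlay["anchor"])  # → (960, 972)
```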
