Patent classifications
G10L15/005
Speaker conversion for video games
This specification describes a computer-implemented method of generating speech audio for use in a video game, wherein the speech audio is generated using a voice convertor that has been trained to convert audio data for a source speaker into audio data for a target speaker. The method comprises receiving: (i) source speech audio, and (ii) a target speaker identifier. The source speech audio comprises speech content in the voice of a source speaker. Source acoustic features are determined for the source speech audio. A target speaker embedding associated with the target speaker identifier is generated as output of a speaker encoder of the voice convertor. The target speaker embedding and the source acoustic features are inputted into an acoustic feature encoder of the voice convertor. One or more acoustic feature encodings are generated as output of the acoustic feature encoder. The one or more acoustic feature encodings are derived from the target speaker embedding and the source acoustic features. Target speech audio is generated for the target speaker. The target speech audio comprises the speech content in the voice of the target speaker. The generating comprises decoding the one or more acoustic feature encodings using an acoustic feature decoder of the voice convertor.
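The data flow in this abstract (speaker encoder, then acoustic feature encoder, then acoustic feature decoder) can be illustrated with a toy sketch. Everything below is an assumption made for illustration: the lookup table, the concatenation-based conditioning, and the averaging "decoder" stand in for trained neural components that the abstract does not specify, and names such as `SPEAKER_TABLE` and `convert_voice` are hypothetical.

```python
from typing import Dict, List

Frame = List[float]

# Hypothetical lookup table standing in for a trained speaker encoder.
SPEAKER_TABLE: Dict[str, List[float]] = {
    "npc_guard": [0.1, 0.9],
    "npc_merchant": [0.7, 0.5],
}


def speaker_encoder(target_speaker_id: str) -> List[float]:
    """Return a target speaker embedding for the given identifier."""
    return SPEAKER_TABLE[target_speaker_id]


def acoustic_feature_encoder(
    embedding: List[float], source_features: List[Frame]
) -> List[Frame]:
    """Condition each source frame on the embedding (assumed: concatenation)."""
    return [frame + embedding for frame in source_features]


def acoustic_feature_decoder(encodings: List[Frame]) -> List[float]:
    """Toy decoder: collapse each encoding to one value by averaging."""
    return [sum(encoding) / len(encoding) for encoding in encodings]


def convert_voice(source_features: List[Frame], target_speaker_id: str) -> List[float]:
    """Run the three stages in the order the abstract describes."""
    embedding = speaker_encoder(target_speaker_id)
    encodings = acoustic_feature_encoder(embedding, source_features)
    return acoustic_feature_decoder(encodings)
```

The only point of the sketch is the ordering: the encodings depend on both the source acoustic features and the target speaker embedding, so the same source frames yield different output for different target speaker identifiers.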
CONVERTING SIGN LANGUAGE
Methods and devices related to converting sign language are described. In an example, a method can include: receiving, at a processing resource of a computing device via a radio of the computing device, first signaling including at least one of text data, audio data, or video data, or any combination thereof; converting, at the processing resource, at least one of the text data, the audio data, or the video data to data representing a sign language; generating, at the processing resource, different video data based at least in part on the data representing the sign language, wherein the different video data comprises instructions for display of a performance of the sign language; transmitting second signaling representing the different video data from the processing resource to a user interface; and displaying the performance of the sign language on the user interface in response to the user interface receiving the second signaling.
Speech Recognition Method and Apparatus, Terminal, and Storage Medium
An artificial intelligence (AI)-based speech recognition method includes steps for obtaining a target speech signal, determining a target language type of the target speech signal, and outputting text information of the target speech signal using a real-time speech recognition model corresponding to the target language type. The real-time speech recognition model is obtained by training on a training set that includes an original speech signal and an extended speech signal, and the extended speech signal is obtained by converting an existing text of a basic language type.
Information processing apparatus, information processing method, and recording medium
An information processing apparatus includes a controller that is configured to identify a first language into which a content of a speech that is input is to be translated, based on first information about a place, estimate an intention of the content of the speech based on the content of the speech that is translated into the first language, select a service to be provided, based on the intention that is estimated, and provide a guide related to the service that is selected, in a language of the speech. The first language is different from the language of the speech.
Language Agnostic Multilingual End-To-End Streaming On-Device ASR System
A method includes receiving a sequence of acoustic frames characterizing one or more utterances as input to a multilingual automated speech recognition (ASR) model. At each of a plurality of output steps, the method also includes generating, by an encoder, a higher order feature representation for a corresponding acoustic frame; generating, by a prediction network, a hidden representation based on a sequence of non-blank symbols output by a final softmax layer; and generating a probability distribution over possible speech recognition hypotheses based on the hidden representation and the higher order feature representation. The method also includes predicting an end of utterance (EOU) token at an end of each utterance and classifying each acoustic frame as one of speech, initial silence, intermediate silence, or final silence.
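The four frame classes named in this abstract can be illustrated with a small offline labeling sketch. It is not the method of the abstract: the described system is a streaming model that predicts these classes, whereas the hypothetical `label_frames` function below assumes the whole utterance and a per-frame speech/non-speech flag are already available, and it treats an utterance with no speech as entirely initial silence (also an assumption).

```python
from typing import List


def label_frames(is_speech: List[bool]) -> List[str]:
    """Label each frame as speech or as one of three kinds of silence."""
    speech_indices = [i for i, flag in enumerate(is_speech) if flag]
    if not speech_indices:
        # Assumption: with no speech at all, every frame is initial silence.
        return ["initial_silence"] * len(is_speech)

    first, last = speech_indices[0], speech_indices[-1]
    labels = []
    for i, flag in enumerate(is_speech):
        if flag:
            labels.append("speech")
        elif i < first:
            labels.append("initial_silence")       # before any speech
        elif i > last:
            labels.append("final_silence")         # after the last speech
        else:
            labels.append("intermediate_silence")  # pause between speech
    return labels
```

The distinction between intermediate and final silence is what relates to endpointing: a pause inside an utterance should not close it, while trailing silence accompanies the end of utterance.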
LANGUAGE INFERENCE APPARATUS, LANGUAGE INFERENCE METHOD, AND PROGRAM
A language inference apparatus (100) includes an acquisition unit (102) that acquires nationality information, a selection unit (104) that selects a language inference engine by using the acquired nationality information, and a determination unit (106) that determines a language used by a speaker, by analyzing voice information of the speaker using the selected language inference engine (110).
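The acquire, select, determine sequence can be sketched as a simple dispatch. The abstract does not say what an inference engine looks like internally, so this sketch assumes, purely for illustration, that each engine is a list of candidate languages and that "analyzing voice information" reduces to picking the candidate with the highest precomputed score. The table `ENGINES`, the fallback `DEFAULT_CANDIDATES`, and both function names are hypothetical.

```python
from typing import Dict, List

# Hypothetical mapping from nationality code to an engine, modeled here
# as the list of candidate languages that engine considers.
ENGINES: Dict[str, List[str]] = {
    "CH": ["de", "fr", "it"],
    "CA": ["en", "fr"],
}
DEFAULT_CANDIDATES: List[str] = ["en"]  # assumed fallback engine


def select_engine(nationality: str) -> List[str]:
    """Selection unit: choose an engine from the nationality information."""
    return ENGINES.get(nationality, DEFAULT_CANDIDATES)


def determine_language(nationality: str, voice_scores: Dict[str, float]) -> str:
    """Determination unit: pick the best-scoring language the engine allows.

    voice_scores stands in for acoustic analysis of the speaker's voice.
    """
    candidates = select_engine(nationality)
    return max(candidates, key=lambda lang: voice_scores.get(lang, 0.0))
```

Under these assumptions, narrowing the candidates first means a high score for a language outside the selected engine (for example German for a speaker with nationality "CA") cannot win.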
Recommending results in multiple languages for search queries based on user profile
Systems and methods are described for a media guidance application that generates results in multiple languages for search queries. In particular, the media guidance application resolves multiple language barriers by taking automatic and manual user language settings and applying those settings to a variety of potential search results.
Detection and prevention of inmate to inmate message relay
A secure system and method for detecting and preventing inmate-to-inmate message relays are described. The system monitors inmate communications for similar phrases that occur in two or more separate inmate messages. These similar phrases may overlap in real time, as in a conference call, or may occur at separate times in separate messages. Communications that appear similar are assigned a score, and the score is compared to a threshold. If the score is above the threshold, the communication is flagged and remedial actions are taken. If the flagged communication contains illegal matter, the communication can be disconnected or restricted in the future.
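The score-and-threshold step can be sketched as follows. The abstract does not state how similarity is scored, so this example swaps in word-level matching from Python's standard `difflib.SequenceMatcher` as an assumed stand-in; the function names and the threshold value of 0.6 are likewise hypothetical.

```python
from difflib import SequenceMatcher
from typing import List, Tuple


def similarity_score(message_a: str, message_b: str) -> float:
    """Score two messages in [0, 1] by comparing their word sequences."""
    words_a = message_a.lower().split()
    words_b = message_b.lower().split()
    return SequenceMatcher(None, words_a, words_b).ratio()


def flag_relays(messages: List[str], threshold: float = 0.6) -> List[Tuple[int, int]]:
    """Return index pairs of messages whose score exceeds the threshold."""
    flagged = []
    for i in range(len(messages)):
        for j in range(i + 1, len(messages)):
            if similarity_score(messages[i], messages[j]) > threshold:
                flagged.append((i, j))
    return flagged
```

A flagged pair would then be handed to whatever remedial action the system applies. Comparing every pair is quadratic in the number of messages, which is acceptable for a sketch but would likely need indexing at scale.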
VOICE-BASED CONTROL OF SEXUAL STIMULATION DEVICES
A system and method for voice-based control of sexual stimulation devices. In some configurations, the system and method involve receiving voice data, analyzing the voice data to detect spoken commands, and generating control signals based on the commands. In some configurations, the system and method involve receiving voice data, analyzing the voice data for non-speech vocalizations, detecting voice stress patterns, and generating control signals based on the detected patterns. In some configurations, the analyses of the voice data are performed by machine learning algorithms which may be trained on associations between speech and non-speech vocalizations of a user while the user engages in one or more voice-based training tasks, associating speech and non-speech vocalizations with controls of the sexual stimulation device. In some configurations, machine learning algorithms are used to make the associations. In some configurations, data from other biometric sensors is included in the associations.
Virtual receptionist via videoconferencing
One disclosed example system includes a reception room meeting device configured for establishing a video conference with a device associated with a remote receptionist. The reception room meeting device sends a request for a video meeting with one of a plurality of candidate remote receptionists in response to receiving an activation signal triggered by a visitor to a reception area, and establishes the video meeting with a device associated with one remote receptionist selected based on the request. The system further includes a virtual receptionist system configured to access visitor data obtained by various input devices at the reception area, and determine the status of the visitor based on the visitor data. The virtual receptionist system further transmits the status of the visitor to the device associated with the selected remote receptionist to facilitate the check-in process.