Patent classifications
G10L15/1822
Recommending Results In Multiple Languages For Search Queries Based On User Profile
Systems and methods for a media guidance application that generates results in multiple languages for search queries. In particular, the media guidance application resolves multiple language barriers by taking automatic and manual user language settings and applying those settings to a variety of potential search results.
DIALOGUE APPARATUS, METHOD AND PROGRAM
A dialogue apparatus includes a speech recognition unit (1) configured to perform speech recognition on utterance input to generate a text corresponding to the utterance, a speech waveform corresponding to the utterance, and information regarding a length of sound of the utterance; a language understanding unit (2) configured to grasp the content of the utterance by using the text corresponding to the utterance; a dialogue management unit (3) configured to determine the content of a response corresponding to the utterance by using the content of the utterance; an utterance state extraction unit (4) configured to extract a state of the utterance by using the text corresponding to the utterance, the speech waveform corresponding to the utterance, and the information regarding the length of the sound of the utterance; a response state determination unit (5) configured to determine a state of the response according to the state of the utterance; a response sentence generation unit (6) configured to generate a response sentence by using the content of the response; and a speech synthesis unit (7) configured to synthesize speech corresponding to the response sentence with the state of the response taken into account.
Voice controlled assistant with coaxial speaker and microphone arrangement
A voice controlled assistant has a housing to hold one or more microphones, one or more speakers, and various computing components. The housing has an elongated cylindrical body extending along a center axis between a base end and a top end. The microphone(s) are mounted in the top end and the speaker(s) are mounted proximal to the base end. The microphone(s) and speaker(s) are coaxially aligned along the center axis. The speaker(s) are oriented to output sound directionally toward the base end and opposite to the microphone(s) in the top end. The sound may then be redirected in a radial outward direction from the center axis at the base end so that the sound is output symmetric to, and equidistant from, the microphone(s).
Method and device for user registration, and electronic device
Embodiments of the present application provide a method and apparatus for user registration, and an electronic device. The method includes: each time a wake-up voice of a user is obtained, extracting and storing a first voiceprint feature corresponding to the wake-up voice; clustering the stored first voiceprint features to divide them into at least one category, wherein each of the at least one category includes at least one first voiceprint feature belonging to the same user; assigning one category identifier to each category; and storing each category identifier in correspondence with the at least one first voiceprint feature of that category to complete user registration. The embodiments of the present application can simplify the user operation and improve the user experience.
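The clustering step in this abstract can be illustrated with a minimal sketch. The abstract does not name a clustering algorithm, so the greedy cosine-similarity grouping below, the `threshold` value, the `user_N` identifiers, and the function names are all assumptions made for illustration. Features are assumed to be non-zero numeric vectors.

```python
import math


def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def register_users(features, threshold=0.9):
    """Group stored voiceprint features and assign one identifier per group.

    Hypothetical greedy scheme: a feature joins the first cluster whose
    centroid is at least `threshold` similar, otherwise it starts a new one.
    """
    clusters = []  # each entry is a list of features assumed to share a speaker
    for feature in features:
        for members in clusters:
            centroid = [sum(col) / len(members) for col in zip(*members)]
            if cosine(feature, centroid) >= threshold:
                members.append(feature)
                break
        else:
            clusters.append([feature])
    # Store each category identifier with the features of its category.
    return {f"user_{i}": members for i, members in enumerate(clusters)}


stored = [(1.0, 0.0), (0.9, 0.1), (0.0, 1.0), (0.1, 0.95)]
registry = register_users(stored)
```

With these four toy vectors the sketch produces two categories of two features each. A production system would more likely use a dedicated clustering method over learned speaker embeddings.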
Using frames for action dialogs
Methods, systems, and apparatus, including computer programs encoded on computer storage media, for using frames for performing tasks. One of the methods includes receiving a first request to perform a task, the first request comprising user speech identifying the task; generating a frame associated with the task, wherein the frame comprises one or more types of values necessary to perform the task, and wherein each type of value can be satisfied by a respective value; receiving a second request to provide information related to a question, the second request comprising user speech identifying the question; providing information identifying the question to a search engine, and receiving a response identifying one or more terms; determining that at least one term can satisfy a type of value necessary to perform the task; and storing the at least one term in the frame.
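A frame of this kind can be pictured as a small data structure that lists the value types a task needs and fills them as matching terms arrive. The sketch below is illustrative only: the `Frame` class, its method names, and the idea that the search response arrives as a mapping from value type to term are assumptions, not details from the abstract.

```python
from dataclasses import dataclass, field


@dataclass
class Frame:
    """Hypothetical frame: the value types a task needs, plus values found so far."""
    task: str
    slot_types: tuple
    values: dict = field(default_factory=dict)

    def missing(self):
        """Value types that are still unsatisfied."""
        return [t for t in self.slot_types if t not in self.values]

    def offer(self, terms):
        """Store any term that satisfies a still-missing value type.

        `terms` stands in for a search engine response and is assumed to map
        a value type to a term. Returns the types that were filled.
        """
        stored = []
        for slot_type, value in terms.items():
            if slot_type in self.slot_types and slot_type not in self.values:
                self.values[slot_type] = value
                stored.append(slot_type)
        return stored


# First request: the user asks to book a table, which creates the frame.
frame = Frame(task="book_table", slot_types=("restaurant", "time"))
# Second request: a question whose (stubbed) search response names a restaurant.
filled = frame.offer({"restaurant": "Luigi's", "rating": "4.5"})
```

Here the `rating` term is ignored because the task does not need it, and `time` remains to be collected.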
Real time correction of accent in speech audio signals
Systems and methods for real-time correction of an accent in a speech audio signal are provided. A method includes dividing the speech audio signal into a stream of input chunks, an input chunk from the stream of input chunks including a pre-defined number of frames of the speech audio signal; extracting, by an acoustic features extraction module, acoustic features from the input chunk and a context associated with the input chunk, the context being a pre-determined number of the frames preceding the input chunk in the stream; extracting, by a linguistic features extraction module, linguistic features from the input chunk and the context; receiving a speaker embedding for a human speaker; providing the speaker embedding, the acoustic features, and the linguistic features to a synthesis module to generate a melspectrogram with a reduced accent; and providing the melspectrogram to a vocoder to generate an output chunk of an output audio signal.
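The chunk-plus-context division described here can be sketched in a few lines. This covers only the splitting step, not feature extraction or synthesis. The function name and parameters are hypothetical, and frames are represented as plain list items for simplicity.

```python
def chunk_with_context(frames, chunk_size, context_size):
    """Yield (context, chunk) pairs from a sequence of frames.

    Each chunk holds up to `chunk_size` frames; its context is the
    `context_size` frames immediately preceding it (fewer at the start).
    """
    for start in range(0, len(frames), chunk_size):
        chunk = frames[start:start + chunk_size]
        context = frames[max(0, start - context_size):start]
        yield context, chunk


pairs = list(chunk_with_context(list(range(7)), chunk_size=3, context_size=2))
# pairs == [([], [0, 1, 2]), ([1, 2], [3, 4, 5]), ([4, 5], [6])]
```

In this sketch the final chunk may be shorter than `chunk_size`. A real-time system would presumably buffer incoming audio until a full chunk is available instead.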
Methods and systems for recommending content in context of a conversation
A media guidance application may monitor a conversation among users, and identify keywords in the conversation, without the use of wakewords. The keywords are used to search for media content that is relevant to the on-going conversation. Accordingly, the media guidance application presents relevant content to the users, during the conversation, to more actively engage the users. A conversation monitoring window may be used to present conversation information as well as relevant content. A listening mode may be used to manage when the media guidance application processes speech from a conversation. The media guidance application may access user profiles for keywords, select content types, select content sources, and determine relevancy of media content, to provide content in context of a conversation.
Skill shortlister for natural language processing
Devices and techniques are generally described for application determination in speech processing. Input data corresponding to a spoken utterance may be received. Speech recognition processing may be performed on the input data to generate text data. A machine learning encoder may generate a vector representation of the input data. A first binary classifier may determine a first probability that the input data corresponds to a first speech-processing application. A second binary classifier may determine a second probability that the input data corresponds to a second speech-processing application. A selection between the first speech-processing application and the second speech-processing application may be made based at least in part on the first probability and the second probability.
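One way to picture the selection step is a set of independent binary classifiers, one per application, each scoring the same vector representation. The sketch below uses hand-set logistic weights purely for illustration. The classifier form, the weights, the application names, and the pick-the-highest-probability rule are assumptions, since the abstract says only that the selection is based at least in part on the two probabilities.

```python
import math


def make_binary_classifier(weights, bias):
    """Return a logistic scorer: the probability that a vector belongs to one application."""
    def predict(vector):
        z = sum(w * x for w, x in zip(weights, vector)) + bias
        return 1.0 / (1.0 + math.exp(-z))
    return predict


def select_application(vector, classifiers):
    """Score the vector with every binary classifier and pick the most probable application."""
    scores = {name: clf(vector) for name, clf in classifiers.items()}
    return max(scores, key=scores.get), scores


# Hypothetical applications with illustrative, untrained weights.
classifiers = {
    "music": make_binary_classifier([2.0, -1.0], 0.0),
    "weather": make_binary_classifier([-1.0, 2.0], 0.0),
}
best, scores = select_application([1.0, 0.0], classifiers)
```

Because each classifier is binary and independent, the probabilities need not sum to one, and a new application can be added without retraining the others.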
Cognitive Training Using Voice Command
Systems and methods for cognitive training using voice command are described. One aspect includes a device to repeatedly present visual stimuli to a user that require performance of a task. A microphone may be positioned to provide audio input from the user to the device, with the audio input from the user providing input required to measure task performance. A processing system may perform real-time analysis of measured task performance.
Intelligent Voice Interface for Handling Out-of-Context Dialog
In a method for handling out-of-sequence caller dialog, an intelligent voice interface is configured to lead callers through pathways of an algorithmic dialog that includes available voice prompts for requesting different types of caller information. The method may include, during a voice communication with a caller via a caller device, receiving from the caller device caller input data indicative of a voice input of the caller, without having first provided to the caller device any voice prompt that requests a first type of caller information, and determining, by processing the caller input data, that the voice input includes caller information of the first type. The method also includes, after determining that the voice input includes the caller information of the first type, bypassing one or more voice prompts, of the available voice prompts, that request the first type of caller information.
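The bypass logic can be sketched as follows, assuming the voice input has already been transcribed to text. The prompt list, the information types, and the regular expressions used to detect them are hypothetical stand-ins for whatever processing the actual method applies to the caller input data.

```python
import re

# Hypothetical dialog pathway: (information type, voice prompt requesting it).
PROMPTS = [
    ("policy_number", "What is your policy number?"),
    ("zip_code", "What is your ZIP code?"),
]

# Hypothetical detectors for each information type.
DETECTORS = {
    "policy_number": re.compile(r"\b[A-Z]{2}\d{6}\b"),
    "zip_code": re.compile(r"\b\d{5}\b"),
}


def extract_info(transcript):
    """Return every information type found in the transcript, prompted or not."""
    found = {}
    for info_type, pattern in DETECTORS.items():
        match = pattern.search(transcript)
        if match:
            found[info_type] = match.group(0)
    return found


def next_prompts(collected):
    """Prompts still needed, bypassing those whose information was already given."""
    return [prompt for info_type, prompt in PROMPTS if info_type not in collected]


# The caller volunteers a policy number before being asked for it.
collected = extract_info("Hi, my policy is AB123456")
remaining = next_prompts(collected)
```

In this example only the ZIP code prompt remains, since the policy number was supplied out of sequence.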