Patent classifications
G10L15/22
Systems and methods for variably paced real-time translation between the written and spoken forms of a word
An enunciation system (ES) enables users to gain acquaintance, understanding, and mastery of the relationship between letters and sounds in the context of an alphabetic writing system. The ES enables the user to experience the action of sounding out a word before their own phonics knowledge enables them to sound out the word independently; its continuous, unbroken speech output or input avoids the common confusions that ensue from analyzing words by breaking them up into discrete sounds; its user-controlled pacing allows the user to slow down enunciation at specific points of difficulty within the word; its real-time touch control allows the written word to be “played” like a musical instrument, with expressive and aesthetic possibilities; and its highlighting of the letter cluster responsible for the recognized phoneme, as the user enunciates it, allows the user to more easily associate the letters with the sounds.
Methods and systems for pushing audiovisual playlist based on text-attentional convolutional neural network
In some embodiments, methods and systems for pushing audiovisual playlists based on a text-attentional convolutional neural network include a local voice interactive terminal, a dialog system server and a playlist recommendation engine, where the dialog system server and the playlist recommendation engine are respectively connected to the local voice interactive terminal. In some embodiments, the local voice interactive terminal includes a microphone array, a host computer connected to the microphone array, and a voice synthesis chip board connected to the microphone array. In some embodiments, the playlist recommendation engine obtains rating data based on a rating predictor constructed by the neural network; the host computer parses the data into recommended playlist information; and the voice terminal synthesizes the results and pushes them to a user in the form of voice.
System and method for context-enriched attentive memory network with global and local encoding for dialogue breakdown detection
A method, an electronic device and computer readable medium for dialogue breakdown detection are provided. The method includes obtaining a verbal input from an audio sensor. The method also includes generating a reply to the verbal input. The method additionally includes identifying a local context from the verbal input and a global context from the verbal input, additional verbal inputs previously received by the audio sensor, and previous replies generated in response to the additional verbal inputs. The method further includes identifying a dialogue breakdown in response to determining that the reply does not correspond to the local context and the global context. In addition, the method includes generating sound corresponding to the reply through a speaker when the dialogue breakdown is not identified.
Virtual assistant-initiated conversations
Aspects of the subject disclosure may include, for example, a method in which a processing system generates a list of communications desired by a user, and determines an attentiveness level of the user with respect to equipment of the user currently in use, to determine whether the user is available to participate in a desired communication. The method also includes accessing equipment of a target party of the communication to determine an attentiveness level of the target party with respect to equipment of the target party, to determine whether the target party is available to participate in the communication. The method also includes generating a notice at the equipment of the user that the target party is available; and initiating the communication between the equipment of the user and the equipment of the target party. Other embodiments are disclosed.
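The availability check described in this abstract can be illustrated with a minimal Python sketch. It is not taken from the patent: the function name `next_communication`, the 0-to-1 attentiveness scale, the rule that a high score means a party is too engaged with their equipment to be interrupted, and the `busy_above` cutoff are all assumptions made for illustration.

```python
from typing import Optional


def next_communication(
    desired: list[str],
    user_attentiveness: float,
    target_attentiveness: dict[str, float],
    busy_above: float = 0.7,  # hypothetical cutoff, not from the abstract
) -> Optional[str]:
    """Return the first desired target that can be reached right now.

    Assumption: attentiveness is a score in [0, 1], and a party whose
    score is at or above `busy_above` is treated as unavailable.
    """
    # The user must be available before any target is considered.
    if user_attentiveness >= busy_above:
        return None
    for target in desired:
        # Targets with no known score are treated as unavailable.
        if target_attentiveness.get(target, 1.0) < busy_above:
            return target
    return None
```

In this sketch a returned name is the point at which the system would generate the notice and initiate the communication; `None` means no call is placed yet.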
Multi-user devices in a connected home environment
A device implementing a system for providing content in response to a request includes a processor configured to receive a voice request for content associated with a home environment, the voice request corresponding to a user account, and determine, based on the voice request, not to provide the content via the device. The processor is further configured to select, in response to the determining, a second device from among multiple devices associated with the home environment, wherein the selecting is based at least in part on configuration settings associated with the home environment, and provide for the second device to output the content based on a profile of the user account.
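The device hand-off described in this abstract can be sketched in Python as follows. This is a rough illustration under stated assumptions, not the patented method: the `Device` class, the capability sets, the `preferred_by_content` mapping standing in for the home's configuration settings, and the rule that the receiving device declines content it cannot render are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Device:
    name: str
    capabilities: set[str]  # e.g. {"audio", "video"}; hypothetical model


def choose_output_device(
    requester: Device,
    devices: list[Device],
    preferred_by_content: dict[str, str],  # stand-in for home configuration settings
    content_type: str,
) -> Optional[Device]:
    """Pick the device that should output the requested content."""
    # Assumption: the receiving device keeps the request if it can render it.
    if content_type in requester.capabilities:
        return requester
    # Otherwise consider every other device in the home that can render it.
    candidates = [
        d for d in devices
        if d is not requester and content_type in d.capabilities
    ]
    # Prefer the device named in the home configuration, if it is a candidate.
    preferred = preferred_by_content.get(content_type)
    for device in candidates:
        if device.name == preferred:
            return device
    # Fall back to the first capable device, or None if there is none.
    return candidates[0] if candidates else None
```

Output based on the profile of the user account is left out of the sketch; in practice the selected device would be handed the account profile along with the content request.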
Preventing audio delay-induced miscommunication in audio/video conferences
Embodiments for delay-induced miscommunication reduction are provided. An embodiment may include capturing data streams transmitted between participants in an A/V exchange; translating, on a sender device prior to transmission to a recipient device, an audio stream within the data streams to text; timestamping, on the sender device prior to transmission to the recipient device, each word in the translated audio stream; transmitting the audio stream and the sender-side translated and timestamped audio stream to the recipient device; translating, on the recipient device, the transmitted audio stream to text; timestamping, on the recipient device, each word in the translated audio stream; determining that a lag exists in the A/V exchange based on a comparison of the timestamps for corresponding words in the sender-side and recipient-side translated and timestamped audio streams; and generating a true transcript of an intended exchange between the participants based on the comparison.
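The timestamp comparison at the heart of this abstract can be sketched in a few lines of Python. This is an illustrative sketch only: the `TimedWord` type, the millisecond timestamps, the positional pairing of words, and the `threshold_ms` value are assumptions, and the abstract does not say how corresponding words are matched or how large a delay counts as lag.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TimedWord:
    text: str
    timestamp_ms: int  # when the word was transcribed, in milliseconds


def word_lags_ms(sender: list[TimedWord], recipient: list[TimedWord]) -> list[int]:
    """Return recipient-minus-sender delay for each corresponding word.

    Assumption: words correspond by position; pairs whose text differs
    (ignoring case) are skipped rather than realigned.
    """
    return [
        r.timestamp_ms - s.timestamp_ms
        for s, r in zip(sender, recipient)
        if s.text.lower() == r.text.lower()
    ]


def lag_exists(
    sender: list[TimedWord],
    recipient: list[TimedWord],
    threshold_ms: int = 500,  # hypothetical cutoff
) -> bool:
    """Report lag when any matched word arrives later than the threshold."""
    lags = word_lags_ms(sender, recipient)
    return bool(lags) and max(lags) > threshold_ms
```

A usage example: with sender words "hello" at 0 ms and "world" at 400 ms, and the same words transcribed on the recipient side at 100 ms and 1200 ms, `word_lags_ms` yields `[100, 800]` and `lag_exists` reports a lag at the default threshold. A real system would also need synchronized clocks and a more robust word alignment than positional pairing.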