Patent classifications
G10L13/02
SCOPE WITH TEXT AND SPEECH COMMUNICATION SYSTEM
An apparatus can include an optical device comprising an internal display; a communications system coupled to the optical device, the communications system to receive a wireless signal comprising data representing a textual signal; and a hardware processor to convert the data representing the textual signal into a format for display on the internal display. The apparatus can include a text and speech processor for converting signals representing audible speech to visual text and primitive graphics for display on the internal display. The text and speech processor can also convert text to speech for audio output.
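The two conversion paths of such a text and speech processor can be sketched as below. The patent discloses no code; the class, the injected `recognize`/`synthesize` callables, and the uppercase display formatting are all hypothetical placeholders for whatever engines an implementation would actually use.

```python
from typing import Callable


class TextSpeechProcessor:
    """Bidirectional converter between audible speech and display text.

    The recognition and synthesis engines are implementation details;
    here they are injected as callables so only the flow is sketched.
    """

    def __init__(self,
                 recognize: Callable[[bytes], str],
                 synthesize: Callable[[str], bytes]):
        self.recognize = recognize    # audio signal -> text
        self.synthesize = synthesize  # text -> audio signal

    def speech_to_display(self, audio: bytes) -> str:
        """Convert a signal representing audible speech to visual text."""
        text = self.recognize(audio)
        return text.upper()  # e.g. format for a small internal display

    def text_to_audio(self, text: str) -> bytes:
        """Convert received textual data to speech for audio output."""
        return self.synthesize(text)


# Usage with trivial stand-in engines:
proc = TextSpeechProcessor(
    recognize=lambda audio: audio.decode("utf-8"),
    synthesize=lambda text: text.encode("utf-8"),
)
print(proc.speech_to_display(b"hello"))  # → HELLO
```

Injecting the engines keeps the display/communication logic independent of any particular recognizer or synthesizer back end.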
METHOD AND SYSTEM FOR VIRTUAL INTELLIGENCE USER INTERACTION
A method and apparatus to generate and update a virtual personification using artificial intelligence, comprising a system configured to: receive data associated with a person, such as text files, audio files, image files, and video files; render a virtual personification of the person and output it to a user, such as on a display screen; receive and interpret a user input to generate a user request; and update the virtual personification in response. The update may include generating an audio output using the text files and audio files of the person and/or generating a video output using the image files and video files of the person. The audio output and the video output are presented to the user by the virtual personification and represent speech or imagery that has not previously been produced by the person or thing the personification represents.
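The receive/render/interpret/update cycle can be sketched minimally as follows. The data structures, the string-based "interpretation", and the `audio<...>`/`video<...>` output markers are all illustrative assumptions standing in for the claimed AI components:

```python
from dataclasses import dataclass, field


@dataclass
class PersonData:
    # Source material associated with the person (hypothetical layout).
    text_files: list = field(default_factory=list)
    audio_files: list = field(default_factory=list)
    image_files: list = field(default_factory=list)
    video_files: list = field(default_factory=list)


@dataclass
class VirtualPersonification:
    data: PersonData
    outputs: list = field(default_factory=list)

    def handle(self, user_input: str) -> str:
        """Interpret a user input as a request, then update the rendering."""
        request = user_input.strip().lower()  # crude stand-in interpretation
        if request.startswith("say "):
            # Novel audio would be generated from text + audio material.
            out = f"audio<{request[4:]}>"
        else:
            # Novel video would be generated from image + video material.
            out = f"video<{request}>"
        self.outputs.append(out)
        return out


vp = VirtualPersonification(PersonData(text_files=["memoir.txt"]))
print(vp.handle("say hello"))  # → audio<hello>
```

The key claimed property is only mirrored, not implemented, here: each output is newly generated from the stored material rather than replayed from it.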
On-device speech synthesis of textual segments for training of on-device speech recognition model
Processor(s) of a client device can: identify a textual segment stored locally at the client device; process the textual segment, using a speech synthesis model stored locally at the client device, to generate synthesized speech audio data that includes synthesized speech of the identified textual segment; process the synthesized speech, using an on-device speech recognition model that is stored locally at the client device, to generate predicted output; and generate a gradient based on comparing the predicted output to ground truth output that corresponds to the textual segment. In some implementations, the generated gradient is used, by processor(s) of the client device, to update weights of the on-device speech recognition model. In some implementations, the generated gradient is additionally or alternatively transmitted to a remote system for use in remote updating of global weights of a global speech recognition model.
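The claimed loop (synthesize a locally stored textual segment, recognize the synthesized audio, compare the prediction against the segment as ground truth, and either apply the gradient locally or transmit it for global aggregation) can be sketched with toy stand-ins. Every function below is a hypothetical placeholder, with a one-parameter "recognizer" so the gradient is computable by hand:

```python
# Toy stand-ins for the on-device models; the real patent concerns
# neural speech synthesis and recognition models, not these formulas.
weight = 0.5  # single "weight" of the on-device recognition model


def synthesize(segment: str) -> list:
    """Stand-in speech synthesis model: text -> fake audio features."""
    return [float(len(word)) for word in segment.split()]


def recognize(audio: list, w: float) -> float:
    """Stand-in recognizer: predicts a score from the audio features."""
    return w * sum(audio)


def ground_truth(segment: str) -> float:
    """Target output that corresponds to the textual segment."""
    return float(sum(len(word) for word in segment.split()))


def gradient(segment: str, w: float) -> float:
    """d(loss)/dw for squared error between prediction and target."""
    audio = synthesize(segment)
    err = recognize(audio, w) - ground_truth(segment)
    return 2.0 * err * sum(audio)


# Local update of the on-device weight (gradient descent); alternatively
# the gradient could be transmitted to a remote system to update the
# global model's weights instead.
lr = 0.001
for _ in range(100):
    weight -= lr * gradient("hello world", weight)

print(round(weight, 3))  # → 1.0 (converged so prediction matches target)
```

Note the privacy-motivated shape of the scheme: only gradients, never the textual segments or audio, would leave the device in the remote-update variant.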
Speech synthesizer for evaluating quality of synthesized speech using artificial intelligence and method of operating the same
A speech synthesizer for evaluating the quality of synthesized speech using artificial intelligence includes a database configured to store a synthesized speech corresponding to a text, a correct speech corresponding to the text, and a speech quality evaluation model for evaluating the quality of the synthesized speech; and a processor configured to compare a first speech feature set, indicating features of the synthesized speech, with a second speech feature set, indicating features of the correct speech, acquire a quality evaluation index set including indices used to evaluate the quality of the synthesized speech according to a result of the comparison, and determine weights as model parameters of the speech quality evaluation model using the acquired quality evaluation index set and the speech quality evaluation model.
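One concrete reading of "compare the feature sets and acquire quality evaluation indices" is a per-feature distance between the synthesized and correct speech, aggregated by learned weights. The feature names, the squared-difference index, and the weighted score below are illustrative assumptions, not the patent's actual model:

```python
import math


def quality_indices(synth_feats: dict, correct_feats: dict) -> dict:
    """Per-feature squared differences: synthesized vs. correct speech."""
    return {name: (synth_feats[name] - correct_feats[name]) ** 2
            for name in correct_feats}


def evaluate(indices: dict, weights: dict) -> float:
    """Weighted quality score; lower means closer to the correct speech.

    The weights play the role of the evaluation model's parameters,
    which the patent would determine from the index set by training.
    """
    return math.sqrt(sum(weights[n] * v for n, v in indices.items()))


# Illustrative feature sets (e.g. pitch in Hz, duration in s, energy).
synth = {"pitch": 118.0, "duration": 1.9, "energy": 0.62}
correct = {"pitch": 120.0, "duration": 2.0, "energy": 0.60}
idx = quality_indices(synth, correct)
score = evaluate(idx, weights={"pitch": 0.01, "duration": 1.0, "energy": 1.0})
print(round(score, 3))  # → 0.224
```

The down-weighting of pitch illustrates why weights are needed at all: raw feature differences live on incomparable scales.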
Phonemic keyboard apparatus and method
An apparatus which includes: a computer processor; a computer memory; a computer display; and an audio speaker; wherein the computer memory has computer programming stored therein which is configured to be implemented by the computer processor to display a keyboard having a plurality of keys on the computer display; wherein each of the plurality of keys includes indicia for a phonemic sound; and wherein the computer programming is configured to be implemented by the computer processor to cause a sound corresponding to the phonemic sound of each key to be emitted from the audio speaker when that key is selected. A plurality of grapheme outcomes may be determined for each of the plurality of keys when each of the plurality of keys is selected.
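The claimed key behavior, a key per phoneme that emits its sound and yields several grapheme outcomes, amounts to a phoneme-to-graphemes mapping. The English spellings below are standard correspondences, but the data structure and `key_pressed` function are illustrative assumptions:

```python
# Each key carries a phoneme (shown in IPA) and the graphemes
# ("spelling outcomes") that can realize it in English.
PHONEME_KEYS = {
    "/f/": ["f", "ff", "ph", "gh"],   # fish, off, phone, laugh
    "/k/": ["c", "k", "ck", "ch"],    # cat, kite, duck, school
    "/i:/": ["ee", "ea", "e", "ie"],  # see, sea, me, field
}


def key_pressed(phoneme: str) -> dict:
    """Simulate a key press: 'play' the sound and list grapheme outcomes."""
    return {
        "sound": phoneme,                    # would be sent to the speaker
        "graphemes": PHONEME_KEYS[phoneme],  # candidate spellings to offer
    }


print(key_pressed("/f/")["graphemes"])  # → ['f', 'ff', 'ph', 'gh']
```

The one-to-many values are the point of the design: unlike a letter keyboard, a phonemic key cannot commit to a single spelling until context resolves it.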
METHOD AND APPARATUS FOR PROCESSING SPEECH, ELECTRONIC DEVICE AND STORAGE MEDIUM
A method for processing speech includes: acquiring an original speech signal; extracting a spectrogram from the original speech signal; acquiring a speech synthesis model, where the speech synthesis model comprises a first generation sub-model and a second generation sub-model; generating a harmonic structure of the spectrogram by invoking the first generation sub-model to process the spectrogram; and generating a target speech by invoking the second generation sub-model to process the harmonic structure and the spectrogram.
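The two-stage flow, a first sub-model producing a harmonic structure from the spectrogram and a second combining both into the target speech, can be sketched with toy sub-models. The sinusoidal construction and energy scaling below are illustrative assumptions, not the patent's generation networks:

```python
import math


def first_submodel(spectrogram: list) -> list:
    """Toy first generation sub-model: derive a harmonic structure
    (here, one sine period per frame at the frame's dominant bin)."""
    harmonics = []
    for frame in spectrogram:
        f0_bin = max(range(len(frame)), key=lambda i: frame[i])
        harmonics.append([math.sin(2 * math.pi * f0_bin * t / 16)
                          for t in range(16)])
    return harmonics


def second_submodel(harmonics: list, spectrogram: list) -> list:
    """Toy second sub-model: consume BOTH inputs, scaling each frame's
    harmonic structure by the frame's spectral energy, and concatenate
    the frames into a 'waveform'."""
    wave = []
    for harm, frame in zip(harmonics, spectrogram):
        energy = sum(frame)
        wave.extend(energy * s for s in harm)
    return wave


# Two 4-bin spectrogram frames; dominant bins 1 and 2 respectively.
spec = [[0.1, 0.9, 0.2, 0.1], [0.1, 0.2, 0.8, 0.1]]
speech = second_submodel(first_submodel(spec), spec)
print(len(speech))  # → 32 (2 frames x 16 samples)
```

The split mirrors a common vocoder factorization: modeling the periodic (harmonic) component separately before a second stage fills in the rest from the full spectrogram.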