Patent classifications
G10L21/003
HEARING SYSTEM INCLUDING A HEARING INSTRUMENT AND METHOD FOR OPERATING THE HEARING INSTRUMENT
A hearing system includes a hearing instrument for capturing a sound signal from an environment of the hearing instrument. The captured sound signal is processed, and the processed sound signal is output to a user of the hearing instrument. In a speech recognition step, the captured sound signal is analyzed to recognize speech intervals, in which the captured sound signal contains speech. In a speech enhancement procedure performed during recognized speech intervals, the amplitude of the processed sound signal is periodically varied according to a temporal pattern that is consistent with a stress rhythmic pattern of the user. A method for operating the hearing instrument is also provided.
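The speech enhancement procedure in this abstract can be illustrated with a minimal sketch. It assumes the speech recognition step has already produced a per-sample boolean mask, and it assumes a sinusoidal gain as the temporal pattern; the abstract does not specify the shape of the pattern. The function name `enhance_speech` and the parameters `rhythm_hz` and `depth` are hypothetical.

```python
import math

def enhance_speech(samples, speech_mask, sample_rate, rhythm_hz=2.0, depth=0.3):
    """Periodically vary the amplitude of samples inside speech intervals.

    samples     -- processed sound signal as a list of floats
    speech_mask -- True where the sample lies in a recognized speech interval
    rhythm_hz   -- assumed rate of the temporal pattern (hypothetical parameter)
    depth       -- assumed modulation depth, 0.0 means no variation
    """
    out = []
    for n, (x, is_speech) in enumerate(zip(samples, speech_mask)):
        if is_speech:
            # Sinusoidal gain around 1.0; an assumption, not the patented pattern.
            gain = 1.0 + depth * math.sin(2.0 * math.pi * rhythm_hz * n / sample_rate)
        else:
            # Outside speech intervals the signal passes through unchanged.
            gain = 1.0
        out.append(x * gain)
    return out
```

In a real system `rhythm_hz` would presumably be derived from the user's own stress rhythm rather than fixed.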
METHOD, APPARATUS, ELECTRONIC DEVICE, COMPUTER-READABLE STORAGE MEDIUM, AND COMPUTER PROGRAM PRODUCT FOR VIDEO COMMUNICATION
A method for video communication includes: selecting a first virtual object on a first video communication interface and obtaining associated first virtual object information; displaying a first virtual reality video image and a second virtual reality video image on the first video communication interface, where the first virtual reality video image corresponds to the first virtual object information and a first user feature, and the second virtual reality video image corresponds to second virtual object information and a second user feature; and playing a target virtual audio, the target virtual audio including one or both of a first virtual audio and a second virtual audio, where the first virtual audio corresponds to first voice data and the first virtual object information, and the second virtual audio corresponds to second voice data and the second virtual object information.
PROJECTION ON A VEHICLE WINDOW
A system includes a camera aimed externally to a vehicle, a window of the vehicle, a projector positioned to project on the window, and a computer communicatively coupled to the camera and the projector. The computer is programmed to, upon receiving data from the camera indicating a first person outside the vehicle, instruct the projector to project an image on the window depicting a second person inside the vehicle.
Audio improvement using closed caption data
Methods and systems are described herein for improving audio for hearing-impaired content consumers. An example method may comprise determining a content asset. Closed caption data associated with the content asset may be determined. At least a portion of the closed caption data may be determined based on a user setting associated with a hearing impairment. Compensating audio comprising a frequency translation associated with at least the portion of the closed caption data may be generated. The content asset may be caused to be output with audio content comprising the compensating audio and the original audio.
Training method of a speaker identification model based on a first language and a second language
A method is provided for training a speaker identification model that receives voice data as an input and outputs speaker identification information identifying the speaker of an utterance included in the voice data. The training method includes: performing voice quality conversion of first voice data of a first speaker to generate second voice data of a second speaker; and training the speaker identification model using, as training data, the first voice data and the second voice data.
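The data preparation described in this abstract can be sketched as follows. This is only an illustration under stated assumptions: `build_training_set` is a hypothetical helper, `convert_voice` stands in for whatever voice quality conversion is used (the abstract does not name one), and labelling the converted audio with the second speaker is an assumption about how the training labels are assigned.

```python
def build_training_set(utterances, convert_voice, target_speaker):
    """Return (audio, speaker_label) pairs containing original and converted data.

    utterances     -- iterable of (speaker_id, audio) pairs (first voice data)
    convert_voice  -- hypothetical callable: (audio, target_speaker) -> audio
    target_speaker -- identifier of the second speaker
    """
    examples = []
    for speaker_id, audio in utterances:
        # First voice data, labelled with the original (first) speaker.
        examples.append((audio, speaker_id))
        # Second voice data produced by voice quality conversion,
        # labelled with the second speaker (an assumption).
        examples.append((convert_voice(audio, target_speaker), target_speaker))
    return examples
```

The resulting list would then be passed to whatever training routine the speaker identification model uses.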
Device for outputting sound and method therefor
A device for outputting sound and a method therefor are provided. The sound output method includes predicting external sound to be received from an external environment, variably adjusting sound to be output from the device, based on the predicted external sound, and outputting the adjusted sound.
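The predict-then-adjust loop in this abstract can be sketched in a few lines. The abstract does not say how the external sound is predicted or how the output is adjusted, so everything here is an assumption: the prediction is a plain average of recent external levels, the adjustment is a clamped gain, and the names `predict_external_level`, `adjust_output`, `quiet_level`, and `max_gain` are hypothetical.

```python
def predict_external_level(recent_levels):
    """Predict the next external sound level as the mean of recent levels."""
    return sum(recent_levels) / len(recent_levels)

def adjust_output(block, predicted_level, quiet_level=0.25, max_gain=4.0):
    """Scale an output block according to the predicted external level.

    The gain is 1.0 when the environment is at or below quiet_level and
    grows proportionally with the predicted level, capped at max_gain.
    """
    gain = min(max_gain, max(1.0, predicted_level / quiet_level))
    return [sample * gain for sample in block]

# Usage: predict from recent measurements, then adjust before output.
predicted = predict_external_level([0.25, 0.75])
adjusted = adjust_output([0.1, -0.2], predicted)
```

A practical device would likely adjust more than overall gain, for example per frequency band, but the control flow would follow the same three steps.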
Removal of identifying traits of a user in a virtual environment
A virtual environment platform may receive, from a user device, a request to access a virtual reality (VR) environment and may verify, based on the request, a user of the user device to allow the user device access to the VR environment. The virtual environment platform may receive, after verifying the user of the user device, user voice input and user handwritten input from the user device. The virtual environment platform may generate processed user speech by processing the user voice input, wherein a characteristic of the processed user speech differs from the corresponding characteristic of the user voice input. The platform may also generate formatted user text by processing the user handwritten input, wherein the formatted user text is machine-encoded text. The virtual environment platform may cause the processed user speech to be audibly presented and the formatted user text to be visually presented in the VR environment.
PERSONALIZED VOICE CONVERSION SYSTEM
A personalized voice conversion system includes a cloud server and an intelligent device that communicates with the cloud server. The intelligent device uploads an original voice signal to the cloud server. The cloud server converts the original voice signal into an intelligible voice signal based on an intelligible voice conversion model. The intelligent device downloads and plays the intelligible voice signal. Based on the original voice signal and the corresponding intelligible voice signal, the cloud server and the intelligent device train an off-line voice conversion model provided to the intelligent device. When the intelligent device stops communicating with the cloud server, the intelligent device converts a new original voice signal into a new intelligible voice signal based on the off-line voice conversion model and plays the new intelligible voice signal.
END-TO-END SPEECH CONVERSION
Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for end to end speech conversion are disclosed. In one aspect, a method includes the actions of receiving first audio data of a first utterance of one or more first terms spoken by a user. The actions further include providing the first audio data as an input to a model that is configured to receive first given audio data in a first voice and output second given audio data in a synthesized voice without performing speech recognition on the first given audio data. The actions further include receiving second audio data of a second utterance of the one or more first terms spoken in the synthesized voice. The actions further include providing, for output, the second audio data of the second utterance of the one or more first terms spoken in the synthesized voice.