Patent classifications
G10L13/033
SYNTHESIZED SPEECH AUDIO DATA GENERATED ON BEHALF OF HUMAN PARTICIPANT IN CONVERSATION
Generating synthesized speech audio data on behalf of a given user in a conversation. The synthesized speech audio data includes synthesized speech that incorporates textual segment(s). The textual segment(s) can include recognized text that results from processing spoken input, of the given user, using a speech recognition model and/or can include a selection of a rendered suggestion that conveys the textual segment(s). Some implementations dynamically determine one or more prosodic properties for use in speech synthesis of the textual segment, and generate the synthesized speech with the one or more determined prosodic properties. The prosodic properties can be determined based on the textual segment(s) used in speech synthesis, textual segment(s) corresponding to recent spoken input of additional participant(s), attribute(s) of relationship(s) between the given user and additional participant(s) in the conversation, and/or feature(s) of a current location for the conversation.
PROJECTION ON A VEHICLE WINDOW
A system includes a camera aimed externally to a vehicle, a window of the vehicle, a projector positioned to project on the window, and a computer communicatively coupled to the camera and the projector. The computer is programmed to, upon receiving data from the camera indicating a first person outside the vehicle, instruct the projector to project an image on the window depicting a second person inside the vehicle.
MULTI-USER VOICE ASSISTANT WITH DISAMBIGUATION
Disambiguating question answering responses by receiving voice command data associated with a first user, determining a first user identity according to the first user voice command data, determining a first user activity context according to the first user voice command data, determining a first response for the first user, receiving voice command data associated with a second user, determining a second user identity according to the second user voice command data, determining a second user activity context according to the second user voice command data, determining a second response for the second user, determining a predicted ambiguity between the first response and the second response, altering the first response according to the predicted ambiguity, and providing the first response and the second response.
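The final steps of this abstract (predict an ambiguity between two responses, then alter the first one) can be illustrated with a minimal sketch. The abstract does not say how ambiguity is predicted or how a response is altered, so everything here is an assumption: the `Response` type, the heuristic in `predict_ambiguity` (different users, and neither answer names its user), and the name-and-activity prefix added by `disambiguate` are all hypothetical.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Response:
    user: str      # identity determined from the voice command data
    activity: str  # activity context determined from the same data
    text: str      # answer to be provided to the user


def predict_ambiguity(first: Response, second: Response) -> bool:
    # Assumed heuristic, not taken from the abstract: the two answers are
    # meant for different users and neither one names its intended user.
    return (
        first.user != second.user
        and first.user not in first.text
        and second.user not in second.text
    )


def disambiguate(first: Response, second: Response) -> tuple[Response, Response]:
    # Alter only the first response, as the abstract describes, by prefixing
    # it with the user's name and activity context (a hypothetical strategy).
    if predict_ambiguity(first, second):
        first = replace(
            first, text=f"{first.user}, for your {first.activity}: {first.text}"
        )
    return first, second
```

For example, answers for "Alice" (activity "recipe") and "Bob" (activity "workout") that do not mention either name would cause the first answer to be prefixed with "Alice, for your recipe: ", while the second is returned unchanged.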
Synthetic speech processing
A speech-processing system receives input data representing text. A first encoder processes segments of the text to determine embedding data representing the text, and a second encoder processes corresponding audio data to determine prosodic data corresponding to the text. The embedding and prosodic data are processed to create output data including a representation of speech corresponding to the text and prosody.
Speaker identity and content de-identification
One embodiment of the invention provides a method for speaker identity and content de-identification under privacy guarantees. The method comprises receiving input indicative of privacy protection levels to enforce, extracting features from a speech recorded in a voice recording, recognizing and extracting textual content from the speech, parsing the textual content to recognize privacy-sensitive personal information about an individual, generating de-identified textual content by anonymizing the personal information to an extent that satisfies the privacy protection levels and conceals the individual's identity, and mapping the de-identified textual content to a speaker who delivered the speech. The method further comprises generating a synthetic speaker identity based on other features that are dissimilar from the features to an extent that satisfies the privacy protection levels, and synthesizing a new speech waveform based on the synthetic speaker identity to deliver the de-identified textual content. The new speech waveform conceals the speaker's identity.
Systems and methods for variably paced real-time translation between the written and spoken forms of a word
An enunciation system (ES) enables users to gain acquaintance, understanding, and mastery of the relationship between letters and sounds in the context of an alphabetic writing system. The ES enables the user to experience the action of sounding out a word, before their own phonics knowledge enables them to sound out the word independently; its continuous, unbroken speech output or input avoids the common confusions that ensue from analyzing words by breaking them up into discrete sounds; its user-controlled pacing allows the user to slow down enunciation at specific points of difficulty within the word; its real-time touch control allows the written word to be “played” like a musical instrument, with expressive and aesthetic possibilities; and its highlighting of the letter cluster that is responsible for the recognized phoneme enunciated by the user as it occurs allows the user to more easily associate the letters with the sounds.
CORRECTION METHOD OF SYNTHESIZED SPEECH SET FOR HEARING AID
A method for correcting a synthesized speech set for hearing aid according to an aspect of the present invention includes the steps of outputting first synthesized speech for testing on the basis of first synthesized speech data for testing correlated with a first phoneme label in a synthesized speech set for testing, accepting a first answer selected by a user, outputting second synthesized speech for testing on the basis of second synthesized speech data for testing correlated with a second phoneme label in the synthesized speech set for testing, accepting a second answer selected by the user, and correlating first synthesized speech data for hearing aid with the second phoneme label instead of second synthesized speech data for hearing aid in a synthesized speech set for hearing aid, in a case in which the first answer matches the second phoneme label and also the second answer does not match the second phoneme label.
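The correction condition in this abstract can be read as a small remapping rule, sketched below under stated assumptions. The dictionary representation of the hearing-aid set (phoneme label mapped to speech data) and the function name `correct_hearing_aid_set` are hypothetical; the abstract does not describe data structures, and playing the test speech and collecting the answers are outside this sketch.

```python
def correct_hearing_aid_set(
    hearing_aid_set: dict[str, str],
    first_label: str,
    second_label: str,
    first_answer: str,
    second_answer: str,
) -> dict[str, str]:
    """Return a corrected copy of the hearing-aid set.

    hearing_aid_set maps a phoneme label to its synthesized speech data
    (here just an identifier such as a file name, which is an assumption).
    """
    corrected = dict(hearing_aid_set)
    # Condition from the abstract: the user heard the first test speech as
    # the second phoneme, and did not hear the second test speech as the
    # second phoneme.
    if first_answer == second_label and second_answer != second_label:
        # Correlate the first speech data with the second label instead of
        # the second speech data.
        corrected[second_label] = hearing_aid_set[first_label]
    return corrected
```

With a set such as `{"ka": "ka.wav", "ta": "ta.wav"}`, a user who answers "ta" for the "ka" test speech and "pa" for the "ta" test speech would end up with "ta" mapped to `"ka.wav"`; in every other case the set is returned unchanged.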
REPRODUCTION CONTROL METHOD, CONTROL SYSTEM, AND PROGRAM
A reproduction control method implemented by a computer includes receiving, from a first terminal device, a first reproduction request in accordance with an instruction from a first user, receiving, from a second terminal device, a second reproduction request in accordance with an instruction from a second user, acquiring a first acoustic signal representing a first sound in accordance with the first reproduction request, and a second acoustic signal representing a second sound which is in accordance with the second reproduction request and has acoustic characteristics that differ from acoustic characteristics of the first sound represented by the first acoustic signal, mixing the first acoustic signal and the second acoustic signal, thereby generating a third acoustic signal, and causing a reproduction system to reproduce a third sound represented by the third acoustic signal.
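The mixing step can be sketched as follows. The abstract does not specify how the two acoustic signals are mixed, so this example assumes sample-wise addition of two signals at the same sample rate, each represented as a list of floats in the range [-1.0, 1.0], with clipping to that range. The function name `mix_signals` is hypothetical.

```python
from itertools import zip_longest


def mix_signals(first: list[float], second: list[float]) -> list[float]:
    """Mix two signals into a third by sample-wise addition (assumed method)."""
    mixed = []
    # The shorter signal is padded with silence (0.0) so that both sounds
    # are reproduced in full.
    for a, b in zip_longest(first, second, fillvalue=0.0):
        # Clip the sum so the third signal stays within [-1.0, 1.0].
        mixed.append(max(-1.0, min(1.0, a + b)))
    return mixed
```

A call such as `mix_signals(first_signal, second_signal)` would produce the third acoustic signal that is then handed to the reproduction system.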