Patent classifications
G10L17/00
METHOD OF CONVERTING SPEECH, ELECTRONIC DEVICE, AND READABLE STORAGE MEDIUM
A method of converting speech, an electronic device, and a readable storage medium are provided, which relate to the field of artificial intelligence technology such as speech and deep learning, and in particular to speech conversion technology. The method of converting speech includes: acquiring a first speech of a target speaker; acquiring a speech of an original speaker; extracting a first feature parameter of the first speech of the target speaker; extracting a second feature parameter of the speech of the original speaker; processing the first feature parameter and the second feature parameter to obtain Mel spectrum information; and converting the Mel spectrum information to output a second speech of the target speaker having a tone identical to that of the first speech of the target speaker and content identical to that of the speech of the original speaker.
SYSTEM AND METHOD FOR CONVERSATION-BASED NOTIFICATION MANAGEMENT
A method for dynamic notification management at a head mounted display (HMD) includes presenting, at the HMD, an augmented reality display; receiving, at a processor controlling the HMD, a notification from an application for display at the HMD; receiving, at the processor at a first time, first sensor data from one or more of a camera or a microphone; determining, based on the first sensor data, a first value of one or more factors associated with a probability that a user of the HMD is currently in a real-world conversation; determining an importance value of the received notification at the first time; and determining whether to display the notification from the application based on a comparison of the first value of the one or more factors relative to the importance value of the received notification.
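The final comparison step in this abstract can be illustrated with a minimal sketch. The abstract does not specify how the conversation factors or the importance value are scaled, nor the direction of the comparison, so the following are assumptions for illustration only: both values lie on a 0 to 1 scale, the `Notification` type and `should_display` function are hypothetical names, and a notification is shown only when its importance exceeds the conversation score.

```python
from dataclasses import dataclass


@dataclass
class Notification:
    app: str           # name of the originating application
    text: str          # notification body
    importance: float  # assumed scale: 0.0 (trivial) to 1.0 (critical)


def should_display(conversation_score: float, notification: Notification) -> bool:
    """Decide whether to show a notification on the HMD.

    conversation_score is an assumed 0.0-1.0 estimate, derived from camera
    or microphone data, that the wearer is currently in a real-world
    conversation. The rule here (importance must exceed the score) is an
    illustrative choice, not one stated in the abstract.
    """
    return notification.importance > conversation_score


# Likely in conversation: a low-importance notification is held back.
held = should_display(0.8, Notification("chat", "New sticker pack", 0.3))
# Likely not in conversation: the same notification is shown.
shown = should_display(0.2, Notification("chat", "New sticker pack", 0.3))
```

Under these assumptions, a high-importance notification (for example an emergency alert with importance 0.95) would still be displayed while the wearer is talking.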
VOICE PROCESSING SYSTEM AND VOICE PROCESSING METHOD
A voice processing system includes: a first acquisition processor that acquires voice data collected by a microphone installed in a microphone-speaker device; a second acquisition processor that acquires authentication information of a wearer who wears the microphone-speaker device, the authentication information being acquired by an authentication information acquirer installed in the microphone-speaker device; and a control processor that executes predetermined processing related to the voice data, which is acquired by the first acquisition processor, on the basis of the authentication information acquired by the second acquisition processor.
Wakeword detection
Techniques for implementing multiple wakeword detectors on a single device are described. A digital signal processor (DSP) of the device may initially include an untrained wakeword detection component. The wakeword detection component of the DSP may be trained by engaging a user to speak particular utterances. Once a companion application is configured to implement a wakeword detection component, the companion application's wakeword detection component may be trained specific to the user of the device. Once the companion application's wakeword detection component is trained, the DSP wakeword detection component may be deactivated or its accuracy adjusted.
Intelligent automated assistant for TV user interactions
Systems and processes are disclosed for controlling television user interactions using a virtual assistant. In an example process, a virtual assistant can interact with a television set-top box to control content shown on a television display. Speech input for the virtual assistant can be received from a device with a microphone. The speech input can comprise a query associated with content shown on the television display. A user intent of the query can be determined based on one or more of the content shown on the television display and a viewing history of media content. A result of the query can be caused to be displayed based on the determined user intent.
Automatic dubbing method and apparatus
An automatic dubbing method is disclosed. The method comprises: extracting speeches of a voice from an audio portion of a media content (504); obtaining a voice print model for the extracted speeches of the voice (506); processing the extracted speeches by utilizing the voice print model to generate replacement speeches (508); and replacing the extracted speeches of the voice with the generated replacement speeches in the audio portion of the media content (510).