G10L13/00

Method for providing speech and intelligent computing device controlling speech providing apparatus
11580953 · 2023-02-14

A method for providing speech and an intelligent computing device controlling a speech providing apparatus are disclosed. A method for providing speech according to an embodiment of the present invention includes obtaining a message, converting the message into speech, and determining an output pattern based on the situation in which the message was generated, so that the situation at the time of message generation can be conveyed more realistically to the recipient of the TTS output. One or more of the speech providing method, the speech providing apparatus, the intelligent computing device controlling the speech providing apparatus, and the server of the present invention may be associated with artificial intelligence modules, drones (unmanned aerial vehicles, UAVs), robots, augmented reality (AR) devices, virtual reality (VR) devices, devices related to 5G services, and the like.
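The claimed output-pattern step can be pictured as a mapping from the message-generation situation to TTS parameters. The sketch below is illustrative only; the context keys (`urgent`, `noisy_environment`, `late_night`) and the parameter values are assumptions, not part of the patent.

```python
def choose_output_pattern(context):
    """Map the message-generation situation to TTS output parameters.

    `context` is a dict of hypothetical situation flags; the returned
    rate/pitch/volume values are made-up illustrations.
    """
    pattern = {"rate": 1.0, "pitch": 0.0, "volume": 1.0}
    if context.get("urgent"):
        # Urgent messages: speak faster and louder.
        pattern.update(rate=1.2, volume=1.3)
    if context.get("noisy_environment"):
        # Message written in a noisy place: raise volume further.
        pattern["volume"] = max(pattern["volume"], 1.5)
    if context.get("late_night"):
        # Late-night messages: slower and quieter.
        pattern.update(rate=0.9, volume=0.7)
    return pattern
```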

Systems and methods for addressing a corrupted segment in a media asset

Systems and methods for addressing a corrupted segment in a media asset. The media guidance application determines that a segment of a media asset is corrupted. The media guidance application determines whether a retrieval period to retrieve an uncorrupted copy of the segment exceeds a threshold period. If the retrieval period does not exceed the threshold period, the media guidance application retrieves and generates for display the uncorrupted copy of the segment. If the retrieval period exceeds the threshold period, the media guidance application determines whether an importance level of the corrupted segment exceeds a threshold level. If the importance level exceeds the threshold level, the media guidance application generates for display a summary for the corrupted segment. If the importance level does not exceed the threshold level, the media guidance application generates for display the subsequent segment and the summary for the corrupted segment in an overlay.
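The branching logic of the abstract can be sketched as a small decision function. All names are illustrative assumptions; the patent does not specify an API, and the return values stand in for the display actions the media guidance application would take.

```python
def handle_corrupted_segment(retrieval_period, threshold_period,
                             importance_level, threshold_level):
    """Decide how the guidance application addresses a corrupted segment."""
    if retrieval_period <= threshold_period:
        # Retrieval is fast enough: fetch and display the uncorrupted copy.
        return "display_uncorrupted_copy"
    if importance_level > threshold_level:
        # Too slow to fetch, but the segment matters: show a summary instead.
        return "display_summary"
    # Too slow and unimportant: play the next segment with a summary overlay.
    return "display_next_segment_with_summary_overlay"
```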

Multilingual speech synthesis and cross-language voice cloning

A method includes receiving an input text sequence to be synthesized into speech in a first language and obtaining a speaker embedding, the speaker embedding specifying specific voice characteristics of a target speaker for synthesizing the input text sequence into speech that clones a voice of the target speaker. The target speaker includes a native speaker of a second language different than the first language. The method also includes generating, using a text-to-speech (TTS) model, an output audio feature representation of the input text by processing the input text sequence and the speaker embedding. The output audio feature representation includes the voice characteristics of the target speaker specified by the speaker embedding.
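The conditioning idea, appending a speaker embedding to every encoded text frame so the decoded audio features carry the target speaker's voice regardless of the text's language, can be sketched with a stub model. `StubTTS`, its toy encoding, and all dimensions are assumptions standing in for a trained encoder-decoder.

```python
class StubTTS:
    """Stand-in for the patent's TTS model; a real system would use a
    trained encoder-decoder. All names and dimensions are illustrative."""
    def encode(self, text_ids):
        # Toy encoding: a 4-dim one-hot frame per input token.
        return [[1.0 if i == t % 4 else 0.0 for i in range(4)] for t in text_ids]
    def decode(self, frames):
        # Toy "audio feature" per frame: the frame mean.
        return [sum(f) / len(f) for f in frames]

def synthesize(text_ids, speaker_embedding, model):
    """Append the speaker embedding to each encoded text frame, so the
    output feature representation reflects the target speaker's voice
    characteristics (the cross-language cloning idea)."""
    encoded = model.encode(text_ids)
    conditioned = [frame + list(speaker_embedding) for frame in encoded]
    return model.decode(conditioned)
```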

Speaker identity and content de-identification

One embodiment of the invention provides a method for speaker identity and content de-identification under privacy guarantees. The method comprises receiving input indicative of privacy protection levels to enforce, extracting features from a speech recorded in a voice recording, recognizing and extracting textual content from the speech, parsing the textual content to recognize privacy-sensitive personal information about an individual, generating de-identified textual content by anonymizing the personal information to an extent that satisfies the privacy protection levels and conceals the individual's identity, and mapping the de-identified textual content to a speaker who delivered the speech. The method further comprises generating a synthetic speaker identity based on other features that are dissimilar from the features to an extent that satisfies the privacy protection levels, and synthesizing a new speech waveform based on the synthetic speaker identity to deliver the de-identified textual content. The new speech waveform conceals the speaker's identity.
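The anonymization step can be pictured as pattern-based redaction of the recognized text. A minimal sketch, assuming the enforced privacy protection level has already been translated into a set of (regex, placeholder) pairs; the patent's parser for privacy-sensitive personal information is far richer than this.

```python
import re

def deidentify_text(text, patterns):
    """Replace privacy-sensitive spans with placeholders.

    `patterns` is a list of (regex, placeholder) pairs; both the pair
    format and the placeholder style are illustrative assumptions.
    """
    for pattern, placeholder in patterns:
        text = re.sub(pattern, placeholder, text)
    return text
```

For example, redacting a name and a phone number at the same protection level would map "Call John at 555-1234" to "Call [NAME] at [PHONE]".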

Sign language information processing method and apparatus, electronic device and readable storage medium

The sign language information processing method and apparatus, electronic device, and readable storage medium provided by the present disclosure collect language data from a user's current conversation in real time by obtaining voice information and video information captured by a user terminal; match each speaker with his or her speech content by determining, in the video information, the speaking object corresponding to the voice information; and superimpose an augmented reality (AR) sign language animation corresponding to the voice information on a gesture area associated with the speaking object to produce a sign language video, so that the user can identify the corresponding speaker when viewing the AR sign language animation. The disclosure can therefore provide an improved user experience.
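The step of determining which on-screen person the voice belongs to could, under one simple reading, be a temporal-overlap heuristic between the voice segment and detected lip motion. The data shapes, the `lip_motion` field, and the heuristic itself are assumptions for illustration; the patent does not prescribe this method.

```python
def match_speaker(voice_segment, detected_people):
    """Pick the on-screen person whose lip-motion interval overlaps the
    voice segment the most. Intervals are (start, end) pairs in seconds."""
    def overlap(a, b):
        # Length of the intersection of two intervals (0 if disjoint).
        return max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    return max(detected_people,
               key=lambda person: overlap(voice_segment, person["lip_motion"]))
```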

Systems and methods for privacy-protecting hybrid cloud and premise stream processing

Systems and methods for privacy-protecting hybrid cloud and premise stream processing are disclosed. In one embodiment, in an information processing device comprising at least one computer processor, a method for processing a voice communication including restricted content may include: (1) receiving from an electronic device, a customer communication; (2) identifying restricted content in the customer communication; (3) masking or marking the restricted content in the customer communication; (4) communicating the customer communication with the masked or marked restricted content to a cloud processor; (5) receiving a processed responsive communication comprising the masked or marked restricted content from the cloud processor; (6) unmasking or unmarking the restricted content in the processed responsive communication; and (7) communicating the processed responsive communication comprising the unmasked or unmarked restricted content to the electronic device.
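Steps (3) and (6), masking before the cloud round-trip and unmasking after, can be sketched as token substitution with an on-premise mapping. The token format and function names are illustrative assumptions; the patent covers both masking and marking, and this sketch shows only the masking variant.

```python
import re
import uuid

def mask_restricted(text, patterns):
    """Replace restricted spans with opaque tokens before sending the
    communication to the cloud; the token->original map stays on premise."""
    mapping = {}
    def repl(match):
        token = f"<MASK:{uuid.uuid4().hex[:8]}>"
        mapping[token] = match.group(0)
        return token
    for pattern in patterns:
        text = re.sub(pattern, repl, text)
    return text, mapping

def unmask_restricted(text, mapping):
    """Restore the original restricted content in the cloud's response."""
    for token, original in mapping.items():
        text = text.replace(token, original)
    return text
```

A design note on this sketch: keeping `mapping` only on premise means the cloud processor never sees the restricted content, which is the point of the hybrid split.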

Preventing audio delay-induced miscommunication in audio/video conferences

Embodiments for reducing delay-induced miscommunication are provided. An embodiment may include capturing data streams transmitted between participants in an A/V exchange; translating, on the sender device prior to transmission to a recipient device, an audio stream within the data streams to text; timestamping, on the sender device prior to transmission to the recipient device, each word in the translated audio stream; transmitting the audio stream and the sender-side translated and timestamped audio stream to the recipient device; translating, on the recipient device, the transmitted audio stream to text; timestamping, on the recipient device, each word in the translated audio stream; determining that a lag exists in the A/V exchange based on a comparison of the timestamps for corresponding words in the sender-side and recipient-side translated and timestamped audio streams; and generating a true transcript of the intended exchange between the participants based on the comparison.
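The lag-determination step can be sketched as a word-by-word timestamp comparison between the sender-side and recipient-side transcripts. The data layout (lists of `(word, timestamp)` pairs in spoken order) and the `max_skew` threshold are assumptions for illustration.

```python
def detect_lag(sender_words, recipient_words, max_skew):
    """Return the words whose recipient-side timestamp trails the
    sender-side timestamp by more than `max_skew` seconds.

    Each argument is a list of (word, timestamp) pairs in spoken order;
    only matching words at the same position are compared.
    """
    lagged = []
    for (w_send, t_send), (w_recv, t_recv) in zip(sender_words, recipient_words):
        if w_send == w_recv and t_recv - t_send > max_skew:
            lagged.append(w_send)
    return lagged
```

A non-empty result would indicate a lag in the A/V exchange, triggering generation of the true transcript from the sender-side (pre-delay) version.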