G10L13/00

VOICE DATA CREATION DEVICE
20230223005 · 2023-07-13

A voice data creation device creates voice data containing an additional word, that is, a word to be added to the recognition targets of a speech recognition system. The device includes: a sentence example extraction unit configured to extract one or more text corpora containing the additional word from a text corpus group, which includes a plurality of text corpora consisting of sentence examples made up of a plurality of words; a sentence example selection unit configured to select, from among the text corpora extracted by the sentence example extraction unit, the text corpus with the highest measure of likelihood of occurrence as a sentence, as the optimal sentence example for the additional word; and a voice creation unit configured to output a synthesized voice of the optimal sentence example, generated by a predetermined voice synthesis system, as voice data corresponding to the additional word.
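The abstract does not say which likelihood measure the selection unit uses. As a hedged illustration only, the sketch below assumes an add-one smoothed bigram language model trained on the corpus group, length-normalized, and picks the highest-scoring sentence that contains the additional word. All function names are hypothetical, and the final voice synthesis step is omitted because the "predetermined voice synthesis system" is unspecified.

```python
import math
from collections import Counter


def train_bigram(corpus):
    """Count unigrams and bigrams over whitespace-tokenized sentences.

    Assumption: a bigram model stands in for the unspecified likelihood measure.
    """
    unigrams, bigrams = Counter(), Counter()
    for sentence in corpus:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams


def log_likelihood(sentence, unigrams, bigrams):
    """Add-one smoothed bigram log-probability, averaged per transition."""
    vocab = len(unigrams)
    tokens = ["<s>"] + sentence.split() + ["</s>"]
    total = 0.0
    for prev, cur in zip(tokens, tokens[1:]):
        total += math.log((bigrams[(prev, cur)] + 1) / (unigrams[prev] + vocab))
    return total / (len(tokens) - 1)


def select_optimal_sentence(additional_word, corpus):
    # Extraction step: keep only sentence examples containing the additional word.
    candidates = [s for s in corpus if additional_word in s.split()]
    if not candidates:
        return None
    unigrams, bigrams = train_bigram(corpus)
    # Selection step: the candidate with the highest likelihood measure wins.
    return max(candidates, key=lambda s: log_likelihood(s, unigrams, bigrams))
```

For example, with the corpus `["play the music", "play the music", "music the play", "stop now"]`, the word `"music"` selects `"play the music"`, since its word order is better supported by the counts.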

Automated sign language translation and communication using multiple input and output modalities
11557152 · 2023-01-17

Methods, apparatus and systems for recognizing sign language movements using multiple input and output modalities. One example method includes capturing a movement associated with the sign language using a set of visual sensing devices, the set of visual sensing devices comprising multiple apertures oriented with respect to the subject to receive optical signals corresponding to the movement from multiple angles, generating digital information corresponding to the movement based on the optical signals from the multiple angles, collecting depth information corresponding to the movement in one or more planes perpendicular to an image plane captured by the set of visual sensing devices, producing a reduced set of digital information by removing at least some of the digital information based on the depth information, generating a composite digital representation by aligning at least a portion of the reduced set of digital information, and recognizing the movement based on the composite digital representation.
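The depth-based reduction and alignment steps can be pictured with a minimal sketch. It assumes each observation is a 2D point with a depth value and a view index, that "removing at least some of the digital information" means discarding points outside a near/far depth band, and that alignment is a simple per-view translation. Real systems would use calibrated camera extrinsics, and the recognition step is not shown. `Point`, `reduce_by_depth` and `composite` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Point:
    x: float
    y: float
    depth: float  # distance from the image plane (units assumed)
    view: int     # index of the aperture/angle that observed the point


def reduce_by_depth(points, near, far):
    """Keep only points whose depth lies in [near, far], e.g. dropping background."""
    return [p for p in points if near <= p.depth <= far]


def composite(points, offsets):
    """Align all views into one frame using an assumed per-view (dx, dy) shift."""
    aligned = []
    for p in points:
        dx, dy = offsets[p.view]
        aligned.append((p.x + dx, p.y + dy, p.depth))
    return sorted(aligned)
```

Filtering before alignment keeps the composite small, which mirrors the order of steps in the abstract.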

Processing Multimodal User Input for Assistant Systems
20230222605 · 2023-07-13

In one embodiment, a method includes receiving at a head-mounted device a speech input from a user and a visual input captured by cameras of the head-mounted device, wherein the visual input comprises subjects and attributes associated with the subjects, and wherein the speech input comprises a co-reference to one or more of the subjects; resolving entities corresponding to the subjects associated with the co-reference based on the attributes and the co-reference; and presenting, at the head-mounted device, communication content responsive to the speech input and the visual input, wherein the communication content comprises information associated with the results of executing tasks corresponding to the resolved entities.
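One simple way to picture attribute-based co-reference resolution is to score each visually detected subject by how many of its attributes appear in the spoken mention. The sketch below assumes attributes are plain words and that ties mean the co-reference covers several subjects; the scoring rule and the function name are hypothetical, not the method claimed.

```python
def resolve_coreference(mention_attributes, subjects):
    """Return the subject ids whose attributes best match a spoken mention.

    mention_attributes: set of attribute words, e.g. {"red", "mug"} for "the red mug".
    subjects: mapping of subject id -> set of attributes detected in the visual input.
    Assumption: overlap count is the matching score; ties are all returned.
    """
    scores = {sid: len(mention_attributes & attrs) for sid, attrs in subjects.items()}
    best = max(scores.values(), default=0)
    if best == 0:
        return []
    return sorted(sid for sid, score in scores.items() if score == best)
```

For "the red mug", only the subject tagged both red and mug is returned; a bare "the mug" returns every mug in view.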

Computer-based systems and methods configured for one or more technological applications for the automated assisting of telephone agent services

In at least some embodiments, a system includes a memory and a processor configured to convert an audio stream of a customer's speech during a customer call session into customer-originated text. The customer-originated text is displayed in a first chat interface. A request from a first call center agent to interact with the customer during the customer call session is sent to a second call center agent via the first chat interface and displayed in a second chat interface. The second call center agent is allowed to participate in the customer call session when the second call center agent accepts the request from the first call center agent. First agent-originated text and second agent-originated text produced during the customer call session are merged to form combined agent-originated text, which is synthesized into computer-generated agent speech having the voice of a computer-generated agent and communicated to the customer over the voice channel.
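The abstract says only that the two agents' texts are "merged". A minimal sketch, assuming each agent's messages arrive as time-sorted `(timestamp, text)` pairs, is a time-ordered merge before the combined text is handed to speech synthesis (not shown). `merge_agent_text` is a hypothetical name; `heapq.merge` with `key=` is standard Python 3.5+.

```python
import heapq


def merge_agent_text(first_agent, second_agent):
    """Interleave two agents' (timestamp, text) messages by time into one script.

    Assumption: each input list is already sorted by timestamp.
    """
    merged = heapq.merge(first_agent, second_agent, key=lambda m: m[0])
    return " ".join(text for _, text in merged)
```

Merging lazily by timestamp keeps each agent's contributions in conversational order without re-sorting the whole session.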

Viseme data generation for presentation while content is output

Systems and methods for viseme data generation are disclosed. Uncompressed audio data is generated and/or utilized to determine the beats per minute of the audio data. Visemes are associated with the audio data utilizing a Viterbi algorithm and the beats per minute. A time-stamped list of viseme data is generated that associates the visemes with the portions of the audio data that they correspond to. An animatronic toy and/or an animation is caused to lip sync using the viseme data while audio corresponding to the audio data is output.
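The following sketch shows a textbook Viterbi decoder over viseme states and a conversion of the decoded path into a time-stamped viseme list. How the patent uses beats per minute is not stated; here it is assumed, purely for illustration, that the analysis frame grid is a fixed subdivision of the beat (`frames_per_beat`, a hypothetical parameter). Emission and transition scores are taken as given log-probabilities.

```python
def viterbi(log_emissions, log_transitions, log_initial):
    """Most likely state sequence.

    log_emissions[t][s]: log P(frame t | viseme s)
    log_transitions[a][b]: log P(next viseme b | current viseme a)
    log_initial[s]: log P(first viseme is s)
    """
    n_states = len(log_initial)
    score = [log_initial[s] + log_emissions[0][s] for s in range(n_states)]
    back = []
    for frame in log_emissions[1:]:
        ptr, new = [], []
        for s in range(n_states):
            best_prev = max(range(n_states), key=lambda a: score[a] + log_transitions[a][s])
            ptr.append(best_prev)
            new.append(score[best_prev] + log_transitions[best_prev][s] + frame[s])
        back.append(ptr)
        score = new
    # Trace back from the best final state.
    state = max(range(n_states), key=lambda s: score[s])
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    return path[::-1]


def timestamped_visemes(path, names, bpm, frames_per_beat=4):
    """Collapse a per-frame path into (start_seconds, viseme) entries.

    Assumption: frame length = 60 / bpm / frames_per_beat seconds.
    """
    frame_len = 60.0 / bpm / frames_per_beat
    out = []
    for t, s in enumerate(path):
        if not out or out[-1][1] != names[s]:
            out.append((t * frame_len, names[s]))
    return out
```

The resulting list pairs each viseme with the start time of the audio portion it covers, which is the form a lip-sync animation would consume.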

Systems and methods for morpheme reflective engagement response for revision and transmission of a recording to a target individual
11699037 · 2023-07-11

Systems and methods for increasing the impact of a message for a target individual are provided. An audio recording of the message and audio recordings of the target individual are each associated with transcribed text, which is separated into morphemes. Morphemes in the message are substituted with, or supplemented by, matching morphemes in the audio recordings of the target individual to create a revised version of the audio recording of the message, and the revised audio recording is then electronically transmitted to an electronic device associated with the target individual.
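The substitution step can be sketched as building an edit list for the revised recording. This assumes morpheme segmentation of both transcripts is already done, that matching is exact string equality, and that each matched morpheme maps to a clip identifier in the target individual's recordings; only substitution (not supplementation) is shown. `revise_message` is a hypothetical name.

```python
def revise_message(message_morphemes, target_clips):
    """Build the revised recording as an ordered list of (source, item) segments.

    message_morphemes: morphemes of the transcribed message, in order.
    target_clips: mapping morpheme -> clip id in the target individual's recordings.
    Matched morphemes use the target's clip; the rest keep the original message audio.
    """
    revised = []
    for m in message_morphemes:
        if m in target_clips:
            revised.append(("target", target_clips[m]))
        else:
            revised.append(("message", m))
    return revised
```

An audio back end would then concatenate the referenced segments to render the revised recording before transmission.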

Method for converting vibration to voice frequency wirelessly

The present application discloses a method for converting vibration to voice frequency wirelessly. By sensing first vibration variation data and voice frequency variation data of a vocal vibration part during a first sensing period, voice frequency reference data is obtained from the voice frequency variation data and the first vibration result. A second vibration result is obtained during a second sensing period and converted into a voice frequency output signal, which is output as a voice signal corresponding to the voice frequency variation result. The present application thus provides a voice signal close to a human voice.
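The two-period scheme (calibrate, then convert) can be illustrated with a deliberately simple model. The sketch assumes the "reference data" is a linear gain and offset fitted by least squares between vibration and voice samples from the first sensing period, then applied to vibration samples from the second period. The real mapping is likely far richer; `fit_reference` and `convert` are hypothetical names.

```python
def fit_reference(vibration, voice):
    """Least-squares fit of voice ~ gain * vibration + offset (first sensing period)."""
    n = len(vibration)
    mean_v = sum(vibration) / n
    mean_a = sum(voice) / n
    cov = sum((v - mean_v) * (a - mean_a) for v, a in zip(vibration, voice))
    var = sum((v - mean_v) ** 2 for v in vibration)
    gain = cov / var  # assumes the vibration samples are not all identical
    return gain, mean_a - gain * mean_v


def convert(vibration, reference):
    """Map second-period vibration samples to voice output using the reference."""
    gain, offset = reference
    return [gain * v + offset for v in vibration]
```

Separating calibration from conversion means the second period needs only the vibration sensor, which matches the abstract's use of a first period to derive reference data.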

MEDIA DISTRIBUTION DEVICE, MEDIA DISTRIBUTION METHOD, AND PROGRAM
20230215102 · 2023-07-06

The present disclosure relates to a media distribution device, a media distribution method, and a program that enable a guide voice to be generated more appropriately. A media distribution device includes a guide voice generation unit that generates a guide voice describing a rendered image viewed from a viewpoint in a virtual space, by using a scene description as information describing a scene in the virtual space and user viewpoint information indicating the position and direction of the user's viewpoint; and an audio encoding unit that mixes the guide voice with original audio and encodes the guide voice. The present technology can be applied to, for example, a media distribution system that distributes 6DoF media.
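To show how a scene description and user viewpoint information could together drive a guide voice, the sketch below assumes a heavily simplified scene description: a mapping of object names to 2D positions, with the viewpoint given as a position plus a yaw angle and an assumed 90-degree horizontal field of view. It produces the guide text only; speech synthesis, mixing with the original audio, and encoding are not shown. The data layout and function names are hypothetical.

```python
import math


def visible_objects(scene, position, direction_deg, fov_deg=90.0):
    """Names of scene objects inside the horizontal field of view, nearest first."""
    px, py = position
    visible = []
    for name, (ox, oy) in scene.items():
        dx, dy = ox - px, oy - py
        bearing = math.degrees(math.atan2(dy, dx))
        # Smallest signed angle between the viewing direction and the object's bearing.
        diff = (bearing - direction_deg + 180.0) % 360.0 - 180.0
        if abs(diff) <= fov_deg / 2:
            visible.append((math.hypot(dx, dy), name))
    return [name for _, name in sorted(visible)]


def guide_text(scene, position, direction_deg):
    """Text that a speech synthesizer would turn into the guide voice."""
    names = visible_objects(scene, position, direction_deg)
    if not names:
        return "Nothing is in view."
    return "In front of you: " + ", ".join(names) + "."
```

Because the text depends on the viewpoint, turning from yaw 0 to yaw 180 in a scene with a tree ahead and a house behind changes the description from the tree to the house, which is the kind of viewpoint-dependent guidance the abstract describes.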