Patent classifications
G10L13/02
Outside ordering system
An ordering system can be positioned partially or completely outside in a retail environment, with an ordering device located outside a building on a site. The ordering device receives a first audio stream concurrently with a second audio stream from an employee, and the first audio stream is captured with a first port of an on-site computing device while the second audio stream is captured with a second port of the same device. A customer strategy can be executed by an intelligence module of the on-site computing device, which is connected to the ordering device; the customer strategy directs automated interactions with a first on-site customer to compile a retail order. The employee may communicate directly with the intelligence module via the second port without interrupting the first audio stream.
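The two-port arrangement can be sketched as follows. This is a minimal illustration, not the patented design: the class names, the `step` method, and the toy "add <item>" customer strategy are all hypothetical, and already-transcribed text stands in for raw audio so the example stays self-contained. The point it shows is that employee input on the second port is routed to the intelligence module separately, so the customer dialog on the first port is never interrupted.

```python
from dataclasses import dataclass, field


@dataclass
class Port:
    """Hypothetical capture port on the on-site computing device; buffers one stream."""
    name: str
    buffer: list = field(default_factory=list)

    def capture(self, chunk):
        self.buffer.append(chunk)


@dataclass
class IntelligenceModule:
    """Hypothetical stand-in for the intelligence module running a customer strategy."""
    order: list = field(default_factory=list)
    employee_notes: list = field(default_factory=list)

    def handle_customer(self, utterance):
        # Toy customer strategy: "add <item>" appends a line to the retail order.
        if utterance.startswith("add "):
            item = utterance[len("add "):]
            self.order.append(item)
            return f"Added {item}. Anything else?"
        return "Sorry, could you repeat that?"

    def handle_employee(self, utterance):
        # Employee input is handled on its own path, so the customer dialog is untouched.
        self.employee_notes.append(utterance)


class OnSiteComputer:
    def __init__(self):
        self.customer_port = Port("port-1")  # first audio stream
        self.employee_port = Port("port-2")  # second audio stream
        self.module = IntelligenceModule()

    def step(self, customer_text=None, employee_text=None):
        """Process one time slice in which both streams may carry input concurrently."""
        replies = []
        if customer_text is not None:
            self.customer_port.capture(customer_text)
            replies.append(self.module.handle_customer(customer_text))
        if employee_text is not None:
            self.employee_port.capture(employee_text)
            self.module.handle_employee(employee_text)
        return replies
```

For example, `OnSiteComputer().step("add fries", "fryer 2 is down")` returns only the customer-facing reply, while the employee note is recorded on the side.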
Automatic synthesis of translated speech using speaker-specific phonemes
An embodiment includes converting an original audio signal to an original text string, the original audio signal being from a recording of the original text string spoken by a specific person in a source language. The embodiment generates a translated text string by translating the original text string from the source language to a target language, including translation of a word from the source language to the target language. The embodiment assembles a standard phoneme sequence from a set of standard phonemes, where the standard phoneme sequence includes a standard pronunciation of the translated word. The embodiment also associates a custom phoneme with a standard phoneme of the standard phoneme sequence, where the custom phoneme includes the specific person's pronunciation of a sound in the translated word. The embodiment synthesizes the translated text string into a translated audio signal that includes the translated word pronounced using the custom phoneme.
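The phoneme-association step can be illustrated with a small sketch. Everything here is assumed for illustration: the toy `TRANSLATIONS` table replaces a real machine-translation step, `STANDARD_LEXICON` is a hypothetical target-language pronunciation dictionary, and per-phoneme sample lists are concatenated in place of a real synthesizer. What it shows is the core idea: build the standard phoneme sequence for the translated word, then substitute the speaker's custom phoneme wherever one is available.

```python
from dataclasses import dataclass

# Toy English-to-Spanish dictionary standing in for machine translation (hypothetical).
TRANSLATIONS = {"hello": "hola", "friend": "amigo"}

# Hypothetical standard phoneme lexicon for the target language.
STANDARD_LEXICON = {
    "hola": ["o", "l", "a"],
    "amigo": ["a", "m", "i", "g", "o"],
}


@dataclass(frozen=True)
class Unit:
    phoneme: str
    source: str  # "custom" = the specific speaker's recording, "standard" = generic voice


def translate(text):
    return [TRANSLATIONS.get(w, w) for w in text.lower().split()]


def phoneme_plan(words, custom_phonemes):
    """Assemble the standard phoneme sequence, associating a custom phoneme wherever one exists."""
    return [
        Unit(ph, "custom" if ph in custom_phonemes else "standard")
        for word in words
        for ph in STANDARD_LEXICON[word]
    ]


def synthesize(plan, custom_phonemes, standard_phonemes):
    """Concatenate per-phoneme sample lists (a crude stand-in for a real synthesizer)."""
    audio = []
    for unit in plan:
        bank = custom_phonemes if unit.source == "custom" else standard_phonemes
        audio.extend(bank[unit.phoneme])
    return audio
```

If the speaker's recordings supply only the "o" and "a" sounds, the plan for "hola" uses custom units for those two phonemes and falls back to the standard "l".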
SPEECH SYNTHESIZER, AUDIO WATERMARKING INFORMATION DETECTION APPARATUS, SPEECH SYNTHESIZING METHOD, AUDIO WATERMARKING INFORMATION DETECTION METHOD, AND COMPUTER PROGRAM PRODUCT
According to an embodiment, a speech synthesizer includes a source generator, a phase modulator, and a vocal tract filter unit. The source generator generates a source signal from a fundamental frequency sequence and a pulse signal. The phase modulator modulates the phase of the pulse signal in that source signal at each pitch mark, based on audio watermarking information. The vocal tract filter unit generates a speech signal by applying a spectrum parameter sequence to the phase-modulated source signal.
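A rough sketch of the source generator, phase modulator, and vocal tract filter chain follows. Several assumptions are made that the abstract does not state: watermark bits are embedded by binary phase keying (phase 0 for bit 0, phase π for bit 1, which for a unit impulse is simply a sign flip), pitch marks are placed by integrating the F0 contour, and a single fixed all-pole filter stands in for the per-frame spectrum parameter sequence. The `detect_bits` helper is a toy that knows the filter coefficients and pitch marks, which a real detection apparatus would not.

```python
def pitch_marks(f0, frame_shift, fs):
    """Place pitch marks (sample indices) by accumulating the F0 contour sample by sample."""
    spf = int(round(frame_shift * fs))  # samples per frame
    marks, phase = [], 0.0
    for i, hz in enumerate(f0):
        if hz <= 0:  # unvoiced frame: no pulses
            continue
        for n in range(spf):
            phase += hz / fs
            if phase >= 1.0:  # one pitch period has elapsed
                phase -= 1.0
                marks.append(i * spf + n)
    return marks


def modulated_source(f0, bits, frame_shift=0.005, fs=16000):
    """Pulse-train source whose pulse phase at each pitch mark carries one watermark bit.

    Assumption: binary phase keying; a phase shift of pi on an impulse flips its sign.
    """
    marks = pitch_marks(f0, frame_shift, fs)
    signal = [0.0] * (len(f0) * int(round(frame_shift * fs)))
    for k, m in enumerate(marks):
        signal[m] = -1.0 if bits[k % len(bits)] else 1.0  # bits repeat cyclically
    return signal, marks


def vocal_tract_filter(x, a):
    """All-pole filter y[n] = x[n] - sum_k a[k] * y[n-1-k]; a stands in for spectrum parameters."""
    y = []
    for n, xn in enumerate(x):
        acc = xn
        for k, ak in enumerate(a):
            if n - 1 - k >= 0:
                acc -= ak * y[n - 1 - k]
        y.append(acc)
    return y


def detect_bits(speech, marks, a):
    """Toy detector: inverse-filter with known coefficients, then read the pulse sign at each mark."""
    source = [
        speech[n] + sum(ak * speech[n - 1 - k] for k, ak in enumerate(a) if n - 1 - k >= 0)
        for n in range(len(speech))
    ]
    return [1 if source[m] < 0 else 0 for m in marks]
```

Because the inverse of an all-pole filter is an FIR filter with the same coefficients, the toy detector recovers the source pulses exactly (up to rounding) and reads the embedded bits from their signs.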
VOICE COMMAND-DRIVEN DATABASE
A voice command-driven system and computer-implemented method are disclosed for selecting a data item from a list of text-based data items stored in a database, using a simple affirmative voice command input and without requiring a network connection. An embedded text-to-speech engine converts the text-based data items in the list to speech, and an audio output of a first converted data item is provided. The system then enters a listening state for a predefined pause time to await the simple affirmative voice command input. If that input is received during the predefined pause time, the first converted data item is selected for processing; if it is not, an audio output of the next converted data item in the list is provided.
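The read-pause-confirm loop can be sketched in a few lines. This assumes the embedded TTS engine and the offline recognizer are exposed as two injected callables, `speak(text)` and `listen(timeout_s)`, both hypothetical; `listen` is assumed to return the recognized word or `None` when the pause elapses. The set of accepted affirmatives and the behavior when the list runs out are also assumptions.

```python
AFFIRMATIVES = {"yes", "yeah", "ok", "okay"}  # hypothetical set of simple affirmative commands


def select_by_voice(items, speak, listen, pause_s=2.0):
    """Read items aloud one at a time; return the first one confirmed during its pause window.

    speak(text): plays text through an embedded (offline) TTS engine.
    listen(timeout_s): returns a locally recognized word, or None if the pause elapses silently.
    """
    for item in items:
        speak(item)
        heard = listen(pause_s)  # listening state for the predefined pause time
        if heard is not None and heard.strip().lower() in AFFIRMATIVES:
            return item  # selected for processing
    return None  # list exhausted without confirmation (assumed behavior)
```

Injecting `speak` and `listen` keeps the selection logic independent of any particular audio stack, which also makes it easy to drive with scripted responses, as in a test that answers silence for the first item and "Yes" for the second.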
Text-to-speech from media content item snippets
A text-to-speech engine creates audio output that includes synthesized speech and one or more media content item snippets. Input text is obtained and partitioned into text sets. A track whose lyrics match part of one of the text sets is identified. The portion of the track's audio containing that lyric is located using forced alignment data and extracted. The extracted audio is combined with synthesized speech for the remainder of the input text to form the audio output.
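The matching and extraction steps can be sketched as a planner that decides which spans are synthesized and which are cut from a track. Assumptions: text sets are sentences, forced alignment data is available as per-word start/end times (`AlignedWord`, a hypothetical type), the longest contiguous lyric match of at least two words wins, and only the first match per text set is used. The output is a playback plan rather than audio.

```python
import re
from dataclasses import dataclass


@dataclass
class AlignedWord:
    word: str
    start: float  # seconds into the track, from forced alignment
    end: float


def partition(text):
    """Hypothetical partitioning: split input text on sentence-ending punctuation."""
    return [s.strip() for s in re.split(r"[.!?]", text) if s.strip()]


def normalize(text):
    return re.findall(r"[a-z']+", text.lower())


def find_snippet(words, tracks, min_len=2):
    """Return (i, j, track_id, start, end) for the longest lyric match of words[i:j], or None."""
    best = None
    for track_id, aligned in tracks.items():
        lyric = [w.word for w in aligned]
        for i in range(len(words)):
            for j in range(i + min_len, len(words) + 1):
                span = words[i:j]
                for k in range(len(lyric) - len(span) + 1):
                    if lyric[k:k + len(span)] == span:
                        if best is None or (j - i) > (best[1] - best[0]):
                            best = (i, j, track_id, aligned[k].start, aligned[k + len(span) - 1].end)
                        break
    return best


def build_plan(text, tracks):
    """Mix ("tts", text) segments with ("snippet", track_id, start_s, end_s) segments."""
    plan = []
    for text_set in partition(text):
        words = normalize(text_set)
        hit = find_snippet(words, tracks)
        if hit is None:
            plan.append(("tts", " ".join(words)))
            continue
        i, j, track_id, start, end = hit
        if i > 0:
            plan.append(("tts", " ".join(words[:i])))
        plan.append(("snippet", track_id, start, end))
        if j < len(words):
            plan.append(("tts", " ".join(words[j:])))
    return plan
```

The forced alignment data is what turns a text match into a time range: the snippet starts at the first matched word's start time and ends at the last matched word's end time.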
COMMUNICATION SYSTEM
A communication system includes a group calling controller and an individual calling controller. The group calling controller performs first processing, which broadcasts utterance voice data received from one mobile communication terminal to the other mobile communication terminals, and second processing, which chronologically accumulates the results of voice recognition on the received utterance voice data as a communication history and lets users view that history in synchronization. The individual calling controller transmits utterance voice data only to a specified user during group calling. A communication controller that includes both controllers identifies any user who was in the individual calling mode, in which utterance voice data is transmitted only to the specified user, while a broadcast of the first processing took place. After the individual calling mode ends, it notifies each identified user that the broadcast was performed during the individual calling mode.
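The missed-broadcast bookkeeping can be sketched with a single controller object. This is an illustrative model only: voice transport and recognition are omitted (the caller passes already-recognized text), and the class, field, and method names are hypothetical. Whether audio is actually delivered to users in an individual call is not modeled; the sketch only records that a broadcast happened while they were in that mode.

```python
from dataclasses import dataclass, field


@dataclass
class CommunicationController:
    members: set
    history: list = field(default_factory=list)      # chronological (sender, recognized text)
    in_individual: set = field(default_factory=set)  # users currently in an individual call
    missed: dict = field(default_factory=dict)       # user -> broadcasts during individual call
    notifications: list = field(default_factory=list)

    def broadcast(self, sender, recognized_text):
        """First processing (broadcast) plus second processing (append to shared history)."""
        receivers = self.members - {sender}
        self.history.append((sender, recognized_text))
        # Identify receivers who were in the individual calling mode during this broadcast.
        for user in receivers & self.in_individual:
            self.missed[user] = self.missed.get(user, 0) + 1
        return receivers

    def start_individual(self, caller, callee):
        self.in_individual |= {caller, callee}

    def end_individual(self, caller, callee):
        """After the individual call ends, notify each identified user."""
        for user in (caller, callee):
            self.in_individual.discard(user)
            count = self.missed.pop(user, 0)
            if count:
                self.notifications.append(
                    (user, f"{count} group broadcast(s) during your individual call"))
```

Deferring the notification until `end_individual` mirrors the abstract's ordering: identification happens during the broadcast, notification only after the individual calling mode ends.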
Link-based audio recording, collection, collaboration, embedding and delivery system
A machine has a processor and a memory connected to the processor. The memory stores instructions executed by the processor to supply a name page in response to a request from an administrator machine. Name page updates, which include participants and associated network contact information for those participants, are received from the administrator machine. A code is used to form a link to the name page. A client machine that activates the link is prompted for textual name information and audio name information, and the information it returns is stored in association with the name page. Navigation tools are supplied to facilitate access to the stored textual and audio name information.
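The name-page flow can be sketched as an in-memory service. Assumptions: the link code is a random token from Python's `secrets.token_urlsafe`, `BASE_URL` is a placeholder host, storage is a dictionary rather than a database, and a sorted directory stands in for the navigation tools. Submissions are accepted only for participants the administrator has added.

```python
import secrets
from dataclasses import dataclass, field

BASE_URL = "https://example.com/names/"  # hypothetical host for name pages


@dataclass
class NamePage:
    title: str
    participants: dict = field(default_factory=dict)  # participant id -> network contact info
    entries: dict = field(default_factory=dict)       # participant id -> (textual name, audio bytes)


class NamePageService:
    PROMPTS = ("Type your name as it should be written.",
               "Record how your name is pronounced.")

    def __init__(self):
        self.pages = {}

    def create_page(self, title):
        """Administrator request: create a page and return the code that forms its link."""
        code = secrets.token_urlsafe(8)
        self.pages[code] = NamePage(title)
        return code

    def link(self, code):
        return BASE_URL + code

    def update_page(self, code, participants):
        """Administrator update: add participants and their contact information."""
        self.pages[code].participants.update(participants)

    def submit(self, code, participant, text_name, audio_name):
        """Client machine's response to PROMPTS after activating the link."""
        page = self.pages[code]
        if participant not in page.participants:
            raise KeyError(f"unknown participant: {participant}")
        page.entries[participant] = (text_name, audio_name)

    def directory(self, code):
        """Navigation aid: (textual name, participant id) pairs sorted by name."""
        return sorted((text, pid) for pid, (text, _audio) in self.pages[code].entries.items())
```

Using an unguessable token as the link code means anyone with the link can reach the page, while the participant check keeps submissions limited to the administrator's list.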