G10L13/00

Cooking management system with wireless voice engine server
11710485 · 2023-07-25

The disclosed technology provides computer-to-wireless-voice integration methods and systems. In some implementations, the methods and systems deliver real-time voice instructions that alert users to required time-sensitive actions, and they ensure that these directives are received and effectively acted on by the recipient. The systems and methods include receiving a notification of an event from a terminal in a wireless active voice engine (WAVE) system, determining, with a WAVE module, an active voice directive corresponding to the event, converting the active voice directive into a voice event via a directive converter, and notifying, with a communications module, a targeted recipient of the active voice directive. In some implementations, the systems and methods include sending, via a receiver, a confirmation event to the communications module indicating that the targeted recipient received the active voice directive, and communicating that the active voice directive has been completed.
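
The event-to-directive flow described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the directive table, the kitchen event names, and the classes `WaveServer`, `VoiceEvent`, and `Status` are all hypothetical, and the "voice event" is only a text payload that a real system would pass to a speech synthesizer and radio link.

```python
from dataclasses import dataclass, field
from enum import Enum


class Status(Enum):
    PENDING = "pending"      # voice event sent, no confirmation yet
    RECEIVED = "received"    # recipient confirmed hearing the directive
    COMPLETED = "completed"  # recipient confirmed the action is done


@dataclass
class VoiceEvent:
    recipient: str
    text: str  # a real system would hand this to a speech synthesizer
    status: Status = Status.PENDING


# Hypothetical table mapping terminal event types to (recipient, directive).
DIRECTIVES = {
    "fryer_timer_done": ("fry_station", "Pull the basket from fryer two now."),
    "ticket_ready": ("expo", "Table twelve is ready for pickup."),
}


@dataclass
class WaveServer:
    outbox: list = field(default_factory=list)  # voice events "sent" so far

    def on_terminal_event(self, event_type: str) -> VoiceEvent:
        # WAVE module: determine the active voice directive for the event.
        recipient, directive = DIRECTIVES[event_type]
        # Directive converter: package the directive as a voice event.
        voice_event = VoiceEvent(recipient=recipient, text=directive)
        # Communications module: record the notification to the recipient.
        self.outbox.append(voice_event)
        return voice_event

    def on_confirmation(self, voice_event: VoiceEvent, completed: bool) -> None:
        # Confirmation from the receiver: heard, and possibly acted on.
        voice_event.status = Status.COMPLETED if completed else Status.RECEIVED
```

Tracking status per voice event is one way to make "ensure the directive was received and acted on" checkable: any event still `PENDING` after a timeout could be re-announced.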

Text-to-speech from media content item snippets

A text-to-speech engine creates audio output that includes synthesized speech and one or more media content item snippets. The input text is obtained and partitioned into text sets. A track whose lyrics match part of one of the text sets is identified. The portion of the track's audio that contains the matching lyric is located using forced alignment data and extracted. The extracted audio is combined with synthesized speech for the remainder of the input text to form the audio output.
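
The matching step can be sketched as building a render plan rather than audio. This is a hypothetical illustration under stated assumptions: text sets are sentences, a "match" is the longest exact run of at least three normalized words found in a track's lyrics, and forced alignment data is given as per-word start and end times. `AlignedWord`, `find_snippet`, and `plan_audio` are invented names.

```python
import re
from dataclasses import dataclass


@dataclass
class AlignedWord:
    word: str
    start: float  # seconds into the track, from forced alignment
    end: float


def normalize(text):
    return re.findall(r"[a-z']+", text.lower())


def find_snippet(words, tracks, min_len=3):
    """Return (i, j, track_id, start, end) for the longest run words[i:j]
    that appears verbatim in some track's aligned lyrics, or None."""
    best = None
    for track_id, aligned in tracks.items():
        lyric = [w.word for w in aligned]
        for i in range(len(words)):
            for k in range(len(lyric)):
                n = 0
                while (i + n < len(words) and k + n < len(lyric)
                       and words[i + n] == lyric[k + n]):
                    n += 1
                if n >= min_len and (best is None or n > best[1] - best[0]):
                    # Alignment gives the audio span of the matched lyric.
                    best = (i, i + n, track_id,
                            aligned[k].start, aligned[k + n - 1].end)
    return best


def plan_audio(text, tracks):
    """Partition text into sentence-level text sets and build a render plan
    of ("tts", text) and ("snippet", track_id, start, end) segments."""
    plan = []
    for text_set in re.split(r"(?<=[.!?])\s+", text.strip()):
        words = normalize(text_set)
        hit = find_snippet(words, tracks)
        if hit is None:
            plan.append(("tts", " ".join(words)))
            continue
        i, j, track_id, start, end = hit
        if i > 0:
            plan.append(("tts", " ".join(words[:i])))
        plan.append(("snippet", track_id, start, end))
        if j < len(words):
            plan.append(("tts", " ".join(words[j:])))
    return plan
```

A renderer would then synthesize each `"tts"` segment, cut each `"snippet"` span from the track audio, and concatenate the pieces in order.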

Text-to-speech from media content item snippets

A text-to-speech engine creates audio output that includes synthesized speech and one or more media content item snippets. The input text is obtained and partitioned into text sets. A track having lyrics that match a part of one of the text sets is identified. The location of the track's audio that contains the lyric is extracted based on forced alignment data. The extracted audio is combined with synthesized speech corresponding to the remainder of the input text to form audio output.

DYNAMIC TEMPERED SAMPLING IN GENERATIVE MODELS INFERENCE
20230237986 · 2023-07-27

A method of sampling output audio samples includes obtaining, during a packet loss concealment event, a sequence of previous output audio samples. At each time step during the event, the method includes generating a probability distribution over possible output audio samples for the time step, in which each possible sample has a respective probability indicating the likelihood that it represents a portion of an utterance at that time step. The method also includes determining a temperature sampling value as a function of the number of time steps preceding the current time step and of initial, minimum, and maximum temperature sampling values. The method further includes applying the temperature sampling value to the probability distribution to adjust the probability of selecting each possible sample, randomly selecting one of the possible samples based on the adjusted probabilities, and generating synthesized speech using the randomly selected sample.
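
A minimal sketch of the per-step sampling loop follows. The abstract does not state the temperature function, so this assumes a hypothetical linear ramp from the initial value by `rate` per preceding step, clamped to the minimum and maximum. Applying the temperature uses standard temperature scaling, p_i^(1/T) renormalized (equivalently a softmax of log p / T). `predict` is a placeholder for the generative model, and all names and default values are illustrative.

```python
import math
import random


def temperature(step, t_init, t_min, t_max, rate=0.05):
    """Hypothetical schedule: t_init plus `rate` per preceding step in the
    concealment event, clamped to [t_min, t_max]."""
    return min(t_max, max(t_min, t_init + rate * step))


def apply_temperature(probs, t):
    """Rescale a distribution as p_i ** (1 / t), renormalized."""
    logits = [math.log(p) / t if p > 0 else float("-inf") for p in probs]
    m = max(logits)  # subtract the max for numerical stability
    weights = [math.exp(x - m) for x in logits]
    total = sum(weights)
    return [w / total for w in weights]


def conceal(predict, history, num_steps, t_init=0.8, t_min=0.5, t_max=1.2, seed=0):
    """Generate num_steps samples during a packet loss concealment event.

    predict(samples) -> probabilities over possible sample values; it stands
    in for the generative model conditioned on previous output samples."""
    rng = random.Random(seed)
    out = list(history)
    for step in range(num_steps):  # `step` = number of preceding time steps
        probs = predict(out)
        t = temperature(step, t_init, t_min, t_max)
        adjusted = apply_temperature(probs, t)
        # Randomly select one possible sample according to the adjusted distribution.
        sample = rng.choices(range(len(adjusted)), weights=adjusted, k=1)[0]
        out.append(sample)
    return out[len(history):]
```

A temperature below 1 sharpens the distribution toward the most likely sample, and a temperature above 1 flattens it; the clamp keeps the schedule inside the configured range however long the event lasts.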

DYNAMIC TEMPERED SAMPLING IN GENERATIVE MODELS INFERENCE
20230237986 · 2023-07-27 · ·

A method of sampling output audio samples includes, during a packet loss concealment event, obtaining a sequence of previous output audio samples. At each time step during the event, the method includes generating a probability distribution over possible output audio samples for the time step. Each sample includes a respective probability indicating a likelihood that the corresponding sample represents a portion of an utterance at the time step. The method also includes determining a temperature sampling value based on a function of a number of time steps that precedes the time step, and an initial, a minimum, and a maximum temperature sampling value. The method also includes applying the temperature sampling value to the probability distribution to adjust a probability of selecting possible samples and randomly selecting one of the possible samples based on the adjusted probability. The method also includes generating synthesized speech using the randomly selected sample.

COMMUNICATION SYSTEM

A communication system includes a group calling controller and an individual calling controller. The group calling controller performs first processing, which broadcasts utterance voice data received from one mobile communication terminal to the other mobile communication terminals, and second processing, which chronologically accumulates the results of voice recognition performed on the received utterance voice data as a communication history and lets users view that history in synchronization. The individual calling controller transmits utterance voice data only to a specified user during group calling. A communication controller that includes both controllers identifies any user who was in the individual calling mode, in which utterance voice data is transmitted only to the specified user, while a broadcast of the first processing took place, and, after the individual calling mode ends, notifies that user that the broadcast was performed during the individual calling mode.
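
The bookkeeping for missed broadcasts can be sketched as below. This is a hypothetical model under simple assumptions: audio delivery is not modeled, the shared history stores only recognized text, and the notification is a plain string. `GroupCall` and its methods are invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class GroupCall:
    members: set
    history: list = field(default_factory=list)     # (speaker, recognized text)
    individual: dict = field(default_factory=dict)  # user -> individual-call partner
    missed: dict = field(default_factory=dict)      # user -> broadcasts missed
    inbox: dict = field(default_factory=dict)       # user -> list of notifications

    def broadcast(self, speaker: str, recognized_text: str) -> None:
        # Second processing: append the recognition result to the shared,
        # chronologically ordered history (audio delivery is not modeled).
        self.history.append((speaker, recognized_text))
        # Identify listeners who are in an individual call during this broadcast.
        for user in self.members - {speaker}:
            if user in self.individual:
                self.missed[user] = self.missed.get(user, 0) + 1

    def start_individual(self, caller: str, callee: str) -> None:
        self.individual[caller] = callee
        self.individual[callee] = caller

    def end_individual(self, user: str) -> None:
        partner = self.individual.pop(user)
        self.individual.pop(partner, None)
        # After the individual call ends, notify each participant who missed
        # one or more broadcasts while it was active.
        for u in (user, partner):
            count = self.missed.pop(u, 0)
            if count:
                self.inbox.setdefault(u, []).append(
                    f"{count} group broadcast(s) occurred during your individual call"
                )
```

Counting missed broadcasts per user, rather than storing them, is enough for the notification; the user can then catch up from the synchronized history.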

VOICE COMMUNICATION BETWEEN A SPEAKER AND A RECIPIENT OVER A COMMUNICATION NETWORK

Voice communication between a speaker and a recipient, either or both of whom may be in a motor vehicle, is provided via a communication network. First, an input speech utterance is received from the speaker. Optionally, the bandwidth of the connection to the communication network is evaluated on the speaker's side. The input speech utterance is then converted to text, and at least the text is transmitted over the communication network; if the bandwidth is sufficiently large, the input speech utterance may be transmitted both as voice and as text. The transmitted text is converted into an output speech utterance that simulates the speaker's voice. Finally, the output speech utterance is provided to the recipient.
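
The bandwidth-dependent choice of what to send can be sketched as follows. The 64 kbps threshold, `Payload`, and the injected `transcribe` and `synthesize_as_speaker` callables are hypothetical; the abstract only says "sufficiently large" and does not name any speech recognition or voice-cloning engine.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical cutoff; the source only says "sufficiently large" bandwidth.
VOICE_THRESHOLD_KBPS = 64.0


@dataclass
class Payload:
    text: str
    voice: Optional[bytes] = None  # original utterance, included only if bandwidth allows


def build_payload(
    audio: bytes,
    transcribe: Callable[[bytes], str],
    bandwidth_kbps: Optional[float] = None,
) -> Payload:
    """Speaker side: always send text, and add the voice when bandwidth permits.

    bandwidth_kbps is None when the optional bandwidth evaluation is skipped.
    """
    text = transcribe(audio)
    if bandwidth_kbps is not None and bandwidth_kbps >= VOICE_THRESHOLD_KBPS:
        return Payload(text=text, voice=audio)
    return Payload(text=text)


def render(payload: Payload, synthesize_as_speaker: Callable[[str], bytes]) -> bytes:
    """Recipient side: convert the received text into speech that simulates the
    speaker's voice. The abstract does not say how a received voice copy is
    used, so this sketch ignores payload.voice."""
    return synthesize_as_speaker(payload.text)
```

Because text is always sent, the recipient can still get intelligible speech over a connection too weak to carry the original audio.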

Personal audio assistant device and method
11521632 · 2022-12-06

A system includes a first microphone that captures audio, a communication module communicatively coupled to the first microphone, a logic circuit communicatively coupled to the first microphone and communication module, a speaker operatively coupled to the logic circuit, and an interaction element. The interaction element and logic circuit are configured to initiate control of audio content for output from the speaker in response to at least one voice command detected in captured audio. Other embodiments are disclosed.

Personal audio assistant device and method
11521632 · 2022-12-06 · ·

A system includes a first microphone that captures audio, a communication module communicatively coupled to the first microphone, a logic circuit communicatively coupled to the first microphone and communication module, a speaker operatively coupled to the logic circuit, and an interaction element. The interaction element and logic circuit are configured to initiate control of audio content for output from the speaker in response to at least one voice command detected in captured audio. Other embodiments are disclosed.

Signal processing apparatus, communication system, method performed by signal processing apparatus, storage medium for signal processing apparatus, method performed by communication terminal, and storage medium for communication terminal to receive text data from another communication terminal in response to a unique texting completion notice

According to one embodiment, a signal processing apparatus groups a plurality of communication terminals and enables one-to-many communication within the group. The signal processing apparatus includes processing circuitry. The processing circuitry assigns a transmission right to one of the communication terminals in the group and generates text data from voice data sent by the communication terminal that holds the transmission right. The processing circuitry gives the communication terminals in the group a texting completion notice indicating that texting processing is complete. After the texting completion notice is given, the processing circuitry transmits the generated text data to at least one of the communication terminals in the group.
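
The ordering constraint (notice first, then text) can be sketched as a small push-to-talk group model. `TalkGroup`, its message tuples, sending the text to every other terminal, and releasing the right after each utterance are all assumptions for illustration; `transcribe` stands in for whatever speech-to-text step the apparatus uses.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple


@dataclass
class TalkGroup:
    terminals: List[str]
    holder: Optional[str] = None  # terminal currently holding the transmission right
    outbox: List[Tuple[str, tuple]] = field(default_factory=list)  # (destination, message)

    def request_right(self, terminal: str) -> bool:
        # Assign the transmission right only if no terminal currently holds it.
        if self.holder is None:
            self.holder = terminal
            return True
        return False

    def on_voice(self, terminal: str, audio: bytes,
                 transcribe: Callable[[bytes], str]) -> None:
        if terminal != self.holder:
            raise PermissionError(f"{terminal} does not hold the transmission right")
        text = transcribe(audio)  # texting processing
        # Give every terminal in the group the texting completion notice first.
        for t in self.terminals:
            self.outbox.append((t, ("texting_complete", terminal)))
        # Only after the notice is given, transmit the text data (here to every
        # other terminal; the source requires at least one).
        for t in self.terminals:
            if t != terminal:
                self.outbox.append((t, ("text", text)))
        self.holder = None  # assumption: the right is released after each utterance
```

Recording messages in an ordered outbox makes the "notice before text" guarantee easy to check in a test.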