G10L21/057

INFORMATION PROCESSING DEVICE AND INFORMATION PROCESSING METHOD
20240282017 · 2024-08-22

An information processing device determines a priority sound data item from among a plurality of sound data items whose sound generation timings overlap each other and which are directed at an avatar of a first user in a virtual space. The information processing device performs control to notify the first user of the contents of the priority sound data item by reproducing the priority sound data item at a first timing. The information processing device also performs control to notify the first user of the contents of a non-priority sound data item, that is, a sound data item among the plurality of sound data items that is not determined to be the priority sound data item, without reproducing the non-priority sound data item at the first timing.

FAST PLAYBACK IN MEDIA FILES WITH REDUCED IMPACT TO SPEECH QUALITY
20180350388 · 2018-12-06

The present invention is a system and method for increasing the playback speed of audio waves. The system analyzes an audio wave to detect a first silent section that is longer than the minimum short pause length required to distinguish between words. The system then calculates a new playback speed for the first silent section so that its total playback time is less than or equal to the minimum short pause length, and controls an audio playback device to play the audio wave so that the first silent section is played back at the new playback speed. In another embodiment, the system analyzes spoken words phoneme by phoneme and increases the spoken-word playback speed by dynamically reducing the length of each phoneme and of the inter-syllable silent pauses. Thus, the system functions equally well for all languages and accents.
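The speed calculation for a silent section can be illustrated with a minimal sketch. It assumes that played duration equals section length divided by playback speed; the function name `silent_section_speed` and its parameters are hypothetical and do not come from the publication.

```python
def silent_section_speed(section_len_s: float, min_pause_s: float) -> float:
    """Return a playback speed for one silent section (hypothetical helper).

    Assumption: played duration = section_len_s / speed.
    Sections no longer than the minimum pause keep normal speed (1.0).
    """
    if min_pause_s <= 0:
        raise ValueError("min_pause_s must be positive")
    if section_len_s <= min_pause_s:
        # Already short enough to separate words; leave it unchanged.
        return 1.0
    # Smallest speed at which the section plays in exactly min_pause_s.
    return section_len_s / min_pause_s
```

For example, a 1.0 s pause with a 0.25 s minimum pause length yields a speed of 4.0, so the pause plays back in 0.25 s.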

Communication apparatus mounted with speech speed conversion device

In a communication apparatus, an encoder compresses telephone call voice which is transmitted from another communication apparatus. A voice accumulator preserves the telephone call voice, which is compressed by the encoder, as a message. A decoder expands the telephone call voice which is preserved in the voice accumulator. A signal memory temporarily holds the telephone call voice which is expanded by the decoder. A speech speed convertor performs speech speed conversion on the telephone call voice, which is read from the signal memory, and outputs the resulting voice from a speaker. A memory monitor temporarily stops the expansion of the telephone call voice in the decoder when the memory monitor determines that the idle capacity of the signal memory is approaching a predetermined lower limit value.
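The memory monitor behaves like back-pressure on the decoder. The sketch below is one possible reading, assuming that "approaches a predetermined lower limit" means the idle capacity is at or below that limit; `SignalMemory`, `decode_step`, and the frame-based capacity are hypothetical names and simplifications, not taken from the patent.

```python
from collections import deque


class SignalMemory:
    """Bounded buffer of expanded (decoded) voice frames (hypothetical)."""

    def __init__(self, capacity: int, lower_limit: int):
        self.capacity = capacity
        self.lower_limit = lower_limit
        self.frames = deque()

    def idle_capacity(self) -> int:
        return self.capacity - len(self.frames)

    def decoder_may_run(self) -> bool:
        # Assumption: expansion pauses once idle capacity is at or
        # below the lower limit, and resumes when it rises above it.
        return self.idle_capacity() > self.lower_limit


def decode_step(memory: SignalMemory, compressed: deque, decode) -> bool:
    """Expand one stored frame if the monitor allows it.

    Returns True when a frame was decoded, False when paused or empty.
    """
    if not compressed or not memory.decoder_may_run():
        return False
    memory.frames.append(decode(compressed.popleft()))
    return True
```

In this sketch the speech speed convertor would consume frames from `memory.frames`; each read frees capacity, which lets the next `decode_step` call proceed.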

AUDIO PLAYBACK AND CAPTIONING
20240304208 · 2024-09-12

A method, a structure, and a computer system for playback optimization. The exemplary embodiments may include transcribing an audio stream at normal and at high playback speed, as well as determining whether portions of the audio stream are unintelligible at the high playback speed. The exemplary embodiments may further include reducing a speed of the high playback speed for portions determined unintelligible, and transcribing the audio stream with the unintelligible portions played at the reduced high playback speed. The exemplary embodiments may further include determining whether the unintelligible portions are still unintelligible at the reduced high playback speed and, if not, identifying the reduced high playback speed as an optimal playback speed of the unintelligible portions. The exemplary embodiments may additionally include, in response to receiving a request to play the audio stream at the high playback speed, playing the unintelligible portions at the optimal playback speed.
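The speed-reduction loop for an unintelligible portion might be sketched as follows. The callback `is_intelligible`, the fixed `step`, and the floor at normal speed are assumptions made for illustration; the abstract does not state how intelligibility is judged beyond comparing transcriptions, nor how the speed is lowered.

```python
from typing import Any, Callable


def optimal_speed(
    portion: Any,
    high_speed: float,
    is_intelligible: Callable[[Any, float], bool],
    step: float = 0.25,
    normal_speed: float = 1.0,
) -> float:
    """Lower the speed for one portion until it is intelligible.

    `is_intelligible(portion, speed)` is a hypothetical callback, for
    example one that compares a transcript made at `speed` with the
    transcript made at normal speed.
    """
    if step <= 0:
        raise ValueError("step must be positive")
    speed = high_speed
    # Keep reducing while the portion is still unintelligible,
    # but never go below normal playback speed.
    while speed > normal_speed and not is_intelligible(portion, speed):
        speed = max(normal_speed, speed - step)
    return speed
```

When playback at the high speed is later requested, a player following this sketch would use the returned speed for that portion and the high speed elsewhere.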

SYSTEMS, METHODS AND DEVICES FOR INTELLIGENT SPEECH RECOGNITION AND PROCESSING
20180268841 · 2018-09-20

Systems, methods, and devices for intelligent speech recognition and processing are disclosed. According to one embodiment, a method for improving intelligibility of a speech signal may include (1) at least one processor receiving an incoming speech signal comprising a plurality of sound elements; (2) the at least one processor recognizing a sound element in the incoming speech signal to improve the intelligibility thereof; (3) the at least one processor processing the sound element by at least one of modifying and replacing the sound element; and (4) the at least one processor outputting the processed speech signal comprising the processed sound element.

Sound enhancement through reverberation matching

Embodiments of the present invention relate to enhancing sound through reverberation matching. In some implementations, a first sound recording recorded in a first environment is received. The first sound recording is decomposed into a first clean signal and a first reverb kernel. A second reverb kernel corresponding to a second sound recording recorded in a second environment is accessed, for example, based on a user indication to enhance the first sound recording to sound as though recorded in the second environment. An enhanced sound recording is generated based on the first clean signal and the second reverb kernel. The enhanced sound recording is a modification of the first sound recording to sound as though recorded in the second environment.
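The final step can be illustrated under a common signal-processing assumption: that a reverberant recording is modeled as the clean signal convolved with a reverb kernel. The abstract only says the enhanced recording is "based on" the clean signal and the second kernel, so the convolution below is an assumed model, and `apply_reverb` is a hypothetical name. The decomposition step is not shown.

```python
from typing import List, Sequence


def apply_reverb(clean: Sequence[float], kernel: Sequence[float]) -> List[float]:
    """Convolve a clean signal with a reverb kernel (assumed model).

    Plain discrete convolution; the output has
    len(clean) + len(kernel) - 1 samples.
    """
    if not clean or not kernel:
        return []
    out = [0.0] * (len(clean) + len(kernel) - 1)
    for i, sample in enumerate(clean):
        for j, tap in enumerate(kernel):
            # Each input sample excites a scaled copy of the kernel.
            out[i + j] += sample * tap
    return out
```

Under this model, passing the first clean signal and the second environment's kernel yields a signal that carries the second environment's reverberation instead of the first's.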

AUDIO ADJUSTMENT BASED ON COLOCATION OF USERS

Systems, apparatuses and methods identify first data associated with a first user and determine whether a second user is colocated with the first user based on the first data. The systems, apparatuses and methods receive first audio data associated with the first user and second audio data associated with the second user and, responsive to a determination that the second user is colocated with the first user, do one or more of the following: determine that a first audio feed to the first user will exclude the second audio data, or determine that a second audio feed to the second user will exclude the first audio data. They then generate one or more of the first audio feed or the second audio feed.
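The exclusion rule can be sketched as a simple feed-routing function. It assumes the colocation decision has already been made and is supplied as a set of user pairs; `build_feeds` and its data shapes are hypothetical, and this sketch applies the exclusion in both directions although the abstract allows either one.

```python
from typing import Dict, FrozenSet, Iterable, List, Set


def build_feeds(
    users: Iterable[str],
    colocated: Set[FrozenSet[str]],
) -> Dict[str, List[str]]:
    """Map each listener to the speakers mixed into that listener's feed.

    `colocated` holds unordered pairs of users already judged to share
    a physical space; their audio is left out of each other's feeds.
    """
    user_list = list(users)
    feeds: Dict[str, List[str]] = {}
    for listener in user_list:
        feeds[listener] = [
            speaker
            for speaker in user_list
            if speaker != listener
            and frozenset((listener, speaker)) not in colocated
        ]
    return feeds
```

With users `a`, `b`, `c` and `a` colocated with `b`, the feeds for `a` and `b` each contain only `c`, while the feed for `c` contains both `a` and `b`.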