IPIQ

G10L15/00

Intent authoring using weak supervision and co-training for automated response systems

11568856 · 2023-01-31 ·

International Business Machines Corporation

A combination of propagation operations and learning algorithms is applied, using a selected set of labeled conversational logs retrieved from a subset of a plurality of conversational logs, to a remaining corpus of the plurality of conversational logs to train an automated response system according to an intent associated with each of the conversational logs. The combination of propagation operations and learning algorithms may include defining the labels by a user for the selected set of the subset of the plurality of conversational logs; training a probabilistic classifier using the defined labels of features of the selected set, wherein the probabilistic classifier produces labeling decisions for the subset of conversational logs; weighting the features of the selected set in a model optimization process; and/or training an additional classifier using the weighted features of the selected set and applying the additional classifier to the remaining corpus.
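The propagate-then-retrain flow in this abstract can be illustrated with a minimal sketch. It is not the patented method: it substitutes a small naive Bayes classifier and a single self-training pass for the abstract's propagation operations, and it omits the feature-weighting optimization step. The function names, the sample logs, the intent labels, and the 0.7 confidence threshold are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

def train_nb(examples: List[Tuple[str, str]]):
    """Train a bag-of-words naive Bayes model from (text, intent) pairs."""
    word_counts = defaultdict(Counter)
    label_counts = Counter()
    vocab = set()
    for text, label in examples:
        label_counts[label] += 1
        for word in text.lower().split():
            word_counts[label][word] += 1
            vocab.add(word)
    return word_counts, label_counts, vocab

def predict_proba(model, text: str) -> Dict[str, float]:
    """Return normalized intent probabilities using Laplace smoothing."""
    word_counts, label_counts, vocab = model
    total = sum(label_counts.values())
    log_scores = {}
    for label, count in label_counts.items():
        denom = sum(word_counts[label].values()) + len(vocab)
        score = math.log(count / total)
        for word in text.lower().split():
            score += math.log((word_counts[label][word] + 1) / denom)
        log_scores[label] = score
    top = max(log_scores.values())
    exp_scores = {label: math.exp(s - top) for label, s in log_scores.items()}
    norm = sum(exp_scores.values())
    return {label: v / norm for label, v in exp_scores.items()}

# Hypothetical data: a few user-labeled logs, an unlabeled subset, and the rest.
seed = [("reset my password", "account"), ("forgot password login", "account"),
        ("where is my order", "shipping"), ("track order delivery", "shipping")]
subset = ["password reset link", "order delivery status"]
remaining = ["cannot login password", "delivery of my order"]

# Step 1: a probabilistic classifier trained on the user-defined labels.
first = train_nb(seed)

# Step 2: propagate labels to the subset, keeping only confident decisions.
propagated = []
for text in subset:
    probs = predict_proba(first, text)
    best = max(probs, key=probs.get)
    if probs[best] >= 0.7:  # assumed confidence threshold
        propagated.append((text, best))

# Step 3: train an additional classifier and apply it to the remaining corpus.
second = train_nb(seed + propagated)
predictions = {}
for text in remaining:
    probs = predict_proba(second, text)
    predictions[text] = max(probs, key=probs.get)
```

In this toy run both subset logs clear the threshold, so the second classifier sees six labeled examples instead of four before it is applied to the remaining logs.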

Real time correction of accent in speech audio signals

11715457 · 2023-08-01 ·

Intone Inc.

Systems and methods for real-time correction of an accent in a speech audio signal are provided. A method includes dividing the speech audio signal into a stream of input chunks, an input chunk from the stream of input chunks including a pre-defined number of frames of the speech audio signal; extracting, by an acoustic features extraction module from the input chunk and a context associated with the input chunk, acoustic features, the context being a pre-determined number of the frames preceding the input chunk in the stream; extracting, by a linguistic features extraction module from the input chunk and the context, linguistic features; receiving a speaker embedding for a human speaker; providing the speaker embedding, the acoustic features, and the linguistic features to a synthesis module to generate a melspectrogram with a reduced accent; and providing the melspectrogram to a vocoder to generate an output chunk of an output audio signal.
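The chunk-plus-context step of this abstract can be sketched as follows. This covers only the division of the signal into chunks and the selection of preceding frames as context; the feature extraction, synthesis, and vocoder modules are not modeled. The function name `stream_chunks`, the parameter names, and the choice to let the final chunk be shorter than `chunk_size` are assumptions.

```python
from typing import Iterator, List, Sequence, Tuple

def stream_chunks(
    frames: Sequence[float], chunk_size: int, context_size: int
) -> Iterator[Tuple[List[float], List[float]]]:
    """Yield (context, chunk) pairs over a sequence of frames.

    Each chunk holds up to `chunk_size` frames; its context is the
    `context_size` frames immediately preceding it (fewer at the start).
    """
    if chunk_size <= 0 or context_size < 0:
        raise ValueError("chunk_size must be positive and context_size non-negative")
    for start in range(0, len(frames), chunk_size):
        chunk = list(frames[start:start + chunk_size])
        context = list(frames[max(0, start - context_size):start])
        yield context, chunk

# Ten dummy frames, chunks of four frames, two frames of context.
pairs = list(stream_chunks(list(range(10)), chunk_size=4, context_size=2))
```

Here the second pair is `([2, 3], [4, 5, 6, 7])`: the chunk covers frames 4 to 7 and the context is the two frames before it.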

VOICE RECOGNITION SYSTEM, SERVER, DISPLAY APPARATUS AND CONTROL METHODS THEREOF

20230028729 · 2023-01-26 ·

Samsung Electronics Co., Ltd.

Ji-eun CHAE

A voice recognition system includes a server storing a plurality of manuals and a display apparatus. When a spoken voice of a user is recognized, the display apparatus transmits to the server a spoken voice signal corresponding to the spoken voice together with characteristic information of the display apparatus. The server transmits a response signal to the spoken voice signal to the display apparatus based on a manual corresponding to the characteristic information among the plurality of manuals, and the display apparatus processes an operation corresponding to the received response signal, which increases user convenience.

SOUND CROSSTALK SUPPRESSION DEVICE AND SOUND CROSSTALK SUPPRESSION METHOD

20230026003 · 2023-01-26 ·

Panasonic Intellectual Property Management Co., Ltd.

A sound crosstalk suppression device includes: a speaker analysis unit configured to analyze a speaker situation in a closed space based on voice signals respectively collected by a plurality of microphones arranged in the closed space; a filter update unit that includes a filter configured to generate a suppression signal of a crosstalk component included in a voice signal of a main speaker, that is configured to update a parameter of the filter, and that is configured to store the updated parameter in a memory; a reset unit configured to reset the parameter of the filter in a case where it is determined that an analysis result of the speaker situation is switched; and a crosstalk suppression unit configured to suppress a crosstalk component by using a suppression signal.
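The update-and-reset behavior described in this abstract can be illustrated with a minimal sketch. It assumes a normalized LMS (NLMS) adaptive filter, which the abstract does not specify, and it reduces the speaker analysis to a label passed in by the caller. The class name, the tap count, the step size, and the situation labels are all hypothetical.

```python
import random
from typing import List, Optional

class CrosstalkSuppressor:
    """NLMS filter that estimates crosstalk from another microphone's signal."""

    def __init__(self, taps: int = 4, step: float = 0.5) -> None:
        self.taps = taps
        self.step = step
        self.weights: List[float] = [0.0] * taps   # stored filter parameters
        self.history: List[float] = [0.0] * taps   # recent reference samples
        self.last_situation: Optional[str] = None
        self.reset_count = 0

    def reset(self) -> None:
        """Discard the learned parameters."""
        self.weights = [0.0] * self.taps
        self.reset_count += 1

    def process(self, main_sample: float, other_sample: float, situation: str) -> float:
        """Suppress crosstalk in one main-microphone sample and adapt the filter."""
        # Reset when the analyzed speaker situation has switched.
        if self.last_situation is not None and situation != self.last_situation:
            self.reset()
        self.last_situation = situation

        self.history = [other_sample] + self.history[:-1]
        suppression = sum(w * x for w, x in zip(self.weights, self.history))
        error = main_sample - suppression  # output with crosstalk removed
        power = sum(x * x for x in self.history) + 1e-8
        self.weights = [
            w + self.step * error * x / power
            for w, x in zip(self.weights, self.history)
        ]
        return error

# Toy signal: the main microphone picks up only attenuated crosstalk.
rng = random.Random(0)
suppressor = CrosstalkSuppressor(taps=4, step=0.5)
outputs = []
for _ in range(300):
    other = rng.uniform(-1.0, 1.0)
    outputs.append(suppressor.process(0.5 * other, other, "driver"))

# A change in the speaker situation triggers a parameter reset.
suppressor.process(0.0, 0.0, "passenger")
```

Because the main signal in this toy case is pure crosstalk, the output should shrink toward zero as the filter adapts. The final call, made under a different situation label, resets the weights.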

SYSTEMS AND METHODS FOR AUTOMATED AUDIO TRANSCRIPTION, TRANSLATION, AND TRANSFER FOR ONLINE MEETING

20230026467 · 2023-01-26 ·

The present invention discloses systems and methods for multimedia processing. For example, the present invention provides systems and methods for receiving spoken audio, converting the spoken audio to text, and transferring the text to a user. As desired, the speech or text can be translated into one or more different languages. Systems and methods for real-time conversion and transmission of speech and text are provided, including systems and methods for large scale processing of multimedia events.

Automated call requests with status updates

11563850 · 2023-01-24 ·

Google LLC

Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, relating to synthetic call status updates. In some implementations, a method includes determining, by a task manager module, that a triggering event has occurred to provide a current status of a user call request. The method may then determine, by the task manager module, the current status of the user call request. A representation of the current status of the user call request is generated. Then, the generated representation of the current status of the user call request is provided to the user.

Speech recognition method, electronic device, and computer storage medium

11562736 · 2023-01-24 ·

TENCENT TECHNOLOGY (SHENZHEN) COMPANY LIMITED

Qiusheng Wan

A speech recognition method includes segmenting captured voice information to obtain a plurality of voice segments, and extracting voiceprint information of the voice segments; matching the voiceprint information of the voice segments with a first stored voiceprint information to determine a set of filtered voice segments having voiceprint information that successfully matches the first stored voiceprint information; combining the set of filtered voice segments to obtain combined voice information, and determining combined semantic information of the combined voice information; and using the combined semantic information as a speech recognition result when the combined semantic information satisfies a preset rule.
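The filter-by-voiceprint step of this abstract can be sketched as follows. The sketch assumes voiceprints are fixed-length vectors compared by cosine similarity against a threshold, and it uses short text strings as stand-ins for the audio segments; neither detail comes from the abstract. The function names, the sample vectors, and the 0.8 threshold are hypothetical, and the "preset rule" is reduced to a non-empty check.

```python
import math
from typing import List, Sequence, Tuple

def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def filter_segments(
    segments: List[Tuple[str, Sequence[float]]],
    enrolled: Sequence[float],
    threshold: float = 0.8,
) -> List[str]:
    """Keep segments whose voiceprint matches the stored voiceprint."""
    return [content for content, voiceprint in segments
            if cosine(voiceprint, enrolled) >= threshold]

# Hypothetical segments: (stand-in content, voiceprint vector).
segments = [
    ("turn on", [0.9, 0.1]),
    ("no wait", [0.1, 0.9]),     # a different speaker
    ("the lights", [0.8, 0.2]),
]
enrolled = [1.0, 0.0]            # first stored voiceprint

combined = " ".join(filter_segments(segments, enrolled))
# Assumed preset rule: accept the result only if something was kept.
result = combined if combined else None
```

The middle segment has a low similarity to the stored voiceprint and is dropped, so the combined content is "turn on the lights".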

Operating modes that designate an interface modality for interacting with an automated assistant

11561764 · 2023-01-24 ·

Google LLC

Haywai Chan

Implementations described herein relate to transitioning a computing device between operating modes according to whether the computing device is suitably oriented for receiving non-audio related gestures. For instance, the user can attach a portable computing device to a docking station of a vehicle and, while in transit, wave their hand near the portable computing device in order to invoke the automated assistant. Such action by the user can be detected by a proximity sensor and/or any other device capable of determining a context of the portable computing device and/or an interest of the user in invoking the automated assistant. In some implementations, location, orientation, and/or motion of the portable computing device can be detected and used in combination with an output of the proximity sensor to determine whether to invoke the automated assistant in response to an input gesture from the user.

Digital Monitoring Badge System

20230228832 · 2023-07-20 ·

A wearable badge for an employee that records and transmits audio from client interactions with the professional, comprising two microphones and two microphone channels that focus one microphone on the speech of the employee and the other microphone on the speech of the customer, making diarizing easier. The wearable badge also comprises a module to determine whether or not the employee is maintaining an appropriate social distance with customers.
