H04M3/569

Complex computing network for enabling substantially instantaneous switching between conversation mode and listening mode on a mobile application

Systems, methods, and computer program products are provided for enabling substantially instantaneous switching between conversation mode and listening mode on a mobile application. For example, a method comprises: determining that a first user accesses a mobile application on a first mobile device of the first user; and enabling the first user to select a conversation mode option or a listening mode option on the mobile application, wherein the conversation mode option and the listening mode option are presented on a user interface of the mobile application on the first mobile device of the first user.

SYSTEMS AND METHODS FOR VOICE IDENTIFICATION AND ANALYSIS
20210065713 · 2021-03-04

Obtaining configuration audio data including voice information for a plurality of meeting participants. Generating localization information indicating a respective location for each meeting participant. Generating a respective voiceprint for each meeting participant. Obtaining meeting audio data. Identifying a first meeting participant and a second meeting participant. Linking a first meeting participant identifier of the first meeting participant with a first segment of the meeting audio data. Linking a second meeting participant identifier of the second meeting participant with a second segment of the meeting audio data. Generating a GUI indicating the respective locations of the first and second meeting participants, and the GUI indicating a first transcription of the first segment and a second transcription of the second segment. The first transcription is associated with the first meeting participant in the GUI, and the second transcription is associated with the second meeting participant in the GUI.
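
The linking step in this abstract can be pictured with a minimal sketch, assuming each voiceprint and each meeting audio segment has already been reduced to a numeric vector. The abstract does not say how voiceprints are compared; the cosine similarity measure and the names `cosine` and `link_segments` are hypothetical choices made only for illustration.

```python
import math

def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def link_segments(voiceprints, segments):
    """Link each audio segment to the participant with the closest voiceprint.

    voiceprints: dict of participant identifier -> vector (assumed representation).
    segments:    list of (segment identifier, vector) pairs.
    Returns a dict of segment identifier -> participant identifier.
    """
    return {
        seg_id: max(voiceprints, key=lambda pid: cosine(voiceprints[pid], vec))
        for seg_id, vec in segments
    }
```

A real system would also need a rejection threshold for speakers who match no enrolled voiceprint; that is omitted here.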

MUTED COMPONENT DETECTION

One embodiment provides a method, comprising: transmitting, from a communication component, a signal down a communication channel; determining, using a processor, whether an echo associated with the signal is detected by the communication component; and providing, responsive to determining that the echo is not detected, a notification to a user that a mute control is enabled at another communication component along the communication channel. Other aspects are described and claimed.
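
As a rough illustration of the echo check described in this abstract, the sketch below slides a transmitted probe over the received samples and reports a likely mute when no echo is found. The normalized cross-correlation test, the 0.5 threshold, and the function names are assumptions made for this example and are not taken from the abstract.

```python
import math

def echo_detected(probe, received, threshold=0.5):
    """Return True if a delayed, scaled copy of `probe` appears in `received`.

    Hypothetical detector: computes the peak normalized cross-correlation
    over all lags and compares it against `threshold`.
    """
    n = len(probe)
    probe_norm = math.sqrt(sum(x * x for x in probe))
    if n == 0 or probe_norm == 0 or len(received) < n:
        return False
    best = 0.0
    for lag in range(len(received) - n + 1):
        window = received[lag:lag + n]
        window_norm = math.sqrt(sum(x * x for x in window))
        if window_norm == 0:
            continue  # silent window, nothing to correlate against
        corr = sum(p * w for p, w in zip(probe, window)) / (probe_norm * window_norm)
        best = max(best, corr)
    return best >= threshold

def mute_notification(probe, received):
    """Return a user-facing notice when no echo is detected, otherwise None."""
    if echo_detected(probe, received):
        return None
    return "A mute control appears to be enabled elsewhere on the channel."
```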

DISTRIBUTED TELECONFERENCING USING PERSONALIZED ENHANCEMENT MODELS
20230421702 · 2023-12-28

This document relates to distributed teleconferencing. Some implementations can employ personalized enhancement models to enhance microphone signals for participants in a call. Further implementations can perform proximity-based mixing, where microphone signals received from devices in a particular room can be omitted from playback signals transmitted to other devices in the same room. These techniques can allow enhanced call quality for teleconferencing sessions where co-located users can employ their own devices to participate in a call with other users.
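
The proximity-based mixing idea can be sketched as follows, assuming a room label is already known for every device. How co-location is detected and how signals are actually combined are not stated in the abstract; the `room_of` mapping, the device identifiers, and the plain sample-wise sum are hypothetical.

```python
def mix_for_device(target, signals, room_of):
    """Build the playback signal for the device `target`.

    signals: dict of device id -> list of samples (equal lengths assumed).
    room_of: dict of device id -> room label (hypothetical input).
    Signals from devices in the same room as `target`, including the target
    itself, are omitted because those users can already hear each other.
    """
    length = len(next(iter(signals.values()), []))
    mix = [0.0] * length
    for device, samples in signals.items():
        if room_of[device] == room_of[target]:
            continue  # co-located microphone: leave it out of the playback mix
        for i, sample in enumerate(samples):
            mix[i] += sample
    return mix
```

For example, with devices `a` and `b` in one room and `c` in another, the mix sent to `a` contains only the signal from `c`.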

CONTEXT ACQUIRING METHOD AND DEVICE BASED ON VOICE INTERACTION

Embodiments of the present disclosure provide a context acquiring method based on voice interaction and a device, the method comprising: acquiring a scene image collected by an image collection device at a voice start point of a current conversation, and extracting a face feature of each user in the scene image; if it is determined that there is a second face feature matching a first face feature according to the face feature of each user and a face database, acquiring a first user identifier corresponding to the second face feature from the face database; if it is determined that a stored conversation corresponding to the first user identifier is stored in a voice database, determining a context of a voice interaction according to the current conversation and the stored conversation, and after the voice end point of the current conversation is obtained, storing the current conversation into the voice database.

SYSTEMS AND METHODS FOR IMPROVING AUDIO CONFERENCING SERVICES
20200411038 · 2020-12-31

Systems and methods are disclosed herein for improving audio conferencing services. One aspect relates to processing audio content of a conference. A first audio signal is received from a first conference participant, and a start and an end of a first utterance by the first conference participant are detected from the first audio signal. A second audio signal is received from a second conference participant, and a start and an end of a second utterance by the second conference participant are detected from the second audio signal. The second conference participant is provided with at least a portion of the first utterance, wherein at least one of start time, start point, and duration is determined based at least in part on the start, end, or both, of the second utterance.
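
The abstract does not say how the start and end of an utterance are detected. One common approach, shown here purely as an assumption, is to threshold per-frame energy and close an utterance after a short run of silent frames. The function name, the threshold, and the `min_silence` parameter are hypothetical.

```python
def find_utterances(frame_energies, threshold=0.1, min_silence=2):
    """Return a list of (start, end) frame indices, with `end` exclusive.

    An utterance starts at the first frame whose energy reaches `threshold`
    and ends once `min_silence` consecutive frames fall below it.
    """
    utterances = []
    start = None
    silent = 0
    for i, energy in enumerate(frame_energies):
        if energy >= threshold:
            if start is None:
                start = i
            silent = 0
        elif start is not None:
            silent += 1
            if silent >= min_silence:
                # The utterance ended at the first frame of this silent run.
                utterances.append((start, i - silent + 1))
                start = None
                silent = 0
    if start is not None:
        # Input ended mid-utterance; trim any trailing silent frames.
        utterances.append((start, len(frame_energies) - silent))
    return utterances
```

With boundaries like these for both participants, the playback of the first utterance to the second participant could be timed relative to the second utterance's start or end, as the abstract describes.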

Video communication method and robot for implementing the method
10878822 · 2020-12-29

Provided are a video communication method and a robot implementing the same. The robot includes a camera configured to acquire a first video of a space for a video call, a multi-channel microphone configured to receive a sound signal output to the space, a memory storing one or more instructions, and a processor configured to execute the one or more instructions. The processor determines a first user among N users in the first video based on the sound signal received in a previous time period prior to a first time point, wherein the first user is a main user of the video call at the first time point and N is an integer greater than or equal to 2.

AUDIO CHANNEL MIXING
20200388292 · 2020-12-10

Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for audio channel mixing are disclosed. In one aspect, a method includes the actions of receiving first audio data for a first audio channel. The actions further include transmitting the first audio data. The actions further include, while receiving and transmitting the first audio data, receiving second audio data for a second audio channel; determining a first speech audio energy level of the first audio data and a first noise energy level of the first audio data; determining a second speech audio energy level of the second audio data and a second noise energy level of the second audio data; and determining whether to switch to transmitting the second audio data or continue transmitting the first audio data. The actions further include transmitting the first audio data or the second audio data.
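
The switching decision in this abstract can be illustrated with a minimal sketch, assuming the speech and noise energy levels of each channel have already been measured. The abstract does not specify the decision rule; comparing speech-to-noise ratios in decibels with a 3 dB margin is an assumption, and `snr_db` and `choose_channel` are hypothetical names.

```python
import math

def snr_db(speech_energy, noise_energy):
    """Speech-to-noise ratio in decibels; both inputs are floored to avoid log(0)."""
    floor = 1e-12
    return 10.0 * math.log10(max(speech_energy, floor) / max(noise_energy, floor))

def choose_channel(first, second, margin_db=3.0):
    """Return "first" or "second", the channel to transmit next.

    first, second: (speech_energy, noise_energy) tuples. The first channel is
    the one currently being transmitted. Assumed rule: switch only when the
    second channel's ratio beats the first's by more than `margin_db`, so two
    channels of similar quality do not cause rapid back-and-forth switching.
    """
    if snr_db(*second) > snr_db(*first) + margin_db:
        return "second"
    return "first"
```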

Method and device for message sending and receiving based on a communication interface framework
10848595 · 2020-11-24

Embodiments of the disclosure provide a communication interface framework, a message sending method and device based on a communication interface framework, a message receiving method and device based on a communication interface framework, and a communication system. The communication interface framework comprises: a device layer, a core layer and a protocol layer, wherein the device layer comprises a transmission device for providing the framework with, when transmitting data information, a transmission interface for transmitting the data information; the core layer comprises an interface protocol, a sending queue, and a receiving queue; and the protocol layer comprises a user mode application program interface and a kernel mode application program interface.

Systems and methods for voice identification and analysis
10839807 · 2020-11-17

Obtaining configuration audio data including voice information for a plurality of meeting participants. Generating localization information indicating a respective location for each meeting participant. Generating a respective voiceprint for each meeting participant. Obtaining meeting audio data. Identifying a first meeting participant and a second meeting participant. Linking a first meeting participant identifier of the first meeting participant with a first segment of the meeting audio data. Linking a second meeting participant identifier of the second meeting participant with a second segment of the meeting audio data. Generating a GUI indicating the respective locations of the first and second meeting participants, and the GUI indicating a first transcription of the first segment and a second transcription of the second segment. The first transcription is associated with the first meeting participant in the GUI, and the second transcription is associated with the second meeting participant in the GUI.