Patent classifications
H04M3/569
Electronic device and generating conference call participants identifications
An example electronic device for conducting conference calls includes a memory to store a user profile including first identifying information of a first participant in a conference call. A receiver receives second identifying information of the first participant from a transmitting device associated with the first participant. The first identifying information and the second identifying information form an identifier for the first participant. An audio encoder receives an audio signal. A processor, in response to determining which transmitting device is nearest to a source of the audio signal relative to other transmitting devices, identifies the first participant as the source of the audio signal, and combines the identifier for the first participant with the audio signal generated by the first participant. A router forwards the combined identifier and audio of the first participant to a receiving device of a second participant in the conference call.
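A rough sketch of the abstract's core step: pick the transmitting device nearest the audio source (approximated here by strongest received signal level), look up that participant's stored identifying information, and tag the audio frame with the combined identifier. The function names, data shapes, and the signal-strength heuristic are illustrative assumptions, not the patent's actual method.

```python
def identify_and_tag(audio_frame, signal_levels, profiles):
    """signal_levels: {device_id: received level in dB} (higher = nearer).
    profiles: {device_id: {"first": ..., "second": ...}} identifying info."""
    nearest = max(signal_levels, key=signal_levels.get)  # strongest signal wins
    info = profiles[nearest]
    # Combine first and second identifying information into one identifier.
    identifier = f'{info["first"]}/{info["second"]}'
    return {"identifier": identifier, "audio": audio_frame}

tagged = identify_and_tag(
    b"\x00\x01",
    {"devA": -42.0, "devB": -60.0},
    {"devA": {"first": "Alice", "second": "a-badge-17"},
     "devB": {"first": "Bob", "second": "b-badge-09"}},
)
```

The router step would then forward `tagged` to the other participants' receiving devices.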
Detecting and assigning action items to conversation participants in real-time and detecting completion thereof
Described herein is a system for automatically detecting and assigning action items in a real-time conversation and determining whether such action items have been completed. The system detects, during a meeting, a plurality of action items and an utterance that corresponds to a completed action item. Responsive to detecting the utterance, the system generates a similarity score with respect to a first action item of the plurality of action items. The system compares the similarity score to a first threshold. Responsive to determining that the similarity score does not exceed the first threshold, the system generates a second similarity score with respect to a second action item of the plurality of action items. The system compares the second similarity score to a second threshold, which exceeds the first threshold. Responsive to determining that the second similarity score exceeds the second threshold, the system marks the second action item as completed.
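The two-threshold matching described above can be sketched as follows. Similarity here is `difflib`'s character-level ratio and the threshold values are assumptions; the patent does not specify a scoring function.

```python
from difflib import SequenceMatcher

def mark_completed(utterance, action_items, t1=0.5, t2=0.7):
    """Match an utterance against two action items in order; the second
    item must clear the stricter threshold t2 (t2 > t1)."""
    sim = lambda a, b: SequenceMatcher(None, a, b).ratio()
    first, second = action_items[0], action_items[1]
    if sim(utterance, first) > t1:
        return first          # first item cleared the first threshold
    if sim(utterance, second) > t2:
        return second         # second item must clear the higher threshold
    return None               # no item marked completed
```

Raising the bar for later candidates is a guard against false positives: once the best-ranked item has failed, a weaker match should only count if it is clearly stronger.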
Video conference collaboration
A system and method providing an accessibility tool that enhances a graphical user interface of an online meeting application is described. In one aspect, a computer-implemented method performed by an accessibility tool (128), the method includes accessing (802), in real-time, audio data of a session of an online meeting application (120), identifying (804) a target user, a speaking user, and a task based on the audio data, the speaking user indicating the task assigned to the target user in the audio data, generating (806) a message (318) that identifies the speaking user, the target user, and the task, the message (318) including textual content, and displaying (808) the message (318) in a chat pane (906) of a graphical user interface (902) of the online meeting application (120) during the session.
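Step (806), composing the chat message, might look like the following minimal sketch. The message format and names are illustrative only; the patent only requires that the message identify the speaking user, the target user, and the task.

```python
def generate_message(speaking_user, target_user, task):
    """Build the textual content for the chat pane message."""
    return f"{speaking_user} assigned a task to {target_user}: {task}"

msg = generate_message("Dana", "Lee", "update the slide deck")
```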
Shared Speakerphone System for Multiple Devices in a Conference Room
A speakerphone system is shared among multiple participant devices of participants in a physical meeting who are using a web conferencing service. An active speaker is identified from the participants. The participant device of the active speaker is switched, such that the speakerphone system receives and renders audio of the active speaker. Video of the participant device of the active speaker is enabled, such that the web conferencing service displays the video to the participant devices.
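Under assumed names, the switching logic reads as: when a new active speaker is identified, route the room audio through that participant's device and enable its video, while the previously active device is muted again.

```python
class SharedSpeakerphone:
    """Toy model of one shared speakerphone serving several devices."""

    def __init__(self, devices):
        self.devices = devices            # {participant: {"audio":…, "video":…}}
        self.active = None                # currently active speaker, if any

    def switch_to(self, participant):
        if self.active is not None:       # mute the previous active device
            self.devices[self.active].update(audio=False, video=False)
        self.devices[participant].update(audio=True, video=True)
        self.active = participant

room = SharedSpeakerphone({"pat": {"audio": False, "video": False},
                           "sam": {"audio": False, "video": False}})
room.switch_to("pat")
room.switch_to("sam")                     # active role moves to sam
```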
Speech Activity Detection Using Dual Sensory Based Learning
A dual sensory input speech detection method includes receiving, at a first time, a first video image input of a conference participant of the video conference and a first audio input of the conference participant; communicating the first video image input to the video conference; identifying the first video image input as a first facial image of the conference participant; determining, based on the first facial image, the first video image input indicates the conference participant is in a speaking state; identifying the first audio input as a first speech sound; determining, while in the speaking state, the first speech sound originates from the conference participant; and communicating the first audio input to an audio output for the video conference.
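The gating at the heart of the method can be sketched as: forward an audio frame to the conference output only when the facial analysis says the participant is in a speaking state AND the audio is classified as speech. The boolean inputs stand in for the patent's visual and audio classifiers, which are not specified here.

```python
def gate_audio(audio_frame, in_speaking_state, is_speech_sound):
    """Pass the frame through only if both modalities agree."""
    if in_speaking_state and is_speech_sound:
        return audio_frame    # communicate audio to the conference output
    return None               # suppress: no visual/audio agreement
```

Requiring both modalities suppresses, for example, speech from a nearby non-participant picked up while the participant's mouth is not moving.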
LEVERAGING A NETWORK OF MICROPHONES FOR INFERRING ROOM LOCATION AND SPEAKER IDENTITY FOR MORE ACCURATE TRANSCRIPTIONS AND SEMANTIC CONTEXT ACROSS MEETINGS
In conventional audio and video conferencing, connecting the devices of participants in the same room to the conference can degrade the audio quality for every conference participant. Different speakers emit the same audio signals at different times, even in the same room, making echo cancellation difficult or impossible. Routing signals among devices in the same room consumes bandwidth and introduces variable latency. The inventive conferencing technology eliminates these problems with more intelligent routing and mixing. An inventive conference bridge organizes colocated clients into groups, picks one Elected Speaker per group, and sends signals to only the Elected Speakers. The Elected Speakers mix audio from other groups, share it within their groups using low-latency local connections, and play the audio after a delay. The other speakers may also play the audio, use the distributed mixes for automatic echo cancellation that improves call quality in real time, and send the processed audio directly back to the bridge.
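Assuming the bridge already knows which clients are colocated, the grouping and election step might be sketched like this. The election rule here (first client per room) is a placeholder; the patent does not specify how the Elected Speaker is chosen.

```python
def elect_speakers(clients):
    """clients: {client_id: room_id}. Returns {room_id: elected client_id}."""
    elected = {}
    for client_id, room in sorted(clients.items()):
        elected.setdefault(room, client_id)   # first client in each room wins
    return elected

def bridge_targets(clients):
    """Only Elected Speakers receive streams from the bridge."""
    return sorted(elect_speakers(clients).values())

targets = bridge_targets({"c1": "roomA", "c2": "roomA", "c3": "roomB"})
```

Sending to `targets` alone, rather than to every client, is what saves the per-device bandwidth described above; the Elected Speakers then redistribute the mix locally.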
CALL AUDIO MIXING PROCESSING
A call audio mixing processing method is provided. In the method, call audio streams from terminals of call members participating in a call are obtained. Voice analysis is performed on the call audio streams to determine voice activity corresponding to each of the terminals. The voice activity of the terminals indicate activity levels of the call members participating in the call. According to the voice activity of the terminals, respective voice adjustment parameters corresponding to the terminals are determined. According to the respective voice adjustment parameters corresponding to the terminals, the call audio streams of the terminals are adjusted. Further, mixing processing is performed on the adjusted call audio streams to obtain a mixed audio stream.
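A minimal sketch of the pipeline: map each terminal's voice activity to a gain (the voice adjustment parameter), scale each stream, and sum them into one mixed stream. The linear activity-to-gain mapping is an assumption; the patent leaves the mapping unspecified.

```python
def mix_streams(streams, activities):
    """streams: {terminal: [float samples]}; activities: {terminal: 0..1}."""
    total = sum(activities.values()) or 1.0
    gains = {t: a / total for t, a in activities.items()}  # adjustment params
    n = len(next(iter(streams.values())))
    mixed = [0.0] * n
    for terminal, samples in streams.items():
        for i, sample in enumerate(samples):
            mixed[i] += gains[terminal] * sample           # adjust, then mix
    return mixed

out = mix_streams({"t1": [1.0, 1.0], "t2": [1.0, -1.0]},
                  {"t1": 0.75, "t2": 0.25})
```

Normalizing the gains so they sum to one keeps the mix from clipping while still letting more active speakers dominate.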
SYSTEMS AND METHODS FOR RESOLVING OVERLAPPING SPEECH IN A COMMUNICATION SESSION
Systems, methods, and non-transitory computer-readable media can be configured to determine first audio associated with a first user and second audio associated with a second user, the first user and the second user associated with a communication session. The second audio can be muted based on a determination that the first audio and the second audio overlap. The second audio can be provided based on completion of the first audio.
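Reduced to scheduling, the rule reads: if the second user's audio starts while the first user's audio is still playing, hold (mute) the second stream and release it when the first completes. Timestamps and field names are illustrative assumptions.

```python
def resolve_overlap(first, second):
    """Each stream: {"start": t0, "end": t1}. Returns the playback start
    time assigned to the second stream."""
    overlaps = second["start"] < first["end"]
    return first["end"] if overlaps else second["start"]  # defer if overlapping

t = resolve_overlap({"start": 0.0, "end": 3.0},
                    {"start": 1.5, "end": 4.0})
```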