Patent classifications
H04M3/569
Complex computing network for improving establishment and broadcasting of audio communication among mobile computing devices and for improving switching from listening mode to conversation mode on a mobile application
Systems, methods, and computer program products are provided for improving establishment and broadcasting of audio communication among mobile computing devices and for improving switching from listening mode to conversation mode on a mobile application. For example, a method comprises: receiving a selection of a conversation mode option from a user interface of the mobile application on a first mobile device of a first user, the user interface displaying the conversation mode option simultaneously with both a first visual representation of a second user involved in a first audio conversation and a second visual representation of a third user involved in the first audio conversation; and in response to receiving the selection of the conversation mode option, modifying a first image of the conversation mode option, and placing the first user in the first audio conversation with the second user and the third user, or in a second audio conversation with a fourth user.
DYNAMIC VIRTUAL ENVIRONMENT
Techniques for conducting a virtual event are described. One example method includes displaying, on a display screen of a computing device, a plurality of icons, each icon representing a different virtual event participant, wherein the plurality of icons includes a first icon representing a virtual event participant associated with the computing device; receiving, from an input device of the computing device, input representing a direction of movement for the first icon; and in response to receiving the input, moving the first icon on the display screen in the direction represented by the input.
SOUND OUTPUT CONTROL APPARATUS, SOUND OUTPUT CONTROL SYSTEM, SOUND OUTPUT CONTROL METHOD, AND PROGRAM
Provided are a sound output control apparatus, a sound output control system, a sound output control method, and a program that can appropriately thin out the output of pieces of sound data. A sound data reception section receives a plurality of pieces of sound data transmitted from transmission apparatuses that are different from each other. A selection section selects a portion of the plurality of pieces of sound data on the basis of at least one of a result of a voice activity detection process performed on each of the pieces of sound data or moving averages of volumes of sounds represented by the pieces of sound data. A sound data transmission section outputs the selected portion of the pieces of sound data.
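The thinning logic above can be sketched in a few lines. This is a hypothetical illustration, assuming each received stream exposes a VAD result and a short volume history; all names (`SoundStream`, `select_streams`) are my own, not from the patent:

```python
from collections import deque
from dataclasses import dataclass, field

@dataclass
class SoundStream:
    sender_id: str
    voice_active: bool  # result of a voice activity detection process
    volumes: deque = field(default_factory=lambda: deque(maxlen=10))

    def moving_average(self) -> float:
        """Moving average of recent volume samples for this stream."""
        return sum(self.volumes) / len(self.volumes) if self.volumes else 0.0

def select_streams(streams, max_outputs=3):
    """Keep only streams with detected voice activity, ranked by
    moving-average volume, and thin the rest out."""
    active = [s for s in streams if s.voice_active]
    active.sort(key=lambda s: s.moving_average(), reverse=True)
    return active[:max_outputs]
```

The sketch combines both criteria named in the abstract: VAD acts as a hard filter, and the volume moving average breaks ties among active streams.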
SCALABLE VOICE SCENE MEDIA SERVER
A communication system, method, and computer-readable medium therefor comprise a media server configured to receive a plurality of audio streams from a corresponding plurality of client devices, the media server including circuitry configured to rank the plurality of audio streams based on a predetermined metric, group a first portion of the plurality of audio streams into a first set, the first portion of the plurality of audio streams being the N highest-ranked audio streams, group a second portion of the plurality of audio streams into a second set, the second portion of the plurality of audio streams being the M lowest-ranked audio streams, forward respective audio streams of the first set to a receiver device, and discard respective audio streams of the second set, wherein N and M are independent integers.
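The rank-and-partition step described above can be sketched as follows. This is a minimal illustration under the assumption that the ranking metric is a per-stream score (e.g. loudness); the function name and stream representation are hypothetical:

```python
def partition_streams(streams, metric, n, m):
    """Rank audio streams by a metric, forward the N highest-ranked,
    and discard the M lowest-ranked. N and M are independent integers."""
    ranked = sorted(streams, key=metric, reverse=True)
    forward = ranked[:n]
    discard = ranked[-m:] if m > 0 else []
    return forward, discard
```

Because N and M are independent, streams in the middle of the ranking may be neither forwarded nor discarded, leaving room for other handling (e.g. mixing).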
Information Handling Systems And Methods For Accurately Identifying An Active Speaker In A Communication Session
The present disclosure provides various embodiments of methods for intelligent active speaker identification and information handling systems (IHSs) utilizing such methods. In general, the methods disclosed herein may be used to accurately identify an active speaker in a communication session with an application or an IHS, regardless of whether the active speaker is alone, in a group environment, or using someone else's system or login to participate in the communication session. The methods disclosed herein may use voice processing technology and one or more voice identification databases (VIDs) to identify the active speaker in a communication session. In some embodiments, the disclosed methods may display the identity of the active speaker to other users or participants in the same communication session. In other embodiments, the disclosed methods may dynamically switch between user profiles or accounts during the communication session based on the identity of the active speaker.
Detecting user identity in shared audio source contexts
Computerized systems are provided for determining an identity of one or more users that use a same audio source, such as a microphone. The identity of one or more users that use a same audio source can be based on generating a list of participant candidates who are likely to participate in an associated event, such as a meeting. For instance, embodiments can generate one or more network graphs of a meeting invitee, and only the voice input samples of the meeting invitee's N closest connections are compared to an utterance to determine the identity of the user associated with the utterance. One or more indicators that identify the users who are using the same audio source, as well as additional information or metadata associated with the identified user, can be caused to be presented.
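The candidate-narrowing step can be sketched as below. This is a hypothetical illustration assuming the network graph is an adjacency map with connection weights and that voiceprints are fixed-length vectors compared by cosine similarity; none of these names come from the patent:

```python
import math

def cosine(a, b):
    """Cosine similarity between two voiceprint vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def identify_speaker(utterance_vec, invitee, graph, voiceprints, n=3):
    """Compare the utterance only against the invitee and the invitee's
    N closest connections in the network graph, rather than all users."""
    neighbors = graph.get(invitee, {})
    closest = sorted(neighbors, key=neighbors.get, reverse=True)[:n]
    candidates = [invitee] + closest
    return max(candidates, key=lambda u: cosine(utterance_vec, voiceprints[u]))
```

Restricting the comparison to the N closest connections keeps the matching cost bounded even when the voice identification database is large.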
Speech Activity Detection Using Dual Sensory Based Learning
A dual sensory input speech detection method includes receiving, at a first time, a first video image input of a conference participant of the video conference and a first audio input of the conference participant; communicating the first video image input to the video conference; identifying the first video image input as a first facial image of the conference participant; determining, based on the first facial image, the first video image input indicates the conference participant is in a speaking state; identifying the first audio input as a first speech sound; determining, while in the speaking state, the first speech sound originates from the conference participant; and communicating the first audio input to an audio output for the video conference.
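The dual-sensory gating described above can be reduced to a small sketch: the video cue establishes the speaking state, and audio is attributed to the participant only while that state holds. The class and method names here are hypothetical:

```python
class DualSensoryGate:
    """Gate a participant's audio on agreement of video and audio cues:
    the facial image sets the speaking state, and audio is forwarded to
    the conference output only while in that state."""

    def __init__(self):
        self.speaking_state = False

    def on_video_frame(self, face_indicates_speaking: bool):
        # Update the speaking state from the facial-image classification.
        self.speaking_state = face_indicates_speaking

    def on_audio_frame(self, is_speech: bool) -> bool:
        # Communicate the audio only if the participant is in a speaking
        # state AND the audio input is classified as a speech sound.
        return self.speaking_state and is_speech
```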
Systems and methods for resolving overlapping speech in a communication session
Systems, methods, and non-transitory computer-readable media can be configured to determine first audio associated with a first user and second audio associated with a second user, the first user and the second user associated with a communication session. The second audio can be muted based on a determination that the first audio and the second audio overlap. The second audio can be provided based on completion of the first audio.
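The mute-and-defer behavior can be sketched with a small state holder. This is a minimal illustration, assuming audio start/end events per user; the class and event names are my own:

```python
class OverlapResolver:
    """Mute a second user's audio while a first user's audio is in
    progress, then provide the muted audio once the first completes."""

    def __init__(self):
        self.active_speaker = None
        self.muted = set()

    def on_audio_start(self, user):
        if self.active_speaker is None:
            self.active_speaker = user
        elif user != self.active_speaker:
            # Overlap detected: mute the later-starting audio.
            self.muted.add(user)

    def on_audio_end(self, user):
        """Return the users whose deferred audio can now be provided."""
        if user == self.active_speaker:
            self.active_speaker = None
            released, self.muted = self.muted, set()
            return sorted(released)
        return []
```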
SPEECH EVALUATION SYSTEM, SPEECH EVALUATION METHOD, AND NON-TRANSITORY COMPUTER READABLE MEDIUM STORING PROGRAM
A speech detection unit detects a speech in communication based on output values of microphones of a plurality of wearable terminals, and identifies a wearable terminal corresponding to the detected speech. A speech period detection unit detects for each speech a start timing of the speech and an end timing thereof. An evaluation-value calculation unit calculates, for each speech detected by the speech detection unit, an evaluation value for the speech based on an output value of an acceleration sensor of a wearable terminal other than the wearable terminal corresponding to the speech in an evaluation period from a first timing, which is at or later than the start timing of the speech and earlier than the end timing of the speech, to a second timing, which is later than the end timing of the speech.
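The evaluation window described above (starting at or after the speech start and ending after the speech end) can be sketched as follows. This is a hypothetical illustration in which the evaluation value is the average acceleration magnitude of listeners' wearables in that window, e.g. as a proxy for reactions such as nodding; the parameter names are assumptions:

```python
def evaluation_value(speech_start, speech_end, accel_samples,
                     lead=0.5, tail=2.0):
    """Evaluation value for one speech, computed from listeners'
    acceleration samples (time, magnitude) over an evaluation period
    from a first timing (at or after the start, before the end) to a
    second timing (later than the end)."""
    t1 = speech_start + lead   # first timing
    t2 = speech_end + tail     # second timing
    window = [mag for t, mag in accel_samples if t1 <= t <= t2]
    return sum(window) / len(window) if window else 0.0
```

Extending the window past the end of the speech lets reactions that follow the speech contribute to its evaluation.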
INFORMATION PROCESSING SYSTEM, INFORMATION PROCESSING METHOD, AND PROGRAM
An information processing system which enables more appropriate analysis of a state of communication in which a plurality of participants participate is provided. The information processing apparatus includes an utterance judgement unit and an analysis section determination unit. The utterance judgement unit judges an utterance section, which is a time-series section of an utterance of each of the plurality of participants in the communication. The analysis section determination unit uses a reference analysis section to set boundaries of a plurality of actual analysis sections so that the time points of those boundaries become times corresponding to non-utterance times.
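The boundary-setting step can be sketched as below. This is a minimal illustration under the assumption that non-utterance times are the midpoints of gaps between judged utterance sections, and that each reference boundary is snapped to the nearest such time; the function name and representation are hypothetical:

```python
def snap_boundaries(reference_boundaries, utterance_sections):
    """Move each reference analysis-section boundary to the nearest
    non-utterance time (midpoint of a gap between utterance sections)."""
    sections = sorted(utterance_sections)
    gaps = [(a_end + b_start) / 2
            for (_, a_end), (b_start, _) in zip(sections, sections[1:])
            if b_start > a_end]
    if not gaps:
        return list(reference_boundaries)
    return [min(gaps, key=lambda g: abs(g - b)) for b in reference_boundaries]
```

Snapping boundaries to silent moments keeps each actual analysis section from cutting an utterance in half.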