Patent classifications
H04M3/569
AUDIO MIXING FOR DISTRIBUTED AUDIO SENSORS
A method, apparatus, and computer program product enhance the audio quality delivered to a remote user's device during a voice communication session. The apparatus can comprise a processor, a memory, and computer program code, and can be configured to receive and weight audio signals based on signal attributes so that the audio signals can be adjusted and the adjusted signals mixed to form a composite audio signal. In a method, a processor receives audio signals from audio sensors around a room, selects one or more audio signals based on audio signal attributes, and adjusts or causes the adjustment of the one or more audio signals based on the audio signal attributes. The method can also include causing the adjusted audio signals to be mixed to form a composite signal, and causing the composite signal to be communicated to a device of a remote user.
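The select, adjust, and mix steps in this abstract can be illustrated with a minimal sketch. The abstract does not say which signal attributes are used or how weights are derived, so the choice of RMS level as the attribute, the `min_rms` selection threshold, and the function names `rms` and `mix_weighted` are all assumptions made for illustration only.

```python
import math


def rms(signal):
    """Root-mean-square level of one signal (a list of float samples)."""
    if not signal:
        return 0.0
    return math.sqrt(sum(s * s for s in signal) / len(signal))


def mix_weighted(signals, min_rms=0.01):
    """Select signals whose RMS is at least min_rms (assumed attribute),
    weight each by its share of the total RMS, and sum them sample by
    sample into one composite signal."""
    levels = [rms(sig) for sig in signals]
    selected = [(sig, lvl) for sig, lvl in zip(signals, levels) if lvl >= min_rms]
    if not selected:
        return []
    total = sum(lvl for _, lvl in selected)
    # Mix only over the samples that every selected signal provides.
    length = min(len(sig) for sig, _ in selected)
    composite = [0.0] * length
    for sig, lvl in selected:
        weight = lvl / total  # louder sensors contribute more
        for i in range(length):
            composite[i] += weight * sig[i]
    return composite


# Three hypothetical room sensors: near the talker, silent, and far away.
sensors = [
    [0.5, -0.5, 0.5, -0.5],
    [0.0, 0.0, 0.0, 0.0],
    [0.1, -0.1, 0.1, -0.1],
]
composite = mix_weighted(sensors)
```

Here the silent sensor is dropped at the selection step, and the near sensor dominates the composite because its weight is proportional to its level.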
HAPTIC FEEDBACK DURING PHONE CALLS
A method for providing haptic feedback to participants of multi-party phone conversations includes opening a communications session with a conference system for at least two users, each having a user-specific communications device, a user-specific conduct measuring device, and a user-specific haptic feedback device registered with the conference system. The method analyzes content of the communications session received by the conference system through the user-specific communications device of at least one of the users, and captures status for the at least two users from data measured by their user-specific conduct measuring devices. The conference system determines whether the content of the communications session and the status of the at least two users call for input by a user through that user's communications device, and sends a feedback signal to the user-specific haptic feedback device.
AUTOMATED WRITTEN INDICATOR FOR SPEAKERS ON A TELECONFERENCE
While a teleconference is occurring, data of the teleconference is analyzed to determine first participant data associated with a first speaker and second participant data associated with a second speaker. At a different application, addition of a first speaker indicator and a second speaker indicator to a text entry of a user is caused, the first speaker indicator added concurrently with identification that the first speaker is speaking and the second speaker indicator added concurrently with identification that the second speaker is speaking. At the different application, addition of key information to a text entry of a user is caused, the key information comprising the first participant data associated with the first speaker and the second participant data associated with the second speaker.
AUDIO SIGNAL PROCESSING APPARATUS AND AUDIO SIGNAL PROCESSING METHOD
An audio signal processing method includes selecting a channel group of at least two channels according to a predetermined reference, from among audio signals of at least three channels; and controlling a gain of the audio signal of each channel of the selected channel group, according to a volume level of the audio signal of each channel of the channel group.
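As a rough illustration of the two steps (group selection, then per-channel gain control), the sketch below assumes that the "predetermined reference" is "the loudest channels by peak level" and that gain is set to bring each selected channel toward a common target peak. Neither choice comes from the abstract; `peak_level`, `select_and_level`, `target_peak`, and `max_gain` are hypothetical names.

```python
def peak_level(channel):
    """Peak absolute sample value, used here as the channel's volume level."""
    return max((abs(s) for s in channel), default=0.0)


def select_and_level(channels, group_size=2, target_peak=0.5, max_gain=4.0):
    """Pick the group_size loudest channels (assumed reference) out of at
    least three, then scale each one toward target_peak according to its
    own volume level. Returns {channel_index: adjusted_samples}."""
    if len(channels) < 3:
        raise ValueError("need audio signals of at least three channels")
    if not 2 <= group_size <= len(channels):
        raise ValueError("channel group must hold at least two channels")
    levels = [peak_level(ch) for ch in channels]
    loudest_first = sorted(range(len(channels)), key=lambda i: levels[i], reverse=True)
    group = sorted(loudest_first[:group_size])
    adjusted = {}
    for i in group:
        # Quiet channels get boosted (up to max_gain), loud ones attenuated.
        gain = min(max_gain, target_peak / levels[i]) if levels[i] > 0 else 1.0
        adjusted[i] = [gain * s for s in channels[i]]
    return adjusted


channels = [[0.1, -0.1], [1.0, -0.5], [0.25, 0.2]]
adjusted = select_and_level(channels)
```

With these inputs, channels 1 and 2 form the group; channel 1 is attenuated by a gain of 0.5 and channel 2 is boosted by a gain of 2.0, so both end up with the same peak.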
Video communication method and robot for implementing the method
Provided are a video communication method and a robot implementing the same. The robot includes a camera configured to acquire a first video for a video call, a multi-channel microphone configured to receive a sound signal, a memory storing one or more instructions, and a processor configured to execute the one or more instructions. The processor calculates positions at which a plurality of voice signals included in the sound signal are generated, calculates positions of N users appearing in the first video (here, N is an integer greater than or equal to 2), selects N voice signals generated at the same positions as the N users from among the plurality of voice signals, calculates a ratio of times during which a voice is detected from waveforms of the N voice signals in a previous time period prior to a first time point, and determines a main user of the video call at the first time point on the basis of the ratio of times.
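The final step, choosing the main user from the ratio of time during which each user's voice is detected, can be sketched as follows. The sketch assumes that localization and matching of voice signals to users have already happened, that each user's signal over the previous time period arrives as a list of per-frame energies, and that a simple energy threshold stands in for voice detection. The names `voice_ratio`, `main_user`, and `threshold` are illustrative, not taken from the patent.

```python
def voice_ratio(frame_energies, threshold=0.02):
    """Fraction of frames in the window whose energy exceeds the threshold."""
    if not frame_energies:
        return 0.0
    voiced = sum(1 for e in frame_energies if e > threshold)
    return voiced / len(frame_energies)


def main_user(frames_by_user, threshold=0.02):
    """Return (user, ratios): the user with the highest voiced-time ratio
    over the previous time period, plus every user's ratio."""
    if len(frames_by_user) < 2:
        raise ValueError("expected N >= 2 users")
    ratios = {
        user: voice_ratio(frames, threshold)
        for user, frames in frames_by_user.items()
    }
    return max(ratios, key=ratios.get), ratios


# Hypothetical frame energies for the window before the first time point.
window = {
    "alice": [0.10, 0.00, 0.20, 0.30],
    "bob": [0.00, 0.05, 0.00, 0.00],
}
speaker, ratios = main_user(window)
```

In this window alice is voiced in three of four frames and bob in one of four, so alice is selected as the main user.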
Audio assisted auto exposure
Systems, methods, and computer-readable media are provided for automatically setting exposure levels in a session based on an active participant. In some embodiments, a method can include detecting faces of one or more participants in one or more images in a captured video feed at a location of the participants illuminated at one or more illumination levels at the location. The one or more detected faces can be associated with brightness levels of the one or more participants based on the one or more illumination levels at the location. Audio input can be received for the one or more participants at the location and a first participant can be identified as an active participant using the audio input. Further, an exposure level of the captured video feed can be set based on the first participant acting as the active participant according to a brightness level in the one or more images associated with a face detection of the first participant in the one or more images.
Spatial Audio Capture & Processing
An apparatus, method, and computer program are disclosed, comprising a means for determining that a first one of a plurality of audio capture devices, which collectively contribute respective audio signals to a spatial audio signal, has entered a condition associated with not contributing to the spatial audio signal. A further means may be provided, configured, responsive to said determination, to cause removal of the one or more audio signals contributed by the first audio capture device from the spatial audio signal.
CONTEXT BASED IDENTIFICATION OF NON-RELEVANT VERBAL COMMUNICATIONS
A computer-implemented method includes identifying a first set of utterances from a plurality of utterances. The plurality of utterances is associated with a conversation and transmitted via a plurality of audio signals. The computer-implemented method further includes mining the first set of utterances for a first context. The computer-implemented method further includes determining that the first context associated with the first set of utterances is not relevant to a second context associated with the conversation. The computer-implemented method further includes dynamically muting, for at least a first period of time, a first audio signal in the plurality of audio signals corresponding to the first set of utterances. A corresponding computer system and computer program product are also disclosed.
Communicating metadata that identifies a current speaker
A computer system may communicate metadata that identifies a current speaker. The computer system may receive audio data that represents speech of the current speaker, generate an audio fingerprint of the current speaker based on the audio data, and perform automated speaker recognition by comparing the audio fingerprint of the current speaker against stored audio fingerprints contained in a speaker fingerprint repository. The computer system may communicate data indicating that the current speaker is unrecognized to a client device of an observer and receive tagging information that identifies the current speaker from the client device of the observer. The computer system may store the audio fingerprint of the current speaker and metadata that identifies the current speaker in the speaker fingerprint repository and communicate the metadata that identifies the current speaker to at least one of the client device of the observer or a client device of a different observer.
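The recognize, tag, and store flow described here can be sketched with a toy repository. The abstract does not specify how fingerprints are generated or compared, so this sketch assumes a fingerprint is a fixed-length list of floats and uses cosine similarity with a threshold as the comparison. `SpeakerRepository`, `recognize`, `tag`, and the `threshold` value are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class SpeakerRepository:
    """Toy speaker fingerprint repository (assumed structure)."""

    def __init__(self, threshold=0.9):
        self.threshold = threshold
        self.entries = []  # list of (fingerprint, metadata) pairs

    def recognize(self, fingerprint):
        """Return metadata of the best match at or above the threshold,
        or None when the current speaker is unrecognized."""
        best, best_score = None, self.threshold
        for stored, metadata in self.entries:
            score = cosine(fingerprint, stored)
            if score >= best_score:
                best, best_score = metadata, score
        return best

    def tag(self, fingerprint, metadata):
        """Store a fingerprint with tagging information from an observer."""
        self.entries.append((list(fingerprint), metadata))


repo = SpeakerRepository()
first = repo.recognize([1.0, 0.0, 0.0])           # unrecognized: ask an observer
repo.tag([1.0, 0.0, 0.0], {"name": "Dana"})       # observer supplies the tag
second = repo.recognize([0.9, 0.1, 0.0])          # later speech, same speaker
other = repo.recognize([0.0, 1.0, 0.0])           # a different voice
```

The first lookup returns `None`, which corresponds to the "current speaker is unrecognized" message sent to the observer; after tagging, similar fingerprints resolve to the stored metadata, which can then be communicated to other observers.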