Patent classifications
H04M3/568
Method of Noise Reduction for Intelligent Network Communication
The present invention discloses a method of noise reduction for intelligent network communication, which includes the following steps: first, receiving a local sound message through a sound receiver of a communication device at the transmitting end; next, using a voice recognizer to identify the voice characteristics of the speaker; then, determining from a voice database whether there is a voice characteristic corresponding or similar to that of the speaker recognized by the voice recognizer; and finally, filtering signals other than the speaker's voice characteristic signal through a sound filter to obtain the original sound emitted by the speaker.
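The steps above can be sketched in miniature. This is an illustrative assumption, not the patent's implementation: the "voice database" is a list of per-speaker band-energy profiles, "corresponding or similar" is approximated by cosine similarity, and the "sound filter" simply zeroes frequency bands where the matched profile has no energy. All names (`match_profile`, `filter_to_speaker`) are hypothetical.

```python
def match_profile(features, database, threshold=0.8):
    """Return the best-matching stored voice profile, or None.

    Cosine similarity stands in for the patent's
    "corresponding or similar" voice-characteristic test.
    """
    best, best_score = None, 0.0
    for profile in database:
        dot = sum(a * b for a, b in zip(features, profile["features"]))
        na = sum(a * a for a in features) ** 0.5
        nb = sum(b * b for b in profile["features"]) ** 0.5
        score = dot / (na * nb) if na and nb else 0.0
        if score > best_score:
            best, best_score = profile, score
    return best if best_score >= threshold else None

def filter_to_speaker(band_energies, profile, floor=0.1):
    """Keep bands where the matched speaker's profile has energy; zero the rest."""
    return [e if p > floor else 0.0
            for e, p in zip(band_energies, profile["features"])]

# toy voice database: band-energy signatures per enrolled speaker
db = [{"name": "alice", "features": [0.9, 0.8, 0.1, 0.0]},
      {"name": "bob",   "features": [0.1, 0.2, 0.9, 0.8]}]
captured = [0.85, 0.75, 0.3, 0.2]   # captured mix of speech and noise
speaker = match_profile(captured, db)
clean = filter_to_speaker(captured, speaker)
```

Real systems would use proper speaker embeddings and spectral filtering; the point here is only the match-then-filter control flow.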
METHOD AND SYSTEM FOR VOLUME CONTROL
A method performed by a first electronic device includes, while engaged in a call with a second electronic device, initiating a joint media playback session in which the first and second electronic devices independently stream media content for synchronous playback; driving a speaker with a mix of a downlink signal of the call and an audio signal of the media content at an overall volume level; receiving a user adjustment at a single volume control for the first electronic device to reduce the overall volume level; in response to the user adjustment, applying a first gain adjustment to the downlink signal and a second gain adjustment to the audio signal; and driving the speaker with a mix of the downlink signal and the audio signal at the reduced volume level.
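A minimal sketch of the single-control, two-gain idea. The specific tapers below (a gentler curve for the call downlink so speech stays intelligible, a steeper curve that ducks media faster) are assumptions for illustration; the abstract only states that one user adjustment yields two distinct gain adjustments.

```python
def gains_for_volume(volume):
    """Map a single 0..1 volume setting to (downlink_gain, media_gain).

    One user-facing control drives two per-stream gains: speech is
    tapered gently, media more aggressively (illustrative choice).
    """
    if not 0.0 <= volume <= 1.0:
        raise ValueError("volume must be in [0, 1]")
    downlink_gain = volume ** 0.5   # gentle taper for call audio
    media_gain = volume ** 2.0      # steep taper for media audio
    return downlink_gain, media_gain

def mix(downlink, media, volume):
    """Drive the speaker with a per-stream weighted mix of both signals."""
    dg, mg = gains_for_volume(volume)
    return [d * dg + m * mg for d, m in zip(downlink, media)]
```

At a quarter volume the call downlink is attenuated to half amplitude while the media drops to one sixteenth, which is the kind of asymmetric response a single control can hide.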
Method and system for making context based announcements during a communication session
A method, system, and device to play context-based announcements to specific participants during an active communication session. The method includes receiving a context-based announcement for a participant while the participant is in the active communication session. The method also includes monitoring the communication session to determine a time when the context-based announcement should be played to the participant. The method further includes playing the context-based announcement to the participant at the determined time. The method may also include displaying to the participant a visual alert associated with the context-based announcement; the visual alert may include controls that allow the participant to select when the announcement is played.
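The monitoring-for-a-suitable-time step can be sketched as follows. The criterion used here (play at the first interval with no active speech) is an assumption; the abstract does not specify how the "determined time" is chosen.

```python
def schedule_playback(activity, announcement):
    """Given per-interval speech-activity flags from the session monitor,
    return the index of the first quiet interval at which the queued
    announcement should be played, or None if no quiet interval occurs."""
    for t, speaking in enumerate(activity):
        if not speaking:
            return t
    return None

# e.g. a participant speaks for three intervals, then pauses
activity = [True, True, True, False, True]
when = schedule_playback(activity, "You have a second incoming call")
```

A production system would monitor continuously and combine this with the visual alert's manual controls; the sketch shows only the deferral logic.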
Systems and methods for videoconferencing with spatial audio
A system may provide for the generation of spatial audio for audiovisual conferences, video conferences, etc. (referred to herein simply as “conferences”). Spatial audio may include audio encoding and/or decoding techniques in which a sound source may be specified at a location, such as on a two-dimensional plane and/or within a three-dimensional field, and/or in which a direction or target for a given sound source may be specified. A conference participant's position within a conference user interface (“UI”) may be set as the source of sound associated with the conference participant, such that different conference participants may be associated with different sound source positions within the conference UI.
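A toy version of tying sound source position to UI position, under assumptions: each participant's horizontal tile coordinate (0.0 = far left of the conference UI, 1.0 = far right) drives constant-power stereo pan gains, a common stand-in for fuller spatial rendering.

```python
import math

def pan_gains(ui_x):
    """Constant-power pan: (left_gain, right_gain) for ui_x in [0, 1].

    Squared gains sum to 1, so perceived loudness stays constant
    as a participant's tile moves across the conference UI.
    """
    theta = ui_x * math.pi / 2
    return math.cos(theta), math.sin(theta)

def spatialize(mono_samples, ui_x):
    """Render one participant's mono stream to stereo at their tile position."""
    lg, rg = pan_gains(ui_x)
    return [(s * lg, s * rg) for s in mono_samples]
```

A participant in the left-most tile comes out entirely in the left channel, one in the center equally in both, which is the differentiation effect the abstract describes.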
Audio Conferencing Using a Distributed Array of Smartphones
Described is a method of hosting a teleconference among a plurality of client devices arranged in two or more acoustic spaces, each client device having an audio capturing capability and/or an audio rendering capability, the method comprising: grouping the plurality of client devices into two or more groups based on the acoustic spaces to which they belong; receiving first audio streams from the plurality of client devices; generating second audio streams from the first audio streams for rendering by respective client devices among the plurality of client devices, based on the grouping of the plurality of client devices into the two or more groups; and outputting the generated second audio streams to respective client devices. Further described are corresponding computation devices, computer programs, and computer-readable storage media.
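One plausible reason for the grouping step is echo avoidance: a device should not re-render audio captured in its own room. The sketch below assumes exactly that policy (the abstract only says the second streams are generated "based on the grouping"); the naive sample-wise sum is a stand-in for a real mixer.

```python
def group_devices(devices):
    """devices: {device_id: room_id}; returns {room_id: [device_id, ...]}."""
    groups = {}
    for dev, room in devices.items():
        groups.setdefault(room, []).append(dev)
    return groups

def second_streams(devices, first_streams):
    """For each device, mix the first streams of all devices in OTHER rooms,
    so co-located devices never play back each other's captures."""
    out = {}
    for dev, room in devices.items():
        remote = [first_streams[d] for d, r in devices.items() if r != room]
        out[dev] = [sum(samples) for samples in zip(*remote)] if remote else []
    return out

devices = {"phoneA": "room1", "phoneB": "room1", "phoneC": "room2"}
streams = {"phoneA": [1, 0], "phoneB": [0, 1], "phoneC": [5, 5]}
returns = second_streams(devices, streams)
```

Both phones in room1 receive only room2's audio, and vice versa, which is the behavior a distributed array of co-located smartphones needs.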
AUTOMATIC GAIN CONTROL BASED ON MACHINE LEARNING LEVEL ESTIMATION OF THE DESIRED SIGNAL
A method includes receiving, at a server device, audio data from a plurality of input devices, where the audio data of each input device corresponds to a time-related portion of the audio data. The method determines a speech energy level for each input device by providing the time-related audio portion as input to a trained model. For each input device, a statistical value associated with the speech energy level is determined. A strongest input device is identified based on the statistical value. The statistical value associated with the speech energy level of each input device other than the strongest input device is compared to the statistical value of the strongest input device. Depending on the comparison, the method determines whether to update the gain value of an input device to an estimated target gain value based on the statistical value of the speech energy level of the respective input device.
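The compare-and-update logic can be sketched as below. The trained-model level estimator is replaced by a stub (mean absolute amplitude), the "statistical value" by the level itself, and the tolerance and boost cap are invented for illustration; none of these specifics come from the abstract.

```python
def estimate_speech_level(frame):
    """Stub for the trained model: mean absolute amplitude of a frame."""
    return sum(abs(s) for s in frame) / len(frame)

def update_gains(levels, gains, target, tolerance=0.5):
    """levels/gains: {device: value}. Identify the strongest device, then
    update the gain of each other device toward `target` only when its
    level statistic is clearly weaker than the strongest device's."""
    strongest = max(levels, key=levels.get)
    new_gains = dict(gains)
    for dev, level in levels.items():
        if dev == strongest:
            continue
        if 0 < level < levels[strongest] * tolerance:
            new_gains[dev] = min(target / level, 4.0)  # cap the boost
    return strongest, new_gains

levels = {"mic1": 0.8, "mic2": 0.2, "mic3": 0.7}
gains = {"mic1": 1.0, "mic2": 1.0, "mic3": 1.0}
strongest, new_gains = update_gains(levels, gains, target=0.8)
```

Only mic2 is boosted: mic3's level is close enough to the strongest device that its gain is left alone, mirroring the "depending on the comparison" condition.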
Methods, Systems, and Devices for Presenting an Audio Difficulties User Actuation Target in an Audio or Video Conference
A conferencing system terminal device includes a display, an audio output, a user interface, a communication device, and one or more processors. The one or more processors present an audio difficulties user actuation target upon the display during an audio or video conference occurring across a network and concurrently with a presentation of conference content. Actuation of the audio difficulties user actuation target indicates that audio content associated with the audio or video conference being delivered by the audio output is impaired.
Visual Interactive Voice Response
A method includes connecting a call from a client device to a destination having an interactive voice response service; transcribing audio from the destination during the call to identify menu options of the interactive voice response service; generating visualizations representing the menu options; and outputting the visualizations to a display associated with the client device. A system includes a telephony system, an automatic speech recognition processing tool, and a visualization output generation tool. The telephony system connects a call from a client device to a destination having an interactive voice response service. The automatic speech recognition processing tool transcribes audio from the destination during the call to identify menu options of the interactive voice response service. The visualization output generation tool generates visualizations representing the menu options. The telephony system outputs the visualizations to a display associated with the client device.
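The transcription-to-visualization step might look like the following. The regex and the "press N for X" phrasing are assumptions about typical IVR prompts; the abstract does not specify how menu options are identified in the transcript.

```python
import re

# matches phrases like "press 1 for billing" or "press 3 to hear this again"
MENU_RE = re.compile(r"press (\d)\s+(?:for|to)\s+([^,.]+)", re.IGNORECASE)

def extract_menu_options(transcript):
    """Return [(digit, label), ...] parsed from transcribed IVR audio."""
    return [(d, label.strip()) for d, label in MENU_RE.findall(transcript)]

def visualize(options):
    """Turn parsed options into display-ready button descriptors."""
    return [{"key": d, "label": label.capitalize()} for d, label in options]

prompt = "Press 1 for billing, press 2 for support, or press 3 to hear this again."
options = extract_menu_options(prompt)
buttons = visualize(options)
```

Each descriptor can then be rendered as a tappable button on the client device, replacing listen-and-remember navigation with a visual menu.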
AUDIO FILTER EFFECTS VIA SPATIAL TRANSFORMATIONS
An audio system of a client device applies transformations to audio received over a computer network. The transformations (e.g., HRTFs) effect changes in apparent source positions of the received audio, or of segments thereof. Such transformations may be used to achieve “animation” of audio, in which the source positions of the audio or audio segments appear to change over time (e.g., circling around the listener). Additionally, segmentation of audio into distinct semantic audio segments, and application of separate transformations for each audio segment, can be used to intuitively differentiate the different audio segments by causing them to sound as if they emanated from different positions around the listener.
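The "circling around the listener" animation reduces to a time-varying source position. A real implementation would apply HRTFs; the sketch below substitutes crude interaural gains, which is an assumption made only to keep the example self-contained.

```python
import math

def circling_azimuth(t, period=4.0):
    """Apparent source azimuth (radians) at time t: one full circle
    around the listener every `period` seconds."""
    return 2 * math.pi * (t % period) / period

def interaural_gains(azimuth):
    """Crude stand-in for an HRTF: the signal is louder in whichever
    ear the animated source position is currently nearer to."""
    left = 0.5 * (1 + math.sin(azimuth))
    right = 0.5 * (1 - math.sin(azimuth))
    return left, right
```

Evaluating `interaural_gains(circling_azimuth(t))` per audio frame makes the segment sweep from center to hard left and around, and applying different azimuth trajectories to different semantic segments yields the positional differentiation the abstract describes.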
Crosstalk data detection method and electronic device
A method and an electronic device for detecting crosstalk data are provided. The method for detecting crosstalk data can detect whether an audio data stream includes crosstalk data. The method includes: receiving a first audio data block, a second audio data block, and a reference time difference, wherein the first audio data block and the second audio data block each include a plurality of audio data segments; using the time difference between the acquisition time of an audio data segment in the first audio data block and that of the corresponding audio data segment in the second audio data block as an audio segment time difference; and determining that the audio data segment of the first audio data block includes crosstalk data when the audio segment time difference does not match the reference time difference.
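The comparison step can be sketched directly. The tolerance used to decide whether a segment time difference "does not match" the reference is an assumed parameter; the abstract leaves the matching criterion unspecified.

```python
def detect_crosstalk(times_block1, times_block2, reference_diff, tolerance=0.005):
    """Return indices of segments in the first block flagged as crosstalk.

    For each segment pair, the acquisition-time difference between the
    two blocks is compared to the reference time difference; segments
    whose difference deviates beyond `tolerance` are flagged.
    """
    flagged = []
    for i, (t1, t2) in enumerate(zip(times_block1, times_block2)):
        segment_diff = t1 - t2
        if abs(segment_diff - reference_diff) > tolerance:
            flagged.append(i)
    return flagged

# e.g. the reference says the first capture trails the second by 2 ms;
# segment 2 deviates, suggesting its sound arrived via a different path
t1 = [0.000, 0.020, 0.040, 0.060]
t2 = [0.002, 0.022, 0.030, 0.062]
crosstalk = detect_crosstalk(t1, t2, reference_diff=-0.002)
```

The intuition: a speaker's own voice reaches the two capture points with a stable delay, so a segment whose inter-block delay breaks that pattern likely originated from another source bleeding in.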