Patent classifications
H04M2203/5072
Shared Speakerphone System for Multiple Devices in a Conference Room
A speakerphone system is shared with multiple participant devices of participants in a physical meeting that are using a web conferencing service. An active speaker is identified from the participants. The participant device of the active speaker is switched, such that the speakerphone system receives and renders audio of the active speaker. Video of the participant device of the active speaker is enabled, such that the web conferencing service displays the video to the participant devices.
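The switching step in this abstract can be illustrated with a minimal sketch. It assumes each device reports a short-term audio level and that the loudest device above a threshold is treated as the active speaker; the class, field names, and threshold are hypothetical and do not come from the patent.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ParticipantDevice:
    name: str
    audio_level: float  # hypothetical short-term microphone energy, 0.0 to 1.0
    feeds_speakerphone: bool = False
    video_enabled: bool = False


def switch_active_speaker(devices: List[ParticipantDevice],
                          threshold: float = 0.1) -> Optional[ParticipantDevice]:
    """Attach the shared speakerphone to the loudest device and enable its video.

    Assumption: "active speaker" means the highest audio level above threshold.
    Returns None and leaves routing unchanged when nobody is speaking.
    """
    if not devices:
        return None
    loudest = max(devices, key=lambda d: d.audio_level)
    if loudest.audio_level < threshold:
        return None
    for device in devices:
        # Only one device at a time is associated with the speakerphone.
        device.feeds_speakerphone = device is loudest
    loudest.video_enabled = True
    return loudest
```

A real system would likely smooth the levels over time so that the speakerphone does not flip between devices on every short pause.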
Systems and methods for filtering unwanted sounds from a conference call using voice synthesis
To filter unwanted sounds from a conference call, a first voice signal is captured by a first device during a conference call and converted into corresponding text, which is then analyzed to determine that a first portion of the text was spoken by a first user and a second portion of the text was spoken by a second user. If the first user is relevant to the conference call while the second user is not, the first voice signal is prevented from being transmitted into the conference call, the first portion of text is converted into a second voice signal using a voice profile of the first user to synthesize the voice of the first user, and the second voice signal is then transmitted into the conference call. The second portion of text is not converted into a voice signal, as the second user is determined not to be relevant.
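The text-based filtering flow described in this abstract might be sketched as follows. The sketch assumes speech-to-text and speaker attribution have already produced (speaker, text) segments, and it takes the synthesizer as a caller-supplied function; `filter_and_resynthesize`, `Segment`, and the `synthesize` callback are hypothetical names, not part of the patent.

```python
from typing import Callable, Iterable, List, Set, Tuple, TypeVar

Audio = TypeVar("Audio")
Segment = Tuple[str, str]  # (speaker_id, transcribed_text), assumed already diarized


def filter_and_resynthesize(segments: Iterable[Segment],
                            relevant_speakers: Set[str],
                            synthesize: Callable[[str, str], Audio]) -> List[Audio]:
    """Return synthesized audio only for text spoken by relevant speakers.

    The captured signal itself is never forwarded; only the output of
    synthesize(speaker_id, text) would be transmitted into the call.
    """
    outgoing = []
    for speaker_id, text in segments:
        if speaker_id in relevant_speakers:
            # Re-create the relevant user's speech from that user's voice profile.
            outgoing.append(synthesize(speaker_id, text))
        # Text from speakers who are not relevant is dropped, not synthesized.
    return outgoing
```

Passing the synthesizer in as a parameter keeps the sketch independent of any particular text-to-speech engine.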
Systems and methods for filtering unwanted sounds from a conference call
To filter unwanted sounds from a conference call, a voice profile of a first user is generated based on a first voice signal captured by a media device during a first conference call. The voice profile may be generated by identifying a base frequency of the first voice signal and determining a plurality of voice characteristics, such as pitch, intonation, accent, loudness, and speech rate. These data may be stored in association with the first user. During a second conference call, a second voice signal captured by the media device is analyzed to determine, based on the voice profile of the first user, whether the second voice signal includes the voice of a second user. If so, the second voice signal is prevented from being transmitted into the conference call. A voice profile of the second user may be generated from the second voice signal for future use.
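As an illustration of the profile comparison described in this abstract, the sketch below assumes the voice characteristics have already been extracted as numbers. It uses only three of them, and the relative-tolerance test is an assumed stand-in for whatever matching the patent actually claims; all names and values are hypothetical.

```python
from dataclasses import dataclass
from statistics import mean
from typing import List


@dataclass(frozen=True)
class VoiceFeatures:
    base_frequency_hz: float
    loudness_db: float
    speech_rate_wps: float  # words per second


def build_profile(samples: List[VoiceFeatures]) -> VoiceFeatures:
    """Average the features observed during a first call into a stored profile."""
    return VoiceFeatures(
        mean(s.base_frequency_hz for s in samples),
        mean(s.loudness_db for s in samples),
        mean(s.speech_rate_wps for s in samples),
    )


def is_other_speaker(profile: VoiceFeatures, observed: VoiceFeatures,
                     tolerance: float = 0.15) -> bool:
    """True if any feature deviates from the profile by more than `tolerance`.

    Assumption: a relative deviation above 15% on any one feature means the
    signal contains a voice other than the profiled user's.
    """
    pairs = [
        (profile.base_frequency_hz, observed.base_frequency_hz),
        (profile.loudness_db, observed.loudness_db),
        (profile.speech_rate_wps, observed.speech_rate_wps),
    ]
    return any(abs(obs - ref) > tolerance * abs(ref) for ref, obs in pairs)
```

When `is_other_speaker` returns True, the signal would be withheld from the call, and the observed features could seed a profile for the second user.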
LEVERAGING A NETWORK OF MICROPHONES FOR INFERRING ROOM LOCATION AND SPEAKER IDENTITY FOR MORE ACCURATE TRANSCRIPTIONS AND SEMANTIC CONTEXT ACROSS MEETINGS
In conventional audio and video conferencing, connecting the devices of participants in the same room to the conference can degrade the audio quality for every conference participant. Different speakers emit the same audio signals at different times, even in the same room, making echo cancellation difficult or impossible. Routing signals among devices in the same room consumes bandwidth and introduces variable latency. The inventive conferencing technology eliminates these problems with more intelligent routing and mixing. An inventive conference bridge organizes colocated clients into groups, picks one Elected Speaker per group, and sends signals to only the Elected Speakers. The Elected Speakers mix audio from other groups, share it within their groups using low-latency local connections, and play the audio after a delay. The other speakers may play the audio too and use the distributed mixes for automatic echo cancellation, improving call quality in real time, and send the processed audio directly back to the bridge.
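The grouping and election step can be sketched as below. The abstract does not say how the bridge chooses an Elected Speaker, so picking the client with the lowest measured latency is purely an assumption, as are the function name and the tuple layout.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Client = Tuple[str, str, float]  # (client_id, room_id, latency_to_bridge_ms)


def elect_speakers(clients: Iterable[Client]) -> Dict[str, str]:
    """Group colocated clients by room and elect one speaker per group.

    Assumption: the client with the lowest latency to the bridge is elected;
    ties are broken by client_id order.
    """
    groups: Dict[str, List[Tuple[float, str]]] = defaultdict(list)
    for client_id, room_id, latency_ms in clients:
        groups[room_id].append((latency_ms, client_id))
    # The bridge would send remote audio only to the elected client in each room.
    return {room_id: min(members)[1] for room_id, members in groups.items()}
```

For example, `elect_speakers([("a", "room1", 40.0), ("b", "room1", 25.0)])` elects `"b"` for `"room1"`, and the bridge would then route remote audio to that client alone.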
CALL AUDIO MIXING PROCESSING
A call audio mixing processing method is provided. In the method, call audio streams from terminals of call members participating in a call are obtained. Voice analysis is performed on the call audio streams to determine voice activity corresponding to each of the terminals. The voice activity of the terminals indicates activity levels of the call members participating in the call. According to the voice activity of the terminals, respective voice adjustment parameters corresponding to the terminals are determined. According to the respective voice adjustment parameters corresponding to the terminals, the call audio streams of the terminals are adjusted. Further, mixing processing is performed on the adjusted call audio streams to obtain a mixed audio stream.
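The pipeline in this abstract (voice analysis, adjustment parameters, adjustment, mixing) might look like the sketch below. Mean absolute amplitude as the activity measure, gains scaled against the most active terminal, and the 0.2 gain floor are all assumptions chosen for illustration, not details from the patent.

```python
from typing import Dict, List


def voice_activity(samples: List[float]) -> float:
    """Assumed activity measure: mean absolute amplitude of one frame."""
    return sum(abs(s) for s in samples) / len(samples) if samples else 0.0


def adjustment_gains(activity: Dict[str, float], floor: float = 0.2) -> Dict[str, float]:
    """Map each terminal's activity to a gain relative to the most active terminal."""
    peak = max(activity.values(), default=0.0)
    if peak == 0.0:
        return {terminal: floor for terminal in activity}
    return {terminal: max(floor, level / peak) for terminal, level in activity.items()}


def mix_streams(streams: Dict[str, List[float]], floor: float = 0.2) -> List[float]:
    """Scale each stream by its gain, sum sample by sample, and clip to [-1, 1]."""
    activity = {terminal: voice_activity(s) for terminal, s in streams.items()}
    gains = adjustment_gains(activity, floor)
    length = min((len(s) for s in streams.values()), default=0)
    mixed = []
    for i in range(length):
        total = sum(gains[terminal] * streams[terminal][i] for terminal in streams)
        mixed.append(max(-1.0, min(1.0, total)))
    return mixed
```

With this weighting, a terminal carrying only low-level background noise is attenuated to the floor gain rather than muted outright, so a member who starts speaking is not cut off.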
Dynamically controlled aspect ratios for communication session video streams
The disclosed techniques improve user engagement and promote efficient use of computing resources by providing dynamically controlled aspect ratios for communication session renderings based on a physical orientation of a device. In some configurations, a system can select a first aspect ratio for individual video streams of a communication session when a device is in a first orientation, e.g., a portrait orientation. In addition, the system can select a second aspect ratio for the individual video streams when the device is in a second orientation, e.g., a landscape orientation. In some configurations, the first aspect ratio can be greater than the second aspect ratio, or the aspect ratios can be selected based on a target aspect ratio, which can be adjusted over time. By dynamically selecting an aspect ratio for individual stream renderings, screen space of a device can be optimized while the device is held in various physical orientations.
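The orientation-dependent selection can be illustrated with a short sketch. The specific ratios (16:9 in portrait, 4:3 in landscape) are hypothetical values chosen only so that the first ratio is greater than the second, as the abstract allows; the function and table names are likewise assumptions.

```python
from fractions import Fraction
from typing import Tuple

# Hypothetical width:height ratios per device orientation.
ASPECT_RATIOS = {
    "portrait": Fraction(16, 9),
    "landscape": Fraction(4, 3),
}


def tile_size(orientation: str, tile_width: int) -> Tuple[int, int]:
    """Return (width, height) in pixels of one video stream rendering."""
    ratio = ASPECT_RATIOS[orientation]
    # Height follows from width divided by the width:height ratio.
    return tile_width, round(tile_width / ratio)
```

Under these assumed values, a 320-pixel-wide tile is 180 pixels tall in portrait and 240 pixels tall in landscape, so more tiles can be stacked vertically when the device is held upright.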
Mediated multi party electronic conference system
An AI-based moderator system for an electronic conference. The moderator scores users based on ratings and diversity, and attempts to keep a highly rated person talking while maintaining diversity.