Patent classifications
G10L25/57
PERSONAL CONTENT MANAGED DURING EXTENDED DISPLAY SCREEN RECORDING
In aspects of personal content managed during extended display screen recording, a screen recording system includes a wireless device that provides digital image content for display on an extended display device, and a screen recording session on the wireless device captures the digital image content and audio data. The wireless device implements a content control module that can determine that the screen recording session would capture personal content associated with a user of the wireless device. The content control module can initiate a private screen review mode in which the personal content is displayable on a display screen of the wireless device and is prevented from visual display on the extended display device. The content control module can also generate a shareable screen recording that includes the audio data and the digital image content displayed on the extended display device, without including the personal content.
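The routing described above can be sketched as a per-frame filter. This is a minimal illustration, not the patented implementation: the `Frame` and `RecordingResult` types, the `PERSONAL_SOURCES` rule, and `route_frames` are all hypothetical, and real personal-content detection would be far richer than checking which app produced a frame.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical rule: frames produced by these sources count as personal content.
PERSONAL_SOURCES = {"messages", "notifications", "email"}


@dataclass
class Frame:
    """One captured frame of the screen recording session (illustrative model)."""
    timestamp: float
    source: str   # app or surface that produced the frame
    image: str    # placeholder for the digital image content


@dataclass
class RecordingResult:
    device_view: List[Frame] = field(default_factory=list)       # wireless device screen
    extended_view: List[Frame] = field(default_factory=list)     # extended display device
    shareable_frames: List[Frame] = field(default_factory=list)  # shareable recording video
    shareable_audio: bytes = b""                                  # shareable recording audio


def is_personal(frame: Frame) -> bool:
    """Stand-in for the content control module's personal-content check."""
    return frame.source in PERSONAL_SOURCES


def route_frames(frames: List[Frame], audio: bytes) -> RecordingResult:
    """Private screen review mode: personal frames stay on the device screen only."""
    result = RecordingResult(shareable_audio=audio)  # audio data is kept in full
    for frame in frames:
        result.device_view.append(frame)  # the device always shows its own content
        if is_personal(frame):
            continue  # withheld from the extended display and the shareable recording
        result.extended_view.append(frame)
        result.shareable_frames.append(frame)
    return result


if __name__ == "__main__":
    frames = [Frame(0.0, "slides", "slide-1"),
              Frame(1.0, "messages", "sms-preview"),
              Frame(2.0, "slides", "slide-2")]
    rec = route_frames(frames, b"pcm-audio")
    print([f.image for f in rec.shareable_frames])  # ['slide-1', 'slide-2']
```

Keeping the device view unfiltered mirrors the abstract: the user still sees personal content locally, while the extended display and the shareable recording only ever receive the filtered stream.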
METHOD AND SYSTEM FOR SPEECH DETECTION AND SPEECH ENHANCEMENT
A method of speech detection and speech enhancement in a speech detection and speech enhancement unit of a Multipoint Conferencing Node (MCN), and a method of training the same. The method comprises receiving input audio segments; determining an acoustic environment based on input audio auxiliary information; extracting time-frequency (T-F) domain features from the received input audio segments; determining whether each of the received input audio segments is speech by inputting the T-F domain features into a speech detection classifier trained for the determined acoustic environment; when one of the received input audio segments is speech, determining whether that segment is noisy speech by inputting the T-F domain features into a noise classifier that uses a statistical generative model, trained for the determined acoustic environment, representing the probability distributions of the T-F domain features of noisy speech; and applying a noise reduction mask to the received input audio segments according to the determination of whether the received audio segment is noisy speech.
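The decision chain above (environment, speech detection, noisy-speech test, mask) can be sketched for a single segment. Everything model-related here is an assumption: a mean-power threshold stands in for the trained speech detector, a single diagonal Gaussian over log band powers stands in for the statistical generative model (a real system would more likely use a mixture model), and the mask is a simple spectral-subtraction gain. `EnvironmentModels`, `select_environment`, and `enhance_segment` are hypothetical names.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class EnvironmentModels:
    """Hypothetical per-environment parameters standing in for trained models."""
    speech_energy_threshold: float  # toy speech detector: mean band power cutoff
    noisy_mean: List[float]         # diagonal Gaussian over log band powers of noisy speech
    noisy_var: List[float]
    noisy_loglik_threshold: float   # at or above this, the segment is classed as noisy speech
    noise_power: List[float]        # per-band noise power estimate used by the mask


def select_environment(aux_info: Dict[str, str]) -> str:
    # Hypothetical: the auxiliary information names the acoustic environment directly.
    return aux_info.get("environment", "default")


def gaussian_loglik(x: List[float], mean: List[float], var: List[float]) -> float:
    """Log-likelihood of x under a diagonal Gaussian."""
    return sum(-0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
               for xi, m, v in zip(x, mean, var))


def enhance_segment(band_power: List[float], env: EnvironmentModels) -> Tuple[List[float], str]:
    """Classify one segment from its band powers and mask it if it is noisy speech."""
    features = [math.log(p + 1e-12) for p in band_power]  # log-power T-F features
    if sum(band_power) / len(band_power) < env.speech_energy_threshold:
        return list(band_power), "non-speech"
    if gaussian_loglik(features, env.noisy_mean, env.noisy_var) < env.noisy_loglik_threshold:
        return list(band_power), "clean speech"
    # Spectral-subtraction style gain per band, floored at 0.1 to limit artifacts.
    mask = [max(0.1, 1.0 - n / p) if p > 0 else 0.1
            for p, n in zip(band_power, env.noise_power)]
    return [p * g for p, g in zip(band_power, mask)], "noisy speech"


if __name__ == "__main__":
    models = {"meeting_room": EnvironmentModels(1.0, [0.0, 0.0], [1.0, 1.0], -10.0, [0.5, 0.5])}
    env = models[select_environment({"environment": "meeting_room"})]
    print(enhance_segment([2.0, 2.0], env))  # ([1.5, 1.5], 'noisy speech')
```

Selecting the environment first means every later threshold and model parameter comes from the same per-environment bundle, which is the point of training the classifiers per acoustic environment.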
Identifying task assignments
Task assignments are identified. A dataset that includes one or more electronic messages is received. Then, one or more pending tasks in the dataset are identified, and each of a plurality of people mentioned in the dataset is also identified. Then, for each pending task, one or more of the identified people are identified as potentially being assigned to complete the task, and the task is associated with those people. For each pending task, one or more of the identified people are also identified as potentially being the people for whom the task is to be completed, and the task is also associated with those people.
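The two associations (who does a task, and for whom) can be sketched with simple heuristics. This is an illustration only, not the described method: the cue-phrase regex, the "named person, else recipients" assignee rule, and the "sender is the requester" rule are all hypothetical simplifications of what would normally be learned models. `Message`, `Task`, `identify_people`, and `identify_tasks` are hypothetical names.

```python
import re
from dataclasses import dataclass, field
from typing import Iterable, List, Set


@dataclass
class Message:
    sender: str
    recipients: List[str]
    body: str


@dataclass
class Task:
    text: str
    assignees: Set[str] = field(default_factory=set)   # people who may complete the task
    requesters: Set[str] = field(default_factory=set)  # people for whom it is completed


# Hypothetical cue phrases marking a request.
TASK_CUES = re.compile(r"\b(please|can you|could you|need you to)\b", re.IGNORECASE)


def mentions(text: str, person: str) -> bool:
    return re.search(rf"\b{re.escape(person)}\b", text) is not None


def identify_people(messages: List[Message], known_people: Iterable[str]) -> Set[str]:
    """People mentioned in the dataset: senders, recipients, and named contacts."""
    people: Set[str] = set()
    for msg in messages:
        people.add(msg.sender)
        people.update(msg.recipients)
        people.update(p for p in known_people if mentions(msg.body, p))
    return people


def identify_tasks(messages: List[Message], known_people: Iterable[str]) -> List[Task]:
    people = identify_people(messages, known_people)
    tasks = []
    for msg in messages:
        for sentence in re.split(r"(?<=[.!?])\s+", msg.body.strip()):
            if not TASK_CUES.search(sentence):
                continue  # not a pending task under this heuristic
            named = {p for p in people if mentions(sentence, p)}
            # Assignees: people named in the request, otherwise the recipients.
            # Requesters: the sender, on whose behalf the task is done.
            tasks.append(Task(sentence, named or set(msg.recipients), {msg.sender}))
    return tasks


if __name__ == "__main__":
    msgs = [Message("Alice", ["Bob"], "Thanks for the notes. Please send the Q3 report by Friday."),
            Message("Bob", ["Alice", "Carol"], "Carol, could you book the room? I will bring snacks.")]
    for t in identify_tasks(msgs, ["Alice", "Bob", "Carol"]):
        print(t.text, sorted(t.assignees), sorted(t.requesters))
```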
Devices, systems, and methods for real time surveillance of audio streams and uses therefore
Various examples are provided for surveillance of an audio stream. In one example, a method includes identifying the presence or absence of a sound type of interest at a location during a time period; selecting the sound type from a library of sound type information to provide a collection of sound type information; incorporating the collection on a device proximate to the location; acquiring an audio stream from the location by the device to provide a locational audio stream; analyzing the locational audio stream to determine whether a sound type in the collection is present in the audio stream; and generating a notification to a user or computer if a sound type in the collection is present. The device can acquire and process the audio stream. In another example, a bulk sound type information library can be generated by identifying sound types of interest and including them based upon a confidence level.
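The library, collection, and monitoring steps can be sketched as three small functions. This assumes a classifier that maps an audio chunk to a (sound type, confidence) pair; `toy_classifier` is a hypothetical peak-amplitude stand-in, not a real sound-event detector, and returning an alert list in place of sending notifications is a simplification for illustration.

```python
from typing import Callable, Dict, Iterable, List, Sequence, Tuple

Classifier = Callable[[Sequence[float]], Tuple[str, float]]  # -> (sound type, confidence)


def build_bulk_library(candidates: Dict[str, Tuple[dict, float]],
                       min_confidence: float) -> Dict[str, dict]:
    """Include candidate sound types whose confidence meets the threshold."""
    return {name: info for name, (info, conf) in candidates.items() if conf >= min_confidence}


def select_collection(library: Dict[str, dict], types_of_interest: Iterable[str]) -> Dict[str, dict]:
    """Pick the sound types of interest for one location and time period."""
    return {t: library[t] for t in types_of_interest if t in library}


def monitor(stream: Iterable[Sequence[float]], collection: Dict[str, dict],
            classify: Classifier, min_confidence: float = 0.8) -> List[Tuple[int, str, float]]:
    """Scan a locational audio stream chunk by chunk and collect notifications."""
    alerts = []
    for index, chunk in enumerate(stream):
        label, confidence = classify(chunk)
        if label in collection and confidence >= min_confidence:
            alerts.append((index, label, confidence))  # would be sent to a user or computer
    return alerts


def toy_classifier(chunk: Sequence[float]) -> Tuple[str, float]:
    # Hypothetical stand-in: a loud peak is labeled as breaking glass.
    peak = max(abs(s) for s in chunk)
    return ("glass_break", 0.95) if peak > 0.9 else ("background", 0.99)


if __name__ == "__main__":
    library = build_bulk_library(
        {"glass_break": ({}, 0.92), "dog_bark": ({}, 0.85), "siren": ({}, 0.40)}, 0.7)
    collection = select_collection(library, ["glass_break", "siren"])
    print(monitor([[0.1, -0.2], [0.95, -0.3], [0.5]], collection, toy_classifier))
    # [(1, 'glass_break', 0.95)]
```

Shipping only the selected collection to the device keeps on-device matching small, which is consistent with the device acquiring and processing the stream itself.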
METHOD AND APPARATUS FOR RECOGNIZING SUBTITLE REGION, DEVICE, AND STORAGE MEDIUM
A method and an apparatus for recognizing a subtitle region, a device, and a storage medium are provided, relating to the field of computer vision technologies of artificial intelligence. The method includes: recognizing a video to obtain n candidate subtitle regions, the candidate subtitle regions being regions in which text contents are displayed in the video, and n being a positive integer; and screening the n candidate subtitle regions according to a subtitle region screening policy to obtain the subtitle region, the subtitle region screening policy being used for determining, as the subtitle region, a candidate subtitle region whose text contents have a repetition rate lower than a repetition rate threshold and the longest total display duration. By using the method, apparatus, device, and storage medium, the labor resources required for subtitle region recognition can be reduced.
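The screening policy is concrete enough to sketch directly. The abstract does not define the repetition rate, so this sketch assumes it is the fraction of text appearances in a region that repeat an earlier text (a static logo scores near 1, changing subtitles score near 0). `CandidateRegion`, `repetition_rate`, and `screen_subtitle_region` are hypothetical names, and the candidates are assumed to come from an upstream text detector.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple


@dataclass
class CandidateRegion:
    box: Tuple[int, int, int, int]   # (x, y, width, height), hypothetical layout
    texts: List[Tuple[str, float]]   # (recognized text, display duration in seconds)


def repetition_rate(region: CandidateRegion) -> float:
    """Assumed definition: share of text appearances that repeat an earlier text."""
    texts = [t for t, _ in region.texts]
    if not texts:
        return 1.0  # an empty region is never a subtitle region
    return 1.0 - len(set(texts)) / len(texts)


def total_duration(region: CandidateRegion) -> float:
    return sum(d for _, d in region.texts)


def screen_subtitle_region(candidates: Sequence[CandidateRegion],
                           repetition_threshold: float) -> Optional[CandidateRegion]:
    """Keep low-repetition regions, then pick the one displayed the longest."""
    eligible = [c for c in candidates if repetition_rate(c) < repetition_threshold]
    return max(eligible, key=total_duration) if eligible else None


if __name__ == "__main__":
    logo = CandidateRegion((10, 10, 80, 20), [("NEWS24", 5.0)] * 6)
    subs = CandidateRegion((100, 600, 1000, 60),
                           [("Hello.", 2.0), ("How are you?", 2.5), ("Fine.", 1.5)])
    title = CandidateRegion((0, 0, 500, 40), [("Chapter 1", 1.0), ("Chapter 2", 1.0)])
    print(screen_subtitle_region([logo, subs, title], 0.5).box)  # (100, 600, 1000, 60)
```

The repetition filter removes the logo even though it is on screen longest; the duration criterion then separates real subtitles from short-lived changing text such as chapter titles.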
HIERARCHICAL GENERATED AUDIO DETECTION SYSTEM
Disclosed is a hierarchical generated audio detection system comprising an audio preprocessing module, a CQCC (constant-Q cepstral coefficient) feature extraction module, an LFCC (linear frequency cepstral coefficient) feature extraction module, a first-stage lightweight coarse-level detection model, and a second-stage fine-level deep identification model. The audio preprocessing module preprocesses collected audio or video data to obtain an audio clip whose length does not exceed a limit. The audio clip is input into the CQCC feature extraction module and the LFCC feature extraction module, respectively, to obtain a CQCC feature and an LFCC feature. The CQCC feature or the LFCC feature is input into the first-stage lightweight coarse-level detection model for first-stage screening, which separates first-stage real audio from first-stage generated audio. The CQCC feature or the LFCC feature of the first-stage generated audio is then input into the second-stage fine-level deep identification model, which separates second-stage real audio from second-stage generated audio; the second-stage generated audio is identified as generated audio.
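The two-stage cascade can be sketched with the feature extractors and models passed in as callables, since real CQCC/LFCC extraction and the trained detectors are out of scope here. The choice of CQCC for stage one and LFCC for stage two is an assumption (the abstract allows either feature at each stage), and the thresholds, `preprocess`, and `hierarchical_detect` are hypothetical.

```python
from typing import Callable, List, Sequence

Feature = List[float]
Extractor = Callable[[List[float]], Feature]  # audio clip -> feature vector
Scorer = Callable[[Feature], float]           # feature -> probability audio is generated


def preprocess(samples: Sequence[float], max_samples: int) -> List[float]:
    """Cut the audio so the clip does not exceed the length limit."""
    return list(samples[:max_samples])


def hierarchical_detect(samples: Sequence[float], max_samples: int,
                        extract_cqcc: Extractor, extract_lfcc: Extractor,
                        coarse_model: Scorer, fine_model: Scorer,
                        coarse_threshold: float = 0.5, fine_threshold: float = 0.5) -> str:
    clip = preprocess(samples, max_samples)
    cqcc, lfcc = extract_cqcc(clip), extract_lfcc(clip)
    # Stage 1: lightweight coarse screening; clips judged real stop here.
    if coarse_model(cqcc) < coarse_threshold:
        return "real"
    # Stage 2: only stage-1 "generated" clips reach the heavier fine model.
    return "generated" if fine_model(lfcc) >= fine_threshold else "real"


if __name__ == "__main__":
    mean_abs = lambda clip: [sum(abs(x) for x in clip) / len(clip)]  # toy "feature"
    coarse = lambda f: f[0]                     # toy coarse score
    fine = lambda f: 1.0 if f[0] > 0.8 else 0.0  # toy fine decision
    for s in ([0.1] * 10, [0.6] * 10, [0.9] * 10):
        print(hierarchical_detect(s, 16000, mean_abs, mean_abs, coarse, fine))
    # real, real, generated
```

The cascade's benefit is cost: most genuine audio is cleared by the cheap first stage, so the deep model only runs on the suspicious fraction.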
COMPUTER-BASED TECHNIQUES FOR VISUALLY NARRATING RECORDED MEETING CONTENT
In various embodiments, a meeting narration application generates visualizations of recorded meeting data. The meeting narration application generates a first visualization of a set of parameters based on a set of transcript sentences associated with the recorded meeting data. The meeting narration application displays the first visualization and a first expanded content visualization of a first transcript sentence included in the set of transcript sentences within a graphical user interface (GUI). Subsequently, the meeting narration application receives a user event associated with the first visualization via the GUI. The meeting narration application modifies a first parameter selection associated with the set of parameters based on the user event to generate a modified parameter selection. Based on the modified parameter selection, the meeting narration application displays a first compressed content visualization of the first transcript sentence within the GUI.