Patent classifications
G10L25/78
AUTOMATED MIXING OF AUDIO DESCRIPTION
A computer-implemented method of audio processing, the method comprising: receiving audio object data and audio description data, wherein the audio object data includes a first plurality of audio objects; calculating a long-term loudness of the audio object data and a long-term loudness of the audio description data; calculating a plurality of short-term loudnesses of the audio object data and a plurality of short-term loudnesses of the audio description data; reading a first plurality of mixing parameters that correspond to the audio object data; generating a second plurality of mixing parameters based on the first plurality of mixing parameters, the long-term loudness of the audio object data, the long-term loudness of the audio description data, the plurality of short-term loudnesses of the audio object data, and the plurality of short-term loudnesses of the audio description data; generating a gain adjustment visualization corresponding to the second plurality of mixing parameters, the audio object data, and the audio description data; and generating mixed audio object data by mixing the audio object data and the audio description data according to the second plurality of mixing parameters, wherein the mixed audio object data includes a second plurality of audio objects, wherein the second plurality of audio objects correspond to the first plurality of audio objects mixed with the audio description data according to the second plurality of mixing parameters.
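The loudness-driven mixing steps in the claim can be sketched roughly as follows. This is a minimal Python sketch under assumptions the abstract does not state: loudness is approximated by plain mean-square level in dB (a real system would use ITU-R BS.1770 K-weighted, gated loudness), short-term loudness is taken over non-overlapping windows, the "first" mixing parameters are assumed to be one static gain per object, and the audio description (AD) is assumed to be carried as one extra object rather than summed into every object. The function and parameter names (second_params, mix, margin_db, max_duck_db, ad_gate_db) are hypothetical, and the gain adjustment visualization is not sketched.

```python
import math

def loudness_db(x):
    """Mean-square level in dB; a crude stand-in for ITU-R BS.1770 loudness (no K-weighting, no gating)."""
    if not x:
        return -math.inf
    ms = sum(s * s for s in x) / len(x)
    return 10 * math.log10(ms) if ms > 0 else -math.inf

def short_term_db(x, win):
    """Level of each consecutive, non-overlapping window of `win` samples."""
    return [loudness_db(x[i:i + win]) for i in range(0, len(x), win)]

def db_to_lin(db):
    return 10 ** (db / 20)

def second_params(first_gains_db, objects, ad, win,
                  ad_offset_db=0.0, margin_db=6.0, max_duck_db=-12.0, ad_gate_db=-50.0):
    """Derive hypothetical 'second' mixing parameters.

    first_gains_db: one static gain (dB) per object, the assumed 'first' parameters.
    Returns (per-object lists of per-window gains in dB, static AD gain in dB).
    """
    bed = [sum(col) for col in zip(*objects)]  # unweighted sum of all objects
    ad_lt = loudness_db(ad)
    # Long-term alignment: bring the AD to the bed's long-term level plus ad_offset_db.
    ad_gain_db = 0.0 if ad_lt == -math.inf else loudness_db(bed) + ad_offset_db - ad_lt
    bed_st = short_term_db(bed, win)
    ad_st = [a + ad_gain_db for a in short_term_db(ad, win)]
    ducks = []
    for m, a in zip(bed_st, ad_st):
        # Duck only while AD is present and the bed is not already margin_db below it.
        if a > ad_gate_db and m > a - margin_db:
            ducks.append(max(a - margin_db - m, max_duck_db))
        else:
            ducks.append(0.0)
    return [[g + d for d in ducks] for g in first_gains_db], ad_gain_db

def mix(objects, ad, gains_db, ad_gain_db, win):
    """Apply per-window gains to each object and append the level-aligned AD as one extra object."""
    out = [[s * db_to_lin(g[i // win]) for i, s in enumerate(obj)]
           for obj, g in zip(objects, gains_db)]
    out.append([s * db_to_lin(ad_gain_db) for s in ad])
    return out
```

In this sketch the program bed is pulled down only as far as needed for the AD to sit margin_db above it (never by more than max_duck_db), and windows where the AD is silent leave the objects at their original gains. The per-window gain lists are the kind of data a gain adjustment visualization could plot.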
DYNAMIC ADAPTATION OF PARAMETER SET USED IN HOT WORD FREE ADAPTATION OF AUTOMATED ASSISTANT
Hot word free adaptation of function(s) of an automated assistant is performed responsive to determining, based on gaze measure(s) and/or active speech measure(s), that a user is engaging with the automated assistant. Implementations relate to techniques for mitigating false positive and/or false negative occurrences of hot word free adaptation by using a permissive parameter set in some situation(s) and a restrictive parameter set in other situation(s). For example, the restrictive parameter set can be used when it is determined that the user is engaged in conversation with additional user(s). The permissive parameter set includes permissive parameter(s) that are more permissive than their counterpart(s) in the restrictive parameter set. A parameter set is used in determining whether condition(s) are satisfied, where those condition(s), if satisfied, indicate that the user is engaging in hot word free interaction with the automated assistant and result in adaptation of function(s) of the automated assistant.
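Switching between a permissive and a restrictive parameter set might look roughly like the following. This is a minimal Python sketch, assuming each parameter set holds a gaze-confidence threshold, an active-speech threshold, and a required number of consecutive gaze frames; the ParamSet fields, the threshold values, and the functions choose_params and should_adapt are all hypothetical and not taken from the abstract.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ParamSet:
    gaze_threshold: float    # minimum per-frame confidence that the user is looking at the assistant
    speech_threshold: float  # minimum confidence that the user is actively speaking
    min_gaze_frames: int     # consecutive gaze frames required (assumed > 0)

# Hypothetical values: each permissive parameter is easier to satisfy than its restrictive counterpart.
PERMISSIVE = ParamSet(gaze_threshold=0.6, speech_threshold=0.5, min_gaze_frames=3)
RESTRICTIVE = ParamSet(gaze_threshold=0.85, speech_threshold=0.8, min_gaze_frames=8)

def choose_params(in_conversation_with_others: bool) -> ParamSet:
    """Use the restrictive set while the user appears to be talking with other people."""
    return RESTRICTIVE if in_conversation_with_others else PERMISSIVE

def should_adapt(gaze_scores, speech_score, params: ParamSet) -> bool:
    """True if the last min_gaze_frames gaze scores and the speech score all meet the thresholds."""
    recent = gaze_scores[-params.min_gaze_frames:]
    return (len(recent) >= params.min_gaze_frames
            and all(g >= params.gaze_threshold for g in recent)
            and speech_score >= params.speech_threshold)
```

The same gaze and speech measurements can therefore trigger adaptation under the permissive set but not under the restrictive one, which is how the sketch trades false negatives against false positives.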
SYSTEM AND METHOD FOR GENERATING WRAP UP INFORMATION
A system for generating wrap-up information can learn, using natural language processing, how interactions are transformed into contact notes and outcome codes, and can generate contact notes and outcome codes for new incoming interactions by applying prediction models trained on interaction data, contact notes, and outcome codes. The system receives interaction data, including interaction audio data, interaction transcripts, associated contact notes, and associated outcome codes. The interaction transcripts are generated from previous interactions between agents and customers, and the contact notes and outcome codes are generated by agents during those interactions. The system processes the interaction data and uses it to train prediction models that analyze interaction audio data and interaction transcripts and predict appropriate contact notes and outcome codes for an interaction. Once trained, the prediction model(s) can generate appropriate contact notes and outcome codes for new interactions.
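To make the train-then-predict flow concrete, the outcome-code half can be approximated with a very simple text classifier. This is a minimal Python sketch that swaps in multinomial naive Bayes over whitespace-tokenized transcripts; the abstract does not name a model type, and the class OutcomeCodeModel, its methods, and the outcome codes used below are hypothetical. Generating free-text contact notes is not sketched.

```python
import math
from collections import Counter, defaultdict

def tokenize(text):
    return text.lower().split()

class OutcomeCodeModel:
    """Multinomial naive Bayes over transcript words; an illustrative stand-in for the prediction models."""

    def fit(self, transcripts, codes):
        self.word_counts = defaultdict(Counter)  # outcome code -> word frequencies
        self.doc_counts = Counter(codes)         # outcome code -> number of training interactions
        for text, code in zip(transcripts, codes):
            self.word_counts[code].update(tokenize(text))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        self.total_docs = len(codes)
        return self

    def predict(self, transcript):
        best, best_score = None, -math.inf
        for code, counts in self.word_counts.items():
            total = sum(counts.values())
            score = math.log(self.doc_counts[code] / self.total_docs)  # log prior
            for w in tokenize(transcript):
                # Laplace (add-one) smoothing so unseen words do not zero out a class.
                score += math.log((counts[w] + 1) / (total + len(self.vocab)))
            if score > best_score:
                best, best_score = code, score
        return best
```

For example, a model fitted on a handful of transcripts labeled REFUND or ADDRESS will label "refund my order" as REFUND. A production system would more plausibly use richer features, including the interaction audio.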
Detailed Videoconference Viewpoint Generation
A videoconference system is described that generates a video for a room including multiple videoconference participants and outputs the video as part of the videoconference. The videoconference system is configured to generate the video as including a detailed view of one of the multiple videoconference participants located in the room. To do so, the videoconference system detects user devices located in the room capable of capturing video and determines a position of each user device. The videoconference system then detects a user speaking in the room and determines a position of the active speaker. At least one of the user devices is identified as including a camera oriented for capturing the active speaker. Video content captured by one or more user devices is then processed by the videoconference system to generate a detailed view of the active speaker.
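Identifying a user device whose camera is oriented toward the active speaker can be illustrated with simple 2D geometry. This is a minimal Python sketch, assuming device positions, camera headings, and fields of view, plus the speaker position, are already known in a shared room coordinate frame; the Device fields and the functions sees and pick_camera are hypothetical, and choosing the closest capable device is an assumed tie-breaking policy, not one stated in the abstract.

```python
import math
from dataclasses import dataclass

@dataclass
class Device:
    name: str
    x: float
    y: float
    heading_deg: float  # direction the camera faces, in degrees (0 = +x axis)
    fov_deg: float      # horizontal field of view, in degrees

def sees(dev: Device, px: float, py: float) -> bool:
    """True if point (px, py) lies within the device camera's horizontal field of view."""
    bearing = math.degrees(math.atan2(py - dev.y, px - dev.x))
    diff = (bearing - dev.heading_deg + 180) % 360 - 180  # signed angle in [-180, 180)
    return abs(diff) <= dev.fov_deg / 2

def pick_camera(devices, speaker_xy):
    """Return the closest device whose camera can see the speaker, or None if none can."""
    px, py = speaker_xy
    candidates = [d for d in devices if sees(d, px, py)]
    if not candidates:
        return None
    return min(candidates, key=lambda d: math.hypot(px - d.x, py - d.y))
```

A fuller system might instead combine footage from several candidate devices, as the abstract allows, rather than picking only one.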
VOICE DRIVEN DYNAMIC MENUS
Disclosed are systems, methods, and computer-readable storage media to provide voice driven dynamic menus. One aspect disclosed is a method including receiving, by an electronic device, video data and audio data, displaying, by the electronic device, a video window, determining, by the electronic device, whether the audio data includes a voice signal, displaying, by the electronic device, a first menu in the video window in response to the audio data including a voice signal, displaying, by the electronic device, a second menu in the video window in response to a voice signal being absent from the audio data, receiving, by the electronic device, input from the displayed menu, and writing, by the electronic device, to an output device based on the received input.
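The branch between the first and second menus depends only on whether a voice signal is present in the audio data. This is a minimal Python sketch that substitutes a crude energy-based activity check for real voice detection (it cannot tell speech from music or loud noise); the frame size, threshold, menu contents, and the functions has_voice and menu_for are hypothetical.

```python
import math

VOICE_MENU = ["Add captions", "Add voice filter", "Share"]  # hypothetical first menu
NO_VOICE_MENU = ["Add music", "Add stickers", "Share"]      # hypothetical second menu

def has_voice(samples, frame=160, threshold_db=-35.0, min_frames=3):
    """Energy-based activity check: count whole frames whose mean-square level exceeds threshold_db."""
    active = 0
    for i in range(0, len(samples) - frame + 1, frame):
        chunk = samples[i:i + frame]
        ms = sum(s * s for s in chunk) / frame
        if ms > 0 and 10 * math.log10(ms) > threshold_db:
            active += 1
    return active >= min_frames

def menu_for(audio):
    """First menu when a voice signal is detected, second menu when it is absent."""
    return VOICE_MENU if has_voice(audio) else NO_VOICE_MENU
```

Requiring several active frames (min_frames) keeps a single click or pop from switching the menu.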
HEARING AUGMENTATION AND WEARABLE SYSTEM WITH LOCALIZED FEEDBACK
Aspects of the present disclosure provide techniques, including devices and systems implementing the techniques, for providing feedback to a user about an event while the user is wearing a wearable device. For example, the wearable device may provide high-quality noise-canceling audio playback to the user, which lowers the user's situational awareness. The techniques include measuring ambient sound using two or more microphones on the wearable device. The measured ambient sound is used to identify a related event worth relaying to the user. Based on the event's location attribute and sound properties, the nature and/or classification of the event may be ascertained using pattern recognition algorithms according to the user's threshold settings. Insignificant events that the user prefers to ignore are ruled out by the algorithm. Upon determining that an event merits the user's attention, the wearable device provides feedback to the user indicating the nature and location of the event.
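One common way two microphones yield a location attribute is time-difference-of-arrival estimation. This is a minimal Python sketch of that swapped-in technique (the abstract does not say how location is computed): it finds the inter-microphone delay by brute-force cross-correlation and converts it to a direction with a far-field approximation. The speed of sound, microphone spacing, and the functions estimate_delay and direction_deg are assumptions for illustration.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, approximate value in air at room temperature

def estimate_delay(left, right, max_lag):
    """Lag in samples that maximizes cross-correlation; positive means `right` lags `left`."""
    best_lag, best = 0, -math.inf
    n = len(left)
    for lag in range(-max_lag, max_lag + 1):
        s = sum(left[i] * right[i + lag] for i in range(n) if 0 <= i + lag < n)
        if s > best:
            best_lag, best = lag, s
    return best_lag

def direction_deg(left, right, fs, mic_spacing_m):
    """Far-field direction of arrival relative to broadside, in degrees.

    Positive angles mean the sound reached the left microphone first (source nearer the left side).
    """
    max_lag = int(mic_spacing_m / SPEED_OF_SOUND * fs) + 1  # largest physically possible delay, plus one sample
    tau = estimate_delay(left, right, max_lag) / fs
    ratio = max(-1.0, min(1.0, SPEED_OF_SOUND * tau / mic_spacing_m))  # clamp for asin
    return math.degrees(math.asin(ratio))
```

With only two microphones this resolves direction within a half-plane (front and back are ambiguous), which is one reason a device might use more than two microphones.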