G10L25/72

Systems and methods for generating labeled data to facilitate configuration of network microphone devices

Systems and methods for generating training data are described herein. Pieces of metadata captured by a plurality of networked sensor systems can be received, where each piece of metadata is associated with a specific set of sensor data captured by one of the plurality of networked sensor systems and includes a set of characteristics for that specific set of captured sensor data. A probabilistic model can be generated based on the received metadata, and simulations can be performed based upon a training corpus by generating multiple scenarios, where, for each scenario, a scenario-specific version of a particular annotated sample is generated by performing a simulation using the particular annotated sample. The scenario-specific versions of annotated samples from the training corpus can be stored as a training data set on at least one network device.
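As an illustrative sketch only (not the claimed implementation), the flow above can be imagined as fitting a simple distribution over one metadata characteristic and then sampling scenarios from it to produce scenario-specific versions of each annotated sample. The `noise_level` field and the gain-based simulation are hypothetical stand-ins for whatever characteristics and simulation the actual system uses:

```python
import random

def build_probabilistic_model(metadata):
    """Fit a simple Gaussian over one characteristic (here, a
    hypothetical 'noise_level') reported in the received metadata."""
    levels = [m["noise_level"] for m in metadata]
    mean = sum(levels) / len(levels)
    var = sum((x - mean) ** 2 for x in levels) / len(levels)
    return {"mean": mean, "std": var ** 0.5}

def simulate_scenarios(model, corpus, n_scenarios, seed=0):
    """For each sampled scenario, generate a scenario-specific version
    of every annotated sample; here 'simulation' is just scaling the
    sample by a gain drawn from the model."""
    rng = random.Random(seed)
    training_set = []
    for _ in range(n_scenarios):
        gain = rng.gauss(model["mean"], model["std"])
        for sample, label in corpus:
            augmented = [gain * x for x in sample]
            training_set.append((augmented, label))
    return training_set
```

The key idea the sketch captures is that one annotated sample fans out into many scenario-specific training examples, so the stored training data set grows with the number of sampled scenarios.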

MOBILE DEVICE INFERENCE AND LOCATION PREDICTION OF A MOVING OBJECT OF INTEREST

A first set of data may be received indicating that an object of interest has been identified. A second set of data may be received indicating a first location where the object of interest was identified. The first location may correspond to a geographical area. In response to the receiving of the first set of data and the second set of data, the first location may be associated with a first transceiver base station. In response to the associating, a first list of one or more mobile devices may be obtained that are within an active range of the first transceiver base station.
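A minimal sketch of that association step, under the assumption that each base station has a known position and a registry of active devices (the station IDs, coordinates, and device IDs below are all hypothetical), might look like:

```python
import math

# Hypothetical registry: base station id -> position and active devices.
BASE_STATIONS = {
    "bs-1": {"pos": (40.0, -75.0), "devices": ["dev-a", "dev-b"]},
    "bs-2": {"pos": (41.0, -74.0), "devices": ["dev-c"]},
}

def associate_base_station(location):
    """Associate a reported (lat, lon) location with the nearest
    transceiver base station."""
    def dist(bs_id):
        lat, lon = BASE_STATIONS[bs_id]["pos"]
        return math.hypot(location[0] - lat, location[1] - lon)
    return min(BASE_STATIONS, key=dist)

def devices_in_active_range(location):
    """Given the identified location, obtain the list of mobile devices
    within the associated station's active range."""
    bs_id = associate_base_station(location)
    return list(BASE_STATIONS[bs_id]["devices"])
```

Nearest-station lookup is only one plausible reading of "associated with"; a real system would likely use cell coverage maps rather than raw distance.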

Waypoint detection for a contact center analysis system

A contact center analysis system can receive various types of communications from customers, such as audio from telephone calls, voicemails, or video conferences; text from speech-to-text translations, emails, live chat transcripts, text messages, and the like; and other media or multimedia. The system can segment the communication data using temporal, lexical, semantic, syntactic, prosodic, user, and/or other features of the segments. The system can cluster the segments according to one or more similarity measures of the segments. The system can use the clusters to train a machine learning classifier to identify one or more of the clusters as waypoints (e.g., portions of the communications of particular relevance to a user training the classifier). The system can automatically classify new communications using the classifier and facilitate various analyses of the communications using the waypoints.
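The cluster-then-classify pipeline can be sketched, very loosely, with a one-dimensional k-means over per-segment feature values and a nearest-centroid rule for labeling new segments as waypoints. The single scalar feature and two-cluster setup are simplifying assumptions, not the system's actual feature set:

```python
def cluster_segments(features, k=2, iters=10):
    """Naive 1-D k-means over per-segment feature values."""
    centroids = [min(features), max(features)][:k]
    for _ in range(iters):
        groups = [[] for _ in centroids]
        for f in features:
            idx = min(range(len(centroids)), key=lambda i: abs(f - centroids[i]))
            groups[idx].append(f)
        # Recompute each centroid; keep the old one if its group emptied.
        centroids = [sum(g) / len(g) if g else c
                     for g, c in zip(groups, centroids)]
    return centroids

def classify_waypoint(segment_feature, centroids, waypoint_idx):
    """Label a new segment as a waypoint if its nearest cluster is the
    one a user marked as relevant during training."""
    idx = min(range(len(centroids)), key=lambda i: abs(segment_feature - centroids[i]))
    return idx == waypoint_idx
```

In practice the features would be the temporal, lexical, prosodic, and other signals listed above, and the classifier would be a trained model rather than a centroid rule; the sketch only shows the cluster-to-waypoint structure.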

Speech signal processing system facilitating natural language processing using audio transduction
11705151 · 2023-07-18

Systems and methods transmit, to a user device across a network, digital communication(s), thereby facilitating displaying the digital communication(s) via a user interface of the user device. Based on a user of the user device providing user input(s) in response to the digital communication(s), response data related to physical location(s) are received, and data processing is performed thereon to determine whether additional data collection sequence(s) should be provided. Based on determining that an additional data collection sequence should be provided, a condition-specific data collection sequence is provided via the user interface to facilitate obtaining condition-specific data related to a condition at a physical location, where the condition-specific data includes audio data collected via the user device and where obtaining the condition-specific data comprises using a speech signal processing system to perform audio transduction to generate the audio data from a speech signal and facilitate performing natural language processing thereon.
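One way to picture the orchestration of that flow is the sketch below, where the prompt, recorder, transducer, and NLP stages are injected as hypothetical callables (none of these names come from the patent; the dictionary field `condition_reported` is likewise an assumption):

```python
def collect_condition_data(response, prompt_user, record_audio, transcribe, nlp):
    """Hypothetical orchestration: if the response data indicates a
    condition at a physical location, run a condition-specific
    collection sequence that records speech, transduces it to audio
    data, and applies natural language processing."""
    if not response.get("condition_reported"):
        return None  # no additional data collection sequence needed
    prompt_user("Please describe the condition at the location.")
    speech_signal = record_audio()
    audio_data = transcribe(speech_signal)  # stand-in for audio transduction
    return nlp(audio_data)
```

The point of the sketch is the branch: the condition-specific sequence is only provided when processing the response data says it should be.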

SYSTEM AND METHOD FOR IDENTIFYING ACTIVITY IN AN AREA USING A VIDEO CAMERA AND AN AUDIO SENSOR
20230222799 · 2023-07-13

Systems and methods for identifying activity in an area, even during periods of poor visibility, using a video camera and an audio sensor are disclosed. The video camera is used to identify visible events of interest, and the audio sensor is used to capture audio occurring temporally with the identified visible events of interest. A sound profile is determined for each of the identified visible events of interest based on sounds captured by the audio sensor during the corresponding identified visible event of interest. Then, during a time of poor visibility, a subsequent sound event is identified in a subsequent audio stream captured by the audio sensor. One or more sound characteristics of the subsequent sound event are compared with the sound profiles associated with each of the identified visible events of interest, and if there is a match, one or more matching sound profiles are filtered out from the subsequent audio stream.
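A toy sketch of the profile-build, match, and filter steps, assuming the "sound characteristics" are per-band energies (a simplification; the actual characteristics and matching tolerance are not specified in the abstract):

```python
def sound_profile(frames):
    """Average per-band energy over the frames captured while the
    corresponding visible event of interest was occurring."""
    n_bands = len(frames[0])
    return [sum(f[i] for f in frames) / len(frames) for i in range(n_bands)]

def matches(profile, event_bands, tol=0.2):
    """Compare a subsequent sound event's band energies with a stored
    profile, within a hypothetical tolerance."""
    return all(abs(a - b) <= tol for a, b in zip(profile, event_bands))

def filter_profile(stream_bands, profile):
    """Filter a matched profile out of the subsequent audio stream by
    subtracting its band energies (clamped at zero)."""
    return [max(0.0, s - p) for s, p in zip(stream_bands, profile)]
```

Real spectral subtraction is considerably more involved (phase, windowing, musical noise); the sketch only mirrors the abstract's three-stage structure.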

Voice vector framework for authenticating user interactions

There are provided systems and methods for a voice vector framework that authenticates user interactions. A service provider server receives user interaction data having audio data that is associated with an interaction between a user device and the service provider server. The server extracts user attributes from the audio data and obtains user account information associated with the user device. The server selects a classifier that corresponds to a select combination of features based on the user account information and applies the classifier to the user attributes. The server generates a voice vector that includes multiple scores indicating likelihoods that a respective user attribute corresponds to an attribute of the select combination of features. The server compares the voice vector to a baseline vector corresponding to a predetermined combination of features and sends a notification to an agent device with an indication of whether the user device is verified.
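The score-vector comparison at the end of that flow can be sketched as a cosine-similarity check against the baseline, with a hypothetical verification threshold (the abstract does not state how the comparison is performed, so this is one plausible reading):

```python
import math

def voice_vector(user_attributes, classifier):
    """Apply the selected classifier to each extracted attribute to
    obtain a vector of likelihood scores."""
    return [classifier(attr) for attr in user_attributes]

def verified(voice_vec, baseline_vec, threshold=0.9):
    """Compare the voice vector to the baseline vector via cosine
    similarity; the 0.9 threshold is an illustrative assumption."""
    dot = sum(a * b for a, b in zip(voice_vec, baseline_vec))
    norm = (math.sqrt(sum(a * a for a in voice_vec))
            * math.sqrt(sum(b * b for b in baseline_vec)))
    return dot / norm >= threshold if norm else False
```

The notification to the agent device would then carry the boolean (or the similarity score itself) as the verification indication.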

MEETING INCLUSION AND HYBRID WORKPLACE INSIGHTS

The disclosure herein describes a system for calculating meeting inclusion metrics, including insights and recommendations. Meeting data associated with one or more meetings attended by at least one participant remotely is converted into anonymized meeting data for inclusivity metric analysis. An inclusivity insights manager generates inclusivity metrics associated with inclusive behavior and language occurring during meetings to measure the level of inclusivity. The inclusivity metrics include attendee participation metrics measuring the amount of participation by each meeting attendee, in-person versus remote participation, and concurrent speech indicating that attendees may be talking over one another or that other interruptions are occurring during meetings. Inclusivity metric data includes insights and actionable recommendations to improve inclusivity at future meetings, provided at an individual, group, or organizational level. The inclusivity insights can also include percentage metric values, graphs, feedback, and other metric-related information for improving participation by meeting attendees.
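Two of the named metrics, per-attendee participation and concurrent speech, admit a straightforward sketch. The input shapes below (seconds of talk time per attendee, and `(attendee, start, end)` speech intervals) are assumptions for illustration, not the system's actual data model:

```python
def participation_shares(speaking_seconds):
    """Fraction of total talk time attributable to each attendee."""
    total = sum(speaking_seconds.values())
    return {attendee: secs / total for attendee, secs in speaking_seconds.items()}

def concurrent_speech(intervals):
    """Count pairs of overlapping speech intervals, i.e. attendees
    talking over one another. Each interval is (attendee, start, end)."""
    count = 0
    for i in range(len(intervals)):
        for j in range(i + 1, len(intervals)):
            _, s1, e1 = intervals[i]
            _, s2, e2 = intervals[j]
            if min(e1, e2) > max(s1, s2):  # intervals genuinely overlap
                count += 1
    return count
```

A heavily skewed `participation_shares` result, or a high overlap count, would be the kind of signal the insights manager could turn into a recommendation.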