Patent classifications
G10L15/083
Generation of training data for verbal harassment detection
In some cases, one or more heuristics can be automatically generated using a small dataset of segments previously labeled by one or more domain experts. The generated one or more heuristics along with one or more patterns can be used to assign training labels to a large unlabeled dataset of segments. A subset of segments representing an occurrence of verbal harassment can be selected using the assigned training labels. Randomly selected segments can be used as being indicative of a non-occurrence of verbal harassment. The selected subset of segments and randomly selected segments can be used to train one or more machine learning models for verbal harassment detection.
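The labeling flow in this abstract can be illustrated with a minimal Python sketch. The patterns, the shouting heuristic, and all function names are hypothetical; the abstract does not specify them. Drawing the random negatives only from segments that were not selected as positive is also an assumption of this sketch.

```python
import random
import re

# Hypothetical patterns; the abstract does not list any.
PATTERNS = [re.compile(r"\bshut up\b", re.I), re.compile(r"\bidiot\b", re.I)]


def heuristic_shouting(segment):
    """Hypothetical heuristic: mostly upper-case text suggests shouting."""
    letters = [c for c in segment if c.isalpha()]
    if not letters:
        return False
    return sum(c.isupper() for c in letters) / len(letters) > 0.8


def assign_label(segment):
    """Return 1 if any pattern or heuristic fires on the segment, else 0."""
    if any(p.search(segment) for p in PATTERNS):
        return 1
    return 1 if heuristic_shouting(segment) else 0


def build_training_set(segments, n_negative, seed=0):
    """Pair heuristic-selected positives with randomly selected negatives."""
    positives = [s for s in segments if assign_label(s) == 1]
    # Assumption: negatives are sampled from the segments not selected above.
    pool = [s for s in segments if assign_label(s) == 0]
    rng = random.Random(seed)
    negatives = rng.sample(pool, min(n_negative, len(pool)))
    return [(s, 1) for s in positives] + [(s, 0) for s in negatives]
```

The resulting list of (segment, label) pairs is the kind of input a downstream classifier would be trained on; the classifier itself is outside the scope of this sketch.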
REAL-TIME NAME MISPRONUNCIATION DETECTION
A real-time name mispronunciation detection feature can give a user instant feedback whenever they mispronounce another person's name in an online meeting. The feature can receive audio input of a speaker and obtain a transcript of the audio input; identify a name from text of the transcript based on names of meeting participants; and extract a portion of the audio input corresponding to the name identified from the text of the transcript. The feature can obtain a reference pronunciation for the name using a user identifier associated with the name, and can obtain a pronunciation score for the name based on a comparison between the reference pronunciation for the name and the portion of the audio input corresponding to the name. The feature can then determine whether the pronunciation score is below a threshold and, if it is, notify the speaker of a pronunciation error.
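The steps in this abstract can be sketched as follows. This is only an illustration under stated assumptions: pronunciations are represented as phoneme lists, the score is a generic sequence similarity from Python's `difflib` (the abstract does not say how the score is computed), and the function names, the data layout, and the 0.8 threshold are hypothetical.

```python
from difflib import SequenceMatcher


def find_names(transcript, participants):
    """Return participant names that appear as words in the transcript."""
    words = [w.strip(".,!?").lower() for w in transcript.split()]
    return [name for name in participants if name.lower() in words]


def pronunciation_score(reference, spoken):
    """Stand-in score in [0, 1]: similarity of two phoneme sequences."""
    return SequenceMatcher(None, reference, spoken).ratio()


def find_mispronounced(transcript, participants, references, spoken, threshold=0.8):
    """participants maps name -> user id; references maps user id -> phonemes;
    spoken maps name -> phonemes extracted from the speaker's audio."""
    errors = []
    for name in find_names(transcript, participants):
        reference = references[participants[name]]  # lookup by user identifier
        if pronunciation_score(reference, spoken[name]) < threshold:
            errors.append(name)
    return errors
```

A caller would notify the speaker for each name returned by `find_mispronounced`.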
DYNAMIC CONTEXT EXTRACTION FROM MEDIA STREAMS
A method of enabling a virtual assistant (VA) serving a user to dynamically acquire contextual information regarding a digital media environment accessed by the user includes: extracting, by an analysis engine, the contextual information dynamically from at least one of media content accessed by the user and webpage content accessed by the user; and injecting, by the analysis engine, the extracted contextual information into a VA memory to serve the user. The analysis engine is configured to analyze the extracted contextual information using at least one machine learning (ML) model. The extracted contextual information includes at least one of topics, intents, entities, sentiments, and products of interest. The at least one ML model includes at least one of Automatic Speech Recognition (ASR), Natural Language Understanding (NLU), speaker diarization, sentiment analysis on media streams, and web analytics for product focus.
Context aware beamforming of audio data
Implementations disclosed herein are directed to initializing and utilizing a beamformer in processing of audio data received at a computing device. The computing device can: receive audio data that captures a spoken utterance of a user; determine that a first audio data segment of the audio data includes one or more particular words or phrases; obtain a preceding audio data segment that precedes the first audio data segment; estimate a spatial correlation matrix based on the first audio data segment and based on the preceding audio data segment; initialize the beamformer based on the estimated spatial correlation matrix; and cause the initialized beamformer to be utilized in processing of at least a second audio data segment of the audio data. Additionally, or alternatively, the computing device can transmit the spatial correlation matrix to server(s), and the server(s) can transmit the initialized beamformer back to the computing device.
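As a rough illustration of the estimation step, a spatial correlation matrix for one frequency bin is commonly computed as the average of the outer products x x^H over the frames of a segment, where x holds one complex value per microphone. The sketch below applies that standard estimator to each segment separately. How the two estimates are combined and how the beamformer is initialized from them are not described in the abstract and are not shown; the function name and toy data are hypothetical.

```python
def spatial_correlation(frames):
    """Average outer product x x^H over frames, for one frequency bin.

    frames: non-empty list of frames, each a list of complex values
    (one per microphone).
    """
    n_mics = len(frames[0])
    acc = [[0j] * n_mics for _ in range(n_mics)]
    for x in frames:
        for i in range(n_mics):
            for j in range(n_mics):
                acc[i][j] += x[i] * x[j].conjugate()
    return [[v / len(frames) for v in row] for row in acc]


# Hypothetical toy data: two microphones, two frames per segment.
preceding_segment = [[0.1 + 0j, 0.0 + 0.1j], [0.0 + 0.1j, 0.1 + 0j]]
hotword_segment = [[1 + 0j, 0 + 1j], [1 + 0j, 0 + 1j]]

r_preceding = spatial_correlation(preceding_segment)
r_hotword = spatial_correlation(hotword_segment)
```

Each result is a Hermitian matrix, so the entry at row i, column j is the complex conjugate of the entry at row j, column i.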
Annotation of media files with convenient pause points
A computer-implemented method, a computer system and a computer program product annotate media files with convenient pause points. The method includes acquiring a text file version of an audio narration file. The text file version includes a pause point history of a plurality of prior users. The method also includes generating a list of pause points based on the pause point history. In addition, the method includes determining a tone of voice being used by a speaker at each pause point using natural language processing algorithms. The method further includes determining a set of convenient pause points based on the list of pause points and the determined tone of voice. Lastly, the method includes inserting the determined set of convenient pause points into the audio narration file.
SYSTEMS AND METHODS FOR DETERMINING USAGE INFORMATION
Systems and methods are described for determining usage information. A computing device may determine an advertising event associated with content. The computing device may cause activation of a data capture component to capture data at one or more times associated with the advertising event. The data can be analyzed to determine usage information indicative of user behavior during the advertising event.
Voice communication targeting user interface
User interfaces may enable users to initiate voice communications with voice-controlled devices via a Wi-Fi network or other network via an Internet Protocol (IP) address. The user interfaces may include controls to enable users to initiate voice communications, such as Voice over Internet Protocol (VoIP) calls, with devices that do not have connectivity with traditional mobile telephone networks, such as traditional circuit transmissions of a Public Switched Telephone Network (PSTN). For example, the user interface may enable initiating a voice communication with a voice-controlled device that includes network connectivity via a home Wi-Fi network. The user interfaces may indicate availability of devices and/or contacts for voice communications and/or recent activity of devices or contacts.
Electronic device for performing voice recognition using microphones selected on basis of operation state, and operation method of same
Various embodiments of the present invention relate to an electronic device for performing voice recognition using microphones selected on the basis of the operation state, and an operation method of the same. According to an embodiment, the electronic device includes: one or more microphone arrays which include a plurality of microphones; at least one processor operatively connected to the microphone arrays; and at least one memory electrically connected to the processor, wherein the memory may store instructions that, when executed, cause the processor to: receive wake-up utterances, for calling designated voice services, by using a first group of microphones among the plurality of microphones when operating in a first state; operate in a second state in response to the wake-up utterances; and receive subsequent utterances using a second group of microphones among the plurality of microphones when operating in the second state. Various other embodiments are also possible.
Determining order preferences and item suggestions
A computer system may connect to various customer-facing devices and manage or automate the order process between a retail store and the customer. The computer system may perform the dialogue and receive an order for items from the retail store and may perform quality control monitoring of the dialogue between customers and employees taking orders. The ordering system may utilize the ordered items in combination with various contextual cues to determine a customer identity which may then be linked to past orders and/or various order preferences. Based on the determined customer identity, the system may provide recommendations of additional order items or order alterations to the customer before personally identifying information has been collected from the customer. The determination of the customer identity and the determination of recommendations may be performed by machine learning algorithms that were trained on customer data and the retail store products.
Method and system of recommending accommodation for tourists using multi-criteria decision making and augmented reality
Disclosed are a method and a system of recommending an accommodation for tourists using multi-criteria decision making (MCDM) and augmented reality. The method, which is performed by a server device, includes: selecting recommendation target accommodations based on a current location of a user; selecting a plurality of recommended accommodations from among the recommendation target accommodations by MCDM based on user information including pre-registered preference information; and providing an augmented reality interface displaying information on a recommended accommodation to a user terminal.
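The two selection steps can be illustrated with a small sketch. The abstract does not name a specific MCDM technique, so this example assumes a simple weighted-sum model as a stand-in; the 5 km radius, the criteria names, and the data layout are likewise hypothetical, and the augmented reality display step is not shown.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points in kilometres."""
    r = 6371.0  # mean Earth radius in km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def select_targets(user_pos, accommodations, radius_km=5.0):
    """Step 1: keep accommodations within radius_km of the user's location."""
    return [a for a in accommodations
            if haversine_km(user_pos[0], user_pos[1], a["lat"], a["lon"]) <= radius_km]


def recommend(user_pos, accommodations, weights, top_k=3):
    """Step 2: rank targets by a weighted sum of criteria scaled to [0, 1].

    weights stands in for the user's pre-registered preference information.
    """
    targets = select_targets(user_pos, accommodations)

    def score(a):
        return sum(w * a["criteria"][c] for c, w in weights.items())

    return sorted(targets, key=score, reverse=True)[:top_k]
```

Changing the weights changes the ranking without touching the candidate data, which is the main appeal of a weighted-sum model for preference-driven recommendations.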