G10L25/00

Device control method and electric device

A method for controlling an operation of a target device using a plurality of input devices is disclosed. The method comprises: receiving, from one of the plurality of input devices, a first operation instruction issued to the target device with a first data format; recognizing the first operation instruction and the first data format; determining that the one of the plurality of input devices is a first input device corresponding to the first data format; and providing to a user of the target device a recommendation for a second input device, a type of the second input device being different from a type of the first input device, when it is determined that a type of the first operation instruction is identical to a type of a second operation instruction received from the second input device earlier than the reception of the first operation instruction.
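
The recommendation logic described in the abstract can be sketched as a simple lookup over earlier instructions. This is an illustrative sketch only, not the claimed method; the record fields (`device_type`, `instruction_type`) and the history structure are assumptions.

```python
# Illustrative sketch: recommend a different type of input device when
# the same type of instruction was previously received from it.
# Data shapes are assumptions, not the patented implementation.

def recommend(current, history):
    """current: dict with 'device_type' and 'instruction_type'.
    history: earlier instructions, most recent last.
    Returns the earlier device type to recommend, or None."""
    for prev in reversed(history):
        if (prev["instruction_type"] == current["instruction_type"]
                and prev["device_type"] != current["device_type"]):
            return prev["device_type"]  # suggest the other input device
    return None

print(recommend(
    {"device_type": "remote", "instruction_type": "volume_up"},
    [{"device_type": "voice", "instruction_type": "volume_up"}],
))
```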

Audio processing techniques for semantic audio recognition and report generation

System, apparatus and method for determining semantic information from audio, where incoming audio is sampled and processed to extract audio features, including temporal, spectral, harmonic and rhythmic features. The extracted audio features are compared to stored audio templates that include ranges and/or values for certain features and are tagged for specific ranges and/or values. The extracted audio features most similar to one or more templates are identified from the comparison according to the tagged information. The tags are used to determine the semantic audio data, which includes genre, instrumentation, style, acoustical dynamics, and emotive descriptors for the audio signal.
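
The template-matching step described above can be sketched as scoring each stored template by how many extracted features fall inside its tagged ranges. This is a minimal illustration under assumed feature names and template shapes, not the patented technique.

```python
# Illustrative sketch of matching extracted audio features against
# tagged templates. Feature names, ranges, and the scoring rule are
# assumptions for demonstration only.

def match_templates(features, templates):
    """Return the tags of the template whose feature ranges best
    cover the extracted features.

    features  -- dict of feature name -> measured value
    templates -- list of dicts with 'ranges' (name -> (lo, hi)) and 'tags'
    """
    best_tags, best_score = {}, -1.0
    for tpl in templates:
        ranges = tpl["ranges"]
        # Score = fraction of features falling inside the template's ranges.
        hits = sum(1 for name, val in features.items()
                   if name in ranges and ranges[name][0] <= val <= ranges[name][1])
        score = hits / max(len(features), 1)
        if score > best_score:
            best_score, best_tags = score, tpl["tags"]
    return best_tags

templates = [
    {"ranges": {"tempo_bpm": (60, 90), "spectral_centroid_hz": (200, 1500)},
     "tags": {"genre": "ambient", "dynamics": "soft"}},
    {"ranges": {"tempo_bpm": (120, 180), "spectral_centroid_hz": (1500, 5000)},
     "tags": {"genre": "rock", "dynamics": "loud"}},
]
print(match_templates({"tempo_bpm": 140, "spectral_centroid_hz": 2200}, templates))
```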

Adaptively filtering a microphone signal responsive to vibration sensed in a user's face while speaking

Electronic devices and methods are disclosed that adaptively filter a microphone signal responsive to vibration that is sensed in the face of a user speaking into a microphone of the device. An electronic device can include a microphone, a vibration sensor, a vibration characterization unit, and an adaptive sound filter. The microphone generates a microphone signal that can include a user speech component and a background noise component. The vibration sensor senses vibration of the face while a user speaks into the microphone, and generates a vibration signal containing frequency components that are indicative of the sensed vibration. The vibration characterization unit generates speech characterization data that characterize at least one of the frequency components of the vibration signal that is associated with the speech component of the microphone signal. The adaptive sound filter filters the microphone signal using filter coefficients that are tuned in response to the speech characterization data to generate a filtered speech signal with an attenuated background noise component relative to the user speech component from the microphone signal.
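
Adaptive filtering with tuned coefficients, as described above, is commonly realized with an LMS-style update. The sketch below is a generic LMS adaptive filter, not the patented device: the reference signal stands in for the vibration-derived speech characterization, and the tap count and step size are arbitrary assumptions.

```python
# Minimal LMS adaptive-filter sketch: FIR coefficients are tuned from
# a reference signal so the filter output tracks the component of the
# microphone signal correlated with that reference. Signal names and
# parameters are illustrative assumptions.

def lms_filter(mic, ref, taps=4, mu=0.05):
    """Adapt FIR weights `w` by the LMS rule w += mu * e * x, where
    e is the error between the microphone sample and the filter output."""
    w = [0.0] * taps
    out = []
    for n in range(len(mic)):
        # Tap-delay line over the reference signal (zero-padded at start).
        x = [ref[n - k] if n - k >= 0 else 0.0 for k in range(taps)]
        y = sum(wk * xk for wk, xk in zip(w, x))  # filter output
        e = mic[n] - y                            # error drives adaptation
        w = [wk + mu * e * xk for wk, xk in zip(w, x)]
        out.append(y)
    return out, w
```

With a reference perfectly correlated with the microphone signal, the single tap converges toward unity gain, illustrating how the coefficients "tune" themselves to the characterized component.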

Voice enabled searching for wireless devices associated with a wireless network and voice enabled configuration thereof

A voice capturing device (e.g., smart phone, tablet, smart speaker) captures voice commands and sends them to a cloud-based voice recognition/processing engine, which converts the commands to text commands. The text commands are processed at an access point for a WiFi network. The voice commands may include search queries about particular wireless devices that are associated with the WiFi network. The access point may search the configuration and connectivity data for the WiFi network to determine which access point the wireless device is connected to and a location for that access point. The result of the search may be announced to the user via the voice capturing device. The voice activated search may be used to find wireless devices that have been misplaced or for inventory management. The voice activated commands may also include voice WiFi network configuration commands.
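
The access-point search step can be sketched as matching the converted text command against configuration/connectivity records. The record fields and phrasing below are hypothetical, chosen only to illustrate the lookup.

```python
# Hypothetical sketch of searching WiFi configuration/connectivity data
# for a device named in a text command. Record fields are assumptions.

devices = [
    {"name": "living room speaker", "ap": "ap-1", "location": "living room"},
    {"name": "hallway camera", "ap": "ap-2", "location": "upstairs hallway"},
]

def find_device(query, records):
    """Return an announcement string for the first record whose device
    name appears in the (lowercased) query, else a not-found message."""
    q = query.lower()
    for rec in records:
        if rec["name"] in q:
            return f'{rec["name"]} is connected to {rec["ap"]} ({rec["location"]})'
    return "no matching device found"

print(find_device("Where is the hallway camera?", devices))
```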

Method and system for confidential sentiment analysis

A confidential sentiment analysis method includes receiving call data, storing the call data including interaction metadata, generating a speech-to-text transcript corresponding to words spoken by one or more callers, generating an anonymized transcript by anonymizing personally identifiable words, and generating a sentiment score by analyzing the anonymized transcript. A computing system includes a processor and a memory including computer executable instructions that, when executed by the processor, cause the system to receive call data, store the call data, generate a speech-to-text transcript, generate an anonymized transcript by anonymizing personally identifiable words, and generate a sentiment score based on the anonymized transcript. A non-transitory computer readable medium contains program instructions that, when executed, cause a computer system to receive call data, store the call data, generate a speech-to-text transcript, generate an anonymized transcript by anonymizing personally identifiable words, and generate a sentiment score based on the anonymized transcript.
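
The anonymize-then-score pipeline can be sketched with regex redaction followed by a lexicon-based score. The PII patterns and sentiment lexicon below are deliberately naive placeholders, not the claimed system.

```python
import re

# Illustrative pipeline sketch: redact personally identifiable tokens,
# then score the anonymized transcript with a toy sentiment lexicon.
# Patterns and lexicon are assumptions for demonstration only.

PII_PATTERNS = [
    (re.compile(r"\b\d{3}-\d{2}-\d{4}\b"), "[SSN]"),         # US SSN shape
    (re.compile(r"\b\d{3}-\d{3}-\d{4}\b"), "[PHONE]"),       # phone shape
    (re.compile(r"\b[A-Z][a-z]+ [A-Z][a-z]+\b"), "[NAME]"),  # naive full name
]
LEXICON = {"great": 1, "happy": 1, "thanks": 1, "bad": -1, "angry": -1}

def anonymize(text):
    for pattern, token in PII_PATTERNS:
        text = pattern.sub(token, text)
    return text

def sentiment_score(text):
    words = re.findall(r"[a-z']+", text.lower())
    return sum(LEXICON.get(w, 0) for w in words)

transcript = "John Smith said the service was great, call 555-123-4567"
anon = anonymize(transcript)
print(anon, sentiment_score(anon))
```

Note the ordering: scoring runs only on the anonymized transcript, matching the confidentiality constraint in the abstract.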

Method, device and system for detecting working state of tower controller
11211070 · 2021-12-28

The present disclosure provides a method, device and system for detecting a working state of a tower controller. The method includes: collecting voice data of the tower controller and extracting a keyword from the voice data; acquiring a video image of the tower controller and determining a gaze area of the tower controller from the video image; and detecting, according to the gaze area of the tower controller and the keyword, whether the tower controller has correctly accomplished an observation action. The present disclosure implements more efficient and accurate detection of the working state of the tower controller, while ensuring the safety of an aircraft in an airport area and reducing the risk of collision with other obstacles.
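
The keyword-plus-gaze check can be sketched as a mapping from spoken keywords to the area the controller is expected to observe. The keyword-to-area table is a hypothetical assumption, not taken from the disclosure.

```python
# Hypothetical sketch: decide whether an observation action was
# accomplished by comparing the detected gaze area against the area
# expected for the extracted keyword. The mapping is an assumption.

EXPECTED_AREA = {
    "cleared to land": "runway",
    "cleared for takeoff": "runway",
    "taxi to": "taxiway",
}

def observation_accomplished(keyword, gaze_area):
    """True only when the keyword maps to an expected area and the
    detected gaze area matches it."""
    expected = EXPECTED_AREA.get(keyword)
    return expected is not None and expected == gaze_area

print(observation_accomplished("cleared to land", "runway"))
```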

Audible command modification

A method and system for modifying an audible command is provided. The method includes continuously receiving audible commands associated with a context of interactions between a user and individuals. The audible commands are analyzed with respect to associated actions and user attributes of the audible commands are identified. Specified information required for executing each command of the audible commands and portions of the specified information associated with specified individuals of the individuals are determined. Digital audio samples of the user are retrieved and assigned to the portions of the specified information with respect to each command. The associated actions are modified with respect to the specified individuals and self-learning software code comprising the modified actions is generated and executed such that the commands are executed with respect to the modified actions.

Systems, methods, and computer-readable media for improved audio feature discovery using a neural network

Systems, methods, and computer-readable storage devices are disclosed for improved audio feature discovery using a neural network. One method includes: receiving a trained neural network model, the trained neural network model being configured to output an audio feature classification of audio data; deconstructing the trained neural network model to generate at least one saliency map, the at least one saliency map providing a successful classification of the audio feature; and extracting, based on the at least one saliency map, at least one visualization of the audio feature that the trained neural network model relies on for classification.
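
The core idea of a saliency map, attributing a model's output to its input features, can be sketched with finite-difference gradients on a toy model. The model below is a placeholder stand-in, not the disclosed neural network.

```python
# Toy saliency sketch: approximate which input features a model relies
# on by finite-difference sensitivity of its output to each feature.
# The model and inputs are placeholder assumptions.

def saliency(model, x, eps=1e-4):
    """Return |d model / d x_i| for each feature i, approximated by a
    forward finite difference."""
    base = model(x)
    sal = []
    for i in range(len(x)):
        xp = list(x)
        xp[i] += eps
        sal.append(abs(model(xp) - base) / eps)
    return sal

# A toy "classifier score" that depends mostly on feature 1.
model = lambda x: 0.1 * x[0] + 2.0 * x[1] + 0.0 * x[2]
print(saliency(model, [1.0, 1.0, 1.0]))
```

The largest saliency value identifies the feature the model relies on most, which is the role the saliency map plays in the abstract's visualization step.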

Utilizing bi-directional recurrent encoders with multi-hop attention for speech emotion recognition
11205444 · 2021-12-21

The present disclosure relates to systems, methods, and non-transitory computer readable media for determining speech emotion. In particular, a speech emotion recognition system generates an audio feature vector and a textual feature vector for a sequence of words. Further, the speech emotion recognition system utilizes a neural attention mechanism that intelligently blends together the audio feature vector and the textual feature vector to generate attention output. Using the attention output, which includes consideration of both audio and text modalities for speech corresponding to the sequence of words, the speech emotion recognition system can apply attention methods to one of the feature vectors to generate a hidden feature vector. Based on the hidden feature vector, the speech emotion recognition system can generate a speech emotion probability distribution of emotions among a group of candidate emotions, and then select one of the candidate emotions as corresponding to the sequence of words.
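
The attention blending described above can be sketched with plain dot-product attention: the audio feature vector acts as the query over per-word textual feature vectors, and the weighted sum is the attention output. The vectors and dimensions below are illustrative assumptions, not the disclosed architecture.

```python
import math

# Sketch of blending an audio feature vector with textual feature
# vectors via dot-product attention, loosely following the abstract.
# Vector values and dimensions are assumptions.

def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(query, keys):
    """Weight each key vector by its dot-product similarity to the
    query and return the weighted sum (the attention output)."""
    scores = [sum(q * k for q, k in zip(query, key)) for key in keys]
    weights = softmax(scores)
    dim = len(keys[0])
    return [sum(w * key[d] for w, key in zip(weights, keys)) for d in range(dim)]

audio_vec = [0.9, 0.1]                 # e.g., pooled audio features (query)
text_vecs = [[1.0, 0.0], [0.0, 1.0]]   # e.g., per-word textual features (keys)
print(attend(audio_vec, text_vecs))
```

In a full system the attention output would feed a classifier over the candidate emotions; here it simply shows the audio query pulling the blend toward the more similar text vector.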