Patent classifications
G10L25/00
Speech recognition method and apparatus
Disclosed is a speech recognition method and apparatus, wherein the apparatus acquires first outputs from sub-models in a recognition model based on a speech signal, acquires a second output including values corresponding to the sub-models from a classification model based on the speech signal, and recognizes the speech signal based on the first outputs and the second output.
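One plausible way to combine the two kinds of output is to treat the classification model's values as weights over the sub-models and take a weighted sum of their output distributions. The sketch below does this. It assumes that each sub-model emits a probability vector over the same label set and that the second output is a list of raw scores turned into weights with a softmax. Both assumptions, and the helper names `combine_outputs` and `recognize`, are hypothetical and not taken from the patent.

```python
import math


def softmax(scores):
    """Convert raw classification scores into weights that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def combine_outputs(first_outputs, second_output):
    """Weight each sub-model's distribution by the classification value for that sub-model.

    first_outputs: one probability vector per sub-model (all the same length).
    second_output: one raw score per sub-model (assumed format).
    """
    if len(first_outputs) != len(second_output):
        raise ValueError("need one classification value per sub-model")
    weights = softmax(second_output)
    combined = [0.0] * len(first_outputs[0])
    for w, probs in zip(weights, first_outputs):
        for i, p in enumerate(probs):
            combined[i] += w * p
    return combined


def recognize(first_outputs, second_output, labels):
    """Return the label with the highest combined score, plus the combined vector."""
    combined = combine_outputs(first_outputs, second_output)
    best = max(range(len(combined)), key=combined.__getitem__)
    return labels[best], combined
```

With equal classification scores, the sub-models contribute equally. As one score dominates, the result tends toward that single sub-model's output.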
Method for capturing and updating database entries of CRM system based on voice commands
Described herein are systems and methods for facilitating information entry and task updates to a task database in a cloud server. The task database is synchronized with a customer relationship management (CRM) system. The systems and methods described herein enable users to update the task database and enter information into it in a timely manner, so that the task database stays current. The updated database can be used to construct a suggested task set at the beginning of a period of time to meet a preset target sales value for the end of that period. In one embodiment, a method includes the operations of receiving, by one or more neural network models on a cloud server, voice instructions captured by an interactive voice response (IVR) application on a mobile device, wherein the voice instructions are to update states of one or more tasks displayed in a voice interface of the IVR application; recognizing the voice instructions and constructing appropriate texts using the one or more trained neural network models; presenting the texts to one or more voice interfaces provided by the IVR application for confirmation by a user; and storing the user-confirmed texts to a repository on the cloud server.
Machine interpretation of distress situations using body language
A system for remote body communication includes a processor configured to receive a video feed of a person performing a body communication. The processor is configured to determine whether the body communication is indicative of an emergency situation, translate the body communication to a text or audio communication, and transmit the video feed and the text or audio communication to a receiving monitoring station. The processor then receives a responsive video feed of a responsive body communication and a responsive text or audio communication indicative of an instruction related to the emergency situation, and outputs the responsive video feed of the responsive body communication and the responsive text or audio communication via an operatively connected output processor.
Target-device resolution
Systems and methods for target-device resolution are disclosed. A user may speak a user utterance requesting an action to be performed with respect to an accessory device, such as a smart-home device. The user utterance may include an identifier for the accessory device, but that identifier may not correspond to any naming indicator of an accessory device, or it may correspond to multiple naming indicators. In these cases, one or more models are used to identify which accessory device the user utterance most likely targets for operation.
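As a much simpler stand-in for the "one or more models" mentioned above, the sketch below scores each device's naming indicator against the spoken identifier using string similarity. It picks the best match, or returns `None` when the match is weak or ambiguous. It uses Python's standard `difflib.SequenceMatcher`. The `resolve_target` function and its `threshold` parameter are hypothetical, and the patent's models presumably use richer signals than string similarity.

```python
from difflib import SequenceMatcher


def resolve_target(identifier, devices, threshold=0.5):
    """Return the device name most similar to the spoken identifier, or None.

    devices: naming indicators such as "kitchen light" (hypothetical inputs).
    A weak best match, or a tie between names, returns None so a caller
    could ask the user a follow-up question instead of guessing.
    """
    spoken = identifier.lower().strip()
    scored = sorted(
        ((SequenceMatcher(None, spoken, name.lower()).ratio(), name) for name in devices),
        reverse=True,
    )
    if not scored or scored[0][0] < threshold:
        return None  # identifier matches no naming indicator well enough
    if len(scored) > 1 and scored[0][0] == scored[1][0]:
        return None  # ambiguous: several naming indicators match equally well
    return scored[0][1]
```

For example, `resolve_target("kitchen lights", ["kitchen light", "bedroom lamp"])` resolves to `"kitchen light"` even though the spoken identifier is not an exact naming indicator.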
Utilizing a dynamic memory network to track digital dialog states and generate responses
The present disclosure relates to generating digital responses based on digital dialog states generated by a neural network having a dynamic memory network architecture. For example, in one or more embodiments, the disclosed system provides a digital dialog having one or more segments to a dialog state tracking neural network having a dynamic memory network architecture that includes a set of multiple memory slots. In some embodiments, the dialog state tracking neural network further includes update gates and reset gates used in modifying the values stored in the memory slots. For instance, the disclosed system can utilize cross-slot interaction update/reset gates to accurately generate a digital dialog state for each of the segments of digital dialog. Subsequently, the system generates a digital response for each segment of digital dialog based on the digital dialog state.
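To illustrate how update and reset gates can modify values stored in memory slots, the sketch below applies a GRU-style update to every slot for each dialog segment. It assumes that each segment is already encoded as a vector `s` and that all slots share weight matrices `Wz`, `Wr`, and `Wh`. These names are hypothetical. The cross-slot interaction described in the abstract is deliberately omitted, so this shows only the per-slot gating idea, not the disclosed architecture.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(W, v):
    """Multiply matrix W (list of rows) by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def update_slot(h, s, Wz, Wr, Wh):
    """One GRU-style update of memory slot h given segment encoding s."""
    sh = s + h  # concatenation [s; h]
    z = [sigmoid(a) for a in matvec(Wz, sh)]  # update gate: how much to overwrite
    r = [sigmoid(a) for a in matvec(Wr, sh)]  # reset gate: how much old content to use
    reset_h = [ri * hi for ri, hi in zip(r, h)]
    cand = [math.tanh(a) for a in matvec(Wh, s + reset_h)]  # candidate slot value
    return [(1 - zi) * hi + zi * ci for zi, hi, ci in zip(z, h, cand)]


def track_dialog(segments, slots, Wz, Wr, Wh):
    """Update every slot for each segment; return the dialog state after each segment."""
    states = []
    for s in segments:
        slots = [update_slot(h, s, Wz, Wr, Wh) for h in slots]
        states.append([list(h) for h in slots])
    return states
```

In a trained network, the weights would be learned. A response generator would then read the state produced for each segment.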
Singing voice separation with deep U-Net convolutional networks
A system, method and computer product for training a neural network system. The method comprises applying an audio signal to the neural network system, the audio signal including a vocal component and a non-vocal component. The method also comprises comparing an output of the neural network system to a target signal, and adjusting at least one parameter of the neural network system to reduce the difference measured by the comparison, thereby training the neural network system to estimate either the vocal component or the non-vocal component. In one example embodiment, the system comprises a U-Net architecture. After training, the system can estimate the vocal or the instrumental component of an audio signal, depending on which type of component it was trained to estimate.
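The training procedure above has three steps: apply the audio, compare the output with a target, and adjust parameters to reduce the difference. The toy loop below follows those steps. In place of a U-Net, it learns a single soft-mask value per frequency bin. The mask is multiplied with a mixture magnitude spectrum to estimate the vocal component, and the model is fit by gradient descent on a squared-error loss. The per-bin mask model, the loss, and the `train_mask` name are all assumptions made to keep the example small and self-contained.

```python
import math


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def train_mask(mixtures, targets, lr=0.5, epochs=200):
    """Learn one soft-mask logit per frequency bin so that mask * mixture approximates target.

    mixtures, targets: lists of magnitude-spectrum frames (equal-length lists of floats).
    Returns the learned mask values in (0, 1) and the per-epoch mean squared error.
    """
    bins = len(mixtures[0])
    n = len(mixtures)
    logits = [0.0] * bins
    history = []
    for _ in range(epochs):
        grads = [0.0] * bins
        loss = 0.0
        for x, y in zip(mixtures, targets):  # apply the audio to the model
            for f in range(bins):
                m = sigmoid(logits[f])
                err = m * x[f] - y[f]  # compare the output with the target signal
                loss += err * err
                grads[f] += 2 * err * x[f] * m * (1 - m)  # d(err^2)/d(logit)
        history.append(loss / (n * bins))
        # adjust the parameters to reduce the comparison result
        logits = [w - lr * g / n for w, g in zip(logits, grads)]
    return [sigmoid(w) for w in logits], history
```

Training against vocal targets yields a vocal mask. Training against the instrumental part of the same mixtures yields an instrumental mask instead, which mirrors the point that the system estimates whichever component it was trained on.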
Using recurrent neural network for partitioning of audio data into segments that each correspond to a speech feature cluster identifier
Audio features, such as perceptual linear prediction (PLP) features and their time derivatives, are extracted from frames of training audio data that include speech by multiple speakers as well as silence, for example by using linear discriminant analysis (LDA). The frames are clustered into k-means clusters using distance measures, such as Mahalanobis distance measures, computed from the means and variances of the extracted audio features of the frames. A recurrent neural network (RNN) is trained on the extracted audio features of the frames and the identifiers of the k-means clusters into which the frames were clustered. The RNN is then applied to audio data to partition it into segments that each correspond to one of the cluster identifiers. Each segment can be assigned a label corresponding to one of the cluster identifiers. Speech recognition can be performed on the segments.
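The clustering step can be sketched as ordinary k-means in which the Euclidean distance is replaced by a Mahalanobis distance. The simplifying assumption here is a diagonal covariance estimated from the per-feature variances of all frames, which makes the distance a variance-scaled Euclidean distance. Full covariance matrices, PLP/LDA feature extraction, and the RNN training step are left out. The function names are hypothetical.

```python
import math
import random


def feature_variances(frames):
    """Per-dimension variance over all frames (floored to avoid division by zero)."""
    n = len(frames)
    dims = len(frames[0])
    means = [sum(f[d] for f in frames) / n for d in range(dims)]
    return [max(sum((f[d] - means[d]) ** 2 for f in frames) / n, 1e-8) for d in range(dims)]


def mahalanobis_diag(x, mu, var):
    """Mahalanobis distance under an assumed diagonal covariance."""
    return math.sqrt(sum((a - b) ** 2 / v for a, b, v in zip(x, mu, var)))


def kmeans_mahalanobis(frames, k, iters=20, seed=0):
    """Cluster feature frames; return a cluster identifier per frame and the centroids."""
    rng = random.Random(seed)
    var = feature_variances(frames)
    centroids = [list(f) for f in rng.sample(frames, k)]  # initialize from random frames
    labels = [0] * len(frames)
    for _ in range(iters):
        # assignment step: nearest centroid under the Mahalanobis distance
        labels = [min(range(k), key=lambda j: mahalanobis_diag(f, centroids[j], var)) for f in frames]
        # update step: each centroid becomes the mean of its member frames
        for j in range(k):
            members = [f for f, lab in zip(frames, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids
```

The resulting per-frame `labels` are the kind of cluster identifiers that the abstract pairs with the frame features when training the RNN.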
Real-time conversation analysis system
A conversation analysis system has a local system configured to capture conversation data associated with a conversation between an individual and a user, and a backend system electronically connected to the local system. The backend system has a suggestion system configured to receive the conversation data from the local system and to extract, by an analysis unit, information from the conversation data indicative of a mood of the individual. The suggestion system is further configured to classify, by a classifier unit, the individual based on the extracted information, including determining a current classification, and to determine, by a suggestion unit, one or more rules based at least on the current classification and on information stored in a knowledge base. The suggestion system is further configured to provide, by the suggestion unit, at least one of an alert or a suggestion to the user based on the one or more rules.
Adaptive and compensatory speech recognition methods and devices
A speech recognition method includes clustering feature vectors of training data to obtain clustered feature vectors of training data; performing an interpolation calculation on feature vectors of data to be recognized using the clustered feature vectors of training data; and inputting the interpolated feature vectors of the data to be recognized into a speech recognition model to adaptively adjust the speech recognition model. The techniques of the present disclosure improve speech recognition accuracy and adaptive processing efficiency.
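The abstract does not specify the interpolation calculation. One simple reading is linear interpolation between each feature vector to be recognized and its nearest training-cluster centroid. The sketch below implements that reading. The nearest-centroid rule (Euclidean), the interpolation weight `alpha`, and the function names are assumptions.

```python
def nearest_centroid(x, centroids):
    """Training-cluster centroid closest to x (squared Euclidean distance)."""
    return min(centroids, key=lambda c: sum((a - b) ** 2 for a, b in zip(x, c)))


def interpolate_features(test_vectors, centroids, alpha=0.3):
    """Pull each feature vector toward its nearest clustered training vector.

    alpha is a hypothetical interpolation weight: 0 leaves a vector unchanged,
    and 1 replaces it with the centroid.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    out = []
    for x in test_vectors:
        c = nearest_centroid(x, centroids)
        out.append([(1 - alpha) * a + alpha * b for a, b in zip(x, c)])
    return out
```

The interpolated vectors would then be fed to the speech recognition model. Moving them toward regions already covered by the training data is one way such a step could help the model adapt.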
Method and system for generating dynamic text responses for display after a search
A system and method for operating the same includes a language processing module generating a search request text signal and determining identified data from the search request text signal. A search module generates search results in response to the search request text signal. A dialog manager classifies the search request text signal into a response classification associated with a plurality of templates, selects a first template from the plurality of templates in response to the response classification and corrects search results in response to the identified data and the template to form a corrected response signal. A device receives and displays the corrected response signal.