Patent classifications
G10L25/00
Context aware projection
In some embodiments, the disclosed subject matter involves a system for mapping the projection of content onto surfaces in an environment. Groups of users in the environment are identified, and surfaces in the environment are selected or assigned for projection and/or touch input based on user preferences, a ranking of surfaces by projectability or touchability, the content to be displayed, the proximity of user groups to one another and to the surfaces, and user feedback and control. Other embodiments are described and claimed.
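The surface-assignment step can be illustrated with a minimal sketch. It assumes each surface carries a single precomputed projectability rank in [0, 1] and that a group's location is a 2-D centroid. The score (rank minus a distance penalty), the `distance_weight` parameter, and the greedy one-surface-per-group policy are all hypothetical simplifications, not the claimed method.

```python
import math
from dataclasses import dataclass


@dataclass
class Surface:
    name: str
    x: float
    y: float
    projectability: float  # hypothetical precomputed rank in [0, 1]


@dataclass
class Group:
    name: str
    x: float  # hypothetical 2-D centroid of the user group
    y: float


def assign_surfaces(groups, surfaces, distance_weight=0.1):
    """Greedily give each group the free surface with the best rank-minus-distance score."""
    assignment = {}
    free = list(surfaces)
    for g in groups:
        if not free:
            break  # more groups than surfaces: later groups get nothing
        best = max(
            free,
            key=lambda s: s.projectability
            - distance_weight * math.hypot(s.x - g.x, s.y - g.y),
        )
        assignment[g.name] = best.name
        free.remove(best)  # a surface serves at most one group
    return assignment
```

With a highly ranked wall near one group and a lower-ranked table near another, the distance penalty steers each group to the nearer surface. A real system would also weigh content type, touchability, and user feedback as the abstract lists.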
Very short pitch detection and coding
System and method embodiments are provided for very short pitch detection and coding for speech or audio signals. The system and method include detecting whether there is a very short pitch lag in a speech or audio signal that is shorter than a conventional minimum pitch limitation using a combination of time domain and frequency domain pitch detection techniques. The pitch detection techniques include using pitch correlations in time domain and detecting a lack of low frequency energy in the speech or audio signal in frequency domain. The detected very short pitch lag is coded using a pitch range from a predetermined minimum very short pitch limitation that is smaller than the conventional minimum pitch limitation.
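The combined time-domain and frequency-domain test can be sketched as follows. The sketch assumes a 12.8 kHz sampling rate, a conventional minimum pitch of 34 samples, a very short minimum of 17 samples, a 400 Hz low-frequency cutoff, and the two thresholds shown. All of these values are illustrative assumptions, not values taken from the patent. A naive O(N²) DFT keeps the example dependency-free.

```python
import math


def normalized_autocorr(x, lag):
    """Normalized correlation between x[n] and x[n - lag] (time-domain pitch cue)."""
    idx = range(lag, len(x))
    num = sum(x[n] * x[n - lag] for n in idx)
    e0 = sum(x[n] ** 2 for n in idx)
    e1 = sum(x[n - lag] ** 2 for n in idx)
    return num / math.sqrt(e0 * e1) if e0 > 0 and e1 > 0 else 0.0


def low_freq_energy_ratio(x, fs, cutoff_hz):
    """Share of one-sided DFT power below cutoff_hz (frequency-domain cue)."""
    n = len(x)
    low = total = 0.0
    for k in range(n // 2 + 1):
        re = sum(x[t] * math.cos(2 * math.pi * k * t / n) for t in range(n))
        im = -sum(x[t] * math.sin(2 * math.pi * k * t / n) for t in range(n))
        p = re * re + im * im
        total += p
        if k * fs / n < cutoff_hz:
            low += p
    return low / total if total > 0 else 0.0


def detect_very_short_pitch(x, fs, min_short=17, conv_min=34,
                            corr_thr=0.8, cutoff_hz=400.0, lf_thr=0.1):
    """Return a lag in [min_short, conv_min) if both cues agree, else None."""
    best_lag, best_corr = None, -1.0
    for lag in range(min_short, conv_min):  # only lags below the conventional limit
        c = normalized_autocorr(x, lag)
        if c > best_corr:
            best_lag, best_corr = lag, c
    lacks_low_energy = low_freq_energy_ratio(x, fs, cutoff_hz) <= lf_thr
    return best_lag if best_corr >= corr_thr and lacks_low_energy else None


def code_lag(lag, min_short=17):
    """Index the lag relative to the very short minimum, not the conventional one."""
    return lag - min_short
```

A harmonic signal with a 20-sample period (640 Hz at 12.8 kHz) has strong correlation at lag 20 and almost no energy below 400 Hz, so it is flagged and coded as index 3. A 160 Hz signal fails the low-frequency test.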
Method and system for emotion-triggered capturing of audio and/or image data
The present disclosure relates to a method for emotion-triggered capturing of audio and/or image data by an audio and/or image capturing device. The method includes receiving and analyzing a time-sequential set of data, including first physiological data representing a first physiological parameter corresponding to a first person, second physiological data representing a second physiological parameter corresponding to a second person, and voice audio data including the voice of at least one of the first and second persons, to determine whether a simultaneous change in the emotional state of the first person and the second person occurs, and transmitting a trigger signal to the capturing device. The present disclosure also relates to a corresponding apparatus and a system comprising the apparatus.
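The "simultaneous change" check can be sketched on aligned, equally sampled series. This sketch assumes heart rate as the physiological parameter for both people, a per-sample voice-activity flag derived from the audio, a fixed jump threshold as the change detector, and a small tolerance window for "simultaneous". These are hypothetical stand-ins for whatever emotion analysis the disclosure actually uses.

```python
def change_points(series, delta_thr):
    """Indices where the signal jumps by at least delta_thr from the previous sample."""
    return [i for i in range(1, len(series)) if abs(series[i] - series[i - 1]) >= delta_thr]


def detect_simultaneous_change(hr_a, hr_b, voice_active, delta_thr=10.0, window=2):
    """Return sample indices at which a capture trigger would be sent.

    A trigger fires when person A's change point lies within `window` samples
    of one of person B's change points and voice activity is present at that index.
    """
    b_points = change_points(hr_b, delta_thr)
    triggers = []
    for i in change_points(hr_a, delta_thr):
        if voice_active[i] and any(abs(i - j) <= window for j in b_points):
            triggers.append(i)
    return triggers
```

Requiring agreement between two people plus voice activity is one simple way to suppress triggers caused by a single person's exertion rather than a shared emotional moment.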
Binary caching for XML documents with embedded executable code
A method, system and voice browser execute voice applications to perform a voice-based function. A document is retrieved and parsed to create a parse tree. Script code is created from the parse tree, thereby consuming part of the parse tree to create a reduced parse tree. The reduced parse tree is stored in a cache for subsequent execution to perform the voice-based function.
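The parse/compile/reduce/cache sequence can be sketched with the standard library. Two assumptions apply. The cache is keyed by a hash of the document text, and Python's built-in `compile` stands in for whatever script engine a real voice browser uses (VoiceXML scripts are ECMAScript). The `load_document` name and the in-memory dict cache are hypothetical.

```python
import hashlib
import textwrap
import xml.etree.ElementTree as ET

_cache = {}  # hypothetical in-memory cache: document hash -> (reduced tree, compiled scripts)


def load_document(source: str):
    """Parse a document, compile its <script> elements, and cache the reduced tree."""
    key = hashlib.sha256(source.encode("utf-8")).hexdigest()
    if key in _cache:
        return _cache[key]  # later executions skip parsing and compiling entirely
    root = ET.fromstring(source)  # the full parse tree
    compiled = []
    for parent in list(root.iter()):
        for child in list(parent):  # iterate over a copy so removal is safe
            if child.tag == "script":
                code_text = textwrap.dedent(child.text or "")
                compiled.append(compile(code_text, "<script>", "exec"))
                parent.remove(child)  # script nodes are consumed, leaving a reduced tree
    entry = (root, compiled)
    _cache[key] = entry
    return entry
```

The design point is that the cached entry no longer contains the script source: those nodes were turned into executable objects, so the stored tree is smaller and ready to run.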
Method and apparatus for machine translation using neural network and method of training the apparatus
A machine translation method and a machine translation apparatus using a neural network model are provided. The machine translation apparatus extracts information associated with a keyword from a source sentence, obtains a supplement sentence associated with the source sentence based on the extracted information associated with the keyword, acquires a first vector value from the source sentence and a second vector value from the supplement sentence using neural network model-based encoders, and outputs a target sentence corresponding to a translation of the source sentence based on any one or any combination of the first vector value and the second vector value using a neural network model-based decoder.
Method for detecting and recognizing an emotional state of a user
A method includes: prompting a user to recite a story associated with a first target emotion; recording the user reciting the story and recording a first timeseries of biosignal data via a set of sensors integrated into a wearable device worn by the user; accessing a first timeseries of emotion markers extracted from the voice recording; labeling the first timeseries of biosignal data according to the first timeseries of emotion markers; generating an emotion model linking biosignals to emotion markers for the user based on the first emotion-labeled timeseries of biosignal data; detecting a second instance of the first target emotion exhibited by the user based on a second timeseries of biosignal data and the emotion model; and notifying the user of the second instance of the first target emotion.
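The labeling and model-building steps can be sketched on a single scalar biosignal. This sketch makes three assumptions. Each emotion marker has a start time and stays in effect until the next marker. The biosignal is one value per timestamp (for example, heart rate). A nearest-centroid classifier stands in for the patent's unspecified emotion model. All function names are hypothetical.

```python
from bisect import bisect_right
from statistics import mean


def label_biosignals(bio, markers):
    """bio: [(t, value)]; markers: [(t_start, emotion)] sorted by t_start.

    Each sample is labeled with the most recent marker at or before its timestamp.
    """
    starts = [t for t, _ in markers]
    labeled = []
    for t, v in bio:
        i = bisect_right(starts, t) - 1
        if i >= 0:  # samples before the first marker stay unlabeled
            labeled.append((v, markers[i][1]))
    return labeled


def build_model(labeled):
    """Nearest-centroid model: the mean biosignal value per emotion label."""
    groups = {}
    for v, emotion in labeled:
        groups.setdefault(emotion, []).append(v)
    return {emotion: mean(vs) for emotion, vs in groups.items()}


def detect(model, value):
    """Return the emotion whose centroid is closest to a new biosignal value."""
    return min(model, key=lambda e: abs(model[e] - value))
```

In practice the model would use multichannel features over windows rather than single samples, but the flow is the same: voice-derived markers supply the labels, and the biosignal model is what runs later without the voice.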
Natural language generation by an edge computing device
Systems and methods for natural language generation by an edge computing device are disclosed. In one embodiment, a method comprises: receiving, by an edge computing device, event data from an edge event; determining, by the edge computing device, that a network connection to a cloud server is not available; extracting, by the edge computing device, features of the event data; predicting, by a local neural network of the edge computing device, an action for the edge computing device to take based on the features of the event data, wherein the action is associated with a confidence level; and determining, by the edge computing device, whether the confidence level meets a predetermined threshold value.
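The offline decision path can be sketched as a function that takes the local model as a parameter. The following are all assumptions made for illustration, since the abstract does not specify them: the feature extractor, the softmax over per-action scores, the `"defer_to_cloud"` result when the cloud is reachable, and the `"no_action"` result when confidence falls below the threshold. The abstract only states that the confidence is compared with a predetermined threshold.

```python
import math


def extract_features(event):
    """Hypothetical feature extractor: keep the numeric fields of the event."""
    return {k: float(v) for k, v in event.items() if isinstance(v, (int, float))}


def softmax(scores):
    """Turn raw per-action scores into probabilities (max-subtracted for stability)."""
    m = max(scores.values())
    exps = {k: math.exp(v - m) for k, v in scores.items()}
    z = sum(exps.values())
    return {k: e / z for k, e in exps.items()}


def handle_event(event, cloud_available, local_model, threshold=0.8):
    """Return (action, confidence); act locally only when the cloud is unreachable."""
    if cloud_available:
        return ("defer_to_cloud", None)  # assumed behavior when a connection exists
    probs = softmax(local_model(extract_features(event)))
    action = max(probs, key=probs.get)
    confidence = probs[action]
    if confidence >= threshold:
        return (action, confidence)
    return ("no_action", confidence)  # assumed fallback below the threshold
```

Here `local_model` is any callable mapping features to a dict of action scores, for example a stub such as `lambda f: {"alert": f["temperature"] / 10, "ignore": 0.0}` in place of a trained network.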
Low-complexity tonality-adaptive audio signal quantization
The invention provides an audio encoder for encoding an audio signal to produce an encoded signal. The audio encoder includes: a framing device configured to extract frames from the audio signal; a quantizer configured to map spectral lines of a spectrum signal derived from a frame of the audio signal to quantization indices, wherein the quantizer has a dead-zone in which input spectral lines are mapped to quantization index zero; and a control device configured to modify the dead-zone. The control device includes a tonality calculating device configured to calculate at least one tonality indicating value for at least one spectral line or for at least one group of spectral lines, and the control device is configured to modify the dead-zone for the at least one spectral line or the at least one group of spectral lines depending on the respective tonality indicating value.
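A dead-zone quantizer with a tonality-controlled rounding offset can be sketched as follows. With index = sign(x)·floor(|x|/step + offset), every line with |x|/step < 1 − offset maps to zero, so a smaller offset means a wider dead-zone. The sketch makes two assumptions, and both are swapped in rather than taken from the patent. First, it measures tonality per group as 1 minus the spectral flatness. Second, it assumes that more tonal groups get a narrower dead-zone so that weak lines near tonal peaks are not zeroed out. The offset range 0.2 to 0.5 is also illustrative.

```python
import math


def spectral_flatness(lines):
    """Geometric mean over arithmetic mean of line power (1 = flat, near 0 = peaky)."""
    power = [x * x + 1e-12 for x in lines]  # epsilon avoids log(0)
    geo = math.exp(sum(math.log(p) for p in power) / len(power))
    arith = sum(power) / len(power)
    return geo / arith


def quantize_group(lines, step, min_offset=0.2, max_offset=0.5):
    """Dead-zone quantization whose rounding offset grows with the group's tonality."""
    tonality = 1.0 - spectral_flatness(lines)  # assumed tonality indicating value
    offset = min_offset + (max_offset - min_offset) * tonality
    # |x| / step < 1 - offset falls in the dead-zone and yields index 0
    return [int(math.copysign(math.floor(abs(x) / step + offset), x)) for x in lines]
```

In a flat (noise-like) group a line of magnitude 0.7·step is quantized to zero, while the same line in a peaky group survives as index 1.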
Multi-lingual semantic parser based on transferred learning
The disclosure relates to transferred learning from a first language (e.g., a source language for which a semantic parser has been defined) to a second language (e.g., a target language for which a semantic parser has not been defined). A system may use knowledge from a model trained in one language to model another language. For example, the system may transfer knowledge of a semantic parser from a first (e.g., source) language to a second (e.g., target) language. Such a transfer of knowledge is useful when the first language has sufficient training data but the second language does not. The transfer of knowledge may thereby extend the semantic parser to multiple languages (e.g., the first language and the second language).
Hotword-based speaker recognition
Systems, methods performed by data processing apparatus, and computer storage media encoded with computer programs for receiving an utterance from a user in a multi-user environment, each user having an associated set of available resources; determining that the received utterance includes at least one predetermined word; comparing speaker identification features of the uttered predetermined word with speaker identification features of each of a plurality of previous utterances of the predetermined word, the previous utterances corresponding to different known users in the multi-user environment; attempting to identify the user associated with the uttered predetermined word as matching one of the known users in the multi-user environment; and, based on a result of the attempt to identify, selectively providing the user with access to one or more resources associated with a corresponding known user.
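The hotword check, speaker comparison, and resource gating can be sketched as follows. The sketch assumes that speaker identification features are fixed-length vectors, that cosine similarity is the comparison, that a user's score is the best match among that user's enrolled hotword utterances, and that an unrecognized speaker gets no personal resources. The hotword string, the threshold, and all function names are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two feature vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def identify_speaker(utterance_vec, enrolled, threshold=0.8):
    """Match against each known user's previous hotword utterances; None if no match."""
    best_user, best_score = None, -1.0
    for user, vectors in enrolled.items():
        score = max(cosine(utterance_vec, v) for v in vectors)
        if score > best_score:
            best_user, best_score = user, score
    return best_user if best_score >= threshold else None


def grant_resources(transcript, utterance_vec, enrolled, resources,
                    hotword="ok device", threshold=0.8):
    """Return the resources of the identified user, or [] if no hotword or no match."""
    if hotword not in transcript.lower():
        return []  # the utterance must contain the predetermined word
    user = identify_speaker(utterance_vec, enrolled, threshold)
    return resources.get(user, []) if user is not None else []
```

Comparing only the hotword portion keeps the comparison text-dependent: every enrolled sample and every new query contain the same phrase, which typically makes short-utterance speaker matching more reliable than comparing arbitrary speech.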