Patent classifications
G10L25/48
System and method of generating effects during live recitations of stories
One aspect of this disclosure relates to presenting a first effect on one or more presentation devices during an oral recitation of a first story. The first effect is associated with a first trigger point, first content, and/or the first story. The first trigger point is one or more specific syllables from a word and/or phrase in the first story. A first transmission point associated with the first effect can be determined based on a latency of a presentation device and a user speaking profile. The first transmission point is one or more specific syllables from a word and/or phrase before the first trigger point in the first story. Control signals carrying instructions to present the first content at the first trigger point are transmitted to the presentation device when a user recites the first transmission point, such that the first content is presented at the first trigger point.
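One way to picture how a transmission point could be derived from device latency and a speaking profile is sketched below. This is only an illustration under stated assumptions: the abstract does not give a formula, so the syllable indexing, the reduction of the speaking profile to a single syllables-per-second rate, and the name `transmission_index` are all hypothetical.

```python
import math


def transmission_index(trigger_index: int, latency_s: float, syllables_per_s: float) -> int:
    """Return the syllable index at which control signals would be sent.

    Assumption (not from the abstract): the story is a sequence of syllables
    indexed from 0, and the speaking profile is one average rate.
    """
    if latency_s < 0 or syllables_per_s <= 0:
        raise ValueError("latency must be >= 0 and speaking rate must be > 0")
    # Number of syllables the reader is expected to speak while the
    # presentation device is still reacting to the control signal.
    lead_syllables = math.ceil(latency_s * syllables_per_s)
    # Never point before the first syllable of the story.
    return max(0, trigger_index - lead_syllables)


# A device with 0.5 s latency and a reader at 4 syllables per second:
# the signal goes out two syllables ahead of the trigger point.
print(transmission_index(trigger_index=10, latency_s=0.5, syllables_per_s=4.0))
```

Rounding the lead up rather than down errs toward sending the signal slightly early, which is a design choice of this sketch and not something the abstract states.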
Voice user interface for intervening in conversation of at least one user by adjusting two different thresholds
An electronic device is provided. The electronic device includes a memory configured to store at least one instruction, and at least one processor configured to execute the instruction to obtain voice data from a conversation of at least one user, convert the voice data to text data, determine at least one parameter indicating a characteristic of the conversation based on at least one of the voice data or the text data, adjust a condition for triggering intervention in the conversation based on the determined at least one parameter, and output feedback based on the text data when the adjusted condition is satisfied, wherein the adjustment of the condition includes adjusting a first threshold and a second threshold based on a change in the at least one parameter.
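The two-threshold adjustment described above might look roughly like the following sketch. The abstract does not say what the parameter or the thresholds are, so everything concrete here is an assumption made for illustration: the parameter is taken to be a speaking rate, the first threshold a minimum pause length, the second a minimum relevance score, and both are scaled up when the conversation speeds up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InterventionCondition:
    min_pause_s: float    # hypothetical first threshold
    min_relevance: float  # hypothetical second threshold, in [0, 1]


def adjust_condition(cond: InterventionCondition, prev_rate: float,
                     new_rate: float, step: float = 0.1) -> InterventionCondition:
    """Adjust both thresholds from the change in one conversation parameter.

    Assumption: a faster conversation makes intervention harder to trigger,
    and a slower one makes it easier. The source does not fix the direction.
    """
    if new_rate > prev_rate:
        factor = 1.0 + step
    elif new_rate < prev_rate:
        factor = 1.0 - step
    else:
        factor = 1.0
    return InterventionCondition(
        min_pause_s=cond.min_pause_s * factor,
        min_relevance=min(1.0, cond.min_relevance * factor),
    )


def should_intervene(cond: InterventionCondition, pause_s: float, relevance: float) -> bool:
    """The adjusted condition is satisfied only when both thresholds are met."""
    return pause_s >= cond.min_pause_s and relevance >= cond.min_relevance
```

In this sketch, feedback would be output only when `should_intervene` returns true for the adjusted condition.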
BEHAVIOR DETECTION
A system includes a microphone and a computing device including a processor and a memory. The memory stores instructions executable by the processor to identify a word sequence in audio input received from the microphone, to determine a behavior pattern from the word sequence, and to report the behavior pattern to a remote server at a specified time.
NEURAL TRANSLATOR
A method and apparatus are provided for processing a set of communicated signals associated with a set of muscles, such as the muscles near the larynx of a person, or any other muscles the person uses to achieve a desired response. The method includes the steps of attaching a single integrated sensor, for example, near the throat of the person proximate to the larynx and detecting an electrical signal through the sensor. The method further includes the steps of extracting features from the detected electrical signal and continuously transforming them into speech sounds without the need for further modulation. The method also includes comparing the extracted features to a set of prototype features and selecting a prototype feature of the set of prototype features providing a smallest relative difference.
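The prototype-selection step can be illustrated with a small nearest-prototype sketch. The abstract does not define "relative difference", so this example assumes one plausible reading: the Euclidean distance between the feature vectors divided by the norm of the prototype. The function names and the sample prototypes are hypothetical.

```python
import math
from typing import Mapping, Sequence


def relative_difference(features: Sequence[float], prototype: Sequence[float]) -> float:
    """Euclidean distance to the prototype, scaled by the prototype's norm.

    Assumption: this is one possible definition, not the patent's.
    """
    if len(features) != len(prototype):
        raise ValueError("feature vectors must have the same length")
    distance = math.sqrt(sum((f - p) ** 2 for f, p in zip(features, prototype)))
    norm = math.sqrt(sum(p * p for p in prototype))
    # A zero prototype cannot be scaled; treat it as infinitely far away.
    return distance / norm if norm > 0 else math.inf


def select_prototype(features: Sequence[float],
                     prototypes: Mapping[str, Sequence[float]]) -> str:
    """Return the label of the prototype with the smallest relative difference."""
    return min(prototypes, key=lambda label: relative_difference(features, prototypes[label]))


# Hypothetical two-dimensional prototypes for two speech sounds.
prototypes = {"a": [1.0, 0.0], "o": [0.0, 1.0]}
print(select_prototype([0.9, 0.2], prototypes))
```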
INFORMATION PROCESSING APPARATUS AND COMMAND PROCESSING METHOD
An acoustic feature detection unit (31) detects acoustic features of voice discretely input separately from a command instructing movement of an operation target. A movement control unit (32) controls the movement of the operation target instructed by the command on the basis of the acoustic features detected by the acoustic feature detection unit (31).
System and Method for Video Authentication
A system and method for video authentication may apply machine learning to analyze whether a person's face captured by live video matches a face in a photo ID captured by live video and to analyze other features based on a video session with the person. For example, machine learning may be applied to analyze a set of features indicating whether the person is a real, live person (as opposed to a photo image held up over the person's face in the video, etc.). Finally, the machine learning may be applied to analyze a set of features to determine whether a lower-probability prediction that the person's face captured by live video matches a face in a photo ID captured by live video should either pass authentication (due to one or more features/circumstances mitigating the lower probability) or fail authentication (due to one or more features not mitigating the lower probability). In such a situation, the set of features may indicate that mitigating factors/conditions exist that can offset the lower probability.
Acceleration-based fast SOI processing
Techniques of deriving audio signals using frequency-modulated continuous-wave (FMCW) LIDAR use an acceleration-based algorithm in which an audio signal is based on a difference in velocity between two up-chirps or two down-chirps. Such an acceleration-based algorithm takes less computation, results in fast processing, boosts the high-frequency components of the audio signals, which the velocity-based algorithm lacks, and improves subjective intelligibility. For example, in the acceleration-based algorithm, the DC components may be safely ignored in many cases. In such cases, the system does not require a band-pass filter as in conventional systems, thus reducing the computational burden. Moreover, the acceleration-based algorithm emphasizes high frequencies that form a more realistic depiction of human speech.
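The core idea, taking the audio sample from the difference in velocity between successive chirps of the same direction, can be sketched as a first difference over velocity estimates. This is a minimal illustration and not the patented processing chain: it assumes the per-chirp velocity estimates are already available as a list, and the function name `acceleration_signal` is hypothetical.

```python
from typing import List, Sequence


def acceleration_signal(up_chirp_velocities: Sequence[float]) -> List[float]:
    """Difference in velocity between each pair of consecutive up-chirps.

    Assumption: one velocity estimate per up-chirp, evenly spaced in time.
    """
    return [later - earlier
            for earlier, later in zip(up_chirp_velocities, up_chirp_velocities[1:])]


velocities = [1.0, 1.5, 1.25]
offset_velocities = [v + 3.0 for v in velocities]  # same motion plus a constant (DC) velocity

print(acceleration_signal(velocities))
print(acceleration_signal(offset_velocities))
```

Both calls give the same result because a constant velocity offset cancels in the subtraction, which matches the abstract's remark that DC components can often be ignored. A first difference also weights rapid changes more heavily than slow ones, consistent with the stated emphasis on high frequencies.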