Patent classifications
G06F16/683
Sound recognition model training method and system and non-transitory computer-readable medium
A sound recognition model training method comprises determining a relationship between a sound event and a first parameter and deciding a second parameter in response to the relationship, performing sampling on the sound event using the first parameter and the second parameter to generate training audio files, and inputting at least part of the training audio files to a sound recognition model for training the sound recognition model, wherein a length of each of the training audio files is associated with the first parameter, a time difference between every two of the training audio files is associated with the second parameter, and the sound recognition model is used for determining a sound classification.
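The sampling step can be pictured as a sliding window over the recorded sound event. The following is a minimal sketch only, assuming that the first parameter is a clip length in seconds and the second parameter is the time offset between the starts of consecutive clips; the function name, argument names, and the choice to drop a trailing partial window are illustrative assumptions, not details taken from the abstract.

```python
def slice_training_clips(samples, sample_rate, clip_seconds, hop_seconds):
    """Cut one sound event into fixed-length, regularly spaced clips.

    clip_seconds stands in for the "first parameter" (clip length) and
    hop_seconds for the "second parameter" (time difference between clips).
    Both interpretations are assumptions made for this sketch.
    """
    clip_len = int(round(clip_seconds * sample_rate))
    hop = int(round(hop_seconds * sample_rate))
    if clip_len <= 0 or hop <= 0:
        raise ValueError("clip length and hop must be positive")

    clips = []
    start = 0
    # Keep only full-length windows; a trailing partial window is dropped.
    while start + clip_len <= len(samples):
        clips.append(samples[start:start + clip_len])
        start += hop
    return clips


if __name__ == "__main__":
    event = list(range(10))  # stand-in for audio samples
    for clip in slice_training_clips(event, sample_rate=1,
                                     clip_seconds=4, hop_seconds=2):
        print(clip)
```

With a hop shorter than the clip length the clips overlap, which yields more training files from the same event; how the second parameter is actually decided from the relationship is not specified in the abstract and is left to the caller here.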
System and method for context driven voice interface in handheld wireless mobile devices
A sequence of a context-based search verb and search terms is selected by either touch or voice in a mobile wireless device, and a human-articulated voice query is then expanded using a culture and a world intelligence dictionary to conduct more efficient searches. Focus groups are used to populate prior-query search databases, which are organized by context-based search terms and stored in the mobile wireless device for efficient search.
Recorded sound thumbnail
Aspects of the present disclosure involve a system and a method for performing operations comprising: displaying, by a messaging application, a sound capture screen that enables a user to record a sound; after the sound is recorded using the sound capture screen, generating, by the messaging application, a visual element associated with the sound; receiving, by the messaging application, selection of the visual element from a displayed list of visual elements representing different sounds; in response to receiving the selection of the visual element, conditionally adding one or more graphics representing the sound to one or more images at a user-selected position based on a privacy status of the sound; and playing, by the messaging application, the sound associated with the visual element together with displaying the one or more images.
Automatic synthesis of translated speech using speaker-specific phonemes
An embodiment includes converting an original audio signal to an original text string, the original audio signal being from a recording of the original text string spoken by a specific person in a source language. The embodiment generates a translated text string by translating the original text string from the source language to a target language, including translation of a word from the source language to the target language. The embodiment assembles a standard phoneme sequence from a set of standard phonemes, where the standard phoneme sequence includes a standard pronunciation of the translated word. The embodiment also associates a custom phoneme with a standard phoneme of the standard phoneme sequence, where the custom phoneme includes the specific person's pronunciation of a sound in the translated word. The embodiment synthesizes the translated text string to a translated audio signal including the translated word pronounced using the custom phoneme.
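The association between standard and custom phonemes can be illustrated with a simple lookup. This is a rough sketch under stated assumptions: phonemes are represented as plain strings, the speaker's custom phonemes are held in a dictionary keyed by the standard phoneme they replace, and the symbols and the "@speaker" suffix are hypothetical placeholders rather than anything defined by the abstract.

```python
def apply_custom_phonemes(standard_sequence, custom_phonemes):
    """Swap in speaker-specific phonemes where one is available.

    standard_sequence: list of standard phoneme symbols for a translated word.
    custom_phonemes:   dict mapping a standard phoneme to the speaker's own
                       rendition (hypothetical representation for this sketch).
    Phonemes without a custom counterpart are left unchanged.
    """
    return [custom_phonemes.get(p, p) for p in standard_sequence]


if __name__ == "__main__":
    word = ["HH", "AH", "L", "OW"]       # illustrative symbols only
    speaker = {"L": "L@speaker"}         # hypothetical custom phoneme
    print(apply_custom_phonemes(word, speaker))
```

A real synthesizer would pass the resulting sequence to a speech back end; that stage is outside the scope of this sketch.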
Methods and systems for predictive buffering of related content segments
The methods and systems described herein aid users by providing thorough and efficient content consumption. For example, the methods and systems buffer content segments related to a current portion of the content that the system is generating for display. The methods and systems determine a characteristic of the current portion of the content and identify related content segments based on the characteristic. The methods and systems determine a confidence score for each of the related content segments, and one or more related content segments with higher confidence scores are buffered in memory. Accordingly, the methods and systems described herein provide a thorough viewing of content through related segments that are buffered in memory for quick access.
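The selection of segments to buffer can be sketched as a top-k choice over confidence scores. The sketch below assumes the scores are already computed and supplied as a dictionary from segment identifier to score, and that the buffer holds a fixed number of segments; both the data layout and the `capacity` parameter are assumptions made for illustration.

```python
import heapq


def select_segments_to_buffer(confidence_by_segment, capacity):
    """Return the ids of the highest-confidence related segments.

    confidence_by_segment: dict mapping segment id -> confidence score
                           (how scores are computed is not modeled here).
    capacity:              assumed maximum number of segments to buffer.
    The result is ordered from highest to lowest confidence.
    """
    return heapq.nlargest(capacity, confidence_by_segment,
                          key=confidence_by_segment.get)


if __name__ == "__main__":
    scores = {"recap": 0.2, "next_scene": 0.9, "interview": 0.5}
    print(select_segments_to_buffer(scores, capacity=2))
```

A threshold on the score would be an equally plausible reading of "higher confidence scores"; the fixed-capacity variant is used here only because it keeps memory use bounded.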
Contextual assistant using mouse pointing or touch cues
A method for a contextual assistant to use mouse pointing or touch cues includes receiving audio data corresponding to a query spoken by a user, receiving, in a graphical user interface displayed on a screen, a user input indication indicating a spatial input applied at a first location on the screen, and processing the audio data to determine a transcription of the query. The method also includes performing query interpretation on the transcription to determine that the query is referring to an object displayed on the screen without uniquely identifying the object, and requesting information about the object. The method further includes disambiguating, using the user input indication indicating the spatial input applied at the first location on the screen, the query to uniquely identify the object that the query is referring to, obtaining the information about the object requested by the query, and providing a response to the query.
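The disambiguation step can be illustrated geometrically: given the location of the spatial input and the on-screen objects, pick the object at or nearest to that location. This is a minimal sketch that assumes each object is described by an axis-aligned bounding box in screen coordinates; the `ScreenObject` type, its fields, and the nearest-box rule are hypothetical and are not taken from the abstract.

```python
from dataclasses import dataclass


@dataclass
class ScreenObject:
    # Hypothetical description of a displayed object (assumption).
    name: str
    left: float
    top: float
    right: float
    bottom: float


def distance_to_box(obj, x, y):
    """Euclidean distance from point (x, y) to the box; 0.0 if inside."""
    dx = max(obj.left - x, 0.0, x - obj.right)
    dy = max(obj.top - y, 0.0, y - obj.bottom)
    return (dx * dx + dy * dy) ** 0.5


def resolve_referent(objects, x, y):
    """Pick the object the spatial input most plausibly refers to."""
    if not objects:
        return None
    return min(objects, key=lambda obj: distance_to_box(obj, x, y))


if __name__ == "__main__":
    on_screen = [
        ScreenObject("photo", 0, 0, 100, 100),
        ScreenObject("button", 200, 0, 300, 50),
    ]
    # A query such as "what is this?" plus a click at (210, 10).
    print(resolve_referent(on_screen, 210, 10).name)
```

In this sketch the transcription only signals that the query is ambiguous; the pointer location alone selects the object, after which the requested information could be looked up for it.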
Generation, administration and analysis of user experience testing
Systems and methods for generating, administering and analyzing a user experience study are provided. In particular, intents can be generated from a user experience study by applying one or more screener questions to participants and subjecting the screened participants to one or more tasks. Corresponding clickstreams and success data for each participant engaging in the tasks can be recorded. The success and clickstream data can also be aggregated for all the screened participants as aggregated results. Video data including audio for each of the screened participants can also be recorded.