G10L15/187

Recommending Results In Multiple Languages For Search Queries Based On User Profile
20230237098 · 2023-07-27 ·

Systems and methods for a media guidance application that generates results in multiple languages for search queries. In particular, the media guidance application resolves multiple language barriers by taking automatic and manual user language settings and applying those settings to a variety of potential search results.

Pronunciation error detection apparatus, pronunciation error detection method and program

The present invention provides a pronunciation error detection apparatus capable of following a text without the need for a correct sentence even when erroneous recognition such as a reading error occurs. The pronunciation error detection apparatus comprises: a speech recognition part that recognizes the speech in speech data based on a speech recognition model for a non-native speaker, and outputs speech recognition results, reliability and time information; a reliability determination part that outputs the speech recognition results with higher reliability than a predetermined threshold and the corresponding time information as the determined speech recognition results and the determined time information; and a pronunciation error detection part that outputs a phoneme as a pronunciation error when reliability for each phoneme in the speech recognition results using the native speaker speech recognition model under a weakly constraining grammar is greater than the reliability of the corresponding phoneme in the speech recognition results using the native speaker acoustic model under a constraining grammar in which the determined speech recognition results are correct for the speech data in a segment specified by the determined time information.
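
The comparison described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: the `PhonemeScore` type, the function names, and the assumption that both recognition passes yield phoneme sequences aligned one-to-one are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PhonemeScore:
    phoneme: str
    reliability: float


def filter_reliable(results, threshold):
    """Keep (text, reliability, (start, end)) results whose reliability exceeds the threshold."""
    return [r for r in results if r[1] > threshold]


def detect_pronunciation_errors(weak, constrained):
    """Compare two phoneme sequences for the same speech segment.

    `weak` comes from recognition under a weakly constraining grammar;
    `constrained` comes from recognition under a grammar that treats the
    determined recognition result as correct. Assumption: the sequences
    are already aligned position by position.
    """
    errors = []
    for position, (free, forced) in enumerate(zip(weak, constrained)):
        # Flag the position when the freely recognized phoneme is more
        # reliable than the phoneme the text says should be there.
        if free.reliability > forced.reliability:
            errors.append((position, forced.phoneme, free.phoneme))
    return errors


weak = [PhonemeScore("r", 0.9), PhonemeScore("ay", 0.8), PhonemeScore("t", 0.7)]
constrained = [PhonemeScore("l", 0.4), PhonemeScore("ay", 0.8), PhonemeScore("t", 0.75)]
errors = detect_pronunciation_errors(weak, constrained)
```

Each reported tuple holds the position, the expected phoneme, and the phoneme that was apparently produced.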

Phonetic keyboard and system to facilitate communication in English
11705115 · 2023-07-18 ·

A system to facilitate communication in English uses a keyboard having at least one phoneme key that displays a phonetic pronunciation symbol representing a sound of spoken English. The system also includes a computer board disposed within the keyboard and a processor and associated memory in electrical communication with an input from the phoneme key. The processor executes a set of computer instructions that bi-directionally translates the input, either from a standard English letter to a phonetic pronunciation or from a phonetic input to a standard English letter. An output device in electrical communication with the keyboard, on which the standard English letter is displayed, is also provided. In one embodiment, a computing device executes a set of computer instructions that displays a digitally produced keyboard and uses a downloadable app to translate between a phonetic input and a standard English letter and display the result.
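
The bi-directional translation could be modeled, very roughly, as a lookup table and its inverse. The symbol table below is a hypothetical illustration (the abstract does not list the key assignments), and mapping some symbols to letter pairs such as "sh" is an assumption made for the example.

```python
# Hypothetical key assignments: phonetic symbol -> English spelling.
PHONEME_TO_LETTERS = {"ʃ": "sh", "θ": "th", "tʃ": "ch", "k": "k"}

# Inverse table for the opposite direction.
LETTERS_TO_PHONEME = {letters: symbol for symbol, letters in PHONEME_TO_LETTERS.items()}


def translate(key_input, to_phonetic):
    """Translate one key input in the requested direction.

    Inputs missing from the table are passed through unchanged.
    """
    table = LETTERS_TO_PHONEME if to_phonetic else PHONEME_TO_LETTERS
    return table.get(key_input, key_input)
```

For instance, `translate("ʃ", to_phonetic=False)` yields `"sh"`, and `translate("th", to_phonetic=True)` yields `"θ"`.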

TECHNOLOGIES FOR CREATING, ALTERING, AND PRESENTING MEDIA CONTENT

Different types of media experiences can be developed based on characteristics of the consumer. “Linear” experiences may require execution of a pre-built script, although the script could be dynamically modified by a media production platform. Linear experiences can include guided audio tours that are modified or updated based on the location of the consumer. “Enhanced” experiences include conventional media content that is supplemented with intelligent media content. For example, turn-by-turn directions could be supplemented with audio descriptions about the surrounding area. “Freeform” experiences, meanwhile, are those that can continually morph based on information gleaned from a consumer. For example, a radio station may modify what content is being presented based on the geographical metadata uploaded by a computing device associated with the consumer.

SYSTEMS AND METHODS FOR PHONETIC-BASED NATURAL LANGUAGE UNDERSTANDING
20230017352 · 2023-01-19 ·

Systems and methods are described for modifying a phonetic search index based on a use frequency associated with phonetic representations of text terms included in metadata of a media item. A first phonetic representation of a text term of the metadata, pronounced as a word, may be generated. A second phonetic representation of the text term may be generated by concatenating a phonetic representation of each letter in the text term. A database may be queried to determine use frequencies of the first and second phonetic representations, one of which may be selected based on a comparison of the use frequencies. A phonetic search index may be modified by including an entry for the selected phonetic representation. A voice query related to the media item may be received, and a reply to the voice query may be generated for output by performing a lookup in the modified phonetic search index.
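
The selection step might look like the sketch below. Everything named here is an assumption for illustration: the ARPAbet-style strings, the tiny `LETTER_NAMES` and `WORD_PRONUNCIATIONS` tables (standing in for a real grapheme-to-phoneme component), the frequency values, and the choice to prefer the word pronunciation on a tie.

```python
# Hypothetical pronunciations of letter names (only the letters used below).
LETTER_NAMES = {"a": "EY", "b": "B IY", "h": "EY CH", "n": "EH N", "o": "OW", "s": "EH S"}

# Hypothetical stand-in for a grapheme-to-phoneme model.
WORD_PRONUNCIATIONS = {"nasa": "N AE S AH", "hbo": "HH B OW"}


def as_word(term):
    """First representation: the term pronounced as a word."""
    return WORD_PRONUNCIATIONS[term.lower()]


def spelled_out(term):
    """Second representation: concatenated pronunciations of each letter."""
    return " ".join(LETTER_NAMES[ch] for ch in term.lower())


def select_representation(term, use_frequency):
    """Pick whichever representation is used more often (ties go to the word form)."""
    first, second = as_word(term), spelled_out(term)
    return first if use_frequency.get(first, 0) >= use_frequency.get(second, 0) else second


def index_term(index, term, media_id, use_frequency):
    """Add an entry for the selected representation to the phonetic search index."""
    phonetic = select_representation(term, use_frequency)
    index.setdefault(phonetic, set()).add(media_id)
    return phonetic


use_frequency = {"N AE S AH": 120, "EH N EY EH S EY": 3, "HH B OW": 1, "EY CH B IY OW": 200}
index = {}
index_term(index, "NASA", "media-1", use_frequency)
index_term(index, "HBO", "media-2", use_frequency)
```

A voice query transcribed to the same phonetic string can then be answered with a plain dictionary lookup in `index`.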

Processing Multimodal User Input for Assistant Systems
20230222605 · 2023-07-13 ·

In one embodiment, a method includes receiving at a head-mounted device a speech input from a user and a visual input captured by cameras of the head-mounted device, wherein the visual input comprises subjects and attributes associated with the subjects, and wherein the speech input comprises a co-reference to one or more of the subjects, resolving entities corresponding to the subjects associated with the co-reference based on the attributes and the co-reference, and presenting a communication content responsive to the speech input and the visual input at the head-mounted device, wherein the communication content comprises information associated with executing results of tasks corresponding to the resolved entities.