G10L15/197

VIDEO-AIDED UNSUPERVISED GRAMMAR INDUCTION
20230035708 · 2023-02-02

A method of training a natural language neural network comprises obtaining at least one constituency span; obtaining a training video input; applying a multi-modal transform to the video input, thereby generating a transformed video input; comparing the at least one constituency span and the transformed video input using a compound Probabilistic Context-Free Grammar (PCFG) model to match the at least one constituency span with corresponding portions of the transformed video input; and using results from the comparison to learn a constituency parser.
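
The comparison step can be pictured as a score of how well each constituency span agrees with its paired video. The following is a minimal sketch under stated assumptions. Span and video representations are assumed to be precomputed fixed-length vectors. The matching uses a cosine-similarity hinge (ranking) loss in one direction only, with each span weighted by its posterior under the compound PCFG. These choices, and all names (`cosine`, `span_video_hinge_loss`, `margin`), are illustrative and are not taken from the abstract.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]
# One sentence: a list of (span embedding, span posterior) pairs (assumed format).
SentenceSpans = List[Tuple[Vector, float]]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two equal-length vectors (0.0 for a zero vector)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def span_video_hinge_loss(
    sentences: List[SentenceSpans],
    videos: List[Vector],
    margin: float = 0.2,
) -> float:
    """Hypothetical span-to-video ranking loss over a batch.

    sentences[i] and videos[i] form a matched (caption, video) pair, and every
    other video in the batch serves as a negative. Each span's hinge term is
    weighted by its posterior, which stands in for the PCFG's belief that the
    span is a constituent.
    """
    total = 0.0
    for i, spans in enumerate(sentences):
        for span_vec, posterior in spans:
            positive = cosine(span_vec, videos[i])
            for j, video in enumerate(videos):
                if j == i:
                    continue
                negative = cosine(span_vec, video)
                # Penalize any negative video that scores within `margin` of the true one.
                total += posterior * max(0.0, margin - positive + negative)
    return total
```

With `videos = [[1.0, 0.0], [0.0, 1.0]]` and one span per sentence aligned with its own video, the loss is `0.0`. Swapping the two videos makes the loss positive. In this sketch, that is the signal that would steer the parser toward spans that ground well in the video.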

System to convert phonemes into phonetics-based words
11615786 · 2023-03-28

Disclosed is a system for converting phonemes into phonetics-based words, implemented in one or more computing systems in association with a system that provides the required inputs. The system comprises a phoneme enhancer, a phoneme sequence buffer, a phoneme sequence to phonetics-based word converter that comprises a sliding window phoneme sequence matcher, a phoneme sequence to phonetics-based word custom data memory, a most frequent phonetics-based word predictive memory, a phoneme similarity matrix, and a phonetics-based word output unit.
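
The sliding window phoneme sequence matcher can be pictured as a greedy longest-match scan over the phoneme buffer, with a phoneme similarity matrix that lets near-miss pronunciations still match. The following is a minimal sketch under stated assumptions. The lexicon format, the similarity scores, the acceptance threshold, and all names (`SIMILARITY`, `window_score`, `convert`) are hypothetical. The custom data memory and the most-frequent-word predictive memory are not modeled.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical phoneme similarity matrix: symmetric scores in [0, 1].
# Identical phonemes score 1.0; unlisted pairs score 0.0.
SIMILARITY: Dict[Tuple[str, str], float] = {
    ("P", "B"): 0.8,
    ("T", "D"): 0.8,
    ("IH", "IY"): 0.7,
}


def phoneme_similarity(a: str, b: str) -> float:
    if a == b:
        return 1.0
    return SIMILARITY.get((a, b), SIMILARITY.get((b, a), 0.0))


def window_score(window: Sequence[str], entry: Sequence[str]) -> float:
    """Mean per-position similarity; 0.0 if the lengths differ."""
    if len(window) != len(entry):
        return 0.0
    return sum(phoneme_similarity(a, b) for a, b in zip(window, entry)) / len(entry)


def convert(
    phonemes: List[str],
    lexicon: Dict[str, List[str]],
    threshold: float = 0.85,
) -> List[str]:
    """Greedy left-to-right sliding-window conversion of phonemes to words.

    At each position, try the longest window first and accept the best-scoring
    lexicon entry at or above `threshold`. Unmatched phonemes are emitted as-is.
    """
    max_len = max(len(p) for p in lexicon.values())
    words: List[str] = []
    i = 0
    while i < len(phonemes):
        match = None
        for size in range(min(max_len, len(phonemes) - i), 0, -1):
            window = phonemes[i : i + size]
            best_word, best_score = None, 0.0
            for word, entry in lexicon.items():
                score = window_score(window, entry)
                if score > best_score:
                    best_word, best_score = word, score
            if best_word is not None and best_score >= threshold:
                match = (best_word, size)
                break
        if match:
            words.append(match[0])
            i += match[1]
        else:
            words.append(phonemes[i])
            i += 1
    return words
```

For example, with `lexicon = {"big": ["B", "IH", "G"], "dog": ["D", "AO", "G"]}`, the call `convert(["P", "IH", "G", "D", "AO", "G"], lexicon)` returns `["big", "dog"]`. The mispronounced `P` still matches `B` because of the similarity matrix.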

Dialogue system and method of controlling the same

A dialogue system includes a processor configured to: generate a meaning representation corresponding to an input sentence by performing natural language understanding on the input sentence; generate, using a natural language generator based on a recurrent neural network (RNN), an output sentence corresponding to an input meaning representation; and determine whether the input sentence cannot be processed by the natural language generator. The processor calculates a parameter representing the probability that the natural language generator outputs the input sentence when the meaning representation of that sentence is given as its input, and determines from this parameter whether the input sentence cannot be processed.
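
The check amounts to asking how likely the generator is to reproduce the input sentence from that sentence's own meaning representation. The following is a minimal sketch under stated assumptions. The generator is assumed to expose per-token conditional probabilities, and a hypothetical lookup table (`TABLE`, `table_prob`) stands in for the RNN decoder. The length-normalized log-probability and the threshold value are illustrative choices, not details from the abstract.

```python
import math
from typing import Callable, List, Sequence

# Stand-in generator signature: P(next token | meaning representation, prefix).
TokenProb = Callable[[str, Sequence[str], str], float]


def sentence_log_prob(meaning: str, tokens: List[str], token_prob: TokenProb) -> float:
    """Length-normalized log P(tokens | meaning), scored with teacher forcing."""
    total = 0.0
    for i, tok in enumerate(tokens):
        p = token_prob(meaning, tokens[:i], tok)
        total += math.log(max(p, 1e-12))  # floor avoids log(0)
    return total / max(len(tokens), 1)


def cannot_process(meaning: str, tokens: List[str], token_prob: TokenProb,
                   threshold: float = math.log(0.1)) -> bool:
    """Flag the input as unprocessable when the generator finds it unlikely."""
    return sentence_log_prob(meaning, tokens, token_prob) < threshold


# Hypothetical table-driven generator that conditions only on the previous token.
TABLE = {
    ("weather(today)", "<s>", "how"): 0.6,
    ("weather(today)", "how", "is"): 0.7,
    ("weather(today)", "is", "the"): 0.8,
    ("weather(today)", "the", "weather"): 0.9,
}


def table_prob(meaning: str, prefix: Sequence[str], token: str) -> float:
    prev = prefix[-1] if prefix else "<s>"
    return TABLE.get((meaning, prev, token), 0.01)
```

Here `cannot_process("weather(today)", ["how", "is", "the", "weather"], table_prob)` is `False`, while an unrelated sentence such as `["order", "a", "pizza"]` is flagged `True`. Length normalization keeps long but well-formed sentences from being rejected just for having more tokens.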

Meeting transcription using custom lexicons based on document history
11488602 · 2022-11-01

A collaborative content management system allows multiple users to access and modify collaborative documents. When audio data is recorded by or uploaded to the system, the audio data may be transcribed or summarized to improve accessibility and user efficiency. Each text transcription is associated with the portions of the audio data that the text represents, so users can search the transcription and play back the audio portions that match their search queries. An outline can be generated automatically from the text transcription of the audio data and embedded as a modifiable object within a collaborative document. The system also associates hot words with actions, so that identifying a hot word in the audio data triggers the corresponding modification of the collaborative document. In addition, the system can generate a custom lexicon for each user from the documents associated with that user and use it when transcribing audio data, making the text transcriptions more accurate.
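
Building a per-user custom lexicon from document history can be approximated by collecting terms that recur in the user's documents but are missing from the recognizer's base vocabulary. The hot-word step maps recognized words to document actions. The following is a minimal sketch under stated assumptions. The tokenization regex, the `min_count` cutoff, the base vocabulary, single-word hot words, and all names (`build_custom_lexicon`, `apply_hot_words`) are hypothetical. How the lexicon is passed to the speech recognizer is not modeled.

```python
import re
from collections import Counter
from typing import Callable, Dict, Iterable, List, Set


def build_custom_lexicon(documents: Iterable[str], base_vocab: Set[str],
                         min_count: int = 2) -> List[str]:
    """Return out-of-vocabulary terms seen at least `min_count` times, most frequent first."""
    counts: Counter = Counter()
    for doc in documents:
        for token in re.findall(r"[A-Za-z][A-Za-z0-9\-]+", doc):
            word = token.lower()
            if word not in base_vocab:
                counts[word] += 1
    # most_common() keeps first-seen order among equal counts.
    return [w for w, c in counts.most_common() if c >= min_count]


def apply_hot_words(transcript: str, actions: Dict[str, Callable[[], str]]) -> List[str]:
    """Run the action bound to each (single-word) hot word found in the transcript, in order."""
    results = []
    for word in transcript.lower().split():
        action = actions.get(word.strip(".,!?:;"))
        if action is not None:
            results.append(action())
    return results
```

With `docs = ["Kubernetes rollout plan for Zephyr", "Zephyr launch: Kubernetes canary"]` and a base vocabulary containing every other word, `build_custom_lexicon(docs, base_vocab)` returns `["kubernetes", "zephyr"]`. These are the project-specific terms a general-purpose recognizer is most likely to mis-transcribe.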

Regional features based speech recognition method and system
11488587 · 2022-11-01

Disclosed is a regional-features-based speech recognition method. The method includes learning speech features for each region from speech data classified by region category, and recognizing input speech using an acoustic model and a language model. These models are generated through that learning and through classification of the region category of the input speech. A user can access a dialect recognition service improved through learning based on artificial intelligence (AI) and on the enhanced mobile broadband (eMBB), ultra-reliable and low latency communications (URLLC), and massive machine-type communications (mMTC) techniques of 5G mobile communication.
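
The recognition step first assigns the input speech to a region category and then decodes it with that region's acoustic and language models. The following is a minimal sketch under stated assumptions. Each region is assumed to be summarized by a centroid of averaged speech feature vectors and is chosen by nearest-centroid classification. The centroids, feature values, model identifiers, and all names (`RegionModels`, `classify_region`, `select_models`) are hypothetical, and the decoding itself is not modeled.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Sequence, Tuple


@dataclass
class RegionModels:
    acoustic_model: str   # hypothetical model identifiers
    language_model: str


def mean_vector(frames: Sequence[Sequence[float]]) -> List[float]:
    """Average per-frame feature vectors into one utterance-level vector."""
    n = len(frames)
    return [sum(col) / n for col in zip(*frames)]


def classify_region(frames: Sequence[Sequence[float]],
                    centroids: Dict[str, Sequence[float]]) -> str:
    """Nearest-centroid region classification by Euclidean distance."""
    utt = mean_vector(frames)
    return min(centroids, key=lambda r: math.dist(utt, centroids[r]))


def select_models(frames: Sequence[Sequence[float]],
                  centroids: Dict[str, Sequence[float]],
                  registry: Dict[str, RegionModels],
                  default: str = "standard") -> Tuple[str, RegionModels]:
    """Classify the region, then pick its model pair (the registry must contain `default`)."""
    region = classify_region(frames, centroids)
    return region, registry.get(region, registry[default])
```

With `centroids = {"standard": [0.0, 0.0], "southern": [1.0, 1.0]}`, frames near `[1.0, 1.0]` select the `"southern"` model pair. A region that has a centroid but no trained models falls back to the default pair, which keeps recognition available while regional models are still being learned.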

SPEECH INTERACTION METHOD, APPARATUS, DEVICE AND COMPUTER STORAGE MEDIUM
20220351721 · 2022-11-03

The present disclosure provides a speech interaction method, apparatus, device and computer storage medium, and relates to the field of artificial intelligence. A specific implementation is as follows: performing speech recognition and demand analysis on a first speech instruction input by a user; if the demand analysis fails, performing demand prediction on the first speech instruction to obtain at least one demand expression; returning at least one of the demand expressions to the user in the form of a question; and, if a second speech instruction confirming at least one of the demand expressions is received from the user, performing a service response using the demand analysis result corresponding to the confirmed demand expression. The present disclosure can make the user's interactions more efficient and improve the user experience.
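
The fallback flow works in four steps: the demand analysis fails, candidate demand expressions are predicted, the user is asked about them, and the system acts on the confirmed one. The following is a minimal sketch under stated assumptions. Speech recognition is assumed to have already produced text. String similarity from the standard-library `difflib.get_close_matches` stands in for the demand-prediction model. The known-demand table, the cutoff, and all names (`KNOWN_DEMANDS`, `analyze`, `predict_demands`, `interact`) are hypothetical.

```python
import difflib
from typing import Callable, Dict, List, Optional

# Hypothetical mapping from canonical demand expressions to demand-analysis results.
KNOWN_DEMANDS: Dict[str, str] = {
    "navigate to the nearest gas station": "NAVIGATE(poi=gas_station)",
    "play some relaxing music": "PLAY_MUSIC(genre=relaxing)",
    "what is the weather tomorrow": "WEATHER(date=tomorrow)",
}


def analyze(text: str) -> Optional[str]:
    """Toy demand analysis: succeeds only on an exact known expression."""
    return KNOWN_DEMANDS.get(text.lower().strip())


def predict_demands(text: str, n: int = 2, cutoff: float = 0.4) -> List[str]:
    """Demand-prediction stand-in: closest known expressions by string similarity."""
    return difflib.get_close_matches(text.lower(), KNOWN_DEMANDS, n=n, cutoff=cutoff)


def interact(first: str, confirm: Callable[[str], bool]) -> Optional[str]:
    """Return the demand-analysis result to serve, or None if nothing was confirmed."""
    result = analyze(first)
    if result is not None:
        return result
    for expression in predict_demands(first):
        # `confirm` plays the role of asking the question and hearing the second instruction.
        if confirm(f"Did you mean: {expression}?"):
            return KNOWN_DEMANDS[expression]
    return None
```

For example, `interact("play relaxing music", lambda q: True)` first fails the exact analysis. It then asks "Did you mean: play some relaxing music?" and, once the user confirms, returns `"PLAY_MUSIC(genre=relaxing)"`. Asking about the best candidate first keeps the extra dialogue turn as short as possible.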