G10L15/193

Information processing apparatus and non-transitory computer readable medium
11314810 · 2022-04-26

An information processing apparatus includes: a receiver configured to receive an utterance content of a speaker, a processing structure of a work in which the speaker utters, the work including plural processing units, and a processing unit in execution in the processing structure; an extraction unit configured to extract a related document including a sentence whose similarity to the utterance content of the speaker received by the receiver is equal to or higher than a threshold, from among related documents that are associated in advance with at least one processing unit including the processing unit in execution received by the receiver; and a setting unit configured to set a processing unit from which the extraction unit extracts a related document next, according to the processing structure received by the receiver.
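The extraction and setting steps in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the similarity measure (Python's `difflib.SequenceMatcher`), the threshold value 0.6, the unit names, and the helper names `extract_related` and `next_unit` are all hypothetical.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    # Assumption: character-level ratio stands in for the unspecified similarity measure.
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def extract_related(utterance, unit, docs_by_unit, threshold=0.6):
    """Return names of documents tied to `unit` that contain a sentence
    whose similarity to the utterance is at or above the threshold."""
    hits = []
    for name, sentences in docs_by_unit.get(unit, {}).items():
        if any(similarity(utterance, s) >= threshold for s in sentences):
            hits.append(name)
    return hits


def next_unit(structure, unit):
    """Pick the processing unit to search next, following the processing structure.
    Assumption: the structure is a mapping from each unit to its successors."""
    followers = structure.get(unit, [])
    return followers[0] if followers else None


# Hypothetical work with three processing units.
structure = {"intake": ["review"], "review": ["approval"], "approval": []}
docs_by_unit = {
    "intake": {"intake_guide": ["Check the applicant's identity documents."]},
    "review": {"review_manual": ["Compare the invoice total with the purchase order."]},
}

utterance = "compare the invoice total with the purchase order"
found = extract_related(utterance, "review", docs_by_unit)
upcoming = next_unit(structure, "review")
```

Here the search is limited to documents associated in advance with the unit in execution, and the structure decides which unit is searched next.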

Systems and methods for semantic search engine analysis

Systems and methods for semantic search engine analysis are disclosed. Generally, the system receives user-provided text and/or speech search input and processes it to determine a semantic meaning. If the search input is speech, the system may first convert the speech into text. The system lexically processes the search input to ensure that it is valid, tags portions of the search input with metadata that assigns a meaning to the tagged portions, and analyzes the relative locations of individual words and phrases to determine grammatical or linguistic relationships within the search input. In some embodiments, the system may disambiguate words or search terms and provide input suggestions to the user. The system may use the processed search input to generate a search query, such as a query for searching apartment listing databases, and display the search query results to the user.
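The pipeline described here (lexical validation, metadata tagging, analysis of relative word positions, query generation) might look roughly like the sketch below. Everything concrete in it is an assumption made for illustration: the `LEXICON` table, the tag names, and the query fields are hypothetical and do not come from the abstract.

```python
import re

# Hypothetical lexicon mapping words to metadata tags.
LEXICON = {
    "bedroom": "unit_feature",
    "bedrooms": "unit_feature",
    "austin": "city",
    "denver": "city",
    "under": "price_upper_bound",
}


def lex(text):
    """Lexical step: split into word and number tokens, rejecting empty input."""
    tokens = re.findall(r"[a-z]+|\$?\d+", text.lower())
    if not tokens:
        raise ValueError("empty or invalid search input")
    return tokens


def tag(tokens):
    """Tagging step: attach a metadata label to every token."""
    tagged = []
    for tok in tokens:
        if tok.lstrip("$").isdigit():
            tagged.append((tok, "number"))
        else:
            tagged.append((tok, LEXICON.get(tok, "other")))
    return tagged


def build_query(tagged):
    """Use each token's position relative to its neighbor to build a structured query."""
    query = {}
    for i, (tok, label) in enumerate(tagged):
        nxt = tagged[i + 1] if i + 1 < len(tagged) else None
        if label == "city":
            query["city"] = tok
        elif label == "number" and nxt and nxt[1] == "unit_feature":
            query["bedrooms"] = int(tok.lstrip("$"))
        elif label == "price_upper_bound" and nxt and nxt[1] == "number":
            query["max_price"] = int(nxt[0].lstrip("$"))
    return query


query = build_query(tag(lex("2 bedroom in Austin under $1500")))
```

A number is interpreted as a bedroom count or a price ceiling only because of the tag on the adjacent token, which is a simple stand-in for the positional analysis the abstract mentions.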

ALPHANUMERIC SEQUENCE BIASING FOR AUTOMATIC SPEECH RECOGNITION
20220013126 · 2022-01-13

Speech processing techniques are disclosed that enable determining a text representation of alphanumeric sequences in captured audio data. Various implementations include determining a contextual biasing finite state transducer (FST) based on contextual information corresponding to the captured audio data. Additional or alternative implementations include modifying probabilities of one

Method and apparatus for extracting information

A method and an apparatus for extracting information are provided. The method according to an embodiment includes: receiving and parsing voice information of a user to generate text information corresponding to the voice information; extracting to-be-recognized contact information from the text information; acquiring an address book of the user, the address book including at least two pieces of contact information; generating at least two types of matching information based on the to-be-recognized contact information; determining, for each of the at least two types of matching information, a matching degree between the to-be-recognized contact information and each of at least two pieces of contact information based on the type of matching information; and extracting contact information matching the to-be-recognized contact information from the address book based on the determined matching degree.
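The matching steps can be illustrated with a short sketch. It assumes two hypothetical types of matching information, a normalized text form and a crude consonant-only "sound" form, and uses `difflib.SequenceMatcher` as the matching degree; the abstract does not name the matching types or the scoring function, so these are stand-ins.

```python
from difflib import SequenceMatcher


def text_form(name: str) -> str:
    # Matching type 1 (assumed): lower-cased, trimmed spelling.
    return name.lower().strip()


def sound_form(name: str) -> str:
    # Matching type 2 (assumed): drop vowels and "y" as a rough phonetic key.
    return "".join(c for c in text_form(name) if c not in "aeiouy")


MATCHERS = {"text": text_form, "sound": sound_form}


def matching_degrees(candidate: str, contact: str) -> dict:
    """Matching degree between the candidate and one contact, per matching type."""
    return {
        kind: SequenceMatcher(None, form(candidate), form(contact)).ratio()
        for kind, form in MATCHERS.items()
    }


def best_contact(candidate: str, address_book: list, min_degree: float = 0.5):
    """Return the contact with the highest degree across all matching types,
    or None when no contact reaches `min_degree` (an assumed cutoff)."""
    scored = [
        (max(matching_degrees(candidate, contact).values()), contact)
        for contact in address_book
    ]
    degree, contact = max(scored)
    return contact if degree >= min_degree else None


# Hypothetical address book and a misrecognized name from the transcript.
address_book = ["John Smith", "Joan Smart", "Jane Doe"]
match = best_contact("jon smyth", address_book)
```

Taking the maximum over matching types is one possible way to combine the degrees; a weighted sum would fit the description equally well.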

SYSTEMS AND METHODS FOR EMPLOYING ALTERNATE SPELLINGS FOR IMPROVING THE RECOGNITION OF RARE WORDS
20230326450 · 2023-10-12

A method of adding a custom vocabulary to a transcription system includes receiving a custom vocabulary at an ASIRW module. The method further includes tokenizing the custom vocabulary with the ASIRW module. The method further includes creating a new WFST (weighted finite-state transducer) with the ASIRW module. The method further includes transcribing audio using the new WFST with the ASIRW module. The tokenizing includes applying a translation model to each word of the custom vocabulary

Semantic understanding method and apparatus, and device and storage medium

A semantic understanding method and apparatus, and a device and a storage medium are provided. The method includes: acquiring a recognition character string that matches speech information; acquiring, from an entity vocabulary library, at least one entity vocabulary corresponding to each recognition character in the recognition character string; and determining, according to how each entity vocabulary hits the recognition character string, a matching entity vocabulary as the semantic understanding result of the speech information. With this method, even when a completely matching entity vocabulary is not acquired, a matching entity vocabulary can still be determined from the entity vocabulary library, so the semantic information of the speech is understood accurately. The method also has relatively high fault tolerance for wrong words, added words, and omitted words, which improves the accuracy of semantic understanding of speech information.
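A rough sketch of the hit-based matching follows. It assumes a character-to-entity inverted index and a bag-of-characters hit score that ignores character order; the scoring formula, the `min_score` cutoff, and the function names are hypothetical simplifications, not the method claimed in the abstract.

```python
from collections import Counter, defaultdict


def build_index(entities):
    """Inverted index: character -> entity vocabularies containing that character."""
    index = defaultdict(set)
    for entity in entities:
        for ch in set(entity):
            if not ch.isspace():
                index[ch].add(entity)
    return index


def hit_score(recognized: str, entity: str) -> float:
    """Share of characters the two strings have in common (order ignored)."""
    rec = Counter(c for c in recognized if not c.isspace())
    ent = Counter(c for c in entity if not c.isspace())
    hits = sum((rec & ent).values())
    return 2 * hits / (sum(rec.values()) + sum(ent.values()))


def best_entity(recognized: str, entities, min_score: float = 0.7):
    """Look up candidates for every recognized character, then keep the best hit."""
    index = build_index(entities)
    candidates = set()
    for ch in recognized:
        candidates |= index.get(ch, set())
    if not candidates:
        return None
    winner = max(candidates, key=lambda e: hit_score(recognized, e))
    return winner if hit_score(recognized, winner) >= min_score else None


# Hypothetical entity vocabulary library; the recognized string has an omitted letter.
library = ["bohemian rhapsody", "hotel california", "yesterday"]
result = best_entity("bohemian rapsody", library)
```

Because the score counts shared characters rather than requiring an exact match, a recognition result with a missing or wrong character can still resolve to the intended entity.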