G06F40/53

TEXT PROCESSING METHOD
20230101401 · 2023-03-30

A text processing method is provided. The method includes: determining a first probability value of each candidate character of a plurality of candidate characters corresponding to a target position, based on character feature information corresponding to the target position in a text fragment to be processed, wherein the character feature information is determined based on a context at the target position in the text fragment; determining a second probability value of each candidate character based on a character string that includes the candidate character and at least one character at one or more positions adjacent to the target position in the text fragment; and determining a correction character at the target position based on the first probability value and the second probability value of each candidate character.
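
The selection step can be illustrated with a minimal sketch. It assumes both probability values are already available as dictionaries and combines them with a weighted sum of log-probabilities; the abstract does not say how the two values are combined, so the function name `pick_correction` and the `weight` parameter are hypothetical.

```python
import math

def pick_correction(context_probs, string_probs, weight=0.5):
    """Return the candidate character with the highest combined score.

    context_probs: candidate -> first probability (from context features)
    string_probs:  candidate -> second probability (from the string formed
                   with adjacent characters)
    weight:        hypothetical interpolation weight (an assumption; the
                   abstract does not specify a combination rule)
    """
    def score(ch):
        # Clamp to a small floor so log() never receives zero.
        p1 = max(context_probs[ch], 1e-12)
        p2 = max(string_probs.get(ch, 0.0), 1e-12)
        return weight * math.log(p1) + (1 - weight) * math.log(p2)

    return max(context_probs, key=score)

# Made-up values: the context model slightly prefers "o", but the
# adjacent-character string strongly prefers "e".
first = {"o": 0.55, "e": 0.45}
second = {"o": 0.05, "e": 0.90}
print(pick_correction(first, second))  # -> e
```

Working in log space keeps the combination numerically stable when either probability is very small.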

Language-agnostic multilingual modeling using effective script normalization

A method includes obtaining a plurality of training data sets, each associated with a respective native language and including a plurality of respective training data samples. For each respective training data sample of each training data set in the respective native language, the method includes transliterating the corresponding transcription in the respective native script into corresponding transliterated text representing the respective native language of the corresponding audio in a target script, and associating the corresponding transliterated text in the target script with the corresponding audio in the respective native language to generate a respective normalized training data sample. The method also includes training, using the normalized training data samples, a multilingual end-to-end speech recognition model to predict speech recognition results in the target script for corresponding speech utterances spoken in any of the different native languages associated with the plurality of training data sets.
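
The normalization step might look roughly like the sketch below. It assumes each training sample is an (audio, transcription) pair and uses a tiny hand-written character table as a stand-in for a real transliterator; `TOY_TABLE`, the file names, and the function names are all hypothetical, and model training is not shown.

```python
# Toy character table mapping a few Cyrillic and Greek letters to Latin.
# A real system would use a full transliteration model or library.
TOY_TABLE = {
    "п": "p", "р": "r", "и": "i", "в": "v", "е": "e", "т": "t",
    "γ": "g", "ε": "e", "ι": "i", "α": "a",
}

def transliterate(text, table=TOY_TABLE):
    """Map each character to the target script; unknown characters pass through."""
    return "".join(table.get(ch, ch) for ch in text)

def normalize(datasets):
    """Turn per-language (audio, transcription) samples into
    (audio, transliterated text) samples in a single target script."""
    normalized = []
    for samples in datasets.values():
        for audio, transcription in samples:
            normalized.append((audio, transliterate(transcription)))
    return normalized

datasets = {
    "ru": [("ru_001.wav", "привет")],  # hypothetical audio file names
    "el": [("el_001.wav", "γεια")],
}
print(normalize(datasets))
# -> [('ru_001.wav', 'privet'), ('el_001.wav', 'geia')]
```

After this step every sample shares one output script, so a single model vocabulary can cover all of the source languages.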

PERSONALIZED EMOJI DICTIONARY

A personalized emoji dictionary is provided, such as for use with emoji-first messaging. Text messages are automatically converted to emojis by an emoji-first application so that only emojis are communicated from one client device to another client device. Each client device has a personalized emoji library of emojis that are mapped to words; the libraries are customizable and unique to the users of the client devices, such that the users can communicate secretly in code. Upon receipt of a string of emojis, a user can, for a predetermined period of time, select the emoji string to convert it to text if desired.
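
As a rough illustration, the word-to-emoji mapping can be sketched with a plain dictionary. The library contents and the function names are hypothetical, and dropping words that have no mapping is an assumption made here so that only emojis are sent; the time-limited conversion back to text is not modeled.

```python
def encode(message, library):
    """Convert a text message to a list of emojis using a personal library.

    Assumption: words without a mapping are dropped, so the output
    contains only emojis.
    """
    return [library[word] for word in message.lower().split() if word in library]

def decode(emojis, library):
    """Convert received emojis back to words with the same library."""
    inverse = {emoji: word for word, emoji in library.items()}
    return " ".join(inverse.get(emoji, "?") for emoji in emojis)

# Hypothetical personalized library shared by two users.
library = {"meet": "🤝", "coffee": "☕", "tomorrow": "🌅"}

sent = encode("Meet for coffee tomorrow", library)
print(sent)                   # -> ['🤝', '☕', '🌅']
print(decode(sent, library))  # -> meet coffee tomorrow
```

The emojis are kept in a list rather than one joined string, which avoids having to split multi-code-point emojis when decoding.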

Electronic device for determining a character type in a character combination and processes capable of execution therewith, and control method for and storage medium storing program for same
11610070 · 2023-03-21

There is provided an electronic device including a display, a designation device with which a user designates any range of a character string displayed on the display, and a processor. The processor is configured to take as a processing target at least one character included in the range designated by the designation device, determine which of a plurality of predetermined combinations corresponds to the combination of a character type of the at least one character and a position of the at least one character in the range, and execute a process on the characters included in the designated range based on the determined combination.
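
One possible reading of the combination check is sketched below. It assumes the relevant positions are the first and last characters of the designated range and that each (type, type) pair maps to a named process; the rule table, the process names, and the fallback are hypothetical and are not taken from the patent.

```python
def char_type(ch):
    """Classify a character into a coarse type."""
    if ch.isdigit():
        return "digit"
    if ch.isalpha():
        return "letter"
    return "other"

# Hypothetical predetermined combinations:
# (type of first character, type of last character) -> process name.
RULES = {
    ("digit", "digit"): "calculate",
    ("letter", "letter"): "look_up",
}

def choose_process(text, start, end, default="copy"):
    """Pick a process for the designated range text[start:end]."""
    selected = text[start:end]
    if not selected:
        return None
    key = (char_type(selected[0]), char_type(selected[-1]))
    return RULES.get(key, default)

line = "total 12+30 items"
print(choose_process(line, 6, 11))  # range "12+30" -> calculate
print(choose_process(line, 0, 5))   # range "total" -> look_up
```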

Alternate character set domain name suggestion and registration using translation and/or transliteration
11637806 · 2023-04-25

Some embodiments provide domain name suggestions based on a user-provided ASCII phrase translated and/or transliterated into any of a number of supported non-English language character sets. To suggest non-English-language domain names, some embodiments parse, translate, and transliterate the user-provided ASCII names into domain names that include at least one non-English language character. Moreover, some embodiments determine the DNS registration status (e.g., as a second-level domain) of the Punycode (in ASCII) corresponding to these non-English domain names and provide the user with the ability to register any that are unregistered.
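
The Punycode step can be illustrated with Python's built-in `idna` codec, which implements the older IDNA 2003 rules. In this minimal sketch the candidate labels stand in for the output of a translation or transliteration step, and the `registered` set stands in for a real DNS or registry lookup; both are hypothetical, as are the function names and the `.example` suffix.

```python
def to_ascii_domain(label, tld="example"):
    """Return the ASCII (Punycode) form of a possibly non-ASCII domain name."""
    return f"{label}.{tld}".encode("idna").decode("ascii")

def suggest(candidates, registered):
    """Return (unicode name, ASCII name) pairs that are not yet registered.

    `registered` is a stand-in for a registry query and holds ASCII names.
    """
    suggestions = []
    for label in candidates:
        ascii_name = to_ascii_domain(label)
        if ascii_name not in registered:
            suggestions.append((f"{label}.example", ascii_name))
    return suggestions

# Hypothetical candidates produced by translating the user's phrase.
candidates = ["bücher", "münchen"]
registered = {to_ascii_domain("münchen")}  # pretend this one is taken

print(suggest(candidates, registered))
# -> [('bücher.example', 'xn--bcher-kva.example')]
```

The registration check is performed on the ASCII form because that is what is actually stored in the DNS.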

Generating finite state automata for recognition of organic compound names in Chinese
11636268 · 2023-04-25

The present disclosure relates to a method and device for generating a finite state automaton for recognizing a chemical name in a text, and to a recognition method. According to an embodiment of the present disclosure, the method comprises substituting representation constants of categories of character segments appearing in an organic compound name set into the organic compound name set to obtain a conversion name set; updating the conversion name set based on a conversion name segment which repeatedly appears in the conversion name set; and generating the finite state automaton based on the updated conversion name set.
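
A much simplified sketch of the idea follows. It assumes single-character segments, three made-up category constants (`N` for carbon-count prefixes, `S` for saturation suffixes, `G` for a functional-group suffix), and a trie-shaped automaton; the update step based on repeated segments is reduced here to deduplication, which is an assumption and not the patented procedure.

```python
# Hypothetical category constants for a few single-character segments.
CATEGORIES = {
    "甲": "N", "乙": "N", "丙": "N",   # 1, 2, 3 carbon atoms
    "烷": "S", "烯": "S", "炔": "S",   # alkane, alkene, alkyne
    "醇": "G",                          # alcohol
}

def convert(name):
    """Replace each known segment with its category constant."""
    return "".join(CATEGORIES.get(ch, ch) for ch in name)

def build_automaton(names):
    """Build a trie-shaped automaton over the deduplicated converted names.

    transitions[state][symbol] gives the next state; state 0 is the start.
    """
    transitions = [{}]
    accepting = set()
    for pattern in sorted({convert(name) for name in names}):
        state = 0
        for symbol in pattern:
            if symbol not in transitions[state]:
                transitions.append({})
                transitions[state][symbol] = len(transitions) - 1
            state = transitions[state][symbol]
        accepting.add(state)
    return transitions, accepting

def accepts(automaton, name):
    """Check whether the converted name ends in an accepting state."""
    transitions, accepting = automaton
    state = 0
    for symbol in convert(name):
        state = transitions[state].get(symbol)
        if state is None:
            return False
    return state in accepting

automaton = build_automaton(["甲烷", "乙烯", "乙醇"])  # methane, ethylene, ethanol
print(accepts(automaton, "丙炔"))  # propyne, not in the name set -> True
print(accepts(automaton, "烷甲"))  # segments in the wrong order -> False
```

Because the automaton runs over category constants rather than raw characters, it can accept names that share a pattern with the name set without appearing in it.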

TEXT MINING METHOD BASED ON ARTIFICIAL INTELLIGENCE, RELATED APPARATUS AND DEVICE
20230111582 · 2023-04-13

This application discloses a text mining method based on artificial intelligence, performed by a computer device. The method includes: obtaining domain candidate term features corresponding to domain candidate terms; obtaining term quality scores corresponding to the domain candidate terms according to the domain candidate term features; determining a new term from the domain candidate terms according to the term quality scores corresponding to the domain candidate terms; obtaining an associated text according to the new term; and determining a domain seed term as a domain new term in response to determining, according to the associated text, that the domain seed term satisfies a domain new term mining condition. In this way, new terms can be automatically selected from domain candidate terms based on a machine learning algorithm, thereby reducing manpower costs and adapting well to the rapid emergence of special new terms in the Internet era.
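
The scoring and selection steps can be sketched as follows. The feature names (`frequency`, `cohesion`), the weights, and the threshold are hypothetical, and a fixed logistic function stands in for whatever trained model the application uses; the later steps involving associated text and domain seed terms are not shown.

```python
import math

# Hypothetical weights; a real system would learn these from labeled data.
WEIGHTS = {"frequency": 0.3, "cohesion": 2.0}
BIAS = -2.0

def quality_score(features):
    """Map a candidate term's features to a score in (0, 1)."""
    z = BIAS + sum(WEIGHTS[name] * features[name] for name in WEIGHTS)
    return 1.0 / (1.0 + math.exp(-z))

def select_new_terms(candidates, threshold=0.5):
    """Keep candidate terms whose quality score exceeds the threshold."""
    return [term for term, feats in candidates.items()
            if quality_score(feats) > threshold]

# Made-up feature values for two candidate terms.
candidates = {
    "doomscrolling": {"frequency": 2.0, "cohesion": 1.2},
    "of the": {"frequency": 3.0, "cohesion": 0.1},
}
print(select_new_terms(candidates))  # -> ['doomscrolling']
```

In this toy setup a frequent but loosely bound word pair scores low, while a less frequent but cohesive term passes the threshold.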