G06F40/53

SYSTEM AND A METHOD FOR PHONETIC-BASED TRANSLITERATION
20230116268 · 2023-04-13

A system and a method for converting text in one of a plurality of input languages into text in a second language using phonetic-based transliteration are disclosed. The method includes receiving (802) an input text in a first script from a user; phonetically mapping (804) each character of the input text to a second script corresponding to the second language; validating (806) permutations of the mapping of each input character to each character of the second script; and transliterating (808) the input text in the first script into an output text in the second script. A transliteration engine (106) is configured to transliterate the input text of the first language into the output text of the second language. The transliteration engine (106) includes a data reception module (108), a data transformation module (110), a training module (112), an inference module (114), and a database (116).
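
The map, validate, and transliterate steps can be illustrated with a toy lookup table. This is only a sketch under loud assumptions: the abstract describes a trained engine with training and inference modules, whereas the code below swaps in a hand-written candidate table and a caller-supplied validity check. The names `CANDIDATES`, `transliterate`, and `is_valid` are hypothetical, and the Latin-to-Cyrillic pairs are illustrative only.

```python
from itertools import product

# Hypothetical phonetic candidates: each Latin letter maps to one or more
# Cyrillic letters. A real system would learn these; this table is a toy.
CANDIDATES = {
    "c": ["к", "с", "ц"],
    "o": ["о"],
    "d": ["д"],
    "a": ["а"],
}


def transliterate(text, candidates, is_valid):
    """Return the first candidate permutation accepted by is_valid.

    Characters without an entry in `candidates` are passed through unchanged.
    If no permutation validates, fall back to the first candidate per character.
    """
    options = [candidates.get(ch, [ch]) for ch in text.lower()]
    for combo in product(*options):  # enumerate permutations of the mapping
        word = "".join(combo)
        if is_valid(word):  # validation step
            return word
    return "".join(opts[0] for opts in options)


# Usage: validate against a (hypothetical) set of known target-script words.
known_words = {"сода"}
result = transliterate("coda", CANDIDATES, known_words.__contains__)
```

Enumerating every permutation grows exponentially with word length, which is one reason a learned model would be preferable to this brute-force check for anything beyond short inputs.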

PARALLEL UNICODE TOKENIZATION IN A DISTRIBUTED NETWORK ENVIRONMENT

Unicode data can be protected in a distributed tokenization environment. Data to be tokenized can be accessed or received by a security server, which instantiates a number of tokenization pipelines for parallel tokenization of the data. Unicode token tables are accessed by the security server, and each tokenization pipeline uses the accessed token tables to tokenize a portion of the data. Each tokenization pipeline performs a set of encoding or tokenization operations in parallel and based at least in part on a value received from another tokenization pipeline. The outputs of the tokenization pipelines are combined, producing tokenized data, which can be provided to a remote computing system for storage or processing.

Translation processing method and storage medium
11645475 · 2023-05-09

A translation processing method executed by a computer includes: calculating a first translation probability from each of first phonemes included in a first document described in a first language into each of second phonemes included in a second document, whose contents are substantially equivalent to those of the first document, described in a second language, and a second translation probability from each of the second phonemes into each of the first phonemes; extracting a phoneme pair in which the first translation probability and the second translation probability are equal to or higher than a threshold value; and generating translation phrases in the first document and the second document based on the extracted phoneme pair.
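
The bidirectional threshold test can be sketched as follows. The abstract does not say how the two translation probabilities are estimated; this sketch assumes simple relative frequencies over already-aligned phoneme pairs, and the function name `extract_phoneme_pairs` and the sample data are hypothetical.

```python
from collections import Counter


def extract_phoneme_pairs(aligned_pairs, threshold=0.5):
    """Keep (source, target) pairs whose probability is high in both directions.

    aligned_pairs: list of (source_phoneme, target_phoneme) co-occurrences.
    Probabilities are relative frequencies (an assumption of this sketch).
    """
    pair_counts = Counter(aligned_pairs)
    src_counts = Counter(s for s, _ in aligned_pairs)
    tgt_counts = Counter(t for _, t in aligned_pairs)

    kept = {}
    for (s, t), count in pair_counts.items():
        p_forward = count / src_counts[s]   # first probability: source -> target
        p_backward = count / tgt_counts[t]  # second probability: target -> source
        if p_forward >= threshold and p_backward >= threshold:
            kept[(s, t)] = (p_forward, p_backward)
    return kept


# Usage with made-up co-occurrence data.
pairs = [("k", "k"), ("k", "k"), ("k", "g"), ("a", "a"), ("a", "a"), ("o", "a")]
kept = extract_phoneme_pairs(pairs, threshold=0.5)
```

Requiring both directions to clear the threshold discards pairs such as `("k", "g")` above, where the target almost always comes from that source but the source rarely produces that target.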

ELECTRONIC DEVICE FOR PROVIDING TRANSLATION SERVICE AND METHOD THEREOF

An electronic device and method for providing a translation service are disclosed. The electronic device for providing a translation service includes an input unit comprising input circuitry configured to receive input text of a first language, a processor configured to divide the input text into a main segment and a sub-segment and to generate output text of a second language by selecting translation candidate text corresponding to the input text from translation candidate text of the second language, based on a meaning of text included in the sub-segment, and an output unit comprising output circuitry configured to output the output text.

METHOD AND DEVICE FOR PROCESSING A MULTI-LANGUAGE TEXT

Embodiments of the present disclosure provide a method and apparatus for processing a multi-language text. According to embodiments of the present disclosure, the multi-language text including contents in a plurality of languages may be encoded with a Unicode. The method further comprises splitting the multi-language text into a plurality of parts based on the Unicode of the multi-language text, contents of the plurality of parts having different languages. In addition, the multi-language text may also be processed based on the plurality of parts.
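
Splitting by Unicode can be illustrated with Python's standard `unicodedata` module. This is a minimal sketch, not the disclosed method: it assumes the first word of a character's Unicode name (for example `LATIN`, `CYRILLIC`, `CJK`) is an adequate stand-in for its script, and it attaches spaces, digits, and punctuation to the preceding run. The helper names `script_of` and `split_by_script` are hypothetical.

```python
import unicodedata


def script_of(ch):
    """Approximate the script of a letter from its Unicode character name.

    Returns None for non-letters and for characters without a name.
    """
    if not ch.isalpha():
        return None
    try:
        return unicodedata.name(ch).split()[0]
    except ValueError:
        return None


def split_by_script(text):
    """Split text into (script, substring) parts at script changes."""
    parts = []
    current_script, buffer = None, []
    for ch in text:
        script = script_of(ch)
        # Start a new part only when a letter of a different script appears.
        if script is not None and current_script is not None and script != current_script:
            parts.append((current_script, "".join(buffer)))
            buffer = []
        if script is not None:
            current_script = script
        buffer.append(ch)
    if buffer:
        parts.append((current_script, "".join(buffer)))
    return parts


parts = split_by_script("Hello мир")
```

Each resulting part could then be routed to language-specific processing. A script is not the same thing as a language (Latin script covers many languages), so this only approximates the per-language split the abstract describes.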

SAMPLE GENERATION METHOD, MODEL TRAINING METHOD, TRAJECTORY RECOGNITION METHOD, DEVICE, AND MEDIUM
20230195998 · 2023-06-22

Disclosed are a sample generation method, a model training method, a trajectory recognition method, a device, and a medium. The sample generation method includes: determining a code result of a training Chinese character according to a preset code library, where the preset code library is generated based on code characters in a five-stroke code corpus; taking the code result as a training label of the training Chinese character; and generating a training sample according to both a writing trajectory and the training label of the training Chinese character. The amount of information carried in the training sample is thereby enriched.

METHOD AND SYSTEM FOR TRANSCRIPTION OF A LEXICAL UNIT FROM A FIRST ALPHABET INTO A SECOND ALPHABET
20170357634 · 2017-12-14

A server and a method for transcription of a lexical unit from a first alphabet into a second alphabet, the method comprising: acquiring a pair of (i) the lexical unit written in the first alphabet, and (ii) the corresponding transcription of the lexical unit written in the second alphabet, both having been divided into respective segments, such that within the pair, every segment of the lexical unit has a corresponding segment in the transcription of the lexical unit, and such that each lexical unit comprises either a sequence of sequentially alternating consonant segments, or a single vowel segment, or a single consonant segment; defining, for each given segment of the lexical unit, its context; training the server to calculate a theoretical frequency of at least one second alphabet character representing transcription of a particular given segment based on the context of particular given segment of the lexical unit.

METHOD AND APPARATUS FOR IDENTIFYING SIMILAR DATA ELEMENTS USING STRING MATCHING

Disclosed is a method and apparatus for identifying record elements similar to a query, the method including receiving a query, determining an index for the query, generating candidate records from a reference list to match the query based on the index and applying any one or any combination of a q-gram filter, a length filter, and a shared character count (SCC) filter, determining similarity scores of each of the candidate records, identifying records from among the candidate records having a similarity score greater than or equal to a threshold, selecting data records similar to the query based on sorting the selected records according to respective similarity scores, and outputting one or more of the selected data records.
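
The filter-then-score flow can be sketched as below. Several assumptions apply: the index lookup and the SCC filter are omitted, the q-gram filter uses the standard count bound for strings within edit distance `k`, and the similarity score is the ratio from Python's `difflib.SequenceMatcher`, which is a stand-in because the abstract does not name a scoring function. All function names and the sample reference list are hypothetical.

```python
from collections import Counter
from difflib import SequenceMatcher


def qgrams(s, q=2):
    """Multiset of q-grams of s, padded so every character appears in q grams.

    Assumes '#' and '$' do not occur in the data.
    """
    padded = "#" * (q - 1) + s + "$" * (q - 1)
    return Counter(padded[i:i + q] for i in range(len(padded) - q + 1))


def length_filter(query, record, k):
    """Strings within edit distance k differ in length by at most k."""
    return abs(len(query) - len(record)) <= k


def qgram_filter(query, record, k, q=2):
    """Strings within edit distance k share at least max_len + q - 1 - k*q q-grams."""
    shared = sum((qgrams(query, q) & qgrams(record, q)).values())
    return shared >= max(len(query), len(record)) + q - 1 - k * q


def find_similar(query, reference, k=2, threshold=0.8):
    """Filter cheaply, score the survivors, keep those at or above the threshold."""
    candidates = [r for r in reference
                  if length_filter(query, r, k) and qgram_filter(query, r, k)]
    scored = [(r, SequenceMatcher(None, query, r).ratio()) for r in candidates]
    selected = [(r, score) for r, score in scored if score >= threshold]
    return sorted(selected, key=lambda pair: pair[1], reverse=True)


results = find_similar("jonathan", ["jonathon", "johnathan", "maria", "jon"])
```

The filters are cheap necessary conditions, so they can only discard records that could not be within `k` edits of the query. The more expensive similarity score is then computed only for the records that survive.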