G06F40/129

TRANSLITERATION APPARATUS, TRANSLITERATION METHOD, TRANSLITERATION PROGRAM, AND INFORMATION PROCESSING APPARATUS
20170228360 · 2017-08-10

A transliteration processing device according to one embodiment includes a character string acquisition unit, a determination unit, and an output unit. The character string acquisition unit acquires a first alphabetic character string, which represents in alphabetic characters a first word written in a first language having a specified script, and a second alphabetic character string, which represents in alphabetic characters a second word written in a second language whose script differs from that of the first language. The determination unit determines whether a first consonant element in the first alphabetic character string and a second consonant element in the second alphabetic character string have a predetermined correspondence and, based on that result, determines whether the first word and the second word have a transliteration relationship. The output unit outputs, as a transliteration pair, the first word and the second word determined to have a transliteration relationship.
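
The consonant comparison described in this abstract can be illustrated with a minimal sketch. The correspondence table, the vowel set, and the function names are assumptions made for illustration only; the abstract does not specify how consonant elements are extracted or which correspondences are predetermined.

```python
# Hypothetical consonant correspondences between two romanizations
# (illustrative assumption, not taken from the patent).
EQUIVALENT = {("r", "l"), ("b", "v"), ("k", "g")}
VOWELS = set("aeiou")


def consonants(romanized: str) -> list[str]:
    """Return the consonant letters of an alphabetic string, in order."""
    return [ch for ch in romanized.lower() if ch.isalpha() and ch not in VOWELS]


def corresponds(a: str, b: str) -> bool:
    """Two consonants correspond if identical or listed as equivalent."""
    return a == b or (a, b) in EQUIVALENT or (b, a) in EQUIVALENT


def is_transliteration_pair(first: str, second: str) -> bool:
    """Compare consonant elements position by position."""
    c1, c2 = consonants(first), consonants(second)
    return len(c1) == len(c2) and all(corresponds(a, b) for a, b in zip(c1, c2))


# "raito" and "laiteu" are example romanizations of a loanword for "light".
print(is_transliteration_pair("raito", "laiteu"))
print(is_transliteration_pair("raito", "keompyuteo"))
```

A position-by-position comparison is the simplest possible choice; a real system would likely need alignment that tolerates inserted or dropped consonants.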

ISOLATING SEGMENTS OF BIDIRECTIONAL TEXT
20170270090 · 2017-09-21

Embodiments of the present invention include a method, system, and computer program product for isolating a segment of bidirectional text. A segment of bidirectional text may be identified. A Unicode left-to-right character (LRM) may be inserted on at least a first side of the segment of bidirectional text. A Unicode right-to-left character (RLM) may be inserted on at least a second side of the segment of bidirectional text. The segment of bidirectional text may be processed through a Unicode Bidirectional Algorithm (UBA) implementation. A directionality mismatch between the LRM and the RLM may cause a conflict. In response to the conflict, the Unicode Bidirectional Algorithm may select a base text direction for the segment of bidirectional text.
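
The mark insertion step can be sketched as follows. U+200E (left-to-right mark) and U+200F (right-to-left mark) are the standard Unicode code points; the function name and the index-based interface are assumptions, and the sketch stops before the text is handed to a Unicode Bidirectional Algorithm implementation.

```python
import unicodedata

LRM = "\u200e"  # LEFT-TO-RIGHT MARK
RLM = "\u200f"  # RIGHT-TO-LEFT MARK


def isolate_segment(text: str, start: int, end: int) -> str:
    """Wrap text[start:end] with an LRM on one side and an RLM on the other.

    Hypothetical helper: which side receives which mark is an assumption.
    """
    return text[:start] + LRM + text[start:end] + RLM + text[end:]


wrapped = isolate_segment("abc XYZ def", 4, 7)
# The two marks carry opposite strong directionality, which is the
# mismatch the abstract refers to.
print(unicodedata.bidirectional(LRM), unicodedata.bidirectional(RLM))
print(len(wrapped))
```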

Chinese, Japanese, or Korean language detection

Disclosed are systems, computer-readable mediums, and methods for determining that text contains Chinese, Japanese, or Korean characters. One method includes determining a language hypothesis for each text fragment in a plurality of text fragments identified from connected components in a document image. The method further includes selecting a first subset of text fragments from the plurality of text fragments based on ratings for the language hypothesis of each text fragment in the plurality of text fragments. The method further includes verifying, by a processor, the language hypothesis of one or more text fragments in the first subset of text fragments based on optical character recognition of the one or more text fragments. The method further includes determining, by the processor, that Chinese, Japanese, or Korean (CJK) characters are present in the document image based on the verification of the language hypothesis of each of the one or more text fragments.
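
A rough sketch of the select-then-verify flow follows. The `Fragment` type, the rating scale, the `top_n` and `min_ratio` parameters, and the use of Unicode block ranges as the verification test are all assumptions; in particular, `ocr_text` stands in for the result of running optical character recognition on a fragment, which is not implemented here.

```python
from dataclasses import dataclass

# Code point ranges for common CJK blocks: CJK Unified Ideographs,
# Hiragana, Katakana, and Hangul Syllables.
CJK_RANGES = [(0x4E00, 0x9FFF), (0x3040, 0x309F), (0x30A0, 0x30FF), (0xAC00, 0xD7AF)]


def is_cjk_char(ch: str) -> bool:
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in CJK_RANGES)


@dataclass
class Fragment:
    rating: float  # hypothetical confidence of the "fragment is CJK" hypothesis
    ocr_text: str  # stand-in for the OCR output of this fragment


def document_has_cjk(fragments: list[Fragment], top_n: int = 2,
                     min_ratio: float = 0.5) -> bool:
    """Verify the best-rated fragments and report whether any is mostly CJK."""
    subset = sorted(fragments, key=lambda f: f.rating, reverse=True)[:top_n]
    for frag in subset:
        chars = [c for c in frag.ocr_text if not c.isspace()]
        if chars and sum(is_cjk_char(c) for c in chars) / len(chars) >= min_ratio:
            return True
    return False


print(document_has_cjk([Fragment(0.9, "東京都"), Fragment(0.2, "hello")]))
```

Verifying only the top-rated subset keeps the expensive OCR step off most fragments, which appears to be the point of rating the hypotheses first.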

Handwriting input display apparatus, handwriting input display method and recording medium storing program
11250253 · 2022-02-15

A handwriting input display apparatus causes display means to display a stroke generated by an input made by using input means to a screen as a handwritten object. The apparatus includes display control means for causing the display means to display character string candidates including a handwriting recognition candidate when the handwritten object does not change for a predetermined time. When the handwriting recognition candidate is selected, the display control means causes the display means to erase a display of the character string candidates and a display of the handwritten object, and causes the display means to display a character string object at a position where the erased handwritten object was displayed. When selection of the handwriting recognition candidate is not performed for a predetermined time and the display of the character string candidates is erased, the display control means causes the handwritten object to be kept displayed.

Generation apparatus and method
09767193 · 2017-09-19

A generation apparatus generates a contracted sentence in which some of the words included in a sentence are removed. The apparatus includes a memory and a processor coupled to the memory. The memory stores a first index for determining whether two words are kept as a pair in the contracted sentence, for each characteristic between two words connected to each other in the sentence through a grammatical or conceptual relation. The processor generates the contracted sentence by removing some of the words based on the first index corresponding to every pair of words connected by a grammatical or conceptual relation, and outputs the contracted sentence.
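
One way to picture the index-driven removal is the sketch below. The relation labels, the index values in `KEEP_INDEX`, the threshold, and the traversal from a root word are all assumptions; the abstract only states that an index per characteristic of a word pair decides whether the pair is kept.

```python
# Hypothetical "first index": for each relation type between two connected
# words, a score for keeping the pair in the contracted sentence.
KEEP_INDEX = {"nsubj": 0.9, "obj": 0.8, "amod": 0.3, "advmod": 0.2}


def contract(words: list[str], edges: list[tuple[int, int, str]],
             root: int, threshold: float = 0.5) -> str:
    """Keep the root and every word reachable through pairs scoring >= threshold.

    edges holds (head index, dependent index, relation label) triples.
    """
    kept = {root}
    frontier = [root]
    while frontier:
        head = frontier.pop()
        for h, d, rel in edges:
            if h == head and d not in kept and KEEP_INDEX.get(rel, 0.0) >= threshold:
                kept.add(d)
                frontier.append(d)
    return " ".join(words[i] for i in sorted(kept))


words = ["Tom", "quietly", "ate", "ripe", "grapes"]
edges = [(2, 0, "nsubj"), (2, 1, "advmod"), (2, 4, "obj"), (4, 3, "amod")]
print(contract(words, edges, root=2))
```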

METHOD AND SYSTEM FOR IDEOGRAM CHARACTER ANALYSIS
20170262474 · 2017-09-14

Ideogram character analysis includes partitioning an original ideogram character into strokes, and mapping each stroke to a corresponding stroke identifier (id) to create an original stroke id sequence that includes stroke identifiers. A candidate ideogram character that has a candidate stroke id sequence within a threshold distance to the original stroke id sequence is selected. One or more embodiments may create a new phrase by replacing the original ideogram character with the candidate ideogram character in a search phrase. One or more embodiments perform a search using the search phrase and the new phrase to obtain a result, and present the result. One or more embodiments may replace an original ideogram character in a character recognized document with the candidate ideogram character and store the character recognized document.
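
The candidate selection can be sketched with an edit distance over stroke id sequences. The abstract does not name the distance measure, so Levenshtein distance is an assumption here, as are the stroke id numbering and the small `STROKE_SEQUENCES` table, which is illustrative data rather than a real stroke database.

```python
# Hypothetical stroke id sequences (1 = horizontal, 2 = vertical,
# 3 = left-falling, 4 = dot/right-falling). Illustrative only.
STROKE_SEQUENCES = {
    "土": [1, 2, 1],
    "士": [1, 2, 1],
    "干": [1, 1, 2],
    "王": [1, 1, 2, 1],
    "大": [1, 3, 4],
}


def edit_distance(a: list[int], b: list[int]) -> int:
    """Levenshtein distance between two stroke id sequences."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        curr = [i]
        for j, y in enumerate(b, 1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (x != y)))
        prev = curr
    return prev[-1]


def candidates(original: str, threshold: int) -> list[str]:
    """Characters whose stroke id sequence is within threshold of the original."""
    seq = STROKE_SEQUENCES[original]
    return [ch for ch, other in STROKE_SEQUENCES.items()
            if ch != original and edit_distance(seq, other) <= threshold]


def new_phrases(phrase: str, original: str, threshold: int) -> list[str]:
    """Build alternative search phrases by swapping in each candidate."""
    return [phrase.replace(original, ch) for ch in candidates(original, threshold)]


print(candidates("土", 1))
print(new_phrases("土地", "土", 1))
```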

VOICE RECOGNITION METHOD AND DEVICE
20220238098 · 2022-07-28

A method for recognizing speech comprises: respectively setting initial values of a Chinese character coefficient and a Pinyin coefficient, generating a Chinese character mapping function according to the initial value of the Chinese character coefficient, and generating a Pinyin mapping function according to the initial value of the Pinyin coefficient (S101); training the Chinese character mapping function and the Pinyin mapping function using a plurality of preset training samples, calculating training results as parameters of a joint loss function, and generating a target mapping function according to calculation results (S102); and recognizing, according to the target mapping function, speech to be recognized, so as to obtain a Chinese character recognition result and a Pinyin recognition result of the speech to be recognized (S103). The method reduces the cost of speech recognition while ensuring the accuracy of speech recognition.

INFORMATION PRESENTATION DEVICE, AND INFORMATION PRESENTATION METHOD
20210383793 · 2021-12-09

Provided are an information presentation device and an information presentation method that present information to a plurality of users who differ in level in such a manner that each user can easily understand the information. The information presentation device includes: an identification unit that identifies respective levels of one or more users; an obtaining unit that obtains presentation information to be presented to the users; a conversion unit that appropriately converts the obtained presentation information according to the level of each user; and a presentation unit that presents the appropriately converted presentation information to each user. The present technology can be applied to, for example, a robot, a signage device, a car navigation device, and the like.

Generating phonemes of loan words using two converters

A technique for estimating phonemes for a word written in a different language is disclosed. A sequence of graphemes of a given word in a source language is received. The sequence of the graphemes in the source language is converted into a sequence of phonemes in the source language. One or more sequences of phonemes in a target language are generated from the sequence of the phonemes in the source language by using a neural network model. One sequence of phonemes in the target language is determined for the given word. A technique for estimating graphemes of a word from phonemes in a different language is also disclosed.
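
The two-stage flow (graphemes to source phonemes, then source phonemes to target phonemes) can be sketched as below. The sketch replaces the neural network model with a scored lookup table, so it shows only the data flow, not the patented model. The lexicon, phoneme symbols, scores, and function names are all hypothetical.

```python
from itertools import product

# Stage 1 stand-in: a tiny source-language grapheme-to-phoneme lexicon.
SOURCE_G2P = {"bus": ["b", "ah", "s"], "cup": ["k", "ah", "p"]}

# Stage 2 stand-in for the neural model: each source phoneme maps to
# scored target-language phoneme options (values are made up).
P2P = {
    "b": [(("b",), 1.0)],
    "k": [(("k",), 1.0)],
    "ah": [(("a",), 0.8), (("o",), 0.2)],
    "s": [(("s", "u"), 0.9), (("s",), 0.1)],
    "p": [(("p", "u"), 0.9), (("p",), 0.1)],
}


def target_candidates(source_phonemes: list[str]) -> list[tuple[list[str], float]]:
    """Generate every target phoneme sequence with its combined score."""
    results = []
    for combo in product(*(P2P[p] for p in source_phonemes)):
        sequence = [ph for target, _ in combo for ph in target]
        score = 1.0
        for _, s in combo:
            score *= s
        results.append((sequence, score))
    return results


def best_target_phonemes(word: str) -> list[str]:
    """Convert graphemes to source phonemes, then pick the best target sequence."""
    cands = target_candidates(SOURCE_G2P[word])
    return max(cands, key=lambda c: c[1])[0]


print(best_target_phonemes("bus"))
```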
