Patent classifications: G06F40/129
SAMPLE GENERATION METHOD, MODEL TRAINING METHOD, TRAJECTORY RECOGNITION METHOD, DEVICE, AND MEDIUM
Disclosed are a sample generation method, a model training method, a trajectory recognition method, a device, and a medium. The method comprises: determining a code result of a training Chinese character according to a preset code library, where the preset code library is generated based on code characters in a five-stroke code corpus; taking the code result as a training label of the training Chinese character; and generating a training sample according to both a writing trajectory and the training label of the training Chinese character. This enriches the amount of information carried in the training sample.
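A minimal sketch of the sample-generation steps, assuming the code library simply assigns an integer index to every code character found in the corpus and that a trajectory is a list of (x, y) pen points. `CORPUS`, `build_code_library`, `code_result`, `TrainingSample`, and `make_sample` are hypothetical names, and the corpus codes are placeholders rather than verified five-stroke codes.

```python
from dataclasses import dataclass

# Hypothetical corpus entries: (Chinese character, five-stroke code).
# The codes are placeholders, not verified five-stroke codes.
CORPUS = [("中", "kh"), ("国", "lg")]


def build_code_library(corpus):
    """Collect every code character in the corpus and give it an integer index."""
    code_chars = sorted({c for _, code in corpus for c in code})
    return {c: i for i, c in enumerate(code_chars)}


def code_result(char, corpus, library):
    """Look up the character's code and map each code character to its index."""
    codes = dict(corpus)
    return [library[c] for c in codes[char]]


@dataclass
class TrainingSample:
    trajectory: list  # writing trajectory as a list of (x, y) pen points
    label: list       # code result used as the training label


def make_sample(char, trajectory, corpus, library):
    """Pair the writing trajectory with the character's code result."""
    return TrainingSample(trajectory=trajectory, label=code_result(char, corpus, library))


library = build_code_library(CORPUS)
sample = make_sample("中", [(0, 0), (1, 0)], CORPUS, library)
```

Compared with a single class ID per character, the label here is a sequence over a small code alphabet, which is one plausible reading of how the code result "enriches" the sample.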
Tibetan Spelling Check Method And Device Based On Automata
The present invention discloses a Tibetan spelling check method and device based on automata, and relates to the field of natural language processing. The invention addresses a problem in the prior art: because the applicable range is relatively narrow, some Tibetan characters with special structures cannot be recognized. The technical solution provided by the embodiments of the present invention includes: S10, segmenting a Tibetan text to be checked with a character as the unit to acquire at least one Tibetan character; S20, using the at least one Tibetan character as the input of a preset finite state automaton group; and S30, judging, by means of the finite state automaton group, whether the Tibetan text to be checked is correctly spelled.
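Steps S10 to S30 can be sketched as follows, under loudly simplified assumptions: units are split on the tsheg (U+0F0B) and shad (U+0F0D), and the "automaton group" is two toy deterministic automata (root consonant, optional vowel sign, optional suffix, optional post-suffix). Real Tibetan syllables also have prefixes, superscripts, and subjoined letters, which these hypothetical automata (`BASIC`, `WITH_POST`) reject; the patent's actual automata are not given in the abstract.

```python
from dataclasses import dataclass

TSHEG = "\u0f0b"  # Tibetan intersyllabic delimiter
SHAD = "\u0f0d"   # Tibetan clause delimiter
# Consonant letters U+0F40..U+0F68 (U+0F48 is unassigned).
ROOTS = frozenset(chr(c) for c in range(0x0F40, 0x0F69) if c != 0x0F48)
VOWELS = frozenset("\u0f72\u0f74\u0f7a\u0f7c")  # i, u, e, o vowel signs
SUFFIXES = frozenset("\u0f42\u0f44\u0f51\u0f53\u0f56\u0f58\u0f60\u0f62\u0f63\u0f66")
POST_SUFFIXES = frozenset("\u0f51\u0f66")


@dataclass
class DFA:
    transitions: dict  # state -> list of (allowed characters, next state)
    accepting: frozenset

    def accepts(self, unit: str) -> bool:
        state = 0
        for ch in unit:
            for allowed, nxt in self.transitions.get(state, []):
                if ch in allowed:
                    state = nxt
                    break
            else:
                return False  # no transition for this character: reject
        return state in self.accepting


# Hypothetical toy automaton group (S20).
_BASE = {0: [(ROOTS, 1)], 1: [(VOWELS, 2), (SUFFIXES, 3)], 2: [(SUFFIXES, 3)]}
BASIC = DFA(_BASE, frozenset({1, 2, 3}))
WITH_POST = DFA({**_BASE, 3: [(POST_SUFFIXES, 4)]}, frozenset({4}))
GROUP = [BASIC, WITH_POST]


def segment(text):
    """S10: split the text into units on tsheg and shad."""
    parts = text.replace(SHAD, TSHEG).split(TSHEG)
    return [p.strip() for p in parts if p.strip()]


def check_spelling(text):
    """S30: correct only if every unit is accepted by some automaton in the group."""
    units = segment(text)
    return bool(units) and all(any(a.accepts(u) for a in GROUP) for u in units)
```

Keeping several small automata in a group, rather than one large one, makes it easy to add an automaton for a special structure without touching the others.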
METHOD AND SYSTEM FOR TRANSCRIPTION OF A LEXICAL UNIT FROM A FIRST ALPHABET INTO A SECOND ALPHABET
A server and a method for transcription of a lexical unit from a first alphabet into a second alphabet, the method comprising: acquiring a pair of (i) the lexical unit written in the first alphabet, and (ii) the corresponding transcription of the lexical unit written in the second alphabet, both having been divided into respective segments, such that within the pair, every segment of the lexical unit has a corresponding segment in the transcription of the lexical unit, and such that each lexical unit comprises either a sequence of sequentially alternating consonant and vowel segments, or a single vowel segment, or a single consonant segment; defining, for each given segment of the lexical unit, its context; and training the server to calculate a theoretical frequency of at least one second-alphabet character representing the transcription of a particular given segment, based on the context of that segment of the lexical unit.
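A minimal sketch of the training step, assuming the context of a segment is just its neighbouring segments (with `^` and `$` marking word boundaries) and that the "theoretical frequency" is estimated as a plain relative frequency over aligned training pairs. The context definition, the estimator, and the names `contexts` and `train` are assumptions; the abstract does not specify them.

```python
from collections import Counter, defaultdict


def contexts(segments):
    """Context of each segment: (previous segment, next segment), '^'/'$' at the edges."""
    padded = ["^"] + list(segments) + ["$"]
    return [(padded[i], padded[i + 2]) for i in range(len(segments))]


def train(pairs):
    """Estimate P(target segment | source segment, context) by relative frequency."""
    counts = defaultdict(Counter)
    for source, target in pairs:
        if len(source) != len(target):
            raise ValueError("every source segment needs exactly one target segment")
        for seg, ctx, tgt in zip(source, contexts(source), target):
            counts[(seg, ctx)][tgt] += 1
    return {
        key: {tgt: n / sum(c.values()) for tgt, n in c.items()}
        for key, c in counts.items()
    }


# Two aligned transcriptions of the same Cyrillic word (toy data).
pairs = [
    (["м", "а", "ш", "а"], ["m", "a", "sh", "a"]),
    (["м", "а", "ш", "а"], ["m", "a", "sch", "a"]),
]
model = train(pairs)
```

Here `model[("ш", ("а", "а"))]` holds the split between the two competing transcriptions of the same segment in the same context.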
Cross Lingual Search using Multi-Language Ontology for Text Based Communication
A method for conducting cross-lingual searches that uses an ontology reference process to ensure thoroughness. When a query is entered, an ontology database is accessed to identify all representations of the parent entity of interest in the specified languages. These representations are used to form a search set, which yields a more thorough collection of results from the data sources. The disclosed method thus accommodates situations where languages do not follow the same construct (e.g., English compared to Chinese) and where direct translation does not adequately represent the intent of the user's inquiry.
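The query-expansion idea can be sketched with a toy in-memory ontology that maps an entity ID to its representations per language. `ONTOLOGY`, `search_set`, and `search` are hypothetical names, and substring matching stands in for whatever retrieval the data sources actually use.

```python
# Hypothetical toy ontology: entity ID -> {language: [representations]}.
ONTOLOGY = {
    "city:beijing": {"en": ["Beijing", "Peking"], "zh": ["北京"]},
}


def build_alias_index(ontology):
    """Map every known representation (case-folded) back to its parent entity."""
    index = {}
    for entity, reps in ontology.items():
        for terms in reps.values():
            for term in terms:
                index[term.casefold()] = entity
    return index


def search_set(query, ontology, languages):
    """Collect all representations of the query's parent entity in the given languages."""
    entity = build_alias_index(ontology).get(query.casefold())
    if entity is None:
        return [query]  # no ontology entry: fall back to the literal query
    reps = ontology[entity]
    return [term for lang in languages for term in reps.get(lang, [])]


def search(query, documents, ontology, languages):
    """Return documents that contain any term from the expanded search set."""
    terms = [t.casefold() for t in search_set(query, ontology, languages)]
    return [d for d in documents if any(t in d.casefold() for t in terms)]


docs = ["Flights to Peking", "北京天气", "Shanghai news"]
hits = search("Beijing", docs, ONTOLOGY, ["en", "zh"])
```

A literal search for "Beijing" matches none of the three documents, while the expanded search set finds both the English and the Chinese one.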
METHOD AND APPARATUS FOR IDENTIFYING SIMILAR DATA ELEMENTS USING STRING MATCHING
Disclosed is a method and apparatus for identifying record elements similar to a query, the method including: receiving a query; determining an index for the query; generating candidate records from a reference list to match the query based on the index and applying any one or any combination of a q-gram filter, a length filter, and a shared character count (SCC) filter; determining a similarity score for each of the candidate records; identifying records from among the candidate records having a similarity score greater than or equal to a threshold; selecting data records similar to the query by sorting the identified records according to their respective similarity scores; and outputting one or more of the selected data records.
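The pipeline above can be sketched as follows, assuming a bigram (q = 2) inverted index, Jaccard similarity over q-gram sets as the score, and all three filters applied together (the abstract allows any combination). The filter thresholds and all function names are illustrative assumptions; in practice the index would be built once rather than per query.

```python
from collections import Counter, defaultdict


def qgrams(s, q=2):
    """Set of overlapping q-grams; strings shorter than q yield themselves."""
    return {s[i:i + q] for i in range(len(s) - q + 1)} or {s}


def build_index(records, q=2):
    """Inverted index: q-gram -> IDs of records containing it."""
    index = defaultdict(set)
    for rid, rec in enumerate(records):
        for g in qgrams(rec, q):
            index[g].add(rid)
    return index


def shared_char_count(a, b):
    """Number of characters the two strings share, counting multiplicity."""
    return sum((Counter(a) & Counter(b)).values())


def jaccard(a, b, q=2):
    ga, gb = qgrams(a, q), qgrams(b, q)
    return len(ga & gb) / len(ga | gb)


def find_similar(query, records, threshold=0.5, q=2, min_shared_qgrams=1,
                 max_len_diff=2, min_scc_ratio=0.5):
    index = build_index(records, q)
    # Candidate generation: count shared q-grams per record via the index.
    shared = Counter(rid for g in qgrams(query, q) for rid in index.get(g, ()))
    results = []
    for rid, n_shared in shared.items():
        rec = records[rid]
        if n_shared < min_shared_qgrams:                                 # q-gram filter
            continue
        if abs(len(rec) - len(query)) > max_len_diff:                    # length filter
            continue
        if shared_char_count(query, rec) < min_scc_ratio * len(query):  # SCC filter
            continue
        score = jaccard(query, rec, q)
        if score >= threshold:
            results.append((rec, score))
    results.sort(key=lambda r: r[1], reverse=True)  # most similar first
    return results


matches = find_similar("apple", ["apple", "apply", "ample", "banana"])
```

The cheap filters run before the similarity score so that only plausible candidates pay for the full comparison; "banana" never becomes a candidate, and "ample" passes the filters but falls below the threshold.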
DYNAMIC PHRASE EXPANSION OF LANGUAGE INPUT
The present disclosure generally relates to dynamic phrase expansion for language input. In one example process, a user input comprising text of a first symbolic system is received. The process determines, based on the text, a plurality of sets of one or more candidate words of a second symbolic system. The process determines, based on at least a portion of the plurality of sets of one or more candidate words, a plurality of expanded candidate phrases. Each expanded candidate phrase comprises at least one word of a respective set of one or more candidate words of the plurality of sets of one or more candidate words and one or more predicted words based on the at least one word of the respective set of one or more candidate words. One or more expanded candidate phrases of the plurality of expanded candidate phrases are displayed for user selection.
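A minimal sketch of phrase expansion, assuming pinyin syllables as the first symbolic system, Chinese words as the second, one candidate set per space-separated syllable, and a toy "most likely next word" table as the predictor. `LEXICON`, `NEXT_WORD`, and the function names are hypothetical; a real system would use a language model and proper input segmentation.

```python
# Hypothetical toy data: syllable -> candidate words, and word -> most likely next word.
LEXICON = {"ni": ["你", "泥"], "xie": ["谢", "写"]}
NEXT_WORD = {"你": "好", "好": "吗", "写": "作"}


def candidate_sets(text):
    """One set of candidate words (second symbolic system) per input syllable."""
    return [LEXICON.get(syllable, []) for syllable in text.split()]


def expand(word, max_predicted=2):
    """Append up to max_predicted predicted words after the candidate word."""
    phrase = [word]
    while len(phrase) - 1 < max_predicted and phrase[-1] in NEXT_WORD:
        phrase.append(NEXT_WORD[phrase[-1]])
    return phrase


def expanded_candidate_phrases(text):
    phrases = []
    for candidates in candidate_sets(text):
        for word in candidates:
            phrase = expand(word)
            if len(phrase) > 1:  # keep only phrases with at least one predicted word
                phrases.append("".join(phrase))
    return phrases


shown = expanded_candidate_phrases("ni xie")
```

The `max_predicted` cap keeps expansions short and also guarantees termination if the prediction table ever contains a cycle.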
Predictive conversion of language input
Systems and processes for predictive conversion of language input are provided. In one example process, text composed by a user can be obtained. Input comprising a sequence of symbols of a first symbolic system can be received from the user. Candidate word strings corresponding to the sequence of symbols can be determined. Each candidate word string can comprise two or more words of a second symbolic system. The candidate word strings can be ranked based on a probability of occurrence of each candidate word string in the obtained text. Based on the ranking, a portion of the candidate word strings can be displayed for selection by the user.
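The ranking step can be sketched by estimating each candidate's probability as its share of occurrences in the user's previously composed text. Counting with `str.count` and normalising over the candidates is an assumption, as are the name `rank_candidates` and the toy candidates for one pinyin input.

```python
def rank_candidates(candidates, user_text, top_k=2):
    """Rank candidate word strings by how often each occurs in the user's own text."""
    counts = {c: user_text.count(c) for c in candidates}
    total = sum(counts.values())
    probs = {c: (counts[c] / total if total else 0.0) for c in candidates}
    # sorted() is stable, so ties keep the converter's original order.
    ranked = sorted(candidates, key=lambda c: probs[c], reverse=True)
    return ranked[:top_k], probs


user_text = "我在北京大学上课。北京大学很大。"
top, probs = rank_candidates(["背景大学", "北京大学"], user_text)
```

Because the statistics come from the user's own text, the same pinyin input can rank differently for different users.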
Systems and methods for dynamically providing fonts based on language settings
A server dynamically provides fonts to a user device. The user device is provided with access to a document via a network. An update to a language parameter associated with the document is detected. Fonts associated with the update to the language parameter are determined. It is determined that at least one of the fonts is not available on the user device, and that unavailable font (or fonts) is provided to the user device.
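The server-side decision can be sketched as a lookup from the document's language parameter to the fonts it requires, minus the fonts already installed on the device. `LANGUAGE_FONTS` and `fonts_to_provide` are hypothetical, and the font names are only examples.

```python
# Hypothetical mapping from a document language parameter to the fonts it needs.
LANGUAGE_FONTS = {
    "ja": ["Noto Sans JP"],
    "zh-Hans": ["Noto Sans SC", "Noto Serif SC"],
}


def fonts_to_provide(old_lang, new_lang, device_fonts):
    """Return the fonts the server should send after a language-parameter update."""
    if new_lang == old_lang:
        return []  # no update to the language parameter detected
    required = LANGUAGE_FONTS.get(new_lang, [])
    return [font for font in required if font not in device_fonts]


missing = fonts_to_provide("en", "zh-Hans", {"Noto Serif SC", "Arial"})
```

Sending only the missing fonts keeps the transfer small when the device already has some of what the new language needs.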
Performing a code conversion in a smaller target encoding space
Embodiments relate to a system, method and program product for performing code conversions. In one embodiment, the method includes determining the size of the encoding space for a source file and a target file upon receipt of a code conversion request, and generating a main conversion file upon determining that the target encoding space associated with the target file is smaller than the source encoding space associated with the source file. An extension conversion file is then generated from the source file according to a pre-established code conversion mapping table stored in memory. The code conversion request is completed by using the main conversion file and the extension file together, so that the source file does not need to be truncated to fit into the target encoding space.
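A minimal sketch of the main-file/extension-file split, assuming Latin-1 as the smaller single-byte target encoding, a `?` placeholder in the main file, and an extension file that records (byte offset, extension code) pairs from a toy mapping table. `ENCODING_SPACE`, `MAPPING_TABLE`, `convert`, and `restore` are hypothetical; `restore` relies on the single-byte assumption that a byte offset equals a character index.

```python
# Hypothetical encoding-space sizes (number of representable code points).
ENCODING_SPACE = {"utf-8": 0x110000, "latin-1": 0x100}
# Hypothetical pre-established mapping table: characters outside the target space -> codes.
MAPPING_TABLE = {"€": 1, "≈": 2}


def convert(text, source="utf-8", target="latin-1", table=MAPPING_TABLE):
    """Return (main file bytes, extension entries) for a single-byte target encoding."""
    if ENCODING_SPACE[target] >= ENCODING_SPACE[source]:
        return text.encode(target), []  # target is large enough: no extension file
    main = bytearray()
    extension = []  # (byte offset in main file, code from the mapping table)
    for ch in text:
        try:
            main += ch.encode(target)
        except UnicodeEncodeError:
            extension.append((len(main), table[ch]))  # KeyError if the table lacks ch
            main += b"?"  # placeholder keeps later offsets aligned
    return bytes(main), extension


def restore(main, extension, target="latin-1", table=MAPPING_TABLE):
    """Combine the main and extension files to recover the original text."""
    reverse = {code: ch for ch, code in table.items()}
    chars = list(main.decode(target))  # single-byte target: byte offset == char index
    for offset, code in extension:
        chars[offset] = reverse[code]
    return "".join(chars)


text = "price: 5€ ≈ 5$"
main, ext = convert(text)
```

Nothing is dropped: characters the target cannot represent live in the extension file, and using the two files together reproduces the source text.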