Patent classifications
G06F40/129
Input information correction method and information terminal
Information is read, which relates to an array of objects for input that have been displayed on a display unit upon input of input information. Whether an input object of the input information that is displayed on the display unit has been touched is determined. When the input object is determined as having been touched, the touched input object is recognized as an object to be corrected. A correction candidate object based on the array of the objects for input is displayed in the vicinity of the object to be corrected. Whether the correction candidate object has been touched is determined. When the correction candidate object is determined as having been touched, the object to be corrected is replaced with the touched correction candidate object.
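As a rough illustration of the flow described above, the following sketch treats the "array of objects for input" as a simplified QWERTY grid and offers the keys adjacent to the touched character as correction candidates. The grid layout, the adjacency rule, and the function names are all assumptions made for this example; the abstract does not specify how candidates are derived from the array.

```python
# Simplified key array (assumption: a plain grid that ignores the row stagger).
ROWS = ["qwertyuiop", "asdfghjkl", "zxcvbnm"]


def correction_candidates(ch):
    """Return the keys adjacent to `ch` in the grid, row by row."""
    for r, row in enumerate(ROWS):
        c = row.find(ch)
        if c == -1:
            continue
        found = []
        for rr in (r - 1, r, r + 1):
            if not 0 <= rr < len(ROWS):
                continue
            for cc in (c - 1, c, c + 1):
                if 0 <= cc < len(ROWS[rr]) and (rr, cc) != (r, c):
                    found.append(ROWS[rr][cc])
        return found
    return []  # character is not part of the key array


def replace_touched(text, index, candidate):
    """Replace the touched character at `index` with the touched candidate."""
    return text[:index] + candidate + text[index + 1:]


typed = "helko"
touched_index = 3                                   # the user touches "k"
candidates = correction_candidates(typed[touched_index])
corrected = replace_touched(typed, touched_index, "l")  # the user touches "l"
```

Here the candidates shown next to the touched "k" include "l", and touching it yields "hello".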
Tibetan Character Constituent Analysis Method, Tibetan Sorting Method And Corresponding Devices
The present invention discloses a Tibetan character constituent analysis method, a Tibetan sorting method and corresponding devices, and relates to the field of natural language processing. The present invention is proposed to solve the problem that existing Tibetan sorting methods lack universality and compatibility, which makes them inconvenient for automatic computer sorting of Tibetan. The technical solution provided by the present invention includes: S10, acquiring a Tibetan text to be analyzed; S20, using the Tibetan characters in the Tibetan text as the input of a preset finite state automaton group; and S30, when a target finite state automaton in the finite state automaton group determines that the Tibetan characters in the Tibetan text are correctly spelled, acquiring the constituents of the Tibetan characters according to that target finite state automaton.
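The idea of steps S10 to S30 can be sketched with a very small automaton group. This is a heavily simplified illustration, not the patent's automata: it classifies code points only by their Unicode ranges, uses generic constituent labels (`prefix`, `stack_base`, `subjoined`, `vowel`, `trailing`), and ignores many real spelling rules. The two automata, their state names, and the function names are assumptions introduced for this example.

```python
# The five letters that can act as a prefix (ga, da, ba, ma, 'a).
PREFIX_LETTERS = frozenset("\u0F42\u0F51\u0F56\u0F58\u0F60")


def char_classes(ch):
    """Map a character to the symbol classes the automata read."""
    cp = ord(ch)
    if 0x0F40 <= cp <= 0x0F6C:                      # base consonants
        return ("P", "C") if ch in PREFIX_LETTERS else ("C",)
    if 0x0F90 <= cp <= 0x0FBC:                      # subjoined consonants
        return ("S",)
    if 0x0F71 <= cp <= 0x0F7D:                      # vowel signs
        return ("V",)
    return ()


# Transitions shared by both automata: (state, class) -> (next state, label).
TAIL = {
    ("stack", "S"): ("sub", "subjoined"),
    ("stack", "V"): ("vowel", "vowel"),
    ("sub", "S"): ("sub", "subjoined"),
    ("sub", "V"): ("vowel", "vowel"),
    ("sub", "C"): ("t1", "trailing"),
    ("vowel", "C"): ("t1", "trailing"),
    ("t1", "C"): ("t2", "trailing"),
}
PLAIN = {
    "name": "plain",
    "accept": {"stack", "sub", "vowel", "t1", "t2"},
    "delta": {("q0", "C"): ("stack", "stack_base"),
              ("stack", "C"): ("t1", "trailing"), **TAIL},
}
PREFIXED = {
    "name": "prefixed",
    "accept": {"sub", "vowel", "t1", "t2"},
    "delta": {("q0", "P"): ("pre", "prefix"),
              ("pre", "C"): ("stack", "stack_base"), **TAIL},
}


def run(automaton, syllable):
    """Return labelled constituents if the automaton accepts, else None."""
    state, labelled = "q0", []
    for ch in syllable:
        for cls in char_classes(ch):
            step = automaton["delta"].get((state, cls))
            if step:
                state, label = step
                labelled.append((ch, label))
                break
        else:
            return None                              # no transition: reject
    return labelled if state in automaton["accept"] else None


def analyze(syllable, group=(PREFIXED, PLAIN)):
    """S20/S30: feed the syllable to each automaton; the first to accept wins."""
    for automaton in group:
        result = run(automaton, syllable)
        if result is not None:
            return automaton["name"], result
    return None                                      # treated as misspelled
```

For example, `analyze("\u0F56\u0F7C\u0F51")` is accepted by the `plain` automaton, while a string that starts with a vowel sign is rejected by every automaton in the group and returns `None`.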
SYSTEM AND METHOD FOR IDENTIFICATION AND CLASSIFICATION OF MULTILINGUAL MESSAGES IN AN ONLINE INTERACTIVE PORTAL
The present disclosure provides a system and method for identification and classification of multilingual messages that would be considered inappropriate in an online interactive portal. The system may include processors that generate a set of data of intended inappropriate multilingual messages to train a classification model. The set of data with labels is classified by assigning unique identifiers. The system includes a pre-processing module that eliminates unwanted characters from the set of data used to train the classification model. The classification model may be trained by a multilingual representation module based at least in part on the set of data with labels. The classification model determines whether the set of data with one or more labels includes intended inappropriate multilingual messages. Furthermore, a feedback loop module is utilised to retrain the classification model recurrently to update the set of data. The system is built on a Convolutional Neural Network (CNN) configured to classify multilingual messages as inappropriate in the online interactive portal.
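The pre-processing and label-identifier steps might look roughly like the sketch below. It assumes that "unwanted characters" means anything that is not a letter, combining mark, digit, or whitespace, and that identifiers are small integers; neither detail is given in the abstract, and the function names are hypothetical. Unicode categories are used instead of an ASCII pattern so that vowel signs in scripts such as Devanagari survive the cleaning.

```python
import unicodedata


def preprocess(message):
    """Drop symbols and punctuation, lowercase, and collapse whitespace."""
    kept = []
    for ch in unicodedata.normalize("NFKC", message):
        category = unicodedata.category(ch)
        if category[0] in "LMN":        # letters, combining marks, numbers
            kept.append(ch.lower())
        elif ch.isspace():
            kept.append(" ")
        # everything else (punctuation, emoji, symbols) is discarded
    return " ".join("".join(kept).split())


def label_ids(labels):
    """Assign a unique integer identifier to each distinct label."""
    return {label: i for i, label in enumerate(sorted(set(labels)))}


cleaned = preprocess("  Hello,   WORLD!! :) ")
ids = label_ids(["inappropriate", "appropriate", "inappropriate"])
```

The cleaned text and the integer labels would then be passed to whatever multilingual representation and CNN classifier the system uses; that part is outside this sketch.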
Language-agnostic Multilingual Modeling Using Effective Script Normalization
A method includes obtaining a plurality of training data sets, each associated with a respective native language and including a plurality of respective training data samples. For each respective training data sample of each training data set in the respective native language, the method includes transliterating the corresponding transcription in the respective native script into corresponding transliterated text representing the respective native language of the corresponding audio in a target script and associating the corresponding transliterated text in the target script with the corresponding audio in the respective native language to generate a respective normalized training data sample. The method also includes training, using the normalized training data samples, a multilingual end-to-end speech recognition model to predict speech recognition results in the target script for corresponding speech utterances spoken in any of the different native languages associated with the plurality of training data sets.
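The normalization step can be pictured with a toy example. The sketch below assumes Latin as the target script and uses a tiny hand-written character table covering only the letters in the sample data; a real system would use a full transliteration model or library. The `Sample` type, the table, and the file names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sample:
    audio: str       # placeholder for the audio (here just a file name)
    text: str        # transcription
    language: str


# Toy table (assumption): only the characters needed by the samples below.
TOY_TABLE = {"д": "d", "а": "a", "н": "n", "е": "e", "т": "t",
             "ν": "n", "α": "a", "ι": "i"}


def transliterate(text, table=TOY_TABLE):
    """Map each native-script character to the target script if known."""
    return "".join(table.get(ch, ch) for ch in text)


def normalize(datasets):
    """Pair every audio with its transliterated transcription."""
    normalized = []
    for language, samples in datasets.items():
        for s in samples:
            normalized.append(Sample(s.audio, transliterate(s.text), language))
    return normalized


datasets = {
    "ru": [Sample("ru_001.wav", "да", "ru"), Sample("ru_002.wav", "нет", "ru")],
    "el": [Sample("el_001.wav", "ναι", "el")],
}
normalized = normalize(datasets)
```

After normalization every transcription is in one script, so a single multilingual model can be trained on the combined samples with one shared output vocabulary.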
Arabic Latinized
Arabic Latinized is the first and only technique to learn Arabic Language that is based on conditioning the mind to convert Arabic to Latin, in reading and writing.
The Arabic alphabet consists of 28 letters and one auxiliary letter called Hamza (table in FIGS. 1 and 2).
“Arabic letters in common”: nineteen letters that are pronounced as in Latin.
“Arabic letters in proper”: nine letters plus the auxiliary Hamza (10 letters in total), also called “guttural letters”, which exist only in Arabic.
The correct pronunciation of the Arabic “guttural letters”, with the anatomical site where each letter is produced, is illustrated in FIG. 3.
“Hamza”, the 29th (auxiliary) letter, is pronounced as a “guttural catch or pause” in the voice, like the sound of the letter “A” in “Apple” (FIG. 3). It can stand alone, written as “ء”, or be added to any of the vowel letters: Alef (#1), either above, written as “أ”, or beneath, written as “إ”; Waw (#27), only above, written as “ؤ”; Ya' (#28), only above, written as “ئ”.
The technique of Arabic letters' conversion to Latin is illustrated in FIGS. 4 and 5.
The invention attends to every detail unique to the Arabic language, especially the symbols of the short vowels (FIG. 6).
Recognizing transliterated words using suffix and/or prefix outputs
A computer-implemented method includes: receiving, by a computing device, an input file defining correct spellings of one or more transliterated words; generating, by the computing device, suffix outputs based on the one or more transliterated words; generating, by the computing device, a dictionary that maps the suffix outputs to the one or more transliterated words; recognizing, by the computing device, an alternatively spelled transliterated word included in a document as one of the one or more correctly spelled transliterated words using the dictionary; and outputting, by the computing device, information corresponding to the recognized transliterated word.
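One possible reading of this method is sketched below. The abstract does not define a "suffix output", so the sketch assumes it is simply a trailing substring of the lowercased word with a minimum length of three characters; the function names and the sample names are illustrative only.

```python
def suffixes(word, min_len=3):
    """All trailing substrings of `word`, longest first."""
    w = word.lower()
    return [w[i:] for i in range(len(w) - min_len + 1)]


def build_suffix_dictionary(correct_words, min_len=3):
    """Map every suffix output to the correctly spelled words that produce it."""
    dictionary = {}
    for word in correct_words:
        for s in suffixes(word, min_len):
            dictionary.setdefault(s, set()).add(word)
    return dictionary


def recognize(candidate, dictionary, min_len=3):
    """Resolve an alternative spelling through its longest known suffix."""
    for s in suffixes(candidate, min_len):
        matches = dictionary.get(s)
        if matches:
            # Only accept an unambiguous match.
            return next(iter(matches)) if len(matches) == 1 else None
    return None


dictionary = build_suffix_dictionary(["Mohammed", "Hussein"])
```

With this dictionary, `recognize("Muhammed", dictionary)` resolves through the shared suffix "hammed", and `recognize("Husein", dictionary)` resolves through "sein". A word with no suffix in the dictionary returns `None`.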
Method and device for sorting Chinese characters, searching Chinese characters and constructing dictionary
The invention discloses a method and a device for sorting Chinese characters, searching for Chinese characters and constructing a dictionary, and relates to the technical field of computers. A specific implementation of the method includes: obtaining the first basic character-forming component of a Chinese character according to the stroke order as the First Character, and encoding the First Character to obtain the First Character code, where the First Character includes the first character-forming component and the first main stroke component of a Chinese character; obtaining the number of strokes included in each Chinese character, and obtaining the corresponding stroke string of each Chinese character; using the First Character code as the first and highest priority sorting field, the number of strokes as the second sorting field, and the stroke string as the third and the lowest priority sorting field to sort Chinese characters. This embodiment can solve the problem of difficulty in sorting and searching of Chinese characters caused by the unfixed definition and position of radicals.
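The three-level ordering can be expressed as a tuple sort key. In the sketch below the stroke counts and stroke strings use the common five-class stroke numbering (1 horizontal, 2 vertical, 3 left-falling, 4 right-falling or dot, 5 turning), while the First Character codes are made-up placeholders, since the abstract does not give the actual encoding.

```python
from typing import NamedTuple


class CharEntry(NamedTuple):
    char: str
    first_code: int      # placeholder code for the "First Character" (assumption)
    stroke_count: int
    stroke_string: str   # strokes in writing order, classes 1 to 5


def sort_key(entry):
    """Highest priority first: First Character code, stroke count, stroke string."""
    return (entry.first_code, entry.stroke_count, entry.stroke_string)


ENTRIES = [
    CharEntry("大", 1, 3, "134"),
    CharEntry("人", 3, 2, "34"),
    CharEntry("十", 1, 2, "12"),
    CharEntry("二", 1, 2, "11"),
    CharEntry("一", 1, 1, "1"),
]

sorted_chars = [e.char for e in sorted(ENTRIES, key=sort_key)]
```

Because Python compares tuples element by element, the stroke string only decides the order between characters that share both the First Character code and the stroke count, as with 二 and 十 here.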