Patent classifications
G06F40/49
METHOD FOR IDENTIFYING NOISE SAMPLES, ELECTRONIC DEVICE, AND STORAGE MEDIUM
The method for identifying noise samples includes: obtaining an original sample set; obtaining a target sample set by adding masks to original training corpora in the original sample set using a preset adjustment rule; performing mask prediction on a plurality of target training corpora in the target sample set using a pre-trained language model to obtain a first mask prediction character corresponding to each target training corpus; matching the first mask prediction character corresponding to each target training corpus with a preset condition; and according to target training corpora of which the first mask prediction characters do not match the preset condition in the target sample set, determining the corresponding original training corpora in the original sample set as noise samples.
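The claimed pipeline can be sketched as follows. This is a minimal, self-contained illustration: the real method uses a pre-trained masked language model, whereas `toy_mask_predict`, the mask template in `add_mask`, and the agreement-with-annotation condition are all hypothetical stand-ins chosen so the control flow runs end to end.

```python
MASK = "[MASK]"

def add_mask(corpus):
    # Preset adjustment rule (assumed): append a label slot holding the mask.
    return f"{corpus} The label is {MASK}."

def toy_mask_predict(masked_corpus):
    # Stand-in for the pre-trained language model's mask prediction.
    return "positive" if "good" in masked_corpus else "negative"

def identify_noise(original_samples):
    """original_samples: list of (corpus, annotated_label) pairs."""
    noise = []
    for corpus, label in original_samples:
        target = add_mask(corpus)              # target training corpus
        predicted = toy_mask_predict(target)   # first mask prediction character
        if predicted != label:                 # preset condition (assumed): match the annotation
            noise.append((corpus, label))      # flag the original corpus as a noise sample
    return noise

samples = [
    ("This movie is good", "positive"),
    ("This movie is bad", "negative"),
    ("This movie is good", "negative"),  # mislabeled, so it should be flagged
]
print(identify_noise(samples))  # -> [('This movie is good', 'negative')]
```

The sample whose annotation disagrees with the model's mask prediction is the one returned as noise.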
Method and apparatus for training models in machine translation, electronic device and storage medium
A method and apparatus for training models in machine translation, an electronic device and a storage medium are disclosed, which relate to the fields of natural language processing technologies and deep learning technologies. An implementation includes mining similar target sentences of a group of samples based on a parallel corpus using a machine translation model and a semantic similarity model, and creating a first training sample set; training the machine translation model with the first training sample set; mining a negative sample of each sample in the group of samples based on the parallel corpus using the machine translation model and the semantic similarity model, and creating a second training sample set; and training the semantic similarity model with the second training sample set.
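The two mining steps above can be sketched with stub models. Everything here is a stand-in: `similarity` is a word-overlap ratio rather than a trained semantic similarity model, and the candidate pool plays the role of translations produced by the machine translation model.

```python
def similarity(a, b):
    # Toy stand-in for the semantic similarity model: Jaccard word overlap.
    wa, wb = set(a.split()), set(b.split())
    return len(wa & wb) / len(wa | wb)

def mine_similar_targets(samples, candidates, threshold=0.5):
    # Mine candidate targets scored close to each reference target,
    # yielding the first training sample set for the translation model.
    first_set = []
    for src, tgt in samples:
        for cand in candidates:
            if cand != tgt and similarity(tgt, cand) >= threshold:
                first_set.append((src, cand))
    return first_set

def mine_negatives(samples, candidates):
    # Mine the lowest-scoring candidate as a negative sample,
    # yielding the second training sample set for the similarity model.
    second_set = []
    for src, tgt in samples:
        neg = min((c for c in candidates if c != tgt),
                  key=lambda c: similarity(tgt, c))
        second_set.append((src, tgt, neg))
    return second_set

parallel = [("bonjour le monde", "hello world")]
cands = ["hello there world", "goodbye"]
print(mine_similar_targets(parallel, cands))  # -> [('bonjour le monde', 'hello there world')]
print(mine_negatives(parallel, cands))        # -> [('bonjour le monde', 'hello world', 'goodbye')]
```

In the disclosed scheme the two sets are then used to train the two models in alternation, each model improving the mining done for the other.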
UNIVERSAL DATA LANGUAGE TRANSLATOR
The present disclosure is directed to a universal data language (UDL) translator. Specifically, the systems and methods disclosed enable input data from a variety of sources to be translated into a UDL that can be consistently analyzed and compared against other sources of data. For example, an entity may upload input data that has a plurality of data terms and definitions (e.g., header column in a spreadsheet). These terms may be duplicative and/or inaccurate with respect to the underlying data. If the entity wishes to compare and transact data within a data marketplace, the entity may not fully comprehend what data it is missing and/or what data another entity may have to offer for trade. To remedy this problem of business semantic management, the present invention discloses steps for creating a UDL and a UDL translator so that any input data can be translated to UDL.
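A minimal sketch of the translation idea, assuming a lookup-table mapping from heterogeneous column headers to canonical UDL terms. The `UDL_MAP` entries and dotted naming scheme are invented for illustration; the disclosure's actual steps for building the translator are not reproduced here.

```python
# Hypothetical mapping of duplicative/inconsistent data terms to canonical UDL terms.
UDL_MAP = {
    "cust_name": "customer.name",
    "customer": "customer.name",
    "zip": "address.postal_code",
    "postal code": "address.postal_code",
}

def to_udl(headers):
    # Translate input data terms (e.g. spreadsheet header columns) into UDL,
    # flagging any terms the marketplace cannot yet compare.
    translated, unknown = [], []
    for h in headers:
        key = h.strip().lower()
        if key in UDL_MAP:
            translated.append(UDL_MAP[key])
        else:
            unknown.append(h)
    return translated, unknown

print(to_udl(["Cust_Name", "ZIP", "favorite_color"]))
# -> (['customer.name', 'address.postal_code'], ['favorite_color'])
```

Once all parties' terms resolve to the same UDL vocabulary, data sets from different entities become directly comparable, which is what enables gap analysis and trading in the marketplace.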
METHODS AND SYSTEMS FOR SPEECH-TO-SPEECH TRANSLATION
There is provided a method of speech-to-speech translation including receiving at a mobile device input speech data associated with speech in a first language and converting the input speech data into input text data using a speech-to-text conversion engine (STT engine) onboard the mobile device. The method also includes translating the input text data to form translated text data using a text-to-text translation engine (TTT engine) onboard the mobile device. The translated text data is associated with a second language. In addition, the method includes converting the translated text data into output speech data using a text-to-speech conversion engine (TTS engine) onboard the mobile device, and outputting at the mobile device a device output based on the output speech data. Mobile devices and computer-readable storage media for speech-to-speech translation are also provided.
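The on-device pipeline is a chain of three engines, STT, TTT, and TTS. In this sketch each engine is a hypothetical stub (a transcript lookup, a toy dictionary, a dict wrapper) standing in for the onboard engines named in the abstract, so only the data flow is illustrative.

```python
def stt_engine(speech_data):
    # Speech-to-text: input speech data -> input text data (first language).
    return speech_data["transcript"]

def ttt_engine(text):
    # Text-to-text translation: toy English -> French dictionary.
    toy_dict = {"hello": "bonjour", "world": "monde"}
    return " ".join(toy_dict.get(w, w) for w in text.split())

def tts_engine(text):
    # Text-to-speech: translated text data -> output speech data.
    return {"audio_for": text}

def speech_to_speech(speech_data):
    text = stt_engine(speech_data)        # STT engine, onboard
    translated = ttt_engine(text)         # TTT engine, onboard
    return tts_engine(translated)         # TTS engine, onboard

print(speech_to_speech({"transcript": "hello world"}))
# -> {'audio_for': 'bonjour monde'}
```

Keeping all three engines onboard, as claimed, means the chain runs without a network round trip at any stage.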
System and method for translating text
The subject matter discloses a method for translating text in an image, comprising extracting at least a portion of the text in a source language from the image, identifying one or more bounding boxes containing the text in the image, translating at least a portion of the text in the source language to a destination language, and generating a new image containing the text in the destination language placed in the bounding boxes of the associated words in the source language.
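The flow can be sketched as OCR regions in, translated regions out, keyed by bounding box. The OCR step and the toy dictionary below are stand-ins (a real system would use an OCR engine and a translation model); the point is that each translation stays paired with the box of its source words.

```python
def extract_text_regions(image):
    # Stand-in for OCR: returns (bounding_box, source_text) pairs.
    return image["regions"]

def translate(text):
    # Toy English -> Spanish dictionary in place of a real translation engine.
    toy_dict = {"exit": "salida", "open": "abierto"}
    return toy_dict.get(text.lower(), text)

def render_translated(image):
    # Generate the new image's content: destination-language text placed
    # in the bounding boxes of the associated source-language words.
    regions = extract_text_regions(image)
    return [(box, translate(text)) for box, text in regions]

sign = {"regions": [((10, 10, 60, 30), "Exit"), ((10, 40, 60, 60), "Open")]}
print(render_translated(sign))
# -> [((10, 10, 60, 30), 'salida'), ((10, 40, 60, 60), 'abierto')]
```

Reusing the source bounding boxes is what lets the generated image preserve the original layout.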