G06F40/279

TRANSFER LEARNING AND PREDICTION CONSISTENCY FOR DETECTING OFFENSIVE SPANS OF TEXT
20230016729 · 2023-01-19 ·

Systems and methods for natural language processing are described. One or more embodiments of the present disclosure receive a span of text comprising an offensive span and a non-offensive span, generate a contextualized word embedding for each of a plurality of words of the span of text, generate a refined vector representation for each of the plurality of words based on the corresponding contextualized word embedding using a refinement network trained for offensive text recognition, generate label information for each of the plurality of words based on the corresponding refined vector representation, wherein the label information indicates whether each of the plurality of words includes offensive text, and transmit an indication of a location of the offensive span based on the label information.
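The last step of this abstract, turning per-word label information into "an indication of a location of the offensive span", can be sketched as follows. This sketch assumes whitespace tokenization and binary labels (1 = offensive, 0 = not). The embedding and refinement networks are not shown, and the function name `offensive_spans` is hypothetical.

```python
import re
from typing import List, Tuple


def offensive_spans(text: str, labels: List[int]) -> List[Tuple[int, int]]:
    """Merge consecutive words labelled 1 into (start, end) character offsets.

    Assumption: one label per whitespace-delimited word, 1 meaning offensive.
    """
    words = list(re.finditer(r"\S+", text))
    if len(words) != len(labels):
        raise ValueError("expected exactly one label per word")
    spans: List[Tuple[int, int]] = []
    current = None  # [start, end] of the offensive run being built
    for match, label in zip(words, labels):
        if label == 1:
            if current is None:
                current = [match.start(), match.end()]
            else:
                current[1] = match.end()  # extend the run over this word
        elif current is not None:
            spans.append((current[0], current[1]))
            current = None
    if current is not None:
        spans.append((current[0], current[1]))
    return spans
```

Reporting character offsets, rather than word indices, lets a caller highlight the span in the original text regardless of how it was tokenized.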

LOG COMPRESSION AND OBFUSCATION USING EMBEDDINGS
20230017165 · 2023-01-19 ·

In some implementations, a device may train a model to generate embeddings for log files associated with an application, and to enable the model to generate embeddings for sensitive information included in a set of training log files. The device may receive a log file associated with the application. The device may generate a compressed log file including a set of embedding vectors associated with records included in the log file, where a record that includes sensitive information is associated with one or more embedding vectors for the sensitive information and one or more embedding vectors for other information included in the record. The device may store the compressed log file including the set of embedding vectors, where a size of the compressed log file is less than a size of the log file, and the embedding vectors obfuscate the records included in the log file.
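A toy sketch of the storage layout follows. It assumes that a deterministic hash-based vector stands in for the trained embedding model and that e-mail-like tokens are "sensitive". Each distinct token is stored once as a vector, and each record becomes a short array of 16-bit vector indices, so the raw text (including the sensitive values) never appears in the output. All names (`toy_embedding`, `compress`, `serialized_size`) are hypothetical, and decompression is not shown.

```python
import hashlib
import re
import struct
from array import array

# Hypothetical rule: tokens that look like e-mail addresses are "sensitive".
SENSITIVE = re.compile(r"[^@\s]+@[^@\s]+")
DIM = 4  # toy embedding size; a trained model would use many more dimensions


def toy_embedding(key: str) -> tuple:
    """Stand-in for a trained embedding model: a deterministic hash-based vector."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return tuple(b / 255.0 for b in digest[:DIM])


def compress(records):
    """Return (vector table, per-record index arrays); the raw text is not kept."""
    index = {}    # token key -> row in the vector table (discarded after encoding)
    table = []    # embedding vectors, one per distinct token
    encoded = []  # one array of unsigned 16-bit indices per record
    for record in records:
        ids = array("H")
        for token in record.split():
            # Sensitive values are embedded under their own namespace.
            prefix = "sensitive:" if SENSITIVE.fullmatch(token) else "text:"
            key = prefix + token
            if key not in index:
                index[key] = len(table)
                table.append(toy_embedding(key))
            ids.append(index[key])
        encoded.append(ids)
    return table, encoded


def serialized_size(table, encoded) -> int:
    """Bytes needed to store the vectors as float32 plus the index arrays."""
    vector_bytes = sum(len(struct.pack(f"{DIM}f", *v)) for v in table)
    index_bytes = sum(len(ids) * ids.itemsize for ids in encoded)
    return vector_bytes + index_bytes
```

The saving comes from repetition. Highly repetitive logs share a small vector table, whereas logs with many unique tokens would not necessarily shrink under this layout.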

System and method for processing an active document from a rich text document

The present invention relates to a system for converting a rich text document into an active document suitable for consumption on a particular device, system, or operating system by a human or a machine. The system comprises a server including a non-transitory non-volatile storage medium. The non-transitory non-volatile storage medium is adapted to store at least rich text documents and active documents that have been converted from a rich text document format into an active document format. The server is adapted to carry out the steps of: scanning and parsing the rich text document to extract structural elements and contents; scanning and parsing the rich text document to extract embedded metadata; connecting the structural elements and contents with the extracted metadata and the rich text document to form a render data set; and sending the render data set to a configurable render module that outputs at least one active document.
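The scan, connect, and render steps might look roughly like this minimal sketch. It assumes HTML as the rich text format and Markdown or plain text as the target "active" formats. It also folds the two scans (structure and metadata) into a single pass with Python's standard `html.parser`. The names `build_render_data` and `render` are hypothetical.

```python
from html.parser import HTMLParser


class _Scanner(HTMLParser):
    """Collects structural elements and <meta name=... content=...> metadata."""

    STRUCTURAL = {"h1", "h2", "h3", "p", "li"}

    def __init__(self):
        super().__init__()
        self.elements = []  # [tag, text] pairs in document order
        self.metadata = {}
        self._open = None   # structural tag currently collecting text

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and "name" in attrs:
            self.metadata[attrs["name"]] = attrs.get("content", "")
        elif tag in self.STRUCTURAL:
            self._open = tag
            self.elements.append([tag, ""])

    def handle_endtag(self, tag):
        if tag == self._open:
            self._open = None

    def handle_data(self, data):
        if self._open is not None:
            self.elements[-1][1] += data


def build_render_data(rich_text: str) -> dict:
    """Connect structure, contents and metadata with the source into one data set."""
    scanner = _Scanner()
    scanner.feed(rich_text)
    scanner.close()
    return {
        "elements": [tuple(e) for e in scanner.elements],
        "metadata": scanner.metadata,
        "source": rich_text,
    }


_MD_PREFIX = {"h1": "# ", "h2": "## ", "h3": "### ", "li": "- ", "p": ""}


def render(render_data: dict, fmt: str = "markdown") -> str:
    """A configurable render module: 'markdown' or 'plain' output."""
    lines = []
    if fmt == "markdown" and render_data["metadata"]:
        lines.append("---")
        lines += [f"{k}: {v}" for k, v in render_data["metadata"].items()]
        lines.append("---")
    for tag, text in render_data["elements"]:
        prefix = _MD_PREFIX[tag] if fmt == "markdown" else ""
        lines.append(prefix + text.strip())
    return "\n".join(lines)
```

Keeping the render data set format-neutral means a new target format only needs a new branch in `render`. The parsing stage does not change.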

SEARCHABLE DATA STRUCTURE FOR ELECTRONIC DOCUMENTS
20230014904 · 2023-01-19 ·

A method of generating a searchable representation of an electronic document includes obtaining an electronic document specifying a graphical layout of content items. The content items include at least text in a table. The method also includes selecting masking rules, generating a vertical mask based on the masking rules, and generating a horizontal mask based on the masking rules. The vertical mask indicates estimated locations of vertical boundaries of table columns of the table, and the horizontal mask indicates estimated locations of horizontal boundaries of table rows of the table. The method also includes identifying cells of the table based on the vertical mask and the horizontal mask and generating a searchable data structure based on text corresponding to the identified cells of the table.
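The mask-based cell detection can be illustrated on a plain-text table. This sketch rests on one assumption: the "graphical layout" is a monospaced character grid. Its masking rules are simple. A run of at least `min_gap` all-blank character columns is a vertical boundary, and a blank line is a horizontal boundary. The searchable data structure is a word-to-cell inverted index. All function names are hypothetical.

```python
from collections import defaultdict


def vertical_mask(lines, min_gap=2):
    """True at character columns treated as column boundaries."""
    width = max(len(line) for line in lines)
    rows = [line.ljust(width) for line in lines if line.strip()]
    gap = [all(row[c] == " " for row in rows) for c in range(width)]
    mask = [False] * width
    c = 0
    while c < width:
        if gap[c]:
            end = c
            while end < width and gap[end]:
                end += 1
            # Masking rule: only runs of >= min_gap blank columns separate columns,
            # so single spaces inside a cell (e.g. "Unit price") do not split it.
            if end - c >= min_gap:
                mask[c:end] = [True] * (end - c)
            c = end
        else:
            c += 1
    return mask


def horizontal_mask(lines):
    """True at lines treated as row boundaries (blank lines)."""
    return [not line.strip() for line in lines]


def runs(mask):
    """(start, end) spans where the mask is False, i.e. content regions."""
    spans, start = [], None
    for i, is_boundary in enumerate(mask + [True]):
        if not is_boundary and start is None:
            start = i
        elif is_boundary and start is not None:
            spans.append((start, i))
            start = None
    return spans


def identify_cells(lines, min_gap=2):
    width = max(len(line) for line in lines)
    padded = [line.ljust(width) for line in lines]
    cols = runs(vertical_mask(padded, min_gap))
    rows = runs(horizontal_mask(padded))
    table = []
    for r0, r1 in rows:
        row = []
        for c0, c1 in cols:
            parts = [padded[r][c0:c1].strip() for r in range(r0, r1)]
            row.append(" ".join(p for p in parts if p))
        table.append(row)
    return table


def build_index(table):
    """Searchable structure: lower-cased word -> set of (row, column) cells."""
    index = defaultdict(set)
    for r, row in enumerate(table):
        for c, cell in enumerate(row):
            for word in cell.lower().split():
                index[word].add((r, c))
    return index
```

Choosing a different `min_gap` corresponds to selecting different masking rules. Real documents would need rules that account for ruling lines, fonts, and coordinate geometry rather than characters.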

System and Method for Electronic Chat Production
20230015667 · 2023-01-19 ·

Systems, methods, and computer program products for adaptively splitting electronic chats are provided. One embodiment includes receiving, by an electronic discovery system, an electronic chat comprising a set of electronic chat messages, each of which has a timestamp; determining a set of time gaps between the electronic chat messages in the set; selecting a Gaussian mixture model as a model of the time gaps and splitting the set of electronic chat messages into a set of conversations based on the Gaussian mixture model; performing a text analysis on the set of conversations based on a chat subject matter identified in the set of electronic chat messages; and splitting the set of conversations based on the chat subject matter.
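The time-gap step can be sketched with a two-component, one-dimensional Gaussian mixture fitted by expectation-maximization (EM). The sketch assumes timestamps in seconds and models `log1p(gap)` so that seconds-long and hour-long gaps separate cleanly. A message starts a new conversation when its preceding gap is more likely to belong to the component with the larger mean. The subject-matter split is not shown, and the function names are hypothetical.

```python
import math


def _log_pdf(x, mu, var):
    return -0.5 * (math.log(2 * math.pi * var) + (x - mu) ** 2 / var)


def _responsibilities(x, mu, var, pi):
    """Posterior probability of each component for x (computed in log space)."""
    logs = [math.log(pi[k]) + _log_pdf(x, mu[k], var[k]) for k in range(2)]
    m = max(logs)
    w = [math.exp(v - m) for v in logs]
    s = sum(w)
    return [wk / s for wk in w]


def fit_gmm_1d(xs, iters=100, var_floor=1e-6):
    """Two-component 1-D Gaussian mixture fitted with EM."""
    n = len(xs)
    mean = sum(xs) / n
    overall_var = max(var_floor, sum((x - mean) ** 2 for x in xs) / n)
    mu, var, pi = [min(xs), max(xs)], [overall_var, overall_var], [0.5, 0.5]
    for _ in range(iters):
        resp = [_responsibilities(x, mu, var, pi) for x in xs]  # E-step
        for k in range(2):                                      # M-step
            nk = sum(r[k] for r in resp) + 1e-12
            mu[k] = sum(r[k] * x for r, x in zip(resp, xs)) / nk
            var[k] = max(var_floor,
                         sum(r[k] * (x - mu[k]) ** 2 for r, x in zip(resp, xs)) / nk)
            pi[k] = nk / n
    return mu, var, pi


def split_conversations(timestamps):
    """Split sorted timestamps wherever the gap falls in the 'long gap' component."""
    ts = sorted(timestamps)
    if len(ts) < 3:
        return [ts]
    gaps = [math.log1p(b - a) for a, b in zip(ts, ts[1:])]
    mu, var, pi = fit_gmm_1d(gaps)
    long_k = 0 if mu[0] > mu[1] else 1
    conversations, current = [], [ts[0]]
    for t, g in zip(ts[1:], gaps):
        if _responsibilities(g, mu, var, pi)[long_k] > 0.5:
            conversations.append(current)
            current = []
        current.append(t)
    conversations.append(current)
    return conversations
```

A fixed two-component mixture always produces a "long" cluster. A production system would compare it against a one-component fit (for example with a likelihood or information criterion) before splitting a chat that is really one conversation.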

SYSTEM FOR THIRD PARTY SELLERS IN ONLINE RETAIL ENVIRONMENT

A third party item listing management system useable for validation of third party items to be included on a retailer website is disclosed. The third party item listing management system includes an application programming interface (API) accessible by a plurality of third parties and configured to receive item data. An item management process receives the item data and calls an item validation pipeline which includes a plurality of item validation stages including an item legalization stage. In the item legalization stage, the item data and the identity of the third party are validated against a plurality of item listing rules to determine whether the one or more items are allowed to be offered via the retailer website by the third party. The item listing rules can include a rule preventing the third party from listing an item included in a core item collection offered by the retailer via the retailer website.
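A minimal sketch of such a pipeline follows. Each stage returns an error message or `None`, and the legalization stage checks both the seller's identity and membership of the item in the retailer's core collection. The `Item` fields, the schema stage, and the banned-seller rule are hypothetical examples of "item listing rules", not rules taken from the abstract.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Set


@dataclass
class Item:
    sku: str
    title: str
    price: float


@dataclass
class ValidationResult:
    accepted: bool
    errors: List[str] = field(default_factory=list)


# A stage takes (item, seller_id) and returns an error message or None.
Stage = Callable[[Item, str], Optional[str]]


def schema_stage(item: Item, seller_id: str) -> Optional[str]:
    """Hypothetical basic data check run before legalization."""
    if not item.title or item.price <= 0:
        return "missing title or non-positive price"
    return None


def make_legalization_stage(core_skus: Set[str], banned_sellers: Set[str]) -> Stage:
    """Validate the item and the seller identity against listing rules."""
    def legalization_stage(item: Item, seller_id: str) -> Optional[str]:
        if seller_id in banned_sellers:
            return f"seller {seller_id} may not list items"
        # Rule from the abstract: third parties may not list core items.
        if item.sku in core_skus:
            return f"{item.sku} is in the retailer's core item collection"
        return None
    return legalization_stage


def run_pipeline(item: Item, seller_id: str, stages: List[Stage]) -> ValidationResult:
    errors = [msg for stage in stages if (msg := stage(item, seller_id))]
    return ValidationResult(accepted=not errors, errors=errors)
```

Modelling stages as plain callables keeps the pipeline open to new rules without changing `run_pipeline`.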

INFORMATION SEARCH SYSTEM

Provided is an information search system that enables a searcher to efficiently find information they want to know, the system including: a database (12) that stores a plurality of pieces of information that are text-searchable; a query sentence acceptance unit (26) that accepts a query sentence in a natural language format; an inputted search keyword extractor (44) that extracts an inputted search keyword from the query sentence; a retrieval executor (40) that executes retrieval processing from the database using the inputted search keyword, along with a keyword relevant to the inputted search keyword; and a keyword dictionary (30) in which words associated with categories are registered, wherein the retrieval executor acquires, from the keyword dictionary, words associated with one of the categories selected by a searcher, re-sorts information retrieved as a result of the retrieval processing, based on the acquired words, and displays the information to the searcher.
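The flow from query sentence to re-sorted results might be sketched as follows. It assumes stop-word removal as the keyword extractor, a plain dictionary of related keywords, and a keyword dictionary mapping each category to its words. Results are re-sorted by how often the selected category's words occur. The stop-word list, data, and function names are hypothetical.

```python
import re
from typing import Dict, List

STOPWORDS = {"how", "do", "i", "a", "the", "to", "my", "what", "is", "of"}


def _words(text: str) -> List[str]:
    return re.findall(r"[a-z0-9]+", text.lower())


def extract_keywords(query: str) -> List[str]:
    """Inputted search keywords: the query's words minus stop words."""
    return [w for w in _words(query) if w not in STOPWORDS]


def search(query: str, documents: List[str], related: Dict[str, List[str]],
           keyword_dictionary: Dict[str, List[str]], category: str) -> List[str]:
    keywords = extract_keywords(query)
    expanded = set(keywords)
    for kw in keywords:
        expanded.update(related.get(kw, []))  # add relevant keywords
    hits = [d for d in documents if expanded & set(_words(d))]
    category_words = keyword_dictionary.get(category, [])

    def category_score(doc: str) -> int:
        words = _words(doc)
        return sum(words.count(w) for w in category_words)

    # Stable sort: documents with equal scores keep their retrieval order.
    return sorted(hits, key=category_score, reverse=True)
```

Because the category only affects ordering, a searcher who changes category sees the same hits in a different order rather than a different result set.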

Recurrent neural network to decode trial criteria
11557380 · 2023-01-17 ·

A method and apparatus for providing curated criteria to identify one or more candidates for a clinical trial is disclosed. A computer processor identifies a first input criterion for the clinical trial. The processor employs a trained first recurrent neural network (RNN) configured as an encoder to encode the first input criterion. The encoder extracts key features of the medical condition of the patient. The processor employs a trained second RNN configured as a decoder to generate a curated output criterion by processing the encoded first input criterion based on the derived key features. The processor employs a machine learning model to ingest the curated output criterion to identify the one or more candidates for the clinical trial.
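The encoder-decoder shape described here can be illustrated with a dependency-free Elman RNN. The encoder folds the tokenized input criterion into a final hidden state, and the decoder greedily emits output tokens from that state. This is an untrained sketch. Its weights are random and seeded, so outputs are meaningless until trained. Vocabulary handling, training, and the downstream candidate-matching model are omitted. `TinyRNN` and `Seq2Seq` are hypothetical names.

```python
import math
import random
from typing import List


def _matvec(m: List[List[float]], v: List[float]) -> List[float]:
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def _rand_matrix(rng: random.Random, rows: int, cols: int, scale: float = 0.5):
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


def one_hot(i: int, n: int) -> List[float]:
    v = [0.0] * n
    v[i] = 1.0
    return v


class TinyRNN:
    """Elman RNN cell: h_t = tanh(W x_t + U h_{t-1} + b)."""

    def __init__(self, rng, input_dim, hidden_dim):
        self.W = _rand_matrix(rng, hidden_dim, input_dim)
        self.U = _rand_matrix(rng, hidden_dim, hidden_dim)
        self.b = [0.0] * hidden_dim

    def step(self, x, h):
        return [math.tanh(a + c + d)
                for a, c, d in zip(_matvec(self.W, x), _matvec(self.U, h), self.b)]


class Seq2Seq:
    def __init__(self, vocab_size: int, hidden_dim: int = 8, seed: int = 0):
        rng = random.Random(seed)
        self.vocab_size, self.hidden_dim = vocab_size, hidden_dim
        self.encoder = TinyRNN(rng, vocab_size, hidden_dim)
        self.decoder = TinyRNN(rng, vocab_size, hidden_dim)
        self.out = _rand_matrix(rng, vocab_size, hidden_dim)  # hidden -> logits

    def encode(self, token_ids: List[int]) -> List[float]:
        """The final hidden state summarizes the input criterion."""
        h = [0.0] * self.hidden_dim
        for t in token_ids:
            h = self.encoder.step(one_hot(t, self.vocab_size), h)
        return h

    def decode(self, h: List[float], start_id: int, end_id: int, max_len: int = 10):
        """Greedy decoding from the encoder state until end_id or max_len."""
        out, prev = [], start_id
        for _ in range(max_len):
            h = self.decoder.step(one_hot(prev, self.vocab_size), h)
            logits = _matvec(self.out, h)
            prev = max(range(self.vocab_size), key=logits.__getitem__)
            if prev == end_id:
                break
            out.append(prev)
        return out
```

In practice both RNNs would be trained jointly, typically with gated cells such as LSTMs or GRUs, on pairs of raw and curated criteria.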

Video Title Generation Method, Device, Electronic Device and Storage Medium
20230222161 · 2023-07-13 ·

Provided are a video title generation method, an electronic device and a storage medium, which relate to the technical field of video, and in particular to the technical field of short video. The method includes: obtaining a plurality of pieces of optional text information for a first video file; determining central text information from the plurality of pieces of optional text information, the central text information being the optional text information with the highest similarity to the content of the first video file; and determining the central text information as a title of the first video file. In addition, an interest point in an original video file can be determined according to users' interactive behavior data on the original video file, and the original video file can be clipped based on the interest point to obtain a plurality of clipped video files, namely, short videos.
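Selecting the "central text information" can be sketched as an arg-max over similarity scores. This sketch assumes the video's content is available as text (for example a transcript or recognized on-screen text). It also substitutes bag-of-words cosine similarity for whatever similarity model the method actually uses. The interest-point clipping step is not shown, and the function names are hypothetical.

```python
import math
import re
from collections import Counter
from typing import List


def _bow(text: str) -> Counter:
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def central_text(candidates: List[str], video_content_text: str) -> str:
    """Return the candidate most similar to the video's content (first wins ties)."""
    content = _bow(video_content_text)
    return max(candidates, key=lambda c: cosine(_bow(c), content))
```

Swapping `cosine` over word counts for a similarity between learned text and video embeddings would leave `central_text` unchanged.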