G06F16/30

Cloud distributed hybrid data storage and normalization

A method and system for cloud distributed hybrid data storage and normalization are disclosed. The method may include obtaining a data set comprising data entities. A data entity may comprise data fields each containing a data element. The method may further include determining policy constraint meta-data for each of the data elements based on a storage policy constraint. The policy constraint meta-data may include a first meta-tag indicating the storage policy constraint for the data element. The method may further include determining whether a server satisfies the storage policy constraint based on the first meta-tag for the data element. When the server satisfies the storage policy constraint, the method may further include transmitting the data element to the server to store the data element on the server. When the server fails to satisfy the storage policy constraint, the method may further include storing the data element on a client.
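
The routing decision described above can be illustrated with a minimal sketch. It assumes, purely for illustration, that a storage policy constraint is a required region name and that a server satisfies it when its region matches; `Server`, `tag_elements`, and `route` are hypothetical names, not taken from the disclosure.

```python
from dataclasses import dataclass, field


@dataclass
class Server:
    name: str
    region: str  # assumption: a constraint is satisfied by a matching region
    stored: dict = field(default_factory=dict)


def tag_elements(entity, constraints):
    """Attach a meta-tag (required region, or None) to each data element."""
    return {
        field_name: {"value": value, "meta_tag": constraints.get(field_name)}
        for field_name, value in entity.items()
    }


def route(entity, constraints, server, client_store):
    """Store each element on the server if the server satisfies its tag, else on the client."""
    for field_name, item in tag_elements(entity, constraints).items():
        tag = item["meta_tag"]
        if tag is None or tag == server.region:
            server.stored[field_name] = item["value"]
        else:
            client_store[field_name] = item["value"]


server = Server(name="s1", region="eu")
client_store = {}
# The "ssn" field must stay in region "us", which this server does not satisfy.
route({"name": "Ada", "ssn": "123"}, {"ssn": "us"}, server, client_store)
```

In this example the unconstrained `name` element ends up on the server, while the constrained `ssn` element stays in the client store.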

Method and apparatus for searching video segment, device, and medium

Embodiments of the present disclosure disclose a method and apparatus for searching a video segment, a device and a medium, and relate to the field of video data search. The method includes: sampling video frames from a target video and videos to be searched in a video library, and extracting features from the sampled frames; matching the target video and the videos to be searched according to the extracted features to determine a candidate video to be searched that matches the target video; determining at least one candidate video segment from the determined candidate video, and calculating a degree of matching between the target video and each candidate video segment based on the extracted features of each sampled frame; and determining a video segment matching the target video in the videos to be searched according to the calculated degree of matching between the target video and each candidate video segment.
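
The segment-matching step can be sketched as a sliding window over the sampled-frame features of one candidate video. The sketch assumes each sampled frame is already represented by a numeric feature vector and uses mean cosine similarity as the degree of matching; both choices, and the names `cosine`, `segment_score`, and `best_segment`, are illustrative assumptions rather than the disclosed method.

```python
import math


def cosine(a, b):
    """Cosine similarity between two feature vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def segment_score(target, segment):
    """Mean frame-by-frame similarity between the target and one candidate segment."""
    return sum(cosine(t, s) for t, s in zip(target, segment)) / len(target)


def best_segment(target, candidate):
    """Slide a window of len(target) frames over the candidate; return (score, start)."""
    n = len(target)
    best = (-1.0, None)
    for start in range(len(candidate) - n + 1):
        score = segment_score(target, candidate[start:start + n])
        if score > best[0]:
            best = (score, start)
    return best


target = [[1, 0], [0, 1]]
candidate = [[1, 1], [1, 0], [0, 1], [1, 1]]
score, start = best_segment(target, candidate)
```

Here the window starting at sampled frame 1 reproduces the target exactly, so it receives the highest score.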

Identifying regulator and driver signals in data systems

A method of identifying causal relationships between time series may include accessing a hierarchy of nodes in a data structure, where each node in the plurality of nodes may include a time series of data. The method may also include identifying a subset of nodes in the plurality of nodes for which causal relationships may exist in the corresponding time series. The method may additionally include generating a model for each of the subset of nodes, where the model may receive the subset of nodes and generate coefficients indicating how strongly each of the subset of nodes causally affects other nodes in the subset of nodes. The method may further include generating a ranked output of nodes that causally affect a first node in the subset of nodes based on an output of the corresponding model.
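
As a rough illustration of the ranked output, the sketch below substitutes a very simple stand-in for the model: a least-squares slope of the target series on each candidate series lagged by one step. This is an assumption made for the example, not the model described in the abstract; a lagged slope alone does not establish causation and depends on the units of each series. The function names and sample data are hypothetical.

```python
def lagged_coefficient(driver, target, lag=1):
    """Least-squares slope of target[t] regressed on driver[t - lag]."""
    x = driver[:-lag]
    y = target[lag:]
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    var = sum((a - mx) ** 2 for a in x)
    if var == 0:
        return 0.0
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / var


def rank_drivers(series, target_name, lag=1):
    """Rank the other nodes by the magnitude of their lagged coefficient on the target."""
    target = series[target_name]
    scored = [
        (name, lagged_coefficient(values, target, lag))
        for name, values in series.items()
        if name != target_name
    ]
    return sorted(scored, key=lambda item: abs(item[1]), reverse=True)


series = {
    "sales": [0, 2, 4, 6, 8],   # equals 2 * ads from the previous step
    "ads": [1, 2, 3, 4, 5],
    "noise": [1, -1, 1, -1, 1],
}
ranking = rank_drivers(series, "sales")
```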

Batch processing with random access for transaction history

Methods, systems, and computer-readable media for batch processing with random access for transaction history are disclosed. A batch processing system receives a batch comprising records of events, including a first record of a first event and a second record of a second event. The system assigns the first and second records to a group based (at least in part) on determining that the events are related. The system determines that the group is related to a match set comprising one or more prior events. The system updates one or more values in the match set based (at least in part) on the first and second records. The system stores the updated match set and one or more additional match sets using a storage object. The system retrieves the match set and not the one or more additional match sets from the storage object using an index.
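
The combination of a single storage object and index-based retrieval can be sketched as follows. The sketch assumes the storage object is one byte string holding JSON-encoded match sets back to back, with an index mapping each key to an (offset, length) pair; the layout and the names `pack` and `fetch` are illustrative assumptions.

```python
import json


def pack(match_sets):
    """Serialize all match sets into one blob and build an offset index."""
    blob = bytearray()
    index = {}
    for key, match_set in match_sets.items():
        data = json.dumps(match_set).encode("utf-8")
        index[key] = (len(blob), len(data))
        blob += data
    return bytes(blob), index


def fetch(blob, index, key):
    """Read only the bytes of one match set, leaving the others untouched."""
    offset, length = index[key]
    return json.loads(blob[offset:offset + length])


match_sets = {"order-1": {"total": 10}, "order-2": {"total": 5}}

# Two related event records from a batch are grouped under the same key
# and used to update that match set.
batch = [("order-1", 3), ("order-1", 4)]
for key, amount in batch:
    match_sets[key]["total"] += amount

blob, index = pack(match_sets)
```

Calling `fetch(blob, index, "order-1")` decodes only the slice belonging to that match set, which is the random-access property the abstract describes.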

Method and apparatus for presenting information

Embodiments of the present disclosure provide a method and apparatus for presenting information. The method may include: acquiring target release information and a comment information set associated with the target release information; and generating, for comment information in the comment information set, usefulness probabilities and predicted comment scores of the comment information based on the comment information and the target release information. The method may further include: presenting, based on the obtained usefulness probability set and predicted comment score set, the comment information in the comment information set.

Automatic identification of document sections to generate a searchable data structure

Methods and apparatuses are described for automatically identifying text sections of a document to generate a searchable hierarchical data structure. A computing device receives a document comprising text entities and converts the document from a first format to a second format, including generating metadata associated with text alignment, text position, text spacing, or fonts. The computing device extracts the text blocks, including determining coordinates associated with each text block using the metadata. The computing device determines document sections using the document metadata by identifying strings in the extracted text blocks that indicate a presence of a bullet point in the document, assigns a hierarchical category to each identified document section, and inserts text of each document section into a hierarchical data structure based upon the assigned hierarchical category. The computing device traverses the hierarchical data structure using search request data to identify document sections relating to the search request data.
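
A minimal sketch of the bullet detection, hierarchy building, and traversal steps is given below. It assumes each extracted text block arrives as an (x-coordinate, text) pair and that a larger x-coordinate means a deeper level; the regular expression, the nesting rule, and the names `build_tree` and `search` are assumptions made for illustration.

```python
import re

# Strings that suggest a bullet point: a symbol, "1." / "1)", or "a." / "a)".
BULLET = re.compile(r"^\s*(?:[•*-]|\d+[.)]|[a-z][.)])\s+")


def build_tree(blocks):
    """Nest bulleted blocks by x-coordinate; non-bullet text joins the current node."""
    root = {"text": "", "x": -1, "children": []}
    stack = [root]
    for x, text in blocks:
        if not BULLET.match(text):
            current = stack[-1]
            current["text"] = (current["text"] + " " + text.strip()).strip()
            continue
        node = {"text": BULLET.sub("", text).strip(), "x": x, "children": []}
        while stack[-1]["x"] >= x:  # climb until a shallower parent is found
            stack.pop()
        stack[-1]["children"].append(node)
        stack.append(node)
    return root


def search(node, term, path=()):
    """Depth-first traversal returning the section path of every matching node."""
    here = path + (node["text"],) if node["text"] else path
    hits = [here] if term.lower() in node["text"].lower() else []
    for child in node["children"]:
        hits.extend(search(child, term, here))
    return hits


blocks = [
    (10, "1. Fees"),
    (20, "a) Annual fee applies"),
    (20, "b) No load fee"),
    (10, "2. Risks"),
    (20, "- Market risk"),
]
tree = build_tree(blocks)
```

Searching the tree for `"market"` returns the path from the top-level section down to the matching bullet, so a hit can be reported together with its enclosing section.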

Index selection for database query

One or more computer processors match a query pattern to a received query; obtain context information related to the received query; retrieve a set of query records including the same context information as the obtained context information from an index knowledge base, wherein each query record in the set of query records includes context information related to a respective history query, the query pattern, an index type associated with the query pattern, and performance information relating to the query pattern and the index type; determine that a subset of the retrieved query records includes one or more query patterns equivalent to the matched query pattern; select a query pattern and an associated index type from the subset of query records based on associated performance information in the set of query records; and perform the received query by applying the selected query pattern and the associated index type.
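
The selection logic can be sketched as two filters followed by a choice on performance. The sketch assumes that context is a plain string, that two query patterns are equivalent when they match after lowercasing and whitespace normalization, and that performance is a latency where lower is better; `QueryRecord`, `normalize`, and `select_index` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QueryRecord:
    context: str       # e.g. workload or table statistics, simplified to a label
    pattern: str
    index_type: str
    latency_ms: float  # assumed performance metric, lower is better


def normalize(pattern):
    """Assumed equivalence rule: ignore case and extra whitespace."""
    return " ".join(pattern.lower().split())


def select_index(knowledge_base, context, matched_pattern):
    """Pick the best-performing (pattern, index type) among equivalent records."""
    same_context = [r for r in knowledge_base if r.context == context]
    equivalent = [
        r for r in same_context
        if normalize(r.pattern) == normalize(matched_pattern)
    ]
    if not equivalent:
        return None
    best = min(equivalent, key=lambda r: r.latency_ms)
    return best.pattern, best.index_type


knowledge_base = [
    QueryRecord("small-table", "SELECT * FROM t WHERE a = ?", "hash", 4.0),
    QueryRecord("small-table", "select *  from t where a = ?", "btree", 2.5),
    QueryRecord("large-table", "SELECT * FROM t WHERE a = ?", "btree", 9.0),
]
choice = select_index(knowledge_base, "small-table", "SELECT * FROM t WHERE a = ?")
```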

Method and apparatus for grouping records based upon a prediction of the content of the records

A method, apparatus and computer program product group records based upon a prediction of the content of the records. In the context of a method, data associated with respective subjects of the records is received and a threshold of a machine learning model is adjusted to satisfy an accuracy requirement for record categorization. In response to analyzing the data, but not the records, by the machine learning model, the method separates, using the machine learning model, the records into first and second groups, with the first group including records that the associated data indicates are more likely to support the addition of a code and the second group including records that the associated data indicates are less likely to support the addition of a code. The method also includes subsequently processing the records in different manners depending upon whether the records are included in the first or second group.
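
The threshold adjustment and the split into two groups can be sketched as follows. The sketch assumes the model has already produced a score per record from the associated data, that labelled validation scores are available, and that the accuracy requirement is expressed as a minimum precision; these assumptions and the names `pick_threshold` and `split` are illustrative only.

```python
def pick_threshold(scores, labels, min_precision):
    """Return the lowest threshold whose precision on validation data meets the requirement."""
    for threshold in sorted(set(scores)):
        selected = [label for s, label in zip(scores, labels) if s >= threshold]
        if selected and sum(selected) / len(selected) >= min_precision:
            return threshold
    return None


def split(records, scores, threshold):
    """First group: scores at or above the threshold. Second group: the rest."""
    first = [r for r, s in zip(records, scores) if s >= threshold]
    second = [r for r, s in zip(records, scores) if s < threshold]
    return first, second


# Validation scores with known outcomes (1 = a code was in fact supported).
threshold = pick_threshold([0.2, 0.4, 0.6, 0.8], [0, 1, 0, 1], min_precision=0.6)

# Scores are computed from the associated data only; the records are not read.
first_group, second_group = split(["r1", "r2", "r3"], [0.9, 0.3, 0.5], threshold)
```

The two groups can then be sent to different downstream processing, for example manual review for the first group only.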

Computer security using context triggered piecewise hashing
11687572 · 2023-06-27

Generally discussed herein are devices, systems, and methods for clustering based on context triggered piecewise hashing (CTPH). A method can include determining a first index of a first CTPH string of a file. The first index can include contiguous bits of the CTPH string. The first index can be smaller than the CTPH string, such as to be a proper subset of the CTPH string. The method can include determining the first index matches a second index of a cluster of files and in response to determining the first index matches the second index of the cluster, associating the file with the cluster. The method can include determining that the file includes malware based on the cluster.
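
The index-based cluster lookup can be sketched as below. For readability the sketch works on characters of an already computed CTPH string rather than on bits, takes the index as a fixed-length prefix, and marks clusters with a simple malware flag; the hash strings, the cluster layout, and the names `ctph_index` and `assign` are all hypothetical.

```python
def ctph_index(ctph, start=0, length=4):
    """Return a contiguous slice of the CTPH string that is strictly shorter than it."""
    if length >= len(ctph):
        raise ValueError("index must be a proper subset of the CTPH string")
    return ctph[start:start + length]


def assign(clusters, file_name, ctph):
    """Associate the file with the cluster whose index matches, if any."""
    index = ctph_index(ctph)
    cluster = clusters.get(index)
    if cluster is not None:
        cluster["files"].append(file_name)
    return cluster


# Assumed state: one known cluster keyed by its index, already labelled as malware.
clusters = {"AbCd": {"files": ["known.exe"], "malware": True}}

hit = assign(clusters, "new.bin", "AbCdEfGh123")
miss = assign(clusters, "other.bin", "ZZZZabc")
is_malware = hit is not None and hit["malware"]
```

Because the lookup is a dictionary access on a short index, a new file can be compared against every cluster without computing a full CTPH similarity score for each member.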
