Patent classifications
G06F16/902
Processing a data set
Embodiments relate to processing a data set stored in a computer system. In one aspect, a method of processing a data set stored in a computer system includes providing one or more parameters for quantifying data quality of the data set. A processor generates, for each parameter of the one or more parameters, a reference pattern indicating a dysfunctional behavior of the values of the parameter. The data set is processed to obtain values of the one or more parameters. A parameter of the one or more parameters is identified whose obtained values match a corresponding reference pattern of the generated reference patterns. The identified parameter is assigned a resource weight value indicating the amount of processing resources required to fix the dysfunctional behavior of the identified parameter.
Bulk data extract hybrid job processing
Methods for hybrid job processing may include receiving raw data records stored within a plurality of tables from a plurality of systems of record at a raw data layer within a data exchange. Methods may include generating, based on a data model, a list of dependencies between the plurality of tables. Each table included in a second subset of the plurality of tables may be dependent on at least one table included in a first subset of the plurality of tables. Methods may include processing the first subset of the plurality of tables concurrently with one another. The processing includes modeling the raw data records and transmitting the modeled data records to a model data layer. Methods may include processing each table included in the second subset after completion of processing of the table in the first subset on which it depends.
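The dependency-ordered processing described above can be illustrated with a minimal sketch. It is not the patented method: the dictionary shape used for the dependency list, the function name `process_tables`, and the `model_table` callback are all assumptions made for illustration. The sketch also simplifies by running tables in waves, so a dependent table waits for the whole preceding wave rather than only for its own parents.

```python
from concurrent.futures import ThreadPoolExecutor


def process_tables(dependencies, model_table, max_workers=4):
    """Process tables so that each runs only after the tables it depends on.

    dependencies: dict mapping table name -> set of table names it depends on
                  (hypothetical shape for the "list of dependencies").
    model_table:  callable that models one table's raw records (placeholder).
    Returns the waves of table names, in the order they were processed.
    """
    remaining = {table: set(deps) for table, deps in dependencies.items()}
    done = set()
    waves = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        while remaining:
            # A table is ready once every table it depends on has completed.
            ready = sorted(t for t, deps in remaining.items() if deps <= done)
            if not ready:
                raise ValueError("cyclic or unsatisfiable dependencies")
            # Process the ready tables concurrently; list() waits for all.
            list(pool.map(model_table, ready))
            done.update(ready)
            for table in ready:
                del remaining[table]
            waves.append(ready)
    return waves
```

Tables with no dependencies form the first wave and are processed concurrently with one another; a table that depends on one of them lands in a later wave.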
Tensor-Based Deep Relevance Model for Search on Online Social Networks
In one embodiment, a method includes receiving, from a client system associated with a user, a search query comprising a number of query terms, generating a query match-matrix for the search query, identifying a number of objects matching the search query, retrieving, for each identified object, an object match-matrix for the identified object, constructing, for each identified object, a three-dimensional tensor for the identified object, computing, for each identified object, a relevance score based on the tensor for the identified object, ranking the identified objects based on their respective relevance scores, and sending, to the client system in response to the search query, instructions for generating a search-results interface for presentation to the user.
Extensible framework for ereader tools, including named entity information
Information about named entities referenced in an electronic book (ebook) is provided to a client device. An ebook identifier identifying the ebook is received from the client device. A set of layers available for use with the ebook is determined. The layers in the set provide information associated with the ebook and a layer in the set provides information associated with named entities referenced in content of the ebook. A content range identifying a range of content of the ebook for which layer information is requested and an identification of one or more of the layers in the set for which layer information is requested is received from the client device. Layer information associated with the ebook content identified by the content range for the identified layers is transmitted to the client device. The transmitted layer information includes information associated with named entities referenced by ebook content.
Specification of database table relationships for calculation
A relationship amongst multiple relationships between database tables can be specified independent of a query. More specifically, a function (USERELATIONSHIP) can be introduced to the DAX language (Data Analysis Expressions), which provides a way to author formulas that are not evaluated immediately, but that can be evaluated dynamically and concurrently in many different contexts. The function enables a single relationship to be specified in the calculation formula away from the query. This provides a mechanism within the formula that specifies specific relationship(s) that are to be followed when the dynamic expression is evaluated.
System and method for detecting relevant potential participating entities
A method and system for detecting relevant potential participating entities across different databases. A method includes retrieving transaction data related to potential participating entities by resolving each of the plurality of potential participating entities between a dataset including transaction data and a dataset indicating the plurality of potential participating entities, wherein resolving the plurality of potential participating entities further includes applying resolution rules requiring matching a plurality of features between respective instances of the potential participating entity in the transaction data and in the dataset indicating the plurality of potential participating entities; determining a plurality of relevance scores based on the retrieved transaction data and entity characteristics of a subject entity, wherein each relevance score represents a relevance of the subject entity with respect to a respective potential participating entity; and identifying, based on the plurality of relevance scores, at least one relevant potential participating entity.
METHOD, SYSTEM AND COMPUTER PROGRAM PRODUCT FOR PERFORMING NUMERIC SEARCHES
A method and system for performing numeric searches related to biometric information, the method comprising generating a main search key representing biometric features of an item to be searched, partitioning the main search key into at least two sub-keys, each sub-key comprising a predetermined number of bits, obtaining a set of index tables each comprising a plurality of key values, each key value being associated with a corresponding index value, wherein the number of index tables in the set is equal to the number of sub-keys within the main search key, identifying, in a first one of said set of index tables, at least one key value matching a first sub-key of the main search key, obtaining, for each identified key value, a corresponding index value pointing to a limited portion of key values in a next index table, identifying, in said limited portion of said next index table, at least one key value matching a next sub-key of the main search key, repeating the steps of obtaining index values and searching a limited portion of a next index table until all sub-keys of the main search key have been searched, and returning a result when the last sub-key of the main search key has been searched.
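The cascaded lookup described above can be sketched as follows. This is an illustrative reading, not the claimed implementation: the table layout (lists of `(key_value, index_value)` pairs, with each index value a `(start, end)` slice of the next table), the function names, and the use of exact matching are all assumptions. A real biometric search would likely use approximate matching.

```python
def split_key(main_key, num_subkeys, bits_per_subkey):
    """Partition an integer key into sub-keys, most significant first."""
    mask = (1 << bits_per_subkey) - 1
    return [
        (main_key >> (bits_per_subkey * (num_subkeys - 1 - i))) & mask
        for i in range(num_subkeys)
    ]


def search(tables, main_key, bits_per_subkey):
    """Search one index table per sub-key, narrowing the portion each time.

    tables: list of index tables (hypothetical layout). Each table is a list
            of (key_value, index_value) pairs. In every table but the last,
            index_value is a (start, end) slice of the next table; in the
            last table it is the result itself.
    """
    if not tables:
        return []
    subkeys = split_key(main_key, len(tables), bits_per_subkey)
    # The whole first table is searched; later tables only in portions.
    portions = [(0, len(tables[0]))]
    hits = []
    for level, subkey in enumerate(subkeys):
        hits = [
            index_value
            for start, end in portions
            for key_value, index_value in tables[level][start:end]
            if key_value == subkey
        ]
        if not hits:
            return []
        portions = hits  # after the last level, hits hold the results
    return hits


# Example data (made up): 8-bit keys split into two 4-bit sub-keys.
TABLES = [
    [(0xA, (0, 2)), (0xB, (2, 3))],
    [(0x3, "rec-1"), (0x7, "rec-2"), (0x3, "rec-3")],
]
```

With this data, searching for `0xB3` only inspects the third entry of the second table, even though the sub-key `0x3` also appears in the first entry.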
Content development device
This content development apparatus includes at least one storage medium and at least one processor. The storage medium is configured to store a plurality of resource data pertaining to content being created; and store a database pertaining to the resource data. The processor is configured to execute a plurality of editing processes; generate first information created for each type of the resource data and at least including a location of each of the resource data, and second information expressing an association between different types of the first information; store the first and the second information in the at least one storage medium; respond to a request from one of the editing processes to acquire, using a designated resource data, information indicating a different type of the resource data associated with the designated resource data; notify the editing process; and update the database.
SYSTEM AND METHOD FOR INVESTIGATING LARGE AMOUNTS OF DATA
A data analysis system is proposed for providing fine-grained low latency access to high volume input data from possibly multiple heterogeneous input data sources. The input data is parsed, optionally transformed, indexed, and stored in a horizontally-scalable key-value data repository where it may be accessed using low latency searches. The input data may be compressed into blocks before being stored to minimize storage requirements. The results of searches present input data in its original form. The input data may include access logs, call data records (CDRs), e-mail messages, etc. The system allows a data analyst to efficiently identify information of interest in a very large dynamic data set up to multiple petabytes in size. Once information of interest has been identified, that subset of the large data set can be imported into a dedicated or specialized data analysis system for an additional in-depth investigation and contextual analysis.
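As a rough illustration of the parse, index, compress, and search flow described above, the sketch below stores records in compressed blocks and returns matches in their original form. It makes several assumptions not found in the abstract: an in-memory dictionary stands in for the horizontally-scalable key-value repository, records are single-line strings, indexing is by whitespace-separated token, and the class name `BlockStore` is hypothetical.

```python
import zlib
from collections import defaultdict


class BlockStore:
    """Toy stand-in for a key-value repository of compressed record blocks."""

    def __init__(self, block_size=2):
        self.block_size = block_size
        self.blocks = {}                # block id -> compressed bytes
        self.index = defaultdict(set)   # token -> ids of blocks containing it
        self._pending = []              # records not yet written to a block

    def add(self, record):
        """Buffer one single-line record; write a block when the buffer fills."""
        self._pending.append(record)
        if len(self._pending) >= self.block_size:
            self.flush()

    def flush(self):
        """Compress buffered records into one block and index their tokens."""
        if not self._pending:
            return
        block_id = len(self.blocks)
        payload = "\n".join(self._pending).encode("utf-8")
        self.blocks[block_id] = zlib.compress(payload)
        for record in self._pending:
            for token in record.split():
                self.index[token].add(block_id)
        self._pending = []

    def search(self, token):
        """Return the original records that contain the given token."""
        results = []
        for block_id in sorted(self.index.get(token, ())):
            text = zlib.decompress(self.blocks[block_id]).decode("utf-8")
            results.extend(r for r in text.split("\n") if token in r.split())
        return results
```

Only the blocks listed in the index for a token are decompressed, which is what keeps a lookup cheap relative to scanning every block.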
Methods and systems for creating automata networks
The Automata Processor Workbench (AP Workbench) is an application for creating and editing designs of AP networks (e.g., one or more portions of the state machine engine, one or more portions of the FSM lattice, or the like) based on, for example, an Automata Network Markup Language (ANML). For instance, the application may include a tangible, non-transitory computer-readable medium configured to store instructions executable by a processor of an electronic device, wherein the instructions include instructions to represent an automata network as a graph.
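To make the idea of representing an automata network as a graph concrete, here is a generic sketch. It does not use ANML or any AP Workbench API: the node and edge dictionaries, the function name `run_network`, and the matching semantics (a node matches when it is enabled and the current symbol is in its symbol set) are assumptions chosen for illustration.

```python
def run_network(nodes, edges, start, reporting, text):
    """Simulate a small automata network stored as a graph.

    nodes:     dict mapping node id -> set of symbols that node matches
    edges:     dict mapping node id -> list of successor node ids
    start:     node ids enabled for the first input symbol
    reporting: node ids whose match on the last symbol counts as acceptance
    """
    enabled = set(start)
    matched = set()
    for symbol in text:
        # Enabled nodes whose symbol set contains the input symbol match.
        matched = {n for n in enabled if symbol in nodes[n]}
        # Every matching node enables its successors for the next symbol.
        enabled = {succ for n in matched for succ in edges.get(n, [])}
    return bool(matched & set(reporting))


# Example graph (made up): accepts "a", one or more "b", then "c".
NODES = {"n_a": {"a"}, "n_b": {"b"}, "n_c": {"c"}}
EDGES = {"n_a": ["n_b"], "n_b": ["n_b", "n_c"]}
```

The self-loop on `n_b` is an ordinary edge in the graph, which is why repetition needs no special handling in the simulation loop.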