G06F16/24556

Data unification

Solutions for data unification include: receiving a data record, the data record comprising a plurality of data fields; selecting, from among the plurality of data fields, a subset of the data fields, the subset of the data fields being fewer in number than the plurality of data fields, wherein selecting the subset of the data fields comprises: applying a first rule to select at least a first one of the data fields within the data record for inclusion in the subset of the data fields; using content of the subset of the data fields, generating a stable identifier (stableID) for the data record; and inserting the stableID into a primary key data field of the data record.
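
The steps in this abstract can be sketched as follows. This is a minimal illustration, not the claimed implementation: the SHA-256 hash, the `primary_key` field name, and the `identity_fields` rule are all assumptions chosen for the example.

```python
import hashlib

def select_subset(record, rule):
    """Apply a rule (a predicate over field name and value) to pick the ID fields."""
    return {name: value for name, value in record.items() if rule(name, value)}

def stable_id(subset):
    # Sort by field name so the ID does not depend on field ordering.
    canonical = "|".join(f"{name}={subset[name]}" for name in sorted(subset))
    # SHA-256 is an assumption; the abstract does not name a hash function.
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

def unify(record, rule, key_field="primary_key"):
    subset = select_subset(record, rule)
    # The subset must be smaller than the full set of fields.
    if not subset or len(subset) >= len(record):
        raise ValueError("rule must select a non-empty proper subset of the fields")
    out = dict(record)
    out[key_field] = stable_id(subset)  # insert the stableID into the key field
    return out

# Hypothetical rule: only these two fields identify the record.
def identity_fields(name, value):
    return name in {"email", "birth_date"}

a = unify({"email": "a@example.com", "birth_date": "1990-01-01", "city": "Oslo"},
          identity_fields)
b = unify({"city": "Bergen", "birth_date": "1990-01-01", "email": "a@example.com"},
          identity_fields)
```

Records `a` and `b` differ in a field outside the subset and in field order, yet receive the same stableID.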

Task processing method and distributed computing framework

The present disclosure describes a task processing method and a distributed computing framework. A specific embodiment of the method includes: parsing an expression corresponding to a distributed computing task, and constructing task description information corresponding to the distributed computing task, the task description information describing a correspondence between an operator and a distributed dataset, the operator acting on at least one of the distributed dataset or distributed datasets obtained by grouping the distributed dataset; determining, based on the task description information, the distributed dataset on which the operator acts; and performing distributed computing on that distributed dataset using the operator. In the distributed computing, the acting scope and nesting relationship of the operator are described by constructing a topology.

LARGE OBJECT PACKING FOR STORAGE EFFICIENCY
20230027688 · 2023-01-26

One example method includes receiving data, partitioning the data according to their respective similarity groups, wherein the similarity groups collectively define a range of similarity groups, deduplicating the data after the partitioning, packing unique data segments remaining after deduplicating into one or more compression regions, compressing the compression regions, and writing an object that includes the compression regions to a durable log. The deduplicating and compressing for a similarity group may be performed by a dedup-compression instance uniquely assigned to that similarity group.
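
The pipeline described above can be sketched as follows. The sketch rests on several assumptions that are not in the abstract: the hash-based `similarity_group` function, the `NUM_GROUPS` and `REGION_BYTES` constants, `zlib` as the compressor, one object per similarity group, and an in-memory list standing in for the durable log.

```python
import hashlib
import zlib
from collections import defaultdict

NUM_GROUPS = 4      # assumed size of the similarity-group range
REGION_BYTES = 64   # assumed cap on the uncompressed size of a region

def similarity_group(segment: bytes) -> int:
    # Stand-in: derive the group from the first byte of the segment's hash.
    return hashlib.sha256(segment).digest()[0] % NUM_GROUPS

def pack_and_compress(segments, seen):
    """Deduplicate one group's segments, pack the survivors, compress each region."""
    regions, current = [], b""
    for seg in segments:
        fingerprint = hashlib.sha256(seg).digest()
        if fingerprint in seen:
            continue  # duplicate within this similarity group
        seen.add(fingerprint)
        if current and len(current) + len(seg) > REGION_BYTES:
            regions.append(zlib.compress(current))
            current = b""
        current += seg
    if current:
        regions.append(zlib.compress(current))
    return regions

def ingest(segments, fingerprints, log):
    groups = defaultdict(list)
    for seg in segments:  # partition by similarity group
        groups[similarity_group(seg)].append(seg)
    for gid in sorted(groups):
        # fingerprints[gid] models the state of the instance assigned to this group.
        regions = pack_and_compress(groups[gid], fingerprints[gid])
        if regions:
            log.append({"group": gid, "regions": regions})

fingerprints = defaultdict(set)
log = []
data = [b"aaaa" * 4, b"bbbb" * 4, b"aaaa" * 4]
ingest(data, fingerprints, log)
```

Because each group keeps its own fingerprint set, ingesting the same data again writes nothing new to the log.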

RESOURCE GRAPHS FOR INHERENT AND DERIVED RESOURCE ATTRIBUTES

A facility for creating resource graphs based on inherent and derived resource attributes is configured to assist domain experts in the processing and analysis of resource data. The facility obtains input indicating inherent resource attributes and relationships to other resource attributes. The facility identifies derived resource attributes based on the inherent resource attributes and the relationships to other resource attributes. The facility generates a resource graph based on the derived resource attributes, inherent resource attributes, and the relationships. The facility obtains attribute data from a repository of resource attribute data and evaluates the resource data based on the resource graph and attribute data.

Key pattern management in multi-tenancy database systems

The present disclosure involves systems, software, and computer-implemented methods for key pattern management. One example method includes receiving a query for a logical database table from an application. A determination is made as to whether the query is a write query. In response to determining that the query is a write query, a determination is made as to whether the query complies with a key pattern configuration that describes keys of records included in a physical database table that is part of a logical table implementation. The physical table includes records of the logical database table that are allowed to be written by the application. The write query is redirected to the physical database table in response to determining that the query complies with the key pattern configuration. The query is rejected in response to determining that the query does not comply with the key pattern configuration.
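
The write-path check described above can be sketched as follows. The table names, the regular-expression form of the key pattern, and the `Query` type are hypothetical. The abstract does not say how read queries are handled, so the sketch simply passes them through to the logical table.

```python
import re
from dataclasses import dataclass

@dataclass
class Query:
    table: str      # logical table name
    key: str        # key of the record the query touches
    is_write: bool

class KeyPatternViolation(Exception):
    """Raised when a write query does not comply with the key pattern configuration."""

# Hypothetical configuration: writable records of T_CONFIG have keys starting with "Z"
# and live in the physical table T_CONFIG_TENANT.
KEY_PATTERNS = {
    "T_CONFIG": {"physical_table": "T_CONFIG_TENANT", "pattern": re.compile(r"Z.*")},
}

def route(query: Query) -> str:
    """Return the table the query should run against, or reject it."""
    if not query.is_write:
        return query.table  # assumption: reads are out of scope here
    config = KEY_PATTERNS[query.table]
    if config["pattern"].fullmatch(query.key):
        return config["physical_table"]  # redirect the compliant write
    raise KeyPatternViolation(f"key {query.key!r} not allowed for {query.table}")

target = route(Query("T_CONFIG", "Z100", is_write=True))
```

A write with key `A100` would raise `KeyPatternViolation` instead of reaching any table.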

Windowed writes

Systems and methods are provided for synchronizing data. The systems and methods include operations for: storing a synchronization entry for a messaging application feature, the synchronization entry comprising a last update timestamp associated with a first update to content of the messaging application feature received from a first source; receiving a second update to the content of the messaging application feature from the first source; determining that the second update was received within a write window of the last update timestamp; in response to determining that the second update was received within the write window of the last update timestamp, preventing an update of the last update timestamp; and sending the first update and the second update to a client device in response to receiving a synchronization request from the client device based on the last update timestamp.
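
A minimal sketch of the write-window behaviour, under stated assumptions: the five-second `WRITE_WINDOW`, the `SyncEntry` type, the in-memory `store`, and the rule that a client receives updates only when the entry's timestamp is newer than its last sync are all illustrative choices, not details from the abstract.

```python
from dataclasses import dataclass, field

WRITE_WINDOW = 5.0  # seconds; assumed value

@dataclass
class SyncEntry:
    last_update_ts: float
    updates: list = field(default_factory=list)

def record_update(store, feature, update, now):
    entry = store.get(feature)
    if entry is None:
        # First update: create the entry and stamp it.
        store[feature] = SyncEntry(last_update_ts=now, updates=[update])
        return
    entry.updates.append(update)
    # Within the write window the timestamp is left untouched,
    # which saves a timestamp write for bursts of updates.
    if now - entry.last_update_ts > WRITE_WINDOW:
        entry.last_update_ts = now

def sync(store, feature, client_synced_at):
    """Return pending updates if the entry changed since the client last synced."""
    entry = store.get(feature)
    if entry is None or entry.last_update_ts <= client_synced_at:
        return []
    return list(entry.updates)

store = {}
record_update(store, "chat_theme", "blue", now=100.0)
record_update(store, "chat_theme", "green", now=102.0)  # inside the window
pending = sync(store, "chat_theme", client_synced_at=50.0)
```

After both calls the entry still carries the timestamp of the first update, and the synchronization request returns both updates.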

Methods and apparatus to estimate audience sizes of media using deduplication based on vector of counts sketch data

Methods and apparatus to estimate audience sizes using deduplication based on vector of counts sketch data are disclosed. An example apparatus to determine an audience size for media based on vector of counts sketch data includes: a coefficient analyzer to determine coefficient values of a polynomial based on variances, a covariance, and cardinalities corresponding to a first vector of counts from a first database and a second vector of counts from a second database; an overlap analyzer to determine a real root of the polynomial, the real root corresponding to an estimate of an overlap between the first vector of counts and the second vector of counts; and a report generator to estimate the audience size based on the estimate of the overlap and the cardinalities of the first vector of counts and the second vector of counts.

Partial group by for eager group by placement query plans

A partial group by operator is a group by operator that implements a fallback mechanism. The fallback mechanism is triggered whenever memory pressure reaches a certain threshold. When the fallback mechanism is triggered, a row is included in the output of the partial group by operator without adding an aggregation value for the row's grouping value to an aggregation data structure. A final group by operator computes a final aggregate value over all results from the partial group by operator, including both pre-grouped results and passed-through results.
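
The fallback can be sketched as follows, using a sum aggregate. Memory pressure is modelled by a hypothetical `max_groups` cap on the aggregation table. As a further assumption, rows whose group is already in the table keep being aggregated after the cap is reached, since they need no new memory.

```python
def partial_group_by(rows, max_groups):
    """Yield (key, partial_sum) pairs; pass rows through once the table is full."""
    table = {}
    for key, value in rows:
        if key in table:
            table[key] += value
        elif len(table) < max_groups:
            table[key] = value
        else:
            # Fallback: emit the row as-is without adding it to the table.
            yield key, value
    yield from table.items()

def final_group_by(partials):
    """Combine pre-grouped and passed-through results into final sums."""
    totals = {}
    for key, value in partials:
        totals[key] = totals.get(key, 0) + value
    return totals

rows = [("a", 1), ("b", 2), ("c", 3), ("a", 4), ("c", 5)]
totals = final_group_by(partial_group_by(rows, max_groups=2))
```

With room for only two groups, both `c` rows are passed through unaggregated, and the final operator still produces the correct sum for every group.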

System and method for aggregation and graduated visualization of user generated social post on a social mapping network
11704329 · 2023-07-18

Systems and methods for receiving information associated with posts to a social network are described. Posts may be associated with a location. Symbols, portions of posts, or multiple symbols may be shown on a client device in an area of a map indicating a location associated with a post. Posts may be represented by symbols, which may include shapes, and such shapes may include emojis. The number of symbols, posts, or portions of posts displayed on a client device may be determined at least in part by the area of the map displayed on the client device.

QUERY METHOD AND DEVICE SUITABLE FOR OLAP QUERY ENGINE
20230017300 · 2023-01-19

A query method and device suitable for an On-Line Analytical Processing (OLAP) query engine are provided. The device includes a client agent module, a query pattern matching module, a query distributed execution module, and a pre-aggregation module. The query pattern matching module is configured to obtain an MDX query request received by the OLAP query engine and process the MDX query request to generate at least one aggregation query set. Each aggregation query set includes a plurality of aggregation query requests. The query distributed execution module is configured to perform concurrent processing on the plurality of aggregation query requests. The aggregation query requests are arranged in correspondence with the aggregation query results. An efficient OLAP query execution engine can handle the complex OLAP queries of various reporting systems. The execution efficiency of MDX queries can therefore be significantly improved, and analysis requests from the reporting systems receive rapid responses.
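
The concurrent execution step can be sketched as follows. The in-memory `FACTS` rows, the `(dimension, measure)` request format, and the thread pool are assumptions for illustration. MDX parsing and pre-aggregation are omitted.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical fact rows standing in for the underlying data source.
FACTS = [
    {"region": "EU", "year": 2022, "sales": 10},
    {"region": "EU", "year": 2023, "sales": 20},
    {"region": "US", "year": 2023, "sales": 5},
]

def run_aggregation(request):
    """Sum one measure grouped by one dimension."""
    dimension, measure = request
    totals = {}
    for row in FACTS:
        totals[row[dimension]] = totals.get(row[dimension], 0) + row[measure]
    return totals

def execute_query_set(requests):
    # Executor.map returns results in request order, so each result
    # lines up with the aggregation query request that produced it.
    with ThreadPoolExecutor(max_workers=4) as pool:
        return list(pool.map(run_aggregation, requests))

results = execute_query_set([("region", "sales"), ("year", "sales")])
```

The results come back in the same order as the requests, regardless of which worker finishes first.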