Patent classifications
G06F16/24537
COMPUTATIONAL-MODEL OPERATION USING MULTIPLE SUBJECT REPRESENTATIONS
A processing unit can determine multiple representations associated with a statement, e.g., subject or predicate representations. In some examples, the representations can lack representation of semantics of the statement. The computing device can determine a computational model of the statement based at least in part on the representations. The computing device can receive a query, e.g., via a communications interface. The computing device can determine at least one query representation, e.g., a subject, predicate, or entity representation. The computing device can then operate the model using the query representation to provide a model output. The model output can represent a relationship between the query representations and information in the model. The computing device can, e.g., transmit an indication of the model output via the communications interface. The computing device can determine mathematical relationships between subject representations and attribute representations for multiple statements, and determine the model using the relationships.
MULTIPLEXING DATA OPERATION
Embodiments of the present invention relate to a method, system, and computer program product for multiplexing data operation. In some embodiments, a method is disclosed. A query for at least one table comprising a plurality of data records is received. The query indicates a plurality of data operations to be performed on the plurality of data records. The plurality of data operations are combined into a target data operation. An intermediate result of the query is generated by performing the target data operation on the plurality of data records. A final result of the query is determined based on the intermediate result. In other embodiments, a system and a computer program product are disclosed.
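As a rough illustration of the idea, the sketch below combines three aggregate operations (sum, average, maximum) into one scan that yields an intermediate result, and then derives the final result from it. The names `Combined`, `run_combined`, and `finalize` are hypothetical, and the choice of aggregates is an assumption; the abstract does not specify which operations are combined or how.

```python
from dataclasses import dataclass


@dataclass
class Combined:
    """Intermediate result shared by all requested operations (hypothetical)."""
    count: int = 0
    total: float = 0.0
    maximum: float = float("-inf")


def run_combined(records, column):
    """Perform the combined ("target") operation in a single pass over the records."""
    acc = Combined()
    for row in records:
        value = row[column]
        acc.count += 1
        acc.total += value
        acc.maximum = max(acc.maximum, value)
    return acc


def finalize(acc):
    """Derive the final result of each requested operation from the intermediate result."""
    if acc.count == 0:
        return {"sum": 0.0, "avg": None, "max": None}
    return {"sum": acc.total, "avg": acc.total / acc.count, "max": acc.maximum}


# Illustrative data only.
records = [{"price": 10.0}, {"price": 30.0}, {"price": 20.0}]
result = finalize(run_combined(records, "price"))
```

The point of the sketch is that the table is scanned once rather than once per operation; everything else about it is an assumption.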
Group-by size result estimation
A method and system are provided for accurately estimating the result size of a Group-By operation in a relational database. The estimate uses the probability of union of the columns involved in the operation, as well as the relative cardinality of each column with respect to the other columns in the operation. When indicated, table filters are applied before the sizes of the tables in the operation are determined, and equivalent columns are included in the list of columns that are part of the Group-By operation. Accordingly, the estimate of the result size of the operation includes influencing factors that provide an accurate estimation of system memory requirements.
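The abstract does not give the estimation formula, so the sketch below substitutes a common textbook estimator rather than the claimed method: the filter is applied to the row count first, and the expected number of distinct groups is then computed with the balls-into-bins expectation D * (1 - (1 - 1/D)^n), where D is the product of the per-column distinct counts and n is the filtered row count. The function name and parameters are hypothetical, and the sketch assumes the grouping columns are independent.

```python
def estimate_group_count(row_count, filter_selectivity, distinct_counts):
    """Estimate the number of groups produced by a Group-By (textbook stand-in).

    row_count          -- rows in the table before filtering
    filter_selectivity -- fraction of rows that survive the table filter (0..1)
    distinct_counts    -- distinct-value count of each grouping column
    """
    # Apply the table filter before sizing the input, as the abstract describes.
    rows = row_count * filter_selectivity

    # Upper bound on distinct combinations, assuming independent columns.
    combos = 1
    for distinct in distinct_counts:
        combos *= distinct

    if rows <= 0 or combos <= 0:
        return 0.0

    # Expected number of non-empty "bins" after throwing `rows` balls
    # uniformly at random into `combos` bins.
    return combos * (1.0 - (1.0 - 1.0 / combos) ** rows)


# Many rows, few combinations: the estimate approaches the combination count.
dense = estimate_group_count(1_000_000, 0.1, [50, 20])
# Few rows, many combinations: the estimate approaches the row count.
sparse = estimate_group_count(10, 1.0, [1000, 1000])
```

The estimate never exceeds either the filtered row count or the number of possible column-value combinations, which is the property a memory-sizing estimate needs most.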
Search query result set count estimation
Search query result set count estimation is described. A system parses a data set query that includes a first query attribute and a second query attribute. The system identifies a first hierarchy of connected nodes including a first node representing the first query attribute, and a second hierarchy of other connected nodes including a second node representing the second query attribute. The system identifies a directed arc connecting a first correlated node in the first hierarchy to a second correlated node in the second hierarchy. The system identifies cross-hierarchy probabilities of correlations between values of a first attribute represented by the first correlated node and values of a second attribute represented by the second correlated node. The system outputs a query result set estimated count generated from the cross-hierarchy probabilities, probabilities that values of the first attribute are associated with values corresponding to the first node, and probabilities that values of the second attribute are associated with values corresponding to the second node.
Cache Based Efficient Access Scheduling for Super Scaled Stream Processing Systems
The technology disclosed relates to discovering a previously unknown attribute of stream processing systems according to which client offsets or client subscription queries for a streaming data store rapidly converge to a dynamic tip of a data stream that includes the most recent messages or events. In particular, it relates to grouping clients into bins to reduce a number of queries to the streaming data store by several orders of magnitude when servicing tens, hundreds, thousands or millions of clients. The bin count is further reduced by coalescing bins that have overlapping offsets. It also relates to establishing separate caches only for the current tips of data streams and serving the bins from the caches instead of the backend data store using group queries. Further, the caches are periodically updated to include the most recent messages or events appended to the dynamic tips of the data streams.
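A minimal sketch of the binning step, under stated assumptions: each client is taken to read a fixed-size window of messages starting at its offset, and windows that overlap are coalesced into one bin so that a single group query can serve all clients in it. The function name `coalesce_bins`, the `batch_size` parameter, and the dictionary layout are hypothetical; caching of the stream tip is not shown.

```python
def coalesce_bins(client_offsets, batch_size):
    """Group clients into bins of overlapping [start, end) offset ranges.

    client_offsets -- mapping of client id to its current stream offset
    batch_size     -- assumed number of messages each client reads per request
    """
    bins = []
    # Visit clients in offset order so overlapping windows are adjacent.
    for client, offset in sorted(client_offsets.items(), key=lambda kv: kv[1]):
        start, end = offset, offset + batch_size
        if bins and start < bins[-1]["end"]:
            # Window overlaps the previous bin: extend it and add the client.
            bins[-1]["end"] = max(bins[-1]["end"], end)
            bins[-1]["clients"].append(client)
        else:
            bins.append({"start": start, "end": end, "clients": [client]})
    return bins


# Three clients near the same offset and one far behind (illustrative data).
offsets = {"a": 100, "b": 102, "c": 105, "d": 500}
bins = coalesce_bins(offsets, batch_size=10)
```

In this example four client reads collapse into two range queries. Because offsets tend to converge on the tip of the stream, the reduction would be expected to grow with the number of clients.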
Systems and methods for managing shared content based on sharing profiles
Content items stored in an online content management service can be organized and shared. Content items can be associated with sharing profiles that include various sharing-specific metadata, such as details of how an item is shared or with whom it is shared. In some embodiments, the metadata stored in the sharing profiles can be used to organize shared content into shared folders automatically and/or to sort a list of content items.
Optimal Index Selection in Polynomial Time
A method for using a minimal set of indices for an input query may include identifying the input query, which includes primitive searches that are accelerated using indices, and computing a minimal set of indices for the input query using a polynomial-time algorithm by constructing a bi-partite graph comprising a first and a second vertex set. The first and the second vertex sets may each be the set of searches, so that the same searches appear in both partitions of the bi-partite graph. Each edge of the edge set may connect a vertex in the first vertex set and a vertex in the second vertex set. The method may further include identifying the edge set as a strict subset relation between at least two searches of the set of searches appearing in the first and second vertex sets of the bi-partite graph, and performing relational data analysis using the minimal set of indices for input queries.
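One plausible reading of the construction, sketched under assumptions: each search is modeled as a distinct set of attributes, an edge joins search u in the first partition to search v in the second when u is a strict subset of v, and a maximum matching in that graph yields a minimum number of subset chains (the number of searches minus the matching size). Treating each chain as one index is an assumption here, as are the function name `min_index_chains` and the use of a simple augmenting-path matching; the abstract names neither.

```python
def min_index_chains(searches):
    """Partition distinct searches into a minimum number of strict-subset chains."""
    searches = [frozenset(s) for s in searches]
    n = len(searches)

    # Bipartite edges: left vertex u -> right vertex v when u is a strict subset of v.
    edges = [[v for v in range(n) if searches[u] < searches[v]] for u in range(n)]
    match_right = [None] * n  # right vertex -> left vertex it is matched to

    def try_augment(u, seen):
        """Search for an augmenting path starting at left vertex u."""
        for v in edges[u]:
            if v in seen:
                continue
            seen.add(v)
            if match_right[v] is None or try_augment(match_right[v], seen):
                match_right[v] = u
                return True
        return False

    for u in range(n):
        try_augment(u, set())

    # A matched edge u -> v means v follows u in the same chain.
    successor = {u: v for v, u in enumerate(match_right) if u is not None}
    heads = [v for v in range(n) if match_right[v] is None]

    chains = []
    for head in heads:
        chain, node = [], head
        while node is not None:
            chain.append(sorted(searches[node]))
            node = successor.get(node)
        chains.append(chain)
    return chains


# Four searches over attributes x, y, z (illustrative data).
chains = min_index_chains([{"x"}, {"x", "y"}, {"x", "y", "z"}, {"y"}])
```

Here the first three searches form one chain, so under the stated assumption a single index ordered x, y, z could serve all three as prefixes, and the search on y alone needs a second index.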
FUNCTIONS FOR PATH TRAVERSALS FROM SEED INPUT TO OUTPUT
Described herein are systems, methods, and non-transitory computer readable media for defining and executing functions for determining a matching entity that is relevant to an entity of interest when, for example, there is a significant number of intermediary links and entities between the matching entity and the entity of interest. A visual depiction of a path traversal from a seed input entity to an output matched entity can be presented to an end user in a manner that allows the end user to ascertain the sequence of intermediary links and entities that connect the matched entity to the seed entity.
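The abstract does not say how the traversal is computed. As a stand-in, the sketch below uses a breadth-first search that walks links outward from a seed entity and returns the full sequence of intermediary entities leading to the first match, which is the information a path depiction would need. The function name `find_path`, the adjacency-dictionary layout, and the `is_match` predicate are all hypothetical.

```python
from collections import deque


def find_path(links, seed, is_match):
    """Return the entity sequence from seed to the nearest match, or None.

    links    -- mapping of entity to the entities it links to
    seed     -- starting entity
    is_match -- predicate deciding whether an entity is an acceptable output
    """
    parents = {seed: None}  # entity -> entity it was reached from
    queue = deque([seed])
    while queue:
        entity = queue.popleft()
        if entity != seed and is_match(entity):
            # Walk the parent pointers back to the seed to rebuild the path.
            path = []
            while entity is not None:
                path.append(entity)
                entity = parents[entity]
            return path[::-1]
        for neighbor in links.get(entity, []):
            if neighbor not in parents:
                parents[neighbor] = entity
                queue.append(neighbor)
    return None


# Illustrative data: two people connected through a phone number and an account.
links = {
    "person:ann": ["phone:555"],
    "phone:555": ["account:9"],
    "account:9": ["person:bob"],
}
path = find_path(links, "person:ann", lambda e: e.startswith("person:"))
```

The returned list names every intermediary entity in order, so an end user could see how the matched entity connects to the seed.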
DETECTING DATA SKEW IN A JOIN OPERATION
Systems, methods, and devices for managing data skew during a join operation are disclosed. A method includes computing a hash value for a join operation and detecting data skew on a probe side of the join operation at a runtime of the join operation using a lightweight sketch data structure. The method includes identifying a frequent probe-side join key on the probe side of the join operation during a probe phase of the join operation. The method includes identifying a frequent build-side row having a build-side join key corresponding with the frequent probe-side join key. The method includes asynchronously distributing the frequent build-side row to one or more remote servers.
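The abstract does not name the sketch data structure. The example below uses a Misra-Gries summary as a stand-in to show how frequent probe-side keys might be found in bounded memory, and how the matching build-side rows could then be selected for distribution. The function name, the `min_count` threshold, and the row layout are hypothetical; hashing and the asynchronous distribution to remote servers are not shown.

```python
def misra_gries(keys, k):
    """Misra-Gries summary using at most k - 1 counters.

    Any key occurring more than len(keys) / k times is guaranteed to be
    present in the result, and each counter never exceeds the true count.
    """
    counters = {}
    for key in keys:
        if key in counters:
            counters[key] += 1
        elif len(counters) < k - 1:
            counters[key] = 1
        else:
            # No free counter: decrement all and drop those that reach zero.
            for other in list(counters):
                counters[other] -= 1
                if counters[other] == 0:
                    del counters[other]
    return counters


# Probe side: key 7 dominates the stream (illustrative data).
probe_keys = [7] * 60 + list(range(100, 140))
counters = misra_gries(probe_keys, k=4)

# Counters are lower bounds, so a key at or above the threshold really is frequent.
min_count = 25  # hypothetical skew threshold
hot_keys = {key for key, count in counters.items() if count >= min_count}

# Build side: pick the rows whose join key matches a frequent probe-side key.
build_rows = [{"key": 7, "name": "popular"}, {"key": 100, "name": "rare"}]
hot_build_rows = [row for row in build_rows if row["key"] in hot_keys]
```

Only the rows in `hot_build_rows` would need to be replicated, which keeps the cost of handling skew proportional to the number of hot keys rather than to the size of the build side.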
System and method for disjunctive joins
Joining data using a disjunctive operator is described. An example computer-implemented method can include receiving a query that includes a first disjunctive predicate involving a first table and a second table. The method may also include determining a first set of rows from the first table and generating a filter from the first set of rows. The method may further include applying the filter to the second table to generate a second set of rows. Additionally, the method may include joining the first set of rows and the second set of rows using a first disjunctive operator of the first disjunctive predicate to generate a first results set.
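The steps above can be sketched as follows, assuming a predicate of the form `left.email = right.email OR left.phone = right.phone`. The filter is modeled as one exact set of left-side values per disjunct, which is an assumption (the abstract does not say what kind of filter is built), and the function name `disjunctive_join` and the row layout are hypothetical.

```python
def disjunctive_join(left_rows, right_rows, key_pairs):
    """Join two row lists on an OR of equality conditions.

    key_pairs -- list of (left_column, right_column) pairs, one per disjunct
    """
    # Generate the filter from the first set of rows: one value set per disjunct.
    filters = [{row[left_col] for row in left_rows} for left_col, _ in key_pairs]

    # Apply the filter to the second table: keep rows that can match some disjunct.
    candidates = [
        row
        for row in right_rows
        if any(row[right_col] in values
               for (_, right_col), values in zip(key_pairs, filters))
    ]

    # Join the two row sets using the disjunctive operator.
    return [
        (left, right)
        for left in left_rows
        for right in candidates
        if any(left[lc] == right[rc] for lc, rc in key_pairs)
    ]


# Illustrative data only.
left = [
    {"id": 1, "email": "a@x", "phone": "111"},
    {"id": 2, "email": "b@x", "phone": "222"},
]
right = [
    {"rid": 10, "email": "a@x", "phone": "999"},
    {"rid": 11, "email": "zz", "phone": "222"},
    {"rid": 12, "email": "q", "phone": "000"},
]
pairs = [("email", "email"), ("phone", "phone")]
matches = [(l["id"], r["rid"]) for l, r in disjunctive_join(left, right, pairs)]
```

The pre-filter discards right-side row 12 before the join runs, so the pairwise comparison only touches rows that can satisfy at least one disjunct.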