Patent classifications
G06F16/86
FEATURE SETS USING SEMI-STRUCTURED DATA STORAGE
The subject technology receives, by a database system, raw input data from a source table provided by an external environment. The source table comprises multiple rows and multiple columns, and the raw input data comprises values in a first format. The values comprise input features corresponding to datasets included in the raw input data for machine learning models. The external environment comprises a system that is external to the database system and is accessed by different users. The subject technology generates cell data for a second table based on the values from the source table. The subject technology performs a database operation to generate the second table including table metadata, column metadata, and the generated cell data.
Extensible device object model
Systems and/or methods are provided relating to an extensible framework. The extensible framework provides constructs with which device developers can model devices within the framework to enable a host application utilizing the framework to interact with the devices. New devices can be supported by the framework without disrupting existing devices or the host application.
EXTRACTING INFORMATION FROM TABLES EMBEDDED WITHIN DOCUMENTS
Much valuable information in documents is presented within tables. However, the information within tables is hard to extract automatically with high accuracy due to the wide variety and low quality of typical tables found in electronic documents. Information extraction technology can provide a method of extracting information from heterogeneous tables by recognizing tables, the header cells, and cells that are merged or should be merged, creating a richer representation of table structure and providing a convenient way of linking cells to their row and column headers. Use of this richer representation allows a few extraction patterns to successfully pull out information from a wide variety of differently formatted tables.
Document relational mapping
Described is technology to translate between tree-structured documents and electronic storage such as a relational data store. A document may be composed from the data store or decomposed to a data store using a document mapping command. The document mapping command includes follow commands that associate the columns in one table with columns in another table and resolve these associations during composition or decomposition. These follow commands allow for the retrieval of data from the data store and for inserting and/or modifying the data store by way of applying deltas to the data store.
Methods and apparatus for generating causality matrix and impacts using graph processing
Methods and apparatus are provided for generating a causality matrix using a vertex-centric processing framework. The causality matrix is used by a codebook correlation engine to determine a set of problems that explains active symptoms in a system. Methods and apparatus are also provided for calculating impacts of problems using the vertex-centric processing framework.
AUTOMATIC GENERATION OF STRUCTURED DATA FROM SEMI-STRUCTURED DATA
A method and system for generating structured data from semi-structured data are provided. The method includes reading a plurality of records from a data file including semi-structured data. Further, the method includes obtaining aligned delimiters in a list for every record that has been read. The method also includes selecting a most occurring delimiter from the list. The method then includes constructing a regular expression using the selected delimiter to split the records into different fields. The method also includes reconstructing the records for the regular expression to fit and split into fields. In addition, the method includes displaying the records split into the fields.
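The delimiter-selection and splitting steps described above can be sketched roughly as follows. This is a minimal illustration, not the patented method: the candidate delimiter list, the per-record voting rule, and the function names `most_common_delimiter` and `split_records` are assumptions, and the alignment and record-reconstruction steps are omitted.

```python
import re
from collections import Counter

# Assumption: the abstract does not list candidate delimiters; these are illustrative.
CANDIDATES = [",", ";", "|", "\t", ":"]

def most_common_delimiter(records):
    """Let each record vote for its most frequent candidate delimiter."""
    votes = Counter()
    for record in records:
        counts = {d: record.count(d) for d in CANDIDATES if d in record}
        if counts:
            votes[max(counts, key=counts.get)] += 1
    if not votes:
        raise ValueError("no candidate delimiter found")
    return votes.most_common(1)[0][0]

def split_records(records):
    """Build a regular expression from the selected delimiter and split each record."""
    delimiter = most_common_delimiter(records)
    # Optional whitespace around the delimiter is absorbed by the pattern.
    pattern = re.compile(r"\s*" + re.escape(delimiter) + r"\s*")
    return [pattern.split(record.strip()) for record in records]

records = ["alice | 30 | NYC", "bob | 25 | LA", "carol | 41 | SF"]
rows = split_records(records)
for row in rows:
    print(row)
```

Escaping the delimiter with `re.escape` matters here because characters such as `|` have a special meaning in regular expressions.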
Anonymous Identity In Identity Oriented Networks and Protocols
A method of using ephemeral identifiers (IDs) in a network, implemented by a network element (NE), comprises obtaining an ephemeral ID for at least one user equipment (UE) accessible by the NE, wherein the ephemeral ID is a temporary and recyclable ID associated with the UE; transmitting, to a mapping server, a request to map the ephemeral ID of the UE to a locator of the NE; and establishing a communication session between the UE and a network site using the ephemeral ID.
READ-AFTER-WRITE CONSISTENCY FOR DERIVED NON-RELATIONAL DATA
A system is provided that ensures read-after-write consistency. During operation, the system receives, from a user, a write to a record having a primary key in a master key-value store, wherein the write specifies a secondary key for the record. The system then caches the secondary key and the primary key in a cache entry in a cache, wherein the cache entry is associated with the user. Next, the system applies the write to the master key-value store. Prior to propagation of the write from the master key-value store to a derived key-value store that maps secondary keys to primary keys, the system receives from a given user a query for the record, the query comprising the secondary key and not the primary key. Next, when the given user is the user who issued the write, the system translates the secondary key to the primary key by querying the cache.
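The flow in this abstract can be illustrated with a small in-memory sketch. Everything here is hypothetical: the class `ReadAfterWriteStore`, its method names, and the use of plain dictionaries in place of real key-value stores are assumptions made only to show the ordering of cache update, master write, and delayed propagation.

```python
class ReadAfterWriteStore:
    """Toy model: a master store, a lazily updated derived index, and a per-user cache."""

    def __init__(self):
        self.master = {}      # primary key -> record
        self.derived = {}     # secondary key -> primary key, updated only on propagate()
        self.pending = []     # key pairs not yet propagated to the derived store
        self.user_cache = {}  # user -> {secondary key: primary key}

    def write(self, user, primary_key, secondary_key, record):
        # Cache the key pair for the writing user, then apply the write to the master.
        self.user_cache.setdefault(user, {})[secondary_key] = primary_key
        self.master[primary_key] = record
        self.pending.append((secondary_key, primary_key))

    def propagate(self):
        # Simulates the asynchronous update of the derived store.
        for secondary_key, primary_key in self.pending:
            self.derived[secondary_key] = primary_key
        self.pending.clear()

    def read_by_secondary(self, user, secondary_key):
        # The writer's own cache is consulted first; other users see only the derived store.
        primary_key = self.user_cache.get(user, {}).get(secondary_key)
        if primary_key is None:
            primary_key = self.derived.get(secondary_key)
        if primary_key is None:
            return None
        return self.master.get(primary_key)

store = ReadAfterWriteStore()
store.write("alice", 1, "alice@example.com", {"name": "Alice"})
own_read = store.read_by_secondary("alice", "alice@example.com")
other_read_before = store.read_by_secondary("bob", "alice@example.com")
store.propagate()
other_read_after = store.read_by_secondary("bob", "alice@example.com")
```

In this sketch the writer sees the record immediately, while another user sees it only after `propagate()` runs, which mirrors the per-user scope of the cache entry.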
Virtualizing schema relations over a single database relation
According to one embodiment of the present invention, a system maps one or more virtual relations to a table of a relational database management system. The system generates a structured query language (SQL) statement for the table from a SQL statement for a virtual relation by applying the mapping to one or more elements of the SQL statement for the virtual relation. Embodiments of the present invention further include a method and computer program product for mapping virtual relations to a table in substantially the same manner described above.
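One way to picture mapping several virtual relations onto a single physical table is sketched below. It is a simplification under stated assumptions: the shared table `entity`, its generic columns `c1` and `c2`, the `kind` discriminator column, and the helper `rewrite_select` are all hypothetical, and the helper takes a relation name and column list rather than parsing a full SQL statement.

```python
import sqlite3

# Hypothetical mapping: each virtual relation is stored in the shared table "entity".
MAPPING = {
    "person": {"table": "entity", "kind": "person",
               "columns": {"name": "c1", "city": "c2"}},
    "product": {"table": "entity", "kind": "product",
                "columns": {"title": "c1", "price": "c2"}},
}

def rewrite_select(relation, columns):
    """Translate a SELECT over a virtual relation into SQL over the physical table."""
    m = MAPPING[relation]
    select_list = ", ".join(f'{m["columns"][c]} AS {c}' for c in columns)
    sql = f'SELECT {select_list} FROM {m["table"]} WHERE kind = ?'
    return sql, (m["kind"],)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE entity (kind TEXT, c1 TEXT, c2 TEXT)")
conn.executemany(
    "INSERT INTO entity VALUES (?, ?, ?)",
    [("person", "Ada", "London"), ("product", "Lamp", "19.99")],
)

sql, params = rewrite_select("person", ["name", "city"])
rows = conn.execute(sql, params).fetchall()
print(sql)
print(rows)
```

The discriminator value is passed as a bound parameter rather than interpolated into the SQL text, so only names taken from the fixed mapping end up in the generated statement.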
INTERACTIVE IDENTIFICATION OF SIMILAR SQL QUERIES
Systems and methods for very fast grouping of “similar” SQL queries according to user-supplied similarity criteria are disclosed. The user-supplied similarity criteria include a threshold quantifying the degree of similarity between SQL queries and common artifacts included in the queries. A similarity-characterizing data structure is disclosed that allows for the very fast grouping of “similar” SQL queries. Because the computation is distributed among multiple compute nodes, a small cluster of compute nodes takes a short time to compute the similarity-characterizing data on a workload of tens of millions of queries. The user can supply the similarity criteria through a UI or a command line tool. Furthermore, in some embodiments, the user can adjust the degree of similarity by supplying new similarity criteria. Accordingly, the system can display, in real time or near real time, updated SQL groupings corresponding to the newly supplied similarity criteria using the originally computed similarity-characterizing data structure.
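The idea of computing similarity data once and regrouping under different thresholds can be sketched as follows. This is an illustration under assumptions, not the disclosed data structure: Jaccard similarity over identifier sets is a stand-in measure, the keyword list is abbreviated, and the greedy single-pass grouping in `group_queries` is a hypothetical choice.

```python
import re

# Assumption: a short, illustrative keyword list; real SQL has many more keywords.
KEYWORDS = {"select", "from", "where", "and", "or", "join", "on", "group", "by", "order"}

def artifacts(query):
    """Extract the identifiers (tables, columns) of a query as a set; literals are ignored."""
    tokens = re.findall(r"[A-Za-z_][A-Za-z0-9_]*", query.lower())
    return frozenset(t for t in tokens if t not in KEYWORDS)

def jaccard(a, b):
    union = a | b
    return len(a & b) / len(union) if union else 1.0

def group_queries(artifact_sets, threshold):
    """Greedily assign each query to the first group whose representative is similar enough."""
    groups = []  # each group is a list of query indexes; the first index is the representative
    for i, current in enumerate(artifact_sets):
        for group in groups:
            if jaccard(artifact_sets[group[0]], current) >= threshold:
                group.append(i)
                break
        else:
            groups.append([i])
    return groups

queries = [
    "SELECT id, name FROM users WHERE id = 1",
    "SELECT id, name FROM users WHERE id = 2",
    "SELECT total FROM orders WHERE user_id = 7",
]
precomputed = [artifacts(q) for q in queries]  # computed once, reused for every threshold
strict = group_queries(precomputed, 0.9)
loose = group_queries(precomputed, 0.0)
```

Only `group_queries` is rerun when the threshold changes; the precomputed artifact sets are reused, which is the property that makes interactive regrouping cheap in this sketch.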