G06F16/1744

Multi-level compression for storing data in a data store

Data to be stored in a data block for a columnar database table may be compressed according to a multi-level compression scheme. Data to be stored in the data block may be received. The data may be compressed according to a column-specific compression technique to produce compressed data. The compressed data may then be compressed according to a second compression technique different from the column-specific compression technique to produce multi-level compressed data. The multi-level compressed data may be stored in the data block. When reading from the data block, the multi-level compressed data may be decompressed according to the column-specific compression technique and the second (default) compression technique applied to the data.
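As a rough illustration of the two-level scheme in this abstract, the Python sketch below applies a column-specific first pass and then a general-purpose second pass. Run-length encoding as the column-specific technique and zlib as the second technique are assumptions made for the example; the abstract names neither, and all function names are hypothetical.

```python
import struct
import zlib
from typing import List, Tuple


def rle_encode(values: List[int]) -> List[Tuple[int, int]]:
    """First level (assumed): run-length encode a column as (value, count) pairs."""
    runs: List[Tuple[int, int]] = []
    for v in values:
        if runs and runs[-1][0] == v:
            runs[-1] = (v, runs[-1][1] + 1)
        else:
            runs.append((v, 1))
    return runs


def rle_decode(runs: List[Tuple[int, int]]) -> List[int]:
    out: List[int] = []
    for v, n in runs:
        out.extend([v] * n)
    return out


def compress_block(values: List[int]) -> bytes:
    """Apply the column-specific pass, then the second (general-purpose) pass."""
    runs = rle_encode(values)
    # Serialize each run as a signed 32-bit value and an unsigned 32-bit count.
    raw = b"".join(struct.pack(">iI", v, n) for v, n in runs)
    return zlib.compress(raw)


def decompress_block(block: bytes) -> List[int]:
    """Undo the passes in reverse order: zlib first, then run-length decoding."""
    raw = zlib.decompress(block)
    runs = [struct.unpack_from(">iI", raw, off) for off in range(0, len(raw), 8)]
    return rle_decode(runs)
```

On read, the second-level compression has to be undone before the column-specific one, which is why `decompress_block` calls `zlib.decompress` first.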

DOUBLE-PASS LEMPEL-ZIV DATA COMPRESSION WITH AUTOMATIC SELECTION OF STATIC ENCODING TREES AND PREFIX DICTIONARIES

A method includes receiving an input data stream at a processor. For each byte sequence from a plurality of byte sequences of the input data stream, a hash is generated and compared to a hash table to determine whether a match exists. If a match exists, that byte sequence is incrementally expanded to include one or more additional adjacent bytes from the input data stream, to produce multiple expanded byte sequences. Each of the expanded byte sequences is compared to the hash table to identify a maximum-length matched byte sequence from a set that includes the byte sequence and the plurality of expanded byte sequences. A representation of the maximum-length matched byte sequence is stored in a memory. If a match does not exist, a representation of that byte sequence is stored as a byte sequence literal in the memory.
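The match-or-literal loop described in this abstract can be sketched in Python as follows. This is a simplified LZ77-style illustration, not the patented method: it uses a dict keyed on 4-byte sequences as the hash table, and it extends a match by comparing bytes directly rather than by re-querying the hash table for each expanded sequence. The token layout and the `MIN_MATCH` value are assumptions.

```python
from typing import Dict, List

MIN_MATCH = 4  # assumed length of the byte sequence that gets hashed


def lz_compress(data: bytes) -> List[tuple]:
    """Return a list of ("lit", byte) and ("match", distance, length) tokens."""
    table: Dict[bytes, int] = {}  # byte sequence -> most recent start position
    tokens: List[tuple] = []
    i, n = 0, len(data)
    while i < n:
        key = data[i:i + MIN_MATCH]
        candidate = None
        if len(key) == MIN_MATCH:
            candidate = table.get(key)
            table[key] = i
        if candidate is not None:
            # Expand the match one adjacent byte at a time to its maximum length.
            length = MIN_MATCH
            while i + length < n and data[candidate + length] == data[i + length]:
                length += 1
            tokens.append(("match", i - candidate, length))
            i += length
        else:
            tokens.append(("lit", data[i]))
            i += 1
    return tokens


def lz_decompress(tokens: List[tuple]) -> bytes:
    out = bytearray()
    for token in tokens:
        if token[0] == "lit":
            out.append(token[1])
        else:
            _, distance, length = token
            # Copy byte by byte so overlapping matches decode correctly.
            for _ in range(length):
                out.append(out[-distance])
    return bytes(out)
```

The second pass, static encoding trees, and prefix dictionaries named in the title are outside the scope of this sketch.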

FILE JOURNAL INTERFACE FOR SYNCHRONIZING CONTENT
20230101958 · 2023-03-30

In some embodiments, a system for synchronizing content with client devices receives a request from a client device to synchronize operations pertaining to content items associated with a user account registered at the system. The request can include the operations and a cursor identifying a current position of the client in a journal of revisions on the system. Based on the operations, the system generates linearized operations associated with the content items. The linearized operations can include a respective operation derived for each of the content items from one or more of the operations. The system converts each respective operation in the linearized operations to a respective revision for the journal of revisions and, based on the cursor, determines whether the respective revision conflicts with revisions in the journal. When the respective revision does not conflict with revisions in the journal, the system adds the respective revision to the journal.
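A minimal Python sketch of the cursor-based conflict check in this abstract is shown below. The `Journal` class, the tuple shape of an operation, and the two rules used (the last operation per item wins during linearization; a revision conflicts if the same item was revised after the client's cursor) are assumptions for illustration, not details taken from the abstract.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Operation = Tuple[str, str]  # (content item id, action), a hypothetical shape


@dataclass
class Journal:
    revisions: List[Operation] = field(default_factory=list)

    def sync(self, operations: List[Operation], cursor: int):
        """Linearize the client's operations and append non-conflicting revisions.

        Returns (accepted, rejected, new_cursor).
        """
        # Linearize: derive one operation per content item (assumed: last one wins).
        linearized: Dict[str, str] = {}
        for item, action in operations:
            linearized[item] = action

        # Items revised in the journal after the client's cursor position.
        changed_since_cursor = {item for item, _ in self.revisions[cursor:]}

        accepted: List[Operation] = []
        rejected: List[Operation] = []
        for item, action in linearized.items():
            if item in changed_since_cursor:
                rejected.append((item, action))
            else:
                self.revisions.append((item, action))
                accepted.append((item, action))
        return accepted, rejected, len(self.revisions)
```

A client whose cursor is behind the journal has its operation on an already-revised item rejected, while a client at the head of the journal has the same operation accepted.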

METHOD AND APPARATUS FOR STORING AND QUERYING TIME SERIES DATA, AND SERVER AND STORAGE MEDIUM THEREOF

Disclosed are a method and apparatus for storing and querying time series data. The method includes: determining a data type of data to be stored; compressing the data to be stored by a data compression method corresponding to the data type; storing compressed data to a data storage table corresponding to the data type; receiving a query request including a query data type and a query time condition; and querying target data that meets the query time condition from a data storage table corresponding to the query data type. In the embodiments of the present disclosure, different compression methods are adopted for different types of data, which improves the compression efficiency of time series data and saves storage resources. Moreover, when performing a data query, time series data that meets a query time condition is searched for in a data storage table corresponding to a query data type, which improves the query efficiency for different types of time series data.
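The store-and-query flow in this abstract might look like the Python sketch below. The choice of codecs (delta encoding plus zlib for integers, packed doubles plus zlib for floats, JSON plus zlib for strings), the in-memory tables, and every name are assumptions; the abstract does not specify them. Each written batch is assumed to be non-empty, of a single type, and sorted by timestamp.

```python
import json
import struct
import zlib
from collections import defaultdict
from typing import List, Tuple


def _compress_ints(vals: List[int]) -> bytes:
    # Delta-encode, then pack as signed 64-bit integers and deflate.
    deltas = [vals[0]] + [b - a for a, b in zip(vals, vals[1:])]
    return zlib.compress(struct.pack(f">{len(deltas)}q", *deltas))


def _decompress_ints(blob: bytes) -> List[int]:
    raw = zlib.decompress(blob)
    out, acc = [], 0
    for d in struct.unpack(f">{len(raw) // 8}q", raw):
        acc += d
        out.append(acc)
    return out


def _compress_floats(vals: List[float]) -> bytes:
    return zlib.compress(struct.pack(f">{len(vals)}d", *vals))


def _decompress_floats(blob: bytes) -> List[float]:
    raw = zlib.decompress(blob)
    return list(struct.unpack(f">{len(raw) // 8}d", raw))


def _compress_strs(vals: List[str]) -> bytes:
    return zlib.compress(json.dumps(vals).encode("utf-8"))


def _decompress_strs(blob: bytes) -> List[str]:
    return json.loads(zlib.decompress(blob).decode("utf-8"))


CODECS = {
    "int": (_compress_ints, _decompress_ints),
    "float": (_compress_floats, _decompress_floats),
    "str": (_compress_strs, _decompress_strs),
}


class TimeSeriesStore:
    def __init__(self) -> None:
        self.tables = defaultdict(list)  # one storage table per data type

    def write(self, points: List[Tuple[int, object]]) -> None:
        kind = type(points[0][1]).__name__  # determine the data type
        compress = CODECS[kind][0]
        timestamps = [t for t, _v in points]
        values = [v for _t, v in points]
        self.tables[kind].append((timestamps, compress(values)))

    def query(self, kind: str, start: int, end: int) -> List[Tuple[int, object]]:
        decompress = CODECS[kind][1]
        result = []
        for timestamps, blob in self.tables[kind]:
            if timestamps[-1] < start or timestamps[0] > end:
                continue  # batch lies entirely outside the time condition
            values = decompress(blob)
            result.extend((t, v) for t, v in zip(timestamps, values) if start <= t <= end)
        return result
```

Because each type has its own table, a query only touches batches of the requested type and skips batches whose time range cannot match.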

Classification of data files
11574059 · 2023-02-07

A method including determining a combined data set including query data files that are to be classified, clean data files that are known to be free of malware, and malicious data files that are known to include malware; calculating respective compression functions for each of the query data files, each of the clean data files, and each of the malicious data files; individually comparing each respective compression function with each other respective compression function to determine degrees of similarity between contents included in the data files; determining a plurality of clusters based on the degrees of similarity between contents included in the data files; and classifying each query data file as a file that is likely free of malware or as a file that likely includes malware based on analyzing the combination of the query data files, the clean data files, and the malicious data files in each cluster.
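One well-known way to compare files through compression is the normalized compression distance (NCD). The Python sketch below uses NCD with zlib as a stand-in for the "compression functions" in this abstract, and it replaces the clustering step with a simpler nearest-labeled-neighbor rule. Both substitutions are assumptions for illustration, and the function names are hypothetical.

```python
import zlib
from typing import List


def compressed_size(data: bytes) -> int:
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance: small when x and y share content."""
    cx, cy = compressed_size(x), compressed_size(y)
    return (compressed_size(x + y) - min(cx, cy)) / max(cx, cy)


def classify(query: bytes, clean: List[bytes], malicious: List[bytes]) -> str:
    """Label the query after its most similar labeled file (assumed rule)."""
    scored = [(ncd(query, f), "likely clean") for f in clean]
    scored += [(ncd(query, f), "likely malicious") for f in malicious]
    return min(scored)[1]
```

A fuller version would compute the pairwise distances over the combined data set, cluster them, and then label each query file from the mix of clean and malicious files in its cluster, as the abstract describes.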

COMPUTER-READABLE RECORDING MEDIUM STORING INFORMATION PROCESSING PROGRAM, INFORMATION PROCESSING METHOD, AND INFORMATION PROCESSING APPARATUS
20230033921 · 2023-02-02

A recording medium stores an information processing program for managing a plurality of storage devices and a plurality of servers. The program causes a computer to execute a process including: while changing a compression ratio setting, obtaining, by using some of the data pieces to be used by the plurality of servers, an actual compression ratio and a decompression rate at which the servers decompress a compressed dataset in which those data pieces are compressed; and determining the compression ratio setting to be used based on a maximum total bandwidth of the plurality of storage devices and a number of the plurality of servers by using the obtained actual compression ratio and the decompression rate for each of the compression ratio settings.

Data compression and decompression facilitated by machine learning
11615057 · 2023-03-28

Disclosed herein are embodiments for compressing data. A first encoding, a decoding, and an error prediction index are received from one or more artificial neural networks. The first encoding corresponds to a lossy compression of the data. The decoding corresponds to a decompression of the first encoding. The error prediction index indicates one or more locations of predicted error in the decoding. Based on the data and the error prediction index, a first set of bits is generated to include one or more bit values of the data at the one or more locations of predicted error. Based on the error prediction index and the decoding, a second set of bits is generated to indicate one or more locations of unpredicted error in the decoding. The first encoding, the first set of bits, and the second set of bits are stored as a losslessly compressed version of the data.

Compression of array of strings with similarities
11615056 · 2023-03-28

A method of compressing a string array comprising strings with similarity includes selecting a string compression method from among a plurality of available compression methods based on at least which of the available compression methods yields the shortest compressed string. The string is then compressed using the selected string compression method. The array of strings to be compressed comprises text characters represented by a first range of values within a word, and the compressed string comprises one or more words in a second range of values dedicated to compression and not overlapping with the first range of values. This process is repeated for additional strings in the string array, such that the compression method used for each of a plurality of strings is independently selected.
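The per-string selection described in this abstract can be sketched in Python as below. The three candidate methods (raw, shared prefix with the previous string, shared suffix with the previous string) and the byte ranges are assumptions: ASCII text occupies 0x00 to 0x7F, while 0x80 to 0xBF and 0xC0 to 0xFF are reserved here as compression markers, so the two ranges never overlap.

```python
from typing import List


def _common_prefix_len(a: bytes, b: bytes) -> int:
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def compress_array(strings: List[str]) -> List[bytes]:
    out: List[bytes] = []
    prev = b""
    for s in strings:
        cur = s.encode("ascii")  # text stays in the first range (0x00-0x7F)
        candidates = [cur]  # method 1: store the string as-is
        p = min(_common_prefix_len(prev, cur), 63)
        if p:
            # Method 2: marker 0x80+p means "copy p leading bytes of the previous string".
            candidates.append(bytes([0x80 + p]) + cur[p:])
        q = min(_common_prefix_len(prev[::-1], cur[::-1]), 63)
        if q:
            # Method 3: marker 0xC0+q means "copy q trailing bytes of the previous string".
            candidates.append(cur[:len(cur) - q] + bytes([0xC0 + q]))
        out.append(min(candidates, key=len))  # independently pick the shortest
        prev = cur
    return out


def decompress_array(blobs: List[bytes]) -> List[str]:
    out: List[str] = []
    prev = b""
    for blob in blobs:
        if blob and 0x80 <= blob[0] < 0xC0:
            cur = prev[:blob[0] - 0x80] + blob[1:]
        elif blob and blob[-1] >= 0xC0:
            k = blob[-1] - 0xC0
            cur = blob[:-1] + prev[len(prev) - k:]
        else:
            cur = blob
        out.append(cur.decode("ascii"))
        prev = cur
    return out
```

Because the marker bytes can never appear in ASCII text, the decoder can tell which method was used for each string without any extra per-string header.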

COMPRESSION TECHNIQUES FOR VERTICES OF GRAPHIC MODELS
20230090310 · 2023-03-23

Methods for lossy and lossless pre-processing of image data are disclosed. In one embodiment, a method for lossy pre-processing of image data may include, at a computing device: receiving the image data, where the image data includes a model having a mesh, the mesh includes vertices defining a surface, the vertices include attribute vectors, and the attribute vectors include values. The method also includes quantizing the values of the attribute vectors to produce modified values, where a precision of the modified values is determined based on a largest power determined using a largest exponent of the values, and encoding pairs of the modified values into two corresponding units of information. The method also includes, for each pair of the pairs of the modified values, serially storing the two corresponding units of information as a data stream into a buffer, and compressing the data stream in the buffer.

DIVIDING AN ASTC TEXTURE TO A SET OF SUB-IMAGES
20220343544 · 2022-10-27

A method includes encoding a digital image file into an adaptive scalable texture compression (ASTC) file comprising a single file. The method also includes dividing, logically, the ASTC file into a sub-image comprising a sub-portion of the ASTC file. The method also includes copying the sub-image to a computer memory. The method also includes associating an ASTC header with the sub-image in the computer memory. The method also includes storing a combination of the ASTC header and the sub-image in the computer memory as a new ASTC file.
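The divide, copy, and re-header steps in this abstract can be sketched in Python as below, starting from an already encoded 2D `.astc` file. The sketch relies on the standard `.astc` container layout: a 16-byte header (4-byte magic, three block-dimension bytes, three 24-bit little-endian image dimensions) followed by 16-byte blocks in row-major order. Function names are hypothetical, and the sub-image is assumed to be cut on block boundaries, with its reported size rounded to whole blocks.

```python
MAGIC = bytes([0x13, 0xAB, 0xA1, 0x5C])
HEADER_SIZE = 16
BLOCK_BYTES = 16  # every ASTC block is 128 bits


def make_header(block_x: int, block_y: int, width: int, height: int) -> bytes:
    """Build a 2D ASTC header (block depth 1, image depth 1)."""
    return (
        MAGIC
        + bytes([block_x, block_y, 1])
        + width.to_bytes(3, "little")
        + height.to_bytes(3, "little")
        + (1).to_bytes(3, "little")
    )


def sub_image(astc: bytes, col0: int, row0: int, cols: int, rows: int) -> bytes:
    """Copy a rectangle of blocks and wrap it as a new, standalone ASTC file.

    col0/row0/cols/rows are measured in blocks, not pixels.
    """
    if astc[:4] != MAGIC:
        raise ValueError("not an ASTC file")
    block_x, block_y = astc[4], astc[5]
    width = int.from_bytes(astc[7:10], "little")
    height = int.from_bytes(astc[10:13], "little")
    blocks_per_row = -(-width // block_x)  # ceiling division
    block_rows = -(-height // block_y)
    if col0 + cols > blocks_per_row or row0 + rows > block_rows:
        raise ValueError("sub-image exceeds the source image")

    payload = bytearray()
    for r in range(row0, row0 + rows):
        start = HEADER_SIZE + (r * blocks_per_row + col0) * BLOCK_BYTES
        payload += astc[start:start + cols * BLOCK_BYTES]

    # Associate a fresh header with the copied blocks.
    return make_header(block_x, block_y, cols * block_x, rows * block_y) + bytes(payload)
```

No decoding or re-encoding is involved, since each block is independently decodable; the operation is a byte copy plus a new header.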