Patent classifications
H03M7/3077
Efficient sorting techniques facilitating the creation and use of dataset summary metadata
The present disclosure provides techniques and solutions for sorting data. In a particular implementation, a sorting technique is provided that places values in a sorted order by adding an offset value to values that are not in a sorted order. The resulting set of values is not truly sorted, in that the set of modified values is sorted, but the underlying data itself is not sorted. In another implementation, a sorting technique can use multiple streams or sets. When an out-of-order element is encountered, it can be added to a new stream, if such a stream is available. The sorting techniques can be used for a variety of purposes, including providing sorted data for use in generating summary data, or providing sorted data to be used in determining an intersection between two datasets.
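The offset idea can be illustrated with a minimal sketch. The abstract does not say how the offset is chosen, so the rule used here (raise a running offset just enough to keep the modified sequence non-decreasing) and the name `offset_sort` are assumptions made for illustration only.

```python
def offset_sort(values):
    """Return (modified, offsets) where modified is non-decreasing.

    Assumption: a single running offset is increased whenever the next
    value would otherwise fall below the previous modified value.
    The original value can be recovered as modified[i] - offsets[i].
    """
    modified = []
    offsets = []
    offset = 0
    prev = None
    for v in values:
        candidate = v + offset
        if prev is not None and candidate < prev:
            # Out-of-order element: grow the offset so the modified
            # value is no smaller than its predecessor.
            offset += prev - candidate
            candidate = prev
        modified.append(candidate)
        offsets.append(offset)
        prev = candidate
    return modified, offsets


if __name__ == "__main__":
    mod, off = offset_sort([3, 1, 4, 2])
    print(mod, off)
```

The modified values are in sorted order while the underlying data is untouched, which matches the abstract's remark that the result is not truly sorted.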
Method for compression of time tagged data from time correlated single photon counting
A computer-implemented method for compression of Time Tagged data including Time Tagged data records includes the step of separating the Time Tagged data records into a plurality of groups. The method also includes sorting the Time Tagged data records in at least one of the groups by a photon arrival time. The method also includes subtracting the content of an adjacent record from the content of a record, resulting in modified records. The method also includes compressing the modified records with a compression method.
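The four steps (group, sort, subtract adjacent records, compress) might be sketched as follows. The record layout `(channel, arrival_time)`, grouping by channel, 64-bit packing, and the use of `zlib` are all assumptions; the abstract names none of them.

```python
import struct
import zlib
from collections import defaultdict


def compress_records(records):
    """records: iterable of (channel, arrival_time) with non-negative ints."""
    groups = defaultdict(list)
    for channel, arrival in records:        # step 1: separate into groups
        groups[channel].append(arrival)
    compressed = {}
    for channel, times in groups.items():
        times.sort()                        # step 2: sort by arrival time
        # step 3: replace each record by its difference to the previous one
        deltas = [times[0]] + [b - a for a, b in zip(times, times[1:])]
        raw = struct.pack(f"<{len(deltas)}Q", *deltas)
        compressed[channel] = zlib.compress(raw)   # step 4: compress
    return compressed


def decompress_records(compressed):
    """Invert compress_records; returns {channel: sorted arrival times}."""
    out = {}
    for channel, blob in compressed.items():
        raw = zlib.decompress(blob)
        deltas = struct.unpack(f"<{len(raw) // 8}Q", raw)
        times, total = [], 0
        for d in deltas:                    # prefix sum undoes the deltas
            total += d
            times.append(total)
        out[channel] = times
    return out
```

Sorting makes the differences small and non-negative, which tends to help a general-purpose compressor. In this sketch the original interleaving of records across groups is not preserved.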
Systems and methods for compressing genetic sequencing data and uses thereof
Embodiments of the invention are generally directed to compressing genetic sequencing data. In many embodiments, the genetic sequencing data is reordered and encoded based on sequence homology between individual sequencing reads within the genetic sequencing data. Several embodiments are directed to systems to compress genetic sequencing data, and some embodiments are directed to non-transitory, machine-readable media that direct a processor to compress genetic sequencing data. In further embodiments, the genetic sequencing data represents paired-end sequencing data, and several embodiments transmit the data to a remote device.
METHODS AND APPARATUSES FOR STORING GRAPH DATA OF A RELATIONSHIP NETWORK GRAPH
A computer-implemented method for graph data storage includes acquiring connection relationship information between any two nodes in a relationship network graph including a directed connecting edge between nodes. Based on the connection relationship information, a first mapping relationship between an identifier of each node and a node identifier of an outgoing edge-connected node of the node in a compressed sparse row format is stored. A second mapping relationship between the identifier of each node and a node identifier of an incoming edge-connected node of the node in a compressed sparse column format is stored. A set of attribute information in the relationship network graph is acquired, where the set of attribute information comprises several node attributes, several edge attributes, and/or several pieces of temporary information. Using column storage, each attribute value of a same attribute in the set of attribute information is stored in continuous space.
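As a rough illustration of the two mappings, the sketch below builds compressed sparse row (outgoing edges) and compressed sparse column (incoming edges) arrays from an edge list. Node identifiers are assumed to be integers `0..num_nodes-1`, and the function names are hypothetical.

```python
def build_csr(num_nodes, edges):
    """Return (indptr, indices) listing, per source node, its targets."""
    indptr = [0] * (num_nodes + 1)
    for src, _ in edges:                 # count outgoing edges per node
        indptr[src + 1] += 1
    for i in range(num_nodes):           # prefix sum gives row start offsets
        indptr[i + 1] += indptr[i]
    indices = [0] * len(edges)
    cursor = indptr[:-1]                 # slice copy: next free slot per node
    for src, dst in edges:
        indices[cursor[src]] = dst
        cursor[src] += 1
    return indptr, indices


def build_csc(num_nodes, edges):
    """CSC of the adjacency matrix equals CSR of the reversed edges."""
    return build_csr(num_nodes, [(dst, src) for src, dst in edges])


def neighbors(indptr, indices, node):
    """Neighbors of node are stored in one contiguous slice."""
    return indices[indptr[node]:indptr[node + 1]]
```

With `edges = [(0, 1), (0, 2), (1, 2), (2, 0)]`, `neighbors(*build_csr(3, edges), 0)` gives the outgoing neighbors of node 0 and `neighbors(*build_csc(3, edges), 2)` gives the incoming neighbors of node 2.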
LOSSY TENSOR COMPRESSION METHOD USING NEURAL NETWORK-BASED TENSOR-TRAIN DECOMPOSITION
Disclosed is a lossy tensor compression method using neural tensor-train decomposition (NTTD). A lossy tensor compression method performed by a computer system may include inputting, to a neural tensor-train decomposition (NTTD) model, mode indices of a target entry to be reconstructed, and obtaining tensor-train (TT) cores from the mode indices of the target entry to be reconstructed through the NTTD model.
Warm start file compression using sequence alignment
Compressing files is disclosed. An input file to be compressed is first aligned. Aligning the file includes splitting the file into sequences that can be aligned. The result is a compression matrix, where each row of the matrix corresponds to part of the file. The compression matrix may also serve as a warm start if additional compression is desired. Compression may be performed in stages, where an initial compression matrix is generated in a first stage using larger letter sizes for alignment and then a second compression stage is performed using smaller letter sizes. A consensus sequence is determined from the compression matrix. Using the consensus sequence, pointer pairs are generated. Each pointer pair identifies a subsequence of the consensus matrix. The compressed file includes the pointer pairs and the consensus sequence.
Efficient data storage by grouping similar data within a zone
A method of storing data is provided. The method includes receiving a plurality of data blocks provided to a hyperscaler system. The method also includes determining a corresponding property for each data block of the plurality of data blocks. The method further includes identifying a set of data blocks from the plurality of data blocks. Each data block of the set of data blocks is associated with a first property. The method further includes storing the set of data blocks in a first zone of a zoned storage system, based on the first property.
COMPRESSED GRAPH NOTATION
A method for compressing RDF tuples. The method includes obtaining RDF tuples, obtaining a dictionary of indices, encoding for each RDF tuple the indices attributed to the subject and the object, grouping RDF tuples sharing the same predicate and for each group sorting the RDF tuples by considering the encoding of the subject and the object, and for each group of sorted RDF tuples, serializing the index of the shared predicate, serializing the encoding of the subject and the object of a first RDF tuple, and for each RDF tuple of the group of sorted RDF tuples subsequent to the first RDF tuple of the group, computing a difference between the encoding of the subject and the object of a current RDF tuple and the encoding of the subject and the object of a previous RDF tuple, and serializing the computed difference in a form of a variable-length integer.
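The sequence of steps can be sketched in a few lines. Several details are assumptions not stated in the abstract: the subject and object indices are combined into one integer as `subject * base + object`, a per-group tuple count is written so a decoder can find group boundaries, and the variable-length integer is an unsigned LEB128-style varint.

```python
def encode_varint(n):
    """Unsigned varint: 7 payload bits per byte, high bit = continuation."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


def decode_varint(data, pos):
    result = shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def compress_triples(triples):
    """triples: list of (subject, predicate, object) strings."""
    terms = sorted({term for triple in triples for term in triple})
    index = {term: i for i, term in enumerate(terms)}   # dictionary of indices
    base = len(terms)
    groups = {}
    for s, p, o in triples:              # group by shared predicate
        groups.setdefault(index[p], []).append(index[s] * base + index[o])
    out = bytearray()
    for p_idx in sorted(groups):
        codes = sorted(groups[p_idx])    # sort by (subject, object) encoding
        out += encode_varint(p_idx)
        out += encode_varint(len(codes))  # assumed group-length marker
        out += encode_varint(codes[0])    # first tuple stored in full
        for prev, cur in zip(codes, codes[1:]):
            out += encode_varint(cur - prev)   # later tuples as differences
    return terms, bytes(out)


def decompress_triples(terms, data):
    base, pos, triples = len(terms), 0, []
    while pos < len(data):
        p_idx, pos = decode_varint(data, pos)
        count, pos = decode_varint(data, pos)
        code = 0
        for _ in range(count):
            delta, pos = decode_varint(data, pos)
            code += delta
            s_idx, o_idx = divmod(code, base)
            triples.append((terms[s_idx], terms[p_idx], terms[o_idx]))
    return triples
```

Because each group is sorted, every difference is non-negative and usually small, so most of them fit in a single varint byte.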
Conversion device, memory system, decompression device, and method
According to one embodiment, a conversion device includes a demultiplexer, first to Nth extractors and a deinterleave unit. The demultiplexer extracts first to Nth substreams from a first compressed stream. The first to Nth substreams are placed in order in the first compressed stream and include first variable-length codes to Nth variable-length codes into which first symbols to Nth symbols of a symbol string have been converted. The first to Nth extractors extract the first variable-length codes to the Nth variable-length codes from the first to Nth substreams. The deinterleave unit reorders the first variable-length codes to the Nth variable-length codes in accordance with the symbol string and outputs a second compressed stream.
Parallel entropy coding
Methods and apparatuses are described to encode data into a bitstream and to decode data from a bitstream. The method is able to perform parallel encoding and decoding efficiently and avoids padding of substreams, thus reducing the number of bits within the bitstream. Portions of input data channels are multiplexed and encoded into substreams. During the multiplexing, shuffling methods are applied in order to obtain substreams of more uniform lengths. The number of bits within the substream may be further reduced by including only the relevant significant bits within the trailing bits of the encoding process.