H03M7/707

METHOD AND APPARATUS FOR COMPRESSING METADATA IN A FILE SYSTEM
20180101542 · 2018-04-12

Embodiments of the present disclosure relate to a method and an apparatus for compressing metadata in a file system. The method comprises, in response to receiving a first request for writing first data to a file, determining whether the first request is for an initial write to a storage area associated with a second indirect block in a first group of indirect blocks, the first group of indirect blocks including at least a first indirect block and the second indirect block. The method further comprises, in response to the initial write, allocating on a storage device a first group of data blocks for writing the first data. In addition, the method comprises compressing the first group of indirect blocks by encoding a first group of storage addresses corresponding to the first group of data blocks into the first indirect block.
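The abstract does not say how the storage addresses are encoded into the first indirect block. The following is a minimal sketch, assuming the allocator hands out the first group of data blocks contiguously, so the whole group can be stored as a single (start, count) extent instead of one address per block. `IndirectBlock`, `allocate_contiguous`, `compress_indirect_group`, `handle_write`, and `resolve` are hypothetical names introduced for illustration only.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class IndirectBlock:
    # Hypothetical indirect block: raw per-block addresses, or one compact
    # (start, count) extent once the group has been compressed.
    addresses: List[int] = field(default_factory=list)
    extent: Optional[Tuple[int, int]] = None


def allocate_contiguous(next_free: int, count: int) -> List[int]:
    # Assumption: the allocator returns a contiguous run of block addresses.
    return list(range(next_free, next_free + count))


def compress_indirect_group(group: List[IndirectBlock], addresses: List[int]) -> bool:
    """Encode contiguous addresses into group[0] as one extent.

    If the addresses are not contiguous, they are kept uncompressed in
    group[1] (the second indirect block) and False is returned.
    """
    contiguous = all(b - a == 1 for a, b in zip(addresses, addresses[1:]))
    if addresses and contiguous:
        group[0].extent = (addresses[0], len(addresses))
        group[1].addresses = []  # second block needs no per-block entries
        return True
    group[1].addresses = list(addresses)
    return False


def handle_write(group: List[IndirectBlock], next_free: int, num_blocks: int) -> bool:
    # Only an initial write to the area covered by group[1] allocates and compresses.
    initial = not group[1].addresses and group[0].extent is None
    if not initial:
        return False
    return compress_indirect_group(group, allocate_contiguous(next_free, num_blocks))


def resolve(group: List[IndirectBlock], index: int) -> int:
    # Recover the address of the index-th data block from either representation.
    if group[0].extent is not None:
        start, count = group[0].extent
        if not 0 <= index < count:
            raise IndexError(index)
        return start + index
    return group[1].addresses[index]
```

For example, an initial write of eight blocks starting at address 1000 leaves `group[0].extent == (1000, 8)`, and `resolve(group, 3)` returns 1003; a second write to the same area is not an initial write, so nothing is reallocated.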

Inline decompression

Techniques and apparatuses to decompress data that has been stack compressed are described. Stack compression refers to compression of data in one or more dimensions. Stack compression can be effective for uncompressed data blocks that are very sparse, i.e., data blocks that contain many zeros. In stack compression, an uncompressed data block is compressed into a compressed data block by removing one or more zero words from it. Map metadata recording where the zero words were in the uncompressed data block is generated during compression. Using the map metadata, the compressed data block can be decompressed to restore the uncompressed data block.
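A minimal one-dimensional sketch of the idea, assuming the map metadata is a per-word bitmap in which bit i is set when word i of the uncompressed block is non-zero. The bitmap layout and the function names `stack_compress` and `stack_decompress` are assumptions, not the format the abstract defines.

```python
from typing import List, Tuple


def stack_compress(block: List[int]) -> Tuple[List[int], int]:
    """Drop zero words; return (non-zero words, bitmap of non-zero positions)."""
    words: List[int] = []
    bitmap = 0
    for i, w in enumerate(block):
        if w != 0:
            words.append(w)
            bitmap |= 1 << i  # bit i set means word i was non-zero
    return words, bitmap


def stack_decompress(words: List[int], bitmap: int, block_len: int) -> List[int]:
    """Re-insert a zero word wherever the bitmap bit is clear."""
    out: List[int] = []
    it = iter(words)
    for i in range(block_len):
        out.append(next(it) if (bitmap >> i) & 1 else 0)
    return out
```

For the block `[0, 7, 0, 0, 3, 0, 0, 9]`, compression keeps only `[7, 3, 9]` plus the bitmap 146 (bits 1, 4 and 7), and decompression with a block length of 8 restores the original block.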

WARM START FILE COMPRESSION USING SEQUENCE ALIGNMENT
20240378179 · 2024-11-14

Compressing files is disclosed. An input file to be compressed is first aligned. Aligning the file includes splitting the file into sequences that can be aligned. The result is a compression matrix, where each row of the matrix corresponds to part of the file. The compression matrix may also serve as a warm start if additional compression is desired. Compression may be performed in stages: an initial compression matrix is generated in a first stage using larger letter sizes for alignment, and a second compression stage is then performed using smaller letter sizes. A consensus sequence is determined from the compression matrix. Using the consensus sequence, pointer pairs are generated. Each pointer pair identifies a subsequence of the consensus sequence. The compressed file includes the pointer pairs and the consensus sequence.
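The last three steps (consensus, pointer pairs, reconstruction) can be sketched as follows. This assumes the rows are already aligned to equal length with `-` as the gap symbol, derives the consensus by column-wise majority vote, and covers each row greedily with the longest matching substrings of the consensus; the alignment stage, the letter sizes, and the exact matching strategy are not shown and are assumptions. Function names are hypothetical.

```python
from collections import Counter
from typing import List, Tuple


def consensus_from_matrix(rows: List[str], gap: str = "-") -> str:
    # Column-wise majority vote over aligned rows, ignoring gap symbols.
    out = []
    for col in zip(*rows):
        counts = Counter(c for c in col if c != gap)
        if counts:
            out.append(counts.most_common(1)[0][0])
    return "".join(out)


def encode_with_consensus(row: str, consensus: str) -> List[Tuple[int, int]]:
    """Greedily cover `row` with pointer pairs (start, end) into consensus[start:end]."""
    pairs = []
    pos = 0
    while pos < len(row):
        best_start, best_len = -1, 0
        for start in range(len(consensus)):
            length = 0
            while (pos + length < len(row) and start + length < len(consensus)
                   and consensus[start + length] == row[pos + length]):
                length += 1
            if length > best_len:
                best_start, best_len = start, length
        if best_len == 0:
            raise ValueError(f"symbol {row[pos]!r} does not occur in the consensus")
        pairs.append((best_start, best_start + best_len))
        pos += best_len
    return pairs


def decode_with_consensus(pairs: List[Tuple[int, int]], consensus: str) -> str:
    # Reconstruct the row by concatenating the referenced consensus slices.
    return "".join(consensus[s:e] for s, e in pairs)
```

With the consensus `"ACGTACGGT"`, the row `"ACGGTACG"` becomes the two pointer pairs `(4, 9)` and `(0, 3)`, which decode back to the original row.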

Information processing apparatus, information processing method and program
09922040 · 2018-03-20

The present invention aims to automatically determine an encoding parameter in consideration of compression efficiency and memory usage, and to perform encoding based on the determined parameter. To do so, an information processing method of an information processing apparatus comprises: estimating the memory usage required to hold a correspondence table between parts of structured data and codes; and estimating the compression effect obtained when the structured data is encoded using the held correspondence table, wherein the data size of the correspondence table varies with the value of a parameter. The method further comprises determining, under a memory-usage condition, the value of the parameter on the basis of the estimated memory usage and the estimated compression effect, and encoding the structured data on the basis of the determined value of the parameter.
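A minimal sketch of the parameter selection, assuming the parameter is the number of most frequent strings held in the correspondence table, that each table entry costs its string length plus a fixed overhead, and that each encoded occurrence saves its length minus a fixed code size. These cost models and all function names are hypothetical; the abstract leaves the estimators unspecified.

```python
from collections import Counter
from typing import Callable, Iterable, List, Optional, Tuple


def make_estimators(tokens: List[str], entry_overhead: int = 8,
                    code_size: int = 2) -> Tuple[Callable[[int], int], Callable[[int], int]]:
    # Rank strings by frequency; the parameter p = how many of them the table holds.
    ranked = Counter(tokens).most_common()

    def memory(p: int) -> int:
        # Estimated bytes needed to hold the top-p table entries.
        return sum(len(s) + entry_overhead for s, _ in ranked[:p])

    def effect(p: int) -> int:
        # Estimated bytes saved by replacing each occurrence with a code.
        return sum(n * (len(s) - code_size) for s, n in ranked[:p])

    return memory, effect


def choose_parameter(candidates: Iterable[int],
                     estimate_memory: Callable[[int], int],
                     estimate_effect: Callable[[int], int],
                     memory_budget: int) -> Optional[int]:
    """Return the candidate with the best estimated effect within the memory budget."""
    best, best_effect = None, float("-inf")
    for p in candidates:
        if estimate_memory(p) > memory_budget:
            continue  # violates the memory-usage condition
        e = estimate_effect(p)
        if e > best_effect:  # strict: prefer the smaller table on ties
            best, best_effect = p, e
    return best
```

For tokens containing `"name"` five times, `"price"` three times, and `"id"` twice, a 30-byte budget selects p = 2; raising the budget to 40 still selects 2, because holding `"id"` saves nothing under this cost model.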

Path compression of a network graph

In an approach to analyzing a path on a graph, a computer receives a graph comprising a plurality of vertices and edges, each edge linking two vertices. For each of said plurality of vertices, the computer analyzes the edges linked to that vertex to determine its number of outbound links, orders said edges, and assigns a value to each ordered edge. The computer then receives a path comprising a plurality of edges linking two of said plurality of vertices through at least one other of said plurality of vertices, encodes said path using said number of outbound links and said assigned values of each of the edges on the path, compresses the encoded path, and analyzes said path on said graph using said compressed, encoded path.
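One natural reading of "encoding using the number of outbound links and the assigned values" is a mixed-radix number: at each vertex the digit is the value of the chosen edge and the radix is that vertex's out-degree. The sketch below assumes this reading, orders edges by sorting neighbor names, and assumes the start vertex and hop count are stored alongside the code; the separate compression step is not shown. Function names are hypothetical.

```python
from typing import Dict, List

Graph = Dict[str, List[str]]  # vertex -> list of outbound neighbors


def order_edges(graph: Graph) -> Dict[str, Dict[str, int]]:
    # For each vertex, order its outbound edges and assign values 0..k-1.
    return {v: {w: i for i, w in enumerate(sorted(nbrs))} for v, nbrs in graph.items()}


def encode_path(graph: Graph, path: List[str]) -> int:
    """Encode a vertex path as one integer; the first edge is the least significant digit."""
    values = order_edges(graph)
    code = 0
    for v, w in reversed(list(zip(path, path[1:]))):
        code = code * len(graph[v]) + values[v][w]
    return code


def decode_path(graph: Graph, start: str, hops: int, code: int) -> List[str]:
    # Peel off one digit per hop, using the current vertex's out-degree as radix.
    path = [start]
    for _ in range(hops):
        nbrs = sorted(graph[path[-1]])
        code, digit = divmod(code, len(nbrs))
        path.append(nbrs[digit])
    return path
```

A vertex with a single outbound link contributes a factor of 1 and therefore costs no bits, so the code needs roughly the sum of log2(out-degree) over the path. In the graph `{"A": ["B", "C", "D"], "B": ["E"], "C": ["E", "A"], "D": ["E"], "E": ["A", "C"]}`, the path A, C, E, A encodes to 4.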

ENCODING APPARATUS, ENCODING METHOD AND SEARCH METHOD
20180034474 · 2018-02-01

A computer generates a plurality of pieces of syntax information respectively corresponding to a plurality of words in a compression target document by analyzing relationships between the plurality of words. Next, the computer assigns a plurality of compression codes to the plurality of words and to the plurality of pieces of syntax information. Then, the computer outputs the plurality of compression codes with an arrangement of a specific order.
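A minimal sketch of the assign-and-output steps, assuming the syntax information is already available as one label per word (for example a part-of-speech tag; the relationship analysis itself is not shown), that smaller integer codes stand in for shorter compression codes given to more frequent items, and that the "specific order" places each word code directly before its syntax code. The interleaved layout also allows a hypothetical search on the codes without decompressing. All names here are illustrative assumptions.

```python
from collections import Counter
from typing import Dict, List, Tuple


def assign_codes(items: List[str]) -> Dict[str, int]:
    # More frequent items receive smaller codes (ties keep first-seen order).
    return {item: code for code, (item, _) in enumerate(Counter(items).most_common())}


def encode_document(tagged: List[Tuple[str, str]]):
    """tagged: (word, syntax_info) pairs in document order."""
    word_codes = assign_codes([w for w, _ in tagged])
    syntax_codes = assign_codes([s for _, s in tagged])
    stream: List[int] = []
    for w, s in tagged:
        stream.extend((word_codes[w], syntax_codes[s]))  # word code, then syntax code
    return stream, word_codes, syntax_codes


def search(stream: List[int], word_code: int, syntax_code: int) -> List[int]:
    # Return word positions whose (word, syntax) code pair matches.
    return [i // 2 for i in range(0, len(stream), 2)
            if stream[i] == word_code and stream[i + 1] == syntax_code]
```

In the sentence fragment `the/DET run/NOUN was/VERB a/DET run/VERB`, searching for the code pair of `run` and `VERB` returns only position 4, distinguishing the verb use from the noun use at position 1.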

Encoding apparatus, encoding method and search method

A computer generates a plurality of pieces of syntax information respectively corresponding to a plurality of words in a compression target document by analyzing relationships between the plurality of words. Next, the computer assigns a plurality of compression codes to the plurality of words and to the plurality of pieces of syntax information. Then, the computer outputs the plurality of compression codes with an arrangement of a specific order.

Effective stock keeping unit (SKU) management system
12169856 · 2024-12-17

An effective stock keeping unit (SKU) management system creates an embedding space by encoding the data of each catalog item into an embedding. The embedding is created by generating an index in which the number of rows is the number of catalog items and the number of columns is the number of fields associated with each catalog item. The index is then denormalized using customer groups and transformed by compressing the number of columns, yielding the embedding space. In some configurations, a machine learning model is trained using catalog data. Item similarity is encoded by clustering catalog SKUs into groups in the embedding space, placing similarly related items close to each other. Catalog items are then searched for in the embedding space, with the closest clusters searched for a particular catalog item.
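A pure-Python sketch of the column compression, clustering, and cluster-first search. The abstract does not name the column-compression or clustering methods, so this swaps in a Gaussian random projection and a few iterations of k-means with deterministic initialisation; the customer-group denormalization and the trained model are not shown. All names and the numeric field values are hypothetical.

```python
import math
import random
from typing import List, Sequence

Vector = List[float]


def random_projection(index: List[Vector], out_dims: int, seed: int = 0) -> List[Vector]:
    """Compress the field columns of the item index (rows = items) to out_dims columns."""
    rng = random.Random(seed)
    in_dims = len(index[0])
    proj = [[rng.gauss(0.0, 1.0) / math.sqrt(out_dims) for _ in range(out_dims)]
            for _ in range(in_dims)]
    return [[sum(row[i] * proj[i][j] for i in range(in_dims)) for j in range(out_dims)]
            for row in index]


def dist2(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(point: Sequence[float], centroids: List[Vector], allowed) -> int:
    # Index of the closest centroid among the allowed cluster ids.
    return min(allowed, key=lambda c: dist2(point, centroids[c]))


def kmeans(points: List[Vector], k: int, iters: int = 10):
    centroids = [list(p) for p in points[:k]]  # deterministic initialisation
    for _ in range(iters):
        groups: List[List[Vector]] = [[] for _ in range(k)]
        for p in points:
            groups[nearest(p, centroids, range(k))].append(p)
        for c, members in enumerate(groups):
            if members:  # keep the old centroid if a cluster empties out
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    labels = [nearest(p, centroids, range(k)) for p in points]
    return centroids, labels


def search(query: Vector, embeddings: List[Vector], centroids: List[Vector],
           labels: List[int]) -> int:
    """Return the closest item, looking only inside the closest non-empty cluster."""
    cluster = nearest(query, centroids, sorted(set(labels)))
    members = [i for i, lab in enumerate(labels) if lab == cluster]
    return min(members, key=lambda i: dist2(query, embeddings[i]))
```

Searching only the nearest cluster trades exactness for speed: a query near a cluster boundary may miss a slightly closer item in a neighboring cluster, which is why searching the few closest clusters, as the abstract describes, is a common refinement.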

INLINE DECOMPRESSION
20240413839 · 2024-12-12

Techniques and apparatuses to decompress data that has been stack compressed are described. Stack compression refers to compression of data in one or more dimensions. Stack compression can be effective for uncompressed data blocks that are very sparse, i.e., data blocks that contain many zeros. In stack compression, an uncompressed data block is compressed into a compressed data block by removing one or more zero words from it. Map metadata recording where the zero words were in the uncompressed data block is generated during compression. Using the map metadata, the compressed data block can be decompressed to restore the uncompressed data block.