H03M7/3084

Efficient generalized boundary detection

Fast, efficient, and robust compression-based methods for detecting boundaries in arbitrary datasets, including sequences (1D datasets), are desired. The methods, each employing three simple algorithms, approximate the information distance between two adjacent sliding windows within a dataset. One of the algorithms calculates an initial ordered list of subsequences, while a second algorithm updates the ordered list of subsequences by dropping a first entry and appending a last entry rather than calculating completely new ordered lists with each iteration. Large values in the distance metric are indicative of boundary locations. A smoothed z-score or a wavelet-based algorithm may then be used to locate peaks in the distance metric, thereby identifying boundary locations. An adaptive version of the method employs a collection of window sizes and corresponding weighting functions, making it more amenable to real datasets with unknown, complex, and changing structures.
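
The sliding-window idea can be illustrated with a minimal sketch. It does not reproduce the ordered-subsequence algorithms named in the abstract. Instead it swaps in the normalized compression distance computed with zlib as a stand-in approximation of information distance, and it takes the single largest score rather than running a smoothed z-score or wavelet peak finder. The function names and the fixed window size are assumptions made for illustration.

```python
import zlib
from typing import List


def compressed_size(data: bytes) -> int:
    """Length of the zlib-compressed form, used as a rough complexity estimate."""
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance between two byte strings."""
    cx, cy = compressed_size(x), compressed_size(y)
    cxy = compressed_size(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)


def boundary_scores(data: bytes, window: int) -> List[float]:
    """Distance between the two adjacent windows that meet at each position.

    scores[k] belongs to position k + window in the data.
    """
    scores = []
    for i in range(window, len(data) - window + 1):
        left = data[i - window:i]
        right = data[i:i + window]
        scores.append(ncd(left, right))
    return scores


def strongest_boundary(data: bytes, window: int) -> int:
    """Position whose adjacent windows differ most (a crude peak pick)."""
    scores = boundary_scores(data, window)
    best = max(range(len(scores)), key=scores.__getitem__)
    return best + window
```

For an input such as `b"ab" * 150 + b"xyz" * 100`, the scores should rise as the two windows approach the change in pattern at position 300. This version recompresses both windows at every position, which is the repeated work the incremental list update in the abstract is meant to avoid.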

CHAN FRAMEWORK, CHAN CODING AND CHAN CODE
20230223952 · 2023-07-13

A FRAMEWORK and the associated method, schema and design for processing digital data, whether random or not, through encoding and decoding losslessly and correctly for purposes including the purposes of encryption/decryption or compression/decompression or both. There is no assumption of the digital information to be processed before processing. A Universal Coder is invented and now Pigeonhole meets Blackhole.

Technologies for assigning workloads to balance multiple resource allocation objectives

Technologies for allocating resources of managed nodes to workloads to balance multiple resource allocation objectives include an orchestrator server to receive resource allocation objective data indicative of multiple resource allocation objectives to be satisfied. The orchestrator server is additionally to determine an initial assignment of a set of workloads among the managed nodes and receive telemetry data from the managed nodes. The orchestrator server is further to determine, as a function of the telemetry data and the resource allocation objective data, an adjustment to the assignment of the workloads to increase an achievement of at least one of the resource allocation objectives without decreasing an achievement of another of the resource allocation objectives, and apply the adjustments to the assignments of the workloads among the managed nodes as the workloads are performed. Other embodiments are also described and claimed.

Pooling blocks for erasure coding write groups

A technique provides efficient data protection, such as erasure coding, for data blocks of volumes served by storage nodes of a cluster. Data blocks associated with write requests of unpredictable client workload patterns may be compressed. A set of the compressed data blocks may be selected to form a write group and an erasure code may be applied to the group to algorithmically generate one or more encoded blocks in addition to the data blocks. Due to the unpredictability of the data workload patterns, the compressed data blocks may have varying sizes. A pool of the various-sized compressed data blocks may be established and maintained from which the data blocks of the write group are selected. Establishment and maintenance of the pool enables selection of compressed data blocks that are substantially close to the same size and, thus, that require minimal padding.
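
The benefit of pooling can be sketched as a selection problem: given the sizes of the compressed blocks in the pool, choose the group that needs the least padding to bring every member up to the size of the largest. The function below is only an illustration under that assumption. Its name, the padding model, and the sort-and-scan strategy are not taken from the abstract.

```python
from typing import List, Tuple


def pick_write_group(pool_sizes: List[int], group_size: int) -> Tuple[List[int], int]:
    """Choose `group_size` block sizes from the pool with minimal total padding.

    Padding is modeled as the bytes needed to pad every block in the group
    up to the size of the largest block in that group.
    """
    if group_size <= 0 or group_size > len(pool_sizes):
        raise ValueError("group_size must be between 1 and the pool size")

    ordered = sorted(pool_sizes)

    # Prefix sums give the total size of any run of consecutive blocks in O(1).
    prefix = [0]
    for size in ordered:
        prefix.append(prefix[-1] + size)

    best_start, best_padding = 0, None
    # For a fixed largest block, the closest smaller blocks are its
    # neighbors in sorted order, so only consecutive runs need checking.
    for start in range(len(ordered) - group_size + 1):
        end = start + group_size
        largest = ordered[end - 1]
        padding = largest * group_size - (prefix[end] - prefix[start])
        if best_padding is None or padding < best_padding:
            best_start, best_padding = start, padding

    return ordered[best_start:best_start + group_size], best_padding
```

With a pool of sizes `[4096, 1200, 3900, 4000, 800, 3950]` and a group of three, the sketch selects `[3900, 3950, 4000]`, which needs 150 bytes of padding. A larger pool gives the selection more near-equal candidates to choose from.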

COMPRESSION DEVICE AND DECOMPRESSION DEVICE
20230006689 · 2023-01-05

According to one embodiment, an interleaving unit divides a symbol string into first and second symbols. A first coding unit converts the first symbols to first codewords. A first packet generating unit generates first packets including the first codewords. A first request generating unit generates first packet requests including sizes of variable length packets. A second coding unit converts the second symbols to second codewords. A second packet generating unit generates second packets including the second codewords. A second request generating unit generates second packet requests including sizes of variable length packets. A multiplexer outputs a compressed stream including the first and second variable length packets cut out from the first and second packets.

TECHNOLOGY FOR EARLY ABORT OF COMPRESSION ACCELERATION

An integrated circuit includes a compression accelerator to process a request from software to compress source data into an output file. The compression accelerator includes early-abort circuitry to provide for early abort of compression operations. In particular, the compression accelerator uses a predetermined sample size to compute an estimated size for a portion of the output file. The sample size specifies how much of the source data is to be analyzed before computing the estimated size. The compression accelerator also determines whether the estimated size reflects an acceptable amount of compression, based on a predetermined early-abort threshold. The compression accelerator aborts the request if the estimated size does not reflect the acceptable amount of compression. The compression accelerator may complete the request if the estimated size reflects the acceptable amount of compression. Other embodiments are described and claimed.
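
A software analogue of the early-abort decision might look like the sketch below. The abstract describes hardware circuitry, so this is only an illustration of the control flow: compress a sample of the source, estimate the compression achieved, and give up if the estimate misses a threshold. The function name, the default sample size, the ratio-based threshold, and the use of zlib are all assumptions.

```python
import zlib
from typing import Optional


def compress_with_early_abort(
    source: bytes,
    sample_size: int = 4096,        # hypothetical default: bytes analyzed before deciding
    abort_threshold: float = 0.9,   # hypothetical default: max acceptable compressed/original ratio
) -> Optional[bytes]:
    """Return the compressed data, or None if the sample predicts poor compression."""
    sample = source[:sample_size]
    if sample:
        estimated_ratio = len(zlib.compress(sample)) / len(sample)
        if estimated_ratio > abort_threshold:
            # The sample barely shrinks, so skip compressing the rest.
            return None
    return zlib.compress(source)
```

Highly repetitive input passes the check and is compressed in full, while input that looks random is rejected after only the sample has been examined. The caller can then store such data uncompressed.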

Column data compression schemes for scaling writes and reads on database systems
11537571 · 2022-12-27

A request for performing a data storing operation directed to a database table that comprises a plurality of table columns is received. Columnar compression metadata is accessed to identify one or more table columns in the database table, each of the one or more table columns being designated to store compressed columnar values. The columnar compression metadata is used to apply one or more columnar compression methods to generate, from one or more uncompressed columnar values received with the request for the data storing operation, one or more compressed columnar values to be persisted in the one or more table columns in the database table. A database statement is executed to persist the one or more compressed columnar values in the one or more table columns in the database table.
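
The store path can be sketched with an in-memory SQLite table. The metadata here is a plain dictionary mapping column names to a compression function, which is an assumption made for illustration. The abstract does not say how the columnar compression metadata is represented, and `COLUMN_COMPRESSION` and `store_row` are hypothetical names.

```python
import sqlite3
import zlib
from typing import Any, Callable, Dict

# Hypothetical columnar compression metadata: column name -> compression method.
COLUMN_COMPRESSION: Dict[str, Callable[[bytes], bytes]] = {
    "payload": zlib.compress,
}


def store_row(conn: sqlite3.Connection, table: str, row: Dict[str, Any]) -> None:
    """Compress the designated columns, then persist the row with one statement."""
    encoded = {
        column: COLUMN_COMPRESSION[column](value) if column in COLUMN_COMPRESSION else value
        for column, value in row.items()
    }
    # Table and column names are assumed to be trusted identifiers in this sketch.
    columns = ", ".join(encoded)
    placeholders = ", ".join("?" for _ in encoded)
    conn.execute(
        f"INSERT INTO {table} ({columns}) VALUES ({placeholders})",
        tuple(encoded.values()),
    )
```

A caller would create a table such as `events (id INTEGER, payload BLOB)` and pass `{"id": 1, "payload": b"..."}`. Only the `payload` value is compressed before the insert, and a read path would apply the matching decompression for the same column.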

TEXT COMPRESSION WITH PREDICTED CONTINUATIONS

A method for text compression comprises recognizing a prefix string of one or more text characters preceding a target string of a plurality of text characters to be compressed. The prefix string is provided to a natural language generation (NLG) model configured to output one or more predicted continuations each having an associated rank. If the one or more predicted continuations include a matching predicted continuation relative to the next one or more text characters of the target string, the next one or more text characters are compressed as an NLG-type compressed representation. If no predicted continuations match the next one or more text characters of the target string, a longest matching entry in a compression dictionary is identified. The next one or more text characters of the target string are compressed as a dictionary-type compressed representation that includes the dictionary index value of the longest matching entry.
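
The two-path encoding can be shown with a toy sketch. A lookup table stands in for the NLG model, and the token layout (`"N"` with a rank, `"D"` with a dictionary index) is an assumption rather than the format used in the patent. The decoder must call the same deterministic predictor with the same context for the ranks to resolve correctly.

```python
from typing import Callable, List, Tuple

Predictor = Callable[[str], List[str]]   # context -> continuations, best rank first
Token = Tuple[str, int]                  # ("N", rank) or ("D", dictionary index)


def encode(prefix: str, target: str, predict: Predictor, dictionary: List[str]) -> List[Token]:
    tokens: List[Token] = []
    context, pos = prefix, 0
    while pos < len(target):
        rest = target[pos:]
        candidates = predict(context)
        rank = next((r for r, c in enumerate(candidates) if c and rest.startswith(c)), None)
        if rank is not None:
            # A predicted continuation matches: store only its rank.
            tokens.append(("N", rank))
            consumed = candidates[rank]
        else:
            # Fall back to the longest matching dictionary entry.
            matches = [i for i, entry in enumerate(dictionary) if entry and rest.startswith(entry)]
            if not matches:
                raise ValueError("dictionary has no entry matching the input")
            index = max(matches, key=lambda i: len(dictionary[i]))
            tokens.append(("D", index))
            consumed = dictionary[index]
        context += consumed
        pos += len(consumed)
    return tokens


def decode(prefix: str, tokens: List[Token], predict: Predictor, dictionary: List[str]) -> str:
    context = prefix
    for kind, value in tokens:
        context += predict(context)[value] if kind == "N" else dictionary[value]
    return context[len(prefix):]


def toy_predict(context: str) -> List[str]:
    """Stand-in for an NLG model: fixed continuations keyed on the context's ending."""
    table = {"the ": ["quick ", "lazy "], "quick ": ["brown "]}
    for ending, continuations in table.items():
        if context.endswith(ending):
            return continuations
    return []


TOY_DICTIONARY = ["f", "o", "x", "fox", " ", "l", "a", "z", "y"]
```

Encoding the target `"lazy fox"` after the prefix `"the "` yields `[("N", 1), ("D", 3)]`: the second-ranked prediction covers `"lazy "`, and the dictionary entry `"fox"` covers the rest. Decoding those tokens with the same predictor and dictionary restores the target.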

Techniques for determining compression tiers and using collected compression hints

Tiers of compression algorithms may be determined using compression information collected regarding compression ratios achieved for data sets using compression algorithms. Each tier may meet specified criteria regarding expected compression ratios achieved for a specified portion or number of data sets. Compression algorithms of each tier may be implemented by a different hardware device that may include hardware accelerators for the algorithms of the tier. Different tiers, and thus different hardware devices, achieve different levels of compression. A recommendation may be provided using compression information collected, such as from one of the hosts, regarding which hardware device to use for compression. The recommendation may address whether to purchase a license to use, or whether to purchase, a particular hardware device for compression. Compression information may be collected by a host that issues tagged I/Os providing a hint regarding what compression algorithm to use for the particular I/O operation data.

CHAN framework, CHAN coding and CHAN code
11515888 · 2022-11-29

A framework and the associated method, schema and design for processing digital data, whether random or not, through encoding and decoding losslessly and correctly for purposes including the purposes of encryption/decryption or compression/decompression or both. There is no assumption of the digital information to be processed before processing. A universal coder is invented and now pigeonhole meets blackhole.