Patent classifications
H03M7/405
File compression system
Examples of the disclosure describe systems and methods for implementing a file compression system. In an example method, a source string to be compressed is received. The source string comprises a plurality of characters. A first frequency is determined for each character of the plurality of characters of the source string. A first tree corresponding to the source string is determined based on the first frequencies. The source string is encoded using the first tree to generate a first encoded string. It is determined whether a total number of bits in the first encoded string is a multiple of eight. In accordance with a determination that the total number of bits in the first encoded string is not a multiple of eight, the first encoded string is appended with zeroes so that a new total number of bits in the first encoded string is a multiple of eight. In accordance with a determination that the total number of bits in the first encoded string is a multiple of eight, the method forgoes appending the first encoded string with zeroes. The first encoded string is divided into one or more eight-bit segments and a placeholder character is assigned to each eight-bit segment. A second frequency for each of the placeholder characters in the first encoded string is determined. A second tree corresponding to the first encoded string is determined based on the second frequencies. The first encoded string is encoded using the second tree to generate a second encoded string.
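The two-pass flow in this abstract can be sketched in Python as follows. This is a minimal illustration, not the patented implementation: the function names (`huffman_codes`, `encode`, `two_pass_encode`) are hypothetical, bit strings are held as text for readability, and each eight-bit segment's placeholder character is assumed to be its integer value from 0 to 255.

```python
import heapq
from collections import Counter

def huffman_codes(symbols):
    """Build a prefix-code table {symbol: bitstring} from symbol frequencies."""
    freq = Counter(symbols)
    if not freq:                            # empty input: nothing to encode
        return {}
    if len(freq) == 1:                      # one distinct symbol: give it a 1-bit code
        return {next(iter(freq)): "0"}
    # Heap entries: (frequency, unique tie-breaker, {symbol: partial code})
    heap = [(f, i, {s: ""}) for i, (s, f) in enumerate(freq.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)
        f2, _, right = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (f1 + f2, tie, merged))
        tie += 1
    return heap[0][2]

def encode(symbols, codes):
    """Concatenate the code of every symbol into one bit string."""
    return "".join(codes[s] for s in symbols)

def two_pass_encode(source):
    # First pass: Huffman-encode the source characters.
    first_codes = huffman_codes(source)
    first_bits = encode(source, first_codes)
    # Append zeroes only if the bit count is not a multiple of eight.
    pad = (-len(first_bits)) % 8
    padded = first_bits + "0" * pad
    # Assumption: the placeholder for each eight-bit segment is its integer value.
    placeholders = [int(padded[i:i + 8], 2) for i in range(0, len(padded), 8)]
    # Second pass: Huffman-encode the placeholder sequence.
    second_codes = huffman_codes(placeholders)
    second_bits = encode(placeholders, second_codes)
    return first_codes, pad, second_codes, second_bits
```

A decoder would need both code tables and the padding length, which is why the sketch returns them alongside the second encoded string.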
Huffman code generation
A method for generating Huffman codewords to encode a dataset includes selecting a Huffman tree type from a plurality of different Huffman tree types. Each of the Huffman tree types specifies a different range of codeword length in a Huffman tree. A Huffman tree of the selected type is produced by: determining a number of nodes available to be allocated as leaves in each level of the Huffman tree accounting for allocation of leaves in each level of the Huffman tree; allocating nodes to be leaves such that the number of nodes allocated in a given level of the Huffman tree is constrained to be no more than the number of nodes available to be allocated in the given level; and assigning the leaves to symbols of the dataset based on an assignment strategy selected from a plurality of assignment strategies to produce symbol codeword information.
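The per-level availability constraint can be illustrated with a short Python sketch. It assumes a binary tree in which every node that is not allocated as a leaf has two children; the function name `available_per_level` and the list-based input are hypothetical and are not taken from the patent.

```python
def available_per_level(leaves_per_level):
    """Return the number of nodes available at each depth of a binary tree.

    leaves_per_level[d] is the number of leaves allocated at depth d
    (the root is depth 0). Raises ValueError if any level allocates
    more leaves than it has nodes available.
    """
    available = []
    nodes = 1  # only the root exists at depth 0
    for depth, leaves in enumerate(leaves_per_level):
        if leaves > nodes:
            raise ValueError(
                f"depth {depth}: {leaves} leaves requested, {nodes} nodes available"
            )
        available.append(nodes)
        # Every node not used as a leaf contributes two children to the next level.
        nodes = 2 * (nodes - leaves)
    return available
```

For example, `available_per_level([0, 1, 2])` describes codeword lengths 1, 2, and 2, and reports one, two, and two available nodes at depths 0, 1, and 2.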
System and method for data compaction and security with extended functionality
A system and method for highly efficient encoding of data that includes extended functionality for asymmetric encoding/decoding and network policy enforcement. In the case of asymmetric encoding/decoding, the original data is encoded by an encoder according to a codebook and sent to a decoder, but the output of the decoder depends on data manipulation rules applied at the decoding stage to transform the decoded data into a different data set from the original data. In the case of network policy enforcement, a behavior appendix is incorporated into the codebook, such that the encoder and/or decoder at each node of the network complies with network behavioral rules, limits, and policies during encoding and decoding.
Computer product, generating apparatus, and generating method for generating Huffman tree, and computer product for file compression using Huffman tree
A computer-readable recording medium stores a program causing a computer to determine the size of an applied 2^N-branch non-contact Huffman tree depending on where in a range the total number of types (X) of character information groups exists. The size of the 2^N-branch non-contact Huffman tree has the maximum number of branches, 2^N. The exponent N is an upper limit of the length of a compression code. Thus, when the size of the 2^N-branch non-contact Huffman tree is determined, the exponent (N) may be determined depending on the total number of types (X) of character information groups. Specifically, when the total number of types (X) of character information groups is 2^(x-2) < X ≤ 2^(x-1), if the maximum number of branches (2^N) is at least 2^(x-1), a Huffman tree can be established. To minimize the size, N = x-1 may be adopted. Further, when the total number of types (X) of character information groups is 2^(x-1) < X ≤ 2^x, if the maximum number of branches (2^N) is at least 2^x, a Huffman tree can be established. To minimize the size, N = x may be adopted.
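The size rule reduces to choosing the smallest N for which 2^N is at least X. A minimal Python sketch, with the hypothetical function name `min_code_length` and the added assumption that N is at least 1 even when only one type is present:

```python
def min_code_length(num_types):
    """Smallest N such that a 2**N-branch tree can hold num_types leaves."""
    if num_types < 1:
        raise ValueError("num_types must be a positive integer")
    # (X - 1).bit_length() is the smallest N with 2**N >= X for X >= 2.
    # Assumption: a single type still receives a one-bit code, so N >= 1.
    return max(1, (num_types - 1).bit_length())
```

With this rule, 256 types fit in a tree with N = 8, while 257 types require N = 9.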
System and method for codebook-based data encoding
A system and method for codebook-based data encoding. Portions of the data are encoded by different encoding libraries, depending on which library provides the greatest compaction for a given portion of the data. This methodology not only provides substantial improvements in data compaction over use of a single data compaction algorithm with the highest average compaction, but provides substantial additional security in that multiple decoding libraries must be used to decode the data. In some embodiments, each portion of data may further be encoded using different sourceblock sizes, providing further security enhancements as decoding requires multiple decoding libraries and knowledge of the sourceblock size used for each portion of the data. In some embodiments, encoding libraries may be randomly or pseudo-randomly rotated to provide additional security.
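The per-portion selection idea can be sketched in Python. Here the standard-library compressors `zlib`, `bz2`, and `lzma` stand in for the abstract's encoding libraries, which is an assumption made purely for illustration; the patent's libraries are codebook-based and are not these algorithms. The names `encode_portions` and `decode_portions` and the fixed `portion_size` are likewise hypothetical.

```python
import bz2
import lzma
import zlib

# Stand-in "encoding libraries" (assumption: general-purpose compressors).
ENCODERS = {"zlib": zlib.compress, "bz2": bz2.compress, "lzma": lzma.compress}
DECODERS = {"zlib": zlib.decompress, "bz2": bz2.decompress, "lzma": lzma.decompress}

def encode_portions(data: bytes, portion_size: int):
    """Encode each portion with whichever library yields the smallest output."""
    encoded = []
    for start in range(0, len(data), portion_size):
        portion = data[start:start + portion_size]
        candidates = ((name, fn(portion)) for name, fn in ENCODERS.items())
        name, blob = min(candidates, key=lambda pair: len(pair[1]))
        encoded.append((name, blob))
    return encoded

def decode_portions(encoded):
    """Decoding needs the matching library for every portion."""
    return b"".join(DECODERS[name](blob) for name, blob in encoded)
```

The list of library names chosen per portion is exactly the extra knowledge a decoder must hold, which is the security property the abstract describes.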
Personal health monitor data compaction using multiple encoding algorithms
Health monitor data is encoded using a plurality of encoding libraries. Portions of the data are encoded by different encoding libraries, depending on which library provides the greatest compaction. This methodology not only provides substantial improvements in data compaction over use of a single data compaction algorithm with the highest average compaction, but also provides substantial additional security in that multiple decoding libraries must be used to decode the data. Optionally, each portion of data may further be encoded using different sourceblock sizes, providing further security enhancements as decoding requires multiple decoding libraries and knowledge of the sourceblock size used for each portion of the data.
System and methods for adaptive bandwidth-efficient encoding of genomic data
A system and methods for adaptive bandwidth-efficient data encoding comprising: a sequence analyzer configured to analyze a received sequence dataset, maintain a count of unique characters, and identify positions where the unique character count increases by a power of two; an adaptive sourceblock optimizer that determines and dynamically adjusts optimal sourceblock sizes based on dataset characteristics; and a data deconstruction engine that deconstructs the dataset into sourceblocks and creates codewords for storage or transmission. The system analyzes sequence complexity, alphabet size, and character frequency distribution to optimize sourceblock sizes, and uses machine learning to improve decision-making over time. This adaptive approach enhances compression efficiency across varied genomic data types, including genome graphs, while maintaining data integrity and security. The system efficiently encodes, stores, and transmits complex genomic and bioinformatic datasets, addressing the growing challenges in data storage and bandwidth limitations.
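One possible reading of the sequence analyzer is sketched below in Python. It assumes that "increases by a power of two" means the running count of unique characters reaches a power of two (1, 2, 4, 8, ...), which is an interpretation and not something the abstract states outright; the function name `power_of_two_positions` is hypothetical.

```python
def power_of_two_positions(sequence):
    """Return (index, unique_count) pairs where the unique count hits a power of two."""
    seen = set()
    positions = []
    for index, ch in enumerate(sequence):
        if ch not in seen:
            seen.add(ch)
            n = len(seen)
            # n & (n - 1) == 0 holds exactly when n is a power of two (n >= 1).
            if n & (n - 1) == 0:
                positions.append((index, n))
    return positions
```

Each such position marks a point where one more bit is needed to index the alphabet seen so far, which is a plausible trigger for re-evaluating the sourceblock size.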
Blockchain data compression and storage
Methods and systems described herein improve blockchain storage operations in a variety of environments. A blockchain compression system may determine that a blockchain compression condition associated with a blockchain having a first plurality of blocks has been satisfied. In response, the system compresses the first plurality of blocks using a first hash tree into a first root hash value and stores the first plurality of blocks in a first database. The blockchain compression system generates a first new era genesis block that includes the first root hash value and a first database address of the first database at which the first plurality of blocks are stored. The blockchain compression system stores the blockchain at one or more nodes in a blockchain network. The blockchain includes the first new era genesis block and any previous new era genesis blocks. This may effectively reduce storage requirements for the blockchain, in various embodiments.
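The hash-tree step can be sketched in Python with a simple Merkle root over serialized blocks. The choice of SHA-256, the duplication of the last hash on odd-sized levels, and the dictionary layout of the genesis block are all assumptions for illustration; `merkle_root` and `new_era_genesis` are hypothetical names.

```python
import hashlib

def merkle_root(blocks):
    """Reduce a list of serialized blocks (bytes) to one root hash (hex string)."""
    if not blocks:
        raise ValueError("at least one block is required")
    level = [hashlib.sha256(block).digest() for block in blocks]
    while len(level) > 1:
        if len(level) % 2:
            # Assumption: duplicate the last hash on odd levels (conventions vary).
            level.append(level[-1])
        level = [
            hashlib.sha256(level[i] + level[i + 1]).digest()
            for i in range(0, len(level), 2)
        ]
    return level[0].hex()

def new_era_genesis(blocks, database_address):
    """Build a genesis record holding the root hash and the archive address."""
    return {
        "root_hash": merkle_root(blocks),
        "database_address": database_address,
    }
```

Nodes would then keep only the genesis record, and could later verify archived blocks fetched from the database by recomputing the root hash.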
Code table generation device, memory system, and code table generation method
According to one embodiment, a code table generation device includes a frequency table generation unit, a frequency sorting unit, and a Huffman tree generation unit. The frequency table generation unit generates a frequency table including entries each including a symbol and a frequency of occurrence, based on a frequency of occurrence for each symbol of input symbols. The frequency sorting unit sorts the entries in the frequency table by frequency of occurrence. The Huffman tree generation unit generates a Huffman tree having leaf nodes by using a queue that includes storage areas in which the sorted entries are respectively stored as the leaf nodes in an initial state, in response to the entries having been sorted.
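Building a Huffman tree from pre-sorted entries with queues is commonly done with the textbook two-queue method, sketched below in Python. This is that general technique and not necessarily the device's claimed design; the function name `code_lengths` is hypothetical, and the sketch returns codeword lengths and does not build explicit tree nodes.

```python
from collections import Counter, deque

def code_lengths(symbols):
    """Return {symbol: codeword length} using two queues over sorted frequencies."""
    # Frequency table, sorted by ascending frequency of occurrence.
    entries = sorted(Counter(symbols).items(), key=lambda entry: entry[1])
    leaves = deque((freq, [sym]) for sym, freq in entries)  # initial leaf nodes
    merged = deque()                                        # internal nodes, in creation order
    lengths = {sym: 0 for sym, _ in entries}

    def pop_smallest():
        # Both queues stay sorted, so the minimum is always at one of the fronts.
        if not merged or (leaves and leaves[0][0] <= merged[0][0]):
            return leaves.popleft()
        return merged.popleft()

    while len(leaves) + len(merged) > 1:
        f1, syms1 = pop_smallest()
        f2, syms2 = pop_smallest()
        for sym in syms1 + syms2:
            lengths[sym] += 1          # every merge pushes these leaves one level deeper
        merged.append((f1 + f2, syms1 + syms2))
    return lengths
```

Because the entries are sorted once up front, no priority queue is needed afterwards: each merge step only inspects the fronts of the two queues. Note that an input with a single distinct symbol yields length 0 in this sketch.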
Biometric-Authenticated Personal Health Monitor Data Compaction with Clinical Trial Optimization
A system and method for biometric-authenticated personal health monitor data compaction with clinical trial optimization is disclosed. The system receives biometric signals from multiple sensor modalities associated with a patient and extracts distinctive biometric features using signal processing algorithms. Patient identity verification is performed by comparing extracted features against stored biometric templates, generating cryptographic keys derived from verified biometric characteristics. Health data is divided into sourceblocks and encoded using multiple compression codebooks enhanced with biometric-derived cryptographic keys. Optimal encoded sourceblocks are selected based on compression efficiency and statistical preservation requirements. A clinical trial data optimization engine classifies health data by type and endpoint significance, determines statistical preservation requirements for regulatory compliance, and validates that compressed data maintains required statistical properties for clinical analysis. The system implements multi-modal biometric fusion, liveness detection, emergency override capabilities, and security controls including role-based access control and audit logging for secure clinical trial data management.