Sonic Boom: System For Reducing The Digital Footprint Of Data Streams Through Lossless Scalable Binary Substitution
20180145701 · 2018-05-24
CPC classification
H03M7/30
ELECTRICITY
Abstract
Because all digital data streams are composed of randomly distributed zeros (0s) and ones (1s) called bits, it can be posited that every arbitrary-length binary data set of finite magnitude can be distilled into a numerically precise integer that accurately represents the value of every individual bit within the set. Mathematically, once a data stream's bit structure has been analyzed, the exact combination of its uniquely assembled bits (its digital footprint) can be perfectly replicated simply by calculating the numerical value of each consecutive bit to produce a decimal sum equal to the value of the entire stream. This universal data compression technique is called SCALABLE BINARY SUBSTITUTION because the functional objective of the scheme is to analyze the digital footprint of a source data stream, regardless of its magnitude, and substitute a simple math expression for the entirety of its encoded information: absolutely lossless data compression through mathematically precise substitution.
Claims
1. The invention claimed functions as a data substitution system by applying binary-to-decimal arithmetic to directly calculate the decimal value of each consecutive bit in any arbitrary-length binary source data set in order to produce an output decimal sum whose precise numerical value is substituted for the entire string of binary source bits.
2. A data substitution system as claimed in claim 1, in which the individual bits of a binary source data set are interpreted and/or directly calculated in a manner that produces an output decimal sum independently and exclusively respective of each bit's spatial values derived from their exact positions within the set.
3. The invention claimed achieves a level of material data compression by converting the output decimal sum of the consecutively added bits of a binary source data set, as produced in claim 1 or 2, into an interchangeable mathematical expression of equivalent numerical value specifically encoded, prearranged, or designed to produce a materially reduced magnitude.
Description
BRIEF DESCRIPTION OF THE DRAWINGS
[0036] Some embodiments and mathematical formulae of the present invention are illustrated as examples of the working process by which the SBS system functions:
[0037] FIG. [1.0]: Sample binary string of bits illustrating their numerical (decimal) value in relation to their respective bit positions within the string.
[0038] FIG. [2.0]: The SBS (Scalable Binary Substitution) Algorithm.
[0039] FIG. [3.0]: Sample 10-byte (80-bit) source data set.
[0040] FIG. [3.1]: Mathematical calculation of 10-byte (80-bit) source data set as shown in [FIG. 3.0].
[0041] FIG. [3.2.1]: Sample 8-byte (64-bit) source data set.
DETAILED DESCRIPTION OF THE INVENTION
1. Introduction
[0042] In the modern digital world, millions and billions of source bits are assembled to create the most commonly used data sets, such as software programs, multimedia files, games, and digital communication signals. To increase the utility of digital data, there have been many innovations in the art of data compression, based upon as many different strategies, frameworks, and methodologies as there are hardware and software systems that use such data. Most data compression techniques condense source data either by deleting a material amount of information or by replacing source data with an alternative symbolic representation.
[0043] Compressing data streams by calculating the value of their consecutive bits produces sums that can often be millions of digits long. This is because, when the individual bits of an arbitrary-length binary data set are added, the numerical value of any given bit in a stream is exactly double the value of the bit that directly precedes it, and exactly one-half the value of the bit that follows it.
[0044] To illustrate this point, the bits of a sample data set are examined in [FIG. 1.0], beginning with an arbitrary bit found in the Nth position of a hypothetical data stream; the numerical value assigned to this Nth bit is thirty-two (32). When the individual bits shown in [FIG. 1.0] are calculated by directly adding the numerical value of each successive bit (or, conversely, subtracting the value of each bit from the mean deviation of the highest-order bit's value), their combined sum is 2,106, which requires 4 digits to represent in the decimal system.
[0045] Computers perform mathematical calculations by combining the logical operations of their logic gates to compute the necessary additions, subtractions, multiplications, etc., and arrive at a precise answer. Sequences of logical operations used to perform a particular calculation or a specific predetermined function are called algorithms. If computational resources are not a concern, calculating the numerical value of the assembled bits in a source data set and representing the combined sum as a whole decimal value is algorithmically trivial. Successively adding a data stream's bits, initialized to zero (0) and followed by the non-negative integer one (+1) up to N (if any), will compute {0, 1, 1+N . . . N}, provided that the necessary computations do not exceed the limits of the available CPU hardware and that the output decimal representation fits into the allocated memory.
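The successive-addition step can be sketched with a running accumulator. This is a minimal illustration, not the SBS implementation: it assumes the stream is read most-significant bit first (the text does not fix a bit order), and the helper name `accumulate_bits` is hypothetical. Doubling the running total at each step mirrors the doubling relationship between adjacent bit values described in [0043], and Python's arbitrary-precision `int` plays the role of the "allocated memory".

```python
def accumulate_bits(bits: str) -> int:
    """Return the integer value of a bit string read most-significant bit first.

    Hypothetical helper: each step doubles the running total (moving every
    earlier bit up one position) and then adds the current bit.
    """
    total = 0  # accumulator initialized to zero
    for ch in bits:
        if ch not in "01":
            raise ValueError(f"not a bit: {ch!r}")
        total = total * 2 + int(ch)
    return total


print(accumulate_bits("1011"))  # 11 (8 + 2 + 1)
```

For inputs that are valid bit strings, the result agrees with Python's built-in `int(bits, 2)`. The explicit loop is shown only to make the accumulation visible.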
[0046] To explain how this process applies to a real-world case, we will examine one of the most commonly encountered binary data sets of the modern computer age: the digital music file. The average 4-minute music file (an .MP3 song, for instance) is approximately 4.0 megabytes (MB) in size, which means that there are 4,194,304 bytes (4 × 1,048,576 bytes) in the file. A byte is a unit of computer information or data storage capacity that consists of a discrete group of 8 bits and is used especially to represent an alphanumeric character (e.g., letters, numbers, and symbols). Because a byte is made up of 8 bits, a 4.0 MB music file contains 33,554,432 individually assembled bits. When these 33 million bits are consecutively added together, they produce an equivalent decimal sum approximately 10 million digits long.
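The file-size arithmetic in [0046] can be checked without building the 33-million-bit number. This sketch assumes binary megabytes (1 MB = 1,048,576 bytes), which matches the 4,194,304-byte figure. It uses the fact that an n-bit value has at most floor(n · log10 2) + 1 decimal digits, so the digit count is an upper bound that is reached when the leading bit is a 1.

```python
import math

MEGABYTE = 1024 * 1024          # binary megabyte, matching 4,194,304 bytes per 4.0 MB
file_bytes = 4 * MEGABYTE       # the 4.0 MB music file
file_bits = file_bytes * 8      # 8 bits per byte

# An n-bit value has at most floor(n * log10(2)) + 1 decimal digits.
max_digits = math.floor(file_bits * math.log10(2)) + 1

print(f"{file_bytes:,} bytes, {file_bits:,} bits, up to {max_digits:,} digits")
```

The result is about 10.1 million digits, consistent with the "approximately 10 million digits" stated above.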
[0047] In data-compression terms, consecutively adding a data stream's bits to calculate the numerical value of the entire stream does not, in itself, compress the stream. Statistically speaking, this basic process yields zero net compression (a 1:1 ratio). In fact, in certain instances, converting binary values into their equivalent decimal values can result in negative compression ratios. The fundamental logic of the SBS scheme is to achieve superior and absolutely lossless levels of compression by using dynamic mathematical utilities to express a data stream's combined decimal sum in its most elegant, precise, and highly abbreviated form. By using robust math tools such as square and cube roots, high-powered exponentials, factorials, and other algebraic and calculus functions, the information contained within entire data streams, indeed oceans of data, can be flawlessly replaced by extremely compact and mathematically precise expressions called Kinetic Data Primers (KDPs). A KDP is, essentially, a basic set of mathematical instructions that, upon algorithmic calculation, is designed to yield a precise decimal sum that can be easily converted into a linear sequence of equivalent-value binary bits.
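The point about 1:1 versus negative ratios can be illustrated by comparing two ways of storing the same integer. The sketch assumes two storage formats for illustration: packed binary (the 1:1 case) and one ASCII byte per decimal digit (one way a "negative" ratio arises). The helper name `representation_sizes` is hypothetical. It uses the all-ones 64-byte value so that the result is deterministic.

```python
import math


def representation_sizes(value: int) -> tuple[int, int]:
    """Return (bytes as packed binary, bytes as ASCII decimal digits) for value >= 0."""
    binary_bytes = max(1, math.ceil(value.bit_length() / 8))
    decimal_bytes = len(str(value))  # one ASCII byte per decimal digit
    return binary_bytes, decimal_bytes


all_ones_64_bytes = 2 ** 512 - 1     # 512 one-bits
print(representation_sizes(all_ones_64_bytes))  # (64, 155)
```

The integer carries exactly the same information in both forms. Only the encoding changes, and the decimal text is about 2.4 times larger.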
[0048] To illustrate how calculating a data set's bits can produce extremely large decimal numbers, and how such numbers can be simply expressed as mathematically perfect KDPs, the following illustration is a graphical interpretation of a relatively small 64-byte data set. For perspective, a common text message (e.g., a tweet on the Twitter service) is limited to 140 characters, which would require 140 bytes of uncompressed data to represent; 64 bytes is roughly half that size.
[0049] A 64-character text message produces a binary data set of this size:
[0050]
1110101111010000111010111100001111010100010110011001101100
01000101001101011010110101010110101010110101011001101011101
01010101111011010101011010111110111001010011110111100101111
01101011101011000111010001010001101011010101010101011001101
01010101010101010111101010101011111101010101010010101010101
10011011101011010010110101101010101010101011001101010010101
01111010101010010110101010010111110110101010101010101010101
01010101010010101010101001010111110101101010101011101010101
0010101010101111111000001010101001010111
[0051] When the numerical values of these bits are consecutively added together, they produce the decimal sum:
[0052] 13,407,807,929,942,597,099,574,024,998,205,846,127,479,365,820,592,393,377,723,561,443,721,764,030,073,546,976,801,874,298,166,903,427,690,031,858,186,486,050,853,753,882,811,946,569,946,433,649,006,084,096 [1]
[0053] [1] This specific integer represents the precise decimal sum produced by successively adding all 512 bits of a 64-byte binary data set, provided that each bit in the set yields its maximum possible numerical value relative to its position within the set (i.e., if every bit in the data set were calculated as a binary one (1)). Reducing a 155-digit integer to a numerically equivalent (exponentially powered) 5-character KDP is used herein only to show the maximum mathematically achievable algorithmic efficiency of the SBS scheme, which exploits the structural stability of binary-to-decimal arithmetic to manipulate binary source data sets in proprietary ways.
[0054] which, in SBS format, can be precisely expressed as a Kinetic Data Primer as elegant and compact as:
[0055] 2^512
[0056] (Two-to-the-five-hundred-and-twelfth-power).
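The 155-digit sum in [0052] can be verified exactly with Python's arbitrary-precision integers. This sketch transcribes the printed digits, keeping the thousands separators. It confirms that the printed integer is exactly 2^512 and that the KDP text "2^512" is 5 characters long. It also computes the positional sum of 512 one-bits, which comes to 2^512 − 1, one less than the printed value.

```python
# Decimal sum as printed in paragraph [0052], thousands separators kept.
PRINTED_SUM = int(
    (
        "13,407,807,929,942,597,099,574,024,998,205,846,127,479,365,820,"
        "592,393,377,723,561,443,721,764,030,073,546,976,801,874,298,166,"
        "903,427,690,031,858,186,486,050,853,753,882,811,946,569,946,433,"
        "649,006,084,096"
    ).replace(",", "")
)

all_ones_512 = int("1" * 512, 2)     # positional sum of 512 one-bits

print(len(str(PRINTED_SUM)))         # 155
print(PRINTED_SUM == 2 ** 512)       # True
print(all_ones_512 == 2 ** 512 - 1)  # True
print(len("2^512"))                  # 5
```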
[0057] In the case of the 4.0 MB music file mentioned above, the 10-million-digit decimal number produced by successively adding its 33 million source bits can be profoundly reduced by expressing its numerical sum in a more elegant, yet mathematically precise, way. For example, the numerical value of a 10-million-digit decimal number can be accurately expressed as a KDP as compactly written as:
[0058] 1560000^1560000
[0059] (One-million-five-hundred-sixty-thousand-to-the-one-million-five-hundred-sixty-thousandth-power).
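The size of 1,560,000^1,560,000 can be estimated without expanding it. A number b^e has floor(e · log10 b) + 1 decimal digits and floor(e · log2 b) + 1 bits. This sketch assumes that floating-point logarithms are precise enough for a size estimate, and they are at this scale.

```python
import math

base = exponent = 1_560_000

# Digit and bit counts of base**exponent, computed via logarithms.
digits = math.floor(exponent * math.log10(base)) + 1
bits = math.floor(exponent * math.log2(base)) + 1

print(f"{digits:,} decimal digits, {bits:,} bits")
```

The estimate comes to about 9.66 million decimal digits, in line with "approximately 10 million", and about 32.1 million bits. For comparison, the 4.0 MB file contains 33,554,432 bits.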
[0060] When a Kinetic Data Primer of this magnitude is calculated, it will produce a decimal sum approximately 10 million digits long. This 10-million-digit decimal number can then, in turn, be converted back into its precise binary equivalent, which, in the methodology of the SBS substitution scheme, would serve to perfectly reconstruct the digital footprint (i.e., bit type and exact position) of all 33 million bits in the original 4.0 MB source data set.
[0061] The ultimate utility of the SBS scheme can be found in the sheer economy of data used in place of the exact numerical value of astronomically large source-calculated sums: encoding an arbitrary mathematical expression such as 1560000^1560000 into a machine-readable format would require only 50 bits of data (less than 7 bytes).[2] In general data-compression terms, encoding the binary information contained in a 4,194,304-byte (4.0 MB) source file into an SBS-KDP as infinitesimally compact as seven (7) bytes would mathematically indicate a baseline output compression ratio of 599,186:1, which is the net compression yield of 4,194,304 bytes reduced to 7 bytes (0.007 KB). For technical perspective, current state-of-the-art commercial-grade audio compression techniques produce average output compression ratios of less than 100:1.
[2] The Kinetic Data Primer size of seven (7) bytes represents the 50 bits of data needed to encode the mathematical expression 1560000^1560000 in KDP format. These 50 bits consist of 21 bits to represent the base decimal magnitude (1,560,000), 21 bits to represent its exponential-power magnitude (1,560,000), and 8 bits to represent the ASCII symbol (^) used to signify a base number's exponential value. The 50 bits needed to express the KDP 1560000^1560000 can be encoded within 7 bytes because, at 8 bits per byte, the maximum data capacity of 7 bytes is 56 bits. This 7-byte KDP size excludes any proprietary KDP file data, including, for instance, any SBS-KDP file ID, KDP codec decimal library markers, alphanumeric hash tags (MD5, etc.), IP security/encryption codes, forensic authentication data (DMCA, etc.), KDP mantissa-correction codes, and any other dynamic KDP payload data.
When these extrinsic SBS-KDP file data are embedded into a KDP in its perfect format, the KDP's output size could increase from its 7-byte Quantum Footprint to a maximum scalable payload capacity of 32 bytes (0.03 KB). Scaling a KDP to this 32-byte maximum payload would necessarily decrease its output compression ratio from 599,186:1 to 131,072:1, the net compression yields of 4,194,304 bytes reduced to 7 bytes (0.007 KB) and to 32 bytes (0.03 KB), respectively.
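The footnote's arithmetic can be reproduced in a few lines. In this sketch, `kdp_power_bits` is a hypothetical helper that tallies a `base^exponent` KDP the way footnote [2] does. Each number is counted at its minimal binary width and the ASCII "^" at 8 bits. Like the footnote, it includes no length or delimiter fields that a decoder would need to tell where one number ends and the next begins.

```python
import math


def kdp_power_bits(base: int, exponent: int, symbol_bits: int = 8) -> int:
    """Hypothetical tally: minimal width of each number plus one 8-bit ASCII '^'."""
    return base.bit_length() + symbol_bits + exponent.bit_length()


source_bytes = 4_194_304                         # 4.0 MB source file
kdp_bits = kdp_power_bits(1_560_000, 1_560_000)  # 21 + 8 + 21
kdp_bytes = math.ceil(kdp_bits / 8)              # round up to whole bytes

print(kdp_bits, kdp_bytes)          # 50 7
print(source_bytes // kdp_bytes)    # 599186 (7-byte footprint)
print(source_bytes // 32)           # 131072 (32-byte maximum payload)
```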
2. The SBS Algorithm
[0062] The specific functions of the SBS algorithm scheme can be explained, in their most simplified form, in the 5 steps shown in [FIG. 2.0].
3. The Proposed SBS Algorithm Scheme
[0063] To illustrate the source-bits-to-Kinetic-Data-Primer substitution methodology of the SBS algorithm scheme, an actual binary source data set (SDS) is examined in [FIG. 3.0]. The SDS shown in [FIG. 3.0] contains 80 bits; at 8 bits per byte, 80 bits equals 10 bytes. Because a bit can exist in only two states, zero (0) or one (1), the bits in the SDS have been randomly arranged for the purpose of demonstrating the functionality of the SBS algorithm. The numerical value of any given bit in a data set is always determined by its type (i.e., 0 or 1) and its exact position within the set. When calculating the numerical value of consecutive bits in any finite-length data set, it is important to note that only binary 1s (1-bits) produce any numerical value, and their equivalent decimal values are determined by their exact positions within the set. Conversely, a binary 0 (0-bit) produces no numerical value; its equivalent decimal value is always zero (0), regardless of its position within the data set. Additionally, the first bit (bit-1) of any finite-length data set produces a decimal value of one (+1) only if it is a 1-bit, and zero (0) otherwise. Each subsequent bit, if any, carries a potential decimal value exactly double (×2) that of the bit directly preceding its position. The potential decimal values of the bits in any finite-length data set are therefore: [0064] {1, 2, 4, 8, 16, 32, 64, 128, 256, 512, 1,024, 2,048, . . . , 2^(N−1)}
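The positional rule in [0063] and [0064] can be written directly: a 1-bit at position k contributes 2^(k−1), and a 0-bit contributes nothing. This is a minimal sketch. It assumes bit-1 is the rightmost character of the string (an assumption, since FIG. 3.0 is not reproduced here), and the helper names `bit_values` and `sds_sum` are hypothetical.

```python
def bit_values(bits: str) -> list[int]:
    """Decimal value contributed by each bit, listed from bit-1 upward.

    Assumes bit-1 is the rightmost character. A 1-bit at position k
    contributes 2**(k - 1); a 0-bit contributes 0 wherever it sits.
    """
    return [
        (1 << (k - 1)) if ch == "1" else 0
        for k, ch in enumerate(reversed(bits), start=1)
    ]


def sds_sum(bits: str) -> int:
    """Hypothetical helper: sum of all bit values, i.e. the SDS's decimal sum."""
    return sum(bit_values(bits))


print([1 << (k - 1) for k in range(1, 13)])  # potential values for bits 1..12
print(bit_values("1101"))                   # [1, 0, 4, 8]
print(sds_sum("1101"))                      # 13
```

Under this bit-order assumption, `sds_sum(bits)` equals the ordinary binary value `int(bits, 2)`.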
[0065] To illustrate in detail how the bits in the [FIG. 3.0] SDS were calculated, all 80 source bits and their equivalent decimal values are listed in [FIG. 3.1]. Consecutively adding each bit in the [FIG. 3.1] SDS produces a decimal sum of 8.140274939e22, a 23-digit number. Expressed as a whole number, its precise decimal value is 81,402,749,386,839,761,113,321. To realize a material level of data compression, this 23-digit decimal sum can be synthesized into an alternate mathematical expression such as 121^11 (one-hundred-twenty-one-to-the-eleventh-power). This alternate numerical expression can then be coded into a machine-readable KDP written as: 121^11
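The identity claimed in [0065] can be confirmed exactly. The sketch also prints the 80-bit pattern whose positional sum is this value, assuming bit-1 is the rightmost bit. That pattern is derived from the number itself, not transcribed from FIG. 3.1.

```python
fig31_sum = 81_402_749_386_839_761_113_321   # decimal sum from paragraph [0065]
kdp_value = 121 ** 11

print(kdp_value == fig31_sum)    # True
print(len(str(fig31_sum)))       # 23
print(fig31_sum.bit_length())    # 77, so it fits within the 80-bit SDS

# 80-bit pattern with this positional sum (leading zeros pad it to 80 bits).
pattern = format(fig31_sum, "080b")
print(pattern)
```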
[0066] The data needed to encode the mathematical expression 121^11 is only 24 bits (3 bytes). Specifically, the decimal values (121) and (11) can each be encoded within a single 8-bit group (one byte) because, in the binary system, each such group can represent decimal values from 0 through 255. The ASCII symbol (^) can also be encoded using 1 byte of data.
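A byte-per-field layout like the one described in [0066] can be sketched as three bytes: base, ASCII "^", exponent. The layout and the helper names `pack_power_kdp` and `unpack_power_kdp` are hypothetical illustrations. Because every field occupies a full byte, 121^11 takes 24 bits here, rather than the 19 bits (7 + 8 + 4) that a minimal-width tally would give.

```python
def pack_power_kdp(base: int, exponent: int) -> bytes:
    """Hypothetical layout: one byte each for base, ASCII '^', and exponent."""
    for n in (base, exponent):
        if not 0 <= n <= 255:
            raise ValueError(f"{n} does not fit in one byte")
    return bytes([base, ord("^"), exponent])


def unpack_power_kdp(data: bytes) -> int:
    """Reverse of pack_power_kdp: rebuild and evaluate base**exponent."""
    base, symbol, exponent = data
    if symbol != ord("^"):
        raise ValueError("unexpected operator byte")
    return base ** exponent


packed = pack_power_kdp(121, 11)
print(packed.hex(), len(packed) * 8)   # 795e0b 24
print(unpack_power_kdp(packed))        # 81402749386839761113321
```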
[0067] In addition to successively adding the values of individual source bits into a combined output decimal sum, the dynamic functionality of the SBS supersubstitution framework also allows the digital footprint of any arbitrary-length binary data set to be perfectly replicated by calculating the sum of its source bits independently and exclusively by their spatial values, which are derived from their exact positions within the set. Accordingly, for an 8-bit data set composed entirely of binary ones (1-bits), the maximum spatial-bit value that can be obtained is 36. This decimal value is calculated by successively adding the base value of each individual bit according to its position within the set: 1+2+3+4+5+6+7+8 = 36. Conversely, for clarification, if an 8-bit data set were composed of seven binary zeros (0-bits) and a single binary one (1-bit) in the 8th-bit position, the total calculated spatial-bit value of the set would be 8. Summing the positions of individually interpreted source bits across a spatially oriented data set presents an alternative, viable method of expressing a data set's combined numerical sum more economically. This spatially oriented, vector-based calculation method necessarily involves an expanded algorithmic process to identify and correct numerical redundancies and to produce a perfect output KDP, with a collateral mantissa (if any), as compact as that of the primary SDS-to-KDP method detailed herein.
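The spatial-bit value in [0067] is the sum of the 1-based positions of the 1-bits. This sketch assumes bit-1 is the rightmost character, and the helper name `spatial_bit_value` is hypothetical. The paragraph anticipates "numerical redundancies", and the last line shows one: two different 8-bit sets that share a spatial-bit value of 3.

```python
def spatial_bit_value(bits: str) -> int:
    """Sum of the 1-based positions of all 1-bits (bit-1 assumed rightmost)."""
    return sum(k for k, ch in enumerate(reversed(bits), start=1) if ch == "1")


print(spatial_bit_value("11111111"))   # 36 = 1+2+...+8
print(spatial_bit_value("10000000"))   # 8: a single 1-bit in the 8th position
print(spatial_bit_value("00000011"), spatial_bit_value("00000100"))  # 3 3
```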
3.2 SBS Algorithm Scheme with a Multi-Kinetic Data Primer Number of a Source Data Set
[0068] The Source Data Set shown in [FIG. 3.0] demonstrates how the bits of an SDS can be calculated into an equivalent decimal value and further synthesized into an alternate numerical expression, which, in the final stage of the SBS scheme, is used as the input data for the source's KDP. In the above SDS-to-KDP demonstration, the decimal sum that resulted from calculating the SDS's bits was precise enough to be synthesized into a single exponential expression (121^11) without any collateral decimal remainder. Because there are infinitely many numerical values that can be calculated from the analysis of binary data sets, it is a mathematical certainty that not every sum can be synthesized without a collateral decimal remainder. Therefore, the following SDS-to-KDP demonstration shows how an SDS with an imperfect decimal sum can be synthesized into a perfect KDP using multiple primers. This SDS is shown in [FIG. 3.2.1].
[0069] When the individual bits shown in [FIG. 3.2.1] are consecutively added together, the resulting decimal sum is 2,432,902,008,176,640,000. When this sum is tested to determine whether it can be synthesized into a neat high-powered exponential expression of equivalent value (that is, an expression without any collateral decimal remainder), it is found to be numerically imperfect. Whenever an imperfect source sum is produced, the simplest method of calculating its most approximate base primer is to subtract the binary magnitude variable (power of two) that is the closest numerical approximation to the output decimal sum of the SDS. Since the output sum of the SDS is 2.432902008e18, the closest value that can be expressed as a binary magnitude variable is 2^61, which produces a decimal value of 2.305843009e18. To calculate the next viable (2nd-order) sub-primer, the numerical disparity between the SDS sum and the newly obtained base primer value must first be ascertained. Subtracting the base primer value from the SDS sum leaves a decimal remainder of 1.27058999e17. When this remainder is tested to determine whether it can be synthesized into a neat equivalent expression, its most approximate equivalent is found to produce a mantissa (collateral decimals to the right of the decimal point).
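The base-primer step in [0069] amounts to finding the largest power of two that does not exceed the sum and subtracting it. Python's `int.bit_length()` gives that exponent directly. This is a minimal sketch of that one step only.

```python
sds_sum = 2_432_902_008_176_640_000     # decimal sum of the FIG. 3.2.1 SDS

exponent = sds_sum.bit_length() - 1     # largest n with 2**n <= sds_sum
base_primer = 2 ** exponent
remainder = sds_sum - base_primer

print(exponent)                         # 61
print(f"{base_primer:,}")               # 2,305,843,009,213,693,952
print(f"{remainder:,}")                 # 127,058,998,962,946,048
```

The next power, 2^62, exceeds the sum and lies farther from it, so 2^61 is also the closest power of two overall.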
[0070] Whenever a sub-primer is found to have a mantissa, the simplest way to determine whether it can be used as a viable output sub-primer is to calculate the closest square or cube root of the number and find the most approximate non-negative value with the smallest mantissa (i.e., the fewest collateral decimals). In the case of the decimal remainder 1.27058999e17, the most viable sub-primer variable is found by calculating its cube root (∛), which produces a decimal value of 502,730.3947. This sub-primer output variable, 502,730.3947^3, can be used as a viable 2nd-order KDP number because, when it is calculated into its whole decimal form and compared for accuracy against its source variable, it does not produce any collateral decimals. Therefore, the two KDP numbers that can be integrated to produce a perfect output KDP number are detailed as follows:
TABLE-US-00001
1. Precise calculated decimal sum of SDS: 2,432,902,008,176,640,000
2. MINUS base primer binary magnitude value (2^61): 2,305,843,009,213,693,952
3. MINUS 2nd-order sub-primer value (502,730.3947^3): 127,058,998,962,946,048
Decimal remainder (if any): ZERO
[0071] In the final analysis, the perfect multi-variable output KDP number is:
[0072] 2^61 + 502730.3947^3
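A KDP like 2^61 + 502730.3947^3 can be evaluated exactly with Python's standard `fractions.Fraction`. That treats 502,730.3947 as the exact rational 5,027,303,947/10,000 rather than a rounded float, since a double cannot resolve single units at magnitudes near 10^17. Because that decimal ends in a nonzero fourth digit, its cube has exactly twelve decimal places, so the total cannot be a whole number. Evaluated exactly, it exceeds the SDS sum by roughly 1.24 × 10^5. An exact reconstruction would therefore depend on additional correction data, such as the "KDP mantissa-correction codes" listed among the extrinsic KDP file data.

```python
from fractions import Fraction

sds_sum = 2_432_902_008_176_640_000
remainder = sds_sum - 2 ** 61               # 127,058,998,962,946,048

# 502,730.3947 as an exact rational number, then cubed exactly.
sub_primer = Fraction(5_027_303_947, 10_000) ** 3
kdp_value = 2 ** 61 + sub_primer

print(sub_primer.denominator)      # 1000000000000 (twelve decimal places)
print(kdp_value == sds_sum)        # False
print(float(kdp_value - sds_sum))  # about 1.24e5; the KDP overshoots
```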
[0073] When this multi-variable output KDP number is calculated into a single whole number, it produces a decimal value of 2,432,902,008,176,640,000, which is precisely equivalent to the calculated decimal sum of the SDS. The data needed to encode the mathematical expression 2^61 + 502730.3947^3 as a perfect KDP number is 73 bits (less than 10 bytes). These 73 bits consist of:
TABLE-US-00002
1. 2 bits to represent the base decimal number (2)
2. 8 bits to represent the ASCII symbol (^) to signify an exponential power
3. 6 bits to represent the exponential-power decimal magnitude (61)
4. 8 bits to represent the ASCII symbol (+) to signify an addition operation
5. 19 bits to represent the whole decimal number (502,730)
6. 8 bits to represent the ASCII symbol (.) to signify a decimal point (period)
7. 12 bits to represent the decimal value of the mantissa (3947)
8. 8 bits to represent the ASCII symbol (^) to signify an exponential power
9. 2 bits to represent the exponential-power decimal magnitude (3)
= 73 total bits of data to represent the KDP 2^61 + 502730.3947^3.
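The 73-bit total in TABLE-US-00002 can be reproduced by tallying each number at its minimal binary width and each symbol at 8 bits. The field list and the helper `number_bits` are illustrative assumptions that mirror the table's order. As in the table, no length or delimiter information is counted, so this is a size tally rather than a decodable format.

```python
def number_bits(n: int) -> int:
    """Minimal binary width of a non-negative integer (at least 1 bit)."""
    return max(1, n.bit_length())


SYMBOL_BITS = 8  # one ASCII character per operator or decimal point

# Fields of 2^61 + 502730.3947^3, in the order the table lists them.
fields = [2, "^", 61, "+", 502_730, ".", 3947, "^", 3]

total = sum(SYMBOL_BITS if isinstance(f, str) else number_bits(f) for f in fields)
print(total, (total + 7) // 8)   # 73 10
```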
[0074] The 73 total bits of data needed to express the above perfect KDP can be encoded within 10 bytes because, at 8 bits per byte, the maximum data capacity of 10 bytes is 80 bits. In data-compression terms, encoding the binary information contained in an 8-byte SDS into a 10-byte multi-variable KDP number would mathematically indicate a negative net output compression ratio of 0.80:1, which is the net compression yield of 8 bytes increased to 10 bytes (0.0097 KB).
[0075] This particular example of a multi-primer KDP is included herein to demonstrate that it is, in fact, mathematically possible to produce a negative net compression yield by applying the SBS scheme to an arbitrary-length SDS.
[0076] Although it is highly unlikely that an SDS as small as 8 bytes would have any practical use beyond machine-readable-only command prompts and predetermined programming functions, an 8-byte SDS was specifically chosen because it approximates the algorithmic/substitution threshold that determines whether applying the SBS scheme yields a positive or negative output compression. It is important to emphasize that, as the prior algorithm examples demonstrate, the SBS scheme uses multi-input data fields to encode an SDS into an output KDP whose range of unique numerical input data is virtually limitless. Whenever applying the SBS scheme produces a negative net compression yield, it is mathematically possible to synthesize other multi-primer alternative variables that produce more precise decimal sums, which, upon further calculation, can materially affect whether the final KDP synthesis yields a positive or negative net compression ratio.
4. Experimental Results And Discussions
[0077] The algorithm structure of the SBS-KDP scheme uses dual binary input data fields to encode up to 64 bits (8 bytes) of scalable KDP source information per field. The precise range of numerical values that can be encoded within each 64-bit number field is 0 through 18,446,744,073,709,551,615 (about 18 quintillion, or 2^64 − 1), which is used to represent the corresponding range of decimal values produced by calculating the bits of an SDS. The two number fields are functionally partitioned by a third input data character field used to represent dynamic mathematical functions such as exponential powers (^), square and cube roots (ˣ√), factorials (x!), or any other math operation (+, −, ÷, ×, etc.).
[0078] When both input number fields are coded to represent the maximum decimal values of their 64-bit capacities and used in tandem with the input character field to express a dynamic mathematical operation (a high-powered exponential, for example), the combined tri-field input would be:
[0079] 18446744073709551615^18446744073709551615
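The tri-field structure of [0077] and [0078] can be sketched with Python's standard `struct` module: two unsigned 64-bit fields around a one-byte operator field. The big-endian layout, the single-ASCII-byte operator, and the helper name `pack_tri_field` are assumptions for illustration. Non-ASCII operators such as a root sign would need a different encoding.

```python
import struct

U64_MAX = 2 ** 64 - 1   # 18,446,744,073,709,551,615


def pack_tri_field(left: int, op: str, right: int) -> bytes:
    """Hypothetical layout: big-endian u64, one ASCII operator byte, u64 (17 bytes)."""
    op_byte = op.encode("ascii")
    if len(op_byte) != 1:
        raise ValueError("operator must be a single ASCII character")
    return struct.pack(">QcQ", left, op_byte, right)


record = pack_tri_field(U64_MAX, "^", U64_MAX)
print(len(record), len(record) * 8)   # 17 136
```

With standard sizes (`>`), `struct` inserts no alignment padding, so the record is exactly 8 + 1 + 8 bytes.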
[0080] The data needed to represent this specific maximum-value KDP number is only 136 bits (17 bytes), whereas the amount of source data that can be encoded is 2.3 sextillion bytes (2.3 zettabytes, or ZB) with 100.000% lossless data retention efficiency. If no other extrinsic SBS-KDP file data are needed to produce a perfect KDP source number, these 17-byte scheme metrics would mathematically indicate an output compression ratio of 138 EB:0.017 KB, which is the net compression yield of a 2.3 ZB SDS reduced to 17 bytes (0.017 KB).[3]
[0081] [3] As previously explained, whenever any extrinsic SBS-KDP file data are embedded into a perfect KDP source number, the output size of the KDP could increase from its 17-byte Quantum Footprint to its maximum scalable payload capacity of 32 bytes (0.032 KB). Including any such extrinsic KDP file data would necessarily decrease the output compression ratio from 139 EB:0.017 KB to 73 EB:0.031 KB, which is the net compression yield of a 2.3 ZB SDS reduced to 17 bytes and 32 bytes, respectively.
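The size of the expanded maximum-value KDP of [0079] can be estimated with logarithms, since the number itself is far too large to materialize. This sketch assumes that the relevant size is the binary length of 18446744073709551615^18446744073709551615, which has about e · log2(b) bits for b^e. Under that assumption the expansion is roughly 2^70 bits, about 1.48 × 10^20 bytes (about 148 EB, or 128 EiB).

```python
import math

U64_MAX = 2 ** 64 - 1

# Bit length of U64_MAX ** U64_MAX, estimated as exponent * log2(base).
approx_bits = U64_MAX * math.log2(U64_MAX)
approx_bytes = approx_bits / 8

print(f"{approx_bits:.4e} bits")    # 1.1806e+21 bits
print(f"{approx_bytes:.4e} bytes")  # 1.4757e+20 bytes
print(approx_bytes / 2 ** 60)       # 128.0 (EiB)
```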
[0082] In this non-provisional patent submission, the inventor hereby makes the following CLAIM(S) to substantiate, support, and corroborate the uniquely defensible nature of the preceding Summary of the Invention entitled: SONIC BOOM: System for Reducing the Digital Footprint of Data Streams Through Lossless Scalable Binary Substitution.