Method and System for Content Agnostic File Indexing

Abstract

A computer-implemented method for content-agnostic referencing of a binary data file, the method comprising: determining a length of the binary data file, the length comprising the number of bits of the binary data file; for the determined length, generating all permutations of data of the determined length; locating an index within the generated permutations, wherein the index is the starting position of the binary data file within the generated permutations; and using the length and the index to indicate the binary data file.

Claims

1. A computer-implemented method for content-agnostic referencing of a binary data file, the method comprising: determining a length of the binary data file, the length comprising the number of bits of the binary data file; for the determined length, generating all permutations of data of the determined length; locating an index within the generated permutations, wherein the index is the starting position of the binary data file within the generated permutations; and using the length and the index to indicate the binary data file.

2. The method of claim 1, wherein using the length and the index to indicate the binary data file comprises: persisting on a storage device the length and the index instead of the binary data file.

3. The method of claim 1, wherein using the length and the index to indicate the binary data file comprises: Transmitting the length and the index instead of the data file.

4. The method of claim 3 wherein transmitting transmits the length and index on a network.

5. The method of claim 3 wherein transmitting transmits the length and index on a bus.

6. (canceled)

7. A method of compressing a data file comprising a sequence of bytes, the method comprising Calculating the number of bytes in the data file; Generating all possible permutations of data of the size of the calculated number of bytes; Searching through the generated permutations to locate the permutation that matches the data file; Determining the index of the located permutation; and using the number of bytes and index to indicate the data file.

8. The method of claim 7 wherein using the number of bytes and index to indicate the data file comprises persisting the number of bytes and index on a storage device.

9. The method of claim 8 wherein the storage device is a disk.

10. The method of claim 9 wherein using the number of bytes and index to indicate the data file comprises transmitting the number of bytes and index instead of the data file.

11. The method of claim 10 wherein transmitting transmits the bytes and index over a network.

12. The method of claim 10 wherein transmitting transmits the bytes and index via a bus

13. A method of compressing a data file, the method comprising Calculating the size of the data file; Generating all possible permutations of data of the size data file; Searching through the generated permutations to locate the permutation that matches the data file; Determining the index of the located permutation; and using the size and index to indicate the data file.

14. The method of claim 13 where the data file is binary data.

15. The method of claim 13 where the data file is n-ary data.

16. The method of claim 13 where the index is an integer.

17. The method of claim 13 wherein using the size and index to indicate the data file comprises transmitting the size and index.

18. The method of claim 17 wherein transmitting comprises transmitting on a network.

19. The method of claim 17 wherein transmitting comprises transmitting on a bus.

20. The method of claim 13 wherein using the size and index to indicate the data file comprises storing the size and index.

Description

BRIEF DESCRIPTION OF THE DRAWINGS

[0008] For a more complete understanding of the present disclosure and its advantages, reference is now made to the following descriptions, taken in conjunction with the accompanying drawings, in which:

[0009] FIG. 1 is a flowchart outlining the steps of one embodiment of the present disclosure.

[0010] FIG. 2 is another flowchart outlining the steps of another embodiment of the present disclosure.

[0011] Similar reference numerals refer to similar parts or steps throughout the several views of the drawings.

DETAILED DESCRIPTION OF THE EMBODIMENTS

[0012] The present disclosure relates to a method for content-agnostic indexing of data. The method may be used for a variety of computer-specific needs, including for example as a file referencing system or a compression system.

[0013] The disclosure below describes the invention in connection with compression of binary data as exemplary, but the teachings work as well with any type of data, better termed n-ary data. For example, the method and system also works with qubits and bits.

[0014] One embodiment of the present invention comprises a method as described in the flow chart depicted in FIG. 1. Binary data (n.sub.i) (for instance, a data file) to be persisted or transmitted is analyzed to determine its length in bits (l(n.sub.i). Using this information, at step 106, the method calculates all permutations of data of the identified length. For example, if the input data is:

[0015] then the input data is 2-bits long. At step 106, all permutations of 2-bits will be generated, namely: [0016] {00} {01} {10} {11}

[0017] At step 108, the method determines the index (n.sub.f) of the input binary data file in the generated permutations. Using the example above, the index (n.sub.f) returned would be 1. Finally, rather than storing or transmitting the input binary data (i.e. 01), the system instead stores the length (2) and the index (1).

[0018] When the need comes to decode the original input data (for instance, a request to retrieve the original binary data from disk, or receipt of the transmitted data across a network), the method needs only a length (l(n.sub.i) and an index (n.sub.f) as input. Using the above example, the input provided would be the length (2) and the index (1). As shown in FIG. 2, the system calculates all permutations of the inputted length. As above, that would generate the following permutations: [0019] {00} {01} {10} {11}

[0020] The system would then go to the provided index (1 in the above example) and return the permutation. Again, using the above example, this would return 01 the original binary data.

[0021] The above method has been described for purposes of example in terms of a binary system (i.e. the input data is binary data). The method and system work similarly for n-ary systems. While the binary system describes above works essentially in the Euclidean plane, with n-ary data Hilbert spaces conceptually provide the same advantages. The method and process can be generalized for n-ary data per below:

dn=p(i)

(dn)n=p(f)

d=order of the system
n=length in appropriate n-ary units respective to the order of the system
p(i)=initial index
p(f)=final index

TABLE-US-00001 Order of System Visual Reference (d) Representation Key Search Pattern 1 String n/x Left to Right 2 Plane n/x/y Top Left to Bottom Right 3 3(fold) n/x/y/z Top Back Left to Bottom Front Right D D(fold) n/x/y/z/ . . . Top Back Left . . . to Bottom Front Right . . .

[0022] It should be noted that given two alternative ordered systems with the same input file, the system with the higher order will have a higher n-ary density relative to the alternative with a lesser ordered system.

[0023] An example of the method is disclosed in the following Ruby code snippets. The below snippet demonstrates a method as disclosed in FIG. 1:

TABLE-US-00002 class Input require securerandom def create(k) input_binary = SecureRandom.hex(k) end def clean(k) input_string = create(k).unpack(B*).first.to_s end def build(n) permutation = (0..2**n1).map { |i| %0#{n}b % i } end def self.kmp_search(string, substring) return nil if string.nil? or substring.nil? pos = 2 cnd = 0 failure_table = [1, 0] while pos < substring.length if substring[pos 1] == substring[cnd] failure_table[pos] = cnd + 1 pos += 1 cnd += 1 elsif cnd > 0 cnd = failure_table[cnd] else failure_table[pos] = 0 pos += 1 end end m = i = 0 while m + i < string.length if substring[i] == string[m + i] i += 1 return m if i == substring.length else m = m + i failure_table[i] i = failure_table[i] if i > 0 end end return nil end def kmp_search(substring) Input.kmp_search(self, substring) end end init = Input.new input = init.clean(1) depth = input.length generate = init.build(depth) steps = generate.join.to_s step = Input.kmp_search(#{steps} ,#{input}) p input p depth p step

[0024] The below snippet demonstrates a method as disclosed in FIG. 2, using an input length (l(n.sub.i) of 16 and an index (n.sub.f) of 72,629:

TABLE-US-00003 class Output def build(n) permutation = (0..2**n1) .map { |i| %0#{n}b % i } end end depth = 16 step = 72629 init = Output.new create = init.build(depth) interpret = create.join.to_s compute = (depth + step) 1 output = interpret[step..compute] .gsub (/\s\w+$/,...) p output

[0025] The method and system may preferably be implemented in a computing system, which can include a personal computer, a workstation, a network computer, a hand held computer, or any other computing system. Further, the system may be written as a software program in any appropriate computer language.

[0026] The system includes one or more processing devices, which may be any computer processing unit, and could be a single central processing unit, or a number of processing units configured to operate either in sequence or in parallel. The processing device can be configured to execute software processes which implement the steps disclosed herein. The system may also include a memory capable of storing the steps necessary for a processing device to implement the steps disclosed herein. This memory could be in the form of memory resident within the processing device or in the form of standalone memory coupled to the processing unit via a communication path, such as a bus or a network.

[0027] Although this disclosure has been described in terms of certain embodiments and generally associated methods, alterations and permutations of these embodiments and methods will be apparent to those skilled in the art. Accordingly, the above description of example embodiments does not constrain this disclosure. Other changes, substitutions, and alterations are also possible without departing from the spirit and scope of this disclosure.

Method and System for Content Agnostic File Indexing

Assignee

Inventors

Cpc classification

Classification Explorer

H03M7/3077

ELECTRICITY

Classification Explorer

G06F16/2237

PHYSICS

Classification Explorer

H03M7/55

ELECTRICITY

Classification Explorer

G06F16/13

PHYSICS

Classification Explorer

G06F16/1744

PHYSICS

International classification

Classification Explorer

G06F7/00

PHYSICS

Abstract

Claims

Description