EFFICIENT ADAPTIVE SEISMIC DATA FLOW LOSSLESS COMPRESSION AND DECOMPRESSION METHOD
20170317688 · 2017-11-02
Assignee
Inventors
- Shanhui XU (Beijing, CN)
- Jian GUO (Beijing, CN)
- Changchun Yang (Beijing, CN)
- Guangding LIU (Beijing, CN)
Cpc classification
G01V1/32
PHYSICS
G01V11/00
PHYSICS
G01V1/28
PHYSICS
H03M7/30
ELECTRICITY
G01V1/247
PHYSICS
H03M7/70
ELECTRICITY
International classification
Abstract
An efficient adaptive seismic data flow lossless compression and decompression method. It addresses the problem that seismic data occupies a large amount of storage space and reduces transmission efficiency, and it is used to efficiently compress geophysical instrument data, particularly seismic data after 24-bit analog-to-digital conversion. In the method, a data stream is compressed losslessly in real time, and each sample, originally 24 bits (3 bytes), is adaptively encoded into 1, 2, or 3 bytes. Integers outside these ranges that can still be represented as 24-bit signed integers require 4 bytes after the compression operation. The method saves a large amount of storage space and markedly increases data transmission efficiency.
Claims
1. An efficient adaptive seismic data flow lossless compression and decompression method for efficiently compressing 24-bit analog-to-digital converted geophysical prospecting equipment data, the data being seismic data, wherein data streams are compressed losslessly in real time, sampled data in the original 24-bit 3-byte format is compressed to a 1-, 2-, or 3-byte format, and a small amount of data is converted to 4 bytes using a specific encoding process, the method comprising: determining the required bits before data compression based on the size of the original data; compressing data in the [0, 63] and [−64, −1] intervals into 1 byte; compressing data in the [64, 8191] and [−8192, −65] intervals into 2 bytes; compressing data in the [8192, 1048575] and [−1048576, −8193] intervals into 3 bytes, so that the number of occupied bytes is the same as in the original data; wherein integers outside the above intervals are represented as other 24-bit signed integer data, and these integers require 4 bytes after the compression algorithm operation; wherein, in the data compression, different operators and operation codes (opcodes) are selected according to the required number of bytes and to whether the value is positive or negative, and a bit operation is performed on a specified byte; wherein the opcode is a one-byte binary number; wherein there are 2 types of operators: (1) bitwise AND (&) and (2) bitwise OR (|); setting a specified bit position; leaving the bytes lower than the specified byte position unprocessed; deleting bytes higher than the specified byte position as invalid; wherein the operators and opcodes for the first byte are selected under different conditions.
2. The method according to claim 1, wherein the compression algorithm operation is a cyclical operation performed on the 3 bytes of each data sampling point, the method further comprising: reading a byte; making a numerical judgment; dividing the processing into 8 compression modes based on the numerical size and the positive or negative sign of the value; and, as the core of the compression algorithm operation, leaving unchanged the bits that carry the magnitude of the value and deleting redundant sign bits.
3. The method according to claim 2, wherein the compression has the following 8 compression modes: (1) for data in the [000000h, 00003Fh] interval, the end byte is used as the compression result, compressing 3 bytes into 1 byte; because the first 2 bytes of data within the interval are both 00h, the first 2 bits of the end byte are 00; (2) for data in the [FFFFC0h, FFFFFFh] interval, the end byte ANDed bitwise with BFh is used as the compression result; the end byte changes from 11bbbbbb to 10bbbbbb, compressing the data into 1 byte; (3) for data in the [000040h, 001FFFh] interval, the middle byte is ORed bitwise with 40h, and the result of the operation and the end byte are then used as the compression result; the middle byte changes from 000bbbbb to 010bbbbb, compressing the data into 2 bytes; (4) for data in the [FFE000h, FFFFBFh] interval, the middle byte is ANDed bitwise with DFh, and the result of the operation and the end byte are then used as the compression result; the middle byte changes from 111bbbbb to 110bbbbb, compressing the data into 2 bytes; (5) for data in the [002000h, 0FFFFFh] interval, the first byte is ORed bitwise with 60h, and the result of the operation, the middle byte, and the end byte are then used together as the compression result; the first byte changes from 0000bbbb to 0110bbbb, the middle byte and end byte remain unchanged, and the data occupies 3 bytes both before and after compression; (6) for data in the [F00000h, FFDFFFh] interval, the first byte is ANDed bitwise with EFh, and the result of the operation, the middle byte, and the end byte are then used together as the compression result; the first byte changes from 1111bbbb to 1110bbbb, the middle byte and end byte remain unchanged, and the data occupies 3 bytes both before and after compression; (7) for data in the [100000h, 7FFFFFh] interval, the original 3 bytes are kept unchanged and 70h is prepended as the first byte; the newly formed 4 bytes are the compression operation result; (8) for data in the [800000h, EFFFFFh] interval, the original 3 bytes are kept unchanged and F0h is prepended as the first byte; the newly formed 4 bytes are the compression operation result.
4. The method according to claim 3, wherein modes 7 and 8 increase the number of occupied bytes from 3 bytes in the original data to 4 bytes after the compression; wherein in modes 5 and 6, 3 bits in addition to the sign bit are required for the flag bits, i.e., in "0110" and "1110," so the range of positive and negative data that can be expressed is reduced from 23-bit 0 (1) bbbbbbb bbbbbbbb bbbbbbbb to 20-bit 0 (1) 110bbbb bbbbbbbb bbbbbbbb; wherein valid data occupying more than 20 and up to 23 bits cannot be represented by 3 bytes; in addition, the added first bytes "70h" and "F0h" in the 4-byte modes are not unique; wherein the flag bits of 4-byte compressed data use the judgment codes "01110" or "11110"; specifically, the first byte can be any number of the form 01110bbb or 11110bbb, and any such number used as the first byte does not affect the judgment of the number of bytes to decompress or the decompression result.
5. The method according to claim 1, wherein the data compression is operated independently on each data sample and does not rely on other sample data, and the data stream is compressed in real time.
6. The method according to claim 1, further comprising: decompressing the compressed data; wherein, when decompression operations are conducted on binary data compressed using the compression method, the number of bytes occupied by each data sampling point is first determined in accordance with the compression coding rule; wherein, via a series of set decoding operations, the 1-, 2-, 3-, or 4-byte losslessly compressed seismic data are restored to the original 24-bit 3-byte data format at any time; wherein the step of decompressing the compressed data comprises the following substeps: (1) selecting a byte and judging the number of bytes using a mask determination method, wherein the selected byte is the start of a data sample; (2) recovering the bytes, wherein the number of bytes determined in substep (1) is intercepted, the bit operation is performed on the first byte to recover the valid byte, and the remaining bytes are left intact; (3) performing sign bit expansion, wherein, for sampled data of fewer than 3 bytes, sign bytes that do not affect the value and only represent the sign of the sampled data are added; (4) restoring the sampled data to the 3-byte 24-bit format.
7. The method according to claim 6, wherein, beginning with the first byte of the compressed file or data stream, the byte is masked by a bitwise AND with F0h, and subsequent processing is divided into 8 different modes based on the operation result.
8. The method according to claim 6, wherein the 8 different modes are as follows: if the operation result is 00h, 10h, 20h, or 30h, the sample is a positive data sampling point occupying 1 byte, namely the current byte, which is taken as the end byte; 00h is added as the middle byte and 00h as the first byte, giving 3 bytes that form the decompression result (the same value as before compression); if the operation result is 80h, 90h, A0h, or B0h, the sample is a negative data sampling point occupying 1 byte; the current byte is ORed bitwise with C0h and the result is used as the end byte, and FFh is added as the middle byte and FFh as the first byte, giving 3 bytes that form the decompression result (the same value as before compression); if the operation result is 40h or 50h, the sample is a positive data sampling point occupying 2 bytes starting with the current byte; the current byte is ANDed bitwise with 1Fh and the result is used as the middle byte, the next byte is the end byte, and 00h is added as the first byte, giving 3 bytes that form the decompression result (the same value as before compression); if the operation result is C0h or D0h, the sample is a negative data sampling point occupying 2 bytes starting with the current byte; the current byte is ORed bitwise with E0h and the result is used as the middle byte, the next byte is the end byte, and FFh is added as the first byte, giving 3 bytes that form the decompression result (the same value as before compression); if the operation result is 60h, the sample is a positive data sampling point occupying 3 bytes starting with the current byte; the current byte is ANDed bitwise with 0Fh and the result is used as the first byte, and the next 2 bytes are the middle byte and end byte; these 3 bytes are the decompression result (the same value as before compression); if the operation result is E0h, the sample is a negative data sampling point occupying 3 bytes starting with the current byte; the current byte is ORed bitwise with F0h and the result is used as the first byte, and the next 2 bytes are the middle byte and end byte; these 3 bytes are the decompression result (the same value as before compression); if the operation result is 70h, the sample is a positive data sampling point occupying 4 bytes starting with the current byte; the current byte is not needed, and the next 3 bytes are the original data; if the operation result is F0h, the sample is a negative data sampling point occupying 4 bytes starting with the current byte; the current byte is not needed, and the next 3 bytes are the original data; wherein, after a data sample is decompressed using any of the above 8 modes, the pointer jumps to the corresponding byte to perform the next sampled data judgment and decompression operation: 1 byte after modes 1 and 2; 2 bytes after modes 3 and 4; 3 bytes after modes 5 and 6; and 4 bytes after modes 7 and 8.
9. The method according to claim 1, wherein the method is applied to 24-bit sampled seismic data files stored in binary form; after the seismic data files are processed using the compression algorithm operation, which significantly reduces the file size, the files are fully restored to their original form using a decompression algorithm; and, for seismic data files stored as ordinary 4-byte (32-bit) signed integers, which are sign bit expansions of 24-bit integers with an added sign byte of 00h or FFh that does not affect the values, the sign byte of the four bytes of each sample is ignored and discarded, and the compression algorithm operation is conducted on the remaining three bytes; decompression is carried out using the method first, followed by sign bit expansion to restore the original 4-byte integer data file; similar operations are performed on 4-byte integer seismic data streams for compression and decompression.
10. The method according to claim 1, wherein the process of using the efficient adaptive seismic data lossless compression and decompression method in a seismic acquisition system further comprises: acquiring the 24-bit integer data output by the analog-to-digital conversion module of a station; performing the compression operation in the master CPU or FPGA module; after compression, transferring the data via the network to the master module or FPGA module of a power station or crossover station, and performing a decompression operation based on the decompression algorithm to restore the 24-bit integer data; alternatively, decompressing the compressed data after the seismometer host system receives it, or storing the compressed data directly in a file and decompressing it later to restore the original data.
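Claim 9 reduces a file of 4-byte signed integers to the 24-bit case by discarding the sign byte before compression and restoring it by sign bit expansion afterwards. The Python sketch below shows only that conversion step. It assumes big-endian storage (the claims do not state a byte order), and the helper names `int32_to_24bit` and `int24_to_int32` are hypothetical.

```python
import struct


def int32_to_24bit(word: bytes) -> bytes:
    """Drop the sign byte of one 4-byte signed sample (hypothetical helper).

    Assumes big-endian storage, so the sign byte (00h or FFh) comes first.
    """
    (value,) = struct.unpack(">i", word)
    if not -0x800000 <= value <= 0x7FFFFF:
        raise ValueError("sample does not fit in a signed 24-bit integer")
    return word[1:]  # first, middle and end byte of the 24-bit value


def int24_to_int32(three: bytes) -> bytes:
    """Sign-extend 3 bytes back to a 4-byte big-endian signed integer."""
    sign_byte = b"\xff" if three[0] & 0x80 else b"\x00"
    return sign_byte + three


print(int32_to_24bit(struct.pack(">i", -65)).hex())  # ffffbf
```

A little-endian file would need `"<i"` and the sign byte taken from the other end; the 3 remaining bytes are what the compression modes operate on.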
Description
BRIEF DESCRIPTION OF THE DRAWINGS
[0039]
[0040]
DETAILED DESCRIPTION OF THE INVENTION
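[0041] The compression operation is performed cyclically on the 3 bytes of each data sampling point and, according to the numerical size and sign of the sample, follows one of these 8 compression modes: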
[0042] ① Data in the [000000h, 00003Fh] interval use the end byte as the compression result, so 3 bytes are compressed into 1 byte. Because the first 2 bytes of data within the interval are both 00h and the value does not exceed 3Fh, the first 2 bits of the end byte are 00.
[0043] ② Data in the [FFFFC0h, FFFFFFh] interval use the end byte ANDed bitwise with BFh as the compression result. The end byte changes from 11bbbbbb to 10bbbbbb, so the data are compressed into 1 byte.
[0044] ③ For data in the [000040h, 001FFFh] interval, the middle byte is ORed bitwise with 40h. The result of the operation and the end byte are then used as the compression result; the middle byte changes from 000bbbbb to 010bbbbb, so the data are compressed into 2 bytes.
[0045] ④ For data in the [FFE000h, FFFFBFh] interval, the middle byte is ANDed bitwise with DFh. The result of the operation and the end byte are then used as the compression result; the middle byte changes from 111bbbbb to 110bbbbb, so the data are compressed into 2 bytes.
[0046] ⑤ For data in the [002000h, 0FFFFFh] interval, the first byte is ORed bitwise with 60h. The result of the operation, the middle byte, and the end byte are then used together as the compression result; the first byte changes from 0000bbbb to 0110bbbb, the middle byte and end byte remain unchanged, and the data occupy 3 bytes both before and after compression.
[0047] ⑥ For data in the [F00000h, FFDFFFh] interval, the first byte is ANDed bitwise with EFh. The result of the operation, the middle byte, and the end byte are then used together as the compression result; the first byte changes from 1111bbbb to 1110bbbb, the middle byte and end byte remain unchanged, and the data occupy 3 bytes both before and after compression.
[0048] ⑦ For data in the [100000h, 7FFFFFh] interval, the original 3 bytes are kept unchanged and 70h is prepended as the first byte. The newly formed 4 bytes are the compression operation result.
[0049] ⑧ For data in the [800000h, EFFFFFh] interval, the original 3 bytes are kept unchanged and F0h is prepended as the first byte. The newly formed 4 bytes are the compression operation result.
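The eight compression modes above can be summarized in a short Python sketch. It assumes each sample arrives as a Python `int` in the signed 24-bit range and that the compressed bytes are emitted first (most significant) byte first, which is the order in which the modes list the bytes. The name `compress_sample` is hypothetical, not part of the source.

```python
def compress_sample(value: int) -> bytes:
    """Encode one signed 24-bit sample with the 8 compression modes.

    Output order (first byte, then middle byte, then end byte) is an
    assumption of this sketch.
    """
    if not -0x800000 <= value <= 0x7FFFFF:
        raise ValueError("sample is outside the signed 24-bit range")
    raw = value & 0xFFFFFF                     # 24-bit two's-complement pattern
    first, middle, end = raw >> 16, (raw >> 8) & 0xFF, raw & 0xFF
    if 0 <= value <= 63:                       # mode 1: [000000h, 00003Fh]
        return bytes([end])
    if -64 <= value <= -1:                     # mode 2: 11bbbbbb -> 10bbbbbb
        return bytes([end & 0xBF])
    if 64 <= value <= 8191:                    # mode 3: 000bbbbb -> 010bbbbb
        return bytes([middle | 0x40, end])
    if -8192 <= value <= -65:                  # mode 4: 111bbbbb -> 110bbbbb
        return bytes([middle & 0xDF, end])
    if 8192 <= value <= 0x0FFFFF:              # mode 5: 0000bbbb -> 0110bbbb
        return bytes([first | 0x60, middle, end])
    if -0x100000 <= value <= -8193:            # mode 6: 1111bbbb -> 1110bbbb
        return bytes([first & 0xEF, middle, end])
    if value > 0:                              # mode 7: [100000h, 7FFFFFh]
        return bytes([0x70, first, middle, end])
    return bytes([0xF0, first, middle, end])   # mode 8: [800000h, EFFFFFh]


# Usage: concatenate per-sample codes to form the compressed stream.
stream = b"".join(compress_sample(v) for v in [63, -1, 64, -65, 8192])
print(stream.hex())  # 3fbf4040dfbf602000
```

Because each sample is encoded without reference to its neighbours, the function can be applied to samples as they arrive, which is what allows real-time compression of a stream.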
[0050] Modes 7 and 8 actually increase the number of occupied bytes from 3 bytes in the original data to 4 bytes after compression. This is because in modes 5 and 6, 3 bits in addition to the sign bit are required for the flag bits (the "110" in "0110" and "1110"), so the range of positive and negative data that can be expressed is reduced from 23-bit 0 (1) bbbbbbb bbbbbbbb bbbbbbbb to 20-bit 0 (1) 110bbbb bbbbbbbb bbbbbbbb, where 0 (1) denotes the sign bit. Valid data occupying more than 20 and up to 23 bits therefore cannot be represented by 3 bytes. It should also be noted that the added first bytes "70h" and "F0h" in the 4-byte modes are not unique, because the flag bits of 4-byte compressed data only need to form the judgment codes "01110" or "11110." Specifically, the first byte can be any number of the form 01110bbb or 11110bbb, and any such number used as the first byte does not affect the judgment of the number of bytes to decompress or the decompression result.
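The bit budget of the 3-byte modes (5 and 6) and the freedom in choosing the 4-byte prefix can be checked numerically. The Python sketch below assumes only the F0h mask of claim 7; the constant and function names are hypothetical.

```python
# Modes 5 and 6 spend 1 sign bit + 3 flag bits ("110") of the 24-bit word,
# leaving 4 + 8 + 8 = 20 payload bits.
PAYLOAD_BITS = 24 - 1 - 3
MAX_THREE_BYTE = (1 << PAYLOAD_BITS) - 1     # 0FFFFFh = 1048575, mode 5 upper bound
MIN_THREE_BYTE = -(1 << PAYLOAD_BITS)        # -1048576 = F00000h, mode 6 lower bound


def needs_four_bytes(value: int) -> bool:
    """True for signed 24-bit values outside the 20-bit payload of modes 5 and 6."""
    return not MIN_THREE_BYTE <= value <= MAX_THREE_BYTE


def selects_four_byte_mode(first_byte: int) -> bool:
    """True if the F0h mask maps first_byte to mode 7 (70h) or mode 8 (F0h)."""
    return (first_byte & 0xF0) in (0x70, 0xF0)


# Every byte of the form 01110bbb or 11110bbb is accepted as the prefix,
# so 70h and F0h are only one choice among several.
PREFIXES = [0b01110000 | low for low in range(8)] + [0b11110000 | low for low in range(8)]
print(needs_four_bytes(0x0FFFFF), needs_four_bytes(0x100000))  # False True
```

Since the F0h mask ignores the low four bits, the decoder never inspects the bbb part of the prefix, which is why any of these values works.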
[0051] For seismic data under normal circumstances, samples meeting the conditions of modes 1, 2, 3, and 4 are far more numerous than samples meeting the conditions of modes 5 to 8. Because modes 7 and 8 rarely occur, the method achieves a very good overall compression effect.
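The effect of the mode distribution on overall size can be estimated from the interval boundaries alone. The Python sketch below (hypothetical names `compressed_size` and `compression_ratio`) counts bytes per sample; the values in the usage line are arbitrary illustrations, not measured seismic data.

```python
def compressed_size(value: int) -> int:
    """Bytes used by the 8-mode encoding for one signed 24-bit sample."""
    if -64 <= value <= 63:                  # modes 1 and 2
        return 1
    if -8192 <= value <= 8191:              # modes 3 and 4
        return 2
    if -0x100000 <= value <= 0x0FFFFF:      # modes 5 and 6
        return 3
    if -0x800000 <= value <= 0x7FFFFF:      # modes 7 and 8
        return 4
    raise ValueError("sample is outside the signed 24-bit range")


def compression_ratio(samples: list[int]) -> float:
    """Compressed size divided by the original 3 bytes per sample."""
    if not samples:
        raise ValueError("no samples")
    return sum(compressed_size(v) for v in samples) / (3 * len(samples))


# Arbitrary illustrative values: two 1-byte and two 2-byte samples.
print(compression_ratio([5, -20, 300, -4000]))  # 6 / 12 bytes -> 0.5
```

A ratio below 1 means the stream shrank. The 4-byte modes only push it above 1 if large-amplitude samples dominate.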
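[0052] Decompression begins with the first byte of the compressed file or data stream. The byte is masked by a bitwise AND with F0h, and the operation result selects one of the following 8 modes: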
[0053] ① If the operation result is 00h, 10h, 20h, or 30h, the sample is a positive data sampling point that occupies 1 byte, namely the current byte, which is taken as the end byte. Adding 00h as the middle byte and 00h as the first byte gives 3 bytes that form the decompression result (the same value as before compression).
[0054] ② If the operation result is 80h, 90h, A0h, or B0h, the sample is a negative data sampling point that occupies 1 byte. The current byte is ORed bitwise with C0h, and the result is used as the end byte. Adding FFh as the middle byte and FFh as the first byte gives 3 bytes that form the decompression result (the same value as before compression).
[0055] ③ If the operation result is 40h or 50h, the sample is a positive data sampling point that occupies 2 bytes, starting with the current byte. The current byte is ANDed bitwise with 1Fh, and the result is used as the middle byte; the next byte is the end byte. Adding 00h as the first byte gives 3 bytes that form the decompression result (the same value as before compression).
[0056] ④ If the operation result is C0h or D0h, the sample is a negative data sampling point that occupies 2 bytes, starting with the current byte. The current byte is ORed bitwise with E0h, and the result is used as the middle byte; the next byte is the end byte. Adding FFh as the first byte gives 3 bytes that form the decompression result (the same value as before compression).
[0057] ⑤ If the operation result is 60h, the sample is a positive data sampling point that occupies 3 bytes, starting with the current byte. The current byte is ANDed bitwise with 0Fh, and the result is used as the first byte; the next 2 bytes are the middle byte and the end byte. These 3 bytes are the decompression result (the same value as before compression).
[0058] ⑥ If the operation result is E0h, the sample is a negative data sampling point that occupies 3 bytes, starting with the current byte. The current byte is ORed bitwise with F0h, and the result is used as the first byte; the next 2 bytes are the middle byte and the end byte. These 3 bytes are the decompression result (the same value as before compression).
[0059] ⑦ If the operation result is 70h, the sample is a positive data sampling point that occupies 4 bytes, starting with the current byte. The current byte is discarded, and the next 3 bytes are the original data.
[0060] ⑧ If the operation result is F0h, the sample is a negative data sampling point that occupies 4 bytes, starting with the current byte. The current byte is discarded, and the next 3 bytes are the original data.
[0061] After decompressing a data sample using any of the above 8 modes, it is necessary to jump the pointer to the corresponding byte to perform the next sampled data judgment and decompression operation. Specifically, it is necessary to jump 1 byte after the operation of modes 1 and 2; 2 bytes after the operation of modes 3 and 4; 3 bytes after the operation of modes 5 and 6; and 4 bytes after the operation of modes 7 and 8.
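The F0h mask judgment of claim 7, the eight decompression modes, and the pointer jumps described above can be sketched together in Python. The sketch assumes a well-formed stream in which each sample's bytes appear most significant first, the order the compression modes produce, so the corrected leading byte is the most significant byte present. A truncated stream raises `IndexError`. The names `decompress_sample` and `decompress_stream` are hypothetical.

```python
def decompress_sample(buf: bytes, pos: int) -> tuple[int, int]:
    """Decode the sample starting at buf[pos]; return (value, bytes consumed)."""
    b = buf[pos]
    tag = b & 0xF0                                       # mask judgment with F0h
    if tag in (0x00, 0x10, 0x20, 0x30):                  # mode 1: add 00h, 00h
        raw, n = b, 1
    elif tag in (0x80, 0x90, 0xA0, 0xB0):                # mode 2: OR C0h, add FFh, FFh
        raw, n = 0xFFFF00 | (b | 0xC0), 1
    elif tag in (0x40, 0x50):                            # mode 3: AND 1Fh, add 00h
        raw, n = ((b & 0x1F) << 8) | buf[pos + 1], 2
    elif tag in (0xC0, 0xD0):                            # mode 4: OR E0h, add FFh
        raw, n = 0xFF0000 | ((b | 0xE0) << 8) | buf[pos + 1], 2
    elif tag == 0x60:                                    # mode 5: AND 0Fh
        raw, n = ((b & 0x0F) << 16) | (buf[pos + 1] << 8) | buf[pos + 2], 3
    elif tag == 0xE0:                                    # mode 6: OR F0h
        raw, n = ((b | 0xF0) << 16) | (buf[pos + 1] << 8) | buf[pos + 2], 3
    else:                                                # 70h or F0h: modes 7 and 8
        raw, n = (buf[pos + 1] << 16) | (buf[pos + 2] << 8) | buf[pos + 3], 4
    value = raw - 0x1000000 if raw & 0x800000 else raw   # read as signed 24-bit
    return value, n


def decompress_stream(data: bytes) -> list[int]:
    """Decode a whole stream, advancing 1, 2, 3 or 4 bytes per sample."""
    values, pos = [], 0
    while pos < len(data):
        value, n = decompress_sample(data, pos)
        values.append(value)
        pos += n                                         # pointer jump
    return values


stream = bytes.fromhex("3fbf4040dfbf602000efdfff70100000f0800000")
print(decompress_stream(stream))
# [63, -1, 64, -65, 8192, -8193, 1048576, -8388608]
```

The 16 possible high nibbles are all covered by the branches, so every byte value selects exactly one mode.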
[0062] Table 1 shows the operators and opcodes required for the efficient adaptive seismic data flow lossless compression and decompression method.
TABLE 1 Operators and opcodes required for the efficient adaptive seismic data flow lossless compression and decompression method

| Interval before compression | Bytes after compression | Binary form of compressed value | Compression: byte operated on | Compression: operator | Compression: opcode | Decompression: mask judgment (& F0h) | Decompression: valid byte correction* | Decompression: sign bit expansion |
|---|---|---|---|---|---|---|---|---|
| [000000h, 00003Fh] = [0, 63] | 1 | 00bbbbbb | | | | 00h or 10h or 20h or 30h | | Add 2 bytes (0000h) |
| [FFFFC0h, FFFFFFh] = [−64, −1] | 1 | 10bbbbbb | End byte | AND | BFh | 80h or 90h or A0h or B0h | OR C0h | Add 2 bytes (FFFFh) |
| [000040h, 001FFFh] = [64, 8191] | 2 | 010bbbbb bbbbbbbb | Middle byte | OR | 40h | 50h or 40h | AND 1Fh | Add 00h as the first byte |
| [FFE000h, FFFFBFh] = [−8192, −65] | 2 | 110bbbbb bbbbbbbb | Middle byte | AND | DFh | D0h or C0h | OR E0h | Add FFh as the first byte |
| [002000h, 0FFFFFh] = [8192, 1048575] | 3 | 0110bbbb bbbbbbbb bbbbbbbb | First byte | OR | 60h | 60h | AND 0Fh | |
| [F00000h, FFDFFFh] = [−1048576, −8193] | 3 | 1110bbbb bbbbbbbb bbbbbbbb | First byte | AND | EFh | E0h | OR F0h | |
| [100000h, 7FFFFFh] = [1048576, 8388607] | 4 | 01110000 bbbbbbbb bbbbbbbb bbbbbbbb | Add first byte | | 70h | 70h | Delete the first byte | |
| [800000h, EFFFFFh] = [−8388608, −1048577] | 4 | 11110000 1bbbbbbb bbbbbbbb bbbbbbbb | Add first byte | | F0h | F0h | Delete the first byte | |

Note: *Operation on the compressed first byte. AND and OR denote the bitwise operators & and |.
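Table 1 pairs each compression opcode with a decompression correction. The Python sketch below transcribes those columns for modes 2 to 6 (the modes that modify a byte in place) into a hypothetical `TABLE_1` dictionary. For every possible valid-byte value, it checks that the compressed byte falls under the listed mask results and that the correction restores the original byte. It uses only the standard `operator` module.

```python
import operator

# mode -> (original valid-byte values, compression (op, opcode),
#          decompression correction (op, opcode), allowed mask results)
TABLE_1 = {
    2: (range(0xC0, 0x100), (operator.and_, 0xBF), (operator.or_, 0xC0), {0x80, 0x90, 0xA0, 0xB0}),
    3: (range(0x00, 0x20), (operator.or_, 0x40), (operator.and_, 0x1F), {0x40, 0x50}),
    4: (range(0xE0, 0x100), (operator.and_, 0xDF), (operator.or_, 0xE0), {0xC0, 0xD0}),
    5: (range(0x00, 0x10), (operator.or_, 0x60), (operator.and_, 0x0F), {0x60}),
    6: (range(0xF0, 0x100), (operator.and_, 0xEF), (operator.or_, 0xF0), {0xE0}),
}


def check_table() -> int:
    """Verify the opcode/correction pairs; return the number of byte values checked."""
    checked = 0
    for mode, (valid, (c_op, c_code), (d_op, d_code), masks) in TABLE_1.items():
        for b in valid:
            coded = c_op(b, c_code)
            if (coded & 0xF0) not in masks:
                raise ValueError(f"mode {mode}: {b:02X}h compresses to {coded:02X}h, wrong mask")
            if d_op(coded, d_code) != b:
                raise ValueError(f"mode {mode}: correction does not restore {b:02X}h")
            checked += 1
    return checked


print(check_table())  # 64 + 32 + 32 + 16 + 16 = 160
```

Modes 1, 7, and 8 are left out because they apply no bit operation to an existing byte; they only drop or prepend whole bytes.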