Video Data Compression Systems

20170257629 · 2017-09-07

Abstract

Data compression and decompression methods for compressing and decompressing data based on an actual or expected throughput (bandwidth) of a system. In one embodiment, a controller tracks and monitors the throughput (data storage and retrieval) of a data compression system and generates control signals to enable/disable different compression algorithms when, e.g., a bottleneck occurs so as to increase the throughput and eliminate the bottleneck.

Claims

1. (canceled)

2. A system, comprising: one or more different asymmetric data decompression algorithms, wherein each algorithm of the one or more different asymmetric data decompression algorithms utilizes one or more asymmetric data decompression routines of a plurality of different asymmetric data decompression routines, wherein a first asymmetric data decompression routine of the plurality of different asymmetric data decompression routines is configured to produce decompressed data with a higher data rate for a given data throughput than a second asymmetric data decompression routine of the plurality of different asymmetric data decompression routines; and a processor configured: to analyze one or more data parameters from one or more data blocks containing video data, wherein at least one data parameter relates to an expected or anticipated throughput of a communications channel; and to select two or more different data decompression routines from among a plurality of different data decompression routines based upon, at least in part, the one or more data parameters relating to the expected throughput of the communications channel.

3. The system of claim 2, wherein the expected or anticipated throughput of the communications channel is known a priori.

4. The system of claim 2, wherein at least one of the one or more different asymmetric data decompression routines utilizes a lossless data decompression algorithm.

5. The system of claim 2, wherein at least one of the selected two or more different data decompression routines comprises: a standardized data decompression algorithm capable of decompressing the video data.

6. The system of claim 2, wherein at least one of the one or more data parameters comprises a resolution of the one or more data blocks containing video data.

7. The system of claim 2, wherein at least one of the one or more data parameters comprises an attribute or a value related to a format or a syntax of video data contained in the one or more data blocks containing video data.

8. The system of claim 2, wherein the selected two or more different data decompression routines utilize a content-dependent data decompression routine.

9. The system of claim 8, wherein the content-dependent data decompression routine utilizes an arithmetic algorithm.

10. The system of claim 2, wherein the selected two or more different data decompression routines are configured to perform decompression in real-time or substantially real-time.

11. The system of claim 2, wherein the communications channel comprises a distributed network.

12. The system of claim 11, wherein the distributed network comprises the Internet.

13. The system of claim 2, wherein the selected two or more different data decompression routines are utilized to decompress the one or more data blocks containing video data to create one or more decompressed data blocks, and wherein a descriptor is associated with the one or more decompressed data blocks that indicates the selected two or more different data decompression routines.

14. The system of claim 2, wherein the selected two or more different data decompression routines are utilized to decompress the one or more data blocks containing video data to create one or more decompressed data blocks, and wherein a descriptor indicating the selected two or more different data decompression routines is included with the one or more decompressed data blocks.

15. The system of claim 2, wherein at least one of the one or more data parameters comprises a video data profile.

16. A system, comprising: a plurality of different asymmetric data decompression encoders, wherein each asymmetric data decompression encoder of the plurality of different asymmetric data decompression encoders is configured to utilize one or more data decompression algorithms, and wherein a first asymmetric data decompression encoder of the plurality of different asymmetric data decompression encoders is configured to decompress data blocks containing video or image data at a higher data decompression rate than a second asymmetric data decompression encoder of the plurality of different asymmetric data decompression encoders; and one or more processors configured to: determine one or more data parameters, at least one of the one or more data parameters relating to a throughput of a communications channel measured in bits per second; and select one or more asymmetric data decompression encoders from among the plurality of different asymmetric data decompression encoders based upon, at least in part, the determined one or more data parameters.

17. The system of claim 16, wherein at least one of the plurality of different asymmetric data decompression encoders is configured to utilize an arithmetic algorithm.

18. The system of claim 16, wherein the throughput of the communications channel comprises: an estimated throughput of the communications channel.

19. The system of claim 16, wherein the throughput of the communications channel comprises: an expected throughput of the communications channel.

20. The system of claim 16, wherein the selected one or more asymmetric data decompression encoders are configured to decompress the data blocks containing video or image data for output at different data transmission rates measured in bits per second to produce a plurality of decompressed data blocks.

21. The system of claim 16, wherein at least one of the plurality of different asymmetric data decompression encoders is configured to utilize a standardized data decompression algorithm capable of decompressing video data.

22. The system of claim 16, wherein at least one of the one or more data parameters comprises: a resolution of the data blocks containing video or image data.

23. The system of claim 16, wherein at least one of the one or more data parameters comprises: a data transmission rate of the data blocks containing video or image data.

24. The system of claim 16, wherein at least one of the one or more data parameters comprises: an attribute or a value related to a format or a syntax of video or image data contained in the data blocks containing video or image data.

25. The system of claim 16, wherein the selected one or more asymmetric data decompression encoders are configured to utilize a content-dependent data decompression algorithm.

26. The system of claim 25, wherein the content-dependent data decompression algorithm comprises: an arithmetic algorithm.

27. The system of claim 16, wherein the selected one or more asymmetric data decompression encoders are configured to perform decompression in real-time or substantially real-time.

28. The system of claim 16, wherein the communications channel comprises: a distributed network.

29. The system of claim 28, wherein the distributed network comprises: the Internet.

30. The system of claim 16, wherein the selected one or more asymmetric data decompression encoders are utilized to decompress the data blocks containing video or image data to create one or more decompressed data blocks, and wherein a descriptor is associated with the one or more decompressed data blocks that indicates the selected one or more asymmetric data decompression encoders.

31. The system of claim 16, wherein the selected one or more asymmetric data decompression encoders are utilized to decompress the data blocks containing video or image data to create one or more decompressed data blocks, and wherein a descriptor indicating the selected one or more asymmetric data decompression encoders is included with the one or more decompressed data blocks.

32. The system of claim 16, wherein at least one of the one or more data parameters comprises: a video or image data profile.

33. The system of claim 16, wherein the one or more processors are further configured to encode each of the data blocks containing video or image data with a plurality of the selected one or more asymmetric data decompression encoders to create decompressed data blocks.

34. The system of claim 33, further comprising: a memory for storing the decompressed data blocks.

35. A system, comprising: a plurality of video data decompression encoders; wherein at least one of the plurality of video data decompression encoders is configured to utilize an asymmetric data decompression algorithm, and wherein at least one of the plurality of video data decompression encoders is configured to utilize an arithmetic data decompression algorithm, wherein a first video data decompression encoder of the plurality of video data decompression encoders is configured to decompress at a higher decompression ratio than a second data decompression encoder of the plurality of data decompression encoders; and one or more processors configured to: determine one or more data parameters, at least one of the one or more data parameters relating to a throughput of a communications channel; and select one or more video data decompression encoders from among the plurality of video data decompression encoders based upon, at least in part, the determined one or more data parameters.

36. The system of claim 35, wherein the throughput of the communications channel comprises: an estimated or expected throughput of the communications channel.

37. The system of claim 35, wherein the selected one or more video data decompression encoders are configured to decompress one or more data blocks containing video data for different data transmission rates measured in bits per second to produce a plurality of decompressed data blocks.

38. The system of claim 35, wherein at least one of the one or more data parameters are related to a resolution of one or more data blocks containing video data.

39. The system of claim 35, wherein at least one of the one or more data parameters comprises: a data transmission rate of one or more data blocks containing video data.

40. The system of claim 35, wherein at least one of the one or more data parameters comprises: an attribute or a value related to a format or a syntax of video data contained in one or more data blocks containing video data.

41. The system of claim 35, wherein the selected one or more video data decompression encoders are configured to perform data decompression in real-time or substantially real-time.

42. The system of claim 35, wherein the communications channel comprises: a distributed network or the Internet.

43. The system of claim 35, wherein one or more data blocks containing video data are decompressed with the selected one or more video data decompression encoders to create one or more decompressed data blocks, and wherein a descriptor is associated with the one or more decompressed data blocks that indicates the selected one or more video data decompression encoders.

44. The system of claim 35, wherein the one or more processors are configured to encode each of one or more data blocks with a plurality of the selected one or more asymmetric data decompression encoders to create decompressed data blocks.

Description

BRIEF DESCRIPTION OF THE DRAWINGS

[0057] FIG. 1 is a high-level block diagram of a system for providing bandwidth sensitive data compression/decompression according to an embodiment of the present invention.

[0058] FIG. 2 is a flow diagram of a method for providing bandwidth sensitive data compression/decompression according to one aspect of the present invention.

[0059] FIG. 3 is a block diagram of a preferred system for implementing a bandwidth sensitive data compression/decompression method according to an embodiment of the present invention.

[0060] FIG. 4A is a diagram of a file system format of a virtual and/or physical disk according to an embodiment of the present invention.

[0061] FIG. 4B is a diagram of a data structure of a sector map entry of a virtual block table according to an embodiment of the present invention.

DETAILED DESCRIPTION OF PREFERRED EMBODIMENTS

[0062] The present invention is directed to a system and method for compressing and decompressing based on the actual or expected throughput (bandwidth) of a system employing data compression. Although one of ordinary skill in the art could readily envision various implementations for the present invention, a preferred system in which this invention is employed comprises a data storage controller that preferably utilizes a real-time data compression system to provide “accelerated” data storage and retrieval bandwidths. The concept of “accelerated” data storage and retrieval was introduced in U.S. patent application Ser. No. 09/266,394, filed Mar. 11, 1999, entitled “System and Methods For Accelerated Data Storage and Retrieval,” now U.S. Pat. No. 6,601,104, and U.S. patent application Ser. No. 09/481,243, filed Jan. 11, 2000, entitled “System and Methods For Accelerated Data Storage and Retrieval,” now U.S. Pat. No. 6,604,158, both of which are commonly assigned and incorporated herein by reference.

[0063] In general, as described in the above-incorporated applications, “accelerated” data storage comprises receiving a digital data stream at a data transmission rate which is greater than the data storage rate of a target storage device, compressing the input stream at a compression rate that increases the effective data storage rate of the target storage device and storing the compressed data in the target storage device. For instance, assume that a mass storage device (such as a hard disk) has a data storage rate of 20 megabytes per second. If a storage controller for the mass storage device is capable of compressing (in real time) an input data stream with an average compression ratio of 3:1, then data can be stored in the mass storage device at a rate of 60 megabytes per second, thereby effectively increasing the storage bandwidth (“storewidth”) of the mass storage device by a factor of three. Similarly, accelerated data retrieval comprises retrieving a compressed digital data stream from a target storage device at a rate equal to, e.g., the data access rate of the target storage device and then decompressing the compressed data at a rate that increases the effective data access rate of the target storage device. Advantageously, providing accelerated data storage and retrieval at (or close to) real-time can reduce or eliminate traditional bottlenecks associated with, e.g., local and network disk accesses.
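The arithmetic in the example above can be checked with a minimal sketch. The function name and the optional compressor-speed cap are assumptions added for illustration; the description itself simply assumes the controller compresses in real time.

```python
def effective_storage_rate(device_rate_mb_s, compression_ratio,
                           compressor_rate_mb_s=float("inf")):
    """Rate at which uncompressed data can be accepted, in MB/s.

    device_rate_mb_s     -- native storage rate of the device
    compression_ratio    -- average ratio, e.g. 3.0 for 3:1
    compressor_rate_mb_s -- hypothetical cap on how fast the compressor
                            can consume input (unbounded by default,
                            i.e. compression keeps up in real time)
    """
    if device_rate_mb_s <= 0 or compression_ratio <= 0:
        raise ValueError("rates and ratios must be positive")
    # Each stored byte represents `compression_ratio` input bytes, so the
    # device appears that many times faster, unless the compressor itself
    # becomes the bottleneck.
    return min(device_rate_mb_s * compression_ratio, compressor_rate_mb_s)


# The 20 MB/s disk with 3:1 compression from the paragraph above.
print(effective_storage_rate(20, 3.0))  # 60.0
```

With a compressor limited to, say, 45 MB/s, the same call returns 45, which is the kind of bottleneck the controller described later is meant to detect.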

[0064] In a preferred embodiment, the present invention is implemented for providing accelerated data storage and retrieval. In one embodiment, a controller tracks and monitors the throughput (data storage and retrieval) of a data compression system and generates control signals to enable/disable different compression algorithms when, e.g., a bottleneck occurs so as to increase the throughput and eliminate the bottleneck.

[0065] In the following description of preferred embodiments, two categories of compression algorithms are defined: an “asymmetrical” data compression algorithm and a “symmetrical” data compression algorithm. An asymmetrical data compression algorithm is referred to herein as one in which the execution times for the compression and decompression routines differ significantly. In particular, with an asymmetrical algorithm, either the compression routine is slow and the decompression routine is fast or the compression routine is fast and the decompression routine is slow. Examples of asymmetrical compression algorithms include dictionary-based compression schemes such as Lempel-Ziv.

[0066] On the other hand, a “symmetrical” data compression algorithm is referred to herein as one in which the execution times for the compression and decompression routines are substantially similar. Examples of symmetrical algorithms include table-based compression schemes such as Huffman. For asymmetrical algorithms, the total execution time to perform one compress and one decompress of a data set is typically greater than the total execution time of symmetrical algorithms. But an asymmetrical algorithm typically achieves higher compression ratios than a symmetrical algorithm.
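One way to make the symmetrical/asymmetrical distinction concrete is to time both routines of a codec and compare them. The sketch below is illustrative only: the helper names are hypothetical, the 2x cutoff is an assumed stand-in for “differ significantly” (the description gives no number), and Python's standard `zlib` module is used merely as a convenient codec to measure.

```python
import time
import zlib


def timing_profile(compress, decompress, data, repeats=3):
    """Measure best-of-N compress/decompress times and the compression ratio."""
    packed = compress(data)
    if decompress(packed) != data:
        raise ValueError("codec does not round-trip the data")

    def best_time(fn, arg):
        best = float("inf")
        for _ in range(repeats):
            start = time.perf_counter()
            fn(arg)
            best = min(best, time.perf_counter() - start)
        return best

    return {
        "compress_s": best_time(compress, data),
        "decompress_s": best_time(decompress, packed),
        "ratio": len(data) / len(packed),
    }


def is_asymmetrical(profile, factor=2.0):
    """True if one routine is more than `factor` times slower than the other.

    `factor` is an assumed threshold, not a value from the description.
    """
    slow = max(profile["compress_s"], profile["decompress_s"])
    fast = min(profile["compress_s"], profile["decompress_s"])
    if fast == 0:
        return slow > 0
    return slow > factor * fast


profile = timing_profile(zlib.compress, zlib.decompress, b"abc" * 100_000)
print(profile["ratio"] > 1, is_asymmetrical(profile))
```

The measured classification depends on the data, the machine, and the chosen cutoff, so a real controller would presumably rely on profiles gathered in advance rather than on a single measurement.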

[0067] It is to be appreciated that in accordance with the present invention, symmetry may be defined in terms of overall effective bandwidth, compression ratio, or time or any combination thereof. In particular, in instances of frequent data read/writes, bandwidth is the optimal parameter for symmetry. In asymmetric applications such as operating systems and programs, the governing factor is net decompression bandwidth, which is a function of both compression speed, which governs data retrieval time, and decompression speed, wherein the total governs the net effective data read bandwidth. These factors work in an analogous manner for data storage where the governing factors are both compression ratio (storage time) and compression speed. The present invention applies to any combination or subset thereof, which is utilized to optimize overall bandwidth, storage space, or any operating point in between.

[0068] Referring now to FIG. 1, a high-level block diagram illustrates a system for providing bandwidth sensitive data compression/decompression according to an embodiment of the present invention. In particular, FIG. 1 depicts a host system 10 comprising a controller 11 (e.g., a file management system), a compression/decompression (or data compression) system 12, a plurality of compression algorithms 13, a storage medium 14, and a plurality of data profiles 15. The controller tracks and monitors the throughput (e.g., data storage and retrieval) of the data compression system 12 and generates control signals to enable/disable different compression algorithms 13 when the throughput falls below a predetermined threshold. In one embodiment, the system throughput that is tracked by the controller 11 preferably comprises a number of pending access requests to the memory system.

[0069] The data compression system 12 is operatively connected to the storage medium 14 using suitable protocols to write and read compressed data to and from the storage medium 14. It is to be understood that the storage medium 14 may comprise any form of memory device including all forms of sequential, pseudo-random, and random access storage devices. The storage medium 14 may be volatile or non-volatile in nature, or any combination thereof. Storage media as known within the current art include all forms of random access memory, magnetic and optical tape, magnetic and optical disks, along with various other forms of solid-state mass storage media. Thus it should be noted that the current invention applies to all forms and manners of storage media including, but not limited to, storage media utilizing magnetic, optical, and chemical techniques, or any combination thereof. The data compression system 12 preferably operates in real-time (or substantially real-time) to compress data to be stored on the storage medium 14 and to decompress data that is retrieved from the storage medium 14. The data compression system 12 may maintain the compressed data to be stored on the storage medium 14 and the decompressed data that is retrieved from the storage medium 14 for subsequent data processing, storage, or transmittal. In addition, the data compression system 12 may receive data (compressed or not compressed) via an I/O (input/output) port 16 that is transmitted over a transmission line or communication channel from a remote location, and then process such data (e.g., decompress or compress the data). The data compression system 12 may further transmit data (compressed or decompressed) via the I/O port 16 to another network device for remote processing or storage.

[0070] The controller 11 utilizes information comprising a plurality of data profiles 15 to determine which compression algorithms 13 should be used by the data compression system 12. In a preferred embodiment, the compression algorithms 13 comprise one or more asymmetric algorithms. As noted above, with asymmetric algorithms, the compression ratio is typically greater than the compression ratios obtained using symmetrical algorithms. Preferably, a plurality of asymmetric algorithms are selected to provide one or more asymmetric algorithms comprising a slow compress and fast decompress routine, as well as one or more asymmetric algorithms comprising a fast compress and slow decompress routine.

[0071] The compression algorithms 13 further comprise one or more symmetric algorithms, each having a compression rate and corresponding decompression rate that is substantially equal. Preferably, a plurality of symmetric algorithms are selected to provide a desired range of compression and decompression rates for data to be processed by a symmetric algorithm.

[0072] In a preferred embodiment, the overall throughput (bandwidth) of the host system 10 is one factor considered by the controller 11 in deciding whether to use an asymmetrical or symmetrical compression algorithm for processing data stored to, and retrieved from, the storage medium 14. Another factor that is used to determine the compression algorithm is the type of data to be processed. In a preferred embodiment, the data profiles 15 comprise information regarding predetermined access profiles of different data sets, which enables the controller 11 to select a suitable compression algorithm based on the data type. For instance, the data profiles may comprise a map that associates different data types (based on, e.g., a file extension) with preferred one(s) of the compression algorithms 13. For example, preferred access profiles considered by the controller 11 are set forth in the following table.

TABLE 1

| Access Profile 1 | Access Profile 2 | Access Profile 3 |
| Data is written to a storage medium once (or very few times) but is read from the storage medium many times | Data is written to the storage medium often but read few times | The amount of times data is read from and written to the storage medium is substantially the same. |

[0073] With Access Profile 1, the decompression routine would be executed significantly more times than the corresponding compression routine. This is typical with operating systems, applications and websites, for example. Indeed, an asymmetrical algorithm can be used to compress (offline) an operating system (OS), application or website using a slow compression routine to achieve a high compression ratio. After the compressed OS, application or website is stored, the asymmetric algorithm is then used during runtime to decompress, at a significant rate, the OS, application or website launched or accessed by a user.

[0074] Therefore, with data sets falling within Access Profile 1, it is preferable to utilize an asymmetrical algorithm that provides a slow compression routine and a fast decompression routine so as to provide an increase in the overall system performance as compared to the performance that would be obtained using a symmetrical algorithm. Further, the compression ratio obtained using the asymmetrical algorithm would likely be higher than that obtained using a symmetrical algorithm (thus effectively increasing the storage capacity of the storage device).

[0075] With Access Profile 2, the compression routine would be executed significantly more times than the decompression routine. This is typical with a system for automatically updating an inventory database, for example, wherein an asymmetric algorithm that provides a fast compression routine and a slow decompression routine would provide an overall faster (higher throughput) and efficient (higher compression ratio) system performance than would be obtained using a symmetrical algorithm.

[0076] With Access Profile 3, where data is accessed with a similar number of reads and writes, the compression routine would be executed approximately the same number of times as the decompression routine. This is typical of most user-generated data such as documents and spreadsheets. Therefore, it is preferable to utilize a symmetrical algorithm that provides a relatively fast compression and decompression routine. This would result in an overall system performance that would be faster as compared to using an asymmetrical algorithm (although the compression ratio achieved may be lower).

[0077] The following table summarizes the three data access profiles and the type of compression algorithm that would produce optimum throughput.

TABLE 2

| Access Profile | Example Data Types | Compression Algorithm | Compressed Data Characteristics | Decompression Algorithm |
| 1. Write few, Read many | Operating systems, Programs, Web sites | Asymmetrical (Slow compress) | Very high compression ratio | Asymmetrical (Fast decompress) |
| 2. Write many, Read few | Automatically updated inventory database | Asymmetrical (Fast compress) | Very high compression ratio | Asymmetrical (Slow decompress) |
| 3. Similar number of Reads and Writes | User generated documents | Symmetrical | Standard compression ratio | Symmetrical |
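The summary table above, together with the earlier remark that the data profiles may map data types (e.g., by file extension) to preferred algorithms, can be sketched as two lookups. All identifiers and the specific extensions below are hypothetical examples chosen for illustration; the description does not list concrete extensions or a default.

```python
import os
from enum import Enum


class AccessProfile(Enum):
    WRITE_FEW_READ_MANY = 1   # operating systems, programs, web sites
    WRITE_MANY_READ_FEW = 2   # automatically updated inventory database
    BALANCED = 3              # user-generated documents


# Algorithm category per access profile, following the summary table.
ALGORITHM_FOR_PROFILE = {
    AccessProfile.WRITE_FEW_READ_MANY:
        "asymmetrical (slow compress, fast decompress)",
    AccessProfile.WRITE_MANY_READ_FEW:
        "asymmetrical (fast compress, slow decompress)",
    AccessProfile.BALANCED: "symmetrical",
}

# Assumed extension map; a real data profile would be configured per system.
PROFILE_FOR_EXTENSION = {
    ".exe": AccessProfile.WRITE_FEW_READ_MANY,
    ".dll": AccessProfile.WRITE_FEW_READ_MANY,
    ".html": AccessProfile.WRITE_FEW_READ_MANY,
    ".db": AccessProfile.WRITE_MANY_READ_FEW,
    ".doc": AccessProfile.BALANCED,
    ".xls": AccessProfile.BALANCED,
}


def select_algorithm(filename, default=AccessProfile.BALANCED):
    """Pick an algorithm category from the file extension.

    Falling back to the balanced profile for unknown types is an
    assumption of this sketch.
    """
    extension = os.path.splitext(filename)[1].lower()
    profile = PROFILE_FOR_EXTENSION.get(extension, default)
    return ALGORITHM_FOR_PROFILE[profile]


print(select_algorithm("setup.exe"))
print(select_algorithm("budget.xls"))
```

Keeping the extension map separate from the profile-to-algorithm map mirrors the two-stage decision in the text: first classify the data set by access pattern, then choose the algorithm category for that pattern.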

[0078] In accordance with the present invention, the access profile of a given data set is known a priori or determined prior to compression so that the optimum category of compression algorithm can be selected. As explained below, the selection process may be performed either manually or automatically by the controller 11 of the data compression system 12. Further, the decision regarding which routines will be used at compression time (write) and at decompression time (read) is preferably made before or at the time of compression. This is because once data is compressed using a certain algorithm, only the matching decompression routine can be used to decompress the data, regardless of how much processing time is available at the time of decompression.

[0079] Referring now to FIG. 2, a flow diagram illustrates a method for providing bandwidth sensitive data compression according to one aspect of the present invention. For purposes of illustration, it is assumed that the method depicted in FIG. 2 is implemented with a disk controller for providing accelerated data storage and retrieval from a hard disk on a PC (personal computer). The data compression system is initialized during a boot-up process after the PC is powered-on and a default compression/decompression routine is instantiated (step 20).

[0080] In a preferred embodiment, the default algorithm comprises an asymmetrical algorithm since an operating system and application programs will be read from hard disk memory and decompressed during the initial use of the host system 10. Indeed, as discussed above, an asymmetric algorithm that provides slow compression and fast decompression is preferable for compressing operating systems and applications so as to obtain a high compression ratio (to effectively increase the storage capacity of the hard disk) and fast data access (to effectively increase the retrieval rate from the hard disk). The initial asymmetric routine that is applied (by, e.g., a vendor) to compress the operating system and applications is preferably set as the default. The operating system will be retrieved and then decompressed using the default asymmetric routine (step 21).

[0081] During initial runtime, the controller will continue to use the default algorithm until certain conditions are met. For instance, if a read command is received (affirmative result in step 22), the controller will determine whether the data to be read from disk can be decompressed using the current routine (step 23). For this determination, the controller could, e.g., read a flag value that indicates the algorithm that was used to compress the file. If the data can be decompressed using the current algorithm (affirmative determination in step 23), then the file will be retrieved and decompressed (step 25). On the other hand, if the data cannot be decompressed using the current algorithm (negative determination in step 23), the controller will issue the appropriate control signal to the compression system to load the algorithm associated with the file (step 24) and, subsequently, decompress the file (step 25).
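The read path of steps 22 through 25 can be sketched as follows. This is a minimal illustration under stated assumptions: the one-byte flag prepended to each record, the numeric algorithm identifiers, and the class and function names are all hypothetical, and `zlib` and `lzma` from the Python standard library merely stand in for whatever routines a real system would load.

```python
import lzma
import zlib

# Hypothetical registry: flag value -> (compress, decompress) routines.
CODECS = {
    1: (zlib.compress, zlib.decompress),
    2: (lzma.compress, lzma.decompress),
}


def store(data, algorithm_id):
    """Compress `data` and prepend a one-byte flag naming the algorithm.

    The one-byte header is an assumed layout for this sketch only.
    """
    compress, _ = CODECS[algorithm_id]
    return bytes([algorithm_id]) + compress(data)


class ReadController:
    def __init__(self, default_algorithm_id):
        self.current_id = default_algorithm_id
        self.loads = 0  # how many times a different routine was loaded

    def read(self, record):
        algorithm_id = record[0]             # step 23: inspect the flag
        if algorithm_id != self.current_id:  # negative determination
            self.current_id = algorithm_id   # step 24: load matching routine
            self.loads += 1
        _, decompress = CODECS[self.current_id]
        return decompress(record[1:])        # step 25: decompress


controller = ReadController(default_algorithm_id=1)
print(controller.read(store(b"boot image", 1)), controller.loads)
print(controller.read(store(b"archive", 2)), controller.loads)
```

Storing the flag with the data reflects the constraint stated earlier in the description: once data is compressed with a given algorithm, only the matching decompression routine can recover it.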

[0082] If a write command is received (affirmative result in step 26), the data to be stored will be compressed using the current algorithm (step 27). During the process of compression and storing the compressed data, the controller will track the throughput to determine whether the throughput is meeting a predetermined threshold (step 28). For example, the controller may track the number of pending disk accesses (access requests) to determine whether a bottleneck is occurring. If the throughput of the system is not meeting the desired threshold (e.g., the compression system cannot maintain the required or requested data rates) (negative determination in step 28), then the controller will command the data compression system to utilize a compression routine providing faster compression (e.g., a fast symmetric compression algorithm) (step 29) so as to mitigate or eliminate the bottleneck.

[0083] If, on the other hand, the system throughput is meeting or exceeding the threshold (affirmative determination in step 28) and the current algorithm being used is a symmetrical routine (affirmative determination in step 30), in an effort to achieve optimal compression ratios, the controller will command the data compression system to use an asymmetric compression algorithm (step 31) that may provide a slower rate of compression, but provide efficient compression.

[0084] This process is repeated such that whenever the controller determines that the compression system can maintain the required/requested data throughput using a slow (highly efficient) asymmetrical compression algorithm, the controller will allow the compression system to operate in the asymmetrical mode. This will allow the system to obtain maximum storage capacity on the disk. Further, the controller will command the compression system to use a symmetric routine comprising a fast compression routine when the desired throughput is not met. This will allow the system to, e.g., service the backlogged disk accesses. Then, when the controller determines that the required/requested data rates are subsequently lower and the compression system can maintain the data rate, the controller can command the compression system to use a slower (but more efficient) asymmetric compression algorithm.
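The write-path decision of steps 26 through 31 can be sketched as a small state machine. The class name, the mode strings, and the use of a pending-request count as the throughput measure are assumptions for illustration; the description offers the pending-access count only as one example of what the controller may track.

```python
class WriteController:
    """Chooses the compression mode for subsequent writes."""

    def __init__(self, max_pending_requests, initial_mode="asymmetric"):
        # Hypothetical threshold: more pending requests than this is
        # treated as throughput falling below the required level.
        self.max_pending_requests = max_pending_requests
        self.mode = initial_mode

    def after_write(self, pending_requests):
        """Update and return the mode to use for the next write."""
        meeting_threshold = pending_requests <= self.max_pending_requests
        if not meeting_threshold:
            # Step 29: bottleneck detected, switch to fast compression.
            self.mode = "symmetric"
        elif self.mode == "symmetric":
            # Steps 30-31: throughput recovered, return to the slower
            # but more efficient asymmetric routine.
            self.mode = "asymmetric"
        return self.mode


controller = WriteController(max_pending_requests=4)
print([controller.after_write(n) for n in (1, 9, 7, 2)])
# ['asymmetric', 'symmetric', 'symmetric', 'asymmetric']
```

As written, the sketch switches back as soon as a single sample meets the threshold, so it can oscillate when load hovers near the limit. Adding hysteresis or averaging over several samples would be a natural refinement, though the description does not specify one.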

[0085] With the above-described method depicted in FIG. 2, the selection of the compression routine is performed automatically by the controller so as to optimize system throughput. In another embodiment, a user that desires to install a program or text files, for example, can command the system (via a software utility) to utilize a desired compression routine for compressing and storing the compressed program or files to disk. For example, for a power user, a GUI menu can be displayed that allows the user to directly select a given algorithm. Alternatively, the system can detect the type of data being installed or stored to disk (via file extension, etc.) and automatically select an appropriate algorithm using the Access Profile information as described above. For instance, the user could indicate to the controller that the data being installed comprises an application program which the controller would determine falls under Access Profile 1. The controller would then command the compression engine to utilize an asymmetric compression algorithm employing a slow compression routine and a fast decompression routine. The result would be a one-time penalty during program installation (slow compression), but with fast access to the data on all subsequent executions (reads) of the program, as well as a high compression ratio.

[0086] It is to be appreciated that the present invention may be implemented in any data processing system, device, or apparatus using data compression. For instance, the present invention may be employed in a data transmission controller in a network environment to provide accelerated data transmission over a communication channel (i.e., effectively increase the transmission bandwidth by compressing the data at the source and decompressing data at the receiver, in real-time).

[0087] Further, the present invention can be implemented with a data storage controller utilizing data compression and decompression to provide accelerated data storage and retrieval from a mass storage device. Exemplary embodiments of preferred data storage controllers in which the present invention may be implemented are described, for example, in U.S. patent application Ser. No. 09/775,905, filed on Feb. 2, 2001, entitled “Data Storewidth Accelerator”, now U.S. Pat. No. 6,748,457, which is commonly assigned and fully incorporated herein by reference.

[0088] FIG. 3 illustrates a preferred embodiment of a data storage controller 120 as described in the above-incorporated U.S. Ser. No. 09/775,905, now U.S. Pat. No. 6,748,457, for implementing a bandwidth sensitive data compression protocol as described herein. The data storage controller 120 comprises a DSP (digital signal processor) 121 (or any other micro-processor device) that implements a data compression/decompression routine. The DSP 121 preferably employs a plurality of symmetric and asymmetric compression/decompression algorithms as described herein. The data storage controller 120 further comprises at least one programmable logic device 122 (or volatile logic device). The programmable logic device 122 preferably implements the logic (program code) for instantiating and driving both a disk interface 114 and a bus interface 115 and for providing full DMA (direct memory access) capability for the disk and bus interfaces 114, 115. Further, upon host computer power-up and/or assertion of a system-level “reset” (e.g., PCI Bus reset), the DSP 121 initializes and programs the programmable logic device 122 before the completion of initialization of the host computer. This advantageously allows the data storage controller 120 to be ready to accept and process commands from the host computer (via the bus 116) and retrieve boot data from the disk (assuming the data storage controller 120 is implemented as the boot device). The data storage controller 120 further comprises a plurality of memory devices including a RAM (random access memory) device 123 and a ROM (read only memory) device 124 (or FLASH memory or other types of non-volatile memory). The RAM device 123 is utilized as on-board cache and is preferably implemented as SDRAM. The ROM device 124 is utilized for non-volatile storage of logic code associated with the DSP 121 and configuration data used by the DSP 121 to program the programmable logic device 122.

[0089] The DSP 121 is operatively connected to the memory devices 123, 124 and the programmable logic device 122 via a local bus 125. The DSP 121 is also operatively connected to the programmable logic device 122 via an independent control bus 126. The programmable logic device 122 provides data flow control between the DSP 121 and the host computer system attached to the bus 116, as well as data flow control between the DSP 121 and the storage device. A plurality of external I/O ports 127 are included for data transmission and/or loading of one or more programmable logic devices. Preferably, the disk interface 114 driven by the programmable logic device 122 supports a plurality of hard drives.

[0090] The storage controller 120 further comprises computer reset and power up circuitry 128 (or “boot configuration circuit”) for controlling initialization (either cold or warm boots) of the host computer system and storage controller 120. A preferred boot configuration circuit and preferred computer initialization systems and protocols are described in U.S. patent application Ser. No. 09/775,897, filed on Feb. 2, 2001, entitled “System and Methods For Computer Initialization,” published as U.S. Patent Publication No. US 2001-0047473 A1, now abandoned, which is commonly assigned and incorporated herein by reference. Preferably, the boot configuration circuit 128 is employed for controlling the initialization and programming of the programmable logic device 122 during configuration of the host computer system (i.e., while the CPU of the host is held in reset). The boot configuration circuit 128 ensures that the programmable logic device 122 (and possibly other volatile or partially volatile logic devices) is initialized and programmed before the bus 116 (such as a PCI bus) is fully reset. In particular, when power is first applied to the boot configuration circuit 128, the boot configuration circuit 128 generates a control signal to reset the local system (e.g., storage controller 120) devices such as a DSP, memory, and I/O interfaces. Once the local system is powered-up and reset, the controlling device (such as the DSP 121) will then proceed to automatically determine the system environment and configure the local system to work within that environment. By way of example, the DSP 121 of the data storage controller 120 would sense that the data storage controller 120 is on a PCI computer bus (expansion bus) and has attached to it a hard disk on an IDE interface. The DSP 121 would then load the appropriate PCI and IDE interfaces into the programmable logic device 122 prior to completion of the host system reset. Once the programmable logic device 122 is configured for its environment, the boot device controller is reset and ready to accept commands over the computer/expansion bus 116.

[0091] It is to be understood that the data storage controller 120 may be utilized as a controller for transmitting data (compressed or uncompressed) to and from remote locations over the DSP I/O ports 127 or bus 116, for example. Indeed, the I/O ports 127 of the DSP 121 may be used for transmitting data (compressed or uncompressed) that is either retrieved from the disk or received from the host system via the bus 116, to remote locations for processing and/or storage. Further, the I/O ports 127 may be operatively connected to other data storage controllers or to network communication channels. Likewise, the data storage controller 120 may receive data (compressed or uncompressed) over the I/O ports 127 of the DSP 121 from remote systems that are connected to the I/O ports 127 of the DSP, for local processing by the data storage controller 120. For instance, a remote system may remotely access the data storage controller 120 (via the I/O ports of the DSP or the bus 116) to utilize the data compression, in which case the data storage controller 120 would transmit the compressed data back to the system that requested compression.

[0092] In accordance with the present invention, the system (e.g., data storage controller 120) preferably boots up in a mode using asymmetrical data compression. It is to be understood that the boot process would not be affected whether the system boots up defaulting to an asymmetrical mode or to a symmetrical mode. This is because, during the boot process, the computer is reading the operating system from the disk, not writing to it. However, once data is written to the disk using a compression algorithm, the system must retrieve and read the data using the corresponding decompression algorithm.

[0093] As the user creates, deletes and edits files, the data storage controller 120 will preferably utilize an asymmetrical compression routine that provides slow compression and fast decompression. Since using the asymmetrical compression algorithm will provide slower compression than a symmetrical algorithm, the file system of the computer will track whether the data storage controller 120 has disk accesses pending. If the data storage controller 120 does have disk accesses pending and the system is starting to slow down, the file management system will command the data storage controller 120 to use a faster symmetrical compression algorithm. If there are no disk access requests pending, the file management system will leave the disk controller in the mode of using the asymmetrical compression algorithm.

[0094] If the data storage controller 120 was switched to using a symmetrical algorithm, the file management system will preferably signal the controller to switch back to a default asymmetrical algorithm when, e.g., the rate of the disk access requests slows to the point where there are no pending disk accesses.

[0095] At some point a user may decide to install software or load files onto the hard disk. Before installing the software, for example, as described above, the user could instruct the data storage controller 120 (via a software utility) to enter and remain in an asymmetric mode using an asymmetric compression algorithm with a slow compression routine and a very fast decompression routine. The disk controller would continue to use the asymmetrical algorithm until commanded otherwise, regardless of the number of pending disk accesses. After completing the software installation, the user would then release the disk controller from this “asymmetrical only” mode of operation (via the software utility).

[0096] Again, when the user is not commanding the data storage controller 120 to remain in a certain mode, the file management system will determine whether the disk controller should use the asymmetrical compression algorithms or the symmetrical compression algorithms based on the amount of backlogged disk activity. If the backlogged disk activity exceeds a threshold, then the file management system will preferably command the disk controller to use a faster compression algorithm, even though compression performance may suffer. Otherwise, the file management system will command the disk controller to use the asymmetrical algorithm that will yield greater compression performance.
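By way of illustration only, the mode selection described above might be sketched as follows in Python. The names `Mode`, `select_mode`, `pending_accesses`, `threshold`, and `pinned` are hypothetical and do not appear in the specification; the backlog is assumed to be measured as a simple count of pending disk accesses.

```python
from enum import Enum
from typing import Optional


class Mode(Enum):
    # Slow compression, fast decompression, higher compression ratio.
    ASYMMETRIC = "asymmetric"
    # Fast compression, lower compression ratio.
    SYMMETRIC = "symmetric"


def select_mode(pending_accesses: int, threshold: int,
                pinned: Optional[Mode] = None) -> Mode:
    """Choose the compression mode for the next write (illustrative only)."""
    if pinned is not None:
        # A user command (e.g., during software installation) holds the
        # controller in one mode regardless of the backlog.
        return pinned
    if pending_accesses > threshold:
        # Backlog exceeds the threshold: trade compression ratio for speed.
        return Mode.SYMMETRIC
    # Otherwise fall back to the default, more efficient asymmetric mode.
    return Mode.ASYMMETRIC
```

In this sketch, a call such as `select_mode(10, 4, pinned=Mode.ASYMMETRIC)` returns the asymmetric mode even though the backlog exceeds the threshold, mirroring the “asymmetrical only” mode of operation.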

[0097] It is to be appreciated that the data compression methods described herein may be integrated or otherwise implemented with the content independent data compression methods described in the above-incorporated U.S. Pat. Nos. 6,195,024 and 6,309,424.

[0098] FIG. 4A is a diagram of a file system format of a virtual and/or physical disk according to an embodiment of the present invention.

[0099] In yet another embodiment of the present invention, a virtual file management system is utilized to store, retrieve, or transmit compressed and/or accelerated data. In one embodiment of the present invention, a physical or virtual disk is utilized employing a representative file system format as illustrated in FIG. 4A. As shown in FIG. 4A, a virtual file system format comprises one or more data items. For instance, a “Superblock” denotes a grouping of configuration information necessary for the operation of the disk management system. The Superblock typically resides in the first sector of the disk. Additional copies of the Superblock are preferably maintained on the disk for backup purposes. The number of copies will depend on the size of the disk. One sector is preferably allocated for each copy of the Superblock on the disk, which leaves room to store additional parameters for various applications. The Superblock preferably comprises information such as (i) compress size; (ii) virtual block table address; (iii) virtual block table size; (iv) allocation size; (v) number of free sectors (approximate); (vi) ID (“Magic”) number; and (vii) checksum.

[0100] The “compress size” refers to the maximum uncompressed size of data that is grouped together for compression (referred to as a “data chunk”). For example, if the compress size is set to 16 k and a 40 k data block is sent to the disk controller for storage, it would be divided into two 16 k chunks and one 8 k chunk. Each chunk would be compressed separately and possess its own header. As noted above, for many compression algorithms, increasing the compress size will increase the compression ratio obtained. However, even when a single byte is needed from a compressed data chunk, the entire chunk must be decompressed, which is a tradeoff with respect to using a very large compress size.
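The division of a data block into chunks can be illustrated with a minimal Python sketch. The function name `split_into_chunks` is hypothetical, and “16 k” is assumed here to mean 16 × 1024 bytes.

```python
from typing import List


def split_into_chunks(data: bytes, compress_size: int) -> List[bytes]:
    """Split a data block into chunks of at most compress_size bytes.

    Each returned chunk would be compressed separately and given its
    own header; only the splitting step is shown here.
    """
    if compress_size <= 0:
        raise ValueError("compress_size must be positive")
    return [data[i:i + compress_size]
            for i in range(0, len(data), compress_size)]


KIB = 1024  # assumption: "k" denotes 1024 bytes
chunks = split_into_chunks(bytes(40 * KIB), 16 * KIB)
chunk_sizes_k = [len(c) // KIB for c in chunks]  # two 16 k chunks, one 8 k chunk
```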

[0101] The “virtual block table address” denotes the physical address of the virtual block table. The “virtual block table size” denotes the size of the virtual block table.

[0102] The “allocation size” refers to the minimum number of contiguous sectors on the disk to reserve for each new data entry. For example, assuming that 4 sectors are allowed for each allocation and that a compressed data entry requires only 1 sector, then the remaining 3 sectors would be left unused. Then, if that piece of data were to be appended, there would be room to increase the data while remaining contiguous on the disk. Indeed, by maintaining the data contiguously, the speed at which the disk can read and write the data will increase. Although the controller preferably attempts to keep these unused sectors available for expansion of the data, if the disk were to fill up, the controller could use such sectors to store new data entries. In this way, a system can be configured to achieve greater speed, while not sacrificing disk space. Setting the allocation size to 1 sector would effectively disable this feature.
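The effect of the allocation size can be sketched as follows. The function name `sectors_to_reserve` and the 512-byte sector size are assumptions made for illustration only.

```python
from typing import Tuple


def sectors_to_reserve(compressed_len: int, allocation_size: int,
                       sector_size: int = 512) -> Tuple[int, int, int]:
    """Return (sectors needed, sectors reserved, sectors left unused).

    sector_size defaults to 512 bytes, which is an assumption and not a
    value taken from the specification.
    """
    # Round up to whole sectors; even an empty entry occupies one sector here.
    needed = max(1, (compressed_len + sector_size - 1) // sector_size)
    # The allocation size is a minimum: larger entries take what they need.
    reserved = max(needed, allocation_size)
    return needed, reserved, reserved - needed
```

With an allocation size of 4 sectors, an entry that compresses to a single sector reserves 4 sectors and leaves 3 unused for later growth; with an allocation size of 1, nothing is held back.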

[0103] The “number of free sectors” denotes the number of physical free sectors remaining on the disk. The “ID (“Magic”) number” identifies this data as a Superblock. The “checksum” comprises a number that changes based on the data in the Superblock and is used for error checking. Preferably, this number is chosen so that all of the words in the Superblock (including the checksum) added up are equal to zero.
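A zero-sum checksum of this kind can be sketched in Python as shown below. The 32-bit word width and the sample field values are assumptions; the specification does not state a word size.

```python
from typing import Iterable

MASK32 = 0xFFFFFFFF  # assumption: 32-bit words with wrap-around addition


def compute_checksum(words: Iterable[int]) -> int:
    """Return the word that makes the total of all words zero modulo 2**32."""
    return (-sum(words)) & MASK32


def is_valid(words_with_checksum: Iterable[int]) -> bool:
    """Check that all words, including the checksum, add up to zero."""
    return (sum(words_with_checksum) & MASK32) == 0


# Hypothetical Superblock fields, one word each (values are illustrative only).
fields = [16384, 2048, 64, 4, 1000, 0x53425446]
checksum = compute_checksum(fields)
ok = is_valid(fields + [checksum])
```

Any single-word corruption changes the total, so `is_valid` would return `False` for the altered block.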

[0104] FIG. 4B is a diagram of a data structure of a sector map entry of a virtual block table according to an embodiment of the present invention.

[0105] The “virtual block table” (VBT) comprises a number of “sector map” entries, one for each grouping of compressed data (or chunks). The VBT may reside anywhere on the disk. The size of the VBT will depend on how much data is on the disk. Each sector map entry comprises 8 bytes. Although there is preferably only one VBT on the disk, each chunk of compressed data will have a copy of its sector map entry in its header. If the VBT were to become corrupted, a new one could be created by scanning the disk for all sector maps.

[0106] The term “type” refers to the sector map type. For example, a value of “00” corresponds to this sector map definition. Other values are preferably reserved for future redefinitions of the sector map.

[0107] A “C Type” denotes a compression type. A value of “000” will correspond to no compression. Other values are defined as required depending on the application. This function supports the use of multiple compression algorithms along with the use of various forms of asymmetric data compression.

[0108] The “C Info” comprises the compression information needed for the given compression type. These values are defined depending on the application. In addition, the data may be tagged based on its use—for example operating system “00”, Program “01”, or data “10”. Frequency of use or access codes may also be included. The size of this field may be greatly expanded to encode statistics supporting these items including, for example, cumulative number of times accessed, number of times accessed within a given time period or CPU clock cycles, and other related data.

[0109] The “sector count” comprises the number of physical sectors on the disk that are used for this chunk of compressed data. The “LBA” refers to the logical block address, or physical disk address, for this chunk of compressed data.
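For illustration, an 8-byte sector map entry could be packed and unpacked as in the following Python sketch. The field widths used here (2-bit type, 3-bit C Type, 19-bit C Info, 8-bit sector count stored as count minus one, 32-bit LBA) and the big-endian byte order are assumptions chosen only so that the fields total 64 bits; the actual layout is the one shown in FIG. 4B.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SectorMapEntry:
    map_type: int      # assumed 2 bits; 0b00 = this sector map definition
    c_type: int        # assumed 3 bits; 0b000 = no compression
    c_info: int        # assumed 19 bits of compression information
    sector_count: int  # 1..256, stored as (count - 1) in an assumed 8 bits
    lba: int           # assumed 32-bit address of the chunk on disk

    def pack(self) -> bytes:
        """Encode the entry into exactly 8 bytes (hypothetical layout)."""
        if not 0 <= self.map_type < (1 << 2):
            raise ValueError("map_type out of range")
        if not 0 <= self.c_type < (1 << 3):
            raise ValueError("c_type out of range")
        if not 0 <= self.c_info < (1 << 19):
            raise ValueError("c_info out of range")
        if not 1 <= self.sector_count <= 256:
            raise ValueError("sector_count out of range")
        if not 0 <= self.lba < (1 << 32):
            raise ValueError("lba out of range")
        value = ((self.map_type << 62) | (self.c_type << 59)
                 | (self.c_info << 40) | ((self.sector_count - 1) << 32)
                 | self.lba)
        return value.to_bytes(8, "big")

    @classmethod
    def unpack(cls, raw: bytes) -> "SectorMapEntry":
        """Decode 8 bytes produced by pack()."""
        if len(raw) != 8:
            raise ValueError("a sector map entry is 8 bytes")
        value = int.from_bytes(raw, "big")
        return cls(map_type=(value >> 62) & 0x3,
                   c_type=(value >> 59) & 0x7,
                   c_info=(value >> 40) & 0x7FFFF,
                   sector_count=((value >> 32) & 0xFF) + 1,
                   lba=value & 0xFFFFFFFF)
```

Because each data block header carries a copy of its own entry, a routine like `unpack` is also what a recovery scan would apply to each header when rebuilding a corrupted VBT.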

[0110] Referring back to FIG. 4A, each “Data” block represents a data chunk comprising a header and compressed data. The data chunk may take up anywhere from 1 to 256 sectors on the disk. Each compressed chunk of data is preferably preceded on the disk by a data block header that preferably comprises the following information: (i) sector map; (ii) VBI; (iii) ID (“Magic”) Number; and (iv) checksum.

[0111] The “sector map” comprises a copy of the sector map entry in the VBT for this data chunk. The “VBI” is the Virtual Block Index, which is the index into the VBT that corresponds to this data chunk. The “ID (“Magic”) Number” identifies this data as a data block header. The “checksum” number will change based on the data in the header and is used for error checking. This number is preferably chosen such that the addition of all the words in the header (including the checksum) will equal zero.

[0112] It should be noted that the present invention is not limited to checksums but may employ any manner of error detection and correction techniques, utilizing greatly expanded fields for error detection and/or correction.

[0113] It should be further noted that additional fields may be employed to support encryption, specifically an identifier for encrypted or unencrypted data along with any parameters necessary for routing or processing the data to an appropriate decryption module or user.

[0114] The virtual size of the disk will depend on the physical size of the disk, the compress size selected, and the expected compression ratio. For example, given a 75 GB disk and a selected compress size for which a 3:1 compression ratio is expected, the virtual disk size would be 225 GB. This will be the maximum amount of uncompressed data that the file system will be able to store on the disk.

[0115] If the virtual size chosen is too small, then the entire disk will not be utilized. Consider the above example where a system comprises a 75 GB disk and a 225 GB virtual size. Assume that in actuality during operation the average compression ratio obtained is 5:1. Whereas this could theoretically allow 375 GB to be stored on the 75 GB disk, in practice, only 225 GB would be able to be stored on the disk before a “disk full” message is received. Indeed, with a 5:1 compression ratio, the 225 GB of data would only take up 45 GB on the disk, leaving 30 GB unused. Since the operating system would think the disk is full, it would not attempt to write any more information to the disk.

[0116] On the other hand, if the virtual size chosen is too large, then the disk will fill up while the operating system still indicates that there is space available on the disk. Again consider the above example where a system comprises a 75 GB disk and a 225 GB virtual size. Assume further that during operation, the average compression ratio actually obtained is only 2:1. In this case, the physical disk would be full after writing 150 GB to it, but the operating system would still think there is 75 GB remaining. If the operating system tried to write more information to the disk, an error would occur.
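The arithmetic of the two preceding examples can be reproduced with a short Python sketch. The function name `virtual_disk_outcome` and the keys of the returned dictionary are hypothetical.

```python
from typing import Dict


def virtual_disk_outcome(physical_gb: float, expected_ratio: float,
                         actual_ratio: float) -> Dict[str, float]:
    """Compare a fixed virtual disk size with what the disk can really hold."""
    virtual_gb = physical_gb * expected_ratio   # size reported to the OS
    storable_gb = physical_gb * actual_ratio    # uncompressed data that fits
    written_gb = min(virtual_gb, storable_gb)   # whichever limit is hit first
    return {
        "virtual_gb": virtual_gb,
        "max_written_gb": written_gb,
        # Physical space never used because the OS believes the disk is full.
        "physical_unused_gb": physical_gb - written_gb / actual_ratio,
        # Space the OS still reports as free after the physical disk is full.
        "reported_free_but_unavailable_gb": virtual_gb - written_gb,
    }


too_small = virtual_disk_outcome(75, 3, 5)  # 225 GB written, 30 GB left unused
too_large = virtual_disk_outcome(75, 3, 2)  # full at 150 GB, 75 GB still reported free
```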

[0117] Thus, in another embodiment of the present invention, the virtual size of the disk is dynamically altered based upon the achieved compression ratio. In one embodiment, a running average may be utilized to reallocate the virtual disk size. Alternatively, the compression ratios for certain portions of the disk may already be known, such as for a preinstalled operating system and programs. Thus, the known ratio is utilized for that portion of the disk, and predictive techniques are utilized for the balance of the disk or disks.
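One possible way to apply a running average is sketched below in Python. The class name `VirtualSizeEstimator` and the particular rule used (data already stored is counted at its uncompressed size, and the remaining physical space is projected at the cumulative average ratio) are assumptions; the specification does not prescribe a formula.

```python
class VirtualSizeEstimator:
    """Re-estimate the virtual disk size from the achieved compression ratio."""

    def __init__(self, physical_size: float, initial_ratio: float) -> None:
        self.physical_size = physical_size
        self.initial_ratio = initial_ratio  # used until real data is written
        self.total_uncompressed = 0.0
        self.total_compressed = 0.0

    def record_write(self, uncompressed: float, compressed: float) -> None:
        """Account for one stored chunk (sizes in the same unit as the disk)."""
        self.total_uncompressed += uncompressed
        self.total_compressed += compressed

    def average_ratio(self) -> float:
        """Cumulative average compression ratio achieved so far."""
        if self.total_compressed == 0:
            return self.initial_ratio
        return self.total_uncompressed / self.total_compressed

    def virtual_size(self) -> float:
        """Stored data plus free physical space projected at the average ratio."""
        free_physical = self.physical_size - self.total_compressed
        return self.total_uncompressed + free_physical * self.average_ratio()


est = VirtualSizeEstimator(physical_size=75, initial_ratio=3.0)  # sizes in GB
before = est.virtual_size()   # starts at the configured 225 GB
est.record_write(uncompressed=50, compressed=10)  # 5:1 achieved so far
after = est.virtual_size()    # estimate grows as better ratios are observed
```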

[0118] In yet another embodiment, users are prompted for setup information and the computer selects the appropriate virtual disk size(s) or selects the best method of estimation based on, e.g., a high level menu asking for the purpose of the computer: home, home office, business, or server. Another submenu may ask for the expected data mix: Word, Excel, video, music, etc. Then, based upon expected usage and associated compression ratios (or the use of already compressed data in the event of certain forms of music and video), the results are utilized to set the virtual disk size.

[0119] It should be noted that the present invention is independent of the number or types of physical or virtual disks, and indeed may be utilized with any type of storage.

[0120] It is to be understood that the systems and methods described herein may be implemented in various forms of hardware, software, firmware, special purpose processors, or a combination thereof. In particular, the present invention may be implemented as an application comprising program instructions that are tangibly embodied on a program storage device (e.g., magnetic floppy disk, RAM, ROM, CD ROM, etc.) and executable by any device or machine comprising suitable architecture. It is to be further understood that, because some of the constituent system components and process steps depicted in the accompanying Figures are preferably implemented in software, the actual connections between such components and steps may differ depending upon the manner in which the present invention is programmed. Given the teachings herein, one of ordinary skill in the related art will be able to contemplate these and similar implementations or configurations of the present invention.

[0121] Although illustrative embodiments have been described herein with reference to the accompanying drawings, it is to be understood that the present system and method is not limited to those precise embodiments, and that various other changes and modifications may be effected therein by one skilled in the art without departing from the scope or spirit of the invention. All such changes and modifications are intended to be included within the scope of the invention as defined by the appended claims.