Apparatus and method for storing data traffic on flow basis

Abstract

An apparatus and method for storing data traffic on a flow basis. The apparatus for storing data traffic on a flow basis includes a packet storage unit, a flow generation unit, and a metadata generation unit. The packet storage unit receives packets corresponding to data traffic, and temporarily stores the packets using queues. The flow generation unit generates flows by grouping the packets by means of a hash function using information about each of the packets as input, and stores the flows. The metadata generation unit generates metadata and index data corresponding to each of the flows, and stores the metadata and the index data.

Claims

1. An apparatus for storing data traffic on a flow basis, comprising: one or more units being configured and executed by a processor using an algorithm associated with at least one non-transitory storage device, the one or more units comprising, a packet storage unit configured to receive packets corresponding to data traffic, and to temporarily store the packets using queues; a flow generation unit configured to generate flows by grouping the packets by means of a hash value using an algorithm that maps data of an arbitrary length to data of a fixed length of each of the received packets, the hash value being applied as an input value, when the input value varies the hash value varies accordingly, and to store the flows in flow buffers in response to detection of a size of the flows stored in the flow buffers exceeding a specific value or the flows being terminated, and in response to detection of internal data of a body corresponding to a first flow being identical to internal data of a body corresponding to a second flow, the flow buffers configured to store an address of internal data of a body corresponding to the first flow in a flow data map inside the second flow to increase efficiency of the packet storage unit by preventing all redundant data from being stored, and in response to detection of the redundant data being present in the internal data of the packets inside flows, the redundant data being eliminated from flows, and an address of the same data stored in a third flow being stored in a flow data map inside the third flow; and a metadata generation unit configured to generate metadata and index data corresponding to each of the flows, and to store the metadata and the index data, wherein the flow generation unit comprises: a hash value generation unit configured to generate a hash value based on an IP address of each sender, an IP address of each recipient, a port address of the sender, and a port address of the recipient, which correspond to the packets, a generation unit configured to sort the packets according to their flows based on the hash values, to generate flows by grouping the packets, and to store the flows in flow buffers, and a flow storage unit configured to store the flows, stored in the flow buffers, on hard disks, and wherein the flow buffers comprise: an upstream content buffer configured to store a request packet, a header buffer configured to store a header of a response packet corresponding to the request packet, and downstream content buffers configured to store a body of the response packet.

2. The apparatus of claim 1, wherein the flow storage unit stores each of the flows on the hard disks.

3. The apparatus of claim 1, wherein the metadata generation unit generates the metadata including an IP address of a sender, an IP address of a recipient, a port address of the sender, a port address of the recipient, an internal address of a hard disk on which the flows are stored, and a start time and end time of the flows, and the index data including the IP address of the sender, the IP address of the recipient, the port address of the sender, and the port address of the recipient, which correspond to the flows.

4. The apparatus of claim 3, wherein the metadata and the index data are stored on a solid state drive (SSD), and the data traffic is stored on the hard disks.

5. An apparatus for searching for data traffic on a flow basis, comprising: one or more units being configured and executed by a processor using an algorithm associated with at least one non-transitory storage device, the one or more units comprising, a flow storage unit configured to store flows generated by arranging packets corresponding to data traffic using information about each of the packets and a hash value generated using an algorithm that maps data of an arbitrary length to data of a fixed length of each of the packets, the hash value being applied as an input value, when the input value varies the hash value varies accordingly; a metadata storage unit configured to store metadata and index data corresponding to each of the flows; and a search unit configured to search for a flow stored in the flow storage unit based on information about the flow, the search unit further configured to determine whether a flow is present in the flow storage unit using any one of the IP address of the sender, the IP address of the recipient, the port address of the sender, and the port address of the recipient, which correspond to the flow, and a Bloom filter, and in response to determination using the Bloom filter that the flow is present, the search unit configured to search for metadata corresponding to the flow using the index data and then to search for the flow based on the metadata, wherein the flow storage unit stores the flows on hard disks, and wherein the metadata storage unit stores the metadata and the index data on a solid state drive (SSD), wherein in response to detection of internal data of a body corresponding to a first flow being identical to internal data of a body corresponding to a second flow, the flow storage unit configured to store an address of internal data of a body corresponding to the first flow in a flow data map inside the second flow to increase efficiency of the flow storage unit by preventing all redundant data from being stored, and in response to detection of redundant data being present in the internal data of the packets inside flows, the redundant data being eliminated from flows, and an address of the same data stored in a third flow being stored in a flow data map inside the third flow.

6. The apparatus of claim 5, wherein the search unit comprises a Bloom filter configured to store an IP address of a sender, an IP address of a recipient, a port address of the sender, and a port address of the recipient, which correspond to the flow.

7. A method of storing data traffic on a flow basis, comprising: receiving packets corresponding to data traffic, and temporarily storing the packets using queues; generating flows by arranging the packets using information about each of the packets and a hash value using an algorithm that maps data of an arbitrary length to data of a fixed length of each of the received packets, the hash value being applied as an input value, when the input value varies the hash value varies accordingly, and storing the flows in flow buffers in response to detection of a size of the flows stored in the flow buffers exceeding a specific value or the flows being terminated, and in response to detection of first internal data of a body corresponding to a first flow being identical to second internal data of a body corresponding to a second flow, storing an address of internal data of a body corresponding to the first flow in a flow data map inside the second flow to increase efficiency of a packet storage unit by preventing all redundant data from being stored, and in response to detection of redundant data being present in the internal data of the packets inside flows, eliminating the redundant data from flows, and storing an address of the same data in a third flow in a flow data map inside the third flow; and generating metadata and index data corresponding to each of the flows, wherein generating the flows comprises: generating a hash value based on an IP address of each sender, an IP address of each recipient, a port address of the sender, and a port address of the recipient, which correspond to the packets, generating flows by grouping the packets based on the hash values, and storing the flows in flow buffers, and storing the flows, stored in the flow buffers, on hard disks, and wherein the flow buffers comprise: an upstream content buffer configured to store a request packet, a header buffer configured to store a header of a response packet corresponding to the request packet, and downstream content buffers configured to store a body of the response packet.

8. The method of claim 7, wherein generating the metadata and the index data comprises generating the metadata including an IP address of a sender, an IP address of a recipient, a port address of the sender, a port address of the recipient, an internal address of a hard disk on which the flow has been stored, and a start time and end time of the flows, and the index data including the IP address of the sender, the IP address of the recipient, the port address of the sender, and the port address of the recipient, which correspond to the flows.

9. The method of claim 8, wherein generating the metadata and the index data comprises storing the generated metadata and index data on a solid state drive (SSD).

Description

BRIEF DESCRIPTION OF THE DRAWINGS

(1) The above and other objects, features and advantages of the present invention will be more clearly understood from the following detailed description taken in conjunction with the accompanying drawings, in which:

(2) FIG. 1 is a block diagram showing an apparatus for storing data traffic on a flow basis according to an embodiment of the present invention;

(3) FIG. 2 is a block diagram showing an embodiment of the flow generation unit shown in FIG. 1;

(4) FIG. 3 is a diagram showing an apparatus for storing data traffic on a flow basis according to an embodiment of the present invention;

(5) FIG. 4 is a diagram showing the storage of flows using an apparatus for storing data traffic on a flow basis according to an embodiment of the present invention;

(6) FIG. 5 is a block diagram showing an apparatus for searching for data traffic on a flow basis according to an embodiment of the present invention;

(7) FIG. 6 is a diagram showing a search for a flow using an apparatus for searching for data traffic on a flow basis according to an embodiment of the present invention; and

(8) FIG. 7 is an operation flowchart showing a method of storing data traffic on a flow basis according to an embodiment of the present invention.

DETAILED DESCRIPTION

(9) Embodiments of the present invention will be described in detail below with reference to the accompanying drawings. Redundant descriptions and descriptions of well-known functions and configurations that have been deemed to make the gist of the present invention unnecessarily obscure will be omitted below. The embodiments of the present invention are intended to fully describe the present invention to persons having ordinary knowledge in the art to which the present invention pertains. Accordingly, the shapes, sizes, etc. of components in the drawings may be exaggerated to make the description obvious.

(10) Embodiments of the present invention are described in detail with reference to the accompanying diagrams.

(11) FIG. 1 is a block diagram showing an apparatus for storing data traffic on a flow basis according to an embodiment of the present invention.

(12) A packet storage unit 110 receives packets corresponding to data traffic, and temporarily stores the packets using queues.

(13) In this case, a high-speed Network Interface Card (NIC) may receive the packets corresponding to the data traffic.

(14) In this case, the efficiency of communication between threads and of the disk write task can be increased by temporarily storing the packets in the queues and transferring the stored packets together at one time, rather than transmitting each received packet to a CPU and processing it individually.
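As a rough illustration of this batched hand-off, the sketch below assumes that Python's standard queue.Queue stands in for the packet queues and that a consumer takes several packets at one time. The function name drain_batch and the batch size are hypothetical; the description does not specify either.

```python
import queue
from typing import List


def drain_batch(q: "queue.Queue[bytes]", max_batch: int) -> List[bytes]:
    """Take up to max_batch packets from q without blocking and return them together."""
    batch: List[bytes] = []
    while len(batch) < max_batch:
        try:
            batch.append(q.get_nowait())
        except queue.Empty:   # queue drained: hand over what has been collected
            break
    return batch


# Usage: the receive side enqueues packets; the consumer takes them in batches.
rx: "queue.Queue[bytes]" = queue.Queue()
for i in range(5):
    rx.put(f"pkt{i}".encode())
first = drain_batch(rx, max_batch=3)   # pkt0..pkt2 handed over at one time
rest = drain_batch(rx, max_batch=3)    # the remaining pkt3 and pkt4
```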

(15) A flow generation unit 120 generates flows by arranging packets using information about each of the packets and a hash function, and stores the flows.

(16) In this case, arranging packets refers to grouping packets that are used when a single task is performed.

(17) In this case, the term flow used herein may refer to a set of packets that are used to perform a single task.

(18) In this case, when generating the flows by arranging the packets, the flow generation unit 120 may generate the flows by arranging the packets using information about each of the packets and hash values generated using a hash function.

(19) In this case, a method of arranging packets is not limited to a specific method. When hash values generated from respective packets are the same, a single flow may be generated by arranging packets having the same hash value.

(20) In this case, the information about each of the packets may be information including the IP address of a sender, the IP address of a recipient, the port address of the sender, and the port address of the recipient, which correspond to the packet.

(21) In this case, the flow generation unit 120 may temporarily store the generated flows in flow buffers. When the size of data stored in the flow buffers exceeds a specific value or a flow is terminated, the flows temporarily stored in the flow buffers may be transmitted to hard disks and stored on the hard disks. There is no limitation regarding the specific value. When more frequent movement from the flow buffers to the hard disks is required, the specific value may be adjusted to a smaller value. In contrast, when the efficiency of transmission to the hard disks is to be increased, the specific value may be adjusted to a larger value, so that a larger amount of data is transmitted at one time.
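A minimal sketch of this flush rule, assuming a single flow buffer whose threshold (the "specific value") is counted in bytes. The class FlowBuffer and its fields are hypothetical, and a list stands in for writes to the hard disks.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class FlowBuffer:
    """Hypothetical per-flow buffer, flushed when it exceeds a threshold or the flow ends."""
    threshold: int                                            # the "specific value" (assumed: bytes)
    pending: List[bytes] = field(default_factory=list)
    size: int = 0
    disk_writes: List[bytes] = field(default_factory=list)    # stands in for the hard disks

    def append(self, data: bytes) -> None:
        self.pending.append(data)
        self.size += len(data)
        if self.size > self.threshold:        # size exceeds the specific value
            self.flush()

    def terminate(self) -> None:
        self.flush()                          # flow terminated: write out the remainder

    def flush(self) -> None:
        if self.pending:
            self.disk_writes.append(b"".join(self.pending))   # one larger write per flush
            self.pending.clear()
            self.size = 0


buf = FlowBuffer(threshold=8)
buf.append(b"abcde")   # 5 bytes buffered, below the threshold
buf.append(b"fghij")   # 10 bytes > 8: flushed as one write
buf.append(b"xy")
buf.terminate()        # remaining 2 bytes flushed at flow termination
```

A smaller threshold yields more frequent, smaller writes; a larger one yields fewer, larger writes, which is the trade-off described in paragraph (21).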

(22) In this case, the flow buffers in which the flows are temporarily stored include a packet header buffer configured to store the headers of all the packets of a corresponding flow that is temporarily stored, an upstream content buffer configured to store the payload of a request packet, and downstream content buffers configured to store the payload of a response packet corresponding to the request packet. The downstream content buffers include an HTTP response header buffer configured to store an HTTP response header and an HTTP response body buffer configured to store an HTTP response body, in the case of HTTP data traffic. Although a conventional buffer stores both a request packet and a response packet in a single buffer without dividing each packet into a header and a payload, the flow buffers used in the present invention include the header buffer, the upstream content buffer and the downstream content buffers in order to separately store the header and the payload of the packet.
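A sketch of this buffer layout, assuming HTTP traffic so that the downstream content buffers consist of the HTTP response header buffer and the HTTP response body buffer. The class and method names are hypothetical, and the caller is assumed to have already split each response payload into its HTTP header and body.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class FlowBuffers:
    """Hypothetical flow buffers for one flow, keeping headers and payloads apart."""
    packet_headers: List[bytes] = field(default_factory=list)           # headers of all packets
    upstream_content: bytearray = field(default_factory=bytearray)      # request payloads
    http_response_header: bytearray = field(default_factory=bytearray)  # downstream: HTTP header
    http_response_body: bytearray = field(default_factory=bytearray)    # downstream: HTTP body

    def add_request(self, pkt_header: bytes, payload: bytes) -> None:
        self.packet_headers.append(pkt_header)
        self.upstream_content += payload

    def add_response(self, pkt_header: bytes, http_header: bytes, http_body: bytes) -> None:
        self.packet_headers.append(pkt_header)
        self.http_response_header += http_header
        self.http_response_body += http_body


fb = FlowBuffers()
fb.add_request(b"<tcp/ip hdr 1>", b"GET / HTTP/1.1\r\nHost: example.com\r\n\r\n")
fb.add_response(b"<tcp/ip hdr 2>", b"HTTP/1.1 200 OK\r\n\r\n", b"hello")
```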

(23) In this case, the flow generation unit 120 stores the address of the internal data of a body corresponding to a first flow in a flow data map inside a second flow when the internal data of the body corresponding to the first flow is the same as the internal data of a body corresponding to the second flow. This is intended to increase the efficiency of storage space. To prevent the waste of storage space that occurs when all redundant data is stored, when redundant data is present in the internal body data of the response packets inside flows, the redundant data is eliminated from one flow, and the address at which the same data is stored in another flow is recorded in the flow data map of the flow from which the data was eliminated. This is described in detail with reference to FIG. 4.

(24) The metadata generation unit 130 generates metadata and index data corresponding to each flow.

(25) In this case, the metadata may include the IP address of a sender, the IP address of a recipient, the port address of the sender, the port address of the recipient, the internal address of a hard disk on which the flow has been stored, and the start time and end time of the flow, which correspond to the flow.

(26) In this case, the index data may include the IP address of the sender, the IP address of the recipient, the port address of the sender, and the port address of the recipient.

(27) In this case, the index data may include the IP address of the sender, the IP address of the recipient, the port address of the sender, and the port address of the recipient, taken from the data included in the metadata. When the metadata is used directly to search for a flow, search speed may decrease because the metadata is large. Accordingly, this decrease can be prevented by first identifying the search target flow using the index data, which holds only part of the metadata, and then fetching from the metadata the address of the hard disk at which the flow has been stored.
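A minimal sketch of this two-step lookup, assuming a dictionary keyed by the four address fields stands in for the index data and a list of records stands in for the metadata. All names, and the use of a byte offset as the internal hard-disk address, are assumptions.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class IndexKey:
    """Index data: only the four address fields, so it stays small."""
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int


@dataclass(frozen=True)
class FlowMetadata:
    """Metadata: the same fields plus where and when the flow was stored."""
    key: IndexKey
    disk_address: int     # internal address on the hard disk (assumed: byte offset)
    start_time: float
    end_time: float


def locate_flow(index: Dict[IndexKey, int], metadata: List[FlowMetadata],
                query: IndexKey) -> Optional[int]:
    """Identify the flow via the small index, then read its disk address from the metadata."""
    slot = index.get(query)
    if slot is None:
        return None
    return metadata[slot].disk_address


k = IndexKey("10.0.0.1", "10.0.0.2", 51514, 80)
meta = [FlowMetadata(k, disk_address=4096, start_time=0.0, end_time=1.5)]
idx = {k: 0}
```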

(28) In this case, processing speed can be improved by storing the index data and the metadata on a solid state drive (SSD), instead of the hard disks.

(29) FIG. 2 is a block diagram showing an embodiment of the flow generation unit shown in FIG. 1.

(30) Referring to FIG. 2, the flow generation unit 120 includes a hash value generation unit 210, a generation unit 220, and a flow storage unit 230.

(31) The hash value generation unit 210 generates a hash value by applying a hash function using the IP address of the sender, the IP address of the recipient, the port address of the sender, and the port address of the recipient, corresponding to a packet, as input.

(32) In this case, the hash function refers to an algorithm that maps data of an arbitrary length to data of a fixed length. The hash function has the characteristic that when two hash values differ, their inputs also differ; that is, identical inputs always yield identical hash values. Accordingly, it is possible to group packets and generate a flow using a hash value generated by applying the hash function to an input value.
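A hedged example of the hash value generation step. The description does not name a hash function, so this sketch assumes IPv4 addresses and uses BLAKE2b from Python's hashlib with an 8-byte digest; flow_hash is a hypothetical name. Whatever the input, the output is always a 64-bit value, and identical inputs always give identical values (distinct inputs can, rarely, collide).

```python
import hashlib
import socket
import struct


def flow_hash(src_ip: str, dst_ip: str, src_port: int, dst_port: int) -> int:
    """Map the sender/recipient IP and port addresses to a fixed-length (64-bit) hash value."""
    key = (socket.inet_aton(src_ip) + socket.inet_aton(dst_ip)   # 4 + 4 bytes (IPv4 only)
           + struct.pack("!HH", src_port, dst_port))            # 2 + 2 bytes, network order
    digest = hashlib.blake2b(key, digest_size=8).digest()       # always 8 bytes
    return int.from_bytes(digest, "big")
```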

(33) The generation unit 220 sorts packets according to their flows based on the hash values, generates flows by grouping the packets, and stores the flows in the flow buffers.

(34) In this case, although a method of sorting packets according to their flows is not limited to a specific method, flows may be generated by grouping packets having the same hash value.
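A sketch of sorting packets into flows by hash value, under two assumptions not stated in the description: CRC-32 (zlib.crc32) as the hash, and ordering the two endpoints before hashing so that a request and its response (whose sender and recipient are swapped) fall into the same flow. Packet, hash_value and group_into_flows are hypothetical names.

```python
import zlib
from collections import defaultdict
from typing import Dict, List, Tuple

# (sender IP, recipient IP, sender port, recipient port, payload)
Packet = Tuple[str, str, int, int, bytes]


def hash_value(p: Packet) -> int:
    """Hypothetical flow hash: CRC-32 over the address fields, direction-independent."""
    a, b = (p[0], p[2]), (p[1], p[3])
    lo, hi = sorted([a, b])            # assumed: order endpoints so both directions match
    return zlib.crc32(f"{lo[0]}:{lo[1]}|{hi[0]}:{hi[1]}".encode())


def group_into_flows(packets: List[Packet]) -> Dict[int, List[Packet]]:
    """Packets with the same hash value are grouped into one flow, in arrival order."""
    flows: Dict[int, List[Packet]] = defaultdict(list)
    for p in packets:
        flows[hash_value(p)].append(p)
    return dict(flows)


req = ("10.0.0.1", "10.0.0.2", 51514, 80, b"GET /")
resp = ("10.0.0.2", "10.0.0.1", 80, 51514, b"HTTP/1.1 200 OK")
other = ("10.0.0.3", "10.0.0.2", 40000, 80, b"GET /x")
flows = group_into_flows([req, resp, other])
```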

(35) In this case, the generation unit 220 may temporarily store the generated flows in the flow buffers. When the size of data stored in the flow buffers exceeds a specific value or a flow is terminated, the flows temporarily stored in the flow buffers may be transmitted to hard disks and stored on the hard disks. There is no limitation regarding the specific value. When more frequent movement from the flow buffers to the hard disks is required, the specific value may be adjusted to a smaller value. In contrast, when the efficiency of transmission to the hard disks is to be increased, the specific value may be adjusted to a larger value, so that a larger amount of data is transmitted at one time.

(36) In this case, the flow buffers in which the flows are temporarily stored include a packet header buffer configured to store the headers of all the packets of a corresponding flow that is temporarily stored, an upstream content buffer configured to store the payload of a request packet, and downstream content buffers configured to store the payload of a response packet corresponding to the request packet. The downstream content buffers include an HTTP response header buffer configured to store an HTTP response header and an HTTP response body buffer configured to store an HTTP response body, in the case of HTTP data traffic. Although a conventional buffer stores both a request packet and a response packet in a single buffer without dividing each packet into a header and a payload, the flow buffers used in the present invention include the header buffer, the upstream content buffer and the downstream content buffers in order to separately store the header and the payload of the packet.

(37) The flow storage unit 230 stores the flows, stored in the flow buffers, on hard disks.

(38) In this case, the flow storage unit 230 stores the address of the internal data of a body corresponding to a first flow in a flow data map inside a second flow when the internal data of the body corresponding to the first flow is the same as the internal data of a body corresponding to the second flow. This is intended to increase the efficiency of storage space. To prevent the waste of storage space that occurs when all redundant data is stored, when redundant data is present in the internal body data of the response packets inside flows, the redundant data is eliminated from one flow, and the address at which the same data is stored in another flow is recorded in the flow data map of the flow from which the data was eliminated. This is described in detail with reference to FIG. 4.

(39) FIG. 3 is a diagram showing an apparatus for storing data traffic on a flow basis according to an embodiment of the present invention.

(40) The packet storage unit 110, the flow generation unit 120 and the metadata generation unit 130 shown in FIG. 1 may be implemented using threads within a central processing unit (CPU).

(41) First, the threads within the CPU may include three types of threads: engine threads 330, writing threads 340, and index threads 360.

(42) The engine threads 330 may be responsible for the detection of packets from the high-speed NIC 310, the generation of flows, the management of flows, and the generation of index data.

(43) The writing threads 340 may be responsible for periodically storing the flow data present in the flow buffers on the hard disks 350. In this case, the writing threads 340 may share the flow buffers with the engine threads 330.

(44) The index threads 360 may be responsible for the storage of metadata and index data corresponding to each of the flows on an SSD 370. In this case, the index threads 360 share the metadata and index data of the flows with the engine threads 330.
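The division of work among the three thread types can be sketched as follows. This sketch assumes queues as the hand-off between threads, whereas the description says the threads share the flow buffers and the metadata and index data; plain lists stand in for the hard disks 350 and the SSD 370, a string flow_id stands in for the hash value, and every name is hypothetical.

```python
import queue
import threading
from typing import Dict, List, Tuple


def run_pipeline(packets: List[Tuple[str, bytes]]) -> Tuple[list, list]:
    """Hypothetical engine/writing/index thread split over (flow_id, payload) packets."""
    to_writer: queue.Queue = queue.Queue()    # assumed hand-off: flow data
    to_indexer: queue.Queue = queue.Queue()   # assumed hand-off: index data
    hard_disks: list = []
    ssd: list = []
    STOP = object()                           # end-of-stream marker

    def engine() -> None:
        flows: Dict[str, List[bytes]] = {}
        for flow_id, payload in packets:      # flow generation
            flows.setdefault(flow_id, []).append(payload)
        for slot, (flow_id, payloads) in enumerate(flows.items()):
            to_writer.put((slot, b"".join(payloads)))   # flow data for the writing thread
            to_indexer.put((flow_id, slot))             # index data for the index thread
        to_writer.put(STOP)
        to_indexer.put(STOP)

    def writer() -> None:
        while (item := to_writer.get()) is not STOP:
            hard_disks.append(item)

    def indexer() -> None:
        while (item := to_indexer.get()) is not STOP:
            ssd.append(item)

    threads = [threading.Thread(target=fn) for fn in (engine, writer, indexer)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return hard_disks, ssd


disks, ssd = run_pipeline([("a", b"x"), ("b", b"y"), ("a", b"z")])
```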

(45) Referring to FIG. 3, the storage of flows by the apparatus for storing data traffic on a flow basis according to an embodiment of the present invention is described.

(46) First, the packet storage unit 110 receives packets corresponding to data traffic from the NIC 310, and temporarily stores the packets using queues 320.

(47) In this case, the packet storage unit 110 may be implemented using the engine threads 330.

(48) In this case, the efficiency of communication between threads and of the disk write task can be increased by temporarily storing the packets in the queues and transferring the stored packets together at one time, rather than transmitting each received packet to a CPU and processing it individually.

(49) Furthermore, the flow generation unit 120 generates flows by arranging packets using information about each of the packets and a hash function, and stores the generated flows on the hard disks 350.

(50) In this case, the flow generation unit 120 may generate flows using engine threads 330, and may store the flows on the hard disks 350 using the writing threads 340.

(51) In this case, arranging packets refers to grouping packets that are used when a single task is performed.

(52) In this case, the term flow used herein may refer to a set of packets that are used to perform a single task.

(53) In this case, when generating the flows by arranging the packets, the flow generation unit 120 may generate the flows by arranging the packets using information about each of the packets and hash values generated using a hash function.

(54) In this case, a method of arranging packets is not limited to a specific method. When hash values generated from respective packets are the same, a single flow may be generated by arranging packets having the same hash value.

(55) In this case, the information about each of the packets may be information including the IP address of a sender, the IP address of a recipient, the port address of the sender, and the port address of the recipient, which correspond to the packet.

(56) Furthermore, the metadata generation unit 130 may generate metadata and index data corresponding to each of the flows, and may store the metadata and index data on the SSD 370.

(57) In this case, the metadata generation unit 130 may generate the metadata and index data using the engine threads 330, and may store the metadata and index data on the SSD 370 using the index threads 360.

(58) In this case, the metadata may include the IP address of a sender, the IP address of a recipient, the port address of the sender, the port address of the recipient, the internal address of a hard disk on which the flow has been stored, and the start time and end time of the flow, which correspond to the flow.

(59) In this case, the index data may include the IP address of the sender, the IP address of the recipient, the port address of the sender, and the port address of the recipient.

(60) In this case, the index data may include the IP address of the sender, the IP address of the recipient, the port address of the sender, and the port address of the recipient, taken from the data included in the metadata. When the metadata is used directly to search for a flow, search speed may decrease because the metadata is large. Accordingly, this decrease can be prevented by first identifying the search target flow using the index data, which holds only part of the metadata, and then fetching from the metadata the address of the hard disk at which the flow has been stored.

(61) FIG. 4 is a diagram showing the storage of flows using an apparatus for storing data traffic on a flow basis according to an embodiment of the present invention.

(62) Referring to FIG. 4(a), in the case of TCP data traffic, packets are divided and stored in a header buffer, an upstream content buffer, and a downstream content buffer. In this case, the headers of packets corresponding to TCP data traffic may be stored in the header buffer, the payload of a request packet is stored in the upstream content buffer, and the payload of a response packet may be stored in the downstream content buffer.

(63) Furthermore, in the present invention, in the case of HTTP data traffic, the payloads of response packets are divided into HTTP response headers and bodies. The HTTP response headers are stored in an HTTP response header buffer, and the HTTP response bodies are stored in an HTTP response body buffer. The downstream content buffers may include the HTTP response header buffer and the HTTP response body buffer. Referring to the HTTP flow of FIG. 4(b), it can be seen that the HTTP response headers and bodies are separately stored in the downstream content buffers. That is, the flow data used in the present invention includes a flow data map, request packets, and the headers and bodies of the response packets corresponding to the request packets.
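A minimal sketch of splitting a response payload into the parts destined for the HTTP response header buffer and the HTTP response body buffer. It assumes the payload has already been reassembled in order and relies on the HTTP/1.x rule that the header section ends with an empty line (CRLF CRLF); split_http_response is a hypothetical name.

```python
from typing import Tuple


def split_http_response(payload: bytes) -> Tuple[bytes, bytes]:
    """Split a reassembled HTTP/1.x response payload into (header, body)."""
    sep = b"\r\n\r\n"
    idx = payload.find(sep)
    if idx == -1:                    # header section not complete yet: treat all as header
        return payload, b""
    return payload[: idx + len(sep)], payload[idx + len(sep):]
```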

(64) The flow storage unit 230 of the flow generation unit 120 may store temporarily stored flow data on the hard disks, and packet headers may be stored in the order in which the packets of each flow are received. The flow data stored on the hard disks is shown at the lower end of FIG. 4.

(65) In this case, the flow storage unit 230 stores the address of the internal data of a body corresponding to a first flow in a flow data map inside a second flow when the internal data of the body corresponding to the first flow is the same as the internal data of a body corresponding to the second flow. This is intended to increase the efficiency of storage space. To prevent the waste of storage space that occurs when all redundant data is stored, when redundant data is present in the internal body data of the response packets inside flows, the redundant data is eliminated from one flow, and the address at which the same data is stored in another flow is recorded in the flow data map of the flow from which the data was eliminated.

(66) Referring to FIG. 4, this is described in greater detail.

(67) In the case of the HTTP flow and the redundant data of the HTTP flow shown in FIG. 4, the body parts of the response packets are the same. In this case, if the data shared by flow (b) and flow (c) were stored in both flows, storage space would be wasted. Accordingly, data indicating the location of the redundant data is stored in a flow data map that constitutes part of the flow data. FIG. 4 shows that the redundant data is stored in flow (b) but not in flow (c). In this case, data indicating the location at which the redundant data inside flow (b) has been stored may be stored in the flow data map of flow (c).
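A sketch of the deduplication shown in FIG. 4, assuming that identical bodies are detected by comparing SHA-256 digests (the description does not say how redundancy is detected) and that a byte offset serves as the address recorded in the flow data map. DedupFlowStore and its fields are hypothetical.

```python
import hashlib
from typing import Dict


class DedupFlowStore:
    """Hypothetical store that writes each distinct response body only once."""

    def __init__(self) -> None:
        self.disk = bytearray()                       # stands in for the hard disks
        self.body_address: Dict[bytes, int] = {}      # body digest -> disk address
        self.flow_data_maps: Dict[str, dict] = {}     # flow id -> flow data map

    def store_body(self, flow_id: str, body: bytes) -> None:
        digest = hashlib.sha256(body).digest()        # assumed redundancy test
        addr = self.body_address.get(digest)
        stored_here = addr is None
        if stored_here:                               # first copy: write the body itself
            addr = len(self.disk)
            self.disk += body
            self.body_address[digest] = addr
        # Either way, the flow data map records where the body lives.
        self.flow_data_maps[flow_id] = {"addr": addr, "len": len(body),
                                        "stored_here": stored_here}

    def read_body(self, flow_id: str) -> bytes:
        m = self.flow_data_maps[flow_id]
        return bytes(self.disk[m["addr"]:m["addr"] + m["len"]])


store = DedupFlowStore()
page = b"<html>same body</html>"
store.store_body("b", page)   # flow (b): body written to disk
store.store_body("c", page)   # flow (c): only the address is kept in its flow data map
```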

(68) In the case of HTTP data traffic, the response packets returned for identical request packets are frequently the same, and thus the waste of storage space can be significantly reduced when the method shown in FIG. 4 is used.

(69) FIG. 5 is a block diagram showing an apparatus for searching for data traffic on a flow basis according to an embodiment of the present invention.

(70) Referring to FIG. 5, the apparatus for searching for data traffic on a flow basis according to the present embodiment includes a flow storage unit 510, a metadata storage unit 520, and a search unit 530.

(71) The flow storage unit 510 stores flows generated by arranging packets using information about each of the packets corresponding to data traffic and a hash function.

(72) In this case, the stored flows may be flows generated by the apparatus for storing data traffic on a flow basis, which is shown in FIG. 1.

(73) In this case, the flows may be stored on the hard disks.

(74) The metadata storage unit 520 stores metadata and index data corresponding to each of the flows.

(75) In this case, the stored metadata and index data may be metadata and index data generated by the apparatus for storing data traffic on a flow basis, which is shown in FIG. 1.

(76) In this case, the metadata and the index data may be stored on an SSD in order to improve search speed.

(77) In this case, the metadata may include the IP address of a sender, the IP address of a recipient, the port address of the sender, the port address of the recipient, the internal address of a hard disk on which the flow has been stored, and the start time and end time of the flow, which correspond to the flow.

(78) In this case, the index data may include the IP address of the sender, the IP address of the recipient, the port address of the sender, and the port address of the recipient.

(79) The search unit 530 searches for a flow stored in the flow storage unit 510 based on information about the flow, the metadata, and the index data.

(80) In this case, the search unit 530 may search for the metadata corresponding to a specific search target flow using information about the flow, including any one of the IP address of the recipient, the IP address of the sender, the port address of the recipient, and the port address of the sender, together with the index data, which includes the IP address of the recipient, the IP address of the sender, the port address of the recipient, and the port address of the sender.

(81) In this case, the search unit 530 may extract, from the metadata corresponding to a specific flow, the address of the hard disk at which the flow has been stored, and may then retrieve the flow from that location on the hard disk.

(82) In this case, the search unit 530 may include a Bloom filter including the IP address of the sender, the IP address of the recipient, the port address of the sender, and the port address of the recipient, which correspond to the flow, in order to improve search speed.

(83) In this case, a Bloom filter is a filter that can determine with certainty that information is not present. It cannot determine with certainty that information is present; it can only indicate that the information may be present. A detailed description of the Bloom filter is omitted.
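For reference, a minimal Bloom filter sketch. The description omits the details, so the bit-array size, the number of hash functions, and the SHA-256-based bit positions below are assumptions, as is keeping one filter per field (here, sender IP addresses) in line with paragraph (91).

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter: "absent" answers are certain, "present" answers may be false."""

    def __init__(self, num_bits: int, num_hashes: int) -> None:
        self.m = num_bits
        self.k = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item: str):
        # k bit positions from SHA-256 of "i:item" (an assumed hashing scheme).
        for i in range(self.k):
            d = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(d[:8], "big") % self.m

    def add(self, item: str) -> None:
        for p in self._positions(item):
            self.bits[p // 8] |= 1 << (p % 8)

    def might_contain(self, item: str) -> bool:
        # False: definitely absent. True: possibly present (false positives allowed).
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(item))


senders = BloomFilter(num_bits=1 << 16, num_hashes=3)
senders.add("10.0.0.1")
```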

(84) In this case, the search unit 530 may determine whether a search target flow is present using the Bloom filter. If the flow is not present, the search may be terminated. If the Bloom filter indicates that the flow is present, this result is not necessarily accurate, and thus whether the flow is actually present is determined using the index data. If the flow is present, the address of the hard disk where the flow has been stored is extracted from the metadata corresponding to the flow, and then the flow may be retrieved.
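The search order can be sketched as follows, assuming the Bloom filter test is passed in as a function, the index data is a sorted array of (sender IP address, metadata slot) pairs searched with Python's bisect module, and each metadata record holds the hard-disk address under a hypothetical disk_addr key. search_flow is likewise a hypothetical name.

```python
import bisect
from typing import Callable, List, Tuple


def search_flow(
    sender_ip: str,
    maybe_present: Callable[[str], bool],    # Bloom filter membership test (in memory)
    sorted_index: List[Tuple[str, int]],     # (sender IP, metadata slot), sorted; on the SSD
    metadata: List[dict],                    # flow metadata records; on the SSD
) -> List[int]:
    """Return hard-disk addresses of flows whose sender IP matches, or [] if none."""
    if not maybe_present(sender_ip):         # definitely absent: terminate the search
        return []
    # A positive Bloom result may be false, so confirm it with the sorted index.
    i = bisect.bisect_left(sorted_index, (sender_ip,))
    addresses = []
    while i < len(sorted_index) and sorted_index[i][0] == sender_ip:
        slot = sorted_index[i][1]
        addresses.append(metadata[slot]["disk_addr"])   # where the flow lives on disk
        i += 1
    return addresses


index = sorted([("10.0.0.1", 0), ("10.0.0.3", 2), ("10.0.0.2", 1)])
meta = [{"disk_addr": 100}, {"disk_addr": 200}, {"disk_addr": 300}]
```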

(85) That is, since the Bloom filter can rapidly determine that a flow is not present, the present invention can effectively increase search speed.

(86) FIG. 6 is a diagram showing a search for a flow using an apparatus for searching for data traffic on a flow basis according to an embodiment of the present invention.

(87) Referring to FIG. 6, a Bloom filter, sorted arrays in which index data is arranged, flow metadata and flow data are shown.

(88) In this case, the Bloom filter may be stored in memory and thus it can be rapidly determined whether a flow is not present.

(89) In this case, the index data and the flow metadata may be stored in the SSD.

(90) In this case, the flow data may be stored on the hard disks.

(91) In this case, the Bloom filter holds the IP address of the sender, the IP address of the recipient, the port address of the sender, and the port address of the recipient separately from one another. It may therefore be rapidly determined, based only on the IP address of a sender and the Bloom filter 610, whether a flow is not present.

(92) Referring to FIG. 6, search for a flow according to an embodiment of the present invention is described.

(93) First, the input IP address of a sender is checked against the Bloom filter 610 to determine whether the flow is not present. If it is determined that the flow is not present, the search is terminated.

(94) Furthermore, if the Bloom filter indicates that the flow is present, this result is not definitive, so the index data is used to determine whether the flow is actually present. In the case of FIG. 6, it is determined that flow 3 is present. The address of the hard disk where flow 3 has been stored is then extracted from its metadata, and the packets corresponding to flow 3 may be retrieved using that address.

(95) FIG. 7 is an operation flowchart showing a method of storing data traffic on a flow basis according to an embodiment of the present invention.

(96) Referring to FIG. 7, packets corresponding to data traffic are received and the packets are temporarily stored using queues at step S710.

(97) In this case, a high-speed Network Interface Card (NIC) may receive the packets corresponding to the data traffic.

(98) In this case, the efficiency of communication between threads and of the disk write task can be increased by temporarily storing the received packets in the queues and transferring them together, rather than passing each received packet individually to a CPU for processing.
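The batching idea above can be sketched with Python's standard `queue` module: packets accumulate in a queue and are drained in groups, so one write covers several packets. The batch size of 4 and the `drain_batch` helper are hypothetical. The byte strings stand in for packets received from the NIC.

```python
import queue


def drain_batch(packet_queue: "queue.Queue[bytes]", max_batch: int) -> list[bytes]:
    """Collect up to max_batch queued packets so they can be handed off together."""
    batch: list[bytes] = []
    while len(batch) < max_batch:
        try:
            batch.append(packet_queue.get_nowait())
        except queue.Empty:
            break
    return batch


packet_queue: "queue.Queue[bytes]" = queue.Queue()
for i in range(10):
    packet_queue.put(f"pkt{i}".encode())  # stand-in for packets received from the NIC

writes = []
while True:
    batch = drain_batch(packet_queue, max_batch=4)
    if not batch:
        break
    writes.append(b"".join(batch))  # one write per batch instead of one per packet

print(len(writes))  # 3 (batches of 4, 4 and 2 packets)
```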

(99) In this case, step S710 may be performed by the packet storage unit 110 shown in FIG. 1.

(100) Furthermore, hash values may be generated using packet information and a hash function at step S720.

(101) In this case, the hash function refers to an algorithm that maps data of an arbitrary length to data of a fixed length. The hash function has the characteristic that the hash value varies when the input varies. Accordingly, it is possible to group packets and generate a flow using a hash value generated by applying the hash function to an input value.
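As an illustration of the fixed-length mapping described above, the sketch below hashes a packet's four-tuple (sender IP, recipient IP, sender port, recipient port). The choice of SHA-1 and the `|`-separated key encoding are assumptions; the text does not name a specific hash function.

```python
import hashlib


def flow_hash(src_ip: str, dst_ip: str, src_port: int, dst_port: int) -> str:
    """Map a packet's four-tuple (arbitrary length) to a fixed-length hash value."""
    key = f"{src_ip}|{dst_ip}|{src_port}|{dst_port}".encode()
    return hashlib.sha1(key).hexdigest()  # always 40 hex characters


a = flow_hash("10.0.0.1", "10.0.0.9", 40000, 80)
b = flow_hash("10.0.0.1", "10.0.0.9", 40000, 80)
c = flow_hash("10.0.0.1", "10.0.0.9", 40001, 80)
print(a == b, a == c, len(a))  # True False 40
```

Identical four-tuples always produce the same value, and changing any field (here, the sender port) changes the value. This is the property that lets packets of one flow be grouped by hash value.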

(102) Furthermore, flows are generated by arranging the packets using the hash values and are stored at step S730.

(103) In this case, arranging packets refers to grouping packets that are used when a single task is performed.

(104) In this case, the term flow used herein may refer to a set of packets that are used to perform a single task.

(105) In this case, when generating the flows by arranging the packets, the flow generation unit 120 may generate the flows by arranging the packets using information about each of the packets and hash values generated using a hash function.

(106) In this case, a method of arranging packets is not limited to a specific method. When hash values generated from respective packets are the same, a single flow may be generated by arranging packets having the same hash value.
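One possible way of arranging packets with equal hash values into a single flow is a dictionary keyed by hash value. This is a minimal sketch: the `Packet` type is hypothetical, Python's built-in `hash` of the four-tuple stands in for the hash function, and hash collisions between distinct four-tuples are ignored for brevity.

```python
from collections import defaultdict
from typing import NamedTuple


class Packet(NamedTuple):
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    payload: bytes


def group_into_flows(packets):
    """Group packets whose four-tuple hash values are equal into one flow."""
    flows = defaultdict(list)
    for pkt in packets:
        key = hash((pkt.src_ip, pkt.dst_ip, pkt.src_port, pkt.dst_port))
        flows[key].append(pkt)  # packets with the same hash value join the same flow
    return flows


packets = [
    Packet("10.0.0.1", "10.0.0.9", 40000, 80, b"GET"),
    Packet("10.0.0.2", "10.0.0.9", 40001, 80, b"GET"),
    Packet("10.0.0.1", "10.0.0.9", 40000, 80, b"more"),
]
flows = group_into_flows(packets)
print(len(flows))                                # 2
print(sorted(len(f) for f in flows.values()))    # [1, 2]
```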

(107) In this case, the information about each of the packets may be information including the IP address of a sender, the IP address of a recipient, the port address of the sender, and the port address of the recipient, which correspond to the packet.

(108) In this case, the flow generation unit 120 may temporarily store the generated flows in flow buffers. When the size of the data stored in the flow buffers exceeds a specific value or a flow is terminated, the flows temporarily stored in the flow buffers may be transmitted to and stored on hard disks. There is no limitation regarding the specific value. When more frequent movement from the flow buffers to the hard disks is required, the specific value may be adjusted to a smaller value. In contrast, when the efficiency of transmission to the hard disks is to be increased, the specific value may be adjusted to a larger value so that a larger amount of data is transmitted at one time.
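The flush rule described above (write to disk when the buffered size exceeds a specific value, or when the flow terminates) can be sketched as below. The `FlowBuffer` class, the 8-byte threshold, and the list standing in for a hard disk are all hypothetical.

```python
class FlowBuffer:
    """Buffers one flow's bytes and flushes them to disk in large writes (hypothetical)."""

    def __init__(self, flush_threshold: int, disk: list) -> None:
        self.flush_threshold = flush_threshold  # the "specific value", in bytes
        self.disk = disk                        # stand-in for the hard disk
        self.pending = bytearray()

    def append(self, data: bytes) -> None:
        self.pending += data
        if len(self.pending) > self.flush_threshold:  # size exceeds the specific value
            self.flush()

    def terminate(self) -> None:
        # A terminated flow is flushed regardless of how much data is pending.
        self.flush()

    def flush(self) -> None:
        if self.pending:
            self.disk.append(bytes(self.pending))
            self.pending.clear()


disk: list = []
buf = FlowBuffer(flush_threshold=8, disk=disk)
for chunk in (b"abcd", b"efgh", b"ij", b"kl"):
    buf.append(chunk)
buf.terminate()
print(disk)  # [b'abcdefghij', b'kl']
```

Raising `flush_threshold` produces fewer, larger writes. Lowering it moves data to disk more often, which mirrors the trade-off described in the paragraph.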

(109) In this case, the flow buffers in which the flows are temporarily stored include a packet header buffer configured to store the headers of all the packets of a corresponding flow that is temporarily stored, an upstream content buffer configured to store the payload of a request packet, and downstream content buffers configured to store the payload of a response packet corresponding to the request packet. The downstream content buffers include an HTTP response header buffer configured to store an HTTP response header and an HTTP response body buffer configured to store an HTTP response body, in the case of HTTP data traffic. Although a conventional buffer stores both a request packet and a response packet in a single buffer without dividing each packet into a header and a payload, the flow buffers used in the present invention include the header buffer, the upstream content buffer and the downstream content buffers in order to separately store the header and the payload of the packet.
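The buffer layout described above (a packet header buffer, an upstream content buffer, and downstream content buffers split into an HTTP response header buffer and an HTTP response body buffer) might be modeled as follows. The class and function names are hypothetical. As a simplification, the sketch assumes the blank line (`\r\n\r\n`) that ends the HTTP response header is not split across two packets.

```python
from dataclasses import dataclass, field


@dataclass
class DownstreamContentBuffers:
    http_response_header: bytearray = field(default_factory=bytearray)
    http_response_body: bytearray = field(default_factory=bytearray)
    header_complete: bool = False  # set once the blank line ending the HTTP header is seen


@dataclass
class FlowBuffers:
    packet_headers: list = field(default_factory=list)              # headers of all packets
    upstream_content: bytearray = field(default_factory=bytearray)  # request payloads
    downstream: DownstreamContentBuffers = field(default_factory=DownstreamContentBuffers)


def add_request_packet(buf: FlowBuffers, packet_header: bytes, payload: bytes) -> None:
    buf.packet_headers.append(packet_header)
    buf.upstream_content += payload


def add_response_packet(buf: FlowBuffers, packet_header: bytes, payload: bytes) -> None:
    buf.packet_headers.append(packet_header)
    ds = buf.downstream
    if ds.header_complete:
        ds.http_response_body += payload  # later packets carry only body bytes
        return
    # Simplification: assumes the "\r\n\r\n" terminator is not split across packets.
    head, sep, body = payload.partition(b"\r\n\r\n")
    ds.http_response_header += head
    if sep:
        ds.header_complete = True
        ds.http_response_body += body


buf = FlowBuffers()
add_request_packet(buf, b"H1", b"GET / HTTP/1.1\r\nHost: a\r\n\r\n")
add_response_packet(buf, b"H2", b"HTTP/1.1 200 OK\r\nContent-Length: 10\r\n\r\nhello")
add_response_packet(buf, b"H3", b"world")
print(bytes(buf.downstream.http_response_body))  # b'helloworld'
```

Keeping response bodies in their own buffer is what makes the body-level redundancy check in the next paragraph straightforward.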

(110) In this case, the flow generation unit 120 stores the address of the internal data of a body corresponding to a first flow in a flow data map inside a second flow when the internal data of the body corresponding to the first flow is the same as the internal data of a body corresponding to the second flow. This is intended to increase the efficiency of storage space. Storing all redundant data would waste storage space. Therefore, when redundant data is present in the internal body data of the response packets inside flows, the redundant data is eliminated from one flow. The address of the same data already stored in another flow is then stored in the flow data map of the flow from which the redundant data was eliminated. This has been described in detail with reference to FIG. 4.
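The redundancy elimination described above can be sketched as a content-addressed store. The first flow to carry a given body stores it; a later flow with an identical body records only the stored copy's address in its flow data map. The `DedupStore` class, the list used as storage, and the use of SHA-256 digests to detect identical bodies are assumptions. A production system would also compare the bytes on a digest match.

```python
import hashlib


class DedupStore:
    """Stores each distinct response body once; duplicates get an address in a flow data map."""

    def __init__(self) -> None:
        self.storage: list[bytes] = []       # stand-in for body data written to disk
        self.by_digest: dict[str, int] = {}  # content digest -> storage address
        self.flow_data_maps: dict[str, dict[int, int]] = {}  # flow -> {body index: address}

    def store_body(self, flow_id: str, body_index: int, body: bytes) -> int:
        digest = hashlib.sha256(body).hexdigest()
        address = self.by_digest.get(digest)
        if address is not None:
            # Redundant body: record only the address of the copy stored by an earlier flow.
            self.flow_data_maps.setdefault(flow_id, {})[body_index] = address
            return address
        # First occurrence: write the data and remember where it went.
        address = len(self.storage)
        self.storage.append(body)
        self.by_digest[digest] = address
        return address


store = DedupStore()
store.store_body("flow1", 0, b"<html>logo</html>")
store.store_body("flow2", 0, b"<html>logo</html>")  # duplicate of flow1's body
store.store_body("flow2", 1, b"other")
print(len(store.storage), store.flow_data_maps)  # 2 {'flow2': {0: 0}}
```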

(111) Furthermore, metadata and index data corresponding to each flow are generated at step S740.

(112) In this case, the metadata may include the IP address of a sender, the IP address of a recipient, the port address of the sender, the port address of the recipient, the internal address of a hard disk on which the flow has been stored, and the start time and end time of the flow, which correspond to the flow.

(113) In this case, the index data may include the IP address of the sender, the IP address of the recipient, the port address of the sender, and the port address of the recipient.

(114) In this case, the index data may include, among the data included in the metadata, the IP address of the sender, the IP address of the recipient, the port address of the sender, and the port address of the recipient. Because the metadata is large, using it directly to search for a flow may reduce search speed. This reduction can be prevented by first determining the search target flow using the index data, which stores only part of the metadata, and then fetching from the metadata the address of the hard disk at which the flow has been stored.
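The two-step lookup described above (search the small index data first, then read the disk address from the larger metadata) can be sketched with a sorted array and binary search via Python's `bisect` module. The sorted-array arrangement follows FIG. 6. The field names, the string encoding of IP addresses, and the use of a list position as the pointer from index to metadata are assumptions.

```python
import bisect
from dataclasses import dataclass


@dataclass
class FlowMetadata:
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    disk_address: int  # internal address of the flow on the hard disk
    start_time: float
    end_time: float


def build_index(metadata: list) -> list:
    """Index data: the four-tuple plus the position of the full metadata record, sorted."""
    return sorted(
        (m.src_ip, m.dst_ip, m.src_port, m.dst_port, i) for i, m in enumerate(metadata)
    )


def lookup(index, metadata, src_ip, dst_ip, src_port, dst_port):
    """Binary-search the small index, then fetch disk addresses from the metadata."""
    key = (src_ip, dst_ip, src_port, dst_port)
    pos = bisect.bisect_left(index, key)  # a 4-tuple sorts before any 5-tuple it prefixes
    addresses = []
    while pos < len(index) and index[pos][:4] == key:
        addresses.append(metadata[index[pos][4]].disk_address)
        pos += 1
    return addresses


metadata = [
    FlowMetadata("10.0.0.2", "10.0.0.9", 40001, 80, 8192, 1.0, 2.0),
    FlowMetadata("10.0.0.1", "10.0.0.9", 40000, 80, 4096, 0.0, 1.5),
]
index = build_index(metadata)
print(lookup(index, metadata, "10.0.0.1", "10.0.0.9", 40000, 80))   # [4096]
print(lookup(index, metadata, "10.0.0.1", "10.0.0.9", 40000, 443))  # []
```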

(115) In this case, processing speed can be improved by storing the index data and the metadata on an SSD, instead of the hard disks.

(116) According to at least one embodiment of the present invention, data traffic can be stored and searched for on the basis of a flow unit composed of packets, thereby further increasing storage and search speeds.

(117) According to at least one embodiment of the present invention, data traffic can be searched for based on two-step flow-based index data, thereby increasing search speed.

(118) According to at least one embodiment of the present invention, data traffic can be stored on a flow unit basis rather than a packet unit basis, thereby increasing the efficiency of communication and of disk write tasks.

(119) An embodiment of the present invention may be implemented in a computer system, e.g., as a computer readable medium. A computer system may include one or more of a processor, a memory, a user input device, a user output device, and a storage, each of which communicates through a bus. The computer system may also include a network interface that is coupled to a network. The processor may be a central processing unit (CPU) or a semiconductor device that executes processing instructions stored in the memory and/or the storage. The memory and the storage may include various forms of volatile or non-volatile storage media. For example, the memory may include a read-only memory (ROM) and a random access memory (RAM).

(120) Accordingly, an embodiment of the invention may be implemented as a computer implemented method or as a non-transitory computer readable medium with computer executable instructions stored thereon. In an embodiment, when executed by the processor, the computer readable instructions may perform a method according to at least one aspect of the invention.

(121) The apparatus and method for storing data traffic on a flow basis and the apparatus for searching for data traffic on a flow basis according to the present invention are not limited to the configurations and methods of the above-described embodiments, but some or all of the embodiments may be selectively combined such that the embodiments can be modified in various manners.

CPC classification

H04L69/22, H04L49/901, H04L49/9042

International classification

H04L12/879