Identification of a protocol of a data stream
11265372 · 2022-03-01
Assignee
Inventors
Cpc classification
H04L67/02
ELECTRICITY
H04L67/1091
ELECTRICITY
International classification
H04L41/0604
ELECTRICITY
Abstract
The invention concerns a method for identifying a protocol of a data stream exchanged between two entities of a telecommunication network, the method comprising the following steps: on receiving data of the data stream, grammatical parsing of said data stream in order to identify a protocol of the data stream; in the event of failure to identify the protocol of the data stream by grammatical parsing, consulting a signature engine mapping protocols to corresponding signatures, and sequentially applying the signatures to the data stream in order to identify a protocol of the data stream.
Claims
1. A method for identifying a protocol of a data stream exchanged between two entities of a telecommunication network, the processing method comprising the following steps: receiving data of the data stream, parsing the data stream in order to identify a protocol of the data stream; in the event of failure to identify the protocol of the data stream by means of the parsing, consulting a signature engine that matches protocols with corresponding signatures, and sequentially applying the signatures to the data stream in order to identify a protocol of the data stream.
2. The method of claim 1, further comprising, in the event of failure to identify the protocol of the data stream by consulting the signature engine, applying a statistical protocol recognition method in order to identify the protocol of the data stream.
3. The method of claim 1, wherein the identified protocol is an application-level protocol.
4. The method of claim 1, wherein, in the event of success in identifying the protocol of the data stream by means of the parsing, the method further comprises a step of identifying protocol data by applying a one-pass algorithm to context elements of the data stream depending on the identified protocol.
5. The method of claim 4, wherein, in the event of failure to identify protocol data by applying the one-pass algorithm, the method further comprises consulting a signature engine that matches protocol data with corresponding signatures, and sequentially applying the signatures to the data stream in order to identify protocol data of the data stream.
6. The method of claim 1, further comprising a step of processing the data stream on the basis of the identified protocol of the data stream.
7. The method of claim 6, wherein the processing of the data stream comprises at least one of the steps from among: applying a service quality policy depending on the identified protocol; or authorizing or prohibiting the data stream on the basis of the identified protocol.
8. A non-transitory computer-readable medium comprising a computer program product stored thereon and including instructions for implementing the method of claim 1 when this computer program product is executed by a processor.
9. A device for identifying a protocol of a data stream exchanged between two entities of a telecommunication network, the device comprising: an interface configured to receive data of the data stream; a processor configured to: parse the data stream in order to identify a protocol of the data stream; in the event of failure to identify the protocol of the data stream by means of the parsing, consult a signature engine that matches protocols with corresponding signatures, and sequentially apply the signatures to the data stream in order to identify a protocol of the data stream.
Description
(1) Other features and advantages of the invention will become apparent on examining the detailed description below and the appended drawings in which:
(2)
(3)
(4)
(5) The invention can be implemented in a protocol identification device such as the analyzer 300 illustrated in
(6)
(7) In a step 200, one or more packets of a stream are received by the identification device, for example after interception of the packets by the analyzer 300 on the communication link 200.
(8) In a step 201, a received data packet can be identified in order to be associated with an existing stream, or in order to create a new entry in a table listing the current data streams. For example, an IP address (and optionally a port number) of a source entity and an IP address (and optionally a port number) of a recipient entity can be taken into account to identify the stream corresponding to the packet. Such a technique is well known and will not be explained in greater detail.
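As an illustration of the stream table mentioned above, the following is a minimal sketch, not the implementation of the invention. The names `FlowTable`, `flow_key` and `lookup_or_create` are hypothetical, and treating both directions of an exchange as the same stream is an assumption made here.

```python
from typing import Dict, Tuple

Endpoint = Tuple[str, int]          # (IP address, port number; 0 if unused)
FlowKey = Tuple[Endpoint, Endpoint]


def flow_key(src_ip: str, src_port: int, dst_ip: str, dst_port: int) -> FlowKey:
    """Build a direction-independent key (assumption: both directions of an
    exchange belong to the same stream)."""
    a, b = (src_ip, src_port), (dst_ip, dst_port)
    return (a, b) if a <= b else (b, a)


class FlowTable:
    """Hypothetical table listing the current data streams."""

    def __init__(self) -> None:
        self._flows: Dict[FlowKey, dict] = {}

    def lookup_or_create(self, src_ip: str, src_port: int,
                         dst_ip: str, dst_port: int) -> dict:
        """Return the entry of the stream the packet belongs to, creating a
        new entry if the stream has not been seen before."""
        key = flow_key(src_ip, src_port, dst_ip, dst_port)
        entry = self._flows.setdefault(key, {"packets": 0, "protocol": None})
        entry["packets"] += 1
        return entry

    def __len__(self) -> int:
        return len(self._flows)
```

A packet and its reply then map to the same entry, while a packet from a different source port creates a new one.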
(9) The source or recipient entity may be either a client or a server. The client may be a laptop or desktop computer, a touchscreen tablet, a smartphone or any other electronic device comprising an interface that makes it possible to communicate in the network 100 or 110, for example the Internet. According to the invention, the two communicating entities may be in two separate networks, as illustrated in
(10) The low-layer protocols of the data stream can be determined in step 201 by explicit recognition. As mentioned above, explicit recognition requires little computing power in that the protocol of a layer of a given level may be indicated explicitly by the layer of the level immediately below it.
(11) Thus, it can for example be determined that the IPv4 or IPv6 protocol is used on the basis of Ethernet layer data. Likewise, the IP layer indicates whether the UDP or TCP protocol is used.
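Explicit recognition of this kind can be sketched as two table lookups. The sketch below assumes an untagged Ethernet II frame (no VLAN tag) and ignores IPv6 extension headers; the function names are hypothetical. The numeric values are the standard EtherType and IP protocol numbers.

```python
import struct
from typing import Optional

ETHERTYPES = {0x0800: "IPv4", 0x86DD: "IPv6"}   # EtherType -> network protocol
IP_PROTOCOLS = {6: "TCP", 17: "UDP"}            # IP protocol number -> transport

ETH_HEADER_LEN = 14  # destination MAC (6) + source MAC (6) + EtherType (2)


def network_protocol(frame: bytes) -> Optional[str]:
    """Read the EtherType field to learn which network protocol follows."""
    if len(frame) < ETH_HEADER_LEN:
        return None
    (ethertype,) = struct.unpack("!H", frame[12:14])
    return ETHERTYPES.get(ethertype)


def transport_protocol(frame: bytes) -> Optional[str]:
    """Read the IP header field that names the transport protocol."""
    net = network_protocol(frame)
    if net == "IPv4" and len(frame) >= ETH_HEADER_LEN + 20:
        return IP_PROTOCOLS.get(frame[ETH_HEADER_LEN + 9])  # Protocol field
    if net == "IPv6" and len(frame) >= ETH_HEADER_LEN + 40:
        return IP_PROTOCOLS.get(frame[ETH_HEADER_LEN + 6])  # Next Header field
    return None
```

Each layer is resolved by reading one fixed-offset field, which is why this kind of recognition requires little computing power.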
(12) From step 202 onwards, the aim of the method according to the invention is to identify a protocol that is not explicitly indicated by the lower-level layers. Such identification is therefore implicit. For example, the recognition of a protocol of layers 5 to 7 of the OSI model, and in particular of layer 7 (application), is considered.
(13) In a step 202, the identification device parses the data of the data stream, which are contained in the packet or packets of the data stream, in order to identify a protocol of the data stream. Indeed, certain application-level protocols have a grammar that is readily identifiable with little computing power. This is the case, for example, for the SMTP and HTTP protocols. Such protocols have context elements that are useful for recognizing them. For example, they both use a “handshake” process to set up the stream. Other protocols, such as SSL or SIP, can also be identified by recognizing their grammar. It should be noted that, statistically, 90% of the application protocols of the streams to be classified can be recognized in step 202. Prioritizing this recognition method thus makes it possible to recognize a large number of protocols with little computing power.
(14) In step 203, it is checked whether the protocol of the data stream has been successfully identified by means of the parsing.
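By way of illustration only, recognition by parsing can be sketched as a set of small per-protocol grammar checks applied to the first line of the payload. The functions below are hypothetical and deliberately simplistic: they only check the shape of an HTTP/1.x request line and of an SMTP HELO/EHLO command, whereas a real parser would follow the full grammar and the state of the exchange.

```python
from typing import Callable, List, Optional, Tuple


def first_line(payload: bytes) -> Optional[bytes]:
    """Return the first CRLF-terminated line, or None if there is none."""
    line, sep, _ = payload.partition(b"\r\n")
    return line if sep else None


def looks_like_http_request(payload: bytes) -> bool:
    """Check the shape 'METHOD target HTTP/1.x' of an HTTP request line."""
    line = first_line(payload)
    if line is None:
        return False
    parts = line.split(b" ")
    if len(parts) != 3:
        return False
    method, target, version = parts
    return (method.isalpha() and method.isupper() and bool(target)
            and version in (b"HTTP/1.0", b"HTTP/1.1"))


def looks_like_smtp_hello(payload: bytes) -> bool:
    """Check the shape 'HELO domain' or 'EHLO domain' of an SMTP greeting."""
    line = first_line(payload)
    if line is None:
        return False
    parts = line.split(b" ")
    return (len(parts) == 2 and parts[0].upper() in (b"HELO", b"EHLO")
            and bool(parts[1]))


# Hypothetical parser table: (protocol name, grammar check).
PARSERS: List[Tuple[str, Callable[[bytes], bool]]] = [
    ("HTTP", looks_like_http_request),
    ("SMTP", looks_like_smtp_hello),
]


def identify_by_parsing(payload: bytes) -> Optional[str]:
    """Return the first protocol whose grammar check accepts the payload,
    or None if parsing fails to identify the protocol."""
    for protocol, accepts in PARSERS:
        if accepts(payload):
            return protocol
    return None
```

A return value of `None` corresponds to the failure case, in which the method falls back on the signature engine.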
(15) In the event of success in identifying the protocol of the data stream by means of the parsing, the method may further comprise a step 204 of identifying protocol data by applying a one-pass (or “single-pass”) algorithm to context elements of the data stream depending on the identified protocol. The one-pass algorithm may depend on the identified protocol.
(16) The identification of the protocol data can be considered to be the identification of an application or sub-application of a layer higher than the layer of the protocol identified in step 203. For example, if the protocol is identified as being HTTP, the sub-application of a higher layer, or protocol data, may be Facebook™ data, for example.
(17) The application of the one-pass algorithm may consist in inputting context elements of the stream (for example, for HTTP, the context elements may be elements such as the URL, User Agent, etc.) into a rules engine. “Context element of the stream” refers to any header or payload element of the data stream. The use of a one-pass algorithm is not very costly in terms of computational resources, and the processing time is fixed and does not depend on the number of inputs.
(18) In response to the input of the context elements, the rules engine can return a set of rules that can be tested on the data of the protocol identified in step 202 in order to identify the protocol data. For example, having identified the HTTP protocol in step 202, the protocol data can be identified as being Facebook™ data.
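The interaction with the rules engine can be pictured with the following minimal sketch. It assumes, as one possible reading of the text, that the rules are indexed by (protocol, context element name, value), so that each context element costs one hash lookup whatever the number of stored rules. `RULE_INDEX`, the host name and the label `ExampleApp` are hypothetical.

```python
from typing import Callable, Dict, List, Optional, Tuple

Context = Dict[str, str]                       # context elements of the stream
Rule = Tuple[str, Callable[[Context], bool]]   # (protocol data label, test)

# Hypothetical rule index: (protocol, element name, element value) -> rules.
RULE_INDEX: Dict[Tuple[str, str, str], List[Rule]] = {
    ("HTTP", "host", "www.example.com"): [
        ("ExampleApp", lambda ctx: ctx.get("url", "").startswith("/app/")),
    ],
}


def identify_protocol_data(protocol: str, context: Context) -> Optional[str]:
    """Make a single pass over the context elements, fetch the candidate
    rules for each one, and return the label of the first rule that holds."""
    for name, value in context.items():
        for label, holds in RULE_INDEX.get((protocol, name, value), []):
            if holds(context):
                return label
    return None
```

For an HTTP stream whose context elements are `{"host": "www.example.com", "url": "/app/feed"}`, the sketch returns `"ExampleApp"`; for any other host it returns `None`, which corresponds to the failure case handled by the signature engine.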
(19) In a step 212, it is checked whether the protocol data have been identified in step 204 by means of the one-pass algorithm. In the event of success, the method continues with step 205. In the event of failure, the method moves on to step 206, which is described below.
(20) Steps 204 and 212 are optional, and the method can move directly from step 203 to step 205 in the event of positive identification in step 203.
(21) Once the protocol and, optionally, the protocol data have been identified, the method may comprise applying a step 205 of processing the data stream on the basis of the identified protocol and, optionally, on the basis of the application data. The processing of the stream may, for example, consist in applying a service quality policy depending on the identified protocol or in authorizing or prohibiting the data stream on the basis of the identified protocol, or it may more generally consist in classifying the stream on the basis of the identified protocol. The classification may be transmitted to a processing device external to the protocol identification device.
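The processing step can be sketched as a policy lookup keyed on the identified protocol. The policy table, the service quality class names and the choice of blocking by default are assumptions made for illustration; they are not prescribed by the method.

```python
from enum import Enum
from typing import Dict, NamedTuple, Optional


class Action(Enum):
    ALLOW = "allow"
    BLOCK = "block"


class Decision(NamedTuple):
    action: Action
    qos_class: Optional[str]   # hypothetical service quality class name


# Hypothetical policy table: identified protocol -> decision.
POLICY: Dict[str, Decision] = {
    "HTTP": Decision(Action.ALLOW, "best-effort"),
    "SIP": Decision(Action.ALLOW, "low-latency"),
    "BitTorrent": Decision(Action.BLOCK, None),
}

# Assumption: streams with no matching policy entry are blocked.
DEFAULT_DECISION = Decision(Action.BLOCK, None)


def process_stream(protocol: Optional[str]) -> Decision:
    """Return the decision to apply to a stream, given its identified
    protocol (None if the protocol could not be identified)."""
    if protocol is None:
        return DEFAULT_DECISION
    return POLICY.get(protocol, DEFAULT_DECISION)
```

The returned decision could equally be handed to an external processing device rather than applied locally.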
(22) In the event of failure to identify the protocol of the data stream by means of the parsing in step 202, the method according to the invention comprises a step 206 of consulting a signature engine that matches protocols with corresponding signatures. In a step 207, the signatures are sequentially applied to the data stream in order to identify the application-level protocol of the data stream. Such sequential application is costlier in terms of resources, and is thus advantageously applied only if the parsing in step 202 has failed.
(23) Statistically, such a signature search method makes it possible to identify half of the 10% of application protocols that could not be identified by the parsing method (i.e. 5% of protocols). Although it is costlier in terms of computational resources, the signature search method nonetheless remains reliable.
(24) Steps 206 and 207 can also be applied to the protocol data if identification has failed in step 204. In this case, the protocol data are compared with signatures in order to identify them.
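The sequential application of signatures can be sketched as a loop over a signature table. The table below is hypothetical; its two entries (the SSH identification string and the RFB version message used by VNC) are illustrative signatures chosen here and are not taken from the invention.

```python
import re
from typing import List, Optional, Pattern, Tuple

# Hypothetical signature table: (protocol name, byte pattern).
SIGNATURES: List[Tuple[str, Pattern[bytes]]] = [
    ("SSH", re.compile(rb"^SSH-\d\.\d+-")),
    ("VNC", re.compile(rb"^RFB \d{3}\.\d{3}\n")),
]


def identify_by_signature(payload: bytes) -> Optional[str]:
    """Apply the signatures one after another and return the protocol of
    the first one that matches, or None if none matches."""
    for protocol, pattern in SIGNATURES:
        if pattern.search(payload):
            return protocol
    return None
```

In the worst case every signature is tried, so the cost grows with the size of the table, which is consistent with applying this method only after parsing has failed.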
(25) In a step 208, it is checked whether the protocol of the data stream has been successfully identified by the signature search method.
(26) In the event of success, the method returns to step 205, which has been described above.
(27) In the event of failure, one embodiment of the invention may provide for an additional step 209 of applying a statistical protocol recognition method in order to identify the application protocol of the data stream (or the protocol data). Such a method in particular makes it possible to identify encrypted protocols, such as BitTorrent. Such a method is costly in terms of computing power (sequential search) and is not totally reliable. However, it does make it possible to identify 1 to 2% of the protocols or protocol data that have not been identified by the methods implemented previously.
(28) In a step 210, it is checked whether the protocol of the data stream has been successfully identified by the statistical method.
(29) In the event of success, the method returns to step 205, which has been described above.
(30) In the event of failure (statistically in about 3% of cases), the method ends without being able to identify the application protocol of the data stream. A predefined processing operation can be applied in a step 211 in the event of failure. For example, as a precautionary measure, the data stream can be blocked.
(31) The invention thus provides for the incremental application of protocol recognition methods, from the method that is the most reliable and the least costly in terms of computing power to the method that is the least reliable and the most resource-intensive. It thereby optimizes the search for the application-level protocol.
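The incremental application described above can be sketched as a chain of recognizers ordered from the cheapest and most reliable to the costliest and least reliable. The three recognizers below are hypothetical placeholders (the statistical one is a stub that always fails); only the chaining logic is the point of the sketch.

```python
from typing import Callable, Iterable, Optional, Tuple

Recognizer = Callable[[bytes], Optional[str]]


def identify_incrementally(
    payload: bytes, stages: Iterable[Tuple[str, Recognizer]]
) -> Tuple[Optional[str], Optional[str]]:
    """Try each stage in order and stop at the first success.
    Returns (protocol, stage name), or (None, None) if every stage fails."""
    for stage_name, recognize in stages:
        protocol = recognize(payload)
        if protocol is not None:
            return protocol, stage_name
    return None, None


# Placeholder recognizers (hypothetical), ordered by increasing cost.
def by_parsing(payload: bytes) -> Optional[str]:
    return "HTTP" if payload.startswith(b"GET ") else None


def by_signature(payload: bytes) -> Optional[str]:
    return "SSH" if payload.startswith(b"SSH-") else None


def by_statistics(payload: bytes) -> Optional[str]:
    return None  # stub: a statistical method would go here


STAGES = [
    ("parsing", by_parsing),
    ("signature", by_signature),
    ("statistical", by_statistics),
]
```

Because the loop stops at the first success, the costlier stages only run for the streams that the cheaper stages could not classify.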
(32)
(33) The identification device 301 can be implemented in the analyzer 300, which is located for interception between the networks 100 and 110 in
(34) The identification device comprises a random-access memory 305 and a processor 304, and also a memory 301 for storing instructions that make it possible to implement the steps of the method described above with reference to
(35) The memory 301 may additionally store data used by the processor to implement the method, in particular: the signature engine that matches signatures with corresponding protocols; the sets of rules associated with given protocols, for the recognition of protocol data; rules of statistical protocol recognition methods.
(36) The identification device 301 further includes an input interface 302, which is intended to receive data of data streams conveyed over the communication link 200 or within a given network.
(37) The identification device 301 further comprises an output interface 303, which is capable of providing a protocol identification result, or a command determined on the basis of the identified protocol.
(38) Of course, the present invention is not limited to the embodiment described above by way of example; it extends to other variants.