Method for generating requests for the segmentation of the monitoring of an interconnection network and associated hardware

11258687 · 2022-02-22

Abstract

The invention relates to a method for generating a request, from a formal language instruction defining a set of ports of an interconnection network, said request including an addressing command for each one of the ports defined in the instruction, said method including the following steps: Receiving, by a communication module, a formal language instruction defining a set of ports, Processing, by a processing module, the formal language instruction so as to generate a set of numbers encoded on at least one byte, each number including position bits, each one of the position bits allowing to identify a port and at least one authorization bit, the at least one authorization bit allowing to define access rights on the ports, and Encoding, by an encoding module, the set of numbers so as to generate the request including the addressing command.

Claims

1. A method for monitoring an interconnection network implementing a request generating device, said method comprising: receiving a formal language instruction defining a set of ports corresponding to a restricted perimeter of the interconnection network, generating, from the formal language instruction, a request including an addressing command for each one of the set of ports defined in the formal language instruction, the request encoding a set of numbers encoded on at least one byte, each number of said set of numbers encoded on said at least one byte including position bits, each one of the position bits allowing identification of a port and at least one authorization bit, the at least one authorization bit allowing to define access rights on the set of ports, transmitting the request to a restricted perimeter monitoring module able to receive hardware statuses of the set of ports of the restricted perimeter, requesting hardware status values by the restricted perimeter monitoring module with the set of ports corresponding to the restricted perimeter of the interconnection network, receiving the hardware status values by the restricted perimeter monitoring module, and storing the hardware status values by the restricted perimeter monitoring module.

2. The method for monitoring an interconnection network according to claim 1, further comprising: receiving additional hardware status values from a plurality of additional restricted perimeter monitoring modules, storing the additional hardware status values, and aggregating the additional hardware status values from the plurality of additional restricted perimeter monitoring modules.

3. The method for monitoring an interconnection network according to claim 1, wherein at least one of the hardware status values is selected from: a state of a power supply, board voltage values, internal temperatures, a state of a fan, environmental and operational data of a display, and software versions of a switch.

4. The method for monitoring an interconnection network according to claim 1, wherein said method for monitoring is implemented in parallel by three restricted perimeter monitoring modules.

5. The method for monitoring an interconnection network according to claim 1, wherein the interconnection network includes a plurality of levels and the set of numbers encoded on said at least one byte includes at least one number encoded on said at least one byte for each level of the plurality of levels of the interconnection network.

6. The method for monitoring an interconnection network according to claim 1, wherein the set of numbers encoded on said at least one byte are numbers encoded on eight bytes.

7. The method for monitoring an interconnection network according to claim 1, wherein the set of numbers includes between two and sixteen numbers encoded on said at least one byte.

8. The method for monitoring an interconnection network according to claim 1, wherein the formal language instruction includes a plurality of numbers corresponding to numbers of ports of the interconnection network and symbols.

9. The method for monitoring an interconnection network according to claim 1, wherein a position bit of one of the position bits in a number of said set of numbers encoded on said at least one byte is indicative of a port number of a port of said set of ports, and a value of the position bit is indicative of whether said port is taken into account when establishing the request.

10. The method for monitoring an interconnection network according to claim 1, wherein values of the position bits allow definition of one or more associated ports which will be taken into account when establishing the request and a value of the at least one authorization bit allows definition of whether the one or more associated ports will be a subject of a destination by the addressing command or whether the one or more associated ports will only be traversed.

11. The method for monitoring an interconnection network according to claim 1, wherein the request generating device is configured to generate, from the formal language instruction defining the set of ports of the interconnection network, the request including the addressing command for each one of the set of ports defined in the formal language instruction, said request generating device comprising: a communication module configured to receive the formal language instruction defining the set of ports, a processing module, configured to process the formal language instruction to generate the set of numbers encoded on said at least one byte, and an encoding module, configured to encode the set of numbers encoded on said at least one byte to generate said request.

12. The method for monitoring an interconnection network according to claim 1, wherein the request generating device is configured to receive the formal language instruction from a client device and send the request to a system for monitoring the interconnection network.

Description

(1) Other advantages and characteristics of the invention will appear upon reading the following description given by way of illustrative and non-limiting example, with reference to the appended figures which represent:

(2) FIG. 1, an example of the topology of a cluster associated with a monitoring system according to the invention.

(3) FIG. 2, a schematic representation of a request generation method according to the invention.

(4) FIGS. 3A and 3B, schematically, a sequence of transformations according to the generation method according to the invention for transforming a formal language instruction designating a set of ports, into a request including the addressing command for each port of the set of ports.

(5) FIG. 4, a schematic representation of certain steps implemented according to the invention when processing the formal language instruction.

(6) FIG. 5, a schematic representation of certain steps implemented according to the invention during the monitoring method.

DESCRIPTION OF THE INVENTION

(7) In the following description, by “formal language text” or “formal language”, within the meaning of the invention, is meant textual data structured by a data model that is predefined or organized in a predefined manner. The formal language text is not computer code, but respects a certain form allowing its interpretation by the request generator. The form to be respected may be freely selected during the configuration of the request generator according to the invention.

(8) By “configuration file”, within the meaning of the invention, is meant a file comprising the information required for implementing the method according to the invention. The files are accessible to the various modules of the devices and systems according to the invention.

(9) The expression “access to a configuration file”, within the meaning of the invention, corresponds to the actions required to access a file on a local storage medium, or by downloading it or by direct broadcasting (streaming), from a remote storage, via a web protocol or access to a database. This is usually an access path to the file possibly preceded by an access protocol (such as http://, file://).

(10) The terms or expressions “application”, “software”, “program code”, and “executable code” mean any expression, code or notation, of a set of instructions intended to cause a data processing system to perform a particular function directly or indirectly (e.g. after a conversion operation to another code). Program code examples may include, but are not limited to, a subprogram, a function, an executable application, a source code, an object code, a library and/or any other sequence of instructions designed to be executed on a computer system.

(11) The expression “human-machine interface”, within the meaning of the invention, corresponds to any element allowing a human being to communicate with a computer, in particular and without being exhaustive, a keyboard and means allowing in response to the commands entered on the keyboard to perform displays and optionally to select with the mouse or a touchpad items displayed on the screen.

(12) By “maximum predetermined threshold”, within the meaning of the invention, is meant a maximum value of a hardware status associated with each port, switch or node for proper operation of the hardware. For example, this corresponds to the maximum acceptable temperature limits. These limits can be real or hypothetical and generally correspond to a level beyond which malfunctions can occur, resulting in a shutdown of the hardware, a decrease in the service life of the equipment or, at the very least, decreases in service quality.

(13) The term “malfunction”, within the meaning of the invention, corresponds to the occurrence of a hardware incident or congestion on the interconnection network.

(14) The expression “number encoded on at least one byte” refers to a number represented on 8 bits or on a larger multiple of 8 bits. Such a number, like any sequence of bytes, can encode information.

(15) In the following description, the same references are used to designate the same elements.

(16) Generally, when an interconnection network is set up, a network manager 800 responsible for managing the interconnection network starts a discovery phase at initialization, in which it scans the interconnection network via an administration node 31 in order to discover all the switches and nodes. The network manager then assigns identifiers, for example of the LID or GID type, configures the switches, computes a routing table and configures the ports. At this point, the interconnection network 1 is configured and ready for use. Once the network is configured, the network manager can monitor any changes on the interconnection network (e.g. a device is added or a link is deleted). The nodes are connected to each other by switches, for example hierarchically.

(17) In the example illustrated in FIG. 1, the interconnection network 1 comprises a set of nodes referenced 30, including an administration node 31, as well as switches 21, 22, 23. Each switch here has six bidirectional ports numbered 1 to 6. The nodes 30 are connected to first level switches 21 which are themselves connected to second level switches 22 which are in turn connected to third level switches 23. The switches 20, that is to say 21, 22 and 23, route messages from their sources to their destinations, in particular according to routing tables programmed when initializing the interconnection network. For these purposes and as shown in FIG. 1, a port 10 is generally associated, in each switch 20, with one or more destinations.

(18) The switches 21 are each connected to three different nodes. The switches 21 are also connected to the switches 22. Thus, each node is connected here to each one of the other nodes via at most five switches. By way of illustration, the node 30a is connected to the node 30b via the switches 21a, 22a, 23a, 22b and 21b. More precisely, the output port 1 of the node 30a is connected to the port 2 of the switch 21a, the port 4 of which is connected to the port 1 of the switch 22a, the port 6 of which is connected to the port 4 of the switch 23a, the port 1 of which is connected to the port 4 of the switch 22b, the port 3 of which is connected to the port 4 of the switch 21b, the port 1 of which is connected to the input port of the node 30b. An example of an addressing command according to the direct routes would be in this case a command of the type:
-D 0 1 4 6 1 3 1
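As an illustration only, the command above can be rebuilt from the exit port taken at each hop of the path from node 30a to node 30b. The function name `direct_route_command` is hypothetical; the leading 0 and the `-D` prefix simply mirror the example.

```python
def direct_route_command(exit_ports):
    """Build a direct route addressing command from the exit port of each hop.

    The leading 0 and the "-D" prefix mirror the example command in the text;
    they are not taken from any particular tool's documentation.
    """
    hops = [0, *exit_ports]
    return "-D " + " ".join(str(port) for port in hops)


# Exit ports along 30a -> 21a -> 22a -> 23a -> 22b -> 21b -> 30b:
# port 1 of node 30a, then ports 4, 6, 1, 3 and 1 of the successive switches.
path_30a_to_30b = [1, 4, 6, 1, 3, 1]
print(direct_route_command(path_30a_to_30b))  # -D 0 1 4 6 1 3 1
```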

(19) According to one embodiment a client 600 can access the request generator 300. For example, the addressing command generator 300 may be implemented as a “front end” which can be accessed by clients for generating requests from formal language text entries. In the example shown in FIG. 1, a client 600 accesses the addressing command generator 300. For example, the client may execute a client application that is configured to communicate with the request generator 300. The client application may be a browser or another client-based communication application. In response to the access by the client 600, the request generator 300 can generate a user interface that can be sent to the client 600.

(20) In addition, the interconnection network 1 shown in FIG. 1 is associated with two monitoring modules 700, each of which takes care of a portion of the interconnection network. Each of said portions 11a, 11b of the interconnection network consists of a set 11 of ports.

(21) FIG. 2 is a flowchart illustrating an example of a request generation method according to the invention. This method allows the generation of a request 231 including an addressing command 232 for each one of the ports 10 defined by a formal language instruction 211. The request may be more particularly a computer language request, and preferably it is a request including a plurality of direct route commands 232.

(22) As shown in FIG. 2, the method includes a step 110 for receiving a formal language instruction 211 defining a set 11 of ports.

(23) The formal language instruction 211 defines a set 11 of ports of an interconnection network. This set 11 of ports of an interconnection network may comprise several thousand ports. Preferably, the set of ports defined by the formal language instruction corresponds to a restricted perimeter, or a portion, of the interconnection network. These sets of ports are shown in FIG. 1 by the sets 11a and 11b where each set includes a plurality of computing nodes associated with a plurality of switches 20 including numerous ports. Thus, the computing nodes 30 belonging to two different sets are able to communicate with each other unlike the ports including different partition keys or managed by different network managers.

(24) As mentioned above, this instruction can be received by a communication module 310. The method 100 may begin in a state where a user, using a client device, has accessed the request generation device 300 according to the invention. The device 300 may provide a user interface on the client device 600. As a result, the user is able to enter the formal language instruction in a user interface. The client device sends the formal language instruction 211 to the device 300 for processing and generating requests.

(25) The formal language instruction 211 may include numbers and symbols. The objective is that this instruction can be written quickly by a user while including all the information needed by the request generation device 300 to generate the potentially thousands of direct route addressing commands for reaching the set of desired ports.

(26) By way of example, the formal language may use the following conventions: Operator ‘[’: Beginning of the list of ports to be monitored, Operator ‘]’: End of the list of ports to be monitored, Operator ‘(’: Beginning of the list of ports to be traversed without monitoring, Operator ‘)’: End of the list of ports to be traversed without monitoring, Operator ‘-’: Specifies an interval, and Operator ‘,’: Concatenation.

(27) The interpretation conventions of the symbols presented above are by way of example only and while remaining within the scope of the invention, certain symbols may be modified and symbols having new functions may also be added. The symbols may also correspond to logical operators.

(28) FIG. 3A shows for example a formal language instruction such as:
[1][20][1-38]

(29) Such a formal language instruction was written by the user to describe the following behavior: [1]: The packets exit port #1 while monitoring it (collecting data for that port). They then arrive at a switch 21. [20]: The packets are allowed to traverse only port 20 and to collect data thereon. No packet is allowed to exit through ports other than port 20. The packets allowed to traverse reach a set of switches 22. [1-38]: The packets are allowed to traverse the set of switches 22 through ports 1 to 38 and to collect data thereon. With [1-38], 38 ports are theoretically addressable; the switch may have a smaller number of physically existing ports.

(30) For clarity and conciseness reasons, the virtual port #0 of the implementations of direct route path of the InfiniBand and OmniPath protocols may not appear in the formal language instruction.

(31) FIG. 3B shows another example of a formal language instruction such as:
(1)(1-18)[1-9,13]

(32) Such a formal language instruction was written by the user to describe the following behavior: (1): The packets exit port #1 without monitoring it (no data is collected for this port). They then arrive at a switch 21. (1-18): The packets are allowed to traverse ports 1 to 18 of switch S, with no data being collected thereon. No packet is allowed to exit through ports not listed in this interval. The packets allowed to traverse reach a set of switches 22. [1-9,13]: The packets are allowed to traverse the set of switches 22 through ports 1 to 9 and 13 and to collect data thereon.

(33) The method 100 according to the invention also includes a step 120 for processing the formal language instruction 211 so as to generate a set 221 of numbers 222 encoded on at least one byte. This step can be implemented by a processing module 320.

(34) As shown in FIG. 3A, each one of the numbers 222 includes position bits 223 for identifying ports. The position of a position bit within the number 222 encoded on at least one byte is indicative of a port number, so that a part of the bits advantageously corresponds to the switch port numbers. For example, the position bit 223 pointed at in FIG. 3A corresponds to port #20 in the second number encoded on at least one byte.

(35) With reference to FIGS. 3A and 3B, the leftmost bit in the diagram is bit #63 corresponding to a possible port #63 and the rightmost bit is bit #0 corresponding to an authorization bit. Thus, each number 222 also includes at least one authorization bit 224 for defining access rights on the ports. For example, in FIG. 3A, the authorization bit 224 pointed at is the authorization bit of the first number encoded on eight bytes. In this example, there is an authorization bit for specifying whether the port selected via the position bits is to be traversed and monitored (bit value 1) or only traversed (bit value 0).

(36) The numbers 222 encoded on at least one byte may advantageously be encoded on at least four bytes, preferably they are encoded on eight bytes. Thus, the encoded number 222 may also comprise unused bits 225. Indeed, in FIG. 3A, the number 222 is encoded on eight bytes, namely 64 bits. Since a switch 20 of an interconnection network may comprise fewer than 63 ports, the number 222 is likely not to be fully utilized.
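The bit layout described above can be sketched as follows. This is a minimal illustration, assuming that bit #n stands for port #n (n from 1 to 63) and that bit #0 is the single authorization bit; the function names `encode_level` and `decode_level` are hypothetical, and the exact layout drawn in FIGS. 3A and 3B is not reproduced here.

```python
WIDTH_BITS = 64    # eight bytes, the preferred width in the description
AUTH_BIT = 1 << 0  # bit #0: 1 = traverse and monitor, 0 = traverse only


def encode_level(ports, monitored):
    """Set one position bit per port, plus the authorization bit if monitored."""
    number = AUTH_BIT if monitored else 0
    for port in ports:
        if not 1 <= port < WIDTH_BITS:
            raise ValueError(f"port {port} does not fit in {WIDTH_BITS} bits")
        number |= 1 << port
    return number


def decode_level(number):
    """Recover the port list and the monitored flag from an encoded number."""
    ports = [bit for bit in range(1, WIDTH_BITS) if (number >> bit) & 1]
    return ports, bool(number & AUTH_BIT)


# The three levels of the instruction [1][20][1-38]
numbers = [
    encode_level([1], True),
    encode_level([20], True),
    encode_level(range(1, 39), True),
]
for number in numbers:
    print(number.to_bytes(8, "big").hex())
```

With ports 1 to 38 selected, bits #39 to #63 stay at zero, which corresponds to the unused bits 225 mentioned above.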

(37) FIG. 4 shows a flowchart of a processing step according to the invention. In this case, the processing step includes a first step 121 for parsing the formal language instruction 211. This makes it possible to identify the symbols that may be present in the instruction as well as the ranges of ports concerned.

(38) In particular, each symbol of the formal language instruction 211 is sequentially compared 122 to a stored list of symbols for segmenting the instruction 211 into a plurality of portions 212. Each portion 212 advantageously corresponds to a level of the interconnection network 1. The symbol list can be stored in a configuration file. With reference to FIG. 3B, the identified portions would be (1) and (1-18) and [1-9,13].
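A minimal sketch of this segmentation, assuming the example conventions given earlier (brackets for monitored ports, parentheses for ports only traversed, ‘-’ for an interval and ‘,’ for concatenation). The symbol table `OPENERS` and the function names are hypothetical; in practice the symbol list could be read from a configuration file.

```python
# Hypothetical stored symbol list: opening symbol -> (closing symbol, monitored?)
OPENERS = {"[": ("]", True), "(": (")", False)}


def segment(instruction):
    """Split an instruction into portions, one (monitored, body) pair per level."""
    portions = []
    i = 0
    while i < len(instruction):
        symbol = instruction[i]
        if symbol not in OPENERS:
            raise ValueError(f"unexpected symbol {symbol!r} at index {i}")
        closer, monitored = OPENERS[symbol]
        end = instruction.find(closer, i + 1)
        if end == -1:
            raise ValueError(f"missing {closer!r} for {symbol!r} at index {i}")
        portions.append((monitored, instruction[i + 1:end]))
        i = end + 1
    return portions


def expand_ports(body):
    """Expand a portion body such as "1-9,13" into an explicit list of ports."""
    ports = []
    for item in body.split(","):
        low, _, high = item.partition("-")
        ports.extend(range(int(low), int(high or low) + 1))
    return ports


for monitored, body in segment("(1)(1-18)[1-9,13]"):
    print(monitored, expand_ports(body))
```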

(39) Once the portions are identified, there is a transformation 123 of each portion 212 into a number 222 encoded on at least one byte. For example, FIG. 3B shows three numbers encoded on eight bytes. Once generated, a number 222 encoded on at least one byte is stored 124 on a memory.

(40) In the context of the method, there can then be a step 126 for verifying that all the portions 212 have been transformed into a number encoded on at least one byte. If that is not the case, the process is repeated from step 123 and the non-transformed portion undergoes the transformation step 123.

(41) Finally, there is an aggregation 125 of the numbers 222 encoded on at least one byte so as to form a set 221 of numbers encoded on at least one byte. The aggregation groups the numbers so that they can be processed together. Many ways of grouping numbers into a set are known: the aggregation may for example correspond to storage in a same file, to a list, to a concatenation, or to an indexing under the same literal.

(42) During a step for processing a new instruction, so as to further accelerate the generation of a request, the method according to the invention may further include a step for comparing each portion to portions stored in a memory. If an identical portion is identified in the memory, then the transformation step 123 may be replaced by a use of the number 222 encoded on at least one byte corresponding to said stored portion.
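The reuse of stored portions can be pictured as a simple lookup before the transformation. The sketch below is an assumption about how such a memory could be organized: `stored_numbers`, `transform` and `transform_with_reuse` are hypothetical names, and `transform` is a compact stand-in for step 123 that assumes bit #n stands for port #n and bit #0 is the authorization bit.

```python
stored_numbers = {}  # hypothetical in-memory store: portion text -> encoded number
transform_calls = 0  # counts how often the transformation actually runs


def transform(portion):
    """Stand-in for step 123: turn "[1-9,13]" or "(1-18)" into a bit field."""
    global transform_calls
    transform_calls += 1
    number = 1 if portion[0] == "[" else 0  # bit #0 is the authorization bit
    for item in portion[1:-1].split(","):
        low, _, high = item.partition("-")
        for port in range(int(low), int(high or low) + 1):
            number |= 1 << port
    return number


def transform_with_reuse(portion):
    """Reuse the stored number when an identical portion was already seen."""
    if portion not in stored_numbers:
        stored_numbers[portion] = transform(portion)
    return stored_numbers[portion]


first = [transform_with_reuse(p) for p in ("(1)", "(1-18)", "[1-9,13]")]
second = [transform_with_reuse(p) for p in ("(1)", "(1-18)", "[1-4]")]
print(transform_calls)  # only the new portion "[1-4]" is transformed the second time
```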

(43) The method 100 according to the invention also includes a step 130 for encoding the set 221 of numbers so as to generate a request 231 including at least one addressing command. This encoding step can be performed by an encoding module 330 associated with the request generation device according to the invention.

(44) The request 231 is preferably a request in a computer language. It allows the generation of a list of direct route access paths, or addressing commands, from said numbers encoded on at least one byte.

(45) The access path list is preferably generated in a computer language allowing a “system manager” to directly access the ports defined in the formal (non-computer) language instruction made by a user. Advantageously, the request includes an addressing command per port to be monitored and said addressing command is a command of the direct route path type. Such encoding may for example rely on a software library including a routine collection capable of generating a request 231 including a plurality of addressing commands, each pointing towards one of the ports defined in the formal language instruction.

(46) Thus, the method may comprise encoding 130 the set 221 of numbers encoded on at least one byte so as to generate the request 231 via a dedicated script that can be written in a wide variety of computer languages.

(47) FIGS. 3A and 3B show examples of such requests 231. By convention, it is accepted that a direct route path always starts with the identifier 0. In addition, for clarity reasons, it is assumed here that each node comprises only one input and output port with the reference 1.
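The requests drawn in FIGS. 3A and 3B are not reproduced in this text, so the following is only one plausible expansion of a set of numbers into addressing commands. It assumes that bit #n stands for port #n, that bit #0 is the authorization bit, that every path starts with the identifier 0, and that one command is emitted for each monitored port reached through every allowed combination of upstream ports. The function names are hypothetical.

```python
def ports_of(number):
    """List the ports whose position bit is set (bit #0 is not a port)."""
    return [bit for bit in range(1, 64) if (number >> bit) & 1]


def addressing_commands(numbers):
    """Expand one number per network level into direct route commands."""
    commands = []
    prefixes = [[0]]  # every direct route path starts with the identifier 0
    for number in numbers:
        paths = [prefix + [port] for prefix in prefixes for port in ports_of(number)]
        if number & 1:  # authorization bit set: these ports are monitored
            commands.extend("-D " + " ".join(map(str, path)) for path in paths)
        prefixes = paths  # deeper levels are reached through these ports
    return commands


# [1][20][1-38]: every level is monitored
numbers_3a = [0b11, (1 << 20) | 1, (1 << 39) - 1]
# (1)(1-18)[1-9,13]: only the last level is monitored
numbers_3b = [1 << 1, (1 << 19) - 2, ((1 << 10) - 2) | (1 << 13) | 1]

print(len(addressing_commands(numbers_3a)))  # 40
print(len(addressing_commands(numbers_3b)))  # 180
```

Under these assumptions a three-symbol instruction already expands to 180 commands in the second case, which illustrates why a compact formal language instruction is convenient for the user.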

(48) The information for establishing the links between switch ports and between switch and node ports is typically stored in routing tables and/or in a configuration database of the IT infrastructure. Here, the use of a direct route path makes it possible to dispense with such routing tables.

(49) The different steps of the method according to the invention can be stored in one or more configuration files which a request generation device according to the invention can access. The one or more configuration files may be written in any of a large number of programming languages, for example in C.

(50) According to another aspect, the invention relates to a request generating device 300, configured to generate a request, from a formal language instruction defining a set of ports of an interconnection network. The interconnection network generally includes a plurality of computing nodes where each node includes a plurality of ports for connecting them to each other and to switches. The generated request has the particularity of including an addressing command 232 for each one of the ports defined in the formal language instruction. The request generator 300 is more particularly an addressing command generator.

(51) As shown in FIG. 1, the request generating device 300 includes a communication module 310 able to receive a formal language instruction defining a set of ports. The communication module 310 is configured to receive and transmit information to remote systems such as tablets, phones, computers or servers and can thus enable the communication between the device 300 and a remote terminal, including a client 600. The communication module 310 makes it possible to transmit data over at least one communication network and may comprise a wired or wireless communication. The interaction may involve an application software used by the operator to interact with the system according to the invention. The application software may include a graphical interface for facilitating an interaction with an operator. An operator can then act on the remote terminal so as to generate an instruction that can be implemented by the device 300. Similarly, the device 300 is able to receive the instruction 211 from a client device and to send the request 231 to a system 400 for monitoring an interconnection network.

(52) The request generating device 300 also includes a processing module 320 able to process the formal language instruction 211 so as to generate a set 221 of numbers 222 encoded on at least one byte, as well as an encoding module 330 able to encode the set of numbers encoded on at least one byte so as to generate a request including an addressing command for each one of the ports of the set of ports.

(53) These modules 310, 320, 330 are separate in FIG. 1, but the invention may provide various types of arrangement, for example a single module cumulating all the functions described here. Similarly, these means may be divided into several electronic boards or gathered on a single electronic board.

(54) When an action is attributed to a device or a module, it is actually performed by a microprocessor of the device or module controlled by instruction codes stored in a memory. If an action is attributed to an application, it is actually performed by a microprocessor of the device in a memory of which the instruction codes corresponding to the application are stored. When a device or module transmits or receives a message, this message is sent or received by a communication interface. The memories mentioned in this invention may correspond to a random access memory and/or a mass memory. The mass memory may be a medium such as a memory card, or a hard disk hosted by a remote server.

(55) In another aspect, the invention relates to a system 400 for monitoring an interconnection network including a request generating device according to the invention, as well as a method implementing this system. Preferably, the monitoring method according to the invention is dedicated to monitoring hardware failures. It thus makes it possible to track hardware information that can be transmitted by the ports of the switches of the interconnection network.

(56) A monitoring method according to the invention is shown in FIG. 5. It comprises the following steps: receiving 510 a formal language instruction 211 defining a set 11 of ports corresponding to a restricted perimeter of the interconnection network, generating 520, from the formal language instruction 211, a request 231 including an addressing command for each one of the ports 10 defined in the formal language instruction 211, transmitting 530 the request 231 to a restricted perimeter monitoring module 700 able to receive hardware statuses of the set 11 of ports of the restricted monitoring perimeter, requesting 540 hardware statuses by the restricted perimeter monitoring module 700 with the ports corresponding to the restricted perimeter of the interconnection network, receiving 550 the hardware status values by the monitoring module 700, and storing 560 the hardware status values by the monitoring module 700.

(57) Preferably, storing 560 the hardware status values corresponds to storing the hardware status values that exceeded a predetermined maximum threshold. Thus, there is no memory clutter with less relevant information.
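A minimal sketch of this selective storage, under stated assumptions: the metric names, the threshold values in `MAX_THRESHOLDS` and the shape of a reading are all hypothetical and only serve to illustrate keeping the values that exceeded their predetermined maximum threshold.

```python
# Hypothetical predetermined maximum thresholds, one per hardware status
MAX_THRESHOLDS = {"internal_temperature_c": 85.0, "board_voltage_v": 12.6}


def values_to_store(readings):
    """Keep only the readings that exceed their predetermined maximum threshold."""
    return [
        reading
        for reading in readings
        if reading["metric"] in MAX_THRESHOLDS
        and reading["value"] > MAX_THRESHOLDS[reading["metric"]]
    ]


readings = [
    {"path": "0 1 20", "metric": "internal_temperature_c", "value": 91.5},
    {"path": "0 1 20 1", "metric": "internal_temperature_c", "value": 48.0},
    {"path": "0 1 20 2", "metric": "board_voltage_v", "value": 12.9},
]
stored = values_to_store(readings)
print([reading["path"] for reading in stored])  # ['0 1 20', '0 1 20 2']
```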

(58) The method may also include a step for transmitting the hardware status values stored by the monitoring module.

(59) In the embodiment described in FIG. 5, the monitoring method is implemented in parallel by three restricted perimeter monitoring modules 700. This lightens the load supported by each one of the monitoring modules 700 and thus improves the performance of the method.
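The parallel operation of three monitoring modules can be mimicked, for illustration only, with a pool of three workers, each handling the addressing commands of its own restricted perimeter. Here a thread pool stands in for what would be separate monitoring modules, and `fake_query` is a hypothetical placeholder for the actual hardware status request.

```python
from concurrent.futures import ThreadPoolExecutor


def fake_query(command):
    """Hypothetical stand-in for requesting a hardware status over a direct route."""
    return {"internal_temperature_c": 40 + len(command)}


def monitor_perimeter(commands):
    """One module's work: request, receive and store the values of its perimeter."""
    return {command: fake_query(command) for command in commands}


# One list of addressing commands per restricted perimeter (illustrative values)
perimeters = [
    ["-D 0 1", "-D 0 1 20"],
    ["-D 0 2", "-D 0 2 20"],
    ["-D 0 3"],
]

with ThreadPoolExecutor(max_workers=3) as pool:
    results = list(pool.map(monitor_perimeter, perimeters))

for stored_values in results:
    print(sorted(stored_values))
```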

(60) When several restricted perimeter monitoring modules are implemented, the method for monitoring an interconnection network according to the invention may further comprise receiving hardware status values from a plurality of restricted perimeter monitoring modules. As mentioned above, advantageously, only the values of hardware statuses that exceeded a predetermined maximum threshold are received 570 and stored 580. Receiving the hardware status values, as well as the subsequent steps, can be preferably supported by the monitoring system and more particularly by a dedicated processing module.

(61) The hardware status values, preferably those that exceeded a predetermined maximum threshold, are aggregated 590 so as to obtain an overall view of the interconnection network. The hardware status values may be advantageously selected from: the state of the power supply, the board voltage values, the internal temperatures (chips, controller, fan, board, etc.), the state of the fan, the environmental and operational data of the display, the software versions of the switch, the cabling errors, the congestion levels, the identifier errors (LID . . . ), the connection errors, the power supply errors, the throughput alerts.
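The aggregation 590 can be sketched as a merge of the values reported by each restricted perimeter monitoring module into a single view grouped by hardware status. The report layout, the module identifiers and the function name `aggregate` are assumptions made for this example.

```python
from collections import defaultdict


def aggregate(reports):
    """Merge per-module (path, metric, value) tuples into one view keyed by metric."""
    overall = defaultdict(list)
    for module_id, values in reports.items():
        for path, metric, value in values:
            overall[metric].append({"module": module_id, "path": path, "value": value})
    return dict(overall)


# Hypothetical values already filtered by each monitoring module
reports = {
    "module_a": [("0 1 20", "internal_temperature_c", 91.5)],
    "module_b": [
        ("0 2 20 4", "internal_temperature_c", 88.0),
        ("0 2 20 7", "congestion_level", 0.97),
    ],
}
overview = aggregate(reports)
for metric, entries in overview.items():
    print(metric, len(entries))
```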

(62) Thus, the methods and hardware according to the invention make it possible to quickly segment a set of ports of an interconnection network so as, for example, to set up a perimeter monitoring of said interconnection network. In addition, the monitoring method thus developed has improved performance compared with conventional systems relying on routing tables or partition keys.

(63) All of these advantages therefore contribute to reducing the monitoring processing times, especially the hardware monitoring, of an interconnection network and also to reducing the volumes of data managed.