ENCODING OF FAULT SCENARIOS OF A MANYCORE PROCESSOR
20170003347 ยท 2017-01-05
Inventors
Cpc classification
G01R31/31703
PHYSICS
G01R31/31718
PHYSICS
International classification
Abstract
A method implemented by computer for compressing and decompressing all the fault scenarios of a processor comprising computation units interconnected by a communication network having topology symmetries, each fault scenario corresponding to the number and the location of one or more failing computation units and the method comprises the steps of reception or determination of one or more topology symmetries; determination of the equivalent scenarios by means of said topology symmetries; each of the fault equivalence classes being associated with a resource allocation solution in terms of mapping and routing. Different developments include the determination or the application of an inference engine, of identifiers associated with the fault scenarios, of combinatorial exploration techniques, of compression rates, of reconfiguration of the processor and of classification of the processor in a range. A program product and associated systems are also described.
Claims
1. A method implemented by computer for compressing all the fault scenarios of a processor comprising computation units interconnected by a communication network having topology symmetries, each fault scenario corresponding to the number and the location of one or more failing computation units and the method comprising the steps of: reception or determination of one or more topology symmetries; determination of the equivalent scenarios by means of said topology symmetries; each of the fault equivalence classes being associated with a resource allocation solution determining a specific mapping of the tasks of the applications on the computation units and a specific routing of the data exchanges over the communication network.
2. The method as claimed in claim 1, the determination of the equivalent scenarios being performed by application of an inference engine.
3. The method as claimed in claim 1, the determination of the equivalent scenarios comprising the association of a set of identifiers with each fault scenario.
4. The method as claimed in claim 3, the determination of the equivalent scenarios comprising the association of a unique identifier with each fault scenario.
5. The method as claimed in claim 4, the unique identifier being obtained by concatenation of the character strings forming said identifiers.
6. The method as claimed in claim 1, the symmetries comprising one or more axial rotations about an axis comprising one or more computation units, according to angles 90/+90/+180, and one or more vertical or horizontal displacements or shifts.
7. The method as claimed in claim 1, the equivalent fault scenarios being determined by combinatorial exploration, said combinatorial exploration comprising the construction of one or more equivalence class trees.
8. The method as claimed in claim 1, further comprising the creation of a file specifying the allocation of resources.
9. The method as claimed in claim 1, further comprising the determination of a compression rate of the fault scenario space, the compression rate being associated with the type of architecture of the processor.
10. The method as claimed in claim 1 for decompressing all the fault scenarios of a processor, after the startup of the processor, further comprising: the determination of one or more faults associated with one or more computation units on starting up the processor; the characterization of the corresponding fault scenario; the identification of the fault equivalence class and of the computation resource allocation solution associated with said fault equivalence class.
11. The method as claimed in claim 10, further comprising the reconfiguration of the processor started up, the reconfiguration comprising the application to the processor of the computation resource allocation solution.
12. The method as claimed in claim 11, the reconfiguration being performed at the moment of, or after, the occurrence of one or more faults of one or more computation units of the processor in order to continue the execution of the application on the processor.
13. The method as claimed in claim 11, the reconfiguration being performed before the occurrence of one or more faults of one or more computation units of the processor preventively, one or more faults being simulated or anticipated.
14. The method as claimed in claim 1, further comprising the classification of the processor in a range grouping together processors with an identical number of failing computation units.
15. The method as claimed in claim 1, a failing computation unit being associated with a failure rate or confidence of execution rate, said failure being total or partial.
16. The method as claimed in claim 1, further comprising the identification of one or more symmetrical links from the determination or the reception of one or more topology symmetries.
17. A computer program product, said computer program comprising code instructions making it possible to perform the steps of the method as claimed in claim 1, when said program is run on a computer.
18. A system for compressing or decompressing the fault scenario space of a processor, the system comprising means for implementing the steps of the method as claimed in claim 1.
Description
DESCRIPTION OF THE FIGURES
[0059] Different aspects and advantages of the invention will become apparent from the description of a preferred, but nonlimiting, mode of implementation of the invention, with reference to the figures below:
[0060]
[0061]
[0062]
[0063]
[0064]
[0065]
[0066]
DETAILED DESCRIPTION OF THE INVENTION
[0067] A multicore microprocessor is a processor that has a number of physical cores which work in parallel. A physical core is a set of circuits capable of executing programs autonomously. All the functionalities necessary to the execution of a program are present in these cores: ordinal counter, registers, computation units, etc. The term multicore is generally employed to describe a processor composed of at least two cores (or computation units) etched within the same chip. These computation units can be clustered (or not). The computation units are interconnected by a communication network (Network on Chip or NoC).
[0068] The recent technological advances in the design and creation of circuits now make it possible to produce processors containing a large number of generic cores (from a few tens to several hundreds) with complex memory hierarchies: these are no longer called multicore but manycore systems.
[0069] A certain number of problems are due to the increase in the number of cores. For example, this increase brings about a situation in which the interconnections become the bottleneck in the quest for performance. To the different execution constraints are thus added particular parallel programming techniques to exploit all the potential of the circuit (run time layer development, performance analysis, distributed control, compilation, management of communications on a network on chip, etc).
[0070] The useand therefore the marketingof a degraded chip can be obtained by different means and/or methods.
[0071] A first approach consists in performing an allocation (mapping and routing) of an application (software) to the computation and communication resources of a degraded manycore processor. Generally, these operations are done statically and offline (that is to say, when the chip is not powered up). These techniques will also assume a total and entire knowledge of the architecture of the processors on which the software application will be deployed. For example, one approach to make it possible to sell these degraded processors may consist in defining computation power ranges. A given range then gathers together all the processors which guarantee the availability of a set overall minimum of computation resources. Thus, by following this marketing strategy and because of the absence of differentiation according to the location of the healthy resources, processors corresponding to different architectures may be found within a same range. For example, in the example of a manycore with an architecture providing 16 computation clusters for which the range of degraded processors derived from this manycore comprises only 15 functional clusters (a single cluster being faulty), there are then 16 possible architectures in this range. Each architecture corresponds to the case of one of the 16 clusters in failure state. There then arises the problem of the deployment of a same application on manycore processors with the same global computation capacity but with an architecture that differs from one processor to another. In effect, the information is known only very late, when the processor is switched on. Consequently, this mapping/routing technique is limited.
[0072] It is possible to provide a solution that makes it possible to reconfigure the deployment of the application on the chip according to the fault scenario thereof. In the absence of such a solution, the degraded processors cannot be fully exploited and cannot be marketed as is. Most of the current multi/manycore processors use shared buses (for example the ClearSpeed CSX manycore) or certain types of NoCs of RING type. These circuits tend to eliminate the problem of reconfiguration by eliminating the combinatorial explosion of possible fault scenarios thereof. In fact, the patent literature does not as yet even mention this technical problem (and a fortiori does not propose any technical solution). However, these circuits with shares buses or NoCs of RING type are not scalable and the next generations of manycore processors will probably have to dispense with this type of circuit to adopt NoCs of 2D Torus, 3D Torus or other such types for which this technical problem will be acute.
[0073] Another possible approach consists in modifying the hardware, for example with the implementation of special routers. This solution may complement the examples disclosed herein.
[0074] Regarding the exploration of the space of the architectures for solving the problem of deployment, a first solution consists in exploring all the space of the architectures of a range and reiterating the mapping/routing algorithms for each architecture (fault scenario) and saving the solutions obtained. When a chip is started up, it is then possible to detect what its architecture is and use the corresponding solution. By construction, this technique is exact (i.e. optimal) and provides the guarantee, in the compilation phase, of the possibility (or not) of deploying the application whatever the fault scenario within a range. However, this technique is generally costly in execution time because the number of configurations to be explored is very great. A problem of storage of the solutions also arises because of the huge size of the binary which results therefrom. In effect, with each fault scenario requiring its own binary, the size of the final binary could be very great if all the reconfigurations for the specified range had to be stored.
[0075] By contrast to the techniques previously mentioned, certain embodiments of the invention make it possible to advantageously process the faults in terms of computation resources. Certain embodiments of the invention in effect consider processors with an NoC that is entirely functional, that is to say without router (or link) fault. The communication links and the routers are generally less subject to faults.
[0076] It is stressed that the techniques disclosed hereinbelow apply to the mapping and to the routing (both). In order to avoid additional notations which would complicate the description, only the mapping will be mentioned hereinbelow, but the disclosure will apply equally to the routing (routing links).
[0077] The term resource allocation consists of mapping and routing operations that make it possible to allocate the computation and communication resources necessary to execute the different tasks of an application and have them communicate.
[0078] The term mapping therefore designates the operation which assigns each of the tasks of an application to execution media (i.e. computation units, e.g. core, processor, cluster, server, etc.), which implicitly presuppose memory means. The tasks assigned to one or more media will then be executed solely thereby. The scheduling at a medium level will determine the order and the times for which the tasks access these shared resources (which are allocated to them).
[0079] The term routing designates the operation which computes a path made up of communication links which link the computation units (Bus, NoC, local area network, internet, etc.) in order to ensure the transfer of data between tasks which communicate with one another and which are executed on different media.
[0080]
[0081]
[0082] Each symmetry operation has associated with a relabeling function which makes it possible to restore the initial order of the labels of the architecture. For example, the cluster relabeling function [1,4,7,2,5,8,3,6,9]->[1,2,3,4,5,6,7,8,9] corresponds to the rotation of 180 in the space around the axis which passes through the clusters 1, 5 and 9. The same applies for the interconnects whose labels are not mentioned or represented in the interests of simplification.
[0083] A fault scenario is defined by the number and the location of the computation units (i.e. of the clusters in the present example) which are defective. For example, in a scenario with three faults, a scenario for which only the three clusters i, j and k are faulty will be denoted a scenario (i,j,k). The order of formulation of these faults is unimportant, i.e. the scenarios (i,j,k), (i,k,j), (j,i,k), (j,k,i), (k,i,j) or (k,j,i) are the same.
[0084]
[0085] By applying a symmetry operation (
[0086] Defining equivalence classes presents the advantage of not having to solve the mapping/routing problem and of saving the solution only for the single and unique representative of each class. By having available the sequence of symmetry operations, to bring a scenario to the representative of its class, it is possible to use this same sequence to reverse the order of application of the symmetry operations in order to relabel the solution of mapping/routing of the representative as a solution of exactly equal quality for the scenario concerned. This scenario will be known on startup of the chip.
[0087] In the example of
[0088] In the example of the scenario (1,2) represented in
[0089] The method for putting this compression into practice is described hereinbelow.
[0090] After the identification of the fault scenario equivalence classes (when the symmetry operations which save the structure of an architecture are explored and itemized), these same classes are exploited in order to proceed with the partitioning of all the fault scenarios. The number of scenario space partitioning possibilities corresponds to the Bell number which is exponential in the size of the set concerned.
[0091] For the partitioning, a first solution could consist in implementing an inference engine based on rules of symmetry, implemented for example by means of languages such as LISP or Prolog. Starting from a given scenario, the inference engine sets out to search for a sequence of rules to be applied in order to establish an equivalence with the representative of a class already listed or to create a new equivalence class if no equivalence is established (i.e. absence of sequence). In the latter case, this scenario becomes the representative of the new class. To implement such an inference engine, it is then necessary to define a) the rules of symmetry and the relabeling operations which are associated with them and b) the priority (or the order or the occurrence) of application of each of these rules and do so on each step of the search process (the priorities, orders or occurrences being able to vary from one step to another of the search process). The symmetry operations are not provided with good characteristics with respect to the composition. For example, they are not generally commutative, i.e. for two operations O1 and O2, O1*O2 gives a different result from O2*O1. Similarly, they are not idempotent: O1*O1=/=O1. These characteristics generally make the definition of an order/occurrence of application of the symmetry operations very complex. It is then difficult to define a search strategy which optimizes the execution time. Furthermore, there is a risk of having therein a number of exceptions to the predefined order which have to be taken into account according to the architecture. Moreover, when the symmetry is used, it is tempting to rely on intuition to establish these parameters but this will soon prove, in many cases, to be very complex and counterproductive for the identification of the equivalence classes. The characteristics of the symmetry operations also pose a problem of definition of the stop conditions for an inference engine. Since the number of compositions of operations possible is very great, when no equivalence with a case already listed is found, it is not generally possible to say at which moment it is possible to stop the inference engine and to conclude on the presence of a new equivalence class. If the stop condition is too restrictive, the technique also risks not being exact and the maximum compression rate might not be reached (a class could correspond to several partitions, if the equivalence between all its elements is not established). The adaptation of such a method for each architecture will also take a long time and be very complex.
[0092] These reasons make the development of an effective and generic inference engine irrelevant, in the general case.
[0093]
[0094] The ex ante knowledge of the equivalence makes it possible to persevere in the combination and the search for the sequence of symmetry operations. Without this prior knowledge, the stop problem could become critical. The demonstration which has just been made for the example might not stop, or by default (i.e. if a stop was triggered) might culminate in the incorrect conclusion of the non-equivalence of the scenarios (and therefore in the conclusion of the existence of two equivalence classes). This first search approach remains possible, but it is limited.
[0095] Another embodiment of the compression of the space of the scenarios is described below. This method is rapid and generic. According to this embodiment, each scenario is associated with an identifier, so that two scenarios associated with a same identifier are necessarily or inevitably equivalent and are therefore interchangeable. According to one embodiment, the identifier can be that of the equivalence class itself. By construction, such an identifier incorporates the transformations of symmetry and does not depend on the labels associated with the failing clusters. To this end, a distance is introduced: an inter-cluster distance which adds the concept of symmetry to the usual inter-cluster distances. Then, by using this new distance as a basis, means and/or steps make it possible to identify a particular scenario.
[0096] In detail, the compression method comprises four steps or phases.
[0097] In a first step, a distance is defined that is called symmetrical inter-cluster distance. To define the fault scenario identifier, a means of identification of the clusters at equivalent distance from another cluster is described hereinbelow. To define an inter-cluster distance, a first solution consists in taking into account the number of links which separate these clusters (e.g. hop count). This measure makes it possible to assess the shortest inter-cluster routing path length and thus gives a measurement for the data bit rate consumed relative to the overall routing capacity of the network. The smaller this is, the more the congestion of the network can be avoided (and energy saved). The improved inter-cluster distance proposed here takes into accountin addition to the hop count and the nature of the interconnectsthe patterns of symmetry. By corollary, two pairs of clusters which are at different hop counts will have different symmetrical distances. These distances are introduced in
[0098]
[0099] According to the symmetries detected and exploited in order to define the symmetrical distances (for example by a developer implementing reconfiguration software for a given chip), different matrices can be obtained. The greater the number of symmetries exploited, the better the compression. For example, if a developer considers that the clusters 6 and 2 are not at the same symmetrical distance from 5 because he or she disregards the axial rotation about the center, then he or she will have to label one of these distances some other way, C for example. This will automatically increase to 3 the number of equivalence classes for the range with 2 faults and will thus reduce the compression rate to 91.66%. This rate will be reduced all the more if it is considered that this same developer would surely have neglected other symmetries on the distances denoted B initially in the matrix of the example and will then have to define additional distances D, E and F. In this case, the compression rate drops to 83.33%.
[0100] To designate the symmetrical distances, it is possible to use numerical labels in place of the alphabetical labels, but the latter greatly simplify the computation and the comparison of the identifiers of scenarios thereby allowing for the implementation of more generic and more effective algorithms.
[0101]
[0102] In a second step of the compression method, a fault scenario is identified and characterized. To do this, the positions of the faults relative to one another are used, by means of the concept of symmetrical distance defined previously.
[0103] In one embodiment, the identifier of a scenario (and therefore of its equivalence class) is a word taken from an alphabet made up exclusively of the labels of the symmetrical distances. For a scenario with two faults (Ui,Uj), the identifier denoted Identity(Ui,Uj) is given by the distance Mij. It should be noted that the brackets in the identifier are necessary neither for the definition nor for the comparison of the identifiers. These brackets are represented only to make it easier to read and show the transition from one fault to the next, but they can be deleted on implementation.
[0104] For a scenario with three faults (Ui,Uj,Uk), the identifier Identity(Ui,Uj,Uk) is obtained by Mji(MkiMkj). For the scenarios with a greater number of faults, recurrence is applied by using a character string concatenation operator. By using Identity(Ui, Uj, . . . , Um) to designate the identifier of the scenario (Ui, Uj, . . . , Um), then Identity(Ui, Uj, Uk, . . . , Um, Uq) corresponds to the concatenation of Identity(Ui, Uj, . . . , Um) and (MqiMqjMqk . . . Mqm). Since the concatenation is not symmetrical, the identification depends on the order considered in the formulation of the faults of the scenario. In other words, with no additional method, several identifiers can be computed for a same scenario. In order to avoid equivalent scenarios being detected as such because of the order of the formulation of the faults, an additional and optional step makes it possible to make sure that they have a unique identifier and therefore ensures the accuracy of the technique and a maximum compression rate. In one embodiment, the order of formulation of faults which gives the smallest identification with respect to the comparison of the character strings is selected. The benefit of the concatenation for the comparison of the identifiers justifies the use of alphabetical labels. Alternatively, alphabetical, numeric and/or alphanumeric characters can be used. By applying this identification technique to the example of
[0105] In conclusion, the use of identifiers encoding the symmetries makes it possible to dispense with the description of a sequence of symmetry operations in order to demonstrate the equivalence of the scenarios.
[0106] When the number of faults is greater than half the number of units initially provided on a chip, rather than defining a scenario through its units with faults, it may be prudent to define it through its functional units to optimize the size of the scenario identifiers. The technique for computing the identifier Identity will remain valid by relating the positions of the healthy units relative to themselves. For example, from 5 faults, it is more efficient to list the remaining 4 healthy clusters of the 33 2D Torus.
[0107] In a third step of the compression method, the space of the scenarios is explored. During this step, an equivalence class tree is constructed. This operation consists in scanning the space of the possible fault scenarios in order to partition it into equivalence classes according to the definition given during the preceding step. For each scenario scanned, a check to see if the tree does not contain a listed scenario which is equivalent to it is carried out. If that is not the case, a new node is added to the tree (the scenario and its identifier being the representatives of the new equivalence class). The different levels of the tree indicate the different ranges. For example, at the top of the tree, there are the classes of the range with a single fault. In the example of the 33 2D Torus type processor, it is made up of only a single node. The construction of the tree stops when the maximum fault level considered (i.e. the most degraded range) has been reached. This tree is constructed just once for each manycore architecture because it is independent of the applications which will be deployed therein. The supplier of the manycore processor will be able to construct and incorporate this tree in the development tools (for example). The users or developers will also be able to use such trees, for example when designing applications in order to compute and save the mapping and/or routing solutions for each of the nodes of the level which represents the range on which they want to deploy the application.
[0108] In the case of the exploration, given that it is not possible to know previously whether two given scenarios are equivalent or not, the use of an identifier is found to be particularly advantageous: it makes it possible to state that two scenarios are equivalent even though the sequence of symmetry operations necessary to transform one scenario into another remains unknown (not available, not documented, not accessible, etc).
[0109] Optionally, different variants of the explorer can be implemented, for example with the choice of a widthwise or depthwise search, according to different choices of the representative of an equivalence class (for example choosing the one for which the labels of the clusters are the smallest).
[0110] For most manycore processor architectures, the space explored in order to construct the tree of the equivalence classes can itself be reduced. In effect, it is not always necessary to scan the entire space. For the example of the 33 2D Torus, any fault can be brought backby symmetry operationsto be placed (or relabeled) as fault of the cluster 1. In order to reduce the space to be explored, it is therefore possible to consider that the computation unit number 1 is always faulty. Thus, the size of the space to be explored is divided by 9. Optionally, for greater certainty, a test can also be implemented in order to check that the tree of the equivalence classes obtained does indeed cover all the possible scenario cases.
[0111] In a fourth step of the compression method, a relabeling is performed. After the startup of the chip and the detection of the fault scenario, the identifier and therefore the equivalence class to which the chip belongs can be determined according to the technique presented in the second step. Thus, it is possible to extract, from the tree of the scenarios constructed during the phase 3, the equivalent scenario and the mapping/routing solution which are associated with the fault situation of the chip. It will then be necessary to adapt this equivalent solution to the scenario of the chip. This adaptation constitutes the relabeling step or phase.
[0112] This relabeling step is executed on the chip since there is a need to know the fault scenario. In an embedded execution context, this phase can be optimized because of or for resources that are limited in computation and memory terms (low complexity in execution time and space).
[0113] In some cases, the NoC architecture can be the same for all the manycore processors (when, by assumption, the NoC is entirely functional or fault-free). The relabeling step proposed consumes little in computation terms. In effect, contrary to the reasons which urge relativizing the use of an inference engine, the fact of knowing previously that two scenarios are equivalent (by computing their identifier and therefore the order of formulation of the faults) makes possible the determination of a sequence of symmetry operations which is both generic and effective for bringing a scenario back to another which is equivalent to it. For example, for a 44 2D Torus, the algorithm requires fewer than 3 instructions to detect the sequence and thus relabel a scenario to its equivalent. For that, from the moment when the first fault is known according to the order which has been obtained for the computation of the identifier, it is sufficient to bring this fault back to the fault 1 by performing the necessary vertical and horizontal shifts. Then, the axial rotation (if necessary) will determine the rest of the faults. This algorithm is of O(1) complexity, that is to say the lowest there is.
[0114]
[0115]
[0116]
[0117] The invention will be advantageously applicable for Cloud Computing, distributed computation environments, Grid Computing, etc. In addition to the virtualization of applications (eliminating the specifics of code execution on a stock of heterogeneous processors), the invention can in fact make it possible to manage faults over a set of processors, because of the topological abstraction which is performed to exploit the symmetries. In a simplified example, a processor with 800 cores can be topologically equivalent to two hundred processors with 4 cores. Consequently, the exploitation of the symmetries in both cases carry the same advantages, namely static reconfigurations (packaging in ranges, pre-marketing) even dynamic reconfigurations during runtime.
[0118] The present invention can be implemented from hardware and/or software elements. It can be available as computer program product on a computer-readable medium. The medium can be electronic, magnetic, optical, electromagnetic or be a broadcast medium of infrared type.