MESSAGE CHANNELS
20250053465 · 2025-02-13
CPC classification: G06F15/80 (PHYSICS); G06F15/173 (PHYSICS)
Abstract
A message channel functionality for a data processing system is disclosed. It provides communication channels which may be considered to be a shared resource. The approach combines atomic stores (stores that are fully completed in a single atomic transaction) with non-coherence, using conditional non-coherent atomic stores to implement primitive communication channels on which software queues and channels can be implemented more efficiently. This enables the programmer to execute a store from registers on one side of a communications link and to have that data appear directly in the registers of a data consumer on that link, bypassing both the shared state upgrade problem and the parallel problem of acquiring a synchronization lock before sending data.
Claims
1. A data processing system comprising: a system privileged agent arranged to define a configuration of the data processing system; multiple processing elements arranged to perform data processing, wherein the multiple processing elements comprise a producer element and a consumer element; and interconnect circuitry arranged to couple the multiple processing elements with one another, wherein the data processing system supports a message channel functionality according to which: the system privileged agent is configured to define a message channel for communication between the producer element and a consumer element, the message channel being defined by a message channel identifier and a message channel target pointer, wherein the message channel target pointer indicates a non-cacheable target location associated with the consumer element; the producer element is configured to perform an atomic message store operation with respect to a block of message data targeting the consumer element, wherein the producer element specifies the message channel identifier and the block of message data; and the interconnect circuitry is configured to convey the block of message data atomically to the non-cacheable target location associated with the consumer element.
2. The data processing system as claimed in claim 1, wherein the data processing system further comprises: a router device coupled to the interconnect circuitry and comprising an input port and an output port, wherein the system privileged agent is configured to define the message channel for communication between the producer element and a consumer element by: providing the producer element with a message channel router pointer indicative of the input port of the router device; and storing message channel configuration data in the router device, the message channel configuration data comprising the message channel identifier and the message channel target pointer, wherein the producer element is arranged to perform the atomic message store operation specifying the message channel router pointer, and wherein the router device is responsive to reception of the block of message data at the input port to forward the block of message data from the output port to the non-cacheable target location indicated by the message channel target pointer.
3. The data processing system as claimed in claim 2, wherein the multiple processing elements comprise multiple consumer elements, and wherein more than one consumer element subscribes to the message channel, wherein the message channel is associated with multiple message channel target pointers, wherein each of the message channel target pointers indicates a non-cacheable target location associated with a respective consumer element of the multiple consumer elements, and wherein the message channel configuration data stored in the router device comprises the message channel identifier and the multiple message channel target pointers.
4. The data processing system as claimed in claim 3, wherein the router device is responsive to reception of the block of message data at the input port to select a recipient consumer element from the more than one consumer elements which subscribe to the message channel and to forward the block of message data from the output port to the non-cacheable target location indicated by the message channel target pointer associated with the recipient consumer element.
5. The data processing system as claimed in claim 4, wherein the router device is further responsive to the reception of the block of message data at the input port to re-forward the block of message data from the output port to each of the more than one consumer elements which subscribe to the message channel.
6. The data processing system as claimed in claim 2, wherein the consumer element is responsive to reception of the block of message data at the non-cacheable target location indicated by the message channel target pointer to return a success indicator to the router device, wherein the success indicator indicates whether or not the block of message data has been successfully received by the consumer element, and the router device is responsive to reception of the success indicator to forward the success indicator to the producer element.
7. The data processing system as claimed in claim 2, wherein the router device is responsive to the reception of the block of message data at the input port, when no consumer element is available for the message channel, to return a message failure indication to the producer element.
8. The data processing system as claimed in claim 2, comprising: a plurality of router devices coupled to the interconnect circuitry and each comprising an input port and an output port, wherein the system privileged agent is configured to define the message channel for communication between the producer element and a consumer element by concatenating the plurality of router devices, such that: the message channel router pointer specified by the producer element specifies a first router device of the plurality of router devices, and the message channel configuration data stored in each of the plurality of router devices links the plurality of router devices in sequence, such that each router device of the plurality of router devices other than a last concatenated router device is responsive to reception of the block of message data at the input port to forward the block of message data from the output port to a next router device of the plurality of router devices, and the last concatenated router device is responsive to reception of the block of message data at the input port to forward the block of message data from the output port to the non-cacheable target location indicated by the message channel target pointer.
9. The data processing system as claimed in claim 1, wherein the system privileged agent is configured to define a router-less message channel for communication between the producer element and a consumer element by providing the producer element with the message channel target pointer indicating the non-cacheable target location associated with the consumer element, wherein the producer element is arranged to perform the atomic message store operation specifying the message channel target pointer.
10. The data processing system as claimed in claim 1, wherein the consumer element comprises a holding buffer accessible to user software executing on the consumer element, wherein the non-cacheable target location associated with the consumer element is configured as a data reception port of the consumer element, and wherein the data reception port is configured to forward the block of message data received atomically to the holding buffer.
11. The data processing system as claimed in claim 10, wherein the holding buffer comprises at least one of: a set of system registers; vector registers; a user-software-addressable memory buffer; and a plurality of sub-buffers, wherein each sub-buffer of the plurality of sub-buffers is allocated to a corresponding message channel to which the consumer element is subscribed.
12. The data processing system as claimed in claim 10, wherein the consumer element is configured to reserve at least a portion of the holding buffer for at least one prioritised message channel to which the consumer element is subscribed.
13. The data processing system as claimed in claim 10, wherein the user software executing on the consumer element is configured to test whether the holding buffer currently holds a user software targeted block of message data on a message channel to which the user software is subscribed.
14. The data processing system as claimed in claim 10, wherein the consumer element is configured to support execution of multiple tasks on the consumer element, wherein each task has an individual set of consumer element state and the consumer element is configured to switch to a corresponding individual set of consumer element state when switching to a current task of the multiple tasks.
15. The data processing system as claimed in claim 14, wherein the consumer element is responsive to an attempt to deliver the block of message data at the data reception port, to receive or reject the block of message data in dependence on whether the current task is subscribed to the message channel for the block of message data.
16. The data processing system as claimed in claim 14, wherein the data reception port is responsive to an attempt to deliver the block of message data at the data reception port, when the current task is not subscribed to the message channel for the block of message data, to generate an interrupt signal for the consumer element.
17. The data processing system as claimed in claim 1, wherein the consumer element comprises message channel handling circuitry comprising the non-cacheable target location, wherein the message channel handling circuitry is configured to reference message channels using message channel identifiers, wherein user software executing on the consumer element is configured to reference the message channel using a virtual message channel identifier, and the consumer element comprises message channel identifier translation circuitry configured to translate virtual message channel identifiers to message channel identifiers in dependence on user software currently executing on the consumer element.
18. The data processing system as claimed in claim 1, wherein the system privileged agent comprises at least one of: an operating system; and a hypervisor.
19. The data processing system as claimed in claim 1, wherein the system privileged agent is responsive to a message channel setup call for the message channel from a processing element of the multiple processing elements to: allocate the message channel identifier for the message channel; specify the message channel target pointer; and, wherein the processing element uses virtual addresses to reference memory locations, allocate a virtual address for the processing element to use for the message channel, wherein the virtual address maps to a physical address given by the message channel target pointer.
20. The data processing system as claimed in claim 2, wherein the system privileged agent is responsive to a message channel setup call for the message channel from a processing element of the multiple processing elements to: allocate the message channel identifier for the message channel; specify the message channel target pointer; specify the message channel router pointer; and, wherein the processing element uses virtual addresses to reference memory locations, allocate a virtual address for the processing element to use for the message channel, wherein the virtual address maps to a physical address given by the message channel router pointer.
21. The data processing system as claimed in claim 20, wherein the system privileged agent is configured to define a virtual-to-physical address mapping scheme between a virtual address space and a physical address space in which a subset of bits of the physical address space are directly indicative of a set of message channel identifiers defined by the system privileged agent.
22. The data processing system as claimed in claim 1, wherein at least one processing element of the multiple processing elements is configured to support execution of multiple tasks on the processing element, wherein each task has an individual set of processing element state, wherein the system privileged agent is configured to administer time-sliced use of the processing element by causing an exchange of the individual set of processing element state and by modifying at least one of the message channel identifier and the message channel target pointer.
23. A method of operating a data processing system comprising: defining a configuration of the data processing system by a system privileged agent; performing data processing in multiple processing elements, wherein the multiple processing elements comprise a producer element and a consumer element; coupling the multiple processing elements with one another via interconnect circuitry; defining a message channel for communication between the producer element and a consumer element, the message channel being defined by a message channel identifier and a message channel target pointer, wherein the message channel target pointer indicates a non-cacheable target location associated with the consumer element; performing an atomic message store operation by the producer element with respect to a block of message data targeting the consumer element, wherein the producer element specifies the message channel identifier and the block of message data; and conveying the block of message data atomically to the non-cacheable target location associated with the consumer element via the interconnect circuitry.
Description
BRIEF DESCRIPTION OF THE DRAWINGS
[0019] The present invention will be described further, by way of example only, with reference to embodiments thereof as illustrated in the accompanying drawings, in which:
DESCRIPTION OF EXAMPLE EMBODIMENTS
[0045] Before discussing the embodiments with reference to the accompanying figures, the following description of embodiments is provided.
[0046] In accordance with one example configuration there is provided a data processing system comprising:
[0047] a system privileged agent arranged to define a configuration of the data processing system;
[0048] multiple processing elements arranged to perform data processing, wherein the multiple processing elements comprise a producer element and a consumer element; and
[0049] interconnect circuitry arranged to couple the multiple processing elements with one another,
[0050] wherein the data processing system supports a message channel functionality according to which:
[0051] the system privileged agent is configured to define a message channel for communication between the producer element and a consumer element, the message channel being defined by a message channel identifier and a message channel target pointer, wherein the message channel target pointer indicates a non-cacheable target location associated with the consumer element;
[0052] the producer element is configured to perform an atomic message store operation with respect to a block of message data targeting the consumer element, wherein the producer element specifies the message channel identifier and the block of message data; and
[0053] the interconnect circuitry is configured to convey the block of message data atomically to the non-cacheable target location associated with the consumer element.
[0054] The present techniques are based on an approach which, rather than seeking to address the problem of acquiring shared state, provides a communication channel which may be considered to be a shared resource. This approach makes use of the concept of an atomic store (that is, a store which is fully completed in a single atomic transaction) and the concept of non-coherence (which most common coherent buses implement), using conditional non-coherent atomic stores to implement primitive communication channels on which software queues and channels can be implemented more efficiently. In short, this enables the programmer to execute a store from registers on one side of a communications link and to have that data appear directly in the registers of a data consumer on that link, bypassing both the shared state upgrade problem and the parallel problem of acquiring a synchronization lock before sending data. A system privileged agent (e.g. part of the operating system or a hypervisor) defines the configuration of the components participating in the message channel functionality and provisions the ability of system software and user space to then make use of message channels. Once a channel has been established in this way, a producer element, i.e. one of the processing elements which has message data to be conveyed, performs an atomic message store operation, specifying a message channel identifier and the block of message data to be conveyed. The interconnect circuitry is configured to convey the block of message data atomically to the non-cacheable target location associated with the consumer element.
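The store semantics described in [0054] can be illustrated with a minimal Python model. It is a sketch under stated assumptions, not the disclosed hardware: the class and function names are hypothetical, the target location is assumed to hold at most one block at a time, and "conditional" is modelled as the store failing outright, rather than partially writing, when that location is occupied.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class TargetLocation:
    """Hypothetical model of a consumer's non-cacheable target location.

    Nothing here is cached, so there is no coherence state to acquire or
    upgrade before writing. The location holds at most one block at a time.
    """
    slot: Optional[Tuple[int, bytes]] = None

    def atomic_accept(self, channel_id: int, block: bytes) -> bool:
        # Conditional, all-or-nothing: either the whole block lands, or nothing does.
        if self.slot is not None:
            return False
        self.slot = (channel_id, bytes(block))
        return True


@dataclass
class MessageChannel:
    channel_id: int          # message channel identifier
    target: TargetLocation   # stands in for the message channel target pointer


def atomic_message_store(channel: MessageChannel, block: bytes) -> bool:
    """Producer side: name the channel and the block; success comes back."""
    return channel.target.atomic_accept(channel.channel_id, block)


consumer_port = TargetLocation()
chan = MessageChannel(channel_id=7, target=consumer_port)
print(atomic_message_store(chan, b"\x01\x02\x03\x04"))  # True
print(atomic_message_store(chan, b"\x05\x06"))          # False: slot still occupied
print(consumer_port.slot)                               # (7, b'\x01\x02\x03\x04')
```

The producer never reads or locks shared state; it only learns whether its single store succeeded.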
[0055] There are a variety of ways in which the message channel for communication between a producer element and a consumer element may be physically established. For example, the producer element and the consumer element may communicate via the interconnect circuitry without any other intermediate device. However, in some examples the data processing system further comprises:
[0056] a router device coupled to the interconnect circuitry and comprising an input port and an output port,
[0057] wherein the system privileged agent is configured to define the message channel for communication between the producer element and a consumer element by:
[0058] providing the producer element with a message channel router pointer indicative of the input port of the router device; and
[0059] storing message channel configuration data in the router device, the message channel configuration data comprising the message channel identifier and the message channel target pointer,
[0060] wherein the producer element is arranged to perform the atomic message store operation specifying the message channel router pointer,
[0061] and wherein the router device is responsive to reception of the block of message data at the input port to forward the block of message data from the output port to the non-cacheable target location indicated by the message channel target pointer.
[0062] Accordingly, the communication between the producer element and the consumer element in such examples is provided in (at least) two stages, whereby the block of message data is initially conveyed from the producer element to the input port of the router device (as indicated by the message channel router pointer) and then subsequently the block of message data is forwarded by the router device from its output port to the non-cacheable target location indicated by the message channel target pointer.
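The two-stage path through a router device can be sketched in a small hypothetical Python model. Here a dictionary stands in for the message channel configuration data written by the system privileged agent, and a bound method stands in for the message channel router pointer; both are illustrative assumptions rather than a described interface.

```python
from typing import Dict, List, Tuple


class Consumer:
    """Hypothetical consumer whose data reception port records delivered blocks."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.received: List[Tuple[int, bytes]] = []

    def deliver(self, channel_id: int, block: bytes) -> bool:
        self.received.append((channel_id, block))
        return True


class Router:
    """Stage 1: producer -> input port. Stage 2: output port -> consumer target."""

    def __init__(self) -> None:
        # Message channel configuration data written by the privileged agent:
        # channel identifier -> consumer (standing in for the target pointer).
        self.config: Dict[int, Consumer] = {}

    def configure(self, channel_id: int, target: Consumer) -> None:
        self.config[channel_id] = target

    def input_port(self, channel_id: int, block: bytes) -> bool:
        target = self.config.get(channel_id)
        if target is None:
            return False                          # unknown channel: nothing forwarded
        return target.deliver(channel_id, block)  # forward from the output port


# The privileged agent sets up channel 3 and hands the producer a router pointer.
router = Router()
consumer = Consumer("pe1")
router.configure(3, consumer)
router_pointer = router.input_port   # what the producer element is given

router_pointer(3, b"hello")          # the producer's atomic message store
print(consumer.received)             # [(3, b'hello')]
```

Because the producer only holds the router pointer, the privileged agent can retarget the channel by rewriting the router's configuration without touching the producer.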
[0063] It should be appreciated that the message channels which are established in the data processing system are not limited to providing one-to-one communication between a given producer element and a given consumer element. Instead, multiple processing elements may subscribe to a given message channel, such that a block of message data which is put into a message channel by a producer element may be provided to just one consumer element or to more than one consumer element. Thus, in some examples in which the data processing system further comprises a router device, the multiple processing elements comprise multiple consumer elements, more than one of which subscribe to the message channel, wherein the message channel is associated with multiple message channel target pointers, each of which indicates a non-cacheable target location associated with a respective consumer element of the multiple consumer elements,
[0064] and wherein the message channel configuration data stored in the router device comprises the message channel identifier and the multiple message channel target pointers. As such, depending on the configuration of the router device and the message channel, the block of message data may be forwarded to a selected one of the consumer elements which subscribe to the message channel, to a subset of them, or to all of them.
[0065] In some examples, the router device is responsive to reception of the block of message data at the input port to select a recipient consumer element from the more than one consumer elements which subscribe to the message channel and to forward the block of message data from the output port to the non-cacheable target location indicated by the message channel target pointer associated with the recipient consumer element.
[0066] Such a selection of the recipient consumer element may occur in a variety of ways, but in some examples the router device is configured to select the recipient consumer element in dependence on recipient ordering data, stored in the router device, for the consumer elements which subscribe to the message channel.
[0067] As mentioned, the block of message data may be distributed to more than one subscribing consumer element and thus in some examples the router device is further responsive to the reception of the block of message data at the input port to re-forward the block of message data from the output port to each of the more than one consumer elements which subscribe to the message channel.
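The following sketch contrasts selecting a single recipient ([0065] and [0066]) with re-forwarding to every subscriber ([0067]). The recipient ordering data is assumed, for illustration only, to be a round-robin cursor; the description leaves the ordering policy open, and the class and mode names are hypothetical.

```python
from typing import Dict, List


class MultiConsumerRouter:
    """Hypothetical router holding several target pointers per channel."""

    def __init__(self) -> None:
        self.targets: Dict[int, List[list]] = {}   # channel id -> subscriber inboxes
        self.cursor: Dict[int, int] = {}           # assumed recipient ordering data

    def subscribe(self, channel_id: int, inbox: list) -> None:
        self.targets.setdefault(channel_id, []).append(inbox)
        self.cursor.setdefault(channel_id, 0)

    def forward(self, channel_id: int, block: bytes, mode: str = "select") -> int:
        """Deliver a block; return how many consumers received it."""
        subs = self.targets.get(channel_id, [])
        if not subs:
            return 0
        if mode == "broadcast":
            for inbox in subs:                     # re-forward to every subscriber
                inbox.append(block)
            return len(subs)
        i = self.cursor[channel_id]                # select one recipient...
        subs[i].append(block)
        self.cursor[channel_id] = (i + 1) % len(subs)   # ...and advance the order
        return 1


a: list = []
b: list = []
r = MultiConsumerRouter()
r.subscribe(5, a)
r.subscribe(5, b)
r.forward(5, b"m1")                      # selected recipient: a
r.forward(5, b"m2")                      # next in order: b
r.forward(5, b"all", mode="broadcast")   # both
print(a, b)  # [b'm1', b'all'] [b'm2', b'all']
```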
[0068] Whilst the message channel functionality may be used to convey blocks of message data of a range of sizes, a message channel may also be used for communication without explicit data being sent. Thus in some examples the producer element is further configured to send a zero data message, wherein the zero data message specifies the message channel identifier and an identifier indicative of the zero data message data, and the router device is responsive to reception of a zero data message to forward the zero data message to one or more consumer elements which subscribe to the message channel. Hence, instead of specifying the message channel identifier and an explicit block of data to be conveyed, in such examples the producer element specifies the message channel identifier and an identifier indicative of the zero data message data. This data identifier, which may take a variety of forms (but generally need only be a short, unique identifier), has the particular meaning that no data is being transferred. Nonetheless, the same message signalling takes place. Such zero data messaging may then be used for a variety of signalling purposes.
[0069] In some examples, the multiple processing elements comprise multiple producer elements, more than one of which are configured to send the zero data message; the router device is configured to maintain a count of the number of producer elements from which the zero data message has been received, and is responsive to the count reaching a predetermined threshold value to forward the zero data message to one or more consumer elements which subscribe to the message channel. This mechanism can, for example, be used to allow multiple producer elements to synchronise with one consumer element.
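A zero data message combined with the producer count of [0069] behaves like a barrier. The sketch below assumes that the count tracks distinct producer elements and that the router resets the count after forwarding; both are assumptions, as is the sentinel object used for the zero data identifier and the class name.

```python
from typing import Callable, List, Set

ZERO_DATA = object()  # hypothetical identifier meaning "no data is transferred"


class BarrierChannel:
    """Router-side counting of zero data messages on one channel (a sketch)."""

    def __init__(self, threshold: int,
                 subscribers: List[Callable[[object], None]]) -> None:
        self.threshold = threshold        # the predetermined threshold value
        self.subscribers = subscribers    # consumers subscribed to the channel
        self.seen: Set[str] = set()       # producers heard from in this round

    def receive(self, producer: str, payload: object) -> bool:
        """Return True when this message releases the waiting consumer(s)."""
        if payload is not ZERO_DATA:
            raise ValueError("this sketch only models zero data messages")
        self.seen.add(producer)
        if len(self.seen) < self.threshold:
            return False
        for notify in self.subscribers:
            notify(ZERO_DATA)             # forward the zero data message
        self.seen.clear()                 # re-arm for the next round (assumption)
        return True


released: list = []
bar = BarrierChannel(threshold=3, subscribers=[released.append])
print([bar.receive(p, ZERO_DATA) for p in ("pe0", "pe1", "pe2")])  # [False, False, True]
print(len(released))  # 1
```

The consumer sees a single zero data message only once all three producers have checked in, which is the synchronisation point described above.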
[0070] The passing of messages via the message channels from a producer element to a consumer element via a router may also involve acknowledgments being returned. In some examples, the consumer element is responsive to reception of the block of message data at the non-cacheable target location indicated by the message channel target pointer to return a success indicator to the router device, wherein the success indicator indicates whether or not the block of message data has been successfully received by the consumer element, and the router device is responsive to reception of the success indicator to forward the success indicator to the producer element.
[0071] The router device itself may also react to acknowledgments, in particular to a negative acknowledgment (i.e. a NACK indicating that a message has not been received by the consumer element). In some examples, the router device is responsive to the reception of the success indicator from the consumer element, when the success indicator indicates that the block of message data has not been successfully received by the consumer element, to retry forwarding the block of message data.
[0072] Alternatively, or in addition, the router device may try a different destination for a given block of message data. In some examples the multiple processing elements comprise multiple consumer elements, more than one of which subscribe to the message channel, and the router device is responsive to the reception of the success indicator from the consumer element, when the success indicator indicates that the block of message data has not been successfully received by the consumer element, to retry forwarding the block of message data to another consumer element which subscribes to the message channel.
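The acknowledgment handling of [0070] to [0072] can be sketched as a router forwarding policy: retry on a NACK, then fall back to another subscriber. The retry limit per consumer and the order in which alternative subscribers are tried are hypothetical parameters; the description does not specify them.

```python
from typing import Callable, Dict, List, Optional

ConsumerFn = Callable[[bytes], bool]   # returns the success indicator: True = ACK


def forward_with_retry(block: bytes, consumers: List[ConsumerFn],
                       retries_per_consumer: int = 2) -> Optional[int]:
    """Hypothetical router policy: retry on NACK, then move to another subscriber.

    Returns the index of the consumer that acknowledged the block, or None, in
    which case the router would pass a failure indication back to the producer.
    """
    for idx, deliver in enumerate(consumers):
        for _ in range(retries_per_consumer):
            if deliver(block):
                return idx
    return None


def always_busy(block: bytes) -> bool:
    return False                       # always NACKs


def make_flaky(fail_times: int) -> ConsumerFn:
    """A consumer that NACKs `fail_times` deliveries before accepting one."""
    state: Dict[str, int] = {"left": fail_times}

    def deliver(block: bytes) -> bool:
        if state["left"] > 0:
            state["left"] -= 1
            return False
        return True
    return deliver


print(forward_with_retry(b"msg", [always_busy, make_flaky(1)]))  # 1
print(forward_with_retry(b"msg", [always_busy]))                 # None
```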
[0073] In some examples, the router device may itself generate message failure indications. For example, the router device is responsive to the reception of the block of message data at the input port, when no consumer element is available for the message channel, to return a message failure indication to the producer element.
[0074] The router device may determine that no consumer element is available in a variety of ways, but in some examples the router device is configured to maintain consumer element capacity data indicative of a capacity of the consumer element to receive message data, and when a block of message data is received at the input port and the consumer element capacity data indicates that the consumer element does not have capacity to receive it, the router device is arranged to return a message failure indication to the producer element.
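One possible form of the consumer element capacity data of [0074] is a credit count per consumer, as in the hypothetical sketch below. The credit-return call models the consumer freeing space and is an assumption; the description does not say how capacity information reaches the router. An unknown consumer is treated as having no capacity, which covers the "no consumer element is available" case of [0073].

```python
from typing import Dict, List, Tuple


class CreditRouter:
    """Sketch: consumer element capacity data kept as per-consumer credits."""

    def __init__(self, capacity: Dict[str, int]) -> None:
        self.credits = dict(capacity)                  # free entries per consumer
        self.delivered: List[Tuple[str, bytes]] = []

    def on_input(self, consumer: str, block: bytes) -> str:
        if self.credits.get(consumer, 0) == 0:
            return "FAIL"                  # message failure indication to producer
        self.credits[consumer] -= 1        # forwarding consumes one credit
        self.delivered.append((consumer, block))
        return "OK"

    def on_credit_return(self, consumer: str) -> None:
        # Assumed signal that the consumer has drained a block from its buffer.
        self.credits[consumer] += 1


r = CreditRouter({"pe1": 1})
print(r.on_input("pe1", b"a"))  # OK
print(r.on_input("pe1", b"b"))  # FAIL: pe1 has no free capacity
r.on_credit_return("pe1")
print(r.on_input("pe1", b"b"))  # OK
```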
[0075] In some examples, the router device comprises a message block buffer configured to store multiple blocks of message data, wherein the router device is configured to forward the multiple blocks of message data respecting an ordering defined by the message block buffer. In some examples the ordering is a first-in-first-out ordering.
[0076] In some examples the router device comprises a message block buffer configured to buffer a received block of message data, and the router device is configured, when the message block buffer is not available to buffer the block of message data but will be available to buffer it after a known processing step, to return a fail-but-retry message to the producer element indicating that the block of message data will be receivable after the known processing step. For example, the producer element could receive a retry indicator which indicates that a buffer will be reserved for a follow-on retry. This is possible when the router device has no available buffer space in the cycle in which the message is received but, owing to the deterministic way in which the router device handles message blocks (and clears buffer entries), can predict that buffer space will be available on the next cycle.
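The first-in-first-out buffer of [0075] and the fail-but-retry behaviour of [0076] can be sketched together. This sketch assumes a router that forwards exactly one buffered block per cycle, so a full buffer is guaranteed to have a free entry on the next cycle; the return codes, the `tick()` method and the limit of one outstanding reservation are illustrative choices, not details from the description.

```python
from collections import deque
from typing import Deque, Optional, Tuple


class BufferedRouter:
    """Hypothetical router with a FIFO message block buffer and retry reservations."""

    def __init__(self, depth: int) -> None:
        self.depth = depth
        self.fifo: Deque[Tuple[str, bytes]] = deque()
        self.reserved_for: Optional[str] = None   # producer holding a reserved entry

    def receive(self, producer: str, block: bytes) -> str:
        held = 1 if self.reserved_for is not None else 0
        if self.reserved_for == producer:
            if len(self.fifo) >= self.depth:
                return "RETRY"                 # retried before the drain; still reserved
            self.reserved_for = None
            self.fifo.append((producer, block))
            return "OK"
        if len(self.fifo) + held < self.depth:
            self.fifo.append((producer, block))
            return "OK"
        if held == 0 and self.fifo:
            # Full now, but the next tick() is known to free an entry: promise it.
            self.reserved_for = producer
            return "RETRY"                     # fail-but-retry
        return "FAIL"

    def tick(self) -> Optional[Tuple[str, bytes]]:
        """One cycle: forward the oldest buffered block (first-in-first-out)."""
        return self.fifo.popleft() if self.fifo else None


r = BufferedRouter(depth=1)
print(r.receive("p0", b"a"))  # OK
print(r.receive("p1", b"b"))  # RETRY: an entry is reserved for p1
print(r.receive("p2", b"c"))  # FAIL
print(r.tick())               # ('p0', b'a')
print(r.receive("p1", b"b"))  # OK
```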
[0077] In some examples the data processing system comprises:
[0078] a plurality of router devices coupled to the interconnect circuitry and each comprising an input port and an output port,
[0079] wherein the system privileged agent is configured to define the message channel for communication between the producer element and a consumer element by concatenating the plurality of router devices, such that:
[0080] the message channel router pointer specified by the producer element specifies a first router device of the plurality of router devices,
[0081] and the message channel configuration data stored in each of the plurality of router devices links the plurality of router devices in sequence,
[0082] such that each router device of the plurality of router devices other than a last concatenated router device is responsive to reception of the block of message data at the input port to forward the block of message data from the output port to a next router device of the plurality of router devices, and the last concatenated router device is responsive to reception of the block of message data at the input port to forward the block of message data from the output port to the non-cacheable target location indicated by the message channel target pointer.
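The concatenated routers of [0077] to [0082] amount to per-channel next-hop entries. In this hypothetical sketch each router's configuration maps a channel identifier either to the next router or, at the last router, to the consumer's target location (modelled as a list); the `trace` argument exists only to make the path visible.

```python
from typing import Dict, List, Union


class ChainRouter:
    """Hypothetical router whose per-channel entry names the next hop."""

    def __init__(self, name: str) -> None:
        self.name = name
        # channel id -> next router in the chain, or (at the last router)
        # the consumer's target location, modelled here as a list.
        self.next_hop: Dict[int, Union["ChainRouter", list]] = {}

    def input_port(self, channel_id: int, block: bytes, trace: List[str]) -> None:
        trace.append(self.name)                 # record the path for illustration
        hop = self.next_hop[channel_id]
        if isinstance(hop, ChainRouter):
            hop.input_port(channel_id, block, trace)  # forward to the next router
        else:
            hop.append(block)                   # last router: deliver to the target


r1, r2, r3 = ChainRouter("r1"), ChainRouter("r2"), ChainRouter("r3")
inbox: list = []
# The privileged agent links the three routers in sequence for channel 4.
r1.next_hop[4] = r2
r2.next_hop[4] = r3
r3.next_hop[4] = inbox
trace: List[str] = []
r1.input_port(4, b"payload", trace)             # the producer's router pointer names r1
print(trace, inbox)  # ['r1', 'r2', 'r3'] [b'payload']
```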
[0083] In some examples the router device further comprises an auxiliary interface providing a control access to the router device, wherein control signals received at the auxiliary interface provide at least partial control of operation of the router device.
[0084] In some examples the router device further comprises an auxiliary interface providing a control access to the router device, wherein control signals received at the auxiliary interface provide at least partial control of operation of the router device, wherein the at least partial control of operation of the router device comprises the selection of the recipient consumer element from the more than one consumer elements which subscribe to the message channel.
[0085] In some examples the router device further comprises a work queue buffer arranged to buffer multiple blocks of message data, wherein the multiple blocks of message data comprise task definitions of tasks to be carried out by the consumer element,
[0086] and wherein the control signals received at the auxiliary interface control scheduling of the tasks to be carried out by the consumer element by selection from the multiple blocks of message data buffered in the work queue buffer.
[0087] In some examples the work queue buffer comprises multiple work queues, each of which has an associated priority level relative to the others, and the control signals received at the auxiliary interface control scheduling of the tasks to be carried out by the consumer element respecting the relative priority levels of the multiple work queues.
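The work queue scheduling of [0085] to [0087] can be sketched with one FIFO per priority level. The auxiliary interface is modelled as a `schedule()` call from an external controller, and strict priority with lower numbers served first is an assumed policy; the description only requires that the relative priorities be respected.

```python
from collections import deque
from typing import Deque, Dict, Iterable, Optional


class WorkQueueRouter:
    """Sketch: task-definition blocks buffered in per-priority work queues."""

    def __init__(self, priorities: Iterable[int]) -> None:
        # Lower number = higher priority (an assumption of this sketch).
        self.queues: Dict[int, Deque[bytes]] = {p: deque() for p in sorted(priorities)}

    def enqueue(self, priority: int, task_definition: bytes) -> None:
        self.queues[priority].append(task_definition)

    def schedule(self) -> Optional[bytes]:
        """Auxiliary-interface control: pick the next task for the consumer."""
        for prio in sorted(self.queues):
            if self.queues[prio]:
                return self.queues[prio].popleft()
        return None


wq = WorkQueueRouter(priorities=[0, 1])
wq.enqueue(1, b"background-A")
wq.enqueue(0, b"urgent-B")
wq.enqueue(1, b"background-C")
print([wq.schedule() for _ in range(4)])
# [b'urgent-B', b'background-A', b'background-C', None]
```

Strict priority can starve low-priority queues; a controller driving the auxiliary interface could equally apply weighted or aged selection.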
[0088] As mentioned above, zero data messages sent using message channels may be used for a variety of signalling purposes. In some examples the producer element is further configured to send a zero data message, wherein the zero data message specifies the message channel identifier and an identifier indicative of the zero data message data, wherein the multiple processing elements comprise multiple producer elements, and wherein more than one of the producer elements are configured to send the zero data message,
[0089] and the router device further comprises producer element lock tracking storage and is responsive to reception of the zero data message from a lock-seeking producer element to store an indication of the lock-seeking producer element in the producer element lock tracking storage,
[0090] wherein the producer element lock tracking storage also stores a lock status indication indicative of whether a lock target is currently allocated to one of the multiple producer elements,
[0091] wherein when the lock status indication is not set, the lock target is allocated to the lock-seeking producer element and the lock status indication is set,
[0092] and when the lock status indication is set, the indication of the lock-seeking producer element is queued up in the producer element lock tracking storage.
[0093] The producer element lock tracking storage may be provided in a variety of ways, but in some examples the producer element lock tracking storage is configured as a shift register, wherein storing the indication of the lock-seeking producer element in the producer element lock tracking storage comprises shifting the indication of the lock-seeking producer element into the shift register, [0094] wherein when the lock status indication is set and the lock-allocated producer element to which the lock target is currently allocated sends the zero data message again, the router device is configured to pop an indication of the lock-allocated producer element from the shift register, [0095] and wherein when popping the indication of the lock-allocated producer element from the shift register reveals an indication of a further lock-seeking producer element, the router device is configured to send the zero data message to the further lock-seeking producer element indicating that the lock target is now allocated to the further lock-seeking producer element.
[0096] As mentioned above, the data processing system may be provided in a configuration with one or more router devices, but may also be provided in a router-less configuration. In some examples, the system privileged agent is configured to define a router-less message channel for communication between the producer element and a consumer element by providing the producer element with the message channel target pointer indicating the non-cacheable target location associated with the consumer element, [0097] wherein the producer element is arranged to perform the atomic message store operation specifying the message channel target pointer.
[0098] The consumer element may be configured to make use of a block of message data received at the non-cacheable target location in various ways, but in some examples the consumer element comprises a holding buffer accessible to user software executing on the consumer element, [0099] wherein the non-cacheable target location associated with the consumer element is configured as a data reception port of the consumer element, [0100] and wherein the data reception port is configured to forward the block of message data received atomically to the holding buffer.
[0101] The holding buffer may take a variety of forms but in some examples the holding buffer comprises at least one of: [0102] a set of system registers; [0103] vector registers; and [0104] user software addressable memory buffer.
[0105] In some examples the holding buffer is sub-divided into a plurality of sub-buffers, wherein each sub-buffer of the plurality of sub-buffers is allocated to a corresponding message channel to which the consumer element is subscribed.
[0106] In some examples the consumer element is configured to reserve at least a portion of the holding buffer for at least one prioritised message channel to which the consumer element is subscribed.
[0107] In some examples, the consumer element is responsive to an attempt to deliver the block of message data at the data reception port, to return a success indicator, wherein the success indicator indicates whether or not the holding buffer currently has capacity to receive the block of message data.
[0108] In some examples the user software executing on the consumer element is configured to test whether the holding buffer currently holds a user software targeted block of message data on a message channel to which the user software is subscribed.
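The holding-buffer behaviour described in paragraphs [0105] to [0108] can be illustrated with a small software model. This is a minimal C++17 sketch under stated assumptions, not a hardware design: the class `HoldingBuffer`, the 64-byte `MessageBlock` type, and the split into per-channel reserved slots plus a shared overflow pool are all hypothetical choices made for illustration. `deliver` plays the role of the data reception port and its return value stands in for the success indicator; `try_take` stands in for the user-software test for a pending block.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <map>
#include <optional>

// Hypothetical 64-byte block, matching one 64B message store.
using MessageBlock = std::array<std::uint8_t, 64>;

// Hypothetical model: each subscribed channel has its own sub-buffer with
// `reserved` dedicated slots and may overflow into a shared pool of slots.
class HoldingBuffer {
public:
    explicit HoldingBuffer(std::size_t total_slots) : free_slots_(total_slots) {}

    // Allocate a sub-buffer for a subscribed channel; a prioritised channel
    // can be given reserved slots that other channels can never consume.
    bool subscribe(std::uint32_t channel_id, std::size_t reserved) {
        if (reserved > free_slots_) return false;
        free_slots_ -= reserved;
        subs_[channel_id] = SubBuffer{reserved, {}};
        return true;
    }

    // Data reception port: the return value is the success indicator, i.e.
    // whether the holding buffer had capacity for this block.
    bool deliver(std::uint32_t channel_id, const MessageBlock& block) {
        auto it = subs_.find(channel_id);
        if (it == subs_.end()) return false;            // not subscribed
        SubBuffer& sub = it->second;
        if (sub.queue.size() < sub.reserved) {          // use a reserved slot
            sub.queue.push_back(block);
            return true;
        }
        if (free_slots_ == 0) return false;             // shared pool exhausted
        --free_slots_;
        sub.queue.push_back(block);
        return true;
    }

    // User software test-and-take: returns a block if one is pending.
    std::optional<MessageBlock> try_take(std::uint32_t channel_id) {
        auto it = subs_.find(channel_id);
        if (it == subs_.end() || it->second.queue.empty()) return std::nullopt;
        SubBuffer& sub = it->second;
        MessageBlock block = sub.queue.front();
        sub.queue.pop_front();
        // If the popped block occupied a shared slot, return it to the pool.
        if (sub.queue.size() >= sub.reserved) ++free_slots_;
        return block;
    }

private:
    struct SubBuffer {
        std::size_t reserved;
        std::deque<MessageBlock> queue;
    };
    std::size_t free_slots_;
    std::map<std::uint32_t, SubBuffer> subs_;
};
```

A sender that receives `false` from `deliver` would have to retry later, which is how the success indicator lets the holding buffer apply back-pressure in this model.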
[0109] In some examples the consumer element is configured to support execution of multiple tasks on the consumer element, wherein each task has an individual set of consumer element state and the consumer element is configured to switch to a corresponding individual set of consumer element state when switching to a current task of the multiple tasks.
[0110] In some examples the consumer element is responsive to an attempt to deliver the block of message data at the data reception port, to receive or reject the block of message data in dependence on whether the current task is subscribed to the message channel for the block of message data.
[0111] In some examples the data reception port is responsive to an attempt to deliver the block of message data at the data reception port, when the current task is not subscribed to the message channel for the block of message data, to generate an interrupt signal for the consumer element.
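The per-task subscription check of paragraphs [0110] and [0111] can be sketched in software as below. `DataReceptionPort`, `Task` and `Delivery` are hypothetical names chosen for illustration: a block is accepted only if the currently executing task is subscribed to its channel; otherwise it is rejected and, in the interrupting variant, an interrupt is flagged for the consumer element.

```cpp
#include <cassert>
#include <cstdint>
#include <set>

// Hypothetical per-task state: the message channels this task subscribes to.
struct Task {
    std::set<std::uint32_t> subscribed_channels;
};

enum class Delivery { Accepted, Rejected };

class DataReceptionPort {
public:
    explicit DataReceptionPort(bool interrupt_on_unsubscribed)
        : interrupt_on_unsubscribed_(interrupt_on_unsubscribed) {}

    // On a task switch the consumer installs the new task's state.
    void switch_to(const Task* task) { current_ = task; }

    // Accept the block only if the current task subscribes to its channel;
    // otherwise reject it and, if configured, raise an interrupt.
    Delivery deliver(std::uint32_t channel_id) {
        if (current_ != nullptr && current_->subscribed_channels.count(channel_id) != 0) {
            return Delivery::Accepted;
        }
        if (interrupt_on_unsubscribed_) interrupt_pending_ = true;
        return Delivery::Rejected;
    }

    bool interrupt_pending() const { return interrupt_pending_; }
    void clear_interrupt() { interrupt_pending_ = false; }

private:
    const Task* current_ = nullptr;
    bool interrupt_on_unsubscribed_;
    bool interrupt_pending_ = false;
};
```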
[0112] In some examples the consumer element is configured to reference memory locations using virtual addresses and comprises address translation circuitry to perform address translation of the virtual addresses into physical addresses, [0113] wherein the consumer element is configured to map a virtual address associated with the message channel to a physical address associated with the holding buffer, [0114] and wherein the consumer element is configured to access the message channel by execution of a load instruction specifying the virtual address.
[0115] Moreover, the virtualisation approach can also be extended to the message channel identifiers to renumber the message channel identifiers to an enumeration used by the particular consumer software which is currently executing. Accordingly, in some examples the consumer element comprises message channel handling circuitry comprising the non-cacheable target location, wherein the message channel handling circuitry is configured to reference message channels using message channel identifiers, [0116] wherein user software executing on the consumer element is configured to reference the message channel using a virtual message channel identifier, [0117] and the consumer element comprises message channel identifier translation circuitry configured to translate virtual message channel identifiers to message channel identifiers in dependence on user software currently executing on the consumer element.
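The message channel identifier translation of paragraphs [0115] to [0117] behaves much like a small per-context lookup table. The following C++17 sketch assumes, purely for illustration, that translation is keyed by a software context number and a virtual channel ID; `ChannelIdTranslator` is a hypothetical name, and real translation circuitry could organise the table differently.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <optional>
#include <utility>

// Hypothetical translation table: (software context, virtual message channel
// ID) -> system-wide message channel ID used by the channel handling circuitry.
class ChannelIdTranslator {
public:
    void map(std::uint32_t context, std::uint32_t virtual_id, std::uint32_t channel_id) {
        table_[{context, virtual_id}] = channel_id;
    }

    // Selected according to the user software currently executing.
    void set_current_context(std::uint32_t context) { current_ = context; }

    // Returns no value when the current software has no such channel.
    std::optional<std::uint32_t> translate(std::uint32_t virtual_id) const {
        auto it = table_.find({current_, virtual_id});
        if (it == table_.end()) return std::nullopt;
        return it->second;
    }

private:
    std::map<std::pair<std::uint32_t, std::uint32_t>, std::uint32_t> table_;
    std::uint32_t current_ = 0;
};
```

With this arrangement two programs can both call their first channel "0" while being routed to different system-wide channel identifiers.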
[0118] The blocks of message data are not limited in terms of their semantic content and therefore the message channels may be put to a great variety of uses in supporting communication between processing elements in the system. However, in some examples the consumer element is configured to receive task definitions via the message channel and the block of message data provides at least a part of a task definition for the consumer element. Thus a message channel may be used by a producer element to delegate processing tasks to a consumer element.
[0119] The provision of task data in this manner may be reacted to in various ways, but in some examples the consumer element is configured, when a currently executing task relinquishes use of the consumer element, and when a block of message data providing at least a part of a task definition for the consumer element has been received on the message channel, to switch to performing a new task defined by the task definition.
[0120] Various configurations may be supported for administering the manner in which the consumer element responds to received task definitions, in particular how the consumer element prioritises a received task definition against other tasks it is carrying out. In some examples the consumer element is configured, when a block of message data providing at least a part of a task definition for the consumer element is received on the message channel, to pause execution of a currently executing task and to switch to performing a new task defined by the task definition.
[0121] The virtualisation approach may also be applied at the producer element side and accordingly, in some examples the producer element is configured to reference memory locations using virtual addresses and comprises address translation circuitry to perform address translation of the virtual addresses into physical addresses, [0122] wherein the producer element is configured to map a virtual address associated with the message channel to a physical address associated with the message channel identifier, [0123] and wherein the producer element is configured to access the message channel by execution of a store instruction specifying the virtual address. This then provides an approach to the use of the message channel for the producer element which is advantageously integrated with its approach to interacting with the memory system, whereby pushing a message into a given message channel can be achieved by a store operation to a specified virtual address (which has been mapped to the message channel).
[0124] A variety of message store operations may be supported, but in some examples the execution of the store instruction comprises retrieval of the block of message data from a set of registers and storing the block of message data to the physical address associated with the message channel identifier.
[0125] In some examples the system privileged agent comprises at least one of: [0126] an operating system; and [0127] a hypervisor.
[0128] In order to facilitate the use of the message channel functionality, the system privileged agent can provide a range of system calls that may be made by the processing elements in the system. In some examples the system privileged agent is responsive to a message channel setup call for the message channel from a processing element of the multiple processing elements to: [0129] allocate the message channel identifier for the message channel; [0130] specify the message channel target pointer; [0131] wherein the processing element uses virtual addresses to reference memory locations, and allocate a virtual address for the processing element to use for the message channel, wherein the virtual address maps to a physical address given by the message channel target pointer.
[0132] Establishment of such system call possibilities may also incorporate the use of one or more router devices in the supporting infrastructure for the message channels and hence in some examples the system privileged agent is responsive to a message channel setup call for the message channel from a processing element of the multiple processing elements to: [0133] allocate the message channel identifier for the message channel; [0134] specify the message channel target pointer; [0135] specify the message channel router pointer; [0136] wherein the processing element uses virtual addresses to reference memory locations, and allocate a virtual address for the processing element to use for the message channel, wherein the virtual address maps to a physical address given by the message channel router pointer.
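The two setup-call variants in paragraphs [0128] to [0136] differ mainly in what the returned virtual address maps to. The sketch below models that difference in software. `PrivilegedAgent`, `ChannelSetup`, the 4 KiB page size and the starting VA are assumptions made for illustration; the page table and router configuration are plain maps standing in for the real structures.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <optional>

// Result of a hypothetical message channel setup call.
struct ChannelSetup {
    std::uint32_t channel_id;                // message channel identifier
    std::uint64_t target_pa;                 // message channel target pointer
    std::optional<std::uint64_t> router_pa;  // message channel router pointer (routed case)
    std::uint64_t va;                        // VA returned to the calling PE
};

class PrivilegedAgent {
public:
    ChannelSetup setup_channel(std::uint64_t consumer_port_pa,
                               std::optional<std::uint64_t> router_port_pa) {
        ChannelSetup s{next_id_++, consumer_port_pa, router_port_pa, next_va_};
        next_va_ += kPageSize;
        if (router_port_pa) {
            // Routed: the router stores (channel ID -> target pointer) and the
            // caller's VA maps to the router's input port.
            router_config_[s.channel_id] = consumer_port_pa;
            page_table_[s.va] = *router_port_pa;
        } else {
            // Router-less: the caller's VA maps straight to the consumer's port.
            page_table_[s.va] = consumer_port_pa;
        }
        return s;
    }

    std::uint64_t translate(std::uint64_t va) const { return page_table_.at(va); }
    std::uint64_t router_target(std::uint32_t id) const { return router_config_.at(id); }

private:
    static constexpr std::uint64_t kPageSize = 4096;
    std::uint32_t next_id_ = 1;
    std::uint64_t next_va_ = 0x7f00'0000'0000;
    std::map<std::uint64_t, std::uint64_t> page_table_;
    std::map<std::uint32_t, std::uint64_t> router_config_;
};
```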
[0137] In some examples at least some of the physical address mappings employed may enable the physical address to encode the message channel identifier and hence simplify matching. Hence in some examples the system privileged agent is configured to define a virtual-to-physical address mapping scheme between a virtual address space and a physical address space in which a subset of bits of the physical address space are directly indicative of a set of message channel identifiers defined by the system privileged agent.
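As a concrete but hypothetical instance of such a mapping scheme, suppose message channels occupy one 4 KiB page each inside a fixed physical window, with the channel ID held in physical address bits [19:12]. The window base, bit positions and 8-bit ID width below are illustrative assumptions, not values from the disclosure; the point is that the channel ID can then be recovered with a shift and a mask rather than a table lookup.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical layout: channel window at kWindowBase, one 4 KiB page per
// channel, channel ID held in PA bits [19:12] (8 bits -> 256 channels).
constexpr std::uint64_t kWindowBase = 0x4000'0000'0000;
constexpr unsigned kIdShift = 12;
constexpr std::uint64_t kIdMask = 0xFF;

// Physical address of the page assigned to a channel.
constexpr std::uint64_t channel_pa(std::uint32_t channel_id) {
    return kWindowBase | ((static_cast<std::uint64_t>(channel_id) & kIdMask) << kIdShift);
}

// True if the PA falls inside the message channel window.
constexpr bool in_window(std::uint64_t pa) {
    return (pa & ~((kIdMask << kIdShift) | 0xFFF)) == kWindowBase;
}

// Recover the channel ID directly from the PA bits.
constexpr std::uint32_t channel_id_from_pa(std::uint64_t pa) {
    return static_cast<std::uint32_t>((pa >> kIdShift) & kIdMask);
}
```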
[0138] Context switching may be supported on at least one processing element in the system and the present techniques further propose that the processing element state which is switched in and out on a context switch also comprises at least one message channel identifier and corresponding message channel target pointer. Hence in some examples at least one processing element of the multiple processing elements is configured to support execution of multiple tasks on the processing element, wherein each task has an individual set of processing element state, [0139] wherein the system privileged agent is configured to administer time-sliced use of the processing element by causing an exchange of the individual set of processing element state and by modifying at least one of the message channel identifier and the message channel target pointer.
[0140] When multiple router devices are present the start-up process for the system may support a process administered by the system privileged agent defining the system configuration which takes the positioning of the router devices in the structure of the system into account when allocating message channels to router devices. Hence in some examples the data processing system comprises: [0141] a plurality of router devices coupled to the interconnect circuitry, [0142] wherein at system start-up the system privileged agent is configured to map each of the plurality of router devices into a physical address space, [0143] and a proximity table is constructed comprising information indicative of a predefined cost function related to communication between each processing element of the multiple processing elements and each router device of the plurality of router devices, [0144] wherein the system privileged agent is configured to define the message channel for communication between the producer element and a consumer element by: [0145] selecting the router device associated with the message channel in dependence on the information comprised in the proximity table.
[0146] The allocation of router devices may make use of the information held in the proximity table in a variety of ways, but in some examples the router device is further selected to minimise a predefined cost function related to communication between the router device and the consumer element. [0147] In some examples the predefined cost function is a measure of at least one of: relative distance between the router device and the consumer element; [0148] bandwidth between the router device and the consumer element; and/or signalling latency between the router device and the consumer element.
[0149] Other factors may also be taken into account and in some examples the router device is further selected in dependence on a relative priority of the message channel.
[0150] The proximity table may also be used and updated in a dynamic fashion, such as when a router device is added to or removed from the data processing system when it is operational. Hence in some examples the system privileged agent is responsive to addition of a new router device to the data processing system whilst the data processing system is operating to re-construct the proximity table to incorporate the new router device.
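Router selection from the proximity table can be sketched as a minimum-cost search over one row of a PE-by-router matrix. In this hypothetical C++ sketch the cost values are supplied by the caller (they could encode distance, bandwidth or latency), and relative channel priority is modelled, as one possible policy among many, by reserving some routers for high-priority channels. `add_router` simply appends a column for a hot-plugged router; a real system would re-construct the table from topology information.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// Rows: processing elements; columns: routers. Each entry is the value of a
// predefined cost function between that PE and that router.
using ProximityTable = std::vector<std::vector<double>>;

// Pick the lowest-cost router for the consumer element. Low-priority channels
// skip routers reserved for high-priority channels (hypothetical policy).
std::size_t select_router(const ProximityTable& table, std::size_t consumer_pe,
                          bool high_priority,
                          const std::vector<bool>& reserved_for_priority) {
    const std::vector<double>& row = table.at(consumer_pe);
    std::size_t best = row.size();
    double best_cost = std::numeric_limits<double>::infinity();
    for (std::size_t r = 0; r < row.size(); ++r) {
        if (!high_priority && reserved_for_priority[r]) continue;
        if (row[r] < best_cost) {
            best_cost = row[r];
            best = r;
        }
    }
    assert(best < row.size() && "no eligible router");
    return best;
}

// Hot-plug: extend every PE's row with its cost to the new router.
void add_router(ProximityTable& table, const std::vector<double>& costs_per_pe) {
    assert(costs_per_pe.size() == table.size());
    for (std::size_t pe = 0; pe < table.size(); ++pe) {
        table[pe].push_back(costs_per_pe[pe]);
    }
}
```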
[0151] In accordance with one example configuration there is a method of operating a data processing system comprising: [0152] defining a configuration of the data processing system by a system privileged agent; [0153] performing data processing in multiple processing elements, wherein the multiple processing elements comprise a producer element and a consumer element; [0154] coupling the multiple processing elements with one another via interconnect circuitry; [0155] defining a message channel for communication between the producer element and a consumer element, the message channel being defined by a message channel identifier and a message channel target pointer, wherein the message channel target pointer indicates a non-cacheable target location associated with the consumer element; [0156] performing an atomic message store operation by the producer element with respect to a block of message data targeting the consumer element, wherein the producer element specifies the message channel identifier and the block of message data; and [0157] conveying the block of message data atomically to the non-cacheable target location associated with the consumer element via the interconnect circuitry.
[0158] Particular embodiments will now be described with reference to the figures.
[0160] Before proceeding further with the description of various specific examples of the present techniques, the following table sets out a number of definitions of terms which may be used in the description of those examples:
TABLE-US-00001 (Term: Definition)
Message Store Operation: Conditional store of a message to a target; a single store operation which is atomic. In one example using the Arm ISA, the st64bXX family of instructions can be used. The message store is also non-coherent. This should not imply that the message store operation must be 64B; additional instructions can be provided to incorporate smaller message store operations, such as 32B, 16B, etc., that would use fewer source architected registers and potentially smaller bus transactions.
Message Channel Identifier: Each message channel identifier is a unique name for a given message channel instance. This can be thought of as a single message channel object to which producers and consumers subscribe.
MCP: Message Channel Port. Each message channel port may be virtualized so that, depending on the instance in time, it can be assigned (potentially via indirection) to a specific message channel identifier.
MCU: Message Channel Unit, containing one or more MCPs (the count of contained MCPs is >= 1).
ACPI: Advanced Configuration & Power Interface.
ACPI SLIT: System Locality Information Table (as specified by ACPI).
ACPI SRAT: System Resource Affinity Table (as specified by ACPI).
ACPI HMAT: Heterogeneous Memory Attribute Table (as specified by ACPI).
NUMA: Non-Uniform Memory Architecture.
NUCA: Non-Uniform Cache Architecture.
Push operation: The act of sending/releasing data to a channel. Equivalent to a store plus the semantic of releasing the stored data to a downstream agent (e.g., transferring ownership).
Pop operation: The act of pulling/receiving data from a channel. Equivalent to a load plus the semantic of receiving data from an upstream agent (e.g., transferring ownership).
Virtual address: Virtual representation of memory space seen by the application layer (e.g., EL0).
Physical address: Physical representation of memory space; this address corresponds to a specific device or set of devices.
PE (processing element): A PE could be a general purpose core (e.g., an Arm A-class, R-class or M-class core) which contains a program counter address and is capable of loading instructions provided in a specified instruction set architecture. A PE could also be an accelerator, device, or other compatible processing element of less programmable capability, such as a GPU, DMA, NIC, NPU, or another known device. Here, targets could also be compatible storage devices communicating over protocols such as NVMe that are responsive to message store operations.
[0162] The
[0163] Accordingly, when a user-level software agent (executing on a processing element) wants to use a channel, it will instantiate a queue from a software framework (e.g. Boost Lock-free Queue, Intel's Threading Building Blocks, etc.). In code this could be specified as follows:
TABLE-US-00002
```cpp
#include <boost/lockfree/queue.hpp>
#include <cstdint>
// Stack-allocated queue; "queue( )" with empty parentheses would declare a
// function, so an initial capacity (128, illustrative only) is passed instead.
boost::lockfree::queue< std::int64_t > queue( 128 );
```
[0164] where the queue object is created (in this example) on the stack and the queue constructor is called on this stack-based memory location. When the constructor is called it then calls into the OS/hypervisor to obtain a message channel identifier. This message channel identifier is a unique (but virtualizable) identifier that enables differentiation of a single channel within the system. Further extensions of this can include using an ACPI SLIT (system locality information table), a SRAT, or another table, e.g., HMAT, to specify locality and topology, so that the system can treat each channel as a NUMA resource (with varying locality to each PE, i.e., each physical channel could have an associated proximity domain).
[0165] The queue software constructor could look something like this (in abstract code format):
TABLE-US-00003
```cpp
#include <cstdint>
#include <cstdio>      // std::snprintf
#include <fcntl.h>     // open
#include <sys/mman.h>  // mmap, PROT_*, MAP_PRIVATE, MAP_FAILED
#include <unistd.h>    // getpid

// Abstract sketch of the software framework's queue constructor.
// OPEN_QUEUE_HANDLE and MAP_QUEUE denote the proposed OS extensions described
// below; they are not existing POSIX flags.
queue::queue( )
{
    const auto our_proc_id = getpid( );
    char channel_list_buffer[ MAX_PATH_LENGTH ];
    // snprintf null-terminates the formatted path
    std::snprintf( channel_list_buffer, MAX_PATH_LENGTH,
                   "/proc/%d/channel_list", static_cast< int >( our_proc_id ) );
    const auto queue_id = open( channel_list_buffer, OPEN_QUEUE_HANDLE );

    /** initialize producer/consumer addresses **/
    producer_address_ = mmap( 0, 0, PROT_WRITE,
                              MAP_QUEUE | MAP_PRIVATE, /** only within single ASID; **/
                              /** MAP_SHARED if cross-ASID, multi-process **/
                              queue_id, 0 );
    if( producer_address_ == MAP_FAILED )
    {
        // check error codes, fall back to software queue if needed
    }
    consumer_address_ = mmap( 0, 0, PROT_READ,
                              MAP_QUEUE | MAP_PRIVATE, /** only within single ASID; **/
                              /** MAP_SHARED if cross-ASID, multi-process **/
                              queue_id, 0 );
    if( consumer_address_ == MAP_FAILED )
    {
        // check error codes, fall back to software queue if needed
    }
    // producer and consumer are now ready for push/pop operations
}
```
[0166] In the above example implementation there are several system calls that map into the underlying OS. As an example these could work as follows:
Open:
[0167] Read the inode indicated by the input file path string;
[0168] Open it as a queue handle (specific implementations could open it for reading or writing using the standard O_RDONLY/O_WRONLY/O_RDWR flags specified in POSIX, or they could leave that to mmap to specify);
[0169] At the specified inode, the OS allocates a channel identifier/handle for the given channel;
[0170] The return value is a file descriptor integer which maps to the identifier/handle. Note that this handle doesn't have to numerically match the internal (and micro-architecturally visible) channel ID; it simply has to map, within the OS, to the one that the hardware will use for the subsequent mmap call.
mmap:
[0171] Read the inode represented by the file descriptor integer provided;
[0172] This file descriptor maps to the specific channel identifier used by the hardware;
[0173] [As an implementation choice] the OS could read/use the HMAT/SLIT table information presented in sysfs to allocate the most local router to the caller;
[0174] The OS allocates a virtual address page which maps to the physical address of the register port on the router which is assigned to this particular channel. The router itself will use the physical channel port, along with the address information provided, to decode which channel each message store operation maps to;
[0175] The mmap call returns a valid virtual address for the physical channel port on the router. In this example the mapping is read-only or write-only (a design choice); a mapping could in principle permit either access, but the example is shown with specific permissions.
[0176] Hence, the producer and consumer addresses are returned in the caller's virtual address (VA) space via mmap. To perform a push/pop operation, this VA is translated into a physical address. For the producer a message store operation is required, and in some examples this is implemented by the above-mentioned st64 instruction and its variants, although this mechanism may be employed with any message store to non-cacheable memory. A separate message store operation variant could be used to control the permission to access message channels from a given EL. For the consumer, a message channel port address is defined which can be the target of message store operations (either from a router or from another PE directly). This port simply receives data and, for example, forwards it to a holding buffer for user-space consumption. Such a holding area could be a (set of) special purpose system register(s) (e.g., a bank of 64B system registers with associated instructions for user space to consume this data, which could be an existing Arm ISA Ld64 operation if the 64B system register is exposed to the software as a device memory address to read from). This holding area could also be vector registers (e.g., from the SVE instruction set) or another register set of sufficient width. This holding area could also take many other forms (including any addressable memory buffer with sufficient space), i.e., it is abstracted from the architecture. In one example implementation, the consumer software can map a consumer channel as previously described and use a load instruction such as Ld64 to access the channel. Each load is translated as normal through address generation, and the PA is then used when accessing the MCU and translated (potentially using the same mapping mechanism described previously for producers) to access a channel. The channel access could occur within the core or outside of the core. Data is returned to registers (if using Ld64).
[0178] In some implementations, the router 200 can be configured to provide broadcast capability by providing a resend command to the sender for every message store operation. For each successful send command, the router 200 will send a unique copy of that data from the producer to each subscriber consumer element on the channel. Once each receiver has been sent exactly one copy, the producer will receive a response indicating completion. The router is responsible for keeping track of which consumer should receive the next broadcast copy (e.g., 1 of N consumers is selected for each message store operation from the producer, and the router tracks which consumer is selected via some policy such as round-robin, FIFO, MRU, etc.). The broadcast target list at the router could be sequential (e.g., the router only keeps track of the next recipient), or it could be discontinuous, in which case the router must keep track of which recipients out of a set of N have received the message, to cope with consumers that may not currently be available (e.g., the router keeps the consumer tracking 207 as a bit-vector which tracks which of the N consumers have received the message, and each bit could be visited in turn, such as according to a round-robin policy).
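The round-robin, bit-vector variant of broadcast tracking can be modelled as follows. This is a sketch under assumptions: `BroadcastTracker`, `SendResponse` and the `available` callback (standing in for whether a consumer can currently accept a message) are hypothetical names. Each producer message store delivers at most one copy, to the next consumer in round-robin order that has not yet received it, and the producer is told to resend until all N consumers have exactly one copy.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <optional>
#include <vector>

enum class SendResponse { Resend, Complete };

class BroadcastTracker {
public:
    struct StoreResult {
        std::optional<std::size_t> delivered_to;  // consumer that got a copy, if any
        SendResponse response;                    // what the producer is told
    };

    explicit BroadcastTracker(std::size_t consumers) : received_(consumers, false) {}

    // Handle one message store from the producer.
    StoreResult on_store(const std::function<bool(std::size_t)>& available) {
        const std::size_t n = received_.size();
        StoreResult result{std::nullopt, SendResponse::Resend};
        // Visit bits in round-robin order, skipping consumers that already have
        // a copy or cannot currently accept one.
        for (std::size_t step = 0; step < n; ++step) {
            const std::size_t i = (next_ + step) % n;
            if (!received_[i] && available(i)) {
                received_[i] = true;
                result.delivered_to = i;
                next_ = (i + 1) % n;
                break;
            }
        }
        // Every consumer has exactly one copy: report completion and reset.
        if (std::all_of(received_.begin(), received_.end(), [](bool b) { return b; })) {
            std::fill(received_.begin(), received_.end(), false);
            result.response = SendResponse::Complete;
        }
        return result;
    }

private:
    std::vector<bool> received_;  // the bit-vector of consumers served
    std::size_t next_ = 0;        // round-robin starting point
};
```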
[0184] The work queue buffer 411 allows a message channel to be used as a work queue and the message channel router 410 (controlled via its auxiliary interface 412) can then act as a scheduling agent for downstream consumer element targets. Hence a producer on a work stream (or producers) could enqueue work targeting a given VA (mapped by the OS for this purpose) and the router device 410 is augmented with additional functionality to allow buffering of these jobs (in the work queue buffer 411) for example into levels of priority (e.g., three priority queues to implement a multi-level queue scheduler). The auxiliary interface allows the attachment of a controller (not shown) which could run in hardware or in firmware and control the dispatch targets of jobs on a channel. The target consumer elements of the jobs are pre-configured in software (i.e., they would be subscribers to the configured message channel). The buffering for the priority queue could be internal to the scheduler SRAM, or be allocated as pinned system memory and given to the auxiliary scheduler via a set of registers (e.g., when configuring the scheduler, software would allocate/pin memory to a PA range then set the values within the auxiliary scheduler's defined register set), or this address range could be given to the scheduler as a VA and the auxiliary scheduler could obtain the PA via an address translation service (e.g., IOMMU or system memory management unit). This auxiliary interface could also use additional system features for scheduling decisions such as a thread context table.
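A multi-level queue scheduler of the kind described for the work queue buffer 411 can be sketched with one FIFO per priority level. The three levels, the `TaskDefinition` payload and the round-robin choice of consumer element are illustrative assumptions; in the described system the dispatch decision is made by a controller attached to the auxiliary interface 412, for which `dispatch` merely stands in.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <deque>
#include <optional>
#include <string>
#include <utility>

// Hypothetical task definition carried in a block of message data.
struct TaskDefinition {
    std::string name;
};

class WorkQueueBuffer {
public:
    static constexpr std::size_t kLevels = 3;  // 0 = highest priority

    struct Dispatch {
        std::size_t consumer;  // index of the subscribed consumer element
        TaskDefinition task;
    };

    explicit WorkQueueBuffer(std::size_t consumers) : consumers_(consumers) {
        assert(consumers > 0);
    }

    // Producers enqueue work at a given priority level.
    void enqueue(std::size_t priority, TaskDefinition task) {
        assert(priority < kLevels);
        queues_[priority].push_back(std::move(task));
    }

    // Take the oldest task from the highest non-empty level and assign it to
    // the next consumer in round-robin order.
    std::optional<Dispatch> dispatch() {
        for (auto& q : queues_) {
            if (q.empty()) continue;
            Dispatch d{next_consumer_, std::move(q.front())};
            q.pop_front();
            next_consumer_ = (next_consumer_ + 1) % consumers_;
            return d;
        }
        return std::nullopt;
    }

private:
    std::array<std::deque<TaskDefinition>, kLevels> queues_;
    std::size_t consumers_;
    std::size_t next_consumer_ = 0;
};
```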
[0187] Hence, ticket-locks or MCS locks can make use of a message channel conveying zero data messages as described above. Such ticket-locks and MCS locks can both be more efficient than existing coherence-based approaches, whilst providing the same (top-level) software API to the programmer. In one example implementation the router is augmented with a tracking mechanism (such as one of those described above) for each targetable PE in the system (e.g., 256 PEs). This tracking mechanism (internal to the router) could take the form of a shift register; with an 8-bit core-ID for each of 256 PEs this would be 256 × 8 = 2048 bits. The first thread to acquire the lock has its core-ID shifted into the first 8 bits, starting at position zero of the ticket lock shift register, which is the implicit head. An index into the shift register is kept, with each index step corresponding to an offset of log2(#PEs) bits (8 bits for 256 PEs). An additional bit is also used to indicate whether this channel identifier (lock channel) is currently locked. To acquire a lock, a thread/context with a valid VA for the lock message channel initiates a message store operation without data to the channel (which enqueues the initiating core-ID if the lock is currently active). The response from the router could either be success (which means that the initiating core has acquired the lock immediately) or defer (which means that the initiating core must wait until it receives a lock-message from the router). If immediate acquisition occurs, the lock is held by the initiating thread until it is released; otherwise, the core can choose to continue to do something else or wait for the MCP to populate with a lock token. If a lock is immediately gained, then there were no waiting objects in the queue and the lock-active bit was 0; on acquiring the lock with an empty shift register, the lock-active bit is set to 1. If a lock is not immediately acquired, the lock-active bit must be 1 and the requesting core must wait its turn.
On waiting, the core can spin, execute wait-for-event (WFE, an Arm AArch64 instruction that places the core into a low-power state while waiting) or take other action. To release the lock, the core holding the lock issues another message store (another zero data message) to the lock message channel (using the VA that maps to the correct channel identifier). This has the effect of popping the core-ID from the head of the tracking queue and pushing a lock token to the next core's MCP as a no-data message store from the routing device (transferring the lock to the next core). Conditions could arise in which threads attempting to acquire a lock are swapped out (via software PE multiplexing). A further variant therefore provides that the OS or supervisor software removes the calling core index from the lock queue for that channel ID. The router in this case could be augmented with additional indexing that maps each PE index to its position in the shift register for fast look-up.
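The lock-tracking behaviour can be modelled in software with a 2048-bit shift register holding up to 256 8-bit core-IDs, head at bit position zero, plus a lock-active bit. The class and method names are hypothetical. `acquire` corresponds to a zero data message from a core that does not hold the lock and returns the success/defer response; `release` corresponds to the holder's second zero data message and returns the core that should receive the lock token, if any is waiting.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>

enum class LockResponse { Success, Defer };

class RouterLockTracker {
public:
    static constexpr std::size_t kPEs = 256;
    static constexpr std::size_t kIdBits = 8;             // log2(256)
    static constexpr std::size_t kBits = kPEs * kIdBits;  // 2048 bits

    // Zero data message from a lock-seeking core: append its core-ID.
    LockResponse acquire(std::uint8_t core) {
        assert(count_ < kPEs);
        write_slot(count_++, core);
        if (!locked_) {
            locked_ = true;  // set the lock-active bit
            return LockResponse::Success;
        }
        return LockResponse::Defer;
    }

    // Zero data message from the holder: pop the head. Returns the core that
    // now holds the lock and should be sent a lock token, if any.
    std::optional<std::uint8_t> release(std::uint8_t core) {
        assert(locked_ && count_ > 0 && read_slot(0) == core);
        bits_ >>= kIdBits;  // shift the next core-ID into position zero
        --count_;
        if (count_ == 0) {
            locked_ = false;  // clear the lock-active bit
            return std::nullopt;
        }
        return read_slot(0);
    }

private:
    void write_slot(std::size_t slot, std::uint8_t id) {
        for (std::size_t b = 0; b < kIdBits; ++b) bits_[slot * kIdBits + b] = (id >> b) & 1u;
    }
    std::uint8_t read_slot(std::size_t slot) const {
        std::uint8_t id = 0;
        for (std::size_t b = 0; b < kIdBits; ++b) {
            if (bits_[slot * kIdBits + b]) id |= static_cast<std::uint8_t>(1u << b);
        }
        return id;
    }

    std::bitset<kBits> bits_;
    std::size_t count_ = 0;  // number of queued core-IDs (tail index)
    bool locked_ = false;    // lock-active bit
};
```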
[0194] As in the case of the consumer elements, the producer element may also virtualise the memory addresses used.
[0198] Thus in systems with more than one routing device, one software implementation (acting as a trusted agent) can map the topology of the PEs in relation to the routing devices within the system using a table similar to the SRAT. Channels are allocated to routers based on the principle of first-touch, similar to how memory pages are allocated to physical memory. Similarly, if higher-priority channels are allocated and scheduled to a router by an operating system, it may be desirable to move lower-priority channels to routers that are further away. Thus the allocation can also be dynamic: using the proximity table created at boot time (or statically provided), the OS/hypervisor/driver layers can deallocate a channel identifier from a given routing device and move it to a target routing device. During this transition, attempted message store operations receive, in one implementation, a defer response indicating that they should try again or, in another implementation, a more informative response such as move-retry, which tells the software that the channel identifier is currently being moved.
[0199] In one implementation the OS would not invalidate the PA from the producers which are using it as a message store target until the channel identifier is active in the new router. Once it is active, the software layer that initiated the move operation ensures that the page table entry mapping the VA to PA translation is updated to the new PA at the new target router, and the software then executes an invalidation of the VA->PA mapping (in an example implementation in the Arm architecture, this is achieved by a TLB invalidation followed by a barrier). All new message store operations then target the same channel identifier at the new message channel router via the new PA. Since all interim message store operations will have received a fail-retry or defer response, no in-flight state needs to be restored or accounted for, aside from what the software layer packed up from the router and moved. As an overall example, from boot the routers on the interconnect are mapped into the PA space. Once the initial boot stages are complete, the driver layers that manage the routers for the kernel are booted and a proximity table is constructed as a matrix, with each row representing a PE and each column the distance to a given router. Message channels are assigned to routers based on the lowest distance first. This matrix is given to the software layer and kept as part of the allocation process (see the above-described mmap example). Additional routers can be provided dynamically via hot-plug operations; at this point, the driver must re-run the table generation above to include the new routers in the topology table.
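The migration sequence can be sketched as follows. `Router`, `StoreResponse` and `migrate` are hypothetical names, the page-table entry is modelled as a plain integer, and the TLB invalidation and barrier are represented only by a comment. The sketch shows the ordering described above: the channel becomes active at the new router before the producers' VA-to-PA mapping is switched, and stores arriving mid-move receive a move-retry response.

```cpp
#include <cassert>
#include <cstdint>
#include <map>

enum class StoreResponse { Success, MoveRetry, NotHere };

// Hypothetical router channel table: each channel ID is Active or Moving.
class Router {
public:
    enum class State { Active, Moving };

    void activate(std::uint32_t id) { channels_[id] = State::Active; }
    void begin_move(std::uint32_t id) { channels_.at(id) = State::Moving; }
    void remove(std::uint32_t id) { channels_.erase(id); }

    // Response seen by a producer's message store to this channel.
    StoreResponse message_store(std::uint32_t id) const {
        auto it = channels_.find(id);
        if (it == channels_.end()) return StoreResponse::NotHere;
        return it->second == State::Active ? StoreResponse::Success
                                           : StoreResponse::MoveRetry;
    }

private:
    std::map<std::uint32_t, State> channels_;
};

// Software-driven move; `pte` stands in for the producers' VA->PA mapping.
void migrate(std::uint32_t id, Router& from, Router& to,
             std::uint64_t& pte, std::uint64_t new_pa) {
    from.begin_move(id);  // interim stores now receive MoveRetry
    to.activate(id);      // channel is live at the new router first...
    pte = new_pa;         // ...then the VA->PA mapping is updated
    // (on Arm: TLB invalidation of the VA, followed by a barrier)
    from.remove(id);
}
```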
[0201] Various disclosed configurations are set out in the following numbered clauses:
[0202] Clause 1. A data processing system comprising: [0203] a system privileged agent arranged to define a configuration of the data processing system; [0204] multiple processing elements arranged to perform data processing, wherein the multiple processing elements comprise a producer element and a consumer element; and [0205] interconnect circuitry arranged to couple the multiple processing elements with one another, [0206] wherein the data processing system supports a message channel functionality according to which: [0207] the system privileged agent is configured to define a message channel for communication between the producer element and a consumer element, the message channel being defined by a message channel identifier and a message channel target pointer, wherein the message channel target pointer indicates a non-cacheable target location associated with the consumer element; [0208] the producer element is configured to perform an atomic message store operation with respect to a block of message data targeting the consumer element, wherein the producer element specifies the message channel identifier and the block of message data; and [0209] the interconnect circuitry is configured to convey the block of message data atomically to the non-cacheable target location associated with the consumer element.
[0210] Clause 2. The data processing system as defined in Clause 1, wherein the data processing system further comprises: [0211] a router device coupled to the interconnect circuitry and comprising an input port and an output port, [0212] wherein the system privileged agent is configured to define the message channel for communication between the producer element and a consumer element by: [0213] providing the producer element with a message channel router pointer indicative of the input port of the router device; and [0214] storing message channel configuration data in the router device, the message channel configuration data comprising the message channel identifier and the message channel target pointer, [0215] wherein the producer element is arranged to perform the atomic message store operation specifying the message channel router pointer, [0216] and wherein the router device is responsive to reception of the block of message data at the input port to forward the block of message data from the output port to the non-cacheable target location indicated by the message channel target pointer.
[0217] Clause 3. The data processing system as defined in Clause 2, wherein the multiple processing elements comprise multiple consumer elements, and wherein more than one consumer element subscribes to the message channel, wherein the message channel is associated with multiple message channel target pointers, wherein each of the message channel target pointers indicates a non-cacheable target location associated with a respective consumer element of the multiple consumer elements, [0218] and wherein the message channel configuration data stored in the router device comprises the message channel identifier and the multiple message channel target pointers.
[0219] Clause 4. The data processing system as defined in Clause 3, wherein the router device is responsive to reception of the block of message data at the input port to select a recipient consumer element from the consumer elements which subscribe to the message channel and to forward the block of message data from the output port to the non-cacheable target location indicated by the message channel target pointer associated with the recipient consumer element.
[0220] Clause 5. The data processing system as defined in Clause 4, wherein the router device is configured to select the recipient consumer element in dependence on recipient ordering data, stored in the router device, for the consumer elements which subscribe to the message channel.
[0221] Clause 6. The data processing system as defined in Clause 4 or Clause 5, wherein the router device is further responsive to the reception of the block of message data at the input port to re-forward the block of message data from the output port to each of the consumer elements which subscribe to the message channel.
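Clauses 3 to 6 can be illustrated with a minimal Python model of the router-side channel configuration. The names `ChannelConfig` and `forward` and the `deliver` callback are hypothetical, and a round-robin cursor is assumed as one possible form of the "recipient ordering data"; the clauses do not fix a particular ordering policy.

```python
class ChannelConfig:
    """Router-side configuration for one message channel (Clauses 3-6)."""

    def __init__(self, channel_id, target_pointers):
        self.channel_id = channel_id
        self.targets = list(target_pointers)  # one non-cacheable target location per subscriber
        self.next_index = 0                   # hypothetical recipient ordering data: round-robin cursor

    def select_recipient(self):
        target = self.targets[self.next_index]
        self.next_index = (self.next_index + 1) % len(self.targets)
        return target


def forward(config, block, deliver, broadcast=False):
    """Deliver to one selected subscriber, or re-forward to every subscriber when broadcasting."""
    if broadcast:
        for target in config.targets:
            deliver(target, block)
    else:
        deliver(config.select_recipient(), block)
```

With two subscribers, successive unicast messages alternate between the two target pointers, while a broadcast reaches both.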
[0222] Clause 7. The data processing system as defined in any of Clauses 2-6, wherein the producer element is further configured to send a zero data message, wherein the zero data message specifies the message channel identifier and an identifier indicative of the zero data message, and the router device is responsive to reception of a zero data message to forward the zero data message to one or more consumer elements which subscribe to the message channel.
[0223] Clause 8. The data processing system as defined in Clause 7, wherein the multiple processing elements comprise multiple producer elements, and wherein more than one producer element is configured to send the zero data message, and the router device is configured to maintain a count of a number of producer elements from which the zero data message has been received, and the router device is responsive to the count reaching a predetermined threshold value to forward the zero data message to one or more consumer elements which subscribe to the message channel.
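The counting behaviour of Clause 8 acts like a barrier. A minimal Python sketch follows, assuming a hypothetical `ZeroDataBarrier` class and `notify` callback; counting distinct producers, and resetting the count once the zero data message has been forwarded, are assumptions rather than requirements of the clause.

```python
class ZeroDataBarrier:
    """Counts producers that have sent the zero data message on one channel."""

    def __init__(self, threshold, notify):
        self.threshold = threshold  # predetermined threshold value
        self.notify = notify        # forwards the zero data message to the consumer(s)
        self.arrived = set()        # distinct producers seen so far

    def on_zero_data_message(self, producer_id):
        self.arrived.add(producer_id)
        if len(self.arrived) >= self.threshold:
            self.arrived.clear()    # assumption: the barrier re-arms after firing
            self.notify()
            return True
        return False
```

A repeated zero data message from the same producer does not advance the count in this model.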
[0224] Clause 9. The data processing system as defined in any of Clauses 2-8, wherein the consumer element is responsive to reception of the block of message data at the non-cacheable target location indicated by the message channel target pointer to return a success indicator to the router device, wherein the success indicator indicates whether or not the block of message data has been successfully received by the consumer element, and the router device is responsive to reception of the success indicator to forward the success indicator to the producer element.
[0225] Clause 10. The data processing system as defined in Clause 9, wherein the router device is responsive to the reception of the success indicator from the consumer element, when the success indicator indicates that the block of message data has not been successfully received by the consumer element, to retry forwarding the block of message data.
[0226] Clause 11. The data processing system as defined in Clause 9, wherein the multiple processing elements comprise multiple consumer elements, and wherein more than one consumer element subscribes to the message channel, wherein the router device is responsive to the reception of the success indicator from the consumer element, when the success indicator indicates that the block of message data has not been successfully received by the consumer element, to retry forwarding the block of message data to another consumer element which subscribes to the message channel.
[0227] Clause 12. The data processing system as defined in any of Clauses 2-11, wherein the router device is responsive to the reception of the block of message data at the input port, when no consumer element is available for the message channel, to return a message failure indication to the producer element.
[0228] Clause 13. The data processing system as defined in any of Clauses 2-12, wherein the router device is configured to maintain consumer element capacity data indicative of a capacity of the consumer element to receive message data, and when the consumer element capacity data indicates that the consumer element does not have capacity to receive a block of message data received at the input port, the router device is arranged to return a message failure indication to the producer element.
[0229] Clause 14. The data processing system as defined in any of Clauses 2-13, wherein the router device comprises a message block buffer configured to store multiple blocks of message data, wherein the router device is configured to forward the multiple blocks of message data respecting an ordering defined by the message block buffer.
[0230] Clause 15. The data processing system as defined in Clause 14, wherein the ordering is a first-in-first-out ordering.
[0231] Clause 16. The data processing system as defined in any of Clauses 2-15, wherein the router device comprises a message block buffer configured to buffer a received block of message data, and wherein, when the message block buffer is not available to buffer the block of message data but will be available to buffer it after a known processing step, the router device is configured to return a fail-but-retry message to the producer element indicating that the block of message data will be receivable after the known processing step.
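The responses described in Clauses 12 to 16 can be combined in one toy Python model. `BufferedRouter`, `Resp` and the integer `consumer_credits` (standing in for the consumer element capacity data) are hypothetical. Two further points are assumptions: the "known processing step" is taken to be forwarding the oldest buffered block, and in this model the FIFO buffer absorbs blocks while it has space, with the consumer capacity check only applying once the buffer is full.

```python
from collections import deque
from enum import Enum, auto


class Resp(Enum):
    SUCCESS = auto()
    FAIL = auto()        # message failure indication
    FAIL_RETRY = auto()  # fail-but-retry: space will free up after the next forward step


class BufferedRouter:
    """Toy router with a FIFO message block buffer and consumer capacity data."""

    def __init__(self, buffer_size, consumer_credits, consumer_present=True):
        self.buffer = deque()                    # first-in-first-out ordering (Clause 15)
        self.buffer_size = buffer_size
        self.credits = consumer_credits          # hypothetical: free slots at the consumer
        self.consumer_present = consumer_present
        self.delivered = []                      # blocks handed to the consumer, in order

    def receive(self, block):
        if not self.consumer_present:
            return Resp.FAIL                     # no consumer available (Clause 12)
        if len(self.buffer) < self.buffer_size:
            self.buffer.append(block)
            return Resp.SUCCESS
        if self.credits > 0:
            return Resp.FAIL_RETRY               # next forward_one() will free a slot (Clause 16)
        return Resp.FAIL                         # consumer has no capacity (Clause 13)

    def forward_one(self):
        """Forward the oldest buffered block if the consumer has capacity."""
        if not self.buffer or self.credits == 0:
            return None
        block = self.buffer.popleft()
        self.credits -= 1
        self.delivered.append(block)
        return block
```

A producer that receives `FAIL_RETRY` can reasonably retry soon, whereas `FAIL` suggests backing off or choosing another channel.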
[0232] Clause 17. The data processing system as defined in any of Clauses 2-16, comprising: [0233] a plurality of router devices coupled to the interconnect circuitry and each comprising an input port and an output port, [0234] wherein the system privileged agent is configured to define the message channel for communication between the producer element and a consumer element by concatenating the plurality of router devices, such that: [0235] the message channel router pointer specified by the producer element specifies a first router device of the plurality of router devices, [0236] and the message channel configuration data stored in each of the plurality of router devices links the plurality of router devices in sequence, [0237] such that each router device of the plurality of router devices other than a last concatenated router device is responsive to reception of the block of message data at the input port to forward the block of message data from the output port to a next router device of the plurality of router devices, and the last concatenated router device is responsive to reception of the block of message data at the input port to forward the block of message data from the output port to the non-cacheable target location indicated by the message channel target pointer.
[0238] Clause 18. The data processing system as defined in any of Clauses 2-17, wherein the router device further comprises an auxiliary interface providing a control access to the router device, wherein control signals received at the auxiliary interface provide at least partial control of operation of the router device.
[0239] Clause 19. The data processing system as defined in Clause 4, or any of Clauses 5-18 when dependent on Clause 4, wherein the router device further comprises an auxiliary interface providing a control access to the router device, wherein control signals received at the auxiliary interface provide at least partial control of operation of the router device, wherein the at least partial control of operation of the router device comprises the selection of the recipient consumer element from the consumer elements which subscribe to the message channel.
[0240] Clause 20. The data processing system as defined in Clause 18 or Clause 19, wherein the router device further comprises a work queue buffer arranged to buffer multiple blocks of message data, wherein the multiple blocks of message data comprise task definitions of tasks to be carried out by the consumer element, and wherein the control signals received at the auxiliary interface control scheduling of the tasks to be carried out by the consumer element by selection from the multiple blocks of message data buffered in the work queue buffer.
[0241] Clause 21. The data processing system as defined in any of Clauses 18-20, wherein the work queue buffer comprises multiple work queues, wherein each of the multiple work queues has an associated priority level relative to the others, and wherein the control signals received at the auxiliary interface control scheduling of the tasks to be carried out by the consumer element respecting the relative associated priority levels of the multiple work queues.
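The priority-respecting scheduling of Clauses 20 and 21 could work along the lines of the following Python sketch. `WorkQueueBuffer` and `schedule_next` are hypothetical names; strict priority (always draining the highest-priority non-empty queue first) is assumed here, although other policies that respect relative priority would also fit the clause.

```python
from collections import deque


class WorkQueueBuffer:
    """Multiple work queues of task definitions, index 0 being the highest priority."""

    def __init__(self, num_priorities):
        self.queues = [deque() for _ in range(num_priorities)]

    def enqueue(self, task, priority):
        self.queues[priority].append(task)

    def schedule_next(self):
        """Invoked via the (hypothetical) auxiliary interface to pick the consumer's next task."""
        for queue in self.queues:
            if queue:
                return queue.popleft()
        return None
```

Within a single priority level, tasks are issued in arrival order.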
[0242] Clause 22. The data processing system as defined in any of Clauses 2-21, wherein the producer element is further configured to send a zero data message, wherein the zero data message specifies the message channel identifier and an identifier indicative of the zero data message, and wherein the multiple processing elements comprise multiple producer elements, and wherein more than one producer element is configured to send the zero data message, [0243] and the router device further comprises producer element lock tracking storage and the router device is responsive to reception of the zero data message from a lock-seeking producer element to store an indication of the lock-seeking producer element in the producer element lock tracking storage, [0244] wherein the producer element lock tracking storage also stores a lock status indication indicative of whether a lock target is currently allocated to one of the multiple producer elements, [0245] wherein when the lock status indication is not set, the lock target is allocated to the lock-seeking producer element and the lock status indication is set, [0246] and when the lock status indication is set, the indication of the lock-seeking producer element is queued up in the producer element lock tracking storage.
[0247] Clause 23. The data processing system as defined in Clause 22, wherein the producer element lock tracking storage is configured as a shift register, wherein storing the indication of the lock-seeking producer element in the producer element lock tracking storage comprises shifting the indication of the lock-seeking producer element into the shift register, [0248] wherein when the lock status indication is set and the lock-allocated producer element to which the lock target is currently allocated sends the zero data message again, the router device is configured to pop an indication of the lock-allocated producer element from the shift register, [0249] and wherein when popping the indication of the lock-allocated producer element from the shift register reveals an indication of a further lock-seeking producer element, the router device is configured to send the zero data message to the further lock-seeking producer element indicating that the lock target is now allocated to the further lock-seeking producer element.
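The lock tracking of Clauses 22 and 23 behaves like a hardware queue lock. In the minimal Python model below, the shift register is represented by a `deque` whose head is the current lock holder. `LockTracker` and the `send` callback (which returns the zero data message to a producer as a lock grant) are hypothetical, and sending a grant to the first producer when the lock is free is an assumption.

```python
from collections import deque


class LockTracker:
    """Router-side producer element lock tracking storage (Clauses 22-23)."""

    def __init__(self, send):
        self.queue = deque()  # models the shift register; head is the lock holder
        self.locked = False   # lock status indication
        self.send = send      # returns the zero data message to a producer as a grant

    def on_zero_data_message(self, producer):
        if not self.locked:
            # Lock free: allocate it to the seeker and set the status indication.
            self.queue.append(producer)
            self.locked = True
            self.send(producer)
        elif self.queue[0] == producer:
            # Holder sends again: release by popping it, then hand over to the next seeker.
            self.queue.popleft()
            if self.queue:
                self.send(self.queue[0])
            else:
                self.locked = False
        else:
            # Lock held by another producer: queue this seeker.
            self.queue.append(producer)
```

Producers therefore acquire the lock in the order their zero data messages reached the router, without spinning on a shared cache line.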
[0250] Clause 24. The data processing system as defined in Clause 1, wherein the system privileged agent is configured to define a router-less message channel for communication between the producer element and a consumer element by providing the producer element with the message channel target pointer indicating the non-cacheable target location associated with the consumer element, [0251] wherein the producer element is arranged to perform the atomic message store operation specifying the message channel target pointer.
[0252] Clause 25. The data processing system as defined in any of Clauses 1-24, wherein the consumer element comprises a holding buffer accessible to user software executing on the consumer element, [0253] wherein the non-cacheable target location associated with the consumer element is configured as a data reception port of the consumer element, [0254] and wherein the data reception port is configured to forward the block of message data received atomically to the holding buffer.
[0255] Clause 26. The data processing system as defined in Clause 25, wherein the holding buffer comprises at least one of: [0256] a set of system registers; [0257] vector registers; and [0258] a user software addressable memory buffer.
[0259] Clause 27. The data processing system as defined in Clause 25 or Clause 26, wherein the holding buffer is sub-divided into a plurality of sub-buffers, wherein each sub-buffer of the plurality of sub-buffers is allocated to a corresponding message channel to which the consumer element is subscribed.
[0260] Clause 28. The data processing system as defined in any of Clauses 25-27, wherein the consumer element is configured to reserve at least a portion of the holding buffer for at least one prioritised message channel to which the consumer element is subscribed.
[0261] Clause 29. The data processing system as defined in any of Clauses 25-28, wherein the consumer element is responsive to an attempt to deliver the block of message data at the data reception port, to return a success indicator, wherein the success indicator indicates whether or not the holding buffer currently has capacity to receive the block of message data.
[0262] Clause 30. The data processing system as defined in any of Clauses 25-29, wherein the user software executing on the consumer element is configured to test whether the holding buffer currently holds a user software targeted block of message data on a message channel to which the user software is subscribed.
[0263] Clause 31. The data processing system as defined in any of Clauses 25-30, wherein the consumer element is configured to support execution of multiple tasks on the consumer element, wherein each task has an individual set of consumer element state and the consumer element is configured to switch to a corresponding individual set of consumer element state when switching to a current task of the multiple tasks.
[0264] Clause 32. The data processing system as defined in Clause 31, wherein the consumer element is responsive to an attempt to deliver the block of message data at the data reception port, to receive or reject the block of message data in dependence on whether the current task is subscribed to the message channel for the block of message data.
[0265] Clause 33. The data processing system as defined in Clause 31, wherein the data reception port is responsive to an attempt to deliver the block of message data at the data reception port, when the current task is not subscribed to the message channel for the block of message data, to generate an interrupt signal for the consumer element.
[0266] Clause 34. The data processing system as defined in Clause 25, or any of Clauses 26-33, wherein the consumer element is configured to reference memory locations using virtual addresses and comprises address translation circuitry to perform address translation of the virtual addresses into physical addresses, [0267] wherein the consumer element is configured to map a virtual address associated with the message channel to a physical address associated with the holding buffer, [0268] and wherein the consumer element is configured to access the message channel by execution of a load instruction specifying the virtual address.
[0269] Clause 35. The data processing system as defined in any of Clauses 1-34, wherein the consumer element comprises message channel handling circuitry comprising the non-cacheable target location, wherein the message channel handling circuitry is configured to reference message channels using message channel identifiers, [0270] wherein user software executing on the consumer element is configured to reference the message channel using a virtual message channel identifier, [0271] and the consumer element comprises message channel identifier translation circuitry configured to translate virtual message channel identifiers to message channel identifiers in dependence on user software currently executing on the consumer element.
[0272] Clause 36. The data processing system as defined in any of Clauses 1-35, wherein the consumer element is configured to receive task definitions via the message channel and the block of message data provides at least a part of a task definition for the consumer element.
[0273] Clause 37. The data processing system as defined in Clause 36, wherein the consumer element is configured, when a currently executing task relinquishes use of the consumer element, and when a block of message data providing at least a part of a task definition for the consumer element has been received on the message channel, to switch to performing a new task defined by the task definition.
[0274] Clause 37. The data processing system as defined in Clause 36, wherein the consumer element is configured, when a block of message data providing at least a part of a task definition for the consumer element is received on the message channel, to pause execution of a currently executing task and to switch to performing a new task defined by the task definition.
[0275] Clause 38. The data processing system as defined in any of Clauses 1-37, wherein the producer element is configured to reference memory locations using virtual addresses and comprises address translation circuitry to perform address translation of the virtual addresses into physical addresses, [0276] wherein the producer element is configured to map a virtual address associated with the message channel to a physical address associated with the message channel identifier, [0277] and wherein the producer element is configured to access the message channel by execution of a store instruction specifying the virtual address.
[0278] Clause 39. The data processing system as defined in Clause 38, wherein the execution of the store instruction comprises retrieval of the block of message data from a set of registers and storing the block of message data to the physical address associated with the message channel identifier.
[0279] Clause 40. The data processing system as defined in any of Clauses 1-39, wherein the system privileged agent comprises at least one of: [0280] an operating system; and [0281] a hypervisor.
[0282] Clause 41. The data processing system as defined in any of Clauses 1-40, wherein the system privileged agent is responsive to a message channel setup call for the message channel from a processing element of the multiple processing elements to: [0283] allocate the message channel identifier for the message channel; [0284] specify the message channel target pointer; and [0285] allocate a virtual address for the processing element to use for the message channel, wherein the processing element uses virtual addresses to reference memory locations and the virtual address maps to a physical address given by the message channel target pointer.
[0286] Clause 42. The data processing system as defined in Clause 2, or any of Clauses 3-41 when dependent on Clause 2, wherein the system privileged agent is responsive to a message channel setup call for the message channel from a processing element of the multiple processing elements to: [0287] allocate the message channel identifier for the message channel; [0288] specify the message channel target pointer; [0289] specify the message channel router pointer; and [0290] allocate a virtual address for the processing element to use for the message channel, wherein the processing element uses virtual addresses to reference memory locations and the virtual address maps to a physical address given by the message channel router pointer.
[0291] Clause 43. The data processing system as defined in Clause 42, wherein the system privileged agent is configured to define a virtual-to-physical address mapping scheme between a virtual address space and a physical address space in which a subset of bits of physical addresses in the physical address space is directly indicative of a set of message channel identifiers defined by the system privileged agent.
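One way to read Clause 43 is that the channel identifier occupies a fixed bit field of the physical address, so that the router can recover it directly from the store's target address. The Python sketch below illustrates this reading. The window base address, the field position (one 4 KiB page per channel) and the field width are hypothetical values chosen for illustration only.

```python
CHANNEL_REGION_BASE = 0x8000_0000_0000  # hypothetical PA base of a router's channel window
CHANNEL_SHIFT = 12                      # assumption: one 4 KiB page per channel identifier
CHANNEL_BITS = 12                       # assumption: up to 4096 channel identifiers


def channel_pa(channel_id):
    """Physical address the privileged agent would map a channel's virtual address to."""
    if not 0 <= channel_id < (1 << CHANNEL_BITS):
        raise ValueError("channel identifier out of range")
    return CHANNEL_REGION_BASE | (channel_id << CHANNEL_SHIFT)


def channel_id_from_pa(pa):
    """Recover the channel identifier directly from the address bits of a message store."""
    return (pa >> CHANNEL_SHIFT) & ((1 << CHANNEL_BITS) - 1)
```

Because each channel occupies its own page in this scheme, ordinary page-granular VA-to-PA mappings are enough to grant a process access to exactly the channels it was allocated.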
[0292] Clause 44. The data processing system as defined in any of Clauses 1-43, wherein at least one processing element of the multiple processing elements is configured to support execution of multiple tasks on the processing element, wherein each task has an individual set of processing element state, wherein the system privileged agent is configured to administer time-sliced use of the processing element by causing an exchange of the individual set of processing element state and by modifying at least one of the message channel identifier and the message channel target pointer.
[0293] Clause 45. The data processing system as defined in Clause 2, or any of Clauses 3-44 when dependent on Clause 2, comprising: [0294] a plurality of router devices coupled to the interconnect circuitry, [0295] wherein at system start-up the system privileged agent is configured to map each of the plurality of router devices into a physical address space, [0296] and a proximity table is constructed comprising information indicative of a predefined cost function related to communication between each processing element of the multiple processing elements and each router device of the plurality of router devices, [0297] wherein the system privileged agent is configured to define the message channel for communication between the producer element and a consumer element by: [0298] selecting the router device associated with the message channel in dependence on the information comprised in the proximity table.
[0299] Clause 46. The data processing system as defined in Clause 45, wherein the router device is further selected to minimise a predefined cost function related to communication between the router device and the consumer element.
[0300] Clause 47. The data processing system as defined in Clause 46, wherein the predefined cost function is a measure of at least one of: [0301] relative distance between the router device and the consumer element; [0302] bandwidth between the router device and the consumer element; and/or [0303] signalling latency between the router device and the consumer element.
[0304] Clause 48. The data processing system as defined in Clause 46 or Clause 47, wherein the router device is further selected in dependence on a relative priority of the message channel.
[0305] Clause 49. The data processing system as defined in any of Clauses 46-48, wherein the system privileged agent is responsive to addition of a new router device to the data processing system whilst the data processing system is operating to re-construct the proximity table to incorporate the new router device.
[0306] Clause 50. A method of operating a data processing system comprising: [0307] defining a configuration of the data processing system by a system privileged agent; [0308] performing data processing in multiple processing elements, wherein the multiple processing elements comprise a producer element and a consumer element; [0309] coupling the multiple processing elements with one another via interconnect circuitry; [0310] defining a message channel for communication between the producer element and a consumer element, the message channel being defined by a message channel identifier and a message channel target pointer, wherein the message channel target pointer indicates a non-cacheable target location associated with the consumer element; [0311] performing an atomic message store operation by the producer element with respect to a block of message data targeting the consumer element, wherein the producer element specifies the message channel identifier and the block of message data; and [0312] conveying the block of message data atomically to the non-cacheable target location associated with the consumer element via the interconnect circuitry.
[0313] In brief overall summary a message channel functionality for a data processing system is disclosed. This provides communication channels which may be considered to be a shared resource. The approach combines atomic stores, which are fully completed in a single atomic transaction, and non-coherence to provide non-coherent atomic stores that are conditional to implement primitive communications channels that can be used to implement software queues and channels more efficiently. This enables the programmer to execute a store from registers on one side of a communications link and to have that data appear in the registers of a data consumer on that link directly, bypassing both the shared state upgrade problem and the parallel problem of acquiring a synchronization lock before data send.
[0314] In the present application, the words "configured to . . ." are used to mean that an element of an apparatus has a configuration able to carry out the defined operation. In this context, a configuration means an arrangement or manner of interconnection of hardware or software. For example, the apparatus may have dedicated hardware which provides the defined operation, or a processor or other processing device may be programmed to perform the function. "Configured to" does not imply that the apparatus element needs to be changed in any way in order to provide the defined operation.
[0315] Although illustrative embodiments of the invention have been described in detail herein with reference to the accompanying drawings, it is to be understood that the invention is not limited to those precise embodiments, and that various changes, additions and modifications can be effected therein by one skilled in the art without departing from the scope of the invention as defined by the appended claims. For example, various combinations of the features of the dependent claims could be made with the features of the independent claims without departing from the scope of the present invention.