PARALLEL DATA PROCESSING IN EMBEDDED SYSTEMS
20220342722 · 2022-10-27
CPC classification: G06F9/52 (PHYSICS)
Abstract
The invention claims a computer-implemented method of lock-free parallel data processing in autonomous embedded systems comprising: one or more producers providing data; a Smart Object Pool; an asynchronous publisher object capable of cloning event objects; a circular batching queue; an event object handler; and a subscriber arranged to send the event objects to one or more consumers and to return each event object to the Smart Object Pool once it determines that no consumer still needs the event.
Claims
1. A method of parallel data processing in an autonomous embedded system operating in the framework of an LMAX disruptor, comprising: receiving a call to publish a data object; cloning the data object and placing it into a pre-allocated space in memory; placing the cloned data object or a pointer to the data object into a batching queue; publishing the cloned data object by delivering or notifying the cloned event to one or more subscribers; and releasing the pre-allocated memory space.
2. The method of claim 1 wherein the pre-allocated memory space is released after the one or more subscribers have finished processing the data object.
3. The method of claim 1 in which the data object is published to multiple subscribers.
4. The method of claim 3 in which each consumer may receive each data object no more than once.
5. The method of any preceding claim comprising implementing a consumer sequence barrier between the batching queue and the one or more subscribers.
6. The method of claim 1 in which data objects from multiple producers are published to a single subscriber.
7. The method of claim 6 in which the data object comprises a binding being a direct connection to middleware.
8. The method of any preceding claim comprising implementing a producer barrier between the multiple producers and the batching queue.
9. The method of claim 8 comprising providing a separate memory pool for each producer.
10. The method of any preceding claim in which the memory is allocated such that different kinds of data are allocated to different memory spaces.
11. The method of claim 10 in which the size of the memory space allocated to a kind of data depends on the frequency at which the data is generated.
12. The method according to any preceding claim wherein a smart object pool in the memory space is pre-allocated to each producer.
13. The method of any preceding claim wherein the memory comprises a lock-free memory management system employing smart pointers.
14. The method of any preceding claim in which the batching queue comprises a ring buffer.
15. The method of any preceding claim in which the publishing is performed asynchronously.
16. The method of any preceding claim in which the batching queue publishes the cloned data object by delivering the cloned event to the one or more subscribers.
17. The method of any preceding claim in which each subscriber comprises a group of listeners and each subscriber transmits a reference to the object contained in the batching queue to each listener.
18. The method of any preceding claim implemented using C++ version 11 onwards.
19. The method of any preceding claim implemented for task scheduling, wherein the data object comprises a task.
20. A computer readable medium comprising instructions which when implemented in a computer cause the computer to operate a method as claimed in any of claims 1 to 19.
21. An autonomous embedded computing system comprising one or more processors and memory and being configured to implement a method as claimed in any of claims 1 to 19.
Description
BRIEF DESCRIPTION OF THE DRAWINGS
[0022] The invention will be better understood with the aid of the description of embodiments given by way of example and illustrated by the figures.
DETAILED DESCRIPTION OF POSSIBLE EMBODIMENTS OF THE INVENTION
[0045] Embodiments of the present invention are described below by way of example only. These examples represent the best mode of putting the invention into practice that is currently known to the Applicant, although they are not the only ways in which this could be achieved. The description sets forth the functions of the example and the sequence of steps or operations for constructing and operating the example. However, the same or equivalent functions and sequences may be accomplished by different examples.
[0046] Nowadays, a robot might process data coming from dozens of sensors. Electronics manufacturers offer an ever-greater variety of sensors: optical sensors, biosensors, touch sensors, image sensors, etc. At the same time, the quality of sensor devices is improving, giving more and better measurements. Furthermore, a robotic system must also process data coming from other sources, including commands, messages from other robots, cloud data processing, etc. In summary, a robotic system can nowadays easily produce thousands of data events per second, and the software application has to process them.
[0047] Space applications requiring on-board data processing at a high rate are a growing field. On the one hand, constellations of satellites are a growing market. LeoSat, for example, is building high-throughput satellites that will form a mesh network interconnected through laser links, creating an optical backbone in space which is about 1.5 times faster than terrestrial fiber backbones. OneWeb will provide a global constellation of low Earth orbit satellites offering low-latency, high-speed broadband to even the most remote locations on Earth. Both these cases, although different in purpose, face similar requirements in terms of data processing: dispatching and processing thousands of Earth requests per second, intensive satellite-to-satellite communications (coordination, routing, positioning) and a high-throughput data link to Earth. All these demanding requirements mean developing high-rate data processing applications.
[0048] In a different field, vision-based navigation, there is also the challenge of data processing combined with computationally heavy algorithms. One example is rendezvous with uncooperative objects in space, e.g., debris removal. Another example is autonomous pinpoint planetary landing, where the number of sensors and the complexity of the Guidance, Navigation and Control (GNC) algorithms make this discipline still one of the biggest challenges in space. One element common to these two use cases is a well-known fact in control engineering: for optimal control algorithms, the higher the rate of sensor data, the better the performance of the algorithm.
[0049] In the previous two subsections, we showed that the amount of data going through an embedded system has increased substantially in the past few years, and we presented two very different use cases showing that this trend is common across embedded systems.
[0050] There is another important factor, which is the evolution of embedded computers. As the computational capabilities of embedded processors increase and the need for data processing grows, the parallel processing limitation known as Amdahl's law may appear. In computer architecture, Amdahl's law (or Amdahl's argument) gives the theoretical speedup of a task at fixed workload when only part of the task can be parallelised: the serial fraction ultimately bounds the achievable speedup regardless of the number of processors.
[0051] This parallel data processing problem has been around a long time and has been successfully solved in other sectors such as financial trading. In general, the use of lock-free programming techniques has largely improved performance and determinism in parallel data processing applications. In this disclosure, a novel solution is presented that accelerates parallel data processing applications by applying some of these lock-free programming techniques developed in the financial trading sector. The solution has application in space and robotics applications to produce data processing rates that are substantially faster than traditional parallel processing approaches, as well as in other applications.
[0052] Based on this pattern, combining it with the new smart pointers in C++ 11 and wrapping it within a publisher-subscriber pattern, an aspect of the present invention provides a simple API (application programming interface): a miniaturisation of the LMAX disruptor for embedded systems. The functionality of the disruptor pattern is described here in relation to two main approaches. The term “smart pointer” is used here to refer to a pointer used to make sure that an object is deleted if it is no longer used (referenced), as known for example in C++ 11.
[0053] The first approach is the sensor multiplexer as shown in
[0054] The second approach is an event loop (
[0055] In both approaches data is processed in parallel, either incoming in parallel or outgoing in parallel.
[0056] Although there are two separate testing scenarios, they both share a common design with the following specific features: [0057] multiple producers and a single consumer, or vice versa; [0058] collection and aggregation of data to be stored on disk; [0059] configurable producer rate and number of producers.
[0060] The following two subsections show the specific details for spacecraft and robotic system experiments.
Robotics Testbed
[0061] The robotic testbed was based on the Robot Operating System (ROS). ROS is by far the most used software development framework for robotics applications. It is also clearly dominant in aerospace and self-driving cars. ROS was originally built for academia and its purpose was to accelerate the development of prototypes for robotic research. However, requirements like performance and reliability were not taken into account when designing this framework, which is why ROS struggles with fast data processing. ROS offers two ways of delivering messages or sensor data to the application:
[0062] The most common one is the single thread spinner. Here, ROS delivers the messages sequentially to the application. It is similar to a simple event loop.
[0063] The second way to do that is by using the async multithread spinner. In this case, there is no event loop and messages are delivered concurrently to the application.
[0064] Applications using this approach must have some thread contention code in order to function correctly.
[0065] In this disclosure we present a third alternative kind of parallel data processing. In this case we combine the multithread implementation of ROS with a lock-free event-loop as described in the previous section. This implementation is very well suited for situations like high frequency sensor fusion and it should enhance the performance of ROS substantially.
[0066] A simple benchmark test was developed. It consists of a set of eight sensor providers in the form of publishers. In the client side, we subscribe to these eight sources of data and do some simple data processing consisting of fusion of the data in a JSON object and save it to file.
[0067] The results show that by combining ROS with an eventloop as described further here, for example with reference to
[0068] Embodiments of the invention are not limited to the use of event loops and may also be used for data multiplexing as described further here.
[0069] The application for the single-threaded solution is represented in
[0070] In the subscription rate chart, as shown in
[0071] Let us now continue with the traditional multi-thread scenario, also shown in
[0072] Finally, let us look at the results of the solution according to some embodiments of the present invention using an event loop, for example as shown in
Space Testbed
[0073] The base application is quite similar to the one presented above for the robotics testbed. In this case, however, the data was received as a mix of SDO (Service Data Object), SDO block and PDO (Process Data Object) CANopen (controller area network) messages.
[0074] The testing of this solution was performed in two different environments: first on Zybo boards with PetaLinux installed, and secondly on a Zedboard with Xillinux installed.
[0075] The results presented here are for the PetaLinux setup, which performs substantially better than Xillinux. The reason is that PetaLinux is optimised for Zynq boards, while Xillinux is more oriented to development and prototyping.
[0076] The sensor fusion benchmark test was performed with the parameters presented in table I below.
[0077] Throughput: The main performance criterion is the rate at which data is processed, or throughput. The traditional multi-thread and event-loop approaches described here start to diverge when the total rate of all messages reaches 1500 Hz; when tested at a total of 3000 Hz, the difference between the two approaches is substantial, around 15%, as shown in
TABLE 1 — Data processing test parameters
  Parameter        Min     Max
  Threads          4       64
  SDO/PDO Rate     5 Hz    100 Hz
  SDO Block Rate   2 Hz    5 Hz
[0078] One noticeable feature is the small variability over time of the event-loop data processing compared with the traditional approach, where the application shows large variability.
[0079]
[0080] Secondly, the standard deviation of the execution time shows that the event loop disclosed here has a negligible value, while for the multi-thread application this value is quite large, making any application quite unpredictable.
[0081] Regarding resource utilization, the solution described here needs up to 10% less CPU for data processing. The reason is mainly the cost of context switching in traditional multi-thread applications. As for RAM usage, the solution described here needs more RAM than the alternative application, which is the only trade-off needed when using this software.
Comparison of Robotics and Space Results
[0082] In the ROS test, the event loop described here outperformed ROS alone with an impressive 40% increase; here, however, we have seen a 15% increase instead. The reason for this difference is twofold. First, CANopen is quite a complex protocol with a large data overhead, which makes it hard to scale given the amount of data exchanged back and forth. Second, this particular Linux distribution is quite optimal in terms of performance. Nevertheless, the solution described here achieves the same level of predictability and scalability found in the ROS test. The main difference in the space scenario is that the solution described here can also reduce CPU consumption. The reason for that comes from the cost of context switching in this resource-limited setup.
[0083] In most use cases, there is a combination of sensors producing data at high rates simultaneously with a slow and heavy data processing block. In a traditional operating system, this is solved by using different threads that deal with different parts of the application, e.g., one thread reads images from camera, another thread does the filtering/processing, etc.
[0084] In real-time operating systems, one would use the same sort of concepts: threads for the various classes of activities, and protected synchronization and inter-task communication. The scheduling methods are different, though, in order to prefer predictability over average-case behaviour. Just a few examples of issues: [0085] In the case of multi-core, there is strong execution-time interference via shared resources like busses, caches and main memory. Therefore, predictability in these cases is reduced. In highly critical cases (like avionics), this is a major problem. [0086] Scheduling threads on multi-core is more complex, as one needs to schedule in time (when) and space (which core). Moving a thread (e.g., after waking up from blocking) from one core to another may have a huge overhead in terms of context switch time, as the local data and instruction caches need to be filled first.
[0087] The software application described here provides, in some embodiments, for lock-free parallel data processing in autonomous embedded systems, such as, by way of example, robotic embedded systems or space embedded systems. Other systems which may benefit from the advantages offered by this system, include autonomous drones, vision navigation systems, autonomous rovers, self-driving vehicles, and others.
[0088] Advantageously, the software application is not restricted to a particular platform but may be installed in most computers, Linux distributions and in a variety of microcontrollers and embedded systems.
[0089] In this application, external data, such as, but not limited to, sensor data, is received by a producer and placed, for example by the publisher object, into pre-allocated object instances of a “Smart Object Pool” described further below, thus creating event objects for the received data. In other words, the “Smart Object Pool” comprises memory with pre-allocated spaces, also referred to here as “slots”, into which data is placed, for example according to the type of data or to a subscriber. A plurality of data may be received concurrently or, optionally, sequentially in time. It is important that the allocation of the memory space does not change during the processes according to embodiments of the invention. The allocation of space, for example according to data type as described elsewhere here, does not change. Only the occupation of the space may change, for example by being replaced with more recent data, for example after previous data has been used as required.
[0090] The producer may be a sensor, but may be any other external provider of data, in particular numerical data.
[0091] The Smart Object Pool may be a lock-free memory management system employing smart pointers, but embodiments of the invention are not limited to this.
[0092] In some embodiments of the invention, event data are copied by the asynchronous publisher into event objects located in the Smart Object Pool and subsequently moved into a circular lock-free batching queue, for example a ring buffer. By default, a copy operation may be performed, that is the data is duplicated in the batching queue, but this behaviour can be modified in special variants of the invention. Some members of the event object may not be copied, or copied by reference only.
[0093] In some embodiments of the present invention, the batching queue could hold events of variable size up to a settable maximum, and the user may be given the flexibility to manage memory allocation.
[0094] Event objects may be subsequently transferred from the ring buffer to an asynchronous subscriber, for example by a handler object.
[0095] According to some embodiments of the present invention, the asynchronous subscriber or consumer may comprise a “container” or group of listeners and may pass a reference to the object contained in the ring buffer to each registered listener. Once the system has determined that no further transfer of the event data to consumers is required, the asynchronous subscriber may send an appropriate message to the object pool, which frees the event as available for reuse.
[0096] In some embodiments of the present invention, the overall functionality of the method can be configured in the following manner: [0097] a. as a sensor multiplexer (one producer, multiple subscribers), or [0098] b. as a lock-free event loop (multiple producers, one subscriber).
[0099] In the first configuration, one producer/multiple subscribers, the memory may be pre-allocated to the producer.
[0100] In the second configuration, one subscriber/multiple producers, the memory may be allocated such that, for example, different kinds of data are allocated to different memory spaces, or “pools”. The size of the pool may be allocated in various ways, for example according to the frequency at which the data is generated by the producer. The frequency may correspond to the frequency of requests to publish data objects. Examples of different kinds of data may include but are not limited to working state, temperature, battery status and others.
[0101] In both examples, one smart object pool is provided per producer.
[0102] However the memory is allocated, the allocation remains constant during the cycle of publishing data to one or more subscribers.
[0103] A system according to some embodiments of the invention may be configured as a sensor multiplexer, in which case a listening thread may be created for each registered consumer linked to the publisher. Each listening thread may pass the event data by reference to its corresponding consumer. Once all consumers have processed an event object, it may be returned to the Smart Object Pool.
[0104] A system according to some embodiments of the invention may be configured as a lock-free event loop, in which case the event object may be passed to an event emitter, which invokes the consumer. The event emitter may take the form of an LMAX disruptor configured to work as a multiple producer/single consumer setup. Once the consumer has processed an event object, this event object is returned to the Smart Object Pool.
[0105] According to some embodiments of the invention, placing copies, as opposed to original event objects, into the batching queue for processing, and recycling processed event objects in the Smart Object Pool, avoids the need to periodically and actively free up memory at high CPU cost, and prevents the memory space from being cluttered with event data that have been processed and are no longer required, thus rendering the overall system more deterministic.
[0106] Employing the Smart Object Pool for data management has several advantages: [0107] Firstly, any incoming data has to be copied only once during the entire processing system. [0108] No memory allocations are required in this processing system. [0109] Waiting periods for allocation of free memory are avoided, as is the added CPU cost of a garbage collector, resulting in improved deterministic behaviour of the processing system, in accordance with some embodiments of the present invention. [0110] The Smart Object Pool is lock-free and therefore further reduces the CPU consumption of the claimed processing system when compared to conventional multi-thread or single-thread processing systems.
[0111] This efficiency is demonstrated in the increased data processing rate when compared to conventional multi-thread or single-thread processing systems (
[0112] As a further advantage over conventional multi-thread processing methods, some embodiments of the present invention may significantly decrease the standard deviation of processing time, for example at least 9-fold, as evidenced in the CANopen system for CubeSat (
[0113] The improved standard deviation of this method renders it particularly suitable for embedded systems with a need for a high degree of reliability, reproducibility and determinism.
[0114] A system according to some embodiments of the invention is illustrated in
[0115] A method according to some embodiments of the invention will now be described with reference to
[0116] The publisher 10 may initiate the copying, or cloning, of the data object and placing it into the pre-allocated space in the memory at operation 1903. The data object or, optionally, a pointer to the object, may then be placed into a batching queue, for example the ring buffer 14, at operation 1905.
[0117] The ring buffer, or the ring buffer's consumer sequence, may publish the cloned data object, in other words notify subscribers of newly available cloned data. This may take place by delivering the cloned event to one or more subscribers, for example the asynchronous subscriber 16, at operation 1607. The subscriber 16 may comprise one or more listeners as mentioned above, in which case the subscriber may then invoke all listeners associated with the publisher 10. A decision is made at operation 1609 as to whether all listeners have finished processing the data. In the affirmative, i.e. after the subscriber, or all listeners comprised in the subscriber, have finished processing the data as indicated at operation 1609, the cloned data object is recycled back to the pool 12 at operation 1611. In other words, the part of the memory space previously occupied by the cloned data object is released for the publication of a new data object. However, if the subscriber, or all the listeners comprised in the subscriber, have not finished processing the data, as indicated by decision 1609, the system waits, at operation 1613, for the subscriber, or all the listeners 19a-b comprised in the subscriber, to finish processing the data before the decision operation 1609 is repeated.
[0118] It is known in the art how to check whether subscribers no longer require a data object. For example, references in code to a pointer to a data object, such as may be generated by subscribers, may be monitored. When no more such references are detected, it may be assumed that the data object is no longer required. This may be achieved using smart pointers as known in C++ 11 onwards. An important aspect of embodiments of the invention is that when new data is to be published it is not necessary to request a memory space from the operating system, as would normally be required. Because the memory is pre-allocated and the allocation does not change, the memory space is always available for the data. This contributes to the speed of operation of methods according to embodiments of the invention. Asking the operating system for memory is costly in terms of processing power and time, and more importantly it is not a predictable operation and takes an undetermined amount of time.
[0119] An example of a container of listeners is two or more listeners that are interested in different parts of a data object.
[0120] Two embodiments of the system shown in
[0121]
[0122] Sequence barriers are part of the original LMAX disruptor pattern. They keep pointers to the next element in the ring buffer: in the case of the consumer barrier, it is the pointer to the next data to be processed; in the case of the producer barrier (see
[0123] The smart object pool 12 is shown to be divided into separate slots 12a-12g. Notably also in
[0124] In the case of the data multiplexer, one instance of the data multiplexer may be provided for each different type of data. One data multiplexer may be able to handle only one type of data; hence, only one smart object pool is needed. Smart object pools may also be data type dependent.
[0125] The embodiment of
[0126] When the producer 20 is invoked, for example via a publish API, to publish certain data, an object memory slot corresponding to the data is requested from the smart object pool 12 and a pointer or other identifier of the slot containing the data to be published is returned to the producer 20. The producer 20 sends the object address/pointer to the next available sequence in the ring buffer 14. The data itself can then be accessed via the ring buffer 14 using the address/pointer.
[0127] In this embodiment, the ring buffer 14 is connected to a consumer sequence barrier 18. The consumer sequence barrier 18 is connected to the consumers 19 and the single smart object pool 12. The consumer sequence barrier 18 handles/regulates access to the data in the ring buffer 14 for the consumers 19, for example based on policy data associated with specific consumers 19. For example, a consumer A may only access data from slot 3 of the ring buffer 14 if a certain policy is satisfied. The barrier 18 may have a most-recent-data policy for the consumers 19 so that old events are discarded.
[0128] According to some embodiments of the invention, a policy may be that of independent consumer (subscriber) rates. In other words, different consumers may consume data at different rates. Embodiments of the invention may be used to ensure access to the most recent data and that data will be kept in the batching queue until processing by the subscribers is finished.
[0129] Supposing that the consumed data was provided from memory slot 12a, when the last consumer has finished processing the data, the memory slot 12a is returned to the smart object pool, or released to be available for new data. This may be done, for example, by the consumers 19 notifying the consumer sequence barrier 18 and the consumer sequence barrier 18 notifying the smart object pool, as indicated in
[0130] In the multiple producer 20a, 20b/single consumer 19 embodiment shown in
[0131] The operations performed in the system of
[0132]
[0133] Referring to
[0134] In operation 1920, upon receiving the address/pointer for the memory slot e.g. 12a, from the smart object pool 12, the producer sends the object address to the next available address in the ring buffer as shown in
[0135] In operation 1930, the consumer sequence barrier 18 receives a request from one or more consumers to access data in the ring buffer 14. The consumer sequence barrier 18 checks the received request(s) against at least one policy associated with the consumer(s) making the request(s). Based on the result of the check(s) (operation 1940), the consumer sequence barrier 18 either allows the consumer(s) to access the data in the ring buffer 14 at operation 1950 or ends the process by returning a null at operation 1945.
[0136] In operation 1950, the consumer sequence barrier determines whether the consumer(s) have processed/accessed all required data from the ring buffer. The determination could be made based on at least one policy associated with the consumer(s). Based on the determination, the consumer sequence barrier 18 returns the object memory slot to the smart object pool at operation 1960 as shown in
[0137]
[0138]
[0139] The claimed method may be carried out on a physical entity, whereby producers and consumers are independent software entities which may either be part of the same physical entity or run independently from it.
[0140] As noted previously, in the described embodiments of the invention the system may be implemented as a single computing device. Such a device may comprise one or more processors which may be microprocessors, controllers or any other suitable type of processors for processing computer executable instructions to control the operation of the device. The device may also comprise memory, e.g. random access memory, as well as storage optionally in the form of flash memory. Depending on the system requirements, additional capabilities may be provided as is well known in the art such as external inputs and outputs, wireless connectivity and others.
[0141] It will be understood that the benefits and advantages described above may relate to one embodiment or may relate to several embodiments. The embodiments are not limited to those that solve any or all of the stated problems or those that have any or all of the stated benefits and advantages. Variants should be considered to be included into the scope of the invention.
[0142] The figures illustrate exemplary methods. While the methods are shown and described as being a series of acts or operations that are performed in a particular sequence, it is to be understood and appreciated that the methods are not limited by the order of the sequence. For example, some acts or operations can occur in a different order than what is described herein. In addition, an act can occur concurrently with another act. Further, in some instances, not all acts may be required to implement a method described herein.
[0143] Moreover, the acts or operations described herein may comprise computer-executable instructions that can be implemented by one or more processors and/or stored on a computer-readable medium or media. The computer-executable instructions can include routines, sub-routines, programs, threads of execution, and/or the like. Still further, results of acts of the methods can be stored in a computer-readable medium, displayed on a display device, and/or the like.
[0144] It will be understood that the above description of a preferred embodiment is given by way of example only and that various modifications may be made by those skilled in the art. What has been described above includes examples of one or more embodiments. It is, of course, not possible to describe every conceivable modification and alteration of the above devices or methods for purposes of describing the aforementioned aspects, but one of ordinary skill in the art can recognize that many further modifications and permutations of various aspects are possible. Accordingly, the described aspects are intended to embrace all such alterations, modifications, and variations that fall within the scope of the appended claims.
[0145] There is disclosed here a computer-implemented method as described in the following clauses: [0146] Clause 1. A computer-implemented method of lock-free parallel data processing in an autonomous embedded system comprising: [0147] one or more producer objects reading a plurality of sensors providing data from at least one external source, [0148] a Smart Object Pool containing pre-allocated object instances, [0149] one asynchronous publisher object arranged to [0150] acquire an event object from the pre-allocated objects in the Smart Object Pool, [0151] copy the single instance of data published by the producer into the acquired event object, and place the event object or, optionally, a pointer to the event object into a batching queue, [0152] a handler arranged for asynchronously delivering the event objects in the batching queue to one or more subscriber objects, [0153] said one or more subscribers being arranged for sending the event objects to one or more consumers and for returning each event object to the Smart Object Pool once it determines that no consumer still needs the event. [0154] Clause 2. The computer-implemented method of the preceding clause, said batching queue consisting of a ring buffer. [0155] Clause 3. The computer-implemented method of any one of the preceding clauses consisting in either: [0156] an Event Loop system, wherein event objects are received from one or a plurality of producers, and wherein each event object is transmitted to one event loop consumer, or [0157] a Sensor Multiplexer system, wherein event objects are received from a single producer thread, and wherein each event object is multiplexed to one or to a subset of a plurality of consumers. [0158] Clause 4. The computer-implemented method of any one of the preceding clauses wherein the producers and/or the consumers are software entities running in independent threads in a single process.