Parallel signal processing system and method

09832543 · 2017-11-28

Abstract

A system and method for processing a plurality of channels, for example audio channels, in parallel is provided. For example, a plurality of telephony channels are processed in order to detect and respond to call progress tones. The channels may be processed according to a common transform algorithm. Advantageously, a massively parallel architecture is employed, in which operations on many channels are synchronized, to achieve a high efficiency parallel processing environment. The parallel processor may be situated on a data bus, separate from a main general purpose processor, or integrated with the processor on a common board or in an integrated device. All, or a portion, of a speech processing algorithm may also be performed in a massively parallel manner.

Claims

1. A method for processing signal streams within a plurality of channels, comprising: (a) providing a plurality of timeslices of the signal streams within the plurality of channels, each timeslice representing a time-continuous portion of digitized data of a respective signal stream within a respective channel; (b) loading the plurality of timeslices into a memory accessible by at least one multiprocessor, each respective multiprocessor having a plurality of processing cores, each of the plurality of processing cores being controlled in accordance with a common instruction sequence for the respective multiprocessor, such that the plurality of processing cores of a respective multiprocessor synchronously and concurrently execute the same processing instruction on data of a different signal stream; (c) providing the common instruction sequence for the multiprocessor adapted to process each timeslice to produce at least one of processed data and at least one data processing result for each respective timeslice; (d) executing the common instruction sequence on the at least one multiprocessor, to produce at least one of processed data and at least one data processing result for each respective timeslice, comprising at least one of a time-frequency domain transform and a wavelet transform of the time-continuous portion of digitized data of each respective timeslice in parallel; and (e) storing the at least one of processed data and at least one data processing result for each respective timeslice in a memory.

2. The method according to claim 1, wherein the signal streams each comprise telephone signals, and the common instruction sequence comprises a call progress tone analysis algorithm.

3. The method according to claim 1, wherein the common instruction sequence executes substantially without data-dependent conditional execution branch instructions.

4. The method according to claim 1, wherein the common instruction sequence executes substantially without interaction between respective channels.

5. The method according to claim 1, wherein the at least one multiprocessor is provided as a coprocessor to a general purpose processor, further comprising transferring the stored at least one of processed data and at least one data processing result for each respective timeslice from a first memory device associated with the at least one multiprocessor to a second memory device associated with the general purpose processor.

6. The method according to claim 1, wherein each signal stream comprises a telephone communication stream, and the common instruction sequence controls each respective processing core to concurrently execute a call progress tone detection algorithm, further comprising responding to a detected call progress tone represented in at least one telephone communication stream.

7. The method according to claim 1, wherein the common instruction sequence controls each respective processing core to concurrently execute a time-frequency domain transform.

8. The method according to claim 1, wherein the common instruction sequence controls each respective processing core to concurrently execute a wavelet transform.

9. The method according to claim 1, wherein the signal streams comprise human voice, and the data processing result is employed to perform speech recognition for a respective channel.

10. The method according to claim 1, wherein the common instruction sequence comprises a Goertzel filter algorithm.

11. The method according to claim 1, wherein the multiprocessor processes sequential timeslices of a plurality of audio data channels with real-time throughput.

12. The method according to claim 1, wherein the multiprocessor is part of a graphic processing unit, further comprising using the stored processing result to control a telephone switching system.

13. The method according to claim 1, wherein the common instruction sequence defines a Fourier transform.

14. The method according to claim 1, wherein the stored at least one of processed data and at least one data processing result for each respective timeslice in a memory is analyzed to determine an existence of a predetermined in-band signal.

15. The method according to claim 1, wherein the at least one of processed data and at least one data processing result for each respective timeslice represents a joint processing result representing a contribution from each of a plurality of independent data streams, wherein the joint processing result is produced in real time at a fixed maximum latency after the time-continuous portion of digitized data is stored in the memory.

16. The method according to claim 15, wherein the plurality of independent data streams comprise a plurality of telephone communication channels, wherein the common instruction sequence defines an algorithm adapted to process each timeslice to identify an in-band signaling tone of a respective telephone communication channel.

17. The method according to claim 16, wherein the instruction sequence defines a Goertzel filter algorithm to identify the in-band signaling tone, further comprising receiving the identification of the in-band signaling tone by a host general purpose programmable processor and performing a communication channel control function by the general purpose programmable processor in dependence on the identification.

18. A method for processing a set of parallel signal streams, comprising: (a) receiving a series of timeslices over time for each respective parallel signal stream, each timeslice representing a time-continuous portion of digitized data of a respective signal stream loaded into a memory accessible by a multiprocessor having a plurality of parallel processing cores, the plurality of parallel processing cores being under control of a common instruction sequence, such that the plurality of parallel processing cores synchronously and concurrently execute the same processing instruction on a respective timeslice of a different signal stream; and (b) executing the common instruction sequence on the multiprocessor, to produce a respective processing result for each of the parallel signal streams, comprising convolution processing and at least one of a time-frequency domain transform and a wavelet transform of the time-continuous portion of digitized data of each respective timeslice.

19. A parallel processing method for processing a plurality of audio channels comprising speech in parallel, comprising: receiving the plurality of audio channels comprising speech in parallel; loading a current set of parallel timeslices from the plurality of audio channels comprising speech into a memory array accessible by a multiprocessor system, the multiprocessor system having a plurality of processing cores together controlled in parallel by a common instruction sequence; processing the current set of parallel timeslices from the plurality of audio channels comprising speech with the multiprocessor system, according to the common instruction sequence such that each processing core of the multiprocessor is synchronized to concurrently execute the same instruction on data representing a timeslice of a respective audio channel, to at least one of characterize the speech of a respective audio channel, characterize an automated in-band signal of a respective audio channel, and characterize a respective audio channel, to produce a set of current characterizations for each of the set of parallel timeslices stored in a memory; loading a subsequent set of parallel timeslices from the plurality of audio channels comprising speech into the same memory locations of the memory array as previously occupied by the current set of parallel timeslices; and processing the subsequent set of parallel timeslices from the plurality of audio channels comprising speech according to an algorithm comprising at least one of a time-frequency domain transform and a wavelet transform with the multiprocessor system, according to the common instruction sequence such that each processing core of the multiprocessor is synchronized to concurrently execute the same instruction on data representing a timeslice of a respective audio channel, to at least one of characterize the speech of a respective audio channel, characterize an automated in-band signal of a respective audio channel, characterize
the respective audio channel, and to determine a change in a respective prior characterization associated with a respective audio channel, to produce a set of subsequent characterizations for each of the set of parallel timeslices stored in the memory.

Description

BRIEF DESCRIPTION OF THE DRAWINGS

(1) FIG. 1 is a schematic diagram of a system for implementing the invention.

(2) FIG. 2 is a flowchart of operations within a host processor.

(3) FIG. 3 is a schematic diagram showing operations with respect to a massively parallel coprocessor.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS

(4) One embodiment of the present invention provides a system and method for analyzing call progress tones and performing other types of audio band processing on a plurality of voice channels, for example in a telephone system. Examples of call progress tone analysis can be found at: www.commetrex.com/products/algorithms/CPA.html; www.dialogic.com/network/csp/appnots/10117_CPA_SR6_HMP2.pdf; whitepapers.zdnet.co.uk/0,1000000651,260123088p,00.htm; and www.pikatechnologies.com/downloads/samples/readme/6.2%20-%20Call%20Progress%20Analysis%20-%20ReadMe.txt.

(5) In a modest size system for analyzing call progress tones, there may be hundreds of voice channels to be handled simultaneously. Indeed, the availability of a general purpose call progress tone processing system permits systems to define non-standard or additional signaling capabilities, thus reducing the need for out-of-band signaling. Voice processing systems generally require real time performance; that is, connections must be maintained and packets or streams forwarded within narrow time windows, and call progress tones processed within tight specifications.

(6) An emerging class of telephone communication processing system implements a private branch exchange (PBX) switch, which employs a standard personal computer (PC) as a system processor, and employs software which executes on a general purpose operating system (OS).

(7) For example, the Asterisk system runs on the Linux OS. More information about Asterisk may be found at Digium/Asterisk, 445 Jan Davis Drive NW, Huntsville, Ala. 35806, 256.428.6000 asterisk.org/downloads. Another such system is: “Yate” (Yet Another Telephony Engine), available from Bd. Nicolae Titulescu 10, Bl. 20, Sc. C, Ap. 128 Sector 1, Bucharest, Romania yate.null.ro/pmwiki/index.php?n=Main.Download.

(8) In such systems, scalability to desired levels, for example hundreds of simultaneous voice channels, requires that the host processor have sufficient headroom to perform all required tasks within the time allotted. Alternately stated, the tasks performed by the host processor should be limited to those it is capable of completing without contention or undue delay. Because digitized audio signal processing is resource intensive, PC-based systems have typically not implemented functionality that requires per-channel signal processing, or have offloaded the processing to specialized digital signal processing (DSP) boards. Further, such DSP boards are themselves limited, for example to 8-16 processed voice channels per DSP core, with 4-32 cores per board, although higher density boards are available. These boards are relatively expensive, as compared to the general purpose PC, and occupy a limited number of bus expansion slots.

(9) The present invention provides an alternative to the use of specialized DSP processors dedicated to voice channel processing. According to one embodiment, a massively parallel processor as available in a modern video graphics processor (though not necessarily configured as such) is employed to perform certain audio channel processing tasks, providing substantial capacity and versatility. One example of such a video graphics processor is the nVidia Tesla™ GPU, using the CUDA software development platform (“GPU”). This system provides 8 banks of 16 processors (128 processors total), each processor capable of handling a real-time fast Fourier transform (FFT) on 8-16 channels. For example, the FFT algorithm facilitates subsequent processing to detect call progress tones, which may be detected in the massively parallel processor environment, or using the host processor after downloading the FFT data. One particularly advantageous characteristic of implementing a general purpose FFT algorithm rather than specific call tone detection algorithms is that a number of different call tone standards (and extensions/variants thereof) may be supported, and the FFT data may be used for a number of different purposes, for example speech recognition, etc.

(10) Likewise, the signal processing is not limited to FFT algorithms, and therefore other algorithms may also or alternately be performed. For example, wavelet based algorithms may provide useful information.
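The transform-plus-threshold approach of paragraphs (9) and (10) can be sketched in a few lines. The Python below is illustrative only, not the patented implementation: the frame length, 8 kHz sampling rate, and hypothetical 500 Hz probe tone are assumptions chosen so the tone lands exactly on an analysis bin, and a real system would run a batched GPU FFT across all channels rather than a per-channel loop.

```python
import cmath
import math

def dft_bin(frame, k):
    """Naive single-bin DFT of a real-valued frame (O(N) per bin)."""
    n = len(frame)
    return sum(x * cmath.exp(-2j * math.pi * k * i / n)
               for i, x in enumerate(frame))

def tone_present(frame, fs, tone_hz, threshold):
    """True if the normalized magnitude at the bin nearest tone_hz exceeds threshold."""
    n = len(frame)
    k = round(tone_hz * n / fs)        # nearest analysis bin
    return abs(dft_bin(frame, k)) / n > threshold

# One 20 ms timeslice per channel; on the MPC every core would run this
# same instruction sequence on a different channel's slice.
FS = 8000                               # assumed sampling rate
N = 160                                 # 20 ms at 8 kHz
t = [i / FS for i in range(N)]
tone = [math.sin(2 * math.pi * 500 * x) for x in t]   # hypothetical 500 Hz tone
silence = [0.0] * N
```

A pure sinusoid at an exact bin frequency yields a normalized magnitude of 0.5 at that bin, so a threshold well below 0.5 separates tone from silence.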

(11) The architecture of the system provides a dynamic link library (DLL) available for calls from the telephony control software, e.g., Asterisk. An application programming interface (API) provides communication between the telephony control software (TCS) and the DLL. This TCS is either unmodified or minimally modified to support the enhanced functionality, which is separately compartmentalized.

(12) The TCS, for example, executes a process which calls the DLL, causing the DLL to transfer data from a buffer holding, e.g., 2 mS of voice data for, e.g., 800 voice channels, from main system memory of the PC to the massively parallel coprocessor (MPC), which is, for example, an nVidia Tesla™ platform. The DLL has previously uploaded to the MPC the algorithm, which is, for example, a parallel FFT algorithm, which operates on all 800 channels simultaneously. It may, for example, also perform tone detection, and produce an output in the MPC memory of the FFT-representation of the 800 voice channels, and possibly certain processed information and flags. The DLL then transfers the information from the MPC memory to PC main memory for access by the TCS, or other processes, after completion.
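Paragraph (12) moves a buffer holding 2 mS of data for 800 channels between host memory and the MPC. Below is a minimal sketch of one plausible channel-major layout for such a buffer; the 8 kHz sampling rate (hence 16 samples per channel per timeslice) and the flat-list representation are assumptions for illustration, not details taken from the patent.

```python
CHANNELS = 800
FS = 8000                          # assumed sampling rate
SLICE_MS = 2
SAMPLES = FS * SLICE_MS // 1000    # 16 samples per channel per timeslice

def channel_slice(buf, ch):
    """One channel's timeslice inside a channel-major flat buffer."""
    return buf[ch * SAMPLES:(ch + 1) * SAMPLES]

# Dummy flat buffer as it might sit in shared memory before the DLL
# hands it to the coprocessor; channel ch holds the constant value ch
# so slices are easy to inspect.
buf = []
for ch in range(CHANNELS):
    buf.extend([float(ch)] * SAMPLES)
```

With this layout, a single contiguous memory-to-memory transfer carries all 800 timeslices, and each GPU core indexes its own slice by channel number.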

(13) While the MPC has massive computational power, it has somewhat limited controllability. For example, a bank of 16 DSPs in the MPC is controlled by a single instruction pointer, meaning that the algorithms executing within the MPC generally cannot be data-dependent in execution or use conditional-contingent branching, since this would require each thread to execute different instructions, and thus dramatically reduce throughput. Therefore, the algorithms are preferably designed to avoid such processes, and should generally be deterministic and non-data-dependent algorithms. On the other hand, it is possible to perform contingent or data-dependent processing, though the gains from the massively parallel architecture are limited, and thus channel-specific processing is possible. Advantageously, implementations of the FFT algorithm are employed which meet the requirements for massively parallel execution. For example, the CUDA™ technology environment from nVidia provides such algorithms. Likewise, post processing of the FFT data to determine the presence of tones poses a limited burden on the processor(s), and need not be performed under massively parallel conditions. This tone extraction process may therefore be performed on the MPC or the host PC processor, depending on respective processing loads and headroom.
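Paragraph (13)'s point about lockstep execution can be illustrated with a toy example. The two functions below compute the same per-channel flags, but the second replaces the data-dependent branch with an arithmetic select (predication), which is the style the text recommends for processors sharing one instruction pointer. Plain Python stands in here for GPU code.

```python
def detect_branchy(energies, threshold):
    """Per-channel flagging with a data-dependent branch.

    On a lockstep multiprocessor, channels taking different branches
    would force the cores to serialize (diverge)."""
    flags = []
    for e in energies:
        if e > threshold:
            flags.append(1)
        else:
            flags.append(0)
    return flags

def detect_branchless(energies, threshold):
    """Same result as a branch-free select: every core executes the
    identical instruction stream regardless of its channel's data."""
    return [int(e > threshold) for e in energies]
```

GPU compilers perform this transformation automatically for short conditionals, but algorithms written branch-free from the start, as the text advises, avoid divergence entirely.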

(14) In general, the FFT itself should be performed in a faster-than-real-time manner. For example, it may be desired to implement overlapping FFTs, e.g., examining 2 mS of data every 1 mS, including memory-to-memory transfers and associated processing. Thus, for example, it may be desired to complete the FFT of 2 mS of data on the MPC within 0.5 mS. Assuming, for example, a sampling rate of 8.4 kHz, and an upper frequency within a channel of 3.2-4 kHz, the 2 mS sample would generally imply a 256 point FFT, which can be performed efficiently and quickly on the nVidia Tesla™ platform, including any required windowing and post processing.
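The overlapping analysis of paragraph (14), examining 2 mS of data every 1 mS, amounts to a sliding window whose hop is half the window. A minimal sketch of that framing step follows; the 8 kHz rate and the resulting 16-sample window are illustrative assumptions, not the patent's figures.

```python
def overlapped_frames(samples, win, hop):
    """Split one channel's sample stream into overlapping analysis frames."""
    return [samples[i:i + win]
            for i in range(0, len(samples) - win + 1, hop)]

FS = 8000               # assumed sampling rate
WIN = 2 * FS // 1000    # 2 ms window: 16 samples
HOP = 1 * FS // 1000    # 1 ms hop:    8 samples

stream = list(range(64))                # 8 ms of dummy samples
frames = overlapped_frames(stream, WIN, HOP)
```

Each frame overlaps its neighbor by half, so a tone onset is never split across frame boundaries by more than one hop interval.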

(15) Therefore, the use of the present invention permits the addition of call progress tone processing and other per channel signal processing tasks to a PC based TCS platform without substantially increasing the processing burden on the host PC processor, and generally permits such a platform to add generic call progress tone processing features and other per channel signal processing features without substantially limiting scalability.

(16) Other sorts of parallel real time processing are also possible, for example analysis of distributed sensor signals such as “Motes” or the like. See, en.wikipedia.org/wiki/Smartdust. The MPC may also be employed to perform other telephony tasks, such as echo cancellation, conferencing, tone generation, compression/decompression, caller ID, interactive voice response, voicemail, packet processing and packet loss recovery algorithms, etc.

(17) Similarly, simultaneous voice recognition can be performed on hundreds of simultaneous channels, for instance in the context of directing incoming calls based on customer responses at a customer service center. Advantageously, in such an environment, processing of particular channels may be switched between banks of multiprocessors, depending on the processing task required for the channel and the instructions being executed by the multiprocessor. Thus, to the extent that the processing of a channel is data dependent, but the algorithm has a limited number of different paths based on the data, the MPC system may efficiently process the channels even where the processing sequence and instructions for each channel are not identical.

(18) FIG. 1 shows a schematic of system for implementing the invention.

(19) Massively multiplexed voice data 101 is received at network interface 102. The network could be a LAN, Wide Area Network (WAN), Primary Rate ISDN (PRI), a traditional telephone network with Time Division Multiplexing (TDM), or any other suitable network. This data may typically include hundreds of channels, each carrying a separate conversation and also routing information. The routing information may be in the form of in-band signaling of dual-tone multi-frequency (DTMF) audio tones received from a telephone keypad or DTMF generator. The channels may be encoded using digital sampling of the audio input prior to multiplexing. Typically, voice channels will come in 20 ms frames.
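Claims 10 and 17 name the Goertzel filter, and paragraph (19) describes DTMF routing tones; the two meet in the classic DTMF detector sketched below. This is a textbook Goertzel, not the patent's code; the 205-sample frame at 8 kHz is the conventional choice of parameters, and a deployable detector would add validity checks (twist, relative energy, duration) that are omitted here.

```python
import math

ROWS = [697, 770, 852, 941]       # DTMF low group (Hz)
COLS = [1209, 1336, 1477, 1633]   # DTMF high group (Hz)
DIGITS = ["123A", "456B", "789C", "*0#D"]

def goertzel_power(frame, fs, freq):
    """Squared magnitude of frame at freq via the Goertzel recurrence."""
    n = len(frame)
    k = round(freq * n / fs)
    coeff = 2.0 * math.cos(2.0 * math.pi * k / n)
    s1 = s2 = 0.0
    for x in frame:
        s1, s2 = x + coeff * s1 - s2, s1
    return s1 * s1 + s2 * s2 - coeff * s1 * s2

def detect_digit(frame, fs=8000):
    """Strongest row and column tone, mapped to a keypad digit."""
    row = max(range(4), key=lambda i: goertzel_power(frame, fs, ROWS[i]))
    col = max(range(4), key=lambda i: goertzel_power(frame, fs, COLS[i]))
    return DIGITS[row][col]
```

Because Goertzel evaluates only the eight bins of interest, it is cheaper than a full FFT when call progress or DTMF tones are all that is needed, which is why the claims single it out.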

(20) The system according to a preferred coprocessor embodiment includes at least one host processor 103, which may be programmed with telephony software such as Asterisk or Yate, cited above. The host processor may be of any suitable type, such as those found in PCs, for example an Intel Core 2 Duo or Quad, or an AMD Athlon X2. The host processor communicates with MPC 105 via shared memory 104, which is, for example, 2 GB or more of DDR2 or DDR3 memory.

(21) Within the host processor, application programs 106 receive demultiplexed voice data from interface 102, and generate service requests for services that cannot, or are desired not to, be processed in real time within the host processor itself. These service requests are stored in a service request queue 107. A service calling module 108 organizes the service requests from the queue 107 for presentation to the MPC 105.

(22) The module 108 also reports results back to the user applications 106, which in turn put processed voice data frames back on the channels in real time, such that the next set of frames coming in on the channels 101 can be processed as they arrive.

(23) FIG. 2 shows a process within module 108. In this process, a timing module 201 keeps track of a predetermined real time delay constraint. Since standard voice frames are 20 ms long, this constraint should be significantly less than that to allow operations to be completed in real time. A 5-10 ms delay would very likely be sufficient; however, a 2 ms delay would give a degree of comfort that real time operation will be assured. Then, at 202, blocks of data requesting service are organized into the queue or buffer. At 203, the service calling module examines the queue to see what services are currently required. Some MPCs, such as the nVidia Tesla™ C870 GPU, require that each processor within a multiprocessor of the MPC perform the same operations in lockstep. For such MPCs, it will be necessary to choose all requests for the same service at the same time. For instance, all requests for an FFT should be grouped together and requested at once. Then all requests for a Mix operation might be grouped together and requested after the FFTs are completed, and so forth. The MPC 105 will perform the services requested and return the results to shared memory 104. At 204, the service calling module will retrieve the results from shared memory and at 205 will report the results back to the application program. At 206, it is tested whether there is more time and whether more services are requested. If so, control returns to element 202. If not, at 207, the MPC is triggered to sleep (or be available to other processes) until another time interval determined by the real time delay constraint is begun. FIG. 3 shows an example of running several processes on data retrieved from the audio channels. The figure shows the shared memory 104 and one of the processors 302 from the MPC 105. The processor 302 first retrieves one or more blocks from the job queue or buffer 104 that are requesting an FFT and performs the FFT on those blocks.
The other processors within the same multiprocessor array of parallel processors are instructed to do the same thing at the same time (on different data). After completion of the FFT, more operations can be performed. For instance, at 304 and 305, the processor 302 checks shared memory 104 to see whether more services are needed. In the examples given, mixing 304 and decoding 305 are requested by module 108, sequentially. Therefore these operations are also performed on data blocks retrieved from the shared memory 104. The result or results of each operation are placed in shared memory upon completion of the operation, where those results are retrievable by the host processor.
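The scheduling described around FIG. 2 and FIG. 3, gathering all pending FFT requests into one lockstep batch, then the Mix batch, and so on, can be sketched as a grouping step. The Python below stands in for module 108's host-side logic; the request tuple shape and the fixed service order are assumptions for illustration.

```python
from collections import defaultdict

SERVICE_ORDER = ["FFT", "MIX", "ENC", "DEC"]   # assumed fixed dispatch order

def group_requests(requests):
    """Group queued (service, channel_id, data) requests by service type,
    preserving arrival order within each group, so each group can be
    submitted to the parallel array as a single lockstep batch."""
    batches = defaultdict(list)
    for service, channel_id, data in requests:
        batches[service].append((channel_id, data))
    return batches

def dispatch_order(batches):
    """Services to run this cycle, in the fixed order, skipping empty ones."""
    return [s for s in SERVICE_ORDER if batches.get(s)]
```

Grouping by service before dispatch is what lets every core in a multiprocessor execute the same instruction at the same time, as the lockstep requirement of paragraph (23) demands.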

(24) In the case of call progress tones, these three operations together: FFT, mixing, and decoding, will determine the destination of a call associated with the block of audio data for the purposes of telephone switching.

(25) If module 108 sends more requests for a particular service than can be accommodated at once, some of the requests will be accumulated in a shared RAM 109 to be completed in a later processing cycle. The MPC will be able to perform multiple instances of the requested service within the time constraints imposed by the loop of FIG. 2. Various tasks may be assigned priorities, or deadlines, and therefore the processing of different services may be selected for processing based on these criteria, and need not be processed in strict order.
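Paragraph (25) says overflow requests wait in shared RAM 109 for a later cycle and that tasks may carry priorities or deadlines. A minimal sketch of such a deadline-ordered overflow queue follows, using a heap; the batch limit and the (deadline, request) tuple shape are assumptions, not the patent's design.

```python
import heapq

class ServiceScheduler:
    """Deadline-ordered pending queue; at most batch_limit requests are
    taken per processing cycle, and the rest stay queued for a later
    cycle (cf. shared RAM 109)."""

    def __init__(self, batch_limit):
        self.batch_limit = batch_limit
        self._pending = []             # heap of (deadline, request)

    def submit(self, deadline, request):
        heapq.heappush(self._pending, (deadline, request))

    def next_batch(self):
        """Earliest-deadline requests for this cycle, in deadline order."""
        batch = []
        while self._pending and len(batch) < self.batch_limit:
            batch.append(heapq.heappop(self._pending)[1])
        return batch
```

Earliest-deadline-first selection matches the text's point that services need not be processed in strict arrival order once deadlines are attached.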

(26) The following is some pseudo code illustrating embodiments of the invention as implemented in software. The disclosure of a software embodiment does not preclude the possibility that the invention might be implemented in hardware.

Embodiment #1

(27) Data Structures to be Used by Module 108

(28) RQueueType Structure // Job Request Queue

(29) ServiceType

(30) ChannelID // Channel Identifier

(31) VoiceData // Input Data

(32) Output // Output Data

(33) End Structure

(34) // This embodiment uses a separate queue for each type of service to be requested.

(35) // The queues have 200 elements in them. This number is arbitrary and could be adjusted

(36) // by the designer depending on anticipated call volumes and numbers of processors available

(37) // on the MPC. Generally the number does not have to be as large as the total of number

(38) // of simultaneous calls anticipated, because not all of those calls will be requesting services

(39) // at the same time.

(40) RQueueType RQueueFFT[200] // Maximum of 200 Requests FFT

(41) RQueueType RQueueMIX[200] // Maximum of 200 Requests MIX

(42) RQueueType RQueueENC[200] // Maximum of 200 Requests ENC

(43) RQueueType RQueueDEC[200] // Maximum of 200 Requests DEC

(44) Procedures to be Used by Module 108

(45) // Initialization Function

Init:
    Initialize Request Queue
    Initialize Service Entry
    Start Service Poll Loop
// Service Request Function

(47) ReqS:
    Case ServiceType
        FFT:
            Lock RQueueFFT
            Insert Service Information into RQueueFFT
            Unlock RQueueFFT
        MIX:
            Lock RQueueMIX
            Insert Service Information into RQueueMIX
            Unlock RQueueMIX
        ENC:
            Lock RQueueENC
            Insert Service Information into RQueueENC
            Unlock RQueueENC
        DEC:
            Lock RQueueDEC
            Insert Service Information into RQueueDEC
            Unlock RQueueDEC
    End Case
    Wait for completion of Service
    Return output
// Service Poll Loop
// This loop is not called by the other procedures. It runs independently. It will keep track of
// where the parallel processors are in their processing. The host will load all the requests for a
// particular service into the buffer. Then it will keep track of when the services are completed
// and load new requests into the buffer.
//
SerPL: Get timestamp and store in St

(48) // Let's do FFT/FHT

(49) Submit RQueueFFT with FFT code to GPU

(50) For all elements in RQueueFFT
    Signal Channel of completion of service

(51) End For

(52) // Let's do mixing

(53) Submit RQueueMIX with MIXING code to GPU

(54) For all elements in RQueueMIX
    Signal Channel of completion of service

(55) End For

(56) // Let's do encoding

(57) Submit RQueueENC with ENCODING code to GPU

(58) For all elements in RQueueENC
    Signal Channel of completion of service

(59) End For

(60) // Let's do decoding

(61) Submit RQueueDEC with DECODING code to GPU

(62) For all elements in RQueueDEC
    Signal Channel of completion of service

(63) End For

(64) // Make sure it takes the same amount of time for every pass

(65) Compute time difference between now and St

(66) Sleep that amount of time

(67) Goto SerPL // second pass

(68) Examples of Code in Application Programs 106 for Calling the Routines Above

(69) Example for Calling “Init”

(70) // we have to initialize PStar before we can use it

(71) Call Init

(72) Example for Requesting an FFT

(73) // use FFT service for multitone detection

(74) Allocate RD as RQueueType

(75) RD.Service=FFT

(76) RD.ChannelID=Current Channel ID

(77) RD.Input=Voice Data

(78) Call ReqS(RD)

(79) Scan RD.Output for presence of our tones

(80) Example for Requesting Encoding

(81) // use Encoding service

(82) Allocate RD as RQueueType

(83) RD.Service=ENCODE

(84) RD.ChannelID=Current Channel ID

(85) RD.Input=Voice Data

(86) Call ReqS(RD)

(87) // RD.Output contains encoded/compressed data

(88) Example for Requesting Decoding

(89) // use Decoding service

(90) Allocate RD as RQueueType

(91) RD.Service=DECODE

(92) RD.ChannelID=Current Channel ID

(93) RD.Input=Voice Data

(94) Call ReqS(RD)

(95) // RD.Output contains decoded data

Embodiment #2

(96) // This embodiment is slower, but also uses less memory than embodiment #1 above

(97) Data Structures to be Used by Module 108

(98) RQueueType Structure // Job Request Queue
    ServiceType
    ChannelID // Channel Identifier
    VoiceData // Input Data
    Output // Output Data

(99) End Structure

(100) // This embodiment uses a single queue, but stores other data in a temporary queue

(101) // when the single queue is not available. This is less memory intensive, but slower.

(102) RQueueType RQueue[200] // Maximum of 200 Requests

(103) Procedures to be Used by Module 108

(104) // Initialization Function

(105) Init:
    Initialize Request Queue
    Initialize Service Entry
    Start Service Poll Loop

(106) // Service Request Function

(107) ReqS:
    Lock RQueue
    Insert Service Information into RQueue
    Unlock RQueue
    Wait for completion of Service
    Return output

(108) // Service Poll Loop

(109) // to run continuously

(110) SerPL: Get timestamp and store in St
    // Let's do FFT/FHT
    For all elements in RQueue where ServiceType=FFT
        Copy Data To TempRQueue
    End For
    Submit TempRQueue with FFT code to GPU
    For all elements in TempRQueue
        Move TempRQueue.output to RQueue.output
        Signal Channel of completion of service
    End For
    // Let's do mixing
    For all elements in RQueue where ServiceType=MIXING
        Copy Data To TempRQueue
    End For
    Submit TempRQueue with MIXING code to GPU
    For all elements in TempRQueue
        Move TempRQueue.output to RQueue.output
        Signal Channel of completion of service
    End For
    // Let's do encoding
    For all elements in RQueue where ServiceType=ENCODE
        Copy Data To TempRQueue
    End For
    Submit TempRQueue with ENCODING code to GPU
    For all elements in TempRQueue
        Move TempRQueue.output to RQueue.output
        Signal Channel of completion of service
    End For
    // Let's do decoding
    For all elements in RQueue where ServiceType=DECODE
        Copy Data To TempRQueue
    End For
    Submit TempRQueue with DECODING code to GPU
    For all elements in TempRQueue
        Move TempRQueue.output to RQueue.output
        Signal Channel of completion of service
    End For
    // Make sure it takes the same amount of time for every pass
    Compute time difference between now and St
    Sleep that amount of time
    Goto SerPL // second pass
Examples of Code in the Application Programs 106 for Calling the Routines Above
Example for Calling “Init”

(111) // we have to initialize PStar before we can use it

(112) Call Init

(113) Example for Calling “FFT”

(114) // use FFT service for multitone detection

(115) Allocate RD as RQueueType

(116) RD.Service=FFT

(117) RD.ChannelID=Current Channel ID

(118) RD.Input=Voice Data

(119) Call ReqS(RD)

(120) Scan RD.Output for presence of our tones

(121) Example for Calling Encoding

(122) // use Encoding service

(123) Allocate RD as RQueueType

(124) RD.Service=ENCODE

(125) RD.ChannelID=Current Channel ID

(126) RD.Input=Voice Data

(127) Call ReqS(RD)

(128) // RD.Output contains encoded/compressed data

(129) Example for Calling Decoding

(130) // use Decoding service

(131) Allocate RD as RQueueType

(132) RD.Service=DECODE

(133) RD.ChannelID=Current Channel ID

(134) RD.Input=Voice Data

(135) Call ReqS(RD)

(136) // RD.Output contains decoded data

(137) While the embodiment discussed above uses a separate host and massively parallel processing array, it is clear that the processing array may also execute general purpose code and support general purpose or application-specific operating systems, albeit with reduced efficiency as compared to an unbranched signal processing algorithm. Therefore, it is possible to employ a single processor core and memory pool, thus reducing system cost and simplifying system architecture. Indeed, one or more multiprocessors may be dedicated to signal processing, and other(s) to system control, coordination, and logical analysis and execution. In such a case, the functions identified above as being performed in the host processor would be performed in the array, and, of course, the transfers across the bus separating the two would not be required.

(138) The present invention may be applied to various parallel data processing algorithms for independent or interrelated data streams, for example telephone conversations, sensor arrays, communications from computer network components, image processing, tracking of multiple objects within a space, object recognition in complex media or multimedia, and the like.

(139) One particular advantage of the present architecture is that it facilitates high level interaction of multiple data streams and data fusion. Thus, for example, in a telephone environment, the extracted call progress tones may be used by a call center management system to control workflows, scheduling, pacing, monitoring, training, voice stress analysis, and the like, which involve an interaction of a large number of concurrent data streams which are each nominally independent. On the other hand, in a seismic data processor, there will typically be large noise signals imposed on many sensors, which must be both individually processed and processed for correlations and significant events. Therefore, another advantage of the integration of the real time parallel data processing and analysis within a computing platform that supports a general purpose (typically non-real time) operating system is that a high level of complex control may be provided based on the massive data flows through the real-time subsystem, within an integrated platform, and often without large expense, using available computational capacity efficiently.

(140) From a review of the present disclosure, other modifications will be apparent to persons skilled in the art. Such modifications may involve other features which are already known in the design, manufacture and use of telephony engines and parallel processing and which may be used instead of or in addition to features already described herein. Although claims have been formulated in this application to particular combinations of features, it should be understood that the scope of the disclosure of the present application also includes any novel feature or novel combination of features disclosed herein either explicitly or implicitly or any generalization thereof, whether or not it mitigates any or all of the same technical problems as does the present invention. The applicants hereby give notice that new claims may be formulated to such features during the prosecution of the present application or any further application derived therefrom.

(141) The word “comprising”, “comprise”, or “comprises” as used herein should not be viewed as excluding additional elements. The singular article “a” or “an” as used herein should not be viewed as excluding a plurality of elements. The word “or” should be construed as an inclusive or, in other words as “and/or”.