LOW-POWER PROCESSOR WITH SUPPORT FOR MULTIPLE PRECISION MODES
20170322808 · 2017-11-09
Cpc classification
G06F7/38
PHYSICS
Abstract
Multiple data wordlengths may be supported by a processor through a single data path and/or a single set of registers. For example, the processor may support 16-bit wordlengths and 24-bit wordlengths through a single datapath. For supported data wordlengths that are less than the wordlength of the registers and datapath, the data may be left-aligned within the registers and datapath. The left alignment of data may allow saturation detection in the processor to be performed by examining the same saturation point regardless of the wordlength of the data being operated on. A special saturation mode of the processor may set the lower bits to zero when a configuration register or instruction-bit is set and saturation is detected.
Claims
1. An apparatus, comprising: a processor capable of saturation and configured to process data in at least a first mode for processing data of a first wordlength and a second mode for processing data of a second wordlength that is longer than the first wordlength, wherein a register size of the processor matches the second wordlength, and wherein the processor is further configured to perform steps comprising: processing the data as data aligned to the most significant bits of registers of the processor in the low-precision mode; and detecting saturation during the processing of the data, wherein the same saturation point is examined whether the processor is operating in the low-precision mode or the high-precision mode.
2. The apparatus of claim 1, wherein the processor is configured to process the data aligned to the most significant bits as left-aligned data.
3. The apparatus of claim 1, wherein the processor is further configured to perform steps comprising clearing one or more least significant bits (LSB) not in use during operation of the processor in the first mode upon detecting saturation during the processing of data in the first mode.
4. The apparatus of claim 3, wherein the processor performs the step of clearing the one or more least significant bits (LSB) in hardware.
5. The apparatus of claim 3, wherein the processor performs the step of clearing the one or more least significant bits (LSB) in response to a received instruction.
6. The apparatus of claim 1, wherein the processor is further configured to perform steps comprising clearing one or more least significant bits (LSB) not in use during operation of the processor in the first mode after pre-determined operations are performed during the processing of the data, wherein the pre-determined operations may cause least significant bits (LSBs) to be set during the processing of the data in the first mode.
7. The apparatus of claim 1, wherein the processor is configured to process 16-bit data when the processor is operating in low-precision mode, and wherein the processor is configured to process 24-bit data when the processor is operating in high-precision mode.
8. The apparatus of claim 1, wherein the processor comprises a digital signal processor (DSP).
9. A method, comprising: receiving, at a processor having a register size of a second wordlength, an indication of whether data is low-precision data of a first wordlength or high-precision data of the second wordlength that is longer than the first wordlength; processing, by the processor, the data as data aligned to most significant bits in registers of the processor in the low-precision mode; and detecting, by the processor, saturation during the processing of the data, wherein the same saturation point is examined whether the data is low-precision data or high-precision data.
10. The method of claim 9, wherein the step of processing the data as data aligned to the most significant bits comprises processing the data as left-aligned data.
11. The method of claim 9, further comprising clearing one or more least significant bits (LSB) not in use during operation of the processor on low-precision data upon detecting saturation during the processing of the low-precision data.
12. The method of claim 11, wherein the step of clearing the one or more least significant bits (LSB) is performed in processor hardware.
13. The method of claim 11, wherein the step of clearing the one or more least significant bits (LSB) is in response to a received instruction.
14. The method of claim 9, further comprising clearing one or more least significant bits (LSB) not in use during operation of the processor on the low-precision data after pre-determined operations are performed during the processing of the data, wherein the pre-determined operations may cause least significant bits (LSBs) to be set during the processing of the low-precision data.
15. The method of claim 9, wherein the low-precision data comprises 16-bit data, and wherein the high-precision data comprises 24-bit data.
16. The method of claim 9, wherein the step of receiving an indication of whether data is low-precision data or high-precision data comprises reading a configuration register of a processor.
17. An apparatus, comprising: a digital signal processor (DSP) configured to process data with a first wordlength in a first mode and to process data with a second wordlength longer than the first wordlength in a second mode, wherein a register size of the digital signal processor (DSP) matches the second wordlength, the digital signal processor (DSP) configured to perform steps comprising: processing data aligned to most significant bits of registers of the digital signal processor (DSP) in the first mode; and detecting saturation during the processing of the data in the first mode, wherein the same saturation point is examined whether the processor is operating in the first mode or the second mode.
18. The apparatus of claim 17, wherein the digital signal processor (DSP) comprises multiply-accumulate circuitry configured to perform the steps of processing the data and detecting the saturation.
19. The apparatus of claim 18, wherein the multiply-accumulate circuitry is configured to process data aligned to more significant bits by processing left-aligned data.
20. The apparatus of claim 18, wherein the multiply-accumulate circuitry comprises: a first set of registers; a multiplier coupled to the first set of registers and configured to receive two operands from the first set of registers; an adder coupled to the multiplier and configured to receive a result of a multiplication operation of the two received operands; and an accumulation register coupled to the adder and configured to accumulate value.
21. The apparatus of claim 20, wherein the multiplier is configured to operate on both low-precision data in the first mode and on high-precision data in the second mode.
22. The apparatus of claim 17, wherein when operating in the first mode, the digital signal processor processes 16-bit data, and wherein when operating in the second mode, the digital signal processor processes 24-bit data.
23. The apparatus of claim 17, wherein the digital signal processor is further configured to perform steps comprising clearing one or more least significant bits (LSB) not in use during operation of the processor in the first mode upon detecting saturation during the processing of data in the first mode.
24. The apparatus of claim 17, wherein the processor is further configured to perform steps comprising clearing one or more least significant bits (LSB) not in use during operation of the processor in the first mode after pre-determined operations are performed during the processing of the data in the first mode, wherein the pre-determined operations may cause least significant bits (LSBs) to be set during the processing of the data.
25. A computer program product, comprising: a non-transitory computer readable medium comprising code for performing steps comprising: receiving an indication of whether data is low-precision data of a first wordlength or high-precision data of a second wordlength that is longer than the first wordlength; processing the data as data aligned to most significant bits in the low-precision mode; and detecting saturation during the processing of the data, wherein the same saturation point is examined whether the data is low-precision data or high-precision data.
26. The computer program product of claim 25, wherein the step of processing the data as data aligned to the most significant bits comprises processing the data as left-aligned data.
27. The computer program product of claim 25, wherein the medium further comprises code to perform a step of clearing one or more least significant bits (LSB) not in use during operation of the processor on low-precision data upon detecting saturation during the processing of the low-precision data.
28. The computer program product of claim 25, wherein the medium further comprises code to perform a step of clearing one or more least significant bits (LSB) not in use during operation of the processor on the low-precision data after pre-determined operations are performed during the processing of the data, wherein the pre-determined operations may cause least significant bits (LSBs) to be set during the processing of the low-precision data.
29. The computer program product of claim 25, wherein the low-precision data comprises 16-bit data, and wherein the high-precision data comprises 24-bit data.
30. The computer program product of claim 25, wherein the step of receiving an indication of whether data is low-precision data or high-precision data comprises reading a configuration register of a processor.
31. A method, comprising: processing first data in a first mode having a first wordlength using a datapath of a processor; and processing second data in a second mode having a second wordlength that is longer than the first wordlength using the datapath of the processor, wherein the step of processing the first data in the first mode comprises processing the first data as data aligned to most significant bits of the datapath.
32. The method of claim 31, wherein the step of processing the first data in the first mode comprises processing the first data as left-aligned data.
33. The method of claim 31, wherein the step of processing the first data in the first mode comprises clearing one or more least significant bits (LSB) not in use during processing the first data upon detecting saturation during the processing of the first data.
34. The method of claim 31, wherein the step of processing the first data in the first mode comprises clearing one or more least significant bits (LSB) not in use during operation of the processor in the first mode after pre-determined operations are performed during the processing of the first data, wherein the pre-determined operations may cause least significant bits (LSBs) to be set during the processing of the first data.
35. The method of claim 31, wherein the first data comprise 16-bit words, and wherein the second data comprises 24-bit words.
36. An apparatus, comprising: a processor comprising a datapath for processing data, wherein the processor processes first data of a first wordlength in a first mode using the datapath, and wherein the processor processes second data of a second wordlength longer than the first wordlength in a second mode using the datapath, and wherein the processor processes the first data in the first mode as data aligned to most significant bits of the datapath.
37. The apparatus of claim 36, wherein the processor is configured to process the first data in the first mode by processing the first data as left-aligned data.
38. The apparatus of claim 36, wherein the processor processes the first data in the first mode by clearing one or more least significant bits (LSB) not in use during the processing of the first data upon detecting saturation during the processing of the first data.
39. The apparatus of claim 36, wherein the processor processes the first data in the first mode by clearing one or more least significant bits (LSB) not in use during processing the first data after pre-determined operations are performed during the processing of the first data, wherein the pre-determined operations may cause least significant bits (LSBs) to be set during the processing of the first data.
40. The apparatus of claim 36, wherein the first data comprise 16-bit words, and wherein the second data comprises 24-bit words.
Description
BRIEF DESCRIPTION OF THE DRAWINGS
[0021] For a more complete understanding of the disclosed system and methods, reference is now made to the following descriptions taken in conjunction with the accompanying drawings.
[0022]
[0023]
[0024]
[0025]
[0026]
[0027]
[0028]
DETAILED DESCRIPTION
[0029]
[0030] In a two-mode embodiment, data transmitted along the datapath 306 may be formatted as shown in data 310 and 312. Data 310 may illustrate low-precision data transmitted over datapath 306, and data 312 may illustrate high-precision data transmitted over datapath 306. “Higher bits” described herein may refer to bits of more significance, or bits that are left-aligned in a big endian computer system. “Lower bits” described herein may refer to bits of less significance, or bits that are right-aligned in a little endian computer system. The high-precision data 312 occupies all bits in the datapath 306. The low-precision data 310 occupies fewer than all bits in the datapath 306 and is left aligned, such that the data is stored in the most significant bits (MSBs) of the datapath 306, which leaves the least significant bits (LSBs) unused. Although blanks may be indicated in the N2 LSBs of data 310, in implementation zeroes may be located in these bits, and those zeroes would have no impact on the value being stored in the N1 MSBs. The least significant N2 bits may thus be set to zero during operation in low-precision mode as those bits do not impact the value represented by bits in the data 310. Setting these lower bits to zero may reduce power consumption by the circuitry 308 when processing the low-precision data. Further, setting these lower bits to zero may prevent propagation of bit toggles to higher bits that could cause arithmetic errors and higher power consumption. Although two-mode operation is described, the processor 302 may support additional modes of operation to support additional wordlengths.
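The left alignment described above can be illustrated with a minimal Python sketch, assuming the 16-bit and 24-bit wordlengths used elsewhere in this description. The constant and function names (REG_BITS, LOW_BITS, PAD_BITS, left_align, clear_unused_lsbs, extract_low_precision) are hypothetical and chosen only for illustration; they do not describe the actual circuitry of the processor 302.

```python
REG_BITS = 24                    # register/datapath width (high-precision wordlength)
LOW_BITS = 16                    # low-precision wordlength (N1 in the text)
PAD_BITS = REG_BITS - LOW_BITS   # unused LSBs in low-precision mode (N2 in the text)


def left_align(value16: int) -> int:
    """Place a 16-bit pattern in the MSBs of a 24-bit word; the LSBs are zero."""
    return (value16 & 0xFFFF) << PAD_BITS


def clear_unused_lsbs(word24: int) -> int:
    """Force the unused LSBs of a 24-bit word back to zero."""
    mask = ((1 << REG_BITS) - 1) & ~((1 << PAD_BITS) - 1)  # 0xFFFF00
    return word24 & mask


def extract_low_precision(word24: int) -> int:
    """Recover the 16-bit pattern held in the MSBs of a 24-bit word."""
    return (word24 >> PAD_BITS) & 0xFFFF
```

In this sketch the sign bit of a 16-bit value lands in bit 23, the same position as the sign bit of a 24-bit value, which is why a single saturation point can serve both modes.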
[0031] When operations are performed on values contained in the data, such as the low-precision data, the values may reach a saturation point, or reach a largest possible value that can be stored in a certain number of bits. Saturation may be detected by the processor and handled to prevent arithmetic errors, such as overflow. The left alignment of data may allow saturation detection in the processor to be performed by examining the same saturation point regardless of the wordlength of the data being operated on.
[0032] Referring back to
[0033] One example of saturation detection is shown in the following source code that may be executed by the processor 302 for 16-bit and 24-bit modes of operation:
if ((in[24:23] == 2'b00) || (in[24:23] == 2'b11))
    out[23:0] = in[23:0];   // no overflow
else if (in < 0)
    out = 0x800000;         // negative saturation
else if (S16)
    out = 0x7FFF00;         // 16-bit mode: low bits stay zero
else
    out = 0x7FFFFF;         // 24-bit mode
[0034] In the example above, “out” may denote a memory location 24 bits in width, such as one of the registers 304, “in” may denote the value to be stored (the code examines in[24:23], so “in” carries at least one bit above the 24-bit result), and “S16” may denote a configuration bit, such as bit 312A, indicating a mode of operation for the processor 302. In the code above, when positive saturation is detected, the configuration bit S16 is examined. If the S16 bit is set, indicating 16-bit mode of operation, then the memory location is saturated to 0x7FFF00 such that the low bits remain zero. If the configuration bit S16 is not set, indicating 24-bit mode of operation, then the memory location is saturated to 0x7FFFFF, with all bits below the sign bit set to one. Example input values to the code above are listed in Table 1 below along with the corresponding output of the code.
TABLE 1
In Value  | Out Value | Notes
0x0876543 | 0x7FFFFF  | Positive saturation detected with S16 = 0
0x100AAAA | 0x800000  | Negative saturation detected with S16 = 0
0x0654321 | 0x654321  | No saturation detected
0x08765   | 0x7FFF00  | Positive saturation detected with S16 = 1
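As an illustration only, the 16-bit/24-bit saturation logic above may be modeled in Python as follows. The function name saturate24 is hypothetical, and the sketch assumes that “in” is a 25-bit two's-complement bit pattern whose bit 24 is the sign. It also assumes that the last entry of Table 1 (0x08765) is shorthand for the left-aligned pattern 0x0876500; the source does not state this explicitly.

```python
def saturate24(value: int, s16: bool) -> int:
    """Saturate a 25-bit two's-complement pattern to a 24-bit result.

    `value` is the raw bit pattern; bit 24 is treated as the sign bit.
    `s16` models the S16 configuration bit (True selects 16-bit mode).
    """
    value &= 0x1FFFFFF            # keep bits 24..0
    top = (value >> 23) & 0b11    # in[24:23]
    if top in (0b00, 0b11):       # bits 24 and 23 agree: no overflow
        return value & 0xFFFFFF   # out[23:0] = in[23:0]
    if value & (1 << 24):         # in < 0: negative saturation
        return 0x800000
    # Positive saturation: keep the unused low byte at zero in 16-bit mode.
    return 0x7FFF00 if s16 else 0x7FFFFF


# Rows of Table 1 (the last input is the assumed left-aligned form).
results = [
    saturate24(0x0876543, s16=False),
    saturate24(0x100AAAA, s16=False),
    saturate24(0x0654321, s16=False),
    saturate24(0x0876500, s16=True),
]
```

Note that the same two bits are examined in both modes, and that the negative saturation value 0x800000 already has its low bits at zero, so only the positive case depends on S16.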
[0035] Another example of saturation detection is shown in the following source code that may be executed by the processor 302 for 32-bit and 48-bit modes of operation:
if ((in[55:47] == 9'd0) || (in[55:47] == 9'b111111111))
    out[47:0] = in[47:0];        // no overflow
else if (in < 0)
    out = 0x800000000000;        // negative saturation
else if (S32)
    out = 0x7FFFFFFF0000;        // 32-bit mode: low bits stay zero
else
    out = 0x7FFFFFFFFFFF;        // 48-bit mode
[0036] Example input values to the code above are listed in Table 2 below along with the corresponding output of the code.
TABLE 2
In Value         | Out Value      | Notes
0x12345678ABCDEF | 0x7FFFFFFFFFFF | Positive saturation detected with S32 = 0
0xCCBBBBBBAAAAAA | 0x800000000000 | Negative saturation detected with S32 = 0
0xFF876543210ABC | 0x876543210ABC | No saturation detected
0x12345678ABCDEF | 0x7FFFFFFF0000 | Positive saturation detected with S32 = 1
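The two saturation examples differ only in their bit widths, so they may be expressed as one parameterized routine. The Python sketch below is a hedged illustration, not the disclosed hardware: the function saturate and its parameters in_bits, out_bits, low_bits, and low_mode are hypothetical names, and the input is assumed to be a two's-complement bit pattern of in_bits bits.

```python
def saturate(value: int, in_bits: int, out_bits: int,
             low_bits: int, low_mode: bool) -> int:
    """Saturate an `in_bits`-wide two's-complement pattern to `out_bits` bits.

    When `low_mode` is true, positive saturation leaves the unused
    (out_bits - low_bits) least significant bits at zero.
    """
    value &= (1 << in_bits) - 1
    sign = (value >> (in_bits - 1)) & 1
    # Bits examined: everything from the result's sign bit upward,
    # e.g. in[55:47] for in_bits=56 and out_bits=48.
    width = in_bits - out_bits + 1
    all_ones = (1 << width) - 1
    top = (value >> (out_bits - 1)) & all_ones
    if top == 0 or top == all_ones:          # no overflow
        return value & ((1 << out_bits) - 1)
    if sign:                                 # negative saturation
        return 1 << (out_bits - 1)
    max_pos = (1 << (out_bits - 1)) - 1      # positive saturation
    if low_mode:
        max_pos &= ~((1 << (out_bits - low_bits)) - 1)
    return max_pos
```

Called as saturate(x, 56, 48, 32, low_mode) the sketch follows the 32-bit/48-bit code above, and called as saturate(x, 25, 24, 16, low_mode) it follows the 16-bit/24-bit code.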
[0037] The processor 302 may determine the appropriate mode of operation by receiving information from an application executing on the processor 302. In one embodiment, the processor 302 may include a configuration register 312, in which one configuration bit 312A, may be set to zero or one to toggle the processor 302 between two modes of operation. In processors with more than two modes of operation, additional bits may be used in the configuration register 312 to indicate which of multiple modes of operation should be executed. The configuration bit 312A may be set during execution of an application. In another embodiment, the processor 302 may implement different instructions for operations in different modes of operations. For example, the processor 302 may receive a “MULT1” operation instructing execution of multiplication in a first mode of operation, such as multiplying two 16-bit values, and may receive a “MULT2” operation instructing execution of multiplication in a second mode of operation, such as multiplying two 24-bit values.
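The two mode-selection mechanisms described above may be sketched in Python as follows. This is an illustrative model only: the bit position S16_BIT, the polarity of the configuration bit (set meaning low-precision mode), and the names Mode, mode_from_config, and mode_from_opcode are assumptions made for the example and are not specified by this description.

```python
from enum import Enum


class Mode(Enum):
    LOW = 16    # first mode: 16-bit wordlength
    HIGH = 24   # second mode: 24-bit wordlength


S16_BIT = 0  # hypothetical position of configuration bit 312A in register 312


def mode_from_config(config_reg: int) -> Mode:
    """Select the mode from a configuration register value (assumed polarity)."""
    return Mode.LOW if (config_reg >> S16_BIT) & 1 else Mode.HIGH


# Alternative: the mode is implied by the instruction itself.
OPCODE_MODES = {"MULT1": Mode.LOW, "MULT2": Mode.HIGH}


def mode_from_opcode(opcode: str) -> Mode:
    """Select the mode from the received instruction mnemonic."""
    return OPCODE_MODES[opcode]
```

With a configuration register, the mode persists until the application changes the bit; with per-instruction selection, each operation carries its own mode.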
[0038] The registers 304 may be configured to support the multiple possible wordlengths in different modes of operation. For example, the registers 304 may have a wordlength matching the width of datapath 306, which is the largest wordlength of the various modes of operation possible within processor 300. For example, when the two modes of operation are 16-bit and 24-bit, the registers 304 may have a wordlength of 24 bits. Low-precision values may be packed into the 24-bit registers. By storing multiple wordlengths of data in the same registers 304, the processor may include less circuitry and thus support a greater maximum clock speed and, consequently, a greater speed of operation.
[0039] A method of operating a processor to support multiple modes of operation is shown in
[0040] One operation that may be performed by the processor in block 504 of
a + (b × c) → a,
where a is the value stored in the accumulation register 608, and b and c are operands retrieved from the registers 602 through datapath 610. The multiply-accumulate (MAC) operation described with reference to
[0041] In one embodiment, the multiplier 604 may process data received through datapath 610 similarly regardless of the wordlength of the data. For example, when 24-bit data is received, the multiplier 604 may multiply the operands to obtain a result, and when 16-bit data is received having all lower bits set to zero, the multiplier 604 may similarly multiply the operands to obtain a result. In contrast, conventional multipliers may divide operands into pieces and perform multiplication of the various pieces of the words and sum the words together. For example, a conventional multiplier may divide a 24-bit word into a 16-bit portion and an 8-bit portion, perform multiplication using the 16-bit portion and 8-bit portions separately and sum the results. This division allows the multiplier to be capable of supporting 16-bit arithmetic when the conventional multiplier receives a 16-bit word instead of a 24-bit word. In some embodiments, the multiplier 604 may not divide operands into portions when performing multiplication or other arithmetic operations.
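A minimal Python sketch of a multiply-accumulate step that treats both wordlengths identically is given below. It assumes signed fractional fixed-point operands with 23 fractional bits, which this description does not specify, and it omits saturation for brevity. The names mac, FRAC_BITS, and PAD_BITS are hypothetical.

```python
REG_BITS = 24
PAD_BITS = 8               # unused LSBs when holding left-aligned 16-bit data
FRAC_BITS = REG_BITS - 1   # assumed fractional format: 1 sign bit, 23 fraction bits


def mac(acc: int, b: int, c: int, low_precision: bool) -> int:
    """Compute a + (b * c) -> a on signed 24-bit fractional operands.

    The same multiply is used in both modes; operands are never split
    into 16-bit and 8-bit portions. Saturation is not modeled here.
    """
    product = (b * c) >> FRAC_BITS       # rescale the full-width product
    result = acc + product
    if low_precision:
        # The multiply may set bits below the 16-bit result; clear them.
        result &= ~((1 << PAD_BITS) - 1)
    return result


# Left-aligned 16-bit operands 0x1234 and 0x0567 through the 24-bit path.
b, c = 0x1234 << PAD_BITS, 0x0567 << PAD_BITS
low = mac(0, b, c, low_precision=True)
high = mac(0, b, c, low_precision=False)
```

In this sketch, low equals the 16-bit fractional product (0x1234 × 0x0567) >> 15 placed in the upper 16 bits, whereas high retains nonzero low-order bits. This is one way in which an operation may cause least significant bits to be set in low-precision mode and then be cleared afterward.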
[0042] The processor embodiments described above may be useful in any computing device to reduce power consumption, reduce heat dissipation, decrease size, and reduce cost. One particularly advantageous embodiment may include the integrating of the processor described in various embodiments above in a mobile device.
[0043] The schematic flow chart diagram of
[0044] If implemented in firmware and/or software, functions described above may be stored as one or more instructions or code on a computer-readable medium. Examples include non-transitory computer-readable media encoded with a data structure and computer-readable media encoded with a computer program. Computer-readable media includes physical computer storage media. A storage medium may be any available medium that can be accessed by a computer. By way of example, and not limitation, such computer-readable media can comprise random access memory (RAM), read-only memory (ROM), electrically-erasable programmable read-only memory (EEPROM), compact disc read-only memory (CD-ROM) or other optical disk storage, magnetic disk storage or other magnetic storage devices, or any other medium that can be used to store desired program code in the form of instructions or data structures and that can be accessed by a computer. Disk and disc includes compact discs (CD), laser discs, optical discs, digital versatile discs (DVD), floppy disks and Blu-ray discs. Generally, disks reproduce data magnetically, and discs reproduce data optically. Combinations of the above should also be included within the scope of computer-readable media.
[0045] In addition to storage on computer readable medium, instructions and/or data may be provided as signals on transmission media included in a communication apparatus. For example, a communication apparatus may include a transceiver having signals indicative of instructions and data. The instructions and data are configured to cause one or more processors to implement the functions outlined in the claims.
[0046] Although the present disclosure and certain representative advantages have been described in detail, it should be understood that various changes, substitutions and alterations can be made herein without departing from the spirit and scope of the disclosure as defined by the appended claims. Moreover, the scope of the present application is not intended to be limited to the particular embodiments of the process, machine, manufacture, composition of matter, means, methods and steps described in the specification. For example, although digital signal processors (DSPs) are described throughout the detailed description, aspects of the invention may be applied to the design of other processors, such as graphics processing units (GPUs) and central processing units (CPUs). Further, although ones (1s) and zeros (0s) are given as example bit values throughout the description, the function of ones and zeros may be reversed without change in operation of the processor described in embodiments above. For example, a one value in a configuration register may be used to indicate either a first mode of operation or a second mode of operation without change in the operation of the processor. Additionally, although 16-bit and 24-bit modes are described for a processor, the processor may support different wordlengths and/or additional wordlengths. For example, a processor may support 32-bit wordlength as a low-precision mode and 48-bit wordlength as a high-precision mode. As one of ordinary skill in the art will readily appreciate from the present disclosure, processes, machines, manufacture, compositions of matter, means, methods, or steps, presently existing or later to be developed that perform substantially the same function or achieve substantially the same result as the corresponding embodiments described herein may be utilized. 
Accordingly, the appended claims are intended to include within their scope such processes, machines, manufacture, compositions of matter, means, methods, or steps.