SCALABLE SINGLE-INSTRUCTION-MULTIPLE-DATA INSTRUCTIONS
20170046168 · 2017-02-16
CPC classification
G06F9/3887 (PHYSICS)
G06F9/30036 (PHYSICS)
G06F9/38873 (PHYSICS)
Abstract
A method for executing scalable single-instruction-multiple-data (SIMD) instructions includes performing a query to determine a hardware vector length of a SIMD processor. The method also includes scaling a first instruction of the scalable SIMD instructions to a first scaled vector length to generate a first scaled instruction. The first scaled vector length is based on the hardware vector length, and the first instruction is a compiled instruction having an adaptable vector length. The method also includes adjusting a first number of iterations to be used by the SIMD processor to perform first operations associated with the first instruction based on the first scaled vector length.
Claims
1. A method for executing scalable single-instruction-multiple-data (SIMD) instructions, the method comprising: performing a query to determine a hardware vector length of a SIMD processor; scaling a first instruction of the scalable SIMD instructions to a first scaled vector length to generate a first scaled instruction, the first scaled vector length based on the hardware vector length, wherein the first instruction is a compiled instruction having an adaptable vector length; and adjusting a first number of iterations to be used by the SIMD processor to perform first operations associated with the first instruction based on the first scaled vector length.
2. The method of claim 1, wherein the query is performed at run-time.
3. The method of claim 1, wherein performing the query comprises polling a control register to determine the hardware vector length.
4. The method of claim 1, wherein performing the query comprises executing a dedicated scalar instruction to determine the hardware vector length.
5. The method of claim 1, wherein performing the query comprises executing hypervisor code to retrieve a value that indicates the hardware vector length.
6. The method of claim 1, wherein performing the query comprises performing a library call to access hardware specification data, the hardware specification data indicating the hardware vector length.
7. The method of claim 1, wherein performing the query comprises: sending a software request for a range of hardware vector lengths for available resources, the software request sent to a library, an operating system, or both; and receiving a signal indicating the hardware vector length when the SIMD processor is available to process data.
8. The method of claim 1, wherein performing the query comprises: sending a software request for a particular hardware vector length of an available resource, the software request sent to a library, an operating system, or both; and receiving a signal indicating the particular hardware vector length.
9. The method of claim 1, further comprising initiating execution of the first scaled instruction at the SIMD processor, wherein the SIMD processor performs the first operations using the adjusted first number of iterations to execute the first scaled instruction.
10. The method of claim 1, further comprising: scaling other instructions of the scalable SIMD instructions to the first scaled vector length to generate additional scaled instructions; and adjusting a number of iterations to be used by the SIMD processor to perform operations associated with the other instructions based on the first scaled vector length; and initiating execution of the additional scaled instructions at the SIMD processor.
11. An apparatus comprising: a processor configured to: retrieve a first single-instruction-multiple-data (SIMD) instruction; and scale the first SIMD instruction to a first scaled vector length to generate a first scaled instruction, the first scaled vector length based on a hardware vector length of a SIMD processor, wherein the first SIMD instruction is a compiled instruction having an adaptable vector length; and a loop value register storing a control value that indicates a number of iterations to be used by the SIMD processor to perform operations associated with the first SIMD instruction, wherein the control value is adjusted based on the first scaled vector length.
12. The apparatus of claim 11, wherein the processor is the SIMD processor.
13. The apparatus of claim 11, wherein the processor is further configured to: poll a control register to determine the hardware vector length; or execute a dedicated scalar instruction to determine the hardware vector length.
14. The apparatus of claim 11, wherein the processor is further configured to execute hypervisor code to retrieve a value that indicates the hardware vector length.
15. The apparatus of claim 11, wherein the processor is further configured to: send a software request for a range of hardware vector lengths for available resources, the software request sent to a library, an operating system, or both; and receive a signal indicating the hardware vector length when the SIMD processor is available to process data.
16. The apparatus of claim 11, wherein the processor is further configured to: send a software request for a particular hardware vector length of an available resource, the software request sent to a library, an operating system, or both; and receive a signal indicating the particular hardware vector length.
17. A non-transitory computer-readable medium comprising commands for executing scalable single-instruction-multiple-data (SIMD) instructions, the commands, when executed by a processor, cause the processor to perform operations comprising: performing a query to determine a hardware vector length of a SIMD processor; scaling a first instruction of the scalable SIMD instructions to a first scaled vector length to generate a first scaled instruction, the first scaled vector length based on the hardware vector length, wherein the first instruction is a compiled instruction having an adaptable vector length; and adjusting a first number of iterations to be used by the SIMD processor to perform first operations associated with the first instruction based on the first scaled vector length.
18. The non-transitory computer-readable medium of claim 17, wherein performing the query comprises: sending a software request for a range of hardware vector lengths for available resources, the software request sent to a library, an operating system, or both; and receiving a signal indicating the hardware vector length when the SIMD processor is available to process data.
19. The non-transitory computer-readable medium of claim 17, wherein performing the query comprises: sending a software request for a particular hardware vector length of an available resource, the software request sent to a library, an operating system, or both; and receiving a signal indicating the particular hardware vector length.
20. The non-transitory computer-readable medium of claim 17, wherein the processor is the SIMD processor.
Description
IV. BRIEF DESCRIPTION OF THE DRAWINGS
[0011]
[0012]
[0013]
[0014]
V. DETAILED DESCRIPTION
[0015] Particular aspects of the present disclosure are described with reference to the drawings. In the description, common features are designated by common reference numbers throughout the drawings.
[0016] Referring to
[0017] The scalable SIMD instructions 110 (e.g., program code) may include a first instruction 112 (e.g., a first SIMD instruction), a second instruction 114 (e.g., a second SIMD instruction), and an Nth instruction 116 (e.g., an Nth SIMD instruction). Each instruction 112, 114, 116 may be a compiled instruction having an adaptable vector length. For example, each instruction 112, 114, 116 may have a non-specified (or variable) vector length that is adaptable (by a processor) for run-time processing. To illustrate, each instruction 112, 114, 116 may be compiled to have multiple vector lengths (where a specific vector length is determined during run-time processing) to enable a processor to reduce a number of processing iterations during run-time processing, as described below. In a particular implementation, N may correspond to any integer that is greater than zero. For example, if N is equal to eight, the scalable SIMD instructions 110 may include eight SIMD instructions.
[0018] Each instruction 112-116 of the scalable SIMD instructions 110 may specify data that is to be processed by a SIMD processor (not shown). As a non-limiting example, each instruction 112-116 of the scalable SIMD instructions 110 may be configured to cause the SIMD processor to process 2048 words of input data, where each word includes 32 bits of data. If the SIMD processor has a hardware vector length of 64 bits, the SIMD processor may be capable of processing two words (e.g., 64 bits) at a time. Thus, if each instruction 112-116 has a vector length of two words, the SIMD processor may execute each instruction 112-116 over 1024 iterations (e.g., loops) if 64 bits are processed by the SIMD processor during each iteration.
[0019] However, if the SIMD processor has a hardware vector length of 128 bits, the SIMD processor may be capable of processing four words during each iteration. Thus, the number of iterations to process a 2048-word instruction may be reduced by half (e.g., from 1024 iterations to 512 iterations) based on the processing capabilities (e.g., the hardware vector length) of the SIMD processor. To accommodate the increased processing capabilities of the SIMD processor, the scalable SIMD instructions 110 may undergo the scaling process 100 to scale each instruction 112-116 based on the hardware vector length of the SIMD processor.
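The iteration counts in the two examples above follow from a single ratio. As a worked sketch (the symbols W, b, L, and I are introduced here for illustration and do not appear in the disclosure), let W be the number of input words, b the word size in bits, L the hardware vector length in bits, and I the resulting number of iterations, assuming that L is a multiple of b and that W·b is a multiple of L:

$$
I = \frac{W \cdot b}{L}, \qquad I_{64} = \frac{2048 \cdot 32}{64} = 1024, \qquad I_{128} = \frac{2048 \cdot 32}{128} = 512
$$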
[0020] For example, each instruction 112-116 may be scaled from having a vector length of two words (e.g., 64 bits) to having a vector length of four words (e.g., 128 bits). Thus, the first instruction 112 (e.g., a SIMD instruction having a vector length of 64 bits) may be scaled to generate a first scaled instruction 122 (e.g., a SIMD instruction having a vector length of 128 bits). In a similar manner, the second instruction 114 (e.g., a SIMD instruction having a vector length of 64 bits) may be scaled to generate a second scaled instruction 124 (e.g., a SIMD instruction having a vector length of 128 bits), and the Nth instruction 116 (e.g., a SIMD instruction having a vector length of 64 bits) may be scaled to generate an Nth scaled instruction 126 (e.g., a SIMD instruction having a vector length of 128 bits). Thus, according to one implementation, scaling an instruction as used herein may include using an actual hardware vector length for the vector length parameter of the instruction behavior.
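The scaling step described above may be pictured with a minimal Python sketch. The class name SimdInstruction, the function scale_instruction, and the opcode strings are hypothetical names chosen for illustration; the disclosure does not define a software data structure for an instruction. The sketch assumes that an adaptable vector length can be modeled as an unset field that is filled in with the actual hardware vector length.

```python
from dataclasses import dataclass, replace
from typing import Optional

WORD_BITS = 32  # word size used in the example above


@dataclass(frozen=True)
class SimdInstruction:
    """Hypothetical stand-in for a compiled SIMD instruction."""
    opcode: str                                # illustrative opcode name
    total_words: int                           # amount of input data the instruction covers
    vector_length_bits: Optional[int] = None   # None models an adaptable (unset) vector length


def scale_instruction(instr: SimdInstruction, hardware_vector_length_bits: int) -> SimdInstruction:
    """Return a copy of instr whose vector length is the hardware vector length."""
    if hardware_vector_length_bits <= 0 or hardware_vector_length_bits % WORD_BITS != 0:
        raise ValueError("hardware vector length must be a positive multiple of the word size")
    return replace(instr, vector_length_bits=hardware_vector_length_bits)


# Scale every instruction of a (hypothetical) scalable program to 128 bits.
scalable = [SimdInstruction("vadd", 2048), SimdInstruction("vmul", 2048)]
scaled = [scale_instruction(instr, 128) for instr in scalable]
```

The original instructions are left unchanged, which mirrors the distinction in the text between the scalable SIMD instructions 110 and the scaled SIMD instructions 120.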
[0021] As described below with respect to
[0022] The scaling process 100 of
[0023] Referring to
[0024] The processor 202 is communicatively coupled to the control register 206 via a bus 212, and the processor 202 is communicatively coupled to the hypervisor 208 via a bus 214. The memory 203 is communicatively coupled to the processor 202 via a bus 213. The memory 203 may store the scalable SIMD instructions 110 of
[0025] The processor 202 includes a SIMD processor 204 (e.g., SIMD processing components), an instruction cache 224, an instruction scaling module 222, and a loop value register 238. The processor 202 may run an operating system 220. For example, the operating system 220 may provide instructions for the processor 202 to execute. Although the SIMD processor 204 is illustrated as being integrated into the processor 202, in other implementations, the SIMD processor 204 may be distinct from the processor 202 and coupled to the processor 202 via a bus. The SIMD processor 204 may include one or more SIMD execution units 236. The operating system 220 may include hardware specification data 226 that indicates the vector length of the SIMD processor 204. The instruction cache 224 may include a dedicated scalar instruction 228 or a library call instruction 230.
[0026] The processor 202 may be configured to perform a query to determine a hardware vector length of the SIMD processor 204. As used herein, the hardware vector length of the SIMD processor 204 may correspond to an amount of data that one of the SIMD execution units 236 is capable of processing during a processing cycle. For example, if one of the SIMD execution units 236 is capable of processing 64 bits of data (e.g., 2 words) during a processing cycle, the hardware vector length of the SIMD processor 204 may be 64 bits. As another example, if one of the SIMD execution units 236 is capable of processing 128 bits of data (e.g., 4 words) during a processing cycle, the hardware vector length of the SIMD processor 204 may be 128 bits.
[0027] According to one implementation, the processor 202 may poll the control register 206 to determine the hardware vector length. For example, the control register 206 may store data indicating a vector length 232 of the SIMD processor 204. The processor 202 may send a poll signal to the control register 206 via the bus 212 to access the data. Based on the poll signal, the processor 202 may determine the vector length 232 (e.g., the hardware vector length) of the SIMD processor 204.
[0028] According to one implementation, the processor 202 may execute the dedicated scalar instruction 228 to determine the hardware vector length of the SIMD processor 204. For example, the processor 202 may fetch the dedicated scalar instruction 228 from the instruction cache 224. After fetching the dedicated scalar instruction 228, the processor 202 may execute the dedicated scalar instruction 228 to determine the hardware vector length of the SIMD processor 204. To illustrate, upon executing the dedicated scalar instruction 228, the processor 202 may poll a register (e.g., the control register 206 or another register (not shown)) to access data indicating the hardware vector length of the SIMD processor 204.
[0029] According to one implementation, the processor 202 may execute hypervisor code 234 to retrieve a value that indicates the hardware vector length of the SIMD processor 204. The hypervisor 208 may support the processor 202 and the SIMD processor 204. For example, the hypervisor 208 may store information (e.g., hardware specification information) associated with the processor 202 and the SIMD processor 204. To illustrate, the hypervisor 208 may store the hardware vector length of the SIMD processor 204 as hypervisor code 234 that may be executed and translated into a machine language of the processor 202. Thus, the processor 202 may execute the hypervisor code 234 to determine the hardware vector length of the SIMD processor 204.
[0030] According to one implementation, the processor 202 may perform a library call to access the hardware specification data 226. For example, the processor 202 may fetch a library call instruction 230 from the instruction cache 224. After fetching the library call instruction 230, the processor 202 may execute the library call instruction 230 to access hardware specification data 226. The hardware specification data 226 may indicate the hardware vector length of the SIMD processor 204. After determining the hardware vector length of the SIMD processor 204, the processor 202 may be configured to scale the first instruction 112 of the scalable SIMD instructions 110 to a first scaled vector length to generate the first scaled instruction 122. The first scaled vector length may be based on the hardware vector length of the SIMD processor 204. For example, if the hardware vector length of the SIMD processor 204 is 128 bits, the first scaled vector length may be equal to 128 bits.
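The four query mechanisms described above (control register, dedicated scalar instruction, hypervisor code, and library call) may be viewed as interchangeable ways of obtaining the same value. The following Python sketch models that idea only; every function name, dictionary key, and the value 128 are assumptions made for illustration, and a real system would use architecture-specific mechanisms instead of dictionaries.

```python
from typing import Callable, Dict

# Illustrative stand-ins; the values are assumed, not taken from real hardware.
CONTROL_REGISTER = {"vector_length_bits": 128}         # models the control register 206
HARDWARE_SPEC_DATA = {"simd_vector_length_bits": 128}  # models the hardware specification data 226


def poll_control_register() -> int:
    """Read the vector length stored in the modeled control register."""
    return CONTROL_REGISTER["vector_length_bits"]


def dedicated_scalar_instruction() -> int:
    """Model a dedicated scalar instruction, which may itself poll a register."""
    return poll_control_register()


def hypervisor_query() -> int:
    """Stand-in for executing hypervisor code that yields the vector length."""
    return 128  # assumed value held by the modeled hypervisor


def library_call() -> int:
    """Model a library call that reads hardware specification data."""
    return HARDWARE_SPEC_DATA["simd_vector_length_bits"]


QUERY_METHODS: Dict[str, Callable[[], int]] = {
    "control_register": poll_control_register,
    "scalar_instruction": dedicated_scalar_instruction,
    "hypervisor": hypervisor_query,
    "library_call": library_call,
}


def query_hardware_vector_length(method: str = "control_register") -> int:
    """Return the hardware vector length in bits using the named mechanism."""
    return QUERY_METHODS[method]()
```

Because each mechanism returns the same kind of value, the scaling step that follows does not need to know which mechanism was used.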
[0031] To illustrate, the instruction scaling module 222 may fetch the scalable SIMD instructions 110 from the memory 203 via the bus 213. Upon fetching the scalable SIMD instructions 110, the instruction scaling module 222 may perform the scaling process 100 described with respect to
[0032] Although the instruction scaling module 222 is described as scaling the instructions 112-116 to the first scaled vector length (e.g., 128 bits), in some implementations, the instruction scaling module 222 may scale the instructions 112-116 to a second scaled vector length that is less than the first scaled vector length or to a third scaled vector length that is greater than the first scaled vector length. For example, if the processor 202 determines that the hardware vector length of the SIMD processor 204 is 32 bits (e.g., one word), the instruction scaling module 222 may scale the instructions 112-116 to the second scaled vector length (e.g., 32 bits). Alternatively, if the processor 202 determines that the hardware vector length of the SIMD processor 204 is 256 bits (e.g., eight words), the instruction scaling module 222 may scale the instructions 112-116 to the third scaled vector length (e.g., 256 bits).
[0033] After scaling the instructions 112-116 to generate the scaled SIMD instructions 120, the processor 202 may be configured to adjust a number of iterations (e.g., loops) to be used by the SIMD processor 204 to perform operations associated with the scaled instructions 122-126. For example, the loop value register 238 may store a control value 240 that indicates the number of iterations (e.g., loops) that the one or more SIMD execution units 236 are to perform to execute the instructions. The processor 202 may send a signal to the loop value register 238 to adjust the control value 240 based on the first scaled vector length (e.g., the hardware vector length of the SIMD processor 204). For example, if each instruction 112-116 specifies 2048 words of input data and the first scaled vector length is 128 bits (e.g., four words), the processor 202 may adjust the control value 240 to indicate that 512 iterations are to be used by the one or more SIMD execution units 236 to perform operations associated with each instruction 112-116.
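The adjustment of the control value may be sketched as follows. The class LoopValueRegister and the function adjust_control_value are hypothetical names used to model the loop value register 238 and the control value 240. The use of ceiling division, so that a partially filled final vector still receives an iteration, is an assumption of this sketch and is not stated in the disclosure.

```python
WORD_BITS = 32  # word size used in the examples of the disclosure


class LoopValueRegister:
    """Hypothetical model of the loop value register 238 holding the control value 240."""

    def __init__(self, control_value: int = 0) -> None:
        self.control_value = control_value


def adjust_control_value(register: LoopValueRegister,
                         total_words: int,
                         scaled_vector_length_bits: int) -> int:
    """Set the control value to the iteration count implied by the scaled vector length."""
    words_per_iteration = scaled_vector_length_bits // WORD_BITS
    if words_per_iteration < 1:
        raise ValueError("scaled vector length must hold at least one word")
    # Ceiling division: an assumption so that leftover words get a final iteration.
    register.control_value = -(-total_words // words_per_iteration)
    return register.control_value


# Iteration counts for 2048 words at several scaled vector lengths.
register = LoopValueRegister()
counts = {bits: adjust_control_value(register, 2048, bits) for bits in (32, 64, 128, 256)}
```

With 2048 words of input, the sketch yields 2048, 1024, 512, and 256 iterations for scaled vector lengths of 32, 64, 128, and 256 bits, respectively.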
[0034] After the processor 202 adjusts the control value 240, the processor 202 may be configured to initiate execution of the scaled instructions 122-126 at the SIMD processor 204. For example, the processor 202 may provide the scaled SIMD instructions 120 to the one or more SIMD execution units 236, and the one or more SIMD execution units 236 may execute the scaled instructions 122-126 based on the adjusted number of iterations indicated by the control value 240. In some implementations, the processor 202 may provide the scaled SIMD instructions 120 to the memory 203, and the SIMD processor 204 may retrieve the scaled SIMD instructions 120 from the memory 203.
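The effect of executing a scaled instruction for the adjusted number of iterations may be illustrated with a small software simulation. The sketch below is not the disclosed hardware; the function name execute_scaled_add, the choice of an element-wise addition, and the 32-bit wraparound behavior are assumptions made for illustration. The sketch shows that the same input data produce the same result at different vector lengths while the iteration count changes.

```python
from typing import List

WORD_BITS = 32          # word size used in the examples of the disclosure
WORD_MASK = 0xFFFFFFFF  # assumed 32-bit wraparound for each lane


def execute_scaled_add(a: List[int], b: List[int],
                       vector_length_bits: int, control_value: int) -> List[int]:
    """Add a and b element-wise, processing one vector of lanes per iteration."""
    if len(a) != len(b):
        raise ValueError("inputs must have the same number of words")
    lanes = vector_length_bits // WORD_BITS
    if lanes < 1:
        raise ValueError("vector length must hold at least one word")
    if control_value * lanes < len(a):
        raise ValueError("control value is too small to cover the input data")
    out: List[int] = []
    for i in range(control_value):  # one pass per iteration indicated by the control value
        start = i * lanes
        out.extend((x + y) & WORD_MASK
                   for x, y in zip(a[start:start + lanes], b[start:start + lanes]))
    return out


a = list(range(2048))
b = [7] * 2048
narrow = execute_scaled_add(a, b, vector_length_bits=64, control_value=1024)
wide = execute_scaled_add(a, b, vector_length_bits=128, control_value=512)
```

Both calls return the same 2048 results; only the number of iterations differs (1024 at 64 bits and 512 at 128 bits).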
[0035] The system 200 of
[0036] Referring to
[0037] The method 300 includes performing, at a processor, a query to determine a hardware vector length of a SIMD processor, at 302. For example, referring to
[0038] According to another implementation, the processor 202 may send a software request for a range of hardware vector lengths for available resources. The software request may be sent to a library or to the operating system 220. In response to sending the software request, the processor 202 may receive a signal indicating the hardware vector length of the SIMD processor 204 when the SIMD processor 204 is available to process data (e.g., when one or more of the SIMD execution units 236 is available to process instructions). According to yet another implementation, the processor 202 may send a software request for a particular hardware vector length of an available resource. For example, the processor 202 may send a software request for a particular SIMD execution unit 236. The software request may be sent to the library or to the operating system 220. In response to sending the software request, the processor 202 may receive a signal indicating the particular hardware vector length of the particular SIMD execution unit 236 when the particular SIMD execution unit 236 is available to process data.
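The two software requests described above may be sketched as simple lookups against a table of execution units. This is one possible reading only: the class ExecutionUnit, both function names, and the unit table are hypothetical, and the sketch returns None when nothing is available, whereas the disclosure describes receiving a signal once a resource becomes available.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ExecutionUnit:
    """Hypothetical record for one SIMD execution unit."""
    unit_id: int
    vector_length_bits: int
    available: bool


def request_vector_length_range(units: List[ExecutionUnit],
                                min_bits: int, max_bits: int) -> Optional[int]:
    """Return the vector length of the first available unit within the requested range."""
    for unit in units:
        if unit.available and min_bits <= unit.vector_length_bits <= max_bits:
            return unit.vector_length_bits
    return None  # nothing available now; a real system might signal later


def request_particular_unit(units: List[ExecutionUnit], unit_id: int) -> Optional[int]:
    """Return the vector length of a particular unit if that unit is available."""
    for unit in units:
        if unit.unit_id == unit_id and unit.available:
            return unit.vector_length_bits
    return None


# Assumed example table: unit 0 is busy, units 1 and 2 are free.
units = [
    ExecutionUnit(unit_id=0, vector_length_bits=64, available=False),
    ExecutionUnit(unit_id=1, vector_length_bits=128, available=True),
    ExecutionUnit(unit_id=2, vector_length_bits=256, available=True),
]
range_result = request_vector_length_range(units, 64, 128)
unit_result = request_particular_unit(units, 2)
```

In this example, the range request skips the busy 64-bit unit and reports 128 bits, and the request for unit 2 reports 256 bits.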
[0039] A first SIMD instruction may be scaled to a first scaled vector length to generate a first scaled instruction, at 304. The first scaled vector length may be based on the hardware vector length, and the first SIMD instruction may be a compiled instruction having an adaptable vector length. For example, referring to
[0040] A first number of iterations to be used by the SIMD processor to perform first operations associated with the first SIMD instruction may be adjusted based on the first scaled vector length, at 306. For example, referring to
[0041] According to one implementation, the method 300 may also include initiating execution of the first scaled instruction 122 at the SIMD processor 204. For example, the processor 202 may provide the first scaled instruction 122 to the SIMD processor 204, and the one or more SIMD execution units 236 may execute the first scaled instruction 122 based on the adjusted number of iterations indicated by the control value 240.
[0042] According to one implementation, the second instruction 114 may be scaled in the same manner as the first instruction with respect to the method 300.
[0043] The method 300 of
[0044] Referring to
[0045] A wireless controller 440 may be coupled to an antenna 442 via a transceiver 450. The device 400 may include a display 428 coupled to a display controller 426. A speaker 448, a microphone 446, or both may be coupled to a coder/decoder (CODEC) 434.
[0046] The memory 203 may include instructions 456 executable by the processor 202, the SIMD processor 204, another processing unit of the device 400, or a combination thereof, to perform the method 300 of
[0047] According to one implementation, the device 400 may be included in a system-in-package or system-on-chip device 422, such as a mobile station modem (MSM). According to one implementation, the processor 202, the SIMD processor 204, the display controller 426, the memory 203, the CODEC 434, the wireless controller 440, and the transceiver 450 are included in the system-in-package or system-on-chip device 422. According to one implementation, an input device 430, such as a touchscreen and/or keypad, and a power supply 444 are coupled to the system-on-chip device 422. Moreover, according to one implementation, as illustrated in
[0048] In conjunction with the described implementations, an apparatus includes means for performing a query to determine a hardware vector length of a SIMD processor. For example, the means for performing the query may include the processor 202 of
[0049] The apparatus may also include means for scaling a first SIMD instruction to a first scaled vector length to generate a first scaled instruction. The first scaled vector length may be based on the hardware vector length. For example, the means for scaling the first SIMD instruction may include the instruction scaling module 222 of
[0050] The apparatus may also include means for adjusting a first number of iterations to be used by the SIMD processor to perform first operations associated with the first SIMD instruction based on the first scaled vector length. For example, the means for adjusting the first number of iterations may include the processor 202 of
[0051] Those of skill would further appreciate that the various illustrative logical blocks, configurations, modules, circuits, and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software executed by a processor, or combinations of both. Various illustrative components, blocks, configurations, modules, circuits, and steps have been described above generally in terms of their functionality. Whether such functionality is implemented as hardware or processor executable instructions depends upon the particular application and design constraints imposed on the overall system. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present disclosure.
[0052] The steps of a method or algorithm described in connection with the implementations disclosed herein may be embodied directly in hardware, in a software module executed by a processor, or in a combination of the two. A software module may reside in random access memory (RAM), flash memory, read-only memory (ROM), programmable read-only memory (PROM), erasable programmable read-only memory (EPROM), electrically erasable programmable read-only memory (EEPROM), registers, hard disk, a removable disk, a compact disc read-only memory (CD-ROM), or any other form of non-transient storage medium known in the art. An exemplary storage medium is coupled to the processor such that the processor can read information from, and write information to, the storage medium. In the alternative, the storage medium may be integral to the processor. The processor and the storage medium may reside in an application-specific integrated circuit (ASIC). The ASIC may reside in a computing device or a user terminal. In the alternative, the processor and the storage medium may reside as discrete components in a computing device or user terminal.
[0053] The previous description of the disclosed implementations is provided to enable a person skilled in the art to make or use the disclosed implementations. Various modifications to these implementations will be readily apparent to those skilled in the art, and the principles defined herein may be applied to other implementations without departing from the scope of the disclosure. Thus, the present disclosure is not intended to be limited to the implementations shown herein but is to be accorded the widest scope possible consistent with the principles and novel features as defined by the following claims.