FAST SPARSE NEURAL NETWORKS
20220335272 · 2022-10-20
Inventors
- Erich Konrad Elsen (Naperville, IL, US)
- Trevor John Gale (San Francisco, CA, US)
- Marat Dukhan (Mountain View, CA, US)
CPC classification
G06F17/16
PHYSICS
International classification
Abstract
A neural network system includes at least one layer which applies a 1×1 convolution to a dense activation matrix, using a kernel defined by a sparse weight matrix. The layer is implemented by a processor with access to a sparsity dataset which indicates where the null weights are located in the weight matrix. The processor selects the feature values corresponding to the other weights from a memory unit configured to store the activation matrix, and then uses these extracted feature values for calculating the convolved values.
Claims
1. A method of implementing a neural network, the neural network comprising a plurality of layers including at least one sparse 1×1 convolutional layer, the input of the convolutional layer comprising, for each of a plurality of elements arranged in an H×W array, a respective input channel of feature values, the sparse 1×1 convolutional layer being configured to apply a sparse 1×1 convolution to the input channels to form respective output channels each composed of a plurality of convolved values, the sparse 1×1 convolution being defined by a sparse C×C′ weight matrix having a plurality of null weights which are equal to zero and a plurality of non-null weights, and the input channels constituting a dense C′×HW activation matrix having a feature value defined for each element of the activation matrix, the method comprising: obtaining an indication of the null weights of the weight matrix; and processing the sparse C×C′ weight matrix in conjunction with the dense C′×HW activation matrix by, for elements of a row vector comprising a plurality of the elements in a row of the activation matrix, generating the convolved values for the plurality of elements by: (a) extracting corresponding feature values of the input channels from a memory unit storing the activation matrix, the corresponding feature values being feature values for which according to the indication the corresponding weight of the weight matrix is a non-null weight, and (b) forming a corresponding sum of the corresponding extracted feature values weighted by the respective non-null weights.
2. A method according to claim 1 in which the null weights constitute substantially 70-95% of the components of the weight matrix.
3. A method according to claim 1 in which an output layer of the neural network is fully connected.
4. A method according to claim 1 in which the memory unit has a CHW memory layout.
5. A method according to claim 4 in which the processing is performed with an inner loop for successive row vectors of elements in the same row, and an outer loop for successive rows.
6. A method according to claim 1 in which the processing is performed repeatedly for successive row vectors, the row vectors collectively including the whole array of elements.
7. A method according to claim 1, the neural network further including an output layer following the convolutional layer and arranged to generate one or more output values, each output value being determined based on all the convolved values of all the elements.
8. A method according to claim 1, in which the non-null weights are in the same positions in each of a plurality of rows of the weight matrix.
9. A method according to claim 8 in which the processing for the plurality of rows of the weight matrix is performed in parallel to generate the corresponding plurality of convolved values of the output channels for the row vector.
10. A method according to claim 1 in which during the generation of the convolved values for the plurality of elements, upon said extraction of corresponding feature values from the memory unit, the extracted features values are stored in a cache memory, the extraction and storage not being performed in respect of feature values which were stored in the cache memory during the generation of preceding convolved values for the plurality of elements.
11. A method according to claim 10 in which during the generation of the convolved values for the plurality of elements based on the corresponding feature values for the plurality of elements, the corresponding feature values for a plurality of additional elements are also read from the memory unit into the cache memory, the convolved values of the plurality of additional elements not being generated in parallel with the convolved values for the plurality of elements.
12. (canceled)
13. (canceled)
14. A system configured to implement a neural network, the neural network comprising a plurality of layers including at least one sparse 1×1 convolutional layer, the input of the convolutional layer comprising, for each of a plurality of elements arranged in an H×W array, a respective input channel of feature values, the sparse 1×1 convolutional layer being configured to apply a sparse 1×1 convolution to the input channels to form respective output channels each composed of a plurality of convolved values, the sparse 1×1 convolution being defined by a sparse C×C′ weight matrix having a plurality of null weights which are equal to zero and a plurality of non-null weights, and the input channels constituting a dense C′×HW activation matrix having a feature value defined for each element of the activation matrix, the system comprising a memory unit and a processing unit, the memory unit storing instructions which when implemented by the processing unit cause the processing unit to: obtain an indication of the null weights of the weight matrix; and process the sparse C×C′ weight matrix in conjunction with the dense C′×HW activation matrix by, for elements of a row vector comprising a plurality of the elements in a row of the activation matrix, generating the convolved values for the plurality of elements by: (a) extracting corresponding feature values of the input channels from a memory unit storing the activation matrix, the corresponding extracted feature values being feature values for which according to the indication the corresponding weight of the weight matrix is a non-null weight, and (b) forming a corresponding sum of the corresponding extracted feature values weighted by the respective non-null weights.
15. (canceled)
16. (canceled)
17. A system according to claim 14 in which an output layer of the neural network is fully connected.
18. A system according to claim 14 in which the memory unit has a CHW memory layout.
19. A system according to claim 18 in which the processing is performed with an inner loop for successive row vectors of elements in the same row, and an outer loop for successive rows.
20. A system according to claim 14 in which the processing is performed repeatedly for successive row vectors, the row vectors collectively including the whole array of elements.
21. A system according to claim 14, the neural network further including an output layer following the convolutional layer and arranged to generate one or more output values, each output value being determined based on all the convolved values of all the elements.
22. A system according to claim 14, in which the non-null weights are in the same positions in each of a plurality of rows of the weight matrix.
23. A system according to claim 22 in which the processing for the plurality of rows of the weight matrix is performed in parallel to generate the corresponding plurality of convolved values of the output channels for the row vector.
24. One or more computer storage media encoded with instructions that, when executed by one or more computers, cause the one or more computers to perform operations for implementing a neural network, the neural network comprising a plurality of layers including at least one sparse 1×1 convolutional layer, the input of the convolutional layer comprising, for each of a plurality of elements arranged in an H×W array, a respective input channel of feature values, the sparse 1×1 convolutional layer being configured to apply a sparse 1×1 convolution to the input channels to form respective output channels each composed of a plurality of convolved values, the sparse 1×1 convolution being defined by a sparse C×C′ weight matrix having a plurality of null weights which are equal to zero and a plurality of non-null weights, and the input channels constituting a dense C′×HW activation matrix having a feature value defined for each element of the activation matrix, the method comprising: obtaining an indication of the null weights of the weight matrix; and processing the sparse C×C′ weight matrix in conjunction with the dense C′×HW activation matrix by, for elements of a row vector comprising a plurality of the elements in a row of the activation matrix, generating the convolved values for the plurality of elements by: (a) extracting corresponding feature values of the input channels from a memory unit storing the activation matrix, the corresponding feature values being feature values for which according to the indication the corresponding weight of the weight matrix is a non-null weight, and (b) forming a corresponding sum of the corresponding extracted feature values weighted by the respective non-null weights.
Description
BRIEF DESCRIPTION OF THE DRAWINGS
[0028] Examples of the present disclosure will now be described for the sake of example only with reference to the following drawings, in which:
[0035] Like reference numbers and designations in the various drawings indicate like elements.
DETAILED DESCRIPTION
[0037] The neural network 100 comprises an input layer 101, an output layer 103 and one or more hidden layers 102a, 102b, 102c. The input layer 101, hidden layer(s) 102a, 102b, 102c and output layer 103 are arranged in a sequence. The output of each layer except the output layer 103 provides the input for the next layer of the sequence. One or more of the input layer 101, hidden layer(s) 102a, 102b, 102c and output layer 103 are convolutional layers. Indeed, they may all be convolutional layers, though typically at least the output layer 103 is not. Each convolutional layer receives input defined based on an array (typically a two-dimensional array) of elements. For each element there is a respective input channel, which is a feature vector composed of C′ feature values. Similarly, for each element the convolutional layer generates a respective output channel having C values, referred to as “convolved values”. Each convolutional layer employs a respective kernel defined by a weight matrix.
[0038] The input to the input layer 101 is data defining an image, such as data which, for each pixel of an array of pixels, specifies one or more values. The pixels may correspond to respective ones of the elements. For example, C′ may be 3 for this layer, and the feature values of the input channel for each element may be, respectively, the intensities of the red, green and blue channels.
[0039] At least one of the layers, particularly one of the hidden layer(s) 102a, 102b, 102c is a 1×1 convolutional layer. In the case of a 1×1 convolutional layer, the output channel for each element depends only upon the input channel for the element. That is, the kernel does not contain weights which cause a component of the output channel for one element to depend upon the input channel of another element.
[0040] As described below, one or more of the layer(s) of the neural network 100 which implement a 1×1 convolution may be implemented using a kernel which exhibits “sparsity” (i.e. at least a certain proportion of the weights taking zero values, e.g. at least half), particularly one of the hidden layer(s) 102a, 102b, 102c. However, not all the layers of the neural network may exhibit sparsity.
[0041] Firstly, the input layer 101 may comprise a kernel which does not exhibit sparsity, since its overall contribution to the parameter count, FLOP count and runtime is small. Instead of a sparse kernel, the input layer 101 may employ a dense convolutional kernel, and take an image as its input.
[0042] Also, one or more of the layers 101, 102a, 102b, 102c, 103 may implement a “squeeze-and-excitation” (SE) layer, as described in “Squeeze-and-Excitation Networks”, Jie Hu et al. (2019). In such a layer, an input to the layer is mapped to feature maps denoted U (e.g. by a convolution), and the feature maps are subject to a “squeeze” operation which produces a channel descriptor by aggregating the feature maps across their H×W spatial dimensions, to produce an embedding of the global distribution of channel-wise feature responses. This aggregation is followed by an “excitation” operation, which takes the embedding as an input and produces a collection of per-channel weights, which are applied to the feature maps U to generate the output of the SE layer. If such an SE layer is present in the neural network 100, it too need not employ a sparse kernel as described below, since experiments have shown that SE layers typically contribute less than 1% of the total FLOPs of the dense models in which they are conventionally used.
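The squeeze and excitation operations described above can be sketched as follows. This is a minimal illustration, assuming global average pooling for the "squeeze" and a two-layer, sigmoid-gated "excitation"; the projection matrices W1 and W2 and all function names are hypothetical stand-ins, not taken from the patent.

```python
import math

def squeeze_excite(U, W1, W2):
    # U: C x HW feature maps; "squeeze" = mean over the HW spatial positions,
    # giving one descriptor value per channel.
    z = [sum(row) / len(row) for row in U]
    # "Excitation": two tiny fully-connected layers producing one sigmoid
    # gate per channel (W1: hidden x C, W2: C x hidden, both illustrative).
    hidden = [max(0.0, sum(w * zi for w, zi in zip(row, z))) for row in W1]
    gates = [1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden))))
             for row in W2]
    # Rescale each feature map by its per-channel gate.
    return [[g * v for v in row] for g, row in zip(gates, U)]
```

Because the descriptor and gates have only C values regardless of H×W, the extra work is small, consistent with the observation that SE layers contribute under 1% of total FLOPs.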
[0043] Also, the last layer 103 of the neural network 100 may be implemented as fully-connected layer, rather than a convolutional layer. Again, it is known from experiment that in conventional models a fully-connected output layer contributes insignificantly (<1%) to the total FLOP count, but does contribute a significant fraction (20-50%) of total parameters, especially if the training of the neural network is such that other layers of the neural network are pruned.
[0045] The third of the memory units is a feature value memory unit 205, which stores the data input to and output from each of the layers. Upon receipt, the data input 201 is stored in the feature value memory unit 205.
[0046] The data in the data input 201 and stored in the feature value memory unit 205 may be in the standard HWC layout, in which the values for the different channels corresponding to one spatial location are adjacent in memory. That is, denoting the number of elements per row of the array as W, the number of rows in the array by H, and the number of channels per element by C, the memory location (i.e. the offset distance from some arbitrary location in the memory space) for the value of the c-th channel of the element at position (h, w) in the array may be expressed as h*W*C + w*C + c. Upon receiving the data input 201, the data input 201 may be stored, typically still in the HWC format, in the feature value memory unit 205.
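The HWC offset arithmetic above can be sketched directly; the function name is illustrative only.

```python
# HWC ("channels-last") layout, as in the text: offset = h*W*C + w*C + c.
def hwc_offset(h, w, c, W, C):
    return h * W * C + w * C + c

# In HWC, the C channel values of one spatial location (h, w) are adjacent:
# consecutive c gives consecutive memory locations.
H, W, C = 4, 5, 3
assert hwc_offset(0, 0, 1, W, C) - hwc_offset(0, 0, 0, W, C) == 1
```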
[0047] To implement one of the layers of the neural network 100, the processor 202 may transfer successive portions of the data describing the input to that layer from the feature value memory unit 205 to a cache memory 206 of the processor 202. In the case of a layer exhibiting sparsity, for each element the transfer may be performed in multiple steps, in each of which only a subset of the feature values of the input channel for that element is transferred to the cache memory 206, as required to generate a portion of the output channel for the element. To allow the convolved values for multiple elements to be generated together (e.g. in parallel), feature values for the multiple elements may be transferred from the feature value memory unit 205 to the cache memory 206 simultaneously.
[0048] For each layer (except optionally for the output layer 103), the convolved values of the respective output channel for each element are stored in the feature value memory unit 205. The output channels are subsequently read by the processor 202 from the feature value memory unit 205, and used by the processor 202 as input data for the successive layer of the neural network 100. As described below, the output channels for one or more of the layers of the neural network 100, such as the input layer 101 and/or one or more of the hidden layers 102a, 102b, 102c, may be stored in the feature value memory unit 205 in a CHW format, also called here a CHW layout. In the CHW layout, the values of all the spatial locations for one channel are adjacent in memory. In the CHW layout, the memory location (offset from an arbitrary position in the memory space) of the c-th channel of the element at position (h, w) in the H×W array is c*H*W + h*W + w. It is convenient for sparse convolutional operations if the input data is in the CHW format for one or more of the hidden layers 102a, 102b, 102c and the output layer 103, and in particular for the convolutional layer 102a immediately following the input layer 101.
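The CHW offset formula can be sketched in the same way (names illustrative), which also shows why this layout suits the sparse kernel: all HW values of a single input channel are contiguous, so a strip of consecutive elements for one channel can be fetched with a single vector load.

```python
# CHW ("channels-first") layout, as in the text: offset = c*H*W + h*W + w.
def chw_offset(c, h, w, H, W):
    return c * H * W + h * W + w

H, W = 4, 5
# All H*W spatial values of one channel are contiguous: moving to the next
# channel jumps by a full H*W stride.
assert chw_offset(1, 0, 0, H, W) - chw_offset(0, 0, 0, H, W) == H * W
# Within one channel, consecutive w gives consecutive memory locations.
assert chw_offset(0, 2, 3, H, W) == 2 * W + 3
```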
[0049] The output channels for the output layer 103 are transmitted from the computer system 200 as the output data 207. The output data 207 may, for example, represent a classification of the data input 201. Alternatively, if the data input 201 is side-data and the neural network 100 is a generative network, the output data 207 may be a dataset representing an image or a signal such as a sound waveform. Alternatively, if the data input 201 is sensor data describing an environment, e.g. an image of a real-world environment collected by a still or video camera, the output data 207 may be control data which is transmitted to an agent in order to control the agent to interact with the environment, e.g. to move (by translation, rotation and/or reconfiguration) within the environment. Alternatively, if the data input 201 is data representing a portion of natural language (e.g. a sequence of letters, or a sound signal collected by a sensor when the natural language is spoken), the output data 207 may be modified natural language, such as a translation of the natural language, and may again be a sequence of letters or a sound signal.
[0050] Turning to
[0051] The kernel for the 1×1 convolutional layer is denoted by the C×C′ weight matrix 302, where C is the number of convolved values in the output channel of each element. C may be the same as, or different from, C′. Values in the weight matrix 302 which are zero (“null values”) are denoted by unshaded (white) boxes, while non-zero values (“non-null values”) in the kernel matrix are denoted by shaded boxes. The proportion of non-null values is small, e.g. in the range 10%-25%. The convolution operation consists of the multiplication of the weight matrix 302 by the activation matrix 301. This is described below with reference to
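The multiplication of the sparse weight matrix by the dense activation matrix can be sketched as a sparse-times-dense matrix product. This is a hedged illustration only: a CSR-style triple (row_ptr, col_idx, values) stands in for the "indication of the null weights" described in the claims, and all names are hypothetical.

```python
# Sketch: out = Wsp @ A, where Wsp is the sparse C x C' weight matrix and
# A is the dense C' x HW activation matrix. Only non-null weights are
# visited, and each one multiplies a full dense row of A (all HW elements).
def sparse_1x1_conv(row_ptr, col_idx, values, activations, HW):
    C = len(row_ptr) - 1
    out = [[0.0] * HW for _ in range(C)]
    for i in range(C):                       # one output channel per weight row
        for k in range(row_ptr[i], row_ptr[i + 1]):
            c_in, w = col_idx[k], values[k]  # non-null weight and its column
            row = activations[c_in]          # dense row: HW feature values
            for x in range(HW):
                out[i][x] += w * row[x]
    return out

# 2x3 weight matrix with null values: [[1, 0, 2], [0, 3, 0]]
row_ptr, col_idx, values = [0, 2, 3], [0, 2, 1], [1.0, 2.0, 3.0]
A = [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]     # C' = 3 channels, HW = 2 elements
print(sparse_1x1_conv(row_ptr, col_idx, values, A, 2))
# → [[7.0, 7.0], [6.0, 6.0]]
```

Because the activation matrix is dense, the inner loop over x reads contiguous memory, which is what makes vectorized loads over spatial locations possible.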
[0053] When processing each column of the matrix 401 (i.e. the input values for each element) to generate the corresponding convolved values, the weight rows of each group may be processed in parallel to generate the corresponding convolved values. However, different groups of weight rows may be processed successively.
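The grouped processing described above (and in claims 8 and 9) can be sketched as follows, under the assumption that all weight rows of a group share the same non-null column positions, so the corresponding feature values need only be extracted once per group; the names are illustrative, not the patent's.

```python
# Process one group of weight rows that share a sparsity pattern.
def conv_group(shared_cols, group_weights, activations, HW):
    # Extract the feature-value rows once for the whole group; every row
    # of the group reuses the same gathered values.
    gathered = [activations[c] for c in shared_cols]
    out = []
    for weights in group_weights:            # rows of the group; these
        acc = [0.0] * HW                     # iterations are independent and
        for w, feats in zip(weights, gathered):  # could run in parallel
            for x in range(HW):
                acc[x] += w * feats[x]
        out.append(acc)
    return out

A = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]     # C' = 3 channels, HW = 2
# Two weight rows, both non-null only in input channels 0 and 2:
print(conv_group([0, 2], [[1.0, 1.0], [2.0, 0.5]], A, 2))
# → [[6.0, 8.0], [4.5, 7.0]]
```

Different groups would then be handled by successive calls, matching the text's note that groups are processed one after another.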
[0056] In a first step, shown in
[0057] Optionally, for each of the non-null weight values in the first row of the weight matrix 501 (i.e. the first and fourth weights), the processor also reads the corresponding feature values (i.e. the first and fourth feature values) for a second set of e.g. eight elements, and writes them to the next eight locations 5023, 5024 of the corresponding rows of the memory space 502 (i.e. the first and fourth rows). They are shown in
[0058] For each of the first set of eight elements, the processor 202 forms the respective convolved value by multiplying each non-null weight in the first row of the weight matrix 501 by the feature value for that element in the row of the memory space 502 corresponding to that non-null weight, and accumulating (adding) the results. The processor 202 then writes the respective convolved value for each of these eight elements to the first eight positions 5031 of the portion 503 of the memory space of the feature value memory unit 205. Optionally, a non-linear function included in the 1×1 convolution (e.g. a ReLU function) may be applied to each of the convolved values. Thus, the process illustrated in
[0059] As noted above, the processor 202 may optionally have already written the first and fourth feature values for the second set of eight elements to the eight respective memory locations 5023, 5024. In this case, the processor 202 may optionally generate the convolved values for the second set of eight elements by the same process. That is, for each of the second set of eight elements, the processor 202 forms the respective convolved value by multiplying each non-null weight in the first row of the weight matrix 501 by the feature value for that element in the portions 5023, 5024 of the row of the memory space 502 corresponding to that non-null weight, accumulating (adding) the results, and writing them to the next eight positions 5032 of the first row of the memory space 503. If the 1×1 convolution operation includes a non-linear function, this is applied to each of the convolved values. Note that this process is not illustrated in
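The strip-wise inner step described in the preceding paragraphs can be sketched as follows: for one row of the weight matrix, the convolved values for a strip of eight spatial elements are formed together, one multiply-accumulate per non-null weight. This mirrors a SIMD register holding eight adjacent CHW values; the names and the strip width are illustrative assumptions, not prescribed by the patent.

```python
STRIP = 8  # number of spatial elements processed together (illustrative)

def conv_strip(nonnull, activations, start):
    # nonnull: list of (input_channel, weight) pairs for one row of the
    # weight matrix, i.e. only its non-null weights.
    acc = [0.0] * STRIP
    for c_in, w in nonnull:
        # One contiguous "vector load" of eight adjacent feature values
        # for this input channel (contiguous because of the CHW layout).
        strip = activations[c_in][start:start + STRIP]
        for x in range(STRIP):
            acc[x] += w * strip[x]          # broadcast multiply-accumulate
    return acc

A = [list(range(16)) for _ in range(4)]      # 4 input channels, HW = 16
# First weight row non-null in input channels 0 and 3, as in the example:
print(conv_strip([(0, 1.0), (3, 2.0)], A, 0))
# → [0.0, 3.0, 6.0, 9.0, 12.0, 15.0, 18.0, 21.0]
```

A second call with start=8 would cover the second set of eight elements, whose feature values may already be resident in the cache from the optional prefetch described above.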
[0061] Optionally, the second and third feature values for the second set of eight elements are written into the next eight positions of the corresponding rows (i.e. the second and third rows) of the memory space 502 (as indicated by a single diagonal line from bottom-left to top-right). Then the processor 202 calculates the respective second convolved value of the output channel for each of the first set of eight elements by multiplying the non-null weights in the second row of the weight matrix 501 by the corresponding feature values for the first set of eight elements, and adding the results.
[0063] In the sequence of steps shown in
[0064] If there is known to be any regularity in the structure of the weight matrix 501, even a small amount, this allows the process of
[0065] A method 600 of generating a convolved value in the process illustrated in
[0066] Experiments were performed demonstrating that large savings in computational burden and memory requirements can be achieved using the techniques explained above. Three factors particularly contribute to this:
[0067] 1. Though the weight matrix is sparse, the activation matrix is dense. This means that the processor 202 can perform vector loads from the activation matrix and process multiple spatial locations simultaneously.
[0068] 2. By processing the matrix in the right order, the system can keep in the cache memory 206 the values that will be randomly accessed. Note that random access from the cache memory 206 can be performed more quickly than from the feature value memory unit 205.
[0069] 3. Particularly when the number of input channels is small, the prefetching of feature values from the activation matrix for the second set of elements further reduces the number of cases in which the cache memory 206 does not contain required feature values when the convolved values for the second set of elements are to be calculated, such that a value has to be obtained from the feature value memory unit 205.
[0070] The experiments demonstrated that, for a constant computational budget, sparse convolutional networks are more accurate than dense ones, and are faster by a factor of 1.3-2.4 as measured by wall clock time, while needing only 66% as many parameters, equivalent to approximately one entire generation of improvement.
[0071] This specification uses the term “configured” in connection with systems and computer program components. For a system of one or more computers to be configured to perform particular operations or actions means that the system has installed on it software, firmware, hardware, or a combination of them that in operation cause the system to perform the operations or actions. For one or more computer programs to be configured to perform particular operations or actions means that the one or more programs include instructions that, when executed by data processing apparatus, cause the apparatus to perform the operations or actions.
[0072] Embodiments of the subject matter and the functional operations described in this specification can be implemented in digital electronic circuitry, in tangibly-embodied computer software or firmware, in computer hardware, including the structures disclosed in this specification and their structural equivalents, or in combinations of one or more of them. Embodiments of the subject matter described in this specification can be implemented as one or more computer programs, i.e., one or more modules of computer program instructions encoded on a tangible non transitory storage medium for execution by, or to control the operation of, data processing apparatus. The computer storage medium can be a machine-readable storage device, a machine-readable storage substrate, a random or serial access memory device, or a combination of one or more of them. Alternatively or in addition, the program instructions can be encoded on an artificially generated propagated signal, e.g., a machine-generated electrical, optical, or electromagnetic signal, that is generated to encode information for transmission to suitable receiver apparatus for execution by a data processing apparatus.
[0073] The term “data processing apparatus” refers to data processing hardware and encompasses all kinds of apparatus, devices, and machines for processing data, including by way of example a programmable processor, a computer, or multiple processors or computers. The apparatus can also be, or further include, special purpose logic circuitry, e.g., an FPGA (field programmable gate array) or an ASIC (application specific integrated circuit). The apparatus can optionally include, in addition to hardware, code that creates an execution environment for computer programs, e.g., code that constitutes processor firmware, a protocol stack, a database management system, an operating system, or a combination of one or more of them.
[0074] A computer program, which may also be referred to or described as a program, software, a software application, an app, a module, a software module, a script, or code, can be written in any form of programming language, including compiled or interpreted languages, or declarative or procedural languages; and it can be deployed in any form, including as a stand alone program or as a module, component, subroutine, or other unit suitable for use in a computing environment. A program may, but need not, correspond to a file in a file system. A program can be stored in a portion of a file that holds other programs or data, e.g., one or more scripts stored in a markup language document, in a single file dedicated to the program in question, or in multiple coordinated files, e.g., files that store one or more modules, sub programs, or portions of code. A computer program can be deployed to be executed on one computer or on multiple computers that are located at one site or distributed across multiple sites and interconnected by a data communication network.
[0075] In this specification, the term “database” is used broadly to refer to any collection of data: the data does not need to be structured in any particular way, or structured at all, and it can be stored on storage devices in one or more locations. Thus, for example, the index database can include multiple collections of data, each of which may be organized and accessed differently.
[0076] Similarly, in this specification the term “engine” is used broadly to refer to a software-based system, subsystem, or process that is programmed to perform one or more specific functions. Generally, an engine will be implemented as one or more software modules or components, installed on one or more computers in one or more locations. In some cases, one or more computers will be dedicated to a particular engine; in other cases, multiple engines can be installed and running on the same computer or computers.
[0077] The processes and logic flows described in this specification can be performed by one or more programmable computers executing one or more computer programs to perform functions by operating on input data and generating output. The processes and logic flows can also be performed by special purpose logic circuitry, e.g., an FPGA or an ASIC, or by a combination of special purpose logic circuitry and one or more programmed computers.
[0078] Computers suitable for the execution of a computer program can be based on general or special purpose microprocessors or both, or any other kind of central processing unit. Generally, a central processing unit will receive instructions and data from a read only memory or a random access memory or both. The essential elements of a computer are a central processing unit for performing or executing instructions and one or more memory devices for storing instructions and data. The central processing unit and the memory can be supplemented by, or incorporated in, special purpose logic circuitry. Generally, a computer will also include, or be operatively coupled to receive data from or transfer data to, or both, one or more mass storage devices for storing data, e.g., magnetic, magneto optical disks, or optical disks. However, a computer need not have such devices. Moreover, a computer can be embedded in another device, e.g., a mobile telephone, a personal digital assistant (PDA), a mobile audio or video player, a game console, a Global Positioning System (GPS) receiver, or a portable storage device, e.g., a universal serial bus (USB) flash drive, to name just a few.
[0079] Computer readable media suitable for storing computer program instructions and data include all forms of non volatile memory, media and memory devices, including by way of example semiconductor memory devices, e.g., EPROM, EEPROM, and flash memory devices; magnetic disks, e.g., internal hard disks or removable disks; magneto optical disks; and CD ROM and DVD-ROM disks.
[0080] To provide for interaction with a user, embodiments of the subject matter described in this specification can be implemented on a computer having a display device, e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor, for displaying information to the user and a keyboard and a pointing device, e.g., a mouse or a trackball, by which the user can provide input to the computer. Other kinds of devices can be used to provide for interaction with a user as well; for example, feedback provided to the user can be any form of sensory feedback, e.g., visual feedback, auditory feedback, or tactile feedback; and input from the user can be received in any form, including acoustic, speech, or tactile input. In addition, a computer can interact with a user by sending documents to and receiving documents from a device that is used by the user; for example, by sending web pages to a web browser on a user's device in response to requests received from the web browser. Also, a computer can interact with a user by sending text messages or other forms of message to a personal device, e.g., a smartphone that is running a messaging application, and receiving responsive messages from the user in return.
[0081] Data processing apparatus for implementing machine learning models can also include, for example, special-purpose hardware accelerator units for processing common and compute-intensive parts of machine learning training or production, i.e., inference, workloads.
[0082] Machine learning models can be implemented and deployed using a machine learning framework, e.g., a TensorFlow framework, a Microsoft Cognitive Toolkit framework, an Apache Singa framework, or an Apache MXNet framework.
[0083] Embodiments of the subject matter described in this specification can be implemented in a computing system that includes a back end component, e.g., as a data server, or that includes a middleware component, e.g., an application server, or that includes a front end component, e.g., a client computer having a graphical user interface, a web browser, or an app through which a user can interact with an implementation of the subject matter described in this specification, or any combination of one or more such back end, middleware, or front end components. The components of the system can be interconnected by any form or medium of digital data communication, e.g., a communication network. Examples of communication networks include a local area network (LAN) and a wide area network (WAN), e.g., the Internet.
[0084] The computing system can include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. In some embodiments, a server transmits data, e.g., an HTML page, to a user device, e.g., for purposes of displaying data to and receiving user input from a user interacting with the device, which acts as a client. Data generated at the user device, e.g., a result of the user interaction, can be received at the server from the device.
[0085] While this specification contains many specific implementation details, these should not be construed as limitations on the scope of any invention or on the scope of what may be claimed, but rather as descriptions of features that may be specific to particular embodiments of particular inventions. Certain features that are described in this specification in the context of separate embodiments can also be implemented in combination in a single embodiment. Conversely, various features that are described in the context of a single embodiment can also be implemented in multiple embodiments separately or in any suitable sub-combination. Moreover, although features may be described above as acting in certain combinations and even initially be claimed as such, one or more features from a claimed combination can in some cases be excised from the combination, and the claimed combination may be directed to a sub-combination or variation of a sub-combination.
[0086] Similarly, while operations are depicted in the drawings and recited in the claims in a particular order, this should not be understood as requiring that such operations be performed in the particular order shown or in sequential order, or that all illustrated operations be performed, to achieve desirable results. In certain circumstances, multitasking and parallel processing may be advantageous. Moreover, the separation of various system modules and components in the embodiments described above should not be understood as requiring such separation in all embodiments, and it should be understood that the described program components and systems can generally be integrated together in a single software product or packaged into multiple software products.
[0087] Particular embodiments of the subject matter have been described. Other embodiments are within the scope of the following claims. For example, the actions recited in the claims can be performed in a different order and still achieve desirable results. As one example, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some cases, multitasking and parallel processing may be advantageous.