SYSTEM AND METHOD FOR EFFICIENT SCROLLING
20200026403 · 2020-01-23
Inventors
CPC classification
G09G2340/02
PHYSICS
G09G5/003
PHYSICS
G06F3/167
PHYSICS
G06T1/20
PHYSICS
G09G2340/10
PHYSICS
International classification
G06T1/20
PHYSICS
Abstract
In general, techniques are discussed for performing efficient content scrolling on smartphones and other user devices. Power and memory bandwidth requirements are reduced during high-speed scrolling by utilizing lossy compression of content during rendering with minimal user experience impact, as the user is less likely to notice artifacts resulting from high compression while the content is scrolling quickly.
Claims
1. A method of displaying scrolling content, the method comprising: responsive to instructions from an executing application, rendering a layer by a GPU into a memory; determining a requested scrolling speed based on a user input; responsive to determining the requested scrolling speed is at or below a first threshold, retrieving the layer from memory with lossless fetch compression; responsive to determining the requested scrolling speed is above the first threshold, retrieving the layer from memory with lossy fetch compression; and communicating the retrieved layer to a display panel.
2. The method of claim 1, wherein the user input is based on at least one of: a magnitude of a touch panel user finger swipe, a direction of the touch panel user finger swipe, a user voice command, a user eye gaze command, and a user-initiated autoscroll command.
3. The method of claim 1, further comprising: determining a desired frame rate based on the user input, wherein the retrieved layer is communicated to the display panel at the desired frame rate.
4. The method of claim 1, further comprising: responsive to determining the requested scrolling speed is at or above a second threshold, wherein the second threshold is higher than the first threshold, increasing a compression factor of the lossy fetch compression.
5. The method of claim 4, wherein the lossy fetch compression is executed by the GPU on the layer in the memory.
6. The method of claim 4, wherein the compression factor is increased based on at least one of: an available memory bandwidth and a power consumption limit.
7. The method of claim 1, wherein the retrieved layer is communicated to the display panel over a display compositor pipeline and a DSI interface.
8. The method of claim 7, wherein the DSI interface is in communication with at least one of: a command mode display panel and a video mode display panel.
9. An apparatus for displaying scrolling content, the apparatus comprising: a memory; and a processor, the processor configured to responsive to instructions from an executing application, render a layer by a GPU into the memory, determine a requested scrolling speed based on a user input, responsive to determining the requested scrolling speed is at or below a first threshold, retrieve the layer from memory with lossless fetch compression, responsive to determining the requested scrolling speed is above the first threshold, retrieve the layer from memory with lossy fetch compression, and communicate the retrieved layer to a display panel.
10. The apparatus of claim 9, wherein the user input is based on at least one of: a magnitude of a touch panel user finger swipe, a direction of the touch panel user finger swipe, a user voice command, a user eye gaze command, and a user-initiated autoscroll command.
11. The apparatus of claim 9, wherein the processor is further configured to determine a desired frame rate based on the user input, wherein the retrieved layer is communicated to the display panel at the desired frame rate.
12. The apparatus of claim 9, wherein the processor is further configured to responsive to determining the requested scrolling speed is at or above a second threshold, wherein the second threshold is higher than the first threshold, increase a compression factor of the lossy fetch compression.
13. The apparatus of claim 12, wherein the lossy fetch compression is executed by the GPU on the layer in the memory.
14. The apparatus of claim 12, wherein the compression factor is increased based on at least one of: an available memory bandwidth and a power consumption limit.
15. The apparatus of claim 9, wherein the retrieved layer is communicated to the display panel over a display compositor pipeline and a DSI interface.
16. The apparatus of claim 15, wherein the DSI interface is in communication with at least one of: a command mode display panel and a video mode display panel.
17. An apparatus for displaying scrolling content, the apparatus comprising: means for storage; and means for processing, wherein the means for processing is configured to: responsive to instructions from an executing application, render a layer by a GPU into the storage means, determine a requested scrolling speed based on a user input, responsive to determining the requested scrolling speed is at or below a first threshold, retrieve the layer from the storage means with lossless fetch compression, responsive to determining the requested scrolling speed is above the first threshold, retrieve the layer from the storage means with lossy fetch compression, and communicate the retrieved layer to a display panel.
18. The apparatus of claim 17, wherein the user input is based on at least one of: a magnitude of a touch panel user finger swipe, a direction of the touch panel user finger swipe, a user voice command, a user eye gaze command, and a user-initiated autoscroll command.
19. The apparatus of claim 17, wherein the means for processing is further configured to, determine a desired frame rate based on the user input, wherein the retrieved layer is communicated to the display panel at the desired frame rate.
20. The apparatus of claim 17, wherein the means for processing is further configured to, responsive to determining the requested scrolling speed is at or above a second threshold, wherein the second threshold is higher than the first threshold, increase a compression factor of the lossy fetch compression.
21. The apparatus of claim 20, wherein the lossy fetch compression is executed by the GPU on the layer in the storage means.
22. The apparatus of claim 20, wherein the compression factor is increased based on at least one of: an available memory bandwidth and a power consumption limit.
23. The apparatus of claim 17, wherein the retrieved layer is communicated to the display panel over a display compositor pipeline and a DSI interface.
24. The apparatus of claim 23, wherein the DSI interface is in communication with at least one of: a command mode display panel and a video mode display panel.
25. A non-transitory computer-readable storage medium having stored thereon instructions that, when executed, cause a processor to: responsive to instructions from an executing application, render a layer by a GPU into a memory; determine a requested scrolling speed based on a user input; responsive to determining the requested scrolling speed is at or below a first threshold, retrieve the layer from memory with lossless fetch compression; responsive to determining the requested scrolling speed is above the first threshold, retrieve the layer from memory with lossy fetch compression; and communicate the retrieved layer to a display panel.
26. The non-transitory computer-readable storage medium of claim 25, wherein the user input is based on at least one of: a magnitude of a touch panel user finger swipe, a direction of the touch panel user finger swipe, a user voice command, a user eye gaze command, and a user-initiated autoscroll command.
27. The non-transitory computer-readable storage medium of claim 25, the processor further configured to determine a desired frame rate based on the user input, wherein the retrieved layer is communicated to the display panel at the desired frame rate.
28. The non-transitory computer-readable storage medium of claim 25, the processor further configured to responsive to determining the requested scrolling speed is at or above a second threshold, wherein the second threshold is higher than the first threshold, increase a compression factor of the lossy fetch compression.
29. The non-transitory computer-readable storage medium of claim 28, wherein the lossy fetch compression is executed by the GPU on the layer in the memory and the compression factor is increased based on at least one of: an available memory bandwidth and a power consumption limit.
30. The non-transitory computer-readable storage medium of claim 25, wherein the retrieved layer is communicated to the display panel over a display compositor pipeline and a DSI interface and the DSI interface is in communication with at least one of: a command mode display panel and a video mode display panel.
Description
BRIEF DESCRIPTION OF DRAWINGS
DETAILED DESCRIPTION
[0018] Smartphones and other user devices may display content to a user, and the user may scroll the viewable content responsive to user input. Scrolling may be relatively slow or relatively fast. Relatively fast scrolling can be computationally- and bandwidth-intensive, especially for complex content. Power and memory bandwidth requirements may be reduced by utilizing lossy compression of content during the rendering phase when scrolling is relatively fast, with minimal user experience impact, as the user is less likely to notice artifacts resulting from high compression while the content is scrolling quickly.
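As a concrete illustration of the technique described above, the threshold comparison can be sketched as follows. This is a hypothetical sketch, not the disclosed implementation; the threshold values, the `select_fetch_compression` function, and the compression factors are all assumptions made for illustration.

```python
# Illustrative sketch (assumed values, not from the disclosure): pick a
# fetch-compression mode from the requested scrolling speed.
LOSSY_THRESHOLD_PX_PER_S = 2000       # first threshold (assumed)
HIGH_SPEED_THRESHOLD_PX_PER_S = 6000  # second, higher threshold (assumed)

def select_fetch_compression(scroll_speed_px_per_s: float) -> dict:
    """Return fetch-compression settings for a requested scrolling speed."""
    if scroll_speed_px_per_s <= LOSSY_THRESHOLD_PX_PER_S:
        # At or below the first threshold: retrieve the layer losslessly.
        return {"mode": "lossless", "compression_factor": 1.0}
    if scroll_speed_px_per_s < HIGH_SPEED_THRESHOLD_PX_PER_S:
        # Above the first threshold: lossy fetch compression.
        return {"mode": "lossy", "compression_factor": 2.0}
    # At or above the second threshold: increase the compression factor,
    # e.g. based on available memory bandwidth or a power consumption limit.
    return {"mode": "lossy", "compression_factor": 4.0}
```

The second threshold corresponds to claim 4's "increasing a compression factor of the lossy fetch compression" for very fast scrolling.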
[0020] In the example of
[0021] Examples of processor 12, GPU 14, and display processor 18 include, but are not limited to, one or more digital signal processors (DSPs), general purpose microprocessors, application specific integrated circuits (ASICs), field-programmable gate arrays (FPGAs), or other equivalent integrated or discrete logic circuitry. Processor 12 may be the central processing unit (CPU) of device 10. In some examples, GPU 14 may be specialized hardware that includes integrated and/or discrete logic circuitry that provides GPU 14 with massive parallel processing capabilities suitable for graphics processing. In some instances, GPU 14 may also include general purpose processing capabilities, and may be referred to as a general purpose GPU (GPGPU) when implementing general purpose processing tasks (i.e., non-graphics related tasks). Display processor 18 may also be specialized integrated circuit hardware that is designed to retrieve image content from system memory 16, compose the image content into an image frame, and output the image frame to display 19.
[0022] Processor 12 may execute various types of applications. Examples of the applications include web browsers, e-mail applications, spreadsheets, video games, or other applications that generate viewable objects for display. System memory 16 may store instructions for execution of the one or more applications. The execution of an application on processor 12 causes processor 12 to produce graphics data for image content that is to be displayed. Processor 12 may transmit graphics data of the image content to GPU 14 for further processing based on instructions or commands that processor 12 transmits to GPU 14.
[0023] Processor 12 may communicate with GPU 14 in accordance with a particular application programming interface (API). Examples of such APIs include the DirectX API by Microsoft, the OpenGL or OpenGL ES APIs by the Khronos Group, and the OpenCL API; however, aspects of this disclosure are not limited to the DirectX, OpenGL, or OpenCL APIs, and may be extended to other types of APIs. Moreover, the techniques described in this disclosure are not required to function in accordance with an API, and processor 12 and GPU 14 may utilize any technique for communication or transmission.
[0024] System memory 16 may be the memory for device 10. System memory 16 may comprise one or more computer-readable storage media. Examples of system memory 16 include, but are not limited to, a random access memory (RAM), an electrically erasable programmable read-only memory (EEPROM), flash memory, or other medium that can be used to carry or store desired program code in the form of instructions and/or data structures and that can be accessed by a computer or a processor.
[0025] In some aspects, system memory 16 may include instructions that cause processor 12, GPU 14, and/or display processor 18 to perform the functions ascribed in this disclosure to processor 12, GPU 14, and/or display processor 18. Accordingly, system memory 16 may be a computer-readable storage medium having instructions stored thereon that, when executed, cause one or more processors (e.g., processor 12, GPU 14, and/or display processor 18) to perform various functions.
[0026] System memory 16 is a non-transitory storage medium. The term non-transitory indicates that the storage medium is not embodied in a carrier wave or a propagated signal. However, the term non-transitory should not be interpreted to mean that system memory 16 is non-movable or that its contents are static. As one example, system memory 16 may be removed from device 10, and moved to another device. As another example, memory, substantially similar to system memory 16, may be inserted into device 10. In certain examples, a non-transitory storage medium may store data that can, over time, change (e.g., in RAM).
[0027] As noted above, display processor 18 may perform composition of layers to form a frame for display by a display unit (e.g., shown in the example of
[0028] Each of the different hardware pipelines of the display processor may fetch a single layer from memory and perform various operations, such as rotation, clipping, mirroring, blurring, or other editing operations with respect to the layer. Each of the different hardware pipelines may concurrently fetch a different layer, perform these various editing operations, and output the processed layers to mixers that mix one or more of the different layers to form a frame.
[0029] As devices (such as mobile devices) are utilized to perform increasingly more tasks, including transmission of frames wirelessly for display via display units not integrated within the mobile device (such as television sets), devices have begun to provide multitasking in terms of presenting multiple windows alongside one another. These windows may also be accompanied by various alerts, notifications, and other on-screen items.
[0030] To accommodate the increased number of layers, the display processor may offer more hardware pipelines so that more layers can be processed. Adding hardware pipelines may, however, result in increased die area for the SoC, potentially increasing power utilization and adding significant cost.
[0031] In the techniques described in this disclosure, a single hardware image fetcher pipeline of hardware image fetcher pipelines 24 (image fetchers 24) in display processor 18 may independently process two or more layers. Rather than process a single layer (or multiple dependent layers where any operation performed to one of the multiple dependent layers is also performed with respect to the other dependent layers), the techniques may allow a single one of image fetchers 24 of display processor 18 to individually process one of the multiple independent layers separate from the other ones of the multiple layers. Unlike dependent layers, for independent layers any operation performed to one of the independent layers need not necessarily be performed with respect to the other independent layers. The example techniques are described with respect to independent layers, but may be applicable to dependent layers as well.
[0032] In operation, each individual one of image fetchers 24 of display processor 18 may concurrently (e.g., in parallel or at the same time) retrieve or, in other words, fetch two or more layers. Each of image fetchers 24 may next individually process the two or more layers. For example, one of image fetchers 24 may apply a first operation with respect to a first one of the layers and apply a second, different operation with respect to the second one of the layers. Example operations include a vertical flip, a horizontal flip, clipping, rotation, etc.
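The independent per-layer processing described above can be sketched as follows. The function names and the list-of-lists pixel representation are illustrative assumptions, not the hardware interface; the point is that each fetched layer gets its own operation.

```python
# Sketch (assumed representation): a single image fetcher applying
# independent operations to two independently fetched layers, each
# modeled as a nested list of pixel values.

def vflip(layer):
    """Vertical flip: reverse the row order."""
    return layer[::-1]

def hflip(layer):
    """Horizontal flip: reverse each row."""
    return [row[::-1] for row in layer]

def clip(layer, x, y, w, h):
    """Clipping: keep only a rectangular region of the layer."""
    return [row[x:x + w] for row in layer[y:y + h]]

def fetch_and_process(layers_with_ops):
    """Process each (layer, op) pair independently of the others."""
    return [op(layer) for layer, op in layers_with_ops]
```

For example, one fetcher might vertically flip its first layer while horizontally flipping its second, with neither operation affecting the other layer.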
[0033] After individually processing the multiple layers, each of the image fetchers 24 may individually output the multiple processed layers to layer mixing units that may mix the multiple processed layers to form a frame. In some examples, a single first processed layer of the multiple layers processed by a first one of image fetchers 24 may be mixed with a single second processed layer of the multiple layers processed by a second one of image fetchers 24, where the remaining layers of the multiple layers processed by the first and second ones of image fetchers 24 may be mixed separately from the single first and second layers. As such, each of the image fetchers 24 has multiple outputs to a crossbar connecting the hardware pipelines to the layer mixing units, as described below in more detail with respect to
[0034] In this respect, the techniques may allow each of image fetchers 24 to independently process two or more layers, thereby increasing the number of layers display processor 18 is able to concurrently retrieve, and potentially without increasing the number of image fetchers 24. As such, the techniques may improve layer throughput without, in some examples, adding additional image fetchers to image fetchers 24, which may avoid an increase in board space or chip area (which may also be referred to as chip die area) for a system-on-a-chip design, cost, etc.
[0036] As further shown in the example of
[0037] Each of image fetchers 24 may execute according to a clock cycle to fetch a pixel from each of the two or more of layers 27. In this respect, the discussion of fetching layers 27 should be understood to refer to fetching of a pixel from each of layers 27. Each of image fetchers 24 may therefore fetch two or more of layers 27 by fetching a pixel from each of the two or more layers 27. Image fetchers 24 may be configured to perform a direct memory access (DMA), which refers to a process whereby image fetchers 24 may directly access system memory 16 independently from processor 12, or in other words, without requesting that processor 12 manage the memory access.
[0038] As shown in the example of
[0039] Image fetchers 24 may fetch two or more individual, distinct (or, in other words, independent) ones of layers 27 rather than fetch a single individual, distinct layer or a layer having two or more dependent sub-layers (as in the case of video data in which a luminance sub-layer and a chrominance sub-layer are dependent in that any operation performed with respect to one of the sub-layers is also performed with respect to the other sub-layer). Image fetchers 24 may each be configured to perform a different operation with respect to each of the two or more fetched ones of layers 27. The various operations are described in more detail with respect to
[0040] In this sense, each of image fetchers 24 may support multi-layer (or, for rectangular images, multi-rectangle) fetching when configured in DMA mode. Each of the fetched layers 27 may have a different color or tile format (given that each layer is independent and not dependent on the others), and a different horizontal/vertical flip setting (again, because each of the two or more fetched ones of layers 27 is independent from one another). Each of image fetchers 24 may also support, as described in more detail below, overlapping of the two or more fetched ones of layers 27, as well as source splitting.
[0041] Crossbar 28 may represent a hardware unit configured to route or otherwise switch any one of processed layers 29 to any one of mixers 30. Crossbar 28 may include a number of stages, each stage having nodes equal to half of a number of inputs to crossbar 28. For example, assuming crossbar 28 includes 16 inputs, each stage of crossbar 28 may include eight nodes. The eight nodes of each stage may be interconnected to eight nodes of a successive stage in various combinations. One example combination may resemble what is referred to as a non-blocking switch network or non-blocking network switch. Crossbar 28 may operate with respect to the clock cycle, transitioning processed layers from each stage to each successive stage per clock cycle, outputting processed layers 29 to one of mixers 30. Crossbar 28 is described in more detail below with respect to the example of
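The stage and node counts described above are consistent with a rearrangeably non-blocking (Benes-style) network built from 2x2 switching elements. A small sketch (the `crossbar_geometry` helper is an illustrative assumption, not part of the disclosure) computes them:

```python
import math

def crossbar_geometry(num_inputs: int):
    """Nodes per stage and stage count for a rearrangeably non-blocking
    (Benes-style) network built from 2x2 switching elements."""
    assert num_inputs > 0 and num_inputs & (num_inputs - 1) == 0, "power of two"
    nodes_per_stage = num_inputs // 2                 # each 2x2 node serves 2 inputs
    num_stages = 2 * int(math.log2(num_inputs)) - 1   # e.g. 7 stages for 16 inputs
    return nodes_per_stage, num_stages
```

For 16 inputs this yields 8 nodes per stage and 7 stages, matching the 7 levels of 8 mini-crossbars in the configuration pseudocode later in this description.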
[0042] Mixers 30 each represent a hardware unit configured to perform layer mixing to obtain composite layers 31A-31N (composite layers 31). Composite layers 31 may each include the two or more independent processed layers 29 combined in various ways as described in more detail below with respect to the examples of
[0043] DSPs 32 may represent a hardware unit configured to perform various digital signal processing operations. In some examples, DSPs 32 may represent a dedicated hardware unit that perform the various operations. In these and other examples, DSPs 32 may be configured to execute microcode or instructions that configure DSPs 32 to perform the operations. Example operations for which DSPs 32 may be configured to perform include picture adjustment, inverse gamma correction (IGC) using a lookup table (LUT), gamut mapping, polynomial color correction, panel correction using a LUT, and dithering. DSPs 32 may be configured to perform the operations to generate processed composite layers 33, outputting processed composite layers 33 to DSC 34.
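As one example of the listed operations, inverse gamma correction via a lookup table can be sketched as follows. The gamma value, table size, and function names are assumptions for illustration, not taken from the disclosure.

```python
# Sketch (assumed parameters): inverse gamma correction (IGC) applied
# through a precomputed 256-entry lookup table, one of the per-pixel
# operations DSPs 32 may perform.
GAMMA = 2.2  # assumed display gamma

# Map gamma-encoded 8-bit channel values to linear 8-bit values.
IGC_LUT = [round(((v / 255.0) ** GAMMA) * 255.0) for v in range(256)]

def inverse_gamma(pixels):
    """Apply the LUT to a flat list of 8-bit channel values."""
    return [IGC_LUT[p] for p in pixels]
```

A hardware implementation would typically store such a LUT in a small on-chip memory so the correction costs one table read per channel.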
[0044] DSC 34 may represent a unit configured to perform display stream compression. Display stream compression may refer to a process whereby processed composite layers 33 and composite layers 31 are losslessly or lossily compressed through application of predictive differential pulse-code modulation (DPCM) and/or color space conversion to the luminance (Y), chrominance green (Cg), and chrominance orange (Co) color space (which may also be referred to as the YCgCo color model). DSC 34 may output compressed layers 35A-35N (compressed layers 35, which may refer to compressed versions of both processed composite layers 33 and non-processed composite layers 31) to crossbar 38.
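The YCgCo color space conversion mentioned above has an exactly reversible integer variant (often called YCgCo-R) that suits lossless compression. The sketch below is illustrative only and is not claimed to match the exact transform used by DSC 34.

```python
# Sketch: reversible RGB <-> YCgCo-R conversion using integer lifting steps.
# Because every step is exactly invertible on integers, no information is
# lost, which is what a lossless color-space stage requires.

def rgb_to_ycgco_r(r, g, b):
    """Forward lifting transform."""
    co = r - b
    t = b + (co >> 1)
    cg = g - t
    y = t + (cg >> 1)
    return y, cg, co

def ycgco_r_to_rgb(y, cg, co):
    """Inverse lifting transform: undoes each step in reverse order."""
    t = y - (cg >> 1)
    g = cg + t
    b = t - (co >> 1)
    r = b + co
    return r, g, b
```

The round trip reproduces every input exactly, including inputs that produce negative chroma values.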
[0045] Crossbar 38 may be substantially similar to crossbar 28, routing or otherwise switching compressed layers 35 to various different display interfaces 40. Display interfaces 40 may represent one or more different interfaces by which to display compressed layers 35. DSC 34 may compress each of compressed layers 35 in different ways based on the type of display interface 40 to which each of compressed layers 35 is destined. Examples of different types of display interfaces 40 may include DisplayPort, video graphics array (VGA), digital visual interface (DVI), high-definition multimedia interface (HDMI), and the like. Display interfaces 40 may be configured to output each of the compressed layers 35 to one or more displays, such as display 19, by writing the compressed layers 35 to a frame buffer or other memory structure, neither of which is shown for ease of illustration purposes.
[0047] In the example of
[0049] In the example of
[0050] In the example of
[0051] In the example of
[0053] Referring first to the example of
[0054] The example shown in
[0055] In the examples of
[0056] Display processor 18 may, in the example of
[0060] Burst buffer 72 of address generator 70 may support horizontal flip burst alignment on both the P0 and P1 planes (which refer to the streams, or planes, of pixels from each of the two different ones of independent layers 27). Formatter 74 may include separate P0 and P1 interfaces to the de-tile buffer. De-tile buffer 76 may support burst-level horizontal flip operations, while unpacker 76 may handle horizontal flip operations within each access unit (which may refer to 16 bytes of pixel data). The video pipeline for image fetchers 24, while not explicitly shown in
[0062] The internal architecture of crossbar 28 shown in the example of
[0063] Crossbar 28, as shown in
TABLE-US-00001
// Pseudo code for crossbar configuration
//
// Create the fixed network: 7 levels (y direction), each level having 8
// (x direction) 2x2 mini-crossbars. Each bar has two connections to the
// level up and two connections to the level down, for a total of 16
// connections up and 16 connections down. The fixed network has a
// double-link data structure.
LV[y][x].dn[3:0];     // down connection for current level, y = 0 to 6, x = 0 to 15
LV[y][x].up[3:0];     // up connection for current level
LV[y][x].ilayer[3:0]; // layer mixer layer number; 16 unique layers (8 layers x 2
                      // sublayers); need flops for these signals (7*16*4 = 448 total)
LV[y][x].iactive;     // current layer is used in the current frame; an unused layer
                      // has this bit set to 0; need flops (7*16 = 112 total)
LV[y][x].olayer[3:0]; // layer mixer layer number at each level output
LV[y][x].oactive;     // current layer active bit at the output of each level

// Fixed connections between levels 0-1 and levels 6-5 (same pattern to the next level):
For (k = 0; k < 8; k++) {
  LV[0][2*k].dn = k;  LV[0][2*k+1].dn = 8+k;
  LV[6][2*k].up = k;  LV[6][2*k+1].up = 8+k;
}
// Fixed connections between levels 1-2 and levels 5-4 (same pattern to the next level):
For (m = 0; m < 2; m++) { For (k = 0; k < 4; k++) {
  LV[1][m*8+2*k].dn = m*8+k;  LV[1][m*8+2*k+1].dn = m*8+4+k;
  LV[5][m*8+2*k].up = m*8+k;  LV[5][m*8+2*k+1].up = m*8+4+k;
}}
// Fixed connections between levels 2-3 and levels 4-3 (same pattern to the next level):
For (n = 0; n < 2; n++) { For (m = 0; m < 2; m++) { For (k = 0; k < 2; k++) {
  LV[2][n*8+m*4+2*k].dn = n*8+m*4+k;  LV[2][n*8+m*4+2*k+1].dn = n*8+m*4+2+k;
  LV[4][n*8+m*4+2*k].up = n*8+m*4+k;  LV[4][n*8+m*4+2*k+1].up = n*8+m*4+2+k;
}}}
// Close the double link
For (y = 1; y < 4; y++) { For (x = 0; x < 16; x++) {
  LV[y][LV[y-1][x].dn].up = LV[y-1][x].dn;
  LV[6-y][LV[7-y][x].up].dn = LV[7-y][x].up;
}}

// Configure the network at the start of the frame to form a 16x16 crossbar.
// LV_CFG[y][x].cross[0] is the 2x2 mini-bar crossover select signal: 7 levels
// (y direction), each level with 8 (x direction) mini 2x2 bars. Each mini bar
// needs one bit: 0 = no crossover, 1 = crossover. The 7-level x 8-bit
// configuration is set up during frame start-up, one level at a time from
// both the top and bottom levels, so the total cycle count is 4 (meeting in
// the middle) to completely set up the crossbar network.
LV_CFG[y][x].cross[0] = 0;  // y = 0 to 6, x = 0 to 7; default to 0 (no cross)

// ----- Level 0 cross config in clock 0 -----
N = 0
For (j = 0; j < 7; j++) { For (k = j; k < 7; k++) {
  // Find the conflict; left half and right half check independently.
  CMP[N].L_l = LV[0][2*j].ilayer[3:1];       CMP[N].L_l_a = LV[0][2*j].iactive;
  CMP[N].L_r = LV[0][2*(k+1)].ilayer[3:1];   CMP[N].L_r_a = LV[0][2*(k+1)].iactive;
  CMP[N].R_l = LV[0][2*j+1].ilayer[3:1];     CMP[N].R_l_a = LV[0][2*j+1].iactive;
  CMP[N].R_r = LV[0][2*(k+1)+1].ilayer[3:1]; CMP[N].R_r_a = LV[0][2*(k+1)+1].iactive;
  // Cross over when adjacent active layer numbers are on the same left or right half.
  If (((CMP[N].L_l == CMP[N].L_r) && CMP[N].L_l_a && CMP[N].L_r_a) ||
      ((CMP[N].R_l == CMP[N].R_r) && CMP[N].R_l_a && CMP[N].R_r_a))
    { LV_CFG[0][j].cross = 1 }
  N = N + 1
}}
For (i = 0; i < 16; i++) {
  // Transfer the layer number to the next level after the level 0 crosses are set.
  LV[1][i].ilayer  = LV[0][(LV[1][i].up[3:1] << 1) + (LV_CFG[0][LV[1][i].up >> 1].cross ^ LV[1][i].up[0])].ilayer
  LV[1][i].iactive = LV[0][(LV[1][i].up[3:1] << 1) + (LV_CFG[0][LV[1][i].up >> 1].cross ^ LV[1][i].up[0])].iactive
}
// Level 6 cross config is a slave of the level 0 config: if an odd layer ends
// in the left half of a bar in level 1, it needs a cross at level 6; likewise
// if an even layer ends in the right half.
For (s = 0; s < 2; s++) { For (i = 0; i < 8; i++) {
  If ((LV[1][8*s+i].ilayer[0] != s) && (LV[1][8*s+i].iactive == 1))
    { LV_CFG[6][LV[1][8*s+i].ilayer[3:1]].cross = 1 }
}}
// Transfer layer numbers to level 5 after the level 6 crosses are set.
For (i = 0; i < 16; i++) {
  LV[5][i].olayer  = LV[6][(LV[5][i].dn[3:1] << 1) + (LV_CFG[6][LV[5][i].dn[3:1]].cross ^ LV[5][i].dn[0])].olayer
  LV[5][i].oactive = LV[6][(LV[5][i].dn[3:1] << 1) + (LV_CFG[6][LV[5][i].dn[3:1]].cross ^ LV[5][i].dn[0])].oactive
}

// ----- Level 1 cross config in clock 1 (reuse the comparators used in the L0 config) -----
N = 0
For (j = 0; j < 4; j++) { For (k = j; k < 4; k++) { For (s = 0; s < 2; s++) {
  // s = 0: left 8x8 bar; s = 1: right 8x8 bar
  CMP[N+s].L_l = LV[1][8*s+2*j].ilayer[3:1];       CMP[N+s].L_l_a = LV[1][8*s+2*j].iactive;
  CMP[N+s].L_r = LV[1][8*s+2*(k+1)].ilayer[3:1];   CMP[N+s].L_r_a = LV[1][8*s+2*(k+1)].iactive;
  CMP[N+s].R_l = LV[1][8*s+2*j+1].ilayer[3:1];     CMP[N+s].R_l_a = LV[1][8*s+2*j+1].iactive;
  CMP[N+s].R_r = LV[1][8*s+2*(k+1)+1].ilayer[3:1]; CMP[N+s].R_r_a = LV[1][8*s+2*(k+1)+1].iactive;
  // Cross over when adjacent layer numbers are on the same left or right half
  // of the 8x8 bar (equivalent to the 8x8 crossbar level 0 cross logic).
  If (((CMP[N+s].L_l == CMP[N+s].L_r) && CMP[N+s].L_l_a && CMP[N+s].L_r_a) ||
      ((CMP[N+s].R_l == CMP[N+s].R_r) && CMP[N+s].R_l_a && CMP[N+s].R_r_a))
    { LV_CFG[1][4*s+j].cross = 1 }
  N = N + 2
}}}
For (i = 0; i < 16; i++) {
  // Transfer the layer number to the next level after the level 1 crosses are set.
  LV[2][i].ilayer = LV[1][(LV[2][i].up[3:1] << 1) + (LV_CFG[1][LV[2][i].up >> 1].cross ^ LV[2][i].up[0])].ilayer
}
// Level 5 cross config is a slave of the level 1 config.
For (s = 0; s < 2; s++) { For (i = 0; i < 4; i++) { For (j = i; j < 4; j++) {
  If (((LV[5][8*s+2*i].olayer == LV[2][8*s+4+j].ilayer) && LV[2][8*s+4+j].iactive && LV[5][8*s+2*i].oactive) ||
      ((LV[5][8*s+2*i+1].olayer == LV[2][8*s+j].ilayer) && LV[2][8*s+4+j].iactive && LV[5][8*s+2*i+1].oactive))
    { LV_CFG[5][4*s+i].cross = 1 }
}}}
// Transfer level 5 layer numbers to level 4.
For (i = 0; i < 16; i++) {
  LV[4][i].olayer  = LV[5][(LV[4][i].dn[3:1] << 1) + (LV[4][i].dn[0] ^ LV_CFG[5][LV[4][i].dn[3:1]].cross)].olayer
  LV[4][i].oactive = LV[5][(LV[4][i].dn[3:1] << 1) + (LV[4][i].dn[0] ^ LV_CFG[5][LV[4][i].dn[3:1]].cross)].oactive
}

// ----- Level 2 cross config in clock 2 (reuse the comparators used in the L0 config) -----
N = 0
For (j = 0; j < 2; j++) { For (k = j; k < 2; k++) { For (s = 0; s < 4; s++) {
  // s = 0: leftmost 4x4 bar; s = 3: rightmost 4x4 bar
  CMP[N+s].L_l = LV[2][4*s+2*j].ilayer[3:1];       CMP[N+s].L_l_a = LV[2][4*s+2*j].iactive;
  CMP[N+s].L_r = LV[2][4*s+2*(k+1)].ilayer[3:1];   CMP[N+s].L_r_a = LV[2][4*s+2*(k+1)].iactive;
  CMP[N+s].R_l = LV[2][4*s+2*j+1].ilayer[3:1];     CMP[N+s].R_l_a = LV[2][4*s+2*j+1].iactive;
  CMP[N+s].R_r = LV[2][4*s+2*(k+1)+1].ilayer[3:1]; CMP[N+s].R_r_a = LV[2][4*s+2*(k+1)+1].iactive;
  // Cross over when adjacent layer numbers are on the same left or right half
  // of the 4x4 bar (equivalent to the 4x4 crossbar level 0 cross logic).
  If (((CMP[N+s].L_l == CMP[N+s].L_r) && CMP[N+s].L_l_a && CMP[N+s].L_r_a) ||
      ((CMP[N+s].R_l == CMP[N+s].R_r) && CMP[N+s].R_l_a && CMP[N+s].R_r_a))
    { LV_CFG[2][2*s+j].cross = 1 }
  N = N + 8
}}}
For (i = 0; i < 16; i++) {
  // Transfer the layer number to the next level (3) after the level 2 crosses are set.
  LV[3][i].ilayer  = LV[2][(LV[3][i].up[3:1] << 1) + (LV_CFG[2][LV[3][i].up >> 1].cross ^ LV[3][i].up[0])].ilayer
  LV[3][i].iactive = LV[2][(LV[3][i].up[3:1] << 1) + (LV_CFG[2][LV[3][i].up >> 1].cross ^ LV[3][i].up[0])].iactive
}
// Level 4 config is a slave of the level 2 config.
For (s = 0; s < 2; s++) { For (ss = 0; ss < 2; ss++) { For (i = 0; i < 2; i++) {
  If (((LV[4][8*s+4*ss+2*i].olayer == LV[3][8*s+4*ss+2+i].ilayer) && LV[3][8*s+4*ss+2+i].iactive && LV[4][8*s+4*ss+2*i].oactive) ||
      ((LV[4][8*s+4*ss+2*i+1].olayer == LV[3][8*s+4*ss+i].ilayer) && LV[3][8*s+4*ss+i].iactive && LV[4][8*s+4*ss+2*i+1].oactive))
    { LV_CFG[4][4*s+2*ss+i].cross = 1 }
}}}
// Transfer layer numbers from level 4 to level 3.
For (i = 0; i < 16; i++) {
  LV[3][i].olayer  = LV[4][(LV[3][i].dn[3:1] << 1) + (LV[3][i].dn[0] ^ LV_CFG[4][LV[3][i].dn[3:1]].cross)].olayer
  LV[3][i].oactive = LV[4][(LV[3][i].dn[3:1] << 1) + (LV[3][i].dn[0] ^ LV_CFG[4][LV[3][i].dn[3:1]].cross)].oactive
}

// ----- Level 3 cross config in clock cycle 3 -----
For (i = 0; i < 8; i++) {
  If ((LV[3][2*i].ilayer != LV[3][2*i].olayer) || (LV[3][2*i+1].ilayer != LV[3][2*i+1].olayer))
    { LV_CFG[3][i].cross = 1 }
}
[0064]
[0065] Alternative user inputs may also determine the scroll speed and direction. For example, a user voice command, a user eye gaze command, or a user-initiated autoscroll command may be used to determine the scroll speed.
[0066]
[0067] In this approach, the layer 902 is retrieved from memory 900 with lossless fetch compression in 904. Lossless compression ensures that no information from the layer is lost during retrieval, but incurs the memory bandwidth and power consumption costs of retrieving a large amount of information from the memory 900. The rendering and displaying process to support scrolling content, as discussed herein, may require multiple accesses to the memory 900. For example, high-frame-rate rendering and display to support high-speed scrolling may require many layer retrievals from memory. Each retrieval adds latency or delay, requires memory bandwidth, and consumes power.
[0068] Once the layer 902 is retrieved, it may be additionally processed in 906. 906 may execute source processing of an image layer. Example processes include color space conversion (for example, YUV to RGB), scaling (for example, changing the resolution of the image by upscaling or downscaling), image cropping (for example, configuring a region of interest (ROI) and stride), decimation, and de-interleaving of the image.
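As an illustration of one such source-processing step, the following sketch converts a single pixel from YUV to RGB. The BT.601 full-range coefficients and the function name are illustrative assumptions, not values from the disclosure; a real display pipeline would perform this per-plane in fixed-point hardware rather than per-pixel in software.

```python
def yuv_to_rgb(y, u, v):
    """Convert one full-range BT.601 YUV pixel to RGB.

    Illustrative coefficients; chroma channels (u, v) are centered at 128.
    """
    r = y + 1.402 * (v - 128)
    g = y - 0.344136 * (u - 128) - 0.714136 * (v - 128)
    b = y + 1.772 * (u - 128)
    # Clamp each channel to the displayable 0-255 range.
    clamp = lambda x: max(0, min(255, int(round(x))))
    return clamp(r), clamp(g), clamp(b)

# A neutral gray pixel (U = V = 128) maps straight through on all channels.
print(yuv_to_rgb(128, 128, 128))  # -> (128, 128, 128)
```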
[0069] A hardware mixer 908 may receive the output from process 906 and mix in any other necessary signals or components. A resulting output is sent to the display compositor pipeline 910, which communicates with a DSI interface 912. The DSI interface provides a standardized interface to display panels.
[0070] For example, the DSI interface may conform to the Display Serial Interface (DSI) specification defined by the Mobile Industry Processor Interface (MIPI) Alliance for display controllers in mobile devices. It may support LCD and similar display technologies. It defines a serial bus and a communication protocol between a host (the source of the image data) and a display device, such as a display panel. For example, display panels may operate in either command mode or video mode. The desired image content is then displayed on the display.
[0071]
[0072] During low speed scrolling, the layer 902 may be retrieved from memory 900 with lossless fetch compression 904 for communication over the DSI interface 912 as discussed above.
[0073] During high speed scrolling, content is rendered and displayed at a fast rate (a high number of frames per second), so individual frames of the content are less visually perceptible to the user. GPU software 920 may determine that such conditions are suitable for utilizing lossy fetch compression 924 instead of lossless fetch compression 904. For example, a first threshold scroll speed may be set by software, user preferences, or system default, at or below which lossless fetch compression will be used. In this example, lossy fetch compression will be used if the requested scroll speed is above the first threshold scroll speed.
[0074] A specific compression scheme may be selected in the display-graphics pipeline 922. As discussed herein, a compression factor may be varied depending on a desired trade-off between quality and compression ratio: a higher compression factor may result in a lower quality output of the layer, while a lower compression factor may result in a higher quality output of the layer. For example, a second threshold scroll speed may be set by software, user preferences, or system default. If the requested scroll speed is above the second threshold scroll speed, a higher compression factor lossy fetch compression may be used.
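The two-threshold selection described in the preceding paragraphs can be sketched as follows. The threshold values, the compression factors, and the function name are hypothetical choices for illustration; only the comparison structure follows the disclosure (lossless at or below a first threshold, lossy above it, and a higher lossy compression factor at or above a second, higher threshold).

```python
def select_fetch_compression(scroll_speed_px_per_s,
                             first_threshold=2000,
                             second_threshold=6000):
    """Pick a fetch-compression mode for a layer retrieval.

    Hypothetical thresholds in pixels/second. Returns a (mode, factor)
    pair: lossless at or below the first threshold, lossy above it,
    with the compression factor increased at or above the second.
    """
    if scroll_speed_px_per_s <= first_threshold:
        return ("lossless", 1)  # no quality loss, full bandwidth cost
    if scroll_speed_px_per_s < second_threshold:
        return ("lossy", 2)     # moderate compression factor
    return ("lossy", 4)         # aggressive compression at high speed

print(select_fetch_compression(500))    # -> ('lossless', 1)
print(select_fetch_compression(3000))   # -> ('lossy', 2)
print(select_fetch_compression(8000))   # -> ('lossy', 4)
```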
[0075] Utilizing lossy fetch compression 924 when retrieving layer 902 from memory 900 can reduce the required memory bandwidth and power consumption during scrolling use cases. While the quality of individual frames, and of the layer overall, may suffer, such decreased quality is unlikely to impact the user experience because the frames are changing very quickly. Furthermore, the compression factor or compression ratio can be increased further as the scroll speed and rendering frame rate increase, yielding additional savings with minimal impact on user experience. In other embodiments, a fixed amount of memory bandwidth may be allocated, and the compression factor is selected to ensure the allocated memory bandwidth is not exceeded. In still other embodiments, a fixed amount of power may be allocated for the rendering process, and the compression factor is selected to ensure the allocated power is not exceeded.
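The fixed-bandwidth embodiment mentioned above can be sketched as choosing the smallest compression factor that keeps the layer fetch within the allocated budget. The function name, the set of available factors, and the budget value are illustrative assumptions, not values from the disclosure.

```python
def compression_factor_for_budget(width, height, bytes_per_pixel, fps,
                                  budget_bytes_per_s,
                                  factors=(1, 2, 4, 8)):
    """Return the smallest available compression factor whose resulting
    fetch bandwidth stays within the allocated budget.

    Uncompressed bandwidth is width * height * bytes_per_pixel * fps;
    fetch compression divides it by the factor.
    """
    uncompressed = width * height * bytes_per_pixel * fps
    for f in sorted(factors):
        if uncompressed / f <= budget_bytes_per_s:
            return f
    return max(factors)  # budget unreachable; use the strongest compression

# WQHD RGBA layer at 120 fps against a hypothetical 500 MB/s budget:
# 2560 * 1440 * 4 * 120 ~= 1.77 GB/s uncompressed, so a factor of 4 is needed.
print(compression_factor_for_budget(2560, 1440, 4, 120, 500e6))  # -> 4
```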
[0076] The power consumption of the display processor during the layer fetch from DDR memory is proportional to: (1) the width and height of the layer, (2) the bits per pixel of the layer, and (3) the frame rate (often measured in frames per second, or FPS). The power consumption is inversely proportional to any applied fetch compression factor, which reduces the size of the layer actually retrieved from memory. Bandwidth consumption, and thus power consumption, can be reduced by increasing fetch compression.
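The proportionality stated above can be expressed as a simple relative model: fetch power scales with width, height, bits per pixel, and frame rate, and is divided by the applied fetch compression factor. The sketch below compares only relative values, since the absolute constant of proportionality depends on the hardware.

```python
def relative_fetch_power(width, height, bits_per_pixel, fps,
                         compression_factor=1):
    """Relative (unitless) display-processor fetch power: proportional to
    layer dimensions, bit depth, and frame rate; inversely proportional
    to the applied fetch compression factor."""
    return width * height * bits_per_pixel * fps / compression_factor

base = relative_fetch_power(2560, 1440, 32, 60)
# Doubling the frame rate doubles the fetch power...
assert relative_fetch_power(2560, 1440, 32, 120) == 2 * base
# ...while a fetch compression factor of 4 cuts that back to half of base.
assert relative_fetch_power(2560, 1440, 32, 120, compression_factor=4) == base / 2
```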
[0077] As an example, if a compression factor of 4 is applied, an estimated 15% power savings can be achieved.
TABLE 1
Estimated power savings

Video playback    Existing (mA)    With proposed changes (mA)    Savings (mA)    Percentage savings
WQHD RGB layer    325.52 mA        ~274.07 mA                    ~50             15%
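As a quick arithmetic check, the absolute and percentage savings follow directly from the two measured currents in Table 1:

```python
# Values from Table 1 (WQHD RGB layer, video playback).
existing_mA = 325.52   # without the proposed changes
proposed_mA = 274.07   # with lossy fetch compression applied
savings_mA = existing_mA - proposed_mA
savings_pct = 100 * savings_mA / existing_mA
print(f"{savings_mA:.2f} mA ({savings_pct:.1f}%)")  # -> 51.45 mA (15.8%)
```

This is consistent with the approximate "~50 mA" and "15%" entries in Table 1.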
[0078] In one or more examples, the functions described may be implemented in hardware, software, firmware, or any combination thereof. If implemented in software, the functions may be stored on or transmitted over, as one or more instructions or code, a computer-readable medium and executed by a hardware-based processing unit. Computer-readable media may include computer-readable storage media, which corresponds to a tangible medium such as data storage media. In this manner, computer-readable media generally may correspond to tangible computer-readable storage media which is non-transitory. Data storage media may be any available media that can be accessed by one or more computers or one or more processors to retrieve instructions, code and/or data structures for implementation of the techniques described in this disclosure. A computer program product may include a computer-readable medium.
[0079] By way of example, and not limitation, such computer-readable storage media can comprise RAM, ROM, EEPROM, CD-ROM or other optical disk storage, magnetic disk storage, or other magnetic storage devices, flash memory, or any other medium that can be used to store desired program code in the form of instructions or data structures and that can be accessed by a computer. It should be understood that computer-readable storage media and data storage media do not include carrier waves, signals, or other transient media, but are instead directed to non-transient, tangible storage media. Disk and disc, as used herein, includes compact disc (CD), laser disc, optical disc, digital versatile disc (DVD), floppy disk and Blu-ray disc, where disks usually reproduce data magnetically, while discs reproduce data optically with lasers. Combinations of the above should also be included within the scope of computer-readable media.
[0080] Instructions may be executed by one or more processors, such as one or more digital signal processors (DSPs), general purpose microprocessors, application specific integrated circuits (ASICs), field programmable gate arrays (FPGAs), or other equivalent integrated or discrete logic circuitry. Accordingly, the term processor, as used herein, may refer to any of the foregoing structure or any other structure suitable for implementation of the techniques described herein. In addition, in some aspects, the functionality described herein may be provided within dedicated hardware and/or software modules configured for encoding and decoding, or incorporated in a combined codec. Also, the techniques could be fully implemented in one or more circuits or logic elements.
[0081] The techniques of this disclosure may be implemented in a wide variety of devices or apparatuses, including a wireless handset, an integrated circuit (IC) or a set of ICs (e.g., a chip set). Various components, modules, or units are described in this disclosure to emphasize functional aspects of devices configured to perform the disclosed techniques, but do not necessarily require realization by different hardware units. Rather, as described above, various units may be combined in a codec hardware unit or provided by a collection of interoperative hardware units, including one or more processors as described above, in conjunction with suitable software and/or firmware.
[0082] Various examples have been described. These and other examples are within the scope of the following claims.