Data processing apparatus for pipeline execution acceleration and method thereof

Abstract

Disclosed herein are a data processing apparatus for pipeline execution acceleration and a method thereof. According to an exemplary embodiment of the present invention, the data processing apparatus for pipeline execution acceleration includes: a processor configured to sequentially execute a first application program and a second application program reading or writing a specific file; and a file system configured to complete a write for a file data for the specific file to a data block previously allocated from the first application program and provide the file data for the specific file to the second application program prior to executing a close call for the specific file from the first application program, when executing a read call for the specific file from the second application program.

Claims

1. A data processing apparatus for pipeline execution acceleration, comprising: a processor configured to: execute a first application program writing file data to a specific file, and execute a second application program reading the file data from the specific file; and a file system communicationally coupled to the processor and configured as a virtual ring type storage space in the form of a circular queue of data blocks to store at least some of the file data for the specific file to cause the at least some of the file data to be readable from a data block among the data blocks by the second application program prior to executing a close call for the writing of the file data to the specific file from the first application program, wherein the file system is further configured to, set for the circular queue a queue_tail that represents a position for reading the file data and set a queue_head that represents a position for writing the file data, perform a setting of a queue_index for a read that represents a start position of the circular queue at a time of executing the read from the specific file by the second application program to read file data having a size from the set queue_index and in response to the read, change the queue_tail of the circular queue, the setting of the queue_index for the read includes to, check, in a read request message, for an offset, which represents a value up to a position spaced apart from a previous start position of the specific file previously stored for the circular queue, and for a size, which represents a size of the file data to be read, and set the queue_index using the offset when a size of the circular queue is larger than the size included in the read request message and the specific file is closed.

2. The data processing apparatus of claim 1, wherein the queue_tail is obtained by the following equation (queue_tail=(queue_tail+size) modulo queue_size), in which the queue_tail represents the queue_tail within the circular queue, the size represents the size of the file data to be read, and the queue_size represents a size of the circular queue.

3. The data processing apparatus of claim 1, wherein the file system is further configured to, set a queue_index for the writing representing the start position of the circular queue for the writing at a time of executing the writing from the second application program, and write the file data having a size from the set queue_index and in response to the write, change the queue_head of the circular queue.

4. The data processing apparatus of claim 3, wherein to set the queue_index for the writing includes, checking, in a write request message, for an offset, which represents a value up to a position spaced apart from a previous start position of the specific file previously stored within the circular queue and for a size, which represents a size of the file data to be written, and setting the queue_index using the offset when the offset is included in an empty section of the circular queue and an available space of the circular queue is larger than the size.

5. The data processing apparatus of claim 3, wherein the queue_head is obtained by the following equation (queue_head=(queue_head+size) modulo queue_size), wherein the queue_head represents the queue_head within the circular queue, the size represents the size of the file data to be written, and the queue_size represents a size of the circular queue.

6. The data processing apparatus of claim 1, wherein the file system is further configured to: present an application programming interface (API) configured to perform write and read operations of the specific file according to the circular queue at the time of executing the read and the write for the specific file; update previously generated call information for the read and the write according to the circular queue at the time of executing the read and the write for the specific file and provide the call information to allow the API to perform the read and write operations for the specific file; and manage the file data for the specific file and read or write the file data according to the circular queue at the time of executing the read and the write from the first and second application programs.

7. A data processing method for pipeline execution acceleration, comprising: executing, by a processor, a first application program writing file data to a specific file; executing, by the processor, a second application program reading the file data from the specific file; and providing, by a file system, a virtual ring type storage space configured in the form of a circular queue of data blocks that stores at least some of the file data for the specific file to cause the at least some of the file data for the specific file to be readable from a data block among the data blocks by the second application program prior to executing a close call for the writing of the file data to the specific file from the first application program, wherein by the file system, set for the circular queue a queue_tail that represents a position for reading the file data and set a queue_head that represents a position for writing the file data, perform a setting of a queue_index for a read that represents a start position of the circular queue at a time of executing the read from the specific file by the second application program to read file data having a size from the set queue_index and in response to the read, change the queue_tail of the circular queue, the setting of the queue_index for the read includes to, check, in a read request message, for an offset, which represents a value up to a position spaced apart from a previous start position of the specific file previously stored for the circular queue, and for a size, which represents a size of the file data to be read, and set the queue_index using the offset when a size of the circular queue is larger than the size included in the read request message and the specific file is closed.

8. The data processing method of claim 7, wherein the queue_tail is obtained by the following equation (queue_tail=(queue_tail+size) modulo queue_size), in which the queue_tail represents the queue_tail within the circular queue, the size represents the size of the file data to be read, and the queue_size represents a size of the circular queue.

9. The data processing method of claim 7, wherein by the file system, setting a queue_index for the writing representing the start position of the circular queue for the writing at a time of executing the writing from the second application program, and writing the file data having a size from the set queue_index and in response to the writing, changing the queue_head of the circular queue.

10. The data processing method of claim 9, wherein setting the queue_index for the writing includes checking, in a write request, for an offset, which represents a value up to a position spaced apart from a previous start position of the specific file previously stored within the circular queue, and for a size, which represents a size of the file data to be written, and setting the queue_index using the offset when the offset is included in an empty section of the circular queue and an available space of the circular queue is larger than the size.

11. The data processing method of claim 9, wherein the queue_head is obtained by the following equation (queue_head=(queue_head+size) modulo queue_size), in which the queue_head represents the queue_head within the circular queue, the size represents the size of the file data to be written, and the queue_size represents a size of the circular queue.

Description

BRIEF DESCRIPTION OF THE DRAWINGS

(1) FIG. 1 is a diagram for describing an execution overlapping principle between pipeline stages according to an exemplary embodiment of the present invention.

(2) FIG. 2 is a first diagram for describing an execution principle using a remote pipe according to an exemplary embodiment of the present invention.

(3) FIG. 3 is a control block diagram illustrating a control configuration of a data processing apparatus according to an exemplary embodiment of the present invention.

(4) FIG. 4 is a diagram illustrating data management for a specific file in a file storage management module illustrated in FIG. 3.

(5) FIG. 5 is a flow chart illustrating a file read process according to the exemplary embodiment of the present invention.

(6) FIG. 6 is a flow chart illustrating a file write process according to the exemplary embodiment of the present invention.

(7) FIG. 7 is a flow chart illustrating an execution process of opening a file according to an exemplary embodiment of the present invention.

(8) FIG. 8 is a flow chart illustrating an execution process at the time of recording a file block according to an exemplary embodiment of the present invention.

(9) FIG. 9 is a flow chart illustrating an internal execution process of reading a file block according to an exemplary embodiment of the present invention.

(10) FIG. 10 is a flow chart illustrating an execution process of closing a file according to an exemplary embodiment of the present invention.

(11) FIG. 11 is a second diagram for describing an execution principle using a remote pipe according to an exemplary embodiment of the present invention.

DESCRIPTION OF EXEMPLARY EMBODIMENTS

(12) Only the principle of the present invention is described below. Although an apparatus may not be explicitly described or shown in the specification, those skilled in the art can implement the principle of the present invention and devise various apparatuses that fall within the concept and scope of the present invention. In addition, the conditional terms and exemplary embodiments described in the specification are, in principle, intended only to aid understanding of the concept of the present invention.

(13) The foregoing objects, features and advantages will become more apparent from the following description of preferred exemplary embodiments of the present invention with reference to the accompanying drawings. Accordingly, those having ordinary knowledge in the related art to which the present invention pertains will be able to readily embody the technical ideas or spirit of the present invention. Further, when a detailed description of technical configurations known in the related art is considered to obscure the gist of the present invention, that description will be omitted.

(14) In particular, an exemplary embodiment of the present invention proposes a new method for shortening the execution time of a pipeline by supporting execution overlapping between stages while keeping the file-based input and output connection as it is. The method relies on two characteristics of a pipeline: 1) the application program of each stage uses the output of the previous stage as its input, and 2) each application program can execute and directly generate output from the moment its input is provided. Execution overlapping between pipeline stages is achieved using these characteristics.

(15) FIG. 1 is a diagram for describing an execution overlapping principle between pipeline stages according to an exemplary embodiment of the present invention.

(16) FIG. 1 illustrates the principle whereby a second application receives output data of a first application as its input and executes while the first application is still being executed.

(17) That is, the second application directly processes data generated by the first application without waiting until the first application ends. By doing so, most of the execution time of the second application may overlap that of the first application, so the overall execution time of the two-stage pipeline may be greatly reduced compared with executing the stages sequentially.
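The overlapping principle of FIG. 1 can be sketched as follows. This is only an illustration under stated assumptions: Python generators stand in for two concurrently running application programs, and the names `first_stage`, `second_stage`, and `log` are hypothetical and do not appear in the specification.

```python
from typing import Iterator, List


def first_stage(n_blocks: int, log: List[str]) -> Iterator[int]:
    """Stage 1: hand over blocks one at a time instead of all at the end."""
    for i in range(n_blocks):
        log.append(f"stage1 wrote block {i}")
        yield i


def second_stage(blocks: Iterator[int], log: List[str]) -> List[int]:
    """Stage 2: process each block as soon as stage 1 has produced it."""
    results = []
    for block in blocks:
        log.append(f"stage2 read block {block}")
        results.append(block * 2)  # placeholder for the real processing
    return results


log: List[str] = []
results = second_stage(first_stage(3, log), log)
# The log interleaves the two stages: stage 2 handles block 0
# before stage 1 has produced block 2.
```

In a sequential pipeline, every "stage1" entry would precede every "stage2" entry; here the entries alternate, which is the overlap the paragraph above describes.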

(18) FIG. 2 is a first diagram for describing an execution principle using a remote pipe according to an exemplary embodiment of the present invention.

(19) FIG. 2 illustrates a process of executing a pipeline which is configured of several stages, for example, process A, process B, process C, and process D using a remote pipe.

(20) FIG. 3 is a control block diagram illustrating a control configuration of a data processing apparatus according to an exemplary embodiment of the present invention and FIG. 4 is a diagram illustrating data management for a specific file in a file storage management module illustrated in FIG. 3.

(21) Referring to FIGS. 3 and 4, the data processing apparatus may include a processor 110 and a file system 120.

(22) The processor 110 may execute first and second application programs according to predefined processing order.

(23) According to the exemplary embodiment of the present invention, the processor 110 executes the first and second application programs, that is, two application programs, according to the processing order so that an input file or data can be output as one file or data. Although two application programs are described here, the number of application programs executed by the pipeline method is not limited thereto.

(24) Herein, the processor 110 may receive the file data for the specific file from the file system 120 when the first application program executes a read call for the specific file. When the second application program executes the read call for the specific file, the file data is provided from the file system 120 depending on whether the first application program has executed a write call for the file data.

(25) When executing the read call for the specific file from the second application program, the file system 120 completes the write for the file data for the specific file to a data block allocated from the first application program and provides the file data to the second application program prior to executing a close call from the first application program.

(26) That is, the file system 120 includes a file system application programming interface (API) 122 which performs the write and read operations for the specific file at the time of executing the read call and the write call for the specific file, a file state management module 124 which updates previously generated call information at the time of executing the read call and the write call for the specific file and provides the call information so that the file system API 122 can perform the read and write operations for the specific file, and a file storage management module 126 which manages the file data for the specific file and reads or writes the file data at the time of executing the read call and the write call from the first and second application programs.

(27) The file storage management module 126 may store the file data for the specific file when at least one of the first and second application programs executes the write call for the specific file and read the file data for the specific file from the allocated data block at the time of executing the read call for the specific file.

(28) A storage medium (not illustrated) such as a disk or a main memory may be used as the file storage management module 126, but the file storage management module 126 is not limited thereto.

(29) FIG. 4 illustrates that the file storage management module 126 stores and manages the file data in the storage medium using a circular queue.

(30) As illustrated in FIG. 4, a dotted line box represents a file section in which the overall file is stored and a solid line box represents a data block in which the file data is stored in the storage medium and represents the circular queue.

(31) Herein, at least one of the first and second application programs handles the file data as though a virtual file, shown as the dotted line box, were present, while only some of the file data is actually stored in the circular queue.

(32) A reader sequentially reads blocks from the left of the storage section and a writer performs sequential recording from the right. The circular queue is used to manage the section. The reader reads a required block at the queue_tail of the circular queue and the writer adds a block at the queue_head. The size of the data block of the file data which is actually stored in the storage medium may be limited to the size of the circular queue, but is not limited thereto.
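A minimal sketch of such a circular queue is given below, assuming a byte-oriented buffer. The class name `CircularQueue`, the methods `add` and `take`, and the `stored` counter (used to tell a full queue from an empty one when queue_head equals queue_tail) are illustrative assumptions and are not taken from the specification.

```python
class CircularQueue:
    """Fixed-size ring buffer: the writer adds at queue_head, the reader takes at queue_tail."""

    def __init__(self, queue_size: int) -> None:
        self.queue_size = queue_size
        self.buf = bytearray(queue_size)
        self.queue_head = 0  # next position to write
        self.queue_tail = 0  # next position to read
        self.stored = 0      # bytes currently held (assumption, see lead-in)

    def add(self, data: bytes) -> None:
        """Writer side: append data at queue_head, wrapping around the end."""
        if len(data) > self.queue_size - self.stored:
            raise BufferError("not enough free space in the circular queue")
        for byte in data:
            self.buf[self.queue_head] = byte
            self.queue_head = (self.queue_head + 1) % self.queue_size
        self.stored += len(data)

    def take(self, size: int) -> bytes:
        """Reader side: remove size bytes starting at queue_tail."""
        if size > self.stored:
            raise BufferError("not enough stored data in the circular queue")
        out = bytearray()
        for _ in range(size):
            out.append(self.buf[self.queue_tail])
            self.queue_tail = (self.queue_tail + 1) % self.queue_size
        self.stored -= size
        return bytes(out)


q = CircularQueue(8)
q.add(b"abcdef")       # queue_head moves to 6
first = q.take(4)      # queue_tail moves to 4
q.add(b"ghijkl")       # wraps around; queue_head becomes (6 + 6) % 8
rest = q.take(8)       # reads across the wrap point
```

The queue holds at most 8 bytes at any time even though 12 bytes of "file" data pass through it, which mirrors the distinction between the dotted virtual file and the solid data block in FIG. 4.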

(33) FIG. 5 is a flow chart illustrating a file read process according to the exemplary embodiment of the present invention.

(34) As illustrated in FIG. 5, when the file system receives a read request message from the application program (S510), it may check an offset and a size included in the received read request message (S511).

(35) In this case, the offset represents a value up to a position spaced apart from a start position of the specific file previously stored in the circular queue and the size may represent a size of data to be read.

(36) Next, the file system may check whether the offset is included in the storage section of the circular queue (S512). In this case, the file system performs error processing when the offset is not included in the storage section of the circular queue (S513).

(37) Next, when the offset is included in the storage section of the circular queue, the file system may check whether the size of the data available in the circular queue is larger than the requested size or the file is closed (S514). When the size of the available data is smaller than the requested size and the file is not closed, the file system waits until enough data is available in the circular queue or the file is closed (S515).

(38) When the size of the data available in the circular queue is larger than the requested size or the file is closed, the file system may check whether the file is closed (S516).

(39) When the file is closed, the file system may change the size, which represents the size of data to be read, to the size of the data available in the circular queue from the offset (S517).

(40) Next, the file system may set a queue_index representing a position of the circular queue (S518). Here, the queue_index may be set to be a value calculated by the following [Equation 1].
queue_index = offset modulo queue_size [Equation 1]

(41) In this case, when the file is not closed, the file system may set the queue_index representing the position of the circular queue without performing the process of changing the size to the size of the available data of the circular queue from the offset.

(42) Next, the file system may read a block having a size corresponding to the size from the set position of the queue_index (S519).

(43) Next, the file system may change the queue_tail of the circular queue to discard the read data (S520). Here, the queue_tail may be set to be a value calculated by the following [Equation 2].
queue_tail = (queue_tail + size) modulo queue_size [Equation 2]

(44) Here, the queue_tail represents the previously set queue_tail within the circular queue, the size represents a size of data to be read, and the queue_size represents the size of the circular queue.
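The read flow of FIG. 5 can be sketched as a pair of small functions. This is a sketch under assumptions, not the implementation of the file system: `plan_read`, `advance_tail`, `stored_start`, and `stored_end` are hypothetical names, the storage section is modeled as the file offsets in [stored_start, stored_end), waiting is signaled by returning `None` rather than by blocking, "larger than" is treated as "at least", and after a close the requested size is clamped so that it never grows.

```python
from typing import Optional, Tuple


def plan_read(offset: int, size: int, stored_start: int, stored_end: int,
              queue_size: int, closed: bool) -> Optional[Tuple[int, int]]:
    """Return (queue_index, size) for a read, or None if the caller must wait."""
    # S512/S513: the offset must fall inside the stored section.
    if not (stored_start <= offset <= stored_end):
        raise ValueError("offset is outside the storage section")
    available = stored_end - offset
    # S514/S515: wait unless enough data is available or the file is closed.
    if available < size and not closed:
        return None
    # S516/S517: once the file is closed, shrink the request to what is left.
    if closed:
        size = min(size, available)
    # S518: Equation 1.
    queue_index = offset % queue_size
    return queue_index, size


def advance_tail(queue_tail: int, size: int, queue_size: int) -> int:
    """S520: Equation 2, discarding the data that was just read."""
    return (queue_tail + size) % queue_size


# File offsets 8..13 are stored in a queue of size 8; the reader asks for 4 bytes at offset 10.
plan = plan_read(offset=10, size=4, stored_start=8, stored_end=14,
                 queue_size=8, closed=False)
new_tail = advance_tail(queue_tail=6, size=4, queue_size=8)
```

With these inputs, Equation 1 gives a queue_index of 10 modulo 8 = 2, and Equation 2 moves a queue_tail of 6 to (6 + 4) modulo 8 = 2, wrapping past the end of the queue.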

(45) FIG. 6 is a flow chart illustrating a file write process according to the exemplary embodiment of the present invention.

(46) As illustrated in FIG. 6, when the file system receives a write request message from the application program (S610), it may check an offset and a size included in the received write request message (S611).

(47) Next, the file system may check whether the offset is included in an empty section of the circular queue (S612).

(48) Next, when the offset is included in the empty section of the circular queue, the file system may check whether an available space of the circular queue is larger than the size (S614). When the available space is smaller than the size, the file system waits until the available space of the circular queue is larger than the size (S615).

(49) When the available space of the circular queue is larger than the size, the file system may set the queue_index which represents the position of the circular queue (S616).

(50) Next, the file system may record a block having a size corresponding to the size from the set position of the queue_index (S617).

(51) Next, the file system may change the queue_head of the circular queue to write a new data (S618). Here, the queue_head may be changed to be a value calculated by the following [Equation 3].
queue_head = (queue_head + size) modulo queue_size [Equation 3]

(52) Here, the queue_head represents the previously set queue_head within the circular queue, the size represents a size of data to be written, and the queue_size represents the size of the circular queue.
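The write flow of FIG. 6 can be sketched in the same style. The following is an illustration under assumptions: `write_block`, `stored`, and `next_offset` are hypothetical names, the "empty section" check is simplified to requiring that the writer continue at the next unwritten file offset, the queue_index for a write is assumed to be computed analogously to Equation 1 (the specification gives no separate equation for it), and waiting is signaled by returning `None`.

```python
from typing import Optional, Tuple


def write_block(buf: bytearray, queue_head: int, stored: int,
                offset: int, next_offset: int,
                data: bytes) -> Optional[Tuple[int, int]]:
    """Write data into the ring; return (new_queue_head, new_stored) or None to wait."""
    queue_size = len(buf)
    size = len(data)
    # S612: simplified empty-section check (assumption: strictly sequential writer).
    if offset != next_offset:
        raise ValueError("offset is not in the empty section")
    # S614/S615: wait until the available space can hold the block.
    if queue_size - stored < size:
        return None
    # S616: queue_index, assumed analogous to Equation 1.
    queue_index = offset % queue_size
    # S617: record the block, wrapping around the end of the buffer.
    for i, byte in enumerate(data):
        buf[(queue_index + i) % queue_size] = byte
    # S618: Equation 3.
    new_head = (queue_head + size) % queue_size
    return new_head, stored + size


buf = bytearray(8)
step1 = write_block(buf, queue_head=0, stored=0, offset=0, next_offset=0,
                    data=b"abcdef")
# Suppose the reader has meanwhile consumed 4 bytes, leaving 2 stored.
step2 = write_block(buf, queue_head=6, stored=2, offset=6, next_offset=6,
                    data=b"ghij")
# Only 2 bytes are free now, so a 3-byte write has to wait.
step3 = write_block(buf, queue_head=2, stored=6, offset=10, next_offset=10,
                    data=b"xyz")
```

In the second call, Equation 3 moves the queue_head from 6 to (6 + 4) modulo 8 = 2, and the new bytes overwrite positions that the reader has already released.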

(53) FIG. 7 is a flow chart illustrating an execution process of opening a file according to an exemplary embodiment of the present invention.

(54) FIG. 7 illustrates, according to the exemplary embodiment of the present invention, an internal execution operation between the components included in the data processing apparatus when a first application program creates or opens the specific file. That is, when the first application program executes the creation or open call for the specific file, the file system API 122 is connected to the file state management module 124 and the file storage management module 126 to perform an operation corresponding to the call (S100).

(55) Next, the file state management module 124 receives a file attribute for the specific file from the file system API 122 to generate or open the specific file and changes the state of the specific file to “open” (S110).

(56) Further, since the specific file is used as a data transfer medium between the first and second application programs, the file creation mode or open mode is restricted to either a read-only mode or a write-only mode.

(57) The file system API 122 is connected to the file storage management module 126 so that whichever of the first and second application programs writes to the specific file in the “open” state uses the write-only mode and the other application program, which reads the specific file, uses the read-only mode (S120).
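The open process of FIG. 7 can be sketched as below. The names `FileState`, `Handle`, and `open_file`, the mode strings `"r"` and `"w"`, and the initial state value `"unopened"` are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class FileState:
    """Stand-in for the record kept by the file state management module."""
    name: str
    state: str = "unopened"  # hypothetical initial value


@dataclass
class Handle:
    """Connection between one application program and the specific file."""
    file_state: FileState
    mode: str


def open_file(file_state: FileState, mode: str) -> Handle:
    """Create or open the file in read-only ('r') or write-only ('w') mode."""
    # The file is a one-way transfer medium, so mixed modes are rejected.
    if mode not in ("r", "w"):
        raise ValueError("only read-only ('r') or write-only ('w') is allowed")
    file_state.state = "open"        # S110
    return Handle(file_state, mode)  # S120


shared = FileState("stage1.out")
writer = open_file(shared, "w")  # first application program
reader = open_file(shared, "r")  # second application program
```

Rejecting a read-write mode keeps the data flow one-directional, which is what allows a single queue_head and a single queue_tail to describe the file.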

(58) FIG. 8 is a flow chart illustrating an execution process at the time of recording a file block according to an exemplary embodiment of the present invention.

(59) FIG. 8 illustrates, according to the exemplary embodiment of the present invention, an internal execution operation between the components included in the data processing apparatus when a first application program writes to the specific file. That is, the file system API 122 switches to the second application program immediately after transferring the data block to be written in the specific file to the file storage management module 126, and executes the following operations (S130).

(60) The file storage management module 126 completes the data block writing and informs the file system API 122 of the complete information (S140).

(61) Next, the file system API 122 requests the file state management module 124 to update the file size for the specific file (S150).

(62) Here, since the file size is updated later than the data block is written, the second application program sharing the specific file may read a file size smaller than the size of the specific file actually written (recorded), but not a larger one. As a result, the second application program reading the specific file may correctly recognize the end of the specific file.

(63) FIG. 9 is a flow chart illustrating an internal execution process of reading a file block according to an exemplary embodiment of the present invention.

(64) FIG. 9 illustrates, according to the exemplary embodiment of the present invention, an internal execution operation between the components included in the data processing apparatus when a first application program reads the specific file. That is, when the first application program executes the read call to read the specific file, the file system API 122 requests the file state management module 124 to check an offset of the data block for the specific file (S160).

(65) When the sum of the position and the size of the data block is smaller than the size of the file, the file system API 122 requests the data block from the file storage management module 126 (S170).

(66) The file storage management module 126 reads the data block and transfers the read data block to the file system API 122.

(67) Next, the file system API 122 passes the received data block to the first application program and, when the specific file is larger than the size, may execute the preset operation according to the state of the specific file.
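The ordering described for FIGS. 8 and 9, in which the data block is stored first and the file size is published afterwards, can be sketched as follows. The class `SharedFile` and its members are hypothetical, an unbounded `bytearray` is used in place of the circular queue to keep the example short, and the size check uses "less than or equal" so that the final block can be read, which is an assumption of this sketch.

```python
from typing import Optional


class SharedFile:
    """Toy model: `blocks` plays the storage module, `file_size` the state module."""

    def __init__(self) -> None:
        self.blocks = bytearray()
        self.file_size = 0

    def write_block(self, data: bytes) -> None:
        self.blocks.extend(data)            # S130/S140: data is stored first
        self.file_size = len(self.blocks)   # S150: size is updated afterwards

    def read_block(self, position: int, size: int) -> Optional[bytes]:
        # S160/S170: only serve blocks that lie within the published size.
        if position + size <= self.file_size:
            return bytes(self.blocks[position:position + size])
        return None  # the caller would wait or act on the file state


f = SharedFile()
f.write_block(b"abcd")
early = f.read_block(2, 4)   # reaches past the published size
f.write_block(b"ef")
later = f.read_block(2, 4)   # now fully covered by written data
```

Because the published size can only lag behind the stored data, a reader that trusts `file_size` never receives bytes that have not been written yet.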

(68) FIG. 10 is a flow chart illustrating an execution process of closing a file according to an exemplary embodiment of the present invention.

(69) FIG. 10 illustrates, according to the exemplary embodiment of the present invention, an internal execution operation between the components included in the data processing apparatus when a second application program closes the specific file. That is, when the second application program executes the file close call, the file system API 122 disconnects from the file state management module 124 and the file storage management module 126 (S190).

(70) In this case, the file state management module 124 determines whether to change the file state by referring to the open mode of the specific file corresponding to the previous connection. In other words, when the open mode is the write-only mode, the file state management module 124 changes the file state to “close”, and when the open mode is the read-only mode, it keeps the state as it is. When the state is changed to “close”, all the application programs waiting in the read call need to recognize the state change. The reason is that when the file is in the “close” state, an application program retrying the data block reading in the read call need not retry the data block reading any more.
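One way to make waiting readers recognize the state change is a condition variable, sketched below. This is an assumption of the sketch rather than a mechanism named in the specification; `FileStatus`, `grow`, `close`, and `wait_for_data` are hypothetical names.

```python
import threading


class FileStatus:
    """File state shared between a writer and readers blocked in a read call."""

    def __init__(self) -> None:
        self.state = "open"
        self.file_size = 0
        self._cond = threading.Condition()

    def grow(self, n: int) -> None:
        """Writer published n more bytes; wake readers so they can re-check."""
        with self._cond:
            self.file_size += n
            self._cond.notify_all()

    def close(self, open_mode: str) -> None:
        """Only a write-only connection changes the state to 'close'."""
        with self._cond:
            if open_mode == "w":
                self.state = "close"
                self._cond.notify_all()  # every waiting reader must notice

    def wait_for_data(self, position: int, size: int, timeout: float = 5.0) -> bool:
        """Block until the range is readable; False if closed (or timed out) first."""
        with self._cond:
            self._cond.wait_for(
                lambda: position + size <= self.file_size or self.state == "close",
                timeout,
            )
            return position + size <= self.file_size


status = FileStatus()
outcome = []
reader = threading.Thread(target=lambda: outcome.append(status.wait_for_data(0, 4)))
reader.start()
status.close("r")   # a reader closing its connection changes nothing
status.close("w")   # the writer closing ends the wait
reader.join(5)
```

The reader stops retrying as soon as the writer's close is observed, instead of polling for data that will never arrive.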

(71) FIG. 11 is a second diagram for describing an execution principle using a remote pipe according to an exemplary embodiment of the present invention.

(72) FIG. 11 illustrates a process of distributing and executing the pipeline configured of Process A to Process H using the remote pipe in the cluster configured of three nodes.

(73) Process A reads an initial input file from the disk, generates an output using the remote pipe and Process B receives the input from the remote pipe to generate an output to another remote pipe. By repeating this process to Process H, the execution of the pipeline may be completed.

(74) According to the exemplary embodiments of the present invention, the data processing apparatus may reduce the execution time of the pipeline by overlapping the execution of the application programs of the pipeline while keeping the file input and output as it is.

(75) Further, according to the exemplary embodiments of the present invention, the data processing apparatus may be applied to the pipeline stage and the process of inputting data by the user.

(76) Further, according to the exemplary embodiments of the present invention, the data processing apparatus may execute the pipeline from the moment the user transmits data online, thereby reducing the waiting time of the user and the burden of the large-capacity data transmission.

(77) Further, according to the exemplary embodiments of the present invention, the data processing apparatus may switch disk I/O to network I/O, thereby increasing the input and output speed of the data block and saving the disk space.

(78) Further, according to the exemplary embodiments of the present invention, the data processing apparatus may provide the computing environment in which the execution environment of the existing application programs is guaranteed and the application programs of several stages may be operated in parallel, thereby efficiently using the large-capacity computing resources.

(79) Further, according to the exemplary embodiments of the present invention, the data processing apparatus may perform the pipeline using the plurality of nodes in the cluster system without using the distributed file system, thereby providing the environment in which the data processing may be more rapidly made.

(80) Meanwhile, the exemplary embodiment of the present invention describes all the components configuring the present invention as being coupled into one or as operating in combination with each other, but the present invention is not necessarily limited thereto. That is, within the scope of the present invention, one or more of the components may be selectively coupled and operated. Further, each of the components may be implemented as one independent piece of hardware, but a part or all of the components may be selectively combined and implemented as a computer program having a program module that performs some or all of the combined functions on one or a plurality of pieces of hardware. Further, the computer program is stored in computer readable media, such as a USB memory, a CD, a flash memory, and the like, to be read and executed by a computer, thereby implementing the exemplary embodiment of the present invention. Examples of the storage media of the computer program may include a magnetic recording medium, an optical recording medium, a carrier wave medium, and the like.

(81) A person having ordinary skill in the art to which the present invention pertains may variously change and modify the foregoing exemplary embodiments without departing from the scope of the present invention. Accordingly, the exemplary embodiments disclosed in the present invention and the accompanying drawings are used not to limit but to describe the spirit of the present invention. The scope of the present invention is not limited only to the exemplary embodiments and the accompanying drawings. The protection scope of the present invention must be construed by the appended claims, and all technical ideas within a scope equivalent thereto should be construed as being included in the appended claims of the present invention.

CPC classification

G06F9/546 (Physics)
G06F9/54 (Physics)
G06F9/544 (Physics)

International classification

G06F9/54 (Physics)
