CAMERA DEVICE AND IMAGE GENERATION METHOD OF CAMERA DEVICE
20220417428 · 2022-12-29
CPC classification (Electricity): H04N23/69; H04N5/272; H04N23/951; H04N23/815
International classification (Electricity): H04N5/262
Abstract
A camera device according to an embodiment may include: an image sensor which generates first Bayer data having a first resolution; and a processor which performs deep learning on the basis of the first Bayer data to output second Bayer data having a second resolution higher than the first resolution.
Claims
1-10. (canceled)
11. A camera device comprising: an input unit configured to receive magnification information on an image from a user; an image sensor configured to receive light and generate a first image having a first resolution; and a processor configured to generate a second image having a second resolution higher than the first resolution based on the first image, wherein the processor generates a third image corresponding to a magnification inputted by the user based on the first image and the second image.
12. The camera device according to claim 11, wherein the processor is a processor trained using deep learning.
13. The camera device according to claim 11, wherein the processor generates the third image by superimposing the second image on the first image enlarged by the magnification inputted by the user.
14. The camera device according to claim 11, wherein the processor generates a third image having a resolution value between the first resolution and the second resolution.
15. The camera device according to claim 11, wherein the processor generates a third image after up-scaling the first image according to the inputted magnification and down-scaling the second image according to the inputted magnification.
16. The camera device according to claim 11, wherein the processor generates a third image after superimposing the second image onto the center of the first image.
17. The camera device according to claim 11, wherein the processor performs correction of the resolution by a preset range based on a boundary region where the first image and the second image are being superimposed with each other.
18. The camera device according to claim 11, wherein the processor performs correction of the resolution by changing a mixing ratio of the first resolution and the second resolution.
19. The camera device according to claim 17, wherein the processor increases the mixing ratio of the first resolution as it enters the inside of the second image based on the boundary region.
20. The camera device according to claim 11, wherein the processor generates the second image according to a preset algorithm to generate an image having a second resolution.
21. An image generation method of camera device comprising the steps of: receiving magnification information on an image from a user; receiving light by using an image sensor and generating a first image having a first resolution; generating a second image having a second resolution higher than the first resolution based on the first image; and generating a third image that is an image of a magnification inputted by the user based on the first image and the second image.
22. The image generation method of camera device according to claim 21, wherein the step of generating the second image is performed by a processor trained using deep learning.
23. The image generation method of camera device according to claim 21, wherein the step of generating the third image comprises a step of generating the third image by superimposing the second image on the first image enlarged by the magnification inputted by the user.
24. The image generation method of camera device according to claim 21, wherein the step of generating the third image comprises a step of generating a third image having a resolution value between the first resolution and the second resolution.
25. The image generation method of camera device according to claim 21, wherein the step of generating the third image comprises a step of generating a third image after up-scaling the first image using the magnification as inputted and down-scaling the second image using the magnification as inputted.
26. The image generation method of camera device according to claim 21, wherein the step of generating the third image comprises a step of generating a third image after superimposing the second image to the center of the first image.
27. The image generation method of camera device according to claim 21, comprising a step of performing correction of the resolution by a preset range based on a boundary region where the first image and the second image are being superimposed in the generated third image.
28. The image generation method of camera device according to claim 27, wherein the step of performing correction of a resolution is performed by changing the mixing ratio of the first resolution and the second resolution.
29. The image generation method of camera device according to claim 28, wherein the step of generating the third image comprises a step of performing correction of a resolution in a way that the mixing ratio of the first resolution is increased as it enters the inside of the second image based on the boundary region.
30. The image generation method of camera device according to claim 21, wherein the step of generating of the second image comprises a step of generating the second image according to a preset algorithm to generate an image having a second resolution.
Description
BRIEF DESCRIPTION OF DRAWINGS
BEST MODE
[0049] The embodiments described in the present specification and the configurations shown in the drawings are preferred examples of the disclosed invention, and there may be various modifications that may replace the embodiments and drawings of the present specification at the time of filing of the present application.
[0050] In addition, terms used in the present specification are used to describe embodiments and are not intended to limit and/or restrict the disclosed invention. Singular expressions include plural expressions unless the context clearly indicates otherwise.
[0051] In the present specification, terms such as “comprise”, “include”, or “have” are intended to designate the presence of features, numbers, steps, actions, components, parts, or combinations thereof described in the specification, and do not preclude the presence or addition of one or more other features, numbers, steps, actions, components, parts, or combinations thereof. Ordinal terms such as “first” and “second” used herein may be used to describe various components, but the components are not limited by these terms.
[0052] Hereinafter, embodiments of the present invention will be described in detail with reference to the accompanying drawings so that those of ordinary skill in the art may easily implement the present invention. In addition, in the drawings, parts not related to the description are omitted in order to clearly describe the present invention.
[0053]
[0054] Referring to
[0055] Specifically, the image sensor 130 may include an image sensor such as a complementary metal oxide semiconductor (CMOS) or a charge coupled device (CCD) that converts light entering through the lens 120 of the camera device into an electrical signal.
[0056] The transmission unit 140 may transmit the image acquired by the image sensor 130 to the reception unit 210 of the image generation device 200. In
[0057] Specifically, the transmission unit 140 may extract information on a Bayer pattern from an image acquired by the image sensor 130 and then transmit the information to the reception unit 210.
[0058] The image generation device 200 may include a reception unit 210 that receives the image transmitted by the transmission unit 140 and transfers it to the processor 220, a processor 220 that generates an image having a higher resolution by applying an algorithm generated by deep learning training to the image received from the reception unit 210, an output unit 230 that receives the image generated by the processor 220 and transmits it to the outside, and the like.
[0059] Specifically, after receiving a Bayer image having a first resolution from the reception unit 210, the processor 220 may generate a Bayer image having a second resolution using an algorithm generated by deep learning training, and then transmit the generated second Bayer image to the output unit 230.
[0060] Here, the second resolution means a resolution whose value differs from that of the first resolution; specifically, it may mean a resolution higher or lower than the first resolution. The resolution value that the second resolution may have may be freely set and changed by a user according to the user's purpose.
[0061] Therefore, although not illustrated in the drawing, a camera device 100 according to an embodiment may further include an input unit for receiving information about the second resolution, and through this, the user may input information about a desired resolution to the camera device 100.
[0062] For example, if the user wants to obtain a high resolution image, the second resolution may be set to a resolution having a large difference from the first resolution, and when a new image is desired to be acquired within a relatively short time, the second resolution value may be freely set to a resolution that does not have a large difference from that of the first resolution.
[0063] In addition, the processor 220 may be implemented through a memory (not shown) in which at least one program instruction being executed through the processor is stored.
[0064] Specifically, the memory may include a volatile memory such as SRAM or DRAM. However, it is not limited thereto, and in some cases, the memory may also include a non-volatile memory such as flash memory, read only memory (ROM), erasable programmable read only memory (EPROM), electrically erasable programmable read only memory (EEPROM), and the like.
[0065] A typical camera device receives the Bayer pattern from the image sensor and outputs image data through a process of applying color (color interpolation or demosaicing); however, the transmission unit 140 according to an embodiment may extract information including Bayer pattern information from the image received from the image sensor 130 and transmit the extracted information to the outside.
[0066] Here, the Bayer pattern may include raw data being outputted by the image sensor 130 that converts an optical signal included in the camera device 100 into an electrical signal.
[0067] Specifically, the optical signal transmitted through the lens 120 included in the camera device 100 may be converted into an electrical signal through each pixel disposed in the image sensor capable of detecting colors of R, G, and B.
[0068] If the specification of the camera device 100 is 5 million pixels, it can be considered that an image sensor including 5 million pixels capable of detecting the colors R, G, and B is included. Although the number of pixels is 5 million, the pixels do not actually detect each color; rather, monochrome pixels that detect only the brightness of black and white are each combined with one of the R, G, and B filters.
[0069] That is, in the image sensor, R, G, and B color filters are arranged in a specific pattern on monochromatic pixel cells arranged as many as the number of pixels. Accordingly, the R, G, and B color patterns are disposed to intersect one another according to the visual characteristics of the user (i.e., a human), and this is called a Bayer pattern.
[0070] In general, the Bayer pattern has a smaller amount of data than image data. Therefore, there is an advantage in that even a device equipped with a camera device that does not have a high-end processor can transmit and receive Bayer pattern image information relatively faster than image data, and based on this, the Bayer pattern image can be converted into images with various resolutions.
[0071] For example, when a camera device is mounted on a vehicle, the camera device does not require a high-performance processor to process images even in an environment where low voltage differential signaling (LVDS) with a full-duplex transmission speed of 100 Mbit/s is used; thus it is not overloaded and does not endanger the safety of the driver or passengers using the vehicle.
[0072] In addition, since the size of data transmitted by the in-vehicle communication network can be reduced, even if it is applied to an autonomous vehicle, it is possible to eliminate problems caused by the communication method, communication speed, and the like according to the operation of a plurality of cameras disposed in the vehicle.
[0073] In addition, in transmitting the Bayer pattern image information to the reception unit 210, the transmission unit 140 may receive a Bayer-pattern frame from the image sensor 130 and then transmit the information after down-sampling it to 1/n size.
[0074] Specifically, the transmission unit 140 may smooth the received Bayer pattern data through a Gaussian filter or the like and then perform downsampling. Thereafter, a frame packet may be generated based on the down-sampled image data, and the completed frame packet may be transmitted to the reception unit 210. However, these functions may instead be performed by the processor 220 rather than the transmission unit 140.
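The smoothing-then-downsampling step described above can be sketched as follows. This is an illustrative Python sketch, not the patent's implementation; the function names and the 3×3 kernel are assumptions.

```python
def gaussian_smooth(plane):
    """Smooth a 2D data plane with a normalized 3x3 Gaussian kernel (borders clamped)."""
    kernel = [[1, 2, 1], [2, 4, 2], [1, 2, 1]]  # weights sum to 16
    h, w = len(plane), len(plane[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    yy = min(max(y + dy, 0), h - 1)  # clamp at image borders
                    xx = min(max(x + dx, 0), w - 1)
                    acc += kernel[dy + 1][dx + 1] * plane[yy][xx]
            out[y][x] = acc / 16.0
    return out

def downsample(plane, n):
    """Keep every n-th sample in both directions (1/n size per axis)."""
    return [row[::n] for row in plane[::n]]

def smooth_then_downsample(plane, n):
    # smoothing before decimation suppresses aliasing artifacts
    return downsample(gaussian_smooth(plane), n)
```

Smoothing first means each kept sample summarizes its neighborhood rather than a single raw pixel.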
[0075] In addition, the transmission unit 140 may include a serializer (not shown) that converts the Bayer pattern into serial data in order to transmit Bayer pattern information through a serial communication method such as a low voltage differential signaling (LVDS).
[0076] Until now, general components of the camera device 100 according to an embodiment have been described. Hereinafter, a method of generating the algorithm applied to the processor 220, and its features, will be described.
[0077] The algorithm applied to the processor 220 of the camera device 100 according to an embodiment is an algorithm for generating an image having a higher resolution than the resolution of the input image, and may mean an optimal algorithm that is generated by repeatedly performing deep learning training.
[0078] Deep learning, also referred to as deep structured learning, refers to a set of algorithms related to machine learning that attempts high-level abstraction (a task that summarizes core contents or functions in large amounts of data or complex data) through a combination of several nonlinear transformation methods.
[0079] Specifically, deep learning expresses learning data in a form that a computer can understand (for example, in the case of an image, pixel information is expressed as a column vector) and is a body of learning techniques that has been the subject of much research into how to build better representations and how to build models that learn them; it may include techniques such as deep neural networks (DNN) and deep belief networks (DBN).
[0080] For example, deep learning may first recognize the surrounding environment and transmit the current environment state to the processor. The processor performs an action corresponding to the state, and the environment in turn informs the processor of a reward value for that action. The processor then takes the action that maximizes the reward value, and the learning process may be repeated through this cycle.
[0081] As described above, the learning data used while performing deep learning may be a result obtained by converting a Bayer image with a low actual resolution into a Bayer image with a high resolution, or may be information obtained through simulation.
[0082] If the simulation process is performed, data can be acquired more quickly by adjusting it according to the environment of the simulation (the background of the image, the type of color, and the like). Hereinafter, a method of generating an algorithm applied to the processor 220 according to an embodiment will be described in detail with reference to
[0083]
[0084] The deep learning of
[0085] Deep neural networks (DNNs) include the deep neural network proper, in which multiple hidden layers exist between an input layer and an output layer; the convolutional neural network, which forms a pattern of connections between neurons similar to the structure of the visual cortex of animals; and the recurrent neural network, which builds up a neural network at every moment over time.
[0086] Specifically, a DNN repeats convolution and sub-sampling to reduce the amount of data and extract features. In other words, a DNN outputs classification results through feature extraction and classification and is mainly used for image analysis, where convolution means image filtering.
[0087] If describing the process being performed by the processor 220 to which the DNN algorithm is applied with reference to
[0088] Increasing the magnification means enlarging only a specific part of the image acquired by the image sensor 130. Accordingly, since the portion not selected by the user is of no interest to the user, there is no need to increase its resolution, and only the portion selected by the user may be subjected to the convolution and subsampling process.
[0089] Subsampling refers to a process of reducing the size of an image. As an example, sub-sampling may use a max-pool method or the like. Max-pool is a technique that selects the maximum value in a given region, similar to how neurons respond to the largest signal. Subsampling has the advantages of reducing noise and increasing learning speed.
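The max-pool sub-sampling just described can be sketched as follows; a minimal illustration assuming non-overlapping k×k blocks.

```python
def max_pool(image, k=2):
    """Replace each non-overlapping k x k block with its maximum value,
    shrinking the image by a factor of k per axis."""
    h, w = len(image), len(image[0])
    return [
        [max(image[y + dy][x + dx] for dy in range(k) for dx in range(k))
         for x in range(0, w - k + 1, k)]
        for y in range(0, h - k + 1, k)
    ]
```

For example, `max_pool([[1, 2], [3, 4]], 2)` reduces the 2×2 input to the single maximum value.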
[0090] When convolution and subsampling are performed, a plurality of images 20 may be outputted as illustrated in
[0091] When a plurality of images are being outputted as illustrated in
[0092] The deep learning of
[0093] In the case of deep learning according to
[0094] Specifically, the deep learning according to
[0095] Here, the output data Y is data outputted through actual deep learning, and the second sample data Z is data inputted by the user and may mean data that can be most ideally outputted when the first sample data X is inputted to the algorithm.
[0096] Therefore, the algorithm according to
[0097] Specifically, after analyzing the parameters affecting the output data, feedback is given by changing or deleting the parameters or creating new parameters so that there may be no difference between the second sample data Z, which is the ideal output data, and the first output data Y, which is the actual output data.
[0098] For example, as illustrated in
[0099] In this case, if the difference between the first output data Y, which is the actual output data, and the second sample data Z, which is the most ideal output data, is increasing when the parameter is changed in the direction of increasing the value of the P22 parameter, the feedback may change the algorithm in the direction of decreasing the P22 parameter.
[0100] Conversely, if the difference between the first output data Y, which is the actual output data, and the second sample data Z, which is the most ideal output data, is decreasing when the parameter is changed in the direction of increasing the value of the P33 parameter, the feedback may change the algorithm in the direction of increasing the P33 parameter.
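The feedback in paragraphs [0099] and [0100] amounts to nudging each parameter in whichever direction shrinks the difference between the actual output Y and the ideal output Z. A toy coordinate-wise sketch of that idea follows; it is not the patent's actual training procedure, and the scalar loss function stands in for the Y-Z difference.

```python
def feedback_step(params, loss, step=0.1):
    """For each parameter, probe an increase; keep whichever direction
    reduces the difference (loss) between actual and ideal output."""
    new_params = list(params)
    for i in range(len(params)):
        trial = list(new_params)
        trial[i] += step
        if loss(trial) < loss(new_params):       # increasing this parameter helps
            new_params = trial
        else:                                    # otherwise try the other direction
            trial[i] = new_params[i] - step
            if loss(trial) < loss(new_params):
                new_params = trial
    return new_params

def train(params, loss, iters=200):
    # repeat the feedback until the parameters settle near a minimum
    for _ in range(iters):
        params = feedback_step(params, loss)
    return params
```

With a quadratic loss centered at some target, the parameters drift step by step toward that target, mirroring how the P22 or P33 parameter is raised or lowered depending on whether the Y-Z gap shrinks.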
[0101] That is, through this method, the deep learning algorithm is trained in a way that the first output data Y, which is the actual output data, is outputted similarly to the second sample data Z, which is the most ideal output data.
[0102] And in this case, the resolution of the second sample data Z may be the same as or higher than the resolution of the first output data Y.
[0103] In general, in order to implement a deep-learning-capable processor in a small chip, the number of deep learning operations and memory gates should be minimized. The factors that most influence the number of gates are the algorithm complexity and the amount of data processed per clock, and the amount of data processed by the processor depends on the input resolution.
[0104] Accordingly, since the processor 220 according to an embodiment creates a high-magnification image by reducing the input resolution to reduce the number of gates and then performing upscaling, there is an advantage in that images can be generated faster.
[0105] For example, if an image with an input resolution of 8 MP (megapixels) needs to be zoomed 2×, the center ¼ area (2 MP) is upscaled by a factor of two in each of the horizontal and vertical directions. Alternatively, the ¼ area (2 MP) may first be downscaled to ¼ of its size, the resulting 0.5 MP image used as input data for deep learning, and the generated image then upscaled by a factor of four in both width and height; this produces a 2× zoom image of the same area.
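The megapixel arithmetic of this example can be checked directly (values in MP; variable names are illustrative):

```python
full = 8.0                   # 8 MP input resolution
center = full / 4            # 2x zoom uses the center 1/4 area: 2 MP
zoom2 = center * 2 * 2       # upscale x2 horizontally and x2 vertically: back to 8 MP
assert zoom2 == full

deep_in = center / 4         # 1/4 downscale of the 2 MP area: 0.5 MP deep-learning input
zoom2_fast = deep_in * 4 * 4 # upscale width and height x4 each: 8 MP again
assert zoom2_fast == full
```

Both paths end at the same 8 MP output covering the same center region, but the second path feeds the deep learning stage only a 0.5 MP image, reducing the per-clock data and hence the gate count.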
[0106] Therefore, in the camera device 100 and the image generation method according to an embodiment, in order to prevent performance degradation due to input resolution loss, the deep learning is trained to a magnification that compensates for the resolution loss, which has the advantage of minimizing performance degradation.
[0107] In addition, deep learning-based algorithms for realizing high resolution images generally use a frame buffer, which may be difficult to operate in real time in general PCs and servers due to its characteristics.
[0108] However, since the processor 220 according to an embodiment applies an algorithm that has already been generated through deep learning, it can be easily applied to a low-spec camera device and various devices including the same. In applying this algorithm, high resolution is realized by using only a few line buffers, so there is also an effect that the processor can be implemented with a relatively small chip.
[0109]
[0110] Referring to
[0111] The first Bayer data is information including the Bayer pattern previously described, and although described as Bayer data in
[0112] In addition, in
[0113] Referring to
[0114] Specifically, the first Bayer data includes a plurality of row data, and the plurality of row data may be transmitted to the first data alignment unit 221 through the plurality of line buffers 11.
[0115] For example, if the area where deep learning is to be performed by the deep learning processor 222 is a 3×3 area, a total of three lines must be simultaneously transmitted to the first data alignment unit 221 or the processor 220 to perform deep learning. Accordingly, information on the first line among the three lines is transmitted to the first line buffer 11a and then stored in the first line buffer 11a, and information on the second line among the three lines may be transmitted to the second line buffer 11b and then stored in the second line buffer 11b.
[0116] After that, in the case of the third line, since there is no information about the line received thereafter, it is not stored in the line buffer 11 and may be directly transmitted to the processor 220 or the first data alignment unit 221.
[0117] At this time, since the first data alignment unit 221 or the processor 220 needs to simultaneously receive information on three lines, the information on the first line and the information on the second line stored in the first line buffer 11a and the second line buffer 11b may also be transmitted to the processor 220 or the first image alignment unit 219 at the same time.
[0118] Conversely, if the area where deep learning is to be performed by the deep learning processor 222 is an (N+1)×(N+1) area, deep learning can be performed only when a total of N+1 lines are simultaneously transmitted to the first data alignment unit 221 or the processor 220. Accordingly, information on the first line among the N+1 lines is transmitted to and stored in the first line buffer 11a, information on the second line among the N+1 lines is transmitted to and stored in the second line buffer 11b, and information on the Nth line among the N+1 lines is transmitted to and stored in the Nth line buffer 11n.
[0119] After that, in the case of the (N+1)th line, since there is no line information received thereafter, it is not stored in the line buffers 11 and can be directly transmitted to the processor 220 or the first data alignment unit 221. As described previously, at this time the first data alignment unit 221 or the processor 220 needs to simultaneously receive information on N+1 lines, so the information on the first through Nth lines stored in the line buffers 11a to 11n may be simultaneously transmitted to the processor 220 or the first image alignment unit 219.
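The line-buffer arrangement described above behaves like a sliding window over incoming rows: N buffers hold the previous N rows, and each newly arrived row bypasses the buffers, so the downstream unit always sees N+1 rows at once. A hypothetical Python sketch of that flow, not the hardware implementation:

```python
def sliding_windows(rows, n):
    """Yield consecutive windows of n+1 rows while buffering only n of them."""
    buffers = []                       # line buffers 11a..11n (hold at most n rows)
    for row in rows:
        if len(buffers) < n:
            buffers.append(row)        # fill the line buffers first
            continue
        yield buffers + [row]          # the newest row bypasses the buffers
        buffers = buffers[1:] + [row]  # shift: the oldest buffered row is dropped
```

For a 3×3 deep learning area (n = 2), each yielded window is the three simultaneous lines the processing unit needs.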
[0120] The first image alignment unit 219 may receive Bayer data from the line buffers 11, arrange the Bayer data for each wavelength band to generate first array data, and then transmit the generated first array data to the deep learning processor 222.
[0121] The first image alignment unit 219 may generate a first array data arranged by classifying the received information into specific wavelengths or specific colors of red, green, and blue.
[0122] Thereafter, the deep learning processor 222 may generate a second array data by performing deep learning based on the first array data received through the first image alignment unit 219.
[0123] Specifically, performing deep learning may mean a process of generating an algorithm through inference or iterative learning in order to generate an optimal algorithm as described previously with reference to
[0124] Accordingly, the deep learning processor 222 may perform deep learning based on the first array data received through the first image alignment unit 219 to generate second array data having a second resolution higher than the first resolution.
[0125] For example, if the first array data is received for a 3×3 area as described previously, deep learning is performed for the 3×3 area, and if the first array data is received for an (N+1)×(N+1) area, deep learning may be performed for the (N+1)×(N+1) area.
[0126] Thereafter, the second array data generated by the deep learning processor 222 is transmitted to the second data alignment unit 223, and the second data alignment unit 223 may convert the second array data into second Bayer data having a Bayer pattern.
[0127] After that, the converted second Bayer data is outputted to the outside through a plurality of line buffers 12, and the outputted second Bayer data can be generated into an image having a second resolution higher than the first resolution by another process.
[0128]
[0129] When the user selects a specific region in the image 10 having the first resolution, the processor 220 may perform the deep learning described previously for the region, and as a result of this, as illustrated in
[0130]
[0131] In
[0132] However, the second image having the second resolution is generated by the processor 220, and the algorithm performed by the processor 220 is an algorithm for generating an image having a specific resolution, which generally cannot be changed in real time.
[0133] For example, if the deep learning performed by the processor 220 implements an algorithm for generating a second image whose second resolution is three times the first resolution, the camera device can generally only generate an image having a resolution three times higher and cannot generate an image having a different resolution.
[0134] Therefore, an image for 3× magnification can be generated in an optimal state, but images for other magnifications cannot, so in this case there is a problem in that the continuous zoom function is inevitably degraded.
[0135] Therefore, an object of the present invention is to provide a camera device and an image generation method of the camera device in which, even in a camera device that supports optical zoom only at a specific magnification, an image for an unsupported magnification can be generated by applying the present technology, so that a continuous zoom function can be effectively utilized over a wider range of magnification.
[0136] Referring back to
[0137] After that, if the magnification desired by the user is 2×, the first image is upscaled to 2×. That is, since the difference in magnification is 2×, an image upscaled by 2× is generated.
[0138] On the other hand, since the magnification of the second image is 4× and its resolution is also 4×, down-scaling is performed to match the 2× magnification. That is, since the difference in magnification is 2×, an image down-scaled by ½ is generated.
[0139] As the pixels of the two images become the same according to this process, a third image is generated by synthesizing the two images as illustrated in
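The synthesis just described can be sketched as follows, using nearest-neighbor scaling as a simple stand-in for the actual up- and down-scaling; all names and the scaling method are illustrative assumptions:

```python
def nn_scale(img, factor):
    """Nearest-neighbor resize by a rational factor (e.g. 2 or 0.5)."""
    h, w = len(img), len(img[0])
    nh, nw = int(h * factor), int(w * factor)
    return [[img[int(y / factor)][int(x / factor)] for x in range(nw)]
            for y in range(nh)]

def synthesize(first, second, up, down):
    base = nn_scale(first, up)         # first (wide) image upscaled to the target zoom
    patch = nn_scale(second, down)     # second (high-res) image downscaled to the same zoom
    oy = (len(base) - len(patch)) // 2
    ox = (len(base[0]) - len(patch[0])) // 2
    for y, row in enumerate(patch):    # superimpose the patch on the center of the base
        for x, v in enumerate(row):
            base[oy + y][ox + x] = v
    return base
```

Once both images are brought to the same pixel scale, the high-resolution patch simply replaces the center of the enlarged first image, which is the region of user interest.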
[0140] That is,
[0141] In synthesizing the third image according to the camera device 100 and the image generation method of the camera device, since high-resolution images are used for synthesizing the parts of the user's interest, there is an advantage in that images can be generated at various magnifications without a sense of heterogeneity.
[0142]
[0143] According to
[0144] Accordingly, the camera device 100 according to an embodiment may generate a more natural image without a sense of heterogeneity by performing a resolution correction process on a boundary region, that is, a preset range.
[0145] Looking at this in detail, in the case of area (6a) in
[0146] That is, when the first resolution and the second resolution are mixed at a ratio in the boundary regions (6a) and (6b), the generated resolution has a value between the first resolution and the second resolution, so there is no abrupt difference in resolution between the two images, thereby reducing the sense of heterogeneity.
[0147] In addition, based on (6a), the correction can be performed in a way that the ratio of the first resolution is raised above the ratio of the second resolution in the region near the first-resolution image, and the ratio of the second resolution is raised above the ratio of the first resolution in the region near the second-resolution image.
[0148] That is, the correction may be performed in a way that the uppermost portion of (6a) has the highest first-resolution ratio, with the first-resolution ratio decreasing and the second-resolution ratio increasing moving downward. In this case, since the resolution value changes gradually, a more natural correction can be performed.
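The gradual mixing described above can be sketched as a linear ramp of per-row weights across the boundary region; a hypothetical illustration, not the patent's exact correction:

```python
def boundary_weights(num_rows):
    """Mixing ratio of the second (inner, higher) resolution for each boundary
    row, increasing as we move inward; the first-resolution ratio is the
    complement (1 - weight)."""
    return [(i + 1) / (num_rows + 1) for i in range(num_rows)]

def blend_row(low_row, high_row, w_high):
    """Mix one boundary row of the two images at the given ratio."""
    return [(1.0 - w_high) * lo + w_high * hi for lo, hi in zip(low_row, high_row)]
```

With four boundary rows (matching four line buffers), the second-resolution ratio steps through 0.2, 0.4, 0.6, 0.8 from the outer edge inward, so no single row exhibits an abrupt jump between the two resolutions.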
[0149] In addition, in
[0150] In addition, the range of (6a) is a preset range, which can be freely set by a user, and may be determined according to the number of line buffers. That is, when the number of line buffers is 4, correction may be performed based on 4 pixels as illustrated in
[0151] Up to now, a camera device 100 and an image generation method of the camera device have been described through the drawings.
[0152] In the case of digital zoom according to the prior art, since an image is generated by simply magnifying a portion of a less magnified image, the practical number of pixels decreases as the magnification increases, leading to performance degradation. Zoom using the optical system of a small camera likewise does not move the lens continuously, so there is a problem that images can be generated only at specific fixed magnifications.
[0153] In a camera device and an image generation method of the camera device according to an embodiment, since a high-resolution image is implemented by using a processor to which an algorithm capable of generating a high-resolution image is applied, there is an advantage in that a high resolution image can be implemented by using only a few line buffers.
[0154] In addition, even in a camera device that only supports optical continuous zoom at a specific magnification, images for unsupported magnifications can be generated by applying this technology, so there is an effect in that a practically continuous zoom function can be utilized over a wider range of magnification.
[0155] Although the embodiments so far have been described with reference to the limited embodiments and drawings, various modifications and variations are possible from the above description by those skilled in the art. For example, appropriate results can be achieved even if the described techniques are performed in a different order from the described method, and/or components of the described systems, structures, devices, circuits, and the like are combined in a manner different from the described method, or replaced or substituted by other components or equivalents. Therefore, other embodiments and equivalents to the claims also fall within the scope of the claims to be described later.