METHOD FOR DETERMINING ONE OR MORE GROUPS OF EXPOSURE SETTINGS TO USE IN A 3D IMAGE ACQUISITION PROCESS

Abstract

A method for determining one or more groups of exposure settings to use in a 3D image acquisition process carried out with an imaging system, the 3D image acquisition process comprising capturing one or more sets of image data on the image sensor using the respective groups of exposure settings, wherein the one or more sets of image data are such as to allow the generation of one or more 3D point clouds defining the three-dimensional coordinates of points on the surface(s) of one or more objects being imaged, each group of exposure settings specifying a value for one or more parameters of the imaging system, wherein the method comprises identifying one or more candidate groups of exposure settings and selecting from the candidate groups of exposure settings, one or more groups of exposure settings that satisfy one or more optimization criteria.

Claims

1. A method for determining one or more groups of exposure settings to use in a 3D image acquisition process carried out with an imaging system, the imaging system comprising an image sensor and the 3D image acquisition process comprising capturing one or more sets of image data on the image sensor using the respective groups of exposure settings, wherein the one or more sets of image data are such as to allow the generation of one or more 3D point clouds defining the three-dimensional coordinates of points on the surface(s) of one or more objects being imaged, each group of exposure settings specifying a value for one or more parameters of the imaging system that will affect the amount of signal reaching the image sensor, the method comprising: (i) identifying, using image data captured on the image sensor, one or more candidate groups of exposure settings; (ii) for each candidate group of exposure settings: determining an amount of signal likely to be received in different pixels of the image sensor in the event that the candidate group of exposure settings is used to capture a set of image data for use in the 3D image acquisition process, determining, based on the amount of signal likely to be received in the different pixels, whether or not the respective pixels would be well-exposed pixels if using the candidate group of exposure settings, wherein a well-exposed pixel is one for which the value of a quality parameter associated with that pixel is above a threshold, wherein the value of the quality parameter for a pixel reflects a degree of uncertainty that would be present in the three dimensional coordinates of a point in a point cloud associated with that pixel, in the event that the point cloud were to be generated using the set of image data captured with the candidate group of exposure settings; determining an exposure cost, wherein the exposure cost is derived from the values of the one or more parameters in the candidate group of exposure settings; and (iii) selecting, from the one or more candidate groups of exposure settings, one or more groups of exposure settings to be used for the 3D image acquisition process, the selection being such as to satisfy one or more optimization criteria, wherein the one or more optimization criteria are defined in terms of: (a) the number of pixels in the set N, wherein a pixel will belong to the set N if there is at least one selected group of exposure settings for which the pixel is determined as being a well-exposed pixel; and (b) the exposure cost(s) for the one or more selected groups of exposure settings.

2. A method according to claim 1, wherein for each candidate group of exposure settings, the method comprises: identifying one or more alternative candidate groups of exposure settings for which the one or more parameters of the imaging system have different values, but for which the amount of signal expected to be received at the image sensor is the same; and for each alternative candidate group of exposure settings, determining an exposure cost, wherein the exposure cost is derived from the values of the one or more parameters in the alternative candidate group of exposure settings; wherein the one or more alternative candidate groups of exposure settings are available to be selected for use in the 3D image acquisition process.

3. A method according to claim 1, wherein the selection of the one or more candidate groups of exposure settings is such as to ensure that a ratio of the number of pixels in the set N and the exposure cost(s) for the one or more selected groups of exposure settings meet a criterion.

4. A method according to claim 1, wherein the selection of the one or more candidate groups of exposure settings is such as to ensure that: (a) the number of pixels in the set N meets a first criterion; and (b) the exposure cost(s) for the one or more selected groups of exposure settings meet a second criterion.

5. A method according to claim 4, wherein the first criterion is to maximise the number of pixels that belong to the set N.

6. A method according to claim 4, wherein the first criterion is to ensure that the number of pixels that belong to the set N is above a threshold.

7. A method according to claim 3, wherein the second criterion is to minimise the sum of the exposure costs.

8. A method according to claim 3, wherein the second criterion is to ensure that the sum of the exposure costs for each of the selected groups of exposure settings is beneath a threshold.

9. A method according to claim 1, wherein for one or more of the candidate groups of exposure settings, the step of determining an amount of signal likely to be received in different pixels of the image sensor in the event that the candidate group of exposure settings is used to capture a set of image data comprises capturing a set of image data with the candidate group of exposure settings.

10. A method according to claim 9, wherein the set(s) of image data captured when using the one or more candidate groups of exposure settings are used to identify one or more other candidate groups of exposure settings.

11. A method according to claim 4, wherein steps (i) to (iii) are repeated through one or more iterations, wherein for each iteration: a single one of the candidate group of exposure settings identified in that iteration is selected; and the first criterion is to maximise the number of pixels in the set N and the second criterion is that the sum of the exposure cost for the group of exposure settings selected in the present iteration and the respective exposure costs for the groups of exposure settings selected in all previous iterations is below a threshold.

12. A method according to claim 11, wherein for each iteration, the selected group of exposure settings is used to capture a set of imaging data with the imaging system; wherein for each iteration from the second iteration onwards, the set of image data captured in the previous iteration is used in determining the candidate groups of exposure settings for the present iteration.

13. A method according to claim 12, wherein the step of determining whether or not respective pixels would be well-exposed pixels if using a candidate group of exposure settings comprises determining a probability that the respective pixels will be well exposed, the probability being determined based on the amount of signal received in those pixels in previous iterations of the method.

14. A method according to claim 1, wherein the exposure cost for each group of exposure settings is a function of the exposure time used in that group of settings.

15. A method according to claim 14, wherein the step of identifying one or more candidate groups of exposure settings comprises determining, for one or more pixels of the image sensor, a range of exposure times for which the pixel is likely to be a well exposed pixel.

16. A method according to claim 1, wherein the value of the quality parameter associated with a pixel is determined based on the amount of ambient light in the scene being imaged.

17. A method according to claim 1, wherein each group of exposure settings comprise one or more of: the exposure time of the image sensor; the size of an aperture stop in the path between the object and the sensor; an intensity of light used to illuminate the object; and the strength of an ND filter placed in the light path between the object and the sensor.

18. (canceled)

19. (canceled)

20. A method according to claim 1, wherein the image data in each set of image data comprises one or more 2D images of the object as captured on the sensor.

21. A method according to claim 1, wherein the imaging system is one that uses structured illumination to obtain each set of image data.

22. (canceled)

23. (canceled)

24. (canceled)

25. A method for generating a 3D image of one or more objects using an imaging system comprising an image sensor, the method comprising: capturing, on the image sensor, one or more sets of image data using respective groups of exposure settings, the sets of image data being such as to allow the generation of one or more 3D point clouds defining the three-dimensional coordinates of points on the surface(s) of the one or more objects, each group of exposure settings specifying a value for one or more parameters of the imaging system that will affect the amount of signal reaching the image sensor; and constructing a 3D point cloud using the data from one or more of the captured sets of image data; wherein the exposure settings used for capturing each set of image data are determined using a method according to claim 1.

26. A computer readable storage medium comprising computer executable code that when executed by a computer will cause the computer to carry out a method according to claim 1.

27. An imaging system for performing a 3D image acquisition process by capturing one or more sets of image data with one or more groups of exposure settings, the one or more sets of image data being such as to allow the generation of one or more 3D point clouds defining the three-dimensional coordinates of points on the surface of one or more objects being imaged, the imaging system comprising an image sensor for capturing the one or more sets of image data, the imaging system being configured to determine the one or more groups of exposure settings to use for the 3D image acquisition process by carrying out a method in accordance with claim 1.

Description

BRIEF DESCRIPTION OF DRAWINGS

[0071] Embodiments of the invention will now be described by way of example with reference to the accompanying drawings in which:

[0072] FIG. 1 shows an example of a conventional point cloud matrix;

[0073] FIG. 2 shows an illustration of how multiple point cloud matrices may be merged or combined to form a single point cloud matrix;

[0074] FIG. 3 shows an example of a point cloud matrix in an embodiment described herein;

[0075] FIG. 4 shows a flow-chart of steps in an embodiment described herein;

[0076] FIG. 5A shows a schematic of a structured illumination imaging system in an embodiment;

[0077] FIG. 5B shows a schematic of the geometry of the structured illumination imaging system of FIG. 5A;

[0078] FIG. 6 shows an example of how the standard deviation in a GCPS value may vary as a function of signal amplitude, in an embodiment in which a combination of Gray-coding and phase shifting is used to recover 3D spatial information from an object;

[0079] FIG. 7 shows a schematic of how multiple input point cloud matrices may be obtained using the imaging system of FIG. 5 and used to generate a single output point cloud matrix; and

[0080] FIG. 8 shows an example of a point cloud matrix in an embodiment described herein

[0081] FIG. 9 shows a graph of how the depth noise for a point in a 3D image varies as a function of the contrast obtained in a sequence of images used to generate the 3D image;

[0082] FIG. 10 shows a histogram of the number of well-exposed pixels in an image for different candidate exposure times.

[0083] FIG. 11 shows a histogram of the number of well-exposed pixels in an image for different candidate exposure times.

[0084] FIG. 12 shows a series of images representing the number of well-exposed pixels obtained from cumulative exposures of the image sensor; and

[0085] FIG. 13 shows pseudocode for implementing an algorithm according to an embodiment.

DETAILED DESCRIPTION

[0086] FIG. 4 shows a flow-chart of steps carried out in embodiments described herein. In a first step S401, a plurality of sets of 3D image data are acquired by an imaging system. Each set of 3D image data can be used to compute a respective input point cloud that defines the 3D coordinates of different points on the surface of an object being imaged, together with a value for the intensity or brightness level of the surface at each point.

[0087] In step S402, the value of a quality parameter is evaluated for the data associated with each point in each one of the respective input point clouds. As discussed above, the value of the quality parameter comprises a measure of the uncertainty in the three-dimensional co-ordinates at each point. The quality parameter may be computed as a function of the acquired intensity values that are used to calculate the spatial coordinates at each point in the respective input point clouds.

[0088] In step S403, a single output set of 3D image data is computed based on the image data contained in each one of the acquired sets of image data. In common with the acquired image data sets, the output set of image data defines values for the 3D coordinates of different points on the surface of the object being imaged, together with the intensity or brightness level of the surface at each point. Here, the values for the 3D coordinates are computed by weighting the values for the 3D coordinates specified in the respective input point clouds, in accordance with their respective quality parameter values. The output image data set can then be used to render a 3D image of the object (step S404).

[0089] An example embodiment using a structured light illumination method to acquire 3D image data will now be described with reference to FIGS. 5 to 7.

[0090] Referring to FIG. 5A, there is shown a schematic of a system suitable for capturing 3D images of an object 501 using structured light illumination. The system comprises a projector 503 and a camera 505. The projector is used to project a spatially varied 2D illumination pattern onto the object 40. The pattern itself comprises a series of light and dark fringes 507, 509. The pattern may be generated using a spatial light modulator, for example. The camera 505 is used to acquire a 2D image of the object as illuminated by the projector.

[0091] FIG. 5B shows a simplified diagram of the system geometry. The camera and projector are located a distance B apart. A point on the object that lies a distance D away from the camera and projector is located at an angle θ.sub.c from the camera and θ.sub.p from the projector. Owing to the angle between the camera and the projector, any variations in the surface topology of the object will cause the pattern of light and dark fringes, as detected by the camera, to become distorted. By corollary, the distortion in the pattern will encode information about the 3D surface of the object, and can be used to deduce its surface topology. The 3D information can be recovered by capturing sequential images in which the object is illuminated with different patterns of light, and comparing the measured intensity for each pixel across the sequence of images.

[0092] In the present embodiment, a phase shifting technique is used to obtain the 3D information. Phase shifting is a well-known technique in which a sequence of sinusoidally modulated intensity patterns is projected onto the object, with each pattern being phase shifted with respect to the previous one. A 2D image of the illuminated object is captured each time the intensity pattern is changed. Variations in the surface topology of the object will give rise to a change in the phase of the intensity pattern as seen by the camera at different points across the surface. By comparing the intensities of light in the same pixel across the sequence of 2D images, it is possible to compute the phase at each point, and in turn use this to obtain depth information about the object. The data is output as a 2D array, in which each element maps to a respective one of the pixels of the camera, and defines the 3D spatial coordinates of a point as seen in that pixel.

[0093] It will be appreciated that other techniques, besides phase shifting, may also be used to recover the 3D spatial information; for example, in some embodiments, a Gray-coding technique may be used, or a combination of Gray-coding and phase shifting. The precise algorithms used to decode the 3D spatial information from the sequence of 2D images will vary depending on the specific illumination patterns and the way in which those patterns are varied across the sequence of images; further information on algorithms for recovering the depth information using these and other techniques is available in the publication “Structured light projection for accurate 3D shape determination” (O. Skotheim and F. Couweleers, ICEM12-12.sup.th International Conference on Experimental Mechanics 29 August-2 September, 2004, Politecnico di Bari, Italy). In each case, the 3D spatial information in the object is computed by considering the variation in intensities at each point on the object as the illumination pattern changes and the points are exposed to light and dark regions of the pattern.

[0094] In one example, in which a combination of Gray-coding and phase shifting is used to recover the 3D spatial information, the projector is used to project a series of both Gray-code and phase-shifted patterns onto the object. (Further details of such a method can be found, for example, in an article by Giovanna Sansoni, Matteo Carocci and Roberto Rodella, entitled “Three-dimensional vision based on a combination of Gray-code and phase-shift light protection: analysis and compensation of the systematic errors”—Applied Optics, 38, 6565-6573, 1999). Here, for each point on the object, two corresponding pixel positions can be defined: (i) the projector pixel coordinate i.e. a pixel position in the projector from which the light that is incident on that point on the object is emanating, and (ii) the camera pixel coordinate i.e. the pixel position in the camera at which the light reflected by that point on the object is captured. Using a suitable algorithm, and taking into account the relative positions of the camera and projector (these relative positions being determined straightforwardly using a standard calibration measurement), the images captured at the camera can be processed in order to determine, for each camera pixel, the corresponding projector pixel coordinate. In effect, a determination can be made as to which projector pixel a particular camera pixel is “looking at”. Moreover, by combining the image data received from the Gray-code patterns and phase shifted patterns, the projector pixel coordinate can be determined with higher resolution than the projector pixels themselves.

[0095] The above methodology can be understood as follows. First, by choosing a number of N Gray code patterns, and setting the number of sinusoidal fringes in the phase shifting patterns to 2.sup.N, the fringes can be aligned with the binary transitions in the sequence of N Gray code patterns. The resulting Gray code words, GC(i, j) and the values obtained for the phase, ϕ(i,j), can be combined to form a set of “GCPS” values that describe the absolute fringe position in each position in the field of view. The GCPS values can in turn be used to determine the projector pixel coordinates by performing a scaling of the values from a minimum/maximum of the code to the width (w.sub.p) and height (h.sub.p) of the projector image; in effect, one is able to measure the “fringe displacement” by estimating the phase of the sine patterns in every pixel in the camera.

[0096] Next, it is possible to define:

[00002] $α (i, j) = G C P S_{v} (i, j) = G C_{v} (i, j) + \frac{1}{2 π} ϕ_{v} (i, j)$

where, GC.sub.v(i, j) is the result of the Gray code measurements and ϕ.sub.v(i,j) is a result of the phase stepping measurements, both performed with vertical fringes. (As before, the indices i,j refer to the pixel elements of the image sensor). From the equation above, it is then possible to calculate the originating subpixel projector column for each pixel in the camera image:

[00003] $P_{c} (i, j) = 0.5 + (w_{p} - 1) \frac{α (i, j) - α_{\min}}{α_{\max} - α_{\min}}$

where α.sub.min and α.sub.min are the maximum and minimum values for the GCPS code for vertical fringes. Similarly, when obtaining GCPS values using horizontal fringes, it is possible to define:

[00004] $β (i, j) = G C P S_{h} (i, j) = G C_{h} (i, j) + \frac{1}{2 π} ϕ_{h} (i, j)$

[0097] Then, using the equation for β(1, j), it is possible to calculate the originating subpixel projector row for each pixel in the camera image by:

[00005] $P_{r} (i, j) = 0.5 + (h_{p} - 1) \frac{β (i, j) - β_{\min}}{β_{\max} - β_{\min}}$

where β.sub.min and β.sub.min are the maximum and minimum values for the GCPS code for horizontal fringes.

[0098] Having obtained the subpixel projector column and row coordinates P.sub.c(i, j), P.sub.r(i, j), those values can be used to obtain the {x,y,z} coordinates of points on the object being imaged; specifically, for a given camera pixel p, which is established to be receiving light from a point on the projector g, a position estimate E of a point on the object having coordinates {x.sub.ij y.sub.ij, z.sub.ij} can be derived by using known triangulation methods, akin to those used for stereo vision, taking into consideration the lens parameters, distance between the camera and projector etc.

[0099] The uncertainty in each GCPS value will be largely influenced by the amplitude of the recovered signal and to a lesser degree the presence of ambient light. Experiments carried out by the inventors have shown that the uncertainty in the GCPS value is typically fairly constant until the amplitude of the received signal drops beneath a certain level, after which the uncertainty increases nearly exponentially; this means that the measured amplitude and ambient light can be translated through a pre-established model into an expected measurement uncertainty of the GCPS value—an example is provided in FIG. 6, which shows the standard deviation (std) in the GCPS value as a function of intensity. By using the same calculation as used to obtain the position estimate E above, but now taking the projector pixel position to be g+Δg, where Δg is derived from the standard deviation in the GCPS value at the detected signal amplitude, it is possible to obtain a new position estimate E′. An estimated standard deviation ΔE in the position measurement E can then be derived by assuming that ΔE=|(|E−E′|)|. The estimate ΔE can then be used to define the quality parameter.

[0100] It will be clear that regardless of precisely which algorithm is used for the structured illumination, in order to compute the 3D spatial information with high accuracy, it will be desirable to measure the variation in intensity at each point on the object with maximal signal to noise; this in turn requires that the contrast in intensity that is seen as a particular point is alternatively exposed to light and dark regions of the illumination pattern should be as high as possible. In line with the earlier discussion, the extent of contrast at a particular point may be limited by the need to accommodate a large range of intensities across the surface of the object; if the surface of the object itself comprises a number of bright and dark regions, then given the finite dynamic range of the camera, it may not be possible to maximize the signal recovered from dark regions of the object's surface without also incurring saturation effects where brighter regions of the object are exposed to the brighter fringes in the illumination pattern. It follows that it may not be possible to optimize the contrast in the intensities seen at every point on the surface in a single exposure (in this context, an “exposure” will be understood to refer to the capture of a sequence of 2D images from which a respective point cloud can be computed).

[0101] In order to address the above problem, in embodiments described herein, a number of exposures are taken using different settings. An example of this is shown pictorially in FIG. 7. To begin with, a first sequence of 2D images is captured using a first set of exposure settings. In the present example, the exposure settings are varied by altering the relative size of the camera aperture, although it will be appreciated that the exposure settings may also be varied by adjusting one or more other parameters of the imaging system. The circle 701 shows the relative size of the camera aperture as used to capture the first sequence of images, and the patterns 703a, 703b, 703c show the illumination patterns that are projected onto the object when capturing the respective images in the first sequence of images. As can be seen, each illumination pattern 703a, 703b, 703c comprises a sinusoidally modulated intensity pattern in which a series of alternatively light and dark fringes are projected on to the object. The phase of the pattern across the field of view is illustrated schematically by the wave immediately beneath each pattern, with the dark fringes in the pattern corresponding to the troughs in the wave, and the bright fringes corresponding to the peaks in the wave. As can be seen, the illumination patterns in the successive images are phase shifted with respect to one another, with the positions of the bright and dark fringes in each pattern being translated with respect to one another. A suitable algorithm is used to compare the intensities of light in the same pixel across the sequence of 2D images, in order to compute the depth information at that point. The three-dimensional spatial information is then stored, together with the intensity value for each point, in a first point cloud matrix 705.

[0102] In addition to the three-dimensional coordinates {x, y, z} and intensity values I, each element in the point cloud matrix 705 includes a value for the quality parameter q, as shown in FIG. 3 earlier. In the present embodiment, the value q is determined based on the standard deviation of the observed intensities in the respective camera pixel I.sub.n (i,j) seen as the three different illumination patterns 703a, 703b, 703c are projected on the object. In another embodiment, the value q is determined based on the difference between the maximum value of intensity and minimum intensity in that pixel across the sequence of images. In this way, the quality parameter defines a measure of the contrast seen in each pixel across the sequence of images.

[0103] In the next stage of the process, the exposure settings are adjusted by expanding the size of the camera aperture, as reflected by the circle 707, thereby increasing the amount of light from the object that will reach the camera. A second sequence of 2D images is captured, with the illumination patterns 709a, 709b, 709c again being varied for each image in the sequence. The second sequence of 2D images is used to compute a second input 3D image point cloud 711 that records the depth information for each element, together with an intensity value and a value for the quality parameter q. Owing to the difference in exposure between the first sequence of images and the second sequence of images, it is likely that the degree of contrast seen in a given pixel element {i, j} will vary across the two sequences of images; for example, the difference between the maximum intensity level and minimum intensity level detected in a given pixel element {i, j} will vary between the two sets of images. Accordingly, the values of q recorded in the first point cloud matrix 705 will likely differ from the values of q in the second point cloud matrix 711.

[0104] In a further step, the exposure settings are further adjusted by expanding the size of the camera aperture as shown by circle 713. A third sequence of 2D images is captured on the camera, with the illumination patterns 715a, 715b, 715c being projected onto the object. The third sequence of 2D images is used to compute a third input 3D image point cloud matrix 717, which again records the depth information for each element, together with an intensity value and a value for the quality parameter q. As in the case of the first and second point cloud matrices, the difference in exposure between the second sequence of images and the third sequence of images means that it is likely that the degree of contrast seen in the same pixel element {i, j} will vary across the second and third sequences of images. Thus, the values of q recorded in the third point cloud matrix 717 will likely differ from both the first point cloud matrix 705 and the second point cloud matrix 711.

[0105] Having computed the point cloud matrices for each sequence of images, the method proceeds by using the data in the respective point cloud matrices to compile a single output point cloud matrix 719, which can then be used to render a 3D representation of the object.

[0106] It will be understood that, whereas the example shown in FIG. 7 includes a total of three input sequences of images, this is by way of example only and the output point cloud matrix may be computed by capturing any number N of point clouds, where N>=2. Similarly, whilst in the above described example the exposure was varied by increasing the size of the camera aperture in-between capturing each sequence of images, it will be readily understood that this is not the only means by which the exposure may be varied—other examples of ways in which the exposure may be varied include increasing the illumination intensity, increasing the camera integration time, increasing the camera sensitivity or gain, or by varying the strength of an neutral density filter placed in the optical path between the object and the camera.

[0107] As previously discussed, whilst the example shown in FIG. 7 includes the step of generating individual point clouds for each set of acquired image data, it will be understood that this is by way of example only and in some embodiments, the output point cloud may be generated without the need to construct the individual point clouds for each set of image data. In such cases, the values for the quality parameter associated with the pixels in each respective set of image data may be deduced from the signal levels detected in those pixels, as well as the amount of ambient light incident on the image sensor.

[0108] By capturing multiple sets of image data with different groups of exposure settings, and combining the data from those data sets to provide a single output 3D image, embodiments as described herein can help to compensate for the limits in dynamic range of the camera, providing an enhanced signal to noise ratio for both darker and brighter points on the object surface. In so doing, embodiments can help to ensure that useable data is captured from areas that would, if using conventional methods of 3D surface imaging, be either completely missing from the final image or else dominated by noise, and can ensure that the surface topography of the object is mapped with greater accuracy compared with such conventional methods of 3D surface imaging.

[0109] In the embodiments described above, it has been assumed that the camera sensor is imaging in grayscale; that is, a single intensity value is measured in each pixel, relating to the total light level incident on the sensor. However, it will be understood that embodiments described herein are equally applicable to color imaging scenarios. For example, in some embodiments, the camera sensor may comprise an RGB sensor in which a Bayer mask is used to resolve the incident light into red, green and blue channels; in other embodiments, the camera sensor may comprise a three-CCD device in which three separate CCD sensors are used to collect the respective red, blue and green light signals. In this case, the point cloud matrices will be acquired in the same manner as in the above described embodiments, but within each matrix element, the intensity values I.sub.ij will be decomposed into the three colour intensities r.sub.ij, g.sub.ij, b.sub.ij. FIG. 8 shows an example of such a point cloud matrix.

[0110] As previously discussed, the data used for rendering the final 3D image may be obtained by varying one or more exposure settings, including the illumination intensity, camera integration time (exposure time), increasing the camera sensitivity or gain, or by varying the strength of an neutral density filter placed in the optical path between the object and the camera, for example. There will thus exist a large number of combinations of different settings that can be used for any one exposure. Some of these combinations may offer a more optimal solution than others. For example, in some cases, it might be desirable to choose a group of exposure settings that will minimise exposure time; in other cases, there may be additional or different considerations, such as a need to keep the size of the aperture constant to avoid changes in depth-of-field, which will impose other constraints in terms of which parameters of the imaging system are varied, and by how much. In general, when acquiring an image, it will usually be necessary to strike a balance between achieving an acceptable SNR in the image (in particular, the noise in the depth values for each point on the surface of the object being imaged), and one or more exposure requirements, such as (i) the overall duration of the acquisition, (ii) the illumination intensity required for the acquisition, (iii) the aperture size etc.

[0111] It should be noted that 3D measurement systems employing active illumination (e.g. structured light, laser triangulation, time-of-flight) differ from regular cameras in that the amount of ambient light highly influences the dynamic range of the system if a desired maximum noise level is to be achieved. This means that traditional auto-exposure algorithms cannot be used directly—they typically optimize only for that the total signal level is sufficient, whilst active 3D measurement systems instead must enforce a correct ratio between emitted light and ambient light whilst simultaneously avoiding saturation. Finding good exposure sets manually, meanwhile, is a complex endeavour, as they require the user to have a complete mental model of the camera and its parameter sets.

[0112] It is desirable, therefore, to provide a means for determining which exposure settings to vary, and by how much, in order to optimize the data quality for a given scene or object. As discussed above in relation to FIG. 5B, the improvement in noise for any given pixel will fall off as the contrast increases beyond a certain point. Embodiments described herein can use this fact that to determine a suitable set of exposure settings, against the backdrop of one or more imaging constraints (e.g. a maximum overall exposure time, maximum aperture size, etc.)

[0113] In more detail, we can specify a target in terms of the value of the quality parameter to be obtained for pixels in the final 3D image, where that target is to be achieved subject to one or more imaging constraints. The goal is to try to optimize the final 3D image in terms of noise, whilst imposing one or more constraints (“costs”) such as a maximum total exposure time, or total illumination power, for example.

[0114] We can begin by defining an exposure cost E.sub.Cost that is a function of one or more exposure settings of the system, where the exposure settings in questions are ones that dictate the amount of light that is incident on the camera:

E.sub.Cost=f.sub.1(exposure time)+f.sub.2(aperture size)+f.sub.3(neural density filter strength)+f.sub.4(illumination intensity)+ . . .

[0115] Here, the functions {f.sub.1, f.sub.2, . . . f.sub.n} define how the cost of the exposure varies with each respective parameter. The functions can be user-defined and effectively define the “downside” to the user in varying each parameter. As an example, if it is desirable to capture a 3D image in a very short space of time, then the user may apportion a high cost to exposure time. In another example, if the user is not time-limited, but wishes to keep the overall power usage to a minimum, they may apportion a high cost to illumination intensity. The value of E.sub.Cost can be used to provide a constraint in determining the optimum exposure settings for a given acquisition.

[0116] We can also define the term E.sub.value as an exposure value that indicates how much light that reaches the sensor, both in terms of ambient light and light from the sensor system itself:

E.sub.value=e.sub.1(exposure time).Math.e.sub.2(aperture size).Math.e.sub.3(neural density filter strength).Math.e.sub.4(illumination intensity).Math. . . .

[0117] E.sub.value serves to incorporate numerous effects that will affect how much signal is received by the camera, and thus the quality of each individual pixel in the system. E.sub.value can also be extended such that it returns two exposure values—one indicating the exposure value for the ambient light (E.sub.ambient) and one indicating the exposure value for the projected light (E.sub.amplitude).

[0118] We can further determine relationships between the functions {e.sub.1, e.sub.2, . . . e.sub.n} where the relationship between each pair of functions defines the extent to which modifying one parameter will alter the amount of light incident on the camera, relative to modifying the other parameter. As an example, in terms of increasing the amount of light incident on the camera, the step of doubling the exposure time may be equivalent to doubling the aperture size. In another example, in terms of increasing the amount of light incident on the camera, the step of doubling the aperture size may be equivalent to reducing the neural density filter strength by a factor of 4. The functions {e.sub.1, e.sub.2, . . . e.sub.n} may be defined so as to take these relationships into consideration.

[0119] The functions {e.sub.1, e.sub.2, . . . e.sub.n} and the relationships between them may be determined empirically offline by experiment. Knowledge of the relationships between these functions is useful because it can allow one to translate a change in one parameter value to other parameter values; for example, in the event that one is seeking to achieve a particular value for E.sub.value, and is able to determine a change in exposure time that will achieve that E.sub.value, it becomes possible to translate that change in exposure time into a change in the size of the aperture that will then have the same effect in terms of E.sub.value. As will become apparent below, this is advantageous because it can simplify the determination of the exposure settings for an acquisition by focusing on one parameter only (typically, the exposure time), and then translating the change(s) in exposure time into values of the other parameters according to the user's particular needs (e.g. a desire to minimize overall exposure time versus a desire to minimize aperture size etc.).

[0120] In what follows, we provide two examples of how the exposure settings may be determined, based on the values E.sub.value, E.sub.Cost and q.sub.min, where q.sub.min is the minimum acceptable value of the quality parameter.

[0121] In the first example, we set the target of finding exposure settings for each image, where those exposure settings will maximize the number of pixels with quality parameter value q(p)>q.sub.min in the final image, subject to the condition that the sum of the exposure costs across the sequence of exposures is less than a predefined maximum cost E.sub.Cost.sub.max. In other words, given a set of n exposures, in which each exposure i is acquired with a group of exposure settings having an associated exposure cost E.sub.Cost.sub.i we wish to find a set of exposures that will maximize the total number of pixels for which q(p)>q.sub.min in the final image subject to the condition that: Σ.sub.i.sup.n E.sub.Cost.sub.i≤E.sub.Cost.sub.max, or where the ratio of number of pixels for which q(p)>q.sub.min versus Σ.sub.i.sup.n E.sub.Cost.sub.i is maximized.

[0122] As a second example, we set a target of finding a set of exposure settings for each image that will minimize the sum of the exposure costs across the sequence of exposures, whilst ensuring that a threshold number of pixels in the final 3D image will have a quality parameter value q(p)>q.sub.min.

[0123] In what follows, we will discuss different strategies for satisfying these targets. In each case, we can restrict the target to use a particular 2D or 3D region of a given scene.

[0124] For many 3D measurement systems, there is a roughly (√{square root over (c)}).sup.−1 relationship between the received contrast/signal c and the noise. Contrast indicates the amplitude/signal level of the active illumination employed by the camera system. Furthermore, the ambient light also influences the amount of light incident in the camera. The contrast is typically related to how much light is collected; as an example, doubling the exposure time is likely to double the contrast (and the ambient light). Referring to FIG. 9, it can be seen that there is almost a “plateau” when the contrast exceeds a certain value (e.g. 50) where the measurement noise does not improve significantly with contrast. Furthermore, for many applications there is a fixed limit on what level of noise that can be tolerated (e.g. 0.15 mm). In the exposure reflected in FIG. 9, for example, exposures with contrast exceeding 40 could be considered to be “good enough”. It can also be seen that the minimum contrast that can be tolerated also depends on the amount of ambient light present.

[0125] Referring back to the illustration in FIG. 5B, the distance uncertainty of a point σ.sub.SL being measured by structured light system can be estimated using the following formula:

[00006] $σ_{SL} = \frac{D^{2}}{B} \frac{\sin (θ_{c})}{\sin (θ_{p})} \frac{FOV}{ϕ_{\max}} \frac{\sqrt{A + 2 C}}{A}$

Where D is the distance to the point being measured, B is the camera-projector baseline, and θ.sub.c and θ.sub.p are the camera angle and projector angle, respectively. The value FOV is the field-of-view of the projector, ϕ.sub.max is the number of sine waves projected, A is the amplitude of the observed sinewaves and C is the fixed light signal received i.e. the ambient illumination+DC level of the emitted sine wave (Bouquet, G., et al, “Design tool for TOF and SL based 3D cameras,” Opt. Express 25, 27758-27769 (2017), the content of which is incorporated herein by reference).

[0126] There is, however, also the issue of sensor saturation. If we define S.sub.max as the maximum signal level the sensor can accommodate without saturating, then when A+C>S.sub.max, σ.sub.SL will quickly deteriorate. There will be first a drop in quality when only parts of the sine wave can be recovered, a further drop when only the Gray code part of the code can be deciphered, followed by a complete loss of information once the sensor is fully saturated. It will be appreciated that when capturing data with multiple exposure settings, one does not have to always capture both Gray Codes and the Phase images/sine waves. As the Gray codes are more robust to saturation, one can capture the Gray Codes using one exposure setting, and use that Gray code combined with phase images captured at multiple exposure settings. This saves time, as one saves the effort required of multiple Gray code recaptures.

[0127] In order to take into account the possibility of saturation, the previous equation can be rewritten as:

[00007] $σ_{SL} = {\begin{matrix} \frac{D^{2}}{B} \frac{\sin (θ_{c})}{\sin (θ_{p})} \frac{FOV}{ϕ_{\max}} \frac{\sqrt{A + 2 C}}{A}, & A + C < S_{\max} \\ \infty, & A + C > S_{\max} \end{matrix}$

[0128] It should be noted that in most cases, B and FOV can be considered to be constants.

[0129] As noted above, the exposure cost E.sub.Cost for a particular image acquisition is modelled as a function of different parameters, each on which will affect the amount of light incident on the camera. For simplicity, in what follows, we will assume that each of these parameters is kept constant, except for the exposure time, such that E.sub.Cost=t for a length of exposure t.

[0130] We can then formulate the following equations:

A′(p)=tA*(p)

C′(p)=tC*(p)

where A′(p) indicates the measured amplitude of a point p in the image, C′(p) indicates the measured ambient light of the point p, A*(p) is the exposure time independent amplitude of the point p and C*(p) is the exposure time independent ambient light of the point p.

[0131] We can think of A* and C* as normalized values in some unit of time. A* and C* can be described using the following:

A*(p)=v.sup.+(p)−v.sup.−(p)

C*(p)=v.sup.−(p)

where for each pixel p, v.sup.+(p) is a pixel signal level with the system's active illumination on (projector on for structured light), and v.sup.−(p) is a pixel signal level with the system's illumination off.

[0132] The predicted noise of the system is then given by:

[00008] $σ_{S L} = \frac{D^{2}}{B} \frac{\sin (θ_{c})}{\sin (θ_{p})} \frac{FOV}{ϕ_{\max}} \frac{\sqrt{v^{+} (p) + v^{-} (p)}}{v^{+} (p) - v^{-} (p)}$

[0133] When considered per pixel p this yields:

[00009] $σ_{SL} (p) = \frac{D^{2}}{B} \frac{\sin (θ_{c})}{\sin (θ_{p})} \frac{FOV}{ϕ_{\max}} \frac{\sqrt{v^{+} (p) + v^{-} (p)}}{v^{+} (p) - v^{-} (p)} = \frac{D^{2}}{B} \frac{\sin (θ_{c})}{\sin (θ_{p})} \frac{FOV}{ϕ_{\max}} \frac{\sqrt{v^{+} (p) + v^{-} (p)}}{v^{+} (p) - v^{-} (p)}$

[0134] We can simplify the expression for σ.sub.SL to:

[00010] $σ_{SL} (p) = X \frac{\sqrt{v^{+} (p) + v^{-} (p)}}{v^{+} (p) - v^{-} (p)}, X = \frac{D^{2}}{B} \frac{\sin (θ_{c})}{\sin (θ_{p})} \frac{FOV}{ϕ_{\max}}$

[0135] For the purpose of establishing the values of the exposure time, we can define the quality parameter q as

[00011] $q = \frac{1}{σ_{SL}} .$

The desired outcome for q>q.sub.min can then be reformulated as σ.sub.SL<σ.sub.max. This means that each pixel will have a minimum time t.sub.min that is required for sufficient exposure of that point on the object. It can be seen that the necessary t.sub.min depends not only on ambient light, but also the distance to object D and the position in scene as determined by the angles θ.sub.c and θ.sub.p. Depending on the implementation, these variables can be kept constant or reflect the actual per-pixel data.

[0136] Due to the effect of oversaturation (where σ.sub.SL>>σ.sub.max), there is also a time t.sub.max that defines the maximum time that a pixel can be exposed without experiencing saturation. In reality this means that for some pixels it will be impossible to satisfy σ.sub.max>σ.sub.SL as this would require signal levels A and C that would saturate the sensor. This could e.g. happen if the sensor system is used outdoors in sunlight or in the presence of strong light sources. In such cases, the built-in illumination would not be able to drown out the ambient light.

[0137] The values of t.sub.min and t.sub.max for each pixel can be determined in a number of ways. In a first example, the values are predicted from captured images as follows.

[0138] Assume that we have captured an image I(t.sub.0) by using an exposure time t.sub.0 and containing a pixel p.sub.0 with v.sup.+(p.sub.0) and v.sup.−(p.sub.0). Under the assumption of p.sub.0 not being fully oversaturated, or too under-saturated, we can then use the following formulas to predict t.sub.min and t.sub.max for that pixel:

[00012] $t_{\max} (p_{0}) = \frac{t_{0} S_{\max}}{v^{+} (p_{0})}$

whilst

[00013] $σ_{\max} (p_{0}) = X \frac{1}{\sqrt{t_{\min}}} \frac{\sqrt{v^{+} (p_{0}) + v^{-} (p_{0})}}{v^{+} (p_{0}) - v^{-} (p_{0})}$

yielding

[00014] $t_{\min} (p_{0}) = {(X \frac{\sqrt{v^{+} (p_{0}) + v^{-} (p_{0})}}{σ_{\max} (v^{+} (p_{0}) - v^{-} (p_{0}))})}^{2}$

[0139] If the pixel captured is oversaturated, no prediction can be done other that t.sub.max<t.sub.0. If the pixel captured is under-saturated (e.g. contrast close to zero), no prediction can be done other than that t.sub.min>t.sub.0. For under-saturation, there will, however, be a range of responses where predictions of t.sub.min and t.sub.max can be predicted (whilst σ.sub.SL(p.sub.0)>σ.sub.max).

[0140] Interestingly, there will exist situations where t.sub.min>t.sub.max. This means usually in practice that the active illumination of the system is too weak to overcome the ambient illumination to allow for imaging with sufficiently high quality. This might not be over the entire scene, e.g. the surface normal of the imaged object and its specular characteristics may be unfavourable for the setup.

[0141] In a first example, we capture a set of candidate images I.sub.init={I.sub.1, I.sub.2, . . . I.sub.n} with different exposure settings i.sub.init={E.sub.1, E.sub.2, . . . E.sub.n}. Note that, since we are only considering the exposure time in the present case, E.sub.init={t.sub.1, t.sub.2, . . . t.sub.n}. We then determine which subset of these images—and hence, candidate exposure times—offers the best result in terms of maximising the number of pixels in the final 3D image with an acceptable noise level, whilst satisfying the constraint that the total time taken is less than T.sub.max (it will be appreciated here, that since E.sub.Cost is expressed purely in terms of the exposure time, the value T.sub.max effectively corresponds to E.sub.Cost.sub.max in this example).

[0142] In the present example, in order to quickly evaluate the amount of well-exposed pixels using exposure times from a candidate exposure taken from I.sub.init we can use the following histogram-based approach. Given one or more images from I.sub.init, we compute t.sub.min and t.sub.max for each pixel. We assume that I.sub.init covers the whole dynamic range of the scene (typically around 7 stops) so that we can estimate the values of t.sub.min and t.sub.max for most of the pixels. For each pixel p, there could be several images I.sub.good.sup.p ⊂I.sub.init which give good estimates for t.sub.min and t.sub.max for p. We compute these values by either picking some I from I.sub.good.sup.p or by averaging t.sub.min and t.sub.max from the whole set I.sub.good.sup.p. Next, we create a 2D histogram H. This is a k×k matrix where both each row xϵ[0, k−1] and column y ϵ[0, k−1] represent exposure times in the range [xΔ.sub.t, (x+1)Δ.sub.t], where:

[00015] $Δ_{t} = [\frac{T^{*}}{k}]$

and T* is the largest time bin under consideration.

[0143] We define each value H(x, y) as the number of pixels for which t.sub.min is in the x-th row and t.sub.max is in the y-th column:

H(x,y)={p.sub.i|t.sub.min(p)ϵ[xΔ.sub.t,(x+1)Δ.sub.t] and t.sub.max(p)ϵ[yΔ.sub.t,(y+1)Δ.sub.t]

[0144] The histogram allows us to estimate, for a given exposure time tϵ[0, T*], the number of pixels N.sub.good that will be well-exposed using t by:

[00016] $N_{good} = {.Math.}_{y = idx (t)}^{k - 1} {.Math.}_{x = 0}^{idx (t)} H (x, y)$

[0145] Where idx(t) is the row/column number corresponding to t:

[00017] $i d x (t) = [\frac{t}{Δ_{t}}]$

[0146] Note that this holds for any exposure time in [0, T*] and not just those in I.sub.init.

[0147] An example of such a histogram is shown in FIG. 10. Here, exposure times are given in milliseconds with Δ.sub.t=5 ms. The x-axis represents t.sub.minϵ(0, 500) and the y-axis represents t.sub.maxϵ(0, 500). The region labelled R1 contains the values of H that should be summed to obtain N.sub.good for a given t′=105 ms (as represented by a point on the diagonal); in other words, the region R1 contains all pixels p for which t.sub.min(p)ϵ(0, t′) and t.sub.max(p)ϵ(t′,T*).

[0148] Given a set of exposure times in increasing order E′={t.sub.0, t.sub.1, . . . }, we can estimate:

[00018] $N_{good} (E^{'}) = {.Math.}_{y = idx (t)}^{T^{*}} {.Math.}_{x = 0}^{idx (t_{0})} H (x, y) + {.Math.}_{y = idx (t)}^{T^{*}} {.Math.}_{x = idx (t_{0}) + 1}^{idx (t_{1})} H (x, y) + .Math.$

[0149] This is illustrated in FIG. 11. Since the rectangles R1 and R2 overlap, the above formula avoids counting histogram bins twice.

[0150] To further speed up the process we can use the cumulative histogram

[00019] $H_{c} (x, y) = \underset{x′ < x, y^{'} < y}{.Math.} H (x, y)$

[0151] We can then use the following formula to quickly compute N.sub.good for the complete set of exposures (i.e. to compute the cumulative sum of pixels that have an acceptable level of noise in at least one of the captured images):

[00020] $N_{good} = {.Math.}_{y = c}^{d} {.Math.}_{x = a}^{b} H (x, y) = H_{c} (b + 1, d + 1) + H_{c} (a, c) - H_{c} (b + 1, c) - H_{c} (a, d + 1)$

[0152] This approach allows us to quickly compute N.sub.good(t) using just the sum of four numbers.

[0153] We can use the above approach for determining N.sub.good to find a set of exposure times that will satisfy the first target, i.e. that will maximize the number of pixels with quality parameter value q(p)>q.sub.min in the final image, subject to the condition that the sum of the exposure times across the image is less than T.sub.max.

[0154] In one example, we can use a “greedy” algorithm as follows:

[0155] We set: E.sub.opt=Ø

[0156] While total time of E.sub.opt is less than T.sub.max: [0157] Find EϵE.sub.init that maximizes the number of pixels with q(p)>q.sub.min for the set of exposures E.sub.opt ∪E. This can be found quickly by using the histogram approach as discussed above; [0158] Add E to E.sub.opt and remove E from E.sub.init

[0159] FIG. 12 shows the results after each round in the algorithm, with each row showing one cycle in the while-loop. After each step, we estimate G where G is a binary image that is true for all pixels that are well exposed for the exposures contained in E.sub.opt. The initial image for G (image 1) is completely black since no pixels are known to have an acceptable value of q. Then, the first exposure is selected for the scene with the maximum number of good pixels (q>q.sub.min) (see image 2). This provides an updated G (image 3). In the next round, the updated image G is used as a basis, and a new E is selected as being the one that provides the maximum amount of additional well-exposed pixels. This then results in a further updated image for G.

[0160] In the case where the cost is defined as a function of multiple ones of the imaging parameters (and not just the exposure time), the algorithm will run until the sum of costs for the chosen set of exposures exceeds E.sub.cost.sub.max instead of until the sum of exposure times exceeds T.sub.max. In more detail, having determined a candidate group of exposure settings, one can determine the value E.sub.value for that group of exposure settings, and use this to translate to/generate one or more alternative candidate groups of exposure settings for which the value of E.sub.value is the same; this can be achieved by considering the functions {e.sub.1 e.sub.2, . . . e.sub.n} as discussed earlier. As an example, one might identify a candidate group of exposure settings in which the exposure time is t.sub.1 and the aperture size is a.sub.1. One can then convert this to a second group of exposure settings in which the exposure time is t.sub.2 and the aperture size is a.sub.2 and where t.sub.2>t.sub.1 and a.sub.2<a.sub.1. These two alternatives should then provide similar outcomes in terms of the light that is collected from light and dark regions of the object being imaged. However, the two alternatives may have very different costs depending on whether the user places greater importance on keeping the overall exposure time as short as possible, or keeping the aperture size as small as possible (in other words, depending on the functions I.sub.l and f.sub.2 in E.sub.Cost).

[0161] The greedy algorithm can be easily modified to meet the second target i.e. to find a set of exposure settings for each image that will minimize the total exposure cost across the sequence of exposures, whilst ensuring that a threshold number of pixels in the final 3D image will have a quality parameter value q(p)>q.sub.min.

[0162] We set: E.sub.opt=0

[0163] While the total number of pixels for which q(p)>q.sub.min in E.sub.opt is less than N.sub.min: [0164] Find EϵE.sub.init that maximizes the number of pixels with q(p)>q.sub.min for the set of exposures E.sub.opt ∩E. This can be found quickly by using the histogram approach as discussed above; [0165] Add E to E.sub.opt and remove E from E.sub.init

[0166] It will be appreciated that in selecting the optimal set of exposure times, we need not be limited to only selecting exposure times that were present in the initial sequence of captured images I.sub.init. The histogram representation allows us to make use of the exposure times corresponding to each respective time bin in the histogram, and not just the exposure times t.sub.iϵE.sub.init. Given H(x,y), we can quickly compute a set of optimal exposure times as follows (here we will revert to the first target of maximizing the number of pixels with quality parameter value q(p)>q.sub.min in the final image, subject to the condition that the sum of the exposure times across the image is less than T.sub.max).

[0167] Let {circumflex over (t)}.sub.i be a discretized exposure time corresponding to a row/column in the histogram H. Let {circumflex over (t)}.sub.a be a maximum time such that at time {circumflex over (t)}.sub.a there are no pixels with t.sub.min<{circumflex over (t)}.sub.a (lower bound for acceptable exposure times). Similarly, let {circumflex over (t)}.sub.b be a minimum time at which there are no pixels with t.sub.max>{circumflex over (t)}.sub.b (upper bound for acceptable exposure times). Let {circumflex over (T)} be the set of acceptable discretized exposure times, {circumflex over (T)}={{circumflex over (t)}.sub.i|{circumflex over (t)}.sub.iϵ({circumflex over (t)}.sub.a, {circumflex over (t)}.sub.b)}, sorted in increasing order.

[0168] To determine an optimal set of exposure times, we proceed as follows: [0169] Identify all possible subsets {circumflex over (T)}′ of {circumflex over (T)} that have a total time less than T.sub.max. This can be achieved as follows. Let |{circumflex over (T)}|=n, we can assume that n is relatively small since it makes no sense to have less than e.g. ⅓ stops between exposures. To quickly generate all subsets that have a total time less than T.sub.max we can map all times t.sub.iϵt.sub.iϵ{circumflex over (T)} to integers k where k=1, 2, . . . , n. Then for each integer k, we list all possible distinct partitions. For example:
1={1}
2={2}
3={3}, {1+2}
4={4}, {3+1}
5={5}, {4+1}, {3+2}
6={6}, {5+1}, {4+2}, {3+2+1}

[0170] In the case of using the cost-based approach, we can identify all possible subsets I″ of that have a total cost less than E.sub.cost.sub.max as basis for the algorithm.

[0171] For each integer k, the partition results in a group of subsets of {circumflex over (T)} with total time {circumflex over (t)}.sub.k. Then, all subsets with a total time≤T.sub.max corresponds to partitioning of all integers k<k.sub.max where k.sub.max is a mapped value of {circumflex over (t)}.sub.i closest to {circumflex over (T)}.sub.max. The partitioning can be generated in a bottom-up manner or recursively. As an example, in Python code, one can use the following sequence of commands to generate the subsets recursively:

def uniquepartitions (n, I=1); [0172] yield (n,) [0173] for i in range (I, n//2+1): [0174] for p in unique_partitions(n−I, i): [0175] if i not in p: #eliminate non-unique results [0176] yield (i,)+P

[0177] The bottom up approach is straightforward using dynamic programming. Moreover, we can obtain a pre-computed table of all such partitioning up to some large value of k. [0178] Having obtained each subset {circumflex over (T)}′, we can compute N.sub.good({circumflex over (T)}′) and select the subset {circumflex over (T)}″ having the largest value of N.sub.good. Note that the value of N.sub.good can be computed quickly for each subset using the cumulative histogram approach as discussed above.

[0179] The previous examples have been discussed in the context of capturing an initial sequence of images, and thereafter determining the best exposure settings for the image acquisition. Thus, in the previous examples, the candidate groups of exposure settings are determined offline, such that each group of exposure settings to be used in the 3D image acquisition process is determined prior to acquiring the respective sets of image data. In what follows, we describe an online approach for choosing optimal exposure settings. Here, we assume that we are given an initial exposure(s) and the aim is to predict the next best exposure. Thus, in the present online approach, the 3D image acquisition process is carried out iteratively, by determining in each iteration, a next best group of exposure settings to use and then capturing a set of image data with that group of exposure settings. In each iteration of the method, one or more new candidate groups of exposure settings are considered and a single one of the candidate groups of exposure settings selected for use in acquiring the next set of image data.

[0180] We consider an image 40 taken with an exposure time t. For each pixel p.sub.iϵI(t) we denote the pixel signal level with the system's active illumination on (projector on for structured light) as v.sup.+(p.sub.i), and denote the pixel signal level with the system's illumination off as v.sup.−(p.sub.i).

[0181] We have several constraints (in the following, the numbers provided are based on the camera sensor being an 8-bit sensor with 256 intensity levels): [0182] v.sub.bad maximum value for “black” underexposed pixels (e.g. 10). Those pixels for which v<v.sub.bad cannot be guaranteed to increase in value if we increase the exposure time. This will include pixels in a region of shadow, for example. [0183] v.sub.min, the minimum acceptable pixel value (e.g. 50). A pixel with vϵ[v.sub.bad,v.sub.min] will increase its value if the exposure time is increased. [0184] v.sub.max, the maximum acceptable pixel value (e.g. 230). Those pixels for which v>v.sub.max are overexposed and it is not possible to estimate a proper exposure time for them. [0185] σ.sub.max, the maximum measurement uncertainty which gives acceptable 3D quality. We estimate σ for each pixel as follows:

[00021] $σ (p) \sim \frac{\sqrt{A + 2 C}}{A} \sim \frac{\sqrt{v^{+} (p) - v^{-} (p) + 2 v^{-} (p)}}{v^{+} (p) - v^{-} (p)} \sim \frac{\sqrt{v^{+} (p) + v^{-} (p)}}{v^{+} (p) - v^{-} (p)}$

[0186] Given p with σ(p) and time t, we can estimate σ′(p) using time t′=αt as follows:

[00022] $σ^{'} (p) = \frac{σ (p)}{\sqrt{α}}$

[0187] The algorithm proceeds as follows: [0188] Begin with an initial exposure time and acquire an image [0189] Compute a candidate set of next possible exposure times [0190] For each candidate exposure time, compute the expected number of well-exposed pixels [0191] Pick the exposure time with the greatest expected number of well-exposed pixels [0192] Repeat until a condition is met (e.g. one reaches the maximum allowed time)

[0193] A probabilistic model can be used to estimate the number of well-exposed pixels given a set of exposures that has already taken place. Consider an image I(t) taken with an exposure time t. Each pixel p.sub.iϵI(t) may fall into one of the following categories: [0194] Case 1 (impossible): the pixel is properly exposed, but σ(p) is too high. That is:

v.sup.−(p)ϵ(v.sub.min,v.sub.max),v.sup.+(P)ϵ(v.sub.min,v.sub.max),σ(p)>σ.sub.max

[0195] Since it is not possible to obtain proper measurements from pixels in this category by increasing the exposure time, these pixels are to be dropped from consideration. [0196] Case 2 (acceptable quality): the pixel is properly exposed and 3D measurements at this point have acceptable quality. That is:

v.sup.+(p)ϵ(v.sub.min,v.sub.max),σ(p)<α.sub.max

[0197] These pixels can be dropped from further consideration. [0198] Case 3 (overexposed). There are actually two possibilities here:

[0199] (a) The pixel is overexposed even with the projector off; that is v.sup.− (p)>v.sub.max In this case, we can do nothing but decrease the ambient light. We increase the probability with α′.sub.2α for each exposure stop, but only up until α′.sub.2α.

[00023] $P r (σ (p) < σ_{\max} .Math. t > t_{i}) = 0,$ $\Pr (σ (p) < σ_{\max} .Math. t < t_{i}) = \min (α_{2 α} \log_{2} (\frac{t_{i}}{t}), α_{2 α}^{'})$

[0200] (b) The pixel is overexposed with the projector on, but properly exposed with the projector off; that is, v.sup.+(p)>v.sub.max and v.sup.− (p)<v.sub.min. In this case, we should aim to decrease the exposure time. As before:

Pr(σ(p)<σ.sub.max|t>t.sub.i)=0,

[0201] Suppose we can say that p has |v.sup.+(p)−v.sup.−(p)|=R with some probability α.sub.2b if p is properly exposed. This means that if:

[00024] $t^{'} = t_{i} \frac{v_{\max}}{R + v^{-} (p)}$

then v.sup.+(p)=v.sub.max with probability α.sub.2b.

[0202] We can write:

[00025] $β^{'} = \frac{v_{\max}}{R + v^{-} (p)}$ $σ^{'} (p) = \frac{\sqrt{v_{\max} + β^{'} v^{'} (p)}}{v_{\max} - β^{'} v^{-} (p)}$

[0203] Next, we estimate the extent to which we can reduce t′ while still keeping σ(p)<σ.sub.max.

[0204] Let t″=β″t′, then:

[00026] $If β^{″} < {(\frac{σ^{'} (p)}{σ_{\max}})}^{2}, then σ^{″} (p) > σ_{\max}$

[0205] This gives us:

Pr(σ(p)<σ.sub.max|tϵ(β″t.sub.i,β′t.sub.i))=α.sub.2b,

Pr(σ(p)<σ.sub.max|t<β″t.sub.i)=0 [0206] Case 4 (underexposed). There are three possibilities here:

[0207] (a) The pixel is underexposed with the projector on; that is v.sup.+(p)<v.sub.bad. Here, we need to increase the exposure time. Again, we increase the probability with α.sub.3α for each stop, but no more than α′.sub.3α.

[00027] $P (σ (p) < σ_{\max} .Math. t < t_{i}) = 0,$ $\Pr (σ (p) < σ_{\max} .Math. t > t_{i}) = \min (α_{2 α} \log_{2} (\frac{t_{i}}{t}), α_{2 α}^{'})$ $\Pr (σ (p) < σ_{\max} .Math. t < t_{i}) = \min (α_{3 α} \log_{2} (\frac{t}{t_{i}}), α_{3 α}^{'})$

[0208] (b) The pixel is underexposed with the projector on; that is σ(p)>σ.sub.max, but v.sup.+(p)ϵ(v.sub.bad, v.sub.min) We can increase the exposure time to get v.sup.+(p) such that σ(p)<σ.sub.max but also keeping v.sup.+(p)<v.sub.max and estimate the probability as follows:

[00028] $P r (σ (p) < σ_{\max} .Math. t < t_{i}) = 0,$ $\Pr (σ (p) < σ_{\max} .Math. t \in ({(\frac{σ (p)}{σ_{\max}})}^{2} t_{i}, \frac{v_{\max}}{v^{+} (p)} t_{i})) = 1,$ $\Pr (σ (p) < σ_{\max} .Math. t \in ({(\frac{σ (p)}{σ_{\max}})}^{2} t_{i}, \frac{v_{m α x}}{v^{+} (p)} t_{i})) = 1,$ $\Pr (σ (p) < σ_{\max} .Math. t > \frac{v_{m α x}}{v^{+} (p)} t_{i}) = 0$

[0209] (c) The pixel is underexposed with the projector off, that is:

[0210] v.sup.− (p)<v.sub.min, v.sup.+(p)ϵ(v.sub.min, v.sub.max), σ(p)>σ.sub.max. This case is similar to case 4(b) above.

[0211] FIG. 13 sets out a general framework for the above-described online approach with examples of possible algorithms for choosing candidate times and estimated the expected number of pixels with an acceptable value for the quality parameter. We use E={(t.sub.i, I(t.sub.i)} as a set of current exposure times along with images. The algorithm uses ComputeExpectedNum to estimate the expected number of pixels for which σ(p)<σ.sub.max given an exposure time t and previous exposures E. It uses the probabilistic model described above, and can be extended by estimating α.sub.i using machine learning techniques (for example, by using values of neighbourhood pixels with a convolutional neural network CNN). The procedure estimates the probability for each pixel based on which one of the above categories the pixel falls into: [0212] Given some t.sub.i with I(t.sub.i) [0213] Compute candidate times in a log grid with ⅙ step inbetween t.sub.i and 1 step<t.sub.min, >t.sub.max. [0214] For each candidate time compute expected number of good pixels (using the above). [0215] Pick time with the highest expected number of good pixels.

[0216] As before, in the case where the cost is defined as a function of multiple ones of the imaging parameters (and not just the exposure time), the algorithm will run until the sum of costs for the set of exposures exceeds E.sub.cost.sub.max instead of until the sum of exposure times exceeds T.sub.max. Another alternative is that the algorithm terminates once the ratio between expected increase in number of new good pixels (the return value from ComputeExpectedNum) versus the exposure cost of the selected exposure falls below a threshold.

[0217] It will be appreciated that in the above examples, the change in exposure settings for each image acquisition has been limited to the exposure time only, with the understanding that other parameters (aperture size, illumination intensity, neural density filter strength etc.) remain constant across the sequence of image acquisitions. However, as previously discussed, it is possible to infer from the change in exposure time the extent to which other ones of the parameters would need to be altered in order to achieve the same signal to noise; this can be done by considering the respective functions {e, e.sub.2, . . . , e.sub.n} associated with each parameter. For example, having determined a set of exposure times {t.sub.1, t.sub.2, t.sub.n} for a particular acquisition, this can be translated to a set of aperture sizes {a.sub.1, a.sub.2, . . . , a.sub.n} or combinations of both these parameters {{t.sub.1, a.sub.1}, {t.sub.2, a.sub.2} . . . , {t.sub.n, a.sub.n}} by considering the respective functions {e.sub.1, e.sub.2, . . . , e.sub.n} associated with the exposure time and aperture size whilst minimizing their respective costs {f.sub.1, f.sub.2, . . . , f.sub.n}. Accordingly, although the algorithms described herein focus on exposure time, the determination of different exposure times can act as a proxy for determining the settings for other parameters that affect the amount of light incident on the camera.

[0218] In the case of adjusted projector brightness, this will primarily affect the amplitude of the received signal, and to a much less extent the strength of the ambient light. This can easily be incorporated into the “greedy” algorithm described earlier by simply including images captured with different projector brightnesses into the set, e.g. a discrete set of different brightnesses.

[0219] It will further be appreciated that although the specific examples described herein relate to structured illumination systems, the methods described herein can be readily extended to other forms of 3D imaging, by considering how the signal to noise in the final image varies as a function of the received light signal. As an example, for active time-of-flight systems, the following relation exists for the depth noise σ.sub.TOF:

[00029] $σ_{TOF} = \frac{1}{2 \sqrt{2 m}} \frac{c τ_{response}}{\sqrt{N_{p h}}}$

[0220] Where N.sub.ph is the total signal level received (the sum of amplitude A and ambient light C), T.sub.response is the time response of the system, c is the speed of light and m is the number of samples performed. √{square root over (N.sub.ph)} can be replaced by the signal to noise ratio (SNR) when including contribution from dark noise and ambient noise. In this case τ.sub.response will usually be a constant (dictated by the characteristics of the components).

[0221] It will be appreciated that the above algorithms can be easily constrained to working only on relevant region in the image. This region could be specified in 2D (in pixel coordinates) or in 3D (in world XYZ coordinates). Pixels determined to fall outside the specified region-of-interest, either in 2D or 3D, can then be excluded from further consideration of the algorithms.

[0222] In summary, embodiments described herein provide a means for rendering a high SNR 3D image of a scene or object. By carrying out a plurality of image acquisitions with different exposure settings and merging the data from those sets of image data to form a single point cloud in which the signal to noise ratio is maximised for each point, it is possible to accommodate large variations in the amount of light available from different points in the scene. Moreover, embodiments provide a means for determining a set of exposure settings to use in acquiring each image, in a way that will maximise the signal to noise ratio in the final 3D image whilst satisfying one or more constraints on time, depth of focus, illumination power, etc.

[0223] It will be appreciated that implementations of the subject matter and the operations described in this specification can be realized in digital electronic circuitry, or in computer software, firmware, or hardware, including the structures disclosed in this specification and their structural equivalents, or in combinations of one or more of them. Implementations of the subject matter described in this specification can be realized using one or more computer programs, i.e., one or more modules of computer program instructions, encoded on computer storage medium for execution by, or to control the operation of, data processing apparatus. Alternatively or in addition, the program instructions can be encoded on an artificially generated propagated signal, e.g., a machine-generated electrical, optical, or electromagnetic signal that is generated to encode information for transmission to suitable receiver apparatus for execution by a data processing apparatus. A computer storage medium can be, or be included in, a computer-readable storage device, a computer-readable storage substrate, a random or serial access memory array or device, or a combination of one or more of them. Moreover, while a computer storage medium is not a propagated signal, a computer storage medium can be a source or destination of computer program instructions encoded in an artificially generated propagated signal. The computer storage medium can also be, or be included in, one or more separate physical components or media (e.g., multiple CDs, disks, or other storage devices).

[0224] While certain embodiments have been described, these embodiments have been presented by way of example only and are not intended to limit the scope of the invention. Indeed, the novel methods, devices and systems described herein may be embodied in a variety of forms; furthermore, various omissions, substitutions and changes in the form of the methods and systems described herein may be made without departing from the spirit of the invention. The accompanying claims and their equivalents are intended to cover such forms or modifications as would fall within the scope and spirit of the invention.

METHOD FOR DETERMINING ONE OR MORE GROUPS OF EXPOSURE SETTINGS TO USE IN A 3D IMAGE ACQUISITION PROCESS

Inventors

Cpc classification

Classification Explorer

H04N23/72

ELECTRICITY

Classification Explorer

H04N23/743

ELECTRICITY

Classification Explorer

H04N2013/0096

ELECTRICITY

Classification Explorer

H04N13/296

ELECTRICITY

Classification Explorer

H04N13/254

ELECTRICITY

Classification Explorer

H04N23/73

ELECTRICITY

International classification

Classification Explorer

H04N5/235

ELECTRICITY

Classification Explorer

H04N13/254

ELECTRICITY

Classification Explorer

H04N13/296

ELECTRICITY

Abstract

Claims

Description