MACHINE VISION USING DIFFRACTIVE SPECTRAL ENCODING
20230153600 · 2023-05-18
Assignee
Inventors
- Aydogan Ozcan (Los Angeles, CA, US)
- Jingxi Li (Los Angeles, CA, US)
- Deniz Mengu (Los Angeles, CA, US)
- Yair Rivenson (Los Angeles, CA, US)
CPC classification
G06N3/0675
PHYSICS
G02B27/4272
PHYSICS
International classification
Abstract
A machine vision task, machine learning task, and/or classification of objects is performed using a diffractive optical neural network device. Light from objects passes through or reflects off the diffractive optical neural network device formed by multiple substrate layers. The diffractive optical neural network device defines a trained function between an input optical signal from the object light illuminated at a plurality or a continuum of wavelengths and an output optical signal corresponding to one or more unique wavelengths or sets of wavelengths assigned to represent distinct data classes or object types/classes created by optical diffraction and/or reflection through/off the substrate layers. Output light is captured with detector(s) that generate a signal or data that comprise the one or more unique wavelengths or sets of wavelengths assigned to represent distinct data classes or object types or object classes which are used to perform the task or classification.
Claims
1. A system using diffractive spectral encoding for performing one or more of a machine vision task, machine learning task, and/or classification of objects comprising: a diffractive optical neural network device comprising a plurality of optically transmissive and/or reflective substrate layers arranged in an optical path, each of the plurality of optically transmissive/reflective substrate layers comprising a plurality of physical features formed on or within the plurality of optically transmissive and/or reflective substrate layers and having different complex-valued transmission and/or reflection coefficients as a function of lateral coordinates across each substrate layer, wherein the plurality of optically transmissive and/or reflective substrate layers and the plurality of physical features collectively define a trained function between an input optical signal from the objects illuminated at a plurality or a continuum of wavelengths and an output optical signal corresponding to one or more unique wavelengths or sets of wavelengths within the plurality or the continuum of wavelengths assigned to represent distinct data classes or object types or object classes created by optical diffraction and/or reflection through/off the plurality of optically transmissive and/or reflective substrate layers; a light source configured to illuminate the objects and generate the input optical signal; and a detector or set of detectors configured to sense the output optical signal(s) or data from the diffractive optical neural network.
2. The system of claim 1, wherein the light source simultaneously illuminates the objects at a plurality or a continuum of wavelengths.
3. The system of claim 1, wherein the light source sequentially illuminates the objects at a plurality or a continuum of wavelengths.
4. The system of claim 1, wherein the detector or set of detectors generates a time domain signal or data that comprises the output information of the diffractive optical neural network.
5. The system of claim 1, wherein the detector or set of detectors generates a spectral domain signal or data that directly reveals the output information of the diffractive optical neural network.
6. The system of claim 1, further comprising a trained, digital neural network configured to receive a signal or data from the detector or set of detectors as an input and digitally output a reconstructed image of the objects.
7. The system of claim 6, wherein the trained, digital neural network configured to reconstruct images of the input objects is trained using at least one of the following: (i) a structural loss term, (ii) a cross-entropy loss term, (iii) a softmax-cross-entropy loss term, (iv) a diffractive network inference accuracy related penalty term, or (v) combinations of (i)-(iv) with different weights.
8. The system of claim 6, wherein the trained, digital neural network comprises a shallow network having five or fewer hidden layers.
9. The system of claim 1, wherein a signal or data from the detector or set of detectors are post-processed by a computer-executed algorithm or software or dedicated hardware that performs one or more operations of: Fourier transform, addition, subtraction, multiplication, standardization, peak detection, or combinations thereof.
10. The system of claim 1, wherein the detector comprises a single pixel detector.
11. The system of claim 1, wherein the light source comprises a pulsed light source.
12. The system of claim 1, wherein the light source comprises a broadband light source.
13. The system of claim 1, wherein the light source comprises a plurality of discrete frequency/wavelength lines.
14. The system of claim 1, wherein the light source comprises a broadband light and wherein the system further comprises at least one dispersive element that receives light from the broadband light source prior to illumination of the objects.
15. The system of claim 1, further comprising at least one dispersive element interposed between the diffractive optical neural network device and the detector or set of detectors.
16. The system of claim 1, further comprising at least one waveguide interposed between the diffractive optical neural network device and the detector or set of detectors.
17. The system of claim 1, wherein the diffractive optical neural network device, light source, and detector or set of detectors are used in a time domain or spectral domain spectroscopy device.
18. The system of claim 1, wherein the diffractive optical neural network device, light source, and detector or set of detectors are used in an interferometric measurement device.
19. The system of claim 1, wherein the diffractive optical neural network device, light source, and detector or set of detectors are used in an optical coherence tomography (OCT) setup.
20. The system of claim 1, wherein the diffractive optical neural network device, light source, and detector or set of detectors are used in a Fourier-transform infrared spectroscopy (FTIR) measurement system.
21. The system of claim 1, wherein the system performs single-shot spectral encoding of object information.
22. The system of claim 1, wherein the light source emits light within the ultra-violet, visible, infrared, terahertz, millimeter, or radio portion of the electromagnetic spectrum.
23. The system of claim 1, wherein the detector or set of detectors output a signal or data that comprises spectral class scores.
24. The system of claim 1, wherein the detector or set of detectors output a signal or data that comprises a signal encoding and/or feature extraction and/or feature representation scheme.
25. The system of claim 1, wherein the system performs optical signal compression.
26. The system of claim 6, wherein the reconstructed images of the objects are fed back to the diffractive optical neural network device as new inputs so as to improve the inference accuracy of the system.
27. A method of performing a machine vision task, machine learning task, and/or classification of objects using a diffractive optical neural network device, comprising: passing light from the objects through the diffractive optical neural network device comprising a plurality of optically transmissive and/or reflective substrate layers arranged in an optical path, each of the plurality of optically transmissive and/or reflective substrate layers comprising a plurality of physical features formed on or within the plurality of optically transmissive and/or reflective substrate layers and having different complex-valued transmission/reflection coefficients as a function of lateral coordinates across each substrate layer, wherein the plurality of optically transmissive and/or reflective substrate layers and the plurality of physical features collectively define a trained function between an input optical signal from the objects illuminated at a plurality or a continuum of wavelengths and an output optical signal corresponding to one or more unique wavelengths or sets of wavelengths within the plurality or the continuum of wavelengths assigned to represent distinct data classes or object types or object classes created by optical diffraction and/or reflection through/off the plurality of optically transmissive and/or reflective substrate layers; capturing light from the diffractive optical neural network device with a detector or a set of detectors that generate a signal or data that comprise the one or more unique wavelengths or sets of wavelengths within the plurality or the continuum of wavelengths assigned to represent distinct data classes or object types or object classes; and performing the machine vision task, machine learning task, and/or classification of objects based on the signal or data generated by the detector or set of detectors.
28. The method of claim 27, further comprising: inputting the signal or data generated by the detector or set of detectors to a trained, digital neural network and digitally outputting a reconstructed image of the objects.
29. The method of claim 28, wherein the trained, digital neural network comprises a shallow network having five or fewer hidden layers.
30. The method of claim 27, wherein the detector comprises a single pixel detector.
31. The method of claim 27, wherein the light from the object originates from a pulsed light source.
32. The method of claim 27, wherein the light from the object originates from a broadband light source.
33. The method of claim 27, wherein the light from the object originates from a light source that emits light within the ultra-violet, visible, infrared, terahertz, millimeter, or radio portion of the electromagnetic spectrum.
34. The method of claim 27, wherein the signal or data generated by the detector or set of detectors comprises spectral class scores.
35. The method of claim 27, wherein a light source simultaneously illuminates the objects at a plurality or a continuum of wavelengths.
36. The method of claim 27, wherein a light source sequentially illuminates the objects at a plurality or a continuum of wavelengths.
37. The method of claim 27, wherein the detector or set of detectors generates a time domain signal or data that comprises output information of the diffractive optical neural network.
38. The method of claim 28, wherein the trained, digital neural network configured to reconstruct images of the input objects is trained using at least one of the following: (i) a structural loss term, (ii) a cross-entropy loss term, (iii) a softmax-cross-entropy loss term, (iv) a diffractive network inference accuracy related penalty term, or (v) combinations of (i)-(iv) with different weights.
39. The method of claim 33, wherein the light source emits light at a plurality of discrete frequencies or wavelengths.
40. The method of claim 32, wherein the system further comprises at least one dispersive element that receives light from the broadband light source prior to illumination of the objects.
41. The method of claim 27, further comprising at least one dispersive element interposed between the diffractive optical neural network device and the detector or set of detectors.
42. The method of claim 27, further comprising at least one waveguide interposed between the diffractive optical neural network device and the detector or set of detectors.
43. The method of claim 27, wherein the diffractive optical neural network device, light source, and detector or set of detectors are used in a time domain or spectral domain spectroscopy device.
44. The method of claim 27, wherein the diffractive optical neural network device, light source, and detector or set of detectors are used in an interferometric measurement device.
45. The method of claim 27, wherein the diffractive optical neural network device, light source, and detector or set of detectors are used in an optical coherence tomography (OCT) setup.
46. The method of claim 27, wherein the diffractive optical neural network device, light source, and detector or set of detectors are used in a Fourier-transform infrared spectroscopy (FTIR) measurement system.
47. The method of claim 27, wherein the diffractive optical neural network device performs single-shot spectral encoding of object information.
48. The method of claim 27, wherein the detector or set of detectors output a signal or data that comprises an optical signal encoding and/or feature extraction and/or feature representation scheme.
49. The method of claim 27, wherein the diffractive optical neural network device performs optical signal compression.
50. The method of claim 28, wherein the reconstructed images of the objects are fed back to the diffractive optical neural network device as new inputs so as to improve the inference accuracy of the machine vision task, machine learning task, and/or classification of objects.
51. A system using diffractive spectral encoding of an acoustic signal for performing a machine vision task, machine learning task, and/or classification of objects comprising: a diffractive acoustic neural network device comprising a plurality of acoustically transmissive and/or reflective substrate layers arranged in a path, each of the plurality of acoustically transmissive and/or reflective substrate layers comprising a plurality of physical features formed on or within the plurality of acoustically transmissive and/or reflective substrate layers and having different transmission/reflection coefficients as a function of lateral coordinates across each substrate layer, wherein the plurality of acoustically transmissive and/or reflective substrate layers and the plurality of physical features collectively define a trained function between an input acoustic signal from the object exposed to a plurality or a continuum of frequencies and an output acoustic signal corresponding to one or more unique frequencies or sets of frequencies within the plurality or the continuum of frequencies assigned to represent distinct data classes or object types or object classes created by acoustic diffraction and/or reflection through/off the plurality of acoustically transmissive and/or reflective substrate layers; an acoustic source configured to expose the objects along the path; and a detector or set of detectors configured to sense the output acoustic signal from the diffractive acoustic neural network.
52. (canceled)
Description
BRIEF DESCRIPTION OF THE DRAWINGS
DETAILED DESCRIPTION OF ILLUSTRATED EMBODIMENTS
[0040] The light source 6 may also include an artificial light source such as a laser, light bulb, light emitting diode(s) (LED), laser diode(s), and the like. In some instances, the light source 6 may be filtered prior to illuminating the object 4. The light source 6 that illuminates the object 4 may include visible light (e.g., light with a wavelength in the range of about 380 nm to about 740 nm) as well as light outside the perception range of humans. For example, the wavelength operating range may extend beyond the visible perception range of humans (e.g., from about 300 nm to about 1,000 nm). The light source 6 may also emit light within the ultra-violet, visible, infrared, terahertz, millimeter, or radio portion of the electromagnetic spectrum.
[0041] Illumination of the object 4 by the light source may transmit through the object 4, reflect off the object 4, or combinations thereof.
[0042] The light from the object 4 enters a diffractive optical neural network device 10. The diffractive optical neural network devices 10 described herein may be used for machine learning, classification, and/or processing (separately or combinations thereof) of at least one optical image, optical signal, or optical data (e.g., optically encoded data). As seen in
[0043] The diffractive optical neural network device 10 contains a plurality of optically transmissive and/or reflective substrate layers 16 arranged in one or more optical paths. The substrate layers 16 are formed as a physical substrate or matrix of optically transmissive material (for transmission mode such as illustrated in
[0045] Each substrate layer 16 of the diffractive optical neural network device 10 has a plurality of physical features 20 (
[0046] The plurality of substrate layers 16 that are arranged along the optical path 18 collectively define a trained mapping function between an input optical signal 12 from the object 4 to the plurality of substrate layers 16 and an output optical signal 30 that is created by optical diffraction through the plurality of substrate layers 16 (or reflection from the substrate layers 16). The pattern of physical locations formed by the physical features 20 may define, in some embodiments, an array located across the surface of the substrate layer 16. Additional details regarding the substrate layers 16 and physical features 20 that are formed thereon may be found in International Patent Application Publication No. WO 2019/200289, which is incorporated herein by reference.
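The angular-spectrum treatment behind such a trained diffractive mapping can be sketched numerically: each substrate layer 16 acts as a complex-valued transmission mask, with free-space propagation between layers. The following toy model is illustrative only; the layer count matches the 3-layer example discussed later, but the grid size, feature pitch, wavelength, layer spacing, and the random phase masks (standing in for trained physical features 20) are all assumed values, not parameters from this disclosure.

```python
import numpy as np

def angular_spectrum_propagate(field, wavelength, dx, z):
    """Propagate a 2-D complex field a distance z via the angular-spectrum method."""
    n = field.shape[0]
    fx = np.fft.fftfreq(n, d=dx)
    FX, FY = np.meshgrid(fx, fx)
    # Longitudinal spatial frequency; evanescent components are clipped to zero.
    arg = np.maximum(0.0, 1.0 / wavelength**2 - FX**2 - FY**2)
    transfer = np.exp(2j * np.pi * z * np.sqrt(arg))
    return np.fft.ifft2(np.fft.fft2(field) * transfer)

def diffractive_forward(input_field, layers, wavelength, dx, dz):
    """Pass a field through a stack of complex transmission masks (the
    'physical features' of each substrate layer) separated by distance dz."""
    field = input_field
    for mask in layers:
        field = angular_spectrum_propagate(field, wavelength, dx, dz)
        field = field * mask  # complex-valued transmission coefficients
    return angular_spectrum_propagate(field, wavelength, dx, dz)

# Illustrative 3-layer device with 64x64 features per layer (assumed sizes).
rng = np.random.default_rng(0)
N, wavelength, dx, dz = 64, 1.0e-3, 0.5e-3, 40e-3  # toy THz-range values, in meters
layers = [np.exp(1j * rng.uniform(0, 2 * np.pi, (N, N))) for _ in range(3)]
obj = np.zeros((N, N), dtype=complex)
obj[24:40, 24:40] = 1.0  # simple "object" aperture
out = diffractive_forward(obj, layers, wavelength, dx, dz)
print(out.shape)  # (64, 64)
```

In a trained device the layer phase values would be optimized, per wavelength, so that the output intensity at the detector concentrates power at the class-specific wavelength.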
[0047] As seen in
[0048] The light or optical radiation that forms the input optical signal 12 is directed through the substrate layers 16 of the diffractive optical network device 10 along an optical path 18 (or in other embodiments along multiple optical paths 18 such as seen in
[0049] In some embodiments, the input optical signal 12 may originate from one or more objects 4 that are illuminated by a light source 6 (e.g., an artificial light source or natural lighting such as sunlight). In still other embodiments, the object 4 may emit fluorescent light or emissive radiation in response to the light from the source of light 6. For example, the source of light 6 may act as an excitation light source and the diffractive optical network device 10 receives fluorescent light that is emitted from the object 4.
[0050] The output optical signal 30 is captured by a detector 32 or set of detectors 32. As seen in
[0051] The plurality of substrate layers 16 arranged along the optical path(s) 18 collectively define a trained function between the input optical signal 12 from the object 4 illuminated at a plurality or a continuum of wavelengths (i.e., the broadband light source) and the output optical signal(s) 30 corresponding to one or more unique wavelengths or sets of wavelengths within the plurality or the continuum of wavelengths assigned to represent distinct data classes, object types, or object classes created by optical diffraction and/or reflection through/off the plurality of optically transmissive/reflective substrate layers 16. The output optical signal(s) 30 from the diffractive optical neural network device 10 may comprise an optical signal that contains or encodes extracted features or feature representation scheme of the object(s) 4.
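As an illustration of this wavelength-to-class assignment, the sketch below reads a spectral class score vector s off a toy detector spectrum and infers the class via max(s). The ten class wavelengths follow the 1.0-1.45 mm example given later in the text; the detector spectrum itself is synthetic and chosen only to make the read-out concrete.

```python
import numpy as np

# Ten class-specific wavelengths uniformly spaced between 1.00 mm and 1.45 mm,
# as in the handwritten-digit example discussed in the text.
class_wavelengths_mm = np.linspace(1.0, 1.45, 10)

def spectral_class_scores(spectrum_mm, power):
    """Read the class-score vector s: detected power at each of the ten
    class-specific wavelengths (nearest spectral sample)."""
    idx = [np.argmin(np.abs(spectrum_mm - w)) for w in class_wavelengths_mm]
    return power[idx]

# Toy detector spectrum with a peak at the wavelength assigned to class 7.
spectrum_mm = np.linspace(0.9, 1.6, 701)
power = 0.05 + 0.9 * np.exp(-((spectrum_mm - class_wavelengths_mm[7]) / 0.01) ** 2)
s = spectral_class_scores(spectrum_mm, power)
predicted_class = int(np.argmax(s))  # optical inference via max(s)
print(predicted_class)  # 7
```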
[0052] With reference to
[0053] The trained neural network 110 may be trained using at least one of the following: (i) a structural loss term, (ii) a cross-entropy loss term, (iii) a softmax-cross-entropy loss term, (iv) a diffractive network inference accuracy related penalty term, or (v) combinations of (i)-(iv) with different weights. An algorithm or software program 102 executed by the computing device 100 (or other dedicated hardware) may also be used to perform various post-processing operations on the output signals or data from the detector 32. These include, by way of illustration, one or more operations of: Fourier transform, addition, subtraction, multiplication, standardization, peak detection, or combinations thereof. As explained herein, in some embodiments, the reconstructed images 120 are fed back to the same diffractive optical neural network device 10 as new inputs to improve the inference accuracy of the same. This operation is illustrated by dashed arrows B in
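A minimal sketch of two of the listed post-processing operations, Fourier transform and peak detection, applied to a hypothetical single-pixel time-domain trace. The sampling rate and tone frequencies below are arbitrary toy values, not parameters from this disclosure.

```python
import numpy as np

def time_to_spectrum(signal, dt):
    """Fourier-transform a time-domain detector trace into a power spectrum,
    one of the post-processing steps listed in the text."""
    spectrum = np.abs(np.fft.rfft(signal)) ** 2
    freqs = np.fft.rfftfreq(len(signal), d=dt)
    return freqs, spectrum

def standardize(x):
    """Standardization: zero mean, unit variance."""
    return (x - x.mean()) / x.std()

# Toy trace with two tones; peak detection should find the stronger 50 Hz tone.
dt = 1e-3                      # 1 kHz sampling, 1 s record
t = np.arange(0, 1.0, dt)
trace = 2.0 * np.sin(2 * np.pi * 50 * t) + 0.5 * np.sin(2 * np.pi * 120 * t)
freqs, spec = time_to_spectrum(standardize(trace), dt)
peak_hz = freqs[np.argmax(spec)]
print(peak_hz)  # 50.0
```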
[0054] The computing device 100 may have or be connected to a display 130 that can be used to display results of the machine vision task, machine learning task, and/or classification of objects 4. For example,
[0055] In some embodiments, the light from the light source 6 may pass through one or more dispersive elements (e.g., lens or set of lenses) prior to impinging on the substrate layers 16 of the diffractive optical neural network device 10. One or more dispersive elements may also be interposed between the output of the diffractive optical neural network device 10 and the detector 32. One or more waveguides may optionally be used to guide light from the diffractive optical neural network device 10 prior to arrival at the detector 32. These may include, by way of examples, optical fibers or the like.
[0057] The plurality of acoustically transmissive and/or reflective substrate layers 210 and the plurality of physical features collectively define a trained function between an input acoustic signal 214 from the object 4 exposed to a plurality or a continuum of frequencies and an output acoustic signal 216 corresponding to one or more unique frequencies or sets of frequencies within the plurality or the continuum of frequencies assigned to represent distinct data classes or object types or object classes created by acoustic diffraction and/or reflection through/off the plurality of acoustically transmissive and/or reflective substrate layers 210.
[0058] In one embodiment, an acoustic source 220 is provided and configured to expose the object 4 and generate the input acoustic signal 214. Apertures (not shown) similar to apertures 8, 28 may be used at the front/back end of the diffractive acoustic neural network device 200 similar to the light embodiments of
Experimental Results
[0061] Based on the system architecture shown in the figures, the training loss function (L.sub.D) for a given diffractive network design can be written as:

L.sub.D=L.sub.I+α·L.sub.E+β·L.sub.P  (1),

[0062] where L.sub.I and L.sub.E refer to the loss terms related to the optical inference task (e.g., object classification) and the diffractive power efficiency at the output detector 32, respectively (see Materials and Methods section for details). The spatial purity loss, L.sub.P, on the other hand, has the rather unique aim of clearing the light intensity over a small region of interest surrounding the active area of the single-pixel detector 32, improving the robustness of the machine vision system against uncontrolled lateral displacements of the detector 32 position with respect to the optical axis (see Materials and Methods for detailed definitions of L.sub.I, L.sub.E and L.sub.P). The hyperparameters α and β control the balance between the three major design factors represented by these training loss terms.
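The composition of the three training loss terms weighted by α and β can be sketched as follows. The specific forms assumed below for L_E (one minus the detector power fraction) and L_P (guard-region-to-detector intensity ratio) are plausible stand-ins chosen for illustration; the text defers their exact definitions to its Materials and Methods section, and all array values are toy data.

```python
import numpy as np

def softmax_cross_entropy(scores, label):
    """Inference loss L_I on the spectral class scores (softmax cross-entropy)."""
    z = scores - scores.max()
    log_probs = z - np.log(np.exp(z).sum())
    return -log_probs[label]

def efficiency_loss(detector_power, total_power):
    """L_E (assumed form): penalize low diffractive power efficiency."""
    return 1.0 - detector_power / total_power

def purity_loss(intensity, detector_mask, guard_mask):
    """L_P (assumed form): penalize light in a guard region around the detector."""
    return intensity[guard_mask].sum() / (intensity[detector_mask].sum() + 1e-12)

def total_loss(scores, label, intensity, det_mask, guard_mask, alpha, beta):
    """Eq. (1): L_D = L_I + alpha * L_E + beta * L_P."""
    L_I = softmax_cross_entropy(scores, label)
    L_E = efficiency_loss(intensity[det_mask].sum(), intensity.sum())
    L_P = purity_loss(intensity, det_mask, guard_mask)
    return L_I + alpha * L_E + beta * L_P

# Toy output plane and class scores, with alpha=0.03, beta=0.1 (values quoted in the text).
rng = np.random.default_rng(1)
intensity = rng.uniform(0, 1, (32, 32))
det_mask = np.zeros((32, 32), bool); det_mask[14:18, 14:18] = True
guard_mask = np.zeros((32, 32), bool); guard_mask[10:22, 10:22] = True
guard_mask &= ~det_mask
scores = rng.uniform(0, 1, 10)
L = total_loss(scores, 3, intensity, det_mask, guard_mask, alpha=0.03, beta=0.1)
print(L > 0)  # True
```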
[0063] To exemplify the performance of this design framework as a function of different parameters, with ten class-specific wavelengths uniformly distributed between λ.sub.min=1.0 mm and λ.sub.max=1.45 mm, a 3-layer diffractive optical neural network device 10 with α=β=0 can achieve >96% blind testing accuracy for spectrally encoded optical classification of handwritten digits (see Table 1, 4.sup.th row). Fine tuning of the hyperparameters α and β yields broadband diffractive optical neural network device 10 designs that provide improved diffractive power efficiency at the single-pixel detector 32 and partial insensitivity to misalignments without excessively sacrificing inference accuracy. For example, using α=0.03 and β=0.1, 95.05% blind testing accuracy was obtained for spectrally encoded optical classification of handwritten digits, an inference accuracy drop of ˜1% compared to the diffractive model trained with α=β=0, while at the same time achieving ˜8 times higher diffractive power efficiency at the output detector 32 (see Table 1).
[0064] Next, the substrate layers 16 shown in
[0065] For the same 3D-printed diffractive optical neural network device 10 (
[0066] In addition to the diffractive optical neural network device 10 shown in
[0067] To provide a mitigation strategy for this trade-off, a collaboration framework was introduced between the diffractive optical neural network device 10 and its corresponding trained neural network 110 (for image reconstruction). This collaboration is based on the fact that the decoder trained neural network 110 can faithfully reconstruct the images 120 of the input objects 4 using the spectral encoding present in s, even if the optical classification is incorrect, pointing to a wrong class through max(s). It was observed that, by feeding the reconstructed images 120 from the trained neural network 110 back to the diffractive optical neural network device 10 as new inputs, the device 10 can correct its initial incorrect inference (see
[0068] In this collaboration between the diffractive optical neural network model/device 10 and its corresponding shallow, trained neural network 110, the training loss function of the latter was coupled to the classification performance of the diffractive optical neural network model/device 10. In other words, in addition to a structural loss function (L.sub.S) that is needed for a high-fidelity image reconstruction, a second loss term was added that penalized the neural network 110 by a certain weight if its reconstructed image 120 cannot be correctly classified by the diffractive optical neural network model/device 10 (see the Materials and Methods section). This ensures that the collaboration between the optical encoder and its corresponding decoder (i.e., trained neural network 110) is constructive, i.e., the overall classification accuracy is improved through the feedback of the reconstructed images 120 onto the diffractive optical neural network model/device 10 as new inputs. Based on this collaboration scheme, the general loss function of the decoder trained neural network 110 can be expressed as:
L.sub.Recon=γ·L.sub.S(O.sub.recon, O.sub.input)+(1−γ)·L.sub.I  (2),

[0069] where L.sub.S refers to the structural loss, e.g., Mean Absolute Error (MAE) or reversed Huber ("BerHu") loss, which is computed through a pixel-wise comparison of the reconstructed image (O.sub.recon) with the ground truth object image (O.sub.input) (see Materials and Methods section for details). The second term in Eq. (2), L.sub.I, refers to the same loss function used in the training of the diffractive optical neural network model/device 10 (front-end) as in Eq. (1), except this time it is computed over the new class scores, s′, obtained by feeding the reconstructed image, O.sub.recon, back to the same diffractive optical neural network model/device 10 (see

[0070] Table 1 summarizes the performance comparison of different loss functions employed to train the decoder trained neural network 110 and their impact on the improvement of the classification performance of the diffractive optical neural network device 10. Compared to the case when γ=1, which refers to independent training of the reconstruction trained neural network 110 without taking into account L.sub.I, one sees significant improvements in the inference accuracy of the diffractive optical neural network model through max(s′) when the trained neural network 110 has been penalized during its training (with, e.g., γ=0.95) if its reconstructed images 120 cannot be correctly classified by the diffractive optical neural network model (refer to the Materials and Methods section for further details). Stated differently, the use of the L.sub.I term in Eq. (2) for the training of the decoder trained neural network 110 tailors the image reconstruction space to generate object features that are more favorable for the diffractive optical classification, while also retaining its reconstruction fidelity to the ground truth object, O.sub.input, courtesy of the structural loss term, L.sub.S, in Eq. (2).
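Eq. (2) can be sketched directly. The MAE and BerHu structural losses below follow their standard definitions; the BerHu threshold c, the stand-in L_I value, and the toy image data are illustrative assumptions, while γ=0.95 is a value quoted in the text.

```python
import numpy as np

def mae(recon, target):
    """Structural loss L_S: mean absolute error between images."""
    return np.abs(recon - target).mean()

def berhu(recon, target, c=0.2):
    """Reversed Huber ('BerHu') loss: L1 below threshold c, quadratic above.
    The threshold c is an assumed illustrative value."""
    err = np.abs(recon - target)
    quad = (err**2 + c**2) / (2 * c)
    return np.where(err <= c, err, quad).mean()

def decoder_loss(recon, target, L_I, gamma=0.95, structural=mae):
    """Eq. (2): L_Recon = gamma * L_S(O_recon, O_input) + (1 - gamma) * L_I."""
    return gamma * structural(recon, target) + (1 - gamma) * L_I

rng = np.random.default_rng(2)
target = rng.uniform(0, 1, (28, 28))                # ground-truth object image
recon = target + 0.05 * rng.normal(size=(28, 28))   # imperfect reconstruction
L = decoder_loss(recon, target, L_I=1.2, gamma=0.95)
print(L > 0)  # True
```

A perfect reconstruction with a correctly classified feedback image drives both terms, and hence the total loss, to zero.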
TABLE 1

Diffractive optical network | Diffractive power efficiency at the output detector, η (%) | Testing accuracy, max(s) (%) | Testing accuracy, max(s′) (%)
10 wavelengths, α = 0.4, β = 0.2 (FIGS. 7A-7B); s = [s.sub.0, s.sub.1, . . . , s.sub.9] | 0.964 ± 0.466 | 84.02 | MAE: 84.03; MAE + SCE: 91.29; BerHu + SCE: 91.06
10 wavelengths, α = 0.08, β = 0.2 (FIGS. 11A-11B); s = [s.sub.0, s.sub.1, . . . , s.sub.9] | 0.124 ± 0.062 | 93.28 | MAE: 91.31; MAE + SCE: 94.27; BerHu + SCE: 94.02
10 wavelengths, α = 0.03, β = 0.1 (FIGS. 5A-5C, 10A-10B); s = [s.sub.0, s.sub.1, . . . , s.sub.9] | 0.047 ± 0.026 | 95.05 | MAE: 93.40; MAE + SCE: 95.32; BerHu + SCE: 95.37
10 wavelengths, α = β = 0 (FIGS. 12A-12B); s = [s.sub.0, s.sub.1, . . . , s.sub.9] | 0.006 ± 0.004 | 96.07 | MAE: 94.58; MAE + SCE: 96.26; BerHu + SCE: 96.30
20 wavelengths (Differential), α = β = 0 (FIGS. 15A-15B); s.sub.D = [s.sub.0+, s.sub.0−, s.sub.1+, s.sub.1−, . . . , s.sub.9+, s.sub.9−]; s = Δs = [Δs.sub.0, Δs.sub.1, . . . , Δs.sub.9] | 0.004 ± 0.002 | 96.82 | MAE: 90.15; MAE + SCE: 96.81; BerHu + SCE: 96.64
[0071] Table 1. Numerical blind testing accuracies of different diffractive networks and their integration with decoder image reconstruction ANNs. The diffractive optical networks presented in the first 3 rows were trained with different (α, β) pairs for experimental validation, resulting in different diffractive power efficiencies at the output detector, while the model in the 4.sup.th row was trained with α=β=0. The mean diffractive power efficiencies (η) of the diffractive network models were calculated at the output detector, considering the whole testing dataset, and are reported with the corresponding standard deviations (see Materials and Methods section for details).
[0072] Discussion
[0073] Even though Eq. (1) tries to strike a balance among the optical inference accuracy, detector photon efficiency, and resilience to possible detector 32 misalignments, there are other sources of experimental error that affect the physical implementations of diffractive optical neural networks 10. First, due to the multi-layer layout of these networks 10, any inter-layer misalignments might have contributed to some of the errors observed during the experiments. In addition, the optical forward model does not take into account multiple reflections that occur through the diffractive layers 16. These are relatively weak effects that can be easily mitigated by, e.g., time-gating of the detector 32 output and/or using anti-reflection coatings that are widely employed in the fabrication of conventional optical components. Moreover, any measurement errors that might have occurred during the characterization of the dispersion of the diffractive-layer material can cause the numerical models to deviate slightly from their physical implementations (i.e., the fabricated diffractive optical neural network device 10). Finally, 3D fabrication errors stemming from printing overflow and crosstalk between diffractive features on the substrate layers 16 can also contribute to some of the differences observed between the numerical and experimental results.
[0074] In addition to the possible physical implementation-related improvements discussed above, the performance of the presented spectral encoding-based machine vision framework can be further improved using a differential class encoding strategy. The use of two different wavelengths was explored to encode each class score: instead of using 10 discrete wavelengths to represent a spectral class score vector, s=[s.sub.0, s.sub.1, . . . , s.sub.9], the spatial information of an object 4 was encoded into 20 different wavelengths (s.sub.0+, s.sub.0−, s.sub.1+, s.sub.1−, . . . , s.sub.9+, s.sub.9−) that are paired in groups of two in order to differentially represent each spectral class score, i.e., Δs.sub.i=s.sub.i+−s.sub.i−. In this differential spectral encoding strategy, the diffractive network makes an inference based on max(Δs) resulting from the spectral output at the single-pixel detector 32. With this spectrally encoded differential classification scheme, 96.82% optical classification accuracy was attained for handwritten digits (see Table 1 and
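The differential read-out amounts to a pairwise subtraction over the 20 wavelength scores followed by max(Δs). The sketch below uses toy score values purely to make the indexing concrete.

```python
import numpy as np

def differential_scores(s_pairs):
    """Collapse 20 wavelength scores [s0+, s0-, s1+, s1-, ..., s9+, s9-]
    into the differential class-score vector with Δs_i = s_i+ − s_i−."""
    s_pairs = np.asarray(s_pairs)
    return s_pairs[0::2] - s_pairs[1::2]

# Toy spectral readout: class 4 has a large positive/negative contrast.
s_pairs = np.full(20, 0.3)
s_pairs[2 * 4] = 0.9      # s_4+
s_pairs[2 * 4 + 1] = 0.1  # s_4-
delta_s = differential_scores(s_pairs)
predicted = int(np.argmax(delta_s))  # inference via max(Δs)
print(predicted)  # 4
```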
[0075] As an alternative to the shallow decoder trained neural network 110 with 2-hidden layers, the use of a much deeper convolutional architecture was investigated as the image reconstruction network 110 in the spectrally-encoded machine vision framework. For this, the output of the 2-hidden layer fully-connected network (with an input of s) is further processed by a U-Net-like deep convolutional neural network 110 with skip connections and a total of >1.4M trainable parameters in order to reconstruct the images 120 of handwritten digits using s. It was found that the collaboration of the diffractive networks with this deeper, trained neural network 110 architecture yielded only marginal improvements over the classification accuracies presented in Table 1. For instance, when the diffractive optical network design shown in
[0076] The function of the decoder trained neural network 110, up to this point, has been to reconstruct the images 120 of the unknown input objects 4 based on the encoding present in the spectral class scores, s=[s.sub.0, s.sub.1, . . . , s.sub.9], which also helped to improve the classification accuracy of the diffractive optical neural network device 10 by feeding these reconstructed images 120 back to it. As an alternative strategy, the decoder trained neural network 110 was investigated for a different task: to directly classify the objects 4 based on the spectral encoding (s) provided by the diffractive optical neural network device 10. In this case, the decoder trained neural network 110 is solely focused on improving the classification performance with respect to the optical inference results that are achieved using max(s). For example, based on the spectral class scores encoded by the diffractive optical neural network models/devices 10 that achieved 95.05% and 96.07% accuracy for handwritten digit classification using max(s), a fully-connected, shallow classification trained neural network 110 with 2-hidden layers improved the blind testing accuracy to 95.74% and 96.50%, respectively. Compared to the accuracies presented in Table 1, these numbers indicate that a slightly better classification performance is possible, provided that the image reconstruction is not essential for the target application, and can be replaced with a classification decoder trained neural network 110 that takes s as its input.
[0077] In the earlier systems 2 that have been described above, the diffractive optical neural network model and the corresponding back-end electronic trained neural network 110 or ANN have been separately trained, i.e., after the training of the diffractive optical neural network model for optical image classification, the back-end trained neural network 110 was trained based on the spectral encoding of the converged diffractive network model, yielding either the reconstruction trained neural network 110 or the classification trained neural network 110, as discussed earlier. As an alternative strategy, such hybrid systems can also be jointly-trained, through the error backpropagation between the electronic trained neural network 110 and the diffractive optical front-end.
[0078] This was demonstrated using the MNIST dataset by jointly training a diffractive network with an image reconstruction trained neural network 110 at the back-end. The same approach may also be extended to jointly-train a diffractive network with a classification trained neural network 110 at the back-end, covering a different dataset (EMNIST). In the joint-training of hybrid network systems composed of a diffractive optical neural network model (for ultimate use as a device 10) and a reconstruction trained neural network 110, a linear superposition of two different loss functions was used to optimize both the optical classification accuracy and the image reconstruction fidelity: see Eq. 24 and Table 3.
[0079] Through this linear superposition, the impact of different relative weights of these loss functions was explored on (1) the image classification accuracy of the diffractive optical neural network, and (2) the quality of the image reconstruction performed by the back-end trained neural network 110. For this goal, the relative weight (ξ) of the optical classification loss term was varied in order to shift the attention of the hybrid design between these two tasks. For instance, when the weight of the optical classification loss is set to be zero (ξ=0), the entire hybrid system becomes a computational single-pixel imager that ignores the optical classification accuracy and focuses solely on the image reconstruction quality; as confirmed in
[0080] The inference performance of these hybrid systems was also investigated in terms of the number of wavelengths that are simultaneously processed through the diffractive network. For this, hybrid systems were jointly trained that assign a group of wavelengths to each data class: inference of an object class is then based on the maximum average power accumulated in these selected spectral bands, where each band represents one data class. The results, summarized in Table 3, reveal that assigning e.g., 5 distinct wavelengths to each data class (i.e., a total of 50 wavelengths for 10 data classes), achieved a similar optical classification accuracy, compared to their counterparts that encoded the objects' spatial information using fewer wavelengths. This indicates that the diffractive optical neural network devices 10 can be designed to simultaneously process a larger number of wavelengths to successfully encode the spatial information of the input FOV into spectral features.
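The band-assignment inference described above can be sketched as follows. This is an illustrative sketch, not the patented implementation: it assumes 10 classes with 5 wavelengths per class and that each class score is the average power in its assigned spectral band, with the inferred class given by the maximum band average; all function and variable names are hypothetical.

```python
import numpy as np

C, W = 10, 5  # classes, wavelengths assigned per class (50 bands total)

def band_averaged_inference(power_spectrum):
    """power_spectrum: array of C*W detector powers, ordered by class band."""
    s = power_spectrum.reshape(C, W).mean(axis=1)  # per-class band average
    return s, int(np.argmax(s))                    # class = max average power

rng = np.random.default_rng(0)
spectrum = rng.random(C * W)
spectrum[3 * W:4 * W] += 1.0  # boost the band assigned to class 3
s, predicted = band_averaged_inference(spectrum)
```

With the boosted band, the inference returns class 3, illustrating how a group of wavelengths, rather than a single wavelength, can represent one data class.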
[0081] To further explore the capabilities of the system 2 for more challenging image classification tasks beyond handwritten digits, the EMNIST dataset was used, containing 26 object classes, corresponding to handwritten capital letters (see
TABLE-US-00002
TABLE 2
                                                          Testing accuracy
Diffractive network                                       max(s) or max(s′) (%)
26 wavelengths                                            84.05
  s = [s.sub.0, s.sub.1, . . . , s.sub.25] (FIG. 20A)
52 wavelengths (differential)                             86.78
  s.sub.D = [s.sub.0+, s.sub.0−, s.sub.1+, s.sub.1−, . . . , s.sub.25+, s.sub.25−]
  s = Δs = [Δs.sub.0, Δs.sub.1, . . . , Δs.sub.25] (FIG. 20B)
26 wavelengths (jointly-trained with ANN)                 85.60
  s = [s.sub.0, s.sub.1, . . . , s.sub.25] (FIG. 20C)
52 wavelengths (differential, jointly-trained with ANN)   87.68
  s.sub.D = [s.sub.0+, s.sub.0−, s.sub.1+, s.sub.1−, . . . , s.sub.25+, s.sub.25−]
  s = Δs = [Δs.sub.0, Δs.sub.1, . . . , Δs.sub.25] (FIG. 20D)
[0082] Table 2: Blind testing accuracies for EMNIST handwritten capital letter classification. Also see
TABLE-US-00003
TABLE 3
                                                ξ          Blind testing
Diffractive network                             (Eq. 24)   accuracy max(s) (%)
10 wavelengths                                  0.0        10.72
  s = [s.sub.0, s.sub.1, . . . , s.sub.9] (FIG. 18A)
10 wavelengths                                  0.25       94.94
  s = [s.sub.0, s.sub.1, . . . , s.sub.9] (FIG. 18B)
10 wavelengths                                  0.5        95.66
  s = [s.sub.0, s.sub.1, . . . , s.sub.9] (FIG. 18C)
10 wavelengths                                  1.0        96.01
  s = [s.sub.0, s.sub.1, . . . , s.sub.9] (FIG. 18D)
20 wavelengths (differential)                   0.0        8.88
  s.sub.D = [s.sub.0+, s.sub.0−, s.sub.1+, s.sub.1−, . . . , s.sub.9+, s.sub.9−]
  s = Δs = [Δs.sub.0, Δs.sub.1, . . . , Δs.sub.9] (FIG. 19A)
20 wavelengths (differential)                   0.25       95.17
  s.sub.D = [s.sub.0+, s.sub.0−, s.sub.1+, s.sub.1−, . . . , s.sub.9+, s.sub.9−]
  s = Δs = [Δs.sub.0, Δs.sub.1, . . . , Δs.sub.9] (FIG. 19B)
20 wavelengths (differential)                   0.5        95.83
  s.sub.D = [s.sub.0+, s.sub.0−, s.sub.1+, s.sub.1−, . . . , s.sub.9+, s.sub.9−]
  s = Δs = [Δs.sub.0, Δs.sub.1, . . . , Δs.sub.9] (FIG. 19C)
20 wavelengths (differential)                   1.0        96.04
  s.sub.D = [s.sub.0+, s.sub.0−, s.sub.1+, s.sub.1−, . . . , s.sub.9+, s.sub.9−]
  s = Δs = [Δs.sub.0, Δs.sub.1, . . . , Δs.sub.9] (FIG. 19D)
50 wavelengths (averaging)                      0.5        95.86
  s.sub.D = [s.sub.0.sup.1, s.sub.0.sup.2, s.sub.0.sup.3, s.sub.0.sup.4, s.sub.0.sup.5, . . . , s.sub.9.sup.1, s.sub.9.sup.2, s.sub.9.sup.3, s.sub.9.sup.4, s.sub.9.sup.5]
  s = [s.sub.0, s.sub.1, . . . , s.sub.9]
50 wavelengths (learnable weighted averaging)   0.5        95.22
  s.sub.D = [s.sub.0.sup.1, s.sub.0.sup.2, s.sub.0.sup.3, s.sub.0.sup.4, s.sub.0.sup.5, . . . , s.sub.9.sup.1, s.sub.9.sup.2, s.sub.9.sup.3, s.sub.9.sup.4, s.sub.9.sup.5]
  s = [s.sub.0, s.sub.1, . . . , s.sub.9]
[0083] Table 3: Blind testing accuracies of jointly-trained hybrid machine vision systems for MNIST image dataset. Image classification is performed by the corresponding diffractive optical neural network's output, max(s), and a decoder trained neural network 110 is jointly-trained for image reconstruction using the spectral encoding of data classes through a single-pixel detector. Also see
[0084] An optical-based machine vision system 2 is presented that uses trainable matter composed of diffractive layers 16 to encode the spatial information of objects 4 into the power spectrum of the diffracted light, which is used to perform optical classification of unknown objects 4 with a single-pixel spectroscopic detector 32. Shallow, low-complexity trained neural networks 110 can be used as decoders to reconstruct images 120 of the input objects 4 based on the spectrally-encoded class scores, demonstrating task-specific super-resolution. Although terahertz pulses were used to experimentally validate the spectrally-encoded machine vision framework, it can be broadly adopted for various applications covering other parts of the electromagnetic spectrum. In addition to object recognition, this machine vision concept can also be extended to perform other learning tasks such as scene segmentation, multi-label classification, as well as to design single or few pixel, low-latency super-resolution imaging systems by harnessing the spectral encoding provided by diffractive optical neural network devices 10 coupled with shallow decoder trained neural networks 110.
[0085] It is important to note that if the material absorption of the diffractive layers 16 is lower and/or the signal-to-noise ratio of the single-pixel detector 32 is increased, the optical inference accuracy of the presented network designs could be further improved by e.g., increasing the number of diffractive layers 16 or the number of learnable features (i.e., neurons) within the diffractive optical neural network device 10. Compared to using wider diffractive layers 16, increasing the number of diffractive layers 16 offers a more practical method to enhance the information processing capacity of diffractive networks, since training higher numerical aperture diffractive systems through image data is in general relatively harder. Despite their improved generalization capability, such deeper diffractive systems composed of larger numbers of diffractive layers 16 would partially suffer from increased material absorption and surface back-reflections. However, one should note that the optical power efficiency of a broadband network also depends on the size of the output detector 32. For example, the relatively lower power efficiency numbers reported in Table 1 are by and large due to the small size of the output detector 32 used in these designs (2×λ.sub.min) and can be substantially improved by using a detector 32 with a much larger active area.
[0086] In some embodiments, dispersion engineered material systems such as metamaterials can open up a new design space for enhancing the inference and generalization performance of spectral encoding through trainable diffractive optical neural network devices 10. Finally, the methods presented herein would create new 3D imaging and sensing modalities that are integrated with optical inference and spectral encoding capabilities of broadband diffractive networks, and can be merged with some of the existing spectroscopic measurement techniques such as FDOCT, FTIR and others to find various new applications in biomedical imaging, analytical chemistry, material science and other fields. For example, the diffractive optical neural network device 10, light source 6, detector 32 or set of detectors 32 may be used in a time domain or spectral domain spectroscopy device, an interferometric measurement device, an optical coherence tomography (OCT) setup or device, a Fourier-transform infrared spectroscopy (FTIR) measurement system or device.
[0087] Materials and Methods
[0088] Terahertz time-domain spectroscopy setup. The schematic diagram of the terahertz time-domain spectroscopy (THz-TDS) setup is shown in
[0089] The 3D-printed diffractive optical neural network device 10 was placed between the terahertz source 6 and the detector 32. It consisted of an input aperture 8, an input object 4, three diffractive layers 16 and an output aperture 28, as shown in
[0090] Forward model of the diffractive optical network and its training. A diffractive optical neural network device 10 is, in general, composed of successive diffractive layers 16 (transmissive and/or reflective) that collectively modulate the incoming object waves. According to the forward model used in this work, the diffractive layers 16 are assumed to be thin optical modulation elements, where the i.sup.th feature on the l.sup.th layer at a spatial location (x.sub.i, y.sub.i, z.sub.i) represents a wavelength (λ) dependent complex-valued transmission coefficient, t.sup.l, given by:
t.sup.l(x.sub.i,y.sub.i,z.sub.i,λ)=a.sup.l(x.sub.i,y.sub.i,z.sub.i,λ)exp(jϕ.sup.l(x.sub.i,y.sub.i,z.sub.i,λ)) (3),
[0091] where a and ϕ denote the amplitude and phase coefficients, respectively.
[0092] The diffractive layers 16 are connected to each other by free-space propagation, which is modeled through the Rayleigh-Sommerfeld diffraction equation:
[0093] where w.sub.i.sup.l(x, y, z, λ) is the complex-valued field on the i.sup.th pixel of the l.sup.th layer at (x, y, z) with a wavelength of λ, which can be viewed as a secondary wave generated from the source at (x.sub.i, y.sub.i, z.sub.i); and r=√{square root over ((x−x.sub.i).sup.2+(y−y.sub.i).sup.2+(z−z.sub.i).sup.2)} and j=√{square root over (−1)}. For the l.sup.th layer (l≥1, treating the input plane as the 0.sup.th layer), the modulated optical field u.sup.l at location (x.sub.i, y.sub.i, z.sub.i) is given by
[0094] where I denotes all the pixels on the previous diffractive layer 16.
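The layer-to-layer propagation step described above can be sketched numerically. Since the equation image is not reproduced in this text, the Rayleigh-Sommerfeld secondary-source kernel below is one standard form and should be treated as an assumption; all names are illustrative.

```python
import numpy as np

def rs_propagate(u_prev, coords_prev, coords_next, wavelength):
    """Propagate the complex field u_prev (layer l-1 features) to layer l.

    coords_*: (N, 3) arrays of (x, y, z) feature positions in mm.
    """
    out = np.zeros(len(coords_next), dtype=complex)
    for n, (x, y, z) in enumerate(coords_next):
        dx = x - coords_prev[:, 0]
        dy = y - coords_prev[:, 1]
        dz = z - coords_prev[:, 2]
        r = np.sqrt(dx**2 + dy**2 + dz**2)
        # secondary wave from each source feature, weighted by obliquity dz/r
        w = (dz / r**2) * (1.0 / (2 * np.pi * r) + 1.0 / (1j * wavelength)) \
            * np.exp(1j * 2 * np.pi * r / wavelength)
        out[n] = np.sum(u_prev * w)  # superposition over pixels I of layer l-1
    return out

# toy 3x3 layers, 0.5 mm feature pitch, separated by 3 mm, lambda = 1 mm
xs, ys = np.meshgrid(np.arange(3) * 0.5, np.arange(3) * 0.5)
prev = np.column_stack([xs.ravel(), ys.ravel(), np.zeros(9)])
nxt = prev + np.array([0.0, 0.0, 3.0])
u_next = rs_propagate(np.ones(9, dtype=complex), prev, nxt, wavelength=1.0)
# the modulated field at layer l is then t^l (Eq. (3)) times u_next
```

In practice the training code evaluates this propagation for every wavelength in the operation band; the O(N²) loop here is for clarity only.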
[0095] A smallest feature size of 0.5 mm was used for the diffractive layers 16, mainly restricted by the resolution of the 3D-printer. To model the Rayleigh-Sommerfeld diffraction integral more accurately over a wide range of illumination wavelengths, the diffractive space was sampled with a step size of 0.25 mm so that the x and y coordinate system in the simulation window was oversampled by two times with respect to the smallest feature size. In other words, in the sampling space a 2×2 binning was performed to form an individual feature of the diffractive layers 16, and thus all these four (4) elements share the same physical thickness, which is a learnable parameter. The printed thickness value, h, of each pixel of a diffractive layer is composed of two parts, h.sub.m and h.sub.base, as follows:
h=q(h.sub.m)+h.sub.base (6),
where h.sub.m denotes the learnable thickness parameter of each diffractive feature and is confined between h.sub.min=0 and h.sub.max=0.8 mm. The additional base thickness, h.sub.base, is a constant, non-trainable value chosen as 0.5 mm to ensure robust 3D printing and avoid bending of the diffractive layers after fabrication. The quantization operator in Eq. (6), i.e., q(⋅), denotes a 16-level/4-bit uniform quantization (0.05 mm for each level). To achieve the constraint applied to h.sub.m, an associated latent trainable variable was defined using the following analytical form:
[0096] Note that before the training starts, h.sub.m of all the diffractive neurons is initialized as 0.375 mm, resulting in an initial h of 0.875 mm. Based on these definitions, the amplitude and phase components of the complex transmittance of the i.sup.th feature of layer l, i.e., a.sup.l(x.sub.i, y.sub.i, z.sub.i, λ) and ϕ.sup.l(x.sub.i, y.sub.i, z.sub.i, λ), can be written as a function of the thickness of each individual neuron, h.sub.i, and the incident wavelength λ:
[0097] where the wavelength dependent parameters n(λ) and κ(λ) are the refractive index and the extinction coefficient of the diffractive layer material corresponding to the real and imaginary parts of the complex-valued refractive index ñ(λ), i.e., ñ(λ)=n(λ)+jκ(λ). Both of these parameters for the 3D-printing material used herein were experimentally measured over a broad spectral range (see
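The thickness parameterization of Eq. (6) and its mapping to a complex transmittance can be sketched as below. The amplitude/phase expressions used here are the standard thin-element relations for a material with complex refractive index n(λ)+jκ(λ); since the patent's equation images are not reproduced in this text, treat the exact expressions as assumptions, and the numeric n, κ values as placeholders.

```python
import numpy as np

H_MIN, H_MAX, H_BASE, LEVELS = 0.0, 0.8, 0.5, 16  # mm; 4-bit quantization

def quantize(h_m):
    """q(.): 16-level uniform quantization of h_m over [H_MIN, H_MAX]."""
    step = (H_MAX - H_MIN) / LEVELS          # 0.05 mm per level
    return np.clip(np.round(h_m / step) * step, H_MIN, H_MAX)

def transmittance(h_m, wavelength, n, kappa):
    """Thickness -> complex transmission coefficient of one feature."""
    h = quantize(h_m) + H_BASE               # Eq. (6): h = q(h_m) + h_base
    amp = np.exp(-2 * np.pi * kappa * h / wavelength)   # material absorption
    phase = 2 * np.pi * (n - 1.0) * h / wavelength      # phase delay vs. air
    return amp * np.exp(1j * phase)

# placeholder dispersion values for a 3D-printing polymer at lambda = 1 mm
t = transmittance(h_m=0.375, wavelength=1.0, n=1.7, kappa=0.05)
```

Because κ > 0, the transmittance amplitude is strictly below unity, which is the material absorption effect discussed later in connection with deeper diffractive designs.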
[0098] Based on this outlined optical forward model,
[0099] Based on the diffractive network layout reported in
[0100] Spectral class scores. Each spectral component contained in the incident broadband terahertz beam is assumed to be a plane wave with a Gaussian lateral distribution. The beam waist corresponding to the different wavelength components was experimentally measured. Although a flat spectral magnitude (equal weight for each spectral component) was assumed during the training of the diffractive optical networks, the pulsed terahertz source used in the setup had a different spectral profile within the band of operation. To circumvent this mismatch and calibrate the diffractive system (which is a one-time effort), the power spectrum of the pulsed terahertz source 6 was measured without any objects or diffractive layers, serving as the experimental reference, I.sub.exp.sup.R(λ). In addition, the corresponding wave of each spectral component was propagated through free-space containing equal power across the entire operation band from the plane of the input aperture 8 all the way to the output plane, forming the numerical reference wave collected by the detector aperture 28, i.e., I.sub.tr.sup.R(λ). Based on these spectral power distributions used for calibration, the experimentally measured power spectrum, I.sub.exp(λ), that is optically created by a 3D-printed diffractive optical neural network 10 is normalized as:
[0101] which corrects the mismatch between the spectral profiles assumed in the training phase and the one provided by the broadband terahertz illumination source 6. In fact, this is an important practical advantage of the framework since the diffractive models can work with different forms of broadband radiation, following this calibration/normalization routine outlined above.
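The one-time calibration above can be sketched as a per-wavelength rescaling. The exact normalization equation is not reproduced in this text, so the ratio form below (measured spectrum rescaled by the numerical-to-experimental reference ratio) is an assumption; names are illustrative.

```python
import numpy as np

def normalize_spectrum(I_exp, I_exp_ref, I_tr_ref):
    """Rescale the measured spectrum to the flat-spectrum training assumption.

    I_exp:     spectrum measured through the 3D-printed diffractive network
    I_exp_ref: source reference measured without objects/diffractive layers
    I_tr_ref:  numerical reference propagated with equal power per wavelength
    """
    return I_exp * (I_tr_ref / I_exp_ref)

I_exp = np.array([2.0, 1.0, 0.5])      # measured (arbitrary units)
I_exp_ref = np.array([4.0, 2.0, 1.0])  # source has a non-flat profile
I_tr_ref = np.array([1.0, 1.0, 1.0])   # flat numerical reference
s = normalize_spectrum(I_exp, I_exp_ref, I_tr_ref)
```

In this toy example the measured spectrum is exactly half the source reference at every wavelength, so the normalized class scores come out equal, as expected after the source profile is divided out.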
[0102] As described herein, there are two types of diffractive optical neural network devices 10 presented. With the number of wavelengths used to encode the object information denoted by M and the number of data classes denoted by C, in the first type a single wavelength is assigned to each data class, thus M=C (e.g., C=10 for MNIST data). For differential diffractive optical neural network devices 10, on the other hand, each data class is represented by a pair of spectral components, i.e., M=2C. As the dataset of handwritten digits has 10 classes, during the training of the standard diffractive optical networks, 10 discrete wavelengths were selected, each representing one digit. These wavelengths were uniformly distributed between λ.sub.min=1.00 mm and λ.sub.max=1.45 mm with 0.05 mm spacing; for the EMNIST image dataset this wavelength range was changed to be 0.825 mm to 1.45 mm with 0.025 mm spacing. For the differential diffractive optical neural network device 10 design, 20 wavelengths were uniformly distributed between λ.sub.min=0.65 mm and λ.sub.max=1.6 mm; for differential designs involving the EMNIST image dataset, 52 wavelengths were used, uniformly distributed between λ.sub.min=0.755 mm and λ.sub.max=1.52 mm. The first 10 spectral components (s.sub.0, s.sub.1, . . . , s.sub.9) are assigned to be positive signals (s.sub.0,+, s.sub.1,+, . . . , s.sub.9,+) and the subsequent 10 spectral components (s.sub.10, s.sub.11, . . . , s.sub.19) are assigned to be negative signals (s.sub.0,−, s.sub.1,−, . . . , s.sub.9,−). Based on this, the differential spectral class score Δs.sub.c for class c is defined as:
[0103] where s.sub.c,+ and s.sub.c,− denote the positive and negative spectral signals for the c.sup.th class, respectively, and T is a non-learnable hyperparameter (also referred to as the ‘temperature’ hyperparameter in machine learning literature) used only in the training phase to improve the convergence speed and the accuracy of the final model; T was empirically chosen as T=0.1.
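The differential read-out can be sketched as follows. Since the exact patented formula is not reproduced in this text, the simple difference s.sub.c,+ − s.sub.c,− below is an assumption; the temperature T only rescales the scores before the training-phase softmax and does not change the argmax used at inference.

```python
import numpy as np

T = 0.1  # training-phase temperature hyperparameter

def differential_scores(s_d):
    """Split 2C detector powers into positive/negative halves and difference."""
    s_plus, s_minus = s_d[:len(s_d) // 2], s_d[len(s_d) // 2:]
    return s_plus - s_minus                   # one Delta-s per class

s_d = np.array([0.9, 0.2, 0.4, 0.1, 0.3, 0.6])  # toy C = 3 example
delta_s = differential_scores(s_d)               # [0.8, -0.1, -0.2]
predicted = int(np.argmax(delta_s / T))          # T does not alter the argmax
```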
[0104] Image reconstruction neural network architecture. The image reconstruction trained neural network 110 is a 3-layer (with 2 hidden layers) fully-connected neural network, which receives an input of spectral class score vector (s) and outputs a reconstructed image 120 of the object 4. The two (2) hidden layers have 100 and 400 neurons, respectively. The size of the 3D-printed objects 4 used in the experiments is 2 cm×2 cm and when they are sampled at 0.5 mm intervals, in the discrete space each input object corresponds to 40×40 pixels, hence the dimension of the output layer of the image reconstruction network is 1600. Each fully connected layer of this image reconstruction ANN has the following structure:
z.sub.k+1=BN{LReLU[FC{z.sub.k}]} (12),
[0105] where z.sub.k and z.sub.k+1 denote the input and output of the k.sup.th layer, respectively, FC denotes the fully connected layer, LReLU denotes the leaky rectified linear unit, and BN is the batch normalization layer. In the architecture used, LReLU is defined as:
[0106] For the batch normalization layer, BN, with a d-dimensional input x=(x.sup.(1), . . . , x.sup.(d)), each dimension of the input is first normalized (i.e., re-centered and re-scaled) using its mean μ.sub.B and standard deviation σ.sub.B calculated across the mini-batch B of size m, and then multiplied and shifted by the parameters γ.sup.(k) and β.sup.(k) respectively, which are both subsequently learnt during the optimization process:
[0107] where k∈[1,d], i∈[1,m] and ∈ is a small number added in the denominator for numerical stability.
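The decoder forward pass described above (20 → 100 → 400 → 1600, each hidden layer following Eq. (12)) can be sketched with plain numpy. The weights here are random stand-ins rather than trained values, the LReLU slope of 0.2 is an assumption (the defining equation is not reproduced in this text), and BN uses batch statistics with γ=1, β=0 for simplicity.

```python
import numpy as np

rng = np.random.default_rng(1)

def lrelu(x, slope=0.2):          # slope value is an assumption
    return np.where(x > 0, x, slope * x)

def batch_norm(x, eps=1e-5):      # mini-batch statistics; gamma=1, beta=0
    mu, sigma = x.mean(axis=0), x.std(axis=0)
    return (x - mu) / (sigma + eps)

def layer(z, w, b, activate=True):
    z = z @ w + b                 # FC
    return batch_norm(lrelu(z)) if activate else z  # Eq. (12) for hidden layers

dims = [20, 100, 400, 1600]       # differential scores in, 40x40 image out
params = [(0.1 * rng.standard_normal((i, o)), np.zeros(o))
          for i, o in zip(dims[:-1], dims[1:])]

z = rng.standard_normal((8, 20))            # batch of spectral class scores
for k, (w, b) in enumerate(params):
    z = layer(z, w, b, activate=(k < len(params) - 1))
recon = z.reshape(8, 40, 40)                # reconstructed 40x40 images
```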
[0108] Loss function for the training of spectral encoding diffractive optical networks. The total loss for the training of diffractive optical networks, L.sub.D, is defined as

L.sub.D=L.sub.I+α⋅L.sub.E+β⋅L.sub.P (16),
[0109] where L.sub.I stands for the optical inference loss, L.sub.E denotes the output detector diffractive power efficiency-related loss, and L.sub.P denotes the spatial purity loss. The non-trainable hyperparameters, α and β, are relative weight coefficients for the corresponding loss terms. For the different diffractive optical networks presented herein, the (α, β) pairs are set to be (0.4, 0.2), (0.08, 0.2), (0.03, 0.1), (0, 0) and (0, 0), providing 84.02%, 93.28%, 95.05%, 96.07% and 96.82% optical inference accuracy, respectively (see Table 1). For multi-class object classification, L.sub.I was defined using softmax-cross-entropy (SCE) as follows:
[0110] where C and g.sub.c denote the number of data classes and the c.sup.th entry of the ground truth label vector, respectively, with the remaining term denoting the normalized spectral class score for the c.sup.th class. In the 10-wavelength diffractive optical network designs, M=C=10, and the normalized spectral class score is calculated as:
[0111] where T′ is a non-learnable hyperparameter, which is used only in the training phase and empirically chosen as 0.1. For the 20-wavelength differential diffractive optical network design, the normalized spectral class score is equal to Δs.sub.c as defined above.
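The inference loss term can be sketched as a temperature-scaled softmax-cross-entropy. The exact patented equation is not reproduced in this text; the form below is the standard SCE with the class scores divided by T′ = 0.1 before the softmax, and the toy score vectors are illustrative.

```python
import numpy as np

T_PRIME = 0.1  # training-phase temperature hyperparameter

def sce_loss(s_hat, g, temperature=T_PRIME):
    """Softmax-cross-entropy between normalized class scores and labels."""
    logits = s_hat / temperature
    logits = logits - logits.max()             # numerical stability shift
    p = np.exp(logits) / np.exp(logits).sum()  # softmax probabilities
    return -np.sum(g * np.log(p + 1e-12))      # cross-entropy with one-hot g

g = np.zeros(10); g[3] = 1.0                   # ground truth class 3
s_good = np.zeros(10); s_good[3] = 1.0         # correct class dominates
s_bad = np.zeros(10); s_bad[7] = 1.0           # wrong class dominates
loss_good, loss_bad = sce_loss(s_good, g), sce_loss(s_bad, g)
```

Dividing by a small T′ sharpens the softmax, which is the stated role of the temperature: speeding up convergence during training without affecting max(s) inference.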
[0112] The output detector 32 diffractive power efficiency-related loss term L.sub.E in Eq. (16) is defined as:
[0113] where η denotes the diffractive power efficiency at the output detector 32 and η.sub.th refers to the penalization threshold that was taken as 0.015 during the training phase. η is defined as:
[0114] where I.sub.c.sub.
[0115] The spatial purity loss L.sub.P is used to clear the optical power over a small region of interest, 1 cm×1 cm, surrounding the active area of the single-pixel detector, for the purpose of decreasing the sensitivity of the diffractive optical network to potential misalignment of the detector in the transverse plane with respect to the optical axis. L.sub.P is calculated using:
[0116] where I.sub.detector, c and I.sub.peripheral, c denote the optical power of the c.sup.th spectral component collected by the active area of the output detector 32 and within a 1 cm×1 cm periphery around the output detector 32 aperture, respectively.
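The two regularization terms can be sketched as follows. Since neither equation image is reproduced in this text, both functional forms below are assumptions: a hinge-style penalty that activates only when the detector power efficiency η falls below η.sub.th=0.015, and a purity term that penalizes the fraction of local power landing in the 1 cm×1 cm periphery instead of the detector's active area.

```python
ETA_TH = 0.015  # power efficiency penalization threshold (training phase)

def efficiency_loss(eta):
    """Hinge-style sketch: penalize only when efficiency is below threshold."""
    return max(0.0, ETA_TH - eta)            # zero once eta >= eta_th

def purity_loss(I_detector, I_peripheral):
    """Fraction of local power that misses the detector's active area."""
    return I_peripheral / (I_detector + I_peripheral)

le = efficiency_loss(0.010)                  # below threshold -> penalized
lp = purity_loss(I_detector=0.9, I_peripheral=0.1)
```

Pushing power off the periphery and onto the detector aperture is what makes the design tolerant to small transverse misalignments of the detector.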
[0117] Loss function for the training of image reconstruction (decoder) networks. The total loss of an electronic image reconstruction network, L.sub.Recon, is defined as:

L.sub.Recon=γ⋅L.sub.S(O.sub.recon, O.sub.input)+(1−γ)⋅L.sub.I (22),
[0118] where L.sub.S stands for the pixel-wise structural loss between the reconstructed image of the object, O.sub.recon, and the ground truth object structure, O.sub.input. L.sub.I is the same loss function defined in Eq. (17), except that it computes the SCE loss between the ground truth label vector g and the new class scores obtained by cycling O.sub.recon back to the object plane of the diffractive optical network model at hand and numerically propagating it through the optical forward model as depicted in
[0119] where q is a hyperparameter that is empirically set as 20% of the standard deviation of the normalized input ground truth image. Examples of the reconstructed images using these different loss terms are shown in
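The jointly-trained hybrid loss of Eq. (22) can be sketched as below. Here `optical_forward_model` is a hypothetical stand-in for the simulated diffractive network (a fixed random linear map), the structural loss is taken as MSE for simplicity, and all names are illustrative; the real system propagates the recycled image through the wavelength-dependent forward model described earlier.

```python
import numpy as np

rng = np.random.default_rng(2)
W_OPT = 0.01 * rng.standard_normal((1600, 10))  # stand-in "optics"

def optical_forward_model(image):
    """Hypothetical stand-in: 40x40 image -> 10 spectral class scores."""
    return image.reshape(-1) @ W_OPT

def softmax_ce(scores, g):
    p = np.exp(scores - scores.max()); p /= p.sum()
    return -np.sum(g * np.log(p + 1e-12))

def hybrid_loss(o_recon, o_input, g, gamma=0.5):
    l_s = np.mean((o_recon - o_input) ** 2)   # structural loss (MSE sketch)
    # cycle the reconstruction back through the optical forward model
    l_i = softmax_ce(optical_forward_model(o_recon), g)
    return gamma * l_s + (1 - gamma) * l_i    # Eq. (22)

o = rng.random((40, 40)); g = np.zeros(10); g[2] = 1.0
loss = hybrid_loss(o, o, g)                   # perfect reconstruction case
```

Setting γ=1 recovers a pure computational single-pixel imager (reconstruction only), while γ<1 keeps the classification term in the loop, matching the ξ-weighting trade-off discussed above.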
[0120] Training-related details. Both the diffractive optical neural network models/devices 10 and the corresponding decoder trained neural networks 110 used herein were simulated and trained using Python (v3.6.5) and TensorFlow (v1.15.0, Google Inc.). Adam was selected as the optimizer during the training of all the models, and its parameters were taken as the default values in TensorFlow and kept identical in each model. The learning rate was set as 0.001. The handwritten digit image data are divided into three parts: training, validation and testing, which contain 55K, 5K and 10K images, respectively. Diffractive optical networks were trained for 50 epochs and the best model was selected based on the classification performance on the validation data set. Image reconstruction neural networks 110 were trained for 20 epochs. When only the structural loss L.sub.S(O.sub.recon, O.sub.input) was used, i.e., γ=1 in Eq. (22), the best trained neural network 110 model was selected based on the minimum loss value over the validation data set. If there was an image feedback cycle, i.e., γ<1 in Eq. (22), the best trained neural network 110 model was selected based on the classification performance of the recycled class scores over the validation set.
[0121] For the training of the models, a desktop computer with a TITAN RTX graphics processing unit (GPU, Nvidia Inc.) and Intel® Core™ i9-9820X central processing unit (CPU, Intel Inc.) and 128 GB of RAM was used, running the Windows 10 operating system (Microsoft Inc.). For the diffractive optical front-end design involving M=C=10, the batch size was set to be 4 and 5 for the diffractive optical neural network 10 and the associated image reconstruction trained neural network 110, respectively. However, for the differential design of the diffractive optical front-end with M=2C=20, the batch size was set to be 2 and 5 during the training of the diffractive optical neural network 10 and the associated image reconstruction trained neural network 110, respectively. The main limiting factor on these batch size selections is the GPU memory of the computer. The typical training time of a diffractive optical neural network model with C=10 is ˜80 hours. The typical training time of an image reconstruction decoder trained neural network 110 with and without the image feedback/collaboration loop is ˜20 hours and ˜2 hours, respectively.
[0122] While embodiments of the present invention have been shown and described, various modifications may be made without departing from the scope of the present invention. For example, while the system 2 is described herein as performing object 4 classification or image reconstruction, the system 2 may also be used to perform optical signal compression. In addition, while the invention largely focuses on optical diffraction and reflection, the system and method can also be used with acoustic waves instead of optical waves as seen in the embodiment of