METHOD AND SYSTEM FOR ADAPTIVE BEAMFORMING OF ULTRASOUND SIGNALS
20210382157 · 2021-12-09
Inventors
- Wouter Marinus Benjamin LUIJTEN (EINDHOVEN, NL)
- Ruud Johannes Gerardus van Sloun (Eindhoven, NL)
- Frederik Jan De Bruijn (Eindhoven, NL)
- Harold Agnes Wilhelmus Schmeitz (Eindhoven, NL)
CPC classification
G06N3/082
PHYSICS
G01S7/52085
G01S7/52047
G10K11/34
Abstract
The invention relates to a method for adaptive beamforming of ultrasound signals, the method comprising the steps of (a) Receiving time-aligned RF signals acquired by multiple ultrasound transducer elements in response to an ultrasound transmission; (b) Determining content-adaptive apodization weights for beamforming the time-aligned RF signals by applying a trained artificial neural network (16) to the time-aligned RF signals; and (c) Applying the content-adaptive apodization weights to the time-aligned RF signals to calculate a beamformed output signal. The invention also relates to a method for training an artificial neural network (16) useful in adaptive beamforming of ultrasound signals, and a related computer program and system.
Claims
1. A method for adaptive beamforming of ultrasound signals, the method comprising the steps of a) Receiving RF signals acquired by multiple ultrasound transducer elements in response to an ultrasound transmission; b) Determining content-adaptive apodization weights for beamforming the RF signals by applying a trained artificial neural network to the RF signals.
2. The method of claim 1, wherein the number of input nodes and the number of output nodes of the trained artificial neural network correspond to the number of contributing RF signals.
3. The method of claim 1, comprising a further step of c) Applying the content-adaptive apodization weights to the RF signals to calculate a beamformed output signal.
4. The method of claim 1, wherein the trained artificial neural network comprises at least one activation layer including an activation function, which propagates both positive and negative input values with unbounded output values.
5. The method of claim 1, wherein the neural network comprises at least one activation layer including an activation function which concatenates the positive and the negative part of input values.
6. The method of claim 1, wherein the artificial neural network comprises at most four fully connected layers.
7. The method of claim 1, wherein the artificial neural network comprises at most three activation layers.
8. The method of claim 1, wherein the beamformed output signal is used to reconstruct an ultrasound image of a field-of-view, and wherein the RF signals are rearranged prior to applying the trained artificial neural network, so that the RF data relating to one or at most a few pixels of the ultrasound image are processed in one or more batches by the artificial neural network.
9. The method of claim 1, wherein the artificial neural network comprises at least one convolutional layer, in addition to or as an alternative to one or several fully-connected layer(s).
10. The method of claim 1, wherein the artificial neural network is part of a recurrent neural network.
11. The method of claim 1, wherein some or all of the weights of the artificial neural network are quantized, in particular quantized to 1 to 4 bits.
12. The method of claim 1, wherein the artificial neural network comprises at least one hidden layer having fewer nodes than the input layer and/or the output layer of the artificial neural network.
13. A method for providing a trained artificial neural network useful in content-adaptive beamforming of ultrasound signals, the method comprising: (a) Receiving input training data, namely RF signals acquired by multiple ultrasound transducer elements in response to an ultrasound transmission, (b) Receiving output training data, wherein the output training data are content-adaptive apodization weights, wherein such content-adaptive apodization weights have been calculated from the RF signals by a content-adaptive beamforming algorithm, in particular a minimum variance algorithm; or wherein the output training data are beamformed output signals calculated from the RF signals by a content-adaptive beamforming algorithm; (c) training an artificial neural network by using the input training data and the output training data; (d) providing the trained artificial neural network.
14. A computer program comprising instructions which, when the program is executed by a computational unit, cause the computational unit to carry out the method of claim 13.
15. A system for adaptive beamforming of ultrasound signals, the system comprising a) a first interface, configured for receiving RF signals acquired by multiple ultrasound transducer elements in response to an ultrasound transmission; b) a computational unit configured for applying a trained artificial neural network to the RF signals, whereby content-adaptive apodization weights for beamforming the RF signals are generated, and for applying the content-adaptive apodization weights to the RF signals to calculate a beamformed output signal; c) a second interface, configured for outputting the beamformed output signal.
Description
SHORT DESCRIPTION OF THE FIGURES
[0053] Useful embodiments of the invention shall now be described with reference to the attached figures. Similar elements or features are designated with the same reference signs in the figures. In the figures:
DESCRIPTION OF EMBODIMENTS
[0066] According to the invention, the conventional adaptive beamforming algorithm/processor 14 is replaced by a neural network. An example of such a neural network 16 is shown in
[0067] The neural network 16 receives as input the time-aligned RF signals 18 S.sub.1, S.sub.2, . . . , S.sub.n acquired from a plurality of ultrasound transducers, which are to be used to calculate one pixel. The number of nodes 21 in the input layer 20 corresponds to n, the number of contributing RF signals. In this embodiment, the number n of nodes 34 of the output layer 36 corresponds to the number n of nodes 21 of the input layer 20. To calculate the content-adaptive apodization weights w.sub.1, . . . , w.sub.n, the input signals S.sub.1, S.sub.2, . . . , S.sub.n are propagated through the neural network.
[0068] In this embodiment, the input layer 20 is a fully-connected layer, i.e. each node 21 in the input layer is connected by an edge 22 with each node 23 in the next layer 24. This operation corresponds to a matrix multiplication, wherein each value of the input layer 20 is multiplied with the weights of the edges connecting it to the nodes 23 in the next layer 24.
[0069] The next layer is an activation layer 24, in this example an antirectifier layer. The antirectifier effectively introduces non-linearity while preserving negative signal components as well as the dynamic range of the input. Because it concatenates the positive and the negative part of the input, it effectively doubles the number of nodes 25 in the following layer 26, since each node 23 has a different output depending on whether it has a positive or a negative input value, as illustrated by the two edges 24a and 24b. Otherwise, the structure of the nodes 25 contained in the following layer 26 is equivalent to that of the nodes 23 in the activation layer 24, i.e. there is no inter-connection between neighbouring nodes 23 in layer 24.
[0070] The layer 26 following the activation layer 24 is again a fully-connected layer, i.e. each node 25 in this layer is connected to each node 27 in the following layer 28. This following layer 28 has significantly fewer nodes 27 than the preceding layer 26. Reducing the number of nodes reduces the number of parameters/weights that need to be trained, which both lowers the amount of computation in the network and helps to control overfitting. For example, there may be a dimensionality reduction by a factor of 3-6; in the shown example the factor is 3, i.e. layer 28 has a third of the size of the preceding layer 26. In useful embodiments, the factor may be 5. The layer 28 is again an activation layer, namely an antirectifier layer, which combines a sample-wise L2 normalisation with two ReLU activations, thereby concatenating the positive and the negative part of the input. This again doubles the number of nodes 29 in the next layer 32. This layer 32 is again a fully-connected layer, i.e. each node in layer 32 is connected to each node 34 in the output layer 36. The values output at the output layer 36 are the content-adaptive apodization weights w.sub.1, . . . , w.sub.n.
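For illustration only, the forward pass described above (fully-connected layer, antirectifier, reducing fully-connected layer, antirectifier, fully-connected output of n apodization weights) may be sketched in NumPy as follows. The layer sizes, the random initialisation, and the exact antirectifier formulation (sample-wise L2 normalisation followed by concatenated ReLUs, without mean subtraction) are assumptions of this sketch, not the claimed implementation:

```python
import numpy as np

def antirectifier(x, eps=1e-12):
    """Sample-wise L2 normalisation followed by concatenation of the
    positive and negative parts (two ReLUs), doubling the width."""
    xn = x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)
    return np.concatenate([np.maximum(xn, 0.0), np.maximum(-xn, 0.0)], axis=-1)

def apodization_network(signals, params):
    """Forward pass: FC -> antirectifier -> FC (reduction) -> antirectifier -> FC."""
    h = antirectifier(signals @ params["W1"] + params["b1"])  # width doubles to 2n
    h = antirectifier(h @ params["W2"] + params["b2"])        # reduced, then doubled
    return h @ params["W3"] + params["b3"]                    # n apodization weights

# toy example: n = 12 contributing RF signals, dimensionality-reduction factor 3
rng = np.random.default_rng(0)
n, red = 12, 3
params = {
    "W1": 0.1 * rng.standard_normal((n, n)), "b1": np.zeros(n),
    "W2": 0.1 * rng.standard_normal((2 * n, 2 * n // red)), "b2": np.zeros(2 * n // red),
    "W3": 0.1 * rng.standard_normal((2 * (2 * n // red), n)), "b3": np.zeros(n),
}
weights = apodization_network(rng.standard_normal((1, n)), params)
print(weights.shape)  # (1, 12): one apodization weight per contributing signal
```

In a trained network the parameter matrices would of course result from training rather than random initialisation.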
[0071] In the embodiment of
[0072] In
[0073] The neural network 16 of this preferred embodiment is shown in more detail in
[0074] During training, dropout is applied between each pair of fully-connected layers, for example with a probability of 0.2. In other words, during training a fixed percentage of the nodes in the dropout layers are dropped out. Thus, the dropout layers are present only during training the network. The dropout helps to reduce overfitting of the neural network to the training data.
[0075] The NN may be implemented in Python using the Keras API with a TensorFlow (Google, CA, USA) backend. For training, the Adam optimizer was used with a learning rate of 0.001, stochastically optimizing across a batch of pixels belonging to a single image. The neural network shown in
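The dropout behaviour described above (active during training only, with a probability of, for example, 0.2) can be sketched in NumPy as inverted dropout; this is an illustrative stand-in for a Keras Dropout layer, not the exact implementation:

```python
import numpy as np

def dropout(x, p=0.2, training=True, rng=None):
    """Inverted dropout: during training, a fraction p of activations is
    zeroed and the survivors are rescaled by 1/(1-p); at inference the
    function is the identity, mirroring dropout layers that exist only
    during training."""
    if not training:
        return x
    rng = rng or np.random.default_rng()
    mask = rng.random(x.shape) >= p
    return x * mask / (1.0 - p)

# activations pass through unchanged at inference time
h = np.arange(5.0)
print(np.array_equal(dropout(h, training=False), h))  # True
```

The rescaling by 1/(1-p) keeps the expected activation magnitude the same in training and inference.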
[0076] When training the neural network shown in
[0077] The NN of
[0078] The method according to this embodiment of the invention was also tested on simulated images in order to compare resolution and contrast. Resolution was assessed by evaluating the average full-width-at-half-maximum (FWHM) of all point scatterers. Contrast was estimated using the average CNR of anechoic cysts. The results are shown in Table 1. The NN beamformer is thus able to generate a high-contrast image, with significantly less clutter than the MV target.
TABLE 1: Resolution and contrast metrics
Parameter           DAS     NN      MV
FWHM.sub.lat (mm)   0.846   0.704   0.778
FWHM.sub.ax (mm)    0.431   0.342   0.434
CNR (dB)            10.96   11.48   12.45
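As an illustration of the metrics in Table 1, the following NumPy sketch computes an FWHM from a 1-D point-spread profile and a CNR in dB from two regions of interest. The exact FWHM interpolation and CNR definition used for Table 1 are not specified in the text, so the formulas below (simple threshold crossing for FWHM; 20·log10 of the mean difference over the combined standard deviation for CNR) are assumptions:

```python
import numpy as np

def fwhm(profile, dx):
    """Full width at half maximum of a 1-D point-spread profile
    (linear amplitude, sampled on a grid with spacing dx)."""
    half = profile.max() / 2.0
    above = np.where(profile >= half)[0]
    return (above[-1] - above[0]) * dx

def cnr_db(roi_cyst, roi_background):
    """Contrast-to-noise ratio in dB between an (anechoic) cyst ROI and a
    background ROI: 20*log10(|mu_b - mu_c| / sqrt(var_c + var_b))."""
    mu_c, mu_b = roi_cyst.mean(), roi_background.mean()
    sigma = np.sqrt(roi_cyst.var() + roi_background.var())
    return 20.0 * np.log10(abs(mu_b - mu_c) / sigma)

# a Gaussian point spread with sigma = 1 has FWHM = 2*sqrt(2*ln 2) ~ 2.355
x = np.linspace(-5.0, 5.0, 10001)
print(round(fwhm(np.exp(-x**2 / 2.0), x[1] - x[0]), 2))  # 2.35
```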
[0080] Further, there may be a connection to a remote computer or server 128, for example via the internet 112. The method according to the invention may be performed by CPU 104 or GPU 106 of the hardware unit 102 but may also be performed by a processor of the remote server 128.
[0082] An MLP is a feedforward artificial neural network, which takes a set of input data and maps it onto a set of appropriate outputs. An MLP consists of multiple layers of neurons with nonlinear activation functions, each layer being fully connected to the next. It has been demonstrated previously that the minimum number of layers needed to represent an arbitrary continuous mapping y=ƒ(x.sub.1, x.sub.2, . . . , x.sub.n) is 3: the input layer, the hidden layer, and the output layer. A 3-layer MLP (or equivalently a 1-hidden-layer MLP) is a function ƒ: R.sup.n→R.sup.l, where n is the size of the input vector x and l is the size of the output vector ƒ(x), such that, in matrix notation:
y≅ƒ(x)=G{b.sup.(2)+W.sup.(2)[s(b.sup.(1)+W.sup.(1)x)]},
where b.sup.(1) and b.sup.(2) are bias vectors, W.sup.(1) and W.sup.(2) are weight matrices, and G and s are activation functions. A commonly used activation function is the sigmoid function:
[0083] s(x)=1/(1+e.sup.−λx), where λ determines the slope of the transition from 0 to 1. The weight matrices W.sup.(1) and W.sup.(2) are computed using a training algorithm such as the Levenberg-Marquardt or the back-propagation algorithm.
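The one-hidden-layer mapping y≅G{b.sup.(2)+W.sup.(2)[s(b.sup.(1)+W.sup.(1)x)]} can be written directly in NumPy. For this illustration, G is taken as the identity (a linear output layer), which is an assumption rather than something the text prescribes:

```python
import numpy as np

def sigmoid(x, lam=1.0):
    """s(x) = 1 / (1 + exp(-lam*x)); lam sets the slope of the 0-to-1 transition."""
    return 1.0 / (1.0 + np.exp(-lam * x))

def mlp_forward(x, W1, b1, W2, b2, G=lambda z: z):
    """One-hidden-layer MLP: y = G(b2 + W2 @ s(b1 + W1 @ x))."""
    return G(b2 + W2 @ sigmoid(b1 + W1 @ x))

# n = 2 inputs, 3 hidden units, l = 1 output; with zero weights each hidden
# unit outputs s(0) = 0.5, and the unit-weight output layer sums them
y = mlp_forward(np.zeros(2), np.zeros((3, 2)), np.zeros(3), np.ones((1, 3)), np.zeros(1))
print(y)  # [1.5]
```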
[0084] The neural networks used in this alternative aspect are first trained to learn an adaptive beamforming algorithm based on a training dataset generated by Field II simulation; the trained neural network is then applied to two different test datasets to prove the concept. The alternative aspect is a framework that can be generalized to many other computationally expensive adaptive beamforming techniques, as long as a sufficient number of input-output data pairs is available.
[0085] The alternative aspect provides a machine learning framework that can learn computationally expensive adaptive beamforming algorithms from a limited amount of training data and apply the learned algorithms to new datasets at a significantly lower computational cost via inference. The alternative aspect of the invention may thus be an enabler of real-time processing of computationally expensive adaptive beamforming algorithms that are otherwise very difficult to run in real time, sometimes even with high-end GPUs.
[0086] The main element of the alternative aspect of the invention is a neural network that maps time-aligned complex per-channel RF data to complex beamformed RF data. While several types of neural networks, such as multi-layer perceptrons (MLP), convolutional neural networks (CNN), and more advanced regressive and/or generative networks, may be used to perform similar tasks, the alternative aspect uses an MLP model to demonstrate the feasibility of using a machine learning/deep learning framework to learn and apply an advanced adaptive beamforming technique. The MV beamformer is used as a test algorithm, but the core concepts presented here can be extended to other adaptive beamforming algorithms as well.
[0087] The input-output pairs used in training the neural network in the alternative aspect of the invention are not pixels from original and MV beamformer images, but rather the input data consists of time-aligned complex channel RF signals at a given depth and the output data is the corresponding complex beamformer output for the MV beamformer. The main steps are illustrated in
[0088] In step 2, the training data set is used to train the learning algorithm. This step is performed iteratively until the mapping error converges to a certain pre-specified level. An MLP model was used to prove the concept. However, more advanced network architectures involving convolutional neural networks may be used. This will be described in more detail as an embodiment later.
[0089] In step 3, a test data set in the form of time-aligned complex per-channel data, which the learning algorithm has not observed before, is introduced. The trained algorithm operates on the input data to predict (or infer) its complex MV beamformer output. The inference step is expected to be significantly faster than direct computation of MVBF, as it approximates computationally intensive operations in MVBF using only additions and multiplications. For example, the computational complexity associated with standard DAS is linear in the number of elements, O(N). The computational complexity of MVBF, by contrast, grows with the subarray size L as O(L.sup.3), due to the matrix inversions needed to compute the optimal aperture weights. Using MLPs, this added computational burden can be significantly reduced, potentially making real-time processing more feasible.
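The O(L.sup.3) step that the network is meant to approximate can be made concrete with a minimal NumPy sketch of the minimum-variance weight computation, w=R.sup.−1a/(a.sup.HR.sup.−1a), placed next to the O(N) delay-and-sum. The diagonal loading term is a common stabilisation and an assumption of this sketch, not a detail taken from the text:

```python
import numpy as np

def mv_weights(R, a, diag_load=1e-3):
    """Minimum-variance apodization weights w = R^{-1} a / (a^H R^{-1} a).
    Solving the L x L system is the O(L^3) step the network approximates;
    diagonal loading (assumed here) stabilises the covariance estimate."""
    L = R.shape[0]
    Rl = R + diag_load * (np.trace(R).real / L) * np.eye(L)
    Ri_a = np.linalg.solve(Rl, a)
    return Ri_a / (a.conj() @ Ri_a)

def das_output(signals):
    """Delay-and-sum over the element dimension: uniform weights, O(N)."""
    return signals.mean(axis=-1)

# with an identity covariance and a steering vector of ones, MV falls back
# to uniform (DAS-like) weights 1/L
print(mv_weights(np.eye(4), np.ones(4)))  # [0.25 0.25 0.25 0.25]
```

The normalisation enforces the distortionless constraint w.sup.Ha=1, which the test below checks.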
[0090] Some preliminary simulation results are provided in
[0092] Other network architectures could be used to learn the adaptive beamforming algorithm. The key requirement is that the network maps from per-channel inputs to beamformed outputs. For instance, a convolutional neural network is expected to give good results. The input data is the aligned, real (or complex) per-channel data. Processing can be local (learning one pixel value from the relevant per-channel data) or global (learning the whole beamformed RF frame from the whole aligned data stack). Local processing seems appropriate to imitate algorithms (such as the minimum variance beamformer) whose input data is local anyway. Global algorithms also have the potential to learn and use anatomical information, provided enough training data is available.
[0093] In keeping with the philosophy of this alternative aspect of the invention, the following describes a local approach with a convolutional neural network. The aligned per-channel data for each pixel is cropped in fast time around the sample depth of interest, yielding a (numTime*numElements) data matrix. The time dimension typically spans a few wavelengths, so as to be sensitive to steering effects. The training dataset size is determined by the number of such data windows across the available images; a single image can typically yield hundreds of thousands of independent training input-output pairs. A fully convolutional neural network with a receptive field spanning the full input data and outputting a single scalar can be trained to learn the adaptively beamformed value at the depth of interest.
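The window-extraction step described above can be sketched as follows; the window length and array sizes are arbitrary examples for illustration, not values from the text:

```python
import numpy as np

def extract_windows(aligned, num_time):
    """Crop aligned per-channel data in fast time around each sample depth,
    yielding one (num_time, num_elements) input window per output sample."""
    depth, num_elements = aligned.shape
    half = num_time // 2
    windows = [aligned[d - half : d - half + num_time, :]
               for d in range(half, depth - half)]
    return np.stack(windows)  # (num_windows, num_time, num_elements)

# e.g. 2048 depth samples, 64 elements, 16-sample windows (a few wavelengths)
w = extract_windows(np.zeros((2048, 64)), 16)
print(w.shape)  # (2032, 16, 64)
```

Each such window would form one training input, paired with the adaptively beamformed scalar at its centre depth as the training output.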
[0094] The above-discussion is intended to be merely illustrative of the present system and should not be construed as limiting the appended claims to any particular embodiment or group of embodiments. Thus, while the present system has been described in particular detail with reference to exemplary embodiments, it should also be appreciated that numerous modifications and alternative embodiments may be devised by those having ordinary skill in the art without departing from the broader and intended spirit and scope of the present system as set forth in the claims that follow. Accordingly, the specification and drawings are to be regarded in an illustrative manner and are not intended to limit the scope of the appended claims.