METHOD AND SYSTEM FOR ADAPTIVE BEAMFORMING OF ULTRASOUND SIGNALS

20210382157 · 2021-12-09

    Abstract

    The invention relates to a method for adaptive beamforming of ultrasound signals, the method comprising the steps of (a) Receiving time-aligned RF signals acquired by multiple ultrasound transducer elements in response to an ultrasound transmission; (b) Determining content-adaptive apodization weights for beamforming the time-aligned RF signals by applying a trained artificial neural network (16) to the time-aligned RF signals; and (c) Applying the content-adaptive apodization weights to the time-aligned RF signals to calculate a beamformed output signal. The invention also relates to a method for training an artificial neural network (16) useful in adaptive beamforming of ultrasound signals, and a related computer program and system.

    Claims

    1. A method for adaptive beamforming of ultrasound signals, the method comprising the steps of a) Receiving RF signals acquired by multiple ultrasound transducer elements in response to an ultrasound transmission; b) Determining content-adaptive apodization weights for beamforming the RF signals by applying a trained artificial neural network to the RF signals.

    2. The method of claim 1, wherein the number of input nodes and the number of output nodes of the trained artificial neural network correspond to the number of contributing RF signals.

    3. The method of claim 1, comprising a further step of c) Applying the content-adaptive apodization weights to the RF signals to calculate a beamformed output signal.

    4. The method of claim 1, wherein the trained artificial neural network comprises at least one activation layer including an activation function, which propagates both positive and negative input values with unbounded output values.

    5. The method of claim 1, wherein the neural network comprises at least one activation layer including an activation function which concatenates the positive and the negative part of input values.

    6. The method of claim 1, wherein the artificial neural network comprises at most four fully connected layers.

    7. The method of claim 1, wherein the artificial neural network comprises at most three activation layers.

    8. The method of claim 1, wherein the beamformed output signal is used to reconstruct an ultrasound image of a field-of-view, and wherein the RF signals are rearranged prior to applying the trained artificial neural network, so that the RF data relating to one or at most a few pixels of the ultrasound image are processed in one or more batches by the artificial neural network.

    9. The method of claim 1, wherein the artificial neural network comprises at least one convolutional layer, in addition to or as an alternative to one or several fully-connected layer(s).

    10. The method of claim 1, wherein the artificial neural network is part of a recurrent neural network.

    11. The method of claim 1, wherein some or all of the weights of the artificial neural network are quantized, in particular quantized to 1 to 4 bits.

    12. The method of claim 1, wherein the artificial neural network comprises at least one hidden layer having fewer nodes than the input layer and/or the output layer of the artificial neural network.

    13. A method for providing a trained artificial neural network useful in content-adaptive beamforming of ultrasound signals, the method comprising: (a) Receiving input training data, namely RF signals acquired by multiple ultrasound transducer elements in response to an ultrasound transmission, (b) Receiving output training data, wherein the output training data are content-adaptive apodization weights, wherein such content-adaptive apodization weights have been calculated from the RF signals by a content-adaptive beamforming algorithm, in particular a minimum variance algorithm; or wherein the output training data are beamformed output signals calculated from the RF signals by a content-adaptive beamforming algorithm; (c) training an artificial neural network by using the input training data and the output training data; (d) providing the trained artificial neural network.

    14. A computer program comprising instructions which, when the program is executed by a computational unit, cause the computational unit to carry out the method of claim 13.

    15. A system for adaptive beamforming of ultrasound signals, the system comprising a) a first interface, configured for receiving RF signals acquired by multiple ultrasound transducer elements in response to an ultrasound transmission; b) a computational unit configured for applying a trained artificial neural network to the RF signals, whereby content-adaptive apodization weights for beamforming the RF signals are generated, and for applying the content-adaptive apodization weights to the RF signals to calculate a beamformed output signal; c) a second interface, configured for outputting the beamformed output signal.

    Description

    SHORT DESCRIPTION OF THE FIGURES

    [0053] Useful embodiments of the invention shall now be described with reference to the attached figures. Similar elements or features are designated with the same reference signs in the figures. In the figures:

    [0054] FIG. 1 is a schematic illustration of a conventional DAS beamforming technique;

    [0055] FIG. 2 depicts a schematic overview of an adaptive beamformer;

    [0056] FIG. 3 is a simplified illustration of a neural network, according to an embodiment of the invention;

    [0057] FIG. 4 is a schematic overview of an implementation of an embodiment of the inventive method;

    [0058] FIG. 5 shows a schematic representation of an artificial neural network, according to an embodiment of the invention;

    [0059] FIG. 6 shows ultrasound images obtained with (A) conventional DAS beamforming, (B) minimum variance beamforming, (C) a deep learning based beamformer according to an embodiment of the invention;

    [0060] FIG. 7 shows an ultrasound imaging system according to an embodiment of the invention;

    [0061] FIG. 8 shows an overview of an alternative neural network based beamforming method;

    [0062] FIG. 9 shows a training data set from a simulated phantom used in the alternative method, wherein the original image is shown on the left, the image obtained by the alternative neural network method in the middle, and the minimum variance beamformed image on the right.

    [0063] FIG. 10 shows a test data set from a simulated phantom pertaining to the alternative method, wherein the original image is shown on the left, the image obtained by the alternative neural network method in the middle, and the minimum variance beamformed image on the right.

    DESCRIPTION OF EMBODIMENTS

    [0064] FIG. 1 illustrates conventional beamforming with the delay-and-sum (DAS) method. In response to e.g. an ultrasound pulse transmitted by an array 4 of transducer elements, echoes 3 are reflected from a point structure (focal point) 2 in the field-of-view. The echoes 3 are recorded by the array 4 of ultrasound transducers. The thus acquired raw RF signals 5 are also referred to as channel data, each raw RF signal 5 having been acquired by one transducer element and thus relating to one channel. The example in FIG. 1 shows 8 channels. For beamforming, the channel data 5 are time-of-flight corrected in step 6, i.e. the different time shifts t.sub.1, t.sub.2, . . . , t.sub.n with which the echoes 3 were received by the array 4 are corrected for, depending on the geometry of the transducer array 4 and the focal point 2. These time-aligned RF signals S.sub.1 . . . S.sub.n are then multiplied with apodization weights w.sub.1 . . . w.sub.n in step 7. In conventional DAS beamforming, these weights are pre-set and not adapted to the content of the ultrasound image, i.e. they are not adapted to the RF signals. The weighted signals 8 are summed in step 9 to yield a beamformed output signal 10. This beamformed output signal 10 can be further processed to yield image data for one pixel.
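The weighted-sum core of this DAS scheme (steps 7 and 9) can be sketched in a few lines of NumPy. The function name `das_beamform` and the uniform default window are illustrative choices, not from the patent, and the time-of-flight correction of step 6 is assumed to have been applied already:

```python
import numpy as np

def das_beamform(aligned_rf, weights=None):
    """Delay-and-sum: weight the time-aligned per-channel signals and sum.

    aligned_rf : (n_channels,) time-aligned RF samples S_1..S_n for one pixel
    weights    : (n_channels,) fixed apodization weights w_1..w_n;
                 defaults to a uniform window summing to 1
    """
    aligned_rf = np.asarray(aligned_rf, dtype=float)
    if weights is None:
        weights = np.ones_like(aligned_rf) / aligned_rf.size
    # Step 7 (per-channel weighting) and step 9 (summation) in one dot product.
    return float(np.dot(weights, aligned_rf))
```

With the uniform default, the result is simply the channel average, which is then further processed into one pixel value.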

    [0065] FIG. 2 shows an adaptive beamforming method. In this method, the time-aligned RF signals 18 S.sub.1, S.sub.2 . . . , S.sub.n are used by a beamforming algorithm 14 to calculate the content-adaptive apodization weights 12, which, thus, are not pre-determined as in the DAS-beamformer. Rather, the signals are processed by the adaptive beamformer 14, e.g. a minimum variance beamformer, which calculates the optimal weights in order to maximize image quality. The weighted RF signals are summed in step 9, which results in the beamformed output signal.
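Paragraph [0065] gives the minimum variance beamformer as an example of the adaptive beamformer 14. As background, the textbook Capon weight computation can be sketched as follows; the snapshot-based covariance estimate, the diagonal loading factor `eps`, and all names are illustrative assumptions rather than details taken from the patent:

```python
import numpy as np

def mv_weights(snapshots, eps=1e-3):
    """Minimum-variance (Capon) apodization weights for time-aligned data.

    snapshots : (n_samples, n_channels) time-aligned RF snapshots used to
                estimate the spatial covariance matrix R
    eps       : diagonal loading factor for numerical stability
    """
    X = np.asarray(snapshots, dtype=float)
    n = X.shape[1]
    R = X.T @ X / X.shape[0]                  # sample covariance estimate
    R += eps * np.trace(R) / n * np.eye(n)    # diagonal loading
    a = np.ones(n)                            # steering vector after alignment
    Rinv_a = np.linalg.solve(R, a)
    return Rinv_a / (a @ Rinv_a)              # w = R^-1 a / (a^T R^-1 a)
```

By construction the weights satisfy the unit-gain constraint a·w = 1; the matrix solve here is the O(L.sup.3) operation that the neural network of the invention avoids at inference time.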

    [0066] According to the invention, the conventional adaptive beamforming algorithm/processor 14 is replaced by a neural network. An example of such a neural network 16 is shown in FIG. 3. This example network is arranged into layers 20, 24, 26, 28, 32, 36, each layer consisting of a number of nodes, wherein the nodes of neighbouring layers are connected by edges. Each edge/connection corresponds to a simple operation, which is performed on the value of the first node, and the result of this operation is added to the value of the connected node. In particular, a real or complex number can be assigned as a value to each node of the neural network.

    [0067] The neural network 16 receives as input the time-aligned RF signals 18 S.sub.1, S.sub.2 . . . , S.sub.n acquired from a plurality of ultrasound transducers, which are to be used to calculate one pixel. The number of nodes 21 in the input layer 20 corresponds to n, the number of contributing RF signals. In this embodiment, the number n of nodes 34 of the output layer 36 corresponds to the number n of nodes 21 of the input layer 20. To calculate the content-adaptive apodization weights w.sub.1, . . . , w.sub.n, the input signals S.sub.1, S.sub.2 . . . , S.sub.n are propagated through the neural network.

    [0068] In this embodiment, the input layer 20 is a fully-connected layer, i.e. each node 21 in the input layer is connected by an edge 22 with each node 23 in the next layer 24. This operation corresponds to a matrix multiplication, wherein each value of the input layer 20 is multiplied with the weights of the edges connecting it to the nodes 23 in the next layer 24.

    [0069] The next layer is an activation layer 24, in this example an antirectifier layer. The antirectifier effectively introduces non-linearity, while preserving negative signal components as well as the dynamic range of the input. Because it concatenates the positive and the negative part of the input, it effectively doubles the number of nodes 25 in the following layer 26, since each node 23 has a different output depending on whether it has a positive or a negative input value, as illustrated by the two edges 24a and 24b. Otherwise, the structure of the nodes 25 contained in the following layer 26 is equivalent to the structure of the nodes 23 in the activation layer 24, i.e. there is no inter-connection between neighbouring nodes 23 in layer 24.
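The antirectifier described here and in paragraph [0070] — sample-wise L2 normalisation followed by concatenation of the positive and negative parts, doubling the width — can be sketched in NumPy; the small `eps` guard against division by zero is an added assumption:

```python
import numpy as np

def antirectifier(x, eps=1e-12):
    """Antirectifier activation: sample-wise L2 normalisation, then
    concatenation of the positive and negative parts of the input.

    x : (batch, n) pre-activation values; returns (batch, 2n).
    """
    x = np.asarray(x, dtype=float)
    x = x / (np.linalg.norm(x, axis=1, keepdims=True) + eps)  # L2 normalise
    return np.concatenate([np.maximum(x, 0.0),    # positive part (ReLU(x))
                           np.maximum(-x, 0.0)],  # negative part (ReLU(-x))
                          axis=1)
```

Unlike a plain ReLU, no information in the sign of the input is discarded: a negative value simply activates the second half of the output instead of the first.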

    [0070] The layer 26 following the activation layer 24 is again a fully-connected layer, i.e. each node 25 in this layer is connected to each node 27 in the following layer 28. This following layer 28 has significantly fewer nodes 27 than the preceding layer 26. By reducing the number of nodes, the number of parameters/weights that needs to be trained is reduced. This reduces the amount of computation in the network and helps to control overfitting. For example, there may be a dimensionality reduction by a factor of 3-6; in the shown example, the factor is 3, i.e. the layer 28 has a third of the size of the preceding layer 26. In useful embodiments, the factor will be 5. The layer 28 is again an activation layer, namely an antirectifier layer, which combines a sample-wise L2 normalisation with two ReLU activations, thereby concatenating the positive and the negative part of the input. This results in a doubling of the number of nodes 29 in the next layer 32. This layer 32 is again a fully-connected layer, since each node in layer 32 is connected to each node 34 in the output layer 36. The values outputted at output layer 36 are the content-adaptive apodization weights w.sub.1, . . . , w.sub.n.

    [0071] In the embodiment of FIG. 3, the neural network has three fully-connected layers (the output layer 36 not counting as one, since it does not propagate any values to a following layer), namely the layers 20, 26 and 32. Further, the network has two activation layers 24 and 28, each in between a pair of fully-connected layers. In other useful embodiments, there may be another fully-connected layer followed by an activation layer, i.e. a total of 4 fully-connected layers and three activation layers.

    [0072] In FIG. 4, a schematic overview of a possible implementation of the inventive method is given. The (raw) RF signals are illustrated as input data at 40, wherein the data comprises a number of channels, each having a number of axial samples. In step 42, the raw RF data is time-aligned using traditional methods, wherein the different planes stand for the data of the different channels. In this embodiment, the time-of-flight correction of the RF signals is calculated beforehand and stored in a buffer. Alternatively, the time-of-flight correction could also be computed on the fly in a GPU, thereby reducing communication and memory overhead. Further, in this implementation, all data 43 from the various channels relating to one pixel is rearranged into a new format 45 in step 44, so that the data 43 for each pixel can be processed as a single batch in the NN. The next step of applying the NN 16 to the time-aligned and rearranged RF signal 45 is shown at 46. A skip connection 48 is added from the input (time-aligned RF signals) to the output at 50, where the time-aligned RF signals are multiplied with the apodization weights generated by the NN in step 52. The result is beamformed RF data 55 relating to one pixel 54, which is used to reconstruct an ultrasound image 51. After beamforming by the NN, the beamformed pixels are rearranged according to their spatial location.

    [0073] The neural network 16 of this preferred embodiment is shown in more detail in FIG. 5. Above each layer its output size (for 128 contributing RF signals) is indicated. The fully-connected layers are illustrated by a dark shading, the antirectifier layers are illustrated in white, and the drop-out layers (which are only present during training of the network) are illustrated in a light shading. This NN 16 comprises four fully-connected layers, with 128 nodes for the input layer and output layer and 32 nodes for the inner layers. This dimensionality reduction 58 by a factor of 8 (2.sup.3) following the first antirectifier layer, or by a factor of 4 with respect to the input layer, forces the network to find a more compact representation of the data. Each of the fully-connected layers (except the last layer) is followed by an antirectifier layer. The last fully-connected layer 60 is either the output layer or is directly connected to an output layer (not shown).
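The FIG. 5 topology (four fully-connected layers, n-node input and output, 32-node inner layers, an antirectifier after each layer but the last) can be sketched with randomly initialised NumPy matrices; the layer sizes follow the description, while the initialisation scale and bias handling are assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

def antirectifier(x):
    """Sample-wise L2 normalisation, then concat of positive/negative parts."""
    x = x / (np.linalg.norm(x, axis=1, keepdims=True) + 1e-12)
    return np.concatenate([np.maximum(x, 0), np.maximum(-x, 0)], axis=1)

def make_apodization_net(n_channels=128, hidden=32):
    """Untrained sketch of the FIG. 5 topology. Each antirectifier doubles
    the width, so the inner fully-connected layers see 2*hidden inputs."""
    sizes = [(n_channels, hidden), (2 * hidden, hidden),
             (2 * hidden, hidden), (2 * hidden, n_channels)]
    Ws = [rng.standard_normal(s) * 0.1 for s in sizes]
    bs = [np.zeros(s[1]) for s in sizes]

    def forward(x):  # x: (batch, n_channels) time-aligned RF signals
        for W, b in zip(Ws[:-1], bs[:-1]):
            x = antirectifier(x @ W + b)
        return x @ Ws[-1] + bs[-1]  # (batch, n_channels) apodization weights
    return forward
```

The dropout layers of FIG. 5 are omitted here since, as the text notes, they are only active during training.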

    [0074] During training, dropout is applied between each pair of fully-connected layers, for example with a probability of 0.2. In other words, during training a fixed percentage of the nodes in the dropout layers are dropped out. Thus, the dropout layers are present only during training the network. The dropout helps to reduce overfitting of the neural network to the training data.

    [0075] The NN may be implemented in Python using the Keras API with a TensorFlow (Google, CA, USA) backend. For training, the Adam optimizer was used with a learning rate of 0.001, stochastically optimizing across a batch of pixels belonging to a single image. The neural network shown in FIG. 5 was implemented and trained on in vivo ultrasound image data.

    [0076] When training the neural network shown in FIG. 5, the apodization weights calculated by a known adaptive beamforming technique using traditional algorithms may be used as output training data. The input training data is the corresponding time-aligned RF signals. During training, the NN 16 is applied to the training input data to generate calculated output data. A comparison between the calculated output data and the output training data is used to recursively adapt the weights within the neural network 16, in this case at a learning rate of, for example, 0.0005 to 0.01. To prevent overfitting, methods of regularization can be used, e.g. drop-out of nodes, using artificially calculated data, or weight decay based on normalization.

    [0077] The NN of FIG. 5 was tested on images acquired using a single plane-wave transmission, and the results are shown in FIG. 6. The image designated A shows the image reconstructed with a DAS beamformer. The image B uses a minimum variance beamformer, and in image C the deep learning based beamformer according to the implementation of the invention was applied. It can be observed that the NN beamformer is able to generate a high-contrast image comparable to the MV target, with significantly less clutter. Further, both adaptive techniques show an increase in CNR (contrast-to-noise ratio) and resolution compared to the DAS, with the NN even outperforming the MV target on the latter, likely due to its ability to incorporate a generalized prior in the beamforming process by averaging statistics of the training data. Training on higher quality images allows for improving the performance of the NN even further.

    [0078] The method according to this embodiment of the invention was also tested on simulated images in order to compare resolution and contrast. Resolution was assessed by evaluating the average full-width-at-half-maximum (FWHM) of all point scatterers. Contrast was estimated using the average CNR of anechoic cysts. The results are shown in Table 1. Thus, the NN beamformer is able to generate a high contrast image, with significantly less clutter than the MV target.

    TABLE 1: Resolution and Contrast metrics

    Parameter            DAS      NN       MV
    FWHM.sub.lat (mm)    0.846    0.704    0.778
    FWHM.sub.ax (mm)     0.431    0.342    0.434
    CNR (dB)             10.96    11.48    12.45

    [0079] FIG. 7 is a schematic representation of an ultrasound system 100 according to an embodiment of the invention and configured to perform the inventive method. The ultrasound system 100 includes a usual ultrasound hardware unit 102, comprising a CPU 104, GPU 106 and digital storage medium 108, for example a hard disc or solid-state disc. A computer program may be loaded into the hardware unit, from CD-ROM 110 or over the internet 112. The hardware unit 102 is connected to a user-interface 114, which comprises a keyboard 116 and optionally a touchpad 118. The touchpad 118 may also act as a display device for displaying imaging parameters. The hardware unit 102 is connected to the ultrasound probe 120, which includes an array of ultrasound transducers 122, which allows the acquisition of live ultrasound images from a subject or patient (not shown). The live images 124, acquired with the ultrasound probe 120 and beamformed according to the inventive method performed by the CPU 104 and/or GPU, are displayed on screen 126, which may be any commercially available display unit, e.g. a screen, television set, flat screen, projector etc.

    [0080] Further, there may be a connection to a remote computer or server 128, for example via the internet 112. The method according to the invention may be performed by CPU 104 or GPU 106 of the hardware unit 102 but may also be performed by a processor of the remote server 128.

    [0081] FIGS. 8 to 10 relate to an alternative aspect of the invention, in which a NN is used not to calculate apodization weights, but outputs adaptively beamformed RF data. The aim of this alternative aspect may be described as learning the behaviour of a given adaptive beamforming (BF) algorithm more accurately, by learning the actual mathematical operations of the adaptive BF algorithm involving per-channel data (i.e. RF signals) and beamformed RF data, rather than just learning the effect of adaptive beamforming purely in image domain. This alternative aspect provides a machine learning framework involving multi-layer perceptrons (MLPs) to learn computationally expensive adaptive beamforming algorithms, such as the MV or other techniques described above. However, this machine learning framework is intended to learn the mapping between aligned complex per-channel data and the adaptively beamformed RF data as opposed to mapping between pixel values from original and adaptively beamformed images.

    [0082] An MLP is a feedforward artificial neural network, which takes a set of input data and maps it onto a set of appropriate outputs. An MLP consists of multiple layers of neurons with nonlinear activation functions, each layer being fully connected to the next. It has been demonstrated previously that the minimum number of layers needed to represent an arbitrary continuous mapping y=ƒ(x.sub.1, x.sub.2, . . . , x.sub.n) is 3, comprising the input layer, the hidden layer, and the output layer. A 3-layer MLP (or equivalently a 1-hidden-layer MLP) is a function ƒ: R.sup.n.fwdarw.R.sup.l, where n is the size of the input vector x and l is the size of the output vector ƒ(x) such that, in matrix notation:


    y≅ƒ(x)=G{b.sup.(2)+W.sup.(2)[s(b.sup.(1)+W.sup.(1)x)]},

    where b.sup.(1) and b.sup.(2) are bias vectors, W.sup.(1) and W.sup.(2) are weight matrices, and G and s are activation functions. A commonly used activation function is the sigmoid function:

    g(z) = 1/(1 + e.sup.-λz),

    [0083] where λ determines the slope of the transition from 0 to 1. The weight matrices W.sup.(1) and W.sup.(2) are computed using a training algorithm such as the Levenberg-Marquardt or back-propagation algorithm.
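The 3-layer MLP and sigmoid defined above translate directly into NumPy. The output activation G is taken here as the identity, a common choice for regression outputs, which the text leaves open:

```python
import numpy as np

def sigmoid(z, lam=1.0):
    """Logistic activation g(z) = 1 / (1 + exp(-lambda * z))."""
    return 1.0 / (1.0 + np.exp(-lam * z))

def mlp_forward(x, W1, b1, W2, b2):
    """1-hidden-layer MLP:  y = G(b2 + W2 s(b1 + W1 x))
    with s = sigmoid and G = identity.

    x  : (n,) input vector
    W1 : (h, n), b1 : (h,)   hidden-layer weights and bias
    W2 : (l, h), b2 : (l,)   output-layer weights and bias
    """
    return b2 + W2 @ sigmoid(b1 + W1 @ x)
```

The parameter `lam` corresponds to the slope λ of the transition from 0 to 1.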

    [0084] The neural networks used in this alternative aspect are first trained to learn an adaptive beamforming algorithm based on a training dataset that has been generated by Field II simulation, and then the trained neural network is applied to two different test datasets to prove the concept. The alternative aspect is a framework that can be generalized to many other computationally expensive adaptive beamforming techniques, as long as a sufficient amount of input-output data pairs is available.

    [0085] The alternative aspect provides a machine learning framework that can learn computationally expensive adaptive beamforming algorithms from a limited amount of training datasets and apply the learned algorithms on new datasets at a significantly lower computational cost via inference. The alternative aspect of the invention may be an enabler of real-time processing of computationally expensive adaptive beamforming algorithms that are otherwise very difficult to run in real-time, sometimes even on high-end GPUs.

    [0086] The main element of the alternative aspect of the invention is a neural network that maps time-aligned complex per-channel RF data to complex beamformed RF data. While several types of neural networks, such as multi-layer perceptrons (MLP), convolutional neural networks (CNN), and more advanced regressive and/or generative networks, may be used to perform similar tasks, the alternative aspect uses an MLP model to demonstrate the feasibility of using a machine learning/deep learning framework to learn and apply an advanced adaptive beamforming technique. The MV beamformer is used as a test algorithm, but the core concepts presented here can be extended to other adaptive beamforming algorithms as well.

    [0087] The input-output pairs used in training the neural network in the alternative aspect of the invention are not pixels from original and MV beamformer images, but rather the input data consists of time-aligned complex channel RF signals at a given depth and the output data is the corresponding complex beamformer output for the MV beamformer. The main steps are illustrated in FIG. 8. In step 1, the training data set for neural network training is prepared. It consists of the time-aligned complex per-channel data as input, and the complex MV beamformed RF data as target. For an N-channel system, M input-output pairs are obtained from 2N input per-channel data (real and imaginary) at a given pixel location and the corresponding MV beamformer output (real and imaginary). Both input and output data have real and imaginary parts. Hence, the input data matrix is M×2N and the output data matrix is M×2. The MV beamforming should be performed offline to obtain such input-output pairs. Because the MV beamforming is performed only once at the data preparation stage, the computational burden associated with it is not a limitation of the method.
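The M×2N / M×2 arrangement of step 1 can be sketched as follows; the function name is illustrative, while the real/imaginary split follows the text:

```python
import numpy as np

def make_training_pairs(channel_data, mv_output):
    """Arrange complex per-channel data and complex MV outputs as real
    matrices for neural network training.

    channel_data : (M, N) complex, time-aligned per-channel RF data at
                   M pixel locations for an N-channel system
    mv_output    : (M,) complex MV beamformer output, one value per pixel
    Returns X of shape (M, 2N) and Y of shape (M, 2), with the real and
    imaginary parts split into separate columns.
    """
    X = np.concatenate([channel_data.real, channel_data.imag], axis=1)
    Y = np.stack([mv_output.real, mv_output.imag], axis=1)
    return X, Y
```

Each row of (X, Y) is one input-output pair; the MV beamforming that produces `mv_output` is run once, offline, at this data preparation stage.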

    [0088] In step 2, the training data set is used to train the learning algorithm. This step is performed iteratively until the mapping error converges to a certain pre-specified level. An MLP model was used to prove the concept. However, more advanced network architectures involving convolutional neural networks may be used. This will be described in more detail as an embodiment later.

    [0089] In step 3, a test data set in the form of time-aligned complex per-channel data, which the learning algorithm has not observed before, is introduced. The trained algorithm operates on the input data to predict (or infer) its complex MV beamformer output. The inference step is expected to be significantly faster than direct computation of MVBF, as it approximates computationally-intensive operations in MVBF using only additions and multiplications. For example, the computational complexity associated with standard DAS is linear with the number of elements, O(N). The computational complexity for MVBF, however, is proportional to the subarray size L and becomes O(L.sup.3) due to the matrix inversions needed to compute the optimal aperture weights. Using MLPs, by contrast, the added computational burden can be significantly reduced, potentially making it more feasible for real-time processing.

    [0090] Some preliminary simulation results are provided in FIGS. 9 and 10: FIG. 9 shows a training data set from a simulated phantom containing a single large anechoic cyst, showing the original image (left), the neural network image (middle), and the true MVBF image (right). A 64-element P4-2 phased-array was simulated. All images are pre-scanconverted images and are displayed on a 60 dB dynamic range. Notice the MVBF image exhibits finer speckle size and reduced sidelobes inside the cyst. The neural network image also shows reduced sidelobes in the cyst and slightly smaller speckle size.

    [0091] FIG. 10 shows a test data set from a simulated phantom containing 3 small anechoic cysts. The original image (left), the neural network image (middle), and the true MVBF image (right) are shown. A 64-element P4-2 phased-array was simulated. All images are pre-scanconverted images and are displayed on a 60 dB dynamic range. Notice the MVBF image exhibits finer speckle size and reduced sidelobes in the anechoic cysts. The neural network image also shows similar improvements.

    [0092] Other network architectures could be used to learn the adaptive beamforming algorithm. The key component is that the network maps from per-channel inputs to beamformed outputs. For instance, a convolutional neural network is expected to give good results. The input data is the aligned, real (or complex) per-channel data. Processing can be local (learning one pixel value from the relevant per-channel data) or global (learning the whole beamformed RF frame from the whole aligned data stack). Local processing seems appropriate to imitate algorithms (such as the minimum variance beamformer) whose input data is local anyway. Global algorithms also have the potential to learn and use anatomy information, provided enough training data is available.

    [0093] In keeping with the philosophy of this alternative aspect of the invention, the following describes a local approach with a convolutional neural network. The aligned per-channel data for each pixel is cropped in fast time around the sample depth of interest, yielding a (numTime*numElements) data matrix. The time dimension typically spans a few wavelengths, to be sensitive to steering effects. The training dataset size is determined by the number of such data windows across the available images. One single image can typically yield hundreds of thousands of independent training input-output pairs. A fully convolutional neural network with a receptive field spanning the full input data and outputting a single scalar can be trained to learn the adaptively beamformed value at the depth of interest.
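As a minimal illustration of this local mapping, consider the one-layer, purely linear special case: a single valid convolution whose kernel spans the whole (numTime, numElements) window reduces, for one output position, to an inner product between window and kernel. A real multi-layer CNN adds nonlinearities and intermediate feature maps, so this is only a sketch of the input/output contract:

```python
import numpy as np

def local_beamform_value(window, kernel, bias=0.0):
    """One-layer linear sketch of the local CNN approach: a valid
    convolution whose kernel spans the entire input window produces a
    single scalar, the beamformed value at the depth of interest.

    window : (num_time, num_elements) aligned per-channel data crop
    kernel : (num_time, num_elements) learned convolution kernel
    """
    window = np.asarray(window, dtype=float)
    kernel = np.asarray(kernel, dtype=float)
    assert window.shape == kernel.shape
    # Full-size valid convolution == elementwise product and sum.
    return float(np.sum(window * kernel) + bias)
```

Sliding this window along fast time and across pixels would reproduce the hundreds of thousands of independent training input-output pairs per image mentioned above.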

    [0094] The above-discussion is intended to be merely illustrative of the present system and should not be construed as limiting the appended claims to any particular embodiment or group of embodiments. Thus, while the present system has been described in particular detail with reference to exemplary embodiments, it should also be appreciated that numerous modifications and alternative embodiments may be devised by those having ordinary skill in the art without departing from the broader and intended spirit and scope of the present system as set forth in the claims that follow. Accordingly, the specification and drawings are to be regarded in an illustrative manner and are not intended to limit the scope of the appended claims.