Generative adversarial networks (GANs) have been used to successfully translate images between multiple imaging modalities. While there is a significant amount of literature on the use cases for these approaches, there has been limited investigation into optimal model design and evaluation criteria. In this paper, we demonstrated the performance of different approaches on the task of cone-beam computed tomography (CBCT) to fan-beam computed tomography (CT) translation. We examined the implications of choosing between 2D and 3D models, the size of 3D patches, and the integration of the Structural Similarity Index Measure (SSIM) into the cycle-consistency loss. Additionally, we introduced a partially-invertible VNet architecture into the RevGAN framework, enabling the use of 3D UNet-like architectures with a minimal memory footprint. We compared image similarity metrics to visual inspection as an evaluation method for these models, using held-out patient data and phantom scans to demonstrate their generalizability. Our findings suggest that 3D models, despite requiring a longer training time to converge due to their larger number of parameters, produce fewer image perturbations than 2D models. Training with larger patches also improved stability and significantly reduced artifacts, at the cost of longer training times, while the SSIM-L1 cycle-consistency loss function enhanced performance. Interestingly, our study revealed a discrepancy between standard image similarity metrics and visual evaluation, with the former failing to adequately penalize visually evident artifacts in synthetic CT scans. This underscores the need for tailored and standardized evaluation metrics for medical image translation, which would facilitate more accurate comparisons across studies. To further the clinical applicability of image-to-image translation, we have open-sourced our methods and experiments, available at github.com/ganslate-team.
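For illustration, the sketch below shows one way an SSIM-L1 cycle-consistency loss can be assembled in PyTorch: the usual L1 cycle term is blended with a (1 - SSIM) term. The uniform-window SSIM, the blending weight alpha = 0.84 (borrowed from Zhao et al.'s work on loss functions for image restoration), and the function names are illustrative assumptions, not the exact formulation used in the experiments above.

```python
import torch
import torch.nn.functional as F

def ssim(x, y, c1=0.01 ** 2, c2=0.03 ** 2, win=7):
    """Simplified local SSIM with a uniform window; inputs are 4D tensors
    (N, C, H, W) scaled to [0, 1]. Production SSIM uses a Gaussian window."""
    mu_x = F.avg_pool2d(x, win, stride=1)
    mu_y = F.avg_pool2d(y, win, stride=1)
    var_x = F.avg_pool2d(x * x, win, stride=1) - mu_x ** 2
    var_y = F.avg_pool2d(y * y, win, stride=1) - mu_y ** 2
    cov = F.avg_pool2d(x * y, win, stride=1) - mu_x * mu_y
    num = (2 * mu_x * mu_y + c1) * (2 * cov + c2)
    den = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return (num / den).mean()

def ssim_l1_cycle_loss(real, cycled, alpha=0.84):
    """Cycle-consistency loss comparing an image with its cycle
    reconstruction: alpha * (1 - SSIM) + (1 - alpha) * L1."""
    return alpha * (1 - ssim(real, cycled)) + (1 - alpha) * F.l1_loss(cycled, real)
```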
KEYWORDS: Image segmentation, Education and training, Muscles, Image processing, Breast, Mammography, Data modeling, Atomic force microscopy, Deep learning, Digital mammography
Purpose: We developed a segmentation method suited for both raw (for processing) and processed (for presentation) digital mammograms (DMs) that is designed to generalize across images acquired with systems from different vendors and across the two standard screening views.
Approach: A U-Net was trained to segment mammograms into background, breast, and pectoral muscle. Eight different datasets, including two previously published public sets and six sets of DMs from as many different vendors, were used, totaling 322 screen-film mammograms (SFMs) and 4251 DMs (2821 raw/processed pairs and 1430 only processed) from 1077 different women. Three experiments were done: first training on all SFM and processed images, second also including all raw images in training, and finally testing vendor generalization by leaving one dataset out at a time.
Results: The model trained on SFM and processed mammograms achieved a good overall performance regardless of projection and vendor, with a mean (± std. dev.) Dice score of 0.96 ± 0.06 for all datasets combined. When raw images were included in training, the mean (± std. dev.) Dice score was 0.95 ± 0.05 for the raw images and 0.96 ± 0.04 for the processed images. Testing on a dataset with processed DMs from a vendor that was excluded from training resulted in a difference in mean Dice varying between −0.23 and +0.02 from that of the fully trained model.
Conclusions: The proposed segmentation method yields accurate overall segmentation results for both raw and processed mammograms, independent of view and vendor. The code and model weights are made available.
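As a reminder of the headline metric, the snippet below computes a per-class Dice score from integer-labeled masks; the label convention (0 = background, 1 = breast, 2 = pectoral muscle) is an assumption for illustration, not necessarily the encoding used in this work.

```python
import numpy as np

def dice_score(pred, target, label):
    """Dice coefficient for one class in integer-labeled masks; returns
    1.0 when the class is absent from both prediction and ground truth."""
    p = (pred == label)
    t = (target == label)
    denom = p.sum() + t.sum()
    return 2.0 * np.logical_and(p, t).sum() / denom if denom > 0 else 1.0

# Assumed labels: 0 = background, 1 = breast, 2 = pectoral muscle.
# per_class = [dice_score(pred_mask, gt_mask, c) for c in (0, 1, 2)]
```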
KEYWORDS: Magnetic resonance imaging, Data acquisition, Brain, Visualization, Neuroimaging, Image restoration, Medical image reconstruction, Inverse problem on medical image
Despite its widespread adoption in almost every diagnostic and examination setting, Magnetic Resonance Imaging (MRI) is still a slow imaging modality, which limits its use for dynamic imaging. In recent years, Parallel Imaging (PI) and Compressed Sensing (CS) have been utilised to accelerate MRI acquisition. In clinical settings, subsampling the k-space measurements during scanning using Cartesian trajectories, such as rectilinear sampling, is currently the most conventional CS approach, which is, however, prone to producing aliased reconstructions. With the advent of Deep Learning (DL) in accelerated MRI, reconstructing faithful images from subsampled data has become increasingly promising. Retrospectively applying a subsampling mask to the k-space data is a way of simulating the accelerated acquisition of k-space data in a real clinical setting. In this paper we compare and review the effect of applying either rectilinear or radial retrospective subsampling on the quality of the reconstructions produced by trained deep neural networks. With the same choice of hyperparameters, we train and evaluate two distinct Recurrent Inference Machines (RIMs), one for each type of subsampling. The qualitative and quantitative results of our experiments indicate that the model trained on data with radial subsampling attains higher performance and learns to estimate reconstructions with higher fidelity, paving the way for other DL approaches to involve radial subsampling.
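The two retrospective subsampling schemes can be pictured as binary masks multiplied into fully sampled k-space. The sketch below is a minimal NumPy rendition: an equispaced rectilinear mask with a fully sampled autocalibration band, and a Cartesian-grid approximation of radial spokes. The acceleration factor, center fraction, and spoke count are illustrative placeholders, not the settings used in the experiments above.

```python
import numpy as np

def rectilinear_mask(shape, acceleration=4, center_fraction=0.08):
    """Keep equispaced phase-encode lines plus a fully sampled
    low-frequency band (a common Cartesian subsampling scheme)."""
    rows, cols = shape
    mask = np.zeros(shape, dtype=bool)
    mask[:, ::acceleration] = True                      # equispaced lines
    pad = int(cols * center_fraction / 2)
    mask[:, cols // 2 - pad : cols // 2 + pad] = True   # autocalibration band
    return mask

def radial_mask(shape, n_spokes=60):
    """Binary approximation of radial spokes rasterized onto a grid."""
    rows, cols = shape
    mask = np.zeros(shape, dtype=bool)
    cy, cx = rows / 2, cols / 2
    r = np.hypot(cy, cx)
    for theta in np.linspace(0, np.pi, n_spokes, endpoint=False):
        t = np.linspace(-r, r, 4 * max(rows, cols))
        ys = np.clip((cy + t * np.sin(theta)).astype(int), 0, rows - 1)
        xs = np.clip((cx + t * np.cos(theta)).astype(int), 0, cols - 1)
        mask[ys, xs] = True
    return mask

# Retrospective subsampling zeroes out the unmeasured k-space samples:
# kspace_sub = kspace * rectilinear_mask(kspace.shape)
```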
Many advanced reconstruction and image processing methods are being developed with the aim of improving image quality in CT. Development and testing of these methods is aided by the ability to simulate realistic images, both to control the acquisition process and to be able to use digital phantoms with known ground truth. Therefore, in this work, we present a method to simulate realistic scanner-specific sinograms from digital phantoms. For this, a series of measurements was conducted on a clinical CT system to characterize resolution loss, noise characteristics, and the exposure-to-detector output relationship. These measurements were used to develop a simulation pipeline, which involves raytracing of a digital phantom, taking into account the focal spot size and gantry rotation, followed by the use of Lambert’s law to determine the amount of energy arriving at each detector element. The spectrum for the specific tube voltage and current was modeled using previously published spectral models. The resulting sinogram was then corrupted by applying the measured detector Modulation Transfer Function (MTF) and adding noise based on the Noise Power Spectrum (NPS) and mean-variance relationship. Simulator results were compared to those acquired with the CT system in our clinic, showing an average difference of 2.1% in the off-center MTF magnitude, 0.048 in normalized NPS magnitude, and only 6 and 5 Hounsfield Units (HU) difference in the voxel values for water and air, respectively. The developed simulator appears capable of generating realistic CT images, which can help researchers develop and test their algorithms.
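The core of the forward simulation is readily sketched: Lambert's law (the Beer-Lambert relation) turns raytraced attenuation line integrals into detector energies, after which noise is injected. The toy Poisson-plus-Gaussian noise model below is an assumption standing in for the measured NPS and mean-variance relationship used in the actual pipeline; the function names and parameters are illustrative.

```python
import numpy as np

def detector_energy(i0, line_integrals):
    """Lambert's law: energy reaching each detector element, given the
    incident energy i0 and the raytraced path integrals of the linear
    attenuation coefficient through the digital phantom."""
    return i0 * np.exp(-line_integrals)

def add_compound_noise(signal, gain=1.0, sigma_e=5.0, rng=None):
    """Toy quantum (Poisson) plus electronic (Gaussian) noise; the real
    simulator instead shapes noise with the measured NPS and
    mean-variance relationship, which this sketch does not reproduce."""
    rng = rng or np.random.default_rng()
    quanta = rng.poisson(signal / gain) * gain
    return quanta + rng.normal(0.0, sigma_e, size=signal.shape)
```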
We present WeakSTIL, an interpretable two-stage weak-label deep learning pipeline for scoring the percentage of stromal tumor-infiltrating lymphocytes (sTIL%) in H&E-stained whole-slide images (WSIs) of breast cancer tissue. The sTIL% score is a prognostic and predictive biomarker for many solid tumor types. However, due to the high labeling effort and the high intra- and interobserver variability among expert annotators, this biomarker is currently not used in routine clinical decision making. WeakSTIL compresses tiles of a WSI using a feature extractor pre-trained with self-supervised learning on unlabeled histopathology data, and learns to predict precise sTIL% scores for each tile in the tumor bed using a multiple instance learning regressor that only requires a weak WSI-level label. By requiring only a weak label, we avoid the large annotation effort required to train existing TIL detection methods. We show that WeakSTIL is at least as good as other TIL detection methods when predicting the WSI-level sTIL% score, reaching a coefficient of determination of 0.45 ± 0.15 when compared to scores generated by an expert pathologist, and an AUC of 0.89 ± 0.05 on the clinically interesting sTIL-high vs. sTIL-low classification task. Additionally, we show that the intermediate tile-level predictions of WeakSTIL are highly interpretable, which suggests that WeakSTIL pays attention to latent features related to the number of TILs and the tissue type. In the future, WeakSTIL may be used to provide consistent and interpretable sTIL% predictions to stratify breast cancer patients into targeted therapy arms.
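A minimal sketch of the weak-label second stage is given below: a shared per-tile head maps pre-extracted tile features to sTIL% scores, which are mean-pooled into a slide-level prediction that can be supervised with the WSI-level label alone. The layer widths, the mean-pooling choice, and the class name are illustrative assumptions, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class WeakLabelRegressor(nn.Module):
    """Multiple instance learning regressor: per-tile sTIL% predictions
    are pooled into a slide-level score, so only a weak WSI-level label
    is needed for training."""
    def __init__(self, feat_dim=512):
        super().__init__()
        self.tile_head = nn.Sequential(
            nn.Linear(feat_dim, 128), nn.ReLU(),
            nn.Linear(128, 1), nn.Sigmoid(),    # per-tile sTIL% in [0, 1]
        )

    def forward(self, tile_features):           # (n_tiles, feat_dim)
        tile_scores = self.tile_head(tile_features).squeeze(-1)
        return tile_scores.mean(), tile_scores  # slide score, tile scores

# Training pairs the pooled slide score with the pathologist's WSI-level
# sTIL% via a regression loss, e.g. MSE.
```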
Radiotherapy plays an important role in the management of lung cancer for many patients. During radiation treatment, the respiratory motion of the patient impedes accurate targeting of the tumor, requiring large treatment margins. This leads to additional radiation to healthy tissue and the associated toxicity. Real-time adaptive radiotherapy is a promising direction for solving this problem: radiation beams are shaped continuously to track the tumor based on real-time image analysis, which is possible with, e.g., MRI-guided radiotherapy. To assist in the MR-Linac planning process, we developed a U-Net-based tumor tracking method that uses a double-encoder structure to incorporate both 3D+t planning CT images and a 3D planning scan with its corresponding segmentation. Our best model achieves a surface Dice of 0.60 and a recall of 92%.
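To make the double-encoder idea concrete, the sketch below fuses two 3D convolutional stems, one for the 3D+t planning CT (time treated as channels) and one for the planning scan with its segmentation, before a shared head. This is an assumed minimal rendition: the actual model is a full U-Net with down/upsampling and skip connections, which this version omits, and all widths are placeholders.

```python
import torch
import torch.nn as nn

def conv_block(c_in, c_out):
    return nn.Sequential(nn.Conv3d(c_in, c_out, 3, padding=1), nn.ReLU())

class DoubleEncoderSeg(nn.Module):
    """Two encoder stems fused before a shared segmentation head."""
    def __init__(self, t_frames=4):
        super().__init__()
        self.enc_dynamic = conv_block(t_frames, 16)  # 3D+t CT, time as channels
        self.enc_planning = conv_block(2, 16)        # planning scan + segmentation
        self.fuse = conv_block(32, 32)
        self.head = nn.Conv3d(32, 1, 1)              # tumor probability map

    def forward(self, ct_4d, planning):
        f = torch.cat([self.enc_dynamic(ct_4d), self.enc_planning(planning)], 1)
        return torch.sigmoid(self.head(self.fuse(f)))
```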
Ductal Carcinoma in Situ (DCIS) constitutes 20–25% of all diagnosed breast cancers and is a well-known potential precursor of invasive breast cancer [1]. The gold standard method for diagnosing DCIS involves the detection of calcifications and abnormal cell proliferation in mammary ducts in Hematoxylin and Eosin (H&E) stained whole-slide images (WSIs). Automatic duct detection may facilitate this task as well as downstream applications that currently require tedious, manual annotation of ducts; examples are grading of DCIS lesions [2] and prediction of local recurrence of DCIS [3]. Several methods have been developed for object detection in the field of deep learning. Such models are typically initialised with ImageNet transfer-learning features, as the limited availability of annotated medical images has hindered the creation of domain-specific encoders. Novel techniques such as self-supervised learning (SSL) promise to overcome this problem by utilising unlabelled data to learn feature encoders. SSL encoders trained on unlabelled ImageNet have demonstrated SSL’s capacity to produce meaningful representations, scoring higher than supervised features on the ImageNet 1% classification task [4]. In the domain of histopathology, dedicated feature encoders (Histo encoders) have been developed [5, 6]. In classification experiments with linear regression, frozen features of these encoders outperformed those of ImageNet encoders. However, when models initialised with histopathology and ImageNet encoders were fine-tuned on the same classification tasks, there were no differences in performance between the encoders [5, 6]. Furthermore, the transferability of SSL encodings to object detection is poorly understood [4]. These findings show that more research is needed to develop training strategies for SSL encoders that can enhance performance in relevant downstream tasks. In our study, we investigated whether current state-of-the-art SSL methods can provide model initialisations that outperform ImageNet pre-training on the task of duct detection in WSIs of breast tissue resections. We compared the performance of these SSL-based histopathology encodings (Histo-SSL) with ImageNet pre-training (supervised and self-supervised) and with training from scratch. Additionally, we compared the performance of our Histo-SSL encodings with the published Histo encoders by Ciga [5] and Mormont [6] on the same task.
Segmentation of digital mammograms (DMs) into background, breast, and pectoral muscle is an important pre-processing step for many medical imaging pipelines. Our aim is to propose a segmentation method suited for processed DMs that generalizes across cranio-caudal (CC) and medio-lateral oblique (MLO) projections, and across models of different vendors. A dataset of 247 diagnostic DM exams was used, totaling 493 CC and 494 MLO processed images, of which 199 (40.4%) and 486 (98.4%), respectively, contained a pectoral muscle. The images were acquired with 10 different DM models from GE (73%) and Siemens (27%). The multi-class segmentation was done by a U-Net trained with a multi-class weighted focal loss. Several types of data augmentation were used during training to generalize across model types, including random look-up table and random elastic and gamma transformations. The Dice coefficients for the segmentations were (mean ± std. dev.) 0.995 ± 0.005, 0.980 ± 0.016, and 0.839 ± 0.243 for background, breast, and pectoral muscle, respectively. Background segmentation did not differ significantly between CC and MLO images. The pectoral muscle segmentation resulted in a higher Dice coefficient for MLO (0.932 ± 0.104) than for CC images (0.636 ± 0.323). The false positive rate of pectoral muscle segmentation was 1.5% in CC images without any pectoral muscle. Across the different system models, the mean overall Dice coefficients ranged from 0.985 to 0.990. The developed method yielded accurate overall segmentation results, independent of view, and was able to generalize well over mammograms acquired by systems of different vendors.
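A multi-class weighted focal loss of the kind used here can be written compactly in PyTorch, as sketched below. The focusing parameter gamma = 2 is the common default from Lin et al.'s focal loss paper, and the per-class weights are left to the caller; neither is necessarily the setting used in this work.

```python
import torch
import torch.nn.functional as F

def weighted_focal_loss(logits, target, class_weights, gamma=2.0):
    """Multi-class focal loss: cross-entropy scaled by (1 - p_t)^gamma
    and per-class weights.
    logits: (N, C, H, W); target: (N, H, W) integer labels;
    class_weights: tensor of shape (C,)."""
    log_p = F.log_softmax(logits, dim=1)
    log_pt = log_p.gather(1, target.unsqueeze(1)).squeeze(1)  # log p of true class
    pt = log_pt.exp()
    w = class_weights.to(logits.device)[target]
    return (-w * (1 - pt) ** gamma * log_pt).mean()
```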
Purpose: A computer-aided diagnosis (CADx) system for breast masses is proposed, which incorporates both handcrafted and convolutional radiomic features embedded into a single deep learning model.
Approach: The model combines handcrafted and convolutional radiomic signatures into a multi-view architecture, which retrieves three-dimensional (3D) image information by simultaneously processing multiple two-dimensional mass patches extracted along different planes through the 3D mass volume. Each patch is processed by a stream composed of two concatenated parallel branches: a multi-layer perceptron fed with automatically extracted handcrafted radiomic features, and a convolutional neural network, for which discriminant features are learned from the input patches. All streams are then concatenated together into a final architecture, where all network weights are shared and the learning occurs simultaneously for each stream and branch. The CADx system was developed and tested for diagnosis of breast masses (N = 284) using image datasets acquired with independent dedicated breast computed tomography systems from two different institutions. The diagnostic classification performance of the CADx system was compared against other machine and deep learning architectures adopting handcrafted and convolutional approaches, and three board-certified breast radiologists.
Results: On a test set of 82 masses (45 benign, 37 malignant), the proposed CADx system performed better than all other model architectures evaluated, with an increase in the area under the receiver operating characteristic curve (AUC) of 0.05 ± 0.02, achieving a final AUC of 0.947 and outperforming the three radiologists (AUC = 0.814–0.902).
Conclusions: The system demonstrated its potential usefulness in breast cancer diagnosis by improving mass malignancy assessment.
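As a rough illustration of one stream from the Approach, the sketch below runs a small CNN branch over a mass patch in parallel with an MLP branch over its handcrafted radiomic features and concatenates the two embeddings. Branch widths, depths, and the radiomic feature count are placeholders; the full multi-view model would apply this weight-shared stream to patches from several planes and concatenate all stream outputs before a final classification head.

```python
import torch
import torch.nn as nn

class MassStream(nn.Module):
    """One per-view stream: CNN branch over the image patch in parallel
    with an MLP branch over handcrafted radiomic features."""
    def __init__(self, n_radiomics=100):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.mlp = nn.Sequential(nn.Linear(n_radiomics, 32), nn.ReLU())

    def forward(self, patch, radiomics):
        # patch: (N, 1, H, W); radiomics: (N, n_radiomics)
        return torch.cat([self.cnn(patch), self.mlp(radiomics)], dim=1)
```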
Computer-aided detection aims to improve breast cancer screening programs by helping radiologists to evaluate digital mammography (DM) exams. DM exams are generated by devices from different vendors, with diverse characteristics between and even within vendors. Physical properties of these devices and postprocessing of the images can greatly influence the resulting mammogram. As a result, a deep learning model trained on data from one vendor cannot readily be applied to data from another vendor. This paper investigates the use of tailored transfer learning methods based on adversarial learning to tackle this problem. We consider a database of DM exams (mostly bilateral and two views) generated by Hologic and Siemens vendors. We analyze two transfer learning settings: 1) unsupervised transfer, where Hologic data with soft lesion annotations at the pixel level and unlabelled Siemens data are used to annotate images in the latter set; 2) weakly supervised transfer, where exam-level labels for images from the Siemens systems are available. We propose tailored variants of recent state-of-the-art transfer learning methods which take into account the class imbalance and incorporate the knowledge provided by the annotations at exam level. Results of experiments indicate the beneficial effect of transfer learning in both settings. Notably, at 0.02 false positives per image, we achieve a sensitivity of 0.37, compared to 0.30 for a baseline with no transfer. Results indicate that using exam-level annotations gives an additional increase in sensitivity.
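One standard building block for such adversarial transfer is a gradient reversal layer (Ganin and Lempitsky), sketched below: features pass through unchanged on the forward pass, while the domain classifier's gradient is negated on the backward pass, pushing the encoder toward vendor-invariant features. This is a generic component, not the paper's tailored variants, which additionally handle class imbalance and exam-level labels.

```python
import torch
from torch.autograd import Function

class GradReverse(Function):
    """Identity on the forward pass; negates (and scales) the gradient on
    the backward pass, so a domain classifier trained on top of the
    features adversarially encourages domain-invariant representations."""
    @staticmethod
    def forward(ctx, x, lam):
        ctx.lam = lam
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad):
        return -ctx.lam * grad, None

def grad_reverse(x, lam=1.0):
    return GradReverse.apply(x, lam)

# Usage: domain_logits = domain_classifier(grad_reverse(features, lam))
```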
We propose an algorithm to recognize breast parenchyma regions containing mass-like abnormalities in dedicated breast CT images using texture feature descriptors. From 53 patient breast CT scans (29 of which contained masses), we first isolated the parenchyma through automatic segmentation, obtaining a total of 14,751 normal 2D image patches (negatives) and 2,100 containing a breast mass (positives). We extracted 141 texture features (10 first-order descriptors, 6 Haralick features, 20 run-length features, 45 structural and pattern descriptors, 60 Gabor features), which we then analyzed through multivariate analysis of variance (MANOVA) and linear discriminant analysis, resulting in an area under the ROC curve (AUC) of 0.9. We finally identified the most discriminant features through sequential forward selection, and used them to train and validate a neural network by dividing the data into multiple batches, with each batch always containing the whole set of positive cases and an equal number of different negative examples. To avoid possible bias due to the high skewness in class proportion, the network was trained on all these batches in turn, without re-initializing the network weights between batches. The network was tested using an additional independent set of 18 patient breast CT scans (8 normal and 10 containing a mass), with a total of 7,274 image patches (852 positives, 6,422 negatives) that were not used during the training/validation phase, resulting in 95.6% precision, 95.8% recall, and 0.99 AUC. Our results suggest that the proposed approach could be further evaluated and expanded for computer-aided detection tasks in breast CT imaging.
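For a flavor of the texture descriptors involved, the snippet below computes a few GLCM (Haralick-style) properties with scikit-image for a single patch; the distances, angles, and property selection are illustrative assumptions, and the full 141-feature set with first-order, run-length, structural, and Gabor descriptors is not reproduced.

```python
import numpy as np
from skimage.feature import graycomatrix, graycoprops

def haralick_like_features(patch_u8):
    """GLCM descriptors for one 2D uint8 grayscale patch, averaged over
    four directions at distance 1."""
    glcm = graycomatrix(patch_u8, distances=[1],
                        angles=[0, np.pi / 4, np.pi / 2, 3 * np.pi / 4],
                        levels=256, symmetric=True, normed=True)
    props = ("contrast", "homogeneity", "energy", "correlation")
    return np.array([graycoprops(glcm, p).mean() for p in props])
```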
Digital breast tomosynthesis is rapidly replacing digital mammography as the basic x-ray technique for evaluation of the breasts. However, the sparse sampling and limited angular range give rise to different artifacts, which manufacturers try to solve in several ways. In this study we propose an extension of the Learned Primal-Dual algorithm for digital breast tomosynthesis. The Learned Primal-Dual algorithm is a deep neural network consisting of several ‘reconstruction blocks’, which take raw sinogram data as the initial input, perform a forward and a backward pass by taking projections and back-projections, and use a convolutional neural network to produce an intermediate reconstruction result which is then improved further by the successive reconstruction block. We extend the architecture by providing breast thickness measurements as a mask to the neural network and allow it to learn how to use this thickness mask. We trained the algorithm on digital phantoms and the corresponding noise-free/noisy projections, and then tested it on digital phantoms for varying levels of noise. Reconstruction performance of the algorithms was compared visually, and using MSE loss and the Structural Similarity Index. Results indicate that the proposed algorithm outperforms the baseline iterative reconstruction algorithm in terms of reconstruction quality for both breast edges and internal structures, and is robust to noise.
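The thickness-mask extension can be pictured as one extra input channel in each reconstruction block. Below is a minimal sketch of a single primal update; the back-projected dual variable is assumed to be supplied by a differentiable projection/back-projection operator outside the block, and the channel counts, depth, and residual form are illustrative assumptions rather than the full Learned Primal-Dual scheme.

```python
import torch
import torch.nn as nn

class PrimalBlock(nn.Module):
    """One primal update of a Learned Primal-Dual-style scheme, with the
    breast thickness mask appended as an extra input channel so the
    network can learn how to use it."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.PReLU(),
            nn.Conv2d(32, 1, 3, padding=1),
        )

    def forward(self, primal, backprojected_dual, thickness_mask):
        # All inputs: (N, 1, H, W). backprojected_dual is A^T applied to
        # the dual variable by the system operator, computed outside.
        x = torch.cat([primal, backprojected_dual, thickness_mask], dim=1)
        return primal + self.net(x)  # residual update of the image estimate
```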
Computer-aided detection or decision support systems aim to improve breast cancer screening programs by helping radiologists to evaluate digital mammography (DM) exams. Commonly such methods proceed in two steps: selection of candidate regions for malignancy, and subsequent classification as either malignant or not. In this study, we present a candidate detection method based on deep learning to automatically detect and additionally segment soft tissue lesions in DM. A database of DM exams (mostly bilateral and two views) was collected from our institutional archive. In total, 7196 DM exams (28294 DM images) acquired with systems from three different vendors (General Electric, Siemens, Hologic) were collected, of which 2883 contained malignant lesions verified with histopathology. Data was randomly split on an exam level into training (50%), validation (10%), and testing (40%) sets for a deep neural network with a U-Net architecture. The U-Net classifies the image and additionally provides lesion segmentation. Free-response receiver operating characteristic (FROC) analysis was used to evaluate the model, on an image and on an exam level. On an image level, a maximum sensitivity of 0.94 at 7.93 false positives (FP) per image was achieved. Similarly, on an exam level, a maximum sensitivity of 0.98 at 7.81 FP per image was achieved. In conclusion, the method could be used as a candidate selection model with high accuracy and with the additional information of lesion segmentation.
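FROC analysis of the kind reported here can be computed from scored candidate detections as sketched below, assuming each true-positive candidate has been matched to a distinct lesion beforehand; the function and variable names are illustrative, not taken from this work.

```python
import numpy as np

def froc_points(scores, is_tp, n_lesions, n_images):
    """FROC curve from candidate detections: for each score threshold
    (one per candidate, in descending order), lesion sensitivity versus
    mean false positives per image.
    scores: candidate confidences; is_tp: bool per candidate, True if it
    hits a distinct true lesion."""
    order = np.argsort(scores)[::-1]
    hits = np.asarray(is_tp)[order]
    tp = np.cumsum(hits)
    fp = np.cumsum(~hits)
    return fp / n_images, tp / n_lesions  # (FPs/image, sensitivity)
```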
Ventricular volume and its progression are known to be linked to several brain diseases such as dementia and schizophrenia. Accurate measurement of ventricular volume is therefore vital for longitudinal studies of these disorders, making automated ventricle segmentation algorithms desirable. In the past few years, deep neural networks have been shown to outperform classical models in many imaging domains. However, the success of deep networks depends on manually labeled data sets, which are expensive to acquire, especially for higher-dimensional data in the medical domain. In this work, we show that deep neural networks can be trained on much-cheaper-to-acquire pseudo-labels (e.g., generated by other, less accurate automated methods) and still produce segmentations more accurate than the labels themselves. To show this, we use noisy segmentation labels generated by a conventional region growing algorithm to train a deep network for lateral ventricle segmentation. On a large manually annotated test set, we then show that the network significantly outperforms the conventional region growing algorithm that was used to produce its training labels. Our experiments report a Dice Similarity Coefficient (DSC) of 0.874 for the trained network compared to 0.754 for the conventional region growing algorithm (p < 0.001).
KEYWORDS: Image segmentation, Digital breast tomosynthesis, Breast, Mammography, Systems modeling, Detection and tracking algorithms, Tissues, Convolutional neural networks, 3D modeling, Image processing algorithms and systems
Digital breast tomosynthesis (DBT) has superior detection performance to digital mammography (DM) for population-based breast cancer screening, but the higher number of images that must be reviewed poses a challenge for its implementation. This may be ameliorated by creating a two-dimensional synthetic mammographic image (SM) from the DBT volume, containing the most relevant information. When creating an SM, it is of utmost importance to have an accurate lesion localization algorithm, while segmenting fibroglandular tissue could also be beneficial. These tasks encounter an extra challenge when working with images in the medio-lateral oblique view, due to the presence of the pectoral muscle, which has a similar radiographic density. In this work, we present an automatic pectoral muscle segmentation model based on a U-Net deep learning architecture, trained with 136 DBT images acquired with a single system (covering different BI-RADS® densities and pathological findings). The model was tested on 36 DBT images from that same system, resulting in a Dice similarity coefficient (DSC) of 0.977 (0.967–0.984). In addition, the model was tested on 125 images from two different systems and three different modalities (DBT, SM, DM), obtaining DSCs between 0.947 and 0.970, a range determined visually to provide adequate segmentations. For reference, a resident radiologist independently annotated a mix of 25 cases, obtaining a DSC of 0.971. The results suggest the possibility of using this model for inter-manufacturer DBT, DM and SM tasks that benefit from segmentation of the pectoral muscle, such as SM generation, computer-aided detection systems, or patient dosimetry algorithms.