1. Introduction

Coronary artery disease (CAD) is the narrowing of the coronary arteries caused by a build-up of atherosclerotic plaques. As the most common type of heart disease, CAD leads to one in seven deaths in the United States.1 Optical coherence tomography (OCT) has been recognized as a valuable tool for imaging coronary tissue structures owing to its high resolution.2 However, real-time interpretation of OCT images requires substantial expertise and prior training. Additionally, the power of OCT interpretation, especially of pathological regions, is hindered by the lack of histopathological correlation. At present, direct histopathological analysis requires an invasive and time-consuming evaluation that involves post-mortem tissue examination. The use of multiple reagents in histopathology can also degrade tissue imaging. Histopathological analysis is therefore not suitable for clinical use in patients, who require real-time tissue characterization of the coronary arteries. Incorporating histopathological visualization into real-time OCT imaging thus holds great potential to complement OCT. A typical example of generating virtually stained histology images from OCT images of human coronary arteries is shown in Fig. 1. To date, few frameworks have been developed to generate virtually stained histology from OCT images.3,4 Winetraub et al. used the Pix2Pix generative adversarial network (GAN) to generate virtually stained hematoxylin and eosin (H&E) histology for human skin tissue.3 However, the Pix2Pix GAN for virtual staining requires a pixel-wise paired OCT and H&E image dataset. The creation of such a dataset demands a significant investment of resources and labor, including embedding samples in fluorescent gel, photo-bleaching, and manual fine alignment.3 The method also lacks generalizability to blood vessels, which are deformable soft tissue.
Our previous method4 can segment the three-layer structure (i.e., intima, media, and adventitia) in both OCT and H&E images, thereby generating virtually stained images optimized for the different layers of the human coronary artery. However, its performance degrades when pathological patterns, such as calcium and lipid accumulation, alter the typical three-layer structure of human coronary arteries. To generate pathology-related regions from an unpaired dataset, we propose a structurally constrained pathology-aware convolutional transformer GAN (SCPAT-GAN) to generate virtually stained H&E histology images from OCT images. The proposed SCPAT-GAN incorporates two key components to enhance image quality for both normal and pathological coronary samples: a structural constraining module and a pathology awareness module. In summary, our main contributions include the following.
2. Methods

2.1. Design of SCPAT-GAN

2.1.1. Network architecture

The design of SCPAT-GAN is shown in Fig. 2. The SCPAT-GAN consists of two convolutional transformer generators (G_OH and G_HO) and two discriminators (D_H and D_O). The transformer structure possesses self-attention mechanisms that provide the global context of a given data sample even at the lowest layer. G_OH transfers images from the OCT domain to the histology domain; G_HO transfers images from the histology domain to the OCT domain. The two generators share a similar structure. D_H is the discriminator for histology images and D_O is the discriminator for OCT images. The symbols x_O and x_H stand for OCT and histology images, respectively. The convolutional transformer generators (G_OH and G_HO) take advantage of a U-Net5-like structure to extract multi-scale features. The multi-scale features are sent to the Swin transformer block (STB) and the structural constraint and pathology aware (SCPA) module. The STB is a deep neural network architecture that employs multiple residual Swin transformer sub-blocks (RSTBs) to extract features from input data. The RSTBs contain several Swin transformer layers (STLs)6 that facilitate local attention and cross-window interaction learning. The feature extraction process of an RSTB is expressed as F_out = Conv(F_STL) + F_in, where F_STL denotes the features generated from the STLs, Conv represents a 2D convolutional layer, and F_in represents the input feature of the RSTB. Each STL comprises layer normalization, multi-head self-attention (MHA) modules, residual connections, and a two-layer multilayer perceptron (MLP) with Gaussian error linear unit (GELU) non-linearity. Given an input of size H × W × C, the STL reshapes the input to a feature map of size (HW/M^2) × M^2 × C by partitioning the input into non-overlapping windows of M × M patches, where HW/M^2 is the total number of windows. For a local window feature X ∈ R^{M^2 × C}, the query Q, key K, and value V matrices are computed as Q = X P_Q, K = X P_K, and V = X P_V, where P_Q, P_K, and P_V are projection matrices shared across different windows.
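As a concrete illustration, the window partitioning and per-window query/key/value projection described above can be sketched as follows. This is a minimal numpy sketch under our own naming (`window_partition`, `qkv_projection`, and the toy shapes are illustrative assumptions, not the released implementation):

```python
import numpy as np

def window_partition(x, M):
    """Split an (H, W, C) feature map into non-overlapping M x M windows.

    Returns an array of shape (H*W/M**2, M*M, C): one row of flattened
    patches per window, matching the reshaping performed inside each STL.
    """
    H, W, C = x.shape
    assert H % M == 0 and W % M == 0, "H and W must be divisible by M"
    x = x.reshape(H // M, M, W // M, M, C)
    return x.transpose(0, 2, 1, 3, 4).reshape(-1, M * M, C)

def qkv_projection(X, P_Q, P_K, P_V):
    """Compute Q, K, V for one local window feature X (shape M*M x C).

    The projection matrices P_Q, P_K, P_V (shape C x d) are shared
    across all windows, as in the Swin transformer layer.
    """
    return X @ P_Q, X @ P_K, X @ P_V

# toy example: an 8x8 feature map with 3 channels, window size M = 4
rng = np.random.default_rng(0)
feat = rng.standard_normal((8, 8, 3))
wins = window_partition(feat, 4)  # -> (4, 16, 3): 4 windows of 16 patches
Q, K, V = qkv_projection(wins[0], *(rng.standard_normal((3, 2)) for _ in range(3)))
```

Note that the first window recovers exactly the top-left 4 × 4 region of the feature map, which is what makes the attention in each STL strictly local to its window.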
The self-attention of each head is calculated as Attention(Q, K, V) = SoftMax(Q K^T / sqrt(d) + B) V, where d denotes the query dimension, M^2 stands for the number of patches in a window, and B is the learnable relative position bias.

2.1.2. Structural constraining and pathology awareness

The SCPA module is based on a transformer encoder-decoder architecture, which guides the virtual staining procedure. The SCPA module performs the structural constraining and pathology awareness functions by segmenting the human coronary layers and classifying the type of coronary sample (normal or pathological). The multi-scale features are split into a sequence of patches x ∈ R^{N × (P^2 · C)}, where P stands for the patch size, N represents the number of patches, and C is the number of channels of the multi-scale features. The patches are flattened and then linearly projected to an embedding sequence x_e ∈ R^{N × D}, where D is the embedding dimension. Learnable position embeddings are added to the sequence of patch embeddings to generate the tokens for the Encoder. The Encoder maps the input sequence to z ∈ R^{N × D}, an encoding sequence containing contextualized information of the multi-scale features. The SCPA module is designed to be aware of pathological patterns as well as to maintain and constrain the normal structure of coronary samples. In the case of normal coronary samples, z is decoded to a segmentation map s ∈ R^{H × W × K}, where K = 3 represents the three-layer structure of human coronary arteries. The segmentation map is acquired by the SCPA module taking the scalar product between the patch embeddings z_patch and the class embeddings z_cls: s = z_patch · z_cls^T, where z_patch is acquired by decoding z, and z_cls is acquired by decoding a set of three randomly initialized learnable class embeddings [c_1, c_2, c_3] corresponding to the three coronary layers. In the case of diseased coronary samples, the patch embeddings are sent to a two-layer MLP for classification between normal and pathological coronary images: y = MLP(z_patch). Also, the patch embeddings z_patch are concatenated to the features extracted from the STB and then merged and up-sampled for the OCT → Histology and Histology → OCT conversion.
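The scalar-product decoding of the segmentation map can be sketched as below. The shapes and names (N patch embeddings of dimension D, K = 3 layer classes, a softmax over classes before upsampling) are our illustrative assumptions, not the authors' released code:

```python
import numpy as np

def softmax(z, axis=-1):
    """Numerically stable softmax along the given axis."""
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def decode_segmentation(z_patch, z_cls):
    """Decode per-patch class scores from patch and class embeddings.

    z_patch: (N, D) decoded patch embeddings
    z_cls:   (K, D) decoded class embeddings, one per coronary layer
             (intima, media, adventitia)
    Returns an (N, K) map of class probabilities; reshaping and
    upsampling the N patch scores back to the H x W image grid would
    yield the final segmentation map.
    """
    scores = z_patch @ z_cls.T  # scalar products between embeddings, (N, K)
    return softmax(scores, axis=-1)

# toy example: 64 patches, embedding dimension 16, 3 layer classes
rng = np.random.default_rng(1)
probs = decode_segmentation(rng.standard_normal((64, 16)),
                            rng.standard_normal((3, 16)))
```

Because each patch's scores are just dot products against the three learnable class embeddings, the segmentation head adds very few parameters on top of the shared encoder.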
2.1.3. Loss function

The loss function of SCPAT-GAN consists of five terms: the adversarial loss L_adv, cycle-consistency loss L_cyc, embedding loss L_emb, structural constraint loss L_sc, and pathology awareness loss L_pa. We follow the definitions of L_adv and L_cyc made by Zhu et al.7 and the definition of L_emb made by Liu et al.8 The total loss is L = L_adv + λ_1 L_cyc + λ_2 L_emb + λ_3 L_sc + λ_4 L_pa, where λ_1, λ_2, λ_3, and λ_4 are hyper-parameters. G_OH and G_HO are the two generators that generate virtual histology images from OCT images and virtual OCT images from histology images, respectively. The SCPA modules in the two generators perform the structural constraining and pathology awareness functions. The L_sc is implemented as a pixel-wise cross-entropy segmentation loss over both domains, L_sc = −(1/N_H) Σ_i Σ_k y^H_{i,k} log(s^H_{i,k}) − (1/N_O) Σ_i Σ_k y^O_{i,k} log(s^O_{i,k}), where N_H and N_O stand for the numbers of pixels in the H&E and OCT segmentation maps, y^H and y^O are the ground-truth pixel labels of the coronary layers for H&E and OCT images, respectively, and k indexes the K = 3 categories of coronary layers. The L_pa is implemented as a binary cross-entropy classification loss between the predicted pathology labels and the ground-truth pathology labels g_H and g_O for the H&E and OCT images. We aim to solve the min-max optimization problem min_{G_OH, G_HO} max_{D_H, D_O} L.

3. Experiments

3.1. Experimental Settings

3.1.1. Experimental dataset

Human coronary samples were collected from the School of Medicine at the University of Alabama at Birmingham (UAB). Specimens were imaged via a commercial OCT system (Thorlabs Ganymede, Newton, New Jersey). A total of 194 OCT images were collected from 23 patients with an imaging depth of 2.56 mm.9 The pixel size was within a B-scan. The width of the images ranged from 2 mm to 4 mm depending on the size of the sample. After OCT imaging, samples were processed for H&E histology at UAB. We rescaled the H&E images in the Aperio ImageScope software to enforce a matching pixel size. Among the dataset, 112 OCT images are from normal samples with the three-layer structure (i.e., intima, media, and adventitia); 82 OCT and H&E images contain pathological patterns. At the pixel level, we label the structure (e.g., the layer structure) in a subset of OCT and H&E images for training purposes.
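The weighted combination of the five loss terms in Sec. 2.1.3 can be sketched as follows. This is a minimal sketch: the paper reports hyper-parameter values of 1, 0.2, 5, and 5, but which λ pairs with which term is our assumption, and the term values passed in are placeholders:

```python
import numpy as np

def cross_entropy(probs, labels, eps=1e-8):
    """Mean cross-entropy between predicted probabilities (N, K)
    and one-hot ground-truth labels (N, K), as used for L_sc."""
    return float(-np.mean(np.sum(labels * np.log(probs + eps), axis=1)))

def total_loss(l_adv, l_cyc, l_emb, l_sc, l_pa,
               lam_cyc=1.0, lam_emb=0.2, lam_sc=5.0, lam_pa=5.0):
    """Weighted sum of the five SCPAT-GAN loss terms.

    The lam_* defaults follow the empirically chosen values reported
    in the paper (1, 0.2, 5, and 5); the pairing of each lambda with
    its term is an assumption made for this sketch.
    """
    return (l_adv + lam_cyc * l_cyc + lam_emb * l_emb
            + lam_sc * l_sc + lam_pa * l_pa)

# placeholder scalar term values, just to show the weighting structure
loss = total_loss(l_adv=0.9, l_cyc=0.6, l_emb=0.3, l_sc=0.2, l_pa=0.1)
```

In a real training loop each scalar would come from the corresponding sub-network (discriminators for l_adv, reconstructed images for l_cyc, and the SCPA heads for l_sc and l_pa) before the generators are updated against the combined objective.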
At the image level, we label each OCT or H&E image as pathological or normal. The OCT and H&E images are divided into non-overlapping patches. We randomly flip the patches from left to right for data augmentation. The training set contains 4297 OCT image patches and 4297 H&E image patches.

3.1.2. Implementation details

We adopt three convolution and transposed convolution layers with a stride of two to build a U-Net-like structure for generating multi-scale feature maps. For the STB, we follow the design in our previous work.6 Our design of the SCPA module is inspired by the Segmenter model.10 Different from the Segmenter,10 we design the SCPA module to perform both segmentation and classification tasks, which suits our need for the structural constraining and pathology awareness functions during virtual staining. The SCPAT-GAN is implemented in PyTorch. For training, the hyper-parameters λ_1, λ_2, λ_3, and λ_4 are set to 1, 0.2, 5, and 5 empirically. The pixel values of OCT and H&E images are scaled to [0, 1]. The batch size is 9. The learning rate decays linearly every 2 epochs. In total, the SCPAT-GAN is trained for 10,000 epochs to ensure convergence. The experiments are carried out on an RTX A6000 GPU.

3.1.3. Metrics

We measure the similarity of pairs of virtually stained histology and real histology images using reference-free metrics, including the Fréchet inception distance (FID)11 and the perceptual hash value (PHV).8 The FID is defined as FID = ||μ_v − μ_r||^2 + Tr(Σ_v + Σ_r − 2 (Σ_v Σ_r)^{1/2}), where μ_v and μ_r are the feature means of the virtually stained and real histology images, Tr is the trace of the matrix, and Σ_v and Σ_r are the covariance matrices of the virtually stained and real histology images. The PHV compares hash codes obtained by applying a unit step function u with a preset threshold t to the average-pooled feature maps f_i extracted from the i'th layer of ResNet-101, where N is the total number of extracted feature maps and avg is the average pooling operation that turns 3-D features into 1-D features.
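As a rough sketch of the FID above, the computation below assumes diagonal feature covariances so that the matrix square root becomes elementwise; real FID implementations use full covariances of Inception-network features and a proper matrix square root:

```python
import numpy as np

def fid_diagonal(mu_v, var_v, mu_r, var_r):
    """FID between two Gaussians with diagonal covariances.

    FID = ||mu_v - mu_r||^2 + Tr(S_v + S_r - 2 (S_v S_r)^{1/2});
    for diagonal covariances the trace term reduces to an
    elementwise sum: sum(var_v + var_r - 2 * sqrt(var_v * var_r)).
    """
    mean_term = float(np.sum((mu_v - mu_r) ** 2))
    trace_term = float(np.sum(var_v + var_r - 2.0 * np.sqrt(var_v * var_r)))
    return mean_term + trace_term

# identical feature distributions give an FID of 0
mu = np.array([0.5, -1.0])
var = np.array([1.0, 2.0])
d0 = fid_diagonal(mu, var, mu, var)
```

Lower FID means the virtually stained and real histology feature distributions are closer, which is why Table 1 reports lower-is-better FID alongside higher-is-better PHV.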
We use three variations of the PHV score (PHV1, PHV2, and PHV3), which are extracted from different levels of ResNet-101. We set t to 0.02. Also, we designed a protocol involving two pathologists (Dr. Silvio H. Litovsky, referred to as pathologist A, and Dr. Charles C. Marboe, referred to as pathologist B) to evaluate the quality of the virtually stained H&E images. Real and virtually stained H&E images are given to the pathologists, who are blinded to the true labels, to make predictions. The two pathologists work independently of each other. We compare the pathologists' predictions with the true labels, following the setup of the visual Turing test.12,13

3.2. Results and Discussion

3.2.1. Quantitative analysis

The quantitative results (calculated by three-fold cross-validation) of SCPAT-GAN, as well as two state-of-the-art methods, for generating virtually stained H&E images are shown in Table 1. Compared with our previous method (Coronary-GAN4) and Cycle-GAN, the SCPAT-GAN generates virtually stained H&E images of better quality, with lower FID scores and higher PHV scores, for the normal, pathological, and whole datasets. These scores indicate that the virtually stained histology and real histology are perceptually similar. Moreover, we had two pathologists, each with more than 30 years of experience, evaluate the quality of the virtually stained H&E images. The pathologists, who are blinded to the true labels, manually identify whether an image is real or virtual.

Table 1. The FID and PHV scores of SCPAT-GAN, Coronary-GAN, and Cycle-GAN. The PHV scores are calculated from different levels of the feature maps (PHV1, PHV2, and PHV3). We report evaluation results for the normal, pathological, and whole datasets. All results are calculated using three-fold cross-validation.
Note: The best performance results are highlighted in bold.

The results of the pathologists' evaluation are shown in Fig. 3. Among the 60 images in total (half virtual and half real), more than half (42 images by pathologist A and 33 images by pathologist B) are deemed "real." Among the virtually stained images, more than half (19 images) are deemed "real" by pathologist A and half (15 images) are deemed "real" by pathologist B. We calculated the accuracy (pathologist A: 0.56; pathologist B: 0.55) and precision (pathologist A: 0.54; pathologist B: 0.54) of the evaluation results from the two pathologists. We compare the evaluation results with those of random guessing (in theory, accuracy and precision should be 0.5 for an observer who makes choices randomly). We found that the average accuracy (0.55) and precision (0.55) are close to those of random guessing. The average sensitivity (0.68) is higher, which indicates that the pathologists are capable of identifying real histology images. However, the average specificity (0.43) is lower, which means the virtually stained images are less likely to be identified as virtual. Thus, the quality of the virtually stained images is close to that of real histology images according to the pathologists' judgment. Moreover, the intraclass correlation coefficient (ICC) between the evaluation results of the two pathologists is 0.014, indicating low inter-reader agreement, consistent with the images being hard to distinguish.

3.2.2. Ablation study

We perform an ablation study by removing the structural constraining module (PAT-GAN), the pathology awareness module (SCT-GAN), or both (T-GAN). The models are retrained and compared with the full SCPAT-GAN design. The results are reported in Table 2. When both the structural constraining and pathology awareness modules are equipped, SCPAT-GAN reaches the best performance. The ablated model without either module (T-GAN) reports a compromised performance.
Table 2. The ablation study. We remove the pathology awareness module (SCT-GAN), the structural constraining module (PAT-GAN), and both modules (T-GAN). The ablated models, SCT-GAN, PAT-GAN, and T-GAN, are retrained.
Note: The best performance results are highlighted in bold.

3.2.3. Qualitative analysis

We visually inspect the virtually stained H&E images generated by SCPAT-GAN in Fig. 4. For normal coronary samples, the SCPAT-GAN is capable of generating the three-layer structure; for pathological coronary samples, the SCPAT-GAN is capable of resolving lipid-rich (red arrow) and calcified patterns (yellow star). Compared with real H&E images, the virtually stained H&E images generated by SCPAT-GAN show similar patterns for lipid-rich and calcified regions. In contrast, Coronary-GAN4 and Cycle-GAN fail to generate pathological patterns. The proposed SCPAT-GAN allows the generation of 3D virtual H&E volumes for both normal and pathological human coronary samples. As shown in Fig. 5, we demonstrate 3D virtual H&E visualization for normal [Fig. 5(a)] and pathological [Fig. 5(d)] coronary samples. Such 3D H&E visualization is impossible to acquire from the conventional biochemical staining process; it provides an intuitive way of presenting histological information and reduces the randomness of the H&E sectioning process.14

4. Discussion

In this paper, we design a convolutional transformer GAN, namely SCPAT-GAN, for generating virtually stained H&E histology from OCT images. Our SCPAT-GAN algorithm is capable of virtually staining OCT images of human coronary samples. The SCPAT-GAN does not require pixel-wise matched OCT and H&E datasets. By incorporating the structural constraining and pathology awareness functions in the SCPAT-GAN, our method outperforms existing methods, as confirmed by both objective metrics and the pathologists' evaluation. Compared with other label-free15 or stain-to-stain8 works for virtual staining of histology,16 which focus on top-view images or other image modalities, our SCPAT-GAN is designed for cross-sectional, depth-resolved OCT images and human coronary samples.
Moreover, the proposed SCPAT-GAN is capable of generating 3D virtually stained H&E visualization for coronary samples, which is impossible to acquire using a conventional biochemical staining process. As the first study to demonstrate the feasibility of virtually stained histology from OCT images with non-paired training, our study does not focus on computational optimization. In the future, we will further reduce the computational overhead of SCPAT-GAN via lightweight neural networks17 and implement parallel computing for 3D virtual histology. Also, we plan to enable the SCPAT-GAN in intravascular OCT imaging, toward assisting percutaneous coronary intervention. Furthermore, we will acquire more data and differentiate pathological patterns to provide fine-grained image-wise labels. Our current approach still requires image-level labels of normal and pathological data and pixel-level layer annotations; we will explore self-supervised approaches to address this issue. In addition, we will explore other use cases of the SCPAT-GAN, such as generating multiple types of virtual staining (e.g., Van Gieson, Toluidine blue, and Alcian blue staining) and virtually staining other samples (e.g., human skin and eye).

5. Conclusion

In this paper, we develop a deep learning model, namely SCPAT-GAN, for generating virtual histology information. Our work is the first to generate virtual H&E images with pathological patterns for coronary samples based on OCT. The proposed framework has great potential to provide real-time histopathological information during an OCT imaging procedure.

Code and Data Availability

Data underlying the results presented in this paper are not publicly available at this time but may be obtained from the authors upon reasonable request.

Funding

This work was supported in part by the National Science Foundation (Grant Nos. CRII-2222739 and CAREER-2239810) and the New Jersey Health Foundation.
Acknowledgments

The authors would like to thank Dr. Dezhi Wang from the University of Alabama at Birmingham for histology service.

References

1. R. Hajar, "Risk factors for coronary artery disease: historical perspectives," Heart Views 18(3), 109 (2017). https://doi.org/10.4103/HEARTVIEWS.HEARTVIEWS_106_17
2. G. J. Tearney et al., "Consensus standards for acquisition, measurement, and reporting of intravascular optical coherence tomography studies: a report from the international working group for intravascular optical coherence tomography standardization and validation," J. Am. Coll. Cardiol. 59(12), 1058–1072 (2012). https://doi.org/10.1016/j.jacc.2011.09.079
3. Y. Winetraub et al., "OCT2Hist: non-invasive virtual biopsy using optical coherence tomography," (2021).
4. X. Li et al., "Structural constrained virtual histology staining for human coronary imaging using deep learning," in IEEE 20th Int. Symp. Biomed. Imaging (ISBI), 1–5 (2023). https://doi.org/10.1109/ISBI53787.2023.10230480
5. O. Ronneberger, P. Fischer, and T. Brox, "U-Net: convolutional networks for biomedical image segmentation," Lect. Notes Comput. Sci. 9351, 234–241 (2015). https://doi.org/10.1007/978-3-319-24574-4_28
6. J. Liang et al., "SwinIR: image restoration using Swin transformer," in IEEE/CVF Int. Conf. Comput. Vis. Workshops (ICCVW), 1833–1844 (2021). https://doi.org/10.1109/ICCVW54120.2021.00210
7. J.-Y. Zhu et al., "Unpaired image-to-image translation using cycle-consistent adversarial networks," in IEEE Int. Conf. Comput. Vis. (ICCV), 2242–2251 (2017). https://doi.org/10.1109/ICCV.2017.244
8. S. Liu et al., "Unpaired stain transfer using pathology-consistent constrained generative adversarial networks," IEEE Trans. Med. Imaging 40(8), 1977–1989 (2021). https://doi.org/10.1109/TMI.2021.3069874
9. X. Li et al., "Multi-scale reconstruction of undersampled spectral-spatial OCT data for coronary imaging using deep learning," IEEE Trans. Biomed. Eng. 69(12), 3667–3677 (2022). https://doi.org/10.1109/TBME.2022.3175670
10. R. Strudel et al., "Segmenter: transformer for semantic segmentation," in IEEE/CVF Int. Conf. Comput. Vis. (ICCV), 7242–7252 (2021). https://doi.org/10.1109/ICCV48922.2021.00717
11. M. Heusel et al., "GANs trained by a two time-scale update rule converge to a local Nash equilibrium," in Proc. 31st Int. Conf. Neural Inf. Process. Syst. (NIPS'17), 6629–6640 (2017).
12. H. Y. Park et al., "Realistic high-resolution body computed tomography image synthesis by using progressive growing generative adversarial network: visual Turing test," JMIR Med. Inf. 9(3), e23328 (2021). https://doi.org/10.2196/23328
13. Y. Myong et al., "Evaluating diagnostic content of AI-generated chest radiography: a multi-center visual Turing test," PLoS One 18(4), e0279349 (2023). https://doi.org/10.1371/journal.pone.0279349
14. S. W. Dyson et al., "Impact of thorough block sampling in the histologic evaluation of melanomas," Arch. Dermatol. 141(6), 734–736 (2005). https://doi.org/10.1001/archderm.141.6.734
15. R. Cao et al., "Label-free intraoperative histology of bone tissue via deep-learning-assisted ultraviolet photoacoustic microscopy," Nat. Biomed. Eng. 7(2), 124–134 (2023). https://doi.org/10.1038/s41551-022-00940-z
16. B. Bai et al., "Deep learning-enabled virtual histological staining of biological samples," Light Sci. Appl. 12(1), 57 (2023). https://doi.org/10.1038/s41377-023-01104-7
17. S. Belousov, "MobileStyleGAN: a lightweight convolutional neural network for high-fidelity image synthesis," (2021).
Biography

Xueshen Li is currently a PhD candidate at Stevens Institute of Technology. He received his BS degree in 2018 from Northeastern University in Shenyang, China, and his MS degree in 2020 from Eindhoven University of Technology in the Netherlands. His current research interests include deep learning and medical image processing.

Hongshan Liu received her MS degree in electrical engineering from the University of Michigan-Ann Arbor and her BS degree in physics from Zhejiang University. She is a doctoral student in biomedical engineering at Stevens Institute of Technology. Her research focuses on deep learning-based image processing in the clinical applications of optical coherence tomography.

Xiaoyu Song received her PhD in biostatistics from Columbia University. She is an assistant professor at the Icahn School of Medicine at Mount Sinai. Her research interest is in biostatistics and statistical genomics.

Charles C. Marboe is a professor emeritus of pathology and cell biology at Columbia University Medical Center. He has 42 years of experience in cardiovascular pathology.

Brigitta C. Brott is an interventional cardiologist with a background in materials science and engineering. She obtained her cardiology and interventional cardiology training at Duke University Medical Center. She is a professor of medicine and biomedical engineering at the University of Alabama at Birmingham. Her research interests include novel coatings to improve healing after device implantation and optimization of imaging and physiology assessments to guide cardiac interventional procedures.