Abstract
Deep learning (DL) is a subdomain of artificial intelligence algorithms capable of automatically evaluating subtle graphical features to make highly accurate predictions, and has recently been popularized in multiple imaging-related tasks. Because of its ability to analyze medical imaging such as radiology scans and digitized pathology specimens, DL has significant clinical potential as a diagnostic or prognostic tool. Coupled with rapidly increasing quantities of digital medical data, numerous novel research questions and clinical applications of DL within medicine have already been explored. Similarly, DL research and applications within hematology are rapidly emerging, although these are still largely in their infancy. Given the exponential rise of DL research for hematologic conditions, it is essential for the practising hematologist to be familiar with the broad concepts and pitfalls related to these new computational techniques. This narrative review provides a visual glossary for key deep learning principles, as well as a systematic review of published investigations within malignant and non-malignant hematologic conditions, organized by the different phases of clinical care. To assist the unfamiliar reader, this review highlights key portions of the current literature and summarizes important considerations for the critical understanding of deep learning development and implementation in clinical practice.
Introduction
Recent advances in large-scale data storage, availability, and computational power have led to significant interest in the development of new techniques for “big data” analysis. Rapidly evolving artificial intelligence (AI) algorithms aim to efficiently utilize vast amounts of information with minimal human interaction to address tasks that automate or improve upon human-level assessment. Artificial intelligence takes many forms and includes domains of deep learning (DL), convolutional neural networks (CNN), and other related techniques that are capable of processing imaging data quickly and automatically. Research divisions within commercially successful technology companies have popularized DL models for vision-related tasks, such as facial recognition, image segmentation, object detection, and many other examples that are currently being integrated into daily life.
Within the medical field, visual assessment of digitized clinical imaging and biospecimens by physicians is critical in numerous phases of clinical care for patients. As a result, early investigations that employ clinical DL using histology slides or radiological images within medicine have produced promising results, including diagnosis detection,1 clinical subtyping,2 cancer mutation prediction,3,4 and survival.5 Recognizing the clinical importance of these algorithms, the US Food and Drug Administration has approved a number of novel AI and DL products.6
However, DL algorithms exploring malignant and non-malignant hematologic conditions are still scarce. With digitization tools generating larger biospecimen image databases7,8 and researchers becoming increasingly familiar with DL techniques, examples of applications in hematology are growing exponentially.9-14 As such, it is increasingly important for hematologists to be familiar with the broad concepts, applications, and limitations of clinical DL.
In this structured narrative review, we aim to describe the general concepts, provide a visual glossary for key terms within image-based DL, and conduct a systematic review to provide an up-to-date assessment of the application of image-based DL in benign and malignant hematology across various phases of patient care.
Neural networks and deep learning
The concept of “deep learning” is poorly defined, imprecise, and often used interchangeably with terms such as “machine learning” and “artificial intelligence.” Traditionally, “artificial intelligence” is the use of automated systems to perform a particular task. “Machine learning” represents a subset of AI in which rules are not explicitly predetermined, but are acquired by training and optimizing parameters based on observed data. Machine learning workflows traditionally separate data into training, validation, and external testing cohorts for model assessment. The most familiar examples of machine learning include linear regression, logistic regression, and Cox proportional hazards models. “Deep learning” is a recently popularized subset of machine learning that utilizes specialized neural-network architectures performing millions of arithmetic operations (Figure 1).15,16 DL architectures are loosely modeled after the complex neural connections of the human brain.17 Although the term “deep learning” is derived from “deep convolutional neural networks” and has gained particular interest in clinical research, its strict definition has become increasingly ambiguous and may not completely represent modern state-of-the-art techniques. The field of DL and the list of essential glossary terms are rapidly changing, but in keeping with contemporary clinical manuscripts, this review will use the term “deep learning” to mean “deep convolutional neural networks and other contemporary techniques related to computer vision”. There are also non-image-based neural networks and image-based machine learning architectures without neural networks; however, both are beyond the scope of the current review.
Image preprocessing
A standard workflow3,18,19 for DL research typically requires preprocessing input images, which can expedite DL training time or improve performance. Preprocessing steps are typically dependent on the modality of the image type. While radiological images may be input either whole or with particular Regions of Interest (ROI) segmented, histopathology slides are typically tessellated into smaller tiles representing tissue or segmented cells of interest prior to inclusion into the model. Normalization of pathology images may reduce artifacts specific to a clinical site or particular scanning device, but there is no current standard normalization process. Data augmentation may be performed with random image rotations, vertical or horizontal flips, and simulated compression artifacts to increase the size of the training set and broaden generalizability. In addition to using images alone, researchers can include other data modalities such as clinical information with multi-modal models to supplement image inputs in attempts to improve model performance.
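As a toy illustration of the tessellation and augmentation steps described above (not drawn from any cited study; the tile size, pixel values, and function names are arbitrary assumptions), the preprocessing of a whole-slide image might be sketched as:

```python
def tessellate(slide, tile_size):
    """Split a whole-slide image (2D grid of pixels) into non-overlapping
    square tiles for model input; edge remainders are discarded here."""
    tiles = []
    for i in range(0, len(slide) - tile_size + 1, tile_size):
        for j in range(0, len(slide[0]) - tile_size + 1, tile_size):
            tiles.append([row[j:j + tile_size]
                          for row in slide[i:i + tile_size]])
    return tiles

def augment(tile):
    """Simple augmentations: original, horizontal flip, vertical flip,
    and 180-degree rotation, quadrupling the training set."""
    hflip = [row[::-1] for row in tile]
    vflip = tile[::-1]
    rot180 = [row[::-1] for row in tile[::-1]]
    return [tile, hflip, vflip, rot180]

# Toy 4x4 "slide" with distinct pixel values.
slide = [[r * 4 + c for c in range(4)] for r in range(4)]
tiles = tessellate(slide, 2)                        # four 2x2 tiles
augmented = [a for t in tiles for a in augment(t)]  # 16 training tiles
```

In practice, tiling, stain normalization, and augmentation are performed by dedicated image libraries, but the logic follows this pattern.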
General neural network structure
In a simplified viewpoint of neural network structures (Figure 2), the input image is transformed at various intermediate states, termed “nodes,” with each node representing a different graphical feature of the image. As the image is passed from node to node, the connection between each node involves mathematical transformations to represent more complex features in later nodes. Each node can be connected to multiple subsequent nodes simultaneously, and the group of nodes with similar numbers of sequential connections from the input image represents a layer of intermediate nodes. Shallow and deep neural networks refer to the number of node layers within a particular architecture, but there is no strict definition to differentiate the two. In addition, nodes may not necessarily connect to the nodes in the immediately subsequent layer, but may connect by “skip connections” to nodes in later layers. The penultimate layer of nodes, each representing only a single numerical value, is termed the Logit Layer, the values of which are then normalized to the range of 0 to 1 to give the final probabilities for the outcomes of interest. Common outcomes of interest and examples include object detection, segmentation, classification, regression, survival analysis, and detail optimization (Figure 3).
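The flow from input pixels through node layers to a logit layer and final probabilities can be sketched as a toy example (the layer sizes, weight values, and the ReLU non-linearity are illustrative assumptions, not a clinical model):

```python
import math

def softmax(logits):
    """Normalize raw logit values into probabilities that sum to 1."""
    exps = [math.exp(v) for v in logits]
    total = sum(exps)
    return [v / total for v in exps]

def forward(pixels, w_hidden, w_logit):
    """One forward pass through a toy two-layer network.

    Each hidden node is a weighted sum of the input pixels passed
    through a non-linearity (ReLU); each logit node is a weighted sum
    of the hidden nodes, giving a single number per outcome class.
    """
    hidden = [max(0.0, sum(p * w for p, w in zip(pixels, row)))
              for row in w_hidden]
    logits = [sum(h * w for h, w in zip(hidden, row)) for row in w_logit]
    return softmax(logits)

# Toy example: a 4-pixel "image", 3 hidden nodes, 2 outcome classes.
pixels = [0.2, 0.8, 0.5, 0.1]
w_hidden = [[0.1, -0.2, 0.3, 0.0],
            [0.4, 0.1, -0.1, 0.2],
            [-0.3, 0.2, 0.1, 0.1]]
w_logit = [[1.0, -0.5, 0.2],
           [-1.0, 0.5, 0.3]]
probs = forward(pixels, w_hidden, w_logit)  # two class probabilities
```

The softmax normalization here plays the role of the final step described above, mapping logit values to outcome probabilities between 0 and 1.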
Information propagation and parameter training
To develop a neural network model, the input image is represented numerically by each pixel. The numerical information is propagated through intermediate nodes and layers towards the direction of the output layer. The connections between nodes are mathematically represented by either non-linear operations or matrix multiplication and addition with potentially millions of trainable parameters, whose values are updated while optimizing the end outcome.
Upon initial model evaluation, the sequential movement of information from the input image towards the outcome of interest is deemed “forward propagation” or a “forward pass” (Figure 2A-F). To evaluate an initial prediction, the user defines a particular loss function to quantitatively describe the deviation of the model’s prediction from the available ground truth. Using an additional user-defined optimizer algorithm, the trainable parameters are iteratively adjusted to decrease the loss value in subsequent forward passes. This framework of optimizing parameters in earlier layers using information from the predicted outcome is deemed “back propagation.” During training, forward and back propagation are repeated for a defined number of repetitions, or epochs, but training can also be stopped earlier if other defined optimal conditions are met. Given that a single forward pass may require at least 10⁹ calculations, specialized hardware such as Graphics Processing Units (GPU) is often required to parallelize the necessary matrix operations so that training finishes within a reasonable timeframe.
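A minimal sketch of this training cycle, assuming a single trainable parameter, a squared-error loss, and plain gradient descent as the optimizer (real networks repeat the same cycle over millions of parameters):

```python
# Toy forward/back propagation: the model is y = w * x, the loss is
# squared error, and gradient descent is the optimizer.
data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]  # (input, ground truth) pairs
w = 0.0                                       # trainable parameter
learning_rate = 0.05

for epoch in range(200):                      # repeated passes = epochs
    grad = 0.0
    for x, y_true in data:
        y_pred = w * x                        # forward pass
        # back propagation: derivative of (y_pred - y_true)^2 w.r.t. w
        grad += 2 * (y_pred - y_true) * x
    w -= learning_rate * grad / len(data)     # optimizer step
```

Because the toy data lie exactly on y = 2x, the parameter converges to approximately 2.0, illustrating how the loss value guides parameter updates across epochs.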
Convolutions
At the time of writing, the most popular type of DL architecture is the convolutional neural network (CNN). The prototypical CNN algorithm assesses a smaller grid-like portion of each input image prior to propagation towards the next layer (Figure 2C). CNN utilize the convolution operation between layers, which involves matrix multiplication across overlapping sub-sections of the input image to produce a lower-dimensional output representation.
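The convolution operation itself can be illustrated with a toy kernel and image (the values below are arbitrary; in a real CNN the kernel weights are themselves trainable parameters learned during training):

```python
def convolve2d(image, kernel):
    """Slide a small kernel over overlapping sub-sections of the image,
    multiplying element-wise and summing, to produce a smaller output."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [[sum(image[i + a][j + b] * kernel[a][b]
                 for a in range(kh) for b in range(kw))
             for j in range(out_w)]
            for i in range(out_h)]

# A 3x3 vertical-edge kernel applied to a 4x4 "image" whose right half
# is bright responds strongly wherever the intensity boundary falls
# inside the window.
image = [[0, 0, 9, 9],
         [0, 0, 9, 9],
         [0, 0, 9, 9],
         [0, 0, 9, 9]]
edge_kernel = [[-1, 0, 1],
               [-1, 0, 1],
               [-1, 0, 1]]
feature_map = convolve2d(image, edge_kernel)  # 2x2 lower-dimensional output
```

Note how the 4x4 input yields a 2x2 output: each output value summarizes one overlapping sub-section, which is the dimensionality reduction described above.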
Pre-trained networks and transfer learning
Initially, CNN trained to perform object detection required millions of manually-annotated images, training for days or weeks on industry-grade computational equipment.20 After training is complete, CNN have traditionally been understood to learn “low-level” general features such as lines, edges, and shapes in earlier layers of the network, but more complex “high-level” features such as faces, patterns, and spatial distributions are learned in subsequent layers that are more closely associated with the evaluated outcome.21
In clinical research, it is rare for clinicians to have the resources to develop new CNN architectures with initially random parameters; such a feat requires large-scale databases with expert-level annotations and access to industry-grade supercomputers. Researchers have taken advantage of the learned features along progressive layers by using models previously trained on large databases for non-clinical tasks, but repurposing the final few layers to predict specific clinically-relevant outcomes. The concept of transfer learning involves utilizing a pre-trained network, such as one already trained on the ImageNet database of over 1 million general images,22 initializing the model with the parameters that learned “low-level” features from images unrelated to the application of interest, and allowing the model to retrain and modify parameters in the last few layers to learn “higher-level” features on images for specific patient-related tasks. By utilizing transfer learning, the minimum required dataset size and computational power are significantly lower than those needed to fully train a network from completely random parameters.23
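A toy sketch of the transfer-learning idea, in which a hypothetical “pre-trained” hidden layer is frozen while only the final layer is retrained on new data (all layer sizes, weights, and data points are illustrative assumptions, not a real pre-trained model):

```python
# "Pre-trained" low-level feature extractor: its parameters are frozen
# and never updated during retraining.
pretrained_hidden = [[0.5, -0.5], [0.25, 0.75]]

def features(x):
    """Frozen 'low-level' feature layer (ReLU of weighted sums)."""
    return [max(0.0, sum(p * w for p, w in zip(x, row)))
            for row in pretrained_hidden]

# New task data: (input, ground truth) pairs for the outcome of interest.
data = [([1.0, 0.0], 1.0), ([0.0, 1.0], 0.0)]

w_final = [0.0, 0.0]   # only the final-layer parameters are trained
lr = 0.1
for epoch in range(500):
    for x, y_true in data:
        f = features(x)
        y_pred = sum(fi * wi for fi, wi in zip(f, w_final))  # forward pass
        err = y_pred - y_true
        # back propagation stops at the frozen layer: only w_final moves
        w_final = [wi - lr * 2 * err * fi for wi, fi in zip(w_final, f)]
```

After retraining, the frozen layer is unchanged while the final layer has adapted to the new task, mirroring how ImageNet-derived “low-level” features are reused for patient-related predictions.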
Specific deep learning architectures in clinical research
While DL is a framework of neural networks for outcome prediction, each specific model architecture incorporates drastically different complexities with regards to number of layers, connections between layers, functions, and many other highly-engineered features. In fact, some newer models lack any convolutional layers and infer local and global image features by other methods.24
Thus far, the predominant architectures for hematology-specific questions tend to come from a class of CNN known as Residual Neural Networks (ResNets), which utilize skip connections. Many specific CNN architectures, such as Inception, EfficientNet, MobileNet, and the various ResNet models, are open-source and widely available.25
Certain model architectures are engineered to provide an output that is an additional image; these model structures are needed for dimensionality reduction, bounding-box detection, segmentation, and noise reduction tasks. One specific architecture, Autoencoders, are networks that pass an input image through an intermediate lower-dimensional representation, followed by upsizing to a higher-dimensional space to recreate the input image.26 Theoretically, the lower-dimensional intermediate representation still retains features of the original image which may be clinically or biologically relevant. Similar architectures such as U-Net require additional training data, such as object ROI or low/high-quality image pairs, to accomplish tasks such as image segmentation or digital optimization.
An additional relevant DL framework is Multiple Instance Learning (MIL)27,28 and its attention-based derivatives,29,30 including Clustering-constrained Attention Multiple Instance Learning (CLAM).31 The main distinction of MIL frameworks is that predictions are made for subsets of data rather than for single instances. Specifically, input images are separated into smaller subsets, and the entirety of a subset is predicted “positive” if at least one image in the subset is predicted “positive”. As an example of MIL in histopathology, a biopsy whole-slide image would be predicted “cancerous” when one extracted tile is predicted as such.1 This framework may be particularly helpful when a single annotation is provided across an entire image (“weak supervision”) rather than labels for each specific segmented ROI. In addition, Attention, or a numeric weight, can be assigned to each image tile to produce weighted predictions, as well as to provide explainable heatmaps. Using Attention, CLAM was developed to increase the speed of MIL and reduce the noise from irrelevant image tiles.
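The classic MIL rule and its attention-weighted variant can be sketched with hypothetical tile-level probabilities (the tile values and attention weights below are illustrative, not taken from CLAM itself):

```python
def mil_bag_prediction(tile_probs):
    """Classic MIL rule: the whole-slide 'bag' is as positive as its
    most positive tile, so one positive tile flags the slide."""
    return max(tile_probs)

def attention_prediction(tile_probs, attention):
    """Attention-based MIL: a weighted average of tile predictions,
    with attention weights normalized to sum to 1."""
    total = sum(attention)
    weights = [a / total for a in attention]
    return sum(w * p for w, p in zip(weights, tile_probs))

# Hypothetical tile-level tumor probabilities from one biopsy slide.
tiles = [0.05, 0.10, 0.95, 0.08]
bag = mil_bag_prediction(tiles)      # driven by the single positive tile
attn = [0.1, 0.1, 5.0, 0.1]          # high attention on the suspicious tile
weighted = attention_prediction(tiles, attn)
```

The normalized attention weights can also be plotted over the slide as a heatmap, which is the basis of the explainable heatmaps mentioned above.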
Vision Transformers (ViT) are a novel technique that do not utilize the convolution operator.24 The entire image is separated into a grid of sub-images that are analyzed in parallel along with the relative location of each sub-image. With this method, global relationships across the entire image may be learned by the model as opposed to only local features that are seen by the previously-described CNN.
Currently, most architectures for hematology-specific questions utilize ResNet architectures, with just a few examples also incorporating MIL. However, the emergence of ViT and CLAM frameworks is part of a changing landscape of implemented DL architectures. In general, the choice of model architecture is somewhat informed by the expected outcome task, but it is still largely empiric. However, there are broad advantages and disadvantages to each of the previously mentioned frameworks. With weak supervision, MIL tends to require significantly larger amounts of training data than ResNets.1 CNN and ViT perform equally well at the scale of currently available clinical datasets. However, ViT are superior to CNN for larger-scale datasets and are more computationally efficient with significantly fewer parameters.24 There are numerous methods that attempt to explain the inner mechanisms of standard CNN,32 but similar methods to “open the black box” of ViT are currently under development.33
Explainability
While “explainability” in DL research is loosely defined, in this review “explainability” refers to efforts to describe DL models and predictions in humanly understandable concepts.34
Although DL may empirically exhibit high performance, DL is often criticized for its highly complex mechanisms and is often thought of as a “black box.” In multiple examples, seemingly high-performing models utilized artifactual or contextually irrelevant features for their predictions, as such features may be unintentionally over-represented in certain imaging subgroups.35 Multiple methods are under development to explain and validate biologically reasonable predictions. As such, explainability is increasingly important in clinical AI development and in developing physicians’ trust in DL.36
To give just a few examples, unsupervised data dimensionality reduction methods such as principal component analysis (PCA), t-distributed stochastic neighbor embedding (t-SNE), and Uniform Manifold Approximation and Projection (UMAP) are statistical techniques used to group visually similar input images into clusters, which may overlap with relevant outcomes. These methods are also popularized in non-imaging data such as single cell molecular and cytometry time-of-flight analyses. Feature maps are direct visual representations of the intermediate trained parameters. Plotting Attention scores or using Saliency map methods such as Grad-CAM or Smooth-Grad can overlay heat-maps upon the input image to highlight relevant visual cues associated with the outcome of interest.35 For example, the heatmap explainability methods of a peripheral blood smear image may highlight pathognomonic Auer Rods for the accurate diagnosis of acute promyelocytic leukemia (Figure 3). More complex methods such as Generative Adversarial Networks are architectures trained to generate synthetic images, which can create representations of a particular class or outcome.37
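As a simple illustration of the saliency-map idea, occlusion sensitivity (a simpler relative of Grad-CAM and SmoothGrad) masks each pixel in turn and records how much the model's score drops; the toy "model" below is an assumption standing in for a trained network:

```python
def model_score(image):
    """Toy 'model': responds to total brightness in the image center.
    Stands in for a trained network's predicted probability."""
    h, w = len(image), len(image[0])
    return sum(image[i][j] for i in range(1, h - 1) for j in range(1, w - 1))

def occlusion_saliency(image):
    """Occlusion sensitivity: zero out each pixel in turn and record how
    much the model's score drops; large drops mark influential regions."""
    base = model_score(image)
    heatmap = []
    for i in range(len(image)):
        row = []
        for j in range(len(image[0])):
            occluded = [r[:] for r in image]   # copy, then mask one pixel
            occluded[i][j] = 0
            row.append(base - model_score(occluded))
        heatmap.append(row)
    return heatmap

# Toy image with a bright central region; the heatmap highlights it.
image = [[1, 1, 1, 1],
         [1, 9, 9, 1],
         [1, 9, 9, 1],
         [1, 1, 1, 1]]
heatmap = occlusion_saliency(image)
```

Overlaying such a heatmap on the input image yields the kind of visual cue highlighted in the Auer rod example above, albeit computed here by brute-force occlusion rather than by gradients.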
Metrics
Common performance metrics for the evaluation of DL classification models include the Area Under the Receiver Operating Characteristic curve (AUROC), sensitivity, specificity, and accuracy. The AUROC represents the tradeoff between true and false positive rates for a binary model across a range of possible threshold values. AUROC values nearing 1.0 represent a model with perfect discriminatory power, while values tending towards 0.5 indicate performance no better than random chance.
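The AUROC also admits a simple empirical interpretation: the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case. A minimal sketch with hypothetical model scores:

```python
def auroc(scores, labels):
    """Empirical AUROC: fraction of positive/negative pairs in which the
    positive case is scored higher (ties count as half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical model probabilities and ground-truth diagnoses.
scores = [0.9, 0.8, 0.3, 0.2]
labels = [1, 0, 1, 0]
value = auroc(scores, labels)  # 3 of 4 pairs ranked correctly -> 0.75
```

A perfectly discriminating model scores every positive case above every negative case and yields an AUROC of 1.0.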
For segmentation tasks, the Sørensen-Dice similarity coefficient (Dice) represents the overlap between the predicted area of interest with the ground truth, where a Dice coefficient of 1.0 represents ideal predictive overlap. Other segmentation metrics include the similarly defined Jaccard index, also known as Intersection over Union (IoU).
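Both segmentation metrics can be computed directly from binary masks; a minimal sketch with hypothetical flattened masks (1 marks a pixel inside the region):

```python
def dice(pred, truth):
    """Sorensen-Dice coefficient: 2*|overlap| / (|pred| + |truth|)."""
    inter = sum(p * t for p, t in zip(pred, truth))
    return 2 * inter / (sum(pred) + sum(truth))

def iou(pred, truth):
    """Jaccard index / Intersection over Union: |overlap| / |union|."""
    inter = sum(p * t for p, t in zip(pred, truth))
    union = sum(1 for p, t in zip(pred, truth) if p or t)
    return inter / union

# Hypothetical flattened binary segmentation masks.
pred  = [1, 1, 1, 0, 0, 0]
truth = [0, 1, 1, 1, 0, 0]
d = dice(pred, truth)   # overlap of 2 pixels -> 2*2 / (3+3)
j = iou(pred, truth)    # overlap 2, union 4
```

Note that for any partial overlap the Dice coefficient exceeds the IoU, though both equal 1.0 for a perfect prediction.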
Literature review for clinical application of deep learning in hematologic conditions
A Boolean query was submitted to PubMed to extract articles published between January 1, 1990, and August 1, 2022. Search terms included both a “deep learning” and a “hematology” specific term (Figure 4A). The query resulted in 2,708 initial articles. Further refinement by manual review by one author excluded a large number of articles (Figure 4B), resulting in 65 manuscripts. General trends and findings of the resulting articles are described in the context of how DL has been utilized to enhance phases of clinical care within various hematologic conditions, including task automation, detail optimization, disease detection, differential diagnosis, disease classification, risk prediction, complication assessment, therapy response, and survival prediction (Figure 5). General considerations for critical appraisal of the following manuscripts include performance metrics, use of external or prospective validation cohorts, use of explainability methods, and comparison with human expert performance (Tables 1-4).
Task automation
Routine clinical workflows in pathology and radiology may involve repetitive actions. Automation models can be developed to increase efficiency and decrease physician burden for tasks such as counting cell types in peripheral blood smears or contouring the borders of suspicious lesions on imaging. For pathology workflows, DL models trained to contour white blood cell (WBC) borders in peripheral blood smears were highly effective, with near-perfect Dice coefficients in multiple cohorts.38 Automatically detected cells can then be passed to downstream analyses to provide an automated cell count, for which DL-based methods achieve high accuracy.39
In addition, chromosomal analyses are standard for diagnosis and prognostication in multiple hematologic malignancies. Manual segmentation and rotation of digital karyograms is time-consuming, but automated models can significantly expedite throughput.40 In radiology workflows, contouring suspicious lesions or organs can help characterize downstream parameters such as volume, width, and avidity. Hypermetabolic lesions on PET/CT have been localized with DL algorithms for multiple adult and pediatric lymphomas and multiple myeloma lesions.41,42 Segmentation metrics were reportedly high, with Dice coefficients of 0.86-0.98 among various lymphomatous conditions.43-45 For other conditions, automated volume calculation of particular regions of interest has been explored in myeloproliferative neoplasms (MPN) for spleen volume,46 as well as clot burden quantification for new pulmonary emboli.47
Detail optimization
For expert diagnosticians, image quality is critical for the identification of disease. Using U-Net architectures, DL-enhanced images may improve user readability and potentially reduce the amount of toxic contrast material given to patients. Enhancement of peripheral blood images to assess red blood cell (RBC) aberrations has yielded promising results. For sickle cell disease, mobile-device photos of peripheral blood have been digitally upscaled to match laboratory microscope quality; upon further validation, the upscaled images retained relevant visual cues with near-perfect classification.48 However, similar attempts to detect malarial RBC inclusions were less successful, noting that CNN-based enhancement of peripheral blood images was insufficient to resolve parasites that were not already easily distinguishable at low resolution.49 Multiple optimization efforts in radiology have investigated whether DL can improve image quality from lower-contrast images, which may help spare patients from nephrotoxic or radioactive risks. For both positron emission tomography/magnetic resonance imaging (PET-MRI) in lymphomatous conditions and computed tomography (CT) scans in multiple myeloma, authors have concluded that reduced contrast volumes may be feasible while still maintaining diagnostic quality.50,51
Disease detection
In clinical practice, a common initial diagnostic step for hematologic disorders is the analysis of peripheral blood to observe morphologic abnormalities of RBC, WBC, and platelets. The detection of structural RBC aberrations can identify certain infectious diseases and hemoglobinopathies. In endemic areas of malaria, the Plasmodium parasites are often identified by light microscopy as RBC inclusions. Multiple DL initiatives report high accuracy and good model performance for the diagnosis of malaria from peripheral blood in both cross-validated and external cohorts.52-56 Other RBC aberrations, such as hemoglobin H inclusions in α-thalassemia, can be detected by DL with appropriate peripheral blood staining protocols.57 With regards to transfusion medicine needs, the quality and degradation of RBC products prior to transfusion can also be determined with DL methods. Using explainability techniques, Doan et al. explored their proposed autoencoder network trained on RBC images to identify novel features associated with poor storage quality RBC products.58 For certain disorders, the detection of aberrant WBC morphologies from peripheral blood is paramount. DL algorithms consistently detect dysplastic neutrophils pathognomonic for myelodysplastic syndrome (MDS),59 as well as other white blood precursors to aid in the diagnosis of MPN,60 acute promyelocytic leukemia (APL),12 or acute lymphoblastic leukemia (ALL).61,62 Many DL models for WBC detection have performed with high accuracy and AUROC upon internal validation strategies. If translated into clinical practice, DL models for peripheral blood assessment may expedite critical diagnoses which necessitate emergent therapy, such as APL.
Particularly for myeloid malignancies, bone marrow assessment is usually needed to establish a diagnosis. DL can detect particular cellular morphologies of neutrophils, megakaryocytes, promyelocytes, and plasma cells associated with MDS,63 MPN,64 APL,13 and multiple myeloma,65 respectively. Similarly for lymphoid malignancies, assessing lymph node architectures can aid the diagnosis of various lymphomas, such as diffuse large B-cell lymphoma (DLBCL)66 or follicular lymphoma (FL).67 DL models developed by Li et al. maintained high accuracy for the diagnosis of DLBCL from lymph node biopsies across four separate institutional cohorts.66 Furthermore, Syrykh et al. utilized the clinical challenge of differentiating follicular lymphoma from follicular hyperplasia to develop a novel DL method quantifying prediction uncertainty, which is not often reported in DL studies. With their uncertainty method, the authors report higher classification capabilities when only considering the newly categorized low-uncertainty images.67
In addition to pathologic analysis, clinical guidelines commonly suggest radiologic assessment for the initial workup of suspected malignancy or thrombosis. Using PET/CT images, DL models exhibit high classification performance for hypermetabolic lesions in DLBCL diagnosis.68 However, similar attempts using PET/CT images of mantle cell lymphoma (MCL) patients are challenged with tradeoffs between sensitivity and false positive rates for diagnosis in external cohorts.69 For select non-malignant conditions, multiple studies explored DL for the expedited and more affordable diagnosis of pulmonary emboli (PE) and deep vein thromboses (DVT), for which a diagnosis may require immediate intervention.70-72 Huang et al. integrated clinical data in conjunction with CT scans to improve their DL model for PE detection. The authors report that multi-modal models exhibit higher classification performance than image-only DL models.71 In addition, automated detection of common thrombotic conditions may reduce the financial burden, with cost analyses revealing positive financial benefit to health care systems.72
Finally, a particularly novel use of DL is the prediction of disease from imaging modalities beyond standard pathologic or radiologic domains. Multiple studies have shown that anemia can be detected with high accuracy utilizing DL on atypical modalities such as electrocardiograms (ECG)73 or funduscopic examinations.74 Both studies implemented explainability methods to reveal features associated with anemia, such as QRS complexes in ECG or optic disk aberrations in funduscopic images. Thus, screening for anemia may offer a low-cost benefit for patients already undergoing these common examinations.
Differential diagnosis
Various hematologic conditions share similar features and presentations, posing challenges in providing a definitive diagnosis in clinical scenarios where radiologic findings may be non-specific and pathological morphologies may be subtle. Differentiating among possible diagnoses is a common clinical task, and various approaches of DL have been explored as a potential means to increase objectivity towards a true diagnosis. For example, Li et al. used transfer learning to pre-train their model with images of common household objects, such as bananas, rings, and pears to learn the analogous morphologies of similarly-shaped RBC inclusions of Toxoplasma, Plasmodium, and Babesia.75
Interestingly, DL models have been shown to extract subtle features for disease differentiation better than humans can. Cytopenias can be a common presentation of either MDS or aplastic anemia (AA). Though either diagnosis typically requires bone marrow biopsy assessment, Kimura et al. trained a DL model on peripheral blood images to accurately differentiate between the two conditions.76 Newly diagnosed leukemia patients commonly present with blasts in peripheral blood. The categorization of blasts into either myeloid or lymphoid lineages requires identifying cell-surface markers by flow cytometry; thus, visualization of blasts is not usually sufficient for classification. Similarly, lymphoma histologies share visual commonalities and require immunohistochemical staining of cell-surface markers on biopsy specimens. To address these classification challenges, DL algorithms reportedly differentiate between acute myeloid leukemia (AML) and acute lymphoblastic leukemia (ALL) utilizing only peripheral blood or bone marrow images,9,77,78 and similarly among various non-Hodgkin lymphomas (NHL) utilizing standard hematoxylin and eosin (H&E) lymph node biopsy images.79-82
For patients with malignant brain lesions found on MRI, clinicians may be tasked to differentiate between primary central nervous system lymphoma (PCNSL) and glioblastoma multiforme (GBM).83 DL models developed for this task showed seemingly high initial performance, but performance fell to an AUROC of nearly 0.5 in external cohorts.83 This generalizability problem highlights the continued need for critical appraisal of any newly developed DL model across patient populations.
Disease classification
The classification of blood cells in standard peripheral blood smear review is a ubiquitous task useful in a broad array of diseases. The differential of WBC is necessary to stratify the likelihood of malignant and non-malignant causes of WBC abnormalities. Numerous studies have developed DL models as single-cell WBC classifiers. Across these studies, performance remained robust, with the majority achieving accuracies above 90% and explainability techniques highlighting relevant cellular features.11,84-86 However, validation on external cohorts, which commonly reveals lower performance,11 is still needed prior to deployment in clinical practice. In addition to WBC classification, the categorization of RBC morphologies is useful within various anemias,87 including sickle cell disease.88 To explore platelet abnormalities, Zhou et al. developed a highly accurate DL model predicting the identity of agonists causing platelet aggregation using imaging flow cytometry.89
Specific sub-classification of diagnoses is often necessary to guide prognostication, counseling, and therapeutic considerations. In numerous non-hematologic applications, previous DL models can accurately further categorize various cancers into genetic and clinical subtypes,4 which has led to similar explorations within leukemic and lymphomatous conditions. For leukemic classifications, ALL bone marrow images can be separated into the historically relevant French-American-British (FAB) classifications.90 Furthermore, genomic subtypes may be accurately identified by DL models; Eckardt et al. identified NPM1 mutations among newly diagnosed AML patients and characterized novel cellular morphological features that had not previously been reported.14 Broader DL efforts to identify each clinically relevant molecular or cytogenetic abnormality have been attempted for MDS sub-classifications.91 For lymphoma, Swiderska-Chadaj et al. developed a DL model predicting MYC gene rearrangements in DLBCL patients using lymph node biopsy images. Though MYC rearrangement is typically assessed with ancillary fluorescent in situ hybridization, the DL model using only H&E images maintained high accuracy upon external cohorts.92
Advanced stages of patient care
There are currently few examples of DL assisting later stages of patient care, including risk prediction, complication assessment, therapy response, and survival prediction. For such tasks, the disease processes and image modalities are heterogeneous. Risk has been assessed with CT images or digitized bone marrow biopsies (BMB) for DLBCL outcomes. DL models predict the transformation of low-grade lymphomas to high-grade DLBCL using BMB images,93 and, furthermore, known clinical risk factors such as sarcopenia can be extracted and quantified in CT images of DLBCL patients.94 Risk in thrombotic conditions can be characterized automatically using DL classification of right ventricular strain in chest imaging for PE workup.95 Cai et al. assessed complications of sickle cell disease by detecting sea fan neovascularization in funduscopic images, a vision-threatening complication warranting prophylactic management.96 Doan et al. evaluated therapy response in ALL patients by using DL methods to detect residual lymphoblasts after induction chemotherapy.97 Finally, DL models for relapse prediction using baseline imaging have been developed for extranodal natural killer/T-cell lymphoma98 and mantle cell lymphoma.99 However, further evaluation on external cohorts is needed for these advanced-stage tasks.
Conclusions
The use of deep learning in hematologic conditions has attracted significant interest in recent years. As noted above, researchers have utilized multiple data structures including radiologic images, pathology specimens, clinical data, and atypical imaging such as funduscopic examinations to perform a variety of clinically relevant tasks. Most authors reported high model performance for disease diagnosis, segmentation, and subtyping. Other studies explored tasks beyond human capabilities, such as genomic inference and prognostication from imaging analysis alone. Few studies have used hematologic conditions as a means to implement state-of-the-art architectures to improve the field of DL in general. Compared to other clinical domains, DL in hematology is still in its infancy and is not yet widely used in clinical practice. As such, the intention of this review is to introduce broad concepts to hematology clinicians to assist in the evaluation and understanding of future DL implementations, as well as to provide an overview of the clinical uses currently being explored throughout patient care.
These early stages of DL in hematology may reflect a lack of appropriate algorithm design, data availability, computational resources, and disease-specific expertise involved in DL development.100 To the best of our knowledge, there are still no large clinically-annotated multi-modal public datasets for many hematologic conditions. In addition, critical morphological information in hematopathology may only be available at higher magnification levels, surpassing the limits of standard pathology scanners. Although these structural barriers continue to hamper the development of DL in hematology, rapid technological advances continue, and interest in DL within the academic community is growing.101
Though promising, the methods and conclusions of the numerous studies are heterogeneous and challenging to compare. As yet, there is no standardized approach to DL research, reporting, or implementation. In the present overview, the majority of publications were evaluated with internal validation strategies, and only a minority were evaluated on external institutional cohorts. Explanations of model predictions were not ubiquitous, and few DL models were compared directly against human evaluation. Major government initiatives currently aim to standardize DL protocol design,102 and, despite the variance in outcome reporting across DL analyses, the SPIRIT-AI, STARD-AI, and CONSORT-AI initiatives aim to standardize future clinical trial design and reporting of artificial intelligence interventions.103-105
The research and results of DL analyses must be interpreted cautiously, as a number of practical and ethical issues have arisen in other domains of machine learning. CNNs are prone to "memorize" the training set; thus, initially high performance may fail to carry forward to new, previously unseen data. For this reason, it is imperative to evaluate DL models on external cohorts from separate institutions. If training data are acquired from multiple institutions, care must be taken to correct for known "batch effects," as DL models may learn site-specific artifact signatures unrelated to the underlying disease biology.106 Similarly, researchers should investigate explainability and perform error analysis to ensure that models rely on scientifically reasonable features and ignore irrelevant factors. In addition, uncertainty in model predictions is rarely reported but is arguably necessary for clinical implementation of DL algorithms.
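As a purely illustrative sketch, the following toy simulation (not drawn from any study cited in this review; all variables, rates, and thresholds are hypothetical) shows why internal validation can overestimate performance: a "model" that keys on a site-specific stain artifact correlated with the label at the training site looks strong internally but falls to chance on an external cohort where that artifact is absent.

```python
import random

random.seed(0)

def make_cohort(n, site_stain_shift, disease_rate):
    """Simulate (stain_intensity, true_label) pairs.
    At the training site, a scanner/stain artifact happens to correlate
    with the disease label (site_stain_shift > 0); at the external site
    it does not (site_stain_shift = 0)."""
    cohort = []
    for _ in range(n):
        label = 1 if random.random() < disease_rate else 0
        stain = site_stain_shift * label + random.gauss(0.0, 0.3)
        cohort.append((stain, label))
    return cohort

internal = make_cohort(500, site_stain_shift=1.0, disease_rate=0.5)
external = make_cohort(500, site_stain_shift=0.0, disease_rate=0.5)

def artifact_model(stain):
    """A 'model' that memorized the site artifact, not the biology."""
    return 1 if stain > 0.5 else 0

def accuracy(cohort):
    return sum(artifact_model(s) == y for s, y in cohort) / len(cohort)

print(f"internal accuracy: {accuracy(internal):.2f}")  # looks strong
print(f"external accuracy: {accuracy(external):.2f}")  # near chance
```

The gap between the two numbers is exactly what external-cohort evaluation is designed to expose.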
In this review, the majority of DL applications target earlier phases of clinical care, such as automation and disease detection. Lymphoma accounted for the plurality of exploratory analyses, likely due to the importance of both radiologic and pathologic findings in the care of lymphoma patients. Though DL has been explored in a myriad of malignant and non-malignant conditions, applications are notably lacking in stem cell transplantation and in many other non-malignant processes where morphological assessment is paramount, such as thrombotic microangiopathies.
Future work is needed to address large scale applications of DL in hematology. As a hematopathologist typically assesses histology specimens at different magnification levels, customized architectures to implement multi-scale image analysis should be explored. DL in solid oncology is widely used, in part due to the publicly available digital biopsy specimens provided by The Cancer Genome Atlas,107 of which there is no analogous database for hematologic conditions. In addition, the combination of multi-modal data structures that incorporate images in concert with flow cytometry, molecular analyses, cytogenetics, or other clinical factors may provide additional relevant features to improve DL models. While numerous considerations remain before large-scale implementation of DL is feasible, the development of new models and applications in hematology is rapidly increasing, and it is imperative for clinicians to be aware of the opportunities that DL may provide.
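To illustrate the multi-scale idea, the sketch below extracts co-registered views of the same location on a toy "slide" at two simulated magnifications, the kind of paired input a customized multi-scale architecture might ingest. The array sizes, downsample factors, and average-pooling scheme are hypothetical choices for illustration only, not a description of any published method.

```python
import numpy as np

rng = np.random.default_rng(42)

def extract_patch(slide, center, size, downsample):
    """Crop a (size*downsample)^2 region around `center` and average-pool
    it back to size x size, mimicking a lower-magnification view of a
    wider field around the same tissue location."""
    half = size * downsample // 2
    r, c = center
    region = slide[r - half:r + half, c - half:c + half]
    return region.reshape(size, downsample, size, downsample).mean(axis=(1, 3))

# toy single-channel "whole-slide image", 512 x 512 pixels
slide = rng.random((512, 512))
center = (256, 256)

# the same location seen at high power (1x) and low power (4x)
high_power = extract_patch(slide, center, size=64, downsample=1)
low_power = extract_patch(slide, center, size=64, downsample=4)

# a multi-scale model could ingest both views together, e.g. as channels
multi_scale_input = np.stack([high_power, low_power])
print(multi_scale_input.shape)  # (2, 64, 64)
```

The design choice mirrors clinical practice: the low-power channel captures architecture while the high-power channel captures cytologic detail, and a network receiving both can weigh them jointly.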
Footnotes
- Received June 3, 2022
- Accepted January 18, 2023
Correspondence
Disclosures
ATP reports support via grants from NIH/NCI U01-CA243075, grants from NIH/NIDCR R56-DE030958, grants from Horizon 2021-SC1-BHC, grants from DoD Breakthrough Cancer Research program BC211095, and grants from SU2C (Stand Up to Cancer) Fanconi Anemia Research Fund – Farrah Fawcett Foundation Head and Neck Cancer Research Team Grant during the conduct of the study. ATP reports grants from Abbvie via the UChicago – Abbvie Joint Steering Committee Grant, and grants from Kura Oncology. ATP reports personal fees from Prelude Therapeutics Advisory Board, from Elevar Advisory Board, from Ayala Advisory Board, personal fees from Abbvie Advisory Board, and from Privo Advisory Board outside of submitted work. MES is an employee of Sonic Healthcare. AS has no conflicts of interest to disclose.
Contributions
All authors conceived the manuscript. AS wrote the manuscript. All authors approved the final version.
Data-sharing statement
Not applicable.
Funding
ATP reports this study has been funded in whole or in part with Federal funding by the NCI-DOE Collaboration established by the U.S. Department of Energy (DOE) and the National Cancer Institute (NCI) of the National Institutes of Health, Cancer Moonshot Task Order N. 75N91019F00134 and under Frederick National Laboratory for Cancer Research Contract 75N91019D00024. This work was performed under the auspices of the U.S. Department of Energy by Argonne National Laboratory under Contract DE-AC02-06-CH11357.
References
- Campanella G, Hanna MG, Geneslaw L. Clinical-grade computational pathology using weakly supervised deep learning on whole slide images. Nat Med. 2019; 25(8):1301-1309. https://doi.org/10.1038/s41591-019-0508-1
- Kather JN, Pearson AT, Halama N. Deep learning can predict microsatellite instability directly from histology in gastrointestinal cancer. Nat Med. 2019; 25(7):1054-1056. https://doi.org/10.1038/s41591-019-0462-y
- Coudray N, Ocampo PS, Sakellaropoulos T. Classification and mutation prediction from non-small cell lung cancer histopathology images using deep learning. Nat Med. 2018; 24(10):1559-1567. https://doi.org/10.1038/s41591-018-0177-5
- Kather JN, Heij LR, Grabsch HI. Pan-cancer image-based detection of clinically actionable genetic alterations. Nat Cancer. 2020; 1(8):789-799. https://doi.org/10.1038/s43018-020-0087-6
- Saillard C, Schmauch B, Laifa O. Predicting survival after hepatocellular carcinoma resection using deep learning on histological slides. Hepatology. 2020; 72(6):2000-2013. https://doi.org/10.1002/hep.31207
- Artificial intelligence and machine learning (AI/ML)-enabled medical devices. 2021.
- Matek C, Krappe S, Münzenmayer C, Haferlach T, Marr C. An expert-annotated dataset of bone marrow cytology in hematologic malignancies. The Cancer Imaging Archive. 2021.
- Matek C, Schwarz S, Marr C, Spiekermann K. A single-cell morphological dataset of leukocytes from AML patients and non-malignant controls. The Cancer Imaging Archive. 2019.
- Schouten JPE, Matek C, Jacobs LFP, Buck MC, Bosnacki D, Marr C. Tens of images can suffice to train neural networks for malignant leukocyte detection. Sci Rep. 2021; 11(1):7995. https://doi.org/10.1038/s41598-021-86995-5
- Matek C, Schwarz S, Spiekermann K, Marr C. Human-level recognition of blast cells in acute myeloid leukaemia with convolutional neural networks. Nature Machine Intelligence. 2019; 1(11):538-544. https://doi.org/10.1038/s42256-019-0101-9
- Matek C, Krappe S, Munzenmayer C, Haferlach T, Marr C. Highly accurate differentiation of bone marrow cell morphologies using deep neural networks on a large image data set. Blood. 2021; 138(20):1917-1927. https://doi.org/10.1182/blood.2020010568
- Sidhom JW, Siddarthan IJ, Lai BS. Deep learning for diagnosis of acute promyelocytic leukemia via recognition of genomically imprinted morphologic features. NPJ Precis Oncol. 2021; 5(1):38. https://doi.org/10.1038/s41698-021-00179-y
- Eckardt JN, Schmittmann T, Riechert S. Deep learning identifies acute promyelocytic leukemia in bone marrow smears. BMC Cancer. 2022; 22(1):201. https://doi.org/10.1186/s12885-022-09307-8
- Eckardt JN, Middeke JM, Riechert S. Deep learning detects acute myeloid leukemia and predicts NPM1 mutation status from bone marrow smears. Leukemia. 2022; 36(1):111-118. https://doi.org/10.1038/s41375-021-01408-w
- Chollet F. Deep learning with Python. 1st ed. 2017.
- Murphy KP. Probabilistic machine learning: an introduction. 2022.
- Liu Y, Chen PC, Krause J, Peng L. How to read articles that use machine learning: Users' Guides to the medical literature. JAMA. 2019; 322(18):1806-1816. https://doi.org/10.1001/jama.2019.16489
- Bankhead P, Loughrey MB, Fernandez JA. QuPath: open source software for digital pathology image analysis. Sci Rep. 2017; 7(1):16878. https://doi.org/10.1038/s41598-017-17204-5
- Dolezal J, Kochanny S, Howard F. Slideflow: a unified deep learning pipeline for digital histology: Zenodo. 2022.
- Szegedy C, Liu W, Jia Y. Going deeper with convolutions. In: 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR); June 7-12 2015. Boston, MA, USA. IEEE. c2015;1-9. https://doi.org/10.1109/CVPR.2015.7298594
- Olah C, Satyanarayan A, Johnson I. The building blocks of interpretability. Distill. 2018; 3(3). https://doi.org/10.23915/distill.00010
- Deng J, Dong W, Socher R, Li L, Kai L, Li F-F. ImageNet: a large-scale hierarchical image database. Miami, FL, USA. IEEE. c2009;248-255. https://doi.org/10.1109/CVPR.2009.5206848
- Riasatian A, Babaie M, Maleki D. Fine-tuning and training of densenet for histopathology image representation using TCGA diagnostic slides. Med Image Anal. 2021; 70:102032. https://doi.org/10.1016/j.media.2021.102032
- Dosovitskiy A, Beyer L, Kolesnikov A. An image is worth 16x16 words: transformers for image recognition at scale. arXiv. 2021.
- Bianco S, Cadene R, Celona L, Napoletano P. Benchmark analysis of representative deep neural network architectures. IEEE Access. 2018; 6:64270-64277. https://doi.org/10.1109/ACCESS.2018.2877890
- Hinton GE, Salakhutdinov RR. Reducing the dimensionality of data with neural networks. Science. 2006; 313(5786):504-507. https://doi.org/10.1126/science.1127647
- Combalia M, Vilaplana V. Monte-Carlo sampling applied to multiple instance learning for histological image classification. c2018;274-281. https://doi.org/10.1007/978-3-030-00889-5_31
- Dietterich TG, Lathrop RH, Lozano-Pérez T. Solving the multiple instance problem with axis-parallel rectangles. Artificial Intelligence. 1997; 89(1-2):31-71. https://doi.org/10.1016/S0004-3702(96)00034-3
- Ilse M, Tomczak J, Welling M. Attention-based deep multiple instance learning. Stockholm, Sweden. PMLR. c2018;2127-2136.
- Sadafi A, Makhro A, Bogdanova A. Attention based multiple instance learning for classification of blood cell disorders. c2020;246-256. https://doi.org/10.1007/978-3-030-59722-1_24
- Lu MY, Williamson DFK, Chen TY, Chen RJ, Barbieri M, Mahmood F. Data-efficient and weakly supervised computational pathology on whole-slide images. Nat Biomed Eng. 2021; 5(6):555-570. https://doi.org/10.1038/s41551-020-00682-w
- Hooker S, Erhan D, Kindermans P-J, Kim B. A benchmark for interpretability methods in deep neural networks. In: 33rd Conference on Neural Information Processing Systems (NeurIPS 2019); Vancouver, Canada. c2019;9734-9745.
- Raghu M, Unterthiner T, Kornblith S, Zhang C, Dosovitskiy A. Do vision transformers see like convolutional neural networks. c2021;12116-12128.
- Ghassemi M, Oakden-Rayner L, Beam AL. The false hope of current approaches to explainable artificial intelligence in health care. Lancet Digit Health. 2021; 3(11):e745-e750. https://doi.org/10.1016/S2589-7500(21)00208-9
- Smilkov D, Thorat N, Kim B, Viégas F, Wattenberg M. SmoothGrad: removing noise by adding noise. arXiv. 2017.
- Rudin C. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nat Mach Intell. 2019; 1(5):206-215. https://doi.org/10.1038/s42256-019-0048-x
- Krause J, Grabsch HI, Kloor M. Deep learning detects genetic alterations in cancer histology generated by adversarial networks. J Pathol. 2021; 254(1):70-79. https://doi.org/10.1002/path.5638
- Fan H, Zhang F, Xi L, Li Z, Liu G, Xu Y. LeukocyteMask: an automated localization and segmentation method for leukocyte in blood smear images using deep neural networks. J Biophotonics. 2019; 12(7):e201800488. https://doi.org/10.1002/jbio.201800488
- Alam MM, Islam MT. Machine learning approach of automatic identification and counting of blood cells. Healthc Technol Lett. 2019; 6(4):103-108. https://doi.org/10.1049/htl.2018.5098
- Vajen B, Hanselmann S, Lutterloh F. Classification of fluorescent R-Band metaphase chromosomes using a convolutional neural network is precise and fast in generating karyograms of hematologic neoplastic cells. Cancer Genet. 2022; 260:23-29. https://doi.org/10.1016/j.cancergen.2021.11.005
- Jemaa S, Fredrickson J, Carano RAD, Nielsen T, de Crespigny A, Bengtsson T. Tumor segmentation and feature extraction from whole-body FDG-PET/CT using cascaded 2D and 3D convolutional neural networks. J Digit Imaging. 2020; 33(4):888-894. https://doi.org/10.1007/s10278-020-00341-1
- Xu L, Tetteh G, Lipkova J. Automated whole-body bone lesion detection for multiple myeloma on (68)Ga-Pentixafor PET/CT imaging using deep learning methods. Contrast Media Mol Imaging. 2018; 2018:2391925. https://doi.org/10.1155/2018/2391925
- Weisman AJ, Kieler MW, Perlman SB. Convolutional neural networks for automated PET/CT detection of diseased lymph node burden in patients with lymphoma. Radiol Artif Intell. 2020; 2(5):e200016. https://doi.org/10.1148/ryai.2020200016
- Weisman AJ, Kim J, Lee I. Automated quantification of baseline imaging PET metrics on FDG PET/CT images of pediatric Hodgkin lymphoma patients. EJNMMI Phys. 2020; 7(1):76. https://doi.org/10.1186/s40658-020-00346-3
- Sadik M, Lopez-Urdaneta J, Ulen J. Artificial intelligence could alert for focal skeleton/bone marrow uptake in Hodgkin's lymphoma patients staged with FDG-PET/CT. Sci Rep. 2021; 11(1):10382. https://doi.org/10.1038/s41598-021-89656-9
- Yang Y, Tang Y, Gao R. Validation and estimation of spleen volume via computer-assisted segmentation on clinically acquired CT scans. J Med Imaging (Bellingham). 2021; 8(1):014004. https://doi.org/10.1117/1.JMI.8.1.014004
- Liu W, Liu M, Guo X. Evaluation of acute pulmonary embolism and clot burden on CTPA with deep learning. Eur Radiol. 2020; 30(6):3567-3575. https://doi.org/10.1007/s00330-020-06699-8
- de Haan K, Ceylan Koydemir H, Rivenson Y. Automated screening of sickle cells using a smartphone-based microscope and deep learning. NPJ Digit Med. 2020; 3(1):76. https://doi.org/10.1038/s41746-020-0282-y
- Shaw M, Claveau R, Manescu P. Optical mesoscopy, machine learning, and computational microscopy enable high information content diagnostic imaging of blood films. J Pathol. 2021; 255(1):62-71. https://doi.org/10.1002/path.5738
- Huber N, Anderson T, Missert A. Clinical evaluation of a phantom-based deep convolutional neural network for whole-body-low-dose and ultra-low-dose CT skeletal surveys. Skeletal Radiol. 2022; 51(1):145-151. https://doi.org/10.1007/s00256-021-03828-2
- Theruvath AJ, Siedek F, Yerneni K. Validation of deep learning-based augmentation for reduced (18)F-FDG dose for PET/MRI in children and young adults with lymphoma. Radiol Artif Intell. 2021; 3(6):e200232. https://doi.org/10.1148/ryai.2021200232
- Rajaraman S, Antani SK, Poostchi M. Pre-trained convolutional neural networks as feature extractors toward improved malaria parasite detection in thin blood smear images. PeerJ. 2018; 6:e4568. https://doi.org/10.7717/peerj.4568
- Rajaraman S, Silamut K, Hossain MA. Understanding the learned behavior of customized convolutional neural networks toward malaria parasite detection in thin blood smear images. J Med Imaging (Bellingham). 2018; 5(3):034501. https://doi.org/10.1117/1.JMI.5.3.034501
- Kuo PC, Cheng HY, Chen PF. Assessment of expert-level automated detection of Plasmodium falciparum in digitized thin blood smear images. JAMA Netw Open. 2020; 3(2):e200206. https://doi.org/10.1001/jamanetworkopen.2020.0206
- Manescu P, Shaw MJ, Elmi M. Expert-level automated malaria diagnosis on routine blood films with deep neural networks. Am J Hematol. 2020; 95(8):883-891. https://doi.org/10.1002/ajh.25827
- Li S, Du Z, Meng X, Zhang Y. Multi-stage malaria parasite recognition by deep learning. Gigascience. 2021; 10(6):giab040. https://doi.org/10.1093/gigascience/giab040
- Lee SY, Chen CME, Lim EYP. Image analysis using machine learning for automated detection of hemoglobin H inclusions in blood smears - a method for morphologic detection of rare cells. J Pathol Inform. 2021; 12:18. https://doi.org/10.4103/jpi.jpi_110_20
- Doan M, Sebastian JA, Caicedo JC. Objective assessment of stored blood quality by deep learning. Proc Natl Acad Sci U S A. 2020; 117(35):21381-21390. https://doi.org/10.1073/pnas.2001227117
- Acevedo A, Merino A, Boldu L, Molina A, Alferez S, Rodellar J. A new convolutional neural network predictive model for the automatic recognition of hypogranulated neutrophils in myelodysplastic syndromes. Comput Biol Med. 2021; 134:104479. https://doi.org/10.1016/j.compbiomed.2021.104479
- Kimura K, Ai T, Horiuchi Y. Automated diagnostic support system with deep learning algorithms for distinction of Philadelphia chromosome-negative myeloproliferative neoplasms using peripheral blood specimen. Sci Rep. 2021; 11(1):3367. https://doi.org/10.1038/s41598-021-82826-9
- Sahlol AT, Kollmannsberger P, Ewees AA. Efficient classification of white blood cell leukemia with improved swarm optimization of deep features. Sci Rep. 2020; 10(1):2536. https://doi.org/10.1038/s41598-020-59215-9
- Shafique S, Tehsin S. Acute lymphoblastic leukemia detection and classification of its subtypes using pretrained deep convolutional neural networks. Technol Cancer Res Treat. 2018; 17:1533033818802789. https://doi.org/10.1177/1533033818802789
- Mori J, Kaji S, Kawai H. Assessment of dysplasia in bone marrow smear with convolutional neural network. Sci Rep. 2020; 10(1):14734. https://doi.org/10.1038/s41598-020-71752-x
- Sirinukunwattana K, Aberdeen A, Theissen H. Artificial intelligence-based morphological fingerprinting of megakaryocytes: a new tool for assessing disease in MPN patients. Blood Adv. 2020; 4(14):3284-3294. https://doi.org/10.1182/bloodadvances.2020002230
- Gehlot S, Gupta A, Gupta R. A CNN-based unified framework utilizing projection loss in unison with label noise handling for multiple myeloma cancer diagnosis. Med Image Anal. 2021; 72:102099. https://doi.org/10.1016/j.media.2021.102099
- Li D, Bledsoe JR, Zeng Y. A deep learning diagnostic platform for diffuse large B-cell lymphoma with high accuracy across multiple hospitals. Nat Commun. 2020; 11(1):6004. https://doi.org/10.1038/s41467-020-19817-3
- Syrykh C, Abreu A, Amara N. Accurate diagnosis of lymphoma on whole-slide histopathology images using deep learning. NPJ Digit Med. 2020; 3(1):63. https://doi.org/10.1038/s41746-020-0272-0
- Sibille L, Seifert R, Avramovic N. (18)F-FDG PET/CT uptake classification in lymphoma and lung cancer by using deep convolutional neural networks. Radiology. 2020; 294(2):445-452. https://doi.org/10.1148/radiol.2019191114
- Zhou Z, Jain P, Lu Y. Computer-aided detection of mantle cell lymphoma on (18)F-FDG PET/CT using a deep learning convolutional neural network. Am J Nucl Med Mol Imaging. 2021; 11(4):260-270.
- Huang SC, Kothari T, Banerjee I. PENet-a scalable deep-learning model for automated diagnosis of pulmonary embolism using volumetric CT imaging. NPJ Digit Med. 2020; 3(1):61. https://doi.org/10.1038/s41746-020-0266-y
- Huang SC, Pareek A, Zamanian R, Banerjee I, Lungren MP. Multimodal fusion with deep neural networks for leveraging CT imaging and electronic health record: a case-study in pulmonary embolism detection. Sci Rep. 2020; 10(1):22147. https://doi.org/10.1038/s41598-020-78888-w
- Kainz B, Heinrich MP, Makropoulos A. Non-invasive diagnosis of deep vein thrombosis from ultrasound imaging with machine learning. NPJ Digit Med. 2021; 4(1):137. https://doi.org/10.1038/s41746-021-00503-7
- Kwon JM, Cho Y, Jeon KH. A deep learning algorithm to detect anaemia with ECGs: a retrospective, multicentre study. Lancet Digit Health. 2020; 2(7):e358-e367. https://doi.org/10.1016/S2589-7500(20)30108-4
- Mitani A, Huang A, Venugopalan S. Detection of anaemia from retinal fundus images via deep learning. Nat Biomed Eng. 2020; 4(1):18-27. https://doi.org/10.1038/s41551-019-0487-z
- Li S, Yang Q, Jiang H, Cortes-Vecino JA, Zhang Y. Parasitologist-level classification of apicomplexan parasites and host cell with deep cycle transfer learning (DCTL). Bioinformatics. 2020; 36(16):4498-4505. https://doi.org/10.1093/bioinformatics/btaa513
- Kimura K, Tabe Y, Ai T. A novel automated image analysis system using deep convolutional neural networks can assist to differentiate MDS and AA. Sci Rep. 2019; 9(1):13385. https://doi.org/10.1038/s41598-019-49942-z
- Ahmed N, Yigit A, Isik Z, Alpkocak A. Identification of leukemia subtypes from microscopic images using convolutional neural network. Diagnostics (Basel). 2019; 9(3):104. https://doi.org/10.3390/diagnostics9030104
- Huang F, Guang P, Li F, Liu X, Zhang W, Huang W. AML, ALL, and CML classification and diagnosis based on bone marrow cell morphology combined with convolutional neural network: a STARD compliant diagnosis research. Medicine (Baltimore). 2020; 99(45):e23154. https://doi.org/10.1097/MD.0000000000023154
- Achi HE, Belousova T, Chen L. Automated diagnosis of lymphoma with digital pathology images using deep learning. Ann Clin Lab Sci. 2019; 49(2):153-160.
- Guan Q, Wan X, Lu H. Deep convolutional neural network Inception-v3 model for differential diagnosing of lymph node in cytological images: a pilot study. Ann Transl Med. 2019; 7(14):307. https://doi.org/10.21037/atm.2019.06.29
- Mohlman JS, Leventhal SD, Hansen T, Kohan J, Pascucci V, Salama ME. Improving augmented human intelligence to distinguish Burkitt lymphoma from diffuse large B-cell lymphoma cases. Am J Clin Pathol. 2020; 153(6):743-759. https://doi.org/10.1093/ajcp/aqaa001
- Miyoshi H, Sato K, Kabeya Y. Deep learning shows the capability of high-level computer-aided diagnosis in malignant lymphoma. Lab Invest. 2020; 100(10):1300-1310. https://doi.org/10.1038/s41374-020-0442-3
- Yun J, Park JE, Lee H, Ham S, Kim N, Kim HS. Radiomic features and multilayer perceptron network classifier: a robust MRI classification strategy for distinguishing glioblastoma from primary central nervous system lymphoma. Sci Rep. 2019; 9(1):5746. https://doi.org/10.1038/s41598-019-42276-w
- Zhao J, Zhang M, Zhou Z, Chu J, Cao F. Automatic detection and classification of leukocytes using convolutional neural networks. Med Biol Eng Comput. 2017; 55(8):1287-1301. https://doi.org/10.1007/s11517-016-1590-x
- Lippeveld M, Knill C, Ladlow E. Classification of human white blood cells using machine learning for stain-free imaging flow cytometry. Cytometry A. 2020; 97(3):308-319. https://doi.org/10.1002/cyto.a.23920
- Wu YY, Huang TC, Ye RH. A hematologist-level deep learning algorithm (BMSNet) for assessing the morphologies of single nuclear balls in bone marrow smears: algorithm development. JMIR Med Inform. 2020; 8(4). https://doi.org/10.2196/15963
- Durant TJS, Olson EM, Schulz WL, Torres R. Very deep convolutional neural networks for morphologic classification of erythrocytes. Clin Chem. 2017; 63(12):1847-1855. https://doi.org/10.1373/clinchem.2017.276345
- Xu M, Papageorgiou DP, Abidi SZ, Dao M, Zhao H, Karniadakis GE. A deep convolutional neural network for classification of red blood cells in sickle cell anemia. PLoS Comput Biol. 2017; 13(10):e1005746. https://doi.org/10.1371/journal.pcbi.1005746
- Zhou Y, Yasumoto A, Lei C. Intelligent classification of platelet aggregates by agonist type. Elife. 2020; 9:e52779. https://doi.org/10.7554/eLife.52938
- Rehman A, Abbas N, Saba T, Rahman SIU, Mehmood Z, Kolivand H. Classification of acute lymphoblastic leukemia using deep learning. Microsc Res Tech. 2018; 81(11):1310-1317. https://doi.org/10.1002/jemt.23139
- Bruck OE, Lallukka-Bruck SE, Hohtari HR. Machine learning of bone marrow histopathology identifies genetic and clinical determinants in patients with MDS. Blood Cancer Discov. 2021; 2(3):238-249. https://doi.org/10.1158/2643-3230.BCD-20-0162
- Swiderska-Chadaj Z, Hebeda KM, van den Brand M, Litjens G. Artificial intelligence to detect MYC translocation in slides of diffuse large B-cell lymphoma. Virchows Arch. 2021; 479(3):617-621. https://doi.org/10.1007/s00428-020-02931-4
- Irshaid L, Bleiberg J, Weinberger E. Histopathologic and machine deep learning criteria to predict lymphoma transformation in bone marrow biopsies. Arch Pathol Lab Med. 2022; 146(2):182-193. https://doi.org/10.5858/arpa.2020-0510-OA
- Jullien M, Tessoulin B, Ghesquieres H. Deep-learning assessed muscular hypodensity independently predicts mortality in DLBCL patients younger than 60 years. Cancers (Basel). 2021; 13(18):4503. https://doi.org/10.3390/cancers13184503
- Cahan N, Marom EM, Soffer S. Weakly supervised attention model for RV strain classification from volumetric CTPA scans. Comput Methods Programs Biomed. 2022; 220:106815. https://doi.org/10.1016/j.cmpb.2022.106815
- Cai S, Parker F, Urias MG, Goldberg MF, Hager GD, Scott AW. Deep learning detection of sea fan neovascularization from ultra-widefield color fundus photographs of patients with sickle cell hemoglobinopathy. JAMA Ophthalmol. 2021; 139(2):206-213. https://doi.org/10.1001/jamaophthalmol.2020.5900
- Doan M, Case M, Masic D. Label-free leukemia monitoring by computer vision. Cytometry A. 2020; 97(4):407-414. https://doi.org/10.1002/cyto.a.23987
- Guo R, Hu X, Song H. Weakly supervised deep learning for determining the prognostic value of (18)F-FDG PET/CT in extranodal natural killer/T cell lymphoma, nasal type. Eur J Nucl Med Mol Imaging. 2021; 48(10):3151-3161. https://doi.org/10.1007/s00259-021-05232-3
- Lisson CS, Lisson CG, Mezger MF. Deep neural networks and machine learning radiomics modelling for prediction of relapse in mantle cell lymphoma. Cancers (Basel). 2022; 14(8):2008. https://doi.org/10.3390/cancers14082008
- Kochanny SE, Pearson AT. Academics as leaders in the cancer artificial intelligence revolution. Cancer. 2021; 127(5):664-671. https://doi.org/10.1002/cncr.33284
- Radakovich N, Nagy M, Nazha A. Machine learning in haematological malignancies. Lancet Haematol. 2020; 7(7):e541-e550. https://doi.org/10.1016/S2352-3026(20)30121-6
- New NCI-DOE collaboration project IMPROVE seeks deep learning model approaches. 2022.
- Cruz Rivera S, Liu X, Chan AW. Guidelines for clinical trial protocols for interventions involving artificial intelligence: the SPIRIT-AI extension. Nat Med. 2020; 26(9):1351-1363. https://doi.org/10.1136/bmj.m3210
- Liu X, Cruz Rivera S, Moher D. Reporting guidelines for clinical trial reports for interventions involving artificial intelligence: the CONSORT-AI extension. Nat Med. 2020; 26(9):1364-1374. https://doi.org/10.1136/bmj.m3164
- Sounderajah V, Ashrafian H, Golub RM. Developing a reporting guideline for artificial intelligence-centred diagnostic test accuracy studies: the STARD-AI protocol. BMJ Open. 2021; 11(6):e047709. https://doi.org/10.1136/bmjopen-2020-047709
- Howard FM, Dolezal J, Kochanny S. The impact of site-specific digital histology signatures on deep learning model accuracy and bias. Nat Commun. 2021; 12(1):4423. https://doi.org/10.1038/s41467-021-24698-1
- Cancer Genome Atlas Research Network, Weinstein JN, Collisson EA. The Cancer Genome Atlas Pan-Cancer analysis project. Nat Genet. 2013; 45(10):1113-1120. https://doi.org/10.1038/ng.2764
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.