### Whole-slide images and sequencing datasets
This research study was approved by the respective institutional review boards at the Icahn School of Medicine at Mount Sinai (protocol 19-00951) and MSKCC (protocol 18-013). Informed consent was waived as per the institutional review board protocols. Participants were not compensated. Sex and/or gender was not considered in the study design as cohorts were generated as random samples of the patient population.
### MSKCC
The MSKCC cohort consists of 7,586 patients (7,996 slides or cases) diagnosed with LUAD from 2014 through 2024, divided into the training dataset (4,867 patients, 5,174 slides), the validation dataset (1,640 patients, 1,742 slides), the dataset for calibrating the clinical threshold (764 patients, 765 slides, 397 primary) and the slides processed in real time for the silent trial (315 patients or slides, 197 primary). For the retrospective cohorts (training and validation), the last section taken from each formalin-fixed paraffin-embedded (FFPE) tissue block (after all unstained slides used for molecular sequencing have been cut) is stained with H&E. All samples, including cytology cell block samples, are part of the retrospective cohort. The calibrating and IRT datasets are evaluated using the diagnostic slides (the first section from the block, including sections from cytology cell blocks), taken before either the rapid test or genomic sequencing. All slides are digitized with a mix of Aperio AT2 (at 20× magnification) and GT450 (at 40× magnification) digital slide scanners from Leica Biosystems. All slides that were part of the standard clinical workflow were used. Thus, this study represents the full extent of biological and technical variability of the clinical setting. The cases for prospective sequencing are selected in real time by identifying LUAD samples for which both rapid EGFR testing and genomic sequencing are ordered prospectively. Demographic and clinical information for this cohort is provided in Supplementary Table 1.
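The subset sizes quoted above can be checked against the reported cohort totals with a few lines of arithmetic. The sketch below is only a consistency check on the numbers in the text; the dictionary keys and function names are illustrative and do not come from the study's code.

```python
# Cohort sizes as reported in the text: (patients, slides) per subset.
# Subset names are illustrative labels, not identifiers from the study.
COHORT_SPLITS = {
    "training": (4867, 5174),
    "validation": (1640, 1742),
    "calibration": (764, 765),
    "silent_trial": (315, 315),
}
REPORTED_TOTAL = (7586, 7996)  # (patients, slides) for the whole MSKCC cohort


def split_totals(splits):
    """Sum patients and slides across all subsets."""
    patients = sum(p for p, _ in splits.values())
    slides = sum(s for _, s in splits.values())
    return patients, slides


def split_fractions(splits):
    """Fraction of all slides that falls in each subset."""
    _, total_slides = split_totals(splits)
    return {name: s / total_slides for name, (_, s) in splits.items()}


if __name__ == "__main__":
    print("totals match:", split_totals(COHORT_SPLITS) == REPORTED_TOTAL)
    for name, frac in split_fractions(COHORT_SPLITS).items():
        print(f"{name}: {frac:.1%} of slides")
```

The four subsets sum exactly to the reported 7,586 patients and 7,996 slides.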
The ground truth for EGFR mutations is established using the MSK-IMPACT targeted genomic sequencing assay (refs. 17,18) performed on the same tissue block from which the digital slide is created. MSK-IMPACT is a hybridization capture-based NGS assay routinely used to detect clinically relevant somatic mutations, copy number alterations and gene fusions across cancers. This assay screens for variants in up to 505 unique cancer-related genes, including EGFR, in all tumor types. All sequencing occurs in a Clinical Laboratory Improvement Amendments (CLIA)-certified laboratory, and each variant is reviewed by a board-certified molecular pathologist.
Sequencing was performed on a NextSeq 550DX (Illumina) system using a NextSeq 500/550 High Output Kit v2.5 (300 cycles). Data were processed and analyzed by the TruSight Oncology 500 Local App version 2.11.3, followed by an in-house pipeline using a second variant caller (Mutect2; ref. 19) and ANNOVAR (ref. 20) for annotation of the alterations. For DNA analysis, single-nucleotide variants, insertions and deletions, copy number variations, total mutation burden and microsatellite instability were calculated. For RNA analysis, putative gene fusions involving around 50 fusion driver genes and RNA splice variants from *EGFR*, *AR* or *MET* (for example, *MET* exon 14 skipping) were explored.
The TCGA-LUAD is a well-characterized cohort of primary resection specimens that is part of the broader TCGA project (ref. 21). TCGA-LUAD originally consisted of comprehensive genomic profiling of 230 resected LUADs by whole-exome sequencing (WES) and RNA sequencing. The cohort has subsequently been expanded to 585 unique cases with 582 undergoing WES. The corresponding 519 diagnostic digital slides were downloaded using the GDC Data Transfer Tool. An expert pathologist reviewed all samples, annotating the presence of different types of artifact on the slides: low-quality stain, blue and red saturation, blur, widespread tissue necrosis, freeze artifacts and severe artifacts that wholly obscured the morphology of the sample (Supplementary Table 10). EGFR mutations from WES were clinically characterized using the OncoKB database.
### Accelerated EGFR biomarker prediction with AI-powered digital pathology
The integration of artificial intelligence into digital pathology offers the potential for faster and more efficient biomarker analysis. Recent work has demonstrated the feasibility of real-time prediction of epidermal growth factor receptor (EGFR) status from whole-slide images (WSIs) using deep learning models. This is a notable step toward streamlining clinical workflows and accelerating patient care.
### High-performance model training and deployment
A novel AI model, referred to as EAGLE, was developed and trained to predict EGFR status directly from digital pathology slides. To make processing of large image datasets tractable, patch encoding used 16-bit floating-point precision, which allowed larger batch sizes during training. The model was trained for 20 epochs on a cluster of 24 NVIDIA H100-80GB GPUs, which took approximately 9.28 hours. Once trained, EAGLE can run on a single GPU, broadening its accessibility.
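A back-of-the-envelope estimate illustrates why halving the numeric precision permits larger batches. The sketch below counts only the memory needed to hold the input patches. It ignores model weights and activations, so it is an illustration of the scaling and not a description of EAGLE's actual memory use. The 224 × 224 RGB patch size and the 1 GiB budget are assumptions chosen for the example and are not stated in the text.

```python
# Bytes per element for the two floating-point formats discussed.
BYTES_PER_ELEMENT = {"float16": 2, "float32": 4}


def batch_bytes(batch_size, channels, height, width, dtype):
    """Memory needed to hold one batch of image patches in the given dtype."""
    return batch_size * channels * height * width * BYTES_PER_ELEMENT[dtype]


def max_batch_size(budget_bytes, channels, height, width, dtype):
    """Largest batch of patches that fits in a fixed memory budget."""
    per_patch = channels * height * width * BYTES_PER_ELEMENT[dtype]
    return budget_bytes // per_patch


def gpu_hours(num_gpus, wall_clock_hours):
    """Total accelerator time consumed by a training run."""
    return num_gpus * wall_clock_hours


if __name__ == "__main__":
    budget = 2**30  # hypothetical 1 GiB reserved for input patches
    for dtype in ("float32", "float16"):
        # 3 x 224 x 224 is an assumed patch shape, not taken from the text.
        print(dtype, max_batch_size(budget, 3, 224, 224, dtype))
    # 24 GPUs for 9.28 hours, as reported in the text.
    print(f"{gpu_hours(24, 9.28):.2f} GPU-hours")
```

Under these assumptions the 16-bit batch holds twice as many patches as the 32-bit batch, and the reported training run corresponds to roughly 223 GPU-hours.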
For initial validation, EAGLE was deployed with full floating-point precision on a single NVIDIA RTX 3090 GPU (24 GB). Performance testing showed a median processing time of 68 seconds per slide, indicating its suitability for integration into a live clinical environment. Furthermore, the model can be deployed on hardware with less memory, albeit with a potential trade-off in inference speed. This flexibility is important for wider adoption across diverse healthcare settings.
### Streamlined clinical workflow for real-time analysis
An automated, real-time EGFR prediction pipeline has now been implemented. Consider a major cancer center that processes between 90 and 110 non-small cell lung cancer (NSCLC) cases requiring EGFR testing each month. The proposed pipeline automatically identifies digital slides submitted for molecular testing, alongside corresponding slides from the same tissue block. Automated monitoring applications, running hourly, detect newly scanned slides and cases awaiting molecular analysis. When a match is identified, the slide is promptly transferred to GPU-equipped computing infrastructure for AI-driven inference.
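The matching step described above can be sketched as a simple polling pass. This is a minimal illustration under stated assumptions, not the deployed system: the `ScannedSlide` record, the block identifiers and the function names are all hypothetical, and a real deployment would query the scanner database and the laboratory information system instead of in-memory collections.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class ScannedSlide:
    """Hypothetical record for a newly digitized slide."""
    slide_id: str
    block_id: str  # tissue block the section was cut from
    scanned_at: datetime


def find_new_matches(scanned, pending_blocks, already_queued):
    """Return slides whose block awaits molecular testing and that
    have not yet been sent for inference."""
    return [
        slide
        for slide in scanned
        if slide.block_id in pending_blocks
        and slide.slide_id not in already_queued
    ]


def run_hourly_pass(scanned, pending_blocks, already_queued):
    """One polling pass: queue every new match and remember it so the
    next pass does not submit the same slide twice."""
    new = find_new_matches(scanned, pending_blocks, already_queued)
    already_queued.update(s.slide_id for s in new)
    return [s.slide_id for s in new]


if __name__ == "__main__":
    slides = [
        ScannedSlide("S1", "B1", datetime(2024, 1, 1, 9)),
        ScannedSlide("S2", "B2", datetime(2024, 1, 1, 10)),
    ]
    queued = set()
    print(run_hourly_pass(slides, {"B1"}, queued))        # first pass
    print(run_hourly_pass(slides, {"B1", "B2"}, queued))  # a later pass
```

Keeping a set of already-queued slide identifiers makes each pass idempotent, so running the monitor every hour does not resubmit slides that were matched earlier.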
This system prioritizes the first scanned slide when multiple WSIs are available. During a silent trial, data were collected from EAGLE predictions, conventional rapid testing and the comprehensive MSK-IMPACT assay. Detailed timestamps were recorded for four key events (test accessioning, EAGLE result generation, rapid test completion and MSK-IMPACT result availability), enabling a comparison of the AI-assisted screening pipeline against the existing rapid testing workflow. This data-driven approach is essential for validating the clinical utility and impact of AI-powered diagnostics.
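The two operations described above, picking the first scanned slide and comparing turnaround times from recorded timestamps, can be sketched as follows. The field names (`accessioned`, `eagle`, `rapid`) and all timestamps in the usage example are invented for illustration and are not data from the silent trial.

```python
from datetime import datetime
from statistics import median


def first_scanned(slides):
    """Pick the earliest-scanned slide from (slide_id, scanned_at) pairs."""
    return min(slides, key=lambda s: s[1])[0]


def turnaround_hours(accessioned_at, resulted_at):
    """Elapsed time in hours between accessioning and a result."""
    return (resulted_at - accessioned_at).total_seconds() / 3600.0


def median_turnaround(cases, result_key):
    """Median hours from accessioning to the named result across cases.
    Cases without that timestamp are skipped."""
    values = [
        turnaround_hours(c["accessioned"], c[result_key])
        for c in cases
        if c.get(result_key) is not None
    ]
    return median(values)


if __name__ == "__main__":
    # Entirely made-up timestamps, used only to exercise the functions.
    cases = [
        {"accessioned": datetime(2024, 1, 1, 8),
         "eagle": datetime(2024, 1, 1, 9),
         "rapid": datetime(2024, 1, 3, 8)},
        {"accessioned": datetime(2024, 1, 2, 8),
         "eagle": datetime(2024, 1, 2, 10),
         "rapid": None},  # rapid test not yet resulted
    ]
    print(median_turnaround(cases, "eagle"))
    print(median_turnaround(cases, "rapid"))
```

Using the median instead of the mean keeps a few unusually delayed cases from dominating the comparison, which matches the way the slide processing time is reported elsewhere in the text.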
### Technical specifications
The EAGLE model was constructed using PyTorch (v.2.1.1+cu121), a widely adopted deep learning framework. Supporting software pipelines were developed using Python (v.3.8.18), ensuring compatibility and ease of integration with existing laboratory information systems.