### Dataset for the discovery of lung cancer-associated TCR RFUs
To identify the RFUs associated with cancer status (Fig. 1), we assembled a cohort of blood samples from 463 patients diagnosed with lung cancer (Supplementary Data 1). The cohort was enriched for subjects with stage I disease (Fig. 2a) and spanned all the major lung cancer subtypes (Fig. 2b). Blood samples from 587 subjects without lung cancer were collected as controls; most of these individuals met the current inclusion criteria for lung cancer screening (Fig. 2c, d). We used a custom TCR sequencing assay (Methods) to sequence 128,902,511 productive TCR clonotypes in total. The median number of productive TCR clonotypes per sample was 113,571 and was comparable between cancer patients and non-cancer controls (Fig. 2f, g). We then kept only clonotypes with the most abundant CDR3 lengths (10–16 residues) and, to preferentially remove naïve T cell clonotypes, removed any clonotype whose unique molecular identifier (UMI) read count was below 0.75 × the sample's median UMI read count. The 69,027,705 clonotypes that remained were used for further analysis.
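The two per-sample clonotype filters described above can be sketched as follows. This is a minimal illustration, not the study's pipeline: the record layout (dicts with hypothetical `cdr3` and `umi` keys) is assumed, and the sketch computes the median UMI read count over all of the sample's productive clonotypes before applying the length filter, an ordering the text does not specify.

```python
from statistics import median


def filter_clonotypes(clonotypes, min_len=10, max_len=16, umi_fraction=0.75):
    """Filter one sample's productive clonotypes.

    Keeps clonotypes whose CDR3 length (amino acids) lies in [min_len, max_len]
    and whose UMI read count is at least umi_fraction x the sample's median UMI
    read count. Each clonotype is a dict with hypothetical keys 'cdr3' and 'umi'.
    """
    if not clonotypes:
        return []
    # Assumption: the median is taken over all of the sample's productive clonotypes
    threshold = umi_fraction * median(c["umi"] for c in clonotypes)
    return [
        c
        for c in clonotypes
        if min_len <= len(c["cdr3"]) <= max_len  # most abundant CDR3 lengths
        and c["umi"] >= threshold  # drops low-UMI clonotypes, preferentially naive T cells
    ]


sample = [
    {"cdr3": "CASSLGQGAYEQYF", "umi": 10},
    {"cdr3": "CASSPF", "umi": 50},  # CDR3 too short
    {"cdr3": "CASSLAGGTDTQYF", "umi": 2},  # below the UMI threshold
    {"cdr3": "CASRDRGNQPQHF", "umi": 8},
]
kept = filter_clonotypes(sample)
print([c["cdr3"] for c in kept])  # ['CASSLGQGAYEQYF', 'CASRDRGNQPQHF']
```

In this toy sample the median UMI count is 9, so the threshold is 6.75.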
Fig. 1: Overview of the TCR RFU workflow. (1) The TCR β chain of circulating T cells from blood buffy coats is deeply sequenced using an NGS-based assay. (2) Filtered TCRs are clustered into repertoire functional units (RFUs) using sequence similarity as the distance metric. (3) A generalized linear regression model is applied to each RFU to discover RFUs that are individually associated with cancer status after accounting for demographic and technical covariates. (4) Significantly cancer-associated RFUs are used jointly to train a machine learning model to predict cancer status. Figure created in part using BioRender.com.
Fig. 2: Case-control cohort statistics.
### Lung cancer-associated RFU discovery
With the defined RFUs, we next performed cancer case/non-cancer control association testing to identify cancer-associated RFUs. Because there are many RFUs and most of them are small, the small RFUs impose a substantial multiple testing burden while offering minimal statistical power to detect cancer associations at our sample size. To address this in the case-control analysis, we restricted the candidate set to the most common RFUs: those with TCR clonotypes observed in at least 15 individuals and with multiple (≥8) distinct clonotypes present in at least three individuals, regardless of cancer status. This left 6375 RFUs to be tested for cancer association (Supplementary Table 1). Although RFUs obtained with different dc settings partially overlap, clustering TCRs across a range of dc cutoffs lets us find, for cancer association testing, the best balance between each RFU's population prevalence and its degree of putative shared antigen specificity.
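A minimal sketch of this prevalence filter is shown below, assuming each RFU is stored as a hypothetical mapping from subject ID to the set of distinct RFU clonotypes seen in that subject. The second criterion is ambiguous as written; the sketch assumes it means that at least three subjects each carry eight or more distinct clonotypes of the RFU. Other readings would change only the `rich` line.

```python
def is_candidate_rfu(clonotypes_by_subject, min_subjects=15, min_distinct=8,
                     min_rich_subjects=3):
    """Prevalence filter for one RFU before association testing.

    clonotypes_by_subject maps a subject ID to the set of distinct clonotypes
    of this RFU observed in that subject (hypothetical layout). Cancer status
    is deliberately ignored.
    """
    # Subjects in whom the RFU is observed at all
    present = [clones for clones in clonotypes_by_subject.values() if clones]
    # Assumed reading: subjects carrying >= min_distinct distinct RFU clonotypes
    rich = [clones for clones in present if len(clones) >= min_distinct]
    return len(present) >= min_subjects and len(rich) >= min_rich_subjects


def candidate_rfus(rfus, **thresholds):
    """Return IDs of RFUs (dict: RFU ID -> clonotypes_by_subject) that pass."""
    return [rfu_id for rfu_id, data in rfus.items()
            if is_candidate_rfu(data, **thresholds)]
```

Applying `candidate_rfus` to every RFU from every dc cutoff would yield the tested set; only RFUs passing the filter contribute to the multiple testing correction.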
We observed that the per-subject distribution of RFU TCR counts (the number of TCR clonotypes from an RFU present in an individual) can be modeled analogously to gene expression levels measured using RNA-seq, with the level of an RFU computed as the sum of its constituent TCR clonotype counts. Therefore, we used the well-established gamma-Poisson generalized linear model33 to test for RFU association with cancer status. This model accounts for variable depth of sequencing and RFU count overdispersion and allows us to incorporate demographic and technical covariates such as age, gender, race, and TCR repertoire depth into the analysis.
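To make the model concrete, here is a minimal sketch of a gamma-Poisson (negative binomial) GLM with a log link, fitted by iteratively reweighted least squares (Fisher scoring) in NumPy. It is not the software cited as ref. 33: it assumes a single fixed, user-supplied dispersion `alpha` (RNA-seq tools typically estimate dispersions per feature), uses log repertoire depth as an offset, and requires the first design-matrix column to be the intercept. The function names are hypothetical.

```python
import numpy as np


def rfu_level(clonotype_counts):
    """Level of an RFU in one subject: sum of its constituent clonotype counts."""
    return sum(clonotype_counts)


def fit_nb_glm(y, X, offset, alpha=0.1, max_iter=100, tol=1e-10):
    """Negative binomial (gamma-Poisson) GLM with log link and FIXED dispersion.

    y: RFU levels per subject, shape (n,)
    X: design matrix, shape (n, p); column 0 must be the intercept, the other
       columns cancer status and covariates (age, gender, race, ...)
    offset: log TCR repertoire depth per subject, shape (n,)
    Var(y) = mu + alpha * mu**2. Returns (coefficients, Wald standard errors).
    """
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float)
    offset = np.asarray(offset, dtype=float)
    beta = np.zeros(X.shape[1])
    # Start the intercept at the pooled log rate per unit of depth
    beta[0] = np.log((y.sum() + 0.5) / np.exp(offset).sum())
    for _ in range(max_iter):
        eta = X @ beta + offset
        mu = np.exp(eta)
        w = mu / (1.0 + alpha * mu)  # IRLS weight: (dmu/deta)^2 / Var(y)
        z = eta - offset + (y - mu) / mu  # working response
        XtW = X.T * w  # X^T W without forming the diagonal matrix
        new_beta = np.linalg.solve(XtW @ X, XtW @ z)
        converged = np.max(np.abs(new_beta - beta)) < tol
        beta = new_beta
        if converged:
            break
    mu = np.exp(X @ beta + offset)
    w = mu / (1.0 + alpha * mu)
    cov = np.linalg.inv((X.T * w) @ X)  # inverse Fisher information
    return beta, np.sqrt(np.diag(cov))
```

With columns `[intercept, cancer, age, ...]`, `np.exp(beta[1])` is the covariate-adjusted fold change in RFU level between cases and controls, and `beta[1] / se[1]` is its Wald statistic.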
We identified a total of 327 RFUs associated with cancer status at false discovery rate (FDR) ≤ 0.1 across the eight dc cutoffs: 157 were enriched in cancer samples, with fold changes between 1.03 and 2.26, and 170 were enriched in non-cancer controls, with fold changes between 1.05 and 17.2 (Fig. 4a, Supplementary Data 2). Among these 327 cancer-associated RFUs, TCR clonotype counts were also correlated with subject age for 157 RFUs, with race for 136, and with gender for 88 at FDR ≤ 0.1 (Supplementary Data 3). Of the 327 RFUs, 124 had a TCR centroid that recurred across different dc cutoffs, indicating that they were overlapping RFUs.
… Supplementary Fig. 3e-g) and were uniformly higher in cancer patients than in age-matched non-cancer controls (Supplementary Fig. 3a). Model scores were also higher in cancer patients than in non-cancer controls known to have heart disease and/or chronic obstructive pulmonary disease (COPD) (Supplementary Fig. 3h) and were consistent across lung cancer subtypes (Supplementary Fig. 3i). Notably, at a specificity of 80% the model detected 48% of stage I subjects (test samples of each cross-validation fold) (Fig. 4c), and it could distinguish lung cancer from benign nodules (Fig. 4d), highlighting the promise of this early detection approach. TCR cancer prediction scores did not differ significantly between cancer stages (p = 0.42) (Supplementary Fig. 4).
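The FDR-controlling procedure is not named in this section. Assuming the widely used Benjamini-Hochberg adjustment, splitting the significant RFUs into case-enriched and control-enriched sets at FDR ≤ 0.1 can be sketched as follows; `split_significant` and the use of signed log2 fold changes are illustrative choices, not the study's code.

```python
def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p values (q values), returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    qvalues = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down, keeping q values monotone in p
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        qvalues[i] = running_min
    return qvalues


def split_significant(rfu_ids, pvalues, log2_fold_changes, fdr=0.1):
    """Split RFUs passing the FDR cutoff into case- and control-enriched lists."""
    qvalues = benjamini_hochberg(pvalues)
    hits = [(r, lfc) for r, q, lfc in zip(rfu_ids, qvalues, log2_fold_changes)
            if q <= fdr]
    case_enriched = [r for r, lfc in hits if lfc > 0]
    control_enriched = [r for r, lfc in hits if lfc < 0]
    return case_enriched, control_enriched
```

Because the adjustment depends on the total number of tests `m`, restricting the candidate set before testing directly lowers each RFU's q value.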
The greatest imbalances between our cases and controls involved age and TCR repertoire depth (Fig. 2). To confirm that the ML model scores were not driven by these covariates, we fitted a linear regression model with the RFU TCR score as the response and cancer status, age, TCR repertoire clonotype counts, and UMI read counts as predictors. The resulting p values for cancer status, age, TCR clonotype count, and TCR UMI count were $9.5\times10^{-22}$, $9.1\times10^{-4}$, $5.5\times10^{-4}$, and $2.3\times10^{-3}$, respectively, indicating that the final cancer prediction score was predominantly driven by a sample's cancer status rather than by its age or TCR repertoire depth.
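The covariate check above can be sketched as an ordinary least-squares fit. This sketch assumes NumPy and uses hypothetical toy data in which the score depends on cancer status but not on age; p values come from a normal approximation to the t distribution, which is adequate for large cohorts but not for small samples, where a statistics package with exact t quantiles should be used.

```python
import math

import numpy as np


def ols_with_pvalues(y, X):
    """OLS fit returning coefficients, standard errors and two-sided p values.

    X must include an intercept column. P values use a normal approximation
    to the t distribution (adequate for large n only).
    """
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (n - p)  # unbiased residual variance
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))
    # Two-sided p = 2 * (1 - Phi(|t|)) = erfc(|t| / sqrt(2))
    pvalues = [math.erfc(abs(t) / math.sqrt(2.0)) for t in beta / se]
    return beta, se, pvalues


# Hypothetical toy data: the score depends on cancer status but not on age
rng = np.random.default_rng(0)
n = 1000
cancer = rng.integers(0, 2, size=n)
age = rng.normal(65.0, 8.0, size=n)
score = 1.0 * cancer + rng.normal(0.0, 0.5, size=n)
X = np.column_stack([np.ones(n), cancer, age])
beta, se, pvalues = ols_with_pvalues(score, X)
```

In the study's setting, the design matrix would also carry TCR clonotype and UMI counts as columns.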
### Uncovering the cancer signal requires TCR grouping by sequence similarity
To estimate the benefit of the TCR clustering (RFU formation) step for cancer prediction, we applied the same cross-validation procedure to features derived from individual, unclustered TCRs. Starting from the filtered TCRs, we fitted the GLM to the counts of each unique TCR sequence, defined by its V gene, J gene, and CDR3 amino acid sequence. Significantly cancer-associated unique TCR sequences (FDR ≤ 0.1) were used as features in the SVM model. Across CV folds, there were on average only 5.6 significantly cancer-associated TCRs (range: 3-8), all of which were used in the prediction model. The mean CV AUC for this TCR model was only 0.59. The much lower feature count and CV AUC of TCRs relative to RFUs are consistent with the hypothesis that TCR-cancer associations are too weak to be discovered individually and that a stronger signal can be achieved by combining TCRs with similar sequences26.
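The mean CV AUC can be computed per test fold and then averaged. The sketch below uses the rank (Mann-Whitney) formulation of the AUC; whether the study averaged per-fold AUCs in exactly this way is an assumption, and the function names are hypothetical.

```python
def roc_auc(scores, labels):
    """ROC AUC as P(case score > control score), counting ties as 1/2.

    labels: 1 for cancer, 0 for non-cancer. O(n_case * n_control), which is
    fine for a sketch but slow for very large folds.
    """
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for c in cases:
        for k in controls:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(cases) * len(controls))


def mean_cv_auc(folds):
    """Average AUC over CV folds, each given as (test_scores, test_labels)."""
    return sum(roc_auc(s, y) for s, y in folds) / len(folds)
```

An AUC of 0.5 corresponds to random ranking, which puts the 0.59 of the unclustered TCR model in perspective.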
### Lung cancer-associated plasma protein biomarkers
We next sought to assess the potential contribution of this TCR-based signature to the early detection of cancer in the context of established tumor analytes. We first reviewed the literature on protein biomarkers with known or suggested roles in lung cancer detection in plasma. Two large, well-designed case-control studies have recently evaluated multiplexed protein biomarker panels for their association with either imminent lung cancer diagnosis or pulmonary nodule malignancy37,38. A total of 54 distinct protein markers were reported to be possibly associated with either diagnosed lung cancer or malignant nodules in the two studies.
To compare the predictive performance of the published protein biomarkers with that of our TCR signature, we generated circulating protein level data from 235 study subjects, including 109 cancer patients and 126 non-cancer controls. The demographic and tumor property distributions of these subjects closely matched those of the overall cohort (Supplementary Fig. 5 and Fig. 2). We used the Olink Oncology and Inflammation …
## Enhancing Early Cancer Detection: The Power of T Cell Receptor Repertoire Analysis
Liquid biopsies are reshaping cancer management by offering a non-invasive means of diagnosis, monitoring, and treatment guidance. While circulating tumor DNA (ctDNA) and protein biomarkers have shown promise, recent research highlights the potential of incorporating T cell receptor (TCR) repertoire analysis, specifically repertoire functional units (RFUs), to improve early cancer detection. This approach leverages the adaptive immune system's response to cancer, providing a layer of data complementary to conventional biomarkers.
### Identifying Genomic Alterations Through ctDNA Analysis
To put the contribution of TCR RFUs in context, the study also profiled genomic alterations in a cohort of 100 subjects. Using next-generation sequencing (NGS) of both plasma cell-free DNA (cfDNA) and germline DNA (gDNA) samples, the researchers identified 28 mutations after filtering out those also present in the gDNA. Mutation count and average allele frequency were then used as features to train a logistic regression (LR) model that distinguishes individuals with and without detectable ctDNA.
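The two ctDNA features named above can be sketched as follows. Everything here is an illustrative assumption: variant calls are stored as a dict from a made-up variant key to its allele frequency in plasma (cell-free) DNA, any variant also seen in gDNA is dropped, and the logistic regression is a bare-bones batch gradient-descent fit rather than whatever solver the study used.

```python
import math


def ctdna_features(cfdna_calls, gdna_variants):
    """Per-subject (mutation count, mean allele frequency) after gDNA filtering.

    cfdna_calls: dict mapping a variant key to its allele frequency in cfDNA.
    gdna_variants: set of variant keys also found in the matched gDNA.
    """
    somatic = {k: af for k, af in cfdna_calls.items() if k not in gdna_variants}
    if not somatic:
        return (0.0, 0.0)
    return (float(len(somatic)), sum(somatic.values()) / len(somatic))


def _sigmoid(z):
    # Numerically stable logistic function
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(w, x):
    """Predicted probability for feature vector x; w[0] is the bias."""
    return _sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], x)))


def train_logistic(X, y, lr=0.1, epochs=2000):
    """Plain batch gradient descent on the mean logistic loss."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = predict_proba(w, xi) - yi
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj - lr * g / len(X) for wj, g in zip(w, grad)]
    return w
```

In practice the features would be scaled and the model fitted within each cross-validation training fold only.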
As anticipated, the model demonstrated higher sensitivity in detecting cancer in patients with advanced (stage III-IV) disease. This aligns with existing literature demonstrating that ctDNA levels generally correlate with tumor burden. However, sensitivity diminished in earlier stages, a common challenge in ctDNA-based early detection, mirroring findings from previous studies13,16. This highlights the need for more sensitive approaches to identify cancer at its earliest, most treatable stages.
### Common Genetic Drivers and Their Limited Correlation with TCR RFUs
Analysis of the identified mutations revealed that *TP53* was the most frequently mutated gene (15 of 100 subjects), followed by *KRAS* (3 of 100 subjects). These genes are well-established cancer drivers involved in critical cellular processes such as cell cycle control and signal transduction. Interestingly, the TCR RFU-based cancer prediction scores showed no significant association with the presence of mutations in either *TP53* (p = 0.65) or *KRAS* (p = 0.40). This suggests that TCR RFUs capture a different aspect of the tumor-immune interaction, independent of these specific genetic alterations. For example, a tumor without a *KRAS* mutation might still elicit a strong T cell response detectable through TCR analysis.
### Multi-Analyte Liquid Biopsy: TCR RFUs Enhance Sensitivity
To assess the added value of TCR RFUs, the researchers integrated data from TCR analysis, protein biomarkers, and ctDNA mutations. A subset of 85 subjects had data available for all three analytes. The performance of each analyte was evaluated independently using cross-validation, reporting sensitivity at several fixed specificity levels, the operating points typically used to characterize cancer screening tests.
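Sensitivity at a fixed specificity can be sketched as follows, assuming the decision cutoff is placed on the distribution of non-cancer control scores from the same cross-validation test folds. The function name and the tie-handling rule (positive only when strictly above the cutoff) are assumptions, not details given in the text.

```python
import math


def sensitivity_at_specificity(case_scores, control_scores, specificity):
    """Sensitivity at a target specificity, with the cutoff set on controls.

    A subject is called positive when its score is strictly above the cutoff;
    the cutoff is the smallest control score such that at least `specificity`
    of controls score at or below it.
    """
    controls = sorted(control_scores)
    # Round to absorb floating-point error in specificity * n before the ceiling
    k = math.ceil(round(specificity * len(controls), 9))
    cutoff = controls[k - 1] if k > 0 else float("-inf")
    sensitivity = sum(s > cutoff for s in case_scores) / len(case_scores)
    return sensitivity, cutoff


controls = [i / 10 for i in range(1, 11)]
cases = [0.85, 0.9, 0.5, 0.95, 0.3]
print(sensitivity_at_specificity(cases, controls, 0.8))  # (0.6, 0.8)
```

Evaluating the same function at several specificities (for example 0.8, 0.9, 0.95) traces the operating points used to compare single analytes with their combination.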
The results were compelling: incorporating TCR RFU biomarkers alongside ctDNA mutations and protein analysis led to a significant increase in sensitivity for stage I cancer, with improvements reaching approximately 20 percentage points. This suggests that TCR RFUs can help identify cancers that might be missed by current methods alone. Consider a scenario where a patient has a very small tumor shedding minimal ctDNA; the corresponding T cell response, detectable through TCR analysis, could provide an early warning signal. However, the benefit of adding TCR RFUs was less pronounced for later-stage cancers (stage II-IV), likely because existing plasma-based analytes already achieve high performance in detecting advanced disease.
### Delving Deeper: TCR RFU Analysis of Tumor-Infiltrating Lymphocytes
Further investigation focused on characterizing the TCRs within RFUs, specifically examining tumor-infiltrating lymphocytes (TILs). This analysis aims to understand the specific T cell responses targeting cancer cells and potentially identify shared antigens driving these responses.