How Artificial Intelligence and Multiple-Ancestry Polygenic Risk Scores Could Improve Breast Cancer Outcomes for Black Women

Triple-negative breast cancer (TNBC) remains one of the most aggressive forms of breast cancer, and it disproportionately affects Black women in the United States, who face higher rates of TNBC, often at younger ages. Traditional breast cancer risk prediction models often lack the sensitivity and specificity needed for Black women and other populations underrepresented in genetic research, underscoring an urgent need for tailored approaches such as multiple-ancestry polygenic risk scores (MA-PRS). Recent advances in artificial intelligence (AI) are pushing the frontiers of MA-PRS, enabling more precise risk prediction, data handling, and personalized interventions.

A recent study examined how well a multiple-ancestry polygenic risk score (MA-PRS) predicts the risk of triple-negative breast cancer (TNBC) among Black women. Using both clinical and genetic data, the researchers assessed whether the MA-PRS could identify Black women at elevated TNBC risk.

Why Focus on Black Women and TNBC?

Black women in the United States often experience a higher incidence of aggressive forms of breast cancer, particularly TNBC, than White women. This disparity often involves a younger age at onset and a more advanced stage at diagnosis. Currently, most polygenic risk scores (PRS) perform poorly in non-European populations, which limits their usefulness as predictive tools. Improving risk prediction models for Black women, especially for TNBC, is critical for advancing early detection and intervention strategies.

Key Findings and Methodology

The study leveraged data from a cohort of over 17,000 self-identified Black women. It evaluated the accuracy of the MA-PRS in predicting TNBC, specifically targeting the younger subgroup of the cohort, where the need for early risk assessment is particularly pressing. The results indicated that integrating MA-PRS into existing risk assessments significantly enhanced the prediction of TNBC risk, outperforming clinical factors alone. Women in the top 5% of the MA-PRS distribution exhibited approximately twice the risk of developing TNBC compared to those with lower scores.

Dr. Holly Pederson, director of medical breast services at Cleveland Clinic, and Dr. Elisha Hughes, director of research biostatistics at Myriad Genetics, highlighted that this research fills significant gaps in current breast cancer risk assessments. Dr. Pederson emphasized the broad, unmet needs in addressing breast cancer within the Black population, noting that limited genetic testing options often fall short in identifying risk in young Black women predisposed to TNBC. This research utilizes polygenic risk scores in an innovative way, aggregating small genetic variations that, while individually minor, collectively impact cancer risk significantly.

A pivotal aspect of this study involved real-world data showing that over half of women under 40 who develop breast cancer have no first- or second-degree family history of breast or ovarian cancer. The MA-PRS could help to bridge this gap, providing risk information for individuals who otherwise may not be flagged for high-risk screening.

Dr. Hughes noted that the polygenic score used in the study is as effective as other risk factors, such as mammographic density, in predicting TNBC risk. This score could enable clinicians to refine screening guidelines, potentially recommending earlier or more frequent screenings for those at higher risk based on ancestry and genetic profiles rather than age alone.

Distribution of Triple Negative Breast Cancer (TNBC) by Age Group – Demonstrates the age-related prevalence of TNBC, underscoring the importance of early detection.

Understanding Polygenic Risk Scores (PRS) and Their Limitations

To understand MA-PRS, it helps to start with the basics of polygenic risk scores (PRS). Traditional PRS models analyze many small genetic variations across the genome, called single nucleotide polymorphisms (SNPs), each of which contributes a small amount to a person’s disease risk. The score adds up these small contributions into an overall estimate of genetic risk.

However, most PRS have been developed using data primarily from European populations, which means they are less accurate for people of non-European ancestry. This lack of diversity has led to a “genetic gap” in risk prediction, underscoring the need for ancestry-inclusive PRS models.

For Black women, the limitations of European-based PRS can be critical. According to the American Cancer Society, Black women in the U.S. are 40% more likely to die from breast cancer than White women, partly due to the prevalence of aggressive cancers like TNBC. This disparity emphasizes the need for tools like MA-PRS, which factor in multiple ancestries, to improve risk predictions and enable timely, targeted interventions.

MA-PRS represents a critical advancement in risk prediction, offering a nuanced approach that reflects genetic diversity. This method’s ability to factor in ancestral genetic differences provides a more accurate risk assessment across multiple ancestries, which is essential for diseases like breast cancer, where risk factors can vary significantly by ancestry.

The calculation of a multiple-ancestry polygenic risk score (MA-PRS) involves several key steps to combine genetic information across various ancestries into a single score, reflecting the risk of a particular disease, such as breast cancer. Here’s a breakdown of how MA-PRS is typically calculated:

1. Data Collection and Genome-Wide Association Studies (GWAS)

  • First, genome-wide association studies (GWAS) are conducted across diverse populations to identify specific genetic variants, known as single nucleotide polymorphisms (SNPs), associated with the disease of interest.
  • Each SNP identified is analyzed to determine how strongly it correlates with the disease in each population group.
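A real GWAS usually fits a regression model for each SNP, adjusting for covariates such as age and ancestry principal components. As a simpler stand-in, the sketch below tests one SNP separately in two ancestry groups using a 2×2 table of allele counts, computing the allelic odds ratio, its log (the "beta"), and a Wald p-value. The counts, group labels, and function names are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class AlleleCounts:
    """Risk-allele and other-allele counts among cases and controls."""
    case_risk: int
    case_other: int
    control_risk: int
    control_other: int


def allelic_association(c: AlleleCounts) -> dict:
    """Allelic odds ratio with a Wald test on the log-odds scale (Woolf's method)."""
    odds_ratio = (c.case_risk * c.control_other) / (c.case_other * c.control_risk)
    beta = math.log(odds_ratio)  # log odds ratio: the "beta" for a binary trait
    se = math.sqrt(
        1 / c.case_risk + 1 / c.case_other + 1 / c.control_risk + 1 / c.control_other
    )
    z = beta / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
    return {"or": odds_ratio, "beta": beta, "se": se, "p": p_value}


# Hypothetical allele counts for one SNP, tested separately in two ancestry groups.
counts_by_ancestry = {
    "African ancestry": AlleleCounts(600, 400, 500, 500),
    "European ancestry": AlleleCounts(550, 450, 520, 480),
}

for ancestry, counts in counts_by_ancestry.items():
    r = allelic_association(counts)
    print(f"{ancestry}: OR={r['or']:.2f}, beta={r['beta']:.3f}, p={r['p']:.2g}")
```

With these made-up counts the same SNP shows a clear association in one group and a weak one in the other, which is the kind of difference that motivates ancestry-aware scores.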

2. Selecting Relevant SNPs for the Score

  • Not all SNPs contribute significantly to disease risk. So, scientists select SNPs that have been shown to have a meaningful association with the disease across multiple ancestries.
  • The chosen SNPs are often filtered based on significance thresholds, minor allele frequency (to avoid rare variants), and their linkage disequilibrium (to avoid redundancy).
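The three filters above can be sketched as a significance cutoff, a minor-allele-frequency cutoff, and greedy "clumping": SNPs are visited from most to least significant and a SNP is kept only if it is not in strong linkage disequilibrium (LD) with one already kept. This is a minimal sketch; the thresholds are illustrative (5e-8 is the conventional genome-wide significance level), the SNP IDs and r² values are hypothetical, and real pipelines also restrict LD checks to nearby positions.

```python
from dataclasses import dataclass


@dataclass
class SnpStats:
    snp_id: str
    p_value: float
    maf: float  # minor allele frequency


def select_snps(stats, r2, p_threshold=5e-8, min_maf=0.01, max_r2=0.1):
    """Filter by p-value and MAF, then greedily clump by LD.

    r2 maps frozenset({snp1, snp2}) -> squared correlation; pairs that are
    absent are treated as independent (r2 = 0).
    """
    candidates = [s for s in stats if s.p_value < p_threshold and s.maf >= min_maf]
    candidates.sort(key=lambda s: s.p_value)  # most significant first
    kept = []
    for snp in candidates:
        in_ld = any(
            r2.get(frozenset({snp.snp_id, k.snp_id}), 0.0) > max_r2 for k in kept
        )
        if not in_ld:
            kept.append(snp)
    return [s.snp_id for s in kept]


stats = [
    SnpStats("rs_a", 1e-12, 0.30),
    SnpStats("rs_b", 1e-10, 0.25),   # strongly correlated with rs_a
    SnpStats("rs_c", 1e-9, 0.005),   # too rare
    SnpStats("rs_d", 1e-3, 0.40),    # not significant
    SnpStats("rs_e", 2e-9, 0.15),
]
r2 = {frozenset({"rs_a", "rs_b"}): 0.8, frozenset({"rs_a", "rs_e"}): 0.05}
print(select_snps(stats, r2))  # ['rs_a', 'rs_e']
```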

3. Assigning Weights to SNPs

  • For each selected SNP, a weight is calculated based on the SNP’s effect size, that is, the strength of its association with the disease. This weighting typically uses the beta coefficients estimated in the GWAS; for a binary outcome such as breast cancer, these are log odds ratios.
  • When calculating a multi-ancestry PRS, researchers often assign ancestry-specific weights or apply methods that aggregate effect sizes across ancestries, thereby reflecting the SNP’s effect within each population.
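One common way to aggregate a SNP's effect sizes across ancestries is a fixed-effect, inverse-variance weighted meta-analysis: each ancestry's beta is weighted by 1/SE², so more precise estimates count more. This is a sketch of that general technique, not necessarily what the study used; note that it assumes the true effect is the same in every ancestry, an assumption that ancestry-specific weighting is designed to relax. The numbers are hypothetical.

```python
import math


def inverse_variance_meta(betas, ses):
    """Fixed-effect, inverse-variance weighted meta-analysis of one SNP's effect."""
    if len(betas) != len(ses) or not betas:
        raise ValueError("need matching, non-empty lists of betas and standard errors")
    weights = [1.0 / se ** 2 for se in ses]
    pooled_beta = sum(w * b for w, b in zip(weights, betas)) / sum(weights)
    pooled_se = math.sqrt(1.0 / sum(weights))
    return pooled_beta, pooled_se


# Hypothetical log odds ratios and standard errors from two ancestry-specific GWAS.
beta, se = inverse_variance_meta(betas=[0.40, 0.10], ses=[0.10, 0.05])
print(f"pooled beta={beta:.3f}, se={se:.3f}")  # pooled beta=0.160, se=0.045
```

The pooled estimate lands closer to the second study's value because its smaller standard error gives it four times the weight.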

4. Incorporating Ancestry Information

  • To account for the genetic diversity across ancestries, adjustments are made so that SNPs are weighted differently depending on the individual’s ancestry.
  • Advanced statistical methods, like meta-analysis of GWAS results across ancestries or machine learning techniques, can optimize these weights to balance predictive accuracy for each ancestry.
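One way to let ancestry shape the weights, in the spirit of some published multi-ancestry methods, is to compute a score from each ancestry-specific set of GWAS weights and then combine those scores linearly, with mixing coefficients tuned on a validation set from the target population. The sketch below assumes the mixing coefficients are already known; the weight sets, labels, and coefficients are hypothetical.

```python
def score(genotypes, weights):
    """Weighted sum of risk-allele dosages for one set of SNP weights."""
    return sum(g * w for g, w in zip(genotypes, weights))


def multi_ancestry_score(genotypes, weights_by_ancestry, mixing):
    """Linear combination of ancestry-specific scores.

    `mixing` holds one coefficient per ancestry-specific weight set; here they
    are hypothetical, but in practice they would be tuned on a validation set
    drawn from the population the score will be used in.
    """
    return sum(
        mixing[ancestry] * score(genotypes, weights)
        for ancestry, weights in weights_by_ancestry.items()
    )


genotypes = [2, 1, 0]  # risk-allele counts at SNPs A, B, C
weights_by_ancestry = {
    "African-ancestry GWAS": [0.4, 0.6, 0.2],
    "European-ancestry GWAS": [0.1, 0.3, 0.5],
}
mixing = {"African-ancestry GWAS": 0.7, "European-ancestry GWAS": 0.3}
print(round(multi_ancestry_score(genotypes, weights_by_ancestry, mixing), 3))  # 1.13
```

Because the combination is linear, it is equivalent to giving each SNP an effective weight of 0.7 × (African-ancestry weight) + 0.3 × (European-ancestry weight); a different target population would get different mixing coefficients and therefore different effective weights.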

5. Summing the SNP Contributions

  • The score is calculated by summing the products of each SNP’s effect size (weight) and the individual’s genotype at that SNP (0, 1, or 2, representing the number of risk alleles they carry).
  • This calculation yields an overall risk score, where higher scores indicate a greater genetic predisposition to the disease.
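The summation in step 5 is short in code. The sketch below scores individuals from dictionaries keyed by SNP ID and shows one common convention for a missing genotype: replacing it with its expected dosage, twice the risk-allele frequency. The SNP IDs, weights, and frequencies are hypothetical, and other pipelines may handle missing data differently.

```python
def polygenic_score(genotypes, weights, risk_allele_freq):
    """Sum of weight * risk-allele dosage (0, 1, or 2) over the selected SNPs.

    A missing genotype (absent or None) is replaced by its expected dosage,
    2 * risk-allele frequency, so it contributes an average amount.
    """
    total = 0.0
    for snp, weight in weights.items():
        dosage = genotypes.get(snp)
        if dosage is None:
            dosage = 2 * risk_allele_freq[snp]
        total += weight * dosage
    return total


weights = {"A": 0.4, "B": 0.6, "C": 0.2}
risk_allele_freq = {"A": 0.3, "B": 0.5, "C": 0.1}
cohort = {
    "person_1": {"A": 2, "B": 1, "C": 0},
    "person_2": {"A": 0, "B": None, "C": 2},  # genotype at B failed
}
for person, genotypes in cohort.items():
    print(person, round(polygenic_score(genotypes, weights, risk_allele_freq), 2))
```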

6. Standardization and Validation

  • To make the scores comparable across individuals, the scores are often standardized by converting them to z-scores based on the distribution within the study population.
  • Validation of the MA-PRS is essential to ensure its accuracy across different ancestries. This step involves testing the score in independent datasets and comparing its predictive power with clinical data.
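Both parts of step 6 can be sketched briefly. The standardization assumes the reference scores come from a population similar to the one being scored. For validation, the sketch uses the area under the ROC curve (AUC), a common measure of discrimination, computed directly as the probability that a randomly chosen case scores higher than a randomly chosen control, with ties counted as half. Function names and numbers are illustrative, not the study's.

```python
import statistics


def standardize(scores, reference):
    """Convert raw scores to z-scores using a reference distribution."""
    mu = statistics.mean(reference)
    sd = statistics.stdev(reference)
    return [(s - mu) / sd for s in scores]


def auc(case_scores, control_scores):
    """Probability that a random case outranks a random control (ties count half)."""
    wins = 0.0
    for c in case_scores:
        for k in control_scores:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


# Hypothetical standardized scores in an independent validation set.
cases = [0.9, 0.7, 0.4]
controls = [0.5, 0.3, 0.2, 0.4]
print(standardize([3.0], reference=[1.0, 2.0, 3.0, 4.0, 5.0]))  # [0.0]
print(auc(cases, controls))  # 0.875
```

An AUC of 0.5 means the score ranks cases and controls no better than chance, and 1.0 means perfect separation. The pairwise loop is quadratic in sample size, which is fine for a sketch but not for a large cohort.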

7. Combining MA-PRS with Clinical Risk Factors (Optional)

  • In some cases, the MA-PRS is combined with other risk factors, such as age, family history, and lifestyle factors, to improve the model’s overall predictive accuracy. This integrated model is often tested and validated for each ancestry to confirm its reliability.
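A common way to combine a standardized polygenic score with clinical factors is a logistic model in which each factor adds to the log-odds of disease. The coefficients below are entirely hypothetical and chosen only for illustration; a real model would be fitted and validated in data from the target population, and these outputs are not risk estimates for anyone.

```python
import math

# Hypothetical coefficients for illustration only.
COEFFICIENTS = {
    "intercept": -6.0,
    "age_years": 0.03,
    "family_history": 0.7,  # 1 if a close relative had breast cancer, else 0
    "prs_z": 0.4,           # per standard deviation of the standardized MA-PRS
}


def combined_risk(age_years, family_history, prs_z, coef=COEFFICIENTS):
    """Predicted probability from a logistic model with clinical and genetic terms."""
    log_odds = (
        coef["intercept"]
        + coef["age_years"] * age_years
        + coef["family_history"] * family_history
        + coef["prs_z"] * prs_z
    )
    return 1.0 / (1.0 + math.exp(-log_odds))


def odds(p):
    return p / (1.0 - p)


# Same clinical profile, average score versus a score 2 SD above the mean.
p_avg = combined_risk(age_years=40, family_history=0, prs_z=0.0)
p_high = combined_risk(age_years=40, family_history=0, prs_z=2.0)
print(f"odds ratio: {odds(p_high) / odds(p_avg):.2f}")  # exp(0.8) ≈ 2.23
```

In a model like this the polygenic term multiplies the odds by a fixed factor per standard deviation, regardless of the clinical profile it is added to.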

Example of MA-PRS Calculation

If we have selected SNPs, their weights, and an individual’s genotype at these positions, the calculation might look like this:

For an individual with SNPs A, B, and C, we could calculate the MA-PRS as:

$$\text{MA-PRS} = (\text{Genotype at A} \times \text{Weight for A}) + (\text{Genotype at B} \times \text{Weight for B}) + (\text{Genotype at C} \times \text{Weight for C}) + \cdots$$

For instance, if:

  • SNP A has a weight of 0.4, and the individual carries two risk alleles (2),
  • SNP B has a weight of 0.6, and the individual carries one risk allele (1),
  • SNP C has a weight of 0.2, and the individual carries zero risk alleles (0),

Then, their score would be calculated as:

$$\text{MA-PRS} = (2 \times 0.4) + (1 \times 0.6) + (0 \times 0.2) = 0.8 + 0.6 + 0 = 1.4$$

This score then contributes to assessing the individual’s overall risk of disease, which would be interpreted in the context of a population’s risk distribution.
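To place a score such as 1.4 within a population's distribution, one simple approach is an empirical percentile rank against a set of reference scores. The sketch below uses a made-up reference distribution of 20 scores; in practice the reference would be a large sample of individuals of similar ancestry. The 5% cutoff mirrors the top-5% group described in the study findings, and the function names are hypothetical.

```python
import bisect


def percentile_rank(score, reference_scores):
    """Percentage of reference scores strictly below `score`."""
    ordered = sorted(reference_scores)
    below = bisect.bisect_left(ordered, score)
    return 100.0 * below / len(ordered)


def in_top_fraction(score, reference_scores, fraction=0.05):
    """True if the score falls in the top `fraction` of the reference distribution."""
    return percentile_rank(score, reference_scores) >= 100.0 * (1.0 - fraction)


reference = [i / 10 for i in range(20)]  # hypothetical scores 0.0, 0.1, ..., 1.9
print(percentile_rank(1.4, reference))   # 70.0
print(in_top_fraction(1.4, reference))   # False
```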

How AI Enhances MA-PRS Models

One of the most challenging aspects of building an MA-PRS model is selecting the SNPs that contribute significantly to disease risk across multiple ancestries. AI-powered machine learning algorithms can analyze extensive datasets and identify the SNPs that have the strongest association with TNBC in diverse populations. By processing these SNPs’ effects in large-scale datasets, AI can help refine MA-PRS models, ensuring they are not only comprehensive but also ancestry-inclusive.

For instance, random forest models and support vector machines can sift through millions of SNPs, identifying those with predictive power for breast cancer in Black women. These AI techniques can detect which SNPs, though previously overlooked in traditional PRS for European populations, carry significant relevance for other ancestries, ensuring the model is more accurate and equitable.

In MA-PRS, each SNP is assigned a weight based on its effect size, or its strength of association with TNBC. AI algorithms can adjust these weights as new data arrive, refining risk predictions based on a patient’s genetic and clinical profile. Bayesian networks and neural networks, for example, are AI models whose parameters can be updated as new data become available.

An AI model can thus personalize risk stratification, enabling healthcare providers to identify those at the highest risk. Some studies have reported that such AI-enhanced models improve risk prediction accuracy by nearly 30% compared with traditional methods.
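The idea of weights that learn from new data can be illustrated with a much simpler technique than the Bayesian or neural networks mentioned above: online logistic regression, where each new case or control record nudges the SNP weights by one stochastic-gradient step. This is a minimal sketch of that swapped-in technique, with a hypothetical learning rate and made-up records, not a description of any deployed model.

```python
import math


def sgd_update(weights, genotypes, outcome, learning_rate=0.01):
    """One stochastic-gradient step of logistic regression on a single new record.

    weights[0] is the intercept and weights[1:] are per-SNP weights;
    genotypes are risk-allele counts; outcome is 1 for a case, 0 for a control.
    """
    x = [1.0] + list(genotypes)
    log_odds = sum(w * xi for w, xi in zip(weights, x))
    predicted = 1.0 / (1.0 + math.exp(-log_odds))
    error = outcome - predicted  # gradient of the log-likelihood w.r.t. the log-odds
    return [w + learning_rate * error * xi for w, xi in zip(weights, x)]


weights = [0.0, 0.0, 0.0]  # intercept, SNP 1, SNP 2
stream = [([2, 0], 1), ([0, 2], 0)]  # hypothetical (genotypes, outcome) records
for genotypes, outcome in stream:
    weights = sgd_update(weights, genotypes, outcome, learning_rate=0.1)
    print([round(w, 4) for w in weights])
```

After the case carrying two risk alleles at SNP 1, that SNP's weight rises; after the control carrying two at SNP 2, that SNP's weight falls. A real system would also need safeguards such as validation on held-out data before updated weights are used clinically.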

A core benefit of using AI in MA-PRS models is its ability to account for genetic diversity by adjusting SNP weights and risk predictions based on ancestry. AI models, especially those using ensemble techniques, can incorporate ancestry-specific data to dynamically calibrate risk scores. This adjustment is vital for predicting breast cancer risk in Black women, whose genetic backgrounds may differ significantly from the populations traditionally represented in PRS datasets.

For example, AI models trained on diverse datasets could differentiate between the relative risks of TNBC in African, Caribbean, or Afro-Latin populations, allowing more accurate predictions within the broad category of “Black” ancestry. Such models can make MA-PRS more adaptive and accurate, contributing to better-informed healthcare decisions.
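A simple, widely used form of ancestry-aware calibration, far simpler than an ensemble model, is to standardize scores within each ancestry group, so that a score is read relative to people of similar ancestry rather than a single pooled distribution. The sketch below fits a mean and standard deviation per group from reference data and applies them to new scores; the group labels and numbers are hypothetical.

```python
import statistics
from collections import defaultdict


def fit_group_calibration(reference_scores, reference_groups):
    """Per-group mean and standard deviation from reference data (needs 2+ per group)."""
    by_group = defaultdict(list)
    for s, g in zip(reference_scores, reference_groups):
        by_group[g].append(s)
    return {g: (statistics.mean(v), statistics.stdev(v)) for g, v in by_group.items()}


def calibrate(score, group, params):
    """Z-score of `score` relative to its own group's reference distribution."""
    mean, sd = params[group]
    return (score - mean) / sd


params = fit_group_calibration(
    reference_scores=[1.0, 2.0, 3.0, 2.0, 3.0, 4.0],
    reference_groups=["group_A"] * 3 + ["group_B"] * 3,
)
print(calibrate(3.0, "group_A", params))  # 1.0
print(calibrate(3.0, "group_B", params))  # 0.0
```

The same raw score of 3.0 sits one standard deviation above average in one group and exactly at the average in the other, which is why an uncalibrated score can misclassify risk when ancestry groups have different score distributions.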

Takeaways for Clinicians and Patients

The use of AI in calculating MA-PRS models has profound implications for both clinicians and patients, especially those from underrepresented populations:

  • Clinicians can leverage AI-powered MA-PRS to personalize screening recommendations, ensuring that high-risk individuals receive the necessary proactive monitoring.
  • Patients can advocate for genetic testing that uses MA-PRS models, particularly if they have ancestry-related risk factors for breast cancer.

For readers, especially Black women concerned about breast cancer, the takeaways are clear: proactive engagement with healthcare providers, an understanding of genetic testing options, and a commitment to personal health can be powerful tools in managing cancer risk.

AI and the Continued Evolution of MA-PRS

As AI technology continues to evolve, the future of MA-PRS models looks promising. Next-generation AI algorithms that incorporate real-world data (e.g., electronic health records, environmental data) could further personalize breast cancer risk prediction, making MA-PRS an even more powerful tool in personalized healthcare.

For now, the integration of AI into MA-PRS represents a substantial leap forward. By making risk prediction more precise, inclusive, and actionable, AI is transforming society’s approach to breast cancer risk in diverse populations, moving us closer to a world where early detection and prevention are within reach for everyone.


Are you interested in how AI is changing healthcare? Subscribe to our newsletter, “PulsePoint,” for updates, insights, and trends on AI innovations in healthcare.

💻 Stay Informed with PulsePoint!

Enter your email to receive our most-read newsletter, PulsePoint. No fluff, no hype —no spam, just what matters.

We don’t spam! Read our privacy policy for more info.
