When the Test Lies: Ultimate DTA Course (V3)

Have you not heard the story of the woman
who promised to change the world with a drop of blood,
who raised billions on a test that never worked?

Palo Alto, 2003

STANFORD UNIVERSITY

A nineteen-year-old dropped out with a vision: hundreds of blood tests from a single drop.

Investors believed. Walgreens believed. The Pentagon believed.

They valued her company at $9 billion.

But the tests gave wrong results. Patients were told they had HIV when they did not. Patients were told their blood was normal when they were dying.

Carreyrou J. Bad Blood. 2018

The Decision Tree of Deception

What Theranos Did vs. What Should Happen

New Diagnostic Test

↓

SHOULD DO

Validate Against Gold Standard

↓

Publish TP/FP/FN/TN

↓

FDA Approval

THERANOS DID

Skip Validation

↓

Hide Failures

↓

Harm Patients

"And the test lied,
and the lie was dressed in certainty,
and no one asked for the 2×2 table."

This is why we study diagnostic test accuracy.

When a test speaks,
there are only four possible truths.

Two are blessings. Two are curses.

The Outcome Tree

Every Test Result Has a Reality Behind It

Patient Tested

↓

What is the TRUTH?

Has Disease

D+

↓

TP: Test +

FN: Test -

No Disease

D-

↓

FP: Test +

TN: Test -

The Sacred 2×2 Table

HIV Rapid Test Example (Real Data)

	HIV+	HIV-	Total
Test +	98	3	101
Test -	2	895	897
Total	100	898	998

FROM THIS TABLE COMES ALL TRUTH

Sensitivity = 98/100 = 98%
Specificity = 895/898 = 99.7%

"Two outcomes save. Two outcomes harm.
TP, TN: the test spoke true.
FP, FN: the test lied.
Know them by name, for they determine fate."
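The two figures read off the HIV rapid test table above can be reproduced from its four cells. The sketch below is only an illustration; the function name is hypothetical and not part of the course material.

```python
def sensitivity_specificity(tp, fp, fn, tn):
    """Return (sensitivity, specificity) from the four cells of a 2x2 table."""
    sensitivity = tp / (tp + fn)  # of all diseased, the fraction testing positive
    specificity = tn / (tn + fp)  # of all healthy, the fraction testing negative
    return sensitivity, specificity


# Cells taken from the HIV rapid test table above
sens, spec = sensitivity_specificity(tp=98, fp=3, fn=2, tn=895)
print(f"Sensitivity = {sens:.1%}")
print(f"Specificity = {spec:.1%}")
```

Each measure uses one column of the table only, which is why neither depends on how common the disease is.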

Have you not heard of the blood that was tested,
found clean,
and given to thousands of people,
while death swam within it?

The Blood Supply Crisis, 1985

UNITED STATES

When HIV testing began, doctors celebrated: they could now screen the blood supply.

But the test had a window period: the weeks after infection when the virus was present but undetectable.

The blood was tested. The blood was "negative". The blood was transfused.

8,000-12,000 Americans were infected through transfusions before better tests closed the window.

CDC. MMWR. 1987;36(49):833-840

The Window Period Decision Tree

Why False Negatives Are Deadly

Person Recently Infected

↓

Time Since Infection?

< 2 weeks

Test NEGATIVE: Virus present!

↓

Blood Donated: Others infected

> 4 weeks

Test POSITIVE: Correctly detected

↓

Blood Discarded: Supply safe

Sensitivity Changes Over Time

0%

Day 1-7
Eclipse period

~50%

Day 14
Seroconversion

~95%

Day 21
Most detected

99.9%

Day 45+
Window closed

THE LESSON

Sensitivity is not fixed. It depends on when you test. A "99% sensitive" test may be 0% sensitive in early infection.

"And the test said 'clean'
because the virus had not yet shown its face.
And the blood was shared,
and the infection spread to the innocent."

Have you not heard of the pill given to mothers
to protect their pregnancies,
that planted cancer in their daughters
twenty years before it bloomed?

The DES Tragedy, 1938-1971

UNITED STATES & EUROPE

Diethylstilbestrol (DES) was given to millions of pregnant women to prevent miscarriage.

No proper clinical trial was ever conducted. Doctors assumed it worked because it seemed reasonable.

Decades later, their daughters developed a rare cancer: clear cell adenocarcinoma of the vagina. A cancer so rare it was a diagnostic signal in itself.

5-10 million women were exposed. The harm crossed generations.

Herbst AL et al. N Engl J Med. 1971;284:878-881

The Validation Decision Tree

What Should Have Happened

New Medical Intervention

↓

Was it adequately tested?

YES

Randomized Trial

↓

Long-term Follow-up

↓

Know True Effects: Benefits and harms

NO (DES)

Assumption Only

↓

Widespread Use

↓

Hidden Harm: Discovered too late

The Diagnostic Signal

WHEN RARITY BECOMES EVIDENCE

Clear cell adenocarcinoma of the vagina was so rare in young women that 7 cases in one hospital triggered an investigation.

The cluster itself was the diagnostic test:
Positive predictive value for DES exposure: nearly 100%
If you have this cancer at this age, you were almost certainly exposed.

1:1000

Risk of clear cell
cancer in DES daughters

5-10M

Women exposed
worldwide

"And the mothers took the pill in hope,
and the daughters grew up in its shadow,
and twenty years later the cancer bloomed:
a diagnosis that indicted a generation of medicine."

A test has two virtues and two vices.

Sensitivity: Can it find the sick?

Specificity: Can it spare the healthy?

Sensitivity: The Hunter

THE FORMULA

Sensitivity = TP / (TP + FN)

"Of all the sick, how many did we catch?"

Worked Example: COVID PCR Test

Given: 200 infected patients tested

TP = 196 (correctly positive), FN = 4 (missed)

Sensitivity = 196 / (196 + 4) = 196/200 = 98%

Interpretation: Test catches 98 of every 100 infected people

Specificity: The Guardian

THE FORMULA

Specificity = TN / (TN + FP)

"Of all the healthy, how many did we spare?"

Worked Example: Same COVID PCR Test

Given: 1000 uninfected people tested

TN = 999 (correctly negative), FP = 1 (false alarm)

Specificity = 999 / (999 + 1) = 999/1000 = 99.9%

Interpretation: Test correctly clears 999 of every 1000 healthy people

The Memory Rules

When to Use Which Test

What do you need?

RULE OUT disease

Use HIGH SENSITIVITY

↓

SnNout: Sensitive Negative = OUT

RULE IN disease

Use HIGH SPECIFICITY

↓

SpPin: Specific Positive = IN

"Sensitivity catches the sick.
Specificity spares the well.
But no test masters both perfectly.
This is the burden we must bear."

Have you not seen the doctor
who saw 99% accurate
and believed a positive result meant 99% certainty?

This is the deadliest error in medicine.

The Base Rate Fallacy

THE PUZZLE

A disease affects 1 in 1000 people.
A test is 99% sensitive and 99% specific.
A patient tests positive.

What is the probability that they have the disease?

Most doctors say ~99%. The true answer is about 9%.

The Math Revealed

Testing 100,000 People (Prevalence 1/1000)

Step 1: 100 have disease, 99,900 healthy

Step 2: Of 100 sick: 99 test positive (TP), 1 negative (FN)

Step 3: Of 99,900 healthy: 999 test positive (FP), 98,901 negative (TN)

Step 4: Total positives = 99 + 999 = 1,098

PPV = TP / All Positives = 99 / 1,098 = 9%

91% of positive results are FALSE POSITIVES!
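The four steps above can be replayed as natural frequencies. This is a minimal sketch with a hypothetical helper name; the inputs are the figures from the puzzle (prevalence 1/1000, 99% sensitivity, 99% specificity).

```python
def natural_frequencies(population, prevalence, sensitivity, specificity):
    """Split a tested population into TP, FN, FP, TN and compute the PPV."""
    diseased = population * prevalence
    healthy = population - diseased
    tp = diseased * sensitivity   # sick and correctly flagged
    fn = diseased - tp            # sick but missed
    tn = healthy * specificity    # healthy and correctly cleared
    fp = healthy - tn             # healthy but falsely flagged
    ppv = tp / (tp + fp)          # share of positives that are real
    return {"TP": tp, "FN": fn, "FP": fp, "TN": tn, "PPV": ppv}


result = natural_frequencies(100_000, 1 / 1000, 0.99, 0.99)
for name in ("TP", "FN", "FP", "TN"):
    print(f"{name}: {result[name]:,.0f}")
print(f"PPV: {result['PPV']:.1%}")
```

Counting people rather than multiplying probabilities makes it visible that the 999 false positives come from the very large healthy group, not from a poor test.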

Interactive Base Rate Calculator

See How Prevalence Changes PPV

Prevalence:

0.1%

Sensitivity:

99%

Specificity:

99%

9%

Positive Predictive Value (PPV)

91% of positives are false alarms

The Prevalence Decision Tree

Same Test, Different Settings

Test: 99% Sens, 99% Spec

↓

Where Is Testing Done?

General Pop
0.1%

PPV = 9%: 91% false +

High-Risk
10%

PPV = 92%: 8% false +

Confirmatory
50%

PPV = 99%: 1% false +

"And the doctor said '99% accurate'
and the patient heard '99% certain'
and both were deceived,
because they forgot to ask: how rare is this?"
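The three settings in the prevalence tree above follow from Bayes' rule applied to one fixed test. A minimal sketch; the function name is hypothetical, and the prevalences and the 99%/99% test come from the tree.

```python
def ppv(prevalence, sensitivity, specificity):
    """Positive predictive value from prevalence, sensitivity and specificity."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)


settings = [
    ("General population", 0.001),
    ("High-risk", 0.10),
    ("Confirmatory", 0.50),
]
for label, prevalence in settings:
    value = ppv(prevalence, sensitivity=0.99, specificity=0.99)
    print(f"{label}: prevalence {prevalence:.1%}, PPV {value:.0%}")
```

Only the prevalence changes between the three lines of output, yet the meaning of a positive result moves from "probably a false alarm" to "almost certainly real".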

Have you not heard of the machine
that could find TB in two hours,
that was called revolutionary,
but missed drug-resistant strains?

The GeneXpert Story, South Africa

CAPE TOWN, 2010

For a century, diagnosing tuberculosis meant growing bacteria for weeks. Then came GeneXpert: results in 2 hours.

South Africa deployed it nationwide. The WHO endorsed it.

But in patients with low bacterial loads, often HIV co-infected, sensitivity dropped to 67%. One in three cases missed.

And for detecting rifampicin resistance, it missed 5% of resistant cases. Those patients received the wrong treatment. Resistant TB spread.

Steingart KR et al. Cochrane Database Syst Rev. 2014;1:CD009593

TB Diagnosis Decision Tree

When GeneXpert Is Not Enough

Suspected TB Patient

↓

GeneXpert Test

↓

Positive

↓

Rifampicin?

Sensitive: Standard Tx

Resistant: MDR-TB Tx

Negative

↓

HIV+ or High Suspicion?

Yes: Culture needed

No: Likely negative

Sensitivity by Patient Type

98%

Smear-positive
(high bacterial load)

67%

Smear-negative
(low bacterial load)

61%

HIV co-infected
(immune suppressed)

THE LESSON

A test's sensitivity in clinical studies may not match its sensitivity in your patients. Know your population.

"And the machine said 'negative,'
and the doctor believed the machine,
and the patient went home with tuberculosis in his lungs,
coughing resistance into the world."

Have you not heard of the test for men
that found cancers that would never kill,
and led to treatments that destroyed lives?

The PSA Screening Tragedy

UNITED STATES, 1990s-2010s

PSA (Prostate-Specific Antigen) could detect prostate cancer early.

Doctors screened millions of men. Cancers were found. Prostates were removed.

But many of these "cancers" would never have caused symptoms. Surgery caused impotence and incontinence in men who would have died of old age, not cancer.

Moyer VA. Ann Intern Med. 2012;157:120-134

The Numbers of Harm

1

Life saved from
prostate cancer
per 1000 screened

30-40

Men made impotent
or incontinent
per 1000 screened

100+

False positives
(biopsies, anxiety)
per 1000 screened

THE REVERSAL

In 2012, the US Preventive Services Task Force recommended against routine PSA screening. The test was finding too many things that did not need to be found.

Patient Decision Aid: PSA Screening

If 1,000 men aged 55 to 69 are screened for 13 years

Deaths from prostate cancer prevented

1-2 men

Men who will have false positive requiring biopsy

100-120 men

Men diagnosed with a cancer that would never harm them

20-50 men

Men left impotent or incontinent from treatment

30-40 men

Is this trade-off acceptable to you?

"And the test found the shadow,
and the surgeon cut,
and the man lived on, impotent, incontinent,
saved from a cancer that would never have awakened."

Have you not heard of the man with chest pain
whose first troponin was normal,
who was sent home,
and died before morning?

The Troponin Timing Problem

EMERGENCY DEPARTMENTS WORLDWIDE

Troponin is the gold standard for diagnosing myocardial infarction. But it takes 3-6 hours to rise after myocardial injury.

A patient arrives one hour after chest pain begins. Troponin is tested: normal. "You're fine. Go home."

The heart was dying. The protein had not yet leaked out.

Studies show 2-5% of MI patients sent home from ED die within 30 days.

Pope JH et al. N Engl J Med. 2000;342:1163-1170

Serial Testing Decision Tree

The Two-Troponin Protocol

Chest Pain Patient

↓

First Troponin

↓

Elevated

↓

Treat as MI

Normal

↓

When Did Pain Start?

<6 hrs

Wait 3 hrs: Repeat troponin

>6 hrs

Low risk: Consider d/c

High-Sensitivity Troponin

~70%

Conventional troponin
sensitivity at 0 hrs

~95%

hs-Troponin
sensitivity at 0 hrs

99%

hs-Troponin
at 3 hrs serial

THE TRADE-OFF

High-sensitivity troponin catches more heart attacks early. But it also produces more false positives: it is elevated in kidney disease, heart failure, sepsis, and marathon runners.

"And the test said 'normal'
because the heart had only just begun to die.
And the patient was reassured,
and went home to finish dying."

Sensitivity describes the test.
Specificity describes the test.

But the patient asks:
"I tested positive. What are MY chances?"

Likelihood Ratios

POSITIVE LIKELIHOOD RATIO

LR+ = Sensitivity / (1 - Specificity)

How much more likely is a + result in sick vs healthy?

NEGATIVE LIKELIHOOD RATIO

LR- = (1 - Sensitivity) / Specificity

How much more likely is a - result in sick vs healthy?
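The two formulas above translate directly into code. The sketch below uses the COVID PCR worked example from earlier in the course (98% sensitivity, 99.9% specificity) as input; the function name is hypothetical.

```python
def likelihood_ratios(sensitivity, specificity):
    """Return (LR+, LR-). Undefined when specificity is exactly 0 or 1."""
    lr_pos = sensitivity / (1 - specificity)  # positive result: sick vs healthy
    lr_neg = (1 - sensitivity) / specificity  # negative result: sick vs healthy
    return lr_pos, lr_neg


lr_pos, lr_neg = likelihood_ratios(sensitivity=0.98, specificity=0.999)
print(f"LR+ = {lr_pos:.0f}")
print(f"LR- = {lr_neg:.3f}")
```

Unlike predictive values, both ratios are built from sensitivity and specificity alone, so they can be carried from one prevalence setting to another.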

The Fagan Nomogram

From Pre-Test to Post-Test Probability

[Figure: Fagan nomogram. Left axis: pre-test probability (1% to 99%). Middle axis: likelihood ratio (0.01 to 100). Right axis: post-test probability (1% to 99%).]

Draw a line from pre-test through LR to find post-test probability
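The line drawn on the nomogram is a graphical shortcut for an odds calculation: convert the pre-test probability to odds, multiply by the likelihood ratio, and convert back. A minimal sketch with a hypothetical function name; the example reuses the base-rate puzzle (pre-test probability 1/1000, LR+ = 0.99 / 0.01 = 99).

```python
def post_test_probability(pre_test_probability, likelihood_ratio):
    """Update a probability with a likelihood ratio by working in odds."""
    pre_odds = pre_test_probability / (1 - pre_test_probability)
    post_odds = pre_odds * likelihood_ratio
    return post_odds / (1 + post_odds)


after_positive = post_test_probability(0.001, 99)
print(f"Post-test probability after a positive: {after_positive:.1%}")
```

The result agrees with the PPV from the natural-frequency count (99 / 1,098), which is a useful check that the two methods describe the same update.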

Interpreting Likelihood Ratios

How powerful is this test?

LR+ Value?

LR+ > 10: Strong rule-in

5-10: Moderate

2-5: Weak

1-2: Useless

LR- Value?

< 0.1: Strong rule-out

0.1-0.2: Moderate

0.2-0.5: Weak

0.5-1: Useless

"Sensitivity tells of the sick.
Specificity tells of the well.
But the likelihood ratio answers:
What does this result mean for THIS patient?"

Have you not seen the child with fever in the village,
the rapid test that said negative,
and the Plasmodium that kept multiplying?

The Malaria RDT Problem

SUB-SAHARAN AFRICA

Malaria kills 600,000 people yearly, mostly children under 5.

Rapid Diagnostic Tests were meant to guide treatment in remote areas without microscopes or laboratories.

But when parasitemia is low, the RDT misses cases. And when P. falciparum deletes the HRP2 gene, the RDT sees nothing at all.

WHO. Malaria RDT Performance. 2022

The Clinical Decision Tree

Child with Fever in Malaria-Endemic Area

Febrile Child

↓

Perform RDT

↓

RDT Positive

↓

Treat for malaria

RDT Negative

↓

Clinical Suspicion?

High

Treat Anyway or Microscopy

Low

Look for Other Cause

Sensitivity Varies by Parasitemia

95%

High parasitemia
(>200/μL)

75%

Low parasitemia
(100-200/μL)

50%

Very low
(<100/μL)

THE CLINICAL LESSON

A negative RDT does not rule out malaria in endemic areas. Clinical judgment must override the test when suspicion is high.

"And the test said 'negative'
and the child was sent home,
and the parasites multiplied in the dark,
and in the morning the child could no longer wake."

In the year of the plague,
the world needed a test that was fast.

But fast is not the same as accurate.

The Cochrane Verdict

COVID-19 Rapid Antigen Tests (155 Studies)

Population	Sensitivity	Missed
Symptomatic	73%	27%
Asymptomatic	55%	45%
First 7 days	80%	20%

Dinnes J et al. Cochrane Database Syst Rev. 2022;7:CD013705

The False Security Decision Tree

Thanksgiving 2020: What Happened

Family Member Tests Negative

↓

Truly Negative?

Not infected

True Negative: Safe to gather

Infected (the test misses 45% of asymptomatic infections)

FALSE Negative: Infectious!

↓

Gathers with Family: Grandparents infected

"And the test said 'negative'
and the family embraced,
and by the end of winter,
the grandfather was buried."
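The 45% in the Cochrane table is the share of asymptomatic infections the test misses. The chance that one particular negative result is false is a different number, and it also depends on how likely infection was before testing. The sketch below is illustrative only: the 99.5% specificity and the two pre-test probabilities are assumptions, not figures from the table.

```python
def prob_infected_given_negative(pre_test_probability, sensitivity, specificity):
    """Probability of infection despite a negative result (1 - NPV)."""
    false_neg = (1 - sensitivity) * pre_test_probability
    true_neg = specificity * (1 - pre_test_probability)
    return false_neg / (false_neg + true_neg)


SENSITIVITY_ASYMPTOMATIC = 0.55  # from the Cochrane table above
ASSUMED_SPECIFICITY = 0.995      # hypothetical value, not from the source

for pre_test in (0.01, 0.10):    # illustrative pre-test probabilities
    risk = prob_infected_given_negative(
        pre_test, SENSITIVITY_ASYMPTOMATIC, ASSUMED_SPECIFICITY
    )
    print(f"Pre-test {pre_test:.0%}: {risk:.1%} still infected after a negative")
```

Under these assumptions a negative result lowers the probability of infection by only about half, so the reassurance it offers shrinks as community transmission rises.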

Have you not heard of the screening
that found cancers that would never kill,
and led to treatments that caused more harm than the disease?

The Overdiagnosis Problem

3-4

Lives saved
per 10,000 screened

~15

Overdiagnosed
(treated unnecessarily)

~500

False alarms
(anxiety, biopsies)

THE QUESTION

To save 3-4 lives, about 15 women undergo surgery, radiotherapy, and chemotherapy for cancers that would never have harmed them.

Is this trade-off worth it?

Patient Decision Aid: Mammography

If 10,000 women aged 50 to 69 are screened for 10 years

Deaths from breast cancer prevented

3-4 women

Women called back for false alarms

~500 women

Unnecessary biopsies

~200 women

Women treated for a cancer that would never harm them

~15 women

Is screening right for you?

The Screening Cascade Decision Tree

10,000 Women Screened Over 10 Years

10,000 Women

↓

~1,000 Recalled: Abnormal

↓

~500 False
Alarm

~500 Biopsy
~50 cancer

~9,000 Cleared

Of ~50 Cancers Found

~35 Would Kill: 3-4 saved

~15 Would Never Kill: Overdiagnosed

"And the test found the shadow,
and they called it cancer,
and the woman was cut and burned
for a shadow that would never have darkened her days."

Have you not heard of the scan
that finds the plaques in the brain,
but cannot tell you
whether the mind will fade?

The Amyloid Paradox

ALZHEIMER'S RESEARCH, 2010s-2020s

PET scans can now detect amyloid plaques—the hallmark of Alzheimer's.

But 30% of cognitively normal elderly have amyloid plaques. They may never develop dementia.

And 10-20% of people with dementia have no amyloid.

The test detects plaques, but plaques are not the disease. We are testing a surrogate, not the outcome.

Jack CR et al. Lancet Neurol. 2018;17:760-773

Surrogate vs. Outcome Decision Tree

What are we really testing?

Diagnostic Test

↓

What Does It Detect?

Outcome itself

Direct Diagnosis: e.g., biopsy for cancer

↓

High clinical value

Surrogate marker

Indirect Signal: e.g., amyloid for dementia

↓

Validated link?

Yes: Use cautiously

No: Limited value

"And the scan found the plaques,
and the doctor called it Alzheimer's,
and the patient lived in terror
of a forgetting that might never come."

Not all studies are equal.

Some are biased.
Some are poorly designed.
Some should not be trusted.

How do we separate the wheat from the chaff?

QUADAS-2: The Quality Checklist

Four Domains of Risk of Bias

1

Patient Selection

Was a consecutive or random sample enrolled? Was a case-control design avoided?

2

Index Test

Was the index test interpreted without knowledge of the reference standard? Was the threshold pre-specified?

3

Reference Standard

Is the reference standard likely to classify the condition correctly? Was it interpreted blind to the index test result?

4

Flow and Timing

Was there an appropriate interval between the tests? Did all patients receive the same reference standard?

QUADAS-2 Decision Tree

Should you trust this study?

DTA Study

↓

Check All 4 Domains

All Low Risk

High Quality: Trust results

Some Unclear

Moderate: Use with caution

Any High Risk

Low Quality: Results may be biased

Common Biases in DTA Studies

!

Verification Bias

Only positive tests get the reference standard → inflates sensitivity

!

Spectrum Bias

Study population differs from clinical reality → results do not generalize

!

Incorporation Bias

Index test is part of reference standard → artificially high accuracy

!

Review Bias

Index test interpreted knowing reference result → inflates both metrics

"Before trusting the numbers,
ask: How were they gathered?
A biased study speaks with confidence,
but its confidence is a lie."

One study can deceive.
One study can flatter.

But when you gather all the evidence,
the truth becomes harder to hide.

Why DTA Meta-Analysis Is Different

THE PROBLEM

Sensitivity and specificity are correlated. When one goes up, the other tends to go down.

You cannot pool them separately like treatment effects. You need a bivariate model.

The SROC Curve

Summary Receiver Operating Characteristic

[Figure: SROC plot. Vertical axis: Sensitivity. Horizontal axis: 1 - Specificity (False Positive Rate). Markers: individual studies and the summary estimate.]

Reading the SROC

What does the curve tell you?

SROC Curve Position

↓

Top-Left Corner

Excellent Test: High sens + spec

Near Diagonal

Useless Test: No better than chance

Points Scattered

High Heterogeneity: Investigate sources

"One study can deceive.
Many studies, weighed together,
trace the path of truth:
the SROC curve that reveals what the test can truly do."

But what if the studies disagree?

One says sensitivity is 95%.
Another says 60%.

Which truth do you believe?

Sources of Heterogeneity

Why Studies Disagree

Same test, different results?

Threshold: Different cutoffs

Population: Severity, age

Setting: Primary vs specialist

Quality: Bias, blinding

Measuring Disagreement: I²

I² < 25%

Low
Studies agree

I² 25-75%

Moderate
Some variation

I² > 75%

High
Major disagreement

THE WARNING

When I² > 75%, the pooled estimate may be meaningless. Explain the disagreement before averaging.

"When studies disagree,
do not silence the dissent.
Ask: Why do they see differently?
The disagreement itself teaches."
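The I² bands above can be connected to the usual definition of the statistic, I² = max(0, (Q - df) / Q), where Q is Cochran's heterogeneity statistic and df is the number of studies minus one. The sketch below assumes Q has already been computed elsewhere; the Q value, the study count, and both function names are hypothetical, while the band cut-offs are the ones listed above.

```python
def i_squared(q_statistic, num_studies):
    """I-squared as a percentage, from Cochran's Q and the number of studies."""
    df = num_studies - 1
    if q_statistic <= 0:
        return 0.0
    return max(0.0, (q_statistic - df) / q_statistic) * 100


def heterogeneity_band(i2_percent):
    """Map I-squared onto the Low / Moderate / High bands used above."""
    if i2_percent < 25:
        return "Low"
    if i2_percent <= 75:
        return "Moderate"
    return "High"


i2 = i_squared(q_statistic=50.0, num_studies=11)  # hypothetical meta-analysis
print(f"I² = {i2:.0f}% ({heterogeneity_band(i2)})")
```

A band is a prompt for investigation rather than a verdict: a high value says the studies disagree, not why.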

Your DTA Toolkit

The essential measures and when to use them

The Checklist

✓

Was there a valid reference standard?

Gold standard applied to ALL patients?

✓

Were the readers blinded?

Test readers unaware of diagnosis?

✓

Was the spectrum appropriate?

Patients similar to your population?

✓

Was the threshold pre-specified?

Or chosen to maximize results?

When Results Don't Match Suspicion

The Clinical Override Decision Tree

Test Negative, High Suspicion

↓

What Is the LR-?

LR- < 0.1

Strong rule-out: Accept negative

LR- 0.1-0.5

Repeat test: Or different test

LR- > 0.5

Trust judgment: Test is weak

Sequential Testing Decision Tree

When One Test Isn't Enough

Initial Screening Test

↓

Positive

↓

Confirmatory Test: High specificity

↓

Positive: Diagnose

Negative: False alarm

Negative

↓

Likely negative: If high sens screen

"Armed with sensitivity, specificity, likelihood,
armed with the SROC and the measure of agreement,
you can see past the test's lie
and judge its truth for yourself."
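The sequential testing tree earlier in this toolkit works because the post-test probability after the screening test becomes the pre-test probability for the confirmatory test. The sketch below rests on two assumptions that the course does not state: the two tests err independently given true disease status, and the confirmatory test is 99% sensitive and 99% specific. The function name is hypothetical.

```python
def update(pre_test_probability, sensitivity, specificity, positive=True):
    """Post-test probability after one test result, via Bayes' rule."""
    p = pre_test_probability
    if positive:
        with_disease = sensitivity * p
        without_disease = (1 - specificity) * (1 - p)
    else:
        with_disease = (1 - sensitivity) * p
        without_disease = specificity * (1 - p)
    return with_disease / (with_disease + without_disease)


# Screening in a population with prevalence 1/1000 (the base-rate puzzle)
after_screen = update(0.001, 0.99, 0.99, positive=True)
# Confirmatory test on the screen-positives (assumed independent, 99%/99%)
after_confirm = update(after_screen, 0.99, 0.99, positive=True)

print(f"After positive screen:       {after_screen:.1%}")
print(f"After positive confirmation: {after_confirm:.1%}")
```

If the two tests share a failure mode, for example the same cross-reacting antibody, the independence assumption breaks and the second positive adds less certainty than this calculation suggests.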

Have you not heard of the patient
who received the wrong blood,
not because the test was wrong,
but because no one performed it?

The Test That Was Not Done

HOSPITALS WORLDWIDE

ABO blood typing is nearly 100% accurate when performed.

Yet transfusion reactions still kill, not because the test failed, but because of human failure:

• Wrong blood drawn from wrong patient
• Labels swapped in the lab
• Bedside check skipped in emergency

In the UK, 1 in 13,000 transfusions went to the wrong patient. The test worked. The system failed.

Bolton-Maggs PHB. Transfus Med. 2016;26:303-311

Test vs. System Decision Tree

Where Can Things Go Wrong?

Diagnostic Process

↓

Error Source?

Test itself

Analytical Error: Sens/Spec issue

↓

Better test needed

Pre-analytical

Wrong sample: ID error

↓

System fix needed

Post-analytical

Wrong action: Reporting error

↓

Process fix needed

"The perfect test means nothing
if the wrong blood is drawn,
the wrong label is applied,
the wrong bag is hung."

DTA studies measure the accuracy of the test. They do not measure the accuracy of the system.

Have you not seen the algorithm
that learned from biased data,
and spread that bias
to every patient it touched?

The AI Diagnostic Revolution

STANFORD & BEYOND, 2017-PRESENT

Deep learning algorithms now match dermatologists at detecting skin cancer.

But the training data was predominantly light skin. On dark skin, performance dropped significantly.

The algorithm learned the patterns, but also the biases.

And when deployed without external validation, it performed worse than expected because the training population didn't match the clinical population.

Esteva A et al. Nature. 2017;542:115-118; Adamson AS. JAMA Dermatol. 2018

AI Validation Decision Tree

Is this AI ready for clinical use?

AI Diagnostic Tool

↓

Validation Type?

Internal only

High Risk: Overfitting likely

↓

Not ready

External validation

Better: But check population

↓

Does it match your patients?

Yes: Consider use

No: Caution

Prospective RCT

Gold Standard: Patient outcomes

AI Calibration: The Hidden Problem

DISCRIMINATION VS. CALIBRATION

Discrimination (AUC/ROC): Can the AI rank patients by risk?

Calibration: When the AI says "80% risk," do 80% actually have disease?

Many AI tools have good AUC but poor calibration. This is the base rate fallacy in algorithmic form.
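The difference between the two properties shows up even on a toy example. In the sketch below, the four patients and their predicted risks are invented for illustration, and the calibration check is only the crudest one (mean predicted risk against observed event rate); both function names are hypothetical.

```python
def auc(scores, labels):
    """Probability that a random diseased case outranks a random healthy one."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def calibration_gap(scores, labels):
    """Mean predicted risk minus observed event rate (0 would be ideal)."""
    return sum(scores) / len(scores) - sum(labels) / len(labels)


# Invented toy data: the ranking is perfect, but every risk is far too high
labels = [0, 0, 0, 1]
scores = [0.6, 0.7, 0.8, 0.9]

print(f"AUC = {auc(scores, labels):.2f}")
print(f"Mean predicted risk exceeds observed rate by {calibration_gap(scores, labels):.2f}")
```

The model ranks the one diseased patient highest, so discrimination is perfect, yet it tells three healthy patients their risk is 60% or more. A full assessment would compare predicted and observed risk across the whole range, not just on average.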

AUC

Can it rank?
(usually reported)

CAL

Is probability accurate?
(often ignored)

"And the algorithm learned from the data,
and the data were biased,
and the bias spread to every prediction,
and no one asked: who was missing from the training set?"

The patient asks: "Is my test positive?"

But what they mean is:
"Do I have the disease?"

How do we bridge this gap?

Communication Scripts

SCRIPT 1: EXPLAINING A POSITIVE RESULT

"Your test came back positive. But I want to explain what that means."

"This test is good at finding people with this condition, but it also gives false alarms."

"Based on your risk factors, there is about a [X]% chance that this is a true positive."

"We'll do a confirmatory test to be certain before any treatment."

Communication Scripts

SCRIPT 2: EXPLAINING A NEGATIVE RESULT (HIGH SUSPICION)

"Your test came back negative, but I'm still concerned."

"This test can miss cases, especially early in the disease."

"Given your symptoms, I'd like to repeat the test in a few days or try a different test."

"A negative test doesn't always mean you're clear. Your symptoms matter."

Communication Decision Tree

How to Explain Test Results

Test Result

↓

Positive

↓

PPV?

>90%: "Very likely true"

<90%: "Need to confirm"

Negative

↓

NPV?

>95%: "Very reassuring"

<95%: "Still watch symptoms"

Questions to Ask Your Doctor

1

"How accurate is this test?"

Ask about sensitivity and specificity in plain language

2

"What if the result is wrong?"

Understand the consequences of false positives and false negatives

3

"What happens next?"

Will there be a confirmatory test? Repeat test? Treatment?

4

"What if I don't get tested at all?"

Understand the trade-off between testing and not testing

"The test speaks in numbers.
The patient hears fears and hopes.
The healer's task is to translate:
to bridge the gap between statistics and soul."

A test may be accurate.
But is it worth it?

What does it cost—in money,
in anxiety, in harm?

The Test-Treatment Threshold

When Is Testing Worthwhile?

Pre-Test Probability

↓

Very Low

Below Test Threshold: Don't test, reassure

Intermediate

Testing Zone: Test will change management

Very High

Above Treat Threshold: Don't test, treat

THE PRINCIPLE

Test only when the result will change what you do. If you'd treat regardless, or not treat regardless, why test?
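One way to make this principle concrete is to ask whether a positive result would push the probability of disease above the point at which you would treat, while a negative result would leave it below. The sketch below is an illustration under stated assumptions: the 20% treatment threshold is hypothetical, the likelihood ratios are those of a 99% sensitive, 99% specific test, and both function names are invented.

```python
def post_test_probability(pre_test_probability, likelihood_ratio):
    """Update a probability with a likelihood ratio by working in odds."""
    pre_odds = pre_test_probability / (1 - pre_test_probability)
    post_odds = pre_odds * likelihood_ratio
    return post_odds / (1 + post_odds)


def result_can_change_management(pre_test_probability, lr_pos, lr_neg, treat_threshold):
    """True only if a positive would lead to treatment and a negative would not."""
    after_positive = post_test_probability(pre_test_probability, lr_pos)
    after_negative = post_test_probability(pre_test_probability, lr_neg)
    return after_negative < treat_threshold <= after_positive


LR_POS, LR_NEG = 99.0, 0.01 / 0.99  # 99% sensitive, 99% specific test
TREAT_THRESHOLD = 0.20              # hypothetical: treat at 20% probability or more

for pre_test in (0.001, 0.10, 0.98):
    worthwhile = result_can_change_management(pre_test, LR_POS, LR_NEG, TREAT_THRESHOLD)
    print(f"Pre-test {pre_test:.1%}: can the result change management? {worthwhile}")
```

At a very low pre-test probability even a positive stays below the threshold, and at a very high one even a negative stays above it; only the middle case falls in the testing zone.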

GRADE Quality of Evidence

Rating DTA Evidence

⊕⊕⊕⊕

HIGH

Multiple high-quality studies, consistent results, directly applicable

⊕⊕⊕○

MODERATE

Some limitations in study quality, consistency, or applicability

⊕⊕○○

LOW

Serious limitations—may need to downgrade recommendations

⊕○○○

VERY LOW

Very serious limitations—evidence uncertain

Cost-Consequence Analysis

Example: Universal vs. Targeted Screening

Cost per case detected (universal)

$50,000

Cost per case detected (high-risk only)

$5,000

Cases missed by targeted approach

~10%

False positives avoided by targeted

~90%

Which approach is right for your population?

"A test is not just accurate or inaccurate.
It has costs: in money, in worry, in harm.
The wise physician weighs all of these,
and tests only when testing serves the patient."

The SROC curve shows where the test performs.

But how certain are we?
And how much will it vary in practice?

Confidence vs. Prediction Regions

Two Types of Uncertainty

95% CI (summary estimate)

95% prediction (future studies)

What Each Region Tells You

CI

Confidence Region (smaller ellipse)

Where we are 95% confident that the true average sensitivity/specificity lies. Uncertainty about the summary estimate.

PI

Prediction Region (larger ellipse)

Where we expect 95% of future studies to fall. Accounts for between-study heterogeneity.

CLINICAL IMPLICATION

If the prediction region is wide, the test may perform very differently in your setting than the average suggests. Wide prediction = high heterogeneity = investigate sources.

Bivariate Model Interpretation

Reading the Meta-Analysis Results

Summary Sens/Spec

↓

Check Regions

CI narrow, PI narrow

Consistent: Trust the average

CI narrow, PI wide

Heterogeneous: The average may not apply

CI wide

Uncertain: More studies needed

"The confidence region tells you: How sure are we?
The prediction region tells you: How much will it vary?
Both questions matter,
for the test you use tomorrow may not be the average."

References

Key Sources

Carreyrou J. Bad Blood. Knopf, 2018. [Theranos]
CDC. MMWR. 1987;36(49):833-840. [HIV blood supply]
Herbst AL et al. N Engl J Med. 1971;284:878-881. [DES]
Moyer VA. Ann Intern Med. 2012;157:120-134. [PSA]
Pope JH et al. N Engl J Med. 2000;342:1163-1170. [Troponin]
Steingart KR et al. Cochrane Database Syst Rev. 2014;1:CD009593. [GeneXpert]
Dinnes J et al. Cochrane Database Syst Rev. 2022;7:CD013705. [COVID RAT]
UK Panel. Lancet. 2012;380:1778-1786. [Mammography]
Jack CR et al. Lancet Neurol. 2018;17:760-773. [Amyloid]
WHO. Malaria RDT Performance. 2022.
Reitsma JB et al. J Clin Epidemiol. 2005;58:982-990. [Bivariate]
Whiting PF et al. Ann Intern Med. 2011;155:529-536. [QUADAS-2]
Bolton-Maggs PHB. Transfus Med. 2016;26:303-311.

A test is 99% sensitive and 99% specific. Disease prevalence is 1/1000. A patient tests positive. What is the probability that they have the disease?

99%

90%

About 9%

50%

What does "SnNout" mean?

A highly Sensitive test, when Negative, rules OUT disease

A highly Specific test, when Negative, rules OUT disease

Sensitivity should be used for screening

Specificity should be above 90%

Why was the blood supply contaminated with HIV despite testing?

The tests had low specificity

Tests had a window period with zero sensitivity in early infection

The tests were not performed correctly

The tests were too expensive

Which QUADAS-2 domain assesses whether the test was interpreted without knowledge of the diagnosis?

Patient Selection

Index Test

Reference Standard

Flow and Timing

✔

Course Complete

"Now you know the four outcomes,
the two virtues of a test,
the base rate fallacy,
the art of pooling evidence,
and the biases that hide the truth.

When the next test lies to you:
you will know."