The Time of Testing: The Ultimate DTA Course (V3)

Have you not heard of the woman
who promised to change the world with a single drop of blood,
who raised billions on a test that never worked?

Palo Alto, 2003

STANFORD UNIVERSITY

At 19, she dropped out with a vision: hundreds of blood tests from a single drop.

Investors believed. Walgreens believed. The Pentagon believed.

They valued her company at $9 billion.

But the tests gave wrong results. Patients were told they had HIV when they did not. Patients were told their blood was normal when they were dying.

Carreyrou J. Bad Blood. 2018

The Decision Tree of Deception

What Theranos Did vs. What Should Happen

New Diagnostic Test

↓

SHOULD DO

Validate Against Gold Standard

↓

Publish TP/FP/FN/TN

↓

FDA Approval

THERANOS DID

Skip Validation

↓

Hide Failures

↓

Harm Patients

"And the test lied,
and the lie wore the cloak of certainty,
and no one demanded the 2×2 table."

This is why we study diagnostic test accuracy.

When a test speaks,
there are only four possible truths.

Two are blessings. Two are curses.

The Tree of Outcomes

Every Test Result Has a Reality Behind It

Patient Tested

↓

What Is the Truth?

Has Disease

D+

↓

TP: Test +

FN: Test -

No Disease

D-

↓

FP: Test +

TN: Test -

The Sacred 2x2 Table

HIV Rapid Test Example (Real Data)

	HIV+	HIV-	Total
Test +	98	3	101
Test -	2	895	897
Total	100	898	998

All truths flow from this table

Sensitivity = 98/100 = 98%
Specificity = 895/898 = 99.7%
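The two figures above can be reproduced directly from the four cells of the table. This is a minimal sketch; the function and variable names are illustrative and do not come from the course material.

```python
def sensitivity(tp: int, fn: int) -> float:
    """Share of diseased patients the test catches: TP / (TP + FN)."""
    return tp / (tp + fn)


def specificity(tn: int, fp: int) -> float:
    """Share of healthy patients the test clears: TN / (TN + FP)."""
    return tn / (tn + fp)


# Cell counts taken from the HIV rapid test table above.
TP, FP, FN, TN = 98, 3, 2, 895

sens = sensitivity(TP, FN)
spec = specificity(TN, FP)

print(f"Sensitivity = {sens:.1%}")
print(f"Specificity = {spec:.1%}")
```

Keeping the four counts as the only inputs mirrors the point of the slide: every accuracy measure is derived from the same 2x2 table.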

"Two outcomes save. Two outcomes harm.
TP, TN: the test told the truth.
FP, FN: the test lied.
Know them by name, for they determine fate."

Have you not heard of the blood that was tested,
found clean,
and given to thousands,
while death swam within it?

The Blood Supply Crisis, 1985

UNITED STATES

When HIV testing began, doctors celebrated: they could now screen the blood supply.

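But the test had a window period: the weeks after infection when the virus was present but undetectable.

Blood was tested. The result was "negative." It was transfused.

8,000-12,000 Americans were infected through transfusion before better tests became available.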

CDC. MMWR. 1987;36(49):833-840

The Window Period Decision Tree

Why False Negatives Are Deadly

Person Recently Infected

↓

Time Since Infection?

< 2 weeks

Test NEGATIVE: Virus present!

↓

Blood Donated: Others infected

> 4 weeks

Test POSITIVE: Correctly detected

↓

Blood Discarded: Supply safe

Sensitivity Changes Over Time

0%

Day 1-7
Eclipse period

~50%

Day 14
Seroconversion

~95%

Day 21
Most detected

99.9%

Day 45+
Window closed

THE LESSON

Sensitivity is not fixed. It depends on when you test. A "99% sensitive" test may be 0% sensitive in early infection.

"And the test said 'clean,'
for the virus had not yet shown its face.
And the blood was shared,
and the infection spread to the innocent."

Have you not heard of the pill given to mothers
to protect their pregnancies,
that planted cancer in their daughters
twenty years before it bloomed?

The DES Tragedy, 1938-1971

UNITED STATES & EUROPE

Diethylstilbestrol (DES) was given to millions of pregnant women to prevent miscarriage.

No proper clinical trial was ever conducted. Doctors assumed it worked because it seemed reasonable.

Decades later, their daughters developed a rare cancer: clear cell adenocarcinoma of the vagina. A cancer so rare it was a diagnostic signal in itself.

5-10 million women were exposed.

Herbst AL et al. N Engl J Med. 1971;284:878-881

The Validation Decision Tree

What Should Have Happened

New Medical Intervention

↓

Properly Tested?

YES

Randomized Trial

↓

Long-term Follow-up

↓

Know True Effects: benefits and harms

NO (DES)

Assumption Only

↓

Widespread Use

↓

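Hidden Harm: Discovered too late

The Diagnostic Signal

When Rarity Becomes Evidence

Clear cell adenocarcinoma of the vagina was so rare in young women that 7 cases in one hospital triggered an investigation.

The cluster itself became a diagnostic test.
Sensitivity to DES exposure: nearly 100%
If you had this cancer at this age, you had almost certainly been exposed.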

1:1000

Risk of clear cell
cancer in DES daughters

5-10M

Women exposed
worldwide

"And the mothers took the pill in hope,
and the daughters grew up in shadow,
and twenty years later the cancer bloomed:
a diagnosis that indicted a generation of medicine."

A test has two virtues and two vices.

Sensitivity: Can it find the sick?

Specificity: Can it spare the healthy?

Sensitivity: The Hunter

THE FORMULA

Sensitivity = TP / (TP + FN)

"Of all the sick, how many did we catch?"

Worked Example: COVID PCR Test

Given: 200 infected patients tested

TP = 196 (correctly positive), FN = 4 (missed)

Sensitivity = 196 / (196 + 4) = 196/200 = 98%

Interpretation: Test catches 98 of every 100 infected people

Specificity: The Guardian

THE FORMULA

Specificity = TN / (TN + FP)

"Of all the healthy, how many did we spare?"

Worked Example: Same COVID PCR Test

Given: 1000 uninfected people tested

TN = 999 (correctly negative), FP = 1 (false alarm)

Specificity = 999 / (999 + 1) = 999/1000 = 99.9%

Interpretation: Test correctly clears 999 of every 1000 healthy people

The Memory Rule

When to Use Which Test

What Do You Need?

RULE OUT disease

Use HIGH SENSITIVITY

↓

SnNout: Sensitive Negative = OUT

RULE IN disease

Use HIGH SPECIFICITY

↓

SpPin: Specific Positive = IN

"Sensitivity catches the sick.
Specificity spares the well.
But no test masters both perfectly;
this is the burden we bear."

Have you not seen the doctor
who saw 99% accurate
and believed a positive result meant 99% certainty?

This is the deadliest mistake in medicine.

The Base Rate Fallacy

THE PUZZLE

A disease affects 1 in 1000 people.
The test has 99% sensitivity and 99% specificity.
A patient tests positive.

What is the probability that they have the disease?

Most doctors say ~99%. The real answer is about 9%.

The Math Revealed

Testing 100,000 People (Prevalence 1/1000)

Step 1: 100 have disease, 99,900 healthy

Step 2: Of 100 sick: 99 test positive (TP), 1 negative (FN)

Step 3: Of 99,900 healthy: 999 test positive (FP), 98,901 negative (TN)

Step 4: Total positives = 99 + 999 = 1,098

PPV = TP / All Positives = 99 / 1,098 = 9%

91% of positive results are false positives.
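The four steps above are a natural-frequency calculation: split a cohort into sick and healthy, then apply sensitivity and specificity to each group. A short sketch of that bookkeeping follows; the helper name `natural_frequencies` is hypothetical and introduced only for illustration.

```python
def natural_frequencies(n: int, prevalence: float, sens: float, spec: float) -> dict:
    """Split a cohort of n people into expected TP, FN, FP and TN counts."""
    sick = n * prevalence
    healthy = n - sick
    tp = sick * sens          # sick and correctly flagged
    fn = sick - tp            # sick but missed
    tn = healthy * spec       # healthy and correctly cleared
    fp = healthy - tn         # healthy but flagged
    return {"TP": tp, "FN": fn, "FP": fp, "TN": tn}


# Same inputs as the worked example: 100,000 people, prevalence 1/1000.
counts = natural_frequencies(100_000, 0.001, 0.99, 0.99)
ppv = counts["TP"] / (counts["TP"] + counts["FP"])

for cell, value in counts.items():
    print(f"{cell}: {value:,.0f}")
print(f"PPV = {ppv:.0%}")
```

Working in whole people rather than probabilities is a deliberate choice here: it keeps the false positives visible as a count that dwarfs the true positives.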

Interactive Base Rate Calculator

See How Prevalence Changes PPV

Prevalence: 0.1% | Sensitivity: 99% | Specificity: 99%

Positive Predictive Value (PPV): 9%

91% of positives are false alarms

The Prevalence Decision Tree

Same Test, Different Settings

Test: 99% Sens, 99% Spec

↓

Where Is Testing Done?

General Pop
0.1%

PPV = 9% (91% false +)

High-Risk
10%

PPV = 92% (8% false +)

Confirmatory
50%

PPV = 99% (1% false +)
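The three branches of this tree come from one formula, Bayes' rule written as PPV = (prevalence × sensitivity) / (prevalence × sensitivity + (1 − prevalence) × (1 − specificity)). A minimal sketch that evaluates it for the three settings shown; the dictionary keys are just labels chosen for this example.

```python
def ppv(prevalence: float, sens: float, spec: float) -> float:
    """Positive predictive value via Bayes' rule."""
    true_pos = prevalence * sens
    false_pos = (1 - prevalence) * (1 - spec)
    return true_pos / (true_pos + false_pos)


# Prevalence in each setting, as given in the tree above.
settings = {"General population": 0.001, "High-risk": 0.10, "Confirmatory": 0.50}

# The test itself never changes: 99% sensitivity, 99% specificity.
results = {name: ppv(prev, 0.99, 0.99) for name, prev in settings.items()}

for name, value in results.items():
    print(f"{name}: PPV = {value:.0%}")
```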

"And the doctor said, '99% accurate,'
and the patient heard, '99% certain,'
and both were deceived,
for they forgot to ask: 'How rare is this disease?'"

Have you not heard of the machine
that could find TB in two hours,
hailed as revolutionary,
yet missed drug-resistant strains?

The GeneXpert Story in South Africa

CAPE TOWN, 2010

For a century, diagnosing TB meant growing bacteria for weeks. Then came GeneXpert. Results in 2 hours.

South Africa deployed it nationwide. The WHO endorsed it.

But in patients with low bacterial loads, often HIV co-infected, sensitivity dropped to 67%. One in three cases missed.

And in detecting rifampicin resistance, it missed 5% of resistant cases. Those patients received the wrong treatment. Resistant TB spread.

Steingart KR et al. Cochrane Database Syst Rev. 2014;1:CD009593

TB Diagnosis Decision Tree

When GeneXpert Falls Short

Suspected TB Patient

↓

GeneXpert Test

↓

Positive

↓

Rifampicin?

Sensitive: Standard Tx

Resistant: MDR-TB Tx

Negative

↓

HIV+ or High Suspicion?

Yes: Culture needed

No: Likely negative

Sensitivity by Patient Type

98%

Smear-positive
(high bacterial load)

67%

Smear-negative
(low bacterial load)

61%

HIV co-infected
(immune suppressed)

THE LESSON

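A test's sensitivity in clinical trials may not match its sensitivity in your patients. Know your population.

"And the machine said, 'Negative,'
and the doctor believed the machine,
and the patient went home with TB in his lungs,
and carried his cough out into the world."

Have you not heard of the test for men
that found cancers that would never kill,
and led to treatments that destroyed lives?

The PSA Screening Tragedy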

UNITED STATES, 1990s-2010s

PSA (Prostate-Specific Antigen) could detect prostate cancer early.

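Doctors tested millions of men. Cancers were found. Prostates were removed.

But many of these "cancers" would never have caused symptoms. Surgery caused impotence and incontinence in men who would have died of old age, not cancer.

Moyer VA. Ann Intern Med. 2012;157:120-134

The Numbers of Harm

1

Lives saved from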
prostate cancer
per 1000 screened

30-40

Men made impotent
or incontinent
per 1000 screened

100+

False positives
(biopsies, anxiety)
per 1000 screened

THE REVERSAL

In 2012, the US Preventive Services Task Force recommended against routine PSA screening. The test found too much that did not need to be found.

Patient Decision Aid: PSA Screening

If 1,000 men aged 55-69 are screened for 13 years

Deaths from prostate cancer prevented

1-2 men

Men who will have false positive requiring biopsy

100-120 men

Men diagnosed with a cancer that would never have caused harm

20-50 men

Men left impotent or incontinent from treatment

30-40 men

Is this trade-off acceptable to you?

"And the test found a shadow,
and the surgeon cut,
and the man lived on, impotent and incontinent,
spared from a cancer that would never have awakened."

Have you not heard of the man with chest pain
whose first troponin was normal,
who was sent home and died
by morning?

The Troponin Timing Problem

EMERGENCY DEPARTMENTS WORLDWIDE

Troponin is the gold standard for diagnosing a heart attack. But it takes 3-6 hours to rise after myocardial injury.

A patient arrives one hour after chest pain begins. Troponin is tested: normal. "You're fine. Go home."

The heart was dying. The protein had not yet leaked.

Studies show 2-5% of MI patients sent home from ED die within 30 days.

Pope JH et al. N Engl J Med. 2000;342:1163-1170

Serial Testing Decision Tree

The Two-Troponin Protocol

Chest Pain Patient

↓

First Troponin

↓

Elevated

↓

Treat as MI

Normal

↓

When Did Pain Start?

<6 hrs

Wait 3 hrs: Repeat troponin

>6 hrs

Low risk: Consider d/c

High-Sensitivity Troponin

~70%

Conventional troponin
sensitivity at 0 hrs

~95%

hs-Troponin
sensitivity at 0 hrs

99%

hs-Troponin
at 3 hrs serial

THE TRADE-OFF

High-sensitivity troponin catches more heart attacks early. But it also has more false positives—elevated in kidney disease, heart failure, sepsis, and marathon runners.

"And the test said 'normal,'
for the heart had only just begun to die.
And the patient was reassured,
and went home to finish dying."

Sensitivity describes the test.
Specificity describes the test.

But the patient asks:
"I tested positive. What are MY chances?"

Likelihood Ratios

POSITIVE LIKELIHOOD RATIO

LR+ = Sensitivity / (1 - Specificity)

How much more likely is a + result in sick vs healthy?

NEGATIVE LIKELIHOOD RATIO

LR- = (1 - Sensitivity) / Specificity

How much more likely is a - result in sick vs healthy?
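Both ratios follow mechanically from sensitivity and specificity. The sketch below uses a hypothetical test with 90% sensitivity and 95% specificity; those two numbers are assumptions chosen for round arithmetic, not figures from any study cited in this course.

```python
def likelihood_ratios(sens: float, spec: float) -> tuple[float, float]:
    """Return (LR+, LR-) for a test with the given sensitivity and specificity."""
    lr_pos = sens / (1 - spec)      # how much a positive result raises the odds
    lr_neg = (1 - sens) / spec      # how much a negative result lowers the odds
    return lr_pos, lr_neg


# Hypothetical test: 90% sensitive, 95% specific.
lr_pos, lr_neg = likelihood_ratios(0.90, 0.95)

print(f"LR+ = {lr_pos:.1f}")
print(f"LR- = {lr_neg:.2f}")
```

Unlike PPV, neither ratio contains prevalence, which is why likelihood ratios can be carried from one clinical setting to another.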

The Fagan Nomogram

From Pre-Test to Post-Test Probability

[Figure: Fagan nomogram. Three vertical axes: pre-test probability (1% to 99%), likelihood ratio (0.01 to 100), post-test probability (1% to 99%).]

Draw a line from pre-test through LR to find post-test probability
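The line drawn on the nomogram is a graphical shortcut for a three-step calculation: convert the pre-test probability to odds, multiply by the likelihood ratio, and convert back. A minimal sketch, assuming an illustrative 20% pre-test probability and likelihood ratios of 10 and 0.1 (none of these values come from a specific study):

```python
def post_test_probability(pre_test_prob: float, lr: float) -> float:
    """Apply a likelihood ratio to a pre-test probability via odds."""
    pre_odds = pre_test_prob / (1 - pre_test_prob)   # probability -> odds
    post_odds = pre_odds * lr                        # Bayes' rule in odds form
    return post_odds / (1 + post_odds)               # odds -> probability


pre = 0.20  # assumed pre-test probability
after_positive = post_test_probability(pre, 10)    # strong rule-in result
after_negative = post_test_probability(pre, 0.1)   # strong rule-out result

print(f"After a positive result: {after_positive:.0%}")
print(f"After a negative result: {after_negative:.1%}")
```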

Interpreting Likelihood Ratios

How Powerful Is This Test?

LR+ Value?

LR+ > 10: Strong rule-in

5-10: Moderate

2-5: Weak

1-2: Useless

LR- Value?

< 0.1: Strong rule-out

0.1-0.2: Moderate

0.2-0.5: Weak

0.5-1: Useless

"Sensitivity speaks of the sick.
Specificity speaks of the well.
But the likelihood ratio answers:
what does this result mean for this patient?"

Have you not seen the feverish child in the village,
whose rapid test said negative,
while the Plasmodium kept multiplying?

The Malaria RDT Problem

SUB-SAHARAN AFRICA

Malaria kills 600,000 people yearly, mostly children under 5.

Rapid Diagnostic Tests were meant to guide treatment in remote areas without microscopes or laboratories.

But when parasitemia is low, the RDT misses cases. And when P. falciparum deletes its HRP2 gene, the RDT sees nothing at all.

WHO. Malaria RDT Performance. 2022

The Clinical Decision Tree

Child with Fever in Malaria-Endemic Area

Febrile Child

↓

Perform RDT

↓

RDT Positive

↓

Treat for Malaria

RDT Negative

↓

Clinical Suspicion?

High

Treat Anyway or Microscopy

Low

Look for Other Cause

Sensitivity Varies by Parasitemia

95%

High parasitemia
(>200/μL)

75%

Low parasitemia
(100-200/μL)

50%

Very low
(<100/μL)

THE CLINICAL LESSON

A negative RDT does not rule out malaria in endemic areas. Clinical judgment must override the test when suspicion is high.

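"And the test said 'negative,'
and the child was sent home,
and the parasites multiplied in the dark,
and by morning the child could not be woken."

In the year of the plague,
the world needed a test that was fast.

But fast is not the same as accurate.

The Cochrane Verdict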

COVID-19 Rapid Antigen Tests (155 Studies)

Population	Sensitivity	Missed
Symptomatic	73%	27%
Asymptomatic	55%	45%
First 7 days	80%	20%

Dinnes J et al. Cochrane Database Syst Rev. 2022;7:CD013705

The False Security Decision Tree

Thanksgiving 2020: What Happened

Family Member Tests Negative

↓

Truly Negative?

55% if asymptomatic

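True Negative: Safe to gather

45% if asymptomatic

FALSE Negative: Infectious!

↓

Gathers with family: Grandparents infected

"And the test said 'negative,'
and the family embraced,
and by winter's end
the grandfather was ..."

Have you not heard of the test that found cancers
that would never kill,
and led to treatments that caused more harm than the disease?

The Overdiagnosis Problem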

3-4

Lives saved
per 10,000 screened

~15

Overdiagnosed
(treated unnecessarily)

~500

False alarms
(anxiety, biopsies)

THE QUESTION

To save 3-4 lives, about 15 women undergo surgery, radiation, or chemotherapy for cancers that would never have harmed them.

Is this trade-off worth it?

Patient Decision Aid: Mammography

If 10,000 women aged 50-69 are screened for 10 years

Deaths from breast cancer prevented

3-4 women

Women called back for false alarms

~500 women

Unnecessary biopsies

~200 women

Women treated for a cancer that would never have harmed them

~15 women

Is screening right for you?

The Screening Cascade Decision Tree

Screening 10,000 Women Over 10 Years

10,000 Women

↓

~1,000 Recalled: Abnormal

↓

~500 False
Alarm

~500 Biopsy
~50 cancer

~9,000 Cleared

Of ~50 Cancers Found

~35 Would Kill: 3-4 saved

~15 Would Never Kill: Overdiagnosed

"And the test found a shadow
and called it cancer,
and the woman was cut and burned
for a shadow that would never have darkened her days."

Have you not heard of the scan that finds plaques in the brain,
yet cannot tell
whether the mind will fade?

The Amyloid Paradox

ALZHEIMER'S RESEARCH, 2010s-2020s

PET scans can now detect amyloid plaques—the hallmark of Alzheimer's.

But 30% of cognitively normal elderly have amyloid plaques. They may never develop dementia.

And 10-20% of dementia patients have no amyloid.

The test finds plaques, but plaques are not the disease. We are testing a surrogate, not the outcome.

Jack CR et al. Lancet Neurol. 2018;17:760-773

Surrogate vs. Outcome Decision Tree

What Are We Actually Testing?

Diagnostic Test

↓

What Does It Detect?

Outcome itself

Direct Diagnosis: e.g., biopsy for cancer

↓

High clinical value

Surrogate marker

Indirect Signal: e.g., amyloid for dementia

↓

Validated link?

Yes: Use cautiously

No: Limited value

"And the scan found plaques,
and the doctor named it Alzheimer's,
and the patient lived in fear
of a forgetting that might never come."

Not all studies are created equal.

Some are biased.
Some are poorly designed.
Some cannot be trusted.

How do we separate the wheat from the chaff?

QUADAS-2: The Quality Checklist

Four Domains of Risk of Bias

1

Patient Selection

Was a consecutive or random sample enrolled? Was a case-control design avoided?

2

Index Test

Was the index test interpreted without knowledge of the reference standard? Was the threshold pre-specified?

3

Reference Standard

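Was the reference standard likely to classify the condition correctly? Was it interpreted blind to the index test?

4

Flow and Timing

Was there an appropriate interval between the tests? Did all patients receive the same reference standard?

QUADAS-2 Decision Tree

Should You Trust This Study?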

DTA Study

↓

Check All 4 Domains

All Low Risk

High Quality: Trust results

Some Unclear

Moderate: Use with caution

Any High Risk

Low Quality: Results may be biased

Common Biases in DTA Studies

!

Verification Bias

Only positive tests get the reference standard → inflates sensitivity

!

Spectrum Bias

Study population differs from clinical reality → results do not generalize

!

Incorporation Bias

Index test is part of reference standard → artificially high accuracy

!

Review Bias

Index test interpreted knowing reference result → inflates both metrics

"Before you trust the numbers,
ask: How were they gathered?
A biased study speaks with confidence,
but its confidence is a lie."

One study may deceive.
One study may flatter.

But when all the evidence is gathered,
the truth becomes harder to hide.

Why DTA Meta-Analysis Is Different

THE PROBLEM

Sensitivity and specificity are correlated. When one goes up, the other tends to go down.

They cannot be pooled separately the way treatment effects are. What is needed is a bivariate model.
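One common way to write the bivariate model (following the general form in Reitsma et al., listed under Key Sources) treats each study's logit-transformed sensitivity and specificity as a draw from a joint normal distribution. The symbols below are notation chosen for this sketch, not taken from the slides: i indexes studies, μ_A and μ_B are the mean logit sensitivity and specificity, and σ_AB carries the correlation between them.

```latex
\begin{pmatrix}
  \operatorname{logit}(\mathrm{Se}_i) \\
  \operatorname{logit}(\mathrm{Sp}_i)
\end{pmatrix}
\sim
\mathcal{N}\!\left(
  \begin{pmatrix} \mu_A \\ \mu_B \end{pmatrix},
  \begin{pmatrix}
    \sigma_A^2 & \sigma_{AB} \\
    \sigma_{AB} & \sigma_B^2
  \end{pmatrix}
\right)
```

The off-diagonal term σ_AB is what separate pooling throws away; it is typically negative when studies use different thresholds.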

The SROC Curve

Summary Receiver Operating Characteristic

[Figure: SROC plot. Y-axis: Sensitivity. X-axis: 1 - Specificity (False Positive Rate). Legend: individual studies, summary estimate.]

Reading the SROC

What Does the Curve Tell Us?

SROC Curve Position

↓

Top-Left Corner

Excellent Test: High sens + spec

Near Diagonal

Useless Test: No better than chance

Points Scattered

High Heterogeneity: Investigate sources

"One study may deceive.
Weigh many studies together,
and trace the path of truth:
the SROC curve, which reveals what the test can really do."

But what if the studies disagree?

One says sensitivity is 95%.
Another says 60%.

Which truth do you believe?

Sources of Heterogeneity

Why Studies Disagree

Same Test, Different Results?

Threshold: Different cutoffs

Population: Severity, age

Setting: Primary vs specialist

Quality: Bias, blinding

Measuring Disagreement: I²

I² < 25%

Low
Studies agree

I² 25-75%

Moderate
Some variation

I² > 75%

High
Major disagreement

THE WARNING

When I² > 75%, the pooled estimate may be meaningless. Explain the disagreement before averaging.
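I² is commonly derived from Cochran's Q statistic as I² = max(0, (Q − df) / Q), where df is the number of studies minus one. The sketch below assumes that standard definition and applies the cut-offs from this slide; the Q values and the study count are made up for illustration.

```python
def i_squared(q: float, n_studies: int) -> float:
    """Return I² as a percentage, given Cochran's Q and the number of studies."""
    df = n_studies - 1
    if q <= 0:
        return 0.0
    return max(0.0, (q - df) / q) * 100


def heterogeneity_label(i2: float) -> str:
    """Map I² onto the Low / Moderate / High bands used above."""
    if i2 < 25:
        return "Low"
    if i2 <= 75:
        return "Moderate"
    return "High"


# Hypothetical meta-analyses of 11 studies each (df = 10).
for q in (8.0, 40.0, 100.0):
    i2 = i_squared(q, n_studies=11)
    print(f"Q = {q:5.1f} -> I² = {i2:4.1f}% ({heterogeneity_label(i2)})")
```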

"When studies disagree,
do not silence the dissent.
Ask: Why do they see differently?
The disagreement itself will teach you."

The DTA Toolkit

Key Measures and When to Use Them

The Checklist

✓

Was there a valid reference standard?

Gold standard applied to ALL patients?

✓

Were the interpreters blinded?

Test readers unaware of diagnosis?

✓

Was the spectrum appropriate?

Patients similar to your population?

✓

Was the threshold pre-specified?

Or chosen to maximize results?

When Results Don't Match Suspicion

The Clinical Override Decision Tree

Test Negative, High Suspicion

↓

What Is the LR-?

LR- < 0.1

Strong rule-out: Accept negative

LR- 0.1-0.5

Repeat test: Or different test

LR- > 0.5

Trust judgment: Test is weak

Sequential Testing Decision Tree

When One Test Isn't Enough

Initial Screening Test

↓

Positive

↓

Confirmatory Test: High specificity

↓

Positive: Diagnose

Negative: False alarm

Negative

↓

Likely negative: If high sens screen
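The reason a confirmatory test helps is that the post-test probability from the screen becomes the pre-test probability for the second test. The sketch below chains two positive results, assuming the two tests err independently given disease status (a strong assumption that often fails in practice) and assuming a hypothetical confirmatory test with 99% sensitivity and 99.9% specificity.

```python
def update(prob: float, lr: float) -> float:
    """Update a probability with a likelihood ratio, working in odds."""
    odds = prob / (1 - prob)
    post_odds = odds * lr
    return post_odds / (1 + post_odds)


prevalence = 0.001                        # 1 in 1000, as in the base rate puzzle
screen_lr_pos = 0.99 / (1 - 0.99)         # screening test: 99% sens, 99% spec
confirm_lr_pos = 0.99 / (1 - 0.999)       # assumed confirmatory test: 99.9% spec

after_screen = update(prevalence, screen_lr_pos)
after_confirm = update(after_screen, confirm_lr_pos)

print(f"After positive screen:       {after_screen:.0%}")
print(f"After positive confirmation: {after_confirm:.0%}")
```

Under these assumptions a lone positive screen leaves the patient far more likely healthy than sick, while two concordant positives make disease very likely.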

"Armed with sensitivity, specificity, likelihood,
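armed with the SROC and measures of agreement,
you can see through the lies of a test
and judge its truth for yourself."

Have you not heard of the patient
who received the wrong blood,
not because the test was wrong,
but because no one performed it?

The Test That Wasn't Done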

HOSPITALS WORLDWIDE

ABO blood typing is nearly 100% accurate when performed.

Yet transfusion reactions still kill, not from test failure but from human failure:

• Wrong blood drawn from wrong patient
• Labels switched in the lab
• Bedside check skipped in emergency

In the UK, 1 in 13,000 transfusions goes to the wrong patient. The test worked. The system failed.

Bolton-Maggs PHB. Transfus Med. 2016;26:303-311

Test vs. System Decision Tree

Where Can Things Go Wrong?

Diagnostic Process

↓

Error Source?

Test itself

Analytical Error: Sens/Spec issue

↓

Better test needed

Pre-analytical

Wrong sample: ID error

↓

System fix needed

Post-analytical

Wrong action: Reporting error

↓

Process fix needed

"The perfect test means nothing
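if the wrong blood is drawn,
if the wrong label is applied,
if the wrong bag is hung."

DTA studies measure test accuracy. They do not measure system accuracy.

Have you not seen the algorithm
that learned from biased data,
and spread that bias
to every patient it touched?

The AI Diagnostic Revolution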

STANFORD & BEYOND, 2017-PRESENT

Deep learning algorithms now match dermatologists at detecting skin cancer.

But the training data was predominantly light skin. On dark skin, performance dropped significantly.

The algorithm learned the patterns, and with them the biases.

And when deployed without external validation, it performed worse than expected, because the training population didn't match the clinical population.

Esteva A et al. Nature. 2017;542:115-118; Adamson AS. JAMA Dermatol. 2018

AI Validation Decision Tree

Is This AI Ready for Clinical Use?

AI Diagnostic Tool

↓

Validation Type?

Internal only

High Risk: Overfitting likely

↓

Not ready

External validation

Better: But check population

↓

Does it match your patients?

Yes: Consider use

No: Caution

Prospective RCT

Gold Standard: Patient outcomes

AI Calibration: The Hidden Problem

DISCRIMINATION VS. CALIBRATION

Discrimination (AUC/ROC): Can the AI rank patients by risk?

Calibration: When the AI says "80% risk," do 80% actually have disease?

Many AI tools have good AUC but poor calibration. This is the base rate fallacy in algorithmic form.

AUC

Can it rank?
(usually reported)

CAL

Is probability accurate?
(often ignored)
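A toy example can show how the two properties come apart. The sketch below uses ten made-up patients: the first set of risk scores averages out to the observed event rate, while the second set is a monotone transformation (a square root) of the first, so the ranking and therefore the AUC are unchanged but the predicted risks are inflated. All data and names here are invented for illustration, and comparing mean predicted risk to the observed rate is only the crudest calibration check.

```python
def auc(scores: list[float], labels: list[int]) -> float:
    """AUC as the share of (diseased, healthy) pairs ranked correctly; ties count half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def mean(values: list[float]) -> float:
    return sum(values) / len(values)


# Ten hypothetical patients; 1 = disease present.
labels = [0, 0, 0, 0, 0, 0, 0, 1, 0, 1]
risk = [0.05, 0.05, 0.10, 0.10, 0.10, 0.20, 0.20, 0.30, 0.40, 0.50]

# Same ranking, inflated probabilities.
inflated = [r ** 0.5 for r in risk]

observed_rate = mean(labels)
print(f"Observed event rate:        {observed_rate:.2f}")
print(f"Original: AUC = {auc(risk, labels):.3f}, mean predicted = {mean(risk):.2f}")
print(f"Inflated: AUC = {auc(inflated, labels):.3f}, mean predicted = {mean(inflated):.2f}")
```

Reporting AUC alone would make these two models look identical, even though only one of them gives risks a patient could be told at face value.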

"And the algorithm learned from the data,
and the data was biased,
and the bias spread into every prediction,
and no one asked: 'Who was missing from the training set?'"

The patient asks: "Is my test positive?"

But what they mean is:
"Am I sick?"

How do we bridge this gap?

Communication Scripts

SCRIPT 1: EXPLAINING A POSITIVE RESULT

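"Your test came back positive. But I want to explain what that means."

"This test is good at finding people with this condition, but it also gives false alarms."

"Based on your risk factors, there is about a [X]% chance that this is a true positive."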

"We'll do a confirmatory test to be certain before any treatment."

Communication Scripts

SCRIPT 2: EXPLAINING A NEGATIVE RESULT (HIGH SUSPICION)

"Your test came back negative, but I'm still concerned."

"This test can miss cases, especially early in the illness."

"Given your symptoms, I'd like to test again in a few days or try a different test."

"A negative test doesn't always mean you're clear. Your symptoms matter too."

Communication Decision Tree

How to Explain Test Results

Test Result

↓

Positive

↓

PPV?

>90%: "Very likely true"

<90%: "Need to confirm"

Negative

↓

NPV?

>95%: "Very reassuring"

<95%: "Still watch symptoms"

Questions to Ask Your Doctor

1

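"How accurate is this test?"

Ask for sensitivity and specificity in plain language

2

"What if the result is wrong?"

Understand the consequences of false positives and false negatives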

3

"What happens next?"

Will there be a confirmatory test? Repeat test? Treatment?

4

"What if I don't get tested at all?"

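Understand the trade-offs of testing versus not testing

"The test speaks in numbers.
The patient listens in fear and hope.
The healer's work is translation:
to bridge the gap between statistics and the soul."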

A test may be accurate.
But is it worth it?

What does it cost—in money,
in anxiety, in harm?

The Test and Treatment Thresholds

When Is Testing Worthwhile?

Pre-Test Probability

↓

Very Low

Below Test Threshold: Don't test, reassure

Intermediate

Testing Zone: Test will change management

Very High

Above Treat Threshold: Don't test, treat

THE PRINCIPLE

Test only when the result will change what you do. If you'd treat regardless, or not treat regardless, why test?

GRADE: Quality of Evidence

Grading DTA Evidence

⊕⊕⊕⊕

HIGH

Multiple high-quality studies, consistent results, directly applicable

⊕⊕⊕○

MODERATE

Some limitations in study quality, consistency, or applicability

⊕⊕○○

LOW

Serious limitations—may need to downgrade recommendations

⊕○○○

VERY LOW

Very serious limitations—evidence uncertain

Cost-Consequence Analysis

Example: Universal vs. Targeted Screening

Cost per case detected (universal)

$50,000

Cost per case detected (high-risk only)

$5,000

Cases missed by targeted approach

~10%

False positives avoided by targeted

~90%

Which approach is right for your population?

"A test is not just accurate or inaccurate.
It has costs—in money, in worry, in harm.
A wise clinician weighs all of these,
and tests only when testing serves the patient."

The SROC curve shows where the test performs.

But how certain are we?
And how much will it vary in practice?

Confidence vs. Prediction Regions

Two Types of Uncertainty

95% CI (summary estimate)

95% prediction (future studies)

What Each Region Tells You

CI

Confidence Region (smaller ellipse)

Where we are 95% confident the true average sensitivity/specificity lies. Reflects uncertainty in the summary estimate.

PI

Prediction Region (larger ellipse)

Where we expect 95% of future studies to fall. Accounts for between-study heterogeneity.

CLINICAL IMPLICATION

If the prediction region is large, test performance in your setting may differ substantially from the average. Wide prediction = high heterogeneity = investigate sources.

Bivariate Model Interpretation

Reading Meta-Analysis Results

Summary Sens/Spec

↓

Check Regions

CI narrow, PI narrow

Consistent: Trust the average

CI narrow, PI wide

Heterogeneous: The average may not apply

CI wide

Uncertain: More research needed

"The confidence region tells you: how sure are we?
The prediction region tells you: how much will it vary?
Both questions matter
for the test you will use tomorrow."

References

Key Sources

Carreyrou J. Bad Blood. Knopf, 2018. [Theranos]
CDC. MMWR. 1987;36(49):833-840. [HIV blood supply]
Herbst AL et al. N Engl J Med. 1971;284:878-881. [DES]
Moyer VA. Ann Intern Med. 2012;157:120-134. [PSA]
Pope JH et al. N Engl J Med. 2000;342:1163-1170. [Troponin]
Steingart KR et al. Cochrane 2014;1:CD009593. [GeneXpert]
Dinnes J et al. Cochrane 2022;7:CD013705. [COVID RAT]
UK Panel. Lancet. 2012;380:1778-1786. [Mammography]
Jack CR et al. Lancet Neurol. 2018;17:760-773. [Amyloid]
WHO. Malaria RDT Performance. 2022.
Reitsma JB et al. J Clin Epidemiol. 2005;58:982-990. [Bivariate]
Whiting PF et al. Ann Intern Med. 2011;155:529-536. [QUADAS-2]
Bolton-Maggs PHB. Transfus Med. 2016;26:303-311.

A test has 99% sensitivity and 99% specificity. The disease prevalence is 1 in 1000. A patient tests positive. What is the probability that they have the disease?

99%

90%

About 9%

50%

What does "SnNout" mean?

A highly Sensitive test, when Negative, rules OUT disease

A highly Specific test, when Negative, rules OUT disease

Sensitivity should be used for screening

Specificity should be above 90%

Why was the blood supply contaminated with HIV despite testing?

The tests had low specificity

Tests had a window period with zero sensitivity in early infection

The tests were not performed correctly

The tests were too expensive

Which QUADAS-2 domain assesses whether the test was interpreted without knowledge of the diagnosis?

Patient Selection

Index Test

Reference Standard

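Flow and Timing

✔

Course Complete

"Now you know the four outcomes,
the two virtues of a test,
the base rate fallacy,
the art of gathering evidence,
and the biases that hide the truth.

When the next test is put before you,
you will know."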
