Monday, August 18, 2014

Sensitivity Or Specificity? Which Would YOU Prefer?

A typical day at work...from I Love Lucy, first aired September 15, 1952

There are days when the grind feels a lot like Lucy's candy factory as seen in the clip above. But the beat goes on, the images keep coming, and they have to be read. As one of my professors used to say, "Miss 'em slow, or miss 'em fast, boys!" Of course, that was a joke. Of course it was. Definitely.

You probably know the difference between sensitivity and specificity. In essence, sensitivity is the percentage of the time you find something that is actually present. Specificity is the percentage of the time you don't find something when nothing is there. In other words, were I 100% sensitive, I would find every cancer that comes through on the PACS worklist. Were I 100% specific, everyone I declare negative would truly be without disease. Put in tabular form (courtesy of Penn State's online Stat course):

                    Disease present     Disease absent
  Test positive     True positive       False positive
  Test negative     False negative      True negative

I want all my positives and negatives to be true, with no false positives (saying there is disease when there isn't) or false negatives (saying there is no disease when there is).
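The definitions reduce to two ratios over the four cells of that table. A minimal sketch in Python, with counts invented purely for illustration (not from any real reader's numbers):

```python
# Sensitivity and specificity from a 2x2 confusion table.
# All four counts below are made up for illustration.
tp = 45   # true positives: disease present, reader called it
fn = 5    # false negatives: disease present, reader missed it
fp = 20   # false positives: no disease, reader called it anyway
tn = 930  # true negatives: no disease, reader called it normal

sensitivity = tp / (tp + fn)  # fraction of real disease that gets found
specificity = tn / (tn + fp)  # fraction of normals correctly called normal

print(f"sensitivity = {sensitivity:.0%}")  # 90%
print(f"specificity = {specificity:.1%}")  # 97.9%
```

Note that sensitivity only looks at the diseased column and specificity only at the healthy column; neither alone tells you how often a positive call is actually right.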

There is a whole science surrounding this stuff. Everyone, and particularly every radiologist, has a different set of sensitivities and specificities, and this is all wrapped up in a concept called the Receiver Operating Characteristic, or ROC. From MediCalc:

In a Receiver Operating Characteristic (ROC) curve the true positive rate (Sensitivity) is plotted in function of the false positive rate (100-Specificity) for different cut-off points. Each point on the ROC curve represents a sensitivity/specificity pair corresponding to a particular decision threshold. A test with perfect discrimination (no overlap in the two distributions) has a ROC curve that passes through the upper left corner (100% sensitivity, 100% specificity). Therefore the closer the ROC curve is to the upper left corner, the higher the overall accuracy of the test (Zweig & Campbell, 1993).
Got it? Just remember that everybody's ROC is going to be different, with different blends of sensitivity and specificity.
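One way to see where those sensitivity/specificity pairs come from is to imagine a reader assigning each study a "suspicion score" and then sweeping the decision threshold. A sketch with made-up scores (the numbers are hypothetical, chosen only to show the shape of the trade-off):

```python
# Sweep a decision threshold over hypothetical reader "suspicion scores"
# to trace out (false-positive rate, sensitivity) pairs -- points on an ROC curve.
scores_diseased = [0.9, 0.8, 0.7, 0.4]   # scores given to truly diseased cases
scores_healthy  = [0.6, 0.3, 0.2, 0.1]   # scores given to truly healthy cases

def roc_point(threshold):
    """Call everything at or above `threshold` positive; return (FPR, sensitivity)."""
    tp = sum(s >= threshold for s in scores_diseased)
    fp = sum(s >= threshold for s in scores_healthy)
    sensitivity = tp / len(scores_diseased)
    fpr = fp / len(scores_healthy)           # false-positive rate = 1 - specificity
    return fpr, sensitivity

for t in (0.85, 0.65, 0.35, 0.05):
    fpr, sens = roc_point(t)
    print(f"threshold {t}: FPR = {fpr:.2f}, sensitivity = {sens:.2f}")
```

Lowering the threshold (calling more things positive) pushes sensitivity up, but the false-positive rate rises with it; that is exactly the curve the MediCalc passage describes.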

Fellow radiologist and wannabe writer Saurabh Jha, M.D., takes the concept one step further with his "fictional" colleagues, Drs. Singh and Jha. I'm guessing the second isn't fictional at all, and I'm sure he based the first on someone he knows. Anyway, Dr. Jha wrote this piece, published in the Healthcare Blog, republished by KevinMD, and cited by several radiologist friends of mine.

Who Is the Better Radiologist?

There’s a lot of talk about quality metrics, pay for performance, value-based care and penalties for poor outcomes.

In this regard, it’s useful to ask a basic question. What is quality? Or an even simpler question, who is the better physician?

Let’s consider two fictional radiologists: Dr. Singh and Dr. Jha.

Dr. Singh is a fast reader. Her turn-around time for reports averages 15 minutes. Her reports are brief with a paucity of differential diagnoses. The language in her reports is decisive and her reports contain very few disclaimers. She has a high specificity, meaning that when she flags pathology it is very likely to be present.

The problem is her sensitivity. She is known to miss subtle features of pathology.

There’s another problem. Sometimes when reading her reports one isn’t reassured that she has looked at every organ. For example, her report of a CAT scan of the abdomen once stated that “there is no appendicitis. Normal CT.” The referring physician called her wondering if she had looked at the pancreas, since he was really worried about pancreatitis, not appendicitis. Dr. Singh had, but had not bothered to list all of the normal organs in the report.

Dr. Jha is not as fast a reader as Dr. Singh. His turn-around time for reports averages 45 minutes. His reports are long and verbose. He meticulously lists all organs. For example, when reporting a CAT scan of the abdomen of a male, he routinely mentions that “there are no gross abnormalities in the seminal vesicles and prostate,” regardless of whether pathology is suspected or absence of pathology in those organs is of clinical relevance.

He presents long lists of possibilities, explaining why he thinks a diagnosis is or is not likely. He rarely comes down on a specific diagnosis.

Dr. Jha almost never misses pathology. He picks up tiny lung cancers, subtle thyroid cancers and tiny bleeds in the brain. He has a very high sensitivity. This means that when he calls a study normal, and he very rarely does, you can be certain that the study is normal.

The problem with Dr. Jha is specificity. He often raises false alarms such as “questionable pneumonia,” “possible early appendicitis” and “subtle high density in the brain, small punctate hemorrhage not entirely excluded.”

In fact, his colleagues have jokingly named a scan that he recommends as “The Jha Scan Redemption.” These almost always turn out to be normal.

Which radiologist is of higher quality, Dr. Singh or Dr. Jha?

If you were a patient who would you prefer read your scan, the under calling, decisive Dr. Singh or the over calling, painfully cautious Dr. Jha?

If you were a referring physician which report would you value more, the brief report with decisive language and a paucity of differential diagnoses or the lengthy verbose report with long lists on the differential?

If you were the payer which radiologist would you wish the hospital employed, the one who recommended fewer studies or the one who recommended more studies?

If you were a hospital administrator which radiologist would you award a higher bonus, the fast reading Singh or the slow reading Jha? This is not a slam dunk answer because the slow-reading over caller generates more billable studies.

If you were the hospital’s Quality and Safety officer or from Risk Management, who would you lose more sleep over, Dr. Singh’s occasional false negatives or Dr. Jha’s frequent false positives? Note, it takes far fewer false negatives to trigger a lawsuit than false positives.

I suppose you would like hard numbers to make an “informed” decision. Let me throw this one to you.

For every 10,000 chest x-rays Dr. Singh reads, she misses one lung cancer. Dr. Jha does not miss a single lung cancer, but he recommends 200 CAT scans of the chest for “questionable nodule” per 10,000 chest x-rays. That is 200 more than Dr. Singh. And 199/200 of these scans are normal.

I can hear the siren song of an objection. Why can’t a physician have the sensitivity of Dr. Jha and the specificity of Dr. Singh? The caution of Jha and the speed of Singh? The decisiveness of Singh and the comprehensiveness of Jha?

You think I’m committing a bifurcation fallacy by enforcing a false dichotomy. Can’t we have our specificity and eat it?

Sadly, I’m not. It is a known fact of signal theory that no matter how good one is, there is a trade-off between sensitivity and specificity. Meaning if you want fewer false negatives, e.g. fewer missed cancers on chest X-ray, there will be more false positives, i.e. negative CAT scans for questioned findings on chest X-ray.

Trade-off is a fact of life. Yes, I know it’s very un-American to acknowledge trade-offs. And I respect the sentiment. The country did, after all, send many men to the moon.

Nevertheless, whether we like it or not trade-offs exist. And no more so than in the components that make up the amorphous terms “quality” and “value.”

Missing cancer on a chest x-ray is poor quality (missed diagnosis). Over calling a cancer on a chest x-ray which turns out to be nothing is poor quality (waste). But now you must decide which is poorer. Missed diagnosis or waste? And by how much is one poorer than the other?

That’s a trade-off. Because if you want to approach zero misses there will be more waste. And if we don’t put our cards on the table, “quality” and “value” will just be meaningless magic talk. There, I just gave Hollywood an idea for the next Shrek, in which he breaks the iron triangle of quality, access and costs and rescues US healthcare.

If I had a missed cancer on a chest x-ray I would have wanted Dr. Jha to have read my chest x-ray. If I had no cancer then I would have wanted Dr. Singh to have read my chest x-ray. Notice the conditional tense. Conditional on knowing the outcome.

In hindsight, we all know what we want. Hindsight is just useless mental posturing. The tough proposition is putting your money where your mouth is before the event. Before you know what will happen.

This is the ex-ante ex-post dilemma. In case you want a clever term for what is patently common sense.

Dr. Singh is admired until she misses a subtle cancer on a chest x-ray. Then Risk Management is all over her case wondering why? How? What systems must we change? What guidelines must we incorporate?

Really? Must you ask?

Dr. Jha, on the other hand, is insidiously despised and ridiculed by everyone. All who remain unaware that he is merely a product of the zero risk culture in the bosom of which all secretly wish to hide.

The trouble with quality is not just that it is nebulous in definition and protean in scope. It can mean whatever you want it to mean on a Friday. It is that it comprises elements that are inherently contradictory.

Society, whatever that means these days, must decide what it values, what it values more and how much of what it values less is it willing to forfeit to attain what it values more.

Before you start paying physicians for performance and docking them for quality can we be precise about what these terms mean, please?

Thank you.

So what is quality? I guess getting it right every time would be a good start. But that really isn't in the realm of human performance. No one has a vertical ROC curve. If you read enough X-rays and scans, you will miss something. The old saying goes that the only way not to miss anything is not to read anything. That's not very practical.

Our fictional Dr. Singh misses one lung lesion for every 10,000 studies read. Let's say that she reads 200 studies per day; she will miss something every 50 days, every two months or so. Is this acceptable? Frankly, it is fantastic. A rate within acceptable human parameters would be more like missing something on one of every one hundred exams, something like once or twice a day. Is this acceptable? Not, I suppose, if the lesion is in your chest, or your relative's. But it is a completely reasonable number for a flawed human being. Average radiologist miss rates have been quoted at anything from 0.1% to 30%. An ACR presentation based in part on Dr. David Yousem's materials reveals the following uncomfortable facts:
  • Radiologists' error rate reported at 30%
  • >70% perceptual
    • abnormality is not perceived, i.e. “missed”
  • <30% cognitive
    • abnormality is perceived but misinterpreted
  • Error does not equal negligence
    • negligence occurs when the degree of error exceeds an accepted standard
  • Missed diagnoses are the major reason radiologists are sued
    • Most commonly missed:
      • cancers (breast and lung are the largest percentage)
      • spine fractures
  • Retrospective error/miss rate averages 30% (i.e. hindsight is 20/20)
  • “Real-time” error rate in daily practice averages 3-5%
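The miss-rate arithmetic above is worth a quick sanity check; the rates and the 200-studies-per-day workload are the figures used in this post:

```python
# Sanity-check the miss-rate arithmetic (figures from the post).
studies_per_day = 200
singh_miss_rate = 1 / 10_000   # the fictional Dr. Singh: one miss per 10,000 studies
human_miss_rate = 1 / 100      # a more typical ~1% "real-time" error rate

# Expected misses per day, and days between misses, at each rate.
days_between_misses_singh = 1 / (singh_miss_rate * studies_per_day)
misses_per_day_human = human_miss_rate * studies_per_day

print(f"Dr. Singh: one miss every {days_between_misses_singh:.0f} days")
print(f"1% reader: {misses_per_day_human:.0f} misses per day")
```

So a 1-in-10,000 reader misses roughly once every 50 working days, while a 1-in-100 reader at the same volume misses about twice a day, which is why the Singh figure is called fantastic.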

So back to sensitivity and specificity. Is it possible to be 100% sensitive and find every single lesion, never having a false negative? If you read VERY slowly and call everything positive, then yes, you will pick up every cancer, but in the process, you will prompt a lot of unnecessary negative scans (and a lot of anxiety) for all the little dots that weren't really cancers after all. This is the fictional Dr. Jha, and no one appreciates him, it seems. Can you be 100% specific, never having a false positive, and never send anyone on to an unneeded followup scan or biopsy? Sure, and then you get sued when you do miss something. And you will. I've heard it said that sometimes the lesion and the radiologist simply never meet. True enough.

The bottom line is that human beings (and their ROC curves) are anything but perfect. We can try to seek perfection by applying quality metrics and such, but in the end, what do we achieve? Possibly an outlier will come to light, someone whose miss rate is well beyond his or her colleagues, or perhaps well below the rest for that matter. So in the end, this implied rating process accomplishes nothing more than the perpetuation of the fiction of our perfection. Which raises impossible expectations in our patients, and sets the trial lawyers to licking their collective chops. After all, how can we possibly tolerate anything less than perfection? Because perfection doesn't exist.

I've told you the story of Mar-Mar, my Mother-In-Law, and her untimely passing, which was assisted by a radiological miss. My musings at the time are apropos for this discussion:
I've got enough friends who happen to be litigators to know that two things drive a malpractice suit: anger and greed/envy, and they go hand-in-hand. (And as an aside, the majority of cases appear to reach the attention of a lawyer because ANOTHER DOCTOR told the patient that something wasn't done as well as HE would have done it.) As with the young lady driving the beat-up car, an accident or even an incident that approaches such is enough to promote rage in some of us, perhaps even most of us. It doesn't matter that the act was unintentional. I did not set out yesterday to trash some kid's little red jalopy. I think it's also reasonable to say that no physician decides some morning to cause harm to his patient. A missed finding, like a parking-lot collision, is an accident. It is not meant to happen, and everyone would prefer that it doesn't. This is where greed and envy can augment the madness of rage. The young lady above, at some level, realized that my truck was likely worth 8-10 times what her beater might bring, and no doubt this got her all the more riled. Why should that doofus have a nice car? Who gave him the right to almost plow into me? He must think he owns the road, having an expensive car like that. I'll show him!

In the case of a miss or other adventure in medical errors, I think the same thing applies, although certainly with a little more justification. There is clearly a relationship between doctor and patient. If something goes wrong, the patient feels betrayed. And the patient gets angry. Given the perception of docs as wealthy, the next step in the mental equation may become: he hurt me (or could have hurt me) and he's going to pay! He can afford it!

While a financial award could put a car back together again, it may not be able to fix what was broken by the medical error. Somewhere along the way, our society has decided that money can compensate for the damage, and maybe that is true. However, juries of our "peers" are wont to award huge sums as punitive measures to "punish" the "bad" doctor. And let us not forget the fact that the litigator might receive 30-50% of the proceeds.

This is wrong. The whole scenario is horrible, and accomplishes nothing but padding the pockets of the litigating AND the defending lawyers. It leads to millions and billions of dollars spent for "cover your ass" procedures and tests. And it's all predicated on the anger over an accident and the thought that there might be a gold-mine to be had having won the malpractice lottery. This must stop.

I want this to be Mar-Mar's legacy: we must forgive those who make honest mistakes. We need to remove anger, greed and envy (and lawyers) from the equation, and somehow set up some entity, some body or board, that would determine actual damages and arrange for those to be made as whole as possible, but without multi-million dollar punitive, redistributive, awards. I know this is next to impossible, as there is way too much money to be made by trying "rich" doctors in front of a jury of their "peers" who would love nothing more than to sock it to them. But it is the right thing, and all but those who profit from the malpractice industry, not just the lawyers, but the plaintiff whores who sell their testimony, know that I'm spot on. Mar-Mar would approve.

Hopefully the above discussion of sensitivity and specificity brings this all full-circle. You can see the pressures under which we operate. We are to produce the work with decisive reports one after the other after the other, functioning as Dr. Singh, but we are never to miss anything, wearing the Dr. Jha hat. Why not just do both? Because we are human, and humans can't do that.

No doubt Eliot Siegel will eventually teach Watson the Computer to read imaging studies, and then we will achieve perfection. Well, maybe not. But I'd like to see the litigators sue IBM instead of us.
