Big Data, Bad Practices

I’ve just finished reading (well, listening to on Audible actually) Ben Goldacre’s Bad Pharma.  Highly recommended, as is his previous book: Bad Science.

It’s all about dodgy practice in the pharmaceutical industry and is, frankly, terrifying.  Set aside such dodgy practices as marketing spin, straight-up fraud and bribery, and there remains a theme that pervades far more than just pharmaceuticals (that’s not to say the aforementioned practices don’t!).

The underlying issue, and the root cause of the poor decision making about which drugs are prescribed in preference to others (decisions that, in pharmaceuticals, can ultimately cost real lives), is the misuse, or all-out abuse, of data.  And the same issue runs through many other industries.

The maxim “more data = better data” is repeated so often that it must be true.  Right?  My previous post should give you some indication of how I feel about received wisdom and the notion that something must be right because there’s a saying for it *shakes fist*.

Of course, more data contains more opportunity to make better decisions, but it also contains more opportunities to find the answer that you’re looking for.  With great power comes great responsibility, and, in the wrong hands, the power of big data can show whatever you hoped it might, whilst ignoring hugely significant trends that contradict the ‘facts’ you hope to present.

Statistical anomalies crop up all the time.  You might call an event with a one-in-a-thousand chance of happening an anomaly, but you only have to run a thousand trials, on average, to see one of these events.  In a world of automation, cheap online surveying and cheap storage and processing of big data, one-in-a-thousand events can be engineered without too much hassle.
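To see just how cheap a one-in-a-thousand ‘anomaly’ is, here’s a quick sketch in Python (my own toy illustration; the numbers are invented for the demo):

```python
import random

# A one-in-a-thousand event: how long, on average, until we stumble across one?
P = 1 / 1000
RUNS = 10_000  # repeat the whole experiment many times to estimate the average

def trials_until_event(p: float) -> int:
    """Count trials until the event first occurs (a geometric random variable)."""
    n = 1
    while random.random() >= p:
        n += 1
    return n

waits = [trials_until_event(P) for _ in range(RUNS)]
print(f"average trials until a 1-in-1000 event: {sum(waits) / RUNS:.0f}")
# Prints roughly 1000 - given enough attempts, 'anomalies' are a certainty.
```

The average hovers around a thousand: rarity is no protection once attempts are cheap to automate.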

Here are some of the common tricks of the Big Data Sith.  For each, have a think about where else this might be going on; whether you might have been a victim of these tactics and whether you might even have been (unwittingly, I’ll assume) complicit in such activities.

Switching of outcomes

Elaborating on the idea that any particular one-in-a-thousand anomaly is easy to engineer, it’s even easier to engineer a one-in-a-thousand anomaly if you’re not fussy about which one-in-a-thousand anomaly you find.

If your trial fails to provide adequate evidence for the outcome you wanted, then you could always mine the data until you find an alternative outcome that you’re still happy to slap on the marketing materials for your drug.

This is called switching of outcomes, and is a not-uncommon practice in pharmaceuticals and elsewhere.

The drug may not cure alopecia after all, but it does seem to reduce the effects of vertigo in males aged between 33 and 51 living in Wales.  Clearly this is a statistical anomaly, and it’s essential that, if ever we witness outcome switching like this, we insist on a follow-up trial with the new finding as the declared primary outcome, aiming to replicate the results.
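To show why outcome mining works so reliably, here’s a toy simulation (entirely my own; no data from any real trial): a drug with no effect at all, a hundred secondary outcomes measured ‘just in case’, and the standard p < 0.05 threshold for ‘significance’:

```python
import math
import random
import statistics

random.seed(1)
N = 200           # patients per arm
N_OUTCOMES = 100  # secondary outcomes quietly measured "just in case"

def p_value(treated, control):
    """Two-sided p-value from a two-sample z-test on means (normal approximation)."""
    n = len(treated)
    diff = statistics.mean(treated) - statistics.mean(control)
    se = math.sqrt((statistics.pvariance(treated) + statistics.pvariance(control)) / n)
    return math.erfc(abs(diff / se) / math.sqrt(2))

# The drug does nothing: every outcome, in both arms, is pure noise.
spurious_wins = []
for outcome in range(N_OUTCOMES):
    treated = [random.gauss(0, 1) for _ in range(N)]
    control = [random.gauss(0, 1) for _ in range(N)]
    if p_value(treated, control) < 0.05:
        spurious_wins.append(outcome)

print(f"'significant' outcomes found by chance: {len(spurious_wins)} of {N_OUTCOMES}")
# Expect around 5 - mine enough outcomes and something will always look like a win.
```

Roughly one outcome in twenty will clear the bar by chance alone, which is exactly why the primary outcome has to be declared before the data is collected.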

Surrogate outcomes

The example given from the drugs industry would be something like measuring a drug’s effect on blood pressure as a proxy for its effect on the risk of stroke.

Elsewhere it might be the impact that an ad has on brand awareness as a proxy for sales.  Beware, beware – being aware of a brand does not mean that you buy the brand’s product, just as a short-term reduction in blood pressure does not necessarily reduce the risk of stroke.
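Here’s a minimal sketch of how that failure mode looks in data, assuming (purely for illustration) a drug that genuinely lowers blood pressure while doing nothing whatsoever to stroke risk:

```python
import random
import statistics

random.seed(2)
N = 10_000  # patients per arm

# Toy model, assumed purely for illustration: the drug lowers blood pressure,
# but stroke risk is driven by something the drug doesn't touch.
BASELINE_STROKE_RISK = 0.10

treated_bp = [random.gauss(-10, 5) for _ in range(N)]  # drug lowers BP ~10 mmHg
control_bp = [random.gauss(0, 5) for _ in range(N)]
treated_strokes = sum(random.random() < BASELINE_STROKE_RISK for _ in range(N))
control_strokes = sum(random.random() < BASELINE_STROKE_RISK for _ in range(N))

print(f"mean BP change: treated {statistics.mean(treated_bp):+.1f}, "
      f"control {statistics.mean(control_bp):+.1f}")
print(f"stroke rate:    treated {treated_strokes / N:.1%}, "
      f"control {control_strokes / N:.1%}")
# The surrogate (blood pressure) looks great; the outcome that matters doesn't move.
```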

If anyone ever tries to pull a fast one and fob you off with a surrogate outcome, then you must demand evidence that the surrogate is a good proxy for the primary outcome.

Destruction of negative trials

It is, according to Ben (and I think he’d have been sued out of existence had it not been true), quite common practice in the pharmaceuticals industry to run ‘n’ trials and simply throw away the ones that don’t support your desired result.

Let’s stick with this a second so that the gravity of that statement can sink in.  A pharmaceutical company will, of course, want to produce trial results showing that its drug is effective at curing whatever ailment it targets, and that it does so without harmful side-effects.  So the company sets up ‘n’ trials and simply buries those that don’t support this outcome.  This means that evidence that the drug is ineffective, or causes undesirable side-effects, or both, is simply destroyed.  The trials that support the desired message are cherry-picked, submitted for review and spun into bold claims by the company’s marketing teams.  This is a real thing, it is happening right now, and it’s perfectly legal.
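To put a number on how well this works, here’s a deliberately simplistic simulation of my own: a drug with zero real effect, twenty trials, and any trial that doesn’t flatter the drug gets buried:

```python
import random
import statistics

random.seed(3)
N_TRIALS = 20  # trials the company runs
N = 50         # patients per arm in each trial

def run_trial() -> float:
    """One trial of a drug with zero real effect; return the observed
    'improvement' of the treated arm over the control arm."""
    treated = [random.gauss(0, 1) for _ in range(N)]
    control = [random.gauss(0, 1) for _ in range(N)]
    return statistics.mean(treated) - statistics.mean(control)

effects = [run_trial() for _ in range(N_TRIALS)]
published = [e for e in effects if e > 0.15]  # bury anything unflattering

print(f"mean effect, ALL trials:       {statistics.mean(effects):+.3f}")
if published:
    print(f"mean effect, PUBLISHED trials: {statistics.mean(published):+.3f}")
print(f"trials buried: {N_TRIALS - len(published)} of {N_TRIALS}")
# Average only the surviving trials and you manufacture an effect out of nothing.
```

Across all twenty trials the effect averages out to roughly zero; across the ‘published’ ones it looks like a real, positive result.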

Don’t be afraid to have a third party run an audit on a trial or test that has been run on your behalf, and make sure that you have access to ALL of the data from the trial.

Selective data sampling

There’s a lovely analogy in the book that perfectly demonstrates this, so I won’t construct another.  Imagine a Christmas pudding with several coins in it.  You can estimate the number of coins in the pudding by taking a slice, counting the coins in that slice, and then extrapolating to the rest of the pudding.

You’re making an assumption about the distribution of coins in the pudding by doing this, and it’s important to be mindful of your assumptions, and to challenge them (and this goes for just about every aspect of life).  The assumption that you’re making is that the coins are uniformly distributed throughout the pudding.  In other words, that the slice you’ve taken is typical.

What if the guy who took the slice had an x-ray machine, and used knife-craft that would make a samurai blush to carve out a slice containing almost all of the coins in the pudding?  You’d extrapolate from this intentionally misrepresentative slice across the entire pudding and conclude that it contained a small fortune.
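The pudding translates directly into a sampling sketch (all numbers invented for the demo):

```python
import random

random.seed(4)
N_SLICES = 20  # the pudding is cut into 20 equal slices
COINS = 10     # coins actually hidden in the pudding

# Scatter the coins uniformly across the slices.
pudding = [0] * N_SLICES
for _ in range(COINS):
    pudding[random.randrange(N_SLICES)] += 1

honest_slice = pudding[random.randrange(N_SLICES)]  # picked blind
rigged_slice = max(pudding)                         # x-ray plus samurai knife

print(f"honest estimate: {honest_slice * N_SLICES} coins")
print(f"rigged estimate: {rigged_slice * N_SLICES} coins")
print(f"actual:          {COINS} coins")
# An extrapolation is only as honest as the slice it's built on.
```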

Again, don’t be afraid to have someone audit the methodologies of those that have run a trial for you.  Be prepared to ask some pretty detailed questions about data sampling methodologies.

Testing against something naff

Sticking with pharmaceuticals: in much the same way that results and data are cherry-picked, so is the straw man against which the drug is tested.  It’s amazing how often a claim like “5% better” is accepted without the obvious question of “better than what…” being asked.

Oftentimes, what the drug was actually better than was a placebo.  There may be a dozen drugs already on the market that are 10% better than placebo.  Naturally, this isn’t mentioned.
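A trivially small sketch makes the trick obvious (all three recovery rates are numbers I’ve made up for illustration):

```python
import random
import statistics

random.seed(5)
N = 1_000  # patients per arm

# Recovery rates assumed purely for illustration: placebo 40%, an existing
# drug 50% ("10% better"), and the new drug 45% ("5% better").
def recovery_rate(true_rate: float) -> float:
    return statistics.mean(random.random() < true_rate for _ in range(N))

placebo = recovery_rate(0.40)
existing = recovery_rate(0.50)
new_drug = recovery_rate(0.45)

print(f"new drug vs placebo:       {new_drug - placebo:+.1%}")   # the headline
print(f"new drug vs existing drug: {new_drug - existing:+.1%}")  # the small print
# "5% better" sells rather well when you leave out what it was better than.
```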

Ask questions, make sure you’re not having the results of a trial oversold to you.

Testing in a misrepresentative environment

The outsourcing of drug trials, often to developing nations and those with less stringent regulatory environments, is also a common practice.  Moral arguments aside, the assumption here is that the results from trials in these populations are applicable to the population in which the drug is to be released.

In the book, Ben asks whether a poor person suffering from depression in China will respond to medication in the same way as a wealthy Californian.  It’s a fair challenge, and one that applies equally to a marketing message tested in the north and applied in the south.

Ask questions, have someone else check their homework, and make sure that the results you’re being sold are applicable to the market in which you operate.

So there we go.  There are a lot of bad analysts and agencies out there, but I should close by mentioning that there’s also a fair share of bad clients.  We have been asked several times to apply dodgy practices to engineer a certain outcome, and our answer has always been, and will always be, a resounding “no”.

So, if you’re inclined towards the dark side, then please don’t contact us!