Perpetuating Data Hazards

Thoughts on data hazards.

By Zoë Turner

September 27, 2022

I once worked as a temporary administrator for the local fire service, a job that involved logging queries from the public. One such query came from a person who asked for the sign on their boundary wall, a black capital H on a yellow background, to be removed because it was unsightly. What that person didn’t realise, and nor did I until that moment, was that these signs, which are everywhere on UK streets, mark the locations of fire hydrants. In US films the hydrant sits above ground, is often red and has a distinctive shape. In the UK it is marked by a small sign giving the hydrant's distance from the plaque in metres and the size of the main that feeds it in millimetres.

These signs are so familiar that I never questioned them; a bit like a data hazard.

What is a data hazard?

Symbols are everywhere, not least in the world of hazards. In the UK we have COSHH (Control of Substances Hazardous to Health), whose hazards are succinctly conveyed by symbols on products such as bottles of bleach to ensure awareness of a potential hazard. People may not know all of these symbols, but their overall shape, location and colour (a black symbol inside a red diamond outline) catch people’s attention, and we might vaguely recognise “hazard” even if we don’t know its precise nature. The creators of the Data Hazards project, whose labels follow the imagery of COSHH, say the images are:

Attention-grabbing, asking people to stop and think, and take the safety precautions seriously, rather than as an optional extra.

We’re asking people to “handle with care”, not to stop doing the work. We still use chemicals, but we think about how it can be done safely and how to avoid emergencies.

They are familiar, especially to scientists, who (within universities) tend to have the least experience of applying ethics.

https://datahazards.com/index.html

This idea of highlighting hazards is wonderful; it really makes you consider the potential problems, even those that may be very remote risks. For data, this could extend beyond what you do with the data to, for example, what others could feasibly do with it. This inevitably leads to considering possible precautions, so that must be a good thing, right?

Perpetuating risk

The problem, if it can be called that, with looking at your data tasks through a “Data Hazards” lens is that you may suddenly see things you hadn’t noticed before because they were simply accepted. Like the fire hydrant sign, it’s always been there and no one questioned it, until one day someone asked for it to be removed.

What do you do when you suddenly realise that something you’ve taken for granted is actually hugely risky, and it has never been documented or had precautions put in place? Well, this is where moral dilemmas come into their own. Do you:

  • keep quiet - after all this has always happened and nothing bad has occurred (so far and insofar as you know),
  • keep quiet - as you may not be senior enough to do anything anyway,
  • keep quiet - if you are senior, as this could be very embarrassing,
  • try to introduce the idea of data hazards so people understand them first and then reassess what to do or
  • just go ahead and raise the alarms?

The answer will depend on a number of factors, not all of them about you as a person (psychological safety is a huge factor), but you can already see there are really only three options: inaction, introducing data hazards (then perhaps inaction, hoping someone else will do something) or raising the issue somehow.

One way to raise this formally is to log it as an incident or a near-miss, but that process can feel frightening or over the top for remote hypotheticals. Fear is a big consideration, and it is one of the reasons I use incident data cautiously in analysis, as raising an incident is a process in itself. An apparent increase in violent incidents, for example, could simply reflect an increase in the recording of incidents that were always there.
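To make the recording point concrete, here is a minimal Python sketch using invented numbers (none of them come from real incident data). It assumes, as a simplification, that the number of recorded incidents is the true number multiplied by the share of incidents people actually report; `recorded_incidents` and `pct_change` are hypothetical helper names.

```python
# A minimal sketch with invented numbers (not real incident data).
# Simplifying assumption: recorded incidents = true incidents x share that gets reported.

def recorded_incidents(true_incidents: int, reported_pct: int) -> int:
    """Expected number of incidents that are formally recorded (hypothetical model)."""
    return true_incidents * reported_pct // 100


def pct_change(before: int, after: int) -> float:
    """Percentage change from `before` to `after`."""
    return 100 * (after - before) / before


# Hypothetical: the underlying number of incidents is identical in both years...
true_counts = {"year_1": 200, "year_2": 200}
# ...but a larger share is reported in year 2, e.g. because raising an incident feels less daunting.
reported_pcts = {"year_1": 40, "year_2": 70}

recorded = {
    year: recorded_incidents(true_counts[year], reported_pcts[year])
    for year in true_counts
}

true_change = pct_change(true_counts["year_1"], true_counts["year_2"])
recorded_change = pct_change(recorded["year_1"], recorded["year_2"])

print(recorded)                                    # {'year_1': 80, 'year_2': 140}
print(f"True change: {true_change:.0f}%")          # True change: 0%
print(f"Recorded change: {recorded_change:.0f}%")  # Recorded change: 75%
```

In this toy model the true count is flat, yet the recorded count rises by 75% purely because reporting became more common, which is one reason to ask how recording practice changed before reading a trend in incident data.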

I think we all know the answer to many moral dilemmas, but often only in the abstract. Inaction in the face of massive change can feel easier, if uncomfortable. But think back to the hydrant sign: the person may have felt a bit embarrassed when they were told its purpose, yet their question enlightened me. They didn’t know I would learn from their request, but I did, and now I’m telling you (you may have already known, but it seems not everyone does: “People in disbelief after finding out the meaning of the ‘H’ signs on lampposts” [Daily Record] 6 Sep 2022). What mechanisms do we have, as analysts and data scientists, for raising and sharing these data ethics dilemmas? Could ethics committees support us? After all, this data is often open to abuse and can harm patients. I don’t know the answers; I just know that, at this point in time, I’m not sure we have any options to choose from.

NHS Digital have a page on clinical risk that includes a hazard log as an Excel spreadsheet.

The Data Hazards website is a great resource to explore and is open to contributions and reuse.

Image

Photo by Nathan Cowley: https://www.pexels.com/photo/pink-flowers-photography-1128797/