Asking the right questions: big data and civil rights
by Anders Sandberg
Practical Ethics (University of Oxford)
Is this a civil rights issue?
Civil rights deal with ensuring free and equal citizenship in a liberal democratic state. This includes being able to adequately participate in public discussion and decision-making, making autonomous choices about how one’s life goes, and avoiding being discriminated against.
That big data analysis infers information about people does not itself affect civil rights: it is at most a privacy issue. It does not affect the moral independence of people. The real issue is how other agents act on this information: we likely do not mind that a computer somewhere knows our innermost secrets if we think it will never act on them or judge us. But if a person (or institution) can react to this information, then we might already experience chilling effects on freedom of thought or speech. And the act itself may be discriminatory in a wrongful way.
Discrimination is, however, a complex issue. Exactly what constitutes wrongful discrimination is shaped by complex social codes, sometimes wildly inconsistent. Just consider how churches are able to get away with discriminating against applicants who are non-believers or of the wrong sex (and possibly even on the basis of sexual preference) in a way that would be completely impossible for private companies, by claiming that on their views these traits are actually highly relevant (and hence the discrimination is not wrongful). Groups like sexual minorities and the disabled have gained protection from discrimination following vigorous debate and cultural change. If it is OK to select partners in person based on racial characteristics, should commercial online dating services provide such criteria or are they abetting racism? And so on. Just as it is not possible to decide beforehand what questions might produce discriminatory answers, it might not be easy to tell what behavior is discriminatory before it has been discussed publicly.
Big data analysis might help various forms of discrimination, but also expose it. No doubt more advocacy groups are going to be mining the activities of companies and states to show the biases inherent in the system.
One regulatory challenge with big data and big analytics is that, unlike what the nicknames suggest, they can be done on a small scale or in a distributed manner: while there are huge amounts of data out there, collection and analysis are not necessarily located at a few easily regulated major players. While Facebook, Google, Acxiom and the NSA might be orders of magnitude more powerful than small businesses or hobby projects, such projects can still harness enough data and ask problematic questions – especially since they can often piggyback on the infrastructure built by the giants.
A second challenge is that analyzing questions can be done silently and secretly. It can be nearly impossible to tell that an agent has inferred sensitive information and uses it. Sometimes active measures are taken to keep analyzed people in the dark, but in many cases the response to the questions can be invisible – nobody notices offers they do not get. And if these absent opportunities start following certain social patterns (for example, offers not being made to people of certain races, genders or sexual preferences) they can have a deep civil rights effect – just consider the case of education.
This opacity can occur inside the analyzing organization too. For example, training a machine learning algorithm to estimate the desirability of a loan applicant from available data might produce a system that “knows” the race of applicants and uses it to estimate their suitability (something that would be discriminatory if a human did it). The programmers did not tell it to do this and it might not even be transparent from the outside what is going on (conversely, getting an algorithm to not take race into account in order to follow legal restrictions might also be hard to implement: the algorithm will follow the data, not how we want it to “think”).
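A toy sketch can make the loan example concrete. Everything in it is an assumption made up for illustration: the records, the "postcode" feature, the group labels "x" and "y", and the simple per-postcode scoring rule are hypothetical and not drawn from any real lender or dataset. The point is only that a model which never sees the protected attribute can still reproduce it through a correlated feature.

```python
from collections import defaultdict

# Hypothetical, hand-made records: (postcode, group, repaid_loan).
# "group" stands in for a protected attribute; the model never uses it.
HISTORY = [
    ("A", "x", True), ("A", "x", True), ("A", "x", False), ("A", "y", True),
    ("B", "y", False), ("B", "y", False), ("B", "y", True), ("B", "x", False),
]

def train(records):
    """Learn a repayment rate per postcode, ignoring the group column."""
    totals = defaultdict(lambda: [0, 0])  # postcode -> [repaid, seen]
    for postcode, _group, repaid in records:
        totals[postcode][0] += int(repaid)
        totals[postcode][1] += 1
    return {pc: repaid / seen for pc, (repaid, seen) in totals.items()}

def approve(model, postcode, threshold=0.5):
    """Approve when the learned repayment rate reaches the threshold."""
    return model.get(postcode, 0.0) >= threshold

def approval_rate_by_group(model, records):
    """Audit step: how often would each group have been approved?"""
    counts = defaultdict(lambda: [0, 0])  # group -> [approved, seen]
    for postcode, group, _repaid in records:
        counts[group][0] += int(approve(model, postcode))
        counts[group][1] += 1
    return {g: approved / seen for g, (approved, seen) in counts.items()}

model = train(HISTORY)
rates = approval_rate_by_group(model, HISTORY)
print(rates)
```

Because group membership in this made-up data is strongly tied to postcode, the two groups end up with very different approval rates even though the group column was never a feature. Dropping the sensitive column is therefore not enough to make the outcome neutral, and the disparity only becomes visible when someone runs the audit step.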
A third challenge is that the growth of this infrastructure is not just supported by business interests and government snoops, but by most consumers. We want personalization, even though that means we enter our preferences into various systems. We want ease of use and self-documentation, even though that means we carry smart devices and software that monitor us and our habits. We want self-expression, even though that places our self in the world of data.
The fourth challenge is that which questions are problematic is ill-defined. It is not implausible that there exist groups that might be discriminated against on the basis of data mining that are not known as socially salient groups, or that apparently innocuous questions turn out to reveal sensitive information when investigated. This cannot be predicted beforehand.
These challenges suggest that public regulation will not be able to effectively enforce formal rules. Transgressions can occur silently, anywhere and in ways not covered by the rules.
Where does this leave us?
Croll suggests that we should link our data with how it can be used. While technological solutions to this might sometimes be possible, and some standards like Creative Commons licenses are being spread, he thinks – and I agree – that the real monitoring and enforcement will be social, legal and ethical.