AI Futures: The dangers of data colonialism and the marginalized
The adage ‘data is the new oil’ is now a common refrain. Broadly, it refers to data as the fuel that drives the digital economy; more specifically, it refers to data as the fuel that drives the development of AI. Given data’s fundamental impact, it should be no surprise that its value is becoming an increasingly relevant factor in the pursuit of improved AI models. There is a constant search for the ‘right’ data at the ‘right’ price that can yield the most predictive value for training AI models. In the ‘data rush’ of the early 21st century, as in most resource rushes of previous centuries, there are winners and losers, and we have yet to see the full impact of this phenomenon.
Couldry and Mejias (2019) coined the term data colonialism, somewhat akin to historical colonialism, as a lens through which to recognise that data relations can lead to the exploitation of humans, with the potential for social discrimination, behavioural influence, and ultimately the marginalisation of economies. Just as in historical colonialism, the colonised are set to lose through an imbalance of value exchange as colonisers, whether corporations or nation states, seek to control data flows and landscapes, and ultimately to influence the global digital economy. This can be seen directly, for example, in the exploitation of workers (typically in developing economies) hired to process training data for machine learning algorithms. Arguments persist as to the level of exploitation (workers are, after all, compensated for their labour), but there is little doubt that the bulk of the value generated by the eventual algorithm is likely to accrue to the ‘data barons’. The digital divide that continues to prevent marginalised workers from seeking fairer rents is slowly becoming a digital abyss as technology, access, information, and education appear ever more out of reach.
It is worth noting that data colonisers are not restricted to particular parts of the world, and the data colonised and marginalised can reside anywhere across the globe, including in richer nation states. Today there is a near constant scramble for access to individuals’ data. Not everyone is equipped to understand how to control their personal data, to know what is being done with it, or to resist attempts to influence their behaviour. Despite legislators’ and regulators’ attempts to aid them, the value of data is such that colonisers will pursue it at significant cost.
Understanding who the marginalised are is key to ensuring that they are not only protected but also able to engage with and manage data in all aspects of their lives. This is not necessarily a new call to arms. Many, such as proponents of ICT for development (ICT4D), have long been calling for an increased focus on digital inclusion. The difference today is that digital inclusion needs to expand beyond access and education. There also needs to be consideration of a rebalancing of power to address the question of who controls data, and a more effective calculation of the value exchange in data flows. There are wider considerations too: limiting social discrimination and encouraging suitable representation in the datasets that affect individuals.
The danger is that if nothing is done, or what is done is inadequate, inequalities could spiral out of control. Whether we seek to encourage governments to control and protect data more directly, or to develop ethical standards and measures for AI companies and their technologies, the time for debate is here and the need for action is pressing.
References
Couldry, N. and Mejias, U.A., 2019. Data colonialism: Rethinking big data’s relation to the contemporary subject. Television & New Media, 20(4), pp.336-349.