Big Data: A fine product of the 'petabyte' revolution?

Abstract

Data is everywhere and in everything, and it is growing at a rapid pace. It is said that nearly 90% of the data we see today was created in just the last two years. A figure such as 4.4 zettabytes blows our minds because we cannot make sense of it: human brains are simply incapable of processing and analysing such vast quantities of data. In this essay, three relatively newly coined terms – Datafication, Dataism and Dataveillance – show three dimensions of Big Data. Whether Big Data is a fine product of the petabyte revolution is discussed through each of these aspects.

Introduction
Scientific Paradigm VS Ideology
Datafication
Dataism
Dataveillance
Conclusion

Introduction

Data is everywhere and in everything; it’s all around us and it’s growing at a rapid pace. Steve Lohr, an expert in the field of technology, suggests that nearly 90% of all the data in history was created in the last two years. According to the International Data Corporation, the data universe is estimated at 4.4 zettabytes, or 4.4 trillion gigabytes. This phenomenon gave rise to the relatively newly coined term ‘Big Data’ (Lohr, 2015).
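The equivalence between 4.4 zettabytes and 4.4 trillion gigabytes can be checked with a few lines of arithmetic. The sketch below assumes decimal (SI) prefixes, where each step up is a factor of 1,000, and a "trillion" of 10^12; the function and constant names are illustrative only.

```python
from fractions import Fraction

# Decimal (SI) unit sizes in bytes. Assumption: the quoted figure uses SI
# prefixes rather than binary ones (zebibytes, gibibytes).
BYTES_PER_GIGABYTE = 10**9
BYTES_PER_PETABYTE = 10**15
BYTES_PER_ZETTABYTE = 10**21


def convert_zettabytes(size_zb: str, bytes_per_unit: int) -> Fraction:
    """Convert a size given in zettabytes into another unit, exactly."""
    return Fraction(size_zb) * BYTES_PER_ZETTABYTE / bytes_per_unit


gigabytes = convert_zettabytes("4.4", BYTES_PER_GIGABYTE)
petabytes = convert_zettabytes("4.4", BYTES_PER_PETABYTE)

print(f"{int(gigabytes):,} gigabytes")
print(f"{int(petabytes):,} petabytes")
```

On these assumptions the data universe also amounts to 4.4 million petabytes, which gives a sense of scale for the 'petabyte' revolution in the title.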

So what exactly is data? In simple terms, data are sets of values of qualitative or quantitative variables that are measured, collected, reported and analysed on a continuous basis. But raw data is unfathomable, undefinable, unaccountable, and unintelligible. It becomes somewhat fathomable, definable, accountable, and intelligible when we work on it, transforming raw data into action-guiding wisdom through collecting, organising, summarising, analysing, synthesising and, lastly, decision making (Mohanty, Jagadeesh, & Srivatsa, 2013).

Having the data is of no use by itself, especially in a business context. Businesses are increasingly eager to learn more about their markets, customers, suppliers, operations, and networks. They would like to make sense of the data collected in every part of their business processes, with the end goal of gaining greater market share and earning greater profits. With an ever-increasing amount of data, companies face design and architectural challenges in harnessing the power of the data they have collected using various existing systems, including Big Data Analytics, Data Warehousing, and Business Intelligence (Ebner, Buhnen, & Urbach, 2014).

In this essay, I am going to take you through the changing perceptions of Big Data by comparing scientific paradigm with ideology. To explain this phenomenon, we will look at the roles Big Data plays through three relatively newly coined terms – Datafication, Dataism, and Dataveillance. I will conclude by briefly discussing how these things are constantly shaping the world we live in today and their impact at the personal and corporate level. By the end of this essay, the reader should have a greater understanding of Big Data and be able to answer the question of whether it is a fine product of the petabyte revolution.

Scientific Paradigm VS Ideology

This essay examines the extent to which scientific paradigm and ideology act in sync, and the fine line that separates them where data is concerned. To explain this, Van Dijck uses three relatively newly coined words – Datafication, Dataism, and Dataveillance (Van Dijck, 2014). The world was taken by surprise quite recently when Edward Snowden, the now-famous whistleblower, released sensitive data affecting everyone from the biggest government agencies to small corporations. It goes to show the power of data and the seriousness of its impact on the world today (Van Dijck, 2014).

Part of his disclosure was a detailed description of an “architecture of oppression” whereby N.S.A. contractors could intercept well over three billion phone calls and interactions recorded by the biggest companies, such as Facebook, Google, Apple, and others. When such distressing details are released, a lot of publicity and awareness is generated around the power we hand over to tech giants and their invasive tactics for learning what we do and why we do it. This always ends up sparking public debate on privacy and calls for stricter privacy laws, but the inevitable question is how far our “laws” can protect us from this rapidly growing technology (Van Dijck, 2014).

Not much, by the looks of it. Mayer-Schönberger and Cukier discuss making sense of big data in a complex world as a scientific paradigm (Van Dijck, 2014). As Snowden’s documents revealed, much of the public has placed its faith in institutions that handle its (meta)data, in the strong belief that the data will be protected at any cost and not necessarily used to study the subjects involved. This study of subjects is closely related to Datafication, which is of growing importance in the world today. It is said that every institution that holds a good amount of data is most likely trying to make sense of it in order to serve its subjects better. It’s happening all around the world now (Van Dijck, 2014).

Datafication

Lycett, in his article on Datafication, illustrates it nicely through “three innovative concepts” – Dematerialisation, Liquefaction, and Density (Lycett, 2013). Dematerialisation, simply put, is the ability to separate the informational aspect of an asset or resource from its use in context in the physical world. A good example of this is one of the most popular on-demand internet streaming services today, Netflix. In the United States, its disc rental model has standardised and structured metadata around the content of each disc, alongside other data such as the subscriber queue. One might wonder what exactly metadata is. In simple terms, metadata is data that describes other data. And this is precisely what is exposed through the disc rental model of Netflix (Bloch, Cox, Craven McGinty, & Quealy, 2009).
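To make the idea of metadata concrete, here is a minimal sketch of a disc catalogue in which only the descriptions of the discs are stored. The field names, film titles and queue are hypothetical and chosen for illustration; they are not Netflix's actual data model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DiscMetadata:
    """Data that describes a disc, kept separate from the disc itself."""
    title: str
    genre: str
    director: str
    runtime_minutes: int


# Hypothetical catalogue: only the descriptions, no film content at all.
catalogue = [
    DiscMetadata("Example Film A", "thriller", "Director One", 112),
    DiscMetadata("Example Film B", "comedy", "Director Two", 95),
    DiscMetadata("Example Film C", "thriller", "Director Three", 128),
]

# A hypothetical subscriber queue refers to discs by title only.
subscriber_queue = ["Example Film C", "Example Film B"]


def titles_in_genre(items, genre):
    """Return the titles of all catalogue entries in the given genre."""
    return [item.title for item in items if item.genre == genre]


def queued_genres(items, queue):
    """Look up the genre of each queued title using metadata alone."""
    genre_by_title = {item.title: item.genre for item in items}
    return [genre_by_title[title] for title in queue]


thrillers = titles_in_genre(catalogue, "thriller")
queue_genres = queued_genres(catalogue, subscriber_queue)
print(thrillers)
print(queue_genres)
```

Nothing in the sketch touches a physical disc: the questions are answered from the descriptions alone, which is the sense in which the informational part of the asset has been dematerialised.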

The second concept that Lycett discusses is termed Liquefaction (Lycett, 2013). The idea is that once dematerialisation has happened, the information can be easily manipulated and moved around. This allows resources and activity sets that were physically close to be unbundled and rebundled in ways that would have been difficult, time-consuming and expensive through traditional methods (Lycett, 2013). In the example of Netflix, liquefaction is present in the increasing pervasiveness of recommendation in the streaming model. It helps provider and subscriber interact in a highly dynamic way that benefits both: subscribers can watch what they want to, and Netflix can keep them subscribed for longer (Bloch, Cox, Craven McGinty, & Quealy, 2009).

The third concept that Lycett relates to the others is Density (Lycett, 2013). It is described as a combination of the two concepts discussed earlier: the process of value creation from dematerialisation and liquefaction combined. Going back to the example of Netflix, its dematerialised service apparently draws on close to 30 million daily plays and some 3 million searches to power its recommendations (Bloch, Cox, Craven McGinty, & Quealy, 2009). With both concepts combined, Netflix was able to harness the power of value creation through the use of data. Netflix recently moved from streaming content, which is what it is good at, to producing it. Through thorough statistical analysis, it was able to work out the combination of genre, actors and directors that is in demand. This is seen in its production of the television series House of Cards, a political thriller (Lycett, 2013).

As the examples above suggest, Datafication doesn’t operate on business processes alone. Wladawsky-Berger summarises how digital technologies are creating disruptions in established industries in his article “How Datafication Will Redefine Business and Society” (Wladawsky-Berger, 2015). He says the rise of Big Data is increasingly, and some would say adversely, affecting four different areas of day-to-day life: our personal behaviour, business processes, cities, and private lives. He describes how the datafication of all four areas is working in sync to disrupt even the most established industries by changing the way we do business with one another and how we manage our companies, lives and cities. He concludes that its effect is bigger than has been realised so far, from changing the techniques used in the scientific method to changing how we measure our economy and its structure (Wladawsky-Berger, 2015).

Dataism

DeWitt, in his article “Dataism”, defines it as a term coined to label computer art: a form of art that “restates traditional aesthetics through formal practices” (DeWitt, 1989). He labels its creators “Dataists”, similar to artists, though their works differ. Dataists create work that is not a singular piece of art but a set of algorithmic procedures within digital datasets that have a symbolic description. In other words, their works can be accurately duplicated and widely distributed. Lohr goes a step further in explaining this relatively newly coined term. He presents it as a revolution that is transforming the way institutions make choices, the way consumers behave, and almost everything else in between. He discusses at length how Big Data allows us to improve on existing methods, analyse data of all types, and make sense of it all through powerful algorithms and machine learning software. Another dimension he takes us through is Dataism as a philosophical view: one that calls for decisions to be made on the advice of complex systemic analysis by an intelligent system (Lohr, 2015).

Data-ism is described as the next phase in technology, in which the huge data sets we acquire are put to good use. These large data sets enable people to make decisions that are timely and relevant. The adverse effect of uninformed decision making is evident in the unsuccessful businesses of recent times. Their failure to adopt newer technologies for analysing complex data sets has put them far behind their competitors, eventually draining the life-blood of the business or industry.

The need for constant innovation and ongoing evaluation has gained prominence in this rapidly changing, technology-oriented world. A business without the urge to stay one step ahead with innovative products and services is said to be digging its own grave. Businesses that have operated for some time, and have acquired data sets along the way, have a substantial advantage over those that are relatively new to the industry. One can say this owing to Lohr’s discussion of how the next phase involves businesses making use of their large datasets for discovery and prediction in virtually every field and industry (Lohr, 2015).

Data science is now at the forefront of small and large businesses as they place their bets on its predictive ability. Giants such as IBM are already investing heavily in platforms and architecture that are bound to affect their future in the industry. Data-ism is best summed up as a way of thinking about the future. Institutions and individuals are bound to exploit, protect and manage their data to stay competitive in the years ahead. Big Data, which in itself describes the process of collecting huge amounts of data from all consumers, is affecting our everyday lives in very real ways (DeWitt, 1989). It raises questions that our current law does not answer. It makes us question the true transparency of the companies we trust with our data, and the practices they have in place, which are likely to have significant implications for everyone living in this century and in the centuries to come (Lohr, 2015).

The real question now is whether we are ready to manage its consequences effectively, and whether it is doing more good than bad. That’s a question we might ponder once we fully understand where we are headed and how our lives are affected by the unintended consequences of decisions made by intelligent systems or, as we might one day refer to them, intelligent beings (Van Dijck, 2014). Lohr’s discussion of this revolutionary phase makes us question a lot of what we have in place now, from how we use computers to what we use them for.

Dataveillance

Clarke (1988) defines Dataveillance as “systematic use of personal data systems in the investigation or monitoring of the actions or communications of one or more persons”. This hasn’t changed much since he defined it. He also details how the term is sub-divided into personal surveillance and mass surveillance, each having its own meaning within a given context. Personal surveillance is described as an activity performed with a positive intention in mind, concentrated on an individual. Most of the reasoning behind it centres on fighting “social evils” such as terrorism (Clarke, 1988). Mass surveillance, on the other hand, is a monitoring activity that involves a large number of subjects. Its essential idea is to identify those who match specific criteria of interest to a given surveillance institution.

The article also describes the techniques used in each, and I believe discussing these enhances our understanding of how Dataveillance affects our lives. Personal Dataveillance techniques include integration of data, screening or authentication of transactions, front-end verification of transactions, front-end audit of individuals, and cross-system enforcement against individuals. Mass Dataveillance techniques cover a wider scope. They include screening or authentication of all transactions, front-end verification of all transactions, front-end audit of each, single-factor file analysis of all data held or able to be acquired, and profiling, or multi-factor file analysis, of all data held or able to be acquired (Clarke, 1988).

Going back to Dataveillance as a subject in itself, we can look at what it essentially disrupts: the rather wide subject of privacy. Privacy is defined as the “interest that individuals have in sustaining a ‘personal space’, free from interference by other people and organisations” (Clarke, 1988). The protection of privacy is closely tied to achieving a balance between what we believe is our right to a set of data and the competing interests of other parties in that very same set of data. One excellent example of this balance breaking down is the infamous Netflix case. Netflix faced heavy penalties when it was found to have held data on people’s viewing habits even after their accounts had been deactivated or deleted. It goes to show that what consumers believed was their right to their data and transactions conflicted with the interests of other parties. In this case, it was Netflix and its related products that made use of this data (Rick, 2011).

With so many conflicting rights to privacy, an article by Amoore and De Goede examines how these risks can be managed in the context of the war on terror. Part of the discussion is an introduction to terrorist risk, which describes how terrorists select their targets and the chances of disrupting a terrorist network. It also considers a risk-based approach and looks more deeply into using data to analyse practices that point towards terrorist financing (Amoore & De Goede, 2005). Carrying on from the idea that data can be harnessed to tap into terrorist networks and uncover their activities to some extent, Ashworth and Free explain the other side of the coin: Marketing Dataveillance and its impact on privacy for everyday consumers.

Their article explores how the technology used in online marketing platforms has a collective and collaborative ability to gather, enhance and aggregate information almost instantaneously. This ability brings with it privacy concerns for online consumers. Using theories of justice, the article looks into how consumers perceive and react to privacy concerns in a digital environment.

Following this, they use the information collected to advise firms and regulators that a consumer responds to a breach of privacy in much the same way as to an unfair exchange of value (Ashworth & Free, 2006). Dataveillance can be summed up by what Clarke says regarding computer matching, that it is a “mass surveillance technique involving the comparison of data about many people which have been acquired from multiple sources” (Clarke, 1994). While it offers considerable financial savings as its biggest benefit, the cost is equally considerable. “Dataveillance is, by its very nature, intrusive and threatening” (Clarke, 1988).
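Computer matching, as Clarke describes it, can be pictured with a small sketch: records about the same people, held by two different sources, are joined on a shared identifier and compared. The sources, field names and figures below are invented for illustration and do not represent any real agency's system.

```python
def match_records(source_a, source_b, key="person_id"):
    """Pair up records from two sources that share the same identifier."""
    index_b = {record[key]: record for record in source_b}
    matches = []
    for record_a in source_a:
        record_b = index_b.get(record_a[key])
        if record_b is not None:
            matches.append((record_a, record_b))
    return matches


def flag_discrepancies(matches, field, key="person_id"):
    """Return identifiers of people whose two records disagree on a field."""
    return [a[key] for a, b in matches if a[field] != b[field]]


# Hypothetical data held by two separate organisations.
employer_records = [
    {"person_id": 1, "declared_income": 30000},
    {"person_id": 2, "declared_income": 52000},
    {"person_id": 3, "declared_income": 41000},
]
benefit_records = [
    {"person_id": 2, "declared_income": 0},
    {"person_id": 3, "declared_income": 41000},
    {"person_id": 4, "declared_income": 18000},
]

matched = match_records(employer_records, benefit_records)
flagged = flag_discrepancies(matched, "declared_income")
print(flagged)
```

Every person who appears in both sources is compared, whether or not there was any prior reason to suspect them, which is why the technique counts as mass rather than personal surveillance.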

Conclusion

Data is everywhere, and it’s really in everything. From waking up every day to going back to bed, we output a rapidly increasing amount of data to the many sources we encounter. This aggregation of data raises questions about its potential use and its very real destructive and disruptive power. Looking at Big Data as a scientific paradigm and, in the opposing view, as an ideology, we were able to see the fine line that separates the two where data is concerned (Van Dijck, 2014).

A greater understanding of this phenomenon was gained through the relatively newly coined terms that Van Dijck discusses – Datafication, Dataism, and Dataveillance. Datafication involved looking at metadata and how something can be dematerialised and liquefied, which in turn adds density (Lycett, 2013). This was illustrated through the Netflix example and its disc rental model (Bloch, Cox, Craven McGinty, & Quealy, 2009). Dataism looks into a different realm of using Big Data. It’s said to transform everything from how institutions make choices to how consumers behave.

Decisions now rely more heavily on complex algorithms that help us decide in today’s fast-moving environments (Lohr, 2015). Dataveillance draws both concepts together by considering how the use of dematerialised data and of complex algorithms only compounds the problem of privacy. It is summed up in Clarke’s statement that something can offer significant financial savings and yet be just as destructive if the right constraints aren’t in place. The question we can hopefully answer by the end of this essay is whether Big Data is a fine product of the petabyte revolution or a vicious wolf disguised as a harmless sheep.

References

Amoore, L., & De Goede, M. (2005). Governance, risk and dataveillance in the war on terror. Crime, Law and Social Change, 43(2-3), 149-173. doi:10.1007/s10611-005-1717-8

Ashworth, L., & Free, C. (2006). Marketing Dataveillance and Digital Privacy: Using Theories of Justice to Understand Consumers’ Online Privacy Concerns. Journal of Business Ethics, 67(2), 107-123. doi:10.1007/s10551-006-9007-7

Beynon-Davies, P., Galliers, R., & Sauer, C. (2009). Business information systems. Basingstoke, Hampshire: Palgrave Macmillan.

Bloch, M., Cox, A., Craven McGinty, J., & Quealy, K. (2009). A Peek into Netflix Queues. Retrieved April 15, 2016, from http://www.nytimes.com/interactive/2010/01/10/nyregion/20100110-netflix-map.html

Clarke, R. (1988). Information technology and dataveillance. Communications of the ACM, 31(5), 498-512. doi:10.1145/42411.42413

Clarke, R. (1994). Dataveillance by Governments. Information Technology & People, 7(2), 46-85. doi:10.1108/09593849410074070

DeWitt, T. (1989). Dataism. Leonardo. Supplemental Issue, 2, 57. doi:10.2307/1557946

Ebner, K., Buhnen, T., & Urbach, N. (2014). Think Big with Big Data: Identifying Suitable Big Data Strategies in Corporate Environments. 2014 47th Hawaii International Conference on System Sciences. doi:10.1109/hicss.2014.466

Lohr, S. (2015). Data-ism: The revolution transforming decision making, consumer behavior, and almost everything else. New York (NY): Harper Business.

Lycett, M. (2013). ‘Datafication’: Making sense of (big) data in a complex world. European Journal of Information Systems, 22(4), 381-386.

Mayer-Schönberger, V., & Cukier, K. (2013). Big data: A revolution that will transform how we live, work, and think. Boston: Houghton Mifflin Harcourt.

Mohanty, S., Jagadeesh, M., & Srivatsa, H. (2013). Big data imperatives: Enterprise big data warehouse, BI implementations and analytics. New York: Apress.

Rick, C. (2011, October 21). Netflix To Attack Privacy Law As Unconstitutional, Raises Further Privacy Issues. Retrieved May 12, 2016, from http://www.reelseo.com/netflix-privacy-law/

Van Dijck, J. (2014). Datafication, dataism and dataveillance: Big data between scientific paradigm and ideology. Surveillance & Society, 12(2), 197-208. Retrieved from http://ezproxy.auckland.ac.nz/login?url=http://search.proquest.com/docview/1547988865?accountid=8424

Wladawsky-Berger, I. (2015, June 12). How Datafication Will Redefine Business and Society. Retrieved April 15, 2016, from http://blogs.wsj.com/cio/2015/06/12/how-datafication-will-redefine-business-and-society/

AUTHOR
Roshan Roy Jonnalagadda – University of Auckland – ISOM Student