Big Data

Mayer-Schönberger, V.; Cukier, K. (2013). Big Data: A Revolution That Will Transform How We Live, Work, and Think. Houghton Mifflin. (ePub)

Highlights

Rui Alexandre Grácio [2024]

“There was a shift in mindset about how data could be used.
Data was no longer regarded as static or stale, whose usefulness was finished once the purpose for which it was collected was achieved, such as after the plane landed (or in Google’s case, once a search query had been processed). Rather, data became a raw material of business, a vital economic input, used to create a new form of economic value. In fact, with the right mindset, data can be cleverly reused to become a fountain of innovation and new services. The data can reveal secrets to those with the humility, the willingness, and the tools to listen.

Letting the data speak
 
The fruits of the information society are easy to see, with a cellphone in every pocket, a computer in every backpack, and big information technology systems in back offices everywhere. But less noticeable is the information itself. Half a century after computers entered mainstream society, the data has begun to accumulate to the point where something new and special is taking place. Not only is the world awash with more information than ever before, but that information is growing faster. The change of scale has led to a change of state. The quantitative change has led to a qualitative one. The sciences like astronomy and genomics, which first experienced the explosion in the 2000s, coined the term “big data.” The concept is now migrating to all areas of human endeavor.”

“One way to think about the issue today—and the way we do in the book—is this: big data refers to things one can do at a large scale that cannot be done at a smaller one, to extract new insights or create new forms of value, in ways that change markets, organizations, the relationship between citizens and governments, and more.
But this is just the start. The era of big data challenges the way we live and interact with the world. Most strikingly, society will need to shed some of its obsession for causality in exchange for simple correlations: not knowing why but only what. This overturns centuries of established practices and challenges our most basic understanding of how to make decisions and comprehend reality.”

“At its core, big data is about predictions.”

“Just as the Internet radically changed the world by adding communications to computers, so too will big data change fundamental aspects of life by giving it a quantitative dimension it never had before.”

“Often, big data is messy, varies in quality, and is distributed among countless servers around the world. With big data, we’ll often be satisfied with a sense of general direction rather than knowing a phenomenon down to the inch, the penny, the atom. We don’t give up on exactitude entirely; we only give up our devotion to it. What we lose in accuracy at the micro level we gain in insight at the macro level.
These two shifts lead to a third change, which we explain in Chapter Four: a move away from the age-old search for causality. As humans we have been conditioned to look for causes, even though searching for causality is often difficult and may lead us down the wrong paths. In a big-data world, by contrast, we won’t have to be fixated on causality; instead we can discover patterns and correlations in the data that offer us novel and invaluable insights. The correlations may not tell us precisely why something is happening, but they alert us that it is happening.”

“Big data is about what, not why. We don’t always need to know the cause of a phenomenon; rather, we can let data speak for itself.”

“There’s no good term to describe what’s taking place now, but one that helps frame the changes is datafication, a concept that we introduce in Chapter Five. It refers to taking information about all things under the sun—including ones we never used to think of as information at all, such as a person’s location, the vibrations of an engine, or the stress on a bridge—and transforming it into a data format to make it quantified. This allows us to use the information in new ways, such as in predictive analysis: detecting that an engine is prone to a break-down based on the heat or vibrations that it produces. As a result, we can unlock the implicit, latent value of the information.”

“It leads to an ethical consideration of the role of free will versus the dictatorship of data.”

“We’re entering a world of constant data-driven predictions where we may not be able to explain the reasons behind our decisions.”

“Big data marks an important step in humankind’s quest to quantify and understand the world. A preponderance of things that could never be measured, stored, analyzed, and shared before is becoming datafied.”

“As the world shifts from causation to correlation, how can we pragmatically move forward without undermining the very foundations of society, humanity, and progress based on reason? This book intends to explain where we are, trace how we got here, and offer an urgently needed guide to the benefits and dangers that lie ahead.”

“We tend to think of statistical sampling as some sort of immutable bedrock, like the principles of geometry or the laws of gravity. But the concept is less than a century old, and it was developed to solve a particular problem at a particular moment in time under specific technological constraints. Those constraints no longer exist to the same extent. Reaching for a random sample in the age of big data is like clutching at a horse whip in the era of the motor car. We can still use sampling in certain contexts, but it need not—and will not—be the predominant way we analyze large datasets. Increasingly, we will aim to go for it all.”

“USING ALL AVAILABLE DATA is feasible in an increasing number of contexts. But it comes at a cost. Increasing the volume opens the door to inexactitude.”

“For much of history, humankind’s highest achievements arose from conquering the world by measuring it. The quest for exactitude began in Europe in the middle of the thirteenth century, when astronomers and scholars took on the ever more precise quantification of time and space—“the measure of reality,” in the words of the historian Alfred Crosby.
If one could measure a phenomenon, the implicit belief was, one could understand it. Later, measurement was tied to the scientific method of observation and explanation: the ability to quantify, record, and present reproducible results. “To measure is to know,” pronounced Lord Kelvin. It became a basis of authority. “Knowledge is power,” instructed Francis Bacon. In parallel, mathematicians, and what later became actuaries and accountants, developed methods that made possible the accurate collection, recording, and management of data.”

“Moving into a world of big data will require us to change our thinking about the merits of exactitude. To apply the conventional mindset of measurement to the digital, connected world of the twenty-first century is to miss a crucial point. As mentioned earlier, the obsession with exactness is an artifact of the information-deprived analog era. When data was sparse, every data point was critical, and thus great care was taken to avoid letting any point bias the analysis.
Today we don’t live in such an information-starved situation. In dealing with ever more comprehensive datasets, which capture not just a small sliver of the phenomenon at hand but much more or all of it, we no longer need to worry so much about individual data points biasing the overall analysis. Rather than aiming to stamp out every bit of inexactitude at increasingly high cost, we are calculating with messiness in mind.”
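
Note: one way to see why volume can outweigh per-reading precision is the standard error of a sample mean. The sketch below is an illustration added to these highlights, not something taken from the book. The function name and all numbers are hypothetical, and it assumes simple random sampling with measurement noise that is independent and zero-mean (a systematic bias would not average out this way).

```python
import math


def standard_error(population_sd: float, noise_sd: float, n: int) -> float:
    """Standard error of a sample mean over n readings.

    Assumes each reading is the true value plus independent, zero-mean
    measurement noise, so the two variances simply add.
    """
    return math.sqrt(population_sd**2 + noise_sd**2) / math.sqrt(n)


# Hypothetical scenario: 30 carefully measured points vs. 100,000 noisy ones.
small_exact = standard_error(population_sd=10.0, noise_sd=0.0, n=30)
large_messy = standard_error(population_sd=10.0, noise_sd=5.0, n=100_000)

print(f"small, exact sample : {small_exact:.3f}")
print(f"large, messy dataset: {large_messy:.3f}")
```

Under these assumptions the small, exact sample has a standard error of roughly 1.83, while the large, messy dataset comes in at roughly 0.035, even though each of its readings is individually less accurate.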

“When the quantity of data is vastly larger and is of a new type, exactitude in some cases is no longer the goal so long as we can divine the general trend. Moving to a large scale changes not only the expectations of precision but the practical ability to achieve exactitude. Though it may seem counterintuitive at first, treating data as something imperfect and imprecise lets us make superior forecasts, and thus understand our world better.”

“But the idea of “a single version of the truth” is doing an about-face. We are beginning to realize not only that it may be impossible for a single version of the truth to exist, but also that its pursuit is a distraction. To reap the benefits of harnessing data at scale, we have to accept messiness as par for the course, not as something we should try to eliminate.”

“As radical a transformation as these shifts in mindset are, they lead to a third change that has the potential to upend an even more fundamental convention on which society is based: the idea of understanding the reasons behind all that happens. Instead, as the next chapter will explain, finding associations in data and acting on them may often be good enough.”

“Knowing why might be pleasant, but it’s unimportant for stimulating sales. Knowing what, however, drives clicks. This insight has the power to reshape many industries, not just e-commerce. Salespeople in all sectors have long been told that they need to understand what makes customers tick, to grasp the reasons behind their decisions. Professional skills and years of experience have been highly valued. Big data shows that there is another, in some ways more pragmatic approach. Amazon’s innovative recommendation systems teased out valuable correlations without knowing the underlying causes. Knowing what, not why, is good enough.”

“Of course, correlations cannot foretell the future, they can only predict it with a certain likelihood. But that ability is extremely valuable.”

“Predictions based on correlations lie at the heart of big data.”

“The correlations show what, not why, but as we have seen, knowing what is often good enough.”

“But most important, these non-causal analyses will aid our understanding of the world by primarily asking what rather than why.”

“Rather, when we say that humans see the world through causalities, we’re referring to two fundamental ways humans explain and understand the world: through quick, illusory causality; and via slow, methodical causal experiments. Big data will transform the roles of both.”

“In many cases, the deeper search for causality will take place after big data has done its work, when we specifically want to investigate the why, not just appreciate the what.
Causality won’t be discarded, but it is being knocked off its pedestal as the primary fountain of meaning. Big data turbocharges non-causal analyses, often replacing causal investigations.”

“Big data itself is founded on theory. For instance, it employs statistical theories and mathematical ones, and at times uses computer science theory, too. Yes, these are not theories about the causal dynamics of a particular phenomenon like gravity, but they are theories nonetheless. And, as we have shown, models based on them hold very useful predictive power. In fact, big data may offer a fresh look and new insights precisely because it is unencumbered by the conventional thinking and inherent biases implicit in the theories of a specific field.”

“The age of big data clearly is not without theories—they are present throughout, with all that this entails.
Anderson deserves credit for raising the right questions—and doing so, characteristically, before others. Big data may not spell the “end of theory,” but it does fundamentally transform the way we make sense of the world.”

“To datafy a phenomenon is to put it in a quantified format so it can be tabulated and analyzed.”

“Again, this is very different from digitization, the process of converting analog information into the zeros and ones of binary code so computers can handle it. Digitization wasn’t the first thing we did with computers. The initial era of the computer revolution was computational, as the etymology of the word suggests. We used machines to do calculations that had taken a long time to do by previous methods: such as missile trajectory tables, censuses, and the weather.”

“In order to capture quantifiable information, to datafy, we need to know how to measure and how to record what we measure. This requires the right set of tools. It also necessitates a desire to quantify and to record. Both are prerequisites of datafication, and we developed the building blocks necessary for datafication many centuries before the dawn of the digital age.”

“The ability to record information is one of the lines of demarcation between primitive and advanced societies. Basic counting and measurement of length and weight were among the oldest conceptual tools of early civilizations. By the third millennium B.C. the idea of recorded information had advanced significantly in the Indus Valley, Egypt, and Mesopotamia. Accuracy increased, as did the use of measurement in everyday life. The evolution of script in Mesopotamia provided a precise method of keeping track of production and business transactions. Written language enabled early civilizations to measure reality, record it, and retrieve it later. Together, measuring and recording facilitated the creation of data. They are the earliest foundations of datafication.”

“The next frontiers of datafication are more personal: our relationships, experiences, and moods. The idea of datafication is the backbone of many of the Web’s social media companies. Social networking platforms don’t simply offer us a way to find and stay in touch with friends and colleagues, they take intangible elements of our everyday life and transform them into data that can be used to do new things.”

“Because of smartphones and inexpensive computing technology, datafication of the most essential acts of living has never been easier.”

“The enthusiasm over the “internet of things”—embedding chips, sensors, and communications modules into everyday objects—is partly about networking but just as much about datafying all that surrounds us.”

“In contrast, datafication represents an essential enrichment in human comprehension. With the help of big data, we will no longer regard our world as a string of happenings that we explain as natural or social phenomena, but as a universe comprised essentially of information.”

“Seeing the world as information, as oceans of data that can be explored at ever greater breadth and depth, offers us a perspective on reality that we did not have before. It is a mental outlook that may penetrate all areas of life. Today, we are a numerate society because we presume that the world is understandable with numbers and math. And we take for granted that knowledge can be transmitted across time and space because the idea of the written word is so ingrained. Tomorrow, subsequent generations may have a “big-data consciousness”—the presumption that there is a quantitative component to all that we do, and that data is indispensable for society to learn from.”

“What makes our era different is that many of the inherent limitations on the collection of data no longer exist. Technology has reached a point where vast amounts of information often can be captured and recorded cheaply. Data can frequently be collected passively, without much effort or even awareness on the part of those being recorded. And because the cost of storage has fallen so much, it is easier to justify keeping data than discarding it. All this makes much more data available at lower cost than ever before. Over the past half-century, the cost of digital storage has been roughly cut in half every two years, while storage density has increased 50 million-fold. In light of informational firms like Farecast or Google—where raw facts go in at one end of a digital assembly line and processed information comes out at the other—data is starting to look like a new resource or factor of production.”
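
Note: the halving claim in the passage above can be checked with simple arithmetic. If cost halves every two years, fifty years gives 25 halvings, a reduction by a factor of 2^25. The sketch below is an illustration added to these highlights. The function name is hypothetical, and it assumes an exact two-year halving period over exactly fifty years, whereas the book says only "roughly".

```python
def cost_reduction_factor(years: float, halving_period_years: float = 2.0) -> float:
    """Factor by which a cost falls if it halves once per halving period."""
    halvings = years / halving_period_years
    return 2.0 ** halvings


factor = cost_reduction_factor(50)
print(f"{factor:,.0f}")  # 33,554,432
```

A factor of about 33.5 million is the same order of magnitude as the 50 million-fold increase in storage density quoted in the same passage.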

“Ultimately, the value of data is what one can gain from all the possible ways it can be employed. These seemingly infinite potential uses are like options—not in the sense of financial instruments, but in the practical sense of choices. The data’s worth is the sum of these choices: the “option value” of data, so to speak. In the past, once data’s main use was achieved we often thought the data had fulfilled its purpose, and we were ready to erase it, to let it slip away. After all, it seemed the key worth had been extracted. In the big-data age, data is like a magical diamond mine that keeps on giving long after its principal value has been tapped. There are three potent ways to unleash data’s option value: basic reuse; merging datasets; and finding “twofers.””

“Despite the rosy benefits, however, there are also reasons to worry. As big data makes increasingly accurate predictions about the world and our place in it, we may not be ready for its impact on our privacy and our sense of freedom. Our perceptions and institutions were constructed for a world of information scarcity, not surfeit. We explore the dark side of big data in the next chapter.”

“Big data erodes privacy and threatens freedom. But big data also exacerbates a very old problem: relying on the numbers when they are far more fallible than we think.”

“As we have seen, big data allows for more surveillance of our lives while it makes some of the legal means for protecting privacy largely obsolete. It also renders ineffective the core technical method of preserving anonymity. Just as unsettling, big-data predictions about individuals may be used to, in effect, punish people for their propensities, not their actions. This denies free will and erodes human dignity.
At the same time, there is a real risk that the benefits of big data will lure people into applying the techniques where they don’t perfectly fit, or into feeling overly confident in the results of the analyses. As big-data predictions improve, using them will only become more appealing, fueling an obsession over data since it can do so much. (…)
We must guard against overreliance on data rather than repeat the error of Icarus, who adored his technical power of flight but used it improperly and tumbled into the sea.”

“As the world moves toward big data, society will undergo a similar tectonic shift. Big data is already transforming many aspects of our lives and ways of thinking, forcing us to reconsider basic principles on how to encourage its growth and mitigate its potential for harm.”

“Courts of law hold people responsible for their actions. When judges render their impartial decisions after a fair trial, justice is done. Yet, in the era of big data, our notion of justice needs to be redefined to preserve the idea of human agency: the free will by which people choose their actions. It is the simple idea that individuals can and should be held responsible for their behavior, not their propensities.
Before big data, this fundamental freedom was obvious. So much so, in fact, that it hardly needed to be articulated. After all, this is the way our legal system works: we hold people responsible for their acts by assessing what they have done. In contrast, with big data we can predict human actions increasingly accurately. This tempts us to judge people not on what they did, but on what we predicted they would do.
In the big-data era we will have to expand our understanding of justice, and require that it include safeguards for human agency as much as we currently protect procedural fairness. Without such safeguards the very idea of justice may be utterly undermined.”

“A fundamental pillar of big-data governance must be a guarantee that we will continue to judge people by considering their personal responsibility and their actual behavior, not by “objectively” crunching data to determine whether they’re likely wrongdoers. Only that way will we treat them as human beings: as people who have the freedom to choose their actions and the right to be judged by them.”

“Big data operates at a scale that transcends our ordinary understanding.”

“Just as the printing press led to changes in the way society governs itself, so too does big data. It forces us to confront new challenges with new solutions. To ensure that people are protected at the same time as the technology is promoted, we must not let big data develop beyond the reach of human ability to shape the technology.”

“The effects of big data are large on a practical level, as the technology is applied to find solutions for vexing everyday problems. But that is just the start. Big data is poised to reshape the way we live, work, and think. The change we face is in some ways even greater than those sparked by earlier epochal innovations that dramatically expanded the scope and scale of information in society. The ground beneath our feet is shifting. Old certainties are being questioned. Big data requires fresh discussion of the nature of decision-making, destiny, justice. A worldview we thought was made of causes is being challenged by a preponderance of correlations. The possession of knowledge, which once meant an understanding of the past, is coming to mean an ability to predict the future.
These issues are much more significant than the ones that presented themselves when we prepared to exploit e-commerce, live with the Internet, enter the computer age, or take up the abacus. The idea that our quest to understand causes may be overrated—that in many cases it may be more advantageous to eschew why in favor of what—suggests that the matters are fundamental to our society and our existence.”

“Ultimately, big data marks the moment when the “information society” finally fulfills the promise implied by its name. The data takes center stage. All those digital bits that we have gathered can now be harnessed in novel ways to serve new purposes and unlock new forms of value. But this requires a new way of thinking and will challenge our institutions and even our sense of identity. The one certainty is that the amount of data will continue to grow, as will the power to process it all. But where most people have considered big data as a technological matter, focusing on the hardware or the software, we believe the emphasis needs to shift to what happens when the data speaks.”

“(…) changes our idea of what constitutes useful information.
Instead of obsessing about the accuracy, exactitude, cleanliness, and rigor of the data, we can let some slack creep in. We shouldn’t accept data that is outright wrong or false, but some messiness may become acceptable in return for capturing a far more comprehensive set of data. In fact, in some cases big and messy can even be beneficial, since when we tried to use just a small, exact portion of the data, we ended up failing to capture the breadth of detail where so much knowledge lies.”

“Because correlations can be found far faster and cheaper than causation, they’re often preferable. We will still need causal studies and controlled experiments with carefully curated data in certain cases, such as designing a critical airplane part. But for many everyday needs, knowing what not why is good enough. And big-data correlations can point the way toward promising areas in which to explore causal relationships.”

“New tools, from faster processors and more memory to smarter software and algorithms, are only part of the reason we can do all this. While the tools are important, a more fundamental reason is that we have more data, since more aspects of the world are being datafied. To be sure, the human ambition to quantify the world long predated the computer revolution. But digital tools facilitate datafication greatly.”

“Punishment on this basis negates the concept of free will and denies the possibility, however small, that a person may choose a different path. As society assigns individual responsibility (and metes out punishment), human volition must be considered inviolable. The future must remain something that we can shape to our own design. If it does not, big data will have perverted the very essence of humanity: rational thought and free choice.”

“Big data will become integral to understanding and addressing many of our pressing global problems. Tackling climate change requires analyzing pollution data to understand where best to focus our efforts and find ways to mitigate problems. The sensors being placed all over the world, including those embedded in smartphones, provide a cornucopia of data that will let us model global warming at a better level of detail.”

“As big data transforms our lives—optimizing, improving, making more efficient, and capturing benefits—what role is left for intuition, faith, uncertainty, and originality?
If big data teaches us anything, it is that just acting better, making improvements—without deeper understanding—is often good enough. Continually doing so is virtuous.”

“If so, then there will be a special need to carve out a place for the human: to reserve space for intuition, common sense, and serendipity to ensure that they are not crowded out by data and machine-made answers. What is greatest about human beings is precisely what the algorithms and silicon chips don’t reveal, what they can’t reveal because it can’t be captured in data. It is not the “what is,” but the “what is not”: the empty space, the cracks in the sidewalk, the unspoken and the not-yet-thought.”

“What we are able to collect and process will always be just a tiny fraction of the information that exists in the world. It can only be a simulacrum of reality, like the shadows on the wall of Plato’s cave. Because we can never have perfect information, our predictions are inherently fallible. This doesn’t mean they’re wrong, only that they are always incomplete. It doesn’t negate the insights that big data offers, but it puts big data in its place—as a tool that doesn’t offer ultimate answers, just good-enough ones to help us now until better methods and hence better answers come along. It also suggests that we must use this tool with a generous degree of humility . . . and humanity.”

Last updated on 1 May 2025