Big data is a term juicy and nebulous enough to take on many different definitions and orientations. A reference to scale (of data), speed (of processing and analyzing), and complexity (of the algorithms employed) at minimum, big data is heralded by Forbes as “the hottest sector in IT at the moment” and just as regularly explored as a tangle of political and ethical concerns. But most importantly, perhaps, for the conversations we have been having over the last two semesters, I think big data means something quite different for the humanities and the social sciences. Typically for the humanities, the stirrings of big data are seen as an exciting opportunity—as James Grossman notes, the techniques and technologies of big data not only offer new opportunities for analysis and collaboration for historians, but also the possibility of non-academic employment (as well as academic employment, one hopes). For the social sciences, and sociology in particular, thinking “data” is familiar territory—indeed, as Burrows and Savage note, academic sociologists were pioneers in data collection and analysis, particularly in the development of the social survey. One might say that social data collection is deeply foundational to sociology, both methodologically and epistemologically. Thus some social scientists, most prominently perhaps Bruno Latour, have argued that large-scale data mining and visualization might allow for a reconfiguration of sociological epistemology, away from the levels of structure and individual. Lev Manovich writes of something similar when he critically analyzes the notion that new computational tools might mean that “we no longer have to choose between data size and data depth.”
While some are optimistic about what these new technologies might mean for the social sciences, others see more threat than opportunity (Burrows and Savage call it the “Coming Crisis of Empirical Sociology”), in large part because both the collection and the analysis of this data have largely occurred outside the academy. Nigel Thrift has written about this as the emergence of “knowing capitalism,” when “capitalism began to intervene in, and make a business out of, thinking the everyday” (Knowing Capitalism, p. 1). Indeed, it would seem that for both humanists and social scientists, working with truly big data requires, as Manovich notes, a reliance upon either the state (the military and domestic security apparatus being the most data-hungry parts of the state) or, more likely, privately owned, profit-minded collections, many of which are only partly publicly accessible. What are the political ramifications of working with data that has been collected and (often) organized for purposes that rarely include critical inquiry (mostly, of course, the purpose is to sell stuff)?
As you have probably guessed at this point, my engagement with the phenomenon known as “big data” has thus far been mostly theoretical in nature, so in the interest of not letting that completely dominate my post here, I’m going to stop now and pivot to some more practical concerns.
At their best, interactive web-based visualizations open up the possibility of some level of data “transparency” and even manipulation (a notable example being this CUNY-designed slider map showing changes in NYC’s racial demographics). As “Tooling up for the Digital Humanities” notes, this allows the possibility of users themselves finding novel patterns and correlations. Unfortunately, this level of transparency and interactivity still seems to be rare—more frequently, data visualization is an increasingly popular way to tell a particular story, usually a relatively simple one. Tooling Up writes that “many viewers are not necessarily used to reading visualizations critically,” but perhaps there is something about engagement with data through visualization that thwarts criticality, or at least pushes toward simpler rather than more complex answers to social questions. I’ll occasionally give class assignments where students have to find a data visualization around a certain topic that they find particularly compelling, and quite often they bring me well-designed infographics that clearly tell a relatively simple story about a complex topic. Can we beautifully and clearly convey nuance and complexity in data visualizations?
More interesting to me than using visualizations to tell a story are the possibilities of data visualization as a research method for seeing patterns and tracing connections. Simply visualizing data by time or geography can reveal surprising patterns and connections. Mapping is an obvious area where visualizing can help us quickly see interesting patterns and connections in data, but it seems clear that tools like Google’s Ngram Viewer and even word clouds could be useful in the early stages of a research project. Has anyone used visualization tools in actually developing lines of critical inquiry?
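To make this kind of early-stage exploration concrete: the core of a word cloud or Ngram-style tool is nothing more than counting term frequencies. Here is a minimal Python sketch (standard library only; the sample sentence is invented for illustration, not drawn from any project mentioned here) of the counting step that such tools perform before any visual design happens:

```python
from collections import Counter
import re

def top_terms(text, n=5, stopwords=frozenset({"the", "a", "of", "and", "in", "to"})):
    """Return the n most frequent non-stopword terms in text."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(w for w in words if w not in stopwords)
    return counts.most_common(n)

# Invented sample text, purely for illustration.
sample = ("Data visualization can reveal patterns in data, "
          "and patterns in data can suggest new questions about data.")
print(top_terms(sample, n=3))
```

Even this crude count can flag which terms dominate a corpus and suggest where a line of inquiry might begin, before any interpretation or design work.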
Ben, I really liked your point about using big data to connect patterns and find meaning in work. I’ve found visualization techniques (even things as simplistic as word clouds or mind maps) helpful on projects, letting me see which ideas I keep circling back to and what *actually* seems to be guiding my paper, which is often different from the intentions I set for myself at the beginning of a project. I’ve also found data visualization helpful in collaboration: in a recent project, it seemed that my collaborator and I weren’t able to access the same vocabulary for our experiences or represent what we wanted to come out of our work together, but a mind-mapping tool helped us to synthesize our ideas across disciplines (and across space, as we live quite a distance apart). I think the potential for developing collaboration through data visualization, or even visualizations of concepts or ideas, is vast and personally very helpful, although I don’t really interact with other forms of data visualization (thus far) in my scholarly work.
Would love to hear and think more about this; pulling out here so we can bat around in class.
Thought I might add a link to my latest DH post on Visualization techniques I am currently engaging with in my classes with Prof Dwyer and Prof Manovich: http://andersondh2.commons.gc.cuny.edu/2013/04/15/visualization/
And I agree that the paragraph Prof Waltzer pulled from Ben’s commentary is a central concern that gets batted around. I see this issue in almost all of my classes, and I would claim that the issue here is how easy it is to make flawed visualizations versus how hard it is to make proper ones. If one is willing to rely on a corporate structure like Google (the Ngram Viewer, among a million other things) or IBM (Many Eyes), it is not terribly difficult to throw in some data and get a relatively complex-looking visualization online. The issue is that it is relatively difficult to get such online programs to do exactly what you want them to do. You will have a visualization that can be passed around to peers, profs, supervisors, etc., but almost never will such a visualization be what you intended. I know this because I’ve seen some peers who refuse to mess with the assigned programming homework in Manovich’s class, and at this point in the semester they are starting to sweat.
I’m still working on honing some actual expertise in R programming and hoping to craft something with some Pygame interactivity, and I’m still not positive that I’ll have something all that worthwhile by the end of the semester, but at least I’ll understand what the data means.
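To illustrate the difference between pasting data into an online tool and controlling the output yourself, here is a minimal, dependency-free Python sketch (Python standing in for the R work described above; the word counts are invented, not from anyone’s actual project) that renders a bar chart as plain text. The point is that every design decision is explicit in the code rather than hidden in a tool’s defaults:

```python
def ascii_bar_chart(data, width=40):
    """Render {label: value} pairs as a horizontal text bar chart.

    Every choice -- scaling, sort order, label alignment -- is made
    explicitly here, rather than inherited from an online tool.
    """
    if not data:
        return ""
    max_val = max(data.values())
    label_w = max(len(label) for label in data)
    lines = []
    # Sort bars by descending value; an explicit, inspectable choice.
    for label, value in sorted(data.items(), key=lambda kv: -kv[1]):
        bar = "#" * round(width * value / max_val)
        lines.append(f"{label:<{label_w}} | {bar} {value}")
    return "\n".join(lines)

# Hypothetical word counts from a small corpus.
counts = {"archive": 12, "data": 30, "narrative": 7}
print(ascii_bar_chart(counts))
```

Crude as it is, a sketch like this makes clear what "understanding what the data means" buys you: when you write the scaling and ordering yourself, you know exactly what the visualization does and does not show.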
I think a good book to reference here might be “How to Lie with Maps” [http://www.goodreads.com/book/show/1005549.How_to_Lie_with_Maps]. Visualization tools in some form have been around since the 17th century, and they’ve always been prone to these problems of “criticality”; now it is just easier to churn digital versions of them out without a lot of work, and without a lot of work, inaccuracies and issues are far more likely to rear their ugly heads. Or anyway, that is my excuse for spending a ton of hours on this stuff over the past few months. Interested to see what other people think in class.
Thanks all, for these comments. And Anderson, thanks for sharing your blog post — your personal account is invaluable in juxtaposition to this week’s readings, and I hope that you and anyone else with experience in this practice can share the particulars of your projects and why such techniques are meaningful to your study. Ironically, even though the entire point of these data visualizations is to efficiently and perhaps seductively convey a story, I’m still struggling to find them exciting. Why, for example, should turning massive amounts of literature into charts and numbers not send me screaming for the hills? What sort of joy in reading is that? Why should I so easily dismiss the stewing suspicion that this crunching of literature into numbers is turning the form into something useful for fascism/capitalism, as Walter Benjamin warned in his essay on mechanical reproduction (just think of the potential of semantic text analysis), and useless to people? Inherent in my choice to study literature, I suppose, is the idea that written words can do something that data can neither fully describe nor quite mimic. What do we lose, then, when we act as if data can indeed describe everything? I know I’m taking a rather simplistic, old-fogey viewpoint, but I offer it in hopes that we might be able to hash out some of its contradictions.
Anderson and many in this week’s readings describe a certain type of excitement about this data-mining approach. It’s this excitement that I earnestly want to know more about. One of my friends told me there’s a saying in mathematics that a concept that takes a week to understand via a paper takes only an hour if explained in conversation. I’ve begun reading the Stanford Literary Lab’s lengthy and promising series of pamphlets, which might address my question, but, old fogey that I am, I think nothing could substitute for an in-person explanation.
Erin, as a fellow bookish English person, I agree that there is something distasteful about the “crunching of literature into numbers.” But is there analytical potential in this approach that we may be overly quick to dismiss because of that bookishness?
At the recent Data Visualization workshop, Micki Kaufman (who is from the History program) had some interesting justifications for the use of big data analysis. She said that data visualization is all about looking at content absent of narrative—which is important for Micki because of the materials she works with: declassified Vietnam War-era meeting memoranda and phone conversations of Henry Kissinger, whose modus operandi was the manipulation of policy narratives (*I am generalizing her project here). In other words, data visualization can help facilitate suspicious readings of political/historical/social narratives. Micki explained that while big data visualizations can be useful for realizing patterns, their real value as tools is how they can help us avoid the pitfalls of patterns. The tendency to gravitate toward neat patterns to make sense of the materials we work with can actually be reductive if we are (innocently enough) overlooking elements for analysis that don’t accord with the pattern. I think the most interesting thing Micki said about data visualization is that it can enable us to see evidence that is suggestive of absence—it points to those places that may be the most fruitful sites for further analysis (but which we might not otherwise notice). This could address Ben’s concern that “there is something about engagement with data through visualization that thwarts criticality, or at least pushes towards simpler rather than complex answers to social questions.”
In theory, I think there is something potentially exciting in the use of data visualization tools to reveal that which needs our closer attention, and Micki successfully demonstrates this approach using historical documents. However, I really struggled to conceive of how to do this with imaginative literature, which doesn’t lend itself to quantification. So Erin, I’m on board with your thinking that data can’t fully describe or mimic all that words do in a literary text. But do we need data to be able to ‘fully describe everything’ in order to get something out of it? Do the literary devices we use to make sense of what words are doing in a text actually fully describe everything?
I love how you raised this question, Ben, since it resonates with one of the many debates currently raging in education circles. The Gates Foundation-funded non-profit inBloom is collaborating with several districts, including the NYC DOE, on a large-scale student data integration and sharing project, currently in pilot form.
A recent Q&A between a blogger and a representative from inBloom led to some feisty discussion in the comments section from parents and other advocates for the privacy of student information. Although I’m not entirely sure I understand the service inBloom will provide, I think the idea is to aggregate data from a number of sources and produce personalized products for individual students, in essence streamlining data into easy-to-read deliverables. Those deliverables would no doubt simplify and distort the infinitely complex subject of student learning, and the data manipulation could possibly do further violence to public schools. Diane Ravitch recently covered the possibility of identity theft inherent in this arrangement among school districts, inBloom, and commercial interests.
No matter what, if Wireless Generation and Rupert Murdoch are involved, this project is not about critical inquiry and is of course about profit.