Revolution Lullabye

January 31, 2013

Lang and Baehr, Data Mining

Lang, Susan and Craig Baehr. “Data Mining: A Hybrid Methodology for Complex and Dynamic Research.” College Composition and Communication 64.1 (September 2012): 172-194.

Lang and Baehr argue that data mining is a useful research methodology for researchers and administrators in composition and rhetoric because of its inductive nature and its ability to organize and use large sets of data.  Their article defines data mining, explains how current computer technologies make data mining an efficient and useful research tool, describes the process of data mining, gives an example of it in practice (from their work at Texas Tech), and names the limitations of the methodology.  They offer data mining as a tool for researchers pursuing the RAD (replicable, aggregable, and data-supported) research agenda called for by Richard Haswell and Chris Anson.  They believe that in this age of increased demand for accountability, data mining can help teachers and administrators develop better assessment techniques and make the case for their programs.

Notable Notes

data mining allows for categorization, clustering, and the emergence of associations and patterns (178-179).

distinction: data mining is more inductive – the data comes first (not the hypothesis), and the findings emerge (179).

application of data mining to Chris Anson’s taxonomy of six types of research (research categories) (181-184).

example: why do students earn a D, F, or W (withdrawal) in first-year writing? What are the factors? Data mining study at Texas Tech

limitations: the complexity and scope of the data; longitudinal studies are necessary to increase validity; it cannot completely substitute for other kinds of research methodology; quantitative methods aren’t as accepted in the field (190-191).

data mining process: (185-186)

  1. identify the problem(s)
  2. select raw source of data
  3. decide what measures or criteria to apply to the data
  4. develop a formal procedure (a repeatable process) for sifting through the data
  5. interpret the results

Quotable Quotes

“Data mining is the iterative process of systematically interpreting, organizing, and making meaning from data sources” (191).

“The increasingly accountability-focused climate of higher education demands that we at least begin to explore the use of data-mining technologies” (184).

“Data and text mining extend these activities beyond what is possible for us to do as individuals without the assistance of computer technology, as large amounts of numeric or textual data can be examined for various types of relationships, including classes, clusters, associations, and patterns” (178).

January 25, 2013

Moretti, Graphs, Maps, Trees

Moretti, Franco. Graphs, Maps, Trees: Abstract Models for a Literary History. London: Verso, 2005.

Moretti uses three distant reading approaches borrowed from the social and natural sciences (graphs, maps, and trees) as a way to investigate literary history on a large scale.  He argues that the corpus of literature is so large that understanding its evolution requires quantitative approaches, which place emphasis not on individual texts but on larger movements in the field.

His book is divided into three main chapters: Graphs, Maps, and Trees.  In each chapter, he demonstrates how the particular distant reading approach helps him see patterns that are not discernible at the level of the individual text.  He is interested in the history of the book, and his work builds on that of other literary historians.  He argues that his quantitative approach is a more methodical or “rational” way to approach literary history, and that the forms that emerge in the process illustrate the forces that shape the texts and the field.

Moretti asks how his graphs, maps, and trees might work as theories that change how literary scholars think about their work, and whether the distinctions that have been made in the literary corpus still hold true.

Notable Notes

Graphs – looks at the rise and fall of the novel in Britain as well as in Japan, Italy, Spain, and Nigeria over 300 years.  The number of books published per year seems to intersect with political and social movements, revolutions, and wars.

Maps – focuses on the mapping of locations, characters, and events in the five volumes of Mary Mitford’s Our Village (1824-1832).  The action maps onto concentric circles, but over the course of the five volumes it moves farther and farther from the village at the center.

Trees – draws on Darwin’s evolutionary tree, shows the divergence, emergence, and divergence again of syntax-specific constructions (free indirect style in modern narrative 1800-2000)

Trees have horizontal and vertical movement, space and time

Quotable Quotes

“The models I have presented also share a clear preference for explanation over interpretation; or perhaps, better, for the explanation of general structures over the interpretation of individual texts” (91).

His name for what his graphs, maps, and trees have in common: “a materialist conception of form” (92)

Maps – they help us understand and see forces that shape texts: “form as a diagram of forces” (64)

“Each pattern is a clue” (57)

“What do literary maps do…First, they are a good way to prepare a text for analysis. You choose a unit – walks, lawsuits, luxury goods, whatever – find its occurrences, place them in space…or in other words: you reduce the text to a few elements, and abstract them from the narrative flow, and construct a new, artificial object like the maps that I have been discussing. And, with a little luck, these maps will be more than the sum of their parts: they will possess ‘emerging’ qualities, which were not visible at the lower level” (53)

“Not that the map is itself an explanation, of course: but at least, it offers a model of the narrative universe which rearranges its components in a non-trivial way, and may bring some hidden patterns to the surface” (54).

“I began this chapter by saying that quantitative data are useful because they are independent of interpretation; then, that they are challenging because they often demand an interpretation that transcends the quantitative realm” (30).

Important point: quantitative models and research provide “data, not interpretation” (9).

“A field this large cannot be understood by stitching together separate bits of knowledge about individual cases, because it isn’t a sum of individual cases: it’s a collective system, that should be grasped as such, as a whole” (4)

Purpose: “A more rational literary history. That is the idea.” (4) – “a quantitative approach to literature.” – methodical, patterns, groups.

“From texts to models, then; and models drawn from three disciplines with which literary studies have had little or no interaction: graphs from quantitative history, maps from geography, and trees from evolutionary theory” (1-2)

“‘Distant reading,’ I have once called this type of approach; where distance is however not an obstacle, but a specific form of knowledge: fewer elements, hence a sharper sense of their overall interconnection. Shapes, relations, structures. Forms. Models.” (1)

January 24, 2013

Mueller, Grasping Rhetoric and Composition by Its Long Tail

Mueller, Derek. “Grasping Rhetoric and Composition by Its Long Tail: What Graphs Can Tell Us about the Field’s Changing Shape.” College Composition and Communication 64.1 (September 2012): 195-223.

Mueller investigates 25 years of citations from the journal College Composition and Communication (1987-2011) to explore the discipline’s citation practices and changing shape.  He uses graphs, lists, and tables (an application of distant reading methods drawn from Franco Moretti’s work) to demonstrate the field’s growing specialization, as shown by the diminishing frequency of top-cited scholars within the data set of citations.  He uses Chris Anderson’s concept of the long tail to describe what he sees in the shifting citation practices of CCC articles: not only have the top-cited authors changed over 25 years (the scholars most frequently cited in 1987-1991 are not those most frequently cited in 2007-2011), but there has also been a growing number of once- or twice-cited authors, which shows the expansion and increasing specialization of composition and rhetoric.  Mueller offers his study as a way to query the field and ask how our graduate curricula and professional development prepare future scholars for the field of the future.

Notable Notes

Chris Anderson – Wired magazine, 2004: the long tail.  Anderson used the long tail to describe market practices, showing how online retailers are able to capitalize on less-popular niche markets (Amazon v. Borders).  Pareto distribution/power law.

contains a series of graphs – some looking at the aggregate data, others split into five-year subsets

distant reading – a systematic, quantitative approach to data at a different scale than close reading; this larger scale helps us recognize patterns and developments that are not always apparent at close range.  Tables of contents and article abstracts are examples of distant reading.  They enable decision making: “Readers rely on these devices to make quick decisions about whether to read a particular article or not, but reading the journal through these devices alone is not quite the same as reading a scholarly article in the common sense of the activity” (198).  (Mueller cites Malcolm Gladwell’s Blink in his endnote.)

the usefulness of graphs and distant reading – they encourage new questions

His graphs/lists/tables:

  • Figure 1 – page count and citation count over 25 years (both have increased)
  • Figure 2 – 102 most frequently cited authors in CCC from 1987-2011
  • Figure 3 – top ten most frequently cited authors in CCC from 1987-2011, divided into 5-year intervals
  • Figure 4 – Chris Anderson’s “Anatomy of the Long Tail”
  • Figure 5 – the long tail, references to unique names in CCC works cited 1987-2011
  • Figures 6-10 – the long tail, references to unique names in CCC works cited 1987-2011, split into 5-year intervals
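
To make the long-tail idea concrete, here is a minimal Python sketch of the kind of counting behind a long-tail graph: count how often each unique name appears in a works-cited list, rank the names from most to least cited, and separate the head of the curve from the tail. This is my own illustration, not Mueller’s procedure; the function names, the head size, and the toy data (placeholder names “A” through “E”) are all hypothetical, and his actual counting and graphing methods may differ.

```python
from collections import Counter

def rank_cited_names(works_cited):
    """Count references per unique name and rank names from most to least cited."""
    return Counter(works_cited).most_common()

def split_head_and_tail(ranked, head_size):
    """Split the ranking into the 'head' (top names) and the 'long tail' (everyone else)."""
    return ranked[:head_size], ranked[head_size:]

def count_rarely_cited(ranked, max_citations=2):
    """Count names cited only once or twice: the long, flat end of the curve."""
    return sum(1 for _, n in ranked if n <= max_citations)

def tail_share(ranked, head_size):
    """Fraction of all citations that fall outside the head."""
    _, tail = split_head_and_tail(ranked, head_size)
    total = sum(n for _, n in ranked)
    return sum(n for _, n in tail) / total if total else 0.0

if __name__ == "__main__":
    # Hypothetical toy data, NOT Mueller's dataset: one entry per reference.
    works_cited = ["A", "A", "A", "B", "B", "C", "D", "E"]
    ranked = rank_cited_names(works_cited)
    print(ranked)                             # [('A', 3), ('B', 2), ('C', 1), ('D', 1), ('E', 1)]
    print(count_rarely_cited(ranked))         # 4
    print(tail_share(ranked, head_size=2))    # 0.375
```

Running the same count separately on each five-year slice of references would give the kind of comparison Figures 6-10 make, showing whether the tail grows relative to the head over time.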

there is no one stable field.  Growing specialization isn’t a problem to solve; it is something to query and base our actions on (215-217)

further research possible with the dataset – how do an author’s citation practices change over time? Are citation practices from graduates of certain programs similar? (214)

the problem of keeping up with scholarship in the field.  How can one read the whole long tail?  How has the field changed because of increasing specialization? (214)

our understanding of the field is based on our own vantage point (217)

extension of study done by Phillips, Greenberg, and Gibson in 1993

16,726 citations in 491 journal articles published in CCC from 1987-2011 (25 years) (197)

Who was central when? What does that say about our field? (203)

problem: “citation listings lack dimension” – the works cited does not indicate the importance or general impact of a citation on the work as a whole

dappled field (206)

Quotable Quotes

“From graphs, then, come new insights, new provocations, and new questions: what has changed, over time, in the relationship between the head of the curve and the long tail?” (215)

“A deliberate adjustment in the level of detail at which we ordinarily experience texts: this is a key motive when producing graphs as a distant reading method, and it is a common tactic for mediating large datasets, including scholarly corpora” (197-198).

“Certainly the figures at the top tell us something about citation practices and centrality in the journal’s scholarly conversation; however, the larger number of figures at the bottom indicates something more. It is, after all, in this long, flat expanse of unduplicated references that we can begin to assess just how broad-based the conversations (in a given journal) have grown – and just how much the centered, coherent, and familiar locus of conversation, based on citation practices, has slid” (210).

“Burke’s parlor is nowadays full and teeming, more crowded than ever before” (214).

“A changing disciplinary density: this is not a condition for us to solve; nonetheless, it demands a certain reckoning, particularly for graduate education and professional development” (219).
