GrokTank: 2012

Monday, March 5, 2012

Soaking in the Twittersphere

My first story on Storify aggregates more than 100 tweets (from one day!) containing quotes from Montaigne's Essais. My tweet about the story has been retweeted, and I've gained three new followers. I've been captured by the Twittersphere... and I'm having fun!

Sunday, March 4, 2012

Exciting DH Tools: TraduXio

TraduXio is a collaborative platform for the translation of cultural texts. The primary developers are French. This is a potentially radical tool that must be explored.

Questions for Zotero Workshop at First THATCamp

I'm heading to my first THATCamp on Friday (woot!), and I plan to attend the Zotero workshop. After a few failed attempts, I managed to get the Firefox add-on installed and I've added a few entries. Now the questions begin... I'll be updating this post throughout the week and then (hopefully) adding answers during or after the workshop.

- If I want to enter the Companion to Digital Humanities, should I enter each chapter separately or the work as a whole? (I vaguely remember seeing something about a parent/child relationship between resources; maybe this is the answer).

- Do I need to capture a snapshot of every webpage and blog post if I want to view it offline?

- What is the difference between the first and second levels in the center panel? Each item I have saved so far (all blog posts) has been created this way.

- Is there any way to blog directly from Zotero?

- There has to be a better way to import the PDF of a paper than what I did for the "Translation, Style, and Ideology" paper. What is it?

Tuesday, February 21, 2012

Tom Scheinfeldt is My New Hero

... Because he writes things like this:

I believe we are at a similar moment of change right now, that we are entering a new phase of scholarship that will be dominated not by ideas, but once again by organizing activities, both in terms of organizing knowledge and organizing ourselves and our work. My difficulty in answering the question “What’s the big idea in history right now?” stems from the fact that, as a digital historian, I traffic much less in new theories than in new methods. The new technology of the Internet has shifted the work of a rapidly growing number of scholars away from thinking big thoughts to forging new tools, methods, materials, techniques, and modes of work which will enable us to harness the still unwieldy, but obviously game-changing, information technologies now sitting on our desktops and in our pockets.

And like this:

Eventually digital humanities must make arguments. It has to answer questions. But yet? Like 18th century natural philosophers confronted with a deluge of strange new tools like microscopes, air pumps, and electrical machines, maybe we need time to articulate our digital apparatus, to produce new phenomena that we can neither anticipate nor explain immediately.

Method over theory, tools over arguments. This is the most appealing academic approach I have come across.

Saturday, February 11, 2012

What DH is for: Clues in Abstracts from Literary & Linguistic Computing

I'm continuing my quest to examine current answers to the question 'What is the Digital Humanities?' In addition to the manifestos and essays in Debates in the Digital Humanities, I've been keeping my eyes peeled for implicit definitions. For example, I've highlighted a key phrase that points to an implicit definition in each of the two abstracts below:

Evidence of intertextuality: investigating Paul the Deacon's Angustae Vitae
Christopher W. Forstall, Sarah L. Jacobson, and Walter J. Scheirer
Lit Linguist Computing (2011) 26(3): 285-296 first published online May 30, 2011 doi:10.1093/llc/fqr029

Abstract: In this study, we use computational methods to evaluate and quantify philological evidence that an eighth century CE Latin poem by Paul the Deacon was influenced by the works of the classical Roman poet Catullus. We employ a hybrid feature set composed of n-gram frequencies for linguistic structures of three different kinds—words, characters, and metrical quantities. This feature set is evaluated using a one-class support vector machine approach. While all three classes of features prove to have something to say about poetic style, the character-based features prove most reliable in validating and quantifying the subjective judgments of the practicing Latin philologist. Word-based features were most useful as a secondary refining tool, while metrical data were not yet able to improve classification. As these features are developed in ongoing work, they are simultaneously being incorporated into an existing online tool for allusion detection in Latin poetry. (emphasis added)

Text encoding and ontology—enlarging an ontology by semi-automatic generated instances
Amélie Zöllner-Weber
Lit Linguist Computing (2011) 26(3): 365-370 first published online May 16, 2011 doi:10.1093/llc/fqr021

Abstract: The challenge in literary computing is (1) to model texts, to produce digital editions and (2) to model the meaning of literary phenomena which readers have in their mind when reading a text. Recently, an approach was proposed to describe and present structure and attributes of literary characters (i.e. the mental representation in a reader’s mind), to explore, and to compare different representations using an ontology. In order to expand the ontology for literary characters, users must manually extract information about characters from literary texts and, again manually, add them to the ontology. In this contribution, I present an application that supports users when working with ontologies in literary studies. Therefore, semi-automatic suggestions for including information in an ontology are generated. The challenge of my approach is to encode aspects of literary characters in a text and to fit it automatically to the ontology of literary characters. The application has been tested by using an extract of the novel ‘Melmoth the Wanderer’ (1820), written by Charles Robert Maturin. For the main character, Melmoth, 72 instances were generated and assigned successfully to the ontology. In conclusion, I think that this approach is not limited to the theme of character descriptions; it can also be adapted to other topics in literary computing and Digital Humanities. (emphasis added)

There's a clear relationship between "...validating and quantifying the subjective judgments of" [insert traditional humanities discipline here] and modeling "the meaning of literary phenomena which readers have in their mind when reading a text" (in other words, validating the subjective judgments of readers). Since I've previously stated that Tom Scheinfeldt is my new hero, I have to keep in mind that it may not be time yet for digital humanities to be answering questions. But it seems to me that validating subjective judgments may be one of the most promising avenues for digital humanities to pursue.
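The Forstall, Jacobson, and Scheirer abstract above builds its feature set from n-gram frequencies over words, characters, and metrical quantities. To make the idea of an n-gram frequency profile concrete, here is a minimal Python sketch of word and character n-gram counting. It rests on my own assumptions (lowercasing, stripping punctuation, trigrams by default) and is not the authors' code. The cosine similarity at the end is a deliberately simple stand-in, not the one-class support vector machine the study actually used, and metrical features are left out entirely.

```python
import math
from collections import Counter


def normalize(text):
    """Lowercase, drop punctuation, and collapse runs of whitespace."""
    kept = "".join(ch for ch in text.lower() if ch.isalpha() or ch.isspace())
    return " ".join(kept.split())


def char_ngrams(text, n=3):
    """Count overlapping character n-grams (spaces between words included)."""
    s = normalize(text)
    return Counter(s[i:i + n] for i in range(len(s) - n + 1))


def word_ngrams(text, n=2):
    """Count overlapping word n-grams, stored as tuples of words."""
    words = normalize(text).split()
    return Counter(tuple(words[i:i + n]) for i in range(len(words) - n + 1))


def relative_frequencies(counts):
    """Turn raw counts into frequencies that sum to 1."""
    total = sum(counts.values())
    return {gram: c / total for gram, c in counts.items()} if total else {}


def cosine_similarity(a, b):
    """Cosine similarity between two sparse frequency profiles (dicts).
    A simple stand-in for comparison, NOT the study's one-class SVM."""
    dot = sum(v * b.get(k, 0.0) for k, v in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


if __name__ == "__main__":
    # Two opening lines from Catullus (poems 5 and 92), used only as sample input.
    a = relative_frequencies(char_ngrams("Vivamus, mea Lesbia, atque amemus"))
    b = relative_frequencies(char_ngrams("Lesbia mi dicit semper male"))
    print(round(cosine_similarity(a, b), 3))
```

Changing n, or swapping word_ngrams for char_ngrams, changes which feature family is being compared; the abstract reports that character-based features proved the most reliable in that study.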

Tuesday, February 7, 2012

A Companion to Digital Humanities, Ch. 18: Electronic Texts - Audiences and Purposes

A Companion to Digital Humanities
Ch. 18: Electronic Texts - Audiences and Purposes
Author: Perry Willett

Willett provides an overview of the history and key questions in the development and use of electronic text. Unfortunately, his attempt to provide such an overview suffers from the rapid development of technology in the decade since he wrote his article.

Willett's most useful comments address longer-range questions, such as the development and application of editorial standards and practices for electronic text. However, these useful comments are interspersed with woefully out-of-date descriptions of hypertext. This mix leaves the reader with no way to evaluate Willett's other comments, where it is less clear what effect the passage of time may have had on the topic at hand. For example, when he describes the current rush to publish electronic texts as "a land rush, as publishers, libraries, and individuals seek to publish significant collections," one must turn away from the Companion and toward more recently published sources to verify his statement. Later in the chapter, Willett mentions that 'recent' books (from 1997 and 2000) focus on the production of electronic texts, "leaving aside any discussion of their eventual use," because of the unresolved question of how the computer can "aid in literary criticism." While this may still be the case, one must question any statement about a field as dynamic as the digital humanities that relies on such outdated sources.

Willett's chapter does include an insightful discussion of the uneven availability of works in electronic text format. This is another longer-term problem that seems likely to persist for years to come, at least in part because of one of its key causes: copyright. Willett calls copyright "the hidden force behind most electronic text collections." As in many of the other chapters in the Companion, Willett seems to have identified a key obstacle faced by digital humanists that is unlikely to change any time soon: "The effect of copyright means that researchers and students interested in twentieth-century and contemporary writing are largely prevented from using electronic text." The issue of copyright looms large in many humanities disciplines, such as translation, but it does not seem to be an active topic of discussion; instead, it seems to be treated (as it probably is) as a fait accompli.

A Companion to Digital Humanities, Ch. 16: Marking Texts of Many Dimensions

A Companion to Digital Humanities
Ch. 16: Marking Texts of Many Dimensions
Author: Jerome McGann

In the second paragraph of his chapter, Jerome McGann describes the process of digitizing text:

As we lay foundations for translating our inherited archive of cultural materials, including vast corpora of paper-based materials, into digital depositories and forms, we are called to a clarity of thought about textuality that most people, even most scholars, rarely undertake. (emphasis added)

This quote resonates strongly with conventional wisdom among translators, who often say, in the words of translator Giovanni Pontiero, that "...the most careful reading one can give of a text is to translate it." It is interesting that translators and digital humanists alike regard their own approach to texts as more careful than other approaches.

McGann continues the parallels between digitization and translation in his third paragraph. Compare the following quotes:

"All text is marked text..." -- McGann

"All acts of communication are acts of translation." -- George Steiner, After Babel

And later in the chapter, McGann raises another potentially interesting parallel between digitization and translation when he states that:

Two forms of critical reflection regularly violate the sanctity of such self-contained textual spaces: translation and editing... Consequently, acts of translation and editing are especially useful forms of critical reflection because they so clearly invade and change their subjects in material ways. To undertake either, you can scarcely not realize the performative - even the deformative - character of your critical agency. (emphasis added)

McGann refers to editing, not digitization, but his overall perspective seems to support the idea that digitization is a subset of editing, or that it could have been included as a third form of critical reflection that fits with translation and editing. If this is not the case, it's unclear why McGann is discussing translation and editing in this chapter.

Allopoietic vs. Autopoietic Systems

Despite the interesting intertextuality inherent in his introduction, the remainder of McGann's chapter is difficult to follow. His distinction between allopoietic and autopoietic systems is far from clear. In particular, McGann doesn't seem to state clearly whether coding and markup are allopoietic or autopoietic. At one point, he says that they "appear" allopoietic, but that they "are not like most allopoietic systems" because "Coding functions emerge as code only within an autopoietic system that has evolved those functions as essential to the maintenance of its life." This seems to presuppose that code is integral to the text, which fits with the statement that "All text is marked text." However, the lack of concrete examples in this section of the chapter makes it difficult to know for certain what McGann is getting at.

The Dimensions of Textual Fields

In the section of the chapter entitled "Marking the Text: A Necessary Distinction," McGann explains why SGML-based tagging schemas are unable "to render the forms and functions of traditional textual documents." As a result, he seems to be saying that all efforts at tagging are limited by an incomplete "conception of textuality" because they are limited to the linguistic, while books "organize themselves along multiple dimensions of which the linguistic is only one." In one of the chapter's appendices, McGann describes the six dimensions of "textual fields":

Linguistic
Graphical/Auditional
Documentary
Semiotic
Rhetorical
Social

While the first three categories are straightforward (given that McGann defines the documentary dimension as the text's "transmission history"), the latter three are unclear. His definition of the semiotic dimension is opaque and his description of the rhetorical dimension is reductionist ("The dominant form of this dimension is genre"). The social dimension is interesting, but it is also the furthest dimension from traditional theories of textuality, according to McGann, who argues that, in these traditional theories, "the social dimension is not considered an intrinsic textual feature or function." In any case, McGann seems to dismiss the potential value of analyzing the linguistic dimension of a text without taking the other dimensions into account.

This stance seems to limit the potential utility of markup, an attitude McGann echoes in his conclusion when he asserts that "... computer markup as currently imagined handicaps or even baffles altogether our moves to engage with the well-known dynamic functions of textual works." While McGann's perspective may be interesting and/or cutting-edge, he provides new researchers in digital humanities (the presumed audience of the Companion) with few clues as to how to implement his approach. The idea of a "digital processing program... that allows one to mark and store these maps of the textual fields and then to study the ways they develop and unfold and how they compare with other textual mappings and transactions" is intriguing, but when faced with the choice between McGann's vague, idealistic approach and the far more concrete and approachable markup approach advocated by TEI, it's hard to see new digital humanists taking the road less traveled.

Monday, February 6, 2012

A Companion to Digital Humanities, Ch. 14: Classification and its Structures

A Companion to Digital Humanities
Ch. 14: Classification and its Structures
Author: C.M. Sperberg-McQueen

Sperberg-McQueen touches on a wide variety of topics concerning the theory and practice of classification applied to text. The chapter provides numerous examples and definitions related to classification. However, the chapter does not seem to have an overall thesis about classification, and does not provide a clear explanation of the potential applications of classification.

Nevertheless, Sperberg-McQueen makes a number of interesting points. For example, like Rommel in Ch. 8, he emphasizes that the seemingly mechanical act of classification actually requires the classifier to adopt an interpretive perspective, whether consciously or subconsciously. He also concisely describes the epistemology of classification: "... a perfect classification scheme would exhibit perfect knowledge of the object."

Sperberg-McQueen is most helpful when he is providing concrete examples and definitions. For example, he describes two methods of classification that are common in the humanities:

"the application of pre-existing classification schemes"
"the post hoc identification of clusters among a sample of... texts"

He also defines a number of types of classification:

One-dimensional classifications (based on a single characteristic)

Nominal classifiers [which] consist simply of a set of categories
Ordinal classifiers (first-year, second-year, etc.)
Segmented classifiers (e.g. age, height, price by range)

Classification Schemes as N-dimensional Spaces

Tree structure (e.g. Dewey decimal system)
Faceted vs. enumerative schemes (Sperberg-McQueen's explanation of this concept could use some of the concrete examples he uses in other parts of the chapter.)
Pre- and post-coordinate classification schemes (This section is also less clear than others.)
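Since Sperberg-McQueen's explanation of faceted vs. enumerative schemes could use a concrete example, here is a small Python sketch of the one-dimensional classifier types above plus a toy enumerative and faceted scheme. All category names, age cutoffs, and class codes (e.g. "FP19") are hypothetical, invented for illustration; they are not taken from the chapter or from any real classification system such as Dewey.

```python
from bisect import bisect_right
from enum import Enum


# Nominal classifier: a plain set of categories with no inherent order.
class Genre(Enum):
    POETRY = "poetry"
    DRAMA = "drama"
    PROSE = "prose"


# Ordinal classifier: categories whose order matters (but not the distance between them).
YEARS = ["first-year", "second-year", "third-year", "fourth-year"]


def year_rank(label):
    """Position of a label on the ordered scale; lower means earlier."""
    return YEARS.index(label)


# Segmented classifier: a continuous value (age, height, price) cut into ranges.
def segment(value, cutoffs, labels):
    """Map value to a range label. cutoffs are sorted lower bounds for labels[1:],
    so len(labels) == len(cutoffs) + 1; a value equal to a cutoff moves up a range."""
    return labels[bisect_right(cutoffs, value)]


# Enumerative scheme: every permitted class is listed in advance as a whole.
ENUMERATIVE = {
    "French poetry, 19th century": "FP19",
    "French prose, 19th century": "FR19",
    "English poetry, 19th century": "EP19",
}

# Faceted scheme: each facet is classified independently, and the class
# notation is assembled from the facet values when an item is classified.
FACETS = {
    "language": {"French": "F", "English": "E"},
    "genre": {"poetry": "P", "prose": "R"},
    "century": {"19th": "19", "20th": "20"},
}


def faceted_class(language, genre, century):
    """Compose a class code from independent facet values."""
    return (FACETS["language"][language]
            + FACETS["genre"][genre]
            + FACETS["century"][century])


if __name__ == "__main__":
    print(segment(34, [18, 65], ["minor", "adult", "senior"]))  # → adult
    # "ER20" (English prose, 20th century) was never listed in ENUMERATIVE,
    # yet the faceted scheme can still express it.
    print(faceted_class("English", "prose", "20th"))  # → ER20
```

The last line carries the design point: an enumerative list has to anticipate every combination ahead of time, while a faceted scheme builds classes out of independent facets.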

Sunday, February 5, 2012

Questions Concerning A Companion to Digital Humanities

Ch. 8: Literary Studies

In his first paragraph, Thomas Rommel states that "the analysis of literature is traditionally seen as a subjective procedure." Is this true? If so, is digital humanities the only way to incorporate empirical evidence?
Rommel poses an intriguing question that he never answers: What does it mean to collect empirical evidence with regard to a literary text? Where can I look for the answer to this question?
Is Rommel as pessimistic about the potential of literary computing as he seems to be? Is there general agreement that literary computing has yet to add anything to the traditional study of literature? (Also see McGann.)
Rommel states that "the question of 'method' remains at the heart of most electronic analysis of literature." What does Rommel mean by 'method'?

Ch. 14: Classification and its Structures

Why does Sperberg-McQueen remain so theoretical in his discussion of classification? Wouldn't it be more helpful to give examples or to explain how to apply the methods of classification he describes?

Ch. 16: Marking Texts of Many Dimensions

Jerome McGann describes the process of digitizing text as 'translating.' What conclusions can be drawn from McGann's use of the word translating to describe the process of digitizing text, or from the similar way in which translators and digital humanists characterize their practices as more careful than those of others who engage with texts?
Like Rommel, McGann seems skeptical about the potential for tagging because it only addresses the linguistic dimension of a text. He goes so far as to say that "... computer markup as currently imagined handicaps or even baffles altogether our moves to engage with the well-known dynamic functions of textual works." How valid is his criticism?

Ch. 18: Electronic Texts - Audiences and Purposes

Willett's chapter suffers drastically from its age. What is the standard for relevant (current) research in digital humanities?
Willett calls the question about how the computer can "aid in literary criticism" unresolved. Do we have a better idea about the answer now than we did in 2004?

A Companion to Digital Humanities, Ch. 8: Literary Studies

A Companion to Digital Humanities
Ch. 8: Literary Studies
Author: Thomas Rommel

Rommel's first paragraph contains an intriguing sentence:
"But the analysis of literature is traditionally seen as a subjective procedure. Objectivity, based on empirical evidence, does not seem to figure prominently in studies that elucidate meaning from literary texts."

Rommel is describing a phenomenon that represents my biggest problem with traditional literary criticism: I consider the systematic collection of empirical evidence to be a prerequisite to any attempt to explicate, if not to elucidate, meaning in a text of any kind. Herein lies one of the great appeals of an approach such as literary computing. However, Rommel's pessimism about the potential of literary computing is a bit discouraging.

Rommel purports to address a question that he never actually answers: What does it mean to collect empirical evidence with regard to a literary text? It is a question that seems to demand a quantitative approach, in the sense of identifying elements of the text that demonstrate common characteristics. Rommel does mention the use of computers to identify "strings and patterns in electronic texts." This type of analysis is far more systematic than the traditional, subjective analysis of literature.
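Rommel's "strings and patterns in electronic texts" can sound abstract. One long-standing form of this kind of pattern-finding in literary computing is the keyword-in-context (KWIC) concordance, which lists every occurrence of a word or pattern together with a little surrounding text. The Python sketch below is a minimal illustration under my own assumptions (a regular expression as the search pattern, case-insensitive matching, a fixed character window on each side); it is not a tool Rommel describes, and the function name kwic and its parameters are hypothetical.

```python
import re


def kwic(text, pattern, width=30):
    """Keyword-in-context concordance: one line per regex match, showing up to
    `width` characters on each side, with the match itself in brackets."""
    flat = " ".join(text.split())  # collapse line breaks so each hit fits on one line
    lines = []
    for m in re.finditer(pattern, flat, flags=re.IGNORECASE):
        left = flat[max(0, m.start() - width):m.start()]
        right = flat[m.end():m.end() + width]
        # Right-align the left context so the matches line up in one column.
        lines.append(f"{left:>{width}}[{m.group(0)}]{right}")
    return lines


if __name__ == "__main__":
    sample = ("It was the best of times, it was the worst of times, "
              "it was the age of wisdom, it was the age of foolishness")
    for line in kwic(sample, r"\bage\b", width=20):
        print(line)
```

Lining every match up in a single column is the point of the format: a reader can scan all the contexts of a word at once, which is one modest, systematic way of gathering textual evidence.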

But Rommel seems to believe that literary computing, which has now been engaged in systematic analysis of literary texts for more than twenty years, has yet to add anything to the traditional study of literature. Rommel repeatedly declares that literary computing can only access the "surface features of texts." He bemoans the fact that it has "never really made an impact on mainstream scholarship," that it "remains a marginal pursuit," and that it has added "significant insight" only "in a very narrow spectrum of literary analysis." He quotes McGann's 2001 statement that literary computing will not be taken seriously until digital humanists demonstrate that their tools and methods "expand our interpretational procedures."

There are moments in his chapter when Rommel acknowledges the potential of literary computing, such as his allusion to the ability of markup to embody the encoder's interpretation of the text. Near the end of his chapter, Rommel also mentions that "Numerous studies of individual, and collections, of texts show that empirical evidence can be used productively for literary analysis." However, the overall tone of Rommel's article indicates that he is rather pessimistic about the potential for literary computing.

Mission Statement

I am beginning this blog as I begin to study for my qualifying exams for the PhD in Humanities at the University of Texas at Dallas. I will be taking exams in three fields: Digital Humanities, the History of Print Culture, and 20th Century Rhetorical Theory.