Warren Sack

Conversation Map

Graphical browser for very large-scale conversations (VLSCs)

[Figure: the Conversation Map interface]

Content Description

The Conversation Map system can analyze several thousand messages at a time. It employs a set of techniques from computational linguistics and sociology to generate a graphical summary of the messages. The graphical summary includes:

* a set of social networks that illustrates who is corresponding with whom;

* a menu of themes of discussion that are important to the conversation embodied in the messages; and,

* a semantic network that articulates some of the emergent synonyms or metaphors of the discussion.

One can use the Conversation Map like Netscape Messenger, Outlook, Eudora, or any other conventional news or mail reader. However, the text-analysis procedures are currently too slow: analyzing several thousand messages takes the system several hours. I am re-engineering the system (and redesigning the interface) so that the Conversation Map can be used as an everyday email reader or news browser.

Social Networks: The upper-left quadrant of the interface depicts a set of social networks that record who is corresponding with whom. By "corresponding" I mean who is mutually responding to and/or quoting from whom. By this definition, two participants, say "Sally" and "Spot," correspond with one another if Sally posts to the newsgroup and Spot responds to (or quotes from) Sally's message, and then, later in the discussion, Spot posts to the group and Sally responds to (or quotes from) Spot's message. In the social network, Sally and Spot are represented as two nodes with a line connecting them. If they correspond frequently, the line between them is short; in contrast, pairs of participants who correspond only once are plotted relatively far apart. Note that posters who spam the group with many messages but receive no replies do not show up on the graph at all. Participants who show up closely connected are pushed to the middle of the graph and can be understood as virtual moderators of the newsgroup. I call them "virtual" moderators because most of the analyses I have done have been of unmoderated, public discussion spaces on the Net. To end up in such a position, one needs not only to post many messages but also to have others in the group reply to or quote from many of one's messages. So the social network display acts both as a filter for spammers and as a means to identify some of the main players in a discussion.
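The correspondence rule above can be sketched in a few lines of Python. This is a minimal illustration, not the Conversation Map's own code: it assumes reply and quotation links have already been extracted as (replier, replied_to) pairs, it ignores the "later in the discussion" ordering, and it measures a pair's strength as the smaller of the two directed reply counts. The functions `correspondence_graph`, `edge_length`, and `central_participants` are hypothetical, and ranking participants by total strength is only a stand-in for the layout that pushes closely connected participants to the middle.

```python
from collections import Counter


def correspondence_graph(replies):
    """Build the mutual-correspondence network.

    replies: iterable of (replier, replied_to) pairs, one per reply or quotation.
    Returns {frozenset({a, b}): strength} for every pair of participants who have
    replied to each other in both directions. Strength is the smaller of the two
    directed reply counts (an illustrative assumption).
    """
    directed = Counter((a, b) for a, b in replies if a != b)
    edges = {}
    for (a, b), n_ab in directed.items():
        n_ba = directed.get((b, a), 0)
        if n_ba and a < b:  # keep only mutual pairs, and visit each pair once
            edges[frozenset((a, b))] = min(n_ab, n_ba)
    return edges


def edge_length(strength):
    """Frequent correspondents are drawn close together: length shrinks as strength grows."""
    return 1.0 / strength


def central_participants(edges, k=3):
    """Rank participants by their total correspondence strength."""
    totals = Counter()
    for pair, strength in edges.items():
        for person in pair:
            totals[person] += strength
    return [person for person, _ in totals.most_common(k)]


# Toy data: Sally and Spot correspond twice, Sally and Rex once,
# and "Spammer" sends replies that nobody answers.
replies = [
    ("Spot", "Sally"), ("Sally", "Spot"),
    ("Spot", "Sally"), ("Sally", "Spot"),
    ("Rex", "Sally"), ("Sally", "Rex"),
    ("Spammer", "Sally"), ("Spammer", "Rex"),
]
edges = correspondence_graph(replies)
print(central_participants(edges, k=1))  # prints ['Sally']
```

Because an edge requires replies in both directions, a poster whose messages draw no responses never appears in `edges`, which mirrors the spam-filtering effect described above.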

Themes: The menu in the upper middle of the interface lists the themes of the conversation. Imagine that Sally posts a message about football, and Spot responds with a message that refers to baseball. Then, perhaps later in the discussion, Spot posts a message about skiing and Sally responds with one about skating. This correspondence will be represented in the social network, but an approximation of the theme of their exchange will also be listed in the middle menu. In this case, since football, baseball, skiing, and skating are all sports, the term "sports" might be listed on the menu of themes. Calculating that these four terms are all sports requires, of course, a machine-readable thesaurus. The thesaurus employed in the Conversation Map system is WordNet, a lexical resource created by George Miller, his colleagues, and students at Princeton University (see Fellbaum, 1998). The algorithm for calculating the multi-authored themes is akin to (but not exactly the same as) a set of procedures from computational linguistics designed to analyze the lexical cohesion of single-authored texts (cf. Hirst and St-Onge, 1998).
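The "sports" example can be illustrated by finding the most specific concept that all the terms share in a hypernym (is-a) hierarchy. The sketch below assumes a tiny hand-made hierarchy, `TOY_HYPERNYMS`, in place of WordNet, and its helper names are hypothetical; it shows only the shared-generalization idea, not the Conversation Map's theme algorithm or Hirst and St-Onge's lexical-chaining procedure. With real WordNet data, NLTK's `Synset.lowest_common_hypernyms` method performs the pairwise version of this lookup.

```python
# A tiny, hand-made stand-in for WordNet's hypernym ("is-a") hierarchy:
# each concept maps to its parent, and the root maps to None.
TOY_HYPERNYMS = {
    "football": "field_game",
    "baseball": "field_game",
    "field_game": "sport",
    "skiing": "sport",
    "skating": "sport",
    "sport": "activity",
    "activity": None,
}


def ancestors(concept, hypernyms):
    """Return [concept, parent, grandparent, ...] up to the root."""
    chain = []
    while concept is not None:
        chain.append(concept)
        concept = hypernyms.get(concept)
    return chain


def common_theme(words, hypernyms):
    """Most specific concept that is an ancestor of (or equal to) every word."""
    chains = [ancestors(w, hypernyms) for w in words]
    shared = set(chains[0]).intersection(*chains[1:])
    # The first chain runs from specific to general, so the first shared
    # concept met while walking it is the most specific common one.
    for concept in chains[0]:
        if concept in shared:
            return concept
    return None


print(common_theme(["football", "baseball", "skiing", "skating"], TOY_HYPERNYMS))  # sport
print(common_theme(["football", "baseball"], TOY_HYPERNYMS))  # field_game
```

Note how narrowing the input changes the answer: football and baseball alone share the more specific concept `field_game`, while adding skiing and skating forces the theme up to `sport`.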

Semantic Network: The calculations performed to create the semantic network shown in the upper right-hand corner do not use a thesaurus; rather, they automatically generate a rough-draft thesaurus. To do so, the Conversation Map system proceeds as follows. First, the content of all of the messages exchanged during the conversation is parsed; that is, the subject, verb, object, and some other modifying relations between the words of each sentence in the messages are identified. Next, a profile is built for each unique noun mentioned in the corpus of messages. By "profile" I mean a vector that records all of the verbs for which the noun functioned as a subject, all of the verbs for which it functioned as an object, all of the adjectives that modified it, etc. Once a profile has been calculated for each noun, the profiles are compared to one another and each noun's nearest neighbor is identified. An algorithm described in Grefenstette (1994) is used to calculate and compare the noun profiles. If two nouns have similar profiles, then they can be said to have been "talked about" in similar ways by the participants in the discussion, and so they may be considered synonyms or possibly metaphors for one another. In the semantic network, two nouns that are nearest neighbors are plotted as two nodes connected by a line.
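The profile-and-nearest-neighbor step can be sketched as follows. This is an assumption-laden illustration: the parser output is taken to be (noun, relation, word) triples with made-up relation labels such as `"subj-of"`, `"obj-of"`, and `"adj"`, and plain cosine similarity over feature counts stands in for the similarity measure described in Grefenstette (1994). The function names are hypothetical, and the toy data echo Lakoff and Johnson's building/argument example discussed below.

```python
import math
from collections import Counter, defaultdict


def build_profiles(triples):
    """Turn parser output into one feature profile per noun.

    triples: (noun, relation, word) tuples, e.g. ("argument", "subj-of", "collapse")
    for "The argument collapsed". Returns noun -> Counter of (relation, word) features.
    """
    profiles = defaultdict(Counter)
    for noun, relation, word in triples:
        profiles[noun][(relation, word)] += 1
    return profiles


def cosine(p, q):
    """Cosine similarity between two sparse feature-count profiles."""
    dot = sum(p[f] * q[f] for f in p.keys() & q.keys())
    norm = math.sqrt(sum(v * v for v in p.values())) * math.sqrt(sum(v * v for v in q.values()))
    return dot / norm if norm else 0.0


def nearest_neighbors(profiles):
    """Map each noun to the other noun whose profile is most similar to its own.

    Nouns that share no features with any other noun get no neighbor.
    """
    nouns = sorted(profiles)
    neighbors = {}
    for noun in nouns:
        others = [m for m in nouns if m != noun]
        if not others:
            continue
        best = max(others, key=lambda m: cosine(profiles[noun], profiles[m]))
        if cosine(profiles[noun], profiles[best]) > 0:
            neighbors[noun] = best
    return neighbors


# Toy parser output: "building" and "argument" are talked about in the same ways.
EXAMPLE_TRIPLES = [
    ("building", "adj", "shaky"), ("building", "subj-of", "collapse"),
    ("building", "obj-of", "construct"), ("building", "obj-of", "support"),
    ("argument", "adj", "shaky"), ("argument", "subj-of", "collapse"),
    ("argument", "obj-of", "construct"), ("argument", "obj-of", "support"),
    ("sandwich", "adj", "tasty"), ("sandwich", "obj-of", "eat"),
]
neighbors = nearest_neighbors(build_profiles(EXAMPLE_TRIPLES))
# Each noun and its nearest neighbor become a linked pair in the semantic network.
links = {frozenset((n, m)) for n, m in neighbors.items()}
print(neighbors["argument"], neighbors["building"])  # building argument
```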

Why is this sort of analysis of interest for navigating very large-scale conversations? To answer this question, I compare it with some work done by the cognitive scientists George Lakoff and Mark Johnson. Lakoff and Johnson wrote a book entitled Metaphors We Live By (Lakoff and Johnson, 1980), which is filled with metaphors that they claim are central to our (presumably North American, English-speaking) culture. For instance, they claim that one emergent metaphor of our culture is that arguments are buildings. As part of their method for arguing the validity of insights like this, they show how two nouns that might a priori be considered completely unlike one another show up in very similar contexts. For example, one can say "The building is shaky" but one can also say "The argument is shaky." One can say "The building collapsed" but also "The argument collapsed." Similarly, both buildings and arguments can be said to "have foundations," "stand," "fall," "be constructed," "be supported," "be buttressed," etc. A set of similar sentences of this sort provides an empirical means for thinking about and discovering how synonyms and metaphors are produced over the course of a large amount of discussion.

Thus, this tool for automatic, rough-draft thesaurus generation gives one the means to begin generating the sorts of hypotheses that Lakoff and Johnson explore in their book. Alternatively, one can understand the noun profiles and semantic networks in Michel Foucault's terms, namely as "statements" and "diagrams," respectively. Gilles Deleuze offers an explanation of these terms (Deleuze, 1988). So the Conversation Map provides data exploration and navigation tools for beginning to understand how conversations differ from one another according to the metaphors, synonyms, and "statements" produced by the collective efforts of their participants.

Message Archive: The lower half of the interface is a graphical representation of all of the messages that have been analyzed by the Conversation Map system. Messages are organized into threads. A thread is defined as an initial post, all of the responses to the initial post, all of the responses to those responses, etc. The threads are plotted like spider webs: the initial post is represented as a large node at the center, and the responses, responses to responses, etc. radiate out from it. Double-clicking on a message thread in the lower half of the interface displays a larger picture of that thread.
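Grouping messages into threads and drawing them as spider webs can be sketched as a breadth-first walk over the reply tree followed by a simple radial layout. The sketch assumes each message is given as a (msg_id, parent_id) pair, with `None` marking an initial post; the function names and the ring-per-depth layout (radius equal to reply depth, nodes spaced evenly around each ring) are illustrative assumptions, not a description of the Conversation Map's drawing code.

```python
import math
from collections import defaultdict


def build_threads(messages):
    """Group messages into threads and record each message's reply depth.

    messages: list of (msg_id, parent_id) pairs; parent_id is None for an initial post.
    Returns {root_id: {msg_id: depth}}, where depth 0 is the initial post,
    1 a response, 2 a response to a response, and so on.
    """
    known = {msg for msg, _ in messages}
    children = defaultdict(list)
    roots = []
    for msg, parent in messages:
        if parent is None or parent not in known:
            roots.append(msg)  # initial post, or a reply whose parent was not collected
        else:
            children[parent].append(msg)
    threads = {}
    for root in roots:
        depths, frontier = {root: 0}, [root]
        while frontier:  # breadth-first walk, one reply level per pass
            next_level = []
            for msg in frontier:
                for child in children[msg]:
                    depths[child] = depths[msg] + 1
                    next_level.append(child)
            frontier = next_level
        threads[root] = depths
    return threads


def radial_layout(depths):
    """Spider-web layout: the initial post sits at the origin and each reply
    level sits on a ring whose radius equals its depth."""
    rings = defaultdict(list)
    for msg, depth in depths.items():
        rings[depth].append(msg)
    positions = {}
    for depth, members in rings.items():
        for i, msg in enumerate(sorted(members)):
            angle = 2 * math.pi * i / len(members)
            positions[msg] = (depth * math.cos(angle), depth * math.sin(angle))
    return positions


# Two threads: "a" with replies b and c (and d replying to b), and a lone post "e".
threads = build_threads([("a", None), ("b", "a"), ("c", "a"), ("d", "b"), ("e", None)])
positions = radial_layout(threads["a"])
```

Treating a reply whose parent is missing from the collection as a new root is one simple choice; it keeps every analyzed message visible in some thread.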

The lower half of the screen is divided into a grid, and the threads are arranged in chronological order from upper left to lower right. A thread that contains many messages shows up as an almost completely green square on this display; a thread that contains few messages shows up as an almost completely black square. So, scanned from upper left to lower right, the lower half of the screen serves as a rough guide to the posting activity in the newsgroup over time.
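The chronological grid can be sketched as a mapping from each thread to a cell and a shade. The sketch assumes each thread is summarized as (start_time, message_count) and, as an illustrative choice, scales the green fraction linearly against the largest thread; the actual color mapping used by the Conversation Map is not specified here, and `grid_cells` is a hypothetical name.

```python
def grid_cells(threads, columns):
    """Lay threads out chronologically on a grid and shade each cell.

    threads: list of (start_time, message_count) pairs.
    Returns [(row, col, green_fraction)] filled left to right, top to bottom,
    oldest first. green_fraction is the thread's message count divided by the
    largest thread's count, so 1.0 is the most green and values near 0 are
    nearly black (a linear scale, assumed for illustration).
    """
    ordered = sorted(threads, key=lambda t: t[0])
    biggest = max((count for _, count in ordered), default=0) or 1  # avoid dividing by zero
    cells = []
    for i, (_, count) in enumerate(ordered):
        row, col = divmod(i, columns)
        cells.append((row, col, count / biggest))
    return cells


print(grid_cells([(3, 10), (1, 40), (2, 4)], columns=2))
# prints [(0, 0, 1.0), (0, 1, 0.1), (1, 0, 0.25)]
```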
(Warren Sack)

The text above has been extracted from: Warren Sack, "What Does a Very Large-Scale Conversation Look Like?" 2001. http://www.sims.berkeley.edu/~sack/SIGGRAPH01/wsack.html. This paper provides a full project description.
See also: http://www.sims.berkeley.edu/~sack/CM