Assessing the Structure of Communication on the World Wide Web
Michele H. Jackson
Department of Communication
Florida State University
Table of Contents
- Hypertext: The Vision and the Emerging Reality
- Representing Structure
- Interpreting Structure
- About the Author
This paper examines closely the nature of the hypertext link as a communication tool for Web designers and authors. The strategic nature of the link raises important questions for the representation and interpretation of Web structure. Network analysis is suggested as a methodology that can be used by researchers investigating the World Wide Web from a communication perspective.
Introduction
Not only has the World Wide Web grown dramatically since its inception, but so have the uses to which the Web has been put. Once simply a means of accessing information stored across various platforms, the Web is now a widely used medium for communication. Though there exists a well-developed body of research investigating computer-mediated communication [Jackson, 1996], the analysis of Web-based communication (WBC) is in its early stages. This article contributes to efforts in this area by proposing a method for investigating the structure of WBC.
The basic structural element of WBC is the hypertext link. These links do more than define the means for moving from one location or document to another. They offer a new strategy for structuring communication. It is possible that such structuring may carry unique implications for the nature and consequences of human communication. Investigating this theoretical question, however, requires that we have a means for representing and interpreting the structure of WBC. Thus, the prior and more basic problem is methodological, that is, how do we study the Web as a unique medium for communication?
In response to this question, this paper develops two independent but complementary arguments. First, I lay out in detail the claim that a methodological problem exists at all. Second, the standard techniques of network analysis are proposed as one possible set of tools for Web analysis, along with some suggestions for how this methodology might prove useful for identifying directions for theory development.
Hypertext: The Vision and The Emerging Reality
The hypertext-like nature of the World Wide Web introduces specific methodological problems for communication research, apart from any theoretical problems or opportunities. There are at least two reasons this is the case. First, the Web, in a strict, technical sense, does not use hypertext. Second, communication, as it is structured on the Web through the use of hypertext links, indeed may be substantively unique. This section takes up each of these reasons.
For anyone familiar with the Web, the claim that it does not use hypertext may cause some raised eyebrows. Yet, strictly speaking, it doesn't. The Web relies on "mark-up" tags, which, taken together, resemble programming languages, capable of performing functions similar to hypertext. Lest this seem like splitting hairs, let us consider the differences between hypertext and hypertext links in more detail. Hypertext began as a vision, sometimes traced to [Vannevar Bush's (1945)] Memex machine and, more often, to the work of [Theodor Nelson (1967)]. The vision was one of the complete interlocking of all texts or information in whatever medium (including text, images, and sound). For Bush, this interlocking structure would allow scientists to organize ever-expanding amounts of information. The Memex would allow individual scientists to organize and retrieve their collected material through a personalized associative structure, rather than through traditional hierarchical cataloguing or indexing. Whereas for Bush, the Memex was primarily a tool for organization, for Nelson, hypertext was a tool for the expression and development of ideas. On a general level, Nelson saw hypertext as removing the confines of linearity imposed on ideas by existing media. In hypertextual expression, ideas may branch in several directions, and paths through these ideas are followed and created by the reader, who also becomes author. A hypertext document, therefore, cannot be recreated on a conventional page of linear text.
Both Bush and Nelson acknowledged that, at the time of their initial conceptualizations, the technological capacity for hypertext did not exist, though they were confident that one day it would. Indeed, Bush spelled out rather precisely how the Memex might be constructed given the existing and emerging technologies of his time, and Nelson worked actively toward his vision, for example, in Project Xanadu. The absence of an existing technological framework allowed for an articulation of hypertext unconstrained by problems of--or opportunities for--practical implementation of the idea. Subsequent authors have taken up the vision or the ideal of hypertext and articulated it further. For example, [Heim (1993)] muses upon Nelson's vision and its implication for new structures and new relations. Technically, hypertext is "push button access to the text of all texts" (p. 9) in which the user may follow an indeterminable number of paths through information. In this associative movement lies the potential for radically altering human thought:
Like the fictional hyperspace, hypertext unsettles the logical tracking of the mind. In both, our linear perception loses track of the series of discernible movements. A hypertext connects things at the speed of a flash of intuition. . . . [H]ypertext supports the intuitive leap over the traditional step-by-step logical chain [Heim, 1993, p.31].
Note, however, that this vision entails the complete and automatic interlocking of text, so that all documents are coexistent, with none existing in a prior or primary relation to any other. To perform as a "flash of intuition," linking must be performed in a manner transparent to both author and reader, who, in this vision, collapse into one. There is no distinction between existing and potential documents: all exist in an "eternal present":
The hypertext link turns out in fact to be much more than a reference tool. The link indicates the implicit presence of other texts and the ability to reach them instantly. It implies the jump. With the jump, all texts are virtually coresident. . . . All texts are virtually present and available for immediate access. The original text is merely the text accessed at the moment, the current center of focus [Heim, 1993, p. 35].
An important point in both Nelson's and Heim's perspective is the transformed position of the reader or the user. Such a transformation is possible only if the user has the true ability to move completely about the material in paths that have not been determined prior to the user's journey. This is technically possible in one of two ways. First, the technology might, indeed, automatically create links between terms or phrases, creating the "text of all texts." For example, the selection of one term or phrase will call up for the user all other instances of that term or phrase. In a crude fashion, this is what Internet search engines presently accomplish. A more sophisticated approach might be a knowledge-based system that would intelligently "learn" to create links to documents based on patterns of user actions. Thus, links could be based on the association of ideas rather than the identification of identical terms. A second way it is possible to support free movement is to design a system in which any user may modify any document or create any link between documents. This is the approach adopted by Landow and his associates in developing Intermedia at Brown University. [Landow (1990)] explicitly extends the vision of completely interlocking documents and, hence, implies the collapse of the independent, isolated document. Intermedia is built on the principle that no document in the system can exist in isolation; every document always is potentially linked with any other document. Traditionally we work with documents separated physically: pages have physical dimensions, books have bindings, sound is recorded on tape or disc. But when any section of any document can be integrated into any other document instantaneously and without regard to specific location, these physical separations are meaningless. Linked documents "collaborate," destroying the authoritative vantage point of the "original text" and the univocal voice of the printed text. 
Placing a text in a network of texts "forces it to exist as part of a complex dialogue" [Landow, 1990, p. 912]. Hypertext renders readers of a document simultaneous authors of that document. Landow argues such a system is highly transformative, redefining the traditional relationships of author and reader or text and commentary by eliminating the boundaries and hierarchies made possible by the presence of an "original" text.
In the move from vision to implementation, every hypertext system requires a means for the user to select the specific information he or she will view at any one time. Because this selection must flow associatively as the user follows the paths that he or she perceives in the material, any hypertext system will employ the link, often called the jump. Recall that in a hypertext system, the notion of individual documents collapses as all material is always simultaneously connected, forming the single universe of the "Document." A link, when activated, will cause the user to "jump" from one specific location in the Document to another. (Or, alternatively, it may cause the material to jump from one location to another so that it may be viewed by the user.) In a true hypertext system, all terms, phrases, images, sounds, and so on, are links. In an automated system, these connections would be transparent to the user. In a collaborative system, such as Intermedia, any user would be able to "define" a link at any time.
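As a rough illustration of the automated approach mentioned earlier, in which selecting a term calls up every other instance of it (the "crude fashion" of search engines), one can build an inverted index from words to documents. This is a minimal sketch under stated assumptions: a tiny in-memory collection with hypothetical document names, exact word matching, and none of the ranking or knowledge-based association a real system would need.

```python
import re
from collections import defaultdict


def build_index(documents):
    """Map each lower-cased word to the set of document names that contain it."""
    index = defaultdict(set)
    for name, text in documents.items():
        for word in re.findall(r"[a-z]+", text.lower()):
            index[word].add(name)
    return index


def automatic_links(index, term, current=None):
    """Return every document containing `term`, excluding the one being read."""
    return sorted(index.get(term.lower(), set()) - {current})


# Hypothetical collection; names and texts are illustrative only.
docs = {
    "memex": "Bush imagined the Memex as an associative organizer.",
    "xanadu": "Nelson pursued hypertext in Project Xanadu.",
    "intermedia": "Intermedia let every reader create a hypertext link.",
}
index = build_index(docs)
print(automatic_links(index, "hypertext", current="xanadu"))  # ['intermedia']
```

No author places these links; they follow mechanically from the words in the texts, which is the sense in which such connections would be "transparent" to the user.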
In terms of the argument developing here, the important point is this: the link is not the essence of hypertext. It is a mechanism to implement hypertext. It is a tool for navigating through the Document. The essence of hypertext is the view of the unified Document and the importance of this Document in breaking down constraints posed by having to interact with physically separate individual documents. Each reader or user is able to interact with any material however he or she wishes. In a hypertext world open to several users, no one holds proprietary claims and, therefore, no one can control by fiat the content or structure of material. If another method were found to implement the hypertext vision, abandoning the link would not be problematic.
In contrast to its mere practical role in a hypertext system, the link is the essence of the World Wide Web. Under the guidance of Tim Berners-Lee, the link became the mechanism through which information could be passed across otherwise incompatible systems, platforms, and networks.
Berners-Lee conceptualized and led the development of the World Wide Web at CERN, the European Laboratory for Particle Physics, in 1990. A proposal circulated in 1989 suggests that he was very familiar with Nelson's vision of hypertext. Berners-Lee saw, in hypertext, a means for managing the large amounts of information generated and used by CERN projects:
In providing a system for manipulating this sort of information, the hope would be to allow a pool of information to develop which could grow and evolve with the organisation and the projects it describes. For this to be possible, the method of storage must not place its own restraints on the information. This is why a "web" of notes with links (like references) between them is far more useful than a fixed hierarchical system. When describing a complex system, many people resort to diagrams with circles and arrows. Circles and arrows leave one free to describe the interrelationships between things in a way that tables, for example, do not. The system we need is like a diagram of circles and arrows, where circles and arrows can stand for anything . . . The system must allow any sort of information to be entered. Another person must be able to find the information, sometimes without knowing what he (sic) is looking for. [Berners-Lee, 1990]
Though acknowledging the philosophical vision of hypertext, Berners-Lee's proposal centered on developing a system appropriate to the CERN context. Thus the philosophical turned practical: "To be a practical system in the CERN environment, there are a number of clear practical requirements." The system had to do the following:
- accommodate a network of heterogeneous systems,
- operate without any central control or coordination,
- provide access to existing databases,
- allow the establishment of "private" links to and from "public" information,
- have a minimum of "bells and whistles,"
- support data analysis, and
- support links to "live," non-static data.
For Berners-Lee, the success of his "web" would be measured by whether or not it was used, and his proposal suggests that he was well aware of the problems new systems face. Critical decisions were made to ensure this success. First, the software for storing information was separated from the software for displaying information. Berners-Lee referred to this division as important for CERN (which possessed very heterogeneous systems) and "a boon for the world in general." Thus were born the "browser" and the notion of client-defined (rather than server-defined) information display. An important implication of this decision is that the user is separated from the data. Indeed, the second critical decision was that, at least in the initial stages of development, the user would be given read-only access to information. Thus, users could construct their own documents on their own systems to link to various material accessible to the "web," but they could not modify documents "owned" by other users or other systems. In direct contrast to the vision of hypertext, ownership, and therefore separation, of documents was preserved.
By October 1990, the contours of the World Wide Web were beginning to gel. Tim Berners-Lee and Robert Cailliau submitted a more detailed proposal for a "WorldWideWeb" hypertext project [Berners-Lee & Cailliau, 1990]. The primary motive for such a project, as laid out in their introduction, was the heterogeneity of computing systems at CERN. The incompatibilities of system and tool platforms meant that data were difficult to find and to access. A hypertext-based system would allow easy and transparent movement between various systems and would enable the linking of related data, such as project names and team members, even though such data may reside in separate databases, that is, in physically separate locations. It is clear in this later proposal that Nelson's vision of the universal document had been eclipsed by Bush's vision of the universal organizer.
Further references indicate even greater movement away from a true hypertext vision. First, the ideal of completely interlocking text is abandoned, as is the notion of the undetermined path. Instead, well-planned links will allow the user to reach specific information efficiently:
The web is also not complete, since it is hard to imagine that all the possible links would be put in by authors. Yet a small number of links is usually sufficient for getting from anywhere to anywhere else in a small number of hops. [Berners-Lee & Cailliau, 1990]
For Nelson, hypertext structure was integral to the expression and development of ideas. In the CERN project, it is a means to find and access desired information. Users go into the web not to embark on a journey, but to find something and to find it quickly and efficiently.
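The claim that a few links suffice "for getting from anywhere to anywhere else in a small number of hops" is a property of the link graph that can be checked directly. A minimal sketch, assuming a hypothetical set of pages and hand-placed links, counts the fewest links a reader must follow between two pages using breadth-first search.

```python
from collections import deque


def hops(links, start, goal):
    """Fewest link-follows from start to goal, or None if goal is unreachable."""
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        page, distance = queue.popleft()
        if page == goal:
            return distance
        for target in links.get(page, ()):
            if target not in seen:
                seen.add(target)
                queue.append((target, distance + 1))
    return None


# Hypothetical web: a handful of planned links, not every possible one.
links = {
    "home": ["projects", "people"],
    "projects": ["detector"],
    "people": ["cailliau", "berners-lee"],
    "detector": ["berners-lee"],
}
print(hops(links, "home", "berners-lee"))  # 2
print(hops(links, "detector", "home"))     # None: no link leads back
```

The second result illustrates the efficiency-oriented design: reachability depends entirely on which links authors chose to place.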
A second move away from a true hypertext vision is that the roles of authors/designers and users/readers are maintained. Browsers are developed simply to read and display information, rather than create information or, more critically, create links. Browsing is defined, in the fundamental architecture of the Web, as distinct from creating or designing. A user may not be, simultaneously, both a reader and a writer on the Web. In fact, the user does not even control the display of information. It is the browser, the software program itself, that negotiates with the user "what format is acceptable for display on the user's screen."
A third move away is that the concept of ownership and separation of material is maintained. Though it appears otherwise to the user, a browser does not "jump" the user from one place to another. Instead, when a link is activated, the browser requests a copy of some packet of information from a computer, a document which might consist of text, an image, an executable program, or whatever else might be stored in a computer file. If that computer is a "Web server," meaning it has been configured to be recognized by Web browsers, and if that server grants the browser permission to access that information (it might not), only then will a copy be sent to the user's computer to be read and displayed by the browser. Links make this negotiation between client and server transparent to the user:
A link is specified as an ASCII string from which the browser can deduce a suitable method of contacting an appropriate server. When a link is followed, the browser addresses the request for the node to the server. The server therefore has nothing to know about other servers or other webs and can be kept simple. [Berners-Lee & Cailliau, 1990]
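The "ASCII string" in this passage corresponds roughly to what is now called a URL. As an illustration, Python's standard `urllib.parse.urlsplit` divides such a string into the parts a browser needs: the scheme (which protocol to use), the host and port (which server to contact), and the path (which document to request). The example URL and the default-port table are assumptions of this sketch, not part of the original proposal.

```python
from urllib.parse import urlsplit

# Assumed defaults for illustration; a real browser knows many more schemes.
DEFAULT_PORTS = {"http": 80, "https": 443, "ftp": 21}


def contact_plan(link):
    """Deduce from a link string how to reach its server and what to ask for."""
    parts = urlsplit(link)
    return {
        "method": parts.scheme,          # protocol to speak
        "server": parts.hostname,        # machine to contact (lower-cased)
        "port": parts.port or DEFAULT_PORTS.get(parts.scheme),
        "document": parts.path or "/",   # resource to request from that server
    }


print(contact_plan("http://info.example.org/hypertext/WWW/TheProject.html"))
```

Everything the browser needs is in the string itself, so the server named there requires no knowledge of other servers, consistent with the quotation's point that it "can be kept simple."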
Documents are owned, and whoever owns them may elect to remove them from the Web at any time, or change their location, or their name, creating "dead links" in other documents that point to them. There is no central repository of material accessible by Web browsers, which means that authors have no way of knowing for certain the other documents to which they might link. There is no signed agreement that owners must make their information accessible to all for all time. It is possible to have a completely self-contained and closed set of documents that might never be accessed because no other documents link to them. Divisions on the Web are identified by the phrase "Web site," denoting territory or property. In fact, current advice widely given to people designing their own Web sites is that they continually change their information so that people will want to revisit, or that they form alliances with other site owners and agree to link to one another in "web rings." Other empire-building techniques undoubtedly will continue to surface.
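Because no central repository records which documents exist, a linking page is never told when its target disappears. A minimal sketch, assuming a hypothetical snapshot of which documents currently exist, lists the links whose targets are gone.

```python
def dead_links(links, existing):
    """Return (source, target) pairs whose target document no longer exists."""
    return sorted(
        (source, target)
        for source, targets in links.items()
        for target in targets
        if target not in existing
    )


# Hypothetical site snapshot: the owner has removed /old-news.html.
links = {
    "/index.html": ["/about.html", "/old-news.html"],
    "/about.html": ["/index.html"],
}
existing = {"/index.html", "/about.html"}
print(dead_links(links, existing))  # [('/index.html', '/old-news.html')]
```

In practice the `existing` set would have to be discovered by requesting each target from its server, since only the owner of a document knows whether it is still there.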
We might depict the progression from hypertext vision to hypertext practice in this way: The hypertext vision, which is independent of any specific technological implementation, is of information perceived by the user to be joined into a single Document, with each user able to wander according to his or her own interests and motivations. To the user, the "WorldWideWeb" facilitates the movement of the user from one document (commonly called a "page" or a "URL") to another. Technically, however, the Web facilitates movement of documents from a server to a client computer, preserving issues of ownership, boundaries, and territory. To the increasing numbers of Web squatters who want to entice users to return to (and stay within) their "site" (the collection of documents over which an owner has complete control), the Web is a collection of destinations. Because these sites increasingly compete with each other for user attention, they are decreasingly likely to pursue or support the original hypertext vision. In fact, it is more common that designers are expected to ask permission before linking their site to another. And in the emerging commercial world of the Internet, one designer might have to pay another in order for the second to establish a link to the first.
From this progression, certain methodological problems emerge for communication researchers. First, it is clear that researchers should not use philosophical discussions of hypertext as the basis for making claims or building theories about characteristics of WBC or Web activity. "Hypertext" is bound up in a theoretical perspective that has not been--and may not ever be--fulfilled technologically in the fundamental architecture of the Web. Rather, researchers need to be able to represent and assess the characteristics of Web structure unencumbered by references to hypertext as the ideal against which all else should be measured. Indeed, the methodology should allow comparisons to other "ideals," including the various models and theories of communication. The second methodological problem comes into focus once the allegiance to the hypertext model is relaxed or abandoned: how do we inquire about the characteristics of WBC? As one line of inquiry, the rest of this paper explores the representation and interpretation of structure.
Representing Structure in Web-based Communication
Most examples of computer-mediated communication (CMC), such as electronic mail or computer conferencing, extend the interactional mode of communication, as in a conversation or a group meeting. In the interactional mode, all participants have equal access to the communication space and participants are not predefined as speakers or audience. Web-based communication, in contrast, primarily is an extension of the presentational mode of communication, as in a public speech, a television broadcast, a newsletter or an advertisement. Web sites are designed primarily to be modified only by persons who "own" them, or in ways sanctioned by their owners (e.g., site visitors may contribute to a comment list, add URLs, or participate in chat using owner-provided interfaces). These owners plan and deliver the communication. This characteristic is reflected in the terminology used to describe WBC. The speaker or communicator is the Web "designer" or "author," and the audience are the "users." In traditional communication media, the speaker structures the presentation of information. Whether a user has the ability to bypass that structure has been a function of the medium itself. For example, a public speech must be heard from beginning to end only once, yet a book could be opened to any page any number of times, and a videocassette could be viewed at any point by fast-forwarding or rewinding through the tape. In WBC, the ability of users to vary the way they access information is not determined by inherent characteristics of the medium, but can be controlled by the designer through the manipulation of web structure. A study of WBC, therefore, can be construed as a study of the relationship between a speaker and an audience.
The ability of the user to move through information on the Web is limited to three means: entering the address of a location that the user already knows, scrolling through a single document, or following a hypertext link. The first means, entering an address, resembles picking up the needle on a phonograph from one place and putting it down in another. Every Web document or "page" possesses a unique identifier called a Uniform Resource Locator (URL). The user instructs a Web browser program, such as Netscape, Internet Explorer, Mosaic, or Lynx, to display information from a specific location as identified by the URL. The direct use of URLs releases the user from limitations on navigation between Web pages as might be dictated by Web designers. Browsers also support this by allowing users to save locations in "bookmarks." The second means of movement is scrolling through a single page. A page may be any length, from a single screen to hundreds. Scrolling through screens would be similar to the experience of reading a book. Scrolling allows unconstrained movement within a single document.
Movement by either of these two methods is under the control of the user. In the first, the user selects the URL to display. In the second, the user determines when to scroll a page to view additional information. In contrast, the third method of movement, following a hypertext link, is controlled by the designer. A link, when activated by the user, will instruct a browser to "jump" from one specific URL to another. User movement is limited and directed by the availability and placement of links. Users cannot control the placement of links. Placing a link requires altering the Web page, which is simply a file on a server for which permission to "write" to the file (for example, to place links within the document) is limited to the author and/or author-designates. Although the designer may provide forms which permit visitors to add links to the site, he or she retains full control over how and when the link is added, and whether it will be retained or deleted.
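Because a link exists only as markup the author has written into the page file, the designer's choices can be read back out of the page. A minimal sketch using the standard library's `html.parser.HTMLParser` collects the target of every anchor tag; the page content is a hypothetical example.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag, in document order."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


# Hypothetical page: two links placed by the designer, one anchor without a target.
page = """<html><body>
<p>Read the <a href="intro.html">introduction</a> first,
then see <a href="http://www.example.com/">a partner site</a>.</p>
<a name="top">anchor with no href</a>
</body></html>"""

collector = LinkCollector()
collector.feed(page)
print(collector.hrefs)  # ['intro.html', 'http://www.example.com/']
```

The order of the list mirrors the order in which the designer placed the links, which is itself part of the structure a user encounters.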
Several Web pages collected together form a Web site. Typically, Web sites share an entry page, and are under the control or ownership of a single designer. Exceptions may exist in large corporate or organizational sites that are complex and involve hundreds of pages. Links may be divided into two general types, internal or external. An internal link takes the user to another document within the same site. An internal link might even take the user to another location on the same Web page, bypassing the use of scrolling, and reinforcing the designer's control of user movement. An external link takes the user to a document at a different Web site, typically not under the control of the author or designer.
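The internal/external distinction can be operationalized by resolving each link against the page it appears on. The sketch below uses Python's standard `urllib.parse` and assumes, as a simplification, that a site is identified by its host name; as noted above, large organizational sites may not fit this assumption. It also separates links to another location on the same page.

```python
from urllib.parse import urldefrag, urljoin, urlsplit


def classify_link(page_url, href):
    """Label a link 'same page', 'internal', or 'external' relative to page_url.

    Assumes one site per host name, which is only an approximation.
    """
    target = urljoin(page_url, href)  # resolve relative links against the page
    if urldefrag(target).url == urldefrag(page_url).url:
        return "same page"  # e.g. "#section2": bypasses scrolling
    if urlsplit(target).hostname == urlsplit(page_url).hostname:
        return "internal"
    return "external"


page = "http://www.example.edu/comm/index.html"  # hypothetical page
for href in ["#section2", "research.html", "/library/", "http://www.other.org/"]:
    print(f"{href:24} {classify_link(page, href)}")
```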
According to conventional wisdom, the use of links defines the Web as a hypertext medium. A significant amount of scholarly attention has been paid to the implications hypertext-based expression might have for the way we think. A hypertextual structure is seen to support nonlinear or multilinear progression of ideas and associative thought ([Fuller & Jenkins, 1995]; [Heim, 1993]; [Landow, 1990]; [Lanham, 1993]). For authors of Internet guides, such as [Krol (1992)] or [Reddick & King (1996)], the Web is an ever-expanding source of interlinked information. From their viewpoint, the Web is a collection or a repository and users are attempting to locate specific information housed within it.
These depictions miss the fundamental strategic character of Web structure. Links are not generated automatically in Web architecture. There are no natural or automatic links between information. (Even search engines use algorithms that must be designed and programmed.) Instead, every link is planned and, most often, specifically created by the web designer. Thus, the presence of a link reflects a communicative choice made by the designer. A link, therefore, is strategic. The possible variations for structure are shaped by communicative ends, rather than technological means. The use of the link in the creation of Web structure enables the designer to control the potential ways a user can move through information. Web designers might choose to use a very limited number of links, or to use them in a traditional indexing fashion, or to use them to encourage linear progression through the material, or to use them conscientiously to approximate an associative experience for the user. Differences in structure reflect differences in communicative agendas.
Once we become critical of the assumption that the Web is a neutral repository of information, the structure of the Web becomes much more interesting. Methodological questions arise relating specifically to Web structure and the use of the link. How do we represent or capture the variations that demonstrate strategic use of structure in Web-based communication? How do we inquire into the communicative implications of this new structure? One possibility is the adaptation of a tool developed to represent the structure of communication processes in a social context: network analysis.
A Methodological Tool: Network Analysis
Network analysis is a methodology for mapping relationships onto a (typically) 2-dimensional space. The methodology is widely used to investigate a range of communication and social phenomena [see Garton, Haythornthwaite & Wellman, this issue, for further discussion]. Even a slight familiarity with the diagrams, or "sociograms," produced by network analysis is sufficient for recognizing the similarity between these maps and the fundamental structure of the World Wide Web. The similarity may be too obvious, of course. The insights of network analysis come from its ability to represent and assess structures that may not be obvious or apparent, yet would still influence communication. In other words, network analysis reveals the structures present in social interaction. By contrast, the structures of Web sites are rather obvious and fairly easy to represent. There is no need to reveal structure, for it is manifest. Yet there remains a need to interpret and evaluate structure. The remainder of this section discusses representation of structure using network analysis. The next section turns its attention to how the representations could be used to assess WBC.
On a surface level, network analysis translates well into the Web environment. As common sense suggests, the Web is itself a network. Representing the structure of a Web site requires identifying the pages composing a site and the set of internal and external links programmed within those pages. On most sites, pages and links are all directly observable. Software programs are available that will download a complete copy of a Web site and automatically construct a map that displays pages and links. These programs change constantly and, undoubtedly, will increase in functionality as new software is developed. Given that pages and links are directly observable, there are none of the problems typically faced by researchers using network analysis: there is no need to rely on recall or observation to collect data, there are no discrepancies among various sources in their reports of relationships, and there is no subjectivity in determining the strength of a specific relationship [Rogers & Agarwala-Rogers, 1976]. Yet the utility of network analysis lies not in the production of a map but in the ability to assess and represent the nature of communication structure. This is precisely the methodological problem that needs to be addressed in the study of Web-based communication.
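Once pages and links have been collected, they can be put into the form network analysis usually starts from: pages as nodes, links as directed ties, recorded in an adjacency (sociomatrix) table. A minimal sketch, assuming a small hypothetical site rather than crawler output:

```python
def adjacency_matrix(links):
    """Return (ordered pages, 0/1 matrix) where matrix[i][j] == 1 if page i links to page j."""
    nodes = sorted(set(links) | {t for targets in links.values() for t in targets})
    position = {page: i for i, page in enumerate(nodes)}
    matrix = [[0] * len(nodes) for _ in nodes]
    for source, targets in links.items():
        for target in targets:
            matrix[position[source]][position[target]] = 1
    return nodes, matrix


# Hypothetical four-page site.
site = {
    "home": ["about", "products"],
    "about": ["home"],
    "products": ["order", "home"],
    "order": [],
}
nodes, matrix = adjacency_matrix(site)
for name, row in zip(nodes, matrix):
    print(f"{name:>8}", row)
```

Row sums give each page's out-degree (links it offers) and column sums its in-degree (links pointing to it); these are the raw material for the structural measures discussed next, not interpretations in themselves.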
The remainder of this section follows rather closely the treatment of network analysis presented by [Knoke and Kuklinski (1982)]. They aim to present a "basic primer designed to guide the interested user" (p. 8), rather than a technical treatise. In following the issues they raise, this article might serve as the first step in building a similar primer for extending network analysis to WBC. Network analysis is treated here on a very basic level. Various directions in contemporary network analysis are reviewed in [Richards & Barnett (1993)] and appear in the journal Social Networks.
Network analysis originates in theoretical perspectives that seek to take seriously the effect of social contexts on actor behavior [Rogers & Agarwala-Rogers, 1976]. The intent is to take a contextual rather than individualistic or atomistic approach. A contextual perspective is consistent with studying WBC, although interrelationships cannot necessarily be assumed to exist for all pages, because it is possible for a document to be accessible to a user but contain no links at all. (The user would access the document by the URL rather than by a link.) Four assumptions underlying network analysis are discussed here, two that carry to analysis of WBC and two that do not.
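In a network representation, the page described in the parenthesis above appears as an isolate, and any page with no incoming links can be reached only by typing its URL. A minimal sketch, with hypothetical page names, flags both cases so they can be reported rather than silently dropped from a map.

```python
def unlinked_pages(pages, links):
    """Split pages into isolates (no links in or out) and URL-only pages (no incoming links)."""
    linked_to = {t for targets in links.values() for t in targets}
    links_out = {p for p, targets in links.items() if targets}
    isolates = sorted(p for p in pages if p not in linked_to and p not in links_out)
    url_only = sorted(p for p in pages if p not in linked_to)
    return isolates, url_only


# Hypothetical site: "draft" has no links at all; nothing links to "archive".
pages = {"home", "about", "draft", "archive"}
links = {"home": ["about"], "about": ["home"], "archive": ["home"]}
print(unlinked_pages(pages, links))  # (['draft'], ['archive', 'draft'])
```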
First, [Knoke and Kuklinski (1982)] suggest that network analysis assumes that "any actor typically participates in a social system involving many other actors, who are significant reference points in one another's decisions" (p. 9). Though Web documents do not have intentionality or the ability to make decisions, it is the case that they may serve as significant reference points. A similar argument is made by [McLaughlin (1996)] in the use of "embeddedness" as a dimension of Web sites: to what extent can a site be constructed so that it seems to be embedded within a system of other related sites, and to what extent do these sites serve as reference points for one another? Embeddedness need not refer only to external links; the concept could apply to the internal structure of a site as well: what is the nature of the system within the site?
The second assumption is that structure can be defined as the regularity of patterns in relationships at various levels. Thus, certain phenomena are not evident when considering individual actors and exist only in the patterns of relationships. Network analysis of WBC must also emphasize the investigation of patterns of relationships. The meaningful appearance of such patterns is crucial to knowing whether network analysis will generate insight into WBC, or whether it will be a trivial transposition of a mapping technique.
Two additional assumptions of social network analysis do not apply to WBC. First, network analysis assumes the dependence of elements within a network. Web structures are independent of one another. The structure of a Web site is not a function of the structure of another site. Changes in structure must be programmed by the designer. Undoubtedly, designers alter their own sites in response to changes in others. However, a change in one site cannot cause a change in structure in another site. Indeed, the problem of "dead links," or links to documents that no longer exist, is evidence of the role of "clockmakers" in the Web view of the universe. The exceptions to this are sites that use database-driven structures to create pages dynamically. These pages are created in response to an immediate request from a user. As more sites utilize dynamic Web page creation, the characteristic of structure independence may lessen significantly.
A second assumption of network analysis is that structures are emergent. Emergence implies that any particular structure may be in various stages of existence and that the development of that structure is the result of the interaction of contextual elements. A link, in contrast, does not emerge. Links are planned, and they do not change on their own. Even sites that create Web pages dynamically are governed by static algorithms and program code. Further, the interaction of contextual elements will not affect the properties of a link. Until removed or changed by the designer, the link will continue to exist whether or not it is ever followed by a user. Nor is a link affected by changes to the Web pages that anchor that link. For as long as the designer maintains the link within the source document, any other changes can be made to either the source document or the target document, and they will not affect the link. If the URL of the target document changes, the link will be "dead," and the user will face an error message when attempting to follow the link. But a dead link reflects designer error rather than emergence.
This is not to say that emergence is not a useful concept for analyzing WBC. Patterns of Web use, for example, may be emergent, changing as users gain more experience with "surfing the Web." Applications of WBC may also be emergent, as the Web expands to an ever increasing number of communication domains, including personal, political, and commercial domains. Structure might even be understood as emergent if the designer, rather than the Web page or Web site, is defined as the node of analysis. However, it remains that the links between documents cannot generate new or unexpected structures. Nor does movement between links generate new links or new configurations of links. Web structure, on a technological rather than social level, is not emergent.
A core set of terms is basic to network analysis:
- actors or nodes,
- relations and attributes,
- network, and
- network structure.
Actors or nodes are those units that possess some attribute that identifies them as "members of the same equivalence class," and, therefore, members of a network. Given that Web elements change only when designed to do so, the term "node" is more appropriate than "actor." In social network analysis, nodes may be people, objects, or events. In Web network analysis, nodes are typically documents, though the unit of analysis might be defined as an entire Web site. In this case, the node would be the site. Whether it is the document or the site that will serve as the node depends on the level of analysis, which is discussed below.
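Because the same link data can support either level of analysis, the choice of node can be made after the data are collected. The following is a minimal sketch in Python, assuming links have already been gathered as (source URL, target URL) pairs and assuming, for illustration only, that a "site" is identified by the host name in its URL. The function name `to_site_level` and the example hosts are hypothetical.

```python
from urllib.parse import urlsplit


def to_site_level(doc_links):
    """Collapse document-level links into links between sites.

    doc_links: iterable of (source_url, target_url) pairs.
    A site is assumed to be identified by the URL's host name.
    Links within one site, and targets without a host (such as
    mailto: links), are dropped at this level of analysis.
    """
    site_links = set()
    for src, dst in doc_links:
        src_site = urlsplit(src).netloc.lower()
        dst_site = urlsplit(dst).netloc.lower()
        if src_site and dst_site and src_site != dst_site:
            site_links.add((src_site, dst_site))
    return site_links
```

Returning a set records only whether two sites are related; a researcher interested in the number of links between sites could count pairs instead.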
A relation is a property of the connection between units of observation. Relations may be assessed along a number of dimensions, such as whether or not they are present between two units, their strength, and their stability. A relation is constituted in a specific context; it is not an intrinsic element of a unit of observation. Attributes, by contrast, remain consistent regardless of context and regardless of whether a unit interacts with other such units. Relations are constituted by those interactions. Patterns of relations create properties in a system that cannot be measured by aggregating the attributes of individual nodes. Internal and external links are relations, while design techniques such as color, text, and formatting are attributes.
A network is the specific type of relation linking the nodes. In social network analysis, it is possible to construct multiple networks within the same set of nodes by attending to differing types of interaction. For example, one might find networks of informal communication, semantic networks, task networks, status networks, and so forth. For Web structures, there really is only one network that might be mapped given any set of units. While there certainly exist different types of links (for example, links to documents, links that send email messages, or links that activate programs or applications), each relation may be represented completely in a single network. There is no need for the isolation of various types of relations into multiple networks. Finally, the network structure is the configuration of ties between nodes. This is the product of the identification of nodes and relations.
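Because every kind of link can sit in one network, link type can be stored as an attribute of each edge rather than as grounds for a separate network. This is a minimal sketch, assuming links are held as a mapping from each document's URL to the list of URLs it links to. The rule that treats paths containing `/cgi-bin/` as program links is a hypothetical heuristic, since a URL alone does not reliably reveal whether it runs a program; the function names are also illustrative.

```python
from urllib.parse import urlsplit


def link_type(url):
    """Rough, assumed classification of a link target by its URL."""
    parts = urlsplit(url)
    if parts.scheme == "mailto":
        return "email"
    if "/cgi-bin/" in parts.path:  # hypothetical heuristic for program links
        return "program"
    return "document"


def typed_network(graph):
    """A single network: each link becomes one (source, target, type) edge."""
    return [
        (src, dst, link_type(dst))
        for src, targets in graph.items()
        for dst in targets
    ]
```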
Network analysis requires that the researcher collect information from all nodes identified as elements of a network [Rogers and Agarwala-Rogers, 1976]. Every node omitted from the analysis reduces the number of links that potentially might be identified by N-1. Such omissions could have a significant effect on the pattern that is identified within the network, particularly as network size increases. Planned limitation of data collection through a sampling procedure can therefore result in significant problems for data analysis, and it is preferable to collect data from the entire population. This requirement is a central problem for social network analysis, but it is a straightforward matter in Web network analysis. A researcher may browse completely through a site, following all links, to collect data (the visited documents will be housed in the browser's cache directory, which can be read by various cache-reading programs), or may use software that downloads a copy of an entire site with the site structure intact.
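Once a copy of a site is on hand, the complete set of nodes and links can be extracted mechanically. The following is a minimal sketch using only Python's standard `html.parser` and `urllib.parse` modules. It assumes the pages have already been loaded as a mapping from each document's URL to its HTML source, and it considers only `<a href>` links; `LinkExtractor` and `build_link_graph` are illustrative names.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin


class LinkExtractor(HTMLParser):
    """Collects the href of every <a> tag, resolved against a base URL."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    # Resolve relative links and drop "#fragment" parts,
                    # so links to anchors count as links to the document.
                    absolute, _fragment = urldefrag(urljoin(self.base_url, value))
                    self.links.append(absolute)


def build_link_graph(pages):
    """pages: dict mapping each document's URL to its HTML source.

    Returns a dict mapping each URL to the list of URLs it links to.
    Duplicate links are kept, since link counts matter for strength.
    """
    graph = {}
    for url, html in pages.items():
        parser = LinkExtractor(url)
        parser.feed(html)
        parser.close()
        graph[url] = parser.links
    return graph
```

Links created by other means (image maps, frames, or scripts) would need additional handling before the data could be treated as complete.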
Next, the researcher should assess the basic properties of relations. Common properties are relational strength and reciprocality. In Web network analysis, strength may be conceived of as the likelihood that following a link in one document (A) will take a user to a second document (B). Strength may be represented by the ratio of the number of links to B in A to the total number of links in A. Reciprocality is a property of communication flow and indicates whether a relation is unidirectional or bidirectional. Most links are unidirectional. Certain types of links could be considered bidirectional, such as a "mailto:" link that prompts user input that is then sent as an electronic mail message. If the unit of analysis is the document, then bidirectionality could also be defined as whether document A links to document B and document B links to document A.
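Both properties can be computed directly from a link graph stored as a mapping from each document to the list of documents it links to. In this minimal sketch, reading strength as a likelihood assumes that a user picks among a page's links uniformly at random; the graph layout and function names are assumptions for illustration.

```python
from collections import Counter


def link_strength(graph, a, b):
    """Links to B in A divided by all links in A: the chance that a user
    who follows one of A's links, chosen at random, arrives at B."""
    links = graph.get(a, [])
    if not links:
        return 0.0  # a document with no links has no outgoing relations
    return Counter(links)[b] / len(links)


def is_reciprocal(graph, a, b):
    """Document-level bidirectionality: A links to B and B links to A."""
    return b in graph.get(a, []) and a in graph.get(b, [])
```

For a document A with four links, two of which point to B, `link_strength(graph, "A", "B")` is 0.5.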
Once constructed, a network structure may be analyzed on several levels. Following the discussion of [Rogers & Agarwala-Rogers (1976)], three levels are considered here: node level, clique or group level, and systems level.
Node level analysis compares structural properties of the nodes within the network. An analysis on the node level would be appropriate for investigating claims regarding the nature of Web pages. An example of a measure that might be used in node level analysis is the integration of a node's personal network. Given that a particular node (A) is connected directly to a set of nodes (X), the pattern of those connections constitutes the node's personal network. Integration of that network is calculated as the ratio of actual links among the nodes in set X to possible links among those nodes. The resulting index may range from highly interlocking to radial. For Web network analysis, the integration of a page's personal network might serve as an indicator of the page's position in a site. A radial network suggests a single document is a launching point or entry point into a site, thus serving a gatekeeping function, while an interlocking structure suggests that each document within the set plays a role or performs a function that is highly dependent on its context in a specified set of documents. Analysis of server log files to determine patterns of user entry and paths through the site can indicate whether planned and observed navigation patterns actually coincide.
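A minimal sketch of the personal-network integration index follows. It assumes the same mapping from each document to the documents it links to, counts directed links (so n nodes allow n(n-1) links), and treats a node as directly connected to A if a link runs between them in either direction. These are assumptions; an undirected count would be equally defensible, and the function name is hypothetical.

```python
def personal_network_integration(graph, ego):
    """Ratio of actual to possible directed links among the nodes
    directly connected to `ego` (links in either direction)."""
    # X: documents ego links to, plus documents that link to ego.
    neighbors = set(graph.get(ego, []))
    neighbors |= {src for src, targets in graph.items() if ego in targets}
    neighbors.discard(ego)
    n = len(neighbors)
    if n < 2:
        return 0.0  # undefined with fewer than two neighbors; reported as 0.0
    actual = sum(
        1
        for src in neighbors
        for dst in set(graph.get(src, []))
        if dst in neighbors and dst != src
    )
    return actual / (n * (n - 1))
```

For a hub whose neighbors link only back to it, the index is 0.0 (radial); if those neighbors also all link to one another, it is 1.0 (highly interlocking).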
A node level analysis may seem to contradict the point made above, that network analysis adopts a contextual rather than atomistic perspective. While such analysis allows the researcher to draw conclusions about nodes, it focuses on patterns and characteristics of relationships. As [Knoke and Kuklinski (1982)] suggest, node level characteristics might function similarly to attributes. For example, perceived authoritativeness of a document might be a function of both attributes (such as informational content) and personal networks (a radial structure suggests a document is located at the top of a hierarchy, typically a position of authority).
A clique level analysis focuses on the relationships of subgroups within a network. Though traditional network analysis imposes a strict definition of a clique as complete interconnectedness among a set of nodes, a common definition is that the members of a clique possess stronger relationships among themselves than with other members of the system. Cliques might be identified within a Web site, if the researcher sets the site as the network boundary, or across sites if the scope is wider. Of the several clique-level measures that have been developed, two are discussed here. One measure is integration, or the nature of the relationships between the clique and the larger network. For example, a researcher might investigate the number of links that connect a clique to other elements in its environment, and the directionality of those links. Integration suggests the likelihood that a user will travel across clique boundaries, and the directionality of connections indicates how easily users might enter a clique (by following a link from a source external to the clique) and how likely it is that the user will leave the clique (by following an external link). An additional comparison useful for Web analysis is the relative integration of connected cliques. Similarity of integration patterns displayed by various cliques might be used by a designer as a strategy for either differentiating a clique from others or assimilating a clique into its surroundings. A pattern of low integration, for example, creates a sense that each clique is autonomous and independent. A Web designer might use this strategically to suggest the experience of the multiple rooms in a MUD (multi-user dungeon). Highly disparate levels of integration among a set of cliques might be used to focus attention on the content of a particular set of pages, the way repetitive phrases and content focus attention in a public speech.
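The two boundary quantities named here, the number of links joining a clique to its environment and their direction, can be counted directly. This minimal sketch assumes a link graph stored as a mapping from each document to the list of documents it links to, with the clique given as a set of documents; the function name is hypothetical. Inbound links indicate how easily users can enter the clique, and outbound links how easily they can leave it.

```python
def clique_boundary_links(graph, clique):
    """Count links crossing the boundary of `clique` (a set of nodes).

    Returns (inbound, outbound): inbound links run from documents
    outside the clique into it; outbound links run from the clique
    to documents outside it. Links inside the clique, and links
    entirely outside it, are not counted.
    """
    clique = set(clique)
    inbound = outbound = 0
    for src, targets in graph.items():
        for dst in targets:
            if src in clique and dst not in clique:
                outbound += 1
            elif src not in clique and dst in clique:
                inbound += 1
    return inbound, outbound
```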
A second clique-level measure is connectedness, or the ratio of actual connections existing within a clique to the total number that are possible. A ratio of 1.00 would indicate a structure in which each node is connected bidirectionally to every other node. This completely connected structure imposes no constraint on movement, and would allow the user to move associatively through the clique. Such associative movement is at the heart of the original hypertext vision advanced by Vannevar Bush (1945) and Theodor Nelson (1967). As a "hypertext index," connectedness would be useful to researchers interested in assessing the extent to which Web designers are able to create structures capable of supporting associative thought.
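Connectedness is the same ratio of actual to possible links used for personal-network integration, applied here to the clique's own members. This minimal sketch makes the same assumption about how links are stored and counts directed links, so that a ratio of 1.00 requires every member to link to every other member, as the text specifies. The function name is hypothetical.

```python
def clique_connectedness(graph, clique):
    """Actual directed links among clique members divided by the
    n * (n - 1) possible ones; 1.0 means every member links to every other."""
    members = set(clique)
    n = len(members)
    if n < 2:
        return 0.0  # undefined for fewer than two members; reported as 0.0
    actual = sum(
        1
        for src in members
        for dst in set(graph.get(src, []))
        if dst in members and dst != src
    )
    return actual / (n * (n - 1))
```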
A final level is system level analysis. Examined here is the overall structure created by the connections between all nodes within the network. As with clique-level analysis, several indices might be used to assess the system structure. Two will be discussed here, dominance and connectedness. Dominance is the deviation from equality of the distribution of links among nodes or among cliques. In other words, to what extent are links evenly distributed among system components? In a system exhibiting high dominance, most links will connect to a select number of nodes or cliques. Connectedness, at the system level, is the ratio of actual connections existing between cliques to the total number that are possible. The combination of dominance and connectedness provides a useful technique for identifying various structures, as suggested in Table 1:
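|                    | High Dominance      | Low Dominance                   |
|--------------------|---------------------|---------------------------------|
| High Connectedness | Satellite structure | Hypertext/Associative structure |
| Low Connectedness  | Index structure     | Linear, narrative structure     |

Table 1: System level structures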
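The two system-level indices and the classification in Table 1 can be sketched as follows. The text does not fix a formula for dominance; as one plausible operationalization, this sketch substitutes Freeman's degree centralization computed on the undirected link graph. System connectedness counts ordered pairs of cliques joined by at least one link. The 0.5 cut-offs in `classify` are arbitrary placeholders, not established values, and all function names are hypothetical.

```python
def dominance(graph):
    """Freeman degree centralization of the undirected link graph.

    0.0 when every page has the same number of neighbors; 1.0 for a
    star, in which one page is linked with all the others and no
    other pages are linked with each other.
    """
    neighbors = {node: set() for node in graph}
    for src, targets in graph.items():
        for dst in targets:
            if dst != src:
                neighbors.setdefault(src, set()).add(dst)
                neighbors.setdefault(dst, set()).add(src)
    n = len(neighbors)
    if n < 3:
        return 0.0  # centralization is undefined below three nodes
    degrees = [len(adj) for adj in neighbors.values()]
    top = max(degrees)
    # Sum of shortfalls from the best-connected page, divided by the
    # largest possible sum, (n - 1)(n - 2), which a star attains.
    return sum(top - d for d in degrees) / ((n - 1) * (n - 2))


def system_connectedness(graph, cliques):
    """Share of ordered clique pairs (Ci, Cj), i != j, joined by at
    least one link from Ci to Cj. `cliques` is a list of disjoint sets."""
    membership = {node: i for i, members in enumerate(cliques) for node in members}
    k = len(cliques)
    if k < 2:
        return 0.0
    joined = {
        (membership[src], membership[dst])
        for src, targets in graph.items()
        for dst in targets
        if src in membership and dst in membership
        and membership[src] != membership[dst]
    }
    return len(joined) / (k * (k - 1))


def classify(dom, conn, dom_cut=0.5, conn_cut=0.5):
    """Place a system in a cell of Table 1 (cut-offs are placeholders)."""
    if conn >= conn_cut:
        return "satellite" if dom >= dom_cut else "hypertext/associative"
    return "index" if dom >= dom_cut else "linear/narrative"
```

With each page treated as its own clique, a six-page site in which every page links to every other page falls in the hypertext/associative cell, while a six-page site in which one page links to all the others, and each links back only to it, falls in the index cell.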
A system with high connectedness and high dominance indicates a system with a high number of links, but a very skewed distribution of those links. Such a system might exhibit a "satellite" structure in which a few dominant nodes are central and the remaining nodes are ancillary. A designer, in this structure, would use links to build redundancy in the relations between any two nodes or cliques. This would encourage the user to move often from center to periphery, giving the site a dynamic, "pushing," feel. A satellite structure might be used strategically by a designer to differentiate "primary" information from "secondary" or supporting information or to focus the user's attention on a small number of "central" documents. It is also a typical structure used by news organizations in presenting an online news product.
In a system with high connectedness and low dominance, a high number of links are evenly distributed across system components. In such a structure, users may move from any one system component to any other, at any time. This is the ideal structure to support associative movement as represented in original hypertext visions. It also requires the designer to abdicate any control over user exploration and movement throughout the site. While the user's paths may not be controlled, the use of such a structure still may serve strategic ends, such as encouraging users to "get lost" within a site, meandering and exploring and, therefore, increasing their contact with the site's content.
A system with low connectedness and low dominance possesses few connections, but these are evenly distributed across system components. Such a structure would be consistent with linear, traditional narrative that offers users few paths through information. For example, a designer might divide a story or article into sections. Each page in the Web site would consist of one of these sections. By placing only two links on each page, such as "next" and "previous," a designer would constrain a user to move through the material in a specified and determined order.
A system with low connectedness and high dominance would possess few connections, concentrated among a few system components. A designer might use such a structure if movement through the site were not a priority. For example, the primary purpose of a particular site might be to act as a repository for information. Users of that site are expected to know what information they need and simply wish to get that information. An example of this structure is an index, or a "list of links," with a central page listing all other pages and linking to them. In such a situation, whether or not the user explores the site is irrelevant to the designer. In fact, the need to explore indicates that the site is not structured to maximize the efficiency of locating information.
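The two low-connectedness designs can be compared directly. This minimal sketch builds a "next/previous" narrative and an index page, assuming each indexed page carries one link back to the index. It measures connectedness as actual over possible directed links and, as a simple stand-in for dominance, the largest share of all links that start or end at any one page. Function names and page labels are hypothetical.

```python
def linear_narrative(pages):
    """Each page links only to the page after it and the page before it."""
    graph = {page: [] for page in pages}
    for earlier, later in zip(pages, pages[1:]):
        graph[earlier].append(later)  # "next" link
        graph[later].append(earlier)  # "previous" link
    return graph


def index_structure(hub, pages):
    """A central page links to every other page; each listed page is
    assumed to carry a single link back to the hub."""
    graph = {hub: list(pages)}
    for page in pages:
        graph[page] = [hub]
    return graph


def connectedness(graph):
    """Actual directed links divided by the n * (n - 1) possible ones."""
    n = len(graph)
    actual = sum(len(set(targets) - {src}) for src, targets in graph.items())
    return actual / (n * (n - 1))


def busiest_share(graph):
    """Largest fraction of all links that start or end at one page."""
    touches = dict.fromkeys(graph, 0)
    total = 0
    for src, targets in graph.items():
        for dst in targets:
            touches[src] += 1
            touches[dst] = touches.get(dst, 0) + 1
            total += 1
    return max(touches.values()) / total if total else 0.0
```

For six pages, both designs use 10 of the 30 possible links (connectedness of about 0.33), but every link in the index touches the hub (share 1.0), whereas no narrative page touches more than 40 percent of them (share 0.4). The narrative's connectedness is 2(n-1)/(n(n-1)) = 2/n, so it falls as the story grows longer.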
Existing measures for network analysis have been developed for the purpose of understanding social or cognitive structures. Consideration of several of the most fundamental of these measures demonstrates that transposition onto Web structures is possible, that the validity of the measures is adequately preserved, and that the measures may generate insights into the nature and characteristics of relations on the Web. An established methodology also gives researchers a precision and depth of analysis that would otherwise be impractical in a network of any complexity. Undoubtedly, as attention is given to network analysis of Web structures, additional measures may be developed to respond to specific considerations of WBC and CMC researchers.
Communication researchers have not drawn on network analysis as a methodology for analyzing Web structures. Perhaps this is because the World Wide Web is such a new communication medium. Perhaps it is because the idea of using a methodology based on the metaphor of a network to examine a communication medium based on the metaphor of a web seems so obvious that it threatens to be trivial. Further, the power of network analysis has been its ability to reveal structure in social interaction, and such revelation is not required in analyzing Web-based communication. Network analysis recognizes the emergence of communication phenomena, yet Web structures are not in themselves emergent. Despite these differences, the methodology has significant potential to generate insight into the communicative nature of Web structures.
Web structures are designed and planned. An analysis of structure is significant to the extent that such analysis contributes to the development of theory. [Knoke and Kuklinski (1982)] make this observation concerning the relationship between social network analysis and theory:
If network analysis were limited just to a conceptual framework for identifying how a set of actors is linked together, it would not have excited much interest and effort among social researchers. But network analysis contains a further explicit premise of great consequence: The structure of relations among actors and the location of individual actors in the network have important behavioral, perceptual, and attitudinal consequences both for the individual units and for the system as a whole. (p. 13, italics in original).
Web structures are not composed of actors, yet there may exist a similar "premise of great consequence." Perhaps it is what has been suggested here: The structuring of relations among nodes is a powerful strategic tool unique to communication in Web-based media which will have important consequences for the way we communicate, and for what we understand as the structure of communication as a whole.
Berners-Lee, T. (1990). Information management: A proposal [Online]. Available: WWW URL http://www.w3.org/pub/WWW/History/1989/proposal.html.
Berners-Lee, T., & Cailliau, R. (1990). WorldWideWeb: Proposal for a HyperText project [Online]. Available: WWW URL http://www.w3.org/pub/WWW/Proposal.html.
Bush, V. (1945, July). As we may think. Atlantic Monthly.
December, J. (1996). Units of analysis for Internet communication. Journal of Computer-Mediated Communication, 1 (4) [Online]. Available: WWW URL http://www.usc.edu/dept/annenberg/vol1/issue4/december.html.
Fuller, M., & Jenkins, H. (1995). Nintendo and new world travel writing: A dialogue. In S. Jones (Ed.), Cybersociety: Computer-mediated communication and community (pp. 57-72). Thousand Oaks, CA: Sage.
Garton, L., Haythornthwaite, C., & Wellman, B. (1997). Studying online social networks. Journal of Computer-Mediated Communication, 3 (1). [Online]. Available: WWW URL http://184.108.40.206/jcmc/vol3/issue1/garton.html.
Heim, M. (1993). The metaphysics of virtual reality. New York: Oxford University Press.
Jackson, M. H. (1996). The meaning of "technology": The technology-context scheme. In B. Burleson (Ed.), Communication Yearbook 19. Thousand Oaks, CA: Sage.
Knoke, D., & Kuklinski, J. H. (1982). Network analysis. Beverly Hills, CA: Sage.
Krol, E. (1992). The whole Internet: User's guide and catalog. Sebastopol, CA: O'Reilly & Associates.
Landow, G. P. (1990). Hypertext and collaborative work: The example of Intermedia. In J. Galegher, R. E. Kraut, & C. Egido (Eds.), Intellectual teamwork: Social and technological foundations of cooperative work (pp. 407-428). Hillsdale, NJ: Lawrence Erlbaum Associates.
Lanham, R. A. (1993). The electronic word: Democracy, technology, and the arts. Chicago: University of Chicago Press.
McLaughlin, M. L. (1996). The art site on the World Wide Web. Journal of Communication, 46, 51-79. See also McLaughlin, M. L. (1996). The art site on the World Wide Web. Journal of Computer-Mediated Communication, 1 (4). [Online]. Available: WWW URL http://www.usc.edu/dept/annenberg/vol1/issue4/mclaugh.html.
Nelson, T. H. (1967). Getting it out of our system. In G. Schechter (Ed.), Information retrieval: A critical review. Washington, D.C.: Thompson Books.
Reddick, R., & King, E. (1996). The online student: Making the grade on the Internet. Ft. Worth, TX: Harcourt Brace.
Richards, W. D., & Barnett, G. A. (Eds.). (1993). Progress in communication sciences, Vol. XII. Norwood, NJ: Ablex.
Rogers, E. M., & Agarwala-Rogers, R. (1976). Communication in organizations. New York: Free Press.
About the Author
Michele Jackson (Ph.D., University of Minnesota, 1994) is Assistant Professor in the Department of Communication at Florida State University. She has been a presenter at numerous conferences and has published several pieces in the area of computer-mediated communication, most recently in Communication Yearbook. Her research interests include structures of communication, the organizational and group implications of communication technologies, and group communication.
Address: Michele Jackson, Department of Communication, Florida State University, Tallahassee, FL 32306-2064.