Clustering on the Net:
Applying an autoassociative neural network
to computer-mediated discussions

Michael R. Berthold
Institut für Rechnerentwurf und Fehlertoleranz
University of Karlsruhe

Fay Sudweeks
Key Centre of Design Computing
University of Sydney

Sid Newton
Department of Design Studies
University of Western Sydney, Nepean

Richard D. Coyne
Department of Architecture
University of Edinburgh
Table of Contents
- Abstract
- Introduction
- Typicality
- The Data
- Autoassociative Neural Networks
- Applying an ANN to 3000 Messages
- Results Drawn from the ANN
- Typicality in CMC
- Conclusions
- Acknowledgements
- References
- About the Authors
Abstract
ProjectH, a research group of a hundred researchers, produced a huge amount of data from computer mediated discussions. The data classified several thousand postings from over 30 newsgroups into 46 categories. This paper presents one approach to extracting typical examples from this database. An autoassociative neural network is trained on all 3000 coded messages and then used to construct "typical" messages under specified conditions, for several scenarios. The paper illustrates the architecture of the neural network that was used and explains the necessary modifications to the coding scheme. In addition, several "typicality sets" produced by the neural net are shown and their generation is explained. Finally, the autoassociative neural network is used to explore threads and the types of messages that typically initiate or contribute to longer lasting threads.
Introduction
The web of computer networks reaches into homes and organisations, and high-speed network highways provide a medium for communication and community formation on a scale that has never been feasible before. New mores are being created as people invest varying amounts of time and energy in communicating on computer mediated discussion groups with "strangers". What triggers and sustains communication among people who have never met and may never meet face-to-face? Can we measure or predict the interactivity of communication and the interrelatedness of a virtual group? Rafaeli and co-workers (Rafaeli, 1986, 1988; Rogers and Rafaeli, 1985; Rafaeli and Sudweeks, 1997) argue that the variable affecting the interactive nature of messages, threads and groups is the theoretical construct of interactivity: the degree to which communication transcends reaction. Interactivity is a pivotal measure of the social dynamics of group communication (cf. Rafaeli and Sudweeks, 1997).
In this paper we use a connectionist model to analyse and explore the features of messages that initiate or contribute to longer lasting threads to enable us to propose a model of a typical interactive ("referenced") message. The data set comprises 3000 postings to 30 newsgroups classified on 46 variables or groups of features. In the context of categorisation, each variable equates with a reference point or feature within some information setting.
First, we define typicality. We then describe the compilation and preparation of the data set and give an overview of autoassociative neural networks (ANNs). The results of the study are reported and conclusions drawn. In summary, our findings provide further support to interactivity as a variable of communication settings.
Typicality
A key component of human thought is our ability to identify distinct categories, or classes of information, which impose order on an otherwise amorphous, continuous mass of sensory input. The "classical" view of categories is that they carve the world according to well-defined "natural" boundaries (Smith and Medin, 1981; Pulman, 1983). There is a concept of essentialism -- that a category has some intrinsic nature. Category boundaries, though, are ill-defined, if at all existing. What does exist are points of reference to which comparisons are made and which are combined in different ways depending on the particular context (Newton, 1992; Smith and Medin, 1981; Rosenman and Sudweeks, 1995).
An alternate approach considers a typicality effect when dealing with categories (Rosch, 1978); that is, the more typical an example, the "better" the membership of the category. The most typical members are referred to as prototypes or exemplars (Kahneman and Tversky, 1973; Collins and Loftus, 1975). However it remains extremely difficult to formulate its structure. How, for example, is the degree of correspondence (i.e. the similarity) between candidate and exemplar determined? The most structural interpretation of similarity is based on features. The features are used to codify the known members, and act as a reference against which a decision about the membership of some candidate entity can be gauged. In its most trivial form, membership is determined on the basis of a candidate having a minimum "threshold" number of features in common with the category representation; that is, some critical sum of the weighted features. The best-known applications of this process are the "contrast" (Tversky, 1977) and "spreading activation" or "connectionist" (Collins and Loftus, 1975; Rumelhart and McClelland, 1987) models. In both cases, membership is determined on the basis of both similarity and dissimilarity.
In the contrast model, ratings are summed statically. Statistical analyses, such as a Euclidean cluster analysis, provide techniques for identifying correlations between particular features in a given data set, which is a useful indication of where the aggregation (boundaries) within a data set might appear. This form of analysis is widely recognised as providing a static view of data (a "snapshot" of typical and atypical instances) as the clusterings are based entirely on pair-wise correlations. In human cognition, however, the clusterings are more dynamically created across all features synchronously. As features are drawn into particular groupings they form dynamic allegiances which can effectively overrule the original cohesion based on a simple pair-wise correlation.
In the spreading activation model, weights are applied dynamically to coerce other features into play. Groups of features are formed, and these groups use their combined weights to force incompatible features out of consideration. In this sense, categories "emerge" over several iterations, and final groupings represent the summary features of an implied category (Coyne and Yokozawa, 1992). Categories are implied by the example entities used as input to the "training" part of the connectionist approach.
The effect of dynamic clustering of features to extract a typical message is explored in this paper. The connectionist (or autoassociative neural network) approach exploits a distributed description of each particular message (instance) as a pattern of activation across all features (nodes). A particular clustering of features (category) emerges as the network stabilises on a particular pattern of activation. Each message is described in terms of features, such as relevance, time, tone and so on. The pattern of activation captures complex information about dependencies between combinations of features. In identifying typical referenced messages in mediated discussions, a profile emerges of the features that engage the attention of others, encourage participation, and predict the formation and/or maintenance of interactive communication settings.
The Data
The data set was created by ProjectH, a large group of researchers who collaboratively collected a representative sample of computer mediated discussions to study social and linguistic dynamics in public newsgroups and mailing lists. Batches of 100 messages were downloaded from randomly selected discussion groups on Internet, Bitnet and Compuserve and coded on 46 variables. In all, 4322 messages were coded of which 1000 were double coded for reliability purposes, 2000 were single coded, and 322 were partially coded batches. The partially coded batches were excluded, and one of each double-coded list was chosen randomly resulting in a database of 3000 messages. (See Sudweeks and Rafaeli (1996) and Rafaeli, Sudweeks, Konstan and Mabry (1994) for a full description of the project.)
For the present study, the database was converted to a form ready for processing by a neural network. First, identification codes (author-id, coder-id and message-id) were deleted. Second, the date and time stamp was converted to two new entries -- one indicating day of week and the other time of day (worktime, evening, night). Third, three new entries were computed since the exploration of the nature of threads was a main focus of analysis:
- reference-depth: how many references were found in a sequence before this message.
- reference-width: how many references were found, which referred to this message.
- reference-height: how many references were found in a sequence after this message.
These entries were extracted from the original database, but were not present in individual entries, because they refer to sequences of references.
The entries, now 51 per message, were recoded individually to suit a neural network. Since the original coding scheme had several options per entry (e.g. four different classifications for the number of lines in a message: <10, 10-25, 25-100, and >100), each entry was split into as many "features" as the entry had options. In the case of number of message lines this led to four features, each having only two possible values: 1 (or on), indicating the feature is present; or 0 (off), indicating the feature is absent. Note that each "group of features", which resembles one entry in the original database, always has one option chosen; that is, each group of features is mutually exclusive. The recoding resulted in 149 binary features in the new database. (See Appendix A in Berthold, Sudweeks, Newton and Coyne (1997) for a list of entries and how they are split into groups of features.)
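The recoding described above is what is now commonly called one-hot encoding. The following is a minimal sketch of the idea, not the conversion code used in the study; the option labels, the two-option `HUMOR` group and the helper names `one_hot` and `encode_message` are illustrative assumptions.

```python
# Hypothetical option labels for the "number of lines" entry; the paper only
# gives the four size classes, not these identifiers.
LINES_OPTIONS = ["<10", "10-25", "25-100", ">100"]


def one_hot(option, options):
    """Return one binary feature per option, with exactly one set to 1."""
    if option not in options:
        raise ValueError(f"unknown option: {option!r}")
    return [1 if candidate == option else 0 for candidate in options]


def encode_message(entries, scheme):
    """Concatenate the one-hot groups of all entries, in scheme order."""
    features = []
    for name, options in scheme.items():
        features.extend(one_hot(entries[name], options))
    return features


# Two illustrative entries; the real database has far more groups.
scheme = {"LINES": LINES_OPTIONS, "HUMOR": ["no", "yes"]}
vector = encode_message({"LINES": "10-25", "HUMOR": "yes"}, scheme)
print(vector)  # [0, 1, 0, 0, 0, 1]
```

Because every group contributes exactly one active feature, the number of 1s in an encoded message always equals the number of entries.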
Autoassociative Neural Networks
Autoassociative neural networks (ANNs) are special kinds of neural networks that are used to simulate (and explore) associative processes. Association in these types of neural networks is achieved through the interaction of a set of simple processing elements (called units), which are connected through weighted connections. These connections can be positive (or excitatory), zero (which indicates no correlation between the connected units), or negative (inhibitory). The value of these connections is learned during the "training" process of the ANN (a detailed description of ANNs with examples is in the full report of this study (Berthold et al., 1997)).
During training, patterns are presented to the network and weights are gradually adjusted in a way that the final pattern of connectivity matches all patterns being presented. One complete presentation of all patterns with which the network is trained is called one epoch; usually a network requires many such epochs to perform satisfactorily. The weights can therefore be seen as a distributed representation of the data.
Training examples are presented to the network one after the other, each unit is inspected, and the weights leading to this unit (and therefore influencing its activation) are adjusted according to the following rules:
- When unit A and B are simultaneously excited (or correlated), increase the strength of the connection between them.
- When unit A and B are counter-correlated, decrease the strength of the connection between them.
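The two rules above can be sketched as a simple correlation-based weight update. This is only an illustration under stated assumptions: the learning rate `rate`, the choice to leave pairs of inactive units unchanged, and the function name `hebbian_update` are not taken from the paper, whose exact training procedure is given in the full report.

```python
def hebbian_update(weights, pattern, rate=0.1):
    """Apply one correlation-based update for a single binary pattern.

    Units that are both on strengthen their connection; units that disagree
    weaken it. Pairs that are both off are left unchanged (an assumption).
    """
    n = len(pattern)
    for i in range(n):
        for j in range(n):
            if i == j:
                continue  # no self-connections in this sketch
            if pattern[i] == 1 and pattern[j] == 1:
                weights[i][j] += rate
            elif pattern[i] != pattern[j]:
                weights[i][j] -= rate
    return weights


# Three units, all weights initially zero; units 0 and 1 are on together.
weights = [[0.0] * 3 for _ in range(3)]
hebbian_update(weights, [1, 1, 0])
print(weights[0][1], weights[0][2])  # 0.1 -0.1
```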
The interpretation of the units can be manifold. The units, for example, can represent aspects of things (features) or they can symbolize certain actions or goals. Another possibility is that a single unit could represent an hypothesis about certain properties of a model.
Applying an ANN to 3000 Messages
Since the data consisted of 149 features, each taking a value of either "0" or "1" after processing, the network has 149 binary units. This leads to 149*149=22,201 weights and 149 thresholds to adjust during training.
The idea of this type of network is to present each of the 3,000 training patterns to the network and adjust the weights in a way which stores the information contained in each of the patterns. Each unit is connected with each other unit via a directed arc, thus allowing the units to have an excitatory (positive weight) or inhibitory (negative weight) influence on each other individually. The pattern of connectivity (or weight matrix) will be explored in the results section.
Training a network with over 20,000 weights
Training this neural network requires many millions of computations. The network has 22,350 parameters to adjust (22,201 weights and 149 threshold values). To update them, 149 activations have to be computed, each requiring 150 weighted summations. This has to be done for each of the 3,000 patterns in the database, which leads to over 65 million weighted summations (or connections), plus roughly the same number of compare and update operations, per epoch. Almost 100 hours of CPU time, spread over five days, were spent before the error rate of the network started to settle on a plateau (see Figure 1).
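The counts quoted above follow directly from the network size. A short check, using only the figures given in the text (the variable names are illustrative):

```python
units = 149
patterns = 3000

weights = units * units              # one directed connection per ordered pair
parameters = weights + units         # plus one threshold per unit
# 149 activations per pattern, each needing 150 weighted summations.
summations_per_epoch = units * (units + 1) * patterns

print(weights, parameters, summations_per_epoch)
# 22201 22350 67050000
```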
![]()
Figure 1. The decreasing number of misses vs. training time is shown in this graph. After five days of training, when the error finally reached a plateau, training was stopped.
Figure 2 shows a weight diagram. Each of the weights is presented as a small rectangle. Its colour indicates the value of the corresponding weight: blue means a negative value, grey equals zero and red indicates a positive weight.
![]()
Figure 2. The matrix of 149*149 weights. Blue means a negative value, red values indicate positive weights. The bottom row of points represents the threshold values of each unit.
Results Drawn from the Trained ANN
Interpreting the weight matrix
Looking at the weights (Figure 2), one interesting property comes to mind: using the 149*150 = 22,350 weights, the network is able to recall the 3,000 messages almost perfectly. The 3,000*149 = 447,000 on/off records (or bits) are stored in approximately 67,000 bits (each weight stores about 3 bits), a compression factor of close to one order of magnitude. In addition, access to common features of all messages is much easier using the neural network than with a global search of the whole database.
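Written out with the figures from the text (and taking the stated approximation of 3 bits per weight at face value), the compression factor is:

$$
\frac{3000 \times 149}{(149 \times 150) \times 3} = \frac{447{,}000}{67{,}050} \approx 6.7
$$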
A closer look at the weight matrix shown in Figure 2 gives some interesting insights. Here only a few examples are listed:
- the blue squares along the diagonal line show that all features of one group are mutually exclusive: only one of them can have the value 1 at a time. This results in strong inhibitory connections from one feature to all others in one group and can be seen as a block of blue points along the diagonal, with its width corresponding to the number of features in this group.
- some excitatory connections (shown by red squares at the corresponding position):
  - in column 1, corresponding with the feature MSGLINES-A (1-10 lines of original text, see also Appendix A) and row 13 (OPINION-A = no opinion is stated) an almost white square indicates a strong excitatory connection from unit 1 to unit 13. This leads to the conclusion that short messages (1) state no opinion (13). This does not mean that short messages always state no opinion; it is a property of the database that the network picked up. It is still possible that other units inhibit unit 13 much more strongly, so that it is not always active when unit 1 is active.
  - column 1, row 43: short messages (1) request information (43).
  - column 33, row 1: unformatted messages (33) are short (1).
- and another inhibitory connection (blue square):
  - if a message contains an artistic icon(s) (54, 55) it is not short (1).

Of course, those observations can only reveal relations between a few of the network's units. To explore the dependencies between all of the 149 units, especially under certain conditions (usually modelled through clamped units), the network has to settle in a state with low energy. Clamping of units can be used to restrict the space the neural network is exploring to find a solution. In the case of the ProjectH-ANN this technique was used to define certain properties of a message and let the network determine which other features are correlated.
This leads to the creation of typical examples for specific features being on, discussed in the next section.
Creating typical examples
To create typical examples, the feature(s) that is required to be present in the feature set is forced to be on (i.e. having an output value of 1). Such forcing (or clamping) of units restricts the feature space of solutions to a subspace and therefore eliminates unwanted solutions. The network eventually settles at the typical pattern. This worked well in the work described in Coyne and Yokozawa (1992) but in the case of the ProjectH-network, there are several states in which the network can settle. This is mainly because the input data is not free from noise and not all of the units are going to be strongly correlated to other units (or correlated at all).
But if the network is allowed to settle several times, each time from a different random starting state, some features occur more frequently in the final states than others. Table 1 shows the frequency of 1's occurring for some features when feature 29 (message contains humour) was clamped.
Table 1. The frequency of feature activations for feature no 29 (message contains humour) clamped. The columns show the number of the feature, its description and the frequency of on-activations.
| no | description | #1s |
|---|---|---|
| 1 | LINES-A: 1-10 lines of real msg | 40% |
| 2 | LINES-B: 11-25 lines of real msg | 50% |
| 3 | LINES-C: 26-100 lines of real msg | 10% |
| 4 | LINES-D: >100 lines of real msg | 0% |
| 5 | SUBJECT-A: no subject line | 5% |
| 6 | SUBJECT-B: subject line is appropriate | 95% |
| 7 | SUBJECT-C: subject line is inappropriate | 0% |
| 21 | QUESTION-A: no question/request contained | 70% |
| 22 | QUESTION-B: contains question/request | 30% |
| 28 | HUMOR-A: no humor contained | 0% |
| 29 | HUMOR-B: contains humour | 100% |
| 33 | FORMAT-A: unformatted | 25% |
| 34 | FORMAT-B: minimal formatted | 35% |
| 35 | FORMAT-C: mostly formatted | 35% |
| 36 | FORMAT-D: overformatted | 5% |
| 56 | GENDER1-A: can't tell author gender | 0% |
| 57 | GENDER1-B: female | 20% |
| 58 | GENDER1-C: male | 80% |
| 112 | FLAME2-A: no abusive language | 100% |
| 113 | FLAME2-B: abusive language about content only | 0% |
| 114 | FLAME2-C: abusive language about person | 0% |
| 115 | FLAME2-D: abusive language about general others | 0% |
| 116 | FLAME2-E: mixture | 0% |
This list contains only a few of the 149 features, but it illustrates quite well how some features are strongly correlated with the clamped feature while others are not correlated at all. For example, humorous messages seem to be gender specific, since feature 58 (GENDER1-C, male) is on in 80% of the 20 experiments that were conducted. On the other hand, considering that almost 75% of all messages were written by males, the significance of this information might not be very high. Also, humorous messages do not contain abusive language, as the strong response on feature 112 shows. Feature group 33-36 is interesting: none of its features has an exceptionally high occurrence. This suggests that whether a message contains humour does not depend on its formatting; the feature group does not seem to be significant for feature 29.
This process can be carried out for each feature separately or for a combination of features clamped together. The result is a list of features, each with an indication of how often the network settled in a state in which that particular feature was on. This information can be used to produce typicality sets, as shown in the next section.
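The procedure of clamping, settling from several random starting states and counting how often each unit ends up on can be sketched as follows. This is a simplified illustration, not the network used in the study: the deterministic threshold update, the number of sweeps, the toy weight matrix `W` and the function names are all assumptions (the study's network settles into low-energy states and may well have used a different update rule). Only the count of 20 runs is taken from the text.

```python
import random


def settle(weights, thresholds, clamped, rng, sweeps=20):
    """Settle from a random state while keeping the clamped units fixed."""
    n = len(thresholds)
    state = [rng.randint(0, 1) for _ in range(n)]
    for unit, value in clamped.items():
        state[unit] = value
    for _ in range(sweeps):
        order = [u for u in range(n) if u not in clamped]
        rng.shuffle(order)  # asynchronous updates in random order
        for u in order:
            # weights[u][v] is the influence of unit v on unit u
            net = sum(weights[u][v] * state[v] for v in range(n) if v != u)
            state[u] = 1 if net > thresholds[u] else 0
    return state


def on_frequencies(weights, thresholds, clamped, runs=20, seed=0):
    """Fraction of runs in which each unit ends up on."""
    rng = random.Random(seed)
    counts = [0] * len(thresholds)
    for _ in range(runs):
        for u, value in enumerate(settle(weights, thresholds, clamped, rng)):
            counts[u] += value
    return [c / runs for c in counts]


# Toy 3-unit network with made-up weights: unit 0 excites unit 1,
# and units 0 and 1 both inhibit unit 2.
W = [[0, 0, 0], [2, 0, -2], [-2, -2, 0]]
freq = on_frequencies(W, [0.5, 0.5, 0.5], clamped={0: 1})
print(freq)  # [1.0, 1.0, 0.0]
```

In this toy network every run ends in the same state, so the frequencies are all 0 or 1; with noisy data, as in the ProjectH network, intermediate frequencies such as those in Table 1 appear.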
Typicality sets of features
Since the main focus of analysis is correlations between features, it is interesting to extract a set of typical features from the output of the ANN. A threshold specified a priori can be used to choose features for this set. The example from the previous section is again used to show its typicality set (see Table 2).
Table 2. The typicality set for feature 29 (message contains humour)
| no. | label | description |
|---|---|---|
| 6 | SUBJECT-B | subject line is appropriate |
| 9 | NOISE-B | regular msg |
| 18 | APOLOGY-A | no apology |
| 26 | CHALLENGE-A | no challenge/bet/dare |
| 38 | STYLE1-B | regular capitalization |
| 53 | ARTICON-A | no artistic icons |
| 58 | GENDER1-C | male |
| 68 | QUOTE1-A | no quoted text from this list |
| 72 | QUOTE2-A | no CMC text quoted from outside list |
| 95 | COALIT2-A | no first person plural |
| 98 | COALIT3-B | addresses other person |
| 112 | FLAME2-A | no abusive language |
| 117 | FLAME3-A | no intention to prevent/calm tension |
| 120 | STATUS-A | no identification of status |
| 126 | SIGNAT2-A | no ending quotation |
| 145 | EVENING | 6pm - 12am |
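Selecting a typicality set from the on-frequencies amounts to a simple threshold filter. The sketch below applies it to a few of the values from Table 1. The threshold of 80% is an assumption chosen for illustration (the threshold actually used is not stated here), and the function name `typicality_set` is hypothetical.

```python
def typicality_set(frequencies, clamped, threshold=0.8):
    """Features whose on-frequency reaches the threshold, minus clamped ones."""
    return sorted(
        feature
        for feature, freq in frequencies.items()
        if freq >= threshold and feature not in clamped
    )


# On-frequencies for a few features from Table 1 (feature 29 clamped).
table1 = {1: 0.40, 2: 0.50, 6: 0.95, 21: 0.70, 29: 1.00, 58: 0.80, 112: 1.00}
print(typicality_set(table1, clamped={29}))  # [6, 58, 112]
```

The three features selected from this excerpt (6, 58 and 112) all appear in Table 2.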
This table shows which features seem to be highly correlated with feature 29. But so far there is no information about the quality of the list. It could well be that one or even several of these features appear in almost every typicality set and are not well suited to distinguish between different message types. On the other hand, a feature could simply behave randomly most of the time. It was therefore necessary to score the sensitivity of each feature.
Scoring features and sensitivity
Of course, some of the features in the typicality set might not be as interesting as others. Some features are typical for almost all messages and therefore will be on no matter which feature is clamped. A feature behaving like this is called insensitive. To distinguish between sensitive and insensitive features, the features have to be ranked or scored in a way that indicates the sensitivity of the feature to the clamping of other features. This information is hidden in the distribution of 1s over all typicality sets for single features (this leads to 149 typicality sets, one for each feature). For the further analysis the percentage of 1s in the case of a clamped feature was compressed into 5 classes:
- A: 80% to 100% of 1s in one experiment
- B: 60% to 80% of 1s in one experiment
- C: 40% to 60% of 1s in one experiment
- D: 20% to 40% of 1s in one experiment
- E: 0% to 20% of 1s (or 100% to 80% of 0s) in one experiment

Taking again the example where feature 29 is clamped, Table 3 shows a few of those classifications.
Table 3. The frequency of feature activations when feature 29 (message contains humour) is clamped. The last column shows the classification.
| no | description | #1s | class |
|---|---|---|---|
| 1 | LINES-A: 1-10 lines of real msg | 40% | C |
| 2 | LINES-B: 11-25 lines of real msg | 50% | C |
| 3 | LINES-C: 26-100 lines of real msg | 10% | E |
| 4 | LINES-D: >100 lines of real msg | 0% | E |
| 21 | QUESTION-A: no question/request contained | 70% | B |
| 22 | QUESTION-B: contains question/request | 30% | D |
| 28 | HUMOR-A: no humor contained | 0% | E |
| 29 | HUMOR-B: contains humour | 100% | A |
| 33 | FORMAT-A: unformatted | 25% | D |
| 34 | FORMAT-B: minimal formatted | 35% | D |
| 35 | FORMAT-C: mostly formatted | 35% | D |
| 36 | FORMAT-D: overformatted | 5% | E |
| 56 | GENDER1-A: can't tell author gender | 0% | E |
| 57 | GENDER1-B: female | 20% | D |
| 58 | GENDER1-C: male | 80% | A |
| ... | | | |
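The mapping from on-frequency to class can be sketched as a small lookup. The class ranges overlap at their boundaries in the definition; the sketch assumes each lower bound is inclusive, which is inferred from Table 3 (40% is class C, 20% is class D, 80% is class A). The function name `activation_class` is illustrative.

```python
def activation_class(percent_on):
    """Map the percentage of on-activations to one of the classes A to E."""
    if not 0 <= percent_on <= 100:
        raise ValueError("percentage must lie between 0 and 100")
    for lower_bound, label in ((80, "A"), (60, "B"), (40, "C"), (20, "D")):
        if percent_on >= lower_bound:
            return label
    return "E"


# Values taken from Table 3 (feature 29 clamped).
table3 = {1: 40, 2: 50, 3: 10, 21: 70, 29: 100, 57: 20, 58: 80}
print({no: activation_class(p) for no, p in table3.items()})
# {1: 'C', 2: 'C', 3: 'E', 21: 'B', 29: 'A', 57: 'D', 58: 'A'}
```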
Taking all 149 typicality sets, it is easy to compute five global values for each feature: the frequency with which the feature fell into each class over all experiments.
From these five numbers a single score can be computed that captures what the sensitivity of a feature really means. If, for example, the percentage of A's for one feature is a perfect 100%, this specific feature is always on; since it is never off, it does not help to distinguish between different classes of messages. It does, however, tell us about a typical message in the whole database. On the other hand, a feature having 50% A's and 50% E's would be much better suited to grouping messages in the database; in fact, this is the best case one could imagine. Somewhere in between are features with unbalanced percentages of A and E. To generate a unifying score for all features, the following four heuristics were chosen:
- A sensitive feature has at least one A (apart from the case where that feature was clamped) and one E.
- A feature is more sensitive than another one if the number of A's is better balanced to the number of E's.
- A smaller number of B, C and D indicates a sensitive feature.
- An insensitive feature has either no A or no E and a high number of B, C and D.
These heuristics led to a computational method of measuring feature sensitivity (see Berthold et al., 1997, for a detailed description). Table 4 shows a few examples from the 149 features.
Table 4. A few examples of scored features. For each feature the percentage of times it got classified as being in a specific class is shown and the final score resulting from these classifications is listed in the last column.
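| no. | label | description | E | D | C | B | A | score |
|---|---|---|---|---|---|---|---|---|
| 28 | HUMOR-A | no humor | 1% | 12% | 32% | 42% | 12% | +1 |
| 29 | HUMOR-B | contains humor | 13% | 42% | 32% | 11% | 0% | -23 |
| ... | | | | | | | | |
| 33 | FORMAT-A | unformatted | 53% | 39% | 6% | 0% | 0% | -66 |
| 34 | FORMAT-B | minimal formatted | 34% | 49% | 13% | 2% | 0% | -49 |
| 35 | FORMAT-C | mostly formatted | 17% | 42% | 30% | 8% | 0% | -28 |
| 36 | FORMAT-D | overformatted | 90% | 9% | 0% | 0% | 0% | -93 |
| ... | | | | | | | | |
| 40 | STYLE2-A | no colloquial spelling | 4% | 14% | 22% | 43% | 16% | +5 |
| 41 | STYLE2-B | contains colloquial spelling | 17% | 43% | 22% | 14% | 2% | +2 |
| ... | | | | | | | | |
| 56 | GENDER1-A | can't tell | 100% | 0% | 0% | 0% | 0% | -100 |
| 57 | GENDER1-B | female | 75% | 22% | 2% | 0% | 0% | -82 |
| 58 | GENDER1-C | male | 2% | 0% | 3% | 38% | 56% | +2 |
| ... | | | | | | | | |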
This table shows that some features are not very sensitive to the activations (caused by clamping) of others. A good example is the format of the messages (FORMAT-A to FORMAT-D): all four features have a negative sensitivity score because none of them appears in any typicality set besides its own. In contrast, both features of the STYLE2 entry have positive scores. Looking at the table, it becomes clear why. STYLE2-A appears in 5% of all typicality sets and is completely absent in 4% of all messages. Almost the same goes for STYLE2-B: it is absent for 17% and included in the typicality set for 2% of all cases. As expected, none of the features reached the perfect score of +100.
Typicality in CMC
The previous section described how typicality sets for single features can be generated by an autoassociative neural network. For an analysis of typicality in CMC, and especially an investigation of threads and their characteristics, some typicality sets are more interesting than others.
To analyse the nature of threads, it is interesting to compare a message that starts or continues a thread with a message that ends a thread. Figure 3 shows a reference-tree to illustrate the terms reference-width, reference-height and reference-depth.
![]()
Figure 3. A reference-tree, illustrating the terminology. A thread starts with message A and the last message participating is L. Message E is referenced by four messages (the reference-width), itself references a sequence of two messages (reference-depth) and is referenced by a sequence of three messages (reference-height).
The thread is the longest path from the top down into one of the branches in this tree; in the example this is the path starting at A and leading over B, E, G and K to L. Message E in this figure is referenced directly by four messages (F, G, H and I), which results in reference-width = 4. The same message E references a sequence of two messages (B and A), measured by reference-depth = 2, and is referenced by another sequence of three messages (G, K and L), leading to a reference-height of 3. Note that several of these messages could have been written by the same author. Different labels in this example only indicate different messages, not different authors.
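The three measures can be computed from the reference links alone. The following sketch encodes only the part of the Figure 3 tree that is spelled out in the text; any other messages in the figure are not described and are left out. The dictionary `PARENT` and the function names are illustrative, not part of the original database tooling.

```python
# Child -> parent links for the messages named in the Figure 3 walkthrough.
PARENT = {"B": "A", "E": "B", "F": "E", "G": "E", "H": "E", "I": "E",
          "K": "G", "L": "K"}


def children(msg):
    """Messages that reference msg directly."""
    return [child for child, parent in PARENT.items() if parent == msg]


def reference_width(msg):
    """Number of messages that reference msg directly."""
    return len(children(msg))


def reference_depth(msg):
    """Length of the chain of references leading back from msg."""
    depth = 0
    while msg in PARENT:
        msg = PARENT[msg]
        depth += 1
    return depth


def reference_height(msg):
    """Length of the longest chain of references that follows msg."""
    return max((1 + reference_height(c) for c in children(msg)), default=0)


print(reference_width("E"), reference_depth("E"), reference_height("E"))
# 4 2 3
```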
To characterize a "good" vs. a "bad" message in the sense of participation in a thread, the variable reference-width was used. A message is called "good" if it is referenced by at least one other message; that is, it participates in a thread. In contrast, a "bad" message is not referenced at all; it does not participate in a thread. Clamping the corresponding features (134, no messages are referencing this message; 135, 1-2 references to this message) leads to two typicality sets for the two types of messages being investigated. They are shown in Tables 5 and 6.
Table 5. The typicality set of a "referenced" message. Clamped feature: 135 (MSGWIDTH-B: 1-2 references to this message)
| no. | feature description | sensitivity-score |
|---|---|---|
| 2 | 11-25 lines of original text | 1 |
| 6 | subject line is appropriate | 1 |
| 12 | contains verbal self-disclosure | 1 |
| 17 | contains statement of a fact | 1 |
| 21 | no question/request | 1 |
| 26 | no challenge/bet/dare | 1 |
| 38 | regular capitalization | 2 |
| 47 | no emoticons | 2 |
| 50 | no punctuation device to express emotion | 3 |
| 53 | no artistic icons | 1 |
| 58 | male | 2 |
| 60 | identifies gender via name/signature | 3 |
| 68 | no quoted text from this list | 2 |
| 72 | no CMC text quoted from outside list | 2 |
| 98 | addresses other person | 1 |
| 112 | no abusive language | 2 |
| 120 | no identification of status | 1 |
| 126 | no ending quotation | 1 |
Table 6. The typicality set of a "nonreferenced" message. Clamped feature: 134 (MSGWIDTH-A: no references to this message)
| no. | feature description | sensitivity-score |
|---|---|---|
| 12 | contains verbal self-disclosure | 1 |
| 26 | no challenge/bet/dare | 1 |
| 28 | no humour | 1 |
| 38 | regular capitalization | 2 |
| 53 | no artistic icons | 1 |
| 68 | no quoted text from this list | 2 |
| 80 | no previous msg referenced by this msg | 4 |
| 87 | new topic, no reference to previous discussion | 18 |
| 95 | no first person plural | 1 |
| 112 | no abusive language | 2 |
| 117 | no intention to prevent/calm tension | 2 |
| 120 | no identification of status | 1 |
| 126 | no ending quotation | 1 |
| 128 | no previous msg referenced by this msg | 2 |
| 131 | no references after this msg | 9 |
Interestingly the two typicality sets have several features in common. This is due to the fact that not every feature is sensitive to every other one. In addition some of the features have a low sensitivity-score meaning that they are not sensitive at all to other features. To create the final typical "good" and "bad" message, features appearing in both typicality sets will be deleted from both sets and features with a too low sensitivity score will be discarded too. This leads to Tables 7 and 8.
Table 7. Typical distinguishing features of a referenced (or "good") message.
| no. | feature description | sensitivity-score |
|---|---|---|
| 2 | 11-25 lines of original text | 1 |
| 6 | subject line is appropriate | 1 |
| 17 | contains statement of a fact | 1 |
| 21 | no question/request | 1 |
| 47 | no emoticons | 2 |
| 50 | no punctuation device to express emotion | 3 |
| 58 | male | 2 |
| 60 | identifies gender via name/signature | 3 |
| 68 | no quoted text from this list | 2 |
| 72 | no CMC text quoted from outside list | 2 |
| 98 | addresses other person | 1 |
Table 8. Typical features of a nonreferenced (or "bad") message.
| no. | feature description | sensitivity-score |
|---|---|---|
| 80 | no previous msg referenced by this msg | 4 |
| 87 | new topic, no reference to previous discussion | 18 |
| 95 | no first person plural | 1 |
| 131 | no references after this msg | 9 |
Tables 7 and 8 finally enable us to extract some properties of the messages in the database:
- a "good" message has medium length (2) and an appropriate subject line (6).
- a statement of a fact (17) also enhances the chances of being followed-up.
- if during an already ongoing thread one introduces a completely new topic (87), the chances of getting a response are slim. This point seems to be a very strong one, regarding the high sensitivity score of that specific feature.
- interesting also is that a message which does not reference seems likely not to be referenced. But the sensitivity score of this feature is reasonably low, which makes sense, otherwise threads would never start. But this discovery indicates that the start of a thread is not an easy task. Being followed-up when one already participates in a thread is much easier.
Conclusions
We have described an approach to use autoassociative neural networks to explore typicality in computer mediated discussions. We showed how to train an ANN and how the final weight matrix can be used to extract relationships between variables. We then used the neural network to extract typicality sets for specified features and showed how messages which support threads ("good" messages) can be distinguished from those messages not participating in a thread ("bad" messages).
This sort of approach can be used to act as a preprocessor for a more detailed statistical analysis, concentrating on the subsets of features already discovered by the neural network. The ANN would thus only be used to discover feature-groups that are correlated and further statistics would concentrate on the strength and statistical significance of those correlations.
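One possible form of such a follow-up analysis, sketched under stated assumptions: once the network has flagged a pair of binary features as related, the strength of the association can be measured with the phi coefficient and its significance with a chi-square test on the 2x2 contingency table. The function name `phi_and_chi2` and the toy vectors are hypothetical, and no continuity correction is applied.

```python
import math


def phi_and_chi2(xs, ys):
    """Return (phi, chi2, p) for two equal-length binary feature vectors.
    chi2 has one degree of freedom, so p = erfc(sqrt(chi2 / 2))."""
    n = len(xs)
    a = sum(1 for x, y in zip(xs, ys) if x and y)        # both present
    b = sum(1 for x, y in zip(xs, ys) if x and not y)    # only x present
    c = sum(1 for x, y in zip(xs, ys) if not x and y)    # only y present
    d = n - a - b - c                                    # both absent
    denom = math.sqrt((a + b) * (c + d) * (a + c) * (b + d))
    if denom == 0:
        # A constant feature carries no information about association.
        return 0.0, 0.0, 1.0
    phi = (a * d - b * c) / denom
    chi2 = n * phi * phi
    p = math.erfc(math.sqrt(chi2 / 2.0))
    return phi, chi2, p


# Hypothetical toy vectors: does introducing a new topic go with not being followed up?
new_topic = [1, 1, 1, 0, 0, 0, 0, 0]
followed_up = [0, 0, 0, 1, 1, 1, 1, 0]
phi, chi2, p = phi_and_chi2(new_topic, followed_up)
print(f"phi={phi:.3f} chi2={chi2:.3f} p={p:.4f}")
```

With only eight toy messages the chi-square approximation is unreliable; on a database of several thousand coded messages the same calculation would be better behaved, and an exact test could be substituted for sparse tables.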
In addition, the approach presented here provides insights into the quality of the database. There are several blocks of features that are strongly correlated, while other features are only loosely connected or not connected at all. In contrast to the example used by Coyne et al. (1993), noise from coder errors as well as differences on opinionated variables (as described by Rafaeli and Sudweeks (1996)) result in a database which is not as well structured as artificial ones.
The possibilities of using an ANN are far from exhausted, and several features are well worth exploring. STYLE2, for example, has both high sensitivity scores for all features of the group and a fairly unbalanced frequency distribution (see the appendix in Berthold et al., 1997, for a listing). This would be another way of exploring threads. An analysis of the quality of interactivity could also be performed by using feature group DEPEND3, which describes the manner in which previous messages are referenced. Yet another example is GENDER3, which is also a feature group with a good distribution and high sensitivity scores. GENDER3 codes the fact that gender identification is an issue. The same approach can also be used to find features that are atypical for messages. Features with a very low sensitivity score and a typical value of 0 have a strong negative correlation with almost every other feature. This would lead to an "anti-message" within a typicality set.
Only one very specific kind of neural network was used for this analysis, and more architectures are being published every day. The ANN was chosen because the autoassociative structure supports the emergence of examples; if the main focus of analysis were on only a few variables, a feedforward architecture would also be feasible. An approach using feedforward neural networks would create a network to classify examples rather than create an environment for emerging examples. If a Localized Receptive Field Network (Moody and Darken, 1989) were used, the prototypes would represent typical examples for each class, and the radii and weights of those reference vectors would be indicators of the value and generality of the example.
The approach we presented in this paper is capable of extracting a form of relationship between features, and the ANN approach also helped to verify tentative hypotheses pertaining to computer-mediated communication, as most results reported by the neural network did "make sense".
Acknowledgements
This work was supported by a University of Sydney Research Grant (URG).
References
- Allbritton, M. M. 1996. Collaborative Communication among Researchers using Computer-Mediated Communication: A Study of ProjectH. Masters Thesis, Department of Communication, University of New Mexico, Albuquerque, NM, available at http://www.arch.usyd.edu.au/~fay/netplay/marcel/index.html.
- Berthold, M. R., Sudweeks, F., Newton, S. and Coyne, R. D. 1997. "It Makes Sense": Using an autoassociative neural network to explore typicality in computer-mediated discussions. In Network and Netplay: Virtual Groups on the Internet eds Sudweeks, F., McLaughlin, M. and Rafaeli, S. (to appear). Menlo Park, CA: AAAI/MIT Press.
- Collins, A. M. and Loftus, E. F. 1975. A spreading-activation theory of semantic processing. Psychological Review, 82: 407-28.
- Coyne, R. D. and Yokozawa, M. 1992. Computer assistance in designing from precedent, Environment and Planning B: Planning and Design, 19: 143-171.
- Coyne, R. D., Newton, S. and Sudweeks, F. 1993. Modelling the emergence of schemas in design reasoning. In Modeling Creativity and Knowledge-Based Creative Design. eds J. S. Gero and M. L. Maher, 177-209. Hillsdale, New Jersey: Lawrence Erlbaum.
- Kahneman, D. and Tversky, A. 1973. On the psychology of prediction. Psychological Review. 80: 237-51.
- Moody, J. and Darken, C. J. 1989. Fast learning in networks of locally-tuned processing units. Neural Computation. 1: 281-294.
- Newton, S. 1992. On the relevance and treatment of categories in AI in design. In Artificial Intelligence in Design '92, ed. J. S. Gero, 861-882. Dordrecht: Kluwer.
- Pulman, S. G. 1983. Word Meaning and Belief. London: Croom Helm.
- Rafaeli, S. 1986. The electronic bulletin board: A computer driven mass medium. Computers and the Social Sciences, 2(3): 123-136.
- Rafaeli, S. 1988. Interactivity: From new media to communication. In Sage Annual Review of Communication Research: Advancing Communication Science, Vol. 16. eds R. P. Hawkins, J. M. Wiemann and S. Pingree, 110-134. Beverly Hills, CA: Sage.
- Rafaeli, S. and Sudweeks, F. 1997. Interactivity on the net. In Network and Netplay: Virtual Groups on the Internet, eds F. Sudweeks, M. McLaughlin and S. Rafaeli. Menlo Park, CA: AAAI/MIT Press.
- Rafaeli, S., Sudweeks, F., Konstan, J. and Mabry, E. 1994. ProjectH overview: A quantitative study of computer mediated communication, available from http://www.arch.usyd.edu.au/~fay/netplay/techreport.html or ftp.arch.su.edu.au/pub/projectH/techreport.txt.
- Rogers, E. M. and Rafaeli, S. 1985. Computers and communication. In Information and Behavior, Vol. 1, ed. B. D. Ruben, 135-155. New Brunswick, NJ: Transaction Books.
- Rosch, E. 1978. Principles of categorization. In Cognition and Categorization. eds E. Rosch and B. B. Lloyd, 27-48. Hillsdale, NJ: Lawrence Erlbaum.
- Rosenman, M. A. and Sudweeks, F. 1995. Categorisation and prototypes in design. In Perspectives on Cognitive Science: Theories, Experiments and Foundations, eds P. Slezak, T. Caelli and R. Clarke, 189-212. Norwood, NJ: Ablex.
- Rumelhart, D. E. and McClelland, J. L. eds 1987. Parallel Distributed Processing: Explorations in the Microstructure of Cognition, Vol. 1, Foundations. Cambridge, Massachusetts: MIT Press.
- Smith, E. E. and Medin, D. L. 1981. Categories and Concepts. Cambridge, Massachusetts: Harvard University Press.
- Sudweeks, F. and Rafaeli, S. 1996. How do you get a hundred strangers to agree: Computer mediated communication and collaboration. In Computer Networking and Scholarship in the 21st Century University, eds T. M. Harrison and T. D. Stephen, 115-136. New York: SUNY Press.
- Tversky, A. 1977. Features of similarity, Psychological Review, 84: 327-52.
About the Authors
Michael R. Berthold, PhD in Computer Science, University of Karlsruhe. In 1992 he was a visiting researcher at Carnegie Mellon University, Pittsburgh, and he joined Intel's Neural Network group in 1993. In 1994 he spent 3 months at the Key Centre for Design Computing at the University of Sydney, Australia. He is now a researcher at the University of Karlsruhe. His research interests include Neural Networks, Fuzzy Logic and Intelligent Data Analysis.
Address: Institut für Rechnerentwurf und Fehlertoleranz, Universität Karlsruhe, Postfach 6980, 76128 Karlsruhe, Germany.
Fay Sudweeks, BA (Psychology), MCogSc, is a Research Associate at the University of Sydney and a doctoral candidate (Business Systems) at the University of Wollongong. Her research interests are sociolinguistic aspects of computer-supported collaborative work, group development, and the application of Web-based technology to education and collaborative work. She is currently involved in the development of an interactive multimedia international journal and a MOO-based distance learning program.
Address: Key Centre of Design Computing, University of Sydney NSW 2006, Australia.
Sidney Newton is Associate Professor of Design Studies at the University of Western Sydney, Nepean. His research interests include the creative application of new media, articulations between digital technology and design practice, and developments in flexible learning. He is currently a Director of the Access Australia Cooperative Multimedia Centre, the Centre for Applied Design Research and Education, and the Parallel Computing and Visualisation Laboratory.
Address: Department of Design Studies, University of Western Sydney - Nepean, PO Box 10, Kingswood NSW 2747, Australia.
Richard Coyne is Professor of Architectural Computing at the Department of Architecture, University of Edinburgh. He teaches and researches in the area of design theory, CAD and multimedia. His recent book, Designing Information Technology in the Postmodern Age (MIT Press, 1995), examines how current thinking about information technology is informed and challenged by the writings of Heidegger, Derrida, the critical theorists, and other contemporary philosophers. He is currently researching the application of the WWW to innovations in computer-aided design.
Address: Department of Architecture, University of Edinburgh, 20 Chalmers St, Edinburgh EH1 1JZ, Scotland