
Neural Network Computing and Natural Language Processing*

Frank L. Borchardt
Duke University

ABSTRACT: After twenty years of disfavor, a technology has returned which imitates
the processes of the brain. Natural language experiments (Sejnowski & Rosenberg 1986)
demonstrate that neural network computing architecture can learn from actual spoken
language, observe rules of pronunciation, and reproduce sounds from the patterns
derived by its own processes. The consequences of neural network computing for natural
language processing activities, including second language acquisition and
representation, machine translation, and knowledge processing may be more convulsively
revolutionary than anything imagined in current technology. This paper introduces
neural network concepts to a traditional natural language processing audience.

KEYWORDS: neural networks, parallel distributed processing, associative
learning, natural language processing

Tolerant Computers
A certain exasperation is commonly experienced when a present-day
computer does what it is told to do instead of what a user intended it to do. The
reason for the simple-mindedness of computers is itself simple. The modern
deterministic, serial, digital computer has at its core a central processing unit,
which is capable (a) of following one instruction at a time, therefore in sequence,
and (b) of calculating an outcome arithmetically as either zero or one, or logically
as either true or false. To be sure, the modern computer can do this very quickly,
in some cases at a rate of many million instructions a second, but nonetheless,
finally, one instruction at a time, in sequence, with an either/or outcome.
computer can tell very quickly whether someone has typed something correctly
or not, or will very quickly follow the erroneous instruction given it, with no
tolerance for errors. With effort, to be sure, someone could probably program a
somewhat more flexible judge of typing, but finally, that judge has also to
produce an unequivocal either/or for the machine to work. This constitutes in
part what is known as "the Von Neumann Bottleneck" (Brown 1986).

CALICO Journal, Volume 5 Number 4 63


The Von Neumann Machine
Von Neumann and his associates designed the bottle, neck and all, in the
mid 'forties, against a specific technological background. This included the great
ENIAC computer, which had a parallel architecture, depended on
electromechanical switches (Goldstine 1972, 158), and calculated in base-ten
arithmetic, as humans do. Von Neumann, applying an insight gained from
Leibniz, proposed an architecture which would boil down problems of
calculation to the uttermost minimum, expressible in base-two or binary format,
and solved serially, one at a time, at enormously high speeds. That is the
architecture we are living with to this day, given several remarkable exceptions
(Insight, 4, 7 [15 February 1988], p. 12). Von Neumann himself saw early that his
binary, digital computer was not just a calculator but rather a logic machine, and
that his zeroes and ones could just as well represent true and false, either/or,
both/and, and all the other operators of Boolean logic.

Early Neural Models


The importance of this insight in reference to computers as thinking
machines cannot be overestimated. Not long before the ultimate design of the
von Neumann machine, a model of neural activity had been proposed by a pair
of mathematicians, McCulloch and Pitts (McCulloch and Pitts 1943). This model
gave neural activity a version of Boolean logic and von Neumann a hope that
mechanical reproduction of brain activity was theoretically possible. An often
quoted remark of von Neumann's from 1951 reads:

It has often been claimed that the activities and functions of the
human nervous system are so complicated that no ordinary
mechanism could possibly perform them. It has also been
attempted to name specific functions which by their nature
exhibit this limitation. It has been attempted to show that such
specific functions, logically, completely described, are per se
unable of mechanical neural realization. The McCulloch-Pitts
result puts an end to this. It proves that anything that can be
completely and unambiguously put into words is ipso facto
realizable by a suitable finite neural network. (McCorduck 1979,
65).
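The behavior of a McCulloch-Pitts unit is simple enough to sketch in a few lines of present-day code. The sketch below is an illustration only (the weights and thresholds are invented, and the notation is not that of the 1943 paper): a unit fires when its weighted binary inputs reach a threshold, and single units of this kind already realize elementary Boolean operators.

```python
def mp_neuron(inputs, weights, threshold):
    """A McCulloch-Pitts unit: fire (return 1) when the weighted sum
    of binary inputs reaches the threshold, otherwise stay silent (0)."""
    total = sum(i * w for i, w in zip(inputs, weights))
    return 1 if total >= threshold else 0

# Elementary Boolean operators, each realized by a single unit:
AND = lambda a, b: mp_neuron([a, b], [1, 1], 2)
OR  = lambda a, b: mp_neuron([a, b], [1, 1], 1)
NOT = lambda a:    mp_neuron([a], [-1], 0)
```

Chained together, units of this kind yield the "suitable finite neural network" of von Neumann's remark.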

Computing With Unreliable Components


The McCulloch-Pitts result was published in 1943. Advances in
neurophysiology from that time to this are easily as dramatic as those in the



history of computing from that time to this. The neuron of 1988 would be no
more recognizable to McCulloch and Pitts than a Cray model Y-MP grinding
away at 4 billion calculations per second across eight (parallel) central
processors (Time, 131, 8 [22 February 1988], p. 53). Von Neumann deserves
credit for extending, in the last year of his life (1956), the McCulloch-Pitts result
from a deterministic, "unary" model of a logical neuron to a distributed, parallel
model of a neural network (Cowan & Sharp 1987, 8 and reference there to von
Neumann 1956). He was wrestling, in part, with mathematical problems such as
emerged from computers with hundreds or thousands of vacuum tubes, five to
ten percent of which might be failing at any given time (Goldstine 1972, 275-285).
For a model of a computer that functioned reliably, information could not be
placed uniquely at any one site but would have to be distributed at no less than a
certain proportion of a large number of available sites. "Such a representation of
information is termed redundant, and von Neumann proved that redundant
McCulloch-Pitts nets, operating in parallel fashion, could be designed to carry
out arithmetical calculations with high reliability" (Cowan & Sharp 1987, 8).
Subsequent work (Winograd & Cowan 1963; Pierce 1965) added to the
redundancy of the von Neumann-McCulloch-Pitts model by introducing the
feature of distributed representation of information, so that many sites might
contain the same information redundantly, as in von Neumann's 1956 model, but
additionally, any unit of information might be partially (and redundantly)
represented at many sites, to be recombined at the moment of output, so that loss
of n number of sites would still provide reasonable reliability.
To envision the problem metaphorically, one might imagine that a
computer held the information unit "tennis ball" somewhere in its memory. The
loss of that information unit in a serial, digital computer could be disastrous,
sending a client to a roller derby instead of Forest Hills. If the information unit
were located redundantly at many points, then of course a single or even
multiple loss could be made up by the redundancy. So far von Neumann.
Winograd, Cowan, and Pierce proposed distributed representation of
information, so that the unit "tennis ball" might, in fact, appear nowhere across
an information grid, but rather "fuzzy," "green," "relatively small," "spherical
object" might appear, to be united at output to the unit "tennis ball," or
"rough," "brown," "relatively large," "spherical object" might appear and be
united at output as "basketball." If one of the information subunits were lost, say
"fuzzy," the grid would still be able, with reduced certainty, to identify "tennis



ball," as long as it still knew about "green," "relatively small," and "spherical
object." If the grid also lost "green" and "relatively small," of course, it might well
take the object for a basketball and thoroughly disrupt Boris Becker's serve. By
combining redundancy and distribution, that is, having "fuzzy," "green,"
"relatively small," and "spherical object" appearing, say, a hundred times each on
a grid, ninety-nine representations of any one feature could be lost, and the grid
would still have enough discriminating information left to recognize a "tennis
ball" with certainty and differentiate it from "basketball."
We are at this point only one small step away from an imaginary
computer fully capable of dealing with faulty typing. The model thus far
described deals with partial failure in the computer; the problem posed by faulty
typing is partial failure in inputting information.
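The metaphor can be made concrete in a short sketch. The feature names and site counts below are invented for illustration and follow the spirit of the example rather than any published model: every feature is stored redundantly at a hundred sites, and even after ninety percent of all sites are destroyed, enough discriminating information survives to identify the object.

```python
import random

random.seed(7)  # fixed seed so the illustration is repeatable

# Objects are known only by their features.
PROTOTYPES = {
    "tennis ball": {"fuzzy", "green", "relatively small", "spherical object"},
    "basketball":  {"rough", "brown", "relatively large", "spherical object"},
}
COPIES = 100  # each feature is stored redundantly at 100 sites

def surviving_features(lost_fraction):
    """Destroy a random fraction of all sites; a feature survives
    as long as at least one of its redundant copies remains."""
    features = {f for proto in PROTOTYPES.values() for f in proto}
    sites = [(f, i) for f in features for i in range(COPIES)]
    kept = random.sample(sites, int(len(sites) * (1 - lost_fraction)))
    return {f for f, _ in kept}

def identify(observed):
    """Name the prototype sharing the most features with the evidence."""
    return max(PROTOTYPES, key=lambda obj: len(PROTOTYPES[obj] & observed))

# Evidence for a tennis ball, after 90 percent of all sites are lost:
observed = surviving_features(0.90) & PROTOTYPES["tennis ball"]
```

With a hundred copies of each feature, the chance that every copy of any one feature is lost is vanishingly small, so the grid still names the tennis ball.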

Pattern Recognition
Consider, for a moment, the instructions typed on a keyboard as
something other than mathematically or logically precise values, say, as patterns,
more or less like handwriting, where no two representations of the same letter
are precisely identical, but where one can detect, in most cases, a general
similarity from one representation to the other, in some cases discretely (a
"Palmer Method" handwritten 'a'), in other cases, only by comparison or
knowledge of the context (a sloppy handwritten 'e' over against a sloppy
handwritten '1'). A computer that could deal with that would have to have a
good idea what the intended character would look like ideally but still be
tolerant of a large number of variations, some of which might come closer to the
ideals of other letters altogether. The mathematics underlying problems of
precisely this kind began being addressed in the 'forties and 'fifties.
The next major step forward in the development of theoretical models of
"tolerant" computers was taken in the area of pattern recognition. Pitts and
McCulloch (1947) themselves developed models which would recognize
"properties common to all possible variants of the pattern" and a mechanism by
which a new variant could be transformed into a standard representation
(Cowan & Sharp 1987, 12-13). In 1958, a decade after the publication of the
McCulloch-Pitts pattern recognition models, Frank Rosenblatt (Rosenblatt 1958,
386) invented what he called the "Perceptron," a front end to McCulloch-Pitts
networks by which they "could be trained to classify certain sets of patterns as
similar or distinct" (Cowan & Sharp 1987, 13). Within a couple of years (1960), a



variation of the Perceptron, called "Adaline" (for adaptive linear neuron) was
invented by the team of Widrow and Hoff (Widrow and Hoff 1960). The
Perceptron and the Adaline differ in training procedures, but both "learn" to
identify certain patterns after a limited number of "study cycles" of stimulus and
response.
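The flavor of those "study cycles" can be sketched in a few lines of modern code. The example below is an illustration with an invented learning rate, not Rosenblatt's or Widrow and Hoff's actual apparatus: the error-correction rule nudges the weights after every wrong answer until the unit classifies the (linearly separable) Boolean OR pattern correctly.

```python
def train_perceptron(samples, epochs=20, lr=0.1):
    """Error-correction training: after each wrong answer, nudge
    every weight toward the response that was wanted."""
    w, b = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for (x1, x2), target in samples:
            out = 1 if w[0] * x1 + w[1] * x2 + b > 0 else 0
            err = target - out          # +1, 0, or -1
            w[0] += lr * err * x1
            w[1] += lr * err * x2
            b += lr * err
    return w, b

# The Boolean OR pattern: linearly separable, hence learnable.
OR_DATA = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]
w, b = train_perceptron(OR_DATA)
predict = lambda x1, x2: 1 if w[0] * x1 + w[1] * x2 + b > 0 else 0
```

The Adaline differs chiefly in correcting against the analog weighted sum rather than the thresholded output.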
Rosenblatt himself recognized a gap in the logic of his Perceptron model
and suggested, as early as 1961, a feed-back cycle which would address that gap.
He called it a "back-propagating error correction algorithm," which was,
however, not solved until 1985 (see Cowan & Sharp 1987, 15).

The Catastrophe
In the interim between the recognition of this problem and its solution, a
great disaster befell the development of "tolerant" computers. In the mid-sixties,
the chief proponents of Artificial Intelligence, alert to the logical gap in the then-
current neural models, are said to have successfully made the case to
government that further research in the area of neural networks was premature.
In 1969, Marvin Minsky and his associate S. A. Papert published a monograph
which proved that "elementary" (as they are now called) Perceptrons or Adalines
could not perform two crucial logical operations [exclusive OR and not
(exclusive OR)]. Minsky conjectured then and maintains now (Johnson 1987, 52) that
no multi-layering of McCulloch-Pitts neurons within the Perceptron or Adaline
could solve the problem. For all practical purposes, funding in the United States
came to a dead stop for twenty years, and research slowed to a crawl.
An important exception to that generalization is Stephen Grossberg of
Boston University, who continued to labor in neural networks for a handful of
admirers and now leads the recent explosive renewal of interest in the field as
first President of the International Neural Network Society. Research in Europe
was less affected by this turn of events: Christoph von der Malsburg of the Max
Planck Institute at Göttingen, and Teuvo Kohonen of the University of Helsinki,
made important contributions to the field throughout the 'seventies
(Klimasauskas 1987, 49-53, 68-70, 123).
Minsky has remained skeptical. Even today, confronting dazzling
demonstrations of neural network applications, he is quoted as saying: "we don't
really know if these demonstrations are only the beginning—or the final
achievement" (Newsweek, 110, 3 [20 July 1987], 53).

Spin Glass and the Hopfield Discovery


One event seems most responsible for the return of neural networks to



respectability: the development of a certain "artificial" or Class 2 neural net, that
is, one which only loosely resembles biological structure but has technological
applications. (Class 1 neural nets directly model some aspect of brain function or
structure, thus Cowan & Sharp 1987, 60.) The kind in question employed
symmetric connections which turned out to be directly analogous to properties
of a physical substance known as a spin-glass. In such a substance, magnetic and
anti-magnetic features are randomly distributed across the substance, but always
in such a proportion as to produce no net magnetism. The disposition of these
features is stable until some environmental change, such as temperature
variation, shakes them up again. This stability functions as a kind of memory.
The change in disposition depends in part on what the previous disposition was,
so that the substance "remembers" the previous random distribution within its
new distribution. The application of this principle to machine learning, problem
solving and optimization has been called "profound," anticipated in the work of
others, especially Grossberg, but first made explicit in principle by John J.
Hopfield of Cal Tech for whom the nets are named (Cowan & Sharp 1987, 41).
Hopfield's work was published in 1982 in the Proceedings of the National Academy
of Sciences and opened the flood gates for renewed research in the area.
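The memory property of such a net can be sketched concretely. The illustration below is not Hopfield's own formulation but follows the standard recipe: a single pattern is stored in symmetric connections by the outer-product rule, and a single update step pulls a corrupted probe back onto the stored memory.

```python
def hopfield_recall(stored, probe):
    """One synchronous update of a net whose symmetric weights
    w[i][j] = stored[i] * stored[j] (no self-connections) were
    built from the stored pattern; units take values +1 or -1."""
    n = len(stored)
    def field(i):
        # total input to unit i from all other units
        return sum(stored[i] * stored[j] * probe[j]
                   for j in range(n) if j != i)
    return [1 if field(i) >= 0 else -1 for i in range(n)]

pattern = [1, -1, 1, 1, -1, 1, -1, -1]
noisy = [1, -1, -1, 1, -1, 1, -1, -1]  # one unit flipped
restored = hopfield_recall(pattern, noisy)
```

The corrupted probe settles back onto the stored pattern: the net "remembers" it.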

Learning Machines
The properties both of spin-glass and Hopfield nets closely resemble a
theory of learning published as early as 1949 by D.O. Hebb and going back, by
Hebb's own admission, to the very discovery of the neuron in the last decades of
the nineteenth century, namely, "connectionism" (Cowan & Sharp 1987, 9 & 38).
Hebb postulated that learning, and subsequent memory, take place as groups of
weakly connected cells organize into more strongly connected assemblies by
repeated stimulus and become relatively stable, that is, less susceptible to change
by new stimuli. This theory has been enormously influential in the development
of artificial neural nets, despite an absence of confirming evidence from
neurophysiological research (Cowan & Sharp 1987, 9).
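Hebb's postulate reduces to a one-line rule, sketched below with invented patterns and an invented learning rate: the connection between two units grows each time they are active together, so repeatedly co-active units end up far more strongly coupled than incidentally co-active ones.

```python
def hebbian_train(patterns, epochs=10, lr=0.1):
    """Hebb's rule: strengthen the connection between two units
    every time they are active in the same pattern."""
    n = len(patterns[0])
    w = [[0.0] * n for _ in range(n)]
    for _ in range(epochs):
        for p in patterns:
            for i in range(n):
                for j in range(n):
                    if i != j:
                        w[i][j] += lr * p[i] * p[j]
    return w

# Units 0 and 1 fire together in nearly every pattern; unit 2 does not.
patterns = [[1, 1, 0], [1, 1, 0], [1, 1, 1], [0, 0, 1]]
w = hebbian_train(patterns)
```

After training, the connection between units 0 and 1 dwarfs their connections to unit 2: the pair has organized into a "relatively stable" assembly.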
The problems pointed out by Minsky and Papert back in 1969 (variously
called "the credit assignment," the "exclusive OR," and the "T/C problem"), were
solved in rapid order, first, by an adaptation of a Hopfield net, called a "Boltzmann
machine" by its inventors, Terrence Sejnowski of Johns Hopkins and Geoffrey
Hinton of Carnegie Mellon (Hinton & Sejnowski 1983). They introduced into a
Hopfield net a randomizing function in the shape of a version of the "Monte
Carlo" algorithm, well known to statisticians (and card sharks), a procedure



which had also previously been employed to solve spin-glass problems (Cowan
& Sharp 1987, 44).
In addition to solving the "credit assignment" problem, the Boltzmann
machine also provided a model for autoassociative or unsupervised machine
learning. This network can form representations that reproduce relations
between classes of events. Teuvo Kohonen, using very different mathematical
presuppositions (Willshaw and von der Malsburg 1976), designed an
unsupervised learning algorithm and topographical network for the reading of
maps, in which similar events were represented by neighboring units (Cowan &
Sharp 1987, 46) so that, say, color information, "red, pink," "rose" would be
available at adjacent locations in a network.
The Boltzmann machine also abandoned McCulloch-Pitts neurons in favor
of analog devices which vary continuously according to familiar neural patterns
(sigmoidal [Cowan & Sharp 1987, 45]). The partial abandonment of a binary,
digital mode for stimulus and response inside an artificial neural network has
very wide ramifications.
The remaining Minsky-Papert objections to neural nets (the
exclusive OR and the T/C problem) were solved by networks employing
Rosenblatt's original perceptron in two or three layers (Cowan & Sharp 1987, 46)
and applying what is called the Back-Propagation algorithm. This was the
accomplishment of David Rumelhart, working with Geoffrey Hinton, and R. J.
Williams (1986). In the case of the exclusive OR, a two-layer Perceptron produces
three "hidden units," as they are called (actually McCulloch-Pitts neurons),
which return a complete "truth table" for the Boolean "x or else y" problem
(Cowan & Sharp 1987, 48). To be sure, the network needs 558 sweeps through the
pattern to produce the result.
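That a hidden layer suffices for the exclusive OR can be shown even without training. The sketch below uses two hand-chosen hidden units rather than the three the network discovered for itself over its 558 sweeps, but the principle is the same: a layer between input and output lets the net compute what no elementary perceptron can.

```python
def step(x):
    """The all-or-nothing response of a McCulloch-Pitts neuron."""
    return 1 if x > 0 else 0

def xor_net(a, b):
    """One hidden unit detects 'a OR b', a second detects 'a AND b';
    the output unit fires only when the first is on and the second off."""
    h_or = step(a + b - 0.5)
    h_and = step(a + b - 1.5)
    return step(h_or - h_and - 0.5)
```

No single choice of weights and threshold for one unit alone can reproduce this truth table; that was precisely Minsky and Papert's point.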
The T/C problem was simply that an elementary perceptron could not
distinguish between a pattern of five boxes shaped as a 'T' (in any rotation) and
as a 'C' (in any rotation). After some 5,000 to 10,000 presentations of 'T' and 'C'
patterns, a triple layer perceptron, using its errors to be instructed about the
desired outcome, eventually discriminates a 'T' with certainty and a 'C' with a
probability so high as to pass the uncertainty threshold (Cowan & Sharp 1987,
49-51).
Both Boltzmann machines and the Back-Propagation algorithm were
employed by Sejnowski and his associate Charles Rosenberg of Princeton to
demonstrate in the concrete what this means. They did so by instructing a
network in the rules of the pronunciation of English (1987), giving the network a
written text to pronounce, and, after a week of training on a VAX 780, producing



garbled speech that had, however, successfully discriminated between vowels
and consonants (Sejnowski & Rosenberg 1987, 153); after another week of
training, the network produced more than acceptable, fully recognizable English
speech: "French," "scent," "around," "not," "let," "soon," "doubt," "key," "attention"
(it had a little trouble with that one), and "loss." When it made an error it often
substituted phonemes that sounded similar to each other. For example, a
common confusion was the 'th' sound in 'thesis' and 'these' which differ only in
voicing (Sejnowski & Rosenberg 1987, 153).

The Architectures
These dazzling experiments were performed on conventional computers
which were programmed to emulate the various unconventional architectures
demanded by neural networks. To differentiate between the architectures it is
possible to isolate the significant categorical differences, which might be
classified as "Time," "Manner," and "Place" in the honored taxonomy of adverbs:

Time.
    Von Neumann Machine: serial. One piece of information is processed at a
    time.
    Neural Net Machine: parallel. Many pieces of information are processed
    simultaneously, correctable, "future affects the past" (Cohen, Grossberg, &
    Stork 1987).

Manner.
    Von Neumann Machine: digital (or binary). The answer produced to any
    question is either true or false, and the arithmetic is reducible to base 2.
    Neural Net Machine: analogue. Both input and output are acceptable at any
    point on a continuum, with the full range of variations reproducible even at
    very small increments.

Place.
    Von Neumann Machine: deterministic. Information is located, isolated, in
    whole, once, at a unique address.
    Neural Net Machine: distributed. Information is located, connectively, in
    whole or in part or both, redundantly at many addresses.

Hopfield's own summary of the requirements for neural network computers
reads: "The ability of the model networks to compute effectively is
based on large connectivity, analog response, and reciprocal or reentrant
connections. The computations are qualitatively different from those performed
by Boolean logic" (Hopfield & Tank 1986, 632).
Neural net machines have to be parallel in order to deal with the many
simultaneous changes of information that alter the way an event
is perceived by the computer. Thus far, neural networks have been emulated
chiefly on deterministic, serial, digital computers. That emulation takes
synchronous events and turns them into very long series, which, in turn, take a
very long time to compute. The Sejnowski/Rosenberg experiment required the
full resources of a VAX 780 for hundreds of hours at a stretch. A comparable
experiment undertaken by N.-K. Huang at the University of Minnesota (Huang
1987) developed a network to act as a spelling checker, rather like the one that
may come with any word-processing package. The training phase of his network
took the exclusive use of a Cray II for an hour and a half, running at one hundred
million floating-point operations per second (100 Megaflops).
Neural net machines ought to be analog to deal with a full range of
variations available in the stimulus, not just variations which cross a certain
threshold. The Sejnowski/Rosenberg experiment needed far more urgently to
trace many approximations of correct speech than to reproduce some imaginary
ideal. A zero/one threshold would only have accepted perfectly correct speech.
Instead, Sejnowski designed a grid that accepted stimulus between 0 and 1 at
some distribution such as 0, .1, .2, .3, etc. That permitted a gradual learning
process, by which the network learned the principles of pronunciation, not
merely the imitation of a correct sound.
Sejnowski admits that his network inclines to cheat, to look for the easiest
imitative value, and that he had to intervene to adjust the weights so that this
inclination was frustrated.
There are machine learning processes which refer, not just facetiously, to
pleasure/pain conditioning. In these, "pleasure" may be reproduced by relatively



many relatively low-grade stimuli, such as you may perceive when the hairs of
the fur of your dog pass across your hand, as opposed to "pain," the relatively
few, relatively high-grade stimuli your hand may feel when he bites it. An
analogue network would presumably deal equitably with both qualities of
stimuli.
Of all these concepts, that of an analog computer (Goldstine 1972, 84-105) is
perhaps the most difficult to grasp, even though one normally has one hanging
on an outside or inside wall somewhere. It is the watt-hour meter which tells the
power company how much to charge every month. The instrument measures
continuously the flow of electricity entering your house, not just whether the
electricity is on or off, and its own velocity proportionately represents the
volume of the flow. A neural net computer will, ideally, have similar properties
when measuring, say, the continuous flow of speech.
Neural net machines have also to be distributed, preferably associative as
well, to learn the patterns that inhere in events and to represent them in accord
with the stimuli. This characteristic hangs together with the analog requirements
of the machine. One of the genuinely surprising, in fact disturbing outcomes of
the Sejnowski/Rosenberg experiment was the plain fact that the network
distributed vowels topographically across its own grid in a disposition
remarkably close to the classic representation of the placement of vowels in the
human mouth (Sejnowski 1987).

Natural Language Applications


The Sejnowski/Rosenberg experiment is far from the only or even the first
attempt to apply neural nets to natural language problems. As Sejnowski and
Rosenberg treated phonology, Rumelhart, the premier theoretician of parallel
processing, early turned his mind and resources toward employing a net to learn
the principles of the formation of the past tense of English verbs and thereby to
"predict" their morphology (Rumelhart & McClelland 1986). Sophisticated
attempts have been made to employ nets in phrase structure, that is, syntax,
toward the elaboration of semantic grids: nets that might indeed understand
what they are representing (Waltz & Pollack 1985). Phonology, morphology,
syntax, semantics: is there any category the scientists and
engineers have left out?
The first products to reach the market-place are applications which deal
with natural language. The Nestor Corporation, headed by a Brown University
Nobel Prize laureate, Leon Cooper, produces a tablet which reads handwriting,
even sloppy handwriting (Johnson 1988). Neural Tech marketed a



teaching/learning product called Plato/Aristotle (Schwartz 1987) which would
"learn" input in any two natural languages (theoretically in more than two) and
then, within a certain tolerance, recognize that input no matter how it was
reentered—misspelled or mixed between the languages.
What does that have to do with modern language instruction, foreign
language representation, perhaps machine translation? Well, the implications
should be clear. It is possible even today to produce a network that can describe
an event in several natural languages and permit a student to crawl gradually
from one to the other by trial and error, with the network persistently informing
the student of the degree of correctness. Feedback of this kind would emulate the
real-life circumstance of full, partial, or non-comprehension on the part of an
interlocutor in another language. It should be possible in the mid-term future for
a network to contain a flexible ideal for the pronunciation of a foreign language
and to inform a new speaker, in real-time, of the degree of adherence to that
flexible ideal. It may be possible for a network to learn not just the vocal but also
the grammatical patterns of a language, and, if that is possible, the grammatical
patterns of another language, to compare the two, to learn the nature of the
differences and to account for them in the transformation of one to the other.
That may be asking too much for the foreseeable future, but if any technology
will be able to accomplish such a feat, it will be neural networking or some
technology like it.
As fascinating and compelling as the new applications of this rejuvenated
technology may be, the truth of the matter is that it is still highly experimental
and years away from reaching the desk of the ordinary user. Neural nets are still
the property of pioneer scientists and engineers. That they, of all people,
experiment as often as they do with natural language should be sending a signal
to those who may lack the scientific expertise but possess a lifetime of expertise
in natural language, a signal that something is going on about which they must
be kept informed and in which, ideally, they should themselves participate.

*This paper was originally delivered at the national CALICO meetings, Salt Lake
City, 26 February 1988.

References

Brown, C. "Parallel Optics: Solution to Von Neumann Bottleneck?" Electronic
Engineering Times (24 November 1986), 35.
Brown, C. "Parallel Processors Bypass Von Neumann Bottleneck," Electronic
Engineering Times (21 April 1986), 58.



Business Week (26 January 1987): "They're here: Computers that 'think.'"
Cohen, M., S. Grossberg, and D.G. Stork, "Recent Developments in a Neural
Model of Real-Time Speech Analysis and Synthesis," IEEE First International
Conference on Neural Networks, San Diego, 21-24 June 1987.
Cowan, J.D., and D.H. Sharp, "Neural Nets," Preprint, 1987.
Goldstine, Herman H. The Computer from Pascal to von Neumann. Princeton:
Princeton University Press, 1972.
Hebb, D.O. The Organization of Behavior. New York: Wiley, 1949.
Hinton, Geoffrey E., and Terrence Sejnowski. "Optimal Perceptual Inference,"
Proceedings of the IEEE Computer Science Conference on Computer Vision and
Pattern Recognition. Washington, DC (1983), 448-453.
Hopfield, John J. "Neural networks and physical systems with emergent
collective computational abilities," Proceedings of the National Academy of
Sciences, 79 (1982) 2554.
Hopfield, John J., and David W. Tank, "Computing with Neural Circuits: A
Model," Science, 233 (8 August 1986), 625-633.
Huang, N.-K. "A Learning Experiment on English Spelling Rules," IEEE First
International Conference on Neural Networks, San Diego, 21-24 June 1987.
Insight, 4, 7 (15 February 1988), 8-12: "Making Machines in Mind's Image."
Johnson, R.C. "Neural Networks Naive, Says Minsky," Electronic Engineering
Times (3 August 1987), pp. 41, 52.
Johnson, R.C. "Nestor Frees Neural Nets from Hardware Chains," Electronic
Engineering Times (1 February 1988), pp. 33-34.
Klimasauskas, C.C., ed. The 1987 Annotated Neuro-Computing Bibliography.
Sewickley, PA: NeuroConnection, 1987.
McCorduck, Pamela. Machines Who Think. San Francisco: W.H. Freeman, 1979.
McCulloch, W.S., and W. Pitts. "A logical calculus of the ideas immanent in
nervous activity," Bull. Math. Biophys. 5 (1943), 115.
Minsky, Marvin, and S.A. Papert. Perceptrons: an Introduction to Computational
Geometry. Cambridge, MA: MIT Press, 1969.
Nature, 323 [1986], 533.
Neumann, J. von. "Probabilistic logics and the synthesis of reliable organisms
from unreliable components," Automata Studies, C.E. Shannon and J. McCarthy,
eds. Princeton: Princeton University Press, 1956.
Newsweek, 110, 3 (20 July 1987) 52-53: "Mimicking the Human Mind."
Pierce, W.H. Failure Tolerant Computer Design. New York: Academic Press, 1965.
Pitts, W., and W.S. McCulloch. "How we know universals: the perception of
auditory and visual forms," Bull. Math. Biophys. 9 (1947), 127.
Rosenblatt, Frank. "The Perceptron: a probabilistic model for information storage
and organization in the brain," Psychological Review, 65 (1958), 386.
Rumelhart, David, and James L. McClelland. "On Learning the Past Tense of
English Verbs," Parallel Distributed Processing: Psychological and Biological
Models, vol. 2. Cambridge, MA: MIT Press, 1986.
Rumelhart, David, Geoffrey Hinton, and R.J. Williams, "Learning Internal
Representations by Error Propagation," Parallel Distributed Processing:
Explorations in the Microstructure of Cognition, vol. 1. Cambridge, MA: MIT
Press, 1986.
Schwartz, T.J. "Neural Net software for PC AT," Electronic Engineering Times (2
March 1987), pp. 27, 49.
Sejnowski, Terrence J., and Charles R. Rosenberg. "Parallel Networks that Learn
to Pronounce English Text," Complex Systems, 1 (1987), 145-168.
Sejnowski, Terrence J., and Charles R. Rosenberg. "NETtalk: A Parallel Network
that Learns to Read Aloud," JHU/EECS-86/01, Johns Hopkins University,
EE/CS Department, January 1986.
Sejnowski, Terrence. "Speech and Signal Processing Tutorial," IEEE First
International Conference on Neural Networks, San Diego, 21-24 June 1987.
Time, 131, 8 (22 February 1988), 53: "Supercomputers: The Fastest Brain in Town."
Waltz, David L., and Jordan B. Pollack, "Massively Parallel Parsing: A Strongly
Interactive Model of Natural Language Interpretation," Cognitive Science, 9
(1985), 51-74.
Widrow, B., and M.E. Hoff, "Adaptive switching circuits," WESCON [Institute of
Radio Engineers, Western Electronic Show and Convention] Convention
Record, 4 (1960), 96.
Willshaw, D.J., and Christoph von der Malsburg, "How patterned neural
connections can be set up by self-organization," Proceedings of the Royal Society,
London B. 194 (1976), 431.
Winograd, S., and J.D. Cowan. Reliable Computation in the Presence of Noise.
Cambridge, MA: MIT Press, 1963.

Author's Biodata
Dr. Frank L. Borchardt is Associate Professor of German at Duke
University and Chairman of the Department. He also functions as Principal
Investigator for Duke's Humanities Computing Projects.

Author's Address
Frank L. Borchardt
Department of German
Duke University
Durham, NC 27706

