as a Game of Types
(Thesis Format: Monograph)
by
Graduate Program in Computer Science
Reproduced with permission of the copyright owner. Further reproduction prohibited without permission.
Library and Archives Canada
Published Heritage Branch
NOTICE:
The author has granted a non-exclusive license allowing Library and Archives Canada to reproduce, publish, archive, preserve, conserve, communicate to the public by telecommunication or on the Internet, loan, distribute and sell theses worldwide, for commercial or non-commercial purposes, in microform, paper, electronic and/or any other formats.
CERTIFICATE OF EXAMINATION
Supervisor
Examiners
Supervisory Committee
The thesis by
Jackson Carvalho
entitled:
Date
Richard Kane
Chair of the Thesis Examination Board
April 2005
Abstract
This thesis presents a grammar-based approach to the specification of mathematical notation. The method introduced is based on a meta-structure that uses attributed context-free grammars to capture the meaning of mathematical concepts.
This
Acknowledgments
I would like to thank my supervisor, Dr. Helmut Jürgensen, who believed in me, for proposing the problem, and for his guidance and mentorship.
I thank Maia Hoeberechts for reading the previous version of this thesis and for her suggestions.
I am grateful to my parents, Jose and Janete, for making me understand the importance of education and work. I wish to thank my children Carolina, Marcello and Luiza for always reminding me that life can be fun even during difficult times. My special thanks to my wife Rozane for her support, love and dedication to our children.
This work has been partially supported by the Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq), by the Universidade Federal do Rio Grande do Norte (UFRN), and by Dr. Helmut Jürgensen.
Table of Contents
Certificate of Examination
Abstract
Acknowledgments
1 Introduction
1.2 Related Work
1.2.4 ASTER
1.2.5 OpenMath
1.2.6 MathML
1.2.8 Compositionality
1.3 Motivation
1.5 Approach Taken
1.6 Thesis Overview
2 Basic Notions and Notation
2.1 Basic Definitions
3.1 Basic Notions
3.3 An Existing Model
3.4 A New Framework
3.5 Example
3.6 Summary
4.1 Introduction
4.3 Cognitive Distances
4.4 Rendering Information
4.7 Changes in the Interface
4.9 Summary
5.3.2 CFGs and Data Types
5.3.4 Updating CFGs
5.5.1 Overloading Subscripts
5.6 Representing Matrices
5.8 Representing Sums
5.9 Conclusion
6.7 The Role of Compilers
6.8 Meta-Structure
6.9 Conclusion
7 Examples
9.1 Discussion
9.3 Future Work
Vita

List of Tables
5.4 Derivation of word … + 2
5.8 CFG
5.9 CFG
5.10 CFG
6.19 Attributed grammar to support the capturing of simple summations.

List of Figures
3.1
3.2
4.1
5.1
6.1
6.2
Chapter 1
Introduction
Rather than require that users change, system designers could adapt their systems to key aspects of the users' work practice [33]...
Reading and writing mathematics are activities that place different demands on the notation used. Reading requires a stable meaning-to-syntax mapping in which concepts can always be identified by an expected syntax. Writing mathematics, on the other hand, demands the freedom to introduce meaning-to-syntax mappings that, in the author's judgement, best identify the information to be communicated. In this thesis, the readers' benefit from a standard notation and the writers' need for flexibility to define new meaning-to-syntax mappings are viewed as characteristics that are in tension.
Approaching the specification of mathematical notation for electronic documents by providing a standard will, of course, benefit readers. It also implies that users of computerized systems that support the standard will be forced to adapt to the details of that specific notation in order to manipulate the concepts it represents. One may argue that learning a notation provided by a system is not a major concern, since adequate human-computer interfaces can support this activity. This is true when the underlying mathematical notation is stable and fixed, that is, when the relation between syntax and semantics does not change and no new concepts may be added to the set covered by the notation. Notations that are both stable and fixed could undeniably be enforced for users of computer algebra systems, for instance. It is also intuitive that adequate Graphical User Interfaces (GUIs) would help minimize the effort required to use a system that initially supports only text-based interfaces for manipulating mathematical notation. An example is MathType [70], which uses a GUI to help the user produce correct TeX syntax.
As new concepts are introduced, encodings are needed to support their manipulation. Consequently, mathematical notation evolves by extending the relationship between concept and syntactical representation.
In particular, the development of multimedia documents supported by user interfaces that can be configured to adapt to users with print disabilities has been addressed by [80]. The importance of multimodality and multimedia in supporting the computer-based communication of mathematics has been emphasized by [42].
This research was originally motivated by the goal of making documents accessible to blind people. Fundamental requirements associated with these users' limitations therefore had to be considered. These concerns included the following two possibilities¹:
1.1
1.2 Related Work
This section discusses some related efforts that have treated the problem of representing the semantics of mathematical concepts. Because processing electronic documents that contain mathematics is so important, a new interdisciplinary field, Mathematical Knowledge Management (MKM), has emerged [13, 12, 45]. This field deals with the intersection between mathematics and computer science and aims to develop better ways to articulate, organize, disseminate and provide access to mathematical knowledge. ASTER [86], OpenMath [23] and MathML [14] are important research projects in this field. Before these three approaches are discussed, a brief introduction to the notions of data model and data representation is included, because I believe they are fundamental concepts for the definition of document specification structures. The strategy proposed by SGML [57] to structure documents is also introduced. The end of this section addresses the principle of compositionality of meaning [98, 58, 101].
1.2.1
updated.
In addition to a set of operators, an efficient implementation of the data update concept requires both the identification and the control of redundant data. It also involves the notions of equivalence, functional dependencies and normal forms. An example of a data model that addresses these issues is the relational data model [32].
A data model is basically a data encoding together with a set of operators that manipulate the data, whereas a data representation does not include the operators. The differences between data model and data representation are discussed in [90]. The importance of the notion of update in data models can be seen in the relation between update and equivalence. As emphasized by [90], an efficient use of update should involve some mechanism to control redundancy, which in turn requires a notion of equivalence.
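To make the distinction concrete, the following Python sketch stores the same rows twice. The first copy is a bare representation. The second is held by a minimal data model whose operators use key equality as the notion of equivalence. This is an illustration only, not taken from [90], and `SymbolModel` and its methods are hypothetical names.

```python
from dataclasses import dataclass, field

# Data representation: an encoding only. Nothing prevents the redundant second row.
representation = [
    ("sin", "sine function"),
    ("sin", "sine function"),
    ("plus", "addition"),
]


@dataclass
class SymbolModel:
    """Data model: the same encoding plus operators that manipulate it.

    Rows are keyed by symbol name. Two rows with the same key are treated as
    equivalent, which is what lets the operators detect redundancy.
    """

    rows: dict = field(default_factory=dict)

    def insert(self, name: str, meaning: str) -> None:
        if name in self.rows and self.rows[name] != meaning:
            raise ValueError(f"conflicting meaning for {name!r}")
        self.rows[name] = meaning  # an equivalent row is stored only once

    def update(self, name: str, meaning: str) -> None:
        if name not in self.rows:
            raise KeyError(name)
        self.rows[name] = meaning

    def delete(self, name: str) -> None:
        self.rows.pop(name, None)


model = SymbolModel()
for name, meaning in representation:
    model.insert(name, meaning)
```

The representation keeps three rows, while the model keeps two. Equivalence (here, plain key equality) is what allows `insert` to recognise the duplicate and `update` to change a meaning in exactly one place.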
1.2.2
The Standard Generalized Markup Language (SGML) [57] is a document representation language that standardizes the application of generic coding and generalized markup concepts. One of its important characteristics is that it allows documents to be treated in a way similar to databases [90, 89]. As a meta-language, SGML defines a standard process for specifying the syntax of descriptive markup languages.
As stated in the International Standard ISO 8879 [57], an SGML entity is defined as a collection of characters that can be referenced as a unit. An entity has no structural properties. Its use is restricted to letting an identifier stand in for a string of characters.
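Because an entity has no structure, expanding it is plain string substitution. The Python sketch below illustrates this. It assumes XML-style references of the form `&name;` (SGML itself permits further reference forms), does not expand references nested inside replacement text, and uses the hypothetical function name `expand_entities`.

```python
import re

# Assumption: XML-style entity references of the form &name;
ENTITY_REF = re.compile(r"&([A-Za-z][A-Za-z0-9._-]*);")


def expand_entities(text: str, entities: dict) -> str:
    """Replace every &name; reference by the replacement text declared for it.

    Expansion is purely textual: the entity contributes no structure of its own.
    """

    def lookup(match):
        name = match.group(1)
        if name not in entities:
            raise KeyError(f"undeclared entity: {name}")
        return entities[name]

    return ENTITY_REF.sub(lookup, text)


declared = {"mkm": "Mathematical Knowledge Management"}
print(expand_entities("The field of &mkm; has emerged.", declared))
# prints: The field of Mathematical Knowledge Management has emerged.
```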
Structured documents are composed of a collection of components, which are characterized by their context, scope and type. The relationship a component has with other components is its context. The boundaries determining the beginning and end of a component define its scope. Document components may contain other components or just data. Consequently, the type of a given component is determined either by the data or by the composition of the types of the components that contribute to its definition. In SGML these components are represented by elements. An SGML element may carry attributes, whose purpose is to describe some properties of the element.
SGML provides no operations for updating DTDs. It relies on editing to accomplish any modification of its derived languages. Therefore it represents descriptions of static data, which is considered a limitation when it is applied to dynamic data sets. Although entities and the attribute pair ID/IDREF may be used to eliminate redundant data, they cannot be used to control it, since both are controlled by the author of the document [90]. Also, as pointed out by [90], there is no system support to indicate whether the use of ID/IDREF attributes refers to redundant information.
According to [68], the Extensible Markup Language (XML) [20] is a simplified subset of SGML designed to support its use over the Internet. A related distinction between XML and SGML, pointed out by […, 62], is that XML does not require a DTD to be delivered with its associated document. Instead, it requires documents to be well-formed.
1.2.3
XML Schema provides an alternative to DTDs. It allows much more rigorous control and supports data types. This thesis considers the RELAX NG [27, 28] schema language, because OpenMath [23] has adopted it as the primary formalism for specifying its encoding.
According to [90], RELAX NG is a data model, since it supports both data encoding and operations on the data. Most operations proposed by RELAX NG are based on the operations DTDs use to express data constraints. Examples are choice, optional and zeroOrMore, which correspond to the DTD operators |, ? and *, respectively.
Among the data operations RELAX NG proposes, the replace-definition mechanism is not supported by XML DTDs. It is implemented through the ref, include and define operations. No specific operator is provided for it, and its semantics is conveyed by an example [29]. The semantics of this operation is similar to that of the context-free grammar extension operation [36] that I proposed in 1998. The following example illustrates this operation:
<grammar>
  <start>
    <element name="addressBook">
      <zeroOrMore>
        <element name="card">
          <ref name="cardContent"/>
        </element>
      </zeroOrMore>
    </element>
  </start>
  <define name="cardContent">
    <element name="name">
      <text/>
    </element>
    <element name="email">
      <text/>
    </element>
  </define>
</grammar>
Assuming the above syntax is available as the file addressBook.rng, a define element containing the replacement syntax is placed inside an include element. The syntax that follows replaces the definition of cardContent, and therefore the content of every card element.
<grammar>
  <include href="addressBook.rng">
    <define name="cardContent">
      <element name="name">
        <text/>
      </element>
      <element name="emailAddress">
        <text/>
      </element>
    </define>
  </include>
</grammar>
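The override behaviour can be modelled as a dictionary operation. In the Python sketch below, a grammar maps each definition name to a pattern, and an include with inner defines copies the included grammar and replaces same-named definitions wholesale. This is an assumption-laden illustration, not RELAX NG's actual processing model and not the extension operation of [36]. The tuple encoding of patterns and the function name `include_with_override` are hypothetical. The check on undefined names follows the RELAX NG rule that an overriding define must name a definition the included grammar already has.

```python
# Hypothetical model: a grammar maps each definition name to a pattern,
# and a pattern is a nested tuple such as ("element", "name", ("text",)).
# "start" is kept in the same table purely for simplicity.


def include_with_override(included: dict, overrides: dict) -> dict:
    """Copy `included`, replacing each definition named in `overrides` wholesale."""
    unknown = set(overrides) - set(included)
    if unknown:
        raise ValueError(f"cannot override undefined names: {sorted(unknown)}")
    result = dict(included)  # the included grammar itself is left untouched
    result.update(overrides)  # replace, do not merge
    return result


text = ("text",)
address_book = {
    "start": ("element", "addressBook",
              ("zeroOrMore", ("element", "card", ("ref", "cardContent")))),
    "cardContent": ("group", ("element", "name", text), ("element", "email", text)),
}
extended = include_with_override(address_book, {
    "cardContent": ("group", ("element", "name", text), ("element", "emailAddress", text)),
})
```

The start pattern still refers to `cardContent` by name. Replacing that single entry therefore changes the content of every card without touching the rest of the grammar.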
1.2.4 ASTER
uments written in the
ASTER's processing environment maps the logical structure of the TeX-based document into its internal representation, a tree data structure. Browsing through a mathematical expression therefore corresponds to visiting nodes of the tree. An audio rendering of the document is obtained by applying a set of commands written in a language called AFL, which stands for Audio Formatting Language. One facility this language provides is variable substitution: an AFL rule may replace a portion of an expression by a label. This allows the user to obtain an overview of the expression before being exposed to all its details.
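The Python sketch below is a hypothetical illustration of this idea, not AFL. It renders an expression tree in full and, alternatively, replaces every subtree reached at a chosen depth by a fresh label, giving the kind of overview described above. `Node`, `render` and `overview` are names introduced here.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    label: str  # operator name or leaf text
    children: list = field(default_factory=list)


def render(node: Node) -> str:
    """Full linear rendering: binary operators infix, others as f(a, b, ...)."""
    if not node.children:
        return node.label
    if len(node.children) == 2:
        left, right = node.children
        return f"({render(left)} {node.label} {render(right)})"
    return node.label + "(" + ", ".join(render(c) for c in node.children) + ")"


def overview(node: Node, depth: int, names: dict) -> str:
    """Like render, but each non-leaf subtree reached at `depth` becomes a
    fresh label (A, B, ...). `names` records what each label stands for."""
    if not node.children:
        return node.label
    if depth == 0:
        label = chr(ord("A") + len(names))  # sketch only: up to 26 labels
        names[label] = render(node)
        return label
    parts = [overview(c, depth - 1, names) for c in node.children]
    if len(parts) == 2:
        return f"({parts[0]} {node.label} {parts[1]})"
    return node.label + "(" + ", ".join(parts) + ")"


x, one = Node("x"), Node("1")
expr = Node("/", [Node("+", [x, one]), Node("-", [x, one])])
names = {}
summary = overview(expr, 1, names)  # "(A / B)"
```

A listener would first hear the summary "(A / B)" and could then ask for A or B, whose full renderings are kept in `names`.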
1.2.5 OpenMath
both the semantics and structural information of a mathematical concept. Attributes may be attached to OpenMath objects and used to provide additional information not related to the semantics of the object, such as typesetting details or the URI of a given CD.
OpenMath objects are classified as basic, compound and derived. Informally, an OpenMath object is viewed as a tree [23]. Basic objects are the leaf nodes of the tree, and the non-leaf nodes are its compound objects. This organization determines the LISP style OpenMath uses to encode its compound objects: OpenMath builds expressions using prefix operators. The OpenMath basic objects are integers, symbols, variables, floating-point numbers, character strings and bytearrays. Derived objects are non-OpenMath objects that are imported by means of the attribution construct. Compound objects are created by the application, binding, attribution and error constructs.
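The tree view and the prefix (LISP-style) ordering can be sketched in Python. The element names OMOBJ, OMI, OMV, OMS and OMA follow the XML encoding used in the examples of this section. The sketch omits the OpenMath namespace, the remaining basic object types, binding and attribution, so it is an illustration rather than a conforming encoder. The classes and functions are hypothetical names.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass(frozen=True)
class OMI:  # basic object: integer (leaf)
    value: int


@dataclass(frozen=True)
class OMV:  # basic object: variable (leaf)
    name: str


@dataclass(frozen=True)
class OMS:  # basic object: symbol from a Content Dictionary (leaf)
    cd: str
    name: str


@dataclass(frozen=True)
class OMA:  # compound object: application, head first (prefix)
    head: object
    args: tuple


def to_xml(obj) -> ET.Element:
    """Map the tree onto OpenMath-style XML elements (namespace omitted)."""
    if isinstance(obj, OMI):
        element = ET.Element("OMI")
        element.text = str(obj.value)
        return element
    if isinstance(obj, OMV):
        return ET.Element("OMV", name=obj.name)
    if isinstance(obj, OMS):
        return ET.Element("OMS", cd=obj.cd, name=obj.name)
    element = ET.Element("OMA")
    for child in (obj.head, *obj.args):
        element.append(to_xml(child))
    return element


def encode(obj) -> str:
    root = ET.Element("OMOBJ")
    root.append(to_xml(obj))
    return ET.tostring(root, encoding="unicode")


def prefix(obj) -> str:
    """LISP-style reading of the same tree: operator before operands."""
    if isinstance(obj, OMI):
        return str(obj.value)
    if isinstance(obj, OMV):
        return obj.name
    if isinstance(obj, OMS):
        return f"{obj.cd}.{obj.name}"
    return "(" + " ".join(prefix(part) for part in (obj.head, *obj.args)) + ")"


one_over_x = OMA(OMS("arith1", "divide"), (OMI(1), OMV("x")))
xml_text = encode(one_over_x)
```

Here `prefix(one_over_x)` gives `(arith1.divide 1 x)`. The operator symbol is the first child of the application, just as the OMS element comes first inside OMA.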
OpenMath's aim of communicating mathematics among computing systems is reflected in the way its objects are encoded. A binary encoding and an XML encoding are defined for its objects. The standard states that the XML encoding is readable and writable by humans. However, [37, 108] claim that the encodings provided are meant neither to be read by humans nor to be created by editing procedures in which humans directly supply all the necessary syntax. Of the two standard encodings available, the XML encoding is the one used to define the meaning of the objects to be transmitted.
Application and binding are OpenMath constructors. An application constructs an OpenMath object from a sequence of one or more OpenMath objects. The following
, + 1
[23].
</OMI>
<OMI> 10 </OMI>
</OMA>
<OMBIND>
<OMS cd="fns1" name="lambda"/>
<OMBVAR>
<OMV name="x"/>
</OMBVAR>
<OMA>
<OMS cd="arith1" name="divide"/>
<OMI> 1 </OMI>
<OMV name="x"/>
</OMA>
</OMBIND>
</OMA>
</OMOBJ>
An attribution decorates an object with a sequence of one or more pairs, each composed of an OpenMath symbol (the attribute) and an associated object (the value of the attribute). According to [23], attribution may be used either as an adornment or as a semantic annotation, depending on the role associated with the attribute. The standard states that when the attribute has the role semantic-attribution, the attributed object is modified by the attribution. For this reason attribution is also considered a constructor. Although this characteristic is described as an important feature, the attribution examples included in the standard involve only adornment annotations. The following code illustrates the attribution object by associating non-OpenMath data with an OpenMath object through the foreign element.
<OMATTR>
  <OMATP>
    <OMS cd="presentation" name="mathml"/>
    <OMFOREIGN>
      <math xmlns="http://www.w3.org/1998/Math/MathML">
        <mi> sin </mi><mfenced><mi> x </mi></mfenced>
      </math>
    </OMFOREIGN>
  </OMATP>
  <OMA>
    <OMS cdbase="http://www.openmath.org/cd"
         cd="transc1" name="sin"/>
    <OMV name="x"/>
  </OMA>
</OMATTR>
The error object is not considered here because it has no direct mathematical meaning. Its purpose is to report problems related to the communication of OpenMath objects.
The OpenMath structure used for grouping OpenMath objects is the Content Dictionary, or CD for short. The definition of a CD usually includes other CDs. An
a CD. CDs may be grouped to define collections or groups, and both CDs and CD groups are XML documents.
The data provided by a CD may be structured according to the type of information addressed. Information included in a CD either
belongs to the whole CD
1.2.6 MathML
The Mathematical Markup Language, or MathML [14], is a World Wide Web Consortium (W3C) recommendation for describing mathematical notation. MathML is an XML application focused on providing mathematics on the World Wide Web. MathML marks up mathematical concepts by means of two sets of elements and attributes, and this is how it encodes both the layout and the semantics of mathematical expressions. Presentation MathML and Content MathML are the two languages provided for this purpose.
In much the same way as TeX handles the typesetting of mathematical text, Presentation MathML controls the display of mathematics. Content MathML is meant to supply more meaning to the description of mathematical concepts. One restriction of this form of markup is the limited range of mathematical concepts it covers. Content MathML was designed to support the encoding of mathematical concepts used from kindergarten to the end of high school and in the first two years of college. Like OpenMath, MathML is a system-oriented approach. This property has been emphasized by [79]:
while MathML is human-readable, it is anticipated that, in all but the simplest cases, authors will use equation editors, conversion programs, and other specialized software tools to generate MathML.
Content MathML consists of about 120 elements accepting about a dozen attributes. Concepts not covered by these elements may be represented by referring to external definitions. The MathML csymbol (content symbol) element is provided to address this limitation.
has to refer to a symbol the meaning of which is not provided by MathML's core content elements.
The above definition encodes a symbol that semantically represents the space of twice continuously differentiable functions and whose syntax is encoded as C².
1.2.7
1. Mathematical expressions in both OpenMath and MathML are built using prefix operators. The order of entry is therefore counter-intuitive [96], since the mental model imposed by both approaches requires the user to enter notation from the innermost nested expression outward, instead of from left to right.
2. Although both standards support multimodality of output, they have not been designed to support multimodality of input.
1.2.8 Compositionality
This
the meaning of a sentence can be composed from the meanings of its parts. Stated more precisely, the principle reads:
The meaning of a compound expression is a function of the meaning of its parts and the syntactic rule by which they are combined [58].
A language is considered compositional if it satisfies the compositionality principle. This involves deciding what the basic semantic and syntactic components are and how they are combined [58]. A design that is not compositional therefore indicates that its parts and/or the syntactic rules that bind them have not been selected properly. Although achieving compositionality of meaning might seem an impossible task, [58] claims that
... compositionality becomes possible if semantic considerations influence the design of the syntactic rules.
This indicates that one can always find a syntax that allows the intended meaning to be assigned in a compositional form. The property is supported by Theorem 9.4 in [58], which claims that any possible meaning can be assigned to any possible language in a compositional form. For languages with a fixed (static) syntax, compositionality of meaning is a design decision, since it can be achieved by choosing a suitable grammar. Theorem 9.3 in [58] supports this: it proves that if a language can be generated by any algorithm, it can also be generated by a compositional grammar.
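A small Python interpreter shows what the principle requires operationally. Each syntactic rule is paired with one semantic function, and the meaning of a compound node is computed only from the meanings of its parts and the rule that joins them. The rule names and the integer domain are hypothetical choices made for this illustration.

```python
from typing import Callable, Dict, Tuple, Union

Tree = Union[int, Tuple]  # hypothetical: int leaves, (rule, part, ...) nodes

# One semantic function per syntactic rule.
SEMANTICS: Dict[str, Callable[..., int]] = {
    "plus": lambda a, b: a + b,
    "times": lambda a, b: a * b,
    "neg": lambda a: -a,
}


def meaning(tree: Tree) -> int:
    """Compositional interpretation: a compound's meaning depends only on the
    meanings of its parts and on the rule that combines them."""
    if isinstance(tree, int):
        return tree  # meaning of a basic expression
    rule, *parts = tree
    return SEMANTICS[rule](*(meaning(p) for p in parts))
```

One consequence is substitutivity. Replacing `("times", 3, 4)` inside `("plus", 2, ("times", 3, 4))` by any part with the same meaning, such as `12`, leaves the meaning of the whole expression unchanged (14).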
1.3 Motivation
The work of this thesis was originally motivated by the need for a TeX-to-Braille translation system [10, 11]. As characterized in [10, 11, 44], both TeX and standard Braille representations emphasize the syntactical structure of the concepts involved. For this reason a semantics-preserving translation from TeX input to Braille
tion included the necessity of a semantics-based markup. This has, of course, been noted by many others in the field […, 14, 23].
documents. This implies that any macro definition, including those provided by the author, must reflect the logical structure of the concepts involved in the definition. Another interpretation of this requirement is that a restriction
is necessary in order
approach, obtained
Although document structures based on standardized representation favor document portability, they are not adequate for rendering documents in ways that require different human senses to understand the information. This has been observed by both Raman [86] and Arrabito [10]
and TeX into Braille, respectively. The need for a document structure that would allow mathematical concepts to be communicated regardless of the media used or the human senses involved motivated the research reported in this thesis. The section that follows outlines a semantics-based solution to the specification of mathematical concepts.
1.4
5 In the context of this work, user-oriented refers to a design approach focused on the needs of the end user.
I claim the following:
The meaning of mathematical concepts can be captured by attributed context-free grammars.
Extensibility can be supported by operations on the attributed grammars.
Ambiguity generated by symbol overloading can be resolved by a scope mechanism in which the meaning of each concept is uniquely defined.
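The third claim can be pictured with a generic scope chain, sketched below in Python. This is not the thesis's Binding Control mechanism, only a familiar structure with the same intent. Inside any one scope a symbol has at most one meaning, and a use of the symbol resolves to the innermost binding. The class name `ScopeChain` and the example meanings of `|x|` are hypothetical.

```python
class ScopeChain:
    """Nested scopes. Inside one scope a symbol has at most one meaning."""

    def __init__(self) -> None:
        self._scopes = [{}]  # the outermost scope

    def enter(self) -> None:
        self._scopes.append({})

    def leave(self) -> None:
        if len(self._scopes) == 1:
            raise RuntimeError("cannot leave the outermost scope")
        self._scopes.pop()

    def bind(self, symbol: str, meaning: str) -> None:
        scope = self._scopes[-1]
        if symbol in scope and scope[symbol] != meaning:
            raise ValueError(f"{symbol!r} already means {scope[symbol]!r} here")
        scope[symbol] = meaning

    def resolve(self, symbol: str) -> str:
        # The innermost binding wins, so each use has exactly one meaning.
        for scope in reversed(self._scopes):
            if symbol in scope:
                return scope[symbol]
        raise KeyError(symbol)


scopes = ScopeChain()
scopes.bind("|x|", "absolute value")
scopes.enter()  # e.g. a passage about finite sets
scopes.bind("|x|", "cardinality")
inner = scopes.resolve("|x|")
scopes.leave()
outer = scopes.resolve("|x|")
```

The overloaded notation `|x|` is never ambiguous at the point of use. `inner` is "cardinality" and `outer` is "absolute value".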
1.5 Approach Taken
1.6 Thesis Overview
grammar called the Binding Control mechanism. The semantic structure is based on
attributed context-free grammars, and it addresses extensibility by combining grammar
definitions. Two grammar operations are defined for this purpose. These operations
assume that the grammars involved have been defined according to the restrictions
specified by a normal form proposed in the chapter. The ambiguity characteristic is
approached by a context switch, which allows one semantic structure to be replaced
by another. Chapter 7 provides a set of examples. These examples are used to
illustrate the characteristics of the approach introduced in Chapter … A structure for
processing the document organization presented in Chapter … is proposed in Chapter …
automaton that has its states characterized as sets of grammars and its transitions
by the meaning-to-syntax bindings established during authoring. Chapter 9 contains
a discussion of the approach proposed by this thesis, conclusions and suggestions for
future work.
Chapter 2
Basic Notions and Notation
This chapter presents the notation to be used throughout this thesis and includes
the necessary basic definitions. Whenever a complete specification is not necessary,
a grammar may be specified by listing only its production rules.
All grammars in this thesis will be displayed in table form. The grammar's name
will always appear in the far left column, and each row of the table will contain a
production rule written with spaces as symbol delimiters.
Grammar symbols are represented by strings of characters, possibly linked by the
underscore character. Lower case strings of letters are used to represent nonterminals,
and upper case letters and other characters are used to represent terminal symbols1.
The symbol | is sometimes applied to group together rules associated with the same
nonterminal. The nonterminal on the left of the production rule in the first row is
the start symbol. The arrow is replaced by a colon in all grammars except the one
for the meta-structure. For attributed grammars, one additional column is included
at the right edge of the table to represent the attributes associated with the rules.
Strings of arbitrary characters are used to represent attributes.
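The table convention above is regular enough to be read mechanically. The following Python sketch parses rows written in that style and classifies symbols by case. In those rows, a colon replaces the arrow, | groups alternatives, and spaces delimit symbols. The sketch is an illustration, not a tool used in this thesis. The function name `parse_rows` and the sample grammar are hypothetical. The grammar-name and attribute columns are ignored. The sketch also assumes that | never occurs as a terminal symbol.

```python
def parse_rows(rows: list[str]):
    """Parse 'lhs : rhs | rhs' rows; return (start, productions, N, T).

    Hypothetical helper: the grammar-name and attribute columns are not
    modeled, and '|' is assumed never to be used as a terminal symbol.
    """
    productions: list[tuple[str, tuple[str, ...]]] = []
    for row in rows:
        lhs, _, rhs = row.partition(":")  # the colon stands in for the arrow
        lhs = lhs.strip()
        for alternative in rhs.split("|"):
            productions.append((lhs, tuple(alternative.split())))
    start = productions[0][0]  # left side of the first row is the start symbol
    symbols = {s for lhs, rhs in productions for s in (lhs, *rhs)}
    # Lower case strings of letters (possibly joined by '_') are nonterminals;
    # everything else (upper case letters, other characters) is a terminal.
    nonterminals = {s for s in symbols if s.replace("_", "").isalpha() and s.islower()}
    return start, productions, nonterminals, symbols - nonterminals

# Hypothetical sample grammar written in the table-row style.
start, prods, N, T = parse_rows([
    "sum : term PLUS sum | term",
    "term : NUMBER | LEFT sum RIGHT",
])
print(start, sorted(N), sorted(T))  # sum ['sum', 'term'] ['LEFT', 'NUMBER', 'PLUS', 'RIGHT']
```

An empty alternative would be parsed as an empty right-hand side, which is one natural way to write an ε-rule in this notation.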
2.1
Basic Definitions
An alphabet is a finite non-empty set. Elements of an alphabet are called symbols.
Let $X$ be an alphabet. Then $X^*$ is the set of all words over $X$, including the empty
word $\varepsilon$.
Definition
is the language generated by G.
and $w$ is the right (hand) side, or rhs, of the rule. For $p = A \to w$, $\mathrm{lhs}_p = A$ and
$\mathrm{rhs}_p = w$. The set of nonterminal symbols of $p$ is $N_p = L_p \cup R_p$, where $L_p = \{\mathrm{lhs}_p\}$
and $R_p = \{M \mid M \in N \text{ and } \mathrm{rhs}_p = w_1 M w_2 \text{ for some } w_1, w_2 \in V^*\}$. The set of terminal
symbols of $p$ is $O_p = \{x \mid x \in T \text{ and } \mathrm{rhs}_p = w_1 x w_2 \text{ for some } w_1, w_2 \in V^*\}$.
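Once $N$ and $T$ are known, the sets $N_p$ and $O_p$ can be computed directly from a production. The condition $\mathrm{rhs}_p = w_1 M w_2$ simply says that $M$ occurs somewhere in the right-hand side. The Python sketch below represents a production as a pair (lhs, rhs), with the rhs given as a tuple of symbols. The function names and the sample symbols are hypothetical.

```python
Production = tuple[str, tuple[str, ...]]  # (lhs, rhs as a sequence of symbols)

def nonterminals_of(p: Production, N: set[str]) -> set[str]:
    """N_p = L_p ∪ R_p: the lhs plus every nonterminal occurring in the rhs."""
    lhs, rhs = p
    L_p = {lhs}
    R_p = {M for M in rhs if M in N}  # rhs = w1 M w2 means M occurs in rhs
    return L_p | R_p

def terminals_of(p: Production, T: set[str]) -> set[str]:
    """O_p: every terminal occurring in the rhs."""
    _, rhs = p
    return {x for x in rhs if x in T}

# Hypothetical symbols: sum -> term PLUS sum
N = {"sum", "term"}
T = {"PLUS", "NUMBER"}
p = ("sum", ("term", "PLUS", "sum"))
print(sorted(nonterminals_of(p, N)), sorted(terminals_of(p, T)))  # ['sum', 'term'] ['PLUS']
```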
assignment.
$s \in Q$, where $s$ is the start state,
$F \subseteq Q$, where $F$ is the set of accepting states, and
Chapter 3
A Framework for Interactive
Systems
In this chapter a framework for interactive systems is proposed. The framework
introduced here is based on the model defined by Abowd [4].
3.1
Basic Notions
support from computer technology. In this section some aspects of human-computer
communication are discussed.
3.1.1
It seems that the static world of paper documents is gradually being replaced by the
dynamic environment of digital information. In electronic form, documents need
to be structured in order to be processed by computing systems.
A key element of electronic document processing is the possibility of easily manipulating
a document's atomic elements by means of digital devices. This idea created the need
to view documents not only as printed output generated by a digital machine, but
also to store them in a way that makes them easily portable to other computer
environments. This means the structure of documents needed to be preserved.
This way of viewing documents suggests they are composed of a logical structure, a
set of abstract components, and contents, where the actual contents of the documents
can be found. The logical structuring of documents is based on the decomposition
of documents into parts. Each part in the structure has a particular meaning and
may, recursively, be subdivided into other parts. In this way the whole document
can be represented as a collection of hierarchically related components. An abstract
component, for example a given paragraph of a document, may be expressed over
one or more two-dimensional page spaces in various ways, depending on
specifications of font, hyphenation, line length and other concrete variables. The
same logical component may be mapped into different concrete variables and then
made available in different media by means of a tactile display, a Braille printer
or audio, for instance. In this thesis the process of translating abstract document
components into concrete ones is defined as rendering. The production of hardcopy,
images, speech or any other possible presentation structure from concrete document
components on output devices is defined as viewing.
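The following Python sketch is a hedged illustration of the one-to-many relationship between an abstract component and its concrete renderings. It treats a paragraph as an abstract component and renders it for two media. The visual medium is parameterized by line length, and the audio medium is split into utterances. The class `Paragraph` and the functions `render_page` and `render_speech` are assumptions made for this example. They are not part of any system described in this thesis.

```python
from dataclasses import dataclass
import textwrap

@dataclass(frozen=True)
class Paragraph:
    """Hypothetical abstract document component: content only, no layout."""
    text: str

def render_page(par: Paragraph, line_length: int) -> list[str]:
    """Rendering for a visual medium; the concrete variable is the line length."""
    return textwrap.wrap(par.text, width=line_length)

def render_speech(par: Paragraph) -> list[str]:
    """Rendering for an audio medium: one utterance per sentence (naive split on '.')."""
    return [s.strip() + "." for s in par.text.split(".") if s.strip()]

p = Paragraph("Documents need structure. Structure enables rendering.")
narrow = render_page(p, 20)  # several short lines
wide = render_page(p, 60)    # a single line
spoken = render_speech(p)    # two utterances
```

All three results are concrete components derived from the same abstract `Paragraph`. Viewing, in the sense above, would be the separate step of sending these results to a printer, a display or a speech synthesizer.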
According to Levy [67], documents have been created in response to a human need
to provide stability in a constantly changing world. Levy regards fixing the form of
a document as a means of fixing its contents, and he calls this property of documents
invariance.
It is intuitive to relate this notion of invariance to paper documents, since they are the
result of a process by which the surfaces of paper sheets are usually marked in a stable
way. On the other hand, electronic documents usually require rendering in order to
be manipulated by humans. The fact that one given abstract document component
may be mapped into different concrete ones indicates the existence of a one-to-many
relationship between them. This relationship is an important property of electronic
documents because it allows various media to be used to deliver the information
provided by the abstract document component. The idea of using different media to
communicate is discussed in the subsection which follows.
3.1.2
Information is shared among humans by a communication process. This process may
always be described in terms of three fundamental components: a sender, a receiver
and a communication channel or medium.
input and output devices and the physical carriers such as sound waves and photon
distributions are media. Therefore, a medium is the physical channel used for
information encoding. Sensory modality is a human mechanism of perception whereby vision,
hearing, touch, smell, taste, and balance are used to process incoming
information.
medium.
Communication through a given set of modalities is only possible when supported
by adequate information carriers.
Both the spoken
In this
thesis, this form of information exchange is characterized by the absence of computer-based
systems and by the fact that both sender and receiver are humans sharing
place and time. Humans also exchange information with the aid of computer-based
systems. This form of information exchange is referred to here as computer-assisted
communication. The concept of computer-assisted communication is used in this thesis
in a broad sense.
interaction and computer-mediated human-to-human interaction. Also, in the context
of this work, interaction refers to the communication between user and
system.
Humans usually make use of the available media to communicate ideas and feelings.
Although an increase in information carriers does not necessarily improve
communication, it is, most of the time, expected that the information to be shared is
3.2
User interface design for computer applications is an interactive process in which sets
of objects are manipulated. These objects can be structured according to the role
they play in the interaction. They can be of input, output, or both input and output
types. They may also be directly accessed, if the physical object is manipulated, or
indirectly accessed, if no physical interaction is permitted.
The component which connects input and output objects is generally referred to
as a system. Therefore the user accesses the system by manipulating the interface
objects. Systems differ in their intrinsic characteristics. These qualities are viewed
as statements of a language which can be used to represent the system. This will be
3.3
An Existing Model
illustrates this framework. In this figure components are represented as nodes and
translations are the arrows linking the nodes. Component names are typeset in upper
case letters, and the names of both the languages and the translations are in lower case.
The languages are task, input, core and output.
As shown in Figure 3.1, articulation connects the USER to INPUT. Therefore it is
used to represent the user's intentions in terms of the structure provided for data entry
by the system. Performance is responsible for translating the information collected
during the input stage into core data. The state of the system is made available to
output devices by presentation. Observation is the user's ability to perceive the state
of the system.
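The components and translations just described form a small directed graph. The Python sketch below encodes that graph and follows a sequence of translations, rejecting any step that does not leave the current component. The dictionaries and the function `follow` are illustrative assumptions. Only the component, language and translation names come from the text.

```python
# Components, their languages, and the four translations of the framework
# described above (source component -> target component).
LANGUAGES = {"USER": "task", "INPUT": "input", "SYSTEM": "core", "OUTPUT": "output"}

TRANSLATIONS = {
    "articulation": ("USER", "INPUT"),
    "performance": ("INPUT", "SYSTEM"),
    "presentation": ("SYSTEM", "OUTPUT"),
    "observation": ("OUTPUT", "USER"),
}

def follow(start: str, steps: list[str]) -> list[str]:
    """Return the components visited when applying the named translations in order.

    Raises ValueError if a translation does not start at the current component,
    i.e. the sequence is not a path in the framework's graph.
    """
    path = [start]
    for name in steps:
        source, target = TRANSLATIONS[name]
        if source != path[-1]:
            raise ValueError(f"{name} cannot be applied at {path[-1]}")
        path.append(target)
    return path

cycle = follow("USER", ["articulation", "performance", "presentation", "observation"])
print(cycle)                            # ['USER', 'INPUT', 'SYSTEM', 'OUTPUT', 'USER']
print([LANGUAGES[c] for c in cycle])    # ['task', 'input', 'core', 'output', 'task']
```

Calling `follow("OUTPUT", ["observation"])` succeeds, but nothing in this graph leads from the USER to the OUTPUT without passing through articulation, performance and presentation. This is the structuring problem discussed in the next subsection.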
3.3.1
A Structuring Problem
It is intuitive to decompose the interaction between user and computer in terms of
execution and evaluation semicycles [39]. During this process the user's intentions,
represented as statements of the task language, are mapped into input commands which,
after execution by the system, are observed and evaluated by the user. If the user's
intentions cannot be fulfilled in a single cycle of interaction, other related cycles are
introduced. The additional cycles are viewed as refinements of the intended task to be
realized. The framework proposed in [4] relates articulation and performance to the
execution semicycle, and presentation and observation to the evaluation semicycle.
In this approach, the interactive cycle begins with the USER formulating a goal
and a task to accomplish the goal. This approach is also based on the assumption
that the only way the user can manipulate the machine is through the INPUT. For
this reason, the task must be articulated within the input language. Although
Abowd's framework assumes that execution and evaluation are not always alternating
semicycles, the model does not indicate the procedure to be followed when the user's
goals first require knowledge of the system's state as provided by the output devices1.
As illustrated in Figure 3.1, Abowd's framework establishes that the execution
semicycle always precedes the evaluation semicycle. Therefore, following the path
defined by the arrows connecting the USER and the OUTPUT components, articulation,
performance and presentation are identified as nonactive translations for activities
in which input devices are not involved.
1A typical scenario for this is a user interfacing with a display-based system which first prompts
the user for input.
The notion that the interaction cycle must start with the user formulating a goal
and a task is accepted in this thesis.
3.4
3.4.1
A N ew Framework
A new framework for interactive systems based on the work developed in [4] is
introduced here. The proposed framework differs from the model of [4] in the introduction
of an additional translation which supports the user's consultation of the system's state
through the output devices.
The notion of interactive cycles is understood as sequences of components connected
by translations. The sequences represent the derivation of words of a language defined
by all possible tasks which can be realized through the system by means of the
interface. The results obtained by the derivation procedure represent the user's tasks
that have been completely realized by the available resources. These characteristics
are represented by the right-linear grammar
$$G = (N, T, P, B)$$
with
$$N = \{U, I, S, O\}, \qquad T = \{c, a, p, v, o\},$$
$$P = \{\, U \to cO \mid aI \mid \varepsilon,\;\; I \to pS,\;\; S \to vO,\;\; O \to oU \,\},$$
and
$$B = U$$
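Every production of $G$ has the form $A \to xB$ or $A \to \varepsilon$, so the grammar can be read directly as a finite automaton whose states are the nonterminals. The Python sketch below follows the derivation of a string of translation symbols one terminal at a time. It accepts the string if the derivation can end with $U \to \varepsilon$. The names `PRODUCTIONS`, `EPSILON_FINAL` and `derives` are illustrative assumptions.

```python
# Productions of the right-linear grammar G, written as
# (nonterminal, terminal) -> next nonterminal.
PRODUCTIONS = {
    ("U", "c"): "O",   # U -> cO  (consultation)
    ("U", "a"): "I",   # U -> aI  (articulation)
    ("I", "p"): "S",   # I -> pS  (performance)
    ("S", "v"): "O",   # S -> vO  (presentation)
    ("O", "o"): "U",   # O -> oU  (observation)
}
EPSILON_FINAL = {"U"}  # nonterminals with an epsilon-production (U -> ε)
START = "U"            # B = U

def derives(word: str) -> bool:
    """Return True if the start symbol derives `word`.

    In G each (nonterminal, terminal) pair selects at most one production,
    so the derivation can be followed deterministically, symbol by symbol.
    """
    current = START
    for symbol in word:
        nxt = PRODUCTIONS.get((current, symbol))
        if nxt is None:
            return False
        current = nxt
    return current in EPSILON_FINAL

for w in ["", "co", "apvo", "coapvoco", "apv", "oc"]:
    print(repr(w), derives(w))
```

The first four strings are complete interaction cycles. The string `"apv"` stops before the user has observed the output, and `"oc"` starts with a symbol that no production of $U$ can produce.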
standard state transition diagrams, statecharts [52, 53] will be used. This choice
is due to the fact that hierarchical structures are better visualized when
represented by these diagrams.
by the statechart in Figure 3.2. The statechart in this figure has depth two since
it structures the states in two layers, or levels of abstraction. The higher level has
SYSTEM, INTERFACE and USER as states. The lower level is a refinement of the
INTERFACE state and is composed of only two states, OUTPUT and INPUT. As
can be seen, lower case letters have been used to typeset the names of both languages
and translations. Each language has been placed inside the box where its related
state name is located.
3.5
Example
Consider the scenario where a client of a bank fails to withdraw cash from an Automatic
Teller Machine (ATM) because he/she has forgotten the required bank card.
The client/ATM interaction, for this case, may be described by the following tasks:
Consult state of ATM by reading information provided by its display, and
Interpret information from display.
It is during the Interpret information from display task that the client realizes that
the appropriate bank card must be supplied. Not having the needed card, the client stops
the cash withdrawal activity and, consequently, the client/ATM interaction terminates.
This activity may be expressed in the framework proposed in this chapter by the
regular expression $(co)^*$. The Kleene star is used in this case to indicate that the
client may cycle through consultation/observation as many times as he/she feels
necessary, possibly zero.
The regular expression $((co) + (apvo))^*$ represents all possible interactions the user
may have with the system.
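The two regular expressions of the ATM example can be checked directly with Python's `re` module. In that syntax the union written $+$ above becomes `|`, and the Kleene star is unchanged. The function `classify` and its labels are hypothetical names for this sketch. A trace is a string over the translation symbols c, a, p, v and o.

```python
import re

# (co)* and ((co) + (apvo))* from the ATM example, in Python regex syntax.
CONSULT_ONLY = re.compile(r"(co)*")
ALL_INTERACTIONS = re.compile(r"(co|apvo)*")

def classify(trace: str) -> str:
    """Classify a trace of translation symbols (c, a, p, v, o)."""
    if CONSULT_ONLY.fullmatch(trace):
        return "consultation only"
    if ALL_INTERACTIONS.fullmatch(trace):
        return "complete interaction"
    return "not a complete interaction"

print(classify("coco"))    # consultation only: the client without a card
print(classify("coapvo"))  # complete interaction: consults, then issues a command
print(classify("apv"))     # not a complete interaction: output not yet observed
```

Every trace accepted by `CONSULT_ONLY` is also accepted by `ALL_INTERACTIONS`, mirroring the fact that $(co)^*$ describes a sublanguage of $((co) + (apvo))^*$.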
3.6
Summary
A framework for interactive systems based on the model defined by Abowd [4]
is introduced in this chapter. The proposed approach uses an additional translation
to support the user's analysis of the system's state as supplied by the
output devices. The complete cycle of interaction is modeled as a regular language.
A graphical representation of this organization is provided in statechart format.
Chapter 4
Authoring Environments
4.1
Introduction
4.2
This thesis considers computer-based document authoring as an interactive process.
During this process the author manipulates documents by means of interaction objects
as defined in Chapter 3. These objects can be manipulated directly or indirectly by
the user. A document authoring environment is a combination of interaction objects
and is structured according to the form of control the author has over the interaction
objects involved.
Consider, for instance, a pen/paper document authoring environment. In this
organization, the author uses the pen to record information on the paper. This environment
is characterized by the fact that all objects involved are directly manipulated by the
author. The interaction is completely under the author's control because all
information printed on paper results from direct actions performed by the author on the
interface objects. To illustrate the notion of a document authoring environment,
consider a document such as a research report written in English. Table
4.1 provides a description of the pen/paper environment according to the interaction
framework proposed in Chapter 3.
USER            task     Author                     Produce a handwritten draft of a research report in English
  articulation
INPUT           input    Pen/paper                  Pen strokes
  performance                                       Cursive writing
SYSTEM          core     Pen/paper text authoring   Written text
  presentation
OUTPUT          output   Paper                      Sets of handwritten cursive characters printed on paper
  observation
  consultation
USER            task     Author                     Produce a draft version of a research report in English using plain TeX macros
  articulation
INPUT           input    Keyboard                   Key strokes
  performance                                       Key decoding
SYSTEM          core
  presentation
OUTPUT          output   Video display              Sets of characters rendered on the display
  observation
  consultation
4.3
Cognitive Distances
4.4
Rendering Information
4.5
$$\mathcal{L}^{-1}[F(s)G(s)] = \int_0^t f(x)\,g(t - x)\,dx$$
In the example above the change from lower case to upper case letters has been used
to indicate the domain change from $t$ to $s$. The syntax used enforces the fact that
$F(s)$ is just a different interpretation of the function $f(t)$. The Laplace transformation as
lower limit to inform the reader where the operation starts and ends.
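A small case makes the convolution identity of the Laplace example concrete. Take $f(t) = 1$ and $g(t) = t$, so that $F(s) = 1/s$ and $G(s) = 1/s^2$. Then $F(s)G(s) = 1/s^3$, whose inverse transform is $t^2/2$, and the convolution integral of $(t - x)$ from $0$ to $t$ also gives $t^2/2$. The Python sketch below checks the right-hand side numerically with a midpoint rule. The function name and the step count are arbitrary choices for illustration.

```python
def convolve_at(f, g, t: float, n: int = 10_000) -> float:
    """Approximate the integral of f(x) * g(t - x) for x from 0 to t (midpoint rule)."""
    h = t / n
    return sum(f((k + 0.5) * h) * g(t - (k + 0.5) * h) for k in range(n)) * h

def f(x: float) -> float:
    return 1.0  # f(t) = 1, Laplace transform F(s) = 1/s

def g(x: float) -> float:
    return x    # g(t) = t, Laplace transform G(s) = 1/s**2

for t in (0.5, 1.0, 2.0):
    exact = t**2 / 2  # inverse Laplace transform of F(s)G(s) = 1/s**3
    print(t, convolve_at(f, g, t), exact)
```

Because the integrand here is linear in $x$, the midpoint rule is exact up to floating-point rounding, so the two columns agree to many digits.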
Example: The matrix equation
Lx = m
4.6
Environment Modifications
$$V = (S, I) \qquad (4.1)$$
where $V$, $S$ and $I$ are the document authoring environment, the document instance structure and
the system's interface, respectively. This representation may be viewed as a refinement of
the framework proposed in Section 3.4 to address the details involved in the SYSTEM
component. For computer-based document authoring environments, this state needs
to be further decomposed in order to isolate the operating system's services from the
behaviour provided by the document structure. Figure 4.1 illustrates the framework
proposed in Chapter 3, where the SYSTEM component has been modified to support
the proposed refinement. In this case core has been replaced by two lower level states1,
the operating system and the document structure.
Figure 4.1: Framework for document authoring environments.
Consider, for instance, a computer-based authoring environment $V_0 = (S_0, I_0)$ such as
the one defined in Table 4.2. In this case $S_0$ represents the plain TeX macro package
Replacing the keyboard with a mouse/display pair, for instance, would
require articulation, input and performance to be redefined. Although the author may
notice a significant amount of change due to the mouse pointing and clicking
actions that replace the typing form of manipulation, the basis of the document
structure has not been modified. The resulting environment can be represented as
$V_1 = (S_0, I_1)$, where $I_1$ is the modified interface. Replacing the plain TeX macro
package with LaTeX, for instance, will not have any effect on parts of the environment
other than the document structure. This means the author will use the keyboard
for input, but is now required to have knowledge of LaTeX to express his/her ideas.
This environment is represented by $V_2 = (S_1, I_0)$, where $S_1$ is a document structure
based on the LaTeX macro definitions.
Different document authoring environments may therefore be obtained by the following
three approaches. One can either:
1. maintain the document structure and modify the system's interface, or
2. maintain the system's interface and modify the document structure, or
3. modify both.
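The three approaches can be pictured as producing new pairs $(S, I)$ from an existing one. The hedged Python sketch below models an environment as an immutable pair and derives modified environments with `dataclasses.replace`. The class `Environment`, the helper `changed` and the variable `v3` are hypothetical. The component values "plain TeX", "LaTeX", "keyboard" and "mouse/display" come from the examples above.

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class Environment:
    """Hypothetical model of a document authoring environment V = (S, I)."""
    structure: str   # S: document instance structure (e.g. a macro package)
    interface: str   # I: the system's interface (devices, interaction style)

v0 = Environment(structure="plain TeX", interface="keyboard")
v1 = replace(v0, interface="mouse/display")                     # approach 1: keep S, modify I
v2 = replace(v0, structure="LaTeX")                             # approach 2: keep I, modify S
v3 = replace(v0, structure="LaTeX", interface="mouse/display")  # approach 3: modify both

def changed(old: Environment, new: Environment) -> set[str]:
    """Name the components that differ between two environments."""
    return {name for name in ("structure", "interface")
            if getattr(old, name) != getattr(new, name)}

print(changed(v0, v1), changed(v0, v2), changed(v0, v3))
```

Making the dataclass frozen keeps $V_0$ intact after each modification, so several environments can be compared side by side.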
4.7
organized.
2 According to [74], interaction styles are key-modal, direct-manipulation and linguistic.
4.8
Recommendations
The previous sections discussed the basic characteristics that document environments
should have in order to support the authoring of mathematical concepts.
These qualities are presented in terms of properties, together with possible software
design approaches that may be considered in order to achieve them. Ideal document
authoring environments are viewed as software systems which support the properties
listed in Table 4.3.
4.9
Summary
PROPERTY                DESIGN APPROACH
High Conciseness
High Expressiveness
Ambiguity-freeness
Multimodality
Extensibility
Table 4.3: Document authoring environment characteristics and software design
approaches to help achieve them.
Chapter 5
Mathematical Constructs and their
Representation
Document authoring is an incremental activity in which a set of intermediate (draft)
versions of a document is produced by the author prior to the creation of the final
one. Any given version of a document, except the first one, may therefore be viewed
as the result of an update of the previous version of the document.
Authoring documents that contain mathematics, or authoring mathematics for short,
is both incremental and dynamic. It is during this activity that the author makes
explicit the syntax that will represent the mathematical concepts included in a given
version of a document. The design of document structures to support these
characteristics must therefore include mechanisms to manage both the updates and the
meaning-to-syntax bindings determined during authoring.
This chapter introduces the notion of using CFGs as a major formalism to support
the dynamics of authoring mathematics.
A set
5.1
A notational system uses a set of symbols to describe quantities and ideas, and it
is used as a supporting mechanism for the expression of ideas. A programming
language is a special notational system designed to solve problems in a particular domain.
This characteristic often establishes the set of basic constructs that will provide the
language with the necessary power to approach the tasks in the specified domain.
Language constructs are generally structured around statements, and these programming
statements are, most of the time, characterized as block statements, flow control
statements, expressions, and declarations.
This way of structuring the design of a programming language leads to the idea that
the language can be defined as a set of basic modules that can be combined to generate
other modules. The task of module design may be accomplished through the use of
a Context-Free Grammar, which will hereafter be referred to as CFG in this thesis.
CFGs have been used as a major tool for the specification of programming languages.
The implementation independence of this approach provides the designer with the
flexibility to work on the development of a language without the need to be concerned
with implementation details. Programming languages often need to be mapped into
other domains in order to better respond to the user's processing requests. Compilers
are well known tools that support the translation of language definitions into other
forms.
In this thesis, CFGs are viewed as abstract type definitions, and the sentences of the
language generated by a grammar as variables of that type. This idea is supported by
the fact that, given a set of basic type definitions or a set of CFGs, other definitions can
easily be produced by manipulating the rules already defined. The parsing process of a
compiler can therefore be interpreted as a type checker which only verifies whether
a given variable (a sentence) belongs to the set provided by the type definition (the
grammar).
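The view of a CFG as a type and of parsing as a membership check can be illustrated with a small recognizer. The Python sketch below is a generic top-down recognizer. It assumes a grammar without left recursion and is not the parser of any particular system. The grammar `EXPR_GRAMMAR`, its token names and the function names are hypothetical. The start symbol plays the role of the type, and `has_type` answers whether a token sequence is a value of that type.

```python
# A grammar maps each nonterminal to a list of alternatives; each alternative is
# a tuple of symbols. Any symbol that is not a key of the grammar is a terminal.
Grammar = dict[str, list[tuple[str, ...]]]

EXPR_GRAMMAR: Grammar = {  # hypothetical example grammar
    "expr": [("term", "PLUS", "expr"), ("term",)],
    "term": [("NUM",), ("LPAREN", "expr", "RPAREN")],
}

def end_positions(g: Grammar, symbol: str, tokens: list[str], i: int) -> set[int]:
    """All positions j such that `symbol` derives tokens[i:j].

    Exhaustive top-down search over every alternative; it terminates only
    for grammars without left recursion.
    """
    if symbol not in g:  # terminal: must match the next token
        return {i + 1} if i < len(tokens) and tokens[i] == symbol else set()
    result: set[int] = set()
    for alternative in g[symbol]:
        positions = {i}
        for sym in alternative:
            positions = {j for p in positions for j in end_positions(g, sym, tokens, p)}
        result |= positions
    return result

def has_type(g: Grammar, start: str, tokens: list[str]) -> bool:
    """Membership test: is the token sequence a sentence derived from `start`?"""
    return len(tokens) in end_positions(g, start, tokens, 0)

print(has_type(EXPR_GRAMMAR, "expr", ["NUM", "PLUS", "LPAREN", "NUM", "RPAREN"]))  # True
print(has_type(EXPR_GRAMMAR, "expr", ["NUM", "PLUS"]))                            # False
```

Passing a different start symbol, such as `"term"`, checks membership in a different type defined by the same set of rules.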
5.2
Figure 5.1: Many-to-many relationship between mathematical concepts and their
representation.
loading meaningful symbols. The arithmetic mean, the conjugate of a complex
number and the complement of a boolean expression are well known concepts that
are often represented by placing a horizontal bar over a variable name. For instance,
$\bar{v}$ could be chosen to represent all three concepts. It is clear that context
has to be included in any attempt to communicate mathematical concepts.
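The overloaded bar can be pictured as a relation between one notation and several concepts, which becomes a function only once a context is fixed. The Python sketch below is a hypothetical illustration of that idea. The context names, `BAR_MEANINGS` and the `meaning` function are assumptions made for this example and do not describe the mechanism proposed in this thesis.

```python
from typing import Optional

# One notation, several concepts: the bar example restricted to three contexts.
BAR_MEANINGS = {
    "statistics": "arithmetic mean",
    "complex analysis": "complex conjugate",
    "boolean algebra": "complement",
}

def meaning(notation: str, context: Optional[str] = None) -> str:
    """Resolve an overlined variable (TeX bar notation) to a concept."""
    if not notation.startswith(r"\bar{"):
        raise ValueError("only overlined variables are modeled here")
    if context is None:
        # Without a context the notation is ambiguous: report every candidate.
        candidates = ", ".join(sorted(BAR_MEANINGS.values()))
        raise LookupError(f"{notation} is ambiguous: {candidates}")
    return BAR_MEANINGS[context]

print(meaning(r"\bar{v}", "boolean algebra"))  # complement
```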
It is
Assume $(x, y, z)$ are given Cartesian coordinates. We now let $(\bar{x}, \bar{y}, \bar{z})$ be new
coordinates where $\bar{x} = \lambda x$, $\bar{y} = \lambda y$, $\bar{z} = \lambda z$ and $\lambda$ is a positive scalar constant.
In the context described, $\bar{x}$, $\bar{y}$ and $\bar{z}$ are neither complex conjugates, nor complements
of boolean expressions, nor means. A new interpretation has been given locally to the
variables. The extensibility of mathematical notation increases the complexity
involved in capturing the semantics of the concepts presented.
The representation of mathematical notation can be achieved either by a presentational
approach, in which the visual characteristics of the symbols used in the notation
are emphasized, or by a semantic approach, in which abstract concepts are used as a
basis for the representation. The presentational approach was introduced during the
early days of computing. Typesetting systems like nroff/troff as well as TeX are
examples of such systems. Although both systems provide stable data representations,
they lack the features needed to serve as a basis for the representation of data
in forms other than text. In contrast, as argued in [11], a notational approach based
on the meaning of symbols, that is, based on the semantics of the concepts, is needed.
One of the difficulties in representing mathematical expressions by their contents is
capturing the meaning of the concepts. Put another way, one must capture the meaning
that has been associated with a given set of symbols when the concepts have already
been encoded as these symbols for communication. For this reason the representation
of mathematical concepts by the semantic approach has not yet been fully implemented.
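The difference between the two approaches can be sketched by encoding the scaled coordinate $\bar{x} = \lambda x$ of the previous example twice. In the Python sketch below, the presentational form is a TeX string that records only what the symbols look like. The semantic form is a small tree that records what they mean, and it can be rendered into more than one medium. All class and function names, including the internal name `x_new`, are illustrative assumptions, not the representation developed in this thesis.

```python
from dataclasses import dataclass
from typing import Union

# Presentational encoding: layout only. The string does not say whether the bar
# denotes a mean, a conjugate, a complement or a scaled coordinate.
presentational = r"\bar{x} = \lambda x"

@dataclass(frozen=True)
class Var:
    name: str

@dataclass(frozen=True)
class Scaled:            # hypothetical concept: a coordinate scaled by a constant
    factor: Var
    coordinate: Var

@dataclass(frozen=True)
class Equals:
    left: "Expr"
    right: "Expr"

Expr = Union[Var, Scaled, Equals]

# Semantic encoding: x-bar is explicitly defined as a scaled coordinate.
semantic = Equals(Var("x_new"), Scaled(Var("lambda"), Var("x")))

def to_tex(e: Expr) -> str:
    if isinstance(e, Var):
        return {"lambda": r"\lambda", "x_new": r"\bar{x}"}.get(e.name, e.name)
    if isinstance(e, Scaled):
        return f"{to_tex(e.factor)} {to_tex(e.coordinate)}"
    return f"{to_tex(e.left)} = {to_tex(e.right)}"

def to_speech(e: Expr) -> str:
    if isinstance(e, Var):
        return {"x_new": "the new coordinate x"}.get(e.name, e.name)
    if isinstance(e, Scaled):
        return f"{to_speech(e.coordinate)} scaled by {to_speech(e.factor)}"
    return f"{to_speech(e.left)} equals {to_speech(e.right)}"

print(to_tex(semantic))     # \bar{x} = \lambda x
print(to_speech(semantic))  # the new coordinate x equals x scaled by lambda
```

Going in the opposite direction, from `presentational` back to `semantic`, would require knowing the author's local interpretation of the bar. This is precisely the information the presentational form does not record.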
5.3
5.3.1
Over its lifetime in a document, the representation of a mathematical construct may be
characterized as a variable that denotes a locally pre-established relationship between the
abstract concepts involved and a user-defined interpretation. Syntactic constructs
may temporarily be bound to specific meanings as the result of a process led by the
author of the document in order to communicate his/her knowledge. This
context-dependent binding process is therefore the mechanism the author has to express
information by means of a finite set of symbols. By fixing an interpretation for a given
syntax for a period of time, the author expresses his/her knowledge at the possible
cost of introducing symbol overloading2 and syntax ambiguity. This process may be
referred to as dynamic authoring.
Modeling structures to support dynamic authoring requires mechanisms to support
capturing the semantics of mathematical concepts. This means addressing not only
data representation issues, but also the context in which mathematical concepts are
represented.
The document update notion, imposed by the dynamic authoring model, establishes
the need for well-defined mechanisms both for accessing and for modifying the
structural base upon which the document's syntax and semantics are represented. When
a grammar serves as the supporting structure for capturing the semantics of
mathematical concepts, either a modification of the syntax used for the representation
of concepts3 or the introduction of a new construct will require an update process
in which the related grammar definitions need to be adapted according to the
proposed modifications.
2In this context, symbol overloading is viewed as part of an incremental updating process where
existing connections between mathematical concepts and syntactical constructs are modified. The
modification process either establishes or keeps a many-to-one relation between mathematical
concepts and syntactical representations.
It is during the authoring activity that syntax is bound to concepts. In the event
that ambiguities are introduced by symbol overloading, authoring mechanisms can
be provided to resolve all context-dependent representations which, according to the
author, need to be included in the document.
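The update process can be sketched by treating each grammar version as an immutable value. A new binding of syntax to a concept then produces a new version, while earlier document versions keep theirs. In the hypothetical Python sketch below, `rebind`, the concept names and the token names are illustrative assumptions. The grammar operations defined in this thesis are not modeled here.

```python
from collections.abc import Mapping
from types import MappingProxyType

# A grammar version maps a concept (nonterminal) to its alternative right-hand sides.
Grammar = Mapping[str, tuple[tuple[str, ...], ...]]

def rebind(grammar: Grammar, concept: str, *alternatives: tuple[str, ...]) -> Grammar:
    """Return a new read-only grammar version in which `concept` has new syntax.

    The old version is left untouched, mirroring a document whose earlier
    drafts keep the bindings that were in force when they were written.
    """
    updated = dict(grammar)
    updated[concept] = tuple(alternatives)
    return MappingProxyType(updated)

# Version 1: the bar denotes the mean (hypothetical token names).
v1: Grammar = MappingProxyType({"mean": (("OVERLINE", "VAR"),)})
# Version 2: the author moves the mean to angle brackets ...
v2 = rebind(v1, "mean", ("LANGLE", "VAR", "RANGLE"))
# Version 3: ... and introduces a new construct that reuses the bar.
v3 = rebind(v2, "scaled", ("OVERLINE", "VAR"))

print(v1["mean"], v2["mean"], sorted(v3))
```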
5.3.2
and th e sam e syntax are understood as d istin ct objects. For th is reason they should
be trea ted separately bv m eans of th e ir own gram m ars.
One advantage of having CFGs as the fundamental structure to capture the meaning of mathematical concepts is the flexibility this mechanism provides in supporting both the design and recognition phases of the capturing activity. The design phase is characterized by the assignment of grammar fragments to mathematical concepts. During recognition, the input provided by the author is submitted to the analysis component of the associated language processor. At this stage, the input is encoded as tokens and its syntactical structure is matched against the set of production rules provided during the design.
Another way of viewing the recognition phase is as a membership verification performed by the analysis component. Under this interpretation a CFG is equivalent to a data type, or simply a type, and each valid input is an instance of the type. This association is consistent with the notion of type provided by [72]. The data type, in this case, is represented by the start symbol of the CFG.
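As a minimal sketch of this membership reading, the Python fragment below uses a textbook Earley recognizer (a general CFG recognition algorithm chosen here for illustration; the thesis does not prescribe a particular analysis component) to ask whether a token sequence is an instance of the type named by the start symbol. The dict encoding of grammars and the function name `is_instance` are assumptions; the example grammar mirrors the integer grammar of Table 5.2, with left recursion left intact.

```python
from typing import Dict, List, Sequence, Set, Tuple

Grammar = Dict[str, List[Tuple[str, ...]]]        # nonterminal -> alternatives
Item = Tuple[str, Tuple[str, ...], int, int]      # (lhs, rhs, dot, origin)

def is_instance(grammar: Grammar, start: str, tokens: Sequence[str]) -> bool:
    """Earley recognizer: True iff tokens is a word derivable from start.
    Symbols that are not keys of grammar are terminals; no epsilon rules."""
    n = len(tokens)
    chart: List[Set[Item]] = [set() for _ in range(n + 1)]
    chart[0] = {(start, rhs, 0, 0) for rhs in grammar[start]}
    for i in range(n + 1):
        agenda = list(chart[i])
        while agenda:
            lhs, rhs, dot, origin = agenda.pop()
            new: List[Item] = []
            if dot < len(rhs) and rhs[dot] in grammar:          # predict
                new = [(rhs[dot], alt, 0, i) for alt in grammar[rhs[dot]]]
            elif dot < len(rhs):                                 # scan
                if i < n and tokens[i] == rhs[dot]:
                    chart[i + 1].add((lhs, rhs, dot + 1, origin))
            else:                                                # complete
                new = [(l2, r2, d2 + 1, o2) for (l2, r2, d2, o2) in chart[origin]
                       if d2 < len(r2) and r2[d2] == lhs]
            for item in new:
                if item not in chart[i]:
                    chart[i].add(item)
                    agenda.append(item)
    return any(l == start and d == len(r) and o == 0 for (l, r, d, o) in chart[n])

# The integer grammar: the type "expr" and two candidate inputs.
G = {"expr": [("expr", "+", "term"), ("term",)],
     "term": [("integer",)],
     "integer": [("1",), ("2",)]}
print(is_instance(G, "expr", ["1", "+", "2"]))   # True: an instance of expr
print(is_instance(G, "expr", ["a", "+", "b"]))   # False: not an instance
```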
The organization proposed in this section merges the notions of module and type by using CFGs as static structures to support the semantics capturing requirement. One benefit of organizing mathematical concepts as sets of grammar fragments, or modules, is the possibility of using both decomposition and composition as aids to the structuring process.
Although it is possible to capture the meaning of mathematical concepts by means of static structures such as CFGs, this approach has limitations. One important limitation is that CFGs only support the definition of document interchange formats. This means CFGs do not support the fundamental requirement that authoring mathematics is a dynamic activity in which the bindings between meaning and syntax are established by the author while manipulating the document. This characteristic is discussed next.
5.3.3
This subsection illustrates the limitation CFGs have in supporting the semantics capturing of mathematical concepts. For this purpose consider, for instance, authoring
(5.1)
add
equality → left_expr = right_expr
left_expr → left_expr + right_expr
left_expr → integer
right_expr → integer
integer → 1 | 0
and
(5.2)
The syntax of expression (5.2) can be captured by the grammar in Table 5.1. However, its semantics cannot. This is because the author has determined that the context in which this syntax is valid has changed: operations on integers have been replaced by operations on Booleans, where 1 means TRUE and 0 means FALSE.
CFG s provide no m eans of u p d atin g th e ir production rules. Therefore a docum ent
stru ctu re based on this form alism has to include a mechanism to su p p o rt th e ability to
respond to au th o rin g requests aim ed at th e creation of context-dependent m eaningto-syntax bindings.
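The point can be sketched in Python: one token sequence, accepted by the grammar in Table 5.1, receives different meanings once the author switches the context. The domain table, the function `evaluate`, and the reading of + as disjunction (OR) in the Boolean context are assumptions made for illustration; the text fixes only that 1 means TRUE and 0 means FALSE.

```python
from typing import Callable, Dict

# Hypothetical semantic bindings for the same syntax. "=" is equality in both.
DOMAINS: Dict[str, Dict[str, Callable]] = {
    "integers": {"lit": int, "plus": lambda x, y: x + y},
    # Assumption: in the Boolean context, + is read as disjunction (OR).
    "booleans": {"lit": lambda s: s == "1", "plus": lambda x, y: x or y},
}

def evaluate(expr: str, domain: str) -> bool:
    """Evaluate 'd + d + ... = d' (digits 0/1) under the chosen domain."""
    sem = DOMAINS[domain]
    left, right = (side.split() for side in expr.split("="))
    # Left side follows left_expr -> left_expr + right_expr | integer.
    value = sem["lit"](left[0])
    for op, operand in zip(left[1::2], left[2::2]):
        assert op == "+", f"unexpected operator {op!r}"
        value = sem["plus"](value, sem["lit"](operand))
    return value == sem["lit"](right[0])

print(evaluate("1 + 1 = 1", "integers"))  # False: 2 is not 1
print(evaluate("1 + 1 = 1", "booleans"))  # True: TRUE or TRUE is TRUE
```

The parse is identical in both calls; only the binding chosen outside the grammar changes the result.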
5.3.4
Updating CFGs
This thesis approaches the semantics capturing problem by means of an organization that is based on sets of modules.
module reuse.
provides examples to illustrate the problems introduced when grammar rules sharing the same syntax are used to capture different semantics.
5.3.4.1
Table 5.2
1  expr → expr + term
2  expr → term
3  term → integer
4  integer → 1 | 2

Table 5.3
1  expr → expr + term
2  expr → term
3  term → character
4  character → a | b
tical determine that both grammars define lists of terms separated by the + symbol. Although they share this characteristic, the semantics of both rules 1 and 2 depend
+ 2,
from the grammar in
Table 5.2 defines lists of integers 1 and 2 that are separated by the + symbol. The
expr ⇒ expr + term ⇒ term + term ⇒ integer + term ⇒ 1 + term ⇒ 1 + integer ⇒ 1 + 2

expr ⇒ expr + term ⇒ term + term ⇒ character + term ⇒ a + term ⇒ a + character ⇒ a + b
fact that integers are being separated by the + symbol suggests that rule 1, from this
a and b separated by the + symbol. For this case it can be stated that rule 1, from
Table 5.6
1  expr → expr + term
2  expr → term
3  term → integer | character
4  integer → 1 | 2
5  character → a | b
The CFG defined by the rules in Table 5.6 combines rules 3 and 4 from both grammars defined in Tables 5.2 and 5.3. Table 5.7 illustrates that the derivation of a word a + 2
expr ⇒ expr + term ⇒ term + term ⇒ character + term ⇒ a + term ⇒ a + integer ⇒ a + 2
interpretation for the + symbol is not determined because the semantics attached to this symbol cannot be expressed by the grammar in Table 5.6.
The complete interpretation of the + symbol, in this case, is not provided by the grammar rules as it was in the previous two scenarios, because the semantics attached to this symbol cannot be expressed directly by the grammar in Table 5.6. Additional information is necessary to specify how integers and characters are to be processed by the + operator.
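A minimal Python sketch of the three scenarios, assuming the input is already tokenized: the meaning of rule 1 (expr → expr + term) is chosen from the operand types, which is exactly the information the grammar in Table 5.6 does not carry. The helper names and the choice of concatenation as the meaning of + on characters are illustrative assumptions.

```python
from typing import List, Union

Value = Union[int, str]

def term_value(tok: str) -> Value:
    # integer -> 1 | 2 and character -> a | b, as in Tables 5.2, 5.3 and 5.6
    if tok in {"1", "2"}:
        return int(tok)
    if tok in {"a", "b"}:
        return tok
    raise ValueError(f"{tok!r} is not a term")

def plus(x: Value, y: Value) -> Value:
    """Meaning of rule 1, decided from the operand types."""
    if isinstance(x, int) and isinstance(y, int):
        return x + y              # addition (integer grammar)
    if isinstance(x, str) and isinstance(y, str):
        return x + y              # concatenation (character grammar)
    # Mixed operands (combined grammar): the CFG alone attaches no meaning.
    raise TypeError(f"no semantics attached to {x!r} + {y!r}")

def interpret(tokens: List[str]) -> Value:
    assert all(op == "+" for op in tokens[1::2]), "terms must be separated by +"
    value = term_value(tokens[0])
    for tok in tokens[2::2]:      # left-associative, as in expr -> expr + term
        value = plus(value, term_value(tok))
    return value

print(interpret(["1", "+", "2"]))  # 3
print(interpret(["a", "+", "b"]))  # ab
```

Calling `interpret(["a", "+", "2"])` raises `TypeError`, mirroring the word a + 2 whose meaning the combined grammar leaves undetermined.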
The grammars presented in this subsection illustrate that one CFG rule may be used to express several different semantics. It is also possible to have the semantics of a single concept captured by different CFG rules. This characteristic is discussed in the following subsection.
5.3.4.2
As static structures, CFGs do not support dynamic authoring. For this reason, external mechanisms need to be designed to update the set of grammars involved in the modifications proposed during authoring. Effective updates of these structures require both the identification and the control of redundant definitions.
CFG redundancy is, in the context of this work, defined in terms of grammar rules. The examples presented in the previous subsection illustrated that CFG rules with identical syntax may be used to express different semantics. In the context of this thesis, such rules are considered redundant.
Another form of redundancy occurs when the same semantics is expressed by different CFGs. Grammars in this case differ only by nonterminal renaming (isomorphic grammars). This form of redundancy will hereafter be referred to as redundancy by syntax equivalence4.
The fact that isomorphic grammars have different nonterminal sets implies that their sets of production rules are also different. Since the start symbol of a grammar is interpreted as a type, these grammars introduce the possibility of attaching different names to a single type definition. A careful analysis is therefore necessary to identify the scenarios where different types need to be defined. For this situation, a domain specification needs to be provided to ensure that the type definitions are unique.
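Isomorphism by nonterminal renaming, unlike general language equivalence, can be tested by a finite search. The sketch below tries every bijection between the nonterminal sets that maps start symbol to start symbol. It is exponential in the number of nonterminals and only makes the notion concrete. The rule encoding and function names are assumptions, and terminal and nonterminal vocabularies are assumed to be disjoint.

```python
from itertools import permutations
from typing import Dict, FrozenSet, Tuple

Rule = Tuple[str, Tuple[str, ...]]   # (lhs, rhs)

def rename(rules: FrozenSet[Rule], mapping: Dict[str, str]) -> FrozenSet[Rule]:
    """Apply a nonterminal renaming. Terminals are left untouched."""
    return frozenset(
        (mapping[lhs], tuple(mapping.get(s, s) for s in rhs)) for lhs, rhs in rules
    )

def isomorphic(rules1: FrozenSet[Rule], start1: str,
               rules2: FrozenSet[Rule], start2: str) -> bool:
    """Brute-force test for isomorphism by nonterminal renaming (small grammars)."""
    n1 = sorted({lhs for lhs, _ in rules1})
    n2 = sorted({lhs for lhs, _ in rules2})
    if len(n1) != len(n2) or len(rules1) != len(rules2):
        return False
    for image in permutations(n2):
        mapping = dict(zip(n1, image))
        if mapping[start1] == start2 and rename(rules1, mapping) == rules2:
            return True
    return False

G1 = frozenset({("expr", ("expr", "+", "term")), ("expr", ("term",)), ("term", ("1",))})
G2 = frozenset({("E", ("E", "+", "T")), ("E", ("T",)), ("T", ("1",))})
print(isomorphic(G1, "expr", G2, "E"))  # True: G2 only renames expr and term
```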
For arbitrary CFGs $G_1$ and $G_2$, it is undecidable [55] whether $L(G_1) = L(G_2)$. Therefore, there is no effective approach to identify redundancies by syntax equivalence. In other words, this form of grammar redundancy cannot effectively be eliminated by operations performed on the structure that supports the semantics capturing
4Isomorphic grammars produce equivalent abstract syntax trees for all words in the language they
generate, therefore the same semantics is always expressed by them. For this reason redundancy by
syntax equivalence is only identified if the grammars involved are isomorphic.
Hence
Two different implementation aspects will benefit from this normal form:
1. the semantics capturing of mathematical concepts, and
2. the composition of grammar fragments.
A normal form is needed for the grammatical structure used in the semantics capturing process in order to avoid definitions in which the arrangement of nonterminals on the right-hand side of the production rules hides the meaning of the concept to be captured. This problem is solved by adopting a set of templates that enforce the construction of production rules in a particular way, one in which the meaning of the mathematical concepts can be correctly captured. These templates are the smallest structural components allowed in the capturing of mathematical concepts by CFG rules. As restrictions on grammar rules, they establish that the capturing approach may need to decompose the abstract concepts. This is necessary to ensure that concept components are captured by grammar rules that follow the format defined by the templates.
The composition process, by which grammar fragments may be combined to produce other definitions, should be free of any information that is not necessary for the successful completion of the desired grammar arrangement. This means the composition process should not introduce definitions that carry redundant information.
The following sections discuss the possibility of capturing the meaning of abstract mathematical concepts by means of CFGs.
5.4
The idea of expressing mathematical concepts as language fragments is used here as an aid to capture the semantics of mathematical concepts. With this technique, the definitions of mathematical concepts that contribute to the definition of other concepts can be isolated and approached by grammar fragments. A composition process will later combine all necessary grammar fragments as a way of representing complex mathematical concepts.
Because it is composed of abstract concepts, mathematics needs to be encoded in order to be communicated. Conventional mathematical notation is the encoding usually used for communicating mathematics. Although this notation is used to support the discussion of the capturing procedure this thesis proposes, it is important to emphasize that mathematics is composed of abstract concepts. For this reason, encoding strategies are needed to support the manipulation of these concepts. For instance, a discussion involving a polynomial is simplified when this abstract concept is encoded according to the standard mathematical notation. Consider, for example, the following identity expression, which displays the terms of a polynomial on its right-hand side.
$k = abc + a^2b^2c^2 + \cdots + a^nb^nc^n$   (5.3)
This and_so_on
9i
term
other
juxtaposed
first
second
third
power
Consider the right-hand side of expression (5.3), where the polynomial is defined. One possible way to express it as grammar fragments is to consider each term of the polynomial as a word from the language $G = \{a^kb^kc^k \mid 2 \le k \le n\} \cup \{abc\}$. Table 5.8
92
polynomial
polyexpr
polyexpr
Table 5.9: CFG fragment for expressing addition_ellipsis and addition operations.
93
expression
leftside
rightside
Although G has been used to list all terms of the polynomial equation, its words may also be applied to represent other mathematical concepts. Consider, for instance, the field of formal languages. In this case $a^kb^kc^k$ is viewed as a string of characters. The semantics capturing process should therefore treat literal strings of characters as the syntactical structure to be processed. A string such
as a
5.5
9b
words
words
word
word
index
index
index
One possible grammar fragment to represent both subscripts and superscripts is provided in Table 5.11. The production rules associated with superscripts follow the rules for subscripts in order to ensure the correct precedence for both operators.
Consider, for instance, the representation of
A more complex example is provided below to illustrate the precedence characteristics of both superscripts and subscripts. The symbol = is used to represent the equivalence of the two forms of representation.
zq
(5.4)
5.5.1
Overloading Subscripts
The need for structuring mathematical notation around a set of domains is emphasized here. Consider the recurrence relation $a_{n+1} = 3a_n$, $n \ge 0$, $a_0 = 5$, which has $a_n = 5(3^n)$, $n \ge 0$, as its general solution. Also consider a two-dimensional matrix defined as follows:
(5.5)
A =
representation as the element of matrix A located in the first row and second column. Although these concepts share the same visual form, different interpretations are expected depending on the context in which they are presented. This context is interpreted as a domain or subfield and may be as general as, say, Discrete Mathematics or Linear Algebra. It may also be more specific, depending on the characteristics of the concepts involved. By letting $a_{12}$
necessary is supplied to determine its meaning uniquely. This form of structuring mathematics, by grouping knowledge into domains, will be used in this thesis as a mechanism to resolve ambiguities.
In the linear algebra domain, for instance, the syntax $a_{ij}$ represents the operation that establishes the link used to locate elements in matrix A. Similarly, $b_{100}$ is the 100th element of a one-dimensional matrix, and the interpretation associated with $b_{123}$ refers to one particular element in the structure. A one-to-one mapping between syntactical representation and element location is not possible if matrix B is two-dimensional. Two interpretations are then associated with the syntax $b_{123}$: the element located in row 1, column 23, or the one in row 12, column 3. This ambiguity could be resolved by introducing an operator that determines where the link between the dimensions of the structure takes place. For instance, bsub(12, 3) could be used to reference the element located in row 12, column 3 of matrix B.
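The ambiguity of a juxtaposed subscript, and its resolution by an explicit two-argument operator, can be sketched as follows. The function names are hypothetical, `bsub` merely mirrors the notation bsub(12, 3) above, and 1-based row and column indices are assumed.

```python
from typing import List, Sequence, Tuple

def readings(subscript: str) -> List[Tuple[int, int]]:
    """All (row, column) pairs a juxtaposed subscript such as '123' can denote
    when the matrix is two-dimensional: one per split of the digit string."""
    return [(int(subscript[:k]), int(subscript[k:])) for k in range(1, len(subscript))]

def bsub(matrix: Sequence[Sequence[int]], row: int, col: int) -> int:
    """Explicit indexing operator: the split between dimensions is given,
    not guessed. Rows and columns are 1-based."""
    return matrix[row - 1][col - 1]

print(readings("123"))  # [(1, 23), (12, 3)]
```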
5.5.2
$|f| = f^+ + f^-$   (5.6)
and
$f = f^+ - f^-$   (5.7)
function_parts → positive_part negative_part
positive_part → function_id SUP +
negative_part → function_id SUP -
function_id → IDENTIFIER
(5.6), indicates that this symbol is used to represent two operations, each instance aiming at the representation of a different concept. The superscripted instance denotes the unary postfix operation of taking the positive part of a function, whereas the binary infix instance represents addition. Table 5.12 illustrates a possible grammar fragment to represent both the positive and negative parts of a function.
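A minimal sketch of how a processor might tell the two instances of + apart, assuming superscripts are tokenized as `^` followed by the superscripted symbol (an encoding chosen here for illustration, not the token set of Table 5.12). It also checks identities (5.6) and (5.7) pointwise with the usual definitions $f^+ = \max(f, 0)$ and $f^- = \max(-f, 0)$.

```python
from typing import List, Tuple

def classify_plus(tokens: List[str]) -> List[Tuple[int, str]]:
    """Label each '+' as the postfix positive-part operator (when it is the
    superscript, i.e. follows '^') or as infix addition otherwise."""
    return [
        (i, "positive_part" if i > 0 and tokens[i - 1] == "^" else "addition")
        for i, tok in enumerate(tokens)
        if tok == "+"
    ]

def positive_part(v: float) -> float:
    return max(v, 0.0)

def negative_part(v: float) -> float:
    return max(-v, 0.0)

# Tokens of |f| = f^+ + f^- under the assumed '^' encoding of superscripts.
tokens = ["|", "f", "|", "=", "f", "^", "+", "+", "f", "^", "-"]
print(classify_plus(tokens))  # [(6, 'positive_part'), (7, 'addition')]
```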
5.6
Representing Matrices
fragment illustrated in Table 5.13 presents the rules necessary for the definition of matrices.
Qc
matrixrule
dimlist
dimlist
elist
elist
el
size
n (;;)=U)
The syntax enforced by the rules provided in Table 5.13 is illustrated below:
Matrix{2 : 2 (3, 1, 0, 3)}   Matrix{2 : (
1
2 1
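Reading the first construct as `Matrix{rows : cols (elements)}`, with elements listed row by row, is an assumption, because the rule bodies of Table 5.13 are not recoverable here. Under that reading, a small Python parser could look like this. The function name `parse_matrix` is hypothetical.

```python
import re
from typing import List

def parse_matrix(text: str) -> List[List[int]]:
    """Parse 'Matrix{rows : cols (e1, e2, ...)}' with elements in row-major order.
    The field order and the row-major reading are assumptions."""
    m = re.fullmatch(r"\s*Matrix\s*\{\s*(\d+)\s*:\s*(\d+)\s*\(([^)]*)\)\s*\}\s*", text)
    if m is None:
        raise ValueError("not a Matrix{rows : cols (elements)} construct")
    rows, cols = int(m.group(1)), int(m.group(2))
    elems = [int(e) for e in m.group(3).split(",")]
    if len(elems) != rows * cols:
        raise ValueError(f"expected {rows * cols} elements, got {len(elems)}")
    return [elems[r * cols:(r + 1) * cols] for r in range(rows)]

print(parse_matrix("Matrix{2 : 2 (3, 1, 0, 3)}"))  # [[3, 1], [0, 3]]
```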
5.7
The representation of sets of numbers as intervals is frequently used in algebra. For example, $(a, b) = \{x \mid a < x < b\}$ and $[a, b] = \{x \mid a \le x \le b\}$. In this form of representing numbers, the delimiters do not always match. We illustrate this by the two expressions that follow.
$[a, b) = \{x \mid a \le x < b\}$
$(a, b] = \{x \mid a < x \le b\}$
9d
1
2
3
4
5
6
7
8
interval_var
values
left_value
right_value
left_delimiter
left_delimiter
right_delimiter
right_delimiter
interval → open_intervals
interval → closed_intervals
open_intervals → open_part RIGHT_OPEN_PAR
open_intervals → open_part RIGHT_CLOSED_DEL
closed_intervals → closed_part RIGHT_CLOSED_DEL
closed_intervals → closed_part RIGHT_CLOSED_DEL
open_part → LEFT_OPEN_PAR body
closed_part → LEFT_CLOSED_DEL body
body → left_value COMMA right_value
left_value → IDENTIFIER
right_value → IDENTIFIER

COMMA
RIGHT_CLOSED_PAR   )
RIGHT_CLOSED_DEL   ]
LEFT_OPEN_PAR      (
LEFT_OPEN_DEL      [
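The effect of mismatched delimiters on meaning can be sketched as a membership test. The sketch assumes numeric endpoints and the four delimiter combinations shown above, does not follow the token names of the grammar, and its function name is hypothetical.

```python
from typing import Callable

def parse_interval(text: str) -> Callable[[float], bool]:
    """Turn '(a, b)', '[a, b]', '[a, b)' or '(a, b]' into a membership test.
    Endpoints are assumed to be numeric literals."""
    text = text.strip()
    left, right = text[0], text[-1]
    if left not in "([" or right not in ")]":
        raise ValueError(f"bad delimiters in {text!r}")
    a, b = (float(v) for v in text[1:-1].split(","))

    def member(x: float) -> bool:
        lower_ok = a <= x if left == "[" else a < x     # '[' closed, '(' open
        upper_ok = x <= b if right == "]" else x < b    # ']' closed, ')' open
        return lower_ok and upper_ok

    return member

half_open = parse_interval("[0, 1)")
print(half_open(0), half_open(1))  # True False
```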
5.8
Representing Sums
The concept of summation is discussed in this section. Both the ambiguity and the extensibility problems associated with this operation are illustrated by examining its semantic characteristics.
Consider the sum represented by the expression below.
$21 = \sum_{i=1}^{6} i$   (5.9)
One
Sum{range_list; expression}
dering is
the iterative
component of the sum construct, as illustrated in Equation (5.9). However, the idea of range is more meaningful when this expression is represented as
$21 = \sum_{1 \le i \le 6} i$   (5.11)
9i
1
2
3
4
5
6
7
identity_expr
sample_expr
sample_expr
sum
range_list
start
identifier
The grammar fragment illustrated in Table 5.16 allows summation constructs, such as the one in Equation (5.9), to be described by the syntax that follows.
Sum{i = 1,6; i}   (5.12)
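A minimal evaluator for the construct in (5.12) follows. It assumes the form `Sum{var = start,end; body}` with an inclusive range and a body restricted to +, - and * over integers and the index variable. Python's `ast` module is used only to parse the body. The names `eval_sum` and `_eval` are hypothetical.

```python
import ast
import re

def _eval(node: ast.AST, env: dict) -> int:
    """Evaluate a tiny arithmetic subset: +, -, *, names and integers."""
    if isinstance(node, ast.Expression):
        return _eval(node.body, env)
    if isinstance(node, ast.Constant) and isinstance(node.value, int):
        return node.value
    if isinstance(node, ast.Name):
        return env[node.id]
    if isinstance(node, ast.BinOp):
        x, y = _eval(node.left, env), _eval(node.right, env)
        if isinstance(node.op, ast.Add):
            return x + y
        if isinstance(node.op, ast.Sub):
            return x - y
        if isinstance(node.op, ast.Mult):
            return x * y
    raise ValueError(f"unsupported expression: {ast.dump(node)}")

def eval_sum(text: str) -> int:
    """Evaluate 'Sum{i = start,end; body}' by iterating i from start to end."""
    m = re.fullmatch(r"\s*Sum\s*\{\s*(\w+)\s*=\s*(-?\d+)\s*,\s*(-?\d+)\s*;(.+)\}\s*", text)
    if m is None:
        raise ValueError("not a Sum{i = start,end; body} construct")
    var, start, end = m.group(1), int(m.group(2)), int(m.group(3))
    tree = ast.parse(m.group(4).strip(), mode="eval")
    return sum(_eval(tree, {var: k}) for k in range(start, end + 1))

print(eval_sum("Sum{i = 1,6; i}"))  # 21, matching Equation (5.9)
```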
There are situations where more complex iteration control is required, and the necessary summation condition is sometimes expressed as a compound statement. The ex-
*=
Yi
i m /2
j=0
i+ j= n
i+j
(5-13)
In Equation (5.13) the inner summation includes a compound statement. The iteration mechanism is extended to support the compound condition, which makes use of a syntactically hidden conjunction to define the lower limit of the iteration. The meaning associated with the = symbol is not the same in its two occurrences in the conjoined condition. An ambiguity was introduced by the additional semantics attached to the = symbol as the result of an extension procedure.
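The two roles of = in the compound condition can be sketched as follows. The first occurrence binds the iteration variable (in the spirit of single_start → identifier = expr). The second is an identity test that filters values (in the spirit of compound_start → single_start , identity_expr). The upper limit `end`, the function name, and the single accepted pattern are assumptions, since the limits of Equation (5.13) are not recoverable here.

```python
import re
from typing import Dict, List

def iterate_compound(start: str, end: int, env: Dict[str, int]) -> List[int]:
    """Values of the bound variable for a compound start such as 'j=0, i+j=n'.
    Only the pattern  var=<int>, <name>+<name>=<name>  is handled."""
    m = re.fullmatch(r"\s*(\w+)\s*=\s*(\d+)\s*,\s*(\w+)\s*\+\s*(\w+)\s*=\s*(\w+)\s*", start)
    if m is None:
        raise ValueError("unsupported compound start")
    var, lo, a, b, rhs = m.group(1), int(m.group(2)), m.group(3), m.group(4), m.group(5)
    values = []
    for k in range(lo, end + 1):                 # first '=': assignment
        scope = {**env, var: k}
        if scope[a] + scope[b] == scope[rhs]:    # second '=': equality test
            values.append(k)
    return values

print(iterate_compound("j=0, i+j=n", end=10, env={"i": 2, "n": 5}))  # [3]
```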
1    identity_expr → sample_expr EQ sample_expr
2    sample_expr → expr
3    sample_expr → sum
4    sum → SUM { range_list : sample_expr }
5    range_list → start , end
5.1  start → single_start
5.2  start → compound_start
6    single_start → identifier = expr
6.1  compound_start → single_start , identity_expr
7    identifier → ID
The grammars proposed for capturing the semantics of the summation concept illustrate the need for a composition process to support the extension of already defined constructs by reusing existing grammar fragments. Chapter 6 discusses the grammar extension problem and provides a solution in terms of grammar operations.
5.9
Conclusion
This chapter introduced the notion of using CFGs as the major formalism to capture the semantics of mathematical concepts. It discussed the advantages and limitations of using CFGs to support the dynamics of authoring mathematics.
The syntax of programming languages is usually specified by means of CFGs [95]. Structuring mathematical notation as a programming language has the advantage of allowing CFGs to be used for its specification and processing. Specification is supported by the CFG's structuring methods, which include composition, choice, repetition, and recursion [95]. Effective and efficient parsing algorithms and tools are available to support its processing.
Although CFGs have successfully been used for the specification of the syntax of programming languages, this formalism is not adequate for the definition of the semantics of programming languages [100]. Another important limitation of this formalism relates to its static character, which restricts its use to organizations that do not depend on the notion of update.
Chapter 6
Modelling Context Dependent
Information
The notion of using CFGs to support the semantics capturing of mathematical concepts was introduced in Chapter 5. This chapter proposes the fundamentals of a document organization that models the dynamics of authoring mathematics. The model supports both the extensibility and the ambiguity characteristics of mathematical notation and is capable of capturing the meaning of mathematical concepts by means of syntax defined during authoring.
6.1
In summary, this language provides a mechanism to aid the authoring of mathematical concepts that are being captured by the grammars from the set. It also provides, by means of the scope, a dynamic way to cope with syntactical ambiguities.
Intermediate representations of documents are generated as a result of the interaction between the author and the PNS. These hierarchical intermediate representations provide the information that the rendering structure will manipulate in order to generate different views of a document. The set of document views
Figure 6.1: Structure to support dynamic authoring and multimodality processing (modules PNS, HIR and RS)
The interaction between the three modules is illustrated in Figure 6.1. The arrow/function pair is used to represent how information is processed. The interpretation associated with this form of representation is described as follows.
Function f() represents the service provided by PNS to its only client, HIR, which involves the creation of an intermediate document representation. The set of functions $h_1(), \ldots, h_k()$ represents the set of services that RS provides. These services are based on the knowledge stored in HIR, which is shared with RS through g(). They are mechanisms to produce different views of the encoded document obtained from HIR. The views, represented by the boxes labeled
Modules
PNS
HIR
RS
Components
a
s
1
m
g
i
P
r
d
e
The provision of all grammars necessary for supporting the dynamic authoring process is therefore the result of actions taken by the author that involve the structure defined by the PNS. The various document applications d may be obtained from the intermediate representation i by the application specialist s. For each application, knowledge of the specific semantics p as well as rendering mechanisms r are necessary.
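The flow of Figure 6.1 can be sketched with plain functions. Here f is the PNS service that builds the intermediate representation, g exposes HIR's knowledge to RS, and each entry of the view table plays the role of one $h_i$. Everything inside the functions, including the `HIR` class and the two view names, is a hypothetical placeholder rather than the thesis's design.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

@dataclass
class HIR:
    """Hypothetical hierarchical intermediate representation (a token list here)."""
    nodes: List[str] = field(default_factory=list)

def f(author_input: List[str]) -> HIR:
    """PNS service to its only client, HIR: create the intermediate representation."""
    return HIR(nodes=list(author_input))

def g(hir: HIR) -> List[str]:
    """Share the knowledge stored in HIR with RS."""
    return list(hir.nodes)

# RS services h_1, ..., h_k: each renders a different view of the same document.
VIEWS: Dict[str, Callable[[List[str]], str]] = {
    "linear": lambda nodes: " ".join(nodes),
    "compact": lambda nodes: "".join(nodes),
}

hir = f(["a", "+", "b"])
views = {name: h(g(hir)) for name, h in VIEWS.items()}
print(views)  # {'linear': 'a + b', 'compact': 'a+b'}
```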
6.2
which can be followed during the creation of a single document. This idea is now extended by the notion that a document may be considered as the result of a set of modifications applied to other documents. This property, as well as an exception to this notion, is discussed next.
The proposed document structure is based on the assumption that any document in its final version is the result of a composition process in which intermediate versions of the document are produced. As the author's ideas evolve and new concepts need to be included, different versions of the document are generated. These versions can be interpreted as blueprints of the author's capacity to communicate ideas and concepts.
Three important stages related to the versions of a document produced during the authoring process are identified here. The first is the one in which the author makes use of any available concept definitions. Documents created during this stage are called
process is over and the outcome is the final document. In general, many different versions of a document are created before the final one is produced. This leads to the third stage, the intermediate one, in which all intermediate versions of a document are created. In a case where only one document version is produced during the complete authoring process, the default, final and intermediate versions are the same.
At any instant during the authoring process, the structure required to support the creation of a particular version of a document is the result of a process involving a set of grammars. Each isolated grammar contributes to the capture and representation of at least one mathematical concept and has been included in the document's supporting structure by one of the following three approaches. Each grammar fragment either
1. has been created by standard editing procedures, or
2. has already been defined, or
3. has resulted from grammar operations.
The structure proposed in this chapter organizes grammars into directories, and the composition of a directory includes definitions which have been created by any of
mathematics.
6.2.1
(6.1)
(6.2)
structure as defined in
Uc ;
(6-3)
t=l
(6.4)
The grammars in $N_i$ are grammars created by standard editing mechanisms. The grammars in $F_i$ are grammars that have already been created. They are ready to be used and satisfy the following condition:
=
F?
if j = 0, i = 1
c u u u #* ifz>i
*= 1
k=
(6'5)
The third set, $C_i$, collects all grammars that are introduced by the two binary operations in the set B {%,
as follows:
$C_i = \{\, h \mathbin{P} h' \mid h, h' \in (F_i \cup N_i) \wedge P \in B \,\}$   (6.6)
6.3
the standard notation, it is often encoded as a single lowercase letter in boldface type. The representation of this concept may, for instance, be viewed as either a prefixed or a postfixed expression with no operands.
A normal form for CFGs is, therefore, proposed as a way of structuring grammars to support the expression formats discussed. The terminal symbols of the proposed structure are used for the representation of the operator's name, and the nonterminals are used for the representation of the operands and necessary delimiters. This grammar structure also provides the mechanism needed to support recursive definitions, which are required to capture the repetitive occurrences of certain types of operators in expressions.
> a a
(2)
->
aa
(3)
->
AaA
(4)
->
Proof: This result follows from th e super-norm al-form theorem in [71, 106].
Each production rule of the ENF may be interpreted as an atomic grammar fragment. To achieve this, assume that each of the four kinds of rules proposed by the ENF defines a CFG.
In Section 5.3 the correspondence between CFG and type was proposed. This indicates that the definition of a type will be a function of the number and structure of the grammar's rules. For any given CFG rule, the combination of terminals and nonterminals determines the type of the rule. Rules may therefore be organized, according to the number of terminals and nonterminals, as structured and
7G being in ENF means that G is an interpretation of a 2-symbol CFG form [106] with rules only of the types listed.
non-structured. Non-structured rules define primitive types such as integer, real and character. Rules that cannot be associated with any type are also considered non-structured. Structured rules define types, and the meaning of a type may depend on information provided by other rules.
The following definitions impose restrictions on grammars in ENF as a way to classify these grammars as structured or non-structured. The resulting grammar fragments are the building blocks used in the semantics capturing process.
production.
Operatorless grammars are used to introduce specializations. That is, a concept associated with S is specialized to B. Any instantiation of B is therefore an instantiation of S.
Definition 4 A CFG $G = (N, T, P, S)$ in ENF is called a primitive grammar if $N = \{S\}$, $T = \{a\}$ and $P = \{S \to a\}$.
production.
Primitive grammars introduce atomic types. That is, the type assigned to the nonterminal does not depend on the type associated with any other nonterminal.
Definition 5 A CFG $G = (N, T, P, S)$ in ENF is called a basic grammar if, for some $\alpha \in (N \cup T)^+$, its set of rules is $P = \{S \to \alpha\}$. The rule $S \to \alpha$ is called a basic
Definition 6 All CFGs in ENF which are neither operatorless, primitive nor basic are called derived grammars. Derived grammars which have no operatorless productions are called reduced grammars.
The following example illustrates the notion of basic grammar. Consider, for instance, the grammars
$G_1 = (N_1, T_1, P_1, S_1)$ with $N_1 = \{S_1, B, C\}$, $T_1 = \{a\}$, $P_1 = \{S_1 \to aBC\}$, and
$G_2 = (N_1, T_1, P_2, S_1)$ with $P_2 = \{S_1 \to aBCa\}$.
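Definitions 4 to 6 can be turned into a small classifier. The 4-tuple encoding follows $G = (N, T, P, S)$. The form assumed for an operatorless grammar (a single rule $S \to B$ with $B \in N$) is inferred from the remark that such grammars specialize S to B and is therefore an assumption. Under these assumptions both $G_1$ and $G_2$ classify as basic.

```python
from typing import FrozenSet, NamedTuple, Tuple

Rule = Tuple[str, Tuple[str, ...]]   # (lhs, rhs)

class CFG(NamedTuple):
    N: FrozenSet[str]
    T: FrozenSet[str]
    P: FrozenSet[Rule]
    S: str

def is_operatorless_rule(g: CFG, rule: Rule) -> bool:
    """Assumed form of an operatorless production: A -> B with B a nonterminal."""
    _, rhs = rule
    return len(rhs) == 1 and rhs[0] in g.N

def classify(g: CFG) -> str:
    rules = sorted(g.P)
    if len(rules) == 1:
        lhs, rhs = rules[0]
        if g.N == {g.S} and len(g.T) == 1 and rhs == tuple(g.T):
            return "primitive"       # Definition 4: N = {S}, T = {a}, P = {S -> a}
        if lhs == g.S and is_operatorless_rule(g, rules[0]):
            return "operatorless"    # assumed: P = {S -> B}
        if lhs == g.S:
            return "basic"           # Definition 5: P = {S -> alpha}
    if any(is_operatorless_rule(g, r) for r in rules):
        return "derived"             # Definition 6
    return "reduced"                 # Definition 6: derived, no operatorless rule

G1 = CFG(frozenset({"S1", "B", "C"}), frozenset({"a"}),
         frozenset({("S1", ("a", "B", "C"))}), "S1")
G2 = G1._replace(P=frozenset({("S1", ("a", "B", "C", "a"))}))
print(classify(G1), classify(G2))  # basic basic
```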
6.3.1
catEquality
expr → expr EQUALS term_cat
expr → term_cat
term_cat → term_cat CONCATENATION term
term_cat → term
term → term POWER factor
term → term_string
term_string → STRING
factor → INTEGER
The operations and operands involved in this expression are described in the grammar represented by the set of production rules defined in Table 6.2. The hierarchy imposed by the rules of grammar catEquality9 establishes the following seven dependency relations:
EQUALS <= CONCATENATION
EQUALS <= POWER
EQUALS <= STRING
CONCATENATION <= STRING
CONCATENATION <= POWER
POWER <= STRING
POWER <= INTEGER
9It has been assumed that STRING is defined by the regular expression [a-j]" and INTEGER
is a nonzero positive integer.
The above relations determine the dependencies which exist among the mathematical concepts that have been used in the definition of another concept. In this thesis they will be called terminal dependencies¹⁰ because of the one-to-one association between the name of a mathematical concept and the terminal symbol which represents it in the grammar that captures the meaning of the concept.
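The following Python sketch shows one way to read the seven relations off the rules of catEquality. It follows a reading that agrees with the list above but is not the formal definition given later in this section: start from the operand nonterminals of each rule that contains an operator terminal, follow operatorless productions, and record the first terminal met on each path. The encoding of the grammar and the convention that upper-case names are terminals are assumptions of the sketch.

```python
from collections import defaultdict

# Table 6.2 (grammar catEquality) as (lhs, rhs) pairs.
# Assumption of this sketch: upper-case names are terminals.
CAT_EQUALITY = [
    ("expr", ["expr", "EQUALS", "term_cat"]),
    ("expr", ["term_cat"]),
    ("term_cat", ["term_cat", "CONCATENATION", "term"]),
    ("term_cat", ["term"]),
    ("term", ["term", "POWER", "factor"]),
    ("term", ["term_string"]),
    ("term_string", ["STRING"]),
    ("factor", ["INTEGER"]),
]


def is_terminal(symbol):
    return symbol.isupper()


def terminal_dependencies(rules):
    """Map each operator terminal to the terminals its operands depend on."""
    bodies = defaultdict(list)
    for lhs, rhs in rules:
        bodies[lhs].append(rhs)

    deps = {}
    for lhs, rhs in rules:
        terminals = [s for s in rhs if is_terminal(s)]
        operands = [s for s in rhs if not is_terminal(s)]
        if not terminals or not operands:
            continue  # operatorless or basic production: no operator here
        op = terminals[0]  # ENF allows at most one terminal per rule
        found, seen, stack = set(), set(), list(operands)
        while stack:
            nt = stack.pop()
            if nt in seen:
                continue
            seen.add(nt)
            for body in bodies[nt]:
                body_terminals = [s for s in body if is_terminal(s)]
                if not body_terminals:
                    stack.extend(body)  # operatorless rule: follow the chain
                elif body_terminals[0] != op:
                    found.add(body_terminals[0])  # stop at the first terminal
                # a rule for the same operator is its own recursion: skip it
        deps[op] = found
    return deps


if __name__ == "__main__":
    for op, ds in terminal_dependencies(CAT_EQUALITY).items():
        print(op, "<=", sorted(ds))
```

Running it prints EQUALS <= ['CONCATENATION', 'POWER', 'STRING'], CONCATENATION <= ['POWER', 'STRING'] and POWER <= ['INTEGER', 'STRING'], which are the seven relations above. INTEGER does not appear under EQUALS because it is only reachable through the POWER rule.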
Grammar for representation schemes (Table 6.3):
    scheme  → ID { dexpr }
    dexpr   → dexpr rest
    dexpr   → ID
    rest    → <= det
    det     → ID
    det     → ( dlist ) more
    det     → dexpr
    dlist   → first others
    first   → ID
    others  → , object
    object  → ID
    object  → dlist
    more    → ε
    more    → ; dexpr
The grammar in Table 6.3 provides the syntax which will be followed in this thesis to represent terminal dependency relations. Each word in the language generated by this grammar is called a representation scheme.
Since representation schemes are always related to grammars, they will be identified by the grammar's name followed by the literal string Scheme. The expression which follows states that catEqualityScheme is the representation scheme for the grammar defined in Table 6.2.
    catEqualityScheme {Equals <= (Concatenation, Power, String);
                       Concatenation <= (String, Power);
                       Power <= (Integer, String)}
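A representation scheme can be checked against the grammar of Table 6.3 with a small recursive-descent parser. In the Python sketch below, the left-recursive rule dexpr → dexpr rest becomes a loop, det → ID is folded into det → dexpr (since dexpr → ID), and each dependency is collected into a dictionary. The tokenizer, the class name SchemeParser and the dictionary output are assumptions made for illustration.

```python
import re

# Tokens of the Table 6.3 language: "<=", punctuation, identifiers.
TOKEN = re.compile(r"\s*(<=|[{}(),;]|\w+)")


def tokenize(text):
    tokens, pos, text = [], 0, text.rstrip()
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected character at offset {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


class SchemeParser:
    def __init__(self, text):
        self.toks, self.i, self.deps = tokenize(text), 0, {}

    def peek(self):
        return self.toks[self.i] if self.i < len(self.toks) else None

    def expect(self, token=None):
        current = self.peek()
        ok = current is not None and (
            current == token if token is not None
            else re.fullmatch(r"\w+", current) is not None)
        if not ok:
            raise SyntaxError(f"expected {token or 'an identifier'}, got {current!r}")
        self.i += 1
        return current

    def scheme(self):
        # scheme -> ID { dexpr }
        name = self.expect()
        self.expect("{")
        self.dexpr()
        self.expect("}")
        if self.peek() is not None:
            raise SyntaxError("trailing input after '}'")
        return name, self.deps

    def dexpr(self):
        # dexpr -> ID | dexpr rest, rest -> <= det (left recursion as a loop)
        head = self.expect()
        while self.peek() == "<=":
            self.i += 1
            self.det(head)
        return head

    def det(self, head):
        if self.peek() == "(":
            # det -> ( dlist ) more; dlist holds at least two identifiers
            self.i += 1
            items = [self.expect()]
            self.expect(",")
            items.append(self.expect())
            while self.peek() == ",":
                self.i += 1
                items.append(self.expect())
            self.expect(")")
            self.deps.setdefault(head, []).extend(items)
            if self.peek() == ";":  # more -> ; dexpr
                self.i += 1
                self.dexpr()
        else:
            # det -> dexpr: head depends on the next identifier in the chain
            self.deps.setdefault(head, []).append(self.dexpr())
```

Parsing the catEqualityScheme text above returns the name catEqualityScheme together with {'Equals': ['Concatenation', 'Power', 'String'], 'Concatenation': ['String', 'Power'], 'Power': ['Integer', 'String']}; a single-element list such as (B) is rejected, because dlist requires at least two objects.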
10Although the formal definition of terminal dependency is provided at the end of this section,
these dependencies can easily be identified whenever the related grammar is expressed as a reduced
grammar.
11
fragments which can be tested for nonterminal relationships. The existence of relationships among nonterminals in different grammar fragments leads to the notion of grammar dependency. These ideas are formally presented by the following definitions.
Grammar fragments: E1, E2 and E3 (for expr); E4 and E5 (for term); E6 (for factor).
11 Since the start symbol of a grammar is interpreted as a type, the decomposition of a grammar
as a set of grammar fragments is viewed here as a type decomposition.
    E1 <= (E2, E3, E4, E5)            (6.7)
    E2 <= (E1, E3, E4, E5, E6)        (6.8)
    E4 <= (E5, E6)                    (6.9)
As can be seen, commas and both opening and closing parentheses have been used for the representation of dependencies.
6.4
This section introduces two operations. Both take grammars as input parameters and return a grammar. Input grammars are seen as providers of both syntax and semantics, and they are never modified by the operations. The output produced is the result of combining the production rules supplied by the input. Since the application of either of the two proposed operations produces a single grammar, the creation of more complex grammar definitions may be seen as the result of a sequence of operations, each of which uses the information obtained from previous operations. Therefore the creation of the final grammar may be viewed as the result of a process in which grammar fragments have been inserted and/or deleted.
Both operations are defined for input grammars in ENF. This requirement guarantees that the output grammar is also in ENF. In this thesis these operations are the means by which grammars are combined in order to support the extensibility of the mathematical notation.
The use of CFGs as a supporting organization to capture the meaning of mathematical concepts, as previously proposed in this work, is restricted to document structures which can only be modified by editing mechanisms. This limitation was discussed in Section 5.3, where the correspondence between CFG and type was presented.
The need either to overload a given symbol by attaching a different meaning to it, or to introduce a new syntactical representation for a mathematical concept, may be viewed as a set of modifications to be executed on grammars which have already been defined. Another approach to this need is to generate grammars that support these requirements by reusing, whenever possible, the available grammars. The notion of grammar reuse as defined by the two operations proposed here is considered one of the fundamental mechanisms¹²
capturing necessity. For this reason neither operation modifies grammars which have already been created. Instead, they support the semantics capturing activity by allowing grammars to be created by reusing information provided by other grammars.
12 Another important mechanism is the notion of context switching or scope. This notion is introduced in this chapter to support symbol overloading.
The following definitions introduce these operations:
Definition 10 Let $G_b = (N_b, T_b, P_b, S_b)$ and $G_o = (N_o, T_o, P_o, S_o)$ be two CFGs in ENF. The composition operation $G_b \circ G_o$ will produce a CFG $G_c = (N_c, T_c, P_c, S_c)$ as follows:
$$P_c = P_b \cup P_o, \qquad N_c = N_b \cup N_o, \qquad T_c = T_b \cup T_o, \qquad S_c = S_b.$$
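Definition 10 translates almost directly into code. The Python sketch below assumes that grammars are immutable values whose productions are (lhs, rhs) pairs, and it implements only the composition operation; keeping the inputs frozen mirrors the requirement that input grammars are never modified. The fragments equality and concat are hypothetical, loosely modelled on the first rules of catEquality in Table 6.2.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CFG:
    """G = (N, T, P, S); each production is a (lhs, rhs-tuple) pair."""
    nonterminals: frozenset
    terminals: frozenset
    productions: frozenset
    start: str


def compose(base: CFG, other: CFG) -> CFG:
    """G_b o G_o (Definition 10): union of N, T and P, start symbol of G_b."""
    return CFG(
        nonterminals=base.nonterminals | other.nonterminals,
        terminals=base.terminals | other.terminals,
        productions=base.productions | other.productions,
        start=base.start,
    )


def rule(lhs, *rhs):
    """Build one production lhs -> rhs."""
    return (lhs, tuple(rhs))


# Hypothetical fragments shaped like the first rules of catEquality.
equality = CFG(frozenset({"expr", "term_cat"}), frozenset({"EQUALS"}),
               frozenset({rule("expr", "expr", "EQUALS", "term_cat"),
                          rule("expr", "term_cat")}), "expr")
concat = CFG(frozenset({"term_cat", "term"}), frozenset({"CONCATENATION"}),
             frozenset({rule("term_cat", "term_cat", "CONCATENATION", "term"),
                        rule("term_cat", "term")}), "term_cat")

combined = compose(equality, concat)
print(combined.start, len(combined.productions))  # expr 4
```

Because the operation is not commutative in its start symbol, concat o equality would yield the same rule set but start at term_cat instead of expr.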
Tb U T 0
Sx = Sb
expr
Table 6.5 displays the basic grammar $G_2$ which captures the semantics of expressions
6 . 6
and
expr
Table
term
term
num
Table 6.7: Operatorless grammar linking term and num nonterminals.
Table
6 . 8
respectively.
G^
o Gg o G }Gr i .
} ? 2 4
G has th e nam e G
4
2 4
o Gg
2 4
2 4
and G
num
Table
N U M B ER
24
expr
expr
i 3
and $G_{r2}$ are displayed in Table 6.14 and Table 6.15 respectively. A simple grammar fragment to deal with the usage of both the extension and the composition operations is presented in Table 6.16. According to this grammar, the result of the binary operation(s) may or may not be saved as a new grammar. This is a consequence of the fact that the nonterminal new_class may be replaced either by the terminal IDENTIFIER or by the empty string ε. Therefore, whenever the variable new_class is replaced by the empty string, the result of the binary operation(s) will not be remembered. Although there is no means of reusing the result produced, the procedure does generate a grammar. In this thesis, this grammar is called an implied grammar.
The notion of implied grammar introduces the possibility of defining domains without adding grammars to the domain directory. Such domains exist only at run time and are called implied domains.
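The difference between a named result and an implied grammar can be sketched as follows. The Python code below assumes a minimal encoding (a grammar is a start symbol plus a set of productions, and a domain directory is a dictionary of named grammars) and covers only chains of compositions. The function run_statement and the name t_equal are hypothetical; g1 and g4 merely reuse the rule shapes of the Chapter 7 examples.

```python
from functools import reduce

# Assumed encoding: a grammar is (start symbol, frozenset of (lhs, rhs) rules)
# and a domain directory is a dict from grammar names to grammars.


def compose(base, other):
    """Composition in the sense of Definition 10: union of rules, base start."""
    return (base[0], base[1] | other[1])


def run_statement(directory, class_names, new_class=None):
    """Evaluate '{ c1 o c2 o ... } new_class' in the style of Table 6.16.

    With new_class=None (the empty alternative) the result is an implied
    grammar: it is returned to the caller but never stored in the directory.
    """
    result = reduce(compose, (directory[name] for name in class_names))
    if new_class is not None:
        directory[new_class] = result
    return result


directory = {
    "g1": ("ee", frozenset({("ee", ("ee", "EQ", "te")), ("ee", ("te",))})),
    "g4": ("ep", frozenset({("ep", ("ep", "PLUS", "tp")), ("ep", ("tp",))})),
}
implied = run_statement(directory, ["g1", "g4"])           # implied grammar
named = run_statement(directory, ["g1", "g4"], "t_equal")  # stored as t_equal
```

Both calls build the same grammar; only the second one leaves a trace in the directory, which is exactly what distinguishes a named grammar from an implied one.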
6.5
expr
expr
term
num
Gri
Gi
Table
term
.
1 1
way of generating other grammar fragments. The set of grammar fragments is, in this way, updated incrementally by module reuse. As described so far, the solution supports extensibility only from a restricted point of view, since it does not consider the multi-domain aspect of mathematics. Instead, it assumes that all concepts to be represented belong to a single domain.
The proposed approach allows grammar fragments to be considered as both open and closed concepts. The fact that they may be used to represent unique information, which may be stored as components of a library and used by clients of the library, characterizes them as closed concepts. On the other hand, the same fragments may contain information which may be used for the creation of a new grammar fragment by means of the two binary operations. For this reason they may also be considered open concepts. This interpretation is consistent with the notion of object-oriented class as provided by [72]. According to this interpretation, CFGs correspond to classes. Therefore, for a given CFG, say $G = (N, T, P, S)$, the words in $L(G)$ will informally
with $G$.¹³ Operatorless grammars and CFGs which have only operatorless productions are an exception to this, because they have no means to express any concrete objects and therefore cannot generate mathematical expressions.
13This association is loose because some fundamental characteristics of classes cannot be expressed
as grammar operations. Consider, for instance, the notion of subclass. This concept does not always
correspond to grammars which result from either the extension or the composition operations.
    term → factor
Table 6.12: Operatorless grammar linking term and factor nonterminals.
G5:  factor → num
Table 6.13: Operatorless grammar linking factor and num nonterminals.
6.5.1
The approach presented in the previous sections does not properly address document organizations containing symbol arrangements that express concepts belonging to more than a single mathematical field. In order to extend the proposed process, a relation between symbol overloading¹⁴ and domain/directory needs to be established. The solution proposed in this subsection approaches symbol overloading by means of a real-time update¹⁵ process. This process is the mechanism by which the structure of a document adapts in order to cope with the representation ambiguities introduced as the result of overloading.
For any given directory, the solution determines that overloading is resolved by means of a dynamic directory change. This implies that the meaning of a symbol arrangement is a function of the directory in which it is defined. The dynamic characteristic is required to support user-defined syntax being introduced during authoring.
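As a simplified illustration of a meaning that depends on the directory, the Python sketch below binds the + symbol to a different operation in each of three hypothetical directories, named after the concepts used in the examples of Section 6.6.1. It is a stand-in under strong assumptions: no language processor is generated from a grammar and operands are plain tokens, so switching the directory is the only thing that switches the interpretation.

```python
from functools import reduce

# Each hypothetical directory fixes one meaning for "+":
# (operand reader, binary operation, result printer).
DIRECTORIES = {
    "Addition": (int, lambda a, b: a + b, str),
    "Concatenation": (str, lambda a, b: a + b, str),
    "Disjunction": (lambda s: s == "1", lambda a, b: a or b,
                    lambda v: "true" if v else "false"),
}


def evaluate(expression, directory):
    """Interpret a left-associative chain such as '1 + 1 + 0' in a directory."""
    read, combine, show = DIRECTORIES[directory]
    operands = [read(token.strip()) for token in expression.split("+")]
    return show(reduce(combine, operands))


for name in DIRECTORIES:
    print(f"{name:14} 1 + 1 + 0 = {evaluate('1 + 1 + 0', name)}")
```

The loop prints 2, 110 and true for the same symbol arrangement 1 + 1 + 0, one result per directory.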
G 13
term
term
G T2
expr
expr
term
term
factor
num
Table 6.15: Resulting grammar for expressions involving addition and multiplication.
Definition 12 Let $\Sigma$ be an alphabet and $C$ be a nonempty finite set of mathematical concepts. The representation of a mathematical concept is a mapping from $C$ to $\Sigma^+$.
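Definition 12 suggests a direct test for overloading: a string is overloaded when the representation maps two or more concepts to it. The Python sketch below, with hypothetical concept names, reports such strings and then splits the concepts greedily into directories that are each free of overloading. The greedy strategy is an illustrative choice; it uses exactly as many directories as the most heavily shared string has concepts.

```python
from collections import defaultdict


def overloaded(representation):
    """Strings that represent more than one concept (a non-injective mapping)."""
    concepts_of = defaultdict(set)
    for concept, word in representation.items():
        if not word:
            raise ValueError(f"{concept!r}: a representation must be nonempty")
        concepts_of[word].add(concept)
    return {word: cs for word, cs in concepts_of.items() if len(cs) > 1}


def split_into_directories(representation):
    """Greedily place concepts in directories so that none is overloaded."""
    directories = []
    for concept, word in sorted(representation.items()):
        for directory in directories:
            if word not in directory.values():
                directory[concept] = word
                break
        else:  # every existing directory already uses this word
            directories.append({concept: word})
    return directories


# Hypothetical concept names; "+" represents three of them.
REP = {"addition": "+", "concatenation": "+", "disjunction": "+", "equality": "="}

print(overloaded(REP))
print(split_into_directories(REP))
```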
Grammar fragment for the grammar operations (Table 6.16):
    stmt_list    → stmt_list ; stmt | stmt
    stmt         → { class } new_class
    class        → class operator other_class
    class        → other_class
    operator     → % | ∘
    other_class  → stmt | IDENTIFIER
    new_class    → IDENTIFIER | ε
the need for a separate directory. This means the concept representation forces the existence of an organization in which its meaning is uniquely defined. This approach has the advantage of treating the multi-directory characteristic as a supporting mechanism for the solution of the symbol overloading necessity.
Semantic modularity is achieved when the many-to-one mapping between concepts and representations is restricted to non-overloaded domain directories. Once the complete authoring process is over, the resulting document will have its contents naturally organized according to the meanings of the concepts involved.
6.6
as a single CFG. Since a directory will ultimately be represented by its CFG, there exists a language processor¹⁸ associated with its grammar.
Although a directory which forms part of the logical structure of a document need not have any direct dependency on the others, they all share a common structure¹⁹ where all mathematical information of the document is presented. This requirement establishes that a form of synchronization is necessary to guarantee that the next piece of data to be processed will be dealt with by its associated processor.
The arrangement by which the mathematical concepts are organized throughout a document is a user-defined task which takes place during the authoring process. It is during this phase that the author specifies both the syntax and the true meaning of operations, by binding concepts to syntax and collecting them into related domains and directories. The structure of the document, at any time during this process, will therefore reflect the way these directories are arranged. There are three possible ways in which directories may be composed. A document structure is the result of directories arranged in one of the following Directory Composition Forms:
1. pure linear,
2. pure hierarchical, or
3. combined form, a combination of linear and hierarchical.
In a pure linear organization, directories are self-contained. This means there is only a single scope in which objects are delimited. Directories organized in this way may be processed in a First-In First-Out (FIFO) fashion. In a pure hierarchical organization, directories are processed in a Last-In First-Out (LIFO) style. The semantics in these types of documents are structured in a nested way, such that only the innermost directory has no dependency on the others. The most common structure is the combined one, which is characterized by an arbitrary pattern of FIFO and LIFO organizations. This case may be considered general, as it contains the previous two. For this arrangement, the possible number of document structure patterns which can
18 This characteristic is supported by the fact that for every CFG there exists a pushdown automaton that recognizes the language [55, 107, 69].
19 Even though text-based forms of representation are expected to be used in most applications, the ideas presented here also apply to other input formats.
$$P_n = \sum_{i=1}^{n} P_{i-1}\,P_{n-i} \quad \text{for } n \ge 1, \qquad P_0 = 1$$
document which requires, for example, 6 distinct directories, 132 document structure patterns can be obtained.
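The recurrence can be checked mechanically. The Python sketch below assumes that a document structure pattern is a sequence of directory scopes in which any scope may enclose further scopes, so that the first directory encloses i - 1 others and is followed by the remaining n - i. The anonymous <d> and </> markup used to list the patterns is only illustrative, loosely modelled on the Chapter 7 prototypes.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def patterns(n):
    """P_n = sum_{i=1}^{n} P_{i-1} P_{n-i}, with P_0 = 1."""
    if n == 0:
        return 1
    return sum(patterns(i - 1) * patterns(n - i) for i in range(1, n + 1))


def arrangements(n):
    """List every arrangement of n directories as nested/sequential scopes.

    The first directory encloses i - 1 directories and is followed by the
    remaining n - i ones: the decomposition behind the recurrence.
    """
    if n == 0:
        return [""]
    return ["<d>" + inner + "</>" + rest
            for i in range(1, n + 1)
            for inner in arrangements(i - 1)
            for rest in arrangements(n - i)]


print([patterns(n) for n in range(1, 7)])  # [1, 2, 5, 14, 42, 132]
```

These are the Catalan numbers; the explicit enumeration produces the same counts, including the 132 patterns for 6 directories.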
Two characteristics which relate to the way directories take part in the organization of mathematical information in documents have been presented. Informally, they state that the semantics of a document is defined by means of a set of directories, and that each of these directories must contain at least one object. These
20It is understood that these updates do not require the addition of new directories.
21 This characteristic is of course subject to storage requirements. The choice of either keeping the
language processors in main or secondary storage is an implementation decision.
6.6.1
Small fragments of documents containing simple expressions that overload the + symbol are provided to illustrate the notion of directory composition. The syntactical structure as defined by the production rule
    directory_scope → ( directory_definition ) block_objects (/)
is used to delimit
directory_definition and block_objects are grammar variables that have been used to represent a directory and
D1.0
( Expression )
  1 + 1 + 0 = 2
  1 + 1 + 1 = 3
  1 + 1 + 0 = 110
  1 + 1 + 0 = true
(/)
The document fragment D1.0, as illustrated above, is characterized by a monolithic organization in which the definitions of all mathematical objects included in the document are placed in the Expression directory. The fact that the + symbol has been used in the above example to represent different mathematical concepts characterizes this one-directory document fragment as overloaded. A voice renderer system, for instance, will not be capable of providing the appropriate meaning that has been attached to the + symbol in each of the four expressions. This is because the representation used assumes that
1. only visual-based views are necessary, and
2. the reader has the required knowledge to decode the different meanings assigned to the + symbol.
The above problem is approached here by dividing the single directory into three separate ones, so that no directory is overloaded. A directory-based organization is consequently obtained. The resulting document organization, shown below, is structured according to the addition, concatenation and disjunction operations that have been attached to the + symbol.
D1.1
( Addition )
  1 + 1 + 0 = 2
  1 + 1 + 1 = 3
(/)
( Concatenation )
  1 + 1 + 0 = 110
(/)
( Disjunction )
  1 + 1 + 0 = true
(/)
D1.2
( Addition )
  1 + 1 + 0 = 2
  ( Concatenation )
    1 + 1 + 0 = 110
  (/)
  ( Disjunction )
    1 + 1 + 0 = true
  (/)
  1 + 1 + 1 = 3
(/)
( Disjunction )
  1 + 1 + 0 = true
(/)
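The way objects in D1.1 and D1.2 are bound to directories can be sketched with a stack: a scope opened by ( name ) stays active until its (/), and an object is interpreted by the innermost active scope. The Python checker below is a hypothetical illustration, not the binding control c itself; it also rejects unbalanced scopes, which is the kind of error a binding control has to exclude.

```python
import re

OPEN = re.compile(r"\(\s*(\w+)\s*\)$")  # e.g. "( Addition )"
CLOSE = "(/)"


def bind_objects(lines):
    """Bind each object line to the stack of directory scopes around it.

    The innermost (last opened) scope is the one whose grammar interprets
    the object, which gives nested directories their LIFO behaviour.
    """
    stack, bindings = [], []
    for number, raw in enumerate(lines, start=1):
        line = raw.strip()
        if not line:
            continue
        if line == CLOSE:
            if not stack:
                raise SyntaxError(f"line {number}: (/) closes no directory")
            stack.pop()
        elif (match := OPEN.match(line)) is not None:
            stack.append(match.group(1))
        elif stack:
            bindings.append((tuple(stack), line))
        else:
            raise SyntaxError(f"line {number}: object outside any directory")
    if stack:
        raise SyntaxError(f"unclosed directories: {stack}")
    return bindings


D1_2 = """
( Addition )
1 + 1 + 0 = 2
( Concatenation )
1 + 1 + 0 = 110
(/)
( Disjunction )
1 + 1 + 0 = true
(/)
1 + 1 + 1 = 3
(/)
( Disjunction )
1 + 1 + 0 = true
(/)
""".splitlines()

for scopes, obj in bind_objects(D1_2):
    print(f"{obj:18} interpreted in {scopes[-1]} (inside {' > '.join(scopes)})")
```

For D1.2 the expression 1 + 1 + 1 = 3 is again bound to Addition once the nested Disjunction scope has been closed, which is the combined (FIFO and LIFO) form described earlier.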
6.6.2
Table 6.17 (binding control c), productions 1 to 5 for: directory_scope, directory_definition, block_objects, various_exprs, scope_change.
The document organization provided by the three examples from the previous subsection illustrates that a form of control is necessary in order to ensure the correctness of the directory composition forms. This requirement has been introduced in Section 6.2 as c, the binding control. As part of the definition of a document instance structure $S_j = (D_j, c)$, the binding control is a CFG. A possible definition of c that supports the directory composition forms is provided in Table 6.17. The nonterminal stmt_list is defined in the grammar fragment described in Table 6.16, and the nonterminal new_expr is only to be defined whenever directories are created. This means that any CFG which defines a directory will have new_expr as its start symbol.
6.7
document to be processed involves information defined in more than one domain. Different language processors will take over the processing activity at selected parts of the document. Each processor is viewed as an agent that has the knowledge to validate the syntax of its mathematical objects and to perform other tasks as determined by the semantics of the objects.
Letting the number of directories in a document be a parameter under the control of the author implies that an equal number of language processors will need to be provided, one for each required directory. To meet this requirement, this thesis proposes that language processors be dynamically created by the software used during the authoring activity.
The automatic creation of language processors based on the knowledge provided by CFGs requires information about the position of both terminals and nonterminals. Although the grammar structure imposed by the ENF permits at most one terminal in each production rule, the number of nonterminals is left unrestricted. One exception to this is the operatorless production, whose right-hand side always consists of a single nonterminal.
Representing the meaning of mathematical concepts by means of CFGs requires that all information which is part of a concept be mapped to the set of production rules. This includes the set of symbols used to represent the name of the concept, its attributes and its delimiters.
Having the name of the concept as a terminal, and both its attributes and delimiters represented as nonterminals, introduces the need for an additional mechanism to distinguish attributes from delimiters. For this purpose, a set of attributes is added to the grammar rules.
As an extension to the grammar structure already proposed for capturing semantics, these attributes will also be applied to the definition of the terminal symbols. The attachment of attributes to the rules of CFGs was proposed by Knuth [63, 78]. The resulting grammar is called an attributed grammar.
T he use of a ttrib u te d gram m ars to su p p o rt th e sem antics cap tu rin g of m ath em atical
concepts does not require any m odification to th e approach already presented. B oth
th e com position and extension op erato rs can also be applied to a ttrib u te d gram m ars.
T he following definition presents th is characteristic:
D e f in itio n 16 Let G i (A i, Tj, P i, S i, A , i) and G i = (iV2, T2, P 2, S 2, A , a 2) be two
(6 .10)
D e f in itio n 17 Let G i =
will produce
if i 4 -> it'G P i,
if A w G P or '4 > it G P i fi P 2
2
w P 3.
The following section proposes the structure of the grammars which will be used to support the definition of the domains. This meta-grammar is therefore the template which will be applied during the creation of every grammar fragment required to capture the meaning of mathematical concepts.
6.8
Meta-Structure
    meta           → cfg attributes
    cfg            → NONTERMINAL : items
    items          → items item
    items          → item
    item           → TERMINAL | NONTERMINAL
    attributes     → # regular_expr # cardinality | ε
    cardinality    → ( args_position ) | ε
    args_position  → args_position , position | position
    position       → INTEGER
Productions 9 to 12
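To make the attribute part of the meta-structure concrete, the Python sketch below parses attribute strings of the form # pattern # (positions), as they appear in Table 6.19 and in the Chapter 7 grammars, and uses them to split one production instance into its operator lexeme and its arguments. It assumes that a double-quoted pattern is a literal, that positions are 1-based indices into the right-hand side, and that the only non-argument position holds the terminal, as in ee → ee EQ te; delimiter nonterminals such as leftDel are not handled.

```python
import re
from dataclasses import dataclass

# Assumed textual shape of an attribute:  # pattern # ( p1, p2, ... )
# A double-quoted pattern is taken literally; otherwise it is a regex.
ATTRIBUTE = re.compile(r'#\s*(.+?)\s*#\s*(?:\(([\d\s,]*)\))?$')


@dataclass(frozen=True)
class Attribute:
    regex: str        # pattern the terminal's lexeme must match
    positions: tuple  # 1-based body positions that are arguments


def parse_attribute(text):
    match = ATTRIBUTE.match(text.strip())
    if match is None:
        raise ValueError(f"not an attribute: {text!r}")
    pattern, cardinality = match.groups()
    if len(pattern) >= 2 and pattern[0] == pattern[-1] == '"':
        pattern = re.escape(pattern[1:-1])
    positions = tuple(int(p) for p in (cardinality or "").split(",") if p.strip())
    return Attribute(pattern, positions)


def interpret(attribute, body):
    """Split one production instance into (terminal lexeme, arguments)."""
    arguments = [body[p - 1] for p in attribute.positions]
    others = [item for i, item in enumerate(body, start=1)
              if i not in attribute.positions]
    if len(others) != 1 or re.fullmatch(attribute.regex, others[0]) is None:
        raise ValueError(f"{body!r} does not fit {attribute}")
    return others[0], arguments


equality = parse_attribute('# "=" # (1,3)')
print(interpret(equality, ["1 + 1 + 0", "=", "2"]))  # ('=', ['1 + 1 + 0', '2'])
```

An attribute without a cardinality, such as # [1-9][0-9]* #, then acts as a plain lexical check on a basic production: the lexeme 42 is accepted and 042 is rejected.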
1
2
3
4
5
6
7
8
9
10
11
12
identity _expr
sample_expr
sam ple.expr
sum
sum_elmts
ran g eJist
s ta rt
end
identifier
leftDel
right Del
expr
# vSum v # (3)
# v; # ( T 3 )
# " ," #(1,3)
# '
# (1,3)
# [a~z]+ #
#{ #
# T #
# [1-9] [0-9]* #
Table 6.19: Attributed grammar to support the capturing of simple summations,
attributes.
6.9
Conclusion
In this chapter I have presented a grammar-based document organization to capture the meaning of mathematical concepts. The approach models the dynamics of authoring mathematics and supports the introduction of user-defined syntax to represent mathematical concepts. This means that the semantics of the mathematical concepts included in a document can be bound to syntax proposed during authoring. These ideas are expressed in terms of the Document Description Model, described as follows.
A Document Description Model (DDM) is a structure composed of
1. a document dictionary $H_j$ such that all grammars in this set are in ENF, and
2. the following operations:
   (a) Grammar Creation: introduced in Section 6.4 by the composition operator ∘.
   (b) Grammar Update: introduced in Section 6.4 by the extension operator. Grammars resulting from this operation, as well as from the grammar creation operation, are elements of the set $C^i_j$ for some version j of the document structure and document directory i.
3. Grammar Identity: provided by the union operations used for the creation of the domain directory $G^i_j$.
4. Closure: all grammars introduced by the creation and the update operations are in $H_j$.
Chapter 7
Examples
Among the various forms of representation available, the conventional notation is the one used in the majority of activities which involve the communication of mathematics. A major limitation on rendering mathematical concepts according to this notation is the syntactic overloading of the symbols used to encode the operators. This problem has been discussed in Section 5.2, and Figure 5.1 displays three common meanings that are usually attached to symbol v.
It is assumed in this thesis that people are, most of the time, exposed to mathematics through the encoding provided by the conventional notation. For this reason this notation has been used in this work as the basic source of information for the semantics capturing process. Although the encoding provided by the conventional notation is sometimes not ideal, it is important to maintain the syntactical arrangement this encoding provides. This decision is fundamental to the capturing strategy, because choosing a widely used notation should free the author from having to learn an alternative syntax supported by the capturing system.
In this thesis a document structure composed of attributed grammar fragments is proposed to capture the meaning of mathematical concepts. Context-dependent representations are supported by a directory change mechanism, in which one set of grammars is replaced by another so that other interpretations can be associated with the symbols considered. The following sections illustrate the proposed structure by describing the process involved during the authoring of simple documents which only contain mathematical concepts represented by strings of characters.
7.1
    g1:  ee → ee EQ te       # "=" # (1,3)
    g2:  ee → te
    g3:  it → INTEGER        # [1-9][0-9]* #
    g4:  ep → ep PLUS tp     # "+" # (1,3)
    g5:  ep → tp
    g6:  st → STRING         # [0-9]+ #
    g7:  ec → ec CAT tc      # "+" # (1,3)
    g8:  ec → tc
    l0:  new_expr → ee

    l1:  te → ep
    l2:  tp → it
    l3:  te → ec
    l4:  tc → st
Prototype
<d1>
1 + 1 + 0 = 2
<d2>
1 + 1 + 0 = 110
</>
</>
The above prototype version of the document is composed of two domains, represented by d1 and d2. As illustrated by the syntax of the document, domain
should contain
and concatenation operations respectively. Grammars g3 and g6 define the domains over which the addition and the concatenation operations, respectively, can apply. Grammar fragments g1, g4 and g7 support the definitions of the equality, the addition and the concatenation operations respectively. Grammar l0 links grammar g1 to the control mechanism. Grammar fragments l1, l2, l3 and l4 have been created
The
E xl-V ersion 1
<
{lo{g\
0^2}^}^;
{{ti
o { g i o g 5} t 3} t 0;
0 / 1
{ t 0 o l 2 o g 3} d l >
1+ 1+ 0 = 2
< { {< 1 o h o { g 7 g s } t 4 } t 5 o l A o g 6} d 2 >
1 + 1 + 0 = 110
</>
</>
As stated before, the main objective of this initial version of the document is to represent both the addition and the concatenation operations by the + symbol. For this purpose the author organizes the information to be presented into two separate domains, as a way of resolving the semantic nondeterminism generated by overloading the + symbol. The grammar fragments used for the definition of domain d1 have been obtained from domain directory $G_0^1$, and the fragments used for the definition of domain d2 were taken from domain directory $G_0^2$. The complete definitions supporting this version of the document are described next, according to the document structure proposed in Chapter 6.
{gi o g 2}t2
{/0 0 f 2 } t l
{g4 0 g 5}t.3
{ h o h o f 3 }*0
{ t 0 o l 2 o g 3}di
ee
ee
new_expr
ee
ee
ep
ep
new_expr
ee
ee
te
ep
ep
new_expr
ee
ee
te
ep
ep
tp
it
ee EQ te
te
ee
ee EQ te
te
ep PLUS tp
tp
ee
ee EQ te
te
ep
ep PLU S tp
tp
ee
ee EQ te
te
ep
ep PLUS tp
tp
it
IN T E G E R
# = # ( 1 , 3 )
# ?r=TT # (1,3)
# " + r # (1,3)
# = # ( 1 , 3 )
# " + r # ( U3)
# = # ( 1 , 3 )
#"+ "#(1,3)
# [1-9] [0-9] #
Table 7.3: Grammars in domain directory $G_0^1$ that have been created by grammar operations.
The current version of the document is supported by the document instance structure $S_0 = (D_0, c)$. For this initial version of the document, the organization of the semantic structure $D_0$ is defined in terms of its two domain directories $G_0^1$ and $G_0^2$ as follows:
$$D_0 = (G_0^1, G_0^2) \qquad (7.1)$$
where $G_0^1$ is defined as
G =
to,
<^i}
(7.2)
{9798)tA
Oi h
4 ) ^ 5
# r+ r #(1,3)
co'
4b
Jl
ec CAT tc
tc
ee
ee EQ te
te
ec
ec CAT tc
tc
ee
ee EQ te
te
ec
ec CAT tc
tc
st
STR IN G
4b
{ t b o U o g 6} d 2
ec
ec
new_expr
ee
ee
te
ec
ec
new_expr
ee
ee
te
ec
ec
tc
st
# [0-9]+ #
Table 7.4: Grammars in domain directory $G_0^2$ that have been created by grammar operations.
with
= 0i> ^ } ;
F = { g i , g 2,g3,g4,g5,lo}-
(7.3)
and G 2 as
G 2 = N % U F 2 U C 2 = { g e , 9 7 ,98, h, h, ^ ,^ }
1
(7.4)
with
= {ge, 97,98, h,U}'-
-P21 = {^1}-
C 2 = {4, t 5, ^ 2 } .
(7.5)
(7.6)
i=l
is a binary infix operator which concatenates its left operand the number of times stated by its right integer operand.
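Read this way, the second meaning of * is string repetition. The Python sketch below evaluates expressions such as a * 3 + b from the equation a * 3 + b = aaab in Ex1-Version 2. It assumes that + denotes concatenation in the same directory, that * binds more tightly than +, and that operands are plain strings; it ignores the grammar fragments that carry these meanings in the document structure.

```python
def repeat(word, times):
    """Repetition: concatenate `word` with itself `times` times."""
    if times < 1:
        raise ValueError("the right operand must be a positive integer")
    return word * times


def evaluate(expression):
    """Evaluate '+' as concatenation and '*' as repetition ('*' binds tighter).

    Operators are left-associative, so 'a * 2 * 3' repeats 'aa' three times.
    """
    result = ""
    for term in expression.split("+"):
        word, *counts = [part.strip() for part in term.split("*")]
        for count in counts:
            word = repeat(word, int(count))
        result += word
    return result


print(evaluate("a * 3 + b"))  # aaab
```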
Syntactically, both operations are represented by the * symbol. This characteristic indicates that two distinct directories will need to be provided in order to capture the meanings
E xl-V ersion 2
< d2 >
1 + 1 + 0 = 110
<
{ ^ 2 % { h {<?9 5 l o } ^ 6 0
>
1 + 1 *0 = 1
<
{t5 h k g n k g 3 g n }d i >
a * 3 + b = aaab ;
1 * 3 = 1 + 1 + 1 = 3:
</>
1*3 = 1 + 1 + 1 = 3
</>
</>
The code presented by Ex1-Version 2 above makes use of three distinct domains
d2,
previous version of
of the structure which supports this is provided by the document instance structure
$S_1 = (D_1, c)$.
directories G},
where
(7.7)
G} = N l U F} U C \ = { d 2}
(7.8)
is defined as
w ith
.%' = {};
F,1 = {<(,};
C} = {},
(7.9)
G \ as
G =
U F2 U C2
(7-10)
with
F 2 = { d 2}:
= {gg, giQ,h,k}'-
C\ = {U,dz},
(7.11)
and G as
3
^ 3
= -W3 u
^3
^3
(7-12)
w ith
^
= {^ 11^
12^ 7 ,
Fg1 = { p 3, f 5};
^8,^9, };
C = { d 4}
3
(7-13 )
The grammars involved in this new version of the document structure are given by
3
(7-14)
1=1
99
gio
h
h
tm
tm
tp
fm
tm MULT fm
fm
tm
it
# v* " # ( l , 3 )
Tables 7.5 and 7.6 provide the grammars which belong to domain directory $G_1^2$. These grammars have been introduced by editing and by grammar operations respectively. Table 7.7 shows the grammars which belong to $G_1^3$; they were introduced by editing.
7.2
This example proposes a semantic structure to support the syntactical overloading of the symbols + and *. Two different meanings are attached to each symbol and
{<79 0 <7lo}^6
{^2
% {h t e 0 ^ } } ^ 3
tm
trn
new_expr
ee
ee
te
ep
ep
tp
tp
tm
tm
fm
it
tm MULT fm
fm
ee
ee EQ te
te
ep
ep PLU S tp
tp
it
tm
t m MULT fm
fm
it
IN T E G E R
# v*v # ( 1 , 3 )
# : = " # (1,3)
# :+ " # ( 1 , 3 )
# ;' * " # ( 1 , 3 )
# [1-9][0-9] #
Table 7.6: Grammars in domain directory $G_1^2$ that have been created by grammar operations.
9n
9n
h
h
h
tp
st
tc
tc
fp
St P O W E R fp
A LPHA N UM
tp
St
it
# (1,3)
# [ 0 - la - s ] + #
#
each m eaning requires a custom ized dom ain where gram m ar fragm ents are needed to
sup p o rt th e sem antic cap tu rin g process.
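As a rough illustration of attaching different meanings to the same symbols, the sketch below evaluates + and * under two hypothetical meaning tables: ordinary arithmetic, and a string reading in which + is concatenation and * is repetition. The string reading is only one interpretation consistent with the Ex1 statements 1 + 1 + 0 = 110 and a * 3 + b = aaab (and with the CAT and POWER tokens that appear in the Ex1 grammars). The domain names, the precedence rules and the evaluator itself are assumptions; the thesis captures meaning through grammars, not through a hard-coded table like this.

```python
from functools import reduce
from typing import Callable, Dict

# Hypothetical meaning tables: each domain binds the symbols + and * to a function.
MEANINGS: Dict[str, Dict[str, Callable]] = {
    # Usual binary arithmetic operators.
    "arithmetic": {"+": lambda x, y: x + y, "*": lambda x, y: x * y},
    # + as concatenation, * as repetition (the right operand is a count).
    "string": {"+": lambda x, y: x + y, "*": lambda x, y: x * int(y)},
}


def evaluate(expr: str, domain: str):
    """Evaluate a +/* expression under the meanings of the given domain.

    '*' binds tighter than '+', and both associate to the left.
    """
    ops = MEANINGS[domain]
    lift = int if domain == "arithmetic" else str  # how operands are read
    terms = [
        reduce(ops["*"], (lift(factor) for factor in term.split("*")))
        for term in expr.replace(" ", "").split("+")
    ]
    return reduce(ops["+"], terms)


print(evaluate("1 + 1 + 0", "arithmetic"))  # 2
print(evaluate("1 + 1 + 0", "string"))      # 110
print(evaluate("a * 3 + b", "string"))      # aaab
```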
Although the semantics usually attached to these symbols characterizes them as binary operators, as provided by domain $d_3$, many other meanings may also be associated with them. One possibility, for example, is to treat them as the elements of a set. In this scenario the two symbols are the operands of the comma (",") binary operator, which organizes the elements of a set in list format. This characteristic is illustrated by the single statement defined within the scope of domain $d_5$ in the semantic structure that follows:
Grammar   Rule                     Attribute
?         new_expr → ee
          ee → ee EQ te            #"="#(1,3)
          ee → te
          te → ec
          ec → ec CAT tc           #"+"#(1,3)
          ec → tc
          tc → tp
          tc → st
          tp → st POWER fp         #"*"#(1,3)
          fp → it
          it → INTEGER             #[1-9][0-9]*#
          st → ALPHANUM            #[…]+#
Ex2:
<d3>
  0 + 1 * 1 = 1
  <d5>
    R = S = {+, *}
  </>
</>
Grammar   Rule                     Attribute
?         te → id
?         te → bs
g_13      id → IDENTIFIER          #[A-Z]#
g_14      bs → SET el endset       #"{"#(2)
g_15      endset → ENDSET          #"}"#
g_16      el → el LISTDEL tl       #","#(1,3)
g_17      el → tl
g_18      tl → BINARYOP            #[+*]#
Table 7.9: Grammars in domain directory $G_2$ created by editing.

Tables 7.9 and 7.10 list all grammars required for this example. Grammar $t_1$ has already been defined in Section 7.1, so it is not included in these tables.
Grammar   Rule                     Attribute
t_6       bs → SET el endset       #"{"#(2)
          endset → ENDSET          #"}"#
t_7       el → el LISTDEL tl       #","#(1,3)
          el → tl
?         new_expr → ee
          ee → ee EQ te            #"="#(1,3)
          ee → te
          te → id
          te → bs
          id → IDENTIFIER          #[A-Z]#
          bs → SET el endset       #"{"#(2)
          endset → ENDSET          #"}"#
          el → el LISTDEL tl       #","#(1,3)
          el → tl
          tl → BINARYOP            #[+*]#
Table 7.10: Grammars in domain directory $G_2$ created by grammar operations.
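The productions of Tables 7.9 and 7.10 are small enough to run directly. The sketch below hand-writes a recursive-descent recognizer for them, with token patterns read off the attribute column: SET is "{", ENDSET is "}", LISTDEL is ",", EQ is "=", IDENTIFIER matches [A-Z] and BINARYOP matches [+*]. Hand-writing the parser is a swapped-in technique chosen for brevity (the thesis derives its processors from the grammar fragments), and the class name and the shape of the returned structure are hypothetical.

```python
import re
from typing import List, Optional, Tuple

# Token definitions taken from the attribute column of Tables 7.9 and 7.10.
TOKEN_RE = re.compile(
    r"\s*(?:(?P<IDENTIFIER>[A-Z])|(?P<BINARYOP>[+*])|(?P<SET>\{)"
    r"|(?P<ENDSET>\})|(?P<LISTDEL>,)|(?P<EQ>=))"
)


def tokenize(text: str) -> List[Tuple[str, str]]:
    tokens, pos, text = [], 0, text.rstrip()
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected input at position {pos}")
        tokens.append((m.lastgroup, m.group(m.lastgroup)))
        pos = m.end()
    return tokens


class SetParser:
    """Recursive-descent parser; left-recursive rules are written as loops."""

    def __init__(self, text: str):
        self.tokens, self.i = tokenize(text), 0

    def peek(self) -> Optional[str]:
        return self.tokens[self.i][0] if self.i < len(self.tokens) else None

    def expect(self, kind: str) -> str:
        if self.peek() != kind:
            raise SyntaxError(f"expected {kind}, found {self.peek()}")
        self.i += 1
        return self.tokens[self.i - 1][1]

    def new_expr(self) -> list:  # new_expr -> ee
        terms = self.ee()
        if self.peek() is not None:
            raise SyntaxError(f"unexpected {self.peek()}")
        return terms

    def ee(self) -> list:  # ee -> ee EQ te | te
        terms = [self.te()]
        while self.peek() == "EQ":
            self.expect("EQ")
            terms.append(self.te())
        return terms

    def te(self) -> tuple:  # te -> id | bs
        if self.peek() == "IDENTIFIER":  # id -> IDENTIFIER
            return ("id", self.expect("IDENTIFIER"))
        self.expect("SET")  # bs -> SET el endset
        elements = [self.expect("BINARYOP")]  # el -> tl ; tl -> BINARYOP
        while self.peek() == "LISTDEL":  # el -> el LISTDEL tl
            self.expect("LISTDEL")
            elements.append(self.expect("BINARYOP"))
        self.expect("ENDSET")  # endset -> ENDSET
        return ("set", elements)


print(SetParser("R = S = {+, *}").new_expr())
# [('id', 'R'), ('id', 'S'), ('set', ['+', '*'])]
```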
$$D = (G_1, G_2, G_3) \qquad (7.15)$$
Domain directory $G_1$ is defined as
$$G_1 = N_1 \cup F_1 \cup C_1 = \{d_3\} \qquad (7.16)$$
with
$$N_1 = \{\}; \quad C_1 = \emptyset; \quad F_1 = \{d_3\}, \qquad (7.17)$$
and $G_2$ as
$$G_2 = N_2 \cup F_2 \cup C_2 \qquad (7.18)$$
with
$$F_2 = \{t_1\}; \quad C_2 = \{t_6, t_7, \ldots\}. \qquad (7.19)$$
The grammars required for this example are provided by
$$\bigcup_{j=1}^{2} N_j \qquad (7.20)$$
7.3
The document structures introduced by the previous examples illustrated a scenario in which the overloading of symbols took place in distinct expressions; that is, a given symbol appeared in more than one expression with different meanings associated with it.
current domain was replaced by an adequate one that provided the necessary grammar support for capturing the meaning of the concepts involved.
Symbol overloading may also take place within the expression itself. In this scenario the context switch introduces as many distinct domains as there are different meanings associated with any given symbol in the expression.
This section presents a document structure to support expressions that require more than a single domain to capture the meaning of the concepts they represent. To illustrate the problem, consider the following expression, which attaches two different meanings to the symbol +:
$$|A + B| + 1 = a \qquad (7.21)$$
Ex3:
<{g19 …}t9>
  |A + B| <{g23 … g24}…>
    + 1 = a
  </>
</>
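The nesting in Ex3 can be read as a stack of scopes: each opening tag switches to a new domain, and each </> switches back to the enclosing one. The sketch below shows only that bookkeeping. It replaces the partly legible scope expressions of Ex3 with the plain labels t9 and d7, which are hypothetical simplifications, and it does not parse the mathematics inside each scope.

```python
import re
from typing import List, Tuple

# Opening tags carry a scope label, e.g. <t9>; "</>" closes the innermost scope.
TAG_RE = re.compile(r"<(/?)([^<>]*)>")


def segment(markup: str) -> List[Tuple[str, str]]:
    """Pair every text fragment with the scope that is active when it is read."""
    stack: List[str] = []
    pieces: List[Tuple[str, str]] = []
    pos = 0
    for tag in TAG_RE.finditer(markup):
        text = markup[pos:tag.start()].strip()
        if text:
            if not stack:
                raise ValueError("text outside every scope")
            pieces.append((stack[-1], text))
        if tag.group(1):  # "</>": context switch back to the enclosing scope
            if not stack:
                raise ValueError("unbalanced </>")
            stack.pop()
        else:  # "<label>": context switch into a new scope
            stack.append(tag.group(2).strip())
        pos = tag.end()
    if stack or markup[pos:].strip():
        raise ValueError("unclosed scope or trailing text")
    return pieces


print(segment("<t9> |A + B| <d7> + 1 = a </> </>"))
# [('t9', '|A + B|'), ('d7', '+ 1 = a')]
```

The first + is therefore read in the matrix scope and the second in the arithmetic one, which is the two-domain situation the example describes.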
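Grammar   Rule                                    Attribute
g_19      new_expr → DET ee endet domain_scope    #"|"#(2,3,4)
g_20      ee → ee MATRIX_ADD et                   #"+"#(1,3)
g_21      et → MATRIX_ID                          #[A-Z]#
g_22      endet → ENDET                           #"|"#
Table 7.11: Grammars in domain directory $G_1$ created by editing.

Grammar   Rule                      Attribute
g_23      new_expr → PLUS ee        #"+"#(2)
g_24      te → CONSTANT
Table 7.12: Grammars in domain directory $G_2$ created by editing.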
Grammar   Rule                                    Attribute
t_8       ee → ee MATRIX_ADD et                   #"+"#(1,3)
          et → MATRIX_ID                          #[A-Z]#
?         new_expr → DET ee endet domain_scope    #"|"#(2,3,4)
          ee → ee MATRIX_ADD et                   #"+"#(1,3)
          et → MATRIX_ID                          #[A-Z]#
?         new_expr → DET ee endet domain_scope    #"|"#(2,3,4)
          ee → ee MATRIX_ADD et                   #"+"#(1,3)
          et → MATRIX_ID                          #[A-Z]#
          endet → ENDET                           #"|"#
Table 7.13: Grammars in domain directory $G_1$ created by grammar operations.

Grammar   Rule                      Attribute
?         new_expr → PLUS ee        #"+"#(2)
          ee → ee EQ te             #"="#(1,3)
          ee → te
          te → CONSTANT             #(0|[1-9][0-9]*)|[…]#
Table 7.14: Grammars in domain directory $G_2$ created by grammar operations.
where $G_1$ is defined as
$$G_1 = N_1 \cup F_1 \cup C_1 \qquad (7.23)$$
with
$$N_1 = \{g_{19}, g_{20}, g_{21}, g_{22}\}; \quad F_1 = \{t_2\}; \quad C_1 = \{t_8, t_9, d_6\}, \qquad (7.24)$$
and $G_2$ as
$$G_2 = N_2 \cup F_2 \cup C_2 = \{g_{23}, g_{24}, \ldots, d_7\} \qquad (7.25)$$
with
$$N_2 = \{g_{23}, g_{24}, \ldots\}; \quad F_2 = \{t_2\}; \quad C_2 = \{d_7\}. \qquad (7.26)$$
Tables 7.11 and 7.13 are both associated with domain directory $G_1$. Table 7.11 shows all grammars in this directory that were created by editing, and Table 7.13 the grammars that were generated as the result of composition operations. Similarly, the grammars in Tables 7.12 and 7.14 are associated with domain directory $G_2$: the grammars in Table 7.12 are the result of editing, and those in Table 7.14 were created by composition.
As discussed in Section 7.2, the integer attributes introduced as part of the rules of some grammars determine the positions of the relevant nonterminals of a rule. In Table 7.11, grammar fragment $g_{19}$ uses attributes 2, 3 and 4 to refer to the three nonterminals that are necessary to support the correct expansion of its rule. Nonterminals ee and endet are associated with attributes 2 and 3, respectively. Although ee and endet belong to the same domain directory, nonterminal domain_scope, which is associated with attribute 4, does not. As part of the dynamic control grammar, this nonterminal is associated with the context switch that is needed to provide the adequate grammar for the mathematical concepts being processed.
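A worked reading of the integer attributes: positions count the symbols on the right-hand side of a rule from 1. For $g_{19}$, whose right-hand side is DET ee endet domain_scope, the attribute (2,3,4) picks out ee, endet and domain_scope, the three nonterminals named above; for a rule such as ee → ee EQ te, the attribute (1,3) picks out the two operands. The helper below is a hypothetical sketch of that lookup only; how the selected nonterminals are then used is not modelled.

```python
from typing import List, Sequence


def relevant_symbols(rhs: Sequence[str], positions: Sequence[int]) -> List[str]:
    """Select right-hand-side symbols by the 1-based positions in a rule's attribute."""
    for p in positions:
        if not 1 <= p <= len(rhs):
            raise IndexError(f"position {p} is outside a rule of length {len(rhs)}")
    return [rhs[p - 1] for p in positions]


# g19 (Table 7.11): new_expr -> DET ee endet domain_scope, attribute (2,3,4)
g19_rhs = ["DET", "ee", "endet", "domain_scope"]
print(relevant_symbols(g19_rhs, (2, 3, 4)))  # ['ee', 'endet', 'domain_scope']

# ee -> ee EQ te with attribute (1,3): the two operands of "="
print(relevant_symbols(["ee", "EQ", "te"], (1, 3)))  # ['ee', 'te']
```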
Chapter 8
The Processing Structure
In Chapter
document organization that uses CFGs as its fundamental formalism. This chapter presents a processing structure for the proposed organization.
8.1
syntax definitions which contain overloaded symbols, the semantic characteristics of concepts need to be considered. The expression
matics can be modeled as a set of states where each state is uniquely characterized by a CFG or scope; in other words, as a finite automaton whose states are CFGs and whose transitions are supplied by the author. One problem with this association is determining the boundaries of an authoring increment, that is, where one increment ends and the next begins.
To get around this nondeterminism I have used the state change concept as a mechanism to resolve ambiguities. Of course, a state change in this context must also be triggered whenever the syntax used for a given concept cannot be recognized by the grammars defined for that state.
long as no syntactical ambiguities are introduced and all proposed syntax consists of valid statements for the current scope. The syntax attached to a concept is valid only within a given scope and is recognized as long as the scope it belongs to is active.
According to this strategy, at the end of the authoring activity the document is organized as a sequence of sets of grammars. Since the document has been created incrementally, it is natural to structure its processing by means of a mechanism that supports this characteristic. In essence, new language processors need to be provided as new scopes are introduced. That is, the dynamic authoring characteristic determines incremental changes to the notation/language used, and therefore incremental changes also need to be made to the grammars used to define the notation/language. This process may be viewed as a language prototyping activity in which language fragments are included as a way to support new features.
8.2
and the back end. The parts associated with the source language are lexical and syntactic analysis, symbol table creation, semantic analysis, intermediate code generation and code optimization; the front end is the collection of all these parts. The back end comprises the tasks associated with the target language, so target code generation and target code optimization are back-end tasks. Symbol table management and error handling are not restricted to a single phase; they may belong to both the front-end and the back-end phases.
As described above, the phase-oriented decomposition approach views a programming language as a single indivisible object. An alternative is to describe a language as a collection of fragments whose combination provides the same processing power as the indivisible definition. The important characteristic of this approach is that language fragments can be defined to represent not only the syntax but also the semantic structure of language constructs.
The following section presents the organization this thesis proposes for the construction of document processors that support the dynamics of authoring mathematics. The solution combines the notions of phase-oriented processing and fragmented language definitions.
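A minimal sketch of combining fragmented definitions with a conventional phase, under stated assumptions: a fragment bundles token definitions with productions; fragments are combined by merging tokens (rejecting conflicting definitions of the same token) and uniting productions; and the combined definition then feeds an ordinary front-end phase, here only lexical analysis. The token patterns follow the Ex2 grammars of Chapter 7, while `Fragment`, `combine` and `lexer` are hypothetical names.

```python
import re
from dataclasses import dataclass
from typing import Dict, FrozenSet, Iterable, List, Tuple

Production = Tuple[str, Tuple[str, ...]]


@dataclass
class Fragment:
    """A language fragment: token definitions plus the productions that use them."""
    tokens: Dict[str, str]  # token name -> regular expression
    productions: FrozenSet[Production] = frozenset()


def combine(fragments: Iterable[Fragment]) -> Fragment:
    """Merge fragments; a token may be shared only if every fragment defines it alike."""
    tokens: Dict[str, str] = {}
    productions: set = set()
    for fragment in fragments:
        for name, pattern in fragment.tokens.items():
            if tokens.get(name, pattern) != pattern:
                raise ValueError(f"conflicting definitions for token {name}")
            tokens[name] = pattern
        productions |= fragment.productions
    return Fragment(tokens, frozenset(productions))


def lexer(language: Fragment):
    """Front-end phase 1 (lexical analysis), generated from the combined tokens."""
    master = re.compile("|".join(f"(?P<{n}>{p})" for n, p in language.tokens.items()))

    def lex(text: str) -> List[Tuple[str, str]]:
        result, pos = [], 0
        while pos < len(text):
            if text[pos].isspace():
                pos += 1
                continue
            m = master.match(text, pos)
            if m is None:
                raise SyntaxError(f"no fragment defines a token at position {pos}")
            result.append((m.lastgroup, m.group()))
            pos = m.end()
        return result

    return lex


equality = Fragment({"EQ": "=", "IDENTIFIER": "[A-Z]"},
                    frozenset({("ee", ("ee", "EQ", "te")), ("ee", ("te",))}))
sets = Fragment({"SET": r"\{", "ENDSET": r"\}", "LISTDEL": ",", "BINARYOP": r"[+*]"},
                frozenset({("bs", ("SET", "el", "endset")), ("tl", ("BINARYOP",))}))

language = combine([equality, sets])
print(lexer(language)("R = {+,*}"))
```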
8.3
In Section 6.2.1 I introduced a document structure to model the dynamics of authoring mathematics. The model described there organizes authoring as a sequence of sets of grammars. In this organization each set captures the syntax and portions of the semantics of some mathematical concepts that have been included in the document. A complete sequence, in this case, characterizes one stage of the authoring activity; in other words, it corresponds to a version of the document.
To process a given version of a document, say version $v$, the document processor must step through the complete sequence of sets of grammars associated with $v$, starting from the sequence's first element. As a result, a context switch (scope change) takes place whenever one set of grammars is replaced by another. This procedure is the approach this thesis proposes for capturing the semantics associated with the field of knowledge to which the mathematical concepts belong.
Let
The language processor for document structure $S_j$ is defined by the deterministic finite automaton
$$PD_j = (Q_j, \Sigma_j, \delta_j, s_j, F_j)$$
where
$Q_j$ is the set whose elements are all processors associated with the directories that compose the semantic structure $D_j$,
$s_j = P_{M\%G_1^j}$,
$F_j = \{P_{M\%G_{n_j}^j}\}$,
$\Sigma_j$ is the set whose elements are the syntax of the mathematical objects associated with version $j$ of the document, and
for all $w \in \Sigma_j$ and all $P_{M\%G_i^j} \in Q_j$,
$$\delta_j(P_{M\%G_i^j}, w) = \begin{cases} P_{M\%G_t^j} & \text{if } w \in L(G_t^j), \\ P_{M\%G_i^j} & \text{otherwise.} \end{cases}$$

8.3.1 Example
Chapter 7 provides a set of examples to illustrate the organization this thesis proposes to support the dynamics of authoring mathematics. Section 7.1 presents a scenario with two versions of a simple document containing mathematical expressions that overload the + symbol.
authoring.
The language processors associated with the versions of this document are therefore $PD_0$ for the first version and $PD_1$ for the second. The semantic structure for the second version is
$$D_1 = (G_1^1, G_2^1, G_3^1, G_4^1)$$
and the set of states of its language processor is
$$Q_1 = \{P_{M\%G_1^1}, P_{M\%G_2^1}, P_{M\%G_3^1}, P_{M\%G_4^1}\}.$$
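The transition function $\delta_j$ defined above can be simulated directly. In the sketch below, states are indices standing in for the processors $P_{M\%G_i^j}$ (four of them in the case of $Q_1$; two toy ones here), and each grammar $G_i^j$ is replaced by a hypothetical regular-expression recognizer. Staying in the current state while its grammar still accepts a word, choosing the first accepting grammar otherwise, and remaining in place when no grammar accepts are assumptions made to keep the automaton deterministic.

```python
import re
from typing import Callable, List, Sequence

Recognizer = Callable[[str], bool]  # stands in for the test w in L(G_i)


def delta(state: int, w: str, grammars: Sequence[Recognizer]) -> int:
    """Transition: stay while the current grammar accepts w, else switch."""
    if grammars[state](w):
        return state
    for t, accepts in enumerate(grammars):
        if accepts(w):
            return t  # context switch to a processor whose grammar accepts w
    return state  # assumption: no grammar accepts w, so the state is unchanged


def run(grammars: Sequence[Recognizer], words: Sequence[str], start: int = 0) -> List[int]:
    """Return the state (processor index) reached after reading each word."""
    state, trace = start, []
    for w in words:
        state = delta(state, w, grammars)
        trace.append(state)
    return trace


def matcher(pattern: str) -> Recognizer:
    return lambda w: re.fullmatch(pattern, w) is not None


grammars = [matcher(r"[0-9]+( \+ [0-9]+)*"), matcher(r"[a-z]+( \+ [a-z]+)*")]
print(run(grammars, ["1 + 1", "a + b", "c", "2 + 3"]))  # [0, 1, 1, 0]
```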
Chapter 9
Concluding Remarks
This work introduced a user-oriented organization to support the creation of multi-purpose mathematical documents. To achieve this, a mechanism to capture the semantics of mathematical concepts was proposed. This mechanism models the dynamics of authoring and allows meaning-to-syntax bindings to take place during the authoring activity. It also gives the author the power to select the syntax he/she believes is most appropriate to express the ideas to be communicated. A processing structure to support the proposed organization was also presented.
9.1 Discussion
part of a library and are ready to be used. If new concepts need to be introduced, or their meaning-to-syntax mappings need to be modified, the model determines that the needed grammars are to be created by editing, by applying operations to the existing grammars, or by a combination of these two approaches.
Editing could be required only when few grammars are available or
9.2
In this dissertation I have described the goal of capturing the meaning of mathematical concepts by means of a document structure which
1. allows the semantics of mathematical concepts to be encoded by user-defined syntax, provided the notation is context-free, and
2. supports both the extensibility and the ambiguity characteristics of conventional mathematical notation.
In Chapter 1 I made three claims concerning my approach to authoring documents containing mathematics. These claims are repeated here, each followed by comments about the approach I took to accomplish it.
1. Both the meaning and syntax of mathematical concepts can be captured by attributed context-free grammars. The solution I have proposed to capture the
9.3 Future Work
2. Is there a way to ensure compositionality of meaning for such systems?
I leave these questions open. A detailed investigation of the application of compositionality is therefore a future goal.
Reproduced with permission of the copyright owner. Further reproduction prohibited without permission.
131
References
[1] J. A b b o tt: O penM ath Design C om m ittee R eport. Technical report, O penM ath
C onsortium , 1996. Available from h ttp ://w w w .o p e n m a th .o r g /.
[2] J. A b b o tt, A. Diaz, R. S. Sutor: A R ep o rt on O penM ath, A P rotocol for the
Exchange of M athem atical Inform ation. S IG S A M B u lletin 30(1) (M arch 1996),
21-24.
[3] J.
A b b o tt,
M ath.
A.
van
Leeuwen,
A.
S trotm ann:
O bjectives
of O pen
Available from
h t t p : / /www. o p en m ath . o r g / .
[4] G. D. Abowd: Form al A s p e c ts o f H u m an -C om pu ter Interaction. P hD thesis,
Oxford University, Oxford, E ngland, 1991.
[5] S. R. Adam s:
Press, 2005.
[8] D. S. A rnon, S. A. M am rak: On th e Logical S tru ctu re of M athem atical N ota
tion. T U G b o a t 12(4) (1991), 479-484.
[9] R. A rrabito:
Using
Reproduced with permission of the copyright owner. Further reproduction prohibited without permission.
1987.
132
[10] R. G. A rrabito:
Verlag.
[13] A. A sperti, B. Buchberger, J. H. D avenport (editors):
Conference, M K M 2003.
2003. Springer-Verlag.
[14] R. A usbrooks, S. Buswell, D. Carlisle, S. D alm as, S. D ev itt, A. Diaz, M. Froum entin, R. H unter, P. Ion, M. K ohlhase, R. Miner, N. Poppelier, B. Sm ith,
N. Soiffer, R. Sutor, S. W att:
version 2.0 (Second E dition).
http://www.w 3 .org/TR/2003/REC-MathML2-20031021/
106-119.
[17] P. V. Biron, A. M alhotra: XML Schem a P a r t 2: D atatypes. Technical report,
OASIS, 2001. Available from http://www.w3.org/xmlschema-2/
[18] In stru ction M anual for B raille Transcribing. Am erican P rin tin g House for the
Blind, Louisville, Kentucky, 3rd ed., 1984.
[19] The N em eth B raille C ode for M a th em a tic s and Science N o ta tio n , 1972 R evision.
A m erican P rin tin g House for th e Blind, Louisville, Kentucky, 1985.
[20] T. Bray, J. Paoli, C. M. Sperberg-M cQ ueen, E. M aler, F. Yergeau, J. Cowan:
Extensible M arkup Language (XML) 1.1. Technical report, W 3C, 2004. Avail
able from http://www.w3.org/TR/2004/REC-xmlll-20040204/
Reproduced with permission of the copyright owner. Further reproduction prohibited without permission.
133
[21] M. B ryan: A T^X User's G uide to IS O s D ocum ent Style Sem antics and Spec
ification Language (DSSSL). T U G b o a t 14 (1993), 223-226.
[22] H. B unt: Issues in M ultim odal H um an-C om puter C om m unication. In H. B unt,
R .-J.B eun, T. Borghuis (editors): M u ltim o d a l H u m an -C om pu ter C om m unica
tion: S ystem s, Techniques, and E xperim en ts, 1374. L ectu re N o tes in C o m p u ter
Science, 1-12, Springer-Verlag, Berlin, Jan u ary 1998.
[23] S.
Buswell,
tano,
M.
Technical
0.
C ap ro tti,
Kohlhase:
rep o rt,
D. P. Carlisle,
T he
T he
M.
O penM ath
O penM ath
Society,
C.
Dewar,
S tan d ard
2004.
M.
G ae
(version
2.0).
Available
from
http://www.openmath.org/cocoon/openmath/standard/om20/index.html
[25] 0 . C ap ro tti, D.
Technical report,
P. Carlisle, A. M. Cohen:
Available from
http://www.nag.co.uk/proj ects/OpenMath/omstd
[27] J. C lark: T he design of RELA X NG. Technical report, OASIS, 2001. Available
from http://www.thaiopensource.com/relaxng/design.html
[28] J.
tion.
C lark,
M.
Technical
M akoto:
report,
OASIS,
RELA X
NG
Specifica
2001.
Available
from
http://www.oasis-open.org/committees/relax-ng/spec.html
[29] J.
rial.
C lark,
M.
Technical
M akoto:
rep o rt,
RELAX
OASIS,
2001.
NG
T uto
A vailable
from
http://www.oasis-open.org/committees/relax-ng/tutorial.html
[30] R. E. C lark (editor): L earning From M edia: A rgum ents, A n alysis and E vidence.
P e rsp ec tiv e s in In stru ction al Technology and D istan ce Learning. Inform ation
Reproduced with permission of the copyright owner. Further reproduction prohibited without permission.
134
[31] R. E. Clark:
ch. 15. In P ersp ectives in In stru ction al Technology and D istan ce L earning [30],
2001 .
[33] P. R. Cohen, D. R. McGee: Tangible M ultim odal Interfaces for Safety-C ritical
A pplications. C om m u n ication s o f the A C M 47(1) (Jan u ary 2004), 41-46.
[34] J. H. Coombs, A. H. R enear, S. J. DeRose: M arkup System s and th e F u tu re of
Scholarly Text Processing. C om m u n ication s o f the A C M 30(11) (1987), 933947.
[35] J. C outaz, L. Nigay, D. Salber: M ultim odality from th e User and System P er
spectives. In Proc. E R C IM (E uropean Research C onsortium for In form atics and
M a th em a tics), workhop on User Interface For All, H eraklion. 1995. A vailable
from citeseer.ist.psu.edu/coutaz95multimodality.html
[36] J. de Carvalho, H. Jiirgensen: D ynam ic M ulti-Purpose M athem atics N otation.
Technical R eport 521, T he U niversity of W estern O ntario, 1998.
[37] M. Dewar:
2-5.
[38] C. Dirckx: A M athem atical Text to B raille T ranslator. 1992. P ro ject D isser
ta tio n , C hurchill College, U niversity of Bradford.
[39] A. Dix, J. Finlay, G. Abowd, R. Beale: H u m an -C om pu ter Interaction. PrenticeHall, 1998.
[40] M. B. Dorf, E. R. Scharrv: In stru ction M anual for B raille Transcribing. Division
for th e Blind and Physically H andicapped, Library of Congress, W ashington,
D. C., 1979.
[41] S. D unne, H. Jiirgensen:
o f W O O D M A N 89:
Reproduced with permission of the copyright owner. Further reproduction prohibited without permission.
135
[42] A. D. Edw ards, R. D. Stevens: Une Interface M ultim odale po u r l'Access aux
Form ules M athem atiques p ar des Eleves ou E tu d ian ts Aveugles. In C om m e les
A u tres: Interfaces M u ltim o d a les p o u r H andicapes Visuels, Special n um ber 1.
T he Ben
T U G b o u t 16(2)
(1995), 174-214.
[52] D. Harel: S tatech arts: A V isual Form alism for Com plex System s. Science o f
C o m p u te r P ro g ram m in g 8(3) (1987), 231 -274.
A CM
T ransactions o f Softw are E ngineering and M eth o d o lo g y 5(4) (1996), 293 -333.
[54] F. C. Heeman:
Reproduced with permission of the copyright owner. Further reproduction prohibited without permission.
136
In P roceedin gs o f the
Reproduced with permission of the copyright owner. Further reproduction prohibited without permission.
137
[68] X. Li: XML and th e C om m unication of M athem atical O bjects. M aster's thesis.
T he U niversity of W estern O ntario, London, C anada, 1999.
[69] J. C. M artin:
1986.
[77] D. A. N orm an, S. W . D rap er (editors): User C entered S y ste m Design. Lawrence
E rlbaum Associates, Publishers, 1986.
[78] J. Paakki:
Reproduced with permission of the copyright owner. Further reproduction prohibited without permission.
138
T U G b o a t 13 (1992),
372-379.
[85] T . V. R am an:
[86] T. V. R am an:
P hD thesis, Cornell
(1995), 311-314.
[88] T. V. R am an: Em acspeak: A Speech-E nabling Interface. Dr. D o b b s Journal
(Septem ber 1997).
[89] D. R. Raym ond, F. W . Tom pa, D. W ood:
M arkup Reconsidered.
In F irst
Reproduced with permission of the copyright owner. Further reproduction prohibited without permission.
E lectronic
139
[93] W . R udin: R eal and C om plex Analysis. M cGraw-Hill, New York, New York,
th ird ed., 1987.
[94] G. Salomon: Television is "easy" and p rin t is v tough5: T he differential invest
m ent of m ental effort in learning as a fucntion of perceptions and attrib u tio n s.
Journal o f E du cation al P sych ology 76(4) (1984), 233 -243.
Differences in
[100] J.-P. Trem blay, P. G. Sorenson: The T h eory and P ractice o f C om piler W riting.
M cGraw-Hill, 1989.
[101] J. van B enthem , A. te r Meulen:
Reproduced with permission of the copyright owner. Further reproduction prohibited without permission.
140
[107] D. Wood: T h eory o f C o m p u ta tio n . John W iley k Sons, first ed., 1987.
[108] F. J. W right: Interactive M athem atics via th e Web using M athM L. S IG S A M
B u lletin 34(2) (June 2000), 49-57.
Reproduced with permission of the copyright owner. Further reproduction prohibited without permission.
141
VITA
Name:
Place of birth: Brazil
Education:
Awards:
Related Work Experience:
Lecturer
Department of Computer Science
University of Pittsburgh
Pittsburgh, PA, USA
2002-present
Lecturer
School of Computer Science
University of Windsor
Windsor, Ontario, Canada
1999-2002
Graduate Research Assistant/Lecturer
Department of Computer Science
The University of Western Ontario
London, Ontario, Canada
1999
Related Work Experience (cont.):
Teaching Assistant
Faculty of Information and Media Studies
The University of Western Ontario
London, Ontario, Canada
1998
Lecturer
Department of Computer Science
The University of Western Ontario
London, Ontario, Canada
1997
Teaching Assistant
Department of Computer Science
The University of Western Ontario
London, Ontario, Canada
1996-1998
Coordinator of the Scientific Computing Center (NCC)
Department of Computer Science
Universidade Federal do Rio Grande do Norte
Natal, RN, Brazil
1991-1995
Lecturer
Department of Computer Science
Universidade Federal do Rio Grande do Norte
Natal, RN, Brazil
1989-1995
Graduate Assistant
Department of Electrical Engineering
University of Maine at Orono
Orono, Maine, USA
1985
Electrical Engineer
Technological Nucleus at Center of Technology
Universidade Federal do Rio Grande do Norte
Natal, RN, Brazil
1986-1989
Presentations:
Technical Reports: