Mathematics As A Game of Types

(Thesis Format: Monograph)
by
Jackson W. Marques de Carvalho
Graduate Program in Computer Science
A thesis submitted in partial fulfillment of the requirements for the degree of
Doctor of Philosophy
Faculty of Graduate Studies
The University of Western Ontario
London, Ontario, Canada
© Jackson W. Marques de Carvalho 2005
Reproduced with permission of the copyright owner. Further reproduction prohibited without permission.
Library and Archives Canada
Published Heritage Branch
395 Wellington Street
Ottawa ON K1A 0N4
Canada
ISBN: 0-494-12080-0
NOTICE:
The author has granted a non-exclusive license allowing Library and Archives Canada to reproduce, publish, archive, preserve, conserve, communicate to the public by telecommunication or on the Internet, loan, distribute and sell theses worldwide, for commercial or non-commercial purposes, in microform, paper, electronic and/or any other formats.
The author retains copyright ownership and moral rights in this thesis. Neither the thesis nor substantial extracts from it may be printed or otherwise reproduced without the author's permission.
In compliance with the Canadian Privacy Act some supporting forms may have been removed from this thesis.
While these forms may be included in the document page count, their removal does not represent any loss of content from the thesis.
THE UNIVERSITY OF WESTERN ONTARIO

FACULTY OF GRADUATE STUDIES
CERTIFICATE OF EXAMINATION
Supervisor
Examiners
Dr. Helmut Jürgensen
Dr. Stephen Watt
Supervisory Committee
Dr. Kamran Sedig
Dr. David Spencer
Dr. Gerhard Weber
The thesis by
Jackson Carvalho
entitled:
Mathematics as a Game of Types

is accepted in partial fulfillment of the
requirements for the degree of
Doctor of Philosophy
Richard Kane
Chair of the Thesis Examination Board
Date: April 8, 2005
Abstract
This thesis presents a grammar-based approach to the specification of mathematical notation. The method introduced is based on a meta-structure that uses attributed context-free grammars to capture the meaning of mathematical concepts. This structure supports the creation of multi-purpose documents and allows mathematical notation to be specified dynamically. In the context of this thesis, multi-purpose documents are documents that may be rendered or used in different ways, some of which might not be known at the time the document is created. By dynamical it is understood that the meaning associated with syntax is allowed to be modified.
The proposal described in this thesis is based on an authoring model that treats user needs as a fundamental requirement. This model is structured around a scope mechanism that allows the mapping between semantics and syntax to be modified at any time during authoring, which supports the dynamic meaning-to-syntax binding needed when authoring mathematical concepts. The multi-purpose property is supported by a semantics-based markup that allows mathematical concepts to be processed according to the specific requirements of applications. Modular grammar fragments, characterized by a one-to-one mapping between mathematical concept and grammar representation, provide the support needed to define the various scopes. An incremental update process is defined to modify the affected grammar fragments in response to the changes made during authoring.
Keywords: mathematics, types, user-oriented, interfaces, metasystem, grammars, rendering, notation, authoring, multimodal
Acknowledgments
I would like to thank my supervisor, Dr. Helmut Jürgensen, who believed in me, for proposing the problem and for his guidance and mentorship. I would also like to thank Maia Hoeberechts for reading the previous version of this thesis and for her suggestions.
I am grateful to my parents, Jose and Janete, for making me understand the importance of education and work. I wish to thank my children Carolina, Marcello and Luiza for always reminding me that life can be fun even during difficult times. My special thanks to my wife Rozane for her support, love and dedication to our children.
This work has been partially supported by the Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq), by the Universidade Federal do Rio Grande do Norte (UFRN) and by Dr. Helmut Jürgensen.
Table of Contents
Certificate of Examination
Abstract
Acknowledgements
1 Introduction
1.1 The Problem: Capturing Semantics by Means of User-Defined Syntax
1.2 Related Work
1.2.1 Data Model and Data Representation
1.2.2 SGML and XML
1.2.3 XML and RELAX NG
1.2.4 ASTER
1.2.5 OpenMath
1.2.6 MathML
1.2.7 Some Limitations of Both OpenMath and MathML
1.2.8 Compositionality
1.3 Motivation
1.4 A Solution: Dynamical Document Structure
1.5 Approach Taken
1.6 Thesis Overview
2 Basic Notions and Notation
2.1 Basic Definitions
3 A Framework for Interactive Systems
3.1 Basic Notions
3.1.1 Electronic and Paper Documents
3.1.2 Communication, Media and Modalities
3.2 User Interface Basic Components
3.3 An Existing Model
3.3.1 A Structuring Problem
3.4 A Different Structure for Interactive Systems
3.4.1 A New Framework
3.5 Example
3.6 Summary
4 Authoring Environments
4.1 Introduction
4.2 Interaction Objects and Authoring Environments
4.3 Cognitive Distances
4.4 Rendering Information
4.5 Encoding Mathematical Concepts
4.6 Environment Modifications
4.7 Changes in the Interface
4.8 Recommendations
4.9 Summary
5 Mathematical Constructs and their Representation
5.1 Notational Systems as Languages
5.2 Standard Mathematical Notation Characteristics
5.3 Capturing the Semantics of Mathematical Concepts
5.3.1 Mathematics and Document Authoring
5.3.2 CFGs and Data Types
5.3.3 CFG Limitation to Support Authoring Mathematics
5.3.4 Updating CFGs
5.3.4.1 Identical Syntax and Rule Semantics
5.3.4.2 Redundancy, Syntax Equivalence and Normal Forms
5.4 Representing Polynomials
5.5 Representing Subscripts and Superscripts
5.5.1 Overloading Subscripts
5.5.2 Overloading Superscripted Symbols
5.6 Representing Matrices
5.7 Representing Sets of Numbers
5.8 Representing Sums
5.9 Conclusion
6 Modelling Context Dependent Information
6.1 Authoring Mathematics and Multimodality
6.2 A Formal Structure for Document Authoring
6.2.1 Grammars and Dynamic Document Authoring
6.3 Structuring with Grammars
6.3.1 Mathematical Concepts and Grammatical Dependencies
6.4 Grammar Operations and Extensibility
6.5 Structuring with Domains and Directories
6.5.1 Domains, Directories and Symbol Overloading
6.6 Languages as Control Structures
6.6.1 Directory Composition Example
6.6.2 The Control Mechanism
6.7 The Role of Compilers
6.8 Meta-Structure
6.9 Conclusion
7 Examples
7.1 Example 1: Overloading the + and * symbols
7.2 Example 2: Symbols as operators and operands
7.3 Example 3: More meanings for the + symbol
8 The Processing Structure
8.1 Dynamic Authoring and Language Fragments
8.2 Processing Grammar Fragments
8.3 Dynamic Authoring and Document Processors
8.3.1 Example
9 Concluding Remarks
9.1 Discussion
9.2 Authoring with Grammar Fragments
9.3 Future Work
Vita
List of Tables
4.1 Pen/paper authoring environment.
4.2 LaTeX-based authoring environment.
4.3 Document authoring environment characteristics and software design approaches to help achieving them.
5.1 CFG rules for addition of integers 0 and 1.
5.2 Grammar for addition of integers 1 and 2.
5.3 Grammar for concatenation of characters a and b.
5.4 Derivation of word 1 + 2.
5.5 Derivation of word a + b.
5.6 Grammar for operations on integers and characters.
5.7 Derivation of word a + 2.
5.8 CFG fragment for expressing words from G.
5.9 CFG fragment for expressing addition, ellipsis and addition operations.
5.10 CFG fragment for expressing equality operation.
5.11 CFG fragment for subscripts and superscripts.
5.12 CFG representation of the positive and negative parts of a function.
5.13 CFG fragment for matrices.
5.14 CFG fragment for intervals.
5.15 CFG fragment to capture the semantics of intervals.
5.16 Grammar for summation.
5.17 Grammar for summation.
6.1 Components involved in dynamic authoring for multimodality.
6.2 CFG for equality of strings of characters.
6.3 CFG for representation of schemes.
6.4 Grammar fragments illustrating grammar dependencies.
6.5 Basic grammar for addition.
6.6 Operatorless grammar linking expr and term nonterminals.
6.7 term and num nonterminals.
6.8 Primitive grammar setting nonterminal num to terminal NUMBER.
6.9 Derived grammar for addition.
6.10 Resulting grammar for expressions involving addition.
6.11 Basic grammar for multiplication.
6.12 term and factor nonterminals.
6.13 factor and num nonterminals.
6.14 Derived grammar for multiplication.
6.15 Resulting grammar for expressions involving addition and multiplication.
6.16 Grammar to support the use of both the composition and extension operators.
6.17 CFG for the binding control mechanism.
6.18 Production rules for the meta-grammar.
6.19 Attributed grammar to support the capturing of simple summations.
7.1 Default grammars.
7.2 Grammar fragments created by editing.
7.3 Grammars in domain directory G that have been created by grammar operations.
7.4 Grammars in domain directory that have been created by grammar operations.
7.5 Grammars in domain directory G created by editing.
7.6 Grammars in domain directory G that have been created by grammar operations.
7.7 Grammar in domain directory G created by editing.
7.8 Grammar in domain directory G created by grammar operations.
7.9 Grammars in domain directory G created by editing.
7.10 Grammars in domain directory G created by grammar operations.
7.11 Grammars in domain directory G created by editing.
7.12 Grammars in domain directory G created by editing.
7.13 Grammars in domain directory G created by grammar operations.
7.14 Grammars in domain directory G created by grammar operations.
List of Figures
3.1 Gregory Abowd's framework for interactive systems.
3.2 The Proposed Framework for Interactive Systems.
4.1 Framework for document authoring environments.
5.1 Many-to-many relationship between mathematical concepts and their representation.
6.1 Structure to support dynamic authoring and multimodality processing.
6.2 A sketch of the dynamics of the authoring/rendering process.
Chapter 1
Introduction
Rather than require that users change, system designers could adapt their systems to key aspects of the users work practice [33] ...
Reading and writing mathematics are activities that involve distinct characteristics of the notation used. Reading requires a stable meaning-to-syntax mapping, in which concepts can always be identified by an expected syntax. Writing mathematics, on the other hand, demands the possibility of introducing meaning-to-syntax mappings that, according to the author of the document, best identify the information to be communicated. In this thesis, the facts that readers benefit from a standard notation and that writers require the flexibility to define new meaning-to-syntax mappings are viewed as characteristics in tension.
Approaching the specification of mathematical notation for electronic documents by providing a standard will, of course, benefit readers. It also implies that users of computerized systems that support the standard will be forced to adapt to the details of that specific notation in order to manipulate the concepts it represents. One may argue that learning a notation provided by a system need not be a major concern, since adequate human-computer interfaces may be provided to support this activity. This is true when the underlying mathematical notation is stable and fixed, that is, when the relation between syntax and semantics does not change and no new concepts may be added to the set covered by the notation. Notations that are both stable and fixed could certainly be enforced for users of computer algebra systems, for instance. It is also intuitive that adequate Graphical User Interfaces (GUIs) help minimize the effort required to use a system that initially supports only text-based interfaces for the manipulation of mathematical notation. An example of this is MathType [70], which uses a GUI to help the user produce correct TeX syntax.
As new concepts are introduced, encodings are needed to support their manipulation. Consequently, mathematical notation evolves by extending the relationship between concepts and their syntactic representations. From an author's point of view, the relationship between mathematical concepts and their representation may be expressed in two ways: authors may choose an already existing syntax, or they may provide a new syntactic encoding for the concept.
Regardless of whether new or existing notation is used, and whether the user interface is a GUI or of any other type, computerized systems that support mathematical notation need to be based on an authoring model. The constraints and facilities the author experiences during the complete process of producing mathematical notation for electronic documents are the fundamental characteristics of such models.
Although it is reasonable to enforce a specific mathematical notation for readers, it does not make sense to restrict the authoring process to a standard notational vocabulary and writing style. This does not mean that a standard notation is unnecessary. It merely supports the intuitive notion that authors should have the freedom to modify the set of mappings between symbols and meanings provided by a standard. The modifications required during authoring may result either from the author's need to communicate concepts not supported by the standard or from a need to redefine some elements of the set of mappings. Another characteristic of this process is that authors do not usually state their notational conventions in one specific part of the document. Instead, they introduce notation wherever they feel it is necessary.
A standard notation for the representation of mathematical concepts is therefore necessary for the communication of information among computer systems. Examples of such notations are those proposed by OpenMath [23] and MathML [14]. However, such standards are not suited to supporting the flexibility required during the creation of documents containing mathematics, because user requirements regarding the notation are determined during the authoring activity itself. For this scenario a dynamic notation is needed.
In order to handle unforeseen meaning-to-syntax relationships, a notational system must be organized around the possibility of describing the construction
of the rules instead of providing the rules themselves. This allows authors to create the notation that, in their judgment, best fits the purpose of the document. Such a system is a meta-system, and its instances are notational systems.
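The idea of describing how rules are built, rather than fixing the rules, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `Rule` and `binary_operator_rule` are hypothetical, and a rule is modelled as a production paired with a Python function standing in for its meaning. The meta-structure actually used in this thesis is based on attributed context-free grammars.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Rule:
    """A production lhs -> rhs paired with the meaning attached to it."""
    lhs: str
    rhs: tuple[str, ...]
    meaning: Callable[[object, object], object]


def binary_operator_rule(symbol: str,
                         meaning: Callable[[object, object], object]) -> Rule:
    """Hypothetical meta-level description: how to build the rule for ANY
    infix operator, instead of providing one fixed rule in advance."""
    return Rule(lhs="expr", rhs=("expr", symbol, "expr"), meaning=meaning)


# Two (one-rule) notational systems obtained as instances of the same
# description: the syntax "+" is shared, but the attached meaning differs.
arithmetic = [binary_operator_rule("+", lambda a, b: a + b)]
logic = [binary_operator_rule("+", lambda a, b: a or b)]

print(arithmetic[0].rhs, arithmetic[0].meaning(2, 3))  # ('expr', '+', 'expr') 5
print(logic[0].rhs, logic[0].meaning(False, True))      # ('expr', '+', 'expr') True
```

Adding a new operator to either system means calling the constructor again, not editing a fixed rule set.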
Central to the design of any computer-based application are the users' characteristics and the contexts in which the application will be used. The need for multiple modes of communication and multimedia has been acknowledged by [73, 80, 15], among many others, as a promising approach to improving computer access for visually impaired users. In particular, the development of multimedia documents supported by user interfaces that can be configured to adapt to users with print disabilities has been addressed by [80]. The importance of multimodality and multimedia in supporting the computer-based communication of mathematics has been emphasized by [42].
This research was originally motivated by the goal of making documents accessible to blind people. Fundamental requirements associated with these users' limitations therefore had to be considered. These concerns included the following two possibilities¹:
• to allow input and output to be performed through the various senses of the human perceptual system, and
• to optimize the use of each modality in order to adapt to the users' cognitive abilities².
The above-mentioned characteristics required the document representation to be independent of the modality/media used for communication.
¹These requirements, as well as other characteristics related to the design of multimodal user interfaces, are presented in [91].
²The communication of digital logic diagrams to visually impaired users, for instance, may be improved when a tactile display is used in combination with speech [59].
1.1 The Problem: Capturing Semantics by Means of User-Defined Syntax
I am concerned with the design of computer-based interactive systems for processing both the capturing and the rendering of mathematical concepts. In this thesis I focus on the capturing part of the problem. In order to approach it, I consider the following issues:
1. The notation used for encoding mathematical concepts is not fixed. It may be modified at the document author's discretion; that is, the author is free to attach any syntax to any given concept.
2. The meaning of mathematical concepts can be captured by means of a text-based document structure.
3. The structure of any document involving only mathematics is the only provider of meaning for the concepts it contains.
4. The user interface used for computer-assisted document authoring is independent of the structure of the document. It is viewed as a component that communicates with the document structure.
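Issue 1 amounts to letting the author rebind syntax to meaning at any point in a document. The following is a minimal sketch of such a binding, assuming nested scopes in which an inner binding overrides an outer one. The class `NotationScope` and its methods are hypothetical names chosen for illustration, and meanings are represented here as plain strings.

```python
from __future__ import annotations


class NotationScope:
    """Hypothetical meaning-to-syntax table with nested scopes."""

    def __init__(self, parent: NotationScope | None = None) -> None:
        self.parent = parent
        self.bindings: dict[str, str] = {}

    def bind(self, symbol: str, meaning: str) -> None:
        # The author may attach any meaning to any symbol in this scope.
        self.bindings[symbol] = meaning

    def lookup(self, symbol: str) -> str:
        # Search the innermost scope first, then the enclosing ones.
        scope: NotationScope | None = self
        while scope is not None:
            if symbol in scope.bindings:
                return scope.bindings[symbol]
            scope = scope.parent
        raise KeyError(f"no meaning bound to {symbol!r}")

    def child(self) -> NotationScope:
        # Open a nested scope, e.g. for a new section of the document.
        return NotationScope(parent=self)


document = NotationScope()
document.bind("+", "integer addition")
section = document.child()
section.bind("+", "set union")   # redefinition local to the section
print(section.lookup("+"))       # set union
print(document.lookup("+"))      # integer addition
```

The outer binding is untouched by the redefinition, so text outside the section keeps reading `+` as integer addition.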
1.2 Related Work
This section discusses some related efforts that have treated the problem of representing the semantics of mathematical concepts. Because of the importance of processing electronic documents that contain mathematics, a new interdisciplinary field, Mathematical Knowledge Management (MKM), has emerged [13, 12, 45]. This field deals with the intersection between mathematics and computer science and aims to develop better ways to articulate, organize, disseminate and provide access to mathematical knowledge. ASTER [86], OpenMath [23] and MathML [14] are important research projects in this field. Before these three approaches are discussed, a brief introduction to the notions of data model and data representation is given, because I believe they are fundamental concepts for the definition of document specification structures. The strategy proposed by SGML [57] for structuring documents is also introduced. The end of this section addresses the principle of compositionality of meaning [98, 58, 101].
1.2.1 Data Model and Data Representation
According to [43], the concept of a data model in a database relates to the idea of hiding data storage details by means of data abstractions. The structure provided by the data abstractions usually includes support for data type definitions, data relationships and constraints that the data should satisfy. Apart from providing a data structure for representing information, a data model includes operations on that data structure [90]. These operations are the means by which data are accessed, retrieved and updated.
In addition to a set of operators, an efficient implementation of data update requires both the identification and the control of redundant data. It also involves the notions of equivalence, functional dependencies and normal forms. An example of a data model that addresses these issues is the relational data model [32].
A data model is basically a data encoding together with a set of operators that manipulate the data, whereas a data representation does not include the operators. A discussion of the differences between data models and data representations is provided by [90]. The importance of update in data models lies in its relation to equivalence: as emphasized by [90], an efficient use of update should involve some mechanism to control redundancy, and controlling redundancy requires a notion of equivalence.
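The distinction can be made concrete with a toy example. The sketch below assumes terms are `(operator, left, right)` triples of strings. The stored set alone is the data representation; the class adds the operators that make it a data model. Its update operator `insert` controls redundancy through an assumed equivalence (commutativity of `+`), decided by reducing each term to a normal form. The name `TermStore` and this particular equivalence are illustrative choices, not taken from the cited works.

```python
class TermStore:
    """Toy data model: an encoding (a set of normalized terms) plus operators."""

    def __init__(self) -> None:
        self._terms: set[tuple[str, str, str]] = set()  # the data representation

    @staticmethod
    def normal_form(term: tuple[str, str, str]) -> tuple[str, str, str]:
        # Assumed equivalence: "+" is commutative, so "a + b" and "b + a" are
        # equivalent; sorting the operands yields one canonical representative.
        op, left, right = term
        if op == "+":
            left, right = sorted((left, right))
        return (op, left, right)

    def insert(self, term: tuple[str, str, str]) -> bool:
        """Update operator; returns False if an equivalent term is stored."""
        nf = self.normal_form(term)
        if nf in self._terms:
            return False  # redundant: equivalent to an existing term
        self._terms.add(nf)
        return True

    def contains(self, term: tuple[str, str, str]) -> bool:
        """Retrieval operator, also applied modulo the equivalence."""
        return self.normal_form(term) in self._terms

    def __len__(self) -> int:
        return len(self._terms)


store = TermStore()
print(store.insert(("+", "a", "b")))  # True
print(store.insert(("+", "b", "a")))  # False: equivalent to a + b
print(store.insert(("-", "b", "a")))  # True: "-" is not treated as commutative
print(len(store))                     # 2
```

Without the normal form, the second insertion would silently store a redundant copy of the same sum.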
1.2.2 SGML and XML
The Standard Generalized Markup Language (SGML) [57] is a document representation language that standardizes the application of generic coding and generalized markup concepts. One of its important characteristics is that it allows documents to be treated in a way similar to databases [90, 89]. As a meta-language, SGML defines a standard process for specifying the syntax of descriptive markup languages. This characteristic is based on the notion of document representation schemes, called Document Type Definitions (DTDs) in SGML terminology.

It is by means of DTDs that SGML provides the necessary constructs to support the representation of the logical structure of documents. Three fundamental concepts are involved in this activity: entities, elements and attributes.
As stated in the International Standard ISO 8879 [57], an SGML entity is defined as a collection of characters that can be referenced as a unit. An entity has no structural properties. Its application is restricted to letting an identifier stand in for a string of characters.
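The following Python sketch illustrates the purely textual nature of entities. Each reference `&name;` is replaced by the string declared for it, and no structure is attached. For brevity, the declarations are given as a Python dictionary; in a DTD they take the form `<!ENTITY name "text">`. The name syntax accepted by the regular expression is a simplification of the one defined in ISO 8879.

```python
import re

# Simplified entity-name syntax (an assumption, not the full ISO 8879 rule).
ENTITY_REF = re.compile(r"&([A-Za-z][A-Za-z0-9.-]*);")


def expand_entities(text, entities, depth=0):
    """Replace each &name; by the string the entity stands for.

    Nested references are expanded as well; a depth limit guards
    against cyclic declarations.
    """
    if depth > 16:
        raise RecursionError("entity references nest too deeply")

    def replace(match):
        name = match.group(1)
        if name not in entities:
            raise KeyError(f"undeclared entity: {name}")
        return expand_entities(entities[name], entities, depth + 1)

    return ENTITY_REF.sub(replace, text)


entities = {
    "uwo": "The University of Western Ontario",
    "addr": "&uwo;, London, Ontario",
}
print(expand_entities("Submitted to &addr;.", entities))
# Submitted to The University of Western Ontario, London, Ontario.
```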
Structured documents are composed of a collection of components. These components are characterized by their context, scope and type. The relationship a component has with other components is its context. The boundaries determining the beginning and end of a component define its scope. Document components may contain other components or just data. Consequently, the type of a given component is determined either by the data or by the composition of the types of the components which contribute to its definition. In SGML these components are represented by elements. An SGML element may contain attributes, whose purpose is to describe some properties of the element.
SGML provides no operations for updating DTDs. It relies on editing to accomplish any modification of any of its derived languages. Therefore it represents descriptions of static data. This characteristic is considered a limitation when SGML is applied to the representation of dynamic data sets. Entities and the attribute pair ID/IDREF may be used as a way of eliminating redundant data. However, they cannot be applied to control it, since both are controlled by the author of the document [90]. Also, as pointed out by [90], there is no system support to indicate whether the use of ID/IDREF attributes refers to redundant information.
According to [68], the Extensible Markup Language (XML) [20] is a simplified subset of SGML that has capabilities for supporting its use over the Internet. Related to this fact is a relevant distinction between XML and SGML. As pointed out by [ , 62], XML does not require a DTD to be delivered with its associated document. Instead, it requires documents to be well-formed. This characteristic relates to the proper nesting of the start and end tags used for markup.
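The following Python sketch covers the nesting part of well-formedness: start and end tags are matched with a stack. It is an illustration only. It ignores comments, CDATA sections, processing instructions and attribute values containing `>`, and it does not check the other well-formedness constraints, such as a single root element. In practice, `xml.etree.ElementTree.fromstring` from the Python standard library raises `ParseError` for input that is not well-formed.

```python
import re

# Start tag, end tag or empty-element tag; attribute values are not parsed.
TAG = re.compile(r"<(/?)([A-Za-z_][\w.-]*)[^>]*?(/?)>")


def is_well_nested(markup):
    """True if every start tag is closed by a matching end tag in LIFO order."""
    stack = []
    for closing, name, empty in TAG.findall(markup):
        if empty:              # <email/> opens and closes at once
            continue
        if not closing:
            stack.append(name)
        elif not stack or stack.pop() != name:
            return False       # end tag without a matching open start tag
    return not stack           # unclosed start tags make the input ill-formed


print(is_well_nested("<card><name>Ann</name><email/></card>"))   # True
print(is_well_nested("<card><name>Ann</card></name>"))           # False
```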

Validity constraints on the content of instances that are not expressible through XML DTDs are not effectively verified [17]. This is because the content of XML leaf nodes is usually either plain text or empty, which means that rigorous type checking is not supported. For instance, checking whether the information provided is a date, a telephone number or a ZIP code is not supported.
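In a DTD, a leaf such as a ZIP code can only be declared as `(#PCDATA)`, so any character string is accepted. The following Python sketch shows the kind of leaf-level type check that is missing. The type names and patterns are illustrative assumptions: a US ZIP or ZIP+4 code, a North American telephone number in one fixed layout, and an ISO 8601 calendar date. The date check relies on `datetime.date.fromisoformat` (Python 3.7+). As a result, an impossible date such as 2005-02-30 is rejected even though it has the right shape.

```python
import re
from datetime import date

# Hypothetical leaf types; the patterns are illustrative, not normative.
LEAF_TYPES = {
    "zip": re.compile(r"\d{5}(-\d{4})?"),           # 12345 or 12345-6789
    "phone": re.compile(r"\(\d{3}\) \d{3}-\d{4}"),  # (519) 555-0100
}


def check_leaf(kind, text):
    """Type-check the character content of a leaf element."""
    if kind == "date":
        try:
            date.fromisoformat(text)   # YYYY-MM-DD, calendar-validated
            return True
        except ValueError:
            return False
    return LEAF_TYPES[kind].fullmatch(text) is not None


print(check_leaf("zip", "90210"), check_leaf("date", "2005-02-30"))  # True False
```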
1.2.3 XML and RELAX NG
XML Schema provides an alternative to DTDs. It allows much more rigorous control and supports data types. In this thesis, however, the RELAX NG [27, 28] schema language is considered, because it has been adopted by OpenMath [23] as the major formalism for its encoding.
According to [90], RELAX NG is a data model, since it includes support for both data encoding and operations on the data. Most operations proposed by RELAX NG are based on the operations used by DTDs to express data constraints. Some of these are, for example, choice, optional and zeroOrMore, which correspond to the DTD operators |, ? and *, respectively.
Among the data operations RELAX NG proposes, the replace definition mechanism is not supported by XML DTDs. Its implementation involves the ref, include and define operations. No specific operator is provided for this operation; its semantics is given by an example [29]. The semantics of this operation is similar to the context-free grammar extension operation [36] that I proposed in 1998. The following example illustrates this operation:
<grammar xmlns="http://relaxng.org/ns/structure/1.0">
  <start>
    <element name="addressBook">
      <zeroOrMore>
        <element name="card">
          <ref name="cardContent"/>
        </element>
      </zeroOrMore>
    </element>
  </start>
  <define name="cardContent">
    <element name="name">
      <text/>
    </element>
    <element name="email">
      <text/>
    </element>
  </define>
</grammar>
Assume the above syntax is available as the file addressBook.rng. A define element with the same name (cardContent) and containing the new definition is then placed inside an include element. The syntax that follows replaces the contents of the card element.
<grammar xmlns="http://relaxng.org/ns/structure/1.0">
  <include href="addressBook.rng">
    <define name="cardContent">
      <element name="name">
        <text/>
      </element>
      <element name="emailAddress">
        <text/>
      </element>
    </define>
  </include>
</grammar>
As a result, the grammar previously defined in the file addressBook.rng has the contents of its card element replaced by the information provided through the include element.
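The effect of the replacement can be simulated in Python with the standard `xml.etree.ElementTree` module. The sketch collects the `define` elements of the included grammar, then lets each `define` placed inside `include` replace the one with the same name. It is not a RELAX NG processor. It ignores `start`, the `combine` attribute, nested includes and validation, and the `files` dictionary stands in for reading `addressBook.rng` from disk. It does mirror the RELAX NG rule that a `define` inside `include` must name a definition that exists in the included grammar.

```python
import xml.etree.ElementTree as ET

RNG = "{http://relaxng.org/ns/structure/1.0}"

BASE = """<grammar xmlns="http://relaxng.org/ns/structure/1.0">
  <start><element name="addressBook"><zeroOrMore>
    <element name="card"><ref name="cardContent"/></element>
  </zeroOrMore></element></start>
  <define name="cardContent">
    <element name="name"><text/></element>
    <element name="email"><text/></element>
  </define>
</grammar>"""

OVERRIDE = """<grammar xmlns="http://relaxng.org/ns/structure/1.0">
  <include href="addressBook.rng">
    <define name="cardContent">
      <element name="name"><text/></element>
      <element name="emailAddress"><text/></element>
    </define>
  </include>
</grammar>"""


def defines(grammar):
    """Map each top-level define name to its element."""
    return {d.get("name"): d for d in grammar.findall(RNG + "define")}


def resolve_include(including, files):
    """Effective definitions after applying the defines inside <include>."""
    inc = including.find(RNG + "include")
    effective = defines(ET.fromstring(files[inc.get("href")]))
    for d in inc.findall(RNG + "define"):
        if d.get("name") not in effective:
            raise ValueError("a replacing define must name an existing define")
        effective[d.get("name")] = d      # replace, not extend
    return effective


def child_names(define):
    return [e.get("name") for e in define.findall(RNG + "element")]


files = {"addressBook.rng": BASE}
effective = resolve_include(ET.fromstring(OVERRIDE), files)
print(child_names(effective["cardContent"]))  # ['name', 'emailAddress']
```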
1.2.4 ASTER
Audio System For Technical Readings (ASTER) [86] is an audio previewer for electronic documents written in the TeX family of markup languages. ASTER's processing environment maps the logical structure of the TeX-based document into its internal representation, a tree data structure. Therefore, browsing through a mathematical expression corresponds to visiting nodes of the tree. A representation of the document in audio format is obtained by applying a set of commands written in a language called AFL, which stands for Audio Formatting Language. One facility this language provides is variable substitution: an AFL rule may replace a portion of an expression by a label. This allows the user to obtain an overview of the expression before being exposed to all its details.
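The idea of variable substitution can be sketched in Python on an expression tree. Any proper subtree larger than a chosen size is replaced by a label. The result is an overview, plus a table from labels to the hidden subexpressions. This is neither AFL syntax nor ASTER's algorithm. The `Node` class, the size threshold and the prefix-style rendering are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Node:
    label: str
    children: tuple = ()


def size(node):
    return 1 + sum(size(c) for c in node.children)


def render(node):
    """Linearize the tree in prefix form, e.g. divide(plus(a, b), A)."""
    if not node.children:
        return node.label
    return f"{node.label}({', '.join(render(c) for c in node.children)})"


def substitute(root, threshold, labels):
    """Replace each proper subtree with more than `threshold` nodes by a label.

    Returns the overview tree and a dict mapping each label to the subtree
    it hides; `labels` is an iterator of fresh label names.
    """
    hidden = {}

    def walk(node, is_root):
        if not is_root and node.children and size(node) > threshold:
            label = next(labels)
            hidden[label] = node
            return Node(label)
        return Node(node.label, tuple(walk(c, False) for c in node.children))

    return walk(root, True), hidden


a, b, c, d, e = (Node(x) for x in "abcde")
expr = Node("divide", (Node("plus", (a, b)), Node("plus", (c, Node("times", (d, e))))))
overview, hidden = substitute(expr, 3, iter("ABC"))
print(render(overview))       # divide(plus(a, b), A)
print(render(hidden["A"]))    # plus(c, times(d, e))
```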
1.2.5 OpenMath
OpenMath is intended to become a major standard supporting the exchange of mathematical information. It concentrates on the dissemination of scientific knowledge through electronic means and on the distributed processing of mathematical information [23]. By specifying the semantic content of mathematical data, OpenMath aims to provide interoperability among the diverse systems capable of processing mathematical information [23].
The main focus of OpenMath is on the unambiguous communication of mathematical concepts [108]. This is achieved by representing mathematical concepts as OpenMath objects. These objects incorporate both the semantic and the structural information of a mathematical concept. Attributes may be attached to OpenMath objects to provide additional information not related to the semantics of the object. Examples are typesetting details or the URI of a given CD (Content Dictionary).
OpenMath objects are classified as basic, compound and derived. Informally, an OpenMath object is viewed as a tree [23]. Basic objects are the leaf nodes of the tree, and compound objects make up its non-leaf nodes. This choice of organization determines the LISP style OpenMath uses for encoding its compound objects, which means that OpenMath builds expressions using prefix operators. OpenMath basic objects are integers, symbols, variables, floating-point numbers, character strings and bytearrays. Derived objects are non-OpenMath objects that are imported by means of the attribution construct. Compound objects are created by the application, binding, attribution and error constructs.
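A short Python sketch can illustrate the tree view and the prefix (LISP-style) organization. It uses the standard `xml.etree.ElementTree` module to encode a nested tuple such as `("plus", "x", 1)` as OpenMath XML. Integers become OMI elements and strings become OMV variables. A tuple becomes an OMA whose first child is the OMS of the operator. The `SYMBOL_CD` table is a hypothetical lookup of the CD for each operator. The OpenMath namespace and the cdbase attribute are omitted for brevity.

```python
import xml.etree.ElementTree as ET

# Hypothetical lookup: which Content Dictionary defines each operator.
SYMBOL_CD = {"plus": "arith1", "times": "arith1", "sin": "transc1"}


def encode(term):
    """Encode a prefix term as an OpenMath element.

    Basic objects: int -> OMI, str -> OMV (a variable).
    Compound objects: (operator, arg, ...) -> OMA, operator first (prefix form).
    """
    if isinstance(term, int):
        node = ET.Element("OMI")
        node.text = str(term)
        return node
    if isinstance(term, str):
        return ET.Element("OMV", name=term)
    head, *args = term
    node = ET.Element("OMA")
    ET.SubElement(node, "OMS", cd=SYMBOL_CD[head], name=head)
    for arg in args:
        node.append(encode(arg))
    return node


def to_openmath(term):
    obj = ET.Element("OMOBJ")
    obj.append(encode(term))
    return ET.tostring(obj, encoding="unicode")


print(to_openmath(("plus", "x", 1)))
print(to_openmath(("sin", ("plus", "x", 1))))
```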
The fact that OpenMath aims at the communication of mathematics among computing systems is reflected in the way its objects are encoded: both a binary and an XML encoding are defined for its objects. The standard states that the XML encoding is readable and writable by humans. However, [37, 108] claim that neither encoding is meant to be read by humans or to be created by editing procedures in which humans directly supply all the necessary syntax. Of the two standard encodings available, the XML encoding is used to define the meaning of the objects to be transmitted.
Application and binding are OpenMath constructors. An application constructs an OpenMath object from a sequence of one or more OpenMath objects. The following XML encoding illustrates the use of the application object to capture the semantics of the expression $x + 1$ [23].
<OMOBJ>
  <OMA>
    <OMS cdbase="http://www.openmath.org/cd"
         cd="arith1" name="plus"/>
    <OMV name="x"/>
    <OMI>1</OMI>
  </OMA>
</OMOBJ>
A binding is composed of three objects. The binder comes first. It is followed by an optional set of arguments, which are the variables to be bound, and then by a body. The following example is taken from the arith1 CD [23]. It captures the meaning of the mathematical expression $\sum_{x=1}^{10} \frac{1}{x}$ by means of the binding object.
<OMOBJ>
  <OMA>
    <OMS cd="arith1" name="sum"/>
    <OMA>
      <OMS cd="interval1" name="integer_interval"/>
      <OMI> 1 </OMI>
      <OMI> 10 </OMI>
    </OMA>
    <OMBIND>
      <OMS cd="fns1" name="lambda"/>
      <OMBVAR>
        <OMV name="x"/>
      </OMBVAR>
      <OMA>
        <OMS cd="arith1" name="divide"/>
        <OMI> 1 </OMI>
        <OMV name="x"/>
      </OMA>
    </OMBIND>
  </OMA>
</OMOBJ>
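The following Python sketch evaluates the example above, which makes the roles of the application and binding objects concrete. The binding object (the fns1 lambda) evaluates to a function of its bound variable. The application of the arith1 sum then applies that function to every integer of the interval. This toy evaluator is an illustration only. It is not part of OpenMath or of this thesis, and it understands just the handful of symbols that occur in the example.

```python
import xml.etree.ElementTree as ET
from fractions import Fraction

# The arith1 "sum" example, with the layout whitespace removed.
SUM_EXAMPLE = (
    '<OMOBJ><OMA><OMS cd="arith1" name="sum"/>'
    '<OMA><OMS cd="interval1" name="integer_interval"/><OMI> 1 </OMI><OMI> 10 </OMI></OMA>'
    '<OMBIND><OMS cd="fns1" name="lambda"/><OMBVAR><OMV name="x"/></OMBVAR>'
    '<OMA><OMS cd="arith1" name="divide"/><OMI> 1 </OMI><OMV name="x"/></OMA>'
    '</OMBIND></OMA></OMOBJ>'
)


def symbol(node):
    return node.get("cd"), node.get("name")


def evaluate(node, env):
    """Evaluate a tiny OpenMath subset: OMI, OMV, OMA and a fns1 lambda OMBIND."""
    kids = list(node)
    if node.tag == "OMOBJ":
        return evaluate(kids[0], env)
    if node.tag == "OMI":
        return int(node.text.strip())
    if node.tag == "OMV":
        return env[node.get("name")]
    if node.tag == "OMBIND":
        binder, bvars, body = kids
        if symbol(binder) != ("fns1", "lambda"):
            raise NotImplementedError(symbol(binder))
        names = [v.get("name") for v in bvars]
        # The binding yields a function; its body sees the bound variables.
        return lambda *vals: evaluate(body, {**env, **dict(zip(names, vals))})
    if node.tag == "OMA":
        head, args = symbol(kids[0]), [evaluate(a, env) for a in kids[1:]]
        if head == ("arith1", "sum"):
            (low, high), f = args
            return sum(f(k) for k in range(low, high + 1))
        if head == ("interval1", "integer_interval"):
            return tuple(args)
        if head == ("arith1", "divide"):
            return Fraction(args[0], args[1])
    raise NotImplementedError(node.tag)


print(evaluate(ET.fromstring(SUM_EXAMPLE), {}))  # 7381/2520
```

Exact rational arithmetic (`fractions.Fraction`) is used so that the result, $\sum_{x=1}^{10} 1/x = 7381/2520$, is not obscured by floating-point rounding.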
An attribution decorates an object with a sequence of one or more pairs. Each pair is composed of an OpenMath symbol (the attribute) and an associated object (the value of the attribute). According to [23], attribution may be used either as an adornment or as a semantic annotation, depending on the role associated with the attribute. The standard states that when the attribute has the role semantic-attribution, the attributed object is modified by the attribution. For this reason, attribution is also considered a constructor. This characteristic is referred to as an important feature, yet the attribution examples included in the standard only involve adornment annotations.
The following code illustrates the use of the attribution object together with the foreign element. Non-OpenMath data (a Presentation MathML rendering of sin(x)) are associated with an OpenMath object.
<OMATTR>
  <OMATP>
    <OMS cd="presentation" name="mathml"/>
    <OMFOREIGN>
      <math xmlns="http://www.w3.org/1998/Math/MathML">
        <mi> sin </mi><mfenced><mi> x </mi></mfenced>
      </math>
    </OMFOREIGN>
  </OMATP>
  <OMA>
    <OMS cdbase="http://www.openmath.org/cd"
         cd="transc1" name="sin"/>
    <OMV name="x"/>
  </OMA>
</OMATTR>
The error object is not considered here because it has no direct mathematical meaning. Its use is to report problems related to the communication of OpenMath objects.
The OpenMath structure used for grouping OpenMath objects is the Content Dictionary, or CD for short. The definition of a CD usually includes other CDs. An exception is the META-CD, which contains the definition of the structure of a CD. CDs may be grouped as a mechanism to define collections, or CD groups. Both CDs and CD groups are XML documents.
The data provided by a CD may be structured according to the type of information addressed. Information included in a CD either belongs to the whole CD or is about the mathematical concepts represented there.

An OpenMath symbol, represented by the element OMS, is the mechanism the standard uses to refer to symbols from a Content Dictionary. The element OMS uses its three attributes, cd, name and cdbase, to determine where the semantics of a name is defined. A characteristic called the role of the symbol restricts the locations at which a symbol may appear in an OpenMath object.
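The resolution of a symbol's defining location can be sketched in Python. We read the OpenMath 2.0 standard as follows: a cdbase attribute is inherited by the descendants of the element that carries it, and the symbol is identified by the URI cdbase/cd#name. Both details, and the default base used below, are assumptions to check against [23]. The example shows the same cd and name resolving to two different URIs, which is how cdbase disambiguates definitions. The base http://example.org/mycds is hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumed default cdbase when none is in scope.
DEFAULT_CDBASE = "http://www.openmath.org/cd"


def symbol_uris(node, cdbase=DEFAULT_CDBASE):
    """Yield (name, URI) for each OMS at or below `node`, in document order.

    A cdbase attribute applies to the element carrying it and to its
    descendants unless overridden; the URI is built as cdbase/cd#name.
    """
    cdbase = node.get("cdbase", cdbase)
    if node.tag == "OMS":
        yield node.get("name"), f"{cdbase.rstrip('/')}/{node.get('cd')}#{node.get('name')}"
    for child in node:
        yield from symbol_uris(child, cdbase)


# x plus (y + 1), where the inner "plus" comes from the standard CD base.
doc = ET.fromstring(
    '<OMOBJ cdbase="http://example.org/mycds">'
    '<OMA><OMS cd="arith1" name="plus"/><OMV name="x"/>'
    '<OMA cdbase="http://www.openmath.org/cd"><OMS cd="arith1" name="plus"/>'
    '<OMV name="y"/><OMI>1</OMI></OMA></OMA></OMOBJ>'
)
for name, uri in symbol_uris(doc):
    print(name, uri)
```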
Information related to the definition of an OpenMath symbol is organized as mandatory and optional data. The name and the description of the symbol are mandatory. Optional information includes examples, formal mathematical properties (FMPs), commented mathematical properties (CMPs) and the role.
The optional status of FMPs indicates that there is no consistent way of expressing the semantics of mathematical concepts. For example, the arith1 CD presents the definition of the sum object by means of a text description followed by an example. Even when formal properties are provided, it is difficult to determine the set of properties that best characterizes a concept.
Although the role is one of the fields of information that define an OpenMath symbol, its definition is provided as a CD element. The description provided by the standard does not make clear why a characteristic of a symbol is defined in a CD.
OpenMath extensibility is based on the notion of CDs. For each mathematical concept not supported by the standard, a CD must be provided with the definition of the concept, structured according to the OpenMath objects. The latest version of the standard [23] relies on RELAX NG's mechanisms to support the automatic generation of CDs. Nevertheless, the definition of the OpenMath objects included in a CD depends on the same editing tools used for the manipulation of text. For this reason, CDs are static descriptions of data. OpenMath resolves ambiguous definitions by means of the cdbase attribute of OMS.
1.2.6 MathML
The Mathematical Markup Language, or MathML [14], is a World Wide Web Consortium (W3C) recommendation for describing mathematical notation. MathML is an XML application which focuses on providing mathematics on the World Wide Web. MathML approaches the markup of mathematical concepts by means of two sets of elements and attributes. This property lets MathML encode the layout as well as the semantics of mathematical expressions. Presentation MathML and Content MathML are the two languages provided to support this characteristic.
Presentation MathML controls the display of mathematics in much the same way that TeX approaches the typesetting of mathematical text. Content MathML is meant to supply more meaning to the description of mathematical concepts. One restriction of this form of markup is the limited range of mathematical concepts it covers. Content MathML has been designed to support the encoding of mathematical concepts used from kindergarten to the end of high school and in the first two years of college. Like OpenMath, MathML is a system-oriented approach. This property has been emphasized by [79]:
while MathML is human-readable, it is anticipated that, in all but the simplest cases, authors will use equation editors, conversion programs, and other specialized software tools to generate MathML.
Content MathML consists of about 120 elements accepting about a dozen attributes. Concepts not covered by these elements may be represented by referring to external definitions. The MathML csymbol element, or content symbol, is provided to address this limitation. It is the constructor MathML offers for referring to a symbol whose meaning is not provided by MathML's core content elements. The csymbol element determines the characteristics of the external element through its two attributes, definitionURL and encoding. The definitionURL attribute specifies the Uniform Resource Identifier (URI) that provides the definition of the new symbol. The encoding attribute determines the syntax of the target referred to by the definitionURL attribute. The content of a csymbol is either PCDATA or a presentation construct. The following example illustrates this form of extension:
<csymbol definitionURL="www.example.com/ContDiffFuncs.htm"
         encoding="text">
  <msup>
    <mi> C </mi>
    <mn> 2 </mn>
  </msup>
</csymbol>
The above definition encodes a symbol that semantically represents the space of twice continuously differentiable functions and whose syntax is encoded as $C^2$.
1.2.7 Some Limitations of Both OpenMath and MathML
1. Mathematical expressions in both OpenMath and MathML are built using prefix operators. Therefore the order of entry is counter-intuitive [96]: the mental model imposed by both approaches requires the user to input notation from the innermost nested expression outward, instead of from left to right.
2. Although both standards support multimodality of output, they have not been designed to support multimodality of input, because their structure involves complex syntax.
3. Both standards are system-oriented. Consequently, their constructs are not easily readable and writable by humans.
1.2.8 Compositionality
Regardless of the notation used to express the meaning of a mathematical concept, one property which needs to be considered is the semantic structure of the concept. Semantic structure denotes the parts which comprise the concept, their ordering and grouping, and the relations among these parts. One challenge introduced by this characteristic is to ensure the correctness of a chosen semantic structure for the representation of a mathematical concept.
The principle of compositionality of meaning has been proposed by [98] as a requirement to be considered in the design of knowledge representation languages. This concept is covered in detail in a chapter titled Compositionality [58] in the Handbook of Logic and Language [101]. The key idea of the compositionality principle is that the meaning of a sentence can be composed from the meanings of its parts. In a more precise form, this principle is stated as
The meaning of a compound expression is a function of the meaning of its parts and the syntactic rule by which they are combined [58].
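The principle can be illustrated with a small Python evaluator in which every syntactic rule is paired with one semantic function. The meaning of a compound expression is then obtained only from the meanings of its parts and the rule that combined them. The rule names and the arithmetic semantics are hypothetical. The point is the shape of the `meaning` function, which never inspects the internal syntax of a part.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple


@dataclass(frozen=True)
class Expr:
    rule: str            # name of the syntactic rule that built this node
    parts: Tuple = ()    # sub-expressions (empty for leaves)
    token: str = ""      # literal text of a leaf


# Hypothetical semantics: exactly one function per syntactic rule.
SEMANTICS: Dict[str, Callable] = {
    "num": lambda token: int(token),
    "sum": lambda left, right: left + right,
    "product": lambda left, right: left * right,
    "neg": lambda operand: -operand,
}


def meaning(e: Expr):
    """meaning(rule(p1, ..., pn)) = SEMANTICS[rule](meaning(p1), ..., meaning(pn))."""
    if not e.parts:
        return SEMANTICS[e.rule](e.token)
    return SEMANTICS[e.rule](*(meaning(p) for p in e.parts))


two, three, four = (Expr("num", token=t) for t in "234")
print(meaning(Expr("product", (two, Expr("sum", (three, four))))))  # 14
```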
A language is considered compositional if it satisfies the compositionality principle. This involves deciding what the basic semantic and syntactic components are and how they are combined [58]. Therefore, a design that is not compositional indicates that its parts and/or the syntactic rules which bind them have not been selected properly. Achieving compositionality of meaning might seem to be an impossible task. However, [58] claims that
... compositionality becomes possible if semantic considerations influence the design of the syntactic rules.
The above indicates that one can always find a syntax that allows the intended meaning to be assigned in a compositional form. This property is supported by Theorem 9.4 in [58], which claims that any possible meaning can be assigned to any possible language in a compositional form. For languages characterized by a fixed (static) syntax, compositionality of meaning is a design decision, since it can be achieved by the choice of a suitable grammar. Theorem 9.3 in [58] supports this characteristic. It proves that if a language can be generated by any algorithm, then it can also be generated by a compositional grammar. According to [98], OpenMath is compositional and MathML is not.
1.3 Motivation
The work of this thesis was originally motivated by the need for a TeX-to-Braille translation system [10, 11]. As characterized in [10, 11, 44], both TeX and standard Braille representations emphasize the syntactic structure of the concepts involved. For this reason, a semantics-preserving translation from TeX input to Braille was not achieved. The recommendations provided by [10, 11] regarding this translation included the necessity of a semantics-based markup. This has, of course, been noted by many others in the field [ , 14, 23].
The automatic translation of TeX input into standard Braille output was approached by Arrabito [10]. The translation was reported to be impossible for two reasons. Some frequently used mathematical constructs are semantically ambiguous in the TeX definition, and the Braille standard lacks meta-rules to cope with the macro expansion characteristic of TeX.
The experience reported by ASTER and by Arrabito's experiment provided some valuable insight into the rendering of mathematics. Both approaches were based on input provided from TeX files. They therefore both had to deal with all the consequences a text formatter could impose when used as a source for the representation of mathematical semantics. ASTER assumes that all its source input consists of well-written LaTeX documents.³ This implies that any macro definition, including the ones provided by the author, must reflect the logical structure of the concepts involved in the definition. Another interpretation of this requirement is that a restriction is necessary in order to limit the excess of power TeX provides to the user.

LaTeX is characterized by a procedural markup⁴ approach, obtained by means of macro calls. This can be viewed as both an advantage and a constraint. Macro definitions provide the ability to support the natural instability of conventional mathematical notation. On the other hand, the same macro definitions pose a major difficulty to the processing environment with respect to their use. If they are expanded, the semantic content they provide is lost. If they are not defined properly, they may not carry the needed semantics.
As text formatters, systems based on TeX were designed around the necessity of having a structured data representation. The main motivation for this approach is that a standardization of representation paves the way for its interchange. Because the way information is represented is preserved, data no longer have to be re-processed whenever a new system is introduced or the current system is upgraded.
³ ASTER's structure is based on the assumption that distinct mathematical concepts that share the same syntactic encoding must be described by distinct macro definitions.
⁴ Procedural markup consists of commands that determine how text should be formatted [34].
Document structures based on standardization of representation favor document portability. However, they are not adequate for rendering documents in ways that require different human senses for the understanding of information. This has been observed by both Raman [86] and Arrabito [10], while working on mapping LaTeX into speech and TeX into Braille, respectively. The research reported in this thesis was motivated by the need for a document structure that would allow mathematical concepts to be communicated regardless of the media used or the human senses involved. The section that follows outlines a semantics-based solution to the specification of mathematical concepts.
1.4 A Solution: Dynamical Document Structure
I propose that the meaning of mathematical concepts can be captured in a user-oriented⁵ way by means of an appropriate grammar formalism which satisfies the following criteria. The grammar formalism must
1. model the dynamics of authoring mathematics,
2. describe the structure of the rules by which syntax is created,
3. provide operations on the rules that define syntax, and
4. support the definition of syntax by the application of the operations on these rules.
In my thesis I introduce a text-based document structure (Document Description Model) which satisfies the above four criteria. It is therefore capable of capturing the semantics of mathematical concepts in a user-oriented way. The proposed model has the following characteristics:
1. it supports both the extensibility and ambiguity characteristics of conventional mathematical notation, and
2. it allows the author of a document to introduce his/her own syntax for the encoding of mathematical concepts.
⁵ In the context of this work, user-oriented refers to a design approach focused on the needs of the end user.
I claim the following:
1. The meaning of mathematical concepts can be captured by attributed context-free grammars.
2. Extensibility can be supported by operations on the attributed grammars.
3. Ambiguity generated by symbol overloading can be resolved by a scope mechanism in which the meaning of concepts is uniquely defined.
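The third claim can be pictured with a stack of scopes, as in the block structure of programming languages. Within a scope each symbol has exactly one meaning, and an overloaded symbol is resolved by the innermost scope that binds it. The Python sketch below is a hypothetical illustration of that idea. It is not the Binding Control mechanism developed in this thesis. The class name and the example meanings of the prime symbol are assumptions.

```python
class ScopeStack:
    """Hypothetical scope mechanism for overloaded symbols.

    Each scope binds symbols to meanings; lookup returns the binding from
    the innermost enclosing scope, so meaning is unique at every point.
    """

    def __init__(self):
        self._scopes = [{}]

    def enter(self):
        self._scopes.append({})

    def leave(self):
        if len(self._scopes) == 1:
            raise RuntimeError("cannot leave the outermost scope")
        self._scopes.pop()

    def bind(self, symbol, meaning):
        scope = self._scopes[-1]
        if symbol in scope and scope[symbol] != meaning:
            # Within one scope a symbol may have only one meaning.
            raise ValueError(f"{symbol!r} already means {scope[symbol]!r} here")
        scope[symbol] = meaning

    def lookup(self, symbol):
        for scope in reversed(self._scopes):
            if symbol in scope:
                return scope[symbol]
        raise KeyError(symbol)


s = ScopeStack()
s.bind("'", "derivative")      # e.g. f' in a calculus section
s.enter()
s.bind("'", "transpose")       # e.g. A' inside a linear-algebra section
print(s.lookup("'"))           # transpose
s.leave()
print(s.lookup("'"))           # derivative
```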
1.5 Approach Taken
This thesis presents a grammar-based approach to the specification of mathematical notation. The method introduced is based on a meta-structure that uses attributed context-free grammars for capturing the meaning of mathematical concepts. This structure supports the creation of multi-purpose documents and allows mathematical notation to be specified dynamically [36]. In the context of this thesis, the term multi-purpose documents refers to documents that may be rendered or used in different ways, some of which might not be known at the time the document is created. By dynamical it is understood that the meaning associated with syntax is allowed to be modified.
The proposal described in this thesis is based on an authoring model which addresses the user's needs as a fundamental requirement. This characteristic is structured around a scope mechanism that allows the mapping between meaning and syntax to be modified at any time during the creation of the document. This process supports the dynamic characteristics of the meaning-to-syntax binding necessary during the authoring of mathematical concepts. The multi-purpose property is supported by a semantics-based capturing mechanism [49, 11, 21]. This mechanism allows the represented concepts to be processed according to the specific requirements of applications. Modular grammar fragments, characterized by a one-to-one mapping between mathematical concept and grammar representation, provide adequate support for the definition of the various scopes. An incremental update process is defined as a way to modify the necessary grammar fragments to support the changes proposed during the authoring process.
1.6 Thesis Overview
Document authors should be free to communicate mathematical concepts by means of the syntax that, in their view, best represents the concepts involved. For this, adequate document structures need to be available. This thesis addresses the problem by introducing a systematic approach that allows an author to capture the meaning of each mathematical concept according to the syntax he/she feels best describes it.
The approach presented here is based on a meta-structure which has been designed with the support of attributed context-free grammars. Chapter 2 introduces the reader to the notation and the fundamental definitions. Some of the definitions provided may be found in books covering the theory of computation. However, they have been included to make the thesis notationally self-contained.
A framework for interactive systems is proposed in Chapter 3. The proposed framework is based on the model developed by Abowd [4] and introduces an additional translation in order to support the consultation of the system's state by the user. The framework is refined in Chapter 4 by the decomposition of its core component into two subcomponents, the Operating System and the Document Structure. That chapter also uses this organization to support the claim that document authoring is an interactive activity that requires an environment for its fulfillment. The notion of Authoring Environment is defined as a pair (Document Structure, User Interface), which separates user interface components from the structure of the document. Chapter 4 also provides the basic concepts needed to define requirements for the evaluation of authoring environments. For this purpose, a set of properties is provided.
Chapter 5 introduces the possibility of capturing the meaning of mathematical concepts by means of context-free grammar fragments. This possibility illustrates that these grammars can be used for the capturing activity. However, they do not provide the necessary support for either the extensibility or the ambiguity characteristics of conventional mathematical notation. The major limitation of this approach is that context-free grammars only support static descriptions of semantics. This restriction is addressed in Chapter 6, where the dynamics of document authoring is considered. The approach developed in that chapter proposes the document structure component for computer-based authoring environments. This structure is composed of two components: a sequence of sets of grammars called the Semantic Structure, and a grammar called the Binding Control mechanism. The semantic structure is based on attributed context-free grammars, and it addresses extensibility by combining grammar definitions. Two grammar operations are defined for this purpose. These operations assume the grammars involved have been defined according to the restrictions specified by a normal form proposed in that chapter. The ambiguity characteristic is approached by a context switch which allows one semantic structure to be replaced by another. Chapter 7 provides a set of examples, which are used to illustrate the characteristics of the approach introduced in Chapter 6. Chapter 8 proposes a structure for processing the document organization presented in Chapter 6. The language processing model it introduces is defined as a deterministic finite automaton. The states of this automaton are characterized as sets of grammars, and its transitions by the meaning-to-syntax bindings established during authoring. Chapter 9 contains a discussion of the approach proposed by this thesis, conclusions and suggestions for future work.
Chapter 2
Basic Notions and Notation
This chapter presents the notation to be used throughout this thesis and includes the necessary basic definitions. Where a complete specification is not necessary, grammars may be specified by listing their production rules. All grammars in this thesis are displayed in table form. The grammar's name always appears in the far left column, and each row of the table contains a production rule written with spaces as symbol delimiters. Both nonterminal and terminal symbols are represented by strings of characters, possibly linked by the underscore character. Lower case strings of letters are used to represent nonterminals, and upper case letters and other characters are used to represent terminal symbols¹. The symbol | is sometimes used to group together rules associated with the same nonterminal. The nonterminal on the left of the production rule in the first row is the start symbol. The arrow is replaced by a colon in all grammars except the one for the meta-structure. For attributed grammars, one additional column is included at the right edge of the table to represent the attributes associated with the rules. Strings of arbitrary characters are used to represent attributes.
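As an illustration of the table convention, the following minimal Python sketch reads one grammar row of the form lhs : rhs1 | rhs2 and classifies its symbols. It is not part of the thesis: the helper names parse_rule_row and is_nonterminal are hypothetical, and the sketch assumes that neither the colon nor the bar ever appears as a terminal symbol, which the convention itself does not exclude.

```python
import re

# Nonterminals: lower-case letters, possibly linked by underscores.
# Every other symbol is treated as a terminal, following the convention above.
NONTERMINAL = re.compile(r"[a-z]+(?:_[a-z]+)*")


def is_nonterminal(symbol: str) -> bool:
    return NONTERMINAL.fullmatch(symbol) is not None


def parse_rule_row(row: str) -> list[tuple[str, tuple[str, ...]]]:
    """Split 'lhs : rhs1 | rhs2' into one (lhs, rhs) pair per alternative.

    Symbols are delimited by spaces; an empty alternative yields the empty
    right-hand side. Assumes ':' and '|' never occur as terminal symbols.
    """
    lhs_text, colon, rhs_text = row.partition(":")
    lhs = lhs_text.strip()
    if not colon or not is_nonterminal(lhs):
        raise ValueError(f"not a rule row: {row!r}")
    return [(lhs, tuple(alt.split())) for alt in rhs_text.split("|")]


# Hypothetical row: sum : sum PLUS NUM | NUM
for lhs, rhs in parse_rule_row("sum : sum PLUS NUM | NUM"):
    kinds = ["nonterminal" if is_nonterminal(x) else "terminal" for x in rhs]
    print(lhs, rhs, kinds)
```

Splitting on whitespace mirrors the use of spaces as symbol delimiters, so a multi-character terminal such as PLUS stays a single symbol.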
2.1 Basic Definitions
The main definitions are included here in order to establish the notation that is used throughout the thesis. For further information, see [55] as a standard reference.
¹ The choice of representation for both terminals and nonterminals is consistent with the approach used by compiler tools such as lex and yacc.
An alphabet is a finite non-empty set. Elements of an alphabet are called symbols. Let $X$ be an alphabet. Then $X^*$ is the set of all words over $X$, including the empty word $\varepsilon$.
Definition 1 A context-free grammar (CFG) is denoted $G = (N, T, P, S)$, where $N$ is an alphabet of nonterminal symbols, $T$ is an alphabet of terminal symbols such that $N \cap T = \emptyset$, $P$ is a finite set of (production) rules of the form $A \to w$ with $A \in N$ and $w \in (N \cup T)^*$, and $S \in N$ is the start symbol.
Let $G = (N, T, P, S)$ be a context-free grammar, let $V = N \cup T$, and let $u, v \in V^*$. The word $v$ is derived from $u$ in one step if there is a rule $A \to w \in P$ and there are words $u_1, u_2 \in V^*$ such that $u = u_1 A u_2$ and $v = u_1 w u_2$. The fact that $v$ is derived from $u$ in one step is denoted by $u \Rightarrow v$. We write $u \Rightarrow^* v$ to denote the fact that there is a non-negative integer $n$ and there are words $u_0, u_1, \ldots, u_n \in V^*$ such that $u = u_0$, $v = u_n$, and $u_{i-1} \Rightarrow u_i$ for $i = 1, \ldots, n$. In this case we say that $v$ is derived from $u$, the integer $n$ is the number of derivation steps, and the sequence $u_0, u_1, \ldots, u_n$ is a derivation of $v$ from $u$. The set

$$L(G) = \{\, u \mid u \in T^* \text{ and } S \Rightarrow^* u \,\}$$

is the language generated by $G$.
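The definitions of one-step derivation, derivation and $L(G)$ can be exercised on a toy grammar. The sketch below assumes a grammar of our own choosing, $s \to A\,s\,B \mid \varepsilon$, written with the lower-case/upper-case convention of this chapter; the function names and the length bound max_form_len are illustrative assumptions, and the bounded search is adequate for this grammar only, not a general membership test for context-free languages.

```python
from collections import deque

# Hypothetical toy grammar (not from the thesis): s -> A s B | (empty),
# which generates { A^n B^n : n >= 0 }. Words are tuples of symbols.
RULES = [("s", ("A", "s", "B")), ("s", ())]
START = "s"


def one_step(u):
    """Yield every v with u => v: for u = u1 A u2 and a rule A -> w, v = u1 w u2."""
    for i, symbol in enumerate(u):
        for lhs, rhs in RULES:
            if symbol == lhs:
                yield u[:i] + rhs + u[i + 1:]


def derive(word, max_form_len):
    """Return a derivation (u0, ..., un) with u0 = START and un = word, or None.

    Breadth-first search over sentential forms, discarding forms longer than
    max_form_len; the bound keeps the search finite for the toy grammar.
    """
    target = tuple(word)
    start = (START,)
    parent = {start: None}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        if u == target:
            path = []
            while u is not None:  # walk back to the start symbol
                path.append(u)
                u = parent[u]
            return path[::-1]
        for v in one_step(u):
            if len(v) <= max_form_len and v not in parent:
                parent[v] = u
                queue.append(v)
    return None
```

For example, derive(("A", "A", "B", "B"), 5) returns the derivation s, A s B, A A s B B, A A B B, so n = 3 derivation steps and the word belongs to L(G); derive(("A", "B", "B"), 4) returns None.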
Definition 2 Let $G = (N, T, P, S)$ be a context-free grammar. For all rules $A \to w \in P$, $A \in N$, $w \in (N \cup T)^*$, $A$ is called the left (hand) side, or lhs, of the rule, and $w$ is the right (hand) side, or rhs, of the rule. For $p = A \to w$, $\mathrm{lhs}_p = A$ and $\mathrm{rhs}_p = w$. The set of nonterminal symbols of $p$ is $N_p = L_p \cup R_p$, where $L_p = \{\mathrm{lhs}_p\}$ and $R_p = \{\, M \mid M \in N \text{ and } \mathrm{rhs}_p = w_1 M w_2,\ w_1, w_2 \in V^* \,\}$. The set of terminal symbols of $p$ is $O_p = \{\, x \mid x \in T \text{ and } \mathrm{rhs}_p = w_1 x w_2,\ w_1, w_2 \in V^* \,\}$.
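Read operationally, $R_p$ and $O_p$ collect the nonterminals and the terminals that occur somewhere in $\mathrm{rhs}_p$; the factorization $w_1 M w_2$ only expresses "occurs". A minimal Python sketch, with a hypothetical rule and function names of our own:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """A production p = lhs -> rhs, with rhs a tuple of symbols."""
    lhs: str
    rhs: tuple


def nonterminal_symbols(p: Rule, N: set) -> set:
    """N_p = L_p | R_p, with L_p = {lhs_p} and R_p the nonterminals occurring in rhs_p."""
    L_p = {p.lhs}
    R_p = {M for M in p.rhs if M in N}
    return L_p | R_p


def terminal_symbols(p: Rule, T: set) -> set:
    """O_p: the terminals occurring in rhs_p."""
    return {x for x in p.rhs if x in T}


# Hypothetical rule in the table convention: sum : sum PLUS term
N = {"sum", "term"}
T = {"PLUS", "NUM"}
p = Rule("sum", ("sum", "PLUS", "term"))
```

Here nonterminal_symbols(p, N) is {"sum", "term"} and terminal_symbols(p, T) is {"PLUS"}.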
Definition 3 An attributed grammar is a sextuple $G = (N, T, P, S, A, \alpha)$ with the following properties:
The quadruple $(N, T, P, S)$ is a context-free grammar, the underlying grammar.
$A$ is a language over some finite alphabet, the attribute language.
$\alpha$ is a mapping of $P$ into $A$, the attribute assignment.
Any word in $A$ is called an attribute. For a rule $p \in P$, the word $\alpha(p)$ is the attribute of $p$.
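Since the attribute language $A$ may be infinite, a program can represent it by a membership test. The sketch below is an illustrative assumption on our part, not the thesis's implementation: it stores the sextuple, recovers the underlying grammar, and checks that $\alpha$ is defined on every rule and maps into $A$. The rules and the attribute words (add, copy_value) are hypothetical.

```python
import re
from dataclasses import dataclass
from typing import Callable


@dataclass
class AttributedGrammar:
    """G = (N, T, P, S, A, alpha); A is given by a membership test."""
    N: frozenset
    T: frozenset
    P: tuple                      # rules as (lhs, rhs) pairs, rhs a tuple
    S: str
    in_A: Callable[[str], bool]   # decides whether a word belongs to A
    alpha: dict                   # attribute assignment: rule -> word of A

    def underlying_grammar(self):
        """The quadruple (N, T, P, S)."""
        return (self.N, self.T, self.P, self.S)

    def attribute(self, p):
        """alpha(p), the attribute of rule p."""
        return self.alpha[p]

    def is_well_formed(self) -> bool:
        V = self.N | self.T
        return (
            not (self.N & self.T)
            and self.S in self.N
            and all(lhs in self.N and set(rhs) <= V for lhs, rhs in self.P)
            and set(self.alpha) == set(self.P)                  # alpha is total on P
            and all(self.in_A(a) for a in self.alpha.values())  # alpha maps into A
        )


# Hypothetical attribute language: non-empty words over lower-case letters and '_'.
def in_A(word: str) -> bool:
    return re.fullmatch(r"[a-z_]+", word) is not None


r1 = ("sum", ("sum", "PLUS", "NUM"))
r2 = ("sum", ("NUM",))
g = AttributedGrammar(
    N=frozenset({"sum"}), T=frozenset({"PLUS", "NUM"}),
    P=(r1, r2), S="sum", in_A=in_A,
    alpha={r1: "add", r2: "copy_value"},
)
```

In the table form described earlier, alpha corresponds to the extra right-hand column holding one attribute string per rule.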
Definition 4 A deterministic finite automaton (DFA) $M$ is a quintuple $(Q, \Sigma, \delta, s, F)$, where
$Q$ is an alphabet of state symbols,
$\Sigma$ is an alphabet of input symbols,
$s \in Q$ is the start state,
$F \subseteq Q$ is the set of accepting states, and
$\delta : Q \times \Sigma \to Q$ is the transition function.
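A minimal Python sketch of Definition 4, assuming the transition function is stored as a dictionary on $Q \times \Sigma$; the example automaton (words over {0, 1} with an even number of 1s) is a hypothetical illustration, not one used in the thesis.

```python
from dataclasses import dataclass


@dataclass
class DFA:
    """M = (Q, Sigma, delta, s, F) with delta stored as a dict on Q x Sigma."""
    Q: frozenset
    Sigma: frozenset
    delta: dict
    s: str
    F: frozenset

    def is_total(self) -> bool:
        """delta must be a function Q x Sigma -> Q, i.e. defined everywhere."""
        return all(
            (q, a) in self.delta and self.delta[(q, a)] in self.Q
            for q in self.Q
            for a in self.Sigma
        )

    def accepts(self, word) -> bool:
        state = self.s
        for a in word:
            if a not in self.Sigma:
                return False  # the word is not over the input alphabet
            state = self.delta[(state, a)]
        return state in self.F


# Hypothetical example: words over {0, 1} with an even number of 1s.
even_ones = DFA(
    Q=frozenset({"even", "odd"}),
    Sigma=frozenset({"0", "1"}),
    delta={("even", "0"): "even", ("even", "1"): "odd",
           ("odd", "0"): "odd", ("odd", "1"): "even"},
    s="even",
    F=frozenset({"even"}),
)
```

Determinism is what makes the single variable state sufficient: each pair of current state and input symbol fixes exactly one next state.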
Chapter 3
A Framework for Interactive Systems
In this chapter a framework for interactive systems is proposed. The framework introduced here is based on the model defined by Abowd [4]. It differs from his approach by the introduction of an additional translation which connects the user and the output component of the system's interface.
3.1 Basic Notions
Computer-based systems have been designed to support a wide variety of human activities. Human communication is one field which has been expanding through support from computer technology. In this section some aspects of human-computer communication are discussed.
3.1.1 Electronic and Paper Documents
The static world of paper documents seems to be gradually giving way to the dynamic environment of digital information. In electronic form, documents need to be structured in order to be processed by computing systems. A key element of electronic document processing is the possibility of easily manipulating a document's atomic elements by means of digital devices. This idea introduced the need to view documents not only as printed output generated by a digital machine, but also to store them in a way that makes them easily portable to other computer environments. This means the structure of documents needed to be preserved.
This way of viewing documents suggests that they are composed of a logical structure, a set of abstract components, and contents, where the actual contents of the documents can be found. The logical structuring of documents is based on the decomposition of documents into parts. Each part in the structure has a particular meaning and may, recursively, be subdivided into other parts. In this way the whole document can be represented as a collection of hierarchically related components. An abstract component, such as a given paragraph of a document, may be expressed over one or more two-dimensional page spaces in various ways, depending on specifications of font, hyphenation, line length and other concrete variables. The same logical component may be mapped into different concrete variables and then made available in different media, by means of a tactile display, a Braille printer or audio, for instance. In this thesis the process of translating abstract document components into concrete ones is defined as rendering. The production of hardcopy, images, speech or any other possible presentation structures from concrete document components to output devices is defined as viewing.
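The one-to-many character of rendering can be sketched in a few lines. In the hypothetical Python sketch below, a Paragraph holds only abstract content, and each renderer (the names are ours) maps it to a different concrete form: two printed layouts that differ only in the concrete variable line length, and a sequence of spoken utterances. Viewing, the delivery to actual devices, is not modeled.

```python
import textwrap
from dataclasses import dataclass


@dataclass(frozen=True)
class Paragraph:
    """An abstract document component: content without presentation."""
    text: str


def render_print(p: Paragraph, line_length: int) -> list[str]:
    """Concrete printed form: lines no longer than line_length characters."""
    return textwrap.wrap(p.text, width=line_length)


def render_speech(p: Paragraph) -> list[str]:
    """Concrete spoken form: one utterance per sentence."""
    return [s.strip() + "." for s in p.text.split(".") if s.strip()]


# Hypothetical renderers: one abstract component, several concrete renderings.
RENDERERS = {
    "print, 40 columns": lambda p: render_print(p, 40),
    "print, 16 columns": lambda p: render_print(p, 16),
    "speech": render_speech,
}


def render_all(p: Paragraph) -> dict[str, list[str]]:
    return {medium: renderer(p) for medium, renderer in RENDERERS.items()}
```

For Paragraph("Documents have structure. Rendering maps it to media."), the speech rendering yields two utterances, while the two printed renderings break the same words into lines of different lengths.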
According to Levy [67], documents have been created in response to a human need to provide stability in a constantly changing world. He views the fixing of a document's form as a means of fixing its contents, and he defines this property of documents as invariance.
It is intuitive to relate this notion of invariance to paper documents, since they are the result of a process by which the surfaces of paper sheets are marked in a stable way. On the other hand, electronic documents usually require rendering in order to be manipulated by humans. The fact that one given abstract document component may be mapped into different concrete ones indicates the existence of a one-to-many relationship between them. This relationship is an important property of electronic documents because it allows various media to be used to deliver the information provided by the abstract document component. The idea of using different media to communicate is discussed in the subsection which follows.
3.1.2 Communication, Media and Modalities
Information is shared among humans by a communication process. This process may always be described in terms of three fundamental components: a sender, a receiver, and a communication channel or medium. Information carriers, such as computer input and output devices, and physical carriers, such as sound waves and photon distributions, are media. A medium is therefore the physical channel used for information encoding. Sensory modality is a human mechanism of perception in which vision, hearing, touch, smell, taste and balance are used to process incoming information. Representation modality is the way information is encoded in some medium.
Communication through a given set of modalities is only possible when it is supported by adequate information carriers. The following scenario illustrates this relation. Consider, for instance, the directions given by one person to another to find a place in a city. The necessary directions may be given by voice in combination with gestures. In this case the sensory modalities used are hearing and vision. The sound waves and photon distributions are the information carriers. Both the spoken language and the set of gestures are representation modalities.

Sensory modalities are physical characteristics of the human body; therefore their number is fixed. On the other hand, the number of information carriers varies. In the scenario above, the information carriers were chosen to characterize a face-to-face, or human-to-human, communication activity. In this thesis, this form of information exchange is characterized by the absence of computer-based systems and by the fact that both sender and receiver are humans sharing place and time. Humans also exchange information with the aid of computer-based systems. This form of information exchange is referred to here as computer-assisted communication. The concept of computer-assisted communication is used in this thesis in a broad sense: it includes both human-computer interaction and computer-mediated human-to-human interaction. Also, in the context of this work, interaction refers to the communication between user and system.
Humans usually make use of the available media to communicate ideas and feelings. Although an increase in information carriers does not necessarily improve communication, it is, most of the time, expected that the information to be shared is available to the receivers through all possible modalities. According to Bunt [22], people communicate with each other, most of the time, according to what he calls the Multimax Principle. He defines this principle as follows:
In natural communication, the participants use all the modalities and media that are available in the communicative situation.
The multimax characteristic is present even in situations where one of the parties involved in the communication is not capable of communicating in all the modalities by which information is made available. As an example, consider face-to-face communication between sighted and blind people. If the sighted person communicates with the blind person using voice and gestures, for example, the information provided by the gestures will clearly not be processed by the blind person. Although it is known that the exchange of information with blind people is not improved when gestures are used, sighted people do not usually avoid this representation modality when communicating with blind people.
It is intuitive to think about computer-assisted communication in terms of face-to-face communication, with all available media and modalities in use as characterized by the multimax property. One challenge to this approach is the definition of adequate structures for both software and hardware to support this characteristic. The remainder of this chapter discusses some aspects of the software needed, in terms of a framework to support the interaction between the user and the computer.
3.2 User Interface Basic Components
User interface design for computer applications is an interactive process in which sets of objects are manipulated. These objects can be structured according to the role they play in the interaction: they can be input objects, output objects, or both. They may also be of direct use, when the physical object is manipulated, or of indirect access, when no physical interaction is permitted. The component which connects input and output objects is generally referred to as a system. The user therefore accesses the system by manipulating the interface objects. Systems differ by their intrinsic characteristics. These qualities are viewed as statements of a language which can be used to represent the system; this will be referred to as the core language. Users can be described in terms of psychological and physical characteristics relevant to the communication with the system. Users have goals which may be realized by the system. These goals are structured as activities which the user may carry out by communicating with the system. These properties may also be expressed as language statements, which we call the task language.
The system's state is reported in forms defined by the output objects. The attributes which establish the way the state of the system is rendered characterize the language used by output objects to communicate. In a similar way, user requests are sent to the system by configuring input attributes according to the required behavior defined by the task to be performed. The attributes involved in this type of request represent the features of the language the user has to use to interact with the system.
Figure 3.1: Gregory Abowd's framework for interactive systems.
3.3 An Existing Model
The interaction framework proposed by Abowd [4] describes the communication between user and computer by a model composed of four components and four translations. The components represent the stages the interaction goes through. Each component has its own language, by which its internal characteristics are defined. The translations are used to map knowledge between the components. Figure 3.1 illustrates this framework. In this figure, components are represented as nodes and translations are the arrows linking the nodes. Component names are typeset in upper case letters, and the names of both the languages and the translations are in lower case. The languages are task, input, core and output.
As shown in Figure 3.1, articulation connects the USER to the INPUT. It is therefore used to represent the user's intentions in terms of the structure provided by the system for data entry. Performance is responsible for translating the information collected during the input stage into core data. The state of the system is made available to the output devices by presentation. Observation is the user's ability to perceive the state of the system.
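The structure just described can be written down as a small labelled graph. The Python sketch below is our own encoding (the names TRANSLATIONS, LANGUAGES and translations_from are hypothetical); since every component has exactly one outgoing translation, the path between two components is unique, so going from the USER to the OUTPUT always passes through articulation, performance and presentation.

```python
# Abowd's framework as described above: each translation links two components.
TRANSLATIONS = {
    "articulation": ("USER", "INPUT"),
    "performance": ("INPUT", "SYSTEM"),
    "presentation": ("SYSTEM", "OUTPUT"),
    "observation": ("OUTPUT", "USER"),
}
LANGUAGES = {"USER": "task", "INPUT": "input", "SYSTEM": "core", "OUTPUT": "output"}


def translations_from(src: str, dst: str) -> list[str]:
    """Follow the arrows from src until dst is reached; return the translations used.

    Each component has exactly one outgoing translation, so the path is unique.
    """
    outgoing = {a: (name, b) for name, (a, b) in TRANSLATIONS.items()}
    path, node = [], src
    for _ in range(len(outgoing)):
        name, node = outgoing[node]
        path.append(name)
        if node == dst:
            return path
    raise ValueError(f"{dst!r} is not reachable from {src!r}")
```

translations_from("USER", "USER") returns the complete interaction cycle in the order articulation, performance, presentation, observation.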
3.3.1 A Structuring Problem
It is intuitive to decompose the interaction between user and computer in terms of execution and evaluation semicycles [39]. During this process the user's intentions, represented as statements of the task language, are mapped to input commands which, after execution by the system, are observed and evaluated by the user. If the user's intentions cannot be fulfilled in a single cycle of interaction, other related cycles are introduced. The additional cycles are viewed as refinements of the intended task. The framework proposed in [4] relates articulation and performance to the execution semicycle, and presentation and observation to the evaluation semicycle. As defined by this approach, the interactive cycle begins with the USER by the formulation of a goal and of a task to accomplish the goal. This approach is also based on the assumption that the only way the user can manipulate the machine is through the INPUT. For this reason, the task must be articulated within the input language. Although Abowd's framework assumes that execution and evaluation are not always alternating semicycles, the model does not indicate the procedure to be followed when the user's goals first require knowledge of the system's state as provided by the output devices¹. As illustrated in Figure 3.1, Abowd's framework establishes that the execution semicycle always precedes the evaluation semicycle. Therefore, following the path defined by the arrows connecting the USER and the OUTPUT components, articulation, performance and presentation must be traversed as inactive translations in activities where the input devices are not involved.

¹ A typical scenario for this is a user interfacing with a display-based system which first prompts the user for input.
This thesis accepts the notion that the interaction cycle must start with the user by the formulation of a goal and a task. However, the user is free either to manipulate the system by means of its input devices or to consult the system's state as supplied by the output. The following section proposes an additional translation to Abowd's framework as a way of approaching this nondeterministic behavior.
3.4 A Different Structure for Interactive Systems
This section introduces an additional translation to the framework proposed in [4]. The information made available to the user by the system's output devices is now structured as a process composed of two phases, consultation and observation. By consulting the output provided by the system, the user obtains the requirements needed to continue his/her activity. These requirements are viewed as conditions from the system's perspective and as possible modifications to the task to be performed from the user's point of view. The modifications may be as simple as the addition of an extra interaction cycle or as complex as requiring the complete task to be restructured. As an example, consider the scenario where a bank client tries to withdraw cash from his/her account by means of an automatic teller machine. If the system is in a state which displays an out-of-order message, the client has to modify his/her goal/task pair, because his/her intentions cannot be expressed through the system's interface at that particular instant. Consultation is therefore viewed as a translation which maps the user's expectations to the system's state as supplied by the output devices.
3.4.1 A New Framework
A new framework for interactive systems based on the work developed in [4] is introduced here. The proposed framework differs from the model of [4] by the introduction of an additional translation which supports the user's consultation of the system's state through the output devices.
Interactive cycles are understood as sequences of components connected by translations. The sequences represent the derivation of words of a language defined by all possible tasks which can be realized through the system by means of the interface. The results obtained by the derivation procedure represent the user's tasks that have been completely realized by the available resources. These characteristics are represented by the right-linear grammar
$$G = (N, T, P, B)$$
with
$$N = \{U, I, S, O\}, \qquad T = \{c, a, p, v, o\},$$
$$P = \{\, U \to cO \mid aI \mid \varepsilon,\ \ I \to pS,\ \ S \to vO,\ \ O \to oU \,\}$$
and $B = U$, where $U$, $I$, $S$ and $O$ are short forms for USER, INPUT, SYSTEM and OUTPUT respectively, and $c$, $a$, $p$, $v$ and $o$ represent consultation, articulation, performance, presentation and observation respectively.
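Because every rule of G has the form X → tY or X → ε, a derivation always contains exactly one nonterminal, at its right end, and it can be followed symbol by symbol. The minimal Python sketch below is our own illustration (names such as PRODUCTIONS and in_language are hypothetical); since the word is given in advance, each symbol selects the rule applied at that step.

```python
# The right-linear grammar G, stored as X -> {t: Y} for each rule X -> t Y.
PRODUCTIONS = {
    "U": {"c": "O", "a": "I"},  # U -> cO | aI
    "I": {"p": "S"},            # I -> pS
    "S": {"v": "O"},            # S -> vO
    "O": {"o": "U"},            # O -> oU
}
NULLABLE = {"U"}  # nonterminals with an empty rule: U -> epsilon
START = "U"       # the start symbol B = U


def in_language(word: str) -> bool:
    """True if B =>* word in G, tracking the single rightmost nonterminal."""
    current = START
    for symbol in word:
        nxt = PRODUCTIONS[current].get(symbol)
        if nxt is None:
            return False  # no rule current -> symbol Y exists
        current = nxt
    return current in NULLABLE  # finish with the rule current -> epsilon
```

For instance, in_language("coapvo") holds, while in_language("apv") does not, since that derivation stops at O, which has no empty rule.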
Grammar G is nondeterministic in the sense that, at the nonterminal U, the user may choose between consultation, articulation and ending the interaction. This characteristic relates to the user's possible need to analyze the output in order to decide the next action to be taken. During the analysis process the user may refine or even redefine the mental model he/she has developed. Although regular languages can be graphically represented by standard state transition diagrams, statecharts [52, 53] are used here, because hierarchical structures are better visualized in these diagrams. The dynamics of the proposed model is captured by the statechart in Figure 3.2. The statechart in this figure has depth two, since it structures the states in two layers, or levels of abstraction. The higher level has SYSTEM, INTERFACE and USER as states. The lower level is a refinement of the INTERFACE state and is composed of only two states, OUTPUT and INPUT. As can be seen, lower case letters have been used to typeset the names of both the languages and the translations. Each language has been placed inside the box where the name of its related state is located.
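The two-level hierarchy of the statechart can be captured by recording, for each state, the state that encloses it. The Python sketch below is a hypothetical encoding of our own, not the statechart formalism of [52, 53]: it shows that being in OUTPUT or INPUT implies being in INTERFACE, and that the longest chain of nested states has length two.

```python
# Hypothetical encoding of the statechart's hierarchy: each state maps to its
# enclosing state, or to None for the top-level states of the chart.
PARENT = {
    "SYSTEM": None,
    "INTERFACE": None,
    "USER": None,
    "OUTPUT": "INTERFACE",
    "INPUT": "INTERFACE",
}


def ancestors(state: str) -> list[str]:
    """The state followed by all enclosing states, innermost first."""
    chain = [state]
    while PARENT[chain[-1]] is not None:
        chain.append(PARENT[chain[-1]])
    return chain


def chart_depth() -> int:
    """Number of levels of abstraction: the longest chain of nested states."""
    return max(len(ancestors(s)) for s in PARENT)
```

ancestors("OUTPUT") gives OUTPUT followed by INTERFACE, which is the pair of states that is simultaneously active when the user consults or observes the output.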
Figure 3.2: The Proposed Framework for Interactive Systems.
3.5 Example
Consider the scenario where a bank client fails to withdraw cash from an Automatic Teller Machine (ATM) because he/she has forgotten the required bank card. The client/ATM interaction, in this case, may be described by the following tasks:
Consult the state of the ATM by reading the information provided by its display, and
Interpret the information from the display.
It is during the task Interpret information from display that the client realizes that the appropriate bank card must be supplied. Not having the needed card, the client stops the cash withdrawal activity, and consequently the client/ATM interaction terminates. This activity may be expressed in the framework proposed in this chapter by the regular expression $(co)^*$. The Kleene star (reflexive-transitive closure) is used in this case to indicate that the client may cycle through consultation/observation zero or more times, as many times as he/she feels necessary.
The regular expression $((co) + (apvo))^*$, in which $+$ denotes union, represents all possible interactions the user may have with the system. The term $(apvo)$ represents interactions that involve both the execution and the evaluation semicycles. This characteristic is present both in the framework proposed here and in Abowd's framework. The term $(co)$ represents interactions that involve only the evaluation semicycle. This characteristic is not included in Abowd's framework.
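The regular expressions above can be checked mechanically. In the Python sketch below, union is written | as Python's re syntax requires; the pattern named ABOWD encodes our own reading of the text, namely that Abowd's framework admits only $(apvo)^*$, and all names are hypothetical.

```python
import re

# '+' (union) in the regular expressions above is written '|' in Python's re syntax.
PROPOSED = re.compile(r"(?:co|apvo)*")  # ((co) + (apvo))*
ABOWD = re.compile(r"(?:apvo)*")        # assumption: only full execution/evaluation cycles
ATM_EXAMPLE = re.compile(r"(?:co)*")    # the forgotten-card scenario


def describes(pattern: re.Pattern, interaction: str) -> bool:
    """True if the whole interaction word matches the pattern."""
    return pattern.fullmatch(interaction) is not None
```

For example, describes(PROPOSED, "coco") is true while describes(ABOWD, "coco") is false: repeated consultation without articulation is expressible only in the proposed framework.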
3.6 Summary
A framework for interactive systems based on the model defined by Abowd [4] is introduced in this chapter. The proposed approach uses an additional translation as a way to support the user's necessary analysis of the system's state as supplied by the output devices. The complete cycle of interaction is modeled as a regular language. A graphical representation of this organization is provided in statechart format.
Chapter 4
Authoring Environments
4.1 Introduction
The purpose of this chapter is to provide an understanding of the document authoring process and to establish a context for the discussion of the quality of environments used in the authoring of documents containing mathematics. For this reason, a set of characteristics is considered in order to assess the quality of the environments. An ideal environment is proposed, and design approaches which may be used to achieve it are presented.
4.2 Interaction Objects and Authoring Environments
This thesis considers computer-based document authoring as an interactive process. During this process the author manipulates documents by means of interaction objects as defined in Chapter 3. These objects can be manipulated directly or indirectly by the user. A document authoring environment is a combination of interaction objects and is structured according to the form of control the author has over the interaction objects involved.
Consider a pen/paper document authoring environment, for instance. In this organization, the author uses the pen to record information on the paper. This environment is characterized by the fact that all objects involved are directly manipulated by the author. The interaction is completely under the author's control, because all information printed on paper results from direct actions performed by the author on the interface objects. To illustrate the notion of a document authoring environment, consider a document such as a research report written in English. Table 4.1 provides a description of the pen/paper environment according to the interaction framework proposed in Chapter 3.
Component (language) or translation | Interaction object | Description
USER (task) | Author | Produce a handwritten draft of a research report in English
articulation | | Hand movements associated with handwriting
INPUT (input) | Pen/paper | Pen strokes
performance | | Cursive writing
SYSTEM (core) | Pen/paper text authoring | Written text
presentation | | Rendering of cursive written text on paper
OUTPUT (output) | Paper | Sets of handwritten cursive characters printed on paper
observation | | Sets of handwritten text according to the formatting style defined
consultation | | Interpretation of the cursive sets of characters based on English syntactic and semantic definitions
Table 4.1: Pen/paper authoring environment.
Although in computer-based authoring environments the author interacts directly with physical objects such as the keyboard and mouse, the expected result, when available, also depends on objects under an indirect form of control. Software and hardware components not available for manipulation by the system's users are considered here as indirect objects. Table 4.2 presents the description of a TeX-based environment for the research report authoring task.
Document authoring environments which make use of no object under an indirect form of control are referred to here as direct or ideal environments. All other environments are considered indirect.
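The classification rule is simple enough to state as a predicate. The Python sketch below is a hypothetical illustration: the object lists loosely follow Tables 4.1 and 4.2, the direct flags are our own reading of which objects the author manipulates, and the names InteractionObject and classify are ours.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InteractionObject:
    name: str
    direct: bool  # True if the author can manipulate the object directly


def classify(environment: list[InteractionObject]) -> str:
    """'direct' (ideal) if no object is under indirect control, else 'indirect'."""
    return "direct" if all(obj.direct for obj in environment) else "indirect"


# Hypothetical object lists loosely following Tables 4.1 and 4.2.
pen_paper = [InteractionObject("pen", True), InteractionObject("paper", True)]
tex_based = [
    InteractionObject("keyboard", True),
    InteractionObject("plain TeX macro package", False),
    InteractionObject("operating system", False),
]
```

A single indirectly controlled object, such as the macro package, is enough to make classify(tex_based) return "indirect", whereas classify(pen_paper) returns "direct".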
Component (language) or translation | Interaction object | Description
USER (task) | Author | Produce a draft version of a research report in English using plain TeX macros
articulation | | Hand movements associated with typing
INPUT (input) | Keyboard | Key strokes
performance | | Key decoding
SYSTEM (core) | All related hardware not directly accessed by the author | Plain TeX macro package plus the operating system used
presentation | | TeX compiler plus dvi viewer
OUTPUT (output) | Video display | Sets of characters rendered on the display
observation | | Reading the displayed text according to the characteristics of the dvi viewer
consultation | | Interpretation of the sets of characters based on English syntactic and semantic definitions
Table 4.2: TeX-based authoring environment.
4.3 Cognitive Distances
The characteristics of the authoring environments presented in Tables 4.1 and 4.2 show that there exists a body of knowledge the author is required to have in order to accomplish any established task successfully. An important characteristic related to the communication between USER and SYSTEM is the difficulty the USER may have in mapping intentions into physical commands of the input language. This difficulty is referred to as the gulf of execution [76, 56, 39]. Another relevant characteristic of interactive systems is the difficulty the user has in interpreting the available output. This difficulty is called the gulf of evaluation [76, 56, 39]. Both the gulf of execution and the gulf of evaluation result from design decisions that are usually related to restrictions imposed by the specification of the interactive system. These gulfs are viewed, by the user, as distances to be bridged in order to realize tasks successfully through the provided interface.
For ideal environments such as the pen/paper one, the knowledge needed to bridge both gulfs is not relevant if we assume the author already knows how to read and write. Computer-based environments usually require additional knowledge. To author documents using a TeX-based system, for instance, a user should at least know the basics of both TeX and the underlying operating system, in addition to typing and reading from a display screen. The cognitive distance to be bridged in this scenario depends not only on the manipulation of the physical objects which compose the interface, but also on the user's knowledge of the tool used for typesetting.
As stated in [4, 76, 56], semantic distance relates to the translation between the user's intentions and the meaning of the interface language. This distance is a function of both the expressiveness and the conciseness of the input language. Expressiveness relates to the scope, or semantic coverage, of a language. Ideally, highly expressive languages provide support for the representation of all concepts in the domain in which the language is intended to be applied. Conciseness relates to the mapping the language provides from tasks to the input syntax. Highly concise languages are structured so as to capture the semantics of tasks in the language's domain by syntactically simple statements. A TeX macro package, for instance, is highly expressive but not concise.
4.4 Rendering Information
Information exchanged in human-to-human communication is usually inaccurate and unclear. For this reason, different forms of information exchange are usually necessary. In natural language communication, for instance, we often use gestures and vocal sounds not related to the language in an attempt to improve the transfer of information. Similarly, in user-computer interaction the acknowledgment of a mouse click may be reported by both a display change and a sound signal. In this case the user-computer interaction is enhanced by the provision of feedback to the click action in two distinct modes. As another example, consider the flight boarding announcements that are usually made in most airports through both video terminals and speech. In the described scenarios the additional modality may be viewed as a form of redundancy that enhances the quality of the information transfer process.
Central to the use of multimodality as a form of communication enhancement is the notion of semantics-based information organization. This form of structuring data is understood as fundamental in designing systems to be used for broadcasting information. It establishes that the data to be supplied to the modality renderers must be free of ambiguities. If the document to be processed includes mathematical concepts, the ambiguity-free requirement does not allow different concepts to be represented by syntactically overloaded symbols.
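The ambiguity-free requirement can be phrased as a check on an encoding: no symbol may be associated with more than one concept. In the hypothetical Python sketch below, an encoding is a list of (symbol, concept) pairs. Conventional notation writes the superscript −1 for the reciprocal of a real number, for the inverse of a matrix and for an inverse transform, so it fails the check; the distinct operator names in the second encoding (recip, matinv, ilaplace) are made up for the illustration.

```python
from collections import defaultdict


def overloaded_symbols(encoding):
    """Map each symbol used for more than one concept to the set of its concepts.

    `encoding` is an iterable of (symbol, concept) pairs; an ambiguity-free
    encoding yields an empty result.
    """
    concepts = defaultdict(set)
    for symbol, concept in encoding:
        concepts[symbol].add(concept)
    return {s: c for s, c in concepts.items() if len(c) > 1}


# Hypothetical encodings: the superscript -1 written for three concepts,
# versus one distinct (made-up) operator name per concept.
conventional = [
    ("^-1", "reciprocal of a real number"),
    ("^-1", "inverse of a matrix"),
    ("^-1", "inverse Laplace transform"),
]
semantic = [
    ("recip", "reciprocal of a real number"),
    ("matinv", "inverse of a matrix"),
    ("ilaplace", "inverse Laplace transform"),
]
```

overloaded_symbols(semantic) is empty, so that encoding could be handed to a modality renderer without the reader having to disambiguate from typography.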
4.5 Encoding Mathematical Concepts
Mathematical concepts need to be encoded in some form in order to be manipulated. The conventional mathematical notation is, most of the time, the first encoded form of these concepts we are exposed to. Although this general-purpose notation has been the primary tool used for teaching mathematics, it is not an adequate notation to support the electronic communication of the concepts.
As a visual system, the conventional mathematical notation relies not only on a set of symbols as a way of representing concepts, but also on spatial arrangements, variations of both font size and type, and other visual markers to aid the representation of information. These visual markers provide an efficient way to represent a complex set of constructs by means of a limited set of symbols. This characteristic is illustrated by the following two examples:
Example 1: The convolution of two functions could be defined as follows:
If $\mathcal{L}[f(t)] = F(s)$ and $\mathcal{L}[g(t)] = G(s)$, then the inverse transform of the product $F(s)G(s)$ can be obtained in terms of $f(t)$ and $g(t)$ by the expression
$$\mathcal{L}^{-1}[F(s)G(s)] = \int_0^t f(x)\,g(t - x)\,dx$$
In the example above, the change from lower case to upper case letters has been used to indicate the domain change from $t$ to $s$. The syntax used enforces the fact that $F(s)$ is just a different interpretation of the function $f(t)$. The Laplace transformation as well as its inverse are represented by the character $\mathcal{L}$, which is the character L typeset in a different way. The integral has its upper limit $t$ placed above its lower limit $0$ to inform the reader where the operation starts and ends.
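As a small numerical illustration that is not part of the original example, take $f(t) = 1$ and $g(t) = t$, whose Laplace transforms are $F(s) = 1/s$ and $G(s) = 1/s^2$. The product $1/s^3$ has inverse transform $t^2/2$, so the convolution integral should reproduce $t^2/2$. The sketch below checks this with a midpoint-rule quadrature. The function name `convolve_at` and the step count are illustrative choices, not anything defined in the thesis.

```python
# Numerical check of the convolution formula for the illustrative choice
# f(t) = 1 and g(t) = t, i.e. F(s) = 1/s and G(s) = 1/s**2, so that
# F(s)G(s) = 1/s**3, whose inverse Laplace transform is t**2 / 2.

def convolve_at(f, g, t, n=10_000):
    """Approximate the integral of f(x) * g(t - x) over [0, t] with the midpoint rule."""
    h = t / n
    return h * sum(f((k + 0.5) * h) * g(t - (k + 0.5) * h) for k in range(n))


def f(t):
    return 1.0


def g(t):
    return t


for t in (0.5, 1.0, 2.0):
    print(f"t = {t}: convolution = {convolve_at(f, g, t):.6f}, t^2/2 = {t * t / 2:.6f}")
```

Because the integrand here is linear in $x$, the midpoint rule is exact up to floating-point rounding, so the two printed columns agree.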
Example 2: The matrix equation
$$\mathbf{L}\mathbf{x} = \mathbf{m}$$
represents a system of linear equations and has
$$\mathbf{x} = \mathbf{L}^{-1}\mathbf{m}$$
as a solution. The linear equation
$$lx = m,$$
with $x$ and $m$ as real numbers, has
$$x = l^{-1}m$$
as a solution.
Although both solutions are obtained by taking the inverse of the object that prefixes the variable we want to solve for, and then multiplying this result by the object on the right side of the equality, the semantics attached to these operations is not the same. This fact is represented by the use of upper case and bold face type in the matrix equation.
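To make the contrast concrete, the sketch below writes both solutions as "invert the left factor, then multiply by the right side". For a scalar this is a single division. For a 2×2 matrix, both the inverse and the product are matrix operations. The function names and the sample system are hypothetical. Python's `fractions` module is used so that the results are exact.

```python
from fractions import Fraction


def solve_scalar(l, m):
    """Solve l*x = m for numbers: x = l^{-1} m (requires l != 0)."""
    return m / l


def inverse_2x2(L):
    """Inverse of the 2x2 matrix [[a, b], [c, d]] (requires a nonzero determinant)."""
    (a, b), (c, d) = L
    det = a * d - b * c
    if det == 0:
        raise ValueError("matrix is singular")
    return [[d / det, -b / det], [-c / det, a / det]]


def solve_matrix(L, m):
    """Solve L x = m for a 2x2 system: x = L^{-1} m (matrix-vector product)."""
    L_inv = inverse_2x2(L)
    return [sum(L_inv[i][j] * m[j] for j in range(2)) for i in range(2)]


print(solve_scalar(Fraction(4), Fraction(2)))          # 1/2
L = [[Fraction(2), Fraction(1)], [Fraction(1), Fraction(3)]]
m = [Fraction(3), Fraction(5)]
print(solve_matrix(L, m))                              # x1 = 4/5, x2 = 7/5
```

The code shape of the two solvers is what the typography in Example 2 signals: the same written pattern, $x = l^{-1}m$ versus $\mathbf{x} = \mathbf{L}^{-1}\mathbf{m}$, hides different operations.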
The necessity of representing mathematics by means of encodings that support the electronic communication of the concepts has motivated the creation of other notations. Perhaps the most intuitive approach is to map all dimensions involved in the standard representation of the concepts into a single dimension. Although conceptually trivial, this linearization procedure allows the complete domain to be input into computer systems. One relevant aspect of this approach is the structure used for capturing the meaning of the mathematical concepts. Such a structure should supply the author with the necessary means to encode not only all existing concepts, but also the concepts proposed by the author.
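A minimal sketch of linearization is shown below, assuming a small hypothetical tree of layout nodes (`Frac`, `Integral`) and an invented one-dimensional syntax such as `int(lower, upper, body, var)`. It flattens two-dimensional arrangements into a single line of text. Explicit parentheses are what keep the original structure recoverable after the layout is gone.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Frac:
    """A built-up fraction: numerator placed above denominator."""
    num: "Expr"
    den: "Expr"


@dataclass
class Integral:
    """An integral with its limits placed above and below the sign."""
    lower: "Expr"
    upper: "Expr"
    body: "Expr"
    var: str


Expr = Union[str, Frac, Integral]


def linearize(e: Expr) -> str:
    """Map a two-dimensional layout onto a single line of text."""
    if isinstance(e, str):
        return e
    if isinstance(e, Frac):
        return f"({linearize(e.num)})/({linearize(e.den)})"
    if isinstance(e, Integral):
        return (f"int({linearize(e.lower)}, {linearize(e.upper)}, "
                f"{linearize(e.body)}, {e.var})")
    raise TypeError(f"unknown node: {e!r}")


print(linearize(Frac("dv", "dt")))
print(linearize(Integral("0", "t", "f(x)*g(t-x)", "x")))
```

The output strings are only one possible linear syntax. The thesis's point is that the structure chosen for such an encoding must also be able to carry meaning, which plain strings like these do not do on their own.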
4.6 Environment Modifications
The fact that the author relies on the interface to obtain the behaviour defined by the core, as proposed in Chapter 3, may be used to represent document authoring environments by the following pair
$$\mathcal{D} = (S, I) \tag{4.1}$$
where $\mathcal{D}$, $S$ and $I$ are the document authoring environment, the document instance structure and the system's interface, respectively. This representation may be viewed as a refinement of the framework proposed in Section 3.4 to address the details involved in the SYSTEM component. For computer-based document authoring environments, this state needs to be further decomposed in order to isolate the operating system's services from the behaviour provided by the document structure. Figure 4.1 illustrates the framework proposed in Chapter 3, where the SYSTEM component has been modified to support the proposed refinement. In this case the core has been replaced by two lower-level states¹: the operating system and the document structure.
[Figure 4.1: Framework for document authoring environments. Recoverable labels: FRAMEWORK, SYSTEM, OPERATING SYSTEM, USER, task, articulation, input, output.]
Consider, for instance, a computer-based authoring environment $\mathcal{D}_0 = (S_0, I_0)$ such as the one defined in Table 4.2. In this case $S_0$ represents the plain TeX macro package and $I_0$ is the complete interface part of the environment.
¹ Details of the communication between these two states, which are irrelevant to the present discussion, have been omitted.
The replacement of the keyboard device by a mouse/display pair, for instance, would require articulation, input and performance to be redefined. Although the author may notice a significant amount of change due to the mouse pointing and clicking actions that replaced the typing form of manipulation, the basis of the document structure has not been modified. The resulting environment can be represented as $\mathcal{D}_1 = (S_0, I_1)$, where $I_1$ is the modified interface. Replacing the plain TeX macro package by LaTeX, for instance, will not have any effect on other parts of the environment besides the document structure. This means the author will still use the keyboard for input, but is now required to have knowledge of LaTeX to express his/her ideas. This environment is represented by $\mathcal{D}_2 = (S_1, I_0)$, where $S_1$ is a document structure based on the LaTeX macro definitions.
Different document authoring environments may therefore be obtained by the following three approaches. One can either:
1. maintain the document structure and modify the system's interface, or
2. maintain the system's interface and modify the document structure, or
3. modify both.
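The pair $\mathcal{D} = (S, I)$ and the three ways of obtaining a new environment can be sketched as an immutable record. This is a hypothetical encoding: components are plain strings such as `"plain TeX"` or `"mouse/display"`, `D3` is an invented instance for the third approach, and `changed_components` is an illustrative helper.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Environment:
    """A document authoring environment D = (S, I)."""
    structure: str   # S: document instance structure, e.g. a macro package
    interface: str   # I: system's interface, e.g. an input/display arrangement


D0 = Environment(structure="plain TeX", interface="keyboard/display")
D1 = replace(D0, interface="mouse/display")                     # approach 1
D2 = replace(D0, structure="LaTeX")                             # approach 2
D3 = replace(D0, structure="LaTeX", interface="mouse/display")  # approach 3


def changed_components(old: Environment, new: Environment) -> set:
    """Report which of S and I differ between two environments."""
    changes = set()
    if old.structure != new.structure:
        changes.add("S")
    if old.interface != new.interface:
        changes.add("I")
    return changes


for name, env in (("D1", D1), ("D2", D2), ("D3", D3)):
    print(name, sorted(changed_components(D0, env)))
```

Freezing the record mirrors the idea that each modification yields a new environment rather than mutating the old one.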
4.7 Changes in the Interface
Environment modifications as discussed in the previous section do not include the reasons why the changes were considered. This problem is approached in this section by examining what motivates changes in the system's interface.
User-computer interfaces can be viewed as facilitators which provide services to users. These services are structured according to the characteristics of the information obtained by users as the result of commands executed on the interface objects. The services may include information which is directly available through the functionality provided by the operating system. They may also involve concepts defined by the structure of the application, in which case the user interacts with the application and the operating system is viewed as a mediator. In both scenarios, the dialogue between user and computer may be structured according to the way interaction resources are organized².
² According to [74], interaction styles are key-modal, direct-manipulation and linguistic.
In cases where the operating system is a mediator, it is possible to represent the services provided by the application as interaction objects. Although the application's functionality may be completely preserved by this procedure, the user may not be able to access the structure of the application. This side effect is sometimes intentional, since hiding the internal structure may improve the use of the application for inexperienced users. As an example, consider an authoring environment which has plain TeX as document structure and uses a keyboard/display arrangement as interaction device. The author in this case is forced to directly manipulate the objects defined by the TeX macro package. Replacing this type of interaction by one based on a mouse/graphical display combination, with the necessary macro package objects structured as sets of icons, for instance, would allow the use of the package with no knowledge other than the manipulation of the interaction devices. Modifications to the system's interface such as this are usually performed as an attempt to improve the usability of the document authoring environment.
4.8 Recommendations
In the previous sections, the basic characteristics that document environments should have in order to support the authoring of mathematical concepts have been discussed. These qualities are presented in terms of properties and indicate possible software design approaches that may be considered in order to achieve them. Ideal document authoring environments are viewed as software systems which support the properties listed in Table 4.3.
4.9 Summary
The framework for interactive systems proposed in Chapter 3 was extended through a refinement of the SYSTEM state. The modification introduced a lower level of abstraction composed of two states. This approach aims at a separation of functionality between the operating system and the document structure. A set of properties which may be used to assess the quality of document authoring environments designed to support the representation of mathematical concepts has also been introduced.
PROPERTY | DESIGN APPROACH
High Conciseness | Improve interaction style.
High Expressiveness | Scope enhancement by the use of meta-structures and extensibility operations.
Ambiguity-freeness | Enforcement of syntactically unique representations by the creation of domains.
Multimodality | Layer/processor addition to existing document structure definitions.
Extensibility | Introduction of operations to update the document structure.
Table 4.3: Document authoring environment characteristics and software design approaches to help achieve them.
Chapter 5
Mathematical Constructs and their Representation
Document authoring is an incremental activity in which a set of intermediate (draft) versions of a document are produced by the author prior to the creation of the final one. Any given version of a document, except the first one, may therefore be viewed as the result of an update of the previous version of the document.
Authoring documents that contain mathematics, or authoring mathematics for short, is both incremental and dynamic. It is during this activity that the author makes explicit the syntax that will represent the mathematical concepts included in a given version of a document. The design of document structures to support these characteristics must therefore include mechanisms to manage both the update and the meaning-to-syntax bindings determined during authoring.
This chapter introduces the notion of using CFGs as a major formalism to support the dynamics of authoring mathematics. It discusses the use of CFGs as a tool to capture the semantics of mathematical concepts by means of user-defined syntax that can be proposed during authoring. The limitations CFGs have in supporting document structures that allow update are also addressed, and an overview of the solution proposed by this thesis to approach these limitations is presented. A set of examples illustrating the possibility of using CFGs to capture the semantics of mathematical concepts is provided.
5.1 Notational Systems as Languages
A notational system uses a set of symbols to describe quantities and ideas, and it is used as a supporting mechanism for the expression of ideas. A programming language is a special notational system designed to solve problems in a particular domain. This characteristic often establishes the set of basic constructs that will provide the language with the necessary power to approach the tasks in the specified domain. Language constructs are generally structured around statements, and these programming statements are, most of the time, characterized as block statements, flow control statements, expressions, and declarations.
This way of structuring the design of a programming language leads to the idea that the language can be defined as a set of basic modules that can be combined to generate other modules. The task of module design may be accomplished through the use of a Context-Free Grammar, which will hereafter be referred to as CFG in this thesis. CFGs have been used as a major tool for the specification of programming languages. The implementation independence of this approach provides the designer with the flexibility to work on the development of a language without the need to be concerned with implementation details. Programming languages often need to be mapped into other domains in order to better respond to the user's processing requests. Compilers are well-known tools that support the translation of language definitions into other forms.
CFGs are, in this thesis, viewed as abstract type definitions, and sentences belonging to the grammar as variables of that type. This idea is supported by the fact that, given a set of basic type definitions or a set of CFGs, other definitions can easily be produced by the manipulation of the rules already defined. The parsing process of a compiler can therefore be interpreted as a type checker which only verifies whether a given variable (a sentence) belongs to the set provided by the type definition (the grammar). This analogy can be further extended to include abstractions such as the reuse of well-defined grammars in the design of other programming languages.
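The "parser as type checker" reading can be sketched as a membership test. In the hypothetical recognizer below, a nonterminal name plays the role of a type and `is_instance` answers only whether a token sequence belongs to it. The recognizer checks whether each right-hand side can split the input into non-empty pieces. It assumes there are no empty (ε) rules and no cycles of unit rules, and it treats every character of the input as one token. The small grammar for sums of the integers 0 and 1 is illustrative. This is not the thesis's implementation.

```python
from functools import lru_cache

# A CFG as a dict: nonterminal -> list of right-hand sides (tuples of symbols).
# Symbols that are not keys are terminals.
GRAMMAR = {
    "equality": [("left_expr", "=", "right_expr")],
    "left_expr": [("left_expr", "+", "right_expr"), ("integer",)],
    "right_expr": [("integer",)],
    "integer": [("1",), ("0",)],
}


def is_instance(tokens, type_name, grammar=GRAMMAR):
    """Type check: does the token sequence belong to the language of type_name?"""
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def derives(symbol, i, j):
        # Does `symbol` derive exactly tokens[i:j]?
        if symbol not in grammar:                      # terminal symbol
            return j - i == 1 and tokens[i] == symbol
        return any(splits(rhs, i, j) for rhs in grammar[symbol])

    @lru_cache(maxsize=None)
    def splits(rhs, i, j):
        # Can tokens[i:j] be cut into len(rhs) non-empty pieces, one per symbol?
        if len(rhs) == 1:
            return derives(rhs[0], i, j)
        return any(derives(rhs[0], i, k) and splits(rhs[1:], k, j)
                   for k in range(i + 1, j - len(rhs) + 2))

    return len(tokens) > 0 and derives(type_name, 0, len(tokens))


print(is_instance("1+0=1", "equality"))   # a variable of type `equality`
print(is_instance("1+=1", "equality"))    # rejected by the type checker
```

Like the parser described above, the recognizer says nothing about meaning. It only confirms that the sentence is a well-formed value of the requested type.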
Although some notational systems are not designed to support programming, they can be structured in a way similar to programming languages. The standard mathematical notation system is one example of such systems.
5.2 Standard Mathematical Notation Characteristics
The representation of mathematics by a finite set of symbols imposes restrictions on the notation used. In what follows, the implications of this limitation are addressed, and the need for a form of representation based on semantics is discussed.
The field of mathematics is composed of a collection of subfields or domains. The various branches of science often make use of these subfields as supporting tools to express their ideas. For instance, the formal presentation of some electrical engineering concepts is supported through the use of calculus.
To develop an understanding of the mathematical notation, classes of mathematical concepts can be defined, and a trivial one-to-one mapping between these classes and the subfields can be established. Abstract mathematical constructs are mapped onto concrete symbols in order to provide humans with the representations necessary to communicate mathematical ideas as well as concepts. The mathematical notation can therefore be viewed as a language used to describe the abstract concepts.
Despite the fact that humans depend on concrete objects for sharing their knowledge, all mathematical computations rely on the ability to manipulate the abstract concepts involved. Like natural languages, mathematical notation has its basis in a dynamic process where an abstract idea can be represented by different language constructs, and the information conveyed by a particular language construct may relate to different abstract concepts. This many-to-many mapping between abstract concepts and language constructs characterizes this dynamic process as ambiguous and incomplete. Therefore a particular abstract mathematical concept is said to be represented by a notation construct only if the parties involved in the information exchange have previously agreed on the notation defined for the concept. This leads to the conclusion that this representation process is not only unstable, but also subject to local redefinition. The derivative of $v$ with respect to $t$, for instance, is a good example of the representation ambiguity of a mathematical concept: either $\dot{v}$, $v'$ or $\frac{dv}{dt}$ could be chosen to denote the concept¹.
¹ The form of attachment where one concept is accessible by more than one reference is here denoted as aliasing.
[Figure 5.1: Many-to-many relationship between mathematical concepts and their representation. Recoverable labels: Representation; derivative, dv/dt; conjugate; complement.]
The representation of various mathematical concepts is usually accomplished by overloading meaningful symbols. The arithmetic mean, the conjugate of a complex number and the complement of a boolean expression are well-known concepts that are often represented by placing a horizontal bar over a variable name. For instance, $\bar{v}$ could be chosen to represent all three concepts. It is clear that context has to be included in any attempt to communicate mathematical concepts.
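The overloaded overbar can be modelled as a lookup keyed by context. This is a hypothetical sketch: the context names and the `overbar` function are invented for illustration. It shows that the same notation yields three different operations, and no meaning at all when no context has been agreed on.

```python
from statistics import mean

# Hypothetical table: one notation (an overbar) with a meaning per context.
OVERBAR = {
    "statistics": mean,                           # arithmetic mean of a sample
    "complex numbers": lambda z: z.conjugate(),   # complex conjugate
    "boolean algebra": lambda b: not b,           # complement
}


def overbar(value, context):
    """Interpret a barred value; without an agreed context there is no meaning."""
    if context not in OVERBAR:
        raise ValueError(f"no agreed meaning for the overbar in context {context!r}")
    return OVERBAR[context](value)


print(overbar([1, 2, 3], "statistics"))
print(overbar(3 + 4j, "complex numbers"))
print(overbar(True, "boolean algebra"))
```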
It is during authoring that the relationship between a mathematical concept and its representation is available for modification. Selecting a particular syntactic representation may therefore not only determine the meaning of a concept, but may also indicate the domain where the concept is defined. Figure 5.1 illustrates the many-to-many relationship between mathematical concepts and their representations. The representation flexibility illustrated by the examples presented restricts the manipulation and understanding of the mathematical notation to users who share a common understanding of the terminology applied.
As science progresses, the new ideas proposed, as well as the necessary supporting assumptions, need to be fully described. This condition places the extensibility requirement on the notation used to express the results obtained. Mathematical symbols will need to be provided in order to precisely describe the new concepts, and new syntax may therefore need to be introduced as a way of avoiding ambiguities. Extensibility is frequently used in mathematical notation either to locally define symbols or to represent new concepts. The following scenario illustrates this characteristic.
Consider the scaling of a plane defined by the following statement:
Assume $(x, y, z)$ are given Cartesian coordinates. We now let $(\bar{x}, \bar{y}, \bar{z})$ be new coordinates where $\bar{x} = \lambda x$, $\bar{y} = \lambda y$, $\bar{z} = \lambda z$ and $\lambda$ is a positive scalar constant.
In the context described, $\bar{x}$, $\bar{y}$ and $\bar{z}$ are neither complex conjugates, complements of boolean expressions, nor means. A new interpretation has locally been provided to the variables. The extensibility characteristic of the mathematical notation increases the level of complexity involved in capturing the semantics of the concepts presented.
The representation of mathematical notation can be achieved either by a presentational approach, in which the visual characteristics of the symbols used in the notation are emphasized, or by a semantic approach, where abstract concepts are used as a basis for the representation. The presentational approach was introduced during the early stages of computers. Typesetting systems like nroff/troff and TeX are examples of such systems. Although both systems provide stable data representations, they lack the necessary features to be used as a basis for the representation of data in forms other than text. In contrast, as argued in [11], a notational approach based on the meaning of symbols, that is, based on the semantics of the concepts, is needed.
One of the difficulties presented by the representation of mathematical expressions by their content is capturing the meaning of the concepts. Another way of expressing this characteristic is that one must capture the meaning which has been associated with a given set of symbols, in case the concepts have already been encoded as these symbols for communication. For this reason, the representation of mathematical concepts by the semantic approach has not yet been implemented in its totality.
5.3 Capturing the Semantics of Mathematical Concepts
The use of CFGs as a formalism to support the capturing of mathematical concepts is discussed in this section. Both its advantages and its limitations are addressed.
5.3.1 Mathematics and Document Authoring
Over its lifetime in a document, the representation of a mathematical construct may be characterized as a variable that denotes a locally pre-established relationship between the abstract concepts involved and a user-defined interpretation. Syntactic constructs may temporarily be bound to specific meanings as the result of a process led by the author of the document in order to communicate his/her knowledge. This context-dependent binding process is therefore the mechanism the author has to express information by means of a finite set of symbols. By fixing an interpretation for a given syntax for a period of time, the author expresses his/her knowledge at the possible cost of introducing symbol overloading² and syntax ambiguity. This process may be interpreted as context switching, where the author has the power to assign different interpretations to the set of symbols used for the representation of the mathematical concepts. Therefore ambiguities introduced by the editing procedure have the author's approval and control. They are part of the document because they express the result of an already accepted form of representation.
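The context switching described above behaves much like nested scopes in a programming language. The hypothetical `BindingContext` class below keeps a stack of symbol-to-meaning tables. When the author enters a context, its bindings shadow earlier ones. Leaving the context restores the previous interpretation. The symbol names and meanings are illustrative, and `x_bar` stands in for $\bar{x}$.

```python
class BindingContext:
    """Context-dependent bindings from symbols (syntax) to meanings."""

    def __init__(self, defaults):
        self._scopes = [dict(defaults)]

    def enter(self, bindings):
        """Switch context: the new bindings shadow the current ones."""
        self._scopes.append(dict(bindings))

    def leave(self):
        """Return to the previous context."""
        if len(self._scopes) == 1:
            raise RuntimeError("cannot leave the outermost context")
        self._scopes.pop()

    def meaning(self, symbol):
        """Look the symbol up from the innermost context outwards."""
        for scope in reversed(self._scopes):
            if symbol in scope:
                return scope[symbol]
        raise KeyError(f"{symbol!r} is not bound in the current context")


ctx = BindingContext({"+": "addition of integers", "x_bar": "arithmetic mean"})
ctx.enter({"x_bar": "scaled coordinate lambda * x"})   # the author's local redefinition
print(ctx.meaning("x_bar"), "|", ctx.meaning("+"))
ctx.leave()
print(ctx.meaning("x_bar"))
```

The overloading is deliberate and under the author's control: the ambiguity exists only across contexts, never within one.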
The fact that the author is allowed to attach different meanings to syntactical structures adds a complex component to the problem of capturing the semantics of mathematical concepts. This characteristic introduces the idea of real-time document structure update: the structure of the document is modified during authoring to include adequate syntax to capture the semantics of mathematical concepts. For this reason, a semantics-based document authoring model is necessary. This form of authoring documents is formally defined in Chapter  and will hereafter be referred to as dynamic authoring.
Modeling structures to support dynamic authoring requires mechanisms to support the capture of the semantics of the mathematical concepts. This translates into the need to address not only data representation issues, but also the context in which mathematical concepts are represented.
The document update notion, imposed by the dynamic authoring model, establishes the necessity of well-defined mechanisms for both accessing and modifying the structural base upon which the document's syntax and semantics are represented. In the case of having a grammar as the supporting structure for capturing the semantics of mathematical concepts, either a modification of the syntax used for the representation of concepts³ or the introduction of a new construct will require an update process in which the related grammar definitions will need to be adapted according to the modifications proposed.
² In this context, symbol overloading is viewed as part of an incremental updating process where existing connections between mathematical concepts and syntactical constructs are modified. The modification process either establishes or keeps a many-to-one relation between mathematical concept and syntactical representation.
It is during the authoring activity that syntax is bound to concepts. In the event that ambiguities are introduced by symbol overloading, authoring mechanisms can be provided to resolve all context-dependent representations which, according to the author, need to be included in the document.
5.3.2 CFGs and Data Types
The type of mapping between mathematical concept and grammar representation determines the degree of dependence between the two domains. If this dependence is established by a one-to-one mapping (every mathematical concept is captured by an isolated grammar definition), then modifications proposed by the author will be reflected only in the production rules involved in the definition of the concepts manipulated. This organization is supported by the software engineering principle of separation of concerns, which approaches a complex problem by concentrating on each individual aspect of the problem one at a time [48]. The modularity necessary for the application of this principle is obtained by assigning the set of production rules that define a given mathematical concept to a unique context-free grammar. In this thesis these structures are referred to as grammar fragments, modules or simply fragments.
Although a module is viewed as a syntactic concept which only affects the way in which software text is partitioned [72], semantic restrictions on the associated text may be used as criteria for modularization. For instance, when mathematical concepts share the same syntactical structure in the conventional mathematical notation, their semantic content is the only characteristic which may be used to identify them. Therefore mathematical concepts with different semantics and the same syntax are understood as distinct objects. For this reason they should be treated separately by means of their own grammars.
³ This may seem contradictory, since the semantic characteristics of a concept are not affected by the form in which it is rendered. However, the semantic information attached to the standard visual presentation of a concept will, most of the time, be included during the associated capturing procedure. A typical illustration of this characteristic is the juxtaposed multiplication in polynomials, which is discussed later in this chapter.
One advantage of having CFGs as the fundamental structure to capture the meaning of mathematical concepts is the flexibility this mechanism provides in supporting both the design and recognition phases of the capturing activity. The design phase is characterized by the assignment of grammar fragments to mathematical concepts. During recognition, the input provided by the author is submitted to the analysis component of the associated language processor. At this stage, the input is encoded as tokens and its syntactical structure is matched against the related set of production rules that was provided during the design.
Another way to view the recognition phase is as the execution of a membership verification performed by the analysis component. Under this interpretation a CFG is equivalent to a data type, or just type, and each valid input is an instance of the type. This association is consistent with the notion of type provided by [72]. The data type, in this case, is represented by the start symbol of the CFG.
The organization proposed in this section merges the notions of module and type by using CFGs as static structures to support the semantics capturing requirement. One benefit of organizing mathematical concepts as sets of grammar fragments or modules is the possibility of using both decomposition and composition as aids to the structuring process.
Although it is possible to capture the meaning of mathematical concepts by means of static structures such as CFGs, this approach has limitations. One important limitation is that CFGs only support the definition of document interchange formats. This means CFGs do not support the fundamental requirement that authoring mathematics is a dynamic activity, in which the bindings between meaning and syntax are established by the author while manipulating the document. This characteristic is discussed below.
5.3.3 CFG Limitation to Support Authoring Mathematics
This subsection illustrates the limitation CFGs have in supporting the capture of the semantics of mathematical concepts. For this purpose consider, for instance, authoring a document which includes an expression involving the addition of integers such as
$$1 + 0 = 1 \tag{5.1}$$
The meaning of expression (5.1) may be captured by the CFG rules in Table 5.1. This means that + is the addition of integers, 1 and 0 are integers, and = is equality.
add
equality → left_expr = right_expr
left_expr → left_expr + right_expr
left_expr → integer
right_expr → integer
integer → 1 | 0
Table 5.1: CFG rules for addition of integers
Assume the author decides to update the current version of the document by including another expression. This expression contains the Boolean OR operation, which is represented by the + symbol, and the TRUE and FALSE values, which are represented by the integers 1 and 0 respectively. An example of such an expression is
$$1 + 0 = 1 \tag{5.2}$$
The syntax of expression (5.2) can be captured by the grammar in Table 5.1, but its semantics cannot. This is because the author has determined that the context in which this syntax is valid has changed: operations on integers have been replaced by operations on Booleans, 1 means TRUE and 0 means FALSE.
CFGs provide no means of updating their production rules. Therefore a document structure based on this formalism has to include a mechanism that can respond to authoring requests aimed at the creation of context-dependent meaning-to-syntax bindings.
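The gap between capturing syntax and capturing semantics can be shown with a small sketch. It makes two assumptions: a hypothetical `parse` function that checks only the shape of equations like (5.1) and (5.2), and two invented context tables for integers and Booleans. Both expressions parse identically. Evaluation, however, depends on the context the author has chosen. The extra expression `1+1=1` is an illustrative case, not one from the text, where the two readings disagree.

```python
def parse(text):
    """Split an equation such as '1+0=1' into (left operands, right side).

    Only syntax is captured here: the result carries no meaning yet.
    """
    left, right = text.replace(" ", "").split("=")
    operands = left.split("+")
    if not all(tok in {"0", "1"} for tok in operands + [right]):
        raise ValueError(f"unexpected symbol in {text!r}")
    return operands, right


# Two semantic contexts for the same syntax.
INTEGERS = {"value": int, "plus": lambda a, b: a + b}
BOOLEANS = {"value": lambda tok: tok == "1", "plus": lambda a, b: a or b}


def holds(text, context):
    """Evaluate the parsed equation under the given context."""
    operands, right = parse(text)
    values = [context["value"](tok) for tok in operands]
    total = values[0]
    for v in values[1:]:
        total = context["plus"](total, v)
    return total == context["value"](right)


for expr in ("1+0=1", "1+1=1"):
    print(expr, "integers:", holds(expr, INTEGERS), "booleans:", holds(expr, BOOLEANS))
```

Choosing the context is exactly the information that a fixed set of production rules cannot record.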
5.3.4 Updating CFGs
This thesis approaches the semantics capturing problem by means of an organization that is based on sets of modules. A standard library is used as a storage facility where a basic or default set of modules is placed. The modules required by a given document, at any instant, may be obtained from the set of default ones or from the result of a composition procedure that may include only modules taken from the library or, if necessary, completely new ones. In this work the ability to manipulate the set of modules in order to update the available types is introduced as the means to produce document structures that support extensibility.
As has been shown previously, document structures based on static organizations such as CFGs do not support the dynamic extensibility required during the authoring process. It is during this stage that the author is free to select the syntactical arrangement necessary to represent each mathematical concept that takes part in the document. Mechanisms that support run-time update are therefore needed when designing structures to handle the dynamic authoring of mathematics. For this reason the following mechanisms need to be included in the design of document structures that manage the dynamic binding of mathematical concepts to syntactical constructs:
1. incremental update, and
2. module reuse.
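The two mechanisms can be sketched together under several assumptions. A hypothetical standard library holds default grammar modules. A document structure selects modules by name. Each `update` call adds or replaces a single module, either by reusing a library module or by accepting a new one written by the author. All module and class names, and the `=>` implication rule, are invented for illustration.

```python
# Hypothetical standard library of default grammar fragments (modules).
STANDARD_LIBRARY = {
    "integer": {"integer": (("0",), ("1",))},
    "int_addition": {"sum": (("sum", "+", "integer"), ("integer",))},
    "boolean": {"bool": (("0",), ("1",))},
    "bool_or": {"disj": (("disj", "+", "bool"), ("bool",))},
}


class DocumentStructure:
    """The set of grammar modules in use by one version of a document."""

    def __init__(self, module_names, library=STANDARD_LIBRARY):
        self.library = library
        # Module reuse: default modules are shared with the library, not copied.
        self.modules = {name: library[name] for name in module_names}

    def update(self, name, module=None):
        """Incremental update: add or replace one module, leaving the others untouched.

        Without an explicit module, the default one is taken from the library.
        """
        self.modules[name] = self.library[name] if module is None else module

    def grammar(self):
        """Compose the active modules into a single grammar."""
        combined = {}
        for module in self.modules.values():
            combined.update(module)
        return combined


doc = DocumentStructure(["integer", "int_addition"])   # version 1: integer sums
doc.update("boolean")                                  # reuse library modules
doc.update("bool_or")
doc.update("implication", {"impl": (("bool", "=>", "bool"),)})  # author-defined module
print(sorted(doc.grammar()))
```

The final line prints `['bool', 'disj', 'impl', 'integer', 'sum']`. In this sketch, a later module silently overrides an earlier definition of the same nonterminal. A fuller design would need the identity checks discussed next.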
Central to the effective use of update as a process to modify data are the notions of identity, redundancy control and normal forms [43]. For CFGs, the notion of normalization as a process to eliminate update anomalies requires the identity verification of grammar rules. This includes verification of both the syntax and the semantics of the rules. This requirement is necessary because rules that have identical syntactical structure do not express the purpose for which they are intended. The meaning attached to nonterminals on the rhs of a rule depends on the grammar rules that define these nonterminals. Therefore the semantics of a CFG rule is a concept that involves the notion of rule dependency. For this reason, identical syntactical structure in CFG rules does not guarantee identical semantics.
The exceptions to this characteristic are rules that have a single terminal and no nonterminals on their rhs. These rules have both their syntax and semantics determined by themselves. Syntactical identity, in this case, determines semantic identity.
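The identity criterion can be sketched as follows. Two syntactically identical rules are treated as semantically identical only when every nonterminal their right-hand side depends on is defined identically in both grammars. A rule whose rhs is a single terminal has no dependencies, so syntactic identity suffices. The sketch encodes the two grammars of Tables 5.2 and 5.3. The helper names `dependencies` and `same_semantics` are hypothetical.

```python
# Tables 5.2 and 5.3 encoded as: nonterminal -> tuple of right-hand sides.
ADDEXPR = {
    "expr": (("expr", "+", "term"), ("term",)),
    "term": (("integer",),),
    "integer": (("1",), ("2",)),
}
CATEXPR = {
    "expr": (("expr", "+", "term"), ("term",)),
    "term": (("character",),),
    "character": (("a",), ("b",)),
}


def dependencies(grammar, symbols):
    """Definitions of every nonterminal reachable from the given symbols."""
    found, pending = {}, list(symbols)
    while pending:
        symbol = pending.pop()
        if symbol in found or symbol not in grammar:
            continue
        found[symbol] = grammar[symbol]
        pending.extend(s for rhs in grammar[symbol] for s in rhs)
    return found


def same_semantics(rule, grammar_a, grammar_b):
    """A rule (lhs, rhs) means the same in both grammars only if everything
    its right-hand side depends on is defined identically in both."""
    _lhs, rhs = rule
    return dependencies(grammar_a, rhs) == dependencies(grammar_b, rhs)


rule_1 = ("expr", ("expr", "+", "term"))
print(same_semantics(rule_1, ADDEXPR, CATEXPR))                  # False: term differs
print(same_semantics(("integer", ("1",)), ADDEXPR, CATEXPR))     # True: terminal-only rhs
```

Comparing whole dependency closures is a deliberately strict, hypothetical criterion. It reports a difference as soon as any transitively used definition differs.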
A discussion regarding the involvement of identity, redundancy control and normal forms in document structures that use CFGs to capture the semantics of mathematical concepts is presented in the remainder of this section. The following subsection provides examples to illustrate the problems introduced when grammar rules sharing the same syntax are used to capture different semantics.
5.3.4.1
Identical Syntax and Rule Semantics
Objects are identical if they are indistinguishable. This suggests that indistinguishable grammar rules should be viewed as identical. This characteristic is examined below by means of an example in which a set of rules is shared by three different grammars. Consider, for instance, CFG rules containing nonterminals in their rhs. Such rules depend on other rules in which the definitions of these nonterminals are provided. For this reason it is sometimes not possible to determine the semantics of grammar rules before all nonterminals have been replaced by terminals. To illustrate this characteristic consider, for instance, the CFGs defined by the production rules in Tables 5.2 and 5.3. The fact that rules 1 and 2 from both grammars are identical determines that both grammars define lists of terms separated by the + symbol.

  1   expr     →  expr + term
  2   expr     →  term
  3   term     →  integer
  4   integer  →  1 | 2
Table 5.2: Grammar for addition of integers.

  1   expr       →  expr + term
  2   expr       →  term
  3   term       →  character
  4   character  →  a | b
Table 5.3: Grammar for concatenation of characters a and b.
Although they share this characteristic, the semantics of both rules 1 and 2 depend on the information provided by rules 3 and 4. The derivation of a word such as 1 + 2, for example, as provided in Table 5.4, illustrates that rule 1 from the grammar in Table 5.2 defines lists of integers 1 and 2 that are separated by the + symbol. The fact that integers are being separated by the + symbol suggests that rule 1, from this grammar, captures the addition of integers concept.

  expr  ⇒  expr + term
        ⇒  term + term
        ⇒  integer + term
        ⇒  1 + term
        ⇒  1 + integer
        ⇒  1 + 2
Table 5.4: Derivation of word 1 + 2.

  expr  ⇒  expr + term
        ⇒  term + term
        ⇒  character + term
        ⇒  a + term
        ⇒  a + character
        ⇒  a + b
Table 5.5: Derivation of word a + b.

In a similar way, the derivation of a word such as a + b, as provided in Table 5.5, shows that rule 1 from the grammar in Table 5.3 defines lists of characters a and b separated by the + symbol. In this case it can be stated that rule 1, from this grammar, captures the concatenation of characters concept.
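The derivations in Tables 5.4 and 5.5 are leftmost derivations: at each step the leftmost nonterminal is rewritten. The Python sketch below replays them, assuming that a rule is encoded as an (lhs, rhs) pair and that alternatives such as integer → 1 | 2 are split into separate pairs; the function name leftmost_derivation is hypothetical. Applying the same sequence of rules (1, 2, 3, 4, 3, 4) with Table 5.2 and with Table 5.3 yields 1 + 2 and a + b respectively: rules 1 and 2 are shared, and only rules 3 and 4 supply the difference.

```python
from typing import List, Sequence, Set, Tuple

# A production is a pair (lhs, rhs); alternatives such as "integer -> 1 | 2"
# are written as separate productions.
Rule = Tuple[str, Tuple[str, ...]]

NONTERMINALS: Set[str] = {"expr", "term", "integer", "character"}


def leftmost_derivation(start: str, steps: Sequence[Rule]) -> List[str]:
    """Apply each rule to the leftmost nonterminal and record every sentential form."""
    form: List[str] = [start]
    forms = [" ".join(form)]
    for lhs, rhs in steps:
        position = next((i for i, s in enumerate(form) if s in NONTERMINALS), None)
        if position is None or form[position] != lhs:
            raise ValueError(f"rule for {lhs!r} does not apply to {' '.join(form)!r}")
        form = form[:position] + list(rhs) + form[position + 1:]
        forms.append(" ".join(form))
    return forms


# Rules 1 and 2 are shared by Tables 5.2 and 5.3.
RULE_1: Rule = ("expr", ("expr", "+", "term"))
RULE_2: Rule = ("expr", ("term",))

integer_steps = [RULE_1, RULE_2, ("term", ("integer",)), ("integer", ("1",)),
                 ("term", ("integer",)), ("integer", ("2",))]
character_steps = [RULE_1, RULE_2, ("term", ("character",)), ("character", ("a",)),
                   ("term", ("character",)), ("character", ("b",))]

if __name__ == "__main__":
    print(" => ".join(leftmost_derivation("expr", integer_steps)))
    print(" => ".join(leftmost_derivation("expr", character_steps)))
```

The first line printed reproduces Table 5.4 and the second reproduces Table 5.5.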
  1   expr       →  expr + term
  2   expr       →  term
  3   term       →  integer | character
  4   integer    →  1 | 2
  5   character  →  a | b
Table 5.6: Grammar for operations on integers and characters.
The CFG defined by the rules in Table 5.6 combines rules 3 and 4 from both grammars defined in Tables 5.2 and 5.3. Table 5.7 illustrates that the derivation of a word such as a + 2, for example, determines that rule 1 from the grammar in Table 5.6 defines lists of characters a and b and/or integers 1 and 2 separated by the + symbol. The interpretation of the + symbol is not determined, because the semantics attached to this symbol cannot be expressed by the grammar in Table 5.6.

  expr  ⇒  expr + term
        ⇒  term + term
        ⇒  character + term
        ⇒  a + term
        ⇒  a + integer
        ⇒  a + 2
Table 5.7: Derivation of word a + 2.
Unlike the two previous scenarios, the grammar rules alone therefore do not provide the complete interpretation of the + symbol. Additional information is necessary to specify how integers and characters are to be processed by the + operator.
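One way to picture this additional information is a table, kept outside the grammar, that maps the types of the two operands of + to an operation. The Python sketch below is only an illustration under that assumption: the table PLUS_MEANINGS and the function interpret_plus are hypothetical names, integers stand for the words 1 and 2, and one-character strings stand for a and b. The grammar of Table 5.6 accepts a + 2, but the table has no entry for a character followed by an integer, so no meaning can be assigned to it.

```python
from typing import Callable, Dict, Tuple, Union

Value = Union[int, str]

# Hypothetical extra information, kept outside the grammar:
# one interpretation of "+" per pair of operand types.
PLUS_MEANINGS: Dict[Tuple[type, type], Callable[[Value, Value], Value]] = {
    (int, int): lambda x, y: x + y,  # addition of integers (Table 5.2)
    (str, str): lambda x, y: x + y,  # concatenation of characters (Table 5.3)
}


def interpret_plus(left: Value, right: Value) -> Value:
    """Evaluate left + right using only the interpretations listed in PLUS_MEANINGS."""
    key = (type(left), type(right))
    if key not in PLUS_MEANINGS:
        raise TypeError(f"no interpretation of + for {key[0].__name__} and {key[1].__name__}")
    return PLUS_MEANINGS[key](left, right)


if __name__ == "__main__":
    print(interpret_plus(1, 2))      # 3
    print(interpret_plus("a", "b"))  # ab
    try:
        interpret_plus("a", 2)       # derivable from Table 5.6, but not interpreted
    except TypeError as error:
        print(error)
```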
The grammars presented in this subsection illustrate that one CFG rule may be used to express several different semantics. It is also possible to have the semantics of a single concept captured by different CFG rules. This characteristic is discussed in the following subsection.
5.3.4.2
Redundancy, Syntax Equivalence and Normal Forms
In relational databases the idea of redundancy is related to the notions of identity, functional dependency and normal form [43]. As a property of the semantics of the attributes, functional dependency expresses relationships among attributes; it therefore depends on a value-based notion of identity. Functional dependency is used to determine the presence of redundancies in database schemas.
A normal form is a schema which has desirable update properties and does not contain certain types of redundancies. Normalization is a process that breaks down unsatisfactory relation schemas according to normal form criteria. The schemas generated through normalization are therefore said to be normalized.
As static structures, CFGs do not support dynamic authoring. For this reason external mechanisms need to be designed to update the set of grammars involved in the modifications proposed during authoring. Effective updates of these structures require both the identification and the control of redundant definitions.
CFG redundancy is, in the context of this work, defined in terms of grammar rules. The examples presented in the previous subsection illustrated that CFG rules that have identical syntax may be used to express different semantics. In the context of this thesis, such rules are considered redundant. This form of redundancy will hereafter be referred to as redundancy by syntax identity.

This type of redundancy can be detected by string comparison. It can be controlled by a grammar update procedure that
• creates a new grammar for the redundant rule, and
• eliminates this rule from the grammar where it was identified.
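The detection and update steps above can be sketched in Python as follows. The sketch assumes that each grammar is a list of rule strings such as "expr -> expr + term", so that redundancy by syntax identity reduces to comparing whitespace-normalized strings; the names find_redundant and factor_out, and the names shared_1 and shared_2 given to the new grammars, are hypothetical. Applied to Tables 5.2 and 5.3, rules 1 and 2 are moved into new grammars and each original grammar keeps only its rules 3 and 4.

```python
from typing import Dict, List

Grammar = List[str]  # one production per string, e.g. "expr -> expr + term"


def normalize(rule: str) -> str:
    """Collapse runs of whitespace so that string comparison is not fooled by spacing."""
    return " ".join(rule.split())


def find_redundant(grammars: Dict[str, Grammar]) -> Dict[str, List[str]]:
    """Map every rule text occurring in more than one grammar to the grammars containing it."""
    owners: Dict[str, List[str]] = {}
    for name, rules in grammars.items():
        for rule in map(normalize, rules):
            holders = owners.setdefault(rule, [])
            if name not in holders:
                holders.append(name)
    return {rule: names for rule, names in owners.items() if len(names) > 1}


def factor_out(grammars: Dict[str, Grammar]) -> Dict[str, Grammar]:
    """Create a new grammar for each redundant rule and remove that rule where it was found."""
    redundant = find_redundant(grammars)
    updated = {
        name: [rule for rule in map(normalize, rules) if rule not in redundant]
        for name, rules in grammars.items()
    }
    for number, rule in enumerate(redundant, start=1):
        updated[f"shared_{number}"] = [rule]
    return updated


tables = {
    "table_5_2": ["expr -> expr + term", "expr -> term", "term -> integer", "integer -> 1 | 2"],
    "table_5_3": ["expr -> expr + term", "expr -> term", "term -> character", "character -> a | b"],
}

if __name__ == "__main__":
    for name, rules in factor_out(tables).items():
        print(name, rules)
```

Plain string comparison is enough here because, as noted above, this form of redundancy is purely syntactic; it says nothing about rules that differ only in the names of their nonterminals.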
Another form of redundancy occurs when the same semantics is expressed by different CFGs. Grammars in this case differ due to nonterminal renaming (isomorphic grammars). This form of redundancy will hereafter be referred to as redundancy by syntax equivalence⁴.
The fact that isomorphic grammars have different nonterminal sets implies that their sets of production rules are also different. Since the start symbol of a grammar is interpreted as a type, these grammars introduce the possibility of attaching different names to a single type definition. A careful analysis is necessary in this case to identify the scenarios where different types need to be defined, and a domain specification needs to be provided to ensure that the type definitions are unique.
For arbitrary CFGs $G_1$ and $G_2$, it is undecidable [55] whether $L(G_1) = L(G_2)$. Therefore there is no effective approach to identify redundancies by syntax equivalence. In other words, this form of grammar redundancy cannot effectively be eliminated by operations performed on the structure that supports the semantics capturing of the mathematical concepts. For this reason the idea of normalization as a process to remove update anomalies is not considered.

⁴ Isomorphic grammars produce equivalent abstract syntax trees for all words in the language they generate; therefore they always express the same semantics. For this reason redundancy by syntax equivalence is only identified if the grammars involved are isomorphic.
When the semantics of mathematical concepts is captured by means of CFGs, the terminals of the grammar are associated with the names of the concepts. Hence the set of dependencies that exist among the terminals of the grammar describes the concept dependencies expressed by the rules of the grammar. Chapter [?] introduces the notion of grammatical dependency, which is a way of expressing relationships among grammar rules. This characteristic is applied to express the dependencies which exist among the terminals of a grammar. Knowledge of these dependencies identifies sets of grammars which may contain redundancies.
As has already been illustrated in Subsection 5.3.4.1, the syntax and portions of the semantics of mathematical concepts can be captured by CFG rules. This approach leads to the creation of a set of grammars which will be used to support the representation of the concepts that take part in a given instance of a document. Grammars may be created either by means of editing procedures or as the result of operations that involve other grammars. In both scenarios, the creation process will be simplified if an adequate grammar format is imposed. This format is proposed in Chapter [?] as a normal form for CFGs.
Two different implementation aspects will benefit from this normal form:
• the semantics capturing of mathematical concepts, and
• the composition of grammar fragments.
A normal form is needed for the grammatical structure used in the semantics capturing process in order to avoid definitions where the arrangement of nonterminals on the right hand side of the production rules hides the meaning of the concept to be captured. This problem is solved by adopting a set of templates that enforce the construction of production rules in a particular way, so that the meaning of the mathematical concepts can be correctly captured. These templates are the smallest structural components allowed when capturing mathematical concepts by CFG rules. As restrictions on grammar rules, they imply that the capturing approach may need to decompose the abstract concepts. This is necessary to ensure that concept components are captured by grammar rules that follow the format defined by the templates.
The composition process, by which grammar fragments may be combined to produce other definitions, should be free of any information that is not necessary for the successful completion of the desired grammar arrangement. This means that the composition process should not introduce definitions that carry redundant information.
The following sections discuss the possibility of capturing the meaning of abstract mathematical concepts by means of CFGs.
5.4
Representing Polynomials
The idea of expressing mathematical concepts as language fragments is used here as an aid to capture the semantics of mathematical concepts. With this technique, the definitions of mathematical concepts which contribute to the definition of other concepts can be isolated and approached by grammar fragments. A composition process will later combine all necessary grammar fragments as a way of representing complex mathematical concepts.
Since it is composed of abstract concepts, mathematics needs to be encoded in order to be communicated. The encoding provided by conventional mathematical notation is the representation format usually used for communicating mathematics. Although this notation is used to support the discussions of the capturing procedure this thesis proposes, it is important to emphasize that mathematics is composed of abstract concepts. For this reason encoding strategies are needed to support the manipulation of these concepts. For instance, a discussion involving a polynomial is simplified when this abstract concept is encoded according to the standard mathematical notation. Consider, for example, the following identity expression, which displays the elements of a polynomial as its right hand side term.
$$ k = abc + a^{2}b^{2}c^{2} + \cdots + a^{n}b^{n}c^{n} \tag{5.3} $$
In order to capture the meaning of equation (5.3) by CFGs, the meaning of each concept included in this equation can be expressed by a grammar fragment. This indicates that grammar fragments for the equality, juxtaposed multiplication, addition, power and addition_ellipsis operations need to be supplied.
It is important to observe that expression (5.3) uses the ellipsis (continuation dots) operation to express the repetitive addition of polynomial terms. This and_so_on abstract concept means that the addition pattern that starts with the first term of the polynomial continues and stops when the last term is reached. For this reason this operation is captured as an addition_ellipsis⁵ binary operation.
  term        →  first juxtaposed other
  other       →  second juxtaposed third
  juxtaposed  →  ε
  first       →  a power
  second      →  b power
  third       →  c power
  power       →  superscript identifier
Table 5.8: CFG fragment for expressing words from G.
Consider the right hand side of expression (5.3), where the polynomial is defined. One possible way to express it as grammar fragments is to consider each term of the polynomial as a word from the language $G = \{a^k b^k c^k \mid 2 \le k \le n\} \cup \{abc\}$. Table 5.8 illustrates one possible grammar fragment that recognizes the words defined by G. The grammar displayed there captures both the juxtaposed multiplication and the power concepts. The addition_ellipsis and addition operations may be captured by the grammar in Table 5.9.

  polynomial  →  polyexpr addition_ellipsis term
  polyexpr    →  polyexpr addition term
  polyexpr    →  term
Table 5.9: CFG fragment for expressing addition_ellipsis and addition operations.

In order to completely express equation (5.3) by CFGs, the equality concept needs to be considered. Table 5.10 provides a grammar fragment that captures this concept. The composition of the three grammars displayed in Tables 5.8 to 5.10 produces a CFG that recognizes equation (5.3).
⁵ The expression 1 < 2 < 3 < … < 5001 states that each integer from 1 to 5000 is less than its successor. The and_so_on concept in this scenario abstracts the notion that the logical condition less_than that applies to the first pair of integers continues until the last pair is reached. In this situation the and_so_on operation would be captured as a less_than_ellipsis binary operation.
  expression  →  leftside equals rightside
  leftside    →  IDENTIFIER
  rightside   →  polynomial
Table 5.10: CFG fragment for expressing equality operation.
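A minimal Python sketch of the composition step follows, under stated assumptions: rules are (lhs, rhs) pairs transcribed from Tables 5.8 to 5.10, an empty rhs stands for ε, and the names in TERMINALS (a, b, c, superscript, identifier, addition_ellipsis, addition, equals, IDENTIFIER) are assumed to be tokens, since none of the tables defines them. Composition is taken here to be the plain union of rule sets, which is a simplification of the composition process discussed in this thesis; compose, undefined_symbols and reachable are hypothetical names. Table 5.9 on its own leaves term undefined and Table 5.10 leaves polynomial undefined; after composition no symbol is left undefined and every nonterminal is reachable from expression.

```python
from typing import List, Set, Tuple

Rule = Tuple[str, Tuple[str, ...]]  # (lhs, rhs); an empty rhs stands for epsilon

TABLE_5_8: List[Rule] = [
    ("term", ("first", "juxtaposed", "other")),
    ("other", ("second", "juxtaposed", "third")),
    ("juxtaposed", ()),  # juxtaposition is written as nothing
    ("first", ("a", "power")),
    ("second", ("b", "power")),
    ("third", ("c", "power")),
    ("power", ("superscript", "identifier")),
]
TABLE_5_9: List[Rule] = [
    ("polynomial", ("polyexpr", "addition_ellipsis", "term")),
    ("polyexpr", ("polyexpr", "addition", "term")),
    ("polyexpr", ("term",)),
]
TABLE_5_10: List[Rule] = [
    ("expression", ("leftside", "equals", "rightside")),
    ("leftside", ("IDENTIFIER",)),
    ("rightside", ("polynomial",)),
]

# Assumption: these names are tokens, since no table defines them.
TERMINALS: Set[str] = {"a", "b", "c", "superscript", "identifier",
                       "addition_ellipsis", "addition", "equals", "IDENTIFIER"}


def compose(*fragments: List[Rule]) -> List[Rule]:
    """Union of the fragments' rules; a rule already present is not added twice."""
    seen: Set[Rule] = set()
    result: List[Rule] = []
    for fragment in fragments:
        for rule in fragment:
            if rule not in seen:
                seen.add(rule)
                result.append(rule)
    return result


def undefined_symbols(rules: List[Rule]) -> Set[str]:
    """Right-hand-side symbols that are neither terminals nor defined by some rule."""
    defined = {lhs for lhs, _ in rules}
    used = {symbol for _, rhs in rules for symbol in rhs}
    return used - defined - TERMINALS


def reachable(rules: List[Rule], start: str) -> Set[str]:
    """Nonterminals reachable from the start symbol."""
    defined = {lhs for lhs, _ in rules}
    seen, stack = {start}, [start]
    while stack:
        current = stack.pop()
        for lhs, rhs in rules:
            if lhs == current:
                for symbol in rhs:
                    if symbol in defined and symbol not in seen:
                        seen.add(symbol)
                        stack.append(symbol)
    return seen


if __name__ == "__main__":
    print(undefined_symbols(TABLE_5_10))  # {'polynomial'}
    grammar = compose(TABLE_5_8, TABLE_5_9, TABLE_5_10)
    print(undefined_symbols(grammar))     # set()
    print(sorted(reachable(grammar, "expression")))
```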
Although G has been used to list all terms of the polynomial equation, its words may also be applied to represent other mathematical concepts. Consider, for instance, the field of formal languages. In this case $a^k b^k c^k$ is viewed as a string of characters. The semantics capturing process should therefore be based on considering literal strings of characters as the syntactical structure to be processed. A string such as $a^2$ is therefore interpreted as the concatenation of a with itself, which generates aa. To capture the meaning of the words in G under this interpretation, a mechanism to represent the concatenation of $a^k$ with $b^k$ concatenated with $c^k$ needs to be provided. For this reason, each of $a^k$, $b^k$ and $c^k$ is to be recognized by a structure that accepts the concatenation of a character with itself k times.
As illustrated by the two different interpretations associated with the syntax defined by the words in G, the context in which concepts are expressed must also be taken into consideration. As different interpretations may be attached to any given syntax, a strategy to resolve the syntactical ambiguities introduced needs to be defined. The needed mechanism should be capable of supporting the capture of all possible interpretations associated with the syntax considered.
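The two readings of a word of G can be contrasted with a short Python sketch. It assumes the ASCII spelling a^2b^2c^2 for the superscripted word, and the function names parse_term, as_product and as_string are hypothetical. as_product follows the polynomial reading (juxtaposition as multiplication, superscript as power) under an assumed assignment of numbers to a, b and c, while as_string follows the formal-language reading (superscript as repeated concatenation). Both start from the same parsed syntax, which is why the context must be known before a meaning can be chosen.

```python
import re
from typing import Dict, List, Tuple

# One factor of a juxtaposed term: a letter optionally followed by ^exponent.
FACTOR = re.compile(r"([a-z])(?:\^(\d+))?")


def parse_term(term: str) -> List[Tuple[str, int]]:
    """Split a term such as 'a^2b^2c^2' into (base, exponent) pairs; no exponent means 1."""
    factors: List[Tuple[str, int]] = []
    position = 0
    while position < len(term):
        match = FACTOR.match(term, position)
        if match is None:
            raise ValueError(f"unexpected text at {term[position:]!r}")
        factors.append((match.group(1), int(match.group(2) or 1)))
        position = match.end()
    return factors


def as_product(term: str, values: Dict[str, int]) -> int:
    """Polynomial reading: juxtaposition is multiplication and the superscript is a power."""
    result = 1
    for base, exponent in parse_term(term):
        result *= values[base] ** exponent
    return result


def as_string(term: str) -> str:
    """Formal-language reading: x^k is the character x concatenated with itself k times."""
    return "".join(base * exponent for base, exponent in parse_term(term))


if __name__ == "__main__":
    print(as_product("a^2b^2c^2", {"a": 2, "b": 3, "c": 1}))  # 36
    print(as_string("a^2b^2c^2"))                             # aabbcc
```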
Exponents and indexing are concepts used in various fields of mathematics. These two concepts are usually described with the support of superscripts and subscripts. The section that follows discusses a grammar approach to capture both concepts.
5.5
Representing Subscripts and Superscripts
Subscripts and superscripts attached to literal strings of characters can be viewed as modifiers that carry additional meaning of a symbol. The semantics of both subscripts and superscripts may be captured by considering them as binary mathematical concepts whose arguments are the base and the subscript or superscript. They have right associativity and the highest precedence among the other concepts.
  words  →  words SUB word
  words  →  word
  word   →  word SUP index
  word   →  index
  index  →  NUMBER
  index  →  IDENTIFIER
  index  →  ( words )
Table 5.11: CFG fragment for subscripts and superscripts.
One possible grammar fragment to represent both subscripts and superscripts is provided in Table 5.11. The production rules associated with superscripts follow the rules for subscripts in order to ensure the correct precedence for both operators. Consider, for instance, the representation of $S_{i_j^k}$, where S is identified by means of an index i which itself has both a subscript and a superscript. The following expression represents the variable S in terms of the subscript and superscript concepts.

S sub(i sub j) sup k
A more complex example is provided below to illustrate the precedence characteristics of both superscripts and subscripts. The symbol = is used to represent the equivalence of the two forms of representation.
$$ S_{i_j^k}^{z_p^q} = \texttt{(S sub(i sub j) sup k) sup ((z sub p) sup q)} \tag{5.4} $$
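To show how the rules of Table 5.11 group the linear syntax, here is a small recursive-descent parser sketched in Python. It is an illustration, not the thesis's implementation: the token spelling (lower-case sub and sup, as in the expressions above), the tuple-shaped trees and the class name SubSupParser are assumptions. The left-recursive rules words → words SUB word and word → word SUP index are rewritten as loops, the usual way to parse left recursion top-down. Because word sits below words, a superscript is grouped before an enclosing subscript, so S sub(i sub j) sup k is read as S with the subscript (i sub j) sup k.

```python
import re
from typing import List, Optional, Tuple, Union

# A tree is either a leaf token or (operator, base, argument).
Tree = Union[str, Tuple[str, "Tree", "Tree"]]

# Tokens: parentheses, identifiers (including the keywords sub and sup) and numbers;
# any other character is skipped.
TOKEN = re.compile(r"\(|\)|[A-Za-z_]\w*|\d+")


class SubSupParser:
    """Recursive-descent parser for the rules of Table 5.11."""

    def __init__(self, text: str) -> None:
        self.tokens: List[str] = TOKEN.findall(text)
        self.position = 0

    def peek(self) -> Optional[str]:
        return self.tokens[self.position] if self.position < len(self.tokens) else None

    def take(self, expected: Optional[str] = None) -> str:
        token = self.peek()
        if token is None or (expected is not None and token != expected):
            raise SyntaxError(f"expected {expected!r}, found {token!r}")
        self.position += 1
        return token

    def words(self) -> Tree:
        # words -> words SUB word | word   (left recursion written as a loop)
        tree = self.word()
        while self.peek() == "sub":
            self.take("sub")
            tree = ("sub", tree, self.word())
        return tree

    def word(self) -> Tree:
        # word -> word SUP index | index   (left recursion written as a loop)
        tree = self.index()
        while self.peek() == "sup":
            self.take("sup")
            tree = ("sup", tree, self.index())
        return tree

    def index(self) -> Tree:
        # index -> NUMBER | IDENTIFIER | ( words )
        token = self.peek()
        if token == "(":
            self.take("(")
            tree = self.words()
            self.take(")")
            return tree
        if token is None or token in ("sub", "sup", ")"):
            raise SyntaxError(f"unexpected {token!r}")
        return self.take()


def parse(text: str) -> Tree:
    parser = SubSupParser(text)
    tree = parser.words()
    if parser.peek() is not None:
        raise SyntaxError(f"trailing input starting at {parser.peek()!r}")
    return tree


if __name__ == "__main__":
    print(parse("S sub(i sub j) sup k"))
    # ('sub', 'S', ('sup', ('sub', 'i', 'j'), 'k'))
```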
The use of CFGs to capture both subscript and superscript concepts does not express the context in which these definitions are considered. One way to approach this requirement is to introduce a scope mechanism to delimit the context in which concepts are expressed by unique syntax. Chapter [?] proposes a scope structure to solve the syntax ambiguity problem created by the overloading of the symbols used for the representation of the concepts.
5.5.1
Overloading Subscripts
The need for structuring mathematical notation around a set of domains is emphasized here. Consider the recurrence relation $a_{n+1} = 3a_n$, $n \ge 0$, $a_0 = 5$, which has $a_n = 5(3^n)$, $n \ge 0$, as its general solution. Also consider a two dimensional matrix defined as follows:

$$ A = \ldots \tag{5.5} $$

The 12th term of the recurrence relation, $a_{12} = 5(3^{12})$, has the same syntactical representation as the element of matrix A located on the first row and second column. Although these concepts share the same visual form, different interpretations are expected depending on the context in which they are presented. This context is interpreted as a domain or subfield and may be as general as, say, Discrete Mathematics or Linear Algebra. It may also be specific, depending on the characteristics of the concepts involved. By letting $a_{12}$ be part of a domain, the additional information necessary to determine its meaning uniquely is supplied. This form of structuring mathematics, by grouping knowledge into domains, will be used in this thesis as a mechanism to resolve ambiguities.
In the linear algebra domain, for instance, the syntax $a_{ij}$ represents the operation that establishes the link used to locate elements in the matrix A. The need for an operator to represent the dimensional link expressed by the subscript used for the location of matrix elements can be illustrated by the fact that $b_{100}$ is the 100th element of a one dimensional matrix. The interpretation associated with $b_{123}$ is not unique. If matrix B is three dimensional, for instance, then $b_{123}$ refers to one particular element in the structure. A one-to-one mapping between syntactical representation and element location is not possible if matrix B is two dimensional. Two interpretations are associated with the syntax $b_{123}$ in this case: either the element located in row 1, column 23 or the one located in row 12, column 3. This ambiguity could be resolved by the introduction of an operator to determine where the link between the dimensions of the structure is to take place. For instance, b sub(12, 3) could be used to reference the element located in row 12, column 3 of matrix B.
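The ambiguity of $b_{123}$ can be made concrete by listing every way the subscript digits can be cut into one index per dimension. The Python sketch below does this under the assumption that each index is written as a plain run of decimal digits; index_readings is a hypothetical name. For a three dimensional B there is a single reading, whereas for a two dimensional B there are two, which is the ambiguity that an explicit operator such as b sub(12, 3) removes.

```python
from itertools import combinations
from typing import List, Tuple


def index_readings(digits: str, dimensions: int) -> List[Tuple[int, ...]]:
    """All ways of cutting a run of subscript digits into one index per dimension."""
    readings: List[Tuple[int, ...]] = []
    # Choose dimensions - 1 cut points strictly inside the digit string.
    for cuts in combinations(range(1, len(digits)), dimensions - 1):
        bounds = (0, *cuts, len(digits))
        readings.append(tuple(int(digits[lo:hi]) for lo, hi in zip(bounds, bounds[1:])))
    return readings


if __name__ == "__main__":
    print(index_readings("123", 3))  # [(1, 2, 3)]
    print(index_readings("123", 2))  # [(1, 23), (12, 3)]
    print(index_readings("100", 1))  # [(100,)]
```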
5.5.2
Overloading Superscripted Symbols
Consider the overloading of both the + and − symbols provided by the expressions below.

$$ |f| = f^{+} + f^{-} \tag{5.6} $$

and

$$ f = f^{+} - f^{-} \tag{5.7} $$
  function_parts  →  positive_part negative_part
  positive_part   →  function_id sup +
  negative_part   →  function_id sup -
  function_id     →  IDENTIFIER
Table 5.12: CFG representation of the positive and negative parts of a function.
In the above expressions f is a function, $f^+$ represents the positive part of f, and $f^-$ the negative part of it [93]. The semantics attached to the + symbol in equation (5.6) indicates that this symbol is used to represent two operations, and that each instance of it aims at the representation of a different concept. The superscripted instance characterizes the unary postfix operation of taking the positive part of a function, whereas the binary infix instance represents addition. The definition presented in Table 5.12 illustrates a possible grammar fragment to represent both the positive and negative parts of a function.
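The two meanings of + in equation (5.6) can be kept apart explicitly in code. The Python sketch below uses the standard definitions $f^+ = \max(f, 0)$ and $f^- = \max(-f, 0)$, which are assumed here because the text does not spell them out; the names positive_part and negative_part, and the sample function $x^3 - x$, are hypothetical. The unary postfix operations build new functions, while the binary + and − combine their values, and the loop checks equations (5.6) and (5.7) at a few sample points.

```python
from typing import Callable

RealFunction = Callable[[float], float]


def positive_part(f: RealFunction) -> RealFunction:
    """Unary postfix 'sup +': x -> max(f(x), 0)."""
    return lambda x: max(f(x), 0.0)


def negative_part(f: RealFunction) -> RealFunction:
    """Unary postfix 'sup -': x -> max(-f(x), 0)."""
    return lambda x: max(-f(x), 0.0)


def f(x: float) -> float:
    return x ** 3 - x


f_plus, f_minus = positive_part(f), negative_part(f)

if __name__ == "__main__":
    for x in (-2.0, -0.5, 0.0, 0.5, 2.0):
        # The binary infix + and - act on values, not on functions.
        assert abs(f(x)) == f_plus(x) + f_minus(x)  # equation (5.6)
        assert f(x) == f_plus(x) - f_minus(x)       # equation (5.7)
    print("equations (5.6) and (5.7) hold at the sample points")
```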
5.6
Representing Matrices
The representation of matrices is usually done by means of a combination of upper and lower case letters. An upper case letter is used to denote the matrix itself, and the corresponding lower case letter combined with lower case subscripts denotes both its elements and their location in the matrix. Structural concepts such as vectors and matrices depend on the representation of lists, since both vectors and matrices characterize a collection of elements organized in a particular way. The grammar fragment illustrated in Table 5.13 presents the necessary rules for the definition of matrices.
  matrixrule  →  MATRIX { dimlist ( elist ) }
  dimlist     →  dimlist : size
  dimlist     →  size
  elist       →  elist , el
  elist       →  el
  el          →  IDENTIFIER
  size        →  NUMBER
Table 5.13: CFG fragment for matrices.
The following system of linear equations, represented in matrix format, is used to illustrate the syntax defined by the rules presented in Table 5.13.

$$ \begin{pmatrix} 3 & 1 \\ 0 & 3 \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = \begin{pmatrix} 4 \\ 5 \end{pmatrix} $$

The syntax enforced by the rules of Table 5.13 is presented below:
Matrix{2 : 2 (3, 1, 0, 3)} ∘ Matrix{2 : 1 (x1, x2)} = Matrix{2 : 1 (4, 5)}

where the symbol ∘ denotes the matrix multiplication operation. The operator , is introduced as a way of representing the matrix elements as nodes of a hierarchical relation between the entries of a matrix.
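The linear syntax of Table 5.13 can be turned back into nested rows, as in the following Python sketch. It assumes that the elements are listed in row-major order (so that Matrix{2 : 2 (3, 1, 0, 3)} stands for the coefficient matrix above) and accepts numbers as well as identifiers such as x1 as elements; the function names are hypothetical, and the check that the number of elements equals the product of the dimension sizes is an addition of the sketch, not a rule of the grammar.

```python
import math
import re
from typing import List, Union

Element = Union[int, str]

# Matrix{ size : size : ... ( el, el, ... ) }
MATRIX = re.compile(r"Matrix\{\s*([\d\s:]+?)\s*\((.*)\)\s*\}")


def reshape(flat: List[Element], sizes: List[int]) -> list:
    """Split a row-major list of elements into nested lists, one level per dimension."""
    if len(sizes) == 1:
        return flat
    step = math.prod(sizes[1:])
    return [reshape(flat[i * step:(i + 1) * step], sizes[1:]) for i in range(sizes[0])]


def parse_matrix(text: str) -> list:
    match = MATRIX.fullmatch(text.strip())
    if match is None:
        raise ValueError(f"not a Matrix{{...}} expression: {text!r}")
    sizes = [int(size) for size in match.group(1).split(":")]
    elements: List[Element] = []
    for item in match.group(2).split(","):
        item = item.strip()
        elements.append(int(item) if item.lstrip("-").isdigit() else item)
    if len(elements) != math.prod(sizes):
        raise ValueError(f"{len(elements)} elements do not fill a {sizes} matrix")
    return reshape(elements, sizes)


if __name__ == "__main__":
    print(parse_matrix("Matrix{2 : 2 (3, 1, 0, 3)}"))  # [[3, 1], [0, 3]]
    print(parse_matrix("Matrix{2 : 1 (x1, x2)}"))      # [['x1'], ['x2']]
```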
The representation of the power, inverse and transpose of a matrix by superscripted symbols does not carry the necessary semantics of each individual operation. For this reason, it is necessary to represent each one of these concepts by means of its own semantics. Although the power of a matrix is characterized by a binary operation, both inverse and transpose are unary operations and are usually recognized by syntactical structures in postfix form.
Matrices with only one row or column can also be considered as vectors. The syntactical representation of these concepts is usually obtained by means of single lower case letters typeset in boldface font. In this compact form of representation, the operator is identified not by means of symbols attached to the operands, but by the type of visual representation adopted for displaying the selected symbol.
5.7
Representing Sets of Numbers
The representation of sets of numbers as intervals is frequently used in algebra, for example $(a, b) = \{x \mid a < x < b\}$ and $[a, b] = \{x \mid a \le x \le b\}$. In this form of representing numbers, the delimiters do not always match. We illustrate this by the two expressions that follow.

$$ [a, b) = \{x \mid a \le x < b\} $$
$$ (a, b] = \{x \mid a < x \le b\} $$
  1   interval_var     →  left_delimiter values right_delimiter
  2   values           →  left_value , right_value
  3   left_value       →  IDENTIFIER
  4   right_value      →  IDENTIFIER
  5   left_delimiter   →  [
  6   left_delimiter   →  (
  7   right_delimiter  →  ]
  8   right_delimiter  →  )
Table 5.14: CFG fragment for intervals.
A possible grammar fragment for the representation of the four types of intervals is given in Table 5.14. Although this grammar can be used to represent number intervals, it is not useful for capturing the semantics of the concepts involved. The fact that the nonterminals left_delimiter and right_delimiter are not uniquely defined implies that production rule 1 covers four different sentence forms, one for each combination of delimiters. Another problem with this definition is that it requires a parser with lookahead greater than three.
The grammar fragment shown in Table 5.15 captures the semantics of the structure. This grammar is designed so that each pair of interval delimiters is uniquely represented by a production rule.
  interval          →  open_intervals
  interval          →  closed_intervals
  open_intervals    →  open_part RIGHT_OPEN_PAR
  open_intervals    →  open_part RIGHT_CLOSED_DEL
  closed_intervals  →  closed_part RIGHT_CLOSED_DEL
  closed_intervals  →  closed_part RIGHT_OPEN_PAR
  open_part         →  LEFT_OPEN_PAR body
  closed_part       →  LEFT_CLOSED_DEL body
  body              →  left_value COMMA right_value
  left_value        →  IDENTIFIER
  right_value       →  IDENTIFIER
  COMMA             →  ,
  RIGHT_CLOSED_PAR  →  )
  RIGHT_CLOSED_DEL  →  ]
  LEFT_OPEN_PAR     →  (
  LEFT_OPEN_DEL     →  [
Table 5.15: CFG fragment to capture the semantics of intervals.
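The point of Table 5.15, that each pair of delimiters carries its own meaning, can be illustrated by mapping each delimiter to a comparison. The Python sketch below assumes numeric endpoints written inline (for example [1, 2)) rather than the IDENTIFIER tokens of the grammar; contains, LEFT and RIGHT are hypothetical names.

```python
import operator
import re

# Each delimiter selects the comparison used at its end of the interval.
LEFT = {"[": operator.le, "(": operator.lt}   # a <= x  or  a < x
RIGHT = {"]": operator.le, ")": operator.lt}  # x <= b  or  x < b

INTERVAL = re.compile(r"\s*([\[(])\s*([^,\s]+)\s*,\s*([^\])\s]+)\s*([\])])\s*")


def contains(interval: str, x: float) -> bool:
    """Membership test for the four interval forms (a, b), [a, b], [a, b) and (a, b]."""
    match = INTERVAL.fullmatch(interval)
    if match is None:
        raise ValueError(f"not an interval: {interval!r}")
    left, a, b, right = match.groups()
    return LEFT[left](float(a), x) and RIGHT[right](x, float(b))


if __name__ == "__main__":
    for text in ("(1, 2)", "[1, 2]", "[1, 2)", "(1, 2]"):
        print(text, contains(text, 1), contains(text, 2))
```

Each of the four delimiter pairs selects a different pair of comparisons, mirroring the one-rule-per-pair design of Table 5.15.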
5.8
Representing Sums
The concept of summation is discussed in this section. Both the ambiguity and the extensibility problems associated with this operation are illustrated by examining its semantic characteristics. Consider the sum represented by the expression below.
$$ \sum_{i=1}^{6} i \tag{5.9} $$
Equation (5.9) illustrates the ambiguity involved in the use of the = symbol, whose meaning can represent either the start of a sequence of assignments to the variable i or the equality between two quantities. Although the syntax most commonly associated with the sum of a sequence of items includes the = symbol as a way of expressing the iteration process, the semantics of the summation construct does not require the equality operator. The concept of summation may be described as an operation on an expression that is evaluated according to a sequence of predefined items. One possible form of expressing this is by means of the syntax

$$ \texttt{Sum\{ range\_list : expression \}} \tag{5.10} $$
which captures the meaning of the summation concept. In Equation (5.10) Sum is a prefix binary operator and range_list defines the sequence over which expression is to be computed. The operand range_list captures the meaning of the iteration part of the summation construct.
The fact that a particular interpretation may be attached to the form in which range_list is syntactically represented is a problem to be considered whenever rendering is to take place. It is intuitive to associate the = symbol with the iterative component of the sum construct, as illustrated in Equation (5.9). However, the idea of range is more meaningful when this expression is represented as

$$ \sum_{1 \le i \le 6} i \tag{5.11} $$
  1   identity_expr  →  sample_expr = sample_expr
  2   sample_expr    →  expr
  3   sample_expr    →  sum
  4   sum            →  SUM { range_list : sample_expr }
  5   range_list     →  start , end
  6   start          →  identifier = expr
  7   identifier     →  ID
Table 5.16: Grammar for summation.
The grammar fragment illustrated in Table 5.16 allows summation constructs, such as the one in Equation (5.9), to be described by the syntax that follows.

$$ \texttt{Sum\{i = 1, 6 : i\}} \tag{5.12} $$
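The separation between the summation construct and its rendering can be sketched in Python: the linear syntax of (5.12) is parsed once into a small record, which can then be rendered either in the iterative style of (5.9) or in the range style of (5.11), or evaluated. The sketch only handles a start of the form identifier = number and a numeric end, keeps the summand as text, and takes the summand's value as a Python function when evaluating; SumConstruct and the function names are hypothetical.

```python
import re
from dataclasses import dataclass
from typing import Callable


@dataclass
class SumConstruct:
    variable: str
    start: int
    end: int
    summand: str  # kept as text; only used for rendering


# Sum{ identifier = start , end : summand }, a restricted form of rules 4 to 7 of Table 5.16
SUM = re.compile(r"Sum\{\s*(\w+)\s*=\s*(-?\d+)\s*,\s*(-?\d+)\s*:\s*(.+?)\s*\}")


def parse_sum(text: str) -> SumConstruct:
    match = SUM.fullmatch(text.strip())
    if match is None:
        raise ValueError(f"unsupported summation: {text!r}")
    variable, start, end, summand = match.groups()
    return SumConstruct(variable, int(start), int(end), summand)


def render_iterative(s: SumConstruct) -> str:
    """LaTeX in the style of (5.9): the = symbol marks the start of the iteration."""
    return rf"\sum_{{{s.variable}={s.start}}}^{{{s.end}}} {s.summand}"


def render_range(s: SumConstruct) -> str:
    """LaTeX in the style of (5.11): the iteration is shown as a range."""
    return rf"\sum_{{{s.start} \le {s.variable} \le {s.end}}} {s.summand}"


def evaluate(s: SumConstruct, summand: Callable[[int], int]) -> int:
    """Add summand(v) for every v from start to end, both limits included."""
    return sum(summand(value) for value in range(s.start, s.end + 1))


if __name__ == "__main__":
    construct = parse_sum("Sum{i = 1, 6 : i}")
    print(render_iterative(construct))       # \sum_{i=1}^{6} i
    print(render_range(construct))           # \sum_{1 \le i \le 6} i
    print(evaluate(construct, lambda i: i))  # 21
```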
There are situations where more complex iteration control is required, and sometimes the necessary summation condition is expressed as compound statements. The expression below illustrates this fact.
$$ x = \sum_{i=m/2}^{m+n} \; \sum_{\substack{j=0 \\ i+j=n}} (i+j) \tag{5.13} $$
In Equation (5.13) the inner summation includes a compound statement. The iteration mechanism is extended to support the compound condition, which makes use of a syntactically hidden conjunction to define the lower limit of the iteration. The meaning associated with the = symbol is not the same in its two occurrences in the conjoined condition. An ambiguity was introduced by the additional semantics attached to the = symbol as the result of an extension procedure.
  1     identity_expr   →  sample_expr EQ sample_expr
  2     sample_expr     →  expr
  3     sample_expr     →  sum
  4     sum             →  SUM { range_list : sample_expr }
  5     range_list      →  start , end
  5.1   start           →  single_start
  5.2   start           →  compound_start
  6     single_start    →  identifier = expr
  6.1   compound_start  →  single_start ∧ identity_expr
  7     identifier      →  ID
Table 5.17: Grammar for summation.
A possible grammar definition to support the conjoined condition can be obtained by modifying the fragment associated with the definition of the summation operation and by adding a grammar fragment to represent the compound version of the iteration. Table 5.17 illustrates a possible set of grammars that can be used to capture summation operations that have compound iteration statements.
Since the grammar proposed in Table 5.17 has been developed with the purpose of extending the recognition power provided by the grammar fragment in Table 5.16, some common structural knowledge is expected to be shared between the two. This is true, since rules 1 to 5 and 7 are the same in both fragments. Also, rule 6 is semantically equivalent in the two definitions; the two instances of this rule differ only in their left hand side nonterminals.
The grammars proposed for capturing the semantics of the summation concept illustrate the need for a composition process to support the extension of already defined constructs by reusing existing grammar fragments. Chapter 6 discusses the grammar extension problem and provides a solution in terms of grammar operations.
5.9
Conclusion
This chapter introduced the notion of using CFGs as the major formalism to capture the semantics of mathematical concepts. It discussed the advantages and limitations of using CFGs to support the dynamics of authoring mathematics.
The syntax of programming languages is usually specified by means of CFGs [95]. Structuring mathematical notation as a programming language has the advantage of allowing CFGs to be used for its specification and processing. Specification is supported by the CFG's structuring methods, which include composition, choice, repetition and recursion [95]. Effective and efficient parsing algorithms and tools are available to support its processing.
Although CFGs have successfully been used for the specification of the syntax of programming languages, this formalism is not adequate for the definition of the semantics of programming languages [100]. Another important limitation of this formalism relates to its static characteristic, which restricts its use to organizations that do not depend on the notion of update.
Chapter 6
Modelling Context Dependent Information
The notion of using CFGs to support the semantics capturing of mathematical concepts was introduced in Chapter 5. This chapter proposes the fundamentals of a document organization that models the dynamics of authoring mathematics. The model supports both the extensibility and the ambiguity characteristics of mathematical notation, and is capable of capturing the meaning of mathematical concepts by means of syntax defined during authoring.
6.1 Authoring Mathematics and Multimodality
This section presents the basic components of a structure to support the dynamic authoring process discussed in Section 5.3. As emphasized there, different interpretations may be assigned to a given syntax. This behavior is understood as instability in the binding between meaning and syntactical representation. Since it is a major characteristic of authoring mathematics, it needs to be addressed by any proposal to model this type of authoring.
The conventional notation used to communicate mathematics is characterized by a context-dependent binding of meaning to syntax. This dynamic way of attaching meaning to syntax is the mechanism available to the author for expressing knowledge through the symbol arrangement he/she believes is most adequate for communicating the ideas presented in the document.

One characteristic of this notation is that it leaves unspecified the domain to which the represented concepts belong. Although this simplifies the notation, it limits the rendering of concepts for communication in different modalities. The context-dependent quality also requires the author/user to know the context in which the appropriate meaning is to be associated with the syntactical representation of a concept. One way to approach this limitation is to transfer this knowledge requirement to the structure that is used for capturing the semantics of the constructs.
The organization proposed here addresses the dynamic mapping between meaning and representation by means of a meta-system. This structure establishes the methods necessary for capturing the semantics of mathematical concepts, leaving the definition of the desired notation to the author. When new concepts, or extensions to already defined constructs, are necessary, the author's involvement will be required to configure the system for capturing the constructs that need to be included in the document. The dynamic authoring process may be viewed as a modular organization composed of a Parameterized Notational Structure (PNS), a Hierarchical Intermediate Representation (HIR) and a Rendering Structure (RS).
A parameterized notational structure is an organization defined by a meta-structure, a programming language and a set of grammars. This set contains the grammars needed to capture the syntax and semantics of the mathematical concepts that have been included in a given document. The meta-structure provides the rules to be followed when creating the grammars that belong to the set. The programming language manipulates the grammars that have been created according to the meta-structure. This language also provides the notion of scope, which is applied to resolve syntax ambiguities. Statements of this language include the mathematical constructs that have been encoded according to a domain defined by a scope. In summary, this language provides a mechanism to aid the authoring of the mathematical concepts captured by the grammars in the set. It also provides, by means of the scope, a dynamic way to cope with syntactical ambiguities.
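The role of scope in resolving ambiguity can be illustrated with a minimal Python sketch. It is not the language introduced later in this chapter: it assumes, purely for illustration, that a scope pairs a domain name with a table mapping symbols to meanings, and that the innermost open scope wins. The class and method names (ScopedNotation, open_scope, resolve) are hypothetical.

```python
from typing import Dict, List, Optional, Tuple


class ScopedNotation:
    """Hypothetical scope stack: each scope binds symbols to meanings within a domain."""

    def __init__(self) -> None:
        self._scopes: List[Tuple[str, Dict[str, str]]] = []

    def open_scope(self, domain: str, bindings: Dict[str, str]) -> None:
        self._scopes.append((domain, dict(bindings)))

    def close_scope(self) -> None:
        self._scopes.pop()

    def resolve(self, symbol: str) -> Optional[str]:
        """Return 'domain:meaning' from the innermost scope that binds `symbol`."""
        for domain, bindings in reversed(self._scopes):
            if symbol in bindings:
                return f"{domain}:{bindings[symbol]}"
        return None  # the symbol is not bound in any open scope


notation = ScopedNotation()
notation.open_scope("reals", {"|x|": "absolute_value"})
outer = notation.resolve("|x|")     # 'reals:absolute_value'
notation.open_scope("sets", {"|x|": "cardinality"})
inner = notation.resolve("|x|")     # 'sets:cardinality'
notation.close_scope()
restored = notation.resolve("|x|")  # 'reals:absolute_value' again
```

The same string |x| receives two meanings, and only the active scope decides which one applies, which is the kind of dynamic disambiguation described above.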
Intermediate representations of documents are generated as a result of the interaction between the author and the PNS. These hierarchical intermediate representations supply the information that the rendering structure manipulates in order to generate different views of a document. The set of document views produced by the RS depends on the purpose of the application. For this reason, the RS processes the HIR based on knowledge provided by application experts.
Figure 6.1: Structure to support dynamic authoring and multimodality processing (modules PNS, HIR and RS).
The interaction between the three modules is illustrated in Figure 6.1. Each arrow/function pair represents how information is processed, and is interpreted as follows. Function $f()$ represents the service provided by the PNS to its only client, the HIR, which consists of creating an intermediate document representation. The functions $h_1(), \ldots, h_k()$ represent the services that the RS provides. These services are based on the knowledge stored in the HIR, which is shared with the RS through $g()$. They are mechanisms to produce different views of the encoded document obtained from the HIR. The views, represented by the boxes labeled $v_1, \ldots, v_k$ in Figure 6.1, are the result of applying the rendering formats required by the final application.
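The arrow/function pairs of Figure 6.1 can be read as a pipeline. The following Python sketch is a hypothetical illustration rather than the thesis's implementation: f() builds an HIR tree from a construct assumed to be already disambiguated and encoded as nested tuples, g() hands that tree to the RS, and two renderers, h_print() and h_speech(), stand in for h_1(), ..., h_k() by producing a printed and a spoken view of |sin x|. The HIRNode type and the concept names are assumptions.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class HIRNode:
    """Hypothetical node of a hierarchical intermediate representation."""
    concept: str
    children: List["HIRNode"] = field(default_factory=list)
    name: str = ""  # only used by variable leaves


def f(encoded: Tuple) -> HIRNode:
    """PNS service: build the HIR from an already disambiguated, encoded construct."""
    concept, *rest = encoded
    if concept == "variable":
        return HIRNode("variable", name=rest[0])
    return HIRNode(concept, [f(child) for child in rest])


def g(hir: HIRNode) -> HIRNode:
    """HIR service: share the representation with the RS (returned unchanged)."""
    return hir


def h_print(node: HIRNode) -> str:
    """One RS service: a printed view in conventional notation."""
    if node.concept == "variable":
        return node.name
    if node.concept == "absolute_value":
        return "|" + h_print(node.children[0]) + "|"
    if node.concept == "sine":
        return "sin " + h_print(node.children[0])
    raise ValueError(f"no printed rendering for {node.concept}")


def h_speech(node: HIRNode) -> str:
    """Another RS service: a spoken view of the same HIR."""
    if node.concept == "variable":
        return node.name
    if node.concept == "absolute_value":
        return "the absolute value of " + h_speech(node.children[0])
    if node.concept == "sine":
        return "sine of " + h_speech(node.children[0])
    raise ValueError(f"no spoken rendering for {node.concept}")


hir = f(("absolute_value", ("sine", ("variable", "x"))))
views = [h(g(hir)) for h in (h_print, h_speech)]
# views == ['|sin x|', 'the absolute value of sine of x']
```

Both views are computed from the same HIR, and neither renderer needs to know how the construct was authored, which mirrors the separation between the PNS and the RS.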

The diagram in Figure 6.2 describes the dynamics of the proposed model. The following discussion uses the information provided in Table 6.1 and the diagram in Figure 6.2 to present the operational organization of the complete process. According to this diagram, every intermediate document $i$ is the result of the intentions of the author $a$, coded as statements of the language $l$, that manipulate the concepts defined in the set of grammars $g$. As illustrated in Figure 6.2, knowledge of the meta-structure $m$ is necessary to update the set of grammars. This may be done by means of the programming language $l$, which updates the grammar set based on previously defined grammars. Another way to update these grammars is the direct use of a text editor. Editing mechanisms used for this purpose are represented by $e$ in Figure 6.2. This way of manipulating $g$ is needed whenever the meaning of a concept cannot be captured by means of $l$. The provision of all grammars necessary for supporting the dynamic authoring process is therefore the result of actions taken by the author that involve the structure defined by the PNS. The various document applications $d$ may be obtained from the intermediate representation $i$ by the application specialist $s$. For each application, knowledge of the specific semantics $p$, as well as of the rendering mechanisms $r$, is necessary.

Modules: PNS, HIR, RS
Component   Documents processing entity
a           author
s           application specialist
l           programming language
m           meta-language
g           set of grammars
i           intermediate document representation
p           application specific semantics
r           rendering mechanism
d           document application
e           editing
Table 6.1: Components involved in dynamic authoring for multimodality.

Figure 6.2: A sketch of the dynamics of the authoring/rendering process.
Usually the author is interested only in providing the semantics for the assumed standard usage of the document, which most of the time involves only a printed view. The semantics for other usages, such as voice, could be obtained with the support of an application specialist.
Electronic documents make it possible to render the abstract concepts that compose the document's logical structure as different concrete variants. Another characteristic of electronic documents is that the meanings of the included concepts need to be properly encoded to allow their processing by the computer. It is also difficult, or sometimes even impossible, to predict all potential applications that may be assigned to the abstract concepts that comprise a document. All these characteristics may be supported by the availability of:
1. an adequate semantics-based encoding of the mathematical concepts, and
2. a set of associated rendering mechanisms to convert the encoded concepts to the respective expected formats.
The encoded concepts are represented in Figure 6.2 by the circle named $i$, and the rendering mechanism by the circle named $r$. As illustrated, the intermediate document representation is the only component visible to the rendering mechanism. For this reason, all ambiguities must be resolved in the PNS during authoring.
The proposed organization supports the multimodal communication of concepts by modeling the human behavior involved in the authoring activity. Since this thesis aims at capturing the semantics of mathematical concepts, it only considers the PNS portion of the organization presented. Sections 6.3 to 6.8 describe the fundamental components of the PNS module: Sections 6.3, 6.4, 6.5 and 6.7 discuss the grammar component, while Sections 6.6 and 6.8 introduce the language and the meta-language components, respectively. The remainder of this chapter discusses a grammar-based structure to model the dynamics of authoring mathematics.
6.2 A Formal Structure for Document Authoring
In this section it is assumed that documents are created according to the authoring model presented in Section 6.1. That model establishes a set of steps which can be followed during the creation of a single document. This idea is now extended by the notion that a document may be considered the result of a set of modifications applied to other documents. This property, as well as an exception to it, is discussed next.
The proposed document structure is based on the assumption that any document in its final version is the result of a composition process in which intermediate versions of the document are produced. As the author's ideas evolve and new concepts need to be included, different versions of the document are generated. These versions can be interpreted as blueprints of the author's capacity to communicate ideas and concepts.
Three important stages related to the versions of a document produced during the authoring process are identified here. The first is the stage in which the author makes use of any available concept definitions. Documents created during this stage are called default documents. Another stage is the final one. At this stage the authoring process is over and the outcome is the final document. In general, many different versions of a document are created before the final one is produced. This leads to the third stage, the intermediate one, in which all intermediate versions of a document are created. When only one document version is produced during the complete authoring process, the default, final and intermediate versions are the same.
At any instant during the authoring process, the structure required to support the creation of a particular version of a document is the result of a process involving a set of grammars. Each individual grammar contributes to capturing and representing at least one mathematical concept, and has been included in the document's supporting structure by one of the following three approaches. Each grammar fragment either
1. has been created by standard editing procedures,
2. has already been defined, or
3. has resulted from grammar operations.
The structure proposed in this chapter organizes grammars into directories, and a directory may include definitions created by any of the three approaches mentioned above¹. It is assumed that an author who uses the model proposed here will often have a set of concept definitions available which may be used during the creation of the default document. The location of each definition should be included as part of the referencing process. These locations may range from a local file system to the World Wide Web.
As proposed in Section 4.6, authoring environments are represented in terms of a document structure and the system's interface. The expression $V = (S, I)$ was used for this purpose. In this section the document structure is defined as an organization that supports the dynamic character of document authoring, which is achieved here by means of an adaptable organization. For this reason, the document structure $S$ will be called the document instance structure. The following subsection introduces a grammar-based structure to model the dynamics of authoring mathematics.
6.2.1 Grammars and Dynamic Document Authoring
A document instance structure $S_j$, for $j \ge 0$, is a tuple

$$S_j = (D_j, c) \tag{6.1}$$

where $c$ is a binding control mechanism and $D_j$ is the semantic structure. The subscript $j$ identifies the version of the document structure being considered. The binding control $c$ is a grammar whose purpose is to provide an environment in which the semantic structure required for the document instance structure is placed. This environment supports document authoring behavior and, for this reason, must be independent of versions.
The semantic structure $D_j$ is a finite sequence

$$D_j = (G_1^j, G_2^j, \ldots, G_{n_j}^j) \tag{6.2}$$

of finite sets of grammars.

A domain² is a grammar in $G_i^j$, $1 \le i \le n_j$, that contains both the syntax and portions of the semantics needed to capture the meaning of a set of mathematical concepts. The set $G_i^j$ is called a domain directory or just a directory.

¹ Refer to Figures 6.1 and 6.2 and Table 6.1 for an overall view of the authoring mechanism.
² As discussed in Chapter 5, a CFG is considered equivalent to a data type; therefore domain and data type are also equivalent.
For a given version $j$ of a document whose document instance structure is as defined in Expression 6.1, the collection of all grammars found in that version of the document is called the document dictionary. This collection is defined as

$$\bigcup_{i=1}^{n_j} G_i^j \tag{6.3}$$
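The structures of Expressions 6.1 to 6.3 can be pictured with a small Python data model. This is a sketch under simplifying assumptions: a grammar is represented only by its name, and the class, field and grammar names are hypothetical. Its point is that the document dictionary of one version is simply the union of that version's directories.

```python
from dataclasses import dataclass
from typing import FrozenSet, Tuple

Grammar = str  # hypothetical stand-in: a grammar is referred to by its name


@dataclass(frozen=True)
class DocumentInstanceStructure:
    """S_j = (D_j, c) from Expression 6.1."""
    version: int                                        # j
    semantic_structure: Tuple[FrozenSet[Grammar], ...]  # D_j = (G_1^j, ..., G_{n_j}^j)
    binding_control: Grammar                            # c, independent of the version

    def dictionary(self) -> FrozenSet[Grammar]:
        """Document dictionary (Expression 6.3): the union of all directories G_i^j."""
        result: FrozenSet[Grammar] = frozenset()
        for directory in self.semantic_structure:
            result = result | directory
        return result


s0 = DocumentInstanceStructure(
    version=0,
    semantic_structure=(frozenset({"catEquality", "integer"}),
                        frozenset({"integer", "summation"})),
    binding_control="binding",
)
print(sorted(s0.dictionary()))  # ['catEquality', 'integer', 'summation']
```

A grammar that appears in several directories, such as integer here, is counted once in the dictionary, as the set union in Expression 6.3 implies.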
Each $G_i^j \in D_j$ is the union of three sets, one for each kind of grammar, that is,

$$G_i^j = N_i^j \cup F_i^j \cup C_i^j \tag{6.4}$$
The grammars in $N_i^j$ are grammars created by standard editing mechanisms. The grammars in $F_i^j$ are grammars that have already been created. They are ready to be used and satisfy the following condition:

$$F_i^j \begin{cases} = F_1^0 & \text{if } j = 0,\ i = 1,\\ \subseteq \bigcup_{k=1}^{i-1} G_k^j & \text{if } i > 1. \end{cases} \tag{6.5}$$
The third set, $C_i^j$, collects all grammars that are introduced by the two binary operations in the set $B = \{\%, \circ\}$³. The set $C_i^j$ is defined as follows:

$$C_i^j = \{\, h \, P \, h' \mid h, h' \in (F_i^j \cup N_i^j) \wedge P \in B \,\} \tag{6.6}$$

Chapter 7 presents a set of examples involving the organization introduced in this section. The examples are deferred because the formalisms needed to support the concepts introduced here are provided throughout the remainder of this chapter. A normal form for CFGs is proposed in the following section.
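Expressions 6.4 and 6.6 describe how a directory is assembled from edited, reused and composed grammars. Because the two operations in $B$ are only defined in Section 6.4, the Python sketch below does not compute them: it records a composed grammar as the formal term (h, P, h'). The operation symbols and the grammar names vectorNorm and absValue are placeholders chosen for illustration.

```python
from itertools import product
from typing import FrozenSet, Tuple

# Placeholder symbols for the two binary operations in B; their real syntax and
# semantics are defined elsewhere, so a composed grammar is kept as a formal term.
B: Tuple[str, ...] = ("%", "o")

Term = Tuple[str, str, str]  # (h, P, h') standing for the grammar h P h'


def composed(n: FrozenSet[str], f: FrozenSet[str]) -> FrozenSet[Term]:
    """C_i^j = { h P h' | h, h' in F_i^j ∪ N_i^j, P in B } (Expression 6.6)."""
    operands = sorted(n | f)
    return frozenset((h, p, h2) for h, p, h2 in product(operands, B, operands))


def directory(n: FrozenSet[str], f: FrozenSet[str]) -> FrozenSet:
    """G_i^j = N_i^j ∪ F_i^j ∪ C_i^j (Expression 6.4)."""
    return n | f | composed(n, f)


n_i = frozenset({"vectorNorm"})  # hypothetical grammar written with an editor
f_i = frozenset({"absValue"})    # hypothetical grammar that was already available
c_i = composed(n_i, f_i)
print(len(c_i))                                 # 8 = 2 operands x 2 operations x 2 operands
print(("absValue", "o", "vectorNorm") in c_i)  # True
print(len(directory(n_i, f_i)))                # 10
```

Since both operands range over $F_i^j \cup N_i^j$ only, a single directory holds one level of composition; deeper compositions would arise across successive directories or versions.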
³ The definitions of these two operations require that their operands be grammars that satisfy the extension normal form criteria introduced in Section 6.3. Both the syntax and the semantics of the two operations are defined in Section 6.4.
6.3 Structuring with Grammars
The design strategy introduced in Section 5.3 suggests the use of CFG fragments to capture the semantics of mathematical concepts. The modular structure proposed there is based on assigning a unique nonterminal to represent the meaning of a concept. In this scheme, the set of productions used to define a nonterminal is viewed as the specification of a data type, which is represented by the grammar's start symbol⁴. The mathematical constructs recognized by this organization are, at run time, the instances of the associated types, as defined by the grammar⁵. For this reason they are considered objects.
As mentioned in Section 5.3, the use of normal forms would benefit both the semantics capturing and the grammar composition activities. The connection between semantics capturing and normal forms is approached here by defining a set of templates. These templates establish the restrictions that grammar rules for semantics capturing must follow.
The conventional mathematical notation represents concepts as strings of symbols. The interpretation of any of these strings is based on the arrangement of the symbols and on the domain (field) in which the definitions are proposed. Although the number of operands that can be attached to an operator is determined by the concept to be represented, the position of an operator inside an expression is usually limited to three possibilities: operators may be arranged in infix, prefix or postfix format, depending on their placement relative to their operands in the expression⁶. They are considered infixed if they have both left and right operands, prefixed if only right operands are provided, and postfixed if operands are placed only on the left side of the operator. Variations of this scheme are necessary to support situations where the object has no operands. A vector illustrates this scenario: in the standard notation it is often encoded as a single lowercase letter in boldface type, so its representation may be viewed as either a prefixed or a postfixed expression with no operands.

⁴ It is assumed that mathematical concepts only have the expected meaning if the domain directory in which they are defined is considered.
⁵ Although objects in programming languages are understood as the result of class instantiations, the interpretation attached to them here differs in that the instances are not automatically generated by the language processor. In the proposed model, grammars correspond to classes and the expressions are the objects defined during authoring. In this scenario, objects are the result of an incremental process during which portions of the object, or the complete object, are provided by the author.
⁶ An exception to this rule is the representation of juxtaposed multiplication, as used in polynomials, where no explicit operator is provided.
A normal form for CFGs is therefore proposed as a way of structuring grammars to support the expression formats discussed. The terminal symbols of the proposed structure represent the operator's name, and the nonterminals represent the operands and the necessary delimiters. This grammar structure also provides the mechanism needed to support recursive definitions, which are required to capture repeated occurrences of certain types of operators in expressions.
Definition 2 A CFG $G = (N, T, P, S)$ is said to be in Extension Normal Form (ENF) if, for all $A \in N$, with $a \in T$ and $\alpha \in N^*$, there are only four kinds of productions⁷ in $P$. They are:
(1) $A \to a\alpha$
(2) $A \to \alpha a$
(3) $A \to \alpha a \alpha$
(4) $A \to \alpha$
Theorem 1 For every CFG $G$ such that $\varepsilon \notin L(G)$, one can construct an equivalent CFG in Extension Normal Form.

Proof: This result follows from the super-normal-form theorem in [71, 106].
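Membership in ENF is a purely syntactic check on each production, so it can be sketched as a small Python function. The reading of the four forms is an assumption of this sketch: (1) one terminal followed by nonterminals, (2) nonterminals followed by one terminal, (3) one terminal with nonterminals on both sides, and (4) a nonempty string of nonterminals only. Productions are (left-hand side, right-hand side) tuples, and the function names are hypothetical.

```python
from typing import FrozenSet, Optional, Sequence, Tuple

Production = Tuple[str, Tuple[str, ...]]  # (left-hand side, right-hand side)


def enf_kind(prod: Production, nonterminals: FrozenSet[str],
             terminals: FrozenSet[str]) -> Optional[int]:
    """Return the ENF kind (1-4) of a production, or None if it fits none of them."""
    lhs, rhs = prod
    if lhs not in nonterminals or not rhs:
        return None
    if any(s not in nonterminals and s not in terminals for s in rhs):
        return None                      # unknown symbol
    positions = [k for k, s in enumerate(rhs) if s in terminals]
    if not positions:
        return 4                         # A -> alpha, nonterminals only
    if len(positions) != 1:
        return None                      # at most one operator name per rule
    k = positions[0]
    if k == 0:
        return 1                         # A -> a alpha (prefix; alpha may be empty)
    if k == len(rhs) - 1:
        return 2                         # A -> alpha a (postfix)
    return 3                             # nonterminals on both sides (infix)


def is_enf(productions: Sequence[Production], nonterminals: FrozenSet[str],
           terminals: FrozenSet[str]) -> bool:
    return all(enf_kind(p, nonterminals, terminals) is not None for p in productions)


N = frozenset({"expr", "term_cat"})
T = frozenset({"EQUALS"})
print(enf_kind(("expr", ("expr", "EQUALS", "term_cat")), N, T))  # 3
print(enf_kind(("expr", ("term_cat",)), N, T))                   # 4
print(enf_kind(("expr", ("EQUALS", "EQUALS")), N, T))            # None
```

A rule with two terminals, such as the one of grammar $G_3$ in the example later in this section, is rejected because it matches none of the four forms.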
Each production rule of the ENF may be interpreted as an atomic grammar fragment. To achieve this, each of the four kinds of rules proposed by the ENF is assumed to define a CFG.
In Section 5.3 the correspondence between CFGs and types was proposed. This indicates that the definition of a type is a function of the number and structure of the grammar's rules. For any given CFG rule, the combination of terminals and nonterminals determines the type of the rule. Rules may therefore be classified, according to their numbers of terminals and nonterminals, as structured or non-structured. Non-structured rules define primitive types such as integer, real and character. Rules that cannot be associated with any type are also considered non-structured. Structured rules define types whose meaning may depend on information provided by other rules.

⁷ That $G$ is in ENF means that $G$ is an interpretation of a 2-symbol CFG form [106] with rules only of the types listed.
The following definitions impose restrictions on grammars in ENF as a way to classify these grammars as structured or non-structured. The resulting grammar fragments are the building blocks used in the semantics capturing process.
Definition 3 A CFG $G = (N, T, P, S)$ in ENF is called an operatorless grammar if $N = \{S, B\}$, $T = \emptyset$ and $P = \{S \to B\}$. The rule $S \to B$ is called an operatorless⁸ production.
Operatorless grammars are used to introduce specializations: a concept associated with $S$ is specialized to $B$. Any instantiation of $B$ is therefore an instantiation of $S$.
Definition 4 A CFG $G = (N, T, P, S)$ in ENF is called a primitive grammar if $N = \{S\}$, $T = \{a\}$ and $P = \{S \to a\}$. The rule $S \to a$ is called a primitive production.
Primitive grammars introduce atomic types: the type assigned to the grammar's nonterminal does not depend on the type associated with any other nonterminal.
Definition 5 A CFG $G = (N, T, P, S)$ in ENF is called a basic grammar if, for $\alpha \in (N \cup T)^+$, its set of rules is $P = \{S \to \alpha\}$. The rule $S \to \alpha$ is called a basic production, and it is neither a primitive production nor an operatorless production.

Basic grammars are type constructors, used to create composite types. In this case the type assigned to the start symbol depends on the types associated with the other nonterminals that are part of the rule.
Operatorless, primitive and basic grammars are the essential components involved in the semantics capturing activity. For this reason they will be referred to as fundamental grammars.
⁸ This type of production is often called a unit production [55, 107, 69].
Definition 6 All CFGs in ENF that are neither operatorless, primitive nor basic are called derived grammars. Derived grammars that have no operatorless productions are called reduced grammars.
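Definitions 3 to 6 amount to a decision procedure over the rules of an ENF grammar. The Python sketch below classifies a grammar accordingly and reproduces the example that follows, in which $G_1$ and $G_2$ are basic and $G_3$ is not in ENF. It uses a simplified ENF test, assumed for this sketch only (every right-hand side is nonempty, has at most one terminal, and uses only declared symbols); the function names are hypothetical.

```python
from typing import FrozenSet, List, Tuple

Production = Tuple[str, Tuple[str, ...]]  # (left-hand side, right-hand side)


def _is_unit(rhs: Tuple[str, ...], nonterminals: FrozenSet[str]) -> bool:
    return len(rhs) == 1 and rhs[0] in nonterminals


def _in_enf(productions: List[Production], nonterminals: FrozenSet[str],
            terminals: FrozenSet[str]) -> bool:
    """Simplified ENF test: nonempty right-hand sides with at most one terminal."""
    for _, rhs in productions:
        n_t = sum(1 for s in rhs if s in terminals)
        n_n = sum(1 for s in rhs if s in nonterminals)
        if not rhs or n_t > 1 or n_t + n_n != len(rhs):
            return False
    return True


def classify(nonterminals: FrozenSet[str], terminals: FrozenSet[str],
             productions: List[Production], start: str) -> str:
    """Classify a grammar following Definitions 3 to 6."""
    if not _in_enf(productions, nonterminals, terminals):
        return "not in ENF"
    if len(productions) == 1 and productions[0][0] == start:
        rhs = productions[0][1]
        if _is_unit(rhs, nonterminals) and not terminals and nonterminals == {start, rhs[0]}:
            return "operatorless"                      # Definition 3
        primitive_rule = len(rhs) == 1 and rhs[0] in terminals
        if primitive_rule and nonterminals == {start} and terminals == {rhs[0]}:
            return "primitive"                         # Definition 4
        if not primitive_rule and not _is_unit(rhs, nonterminals):
            return "basic"                             # Definition 5
    # Definition 6: any other ENF grammar is derived; reduced if it has no unit rule.
    has_unit = any(_is_unit(rhs, nonterminals) for _, rhs in productions)
    return "derived" if has_unit else "derived (reduced)"


N1, T1 = frozenset({"S1", "B", "C"}), frozenset({"a"})
print(classify(N1, T1, [("S1", ("a", "B", "C"))], "S1"))       # basic
print(classify(N1, T1, [("S1", ("B", "C", "a"))], "S1"))       # basic
print(classify(N1, T1, [("S1", ("a", "B", "C", "a"))], "S1"))  # not in ENF
```

The order of the tests matters: a grammar is first checked against ENF, and only single-rule grammars can be fundamental, so everything else falls through to Definition 6.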
The following example illustrates the notion of a basic grammar. Consider the grammars
$G_1 = (N_1, T_1, P_1, S_1)$ with $N_1 = \{S_1, B, C\}$, $T_1 = \{a\}$, $P_1 = \{S_1 \to aBC\}$,
$G_2 = (N_1, T_1, P_2, S_1)$ with $P_2 = \{S_1 \to BCa\}$, and
$G_3 = (N_1, T_1, P_3, S_1)$ with $P_3 = \{S_1 \to aBCa\}$.
$G_1$ and $G_2$ are both basic grammars. $G_3$ is not basic since it is not in ENF.
6.3.1 Mathematical Concepts and Grammatical Dependencies
The fact that the definition of a mathematical concept usually depends on other concepts indicates the existence of relationships among them. In this thesis, this characteristic is interpreted as a dependency relation in which one concept is the dependent and a set of others are the determinants. This relation is represented by an arrow that starts at the set of determinants and points to the dependent. The following two examples illustrate this notion of dependency.
The concept of absolute value is defined as an operator that returns the unsigned version of the expression supplied as its argument. In conventional notation, $|\sin x|$ encodes the absolute value of the sine function computed for argument $x$. Assuming $A$ and $S$ represent the absolute value and the sine function respectively, the relation between these two concepts, in the context of $|\sin x|$, is indicated as follows:

$$A \Leftarrow S$$

This dependency relation establishes a hierarchical relationship in which $A$ is the parent of its only child, the $S$ construct. In this case $S$ is the determinant and $A$ the dependent.
The concatenation of strings of characters is often encoded as an infix expression, with the + symbol representing the operation on the operands [107]. An alternative encoding is used when a single string of characters is to be concatenated with itself: the operation may be encoded as a power expression, whose expected result is the original string concatenated with itself the number of times indicated by the exponent. The two encodings are illustrated by the equality expression

a * 3 + b = aaab

where the symbols *, + and = represent the power, concatenation and equality operations, respectively.
catEquality
expr → expr EQUALS term_cat
expr → term_cat
term_cat → term_cat CONCATENATION term
term_cat → term
term → term POWER factor
term → term_string
term_string → STRING
factor → INTEGER
Table 6.2: CFG for equality of strings of characters.
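To make the two encodings and the grammar of Table 6.2 concrete, the following Python sketch evaluates catEquality expressions by recursive descent. It is an illustration, not part of the thesis: it assumes STRING tokens match [a-j]+ and INTEGER tokens are nonzero positive integers, uses the operator characters *, + and = from the example, turns the left-recursive rules into loops, and reads a chain of equalities as "all sides equal". All names are hypothetical.

```python
import re
from typing import List, Tuple, Union

# Assumed token classes: STRING = [a-j]+, INTEGER = nonzero positive integer.
_TOKEN = re.compile(r"\s*(?:(?P<STRING>[a-j]+)|(?P<INTEGER>[1-9][0-9]*)|(?P<OP>[*+=]))")


def tokenize(text: str) -> List[Tuple[str, str]]:
    tokens, pos, text = [], 0, text.strip()
    while pos < len(text):
        m = _TOKEN.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected input: {text[pos:]!r}")
        tokens.append((m.lastgroup, m.group(m.lastgroup)))
        pos = m.end()
    return tokens


class CatEquality:
    """Recursive-descent evaluator; left-recursive rules become loops."""

    def __init__(self, text: str) -> None:
        self.tokens = tokenize(text)
        self.i = 0

    def _next_is(self, op: str) -> bool:
        return self.i < len(self.tokens) and self.tokens[self.i] == ("OP", op)

    def _take(self, kind: str) -> str:
        if self.i >= len(self.tokens) or self.tokens[self.i][0] != kind:
            raise SyntaxError(f"expected {kind} at token {self.i}")
        self.i += 1
        return self.tokens[self.i - 1][1]

    def expr(self) -> Union[str, bool]:
        # expr -> expr EQUALS term_cat | term_cat
        sides = [self.term_cat()]
        while self._next_is("="):
            self.i += 1
            sides.append(self.term_cat())
        return sides[0] if len(sides) == 1 else all(s == sides[0] for s in sides)

    def term_cat(self) -> str:
        # term_cat -> term_cat CONCATENATION term | term
        value = self.term()
        while self._next_is("+"):
            self.i += 1
            value += self.term()
        return value

    def term(self) -> str:
        # term -> term POWER factor | term_string; term_string -> STRING; factor -> INTEGER
        value = self._take("STRING")
        while self._next_is("*"):
            self.i += 1
            value *= int(self._take("INTEGER"))
        return value


def evaluate(text: str) -> Union[str, bool]:
    parser = CatEquality(text)
    result = parser.expr()
    if parser.i != len(parser.tokens):
        raise SyntaxError("unexpected trailing input")
    return result


print(evaluate("a * 3 + b = aaab"))  # True
print(evaluate("ab * 2 + c"))        # ababc
```

Because power is handled inside term and concatenation inside term_cat, the nesting of the grammar alone gives power a higher precedence than concatenation, and concatenation a higher one than equality.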
The operations and operands involved in this expression are described by the grammar whose production rules are given in Table 6.2. The hierarchy imposed by the rules of grammar catEquality⁹ establishes the following seven dependency relations:
EQUALS ⇐ CONCATENATION
EQUALS ⇐ POWER
EQUALS ⇐ STRING
CONCATENATION ⇐ STRING
CONCATENATION ⇐ POWER
POWER ⇐ STRING
POWER ⇐ INTEGER
⁹ It has been assumed that STRING is defined by the regular expression [a-j]+ and that INTEGER is a nonzero positive integer.
The above relations determine the dependencies that exist among the mathematical concepts used in the definition of another concept. In this thesis they will be called terminal dependencies¹⁰ because of the one-to-one association between the name of a mathematical concept and the terminal symbol that represents it in the grammar capturing the meaning of the concept.
scheme → ID { dexpr }
dexpr → dexpr rest
dexpr → ID
rest → <= det
det → ID
det → ( dlist ) more
det → dexpr
dlist → first others
first → ID
others → , object
object → ID
object → dlist
more → ε
more → ; dexpr
Table 6.3: CFG for representation of schemes.
The grammar in Table 6.3 provides the syntax used in this thesis to represent terminal dependency relations. Each word of the language generated by this grammar is called a representation scheme.
Since representation schemes are always related to grammars, they will be identified by the grammar's name with the literal string Scheme appended. The following expression states that catEqualityScheme is the representation scheme for the grammar defined in Table 6.2.
catEqualityScheme {Equals <= (Concatenation, Power, String);
Concatenation <= (String, Power);
Power <= (Integer, String)}
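As a check on the scheme grammar, the sketch below is a recursive-descent recognizer for representation schemes, written in Python. It assumes the productions scheme → ID { dexpr }, dexpr → dexpr rest | ID, rest → <= det, det → ID | ( dlist ) more | dexpr, dlist → first others, first → ID, others → , object, object → ID | dlist and more → ε | ; dexpr. The left recursion of dexpr is handled with a loop, and the class and function names are hypothetical.

```python
import re
from typing import List, Optional

_IDENT = re.compile(r"[A-Za-z_][A-Za-z_0-9]*")
_TOKEN = re.compile(r"\s*(<=|[{}(),;]|[A-Za-z_][A-Za-z_0-9]*)")


def tokenize(text: str) -> List[str]:
    tokens, pos, text = [], 0, text.strip()
    while pos < len(text):
        m = _TOKEN.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected input: {text[pos:]!r}")
        tokens.append(m.group(1))
        pos = m.end()
    return tokens


class SchemeRecognizer:
    """Recursive-descent recognizer for representation schemes."""

    def __init__(self, text: str) -> None:
        self.toks = tokenize(text)
        self.i = 0

    def _peek(self) -> Optional[str]:
        return self.toks[self.i] if self.i < len(self.toks) else None

    def _eat(self, expected: str) -> None:
        if self._peek() != expected:
            raise SyntaxError(f"expected {expected!r}, found {self._peek()!r}")
        self.i += 1

    def _ident(self) -> None:
        tok = self._peek()
        if tok is None or not _IDENT.fullmatch(tok):
            raise SyntaxError(f"expected ID, found {tok!r}")
        self.i += 1

    def scheme(self) -> None:   # scheme -> ID { dexpr }
        self._ident()
        self._eat("{")
        self.dexpr()
        self._eat("}")

    def dexpr(self) -> None:    # dexpr -> dexpr rest | ID, read as ID rest*
        self._ident()
        while self._peek() == "<=":
            self._eat("<=")     # rest -> <= det
            self.det()

    def det(self) -> None:      # det -> ID | ( dlist ) more | dexpr
        if self._peek() == "(":
            self._eat("(")
            self.dlist()
            self._eat(")")
            if self._peek() == ";":  # more -> ; dexpr | epsilon
                self._eat(";")
                self.dexpr()
        else:
            self.dexpr()        # also covers det -> ID, because dexpr -> ID

    def dlist(self) -> None:    # dlist -> first others; first -> ID; others -> , object
        self._ident()
        self._eat(",")
        self._ident()           # object -> ID | dlist, and a dlist also starts with ID
        while self._peek() == ",":
            self._eat(",")
            self._ident()


def accepts(text: str) -> bool:
    try:
        recognizer = SchemeRecognizer(text)
        recognizer.scheme()
        return recognizer.i == len(recognizer.toks)
    except SyntaxError:
        return False
```

For instance, accepts("catEqualityScheme {EQUALS <= (CONCATENATION, POWER, STRING); CONCATENATION <= (STRING, POWER); POWER <= (INTEGER, STRING)}") returns True. Under these productions a parenthesized list needs at least two identifiers, so a single determinant is written without parentheses.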
¹⁰ Although the formal definition of terminal dependency is provided at the end of this section, these dependencies can easily be identified whenever the related grammar is expressed as a reduced grammar.
Although the ENF determines the possible arrangements of nonterminals and terminals in production rules, this mechanism is not adequate for describing the relationships that exist among the concepts represented. This limitation is a consequence of the fact that, in a CFG, the nonterminals are variables and the terminal symbols are constants. It follows that, when concepts are represented by CFGs, the relationships among them can only be expressed in terms of their associated nonterminals. The notion of grammatical dependency is introduced as a way of describing the restrictions a set of grammars should satisfy whenever their terminal symbols express a dependency relationship.
Before terminal dependency can be defined, the notion of type decomposition needs to be introduced¹¹. Type decomposition means that a CFG is decomposed into a set of grammar fragments which can be tested for nonterminal relationships. The existence of relationships among nonterminals in different grammar fragments leads to the notion of grammar dependency. These ideas are formally presented in the following definitions.
Definition 7 Let $L = (N, T, P, S)$ be a reduced grammar. The type decomposition of $L$ is the set

$$Z_L = \{\, K_p = (N_p, T_p, P_p, S_p) \mid P_p = \{p\},\ N_p = (L_p \cup R_p),\ T_p = O_p,\ S_p = L_p,\ p \in P \,\}$$
Definition 8 Let $G_b = (N_b, T_b, P_b, S_b)$ be a basic grammar and let $G_o = (N_o, T_o, P_o, S_o)$ be either a basic or a primitive grammar such that $G_b \ne G_o$. $G_b$ is grammatically dependent on $G_o$, written $G_b \Leftarrow G_o$, if $L_o \cap R_b \ne \emptyset$.
E1: expr → expr MINUS term
E2: expr → term TIMES factor
E3: expr → IDENTIFIER
E4: term → term DIVIDEDBY factor
E5: term → IDENTIFIER
E6: factor → IDENTIFIER
Table 6.4: Grammar fragments illustrating grammar dependencies.
¹¹ Since the start symbol of a grammar is interpreted as a type, the decomposition of a grammar into a set of grammar fragments is viewed here as a type decomposition.
To illustrate the grammar dependency concept, consider the grammars defined in Table 6.4, which may be viewed as the result of the type decomposition of some grammar. For this example $Z = \{E_1, E_2, E_3, E_4, E_5, E_6\}$. The following dependencies are obtained by applying Definition 8 to the grammars defined by the productions in Table 6.4.
$$E_1 \Leftarrow (E_2, E_3, E_4, E_5) \tag{6.7}$$

$$E_2 \Leftarrow (E_1, E_3, E_4, E_5, E_6) \tag{6.8}$$

$$E_4 \Leftarrow (E_5, E_6) \tag{6.9}$$
As can be seen, commas and both opening and closing parentheses have been used in the representation of dependencies. These symbols were included in order to group all determinants associated with a dependent in a single dependency expression. The grammatical dependencies determined by dependency relation (6.7), for instance, state that portions of both the syntax and the semantics described by grammar $E_1$ are supplied by grammars $E_2$, $E_3$, $E_4$ and $E_5$.
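Definitions 7 and 8 translate directly into a few lines of Python. The sketch below is illustrative only: it assumes that $L_p$ denotes the left-hand-side nonterminal of production $p$ and $R_p$ the set of nonterminals on its right-hand side, and that every fragment already satisfies the basic-or-primitive precondition of Definition 8, which holds for Table 6.4. The function names are hypothetical. Applied to the productions of Table 6.4, it reproduces dependencies (6.7) and (6.9).

```python
from typing import Dict, List, Sequence, Tuple

Production = Tuple[str, Tuple[str, ...]]  # (left-hand side, right-hand side)


def type_decomposition(productions: Sequence[Production]) -> Dict[str, Production]:
    """Definition 7: one single-production fragment K_p per production p (named E1, E2, ...)."""
    return {f"E{k}": p for k, p in enumerate(productions, start=1)}


def grammar_dependencies(fragments: Dict[str, Production],
                         nonterminals: frozenset) -> Dict[str, List[str]]:
    """Definition 8: G_b <= G_o whenever L_o appears in R_b and G_b != G_o.

    Fragments without nonterminals on the right (primitive ones) are determinants only.
    """
    deps: Dict[str, List[str]] = {}
    for b, (_, rhs_b) in fragments.items():
        r_b = {s for s in rhs_b if s in nonterminals}
        if r_b:
            deps[b] = [o for o, (l_o, _) in fragments.items() if o != b and l_o in r_b]
    return deps


# The productions of Table 6.4, in the order E1, ..., E6.
PRODUCTIONS = [
    ("expr", ("expr", "MINUS", "term")),
    ("expr", ("term", "TIMES", "factor")),
    ("expr", ("IDENTIFIER",)),
    ("term", ("term", "DIVIDEDBY", "factor")),
    ("term", ("IDENTIFIER",)),
    ("factor", ("IDENTIFIER",)),
]
deps = grammar_dependencies(type_decomposition(PRODUCTIONS),
                            frozenset({"expr", "term", "factor"}))
print(deps["E1"])  # ['E2', 'E3', 'E4', 'E5']
print(deps["E4"])  # ['E5', 'E6']
```

Comparing sets of nonterminals is all that Definition 8 requires; the terminals of the fragments play no role at this stage.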
When the semantics of mathematical concepts is captured by a set of grammar fragments, the names of the concepts are represented by the terminals of the grammars involved. The resulting grammatical dependencies among the grammars in this set can therefore be expressed by means of their terminal symbols. As stated before, these relationships are called terminal dependencies, and they are formally defined in terms of grammatical dependencies as follows.
Definition 9 Let $G_b = (N_b, T_b, P_b, S_b)$ and $G_o = (N_o, T_o, P_o, S_o)$ be CFGs. When the grammatical dependency $G_b \Leftarrow G_o$ is satisfied, we also say that there is a terminal dependency for each pair of terminals $x, y$ such that $x \in T_b$ and $y \in T_o$. The syntax $x \Leftarrow y$ is used to express a terminal dependency between terminals $x$ and $y$. In this case terminal $x$ is called the dependent and terminal $y$ the determinant.

The collection of all terminal dependencies that can be determined from a type decomposition is called a dependency scheme. A representation scheme is the syntactical structure used to list all terminal dependencies found in a dependency scheme.
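Definition 9 can likewise be sketched in Python and applied to the catEquality grammar of Table 6.2. Following the remark that terminal dependencies are easiest to read from a reduced grammar, the grammar is first rewritten by hand without its unit productions. This reduced form is an assumption of the example, not a grammar given in the thesis. Each production is treated as one fragment of the type decomposition, and self-dependencies such as POWER ⇐ POWER, which the recursive rules would otherwise produce, are omitted because the scheme in the text does not list them. The function names are hypothetical.

```python
from typing import Dict, List, Set, Tuple

Rule = Tuple[str, Tuple[str, ...]]  # (left-hand side, right-hand side)

NONTERMINALS = frozenset({"expr", "term_cat", "term", "factor"})

# Assumed reduced form of catEquality: the unit productions expr -> term_cat,
# term_cat -> term and term -> term_string were eliminated by hand.
REDUCED_CAT_EQUALITY: List[Rule] = [
    ("expr", ("expr", "EQUALS", "term_cat")),
    ("expr", ("term_cat", "CONCATENATION", "term")),
    ("expr", ("term", "POWER", "factor")),
    ("expr", ("STRING",)),
    ("term_cat", ("term_cat", "CONCATENATION", "term")),
    ("term_cat", ("term", "POWER", "factor")),
    ("term_cat", ("STRING",)),
    ("term", ("term", "POWER", "factor")),
    ("term", ("STRING",)),
    ("factor", ("INTEGER",)),
]


def terminal_dependencies(rules: List[Rule], nonterminals: frozenset) -> Dict[str, Set[str]]:
    """Definition 9: x <= y for x in T_b and y in T_o whenever G_b <= G_o (x != y kept only)."""
    deps: Dict[str, Set[str]] = {}
    for b, (_, rhs_b) in enumerate(rules):
        r_b = {s for s in rhs_b if s in nonterminals}
        if not r_b:
            continue  # primitive fragments act only as determinants
        for x in (s for s in rhs_b if s not in nonterminals):
            for o, (l_o, rhs_o) in enumerate(rules):
                if o != b and l_o in r_b:  # grammatical dependency of Definition 8
                    for y in rhs_o:
                        if y not in nonterminals and y != x:
                            deps.setdefault(x, set()).add(y)
    return deps


def representation_scheme(name: str, deps: Dict[str, Set[str]]) -> str:
    """List the dependencies as a scheme; assumes two or more determinants per dependent."""
    parts = [f"{x} <= ({', '.join(sorted(ys))})" for x, ys in sorted(deps.items())]
    return f"{name}Scheme {{{'; '.join(parts)}}}"


deps = terminal_dependencies(REDUCED_CAT_EQUALITY, NONTERMINALS)
print(sum(len(ys) for ys in deps.values()))  # 7
print(representation_scheme("catEquality", deps))
```

The sketch finds the seven terminal dependencies of catEquality, and the printed scheme lists the same dependents and determinants as catEqualityScheme, only sorted alphabetically.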
Dependency schemes can be used as an aid in identifying redundancies by syntax equivalence: reduced grammars that do not share dependency schemes are free of redundancies by syntax equivalence.
6.4 Grammar Operations and Extensibility
This section introduces two operations that take grammars as input and return a grammar as output. Input grammars are seen as providers of both syntax and semantics, and they are never modified by the operations. The output is the result of combining the production rules supplied by the input. Since either of the two proposed operations produces a single grammar, more complex grammar definitions may be created by a sequence of operations, each using the result of previous operations. The creation of the final grammar may therefore be viewed as the result of a process in which grammar fragments have been inserted and/or deleted. Both operations are defined for input grammars in ENF, a requirement that guarantees that the output grammar is also in ENF. In this thesis these operations are the means by which grammars are combined in order to support the extensibility of mathematical notation.
The use of CFGs as a supporting organization to capture the meaning of mathematical concepts, as previously proposed in this work, is restricted to document structures which can only be modified by editing mechanisms. This limitation was discussed in Section 5.3, where the correspondence between CFG and type was presented.
The need either to overload a given symbol by attaching a different meaning to it, or to introduce a new syntactical representation for a mathematical concept, may be viewed as a set of modifications to be executed on grammars which have already been defined. Another way to meet this need is to generate grammars that support these requirements by reusing, whenever possible, the available grammars. The notion of grammar reuse, as defined by the two operations proposed here, is considered one of the fundamental mechanisms¹² which this thesis introduces to address the need to capture semantics. For this reason neither operation modifies grammars which have already been created. Instead, both support the semantics capturing activity by allowing grammars to be created by reusing information provided by other grammars.
¹² Another important mechanism is the notion of context switching or scope. This notion is introduced in this chapter to support symbol overloading.
The following definitions introduce these operations:
Definition 10 Let $G_b = (N_b, T_b, P_b, S_b)$ and $G_a = (N_a, T_a, P_a, S_a)$ be two CFGs in ENF. The composition operation $G_b \circ G_a$ will produce a CFG $G_c = (N_c, T_c, P_c, S_c)$ as follows:
$P_c = P_b \cup P_a$
$N_c = N_b \cup N_a$
$T_c = T_b \cup T_a$
$S_c = S_b$
Definition 11 Let $G_b = (N_b, T_b, P_b, S_b)$ and $G_a = (N_a, T_a, P_a, S_a)$ be two CFGs in ENF. The extension operation $G_b \% G_a$ will produce a CFG $G_x = (N_x, T_x, P_x, S_x)$ as follows:
$P_x = \{A \to \alpha \mid A \to \alpha \in P_b \wedge A \notin N_a\} \cup P_a$
$N_x = N_b \cup N_a$
$T_x = T_b \cup T_a$
$S_x = S_b$
While the composition operation is left-associative and commutative (up to the start symbol, which Definition 10 always takes from the left operand), the extension operation is left-associative but not commutative.
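Definitions 10 and 11 translate almost directly into set operations. The sketch below assumes productions are stored as (left-hand side, right-hand side) pairs and that uppercase symbols are terminals; the helper cfg and the two one-rule fragments are hypothetical, and the ENF precondition is not checked. It also makes the asymmetry of extension visible: when both operands define rules for the nonterminal term, only the rules of the right operand survive.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CFG:
    """G = (N, T, P, S); each production is a (lhs, rhs) pair with rhs a tuple."""
    N: frozenset
    T: frozenset
    P: frozenset
    S: str


def compose(g_b: CFG, g_a: CFG) -> CFG:
    """Composition G_b o G_a (Definition 10): union of N, T and P; start symbol of G_b."""
    return CFG(g_b.N | g_a.N, g_b.T | g_a.T, g_b.P | g_a.P, g_b.S)


def extend(g_b: CFG, g_a: CFG) -> CFG:
    """Extension G_b % G_a (Definition 11): drop the rules of G_b whose left-hand
    side is a nonterminal of G_a, then add every rule of G_a."""
    kept = frozenset(rule for rule in g_b.P if rule[0] not in g_a.N)
    return CFG(g_b.N | g_a.N, g_b.T | g_a.T, kept | g_a.P, g_b.S)


def cfg(start: str, *rules) -> CFG:
    """Hypothetical helper: build a CFG from (lhs, rhs) rules, treating
    uppercase symbols as terminals (the convention of this section's tables)."""
    symbols = {s for lhs, rhs in rules for s in (lhs, *rhs)}
    terminals = frozenset(s for s in symbols if s.isupper())
    return CFG(frozenset(symbols - terminals), terminals, frozenset(rules), start)


g_old = cfg("term", ("term", ("num",)))     # term -> num
g_new = cfg("term", ("term", ("factor",)))  # term -> factor

# Composition keeps both rules whatever the operand order ...
assert compose(g_old, g_new).P == compose(g_new, g_old).P
# ... whereas extension lets the right operand's rules for `term` win.
print(sorted(extend(g_old, g_new).P))  # [('term', ('factor',))]
print(sorted(extend(g_new, g_old).P))  # [('term', ('num',))]
```

Because union is commutative, swapping the operands of a composition changes at most the start symbol $S_c$, whereas swapping the operands of an extension can change the production set itself.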
To illustrate the use of both the composition and extension operators, consider the need to capture the meaning of expressions consisting of the addition of numbers. For this purpose assume that grammar fragments $G_2$, $G_4$, $G_6$ and $G_8$ are available. This means these fragments have already been created by editing procedures and have been stored in some computer-based device.
$G_2$
expr → expr PLUS term
Table 6.5: Basic grammar for addition.
$G_4$
expr → term
Table 6.6: Operatorless grammar linking expr and term nonterminals.
$G_6$
term → num
Table 6.7: Operatorless grammar linking term and num nonterminals.
Table 6.5 displays the basic grammar $G_2$ which captures the semantics of expressions consisting of two operands connected by the infixed PLUS operator. Table 6.6 and Table 6.7 contain the definitions for operatorless grammars $G_4$ and $G_6$ respectively. Table 6.8 displays the primitive grammar $G_8$. The grammar to capture the semantics of addition expressions involving numbers may be obtained by combining these four grammar fragments as given by the expression
$$\{\{G_2 \circ G_4\}G_{24} \circ G_6 \circ G_8\}G_{r1}.$$
The notation $\{G_2 \circ G_4\}G_{24}$ is used to express the fact that the result of the composition operation $G_2 \circ G_4$ has the name $G_{24}$. In a similar way $G_{r1}$ is the name that has been assigned to the result of the composition operation $G_{24} \circ G_6 \circ G_8$. The derived grammars $G_{24}$ and $G_{r1}$ are displayed in Table 6.9 and Table 6.10 respectively.

$G_8$
num → NUMBER
Table 6.8: Primitive grammar setting nonterminal num to terminal NUMBER.
$G_{24}$
expr → expr PLUS term
expr → term
Table 6.9: Derived grammar for addition.
Consider now capturing the meaning of expressions involving both the addition and the multiplication of numbers. One way to approach this problem is to make use of the grammars which have already been defined. Additional grammars necessary to capture the concepts not covered by these grammars may be obtained, for example, by editing.
Assume grammar $G_{r1}$ is already available. Also assume grammars $G_1$, $G_3$ and $G_5$ have been created by editing. Table 6.11 displays the basic grammar $G_1$. The operatorless grammars $G_3$ and $G_5$ are displayed in Table 6.12 and Table 6.13 respectively. The grammar to capture the semantics of expressions involving the multiplication and addition of numbers is therefore obtained by means of the expression
$$\{G_{r1} \% \{G_1 \circ G_3\}G_{13} \circ G_5\}G_{r2}.$$
The notation $G_{r1} \% \{G_1 \circ G_3\}G_{13}$ is used to indicate that $G_{r1}$ is extended by $G_{13}$, which is the result of the composition operation $G_1 \circ G_3$. The name $G_{r2}$ is therefore assigned to the result of the operation $\{G_{r1} \% G_{13} \circ G_5\}$. Extension rather than composition is required here because $G_{13}$ redefines the nonterminal term: the rule term → num of $G_{r1}$ is discarded in favour of term → term TIMES factor and term → factor, and $G_5$ then reconnects factor to num. The derived grammars $G_{13}$ and $G_{r2}$ are displayed in Table 6.14 and Table 6.15 respectively.
A simple grammar fragment to deal with the usage of both the extension and composition operations is presented in Table 6.16. According to this grammar the result of the binary operation(s) may either be saved as a new grammar or not. This is a consequence of the fact that the nonterminal new_class may be replaced by the terminal IDENTIFIER or by the empty string $\varepsilon$. Therefore whenever variable new_class is replaced by the empty string, the result of the binary operation(s) will not be remembered. Although there is no means of reusing the result produced, the procedure does generate a grammar. In this thesis, this grammar is called an implied grammar.
The notion of implied grammar introduces the possibility of defining domains without adding grammars to the domain directory. These types of domains exist only during run-time and are called implied domains.
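The statement grammar of Table 6.16 can be evaluated by a small recursive-descent interpreter. The sketch below is hypothetical throughout: it writes the composition operator as the letter o and the extension operator as %, treats uppercase symbols as terminals, represents the domain directory as a Python dictionary of named grammars, and repeats the set-based reading of Definitions 10 and 11 so that it runs on its own. The left-recursive rule class → class operator other_class is handled as a loop, which gives left-associativity. Replaying the two expressions of this section should reproduce the production set of Table 6.15, and a statement with no identifier after its closing brace yields an implied grammar that is returned but never stored.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class CFG:
    """G = (N, T, P, S); each production is a (lhs, rhs) pair with rhs a tuple."""
    N: frozenset
    T: frozenset
    P: frozenset
    S: str


def compose(g_b, g_a):  # Definition 10
    return CFG(g_b.N | g_a.N, g_b.T | g_a.T, g_b.P | g_a.P, g_b.S)


def extend(g_b, g_a):  # Definition 11
    kept = frozenset(r for r in g_b.P if r[0] not in g_a.N)
    return CFG(g_b.N | g_a.N, g_b.T | g_a.T, kept | g_a.P, g_b.S)


def cfg(start, *rules):  # uppercase symbols are assumed to be terminals
    symbols = {s for lhs, rhs in rules for s in (lhs, *rhs)}
    terminals = frozenset(s for s in symbols if s.isupper())
    return CFG(frozenset(symbols - terminals), terminals, frozenset(rules), start)


OPS = {"o": compose, "%": extend}  # operator -> % | o
PUNCT = {"{", "}", ";", *OPS}


class Interpreter:
    """Recursive-descent evaluator for the statement grammar of Table 6.16."""

    def __init__(self, directory):
        self.directory = dict(directory)  # named grammars (the domain directory)

    def run(self, text):  # stmt_list -> stmt_list ; stmt | stmt
        self.tokens, self.pos = re.findall(r"[{};%]|\w+", text), 0
        results = [self.stmt()]
        while self.peek() == ";":
            self.pos += 1
            results.append(self.stmt())
        if self.peek() is not None:
            raise SyntaxError(f"unexpected token {self.peek()!r}")
        return results

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def expect(self, token):
        if self.peek() != token:
            raise SyntaxError(f"expected {token!r}, got {self.peek()!r}")
        self.pos += 1

    def stmt(self):  # stmt -> { class } new_class; class is left-associative
        self.expect("{")
        grammar = self.other_class()
        while self.peek() in OPS:
            op = OPS[self.peek()]
            self.pos += 1
            grammar = op(grammar, self.other_class())
        self.expect("}")
        name = self.peek()
        if name is not None and name not in PUNCT:  # new_class -> IDENTIFIER
            self.pos += 1
            self.directory[name] = grammar
        return grammar  # new_class -> epsilon leaves an implied grammar unstored

    def other_class(self):  # other_class -> stmt | IDENTIFIER
        token = self.peek()
        if token == "{":
            return self.stmt()
        if token is None or token in PUNCT:
            raise SyntaxError(f"expected a grammar, got {token!r}")
        self.pos += 1
        return self.directory[token]


fragments = {
    "G1": cfg("term", ("term", ("term", "TIMES", "factor"))),
    "G2": cfg("expr", ("expr", ("expr", "PLUS", "term"))),
    "G3": cfg("term", ("term", ("factor",))),
    "G4": cfg("expr", ("expr", ("term",))),
    "G5": cfg("factor", ("factor", ("num",))),
    "G6": cfg("term", ("term", ("num",))),
    "G8": cfg("num", ("num", ("NUMBER",))),
}
interp = Interpreter(fragments)
interp.run("{{G2 o G4}G24 o G6 o G8}Gr1; {Gr1 % {G1 o G3}G13 o G5}Gr2")
for lhs, rhs in sorted(interp.directory["Gr2"].P):
    print(lhs, "->", " ".join(rhs))

implied = interp.run("{G2 o G4}")[0]  # no identifier after "}": nothing is stored
```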
$G_{r1}$
expr → expr PLUS term
expr → term
term → num
num → NUMBER
Table 6.10: Resulting grammar for expressions involving addition.
$G_1$
term → term TIMES factor
Table 6.11: Basic grammar for multiplication.
6.5
Structuring with Domains and Directories
Section 6.4 presented a structured process to capture the meaning of mathematical concepts. The approach introduced the notion of atomic grammar fragments and the notion of creating and updating CFGs by means of two binary operations. Instead of concentrating the needed knowledge in a monolithic grammatical organization, this process distributes the required information among a set of grammar fragments. These fragments are therefore viewed as decentralized structures which decompose mathematical concepts according to CFGs which are either basic, primitive or operatorless. The distributed fragments may be combined by the binary operations as a way of generating other grammar fragments. The set of grammar fragments is, in this way, updated incrementally by module reuse. As described so far, the solution supports extensibility only from a restricted point of view, since it does not consider the multi-domain aspect of mathematics. Instead it assumes that all concepts to be represented belong to a single domain.
The proposed approach allows grammar fragments to be considered as both open and closed concepts. The fact that they may be used to represent unique information which may be stored as components of a library and used by clients of the library characterizes them as closed concepts. On the other hand, the same fragments may contain information which may be used for the creation of a new grammar fragment by means of the two binary operations. For this reason they may also be considered as open concepts. This interpretation is consistent with the notion of object-oriented class as provided by [72]. According to this interpretation CFGs correspond to classes. Therefore for a given CFG, say $G = (N, T, P, S)$, the words in $L(G)$ will informally¹³ correspond to instances of the class associated with $G$. Mathematical expressions will consequently be the objects. Operatorless grammars and CFGs which have only operatorless productions are an exception to this because they have no means to express any concrete objects, and therefore cannot generate mathematical expressions.
¹³ This association is loose because some fundamental characteristics of classes cannot be expressed as grammar operations. Consider, for instance, the notion of subclass. This concept does not always correspond to grammars which result from either the extension or the composition operations.
$G_3$
term → factor
Table 6.12: Operatorless grammar linking term and factor nonterminals.
$G_5$
factor → num
Table 6.13: Operatorless grammar linking factor and num nonterminals.
6.5.1
Domains, Directories and Symbol Overloading
The approach presented in the previous sections does not properly address document organizations containing symbol arrangements which have been used to express concepts belonging to more than a single mathematical field. In order to extend the proposed process, a relation between symbol overloading¹⁴ and domain/directory needs to be established. The solution proposed in this subsection approaches symbol overloading by means of a real-time update¹⁵ process. This process is the mechanism by which the structure of a document adapts in order to cope with representation ambiguities introduced as the result of overloading.
¹⁴ One relevant aspect of the dynamic authoring of mathematics is the fact that the overloading of symbols is at the author's discretion. This characteristic is nondeterministic; therefore it cannot be predicted.
¹⁵ In the context of this thesis, real-time update is used to refer to the document modifications done during the document authoring activity.
$G_{13}$
term → term TIMES factor
term → factor
Table 6.14: Derived grammar for multiplication.
$G_{r2}$
expr → expr PLUS term
expr → term
term → term TIMES factor
term → factor
factor → num
num → NUMBER
Table 6.15: Resulting grammar for expressions involving addition and multiplication.
For any given directory the solution determines that the overloading is resolved by means of a dynamic directory change. This implies that the meaning of symbol arrangements is a function of the directory in which they are defined. The dynamic characteristic is required to support the possibility of user-defined syntax being introduced during authoring. A directory therefore defines a scope, and symbol overloading determines the need for a change of scope or context switch¹⁶. The approach supports this requirement by introducing the notion that any two semantically distinct mathematical concepts which have been assigned the same arrangement of symbols for their syntactical representation are considered here to have an overloaded representation. Therefore the representation of a mathematical concept is considered non-overloaded if there is a one-to-one relationship between the concept and the symbol arrangement used in its syntactical definition. This idea is formally stated by the following definitions.
¹⁶ It is important to note that, in this scenario, both the number and contents of domains are under complete control of the author of the document. This indicates that domain is a dynamic concept. This point of view has been formally stated in Section 6.2.
Definition 12 Let $\Sigma$ be an alphabet and $C$ be a nonempty finite set of mathematical concepts. The representation of a mathematical concept is a mapping from $C$ to $\Sigma^+$.
Definition 13 The representation of a mathematical concept is overloaded if the mapping from $C$ to $\Sigma^+$ is not injective.
By structuring concept definitions into domains and directories it becomes necessary to establish the conditions under which existing grammars can be applied to the construction of a directory. To ensure that a directory is free of ambiguities, the restriction proposed by the following definition needs to be observed.
Definition 14 A domain directory is overloaded if the representation of some mathematical concept in the directory is overloaded.
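Definitions 13 and 14 amount to an injectivity test on the representation mapping. In the hypothetical encoding below, a directory is a Python dictionary from concept names to symbol arrangements (strings over Σ), and Definition 14 is applied directory by directory. Using the + example that Section 6.6.1 develops, the single Expression directory of fragment D1.0 should be reported as overloaded, while none of the three directories of fragment D1.1 is. The function names are illustrative only.

```python
from collections import Counter


def is_overloaded(representation: dict) -> bool:
    """Definition 13: a representation C -> Sigma^+ (concept -> symbol arrangement)
    is overloaded when it is not injective."""
    return len(set(representation.values())) < len(representation)


def overloaded_arrangements(representation: dict) -> set:
    """The symbol arrangements shared by more than one concept."""
    counts = Counter(representation.values())
    return {arrangement for arrangement, n in counts.items() if n > 1}


# Hypothetical encoding of the single Expression directory of fragment D1.0 ...
expression = {"addition": "+", "concatenation": "+", "disjunction": "+"}
# ... and of the three directories of fragment D1.1 (Definition 14, per directory).
split = {
    "Addition": {"addition": "+"},
    "Concatenation": {"concatenation": "+"},
    "Disjunction": {"disjunction": "+"},
}

print(is_overloaded(expression), overloaded_arrangements(expression))  # True {'+'}
print([name for name, d in split.items() if is_overloaded(d)])         # []
```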
The notion of non-overloaded directories is useful in this context because each overloaded concept representation which needs to be included in a document determines the need for a separate directory. This means the concept representation forces the existence of an organization in which its meaning is uniquely defined. This approach has the advantage of treating the multi-directory characteristic as a supporting mechanism for handling symbol overloading.
Semantic modularity is achieved when the many-to-one mapping between concepts and representations is restricted to non-overloaded domain directories. Once the complete authoring process is over, the resulting document will have its contents naturally organized according to the meanings of the concepts involved.
stmt_list → stmt_list ; stmt | stmt
stmt → { class } new_class
class → class operator other_class
class → other_class
operator → % | ∘
other_class → stmt | IDENTIFIER
new_class → IDENTIFIER | ε
Table 6.16: Grammar to support the use of both the composition and extension operators.
6.6
Languages as Control Structures
The concept of non-overloaded representation establishes the necessity of a directory switch mechanism as a way to adapt to symbol overloading. This introduces additional complexity to the strategy chosen for processing the information provided by directories. For this reason the complexity of language processors designed to address this characteristic increases with the number of domains introduced.
Like grammar fragments, directories are both open and closed concepts. They are open because they are dynamic and allow modification¹⁷ to take place. As a closed concept they represent unique information. Either as a physical library component or as the result of operations performed on its underlying set of grammars, a directory exists as a single CFG. Since a directory will ultimately be represented by its CFG, there exists a language processor¹⁸ associated with its grammar.
¹⁷ No dynamic modification is ever allowed to a domain/directory as the result of the composition and/or extension operations. However, both domains and directories may be modified by editing.
Although a directory which forms part of the logical structure of a document need not have any direct dependency on the others, they all share a common structure¹⁹ where all mathematical information of the document is presented. This requirement establishes that a form of synchronization is necessary to guarantee that the next piece of data to be processed will be dealt with by its associated processor.
The arrangement by which the mathematical concepts are organized throughout a document is a user-defined task which takes place during the authoring process. It is during this phase that the author specifies both the syntax and the true meaning of operations by binding concepts to syntax and collecting them into related domains and directories. The structure of the document, at any time during this process, will therefore reflect the way these directories are arranged. There are three possible ways directories may be composed. A document structure is the result of directories arranged in one of the following Directory Composition Forms:
pure linear,
pure hierarchical, or
combined form, a combination of linear and hierarchical.
In a pure linear organization, directories are self-contained. This means there is only a single scope in which objects are delimited. Directories organized in this way may be processed in a First-In First-Out (FIFO) fashion. In a pure hierarchical organization, directories are processed in a Last-In First-Out (LIFO) style. The semantics in these types of documents are structured in a nested way such that only the innermost directory has no dependency on the others. The most common structure is the combined one, which is characterized by an arbitrary mix of FIFO and LIFO organizations. This case may be considered general as it contains the previous two. For this arrangement, the possible number of document structure patterns which can be obtained for a given number of directories is provided by the nonlinear recursive expression
$$P_n = \sum_{i=1}^{n} P_{i-1} P_{n-i} \quad \text{for } n \geq 1, \qquad P_0 = 1,$$
where $n$ is the total number of distinct directories. These values are the Catalan numbers. This means that for a given document which requires, for example, 6 distinct directories, 132 document structure patterns can be obtained. Therefore a small number of language processors can be rearranged in many different ways as a form of supporting document updates²⁰. This indicates that, once the associated language processors have been generated, all document modifications which do not involve the addition of new directories during the authoring process will depend only on the synchronization procedure needed for the generation of the corresponding hierarchical intermediate representation²¹.
¹⁸ This characteristic is supported by the fact that for every CFG there exists a pushdown automaton that recognizes the language [55, 107, 69].
¹⁹ Even though text-based forms of representation are expected to be used in most applications, the ideas presented here also apply to other input formats.
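The recurrence can be checked numerically. The sketch below assumes that a document structure pattern is the unlabeled shape formed by linear and nested directory scopes; it evaluates $P_n$ directly and also enumerates the shapes, writing each directory scope as a matching pair of angle brackets, so that siblings correspond to linear composition and nesting to hierarchical composition. Both counts give 132 for six directories. The function names patterns and arrangements are hypothetical.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def patterns(n: int) -> int:
    """P_n = sum_{i=1}^{n} P_{i-1} P_{n-i} for n >= 1, with P_0 = 1."""
    if n == 0:
        return 1
    return sum(patterns(i - 1) * patterns(n - i) for i in range(1, n + 1))


def arrangements(n: int):
    """Enumerate the nesting shapes of n directory scopes as strings of '<'
    (open) and '>' (close): siblings are linear, nesting is hierarchical."""
    if n == 0:
        yield ""
        return
    for k in range(1, n + 1):                 # first top-level scope spans k directories
        for inner in arrangements(k - 1):     # directories nested inside it
            for rest in arrangements(n - k):  # directories that follow it
                yield "<" + inner + ">" + rest


print([patterns(n) for n in range(7)])  # [1, 1, 2, 5, 14, 42, 132]
print(list(arrangements(2)))            # ['<><>', '<<>>']
```

For two directories the two shapes are exactly the pure linear and the pure hierarchical forms; combined forms first appear with three directories.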
The notion of directories as a structure to support the specification of the syntax and portions of the semantics of mathematical concepts has been introduced in the previous sections. Intuitively, a directory should be involved in the authoring process only if there exists at least one concept that needs to be represented in it. On the other hand, no mathematical construct may be manipulated during the authoring process without a clear indication of where its related structure has been defined. These concerns are summarized by the following definition:
Definition 15 Let $M$ be a finite set of mathematical concepts. The semantic structure of a document $D_j$ involving mathematical concepts $M$ is considered irredundant if for each directory $G_k$ in $D_j$ there exists a mathematical concept $m \in M$ such that $m$ is represented in the scope of $G_k$.
Two characteristics which relate to the way directories take part in the organization of mathematical information in documents have been presented. Informally, they state that the semantics of a document is defined by means of a set of directories and that each of these directories must contain at least one object. These characteristics constitute the fundamental requirements that need to be considered for the elaboration of a mechanism to determine the way directories are manipulated during the authoring and processing phases. This mechanism must be flexible enough to provide the author with the freedom to configure versions of the document by standard document operations such as insertion and deletion. Each document version is therefore the result of a set of operations which may have changed the document's internal structure, modified the contents of the document, or both. Modifications affecting only the contents of documents, by either including or removing objects which belong to the set of directories currently defined for the structure of the document, have no further importance besides an increase or reduction in the amount of information that is to be supplied to a renderer. On the other hand, any modification which affects the document's logical structure would need to be executed under a stable form of control.
²⁰ It is understood that these updates do not require the addition of new directories.
²¹ This characteristic is of course subject to storage requirements. The choice of keeping the language processors in either main or secondary storage is an implementation decision.
6.6.1
Directory Composition Example
Small fragments of documents containing simple expressions that overload the + symbol are provided to illustrate the notion of directory composition. The syntactical structure as defined by the production rule
directory_scope → ⟨ directory_definition ⟩ block_objects ⟨/⟩
is used to delimit the scope of a directory, where the strings of letters are nonterminals and the symbols ⟨⟩ and / are terminals. Nonterminals directory_definition and block_objects are grammar variables that have been used to represent a directory and the mathematical constructs included in the block respectively.
D1.0
⟨ Expression ⟩
1 + 1 + 0 = 2
1 + 1 + 1 = 3
1 + 1 + 0 = 110
1 + 1 + 0 = true
⟨/⟩
The document fragment D1.0, illustrated above, is characterized by a monolithic organization where the definitions of all mathematical objects included in the document are placed in the Expression directory. The fact that the + symbol has been used in the above example to represent different mathematical concepts characterizes this one-directory document fragment as overloaded. A voice renderer system, for instance, will not be capable of providing the appropriate meaning that has been attached to the + symbol in each of the four expressions. This is because the representation used assumes that
only visual-based views are necessary, and
the reader has the required knowledge to decode the different meanings assigned to the + symbol.
The above problem is approached here by dividing the single directory into three separate ones in order to ensure that no directory is overloaded. A directory-based organization is consequently obtained. The resulting document organization, shown below, has therefore been structured according to the addition, concatenation and disjunction operations that have been attached to the + symbol.
D1.1
⟨ Addition ⟩
1 + 1 + 0 = 2
1 + 1 + 1 = 3
⟨/⟩
⟨ Concatenation ⟩
1 + 1 + 0 = 110
⟨/⟩
⟨ Disjunction ⟩
1 + 1 + 0 = true
⟨/⟩
The document organization D1.1 differs from organization D1.0 in that the implicit knowledge required by D1.0 to distinguish syntactically identical operations with different meanings has been replaced by the three distinct directories. This means the decoding task needed to resolve ambiguities, which was previously left to the user, has now been assigned to the author of the document. Therefore, besides outlining the mathematical concepts used in the document, the author is also responsible for specifying the directory in which the syntax and semantics of these concepts are defined.
The document fragment D1.2, shown below, is an example of a document structure where combined directory composition is used. Although the mathematical objects involved are the same as the ones in the previous two versions, this organization differs from the other two in the way the directories have been arranged.
D1.2
⟨ Addition ⟩
1 + 1 + 0 = 2
  ⟨ Concatenation ⟩
  1 + 1 + 0 = 110
  ⟨/⟩
  ⟨ Disjunction ⟩
  1 + 1 + 0 = true
  ⟨/⟩
1 + 1 + 1 = 3
⟨/⟩
⟨ Disjunction ⟩
1 + 1 + 0 = true
⟨/⟩
6.6.2
The Control Mechanism
1  directory_scope → ⟨ directory_definition ⟩ block_objects ⟨/⟩
2  directory_definition → IDENTIFIER | stmt_list
3  block_objects → various_exprs scope_change
4  various_exprs → various_exprs ; new_expr | new_expr
5  scope_change → directory_scope various_exprs | directory_scope | ε
Table 6.17: CFG for the binding control mechanism.
The document organization provided by the three examples from the previous subsection illustrates that a form of control is necessary in order to ensure the correctness of the directory composition forms. This requirement has been introduced in Section 6.2 as $c$, the binding control. As part of the definition of a document instance structure $S_j = (D_j, c)$, the binding control is a CFG. A possible definition of $c$ to support the directory composition forms is provided in Table 6.17. The nonterminal stmt_list is defined in the grammar fragment described in Table 6.16, and the nonterminal new_expr is only to be defined whenever directories are created. This means any CFG which defines a directory will have new_expr as a start symbol.
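The effect of the binding control can be sketched without building a full parser for Table 6.17. The hypothetical function below writes the delimiters in ASCII (<Name> and </>), keeps the open directories on a stack so that each expression is bound to the innermost open directory (the LIFO behaviour of hierarchical composition, with sibling scopes handled in document order as in linear composition), and reports any directory scope that binds no object directly, in the spirit of Definition 15. It is a simplification: it does not validate the expressions themselves, which in the proposed model is the job of each directory's own language processor.

```python
def resolve_scopes(lines):
    """Bind each expression to the innermost open directory, using a stack
    (LIFO) for nested scopes. Returns (bindings, empty): the (directory,
    expression) pairs in document order and any directory scope that bound no
    object directly (cf. Definition 15)."""
    stack, counts = [], []  # open directories and their object counts
    bindings, empty = [], []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line == "</>":  # close the innermost scope
            if not stack:
                raise ValueError("closing delimiter without an open directory")
            name = stack.pop()
            if counts.pop() == 0:
                empty.append(name)
        elif line.startswith("<") and line.endswith(">"):  # open a new scope
            stack.append(line[1:-1].strip())
            counts.append(0)
        else:  # a mathematical object
            if not stack:
                raise ValueError(f"expression outside any directory: {line!r}")
            bindings.append((stack[-1], line))
            counts[-1] += 1
    if stack:
        raise ValueError(f"unclosed directories: {stack}")
    return bindings, empty


D1_2 = """
<Addition>
1 + 1 + 0 = 2
<Concatenation>
1 + 1 + 0 = 110
</>
<Disjunction>
1 + 1 + 0 = true
</>
1 + 1 + 1 = 3
</>
<Disjunction>
1 + 1 + 0 = true
</>
"""

bindings, empty = resolve_scopes(D1_2.splitlines())
for directory, expression in bindings:
    print(f"{directory:<14}{expression}")
```

After the Concatenation and Disjunction scopes close, the stack returns to Addition, which is why 1 + 1 + 1 = 3 is bound to addition even though it appears after two other directories.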
6.7
The Role of Compilers
The organization imposed by the dynamic authoring model allows the author of a document to modify both the syntax and the semantics of the notation. Therefore modifications proposed at the abstract level by the author must always be supported by the document processing environment. This requires that, if grammar definitions need to be modified, the corresponding language processor be created to process the new version of the language²². Therefore different language processors might need to be produced during the authoring process.
To approach the semantics capturing of mathematical concepts as proposed in Section 5.3, the organization of mathematics is viewed as a set of fields. According to this strategy, all concepts that belong to a field can be captured by a directory and therefore require the support of an adequate language processor.
In a general scenario, documents often involve information that belongs to more than a single domain. For this reason the notion of directory as a collection of domains was introduced. A processing structure to support this arrangement would demand as many language processors as the number of directories necessary to cover the state of knowledge addressed by the document. Therefore the number of language processors needed to support the dynamic authoring model will always be greater than two²³ if the document to be processed involves information defined in more than one domain.
²² It is assumed that one directory may be composed of a set of grammar fragments.
²³ The proposed document structure supports the directory swap strategy by means of a CFG, the binding control, which needs a language processor of its own. Since at least one additional processor is needed to process the objects included in the document, at least two language processors are always required. This organization forces the number of processors to be one greater than the number of directories in the document.
Different language processors will take over the processing activity at selected parts of the document. Each processor is viewed as an agent that has the knowledge to validate the syntax of its mathematical objects and to perform other tasks as determined by the semantics of the objects.
Letting the number of directories in a document be a parameter under the control of the author indicates that an equal number of language processors will need to be provided in order to support each required directory. To meet this requirement, this thesis proposes that language processors be dynamically created by the software used during the authoring activity.
The automatic creation of language processors based on the knowledge provided by CFGs requires information about the position of both terminals and nonterminals. Although the grammar structure imposed by the ENF determines that at most one terminal is permitted in production rules, the number of nonterminals is left unrestricted. One exception to this is the operatorless production, whose right-hand side always consists of a single nonterminal.
Representing the meaning of mathematical concepts by means of CFGs requires that all information which is part of the concept be mapped to the set of production rules. This includes the set of symbols used for the representation of the name of the concept, its attributes and delimiters.
Having the name of the concept as a terminal and both its attributes and delimiters represented as nonterminals introduces the need for an additional mechanism in order to distinguish attributes from delimiters. For this purpose, a set of attributes is added to the grammar rules.
As an extension to the grammar structure already proposed for capturing semantics, these attributes will also be applied to the definition of the terminal symbols. The attachment of attributes to the rules of CFGs was proposed by Knuth [63, 78]. The resulting grammar is called an attributed grammar.
The use of attributed grammars to support the semantics capturing of mathematical concepts does not require any modification to the approach already presented. Both the composition and extension operators can also be applied to attributed grammars. The following definitions present this characteristic:
Definition 16 Let $G_1 = (N_1, T_1, P_1, S_1, A, \alpha_1)$ and $G_2 = (N_2, T_2, P_2, S_2, A, \alpha_2)$ be two attributed context-free grammars. The extension operation $G_1 \% G_2$ will produce an attributed context-free grammar $G_3 = (N_3, T_3, P_3, S_3, A, \alpha_3)$ defined as follows:
For the underlying context-free grammars one has $\bar{G}_3 = \bar{G}_1 \% \bar{G}_2$, where $\bar{G}_i$ denotes $G_i$ without attributes.
The mapping $\alpha_3$ is given by
$$\alpha_3(A \to w) = \begin{cases} \alpha_1(A \to w), & \text{if } A \to w \in P_1 \text{ and } A \notin N_2, \\ \alpha_2(A \to w), & \text{if } A \to w \in P_2, \end{cases} \tag{6.10}$$
for every rule $A \to w \in P_3$.
Definition 17 Let $G_1 = (N_1, T_1, P_1, S_1, A, \alpha_1)$ and $G_2 = (N_2, T_2, P_2, S_2, A, \alpha_2)$ be two attributed context-free grammars. The composition operation $G_1 \circ G_2$ will produce an attributed context-free grammar $G_3 = (N_3, T_3, P_3, S_3, A, \alpha_3)$ defined as follows:
For the underlying context-free grammars one has $\bar{G}_3 = \bar{G}_1 \circ \bar{G}_2$.
The mapping $\alpha_3$ is given by
$$\alpha_3(A \to w) = \begin{cases} \alpha_1(A \to w), & \text{if } A \to w \in P_1 \setminus P_2, \\ \alpha_2(A \to w), & \text{if } A \to w \in P_2 \text{ or } A \to w \in P_1 \cap P_2, \end{cases}$$
for every rule $A \to w \in P_3$.
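Because the attribute mapping is just a function from rules to attribute values, both operations can be sketched with dictionaries. In this hypothetical encoding, the rule set P of each grammar is the key set of its attribute mapping, the attribute set A is left implicit, attribute values are (regular expression, argument positions) pairs in the spirit of Section 6.8, and None stands for "no attributes". The composition follows Definition 17 by letting α₂ decide the attributes of rules shared by both grammars; the extension assumes that each surviving rule keeps the attribute supplied by the grammar it came from.

```python
from dataclasses import dataclass


@dataclass
class AttributedCFG:
    """(N, T, P, S, A, alpha) with P taken as the key set of alpha; the attribute
    set A is left implicit in the values that alpha returns."""
    N: frozenset
    T: frozenset
    S: str
    alpha: dict  # rule (lhs, rhs-tuple) -> attribute value, None for no attributes

    @property
    def P(self) -> frozenset:
        return frozenset(self.alpha)


def compose(g1, g2):
    """Composition (Definition 17): alpha_2 decides rules that occur in both P1 and P2."""
    return AttributedCFG(g1.N | g2.N, g1.T | g2.T, g1.S, {**g1.alpha, **g2.alpha})


def extend(g1, g2):
    """Extension: rules of G1 whose left-hand side is a nonterminal of G2 are
    dropped (Definition 11); each surviving rule keeps the attribute it came with."""
    kept = {rule: a for rule, a in g1.alpha.items() if rule[0] not in g2.N}
    return AttributedCFG(g1.N | g2.N, g1.T | g2.T, g1.S, {**kept, **g2.alpha})


# Hypothetical attributes: (regular expression, argument positions).
g_add = AttributedCFG(frozenset({"expr", "term"}), frozenset({"PLUS"}), "expr", {
    ("expr", ("expr", "PLUS", "term")): (r"\+", (1, 3)),  # basic production
    ("expr", ("term",)): None,                            # operatorless: no attributes
})
g_num = AttributedCFG(frozenset({"term", "num"}), frozenset({"NUMBER"}), "term", {
    ("term", ("num",)): None,
    ("num", ("NUMBER",)): (r"[0-9]+", None),              # primitive: no cardinality
})
g_fac = AttributedCFG(frozenset({"term", "factor"}), frozenset(), "term", {
    ("term", ("factor",)): None,
})

g = compose(g_add, g_num)  # four rules, each with its original attribute
h = extend(g, g_fac)       # term -> num is replaced by term -> factor
print(sorted(h.P))
```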
The following section proposes the structure of the grammars which will be used to support the definition of the domains. This meta-grammar is therefore the template which will be applied during the creation of every grammar fragment required to capture the meaning of mathematical concepts.
6.8
Meta-Structure
In this section attributed CFGs are used as an aid to specifying the semantics capturing of mathematical concepts. Synthesized attributes [6] are attached to production rules which are either primitive or basic. These attributes supply additional semantics of the concepts involved that have been omitted due to limitations of the CFG part of the structure. The attributes are represented as grammar variables regular_expr and cardinality as defined in Table 6.18. The cardinality variable holds the positions of the arguments that are associated with the terminal symbol of the rule. Nonterminal cardinality is $\varepsilon$ for operatorless and primitive productions because neither has arguments. For basic productions cardinality is defined in terms of the args_position nonterminal. In this case the position of each argument is identified by a positive integer. The nonterminal regular_expr is used to represent regular expressions that describe the symbol arrangement applied to the composition of terminals. Regular expressions used by this grammar follow both the syntax and the semantics defined by lex [66].
meta           → cfg attributes
cfg            → NONTERMINAL : items
items          → items item
items          → item
item           → TERMINAL | NONTERMINAL
attributes     → # regular_expr # cardinality | ε
cardinality    → ( args_position ) | ε
args_position  → args_position , position | position
position       → INTEGER
Table 6.18: Production rules for the meta-grammar.
The meta-grammar part of PNS is defined by the set of production rules shown in Table 6.18. The proposed grammar defines nonterminal meta as the start symbol, which structures the problem according to the two nonterminals on the right side of the rule. The part starting with the nonterminal cfg defines the structure of the rules in the CFG part of the structure. Nonterminal attributes proposes the structure for the synthesized attributes.
Table 6.19 illustrates the organization proposed by the meta-structure. This example shows the ENF version of the grammar displayed in Table 5.16, with the corresponding set of attributes attached to each production rule. Productions 2, 3 and 8 are operatorless productions; therefore they have no attributes. Productions 9 to 12 are primitive productions. For this reason their only attribute is the regular expression. The remaining productions (1 and 4 to 7) are basic and carry both a regular expression and a cardinality.

 1  identity_expr → sample_expr EQ sample_expr         # "=" # (1,3)
 2  sample_expr   → expr
 3  sample_expr   → sum
 4  sum           → SUM left_del sum_elmts right_del   # "Sum" # (3)
 5  sum_elmts     → range_list SUMDEL sample_expr      # ";" # (1,3)
 6  range_list    → start LISTDEL end                  # "," # (1,3)
 7  start         → identifier ITERATION expr          # "=" # (1,3)
 8  end           → expr
 9  identifier    → IDENTIFIER                         # [a-z]+ #
10  left_del      → LEFTDEL                            # "{" #
11  right_del     → RIGHTDEL                           # "}" #
12  expr          → INTEGER                            # [1-9][0-9]* #
Table 6.19: Attributed grammar to support the capturing of simple summations.
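A minimal Python sketch of how an attribute string shaped by Table 6.18 might be split into its regular expression and argument positions follows. The single regular expression used to read a whole attribute, and the names `parse_attributes` and `to_python_regex`, are assumptions. The sketch also assumes that a quoted lex pattern is a literal string and that the remaining patterns of Table 6.19 mean the same in Python's `re` module, which holds for the character classes and `*`/`+` used there but not for lex syntax in general.

```python
import re
from typing import Optional, Tuple

# attributes  -> # regular_expr # cardinality | ε     (Table 6.18)
# cardinality -> ( args_position ) | ε
# The regular expression is taken verbatim from between the two '#' marks.
ATTRIBUTES = re.compile(
    r"^#\s*(?P<regex>.*?)\s*#\s*(?:\((?P<args>[0-9,\s]*)\))?\s*$"
)


def parse_attributes(text: str) -> Optional[Tuple[str, Tuple[int, ...]]]:
    """Return (regular_expr, argument positions), or None for the empty attribute."""
    text = text.strip()
    if not text:
        return None  # operatorless productions carry no attributes
    match = ATTRIBUTES.match(text)
    if match is None:
        raise ValueError(f"malformed attribute: {text!r}")
    args = match.group("args")
    positions = tuple(int(p) for p in args.split(",")) if args else ()
    if any(p <= 0 for p in positions):
        raise ValueError("argument positions must be positive integers")
    return match.group("regex"), positions


def to_python_regex(lex_regex: str) -> str:
    """lex reads "..." as a literal string (assumed translation to re.escape);
    other patterns are passed through unchanged."""
    if len(lex_regex) >= 2 and lex_regex[0] == lex_regex[-1] == '"':
        return re.escape(lex_regex[1:-1])
    return lex_regex


print(parse_attributes('# "=" # (1,3)'))    # ('"="', (1, 3))
print(parse_attributes("# [1-9][0-9]* #"))  # ('[1-9][0-9]*', ())
```

A basic production yields both parts, while a primitive one yields an empty position tuple, mirroring the ε alternative of cardinality.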
6.9 Conclusion
In this chapter I have presented a grammar-based document organization to capture the meaning of mathematical concepts. The approach models the dynamics of authoring mathematics and supports the introduction of user-defined syntax to represent mathematical concepts. This means that the semantics of the mathematical concepts included in the document can be bound to syntax proposed during authoring. These ideas are expressed in terms of the Document Description Model, described as follows.
A Document Description Model (DDM) is a structure composed of
1. a document dictionary $H_j$ such that all grammars in this set are in ENF, and
2. the following operations:
(a) Grammar Creation: introduced in Section 6.4 by the composition operator $\circ$.
(b) Grammar Update: introduced in Section 6.4 by the extension operator $\%$. Grammars resulting from this operation, as well as from the grammar creation operation, are elements of the set $C_i^j$ for some version $j$ of the document structure and directory $i$.
3. Grammar Identity: provided by the union operations used for the creation of the domain directory $G_i^j$.
4. Closure: all grammars introduced by the creation and the update operations are in $H_j$.
Chapter 7
Examples
Among the various forms of representation available, the conventional notation is the one used by the majority of the activities which involve the communication of mathematics. A major limitation on rendering mathematical concepts according to this notation is the syntactic overloading of the symbols used to encode the operators. This problem has been discussed in Section 5.2, and Figure 5.1 displays three common meanings that are usually attached to a single symbol.
It is assumed in this thesis that people are, most of the time, exposed to mathematics by means of the encoding provided by the conventional notation. For this reason this notation has been used in this work as the basic source of information for the semantics capturing process. Although the encoding provided by the conventional notation is sometimes not ideal, it is important to maintain the syntactical arrangement this encoding provides. This decision is fundamental to the capturing strategy, because the choice of a widely used notation should free the author from the requirement of learning an alternative syntax supported by the capturing system.
In this thesis a document structure composed of attributed grammar fragments is proposed to capture the meaning of mathematical concepts. Context-dependent representations are supported by a directory change mechanism where a set of grammars is replaced by another to allow other interpretations to be associated with the symbols considered. The following sections illustrate the proposed structure by describing the process involved during the authoring of simple documents which only contain mathematical concepts represented by strings of characters.
7.1 Example 1: Overloading the + and * symbols
g1  ee       → ee EQ te     # "=" # (1,3)
g2  ee       → te
g3  it       → INTEGER      # [1-9][0-9]* #
g4  ep       → ep PLUS tp   # "+" # (1,3)
g5  ep       → tp
g6  st       → STRING       # [0-9]+ #
g7  ec       → ec CAT tc    # "+" # (1,3)
g8  ec       → tc
l0  new_expr → ee
Table 7.1: Default grammars.
l1  te → ep
l2  tp → it
l3  te → ec
l4  tc → st
Table 7.2: Grammar fragments created by editing.
Consider the need to overload the + symbol in order to represent both the addition of integers and the concatenation of strings of characters. The following document illustrates this by means of two identity expressions which use the same syntax for the left side of the equality. This document is called Prototype because it is the author's first attempt towards its creation.
Prototype
<d1>
1 + 1 + 0 = 2
<d2>
1 + 1 + 0 = 110
</>
</>
The above prototype version of the document is composed of two domains represented by $d_1$ and $d_2$. As illustrated by the syntax of the document, domain $d_1$ should contain the necessary rules to recognize the left side of the first equality as the addition of three numbers. In a similar way, domain $d_2$ should contain rules to recognize the operations on the left side of the second equality as the concatenation of characters.
In order to provide the complete structure to support this document, assume the author initially has access only to the set of default grammars defined in Table 7.1. Grammar fragments $g_1$, $g_4$ and $g_7$ support expressions involving the equality, addition and concatenation operations, respectively. Grammars $g_3$ and $g_6$ define the domains over which the addition and the concatenation operations, respectively, can apply. Grammar fragments $g_2$, $g_5$ and $g_8$ support the definitions of the equality, the addition and the concatenation operations, respectively. Grammar $l_0$ links grammar $g_1$ to the control mechanism. Grammar fragments $l_1$, $l_2$, $l_3$ and $l_4$ (Table 7.2) have been created by editing in order to provide the necessary links with the other fragments.
The result of the author's first attempt to produce a structure to capture the meaning of the two mentioned mathematical concepts is provided by Ex1-Version 1. This code is presented as follows, and it illustrates the two domains as well as the expressions involving the overloaded operator.
Ex1-Version 1
<{l0 ∘ {g1 ∘ g2}t2}t1; {t1 ∘ l1 ∘ {g4 ∘ g5}t3}t0; {t0 ∘ l2 ∘ g3}d1>
1 + 1 + 0 = 2
<{{t1 ∘ l3 ∘ {g7 ∘ g8}t4}t5 ∘ l4 ∘ g6}d2>
1 + 1 + 0 = 110
</>
</>
As stated before, the main objective of this initial version of the document is to represent both the addition and the concatenation operations by the + symbol. For this purpose the author organizes the information to be presented into two separate domains as a way of resolving the semantic nondeterminism generated by overloading the + symbol. The grammar fragments used for the definition of domain $d_1$ have been obtained from domain directory $G_1^0$, and the fragments used for the definition of domain $d_2$ were taken from domain directory $G_2^0$. The complete definitions to support this version of the document are described next according to the document structure proposed in Chapter 6.
{g1 ∘ g2}t2
  ee       → ee EQ te     # "=" # (1,3)
  ee       → te
{l0 ∘ t2}t1
  new_expr → ee
  ee       → ee EQ te     # "=" # (1,3)
  ee       → te
{g4 ∘ g5}t3
  ep       → ep PLUS tp   # "+" # (1,3)
  ep       → tp
{t1 ∘ l1 ∘ t3}t0
  new_expr → ee
  ee       → ee EQ te     # "=" # (1,3)
  ee       → te
  te       → ep
  ep       → ep PLUS tp   # "+" # (1,3)
  ep       → tp
{t0 ∘ l2 ∘ g3}d1
  new_expr → ee
  ee       → ee EQ te     # "=" # (1,3)
  ee       → te
  te       → ep
  ep       → ep PLUS tp   # "+" # (1,3)
  ep       → tp
  tp       → it
  it       → INTEGER      # [1-9][0-9]* #
Table 7.3: Grammars in domain directory $G_1^0$ that have been created by grammar operations.
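In Table 7.3 every composed grammar contains exactly the rules of its parts, so the composition steps of Ex1-Version 1 can be replayed in a few lines of Python. This sketch models composition as a plain union of productions, an assumption that matches the tables rather than a restatement of the operator's full definition; `rule` and `compose` are hypothetical helper names.

```python
from typing import Dict, Optional, Tuple

# A fragment maps each production (lhs, rhs) to its attribute string or None.
Rule = Tuple[str, Tuple[str, ...]]
Fragment = Dict[Rule, Optional[str]]


def rule(lhs: str, rhs: str, attr: Optional[str] = None) -> Fragment:
    """A one-production grammar fragment, as listed in Tables 7.1 and 7.2."""
    return {(lhs, tuple(rhs.split())): attr}


def compose(*fragments: Fragment) -> Fragment:
    """Union of rules in first-seen order (assumed model of composition)."""
    result: Fragment = {}
    for fragment in fragments:
        result.update(fragment)
    return result


# Fragments of Table 7.1 (default) and Table 7.2 (edited) used by domain d1.
g1 = rule("ee", "ee EQ te", '# "=" # (1,3)')
g2 = rule("ee", "te")
g3 = rule("it", "INTEGER", "# [1-9][0-9]* #")
g4 = rule("ep", "ep PLUS tp", '# "+" # (1,3)')
g5 = rule("ep", "tp")
l0 = rule("new_expr", "ee")
l1 = rule("te", "ep")
l2 = rule("tp", "it")

# The composition steps written in Ex1-Version 1.
t2 = compose(g1, g2)
t1 = compose(l0, t2)
t3 = compose(g4, g5)
t0 = compose(t1, l1, t3)
d1 = compose(t0, l2, g3)

for (lhs, rhs), attr in d1.items():
    print(f"{lhs:<8} -> {' '.join(rhs):<12} {attr or ''}")
```

Running it prints the eight productions of $d_1$ in the same order as Table 7.3; the same helpers rebuild $t_4$, $t_5$ and $d_2$ of Table 7.4 from $g_6$, $g_7$, $g_8$, $l_3$ and $l_4$.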
The current version of the document is supported by the document instance structure $S_0 = (D_0, c)$. The organization of the semantic structure $D_0$ is defined in terms of its two domain directories $G_1^0$ and $G_2^0$ for this initial version of the document as follows:
$$D_0 = (G_1^0, G_2^0) \tag{7.1}$$
where $G_1^0$ is defined as
$$G_1^0 = N_1^0 \cup F_1^0 \cup C_1^0 = \{g_1, g_2, g_3, g_4, g_5, l_0, l_1, l_2, t_0, t_1, t_2, t_3, d_1\} \tag{7.2}$$
{g7 ∘ g8}t4
  ec       → ec CAT tc    # "+" # (1,3)
  ec       → tc
{t1 ∘ l3 ∘ t4}t5
  new_expr → ee
  ee       → ee EQ te     # "=" # (1,3)
  ee       → te
  te       → ec
  ec       → ec CAT tc    # "+" # (1,3)
  ec       → tc
{t5 ∘ l4 ∘ g6}d2
  new_expr → ee
  ee       → ee EQ te     # "=" # (1,3)
  ee       → te
  te       → ec
  ec       → ec CAT tc    # "+" # (1,3)
  ec       → tc
  tc       → st
  st       → STRING       # [0-9]+ #
Table 7.4: Grammars in domain directory $G_2^0$ that have been created by grammar operations.
with
$$N_1^0 = \{l_1, l_2\};\quad F_1^0 = \{g_1, g_2, g_3, g_4, g_5, l_0\};\quad C_1^0 = \{t_0, t_1, t_2, t_3, d_1\} \tag{7.3}$$
and $G_2^0$ as
$$G_2^0 = N_2^0 \cup F_2^0 \cup C_2^0 = \{g_6, g_7, g_8, l_3, l_4, t_1, t_4, t_5, d_2\} \tag{7.4}$$
with
$$N_2^0 = \{g_6, g_7, g_8, l_3, l_4\};\quad F_2^0 = \{t_1\};\quad C_2^0 = \{t_4, t_5, d_2\}. \tag{7.5}$$
The total set of grammars manipulated by this initial prototype is given by
$$H_0 = \bigcup_{i=1}^{2} G_i^0 = \{g_1, g_2, g_3, g_4, g_5, g_6, g_7, g_8, l_0, l_1, l_2, l_3, l_4, t_0, t_1, t_2, t_3, t_4, t_5, d_1, d_2\} \tag{7.6}$$
Now consider the need to modify the current version of the document in order to include two other concepts: multiplication of integers and consecutive concatenation of strings of characters, which is here called the power of strings. The power operation is a binary infix operator which concatenates its left operand the number of times stated by its right integer operand.
Syntactically, both operations are represented by the * symbol. This characteristic indicates that two distinct directories will need to be provided in order to capture the meanings assigned to the * symbol. The code presented under the label Ex1-Version 2 contains expressions involving these operations as well as both the addition and the concatenation as introduced by Ex1-Version 1.
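The difference between the two meanings of * can be made concrete outside the grammar machinery. The Python sketch below evaluates already tokenized expressions under two hypothetical interpretation tables, `INTEGERS` and `STRINGS`. These tables, the function `evaluate` and the token format are illustrative assumptions, and the precedence of * over + is hard-coded rather than derived from a grammar. Only the choice of table, which plays the role of the active domain, decides what * means.

```python
from typing import Callable, Dict, List, Union

Value = Union[int, str]
# Each hypothetical interpretation binds the same symbols to different operations.
Interpretation = Dict[str, Callable[[Value, Value], Value]]

INTEGERS: Interpretation = {        # meanings in the integer domain
    "+": lambda a, b: a + b,        # addition
    "*": lambda a, b: a * b,        # multiplication
}
STRINGS: Interpretation = {         # meanings in the string domain
    "+": lambda s, t: s + t,        # concatenation
    "*": lambda s, n: "".join(s for _ in range(n)),  # power: s repeated n times
}


def evaluate(tokens: List[Value], domain: Interpretation) -> Value:
    """Evaluate operand/operator tokens where * binds tighter than +."""
    # First pass: fold each '*' into the operand on its left.
    folded: List[Value] = [tokens[0]]
    for op, operand in zip(tokens[1::2], tokens[2::2]):
        if op == "*":
            folded[-1] = domain["*"](folded[-1], operand)
        else:
            folded.extend([op, operand])
    # Second pass: apply the remaining '+' operators from left to right.
    result = folded[0]
    for op, operand in zip(folded[1::2], folded[2::2]):
        result = domain[op](result, operand)
    return result


print(evaluate(["a", "*", 3, "+", "b"], STRINGS))  # aaab
print(evaluate([1, "+", 1, "*", 0], INTEGERS))     # 1
```

The two calls reproduce the equalities a * 3 + b = aaab and 1 + 1 * 0 = 1 that appear in Ex1-Version 2.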
Ex1-Version 2
<d2>
1 + 1 + 0 = 110
<{d2 % {l5 ∘ {g9 ∘ g10}t6 ∘ l6}}d3>
1 + 1 * 0 = 1
<{t5 ∘ l7 ∘ l8 ∘ g11 ∘ l9 ∘ g3 ∘ g12}d4>
a * 3 + b = aaab
</>
1 * 3 = 1 + 1 + 1 = 3
</>
</>
The code presented by Ex1-Version 2 above makes use of three distinct domains: $d_2$, $d_3$ and $d_4$. Although domain $d_2$ has been reused from the previous version of the document, domains $d_3$ and $d_4$ needed to be created. The complete definition of the structure which supports this is provided by the document instance structure $S_1 = (D_1, c)$, where the semantic structure $D_1$ is defined in terms of its three domain directories $G_1^1$, $G_2^1$ and $G_3^1$ for this version of the document as follows:
$$D_1 = (G_1^1, G_2^1, G_3^1, G_2^1) \tag{7.7}$$
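where $G_1^1$ is defined as
$$G_1^1 = N_1^1 \cup F_1^1 \cup C_1^1 = \{d_2\} \tag{7.8}$$
with
$$N_1^1 = \emptyset;\quad F_1^1 = \{d_2\};\quad C_1^1 = \emptyset, \tag{7.9}$$
$G_2^1$ as
$$G_2^1 = N_2^1 \cup F_2^1 \cup C_2^1 = \{g_9, g_{10}, l_5, l_6, t_6, d_2, d_3\} \tag{7.10}$$
with
$$N_2^1 = \{g_9, g_{10}, l_5, l_6\};\quad F_2^1 = \{d_2\};\quad C_2^1 = \{t_6, d_3\}, \tag{7.11}$$
and $G_3^1$ as
$$G_3^1 = N_3^1 \cup F_3^1 \cup C_3^1 = \{g_3, g_{11}, g_{12}, l_7, l_8, l_9, t_5, d_4\} \tag{7.12}$$
with
$$N_3^1 = \{g_{11}, g_{12}, l_7, l_8, l_9\};\quad F_3^1 = \{g_3, t_5\};\quad C_3^1 = \{d_4\}. \tag{7.13}$$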
The grammars involved in this new version of the document structure are given by
$$H_1 = \bigcup_{i=1}^{3} G_i^1 = \{g_3, g_9, g_{10}, g_{11}, g_{12}, l_5, l_6, l_7, l_8, l_9, t_5, t_6, d_2, d_3, d_4\} \tag{7.14}$$

g9   tm → tm MULT fm   # "*" # (1,3)
g10  tm → fm
l5   tp → tm
l6   fm → it
Table 7.5: Grammars in domain directory $G_2^1$ created by editing.

Tables 7.5 and 7.6 provide grammars which belong to domain directory $G_2^1$. These grammars have been introduced by editing and by grammar operations, respectively. Table 7.7 shows the grammars which belong to $G_3^1$ and were introduced by editing, and Table 7.8 shows the grammar in $G_3^1$ which was introduced by grammar operations.

{g9 ∘ g10}t6
  tm       → tm MULT fm   # "*" # (1,3)
  tm       → fm
{d2 % {l5 ∘ t6 ∘ l6}}d3
  new_expr → ee
  ee       → ee EQ te     # "=" # (1,3)
  ee       → te
  te       → ep
  ep       → ep PLUS tp   # "+" # (1,3)
  ep       → tp
  tp       → it
  tp       → tm
  tm       → tm MULT fm   # "*" # (1,3)
  tm       → fm
  fm       → it
  it       → INTEGER      # [1-9][0-9]* #
Table 7.6: Grammars in domain directory $G_2^1$ that have been created by grammar operations.

g11  tp → st POWER fp   # "*" # (1,3)
g12  st → ALPHANUM      # [0-9a-z]+ #
l7   tc → tp
l8   tc → st
l9   fp → it
Table 7.7: Grammars in domain directory $G_3^1$ created by editing.

{t5 ∘ l7 ∘ l8 ∘ g11 ∘ l9 ∘ g3 ∘ g12}d4
  new_expr → ee
  ee       → ee EQ te      # "=" # (1,3)
  ee       → te
  te       → ec
  ec       → ec CAT tc     # "+" # (1,3)
  ec       → tc
  tc       → tp
  tc       → st
  tp       → st POWER fp   # "*" # (1,3)
  fp       → it
  it       → INTEGER       # [1-9][0-9]* #
  st       → ALPHANUM      # [0-9a-z]+ #
Table 7.8: Grammar in domain directory $G_3^1$ created by grammar operations.

7.2 Example 2: Symbols as operators and operands
This example proposes a semantic structure to support the syntactical overloading of the symbols + and *. Two different meanings are attached to each symbol, and each meaning requires a customized domain where grammar fragments are needed to support the semantic capturing process.
Although the semantics usually attached to these symbols characterizes them as binary operators, as provided by domain $d_3$, many other meanings may also be associated with them. One possibility, for example, is to have them as the elements of a set. In this scenario the two symbols are the operands of the comma "," binary operator, which is used to organize the elements of a set in a list format. This characteristic is illustrated by the single statement defined within the scope of domain $d_5$ in the semantic structure that follows:
Ex2:
<d3>
0 + 1 * 1 = 1
<d5>
R = S = {+, *}
</>
</>
l10  te     → id
l11  te     → bs
g13  id     → IDENTIFIER      # [A-Z] #
g14  bs     → SET el endset   # "{" # (2,3)
g15  endset → ENDSET          # "}" #
g16  el     → el LISTDEL tl   # "," # (1,3)
g17  el     → tl
g18  tl     → BINARYOP        # [+*] #
Table 7.9: Grammars in domain directory $G_2^0$ created by editing.
Tables 7.9 and 7.10 illustrate all grammars required for this example. Since grammar $t_1$ has already been defined in Section 7.1, it has not been included in these tables.
{g14 ∘ g15}t6
  bs       → SET el endset   # "{" # (2)
  endset   → ENDSET          # "}" #
{g16 ∘ g17}t7
  el       → el LISTDEL tl   # "," # (1,3)
  el       → tl
{t1 ∘ l10 ∘ l11 ∘ g13 ∘ t6 ∘ t7 ∘ g18}d5
  new_expr → ee
  ee       → ee EQ te        # "=" # (1,3)
  ee       → te
  te       → id
  te       → bs
  id       → IDENTIFIER      # [A-Z] #
  bs       → SET el endset   # "{" # (2)
  endset   → ENDSET          # "}" #
  el       → el LISTDEL tl   # "," # (1,3)
  el       → tl
  tl       → BINARYOP        # [+*] #
Table 7.10: Grammars in domain directory $G_2^0$ created by grammar operations.
According to the meta-grammar defined in Table 6.18, the args_position nonterminal identifies the positions of the arguments of the mathematical concept represented by the associated rule. In the meta-grammar, this nonterminal is expanded as a list of integers.
Grammar fragment $g_{14}$ has the integers 2 and 3 as its attributes. According to the meta-grammar, these two attributes determine the positions of the nonterminals which are relevant to the definition of the concept represented by the only production rule of grammar $g_{14}$. Argument 2 relates to the list which is defined by grammar $g_{16}$. Argument 3 indicates where the delimiter for the end of a set representation is placed. The notion of splitting pairs of symbols which together are part of the syntax of a concept is used here as a way to ensure that production rules involving these concepts are in ENF. For this reason only the symbol { of the pair {} has been used for the definition of a set in grammar $g_{14}$.
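Reading a cardinality attribute then amounts to indexing the right-hand side of its rule. The helper below is a hypothetical illustration: its name and the whitespace-separated rule format are assumptions. It assumes, as the $g_{14}$ example implies, that positions are 1-based and count every right-hand-side symbol, terminals included. It is applied to the rules of $g_{14}$ and $g_{16}$ from Table 7.9.

```python
from typing import Tuple


def arguments(rhs: str, positions: Tuple[int, ...]) -> Tuple[str, ...]:
    """Symbols of a right-hand side selected by an args_position list.

    Positions are 1-based and count every symbol on the right-hand side,
    terminals included (assumed reading of the meta-grammar).
    """
    symbols = rhs.split()
    for p in positions:
        if not 1 <= p <= len(symbols):
            raise ValueError(f"position {p} is outside a rule of {len(symbols)} symbols")
    return tuple(symbols[p - 1] for p in positions)


print(arguments("SET el endset", (2, 3)))   # ('el', 'endset'), grammar g14
print(arguments("el LISTDEL tl", (1, 3)))   # ('el', 'tl'), grammar g16
```

For $g_{14}$ the selected symbols are the element list and the end-of-set delimiter, which is exactly the role the text assigns to arguments 2 and 3.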
The rest of this section describes the semantic structure $D_0$ for this example according to the model proposed in Chapter 6. The two domains $d_3$ and $d_5$ are defined as elements of the domain directories $G_1^0$ and $G_2^0$, respectively, as follows:
$$D_0 = (G_1^0, G_2^0) \tag{7.15}$$
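Domain directory $G_1^0$ is defined as
$$G_1^0 = N_1^0 \cup F_1^0 \cup C_1^0 = \{d_3\} \tag{7.16}$$
with
$$N_1^0 = \emptyset;\quad F_1^0 = \{d_3\};\quad C_1^0 = \emptyset \tag{7.17}$$
and $G_2^0$ as
$$G_2^0 = N_2^0 \cup F_2^0 \cup C_2^0 = \{l_{10}, l_{11}, g_{13}, g_{14}, g_{15}, g_{16}, g_{17}, g_{18}, t_1, t_6, t_7, d_5\} \tag{7.18}$$
with
$$N_2^0 = \{l_{10}, l_{11}, g_{13}, g_{14}, g_{15}, g_{16}, g_{17}, g_{18}\};\quad F_2^0 = \{t_1\};\quad C_2^0 = \{t_6, t_7, d_5\} \tag{7.19}$$
The grammars required for this example are provided by
$$H_0 = \bigcup_{i=1}^{2} G_i^0 = \{g_{13}, g_{14}, g_{15}, g_{16}, g_{17}, g_{18}, l_{10}, l_{11}, t_1, t_6, t_7, d_3, d_5\}. \tag{7.20}$$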
7.3 Example 3: More meanings for the + symbol
The document structures introduced by the previous examples illustrated a scenario where the overloading of symbols took place in distinct expressions. This means a given symbol appeared in more than a single expression, with different meanings associated with it. This problem was approached by a context switch where the current domain was replaced by an adequate one that provided the necessary grammar support for capturing the meaning of the concepts involved.
Symbol overloading may also take place within the expression itself. In this scenario the context switch would introduce as many distinct domains as the number of different meanings associated with any given symbol included in the expression. This section presents a document structure to support expressions which require more than a single domain to capture the meaning of the concepts they represent. To illustrate this problem, consider the following expression, which attaches two different meanings to the symbol +:
$$|A + B| + 1 = a \tag{7.21}$$
The semantics of expression 7.21 determines that $a$ is equal to the sum of the integer 1 and the determinant of the matrix obtained by adding matrix $A$ to matrix $B$. As can be seen, the symbol + is overloaded, since it is used to represent both the addition of matrices and the addition of integers. The rest of this section provides a structure to capture the semantics of this expression.
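To make the two readings of + in expression 7.21 concrete, the sketch below evaluates it for one choice of matrices. The expression leaves $A$ and $B$ unspecified, so the 2 × 2 matrices used here are hypothetical, as are the helper names. The determinant is computed by plain cofactor expansion, which is adequate only for small matrices.

```python
from typing import List

Matrix = List[List[int]]


def matrix_add(a: Matrix, b: Matrix) -> Matrix:
    """The + inside |...|: element-wise addition of matrices."""
    return [[x + y for x, y in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]


def det(m: Matrix) -> int:
    """Determinant by cofactor expansion along the first row (small m only)."""
    if len(m) == 1:
        return m[0][0]
    total = 0
    for j, entry in enumerate(m[0]):
        minor = [row[:j] + row[j + 1:] for row in m[1:]]
        total += (-1) ** j * entry * det(minor)
    return total


A = [[1, 2], [3, 4]]   # hypothetical operands
B = [[0, 1], [1, 0]]
a = det(matrix_add(A, B)) + 1   # the outer + is ordinary integer addition
print(a)  # -7
```

Here A + B = [[1, 3], [4, 4]], whose determinant is 4 − 12 = −8, so a = −7. The two calls to + work on different types, which is the distinction the document structure has to keep apart.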
Ex3:
<{g19 ∘ {g20 ∘ g2}t8}t9; {t9 ∘ g21 ∘ g22}d6>
|A + B| <{g23 ∘ t2 ∘ g24}d7>
+ 1 = a
</>
</>
g19  new_expr → DET ee endet domain_scope   # "|" # (2,3,4)
g20  ee       → ee MATRIX_ADD et            # "+" # (1,3)
g21  et       → MATRIX_ID                   # [A-Z] #
g22  endet    → ENDET                       # "|" #
Table 7.11: Grammars in domain directory $G_1^0$ created by editing.
g23  new_expr → PLUS ee    # "+" # (2)
g24  te       → CONSTANT   # (0|[1-9][0-9]*)|[a-z] #
Table 7.12: Grammars in domain directory $G_2^0$ created by editing.
The semantic structure for this document is defined as follows:
$$D_0 = (G_1^0, G_2^0) \tag{7.22}$$
{g20 ∘ g2}t8
  ee       → ee MATRIX_ADD et            # "+" # (1,3)
  et       → MATRIX_ID                   # [A-Z] #
{g19 ∘ t8}t9
  new_expr → DET ee endet domain_scope   # "|" # (2,3,4)
  ee       → ee MATRIX_ADD et            # "+" # (1,3)
  et       → MATRIX_ID                   # [A-Z] #
{t9 ∘ g21 ∘ g22}d6
  new_expr → DET ee endet domain_scope   # "|" # (2,3,4)
  ee       → ee MATRIX_ADD et            # "+" # (1,3)
  et       → MATRIX_ID                   # [A-Z] #
  endet    → ENDET                       # "|" #
Table 7.13: Grammars in domain directory $G_1^0$ created by grammar operations.

{g23 ∘ t2 ∘ g24}d7
  new_expr → PLUS ee    # "+" # (2)
  ee       → ee EQ te   # "=" # (1,3)
  ee       → te
  te       → CONSTANT   # (0|[1-9][0-9]*)|[a-z] #
Table 7.14: Grammars in domain directory $G_2^0$ created by grammar operations.
where $G_1^0$ is defined as
$$G_1^0 = N_1^0 \cup F_1^0 \cup C_1^0 = \{g_2, g_{19}, g_{20}, g_{21}, g_{22}, t_8, t_9, d_6\} \tag{7.23}$$
with
$$N_1^0 = \{g_{19}, g_{20}, g_{21}, g_{22}\};\quad F_1^0 = \{g_2\};\quad C_1^0 = \{t_8, t_9, d_6\} \tag{7.24}$$
and $G_2^0$ as
$$G_2^0 = N_2^0 \cup F_2^0 \cup C_2^0 = \{g_{23}, g_{24}, t_2, d_7\} \tag{7.25}$$
with
$$N_2^0 = \{g_{23}, g_{24}\};\quad F_2^0 = \{t_2\};\quad C_2^0 = \{d_7\}. \tag{7.26}$$
The total set of grammars manipulated by this initial prototype is given by
$$H_0 = \bigcup_{i=1}^{2} G_i^0 = \{g_2, g_{19}, g_{20}, g_{21}, g_{22}, g_{23}, g_{24}, t_2, t_8, t_9, d_6, d_7\} \tag{7.27}$$
Tables 7.11 and 7.13 are both associated with domain directory $G_1^0$. Table 7.11 shows all grammars in this directory that were created by editing, and Table 7.13 the grammars that were generated as the result of composition operations. In a similar way, the grammars in Tables 7.12 and 7.14 are associated with the domain directory $G_2^0$. The grammars in Table 7.12 are the result of editing, and the grammars in Table 7.14 were created by composition.
As discussed in Section 7.2, the integer attributes which are introduced as part of the rules of some grammars determine the positions of the relevant nonterminals of a rule. In Table 7.11, grammar fragment $g_{19}$ uses attributes 2, 3 and 4 to refer to the three nonterminals that are necessary to support the correct expansion of this rule. Nonterminals ee and endet are associated with attributes 2 and 3, respectively. Although both nonterminals ee and endet belong to the same domain directory, nonterminal domain_scope, which is associated with attribute 4, does not. As part of the dynamic control grammar, this nonterminal is associated with the context switch which is needed to provide the adequate grammar for the mathematical concepts being processed.
Chapter 8
The Processing Structure
In Chapter 6 the dynamics of authoring mathematics was modeled by means of a document organization that uses CFGs as its fundamental formalism. This chapter presents a processing structure for the proposed organization.
8.1 Dynamic Authoring and Language Fragments
Throughout the previous chapters I have investigated problems related to modeling the authoring behavior of mathematics. One of my major concerns when designing a solution to this problem was to ensure that, at any instant during the authoring activity, the mathematical concepts included in the document have their semantics captured, regardless of the syntax used by the author for this purpose. This approach faces the challenge of processing user-defined syntax¹. This means that a language processor that verifies the syntactical validity of such a document must be provided with the necessary tools to support the processing of unpredicted language statements.
Recognizing a given syntax, for instance a string of symbols $w$, requires a CFG $G$ such that $w \in L(G)$. As already emphasized, allowing user-defined syntax for expressing the semantics capture of mathematical concepts introduces the possibility of symbol overloading. In order to ensure that grammar $G$ is not used to recognize syntax definitions which contain overloaded symbols, the semantic characteristics of concepts need to be considered. The same expression, for instance, could denote either the boolean OR operation or integer addition, depending on the context determined by the author. Consequently, no single CFG should be provided to capture both meanings.
¹ A similar problem has been approached by [60], where a meta-language addition to the PASCAL programming language was proposed. The mechanism allowed the programmer to introduce his/her own syntax to the language.
One fundamental idea I have applied to support the use of CFGs to approach the semantics capturing problem is the fact that authoring mathematics is an incremental activity. Under this assumption the final document may be viewed as the result of a set of document modifications performed by the author, or, for short, a set of authoring increments. Another way to express this is that the dynamics of authoring mathematics can be modeled as a set of states where each state is uniquely characterized by a CFG, or scope. In other words, authoring can be seen as a finite automaton whose states are CFGs and whose transitions are supplied by the author. One problem with this association is determining the boundaries of an authoring increment, that is, where one increment ends and the next begins.
To get around this nondeterminism I have used the state change concept as a mechanism to resolve ambiguities. A state change, in this context, must also be triggered whenever the syntax used for a given concept cannot be recognized by the grammars defined for that state. Authoring therefore requires no scope change as long as no syntactical ambiguities are introduced and all proposed syntax consists of valid statements for the current scope. The syntax attached to a concept is only valid within a given scope and is recognized as long as the scope it belongs to is active.
According to this strategy, the document at the end of the authoring activity will be organized as a sequence of sets of grammars. Since the document has been created by an incremental approach, it is natural to structure its processing by means of a mechanism that supports this characteristic. In essence, new language processors will need to be provided as new scopes are introduced. This means that dynamic authoring requires incremental changes to the notation/language used. Therefore incremental changes also need to be made to the grammars used for the definition of the notation/language. This process may be viewed as a language prototyping activity where language fragments are included as a way to support new features.
8.2 Processing Grammar Fragments
According to [104], a programming language processor is an application which manipulates programs expressed in a given language. In this thesis, the terms language processors, or simply processors, will also be used to refer to these applications. Some well-known programming language processors are compilers and interpreters [104].
According to [6], the design of a compiler can be logically structured as a front end and a back end. The parts associated with the source language are the lexical and syntactic analysis, the symbol table creation, the semantic analysis, the generation of intermediate code, and code optimization. The front end is the collection of all these parts. The back end is related to tasks associated with the target language; therefore target code generation and target code optimization are back-end tasks. Symbol table management and error handling are not restricted to a single phase; these tasks may belong to both the front-end and the back-end phases.
As described above, the phase-oriented decomposition approach views a programming language as a single indivisible object. An alternative would be to describe a language as a collection of fragments such that their combination provides the same processing power as the indivisible definition. The important characteristic of this approach is that language fragments can be defined to represent not only syntax but also the semantic structure of language constructs.
The following section presents the organization this thesis proposes for the construction of document processors that support the dynamics of authoring mathematics. The solution combines the notions of phase-oriented processing and fragmented language definitions.
8.3 Dynamic Authoring and Document Processors
In Section 6.2.1 I introduced a document structure to model the dynamics of authoring mathematics. The model described there organizes authoring as a sequence of sets of grammars. In this organization each set captures the syntax and portions of the semantics of some mathematical concepts that have been included in the document. A complete sequence characterizes one stage during the authoring activity; in other words, it corresponds to a version of the document.
In order to process a given version of a document, say version $v$, the document processor must step through the complete sequence of sets of grammars associated with $v$, starting from the sequence's first element. As a result, a context switch, or scope change, takes place whenever a set of grammars is replaced by another. This procedure is the approach this thesis proposes to capture the semantics associated with the field of knowledge to which mathematical concepts belong. It is through this mechanism that the meaning of concepts represented by overloaded syntactical constructs is captured.
As proposed in Section 6.2.1, the expression $S_j = (D_j, c)$ with $j \geq 0$ describes the state of the document at a given instant during authoring. In this case $D_j$ represents the sets of grammars needed to support the creation of version $j$ of the document. Support for the sequencing behavior is provided by the binding control grammar $c$. In this section the following definition refers to the organizations defined by both $D_j$ and $c$ to present a possible arrangement of language processors to handle the dynamics of authoring mathematics.
Definition 18 Assume $M$ is the binding control grammar expressed in ENF. Consider a given version of a document structure, say version $j$. Let $D_j = (G_1^j, G_2^j, \ldots, G_{n_j}^j)$ be the semantic structure associated with version $j$, and let $P_{M\%G_i^j}$ be the language processor for directory $i$ such that
$$P_{M\%G_i^j} : \text{object's syntax} \to \text{hierarchical representation}$$
The language processor for document structure $S_j$ is defined by the deterministic finite automaton
$$P_{D_j} = (Q_j, \Sigma_j, \delta_j, s_j, F_j)$$
where
$Q_j$ is the set whose elements are all processors associated with the directories that compose the semantic structure $D_j$,
$s_j = P_{M\%G_1^j}$,
$F_j = \{P_{M\%G_{n_j}^j}\}$,
$\Sigma_j$ is the set containing elements which are the syntax of mathematical objects associated with version $j$ of the document, and
for all $w \in \Sigma_j$,
$$\delta_j(P_{M\%G_i^j}, w) = \begin{cases} P_{M\%G_i^j}, & \text{if } w \in L(G_i^j),\\ Q_j \setminus \{P_{M\%G_i^j}\}, & \text{otherwise.} \end{cases}$$

8.3.1 Example
Chapter 7 provides a set of examples to illustrate the organization this thesis proposes to support the dynamics of authoring mathematics. A scenario with two versions of a simple document containing mathematical expressions that overload the + symbol is provided in Section 7.1. Two document instance structures, $S_0 = (D_0, c)$ and $S_1 = (D_1, c)$, have been created to support the mathematical objects introduced during authoring.
The language processors associated with the versions of this document are therefore $P_{D_0}$ for the first version and $P_{D_1}$ for the second. The semantic structure for the second version is
$$D_1 = (G_1^1, G_2^1, G_3^1, G_4^1)$$
and the set of states for its language processor is
$$Q_1 = \{P_{M\%G_1^1}, P_{M\%G_2^1}, P_{M\%G_3^1}, P_{M\%G_4^1}\}.$$
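The routing behavior of $P_{D_j}$ in Definition 18 can be sketched in Python. This is a minimal illustration under stated assumptions, not the implementation proposed in this thesis. The membership test $w \in L(G_i^j)$ is replaced by a hypothetical regular-expression match. When an object is not in the current directory's language, the sketch assumes the automaton moves to the first other processor, in the order of $D_j$, whose language contains it. The classes `DirectoryProcessor` and `DocumentProcessor`, the directory names `integer` and `set`, and their patterns are all invented for the example. It uses two directories that overload + in the spirit of the Section 7.1 scenario.

```python
import re
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class DirectoryProcessor:
    """Stand-in for the processor P_{M%G_i^j} of one domain directory."""
    name: str
    pattern: str  # hypothetical regex standing in for membership in L(G_i^j)

    def accepts(self, w: str) -> bool:
        return re.fullmatch(self.pattern, w) is not None


class DocumentProcessor:
    """Sketch of P_{D_j}: states are directory processors, inputs are object syntax."""

    def __init__(self, processors: List[DirectoryProcessor]) -> None:
        if not processors:
            raise ValueError("a semantic structure needs at least one directory")
        self.states = list(processors)  # Q_j, kept in the order of D_j
        self.start = self.states[0]     # s_j: processor of the first directory

    def delta(self, state: DirectoryProcessor, w: str) -> Optional[DirectoryProcessor]:
        # Stay with the current processor when w is in its language.
        if state.accepts(w):
            return state
        # Assumption: otherwise move to the first other processor, in D_j order,
        # whose language contains w.
        for candidate in self.states:
            if candidate is not state and candidate.accepts(w):
                return candidate
        return None  # no directory of this version accepts w

    def route(self, objects: List[str]) -> List[str]:
        """Return the name of the processor that handles each object, in order."""
        state, trace = self.start, []
        for w in objects:
            nxt = self.delta(state, w)
            if nxt is None:
                raise ValueError(f"no directory accepts {w!r}")
            state = nxt
            trace.append(state.name)
        return trace


# Two hypothetical directories that both use the + symbol.
integers = DirectoryProcessor("integer", r"\d+\+\d+")
sets = DirectoryProcessor("set", r"[A-Z]\+[A-Z]")
p_d = DocumentProcessor([integers, sets])
print(p_d.route(["1+2", "A+B", "3+4"]))  # ['integer', 'set', 'integer']
```

Which processor handles an object is decided by membership in a directory's language, not by the + symbol alone. The fallback order is the main design assumption. A different choice, for example one dictated by the binding control c, would change the trace.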
Chapter 9
Concluding Remarks
This work introduced a user-oriented organization to support the creation of multi-purpose mathematical documents. To achieve this, a mechanism to capture the semantics of mathematical concepts was proposed. This mechanism models the dynamics of authoring and allows meaning-to-syntax bindings to take place during the authoring activity. It also gives the author the power to select the syntax he/she believes is most appropriate to express the ideas to be communicated. A processing structure to support the proposed organization was also presented.
9.1 Discussion
The organization introduced by the authoring model proposed in this work determines that the semantics of a mathematical concept is captured by the set of grammars that compose the directory associated with the concept. Grammars in this set fall into three groups. They either
1. have been created and are already available,
2. are the result of grammar operations, or
3. have been created by editing.
It is expected that the majority of the mathematical concepts included in the document are supported by grammars which are already available, that is, grammars that are part of a library and ready to be used. If new concepts need to be introduced, or their meaning-to-syntax mappings need to be modified, the model requires the needed grammars to be created by editing, by applying operations to existing grammars, or by a combination of these two approaches.
Editing should be required only when few grammars are available, or when the concepts to be expressed require syntax that operations on the already available grammars may not be able to produce.
Cognitive load is the degree to which cognitive resources are required for activities that facilitate learning [99, 26]. According to [82], cognitive load increases with the amount of information to process. In [94], Salomon defines mental effort as the number of non-automatic elaborations necessary to solve a problem. As noted by Clark in [31], mental effort increases linearly and positively as the cognitive load increases. But how can computer-based systems be designed to reduce the cognitive load? As emphasized by [82], information overload can be reduced by modeling the user:
A user model can be described as a system knowledge source containing assumptions on aspects of the user that guide the behavior of the system. The goal of building a user model is to reduce the user's information load. This can be accomplished by adapting either the representation of the task or the task itself.
In the context of this work, the task is authoring mathematics and the representation of the task is the approach taken to authoring. Most of the time, humans are exposed to mathematics through reading and handwriting. Consequently, the mental model developed during this activity is the result of associations involving a pen-and-paper form of representing abstract mathematical concepts. In other words, the semantics of concepts become bound to syntax.
This thesis proposes a document organization which:
1. models the dynamics of authoring mathematics, and
2. allows the author to express mathematical concepts by means of syntax he/she feels comfortable with.
The cognitive load associated with authoring mathematics by means of user-defined syntax should therefore be reduced. By providing his/her own meaning-to-syntax bindings, the author is freed from the details of notations that introduce other bindings he/she is not comfortable with. Empirical results collected by [7] showed that the number of errors users make when entering mathematics on computers increases for longer equations. Although this was not examined in their experiment, it can be hypothesized that the number of errors users make when entering expressions also increases with the complexity of the notation used, because of the corresponding increase in cognitive load. Consider, for instance, the representation of the summation in the OpenMath system found in Subsection 1.2.5. The syntax used for this example is complex and therefore not appropriate for speech input. Furthermore, because of its length, this form of representation is prone to input errors according to [7]. The representation of this type of summation is simpler when captured by means of the approach proposed in this thesis: a similar example in Section 5.8 requires only a single line of text to capture the summation.
According to [7], the multimodal handwriting-plus-speech form of entering expressions was faster and better liked than the keyboard-and-mouse method. Offering the author multimodal input should therefore be beneficial, provided the author also has the freedom to propose the meaning-to-syntax binding. As noted earlier in this thesis, approaches such as MathML and OpenMath have not been designed to support multimodal forms of input. This limitation, together with the other two limitations identified in Subsection 1.2.7 (counter-intuitive entry order and complex syntax format), is overcome by the approach proposed in this thesis.
9.2 Authoring with Grammar Fragments
In this dissertation I have described the goal of capturing the meaning of mathematical concepts by means of a document structure which
1. allows the semantics of mathematical concepts to be encoded by user-defined syntax, provided the notation is context-free, and
2. supports both the extensibility and the ambiguity characteristics of conventional mathematical notation.
In Chapter 1 I made three claims concerning my approach to authoring documents containing mathematics. These claims are repeated here, each followed by comments about the approach I took to accomplish it.
1. Both the meaning and syntax of mathematical concepts can be captured by attributed context-free grammars. The solution I have proposed to capture the semantics of mathematical concepts is based on an organization that considers the author's needs as a fundamental requirement. To support this characteristic I have modeled the dynamics of authoring by means of a grammar-based document structure, the DDM. Attributed context-free grammars are used in this structure. The attributes determine the following:
- the position of the operator's operands, and
- the structure needed to identify the symbol arrangement that represents a given mathematical concept.
2. Extensibility can be supported by operations on the attributed grammars. Three concepts related to extensibility were introduced in Chapter 6: the extension normal form, operations on grammars, and fundamental grammars.
- The extension normal form was proposed in order to determine the grammar format to be used. The format limits the number of terminal symbols in a grammar rule. It also determines the possible terminal/nonterminal symbol arrangements each production rule must follow.
- Both the composition and the extension binary operations are defined for grammars in extension normal form, and both return grammars that are also in extension normal form. They allow the creation of grammars by combining previously defined grammars. This approach introduced the possibility of grammar reuse and incremental grammar definition.
- The notion of fundamental grammars established the basic building blocks to be used for the capturing activity. The three types of grammars defined for this purpose provide the necessary means to support the creation of any possible grammar. This statement is supported by the fact that each one of these three grammars has only a single production rule, which is of one of the types proposed by the extension normal form.
Since the composition and extension operators are defined for grammars in extension normal form, the application of these operations on fundamental grammars will produce grammars which are also in extension normal form. This means grammars can be created during the authoring activity, and a programmable form of extensibility is therefore possible, allowing user-defined syntax to be introduced during the authoring of the document. This mechanism assumes a set of default grammars is available. This was described in Section 6.1, where a logical diagram introducing the activities involved during the authoring of mathematical concepts was presented. A detailed consideration of this characteristic has been provided in Section 6.2.
3. Ambiguities generated by symbol overloading can be resolved by a scope mechanism. As defined in Chapter 6, a document instance structure S is the tuple (D, c), where D is the semantic structure and c the binding control. The semantic structure D is a finite sequence of finite sets of grammars. These sets are represented by a domain directory $G_i^k$, where i determines the position the set holds in the sequence and k refers to the version of the document considered. The binding control c is a CFG which defines the scope in which the rules provided by each domain directory in the semantic structure are valid. This means terminals defined in any given domain directory are local to the scope determined by this structure.
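The scope rule in item 3, under which terminals defined in a domain directory are local to the scope assigned to that directory, behaves much like lexical scoping in a symbol table. The Python sketch below illustrates this under simplifying assumptions. The binding control grammar c is not modeled, and entering and leaving a scope are explicit method calls. Each directory is reduced to a hypothetical mapping from terminal symbols to meanings. The class `ScopedDirectories` and the two readings of + are invented for illustration.

```python
from typing import Dict, List


class ScopedDirectories:
    """Resolve a terminal against the innermost enclosing directory that defines it."""

    def __init__(self) -> None:
        self._scopes: List[Dict[str, str]] = []  # innermost scope is last

    def enter(self, directory: Dict[str, str]) -> None:
        # Hypothetical simplification: a directory is a map from terminal to meaning.
        self._scopes.append(dict(directory))

    def leave(self) -> None:
        self._scopes.pop()

    def resolve(self, terminal: str) -> str:
        # Search from the innermost scope outwards, so local bindings shadow outer ones.
        for directory in reversed(self._scopes):
            if terminal in directory:
                return directory[terminal]
        raise KeyError(f"{terminal!r} is not bound in any enclosing scope")


scopes = ScopedDirectories()
scopes.enter({"+": "integer addition"})
print(scopes.resolve("+"))  # integer addition
scopes.enter({"+": "set union"})
print(scopes.resolve("+"))  # set union
scopes.leave()
print(scopes.resolve("+"))  # integer addition
```

The innermost binding wins, so the same + resolves to different meanings depending on where it occurs. In the organization described above, the extent of each scope is what the binding control c defines.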
9.3 Future Work
As stated in Subsection 1.2.8, compositionality of meaning is a design decision for systems characterized by a static syntax. Allowing user-defined syntax to be provided at run time introduces additional complexity in applying the compositionality principle, because grammar rules will also need to be supplied at run time. In this scenario compositionality becomes a system property. Therefore any grammar that implements the system must include support for compositionality.
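Compositionality can be made concrete with a small evaluator defined by structural recursion. In the usual sense, the meaning of a compound expression is a function of the meanings of its parts and of the way they are combined [58]. The Python sketch below is illustrative only. The `Node` type, the operator table `MEANINGS` and its entries are hypothetical and are not taken from the grammars of this thesis. It shows, in miniature, the run-time situation just described: a new operator is added by extending the table, and the evaluator stays compositional as long as the new entry depends only on the meanings of its operands.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple, Union


@dataclass(frozen=True)
class Node:
    op: str
    args: Tuple["Expr", ...]


Expr = Union[int, Node]

# Hypothetical meaning functions, one per operator symbol.
MEANINGS: Dict[str, Callable[..., int]] = {
    "+": lambda a, b: a + b,
    "*": lambda a, b: a * b,
    "neg": lambda a: -a,
}


def meaning(e: Expr, table: Dict[str, Callable[..., int]] = MEANINGS) -> int:
    """Compositional semantics: the meaning of a node depends only on its
    operator and the meanings of its immediate subexpressions."""
    if isinstance(e, int):
        return e
    return table[e.op](*(meaning(arg, table) for arg in e.args))


expr = Node("+", (2, Node("*", (3, Node("neg", (4,))))))
print(meaning(expr))  # 2 + 3 * (-4) = -10

# Adding an operator "at run time": compositionality is kept because the new
# entry is itself a function of the operand meanings only.
extended = {**MEANINGS, "max": lambda a, b: a if a >= b else b}
print(meaning(Node("max", (expr, 0)), extended))  # 0
```

The sketch only shows what the property looks like for one fixed evaluator. It does not settle whether decompositions into fundamental grammars preserve it in general.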
The notion of fundamental grammar introduced in Chapter 6 may be applied to support compositionality. Since these grammars are the basic building components, the representation of any concept will be subject to the restrictions they introduce. Therefore compound concepts must be decomposed. The questions to be asked at this stage are:
1. Is the resulting decomposition compositional?
2. Is there a way to ensure compositionality of meaning for such systems?
I leave these questions open. A detailed investigation of the application of compositionality is therefore a future goal. Apart from the compositionality problem, the complete implementation of the organization proposed in this dissertation is an immediate priority.
References
[1] J. Abbott: OpenMath Design Committee Report. Technical report, OpenMath Consortium, 1996. Available from http://www.openmath.org/.
[2] J. Abbott, A. Diaz, R. S. Sutor: A Report on OpenMath, A Protocol for the Exchange of Mathematical Information. SIGSAM Bulletin 30(1) (March 1996), 21-24.
[3] J. Abbott, A. van Leeuwen, A. Strotmann: Objectives of OpenMath. Technical report, OpenMath Consortium, 1996. Available from http://www.openmath.org/.
[4] G. D. Abowd: Formal Aspects of Human-Computer Interaction. PhD thesis, Oxford University, Oxford, England, 1991.
[5] S. R. Adams: Modular Grammars for Programming Language Prototyping. PhD thesis, University of Southampton, Southampton, England, 1991.
[6] A. V. Aho, R. Sethi, J. D. Ullman: Compilers: Principles, Techniques and Tools. Addison-Wesley, 1986.
[7] L. Anthony, J. Yang, K. R. Koedinger: Evaluation of Multimodal Input for Entering Mathematical Equations on the Computer. In CHI '05: CHI '05 Extended Abstracts on Human Factors in Computing Systems. 1184-1187. ACM Press, 2005.
[8] D. S. Arnon, S. A. Mamrak: On the Logical Structure of Mathematical Notation. TUGboat 12(4) (1991), 479-484.
[9] R. Arrabito: Using to Produce Braille Mathematical Notation. University of Western Ontario, Undergraduate Thesis. 1987.
[10] R. G. Arrabito: Computerized Braille Typesetting: Some Recommendations on Mark-Up and Braille Standards. Masters thesis, The University of Western Ontario, London, Canada, 1990.
[11] R. G. Arrabito, H. Jürgensen: Computerized Braille Typesetting: another view of Mark-Up standards. Electronic Publishing 1(2) (September 1988), 117-131.
[12] A. Asperti, G. Bancerek, A. Trybulec (editors): Third International Conference, MKM 2004. Lecture Notes in Computer Science 3119, Berlin, 2004. Springer-Verlag.
[13] A. Asperti, B. Buchberger, J. H. Davenport (editors): Second International Conference, MKM 2003. Lecture Notes in Computer Science 2594, Berlin, 2003. Springer-Verlag.
[14] R. Ausbrooks, S. Buswell, D. Carlisle, S. Dalmas, S. Devitt, A. Diaz, M. Froumentin, R. Hunter, P. Ion, M. Kohlhase, R. Miner, N. Poppelier, B. Smith, N. Soiffer, R. Sutor, S. Watt: Mathematical Markup Language (MathML) version 2.0 (Second Edition). Technical report, W3C, 2003. Available from http://www.w3.org/TR/2003/REC-MathML2-20031021/
[15] Y. Bellik, D. Burger: Multimodal interfaces: new solutions to the problem of computer accessibility for the blind. In Conference companion on Human factors in computing systems. 267-268. ACM Press, 1994.
[16] C. Bigelow, D. Day: Digital Typography. Scientific American 249(2) (1983), 106-119.
[17] P. V. Biron, A. Malhotra: XML Schema Part 2: Datatypes. Technical report, OASIS, 2001. Available from http://www.w3.org/xmlschema-2/
[18] Instruction Manual for Braille Transcribing. American Printing House for the Blind, Louisville, Kentucky, 3rd ed., 1984.
[19] The Nemeth Braille Code for Mathematics and Science Notation, 1972 Revision. American Printing House for the Blind, Louisville, Kentucky, 1985.
[20] T. Bray, J. Paoli, C. M. Sperberg-McQueen, E. Maler, F. Yergeau, J. Cowan: Extensible Markup Language (XML) 1.1. Technical report, W3C, 2004. Available from http://www.w3.org/TR/2004/REC-xml11-20040204/
[21] M. Bryan: A TeX User's Guide to ISO's Document Style Semantics and Specification Language (DSSSL). TUGboat 14 (1993), 223-226.
[22] H. Bunt: Issues in Multimodal Human-Computer Communication. In H. Bunt, R.-J. Beun, T. Borghuis (editors): Multimodal Human-Computer Communication: Systems, Techniques, and Experiments, 1374. Lecture Notes in Computer Science, 1-12, Springer-Verlag, Berlin, January 1998.
[23] S. Buswell, O. Caprotti, D. P. Carlisle, M. C. Dewar, M. Gaetano, M. Kohlhase: The OpenMath Standard (version 2.0). Technical report, The OpenMath Society, 2004. Available from http://www.openmath.org/cocoon/openmath/standard/om20/index.html
[24] S. Buswell, S. Devitt, A. Diaz, P. Ion, R. Miner, N. Poppelier, B. Smith, N. Soiffer, R. Sutor, S. Watt: Mathematical Markup Language, W3C Proposed Recommendation. Technical report, W3C HTML, 1998. Available from http://www.w3.org/TR/1998/REC-MathML-19980407/.
[25] O. Caprotti, D. P. Carlisle, A. M. Cohen: The OpenMath Standard. Technical report, The OpenMath Esprit Consortium, 2000. Available from http://www.nag.co.uk/projects/OpenMath/omstd
[26] P. Chandler, J. Sweller: Cognitive load theory and the format of instruction. Cognition and Instruction 8(4) (1991), 293-332.
[27] J. Clark: The design of RELAX NG. Technical report, OASIS, 2001. Available from http://www.thaiopensource.com/relaxng/design.html
[28] J. Clark, M. Makoto: RELAX NG Specification. Technical report, OASIS, 2001. Available from http://www.oasis-open.org/committees/relax-ng/spec.html
[29] J. Clark, M. Makoto: RELAX NG Tutorial. Technical report, OASIS, 2001. Available from http://www.oasis-open.org/committees/relax-ng/tutorial.html
[30] R. E. Clark (editor): Learning From Media: Arguments, Analysis and Evidence. Perspectives in Instructional Technology and Distance Learning. Information Age Publishing, 2001.
[31] R. E. Clark: New Directions: Cognitive and Motivational Research Issues. ch. 15. In Perspectives in Instructional Technology and Distance Learning [30], 2001.
[32] E. F. Codd: A Relational Model of Data for Large Shared Data Banks. Communications of the ACM 13(6) (June 1970), 377-387.
[33] P. R. Cohen, D. R. McGee: Tangible Multimodal Interfaces for Safety-Critical Applications. Communications of the ACM 47(1) (January 2004), 41-46.
[34] J. H. Coombs, A. H. Renear, S. J. DeRose: Markup Systems and the Future of Scholarly Text Processing. Communications of the ACM 30(11) (1987), 933-947.
[35] J. Coutaz, L. Nigay, D. Salber: Multimodality from the User and System Perspectives. In Proc. ERCIM (European Research Consortium for Informatics and Mathematics), workshop on User Interface For All, Heraklion. 1995. Available from citeseer.ist.psu.edu/coutaz95multimodality.html
[36] J. de Carvalho, H. Jürgensen: Dynamic Multi-Purpose Mathematics Notation. Technical Report 521, The University of Western Ontario, 1998.
[37] M. Dewar: OpenMath: An Overview. SIGSAM Bulletin 34(2) (June 2000), 2-5.
[38] C. Dirckx: A Mathematical Text to Braille Translator. 1992. Project Dissertation, Churchill College, University of Bradford.
[39] A. Dix, J. Finlay, G. Abowd, R. Beale: Human-Computer Interaction. Prentice-Hall, 1998.
[40] M. B. Dorf, E. R. Scharry: Instruction Manual for Braille Transcribing. Division for the Blind and Physically Handicapped, Library of Congress, Washington, D. C., 1979.
[41] S. Dunne, H. Jürgensen: Formatting Specialized Notations. In Proceedings of WOODMAN '89: Workshop on Object-Oriented Document Manipulation. Rennes, France, 1989.
[42] A. D. Edwards, R. D. Stevens: Une Interface Multimodale pour l'Access aux Formules Mathematiques par des Eleves ou Etudiants Aveugles. In Comme les Autres: Interfaces Multimodales pour Handicapes Visuels, Special number 1. 97-104. INSERM, 1995.
[43] R. Elmasri, S. B. Navathe: Fundamentals of Database Systems. The Benjamin/Cummings Publishing Company, Inc, Redwood City, California, second ed., 1994.
[44] M. G. Eramian: Displaying DVI Files in Braille: A Viewer for the Visually Impaired. Technical Report 500, The University of Western Ontario, 1997.
[45] W. M. Farmer: MKM: A New Interdisciplinary Field of Research. SIGSAM Bull. 38(2) (2004), 47-52.
[46] R. Furuta, V. Quint, J. Andre: Interactively Editing Structured Documents. Electronic Publishing 1(1) (1988), 19-44.
[47] R. Furuta, J. Scofield, A. Shaw: Document Formatting Systems: Survey, Concepts and Issues. ACM Computing Surveys 14(3) (1982), 417-472.
[48] C. Ghezzi, M. Jazayeri, D. Mandrioli: Fundamentals of Software Engineering. Prentice-Hall, 1991.
[49] C. F. Goldfarb: A Generalized Approach to Document Markup. SIGPLAN Notices 16(6) (1981), 68-73.
[50] M. Goossens, J. Saarela: A Practical Introduction to SGML. TUGboat 16(2) (1995), 103-145.
[51] M. Goossens, J. Saarela: From LaTeX to HTML and back. TUGboat 16(2) (1995), 174-214.
[52] D. Harel: Statecharts: A Visual Formalism for Complex Systems. Science of Computer Programming 8(3) (1987), 231-274.
[53] D. Harel, A. Naamad: The STATEMATE semantics of statecharts. ACM Transactions on Software Engineering and Methodology 5(4) (1996), 293-333.
[54] F. C. Heeman: Granularity in Structured Documents. Electronic Publishing 5(3) (1992), 143-155.
[55] J. E. Hopcroft, J. D. Ullman: Introduction to Automata Theory, Languages, and Computation. Addison-Wesley, first ed., 1979.
[56] E. L. Hutchins, J. D. Hollan, D. A. Norman: Direct Manipulation Interfaces. 87-124. In Norman and Draper [77], 1986.
[57] Information Processing - Text and Office Systems - Standard Generalized Markup Language (SGML). International Organization for Standardization, International Standard 8879, 1986.
[58] T. M. V. Janssen: Compositionality. In J. van Benthem, A. ter Meulen (editors): Handbook of Logic and Language. Elsevier Science Publishers, 1997.
[59] H. Jürgensen: Tactile Computer Graphics. 1997. Manuscript.
[60] H. Jürgensen, H. Waldschmidt: Do Portability, Verifiability, and Simplicity of Programming have to be Conflicting Goals? Technical Report 123, The University of Western Ontario, 1984.
[61] B. W. Kernighan, D. M. Ritchie: The C Programming Language. Prentice-Hall, Englewood Cliffs, New Jersey, 1978.
[62] P. Kilpelainen: SGML & XML content models. Technical Report C-1998-12, University of Helsinki, 1998.
[63] D. E. Knuth: The Genesis of Attribute Grammars. In Proceedings of the International Conference on Attribute Grammars and their Applications. 1-12. Springer-Verlag New York, Inc, 1990.
[64] D. E. Knuth: The TeXbook. Addison-Wesley, Reading, Massachusetts, 1993.
[65] L. Lamport: LaTeX, a Document Preparation System. Addison-Wesley, Reading, Massachusetts, 1986.
[66] J. R. Levine, T. Mason, D. Brown: lex & yacc. O'Reilly & Associates, Inc, Sebastopol, California, second ed., 1995.
[67] D. M. Levy: Fixed or Fluid? Document Stability and New Media. ECHT 1994 Proceedings (September 1994), 24-31.
[68] X. Li: XML and the Communication of Mathematical Objects. Master's thesis, The University of Western Ontario, London, Canada, 1999.
[69] J. C. Martin: Introduction to Languages and The Theory of Computation. McGraw-Hill, first ed., 1991.
[70] MathType, Mathematical Equation Editor, User Manual. Design Science, Inc., Long Beach, California, May 1997.
[71] H. A. Maurer, A. Salomaa, D. Wood: A supernormal-form theorem for context-free grammars. JACM 30(1) (January 1983), 95-102.
[72] B. Meyer: Object Oriented Software Construction. Addison-Wesley, 1997.
[73] E. D. Mynatt, G. Weber: Nonvisual Presentation of Graphical User Interfaces: Contrasting Two Approaches. In CHI 1994 Conference Proceedings. 166-172, April 1994.
[74] W. M. Newman, M. G. Lamming: Interactive System Design. Addison-Wesley, 1995.
[75] L. Nigay, F. Jambon, J. Coutaz: Formal Specification of Multimodality. In CHI'95 Workshop on Formal Specifications of User Interfaces. Denver, USA, 1995. Available from citeseer.ist.psu.edu/nigay95formal.html
[76] D. A. Norman: Cognitive Engineering, 31-61. In Norman and Draper [77]. 1986.
[77] D. A. Norman, S. W. Draper (editors): User Centered System Design. Lawrence Erlbaum Associates, Publishers, 1986.
[78] J. Paakki: Attribute Grammar Paradigms - A High-Level Methodology in Language Implementation. ACM Computing Surveys 27(2) (1995), 196-255.
[79] L. Padovani: On the Roles of LaTeX and MathML in Encoding and Processing Mathematical Expressions. In Asperti et al. [13], 66-79.
[80] H. Petrie, W. Fisher, G. Weber, I. Langer, K. G. and Cathy Rundle, L. Pyfers: Universal Interfaces to Multimedia. In 4th IEEE International Conference on Multimodal Interfaces (ICMI 2002). IEEE Computer Society, October 2002.
[81] N. A. F. M. Poppelier, E. van Herwijnen, C. A. Rowley: Standard DTD's and Scientific Publishing. EPSIG News 5 (September 1992), 10-19.
[82] L. M. Quiroga, M. E. Crosby, M. K. Iding: Reducing Cognitive Load. In HICSS '04: Proceedings of the 37th Annual Hawaii International Conference on System Sciences (HICSS'04) - Track 5. 50131.1. IEEE Computer Society, 2004.
[83] T. V. Raman: TeX talk. TUGboat 12 (1991), 178.
[84] T. V. Raman: An Audio View of LaTeX Documents. TUGboat 13 (1992), 372-379.
[85] T. V. Raman: Documents Are not just for Printing. In Proc. Principles of Document Processing. 1992.
[86] T. V. Raman: Audio System for Technical Readings. PhD thesis, Cornell University, New York, USA, 1994.
[87] T. V. Raman: An Audio View of LaTeX Documents - Part II. TUGboat 16 (1995), 311-314.
[88] T. V. Raman: Emacspeak: A Speech-Enabling Interface. Dr. Dobb's Journal (September 1997).
[89] D. R. Raymond, F. W. Tompa, D. Wood: Markup Reconsidered. In First International Workshop on Principles of Document Processing. Washington, D.C., October 21-23 1992.
[90] D. R. Raymond, F. W. Tompa, D. Wood: From Data Representation to Data Model: Meta-Semantic Issues in the Evolution of SGML. Computer Standards and Interfaces (1996).
[91] L. M. Reeves, J.-C. Martin, J. Lai, M. McTear, T. Raman, K. M. Stanney, H. Su, Q. Y. Wang, J. A. Larson, S. Oviatt, T. Balaji, S. Buisine, P. Collings, P. Cohen, B. Kraal: Guidelines for Multimodal User Interface Design. Communications of the ACM 47(1) (January 2004), 57-59.
[92] C. Roisin, I. Vatton: Merging Logical and Physical Structures. Electronic Publishing 6(4) (1993), 327-337.
[93] W. Rudin: Real and Complex Analysis. McGraw-Hill, New York, New York, third ed., 1987.
[94] G. Salomon: Television is "easy" and print is "tough": The differential investment of mental effort in learning as a function of perceptions and attributions. Journal of Educational Psychology 76(4) (1984), 233-243.
[95] R. Sethi: Programming Languages Concepts and Constructs. Addison-Wesley, 1990.
[96] G. G. Smith, D. Ferguson: Diagrams and Math Notation in e-Learning: Growing Pains of a New Generation. International Journal of Mathematical Education in Science and Technology 35(5) (2004), 681-695.
[97] C. M. Sperberg-McQueen: Specifying Document Structure: Differences in LaTeX and TEI Markup. TUGboat 12(3) (1991), 415-421.
[98] A. Strotmann: Content Markup Languages Design Principles. PhD thesis, The Florida State University, Florida, USA, 2003.
[99] J. Sweller, P. Chandler: Why some material is difficult to learn. Cognition and Instruction 12(3) (1994), 185-233.
[100] J.-P. Tremblay, P. G. Sorenson: The Theory and Practice of Compiler Writing. McGraw-Hill, 1989.
[101] J. van Benthem, A. ter Meulen: Handbook of Logic and Language. Elsevier Science Publishers, 1997.
[102] S. Vorkoetter: Proposed OpenMath Specification. Technical report, Waterloo Maple Software, 1995. Available from http://www.openmath.org/.
[103] J. N. Wallace, T. A. B. Wesley: The Access to Scientific and Mathematical Information for Blind People. [1991]. Manuscript, Department of Computing, University of Bradford.
[104] D. A. Watt, D. F. Brown: Programming Language Processors in Java. Prentice-Hall, Harlow Essex, first ed., 2000.
[105] G. Weber: A Multimedia Editor for Mathematical Documents. Available from http://www.multireader.org/multimedia%20editor.html
[106] D. Wood: Grammar and L forms: an introduction. Lecture Notes in Computer Science 91. Springer-Verlag, 1980.
[107] D. Wood: Theory of Computation. John Wiley & Sons, first ed., 1987.
[108] F. J. Wright: Interactive Mathematics via the Web using MathML. SIGSAM Bulletin 34(2) (June 2000), 49-57.
VITA
Name: Jackson W. Marques de Carvalho
Place of birth: Brazil
Education:
1995-2005 Ph.D.
University of Maine at Orono
Orono, Maine, USA
1983-1985 Master of Electrical Eng.
Universidade Federal do Rio Grande do Norte
Natal, RN, Brazil
1972-1978 B.Sc.
Awards:
Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq)
1995-1998
Organization of American States
1983-1985
Related Work Experience:
Lecturer
Department of Computer Science
University of Pittsburgh
Pittsburgh, PA, USA
2002-present
Lecturer
School of Computer Science
University of Windsor
Windsor, Ontario, Canada
1999-2002
Graduate Research Assistant/Lecturer
Department of Computer Science
1999
Teaching Assistant
Faculty of Information and Media Studies
1998
Lecturer
1997
Teaching Assistant
London, Ontario, Canada
1996-1998
Coordinator of the Scientific Computing Center (NCC)
Natal, RN, Brazil
1991-1995
Lecturer
Natal, RN, Brazil
1989-1995
Graduate Assistant
Department of Electrical Engineering
University of Maine at Orono
Orono, Maine, USA
1985
Electrical Engineer
Technological Nucleus at Center of Technology
Natal, RN, Brazil
1986-1989
Presentations:
1987 MTNS, Phoenix, AZ, USA
Straight Line Motion, Inverse Kinematic Velocities and Inverse Trajectory Planning
1987 MTNS, Phoenix, AZ, USA
MASK Layout Language and Layout Checking Plots
Technical Reports:
Dynamic Multi-Purpose Mathematics Notation
Technical Report Number 521, 1998
In conjunction with Dr. Helmut Jürgensen
The University of Western Ontario
London, Ontario, Canada


Redundancy, Syntax Equivalence and Norm al Forms

R epresenting Polynom ials

R epresenting Subscripts and S u p e rs c rip ts ...................................................

O verloading S uperscripted S y m b o ls ..................................................

R epresenting Sets of N u m b e r s ........................................................................

6 M odelling C ontext D ependent Inform ation

A uthoring M athem atics and M u ltim o d a lity ................................................

A Form al S tru ctu re for D ocum ent A u th o rin g .............................................

G ram m ars and D ynam ic D ocum ent A u th o r in g .............................

S tru ctu rin g w ith G r a m m a r s ............................................................................