Yanofsky - Universal Approach

A Universal Approach to Self-Referential Paradoxes, Incompleteness and Fixed Points
Author(s): Noson S. Yanofsky

Source: The Bulletin of Symbolic Logic, Vol. 9, No. 3 (Sep., 2003), pp. 362-386
Published by: Association for Symbolic Logic
Stable URL: http://www.jstor.org/stable/3109884
Accessed: 01-03-2017 17:38 UTC
REFERENCES
Linked references are available on JSTOR for this article:
http://www.jstor.org/stable/3109884?seq=1&cid=pdf-reference#references_tab_contents
You may need to log in to JSTOR to access the linked references.
JSTOR is a not-for-profit service that helps scholars, researchers, and students discover, use, and build upon a wide range of content in a trusted
digital archive. We use information technology and tools to increase productivity and facilitate new forms of scholarship. For more information about
JSTOR, please contact support@jstor.org.
Your use of the JSTOR archive indicates your acceptance of the Terms & Conditions of Use, available at
http://about.jstor.org/terms
Association for Symbolic Logic is collaborating with JSTOR to digitize, preserve and extend access to The
Bulletin of Symbolic Logic
This content downloaded from 132.248.9.8 on Wed, 01 Mar 2017 17:38:25 UTC
All use subject to http://about.jstor.org/terms
THE BULLETIN OF SYMBOLIC LOGIC
Volume 9, Number 3, Sept. 2003
A UNIVERSAL APPROACH TO SELF-REFERENTIAL PARADOXES,

INCOMPLETENESS AND FIXED POINTS
NOSON S. YANOFSKY
The point of these observations

is not the reduction of the
familiar to the unfamiliar[ ... ]
but the extension of the familiar
to cover many more cases.
Saunders MacLane
Categories for the Working Mathematician [14]
Page 226.
Abstract. Following F William Lawvere, we show that many self-referential paradoxes,

incompleteness theorems and fixed point theorems fall out of the same simple scheme. We
demonstrate these similarities by showing how this simple scheme encompasses the semantic
paradoxes, and how they arise as diagonal arguments and fixed point theorems in logic,
computability theory, complexity theory and formal language theory.
?1. Introduction. In 1969, F William Lawvere wrote a paper [11] in which

he showed how to describe many of the classical paradoxes and incomplete-
ness theorems in a categorical fashion. He used the language of category
theory (and of cartesian closed categories in particular) to describe the set-
ting. In that paper he showed that in a cartesian closed category satisfying
certain conditions, paradoxical phenomena can occur. Lawvere then went
on to demonstrate this scheme by showing the following examples
1. Cantor's theorem that N < p(N)
2. Russell's paradox
3. The non-definability of satisfiability
4. Tarski's non-definability of truth and
5. G6del's first incompleteness theorem.
Further work along these lines were done in several papers e.g., [8], [17],
[19], [20]. Unfortunately, Lawvere's paper has been overlooked by many
people both inside and outside of the category theory community. Lawvere
Received April 15, 2003; accepted June 28, 2003.
? 2003, Association for Symbolic Logic

1079-8986/03/0903-0004/$3.50
362
PARADOXES, INCOMPLETENESS, FIXED POINTS 363
and Schanuel revisited these ideas in Session 29 of their book [13]. Recently,
Lawvere and Robert Rosebrugh came out with a book Sets for Mathematics
[12] which also has a few pages on this scheme.
It is our goal to make these amazing results available to a larger audience.
Towards this aim we restate Lawvere's theorems without using the language
of category theory. Instead, we use sets and functions. The main theorems
and their proofs are done at tutorial speed. We generalize one of the theorems
and then we go on to show different instances of this result. In order to
demonstrate the ubiquity of the theorems, we have tried to bring examples
from many diverse areas of logic and theoretical computer science.
Classically, Cantor proved that there is no onto (surjection) function
N - 2N p(N)
where 2N is the set of functions from N to 2 = {0, 1}. 2N is the set of
characteristic functions on the set N and is equivalent to the powerset of N.
We can generalize Cantor's theorem to show that for any set T there is no
onto function
T - 2T p(T).
The same theorem is also true for other sets besides 2, e.g., 3 = {0, 1, 2}
23 = {0, 1,2,... 21,22}. The theorem is not true for the set 1 = {0}. In
general we can replace 2 with an arbitrary "non-degenerate" set Y. F
this generalization, the basic statement of Cantor's theorem roughly
that if Y is "non-degenerate" then there is no onto function
T , YT
where yT is the set of functions from T to Y. Y can be thought of

set of possible "truth-values" or "properties" of elements of T. By
degenerate" we mean that the objects of Y can be interchanged or that
exists a function a from Y to Y without any fixed points (y E Y
(y) = y.)
Rather than looking at functions f: T > yT, we shall look at equi
functions of the form f: T x T - Y. Every f can be converte
function f where f(t, t') = f(t')(t) E Y. Saying that f is not onto i
same thing as saying that there exists a g(-) E yT such that for all
the function f(t') = f(-, t'): T > Y is not the same as the func
g(-): T > Y. In other words there exists a t E T such that
g(t) /- f (t, t').
We shall call a function g: T - Y "representable by to" ifg (-) = f(
So if f is not onto, then there exists a g(-) E yT that is not represe
by any t E T.
On a philosophical level, this generalized Cantor's theorem says t
long as the truth-values or properties of T are non-trivial, there is no
364 NOSON S. YANOFSKY
that a set T of things can "talk about" or "describe" their own truthf
or their own properties. In other words, there must be a limitation in
that T deals with its own properties. The Liar paradox is the thr
sand year-old primary example that shows that natural language
not talk about their own truthfulness. Russell's paradox shows th
set theory is inherently flawed because sets can talk about their own
erties (membership.) G6del's incompleteness results shows that ari
cannot completely talk about its own provability. Turing's Halting pr
shows that computers cannot completely deal with the property of w
a computer will halt or go into an infinite loop. All these different e
are really saying the same thing: there will be trouble when things d
their own properties. It is with this in mind that we try to make
formalism that describes all these diverse-yet similar-ideas.
The best part of this unified scheme is that it shows that there really
paradoxes. There are limitations. Paradoxes are ways of showing that
permit one to violate a limitation, then you will get an inconsistent
The Liar paradox shows that if you permit natural language to talk a
own truthfulness (as it-of course-does) then we will have inconsi
in natural languages. Russell's paradox shows that if we permit
talk about any set without limitations, we will get an inconsistenc
theory. This is exactly what is said by Tarski's theorem about t
formal systems. Our scheme exhibits the inherent limitations of
systems. The constructed g, in some sense is the limitation that your
(f) cannot deal with. If the system does deal with the g, there w
inconsistency (fixed point).
The contrapositive of Cantor's theorem says that if there is a o
yT then Y must be "degenerate" i.e., every map from Y to Y mus
fixed point. In other words, if T can talk about or describe its own pr
then Y must be faulty in some sense. This "degenerate"-ness is a
producing fixed point theorems.
For pedagogical reasons, we have elected not to use the powerful lan
of category theory. This might be an error. Without using category
we might be skipping over an important step or even worse: wave ou
at a potential error. It is our hope that this paper will make you
and look at Lawvere's original paper and his subsequent books. O
language of category theory can give an exact formulation of the
and truly encompass all the diverse areas that are discussed in thi
Although we have chosen not to employ category theory here, its sp
nevertheless pervasive throughout.
This paper is intended to be extremely easy to read. We have tr
make use of the same proof pattern over and over again. Whenever p
we use the same notation. The examples are mostly disjoint. If th
is unfamiliar with or cannot follow one of them, he or she can move on
to the next one without losing anything. Section 2 states Lawvere's main
theorem and some of our generalizations. Section 3 has many worked out
examples. We start the section with the classical paradoxes and then move on
to some of the semantic paradoxes. From there we go on to other examples
from theoretical computer science. Section 4 states the contrapositive of
the main theorem and some of its generalizations. The example of this
contrapositive is in Section 5. We finish off the paper by looking at some
future directions for this work to continue. We also list some other examples
of limitations and fixed point theorems that might be expressible in our
scheme.
We close this introduction with a translation of Cantor's original proof of
his diagonalization theorem. His language is remarkably reminiscent of our
language. This translation was taken from Shaughan Lavine's book [10].
The proof seems remarkable not only because of its simplicity, but
especially also because the principle that is employed in it can be
extended to the general theorem, that the powers of well-defined sets
have no maximum or, what is the same, that for any given set L
another M can be placed beside it that is of greater power than L.
For example Let L be a linear continuum, perhaps the domain of
all real numerical quantities that are > 0 and < 1.
Let M be understood as the domain of all single-valued functions
f(x) that take on only the two values 0 or 1, while x runs through
all real values that are > 0 and < 1. [M = 2L ... ]
But M does not have the same power as L either. For otherwise
M can be put into one-to-one correspondence to the variable z [of
L], and thus M could be thought of in the form of a single valued
function
0(x,z)
of the two variables x and z, in such a way that through every
specification of z one would obtain an element f(x) = + (x, z) of
M and also conversely each element f (x) of M could be generated
from (x, z) through a single definite specification of z. This however
leads to a contradiction. For if we understand by g(x) that single
valued function of x which takes only values 0 or 1 and which every
value of x is different from +(x, x), then on the one hand g(x) is an
element of M, and on the other it cannot be generated from +p(x, z)
by any specification z = zo, because 0(zo, zo) is different from g(zo).
Acknowledgments. The author is grateful to Rohit Parikh for suggesting

that this paper be written and for his warm encouragement. The author also
had many helpful conversations with Eva Cogan, Scott Dexter, Mel Fitting,
Alex Heller, Roman Kossak, Mirco Mannucci, and Paula Whitlock.
?2. Cantor's theorems and its generalizations. It is pedagogically sou

skip this section for a moment and read the beginning of the next se
where you can remind yourself of the proof of the more familiar ve
Cantor's theorem (about N < g(N)) and Russell's set theory parad
theorem here might seem slightly abstract at first.
THEOREM 1 (Cantor's Theorem). If Y is a set and there exists a f
a: Y - Y without afixedpoint (for ally E Y, a(y) # y), then for a
T andfor allfunctions f: T x T - Y there exists a function g
that is not representable by f i.e., such that for all t E T
g(-) f (-,t).
PROOF. Let Y be a set and assume a: Y - Y is a function without fixed
points. There is a function A: T - T x T that sends every t E T to
(t, t) E T x T. Then construct g: T - Y as the following composition of
three functions.
TxT > Y
A
A a
T g > Y.
In other words,
g(t) = a(f(t, t)).

We claim that for all t E T, g(-) f
g(-) = f(-, to) then by evaluation a
f(to, to) = g(to) = a(f(to,
where the first equality is the fac
equality is the definition of g. Bu
point. o
REMARK 1.
to itself th
for talking
morphisms
braic) struc
structure. F
order struc
the partial
the set Y. S
on Y (e.g.,
endomaps o
The above theorem has more content if you take notice of thefact that we might
not be dealing with functions between sets.
REMARK 2. The A map is called the "diagonal" and many of the proofs
are called "diagonalization arguments." f is some type of evaluation function
and f (t, t) is an evaluation of itself, hence "self-reference" or "self-referential
arguments."
REMARK 3. We follow Lawvere and Schanuel [13] in calling this theorem
"Cantor's Theorem" and its contrapositive the "Diagonal Theorem" which is
stated in Section 4.
We generalize the above theorem so that instead of A = (Id, Id) we use

(Id, ,B) for an arbitrary onto (right invertible) function fl: T ) S. Whereas
A = (Id, Id): T - T x T takes every t to (t, t), (Id, fl): T ---, T x S
takes every t to (t, f (t)).
The way to think about this theorem is to say that if there is an onto
p: T - S then in a sense IS < I TI and Cantor's theorem says I T < I YTI
and so we conclude that IS < I YTI.
THEOREM 2. Let Y be a set, a: Y > Y a function without afixedpoint,
T and S sets and B: T )- S a function that is onto (i.e., has a right inverse
fl: S ) T,) thenforallfunctionsf: TxS - Y thefunction gp: T -> Y
constructed as follows
f
TxS > Y
(Id,W) a
T g >Y
is not representable by f .
PROOF. Let Y, a, T and ,B be given

of fl. By definition
ga((t) = a(f(t,/(t))).
We claim that for all s E S gp(
evaluation at B(so) gives
f({p(so), so) = ga(P(so)) by represe

= a(f (/(so), /(/(to)))) by d
= a(f(/t(so), so)) by definitio
o
Which means that a does have a fixed
We can think of this theorem in another way. Set S = T and lets co

a f/ different than IdT. The usual way to visualize Cantor's Theorem
f t t2 t3 t4 t5
tl [y3] Y7 Y21 Y2 Y4
t2 YI [Y17] Y2 Y7 Y41 '"
t3 Yo Y3 [Y7] Y2 Y24 **
t4 Y9 Y7 Y64 [Y2] Y4 ..
t5 Y4 Y73 Y31 Y2 [Y4] **
Everything that is in square brackets gets changed. For example y

changed to a (y3). However a little thought shows that we do not need
along the diagonal. The diagonal is just the simplest way. What is neede
that every row of the table gets at least one element changed. So we m
have a picture that looks like this:
f tl t2 t3 t4 t5
tl y3 Y7 Y21 [Y2] Y4
t2 [yl] Y17 [y2] Y7 Y41 ..
t3 yo Y3 Y7 Y2 [Y24] ...
t4 Y9 [Y7] Y64 [y2] y4
t5 Y4 Y73 Y31 [Y2] y4 .
The fact that every row has something changed is in essence the fa
is onto. As long as fi is onto, Cantor's theorem still holds.
With this in mind we may pose-but do not answer-the follow
tions. Should these theorems really be called "diagonalization th
Does self-reference really play a role here? Since we can generate t
paradoxes without self-reference, does this destroy Russell's vic
principle?
?3. Instances of Cantor's theorems. We shall begin with the famil

sion of Cantor's theorem about the power set of the natural numbe
there we move on to Russell's set theory paradox and other para
limitations. We shall do the first two instances slowly and use
notation and ideas as the theorems in the last section. The other instances
we shall do more quickly.
Instance: Cantor's N < p(N) theorem. The theorem says that there cannot
be an onto function from N to p(N). Let So, S1, S2, ... be a proposed
enumeration of all subsets of N. Let 2 = {0, 1} be a set and consider
the "negation" function a: 2 - 2 where a(O) = 1 and a(1) = 0. Let
f: N x N - 2 be defined as
I if n c Sm
f(n,m)-\O' ifn =Sm.
For each m, f (-, m) is the characteristic function of Smn
f(-,m) = XSm
Construct g as follows:
NxN >2
A
A a
N >2.
g is the characteristic funct
G={n EN n Sn}.
For all m, XG = g(-) 7 f(-, m) = Xsm. Because if there was an m

that g(-) = f(-, mo) then by evaluation at mo we have
f(mo, mo) = g(mo) = a(f(mo, mo))
where the first equality is from the fact that g is representable by

the second equality is by the definition of g. This means that the n
operator has a fixed point which is clearly false. In other words
not in the proposed enumeration of all subsets of N. D
Instance: Russell's paradox. This paradox says that the set of all sets that
are not members of themselves is both a member of itself and not a member of
itself. Let Sets be some universe of sets (we are being deliberately ambiguous
here.) Again consider the "negation" function ca: 2 - 2 where a(0) = 1
and a(l) = 0. Let f: Sets x Sets > 2 be defined as follows on sets s
and t.
( 1 ifs E t
( \ ifs .
We construct g as follows
f
Sets x Sets > 2
A
A c
Sets > 2.
g is the characteristic function

selves. For all sets t, g(-) 7 f(-
g(-) = f(-, to) then from evalu
f (to, to) = g(to) = a(f
where the first equality is beca

is from the definition of g. Th
make sure that there are no par
function of a "collection" of Sets but this "collection" does not form a set.
We mention in passing that the Barber paradox and other simple self-
referential paradoxes can be done exactly like this. The Barber paradox has
a simple solution, namely that the village described by the phrase "there
is a village where everyone who does not shave themselves is shaved by
the barber" does not really exist. We are in a sense saying the same thing
about Russell's paradox. Namely, the collection of sets that do not contain
themselves does not form an existent set. O
Instance: Grelling's paradox. We now move on t

paradoxes. There are some adjectives that describe t
some that do not. "English" is an English word.
word. "Polysyllabic" is polysyllabic but "monosyllab
Call all words that do not describe themselves "
yourself if "heterological" is heterological. It is if an
Consider the set Adj of all (English) adjectives.
function f: Adj x Adj - 2 defined for all adjecti
f1) : if a2 describes al
(a a = if a2 does not decribe al.
And so we have the following construction of g
Adj x Adj > 2

A
A a
Adj > 2.
g is the characteristic function of a

cannot be described by any adjecti
g(-) f (-, a) for all adjectives a. "H
tive that is in this subset. Some au
word "impredicable". Our formulat
tives. o
Instance
is the (C
liars." T
lying." T
with Gre
English s
'yields f
yields fa
The philo
similar to Grelling's paradox, we leave it to the reader. D
Instance: The strong Liar paradox. A common "solution" to the Liar's
paradox is to say that that there are certain sentences that are neither true
nor false but are meaningless. "I am lying" would be such a sentence. This
is a type of three-valued logic. This is, however, not a "solution." Consider
the sentence
'yields falsehood or meaninglessness

when appended to its own quotation'
yields falsehood or meaninglessness
when appended to its own quotation.
If this sentence is true, then it is false or meaningless. If it is false, then
it is true and not meaningless. If it is meaningless, then it is true and not
meaningless.
This paradox can also be formulated with our scheme. Consider the set of
English sentences Sent and the set 3 = {T(rue), M (eaningless), F (alse)}. We
have the following function f: Sent x Sent - 3 defined for all sentences
s1 and s2,
T: if a2 describes a1
f(sl, s2) = M: if it is meaningless for a2 to describe al
F : if a2 does not decribe al.
Now consider the function a: 3 > 3 defined as a(T) = F and a(M) =

a(F) = T. Construct g as follows
Sent x Sent > 3

A
A a
Sent > 3.
g is the characteristic function o

ingless when describing thems
those sentences that g takes to T as opposed to M or F. O
Instance: Richard's paradox. There are many sentences in the English lan-
guage that describe real numbers between 0 and 1. Let us lexicographically
order all English sentences. Using this order, we can select all those English
sentences that describe real numbers between 0 and 1. For example "x is the
ratio between the circumference and the diameter of a circle divided by ten."
describes the number 0.314159 .... There are many similar English sen-
tences. Call such a sentence a "Richard Sentence." So we have the concept
of the "m-th Richard Sentence."
Consider the set 10 = {0, 1,2,... 9} and the function a: 10 > 10 defined
as a(i) = 9 - i. This function does not have a fixed point. Now consider
the function f: N x N - 10 defined as
f(n, m) = The n-th decimal number of the m-th Richard Sentence.
For example, if the sentence in the above paragraph is the 15th Richard
sentence then f(4, 15) = 1 because of the 1 in 0.314159.... Now consider
g: N - 10 constructed as
f
N x N > 10
A
A a
N >g 10.
This g describes a real number between 0 and 1 and yet for all m E N
g(-) f (-, m)
i.e., this number is different than all Richard Sentences. Yet here is a Richard
Sentence that describes this number:
x is the real number between 0 and 1 whose n-th digit is nine

minus the n-th digit of the number described by the n-th Richard
sentence.
Why does this paradox remain? Our scheme is supposed to give a lim-
itation of "Richard-sentenceness" and yet, for some reason, the paradox
remains. Jay Kangel (in an e-mail correspondence) has suggested that the
problem is that "Richard-sentenceness" is not a well defined or a computable
concept. Consider the following sentence: "Let x be a red cow if the Gold-
bach Conjecture is true and let x be 3/4 if the Goldbach Conjecture is false."
Does this sentence describe a real number? Is this sentence a Richard sen-
tence? Even if we know that a sentence is a Richard sentence, we might
not be able to determine what number the sentence describes. It is not clear
what number is described by the following Richard sentence: "Let x be 1/4
if the Riemann Hypothesis is true and let x be 3/4 if it is false." Since the set
of Richard sentences is not well defined and the function f is also not well
defined, it is not surprising that the paradox remains. n
Instance: Turing's Halting problem. The following formulation was in-
spired by Heller's fascinating work on recursion categories [6] and Manin's
intriguing paper on classical and quantum computations [15].
For this instance we leave the comfortable world of sets and functions. We
must talk about computable universes. A computable universe is a category
U with the following two properties
1. N and 2 are objects in U

2. For every object C in U there is some type of enumeration of the
elements of C. An enumeration is a total isomorphism ec: N - C.
One should think of C as a set of computable things, e.g., trees, graphs,
numbers, stacks, strings etc.
3. For every (not necessarily total) function f: C - C' there is a
corresponding number rf ' E N. Think of this as the Godel number
of the program that computes the computation.
4. For every (not necessarily total) function f: C > C' there is a
corresponding recursively enumerable (r.e.) set W(f) C N. For every
c E C, f has a value at c if and only if eC'(c) E W(f). Again one
should think of a partial function from one computable domain to
another.
Halt in a computable universe should be a total function Halt: N

in U such that for all f: C - C'
Halt(-, r-f) = Xw(f).
This says that Halt should be able to tell for what values in C the com
halts. Formally
1' ifn E Wm
Halt(n, m) =1 if n Wm
10: ifn i Wm.
Consider a: 2 - 2 defined as follows: a(0) = 1 and a(l) X, i.e., the

computation is undefined. Construct g as follows:
Halt
NxN >2
A
A /
N g >2.
We conclude by showing that H

r-g1. If Halt was defined at rg- t
tion:
Halt(rgn, rg) = 1 iff rg- E W

iffg(rg~) = 1 by the h
iff Halt(rg', rg-) = 0 b
Hence no total Halt can exist. O
Instance: A non-r.e. language. Th

by any Turing machine. Let Mo,
ing machines on the input langu
enumeration of all the words in S
the numerical value of the binary
f: E* x * - 2 defined as follow
f (wi,) 1:1 ifwi is accep

- 0: if wi is not accept
Then the constructed g
f
?* x ?* > 2
A
A a
?* g > 2.
is the characteristic function of a lan

machine. Of course, the fact that t
from a simple counting argument. N
is countable and the number of
Instance: An oracle B such that pB
tions in computer science is whether
be solved by deterministic Turing
equal to the set NP, of all problems t
TMs in polynomial time. Alas, this qu
per. However there is a related questi
same question for oracle TMs. An o
S, such that the TM can determine if
a given set S there are analogous se
[1] have proven that there exists a set
a set B such that pB N NpB. Here w
every deterministic machine is by de
for every B, pB C NPB. What remain
language LB such that LB E NPB b
was adopted from [7].
Let M, M?, M2,... be some enumer
polynomial Turing machines in th
responding sequence of polynomial
worst execution time for each machine
For any function f: Z* x N - 2
is a characteristic function on the se
characteristic function. Let f(-, i) den
complement of f(-, i), i.e., f(-, i)
F(-, i) denote the cumulative charac
F-,i) U f(-,j).
j<i
We shall define f(-, -) inductively. (Vw E ?*)f(w, O) = 1. For w

and i E N, f(w, i) = 0 if and only if the following three conditi
satisfied
1. (Vw' < w)f(w', i) = 1 where the < is a lexicographical order o

words of *. This insures that there is only one word accepted to
each i.
2. MF(-'i) rejects OIwl within il?gi steps.
3. (Vj < i)M1(-j) on input 01Ol does not to query w within jlogj steps.
Once this f is defined, we construct g as follows
E*x N >2
A
(Id, f) t
Y* > 2
where ,l(w) = Iwl, a(0) =

f(w, Iw ) = 0 if and only if t
g is the characteristic fun
language
LB = {0i I B contains a word of length i}.
This language can easily be recognized by a linear time nondeterministic

TM. On input Oi, the NTM simply has to guess a string w of length i
and see if it is in B. Hence LB E NPB. In contrast, because of condition
2 above, LB cannot be recognized by any DTM in polynomial time, i.e.,
(Vm)g(-) / f(-,m). o
Instance: Time travel paradox
was possible, a time traveller
grandfather thus insuring tha
never born, then he could not
to be so homicidal in order to
might just insure that his par
time and make sure that he d
These self-referential paradox
key point is of central import
own grandfather (besides for
not be born, rather because if
able to go back in time to shoo
The time traveller is not self-r
With this in mind, we defin
dimensional space-time. Eve
tire enterprize of physics i
ever, are considering it onl
f: Events x Events -* 2 defined for all events el and e2,
f(e, e2) 1: if e2 is compatable with el

0: if e2 is incomaptable with el.
For example f ( the sun shining at a particular time in a particular place,

the sun not shining at a particular time in a particular place) = 0. f ( Jack
playing ball in New York now, Jill skating in California now) = 1. In fact,
any two events where the first event is outside of the light-cone of the second
event is compatible. f (Jack killing his bachelor grandfather, Jack being
born) = 0.
We now have the following construction of g
Events x Events > 2

A
A a
Events > 2.
g
g is the characteristic function of those events that would be incompatible

with themselves. These events cannot exist with any other event in our
universe.
In 1949, Kurt Godel wrote a paper on relativity theory. In this paper Godel
constructed a mathematical model in which time travel would be possible.
On page 168 of [21], Rudy Rucker describes an interview with Godel in
which Rucker asks about the time traveller paradoxes. Godel answers that
the universe simply will not let you kill your grandfather. Just like P and -,P
cannot both be true, so too the universe will not let you do something that
will cause a contradiction. Godel's words are worth quoting:
"... time-travel is possible, but no person will ever manage to kill

his past self." G6del laughed his laugh then, and concluded, "The
a priori is greatly neglected. Logic is very powerful." O
?4. Diagonal Theorem and generalizations. The contrapositive of Cantor's

Theorem is of equal importance.
THEOREM 3 (Diagonal Theorem). If Y is a set and there exists a set T

and a function f: T x T - Y such that allfunctions g: T - Y are
representable by f (there exists a t E T such that g(-) = f(-, t)), then all
functions a: Y - Y have afixedpoint.
PROOF. The proof is constructive. Let Y, T, f and a be given. Th

construct g as follows:
TxT > Y
A
A Ca
T g >Y.
g is defined as
g(m) = a(f(m,m)).
Since we have assumed that g is represent
g(m) = f(m, t).

And so we have a fixed point of a at yo = g (t). Explicitly we have
a(g(t)) = a(f (t, t)) by representation of g

= g(t) by definition of g. o
REMARK 4. Obviously, any set Y with two or more elemen

Y - Y that do not have fixed points. It is here that we g
ignoring the category theory that is necessary. In the exam
do, the objects we will be dealing with have more structure t
the functions between the objects are required to preserve th
are only talking about these restricted classes offunctions.
more explanations.
REMARK 5. It is important to note that the theorem uses a
esis than the proof actually uses. The theorem asks tha
representable, but the proof only uses the fact that any g con
manner is representable. In the future, we shall use this fact
that constructed g be representable.
?5. Instances of diagonal theorems. We use Mendelson

and language. In particular rB(x)' is the Godel number o
assume that we are working in a theory where there is a re
that is defined as follows: For all B(x) where B is a logical st
its only free variable then
D(B3(x)') = B(B(x)')1
THEOREM 4 (Diagonalization lemma). For any well-formed formula (
E(x) with x as its only free variable, there exists a closedformula C such t
PROOF. Let Lind' be the set of Lindenbaum classes (algebra) of well-

formed formulas with i free variables. Two wfs are equivalent iff they are
provably logically equivalent. Let f: Lind1 x Lind1 - Lind0 be defined
for two wfs with a free variable B(x) and H(y) as follows:
f (B(x), -(y))= -(r-B(x)1).

Let the operator on Lind0? E: Lind0 - Lind0 be defined as P
CE(P) = ?(rP1). Using these functions, we combine them to create
as follows:
Lind1 x Lind1 > Lind0

A
A 0^
Lind1 Lindo.
By definition
g(B(x)) = q>E(f(B(x), B(x))) = (rB(

We claim that g is representable by S(x) =
g(3(x)) = ?(r13(rf(x)-)n) = ?(D(

= g(rB(x)-) = f(B1(x), 5(y)).
So there is a fixed point of Oe at C = ('rg(
?(rG(r-g(x)-')) -= ((r-9(r9(x)-)) by d
= g((f ((x), 9(x))) by defini
= g(g(x)) by definition of g
= f ((x), G(x)) by representab
= G(rG(x)') by definition off. f
Application: Godel's first incompleteness theorem. Let Prov(y, x) stand

"y is the Godel number of a proof of a statement whose Godel number
Then let
?(x) (y)- Prov(y,x).

A fixed point for this ?(x) in a consistent and co-consistent theory is a
sentence that is equivalent to its own statement of unprovability. E
Application: Godel-Rosser's incompleteness theorem. Let Neg: N > N
be defined for Godel numbers as follows
Neg(B3(x)1) = r-B(x)
Let
?(x) = (Vy)(Prov(y, x) -> (3w)(w < y) A Prov(w, Neg(x))).

A fixed point for this ?(x) in a consistent theory is a sentence that is
alent to its own statement of unprovability. D
Application: Tarski's theorem. Let us assume that there exists a w
formula T(x) that expresses the fact that x is the Godel number of
theorem in the theory. Set
E(x) --(x).
A fixed point of (x) shows that T(x) does not do what it is supposed to
We conclude that a theory in which the diagonalization lemma holds cann
express its own theoremhood. E
Application: Parikh sentences. There are true
proofs, but there are relatively short proofs of
provable. This amazing result about lengths of
496 of R. Parikh's famous paper Existence and
Consider a consistent theory that contains Pe
with the following predicates:
* Prflen(m, x) = m is the length (in symbols
whose Godel number is x. This is decidab
finite number of proofs of length m.
* P(x) = 3y Prov(y, x) i.e., there exists a
Godel number is x.
* n (x) _ -(3m < n Prflen(m,x)).
Applying the diagonalization lemma to ,n (x) gives us a fixed point Cn such
that
- Cn < n(rCnT) --(3m < n Prflen(m, rCn)).

In other words Cn says
"I do not have a proof of myself shorter than n."
If Cn is false, then there is a proof shorter than n of Cn and the system is not
consistent.
Consider the following short proof of P(Cn)
1. If Cn does not have any proof, then Cn is true.
2. If Cn is true, we can check all proofs of length less than n and prove Cn.
3. From 1 and 2 we have that if Cn does not have a proof, then we can
prove Cn. i.e., -P(Cn) - P(Cn).
4. .-. P(Cn).
This proof can be formulated in Peano Arithmetic in a fairly short proof.
In contrast n can be chosen to be fairly large. So we have a statement Cn
which has a very long proof, but a short proof of the fact that it has a
proof. n
Application: Lob's paradox. We prove that every logical sentence is true.

The standard notation for the Godel number of a wff C is rC1. In contrast,
if n is an integer then we shall write LnJ for the wff that corresponds to that
number. Obviously L'-C_ = C
Let A be any sentence. We shall prove that it is always true. Use the
diagonalization lemma on
?(x) - LXJ = A.
A fixed point for this E(x) is a C such that
- C -> E('C') - (L'C, =X> A) = (C => A).
So C is equivalent to C = A. Assume, for a second that C is true. Then
C = A is also true. By modus ponens A is also true. So by assuming C we
have proven A. This is exactly what C = A says and hence it is true as is its
equivalent C and so A is true.
This looks like a real paradox. It seems to me that the paradox arises
because we did not put a restriction on the wffs E(x) for which we are
permitted to use the diagonalization lemma. The Lob's paradox is related
to Curry's paradox which shows that we must restrict the comprehension
scheme in axiomatic set theory. In a similar way, here we must restrict the
diagonalization lemma. Restricting the diagonalization lemma might seem
strange because its constructive proof seems applicable to all ?(x). But
restrict we must as we restrict the seemingly obvious comprehension scheme
in set theory.
The reviewer has suggested that the real reason why this paradox remains
is that the LxJ operation is not defined within the system and hence we are
not permitted to use it in E(x). o
Let us move from logic to computability theory.
and notation of [4].
THEOREM 5 (The Recursion Theorem). Let h
putable function. There exists an no E N such that
Oh(no) = 'no'
PROOF. Let . be the set of unary computable fu
N -> f be defined as f(m,n) - C n(m)' If On,
f(m, n) is also undefined. Letting the operator
oh (?n) = 4h(n). We have the following square:
NxN >
A
A~~9 >
N g >.
g is defined as g(m) = Oh(qm(m)). By the S-M-N theorem there is

computable function s(m) such the ^h(,m(m)) = es(m). Since s is to
computable, there exists a number t such that s(m) = qt(m) and
representable because g(m) = >h(obm(m)) = -s(m) = 4qt(m) = f(m,
there is a fixed point of (Dh at no = t ,(t). Explicitly we have
Olh(ct(t)) = 4h(C4,(t)) by definition of (h

= (Dh(f(t, t)) by definition of f
= g (t) by definition of g
= f (t, t) by representability of g
= ()t(t) by definition of f. o
Application: Rice's theorem. Every nontrivial property of computable
tions is not decidable: Let A be a nonempty proper subset of F, the
all unary computable functions. Let A = {x I qx E A}. Then A i
recursive. We prove this by assuming (wrongly) that A is recursive
a E A and b ~ A. Define the function h as follows.
h(x) aa ifx , A
ab: if x E A.
By definition x E A iff h(x) 4 A. From our assumption, we have t

is computable (and total). Hence by the recursion theorem, there is
such that >h(no) = Ono. Now we have the following contradiction:
no E A h (no) 4 A by definition of h
h(no) ? A by the definition of A
O=no A by the recursion theorem
==no ? A by definition of A. o
Application: Von Neumann's self-reproducing machines. A se

ing machine is a computable function that always outputs its o
It might seem impossible to construct such a self-reproducing
in order to construct such a machine, we would need to know
and hence know the machine in advance. However, by a sim
of the recursion theorem, we get such a machine.
By a description of a machine, we could mean the numbe
putable function i.e., a self-reproducing machine is a func
for all input x.
Let f: N x N - N be the computable projection function
By the S-M-N theorem there exists a total computable functi
Os(y) (x) = f(y, x) = y. From the recursion theorem, there exi
that =n(x) = 45(n)(X) = f(n, x) = n. E
?6. Future directions. There are many possible ways that we can go on
with this work. We shall list a few.
The general Cantor's theorem can be generalized further so that even more
phenomena can be encompassed by this one theorem. For example what
if we have two sets Y and Y' and there is a onto function from Y to Y'.
What does this say about the relationship between f: T x T ) Y and
f': T x T - Y'? We should get the concept of a paradox "reduction"
from one paradox to another.
Rather than simply talking about sets and functions, perhaps we should
be talking about partial orders and order preserving maps. With this gener-
alization, we might be able to not only get fixed point theorems but also least
fixed point theorems. There are many simple least fixed point theorems such
as ones for continuous maps of cpo's and Scott domains; Kripke's definition
of truth [5] and the Knaster-Tarski theorem.
Some more thought must go into Richards and Lbb's paradoxes. Al-
though we have stated their limitations, the paradoxes remain. Perhaps we
are not formulating them correctly or perhaps there is something intrinsically
problematic about these paradoxes.
There are many fixed point theorems throughout logic and mathematics
that are not of the type described in Sections 3 and 4. Can we in some sense
characterize those fixed point theorems that are self-referential?
It seems that the key component of the diagonalization lemma is the
existence of a recursive D: N - N that is defined for all B(x) as
D(F-B(x)') = rB(rB(x)-1).
Similarly, in order to have the recursion theorem we needed the S-M-N
theorem. These two properties of systems are the key to the fact that the
systems can talk about themselves. Are these two properties related to each
other? More importantly, can we find other key properties in systems that
make self-reference possible?
In the introduction of this paper we talked of the lack of an onto function
T -- YT and we said that Y may be thought of as truth-values or properties
of objects in T. Can we find a better word for Y? In Section 5 where we
talked about an onto function Lind1 - Lind?Lind where Lind1 is the
Lindenbaum classes of formula with i variables. In what sense is Lind0 the
truth-values or properties of Lind 1? We then went on to talk about an onto
function N -> yN where F is the set of unary computable functions. We
used this onto function to prove The Recursion Theorem. In what sense is
F the truth-values or the properties of N?
As for more instances of our theorems, the field is wide open. There are
many paradoxical phenomena and fixed point theorems that we have not
talked about. Some of them might be amenable to our scheme and some
might not be.
* There are many of the semantic paradoxes that we did not discus
Berry paradox asks one to consider the sentence "Let x be th
number that cannot be described by any sentence with less th
characters." We just described such a number.
* The Crocodile's Dilemma is an ancient paradox that is a devious
self-referential paradox. A crocodile steals a child and the mother
child begs for the return of her beloved baby. The crocodile resp
"I will return the child if and only if you correctly guess whether
I will return your child." The mother cleverly responds that he wi
the child. What is an honest crocodile to do?!?
* There is a belief that all paradoxes would melt away if there were no self-
referential statements. Yablo's Non-self-referential Liar's Paradox was
formulated to counteract that thesis. There is a sequence of statements
such that none of them ever refer to themselves and yet they are all both
true and false. Consider the sequence
(Si): For all k > i, Sk is false.
Suppose Sn is true for some n. Then Sn+l is false as are all subsequent
statements. Since all subsequent statements are false, Sn+l is true which
is a contradiction. So in contrast, Sn is false for all n. That means that
S1 is true and S2 is true etc etc. Again we have a contradiction.
* Brandenburger's Epistemic Paradox [3] considers the situation where
Ann believes that Bob believes that Ann believes that Bob has a
false belief about Ann.
Now ask yourself the following question: Does Ann believe that Bob
has a false belief about Ann? With much thought, you can see that this
is a paradoxical situation.
* Curry's paradox is a paradox about logic and set theory that is very
similar to L6b's paradox.
* The Ackermann function is not a primitive recursive function. One
hears the phrase that Ackermann's function "diagonalizes-out" of prim-
itive recursive functions.
* There is a famous Paris-Harrington result which says that certain
generalized Ramsey theorems cannot be proven in Peano Arithmatic.
Kanamori and McAloon [9] make the connection to the Ackermann
function. Just as the Ackermann function "diagonalized-out" of prim-
itive recursiveness, so too, generalized Ramsey theory is "diagonalized-
out" of Peano Arithmetic. Both of these are really stating limitations
of the systems.
There are many instances of fixed point theorems that might be put into
the form of our scheme.
* Borodin's Gap Theorem is a type of fixed point theorem in complexity

theory that might be right for our scheme.
* We again mention the Knaster-Tarski theorem about monotonic func-

tions between preorders. There is also a much used theorem about fixed
points of continuous functions between cpo's.
* As the ultimate in self-reference, we would like to mention Kripke's
theory of truth that he used to banish self-referential paradoxes. It is,
in essence, a type of fixed point theorem. It would really be nice to
formulate that way of dealing with paradoxes in our language.
* Brouwer's fixed point theorem, or the far simpler intermediate value
theorem.
* Nash's equilibria theorem and its many generalizations from game the-
ory.
There are several theorems from "real" mathematics that are proved via
diagonalization proofs. We might be able to put them into our language.
* Baire's category theory about metric spaces.
* Montel's theorem from complex function theory.
* Ascoli theorem from topology.
* Helly's theorem about limits of distributions.
The following ideas are a little more "spacey."
* G6del's second incompleteness theorem about the unprovability within
arithmetic of the consistency of arithmetic. This theorem is a usually
proved as a consequence of the first incompleteness theorem. However
Kreisal has a direct model theoretic proofs that uses a diagonal method
(see, e.g., page 860 of Smoryniski's article in [2].) This proof seems
amenable to our scheme.
* Many of Chaitin's algorithmic information theory arguments seem to
fit our scheme.
* We worked out Godel's first incompleteness theorem which showed
that (using the language of the introduction) arithmetic cannot com-
pletely talk about its own provability. What about Godel's completeness
theorem? Certain weak systems can completely talk about their own
provability. Can this be stated as some type of fixed point theorem?
REFERENCES
[1] THEODORE BAKER, JOHN GILL, and ROBERT SOLOVAY, Relativizations of th

question, SIAM Journal on Computing, vol. 4 (1975), no. 4, pp. 431-442.
[2] Jon Barwise (editor), Handbook ofmathematicallogic, North-Holland Publish
Amsterdam, 1977, With the cooperation of H. J. Keisler, K. Kunen, Y. N. Moschov
A. S. Troelstra, Studies in Logic and the Foundations of Mathematics, vol. 90.
[3] ADAM BRANDENBURGER, Thepower ofparadox, available at http: //www. peo
edu/abrandenburger/paradox-03-12-021.pdf.
[4] NIGEL CUTLAND, Computability, An introduction to recursive function theo
bridge University Press, Cambridge, 1980.
[5] MELVIN FITTING, Notes on the mathematical aspects of Kripke's theory of trut
Dame Journal of Formal Logic, vol. 27 (1986), no. 1, pp. 75-88.
[6] ALEX HELLER, An existence theorem for recursion categories, The Journal of S
Logic, vol. 55 (1990), no. 3, pp. 1252-1268.
[7] JOHN E. HOPCROFT and JEFFREY D. ULLMAN, Introduction to automata theo
guages, and computation, Addison-Wesley Series in Computer Science, Addison-We
lishing Co., Reading, Mass., 1979.
[8] HAGEN HUWIG and AXEL POIGNE, A note on inconsistencies caused by fixpo
Cartesian closed category, Theoretical Computer Science, vol. 73 (1990), no. 1, pp.
[9] AKIHIRO KANAMORI and KENNETH MCALOON, On Godel incompleteness an
combinatorics, Annals of Pure and Applied Logic, vol. 33 (1987), no. 1, pp. 23-41.
[10] SHAUGHAN LAVINE, Understanding the infinite, Harvard University Press, Ca
MA, 1994.
[11] F. WILLIAM LAWVERE, Diagonal arguments and cartesian closed categories,
theory, homology theory and their applications, I (Battelle Institute Conference
Wash., 1968), Springer, Berlin, 1969, pp. 134-145.
[12] F. WILLIAM LAWVERE and ROBERT ROSEBRUGH, Sets for mathematics, Ca
University Press.
[13] F WILLIAM LAWVERE and STEPHEN H. SCHANUEL, Conceptual mathematics
introduction to categories, With the assistance of Emilio Faro, Fatima Fenaroli an
Lawvere. Buffalo Workshop Press, Buffalo, NY, 1991.
[14] SAUNDERS MAC LANE, Categories for the working mathematician, second ed.,
ate Texts in Mathematics, vol. 5, Springer-Verlag, New York, 1998.
[15] YURI I. MANIN, Classical computing, quantum computing, and Shor's factor
rithm, Asterisque, (2000), no. 266, Exp. No. 862, 5, pp. 375-404, Seminaire Bourb
1998/99.
[16] ELLIOTr MENDELSON, Introduction to mathematical logic, fourth ed., Chapman &
Hall, London, 1997.
[17] PHILIP S. MULRY, Categorical fixed point semantics, Theoretical Computer Science,
vol. 70 (1990), no. 1, pp. 85-97, Fourth Workshop on Mathematical Foundations of Pro-
gramming Semantics (Boulder, CO, 1988).
[18] ROHIT PARIKH, Existence andfeasibility in arithmetic, The Journal of Symbolic Logic,
vol. 36 (1971), pp. 494-508.
[19] DUSKO PAVLOVI(, On the structure of paradoxes, Archive for Mathematical Logic,
vol. 31 (1992), no. 6, pp. 397-406.
[20] ANDREW M. PITrs and PAUL TAYLOR, A note on Russell's paradox in locally Cartesian
closed categories, Studia Logica, vol. 48 (1989), no. 3, pp. 377-387.
[21] RUDY RUCKER, Infinity and the mind: The science and philosophy of the infinite,
Birkhiiuser, Boston, Mass., 1982.
DEPARTMENT OF COMPUTER AND INFORMATION SCIENCE

BROOKLYN COLLEGE, CUNY
BROOKLYN, NY. 11210
and
DEPARTMENT OF COMPUTER SCIENCE
THE GRADUATE CENTER, CUNY
365 FIFTH AVENUE
NEW YORK, N.Y 10016
E-mail: noson@sci.brooklyn.cuny.edu

Yanofsky - Universal Approach

Uploaded by

Document Information

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

Yanofsky - Universal Approach

Uploaded by

Copyright:

Available Formats

A Universal Approach to Self-Referential Paradoxes, Incompleteness and Fixed Points

Author(s): Noson S. Yanofsky

Volume 9, Number 3, Sept. 2003

A UNIVERSAL APPROACH TO SELF-REFERENTIAL PARADOXES,

The point of these observations

Abstract. Following F William Lawvere, we show that many self-referential paradoxes,

?1. Introduction. In 1969, F William Lawvere wrote a paper [11] in which

Received April 15, 2003; accepted June 28, 2003.

? 2003, Association for Symbolic Logic

where yT is the set of functions from T to Y. Y can be thought of

Acknowledgments. The author is grateful to Rohit Parikh for suggesting

?2. Cantor's theorems and its generalizations. It is pedagogically sou

g(t) = a(f(t, t)).

We generalize the above theorem so that instead of A = (Id, Id) we use

PROOF. Let Y, a, T and ,B be given

f({p(so), so) = ga(P(so)) by represe

We can think of this theorem in another way. Set S = T and lets co

Everything that is in square brackets gets changed. For example y

?3. Instances of Cantor's theorems. We shall begin with the famil

For each m, f (-, m) is the characteristic function of Smn

g is the characteristic funct

For all m, XG = g(-) 7 f(-, m) = Xsm. Because if there was an m

f(mo, mo) = g(mo) = a(f(mo, mo))

where the first equality is from the fact that g is representable by

g is the characteristic function

f (to, to) = g(to) = a(f

where the first equality is beca

Instance: Grelling's paradox. We now move on t

And so we have the following construction of g

Adj x Adj > 2

g is the characteristic function of a

'yields falsehood or meaninglessness

Now consider the function a: 3 > 3 defined as a(T) = F and a(M) =

Sent x Sent > 3

g is the characteristic function o

f(n, m) = The n-th decimal number of the m-th Richard Sentence.

x is the real number between 0 and 1 whose n-th digit is nine

1. N and 2 are objects in U

Halt in a computable universe should be a total function Halt: N

Halt(-, r-f) = Xw(f).

Consider a: 2 - 2 defined as follows: a(0) = 1 and a(l) X, i.e., the

We conclude by showing that H

Halt(rgn, rg) = 1 iff rg- E W

Hence no total Halt can exist. O

Instance: A non-r.e. language. Th

f (wi,) 1:1 ifwi is accep

Then the constructed g

is the characteristic function of a lan

We shall define f(-, -) inductively. (Vw E ?*)f(w, O) = 1. For w

1. (Vw' < w)f(w', i) = 1 where the < is a lexicographical order o

2. MF(-'i) rejects OIwl within il?gi steps.

where ,l(w) = Iwl, a(0) =

LB = {0i I B contains a word of length i}.

This language can easily be recognized by a linear time nondeterministic

f: Events x Events -* 2 defined for all events el and e2,

f(e, e2) 1: if e2 is compatable with el

For example f ( the sun shining at a particular time in a particular place,

Events x Events > 2

g is the characteristic function of those events that would be incompatible

"... time-travel is possible, but no person will ever manage to kill

?4. Diagonal Theorem and generalizations. The contrapositive of Cantor's

THEOREM 3 (Diagonal Theorem). If Y is a set and there exists a set T

PROOF. The proof is constructive. Let Y, T, f and a be given. Th

g(m) = f(m, t).