
Logical Prior Probability

Abram Demski

Institute for Creative Technologies, 12015 Waterfront Drive, Playa Vista, CA 90094

Abstract. A Bayesian prior over first-order theories is defined. It is shown that the prior can be approximated, and that it is exponentially dominant over the Solomonoff distribution when applied to sequence prediction.

1 Introduction & Motivation


Logical and algorithmic representations are two broad categories of knowledge representation, often referred to as declarative and procedural. Declarative knowledge consists of logical statements; to use it, we make inferences. Procedural knowledge consists of programs which compute something; to use it, we execute the program.

These terms are generally understood, but it would be nice to have a more formal distinction if they are to be contrasted. For the purposes of this paper, the following definitions will suffice:

Procedural: Provides an explicit method of generating a desired object or behavior.

Declarative: Provides constraints to which an object or behavior must conform.
The two alternatives will be studied in the context of Bayesian modeling. Bayesian modeling consists of specifying a prior probability (or simply prior) over a space of possible models. The priors discussed here relate to the minimum-description-length principle (MDL), a unifying concept from model selection which balances simplicity with explanatory power to avoid overfitting.

In an effort to define the MDL principle in the most general way possible, the notion of Kolmogorov complexity has emerged: the Kolmogorov complexity of an object is the length of the shortest computer program which reproduces that object [3]. This gives us an algorithmic notion of description length, with many nice properties.
Considering algorithmic description length in a Bayesian setting has led to
the definition of the Solomonoff distribution, an algorithmic prior for sequence prediction. This distribution has been put forward as a theory of rational belief given evidence; in that context it is referred to as Solomonoff induction. Although still contentious, this idea has gained some popularity as a foundational theory of inductive learning.

(This effort has been sponsored by the U.S. Army and the Air Force Office of Scientific Research. Statements and opinions expressed do not necessarily reflect the position or the policy of the United States Government, and no official endorsement should be inferred.)
This paper explores an alternative based on logical description length. Declarative representations turn out to yield a prior which assigns positive probability to more possible circumstances than Solomonoff induction. This is contrary to the intuitions of many researchers.¹ Computations can be described within logic, and the inference rules of logic can be implemented in computers, so it is easy to think they must be representationally equivalent. However, this is not the case.

The Solomonoff distribution can be seen as the consequence of equating describable with computable. This is justified to the extent that the formal theory of computation captures the informal notion of description. Computation has certainly proven to be a very general tool, but there are a few things which seem to be describable but which we know are not computable. The primary examples come from mathematics: many mathematical concepts which are commonly described in first-order logic are fundamentally incomplete (as explained in section 2). This means that we can provide a partial set of axioms, which answer some questions, but no finite formulation can answer all the questions. Because the Solomonoff distribution requires programs to output all answers sequentially, incomplete formulations like this are given no probability.

Giving positive probability to situations which can be described in terms of constraints, but for which no generative description exists, seems to capture something about human probabilistic reasoning which is not captured by the Solomonoff distribution. For example, humans can consider the possibility of an uncomputable physics (such as in [5]). While humans are no better than computers at using such theories as generative models, we can consider the constraints on observations imposed by such theories, and reason logically to arrive at consequences which can be compared to observations.

¹ This statement is based on personal correspondence. Almost all of those with whom I discussed the first-order prior suspected that it would be equivalent to the Solomonoff distribution.

2 Selected Background and Notation


We will be using first-order logic, defining the language L of first-order sentences as follows. There is an infinite stock of variable symbols, v1, v2, ... ∈ V; an infinite stock of predicate symbols, p1, p2, ... ∈ P; and an infinite stock of function symbols, f1, f2, ... ∈ F.
The number of arguments fed to a predicate or function is referred to as its arity. For example, a predicate of arity 2 is typically referred to as a relation. A function of arity 0 is referred to as a constant, and a predicate of arity 0 is a proposition. For simplicity, the arity of a symbol will be inferred from its use here, rather than set ahead of time. If the same symbol is used with multiple arities, the uses are independent (so f2 would notate distinct functions in f2(v1) versus f2(v1, v2)).

An expression is a composition of function symbols and variable symbols, for example f1(f1(v1)). Specifically, the set of expressions E is defined inductively by: V ⊆ E, and for every function fn ∈ F of arity a and expressions e1, e2, ..., ea ∈ E, we have fn(e1, e2, ..., ea) ∈ E.

- For e1, e2 ∈ E, e1 = e2 is in L; this represents equality.
- For pn ∈ P of arity a and e1, e2, ..., ea ∈ E, we have pn(e1, e2, ..., ea) ∈ L.
- For A, B ∈ L, we have A ∧ B ∈ L and A ∨ B ∈ L; these represent conjunction and disjunction, respectively.
- For S ∈ L, we have ¬S ∈ L. This represents negation.
- For any S ∈ L and vn ∈ V, we have ∀vn.S ∈ L and ∃vn.S ∈ L, representing universal and existential quantification.
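As a concrete, purely illustrative rendering of this inductive definition, the following Python sketch represents expressions and sentences as recursive data types. The class names and layout are my own assumptions for the example, not notation from the paper.

from dataclasses import dataclass
from typing import Tuple, Union

@dataclass(frozen=True)
class Var:                 # variable symbols v1, v2, ...
    index: int

@dataclass(frozen=True)
class Func:                # function symbol applied to expressions; arity inferred from use
    index: int
    args: Tuple["Expr", ...] = ()

Expr = Union[Var, Func]

@dataclass(frozen=True)
class Eq:                  # e1 = e2
    left: "Expr"
    right: "Expr"

@dataclass(frozen=True)
class Pred:                # pn(e1, ..., ea)
    index: int
    args: Tuple["Expr", ...]

@dataclass(frozen=True)
class And:                 # conjunction
    a: "Sentence"
    b: "Sentence"

@dataclass(frozen=True)
class Or:                  # disjunction
    a: "Sentence"
    b: "Sentence"

@dataclass(frozen=True)
class Not:                 # negation
    s: "Sentence"

@dataclass(frozen=True)
class ForAll:              # universal quantification
    var: Var
    s: "Sentence"

@dataclass(frozen=True)
class Exists:              # existential quantification
    var: Var
    s: "Sentence"

Sentence = Union[Eq, Pred, And, Or, Not, ForAll, Exists]

# The expression f1(f1(v1)) and the sentence "for all v1, f1(v1) = v1":
example_expr = Func(1, (Func(1, (Var(1),)),))
example_sentence = ForAll(Var(1), Eq(Func(1, (Var(1),)), Var(1)))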

If sentence A logically implies sentence B (meaning, B is true in any situation in which A is true), then we write A ⊨ B. The notation also applies to multiple premises; if A and B together imply C, we can write A, B ⊨ C. Uppercase Greek letters will also be used to denote sets of sentences. We can write A ⊭ B to say that A does not logically imply B.

If A implies B according to the inference rules (meaning, we can derive B starting with the assumption A), we write A ⊢ B. This notation applies to multiple premises as well, and can be denied as ⊬.

The inference rules will not be reviewed here, but some basic results will be important. These results can be found in many textbooks, but in particular, [1] has material on everything mentioned here.

Soundness. For a sentence S and a set of sentences Γ: if Γ ⊢ S, then Γ ⊨ S. That is, the inference rules will never derive something that doesn't logically follow from a set of premises.

Completeness. For a sentence S and a set of sentences Γ: if Γ ⊨ S, then Γ ⊢ S. That is, the inference rules can derive anything which logically follows from a set of premises.


Since the rules for ⊢ can be followed by a computer, this shows that ⊢ is computably enumerable: a (non-halting) program can enumerate all the true instances of Γ ⊢ S.

Incompleteness. There exists a finite set of sentences Γ for which no consistent extension decides all sentences formed from the functions and predicates used in Γ. That is: for any Δ such that Δ ⊇ Γ and Δ is consistent, there will exist a sentence S (actually, many sentences) in which any function or predicate symbols are also found in Γ, and for which neither Δ ⊢ S nor Δ ⊢ ¬S. This means that Γ does not tell us the truth or falsehood of all questions in its subject matter, and neither do any extensions of Γ.

There exist many such Γ. Basic number theory as codified by the axioms of Robinson arithmetic is an example. The inference relation itself, ⊢, is another example: we can code expressions to stand for the sentences of first-order logic, and use a relation to represent ⊢. We can show that any sufficiently strong theory of entailment will have no complete extension which decides A ⊢ B or A ⊬ B for all pairs of sentences.

Encoding computations. Any computable function can be encoded in first-order logic. This can be done, for example, by providing axioms related to the behavior of Turing machines.
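As a small, standard illustration (not an example taken from the paper), the computable operation of addition can be pinned down by two axioms over a constant f6 for zero and a successor function f7, with f5 standing for addition; the symbol choices here are illustrative assumptions:

∀v1. f5(v1, f6) = v1
∀v1.∀v2. f5(v1, f7(v2)) = f7(f5(v1, v2))

Computing 2 + 1, say, corresponds to unfolding the second axiom once and then the first axiom once. Axiomatizing a Turing machine works in the same spirit, with predicates describing the tape contents, head position, and machine state after each step.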
In addition to first-order logic, it will be useful to have some background on the Solomonoff distribution. B denotes the binary alphabet, {0, 1}; B^n denotes the set of binary strings of length n; B^* denotes the set of binary strings of any finite length; B^∞ denotes the set of binary strings of infinite length; and SB = B^* ∪ B^∞ denotes the set of finite and infinite binary strings. String concatenation will be represented by adjacency, so ab is the concatenation of a and b.

Consider a class C of Turing machines with three or more tapes: an input tape, one or more work tapes, and an output tape. The input and output tapes are both able to move in just one direction. Any Turing machine T ∈ C defines a partial function fT from B^∞ to SB: for input i ∈ B^∞, fT(i) is considered to be the string which T writes to the output tape, which may be infinite if T never stops writing output. Now consider a universal machine from this class; that is, a machine U ∈ C such that for any other machine T ∈ C, there is a finite sequence of bits s ∈ B^* which we can place on U's input tape to get it to behave exactly like T; that is, fU(si) = fT(i) for all i.

A distribution M over SB can be defined by feeding random bits to U; that is, we take fU(i) for uniformly random i ∈ B^∞. The Solomonoff distribution is a little bit different: we apply the Solomonoff normalization to M, which gives a distribution over B^∞. The details of normalization will be skipped. The development here has been adapted from [3].


Now, how do we compare two distributions? P1 multiplicatively dominates P2 iff there exists α > 0 such that P1(x) > αP2(x) for any x. An intuitive way of understanding this is that P1 needs at most a constant amount more evidence to reach the same conclusion as P2.² Strict multiplicative dominance means that P1 multiplicatively dominates P2, but the reverse is not the case. This indicates that P1 needs at most a constant amount more evidence to reach the same conclusion as P2, but we can find examples where P2 needs arbitrarily more evidence than P1 to come to the conclusion P1 reaches.

The main reason that the Solomonoff distribution is interesting is that it is multiplicatively dominant over any computable probability distribution for sequence prediction. This makes the Solomonoff distribution a highly general tool.

P1 exponentially dominates P2 iff there exist α, β > 0 such that P1(x) > αP2(x)^β. This intuitively means that P1 needs at most some constant multiple of the amount of evidence which P2 needs to reach a specific conclusion. Strict exponential dominance again indicates that the reverse is not the case, which means that P2 needs more than multiplicatively more evidence to reach some conclusions that P1 can reach.

² This is true if we measure evidence by the log of the likelihood ratio. P1(x|e) = P1(x)P1(e|x)/P1(e), so multiplicative dominance indicates that P1(e|x)/P1(e) doesn't have to get too extreme to bridge the distance between P1 and P2.
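For a toy illustration of the gap between the two notions (my own example, not one from the paper), consider the strings 0, 00, 000, ... and put P1 of the string of n zeros equal to 2^(-n), and P2 equal to 4^(-n) = P1^2. Then P2 exponentially dominates P1 (take β = 3 and α = 1), but it does not multiplicatively dominate P1, since the ratio P2/P1 = 2^(-n) shrinks without bound. A few lines of Python make this visible:

def p1(n):
    return 2.0 ** (-n)      # P1 of the string of n zeros

def p2(n):
    return p1(n) ** 2       # P2 of the same string

for n in (1, 5, 10, 20):
    print(n, p2(n) / p1(n), p2(n) / p1(n) ** 3)
# The first ratio tends to 0, so no fixed alpha > 0 gives multiplicative
# dominance of P2 over P1; the second ratio stays above 1, witnessing
# exponential dominance with beta = 3.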

3 A Notion of Logical Probabilities

3.1 Requirements

I will follow [6] in the development of the idea of a probability distribution over a language, since this provides a particularly clear idea of what it means for a continuous-valued belief function to fit with a logic. I shall say that the distribution respects the logic. The approach is to define probability as a function on sentences in a language, rather than by the more common σ-algebra approach, and require the probabilities to follow several constraints based on the logic. Since we are using classical logic, I will simplify their constraints for that case.

Let L be the language of first-order logic from section 2. We want a probability function P : L → R to obey the following rules:

(P0) P(A) = 0 if A is refutable.
(P1) P(A) = 1 if A is provable.
(P2) If A logically implies B, then P(A) ≤ P(B).
(P3) P(A) + P(B) = P(A ∨ B) + P(A ∧ B).

From these, we can prove other typical properties, such as P(A) + P(¬A) = 1. (For instance, A ∨ ¬A is provable and A ∧ ¬A is refutable, so (P0), (P1), and (P3) give P(A) + P(¬A) = P(A ∨ ¬A) + P(A ∧ ¬A) = 1 + 0 = 1.)

3.2 Definition as a Generative Process

The idea behind the prior is to consider theories as being generated by choosing sentences at random, one after another. The probability of a particular sentence is taken to be the probability that it occurs in a theory randomly generated in this manner.

To be more precise, suppose we have some random process to generate individual sentences from our language L. Each sentence S is generated with probability G(S). We would like a description-length prior. Specifically, G(S) should decrease exponentially with the length of S. The specifics of measuring length will not be too important; we can suppose that each symbol has been assigned some binary encoding, so that we can measure the length of a sentence in bits.

A theory is a set of sentences in L. To generate a random theory, we generate a sequence of sentences S1, S2, S3, ... according to the following process. For each Sn, use sentences from G, but discarding those which are inconsistent with the sentences so far; that is, rejecting any candidate for Sn which would make S1 ∧ ... ∧ Sn into a contradiction. (For S1, the set of preceding sentences is empty, so we only need to ensure that it does not contradict itself.)
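To make the shape of this process concrete, here is a small illustrative sketch in Python (my own toy instance, not the paper's construction): the "language" is just propositional literals such as p3 and -p3, so that consistency can be checked exactly, whereas in the first-order setting the consistency check is only semi-decidable.

import random

# Toy stand-in for G: sample a literal, with higher-numbered atoms
# exponentially less likely (a crude description-length prior).
def generate_sentence():
    n = 1
    while random.random() < 0.5:
        n += 1
    return random.choice(["", "-"]) + "p" + str(n)

# A set of literals is consistent iff no atom occurs both plain and negated.
def consistent(sentences):
    positive = {s for s in sentences if not s.startswith("-")}
    negative = {s[1:] for s in sentences if s.startswith("-")}
    return positive.isdisjoint(negative)

# Generate the first k sentences of a random theory, rejecting any
# candidate that would contradict the sentences so far.
def generate_theory_prefix(k):
    theory = []
    while len(theory) < k:
        candidate = generate_sentence()
        if consistent(theory + [candidate]):
            theory.append(candidate)
    return theory

print(generate_theory_prefix(5))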


Notice that there is no stopping condition. The sequence generated will be infinite. However, the truth or falsehood of any particular statement (or any finite theory) will be determined after a finite amount of time. (The remaining sentences generated will either be consequences of, or irrelevant to, the statement in question.) Shorter (finite) theories will have a larger probability of occurring in the sequence.
In this way, we induce a new probability distribution PL on sentences from L. PL(S) is the probability that a sentence S will be present in a sequence S1, S2, S3, ... generated from G as described. Unlike the one we began with, G, PL respects the logic:

Theorem 1. PL obeys (P0)-(P3).

Proof. (P0) is satisfied easily, since the process explicitly forbids generation of contradictions. (P1) is satisfied, because a provable statement can never contradict the sentences so far, so each will eventually be generated by chance as we continue to generate the sequence. Therefore, provable statements are generated with probability 1.³ (P2) is satisfied by a similar argument: if we have already generated A, but A implies B, then anything which contradicts B will contradict A, and hence never be generated. This means that B will never be ruled out, and so must eventually be generated at random.

We can extend the argument a bit further to show (P3). Since A ⊢ A ∨ B and B ⊢ A ∨ B, the sentence A ∨ B will occur in any theory in which A or B occurs. If both A and B occur in a theory, then ¬(A ∧ B) would be contradictory, so will not occur; therefore, A ∧ B will eventually be generated. Since A ∨ B occurs with probability 1 wherever A or B occurs, and A ∧ B occurs with probability 1 where both are present, the equation (P3) must hold.

³ Notice, this means any theory generated in this manner will contain all of its logical consequences with probability 1. This allows us to talk just about what sentences are in the theory, when we might otherwise need to talk about the theory plus all its logical consequences.


The conditional probability can be defined as usual, with PL(A|B) = PL(A ∧ B)/PL(B). We can also extend the definition of PL to include probabilities of sets of sentences, so that PL(Γ) for Γ ⊆ L is the probability that all S ∈ Γ will be present in a sequence generated by the process defined above. (By an argument similar to the one used to prove (P3), the probability of a set of sentences will be equal to the probability of the conjunction.)

3.3 Approximability

The generative process described so far cannot be directly implemented, since there is no way to know for sure that a theory remains consistent as we add sentences at random. However, we can asymptotically approach PL(·) by eliminating inconsistent possibilities when we find them.

Suppose we want to approximate PL(A). I shall call a partial sequence S1, S2, ..., Sn a prefix. Consider the following Monte Carlo approximation:

t=1, y=1, n=1.
loop:
    prefix=none                    // Reset the prefix at the beginning of each loop.
    while not (s=A or s=neg(A)):   // Until we get A or neg(A),
        s=generate()               // get a random sample sentence.
        prefix=push(s, prefix)     // Append to the sequence so far.
        c=check(prefix, t)         // Spend time t looking for contradictions.
        if c:                      // If a contradiction is found, backtrack.
            pop(prefix)
    if (s=A):                      // If the prefix generated contains A, increment y.
        y=y+1
    else:                          // Otherwise, increment n.
        n=n+1
    t=t+1                          // Increment t at the end of each loop.

The variables y and n count the number of positive and negative samples, while t provides a continually rising standard for consistency-detection on the sampled prefixes. (I will use the term sample to refer to generated prefixes, rather than individual sentences.) To that end, the function check(,) takes a prefix and an amount of time, and spends that long trying to prove a contradiction from the prefix. If one is found, check returns true; otherwise, false. The specific proof-search technique is of little consequence here, but it is necessary that it is exhaustive (it will eventually find a proof if one exists). The function neg() takes the negation; so, we are waiting for either A or neg(A) to occur in each sample. The prefix is represented as a FILO queue. push() adds a sentence to the given prefix, and pop() removes the most recently added sentence.

The inner loop produces individual extensions at random, backtracking whenever an inconsistency is found. The loop terminates when a theory includes either A or neg(A). The outer loop then increments y or n based on the result, increments t, erases the prefix, and re-enters the inner loop to get another sample.
Theorem 2. y/(n+y) will approach PL(A).

Proof. The standard of consistency is continually rising, so that it becomes less and less likely that an inconsistency in a sample goes undetected. Since every inconsistency has a finite amount of time required for detection, this means the probability of an undetected inconsistency will fall arbitrarily far as t rises. The probability of consistent samples, however, does not fall. Therefore, the counts will eventually be dominated by consistent samples.

The question reduces to whether the probability of a consistent sample containing A is equal to PL(A). We can see that this is the case, since if we assume that the generated sentences will be consistent with the sentences so far, then the generation probabilities are exactly those of the previous section.
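To see the procedure run end to end, here is a small self-contained sketch in Python (my own toy instance, reusing the propositional-literal stand-in for L from the illustration in section 3.2, so this is not the first-order algorithm itself). The function names mirror the pseudocode, and check() is deliberately budget-limited so that, as above, the standard of consistency rises with t.

import random
from itertools import combinations

def generate():
    # Exponentially length-biased random literal, e.g. "p2" or "-p2".
    n = 1
    while random.random() < 0.5:
        n += 1
    return random.choice(["", "-"]) + "p" + str(n)

def neg(s):
    return s[1:] if s.startswith("-") else "-" + s

def check(prefix, t):
    # Spend "time t": inspect at most t pairs of sentences for a clash.
    for budget, (a, b) in enumerate(combinations(prefix, 2)):
        if budget >= t:
            break
        if a == neg(b):
            return True
    return False

def estimate(A, samples=2000):
    t, y, n = 1, 1, 1
    for _ in range(samples):
        prefix, s = [], None
        while s != A and s != neg(A):
            s = generate()
            prefix.append(s)
            if check(prefix, t):   # contradiction found: backtrack
                prefix.pop()
                s = None           # a rejected A / neg(A) should not end the sample
        if s == A:
            y += 1
        else:
            n += 1
        t += 1
    return y / (n + y)

print(estimate("p1"))   # roughly 0.5 in this symmetric toy setting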

3.4 Dominance

In order to compare the first-order prior with algorithmic probability, we need to apply the first-order prior to sequence prediction. We can do so by encoding bit sequences in first-order logic. For example, f1 can serve as a logical constant representing the sequence to be observed and predicted; f2() can represent adding a 0 to the beginning of some sequence; and f3() can represent adding a 1. So, to say that the sequence begins 0011... we would write f1 = f2(f2(f3(f3(f4)))), where f4 is a logical constant standing for the remainder of the sequence.
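For concreteness, here is a tiny helper (an illustrative assumption of mine, not code from the paper) that builds this encoding for an arbitrary finite prefix of the sequence:

def encode_bits(bits, rest="f4"):
    # f2(...) prepends a 0, f3(...) prepends a 1, and f4 stands for the
    # unspecified remainder of the sequence, as in the text above.
    term = rest
    for b in reversed(bits):
        term = ("f2(" if b == "0" else "f3(") + term + ")"
    return "f1 = " + term

print(encode_bits("0011"))   # f1 = f2(f2(f3(f3(f4))))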

Define the algorithmic probability of a first-order sentence S as the Solomonoff probability that S is consistent with the data; that is, the probability that S will be compatible with a sequence drawn from the Solomonoff distribution. This will be denoted PA(S).⁴

Theorem 3. PL strictly exponentially dominates PA .

Proof. We can describe the universal machine U from section 2 in a first-order sentence (which will be a large conjunction), taking f1 to represent the output tape. Call this description D1. Now any finite input sequence s of length l has probability 2^(-l). The sequence s has a description D2 in first-order logic of length at most κ + l, where κ is derived from the length of f2, f3, f4, and the necessary parentheses. κ absorbs the cost of the initial "f1 =" and any additional coding costs, such as terminal symbols. Given that such a description will be consistent, we know that there is a chance that it will be generated as the first statement in the theory, with probability αG(D1 ∧ D2), where α > 1 is the normalization constant to account for the rejection of inconsistent sentences as we draw sentences from G. The description length of D1 ∧ D2 will be κ′ + l, where κ′ is κ plus some constant. Since G is exponential in description length with some base b, this means that PL(D1 ∧ D2) ≥ αb^(-κ′)b^(-l) = αb^(-κ′)(2^(-l))^(log2 b) ≥ β(2^(-l))^γ for some β, γ > 0. Thus PL is exponentially dominant. It remains to show that this dominance is strict.

Consider some incomplete theory Γ, and the sublanguage of that theory, as defined in section 2. There exists a computable bijection from the sentences in that sublanguage to natural numbers; that is, we can put the sentences in an order, so that each sentence gets a different number. We can therefore use sequence locations to talk about the truth or falsehood of different statements; each location is associated with a sentence. With this, we can see Γ as partially specifying a binary sequence: place a 1 in every location whose number corresponds to a sentence provable by Γ, and a 0 wherever a sentence is refutable by Γ. Call the set of sequences satisfying this constraint X. This constraint on sequences can be encoded as a finite theory. As such, it will get positive probability in PL.

Since no extension of Γ decides all the statements of the sublanguage, there are no finite strings s which we can feed to U to ensure that the output agrees with X in this way. (If there were, we could decide the sublanguage with the finitely-describable program s000..., which we could encode as a theory.) This means that no subset of X is assigned positive probability by PA, so X must be a set of measure zero. Since PA(X) = 0 and PL(X) > 0, there can be no α, β > 0 such that PA(X) > αPL(X)^β. This proves the desired result.

⁴ Note that this PA does not respect the logic. If a consistent sentence A has no logical relationship to the data sequence, then PA(A) = PA(¬A) = 1.

This leaves open the question of strict multiplicative dominance. However, PA is certainly not multiplicatively dominant over PL; so if PL is multiplicatively dominant, it is strictly so.

4 Conclusion & Questions


PL can converge to the truth in more situations. However, these situations are strange in that they display incompleteness. For example, PL, like humans, is capable of considering the possibility that physics is not computable (as mentioned in the introduction). There is no obvious need for this in machine learning. Yet, it does apparently capture something more about the way humans think.
A more practical application of this approach would be to human-like mathematical reasoning, formalizing the way that humans are able to reason about mathematical conjectures. The study of conjecturing in artificial intelligence has been quite successful, but it is difficult to analyse this theoretically, especially from a Bayesian perspective.
This situation springs from the problem of logical omniscience [2]. The logical omniscience problem has to do with the sort of uncertainty that we can have
when we are not sure what beliefs follow from our current beliefs. For example, we
might understand that the motion of an object follows some particular equation,
but be unable to calculate the exact result without pen and paper. Because
the brain has limited computational power, we must expect the object to follow
some plausible range of motion based on estimation. Standard probability theory
does not model uncertainty of this kind. A distribution which follows the laws
of probability theory will already contain all the consequences of any beliefs (it
is logically omniscient). Real implementations cannot work like that.
An agent might even have beliefs that logically contradict each other.
Mersenne believed that 2^67 - 1 is a prime number, which was proved false in 1903, [...] Together with Mersenne's other beliefs about multiplication and primality, that belief logically implies that 0 = 1. [2]
Gaifman proposes a system in which probabilities are defined only with respect to a finite subset of the statements in a language, and beliefs are required to be consistent only with respect to chains of deduction in which each statement occurs in this limited set.
I will not attempt to address this problem to its fullest here, but approximations to PL such as the one in section 3.3 seem to have some good properties in this area. If the proof of some statement X is too long for some approximation A(t, X, Y) to PL(X|Y) to find given time t, then X will be treated exactly like a statement which is not provable: it will be evaluated with respect to how well it fits the evidence Y, given the connections which A(t, X, Y) can find within time t. For example, if some universal statement ∀x.S[x] can be proven from Y, but the proof is too long to find in reasonable time, then the probability of ∀x.S[x] will still tend to rise with the number of individual instances S[i] which are found to be true (although this cannot be made precise without more assumptions about the approximation process).
It is not clear how one would study this problem in the context of Solomonoff
induction. Individual beliefs are not easy to isolate from a model when the
model is presented as an algorithm. The problem of inconsistent beliefs does not
even arise.
I do not claim that the first-order prior is a complete solution to this problem.
There is much more to be said about the nature of mathematical conjecture and
human mathematical reasoning.
This is not the first published example of a prior more general than the Solomonoff prior. Algorithmic priors more general than the Solomonoff prior have been proposed in [4]. It would be helpful to establish the relationship between the first-order prior and those priors.
It would be interesting to consider similar priors over other logics. For example, what happens if we choose 2nd-order logic rather than 1st-order logic?
There are deep issues to consider.

References
1. G. Boolos, J. P. Burgess, and R. C. Jeffrey. Computability and Logic. Cambridge University Press, 2007.
2. Haim Gaifman. Reasoning with limited resources and assigning probabilities to arithmetical statements. Synthese, 140(1-2):97-119, 2004.
3. Ming Li and Paul M. B. Vitányi. An Introduction to Kolmogorov Complexity and Its Applications (2nd ed.). Graduate Texts in Computer Science. Springer, 1997.
4. Jürgen Schmidhuber. Hierarchies of generalized Kolmogorov complexities and nonenumerable universal measures computable in the limit. International Journal of Foundations of Computer Science, 13(4):587-612, 2002.
5. Oron Shagrir and Itamar Pitowsky. Physical hypercomputation and the Church-Turing thesis. Minds and Machines, 13:87-101, 2003.
6. Brian Weatherson. From classical to intuitionistic probability. Notre Dame Journal of Formal Logic, 44(2):111-123, April 2003.
