Philosophy, Computing, and Artificial Intelligence
PHI 319. Technical Background and Foundations of Logic Programming.
Computational Logic and Human Thinking
A1-A3 (251-283), A5 (290-300)
Thinking as Computation
Chapter 2 (23-39)
Logic and Logic Programming
This lecture looks much more difficult than it is. Be patient. The ideas are
interesting and beautiful in a certain way, but it takes some time and
effort to appreciate them.
Don't worry if you don't understand every detail. To do well in the
course, you only need to understand enough of this lecture to answer the questions in the
assignments.
Remember too that you can post questions about anything in the lectures or in the assignments.
Posting questions is a good way to learn because just formulating a question helps
make the issues clearer.
Logic programming was developed to construct a better computer language.
The first computer languages took their form from early attempts to solve the Entscheidungsproblem ("decision problem" in German) that David Hilbert and Wilhelm Ackermann posed in 1928. The problem asks roughly whether the answer to every mathematical question can be found by carrying out a set of easy-to-follow instructions.
To answer this question, it is necessary to understand what counts as "a set of easy-to-follow instructions." As part of an effort to answer this question, Alan Turing, in 1936, gave a theoretical description of a machine with the ability to carry out such instructions.
The attempt to build such machines became part of the war effort in the Second World War. In 1942, John Mauchly and J. Presper Eckert set out the technical outline of a machine to compute the trajectories of artillery shells. This machine, ENIAC (Electronic Numerical Integrator and Computer), was built in 1946. To carry out a new set of instructions, ENIAC had to be rewired manually. To overcome this problem, Mauchly and Eckert turned their attention to designing EDVAC (Electronic Discrete Variable Automatic Computer). This was a machine that could store sets of instructions. In this same period, John von Neumann had been working on the Manhattan Project to develop nuclear weapons. He recognized that the new computing machines could help carry out the necessary calculations. He joined Mauchly and Eckert, and he produced a technical report describing EDVAC. The machine was built in 1949.
The conceptual model in this report became known as the von Neumann architecture. Computers built according to this architecture are von Neumann machines.
The instructions a von Neumann machine executes are of the sort "place the sum of the contents of addresses a and b in address c." Thinking in terms of such instructions to solve problems is not natural for most people. This led to an attempt to design languages that abstracted away from the architecture of the machine. The goal of this abstraction was to make the language more natural for human beings to understand. Many languages developed in early computer science (such as the C programming language) did not go very far in this direction. They remained heavily influenced by the architecture of the machine. The language of logic programming, however, is completely different in this respect. It is based on the language of logic, and logic has traditionally been thought to be an important part of the language of thought.
The Language of Logic Programming
Logic programming has its basis in what is known as the first-order predicate calculus.
This is the logic that comes out of the work of the philosopher Gottlob Frege (1848-1925) and others who developed it to clarify certain issues in mathematics.
This is not a course in computer science or in logic, so my intent here is to provide a basic picture of the connection between logic programming and its basis in logic.
A logic program is an abstraction. It translates to a formula of the propositional or first-order predicate calculus. This formula is a conjunction of clauses.
The logic program from the first lecture provides an example:
a ← b, c.
a ← f.
b.
b ← g.
c.
d.
e.
Each line in this program translates to a clause. The logic program itself translates to the conjunction of these clauses, and we can think of these clauses as a set of beliefs (or KB). Beliefs, in this way, are symbolic structures that the agent uses in thinking and acting.
∧ symbolizes conjunction: "b and c." The letters b
and c stand for declarative sentences. A conjunction has conjuncts. In
the conjunction b ∧ c, the conjuncts are b and c.
∨ symbolizes disjunction: "b or c." The letters
b and c stand for declarative sentences. (In Latin, the word
for "or" is
vel.)
A disjunction has disjuncts. In the disjunction a ∨ ¬b ∨ ¬c,
the disjuncts are a, ¬b, and ¬c.
The English word 'atomic' derives from the Greek adjective
ἄτομος.
One of the meanings of this adjective is "uncuttable" and so "without parts."
In the context of logic, an atomic formula is one not composed of other formulas.
¬ symbolizes negation: "It is not the case that b." The letter
b stands for a declarative sentence.
To understand this more clearly, we need some definitions to fix terminology.
• A clause is a disjunction of literals.
a ∨ ¬b ∨ ¬c is a clause. a ← b, c translates to this clause.
• A literal is an atomic formula or the negation of an atomic formula.
a is an atomic formula.
¬b, ¬c are negations of atomic formulas.
• A positive literal is an atomic formula.
a is a positive literal. Positive literals symbolize declarative sentences.
• A negative literal is the negation of an atomic formula.
¬b, ¬c are negative literals. Negative literals symbolize negations of declarative sentences.
• A definite clause contains exactly one positive literal
and zero or more negative literals.
a ∨ ¬b ∨ ¬c is a definite clause.
• A definite logic program is a conjunction (or set) of definite clauses.
Here are some more definitions:
• A negative clause contains zero or more negative literals
and no positive literals.
An empty clause is a negative clause containing no
literals. It is designated by the symbol ⊥.
• A Horn clause is a definite clause or a negative clause.
(The mathematician
Alfred Horn
(19182001) described what are now known as "Horn" clauses.)
• A set of definite clauses whose positive literals share the same
predicate is a definition of the predicate (and is also called
a procedure for the predicate).
• Negative clauses are queries or goal clauses.
• An indefinite clause contains at least two positive
literals and zero or more negative literals. An indefinite logic
program is a conjunction (or set) of clauses that contains at least
one indefinite clause. A logic program is a definite logic
program or an indefinite logic program.
The example logic program is a
definite logic program. Definite logic programs are the kinds of logic programs we will use to represent
KBs in this course.
The Propositional Calculus
The formulas in the example logic program translate to formulas in the language of the propositional calculus. The propositional calculus is a simplified form of the first-order predicate calculus. In philosophy classes in symbolic logic, it is traditional to consider the propositional calculus as an introduction to the first-order predicate calculus.
One reason for this practice is that the propositional calculus is relatively easy to understand.
Sentences in English translate to formulas in the propositional calculus.
These formulas are composed of atomic formulas
and truth-functional connectives (¬, ∧, ∨, →). The
"atomic" formulas have no parts (hence their name) and are the formulas from which
we construct compound formulas.
It is a theory in the philosophy of language that declarative sentences
express propositions. This is why the calculus constructed from atomic
formulas and truth-functional connectives (¬, ∧, ∨, →) is called the
propositional calculus.
In philosophy, it is traditional to use capital letters from the end
of the alphabet (P, Q, R, and so on) for atomic
formulas. In logic programming, it is traditional to use small letters from
the beginning of the alphabet (a, b, c, and so on).
Given the atomic formulas, the compound formulas in the propositional calculus are defined inductively:
φ and ψ are metalinguistic variables. Metalinguistic variables have strings
of the language as their values.
¬φ is shorthand for ⌜¬φ⌝. ⌜¬φ⌝ denotes the concatenation of the string ¬ with
the string that is the value of φ.
If, for example, the value of
φ is (P ∧ Q),
then the value of
⌜¬φ⌝ is ¬(P ∧ Q).
¬φ is a formula if φ is a formula
¬φ is the negation of φ. Read ¬φ as "not φ"
¬b and ¬c are examples
(φ ∧ ψ) is a formula if φ and ψ are formulas
(φ ∧ ψ) is the conjunction of φ and ψ. Read (φ ∧ ψ) as "φ and ψ"
b ∧ c is an example
(φ ∨ ψ) is a formula if φ and ψ are formulas
(φ ∨ ψ) is the disjunction of φ and ψ. Read (φ ∨ ψ) as "φ or ψ"
¬b ∨ ¬c is an example
(φ → ψ) is a formula if φ and ψ are formulas
(φ → ψ) is the implication of ψ from φ. Read (φ → ψ) as "if φ, then ψ"
(b ∧ c) → a is an example
Parentheses eliminate ambiguity. Outermost parentheses are dropped to increase readability.
The Example Logic Program
"The central component of a knowledgebased agent is its knowledge
base, or KB. A knowledge base is a set of sentences. (Here
'sentence' is used as a technical term. It is related but not identical to the
sentences of English and other natural languages.) Each sentence is expressed
in a language called a knowledge representation language and represents
some assertion about the world. ... There must be a way to add new sentences
to the knowledge base and a way to query what is known. The standard names for
these operations are TELL and ASK, respectively. Both operations may involve
inference —that is, deriving new sentences from old. Inference must obey the
requirement that when one ASKs a question of the knowledge base, the answer
should follow from what has been told (or TELLed) to the knowledge base
previously" (Stuart J. Russell and Peter Norvig,
Artificial Intelligence: A Modern Approach, 3rd edition, 7.1, 235).
Russell and Norvig tell us that "[i]nference must obey the
requirement that ... the answer
should follow from...."
Why?
What does "follow from" mean?
Here, again, is the example logic program from the first lecture:
a ← b, c.
a ← f.
b.
b ← g.
c.
d.
e.
Consider the first line in this program. It (together with the second and the fourth entries) is what in logic programming is called a rule.
a ← b, c.
This rule translates to the backward-arrow conditional
a ← b ∧ c
This backward-arrow conditional is a way to write
(b ∧ c) → a
It is enough for now to understand truth-functionally equivalent
to mean that the formulas say the same thing because
they always evaluate to the same truth-value.
Here is a trivial example: a ∧ b and b ∧ a. These formulas are
trivially different ways to say the same thing.
This formula, in turn, is truth-functionally equivalent to
a ∨ ¬(b ∧ c)
Further, the second disjunct in this formula
¬(b ∧ c)
is truth-functionally equivalent to
¬b ∨ ¬c
So the logic program translates to the following formulas in the propositional calculus:
a ∨ ¬b ∨ ¬c
a ∨ ¬f
b
b ∨ ¬g
c
d
e
These formulas represent the beliefs the agent possesses and in terms of which he or she decides what to do. These beliefs constitute the agent's "knowledge base" (or "KB").
Truth and Falsehood
"[An] issue to consider is grounding—the connection between logical reasoning processes and the real environment in which the agent exists. In particular, how do we know that KB is true in the real world? (After all, KB is just 'syntax' inside the agent's head.) This is a philosophical question about which many, many books have been written. ... A simple answer is that the agent's sensors create the connection. For example, our wumpus-world agent has a smell sensor. The agent program creates a suitable sentence whenever there is a smell. Then, whenever that sentence is in the knowledge base, it is true in the real world. Thus, the meaning and truth of percept sentences are defined by the processes of sensing and sentence construction that produce them. What about the rest of the agent's knowledge, such as its belief that wumpuses cause smells in adjacent squares? This is not a direct representation of a single percept, but a general rule—derived, perhaps, from perceptual experience but not identical to a statement of that experience. General rules like this are produced by a sentence construction process called learning, which is the subject of Part V. Learning is fallible" (Stuart J. Russell and Peter Norvig, Artificial Intelligence: A Modern Approach, 7.4, 243).
What an agent believes can be true or false. To understand an important logical property of the method to compute answers to queries, we need a technical device.
To determine truth and falsehood for symbolic structures in a KB, we need a key to know what propositions the formulas represent. The key assigns meanings to the formulas and thus tells us what propositions the formulas express and thus what the agent with this KB believes.
So, for the example logic program, the initial part of the key might be
a = "I can open the door"
b = "I can twist the knob"
c = "I can pull the door"
Given this key, is the proposition that "a ← b, c" represents true?
This proposition seems true, but we can easily imagine that in other cases it may be hard to know whether the propositions in a KB are true or false.
Because we are interested in a logical property of the method to compute answers to queries, we do not want to worry about determining whether the propositions in a KB are true or false.
So we introduce a technical device.
Instead of considering whether the formulas in a KB express propositions that are true or false, we consider whether the formulas are true or false in a model that we define.
To determine truth or falsehood in a model, we define an interpretation function to assign truth-values to formulas. Interpretation functions are helpful technical devices because they allow us to skip the work involved in knowing whether a proposition is true or false.
Every interpretation function has two parts.
The first part of the function is from the atomic formulas to true (T) or false (F). The second part extends the first part of the function to all the formulas in a way that respects the truth-functional meanings of the connective symbols (¬, ∧, ∨, →).
Each row in the following table displays a part of an interpretation function:
φ  ψ  |  ¬φ  |  φ ∧ ψ  |  φ ∨ ψ  |  φ → ψ
T  T  |   F  |    T    |    T    |    T
T  F  |   F  |    F    |    T    |    F
F  T  |   T  |    F    |    T    |    T
F  F  |   T  |    F    |    F    |    T
Given truth-values for φ and ψ, each interpretation function assigns compound formulas truth-values according to the truth-functions for the connectives (¬, ∧, ∨, →).
For the example logic program, the first part of one possible interpretation function, I, is
I(a) = false
I(b) = true
I(c) = true
Relative to this interpretation function, the formula a ← b, c is false.
We can easily check this.
As we have seen, the formula a ← b, c is equivalent to
a ∨ ¬b ∨ ¬c
If we replace the literals with their truth-values, we get
false ∨ ¬true ∨ ¬true
If we look at how the interpretation function behaves for the connectives ∨ and ¬, it is clear that each disjunct is false and hence that the disjunction is false.
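This check can be mechanized. Below is a minimal Python sketch of an interpretation function; the nested-tuple representation of formulas is my own assumption for illustration, not notation from the lecture.

```python
# A sketch of an interpretation function, assuming formulas are represented
# as nested tuples: ("not", f), ("and", f, g), ("or", f, g), ("imp", f, g),
# or an atom name such as "a".

def evaluate(formula, interpretation):
    """Extend an assignment of truth-values to atoms to all formulas."""
    if isinstance(formula, str):                      # atomic formula
        return interpretation[formula]
    op = formula[0]
    if op == "not":
        return not evaluate(formula[1], interpretation)
    if op == "and":
        return evaluate(formula[1], interpretation) and evaluate(formula[2], interpretation)
    if op == "or":
        return evaluate(formula[1], interpretation) or evaluate(formula[2], interpretation)
    if op == "imp":
        return (not evaluate(formula[1], interpretation)) or evaluate(formula[2], interpretation)
    raise ValueError(f"unknown connective: {op}")

# a ∨ ¬b ∨ ¬c under I(a) = false, I(b) = true, I(c) = true
clause = ("or", "a", ("or", ("not", "b"), ("not", "c")))
I = {"a": False, "b": True, "c": True}
print(evaluate(clause, I))   # False: every disjunct is false
```

Each disjunct comes out false under this interpretation, so the whole disjunction does too, just as the hand calculation shows.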
Interpretation Functions and Models
An interpretation function is a model of a set of formulas just in case the function makes all the formulas in the set true. There is no reason to think that truth and falsehood in the world and truth and falsehood in a model coincide. A formula that represents a proposition can be false in an interpretation even though the proposition is true in the world. This can happen because interpretation functions assign true and false to the atomic formulas arbitrarily.
This can be puzzling unless we remember that we changed the question. In general, it is hard to know whether a belief is true. So we are not asking whether the belief is true. We are asking whether it is true in a model, and we are asking this because we want to understand an important logical property of the backward chaining method to compute answers to queries.
Models and Backward Chaining
Another reason to take an interest in models (which will be familiar to many who have taken an introductory class in symbolic logic) is that they characterize certain classes of formulas. So, for example, if every interpretation is a model of a given formula, then the formula is a tautology. We give the name model to certain interpretation functions because we are especially interested in them. In the context of logic programming, we are interested in them because we want to know whether backward chaining has the important logical property called soundness.
We represent the beliefs of a rational agent as a KB. We represent the agent's reasoning as backward chaining on the formulas that represent these beliefs. If we know whether backward chaining has the logical property of soundness (or, more simply, know whether backward chaining is sound), we have some information about how good this representation is.
To understand what property soundness is, consider first the example logic program (on the left of the vertical bar) and the corresponding formulas in the propositional calculus:

a ← b, c.  |  a ∨ ¬b ∨ ¬c
a ← f.     |  a ∨ ¬f
b.         |  b
b ← g.     |  b ∨ ¬g
c.         |  c
d.         |  d
e.         |  e
To specify a model, we need to specify an interpretation function that makes all the formulas true. Here is a possible interpretation function, I, for the atomic formulas:
I(a) = true
I(b) = true
I(c) = true
I(d) = true
I(e) = true
I(f) = false
I(g) = false
Given this specification for the atomic formulas, the extension of this interpretation function to compound formulas makes all the clauses in the logic program true.
Again, we can easily check this.
Consider the first line, a ← b, c. It is a way to represent a ∨ ¬b ∨ ¬c. A disjunction is true just in case at least one disjunct is true. Since I assigns true to a, it assigns true to a ∨ ¬b ∨ ¬c.
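The whole check can be carried out mechanically. Here is a small Python sketch that confirms the interpretation is a model of all seven clauses; the representation of a clause as a set of literals is my own assumption.

```python
# A sketch of checking that an interpretation is a model of the example
# program: it must make every clause true. A clause is a set of literals;
# a negative literal is written ("not", atom).

def literal_true(lit, interpretation):
    if isinstance(lit, tuple):              # ("not", atom)
        return not interpretation[lit[1]]
    return interpretation[lit]

def clause_true(clause, interpretation):
    # A disjunction is true just in case at least one disjunct is true.
    return any(literal_true(lit, interpretation) for lit in clause)

def is_model(clauses, interpretation):
    return all(clause_true(c, interpretation) for c in clauses)

program = [
    {"a", ("not", "b"), ("not", "c")},   # a ← b, c.
    {"a", ("not", "f")},                 # a ← f.
    {"b"},                               # b.
    {"b", ("not", "g")},                 # b ← g.
    {"c"}, {"d"}, {"e"},                 # c. d. e.
]
I = {"a": True, "b": True, "c": True, "d": True,
     "e": True, "f": False, "g": False}
print(is_model(program, I))   # True
```

Changing any of the assignments for a through e to false would falsify one of the clauses, so this interpretation, but not every interpretation, is a model of the program.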
A Positive Response to a Query
When a query is posed to a KB, we need to understand what the query is asking and what an answer means. That is to say, we need to understand what a positive and negative answer to the query indicates about the relationship between the query and the beliefs in the KB.
"Logical AI has both epistemological problems and heuristic problems. The former concern the knowledge needed by an intelligent agent and how it is represented. The latter concerns how the knowledge is to be used to decide questions, to solve problems and to achieve goals. ... Neither the epistemological problems nor the heuristic problems of logical AI have been solved" (John McCarthy, "Concepts of Logical AI," 2000). One might hope that a positive answer means that it is rational to believe the proposition the query represents. We want this sort of intelligence. We want the agent to be able to ask and answer the question "should I believe this," where "this" is a query to itself.
The problem, however, is that given our current understanding in epistemology, we do not know how to compute the answer to a query in these terms. So in this course we settle for what we can compute: that the query is a logical consequence of the beliefs in the KB.
Why do we settle for the relation of logical consequence?
Logical consequence, first of all, is something we can compute. Moreover, logical consequence is useful for a rational agent to be able to compute. Information about logical consequence can be useful to an agent who is reasoning about what to believe. If, for example, the agent believes P and discovers that Q is a logical consequence of P, then it can believe Q.
The ability to compute logical consequences is thus a form of intelligence. It is not all we want an artificially rational agent to be able to do in its reasoning, but it is a start.
Backward Chaining is Truth-Preserving
If backward chaining results in a positive answer to a query only if the query is a logical consequence of a KB, then backward chaining is truth-preserving and thus sound.
What this means is that if every belief in the KB is true, a positive answer to a query means that the query is true too because the query is a logical consequence of the KB.
This does not mean that if backward chaining results in a positive answer, then the query is true. A positive answer only means that the query is a logical consequence of the KB and hence that it is true on all the interpretations that make all the beliefs in the KB true.
The proof of the soundness of backward chaining depends on the following
relationship between logical consequence and logical entailment:
KB ∪ {¬a} ⊢ ⊥ says that ⊥ is a logical
consequence of KB and ¬a. This means there is a
logical deduction of ⊥ from premises in the set consisting of
¬a and the clauses in KB.
KB ⊨ a says that KB logically entails a.
This means that a is true in every model that makes all the clauses in
KB true.
⊥ is the empty clause.
As the empty clause, ⊥ is a disjunction with no disjuncts.
Since a disjunction is true just in case at least one of its disjuncts is
true, it follows that ⊥ is false.
if KB ∪ {¬a} ⊢ ⊥, then KB ⊨ a
In this course, it is not necessary to understand how to prove this. The proof, although not very difficult, is technical in nature and so more appropriate for a course in logic.
It is easy enough, though, even without this proof, to think that if backward chaining computation issues in a positive answer, then the KB logically entails the truth of the query.
Consider the following very simple logic program
a ← b.
b.
This program is a way to represent the formulas
a ∨ ¬b
b
Now suppose the query is
?a.
To answer this query, we start the backward chaining computation. In this computation, we begin by determining whether the query matches the head of an entry in the KB. It does. It matches the head of the first entry (the rule a ← b). This match produces the derived query b. Now we determine if this derived query b matches the head of an entry in the KB. It does. It matches the head of the second entry (the fact b). Now the query list is empty. This causes the backward chaining procedure to stop and the query to be answered positively.
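The steps just described can be sketched as a short Python program. The representation of the KB as (head, body) pairs is my own assumption; since the programs here are propositional, no unification is needed, only matching of atoms.

```python
# A minimal sketch of propositional backward chaining. A KB entry is a
# pair (head, body); a fact has an empty body. To answer a query, match
# it against a head and push that entry's body as derived queries.

def solve(kb, goals):
    """Return True if every goal on the query list can be answered."""
    if not goals:
        return True                       # empty query list: success
    first, rest = goals[0], goals[1:]
    for head, body in kb:
        if head == first:                 # the query matches this head
            if solve(kb, body + rest):    # try the derived queries
                return True
    return False                          # no entry yields a solution

kb = [
    ("a", ["b", "c"]),   # a ← b, c.
    ("a", ["f"]),        # a ← f.
    ("b", []),           # b.
    ("b", ["g"]),        # b ← g.
    ("c", []), ("d", []), ("e", []),
]
print(solve(kb, ["a"]))   # True: a ← b, c succeeds via the facts b and c
print(solve(kb, ["f"]))   # False: nothing in the KB matches f
```

Notice that the sketch backtracks: if one matching entry leads to a dead end (as b ← g would, since nothing matches g), it tries the next matching entry.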
Given that backward chaining is sound, the positive answer means that the beliefs in the KB logically entail that a is true. That is to say, the query is true if the beliefs in the KB are true.
In this example, we can easily see that this is true. If we try to assign truth-values so that
a ∨ ¬b and b are true and a is false,
we will see this is impossible to do. a ∨ ¬b and b together logically entail a.
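The impossibility can be checked by brute force: enumerate every assignment of truth-values and confirm that no model of the KB makes a false. This is a sketch of my own; the function and variable names are not from the lecture.

```python
# A sketch of checking entailment by enumerating interpretations:
# KB ⊨ a just in case a is true in every model of the KB. Clauses are
# sets of literals; a negative literal is ("not", atom).

from itertools import product

def entails(clauses, atom, atoms):
    for values in product([True, False], repeat=len(atoms)):
        I = dict(zip(atoms, values))
        # I is a model if it makes at least one literal true in each clause.
        if all(any(I[l] if isinstance(l, str) else not I[l[1]] for l in c)
               for c in clauses):
            if not I[atom]:
                return False     # a model of the KB in which atom is false
    return True

kb = [{"a", ("not", "b")},       # a ∨ ¬b
      {"b"}]                     # b
print(entails(kb, "a", ["a", "b"]))   # True
```

Only the assignment making both a and b true is a model of the two clauses, and it makes a true, so the KB logically entails a.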
The Logic Programming/Agent Model
Soundness tells us something important about the model of the intelligence of a rational agent we are developing in terms of what we will call the logic programming/agent model.
Backward chaining represents a form of reasoning and thus a form of intelligence. On the logic programming/agent model, as we are currently understanding it, this intelligence is the ability to answer the question "is this a logical consequence of what I believe."
This intelligence is not the same as the ability to answer the question "should I believe this," and this is a serious shortcoming in the logic programming/agent model.
To see why it is a shortcoming, consider again the way human beings form beliefs about the world on the basis of perception. It can be rational in certain circumstances for an agent to believe that a given object is red if it looks red. Notice, though, that an object looking red does not logically entail it is red because something can look red but not be red.
This shows that computing what it is rational to believe is not the same as computing what the beliefs in the KB entail. This is a problem for the model that needs to be addressed.
A Corresponding Proof in the Propositional Calculus
The backward chaining computation is an algorithm that implements a search for a proof that the query is a logical consequence of the KB. It does not always find the shortest or most elegant proof. It implements a search for what is called a reductio ad absurdum proof.
"Sentences are physical configurations of the agent, and reasoning is a process
of constructing new physical configurations from old ones. Logical reasoning should en
sure that the new configurations represent aspects of the world that actually follow from the
aspects that the old configurations represent"
(Stuart J. Russell and Peter Norvig,
Artificial Intelligence: A Modern Approach, 3rd edition, 7.4, 243).
What does "actually follow from" mean here?
This understanding of what the backward chaining computation does can be more than a little
surprising. At first, the computation does not look at all like a search for a
proof. It looks like a series of completely mindless steps in an
algorithm, matching heads, pushing derived queries onto the query list, and so on.
If, however, we think about these steps, we can see that they implement a certain form of
intelligence. They implement the ability to engage in logical deduction (the
process of deducing logical consequences).
It seems plausible to think that a similar relationship holds between processes involving neurons in the human brain and the forms of intelligence that characterize human beings. Certain biochemical processes involving neurons somehow implement what we recognize as the ability to engage in logical deduction and the other forms of human intelligence.
Backward chaining is not a biochemical process, but what are called neural networks (which I introduce in the last lecture in the course) are more directly inspired by the way biological neurons are thought to function. It is conceivable that neural networks (which are an area of active research) are part of a model that more clearly shows how biochemical processes can implement some of the forms of intelligence that characterize a rational agent.
The Proof in the Very Simple Example
In what follows, I give the proof (which looks much harder to understand than it is)
that backward chaining finds for the query in the case of the very simple logic
program. I set it out in a
Gentzen-style natural deduction.
Gerhard Gentzen
(1909-1945) was a mathematician and logician who did the pioneering work in
proof theory.
For an introduction to logic that uses Gentzen-style proofs, see Neil
Tennant's
Natural Logic.
It is not necessary in this course to understand the proof, but since it is
not all that hard to understand, I give it for those who are
interested.
There is a shorter proof, but it is not a reductio ad absurdum.
                 [¬b]^{2}   b
                 ------------ ¬E
                      ⊥
                 ------------ ⊥I
  a ∨ ¬b   [a]^{1}    a
  ------------------------ ∨E, 1, 2
             a
Here is the reductio in Fitch-style natural deduction. I find this
style harder to read, but it is the style of natural deduction used in most
symbolic logic classes.
1.  a ∨ ¬b        rule
2.  b             fact
3.  | ¬a          assumption for reductio
    |---
4.  | | a         assumption
    | |---
5.  | | a         reiteration, 4
    |
6.  | | ¬b        assumption
    | |---
7.  | | b         reiteration, 2
8.  | | ⊥         negation elimination, 6, 7
9.  | | a         intuitionistic absurdity, 8
10. | a           disjunction elimination, 1, 4-5, 6-9
11. | ⊥           negation elimination, 3, 10
12. ¬¬a           negation introduction, 3-11
13. a             double negation elimination, 12
In this proof, red marks the rule (a ∨ ¬b)
and the fact (b). Blue marks the negative
clause (¬a) corresponding to the query.
This negative clause is the assumption for reductio.
              [¬a]^{1}   [a]^{2}
              ------------------ ¬E
                      ⊥
              ------------------ ⊥I
  a ∨ ¬b            ¬b            [¬b]^{3}
  ----------------------------------------- ∨E, 2, 3
                    ¬b        b
                    ------------ ¬E
                         ⊥
                    ------------ ¬I, 1
                        ¬¬a
                    ------------ ¬¬E
                         a
This proof may be divided into three parts. The first of these parts shows that from the premises a ∨ ¬b (which corresponds to the first entry in the logic program) and ¬a (which is the negative clause that corresponds to the query), the conclusion ¬b is a logical consequence:
  [¬a]^{1}    a ∨ ¬b
         .
         .
         .
        ¬b
The second part of the proof extends the first. It shows that given the first part of the proof and given b (which is the fact in the KB), it follows that ⊥:
  [¬a]^{1}    a ∨ ¬b
         .
         .
         .
        ¬b        b
        ------------
             ⊥
The pattern in these first two parts of the proof corresponds to the matching procedure in the backward chaining computation. The negative clause ¬a (the logical form of the query a) together with a ∨ ¬b (the logical form of the rule) are premises in a proof of the negative clause ¬b (the logical form of the derived query b):
  ¬a     a ← b  (or: a ∨ ¬b)
    \     /
     \   /
      \ /
      ¬b     b
        \   /
         \ /
          ⊥
If the derived query is the negative clause ⊥, the initial query is successful because ⊥ means that the derived query list is empty. The backward chaining process stops when there are no more derived queries. It then returns a positive answer to the initial query a. This success is specified in the final part of the proof by the derivation of a from ¬¬a.
The History of Logic Programming
The seminal paper is
J. A. Robinson's "A Machine-Oriented Logic Based on the Resolution Principle,"
Journal of the Association for Computing Machinery, vol. 12, 1965, 23-41.
An interview with Alan Robinson.
Logic programming comes out of work
in automated theorem-proving. In this tradition, the development of a
technique called "resolution" was a major breakthrough.
For this course, it is not necessary to understand resolution and its role in the history of logic programming. This history is interesting, though, and not hard to understand.
Resolution
The resolution rule in
propositional logic is a derived deduction rule that produces
a new clause from two clauses with complementary
literals.
The following is a simple instance of resolution.
In the clauses that constitute the premises
a ∨ b and ¬a ∨ c,
the literals a and ¬a are
complementary. Resolution eliminates them and combines the remaining literals b and
c into the clause that constitutes the conclusion.
  a ∨ b     ¬a ∨ c
  ----------------
       b ∨ c
Because the resolution rule is a derived rule (a rule for which there is a proof in terms of the basic rules), the proofs are shorter. Here is a simple example. The logic program
a ← b.
b.
is a way to write
a ∨ ¬b
b
The query
?a.
corresponds to the negative clause
¬a
We can apply resolution to the negative clause corresponding to the query and to the first rule
  ¬a     a ← b  (or: a ∨ ¬b)
  --------------------------
             ¬b
¬b is the derived query. We apply resolution to ¬b and to the fact b in the KB
  ¬b     b
  --------
      ⊥
Now we have reached ⊥, so the initial query is a logical consequence of the logic program.
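The two resolution steps above can be sketched in Python. Representing clauses as frozensets of literals is my own choice for illustration; it is not part of the lecture's notation.

```python
# A sketch of the resolution rule on clauses represented as frozensets of
# literals. A positive literal is an atom name; a negative literal is
# ("not", atom).

def negate(lit):
    return lit[1] if isinstance(lit, tuple) else ("not", lit)

def resolve(c1, c2):
    """Return the clauses obtainable by resolving c1 with c2."""
    resolvents = set()
    for lit in c1:
        if negate(lit) in c2:             # complementary literals found
            resolvents.add(frozenset((c1 - {lit}) | (c2 - {negate(lit)})))
    return resolvents

# ¬a resolved with a ∨ ¬b yields the derived query ¬b
print(resolve(frozenset([("not", "a")]),
              frozenset(["a", ("not", "b")])))
# ¬b resolved with the fact b yields the empty clause (an empty frozenset)
print(resolve(frozenset([("not", "b")]), frozenset(["b"])))
```

The second call produces a set containing the empty frozenset, which plays the role of ⊥.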
Searching for a Proof
In the context of automated theorem-proving, the question is whether a given conclusion is a logical consequence of a given set of premises. To use resolution to answer this question, the first step is to rewrite the premises and conclusion as sets of clauses.
The rewriting occurs according to the following rules, which need to be applied in order.
1. Conditionals (C):
φ → ψ ⇒ ¬φ ∨ ψ
2. Negations (N):
¬¬φ ⇒ φ
¬(φ ∧ ψ) ⇒ ¬φ ∨ ¬ψ
¬(φ ∨ ψ) ⇒ ¬φ ∧ ¬ψ
3. Distribution (D):
φ ∨ (ψ ∧ χ) ⇒ (φ ∨ ψ) ∧ (φ ∨ χ)
(φ ∧ ψ) ∨ χ ⇒ (φ ∨ χ) ∧ (ψ ∨ χ)
φ ∨ (φ_{1} ∨ ... ∨ φ_{n}) ⇒ φ ∨ φ_{1} ∨ ... ∨ φ_{n}
(φ_{1} ∨ ... ∨ φ_{n}) ∨ φ ⇒ φ_{1} ∨ ... ∨ φ_{n} ∨ φ
φ ∧ (φ_{1} ∧ ... ∧ φ_{n}) ⇒ φ ∧ φ_{1} ∧ ... ∧ φ_{n}
(φ_{1} ∧ ... ∧ φ_{n}) ∧ φ ⇒ φ_{1} ∧ ... ∧ φ_{n} ∧ φ
4. Sets (S):
φ_{1} ∨ ... ∨ φ_{n} ⇒ {φ_{1}, ... , φ_{n}}
φ_{1} ∧ ... ∧ φ_{n} ⇒ {φ_{1}}, ... , {φ_{n}}
Consider the formula a ∧ (b → c). Given the rewrite rules, the sets of clauses are {a}, {¬b, c}.
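For formulas built from ¬, ∧, ∨, and →, the rewrite rules C, N, D, and S can be sketched in Python. The nested-tuple representation of formulas and all the function names below are my own assumptions for illustration.

```python
# A sketch of the rewriting rules, assuming formulas are nested tuples:
# ("imp", f, g), ("not", f), ("and", f, g), ("or", f, g), or an atom name.

def eliminate_imp(f):                     # rule C: φ → ψ ⇒ ¬φ ∨ ψ
    if isinstance(f, str):
        return f
    if f[0] == "imp":
        return ("or", ("not", eliminate_imp(f[1])), eliminate_imp(f[2]))
    if f[0] == "not":
        return ("not", eliminate_imp(f[1]))
    return (f[0], eliminate_imp(f[1]), eliminate_imp(f[2]))

def push_not(f):                          # rules N: drive ¬ inward (no → left)
    if isinstance(f, str):
        return f
    if f[0] == "not":
        g = f[1]
        if isinstance(g, str):
            return f
        if g[0] == "not":                 # ¬¬φ ⇒ φ
            return push_not(g[1])
        if g[0] == "and":                 # ¬(φ ∧ ψ) ⇒ ¬φ ∨ ¬ψ
            return ("or", push_not(("not", g[1])), push_not(("not", g[2])))
        if g[0] == "or":                  # ¬(φ ∨ ψ) ⇒ ¬φ ∧ ¬ψ
            return ("and", push_not(("not", g[1])), push_not(("not", g[2])))
    return (f[0], push_not(f[1]), push_not(f[2]))

def distribute(f):                        # rules D: ∨ over ∧
    if isinstance(f, str) or f[0] == "not":
        return f
    a, b = distribute(f[1]), distribute(f[2])
    if f[0] == "or":
        if not isinstance(a, str) and a[0] == "and":
            return distribute(("and", ("or", a[1], b), ("or", a[2], b)))
        if not isinstance(b, str) and b[0] == "and":
            return distribute(("and", ("or", a, b[1]), ("or", a, b[2])))
    return (f[0], a, b)

def clauses(f):                           # rules S: collect the sets
    if not isinstance(f, str) and f[0] == "and":
        return clauses(f[1]) + clauses(f[2])
    def literals(g):
        if not isinstance(g, str) and g[0] == "or":
            return literals(g[1]) | literals(g[2])
        return {g}
    return [frozenset(literals(f))]

def clause_form(f):
    return clauses(distribute(push_not(eliminate_imp(f))))

# a ∧ (b → c) rewrites to the sets of clauses {a} and {¬b, c}
print(clause_form(("and", "a", ("imp", "b", "c"))))
```

Applying the rules in the stated order matters: implications must be gone before negations are pushed inward, and negations must sit on atoms before distribution applies.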
Suppose we wanted to know if a follows from a ∧ (b → c). In classical logic, we can easily see that it does follow. The proof consists in one application of ∧Elimination (∧E).
  a ∧ (b → c)
  ----------- ∧E
       a
This proof, however, is not the one automated theorem-proving finds. To answer the question, it uses resolution in a refutation procedure. It rewrites the premise and the negation of the conclusion, and it shows that the resulting sets of clauses are inconsistent: the empty clause {} is derivable using the resolution rule
{φ_{1}, ... , χ, ... , φ_{m}}
{ψ_{1}, ... , ¬χ, ... , ψ_{n}}
------------------------------
{φ_{1}, ... , φ_{m}, ψ_{1}, ..., ψ_{n}}
In this example, the sets of clauses the rewrite rules produce are
{a}, {¬b, c}, {¬a}.
So it is easy to see that the empty clause is derivable.
{a}     {¬a}
------------
{ }
The corresponding "refutation-style" proof in classical logic is
1. a ∧ (b → c)    Premise
2. a              ∧E, 1
3. [¬a]^{1}       Assumption for reductio
4. ⊥              ¬E, 2, 3
5. ¬¬a            ¬I, 1 (discharging the assumption)
6. a              ¬¬E, 5
To use resolution in automated theorem proving, it is necessary to have a control procedure for the steps that determine whether the empty clause is derivable.
Consider, for example, the argument
p
p → q
(p → q) → (q → r)
∴ r
One resolution proof that the conclusion is a logical consequence of the premises is
1. {p} Premise
2. {¬p, q} Premise
3. {p, ¬q, r} Premise (This is not a definite clause)
4. {¬q, r} Premise
5. {¬r} Negation of the conclusion
6. {q} 1, 2
7. {r} 4, 6
8. {} 5, 7
This, however, is not the only way to apply the resolution rule. At step 6, instead of applying the rule to premises 1 and 2, we could have applied it to premises 2 and 3.
Logic programming was born out of reflection on the question of what the control procedure should be when the clauses that constitute the premises are all definite clauses. The way a query is solved in logic programming incorporates one possible control procedure.
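The simplest control procedure is also the least efficient: keep applying the resolution rule to every pair of clauses until the empty clause appears or nothing new can be derived. A minimal Python sketch of this saturation loop follows; the clause representation and function names are assumptions of the sketch, not anything the lecture prescribes.

```python
# A naive saturation loop for propositional resolution. Clauses are
# frozensets of literal strings, with '¬p' the negation of 'p'.

def negate(lit):
    return lit[1:] if lit.startswith('¬') else '¬' + lit

def resolvents(c1, c2):
    """All clauses obtainable by one resolution step on c1 and c2."""
    out = []
    for lit in c1:
        if negate(lit) in c2:
            out.append((c1 - {lit}) | (c2 - {negate(lit)}))
    return out

def refutable(clauses):
    """True iff the empty clause is derivable by resolution."""
    clauses = set(map(frozenset, clauses))
    while True:
        new = set()
        for c1 in clauses:
            for c2 in clauses:
                for r in resolvents(c1, c2):
                    if not r:          # the empty clause {}
                        return True
                    new.add(r)
        if new <= clauses:             # nothing new: saturation reached
            return False
        clauses |= new

# The lecture's example: premises 1-4 plus the negated conclusion {¬r}.
kb = [{'p'}, {'¬p', 'q'}, {'p', '¬q', 'r'}, {'¬q', 'r'}, {'¬r'}]
print(refutable(kb))  # True
```

Logic programming's contribution is exactly to replace this blind search with a directed one when the clauses are definite clauses.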
The First-Order Predicate Calculus
The first-order predicate calculus is more expressive than the propositional calculus. It allows for the representation of some propositions the propositional calculus does not represent well.
Consider, for example, the following simple argument
1. Bachelors are not married
2. Tom is a bachelor
∴ 3. Tom is not married
This argument is valid, but this validity is not captured when the argument is represented in the propositional calculus. In the propositional calculus, the argument is
1. P
2. Q
∴ 3. R
This representation leaves out too much detail to capture the validity of the argument. We need a way to talk about the name (Tom) and the two predicates (is a bachelor, is married) in the argument. The first-order predicate calculus provides a way to do this.
Again, for this course, it is not necessary to understand all the details that follow.
Syntax for the First-Order Predicate Calculus
The vocabulary of the first-order predicate calculus subdivides into two parts, a logical and a nonlogical part. The logical part is common to all first-order theories. It does not change. The nonlogical part varies from theory to theory. The logical part of the vocabulary consists in
• the connectives ¬ ∧ ∨ → and the quantifiers ∀ (universal) and ∃ (existential)
• the comma and the left and right parenthesis: , ( )
• a denumerable list of variables: x_{1} x_{2} x_{3} x_{4} . . .
The nonlogical part of the vocabulary consists in
• a denumerable list of constants: a_{1} a_{2} a_{3} a_{4} . . .
• for each n, a denumerable list of n-place predicates:
P^{1}_{1}, P^{1}_{2}, P^{1}_{3}, . . .
P^{2}_{1}, P^{2}_{2}, P^{2}_{3}, . . .
P^{3}_{1}, P^{3}_{2}, . . .
and so on for each n
Given the vocabulary, a formula is defined as follows:
• A term is either a variable or a constant
• If P^{n} is an n-place predicate and t_{1}, ..., t_{n} are terms, then P^{n}t_{1}, ..., t_{n} is a formula
• If φ and ψ are formulas, then ¬φ, (φ ∧ ψ), (φ ∨ ψ), (φ → ψ) are formulas
• If φ is a formula and v is a variable, then ∀vφ and ∃vφ are formulas
• Nothing else is a formula
This looks more complicated than it is. All we are really doing in setting out the syntax is saying how to construct formulas from the basic pieces in the language.
Semantics for the First-Order Predicate Calculus
"The language of first-order logic ... is built around objects and relations. It has been so important to mathematics, philosophy, and artificial intelligence precisely because those fields—and indeed, much of everyday human existence—can be usefully thought of as dealing with objects and the relations among them. First-order logic can also express facts about some or all of the objects in the universe. ... The primary difference between propositional and first-order logic lies in the ontological commitment made by each language—that is, what it assumes about the nature of reality. Mathematically, this commitment is expressed through the nature of the formal models with respect to which the truth of sentences is defined. For example, propositional logic assumes that there are facts [or: propositions] that either hold or do not hold in the world. Each fact can be in one of two states: true or false, and each model assigns true or false to each proposition symbol.... First-order logic assumes more; namely, that the world consists of [a domain of] objects with certain relations among them that do or do not hold" (Stuart J. Russell and Peter Norvig, Artificial Intelligence. A Modern Approach, 3rd edition, 8.1.289). What are traditionally called models of formulas in the first-order predicate calculus are formal representations of things in the world and their relations to each other. These models are different from the models for the propositional calculus, but their function is the same: they allow for the statement of truth-conditions for sentences of the language.
• A model is an ordered pair <D, F>, where D is a domain and F is an interpretation. The domain D is a nonempty set. This set contains the things the formulas are about. The interpretation, F, is a function on the nonlogical vocabulary. It gives the meaning of this vocabulary relative to the domain.
For every constant c, F(c) is in D. F(c) is the referent of c in the model.
For every n-place predicate P^{n}, F(P^{n}) is a subset of D^{n}. F(P^{n}) is the extension of P^{n} in the model.
• An assignment is a function from variables to members of D. A v-variant of an assignment g is an assignment that agrees with g except possibly on v.
(Assignments are primarily technical devices. They are required to provide the truth-conditions for the quantifiers, ∀ and ∃.)
The truth of a formula relative to a model and an assignment is defined inductively. The base case uses the composite function [ ]^{F}_{g} on terms, defined as follows:
[t]^{F}_{g} = F(t) if t is a constant; [t]^{F}_{g} = g(t) if t is a variable.
The clauses in the inductive definition of truth relative to M and g are as follows:
P^{n}t_{1}, ..., t_{n} is true relative to M and g iff <[t_{1}]^{F}_{g}, ..., [t_{n}]^{F}_{g}> is in F(P^{n}).
¬A is true relative to M and g iff A is not true relative to M and g.
A ∧ B is true relative to M and g iff A and B are both true relative to M and g.
A ∨ B is true relative to M and g iff A or B is true relative to M and g.
A → B is true relative to M and g iff A is not true or B is true relative to M and g.
∃vA is true relative to M and g iff A is true relative to M and g*, for some v-variant g* of g.
∀vA is true relative to M and g iff A is true relative to M and g*, for every v-variant g* of g.
A formula is true relative to a model M iff it is true relative to M for every assignment g.
Again, this all looks much more complicated than it is. In setting out the semantics, we are just saying how to determine whether a formula is true in a given model.
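In fact, for a finite domain the inductive truth definition can be run directly as code. This Python sketch assumes a nested-tuple syntax for formulas and uses the "Pulp Fiction" model that appears later in the lecture; the representation and names are assumptions of the sketch.

```python
# A sketch of the truth definition for a finite model <D, F>.
# Formula syntax (an assumption of this sketch): nested tuples
#   ('pred', name, term, ...), ('not', f), ('and', f, g), ('or', f, g),
#   ('imp', f, g), ('all', v, f), ('some', v, f)
# Terms are ('const', name) or ('var', name).

def denote(term, F, g):
    """The composite function on terms: F for constants, g for variables."""
    kind, name = term
    return F[name] if kind == 'const' else g[name]

def true_in(f, D, F, g):
    tag = f[0]
    if tag == 'pred':
        args = tuple(denote(t, F, g) for t in f[2:])
        return args in F[f[1]]
    if tag == 'not':
        return not true_in(f[1], D, F, g)
    if tag == 'and':
        return true_in(f[1], D, F, g) and true_in(f[2], D, F, g)
    if tag == 'or':
        return true_in(f[1], D, F, g) or true_in(f[2], D, F, g)
    if tag == 'imp':
        return (not true_in(f[1], D, F, g)) or true_in(f[2], D, F, g)
    v, body = f[1], f[2]
    # v-variants of g: assignments agreeing with g except possibly on v
    variants = [{**g, v: d} for d in D]
    if tag == 'all':
        return all(true_in(body, D, F, gv) for gv in variants)
    return any(true_in(body, D, F, gv) for gv in variants)  # 'some'

# The "Pulp Fiction" model that appears later in the lecture.
D = {'Vincent', 'Marcellus', 'Mia', 'Pumpkin', 'HoneyBunny'}
F = {'a2': 'Marcellus', 'a3': 'Mia',
     'loves': {('Vincent', 'Mia'), ('Marcellus', 'Mia'),
               ('Pumpkin', 'HoneyBunny'), ('HoneyBunny', 'Pumpkin')}}

# "Marcellus loves Mia" is true in this model.
fact = ('pred', 'loves', ('const', 'a2'), ('const', 'a3'))
print(true_in(fact, D, F, {}))  # True
```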
An Example in Prolog Notation
A statement of the syntax and semantics of the first-order predicate calculus is necessary for understanding logical properties like soundness, but it is not very useful for understanding how the logic programming/agent model uses the first-order predicate calculus.
For this, we need a straightforward example KB stated in a Prolog-like language. An example KB stated in a Prolog-like language is closer to ordinary English and thus is much more intuitive than the very formal language of the first-order predicate calculus.
Prolog has its own syntax. A variable is a word starting with an uppercase letter. A constant is a word that starts with a lowercase letter. A predicate is also a word that starts with a lowercase letter. Constants and predicate symbols are distinguishable by their context in a program. An atomic formula has the form p(t_{1},...,t_{n}), where p is a predicate symbol and each t_{i} is a term.
This example is taken from Representation and Inference for Natural Language: A First Course in Computational Semantics, by Patrick Blackburn and Johan Bos. We use this syntax to set out an example based on the movie Pulp Fiction. In the movie, various people love other people. Further, to simplify what is true in the movie, we will suppose that a certain rule defines jealousy. So, in the language of Prolog, the "Pulp Fiction" KB is
loves(vincent, mia).
"Vincent loves Mia."
loves(marcellus, mia).
"Marcellus loves Mia."
loves(pumpkin, honey_bunny).
"Pumpkin loves Honey Bunny."
loves(honey_bunny, pumpkin).
"Honey Bunny loves Pumpkin."
jealous(X, Y) :- loves(X, Z), loves(Y, Z).
This rule for jealousy is interpreted as a universally quantified formula. From a first-order predicate logic point of view, the rule really says
∀X ∀Y ∀Z ( ( loves(X, Z) ∧ loves(Y, Z) ) → jealous(X, Y) )
(This sentence is not a formula of Prolog or the first-order predicate calculus. It is a mixed form, meant to be suggestive.)
The symbols X, Y, and Z are variables. The rule is universal. It says that X is jealous of Y if X loves some Z and Y loves that same Z. Obviously, jealousy as this rule defines it is simpler than jealousy in the movie and in the real world.
To express this program in the firstorder predicate calculus, a key is necessary. The key specifies the meanings of the constants and predicates:
Vincent: a_{1}
Marcellus: a_{2}
Mia: a_{3}
Pumpkin: a_{4}
Honey Bunny: a_{5}
__ loves __: P^{2}_{1}
__ is jealous of __: P^{2}_{2}
Given this key, it is possible to express the entries in the program more formally as formulas in the first-order predicate calculus. The fact
loves(marcellus, mia)
is expressed in the first-order predicate calculus (relative to the key) by the formula
P^{2}_{1}a_{2},a_{3}
The language of the first-order predicate calculus is traditionally set out in this unfriendly way to make it easy to prove various things about the language. This very formal way of expressing entries in the program is obviously not very friendly. So it is easy to see why almost all concrete examples use something like Prolog notation to set out the KB.
To set out a model <D, F> for the "Pulp Fiction" example, we need to define the domain D and the interpretation F. The domain is straightforward. It consists in the people
D = {Vincent, Marcellus, Mia, Pumpkin, Honey Bunny}.
The interpretation F in the model <D, F> is a little more complicated to set out. It assigns the constants to members of the domain D. So, for example,
F(a_{1}) = Vincent
The interpretation F also assigns extensions to the predicates. F(P^{2}_{1}) is the set of pairs from the domain such that the first member of the pair loves the second:
F(P^{2}_{1}) = {<Vincent, Mia>, <Marcellus, Mia>, <Pumpkin, Honey Bunny>, <Honey Bunny, Pumpkin>}
In this model <D, F>, the formula P^{2}_{1}a_{2},a_{3} is true.
Queries to the "Pulp Fiction" Program
Relative to the example KB taken from the movie Pulp Fiction,
loves(vincent, mia).
loves(marcellus, mia).
loves(pumpkin, honey_bunny).
loves(honey_bunny, pumpkin).
jealous(X, Y) :- loves(X, Z), loves(Y, Z).
consider the following query (whether Mia loves Vincent):
?- loves(mia, vincent).
The response is
false (or no, depending on the particular implementation of Prolog)
For a slightly less trivial query, consider
?- jealous(marcellus, W).
This asks whether Marcellus is jealous of anyone. In the language corresponding to the first-order predicate calculus, the query asks whether there is a w such that Marcellus is jealous of w:
∃W jealous(marcellus, W)
Since Marcellus is jealous of Vincent (given the KB), the response is
W = vincent
From a logical point of view, the query corresponds to (and the answer to the query is computed in terms of) the negative clause
¬jealous(marcellus, W)
This negative clause is read as its universal closure
∀W ¬jealous(marcellus, W)
which (by the equivalence of ∀ to ¬∃¬ in classical logic) is equivalent to
¬∃W jealous(marcellus, W)
The computation (to answer the query) corresponds to the attempt to refute the universal closure by trying to derive the empty clause. Given the KB, the empty clause is derivable
{KB, ∀W ¬jealous(marcellus,W)} ⊢ ⊥
This means (given the equivalence of ∀ to ¬∃¬ and of ¬¬φ and φ in classical logic) that
∃W jealous(marcellus, W)
is a consequence of the KB. Moreover, the computation results in a witness to this existential formula. Given the KB, Marcellus is jealous of Vincent. So the response to the query is
W = vincent
Here is how this "thinking" looks when it is implemented on my laptop with SWI-Prolog:
A Corresponding Proof in the Predicate Calculus
As in the case of the propositional logic, a computation that issues in a positive response to a query corresponds to a proof by reductio in classical first-order logic.
I set out this proof as a Gentzen-style natural deduction. Two facts and the rule come from the KB. The negative clause corresponding to the query is the assumption for reductio.
To make the proof fit nicely on the page, I abbreviate the predicates "loves" and "jealous" as "l" and "j" and abbreviate the constants "marcellus" and "vincent" as "mar" and "vinc."
Note also that ¬∀x¬ ("not all not") is equivalent to ∃ ("some") in classical first-order logic.
1. ∀x∀y∀z((l(x,z) ∧ l(y,z)) → j(x,y))            Premise (the rule)
2. l(mar,mia)                                     Premise (fact)
3. l(vinc,mia)                                    Premise (fact)
4. [∀x¬j(mar,x)]^{1}                              Assumption for reductio
5. ∀y∀z((l(mar,z) ∧ l(y,z)) → j(mar,y))           ∀E, 1
6. ∀z((l(mar,z) ∧ l(vinc,z)) → j(mar,vinc))       ∀E, 5
7. (l(mar,mia) ∧ l(vinc,mia)) → j(mar,vinc)       ∀E, 6
8. l(mar,mia) ∧ l(vinc,mia)                       ∧I, 2, 3
9. j(mar,vinc)                                    →E, 7, 8
10. ¬j(mar,vinc)                                  ∀E, 4
11. ⊥                                             ¬E, 9, 10
12. ¬∀x¬j(mar,x)                                  ¬I, 1 (discharging 4)
Unification is Part of the Computation
The instantiation of variables is a complicating factor. Now (unlike for the propositional calculus) the backward chaining computation includes what is called unification.
Notice that in the "Pulp Fiction" example, the query
jealous(marcellus, W)
matches no entry in the logic program exactly, but it can be "unified" with the head of the rule for jealousy. Unification finds a substitution that makes two terms the same.
A substitution is a replacement of variables by terms. A substitution σ has the following form
{V_{1}/t_{1}, ... , V_{n}/t_{n}}, where each V_{i} is a variable and each t_{i} is a term.
φσ is a substitution instance of φ. It is the formula that results from replacing every free occurrence of each variable V_{i} in φ with the corresponding term t_{i}.
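Unification itself can be sketched in a few lines of Python. The tuple representation of atoms, the convention that uppercase strings are variables (mirroring Prolog), and the function names are assumptions of this sketch; the occurs check is omitted for brevity.

```python
# Unification over Prolog-style atoms written as tuples, e.g.
# ('loves', 'X', 'mia'). Uppercase strings are variables.

def is_var(t):
    return isinstance(t, str) and t[:1].isupper()

def walk(t, sub):
    """Follow a variable's bindings in the substitution."""
    while is_var(t) and t in sub:
        t = sub[t]
    return t

def unify(t1, t2, sub=None):
    """Return a substitution that makes t1 and t2 the same, or None."""
    sub = dict(sub or {})
    t1, t2 = walk(t1, sub), walk(t2, sub)
    if t1 == t2:
        return sub
    if is_var(t1):
        sub[t1] = t2
        return sub
    if is_var(t2):
        sub[t2] = t1
        return sub
    if isinstance(t1, tuple) and isinstance(t2, tuple) and len(t1) == len(t2):
        for a, b in zip(t1, t2):
            sub = unify(a, b, sub)
            if sub is None:
                return None
        return sub
    return None

# The query jealous(marcellus, W) unified with the rule head jealous(X, Y):
print(unify(('jealous', 'marcellus', 'W'), ('jealous', 'X', 'Y')))
```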
An example makes unification a lot easier to understand. Consider the blocks-world program:
on(b1,b2).
on(b3,b4).
on(b4,b5).
on(b5,b6).
above(X,Y) :- on(X,Y).
above(X,Y) :- on(X,Z), above(Z,Y).
If we use our imagination a little, we can see that an agent with this logic program as his or her KB thinks that the world looks like this:
        b3
        b4
b1      b5
b2      b6
Now suppose the query is whether block b3 is above block b5
?- above(b3, b5).
The computation to answer (or solve) this query runs roughly as follows. The query does not match the head of any fact. Nor does it match the head of any rule. It is clear, though, that there is a substitution that unifies it and the head of the first rule. The unifying substitution is
{X/b3, Y/b5}
This substitution produces
above(b3,b5) :- on(b3,b5).
So the derived query is
on(b3,b5).
This derived query fails. So now it is necessary to backtrack to see if another match is possible further down in the knowledge base. Another match is possible. The query can be made to match the head of the second rule. The unifying substitution is
{X/b3, Y/b5}
This produces
above(b3,b5) :- on(b3,Z), above(Z,b5).
The derived query is
on(b3,Z), above(Z,b5).
Now the question is whether the first conjunct in this query can be unified with anything in the KB. It can. The unifying substitution for the first conjunct in the derived query is
{Z/b4}
The substitution has to be made throughout the derived query. So, given that the first conjunct has been made to match, the derived query becomes
above(b4,b5).
This can be made to match the head of the first rule. The unifying substitution is
{X/b4, Y/b5}
and the derived query is now
on(b4,b5).
This query matches one of the facts in the knowledge base. So now the query list is empty and hence the computation is a success! The query is a logical consequence of the KB.
We can set out the computation (together with the substitutions) in the following form:
¬above(b3,b5)   resolved with   above(X,Y) ∨ ¬on(X,Z) ∨ ¬above(Z,Y)   using {X/b3, Y/b5}
  ⇒ ¬on(b3,Z) ∨ ¬above(Z,b5)
¬on(b3,Z) ∨ ¬above(Z,b5)   resolved with   on(b3,b4)   using {Z/b4}
  ⇒ ¬above(b4,b5)
¬above(b4,b5)   resolved with   above(X,Y) ∨ ¬on(X,Y)   using {X/b4, Y/b5}
  ⇒ ¬on(b4,b5)
¬on(b4,b5)   resolved with   on(b4,b5)
  ⇒ ⊥
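The whole backward-chaining computation just traced can itself be sketched in Python. Real Prolog implementations are far more elaborate; the clause representation, the variable-renaming scheme, and the function names here are assumptions of the sketch.

```python
# Backward chaining with unification over the blocks-world program.
# Uppercase strings are variables; atoms are tuples like ('on', 'b1', 'b2').

import itertools

def is_var(t):
    return isinstance(t, str) and t[:1].isupper()

def walk(t, sub):
    """Follow a variable's bindings in the substitution."""
    while is_var(t) and t in sub:
        t = sub[t]
    return t

def unify(a, b, sub):
    """Return an extended substitution unifying a and b, or None."""
    a, b = walk(a, sub), walk(b, sub)
    if a == b:
        return sub
    if is_var(a):
        return {**sub, a: b}
    if is_var(b):
        return {**sub, b: a}
    if isinstance(a, tuple) and isinstance(b, tuple) and len(a) == len(b):
        for x, y in zip(a, b):
            sub = unify(x, y, sub)
            if sub is None:
                return None
        return sub
    return None

counter = itertools.count()

def rename(clause):
    """Give a clause fresh variables each time it is used."""
    n = next(counter)
    def r(t):
        if is_var(t):
            return t + str(n)
        if isinstance(t, tuple):
            return tuple(r(x) for x in t)
        return t
    head, body = clause
    return r(head), [r(g) for g in body]

def solve(goals, program, sub):
    """Yield substitutions that solve the goal list, depth-first."""
    if not goals:
        yield sub
        return
    first, rest = goals[0], goals[1:]
    for clause in program:
        head, body = rename(clause)
        sub2 = unify(first, head, sub)
        if sub2 is not None:
            # the derived query: the rule body plus the remaining goals
            yield from solve(body + rest, program, sub2)

# The blocks-world program: facts are clauses with empty bodies.
program = [
    (('on', 'b1', 'b2'), []),
    (('on', 'b3', 'b4'), []),
    (('on', 'b4', 'b5'), []),
    (('on', 'b5', 'b6'), []),
    (('above', 'X', 'Y'), [('on', 'X', 'Y')]),
    (('above', 'X', 'Y'), [('on', 'X', 'Z'), ('above', 'Z', 'Y')]),
]

# The lecture's query: is b3 above b5?
print(any(True for _ in solve([('above', 'b3', 'b5')], program, {})))  # True
```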
What we have Accomplished in this Lecture
We examined the relation between logic and logic programming. We saw that if a query to a logic program (or KB) is successful (results in a positive response), then the query is a logical consequence of premises taken from the logic program. To see this, we considered the connection between the backward chaining process in computing a successful query and the underlying proof in the propositional and firstorder predicate calculus.
We saw that backward chaining in the logic programming/agent model implements a form of intelligence: the ability to deduce logical consequences. We saw examples of backward chaining working on a machine and thus that a machine can implement a form of intelligence.
We also raised the question of what is going on when a rational agent thinks about a given proposition and comes to believe it is true. In the logic programming/agent model, in its current form, we understand this in terms of a positive response to a query. So the agent is permitted to add a proposition to his KB (to believe a proposition is true) only if the proposition is a logical consequence of what he already believes. This is some but not a lot of intelligence.