# Philosophy, Computing, and Artificial Intelligence

PHI 319. Some of the Technical Background for Understanding Backward Chaining.

Computational Logic and Human Thinking, A1-A3 (251-283), A5 (290-300)
Thinking as Computation, Chapter 2 (23-39)

## Logic and Logic Programming

This material looks much more difficult than it is. Be patient. It is interesting and beautiful in a certain way, but it takes some time and effort to appreciate.

Don't worry if you don't understand every detail. To do well in the course, you only need to understand enough to answer the questions posed in the assignments. Remember too that you can post questions about the assignments.

Logic programming was developed in an effort to construct a better computer language. Almost all modern computing machines are based on the work of John von Neumann (1903-1957) and his colleagues in the 1940s. As a practical matter, thinking in terms of a von Neumann computing machine is not particularly natural for most people. This led to an attempt to design languages that abstracted away from the underlying machine so that the language would be a more convenient medium of thought for human beings. Many of the languages developed in the mainstream of early computer science (such as the C programming language) remained heavily influenced by the architecture of the machine. Logic programming is completely different in this respect. It is based on logic, which traditionally has been thought to be connected with rational thought.

The term 'logic' here refers to the primary example of modern logic: classical first-order logic. This is the logic that comes out of the work of Gottlob Frege (1848-1925) and others who developed it to clarify certain problems in mathematics. Logic programming has its basis in this logic.

To understand the connection between logic and logic programming, a first step is to understand the relation between the language of logic and the language of logic programming.

## The Language of Logic Programming

A logic program is itself really just a formula of logic. This formula is a conjunction of clauses, but typically it is written in a way that makes this hard to see.

So, for example, in the logic program I set out in the first lecture

a ← b, c.
a ← f.
b.
b ← g.
c.
d.
e.

each line is a clause. The program itself is the formula that is the conjunction of these clauses.

This conception is clearer once the language of logic programming is itself clearer.

The language of logic programming is set out in terms of several definitions. In this course, it is not necessary to memorize these definitions. I provide the list because I use some of the terms in the lectures, and they are common in the literature on logic programming. The only really important terms to understand are "logic program" and the terms used in its definition.

A disjunction has disjuncts. In the disjunction a ∨ ¬b ∨ ¬c, a, ¬b, and ¬c are the disjuncts. The descending wedge ∨ symbolizes inclusive or. In Latin, the word is vel.

The arrow → symbolizes the material conditional. In a → b, a is the antecedent of the conditional and b is the consequent.

• A clause is a disjunction of literals
a ∨ ¬b ∨ ¬c is a clause. It is logically equivalent to a ← (b ∧ c)
a ← (b ∧ c) is the backward-arrow way to write (b ∧ c) → a
a ← (b ∧ c) in Prolog notation is a ← b, c.

• A literal is an atomic formula or the negation of an atomic formula
a is an atomic formula
¬b, ¬c are negations of atomic formulas

• A positive literal is an atomic formula
a, b, c
Positive literals (such as a) symbolize declarative sentences.

• A negative literal is the negation of an atomic formula
¬b, ¬c
A negative literal (such as ¬b) symbolizes the negation of a declarative sentence. ¬ is the not sign. It symbolizes "It is not the case that." In ¬b, b is the sentence that ¬ negates.

• A definite clause contains exactly one positive literal and zero or more negative literals.
a ∨ ¬b ∨ ¬c

• A positive unit clause is a definite clause containing no negative literals

• A negative clause contains zero or more negative literals and no positive literals

• An empty clause is a negative clause containing no literals
It is designated by the special symbol ⊥

• A Horn clause is a definite clause or a negative clause
Alfred Horn (1918-2001) was a mathematician who described what are now known as "Horn" clauses.

• An indefinite clause is a clause containing at least two positive literals

• Positive unit clauses are facts. All other definite clauses are rules.

• A set of definite clauses whose positive literals share the same predicate is a definition of the predicate (and is also called a procedure for the predicate)

• Negative clauses are queries or goal clauses

• A logic program is a conjunction (or set) of non-negative clauses

• A definite logic program is a conjunction (or set) of definite clauses
Any other program is an indefinite logic program
We are primarily concerned with definite logic programs.
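
For those who find it easier to read definitions as code, here is a minimal sketch of the clause vocabulary above. The encoding is an assumption made only for illustration: a clause is a Python set of strings, and a leading `~` marks a negative literal. The function names `classify` and `is_horn` are hypothetical.

```python
def is_negative(literal):
    """A literal is a string; a leading '~' marks negation (an assumed encoding)."""
    return literal.startswith("~")


def classify(clause):
    """Name the kind of clause, following the definitions in the list above.

    A clause is a collection of literal strings, read as their disjunction.
    """
    literals = set(clause)
    positive = [lit for lit in literals if not is_negative(lit)]
    negative = [lit for lit in literals if is_negative(lit)]
    if not literals:
        return "empty clause"
    if not positive:
        return "negative clause"
    if len(positive) >= 2:
        return "indefinite clause"
    # Exactly one positive literal: a definite clause (a fact or a rule).
    return "fact" if not negative else "rule"


def is_horn(clause):
    """A Horn clause is a definite clause or a negative clause."""
    return sum(1 for lit in set(clause) if not is_negative(lit)) <= 1


print(classify({"a", "~b", "~c"}))  # the clause a ∨ ¬b ∨ ¬c
print(classify({"b"}))              # the clause b
print(classify({"~a"}))             # the clause ¬a
```

On this encoding, a definite logic program is simply a collection of clauses each of which `classify` labels as a fact or a rule.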

## The Language of Logic: the Propositional Calculus

The definitions of "atomic formulas" and "negations of atomic formulas" are part of a description of the propositional calculus. The propositional calculus is a simplified form of the first-order predicate calculus. It is traditional in symbolic logic classes in philosophy to consider the propositional calculus as an introduction to the first-order predicate calculus.

Formulas in the propositional calculus are constructed from atomic formulas and truth-functional connectives (¬, ∧, ∨, →). The so-called "atomic" formulas have no parts, hence their name. The atomic formulas represent declarative sentences. It is a theory in the philosophy of language that declarative sentences express propositions. This is why the calculus constructed from atomic formulas and truth-functional connectives is called the propositional calculus. Whereas in philosophy it is traditional to use capital letters from the end of the alphabet (P, Q, R, and so on) for atomic formulas, it is traditional in logic programming to use small letters from the beginning of the alphabet (a, b, c, and so on). Given the atomic formulas, compound formulas are constructed recursively as follows:

φ and ψ are metalinguistic variables. Metalinguistic variables have strings of the language as their values.

¬φ is shorthand for ⌜¬φ⌝. ⌜¬φ⌝ denotes the concatenation of the string ¬ with the string that is the value of φ.

If the value of φ is (P ∧ Q), ⌜¬φ⌝ is ¬(P ∧ Q).

¬φ is a formula if φ is a formula
¬φ is the negation of φ. Read ¬φ as "not φ"

(φ ∧ ψ) is a formula if φ and ψ are formulas
(φ ∧ ψ) is the conjunction of φ and ψ. Read (φ ∧ ψ) as "φ and ψ"

(φ ∨ ψ) is a formula if φ and ψ are formulas
(φ ∨ ψ) is the disjunction of φ and ψ. Read (φ ∨ ψ) as "φ or ψ"

(φ → ψ) is a formula if φ and ψ are formulas
(φ → ψ) is the implication of ψ from φ. Read (φ → ψ) as "if φ, then ψ"

Parentheses eliminate ambiguity. Outside parentheses are typically dropped to increase readability. In this course, we will not consider in detail how to use these rules to construct formulas.
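
The recursive character of these rules can be sketched in a few lines of code. This is only an illustration under an assumed encoding that is not part of the course materials: an atomic formula is a string of letters, and a compound formula is a tuple whose first element names the connective (`"not"`, `"and"`, `"or"`, `"imp"`). The names `is_formula` and `show` are hypothetical.

```python
BINARY = {"and": "∧", "or": "∨", "imp": "→"}


def is_formula(x):
    """Check well-formedness by following the four construction rules."""
    if isinstance(x, str):
        return x.isalpha()  # an atomic formula (assumed: a string of letters)
    if isinstance(x, tuple) and len(x) == 2 and x[0] == "not":
        return is_formula(x[1])
    if isinstance(x, tuple) and len(x) == 3 and x[0] in BINARY:
        return is_formula(x[1]) and is_formula(x[2])
    return False  # nothing else is a formula


def show(x):
    """Write a formula in the usual notation, keeping all parentheses."""
    if isinstance(x, str):
        return x
    if x[0] == "not":
        return "¬" + show(x[1])
    return "(" + show(x[1]) + " " + BINARY[x[0]] + " " + show(x[2]) + ")"


rule = ("imp", ("and", "b", "c"), "a")
print(is_formula(rule), show(rule))
```

Each branch of `is_formula` corresponds to one construction rule, and the function calls itself on the parts of a compound formula, which is what "constructed recursively" amounts to.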

## An Example Logic Program

"The central component of a knowledge-based agent is its knowledge base, or KB. A knowledge base is a set of sentences. (Here 'sentence' is used as a technical term. It is related but not identical to the sentences of English and other natural languages.) Each sentence is expressed in a language called a knowledge representation language and represents some assertion about the world. ... There must be a way to add new sentences to the knowledge base and a way to query what is known. The standard names for these operations are TELL and ASK, respectively. Both operations may involve inference —that is, deriving new sentences from old. Inference must obey the requirement that when one ASKs a question of the knowledge base, the answer should follow from what has been told (or TELLed) to the knowledge base previously" (Stuart J. Russell and Peter Norvig, Artificial Intelligence. A Modern Approach, 3rd edition, 7.1.235).

Given this information about the two languages, we can return again to the example logic program we considered in the first lecture:

a ← b, c.
a ← f.
b.
b ← g.
c.
d.
e.

In this program, there are three rules and four facts. The first conjunct (a ← b, c) is a rule. So are the second (a ← f) and the fourth (b ← g). The other conjuncts are facts.

This logic program is written as a list, but really it is a conjunction of clauses. Further, each clause is a formula in the propositional calculus. We can see this more clearly if we keep in mind that (φ → ψ) is truth-functionally equivalent to (¬φ ∨ ψ). Given this equivalence, the logic program from the first lecture (stated above) is the conjunction of the following formulas:

a ∨ ¬b ∨ ¬c
a ∨ ¬f
b
b ∨ ¬g
c
d
e

In this way, a logic program is really pretty straightforward. It is a conjunction of formulas that themselves are representations of the world. The formulas function as the beliefs about the world in terms of which a rational agent acts. The collection of these beliefs is what we call the agent's "knowledge base" (or "KB"). In the model of the intelligence of a rational agent we are developing, we model the agent's knowledge base as a logic program.

## Semantics for the Propositional Calculus

To know what state of the world a formula represents, it is necessary to have a key for the symbols of the language. A key assigns the symbols meanings. (In this case, it assigns atomic formulas the declarative sentences that the atomic formulas represent.) Otherwise it is impossible to determine whether the agent's beliefs understood as a logic program are true or false.

Truth-values may also be assigned more formally in terms of an interpretation function.

An interpretation function has two parts. The first part of the function is from the atomic formulas to true (T) or false (F). The second part extends the first part to all the formulas in a way that respects the truth-functional meanings of the connective symbols (¬, ∧, ∨, →).

The following table displays a part of four interpretation functions:

```
φ    ψ    ¬φ    φ ∧ ψ    φ ∨ ψ    φ → ψ
T    T    F       T        T        T
T    F    F       F        T        F
F    T    T       F        T        T
F    F    T       F        F        T
```

In the table, given truth-values for φ and ψ, each interpretation function assigns compound formulas truth-values according to the truth-functions for the connectives (¬, ∧, ∨, →).
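
The way the second part of an interpretation function extends the first can be sketched as a short recursive evaluator. The encoding is an assumption made for illustration: atomic formulas are strings, compound formulas are tuples such as `("imp", "p", "q")`, and an interpretation is a dictionary from atomic formulas to `True` or `False`. The names `evaluate` and `truth_table` are hypothetical.

```python
from itertools import product


def evaluate(formula, interp):
    """Extend an assignment to atomic formulas to a truth-value for any formula."""
    if isinstance(formula, str):
        return interp[formula]  # first part: atomic formulas
    op = formula[0]
    if op == "not":
        return not evaluate(formula[1], interp)
    left = evaluate(formula[1], interp)
    right = evaluate(formula[2], interp)
    if op == "and":
        return left and right
    if op == "or":
        return left or right
    if op == "imp":
        return (not left) or right
    raise ValueError("unknown connective: " + str(op))


def truth_table(formula, atoms):
    """One row per interpretation of the listed atomic formulas."""
    rows = []
    for values in product([True, False], repeat=len(atoms)):
        rows.append((values, evaluate(formula, dict(zip(atoms, values)))))
    return rows


for values, result in truth_table(("imp", "p", "q"), ["p", "q"]):
    print(values, result)
```

The same function can be used to check that two formulas agree on every row, for example (φ → ψ) and (¬φ ∨ ψ).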

What in logic is traditionally called a model of a set of formulas is an interpretation function that makes all the formulas true. (Note that this use of 'model' designates something different from the use of 'model' in '...the model of the intelligence of a rational agent....')

A model need not correspond to reality. Because in an interpretation the assignment of truth-values to atomic formulas is arbitrary, a model might assign true to the atomic formula for a declarative sentence that is false. So, e.g., it might assign true to the formula corresponding to "The sun is shining" even though in fact it is cloudy and raining outside.

## Models and Backward Chaining

We give interpretation functions that are models a special name because we are especially interested in them. The reason we are interested in them is that we want to know whether backward chaining can reach a false output on the basis of true inputs. To know this, we need to know the conditions under which the formulas in a logic program are true or false.

Another reason to take an interest in models (which will be familiar to many of you who have taken a class in logic) is that they characterize certain classes of formulas. So, for example, a formula for which all interpretations are models is a tautology. Its truth is independent of the way the world is.

We represent the agent's beliefs as a logic program, and we represent the agent's reasoning from those beliefs as backward chaining on this logic program. Rational agents reason in terms of their beliefs. In the model of the intelligence of a rational agent we are developing, we model this reasoning as backward chaining. If we know whether backward chaining can reach a false output on the basis of true inputs, we have some information about how reliable backward chaining is and hence how good the representation of reasoning in the model is.

Consider again the example logic program (on one side of the hashed vertical line) and corresponding formulas in the propositional calculus (on the other side):

```
a ← b, c.     |     a ∨ ¬b ∨ ¬c
a ← f.        |     a ∨ ¬f
b.            |     b
b ← g.        |     b ∨ ¬g
c.            |     c
d.            |     d
e.            |     e
```

To specify a model, we need to specify an interpretation function that makes all the formulas true. Here is a (partial) specification of such an interpretation function, f:

f(a) = true
f(b) = true
f(c) = true
f(d) = true
f(e) = true
f(f) = false
f(g) = false

This interpretation function, f, makes the clauses in the logic program true. Consider the first clause, a ← b, c. It is equivalent to a ∨ ¬b ∨ ¬c. A disjunction is true just in case at least one disjunct is true. Since f assigns true to a, it follows that it assigns true to a ← b, c.
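
The same check can be carried out for every clause at once. The following is a minimal sketch under an assumed encoding: each clause of the program is a pair of a head and a list of body atoms, with facts having an empty body. The names `PROGRAM`, `clause_true`, and `is_model` are hypothetical.

```python
# The example logic program; each entry is (head, body).
PROGRAM = [
    ("a", ["b", "c"]),
    ("a", ["f"]),
    ("b", []),
    ("b", ["g"]),
    ("c", []),
    ("d", []),
    ("e", []),
]

# The interpretation function f from the text, restricted to the atoms used.
f = {"a": True, "b": True, "c": True, "d": True, "e": True, "f": False, "g": False}


def clause_true(head, body, interp):
    """head ← body is the disjunction of head with the negations of the body atoms."""
    return interp[head] or any(not interp[atom] for atom in body)


def is_model(program, interp):
    """An interpretation is a model if it makes every clause true."""
    return all(clause_true(head, body, interp) for head, body in program)


print(is_model(PROGRAM, f))
```

If f is changed so that it assigns false to a while still assigning true to b and c, the first clause comes out false and the interpretation is no longer a model.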

When we pose a query to a KB, we get a positive answer only if there is a certain relationship between the query and the beliefs in the KB. We want a positive answer only if it is rational to believe the query is true given the beliefs in the KB, but we do not know the procedure to compute the answer in these terms. So for now we settle for the relation of logical consequence between the beliefs in the KB and the query. This is something we can compute.

The inability to compute the query in terms of what it is rational to believe given the KB is a potential problem for the logic programming/agent model of the intelligence of a rational agent. Logical deduction (deducing logical consequences) is an instance of reasoning, but there is more to reasoning than this. Whether a rational agent can be understood in terms of a model built on logical consequence is an unanswered question and a focus of the course.

Suppose that P is a set of definite clauses constituting a KB and that the question is whether a query a is a logical consequence of P. In the context of logic programming, the way to answer this question is to "ask" whether a is a logical consequence of the KB. We ask this question by posing the query a to the KB. The query corresponds to the negative clause ¬a. This negative clause functions logically as an assumption for a proof by reductio ad absurdum. From a logical point of view, the computation (the process of matching) is an attempt to derive the empty clause, ⊥.

⊥ is an empty disjunction. That is to say, it is a disjunction with no disjuncts. Since a disjunction is true just in case at least one disjunct is true, it follows that ⊥ is false.

P ∪ {¬a} ⊢ ⊥ says that ⊥ is deducible from P together with ¬a. This means there is a logical deduction of ⊥ from premises taken from the set consisting of ¬a and the clauses in P.

P ⊨ a says that P logically entails a. This means that a is true in every model that makes all the clauses in P true.

This way of "asking" and getting an "answer" from the KB is sound. That is to say, this "asking" gets a positive "answer" only if the KB logically entails the truth of the query. A positive answer does not mean that the query is true. It means that the query is true on all the interpretations that make all the beliefs in the KB true. More formally: if P ∪ {¬a} ⊢ ⊥, then P ⊨ a.

So if the agent has all true beliefs (every belief in the KB is true), a query is answered positively only if it is true. Given the way an answer to the query is computed, there is no case in which an interpretation makes the beliefs in the KB true but fails to make a positively answered query true.

It is not necessary for us in this course to understand the proof that backward chaining is sound. The proof, although not hard, is technical and beyond the scope of this course, but it is easy enough to get an initial grasp on why backward chaining issues in a positive answer only if the KB logically entails the truth of the query. Consider the following very simple logic program

a ← b.
b.

The logical form (in the propositional calculus) is

a ∨ ¬b
b

Suppose the query posed to the logic program is

?-a.

This query is answered by determining whether the corresponding negative clause

¬a.

can be added to the KB without contradiction. If it can be so added, then the query is answered negatively. If it cannot, then the query is answered positively.

The matching procedure we saw in the first lecture determines whether the negative clause can be added to the KB without contradiction. The first step is to see whether the query matches the head of an entry in the KB. It does match a head. The query a matches the head of the rule a ← b. This produces the derived query b. The derived query b matches the fact b. Now the list of derived queries is empty. (The empty list represents the empty clause, which is designated as ⊥.) This causes backward chaining to stop and the query to be answered positively.

Further, in this example, it is easy to see that every interpretation function that makes both a ∨ ¬b and b true also makes a true. The KB logically entails that a is true.
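
The two sides of this comparison can be put next to each other in code. What follows is a sketch, not the procedure any particular Prolog system uses. The encoding (a program as a list of head and body pairs), the function names, and the `depth` guard against programs that loop are all assumptions made for illustration. `backward_chain` carries out the matching procedure on a list of derived queries, and `entails` checks logical entailment directly by running through every interpretation.

```python
from itertools import product


def backward_chain(program, goals, depth=50):
    """Return True if the list of derived queries can be reduced to the empty list."""
    if not goals:
        return True  # no derived queries remain: the empty clause is reached
    if depth == 0:
        return False  # assumed guard: give up on very long chains
    first, rest = goals[0], goals[1:]
    for head, body in program:
        # The query matches a head; the body replaces it in the list.
        if head == first and backward_chain(program, body + rest, depth - 1):
            return True
    return False


def entails(program, query):
    """True if every model of the program makes the query true."""
    atoms = sorted({query} | {h for h, _ in program} | {x for _, b in program for x in b})
    for values in product([True, False], repeat=len(atoms)):
        interp = dict(zip(atoms, values))
        model = all(interp[h] or any(not interp[x] for x in b) for h, b in program)
        if model and not interp[query]:
            return False  # a model of the program that makes the query false
    return True


simple = [("a", ["b"]), ("b", [])]
print(backward_chain(simple, ["a"]), entails(simple, "a"))
```

For this program the two functions agree on the query a. Soundness is the claim that whenever the first returns a positive answer, the second does too.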

## A Corresponding Proof in the Propositional Calculus

Backward chaining that issues in a positive response to a query corresponds to the existence of a proof in classical logic. In the case of the example, the corresponding proof (which looks harder to understand than it is) is set out below in the form of "Gentzen-style natural deduction."

Gerhard Gentzen (1909-1945) was a mathematician and logician who did the pioneering work in proof theory. For an introduction to logic that uses Gentzen-style proofs, see Neil Tennant's Natural Logic.

For this course, it is not necessary to understand the corresponding classical proof. The proof, though, is not all that hard to understand, so I set it out for those who are interested.

In the proof, the rule (a ∨ ¬b) and the fact (b) are the premises taken from the logic program. The negative clause (¬a) corresponding to the query is the assumption for reductio.

```
                 [¬a]1    [a]2
                 ------------- ¬E
                       ⊥
                     ------ ⊥I
a ∨ ¬b                 ¬b              [¬b]3
-------------------------------------------- ∨E, 2, 3
                  ¬b                                  b
                  ------------------------------------- ¬E
                                    ⊥
                                 ------- ¬I, 1
                                   ¬¬a
                                 ------- ¬¬E
                                    a
```

This proof may be divided into three parts. The first of these parts shows that from the premises a ∨ ¬b (which is the logical form of the first entry in the logic program) and ¬a (which is the negative clause that corresponds to the query), the conclusion ¬b is a logical consequence:

```
[¬a]1
---------------
.
.
.
a ∨ ¬b
---------------------------------------
¬b

```

The second part of the proof extends the first. It shows that given the first part of the proof and given b (which is the fact in the KB), it follows that ⊥:

```
[¬a]1
---------------
.
.
.
a ∨ ¬b
---------------------------------------
¬b                                               b
--------------------------------
⊥

```

The pattern in these parts of the proof corresponds to the matching procedure in backward chaining. The negative clause ¬a (the logical form of the query a) together with the rule are premises in a proof of a negative clause ¬b (the logical form of the derived query b):

```
¬a       a ← b  (or: a ∨ ¬b)
|          /
|       /
|     /
|   /
¬b     b
|       /
|    /
|  /
⊥

```

If the negative clause is ⊥, the initial query is successful. That is to say, the backward chaining process stops when there are no more derived queries and returns a positive answer to the initial query a. This success is specified in the final part of the proof by the derivation of a.

```
⊥
-----
¬¬a
------
a
```

In this way, backward chaining is really just a way of searching for a reductio ad absurdum proof that a given query is a logical consequence of the logic program.

## Logic Programming and Automated Theorem-Proving

The seminal paper is J. A. Robinson's "A Machine-Oriented Logic Based on the Resolution Principle." Journal of the Association for Computing Machinery, vol. 12, 1965, 23-41.

Logic programming comes out of work in automated theorem-proving. In this tradition, the development of a technique called "resolution" was a major breakthrough.

(For this course, it is not necessary to understand how logic programming comes out of automated theorem-proving. It is illuminating, though, and not hard to understand.)

### Resolution

The resolution rule in propositional logic is a derived deduction rule that produces a new clause from two clauses with complementary literals. The following is a simple instance of resolution. In the clauses that constitute the premises

a ∨ b and ¬a ∨ c,

the literals a and ¬a are complementary. Resolution eliminates them and combines the remaining literals b and c into the disjunction that constitutes the conclusion.

```
a ∨ b   ¬a ∨ c
---------------
     b ∨ c
```

Because the resolution rule is a derived rule, the proofs are shorter. Here is a simple example.

a ← b.
b.

has the logical form

a ∨ ¬b
b

The query

?-a.

corresponds to the negative clause

¬a

Resolution may be applied to the negative clause corresponding to the query and to the first rule

```
¬a     a ∨ ¬b                 ¬a    a ← b  (or: a ∨ ¬b)
-------------                 |   /
     ¬b                       |  /
                              ¬b
```

¬b represents the derived query. Resolution may be applied again to ¬b and to the fact in the KB

```
¬b     b                      ¬b     b
----------                    |   /
    ⊥                         |  /
                              ⊥
```

Now we have reached ⊥, so the initial query is a consequence of the logic program.
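
The two resolution steps in this example can be reproduced with a very small function. This is a sketch under an assumed encoding: a clause is a `frozenset` of literal strings, a leading `~` marks negation, and the empty `frozenset` plays the role of ⊥. The names `negate` and `resolve` are hypothetical.

```python
def negate(literal):
    """Return the complementary literal."""
    return literal[1:] if literal.startswith("~") else "~" + literal


def resolve(c1, c2):
    """Return every clause obtainable from c1 and c2 by one resolution step."""
    resolvents = []
    for literal in c1:
        if negate(literal) in c2:
            # Drop the complementary pair and keep the remaining literals.
            resolvents.append(frozenset((c1 - {literal}) | (c2 - {negate(literal)})))
    return resolvents


query = frozenset({"~a"})       # the negative clause ¬a
rule = frozenset({"a", "~b"})   # a ← b, that is, a ∨ ¬b
fact = frozenset({"b"})

derived = resolve(query, rule)[0]   # the derived query ¬b
print(derived, resolve(derived, fact))
```

Applying `resolve` first to the query and the rule, and then to the result and the fact, ends with the empty clause, which mirrors the two diagrams above.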

### Searching for a Proof

In the context of automated theorem-proving, the question is whether a given conclusion is a logical consequence of a given set of premises. To use resolution to answer this question, the first step is to rewrite the premises and conclusion as sets of clauses.

The rewriting occurs according to the following rules, which need to be applied in order.

1. Conditionals (C):
φ → ψ   ⇒   ¬φ ∨ ψ

2. Negations (N):
¬¬φ   ⇒   φ
¬(φ ∧ ψ)   ⇒   ¬φ ∨ ¬ψ
¬(φ ∨ ψ)   ⇒   ¬φ ∧ ¬ψ

3. Distribution (D):
φ ∨ (ψ ∧ χ)   ⇒   (φ ∨ ψ) ∧ (φ ∨ χ)
(φ ∧ ψ) ∨ χ   ⇒   (φ ∨ χ) ∧ (ψ ∨ χ)
φ ∨ (φ1 ∨ ... ∨ φn)   ⇒   φ ∨ φ1 ∨ ... ∨ φn
(φ1 ∨ ... ∨ φn) ∨ φ   ⇒   φ1 ∨ ... ∨ φn ∨ φ
φ ∧ (φ1 ∧ ... ∧ φn)   ⇒   φ ∧ φ1 ∧ ... ∧ φn
(φ1 ∧ ... ∧ φn) ∧ φ   ⇒   φ1 ∧ ... ∧ φn ∧ φ

4. Sets (S):
φ1 ∨ ... ∨ φn   ⇒   {φ1, ... , φn}
φ1 ∧ ... ∧ φn   ⇒   {φ1}, ... , {φn}

Consider the formula a ∧ (b → c). Based on the rewrite rules, the sets of clauses are {a}, {¬b, c}.
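
The rewriting can be sketched as three small functions applied in order. The encoding is an assumption made for illustration: atomic formulas are strings, compound formulas are tuples such as `("imp", "b", "c")`, and a clause is a `frozenset` of literal strings in which `~` marks negation. As a simplification, the sketch merges the distribution step (D) and the set step (S) into one function that builds the clauses directly. All function names are hypothetical.

```python
def eliminate_conditionals(f):
    """Rule C: rewrite every conditional as a disjunction."""
    if isinstance(f, str):
        return f
    if f[0] == "not":
        return ("not", eliminate_conditionals(f[1]))
    left, right = eliminate_conditionals(f[1]), eliminate_conditionals(f[2])
    if f[0] == "imp":
        return ("or", ("not", left), right)
    return (f[0], left, right)


def push_negations(f):
    """Rule N: move negations inward (assumes conditionals are already gone)."""
    if isinstance(f, str):
        return f
    if f[0] != "not":
        return (f[0], push_negations(f[1]), push_negations(f[2]))
    g = f[1]
    if isinstance(g, str):
        return f  # the negation of an atomic formula stays as it is
    if g[0] == "not":
        return push_negations(g[1])  # double negation
    dual = "or" if g[0] == "and" else "and"
    return (dual, push_negations(("not", g[1])), push_negations(("not", g[2])))


def clauses(f):
    """Rules D and S together: the set of clauses of a negation-pushed formula."""
    if isinstance(f, str):
        return {frozenset([f])}
    if f[0] == "not":
        return {frozenset(["~" + f[1]])}
    left, right = clauses(f[1]), clauses(f[2])
    if f[0] == "and":
        return left | right
    # A disjunction distributes over the clauses of its two sides.
    return {c | d for c in left for d in right}


def to_clauses(f):
    return clauses(push_negations(eliminate_conditionals(f)))


print(to_clauses(("and", "a", ("imp", "b", "c"))))
```

For the formula a ∧ (b → c), the result is the pair of clauses {a} and {¬b, c} given in the text.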

Suppose we wanted to know if a follows from a ∧ (b → c). In the context of classical logic, we can easily see that it does follow. The proof consists in one application of (∧E).

```
a ∧ (b → c)
----------- ∧E
     a
```

This proof, however, is not the one automated theorem-proving finds. To answer the question, it uses resolution in a refutation procedure. It rewrites the premise and the negation of the conclusion as sets of clauses. If the empty clause {} is derivable using the resolution rule

{φ1, ... , χ, ... , φm}
{ψ1, ... , ¬χ, ... , ψn}
----------------------
{φ1, ... , φm, ψ1, ... , ψn}

then the sets of clauses that come from rewriting the premises and the negated conclusion are inconsistent. In this example, the sets of clauses are

{a}, {¬b, c}, {¬a}.

So it is easy to see that the empty clause is derivable.

```
{a}
{¬a}
------
{ }
```

The corresponding "refutation-style" proof in classical logic is

```
a ∧ (b → c)
----------- ∧E
     a              [¬a]1
------------------------- ¬E
            ⊥
         ------- ¬I, 1
           ¬¬a
         ------- ¬¬E
            a
```

To use resolution in automated theorem-proving, it is necessary to have a control procedure for the steps that determine whether the empty clause is derivable. Consider, for example, the argument

p
p → q
(p → q) → (q → r)
----------------
r

One resolution proof that the conclusion is a logical consequence of the premises is

1. {p}               Premise
2. {¬p, q}        Premise
3. {p, ¬q, r}     Premise      (This is not a definite clause)
4. {¬q, r}         Premise
5. {¬r}             Negated conclusion
6. {q}               1, 2
7. {r}                4, 6
8. {}                 5, 7

This, however, is not the only way to apply the resolution rule. At step 6, instead of applying the rule to premises 1 and 2, we could have applied it to premises 2 and 3.
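
One very simple control procedure, offered here only as a sketch and not as the procedure logic programming uses, is to apply the resolution rule to every pair of clauses over and over until either the empty clause appears or nothing new can be derived. The encoding is assumed for illustration: clauses are `frozenset`s of literal strings, with `~` marking negation. The function names are hypothetical.

```python
def negate(literal):
    return literal[1:] if literal.startswith("~") else "~" + literal


def resolvents(c1, c2):
    """All clauses obtainable from c1 and c2 by one resolution step."""
    return [
        frozenset((c1 - {lit}) | (c2 - {negate(lit)}))
        for lit in c1
        if negate(lit) in c2
    ]


def refutable(clauses):
    """Return True if the empty clause is derivable from the given clauses."""
    known = set(clauses)
    while True:
        new = set()
        for c1 in known:
            for c2 in known:
                for r in resolvents(c1, c2):
                    if not r:
                        return True  # the empty clause
                    new.add(r)
        if new <= known:
            return False  # nothing new: the empty clause is not derivable
        known |= new


argument = [
    frozenset({"p"}),
    frozenset({"~p", "q"}),
    frozenset({"p", "~q", "r"}),
    frozenset({"~q", "r"}),
    frozenset({"~r"}),  # the negated conclusion
]
print(refutable(argument))
```

This blind strategy tries many pairs that do not help, which is exactly why the choice of control procedure matters.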

Logic programming was born out of reflection on the question of what the control procedure should be when the clauses that constitute the premises are all definite clauses. The way a query is solved in logic programming incorporates one possible control procedure.

## The Language of Logic: the First-Order Predicate Calculus

"The primary difference between propositional and first-order logic lies in the ontological commitment made by each language—that is, what it assumes about the nature of reality. Mathematically, this commitment is expressed through the nature of the formal models with respect to which the truth of sentences is defined. For example, propositional logic assumes that there are facts that either hold or do not hold in the world. Each fact can be in one of two states: true or false, and each model assigns true or false to each proposition symbol.... First-order logic assumes more; namely, that the world consists of objects with certain relations among them that do or do not hold. The formal models are correspondingly more complicated than those for propositional logic" (Stuart J. Russell and Peter Norvig, Artificial Intelligence. A Modern Approach, 3rd edition, 8.1.289).

The first-order predicate calculus is more expressive than the propositional calculus. It allows for the representation of the parts of sentences. It also allows for the representation of quantity.

(Again, for this course, it is not necessary to understand all the details.)

The vocabulary of the first-order predicate calculus subdivides into two parts, a logical and a nonlogical part. The logical part is common to all first-order theories. It does not change. The nonlogical part varies from theory to theory. The logical part of the vocabulary consists in

• the connectives and quantifiers: ¬ ∧ ∨ → ∀ (universal quantifier) ∃ (existential quantifier)
• the comma and the left and right parenthesis: , ( )
• a denumerable list of variables: x1 x2 x3 x4. . .

The nonlogical part of the vocabulary consists in

• a denumerable list of constants: a1 a2 a3 a4. . .
• for each n, a denumerable list of n-place predicates:

P¹₁, P¹₂, P¹₃, . . .
P²₁, P²₂, P²₃, . . .
P³₁, P³₂, . . .
.
.
.

Given the vocabulary, a formula is defined as follows:

• If Pⁿ is an n-place predicate and t1, ..., tn are terms, then Pⁿt1, ..., tn is a formula. (A term is either a variable or a constant.)
• If φ and ψ are formulas, ¬φ, (φ ∧ ψ), (φ ∨ ψ), (φ → ψ) are formulas
• If φ is a formula and v is a variable, then ∀vφ, ∃vφ are formulas
• Nothing else is a formula

What are traditionally called 'models' in the first-order predicate calculus are formal representations of what the formulas are about. This allows for the statement of truth-conditions. (Here again the use of 'model' differs from the prior uses.)

• A model is an ordered pair <D, F>, where D is a domain and F is an interpretation. The domain D is a non-empty set. This set contains the things the formulas are about. The interpretation, F, is a function on the non-logical vocabulary. It gives the meaning of this vocabulary relative to the domain. For every constant c, F(c) is in D. F(c) is the referent of c in the model. For every n-place predicate Pⁿ, F(Pⁿ) is a subset of Dⁿ. F(Pⁿ) is the extension of Pⁿ in the model.

• An assignment is a function from variables to members of D. A v-variant of an assignment g is an assignment that agrees with g except possibly on v. (Assignments are primarily technical devices. They are required to provide the truth-conditions for the quantifiers, ∀ and ∃.)

The truth of a formula relative to a model and an assignment is defined inductively. The base case uses the composite function [ ]F,g on terms, defined as follows:

[t]F,g = F(t) if t is a constant, and [t]F,g = g(t) if t is a variable.

The clauses in the inductive definition of truth relative to M and g are as follows:

Pⁿt1, ..., tn is true relative to M and g iff <[t1]F,g, ..., [tn]F,g> is in F(Pⁿ).

¬A is true relative to M and g iff A is not true relative to M and g.
A ∧ B is true relative to M and g iff A and B are true relative to M and g.
A ∨ B is true relative to M and g iff A or B is true relative to M and g.
A → B is true relative to M and g iff A is not true relative to M and g or B is true relative to M and g.
∃vA is true relative to M and g iff A is true relative to M and g*, for some v-variant g* of g.
∀vA is true relative to M and g iff A is true relative to M and g*, for every v-variant g* of g.

A formula is true relative to a model M iff it is true relative to M and g for every assignment g.
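
For a finite domain, the inductive definition can be followed clause by clause in code. This is a sketch under assumed encodings that are not part of the course materials: an atomic formula is `("pred", name, terms)`, quantified formulas are `("exists", v, body)` and `("forall", v, body)`, the interpretation `F` is a dictionary covering both constants and predicates, and an assignment `g` is a dictionary from variables to members of `D`. The small model at the end is invented for the example.

```python
def denote(term, F, g):
    """[t]F,g: F interprets constants, g assigns values to variables."""
    return F[term] if term in F else g[term]


def true_in(formula, D, F, g):
    """Truth of a formula relative to the model <D, F> and the assignment g."""
    op = formula[0]
    if op == "pred":
        _, name, terms = formula
        return tuple(denote(t, F, g) for t in terms) in F[name]
    if op == "not":
        return not true_in(formula[1], D, F, g)
    if op == "and":
        return true_in(formula[1], D, F, g) and true_in(formula[2], D, F, g)
    if op == "or":
        return true_in(formula[1], D, F, g) or true_in(formula[2], D, F, g)
    if op == "imp":
        return (not true_in(formula[1], D, F, g)) or true_in(formula[2], D, F, g)
    if op in ("exists", "forall"):
        _, v, body = formula
        # Each v-variant of g agrees with g except possibly on v.
        results = (true_in(body, D, F, {**g, v: d}) for d in D)
        return any(results) if op == "exists" else all(results)
    raise ValueError("unknown operator: " + str(op))


# A small invented model: three objects, two constants, one 2-place predicate.
D = {1, 2, 3}
F = {"a1": 1, "a2": 2, "P": {(1, 2), (2, 3)}}

print(true_in(("pred", "P", ("a1", "a2")), D, F, {}))
print(true_in(("forall", "x", ("exists", "y", ("pred", "P", ("x", "y")))), D, F, {}))
```

Because the formulas in the example have no free variables, the starting assignment does not matter, so the empty assignment is used.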

## An Example in Prolog Notation

An example stated in the Prolog notation helps show the relationship between the first-order predicate calculus and logic programming.

A variable (in Prolog) is a word starting with an upper-case letter. A constant is a word that starts with a lower-case letter. A predicate is a word that starts with a lower-case letter. Constants and predicate symbols are distinguishable by their context in a knowledge base. An atomic formula has the form p(t1,...,tn), where p is a predicate symbol and each ti is a term.

Consider the following example based on the movie Pulp Fiction. (The example is taken from Patrick Blackburn and Johan Bos, Representation and Inference for Natural Language: A First Course in Computational Semantics.) In the example, various people love other people. Further, there is a rule defining jealousy. In Prolog notation on the left, with the key on the right, the KB in the "Pulp Fiction" example is

loves(vincent, mia).                  "Vincent loves Mia."
loves(marcellus, mia).                "Marcellus loves Mia."
loves(pumpkin, honey_bunny).          "Pumpkin loves Honey Bunny."
loves(honey_bunny, pumpkin).          "Honey Bunny loves Pumpkin."

jealous(X, Y) :- loves(X, Z), loves(Y, Z).

The rule is universally quantified. From a logical point of view, it is

∀X ∀Y ∀Z ((loves(X, Z) ∧ loves(Y, Z)) → jealous(X, Y))

This sentence is not a formula of Prolog or the first-order predicate calculus. It is a mixed form, meant to be suggestive.

The symbols X, Y, and Z are variables. The rule is general. It says that for every x, y, and z, x is jealous of y if x loves z and y loves z. Obviously, jealousy in the real world is different.

To express this knowledge base in the first-order predicate calculus, a key is necessary. The key specifies the meanings of the constants and predicates:

Vincent             a1
Marcellus          a2
Mia                    a3
Pumpkin           a4
Honey Bunny   a5

__ loves __       P²₁

Given this key, it is possible to express the entries in the KB more formally as formulas in the first-order predicate calculus. The fact that Marcellus loves Mia

loves(marcellus, mia)

is expressed in the first-order predicate calculus (relative to the key) by the formula

P²₁a2,a3

This formal way of expressing entries in the KB is not at all user friendly, and so almost all examples use something like a Prolog notation.

## The Pulp Fiction Example in Prolog

Relative to the KB, consider the following query (whether Mia loves Vincent):

?- loves(mia, vincent).

The response is

false (or no, depending on the particular implementation of Prolog)

For an example of a slightly less trivial query, consider

?- jealous(marcellus, W).

This asks whether Marcellus is jealous of anyone. In the language corresponding to the first-order predicate calculus, the query asks whether there is a w such that Marcellus is jealous of w:

∃W jealous(marcellus, W)

Since Marcellus is jealous of Vincent (given the KB), the response is

W = vincent

From a logical point of view, the query corresponds to (and the answer to the query is computed in terms of) the negative clause

¬jealous(marcellus, W)

This negative clause is read as its universal closure

∀W ¬jealous(marcellus, W)

which (by the equivalence of ∀ to ¬∃¬ in classical logic) is equivalent to

¬∃W jealous(marcellus, W)

The computation (to answer the query) corresponds to the attempt to refute the universal closure by trying to derive the empty clause. Given the KB, the empty clause is derivable

{KB, ∀W ¬jealous(marcellus,W)} ⊢ ⊥

This means that

∃W jealous(marcellus, W)

is a consequence of the KB. Moreover, the computation results in a witness to this existential truth. Given the KB, it follows that Marcellus is jealous of Vincent. So the response to the query is

W = vincent
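The way the computation produces a witness can be imitated in a few lines of Python. This is only a sketch of the search order for this one rule, assuming the facts are tried from top to bottom; `LOVES` and `jealous_of` are hypothetical names introduced for the illustration.

```python
# Facts in the order they appear in the knowledge base.
LOVES = [
    ("vincent", "mia"),
    ("marcellus", "mia"),
    ("pumpkin", "honey_bunny"),
    ("honey_bunny", "pumpkin"),
]

def jealous_of(x):
    """Yield each W for which jealous(x, W) is derivable, in search order."""
    for lover, z in LOVES:          # first solve loves(x, Z)
        if lover != x:
            continue
        for w, beloved in LOVES:    # then solve loves(W, Z) with Z now fixed
            if beloved == z:
                yield w

answers = list(jealous_of("marcellus"))
print(answers)  # ['vincent', 'marcellus']
```

The first witness found is vincent, which matches the response above. The second value is what a further request for answers would produce, since the rule also makes Marcellus jealous of himself.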

Here is how this looks running SWI-Prolog on my laptop:

[Screenshot of the SWI-Prolog session]

## A Corresponding Proof in the Predicate Calculus

The premises are the rule and two facts from the KB. The assumption marked 1, which is discharged in the last step, is the negative clause corresponding to the query. The predicates "loves" and "jealous" and the constants "marcellus" and "vincent" are abbreviated. ¬∀x¬ (not all not) is equivalent to ∃ (some) in the context of classical logic.

```
                                ∀x∀y∀z((l(x,z) ∧ l(y,z)) → j(x,y))
                                ------------------------------------ ∀E
                                ∀y∀z((l(mar,z) ∧ l(y,z)) → j(mar,y))
                                ---------------------------------------- ∀E
l(mar,mia)   l(vinc,mia)        ∀z((l(mar,z) ∧ l(vinc,z)) → j(mar,vinc))
------------------------ ∧I     ---------------------------------------- ∀E
l(mar,mia) ∧ l(vinc,mia)        (l(mar,mia) ∧ l(vinc,mia)) → j(mar,vinc)        [∀x¬j(mar,x)]1
------------------------------------------------------------------------ →E     -------------- ∀E
                              j(mar,vinc)                                        ¬j(mar,vinc)
                              ---------------------------------------------------------------- ¬E
                                                              ⊥
                                                        ------------ ¬I, 1
                                                        ¬∀x¬j(mar,x)

```
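For readers who want to check the proof mechanically, here is a sketch of the same argument in Lean 4. The type `α` of individuals is an assumption of the sketch, and the names `l`, `j`, `mar`, `vinc`, and `mia` follow the abbreviations in the proof. The term applies the assumption to vinc (∀E), applies the rule to mar, vinc, and mia (three uses of ∀E), pairs the two facts (∧I), and derives the contradiction (→E, ¬E).

```lean
example {α : Type} (l j : α → α → Prop) (mar vinc mia : α)
    (rule : ∀ x y z, l x z ∧ l y z → j x y)
    (h1 : l mar mia) (h2 : l vinc mia) :
    ¬ ∀ x, ¬ j mar x :=
  -- Assume ∀x¬j(mar,x) as h and derive a contradiction (¬I).
  fun h => h vinc (rule mar vinc mia ⟨h1, h2⟩)
```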

## Unification is Part of the Computation

The instantiation of variables is a complicating factor. Now (unlike for the propositional calculus) the backward chaining procedure includes what is called unification.

Notice that in the "Pulp Fiction" example, the query

jealous(marcellus, W)

matches the head of no entry in the logic program. It can, however, be "unified" with the head of the rule defining jealousy. Unification, in this way, is the search for a substitution that makes two terms the same. Such a substitution is called a unifier.

A substitution is a replacement of variables by terms. A substitution σ has the following form

{V1/t1, ... , Vn/tn}, where each Vi is a variable and each ti is a term.

φσ is the replacement of every free occurrence of Vi in φ with ti. φσ is a substitution instance of φ.
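As a small illustration of the definition, the following Python sketch applies a substitution to a formula. The representation is an assumption made for the example: variables and constants are strings (distinguished by the case of the first letter, as in Prolog), and an atomic formula p(t1,...,tn) is the tuple ("p", t1, ..., tn). The names `is_variable` and `substitute` are hypothetical.

```python
def is_variable(term):
    """Prolog convention: a variable is a word starting with an upper-case letter."""
    return isinstance(term, str) and term[:1].isupper()

def substitute(term, sigma):
    """Return the substitution instance of term under sigma (a dict Vi -> ti)."""
    if is_variable(term):
        return sigma.get(term, term)   # variables not mentioned in sigma are unchanged
    if isinstance(term, tuple):
        functor, *args = term
        return (functor, *(substitute(a, sigma) for a in args))
    return term                        # a constant

head = ("jealous", "X", "Y")
sigma = {"X": "marcellus", "Y": "W"}   # the substitution {X/marcellus, Y/W}
print(substitute(head, sigma))         # ('jealous', 'marcellus', 'W')
```

Under this substitution the head of the jealousy rule becomes identical to the query jealous(marcellus, W), which is what it means for the two to be unified.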

An example makes unification easier to understand. Consider the blocks-world program:

on(b1,b2).
on(b3,b4).
on(b4,b5).
on(b5,b6).

above(X,Y) :- on(X,Y).
above(X,Y) :- on(X,Z), above(Z,Y).

An agent with this logic program as his or her KB thinks the world looks like this:

```
       b3
       b4
b1     b5
b2     b6
```

Now suppose the query is whether block b3 is above block b5:

?- above(b3, b5).

The computation to answer (or solve) this query runs roughly as follows. The query does not match the head of any fact. Nor does it match the head of any rule. It is clear, though, that there is a substitution that unifies this query and the head of the first rule. The unifying substitution is

{X/b3, Y/b5}

This substitution produces

above(b3,b5) :- on(b3,b5).

So the derived query is

on(b3,b5).
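The step just described (unify the query with the head of a rule, then apply the unifier to the body) can be sketched in Python. This is an illustration under stated assumptions, not Prolog's actual implementation: terms are strings or tuples as in ("above", "X", "Y"), the helper names `walk`, `unify`, and `instance` are hypothetical, and, like standard Prolog by default, the sketch performs no occurs check.

```python
def is_variable(t):
    return isinstance(t, str) and t[:1].isupper()

def walk(t, s):
    """Follow the bindings in s until reaching an unbound variable or a non-variable."""
    while is_variable(t) and t in s:
        t = s[t]
    return t

def unify(a, b, s):
    """Extend substitution s so that a and b become the same, or return None."""
    a, b = walk(a, s), walk(b, s)
    if a == b:
        return s
    if is_variable(a):
        return {**s, a: b}
    if is_variable(b):
        return {**s, b: a}
    if isinstance(a, tuple) and isinstance(b, tuple) and a[0] == b[0] and len(a) == len(b):
        for x, y in zip(a[1:], b[1:]):
            s = unify(x, y, s)
            if s is None:
                return None
        return s
    return None

def instance(t, s):
    """Apply substitution s to term t."""
    t = walk(t, s)
    if isinstance(t, tuple):
        return (t[0], *(instance(x, s) for x in t[1:]))
    return t

query = ("above", "b3", "b5")
head, body = ("above", "X", "Y"), [("on", "X", "Y")]   # the first rule

sigma = unify(head, query, {})
print(sigma)                                  # {'X': 'b3', 'Y': 'b5'}
derived = [instance(g, sigma) for g in body]
print(derived)                                # [('on', 'b3', 'b5')]

facts = [("on", "b1", "b2"), ("on", "b3", "b4"), ("on", "b4", "b5"), ("on", "b5", "b6")]
print([f for f in facts if unify(derived[0], f, {}) is not None])   # []
```

The empty list at the end shows that the derived query on(b3,b5) unifies with none of the facts.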

This derived query fails. So now it is necessary to backtrack to see if another match is possible further down in the knowledge base. Another match is possible. The query can be made to match the head of the second rule. The unifying substitution is

{X/b3, Y/b5}

This produces

above(b3,b5) :- on(b3,Z), above(Z,b5).

The derived query is

on(b3,Z), above(Z,b5).

Now the question is whether the first conjunct in this query can be unified with anything in the knowledge base. It can. The unifying substitution for the first conjunct in the derived query is

{Z/b4}

The substitution has to be made throughout the derived query. So, given that the first conjunct has been made to match, the derived query becomes

above(b4,b5).

This can be made to match the head of the first rule for above. The unifying substitution is

{X/b4, Y/b5}

and the derived query is now

on(b4,b5).

This query matches one of the facts in the knowledge base. So the computation is a success! The query is a logical consequence of the KB. Here is the tree representation of the computation:
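The whole computation can be imitated by a small backward chaining interpreter. The following Python sketch is a simplification under stated assumptions: clauses are (head, body) pairs tried from top to bottom, the variables of a clause are renamed each time it is used, backtracking is provided by Python generators, and there is no occurs check. All the names (`KB`, `solve`, `provable`, and the helpers) are hypothetical.

```python
import itertools

def is_variable(t):
    return isinstance(t, str) and t[:1].isupper()

def walk(t, s):
    while is_variable(t) and t in s:
        t = s[t]
    return t

def unify(a, b, s):
    """Extend substitution s so that a and b become the same, or return None."""
    a, b = walk(a, s), walk(b, s)
    if a == b:
        return s
    if is_variable(a):
        return {**s, a: b}
    if is_variable(b):
        return {**s, b: a}
    if isinstance(a, tuple) and isinstance(b, tuple) and a[0] == b[0] and len(a) == len(b):
        for x, y in zip(a[1:], b[1:]):
            s = unify(x, y, s)
            if s is None:
                return None
        return s
    return None

def rename(t, n):
    """Give the variables of a clause fresh names for this use of the clause."""
    if is_variable(t):
        return f"{t}_{n}"
    if isinstance(t, tuple):
        return (t[0], *(rename(a, n) for a in t[1:]))
    return t

# The blocks-world program. Facts are clauses with an empty body.
KB = [
    (("on", "b1", "b2"), []),
    (("on", "b3", "b4"), []),
    (("on", "b4", "b5"), []),
    (("on", "b5", "b6"), []),
    (("above", "X", "Y"), [("on", "X", "Y")]),
    (("above", "X", "Y"), [("on", "X", "Z"), ("above", "Z", "Y")]),
]

fresh = itertools.count()

def solve(goals, s):
    """Yield every substitution that solves the list of goals, depth first."""
    if not goals:
        yield s                          # nothing left to prove: success
        return
    first, rest = goals[0], goals[1:]
    for head, body in KB:                # top to bottom through the KB
        n = next(fresh)
        s2 = unify(first, rename(head, n), s)
        if s2 is not None:               # on failure, move on to the next clause
            yield from solve([rename(b, n) for b in body] + rest, s2)

def provable(goal):
    return next(solve([goal], {}), None) is not None

print(provable(("above", "b3", "b5")))   # True
print(provable(("above", "b1", "b5")))   # False
```

The first query succeeds by the path described above. The second fails because b1 and b5 are in different stacks.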

```
¬above(b3,b5)     above(X,Y) :- on(X,Z), above(Z,Y)        {X/b3, Y/b5}
                  above(X,Y) ∨ ¬on(X,Z) ∨ ¬above(Z,Y)
      \                 /
       \               /
    ¬on(b3,Z) ∨ ¬above(Z,b5)       on(b3,b4)               {Z/b4}
              \                      /
               \                    /
            ¬above(b4,b5)     above(X,Y) :- on(X,Y)        {X/b4, Y/b5}
                              above(X,Y) ∨ ¬on(X,Y)
                  \                 /
                   \               /
                  ¬on(b4,b5)     on(b4,b5)
                        \         /
                         \       /
                             ⊥

```
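Each step in the tree is a resolution step: a negative clause and a clause from the program are combined by cancelling a literal against its negation. The Python sketch below replays the steps on ground instances of the clauses. As a simplifying assumption, the substitutions shown in the tree (including {Z/b4}) have already been applied, so no unification is needed. The names `pos`, `neg`, and `resolve` are hypothetical.

```python
# A literal is (sign, atom): sign is True for a positive literal, False for a negated one.
def pos(atom):
    return (True, atom)

def neg(atom):
    return (False, atom)

def resolve(c1, c2):
    """Resolve two ground clauses on a complementary pair of literals.

    Returns the resolvent, or None if the clauses have no complementary pair.
    """
    for sign, atom in c1:
        if (not sign, atom) in c2:
            return (c1 - {(sign, atom)}) | (c2 - {(not sign, atom)})
    return None

# Ground instances of the clauses used in the tree.
query = frozenset({neg("above(b3,b5)")})
rule2 = frozenset({pos("above(b3,b5)"), neg("on(b3,b4)"), neg("above(b4,b5)")})
fact1 = frozenset({pos("on(b3,b4)")})
rule1 = frozenset({pos("above(b4,b5)"), neg("on(b4,b5)")})
fact2 = frozenset({pos("on(b4,b5)")})

clause = query
for side in (rule2, fact1, rule1, fact2):
    clause = resolve(clause, side)
    print(sorted(clause))

print(clause == frozenset())   # True: the empty clause has been derived
```

The clauses printed along the way correspond to the left-hand branch of the tree, ending in the empty clause ⊥.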

## What We Have Accomplished in this Lecture

We looked at the relation between logic and logic programming. We saw that if a query is successful, then the query is a logical consequence of premises taken from the logic program. To see this, we considered the connection between the backward chaining process in computing a successful query and the underlying proof in the propositional and first-order predicate calculus.