Natural Language Processing

Lexicons, grammars, and models


Thinking as Computation, Chapter 8 (153-176)




Human beings understand language. How do they do it?

It would seem that human beings, or any intelligent agent who understands natural language, must process the words they see and hear in terms of a lexicon and a grammar.


A lexicon and grammar

A language is defined by a lexicon and grammar. Here is a simple example:

s -> np vp
np -> det n
vp -> v np
vp -> v
det -> a
det -> the
n -> woman
n -> man
v -> loves

The symbols s, np, vp, det, n, and v stand for grammatical categories:

s: sentence
np: noun phrase
vp: verb phrase
det: determiner
n: noun
v: verb

The symbols s, np, and vp are non-terminal grammatical categories.

The symbols det, n, and v are terminal grammatical categories.

The symbols a, the, woman, man, and loves are the lexical items in the terminal grammatical categories.


A parse tree for a sentence in the language

Consider the string of words a woman loves a man.

Is this string grammatical? That is to say, is it a member of the language given by the lexicon and grammar?

The following parse tree shows that the string is a sentence of the language:


                      s
                 /         \
           np                   vp
        /     \              /      \
     det        n         v             np
      |         |         |           /     \
      a       woman     loves      det        n
                                    |         |
                                    a        man
       

A recognizer written in Prolog

A "recognizer" is a program that recognizes whether a given string is in the language.

Here is a recognizer (written in Prolog) for the language given by the lexicon and grammar:

s(X,Z) :- np(X,Y), vp(Y,Z).
np(X,Z) :- det(X,Y), n(Y,Z).
vp(X,Z) :- v(X,Y), np(Y,Z).
vp(X,Z) :- v(X,Z).

det([the|W],W).
det([a|W],W).

n([woman|W],W).
n([man|W],W).

v([loves|W],W).

An example helps make this recognizer easier to understand. The query

s([a,woman,loves,a,man],[]).

asks whether the difference between [a,woman,loves,a,man] and [] is a sentence of the language.
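
The recognizer above can also be written in Prolog's definite clause grammar (DCG) notation, which hides the two list arguments. What follows is only a sketch and is not part of the original program. It assumes a Prolog system with DCG support, such as SWI-Prolog, and it should be loaded instead of the hand-written recognizer, since it defines the same predicates.

```prolog
% Sketch: the same lexicon and grammar in DCG notation.
% Each rule is translated by Prolog into a clause with two extra
% list arguments, e.g. s(X,Z) :- np(X,Y), vp(Y,Z).
s  --> np, vp.
np --> det, n.
vp --> v, np.
vp --> v.

det --> [the].
det --> [a].

n --> [woman].
n --> [man].

v --> [loves].

% phrase(s, Words) asks whether Words is a sentence of the language.
% It amounts to the query s(Words, []).
% ?- phrase(s, [a,woman,loves,a,man]).
% true.
```

The DCG form stays close to the rewrite rules of the grammar, while the explicit form shows how the lists are threaded from one predicate to the next.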


Difference lists

The recognizer works in terms of what are called "difference lists."

A difference list represents a list in terms of a pair of lists. The list represented is the "difference" between the two lists. So, for example,

[a,woman,loves,a,man]

is the "difference" between

[a,woman,loves,a,man] and [].

The first list in this pair contains the string to be recognized (or not recognized) as a sentence of the language. The np predicate looks for a noun phrase at the front of the list. If the np predicate finds a noun phrase ([a,woman]), it passes what remains in the list ([loves,a,man]) to the vp predicate. The vp predicate looks for a verb phrase at the front of the list it is given. If the vp predicate finds a verb phrase, and the difference between the list it is given and the verb phrase it finds is the empty list, then the query is successful.
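
To see what the np predicate hands on to the vp predicate, the second argument can be left as a variable. The sketch below repeats only the noun-phrase part of the recognizer so that it runs on its own.

```prolog
% Noun-phrase part of the recognizer, repeated so the sketch is self-contained.
np(X,Z) :- det(X,Y), n(Y,Z).

det([the|W],W).
det([a|W],W).

n([woman|W],W).
n([man|W],W).

% The second argument is bound to what remains of the list
% once a noun phrase has been read from the front.
% ?- np([a,woman,loves,a,man], Rest).
% Rest = [loves, a, man].
```

The noun phrase [a,woman] is the difference between the first list and Rest.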


The computation

In more detail, the computation that occurs to answer the query proceeds as follows:

By unifying X with [a,woman,loves,a,man] and unifying Z with [], the query matches the head of the first rule. The query list thus becomes

np([a,woman,loves,a,man],Y), vp(Y,[]).

The variable Y in this derived query list clashes with the variable Y in the rules. (Remember that the logic program is a conjunction of universally quantified sentences. No quantifier binds variables in distinct sentences.) So, to prevent mistakes in the computation, the variable Y in the derived query list is renamed Y1:

np([a,woman,loves,a,man],Y1), vp(Y1,[]).

By unifying X with [a,woman,loves,a,man] and unifying Z with Y1, the first query in this derived query list matches the head of the second rule. So the derived query list becomes

det([a,woman,loves,a,man],Y), n(Y,Y1), vp(Y1,[]).

By unifying Y and W with [woman,loves,a,man], the first item on this derived query list matches the second fact about determiners. (The vertical line ( | ) in the fact separates the head from the tail in the list. So, in the fact, the head is the determiner a.) Now the derived query is

n([woman,loves,a,man],Y1), vp(Y1,[]).

By unifying W and Y1 with [loves,a,man], the first item on the derived query list matches the first fact about nouns. Now the derived query is

vp([loves,a,man],[]).

This completes the computation of the noun phrase a woman. The computation of the verb phrase proceeds similarly, and it should be clear that the computation will eventually succeed.
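
The remaining step can be checked by asking what each vp rule leaves over when it reads from the front of [loves,a,man]. The sketch below repeats the clauses it needs. The helper vp_remainders/1 is a hypothetical name introduced for illustration.

```prolog
% Clauses of the recognizer needed for the verb phrase.
np(X,Z) :- det(X,Y), n(Y,Z).
vp(X,Z) :- v(X,Y), np(Y,Z).
vp(X,Z) :- v(X,Z).

det([the|W],W).
det([a|W],W).

n([woman|W],W).
n([man|W],W).

v([loves|W],W).

% vp_remainders(-Rs): hypothetical helper that collects every list that
% can remain after a verb phrase is read from the front of [loves,a,man].
vp_remainders(Rs) :- findall(R, vp([loves,a,man], R), Rs).

% ?- vp_remainders(Rs).
% Rs = [[], [a, man]].
```

The first vp rule reads loves a man and leaves [], so the query vp([loves,a,man],[]) succeeds. The second vp rule reads only loves and leaves [a,man], which does not match [].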


A screen shot of the recognizer at work

% c:/Users/blackson/AppData/Local/Temp/7zO3F2.tmp/recognizer.pl compiled 0.00 sec, 10 clauses
Welcome to SWI-Prolog (Multi-threaded, 32 bits, Version 6.6.1)
Copyright (c) 1990-2013 University of Amsterdam, VU Amsterdam
SWI-Prolog comes with ABSOLUTELY NO WARRANTY. This is free software,
and you are welcome to redistribute it under certain conditions.
Please visit http://www.swi-prolog.org for details.

For help, use ?- help(Topic). or ?- apropos(Word).

1 ?- s([a,woman,loves,a,man],[]).
true .

2 ?- s(X,[]).
X = [the, woman, loves, the, woman] ;
X = [the, woman, loves, the, man] ;
X = [the, woman, loves, a, woman] ;
X = [the, woman, loves, a, man] ;
X = [the, woman, loves] ;
X = [the, man, loves, the, woman] .

3 ?- np([a,woman],[]).
true .

4 ?- np(X,[]).
X = [the, woman] ;
X = [the, man] ;
X = [a, woman] ;
X = [a, man].


A variation on the recognizer program

The recognizer program can be modified to display the parse tree associated with the recognized sentence:


s(s(NP,VP),X,Z) :- np(NP,X,Y), vp(VP,Y,Z).
np(np(DET,N),X,Z) :- det(DET,X,Y), n(N,Y,Z).
vp(vp(V,NP),X,Z) :- v(V,X,Y), np(NP,Y,Z).
vp(vp(V),X,Z) :- v(V,X,Z).

det(det(the),[the|W],W).
det(det(a),[a|W],W).

n(n(woman),[woman|W],W).
n(n(man),[man|W],W).

v(v(loves),[loves|W],W).


Again, an example makes it clearer how the program works. Suppose that the query is

s(T,[a,woman,loves,a,man],[]).

By unifying T with s(NP,VP), X with [a,woman,loves,a,man], and Z with [], the query matches the head of the first rule. The derived query list is

np(NP,[a,woman,loves,a,man],Y), vp(VP,Y,[]).

The variable Y in the derived query list clashes with the variable Y in the KB, so it is renamed Y1 in the derived query list:

np(NP,[a,woman,loves,a,man],Y1), vp(VP,Y1,[]).

By unifying NP with np(DET,N), X with [a,woman,loves,a,man], and Y1 with Z, the first item on the derived query list matches the head of the second rule. The derived query list becomes

det(DET,[a,woman,loves,a,man],Y), n(N,Y,Y1), vp(VP,Y1,[]).

By unifying DET with det(a) and W and Y with [woman,loves,a,man], the first item on the derived query list matches the second fact about determiners. The derived query becomes

n(N,[woman,loves,a,man],Y1), vp(VP,Y1,[]).

Notice that in the computation so far, T = s(NP,VP), NP = np(DET,N), and DET = det(a). So, at this point in the computation, T = s(np(det(a),N),VP), where N and VP are still unbound. It is easier to appreciate the (partially) computed value of T if it is presented in the form of a parse tree. Once the parentheses are replaced with branches, T is the following tree:

                           s
                         /   \
                       np     VP
                      /  \
                   det    N
                    |
                    a

The rest of the computation works similarly, and it is clear that eventually the computation will succeed and return the computed value for T.
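
The way unification fills in the tree step by step can be reproduced directly. The sketch below performs only the three bindings made so far in the computation. The predicate name partial_tree/1 is hypothetical.

```prolog
% partial_tree(-T): hypothetical helper that replays the bindings
% T = s(NP,VP), NP = np(DET,N), DET = det(a) from the computation.
% The variables _N and _VP are left unbound, as they are at this point.
partial_tree(T) :-
    T   = s(NP, _VP),
    NP  = np(DET, _N),
    DET = det(a).

% ?- partial_tree(T).
% T is s(np(det(a), N), VP) with N and VP unbound
% (Prolog prints the unbound variables with names such as _123).
```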


The new recognizer at work

      
% c:/Users/blackson/AppData/Local/Temp/7zOAB3C.tmp/recognizer.pl compiled 0.00 sec, 10 clauses
Welcome to SWI-Prolog (Multi-threaded, 32 bits, Version 6.6.1)
Copyright (c) 1990-2013 University of Amsterdam, VU Amsterdam
SWI-Prolog comes with ABSOLUTELY NO WARRANTY. This is free software,
and you are welcome to redistribute it under certain conditions.
Please visit http://www.swi-prolog.org for details.

For help, use ?- help(Topic). or ?- apropos(Word).

1 ?- s(T,[a,woman,loves,a,man],[]).
T = s(np(det(a), n(woman)), vp(v(loves), np(det(a), n(man)))) ;
false.

2 ?- s(T,S,[]).
T = s(np(det(the), n(woman)), vp(v(loves), np(det(the), n(woman)))),
S = [the, woman, loves, the, woman] ;
T = s(np(det(the), n(woman)), vp(v(loves), np(det(the), n(man)))),
S = [the, woman, loves, the, man] ;
T = s(np(det(the), n(woman)), vp(v(loves), np(det(a), n(woman)))),
S = [the, woman, loves, a, woman] ;
T = s(np(det(the), n(woman)), vp(v(loves), np(det(a), n(man)))),
S = [the, woman, loves, a, man] .

3 ?- 

To make the parse trees easier to read, we can add a "pretty printer" (that I took with slight modification from a course on Prolog given at the 16th European Summer School in Logic, Language, and Information). Here is a screen shot of the program at work with the addition of the pretty printer for the trees:


% swipl                                    
Welcome to SWI-Prolog (threaded, 64 bits, version 7.6.3)
SWI-Prolog comes with ABSOLUTELY NO WARRANTY. This is free software.
Please run ?- license. for legal details.

For online help and background, visit http://www.swi-prolog.org
For built-in help, use ?- help(Topic). or ?- apropos(Word).


?- s(T,S,[]),pptree(T).

s(
   np(
      det(the)
      n(woman))
   vp(
      v(loves)
      np(
         det(the)
         n(woman))))

T = s(np(det(the), n(woman)), vp(v(loves), np(det(the), n(woman)))),
S = [the, woman, loves, the, woman] .

?- 
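
The code of the pretty printer is not shown here. As a rough idea of how such a printer can work, here is a minimal sketch. It is not the pptree predicate used above, its output format is different, and the name print_tree is an assumption made for illustration. It expects a fully computed (ground) tree.

```prolog
% print_tree(+Tree): print each label on its own line,
% indented three spaces for every level of depth in the tree.
print_tree(Tree) :- print_tree(Tree, 0).

print_tree(Tree, Indent) :-
    Tree =.. [Label|Children],      % split s(NP,VP) into s and [NP,VP]
    tab(Indent), write(Label), nl,
    Next is Indent + 3,
    print_children(Children, Next).

print_children([], _).
print_children([Child|Rest], Indent) :-
    print_tree(Child, Indent),
    print_children(Rest, Indent).

% ?- print_tree(np(det(a), n(man))).
% np
%    det
%       a
%    n
%       man
```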


Models of the world

Understanding natural language requires more than the ability to recognize sentences of the language. Recognizing a sentence establishes only that the string is grammatical. Understanding requires knowledge of the conditions under which sentences are true or false. For this, we need a model of the world against which sentences are evaluated as true or false.

A model shows what is true in the world. Here is an example (nldb.pl), written in Prolog, that Levesque provides:


person(john). person(george). person(mary). person(linda).  
park(kew_beach). park(queens_park). 
tree(tree01). tree(tree02).  tree(tree03).  
hat(hat01).   hat(hat02).  hat(hat03).  hat(hat04).

sex(john,male).    sex(george,male). 
sex(mary,female).  sex(linda,female).

color(hat01,red).   color(hat02,blue). 
color(hat03,red).   color(hat04,blue).  

in(john,kew_beach).     in(george,kew_beach). 
in(linda,queens_park).  in(mary,queens_park).  
in(tree01,queens_park). in(tree02,queens_park). 
in(tree03,kew_beach).

beside(mary,linda). beside(linda,mary). 

on(hat01,john). on(hat02,mary). on(hat03,linda). on(hat04,george). 

size(john,small).    size(george,big). 
size(mary,small).    size(linda,small). 
size(hat01,small).   size(hat02,small). 
size(hat03,big).     size(hat04,big).  
size(tree01,big).    size(tree02,small).  size(tree03,small). 


This model is pretty straightforward to understand. There are four persons. Their names are "john," "george," "mary," and "linda." There are two parks. There are three trees and four hats. In addition to the objects in the model (the people, parks, and so on), the model specifies certain basic truths about the objects.
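
Because the model is an ordinary Prolog program, it can be queried directly. The sketch below repeats only the facts about hats and asks who is wearing a red hat. The helper red_hat_wearers/1 is a hypothetical name and is not part of nldb.pl.

```prolog
% Facts about hats, copied from the model.
hat(hat01). hat(hat02). hat(hat03). hat(hat04).

color(hat01,red).  color(hat02,blue).
color(hat03,red).  color(hat04,blue).

on(hat01,john). on(hat02,mary). on(hat03,linda). on(hat04,george).

% red_hat_wearers(-People): hypothetical helper that collects
% everyone who has a red hat on.
red_hat_wearers(People) :-
    findall(P, (hat(H), color(H,red), on(H,P)), People).

% ?- red_hat_wearers(People).
% People = [john, linda].
```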


The lexicon and the model

The lexicon contains words and their grammatical categories. The model is a representation of things in the world. The connection between the words and the things in the world is given by the "extensions" of the words. The rules that define these extensions are in the lexicon. In this way, a lexicon is built in terms of a model.

Here is an example (lexicon.pl) that Levesque provides:


article(a).  article(the).  

common_noun(park,X) :- park(X).  
common_noun(tree,X) :- tree(X).
common_noun(hat,X) :- hat(X).  
common_noun(man,X) :- person(X), sex(X,male). 
common_noun(woman,X) :- person(X), sex(X,female). 

adjective(big,X) :- size(X,big).    
adjective(small,X) :- size(X,small). 
adjective(red,X) :- color(X,red).  
adjective(blue,X) :- color(X,blue). 

preposition(on,X,Y) :- on(X,Y).     
preposition(in,X,Y) :- in(X,Y). 
preposition(beside,X,Y) :- beside(X,Y). 

% The preposition 'with' is flexible in how it is used.
preposition(with,X,Y) :- on(Y,X).        % Y can be on X
preposition(with,X,Y) :- in(Y,X).        % Y can be in X
preposition(with,X,Y) :- beside(Y,X).    % Y can be beside X

% Any word that is not in one of the four categories above.
proper_noun(X,X) :- \+ article(X), \+ adjective(X,_), \+ common_noun(X,_), \+ preposition(X,_,_).


Consider the first line. It says that the words a and the belong to the grammatical category of article.

Consider the first rule for the category of common noun.

The first rule says that the word park belongs to the grammatical category of common noun and that something is in the extension of the word park if this thing is a park. (Notice that the token 'park' occurs twice in the rule but that the meanings of the occurrences are different. In the first, it stands for the word. In the second, it stands for the thing.)

To understand this, return to what the model says about the parks in the world. According to the model, a place in the world whose name is "kew_beach" is a park. According to the model, the other park in the world is a place whose name is "queens_park." Together, these two parks (kew_beach and queens_park) constitute the extension of the common noun park.

Consider the fourth rule for the category of common noun.

This rule says that the word man belongs to the grammatical category of common noun and that something is in the extension of the word man if this thing is a person and is male. According to the model, the things whose names are "john" and "george" are the things in the world who are persons and male. Together, they constitute the extension of the common noun man.
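
The extension of a common noun can be computed by collecting every object that satisfies its rule. The sketch below repeats the parts of the model and lexicon it needs. The predicate extension/2 is a hypothetical helper and is not part of lexicon.pl.

```prolog
% Part of the model.
person(john). person(george). person(mary). person(linda).
park(kew_beach). park(queens_park).

sex(john,male).    sex(george,male).
sex(mary,female).  sex(linda,female).

% Part of the lexicon.
common_noun(park,X) :- park(X).
common_noun(man,X) :- person(X), sex(X,male).
common_noun(woman,X) :- person(X), sex(X,female).

% extension(+Word, -Things): hypothetical helper that collects
% everything in the extension of the common noun Word.
extension(Word, Things) :- findall(X, common_noun(Word,X), Things).

% ?- extension(park, Things).
% Things = [kew_beach, queens_park].
%
% ?- extension(man, Things).
% Things = [john, george].
```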


A program to answer questions about extensions

It is possible to write a program that determines if a given object is in the extension of a noun phrase.

Given the model and lexicon, the query

np([a,woman,in,a,park],linda)

succeeds because "linda" is the name of something in the extension of the noun phrase "a woman in a park."

On the other hand, the query

np([a,hat,on,linda],hat02)

fails because the hat whose name is "hat02" is not in the extension of the noun phrase "a hat on Linda."


Here is the Prolog program (np.pl) Levesque provides:


np([Name],X) :- proper_noun(Name,X).
np([Art|Rest],X) :- article(Art), np2(Rest,X).

np2([Adj|Rest],X) :- adjective(Adj,X), np2(Rest,X). 
np2([Noun|Rest],X) :- common_noun(Noun,X), mods(Rest,X). 

mods([],_). 
mods(Words,X) :- 
   append(Start,End,Words),   % Break the words into two pieces.
   pp(Start,X),               % The first part is a PP.
   mods(End,X).               % The last part is a Mods again.

pp([Prep|Rest],X) :- preposition(Prep,X,Y), np(Rest,Y).


The relevant rewrite rules in the grammar are

NP -> proper_noun
NP -> article NP2
NP2 -> adjective NP2
NP2 -> common_noun Mods
Mods -> (empty)
Mods -> PP Mods
PP -> preposition NP

Consider the noun phrase "a big tree." The corresponding parse tree is

      
     
                          NP
                           |
                   /              \
               article            NP2
                  |                | 
                  |           /            \
                  |       adjective        NP2
                  |           |             |
                  |           |         /       \
                  |           |    common_noun  Mods
                  |           |         |
                  a          big       tree


Here is the computation (with commentary) to determine whether tree01 is in the extension of "a big tree":



                 np([a,big,tree],tree01)                    query

                           |                                np([Art|Rest],X) :- article(Art), np2(Rest,X)
                           |                                Art = a, Rest = [big,tree], X = tree01

              article(a), np2([big,tree],tree01)            

                           |                                article(a) succeeds; matches fact in lexicon
                           |                                

                   np2([big,tree],tree01)                   

                           |                                np2([Adj|Rest],X) :- adjective(Adj,X), np2(Rest,X)
                           |                                Adj = big, Rest = [tree], X = tree01

           adjective(big,tree01), np2([tree],tree01)

                           |                                adjective(big,X) :- size(X,big)
                           |                                X = tree01

             size(tree01,big), np2([tree],tree01)

                           |                                size(tree01,big) succeeds; matches fact in model
                           |

                   np2([tree],tree01)
                              
                           |                                np2([Noun|Rest],X) :- common_noun(Noun,X), mods(Rest,X)
                           |                                Noun = tree, Rest = [], X = tree01

               common_noun(tree,tree01), mods([],tree01)

                           |                                common_noun(tree,X) :- tree(X)                     
                           |                                X = tree01

                 tree(tree01), mods([],tree01)

                           |                                tree(tree01) succeeds, matches fact in model 
                           |

                     mods([],tree01)

                           |                                mods([],tree01) succeeds, it matches mods([],_)
                           |
                           |                                The symbol _ (called the "underscore") is the anonymous variable.
                           |                                It indicates that the variable is solely for pattern-matching. The  
                           |                                binding is not part of the computation process.
                         
                    the query succeeds!



The program at work:


% p:/Work Desktop/tab.pl compiled 0.00 sec, 71 clauses
Welcome to SWI-Prolog (Multi-threaded, 32 bits, Version 6.6.1)
Copyright (c) 1990-2013 University of Amsterdam, VU Amsterdam
SWI-Prolog comes with ABSOLUTELY NO WARRANTY. This is free software,
and you are welcome to redistribute it under certain conditions.
Please visit http://www.swi-prolog.org for details.

For help, use ?- help(Topic). or ?- apropos(Word).

1 ?- np([a,woman,in,the,park],linda).
true .

2 ?- np([a,hat,on,linda],hat02).
false.

3 ?- np([a,big,tree],X).
X = tree01 ;
false.

4 ?- 
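
The same program can enumerate the whole extension of a noun phrase. The sketch below is self-contained. It repeats only the parts of the model and lexicon needed for the two example queries, so it is smaller than the files Levesque provides. The predicate np_extension/2 is a hypothetical helper.

```prolog
:- use_module(library(lists)).   % append/3

% Part of the model.
person(john). person(george). person(mary). person(linda).
park(kew_beach). park(queens_park).
sex(john,male).    sex(george,male).
sex(mary,female).  sex(linda,female).
in(john,kew_beach).     in(george,kew_beach).
in(linda,queens_park).  in(mary,queens_park).
size(john,small).  size(george,big).
size(mary,small).  size(linda,small).

% Part of the lexicon.
article(a).  article(the).
common_noun(park,X) :- park(X).
common_noun(man,X) :- person(X), sex(X,male).
common_noun(woman,X) :- person(X), sex(X,female).
adjective(big,X) :- size(X,big).
adjective(small,X) :- size(X,small).
preposition(in,X,Y) :- in(X,Y).
proper_noun(X,X) :- \+ article(X), \+ adjective(X,_), \+ common_noun(X,_), \+ preposition(X,_,_).

% The noun phrase program (np.pl).
np([Name],X) :- proper_noun(Name,X).
np([Art|Rest],X) :- article(Art), np2(Rest,X).

np2([Adj|Rest],X) :- adjective(Adj,X), np2(Rest,X).
np2([Noun|Rest],X) :- common_noun(Noun,X), mods(Rest,X).

mods([],_).
mods(Words,X) :- append(Start,End,Words), pp(Start,X), mods(End,X).

pp([Prep|Rest],X) :- preposition(Prep,X,Y), np(Rest,Y).

% np_extension(+Words, -Things): hypothetical helper that collects
% every object in the extension of the noun phrase Words.
np_extension(Words, Things) :- findall(X, np(Words,X), Things).

% ?- np_extension([a,woman,in,a,park], Things).
% Things = [mary, linda].
%
% ?- np_extension([a,big,man], Things).
% Things = [george].
```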


What we have accomplished in this lecture

We looked at some ideas in natural language processing. We saw how a lexicon and grammar together define a language. We considered a Prolog program that recognizes sentences in the language and generates their parse trees. We saw how a lexicon can be built relative to a model so that a Prolog program can answer questions about whether an object is in the extension of a noun phrase. The computations in these programs show how natural language processing can be incorporated into the logic programming/agent model.






