Prohibitions in Machine Ethics

The Nammour Symposium, Sacramento State University, April 2019

Thomas A. Blackson
School of Historical, Philosophical, and Religious Studies
Arizona State University


The Problem of Machine Ethics

We can think of machine ethics as the problem of how to give "ethics" to machines. This requires us to know what we are giving machines when we give them "ethics." If we do not know what this is, we will not know what counts as a solution to the problem.

One way to approach the issue of what we are giving machines when we give them "ethics" is to think about the terms in English we use to appraise behavior: 'right,' 'wrong,' and 'obligatory.'

These three terms are interdefinable in a certain way. That is to say, we can use any one of them to define the other two. So, for example, if we take 'right' as primitive, we can define 'wrong' as what is "not right" and define 'obligatory' as what is "not right not to do."

Put a little more formally, the two definitions are

• an act is wrong iff it is not right.
• an act is obligatory iff it is not right not to do it.
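
For concreteness, here is a toy rendering of these interdefinitions in Prolog, the language used for the examples later in this paper. The predicate names and sample facts are hypothetical, and Prolog's negation as failure ('\+') is only a rough stand-in for classical negation:

right(keep_promise).              % sample facts, purely hypothetical
right(tell_joke).                 % telling the joke is right ...
right(refrain(tell_joke)).        % ... and so is not telling it

wrong(Act)      :- \+ right(Act).             % an act is wrong iff it is not right
obligatory(Act) :- \+ right(refrain(Act)).    % obligatory iff not right not to do it

On this toy encoding, wrong(break_promise) and obligatory(keep_promise) both succeed, while obligatory(tell_joke) fails, since refraining from the joke is also right.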

Since 'prohibited' means "wrong," a machine that never acts in prohibited ways is a machine that never acts in ways that are wrong. This, I take it, is what we want in machines with "ethics."

We Know How to Solve the Problem

Given this way of setting out the problem, we solve the problem of machine ethics if we figure out how to build machines whose actions are never wrong.

In one way, this is an easy problem to solve. All we have to do is build machines so that they never take any actions. A machine that never acts is a machine whose actions are never wrong.

It seems plausible that all existing machines are like this, so all we have to do to build machines that never act is to keep building them in the ways we have been building them.

Machines That Can Act Need Ethics

It is conceivable that continuing to build machines as we always have built them is the only possible solution to the problem of machine ethics, but it seems that the interest in machine ethics is built on the assumption that we will soon build machines that act.

If this assumption is true, we need to think about what such machines will be like. How we are to do this is not completely clear, but maybe one way is to think about machines in terms of the

logic programming/agent model of a rational agent.

This model of a rational agent has problems, but perhaps it does begin to show how it is possible to build a machine that acts and hence that can act in ways that are right or wrong.

The Knowledge Base

The logic programming/agent model has several parts.

One is traditionally called the "knowledge base" (KB).

Rational agents exist in a certain kind of loop: they determine whether things are to their liking, they try to make them better if they are not to their liking, and they do this over and over again throughout their existence. To do this, they must have a view about how things are.

This is where the “knowledge base” comes in. The KB is the view about how things are.

The name "knowledge base" suggests that what is in the KB is knowledge. We can ask whether this is the correct way to think of the KB, but first lets ask about its initial state.

Some initial knowledge about the world may be built into machines that are rational agents, as perhaps is true for human beings, but in any realistic environment, no rational agent can be equipped from its inception with all the information it needs to act. It must acquire new information about the world by sensing itself and its surroundings.

It is hard to imagine how this process could work so that the KB consists in knowledge. Think about perception. If, say, something looks red to me, and I have no information to the contrary, it is rational for me to form the belief that the object is red. In AI terms, to

form the belief that the object is red

is equivalent to

put the proposition that the object is red in the KB.

This proposition, though, might be false. For example, it might be that the object I am looking at is white and has a red light shining on it. Even so, as long as I have no reason to think that the light is red, it is rational for me to include the proposition in my view of how things are.

So whether we should conceive of the KB as knowledge is a problem, but for now we can sidestep this issue and think of the KB as a list of propositions that constitutes the machine’s view of the world. The propositions in this list are represented in a language. In the logic programming/agent model, they are represented as formulas in a version of the first-order predicate calculus.

An Example in Prolog

An example (I borrow it from Representation and Inference for Natural Language, by Patrick Blackburn and Johan Bos) based on the movie Pulp Fiction helps to make the representation of propositions in the KB a little clearer. In the example, as in the movie, various people love each other. Further, there is a rule about what jealousy is. So the KB includes the following entries:

loves(vincent,mia).           % Read as "Vincent loves Mia."
loves(marcellus,mia).         % Read as "Marcellus loves Mia."
loves(pumpkin,honey_bunny).   % Read as "Pumpkin loves Honey Bunny."
loves(honey_bunny,pumpkin).   % Read as "Honey Bunny loves Pumpkin."

jealous(X,Y) :- loves(X,Z), loves(Y,Z).   % Read as "X is jealous of Y if X and Y both love some Z."

The last entry is the rule. (The others are facts.) It says that for all X, Y, and Z, X is jealous of Y if X loves Z and Y loves Z. Obviously, jealousy in the real world works differently.

Now we can think of this KB as the way a machine sees the world. Further, we can give a machine with this view of the world the ability to draw logical consequences from its "beliefs." So, for example, if it were asked whether

?- jealous(vincent,marcellus). % Read as "Is Vincent jealous of Marcellus?"

is true, it could work out the answer "yes" because the truth of this proposition follows logically from its "beliefs" about the world. Given premises in the KB, the proof is straightforward. We instantiate the universal quantifiers for 'vincent,' 'marcellus' and 'mia,' form the conjunction, and use material implication elimination to reach the conclusion.

Here is the proof set out as a natural deduction derivation:

      
1. ∀X ∀Y ∀Z [(loves(X,Z) ∧ loves(Y,Z)) → jealous(X,Y)]                          premise (the rule in the KB)
2. ∀Y ∀Z [(loves(vincent,Z) ∧ loves(Y,Z)) → jealous(vincent,Y)]                 ∀E, 1
3. ∀Z [(loves(vincent,Z) ∧ loves(marcellus,Z)) → jealous(vincent,marcellus)]    ∀E, 2
4. (loves(vincent,mia) ∧ loves(marcellus,mia)) → jealous(vincent,marcellus)     ∀E, 3
5. loves(vincent,mia)                                                           premise (KB)
6. loves(marcellus,mia)                                                         premise (KB)
7. loves(vincent,mia) ∧ loves(marcellus,mia)                                    ∧I, 5, 6
8. jealous(vincent,marcellus)                                                   →E, 4, 7

We can work out the answer "yes" by constructing this proof that

jealous(vincent,marcellus)

is a logical consequence of premises in the KB.

There are now machines that can work out this same answer. The machines do not do it by constructing exactly this proof, but we do not need to worry about that here. Instead, let's just give such a machine the KB and the question and watch it return the answer. Further, for a little fun, let's ask it to return answers to the following questions too:

?- jealous(vincent,X).
?- jealous(vincent,X),\+ X=vincent.
?- jealous(vincent,X),\+ X=vincent,\+X=marcellus.
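
If the KB above is loaded into a standard Prolog system (SWI-Prolog, for instance), the answers we should expect are roughly these:

?- jealous(vincent,X).
X = vincent ;
X = marcellus.

?- jealous(vincent,X), \+ X=vincent.
X = marcellus.

?- jealous(vincent,X), \+ X=vincent, \+ X=marcellus.
false.

The first query's answer X = vincent appears because the rule does not require X and Y to be distinct: Vincent and Vincent both love Mia, so on this KB Vincent counts as jealous of himself. The second and third queries use negation as failure ('\+') to filter out such answers.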

The Fox and the Crow

If the beliefs in the KB include beliefs about how to make things happen, then a machine with such a KB can be understood to be "thinking" about how to achieve a goal.

Consider the example of the fox and the crow. (I borrow this example from Computational Logic and Human Thinking, Robert Kowalski.) In the example, a crow sits in a tree holding a piece of cheese in its mouth. A fox stands below and has the following beliefs in its KB:

I have X if I am near X and I pick up X.
I am near the cheese if the crow drops the cheese.
The crow drops the cheese if the crow sings.
The crow sings if I praise the crow.
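
Rendered in Prolog, the fox's beliefs might look something like this. The predicate names are my own, not Kowalski's notation, and the dynamic declaration simply keeps the as-yet-unperformed actions from raising errors when queried:

:- dynamic praise/1, pick_up/1.      % actions the fox has not (yet) performed

have(X)            :- near(X), pick_up(X).
near(cheese)       :- crow_drops(cheese).
crow_drops(cheese) :- crow_sings.
crow_sings         :- praise(crow).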

Suppose that the question for the “fox machine” is whether

I have the cheese

is true. To work out the answer, the fox could see whether the proposition is a logical consequence of its KB. The “reasoning” would go something like this. The fox would instantiate the universal

I have X if I am near X and I pick up X

to

I have the cheese if I am near the cheese and I pick up the cheese

and would see (given its KB) that the truth of

I have the cheese

reduces to the truth of

I am near the cheese and I pick up the cheese

and that this in turn reduces to

The crow drops the cheese and I pick up the cheese

and to

The crow sings and I pick up the cheese

and finally to

I praise the crow and I pick up the cheese

Now, given its KB, to the question

I have the cheese

the fox machine answers "no" because this proposition is not a logical consequence of its KB.
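
Given the Prolog sketch of the fox's beliefs above, a standard Prolog system returns the same verdict:

?- have(cheese).
false.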

Notice, though, that in working out whether this is a logical consequence, the fox has worked out a plan. The plan is to praise the crow and to pick up the cheese when the crow drops it.

At this point, then, in light of this example, we can begin to see how a procedure for computing logical consequence can be the basis for building a machine that can act. Rational agents have goals and can form plans to achieve these goals. The propositions needed for logical consequence but missing from the KB can be thought of as the elements of a plan to achieve a goal.

As it stands, however, the "fox machine" does not act. It receives its goal from us. We put the question to the machine, and when the machine computes the answer in terms of logical consequence, we can think of the machine as working out a plan to achieve a goal.

For it to be plausible that the "fox machine" acts, the goal must arise in the fox's own mind.

Maintenance Goals and Achievement Goals

This brings us to another part of the logic programming/agent model. We have the KB. It represents the agent’s view of the world. We have the procedure for computing logical consequence. It represents a way in which the agent reasons on the basis of its view of the world. The next part we need is what is sometimes called a "maintenance goal."

When we talk about maintenance goals, it is helpful to contrast them with what are called "achievement goals." In the example of the fox and the crow,

I have the cheese

is an achievement goal. It is the goal the fox tries to work out a plan to achieve. The function of maintenance goals is to introduce achievement goals. Maintenance goals encode relationships with the world an agent is designed or has evolved to maintain through its various behaviors. If the agent realizes that the relationship fails, the maintenance goal issues in an achievement goal. The achievement goal triggers behavior to achieve the achievement goal. This behavior is an effort on the part of the agent to reinstate the relationship with the world.

In this way, the states that matter to the life of an agent are encoded in the antecedents of maintenance goals. Consider hunger in animals. When animals are hungry, they tend to move to find food and eat it. In terms of the logic programming/agent model, the conditional

If I am hungry, I have food and eat it

is instantiated in the animal so that it functions as a maintenance goal. When the animal registers the truth of the antecedent, the content of the consequent is activated as an achievement goal. This achievement goal, in turn, moves the animal to take steps to find food and eat it.

To understand more clearly what a maintenance goal is, it is helpful to think about desire in terms of the (ancient Platonic) model of depletion and replenishment. The object of the desire replenishes and thus maintains the agent. So, in the example of the fox and the crow, finding and eating food replenishes the fox. The desire arises because the fox is depleted in a certain way. The fox realizes it is depleted in this way by observing that it is hungry, and the maintenance goal in the fox's mind links the depletion (hunger) to what replenishes it (finding and eating food).

So now, in the fox machine, there is a maintenance goal and a KB:

If I am hungry, I have cheese and eat it.

I have X if I am near X and I pick up X.
I am near the cheese if the crow drops the cheese.
The crow drops the cheese if the crow sings.
The crow sings if I praise the crow.

Further, the fox machine is able to sense its environment. This ability includes the ability to sense states in itself. The fox machine, then, can observe that

I am hungry

is true. If the fox machine observes that "I am hungry" is true, this observation triggers its maintenance goal. When the maintenance goal is triggered, it introduces the achievement goal

I have cheese and eat it

This achievement goal triggers the reasoning process to find a plan to have cheese and eat it.
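
As a very rough sketch (my own simplification, not Kowalski's actual agent cycle), the way a maintenance goal introduces an achievement goal might be rendered in Prolog like this:

:- dynamic hungry/0.                             % an observable internal state

maintenance_goal(hungry, have_and_eat(cheese)).  % if I am hungry, I have cheese and eat it

% One pass of the observe-think cycle: if an observed condition triggers a
% maintenance goal, adopt the corresponding achievement goal.
adopt(AchievementGoal) :-
    maintenance_goal(Condition, AchievementGoal),
    call(Condition).

Once the observation is recorded with assertz(hungry), the query ?- adopt(G). returns G = have_and_eat(cheese), and that achievement goal then drives the planning process described above.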

Adding Prohibitions to the Model

The logic programming/agent model, as I have set it out thus far, has shortcomings as a model of a rational agent, but I think it does give us some indication of how we might build a machine that acts. So now it is time to turn to the problem of how to give such a machine "ethics."

At the outset, I suggested that a way to do this is to build the machine with prohibitions.

Suppose there is a machine designed to promote public safety, and suppose that this machine finds itself in a situation where there is a runaway trolley. Further, instead of the usual example familiar from the consideration of utilitarianism, let the situation be a variant in which the machine is standing on a bridge over the track, a human is standing next to the machine, and the only way to stop the train from killing the five people on the track is for the machine to throw the human onto the track. Suppose that the "public safety machine" has the following "beliefs" in its KB:

Five people are on the track.
A train is speeding along the track.
The five people cannot escape from the track.
A human is standing next to me.
This human is an innocent bystander.

Suppose that the “public safety machine” has the following maintenance goal:

if people are in danger of being killed by a train,
then I respond to the danger of the people being killed by the train.

Recall that the "fox machine” has the ability to observe the state that triggers its maintenance goal: namely, that it was hungry. The "public safety machine" cannot simply observe that people are in danger of being killed by a train. It will have to form this belief on the basis of what he can observe. Just how this would work is not obvious, but we can put this issue aside for now. We can suppose that the machine does form this belief and that forming this belief triggers its maintenance goal. So, at this point in its reasoning, it has the following achievement goal:

I respond to the danger of the people being killed by the train.

To achieve this goal, the machine must have beliefs in its KB about what it can do to respond to the danger. So assume it has the following beliefs in its KB:

I respond to the danger of the people being killed by the train
if I ignore the danger.

I respond to the danger of people being killed by the train
if I save the people from being killed by the train.

I save the people from being killed by the train
if I throw the human onto the track.
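
Put into Prolog, these beliefs might look roughly as follows (again, the predicate names are my own, introduced only for illustration):

:- dynamic ignore_danger/0, throw_human_onto_track/0.   % candidate actions, not yet performed

respond_to_danger :- ignore_danger.
respond_to_danger :- save_people.
save_people       :- throw_human_onto_track.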

Given these beliefs, the public safety machine can form two plans:

I ignore the danger

or

I throw the human onto the track

To decide between these plans, the machine must have the ability to reason prospectively from these plans to their consequences. This is where the prohibition comes in. It takes the form

If I kill a human and the human is an innocent bystander, then false.

This prohibition gets triggered (against the background of its beliefs) if the machine reasons from the second plan to the consequence

I kill a human and the human is an innocent bystander

This consequence triggers the prohibition, and the prohibition rules out the plan. In this way, it seems possible to say that the public safety machine has "ethics."
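
To make the idea a little more concrete, here is a toy Prolog sketch of checking candidate plans against such a prohibition. The encoding of foreseen consequences is my own, and it is only a sketch of the general technique, not a worked-out agent architecture:

% Foreseen consequences of each candidate plan (hypothetical encoding).
consequence(throw_human_onto_track, kills(human)).
consequence(throw_human_onto_track, innocent_bystander(human)).
consequence(ignore_danger, nothing_further).

% The prohibition as an integrity constraint: a plan is ruled out if it
% leads to the killing of an innocent bystander.
prohibited(Plan) :-
    consequence(Plan, kills(X)),
    consequence(Plan, innocent_bystander(X)).

permissible(Plan) :- \+ prohibited(Plan).

% ?- permissible(throw_human_onto_track).   returns false
% ?- permissible(ignore_danger).            returns true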