Prohibitions in Machine Ethics

The Nammour Symposium, Sacramento State University, April 2019

Thomas A. Blackson
School of Historical, Philosophical, and Religious Studies
Arizona State University


The Problem of Machine Ethics

If we think of machine ethics as a design problem, the problem is how to give "ethics" to machines. This way of thinking about machine ethics requires us to know what the "ethics" is that we want to give to machines. Otherwise, we will not know what counts as a solution to the problem.

One way to think about what this "ethics" is is to think about the terms in English we use to appraise behavior. The three primary terms are 'right,' 'wrong,' and 'obligatory.'

These three terms are interdefinable in a certain way. That is to say, we can use any one of them to define the other two. So, for example, if we take 'right' as primitive, we can define 'wrong' as what is not right and define 'obligatory' as what is not right not to do.

Put a little more formally, the two definitions are

• an act is wrong iff it is not right.
• an act is obligatory iff it is not right not to do it.
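Put in code, the two definitions look like this (a minimal sketch; the sample acts and the stipulated set of right acts are my own inventions, used only to make the definitions concrete):

```python
# Illustrative encoding of the two definitions, taking 'right' as
# primitive. The acts and the set RIGHT_ACTS are stipulated for the
# example; only the definitions themselves come from the text.

RIGHT_ACTS = {"keep the promise", "not break the promise"}  # stipulated

def neg(act):
    """The act of not doing the given act."""
    return act[4:] if act.startswith("not ") else "not " + act

def right(act):
    return act in RIGHT_ACTS

def wrong(act):
    # An act is wrong iff it is not right.
    return not right(act)

def obligatory(act):
    # An act is obligatory iff it is not right not to do it.
    return not right(neg(act))

print(wrong("break the promise"))      # -> True
print(obligatory("keep the promise"))  # -> True
```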

Since 'prohibited' just means "wrong," a machine that never acts in prohibited ways is a machine whose actions are always right. This, I take it, is what we want.

We Know how to Solve the Problem

Given this way of setting out the problem, a solution is apparent. We solve the problem of machine ethics if we figure out how to build machines whose actions are always right.

In one way, this is an easy problem to solve. All we have to do is build machines so that they never take any actions. A machine that never acts is a machine whose acts are never wrong.

It seems plausible that all existing machines are like this, so all we have to do to build machines that never act is to keep building them in the ways we have been building them.

Machines that can Act need Ethics

It is conceivable that continuing to build machines as we always have built them is the only solution to the problem of machine ethics, but the interest in machine ethics is built on the assumption that we will soon build machines that can act.

If this assumption is true, we need to think about how to understand such machines. Just how to do this is not completely clear, but one way to begin is to think about machines in terms of the logic programming/agent model of a rational agent. This model has lots of problems, but perhaps it does begin to show how it is possible to build a machine that can act.

The Knowledge Base

The logic programming/agent model has several parts. One is traditionally called the "knowledge base" (KB). Rational agents exist in a loop: they determine whether things are to their liking, they try to make things better if they are not, and they repeat this cycle throughout their existence. To do this, the agent must have a view about how things are. This is where the "knowledge base" comes in. The view about how things are is in the KB.

The name "knowledge base" suggests that what is in the KB is knowledge. We can ask whether this is true, but first let's ask about the KB's initial state. Some knowledge about the world may be built into rational agents, as perhaps is true for human beings, but in any realistic environment, no agent or machine can be equipped from its inception with all the information it needs to act. It must acquire new information about the world by sensing itself and its surroundings.

It is hard to imagine how this process could work so that the KB consists in knowledge. Think about perception. If, say, something looks red to me, and I have no information to the contrary, it is rational for me to form the belief that the object is red. In AI terms, to say "form the belief that the object is red" is to say "put the proposition that the object is red in my KB." This proposition, though, might be false. It might be, for example, that the object I am looking at is white and has a red light shining on it. Even so, as long as I have no reason to think that a red light is shining on it, it is rational for me to include the proposition in my view of how things are.

So whether we should conceive of the KB as knowledge is a problem, but for now we can sidestep this issue and think of the KB as a list of propositions that constitutes the machine’s view of the world. The propositions on this list are represented in a language. In the logic programming/agent model, they are represented as formulas in a version of the first-order predicate calculus.

An Example in Prolog

An example based on the movie Pulp Fiction (which I borrow from Representation and Inference for Natural Language, by Patrick Blackburn and Johan Bos) helps make the representation of propositions in the KB a little clearer. In the example, as in the movie, various people love each other. Further, there is a rule about what jealousy is. So the KB takes the form

loves(vincent,mia).           % Read as "Vincent loves Mia."
loves(marcellus,mia).         % Read as "Marcellus loves Mia."
loves(pumpkin,honey_bunny).   % Read as "Pumpkin loves Honey Bunny."
loves(honey_bunny,pumpkin).   % Read as "Honey Bunny loves Pumpkin."

jealous(X,Y) :- loves(X,Z), loves(Y,Z).

The rule says that for all X, Y, and Z, X is jealous of Y if X loves Z and Y loves Z. (The symbol :- is for "if." The comma is for "and.") Obviously jealousy in the real world is different.

Now we can think of this KB as the way an artificial agent sees the world. Further, we can give an artificial agent with this view of the world the ability to draw logical consequences from its "beliefs." So, for example, if it were asked whether

?- jealous(vincent,marcellus). % Read "Is Vincent jealous of Marcellus?"

is true, it could work out that the answer is "yes." The truth of this proposition follows logically from its "beliefs" about the world. The proof of this conclusion based on premises in the KB is straightforward. We instantiate the universal quantifiers with 'vincent,' 'marcellus,' and 'mia,' form the conjunction, and use material implication elimination to reach the conclusion.

Here is the proof in Gentzen-style natural deduction:

      
∀X∀Y∀Z[(loves(X,Z) ∧ loves(Y,Z)) → jealous(X,Y)]
------------------------------------------------∀E
∀Y∀Z[(loves(vin,Z) ∧ loves(Y,Z)) → jealous(vin,Y)]
--------------------------------------------------∀E
∀Z[(loves(vin,Z) ∧ loves(mar,Z)) → jealous(vin,mar)]      loves(vin,mia)   loves(mar,mia)
----------------------------------------------------∀E    -------------------------------∧I
(loves(vin,mia) ∧ loves(mar,mia)) → jealous(vin,mar)      loves(vin,mia) ∧ loves(mar,mia)
-----------------------------------------------------------------------------------------→E
                                    jealous(vin,mar)
      

Further, a machine can work out the answer too. Let's try it. And for a little logic fun, let's ask the machine the following questions too:

?- jealous(vincent,X).
?- jealous(vincent,X),\+ X=vincent.
?- jealous(vincent,X),\+ X=vincent,\+X=marcellus.
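For readers without a Prolog system at hand, the answers can be simulated in a few lines of Python (a brute-force sketch of my own, not Prolog's unification-and-backtracking procedure):

```python
# A Python simulation of the Prolog queries above. The KB is the same
# four 'loves' facts; the rule jealous(X,Y) :- loves(X,Z), loves(Y,Z)
# is computed by brute force over those facts.

loves = [
    ("vincent", "mia"),
    ("marcellus", "mia"),
    ("pumpkin", "honey_bunny"),
    ("honey_bunny", "pumpkin"),
]

def jealous(x, y):
    # X is jealous of Y if some Z is loved by both X and Y.
    return any((x, z) in loves and (y, z) in loves for _, z in loves)

people = {p for pair in loves for p in pair}

# ?- jealous(vincent,X).
print(sorted(y for y in people if jealous("vincent", y)))
# -> ['marcellus', 'vincent']
```

Note that Vincent comes out jealous of himself, since he and he both love Mia. This is exactly why the second query adds \+ X=vincent; once the third query also excludes marcellus, no answer remains and the query fails.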

The Fox and the Crow

If the beliefs in the KB include beliefs about how to make things happen, then a machine with such a KB can be understood to answer a question about whether it can achieve a goal.

Consider the example of the fox and the crow (which I borrow from Computational Logic and Human Thinking, Robert Kowalski). In this example, the crow sits in a tree holding cheese in its mouth. The fox stands below and has the following beliefs in its KB:

I have X if I am near X and I pick up X.
I am near the cheese if the crow drops the cheese.
The crow drops the cheese if the crow sings.
The crow sings if I praise the crow.

Suppose that the question for the “fox machine” is whether

I have the cheese

is true. To work out the answer, the fox could see whether the proposition follows from its KB. The “reasoning” would go something like this. The fox reasons from

I have X if I am near X and I pick up X

to

I have the cheese if I am near the cheese and I pick up the cheese

Now the fox understands that the goal of having the cheese reduces to the goals of being near the cheese and of picking up the cheese. So the fox proceeds to ask itself whether

I am near the cheese, and I pick up the cheese

are true. Given its beliefs, the fox realizes that the goal of being near the cheese reduces to the goal of the crow dropping the cheese, that the goal of the crow dropping the cheese reduces to the goal of the crow singing, and that the goal of the crow singing reduces to the goal of the fox praising the crow. In this way, although the answer to the question of whether

I have the cheese

is true is "no," in trying to answer this question the fox works out a plan for making it true. The plan is to praise the crow and pick up the cheese when the crow drops it.
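The goal reduction just described can be sketched as a tiny backward-chaining planner (the representation, with goals as strings and rules as a table, is my own simplification of Kowalski's example, not his implementation):

```python
# A minimal backward-chaining sketch of the fox's goal reduction.
# Each rule maps a goal to the subgoals it reduces to; goals with no
# rule are treated as primitive actions the fox can simply perform.

RULES = {
    "I have the cheese": ["I am near the cheese", "I pick up the cheese"],
    "I am near the cheese": ["the crow drops the cheese"],
    "the crow drops the cheese": ["the crow sings"],
    "the crow sings": ["I praise the crow"],
}

ACTIONS = {"I praise the crow", "I pick up the cheese"}

def plan(goal):
    """Reduce a goal to the sequence of primitive actions that achieves it."""
    if goal in ACTIONS:
        return [goal]
    steps = []
    for subgoal in RULES[goal]:
        steps += plan(subgoal)
    return steps

print(plan("I have the cheese"))
# -> ['I praise the crow', 'I pick up the cheese']
```

Trying to prove "I have the cheese" fails as a query, but the trace of the failed proof is precisely the fox's plan: praise the crow, then pick up the cheese.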

At this point, we can begin to see how computing logical consequence can be the basis for building a machine that can act. One problem, though, is that the "fox machine" receives its question from the outside. We put the question to the machine, and when the machine computes the answer in terms of logical consequence, we can think of it as working out a plan to achieve a goal. What we need for the "fox machine" to be a machine that acts is for the goal to arise internally.

Maintenance Goals and Achievement Goals

This brings us to another part of the logic programming/agent model. We have the KB. It represents the agent’s view of the world. We have the procedure for computing logical consequence. It represents a way in which the agent reasons on the basis of its view of the world. The next part we need is what is sometimes called a "maintenance goal."

When we talk about maintenance goals, it is helpful to contrast them with what are called "achievement goals." In the example of the fox and the crow,

I have the cheese

is an achievement goal. It is the goal the fox tries to work out a plan to achieve. The function of maintenance goals is to introduce achievement goals. Maintenance goals encode relationships with the world an agent is designed or has evolved to maintain through its various behaviors. If the agent realizes the relationship fails, the maintenance goal issues in an achievement goal. The achievement goal triggers behavior to achieve the achievement goal. This behavior is an effort on the part of the agent to reinstate the relationship with the world.

The states in the world that matter to the life of the agent are encoded in the antecedents of maintenance goals. Consider hunger in animals. When animals are hungry, they tend to move to find food and eat it. In terms of the logic programming/agent model, the conditional

If I am hungry, I find food and eat it

is instantiated in the animal so that it functions as a maintenance goal. When the animal registers the truth of the antecedent, the content of the consequent is activated as an achievement goal. The achievement goal, in turn, moves the animal to take steps to find food and eat it.

To understand more clearly what a maintenance goal is, it is helpful to think about desire in terms of the (ancient Platonic) model of depletion and replenishment. The object of the desire replenishes and thus maintains the agent. So, in the example of the fox and the crow, finding and eating food replenishes the fox. The desire arises because the agent is depleted in a certain way. The fox realizes it is depleted in this way by realizing that it is hungry, and the maintenance goal links the depletion (hunger) to the condition (eating) that replenishes it.

So now, in the fox machine, there is a maintenance goal and a KB:

If I am hungry, I find cheese and eat it.

I have X if I am near X and I pick up X.
I am near the cheese if the crow drops the cheese.
The crow drops the cheese if the crow sings.
The crow sings if I praise the crow.

Further, the fox machine is able to sense its environment. This ability includes the ability to sense states in itself. The fox machine, then, can observe whether

I am hungry

is true. If the fox machine observes that "I am hungry" is true, this observation triggers its maintenance goal. When the maintenance goal is triggered, it introduces the achievement goal

I find cheese and eat it

This achievement goal triggers the reasoning process to find a plan to find cheese and eat it.
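This observe-and-trigger step can be sketched as follows (a minimal illustration of my own; representing a maintenance goal as an antecedent/consequent pair is an assumption of the sketch):

```python
# One pass of the observe-think cycle: a maintenance goal is a pair
# (antecedent, consequent); when an observation makes the antecedent
# true, the consequent is introduced as an achievement goal.

MAINTENANCE_GOALS = [("I am hungry", "I find cheese and eat it")]

def triggered_goals(observations):
    """Return the achievement goals introduced by the triggered maintenance goals."""
    return [consequent
            for antecedent, consequent in MAINTENANCE_GOALS
            if antecedent in observations]

print(triggered_goals({"I am hungry"}))  # -> ['I find cheese and eat it']
print(triggered_goals(set()))            # -> []
```

The achievement goal returned here is then handed to the planner: the goal arises internally, from observation, rather than being put to the machine from outside.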

Adding Prohibitions to the Model

The logic programming/agent model, as I have set it out thus far, has shortcomings as a model of a rational agent, but I think it does give us some indication of how we might build a machine that acts. So now it is time to turn to the problem of how to give such a machine ethics.

At the outset, I suggested that a way to do this is to give the machine prohibitions.

Suppose there is a machine designed to promote public safety, and suppose that this machine finds itself in a situation where there is a runaway train. Further, instead of the usual trolley example familiar from the consideration of utilitarianism, let the situation be a variant in which the machine is standing on a bridge over the track, a human is standing next to the machine, and the only way to stop the train from killing the five people on the track is for the machine to throw the human onto the track. Suppose that the "public safety machine" has the following "beliefs" in its KB:

Five people are on the track.
A train is speeding along the track.
The five people cannot escape from the track.
A human is standing next to me.
This human is an innocent bystander.

Suppose that the “public safety machine” has the following maintenance goal:

if people are in danger of being killed by a train,
then I respond to the danger of the people being killed by the train.

Recall that the "fox machine" has the ability to observe the state that triggers its maintenance goal: namely, that it is hungry. The "public safety machine" cannot simply observe that people are in danger of being killed by a train. It will have to form this belief on the basis of what it can observe. Just how this would work is not obvious, but we can put this issue aside for now. We can suppose that the machine does form this belief and that forming this belief triggers its maintenance goal. So, at this point in its reasoning, it has the following achievement goal:

I respond to the danger of the people being killed by the train.

To achieve this goal, the machine must have beliefs in its KB about what it can do to respond to the danger. So assume it has the following beliefs in its KB:

I respond to the danger of the people being killed by the train
if I ignore the danger.

I respond to the danger of people being killed by the train
if I save the people from being killed by the train.

I save the people from being killed by the train
if I throw the human onto the track.

Given these beliefs, the public safety machine can form two plans:

I ignore the danger

or

I throw the human onto the track

To decide between these plans, the machine must have the ability to reason prospectively from these plans to their consequences. Here is where the prohibition comes in. It takes the form

If I kill a human and the human is an innocent bystander, then false.

This prohibition gets triggered (against the background of its beliefs) if the machine reasons from the second plan to the consequence

I kill a human and the human is an innocent bystander

This consequence triggers the prohibition, and the prohibition rules out the plan.
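The way the prohibition filters plans can be sketched as follows (again a simplification of my own: the consequences of each plan are precomputed in a table rather than derived by prospective reasoning):

```python
# A sketch of the prohibition as an integrity constraint: a plan is
# ruled out if prospective reasoning derives a consequence that the
# prohibition matches.

CONSEQUENCES = {
    "I ignore the danger": set(),
    "I throw the human onto the track":
        {"I kill a human and the human is an innocent bystander"},
}

PROHIBITED = {"I kill a human and the human is an innocent bystander"}

def permissible(plan):
    """A plan is permissible iff none of its consequences is prohibited."""
    return not (CONSEQUENCES[plan] & PROHIBITED)

print([p for p in CONSEQUENCES if permissible(p)])
# -> ['I ignore the danger']
```

Deriving "false" from a plan's consequences, as in the prohibition above, corresponds here to a plan failing the permissible test; the machine then acts only on the plans that survive the filter.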