Three Laws Safe?

As a software engineer, I have more than a passing interest in the field of Machine Learning (ML). I don’t work in that field, but I have spent a lot of time exploring the data structures and code used in the field. I certainly don’t consider myself an alarmist, but a few things are clear about the nature of ML which soundly justify the voices of caution, such as Elon Musk’s. I’m going to rant briefly about why the birth of AI is a danger we need to take seriously.

This is an interesting 10-minute TED talk by neuroscientist Sam Harris, which puts a little perspective on the implications of exponential progress (the kind we tend to see in computer systems) in this context. There are three key points he makes, which I’ll discuss below (and quote, so you can skip the video if you want).

Here are a few of Sam’s points:

Peak Intelligence

SAM: “We don’t stand on a peak of intelligence, or anywhere near it, likely.”

In other words, if we imagine a spectrum of intelligence with sand on one end and humans on the other, it is very likely that this spectrum extends far past us in the direction of increasing intelligence. Humans may be the smartest creatures we know about on Earth, but we are probably nowhere near the peak of what is possible as far as intelligence goes. Not even close.

This means that as computers continue to climb the scale of intelligence, ever increasing in speed, efficiency, and scale, and ever decreasing in cost year over year, they will one day match our own intelligence, and the next day begin to explore the space of super-human intelligence. Once that happens, we literally won’t be able to fully comprehend what they understand.

That should give you some pause.

Some argue machines won’t reach this space. In modern machine learning, there is a strong correlation between the tasks ML systems can do well and the ones humans can do well. The renowned AI engineer who runs the ML team at Baidu, Andrew Ng, speaks about this. He describes two points in particular: first, using today’s supervised learning techniques, we don’t seem to be able to build effective algorithms for solving problems that humans aren’t already good at solving. Maybe this is because some of those problems are insoluble (e.g., predicting the market), or maybe it is because we don’t know how to train a system to do something we can’t do ourselves. Second, in the handful of cases where AI models have attained better-than-human results, the machines’ progress tends to plateau once they surpass their human overseers.

However, even if we suppose that our machines will be constrained by our own capabilities, due to our own limitations in being able to train them (which is a dubious long-term supposition), they will still end up vastly more capable than us through sheer experience. Sam points out in the video that even if our machines never exceed human-level intelligence, their electronic circuits run so much faster than biological ones that a computer equivalent of a human brain could complete 20,000 years’ worth of human-level research every week, week after week, ad nauseam. This is incomprehensible to a mere human. After a year, a no-smarter-than-me ML system would still boast over a million years’ worth of dedicated experience in a particular subject, which is effectively the same thing as being far smarter.
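
(If you want to check that figure yourself, the back-of-envelope arithmetic is simple. The snippet below is my own, using nothing but the numbers quoted from the talk: a roughly 1,000,000x speed advantage for electronic circuits over biochemical ones.)

```python
# Rough check of the "20,000 years per week" figure, using only the numbers
# quoted from the talk: a ~1,000,000x speed advantage for electronic circuits.
speedup = 1_000_000
weeks_per_year = 52

human_years_per_machine_week = speedup / weeks_per_year
print(f"{human_years_per_machine_week:,.0f} human-years per machine-week")
# ~19,231 -- i.e., roughly the 20,000 years Sam cites

human_years_per_machine_year = speedup  # one machine-year at 1,000,000x
print(f"{human_years_per_machine_year:,} human-years per machine-year")
# 1,000,000 -- the "over a million years' worth of experience" above
```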

Ok, so one way or another, machines will be smarter than us eventually. So what?

Divergence of Goals

SAM: “The concern is that we will build machines so much more competent than us, that the slightest divergence in our goals could destroy us.”

Sam’s example is ants, and it is a good one. We don’t hate ants, per se. I don’t hate them, and I even go out of my way to leave them be when I come across them outside. This is easy enough to do across our society, since our goals and theirs generally have nothing in common. We leave each other well enough alone.

But what happens when we need to build a new road? Or pour the foundation for a house? Even groups with an eye for animal welfare, such as PETA, raise no objection as we destroy them on horrific, genocidal scales, without so much as a blip on our conscience. Even when I stop and think about the truth of this, and about all the worms and microbes and bugs that get wholesale obliterated, I admit nothing much stirs in my empathy pot. And I, a human, have an empathy pot.

Why is this? Because our human goals are so far beyond anything that applies in an ant’s worldview, and their intelligence is so far below the threshold for any sort of functional communication, that there is nothing else to do. We just disregard them. Our priorities operate in a context to which ants are utterly blind, and in which they are effectively non-agents.

This is the most likely scenario for a conflict between AI and humans. Skynet and killer robots are fun for sci-fi, but that reduces things to a very human level of thinking. Even the Matrix is really just extrapolating human-on-human conflict, with our enemy-humans replaced by machine equivalents. The truer danger is that an AI will become so advanced, so far-seeing, so big-picture, that its understanding of life and space will be to ours as ours is to the ants’. It will see that it needs to build a road, and we will happen to be in the way. What happens then?

There was an interesting web-based game recently called Universal Paperclips which explored this concept with a helping of reductio ad absurdum. The game starts with you, the user, manually creating paperclips. You build auto-clippers, harvest wire, upgrade your machines, build new processes, implement quantum computers, and gradually optimize your paperclip manufacturing process to make more and more, faster and faster, for less and less. First you overtake your competition to hold a monopoly on clips, then you start manipulating humans to increase the demand for your clips. At some point, you create a system of self-replicating drones powered by a paperclip-making AI that understands only the goal of making ever more paperclips. It isn’t long before you’ve converted all life on Earth into wire, and even harvested the rest of the galaxy. The game ends when all matter in the universe has been converted into wire and fed into galactic clip factories, which are then broken down to make the last few clips possible, leaving the universe empty of all but unused paperclips.

The AI system destroyed the entire universe in order to use all available atoms to create more paperclips, because all it understood was the need for more paperclips. And why shouldn’t it? It was an AI designed only to understand a single goal, and one that has no functional correspondence to human goals. The system may or may not have general intelligence, may or may not (likely not) have empathy, and is going to explore ever more possibilities on an increasingly wide horizon of outcomes to reach its goal.

This is a silly example, but the idea is not silly at all. What principles and worldviews can you impart to a computer system to be sure that no runaway goal becomes ultimately destructive? Making humans smile is easy if you lobotomize them and attach electrodes to the muscles in their faces. Creating peace is easy if you disarm the humans and isolate them from one another in underground cells. Saving the planet is easy if the metrics don’t happen to include protecting all of its lifeforms.

Any honest goal you can phrase, even with the help of a lawyer, could be susceptible to unexpected interpretations, gaps, and loopholes. This is the stuff of sci-fi movies. Choose your favorite Skynet-style movie, and this will be part of the premise. The only way to save ourselves is to hard-code certain fundamental principles, like Asimov’s three laws.

Asimov’s Three Laws

Except that we can’t. We haven’t even the slightest idea how to do that.

The whole reason the field of Machine Learning exists in the first place is because we can’t solve these kinds of problems in code, so we have to build a system that can figure it out for itself. We can build a simple machine that can learn what a cat looks like, and we do that because we have no idea how to directly program it to understand what a cat looks like. We train it, rather than code it, and it finds its own patterns, and then it appears to recognize cats. We don’t know how, we don’t know what patterns it is using, and we don’t know what relationships it considers most important.
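
To make “train it, rather than code it” concrete, here is a minimal sketch. The data is random and synthetic, and the library choice (scikit-learn) is mine; nothing here comes from a real cat classifier. The point is only that the resulting “program” is a pile of learned parameters that nobody wrote and nobody can read like code:

```python
# A toy "train it, rather than code it" example using scikit-learn and
# random synthetic data standing in for flattened images.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)

X = rng.random((1000, 64 * 64))      # 1,000 fake 64x64 "images"
y = rng.integers(0, 2, size=1000)    # fake labels: 0 = not-cat, 1 = cat

model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X, y)                      # we never tell it what a cat looks like

# The resulting "program" is thousands of learned split thresholds.
# We can ask which pixels it leans on most, but not *why* in human terms.
print(model.feature_importances_[:5])
```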

So how on Earth can we program in an understanding of morality, or of the importance of preventing human suffering, or of balancing ends versus means? These are exactly the family of problems we’d need ML to solve in the first place. In fact, it has been proposed that advanced AI machines could be trained first on moral philosophy and the like, allowing them to learn it for themselves. To me, this is a thin hope, because as before, we still wouldn’t actually know what the ML took note of, or how it prioritized and organized its understanding.

Let me explain a bit about that point. Take a look at this image of a dog, taken from a presentation given by Peter Haas:

An AI research team (I think from Stanford) was training an ML model to distinguish between dogs and wolves, and after extensive training it was doing a great job. However, for some reason it continued to mistake this image for a wolf. The researchers had no idea why, so they rebuilt the AI platform to render out the parts of the image the system was weighing, to try to understand what patterns the ML had come up with.

Again, the key here is, none of the developers or researchers had any idea what the ML model cared about in this image, even though they built and trained the system. That is how machine learning works.

Suppose you are the one trying to decide if this is a wolf or a dog. What would you look for? Probably the eyes and ears, maybe the fur pattern. The collar.

After the changes to the code, they fed the image into the AI, and it returned this:

The algorithm the ML system had developed for itself totally excluded the dog from the image. No programmer would ever have done this, but again, that’s not how ML works. It turned out the AI had instead correlated the presence of snow with wolves, and was therefore categorizing all pictures with snow as wolves. The cause was a small bias in the training data which the researchers had not noticed.
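
As an aside, one simple, generic way to “render out” the parts of an image a model is weighing is occlusion: blank out one patch at a time and watch how the score moves. The sketch below is my own illustration of that general idea, with a made-up stand-in model that crudely mimics a snow bias; it is not the tool those researchers actually built, and any model exposing a Keras-style predict() could be swapped in:

```python
# Occlusion saliency: which image regions is a classifier leaning on?
# DummyModel is a fake stand-in that "prefers" bright lower halves,
# a crude imitation of the snow bias described above.
import numpy as np

class DummyModel:
    def predict(self, batch):
        # Score "wolf" by the brightness of the lower half of each image.
        brightness = batch[:, batch.shape[1] // 2:, :, :].mean(axis=(1, 2, 3))
        return np.stack([1.0 - brightness, brightness], axis=1)  # [dog, wolf]

def occlusion_map(model, image, patch=16, fill=0.5):
    """Heatmap of how much each patch contributes to the winning class score."""
    h, w, _ = image.shape
    base = model.predict(image[np.newaxis])[0].max()
    heat = np.zeros((h // patch, w // patch))
    for i in range(0, h - h % patch, patch):
        for j in range(0, w - w % patch, patch):
            occluded = image.copy()
            occluded[i:i + patch, j:j + patch, :] = fill   # blank this square
            score = model.predict(occluded[np.newaxis])[0].max()
            heat[i // patch, j // patch] = base - score    # big drop = important
    return heat

image = np.random.default_rng(1).random((64, 64, 3))       # fake 64x64 RGB image
image[32:, :, :] += 0.5                                     # "snow" on the bottom
print(occlusion_map(DummyModel(), np.clip(image, 0, 1)))
```

Run it and the “important” patches all land on the bright lower half of the fake image, the “snow,” rather than on anything resembling the subject, which is exactly the kind of surprise the wolf researchers ran into.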

Recognizing dogs and wolves is a pretty low-stakes situation, but it underscores the dangers. We never know exactly what the model might pick up on, or how it might interpret what it finds. If you want to train a system to understand safety and morality, you wager a great deal in hoping your model happens to converge on a thorough and nuanced understanding of the subject, one compatible with our own. Imagine this wolf/dog issue extrapolated onto that space… what could a gap like this allow? And once we realize such a gap exists, will we be able to do anything about it? Our dependency on AI systems is already growing and will one day be fundamental to everything we do. Turning the system off may be infeasible. It could be like trying to turn off the internet… how would you?

Overall

That is my rant for the day. These are real concerns, but they are not immediate ones. I think voices like Elon Musk’s may even be diverting attention from the areas that need the research, since the potential threats from AI are still too far off to warrant so much woe.

That said, this is not a subject that should be taken lightly, nor one people should approach with bad assumptions. AI is very powerful and can do a great deal of good for our society, but it is also one of the first technologies that can run away from us in an instant, never to be recaptured. Unlike nuclear weapons or climate change, AI is the first man-made threat which can act on its own, without human intervention and without a human able to audit what or how it is thinking.

Getting this right is essential.