When I first studied artificial intelligence in the 1980s, my lecturers assumed that the most important property of intelligence was the ability to reason, and that to program a computer to perform intelligently you would have to enable it to apply logic to large bodies of facts. Logic is used to make inferences. If you have a general rule, such as ‘All men are mortal,’ and a specific fact, ‘Socrates is a man,’ you, or your computer, can deduce that Socrates is mortal. But it turns out that many of the problems we want intelligent computers to help us with can’t straightforwardly be solved with logic. Some of them – the ability to recognise faces, for example – don’t involve this kind of reasoning. In other cases – the diagnosis of disease would be an example from my own field – the difficulty lies in how to describe the concepts that the rules and facts express. The problem is often seen as a matter of how to standardise terminology. If you want a doctor’s computer to use rules to infer what is wrong with a patient, these rules must be expressed using the same words as the ones used in the patient’s records to describe their symptoms. Huge efforts are made to constrain the vocabulary used in clinicians’ computer systems, but the problem goes deeper than that. It isn’t that we can’t agree on the words: it’s that there aren’t always well-defined concepts to which the words can be attached. In The Promise of Artificial Intelligence, Brian Cantwell Smith tried to explain this by comparing a map of the islands in Georgian Bay in Ontario with an aerial photograph showing the islands along with the underwater topography. On the map, the islands are clearly delineated; in the photograph it’s much harder to say where each island ends and the sea begins, or even exactly how many islands there are. There is a difference between the world as we perceive it, divided into separate objects, and the messier reality. 
We can use logic to reason about the world as described on the map, but the challenge for AI is how to build the map from the information in the photograph.
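The kind of deduction described above can be sketched as a toy rule-based inferencer. This is my own minimal illustration of forward chaining, not the design of any particular AI system; the triple representation and rule format are invented for the example:

```python
# A toy forward-chaining inferencer. 'All men are mortal' becomes a rule;
# 'Socrates is a man' is a fact; the conclusion is derived mechanically.
facts = {("Socrates", "is_a", "man")}
rules = [
    # If <subject> is_a man, then <subject> is mortal.
    (("is_a", "man"), ("is", "mortal")),
]

def infer(facts, rules):
    """Apply every rule to every matching fact until nothing new is derived."""
    derived = set(facts)
    changed = True
    while changed:
        changed = False
        for (rel, obj), (new_rel, new_obj) in rules:
            for subject, r, o in list(derived):
                if r == rel and o == obj:
                    conclusion = (subject, new_rel, new_obj)
                    if conclusion not in derived:
                        derived.add(conclusion)
                        changed = True
    return derived

print(infer(facts, rules))
```

The derived set ends up containing `("Socrates", "is", "mortal")`. What the toy model cannot do, of course, is decide what counts as a "man" or an "island" in the first place: the facts have to be handed to it already carved into clean symbols.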
Given the extent of the paradigm shift in AI research since 1980, you might think the debate about how to achieve AI had been comprehensively settled in favour of machine learning. But although its algorithms can master specific tasks, they haven’t yet shown anything that approaches the flexibility of human intelligence. It’s worth asking whether there are limits to what machine learning will be capable of, and whether there is something about the way humans think that is essential to real intelligence and not amenable to the kind of computation performed by artificial neural networks. The cognitive scientist Gary Marcus is among the most prominent critics of machine learning as an approach to AI. Rebooting AI, written with Ernest Davis, is a rallying cry to those who still believe in the old religion.
The concept of causality is central to this debate because we are active participants in the world in a way that computers are not. We observe the consequences of our interventions and, from an early age, understand the world in terms of causes and effects. Machine learning algorithms observe correlations among the data provided to them, and can make astonishingly accurate predictions, but they don’t learn causal models and they struggle to distinguish between coincidences and general laws. The question of how to infer causality from observations is, however, an issue not just for AI but for every natural and social science that seeks to make inferences from observational rather than experimental data.
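One way to see the gap between correlation and intervention is a small simulation. The scenario and all the numbers below are invented for illustration: hot weather drives both ice-cream sales and swimming accidents, so the two are correlated in observational data, but forcing ice-cream sales (an intervention, in the spirit of Pearl's do-operator) has no effect on drownings:

```python
import random

random.seed(0)

def simulate(do_ice_cream=None, n=10_000):
    """Generate (ice_cream, drowning) pairs. Hot weather is a hidden common
    cause; do_ice_cream forces ice-cream sales regardless of the weather."""
    data = []
    for _ in range(n):
        hot = random.random() < 0.5                       # hidden confounder
        if do_ice_cream is None:
            ice = random.random() < (0.8 if hot else 0.2)  # caused by heat
        else:
            ice = do_ice_cream                             # intervention
        drown = random.random() < (0.3 if hot else 0.05)   # also caused by heat
        data.append((ice, drown))
    return data

def p_drown_given_ice(data, ice_value):
    rows = [drown for ice, drown in data if ice == ice_value]
    return sum(rows) / len(rows)

obs = simulate()
# Observed: drownings look far likelier on ice-cream days (~0.25 vs ~0.10),
# purely because heat raises both. A correlation, not a causal effect.
print(p_drown_given_ice(obs, True), p_drown_given_ice(obs, False))

# Intervene: force sales on or off and the gap disappears (~0.175 either way),
# because the hidden cause no longer leaks through the ice-cream variable.
print(p_drown_given_ice(simulate(do_ice_cream=True), True),
      p_drown_given_ice(simulate(do_ice_cream=False), False))
```

An algorithm shown only the observational pairs would predict drownings from ice-cream sales very well, and would be entirely wrong about what banning ice cream would achieve.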
This is a question that Judea Pearl has been working on for more than thirty years. During this time, he and his students have, as Dominic Cummings’s eccentric Downing Street job advert put it, ‘transformed the field’. In the 1980s, it seemed to some researchers, including Pearl, that because one characteristic of intelligence was the ability to deal with uncertainty, some of the problems that couldn’t be tackled with logic could possibly be solved using probability. But when it comes to combining large numbers of facts, probability has one huge weakness compared to logic. In logic, complex statements are made up of simpler ones which can be independently proved or disproved. It is harder to deal with complex probabilities. You can’t work out the probability of someone having both heart disease and diabetes from the separate probabilities of their having diabetes or heart disease: you need to know how the likelihood of having one affects the likelihood of having the other. This quantity – the probability of something happening given that something else has already happened – is known as a conditional probability. The main difficulty in using probability is that even a modest increase in the number of concepts to be considered generates an explosive increase in the number of conditional probabilities required.
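The arithmetic behind that explosion is simple: a full joint distribution over n binary facts requires 2^n − 1 independent numbers, which is why the marginals alone are never enough. A sketch with invented figures (these are not real medical statistics):

```python
# Illustrative figures only, not medical data.
p_diabetes = 0.10                 # P(diabetes)
p_heart_given_diabetes = 0.30     # P(heart disease | diabetes), a conditional
p_heart = 0.08                    # P(heart disease) overall

# The joint probability needs the conditional, not just the two marginals:
p_both = p_diabetes * p_heart_given_diabetes   # P(diabetes and heart disease)
print(round(p_both, 4))                        # 0.03

# Naively multiplying the marginals assumes independence and gets it wrong:
print(round(p_diabetes * p_heart, 4))          # 0.008

# The explosion: fully specifying a joint distribution over n binary
# conditions takes 2**n - 1 independent probabilities.
for n in (2, 10, 20):
    print(n, 2**n - 1)                         # 3, 1023, 1048575
```

Pearl's Bayesian networks tamed exactly this blow-up, by recording only the conditional probabilities between variables that directly influence one another.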
Professor Judea Pearl is a pioneer in the application of AI to causality, having developed a theory of causal and counterfactual inference based on structural models. He won the Turing Award in 2011, and in 2020 the Michael Dukakis Institute named him a World Leader in AI World Society (AIWS.net). Professor Pearl is currently a mentor of AIWS.net and head of its Modern Causal Inference section, one of the key AIWS.net topics on AI ethics, which aims to develop positive AI applications for a better world society.