Professor Judea Pearl, Chancellor’s Professor at UCLA, father of Bayesian networks, pioneer of the science of cause and effect, Turing Award laureate, World Leader in AIWS, and Mentor of AIWS.net, introduces: “Another exciting paper arriving at my desk reads: Causal Relational Learning, which promises to revolutionize causal inference the same way first-order predicate logic transformed Boolean logic.”
Causal Relational Learning
Babak Salimi (University of Washington), Harsh Parikh (Duke University), Moe Kayali (University of Washington), Sudeepa Roy (Duke University), Lise Getoor (University of California at Santa Cruz), and Dan Suciu (University of Washington)
Abstract
Causal inference is at the heart of empirical research in natural and social sciences and is critical for scientific discovery and informed decision making. The gold standard in causal inference is performing randomized controlled trials; unfortunately, these are not always feasible due to ethical, legal, or cost constraints. As an alternative, methodologies for causal inference from observational data have been developed in statistical studies and social sciences. However, existing methods critically rely on restrictive assumptions such as the study population consisting of homogeneous elements that can be represented in a single flat table, where each row is referred to as a unit. In contrast, in many real-world settings, the study domain naturally consists of heterogeneous elements with complex relational structure, where the data is naturally represented in multiple related tables. In this paper, we present a formal framework for causal inference from such relational data. We propose a declarative language called CaRL for capturing causal background knowledge and assumptions and specifying causal queries using simple Datalog-like rules. CaRL provides a foundation for inferring causality and reasoning about the effect of complex interventions in relational domains. We present an extensive experimental evaluation on real relational data to illustrate the applicability of CaRL in social sciences and healthcare.
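To make the contrast between a single flat table of homogeneous units and a multi-table relational domain concrete, here is a minimal sketch using Python's built-in `sqlite3`. The academic schema (`Author`, `Paper`, `AuthoredBy`) and all data values are hypothetical illustrations, not taken from the paper:

```python
import sqlite3

# Hypothetical academic domain: heterogeneous elements (authors, papers)
# connected by a relationship table, so the data cannot be represented
# faithfully as one flat table where each row is a single "unit".
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
    CREATE TABLE Author(aid INTEGER PRIMARY KEY, name TEXT, seniority TEXT);
    CREATE TABLE Paper(pid INTEGER PRIMARY KEY, venue TEXT, citations INTEGER);
    CREATE TABLE AuthoredBy(aid INTEGER, pid INTEGER);
""")
cur.executemany("INSERT INTO Author VALUES (?,?,?)",
                [(1, "Ada", "senior"), (2, "Bo", "junior")])
cur.executemany("INSERT INTO Paper VALUES (?,?,?)",
                [(10, "SIGMOD", 30), (20, "VLDB", 10), (30, "KDD", 20)])
cur.executemany("INSERT INTO AuthoredBy VALUES (?,?)",
                [(1, 10), (1, 20), (2, 20), (2, 30)])

# Even a simple author-level attribute (mean citations of an author's
# papers) requires two joins and an aggregate -- structure that a
# single flat unit table would have to discard or pre-bake.
rows = cur.execute("""
    SELECT a.name, AVG(p.citations)
    FROM Author a
    JOIN AuthoredBy ab ON a.aid = ab.aid
    JOIN Paper p ON ab.pid = p.pid
    GROUP BY a.aid
    ORDER BY a.aid
""").fetchall()
print(rows)  # [('Ada', 20.0), ('Bo', 15.0)]
```

The point of the sketch is only that attributes of one element type (authors) and of another (papers) live in different tables and are linked through relationships, which is exactly the setting the flat-table assumption rules out.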
Conclusions and Future Work
We introduced the Causal Relational Learning framework for performing causal inference on relational data. The framework allows users to encode background knowledge in a declarative language called CaRL (Causal Relational Language) using simple Datalog-like rules, and to ask complex causal queries over relational data. CaRL is designed for researchers and analysts with a social science, healthcare, academic, or legal background who are interested in inferring causality from complex relational data. CaRL extends the existing causal inference literature by relaxing the unit-homogeneity assumption, allowing the confounders, treatment units, and outcome units to be of different kinds.
We evaluated CaRL’s completeness and correctness on real-world and synthetic data from academic and healthcare domains. CaRL is successfully able to recover the treatment effects for complex causal queries that may require multiple joins and aggregates.
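To illustrate what "a causal query requiring joins and aggregates" means in this setting, here is a toy sketch of a naive difference-in-means estimate where the treatment attribute lives on one element type (authors) and the outcome on another (papers). All names and numbers are hypothetical, and the naive estimator shown is for illustration only; it is not CaRL's actual estimation method and performs no confounder adjustment:

```python
# Hypothetical relational data: papers are the outcome units, while the
# treatment attribute ("has a senior coauthor") comes from the authors.
authors = {1: "senior", 2: "junior", 3: "junior"}   # aid -> seniority
papers = {10: 30, 20: 10, 30: 20, 40: 40}           # pid -> citation count
authored_by = [(1, 10), (2, 10), (2, 20), (3, 30), (1, 40)]

# The treatment is defined through the relationship table: a paper is
# "treated" if any of its authors is senior (i.e., one join away).
treated = {pid for aid, pid in authored_by if authors[aid] == "senior"}
control = set(papers) - treated

# Naive difference in mean outcomes between treated and control papers.
mean = lambda pids: sum(papers[p] for p in pids) / len(pids)
ate_estimate = mean(treated) - mean(control)
print(ate_estimate)  # 20.0
```

Even this toy version shows the structural point: assigning treatment status to an outcome unit already requires traversing a relationship between two element types, and the aggregate (the mean) is taken over a set derived from that join.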
In future work, we aim to extend CaRL to handle complex cyclic causal dependencies using the stationary distributions of stochastic processes. We plan to study stochastic interventions and complex interventions on relational skeletons, which are assumed to be fixed in this paper. We also plan a theoretical and methodological study of the functionality of different types of embeddings, and aim to develop a principled approach for learning efficient embeddings using graph representation learning and graph embedding techniques.
Recently, it has been shown that causality is foundational to the emerging field of algorithmic fairness [52]. In future work, we plan to use causal relational learning to develop a causality-based framework for fairness and discrimination in relational domains.
The paper can be found here.