D-Separation#

Algorithm for testing d-separation in DAGs.

d-separation is a test for conditional independence in probability distributions that can be factorized using DAGs. It is a purely graphical test that uses the underlying graph and makes no reference to the actual distribution parameters. See [1] for a formal definition.

The implementation is based on the conceptually simple linear time algorithm presented in [2]. Refer to [3], [4] for a couple of alternative algorithms.

Here, we provide a brief overview of d-separation and related concepts that are relevant for understanding it:

Blocking paths#

Before we overview, we introduce the following terminology to describe paths:

  • “open” path: A path between two nodes that can be traversed

  • “blocked” path: A path between two nodes that cannot be traversed

A collider is a triplet of nodes along a path that is like the following: ... u -> c <- v ...), where ‘c’ is a common successor of u and v. A path through a collider is considered “blocked”. When a node that is a collider, or a descendant of a collider is included in the d-separating set, then the path through that collider node is “open”. If the path through the collider node is open, then we will call this node an open collider.

The d-separation set blocks the paths between u and v. If you include colliders, or their descendant nodes in the d-separation set, then those colliders will open up, enabling a path to be traversed if it is not blocked some other way.

Illustration of D-separation with examples#

For a pair of two nodes, u and v, all paths are considered open if there is a path between u and v that is not blocked. That means, there is an open path between u and v that does not encounter a collider, or a variable in the d-separating set.

For example, if the d-separating set is the empty set, then the following paths are unblocked between u and v:

  • u <- z -> v

  • u -> w -> … -> z -> v

If for example, ‘z’ is in the d-separating set, then ‘z’ blocks those paths between u and v.

Colliders block a path by default if they and their descendants are not included in the d-separating set. An example of a path that is blocked when the d-separating set is empty is:

  • u -> w -> … -> z <- v

because ‘z’ is a collider in this path and ‘z’ is not in the d-separating set. However, if ‘z’ or a descendant of ‘z’ is included in the d-separating set, then the path through the collider at ‘z’ (… -> z <- …) is now “open”.

D-separation is concerned with blocking all paths between u and v. Therefore, a d-separating set between u and v is one where all paths are blocked.

D-separation and its applications in probability#

D-separation is commonly used in probabilistic graphical models. D-separation connects the idea of probabilistic “dependence” with separation in a graph. If one assumes the causal Markov condition [5], then d-separation implies conditional independence in probability distributions.

Examples#

>>>
>>> # HMM graph with five states and observation nodes
... g = nx.DiGraph()
>>> g.add_edges_from(
...     [
...         ("S1", "S2"),
...         ("S2", "S3"),
...         ("S3", "S4"),
...         ("S4", "S5"),
...         ("S1", "O1"),
...         ("S2", "O2"),
...         ("S3", "O3"),
...         ("S4", "O4"),
...         ("S5", "O5"),
...     ]
... )
>>>
>>> # states/obs before 'S3' are d-separated from states/obs after 'S3'
... nx.d_separated(g, {"S1", "S2", "O1", "O2"}, {"S4", "S5", "O4", "O5"}, {"S3"})
True

References#

[1]

Pearl, J. (2009). Causality. Cambridge: Cambridge University Press.

[2]

Darwiche, A. (2009). Modeling and reasoning with Bayesian networks. Cambridge: Cambridge University Press.

[3]

Shachter, R. D. (1998). Bayes-ball: rational pastime (for determining irrelevance and requisite information in belief networks and influence diagrams). In , Proceedings of the Fourteenth Conference on Uncertainty in Artificial Intelligence (pp. 480–487). San Francisco, CA, USA: Morgan Kaufmann Publishers Inc.

[4]

Koller, D., & Friedman, N. (2009). Probabilistic graphical models: principles and techniques. The MIT Press.

d_separated(G, x, y, z)

Return whether node sets x and y are d-separated by z.