
One of these things is not like the others

Consider the three plots below:

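[Figure pop_growth_fig1: three plots of population size (vertical axis, shared scale) against time (horizontal axis)]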

What you're looking at is simulated, noisy data describing the growth of three biological populations over time (population size is shown on the vertical axis with a shared scale, and time on the horizontal). One of those populations is governed by a dynamics distinct from that which governs the other two.

That last claim requires a little clarification. Roughly speaking, I mean that the way one of those systems evolves is described by a differential equation with a different form from the one governing the others. A little more precisely, two of those systems share the same dynamical symmetries. A dynamical symmetry is, in this case, a change in population that commutes with its evolution through time. That is, it makes no difference whether you intervene and transform the population and then let it grow, or let it grow and then transform the population. Two and only two of these three populations share the same set of dynamical symmetries.

Why is the sharing of dynamical symmetries an interesting criterion of sameness? Why are the categories or kinds picked out this way important? Because categories of this sort are 'natural kinds' in that they support induction -- many features of one member generalize to the others (see this paper for a full discussion and careful definitions of the terms used above). I won't give much of an argument here except to point out that many of the most important scientific kinds are of this sort: orbital systems, first-order chemical reactions, and quasi-isolated mechanical systems are all kinds of this sort, and all are central theoretical categories in scientific practice. If we want to do science in a new domain of phenomena, we want to identify such categories to study.
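
To make the commutation idea concrete, here is a minimal sketch in Python. It is not EUGENE and not the simulation behind the plots. The function names, the parameter values, and the choice of "multiply the population by a constant" as the candidate transformation are all assumptions made for illustration. The sketch compares "transform, then grow" with "grow, then transform" for two textbook growth laws, using their closed-form solutions.

```python
import math

def grow_exponential(x0, t, r=0.5):
    """Evolve a population under dx/dt = r*x for time t (closed form)."""
    return x0 * math.exp(r * t)

def grow_logistic(x0, t, r=0.5, K=100.0):
    """Evolve a population under dx/dt = r*x*(1 - x/K) for time t (closed form)."""
    growth = math.exp(r * t)
    return K * x0 * growth / (K + x0 * (growth - 1.0))

def scale(x, factor=2.0):
    """Candidate intervention (hypothetical): multiply the population by a constant."""
    return factor * x

def commutation_gap(evolve, transform, x0, t):
    """Absolute difference between 'transform then evolve' and 'evolve then transform'."""
    return abs(evolve(transform(x0), t) - transform(evolve(x0, t)))

# Illustrative starting population and elapsed time (assumed values).
x0, t = 5.0, 3.0
gap_exponential = commutation_gap(grow_exponential, scale, x0, t)
gap_logistic = commutation_gap(grow_logistic, scale, x0, t)
print(f"exponential growth: gap = {gap_exponential:.6f}")
print(f"logistic growth:    gap = {gap_logistic:.6f}")
```

Under these assumptions the gap vanishes for exponential growth and is clearly nonzero for logistic growth. So scaling is a dynamical symmetry of the first law but not of the second, and the two systems differ in at least one symmetry.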

This raises an interesting question: Can we find natural kinds of this sort prior to having a theoretical understanding of a domain? Can we spot the categories directly and use them to focus the inquiry that lets us build fully predictive or explanatory theories? In answer to that question, consider the plots below:

[Figure pop_growth_fig2: the same three plots, colored by the kinds EUGENE assigns]

The coloring reflects the categories chosen by EUGENE, an algorithm for automated discovery of natural kinds (see this post). EUGENE groups the first and third into the same kind. And this is in fact correct. The model used to simulate the leftmost and rightmost systems is the classic "logistic equation":
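
In its standard textbook form, with $x$ the population size, $r$ the growth rate, and $K$ the carrying capacity (the symbols $x$ and $K$ are assumed notation), the logistic equation reads:

$$
\frac{dx}{dt} = r\,x\left(1 - \frac{x}{K}\right)
$$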

The only difference is that the growth rate, r, is much lower in the rightmost system.
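
For readers who want to experiment, here is a rough sketch of how noisy data from two logistic systems with different growth rates might be generated. It is an illustration only, not the code used for the plots. The additive Gaussian noise model, the parameter values, and the names `logistic` and `simulate` are all assumptions.

```python
import math
import random

def logistic(t, x0, r, K):
    """Closed-form solution of dx/dt = r*x*(1 - x/K) with x(0) = x0."""
    growth = math.exp(r * t)
    return K * x0 * growth / (K + x0 * (growth - 1.0))

def simulate(r, x0=5.0, K=100.0, noise_sd=2.0, n_steps=50, dt=0.5, seed=0):
    """Return (times, noisy_sizes) for one simulated population.

    Noise is additive Gaussian measurement noise (an assumption).
    """
    rng = random.Random(seed)
    times = [i * dt for i in range(n_steps)]
    sizes = [logistic(t, x0, r, K) + rng.gauss(0.0, noise_sd) for t in times]
    return times, sizes

# Same equation, same carrying capacity, different growth rates.
times, fast = simulate(r=1.0, seed=1)
_, slow = simulate(r=0.2, seed=2)

# Print every tenth sample to compare the two trajectories.
for t, f, s in list(zip(times, fast, slow))[::10]:
    print(f"t={t:5.1f}  fast={f:7.2f}  slow={s:7.2f}")
```

Both series come from the same equation and differ only in r, so a method that sorts systems by the form of their dynamics should place them in the same kind even though the curves look different.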

The middle system, on the other hand, the one that EUGENE marked in green, is described by the following equation:

Taken together, these systems exemplify just two varieties of a large family of models of interest to biologists. They are of interest in large part because it's so hard to tell which is correct. That is, it is remarkably difficult to determine experimentally whether a system is described by one or another set of parameters in the general equation:

And yet, accurately and reliably, with no prior knowledge or explicit hypotheses about the governing dynamics, EUGENE can sort them one from another! I think that's a pretty neat trick.

The EUGENE Project

In the spring of 2015, I was lucky enough to receive an NSF CAREER award for a project entitled "Automated scientific discovery and the philosophical problem of natural kinds." The aim of this project is to develop a new approach to automated scientific discovery based on the theory of natural kinds -- in the sense of projectible kinds -- that I've been elaborating for a while (see this paper). More specifically, the aim over the next five years is to produce algorithms that sort dynamic causal systems into natural kinds, as well as algorithms that construct novel variables useful for finding law-like causal relations and additional kinds. These algorithms are intended to be pitted directly against the real world; from the outset they are being developed to communicate with physical systems via sensors and actuators rather than being confronted with data that has been preprocessed by a human.

Since the grant is a CAREER award, it funds extensive education and outreach components as well. I am excited to be offering a two-week graduate summer school in "Philosophy & Physical Computing" in July of 2016. I will also be putting on a two-day "Robot Scientist" event for middle school students that will be hosted at the Science Museum of Western Virginia.

My group of student researchers and I have already gotten some promising prototypes of the classifier algorithm -- an algorithm that finds kinds -- to work. And I've given the project a new name. I've begun calling the entire collection of automated discovery algorithms under development "EUGENE", largely in honor of Eugene Wigner, whose ideas were influential in shaping the theory of natural kinds being implemented (hence the title of this post).

In the next few posts, I'll explain the basic algorithm for kind discovery and why one might expect it to uncover useful categories. For now, in order to give a little more of an overview of the project, I'll provide the summary from my grant proposal:

CAREER: Automated scientific discovery and the philosophical problem of natural kinds

In the course of everyday research, scientists are confronted with a recurring problem: out of all the empirical quantities related to some phenomenon of interest, to which should we pay attention if we are to successfully discover the regularities or laws behind the phenomenon? For most ways of carving up the observable world with a choice of theoretical variables, no tractable patterns present themselves. It is only a special few that are 'projectible', that allow us to accurately generalize from a few particular facts to a great many not in evidence. And yet in the course of their work, scientists efficiently choose variables that support generalization. This presents a puzzle, the epistemic version of the philosophical problem of 'natural kinds': how we can know in advance which choices of variables are projectible. This project will clarify and test a new approach to solving this puzzle -- the Dynamical Kinds Theory (DKT) of natural kinds -- by constructing a series of computer algorithms that automatically carry out a process of variable choice in the service of autonomous scientific discovery. The inductive success of these algorithms when applied to genuine problems in current scientific settings will serve as tangible validation of the philosophical theory.

This project connects the philosophical problem of natural kinds with computational problems of automated discovery in artificial intelligence. It tests the DKT by deriving discovery algorithms from that theory's normative content, and then applying these algorithms to real-world phenomena. Successful algorithms imply that in fact the DKT at least captures an important subclass of the projectible kinds. More dramatically, these discovery algorithms have the potential to produce more than one equally effective but inconsistent classification of phenomena into kinds. The existence of such alternatives plays a central role in debates over scientific realism.

The automated discovery algorithms produced will be leveraged to introduce a generation of graduate students in philosophy and science to the deep connections between physical computing and philosophical epistemology. A recurring summer school will train graduate students in basic programming and formal epistemology, with hands-on development of automated discovery systems. Each summer school will culminate in a two-day outreach event at which the graduate students will assist a diverse group of area secondary school children in building their own 'robot scientist'. Students and teachers completing the summer school or outreach programs will leave with their own mini-computers configured for developing their own approaches to discovery. Outside of philosophy, the application of the discovery algorithms to open problems in areas of ecology, evolution, metagenomics, metabolomics, and systems biology has the potential to suggest previously unconceived theories of the fundamental ontology in these fields. In particular, the algorithms will be applied to agent-based models of evolutionary dynamics to search for population-level laws, and to publicly available long-term ecological data to search for stable dynamical kinds outside the standard set of ecological categories.