# Paley the Plagiarist? (by Glenn Branch)

The example of the watch in William Paley’s Natural Theology (1802) is famous. A stone found on a heath, Paley explains, seems not to require any explanation, but a watch, with its component parts apparently designed to perform a function, demands to be explained, and explained, moreover, in terms of a designer. And the same is true, he argues at length, of living things. Although Paley is sometimes credited with the example of the watch, it is, I think, generally recognized that he was only the latest in a long string of writers to use horology in the service of natural theology: Cicero, in the first century BCE, similarly appealed to sundials and water-clocks in De natura deorum. So Paley wasn’t original. But was he a plagiarist?

It is among the virtues of Benjamin C. Jantzen’s An Introduction to Design Arguments (2014) that he raises such a provocative question (in a section of a chapter appropriately entitled “Loose Ends”). The question is not original to him: it was broached by the pseudonymous Verax, writing in The Athenaeum in 1848. (Jantzen mistakenly attributes the question to The Book of Days, a collection of chronologically arranged trivia edited by Robert Chambers—the same Robert Chambers who wrote the anonymous protoevolutionary Vestiges of the Natural History of Creation [1844]—as a result of having misread the publication date of the edition he consulted as 1833. Curiously, Yujin Nagasawa makes the same mistake in The Existence of God [2011].)

Verax, by the way, was the pseudonym of Robert Blakey (1795–1878), who proves to have been a character in his own right. A hatmaker-turned-pamphleteer, he turned to philosophy in his thirties, publishing An Essay on Moral Good and Evil (1831), History of Moral Science (1833), and History of the Philosophy of Mind (1848). His output was impressive enough that he was appointed as professor of logic and metaphysics at Queen’s College, Belfast, in 1849, but his ill health led to his dismissal in 1851. (A funny coincidence, by the way: a prolific author, Blakey published more than one manual on fishing. As it happens, Paley was also a keen angler, and was reportedly delayed in finishing Natural Theology by his indulgence in fly-fishing.)

Anyhow, Verax raised the question of Paley’s plagiarism and answered it as well: Paley’s Natural Theology, he charged, was nothing more than a “mere running commentary” on Bernard Nieuwentyt’s The Religious Philosopher (1718), originally published in Dutch as Het regt gebruik der werelt beschouwingen, ter overtuiginge van ongodisten en ongelovigen (1715). Paley occasionally cited, so presumably read, Nieuwentyt, and it was previously remarked that he seemed to be indebted to him. The earliest such remark I found was Samuel Charles Wilks’s “Comparative View of Natural and Revealed Religion” (1817), which mentions “The ‘Religious Philosopher’ of Nieuwentyt, for example, from whom Paley seems to have derived several of his most valuable hints.”

But Verax argued in detail that not just the example of the watch but the overall organization of Natural Theology was borrowed from The Religious Philosopher, providing parallel columns with excerpts from both works showing their similarity. Verax was quite harsh about Paley’s behavior, accusing him of acting “with great unfairness, and in flagrant violations of the literary moralities.” Jantzen seems inclined to credit at least part of Verax’s accusation, writing, “I will take it as a given that, in fact, Paley passed off as his own (or, more charitably, ‘reworked’) an argument that he took from Nieuwentyt.” While I agree that it’s clear that Natural Theology contains reworked arguments that Paley found elsewhere, I am not entirely convinced by Verax.

The reason I’m not convinced is that there are credible accusations involving different sources. In his A Discourse of Natural Theology (1835), Henry Brougham described Natural Theology as “chiefly taken from the writings of [William] Derham [such as Physico-Theology (1713)], deriving from them its whole plan and much of its substance, but clothing the harsher statements of his original in an attractive and popular style,” while in 1829 Richard Watson suggested, “The basis, and indeed the plan,” of Paley’s book “are found in the third and following chapters of [John] HOWE’s Living Temple” (1675–1702). So a detailed comparative examination of the sources available to Paley would be necessary to identify whether there was a unique source and if so what it was.

There is, I think, no plausible alternative to acknowledging Paley’s debt to his predecessors, whichever of them are deemed to have been the most important sources. Defenses of Paley’s use of his sources seem to adopt either or both of two approaches. The first approach is to appeal to Paley’s pedagogical aims, arguing that he aimed to present the arguments with vigor and clarity, but not originality, in a way acceptable in his day. In his article on Paley in the ninth edition of the Encyclopedia Britannica, for example, the philosopher Andrew Seth (later Pringle-Pattison) suggested, “In the case of a writer whose chief merit is the way in which he has worked up existing materials, a general charge of plagiarism is almost irrelevant.”

But that might be hard to maintain, considering that Paley was accused of plagiarism at least twice during his life. In a booklet entitled The Young Christian Instructed in Reading and the Principles of Religion (1790), Paley included material from a spelling book without the permission of its author, then unknown to him. The author, a J. Robertson, complained about Paley’s plagiarism in The Gentleman’s Magazine, eliciting a defensive answer from Paley. Similarly, the anonymous Letters to William Paley (1796) accused Paley of using “the sentiments of former writers, frequently copied literally, and always without acknowledgment,” citing passages from John Locke and William Blackstone reworked in Paley’s Evidences of Christianity (1794).

The second approach is to claim that Paley’s use of his source materials was transformative, resulting in a version of the argument from design particularly apt for the ongoing Industrial Revolution. Such was the historian Neal C. Gillespie’s approach in “Divine Design and the Industrial Revolution: William Paley's Abortive Reform of Natural Theology” (1990):

Paley introduced one of the exceptional novelties to be found in British natural theology prior to its eclipse after the appearance of Darwinism. … insofar as animals are constructed on mechanical principles, to that extent (but certainly no farther) they are machines and not merely analogous to machines. … Moreover, he illustrated this argument with examples drawn from the Industrial Revolution in order to reach those involved in that disruptive socioeconomic enterprise.

But a historian, concerned with the contexts in which Natural Theology was composed and received, would find such considerations more relevant to the question of Paley’s originality than would a philosopher concerned with the arguments of the book, so perhaps Jantzen would be unimpressed by such a defense. In any case, I’m grateful for the impetus to have researched and thought about the question!

# Mysterious swimming spider

I recently read a fascinating post about fishing spiders on SpiderBytes (a fun and informative blog full of excellent photos). The post reminded me of a curious spider I observed once in Maine, swimming on the surface of a lake. The spider stood on its outer four legs, and stroked the inner two pairs together like oars in order to move forward. I managed to locate my old field notes from that trip, and I'm including them here in case anyone has any idea of what sort of spider this is.

# One of these things is not like the others

Consider the three plots below:

What you're looking at is simulated, noisy data describing the growth of three biological populations over time (population size is shown on the vertical axis with a shared scale, and time on the horizontal). One of those populations is governed by a dynamics distinct from that which governs the other two.

That last claim requires a little clarification. Roughly speaking, I mean that the way one of those systems evolves is described by a differential equation with a different form from that governing the others. A little more precisely, two of those systems share the same dynamical symmetries. A dynamical symmetry is, in this case, a change in population that commutes with its evolution through time. That is, it makes no difference whether you intervene and transform the population and then let it grow, or let it grow and then transform the population. Two and only two of these three populations share the same set of dynamical symmetries. Why is the sharing of dynamical symmetries an interesting criterion of sameness? Why are the categories or kinds picked out this way important? Because categories of this sort are 'natural kinds' in that they support induction -- many features of one member generalize to the others (see this paper for a full discussion and careful definitions of the terms used above). I won't give much of an argument here except to point out that lots of the most important scientific kinds are kinds of this sort: orbital systems, first-order chemical reactions, quasi-isolated mechanical systems are all kinds of this sort, and all central theoretical categories in scientific practice. If we want to do science in a new domain of phenomena, we want to identify such categories to study.
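To make the commutation idea concrete, here is a minimal sketch for the logistic equation $\dot{x}=rx(1-\frac{x}{K})$. It assumes illustrative values for `r`, `K`, and the starting population, and the function names are mine. It is only an illustration of the definition, not the procedure EUGENE uses. The candidate symmetry `sigma` rescales the quantity $u = x/(K-x)$, which grows exponentially under logistic dynamics, while simply doubling the population serves as a contrast case that does not commute with growth.

```python
import math

def logistic_flow(x0, t, r=1.0, K=100.0):
    """Closed-form solution of dx/dt = r*x*(1 - x/K), starting from x0 at time 0."""
    return K / (1.0 + ((K - x0) / x0) * math.exp(-r * t))

def sigma(x, k, K=100.0):
    """Candidate symmetry: rescale u = x/(K - x) by k, then map back to x."""
    return K * k * x / (K - x + k * x)

def doubling(x):
    """Contrast case: x -> 2x, which is not a symmetry of logistic growth."""
    return 2.0 * x

# Illustrative values (assumptions, not taken from the plots above).
x0, t, k = 5.0, 3.0, 4.0

# Transform the population and then let it grow...
transform_then_grow = logistic_flow(sigma(x0, k), t)
# ...or let it grow and then transform it.
grow_then_transform = sigma(logistic_flow(x0, t), k)

print("sigma commutes with growth:",
      math.isclose(transform_then_grow, grow_then_transform, rel_tol=1e-9))
print("doubling commutes with growth:",
      math.isclose(logistic_flow(doubling(x0), t),
                   doubling(logistic_flow(x0, t)), rel_tol=1e-9))
```

The two orders of operation agree for `sigma` and disagree for `doubling`, which is the sense in which only the former counts as a dynamical symmetry of this system.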

This raises an interesting question: Can we find natural kinds of this sort prior to having a theoretical understanding of a domain? Can we spot the categories directly and use them to focus the inquiry that lets us build fully predictive or explanatory theories? In answer to that question, consider the plots below:

The coloring reflects the categories chosen by EUGENE, an algorithm for automated discovery of natural kinds (see this post). EUGENE groups the first and third into the same kind. And this is in fact correct. The model used to simulate the leftmost and rightmost systems is the classic "logistic equation":

$\dot{x}=rx(1-\frac{x}{K})$

The only difference is that the growth rate, $r$, is much lower in the rightmost system.

The middle system, on the other hand, the one that EUGENE marked in green, is described by the following equation:

$\dot{x}=rx^{0.7}(1-\left(\frac{x}{K}\right)^{2.5})^2$

Taken together, these systems exemplify just two varieties of a large family of models of interest to biologists. They are of interest in large part because it's so hard to tell which is correct. That is, it is remarkably difficult to determine experimentally whether a system is described by one or another set of parameters $\alpha,\beta,\gamma$ in the general equation:

$\dot{x}=rx^{\alpha}(1-\left(\frac{x}{K}\right)^{\beta})^{\gamma}$

And yet, accurately and reliably, with no prior knowledge or explicit hypotheses about the governing dynamics, EUGENE can sort them one from another! I think that's a pretty neat trick.
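For readers who want to generate curves like these themselves, the following is a rough sketch of the general model with a fixed-step Runge-Kutta integrator. The values of `r`, `K`, the initial population, and the time span are assumptions chosen for illustration, since the ones used for the plots are not given here, and no noise is added. Only the exponents $\alpha=0.7$, $\beta=2.5$, $\gamma=2$ come from the equation for the middle system above.

```python
import math

def growth_rate(x, r, K, alpha=1.0, beta=1.0, gamma=1.0):
    """Right-hand side of dx/dt = r * x**alpha * (1 - (x/K)**beta)**gamma."""
    return r * x ** alpha * (1.0 - (x / K) ** beta) ** gamma

def simulate(x0, r, K, alpha=1.0, beta=1.0, gamma=1.0, t_end=10.0, dt=0.01):
    """Integrate the model with classical fourth-order Runge-Kutta steps."""
    def f(x):
        return growth_rate(x, r, K, alpha, beta, gamma)

    x = x0
    xs = [x0]
    for _ in range(int(round(t_end / dt))):
        k1 = f(x)
        k2 = f(x + 0.5 * dt * k1)
        k3 = f(x + 0.5 * dt * k2)
        k4 = f(x + dt * k3)
        x += dt * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
        xs.append(x)
    return xs

# Assumed parameters for illustration only.
K, x0 = 100.0, 5.0
fast_logistic = simulate(x0, r=1.0, K=K)    # alpha = beta = gamma = 1
slow_logistic = simulate(x0, r=0.3, K=K)    # same form, lower growth rate
middle_system = simulate(x0, r=1.0, K=K, alpha=0.7, beta=2.5, gamma=2.0)

# Sanity check against the closed-form logistic solution at t = 10.
closed_form = K / (1.0 + ((K - x0) / x0) * math.exp(-1.0 * 10.0))
print(abs(fast_logistic[-1] - closed_form))
```

Setting all three exponents to 1 recovers the logistic equation, so the two logistic runs differ only in `r`, while the middle system differs in the form of the equation itself.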

# The EUGENE Project

In the spring of 2015, I was lucky enough to receive an NSF CAREER award for a project entitled "Automated scientific discovery and the philosophical problem of natural kinds." The aim of this project is to develop a new approach to automated scientific discovery based on the theory of natural kinds -- in the sense of projectible kinds -- that I've been elaborating for a while (see this paper). More specifically, the aim over the next five years is to produce algorithms that sort dynamic causal systems into natural kinds as well as algorithms that construct novel variables useful for finding law-like causal relations and additional kinds. These algorithms are intended to be pitted directly against the real world; from the outset they are being developed to communicate with physical systems via sensors and actuators rather than confronted with data that has been preprocessed by a human.

Since the grant is a CAREER award, it funds extensive education and outreach components as well. I am excited to be offering a two-week graduate summer school in "Philosophy & Physical Computing" in July of 2016. I will also be putting on a two-day "Robot Scientist" event for middle school students that will be hosted at the Science Museum of Western Virginia.

I and my group of student researchers have already gotten some promising prototypes of the classifier algorithm -- an algorithm that finds kinds -- to work. And I've given the project a new name. I've begun calling the entire collection of automated discovery algorithms under development "EUGENE", largely in honor of Eugene Wigner whose ideas were influential in shaping the theory of natural kinds being implemented (hence the title of this post).

In the next few posts, I'll explain the basic algorithm for kind discovery and why one might expect it to uncover useful categories. For now, in order to give a little more of an overview of the project, I'll provide the summary from my grant proposal:

## CAREER: Automated scientific discovery and the philosophical problem of natural kinds

In the course of everyday research, scientists are confronted with a recurring problem: out of all the empirical quantities related to some phenomenon of interest, to which should we pay attention if we are to successfully discover the regularities or laws behind the phenomenon? For most ways of carving up the observable world with a choice of theoretical variables, no tractable patterns present themselves. It is only a special few that are 'projectible', that allow us to accurately generalize from a few particular facts to a great many not in evidence. And yet in the course of their work, scientists efficiently choose variables that support generalization. This presents a puzzle, the epistemic version of the philosophical 'problem of natural kinds': how we can know in advance which choices of variables are projectible. This project will clarify and test a new approach to solving this puzzle---the Dynamical Kinds Theory (DKT) of natural kinds---by constructing a series of computer algorithms that automatically carry out a process of variable choice in the service of autonomous scientific discovery. The inductive success of these algorithms when applied to genuine problems in current scientific settings will serve as tangible validation of the philosophical theory.

This project connects the philosophical problem of natural kinds with computational problems of automated discovery in artificial intelligence. It tests the DKT by deriving discovery algorithms from that theory's normative content, and then applying these algorithms to real-world phenomena. Successful algorithms imply that in fact the DKT at least captures an important subclass of the projectible kinds. More dramatically, these discovery algorithms have the potential to produce more than one equally effective but inconsistent classification of phenomena into kinds. The existence of such alternatives plays a central role in debates over scientific realism.

The automated discovery algorithms produced will be leveraged to introduce a generation of graduate students in philosophy and science to the deep connections between physical computing and philosophical epistemology. A recurring summer school will train graduate students in basic programming and formal epistemology, with hands-on development of automated discovery systems. Each summer school will culminate in a two-day outreach event at which the graduate students will assist a diverse group of area secondary school children in building their own 'robot scientist'. Students and teachers completing the summer school or outreach programs will leave with their own mini-computers configured for developing their own approaches to discovery. Outside of philosophy, the application of the discovery algorithms to open problems in areas of ecology, evolution, metagenomics, metabolomics, and systems biology has the potential to suggest previously unconceived theories of the fundamental ontology in these fields. In particular, the algorithms will be applied to agent-based models of evolutionary dynamics to search for population-level laws, and to publicly available long-term ecological data to search for stable dynamical kinds outside the standard set of ecological categories.

# James Hutton's epistemological treatise

I present here copies of all three volumes of James Hutton's An Investigation of the Principles of Knowledge. I will have much more to say on this work later. For now, I'll note that it is in these volumes that Hutton explicitly lays out the epistemological principles that ostensibly motivated his revolutionary uniformitarian views in geology. This work has for a long time been difficult to obtain in a format suitable for academic use. By posting these PDF copies, I am hoping to offer other scholars the chance to pay the work its due attention. If you find errors in the files or problems with specific pages, please let me know. I'll do my best to fix them.

Volume 1

Volume 2

Volume 3

Attribution: The work reproduced in these files is in the public domain. These PDFs were created using images provided by the Albert and Shirley Small Special Collections Library of the University of Virginia (see the first page of each file for a citation and link to the source images).

# Some observations on the problem of conceptual novelty in automated discovery

Following a recent conversation with Richard Burian, I realized that both of us had assumed that a necessary if not sufficient condition for a new scientific variable to represent a genuinely novel concept is for it to allow a finer partitioning of possible states of the world than was previously possible. The idea is intuitively plausible. If I posit the existence of a new variable property of material bodies, then I can discriminate more possible states of the world. If, for instance, I posit the existence of an internal resistance for wires, then states that were previously indistinguishable when described in terms of current and voltage in a wire are now potentially distinguishable on the basis of resistance. If I posit the existence of a new kind of particle, then it seems I have recognized a greater variety of possible worlds. Corresponding to what were previously unique states of the world are now many possible states in which the new particles assume various positions, velocities, and so on. Recognizing a genuinely novel property (or class of properties) seems to entail admitting a finer-grained view of the world. But I'm no longer convinced that's the case.

Before I explain why I'm unconvinced, let me back up and explain the question at issue and where it came from. Since the heyday of logical positivism, the consensus in mainstream philosophy of science is that there does not exist a "logic of discovery", a method for mechanically generating significant scientific hypotheses. The only serious argument to this effect turns on the notion of conceptual novelty. The key premise is that no algorithmic process can introduce variables (or associated concepts) that were not already present in the presentation of the data or observations for which we are seeking an explanatory hypothesis. So, for instance, Hempel (1966, p14) claimed that one cannot "...provide a mechanical routine for constructing, on the basis of the given data, a hypothesis or theory stated in terms of some quite novel concepts, which are nowhere used in the description of the data themselves." Laudan echoed the sentiment a couple of decades later. He conceded that, while machines can certainly carry out algebra and curve-fitting, the essence of scientific discovery is the introduction of explanatory theories "...some of whose central concepts have no observable analogue" (Laudan, 1981, p186). Though he makes no explicit argument to this effect, he takes it as obvious that no effective procedure could introduce the sorts of concepts far removed from observation that are at the heart of modern theories.

How much of a stumbling block for automated discovery is the required sort of novelty? That's rather difficult to answer without a more substantive account of conceptual novelty. However, Hempel's syntactic characterization suggests a plausible necessary condition that Laudan would presumably endorse: a novel class of variables represents a novel concept only if the values of those variables are not functions of preexisting variables. Thus, if you already have concepts of mass and velocity, adding momentum or kinetic energy (both of which are defined as simple functions of mass and velocity) doesn't really introduce conceptual novelty. However, introducing a new variable $m$ to represent a heretofore unacknowledged property of inertial mass into a theory involving only position and velocity is a sort of conceptual novelty.

Interestingly, introducing properties like inertial mass into theories previously lacking them is the sort of conceptual invention that automated discovery algorithms were capable of by the end of the decade in which Laudan wrote. I'm thinking specifically of the third program in the BACON lineage developed by Herb Simon, Pat Langley, Gary Bradshaw, and Jan Zytkow (1987). If we take the above condition as genuinely necessary for conceptual novelty, then BACON.3 is at least a counterexample to the claim that the condition cannot be met by an algorithm. It does in fact introduce an inertial mass when given data from experiments with springs, and it introduces a variable for resistance when examining currents in various circuits. Of course, you might just take this as an indication that the proposed condition for conceptual novelty is not sufficient. That's not an argument I want to take up this time.

What I do want to do is scrutinize the notion that positing a novel concept must somehow increase the number of possible worlds we recognize. In the sense of logical possibility, the new variables allow a finer partitioning of the world. Equivalently, they are not functions of existing variables. But if their introduction is well-motivated, it seems that enough of the additional logical possibilities are nomologically precluded that the number of ways the world might be remains the same. To see what I mean, it will help to consider in a little detail how BACON.3 introduces a variable. Consider the following table of data (adapted from figure 4.1 in (Langley, et al, 1987)):

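| Battery | Wire | Current (I) | Conductance (c) | Voltage (v) |
|---------|------|-------------|-----------------|-------------|
| A | X | 3.4763 | 3.4763 | 1.0000 |
| A | Y | 4.8763 | 4.8763 | 1.0000 |
| A | Z | 3.0590 | 3.0590 | 1.0000 |
| B | X | 3.9781 | 3.4763 | 1.1444 |
| B | Y | 5.5803 | 4.8763 | 1.1444 |
| B | Z | 3.5007 | 3.0590 | 1.1444 |
| C | X | 5.5629 | 3.4763 | 1.6003 |
| C | Y | 7.8034 | 4.8763 | 1.6003 |
| C | Z | 4.8952 | 3.0590 | 1.6003 |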

BACON begins with the first three columns of data. Letters label distinct wires and batteries. The only variable measured is current, which is represented by a real number. Upon examining the first three rows of the table (corresponding to the same battery but different wires), BACON notes that current varies from wire to wire. The next step of the algorithm is, practically speaking, driven by the fact that BACON cannot relate non-numerical variables (e.g., the identifiers for distinct wires) to numerical variables. But we might give it a rather plausible methodological interpretation: if a variable changes from one circumstance to the next -- in this case, from one wire to the next -- it is reasonable to suppose that there exists a hidden, causally salient property which varies from wire to wire. Let's call that property conductance, and assume that it can be represented by a real number as well.

Following this maxim, BACON introduces a new variable, conductance, whose values are shown in the fourth column. How were these values determined? As is clear from the table, BACON assigns a conductance equal to the values of the previously known variable, current. The authors don't discuss this procedure much, but it is a simple way to ensure that the new variable explains the old in the sense that there is a unique conductance value for each resulting current.

So far, it's not clear that the "new" variable is very informative or novel. But things get interesting when we get to the next three rows of the table. Since each wire was already assigned a value for conductance, BACON uses those values again, and notes that for battery B, the conductance and the current are proportional to one another. Unlike the case for battery A, however, the constant of proportionality is now 1.1444. Similarly, for the last three rows (corresponding to battery C), BACON finds that conductance and current are related by a slope of 1.6003. How to explain this variation? Posit a new variable! This time, we suppose there is a property of batteries (the voltage) that explains the variation, and we assign values identical to the slopes in question. If we note that conductance is the reciprocal of resistance, we can see that BACON has just 'discovered' Ohm's law of resistance: I = v / r. Of course, that relation is tautological if we consider only the data on hand. But treated as a generalization, it is quite powerful and most definitely falsifiable. We might, for instance, find that a new wire, D, has a conductance of c as determined using battery B. But when connected to battery A, the new wire could show a current not equal in value to c. This would violate Ohm's law.
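The bookkeeping described above can be sketched in a few lines. This is a simplified reconstruction under my own assumptions, not BACON.3's actual implementation: it fits each battery's slope by least squares through the origin, which is a stand-in for whatever heuristics the program really uses, and the new wire D at the end is invented purely to illustrate how the generalization could be tested.

```python
# Currents from the table above, keyed by (battery, wire).
currents = {
    ("A", "X"): 3.4763, ("A", "Y"): 4.8763, ("A", "Z"): 3.0590,
    ("B", "X"): 3.9781, ("B", "Y"): 5.5803, ("B", "Z"): 3.5007,
    ("C", "X"): 5.5629, ("C", "Y"): 7.8034, ("C", "Z"): 4.8952,
}
batteries = ["A", "B", "C"]
wires = ["X", "Y", "Z"]

# Step 1: posit a property of wires (conductance) and give each wire the
# value of the current observed with the first battery.
conductance = {w: currents[(batteries[0], w)] for w in wires}

def slope_through_origin(xs, ys):
    """Least-squares slope of y = slope * x (a stand-in for BACON's heuristics)."""
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)

# Step 2: posit a property of batteries (voltage) equal to the slope that
# relates conductance to current for that battery.
voltage = {}
for b in batteries:
    xs = [conductance[w] for w in wires]
    ys = [currents[(b, w)] for w in wires]
    voltage[b] = slope_through_origin(xs, ys)

def predicted_current(battery, wire):
    """The generalization I = v * c, i.e. Ohm's law with c = 1/r."""
    return voltage[battery] * conductance[wire]

# Hypothetical new wire D: its current with battery B is an invented number.
current_D_on_B = 2.2888
conductance["D"] = current_D_on_B / voltage["B"]

# The law now makes a risky prediction for wire D on battery A; a measured
# current that differed from this value would count against the law.
prediction_D_on_A = predicted_current("A", "D")
print(voltage, prediction_D_on_A)
```

On the nine rows already in hand the relation holds by construction, up to rounding in the table. It is only the prediction for a new battery and wire pairing that puts the law at risk.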

There are two lessons to draw from the procedure described above. First, it sure seems like positing previously unconsidered intrinsic properties like conductance and voltage amounts to producing novel theoretical concepts. Thus, it looks as though there is no real barrier to the algorithmic production of novelty, and the objections of Hempel, Laudan, and others are simply misguided. Second, the introduction of a novel concept does not entail recognizing a greater diversity of possible worlds, at least not in every sense. It is certainly the case that if we assume that a newly introduced variable can take on any value consistent with its representation (e.g., any real number), then as a matter of logical possibility, we have considered a finer partitioning of states of the world -- there are more ways the world might be for which we can provide mutually exclusive descriptions. But these logical possibilities are, as a rule, moot. The whole reason for introducing a novel variable is to explain previously unexplained variation. That means that a variable is likely to enter scientific consideration already bound up in a nomic relation with other variables. That law-like relationship precludes many logical possibilities. In fact, in cases like Ohm's law, those relationships will be such as to permit all and only those states of the world we already recognized as possible in terms of known variables.

Note that I am not suggesting there is no way to introduce new variables that allow for a finer discrimination of states of the world. It seems obvious that such a thing is possible. My point is just that it is not necessary. In fact, it seems like in most cases of scientific relevance, the new variables do not provide finer discrimination.

To sum up, variables are introduced to do a job: they are supposed to represent whatever hidden properties vary from one circumstance to the next and so explain a previously unexplained variation. But that means that they are generally introduced along with law-like relations to other variables. These relations generally (or at least often) restrict the values in such a way that no finer partitioning of the states of the world is achieved.

### Works cited

Hempel, Carl G. 1966. Philosophy of Natural Science. Prentice-Hall Foundations of Philosophy Series. Englewood Cliffs, N.J: Prentice-Hall.

Langley, Pat, Herbert A. Simon, Gary Bradshaw, and Jan M. Zytkow. 1987. Scientific Discovery: Computational Explorations of the Creative Processes. Cambridge, Mass: MIT Press.

Laudan, Larry. 1981. Science and Hypothesis. Dordrecht, Holland: D. Reidel Publishing Company.

# The Field Guide Approach to Teaching Argument Analysis

I recently gave a talk for the American Association of Philosophy Teachers (AAPT) session at the APA Eastern meeting in Philadelphia about a new approach to teaching argument analysis that I've been developing. Here's the abstract of the talk (with a link to the handout below):

Many students take only a single course in philosophy, oftentimes in argument analysis (i.e., 'introductory logic' or 'critical thinking'). I suggest that such a course should aim to make students competent consumers of the full variety of arguments they are likely to encounter across the disciplines and in their daily lives. I argue that an effective way to meet this objective is to model explicitly the process an expert uses to identify and evaluate the structure of an argument, ideally in a context of obvious relevance to the students. I demonstrate how this can be done with a 'field guide' that students use to identify argument types or inference forms as they occur in real-world settings such as position papers, scientific articles, and court decisions.

By a 'field guide', I mean an identification key like those used to identify birds or trees. In this case, a guide consists of a set of descriptions of the basic argument types along with a set of questions that guide the user to a correct identification. In using the questions to make an identification, a student is encouraged to systematically draw the distinctions that separate one argument type (or fallacy) from another (e.g., argument by analogy versus inference to the best explanation). It is widely acknowledged that making the expert process explicit is essential for mastering complex skills [1]. It is also widely acknowledged that for motivating the study and practice of complex skills, problem-based learning (PBL) is an effective tool [2]. Teaching argument analysis with a field guide lends itself naturally to a PBL format. I share a sample course design, and invite the audience to contribute to the on-line community developing The Argument Guide, a set of open-access, open-source tools for implementing problem-based learning with a field guide.

1. Ambrose, S. A., Bridges, M., DiPietro, M., Lovett, M. & Norman, M. How learning works: seven research-based principles for smart teaching. (Jossey-Bass, 2010).
2. McKeachie, W. J. McKeachie’s teaching tips: strategies, research, and theory for college and university teachers. (Houghton Mifflin, 2006).

In case you'd like a copy of the handout I used for my presentation, you can find it here.

# Invertebrate cognition: why slugs and bugs are smarter than you think

Over the last decade or so, 'invertebrate cognition' has become a respectable phrase. Some extraordinary new experiments and theoretical discoveries about the capacities of tiny nervous systems have forced us to recognize that many or most invertebrates possess behavioral and cognitive repertoires that rival those of big-brained vertebrates. One might wonder what lessons this realization holds for the philosophy of mind. I recently had the pleasure of speaking to the students of Walter Ott's "Animal Minds" class at the University of Virginia on just this question (you can find my presentation here). Aside from an overview of the richness of invertebrate cognitive life, I defended two theses: (i) the variety of cognitive capacities exhibited by an organism depends on factors like modularity, not size; and (ii) the use of computer metaphors often obscures the fact that there are lots of ways (structurally and algorithmically) to implement learning.

Let's start with the richness of invertebrate cognition. Most everyone has heard about the surprising capacities of octopuses to solve problems (like opening a screw-top jar to get at a yummy crab) or to learn from observation. If you haven't, then check out the videos listed at the bottom of this post right away. What has gone largely unnoticed is the extent to which similar abilities have been shown to manifest throughout the animal kingdom. My personal favorites are the insects. To a first approximation, all species of multicellular animals are insects. Their diversity of form and lifestyle is breathtaking. And so too, it turns out, are their cognitive capacities. Perhaps unsurprisingly, insects (honeybees and the fruit fly, Drosophila, for instance) are capable of operant learning. (That's when an animal learns the relationship between its own actions and the attainment of a desirable state of affairs.) More surprisingly, crickets have been shown to learn from observation (Giurfa & Menzel, 2013). Specifically, they learned to hide under leaves in the presence of wolf spiders by watching their conspecifics do so. That's definitely not a habit you can pick up through classical conditioning, since the negative reinforcement for making the wrong choice is death.

As fascinating as cricket behavior might be, honeybee cognition is where all the action is, at least so far as experiments have revealed. Honeybees are relatively big-brained for insects (they have nervous systems of about 1 million neurons). And they are capable of some very impressive things, including categorization, concept formation, and processing multiple concepts at once. The first involves learning to treat discernible stimuli as the same. For instance, bees can be trained to classify images as 'flowers', 'stems', or 'landscapes' (Zhang et al., 2004). Once they've learned the categories, they will reliably group brand new images they've never encountered before into the right category. So-called 'concept formation' involves classifying things on the basis of relational properties that are independent of the physical nature of the stimulus. Bees can learn relations such as 'sameness/difference', 'above/below', and 'different from'. Honeybees can even handle two of these concepts at once, choosing, for instance, the thing which is both above and to the right (Giurfa & Menzel, 2013). But that's not the end of it. Honeybees communicate symbolically (via the famous 'waggle dance'), build complex spatial maps that let them dead-reckon their way home, and can count up to four (Chittka & Niven, 2009; Dacke & Srinivasan, 2008). That last one is particularly striking.
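For readers unfamiliar with dead reckoning, here is a minimal sketch of the textbook calculation (often called path integration). It makes no claim about how a bee's nervous system actually does it. The function name `home_vector`, the list-of-legs input, and the convention that headings are measured in degrees counterclockwise from east are all assumptions made for illustration.

```python
import math

def home_vector(legs):
    """Return (bearing_degrees, distance) pointing from the current position back to the start.

    `legs` is a sequence of (heading_degrees, distance) pairs describing the
    outbound trip. Headings are measured counterclockwise from east
    (an illustrative convention, not anything taken from the bee literature).
    """
    # Sum the displacement of every leg to get the current position.
    x = sum(d * math.cos(math.radians(h)) for h, d in legs)
    y = sum(d * math.sin(math.radians(h)) for h, d in legs)
    # The way home is the opposite of the accumulated displacement.
    distance = math.hypot(x, y)
    bearing = math.degrees(math.atan2(-y, -x)) % 360
    return bearing, distance

if __name__ == "__main__":
    # Fly 3 units east, then 4 units north; home lies 5 units away.
    print(home_vector([(0, 3), (90, 4)]))
```

The only state the traveler needs to carry is the running sum of its own displacements, which is part of why this strategy is plausible for a small nervous system.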

What I find even more incredible, however, is just how simple a nervous system can be and yet exhibit rich learning capacities. Even the lowly roundworm, Caenorhabditis elegans, with its meager 302 neurons, is capable of learning by classical conditioning, differential conditioning, context conditioning, and more (Rankin, 2004). For a rich overview of the current state of the art regarding the full range of studied invertebrates, from slugs to bugs, check out the book Invertebrate Learning and Memory edited by Menzel and Benjamin.

As an aside, there's more than one way a brain can be small. On the one hand, you might reduce the number of neurons, as in C. elegans. On the other, you might reduce the size of the neurons. There is currently a sub-field of neurobiology that considers the ways in which nervous systems have been miniaturized. I won't say much about this except to share an example that completely blows my mind every time I look at it. Here's a figure from Niven & Farris (2012):

Extreme reduction in body size in an insect. (A) SEM of an adult Megaphragma mymaripenne parasitic wasp. (B) The protozoan Paramecium caudatum for comparison. The scale bar is 200 μm. Adapted from [5].

As the caption indicates, that's a wasp next to the single-celled protozoan Paramecium. The micrographs are on the same scale! That wasp, which has a nervous system of about 4000 neurons, is only twice the size of a human egg cell.

Finally, what about those ubiquitous computer metaphors? I said at the beginning that we are often led astray when we speak of brains in terms of computers. Specifically, we tend to assume that a given input-output relation exhibited by a particular organism in a learning task is realized by an algorithm similar to the one we would choose to accomplish the task. Furthermore, we often assume that the algorithm will be implemented in structures closely analogous to the kinds of computational devices we are used to working with, in other words, computers with a von Neumann architecture. I suggest that both of these assumptions are suspect.

To make the point for the students of Prof. Ott's class, I asked them to think about solving a maze. What sort of brain would we have to give a robot that can solve a modestly complex maze? Think about the tasks it would need to accomplish. It needs to control at least a couple of motors in order to propel itself and change direction. It needs some way to process sensory information in order to know when it is approaching a wall. For efficiency, it would be great if it had some sort of memory or built an explicit representation of the maze as it went along. If you want the robot to perform robustly, even more is required. It has to be able to get unstuck should it get jammed in a corner, it has to be able to detect when it has capsized and execute a procedure to right itself, and so on.

All of these considerations might lead you to believe that you need something like this:

This is a robot I've been building with my children. There's some debate about its name (Drivey versus Runny). The salient point is that its brain is an Arduino Uno, a hobbyist board built around the ATmega328. That's a microcontroller with about 32 kilobytes of programmable memory. Some clever young inventors have put exactly this board to use in solving mazes, uploading programs that accomplish many or most of the tasks I mentioned above. Here's one example:

With these requirements for a maze solver in mind, I then performed the following demonstration (the video is not actually from the class at UVa - that's my son's hand inserting the robot in a home demonstration):

If you watched the above video, what you saw was a small toy (a Hexbug Nano) solve an ungainly Lego maze in a matter of seconds. More to the point, the toy has no brain. That is, there is no microcontroller, microprocessor, or even integrated circuit in the whole thing. The only moving part is a cell-phone vibrator. That brainless machine (which is really a random walker) can right itself if it tips over, almost never gets stuck, and---as you can see---rapidly accomplishes the task that our computer metaphors led us to believe would require significant computing power. You might object that in some sense parts of the relevant algorithm are built into the shape and mechanical properties of the toy. And you'd be right. My point is that there is more than one way to accomplish a cognitive task, and far more than one way to implement an algorithm. Think about that the next time you watch a slug ambling across a garden leaf---it's much smarter than you think.
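The spirit of the Hexbug demonstration can be mimicked in a few lines of Python. This is only a sketch under stated assumptions: the grid maze `MAZE`, the functions `find` and `random_walk`, and the step limit are all hypothetical, and a real Hexbug moves continuously and exploits its body shape rather than hopping between grid cells. The simulated walker keeps no map and no memory of where it has been; it picks a direction at random and moves if no wall is in the way.

```python
import random

# Hypothetical maze: '#' is a wall, '.' is open floor, 'S' is the start, 'E' is the exit.
MAZE = [
    "#########",
    "#S..#...#",
    "#.#.#.#.#",
    "#.#...#.#",
    "#.#####.#",
    "#......E#",
    "#########",
]

def find(maze, ch):
    """Return the (row, column) of the first cell containing `ch`."""
    for r, row in enumerate(maze):
        c = row.find(ch)
        if c != -1:
            return (r, c)
    raise ValueError(f"{ch!r} not found in maze")

def random_walk(maze, rng, max_steps=100_000):
    """Count the attempted moves a memoryless walker needs to reach the exit.

    Returns None if the exit is not reached within `max_steps` attempts.
    Bumping into a wall counts as an attempt but leaves the walker in place.
    """
    pos = find(maze, "S")
    goal = find(maze, "E")
    moves = [(-1, 0), (1, 0), (0, -1), (0, 1)]
    steps = 0
    while pos != goal:
        if steps >= max_steps:
            return None
        dr, dc = rng.choice(moves)
        nr, nc = pos[0] + dr, pos[1] + dc
        # The maze is fully enclosed by walls, so indices never leave the grid.
        if maze[nr][nc] != "#":
            pos = (nr, nc)
        steps += 1
    return steps

if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so repeated runs give the same walk
    print(random_walk(MAZE, rng))
```

The walker usually takes far more steps than the shortest path requires, but it gets there without anything resembling a controller, which is the point of the demonstration.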

References:

Chittka, Lars, and Jeremy Niven. 2009. “Are Bigger Brains Better?” Current Biology 19 (21): R995–1008. doi:10.1016/j.cub.2009.08.023.

Dacke, Marie, and Mandyam V. Srinivasan. 2008. “Evidence for Counting in Insects.” Animal Cognition 11 (4): 683–89. doi:10.1007/s10071-008-0159-y.

Giurfa, Martin, and Randolf Menzel. 2013. “Cognitive Components of Insect Behavior.” In Invertebrate Learning and Memory. Burlington: Academic Press.

Menzel, Randolf, and Paul Benjamin. 2013. Invertebrate Learning and Memory. Burlington: Academic Press.

Niven, Jeremy E., and Sarah M. Farris. 2012. “Miniaturization of Nervous Systems and Neurons.” Current Biology 22 (9): R323–29. doi:10.1016/j.cub.2012.04.002.

Perry, Clint J., Andrew B. Barron, and Ken Cheng. 2013. “Invertebrate Learning and Cognition: Relating Phenomena to Neural Substrate.” Wiley Interdisciplinary Reviews: Cognitive Science 4 (5): 561–82. doi:10.1002/wcs.1248.

Rankin, Catharine H. 2004. “Invertebrate Learning: What Can’t a Worm Learn?” Current Biology 14 (15): R617–18. doi:10.1016/j.cub.2004.07.044.

Roth, Gerhard. 2013. “Invertebrate Cognition and Intelligence.” In The Long Evolution of Brains and Minds, 107–15. Springer Netherlands. http://link.springer.com/chapter/10.1007/978-94-007-6259-6_8.

Zhang, Shaowu, Mandyam V. Srinivasan, Hong Zhu, and Jason Wong. 2004. “Grouping of Visual Objects by Honeybees.” Journal of Experimental Biology 207 (19): 3289–98. doi:10.1242/jeb.01155.