# Günther Palm - Research Topics

In my scientific work I have always been interested in understanding and developing mathematical ideas (proofs, definitions, generalizations) on various topics, in particular information theory. After my dissertation on 'dynamical entropy' in more or less pure mathematics, I definitely wanted to study something real that perhaps needed some kind of mathematical approach, and I was convinced by Valentin Braitenberg that the working of the brain is the most fascinating topic of this kind. Today I can say that most of my work was motivated by two main questions:

1) the functioning of the human brain and

2) the relationship between information and entropy.

Towards the end of my mathematics studies in Tübingen in 1973 I became interested in both topics and specialized both in biomathematics and in information and ergodic theory. For my diploma (in 1974) and Ph.D. (in 1975) I worked on a generalization of dynamical entropy in ergodic theory. After that I accepted the invitation of Valentin Braitenberg to work on information processing in the brain and also as a mathematical consultant to experimental researchers at the MPI for Biological Cybernetics in Tübingen, where I also cooperated for a while with Tomaso Poggio on nonlinear system analysis and with David Marr and later Manfred Fahle on the visual system. In a project with Gerald Langner and Henning Scheich I also worked a bit on the auditory system, which was useful for my work on speech recognition.

In Berlin, as a fellow of the Wissenschaftskolleg during the year 1985/6, I took up my work on information theory again and created a first draft of a book on this topic ('Novelty, Information and Surprise'; Springer 2012). I also used this year to reflect on the methodological and epistemological problems in the relation between experimental and theoretical research in the life sciences (as opposed to physics). Here the simple Popperian scheme of predictions and falsifications seems rather naive, since real predictions are rarely made and almost always falsified. Instead, other mathematical ideas have to be considered: the formulation and statistical testing of hypotheses, and the approximative flexibility of mathematical models (see nonlinear system analysis). Indeed, highly flexible models (with many adjustable parameters, for example neural networks) are very useful in grasping complex phenomena, but this also means that they are hard to falsify and cannot directly be used and interpreted in the classical Popperian falsification scheme. Instead one usually has to single out particular 'important' parameters and resort to statistical testing of hypotheses concerning their values.

Back in Tübingen, again in closer connection with the experimental life scientists, and afterwards as a professor of Theoretical Brain Research in Düsseldorf and of Neural Information Processing or 'Neuroinformatik' in Ulm, my research was mainly on the two topics, the brain and information. Of course my institute in Ulm, being part of the computer science department, was also working on various applications of artificial neural networks, in particular in machine learning and pattern recognition, concerning aspects of information fusion and the handling of uncertainty. With these topics in mind I took part in the foundation of two large cooperative research projects (so-called SFBs), one on multi-sensory-motor systems (or artificial animals) and one on companion technology.

1) Information processing in the brain.

Like many others, I was most fascinated by the problem of understanding how we use our brains to think. As a starting point I read a lot of neuroscience reviews and the inspiring book by Donald Hebb, 'The Organization of Behavior' (1949). Unlike most of the neuroscientific work on sensory-motor systems in various animals (also a truly fascinating subject), Hebb was arguing about the problem of internal representation of objects, things or even whole situations on a sufficiently high level to be able to think about them. This led to his famous concept of the cell assembly, i.e. a distributed representation consisting of a set of co-activated (and also strongly interconnected) neurons distributed potentially all over the cerebral cortex (and maybe involving some subcortical neurons as well). The first question I tried to answer was 'How large are Hebbian assemblies?'. To answer this question I considered the 'formation' of assemblies, i.e. of strengthened synaptic connections between their neurons by Hebbian synaptic plasticity, where assemblies are viewed as patterns stored in an (auto-) associative memory. In this context one can optimize the size of these patterns in terms of an information-based criterion which I called the storage capacity. The result of this optimisation was not a fixed number, but the more general observation that these assemblies have to be sparse activity patterns, i.e. only a very small percentage of the neurons in a cortical area belong to a typical local assembly there.
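To make the idea of sparse patterns stored in an auto-associative memory more concrete, here is a minimal sketch of a binary associative memory with clipped Hebbian learning (a standard textbook model of the Willshaw type). It is only an illustration under stated assumptions: the function names, the network size and the retrieval threshold are hypothetical choices and are not taken from the work described above.

```python
import random


def make_sparse_pattern(n, k, rng):
    """Return a sparse binary pattern as the set of its k active neurons (out of n)."""
    return frozenset(rng.sample(range(n), k))


def store(patterns, n):
    """Clipped Hebbian learning: the synapse i -> j is switched on (set to 1)
    as soon as neurons i and j are co-active in at least one stored pattern."""
    w = [[0] * n for _ in range(n)]
    for pattern in patterns:
        for i in pattern:
            for j in pattern:
                w[i][j] = 1
    return w


def retrieve(w, cue):
    """One-step retrieval: a neuron fires if it receives input from every cue neuron."""
    threshold = len(cue)
    return frozenset(
        j for j in range(len(w)) if sum(w[i][j] for i in cue) >= threshold
    )


def memory_load(w):
    """Fraction of synapses that have been switched on."""
    n = len(w)
    return sum(map(sum, w)) / (n * n)


if __name__ == "__main__":
    rng = random.Random(0)
    # Assumed toy sizes: 200 neurons, 5 active per pattern, 40 stored patterns.
    n, k, m = 200, 5, 40
    patterns = [make_sparse_pattern(n, k, rng) for _ in range(m)]
    w = store(patterns, n)
    cue = frozenset(sorted(patterns[0])[:3])  # incomplete cue: 3 of the 5 neurons
    completed = retrieve(w, cue)
    print("pattern recovered:", patterns[0] <= completed)
    print("spurious neurons:", len(completed - patterns[0]))
    print("memory load:", memory_load(w))
```

In this sketch a stored pattern is always contained in the retrieved set when the cue is part of that pattern. Spurious extra neurons become more likely as the memory load grows, which is one way to see why small, sparse patterns are favourable.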

These ideas formed the basis of my first book 'Neural Assemblies - an alternative approach to artificial intelligence' (Springer, 1982). Later, when I started to work on more applied projects (for example as part of the first German initiative towards 'Neuro-Informatik', a term created at that time in the late 1980s), moving further into computer science, I realized that the same ideas around the topic of neural representation were also very useful in the conception and development of artificial neural networks with their multiple representation layers. In particular, the idea of sparseness, both of neural representations and of neural connectivity, seems to be a recurring theme. Today, due to several technical advancements, these ideas have again come to fruition in the recent applications of 'deep' multilayer networks.

In the early years of this century we started to work more concretely on the use of distributed Hebbian assemblies in language understanding (and production) in a joint European project called 'MirrorBot'. In our view the processing of speech and language could be modeled as the interaction of several cortical areas in a network connected by 'long distance' cortico-cortical projections. In this model we assumed that at the microscopic level these connections are formed by Hebbian synaptic plasticity, both in the usually bidirectional cortico-cortical projections between different areas (hetero-association) and in the local intracortical connections inside each area (mostly by auto-association). Theoretically, such networks of associative memories can perform any calculation (Turing universality) and in our simulations the understanding of simple sentences was also achieved in a quite plausible way, both from a neuroscientific and from a psychological point of view.

Following this work I realised that it will be hard, if not impossible, to localise the essence of human thinking in a particular brain area, because it happens by the concerted coordinated interaction of many areas and there may not even be a single localised place in the brain which is responsible for the coordination (see my paper 312). In terms of psychological or cognitive functionality, thinking probably arises from the close interaction of two 'systems' or types of processes that are running concurrently, called 'the fast and the slow system' by Daniel Kahneman. The *fast system* (or system 1) is our biological heritage and has been studied extensively in neuroscience. It runs automatically and seemingly effortlessly, leading us safely through our daily sensori-motor routines and acquired habits. It uses most of the neurons in our brains in a massively parallel fashion (although the representations are often sparse) and is perhaps responsible for most of the energy consumption measured in fMRI. Only when something unusual happens, or when we believe that we have to make a far-reaching decision, may we take the additional cognitive effort to engage the *slow system* (or system 2), which helps us to do logical thinking, reasoning and numerical or mathematical calculations. These particular faculties obviously have strong roots in human culture, social and language systems and may not be accessible to biological research in other animals. Still, the deliberate choices we have to make when using these faculties are guided by (the intuitions of) our biological fast system. These 'higher cognitive' faculties do not come easily to most of us and usually we try to avoid them. This was perhaps the main reason to invent and build computers in the first place, to take some of this burden from us.

In technical applications of computer science, in particular in the development and use of classical 'artificial intelligence', the invention and proliferation of so-called rule-based systems has led to the realisation that for most 'real-world' applications they are insufficient and have to be combined with systems of the 'fast' kind, like artificial neural networks, that have more 'biological' perceptual and learning abilities. It has turned out, however, that the creation of such 'hybrid' technical systems is far from being an easy task. The beauty of genuine human thinking seems to be that our more 'rule-based' system 2 is guided by the biological intuition of our fast system 1. If it works, the smooth interaction of the two systems can lead to beautiful and almost miraculous inventions, argumentations and mathematical proofs. But if it doesn't work it can also produce strange errors, fallacies or jumps to wrong conclusions, which have been studied in depth by a branch of cognitive psychology including (and perhaps started by) Kahneman and Tversky.

2) Information and Entropy.

The notion of entropy arose in physics in the 19th century and its mathematical formulation was given by Boltzmann in his endeavor to form a solid bridge between mechanics (of molecules) and thermodynamics. The notion of information comes from the highly subjective world of human communication; it was formalised in the 20th century by Claude Shannon (perhaps following hints by Norbert Wiener) and subsequently used in technical communication and coding theory, but also as a more 'solid' measurement in (sensory) psychology.

To a mathematician it is surprising that these two terms have essentially the same mathematical formulation in terms of the logarithm of a probability, in spite of their completely different origins. So I, like many others, was intrigued by the question whether it is possible to view information as a special kind of (negative) entropy or vice versa. My own work on this subject has led me to the perhaps rather unusual resolution that entropy can be considered as a special kind of information, namely the information about thermodynamical properties of a system that may be of interest to a physicist or engineer. As such it is actually outside classical physics since it is about the relation between the physical world and the human observer or engineer. So entropy belongs more to engineering (in particular chemical engineering) than to physics, and the famous second law of thermodynamics is not really a law of physics, but rather a law of engineering.
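For readers who want to see the formal similarity spelled out, the standard textbook forms of the two quantities are given here. The symbols are conventional and are not taken from the text above: $p_i$ is the probability of the $i$-th message or microstate, $k_B$ is Boltzmann's constant, and $W$ is the number of equally probable microstates.

$$
H = -\sum_i p_i \log_2 p_i
\qquad\text{and}\qquad
S = -k_B \sum_i p_i \ln p_i ,
\quad\text{which for } p_i = \tfrac{1}{W} \text{ reduces to } S = k_B \ln W .
$$

Apart from the constant $k_B$ and the base of the logarithm, both expressions are the expected value of the negative logarithm of a probability.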

What led me to this opinion was my attempt to generalize Shannon's definition of information. This started as a mathematical project extending some of my thesis work on the so-called Kolmogorov-Sinai entropy in ergodic theory towards a general meaningful definition of entropy, or rather information, for overlapping sets of propositions instead of *partitions* (i.e. collections of pairwise disjoint sets). From a mathematical point of view I simply extended Shannon information from *partitions* to arbitrary *covers*. This entailed the distinction of information and what I called novelty, and of several types of *covers*, in particular the so-called templates (see my book 'Novelty, Information and Surprise' 2012).

Let me try to explain the basic ideas involved in this approach again in the context of thermodynamics (Chapter 14 of this book): In classical physics an 'observation' of a physical (or other) system is often described as a real function on its state space, i.e. given a particular state x of the system you describe it by 'measuring' a particular value f(x); you may also describe it by several such measurement values. In my slightly more general formalism an observation of a physical system is described by a true statement about the state x; for example it could be 'f(x) = a', but also 'a < f(x) < b'. So I introduce what I call a description, i.e. a particular kind of mapping that maps a state x into a true statement about x (which also corresponds to a set A that contains x).
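The notion of a description can be illustrated with a small sketch. It assumes, in line with the 'logarithm of a probability' formulation mentioned earlier, that the novelty of a true statement with probability p is -log2(p), and that the novelty of a description is the average of this value over all states. The toy system (a fair die) and all function names are hypothetical and serve only as an illustration; the precise definitions are in the book cited above.

```python
import math


def novelty(description, prob):
    """Average novelty of a description.

    description: maps a state x to the set of states for which the chosen
                 statement is true (this set must contain x itself).
    prob:        dict mapping each state to its probability.
    """
    total = 0.0
    for x, px in prob.items():
        statement = description(x)
        if x not in statement:
            raise ValueError("a description must return a true statement about x")
        p_statement = sum(prob[y] for y in statement)
        total += px * -math.log2(p_statement)
    return total


# Toy system (an assumption for illustration): a fair die with states 1..6.
die = {x: 1 / 6 for x in range(1, 7)}


def exact(x):
    """Statement 'f(x) = a': a partition into single states."""
    return {x}


def parity(x):
    """Statement 'x is even' or 'x is odd': a coarser partition."""
    return {y for y in die if y % 2 == x % 2}


def at_least(x):
    """Statement 'f(x) >= a' with a = x: overlapping sets, i.e. a cover."""
    return {y for y in die if y >= x}


if __name__ == "__main__":
    for d in (exact, parity, at_least):
        print(d.__name__, novelty(d, die))
```

For the two partitions the result coincides with the Shannon information of the partition (log2(6) bits and 1 bit). The third description uses overlapping sets, so it lies outside the partition-based definition and shows what the extension to covers adds.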

To put it a bit more metaphorically: A typical experimentalist views the world through his particular spectacles, which show the value(s) obtained by his measuring device(s). A more generic human observer views the world through his particular spectacles, which tell him a true statement about the state x (from the collection of all those statements the observer might be interested in, also called his *template*). Now I am suggesting that the template of a thermodynamical engineer or observer just consists of the statements 'f(x) ≥ a' for one (or several) typical 'thermodynamical' measurement variable(s) f. And the novelty provided by this template or description is the physical entropy.

When a physical system evolves from a particularly prepared or incidentally observed state which provides a high amount of information or novelty for any particular template, it will most likely move into a state with lower novelty (the lower the novelty, the higher the probability). This is essentially the second law of thermodynamics.