Statistical foundations of ecological rationality

If we reassess the rationality question under the assumption that the uncertainty of the natural world is largely unquantifiable, where do we end up? In this article the author argues that we arrive at a statistical, normative, and cognitive theory of ecological rationality. The main casualty of this rebuilding process is optimality. Once we view optimality as a formal implication of quantified uncertainty rather than an ecologically meaningful objective, the rationality question shifts from being axiomatic/probabilistic in nature to being algorithmic/predictive in nature. These distinct views on rationality mirror fundamental and longstanding divisions in statistics.
(Published in Economics: The Open-Access, Open-Assessment E-Journal 14 (2020-2), www.economics-ejournal.org, Special Issue Bio-psycho-social foundations of macroeconomics)
JEL A12 B4 C1 C44 C52 C53 C63 D81


Introduction
"How do human beings reason when the conditions for rationality postulated by the model of neoclassical economics are not met - for example, when no one can define the appropriate utility function?" (Simon, 1989, p. 377)
"It is sometimes more rational to admit that one does not have sufficient information for probabilistic beliefs than to pretend that one does." (Gilboa et al., 2012, p. 28)
If we reassess the rationality question under the assumption that the uncertainty of the natural world is largely unquantifiable, where do we end up? Organisms rely on an ability to make accurate inferences from limited observations of complex, uncertain, and unstable environments, and an answer to the rationality question should formulate both the nature of this problem and its solution. The orthodox answer is that the nature of the problem is probabilistic inference and the nature of the solution is probabilistic optimality.
If we approach the rationality question with this view in mind then the prospects appear slim for a formally viable, normative theory of rational action when probabilities can't be quantified. In this article I argue that this view neglects alternative formulations of the problem of statistical inference, and the statistical theory of ecological rationality in particular (Brighton and Gigerenzer, 2007; Gigerenzer and Brighton, 2009; Brighton and Gigerenzer, 2012; Brighton, 2018). By setting out the statistical foundations of ecological rationality, my argument will undermine the claim that bounded and ecological rationality offer no normative challenge to orthodox rationality. The principal casualty in this statistical reassessment of the rationality question is optimality. Once we view optimality as a formal implication of quantified uncertainty rather than an ecologically meaningful objective, the rationality question shifts from being axiomatic/probabilistic in nature to being algorithmic/predictive in nature. In short, by reassessing the rationality question in this way, we end up not with a revised solution to the same statistical problem but a theory of rationality that responds to a different statistical problem.
I will first argue that the implications of deepened uncertainty are often obscured by a cluster of commonly held statistical intuitions collectively termed the bias bias (Brighton and Gigerenzer, 2015). The bias bias leads many researchers in several disciplines to neglect the relationship between two critical and controllable components of prediction error, bias and variance. Bias is widely understood, at least intuitively, and reflects the ability of a model to accurately capture systematic regularities in observations. The variance component of prediction error reflects the sensitivity of a model's predictions to different observations of the same problem, such as a different sample from the same population. These two components additively contribute to expected prediction error (O'Sullivan, 1986; Geman et al., 1992; Hastie et al., 2001; Bishop, 2006). Unlike bias, the role of variance is less intuitive and often neglected, causing a bias bias in statistical thinking that simplifies the role of uncertainty, masks a wide range of predictive models from consideration, and leads to the development of questionable theories and poorly justified policies. To illustrate the bias bias I will examine examples of social and economic systems that pose problems without optimal solutions.
Once the statistical pathologies associated with the bias bias have been clarified, I argue that the bias bias also characterizes the orthodox formulation of rational decision making under uncertainty. This second stage of my argument appeals to Breiman's (2001) distinction between two cultures of statistical modeling known as data modeling and algorithmic modeling. Data modeling characterizes much of traditional statistical inquiry and proceeds by conjecturing a data generating model. Algorithmic modeling proceeds under the assumption that the data generating model is indeterminable or non-existent, and does so by analyzing the relative ability of competing learning algorithms to incur low prediction error. This distinction echoes deeper distinctions in statistics and information theory (Geisser, 1993; Rissanen, 2007; Shmueli, 2010). I then argue that these distinct modes of statistical inquiry map directly onto the practices of orthodox rationality and ecological rationality, and, crucially, justify different kinds of rationality claim (Brighton, 2018). Claims of ecological rationality refer to the ability of cognitive algorithms, such as simple heuristics, to incur low prediction error relative to alternative algorithms. Such claims neither imply optimal functioning, nor do they require that an optimal response be determinable in order to be established, explained, or justified.
The strategy of extending the categories of uncertainty assumed when formulating the rationality question, and then examining how robust established rationality principles are to these revised uncertainty conditions, is familiar but neglected (Binmore, 2009). A central concern for Savage (1954) was the category of small worlds that he assumed when formulating the foundations of Bayesian decision theory. Small worlds are those that can be modeled using a decision matrix defining a mutually exclusive and exhaustive set of states of the world, consequences, and actions that map between them. If we assume, as many do, that "the worlds of macroeconomics and high finance certainly don't fall into this category" (Binmore, 2009, p. 2) then this raises the question of which categories these worlds do fall into. In this article I focus on problems characterized not only by unmeasurable uncertainty (Knight, 1921; Kozyreva and Hertwig, 2019) or probabilistic ambiguity under the subjective interpretation of probability (e.g., Gilboa and Schmeidler, 1989; Machina and Schmeidler, 1992; Epstein, 1999), but by unquantifiable uncertainty under any definition of probability, and to the extent that it is more rational to regard generating distributions as non-existent than to view them as the foundation for formulating the rationality question. As the following example illustrates, worlds of this kind are not hard to find.

Bias, Variance, and the Bias/Variance Dilemma
Consider a serial offender who has committed a series of crimes while living at a single home location. Given only the locations of these crimes, how accurately can the location of the offender's home be predicted? I will use this well-studied problem in geographical criminal profiling to illustrate the relationship between bias, variance, and prediction error. Figure 1(a) depicts a map covering an area of roughly 30 km² of Baltimore County, Maryland, USA. Superimposed on this map are 15 blue circles identifying the locations of a series of burglaries committed by a serial burglar residing at a single address¹. In addition to these locations, Figure 1(b) plots the true home location of the offender and the home locations predicted by two geographical profiling models. The first model is the centroid method that predicts the "center of gravity" of the crime locations, which is simply the mean x-coordinate and mean y-coordinate of the observed crimes. The second model computes a probability surface over the crime area using an exponential decay function. Given a crime area divided into an array of cells, this model estimates the probability that each cell contains the home address of the offender. The exponential decay function models the finding that the probability of an offender committing a crime at a distance d from home decreases as a negative exponential function of d. By integrating the probabilities calculated from each crime location, the model estimates a probability surface over the crime area, shown in Figure 1(c). For this offender, the cell with greatest probability predicts the true home location with nearly zero error. The centroid method, on the other hand, predicts a point roughly 1.5 km north-west of the true home location.
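The two profiling models just described can be sketched in a few lines. This is a minimal illustration rather than the implementation behind Figure 1: the grid resolution, the decay rate, and the function names `centroid` and `decay_surface_peak` are assumptions, and the summed scores are left unnormalized because normalizing them into probabilities would not change which cell scores highest.

```python
import math


def centroid(crimes):
    """Centroid method: mean x-coordinate and mean y-coordinate of the crimes."""
    n = len(crimes)
    return (sum(x for x, _ in crimes) / n, sum(y for _, y in crimes) / n)


def decay_surface_peak(crimes, grid_size=50, decay_rate=1.0):
    """Return the centre of the grid cell with the highest summed decay score.

    Each crime contributes exp(-decay_rate * d) to a cell whose centre lies
    at distance d from that crime. grid_size and decay_rate are illustrative
    assumptions, not values taken from the article.
    """
    xs = [x for x, _ in crimes]
    ys = [y for _, y in crimes]
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)
    best_cell, best_score = None, -math.inf
    for i in range(grid_size):
        for j in range(grid_size):
            # Centre of cell (i, j) within the bounding box of the crimes.
            cx = x_min + (i + 0.5) * (x_max - x_min) / grid_size
            cy = y_min + (j + 0.5) * (y_max - y_min) / grid_size
            score = sum(
                math.exp(-decay_rate * math.hypot(cx - x, cy - y))
                for x, y in crimes
            )
            if score > best_score:
                best_cell, best_score = (cx, cy), score
    return best_cell
```

With three clustered crimes and one distant outlier, the centroid is pulled toward the outlier while the peak of the decay surface stays inside the cluster, which hints at why the two models can disagree on the same data.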
Is the probability surface model more accurate in general, or might this near perfect prediction be a lucky guess? One way of addressing this question is to evaluate the two models over a range of serial criminals (e.g., Block and Bernasco, 2009;Leitner and Kent, 2009;Levine and Block, 2011). In addition, further insights into this specific example can be gained by deepening the uncertainty conditions under which predictions are made.
For example, how would the two models perform when, rather than predicting the home [...] measure the prediction error of both models, and then report the mean prediction error of both models as a function of r. What is striking about Figure 1(d) is that when fewer than 9 crimes are observed the centroid method outperforms the probability surface model. In other words, when the uncertainty conditions are deepened the relative performance of the two models inverts. Our next step is to understand how the concepts of bias and variance help to explain why.
1 The data for this offender, labeled TS15D, can be found in the dataset supplied as part of the CrimeStat (Levine, 2010) [...]
Figure 1 (caption fragment): Panel (d) plots the prediction error of the two models as a function of the number of crimes sampled. The relative superiority of the two models inverts after 9 crimes have been observed. Note: Crime locations visually overlap in 5 cases.
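The analysis behind Figure 1(d), which reports mean prediction error as a function of the number of crimes r, can be sketched as a resampling loop. This is a hedged sketch under stated assumptions: subsets are drawn without replacement, error is the Euclidean distance to the true home, and the names `mean_error_by_sample_size`, `k`, and `seed` are hypothetical. The source does not specify these details.

```python
import math
import random


def centroid(crimes):
    """Centroid method: mean x-coordinate and mean y-coordinate of the crimes."""
    n = len(crimes)
    return (sum(x for x, _ in crimes) / n, sum(y for _, y in crimes) / n)


def mean_error_by_sample_size(crimes, home, model, k=200, seed=0):
    """Mean distance between predicted and true home for each sample size r.

    For every r from 1 to len(crimes), draw r crimes k times (without
    replacement, an assumption), apply the model, and average the error.
    """
    rng = random.Random(seed)
    errors = {}
    for r in range(1, len(crimes) + 1):
        total = 0.0
        for _ in range(k):
            subset = rng.sample(crimes, r)
            px, py = model(subset)
            total += math.hypot(px - home[0], py - home[1])
        errors[r] = total / k
    return errors


# Hypothetical crime locations arranged symmetrically around the home.
crimes = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0), (2.0, 2.0)]
errors = mean_error_by_sample_size(crimes, home=(1.0, 1.0), model=centroid)
```

Passing a second model to the same function and comparing the two error curves is what would reveal an inversion in relative performance of the kind reported for this offender.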

Decomposing Prediction Error Into Bias and Variance
Suppose we are given $n$ crime locations $c_1, \ldots, c_n$, where each crime location $c_i = (x_i, y_i)$ is a point in a compact 2-dimensional Euclidean space covering the crime area. For simplicity I will assume that the offender's home location $h$ also lies within the crime area. The task of the two models is then to map a series of crime locations to a point prediction $f$ of the offender's home location. Because the area that law enforcement may need to search before locating the offender's home will grow as a squared function of the prediction error, I will consider squared loss, and specifically, the squared Euclidean distance between the predicted and true home location, $\|h - f\|^2$. Now, at a given sample size $r$, analyses of the kind shown in Figure 1(d) sample $r$ crimes $k$ times to yield an ensemble of $k$ predictions $f^{(1)}, \ldots, f^{(k)}$. For a given sample size the expected error of a model can be decomposed as follows:
$$\text{Expected error} = (\text{bias})^2 + \text{variance} + \text{noise}. \tag{1}$$
The derivation of this decomposition for squared loss can be found in most machine learning textbooks (e.g., Duda et al., 2001; Hastie et al., 2001; Bishop, 2006) as well as a landmark article by Geman et al. (1992). For this problem the bias is the distance between the mean prediction of the ensemble defined above and the true location of the offender's residence. The mean prediction of the ensemble is simply the centroid of $f^{(1)}, \ldots, f^{(k)}$, denoted $\bar{f}$. Thus, bias is given by:
$$\text{bias} = \|\bar{f} - h\|.$$
Figure 2(a) provides a visual illustration of the relationship between an ensemble of 5 example predictions, the true home location, and bias. Variance is then the degree to which the individual model predictions (the members of the ensemble) vary about the mean prediction $\bar{f}$:
$$\text{variance} = \frac{1}{k} \sum_{i=1}^{k} \|f^{(i)} - \bar{f}\|^2.$$
Figure 2(b) illustrates the relationship between the same ensemble of 5 predictions above, the mean prediction $\bar{f}$, and variance.
Finally, the noise term in Equation 1 plays no role in this example, but could represent the additional and irreducible error we would incur if, say, an adversary added some normally distributed error to our predictions.
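For an ensemble of point predictions, the decomposition in Equation 1 (with no noise term) can be checked numerically. The sketch below assumes that bias is the distance from the mean prediction to the true home and that variance is the mean squared distance of the predictions from their mean, as described in the text. The function name `decompose` and the five example predictions are hypothetical.

```python
def decompose(predictions, home):
    """Return (expected squared error, squared bias, variance) for an ensemble."""
    k = len(predictions)
    mean_x = sum(x for x, _ in predictions) / k
    mean_y = sum(y for _, y in predictions) / k
    # Squared distance between the mean prediction and the true home location.
    bias_sq = (mean_x - home[0]) ** 2 + (mean_y - home[1]) ** 2
    # Mean squared distance of the ensemble members from their mean prediction.
    variance = sum(
        (x - mean_x) ** 2 + (y - mean_y) ** 2 for x, y in predictions
    ) / k
    # Mean squared distance of the ensemble members from the true home.
    expected_error = sum(
        (x - home[0]) ** 2 + (y - home[1]) ** 2 for x, y in predictions
    ) / k
    return expected_error, bias_sq, variance


# Five hypothetical predictions and a hypothetical true home location.
predictions = [(1.0, 1.0), (3.0, 1.0), (2.0, 4.0), (2.0, 2.0), (2.0, 2.0)]
expected_error, bias_sq, variance = decompose(predictions, home=(0.0, 0.0))
```

For these values the expected squared error is 9.6, which splits into a squared bias of 8.0 and a variance of 1.6.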

Using Bias and Variance to Analyze and Explain Relative Prediction Error
To better understand the relative performance of the two models shown in Figure 1 [...] to hold for this offender. The probability surface model also has high bias at low sample sizes, but its bias steadily decreases as the sample size increases, eventually approaching zero. If bias were the only concern then the probability surface model would clearly be superior.
Turning to variance, we see that the "trick" behind the centroid method is that it incurs remarkably low variance until roughly 14 crimes are observed, at which point the variance of both models approaches zero. The key insight here is that although the probability surface model is unbiased and makes a near perfect prediction, this is only true under complete information. When uncertainty is increased due to fewer crimes being observed, the probability surface model suffers from high variance, the upshot being that the biased, low-variance centroid method incurs lower total prediction error. Thus, the ability of a model to match, represent, or capture the underlying structure of the problem (its potential to incur low bias) is only one determinant of low prediction error. How robust the predictions of the model are to different realizations of the problem (its potential to incur low variance) will often prove critical.
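The same kind of reversal can be reproduced in a toy problem unrelated to the profiling models: estimating a mean from n noisy observations. The sketch below compares the unbiased sample mean with a deliberately biased estimator that shrinks the sample mean toward zero. The shrinkage factor, the true mean, and the noise level are illustrative assumptions, and the error formulas are the standard ones for squared loss.

```python
def mse_sample_mean(n, sigma):
    """Unbiased sample mean: zero bias, variance sigma^2 / n."""
    return sigma ** 2 / n


def mse_shrunk_mean(n, mu, sigma, shrink=0.5):
    """Shrunk estimator shrink * sample_mean: biased, but with lower variance."""
    bias_sq = ((shrink - 1.0) * mu) ** 2
    variance = shrink ** 2 * sigma ** 2 / n
    return bias_sq + variance


# Assumed true mean 1.0 and noise standard deviation 3.0.
for n in (5, 27, 100):
    print(n, mse_sample_mean(n, 3.0), mse_shrunk_mean(n, 1.0, 3.0))
```

Under these assumptions the biased estimator has lower expected error for n below 27 and higher expected error above it, so which estimator is preferable depends on how many observations are available.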

The Bias/Variance Dilemma
There will always be an infinite number of explanations that are consistent with our observations. Models code the assumptions needed to select which explanations we entertain in their functional form, their parameters, and in the constraints they impose on what values these parameters can be assigned. Assumptions are like bets. For example, we could hedge our bets by deploying a nonparametric model such as a multilayer neural network capable of representing a wide range of systematic patterns. The hope is that the network will incur low bias, but due to the large number of parameters needed to model layers of the network, we run the risk of incurring high variance. Another approach is to deploy a parametric method such as a linear model in the hope that the systematic patterns in the observations are governed by a linear function, or something close. This risk might be worth taking, given that the fewer parameters needed to specify the linear model could keep variance within acceptable bounds. An even bolder approach is to deploy a model with no free parameters. Here, we ignore the observations completely to guarantee zero variance. Unless this bold conjecture turns out to be correct, or close to correct, the problem is that the model will incur high bias.
These issues clarify that whenever uncertainty surrounds our choice of model, we face a bias/variance dilemma (Geman et al., 1992). The dilemma arises because techniques for reducing variance tend to increase bias, and techniques for reducing bias tend to increase variance. As we have seen, the number of observations also plays a critical role in determining the bias and variance incurred by a model. All told, predictive inference involves complex interactions between models, observations, the data generating process, and prediction error. These complexities are often glossed over, and the bias bias explains to some extent why.
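The dilemma can be made concrete with a small simulation. The sketch below uses simple stand-ins for the models discussed above: a fixed guess with no free parameters, a one-parameter mean model, and a two-parameter linear fit taking the place of the flexible model. The linear data generating function, the noise level, the evaluation point `x0`, and all names are assumptions made purely for illustration.

```python
import random


def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    return slope, y_bar - slope * x_bar


def bias_variance(x0=9.0, reps=2000, noise_sd=1.0, seed=1):
    """Estimate (squared bias, variance) of three models' predictions at x0."""
    rng = random.Random(seed)
    xs = [float(i) for i in range(10)]

    def truth(x):
        # Assumed data generating function.
        return 0.5 * x

    preds = {"fixed": [], "mean": [], "linear": []}
    for _ in range(reps):
        ys = [truth(x) + rng.gauss(0.0, noise_sd) for x in xs]
        preds["fixed"].append(0.0)  # ignores the observations entirely
        preds["mean"].append(sum(ys) / len(ys))  # one free parameter
        slope, intercept = fit_line(xs, ys)  # two free parameters
        preds["linear"].append(intercept + slope * x0)
    result = {}
    for name, values in preds.items():
        mean_pred = sum(values) / reps
        bias_sq = (mean_pred - truth(x0)) ** 2
        variance = sum((v - mean_pred) ** 2 for v in values) / reps
        result[name] = (bias_sq, variance)
    return result


result = bias_variance()
```

In this setup the fixed guess has exactly zero variance and the largest bias, while the linear fit has the smallest bias and the largest variance, with the mean model in between on both counts.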

The Bias Bias
To suffer from the bias bias is to develop, deploy, or prefer models that are likely to achieve low bias, while simultaneously paying little or no attention to models with low variance (Brighton and Gigerenzer, 2015). The bias bias manifests itself in a range of statistical intuitions and practices which over-simplify or ignore the complexities of the bias/variance dilemma. In this sense the bias bias approximates the problem of statistical inference. This approximation is justified when the data generating machinery is known with a high degree of certainty, or when the number of observations asymptotes. The following four examples illustrate the dangers of the bias bias when dealing with the uncertainty of the natural world, and social and economic systems in particular.

Four Examples of the Bias Bias
Example 1: Geographical Criminal Profiling. A guiding concern for geographical criminal profiling is achieving low prediction error, and the field progresses in large part through the competitive testing of diverse models ranging from the centroid method to sophisticated probabilistic models (e.g., Rossmo, 1999; Canter et al., 2000; Snook et al., 2005; Levine and Block, 2011). This seems like an unlikely field to harbor a bias bias, yet the bias/variance perspective appears to be unfamiliar to researchers in this area².
Unlike the analysis above, I have failed to find any studies that decompose prediction error into bias and variance or systematically examine, for a specific offender, prediction error as a function of the number of crimes sampled. While it is typical for studies to compare predictive models over a range of serial offenders who have committed varying numbers of crimes, this can only offer a limited insight into the potential of low variance models. Reversals in relative model performance, like we see in Figure 1(d), are masked by standard practices of model evaluation, and this is likely to steer model development away from techniques that incur low variance under conditions of heightened uncertainty.
Failure to explicitly consider and analyze bias and variance does not necessarily imply a bias bias, although it often goes hand in hand with a focus on bias reduction as the driving concern. This is reflected in the tendency of recent work to use increasingly complex probabilistic methods to model the factors thought to drive offender behavior (Block and Bernasco, 2009; Leitner and Kent, 2009; Levine and Block, 2011). I use the terms "simple" and "complex" here in a non-technical sense to refer to the degree to which one attempts to model the geographical, psychological, social, and economic factors likely to drive where and how many offenses an individual commits. In short, there is an argument to be made that the field of geographical criminal profiling suffers from a mild bias bias, principally because the concept of variance is unfamiliar, and common techniques for reducing variance appear to play an increasingly minor role in model development. On the other hand, this field scores highly on model diversity and there is a broad recognition that simple models are often hard to improve on. As the following example illustrates, the influence of the bias bias is often far stronger.
[...] an optimal solution to this problem (Markowitz, 1959). An overarching concern for this discussion is the nature of the assumptions needed to formulate optimality results. Under conditions of unquantifiable uncertainty these assumptions are likely to be violated, the upshot being that the status of "optimal" solutions must be relegated to "just another [...] exposing a bias bias. These conditions are not exceptional. Jagannathan and Ma (2003), to take another example, examined the role of bias and variance in the performance of portfolios constructed using regularization techniques that impose "wrong constraints", those that violate statistical characteristics of the population. These regularized portfolios, like 1/N, often improved investment performance.
We either view such results as to some extent foreseeable or accept that our intuitions suffer from a bias bias.
Example 4: Modeling Consumer Behavior. Managers in the retail industry often need to distinguish between active and inactive customers. One strategy is to use observations of past customer activity to estimate the parameters of a probabilistic model detailing the processes thought to drive customer behavior. For example, the Pareto/NBD model estimates the parameters of a Poisson process modeling customer purchasing behavior and the parameters of an exponential distribution modeling customer dropout rates (Schmittlein et al., 1987). Combined with further probabilistic assumptions about the heterogeneity of customers within the population, categorization decisions are then made using a maximum likelihood calculation (Fader et al., 2005). An alternative strategy is to deploy a simple hiatus rule where customers who have not made a purchase within a hiatus period of, say, 9 months are classified as inactive, and all other customers are categorized as active. These two approaches differ in how they prioritize either bias reduction, the ability to accurately model the regularities of the data-generating processes, or variance reduction, the ability to limit the instability of predictions. Pitting these two strategies against each other, Wübben and Wangenheim (2008) compared simple hiatus rules and the Pareto/NBD model using transaction data from the apparel, airline, and music industries. First, they used 40 weeks of customer transaction data to estimate the parameters of the Pareto/NBD model. Using transaction data for the subsequent 40 weeks, they then estimated how accurately each model predicted future customer activity. For the apparel, airline, and music customers, the Pareto/NBD model achieved predictive accuracies of 75%, 74%, and 77%. Hiatus rules with cutoff periods recommended by experienced managers, on the other hand, predicted customer activity with accuracies of 83%, 77%, and 77%.
Again, we see that prioritizing bias at the expense of variance appears to be the wrong choice, and the intuition that accurate probabilistic models of the data generating process will result in accurate predictions is another manifestation of the bias bias.
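To show how little machinery a hiatus rule needs, here is a minimal sketch. It is not the procedure used by Wübben and Wangenheim (2008): the 9-month default follows the illustrative value in the text, the treatment of a customer at exactly 9 months as still active is an assumption, and the customer records are invented for the example.

```python
def hiatus_rule(months_since_last_purchase, hiatus_months=9):
    """Classify a customer as inactive if the last purchase is older than the hiatus."""
    if months_since_last_purchase > hiatus_months:
        return "inactive"
    return "active"


def accuracy(predicted, actual):
    """Proportion of customers whose predicted status matches the observed one."""
    matches = sum(p == a for p, a in zip(predicted, actual))
    return matches / len(actual)


# Hypothetical customers: months since last purchase and later observed status.
months = [1, 3, 12, 10, 8, 24]
observed = ["active", "active", "inactive", "active", "active", "inactive"]
predicted = [hiatus_rule(m) for m in months]
print(accuracy(predicted, observed))
```

The rule has a single cutoff and estimates nothing from the transaction data, which is the sense in which it trades potential bias for low variance.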

Categories of Uncertainty
We err when making predictions, forecasts, and decisions because the world is uncertain and models provide only a distorted representation of the complex processes that determine [...] (Brighton and Gigerenzer, 2012).
Additional uncertainties arise when we consider that many systems are non-stationary. It is relative to these categories of uncertainty, which are by no means comprehensive, that the role of the bias bias should be considered. The statistical pathologies associated with the bias bias, recall, stem from viewing bias reduction as the primary means for reducing prediction error. This is equivalent to viewing model misspecification as key, which justifies a focus on high fidelity, causal models of the system being studied.
Notice, though, that the bias bias does not imply an argument against the development of potentially complex causal models. Rather, it is an argument against the idea that we necessarily discover more predictive models by doing so. Indeed, some degree of causal thinking is needed to develop models at all. Haldane and Madouros (2012), for instance, decided to use a single liquidity indicator as a predictor of bank failure. This choice can have a causal justification even though they chose to disregard other potential causes of bank failure.

Modeling Under Uncertainty: Bias, Variance, and Optimality
When seen through the lens of the bias bias, how might Examples 1-4 above shed light on the rationality question? Before making a direct connection to considerations of rationality, three points need to be considered:
1. None of the four preceding examples have optimal solutions. The social and economic processes at work are unstable, complex, largely latent, and by any measure deeply uncertain. Of course, there is nothing to stop us from quantifying subjective probabilities and then using probability theory to derive and justify an optimality claim. But in doing so there is a sense in which we resort to changing the problem rather than exploring the space of potential solutions. Indeed, formulating optimal solutions to the problems of criminal profiling, bank regulation, portfolio investment, and customer profiling would say more about our modeling techniques than the systems being modeled.
2. This brings us to the issue of the relationship between bias, variance, and the existence of an assumed data generating function. Given that the meaning and measurement of bias requires that the underlying data generating function is known, how can I claim that the bias/variance perspective offers an insight into problems without optimal solutions? My claim is not that the bias/variance perspective undermines the goal of optimality. Rather, it brings into focus the need to understand how, when, and why statistical models incur low prediction error outside of their implied optimality conditions, and relative to other models. The meaning and measurement of variance, though, does not presuppose or require knowledge of the data generating function. Examples abound of both justifying and explaining the performance of probabilistic models in terms of their relative ability to reduce variance outside of their optimality conditions (Hand and Yu, 2001; Domingos and Pazzani, 1997; Friedman, 1997; Ng and Jordan, 2002). [...] simplicity has proven to be a highly productive heuristic for discovering low variance models. They offer points of contrast to the popular assumption that advances in statistical modeling arise from greater sophistication and complexity (Hand, 2006).

Other perspectives on variance reduction include regularization (Chen and Haykin, 2002) and ensemble methods (Seni and Elder, 2010), the latter being an example of how averaging the predictions of potentially complex models can also reduce variance.
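The variance-reducing effect of averaging can be illustrated with a small simulation. This sketch makes a strong simplifying assumption: each ensemble member's prediction equals the truth plus independent Gaussian error. Real ensemble members are typically correlated, so the reduction seen in practice is smaller. The function name and parameter values are hypothetical.

```python
import random


def variance_of_predictions(ensemble_size, reps=5000, noise_sd=1.0, seed=2):
    """Variance of the averaged prediction of an ensemble of noisy members."""
    rng = random.Random(seed)
    truth = 0.0
    averaged = []
    for _ in range(reps):
        # Each member predicts the truth plus independent error (an assumption).
        members = [truth + rng.gauss(0.0, noise_sd) for _ in range(ensemble_size)]
        averaged.append(sum(members) / ensemble_size)
    mean_pred = sum(averaged) / reps
    return sum((p - mean_pred) ** 2 for p in averaged) / reps


single = variance_of_predictions(1)
ensemble = variance_of_predictions(10)
```

Under the independence assumption the variance of the averaged prediction falls roughly in proportion to the ensemble size, without any member becoming simpler.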
In summary, the outcomes of exploratory statistical analyses tend not to be optimality claims, the bias/variance trade-off is a familiar concept in statistics, machine learning, forecasting, and econometrics, and the potential benefits of simplicity in statistical modeling have long been known. My argument has on occasion been misinterpreted as a claim that these three points are novel, whereas my argument is in fact that these three points are novel considerations when formulating the rationality question. Put differently, one may be familiar with and subscribe to the statistical approach I have presented when modeling social, economic, and other systems, but somehow regard them as being irrelevant to the problem of formulating rational action in these same contexts. The next stage of my argument details how the preceding insights, and the bias bias in particular, can and should guide a reassessment of the rationality question under conditions of unquantifiable uncertainty.

Towards Ecological Rationality
By any measure the bias/variance perspective has failed to penetrate the orthodox study of decision making under uncertainty. Consider first the case of decision theory, where rational actors are seen as Bayesian maximizers of expected utility operating over a mutually exclusive and exhaustive set of future states of the world, consequences, and the actions that map between them (Savage, 1954). On this view rationality is primarily a subjective matter of axiomatic coherence rather than one of making rational inferences about the world. The cognitive and biological sciences tend to adopt a different attitude toward the rationality question by viewing rational decision makers as optimal Bayesian decision makers defined relative to a probabilistic model of the task environment (McNamara and Houston, 1980; Anderson, 1991b; Chater and Oaksford, 1999; Chater et al., 2006; Griffiths and Tenenbaum, 2006; Knill and Pouget, 2004; Gershman et al., 2015). On this view rational decisions are probabilistically justified decisions, or equivalently, predictions whose accuracy will depend on the fidelity of the assumed model of the environment.
This second notion of rationality, one centered on environmental correspondence rather than axiomatic coherence, is of greater relevance to the decisions of criminal profilers, bank regulators, portfolio managers, and marketing executives given that any loss their decisions incur will be determined by latent or future states of the task environment (Arkes et al., 2016). When seen in this way, we cannot escape the bias/variance dilemma when formulating the rationality question because all claims flow from an assumed ability to model the data generating process. On recognizing this, I will argue that orthodox rationality, the view that rational decisions under uncertainty are optimal Bayesian decisions, can be seen as falling foul of the bias bias. Using a different but related notion of bias, Gigerenzer (2018) argues that behavioral economics also suffers from a bias bias in its formulation of the rationality question. However, to focus the discussion, this second stage of my argument will center on a contrast between Bayesian optimality modeling and ecological rationality. To establish this contrast it is first necessary to take a step back and consider how these two approaches to formulating the rationality question rest on distinct forms of statistical inquiry.

Constructing the Rationality Problem
The statistician Leo Breiman (2001) characterized two cultures of statistical modeling illustrated schematically in Figure 3. Consider first the scenario where we start with observations, each relating a set of independent variables to a dependent variable. The environment can be seen as a black box containing data-generating machinery that determines the joint distribution over the inputs to the black box (independent variables) and the output (the dependent variable) shown in Figure 3(a). Much of traditional statistical inquiry, and this includes the orthodox study of Bayesian optimality considered here, requires that we make a conjecture about the contents of this black box, depicted in Figure 3(b). We might, for instance, formulate an hypothesis space, prior distribution, and various parameters that we fit using the available observations. Breiman termed this approach data modeling and its defining characteristic is that at some point a conjecture is made about the contents of the black box.
An alternative to data modeling is what Breiman termed algorithmic modeling. When conducting algorithmic modeling we refrain from making a conjecture about the contents of the black box and instead try to predict its behavior. As shown in Figure 3(c), the observations are used to estimate the predictive accuracy of competing models of inductive inference, which in practice usually means comparing machine learning algorithms using the principles of exploratory data analysis, much like the modeling approaches detailed in my discussion of the bias bias. Crucially, learning algorithms and the probabilistic assumptions they imply tend not to be seen as models or properties of the environment, but rather inductive biases likely to introduce model infidelities in order to reduce variance. Among the algorithms being considered, one or more will achieve the lowest prediction error. Such findings in no way license an optimality claim. They merely provide an indication of the kinds of algorithmic design decisions or statistical techniques that reduce prediction error, thereby suggesting further algorithms worth evaluating. Algorithmic modeling is exploratory, yields a functional understanding of the algorithms being considered, yet in no way invokes the concept of optimality to explain model performance. In short, algorithmic modeling seeks a relative understanding of the ability of competing algorithms to reduce prediction error.
Figure 3 (caption fragment): Rather than attempt to model the contents of nature's black box, algorithmic modeling is an incremental search for learning algorithms that can, to varying degrees, accurately predict the input-output relationship, shown in (c). Bayesian optimality modeling conducts data modeling in order to define an optimal response (d), while the study of ecological rationality conducts algorithmic modeling and interprets predictive models as potential cognitive models, shown in (e). Diagrams (a-c) adapted from Breiman (2001).
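The comparative logic of algorithmic modeling can be sketched as follows: competing algorithms are ranked by their estimated out-of-sample prediction error, with no conjecture made about the data generating process. The two algorithms, the leave-one-out procedure, and the data are illustrative assumptions and are not taken from Breiman (2001).

```python
def mean_predictor(train_x, train_y, x):
    """Predict the mean of the training outputs, ignoring the input x."""
    return sum(train_y) / len(train_y)


def nearest_neighbour(train_x, train_y, x):
    """Predict the output of the training case whose input is closest to x."""
    i = min(range(len(train_x)), key=lambda j: abs(train_x[j] - x))
    return train_y[i]


def leave_one_out_error(algorithm, xs, ys):
    """Mean squared error when each observation is predicted from the others."""
    total = 0.0
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        total += (algorithm(train_x, train_y, xs[i]) - ys[i]) ** 2
    return total / len(xs)


# Hypothetical observations relating one input to one output.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [1.1, 1.9, 3.2, 3.9, 5.1, 6.0]
algorithms = {"mean": mean_predictor, "nearest_neighbour": nearest_neighbour}
errors = {name: leave_one_out_error(alg, xs, ys) for name, alg in algorithms.items()}
```

The resulting ranking is a relative claim about these two algorithms on these observations. It says nothing about whether either is optimal, and a third algorithm could outperform both.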

Defining Optimality: Data Modeling
Because the claims of Bayesian optimality being considered here are made relative to a probabilistic model of the task environment they necessarily adhere to Breiman's notion of data modeling. To be clear on this point, probabilistic models of this kind are not (initially) seen as subjective models personal to the actor, but the outcome of an ecological analysis conducted by the theorist seeking to make the rationality claim. In his pioneering work on the Bayesian analysis of cognition, for example, Anderson (1991a) states that "the structure of such a theory is concerned with the outside world rather than what is inside the head" (p. 410). The decisions of this rational actor will be probabilistically optimal with respect to the model, but not necessarily the environment being modeled. This raises two concerns. First, how can this model be formulated when we lack sufficient knowledge or the observations needed to probabilistically quantify the relevant uncertainties? If we have observed only a limited sample of banking failures over one or two crisis cycles, or observed only a small number of offenders who commit a certain kind of serial crime, how should we proceed?
The orthodox solution is that whatever uncertainties we face, they can and should be probabilistically quantified using, say, uninformed priors, second-order probabilities, or imprecise probabilities. Alternatively, we could recall the second of the two epigraphs that began this discussion, the proposal that "it is sometimes more rational to admit that one does not have sufficient information for probabilistic beliefs than to pretend that one does" (Gilboa et al., 2012, p. 28). But what is this "more rational" alternative, and how might it be justified? The second concern is that by conjecturing an explanatory, causal, or high fidelity probabilistic model of the environment we run the risk of succumbing to the bias bias. This modeling goal will often diverge from the goal of predictive modeling because a simpler, possibly regularized, and likely biased model incorporating known representational inaccuracies may incur lower prediction error. How can cases where, as Haldane and Madouros (2012) put it, "simple does not just defeat complex; it trumps the truth" (p. 17) be reconciled with the goal of developing probabilistic models of the environment? My claim is that ecological rationality represents a "more rational" response that avoids the bias bias, and it proceeds through algorithmic modeling.

Exploring Ecological Rationality: Algorithmic Modeling
The study of ecological rationality considers the adaptive fit between organisms and the structure of natural environments (Gigerenzer and Selten, 2001; Gigerenzer et al., 2011; Todd et al., 2012). It proceeds by examining the interaction between three components: (1) algorithmic models of how organisms make inductive inferences, with a particular focus on simple heuristics; (2) the properties of natural environments whose probabilistic structure is either uncertain or unknown; and (3) a formulation of the problem of statistical inference that defines and quantifies the meaning of an adaptive fit. A defining characteristic of simple heuristics is that they ignore information, and the overarching hypothesis is that these heuristics are a vital part of how organisms successfully cope with the uncertainty of the natural world. Results supporting this hypothesis are termed less-is-more effects, and they detail how minimalist processing strategies improve the accuracy of decisions relative to more complex and supposedly sophisticated strategies commonly assumed in the cognitive sciences (e.g., Gigerenzer and Goldstein, 1996; Czerlinski et al., 1999; Goldstein and Gigerenzer, 2002; Brighton, 2006; Gigerenzer and Brighton, 2009; Şimşek and Buckmann, 2015).
Because the overarching concern here is the contrast between orthodox and ecological rationality, I will sidestep a detailed discussion of models of ecological rationality and experimental studies focusing on their use by humans and other animals. I will instead focus on how component (3) of the interaction above typically assumes the perspective of algorithmic modeling. Because of this, rationality claims are made relative to alternative models rather than an assumed data model. On finding that one model achieves lower prediction error than the alternatives, this model is regarded as more ecologically rational than the alternatives. Optimality plays no role (Brighton and Olsson, 2009). In Figure 1(d), for instance, the centroid method outperformed the probability surface model at low sample sizes not because it is optimal, and not because the centroid assumption holds for this offender. Similarly, when Haldane and Madouros (2012) found that a single indicator model outperformed the full CAMELS model, it was not because the single indicator model was optimal, and not because the "true" economic processes determining bank receivership reduce to a single measure of liquidity. In both cases the simpler model was more ecologically rational than the competitors, and in both cases this was due to knowingly biased and "incorrect" models incurring low variance. The same statistical arguments justify the use of simple cognitive heuristics (Brighton and Gigerenzer, 2007; Gigerenzer and Brighton, 2009).
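The variance argument can be illustrated with a small simulation. The sketch below is a toy under stated assumptions, not a reconstruction of the offender-profiling or CAMELS analyses: the environment `draw`, its slope of 0.5, the unit noise variance, and the sample sizes 4 and 200 are all hypothetical choices. A generating process is specified only so that data can be simulated; the comparison itself uses nothing but samples. The cue-ignoring model `fit_mean` is knowingly "incorrect", while `fit_line` has the same form as the process that produced the data.

```python
import random
import statistics


def draw(n, rng):
    """Hypothetical environment: a weak linear signal in unit-variance noise."""
    xs = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    return [(x, 0.5 * x + rng.gauss(0.0, 1.0)) for x in xs]


def fit_mean(train):
    """Knowingly biased model: ignores the cue and predicts the training mean."""
    m = statistics.fmean(y for _, y in train)
    return lambda x: m


def fit_line(train):
    """Least-squares line: has the same form as the generating process."""
    mx = statistics.fmean(x for x, _ in train)
    my = statistics.fmean(y for _, y in train)
    sxx = sum((x - mx) ** 2 for x, _ in train)
    slope = sum((x - mx) * (y - my) for x, y in train) / sxx if sxx > 0 else 0.0
    return lambda x: my + slope * (x - mx)


def compare(n_train, trials=500, n_test=50, seed=1):
    """Average held-out squared error of both models over many training samples."""
    rng = random.Random(seed)
    fitters = {"mean": fit_mean, "line": fit_line}
    totals = {name: 0.0 for name in fitters}
    for _ in range(trials):
        # Both models see the same training sample and the same test sample.
        train, test = draw(n_train, rng), draw(n_test, rng)
        for name, fit in fitters.items():
            predict = fit(train)
            totals[name] += statistics.fmean((y - predict(x)) ** 2 for x, y in test)
    return {name: total / trials for name, total in totals.items()}


small = compare(n_train=4)    # few observations
large = compare(n_train=200)  # many observations
print("n=4  ", small)
print("n=200", large)
```

Under these assumptions the cue-ignoring model would be expected to incur lower error with four observations and the fitted line with two hundred. The biased model wins at the small sample size because it has less to estimate, not because its assumption holds.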

Orthodox Rationality and the Bias Bias
Advocates of ecological rationality have always maintained that the concept is incompatible with orthodox rationality. At the same time, several commentators have argued that while models of ecological rationality may generate interesting and insightful findings, these findings not only fail to challenge orthodox rationality but require established rationality principles to be explained (Chater et al., 2003; Chater and Oaksford, 1999; Oaksford and Chater, 2009; Gintis, 2012; Jones and Love, 2011). I have argued that a fundamental incompatibility does exist, and that establishing it requires the relationship to be considered at the level of the assumed statistical problem. Previous critiques, in contrast, have assumed that the terms of the relationship can be established by considering the algorithmic properties of models of ecological rationality alone. Algorithms alone, though, do not fully specify the problem they attempt to solve, leaving critics of ecological rationality free to assume that optimality is the goal. The onus is therefore on advocates of ecological rationality not only to specify component (3) of the interaction above, but also to explain why it leads to an incompatibility with orthodox rationality. I will now revisit previous critiques of ecological rationality in the light of my proposed response to this challenge.
A recurring critique is that when a heuristic works, one still needs a rational explanation for why it works, and that the concept of ecological rationality must ultimately appeal to orthodox rationality principles when formulating such an explanation (Chater et al., 2003). On this view, the explanation should be Bayesian in nature. This view extends to critics of Bayesian optimality modeling, such as Jones and Love (2011), who argue that the two approaches are "highly compatible" because "any inference algorithm implicitly embodies a prior expectation about the environment" (p. 186). I interpret this point to mean that a heuristic (or any learning algorithm) is rational to the extent that this prior expectation coincides with the probabilistic structure of the environment. The problem with this line of reasoning is that it requires that the probabilistic structure of the environment be known in order to establish and explain instances of success. As I have shown, this assumption is incompatible with and challenged by ecological rationality. Furthermore, even if we assume a probabilistic model of the task environment, this kind of explanation remains problematic. First, the rationality claims of ecological rationality are made relative not only to an estimate of prediction error, but also to the alternative models being considered. These relative claims highlight that models can and typically do succeed outside of their implied optimality conditions, and for reasons not easily explained in terms of a discrepancy between these conditions and the structure of the environment. Indeed, from a bias/variance perspective, they can succeed because of this discrepancy (see footnote 3).
Footnote 3: A classic example is the naïve Bayes classifier, a simple learning algorithm that makes the strong assumption that the features (i.e., cues, attributes, or independent variables) are conditionally independent of each other, given the class. Despite this assumption being extremely unlikely to be met in practice, the naïve Bayes classifier has a long history of performing surprisingly well, particularly when learning from sparse data (Hand and Yu, 2001; Domingos and Pazzani, 1997; Friedman, 1997; Ng and Jordan, 2002; Webb et al., 2005).
A recurring theme among compatibility arguments is the idea that any simple heuristic can be reformulated as a probabilistic model operating with respect to a set of optimality conditions (e.g., Parpart et al., 2018). A compatibility between formulations of the rationality question, though, cannot be established at the level of specific models. One has to consider the issue at the level of the assumed statistical problem and the uncertainty conditions characterising this problem. Gintis (2012) applies this reformulation argument slightly differently by noting that advocates of bounded and ecological rationality have failed to appreciate that any algorithm with consistent preferences can be reformulated as maximizing an objective function. Therefore, assuming consistent preferences, instances of heuristic success can always be seen as solutions to an optimization problem (see also Boland, 1981).
What this argument fails to consider is that we can also establish and justify cases of relative success in task environments where the optimal solution is undefined, and therefore in situations where we have no basis on which to claim that optimality has been achieved.
The goal of reducing prediction error does not imply that we know what the minimum achievable prediction error is, and hence what the optimal response is. Gintis' argument would carry weight if the incompatibility between ecological and orthodox rationality centered exclusively on the issue of incompatible algorithmic properties, and specifically, a contrast between algorithms that don't explicitly optimize and those that do.
Finally, these concerns return us to the issue of the bias bias when we consider Oaksford and Chater's (2009) claim that Bayesian optimality modeling "cannot be replaced by, but seeks to explain, ecological rationality" (p. 110). Seen at the level of the assumed statistical problem, this claim implies that Breiman's notion of algorithmic modeling is reducible to data modeling. I have argued that the two are incompatible, but there is also an argument to be made that Oaksford and Chater's claimed relationship should be reversed, and that data modeling is more accurately seen as a special case of algorithmic modeling. Specifically, data modeling is a special case of algorithmic modeling where we assign a single model the authority of assumed truth and we interpret all other models as approximations. It is relative to this "true" data model that optimality claims are then made. Crucially, the authority of this data model stems not from an analysis of its prediction error relative to alternative models, but from faith in our ability to formally integrate the observations and our probabilistic beliefs about the environment. This is a form of Bayesianism. The resulting model is assumed to be unbiased and, because it is conjectured rather than inferred, the variance component of prediction error is rendered irrelevant. On this view, it makes little sense to make an optimality claim relative to a model that accepts some bias in order to achieve a greater reduction in variance, and this is why orthodox formulations of rationality can be seen as another instantiation of the bias bias.
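For reference, the trade-off between bias and variance invoked here is usually written as the standard decomposition of expected squared prediction error. The notation below is assumed for illustration and is not taken from this article: observations follow $y = f(x) + \varepsilon$ with zero-mean noise of variance $\sigma^2$, and $\hat{f}_D$ denotes the model induced from a sample $D$.

```latex
\mathbb{E}_{D,\varepsilon}\!\left[\big(y - \hat{f}_D(x)\big)^2\right]
  = \underbrace{\big(\mathbb{E}_D[\hat{f}_D(x)] - f(x)\big)^2}_{\text{bias}^2}
  + \underbrace{\mathbb{E}_D\!\left[\big(\hat{f}_D(x) - \mathbb{E}_D[\hat{f}_D(x)]\big)^2\right]}_{\text{variance}}
  + \underbrace{\sigma^2}_{\text{noise}}
```

A conjectured model that is treated as true fixes the first term at zero by assumption and leaves the second term out of the picture, whereas a comparison based on prediction error is sensitive to the sum of both. The decomposition is a conceptual device here; it does not require that $f$ be known to the analyst.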

Discussion
You have a big approximation and a small approximation. The big approximation is your approximation to the problem you want to solve. The small approximation is involved in getting the solution to the approximate problem. (attributed to George Box [e.g., Fieberg et al., 2010, p. 10; Hand, 2014])
Individuals, groups, and organizations make decisions based on limited observations of complex, uncertain, and unstable environments. I take this to be the overarching problem that theories of rationality formulate and provide a normative solution to. Despite the controversial status of optimality in scientific inquiry (Dupré, 1987; Schoemaker, 1991), the idea that rationality implies optimality is so widely assumed as to seem barely worth discussing. Optimal Bayesian decision makers in the cognitive sciences, optimal foragers in biology, and Bayesian maximizers of expected utility in economics are different faces of the same interdisciplinary orthodoxy. My goal has been to reassess this view by first recognizing that in formulating the rationality problem we necessarily make a big approximation.
Specifically, an optimal probabilistic response is a solution, a type of small approximation, to a specific kind of big approximation that presupposes all relevant uncertainties can and should be quantified. Given the uncertainty of the natural world I consider this to be an approximation worth questioning, and the examples I used underscore this point. There exist no optimal solutions to the problems of locating serial burglars, regulating international banks, managing portfolios, or identifying active customers. It is not the concept of rationality that implies optimality, but the big approximation of quantified uncertainty that implies optimality.
For each of the problems I considered, some models incur lower prediction error than others. For a specific task environment, it is therefore reasonable to regard some models as being more rational than others. The concepts of bias, variance, and the bias/variance dilemma help to disentangle the often complex relationship between the number of available observations, properties of models being considered, and their relative performance.
Moreover, these concepts allow us to understand how biased, low variance models can outperform more "principled" models under conditions of heightened uncertainty, such as in the criminal profiling example detailed in Figure 2(c). Previous analyses of the problems I considered exhibited the negative consequences of the bias bias, a cluster of statistical intuitions that neglect or ignore the role of variance. As such, the bias bias is the manifestation of a big approximation to the problem of statistical inference that masks the discovery of predictive, low variance models. If we then run with the idea that rationality claims can be relative claims, and that these relative claims can be made in contexts where optimality is indeterminable and can be explained in terms of variance reduction, then orthodox rationality can be seen as falling foul of the bias bias. This is because optimality claims flow from an assumed ability to profitably conjecture an accurate, unbiased, probabilistic model of the data-generating process. Variance plays no role in formulating conjectures of this kind. This is the point at which the substantive contrast between orthodox rationality and ecological rationality begins, and the contrast centers not on specific models or competing theories of cognitive processing, but on the assumed nature of the statistical problem. It is a clash of big approximations.
The study of ecological rationality makes relative rationality claims that refer to the ability of one cognitive mechanism to incur lower prediction error relative to other cognitive mechanisms in a given task environment. And because ecological rationality proceeds by conjecturing and analyzing the performance of cognitive mechanisms in environments with unknown or non-existent generating distributions, it is fundamentally exploratory.
This is the big approximation of ecological rationality, and I used Breiman's (2001) distinction between data modeling and algorithmic modeling to locate this clash of big approximations within the broader context of statistical inquiry. Although this distinction has not previously been related to the study of rationality, I argued that Breiman's distinction maps directly onto the big approximations of ecological and orthodox rationality. Furthermore, this contrast is not specific to Breiman but reflects long-standing, deep divisions in statistics (e.g., Tukey, 1962; Geisser, 1993; Vapnik, 1998; Shmueli, 2010). For example, the foundations of both algorithmic modeling and ecological rationality share those of the Minimum Description Length (MDL) principle developed by Rissanen (1978, 1986, 1989), which rests on the idea that "no assumption of a 'true' data-generating distribution is needed. This changes the objective and foundation for all model building" (Rissanen, 2007, p. 6). This general mode of statistical inquiry, I propose, also changes the foundation and objective for theories of rationality.
I began with a question: If we reassess the rationality question under the assumption that the uncertainty of the natural world is largely unquantifiable, where do we end up?
We have arrived at the statistical, cognitive, and normative theory of ecological rationality.
Other arrival points are undoubtedly feasible, but it is worth noting that this reassessment of the rationality question provides a converging line of argument in support of Simon's (1956, 1978) bounded rationality. In his critique of Homo economicus, Simon stressed that bounded rationality is shaped not only by the computational and cognitive bounds of the decision maker, but also by informational bounds. In cognitive science and economics there is a tendency to focus on the first aspect of Simon's proposal and to view bounded rationality as reducing to the claim that optimization is infeasible due to cognitive and computational limitations, which renders optimal responses either out of reach or in need of redefinition (e.g., Boland, 1981; Gintis, 2012; Griffiths et al., 2015; Gershman et al., 2015). This view neglects the potential for informational bounds to undermine the assumed objective of optimality and reshape the rationality question. In contrast, my reassessment of the rationality question stems entirely from a consideration of informational bounds, and specifically the impact of unquantifiable uncertainty. Yet this statistical reassessment converges on the same conclusion. The conclusion is that rationality in an uncertain world is algorithmic/predictive in nature rather than axiomatic/probabilistic in nature.
Reinhard Selten (2001) argued that "bounded rationality cannot be precisely defined. It is a problem that needs to be explored" (p. 15). I agree, and my argument elaborates on Selten's point by concluding that once we accept that the uncertainty of the natural world is largely unquantifiable, from a statistical standpoint, the rational response is to explore algorithms capable of reducing uncertainty rather than seek to define a probabilistic model that attempts to fully quantify all relevant uncertainties.