### Discussion Paper

Investigating the Exponential Age Distribution of Firms

## Abstract

While several plots of the aggregate age distribution suggest that firm age is exponentially distributed, we find some departures from the exponential benchmark. At the lower tail, we find that very young establishments are more numerous than expected, but they face high exit hazards. At the upper tail, the oldest firms are older than the exponential would have predicted. Furthermore, the age distribution of international airline companies displays multimodality. Although we focused on departures from the exponential, we found that the exponential was a useful reference point and endorse it as an appropriate benchmark for future work on industrial structure.

## Comments and Questions

See attached file

The present paper is concerned with the age distribution of firms. The author investigates several data sets in order to identify common structures and departures from them. The overall conclusion is that, although there are some notable departures from the general picture, the exponential distribution proves to be a useful benchmark in the analysis of firm ages.

The investigation is thorough (to the limits that data allows) and statistical treatments are state of the art. The author investigates heterogeneous data sets. The data on the world's oldest firms is investigated for the first time. The analysis is conducted on three different levels:

1. Young firms - nation wide - all sectors
2. Old firms - international - all sectors
3. All firms - international - one sector

Although analysing multiple levels is an advantage when looking for general trends, it obscures the point of the paper (especially since all three data sets show different types of departures from the exponential distribution).

One point to which I especially want to draw attention is the theoretical contribution of the paper. I understand the theoretical framework has been borrowed from work in a different field (Huberman and Adamic 1999). The idea is as follows. We take the theoretical model that generates the lognormal firm size distribution within the cohort (most notably Gibrat 1931) and combine it with the model generating an exponential firm age distribution in order to get the power law size distribution in aggregate (across cohorts).

Although the idea sounds appealing, the two models the author presents in the paper are not compatible with each other (in fact no model of firm age distribution will be compatible with the Gibrat model of firm growth). The reason is as follows. In the Gibrat model of firm growth, the firm size distribution approaches the lognormal only in the time limit (when time goes to infinity). Therefore, by assumption, the model is only valid for firms that are already very old (in terms of the time units at which firms are shocked). Thus, as long as firms are very old, which particular cohort they belong to (the age of the firm) does not matter (the age of all the firms is in principle infinite). So, combining this model with a firm age distribution of any kind loses its sense. The author actually acknowledges this in the footnote on page 5 by saying that the approach will not be "entirely appropriate" for very young firms.

However, he omits the implication of the Gibrat model that the distinction between the cohorts blurs at the level where the approach works.
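The mechanism under discussion can be sketched in a few lines of simulation. This is only an illustration under stated assumptions, not the model from the paper: ages are drawn from a geometric distribution (a discrete stand-in for the exponential), growth shocks are assumed to be normal, and the function names and parameter values (`exit_prob`, `shock_sd`) are hypothetical. Because the shocks are assumed normal, log size is exactly normal within every cohort, which sidesteps rather than resolves the point about the time limit.

```python
import random
import statistics


def simulate_log_sizes(n_firms=20000, exit_prob=0.1, shock_sd=0.2, seed=0):
    """Draw an age for each firm, then apply one Gibrat shock per year of age.

    All parameter values are illustrative assumptions, not estimates.
    """
    rng = random.Random(seed)
    log_sizes = []
    for _ in range(n_firms):
        # Geometric age: the firm survives each further year
        # with probability 1 - exit_prob.
        age = 1
        while rng.random() > exit_prob:
            age += 1
        # Gibrat process: log size is a sum of i.i.d. normal growth shocks.
        log_sizes.append(sum(rng.gauss(0.0, shock_sd) for _ in range(age)))
    return log_sizes


def excess_kurtosis(xs):
    """Sample excess kurtosis (zero for a normal distribution)."""
    mean = statistics.fmean(xs)
    var = statistics.fmean([(x - mean) ** 2 for x in xs])
    fourth = statistics.fmean([(x - mean) ** 4 for x in xs])
    return fourth / var ** 2 - 3.0
```

Within a single cohort the log sizes would have excess kurtosis near zero. Pooling cohorts whose ages are geometrically distributed should give a clearly positive value, which is the fat-tailed aggregate that the combined model is meant to produce.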

I believe there are two distinct theoretical approaches for generating Pareto size distributions. One is to stay in the framework of mechanistic models of firm growth generating lognormal distributions within the cohort (e.g. Gibrat 1931) and combine it not with the age distribution, but rather with factors not dependent on time (e.g. Growiec et al. 2008, Economics Letters). The other, more reasonable, way is to abandon mechanistic growth models in favor of models with economic intuitions at the micro level (e.g. Marsili 2005, Review of Industrial Organization).

This discussion is not meant to imply that age distribution of firms is not worth investigating. It is meant to stimulate the discussion about the meaningful ways in which these findings should be absorbed and further utilized.

First of all, I am grateful to the referees for taking the time to carefully read the paper, and for making many perspicacious comments.

I begin by responding to what appear to be the main points raised by the referees, before discussing relatively minor points.

MAJOR POINTS:

It seems the main point raised by Referee 1 was a request for more quantitative tests of the empirical age distribution. (Note however that Referee 2 wrote that "The investigation is thorough (to the limits that data allows) and statistical treatments are state of the art.") I can certainly see Referee 1’s point though. I agree that visual tests are not a very rigorous way of investigating an empirical distribution. Ideally, I would provide not only goodness-of-fit statistics for the empirical density with respect to the exponential benchmark, but also estimates of the parameters of the fitted exponential distribution (perhaps by implementing asymmetric Subbotin estimation using Giulio Bottazzi’s ‘Subbotools’ software). The problem, however, is that the available data is quite limited. The aggregate datasets presented in Figures 2, 3 and 4 contain information on age for many firms, but it seems that very young firms are under-represented. Consider this: while the exponential would have the youngest age category as the modal age, instead the modal age is 6 years for the Indian dataset and 10 years for the Spanish data (I do not know what the modal age is for the Italian dataset, and I do not personally have access to this data). This suggests that very young firms (which would nonetheless be responsible for the highest frequency density under an exponential distribution) are remarkably under-represented in these Indian and Spanish datasets.

In contrast, the US BDS dataset, which has more precise information on the ages of very young plants, shows that the modal age is the smallest age category. The US BDS dataset does not have detailed information on age for plants for the ages 6 and above, however (plants of different ages are grouped together here, into age classes such as "6-10 years").
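To make the grouping issue concrete, the share of firms that an exponential benchmark would place in each age class can be computed directly. The sketch below is a worked illustration only: the rate of 0.1 per year is an arbitrary assumption, `class_share` is a hypothetical helper, and a class such as "6-10 years" is assumed to cover ages from 6 up to (but not including) 11.

```python
import math


def class_share(rate, lower, upper=None):
    """Share of firms with lower <= age < upper under an exponential.

    upper=None denotes an open-ended residual class.
    """
    tail_lower = math.exp(-rate * lower)
    tail_upper = 0.0 if upper is None else math.exp(-rate * upper)
    return tail_lower - tail_upper


# Assumed class boundaries: single years 0..5, then grouped classes.
edges = [0, 1, 2, 3, 4, 5, 6, 11, 16, None]
shares = [class_share(0.1, lo, hi) for lo, hi in zip(edges[:-1], edges[1:])]
```

Among the single-year classes the youngest one has the largest share, as the benchmark requires. A grouped class can still hold more firms than any single year simply because it is wider, so shares need to be divided by class width before the heights of bars are compared.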

Given that the available data has poor coverage of very young firms, it seems that quantitative tests would easily reject the exponential distribution. As more detailed datasets become available, though, Referee 1’s suggestion of more advanced quantitative tests should definitely be given increasing attention.
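One way such a test could be set up when very young firms are under-counted is to fit the exponential only above a cutoff age. The sketch below is not the procedure used in the paper; it relies on the memoryless property of the exponential and uses synthetic data. All names (`fit_exponential_above`, `ks_distance_above`, `make_undercounted_sample`) and parameter values are hypothetical, and ages are treated as continuous rather than as whole years.

```python
import math
import random


def fit_exponential_above(ages, cutoff):
    """Maximum-likelihood exponential rate using only ages >= cutoff.

    By memorylessness, (age - cutoff) for the retained firms is again
    exponential with the same rate, so the estimate is 1 / mean excess age.
    """
    excess = [a - cutoff for a in ages if a >= cutoff]
    if not excess:
        raise ValueError("no observations at or above the cutoff")
    return len(excess) / sum(excess)


def ks_distance_above(ages, cutoff, rate):
    """Largest gap between the empirical and fitted CDF above the cutoff."""
    excess = sorted(a - cutoff for a in ages if a >= cutoff)
    n = len(excess)
    gap = 0.0
    for i, x in enumerate(excess):
        cdf = 1.0 - math.exp(-rate * x)
        gap = max(gap, abs((i + 1) / n - cdf), abs(i / n - cdf))
    return gap


def make_undercounted_sample(n=20000, rate=0.1, cutoff=6.0,
                             keep_young=0.3, seed=0):
    """Synthetic ages in which firms younger than cutoff are mostly missing."""
    rng = random.Random(seed)
    ages = []
    for _ in range(n):
        age = rng.expovariate(rate)
        # Firms below the cutoff are recorded only with probability keep_young.
        if age >= cutoff or rng.random() < keep_young:
            ages.append(age)
    return ages
```

On the synthetic sample, fitting from age zero should understate the true rate because the missing young firms inflate the mean age, whereas fitting above the cutoff should recover it closely. The distance returned by `ks_distance_above` is a descriptive measure only; turning it into a formal test would need critical values that account for the estimated rate.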

I have expanded this discussion in the revised manuscript, mainly in the discussion to Figures 2-4 and in the conclusion (where I discuss future directions for research, which should include quantitative tests of the exponential and other candidate distributions such as the Pareto).

Referee 2 (Zakaria Babutsidze) wrote that the main criticisms of the paper are twofold. First, the use of multiple datasets makes the main point of the paper obscure. Second, the problem with combining an exponential age distribution with a Gibrat process is that the central limit theorem cannot be applied to the case of young firms that experience only a small number of growth shocks.

First, the paper uses multiple datasets, which obfuscates the main point of the paper. This is a valid objection. However, I’m not sure how I can improve the paper with the datasets that are currently available. The data in Figures 2, 3 and 4 have good coverage of the central part of the support, but not good coverage of very young firms (I discuss this in more depth in the revised manuscript). As I mentioned above to Referee 1, while the exponential would have the youngest age category as the modal age, instead the modal age is 6 years for the Indian dataset and 10 years for the Spanish data. Therefore these datasets need to be complemented by detailed data on very young firms. The aggregated US BDS data gives detailed coverage of the number of establishments up to 5 years, but above this age establishments of different ages are grouped together (e.g. the 6-10 age class). The data on the world’s oldest firms gives good information on very old companies, but not on younger firms. I discuss this in more depth in my comments to Figures 2-4 and in the conclusion.

Second, the referee writes that the central limit theorem requires a large number of observations. So, the theoretical model in Section 2 is not valid for very young firms that have not existed for a large number of periods. (Note the different emphasis between the referee’s statement that "the model is only valid for firms that are already very old" and my reply that "the model is not valid for very young firms.") It seems to me that the referee’s objection is not specific to the present application (i.e. firm growth) but is a general criticism of the basic mathematical model, because the referee’s objection refers to the failure of the central limit theorem when the number of observations is small.

This objection would therefore also apply to the previous appearances of the mathematical model – namely Adamic and Huberman (1999) and Reed (2001). I did not find in either of these papers a discussion of the failure of the model in the case of ‘young’ entities. In the theoretical model in section 2, however, I do at least point out this caveat in a footnote. However, I suggest that this criticism of the mathematical model is not crucial for this particular paper, because this paper focuses on investigating empirically the age distribution rather than introducing the mathematical model. (Instead, this particular mathematical model is applied to firm age and growth in Coad 2010.) Although this paper does discuss previous theoretical interest in the age distribution by referring to the mathematical models in Adamic and Huberman (1999), Reed (2001) and Coad (2010), the main thrust of the paper is empirical investigation of the aggregate age distribution.

MINOR POINTS

Both referees draw my attention to the interesting and relevant paper by Growiec et al 2008, so I refer to it in the revised manuscript. I also refer to Marsili 2005, which was mentioned by referee 2.

Other minor remarks mentioned by Referee 1:

• Sigma has now been defined;

• Figure 1 is admittedly pretty basic, but since I want the paper to be accessible even to readers who are not terribly familiar with distributions, and since space constraints are not very strict for this journal, I decided to leave it in for now.

• Figure 6 – this limitation is an artifact of the US BDS data, which has detailed information on ages for young establishments, but groups ages together for older establishments (e.g. a 6-10 years age group, an 11-15 years age group, etc.). I have clarified this in the revised manuscript.

• Missing word – I have corrected this, it now reads: "In this section we investigate the ..."

• Section 3.1 – yes, this is a bit ironic. Although the text in Section 3.1 "previous literature" does not contain a single citation, it discusses Figure 5 which plots the results of previous work. I could not think of a way to solve this issue, so for the time being I have left it as is.

• I put the word exponential in the title because I don’t want to keep the reader in suspense, but rather I want the reader to know straightaway what the conclusion is – that the age distribution is approximately exponential. Also, I want to distinguish myself from previous work that suggests that the age distribution is a power law (Cook and Ormerod 2001) or lognormal (Fagiolo and Luzzi 2006 p31).

See the attached pdf file.

I am grateful to Dr Marco Capasso for his helpful comments and interesting suggestions.

Dr Capasso's discussion essentially focuses on the theoretical model in Section 2.

Dr Capasso writes that "a strong accent is given to the theoretical model." As such, I should make it clearer that the contribution of this paper is the empirical analysis. However, in my defense, let me point out that the theoretical model is not mentioned in the abstract, the introduction or the conclusion (and so it should not be seen as the main point of the paper). Instead, the theoretical model appeared in a separate working paper (Coad 2008) that has just recently been accepted for publication in the Journal of Industry Competition and Trade (I will update the reference in the revised manuscript). The reason I included the theoretical model in the paper (in Section 2) was that it shows how theoretical models have shown interest in the aggregate age distribution, even if empirical work has not addressed this topic. Dr Capasso suggests that I "emphasize the empirical findings, which represent the original part of the paper." I will follow this suggestion as I revise the manuscript.

Dr Capasso has "doubts that such a model brings us to a better comprehension of industrial dynamics." In my (biased) opinion, I find this model more realistic than some other models, such as the influential model in Axtell 2001, because Axtell expands upon a Gibrat model to assume that all firms are the same age, and that there is a lower reflecting bound on firm size (an artificial assumption that is required for application of a Kesten process). Instead, in this model I expand upon a Gibrat model by incorporating an aggregate age distribution that is exponential (which seems to be quite reasonable). However, I am ready to accept that there are other (more complicated) models that arrive at a more realistic representation of industrial dynamics by allowing for phenomena such as entry, exit, dependence of growth (variance) on size and age etc. Some of these more reasonable models are explored in Richiardi (2004 JASSS), as pointed out by Dr Capasso.

A pedantic reaction - The statement that "Age can easily be linked to size" seems a bit strong to me. While young firms are often small, and old firms are often large, there are still nonetheless many large young firms and many small old firms. Therefore I would be more comfortable with a statement such as "Average age can be linked to average size."

Dr Capasso points out that "the exponential age distribution can well be a consequence of a model a la Simon rather than an additional assumption." This is an interesting point. For example, an exponential age distribution could arise from simple entry-exit conditions if, for example, there is a constant number of entrants in each year, and they face a constant positive probability of exit in each year. Alternatively, an exponential age distribution could arise if the number of entrants increases exponentially and these firms survive indefinitely (this latter scenario is not very realistic, however). De Wit (2005, IJIO) provides a thorough overview of many different steady-state models that produce a Pareto firm size distribution, and many of these models have entry and exit processes. Although the probability of survival as a function of age is not explicitly explored in this strand of literature, it would be interesting to derive the age distributions implied by these entry and exit processes. While this is beyond the scope of this present paper, further work on the implicit age distribution of these stochastic models would be interesting. The simulation model presented at the end of the comments shows just how such investigations could be undertaken.
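The first scenario above (a constant number of entrants each year, each facing a constant probability of exit) can be sketched as a small simulation. This is a minimal illustration and not the simulation model referred to in these comments; the function names and parameter values are hypothetical.

```python
import math
import random
from collections import Counter


def simulate_age_counts(entrants_per_year=1000, exit_prob=0.1,
                        years=60, seed=1):
    """Count surviving firms by age after a number of simulated years."""
    rng = random.Random(seed)
    ages = []  # ages of the firms currently alive
    for _ in range(years):
        # Each incumbent exits with probability exit_prob; survivors age by one.
        ages = [a + 1 for a in ages if rng.random() >= exit_prob]
        # A constant number of entrants joins at age 0.
        ages.extend([0] * entrants_per_year)
    return Counter(ages)


def log_linear_slope(counts, max_age):
    """Least-squares slope of log(count) on age for ages 0..max_age."""
    xs = list(range(max_age + 1))
    ys = [math.log(counts[a]) for a in xs]
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    num = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    den = sum((x - x_bar) ** 2 for x in xs)
    return num / den
```

If the age distribution is (discretely) exponential, the log of the count should fall linearly with age, with a slope close to `math.log(1 - exit_prob)`. Letting the exit probability vary with age, or the number of entrants vary over time, would show how far the implied age distribution then departs from the exponential.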

Investigating the Exponential Age Distribution of Firms

Major comments

The article tackles the issue of the firm age distribution. While there is a wide empirical literature on the firm size distribution, the firm age distribution has hardly been studied previously. Recent empirical datasets have given the opportunity to investigate this issue. Hence, the article focuses on this new topic.

The paper is also interesting because it compares databases not only at the aggregate and sectoral levels, but also between developed and non-developed countries.

Ironically, the author does not mention any previous literature on the analysis of the firm age distribution. However, there is “collateral” literature related to the firm age distribution, such as the literature on firm entry and on survival likelihood. (In fact the author tackles this issue empirically.)

Minor comments

Apart from the comments of previous referees…

• There is a mistake on page 3, last paragraph: “Even in this situations…”.

• Maybe give a little more information on the different databases presented in the text.

• Why not introduce some statistical information?

Reply to the reader comments dated March 29, 2010 - 11:10

I am grateful to the reader for taking the time to read the paper and make some useful comments on it.

The reader comments that I do not mention any previous literature on the analysis of the firm age distribution. However, the reader does not mention any literature either. Since writing the first version of this paper, however, I have become aware of some other work on the topic. Two references that come to mind are the depictions of the age distribution in Huergo and Jaumandreu 2004 IJIO p558 (see also Huergo and Jaumandreu 2004 SBE p198) and Fagiolo and Luzzi 2006 p31.

Fagiolo and Luzzi observe an empirical age distribution in their sample of Italian firms that is strikingly close to a lognormal. It seems to me that this is most likely an artifact of their sample (in which young firms are apparently under-represented) rather than a robust feature of industrial structure.

Similarly, Huergo and Jaumandreu 2004 investigate the age distribution of firms in their database and observe a bimodal distribution in their histogram, with the lower mode corresponding to the 5-8 years age category, and the upper mode corresponding to the 37+ residual category.

In comparison to these previous studies, the present paper provides a benchmark that can be used to gauge the extent of sample selection and the under-representation of very young firms in databases.

I will add these references in the next revision of the paper.

Minor comments - thanks, these will be included in the next draft.

See attached file


I was requested by the Chief Editor to assess this paper.

I found it very well done, and it provides interesting statistical insights into the firm age distribution, especially with regard to the possible departure from the exponential benchmark due to an excess of entry in a given industry. This is an interesting bridge to the newly emerging literature (both theoretical and empirical) on "entry mistakes".

However, interpretation and conclusions can be improved through a larger use of previous economic literature.

Some of my responses to the referees (taken individually) were uploaded, but my general comments on the common themes raised by both referees were not uploaded. Hence this addition.

First, I want to thank the referees for carefully reading the paper and making many insightful and thought-provoking suggestions that have helped improve the paper. I have tried to follow their suggestions closely in revising the manuscript.

A common theme in the referee reports is that more rigorous analysis was requested (although they did not give practical guidelines as to how this could be achieved - e.g. what exactly to do and which software to use). Also, because of the limitations of current datasets, I cannot be as rigorous as one might hope. It seems to me that the best way to respond to the requests for more rigorous analysis is to undertake some asymmetric Subbotin estimation (parametric fitting) of the age distribution, and to explicitly take the poor coverage of very young firms in aggregate datasets into account by restricting the mode of the estimated model to the empirically-observed mode. (Unfortunately though, I couldn’t find any way to associate standard errors with the estimated parameters using Subbotools 0.9.8.1.) I also compare the exponential to other potential candidate distributions in the revised Figure 1, and also in the Subbotin estimation section. Furthermore, I have tried to make the motivation of the paper clearer, as well as the logical flow of the paper.