### Discussion Paper

## Abstract

A correlation curve is introduced as a tool to study the degree of intergenerational income mobility, i.e. how income status is related between parents and their adult children. The method overcomes the shortcomings of the elasticity of children’s income with respect to parents’ income (i.e. its sensitivity to differences in dispersion between the generations) and of the correlation coefficient (i.e. its inability to capture nonlinearities). The method is particularly suitable for comparative studies, and in this study labour earnings are compared to disposable income. The correlation between parental income and the child’s adult disposable income becomes stronger at higher percentiles of the parents’ income distribution. Above the median, the correlation is found to be stronger than for labour earnings. Interestingly, the elasticity is higher for labour earnings over most parts of the distribution, and complementing the elasticity with correlation curves provides a much more complete picture of intergenerational income mobility.
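To make the comparison in the abstract concrete, the correlation curve is commonly defined (following Bjerve and Doksum) as below. This assumes the paper uses that standard form, with $X$ standing for parents' (log) income, $Y$ for the child's (log) income, $\sigma_X$ for the standard deviation of $X$, $\beta(x)$ for the slope of the regression function and $\sigma^2(x)$ for the residual variance:

$$
\rho(x) = \frac{\sigma_X \, \beta(x)}{\sqrt{\sigma_X^2 \, \beta(x)^2 + \sigma^2(x)}},
\qquad \beta(x) = \frac{d}{dx}\,\mathrm{E}[Y \mid X = x],
\qquad \sigma^2(x) = \mathrm{Var}(Y \mid X = x).
$$

In the linear, homoscedastic case $\beta(x) = \beta$ and $\sigma^2(x) = \sigma^2$, and since $\sigma_Y^2 = \beta^2 \sigma_X^2 + \sigma^2$ the curve collapses to the constant $\rho = \beta \sigma_X / \sigma_Y$, the Pearson correlation. Rearranged, the elasticity $\beta = \rho \, \sigma_Y / \sigma_X$ depends on the ratio of dispersions across the generations, which is the sensitivity referred to above.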

## Comments and Questions

The article argues that correlation curves could be a useful way to characterise the joint distribution between parent and child incomes, and then estimates these and related measures of dependence in a Swedish sample.

I find the argument transparent and convincing, and have only a number of proposals for clarifying the presentation:

- What is the advantage of the correlation curve compared to just plotting the sample equivalent of the conditional expectation function (CEF) of standardized child on standardized parent (log) income? Is the correlation function something akin to the derivative of that CEF? In what ways does it differ? Describing the correlation function from that perspective might be useful, because applied researchers will have plotted and eyeballed CEFs many times.

- Relatedly, the paper argues that correlation curves are advantageous compared to the nonlinear/nonparametric elasticity because they are not sensitive to differences in the standard deviation of the income distributions in the parent and child generations. What about standardising these distributions first, before estimating the nonlinear elasticity: would that then be something akin to the correlation curve?

- Can you include a more explicit proposal on how to estimate the correlation curve in practice? Many practitioners might like the idea in principle, but the observation that “The correlation curve is easily calculated once a nonparametric technique is used to estimate the derivative of the regression function and the residual variance.” might not be explicit enough for the econometrically challenged.

- The observation that estimates of the elasticity in labor income are more than twice as large as estimates of the Pearson correlation (Table 2) should be briefly commented on. Is the variance in income so different between the parent and child generations?

- The fact that the estimates in Table 2 are likely to be severely attenuated because of measurement error (the income spans are quite short) could be highlighted more clearly. This could also explain why in some cases the rank correlation estimate is so much higher than the Pearson correlation (rank correlations are more robust to outliers). As the author notes, it would be interesting to study how the correlation curve changes with the quality of the data at hand. To some degree this could be tested even in this study, by artificially decreasing the amount of information used in its estimation. But I suppose it makes more sense to perform such sensitivity tests on more extensive data.

- The plots of the correlation curve in Figure 1 are really hard to read, which is a shame because that is what many readers will remember. It would help to explicitly indicate (in a legend or the figure notes) which of the lines is the elasticity and which the correlation curve. It should also be noted what each of the three lines for the respective measures represents. I suppose the additional two lines are the confidence intervals, and they could then be formatted differently from the actual point estimates.

- Check and update list of references.
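For the question about Table 2, a useful reference point is the in-sample identity between the OLS elasticity and the Pearson correlation, $\beta = r \cdot s_{\text{child}} / s_{\text{parent}}$, so the ratio of the two estimates equals the ratio of the standard deviations of log income across the generations. The minimal sketch below checks this on simulated, hypothetical log incomes (not the Swedish data), in which the parent generation is assumed to be less dispersed; the function name `elasticity_and_pearson` is illustrative only.

```python
import numpy as np

def elasticity_and_pearson(log_parent, log_child):
    """Return the OLS slope (elasticity) of log child income on log parent
    income and the Pearson correlation, both from the sample covariance."""
    cov = np.cov(log_parent, log_child)      # 2x2 sample covariance (ddof=1)
    beta = cov[0, 1] / cov[0, 0]              # OLS slope = cov(x, y) / var(x)
    r = cov[0, 1] / np.sqrt(cov[0, 0] * cov[1, 1])
    return beta, r

# Hypothetical data: parents' log income less dispersed than children's.
rng = np.random.default_rng(1)
log_parent = rng.normal(0.0, 0.4, 5000)
log_child = 0.3 * log_parent + rng.normal(0.0, 0.9, 5000)

beta, r = elasticity_and_pearson(log_parent, log_child)
sd_ratio = np.std(log_child, ddof=1) / np.std(log_parent, ddof=1)

# beta / r equals sd(child) / sd(parent) up to floating-point error.
print(beta, r, beta / r, sd_ratio)
```

With these assumed dispersions the elasticity comes out at more than twice the Pearson correlation purely because the child generation's log income is more than twice as dispersed.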

1. The derivative of the conditional expectation function obtained after standardizing both variables will, in general, not be equal to the correlation curve. The reason is that even after standardizing, the distributions can be skewed (or, expressed more generally, have local variation in the dispersion). This can imply variation in the residual variance. Consider a case with two (linear) conditional expectation functions that are equal. Despite this, the residual variances can be substantially different, and this would yield different correlation coefficients. If the residual variance is heteroscedastic, the correlation curve will be nonlinear despite a constant regression slope: it indicates a weaker degree of association where the residual variance is higher. The advantage of the correlation curve is that it incorporates both the slope (which can vary) and the spread around the regression function (which can also vary over the distribution). Standardizing does not remove this possible variation in spread around the function, which is why the slope and the correlation curve will in general differ.

2. The Appendix includes more details on estimation. In particular, I explain how to use local polynomial regression. I have submitted GAUSS code that estimates the correlation curve, including bootstrap confidence intervals. The code includes simulated data, but also details on how to use it on actual data (which is very easy).

3. Yes, there is an important difference in dispersion between the parent and child generations. It is important to remember that the definition of labor income for the child generation is very different from the joint income measure for father and mother that can be found in the early tax registers.

4. It would be possible to evaluate how the correlation curve performs in worse data scenarios (i.e. fewer income years), but I decided not to go in that direction. The reason is that the analysis would still be incomplete without richer data (with more income years). The recent literature that recommends using rank correlations uses data sets with many more years for both generations. With the GAUSS code, together with such rich data, it would be easy to evaluate the correlation curve in the same way. One issue that also requires more attention is how sensitive the methods are to using the father’s income instead of the parents’ income. (Note: I use parents’ income.)

5. I will definitely improve the figures in that respect. (Yes, the additional curves are confidence intervals.)

6. I also agree on this point concerning the reference list, and I will upload a revised version of the paper.
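Points 1 and 2 can be illustrated with a small simulation. The sketch below is not the author's GAUSS code and omits the bootstrap confidence intervals; it assumes the standard (Bjerve and Doksum) form of the correlation curve, a local linear estimator (a local polynomial of degree one) with a Gaussian kernel, and an arbitrarily chosen bandwidth of 0.3. The function names `local_linear` and `correlation_curve` are hypothetical. The simulated data have a constant regression slope of 0.5 but a residual spread that rises with parental income, so the estimated slope stays roughly flat while the correlation curve falls.

```python
import numpy as np

def local_linear(x, y, grid, h):
    """Local linear regression with a Gaussian kernel (bandwidth h).
    Returns (level, slope) at each grid point: the intercept and slope of
    a kernel-weighted least-squares line centred at that point."""
    level = np.empty(len(grid))
    slope = np.empty(len(grid))
    for i, x0 in enumerate(grid):
        d = x - x0
        w = np.exp(-0.5 * (d / h) ** 2)           # Gaussian kernel weights
        X = np.column_stack([np.ones_like(d), d])
        XtW = X.T * w                              # X'W without forming W
        level[i], slope[i] = np.linalg.solve(XtW @ X, XtW @ y)
    return level, slope

def correlation_curve(x, y, grid, h):
    """Assumed form: rho(x) = s_x*beta(x) / sqrt((s_x*beta(x))**2 + sigma2(x)).
    Returns (rho, beta) on the grid."""
    # Step 1: derivative of the regression function, beta(x).
    _, beta = local_linear(x, y, grid, h)
    # Step 2: residual variance sigma2(x), by smoothing squared residuals.
    fitted, _ = local_linear(x, y, x, h)
    sigma2, _ = local_linear(x, (y - fitted) ** 2, grid, h)
    sigma2 = np.maximum(sigma2, 1e-12)             # guard against negative fits
    num = np.std(x) * beta
    return num / np.sqrt(num ** 2 + sigma2), beta

# Hypothetical data: constant slope 0.5, residual spread rising in x.
rng = np.random.default_rng(0)
n = 2000
x = rng.standard_normal(n)                         # standardized parent income
y = 0.5 * x + 0.3 * np.exp(0.5 * x) * rng.standard_normal(n)

grid = np.array([-1.0, 0.0, 1.0])
rho, beta = correlation_curve(x, y, grid, h=0.3)
print(np.round(beta, 2), np.round(rho, 2))
```

Because the true slope is constant here, any decline in `rho` across the grid comes from the heteroscedastic residual variance, which is the distinction drawn in point 1. In applied work the bandwidth would need to be chosen with more care, and the confidence intervals would come from resampling the data and re-estimating the curve.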