Home Advantage in European International Soccer: Which Dimension of Distance Matters?

We investigate whether the home advantage in soccer differs by various dimensions of distance between the (regions of the) home and away teams: geographical distance, climatic differences, cultural distance, and disparities in economic prosperity. To this end, we analyse 2,012 recent matches played in the UEFA Champions League and UEFA Europa League. We find that when the home team plays at a higher altitude, it benefits substantially more from its home advantage. Every 100 meters of altitude difference is associated with an increase of 1.1 percentage points in the home team's expected probability of winning the match.


Introduction
The home advantage in team sports is a phenomenon that has been widely studied in the peer-reviewed literature. Courneya and Carron (1992, p. 13) defined this home advantage in their review article as "the consistent finding that home teams in sports competitions win over 50.0% of the matches played under a balanced home and away schedule." More concretely, the home advantage has been documented as a key determinant of game outcomes in a broad range of team sports, including American football (Pollard and Pollard, 2005b), basketball (Ribeiro et al., 2016), field hockey (Smith et al., 2000), and ice hockey (Bray, 1999). However, this phenomenon has been studied most widely in soccer. Numerous studies have analysed the home advantage in soccer matches at the national level, ranging from country-specific studies in Australia (Goumas, 2014a), Brazil, England (Clarke and Norman, 1995; Nevill et al., 1996; Carmichael and Thomas, 2005), Germany (Oberhofer et al., 2010), Greece (Armatas and Pollard, 2012), Spain (Sánchez et al., 2009; Saavedra et al., 2015), and Turkey (Seckin and Pollard, 2008), among others, to cross-country investigations (Pollard, 2006a, 2006b; Pollard and Gómez, 2014; Leite and Pollard, 2018). Additionally, research on the home advantage in soccer has been conducted based on World Cup data (Torgler, 2004), international club competition data (Page and Page, 2007; Poulter, 2009; Goumas, 2013, 2014b), and data on international football games played in South America (McSharry, 2007). Several of the aforementioned studies have investigated the moderators of the home advantage in soccer.
Among the most discussed factors influencing this home advantage are: (i) crowd effects (Nevill et al., 1996; Pollard and Pollard, 2005a; Sánchez et al., 2009; Oberhofer et al., 2010; Goumas, 2013; Ponzo and Scoppa, 2018); (ii) referee bias (Nevill et al., 1996; Sutter and Kocher, 2004; Nevill et al., 2013); (iii) territoriality effects (Neave and Wolfson, 2003; Pollard, 2006a, 2006b; Pollard et al., 2008; Seckin and Pollard, 2008; Armatas and Pollard, 2012; Gómez, 2013, 2014); (iv) travel effects (Clarke and Norman, 1995; McSharry, 2007; Pollard et al., 2008; Oberhofer et al., 2010; Armatas and Pollard, 2012; Bäker et al., 2012; Goumas, 2014a, 2014b); and (v) familiarity effects (Pollard, 2002; Watson and Krantz, 2003; 2014). Moderators (i), (ii), and (iii) each relate to the fact that the home team typically receives stronger support from the audience, which motivates the players of the home team and tends to influence the referee's decisions in favour of this team. Therefore, not surprisingly, many studies have found that the larger the audience, the greater the home advantage. In addition, countries with a stronger sense of territoriality, like those in the Balkan region, are generally found to have a greater home advantage (Pollard, 2006a, 2006b; Gómez, 2013, 2014).
Moderators (iv) and (v) address the fact that the away team may experience fatigue due to travel-related factors and that the home team has the advantage of being familiar with the circumstances in the city of the stadium, both resulting in a higher relative productivity of the home team. Crucial with respect to (iv) and (v) are various aspects of distance between home and away teams. In this respect, small but significant positive associations between home advantage and distance travelled are found in England (Clarke and Norman, 1995), Brazil, Germany (Oberhofer et al., 2010), and in international European soccer matches (Goumas, 2014b), but not in Greece (Armatas and Pollard, 2012) or Australia (Goumas, 2014a). Relatedly, Seckin and Pollard (2008), Bäker et al. (2012), and Leite and Pollard (2018) indicate that the home advantage is substantially smaller or even vanishes completely whenever a match is a derby. In addition, McSharry (2007) and  report that there is a significant association between home advantage and altitude, with each 1,000 m in altitude difference worth, on average, an increase in the goal difference of half a goal according to the first study and 0.115 of a point's advantage for the home team according to the second study. Last,  report that playing in high humidity increases home advantage.
However, this literature on the relationship between home advantage in soccer and distance between the home and away teams is characterised by an important gap. All of the studies mentioned investigate one or two variables related to geographical distance while abstracting from other dimensions of distance. In other words, they neglect that distance between two teams can go beyond mere measurable miles. From an empirical point of view, their approach may result in omitted variable bias. Indeed, the included (geographical) distance measures may pick up the moderating effect of other dimensions of distance that are not included. For instance, the travel length variables included in previous studies may pick up the effect of temperature differences between the cities of the home and away teams (to which away players have to adapt).
The present study aims to fill this gap. We investigate the association between home advantage in European international soccer and multiple dimensions of the distance between home and away teams. More concretely, we investigate whether home advantage in soccer is heterogeneous by (a) geographical distance (travel length and difference in altitude); (b) climatic differences (with respect to temperature and precipitation); (c) cultural distance; and (d) disparities in economic prosperity between the regions of the home and away teams, while holding constant heterogeneity in the home advantage by the number of spectators, the derby status of the match, the home advantage at the national competition level, and the teams' relative strength. We are not aware of any previous work investigating the importance of distance factors (c) or (d) in the home advantage in soccer, let alone previous work investigating them within one statistical framework.
To this end, we analyse 2,012 matches in the Union of European Football Associations (UEFA) Champions League and UEFA Europa League between 2008 and 2016. The match data are merged with country and city-level data. These data also allow us to test, as the first study to do so, whether the home advantage in international soccer matches is different in derbies and whether an elevated home advantage in the national leagues in the Balkans translates into a higher home advantage for Balkan teams in international matches.

Data
The basis of our dataset was formed by match reports from all matches in the UEFA Champions League between 2008 and 2016, and all matches in the UEFA Europa League between 2011 and 2016 (before 2011, another competition format was used for the latter competition). These data were collected from the official website of the UEFA (UEFA; http://www.uefa.com). The UEFA Champions League, which is the most prestigious club competition in the world, and the UEFA Europa League begin with a group stage of 32 and 48 teams, respectively, divided into groups of four teams, where each team plays every other team in its group once at home and once away. The group stage of each season is played from September to December. The teams finishing first and second in each group proceed to the knock-out stage of their competition. Additionally, the teams finishing third in each group of the UEFA Champions League enter the UEFA Europa League knock-out stage. The knock-out stage of both competitions is played from February to May. During this phase, teams meet each other in one home and one away match, after which the team with the positive goal difference over these two matches (potentially after additional time and penalties) advances to the following round. In total, 125 and 205 matches are played in each season of the UEFA Champions League and UEFA Europa League, respectively, which amounts to 2,025 matches within the mentioned time frame. However, the 13 final matches were excluded from our analyses, given that they were played on a neutral pitch (without home advantage). Consequently, our analyses are based on 2,012 match reports. For more information on the regulations of the two competitions and the rules of a soccer match, we refer to the UEFA website (http://www.uefa.com) and to FIFA (2017). Following the approach used by Ponzo and Scoppa (2018), we considered each match twice in our data, once from the perspective of the home team and once from the perspective of the away team.
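The match counts above can be retraced with a few lines of arithmetic. The sketch below is only a consistency check; the numbers of seasons (eight for the Champions League, five for the Europa League) are an assumption inferred from the stated periods, not figures given explicitly in the text.

```python
# Match counts implied by the competition formats described above.
# Season counts are an ASSUMPTION inferred from the stated periods:
# 2008-2016 -> 8 Champions League seasons, 2011-2016 -> 5 Europa League seasons.
CL_MATCHES_PER_SEASON = 125
EL_MATCHES_PER_SEASON = 205
CL_SEASONS = 8
EL_SEASONS = 5
FINALS_EXCLUDED = 13  # finals played on a neutral pitch (8 + 5 under the assumption above)

total_matches = (CL_MATCHES_PER_SEASON * CL_SEASONS
                 + EL_MATCHES_PER_SEASON * EL_SEASONS)
analysed_matches = total_matches - FINALS_EXCLUDED
team_match_rows = 2 * analysed_matches  # each match viewed once per team

print(total_matches, analysed_matches, team_match_rows)  # prints: 2025 2012 4024
```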
This generated a total of 4,024 observations at the team-match level. As the outcome variables are closely related for the observations of the home and away teams at the match level, we clustered the standard errors in our regression analyses at this level. In addition, as a robustness check, we redid our analyses after randomly assigning each match either to the home or to the away team, thereby considering each match only once. This alternative approach did not yield different empirical conclusions. In what follows, we will always refer to a match between a 'team' and its 'opponent', where 'team' is the home team and 'opponent' is the away team if the match is viewed from the perspective of the home team, and vice versa.
Table 1 presents descriptive statistics for all of the variables used in the regression analysis below, together with their definitions and their respective sources. Panel A describes the variables used as dependent variables in our analysis. We constructed three distinct variables capturing the outcome of the match at full time from the perspective of the team concerned: (i) goal difference, (ii) victory, and (iii) number of points. The mean value of 0.000 for (i) is a direct consequence of the construction of our dataset, in which, as mentioned above, we considered each match twice. Using the mean value of victory, equal to 0.379, we can deduce that 24.2%, i.e. 1 − 2 × 0.379, of the matches ended in a draw. As a victory yields three points and a draw yields one point, each team obtained about 1.379 points per match on average.
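The draw share and the average number of points follow directly from the mean of the victory variable. A minimal sketch of that arithmetic, using only the figures quoted above (the variable names are illustrative):

```python
# Consistency check on the descriptive statistics quoted from Table 1.
win_share = 0.379  # mean of 'victory' over all team-match observations

# Every non-drawn match has exactly one winner and one loser, so across the
# stacked observations the share of defeats equals the share of victories.
draw_share = 1 - 2 * win_share

# Three points for a victory, one for a draw, none for a defeat.
mean_points = 3 * win_share + 1 * draw_share

print(round(draw_share, 3), round(mean_points, 3))  # prints: 0.242 1.379
```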

Economics: The Open-Access, Open-Assessment E-Journal 13 (2019-50)
www.economics-ejournal.org
Panel B of Table 1 presents the main independent variable, i.e. the home team status of the considered team. Given the construction of our dataset, half of the observations capture match events from the perspective of the home team. Panel C shows the variables by which the advantage of this team (over the away team) may differ. As mentioned in the introduction, we included six such variables that relate to the multi-dimensional 'distance' between the home and away teams.
First, geographical distance is captured by the variables 'Distance: travel length' (straight-line, as-the-crow-flies distance between the city of the home team and that of its opponent) and 'Distance: altitude' (difference in meters above sea level between the two cities). The highest travel length (6,173 km) is observed between the stadiums of Benfica (Lisbon, Portugal) and FC Astana (Nur-Sultan, Kazakhstan). The stadium with the lowest altitude is that of Qarabağ (Baku, Azerbaijan; 7 meters below sea level), while the stadium with the highest altitude is that of FC St. Gallen (St. Gallen, Switzerland; 779 meters above sea level).
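A straight-line distance of this kind is commonly computed with the haversine formula. The sketch below is one possible way to do so and is not necessarily the procedure used to build the variable; the coordinates are approximate city-centre values assumed for illustration, and the function name is hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, a common approximation


def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine distance in km between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


# Approximate city-centre coordinates (ASSUMED, not taken from the study).
lisbon = (38.72, -9.14)
nur_sultan = (51.17, 71.43)

distance_km = great_circle_km(*lisbon, *nur_sultan)
print(round(distance_km))
```

With these assumed coordinates the result lands within a few tens of kilometers of the 6,173 km reported between the two stadiums; the exact figure depends on the coordinates and Earth radius chosen.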
Second, distance in terms of climatic differences between the cities of the home team and the away team is determined by their temperature and precipitation differences, both measured for the month of the match. The lowest and highest (average) temperatures are measured in Kazan in February (FC Rubin Kazan, Russia; −10°C) and Tel Aviv in September (Maccabi Tel Aviv FC and Hapoel Tel Aviv FC, Israel; 27°C). Tel Aviv also has the lowest precipitation level (0 mm in September); the highest is observed in San Sebastián (Real Sociedad, Spain; 181 mm in November).
Third, the teams' cultural distance is based on the Cultural Distance Index constructed by Spolaore and Wacziarg (2016). Using the answers to a questionnaire with questions on six different value-related categories, collected from people in 71 countries, Spolaore and Wacziarg (2016) calculated the cultural distance index for 2,701 pairs of countries. This index is not available for 192 observations; matches with teams from Israel are overrepresented among these observations. The smaller the value of the cultural distance index, the smaller the cultural distance between the two countries under review. For example, the smallest cultural distance in our dataset is that observed between Russian and Ukrainian teams, equal to -89.820 (the same value is used when two teams from the same country play against each other). We find the biggest cultural distance between Denmark and Turkey, with a value of 81.670.
Fourth and last, the disparity in economic prosperity between the countries of the home and away teams is operationalised by their difference in gross domestic product (GDP) per capita. The lowest GDP level is measured in Ukraine in 2015 (FC Dnipro, FC Dynamo Kyiv, and FC Shakhtar Donetsk; 2,125 euro), while the highest level is measured in Norway in 2013 (Tromsø IL; 102,910 euro).
We believe these four dimensions ensure a focus on the most relevant aspects of distance in Europe. Substantial correlations are found between these dimensions. In particular, teams that are far apart as the crow flies are often characterised by a high cultural distance (Pearson's r = 0.541). In addition, the other significant correlations (at the 5% significance level) are those between (i) altitude difference and temperature difference (r = -0.097), (ii) altitude difference and precipitation difference (r = -0.058), (iii) altitude difference and wealth difference (r = 0.126), and (iv) precipitation difference and wealth difference (r = 0.165).
Two of the six distance variables are equal for the home and away teams at the match level: travel distance and cultural distance. The four other distance variables have a direction, so that their value for the home team is the opposite of that of the away team (and their average value is, by construction of our data, 0): altitude difference, temperature difference, precipitation difference, and wealth difference. For the latter variables, we also constructed the corresponding distance in absolute values. These variables are added to the regression model in our extended analysis. Including these absolute values makes it possible to determine whether it is a difference (or shock) in these variables that determines the home advantage, irrespective of its direction, or whether it is a difference in a certain direction that yields an additional home premium.
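The distinction between directional and absolute differences can be illustrated with the two observations generated for a single match. The sketch below assumes that a directional difference is computed as the team's own value minus its opponent's value; the function name, field names, and the fixture itself are hypothetical, while the two stadium altitudes are those quoted above.

```python
def team_rows(match_id, home, away, altitude_m):
    """Return the two team-level observations for one match.

    ASSUMED convention: a signed difference is the team's own value minus its
    opponent's value, so the home and away rows carry opposite signs.
    """
    rows = []
    for team, opponent, is_home in ((home, away, 1), (away, home, 0)):
        diff = altitude_m[team] - altitude_m[opponent]
        rows.append({
            "match_id": match_id,
            "team": team,
            "opponent": opponent,
            "home": is_home,
            "altitude_diff": diff,           # directional: flips sign across the two rows
            "altitude_diff_abs": abs(diff),  # non-directional: identical in both rows
        })
    return rows


# Stadium altitudes quoted in the text; the fixture itself is hypothetical.
altitude_m = {"FC St. Gallen": 779, "Qarabağ": -7}
rows = team_rows(1, "FC St. Gallen", "Qarabağ", altitude_m)
for row in rows:
    print(row["team"], row["altitude_diff"], row["altitude_diff_abs"])
```

Because the two signed values cancel within every match, the directional variables average to zero over the stacked data, whereas their absolute values do not.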
The other variables in Panel C are match characteristics with a potential influence on the home advantage that are not related to the distance between the home and away teams. First, to capture crowd effects in a direct way, we included a variable capturing the number of spectators. As mentioned in our introduction, this variable often recurs in the literature as a factor that increases the home advantage. The average number of spectators in the analysed matches was 31,101. Second, we included a derby variable to check, as the first study to do so, whether the home advantage varies by this variable in European international soccer as it does in national matches in Germany (Bäker et al., 2012). Third, we included indicators for teams from the Balkans and Northern Europe. According to Pollard (2006b), the home advantage in national leagues in the Balkans is generally higher than elsewhere in Europe, while the home advantage in Northern Europe (including the Baltic states, Scandinavian countries, Iceland, and the five countries of the British Isles) is lower than average. By means of our regression framework, we can test whether the higher (lower) home advantage in national leagues in the Balkans (Northern Europe) is also reflected in a higher (lower) home advantage for Balkan (Northern European) teams in international matches. A final potential moderator of home advantage that we investigate is the relative strength of the team and its opponent. This relative strength is captured by the teams' difference in UEFA coefficient; the UEFA coefficient of a team is based on its participation and results in the previous five seasons of the UEFA Champions League and UEFA Europa League.
The variables in Panel D of Table 1 are used to confirm whether the performed analyses are robust to (i) the exclusion of matches in which the home team does not play in its own stadium; (ii) the exclusion of matches without a competitive value for the team or its opponent; and (iii) the exclusion of matches in the knock-out stage. A team does not play in its own stadium if its stadium does not meet the requirements of the UEFA, as was the case for Zulte Waregem (Belgium) when participating in the Europa League in 2013, or if there are security concerns, as for Shakhtar Donetsk (Ukraine) when participating in the Champions League in 2014. In those instances, the home team has only a pseudo home status. This occurred, however, in only 3.3% of the analysed matches. Next, we define a match without competitive value as a match in the group stage in which it was mathematically impossible for the team and/or its opponent to change their qualification status for the next stage. A third robustness check is performed to see whether the home advantage patterns in our data remain when matches in the knock-out phase are excluded. This is considered given that, as mentioned above, additional time and a penalty shoot-out may be added to the return match of a round, potentially resulting in an additional advantage for the home team in such matches.

Statistical approach
We analysed the data presented in the former subsection using linear regression models. All estimated models can be represented by means of the following general equation:

$$Y_{n,i} = \alpha + \beta \, HOME_{n,i} + \gamma' \left( HOME_{n,i} \times X_{n,i} \right) + \varepsilon_{n,i}$$

In this equation, $Y_{n,i}$ is the dependent variable: the outcome of the $n$th match from the point of view of team $i$. $HOME_{n,i}$ is the dummy variable capturing the home team status of team $i$ in match $n$. $X_{n,i}$ is a vector of distance-related and other variables according to which the association of $HOME_{n,i}$ with $Y_{n,i}$ may be heterogeneous. $\alpha$ is the intercept of the model, $\beta$ is the coefficient related to $HOME_{n,i}$, $\gamma$ is a vector of coefficients associated with $HOME_{n,i} \times X_{n,i}$, and $\varepsilon_{n,i}$ is the error term. As mentioned above, we clustered the standard errors at the match level to correct for the correlation between the error terms due to the two observations per match. In addition, this clustering of the standard errors corrects for their heteroscedasticity due to the fact that our dependent variables, 'Victory' in particular, are not normally distributed (Angrist and Pischke, 2008; Baert and Amez, 2018; Van Den Broucke and Baert, 2019). However, we also estimated (ordered) logit models, which yielded the same research conclusions.
It is important to note that we did not include $X_{n,i}$ without interaction with $HOME_{n,i}$ (that is, as a control variable). Correlation between $HOME_{n,i}$ and $X_{n,i}$ is impossible given the construction of our dataset, where for every combination of teams, there is always a match where one team is the home team and the other one is the away team, and a match where the opposite is true. For the same reason, controlling for team fixed effects is not meaningful. Furthermore, it is not desirable to include $X_{n,i}$ as such because we would then consistently divide the total home effect into an effect of home advantage and away disadvantage, which would not be consistent with the literature mentioned in the introduction.
The variables interacted with home team status were mean-centred so that, throughout the regression models, the coefficient of home team status can be interpreted as the average effect of playing at home. For each of the models presented in the Results section, we computed multicollinearity diagnostics, which yielded variance inflation factors substantially lower than 5.

Results
Table 2 presents the results of our benchmark analysis. In regressions (1)-(3), we regress goal difference (model (1)), victory (model (2)), and number of points (model (3)) on the home status of the team only. In regressions (4)-(6), we redo the same analyses after adding the interactions between the home status of the team and the variables from Panel C of Table 1.
The estimation results concerning the average effect of playing at home are robust across the six regression models. A highly significant (p < 0.001) positive association is found between playing at home and the outcome of the match in terms of our three dependent variables. After including the interaction variables, we find that playing at home increases (i) the expected goal difference at full time by 0.811 goals, (ii) the expected probability of a victory by 18.3 percentage points, and (iii) the expected number of points by 0.550, all other variables held constant.
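A short derivation may help to show why the coefficient of home team status can be read as an average effect once the interacted variables are mean-centred. The notation is assumed for this sketch: Y is the match outcome, HOME the home dummy, X the vector of mean-centred interacted variables, and α, β, γ the intercept and coefficients of a linear model with an error term that has mean zero conditional on HOME and X.

```latex
\begin{align*}
E[Y \mid HOME = 1, X] &= \alpha + \beta + \gamma' X, \\
E[Y \mid HOME = 0, X] &= \alpha, \\
E[Y \mid HOME = 1, X] - E[Y \mid HOME = 0, X] &= \beta + \gamma' X.
\end{align*}
```

Evaluated at the sample mean of the interacted variables, which is X = 0 after mean-centring, the difference reduces to β. The term γ'X then measures how the home effect shifts for matches that deviate from the average match.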
With respect to the importance of the multi-dimensional distance between the home and away teams, only the interaction with the altitude difference between the teams has a significant coefficient. Every additional 100 m of altitude difference is associated with (i) an increase in the goal difference by 0.050 goals (p = 0.006), (ii) an increase in the chance of a victory by 1.1 percentage points (p = 0.014), and (iii) an increase in points by 0.032 (p = 0.008) for the home team. The increase in the goal difference is of the same magnitude as that reported in McSharry (2007), namely an increase of about half a goal for each additional 1,000 meters of altitude difference. In contrast, the increase in points we find (10 × 0.032 = 0.32 per 1,000 meters) is almost three times as high as the 0.115 additional points for each additional 1,000 meters of altitude difference reported by . This stronger association may be related to the fact that Europeans might be less used to substantial differences in altitude. Given this striking difference in magnitude, we estimated several alternative regression models with non-linear specifications (adopting, for instance, the natural logarithm of our distance measure), but these did not seem to capture our data better.
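The comparison with earlier work requires converting the estimates from a per-100 m to a per-1,000 m scale. A minimal sketch of that conversion, using only the coefficients quoted above (the dictionary keys and variable names are illustrative):

```python
# Coefficients on home status x altitude difference, per 100 m (from the text).
per_100m = {
    "goal_difference": 0.050,
    "victory_probability": 0.011,  # 1.1 percentage points
    "points": 0.032,
}

# Rescale to the per-1,000 m unit used in the earlier studies.
per_1000m = {name: 10 * value for name, value in per_100m.items()}

EARLIER_POINTS_PER_1000M = 0.115  # benchmark estimate quoted in the text
ratio_points = per_1000m["points"] / EARLIER_POINTS_PER_1000M

print({name: round(value, 3) for name, value in per_1000m.items()})
print(round(ratio_points, 2))
```

Note that the ratio is computed from rounded coefficients, so it is only an approximation of the ratio of the underlying estimates.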
With respect to the other distance variables, we identify a small and weakly significant (p = 0.053) coefficient for the interaction between the home status of a team and its cultural distance to the away team in regression (5), but not in regressions (4) and (6), which indicates that this may be a statistical artefact.
Regarding the other interaction variables, we find that the home advantage is consistently higher when the number of spectators is higher and when the relative strength of the home team is more substantial. Per 1,000 additional spectators (one-unit increase in the relative strength index), the goal difference in favour of the home team increases by 0.009 (0.012), the chance of a home win increases by 0.2 (0.2) percentage points, and the number of points obtained by the home team increases by 0.007 (0.007). We do not find evidence that the home advantage is heterogeneous by the derby status of the match or the region of the country of the team (Balkan, Northern European, or other).
Table 3 presents the results of an extended analysis in which we include the absolute values of the distance variables with a direction. As mentioned above, this allows us to check whether it is a shock in these distances that determines the home advantage, or only a shock in a certain direction. Regarding the altitude difference, we see that the direction of this difference is important. After including its absolute value, the coefficient of the directional altitude difference variable is very comparable to that in Table 2, while the coefficient of the absolute value is not significant. So, again, when the home team plays at a higher (lower) altitude, it benefits more (less) from its home advantage. Furthermore, we notice a significantly positive association between home advantage and the absolute wealth difference between the competing teams. An additional difference in wealth between the country of the home team and the country of the away team of 1,000 dollars per capita increases the home advantage in terms of goal difference by 0.076 goals. However, this interaction is not significant in regressions (2) and (3), so, again, this result should be interpreted with caution.
Notes. A definition of the included variables can be found in Table 1. The variables interacted with 'Home' are mean-centred. The presented statistics are linear regression model estimates and standard errors, clustered at the match level, in parentheses. *** (**) ((*)) indicate significance at the 1% (5%) ((10%)) significance level.

Robustness checks
As our main finding of a higher home advantage for home teams playing at a higher altitude could be driven by a few outliers, we discuss an outlier analysis. More concretely, in Table 4 we replicate regressions (4), (5), and (6) of Table 2 after excluding matches with an extreme altitude difference. That is, for columns (1), (2), and (3) of Table 4, we exclude matches where the altitude difference is more than three standard deviations higher or lower than the average of 0 (that is, a difference of 707.7 meters or more). In addition, for columns (4), (5), and (6), we exclude matches where this difference is more than two standard deviations higher or lower than the average. As a consequence, the number of observations is reduced from 3,832 to 3,804 and 3,660 in the first three and last three columns, respectively. However, the regression results are very comparable to those presented in Table 2.
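The exclusion rule can be sketched as a simple symmetric filter around zero. In the sketch below, the standard deviation is backed out from the quoted three-standard-deviation threshold, the function name is hypothetical, and the sample altitude differences are invented for illustration.

```python
THREE_SD_M = 707.7     # three-standard-deviation cut-off quoted in the text
sd_m = THREE_SD_M / 3  # implied standard deviation of the altitude difference
two_sd_m = 2 * sd_m    # implied two-standard-deviation cut-off


def keep_row(altitude_diff_m, cutoff_m):
    """Keep an observation unless its altitude difference reaches the cut-off.

    The mean difference is 0 by construction, so the rule is symmetric.
    """
    return abs(altitude_diff_m) < cutoff_m


# HYPOTHETICAL altitude differences in meters, for illustration only.
sample = [0, 120, -350, 500, -786]
print(round(sd_m, 1), round(two_sd_m, 1))
print([d for d in sample if keep_row(d, THREE_SD_M)])
print([d for d in sample if keep_row(d, two_sd_m)])

# Observations dropped relative to the 3,832 benchmark observations (from the text).
dropped_3sd = 3832 - 3804
dropped_2sd = 3832 - 3660
print(dropped_3sd, dropped_2sd)
```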
In addition, as mentioned in the Data subsection, we redid our benchmark analysis after (i) the exclusion of matches when the home team does not play in their own stadium, (ii) the exclusion of matches without any competitive value for the team or their opponent, and (iii) the exclusion of matches in the knock-out stage. However, none of these analyses, the results of which can be obtained upon request, has led to other insights than those of the benchmark analysis.

Conclusion
This study contributed to the literature on home advantage in soccer in several ways. Former contributions to this literature investigated how this home advantage varies by the geographical distance between the home and away teams, neglecting other dimensions of distance (and the related potential omitted variable bias in their estimates). In contrast, we investigated heterogeneity in the home effect by geographical distance (travel length and difference in altitude), climatic differences (temperature and precipitation), cultural distance, and disparities in economic prosperity between the regions of the home and away teams. In addition, we allowed the measured home advantage to vary by the number of spectators, the derby status of the match, the home advantage at the national competition level, and the teams' relative strength. To this end, 2,012 matches in the UEFA Champions League and UEFA Europa League between 2008 and 2016 were analysed. We found, first, in line with the literature, a highly significant positive association between playing at home and ending the match in a favourable position. Second, the altitude difference stood out as the one major distance-related moderator of this home advantage. Each additional 100 m above sea level is associated with an increase of the home advantage by 0.032 points. A possible explanation for this may be that the available oxygen decreases with increasing altitude. Home team players are likely to be better adapted to performing well in conditions of low oxygen levels. Other explanations, as mentioned by an anonymous referee commenting on the original version of the present study (Van Damme and Baert, 2019), might be decreased air friction and a higher ball velocity. Third, we found that the home advantage in soccer is more pronounced when the number of spectators is higher and when the home team is substantially stronger (in terms of UEFA coefficient) than the away team.
These findings are consistent with Nevill et al. (1996), Goumas (2013), Ponzo and Scoppa (2018), and . Finally, no significant association was found with variables capturing derby matches and variables portraying the home advantage at the national level. The latter finding is remarkable, especially for the countries in the Balkans, because the higher home advantage identified in these countries' national leagues often recurs in the literature (Pollard, 2006a, 2006b; Gómez, 2013, 2014).
We end this study by acknowledging its main research limitation. By investigating how the home advantage in soccer is associated with a broad spectrum of distance-related variables, we took a step forward in measuring the unbiased, independent importance of these determinants of the home advantage. Yet, the related coefficient estimates mentioned in this article cannot be given a causal interpretation, as there might still be other factors that we did not include in our study but that correlate with our distance dimensions and with performance in soccer. Therefore, we are in favour of (i) future empirical work that exploits (quasi-)experimental variation in one or more of these dimensions to investigate their genuine causal impact and (ii) qualitative research on the mechanisms underlying the reported association between home advantage and the altitude difference between the cities of the home team and the away team.