How Easy Is It To Qualify For The World Cup in CONCACAF North America?


abramjones
Joined: 04 Aug 2012, 20:06

05 Sep 2017, 22:41 #1

CONCACAF has 3.5 spots for the 2018 World Cup. Here are the major North American countries with their economic determinant (a number that estimates a country's potential based on GDP per capita and population size). See this video for more commentary: https://www.youtube.com/watch?v=X7iZbCOyHmw

United States: 149 million (soccer is unpopular sport here, reduce to about 10 to 15% of value)
Canada: 16 million (soccer is unpopular sport here, reduce to about 10 to 15% of value)
Mexico: 10 million

Puerto Rico: 1.1 million (soccer is unpopular sport here, reduce to about 10 to 15% of value)
Cuba: 641,000  (soccer is unpopular sport here, number should be significantly reduced)
Dominicana: 545,000 (soccer is unpopular sport here, number should be significantly reduced)

Guatemala: 402,000
Costa Rica: 373,000
Panama: 294,000 (not traditionally a soccer country, but starting to lean toward soccer. number still should be reduced)
Salvador: 221,000
Trinidad Tobago: 210,000 (not traditionally a soccer country, but starting to lean toward soccer. number still should be reduced)
Honduras: 161,000
Jamaica: 132,000 (not traditionally a soccer country, but starting to lean toward soccer. number still should be reduced)

Conclusion: United States and Mexico are almost guaranteed to go to the World Cup unless they severely underperform. The other listed countries face moderate difficulty in earning a World Cup berth. North American countries not listed here have almost no chance of getting to the World Cup. No matter how efficient or good they are at the sport the economics are too far against their favor. In fact, the only country I think that ever qualified in this group is Haiti.
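The popularity adjustment in the list above can be made explicit. This is a minimal sketch, assuming the quoted "10 to 15%" range is applied as a plain multiplier to the listed values; the function name `adjusted_range` and the decision to leave Mexico unadjusted are illustrative choices, not a published method.

```python
# Economic determinant values as listed in the post (units as given there).
LISTED = {
    "United States": 149_000_000,
    "Canada": 16_000_000,
    "Mexico": 10_000_000,
    "Puerto Rico": 1_100_000,
}

# Countries the post marks "reduce to about 10 to 15% of value".
REDUCED = {"United States", "Canada", "Puerto Rico"}


def adjusted_range(country, low=0.10, high=0.15):
    """Return (low, high) bounds of the adjusted value for one country."""
    value = LISTED[country]
    if country not in REDUCED:
        return (value, value)
    return (round(value * low), round(value * high))


for name in LISTED:
    print(name, adjusted_range(name))
```

On these figures the adjusted United States value (roughly 15 to 22 million) stays above Mexico's 10 million, while Canada (1.6 to 2.4 million) drops below it, so the ordering depends heavily on the chosen multiplier.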

nfm24
Joined: 07 Apr 2007, 16:28

06 Sep 2017, 09:19 #2

As more than half of your numbers are labelled as inaccurate, I'm not sure what would be the point of drawing any conclusions from them.

nfm24
Joined: 07 Apr 2007, 16:28

06 Sep 2017, 09:23 #3

Thinking of simply answering the question in your title, I would look at the number of matches that a team can afford to lose and still qualify, or the number of points they need to guarantee it, etc. This is just a format concern, not a population/ability concern. For example, a team like Honduras could conceivably qualify despite losing as many games as it wins.
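The format-only view can be illustrated with a little arithmetic. This is a rough sketch, assuming a 10-match final round and the usual 3 points for a win and 1 for a draw; both the match count and the helper names are assumptions made for illustration.

```python
def points(wins, draws, losses):
    """League points under the standard 3-1-0 scoring."""
    return 3 * wins + draws


def balanced_records(matches=10):
    """All (wins, draws, losses) records with as many losses as wins."""
    records = []
    for wins in range(matches // 2 + 1):
        losses = wins
        draws = matches - wins - losses
        records.append((wins, draws, losses))
    return records


for record in balanced_records():
    print(record, points(*record))
```

Under these assumptions a team that loses as often as it wins finishes on between 10 and 15 points; whether that is enough depends on how the other teams take points off each other.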

abramjones
Joined: 04 Aug 2012, 20:06

06 Sep 2017, 11:46 #4

nfm24 wrote: As more than half of your numbers are labelled as inaccurate, I'm not sure what would be the point of drawing any conclusions from them.
They aren't inaccurate. But obviously you can't expect a country that doesn't play soccer as its main sport to perform at full potential. Not only can conclusions be made, but they can be made quite clearly.

Kaizeler
Joined: 05 Apr 2012, 14:54

06 Sep 2017, 12:11 #5

When your "economic determinant" is GDP per capita times population, aren't you just ranking countries/teams (hello Raetia in row 137) by their GDP?

I was thinking you might get more meaningful results by getting metrics that would allow you to get closer to your earlier concept of "available population", such as median instead of average income, or considering only population within given age brackets. For instance, males aged 20-34 represent about a third (!) of Qatar's total population, but only ~8/9% in ageing Western countries such as Italy or Germany.
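Both points can be checked with a few lines. This is only a sketch with made-up numbers: the country below is hypothetical, and the bracket shares are the rough figures quoted above (about a third, and about 8 to 9%).

```python
def economic_determinant(gdp_per_capita, population):
    """GDP per capita times population, which is simply total GDP."""
    return gdp_per_capita * population


def bracket_population(population, bracket_share):
    """Population inside an age/sex bracket, given its share of the total."""
    return population * bracket_share


# Hypothetical country: the figures are illustrative, not real data.
population = 2_000_000
gdp = 100_000_000_000
gdp_per_capita = gdp / population

print(economic_determinant(gdp_per_capita, population) == gdp)
print(bracket_population(population, 1 / 3))   # Qatar-like share
print(bracket_population(population, 0.085))   # ageing-country share
```

The first line shows that the product collapses back to GDP; the other two show how much the "available" head count can differ between two countries of the same total size.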

Luca
Joined: 09 Feb 2011, 19:58

06 Sep 2017, 12:16 #6

abramjones wrote: No matter how efficient or good they are at the sport the economics are too far against their favor. In fact, the only country I think that ever qualified in this group is Haiti.
Haiti is also the only Caribbean country to have won the CONCACAF continental title (out of 24 editions!). The Caribbean countries have always had problems in prevailing in the CONCACAF region. Maybe a division into two different confederations (North and Central America on the one hand and Caribbean America on the other) can prevent a lot of disparities.

nfm24
Joined: 07 Apr 2007, 16:28

06 Sep 2017, 14:48 #7

Kaizeler wrote:For instance, males aged 20-34 represent about a third (!) of Qatar's total population, but only ~8/9% in ageing Western countries such as Italy or Germany.
In the case of Qatar, does this include foreign imported workers?  The idea of finessing the "available population" of the whole country is not a bad one but it's obviously open to all sorts of fudging and confirmation bias of the anticipated important factors.   If the figures of the number of active footballers in each country are available, they should be factored in.

nfm24
Joined: 07 Apr 2007, 16:28

06 Sep 2017, 14:50 #8

abramjones wrote:
nfm24 wrote:As more than half of your numbers are labelled as inaccurate, I'm not sure what would be the point of drawing any conclusions from them.
They aren't inaccurate. But obviously you can't expect a country that doesn't play soccer as its main sport to perform at full potential. Not only can conclusions be made, but they can be made quite clearly.
They are inaccurate for the purposes for which you wish to use them.   Conclusions are hence weak.

abramjones
Joined: 04 Aug 2012, 20:06

06 Sep 2017, 22:22 #9

Kaizeler wrote: When your "economic determinant" is GDP per capita times population, aren't you just ranking countries/teams (hello Raetia in row 137) by their GDP?

I was thinking you might get more meaningful results by getting metrics that would allow you to get closer to your earlier concept of "available population", such as median instead of average income, or considering only population within given age brackets. For instance, males aged 20-34 represent about a third (!) of Qatar's total population, but only ~8/9% in ageing Western countries such as Italy or Germany.
Yes you're right, the economic determinant method is simpler than the available population method. Oddly, I find it works better than the AP method, probably because accurate data is easier to find. Or maybe because the overall power of the economy is more important than specific age ranges and genders. For instance, the top athletes of the world are going to be directly or indirectly supported by just about everyone, perhaps even a 20-year-old woman working at a retail store. Maybe she purchases a sports-related gift for her boyfriend, which sponsors the sport. A 50-year-old female nurse may treat a sports fan who regularly goes to sports games, prolonging his sponsorship of the sport. A 35-year-old woman may drive her sons to practice every day, which greatly contributes to the sport. These things seem small, but when you have millions of people doing it the consequence is gigantic. This type of economic activity cannot be overlooked, and almost all economic activity is intertwined and mutually supporting.

Note: I did include females in AP as well, I was just biased against certain age groups.

abramjones
Joined: 04 Aug 2012, 20:06

06 Sep 2017, 22:28 #10

Luca wrote:
abramjones wrote: No matter how efficient or good they are at the sport the economics are too far against their favor. In fact, the only country I think that ever qualified in this group is Haiti.
Haiti is also the only Caribbean country to have won the CONCACAF continental title (out of 24 editions!). The Caribbean countries have always had problems in prevailing in the CONCACAF region. Maybe a division into two different confederations (North and Central America on the one hand and Caribbean America on the other) can prevent a lot of disparities.
They have problems prevailing in many sports for a couple of different reasons. First the obvious is wealth and population, second is that a lot of them are occupied by other sports... mostly cricket, baseball, and basketball.  I think a better option would be creating a West Indies football team.
Last edited by abramjones on 06 Sep 2017, 22:33, edited 1 time in total.

abramjones
Joined: 04 Aug 2012, 20:06

06 Sep 2017, 22:30 #11

nfm24 wrote:
Kaizeler wrote:For instance, males aged 20-34 represent about a third (!) of Qatar's total population, but only ~8/9% in ageing Western countries such as Italy or Germany.
In the case of Qatar, does this include foreign imported workers?  The idea of finessing the "available population" of the whole country is not a bad one but it's obviously open to all sorts of fudging and confirmation bias of the anticipated important factors.   If the figures of the number of active footballers in each country are available, they should be factored in.
This would be a different method of ranking than what I'm doing. In fact, we have talked about this in another conversation. It is a very good idea, but it is a separate idea... and an idea that would require many more resources than what I'm doing, which is already extensive in terms of a hobby.

abramjones
Joined: 04 Aug 2012, 20:06

06 Sep 2017, 22:31 #12

nfm24 wrote:
abramjones wrote:
nfm24 wrote:As more than half of your numbers are labelled as inaccurate, I'm not sure what would be the point of drawing any conclusions from them.
They aren't inaccurate. But obviously you can't expect a country that doesn't play soccer as its main sport to perform at full potential. Not only can conclusions be made, but they can be made quite clearly.
They are inaccurate for the purposes for which you wish to use them.   Conclusions are hence weak.
This is a baseless claim.

nfm24
Joined: 07 Apr 2007, 16:28

07 Sep 2017, 19:32 #13

Which claim, yours or mine?  All I'm saying is that you are trying to base yours on a list of data of which more than half have a disclaimer about their accuracy in parentheses.   What is the basis for "reduce to about 10 to 15% of value" ?   Wouldn't it be better to use more relevant figures which don't need arbitrary fudging in over 50% of cases?

nfm24
Joined: 07 Apr 2007, 16:28

07 Sep 2017, 19:38 #14

abramjones wrote:
nfm24 wrote: If the figures of the number of active footballers in each country are available, they should be factored in.
This would be a different method of ranking than what I'm doing. In fact, we have talked about this in another conversation. It is a very good idea, but it is a separate idea... and an idea that would require many more resources than what I'm doing, which is already extensive in terms of a hobby.
It's not a separate idea, it's the same idea but with more relevant input data. 

FIFA did a sort of census of players in 2006, so there is good data easily available. 

abramjones
Joined: 04 Aug 2012, 20:06

11 Sep 2017, 00:06 #15

nfm24 wrote: Which claim, yours or mine?  All I'm saying is that you are trying to base yours on a list of data of which more than half have a disclaimer about their accuracy in parentheses.   What is the basis for "reduce to about 10 to 15% of value" ?   Wouldn't it be better to use more relevant figures which don't need arbitrary fudging in over 50% of cases?
You're missing the point... it has already been shown, without even fudging the numbers, that wealthier and more populated countries rank higher a disproportionate amount of the time. There is no need to change the numbers, I'm simply showing how obvious it is... because if we did have these "more accurate" numbers my claims would be even more evident... and they are already extremely evident: https://docs.google.com/spreadsheets/d/ ... Ho2TFO5vcM I don't know if you are just trolling or don't understand what I'm doing.
Last edited by abramjones on 11 Sep 2017, 00:12, edited 1 time in total.

abramjones
Joined: 04 Aug 2012, 20:06

11 Sep 2017, 00:11 #16

nfm24 wrote:
abramjones wrote:
nfm24 wrote: If the figures of the number of active footballers in each country are available, they should be factored in.
This would be a different method of ranking than what I'm doing. In fact, we have talked about this in another conversation. It is a very good idea, but it is a separate idea... and an idea that would require many more resources than what I'm doing, which is already extensive in terms of a hobby.
It's not a separate idea, it's the same idea but with more relevant input data. 

FIFA did a sort of census of players in 2006, so there is good data easily available. 
The point of my rankings is to show rankings after wealth and population have been negated as much as possible. Certainly you could create a ranking system that accounts for diet, genetics, et cetera if you could quantify it. I think the dynamics that go into sports infrastructure are too far reaching to view number of players alone, and at the same time throw out wealth and population adjusted rankings. I think if we only look at the number of registered players, and perhaps wealth and population with different weights... it may not be enough on its own because of the dynamics involved. I previously stated that such a ranking system might make mine obsolete, and there's a reason I said might... because of these economic dynamics.

Data from 2006 in the most popular sport in the world is not nearly sufficient to fulfill this task. I am dealing with many sports that don't have such data available, and I'm also dealing with all years. Having said that, if someone makes a ranking system that includes number of registered players, I will be very interested in it and support it, even if it doesn't go back far in history.

nfm24
Joined: 07 Apr 2007, 16:28

11 Sep 2017, 09:09 #17

> it has already been shown without even fudging the numbers that wealthier and more populated countries rank higher a disproportionate amount of the time.

It hasn't.  At least, not by you.


> I think the dynamics that go into sports infrastructure are too far reaching to view number of players alone

Obviously, but the same applies to population & wealth.  It's bizarre that you want to factor in the entire population of the country *before* considering the footballing subset.  The fact that you labelled so much of your own data (e.g. 5 of the top 6 in your list) as irrelevant for soccer makes this even more bizarre.  


> Data from 2006 in the most popular sport in the world is not nearly sufficient to fulfill this task. I am dealing with many sports that don't have such data available, and I'm also dealing with all years.

Perhaps you should revise your objectives in order to be able to use the available data.  Incidentally, similar data exists well before 2006.  I recall various "football directory"-type books giving lists of registered players for most countries.

abramjones
Joined: 04 Aug 2012, 20:06

10 Oct 2017, 05:56 #18

nfm24 wrote: > it has already been shown without even fudging the numbers that wealthier and more populated countries rank higher a disproportionate amount of the time.

It hasn't.  At least, not by you.


> I think the dynamics that go into sports infrastructure are too far reaching to view number of players alone

Obviously, but the same applies to population & wealth.  It's bizarre that you want to factor in the entire population of the country *before* considering the footballing subset.  The fact that you labelled so much of your own data (e.g. 5 of the top 6 in your list) as irrelevant for soccer makes this even more bizarre.  


> Data from 2006 in the most popular sport in the world is not nearly sufficient to fulfill this task. I am dealing with many sports that don't have such data available, and I'm also dealing with all years.

Perhaps you should revise your objectives in order to be able to use the available data.  Incidentally, similar data exists well before 2006.  I recall various "football directory"-type books giving lists of registered players for most countries.
> It hasn't.  At least, not by you.

A 10 year old could show you. Imagine any country or any region. Divide that region in half, divide it again, divide it again. At this instant in time, any smaller part will never have more sporting talent than the whole combined. In very rare cases they may have equal sporting power. You are making a big fuss against a fact that is so elementary (as many people seem to do).

> Obviously, but the same applies to population & wealth.  It's bizarre that you want to factor in the entire population of the country *before* considering the footballing subset.

What is bizarre to me is that you have trouble understanding that the whole of the population has the potential to contribute to the sport, directly and indirectly. I would totally support a ranking system that only takes into consideration the number of players in a sport, as well as only the amount of wealth directly invested in the sport. This will tell us a lot, but it is still a different perspective. They are both important.

> The fact that you labelled so much of your own data (e.g. 5 of the top 6 in your list) as irrelevant for soccer makes this even more bizarre.

An estimated number is not an irrelevant number. I'm going to an apple party with an unknown thousands of people. 10 of the people have their own apple trees and are bringing some apples. I can deduce from this that I won't need quite as many apples. Though I don't know the exact numbers of apples they are bringing, I can estimate it and save myself some money. And chances are that most apple cravings at the party will still be satisfied. The week after that I'm going to another apple party, but I am solely responsible for all apples. In this case I know to go all out with my apple buying. This is the same concept as when I say we must reduce the total E-determinant for countries that don't play soccer as a main sport to get a better estimate on their potential... because obviously they are not devoting the maximum amount of their sporting resources to that sport. Why must you make such a big deal about things that children can understand?

nfm24
Joined: 07 Apr 2007, 16:28

11 Oct 2017, 09:58 #19

> A 10 year old could show you. Imagine any country or any region. Divide that region in half, divide it again, divide it again. ...


The fact that a half is smaller than a whole, while fascinating, does nothing to show that "wealthier and more populated countries rank higher a disproportionate amount of the time", which is what you haven't demonstrated in any useful capacity.   Such statements are only meaningful when a more quantitative aspect is brought to the table, in a way that is relevant to the topic under discussion, and this is what you have consistently failed to do.


>> It's bizarre that you want to factor in the entire population of the country *before* considering the footballing subset.  
> What is bizarre to me is that you have trouble understanding that the whole of the population has the potential to contribute to the sport, directly and indirectly.

Why do you think I have trouble understanding something just because I say that there is a better way?  Of course the general population influences "national performance", in any field.  It is just a weaker and less relevant study to consider the entire population than to consider the footballing subset, or other connected subsets.  

You seem unwilling to incorporate any sport-specific inputs, even when the data is there.  In fact the only soccer-specific input you've added here is when you arbitrarily hacked down most of your input data because "soccer is unpopular sport here".

Wouldn't it be better to show that a good correlation exists in one particular sport (one with good input data), before mixing in all sports together?


>> The fact that you labelled so much of your own data (e.g. 5 of the top 6 in your list) as irrelevant for soccer makes this even more bizarre
> An estimated number is not an irrelevant number.
...
> we must reduce the total E-determinant for countries that don't play soccer as a main sport to get a better estimate on their potential...

Reducing most of the key input parameters by an arbitrary amount in an unqualified manner, is not "estimating" and is not conducive to a convincing argument in a quantitative study.


> I'm going to an apple party with an unknown thousands of people.

Does anybody else know what he is talking about?

mattsanger92
Joined: 04 Jul 2011, 10:46

11 Oct 2017, 10:10 #20

nfm24 wrote:> I'm going to an apple party with an unknown thousands of people.

Does anybody else know what he is talking about?
Nope, maybe a censor-friendly substitute for booze?

Personally I think the best thing to do is to make some combustible apples and burn their house down sorry, wrong fruit.

Kaizeler
Joined: 05 Apr 2012, 14:54

11 Oct 2017, 12:09 #21

Trying to get us back to sanity, or at least more in line with the question at hand:

1. How easy is it for each country to qualify for the World Cup? Look at pre-qualifying odds for a first-level estimate.
2. Does it depend on the economic determinant? Plot those figures against each country's GDP.
3. ???
4. Profit

Europe and Africa as bonus:

[Two images: plots of qualifying odds against GDP for Europe and Africa]

Horizontal axis is the square root of GDP because diminishing returns.

Potentially good news for smaller economies on the rise, but you just don't know if you'll turn out a Norway or a Belgium...

abramjones
Joined: 04 Aug 2012, 20:06

11 Oct 2017, 12:31 #22

nfm24:

> The fact that a half is smaller than a whole, while fascinating, does nothing to show that "wealthier and more populated countries rank higher a disproportionate amount of the time", which is what you haven't demonstrated in any useful capacity.   Such statements are only meaningful when a more quantitative aspect is brought to the table, in a way that is relevant to the topic under discussion, and this is what you have consistently failed to do.

It's just common sense. I am actually in the process of showing this with actual numbers using NBA PER player ratings sorted by US state. In my rough draft the USA had a rating of 130, California had a rating of a little over 100, and Florida was close to 115. Most other states I did (I didn't get to Texas) were well under 100. Very small states tended to have extremely low ratings. No U.S. state was able to match the U.S. itself, what a surprise.

> Why do you think I have trouble understanding something just because I say that there is a better way?

Because, you are trying to compare apples and oranges (or maybe 2 types of fruit that are more closely related). I want to look at the impact of entire economies, you want me to look at the sport-specific economy. I have considered both ways, and I want to look at the entire economy. I'm not saying it's better or worse, I'm saying both are excellent ways of looking at it, but I only have the resources to do one at the moment.

> Of course the general population influences "national performance", in any field.  It is just a weaker and less relevant study to consider the entire population than to consider the footballing subset, or other connected subsets.

It's not weaker, it is different.

> You seem unwilling to incorporate any sport-specific inputs, even when the data is there.  In fact the only soccer-specific input you've added here is when you arbitrarily hacked down most of your input data because "soccer is unpopular sport here".

I don't have the resources at the moment, you could do it though.

> Wouldn't it be better to show that a good correlation exists in one particular sport (one with good input data), before mixing in all sports together?

It depends what you're looking for... I am more interested in the consequences of entire economies.

> Reducing most of the key input parameters by an arbitrary amount in an unqualified manner, is not "estimating" and is not conducive to a convincing argument in a quantitative study.

Of course it's estimating. Do you know what the word estimate means?

> Does anybody else know what he is talking about?

It's very simple, and you do not wish to understand.
Last edited by abramjones on 11 Oct 2017, 12:40, edited 2 times in total.

abramjones
Joined: 04 Aug 2012, 20:06

11 Oct 2017, 12:37 #23

mattsanger92 wrote:
nfm24 wrote:> I'm going to an apple party with an unknown thousands of people.

Does anybody else know what he is talking about?
Nope, maybe a censor-friendly substitute for booze?

Personally I think the best thing to do is to make some combustible apples and burn their house down sorry, wrong fruit.
😂 I was giving an example of a related situation where a broad estimate can be practical and helpful.

abramjones
Joined: 04 Aug 2012, 20:06

11 Oct 2017, 12:44 #24

Kaizeler wrote: Trying to get us back to sanity, or at least more in line with the question at hand:

1. How easy is it for each country to qualify for the World Cup? Look at pre-qualifying odds for a first-level estimate.
2. Does it depend on the economic determinant? Plot those figures against each country's GDP.
3. ???
4. Profit

Europe and Africa as bonus:

[Two images: plots of qualifying odds against GDP for Europe and Africa]

Horizontal axis is the square root of GDP because diminishing returns.

Potentially good news for smaller economies on the rise, but you just don't know if you'll turn out a Norway or a Belgium...
This is very interesting. What is the blue line? One thing I would recommend is using both population and gdp per capita, which is what I labelled economic determinant. But this is good too. It looks like there is a quite noticeable correlation between GDP and success. Who would have thought?

nfm24
Joined: 07 Apr 2007, 16:28

11 Oct 2017, 12:57 #25

Nice work.   Glad to see someone has actually done some meaningful work.

I was thinking some sort of sigmoidal curve to capture the minnow tail and the upper plateau.  Then most of the main outliers would be teams with a particularly strong/weak generation (Portugal, Wales, Iceland, Norway...).

A question about your use of pre-qualifying odds - are these before the group draw was made, or after?  Figures might vary a bit based on getting an "easy" or "hard" group.    And of course, these are very much snapshot data, representing one instant in time rather than a general property of a country over many cycles.

The Europe-Africa contrast is stark, you can see the influence of competition format (including also the number of places available of course).

nfm24
Joined: 07 Apr 2007, 16:28

11 Oct 2017, 13:33 #26

>> "wealthier and more populated countries rank higher a disproportionate amount of the time", which is what you haven't demonstrated in any useful capacity.

> It's just common sense.

The qualitative aspect is common sense, but you are trying to infer quantitative conclusions without having done any quantitative comparison.  This sort of topic is redundant unless a thorough quantitative aspect is brought to the table.  The data you have posted above offer nothing in this regard, and it is disappointing that you are still stuck on "it's just common sense" rather than advancing to any sort of proper analysis.  


>> Why do you think I have trouble understanding something just because I say that there is a better way?
> Because, you are trying to compare apples and oranges (or maybe 2 types of fruit that are more closely related).

No I'm not saying use two different non-overlapping data sets, I'm saying use the specific relevant subset of the same data set.  If I want to analyse the success of the apple harvests in many countries then it would be better to use input data on apple tree planting in those countries, rather than general fruit planting data.   I might get away with doing the latter, but the analysis would include a lot of noise and would be much less meaningful.

BTW what is your obsession with apples?  Is this an American thing?  


>> You seem unwilling to incorporate any sport-specific inputs, even when the data is there.
> I don't have the resources at the moment,

What resources?   All that is needed (as a basic first effort) is to add one or two columns of easily retrievable data into your spreadsheet, and press "plot". 


>> Reducing most of the key input parameters by an arbitrary amount in an unqualified manner, is not "estimating" and is not conducive to a convincing argument in a quantitative study.
> Of course it's estimating.

No, it's guessing.  Call it what you want though, it is simply an inadequate approach.


Overall, given that you're basically refusing to add any sport-informed data into your comparison, perhaps this topic should be moved into a non-sporting sub-forum - it is just about economic data (and apples).

Kaizeler
Joined: 05 Apr 2012, 14:54

11 Oct 2017, 15:16 #27

abramjones wrote: This is very interesting. What is the blue line? One thing I would recommend is using both population and gdp per capita, which is what I labelled economic determinant. But this is good too. It looks like there is a quite noticeable correlation between GDP and success. Who would have thought?
The correlation is there, for sure. 🙃

The blue line is the linear estimate, the straight line that best fits the data points (I'm guessing via ordinary least squares), and its equation is described in the grey boxes. As a trendline, it can also help you make predictions: e.g. Russia obviously didn't have to go through qualifying, but if they had, given their GDP of 1.3 short trillion USD this model would put their chances at ~64%.
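For readers who want to reproduce this kind of trendline outside a spreadsheet, here is a minimal sketch of an ordinary least squares fit on the square root of GDP. The data points are invented for illustration and are not the figures behind the plots, so the printed prediction will not match the ~64% quoted for Russia.

```python
import math


def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical (GDP in billions of USD, qualifying chance) pairs.
gdp_billions = [25, 100, 400, 900, 1600]
chance = [0.05, 0.15, 0.30, 0.45, 0.60]

slope, intercept = fit_line([math.sqrt(g) for g in gdp_billions], chance)


def predict(gdp):
    """Predicted chance for a GDP figure, clipped to the range [0, 1]."""
    return min(1.0, max(0.0, slope * math.sqrt(gdp) + intercept))


print(round(predict(1300), 2))
```

Clipping the prediction is a workaround for the fact that a straight line can leave the 0 to 1 range at the extremes.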

That is not to say that it is the best line in general; as Neil mentions given the amount of minnows (or the relatively low number of spots) a sigmoid curve would fit better, but Excel doesn't offer that feature by default. It does allow for a 3rd degree polynomial trendline, which is marginally better.

nfm24 wrote:A question about your use of pre-qualifying odds - are these before the group draw was made, or after?  Figures might vary a bit based on getting an "easy" or "hard" group.    And of course, these are very much snapshot data, representing one instant in time rather than a general property of a country over many cycles.
I did try to find odds provided by actual bookmakers but I couldn't find historical numbers. So I resorted to my (patent-pending) simulator to look at each country's chances (not using the actual figures in the link; those are four years old). For extra fairness, I considered seedings but not the actual groups (so knowing that e.g. Wales and Croatia would be in Pot 1 but allowing for random groupings), but that's because I suffer from Toon Hermans syndrome.
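The simulator itself is not shown in the thread, so the following is only a sketch of the "seedings but random groupings" step: each group receives exactly one team from every pot. The function name and the placeholder teams are hypothetical; only the Pot 1 placement of Wales and Croatia comes from the post.

```python
import random


def draw_groups(pots, rng=random):
    """Draw groups so that each group gets exactly one team from every pot.

    `pots` is a list of equally sized lists of team names.
    """
    n_groups = len(pots[0])
    groups = [[] for _ in range(n_groups)]
    for pot in pots:
        shuffled = list(pot)
        rng.shuffle(shuffled)
        for group, team in zip(groups, shuffled):
            group.append(team)
    return groups


# Hypothetical pots; only the Pot 1 placement of Wales and Croatia is from the post.
pots = [["Wales", "Croatia"], ["Team C", "Team D"], ["Team E", "Team F"]]
print(draw_groups(pots, random.Random(0)))
```

Repeating the draw many times and simulating each resulting group would average out the luck of an "easy" or "hard" draw.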

nfm24
Joined: 07 Apr 2007, 16:28

11 Oct 2017, 16:50 #28

Kaizeler wrote:That is not to say that it is the best line in general; as Neil mentions given the amount of minnows (or the relatively low number of spots) a sigmoid curve would fit better, but Excel doesn't offer that feature by default. It does allow for a 3rd degree polynomial trendline, which is marginally better.
If you have Gnuplot or something similar you can optimize the coefficients/parameters of an arbitrary fit function.   If not, send me the raw data and I'll have a bash.
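As a rough illustration of fitting a sigmoid without Gnuplot, the sketch below swaps in a coarse grid search over two logistic parameters instead of a proper optimizer. The data, the parameter grids, and the two-parameter form of the curve are all assumptions made for the example.

```python
import math


def sigmoid(x, midpoint, steepness):
    """Logistic curve rising from 0 to 1, equal to 0.5 at `midpoint`."""
    return 1.0 / (1.0 + math.exp(-steepness * (x - midpoint)))


def squared_error(xs, ys, midpoint, steepness):
    """Sum of squared residuals of the curve against the data."""
    return sum((sigmoid(x, midpoint, steepness) - y) ** 2 for x, y in zip(xs, ys))


def fit_sigmoid(xs, ys, midpoints, steepnesses):
    """Coarse grid search for the (midpoint, steepness) with least squared error."""
    return min(
        ((m, s) for m in midpoints for s in steepnesses),
        key=lambda p: squared_error(xs, ys, p[0], p[1]),
    )


# Hypothetical data: x = sqrt(GDP), y = qualifying chance.
xs = [5, 10, 20, 30, 40, 50]
ys = [0.01, 0.03, 0.20, 0.60, 0.90, 0.97]

best = fit_sigmoid(xs, ys, midpoints=range(10, 41), steepnesses=[0.05, 0.1, 0.15, 0.2, 0.3])
print(best)
```

A grid search is crude but transparent; a least-squares optimizer would refine the same two parameters more precisely.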
Kaizeler wrote:I did try to find odds provided by actual bookmakers but I couldn't find historical numbers. So I resorted to my (patent-pending) simulator to look at each country's chances (not using the actual figures in the link; those are four years old)
Ah OK.  Some sort of calibration may be needed to make this fully rigorous :-)  This might potentially be better than bookies' odds anyway due to the tight-fisted gits adjusting to make a profit...  the main thing is that your simulator has the competition format inherent within it.    Perhaps do a version when qualification ends using the bookies' odds of the 32 finalists to win the cup  (before and after the draw is made)?

abramjones
Joined: 04 Aug 2012, 20:06

11 Oct 2017, 23:32 #29

nfm24:

This sort of topic is redundant unless a thorough quantitative aspect is brought to the table

Yes, it is redundant... and the point of these videos is just to recognize what's going on with common sense. I have already done the spreadsheets on this page: https://docs.zoho.com/writer/open/1vl7i ... 70336f002e which offer more details, that you have seen already, I believe. Here we can see obvious correlation.

No, it's guessing.  Call it what you want though, it is simply an inadequate approach.

It is effective for its purpose, just as in the apple story, that is what you overlook.

What resources?  

Time, a lot of time.

No I'm not saying use two different non-overlapping data sets, I'm saying use the specific relevant subset of the same data set.  If I want to analyse the success of the apple harvests in many countries then it would be better to use input data on apple tree planting in those countries, rather than general fruit planting data.   I might get away with doing the latter, but the analysis would include a lot of noise and would be much less meaningful.

nfm... i know what you're saying, you've explained this over and over, and i got it the first (or second) time. but you are not hearing my point of view. this is to see the impact that raw wealth and population amounts have on sporting potential. yes, there is a lot of noise... but what's important is that we still see this correlation through the noise. the only time that this correlation has been greatly reduced is when viewing a sport that is popular mostly among small countries (Aussie Rules). With deductive reasoning we can tell that interest in the sport has something to do with it, though we could have used common sense to arrive at this conclusion as well.

BTW what is your obsession with apples?  Is this an American thing?  

I'm not sure, I don't interact with Americans like that. But I do eat a lot of apples.

Overall, given that you're basically refusing to add any sport-informed data into your comparison, perhaps this topic should be moved into a non-sporting sub-forum - it is just about economic data (and apples).

Maybe, but I am viewing sports related results, not apple related results. So your suggestion is more a philosophical one on where the topic belongs.

abramjones
Joined: 04 Aug 2012, 20:06

11 Oct 2017, 23:54 #30

kaizeler:

The correlation is there, for sure.

And you disagree that the causation is also? Using critical thinking we can come to a pretty safe conclusion that wealth and population are major factors behind these results. I could make a similar spreadsheet of uniform color... and in the case of soccer football, teams wearing white or lighter jerseys may perform better. But we know that in soccer football home teams generally wear white or lighter colors. We also know, that in some sports under the right conditions (which are quite common) the home team has an advantage. From this truth we can safely conclude that the white or light colored jerseys themselves are not causing the results seen in the spreadsheet. In this example there is correlation but not causation.

Coincidentally... we could make a spreadsheet of the performance of all countries in sports based on racial majority. Certainly countries with white majorities would generally outperform everyone else, by quite a bit I predict. But we know that being white is not what is causing this for several reasons. For one, we scientifically know there is not much difference between perceived races, and we also know that white countries are usually the ones with most of the... wait for it... ...  (drum rolllllllllllllllllllll) ... ... ... wealth!

The blue line is the linear estimate, the straight line that best fits the data points (I'm guessing via ordinary least squares), and its equation is described in the grey boxes. As a trendline, it can also help you make predictions: e.g. Russia obviously didn't have to go through qualifying, but if they had, given their GDP of 1.3 short trillion USD this model would put their chances at ~64%.


That is not to say that it is the best line in general; as Neil mentions given the amount of minnows (or the relatively low number of spots) a sigmoid curve would fit better, but Excel doesn't offer that feature by default. It does allow for a 3rd degree polynomial trendline, which is marginally better.

Thanks, I learned a lot from this. Also, why is the square root of a number good to use to estimate diminishing returns, or are you just using it as an arbitrary estimate?

Kaizeler
Joined: 05 Apr 2012, 14:54

12 Oct 2017, 08:02 #31

abramjones wrote: kaizeler:

The correlation is there, for sure.

And you disagree that the causation is also? Using critical thinking we can come to a pretty safe conclusion that wealth and population are major factors behind these results.
Obviously wealth and performance aren't independent factors. But Spanish players won't forget how to kick a ball if a bank collapses, and the Albanians won't become Ronaldos if tomorrow oil is found on their coast. Such a direct connection x=>y is clearly too simple and not the best model available, not least because as mentioned it's missing some sort of population metrics (at the very least total population) or sport-specific inputs.
There will be other factors in between (how the countries invest their wealth), often with a time-lag effect. We can muse about what those may be: a richer country will invest more in sporting facilities; a country where its citizens have more disposable income will see them enroll their children in after-school sporting activities; nations with more amenities and services will attract foreign players and coaches which can boost their knowledge and technique; countries with stronger economies will welcome migrants from turbulent areas (see e.g. Switzerland now benefiting from immigration from the former Yugoslavia in the 1990s). But I haven't studied those causation factors. And so I cannot tell if attracting foreign investment will do more good than changing the shirt colour.
Also, why is it seemingly more important in Europe than in Africa? Would the results look different if Africa had 10 qualifying spots? The impact of the qualification structure should be assessed as well.

abramjones wrote:Thanks, I learned a lot from this. Also, why is the square root of a number good to use to estimate diminishing returns, or are you just using it as an arbitrary estimate?
Generally speaking you would expect diminishing returns in economic functions (doubling the input will not generate twice as much output), and the square root is just a default go-to estimation option. Plotting the square root in the horizontal axis doesn't change the underlying "true" relationship between the variables, which might end up being best described by a cubic root, or a power of 0.8, or a log-function, or some other transformation. In this case it is just used to try to better find a linear trendline.

Consider the following example I found online:
[Two images from the original post, not preserved here]
The relationship between "Budget" and "AdSpace" is the same in both cases, you just have the option of framing it as "y grows with x in square-root fashion" or "y grows with sqrt(x) in linear fashion".
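The same idea can be checked numerically. The sketch below uses made-up Budget and AdSpace numbers (not the ones in the example above) where the output grows with the square root of the input, and compares a straight-line fit on the raw input with one on its square root. The `ols` helper is a plain least-squares routine written for this illustration.

```python
import math

def ols(xs, ys):
    """Least-squares line y = a + b*x; returns (a, b, r_squared)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = my - b * mx
    r2 = (sxy * sxy) / (sxx * syy)
    return a, b, r2

# Hypothetical numbers: output grows with the square root of the input.
budget = [1.0, 4.0, 9.0, 16.0, 25.0, 36.0]
adspace = [3.0 * math.sqrt(x) for x in budget]

# Straight line through the raw input: a decent but imperfect fit.
_, _, r2_raw = ols(budget, adspace)

# Straight line through sqrt(input): the relationship becomes exactly linear.
a, b, r2_sqrt = ols([math.sqrt(x) for x in budget], adspace)

print(round(r2_raw, 4), round(r2_sqrt, 4))
```

The underlying relationship is identical in both fits; only the horizontal axis changes, which is why the transformed version is easier to describe with a linear trendline.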

Kaizeler
Joined: 05 Apr 2012, 14:54

12 Oct 2017, 08:30 #32

PS: It now becomes obvious why the US didn't qualify for the World Cup.

abramjones
Joined: 04 Aug 2012, 20:06

12 Oct 2017, 18:40 #33

kaizeler:
There will be other factors in between 

I agree with all of this. Wealth and population are root factors, they determine potential along with other things such as genetics, environment, et cetera. After that are human factors such as diet, cultural traits, healthcare system, population density, et cetera. Then are the sport specific inputs nfm24 refers to such as interest in the sport, sport infrastructure, tournament structure, et cetera. Many of these factors could be placed in other categories in different scenarios. For example if genetic engineering continues to advance we could see super athletes created. Even now we could take part in conscious genetic selection of humans to make better athletes. What I try to do with my rankings is estimate how a nation has lived up to their potential in any particular sport after removing as much as possible the root impact of wealth and population.

doubling the input will not generate twice as much output

Absolutely, in my w&p rankings I use adjusted scores depending on economic determinant difference and reduce the accumulative handicap added as difference increases. I need to make this more curved though. I have also started on regional rankings, which is a related but separate idea. These take into account player performance index ratings and are sorted based on where players grew up. I use economic determinant to sort these regions into conferences. After being shown this I think I will sort them by square root of the economic determinant. The increase in score from city to county to state/province is generally much less than I would have predicted, so this will help.

PS: It now becomes obvious why the US didn't qualify for the World Cup.

red jerseys > biggest economy in world, i love it 😂 what joy T&T has brought me. First soca, and now this.

nfm24
Joined: 07 Apr 2007, 16:28

12 Oct 2017, 21:45 #34

abramjones wrote:This is very interesting. What is the blue line?
Do you really not know what a best-fit curve is?

LeonardoP
Joined: 15 Aug 2013, 05:00

12 Oct 2017, 22:47 #35

My two cents: isn't it best if, instead of discussing if a metric X is good enough for predicting another metric Y, we try to find the metrics that best predict Y?

I don't have comprehensive numerical training (my degree is in Computing instead), but I think the PCA (principal component analysis) technique fits the problem very well.

My suggestions for X set: GDP, population size, GDP per capita, number of professional players, ratio of professional players for each million inhabitants. Many others are possible.
My suggestions for Y: either number of appearances in World Cup (possibly with Bayesian weights favoring last WCs) or number of points in WC qualifiers.
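As a rough illustration of what PCA would do with an X set like this, here is a minimal pure-Python sketch that standardises three metrics and finds the first principal component by power iteration. All numbers are invented for five hypothetical countries, and the helper names are assumptions for this example. PCA only summarises how the X metrics vary together; relating the components to Y would still need a separate regression step.

```python
import math

def standardize(columns):
    """Scale each column to mean 0 and sample standard deviation 1."""
    out = []
    for col in columns:
        n = len(col)
        mean = sum(col) / n
        sd = math.sqrt(sum((v - mean) ** 2 for v in col) / (n - 1))
        out.append([(v - mean) / sd for v in col])
    return out

def covariance_matrix(columns):
    """Sample covariance of already-centred columns."""
    n = len(columns[0])
    k = len(columns)
    return [[sum(columns[i][r] * columns[j][r] for r in range(n)) / (n - 1)
             for j in range(k)] for i in range(k)]

def first_component(cov, iterations=200):
    """Power iteration: dominant eigenvector and eigenvalue of cov.

    Assumes the start vector is not orthogonal to the dominant eigenvector.
    """
    k = len(cov)
    vec = [1.0 / math.sqrt(k)] * k
    for _ in range(iterations):
        nxt = [sum(cov[i][j] * vec[j] for j in range(k)) for i in range(k)]
        norm = math.sqrt(sum(v * v for v in nxt))
        vec = [v / norm for v in nxt]
    eigenvalue = sum(vec[i] * sum(cov[i][j] * vec[j] for j in range(k))
                     for i in range(k))
    return vec, eigenvalue

# Hypothetical values for five countries (not real data).
gdp = [100.0, 200.0, 300.0, 400.0, 500.0]
population = [10.0, 25.0, 28.0, 45.0, 50.0]
pro_players = [5.0, 9.0, 16.0, 18.0, 27.0]

cov = covariance_matrix(standardize([gdp, population, pro_players]))
vec, eigenvalue = first_component(cov)
share = eigenvalue / len(cov)  # fraction of total variance on component 1
print([round(v, 3) for v in vec], round(share, 3))
```

When the metrics are this strongly correlated, one component carries nearly all the variance, which is a hint that GDP, population and player counts are largely measuring the same thing.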
Last edited by LeonardoP on 13 Oct 2017, 13:08, edited 1 time in total.

nfm24
Joined: 07 Apr 2007, 16:28

12 Oct 2017, 23:11 #36

LeonardoP wrote:My two cents: isn't it best if, instead of discussing if a metric X is good enough for predicting another metric Y, we try to find the metrics that best predict Y?
Yes, and this is what we've consistently tried to get him to do since he first began posting on this board, but he isn't much interested in adapting his own X. 

nfm24
Joined: 07 Apr 2007, 16:28

12 Oct 2017, 23:25 #37

Kaizeler wrote:Russia obviously didn't have to go through qualifying, but if they had, given their GDP of 1.3 short trillion USD this model would put their chances at ~64%.
Just a moot point of interest, that figure (64%) is presumably based on leaving the other teams unchanged, whereas, if Russia actually had to go through qualifying, the other countries' odds would be affected.

Luca
Joined: 09 Feb 2011, 19:58

13 Oct 2017, 18:04 #38

abramjones wrote:
Then are the sport specific inputs nfm24 refers to such as interest in the sport, sport infrastructure, tournament structure, et cetera. Many of these factors could be placed in other categories in different scenarios.
Good structures aren't always necessary. Some of the greatest footballers ever grew up playing on a dusty street or on a beach with a ball of rags. And playing on gravel often helps you improve your skills. This is what many coaches and instructors always repeat.
abramjones wrote:
I agree with all of this. Wealth and population are root factors, they determine potential along with other things such as genetics, environment, et cetera. After that are human factors such as diet, cultural traits, healthcare system, population density, et cetera.
 Wealth doesn't always determine the good results of a national team. Poverty in many cases is more decisive than wealth, because young guys can see in football an instrument of social redemption. Some of the greatest players ever were of very humble birth.

Kaizeler
Joined: 05 Apr 2012, 14:54

14 Oct 2017, 09:00 #39

LeonardoP wrote:I don't have comprehensive numerical training (my degree is in Computing instead), but I think the PCA (principal component analysis) technique fits the problem very well.

My suggestions for X set: GDP, population size, GDP per capita, number of professional players, ratio of professional players for each million inhabitants. Many others are possible.
My suggestions for Y: either number of appearances in World Cup (possibly with Bayesian weights favoring last WCs) or number of points in WC qualifiers.
That's okay; I graduated in Marketing. Anyway, working on it. Clicking the excel icon on the right will let you download the entire dataset.

nfm24 wrote:
Kaizeler wrote:Russia obviously didn't have to go through qualifying, but if they had, given their GDP of 1.3 short trillion USD this model would put their chances at ~64%.
Just a moot point of interest, that figure (64%) is presumably based on leaving the other teams unchanged, whereas, if Russia actually had to go through qualifying, the other countries' odds would be affected.
Good point. Otherwise the percentages would add up to 1364% and would need to be adjusted downwards.
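A minimal sketch of the crudest possible adjustment: scale every team's chance by the same factor so that the total matches the number of spots. The team names and all chances except Russia's 64% are hypothetical, and proportional scaling is only an assumption here. A fuller treatment would re-run the qualification model with the extra team included.

```python
def rescale_to_spots(chances, spots):
    """Scale qualifying chances proportionally so they sum to `spots`."""
    total = sum(chances.values())
    factor = spots / total
    return {team: p * factor for team, p in chances.items()}

# Hypothetical chances as fractions; only Russia's 0.64 comes from the thread.
chances = {"Team A": 0.90, "Team B": 0.70, "Team C": 0.40, "Russia": 0.64}

# Pretend this mini-confederation has 2 spots: 264% must shrink to 200%.
adjusted = rescale_to_spots(chances, spots=2)
for team, p in adjusted.items():
    print(team, round(p, 3))
```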

abramjones
Joined: 04 Aug 2012, 20:06

14 Oct 2017, 15:44 #40

Luca wrote:
abramjones wrote:
Then are the sport specific inputs nfm24 refers to such as interest in the sport, sport infrastructure, tournament structure, et cetera. Many of these factors could be placed in other categories in different scenarios.
Good structures aren't always necessary. Some of the greatest footballers ever grew up playing on a dusty street or on a beach with a ball of rags. And playing on gravel often helps you improve your skills. This is what many coaches and instructors always repeat.
abramjones wrote:
I agree with all of this. Wealth and population are root factors, they determine potential along with other things such as genetics, environment, et cetera. After that are human factors such as diet, cultural traits, healthcare system, population density, et cetera.
 Wealth doesn't always determine the good results of a national team. Poverty in many cases is more decisive than wealth, because young guys can see in football an instrument of social redemption. Some of the greatest players ever were of very humble birth.
While these specific training methods may be good, the overall notion is a romantic one rather than reality. And I agree that it isn't always necessary, but it is usually the case that wealthier and bigger countries will perform better. Wealth doesn't determine exact outcomes, it just changes the odds in favor of wealthier bigger countries. There is a big difference between determining exact standings and offering a severe advantage.

The thing you have to keep in mind about your last comment is the sheer number of people in poverty that try to accomplish that dream, and the small percentage of them that actually get there. Good athletes generally come from backgrounds of significant wealth (not to be confused with rich) or are intertwined into a scene where there is a lot of wealth to support them and for them to benefit from.

A great example I like to use (though it won't exactly apply to much of the football world), are USA basketball players. Many people identify them as coming from impoverished backgrounds, but that is pretty much a fairy tale. Though ghettos in the USA are certainly lower in wealth than suburban USA they are still among the richest people in the world... supported by huge infrastructures and decent economies. They certainly have had it easier than most Lithuanian and Serbian players growing up. This situation might apply to your "social redemption" idea. Many of these kids don't have a career path envisioned or set up for them like kids in suburban USA. So they may put more into playing sports while the suburban kids put more into academics.

Now think of countries like Brazil or Argentina... in these countries, by global definitions, you have a significant number of moderately wealthy areas, a few rich areas, and a large number of impoverished areas (true poverty, unlike in the USA). A lot of these impoverished kids still may receive significant support from the wealthy football infrastructure in those countries. I don't know the details of this, but I hope you understand my point. But compare their opportunities to countries where most of the nation is in severe poverty... like most African and many Asian countries. It is a completely different situation... the impoverished players there generally don't receive nearly as much support as they may receive in countries like Argentina, Brazil, Uruguay, Honduras, Costa Rica, Mexico, et cetera.

One thing that really annoys me is when people say in the same sentence something like "Mexico and India are both poor countries" when Mexico is much wealthier than India. From a western perspective it is easy for westerners to lump them both in the same category, but the real economic situation is much different.

abramjones
Joined: 04 Aug 2012, 20:06

14 Oct 2017, 15:59 #41

Kaizeler wrote:
LeonardoP wrote:I don't have comprehensive numerical training (my degree is in Computing instead), but I think the PCA (principal component analysis) technique fits the problem very well.

My suggestions for X set: GDP, population size, GDP per capita, number of professional players, ratio of professional players for each million inhabitants. Many others are possible.
My suggestions for Y: either number of appearances in World Cup (possibly with Bayesian weights favoring last WCs) or number of points in WC qualifiers.
That's okay; I graduated in Marketing. Anyway, working on it. Clicking the excel icon on the right will let you download the entire dataset.

nfm24 wrote:
Kaizeler wrote:Russia obviously didn't have to go through qualifying, but if they had, given their GDP of 1.3 short trillion USD this model would put their chances at ~64%.
Just a moot point of interest, that figure (64%) is presumably based on leaving the other teams unchanged, whereas, if Russia actually had to go through qualifying, the other countries' odds would be affected.
Good point. Otherwise the percentages would add up to 1364% and would need to be adjusted downwards.

thank-you! i'll share this on some other boards.
Last edited by abramjones on 14 Oct 2017, 16:04, edited 2 times in total.

nfm24
Joined: 07 Apr 2007, 16:28

14 Oct 2017, 16:00 #42

Hopefully this is an easy tweak: can you also include the capacity to plot qualifying odds against Elo rating (both are only permitted as y-axes currently)?

As the odds were derived directly from the Elo ratings, this will just reflect the logarithmic definition of Elo (hence I suggested a sigmoid earlier) with a steeper curve due to iterated application over a large number of matches per team.  But it may also give us insight into the qualifying format/seeding variation across the continents - which continents are "fairer" in the sense of allowing non-negligible chance of qualifying to teams outside the top N ranked (where N is number of qualifying places available). 
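For reference, the standard Elo expected-score formula is itself a sigmoid in the rating difference, and repeating it over several matches steepens the curve. The sketch below shows both effects. Treating the expected score as a plain win probability, ignoring draws, and using a "win the majority of n matches" rule are simplifying assumptions for illustration, not how any real qualifying format works.

```python
import math

def elo_expected(rating_a, rating_b):
    """Standard Elo expected score of A against B (logistic in the difference)."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))

def majority_win_prob(p, n):
    """Chance of winning more than half of n independent matches (n odd)."""
    need = n // 2 + 1
    return sum(math.comb(n, k) * p ** k * (1.0 - p) ** (n - k)
               for k in range(need, n + 1))

# One match versus a nine-match series, for a range of rating gaps.
for gap in (-200, -100, 0, 100, 200):
    single = elo_expected(1500 + gap, 1500)
    series = majority_win_prob(single, 9)
    print(gap, round(single, 3), round(series, 3))
```

The series column moves away from 50% faster than the single-match column, which is the steepening described above.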


The relationship between actual team strength (ranking) and qualifying potential (based on known format) is still the most interesting part of all this, and the part most relevant to the actual question, in my view - we had a similar discussion previously on WC formats where the Cubic Watermelon simulator was first released :-)  

Whatever correlation exists between team strength and non-football factors (e.g. population, GDP), if you're talking about qualifying for the World Cup then this correlation has to be convoluted via the above relationship, which is severely format dependent.

abramjones
Joined: 04 Aug 2012, 20:06

14 Oct 2017, 16:17 #43

LeonardoP wrote: My two cents: isn't it best if, instead of discussing if a metric X is good enough for predicting another metric Y, we try to find the metrics that best predict Y?

I don't have comprehensive numerical training (my degree is in Computing instead), but I think the PCA (principal component analysis) technique fits the problem very well.

My suggestions for X set: GDP, population size, GDP per capita, number of professional players, ratio of professional players for each million inhabitants. Many others are possible.
My suggestions for Y: either number of appearances in World Cup (possibly with Bayesian weights favoring last WCs) or number of points in WC qualifiers.
another thing to consider is number of registered players rather than total players (referring to the 2006 count).

the last thing I don't think has been mentioned here is amount of money put into the sport (this can be tricky as much of it is indirect). after all, if number of players is considered over population amount, then wouldn't money invested be considered over GDP per capita/GDP?

Kaizeler
Joined: 05 Apr 2012, 14:54

15 Oct 2017, 11:13 #44

nfm24 wrote: Hopefully this is an easy tweak: can you also include the capacity to plot qualifying odds against Elo rating (both are only permitted as y-axes currently)?

As the odds were derived directly from the Elo ratings, this will just reflect the logarithmic definition of Elo (hence I suggested a sigmoid earlier) with a steeper curve due to iterated application over a large number of matches per team.  But it may also give us insight into the qualifying format/seeding variation across the continents - which continents are "fairer" in the sense of allowing non-negligible chance of qualifying to teams outside the top N ranked (where N is number of qualifying places available). 
Yes, making the metrics available on both axes is not an issue. I'll be adding into the visualization a few others that are also available in the excel as well.

I've also included, besides the latest Elo and the latest CW ratings, the CW rating I've used in the simulations (taken at the point in which each regional qualification started). The estimated qualification %s come from a system that already incorporates the qualification format (e.g. France had a higher rating than Portugal, but a lower estimated qualification chance because they were in Pot 2).
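To give a feel for how a format-aware estimate can be produced, here is a toy Monte Carlo sketch. It is not the simulator used for the figures in this thread: the ratings, pots, single round-robin format, "group winner qualifies" rule and the use of the Elo expected score as a win probability with no draws are all assumptions made for illustration. Each trial draws one team per pot into every group at random, so seedings are respected but the actual groups are not fixed.

```python
import random

def elo_expected(rating_a, rating_b):
    """Standard Elo expected score, used here as a win probability."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))

def play_group(group, ratings, rng):
    """Single round-robin; one point per win, draws are ignored."""
    points = {team: 0 for team in group}
    for i, a in enumerate(group):
        for b in group[i + 1:]:
            if rng.random() < elo_expected(ratings[a], ratings[b]):
                points[a] += 1
            else:
                points[b] += 1
    # Ties on points are broken at random.
    return max(group, key=lambda t: (points[t], rng.random()))

def simulate(pots, ratings, trials=2000, seed=1):
    """Estimate each team's chance of winning its randomly drawn group."""
    rng = random.Random(seed)
    qualified = {team: 0 for pot in pots for team in pot}
    for _ in range(trials):
        shuffled = [rng.sample(pot, len(pot)) for pot in pots]
        groups = [list(members) for members in zip(*shuffled)]
        for group in groups:
            qualified[play_group(group, ratings, rng)] += 1
    return {team: count / trials for team, count in qualified.items()}

# Hypothetical teams: three pots of two, giving two groups of three.
ratings = {"A": 1900, "B": 1850, "C": 1700, "D": 1650, "E": 1500, "F": 1450}
pots = [["A", "B"], ["C", "D"], ["E", "F"]]
chances = simulate(pots, ratings)
for team in sorted(chances, key=chances.get, reverse=True):
    print(team, round(chances[team], 3))
```

Because exactly one team qualifies from each group in every trial, the estimated chances always sum to the number of spots.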

I'm sure the mathematicians in the room can have a field day with this.

And yes, we've been here before on the format unfairness question. 🙃

nfm24
Joined: 07 Apr 2007, 16:28

15 Oct 2017, 13:17 #45

We can also consider the inverse questions, e.g. by how much does a given team need to improve in order to increase its qualifying chances by x%  etc.
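One way to approach the inverse question is a bisection search over the team's rating, given any function that maps rating to qualifying chance and increases with rating. In the sketch below a single Elo match against a fixed opponent stands in for a full qualifying simulation, which is a deliberate simplification. The ratings, the target and the helper names are hypothetical.

```python
def elo_expected(rating_a, rating_b):
    """Standard Elo expected score of A against B."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))

def rating_needed(chance, target, lo=0.0, hi=4000.0, tol=1e-6):
    """Bisection: lowest rating at which chance(rating) reaches target.

    Assumes chance() is increasing in rating and the answer lies in [lo, hi].
    """
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if chance(mid) < target:
            lo = mid
        else:
            hi = mid
    return hi

# Stand-in model: chance of beating a 1700-rated opponent in one match.
def chance(rating):
    return elo_expected(rating, 1700.0)

current_rating = 1600.0
current = chance(current_rating)   # roughly 36%
target = current + 0.10            # ask for ten percentage points more
needed = rating_needed(chance, target)
print(round(current, 3), round(needed - current_rating, 1))
```

With a real simulator in place of `chance`, each evaluation would be a noisy Monte Carlo estimate, so the tolerance would need to be much looser.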