
Friday, 21 July 2017

Shots, Blocks And Game State

In this post I described a way to quantify game state by reference to how well or badly a side was doing in relation to their pregame expectations.

So rather than simply using the current scoreline to define game state, it gave a much more nuanced description of the state of the game, particularly in those frequent phases of a match when the sides were level.

It also incorporates time remaining into the calculation. 

A team level after 10 minutes might be in a very different situation compared to the same score differential, but with ten minutes remaining. How they and their opponents played out the subsequent time may be very different in the two scenarios.

At a simplistic level, those teams in a happy place may be more content to prioritise actions that maintain the status quo, such as defend more, while those who'd wish to alter the state of the game might put more resources into attack than had previously been the case.

It seems logical that a more defensive approach should result in that team accumulating more products of a packed defence, such as blocked shots, while any chances they do create may be met by increasingly fewer defenders.

I took a look at the correlation between blocks, clear cut or so called big chances, and the prevailing state of the game, and there was a significant relationship between them.

A side in a poor state of the game had more chance of their goal attempts being blocked, and this increased as their game state deteriorated.

Similarly, a side in a positive state of the game was more likely to create a chance that was deemed a big chance.

This appears to fit with the hypothesis of content teams packing their defence more, increasing the likelihood that they block an attempt, and, if they do scoot off upfield, they're more likely to be met with a depleted defence.

However, correlation doesn't prove causation etc etc. 

In the case of a side being more likely to create big chances, there may be a confounding factor that is causing both the good state of the game and the big chances. (Think raincoats, wet pavements and weather).

That factor is possibly team quality.

The top six account for 30% of the Premier League, but took 48% of the wins, 43% of the goals scored and 45% of the league points won.

They're a league within a league, more likely to be in a very good game state and they also accounted for 43% of the league's big chances.

Team quality may be the causative agent for a good game state and for creating big chances, which correlates the two without either being causative agents of the other.

So I stripped out all games involving the big six to get a more closely matched initial contest, but the correlation persisted.

Teams in a good place against sides of similar core abilities were more likely to create very good chances and more likely to find defensive bodies to block the anticipated onslaught from their opponents.

As a tentative conclusion, intuitive events that you might expect to be more likely to occur as strategies subtly alter do appear to be identifiable in the data.

Data from InfoGolApp

Saturday, 15 July 2017

Lloris, the Best with Room to Improve?

Expected goals, saves or assists are now a common currency with which to evaluate players and teams, with an over achievement often being sufficient to label a player as above average and/or lucky, depending on the required narrative.

By presenting simple expected goals versus actual goals scored, much of the often copious amount of information that has been tortured to arrive at two simple numbers is hidden from the view of the audience.

Really useful additional data is sometimes omitted, even simple shot volume and the distribution in shot quality over the sample.

The latter is particularly salient in attempting to estimate the shot stopping abilities of goal keepers.

Unlike shot takers, it is legitimate to include post shot information when modelling a side's last line of defence.

Extra details, such as shot strength, placement and other significant features, like deflections and swerve on the ball, can hugely impact on the likelihood that a shot will end up in the net.

A strongly hit, swerving shot, that is heading for the top corner of the net is going to have a relatively high chance of scoring compared to a weakly struck effort from distance.

Therefore, the range of probabilistic success rates for a keeper based shot model is going to be wider than for a mere shooter's expected goals model, not least because the former only contains shots that are on target.

We've seen that the distribution of the likely success of chances can have an effect on the range of actual goals that might be scored, even when the cumulative expected goals of those chances is the same.

To demonstrate, a keeper may face two shots, one eminently savable, with a probability of success of say 0.01 and one virtually unstoppable, with a p of 0.99. Compare this scenario to a keeper who also faces two shots, each with a 0.5 probability of success.

Both have a cumulative expectation of conceding one goal, but if you run the sims or do the maths, there's a 50% chance the latter concedes exactly 1 goal and a near 98% chance for the former.

The overall expectation is balanced by the former having a very small chance of allowing exactly 2 goals, compared to 25% for the keeper facing two coin toss attempts.
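To make that arithmetic concrete, here's a minimal Python sketch (not from the original post) that enumerates the goals-conceded distribution for the two hypothetical keepers described above.

```python
# A minimal sketch, not from the original post: exact goals-conceded
# distribution for two keepers who each face two shots summing to 1.0 ExpG.
from itertools import product

def goal_distribution(shot_probs):
    """Exact probability of conceding 0, 1, 2, ... goals from independent attempts."""
    dist = {}
    for outcomes in product([0, 1], repeat=len(shot_probs)):
        p = 1.0
        for scored, prob in zip(outcomes, shot_probs):
            p *= prob if scored else (1 - prob)
        dist[sum(outcomes)] = dist.get(sum(outcomes), 0.0) + p
    return dist

print(goal_distribution([0.01, 0.99]))  # ~98% chance of exactly 1 goal, ~1% of 2
print(goal_distribution([0.50, 0.50]))  # 50% chance of exactly 1 goal, 25% of 2
```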

Much of this information about the shot volume and distribution of shot difficulty faced by a keeper can be retained by simulating numerous iterations of the shots faced by the hypothetical average keeper upon whom these models are initially built, and seeing where on that distribution of possible outcomes a particular keeper's actual performance lies.

Hugo Lloris has faced 366 non penalty shots and headers on goal over the last 3 Premier League seasons.

Those attempts range from ones that would result in a score once in 1,400 attempts to near certainties with probabilities of 0.99.


An average keeper might expect to concede around 120 goals based on the quality and quantity of chances faced by Lloris.

Spurs' keeper allowed just 96 non penalty, non own goals and no simulation based on the average stopping ability of Premier League keepers did this well. The best the average benchmark achieved begins to peter out around 100 goals.

Therefore, an assessment of the shot stopping qualities of a keeper might better be expressed as the percentage of average keeper simulations that result in as many or fewer goals being scored than the keeper's actual record.

This method incorporates both the volume and quality of attempts faced.
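As a rough illustration of that method, the sketch below simulates an average keeper over a set of on-target attempts and reports how often the benchmark matches or beats the actual goals conceded. The shot probabilities are randomly generated placeholders, not Lloris' real Opta-derived values.

```python
# A hedged sketch of the percentile method: simulate an average keeper over the
# attempts faced and count how often they concede as few goals as the real keeper.
# The shot probabilities are randomly generated placeholders, not Lloris' data.
import random

def average_keeper_percentile(shot_probs, actual_goals, n_sims=10_000, seed=42):
    rng = random.Random(seed)
    matched_or_better = 0
    for _ in range(n_sims):
        conceded = sum(1 for p in shot_probs if rng.random() < p)
        if conceded <= actual_goals:
            matched_or_better += 1
    return matched_or_better / n_sims

random.seed(1)
placeholder_probs = [random.betavariate(1.2, 2.5) for _ in range(366)]  # roughly 120 cumulative ExpG
print(average_keeper_percentile(placeholder_probs, 96))
```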




The table above shows the percentage of average keeper simulations of all attempts faced by Premier League keepers since 2014 that equalled or bettered the actual performance of that particular keeper.

For example, there's only a 2.5% chance, assuming a reasonably accurate model, that an average keeper replicates or betters Cech's 2014-17 record, and they would be expected to equal or better Bravo's in perpetuity.

Lloris' numbers are extremely unlikely to be replicated by chance by an average keeper and it seems reasonable to surmise that some of his over achievement is because of above average shot stopping talent.

Lloris over performs the average model across the board, saving more easy attempts than the model's estimates and repeating this through to the most difficult ones.

Vertical distance from goal is a significant variable in any shot model, and Lloris performs to average keeper benchmark save rates, but with the ball moved around 20% closer to the goal.

Intriguingly, this exceptional over performance is partly counter balanced by an apparent less than stellar return when faced with shots across his body.

Modelling Lloris when an opponent attempts to hit the far post produces a variable that has a larger effect on the likelihood of a goal than is the case in the average keeper model.

Raw figures alone hint at an area for improvement in Lloris' already stellar shot stopping.

Players who got an attempt on target while shooting across Lloris' body converted 35% of the time, compared to the league average of 32%. He goes from the top of the tree overall to around average for these types of shots.

An average keeper gets more than a look in in this subset and the average model equals or beats Lloris' far post, on target actual outcome around 22% of the time. That's still ok, but perhaps suggests that even the very best have room to improve.

Below I've stitched together a handful of Lloris' attempts to keep out far post, cross shots to give some visual context.



For more recent good work, check out Will and Sam's twitter feed and Paul's blog & podcasts.

Data from InfoGol

Thursday, 13 July 2017

Gylfi, "On me head, son"

Expected assists looks at the process of chance creation from the viewpoint of the potential goal creator.

An assisted goal is a collaboration between the player making the vital final pass and his colleague who tries to beat the keeper, but over a season these sample sizes tend to be small.

Manchester City's Kevin De Bruyne topped the actual assist charts in 2016/17 with 18, but these numbers may have benefited from a statistically noisy bout of hot finishing or suffered from team mates who frequently sliced wildly into the crowd.

Therefore, it makes sense to use the probabilistic likelihood of success in the 85 additional instances when the Belgian carved out a chance that went begging.

Here's the top ten expected chance creators from the 2016/17 Premier League, along with their actual returns, courtesy of the recipients of these key passes.



The list contains the kind of players you'd expect to see when trawling the Premier League for creative talent.

The expected assists are based on a model derived from the historical performance of every assisted goal attempt from previous Premier League seasons.

So De Bruyne's over performance may reflect the above average talent, not just of himself, but also of his team mates, or it could be that creating and finishing talent is tightly grouped in the top tier of English and Welsh football and randomness accounts for the majority of the disconnect between actual assists and ExpA over a single season.

Swansea's Gylfi Sigurdsson, a constant topic of transfer speculation, lies 3rd in both expected and actual assists, with 9 ExpA and 13 actual ones. This backs up the Icelander's importance to the Swans, where he was involved in nearly a quarter of Swansea's ExpG in 2016/17.

His relatively large over performance, compared to his ExpA cumulative total of just under 9 may suggest he is particularly adept at presenting chances to his team mates.

However, a simple random hot streak from both or either participant in the goal attempt should not be ruled out.

In 9% of simulations, an average assister/assisted combination would score 13 or more goals from the 77 opportunities crafted by Sigurdsson.
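A minimal sketch of that kind of check is below, with placeholder chance values standing in for the real ExpA figures.

```python
# A minimal sketch of the simulation: 77 chances, each with its own ExpA value,
# counted for how often an average creator/finisher pairing returns 13+ goals.
# The chance values are placeholders summing to roughly 9 ExpA, not Infogol data.
import random

def prob_at_least(chance_probs, threshold, n_sims=20_000, seed=7):
    rng = random.Random(seed)
    hits = sum(
        1 for _ in range(n_sims)
        if sum(rng.random() < p for p in chance_probs) >= threshold
    )
    return hits / n_sims

placeholder_chances = [0.117] * 77              # ~9 cumulative ExpA, spread evenly
print(prob_at_least(placeholder_chances, 13))   # in the region of 10%
```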


Neither is there anything untoward in the fit of the model to Sigurdsson's 77 potential assists. Lower quality chances are converted at a lower rate than those which had a higher expectation of producing a goal.

So far there's nothing to set off warning bells for any potential purchaser: Sigurdsson appears to be legitimately a top echelon goal creator, albeit one who may have run slightly hot in 2016/17.

But if we make some direct comparisons to say De Bruyne, differences begin to emerge.

De Bruyne's ExpA per key pass is 0.15 compared to 0.11 for Sigurdsson, which suggests that De Bruyne is, on average, creating higher quality opportunities.

The profile of the position of the recipients of Sigurdsson's key passes is also strikingly different from those of the Manchester City player.


De Bruyne is supplying chances for a much larger proportion of attacking minded players, such as out and out strikers, wingers and attacking midfielders.

Whereas over 50% of Sigurdsson's key passes are picking out defenders, notably central defenders, and that usually means headed chances from set pieces.

This appears to be confirmed by the final column in the first graphic of this post. Only a third of Sigurdsson's assists arrived at the feet of a team mate, well below the figures for the remaining nine assisters in the table.

All of whom check in with at least 67% of their potential assists being finished off with the boot.

Gylfi's penchant for set play deliveries to a defenders head also features in Ted's article on the transfer speculation surrounding Sigurdsson in The Independent as part of Ted's grand tour of the British press.

Despite Sigurdsson's apparent niche assistance role, at least in 2016/17, his ExpA per potential assist does still hold up well.

He's below De Bruyne, as we've seen, but is above the remaining eight players in the top ten, bar Fabregas and an anonymous Stoke player, who we want to keep.

So although he does deliver aerial passes to generally less skilled finishers, his relatively impressive ExpA per key pass does suggest that he can put the ball into extremely dangerous areas and with accuracy to find a team mate.

Also, his actual assist total from headed chances of 8, compared to an expected total of just over 5, suggests he may be more skilled at such deliveries than the average case, although such small samples inevitably prevent random chance being eliminated as the main causative agent in any over performance.

Overall, Gylfi Sigurdsson may be worth a great deal of money.....to a side that is set up to benefit most from his particular creative skill set.

But those teams may be few in number and principal among them are his current employers.

All data via Infogol

Wednesday, 12 July 2017

Stoke Score More August Goals Than Andrew Cole

Hugely amusing tweet* doing the rounds, yesterday.


All great fun in the world of football bants and also an excellent case study in how to use "stats" to purvey a misleading impression that's likely to get picked up, circulated and no doubt recycled in September when the Premier League's fixture computer love affair with the Potters pitches them to the foot of the table.

So let's do a bit of due diligence.

Cole played 44 games to reach his 25 goals, playing as he did in the 42-game Premier League era, when they sometimes managed to cram six games in during the opening month.

Stoke scored their 23 goals in 28 matches.

So even this simple addition of context floors the deliberately provocative tweet.

Cole scored 0.57 Premier League goals/game in August, which is eclipsed by Stoke's 0.82 August goals/game.

The comeback would probably be "one is a team of 11 to 14 players".

But one of those 14 is a goalkeeper, and keepers, with the exception of Stoke City ones, generally don't score.

Four or five are defenders who don't score a lot, which limits the fair comparison to Stoke players, from the August months of the Premier League era, who played in a similarly advanced role to Cole's position at Newcastle, Manchester United, Blackburn, Fulham, Manchester City and Portsmouth.

Designated Stoke forwards scored 13 of their 23 goals, so their scoring rate falls below Cole's 0.57 goals per game to 0.46 goals per game.

Stoke played an average of two out and out strikers over their Premier League existence, so we'll halve that rate to 0.23 goals per game.

This puts Cole well back in the lead, allowing the rip to be taken out of the Potters again....?

However, we haven't considered the goal environments.

Stoke played against a batch of sides in August who conceded an average of 1.35 goals per game, as did Cole a decade earlier.

No change, there.

Cole's teams scored an average of 1.80 goals per game, meaning he played for sides who had a lot of attacking intent.

His 0.57 goals per game was around 30% of the baseline figure for his team.

The homogenised Premier League Stoke striker scored 22% of the 1.06 goals per game Stoke have averaged in the Premier League.

Those strikers included Dave Kitson, Mama Sidibe (legend), Ricky Fuller (legend), James Beattie, Kenwyne Jones and Peter Crouch.

Bottom line, Andrew Cole scored a higher proportion of his club's goals than this mishmash of ageing journeymen footballers did in their defensively structured, mid table team........in one particular month.

Ha ha.

*NOT

Thursday, 6 July 2017

Game State Outliers

Newcastle's 2011/12 season remains one of the most interesting of recent times.

They scored just four more goals than Norwich, but gained 18 more league points and allowed two fewer goals than Stoke and won 20 more points.

Their meagre +5 goal difference was inferior to the three teams who finished immediately below them in the final table and a 5th place finish was partly down to the hugely efficient way in which they conceded and scored their goals.

The ability to leak goals only when a game was already lost and score at the most advantageous times proved transient and the following season Newcastle's elevation to the top tier of the Premier League stalled as they barely finished above Sunderland and relegation.

In this post I looked to define game state in terms of not simply the current score, but also the equally important factor of time elapsed.

The current state of the game for a side is a combination of the score line, the relative abilities of each side and how long remains for either team to achieve a favourable final outcome.

As an example, take Stoke's home game with Everton.

The matchup was fairly even, Everton being the better team balanced by the Bet365 Stadium factor, and after 6 minutes the hosts had around a 37% chance of winning and a 25% chance of drawing.

That equates to an expected league points of 1.4.
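For clarity, the expected points figure is simply three points weighted by the win probability plus one point weighted by the draw probability; a one-line sketch:

```python
# A quick sketch of the arithmetic: expected points are just the win and draw
# probabilities weighted by their point values.
def expected_points(p_win, p_draw):
    return 3 * p_win + 1 * p_draw

print(expected_points(0.37, 0.25))   # 1.36, quoted above as roughly 1.4
```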

A minute later Peter Crouch scores to put Stoke 1-0 up and their expected points, with 93% of the game remaining and a goal to the good, rise to 2.1 league points.

The goal's welcome, but mitigated by the large amount of time remaining and the evenly matched teams.

No VAR and the game ends 1-1.


The plot above has averaged the increase in expected points per goal scored in an attempt to see which sides were scoring goals that most advanced their potential expected league points, either by design or raw chance, combined with their core ability.

It shouldn't be surprising to see the better teams having the lowest average expected points improvement per goal in the Premier League.

They are more likely to win matches by large margins and the 4th goal in a rout will add little to the team's expected league points, which will already be close to 3.

However, even among the top teams there are variations.

Spurs have the lowest expected points increase per goal scored, partly due to wide margin wins against the lesser sides, while Chelsea, with a similar number of goals, found themselves celebrating a score with, on average, a more tangible game state reward.

Hull appeared to occasionally put themselves into relatively decent positions, despite meagre scoring, while Sunderland, not only scored fewer goals, but also frequently only netted when the spoils had largely been won by their opponents.


The same point may be better illustrated by plotting the success rate (a combination of wins and draws for each team) against their expected points increase per goal scored.

Chelsea are apparent outliers from the line of best fit, scoring goals that advance their game state, on average, by more than their fellow top sides.

Again this might suggest that they are employing slightly different in game tactics compared to others.

Perhaps one that deserts further attacking intent for a more defensive outlook once they find themselves in a favourable match position, as do Manchester United....Or perhaps there is an element of random good fortune in when they are scoring their goals, a la Newcastle 2011/12.

Both Championship enigmas, promoted Huddersfield and their beaten playoff rivals, Reading, show anomalies from the seasonal norm when we examine their change of expected points based on goals and time elapsed.

Huddersfield fly high above the general line of best fit for a side of their scoring capacity, fed by a glut of goals where the time factor had nearly ebbed away. Again, tactically and skill driven or transient good fortune or a bit of both?

Reading showed an uncanny ability to know instantly when they were beaten, "selectively" leaking many of their 64 goals allowed in a handful of games and "allowing" themselves to spread the remainder of their concessions more thinly to remain competitive in a large number of their matches.

A "trait" that will be eagerly anticipated for their 2017/18 season.

Thursday, 29 June 2017

Big Chance or No Big Chance.

There has been a fair bit of comment recently around big chances and their inclusion or not in shot based expected goals models.

Big chances are, as the name suggests, a partly subjective addition to the Opta data feed which describes a goal attempt.

Along with undeniable parameters, such as shot location, type and pre-shot build-up details, the big chance designation attempts to add information, such as the level of defensive pressure or the positioning of the keeper.

While such information may enhance any conclusion about the quality of an individual chance and assist in converting a purely outcome based approach to team evaluation to a more probabilistic, process based one, it may become prey to cognitive biases, such as outcome biases.

I thought I'd quickly build two models, using the Opta data feed we use to power the Infogol app and see how each performs when put to some of the common uses of an ExpG model.

One model uses big chances (BC), whilst the other does not (NBC).

Such models are primarily used either as descriptive of past matches and/or predictive of future performances.

Typically, pre-shot data is collected from a previous season or number of seasons and the relationship between this data and a discrete outcome, such as whether a goal is scored, is found using logistic regression.

We can then use the results of the previously modelled regression to assign the probability that any future chance will result in a goal based on recent historical precedent.
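As a rough sketch of that workflow (file and column names here are assumptions for illustration, not the actual Opta/Infogol feed), the two models in this post differ only in whether the big chance flag is included as a feature:

```python
# A hedged sketch of the two models compared in this post: identical logistic
# regressions, with and without the big chance flag. File and column names are
# assumptions for illustration, not the actual Opta/Infogol feed.
import pandas as pd
from sklearn.linear_model import LogisticRegression

train = pd.read_csv("shots_2014_15.csv")        # hypothetical training data
base_features = ["distance", "angle", "is_header", "fast_break"]

nbc_model = LogisticRegression(max_iter=1000).fit(train[base_features], train["goal"])
bc_model = LogisticRegression(max_iter=1000).fit(
    train[base_features + ["big_chance"]], train["goal"]
)

# Score the following season's attempts with both models and compare team totals.
test = pd.read_csv("shots_2015_16.csv")         # hypothetical holdout season
test["xg_nbc"] = nbc_model.predict_proba(test[base_features])[:, 1]
test["xg_bc"] = bc_model.predict_proba(test[base_features + ["big_chance"]])[:, 1]
print(test.groupby("team")[["xg_nbc", "xg_bc"]].sum())
```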

The advantage of using ExpG models is that shots are much more numerous than goals, and hopefully the process of chance creation with an attached probabilistic measurement of success will better describe a side's underlying abilities compared to actual goals, which are perhaps more prone to random streaks.

                     Cumulative ExpG Totals for 2015/16 Modelled from 2014/15 Opta Data.



Here's the cumulative ExpG totals for the 2015/16 Premier League, modelled using data from the previous season. These types of figures are often used as a basis to predict the future performance of a side.

The top model doesn't use big chances as a parameter, but the second does, and while there is some variation between them, the correlation in Exp GD between the two models is strong.


For those wishing to use an ExpG approach to produce a probabilistic estimation of team quality, there seems little difference in larger sample sizes between a big or non big chance based model.

It would appear that, in the long term at least, chance quality information is also retrieved from non big chance Opta parameters and more importantly is distributed to individual teams in a similar way to a big chance model.

In short, both models give Exp GD of similar values for most sides.

However, cumulative totals can give near identical values, but be very different at the granular level.

Model BC may assign a much bigger probability to excellent opportunities and smaller ones to weaker opportunities, while model NBC may do the polar opposite and the errors in the latter may fortuitously balance out to give near equal cumulative totals.

The first model would describe future reality better than the second.

To test both models, I arranged the goal attempts for all 20 teams in ascending chance quality, divided these into groups and then compared the actual number of goals scored in each of these subsets to the number predicted by each model.

                      How Well Does the Predicted Distribution of Outcomes Match Reality.



(Green = acceptable match, brown = poor match).

The results of this goodness of fit test are shown above.

Where the probabilistic model prediction for each subset largely agrees with the actual distribution of outcomes for 2015/16, we get a large p value. There's a decent chance that the variation we see between prediction and reality is just down to chance.
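The post doesn't specify the exact test used, but a Hosmer-Lemeshow style check along the following lines captures the idea: bin a team's attempts by modelled chance quality and compare actual goals per bin with the model's expectation.

```python
# A stand-in for the unspecified test: a Hosmer-Lemeshow style goodness-of-fit
# check, binning a team's attempts by modelled chance quality and comparing
# actual goals per bin with the model's expectation. Inputs are simulated here.
import numpy as np
import pandas as pd
from scipy.stats import chi2

def goodness_of_fit(xg, scored, n_bins=5):
    df = pd.DataFrame({"xg": xg, "scored": scored}).sort_values("xg")
    df["bin"] = pd.qcut(df["xg"], n_bins, labels=False, duplicates="drop")
    stat = 0.0
    for _, grp in df.groupby("bin"):
        exp_goals = grp["xg"].sum()
        exp_misses = len(grp) - exp_goals
        obs_goals = grp["scored"].sum()
        stat += (obs_goals - exp_goals) ** 2 / exp_goals
        stat += ((len(grp) - obs_goals) - exp_misses) ** 2 / exp_misses
    dof = df["bin"].nunique() - 2
    return stat, chi2.sf(stat, dof)   # small p-value = model and reality disagree

rng = np.random.default_rng(0)
demo_xg = rng.beta(1.2, 6.0, size=400)           # one season's worth of attempts
demo_scored = rng.random(400) < demo_xg          # outcomes consistent with the model
print(goodness_of_fit(demo_xg, demo_scored))     # should return a comfortable p-value
```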

Using the usual 5% threshold, there are two teams from the model constructed without big chances where the actual distribution of outcomes is so far removed from the predictions that chance may be largely ruled out as the cause.

In this case, Liverpool and Stoke.

The model constructed with big chances included as a variable has three teams where chance looks an unlikely candidate for the variation seen in the two distributions. Liverpool (again), Everton and Swansea.

So while cumulative ExpG values tend to show only small variations between a BC and a non BC model, differences do emerge at a more granular level, and for this season and these two models those differences do not appear to be systematically in favour of either the BC or non BC model.

In short, ExpG is a product of a model, all models vary, and these differences and the conclusions we draw from them may be most evident in smaller shot samples.

Saturday, 24 June 2017

You Don't Need Goals to Change Game State

I’ve written previously about the concept of game state and how a side prioritises their attacking and defensive resources.

It is well known that trailing sides often increase their attacking output when they are behind compared to when they were either level or ahead and this in turn impacts on the amount of defending their opponents are obliged to do.

Dependent upon the relative abilities of the two competing teams, a side seeking to get back on level terms often takes more shots and also accrues more products of attacking play, such as corners than was previously the case.

However, game state, defined simply as the current score line, does seem limiting and I’ve previously quoted the example of a top side playing out a goalless draw with a lesser team.

While the level scoreline would be increasingly welcome to the lower rated team as the game progressed, the opposite would apply for the better side in the matchup.

Therefore, quantifying “game state” should perhaps be done in terms that include the changing expectations of each team due to the passage of time and scoreline, rather than simply the scoreline.

I’ve suggested using the expected points each side would get on average from a match as a suitable baseline with which to begin measuring the evolving state of the game.

Here’s an example.

Chelsea entertain Everton and, based on pregame home win/draw/away win estimations, Chelsea would expect to average 2.1 points from the fixture compared to around 0.71 points for the visitors.

40 minutes into a still goalless game and these numbers have respectively fallen to 1.9 and risen to 0.81. After 67 minutes and still no goal and Chelsea are faring even less well (1.66) and Everton are up to an average expectation of 0.90 points.

There have been no goals, but the state of the game is constantly drifting away from Chelsea’s expectations and surpassing Everton’s “par for the course”.

Chelsea's game state environment is gradually becoming less palatable to them and Everton's more so, simply through the passage of time, and if this feeds through into the relative approaches of the sides, it should be seen in the match data.
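One way to sketch that drifting measure is to re-estimate each side's expected points as the clock runs down, here with a simple Poisson simulation of the remaining minutes; the scoring rates below are illustrative rather than fitted Chelsea and Everton numbers.

```python
# A hedged sketch of the measure: re-estimate a side's expected points as time
# passes by simulating the remaining minutes with simple Poisson scoring rates.
# The per-90 rates are illustrative, not fitted Chelsea/Everton numbers.
import numpy as np

def expected_points(goal_diff, mins_left, our_rate90, their_rate90,
                    n_sims=20_000, seed=3):
    rng = np.random.default_rng(seed)
    ours = rng.poisson(our_rate90 * mins_left / 90, n_sims)
    theirs = rng.poisson(their_rate90 * mins_left / 90, n_sims)
    final_diff = goal_diff + ours - theirs
    return np.mean(np.where(final_diff > 0, 3, np.where(final_diff == 0, 1, 0)))

pre_game = expected_points(0, 90, 1.8, 0.9)      # favourite's pre-match baseline
at_40_mins = expected_points(0, 50, 1.8, 0.9)    # still 0-0 after 40 minutes
at_67_mins = expected_points(0, 23, 1.8, 0.9)    # still 0-0 after 67 minutes
print(pre_game, at_40_mins, at_67_mins)          # drifts downward for the favourite
print(at_67_mins - pre_game)                     # the negative shift is the game state measure
```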

Here’s a memorable 0-0 from 2016/17 when Burnley took a point in a stalemate at Old Trafford.

The hosts’ average expected points total started at around 2.3 points at kick-off compared to 0.55 points for the visitors, but it had fallen by over 10% when half time failed to see a score. So a gradual erosion of expectations, rather than a precipitous decline.

Burnley’s modest expectation was up by over 50% on its original value with 20 minutes remaining and, with United’s now tumbling by nearly a quarter compared to kick-off, the hosts' shot count began increasing as Burnley’s stalled.

        How Manchester United Piled on the Attempts as Burnley Frustrated them at OT.


This switch towards a more overtly attacking stance from the side leaking initial expectation as time elapses in a level match forces their opponent to adopt a more defensive outlook, and appears to be mirrored, on average, in all such matches from the 2016/17 Premier League season.

72% of the goal attempts taken when the scoreline was level in 2016/17 were taken by the side whose expected points had slipped below their pregame estimation. Perhaps an important consideration when nearly half of all goal attempts from 2016/17 came while the scores were level.

Across all score lines, the inferior team in a match who had managed to improve their pre-game position, either through remaining level or taking a lead, attempted 31% of shots while that position persisted, but such sides upped this to nearly 46% against superior opponents when their current points expectation fell below their initial expectation.

These figures tally with intuition about how games develop, even in the absence of goals.

Therefore, the amount of change in a team’s pregame expectation may be a viable extension to the more commonly applied mere scoreline when assessing game state, particularly when we are still awaiting an initial goal.

For example, it is commonly assumed that increased shot volume from a side that finds themselves in a disadvantageous game state is partially balanced by a more packed defence.

This may lead to the expected goals from identical pitch locations being lower when defensive pressure is greater.

To try to test this I included a variable for game state within an expected goals model for the 2016/17 Premier League, based around this continuous, time elapsed and score dependent calculation, rather than merely using the current scoreline.

Overall, a team playing with a current expected points total that had dipped well below their pre-game expectations, converted chances at a lower rate than identical chances where game state was much less of a factor.

In addition, as teams played with a poorer game state, their goal attempts were also more likely to be blocked by defenders than in similar situations when their game state environment wasn't as dire.

As an example, a side who had improved their position compared to pre-game by around 40% of their initial points expectation might convert a decent shot from the heart of the penalty area around 44% of the time.

But when faced with the same chance when their points expectation had fallen by a similarly large amount, they appear to only convert the opportunity 37% of the time.

This may be due to fewer defenders being around in the first instance as their opponents perhaps chased a goal of their own compared to the second situation when defence might be a higher priority for their opponents.

Thursday, 15 June 2017

Early Season Strength of Schedule

With the major European leagues currently enjoying their summer holidays, it is left to a handful of competitions to provide club based action until early August.

One such league is Brazil's Serie A, a fascinating mix of player and managerial churn, exciting skillful youngsters, paired with former internationals, slowly winding down their illustrious careers and lots of shooting from distance.

Tonight sees the completion of week seven of the twenty team league, so while we have accumulated some new information about the 2017/18 version of teams such as Santos, Sao Paulo, Corinthians and lesser known sides, such as Gremio and Bahia, that information comes courtesy of an unbalanced schedule.

Prior to week seven, Flamengo had played three of the current bottom four and no side from the top half of the table, whereas Vasco da Gama had faced the current top two and only two sides outside the top ten.

The challenges faced by these two sides were likely to vary in their degree of difficulty.

Delving deeper into each side's most recent games, including matches from 2016/17 may be a more reliable indicator of their respective future prospects, but it is understandable that a six game season to date also invites comment in isolation.

Predicting the future arc of a team's season is always welcome, but celebrating achievement over a shorter time frame, even if some of it has come from a sprinkling of unsustainable randomness also deserves attention.

How can advanced stats and strength of schedule adjustments assist?

It's natural to look firstly at the record of the side in question, but it is their opponents that possess the richest seam of data from 2017/18's fledgling season.

Vasco had played Palmeiras, Bahia, Sport, Fluminense, Corinthians and Gremio prior to last night and, in turn, each of their opponents has also played five other opponents in addition to Vasco.

Combined, Vasco's opponents have played 36 games, nearly a full season and have played every side in Serie A at least once, bar Corinthians.

We have a ton of accumulated data from goals to expected goals for Vasco's opponents, but only six games of data for Vasco themselves and the same is true for the remaining 19 teams.

It's natural to expect that even this limited, if recent, achievement contains some signal relating to future performance. Ben Cronin over at Pinnacle has written this article about the correlation between Premier League position after six games and final position, and the FT's John Burn-Murdoch also tweeted this excellent visualisation correlating current league position during the 2013/14 season with finishing position in May.

To adjust for strength of schedule, we might take expected goal differential, rather than league position, as the performance related output for each team and utilise the interrelated collateral form lines that are created after a few weeks of the season.

Team A may not have played team B yet, but they may have played team C, who have played team B.

We are left with 20 simultaneous equations, with a side's opponents on one side and their actual expected goal differential output on the other. Solving these gives new expected goal differentials that more fully represent the difficulty of each team's schedule.
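A hedged sketch of that adjustment is below, treating each team's per-game expected goal differential as its own rating minus the average rating of the opponents faced and solving by least squares; the fixtures and raw figures are placeholders.

```python
# A hedged sketch of the adjustment: each team's per-game xG differential is
# modelled as its own rating minus the average rating of the opponents it has
# faced, and the system is solved by least squares. Inputs are placeholders.
import numpy as np

def schedule_adjusted_ratings(teams, fixtures, raw_xgd_per_game):
    """fixtures: list of (team, opponent) pairs, one entry per game played."""
    idx = {t: i for i, t in enumerate(teams)}
    n = len(teams)
    games = np.zeros((n, n))
    for team, opp in fixtures:
        games[idx[team], idx[opp]] += 1
    played = games.sum(axis=1)
    A = np.eye(n) - games / played[:, None]      # raw_xgd_i = r_i - mean opponent r
    b = np.array([raw_xgd_per_game[t] for t in teams])
    ratings, *_ = np.linalg.lstsq(A, b, rcond=None)
    return dict(zip(teams, ratings - ratings.mean()))

teams = ["Vasco", "Flamengo", "Palmeiras", "Bahia"]
fixtures = [("Vasco", "Flamengo"), ("Flamengo", "Vasco"),
            ("Vasco", "Palmeiras"), ("Palmeiras", "Vasco"),
            ("Flamengo", "Bahia"), ("Bahia", "Flamengo")]
raw = {"Vasco": -0.3, "Flamengo": 0.6, "Palmeiras": 0.4, "Bahia": -0.7}
print(schedule_adjusted_ratings(teams, fixtures, raw))
```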

In short, it is the basis for so called power ratings.



Here's how Serie A teams were ranked by expected goals differential prior to week seven and how that ranking changed when we allowed for the sometimes heavily unbalanced schedules played.

Vasco were ranked 13th on expected goal differential, but jumped into the top 10 to 9th when their harsh early schedule was applied.

Ponte Preta dropped four places to 15th in view of an apparently benign group of initial opponents.

In theory this seems fine, but does schedule strength add anything to our knowledge of a side going forward if we choose to limit ourselves to data from just this single season?

As Ben and John have admirably demonstrated, there is a correlation between league position at various stages of the season and finishing position.

Here's a limited (due to workload) example from a previous Premier League season using simply goal differential rather than expected goals.

13 games into the 2013/14 season, Spurs were ranked 13th by goal difference, 10th when strength of previous schedule was applied and 9th in the actual table. They finished 6th.

Their position in the table after 13 games better predicted their finishing spot, followed by strength of schedule adjusted goal difference and lastly actual goal difference.

As a whole though, ranked strength of schedule adjusted goal difference from week 13 did best of the three, producing ranked correlations with finishing position of 0.77 for both league position and actual goal difference after 13 games, rising to 0.80 when strength of schedule corrections were applied and the teams re-ranked.
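Those ranked correlations are straightforward to reproduce with a Spearman calculation; the arrays below are tiny illustrative stand-ins rather than the real week-13 data.

```python
# A minimal sketch of the ranked correlation; the arrays are tiny illustrative
# placeholders, not the real week-13 and end-of-season data.
from scipy.stats import spearmanr

final_position = [1, 2, 3, 4, 5, 6]
week13_goal_diff_rank = [2, 1, 4, 3, 6, 5]       # ranking by raw goal difference
week13_sos_adjusted_rank = [1, 2, 3, 5, 4, 6]    # ranking after schedule adjustment

rho_raw, _ = spearmanr(week13_goal_diff_rank, final_position)
rho_adj, _ = spearmanr(week13_sos_adjusted_rank, final_position)
print(rho_raw, rho_adj)
```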

In short, there is signal in limited early season data and as a means of predicting final finishing position there may be some improvement if we rank by a schedule adjusted performance indicator.

All Brazilian data from InfogolApp

Sunday, 11 June 2017

Take On Me

A quick data viz spin through some of the less readily available attacking stats from the 2016/17 Premier League.

Aside from a penalty kick, the take on is the contest in a football game that most directly pits together the attacking and defensive attributes of individuals.

The ability to break apart a defensive structure by beating an opponent in a one on one contest is a hugely valuable asset, particularly if it takes place deep into opposition territory as demonstrated by England's opening goal against Scotland.

Similarly, conceding possession from an attacking move can also leave a side vulnerable to counters.

So who's perpetually trying to be creative in the opposition box and who might leave his side vulnerable to a costly turnover in less advanced areas of the field?

Here's the plots for the Top Six. The left hand side of the plot is closest to the opponent's goal and players who have played few minutes have been omitted.







Data from InfoGolApp

Friday, 9 June 2017

Visualising Premier League Defence

A quick follow up to the last post on the defensive actions of players in the 2016/17 Premier League.

Numerical values, of course are the mainstay of any attempt at a deeper analysis of the defensive side of football, but it is also useful to have a visualisation of the data from which to derive a quick overview and comparison of different players.

The previous post looked to quantify the number of defensive actions particular positions were responsible for and where on the pitch they took place.

This post looks at individual players and both the amount of defensive actions they partake in, corrected to per 90 minutes and also whether these occur closer to their own goal or higher up the field.



Here's the plots for the three main challengers to Chelsea from 2016/17.

The pitch has been split into ten equal portions, sorted by distance to the centre of the defending team's goal line and the volume of defensive actions have been counted in each of the ten sectors.

The right hand end of the spark line plot is the nearest sector to the team's own goal and the vertical line denotes half way.

The plot shows where and how often, either through instruction or necessity, a player is involved in the defensive efforts of his side and who is given free rein to concentrate on other aspects of team play.
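For anyone wanting to recreate these spark lines, a sketch of the binning is below; the column names are assumptions about the underlying event data rather than the actual feed.

```python
# A hedged sketch of the binning behind the spark lines: split the pitch into
# ten bands by distance from a side's own goal line and count each player's
# defensive actions per band, per 90 minutes. Column names are assumptions.
import numpy as np
import pandas as pd

def defensive_profile(events, minutes_played, pitch_length=105.0):
    """events: DataFrame with 'player' and 'dist_to_own_goal' per defensive action;
    minutes_played: dict of player -> minutes."""
    bands = np.linspace(0.0, pitch_length, 11)
    binned = events.assign(
        band=pd.cut(events["dist_to_own_goal"], bins=bands,
                    labels=range(1, 11), include_lowest=True)  # 1 = nearest own goal
    )
    counts = (binned.groupby(["player", "band"], observed=False)
                    .size().unstack(fill_value=0))
    return counts.div(pd.Series(minutes_played), axis=0) * 90   # per-90 rates
```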

All data from @InfoGolApp

Tuesday, 6 June 2017

All For One.....Defensive Lines in the Premier League.

While the attacking side of football was always going to be the focus of advanced analytics, it is perhaps surprising that defensive metrics have received so little attention.

Aside from team wide expected goals allowed, more granular defensive metrics have barely progressed beyond mere counting of defensive actions such as tackles and challenges (player on player) and interceptions and clearances (player on ball).

There are exceptions, the universally excellent Colin Trainor here and there are excuses, particularly the scant availability of data relating to defensive actions.

Defence is also more overtly a team responsibility and whereas heroic last ditch tackles do occur and prevent a chance from turning into a shot, it is the overall structure and ability to create pressure on the team in possession that also exerts a great deal of influence.

So off the ball events are likely more important in defining an excellent defence than say decoy runs are to adding information to the attacking process, where shots, headers and key passes are more intuitively useful as an indication of repeatable process.

However, it can still be useful to add descriptive context to the defensive actions that are beginning to become available, such as interceptions, tackles and ball recoveries.

A simple division of how these defensive actions are shared out amongst the different playing positions, and where on average on the field these actions are happening, may add flesh to what has previously been dry bones.

There are problems, especially the diversity of team formations; 17 different ones were employed in the 2016/17 Premier League, 4231 proving most popular and 3142 the least, and the definitive classification of positions also becomes less certain.

We can begin to look at both the share of defensive duties undertaken by a designated position both on average across the league and particularly within a team, along with the average area of the field where these actions occur.

These may then be a useful guide as to where, either by choice or force, a side defends its goal.


Firstly, here's a summary of the average distance from the centre of the goal where a defensive action occurred for designated positions during the 2016/17 Premier League season.

As you'd expect, strikers and attacking players carry out their defensive duties the furthest away from their own goal, defensive midfielders creep closer to their own goal and defenders more so.


Now here's the share of defensive duties undertaken by the most commonly defined playing positions. Again there are no surprises, defensive positions are responsible for the lion's share of the recorded defensive events, but they do set baselines from which we can compare different teams to begin to tease out deviations from the norm.



Here's the average position from a side's own goal where the designated playing positions are taking part in a defensive action.

Usually strikers are involved in the defensive actions that take place highest up the field and central defenders are the group of playing positions who are mixing it nearest to their own goal.

The final column simply subtracts the first distance from the second to hopefully quantify the area within which most of a side's defensive actions occur.

Burnley were the most compressed, defensively in 2016/17, requiring their designated strikers to help out in their own half, on average 42 yards from their own goal, while holding one of the deepest defensive lines in the league just 27 yards from goal, on average.

The majority of Burnley's defensive actions took place in a 15 yard perpendicular distance between these two lines of defensive action.

Leicester's defensive efforts, in contrast, were the most spread out, with their strikers' contribution spilling out into the opponents' half of the pitch and their defence holding the deepest line of the 20 sides.

They perhaps needed a midfielder who could do the work of two.

Liverpool's high press is evident, with the average position for defensive actions from their strikers taking place just inside their opponents' half of the field, and they also contribute the highest proportion of defensive actions in comparison to the attackers from other teams.

Part of this inflated striking defensive contribution will be down to the Reds utilising above average numbers of strikers, but it does seem that being part of such an attacking set up requires a spirited contribution towards the defensive cause as well.

All data is taken from the InfogolApp

Saturday, 3 June 2017

Francesco Totti's Ageing Curve

40 year old Francesco Totti ended his 25 year association with AS Roma when he appeared for the final half hour of last weekend's game with Genoa.

Totti has played over 600 Serie A matches, clocking up over 47,000 minutes of playing time, while scoring 250 league goals, although 71 of those have come from 12 yards and over that period, Roma has enjoyed consistent success, rarely dropping out of the top four positions.

As league careers go, Totti's has therefore been played at a very consistent level, where Roma have regularly been amongst the best club sides in Italy and he has largely avoided injury.

Between 1994-95 and 2014-15 he has played at least 1,000 minutes in each and every season, peaking in 2006-07 when he managed 3,034 on the field minutes.

As such he is an ideal subject to see where is performance levels stopped improving and began that inevitable, age related decline, albeit from a very high level.

Quantifying the performance achieved by players over the course of their careers is problematic. Playing time can often be used as a proxy, but goal output is perhaps the most easily accessible benchmark for an attacking player's current and previous level of play.

Here's the inevitably noisy plot of how Totti's non penalty goals per 90 have changed from one season to the next over his long career.

The trend line indicates that improvement is replaced by decline at the point where it crosses the horizontal axis, and this occurred when Totti was just over 28 and a half.
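A minimal sketch of that trend-fitting exercise is below, using made-up season-on-season changes rather than Totti's actual record.

```python
# A minimal sketch of the trend fit, with made-up year-on-year changes in
# non-penalty goals per 90 rather than Totti's actual record.
import numpy as np

ages = np.array([22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32])
npg90_change = np.array([0.06, 0.05, 0.04, 0.02, 0.03, 0.01,
                         0.00, -0.02, -0.01, -0.03, -0.04])

slope, intercept = np.polyfit(ages, npg90_change, 1)
peak_age = -intercept / slope            # where the fitted trend crosses zero
print(round(peak_age, 1))                # ~28 with these placeholder numbers
```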

This doesn't of course mean that he suddenly became a poor player, merely that his best years, on average and from a scoring perspective, were most likely behind him, although, as he subsequently demonstrated, he was still capable of contributing to Roma, perhaps in a slightly different role.

So footballers are all prey to ageing, although some have such high levels of innate talent that they can, like Totti, prolong their time spent at the highest level because their aged talents remain above the peak-year levels of less talented contemporaries.

Which brings us to tonight's Champions League final, featuring Ronaldo, a player who has had a more varied league career, spanning Portugal, England and Spain, but who, judged against his own highest standards, has himself been in decline since just prior to his 28th birthday.

Thursday, 1 June 2017

Charting Liverpool's Expected Goal Surge Under Jurgen Klopp

Everyone with a passing interest in the developing football analytics movement will by now have heard of expected goals.

While far from  perfect, in common with most models, it does do an excellent job of examining the process behind the creation and attempted execution of goal scoring opportunities in a sport, such as football which has relatively few actual scoring events.

Much of the progress in recent years has revolved around improving both the descriptive and predictive qualities of the metric by incorporating firstly the shot type as well as location and also other pre-shot information, such as how the attack developed, often used as a proxy for defensive pressure.

Less attention has been paid to how the values of expected goals are presented for individual sides or players, with often a simple cumulative addition of the expected goals created and conceded being deemed sufficient for individual matches or seasons.

Simulating each individual attempt using the expected goal value associated with that shot or header is an easy alternative, but this also converts the raw granular data into the different currency of win probability when used on a single game, or of expected position or league points won if applied over a larger number of matches.

Retaining information about the distribution of the quality of the chances created, rather than simply taking a summation of the individual elements, is useful because of the way such distributions contribute towards the final range of possible outcomes.

Spreading your cumulative expected goals over a few shots compared to many has a different potential payoff.

In the former, you are foregoing the potential for an occasional bumper score line for the increased likelihood that you may be lucky and good enough to score at least one, which often yields some kind of return in a low score environment.

I first wrote about this here in 2014.

Here's an extreme example.

Would you rather have a penalty kick, with an ExpG value of 0.8, or eight shots, each with an ExpG value of 0.1?

The cumulative ExpG is 0.8 in both cases, but if the range of outcomes were combined in a match scenario, the lone penalty would win 35% of such games and the more frequent, but less likely attempts would win just 28% of the contests despite also summing to 0.8 ExpG.
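The exact win percentages depend on what the opponent's chances look like, but the sketch below shows the mechanics with an assumed, mirror-image opponent worth 0.8 ExpG of its own.

```python
# A sketch of the comparison, assuming a mirror-image opponent worth 0.8 ExpG;
# the exact percentages in the post depend on the opponent's chance profile.
import random

def simulate_goals(shot_probs, rng):
    return sum(rng.random() < p for p in shot_probs)

def win_rate(our_shots, their_shots, n_sims=50_000, seed=11):
    rng = random.Random(seed)
    wins = sum(simulate_goals(our_shots, rng) > simulate_goals(their_shots, rng)
               for _ in range(n_sims))
    return wins / n_sims

opponent = [0.1] * 8                       # assumed opposition attack, 0.8 ExpG
print(win_rate([0.8], opponent))           # the single high-quality chance wins more often
print(win_rate([0.1] * 8, opponent))       # than the same ExpG spread over eight attempts
```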

Therefore, ExpG distribution matters.



Here's the distribution of the ExpG chances created by Brendan Rodgers' and Jurgen Klopp's Liverpool over their most recent 48 game span.

The opportunities have been grouped and counted by increasing ExpG per attempt and compared to the league average for quality and quantity, adjusted to a 48 game sequence.

The majority of chances created by a side have a relatively low expectation of scoring, falling between an expectation of near zero and around a 15% chance.

Attempts with higher ExpG values are much less numerous, ranging up to so called big chances, where historically a team has been more likely to score than not.

Therefore, a secondary axis has been used to produce definition on these much rarer groups of bigger chances.

There's not much between the current Klopp managed Liverpool and the man he replaced, Rodgers in the lowest expectation region of chances created.

Klopp's side is above the average, volume-wise for attempts in the three initial groups that are quantified by the left hand axis, ranging from 0-0.15 expG.

Rodgers edges ahead in the volume of chances created with a grouped ExpG of between 0.2-0.25, the counts for which are shown on the right hand axis.

Once we encounter chances with a likely historical likelihood of 35% or greater, the present Liverpool set up dominates both the league standard and Rodgers' Reds.

No penalty kicks have been included.

Data from @Infogol

Saturday, 27 May 2017

Reading vs Huddersfield, Championship Playoff Final.

The football match for the world's biggest prize takes place at Wembley on Bank Holiday Monday, when the two remaining Championship playoff teams meet to gain entry to the Premier League money mountain.

Much has been and will be written about the two sides, particularly about their less than impressive regular and advanced stats.

Reading, at least managed a positive goal difference of +4 compared to Huddersfield's -2, although the latter impressed more in the probabilistic, process driven world of expected goals.

There's a detailed preview being posted later in the weekend, but as a crib sheet for fans and neutrals alike, here's how the 46 game season looks for both teams through the lens of expected goals.

Each goal attempt has been assigned an expectation of ending up in the net based on a variety of parameters and their historical contribution to a successful outcome.

Each individual attempt is then simulated along with all the others taken in each match and a scoreline emerges, based on the attempt events in each match.

Score effects will play a part in this partly artificial process and models will not capture every ingredient that goes into a complex team based sport, such as football.

Games have been "replayed" 10,000 times and the percentage of games which end in say a Reading win or a draw for Huddersfield have been counted.

Finally, the matches have been arranged in descending order of how well the individual goal attempts and their associated expected goals reflect the actual reality of the result on the day.


Top of Reading's list for "slightly taking the liberty" is their 2-1 win at home to Wolves. The Royals' ExpG totals around 0.5 compared to 1.82 for Wolves. The home team scored with both of their only two shots of note and a simulation of all the attempts from the game suggest they win such a shooting contest 7 times from 100.

A match that kind of sums up the contrasting fortunes of both Reading and Wolves in 2016/17.
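A sketch of that replay process is below; the individual attempt values are placeholders chosen only to sum to roughly the 0.5 and 1.82 ExpG totals quoted, not the real shot-by-shot data.

```python
# A sketch of the replay process; the attempt values are placeholders chosen to
# sum to roughly the 0.5 and 1.82 ExpG totals quoted, not the real shot data.
import random

def replay_match(home_attempts, away_attempts, n_sims=10_000, seed=5):
    rng = random.Random(seed)
    home_w = draw = away_w = 0
    for _ in range(n_sims):
        h = sum(rng.random() < p for p in home_attempts)
        a = sum(rng.random() < p for p in away_attempts)
        if h > a:
            home_w += 1
        elif h == a:
            draw += 1
        else:
            away_w += 1
    return home_w / n_sims, draw / n_sims, away_w / n_sims

reading = [0.3, 0.2]
wolves = [0.45, 0.4, 0.3, 0.25, 0.2, 0.12, 0.1]
print(replay_match(reading, wolves))   # home win in only around 6-7% of replays
```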



Here's all of Reading's league matches, along with the simulated probabilities for each possible outcome. All of the top ten matches and 17 of the top 20 are Reading wins and are also games where the granular shot probabilities were initially skewed in favour of Reading's opponents.



Here's Huddersfield's season and a more mixed bag of game outcomes at the top of the table, perhaps implying that their season hasn't revolved around the Terriers pulling an Al-Habsi sized rabbit out of the hat on more than a few occasions. Unlike Reading.

Data is from the @InfogolApp which can be downloaded free and has historical Premier league, La Liga, Championship, Europa League and Champions League expected goal values for both teams and players.

Thursday, 25 May 2017

The Ticking Premier League Clock.

With the 2016/17 Premier League season now a wrap there's inevitably a raft of season reviews, both statistical and narrative driven.

Already sides are scrambling to pick apart the squads of the three relegated teams and capture the talent that shone brightest amongst the mediocrity.

Improving your Premier League squad for the upcoming 2017/18 season is an obvious priority. The likely output from your current collection of talent does not stand still, principally through the ticking of the clock.

It has been well demonstrated that a player's output, as measured by simple metrics or the amount of playing time he is given, first waxes and then wanes (desperately resists obvious pun).

Although there are some positional variations, as well as individuals who possibly fall outside the usual range, the peak ages in general for Premier League players lie between 24 and 29.

It is a simple task to chart which teams are well set to enter 2017/18 with a squad that is likely to show an improvement, simply because players who were deemed good enough to be given playing time in 2016/17 are either moving into the sweet spot for age related peak performance or are remaining within their peak years.

On the flip side, other teams may be anticipating the need to recruit new, younger talent to replace an ageing squad that may have produced results that are acceptable for the club's perceived status in the Premier League pecking order, but if left unresolved will likely see an age related decline.



In the table above, the weighted amount of playing time given to players has been grouped by age.

This makes it possible to see which teams have a comfortable buffer of young talent that was deemed good enough to play some part in 2016/17 and under normal development will be expected to pick up some of the shortfall from older squad members who may begin to show age related decline.
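A minimal sketch of that age-profiling step is below, assuming a simple player-minutes table; winding the clock forward is just a one-year shift of every age.

```python
# A minimal sketch of the age profiling, assuming a simple player-minutes table;
# re-running with every age advanced by one "winds the clock forward" a season.
import pandas as pd

def age_profile(squad_minutes, shift_years=0):
    """squad_minutes: DataFrame with 'team', 'age' and 'minutes' columns."""
    df = squad_minutes.assign(age=squad_minutes["age"] + shift_years)
    df["band"] = pd.cut(df["age"], bins=[0, 23, 29, 50],
                        labels=["pre-peak", "peak (24-29)", "post-peak"])
    shares = df.groupby(["team", "band"], observed=False)["minutes"].sum()
    return shares / shares.groupby(level="team").transform("sum")

# age_profile(minutes_2016_17)                  # current share of minutes by age band
# age_profile(minutes_2016_17, shift_years=1)   # projection for the coming season
```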

It's also possible to wind the clock forward to spot which sides are best placed to cope with these transitions in the absence of new signings.

Ominously, Chelsea will likely retain the highest proportion of peak age performers, narrowly followed by fellow Champions League participants, Liverpool and Spurs.

By contrast, Manchester City again find themselves with a dearth of peak age performers from their current squad in the upcoming season, suggesting a bout of major squad reconstruction is imminent.

Monday, 22 May 2017

Tony Pulis Is Not A Slacker

Tony Pulis is never short of narratives.

Since the diminutive Welshman announced his presence on the main Premiership stage, guiding an under funded Stoke team, lacking in top flight talent, to perennial survival, he's attracted plaudits and brickbats as the master of squeezing the most from meagre resources.

He's acquired manager of the season awards, as well as acrimony for his dour, anti football approach, laced with innovation, for which all Stoke fans will forever forgive him, especially as it came with the added bonus of infuriating Arsene Wenger.

Slacker, however, is a term rarely associated with Pulis or his three Premier League charges.

Until now.


Visually the evidence appears damning. In the 54 matches a Pulis led side has played after the black line in the graphic, only 45 points have been won.

That's relegation form in every season and the implication is that a manager who once infamously multi-tasked by cancelling Christmas, while also showering, has allowed his team to slacken when a likely survival target has been met.

So do the numbers support the view that a manager whose mantra is "work 'ard" actually relents during April and May?

"Can I have the month off, boss"?

Firstly, there is an element of selective cutoff points that do Pulis no favours in the graphic.

To surpass any target requires a side to either win or draw and in eight out of the nine seasons, Pulis' side reached the line set in the graphic with a win.

Therefore, just as "X has not won at Y since 2014/15, immediately tells you that they did actually win in 2013/14, each period of "rest and reflection" begins immediately after a positive result and that biases your perception of the ensuing games.

Secondly, gaining points is very difficult for mid to lower ranked teams, epitomised by those TP has managed.

It's quite easy to spot runs of 5 or 6 consecutive matches without a win during periods when Pulis was presumably cracking the whip (or wet towel).

Thirdly, the fixture list can get very unbalanced when broken down into segments of anywhere between three and twelve matches, as has been done in the graphic.

Whether by quirk of the fixture list or design, Pulis has been handed more games against the Premier League's best, and Arsenal, in the latter phases of the season.

Rather than lounging on a deckchair, they've been taking on Arsenal (6 times), Man City (4 times, including once immediately after an FA Cup Final), Chelsea (3 times), Everton (3 times), Liverpool (3 times), and Manchester United and Spurs, twice each.

That's a disproportionately larger share of the current top 7 compared to a random draw.

The easiest way to quantify how a side has done over a range of games is to simulate the range of possible points won based around a probabilistic model that doesn't incorporate a "doesn't try when safe" variable.

This approach results in Pulis gaining the actual 45 points his sides accumulated or fewer in around 16% of trials.

So the return is an under performance, certainly, but one that might occur in 16% of simulations simply through the randomness of how points are won.
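As a rough sketch of that kind of simulation: the win/draw/loss probabilities below are made up (a constant line of roughly a point a game, chosen only to land in the same ballpark as the quoted 16%), whereas in practice each match would get its own model- or odds-derived figures.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative pre-match win/draw/loss probabilities for each of the 54
# post-safety matches; real values would vary game by game.
probs = np.tile([0.24, 0.28, 0.48], (54, 1))

points = np.array([3, 1, 0])
n_trials = 100_000

# Sample a result for every match in every trial and tally the points.
sims = np.zeros(n_trials)
for p in probs:
    sims += points[rng.choice(3, size=n_trials, p=p)]

# Proportion of simulated run-ins yielding 45 points or fewer.
print(f"P(45 points or fewer) = {(sims <= 45).mean():.1%}")
```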


Here's an attempt to cherry-pick a single season where the returns are so low compared to an odds-based distribution of points that randomness is challenged as a plausible explanation for the actual points returned in the run-in.

Seven of the nine seasons are unremarkable; the two exceptions are the most recent campaigns at WBA, but even these two examples have, respectively, a 10% and a 7% chance of simply being random deviations from a baseline estimate of WBA's ability over the season.

And with a raft of sides hovering around WBA's performance expectation for points won going into April, the chances improve that someone (not necessarily WBA) will appear to tank their season early.

Even if there is something in the tailing off of a Pulis side in two out of nine seasons, evidence must be presented for a possible cause, which could be plentiful.

Resting players carrying long-term injuries, experimenting with alternative tactical set-ups, blooding inexperienced players, or seeing hot and unsustainable production from niche attacking methods regress towards less extreme levels each deserve scrutiny.

The list is nearly endless and almost universally laudable, but Tone giving the lads a breather would be way, way down my list, even if the data supported the claims.....which it doesn't.

Friday, 19 May 2017

Who's Made Their £Million Wage Earners "Put In A Shift"?

As soon as Omar Chaudhuri starts tweeting words like "bugbear", you know he's onto something that deserves a good going over.

Pounds per point was deservedly in his sights as a way of determining over- or under-performance relative to league position, following The Times' perpetuation of this nonsense.

I've outlined the fatal flaws in this approach in yesterday's blog, and Omar has also suggested improved methodologies on his Twitter timeline.

But it opens up a wider question about the simplified use of readily available data.

Just because something is relatively easy to calculate and appears intuitively sensible doesn't make it immune from being a piece of pernicious hogwash.

In the NFL, strength of schedule prior to the season is regularly estimated by adding up the previous season's win/loss records of each team's upcoming opponents. This seems sensible, and Excel and csv files are your friend.

However, this too is easily verified as GIGO. Do you really think the multitude of easily identified factors that delivered a 2-14 record are going to persist?
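For illustration, the ritual amounts to nothing more than this; the records below are invented and the 16-game schedule is hypothetical.

```python
# Previous-season win totals (out of 16 games) of next season's opponents; made-up values.
opponent_wins_last_year = [2, 14, 8, 11, 5, 9, 12, 3, 7, 10, 6, 13, 4, 8, 9, 7]

# The "strength of schedule" beloved of season previews: just add them up.
naive_sos = sum(opponent_wins_last_year) / (len(opponent_wins_last_year) * 16)
print(f"Opponents' combined win rate last season: {naive_sos:.3f}")

# The flaw: last year's 2-14 and 14-2 sides will, on average, regress towards 8-8,
# so this figure bakes in noise that has already evaporated.
```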

Note to the Racing Post, "stop using these numbers in your season preview".

No one minds flawed reasoning, but the greater the potential audience, the greater the responsibility to do some due diligence regarding methods and a willingness to make corrections if needed.

Here's the performance of Premier League teams over the last six, nearly complete, seasons, using each club's proportion of the league's wage outlay and its similarly expressed share of rewards, in terms of wins and draws, compared to the historical relationship between the two.

A couple of seasons may be missing because I couldn't find the data for a few sides.
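A rough sketch of that comparison, assuming a hypothetical season-level file of wage bills and results; the file and column names are mine, and a simple least-squares fit stands in for whatever form the historical relationship actually takes here.

```python
import numpy as np
import pandas as pd

# Hypothetical input: one row per team per season.
df = pd.read_csv("pl_wages_results.csv")  # columns: season, team, wages, wins, draws, games

# Success rate: wins plus half a win per draw, per game played.
df["success_rate"] = (df["wins"] + df["draws"] / 2) / df["games"]

# Express wages and success rate relative to each season's league average (z-scores).
z = lambda s: (s - s.mean()) / s.std()
df["wage_z"] = df.groupby("season")["wages"].transform(z)
df["success_z"] = df.groupby("season")["success_rate"].transform(z)

# Historical relationship: least-squares line across all seasons in the sample.
slope, intercept = np.polyfit(df["wage_z"], df["success_z"], 1)

# Over/under performance = actual success z-score minus what the wage z-score "buys".
df["over_under"] = df["success_z"] - (intercept + slope * df["wage_z"])
print(df.sort_values("over_under", ascending=False).head(10))
```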


Everton, Spurs & Southampton have had a more than fair return for handing out bulky pay packets, as have Bournemouth, with more limited evidence.

Newcastle have managed just one, albeit spectacular, season of over-performing against the splashed cash, and Leicester's single over-par season was unsurprisingly the largest in the whole six-year sample.

Sunderland can at least attempt to eventually over-perform in new surroundings in 2017/18.


Here's the individual seasonal under/over performance against wages for the 11 ever-presents over the six seasons.

Tottenham and Everton make a habit of beating expectation, while Arsenal perform to similar relative levels as WBA and Stoke (whose managers' names escape me for the moment).

Tuesday, 16 May 2017

Chelsea Win the Title By Efficient Use of Wages.

The Times is one of the pioneers of quality, statistically based football journalism, notably under their Fink Tank banner.

It's therefore no surprise when their sporting articles not only receive extensive coverage on Twitter and in other news outlets, but also carry a degree of authority based on the legacy of past departed star performers.

One such post appeared on Twitter today and quickly spread via a raft of online newspapers and media, gaining many likes and retweets.



The post was a long-performed end-of-season ritual, whereby a side's wage bill is divided by their points total to derive a "cost per point" figure.

Following this intuitively comforting calculation, one team is deemed the most wasteful with their millions, in this case Manchester United (£3.6 million per point), and one is crowned the "value for money" team of the year, Spurs (£1.3 million per point).

Title winners Chelsea came 12th out of the 17 sides included (the promoted teams were omitted), implying perhaps that they had won the title with a wasteful, inefficient splurging of the chequebook.

But is that really the case?

When Chelsea last won the league in 2014/15, the average wage bill for the 20 teams was around £100 million, ranging from £29 M at Burnley to £217 M at the title winners.

The Blues' wage bill was just under 2 standard deviations above the league average, and for that outlay they gained 87 points, or a success rate of 0.8 per game if you prefer to express draws as half a win.

The reward for Chelsea spending 2 SD above the league average wage bill was a success rate that was slightly greater than 2 SD above the average success rate for the Premier League.
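In code, the comparison is just a pair of z-scores. The league averages below are those quoted above; the spreads are assumptions picked to be consistent with the "roughly 2 SD" figures in the text rather than the measured 2014/15 values.

```python
# 2014/15: league average wage bill ~GBP 100M, Chelsea ~GBP 217M (figures above).
# The standard deviations are illustrative assumptions, not the actual values.
wage_mean, wage_sd = 100, 60            # GBP millions
success_mean, success_sd = 0.50, 0.15   # league average success rate is 0.5 by construction

chelsea_wage_z = (217 - wage_mean) / wage_sd            # roughly 1.95 SD above average
chelsea_success_z = (0.80 - success_mean) / success_sd  # roughly 2.0 SD above average
print(round(chelsea_wage_z, 2), round(chelsea_success_z, 2))
```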

This relationship holds for multiple seasons and for most teams.

Below is the plot for 2014/15.


The uncomfortable truth for a team wishing to gatecrash the top of the Premier League, Leicester excepted, is that a typical title-winning season requires a financial outlay in the region of 2 SDs above the league average.

Similarly, stinting on the wage bill inevitably pitches you into the bottom half of the table, with some variation along the way to account for luck, innovation, plagues of locusts etc.

But the takeaway is that there is a strong relationship between how much you spend, compared to the league average and the actual wage bills of the other teams in a Premier League season, and your success rate, again compared to the league average and that of your competitors.

Therefore, any under- or over-performance in a season should be measured against this historical relationship, rather than via a perennial and flawed clickbait ritual involving nothing more than long division.



Using the raw figures used by The Times (actually from 2015/16, but we're assuming 2016/17 will be similar), Chelsea spent 1.64 SDs above the league average this term.

 That "entitles" them, based on historical precedent to gain a success rate that is around 1.3 SD'a above the league average.

With a week to go, their success rate ((wins + draws/2) / games played) is nearly 1.8 SDs above the league average success rate.
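Expressed as code, the "entitlement" above is simply the historical slope between the two z-scores applied to Chelsea's wage figure; the slope below is back-solved from the 1.64 and 1.3 figures quoted rather than estimated independently.

```python
# Historical relationship between wage z-score and success-rate z-score,
# back-solved from the figures in the text (1.64 SD of wages "buys" ~1.3 SD of success).
slope = 1.3 / 1.64   # roughly 0.8

chelsea_wage_z = 1.64                        # wages vs the league average
expected_success_z = slope * chelsea_wage_z  # ~1.3, the historical "entitlement"
actual_success_z = 1.8                       # with a week of 2016/17 to go

print(f"Over-performance: {actual_success_z - expected_success_z:.2f} SD")  # ~0.5 SD
```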

Whereas Chelsea languish two-thirds of the way down The Times' value-for-money table, they've actually won the league while also being this season's third-best over-performers in terms of share of money spent and success rate achieved.

Rather than being lambasted in a quality daily, Roman's bean counters, backroom staff and players deserve a huge pat on the back for being the best, and efficiently so, even if there was, as ever, an element of unsustainable good fortune as well.

Bournemouth fall from 2nd to 4th, Watford from 4th to 8th, Stoke drop from 6th to 10th and Swansea also drop to 15th from 11th. Manchester City rise from 13th in The Times to 6th and Liverpool go from 15th to 9th.

Some teams remain relatively unchanged. Congratulations Spurs, sorry United fans, but consigning this self-confessed "simplistic" method of ranking over- or under-achievement to the bin is long overdue.

Also check out Omar Chaudhuri's timeline for his views on this "bugbear" and an alternative approach to quantifying inefficiency in wages.