Towards A New Model – Part 3

In previous posts on this matter, I settled on the gamma distribution as the model, as it is a continuous analogue of the Poisson distribution and produces a curve that reflects actual real-world scores.

The next little challenge was to calibrate it against the “par” scores that GRAFT generates. The par scores typically range from 50 to 120 and are a little narrower than the distribution of actual scores. There were a few more quirks that came in when I was trying to make it all fit. Basically, I took 6-point-wide slices of each prediction set, collected the actual scores for each slice, and tried to fit a gamma curve to each. Fortunately, at least for the 60–120 par range, there was a reasonably consistent look to the curves, as I will show. I confined the set of historical scores to the years 1987 to 2017.
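In code terms, the binning-and-fitting step looks roughly like this (a sketch only: the function name, the sample-size cutoff and the array inputs are my own illustration, not the actual GRAFT pipeline):

import numpy as np
import scipy.stats

def fit_slices(pars, scores, width=6, lo=60, hi=120):
    # fit a gamma curve to the actual scores within each 6-point par slice
    fits = {}
    for start in range(lo, hi, width):
        mask = (pars >= start) & (pars < start + width)
        if mask.sum() < 30:  # skip the thin slices at the extremities
            continue
        # floc=0 pins the curve's origin at zero, as real scores demand
        a, loc, scale = scipy.stats.gamma.fit(scores[mask], floc=0)
        fits[start] = (a, scale)
    return fits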

(The black curve is the fit against all scores)

A little bit haywire at the extremities, but not too bad for the main spread. I looked at the parameters of each fitted distribution, particularly the shape parameter a, to come up with something plausible.

After a bit of eyeballing a sensible function for a, I’ve settled on this magic function, with 7.5 as the scale:

import scipy.stats

def curve(rating):
    a = rating * 0.1 + 3  # shape parameter scales with the GRAFT par
    return scipy.stats.gamma(a, loc=0, scale=7.5)

(This is Python code of course; you’ll want the scipy library to play with this. I presume that if you use R you’ll be able to figure out how to implement it.)

So in the end we have this bad boy:

And overlaid on the actual result fits…

So that’s not too bad a fit, at least for our purposes.

A few quirks, though: the mean and median of each curve are actually regressed towards the global mean compared with the GRAFT par that was spat out. I have a feeling that this is counteracted by the spread of scores at each GRAFT tick, where a lot of scores come in slightly under average, balanced by the long tail when teams go on the march and score 20 goals or more.

Well, let’s just say that I haven’t put too much rigour into it before this season. There’s still a few things to mess around with. For instance, I basically set up the curves for each team’s rating against the other completely independently, so there isn’t any covariance taken into account at this stage. What happens when two defence-oriented teams face off, versus two attacking teams? Well, that’s already kind of accounted for at the par-setting stage (team.attack – opposition.defense + league_par).

I’ve gotten as far as cooking up some rough examples for the opener on Thursday night:

From the main site, Richmond’s par is 90.8, Carlton’s par is 58.8.

Based on all of the above, Richmond’s likelihood of winning is 76%, with a mean winning margin of 24. Well, that’s our first bit of weirdness: GRAFT states that Richmond are better by 32 points, so what’s happening there?
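Those figures can be reproduced with a quick Monte Carlo over the two curves, sampling each side independently as described above (my own sketch, not necessarily the site’s actual code):

import numpy as np

n = 1_000_000
rng = np.random.default_rng(0)
richmond = curve(90.8).rvs(size=n, random_state=rng)
carlton = curve(58.8).rvs(size=n, random_state=rng)
margin = richmond - carlton
print((margin > 0).mean())  # ≈ 0.76, the win likelihood
print(margin.mean())        # ≈ 24, the mean margin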

The same sort of thing is happening with the base scores. Again, the idea of GRAFT is that each par score represents what each team should score against the other, given past performances, with the weekly adjustments carried out by a factor calibrated to maximise the number of correct tips. It’s a little fuzzier on margins, even though margins are at the core of the result.

Richmond’s par (rounded off) is 90.8, but the mean of their curve is 90.6 (OK, not too crazy) and the median is 88.1.

As for Carlton, with a par of 58.8, their mean is 66.6 and their median is 64.1.
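Those means fall straight out of the shape function, by the way: a gamma distribution’s mean is shape × scale, so curve(par) has mean (0.1 × par + 3) × 7.5 = 0.75 × par + 22.5, dragging low pars up and high pars down. A quick check:

for par in (90.8, 58.8):
    c = curve(par)
    print(par, round(c.mean(), 1), round(c.median(), 1))
# 90.8 90.6 88.1
# 58.8 66.6 64.1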

We’re fine with the median being a little skewiff; it’s not a symmetrical distribution, and the elongated tail is consistent with real scores. (That’s why I picked the gamma distribution in the first place: it was such a good fit for the historical record.)

 

Very early drafts of how the graphs might look on the site. I have coded these at a low level in Pillow because I am a masochist. Also because matplotlib plots like the ones earlier in the post look like matplotlib plots.

So obviously there’s regression to the global mean happening each week. Thing is, the weekly ratings are updated according to the actual weekly scores, using factors which boil down to *Hulk voice* “big better than small”. While I’ve tweaked how the ratings are presented this year (dropping them by a factor of 10 and making them floating-point numbers), that part has essentially not changed.

What I was dissatisfied with was how the probabilities of individual games, and the season as a whole, were calculated. The old practice used a normal distribution, which would not tail off nicely at zero, so it required a horrible fudge of jacking up both teams’ scores. I am glad to be done with that nonsense.

With this new algorithm, I am at the point where I am happy to publish the probabilities, intending to roll them out by Wednesday (boy this blog entry is going to become dated fast), but at the same time they probably need a little more rigour applied, so caution is advised if you’re going to take heed of them for certain activities. Entertainment purposes only!

Another thing that will happen on the site is that I will be producing CSV files for the tips (with lines and likelihoods) and ratings for people to scrape if they like – a little more detail on the formats and such in the next post.

Towards A New Model – Part 1

The past couple of weeks have seen small but significant steps towards retooling my sim model for AFL.

The first thing I had to do was update my historical data from the AFL Tables scrapings.

For that I dragged out my old parsing code, which still works, but I had to deal with the fact that I had stored the goals/behinds with a dot separator. That’s not really a good idea if you’re generating a CSV (comma-separated values) file: load it straight into Excel and the trailing zero may get stripped, so a score like 12.10 gets read as 12.1 and ten behinds become one behind.

It’s OK for my purposes, since I do most of my stuff in Python, but having decided to make my history file public I should at least eliminate that misunderstanding, so for those fields I’ve changed the sub-separator to the underscore ( _ ).
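For anyone picking the file up, splitting those fields back out is trivial; the function name and sample value here are just illustrative:

def parse_score(field):
    # a goals_behinds field like '12_10' -> total points
    goals, behinds = (int(x) for x in field.split("_"))
    return goals * 6 + behinds  # goals are worth 6 points, behinds 1

print(parse_score("12_10"))  # 82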

After all that cleaning up it’s at a point where I can make the history file public, so you can now pick it up on the new Resources section, which may get expanded with other files that I decide to make public.

With that dataset sorted out, I could get stuck into analysing it.

In previous years I’d used the normal distribution (ye olde bell curve) as the basis for the simulation module. There are a few problems with that, the most annoying to me being that it would generate negative scores.

Anyway, while I was attempting to work up a plausible sim model for “sokkah”, I reasoned that the Poisson distribution was most appropriate there, it being an event-based sport after all.

AFL scoring, too, is a series of events, but with goals and behinds the waters get muddied a bit as far as the quantum of the data goes. I guess I still couldn’t get away from the idea of using a continuous distribution, so I decided to use the continuous analogue of the Poisson distribution, the gamma curve.

So, I applied that to the set of final scores in the AFL/VFL set, and it worked marvellously.

So that’s what we’ll be using as a basis. I’ve also gotten the suggestion that the log-normal curve might also be worthy as it exhibits similar characteristics, so that might get a look in as I fine-tune things.

I’m now at the point where I’m trying to calibrate forecast results (based on the GRAFT system) against actual results, and that’s actually not looking so great. As far as margins go, what I’ve found is that while there is a good correlation in one sense (bigger predicted margins match up with bigger actual margins), the average of the actual margins for each slice is about 75-76% of the forecast margin. Not that flash. I can generate a pretty good win probability system out of it, but I also want to nail the “line” margins and par scores as well.

In other words, for games where I have “predicted” one team to win by 50 points, they end up winning (on average) by 38 (mean) or 40 (median) points – albeit with a lot of outliers as you’d expect.
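One crude way to put a number on that shrinkage is the through-the-origin slope of actual against predicted margins; a sketch with hypothetical array names (I sliced rather than regressed, but they tell the same story):

import numpy as np

def margin_slope(predicted, actual):
    # least-squares slope of actual vs predicted margins, through the origin
    predicted = np.asarray(predicted, dtype=float)
    actual = np.asarray(actual, dtype=float)
    return (predicted @ actual) / (predicted @ predicted)  # ≈ 0.75 here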

There’s a bit of thinking to do here, and I strongly suspect that it’ll lead to a significant reworking of the GRAFT system, to the point where it’ll have to be considered Version 3 – but what that actually entails is still a bit of a mystery. It may be that this will be a whole new system that moves away from the linear arithmetic model, at least in part.

So that’s where we’re up to at this point. How much of this work I can get done before the new season is a little uncertain, because there’s a few other things on my plate over the next few months. But we’ll see how we go.

Offseason Training

What’s been happening lately at GRAFT?

Of course, I’ve set up the A-League site, and have been incrementally adding new things, most recently putting together an SVG graph with the weekly fluctuations in the Elo ratings.

As well, I’ve made some steps towards a decent prediction model. As you’d imagine, “football” modelling is very mature and there is a great deal of literature on it, but you know, it doesn’t hurt to devise a system from first principles, even if you find out at the end that you’ve only reinvented the wheel.

Elo is particularly good at giving you win/loss probabilities out of the box, but of course there’s the whole issue of draws to account for. On the whole draws seem to eventuate 25% of the time, which is a nice round figure.
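By “out of the box” I mean the textbook Elo expectation; the catch, and it’s exactly the draws issue, is that the expected score bundles a draw in as half a win:

def elo_expected(rating_a, rating_b):
    # standard Elo expected score for team A against team B
    return 1 / (1 + 10 ** ((rating_b - rating_a) / 400))

print(elo_expected(1550, 1450))  # ≈ 0.64, but is that wins or half-draws?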

This is one of the things I hope to deal with as I want to incorporate attacking/defensive ratings into the mix, in an attempt to create more plausible projections.

My theory is that a contest between two teams with attacking tendencies is less likely to result in a draw than between two more defensive teams. The reasoning should be fairly intuitive, if neither club gives a shit about stopping balls flying into nets, in such a shootout it’s less likely that the teams will finish on the same score.

As well, two evenly matched teams would be more likely to play out a draw than in a match where one team is of a much higher quality than the other, even when taking bus parking arrangements into consideration.

Anyway, that’ll take a bit of nutting out, although at the end I want to be able to show a par score for each team prior to each game (much like I do with the AFL). It’d look something like MVC 1.2 v SYD 2.1, with maybe the result probabilities alongside. The numbers would align to the Poisson distribution of how many goals they might actually score. Once you get to that point, doing Monte Carlo predictions on that basis becomes pretty simple.
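As a sketch of that last step, using the MVC 1.2 v SYD 2.1 pars from above as the Poisson means (illustrative numbers only):

import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000
mvc = rng.poisson(1.2, n)  # goals sampled from each side's par
syd = rng.poisson(2.1, n)
print("MVC win:", (mvc > syd).mean())
print("draw:   ", (mvc == syd).mean())
print("SYD win:", (mvc < syd).mean())

Bump both means up while keeping the gap the same, and the draw share falls, which is the attacking-teams intuition from above.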

Among other moves for next year:

Revamp the site for the AFL/AFLW seasons in the new year. I mean, yes, it worked out for 2017, but I’m kind of bad about leaving things alone. Besides the whole “rating footy teams” part, I’m into this thing as much for developing new visualisations and designs. Basically I’m experimenting on all that in public. Some people have weird hobbies and this is mine.

Something else I will try to do is provide CSV output of my tips and ratings in a set format, so it’s easier for others to scrape and utilise, rather than everything breaking whenever I mess around with the “look and feel”.

I consider the basic GRAFT system to be pretty much settled, for all its faults and limitations. That’s tied up with the basic philosophy that it takes into account only results and venues, however. It is what it is.

Which is not to say that I won’t have new things on the go, including other systems based off it or on completely different principles, but at the end of several years of development and refinement, the core is done. I do want to publish the basic algorithm (it really is absurdly simple) and some associated tools for others to examine and rip to shreds, but I am a horrible coder so there’s a bit of cleaning up to do before that happens.

Having said that, the main thrust of development is the projection system. I intend to overhaul the system that I use to work out my probabilities and eliminate some of the more egregious fudges. I’ll have more detail here during analysis and development, but the first aim is to move on from the “close enough is good enough” normal distribution that I have used up to this point.

As far as AFL projections go, I will probably stick to a continuous distribution. There’s a few that might fit the bill, but that’s yet to be figured out. Maybe gamma, maybe logistic. There’s a lot of number crunching to be done for that but I’m going for something that starts at zero and follows the historical curve, has some amount of covariance between the two teams, and of course it’ll need to correlate nicely with the GRAFT ratings.

Another objective with the AFL section is to flesh out history sections for the league in general and also under each club page. I’m not quite sure how to present all that. I probably won’t go all AFL Tables on you because, well, we already have AFL Tables, but putting the ratings into a historical context would be interesting. Again, that’s a thing that will develop over time.

Aside from all the AFL stuff, there’s also the intention to branch into that other winter game, since I do want the site to take a more general view. There’s a few things to work out in how to tackle the NRL ratings, but I think some kind of hybrid approach will be needed there. Historical data is a little harder to find and organise, so that’ll be the first thing to sort out. Of course this means I might actually have to get enthusiastic about League again, which has been a struggle since it tried to go Super.

Of course, in taking a more general approach across the sports, I have to sort out this shiny new site so people don’t get lost around here. Getting into the web dev side is pretty interesting; I’m using Bootstrap 4 as the basis for now, since it’s reasonably easy to set things up so it doesn’t look too broken on small screens.

Aside from the framework, when it comes to generating the website pages, I’m committed to using all sorts of spaghetti code that I have trouble comprehending when I look at it again after a break. Well, as I said, horrible coder. I probably do all sorts of things that would make seasoned pros scream. “What’s unit testing, Precious?”

Fortunately it’s not my day gig, and I’m not looking for one.

Anyway, that’s what’s on the whiteboard for the next few months. In the next week or two I will have a poke at the 2018 AFL fixture and see how that stacks up and then announce my usual anodyne opinions about how nothing really matters anyway and thank buggery they haven’t inflicted us with 17/5 just yet. Bet you’re looking forward to that.