UK Horse Racing's Data Analyst Tool
Data Analyst Downloads
Download the Excel spreadsheet from here
Data Analysis Tool (Version 66.5)
The required text steering files
Archived Result Data Files (required as input for analysis) - Note that this is only accessible by UKHR Members.
Setting up the DA Tool
As always, an example of how things are done is often the best. So here's how I have my DA Tool enviroment set up.
In the folder C:\Horses\Development\Data Analyst I have the .xslm spreadsheet - this is the DA Tool. I have those two .txt files as well in the same folder. This folder is, therefore, my Data Analyst folder. The folder where you keep the DA Tool may be somewhere else.
It doesn't matter where the DA Tool folder is but one word of advice: don't place it into one of the Windows' Library Folders such as My Documents, this is because on later versions of Windows (i.e from Windows 7 onwards) a lot of Library folder functions are borked and it's always wise to avoid these folders. This piece of advice is worth following for all of your work.
What happens is when the spreadsheet opens the first bit of code runs to check the environment. The first thing that it does is to work out where it is on your hard drive. On my machine I have it stored C:\Horses\Development\Data Analyst if you recall.
Assume that the DA Tool has started up and knows where it is. Having determined that, it then looks for the two text steer files in the same folder. These have to be in the same folder because the DA Tool wouldn't know where else to look for the files.
When it's found those two text files it then opens them up to get more information before it can produce anything. The DA Tool needs to know two things; where to look for the results data (these could be anywhere on your disk) and where to put the reports that it makes.
So, the DataFilePath.txt must contain a line of text which tells the DA Tool where the CSV Results Data files have been placed on your hard drive. And in the OutputPath.txt you have to state where you want the various reports to go.
In my case I have all of the results data in the folder: C:\Horses\Reports\Archive\Modified050\4 - All and therefore I have that string in my DataFilePath.txt file. I want all my output files to go into the folder C:\__Temp and so that string is in the OutputPath.txt file.
But you can choose different folders where you have put the CSV Results and where you want the reports to go. This is why you have to edit those two text files - quite simply; the DA Tool needs to know this information.
The following must be observed though:
1. The output folder must exist
2. The CSV results folder must contain ONLY the CSV results files from the Full Archives For Members and no other data.
3. The CSV results files come down in a zip file and so the data must be copied out of that zip file into the CSV results folder
Make sure that macros are enabled in Excel and then off you go.
Yes, a bit of faff setting it up the first time but once you're done that then any updates to the CSV Results will go into the same results folder (not forgetting to unzip them first!) and then you will be off making reports constantly. One you get to grips with it; it's very simple to edit the code and to make lots of reports.
Please note that if there are any other CSV files in the data folder, even if they are the daily CSV files, then the DA Tool will stop. Without exception every CSV file within the data folder must have come from the Full Archives.
Posted 6th November, 2017
A slight error with the ROI% calculations has been fixed. In cases of a filtered set of runners having one, or more, runners and no winners the ROI% was wrongly displayed as 0% whereas it should be -100%.
Posted 29th May, 2016
Twenty-two new fields added:
- BHAclassToday: What the BHA race class is for the race today
- BHAclassLast: What the BHA race class was for the previous type
- BHAclassLastType: What the BHA race class was for the last race run on the same race type
- PositionLastTime: How the horse finished last race
- Position2RunsAgo: How the horse finished two races ago
- Position3RunsAgo: ditto but three races ago...
- Position4RunsAgo: ...and four races ago
- Position5RunsAgo: go on, have a guess
- DistanceLastTime: How many furlongs was the last race run
- Distance2RunsAgo: How many furlongs the horse ran two races ago
- Distance3RunsAgo: ditto three races ago
- Distance4RunsAgo: and for four races ago
- Distance5RunsAgo: and five races ago
- GoingLastTime: the going last time
- DistanceRuns: How many times this horse ran over this distance
- DistanceWins: And how many times it's won over this distance
- GoingRuns: How many times its run on this forecasted going
- GoingWins: How many times its won on this forecasted going
Posted 20th February 2016
Twenty Five new fields added:
- JockeyAdvantage: How much ratings advantage/disadvantage this jockey has against the top rated (or 2nd rated for the top rated jockey)
- TrainerAdvantage: ditto but for the Trainer's rating advantage
- TrFormAdvantage: ditto but for the TrForm advantage
- ConnAdvantage: ditto but for the Conn advantage
- FrmAdvantage: ditto but for the Frm advantage
- LstAdvantage: ditto but for the Lst advantage
- ClsAdvantage: ditto but for the Cls advantage
- WinFAdvantage: ditto but for the WinF advantage
- SpdAdvantage: ditto but for the Spd advantage (not be confused with Speed)
- HCPAdvantage: ditto but for the HCP advantage
- FCPAdvantage: ditto but for the FCP advantage
- SpeedAdvantage: ditto but for the Speed advantage (not to be confused with Spd)
- JumpAdvantage: ditto but for the Jump advantage
- BetterClassWin: How much higher was the most recent better Class win, if negative how close was the best historical win
- BetterClassWinDaysAgo: How many days ago that this above event occured
- BetterClassWinRacesAgo: ditto but looking at the number of races ago
- BetterClassWeightWin: The same as the BetterClassWin but taking into consideration Class & Weight
- BetterClassWeightWinDaysAgo: How many days ago this above event occured
- BetterClassWeightWinRacesAgo: ditto but looking at the number of races ago
- BetterClassWinType: Similar to the BetterClassWin but looking at the same race type as today
- BetterClassWinTypeDaysAgo: How many days ago this above event occured
- BetterClassWinTypeRacesAgo: ditto but looking at the number of races ago
- BetterClassWeightWinType: Similar to the BetterClassWeightWin but looking at the same race type as today
- BetterClassWeightWinTypeDaysAgo: How many days ago this above event occured
- BetterClassWeightWinTypeRacesAgo: ditto but looking at the number of races ago
Posted 13th December, 2015
1. Three new fields added:
- Places Offered: How many places should be offered for each-way betting
- Raw Advantage: How much Raw ratings advantage/disadvantage this horse has against the top rated (or 2nd rated for the top rated horse)
- RAdj Advantage: as above but for the RAdj.
2. New feature added to the code.
If one looks at the line below which features in each of your classes one will see that there's an extra argument at the end, nStakeModifier.
oNodeHandler.AddResult asEntry(), nIndexStart, nIndexEnd, nStakeModifier
This is an optional argument which means that one doesn't have to go through each of the classes and add it. What this does is to modify the stake by a percentage for that entry (the default of the modifier is 1.00, ie no change). So if wished one can do things like:
if len(sHandicap) > 0 then
nStakeModifier = 1.5
nStakeModifier = 0.75
oNodeHandler.AddResult asEntry(), nIndexStart, nIndexEnd, nStakeModifier
Which will put a larger stake on handicaps (50% more) and less on non-handicaps (25% less) than usual. But other examples of use could include putting higher stakes on lower (or higher) priced horses.
The idea came from the The Bet Engine's banded staking facility and I thought it may be useful if we could utilise this in some way.
Posted 8th November 2015
Four new fields added:
- Speed Rating Rank: The new Speed figure in ranked format
- Last Class Drop Rank: Ranking the drop in class of each horse's last race to today's class
- Last Race Rating Rank: Simply how we ranked the horse last time.
- Rating Advantage: How much Rating advantage/disadvantage this horse has against the top rated (or 2nd rated for the top rated horse)
Posted 28th April 2015
There were a number of internal updates concerning the reporting.
The tool was also updated to cater for the new Speed Rating column in the CSV data. Version 62 of the Archived Data is required for this code to run.
Posted 28th February 2015
One request for an update as it was something of whimsy it was written and released right away.
The request was for the option of a notification for when the run finished. If the Notification checkbox is completed then the .wav file called CompletedRun.wav (which ought to be in the same folder as the DA Tool) is played. If the file isn't present then nothing will be played.
If one wishes to play an alternate .wav file then copy one of your choice into the DA Tool folder and rename it CompletedRun.wav .
The .wav file was composed by Mark Brooker, a talented musician and engineer.
There was no released version of 61f which was an internal update.
Posted 27th February 2015
There have been a number of improvements to this version of the Data Analysis Tool.
Chief amongst these updates are the following.
Export to CSV
There is now the option to have the final report presented as a CSV file rather than a text file. This is so that one may then do one's own Excel analysis on the output data without extensive editing and importing of the final report.
Excel 2007 and Earlier
This version of the Data Analysis Tool will now run on Excel 2007 and previous versions.
If one runs on one of these 'earlier' versions of Excel then there are two things to take note of. First, the PNY calculation doesn't work and, secondly, the Archie calculation is slightly different to the output run on a later version of Excel.
The first of these two issues will be investigated and fixed Soon™ whilst the second will be left as it is for the time being as the difference between the two versions is only marginal and is currently being ascribed to rounding errors in the calculations. But as things stand the second issue is of little concern.
Posted 21st November 2014
There are a number of extra columns for this release. Most of them are fixed but one is a 'floating' column as it is populated during the monthly results update.
This is the Duration which gives the recorded winning time of the race in seconds. It will follow the 'Lengths Behind Total' field which was introduced in version 60 of the data.
As before please look for this information in the same manner as we do for lengths behind.
Dim nRaceDuration as Double
nRaceDuration = Val(asEntry(GetColumnNumber(COLUMN_DURATION) - 1))
The following fields have been introduced with this version and their locations are fixed.
NI - Races since the trainer has saddled a winner
NJ - Days since the trainer has saddled a winner
NK - Trainer Form Last Ten Runs - Runs
NL - Trainer Form Last Ten Runs - Wins
NM - Trainer Form Last Ten Runs - Profit/Loss (to £1 stakes)
NN - Trainer Form Last Ten Runs - Strike Rate
NO - Trainer Form Last Ten Runs - ROI
NP - Trainer Form Last Forty Runs - Runs
NQ - Trainer Form Last Forty Runs - Wins
NR - Trainer Form Last Forty Runs - Profit/Loss (to £1 stakes)
NS - Trainer Form Last Forty Runs - Strike Rate
NT - Trainer Form Last Forty Runs - ROI
NU - 5 Year Class Differential Ranking
NV - 5 Year Weight Differential Ranking
NW - 5 Year Class & Weight Differential Ranking
NX - 5 Year Class Differential Wins Ranking
NY - 5 Year Weight Differential Wins Ranking
NZ - 5 Year Class & Weight Wins Differential Ranking
OA - 1 Year Class Differential Ranking
OB - 1 Year Weight Differential Ranking
OC - 1 Year Class & Weight Differential Ranking
OD - 1 Year Class Differential Wins Ranking
OE - 1 Year Weight Differential Wins Ranking
OF - 1 Year Class & Weight Wins Differential Ranking
Version 60 - Winning & Losing Distances Added
This version has two extra columns; Lengths Behind and Lengths Behind Total.
Note that these two columns are in a floating position; they will move between releases of the DA Tool. This is because this information is gathered after the CSV has been prepared. So in your code do not please look for this information from a column in the usual way but please look for this data via the following method:
Dim nLengthsBehind as Double, nLengthsBehindTotal as Double
nLengthsBehind = Val(asEntry(GetColumnNumber(COLUMN_LENGTHS_BEHIND) - 1))
nLengthsBehindTotal = Val(asEntry(GetColumnNumber(COLUMN_LENGTHS_BEHIND_TOTAL) - 1))
The intrinsic variables used (COLUMN_LENGTHS_BEHIND and COLUMN_LENGTHS_BEHIND_TOTAL) will be updated with each release of the DA Tool so that the code, if these variables are used, will always work correctly.
The DA Tool (see below) will be updated to version 60. For this data please use version 60, and only version 60, of the DA Tool. Any other version used may explode causing embarrasment and perhaps slight bruisings.
The Lengths Behind field indicated how far behind the horse finishing before it is placed. The Lengths Behind Total field gives the lengths behind the winner of the race.
If the horse is the winning horse then these two figures will show the distance between itself and the second placed horse but as a negative number. Thus if a horse wins with a Lengths Behind margin of -3 then it has won that race by three lengths. A horse winning a dead-heat will, of course, have a Lengths Behind of zero.
Horses which fail to finish will have a Lengths Behind value of zero.
A further update in the Archive data is that an error in the column titles in which the Dam Age PL was duplicated. The second of these should have been Dam Age SR. The error only effected the title row and not the actual data and has been corrected.
Version 59 update - changes to DataFilePath.txt file
In version 58 there has been a change to the DA Tool in which the user now can have, if he or she so wishes, to have three lines of in the DataFilePath.txt file, each being a valid path.
From version 68 of the DA Tool there is now a drop-down box on the front page of the DA Tool asking the user which set of input data files are to be used. If there is only one path given in the DataFilePath.txt file then there is no change to the operation of the tool at all and all the user will see is a drop-down menu with one item in it saying All.
If there are three lines in the DataFilePath.txt file then three options are listed in the drop-down menu and they will be given as All, Test and Live.
The idea behind this is that the user can now have three sets of the CSV data files. The first set to contain all of the data files available. The second set, which will be labelled 'Test' could have a sub-set of all of the CSV files and the third option, 'Live' could have another sub-set.
For example, in my folder which is referenced by 'All' I have all of the data files from 2009 to the present date. So running with 'All' selected will run through every race since 2009. In the folder referenced by 'Test', which will be the second line in the DataFilePath.txt file, I have all of the data from the years 2009-2012. And in the third folder, the third listed in the text file and is displayed as 'Live', I have all of the files from 2013 to the present date.
Using this set-up I am able to make a series of development tests against the Test data and then when it looks good I can then test it against the Live data. If then I like the results against that then I do a final result with All the data.