UK Horse Racing Data Analyst Tool, version 2.12IntroductionThis updated version of the Data Analyst tool is for those who wish to explore the CSV Archive results with the aid of our tool. This version of the tool, which is presented as an Excel spreadsheet, requires the user to be able to write simple VBA statements. This isn't as hard as it sounds as the structure of the tool has been set up and all the user has to do is to change some statements which works out the requirements. Even if the user isn't au-fait with VBA it won't be beyond anyone with average computing abilities to use this tool. If anyone wishes to learn VBA then the internet is full of good sites explaining elementary Excel VBA. Version 2.13This version was released on the 14th April 2012. There are countless updates in this version, as would be expected of something which has been worked on continually for the last few months. One of the major inclusions has been the ability to now work on the Class and Weight Differentials. However, this will require the download of the new format (version 41, or higher) of the archived historical data to make it run. The results files have been improved significantly and these will continue to do so. Data Analyst DownloadsDownload the Excel spreadsheet from here
Version 2.13 Running The Data Analyst ToolIn the .zip file containing the two text files. One is The data files are the CSV Result files which we post each month and these should be unzipped and altogether within the same folder. There are two check boxes. Clearing the 'Back' checkbox will make a laying report and clicking the 'Place Analysis' checkbox will have a second sweep through the run to look at the place backing. The ReportWhen the analysis has finished, three files are created in the Output Folder, the ones definied in OutputPath.txt. The largest file will be a large CSV file containing all the selections which fit the selection criteria. the smaller CSV file is a breakdown, by month, of the profit and loss per month; the idea of this is that users can see how smooth or volatile the system is and/or which monts are generally worth exploring. The third file is the all important text file and this is what most people will want to look for in the report. The following is an extract of a run looking at the top rated horse to value which has run within Great Britain..
The left hand column will show something like WIN or PLACE which helps to identify what sort of bet it is. Also within the report there will be rows for, for example, Hurdle Win so that one can look into different race types. The second column is rather straight-forward; All means all bets, All Value is everything to value and, overall, these are self explanatory. The next five columns are the regular Runs, Wins, Profit, Strike Rate and Return On Investment. Note that in this report all winnings include a 5% Betfair tax. All results are taken from the Betfair SP price, where applicable, and anything which don't have a Betfair SP are assumed to be 20% higher than industry SP. The next two columns are soemthing which I have designed which helps to identify systems which are smoother than others. Imagine you have a £100 bank and each bet you place 1% of the value of the bank so the first stake will be £1. The last column will tell you what the final state of the bank will be at (again after 5% Betfair tax) at the end of the period. There is one issue here; due to compounding issues some of the banks would exceed the US National Debt so this model caps all stakes at £100. This is because it is assumed that this amount can be staked on any race. The figure in the column before this, LDA%, tells us what is the largest percentage drop of the account at any point in the history of this system. So, in this example the largest profit drop was 32.5%. This is to give an idea on how volatile a system is. Report UpdatesThe report is being constantly updated all the time to add new features. For example if a run was engineered so that there were to be only Chases to be examined then only Chases would appear in the report to save time and space. Since version 2.12 two more columns have been added. These are the largest Draw-Down to level stakes. The ProgrammingIntroductionThis version of the Data Analyst tool requires the user to get his hands slightly dirty and to write or update some VBA. The code has been designed so that this isn't too traumatic so an elementary knowledge of VBA is required. If one can operate a spreadsheet then this won't be beyond the user at all. Opening the IDEStart the spreadsheet and then open the Integrated Development Environment (the code editing area known as the IDE) by pressing alt-F11. Looking Aroundin the top left hand corner of the IDE the object tree is visible and it shows all the classes in the spreadsheet.
In the Classes section all the classes which begin with the letters 'cls' are all analysis systems. The user will be encouraged to write his own or to edit any of the existing classes. It is expected that this list of classes will extend over time. But which class is run? Well if the Modules sub-tree is expanded and then the modLink module is opened and then the following code can be seen.
Public Sub CreateClassLink()
Everything in green is commented out so we can see that there's one line of code in this routine. Which is as it ought to be because there is only supposed to be one line here - that which links the relevant test class into the tool. So, in this example we can see that the last run called up the class which clearly did some work on the Raw Underscores. If the user creates a new class then it has to be linked in here else it won't run. The ClassesAs mentioned earlier all of the classes starting with 'cls' are the various system tests. These are what the users should be creating and editing. There are six other classes which are system classes and are used for the processing; have a look, by all means, in there but please don't edit anything. These six start with the letters 'zzz' just so that they are placed at the end of the list. Two pairs of these are concerned with the linked lists of the Courses and Goings and the last two are for the main reporting and calculations. If one of the classes is opened and the routine RaceAnalysis() is located then there's a section in here which is between the following two comments. This is an example of one such test routine:
' ********* Place the selection criteria after here
The first section is concerned with grabbing all sorts of data from the CSV files and shoving them into various variables (please, please make sure that good variable naming is used - if you're going to use a variable called nTJCProfit then please ensure that it contains the profit/loss for the Trainer/Jockey/Course combination and nothing else). The next section is a pile of If statements. If the horse passes the requirements then the oNodeHandler.AddResult() routine is called. So in this above example we're looking for all horses which pass the following requirements:
Now that you can see the various tests looking above at the first section one can see how the data gets into the variables. Note that how the Horse Ranking (nRanking) is derived from column AZ from the CSV files. The distance of the race is taken from column G and so on. And that is basically it. It's that simple. oNodeHandler ClassAt the core of all of these If statements is this routine within the oNodeHandler class. What happens here is if a horse passes all of the requrements then the detaila are passed to this .AddResult() function and then the horse's details are then added to a linked list. Then when everything has been added to the linked list this is then processed hundreds of times to make the report. The user need not worry about this; the code looks after this but as long as the user knows all this happens after the .AddResult() function is called then that's all that's needed. Consider it, if you will, as a plug hole where data goes to be processed. Useful FunctionsThere are a few useful functions written and placed within the module ModFunctions which can be called. We have seen a couple used in the above example but the functions which can be used to help the data analysis include: IsValue()This returns True if the price of the run was value or False if it doesn't. It is called as follows: IsInPriceRange()This returns True if the price of the run was between the two numberic values or False if it wasn't. It is called as follows: PoundsFromTopRated()This returns the number of pounds which this horse was rated behind the top rated horse in the race. If this is the top rated horse then zero is returned. It is called as follows: IsTopJumpRated()This function returns True or False. It returns True if this horse is the Top Rated Jump horse in the race (Chases only). It is called as follows: CountJumpRatedHorses()This function returns the number of horses which have a certain jump rating (the last parameter). This applies to Chases only. It is called as follows: GetRaceGender()This function returns the gender of horses competing in this race. It is called as follows: CountRaceAlarms()This function returns the number of any given alarm in the race. For example you may wish to search the race and cound the number of "p" alarms. It is called as follows: IsInRangeAll()This function returns True or False depending on whether all of the Columns (which should be numeric!) fall between the maximum and minimum values. It is called as follows: InRangeCount()This function returns the number of Columns (which should be numeric!) fall between the maximum and minimum values. It is called as follows: |