UK Horse Racing's Data Analyst Tool

Programming Example 1

Introduction

In this example we're going to create a new class which investigates the Top Rated horses jumping over the hurdles. This isn't a difficult class to create and we shall do it here.

Making a New Class

Open up the DA Tool and then open the IDE (press Alt-F11).

The next thing would be to create a new class but the easiest thing would be to copy one. So find a class that's already present and then select the whole code within that class and finally type Control c to save the code to the clipboard. Then in the Project Explorer of the IDE, right click and select Insert > Class Module.

You have now created a new class called Class1.

Because you still have the code in your clipboard the best thing is to dump it here now. Select the whole of the code in this new class module (it may have one line saying "Option Explicit" or the new class may contain nothing) and then paste in your code.

The next thing is to give this module a name. Since it's going to be the Top Rated Hurdles let's give it the name 'clsTopRatedHurdles' in where it says 'Class1' in the Proprties window below the Project view. Press Control s and now you have saved a new class.

Changing the Class' Title

But it still does the same as the previous class so let's change it to do what we wish.

The first thing to do is to change the name of the output file and the description of the code because you want to know in five years' time what this code does and also this name is reflected in the output file, so each time you edit a class make a change to this line. Anyway, near to the top of the class is the line which starts like this:

Const CLASS_TITLE As String =

After the equals sign is string in double quotes. Change the contents so that it now reads:

Const CLASS_TITLE As String = "Top-1 Hurdles"

Save again with Control S and then we can move onto the main step.

Building the Analysis Section

There's a couple of ways in which to do this. The first way, is what I do all of the time is to edit the class module I have in front of me by adding in new assignments, new variables and what have we. I never ever create a new analysis routine from scratch.

So this is what we will do, we will create a new analysis routine from scratch.

No, I am not being awkward for the sake of it. It is just for this example only so that we can see only what we need to see and not have any of the clutter which are in the other class routines get in our way.

So what you do is to find the whole chunk of code which starts off with the line:

Public Sub RaceAnalysis

and select that line all the way down to, and including the line which says

End Sub

Then copy and paste in this code into where the last routine came from:

Public Sub RaceAnalysis(nIndexStart As Long, nIndexEnd As Long, nMaxCols As Long)

 Dim nIndex As Long        ' The index into the row block, each block is a race
 Dim asEntry() As String   ' The row being looked at
  
  Dim sCSVPosition As String, nCSVPositionColumnNumber As Long
  
 
  ' ****  This run's declarations

  Dim nRanking As Long
  Dim sRaceType As String
  
  Dim bIsSelection As Boolean
  
  
  For nIndex = nIndexStart To nIndexEnd
    asEntry = Split(asRaces(nIndex), ",")
  
    nCSVPositionColumnNumber = GetColumnNumber(COLUMN_RESULT) - 1
    sCSVPosition = asEntry(nCSVPositionColumnNumber)
    
    If sCSVPosition <> "N" And sCSVPosition <> "A" And sCSVPosition <> "V" Then
  
  
      ' *********  Place the selection criteria after here
      
      nRanking = Val(asEntry(GetColumnNumber("AZ") - 1))
      sRaceType = Trim$("" & asEntry(GetColumnNumber("C") - 1))
     

      bIsSelection = False
      
      If nRanking = 1 Then
        If sRaceType = "Hurdle" Then
          bIsSelection = True
        End If
      End If
        

      If bIsSelection Then
        oNodeHandler.AddResult asEntry(), nIndexStart, nIndexEnd
      End If
  
      ' *********  Place the selection criteria before here
    
    End If  ' Check for non-runner or abandoned/void race
  Next nIndex

End Sub

Then save the file in the customary way.

Explanation

Warning: This could get technical.

This subroutine, RaceAnalysis() is the only one in that module that you care about. And then not all of it too. The idea is that even though the DA Tool contains many thousands of lines of code you, the user, only have to deal with a small subset of it.

After the first comment (this is shown up in green) there are three Dim statements, this declares three variables that we’re using. There are other such declarations but they are out of the way at the top and you don’t bother with them. However, for this run we have created three variables.

The first is nRanking which is declared to be a Long. The Dim statement, by the way, is the act of declaration.

The Long is the variable’s Type, as in type of value it holds. This is a form of numeric variable (i.e. only holds numbers) but being Long it only holds integer numbers. If we wanted to hold real numbers, i.e. numbers with a decimal point in, then we’d use Double as the variable type. But for the ranking in the ratings it’s only going to be a whole number so Long is perfect.

The second is sRaceType which I have declared to be String. This holds a string of characters such as, for example, “Hello, World”. Note that I have decided on the names of the variables; they could be most anything.

A word of advice: it’s always the best to make variable names meaningful. The purpose being that I can look at this code in ten years’ time and understand that sRaceType is a variable holding a string which I expect the race type to be in. Furthermore, I can pass this code to someone else and they too will know what sort of data that variable contains. Also note the first letter of these two variables is an ‘n’ or an ‘s’.

This is my cut down version of Hungarian Notation (as it’s called) which I put letters at the front of a variable so I know what type it is. ‘n’ is numeric and ‘s’ is string. Yes, ideally, I should have used ‘l’ rather than ‘n’ but there’s a long and historical reason why I don't.

Note: There is a current trend not to use Hungarian Notation. But I believe that those who choose not to are fools and heretics and, rightly, should be burned at the stake. But that's just a personal opinion.

We come then to the last variable which is a Boolean variable. Booleans are either True or False. I have named this one bIsSelection - the b is for Boolean and then the IsSelection is like a question to myself. If the value is True then it is a selection and if it’s False the it is not a selection.

It's always best to name Boolean variables in such a way that we know what it means to be True or False. For example, if I see a variable called bIsRaining and it's False then I can surmise that it's not raining. The same with bIsFullOfEels; if it is True then we know that my hovercraft is full of them. If I see that bIsNotDrunk is True then I will go and do something about it.

This is a little style note which helps with naming of Booleans and it helps in the future when one returns to the code.

That’s the declarations over. The next thing to do is to populate the two variables. This happens after the second comment and what happens here is that this section of code (the stuff between the last two sets of green comments) is executed for each row in each CSV file examined.

The first of these statements grabs the data from the CSV file in what would be cell ‘AZ’ (assuming we opened it up in Excel) and shoved the value, as a numeric, into the variable nRanking. The second assignment grabs the data from what would be cell ‘C’ and then places the value, as a string, into the variable sRaceType.

Now, a few things here. I say “what would be cell….” because I don’t open the CSV files as Excel files. As you may know, or should know, a CSV is simply a text file which is easily read by Excel. I have to emphasise; a CSV file is NOT a spreadsheet file at all and should always be considered to be what it really it is: a text file.

So, to explain further, the code grabs a row of data from the CSV file, line by line. The line of data is large row with each field separated by commas (open it in a text editor such as Notepad and have a look) hence the meaning of CSV: Comma Seperated Values.

Technical Aside:
What happens in the code here is that the data is brought in as one great chunk of text outside of this subroutine that you can see here and put into a large array containing string variables. Each element in the array corresponds to one of the fields in the CSV file. So, for example, the third field in the CSV file is the Race Type and so if we want to access that data then we need the third element of the array.

Now, to make things a little more confusing when we’re dealing with arrays we have to be aware that they start with the first element being numbereed zero, the second element is number one, the third element is number two and so on. So, to get the Race Type we will want the third field which means that we want the second element of the array (the first field is in the zeroth element).

Now because we are used to working with the CSV file and when we’re coding the DA Tool we can use this page to tell us what’s in which field.

We can see that the Ranking of the horse (top rated is one, second rated is two and so on) is in Field ‘AZ’. So we will want to get the AZth field into our variable. Now, I don’t know about you but I don’t carry the alphabet in my head and I can’t tell you what how many fields in is, say, field ‘GX’ (this is, by the way, how well the horse ran last time) so we have a function that converts these to a number.

This is why we have this in the code:
GetColumnNumber("AZ")

This function gets the string “AZ” and returns the value 52, this is something written so that we don't have to do the work ourselves and it's easier to debug the code if there's any problems. So in this instance, field AZ is the fifty-second column in the spreadsheet.

However, do you remember how I said that it was stored in an array which always starts at zero? So then to get this data we need to get the data from element number fifty-one.

The array the data is stored in is called asEntry(). The notation at the start reminds me that it’s an array (the ‘a’) and that the array is full of string variables (the ‘s’).

So to get the Ratings Ranking we need to get the data from the AZ cell and this is written as:
asEntry(GetColumnNumber("AZ") - 1)

Now we’re going to want to put this into a variable, nRanking, whose type is Long. But we have a string value (the array contains strings remember) and so we have to convert it to a numeric value so we wrap that up with a Val() function that turns the string into a numeric value and then we can place it into the variable as so:
nRanking = Val(asEntry(GetColumnNumber("AZ") - 1))

This is a long explanation as to why things are done this way. But if you look through all of the DA Tool code then you will see that all of the numerical assignments from the CSV file are done this way. If they aren’t then they ought to be.

Something similar occurs with the sRaceType assignment. This time we’re getting a string value out and we don’t need to convert it to a numeric as it’s going into a string variable and it’s a string anyway.
sRaceType = Trim$("" & asEntry(GetColumnNumber("C") - 1))

We could have just put the following instead:
sRaceType = asEntry(GetColumnNumber("C") - 1)

But what this does is a little amount of defensive programming to stop things going wrong. Most programming you will find is preparing for errors and as a programmer one has to constantly think “What can possibly go wrong here?” which makes programmers generally a pessimistic bunch at heart. Anyway, this code that we’re using is putting a zero length string at the front of the value brought from the CSV file. This is because anything appended to the back of a string becomes a string, so if the data comes out a numeric (it shouldn’t) or a special form of null string (it certainly shouldn’t) then whatever is presented becomes a string and we know then that nothing is going to blow up. Then the whole lot is trimmed, which means that any spaces at the start and end of the string are removed. What then is left is what we want: the data value and it’s then placed, as a string, into the sRaceType variable.

The next statement is where we assign the value of False to bIsSelection. That’s easy enough to explain: at this point we don’t know if this is a valid data row or not. These tests come in the next section.

This test section is straightforward to understand:

If nRanking = 1 Then
   If sRaceType = "Hurdle" Then
      bIsSelection = True
   End If
End If

The first line here asks if the nRanking value is one (i.e. top rated). If it is then we go onto the next line. If it isn’t then we skip the statements within the If…End If phrase. If the value of nRanking is 1 then we’re immediately at another If statement; this time asking if the value of the sRaceType variable is “Hurdle”. If it is then the value of True is assigned to the bIsSelection variable.

Note that the only way that the Boolean variable, bIsSelection, can be set to true is if the Race Type is a Hurdle race and the horse is Top Rated.

Then the last part.

If bIsSelection Then
   oNodeHandler.AddResult asEntry(), nIndexStart, nIndexEnd
End If

If the value of bIsSelection is True then this line of code put the data into DA Tool’s data structure for processing. At this stage there is no need to worry about how this works but for now just calling this statement is enough to say that this horse in this race passes the tests required and then include it for the processing run.

And that's it. Once you get the hang of it, it's really simple.

The next thing to do is to go through the other classes in the DA Tool and see what happens there. If you know the rules of a system, such as one of the Missions, then have a look to see how they are coded up in the DA Tool.