Why are there different IDs in the CSV Results?
When one looks within the CSV files, particularly the monthly Results CSV files, one can see that there are two sets of IDs. Both sets more or less mirror each other but they can be slightly different.
So, why are there two sets of, for example, the HorseID and, more importantly, what the heck is going on?
Okay, the best thing is to look at an example. The examples below are from the March 2017 Results CSV file, and the first shows two races with RaceIDs 294704 and 294705.
But within the same rows, which means the same race, we have a different set of RaceIDs.
So, as asked previously, what the heck is going on?
The first set of IDs, the ones with the substring 'Cards' in the column title, are the values in the UK Horse Racing database at the time the cards are processed. At that point we don't know whether the race is going to be run, whether the meeting is even going ahead, or whether the horse itself is running. But at the time of producing the ratings, these are our IDs for these items within our database.
When we produce the ratings each day, part of the process is to import any results which need to be added. At the same time, any cards data (i.e. entries without results) is removed from the database. A race is only added back in once we have the results data for it, which may take a few days to gather.
This can't be changed; it's the way that the Model's database was designed. The reasons why aren't pertinent to this discussion, but this is how the data is handled within the database.
It takes a day or so to get the full results data into the database because we start the ratings run before the day's racing has finished. On top of that, we have races that don't happen, we have horses which don't run and we get important overseas results coming in.
So, because a race is removed and only added back in with its results, the RaceID for the cards is nearly always going to be different to the final RaceID when we come to the results. That is why column FJ is going to be different to column QS. Trainers and courses are another matter: unless it's a new trainer or a new course, we can expect the early IDs for trainers and courses to be the same as the result IDs.
It can be seen that the data within FJ to FN are temporary values and have no long-term validity. Why, then, are they placed into the CSV file if they can cause confusion? Simply because members asked for this data, even though there could be confusion. I have to state, in my defence, that I didn't want to add the 'early' data because of the potential for confusion and because we could, and would, end up with two entries with the same unique Entry ID. However, the requests were made and this data was added.
The data that comes in the columns QS to QX is the data that's in the UKHR database here and this will be set in stone. So in our database course 561 will be Ffos Las, Jockey 10791 will be Mr Sheehan and so on. This is the data that should be used if one imports the Results CSV into one's database.
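For anyone importing the file with a script rather than a spreadsheet, here is a minimal Python sketch of picking out the two ID ranges by position. It assumes that the spreadsheet letters quoted above (FJ to FN, QS to QX) map directly onto CSV column positions, with A as the first column. The function names and the file name in the usage comment are hypothetical and are not part of the UKHR files.

```python
import csv


def column_index(label):
    """Convert a spreadsheet column label such as 'QS' to a zero-based index."""
    if not label:
        raise ValueError("empty column label")
    index = 0
    for ch in label.upper():
        if not "A" <= ch <= "Z":
            raise ValueError(f"invalid column label: {label!r}")
        # Column labels are bijective base 26: A=1 ... Z=26, AA=27, ...
        index = index * 26 + (ord(ch) - ord("A") + 1)
    return index - 1


def slice_columns(row, first, last):
    """Return the cells of one CSV row between two column labels, inclusive."""
    return row[column_index(first):column_index(last) + 1]


def permanent_ids(row):
    """Result IDs (columns QS to QX), the ones to use when importing."""
    return slice_columns(row, "QS", "QX")


def card_ids(row):
    """Temporary 'Cards' IDs (columns FJ to FN), with no long-term validity."""
    return slice_columns(row, "FJ", "FN")


def read_permanent_ids(lines):
    """Yield the QS..QX cells for every row, header row included."""
    for row in csv.reader(lines):
        yield permanent_ids(row)


# Hypothetical usage; 'results.csv' stands in for a monthly Results CSV file:
# with open("results.csv", newline="") as f:
#     for ids in read_permanent_ids(f):
#         print(ids)
```

Selecting by position avoids guessing at the exact column titles. If the layout of the file differs from the assumption above, the labels passed to slice_columns would need adjusting.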
I hope that this clarifies the situation and explains why there are two sets of IDs. If one wishes to use our ID values then please use the data within columns QS to QX which, of course, will only be found in the Results CSV files.