In College Football Statistics, I created my Team and Game objects, filled the Teams collection, and wrote a FindByName property for Teams. Now the fun part; filling the Game objects. This is a long one and I’m not going to sit here and tell you it doesn’t need some refactoring. But it works for now. I added line numbers so I can identify the lines as I explain the logic.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 |
Public Sub FillGames() Dim clsTeam As CTeam Dim xmlReq As MSXML2.XMLHTTP Dim htmlDoc As HTMLDocument Dim hTbl As HTMLTable Dim hRow As HTMLTableRow Dim clsGame As CGame Dim bIsAway As Boolean Dim clsOpponent As CTeam Dim dtGame As Date Const sTBLCLASS As String = "game - log" 10 For Each clsTeam In gclsTeams 20 Set xmlReq = New MSXML2.XMLHTTP 30 Set htmlDoc = New HTMLDocument 40 xmlReq.Open "GET", clsTeam.OffenseUrl, False 50 xmlReq.send 60 htmlDoc.body.innerHTML = xmlReq.responseText 70 For Each hTbl In htmlDoc.all.tags("TABLE") 80 If hTbl.className = sTBLCLASS Then 90 For Each hRow In hTbl.Rows 100 If hRow.RowIndex > 0 Then 110 If hRow.Cells(0).className = "date" Then 120 If hRow.Cells(1).Children.Length > 0 Then 130 dtGame = DateValue(hRow.Cells(0).innerText) 140 Set clsOpponent = gclsTeams.TeamByName(hRow.Cells(1).innerText) 150 Set clsGame = gclsGames.FindGameByDateAndTeams(dtGame, clsTeam.TeamName, clsOpponent.TeamName) 160 bIsAway = IsAway(hRow) 170 If clsGame Is Nothing Then 180 Set clsGame = New CGame 190 With clsGame 200 .GameDate = DateValue(hRow.Cells(0).innerText) 210 .SetScore hRow.Cells(3).innerText, bIsAway 220 If bIsAway Then 230 Set .HomeTeam = clsOpponent 240 Set .AwayTeam = clsTeam 250 Else 260 Set .HomeTeam = clsTeam 270 Set .AwayTeam = clsOpponent 280 End If 290 End With 300 Me.Add clsGame 310 clsTeam.Games.Add clsGame 320 clsOpponent.Games.Add clsGame 330 End If 340 With clsGame 350 If bIsAway Then 360 .AwayRushYards = hRow.Cells(4).innerText 370 .AwayPassYards = hRow.Cells(5).innerText 380 .AwayPlays = hRow.Cells(6).innerText 390 Else 400 .HomeRushYards = hRow.Cells(4).innerText 410 .HomePassYards = hRow.Cells(5).innerText 420 .HomePlays = hRow.Cells(6).innerText 430 End If 440 End With 450 bIsAway = False 460 End If 470 End If 480 End If 490 Next hRow 500 End If 510 Next hTbl 520 xmlReq.Open "GET", clsTeam.DefenseUrl, False 530 xmlReq.send 540 htmlDoc.body.innerHTML = xmlReq.responseText 550 For Each hTbl In htmlDoc.all.tags("TABLE") 560 If hTbl.className = sTBLCLASS Then 570 For Each hRow In hTbl.Rows 580 If hRow.RowIndex > 0 Then 590 If hRow.Cells(0).className = "date" Then 600 If hRow.Cells(1).Children.Length > 0 Then 610 dtGame = DateValue(hRow.Cells(0).innerText) 620 Set clsOpponent = gclsTeams.TeamByName(hRow.Cells(1).innerText) 630 Set clsGame = gclsGames.FindGameByDateAndTeams(dtGame, clsTeam.TeamName, clsOpponent.TeamName) 640 bIsAway = IsAway(hRow) 650 With clsGame 660 If bIsAway Then 670 .HomeRushYards = hRow.Cells(4).innerText 680 .HomePassYards = hRow.Cells(5).innerText 690 .HomePlays = hRow.Cells(6).innerText 700 Else 710 .AwayRushYards = hRow.Cells(4).innerText 720 .AwayPassYards = hRow.Cells(5).innerText 730 .AwayPlays = hRow.Cells(6).innerText 740 End If 750 End With 760 bIsAway = False 770 End If 780 End If 790 End If 800 Next hRow 810 End If 820 Next hTbl 830 Next clsTeam End Sub |
The main block of the procedure is looping through every team in the CTeams collection class. Within that loop, there are two main blocks, offense and defense. Offense starts on line 20 and defense on line 520. For this code to work, I need a reference to Microsoft XML, v6.0 and Microsoft HTML Object Library. Generally, I go get a web page, loop through a certain table, fill a CGame class, and add it to CGames plus to the collection classes within the two teams involved.
The web page fetching starts on line 40. I open a request passing a URL, send the request, then create an HTML document based on the response.
Starting on line 70, I loop through every table in the HTML document and look for the one that has a class of “game-log”. Then I loop through each row in that table (90). I skip the first row (100), any rows that don’t have a date in the first cell (110), and any row where the team name isn’t a hyperlink (120). Teams without hyperlinks are D1AA teams and I don’t care about those games.
Once I find a row that’s relevant, I grab the game date (130), and a CTeam object called clsOpponet based on the name in the second cell (140). Now I have clsTeam and clsOpponent representing the two teams in the game and a date. With those three pieces of information, I can determine if the game already exists (150). That property simply loops through all the games and checks the teams and the date. If the game exists, I don’t want to create a new one. For instance, if I’ve already processed Auburn then the Auburn v. Clemson game already exists and Auburns stats are recorded in there. When I get around to processing Clemson, I don’t want to create a new game, just fill in Clemson’s stats.
To differentiate the teams, I identify the home team and the away team. I need to determine if I’m processing the home or away team on this pass (160). This got a little funky for neutral site games because the web page doesn’t identify who’s home and who’s away. I have this in MUtlities
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 |
Public Function IsAway(hRow As HTMLTableRow) As Boolean Dim bReturn As Boolean 10 bReturn = Left$(hRow.Cells(1).innerText, 1) = "@" 20 If Left$(hRow.Cells(1).innerText, 1) = " + " Then 30 If Left$(hRow.Cells(3).innerText, 1) = "L" Then 40 bReturn = True 50 Else 60 bReturn = False 70 End If 80 End If 90 IsAway = bReturn End Function |
If there’s an @ sign, it’s an away game. For neutral site games, identfied with a “+”, I call the loser the away team regardless of who actually was. Ugh, that code needs to be refactored to bReturn = Left$(hRow.Cells(3).innerText, 1) = "L"
. I’ll get on that.
Back to my fill code. Starting on 170, I create a new game if doesn’t already exist and set the date, score, and the home and away teams. I add the game to the CGames collection class, but also to each teams games collection (310 and 320). Starting in 340, I fill in the home or away stats.
In the second major block of code, I pretty much do the same thing for the defensive stats. There’s not much different here and that screams for a rewrite. I should have these two major blocks in another procedure and just pass the differences in. Like I said, it’s a work in process. I like to get the code working and then refactor where it makes sense. Apparently not before I post to the blog though.
At this point I have a collection of CTeam objects and a collection of CGame objects. My CTeam objects also have their own collection games. Let’s see if it works.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 |
Sub TestTeamsAndGames() Dim clsTeam As CTeam Dim clsGame As CGame Initialize For Each clsTeam In gclsTeams For Each clsGame In clsTeam.Games With clsGame Debug.Print .GameDate, .Matchup, .HomeTotalYards, .AwayTotalYards End With Next clsGame Next clsTeam End Sub |
I check a few and it all looks good. Some of the new properties I wrote for this test
1 2 3 4 5 6 7 8 9 10 |
Public Property Get HomeTotalYards() As Double HomeTotalYards = Me.HomeRushYards + Me.HomePassYards End Property Public Property Get Matchup() As String Matchup = Me.AwayTeam.TeamName & " @ " & Me.HomeTeam.TeamName End Property |
It wasn’t all peaches and beach balls though. I ran into a few problems and had to create this little sub to check out one team at a time.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 |
Sub TestOneTeam() Dim clsTeam As CTeam Dim clsGame As CGame Initialize Set clsTeam = gclsTeams.TeamByName("Arkansas State") Debug.Print clsTeam.TeamName, clsTeam.TotalOffenseYards For Each clsGame In clsTeam.Games With clsGame Debug.Print .GameDate, .HomeTeam.TeamName, .HomeTotalYards, .AwayTeam.TeamName, .AwayTotalYards End With Next clsGame End Sub |
That way I could check each game of a particular team for accuracy.
I feel like I have all the information I need. Now I need to actually do something with it. How will I exclude games? I prefer to do it automatically, but I’ll need a pretty fancy algorithm. CBS ranks all 120 D1A teams so I could use that ranking system and exclude teams below a certain number. Another thought I had is to weight the stats against the opponent based on the opponents ranking. For instance, I could divide the 120 teams into thirds. The top third would be weighted 100%. The middle third 50%. The bottom third 0%. I’ll probably need to do it a few different ways to see what I like.
Thanks Dick. Because of this, I will start following college football.
Just kidding! There’s a serious bug in that code, but I’m not going to tell you what it is. I hope you don’t have any big money riding on this.