TSR Collaboration

Jan 2, 2022 · 7 min

The Algorithm: Reevaluating Our D1 Freshman Class Rankings (Men)

The release of our men's and women's D1 Top-10 Freshman Class Rankings is now complete.

However, our analysis of those rookie classes is not.

TSR writers Sam Ivanecky and Maura Beattie were tasked with collecting results for each current NCAA freshman who is considered to be a high school graduate from the Class of 2021.

The only criterion they had to meet was this: If a team's freshman class looks competitive, even in the slightest, add them to our list.

Once all of the data was compiled, we went to work, breaking down each freshman class and attempting to figure out which teams had a top-10 worthy group of rookies.

But while we were attempting to organize that data into a meaningful hierarchy of talent, Sam opted for a more direct route. The data science major built a model that evaluated every freshman class we had recorded and then ranked them.

The results? Very realistic and fairly on par with our expectations.

Of course, there were still a few surprises.

Here are the top-20 men's D1 freshman classes that Sam Ivanecky's algorithm produced...

20. Villanova

19. UCLA

18. Wisconsin

17. BYU*

16. NC State

15. Ole Miss

14. Colorado

13. Boise State

12. Georgetown

11. Northern Arizona

10. Alabama

9. Furman

8. Virginia Tech

7. Oklahoma State

6. Florida

5. North Carolina

4. Oregon

3. Arkansas

2. Stanford

1. Virginia

Now, you probably have a plethora of questions as to why our rankings differ from the model. But before we break down those differences, I had TSR data specialist Sam Ivanecky give us a rundown of how the model works.

Take it away, Sam...


Why analytics?

Analytics tends to be a “black box” term, but the idea is simply to use data to drive decisions. Distance running is a fairly objective sport -- the lowest time wins.

So why isn’t analytics used more in the sport? There’s no great reason why, but it certainly should be. With that in mind, we used an analytics approach to help develop recruiting rankings for the high school Class of 2021.

This brings an objective approach to an objective sport.

What data was used?

The TSR staff gathered data for recruits across all of Division One. Primarily, we looked at high school results from the 800 meters to the 5000 meters, across both track and cross country. We did not include collegiate results from any runners who competed in cross country this fall, although those results were discussed in our actual rankings.

How does the “model” work?

The term "model" is used in quotes because it is a little different than what many would consider a traditional model. It works as follows:

Step #1

The 800 meters, 1600 meters, 3200 meters and 5000 meters are all considered in this model. Since contested distances vary a bit by country and state, times were converted to standard distance equivalents.

For example, the 1600 meter, 1500 meter and mile times were shifted to their 1600 equivalent. The same was done for the 3000 meters and the two-mile, converting marks to their 3200 meter equivalent. For the 5000 meters, we converted all track marks to a 5k cross country equivalent by adding approximately 15 to 20 seconds (it's not a perfect science).
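To make those conversions concrete, here is a minimal sketch in Python. The exact scaling factors are our own assumptions for illustration -- the description above only requires that 1500 meter and mile marks map to a 1600 equivalent, 3000 meter and two-mile marks map to a 3200 equivalent, and track 5000 marks gain roughly 15 to 20 seconds to approximate cross country.

```python
# A minimal sketch of Step #1's distance normalization.
# The even-pace scaling factors below are assumptions for illustration only.

def to_1600_equivalent(seconds: float, event: str) -> float:
    """Convert a 1500m or mile mark to a rough 1600m equivalent."""
    if event == "1500":
        return seconds * (1600 / 1500)      # even-pace scaling (assumption)
    if event == "mile":
        return seconds * (1600 / 1609.34)   # a mile is 1609.34 meters
    return seconds                          # already a 1600m mark

def to_3200_equivalent(seconds: float, event: str) -> float:
    """Convert a 3000m or two-mile mark to a rough 3200m equivalent."""
    if event == "3000":
        return seconds * (3200 / 3000)
    if event == "2mile":
        return seconds * (3200 / 3218.69)
    return seconds

def to_xc_5k_equivalent(track_5000_seconds: float, penalty: float = 17.5) -> float:
    """Approximate a cross country 5k by adding ~15-20 seconds to a track
    5000m mark, per the description above (the midpoint is used here)."""
    return track_5000_seconds + penalty
```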

Step #2

At this point, each event was broken out and athletes were ranked from fastest to slowest. Each runner was given a percentile score. For example, a runner in the 97th percentile (faster than 97% of other runners) would receive a score of 0.97 for the event.
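A rough sketch of that scoring step might look like this. How ties are handled is an assumption here; the only requirement from the description above is that faster runners earn scores closer to 1.0.

```python
# A sketch of Step #2: percentile scoring within a single event.
# `marks` maps athlete names to converted times in seconds; lower is faster.

def percentile_scores(marks: dict[str, float]) -> dict[str, float]:
    """Score each runner by the fraction of the field they are faster than."""
    n = len(marks)
    # Sort slowest-to-fastest so a runner's rank counts how many they beat.
    ordered = sorted(marks.items(), key=lambda kv: kv[1], reverse=True)
    return {name: rank / n for rank, (name, _) in enumerate(ordered)}

# In a three-runner field the fastest scores 2/3; across thousands of
# recruits, the top marks approach the 0.97-style scores described above.
print(percentile_scores({"A": 255.0, "B": 248.2, "C": 262.9}))
# {'C': 0.0, 'A': 0.333..., 'B': 0.666...}
```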

Step #3

After scoring all athletes, three metrics were calculated for each runner (a short sketch follows this list). We calculated their...

  1. Best Event Score: The score for the athlete's best event.

  2. Total Score: The total of all scores that an athlete has.

  3. Average Score: The average score across all events that an athlete has.
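Here is a minimal sketch of those three per-athlete metrics, assuming each athlete's event scores live in a dictionary keyed by event:

```python
# A sketch of Step #3: per-athlete metrics built from the event
# percentile scores computed above.

def athlete_metrics(event_scores: dict[str, float]) -> dict[str, float]:
    values = list(event_scores.values())
    return {
        "best_event": max(values),             # 1. Best Event Score
        "total": sum(values),                  # 2. Total Score
        "average": sum(values) / len(values),  # 3. Average Score
    }

# e.g. a recruit with percentile scores in two events:
print(athlete_metrics({"1600": 0.97, "3200": 0.91}))
# {'best_event': 0.97, 'total': 1.88, 'average': 0.94} (up to float rounding)
```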

Step #4

Data for each team was then aggregated into three metrics:

  1. Average Best Event: The average best event score for all recruits.

  2. Total Score: The total of all recruit scores.

  3. Average Recruit Score: The average of the recruit scores.

Step #5

The final team score was created by weighting the team's Average Best Event score at 40%, the Average Recruit Score at 50%, and the Total Score at 10%.

In the output you saw earlier, we opted to evaluate only up to the top five recruits from each freshman class. We did this to keep less accomplished recruits from diluting the scores of their top rookie teammates when aggregating, as sketched below.
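Putting Steps #4 and #5 together, a team-level sketch might look like the following. Note that picking the top five recruits by their average score is our assumption; the steps above do not specify which metric defines the cutoff.

```python
# A sketch of Steps #4 and #5: aggregate per-athlete metrics into a
# team score. `recruits` is a list of athlete_metrics() results.

def team_score(recruits: list[dict[str, float]]) -> float:
    # Keep only the top five recruits (the cutoff metric is an assumption).
    top5 = sorted(recruits, key=lambda m: m["average"], reverse=True)[:5]
    avg_best_event = sum(m["best_event"] for m in top5) / len(top5)
    total_score = sum(m["total"] for m in top5)
    avg_recruit = sum(m["average"] for m in top5) / len(top5)
    # Step #5 weighting: 40% / 50% / 10%.
    return 0.4 * avg_best_event + 0.5 * avg_recruit + 0.1 * total_score
```

The top-five cutoff mostly protects the two average-based terms: without it, a long tail of slower recruits would drag down a class built around a few stars.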

What are the pros and cons of this system?

Pros:

  • Provides an objective ranking system.

  • Easily tweakable scoring (the metric weights can be adjusted).

  • Removes any implicit bias about a team or recruit.

Cons:

  • Limited data. Having data for every single recruit would provide a better database.

  • No historic system to compare against.

  • No knowledge of times relative to collegiate performances. For example, Keely Small of Oregon ran a 2:00 800m, which is elite in the NCAA, but that mark loses some value since the model does not know it. It only knows she’s the best of any recruit.

How does this improve rankings?

We already mentioned the removal of bias for an objective ranking, but the model also gives us a baseline across a huge number of data points. Obviously, our TSR staff is able to shift teams regardless of what the analytics show, but we now have that baseline to work from.

Additionally, it potentially creates a more expansive future ranking system. You can see in the graphs that more than just the top-10 teams were ranked via this system, despite the fact that we only do write-ups on the top-10.

A key reminder when using analytics is that they can be difficult to get comfortable with at first. People often like analytics only when they support a prior bias or belief; when they tell us otherwise, we tend to second-guess them. This is exactly why analytics are important: they remove those biases and beliefs to provide a level playing field.


How Come BYU is Ranked So Low?

This is actually fairly easy to explain. There are three men in BYU's freshman class who were initially left out of our data collection process. Why? Because their names are not currently on BYU's roster, leading us to believe that they may be on mission trips.

Those three names -- Dalton Mortensen, Brayden Packard and Ben Conlin -- offer plenty of strong talent and value to the Cougars and certainly gave the men from Provo a boost in our rankings.

Of course, without those three -- specifically Mortensen and Packard -- the algorithm was forced to push BYU to a lower spot.

Why Was Oklahoma State Not Originally Ranked?

When we collect our data for these rankings, our writers look at all of the rosters and aim to identify all of the true freshmen who would qualify for these rankings. We also aim to find any news or updates about new talents joining the NCAA during the winter.

When it comes to Oklahoma State, we found four men who qualified for these rankings: Joshua English, Rory Leonard, Gabe Simonsen and Ben Calusinski. That is a very strong group of rookies, especially with English and Leonard highlighting the freshman quartet. However, they didn't have quite enough to be ranked in our eyes.

Since then, The Stride Report has learned that the Cowboys are adding two international superstars to their roster starting this winter: Fouad Messaoudi, who has run 3:38 (1500) and 13:46 (5k), and Hafez Mahadi, who has run 48.8 (400) and 1:48.41 (800).

Neither of those names is officially listed on Oklahoma State's roster yet, but it is very clear that their elite firepower, paired with the success of Leonard and English, makes this rookie class one of the NCAA's absolute best.

Why Were Florida and Virginia Tech Not Originally Listed In The Stride Report's Top-10 Rankings?

Anyone looking at the output that Sam's algorithm gave us would think that we have a vendetta of sorts against the Gators and the Hokies. That, of course, is not the case.

I am, after all, a Virginia Tech graduate.

Still, I can at least understand any frustrations some people may have with Florida and Virginia Tech not being ranked. Those two classes had some outstanding depth, specifically in the middle distances. Each rookie class had one or two nationally competitive names.

Florida's class was highlighted by one runner who has posted a mark of 1:51 for 800 meters and three others who have posted marks of 1:53 for the same distance. One of those 1:53 men had also run 4:09 in the mile while a separate distance talent had run 14:57 for 5000 meters on the grass.

Virginia Tech's class was highlighted by three runners who have posted marks of 1:52 for 800 meters, one runner who has posted a mark of 4:05 for 1600 meters and one runner who has run 14:59 for 5000 meters on the track.

Those are all very solid and impressive marks, but the top recruiting classes that TSR ranked all had strong 800 meter runners of their own, with depth extending to other event areas. They typically had more than one standout star, and intangibles such as championship experience, roster fit and team needs tipped the scales.

No matter how much we wanted to rank the Gators and the Hokies, we couldn't quite go through with giving them a ranking.

Furman's Near Ranking

As you can see, the model suggests that Furman should've been ranked at TSR #9.

And truthfully, we don't entirely disagree!

The Paladins have an outstanding freshman class which features three men who have run 1:53, 1:53 and 1:52. They also have three men who have run under 14:50 for 5000 meters (between the grass and track).

Not all of their current freshmen are stars, and only a select few names really hold most of their ranking value, but the model noticed Furman's spread of strong marks and ranked them at TSR #9 as a result.

Had we thought about it long enough, we may have put the Paladins in the same spot as well.

Why Are Georgetown & Northern Arizona So Low In These Rankings?

As you all saw during our rankings, the Hoyas and Lumberjacks landed freshman classes that were plenty deserving of a top-10 ranking. Of course, if you were to look at the model, then their spot in the rankings actually drops quite a bit.

So why is this?

The answer is depth.

The model analyzes up to the five best recruits on each team, looks directly at their times and then applies varying weights to different aggregated scores (Sam explains this above).

The Hoyas have a handful of elite-level talents, but they also have a few impressive recruits whose best results are top cross country finishes.

Northern Arizona is in a similar situation. Sure, the Lumberjacks have an outstanding class, but their accolades on the grass hold more value than their times on the track.

When you plug the performances of those recruits into an algorithm, it makes sense why NAU and Georgetown fall back a bit.
