Web Analytics for Race Result Websites

June 17th, 2018

## Why Race Results?

Every weekend, hundreds of thousands of people around the world take part in races that are recorded by race tracking software from a handful of companies such as FlashResults, HyTek, and MileSplit. These companies hold millions, if not billions, of performances and splits produced by young and old runners of all races, genders, and levels of fitness. In the current world of big data, the best thing a company can do for itself is to create more data, because that data can be used to generate more data about and for its users. Given the lack of competition among race timing companies, these companies will likely continue to generate even more data about athletes and their performances as time progresses.

With such a steady stream of data, these companies should be flourishing and becoming ever more knowledgeable and valuable to the athletics community. But they are not. These websites are not doing anything with the data they already have or the data they keep gathering. No information is being generated from the data, and no content is being created to draw in the community. Race result websites continue to provide the same service they have provided for decades: they take the text results that were sent to them and format them into a table for the user. While other sports are spending millions of dollars to gain insights using data analysis techniques, the only group doing any analysis on running is the ARRS, which is neither well-known nor well-funded. By their very nature, track & field and running/walking races are among the most quantifiable sports in the world, and it is extremely easy to generate, record, and interpret data for them. So if the data for the sport is so accessible, why aren’t we seeing the same widespread data-driven approaches and insights that we see from other sports on ESPN?
In my view, this lack of data usage likely comes down to a handful of potential explanations:

  • The athletics community doesn’t want these insights. Athletics has always been proud of its grassroots approach, and for good reason. Athletics is a sport that anyone can get involved with, regardless of financial status or geographic location. It is one of the oldest sports, and it requires no equipment to hold a competition or to understand the results. From what I see on the internet, this grassroots movement is still very strong, and it is possible that the desire to retain these grassroots makes these companies hesitant to attempt the development.
  • Companies don’t have the resources to spend. Performing extensive, large-scale data analysis can be difficult and time consuming. It requires computing power, along with the mathematical and software development skills to efficiently calculate, store, and render the results for users or corporate partners. Software developers with these skills may demand salaries that the companies are not willing to pay, which would explain why the companies have not pursued these features.
  • It is not profitable. All of these companies are for-profit. If they do not forecast a positive return on investment, they have little incentive to provide a service that may lose them money, regardless of whether the community wants it.

Beginning data-intensive analysis of the second most popular sport in the world is no small task, and I do not pretend that I am capable of covering that topic comprehensively, much less in a single blog post. But… we have to start somewhere, which is why we are here. Rather than tackling the technical problems, I will start by discussing how data could be used to increase the profitability of these companies.

## How Did We Get Here

To understand what runners desire and expect from websites today, it will be helpful to learn how the running community got where it is today. During the first running boom in the 1970s, runners would go to races and then wait afterwards for the race officials to post paper copies of the results on a bulletin board so the athletes could see the official results. At the time, there was no other way to distribute accurate data about their performances. This method put the race results in the hands of the people who needed them most: the runners who ran that race. Runners could not find out how runners across the country did, because they had no way of accessing the original printed race results. There was no World Wide Web, and no publicly accessible internet. This resulted in the formation of thousands of small, relatively isolated running communities throughout the United States and other countries. The only times these running communities intersected were when runners raced at another community’s race or at championship events, such as the Olympic Games.

It was not until 1991 that the internet became publicly accessible, ushering in a new era of information exchange. About 10 years after the internet became public, small websites began posting race results online. The most notable of these sites are LetsRun and ARRS (which recently got the “run” top-level domain). These websites were not run by funded companies but by enthusiastic runners who saw an opportunity to grow the running community. These pioneers broke down barriers between running communities, enabling the exchange of training information and race results between previously isolated communities. These websites also led to direct communication between these knowledge hubs (through the newly conceived “e-mail”), fostering relationships between running communities and enthusiasts around the world. These relationships enabled important developments such as the Cameron model for predicting race performances.

The first corporate website dedicated to fitness was Active, in the year 2000. Later, companies such as MileSplit and FloTrack became prominent. With money and resources, they were able to provide more than just raw data: they could have a person analyze the results of prominent meets, and even post pictures from them. Most importantly, they were able to verify and upload large numbers of performances from large numbers of races. This provided runners with all the raw, essential race data that they needed. Websites such as ARRS could not compete with the volume of data on these websites or the number of skilled developers they had, and so faded from the limelight. Meanwhile, runners and other athletes began to seek out heart rate monitors, and heart rate and VO2 max became common metrics for quantifying training and measuring fitness, but it was up to the individual either to learn how to interpret the results or to find a trained professional who could turn them into actionable insights.

Moving forward to the present day, new technologies such as improved GPS and improved sensors have been developed so that runners no longer have to manually collect data about their workouts and athletic performances. Products like Fitbit and Garmin wearables track their runs, energy expenditure, running routes, sleeping patterns, heart rates, blood pressure, etc., enabling runners to focus on their workouts and races rather than on collecting metrics. These devices collect relatively accurate health and fitness data at unprecedented intervals and volumes, enabling runners to get insights throughout the day regarding their fitness, their workouts, and their overall health. The breakthroughs and resurgence of machine learning and artificial intelligence in the last 10 years allow developers to create insights from raw datasets, finding patterns that previously only humans could find and, in many domains, at an accuracy that humans cannot match. Applications of machine learning give everyday people real-time insights from data that would previously have taken a person days to identify and interpret. Artificial intelligence can beat a human at Go, improve trading on stock exchanges, forecast web traffic, classify and direct work requests in the workplace, navigate drones, and much more. These techniques enable companies like Google and Fitbit to identify trends and provide meaningful feedback to users based on their workouts.

## Present Day

Today, running does not enjoy the widespread popularity that it did after the first or second running boom. While track & field and cross country are getting more popular as high school sports, running as a whole is losing popularity. In the last 10 years, new fitness trends such as yoga have risen, and new approaches to fitness have been developed, such as CrossFit, Zumba, Pure Barre, and boutique fitness studios like SoulCycle.

While there are conflicting views on whether millennials are hurting the running movement or embracing it, some data points are not disputed: millennials care about fitness, and they are willing to pay for it (Forbes, Goldman Sachs, Men’s Health). Millennials have a greater appreciation for living a healthy lifestyle than previous generations. However, as mentioned by the Washington Post, the values of millennials differ from those of previous generations. Many millennials do not pursue fitness competitively; they hope to live a healthy lifestyle and to enjoy the experience.

While millennials were coming of age, big data came to dominate the world of business. Companies are generating and recording data to find insights for their business decisions, and to give insights to their customers and visitors. While gathering large amounts of data can be costly, an effective way to make old data provide more insights is to gather more information about the same data points. For example, when you already know that your company’s visitors all have a particular need, finding out their gender allows each dimension of existing data to be broken down further, generating a whole new class of insights. This way, each additional dimension of data can provide exponentially more information about the existing data points. Data begets data. In many cases, race result websites receive results from hundreds or even thousands of separate races, with some races containing performances for upwards of 20,000 runners. And each set of race results submitted to these websites comes with a guaranteed set of information:

  1. User who submitted the results
  2. Time of submission
  3. Where results were submitted from
  4. Date of Event
  5. Race/Performance Distance
  6. Race Location
  7. Athlete Name
  8. Athlete Gender
  9. Athlete Performance
  10. Athlete Finish placement

These websites receive 10 dimensions from the results of each race, where the last 6 dimensions provide essential information about each athlete and their performance. For each additional dimension of data the website collects about races, athletes, or performances, it can break down all the other dimensions it already has. This means that each additional dimension provides exponential growth in value to the company and, if provided, to the users. However, this value is only realized if it is used to provide insights to users. In the world of big data, websites can either pay to create insights that bring in more users or pay for not creating them by losing users to another website that will. The exponential growth in the value of data can be shown by the number of ways in which the data can be broken down by all of the dimensions:

\[\tag{Eq. 1} potentialbreakdowns = \displaystyle\sum_{size=1}^{dimensions} \tbinom{dimensions}{size}\]
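
As a worked check of Eq. 1, assuming a "breakdown" means any non-empty subset of the dimensions (which is what the sum counts), the binomial theorem gives a closed form, written here with n standing for the number of dimensions:

\[\displaystyle\sum_{size=1}^{n} \tbinom{n}{size} = 2^{n} - 1\]

With n = 10, for example, this gives 2^10 - 1 = 1023 possible breakdowns, and an eleventh dimension raises the count to 2^11 - 1 = 2047. Each added dimension roughly doubles the number of ways the data can be sliced, which is the sense in which the growth is exponential.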

Understandably, race result websites are generally not huge profit centers. With the exception of Active.com, these websites are run either by a single person or by a small staff. These sites generally do not have many advertisers, and do not have major sponsors or supporters. They do not have the staffing to write long-form articles or personal responses to each user, and they lack the funding to use cloud services for real-time, computationally expensive data analysis. These constraints require the developers to evaluate their process and the return on investment of each feature, for both users and the website.

I examined some of these websites and found that they do collect user actions to add dimensions to user data. The dimensions tracked on these websites tended to be the following:

  • Page Views
  • Page View Duration
  • Video Clicks
  • Multivariate testing results
  • Clicking “Read More” for an article

While it is very possible that these websites are collecting data server-side, from the client side it appears that many dimensions are not being captured that could later be used to find insights into visitor and user behavior. These are opportunities to create additional value through data. We will start by concerning ourselves with a single way of adding value through data: adding dimensions by which users and employees can meaningfully break down data.

## Data Integrations

Integrating analytics tools into the website can be the fastest way to get access to more data. The quickest option would be to integrate Google Analytics data with the site’s user data through the User ID feature, tying authenticated users to Google Analytics users. This would allow employees to break down online behavior based on profile information. Out of the box, Google Analytics provides hundreds of data points based on users, sessions, and hits, which could then be used to create real-time recommendation systems and feedback loops on the website. Race result websites have thousands of meets and hundreds of thousands of races. Being able to identify and recommend the races that users care about would significantly reduce the time users spend looking for content and leave them more time to consume it. These benefits would particularly improve the user experience on mobile devices: users would use less of their data plans because they would load fewer pages while searching for content, and bounce rates could fall as well.
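
To make the idea of tying analytics data to profile information concrete, here is a minimal sketch in Python. It assumes the analytics rows have already been exported and carry the same user ID as the site's own profiles; the field names (`user_id`, `seconds_on_page`, `role`, `paid`) and all of the records are made up for illustration and are not part of any Google Analytics export format.

```python
from collections import defaultdict

# Hypothetical profile records keyed by the site's own user ID.
profiles = {
    "u1": {"role": "coach", "paid": True},
    "u2": {"role": "athlete", "paid": False},
    "u3": {"role": "athlete", "paid": True},
}

# Hypothetical rows exported from an analytics tool; each row carries the
# same user ID that the site assigned when the user logged in.
analytics_rows = [
    {"user_id": "u1", "page": "/meets/123", "seconds_on_page": 95},
    {"user_id": "u2", "page": "/meets/123", "seconds_on_page": 20},
    {"user_id": "u3", "page": "/athletes/9", "seconds_on_page": 60},
    {"user_id": "u2", "page": "/athletes/9", "seconds_on_page": 40},
    {"user_id": "anon", "page": "/", "seconds_on_page": 5},
]


def time_by_attribute(rows, profiles, attribute):
    """Sum seconds on page per value of a profile attribute.

    Rows whose user ID has no matching profile (anonymous visitors) are skipped.
    """
    totals = defaultdict(int)
    for row in rows:
        profile = profiles.get(row["user_id"])
        if profile is None:
            continue
        totals[profile[attribute]] += row["seconds_on_page"]
    return dict(totals)


seconds_by_role = time_by_attribute(analytics_rows, profiles, "role")
print(seconds_by_role)
```

The same function can break the data down by any other profile attribute, for example `time_by_attribute(analytics_rows, profiles, "paid")`, which is the kind of per-dimension breakdown discussed earlier.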

## Internal Analytics: Heuristic Segmentation

Not all people have the same interests. This is true for all aspects of life, including race results. To ensure that visitors enjoy themselves, websites often highlight content that they believe each user will enjoy, based on the attributes that user shares with other users who have enjoyed the content. This is called user segmentation, where groups of users with common attributes are called segments. I have not seen this done on any race result website thus far, and it is a promising method for ensuring that visitors enjoy themselves and want to come back in the future. But the website first needs to learn about the attributes of its visitors by collecting user analytics. This data can then be analyzed to create new segments and to make live recommendations to users.

We will set up Google Analytics to record attributes of users and their behavior using custom dimensions. Storing dimensions in Google Analytics avoids paying for database storage and takes advantage of the tools Google provides for working with the data, such as Google Analytics visualizations and Google Data Studio dashboards for breaking down the data. Some useful dimensions to record for a race website could be:

  • Paid member vs unpaid member
  • Primary role of member (e.g. athlete, coach, official, parent, enthusiast, alumnus)
  • Newsletter subscriber vs non-subscriber

These dimensions would allow employees to identify the behaviors of users across the platform based on business categories and business values. If paid members are paying close attention to professional meets, then developers can spend more time on providing automated feedback to users about professional runners, or add the ability for users to subscribe to results for professional meets, corporate teams, or professional athletes themselves. Grouping users by these attributes and behaviors is the user segmentation described above.
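
As a rough sketch of how the three business dimensions above might be prepared for tracking, the Python function below maps a user profile to indexed custom dimension values. The `dimension<N>` key format mirrors how Universal Analytics names custom dimension fields, but the slot numbers, the profile fields, and the value labels are all assumptions; they would have to match whatever is configured in the Google Analytics admin for the real property.

```python
# Hypothetical mapping from business attributes to custom dimension slots.
# The slot numbers are assumptions and must match the analytics property.
DIMENSION_SLOTS = {"membership": 1, "primary_role": 2, "newsletter": 3}


def custom_dimensions(profile):
    """Return {"dimension<N>": value} for the three business attributes.

    Missing profile fields fall back to the least specific value.
    """
    values = {
        "membership": "paid" if profile.get("paid") else "unpaid",
        "primary_role": profile.get("role", "unknown"),
        "newsletter": "yes" if profile.get("newsletter") else "no",
    }
    return {
        f"dimension{DIMENSION_SLOTS[name]}": value
        for name, value in values.items()
    }


payload = custom_dimensions({"paid": True, "role": "coach", "newsletter": False})
print(payload)
```

Keeping the slot mapping in one place means the page templates only deal with named attributes, and a change of slot numbers in the analytics configuration touches a single dictionary.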

To further segment the users, a race website would want to create multiple dimensions for segmentation based on domain-specific values. Track & field and cross country have their own tiers by which fans could be segmented. We could use the following tiers:

  • Sport - Cross Country, Indoor Track & Field, Outdoor Track & Field
  • Level - Middle School, High School, Club, College, Professional
  • Division: Level 1 - VHSL, NCAA, NAIA, etc
  • Division: Level 2 - Southeast Region, Central Region, etc
  • Conference - Atlantic Coast Conference, Peninsula District, etc
  • Affiliate - United States, Oregon Project, Virginia Tech, Gloucester High School, etc
  • Type - Field, Sprints, Middle Distance, Distance, Long-Distance, etc
  • Athlete - Doug Fenstermacher, Galen Rupp, Mo Farah, etc.

A user could have any combination of interests across the tiers, or no interest at all in a given tier, so a user could belong to any number of tiers simultaneously. For example, a high school coach would be very interested in the athletes they have coached and are currently coaching, and moderately interested in the affiliates in their conference and region. An NCAA alumnus of the University of Virginia may have a moderate interest in their old events, a moderate interest in the University of Virginia, and a strong interest in old teammates who went on to run professionally. We could infer the interests of a user from our user analytics, or users could manage their interests themselves through a preferences page. These preferences would need to be updated over time, as user interests change. Regardless, these preferences could be used to recommend articles or race results to each user. This would change race result websites from search-based sites, where you must already know what you are looking for, into browsing sites, where you can find something you didn’t know you wanted to see. This could increase overall engagement and broaden the horizons of many fans.
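
One simple way to turn tiered interests into recommendations is to score each result by the summed interest weights of the tiers it is tagged with. The sketch below is only an illustration of that idea, not a method any of these websites is known to use; the weights, the tags, the result titles, and the athlete "Jane Roe" are all invented.

```python
# Hypothetical interest weights for a high school coach, keyed by
# (tier, value) and ranging from 0.0 (no interest) to 1.0 (strong interest).
coach_interests = {
    ("Athlete", "Jane Roe"): 1.0,
    ("Affiliate", "Gloucester High School"): 0.8,
    ("Conference", "Peninsula District"): 0.5,
}

# Hypothetical results, each tagged with the tier values it belongs to.
results = [
    {
        "title": "Peninsula District Championship",
        "tags": {
            ("Conference", "Peninsula District"),
            ("Affiliate", "Gloucester High School"),
        },
    },
    {"title": "NCAA Outdoor Final", "tags": {("Division", "NCAA")}},
    {
        "title": "Jane Roe 1600m PR",
        "tags": {
            ("Athlete", "Jane Roe"),
            ("Affiliate", "Gloucester High School"),
        },
    },
]


def recommend(interests, items, limit=2):
    """Return up to `limit` titles, ordered by summed interest weight.

    Items that match none of the user's interests are left out entirely.
    """
    scored = []
    for item in items:
        score = sum(interests.get(tag, 0.0) for tag in item["tags"])
        if score > 0:
            scored.append((score, item["title"]))
    # Highest score first; ties are broken alphabetically for stable output.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [title for _, title in scored[:limit]]


print(recommend(coach_interests, results))
```

Whether the weights are inferred from analytics or set by the user on a preferences page, the scoring step stays the same, which keeps the two sources of interest data interchangeable.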

Some race result websites allow users to claim athlete results as their own, or to claim a runner’s identity (their affiliations, race performances, championships, etc.) as their own. By linking web behavior with racing performances, the number of insights that can be gained about users grows exponentially. Historically, the web has struggled to obtain and link a person’s “real life” data to their digital presence. These websites have that functionality built in through their race results, which happen to be authoritative data about the non-digital world of each runner. Using a mixture of digital behavior and accurate physical data from race results, race result websites can make direct inferences about their users. This could allow developers to identify likely causes of user behavior online. For example, the reason John Smith is suddenly spending so much time looking at Bob Doe’s profile page may be that Bob Doe out-kicked John Smith the week before to win the championship. This would allow race result websites to identify rivalries between athletes and between affiliates based on their online and offline behavior, and to break down users based on their rivalries and loyalties.
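
The John Smith and Bob Doe example can be sketched as a small heuristic: flag a possible rivalry when a runner who narrowly lost a race views the winner's profile several times in the following days. The thresholds (`max_margin`, `window_days`, `min_views`), the dates, and the data layout below are all assumptions chosen for illustration, and a match is only a hint of a rivalry, not proof of cause.

```python
from datetime import date

# Hypothetical close finishes: (race_date, winner, runner_up, margin_seconds).
close_finishes = [
    (date(2018, 5, 12), "Bob Doe", "John Smith", 0.4),
]

# Hypothetical profile views: (view_date, viewer, viewed_athlete).
profile_views = [
    (date(2018, 5, 10), "John Smith", "Bob Doe"),  # before the race, ignored
    (date(2018, 5, 14), "John Smith", "Bob Doe"),
    (date(2018, 5, 15), "John Smith", "Bob Doe"),
    (date(2018, 5, 16), "John Smith", "Bob Doe"),
]


def likely_rivalries(finishes, views, max_margin=1.0, window_days=7, min_views=3):
    """Return (runner_up, winner) pairs where a narrow loss was followed by
    at least `min_views` profile views within `window_days` of the race."""
    rivalries = []
    for race_date, winner, runner_up, margin in finishes:
        if margin > max_margin:
            continue  # not a close finish
        views_after = sum(
            1
            for view_date, viewer, viewed in views
            if viewer == runner_up
            and viewed == winner
            and 0 <= (view_date - race_date).days <= window_days
        )
        if views_after >= min_views:
            rivalries.append((runner_up, winner))
    return rivalries


print(likely_rivalries(close_finishes, profile_views))
```

The offline half of the signal (the finish margin) comes from the race results, and the online half (the profile views) comes from analytics, so neither data source alone would be enough to flag the pair.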

## Website Performance

I was a MileStat junkie when I was running in high school. I spent a good amount of time between meets reading through results from other races, and on trips to and from meets thinking about my own performances and comparing them to other runners. If smartphones had existed, or been affordable, at the time, I would have been on MileStat, FloTrack, or FlashResults checking race results from the backseat of my parents’ car or on the bus. Today, competitive runners and coaches use mobile devices to access race results while they are traveling to races. Because these segments are the primary users of race results, the mobile performance of race result websites is important.

While loading the FloTrack website, I downloaded 3.4 MB of data for the home page, and 3.1 MB for Runner’s World. Roughly 2/3 of that data came from images. Loading images synchronously requires images to be loaded before the content further down the page, which keeps users from the data they came for. Simply loading the images asynchronously would let users reach all the data on the page, but it would still transfer all the data needed to load every image. The better route is to lazily load images and videos, so that only the media the user actually views is loaded. With this method, mobile users could save up to roughly 66% of the data used to load the page, sparing their data plans. If a service like Cloudinary is being used to serve images, lazy loading can also save the owners money by reducing the amount of data they are charged for sending, and caching with Service Workers can reduce it further. The Workbox JavaScript library is an easy starting point for installing Service Workers on a website. Workbox also enables offline usage of Google Analytics, so actions taken by the user while offline will still be recorded in Google Analytics when the user goes online again.
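
The savings from lazy loading depend on how many images a visitor actually scrolls to, so the 66% figure is best read as an upper bound. The short Python calculation below works through the arithmetic using the 3.4 MB page weight and the two-thirds image share measured above; the assumption that a visitor views 25% of the images is invented purely for illustration.

```python
def megabytes_saved(page_mb, image_share, fraction_viewed):
    """Estimate MB not downloaded when only viewed images are fetched.

    page_mb: total page weight when everything is loaded eagerly
    image_share: fraction of that weight made up of images (0 to 1)
    fraction_viewed: fraction of image bytes the visitor actually views (0 to 1)
    """
    image_mb = page_mb * image_share
    return image_mb * (1.0 - fraction_viewed)


# 3.4 MB home page with two thirds of the bytes in images (from the text);
# the 25% viewed share is a made-up assumption.
saved_mb = megabytes_saved(3.4, 2 / 3, 0.25)
share_saved = saved_mb / 3.4

print(f"saved about {saved_mb:.1f} MB, {share_saved:.0%} of the page weight")
```

Under these assumptions the visitor skips about half of the page weight. The saving only approaches the full two thirds when almost none of the images are ever scrolled into view.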

## Conclusion

Race result websites have plenty of low-hanging fruit for improving the user experience and for learning about their users and visitors. The low-hanging fruit with the most immediate utility for learning about users involves user tracking, user segmentation, and external data integration. The best opportunity for improving the user experience is running regular benchmarks to improve page load times. There are many more opportunities than I have covered here to improve the utility of race result websites. I hope we will start to see race result websites take advantage of new approaches to data processing and data-driven decision-making and provide a greater service to the running community.