Good data is better than no data: M-Lab is built to continually improve data collection efforts

The void of publicly available information about the nature and actual performance of customers' broadband connections has begun to disappear. Under NSFNET, which served as the precursor backbone to the commercial Internet, network performance information was systematically collected and made publicly available. In fact, between 1988 and 1995, many of the Internet's fundamental performance and traffic statistics were freely available to anyone who wanted them. Over the past 15 years, however, academic researchers and policy-makers have been forced to work with increasingly limited data about the state of the Internet. These data are not just critical for maintaining accountable network operations; they are also essential for guiding the future of Internet growth and adoption across the country and around the globe.

One of the organizations that has begun to fill this data void is Measurement Lab (M-Lab), a purpose-designed, open, and distributed server platform on which researchers and policy-makers can deploy active Internet measurement tools. In the past 3 months, the project has taken important steps towards our goal of advancing network research, improving network transparency, and empowering the public with useful information about their broadband connections. First, BitTorrent Inc. began using the Network Diagnostic Test (NDT) to improve both the user experience and the network friendliness of their µTorrent client, while providing the resulting test data to researchers. Second, the FCC released a web-based consumer broadband test that also uses NDT to measure consumer broadband speeds across the country. And the United States is not alone: the Hellenic Telecommunications and Post Commission has chosen M-Lab as its official platform for measuring broadband speeds across Greece. As of April 2010, users around the globe have run over 6 million broadband speed tests, more than 100,000 a day!

When the FCC presented its National Broadband Plan to Congress, it included dedicated sections for network transparency and broadband data collection. M-Lab's impact has been noticed. M-Lab and the FCC's Plan have been mentioned in over 2,000 articles from almost 500 different sources. That recognition has driven more interest in testing, and soon nearly 30 terabytes of additional M-Lab test data will be made publicly available to researchers, policy-makers, and the general public.

In many of the articles discussing broadband measurement, the authors offer clear suggestions and critiques on how speed tests and other measurement tools can help sustain a healthy and innovative Internet. Fortunately for M-Lab, our overarching goals and mission parallel these suggestions. A number of these articles have criticized many broadband speed tests as methodologically flawed and as returning inaccurate information. A recent critique of data that the FCC purchased from comScore raises specific issues that exist with many measurement tools and platforms and that may, at first blush, appear to apply to certain tools on the M-Lab platform. As we discuss below, however, most do not. As for the minor issues that remain, the M-Lab platform (like all open source and open technology projects) is built to evolve and improve to address concerns as they are brought to light.

A lack of transparency in data, methodology, and source code is an important criticism raised in critiques of most broadband speed tests. We've made transparency a cornerstone of the M-Lab project. M-Lab requires all its tools to be open source, giving anyone the opportunity to examine their contents and offer suggestions for improvement. M-Lab also requires that all collected data be made publicly available so that anyone who's so inclined can crunch the numbers for themselves. Currently, the public can go to Amazon's Public Data Sets and begin analyzing raw NDT data; we encourage you to dive in and see exactly what's been collected. With M-Lab, you don't have to speculate about what the data might show: you can find out for yourself. For anyone who wants to find out how tests are conducted, NDT's test methodology is available at the FCC's NDT help page and, in more detail, at the NDT Google Code Project Page or the NDT Internet2 page. For even more detail, anyone can download the complete test application source code for further investigation and analysis.

Another common critique is that the widely accepted methodology for measuring broadband speed may not be the most appropriate one. These tests use a client application within a web browser that connects to a server somewhere on the Internet. Just as web surfing is an on-demand activity, M-Lab's tests, such as NDT, often are as well. Yes, this skews the test results toward the times when users are using the Internet, but does anyone really care how fast their Internet connection is when they are not using it? That said, this bias does exist and must be understood and taken into account by researchers, analysts, and policy-makers.

Another concern is the distance between the server and the client. This distance not only affects transfer speed but can also lead to misunderstandings about provider responsibility when more than one provider sits between the end points. These risks can be minimized but never entirely removed. M-Lab deploys many test servers all around the world to help ensure that servers are as geographically close to clients as possible. This is a place where ISPs can play an essential role in supporting more accurate results by hosting additional servers in new locations. For instance, they could deploy M-Lab servers within their networks to improve geographic distribution and help provide more robust results.
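To make the idea of geographic closeness concrete, the following Python sketch picks the nearest server for a client using great-circle (haversine) distance. It is a simplified illustration under stated assumptions: the server names and coordinates are invented for the example, and this is not a description of how M-Lab actually assigns clients to servers. Geographic distance is also only a rough proxy for network distance.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points."""
    earth_radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * earth_radius_km * math.asin(math.sqrt(a))


def nearest_server(client, servers):
    """Return the name of the server closest to `client` (a (lat, lon) pair)."""
    return min(servers, key=lambda name: haversine_km(*client, *servers[name]))


# Hypothetical server locations, for illustration only.
SERVERS = {
    "athens": (37.98, 23.73),
    "new_york": (40.71, -74.01),
    "sydney": (-33.87, 151.21),
}

# A client in Patras, Greece, is matched to the Athens server.
closest = nearest_server((38.25, 21.73), SERVERS)
```

Adding more entries to the server table, for example servers hosted inside ISP networks, shortens the typical client-to-server distance, which is the point made above.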

It is also possible that tests running on the overloaded computers we use every day might produce misleading results. Again, this is a risk that can be minimized but not easily removed. Our most widely used test, NDT, addresses this issue by generating data streams from program memory, thus minimizing issues with hard drive activity and memory swaps. Within a single connection, it performs tests with multiple TCP streams in 5 millisecond increments, then analyzes and error-checks the results to see if anything appears to have affected the test. These advanced diagnostic features enable both client and server machines to collect more detailed information than most other broadband tests. These results include all of the TCP diagnostic data needed to identify end-to-end problems affecting throughput.
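The two ideas in this paragraph, building the test payload in memory and examining the transfer in short fixed increments, can be sketched in a few lines of Python. This is not NDT's implementation; the function names, the stall threshold, and the sample byte counts are all hypothetical and chosen only to illustrate the approach.

```python
def make_payload(size):
    """Build a test payload entirely in memory, so sending it needs no disk reads."""
    pattern = bytes(range(32, 127))  # 95 printable ASCII bytes
    repeats = size // len(pattern) + 1
    return (pattern * repeats)[:size]


def interval_throughput_mbps(cumulative_bytes, interval_s=0.005):
    """Convert cumulative byte counts, sampled every `interval_s` seconds,
    into a per-interval throughput in megabits per second."""
    rates = []
    for previous, current in zip(cumulative_bytes, cumulative_bytes[1:]):
        rates.append((current - previous) * 8 / interval_s / 1e6)
    return rates


def stalled_intervals(rates, threshold_mbps=0.1):
    """Return the indexes of intervals whose throughput fell below the threshold."""
    return [i for i, rate in enumerate(rates) if rate < threshold_mbps]


# Hypothetical samples taken every 5 ms: the transfer pauses in the third interval.
samples = [0, 6250, 12500, 12500, 18750]
rates = interval_throughput_mbps(samples)
stalls = stalled_intervals(rates)
```

Looking at each short interval rather than a single overall average is what makes it possible to notice that something, such as a busy client machine, interrupted the transfer partway through.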

Further, it is true that a subscriber's other home Internet traffic may interfere with the test results. A test result could also be misleading if a user is already using a large share of their available connection speed or is running a test from a laptop over a wireless connection slower than their broadband connection. The user's home network might be using a firewall or Network Address Translation (NAT), or the firewall might be applying traffic shaping. These interfering factors may not be the responsibility of the user's broadband provider, but they're certainly present and part of the customer's Internet environment. To identify these last mile and home network issues, NDT performs and records two separate "Middle Box" tests that identify whether the client is likely behind a firewall and/or NAT. M-Lab's other tools help here as well: NPAD diagnoses common problems affecting last mile and end-user systems, while ShaperProbe identifies the most common forms of traffic shaping that may be occurring.

The people behind M-Lab are glad to see recognition of the limits and caveats that apply to all client/server broadband speed tests. Despite these, it is worth remembering that the status quo was that no information at all was available to the public or to policy-makers. These measurement tools perform active client/server tests and are lighting up a dark void where transparency has been lacking for some time. There is always room to improve both current and new testing and measurement methods. For example, the United Kingdom has undertaken further efforts through a partnership between its telecommunications regulator, Ofcom; a test application developer, SamKnows; and incumbent ISPs, to provide statistically sound methods and data. In Greece, the telecommunications regulator EETT has resolved to use M-Lab as an open, impartial data collection service. M-Lab will continue to add and evolve open measurement tools and systems so that researchers and developers can deploy new broadband tests that freely provide data to policy-makers and the public. The most important aspect of any successful testing platform is that its processes are open, inclusive, and evolving to integrate improvements as they are proposed. These are all key facets of the M-Lab project.
