
This is the accompanying repository for the PAM 2024 paper “Following the Data Trail: An Analysis of IXP Dependencies”. Use the three buttons above to access:

  1. Data to reproduce the plots and analysis from the paper.
  2. Weekly updated data (in form of CSV files) for bulk data downloads.
  3. An API that provides more fine-grained access to the weekly data.

If you use this data, we would be happy if you cite our paper:

Malte Tashiro, Romain Fontugne, and Kensuke Fukuda, “Following the Data Trail: An Analysis of IXP Dependencies”, Passive and Active Measurement (PAM’24), March 2024.

Data Format (Archive)

The table below shows a data snippet (from the archive data) highlighting all columns and data variations.

id       timebin                 hege      af  nbsamples  origin_type  origin_name  dependency_type  dependency_name
1608913  2024-01-29 00:00:00+00  0.76      4   31         AS           2501         AS               2497
1486945  2024-01-29 00:00:00+00  0.102125  4   1112       AS           9498         IX               31
1464692  2024-01-29 00:00:00+00  0.109375  4   158        AS           45232        MB               31;9498
1436188  2024-01-29 00:00:00+00  0.152929  4   1112       AS           7713         IP               80.81.193.22

Working with the data

Below are some examples (based on Python’s pandas syntax):

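import pandas as pd

# Read the name columns as strings. They mix AS numbers, IXP IDs, IP addresses,
# and ';'-separated values, and a purely numeric column would otherwise be parsed
# as integers, so the string comparisons below would never match.
df = pd.read_csv('ihr_tr_hegemony_2024-01-29.csv',
                 dtype={'origin_name': str, 'dependency_name': str})
# Get all dependents (as defined in the paper) of AS2497.
df[(df.hege >= 0.1) & (df.dependency_type == 'AS') & (df.dependency_name == '2497')]
# Get all ASes depending on some IXP.
df[(df.hege >= 0.1) & (df.dependency_type == 'IX')]
# Get the Hegemony score (if available) between AS2501 and AS2497.
df[(df.origin_type == 'AS') & (df.origin_name == '2501') & (df.dependency_type == 'AS') & (df.dependency_name == '2497')]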
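
Building on the IXP filter, the following is a minimal sketch of how one might count the dependents of each IXP. The helper name count_ixp_dependents is hypothetical, and the inline DataFrame only reproduces the four rows of the snippet table so that the example runs without the CSV file.

```python
import pandas as pd

# The four rows from the data snippet, with names kept as strings.
columns = ['id', 'timebin', 'hege', 'af', 'nbsamples', 'origin_type',
           'origin_name', 'dependency_type', 'dependency_name']
rows = [
    (1608913, '2024-01-29 00:00:00+00', 0.76, 4, 31, 'AS', '2501', 'AS', '2497'),
    (1486945, '2024-01-29 00:00:00+00', 0.102125, 4, 1112, 'AS', '9498', 'IX', '31'),
    (1464692, '2024-01-29 00:00:00+00', 0.109375, 4, 158, 'AS', '45232', 'MB', '31;9498'),
    (1436188, '2024-01-29 00:00:00+00', 0.152929, 4, 1112, 'AS', '7713', 'IP', '80.81.193.22'),
]
df = pd.DataFrame(rows, columns=columns)


def count_ixp_dependents(frame, threshold=0.1):
    """Count distinct dependent ASes per IXP ID, largest first.

    Hypothetical helper; the 0.1 default mirrors the examples above.
    """
    ixp = frame[(frame.hege >= threshold)
                & (frame.origin_type == 'AS')
                & (frame.dependency_type == 'IX')]
    return (ixp.groupby('dependency_name')['origin_name']
               .nunique()
               .sort_values(ascending=False))


dependents_per_ixp = count_ixp_dependents(df)
print(dependents_per_ixp)
```

With the full weekly CSV, the same function can be applied to the DataFrame returned by pd.read_csv, provided the name columns are read as strings.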

Data Format (API)

The API has almost the same format as the archive files. Scroll to the tr_hegemony section of the API page for a description of the available parameters. Due to the way the internal database is designed, there are additional “af” parameters, origin_af and dependency_af, which can be ignored for now (they are always 4).

Working with the API

Here are the same examples as above, but retrieved from the API:
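
The sketch below only builds the query URLs for the three examples and sends no request. It rests on several assumptions: API_BASE is a placeholder and not the real endpoint, the filter names are assumed to match the archive column names, timebin is assumed to be a valid filter, and hege__gte is an assumed name for a "greater than or equal" filter. Check the tr_hegemony section of the API page for the actual base URL and parameters.

```python
from urllib.parse import urlencode

# Placeholder, NOT the real endpoint: take the base URL from the API page.
API_BASE = 'https://example.org/api/tr_hegemony/'


def build_query_url(base_url, **params):
    """Return base_url with the given filters appended as a query string."""
    return base_url + '?' + urlencode(params)


timebin = '2024-01-29T00:00'

# Dependents of AS2497 ('hege__gte' is an assumed parameter name).
dependents_url = build_query_url(
    API_BASE, timebin=timebin,
    dependency_type='AS', dependency_name='2497', hege__gte=0.1)

# ASes depending on some IXP.
ixp_url = build_query_url(
    API_BASE, timebin=timebin, dependency_type='IX', hege__gte=0.1)

# Hegemony score (if available) between AS2501 and AS2497.
pair_url = build_query_url(
    API_BASE, timebin=timebin,
    origin_type='AS', origin_name='2501',
    dependency_type='AS', dependency_name='2497')

for url in (dependents_url, ixp_url, pair_url):
    print(url)
```

The resulting URLs can be passed to any HTTP client once the placeholder base URL and the parameter names have been verified against the API page.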

Data Production Interval

We produce data once a week, at midnight UTC between Sunday and Monday. The only available timebins therefore fall on a Monday and always end in T00:00. Since we currently include only RIPE Atlas data (due to ease of access), the results are based on four weeks of traceroute data, filtered by a list of probe ASes that is updated weekly, as described in the paper.

Therefore, data with timebin 2024-01-29T00:00 is based on traceroute results from 2024-01-01T00:00 to 2024-01-29T00:00.
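
The relation between a timebin and its underlying measurement window can be sketched as follows. The function name measurement_window is hypothetical, and the check for Monday 00:00 UTC simply encodes the production schedule described above.

```python
from datetime import datetime, timedelta, timezone

# Each result is based on four weeks of traceroute data.
WINDOW = timedelta(weeks=4)


def measurement_window(timebin):
    """Return (start, end) of the traceroute data behind a timebin.

    Expects a timebin such as '2024-01-29T00:00', interpreted as UTC.
    """
    end = datetime.fromisoformat(timebin).replace(tzinfo=timezone.utc)
    # Timebins are only produced on Mondays at midnight UTC.
    if end.weekday() != 0 or (end.hour, end.minute, end.second) != (0, 0, 0):
        raise ValueError(f'not a Monday 00:00 UTC timebin: {timebin}')
    return end - WINDOW, end


start, end = measurement_window('2024-01-29T00:00')
print(start.isoformat(), 'to', end.isoformat())
```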