Strengths and limitations of Call Detail Records (CDR data)

What are the strengths of CDR data?

A range of big data sources provides exciting new opportunities to address gaps in data from traditional sources, such as surveys, and to produce more timely and disaggregated statistics.

The ubiquity of mobile devices globally, including high penetration in low- and middle-income countries (LMICs), provides a particularly promising source of data. The International Telecommunication Union (ITU) estimates that 97% of the global population, including 90% of people in the least developed countries (LDCs), have mobile network coverage and that there are 110 mobile-cellular telephone subscriptions per 100 people globally, and 76 subscriptions per 100 people in LDCs.

Mobile phone data containing the timestamped locations of individuals, such as CDRs, signalling data and mobile GPS data, can provide high quality, granular information on human mobility. Because these types of mobility data are generated automatically in real time, they allow timely, quantitative insights to be produced rapidly in situations where accurate, up-to-date information is paramount.

Mobile phone data also covers large geographic areas, with data from mobile network operators (MNOs) covering entire countries.

CDR data have a range of strengths compared to other forms of mobile device location data

CDRs are used by MNOs for billing purposes and are therefore routinely recorded as part of normal business operations. This reduces the need for additional data infrastructure to store data that would not otherwise be collected or stored by an MNO. Furthermore, CDR data are generated at a lower frequency than signalling or GPS data, which also reduces the demands on the data infrastructure as there is a smaller quantity of data to store.

CDRs are also generated regardless of the type of mobile device, meaning that CDR data have higher penetration than GPS data from applications on smart devices, particularly in LMICs where smart devices are less prevalent. In LMICs in Sub-Saharan Africa, data-enabled mobile devices account for only 55% of devices, compared to 90% in high-income countries.

Furthermore, CDR data are passively generated when subscribers use their mobile devices, rather than being generated by actively sending a signal to devices to determine their location. Actively generated signalling data, in comparison, may require subscribers to opt-in to have their data collected in this way.

These strengths mean CDR data are especially well suited to estimating changes in the geographic distribution and mobility of populations in response to specific events, such as a disaster or the introduction of government restrictions to control the spread of disease, and to estimating the variation around routine distribution and mobility patterns.

Strengths of CDR data

High penetration, worldwide
Covers large geographic scales, including entire countries
Billions of data points from millions of people
Relatively high spatial and temporal resolution
Near real-time
Already generated and stored by MNOs

What are the limitations of CDR data?

Call detail records (CDRs) are an exciting source of mobility data with a number of strengths when compared to traditional data sources, such as surveys and censuses. However, CDR data have their own limitations which we should address when planning to generate insights from the analysis of CDR data.

In order to interpret CDR-derived indicators correctly and support evidence-based decision-making informed by the resulting insights, it is important to understand these limitations and how they can be addressed.

Adjusting for representation biases

As with any dataset, CDR datasets include only a sample of the population of interest (e.g. the national population or the population of a city). It is therefore important to assess how representative this sample is of the population as a whole.

CDRs are generated in an MNO’s systems when network events are routed through the network belonging to that MNO and are attributed to a subscriber. In order to be included in a CDR dataset, an individual must therefore first own a mobile device and second subscribe to the MNO(s) whose CDR data are being processed. Furthermore, a subscriber must use their mobile device often enough to generate sufficient network events for analysis. This usage threshold for inclusion in the analysis will vary depending on the questions we’re trying to answer. For example, a subscriber who makes two calls a week is not suitable for analyses of movements over short time periods such as studies of commuting behaviours.

As a result, there are several layers of filters affecting the sample of the population included in the dataset:

Mobile phone ownership
Subscription to a participating MNO
Sufficient usage of the mobile device during the study period

The sample of the population which meets these criteria is not random and therefore not fully representative of the whole population. For each of these filters, factors such as age, gender and socio-economic status may affect whether an individual is included in the dataset. For example, young children may be unlikely to possess a mobile device, or less wealthy subscribers may use their mobile device less frequently or subscribe to particular MNOs.

Addressing biases in representativeness

However, depending on the application, issues with representativeness may not substantially impact the validity of the insights generated from CDR data. For example, if an event affects everyone's movements in a similar way, regardless of whether or not they’re an active subscriber with a given MNO, then the indicators derived from CDR data will not be substantially impacted by biases in the representativeness of the data.

Furthermore, we can supplement our CDR data with survey and census data to help address these representation biases. Data on demographics and the ownership and use of mobile devices can help us to better understand any biases in the CDR data and enable us to adjust our indicators to address these biases.
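As a rough illustration of this kind of adjustment, the sketch below scales an observed subscriber count up to an estimated population using survey-derived rates. The function name, the regions and every figure are hypothetical, and the approach is deliberately simplified: it assumes ownership and market share are uniform within each region and ignores SIM sharing and multiple-SIM ownership.

```python
# Illustrative sketch only: all names and figures below are hypothetical.

def scale_to_population(subscriber_count, ownership_rate, market_share):
    """Scale an observed subscriber count up to an estimated population.

    ownership_rate: share of the population owning a mobile device (e.g. from a survey).
    market_share: share of device owners subscribed to the participating MNO.
    """
    if not (0 < ownership_rate <= 1 and 0 < market_share <= 1):
        raise ValueError("rates must be in the interval (0, 1]")
    # Each observed subscriber stands in for 1 / (ownership * share) people.
    return subscriber_count / (ownership_rate * market_share)


# Hypothetical subscriber counts and survey-derived rates per region.
regions = {
    "urban": {"subscribers": 12_000, "ownership_rate": 0.8, "market_share": 0.5},
    "rural": {"subscribers": 3_000, "ownership_rate": 0.5, "market_share": 0.3},
}

estimates = {
    name: scale_to_population(r["subscribers"], r["ownership_rate"], r["market_share"])
    for name, r in regions.items()
}

for name, value in estimates.items():
    print(f"{name}: about {value:,.0f} people")
```

In practice the rates would likely need to vary by age, gender and socio-economic group as well as by region, since each of these can affect inclusion in the dataset.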

Adjusting for non-uniform spatial and temporal resolutions

As CDRs are only generated when a subscriber engages in a network event, the temporal resolution of the dataset (i.e. the frequency of data points) is limited by the frequency with which subscribers use their mobile devices. This can affect the analysis of CDR data in a number of ways.

First, the temporal resolution of the data may result in sections of journeys or even whole trips being unobserved. This will especially affect fine-scale movements or movements over short periods of time which may be important for certain analyses and sectors.

To some extent, we can address the issue with temporal resolution by limiting the dataset to high-usage subscribers. However, as discussed above, this may impact the representativeness of the data for the population as a whole, and potentially exclude the most vulnerable.
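The trade-off can be made concrete with a minimal sketch that keeps only subscribers above a usage threshold and reports how much of the sample survives. The record layout, the threshold value and the sample events are assumptions made for illustration, not a description of any particular pipeline.

```python
from collections import Counter


def active_subscribers(events, min_events):
    """Return the IDs of subscribers with at least `min_events` records.

    events: iterable of (subscriber_id, timestamp) pairs, one per network event.
    """
    counts = Counter(subscriber_id for subscriber_id, _ in events)
    return {sid for sid, n in counts.items() if n >= min_events}


# Hypothetical events for three subscribers on a single day.
events = [
    ("a", "2024-01-01T08:00"), ("a", "2024-01-01T12:30"), ("a", "2024-01-01T18:10"),
    ("b", "2024-01-01T09:15"),
    ("c", "2024-01-01T07:45"), ("c", "2024-01-01T19:00"),
]

all_ids = {sid for sid, _ in events}
kept = active_subscribers(events, min_events=2)  # threshold chosen arbitrarily
retained_share = len(kept) / len(all_ids)

print(f"kept {len(kept)} of {len(all_ids)} subscribers ({retained_share:.0%})")
```

Reporting the retained share alongside any indicator makes it easier to judge how far the filtered sample may have drifted from the population as a whole.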

Furthermore, changes in subscribers’ activity on their mobile devices, for example in response to a crisis, can impact the movements which are captured in CDR data and give the impression of a change in mobility where there is none. For this reason we also produce diagnostic indicators which describe network activity so that we can account for change in subscriber behaviour.
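One possible diagnostic of this kind is the average number of network events per active subscriber per day. The sketch below is an assumption about what such an indicator could look like, not a description of the indicators actually produced; the record layout and sample data are hypothetical.

```python
from collections import defaultdict


def events_per_subscriber_by_day(events):
    """Average number of events per active subscriber, for each day.

    events: iterable of (subscriber_id, day) pairs, one per network event.
    """
    event_counts = defaultdict(int)
    subscribers = defaultdict(set)
    for subscriber_id, day in events:
        event_counts[day] += 1
        subscribers[day].add(subscriber_id)
    return {day: event_counts[day] / len(subscribers[day]) for day in event_counts}


# Hypothetical data: the same two subscribers, twice as active on the second day.
events = (
    [("a", "2024-01-01")] * 2 + [("b", "2024-01-01")] * 2
    + [("a", "2024-01-02")] * 4 + [("b", "2024-01-02")] * 4
)

activity = events_per_subscriber_by_day(events)
change = activity["2024-01-02"] / activity["2024-01-01"]

print(activity)
print(f"activity ratio between the two days: {change:.1f}")
```

A sharp rise or fall in this ratio suggests that an apparent change in mobility over the same period may partly reflect a change in how often subscribers use their devices.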

CDR data also only contain the location of the cell tower which routed the network event, as opposed to the location of the subscriber.

As a result, the spatial resolution of the data (i.e. the geographic precision of the subscriber’s location) is limited by the density of cell towers. While in urban areas a subscriber will generally be within approximately 500 metres of a cell tower, in rural areas cell tower density may be much lower. In the most remote areas, a cell tower may be up to 8 km from a subscriber and still route a network event.

The variation in the density of cell towers may result in differences in the movements which are captured by CDR data between different areas. Relatively small movements which are captured in urban areas with a high density of cell towers may not be captured in rural areas with lower tower density where subscribers can travel further without network events being routed by a different cell tower. We therefore need to take this into consideration when looking at changes in mobility between different areas.

It is therefore important to consider whether other forms of location data with greater temporal and spatial resolution are more appropriate for the questions we are trying to answer.

However, to anonymise subscribers sufficiently and preserve their individual privacy, any such location data must be aggregated spatially and temporally. The resulting aggregates are often coarse relative to individual movements, which reduces the practical disadvantage of the lower resolution of CDR data.
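A minimal sketch of spatial and temporal aggregation is shown below: individual records are reduced to counts of distinct subscribers per region and day. The suppression of small cells and the threshold of 15 are assumptions added for illustration, as are the region names and record layout; the appropriate rules depend on the governance arrangements for the data.

```python
from collections import defaultdict

MIN_SUBSCRIBERS = 15  # hypothetical threshold, not taken from the text above


def aggregate_counts(records, min_subscribers=MIN_SUBSCRIBERS):
    """Count distinct subscribers per (region, day), omitting small cells.

    records: iterable of (subscriber_id, region, day) tuples.
    """
    cells = defaultdict(set)
    for subscriber_id, region, day in records:
        cells[(region, day)].add(subscriber_id)
    # Only aggregates leave this function; subscriber IDs are discarded.
    return {
        cell: len(ids) for cell, ids in cells.items() if len(ids) >= min_subscribers
    }


# Hypothetical records: 20 subscribers seen in one region, 3 in another.
records = [(f"s{i}", "region_A", "2024-01-01") for i in range(20)]
records += [(f"s{i}", "region_B", "2024-01-01") for i in range(3)]

result = aggregate_counts(records)
print(result)
```

Because the output is already at the level of regions and days, finer positional detail in the input would not change it, which is the sense in which aggregation narrows the gap between CDR data and higher-resolution sources.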

Subscriber individuality and uniqueness

When using mobile phone usage data, including CDRs, to study mobility, there can be a tendency to assume that each subscriber identifier corresponds to a single, unique individual. However, a single device or SIM card may be shared by multiple people, for example by members of the same household, or a single individual may possess multiple devices. As a result, a single trajectory may represent the movements of several individuals or a single individual may be counted multiple times.

Tackling the assumption that each subscriber is a unique individual is not straightforward, as SIM sharing and the possession of multiple SIMs could be affected by factors such as age, gender and socioeconomic status.

For example, owning multiple mobile devices with different SIM cards is likely to be associated with greater wealth or potentially certain types of employment in which company mobile devices are more common.

As with concerns about the representativeness of the sample of the population included in the dataset, we can use survey data on mobile phone ownership and usage behaviours to help address these concerns. For example, a survey conducted in Nepal by Flowminder researchers found relatively high levels of SIM sharing (47% of respondents reported the use of their SIM card by others). However, there was no significant difference in SIM sharing between men and women.

Limitations in the disaster management context

In addition to the limitations inherent to CDR data under other applications, some specific considerations need to be taken into account when analysing such data for the management of disasters and humanitarian crises.