To extract usable, informative insights that protect individual privacy and are relevant to the development and humanitarian sectors, CDR data must go through several stages of processing and analysis.
Below, we describe the approach we recommend for turning these data into insights once access has been safely and securely granted.
Raw Call Detail Records (CDR data)
Raw CDR data are owned by mobile network operators (MNOs) and are designed to record subscribers' billable network events. From the perspective of analysing the distribution of human populations, however, CDRs contain three essential pieces of information about each network event:
- An identifier for the subscriber,
- The time of the network event,
- The cell tower the network event was routed through.
Raw CDR data contain personal data about subscribers and must therefore be kept secure to protect their privacy. We strongly recommend that raw CDR data never leave the possession of the MNO and that access to the data is restricted.
Raw CDR data can then be grouped spatially and temporally to produce CDR aggregates which describe the distribution and movements of the subscribers as a whole.
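As a minimal sketch, assuming Python 3.9+, the three fields listed above can be held in a small record type, and spatial and temporal grouping can be expressed as a key that maps each event to a region and an hour. The names `CdrEvent` and `group_key` and the `cell_to_region` lookup table are hypothetical, chosen for illustration; real CDR schemas differ between operators, and a cell-to-region mapping would have to be built from tower locations and the chosen regional boundaries.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class CdrEvent:
    """Hypothetical record: one network event reduced to the three essential fields."""
    subscriber_id: str   # identifier for the subscriber
    timestamp: datetime  # time of the network event
    cell_id: str         # cell tower the event was routed through


def group_key(event: CdrEvent, cell_to_region: dict[str, str]) -> tuple[str, datetime]:
    """Return the (region, hour) group an event falls into (hypothetical helper).

    Spatial grouping: look up the region containing the event's cell tower.
    Temporal grouping: truncate the timestamp to the start of its hour.
    """
    region = cell_to_region[event.cell_id]
    hour = event.timestamp.replace(minute=0, second=0, microsecond=0)
    return region, hour
```

For example, with `cell_to_region = {"C1": "North", "C2": "South"}`, an event at 09:42 routed through cell `"C2"` falls into the group `("South", 09:00)`. Aggregates are then computed per group rather than per subscriber.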
Production of CDR aggregates
CDR aggregates are produced by processing the CDR data of many individual subscribers into an output that characterises the behaviour of the entire group of subscribers. Because the movements of individual subscribers cannot be extracted from them, CDR aggregates are considered anonymised and are not treated as personal data under most current data privacy regulations. However, CDR aggregates may still contain sensitive information, which needs to be considered before any dissemination.
CDR aggregates can be generated by MNOs, or by telecommunications regulators with access to CDR data. To facilitate the production of aggregates while minimising direct interaction with the raw CDR data, MNOs can install software such as FlowKit, which provides tools for the standardised production of common types of aggregates.
Examples of commonly produced aggregates include the count of subscribers actively using their phones in a given region within a specified hour, and the count of subscribers recorded travelling from one region to another on a particular day.
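The two example aggregates above can be sketched in plain Python as follows. This is an illustration only, not FlowKit's implementation: the function names and input format are hypothetical, cell towers are assumed to be already mapped to regions, and "travelling from one region to another" is assumed to mean that two consecutive events of the same subscriber on the same day were recorded in different regions. Both functions count distinct subscribers, so a subscriber making many calls within one region and hour is counted only once.

```python
from collections import defaultdict
from datetime import date, datetime

# Hypothetical input format: (subscriber_id, timestamp, region), with each
# event's cell tower already mapped to a region.
Event = tuple[str, datetime, str]


def active_subscribers_per_region_hour(events: list[Event]) -> dict[tuple[str, datetime], int]:
    """Count distinct subscribers with at least one event in each (region, hour)."""
    seen = defaultdict(set)
    for subscriber, ts, region in events:
        hour = ts.replace(minute=0, second=0, microsecond=0)
        seen[(region, hour)].add(subscriber)
    # Only the counts leave this function, not the subscriber identifiers.
    return {key: len(subs) for key, subs in seen.items()}


def trips_per_day(events: list[Event]) -> dict[tuple[date, str, str], int]:
    """Count distinct subscribers travelling from region A to region B on each day.

    Simplifying assumption: a trip A -> B is recorded when two consecutive
    events of a subscriber on the same day fall in regions A and B, A != B.
    """
    by_subscriber_day = defaultdict(list)
    for subscriber, ts, region in events:
        by_subscriber_day[(subscriber, ts.date())].append((ts, region))

    travellers = defaultdict(set)
    for (subscriber, day), day_events in by_subscriber_day.items():
        day_events.sort()  # chronological order within the day
        for (_, origin), (_, destination) in zip(day_events, day_events[1:]):
            if origin != destination:
                travellers[(day, origin, destination)].add(subscriber)
    return {key: len(subs) for key, subs in travellers.items()}
```

Counting sets of identifiers rather than events is what turns per-subscriber records into group-level aggregates; in practice this step would run inside the MNO's environment, and only the resulting counts would be shared.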
To ensure statistical validity, and to prevent information about any single subscriber from being inadvertently disclosed, we recommend that aggregates are only produced for groups of at least 15 subscribers.
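Applying the minimum group size can be as simple as filtering out every aggregate cell that falls below the threshold. The sketch below uses a hypothetical function name and drops such cells rather than reporting them as zero, since a zero would wrongly suggest that no subscribers were present.

```python
MIN_GROUP_SIZE = 15  # minimum number of subscribers per aggregate, as recommended above


def redact_small_groups(aggregate: dict, min_size: int = MIN_GROUP_SIZE) -> dict:
    """Return a copy of `aggregate` keeping only cells with at least `min_size` subscribers."""
    return {key: count for key, count in aggregate.items() if count >= min_size}


counts = {"North": 120, "South": 9, "East": 15}
print(redact_small_groups(counts))  # {'North': 120, 'East': 15}
```

Whether a suppressed cell should be omitted, flagged, or merged into a larger region depends on how the aggregate will be used downstream.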
Production of CDR indicators
Data analysts and modellers can further process CDR aggregates to derive indicators. These indicators can address specific questions about the geographic distribution and mobility of a population. Unlike CDR aggregates, indicators may also be corrected for biases in the representativeness of the data or for changes in subscribers' network activity.
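As one illustration of correcting for changes in network activity, assumed here rather than taken from a specific published method, each region's count can be divided by the sum of all regional counts for the same period. If subscribers use their phones more on a particular day, the raw counts rise everywhere, but the regional shares stay the same. The function name is hypothetical, and correcting for representativeness biases would require additional data and is not attempted in this sketch.

```python
def share_of_active_subscribers(counts_by_region: dict[str, int]) -> dict[str, float]:
    """Express each region's count as a fraction of the period's total count.

    Assumption: normalising by the total reduces the effect of overall rises
    or falls in phone usage, so the indicator reflects where subscribers are
    rather than how actively they use their phones.
    """
    total = sum(counts_by_region.values())
    if total == 0:
        return {region: 0.0 for region in counts_by_region}
    return {region: count / total for region, count in counts_by_region.items()}


day_1 = {"North": 30, "South": 10}
day_2 = {"North": 60, "South": 20}  # phone usage doubled everywhere
print(share_of_active_subscribers(day_1))  # {'North': 0.75, 'South': 0.25}
print(share_of_active_subscribers(day_2))  # {'North': 0.75, 'South': 0.25}
```

A genuine shift of people from one region to another would still change the shares, which is the signal such an indicator is meant to preserve.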
There are a number of different CDR indicators that we can produce, depending on the types of questions we want to address. However, to be informative, they must be interpreted with an understanding of the local context. Indicators can therefore either be interpreted to generate insights for dissemination to end users, or be disseminated directly to end users who have the expertise and context to interpret them.
Turning indicators into actionable insights for decision making
Insights can be generated by stakeholders with the contextual knowledge to interpret CDR indicators. Insights into the geographic distribution and mobility of a population have a broad range of applications for decision-makers and may be disseminated in different formats, such as graphs, maps or reports, depending on the needs of the end user.