What is FlowKit?

FlowKit is an open-source (MPLv2) software suite developed by Flowminder to enable the secure processing and analysis of call detail record (CDR) data, with granular control over access to that data, for humanitarian and development purposes.

FlowKit is designed to be installed behind the firewall of a mobile network operator (MNO) so that no personal data leaves the MNO.

FlowKit is also containerised to simplify deployment and optimised to run on a single server, which makes it well suited to resource-constrained scenarios.

You can find more information on installing and using FlowKit here.

[Figure: FlowKit]

FlowKit has three components:

  • Ingestion and quality assurance
  • Processing
  • Authentication and controlled access

[Figure: FlowKit components]

Ingestion and quality assurance

The ingestion and quality assurance component of FlowKit is designed to reduce the effort to make CDR data “analysis-ready”.

The first stage is data ingestion. At installation, FlowKit is connected to a database containing pseudonymised CDR data. It then ingests new data automatically, reducing the manual work needed before the data can be processed.

FlowKit also automatically carries out quality assurance checks on new data, and identifies any issues which may impact the outputs.
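
The kind of check described above can be illustrated with a minimal Python sketch. The text does not say which checks FlowKit runs, so everything here is an assumption made for illustration: the function name find_quality_issues, the min_fraction threshold, and the choice of checks (days with no records, and days with far fewer records than the median day).

```python
from collections import Counter
from datetime import date, timedelta


def find_quality_issues(event_dates, start, end, min_fraction=0.5):
    """Return (day, reason) pairs for days with missing or unusually few records.

    Hypothetical check, not FlowKit's own: a day is flagged when it has no
    records, or fewer than min_fraction times the median daily count.
    """
    counts = Counter(event_dates)
    days = [start + timedelta(days=i) for i in range((end - start).days + 1)]
    daily = sorted(counts.get(d, 0) for d in days)
    typical = daily[len(daily) // 2]  # median (upper median for even lengths)
    issues = []
    for d in days:
        n = counts.get(d, 0)
        if n == 0:
            issues.append((d, "no records"))
        elif n < min_fraction * typical:
            issues.append((d, "unusually few records"))
    return issues


# Made-up sample: one event date per record over five days
sample = (
    [date(2024, 1, 1)] * 100
    + [date(2024, 1, 3)] * 30
    + [date(2024, 1, 4)] * 110
    + [date(2024, 1, 5)] * 90
)
issues = find_quality_issues(sample, date(2024, 1, 1), date(2024, 1, 5))
```

With this sample, 2 January is flagged because it has no records and 3 January because its count falls below half the median day. Flagging such days before analysis is what lets an analyst judge whether an output may be affected.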

Processing

The processing component of FlowKit facilitates the use of the methods developed by Flowminder, the consistent re-use of methods and replication of analyses, and the creation of new methods.

FlowKit includes a suite of methods developed and tested by Flowminder for the analysis of CDR data. These are implemented in a Python library for constructing SQL queries and have been developed to run efficiently on a single server, as FlowKit is designed to run in resource-constrained scenarios. For example, FlowKit caches the results of the intermediate queries required to conduct larger analyses, so that they can be re-used in other analyses without being recalculated each time.
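
The idea of caching intermediate queries can be sketched in a few lines of Python. This is an illustration under stated assumptions, not FlowKit's implementation: the QueryCache class, the events table and the two analyses are hypothetical, and an in-memory SQLite database with an in-process dictionary stands in for whatever database and cache storage FlowKit actually uses.

```python
import hashlib
import sqlite3


class QueryCache:
    """Toy cache: stores the result of a SQL query keyed by a hash of its text."""

    def __init__(self, conn):
        self.conn = conn
        self._results = {}
        self.executions = 0  # how many times the database was actually queried

    def run(self, sql):
        key = hashlib.sha256(sql.encode("utf-8")).hexdigest()
        if key not in self._results:
            self.executions += 1
            self._results[key] = self.conn.execute(sql).fetchall()
        return self._results[key]


# Made-up sample data: one row per event
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (subscriber TEXT, cell TEXT)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?)",
    [("a", "c1"), ("a", "c1"), ("a", "c2"), ("b", "c2")],
)
cache = QueryCache(conn)

# Intermediate query shared by both analyses below
EVENTS_PER_CELL = (
    "SELECT subscriber, cell, COUNT(*) FROM events "
    "GROUP BY subscriber, cell ORDER BY subscriber, cell"
)


def most_visited_cell(cache):
    """For each subscriber, the cell with the most events."""
    best = {}
    for subscriber, cell, n in cache.run(EVENTS_PER_CELL):
        if subscriber not in best or n > best[subscriber][1]:
            best[subscriber] = (cell, n)
    return {s: cell for s, (cell, _) in best.items()}


def cells_visited(cache):
    """For each subscriber, the number of distinct cells with events."""
    counts = {}
    for subscriber, _cell, _n in cache.run(EVENTS_PER_CELL):
        counts[subscriber] = counts.get(subscriber, 0) + 1
    return counts
```

Calling most_visited_cell(cache) and then cells_visited(cache) runs the shared intermediate query against the database only once; the second analysis reads the stored rows. A production cache would also need invalidation when new data is ingested, which this sketch omits.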

FlowKit also integrates GIS features to facilitate geospatial analysis and has a modular design to allow for growth and extension.

Authentication and controlled access

The authentication and controlled access component of FlowKit is responsible for managing permissions for data access and for integration into the data pipeline.

FlowKit allows users to access anonymised mobility aggregates derived from CDR data without requiring access to the individual-level data. Furthermore, FlowKit facilitates data minimisation as it allows granular control of the outputs a given user is able to access. This means users may only access the data outputs they specifically require.
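
Granular control of outputs can be pictured as an allow-list that is checked before any aggregate is produced. The sketch below is a hypothetical illustration and does not reflect FlowKit's actual permission model: the Permission fields, the names "flows", "admin2" and "admin3", and the check_access function are all assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Permission:
    query_kind: str        # hypothetical kind of aggregate, e.g. "flows"
    aggregation_unit: str  # hypothetical spatial unit, e.g. "admin2"


class AccessDenied(Exception):
    """Raised when a user requests an output they have not been granted."""


def check_access(granted, query_kind, aggregation_unit):
    """Return the matching permission, or raise AccessDenied if none was granted."""
    requested = Permission(query_kind, aggregation_unit)
    if requested not in granted:
        raise AccessDenied(
            f"{query_kind} at {aggregation_unit} level is not permitted"
        )
    return requested


# This analyst may only see flows aggregated to the (hypothetical) admin2 level
analyst_permissions = frozenset({Permission("flows", "admin2")})
```

Here check_access(analyst_permissions, "flows", "admin2") succeeds, while a request for the same aggregate at the finer "admin3" level raises AccessDenied. Granting each user only the specific combinations they need is one way to express data minimisation in code.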

The methods implemented in FlowKit also preserve the individual privacy of an MNO’s subscribers by anonymising all outputs. Anonymisation is achieved by aggregating individual mobility data and by applying further anonymisation methods, including k-anonymity. When implementing k-anonymity, FlowKit uses a value for k of 15 as standard, meaning that any data point within an aggregate that corresponds to 15 or fewer subscribers is redacted to preserve those subscribers’ individual privacy.
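
The redaction rule can be shown with a short Python sketch. The threshold of 15 and the "15 or fewer" rule come from the text above; the function name, the region names and the choice to drop redacted entries (rather than, say, replace them with a null marker) are assumptions made for illustration.

```python
K = 15  # standard value of k quoted in the text


def redact_small_counts(aggregate, k=K):
    """Remove every entry whose subscriber count is k or fewer.

    Assumption: redacted entries are dropped from the output entirely.
    """
    return {region: count for region, count in aggregate.items() if count > k}


# Made-up aggregate: number of subscribers per region
subscribers_per_region = {
    "region_a": 240,
    "region_b": 15,  # exactly k, so redacted
    "region_c": 16,  # just above k, so kept
    "region_d": 3,
}
safe = redact_small_counts(subscribers_per_region)
```

Only region_a and region_c remain in the result. Note that the boundary case matters: a count of exactly 15 is redacted, while 16 is the smallest count that is released.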

FlowKit also supports robust data security by recording every query that users run, so that queries can be audited if necessary.

In addition to data security, FlowKit’s API facilitates the integration of FlowKit into a data pipeline. The API can act as an interface for further automation of analyses and protects users from internal changes to FlowKit’s methods.