Using Earth Observation for Journalism
This repository contains a series of notebooks that describe how to interact with the Copernicus Open Access Hub in order to obtain, manipulate and analyze earth observation data. The aim is to document common tasks that might make data from the Copernicus Sentinel-2 mission attractive for use in data journalism.
The research is carried out in Jupyter notebooks and published using jupyter-book, an open-source Python project that generates HTML pages from a collection of Jupyter notebooks.
Copernicus Open Access Hub
Copernicus Open Access Hub is the platform that openly distributes the terabytes of Sentinel-2 data which these notebooks rely on. A (free) Scihub account is needed in order to follow this documentation interactively. The registration form can be found at https://scihub.copernicus.eu/dhus/.
Target Audience
These notebooks assume Python knowledge as well as familiarity with common Python data processing tools like the pandas library.
The topic is approached primarily from a computer science perspective rather than, for example, an aeronautical or geophysical one.
As a consequence, the focus is on how different tasks can be implemented.
Many of the considerations behind a particular action or processing step can only be touched on briefly.
Obtaining and Running the Code
The notebooks are published for reading at https://arne.schlueter.is/working-on/earth-observation-for-journalism. The source code lives at https://github.com/heyarne/earth-observation-for-journalism.
A Dockerfile is present at the root of the repository to help with reproducing the computing environment.
The image can be built by running the following command from the project root:
docker build . -t eratosthenes:latest
When running the Docker image, you need to define your Scihub user credentials as environment variables:
docker run -it \
  --name eratosthenes \
  --net host \
  --volume "$(pwd)":/home/jovyan \
  -e SCIHUB_USERNAME='<username>' \
  -e SCIHUB_PASSWORD='<password>' \
  eratosthenes:latest
This starts up a JupyterLab environment that allows you to interactively execute all notebooks and modify them to suit your needs.
The Docker image is based on the jupyter/scipy-notebook image. See the documentation of that image for more information on installed packages and other configuration details.
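Inside a notebook, the credentials passed with the `-e` flags can be read back from the environment. The following is a minimal sketch of how that might look; the helper name `read_scihub_credentials` is hypothetical and not necessarily how the notebooks themselves do it. Only the two variable names are taken from the command above.

```python
import os


def read_scihub_credentials(env=None):
    """Return (username, password) from the SCIHUB_* environment variables.

    `env` defaults to os.environ; a plain dict can be passed for testing.
    This helper is illustrative and not part of the repository.
    """
    env = os.environ if env is None else env
    names = ("SCIHUB_USERNAME", "SCIHUB_PASSWORD")
    # Treat unset and empty variables the same way: both are unusable.
    missing = [name for name in names if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return env["SCIHUB_USERNAME"], env["SCIHUB_PASSWORD"]
```

Failing early with a clear message tends to be easier to debug than an authentication error that surfaces later during a download.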
Hardware Requirements
Note that working with this kind of data is resource intensive. These notebooks download or create roughly 50 GB of data, most of which is compressed raster geodata. They have been executed and tested on a virtual server with 4 CPU cores with a clock speed of 2.6 GHz each and 32 GB of RAM.
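Before starting the downloads, it can be worth checking that enough disk space is available. The sketch below uses the standard library's `shutil.disk_usage`; the function name and the interpretation of "roughly 50 GB" as 50 × 10^9 bytes are assumptions made for illustration.

```python
import shutil

# "Roughly 50 GB" from the text above, assumed here to mean decimal gigabytes.
REQUIRED_BYTES = 50 * 10**9


def has_enough_space(path=".", required_bytes=REQUIRED_BYTES):
    """Return True if the filesystem containing `path` has enough free space."""
    free_bytes = shutil.disk_usage(path).free
    return free_bytes >= required_bytes


if not has_enough_space():
    print("Warning: less than 50 GB free in the current directory.")
```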
Some notebooks contain cells that start with either the %%time or %%timeit magic commands.
These cells often contain code that does not produce output immediately; the magic commands report the execution time measured on the system described above.
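The magic commands only work inside IPython or Jupyter. To take comparable measurements in a plain Python script, the standard library offers `time.perf_counter` for a single run and the `timeit` module for repeated runs. The sketch below is a rough equivalent rather than a reproduction of what the magics do internally, and `slow_step` is a hypothetical stand-in for an expensive processing step.

```python
import time
import timeit


def slow_step():
    """Hypothetical stand-in for an expensive processing step."""
    return sum(i * i for i in range(100_000))


# Single measurement, comparable in spirit to %%time.
start = time.perf_counter()
result = slow_step()
elapsed = time.perf_counter() - start
print(f"single run: {elapsed:.4f} s")

# Repeated measurement, comparable in spirit to %%timeit:
# 3 repeats of 5 calls each, reporting the best average per call.
calls_per_repeat = 5
best = min(timeit.repeat(slow_step, number=calls_per_repeat, repeat=3)) / calls_per_repeat
print(f"best of 3 repeats: {best:.4f} s per call")
```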
Building the Jupyter Book
The jupyter-book dependency is not included in the Dockerfile.
It has to be installed explicitly.
When a Docker container is running as instructed above, the dependency can be installed using the following command:
docker exec eratosthenes pip install jupyter-book
Afterwards, the book can be built by executing the following command while the container is running:
docker exec eratosthenes jupyter-book build .
The resulting HTML, CSS and JS files of the book can be found in the directory _build/html/.
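To preview the built book in a browser, the output directory can be served with Python's built-in HTTP server. This is a minimal sketch and not part of the repository: the function name `make_preview_server` and port 8000 are assumptions, and the server is bound to localhost only.

```python
import functools
import http.server


def make_preview_server(directory="_build/html", port=8000):
    """Create a local HTTP server that serves the built book from `directory`."""
    handler = functools.partial(
        http.server.SimpleHTTPRequestHandler, directory=directory
    )
    # Bind to localhost only so the preview is not exposed on the network.
    return http.server.ThreadingHTTPServer(("127.0.0.1", port), handler)


# Usage (blocks until interrupted with Ctrl+C):
#
#     with make_preview_server() as server:
#         server.serve_forever()
```

With the server running, the book should be reachable at http://127.0.0.1:8000/ on the machine that runs the script.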