Download Process¶
These notebooks contain information about querying and downloading data of the Sentinel-2 mission from the Copernicus Open Access Hub.
Note
In order to run these notebooks, a free account for the Copernicus Open Access Hub is needed. Please create an account first:
https://scihub.copernicus.eu/dhus/#/self-registration
Consult the README for information on setting up the environment variables.
import os
from sentinelsat import SentinelAPI, read_geojson
from tqdm.notebook import tqdm
api = SentinelAPI(os.getenv('SCIHUB_USERNAME'), os.getenv('SCIHUB_PASSWORD'))
api._tqdm = tqdm # ← gives us better progress bars
Defining the Region of Interest¶
The Copernicus Open Access Hub API expects a point or area that designates the region of interest.
The OpenStreetMap Nominatim API provides a HTTP-based interface to the OpenStreetMap data set, which can be queried for the administrative boundaries of Berlin.The search_osm
function defined in sentinel_helpers.py
allows retrieving these geometries by location name:
from sentinel_helpers import search_osm
??search_osm
Signature: search_osm(place)
Source:
def search_osm(place):
'''
Returns a GeoDataFrame with results from OpenStreetMap Nominatim for the given search string.
'''
urlescaped_place = urllib.parse.quote(place)
search_url = ('https://nominatim.openstreetmap.org/search/?q={}' +
'&format=geojson&polygon_geojson=1').format(urlescaped_place)
return gpd.read_file(search_url)
File: ~/sources/sentinel_helpers.py
Type: function
The function is mostly concerned with constructing the correct URL and ensuring that the response is returned as GeoJSON so it can be parsed by the GeoPandas
library.
A query returns a GeoPandas
GeoDataFrame
which contains a geometry and additional information that can be used to narrow down the area of interest in:
results = search_osm('Berlin, Germany')
results.head()
place_id | osm_type | osm_id | display_name | place_rank | category | type | importance | icon | geometry | |
---|---|---|---|---|---|---|---|---|---|---|
0 | 574401 | node | 240109189 | Berlin, 10117, Deutschland | 15 | place | city | 0.897539 | https://nominatim.openstreetmap.org/ui/mapicon... | POINT (13.38886 52.51704) |
1 | 256375666 | relation | 62422 | Berlin, Deutschland | 8 | boundary | administrative | 0.897539 | https://nominatim.openstreetmap.org/ui/mapicon... | MULTIPOLYGON (((13.08835 52.41963, 13.09021 52... |
2 | 1253688 | node | 313826576 | Berlin, Tauentzienstraße, Charlottenburg, Char... | 30 | tourism | artwork | 0.315372 | https://nominatim.openstreetmap.org/ui/mapicon... | POINT (13.33864 52.50340) |
3 | 225671 | node | 50798110 | Berlin, Seedorf, Trave-Land, Segeberg, Schlesw... | 19 | place | village | 0.272651 | https://nominatim.openstreetmap.org/ui/mapicon... | POINT (10.44613 54.03636) |
4 | 107621118 | way | 68276545 | Berlin, Potsdam, Klein Bennebek, Kropp-Stapelh... | 26 | highway | unclassified | 0.210000 | None | LINESTRING (9.41929 54.40283, 9.42072 54.40343... |
OpenStreetMap contains geoinformation at vastly different scales and of very different types, which range from outlines of parks to entire countries.
The OpenStreetMap wiki contains exhaustive information about the architectural design of OpenStreetMap.
This empowers the search_osm
function to retrieve geoinformation on places that would otherwise be widely scattered across a myriad of sources, if available at all.
The first result is the city’s centroid.
The property listed in the type
column can be used to select the administrative boundaries.
berlin = results[results['type'] == 'administrative']
berlin.plot(figsize=(6,6))
<AxesSubplot:>

… and save this result for later. All downloaded files as well as all the artifacts generated by the various notebooks will live inside the resources
directory:
from pathlib import Path
resources = Path('resources')
resources.mkdir(exist_ok=True)
berlin.to_file(resources / 'berlin.geojson', driver='GeoJSON')
OpenStreetMap provides detailed geometries not only for cities, but also for larger areas such as federal states:
brandenburg = search_osm('Brandenburg, Germany')
brandenburg.iloc[:1].plot(figsize=(6,6))
<AxesSubplot:>

… or smaller areas such as parks:
tiergarten = search_osm('Tiergarten, Berlin, Germany')
tiergarten.plot(figsize=(6,6))
<AxesSubplot:>

In addition to the OpenStreetMap Wiki the OpenStreetMap Website is useful to interactively test out different search queries.
In this notebook the area of interest is defined to be berlin
.
Defining a Request to the Copernicus Open Access Hub¶
An API request additionally contains the time span in which to search for results - so especially recent imagery or imagery over a long time can be directly queried - and the maximum cloud coverage. The maximum cloud coverage is needed because Sentinel-2 is equipped with optical instruments. If there are too many clouds the products will not contain a lot of useful information.
from datetime import date
import datetime
# note that start_date are python date objects this means they can be
# manipulated using python functions
start_date = date(2020, 5, 22)
end_date = date(2020, 6, 22)
cloud_coverage = (0,30) # between 0 and 30 percent
The region of interest is given as the footprint
parameter.
The geometry retrieved from OpenStreetMap is simplified by calculating convex hull due to restrictions in URL lengths that don’t allow querying for arbitrarily detailed geometries:
footprint = berlin.iloc[0]['geometry'].convex_hull
footprint
products = api.query(footprint,
platformname='Sentinel-2',
processinglevel='Level-2A',
date=(start_date, end_date),
cloudcoverpercentage=cloud_coverage)
print(f'Found {len(products)} results')
Found 16 results
Selecting Products from the Result Set¶
The criteria for selecting a product depends on the specific use case. The first use case is to plot an image of Berlin, so as much of the city as possible should be visible in the downloaded data.
The list of products to a GeoDataFrame
(gdf
):
gdf = api.to_geodataframe(products)
/opt/conda/lib/python3.8/site-packages/pyproj/crs/crs.py:53: FutureWarning: '+init=<authority>:<code>' syntax is deprecated. '<authority>:<code>' is the preferred initialization method. When making the change, be mindful of axis order changes: https://pyproj4.github.io/pyproj/stable/gotchas.html#axis-order-changes-in-proj-6
return _prepare_from_string(" ".join(pjargs))
plot_product_extent
allows plotting the tile geometry of downloaded or available products over with a designated area of interest.
This provides quick visual impression of the result:
from sentinel_helpers import plot_product_extent
ax = plot_product_extent(gdf, berlin, figsize=(9,9))

Note
The geometrical shape of the captured data depends on where the Sentinel-2 satellite was in orbit when it crossed the area of interest. For an in-depth analysis see Coverage Analysis.
Because products are large (roughly 1GB), it is desirable to skip unnecessary downloads whenever possible. Products are therefore filtered by two criteria:
How much of the area of interest is visible in the product (i.e. area of the intersection of a product’s tile and the area of interest)
Cloud coverage (the less the better)
# we are using a metric coordinate reference system for europe in order to
# safely compare areas
gdf = gdf.to_crs('EPSG:4839')
berlin = berlin.to_crs('EPSG:4839')
import geopandas as gpd
intersection = gpd.overlay(
gdf,
berlin,
how="intersection"
)
intersection['area'] = intersection['geometry'].area
max_area = max(*intersection['area'])
largest_intersections = intersection[intersection['area'] == max_area]
largest_intersections
title | link | link_alternative | link_icon | summary | ondemand | beginposition | endposition | ingestiondate | orbitnumber | ... | osm_type | osm_id | display_name | place_rank | category | type | importance | icon | geometry | area | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | S2A_MSIL2A_20200617T101031_N0214_R022_T33UUU_2... | https://scihub.copernicus.eu/apihub/odata/v1/P... | https://scihub.copernicus.eu/apihub/odata/v1/P... | https://scihub.copernicus.eu/apihub/odata/v1/P... | Date: 2020-06-17T10:10:31.024Z, Instrument: MS... | false | 2020-06-17 10:10:31.024 | 2020-06-17 10:10:31.024 | 2020-06-17 21:35:56.836 | 26045 | ... | relation | 62422 | Berlin, Deutschland | 8 | boundary | administrative | 0.897539 | https://nominatim.openstreetmap.org/ui/mapicon... | MULTIPOLYGON (((215331.937 168288.512, 216047.... | 8.535962e+08 |
3 | S2B_MSIL2A_20200615T101559_N0214_R065_T33UUU_2... | https://scihub.copernicus.eu/apihub/odata/v1/P... | https://scihub.copernicus.eu/apihub/odata/v1/P... | https://scihub.copernicus.eu/apihub/odata/v1/P... | Date: 2020-06-15T10:15:59.024Z, Instrument: MS... | false | 2020-06-15 10:15:59.024 | 2020-06-15 10:15:59.024 | 2020-06-16 01:21:09.378 | 17108 | ... | relation | 62422 | Berlin, Deutschland | 8 | boundary | administrative | 0.897539 | https://nominatim.openstreetmap.org/ui/mapicon... | MULTIPOLYGON (((215331.937 168288.512, 216047.... | 8.535962e+08 |
10 | S2B_MSIL2A_20200602T100559_N0214_R022_T33UUU_2... | https://scihub.copernicus.eu/apihub/odata/v1/P... | https://scihub.copernicus.eu/apihub/odata/v1/P... | https://scihub.copernicus.eu/apihub/odata/v1/P... | Date: 2020-06-02T10:05:59.024Z, Instrument: MS... | false | 2020-06-02 10:05:59.024 | 2020-06-02 10:05:59.024 | 2020-06-02 18:40:43.331 | 16922 | ... | relation | 62422 | Berlin, Deutschland | 8 | boundary | administrative | 0.897539 | https://nominatim.openstreetmap.org/ui/mapicon... | MULTIPOLYGON (((215331.937 168288.512, 216047.... | 8.535962e+08 |
11 | S2A_MSIL2A_20200531T102031_N0214_R065_T33UUU_2... | https://scihub.copernicus.eu/apihub/odata/v1/P... | https://scihub.copernicus.eu/apihub/odata/v1/P... | https://scihub.copernicus.eu/apihub/odata/v1/P... | Date: 2020-05-31T10:20:31.024Z, Instrument: MS... | false | 2020-05-31 10:20:31.024 | 2020-05-31 10:20:31.024 | 2020-05-31 20:21:27.279 | 25802 | ... | relation | 62422 | Berlin, Deutschland | 8 | boundary | administrative | 0.897539 | https://nominatim.openstreetmap.org/ui/mapicon... | MULTIPOLYGON (((215331.937 168288.512, 216047.... | 8.535962e+08 |
4 rows × 46 columns
largest_intersections[['title', 'cloudcoverpercentage']]
title | cloudcoverpercentage | |
---|---|---|
1 | S2A_MSIL2A_20200617T101031_N0214_R022_T33UUU_2... | 24.374344 |
3 | S2B_MSIL2A_20200615T101559_N0214_R065_T33UUU_2... | 3.626466 |
10 | S2B_MSIL2A_20200602T100559_N0214_R022_T33UUU_2... | 0.345483 |
11 | S2A_MSIL2A_20200531T102031_N0214_R065_T33UUU_2... | 16.512980 |
These files will all be downloaded to resources/true_color_image
.
Repeated calls to download
or download_all
will skip the download process if a product has previously been downloaded.
dst_path = resources / 'true_color_image'
# we make sure to create the directory at dst_path so the notebook can run
# without interruptions
dst_path.mkdir(exist_ok=True, parents=True)
downloads, _, _ = api.download_all(largest_intersections['uuid'].values, dst_path)
downloads
{'958ef7fc-815e-47c6-b323-219de6a91339': {'id': '958ef7fc-815e-47c6-b323-219de6a91339',
'title': 'S2A_MSIL2A_20200617T101031_N0214_R022_T33UUU_20200617T161854',
'size': 854267363,
'md5': 'E8E78FEEBF25D95E50A7E5E0C971C061',
'date': datetime.datetime(2020, 6, 17, 10, 10, 31, 24000),
'footprint': 'POLYGON((12.352510548540328 52.23126993496904,12.408603762795623 52.36271559980042,12.470993716931147 52.50836314366547,12.533637922787934 52.653880456978825,12.596456783078388 52.79937966494399,12.659608391506955 52.944876790325615,12.72334134983589 53.09033052896263,12.783072077326212 53.2261774473976,13.648341514492438 53.24196354764364,13.678544065009483 52.25515074878083,12.352510548540328 52.23126993496904))',
'url': "https://scihub.copernicus.eu/apihub/odata/v1/Products('958ef7fc-815e-47c6-b323-219de6a91339')/$value",
'Online': True,
'Creation Date': datetime.datetime(2020, 6, 17, 21, 36, 46, 539000),
'Ingestion Date': datetime.datetime(2020, 6, 17, 21, 35, 56, 836000),
'path': 'resources/true_color_image/S2A_MSIL2A_20200617T101031_N0214_R022_T33UUU_20200617T161854.zip',
'downloaded_bytes': 0},
'b64d4450-d999-41dc-9ecb-6d263f322dd3': {'id': 'b64d4450-d999-41dc-9ecb-6d263f322dd3',
'title': 'S2B_MSIL2A_20200615T101559_N0214_R065_T33UUU_20200615T141117',
'size': 1176239030,
'md5': '86D9DB1D0C6C087D66AE28E4A106EF04',
'date': datetime.datetime(2020, 6, 15, 10, 15, 59, 24000),
'footprint': 'POLYGON((12.004760925069467 53.21197782662703,13.648341514492438 53.24196354764364,13.678544065009483 52.25515074878083,12.071595116334858 52.22621087043558,12.004760925069467 53.21197782662703))',
'url': "https://scihub.copernicus.eu/apihub/odata/v1/Products('b64d4450-d999-41dc-9ecb-6d263f322dd3')/$value",
'Online': True,
'Creation Date': datetime.datetime(2020, 6, 16, 1, 22, 2, 573000),
'Ingestion Date': datetime.datetime(2020, 6, 16, 1, 21, 9, 378000),
'path': 'resources/true_color_image/S2B_MSIL2A_20200615T101559_N0214_R065_T33UUU_20200615T141117.zip',
'downloaded_bytes': 0},
'10190cf1-5690-48a3-b71e-861629703a3f': {'id': '10190cf1-5690-48a3-b71e-861629703a3f',
'title': 'S2B_MSIL2A_20200602T100559_N0214_R022_T33UUU_20200602T135324',
'size': 862966648,
'md5': 'A93C8F7AA12C2396CF57CFD02ADF21A2',
'date': datetime.datetime(2020, 6, 2, 10, 5, 59, 24000),
'footprint': 'POLYGON((12.358321903656456 52.23137459287384,12.415699786858166 52.36633012201262,12.47808001229512 52.51194199175493,12.5407946185535 52.65754848189568,12.603698138951865 52.80313660845604,12.667166015291048 52.94869104851679,12.731188091140115 53.09422646557383,12.789415151489028 53.226293171356296,13.648341514492438 53.24196354764364,13.678544065009483 52.25515074878083,12.358321903656456 52.23137459287384))',
'url': "https://scihub.copernicus.eu/apihub/odata/v1/Products('10190cf1-5690-48a3-b71e-861629703a3f')/$value",
'Online': True,
'Creation Date': datetime.datetime(2020, 6, 2, 18, 42, 4, 900000),
'Ingestion Date': datetime.datetime(2020, 6, 2, 18, 40, 43, 331000),
'path': 'resources/true_color_image/S2B_MSIL2A_20200602T100559_N0214_R022_T33UUU_20200602T135324.zip',
'downloaded_bytes': 0},
'e385d397-4d1b-4fa2-a1ce-917f420c86dd': {'id': 'e385d397-4d1b-4fa2-a1ce-917f420c86dd',
'title': 'S2A_MSIL2A_20200531T102031_N0214_R065_T33UUU_20200531T134031',
'size': 1162711104,
'md5': '7E74A83DA5EA1176CFB68517E53A343D',
'date': datetime.datetime(2020, 5, 31, 10, 20, 31, 24000),
'footprint': 'POLYGON((12.004760925069467 53.21197782662703,13.648341514492438 53.24196354764364,13.678544065009483 52.25515074878083,12.071595116334858 52.22621087043558,12.004760925069467 53.21197782662703))',
'url': "https://scihub.copernicus.eu/apihub/odata/v1/Products('e385d397-4d1b-4fa2-a1ce-917f420c86dd')/$value",
'Online': True,
'Creation Date': datetime.datetime(2020, 5, 31, 20, 22, 5, 475000),
'Ingestion Date': datetime.datetime(2020, 5, 31, 20, 21, 27, 279000),
'path': 'resources/true_color_image/S2A_MSIL2A_20200531T102031_N0214_R065_T33UUU_20200531T134031.zip',
'downloaded_bytes': 0}}
downloads
contains a dictionary, mapping each product’s UUID to detailed information about the downloaded product.
This information can be used to calculate the total download size:
downloaded_bytes_total = sum(p['size'] for p in downloads.values())
print(f'Downloaded {(downloaded_bytes_total / 1024 ** 3):.2f}GB')
Downloaded 3.78GB