Data for the Entire Season

This section estimates how much data would be needed to analyze the entire area of Brandenburg, Germany, over a full year. The purpose is to verify whether it is feasible to compare a whole year of satellite data with the official numbers from the Waldbrandstatistik published by the responsible authority, the Landesbetrieb Forst Brandenburg.

Data from the year 2018 is used to provide an estimate of the required resources.

from datetime import date
import os
from sentinel_helpers import search_osm
from sentinelsat import SentinelAPI
from tqdm.notebook import tqdm

# Authenticate against the Copernicus Open Access Hub using credentials from the environment
api = SentinelAPI(os.getenv('SCIHUB_USERNAME'), os.getenv('SCIHUB_PASSWORD'))
# Use the notebook-friendly progress bar for any downloads
api._tqdm = tqdm

start_date = date(2018, 1, 1)
end_date = date(2018, 12, 31)
cloud_coverage = (0, 30)

brandenburg = search_osm('Brandenburg, Germany')
brandenburg = brandenburg[brandenburg['type'] == 'administrative'][:1]
brandenburg.plot(figsize=(9,9))
[Figure: outline of the Brandenburg administrative boundary]
footprint = brandenburg.iloc[0]['geometry'].convex_hull.wkt
results = api.query(footprint,
                    platformname='Sentinel-2',
                    processinglevel='Level-2A',
                    date=(start_date, end_date),
                    cloudcoverpercentage=cloud_coverage)
print(f'Found {len(results)} results')
Found 638 results

As previously mentioned, results is an ordered dict mapping product UUIDs to product metadata, where each product contains information about its download size. This download size is given as a string ending in a unit of either megabytes (MB) or gigabytes (GB).
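
For illustration, the size string of a single product can be inspected directly (the value shown in the comment is only an example, not an actual output of this query):

first_uuid = next(iter(results))
print(results[first_uuid]['size'])  # e.g. '1.02 GB'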

These size strings are first converted to a numeric value in bytes, so that they can later be summed and expressed in gigabytes:

def human_readable_size_to_bytes(size):
    # e.g. '123.45 MB' -> bytes; multiply by an extra factor of 1024 for 'GB'
    return float(size.split(' ')[0]) * 1024 * 1024 * (1024 if 'GB' in size else 1)

gdf = SentinelAPI.to_geodataframe(results)
size_in_bytes = gdf['size'].apply(human_readable_size_to_bytes)
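
As a quick sanity check, the conversion can be verified on made-up size strings (these are not actual product sizes):

assert human_readable_size_to_bytes('512 MB') == 512 * 1024 ** 2
assert human_readable_size_to_bytes('1.02 GB') == 1.02 * 1024 ** 3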

The total size of all products covering Brandenburg with a cloud coverage below 30% can thus be calculated; it comes to just over 500 GB:

total_size_in_gb = size_in_bytes.sum() / 1024 ** 3
print(f'Total size in GB: {total_size_in_gb:.2f}GB')
Total size in GB: 517.64GB

This is more than the disk space available on this virtual machine:

! df -h | grep -E '(Filesystem|/home/jovyan)'
Filesystem      Size  Used Avail Use% Mounted on
/dev/xvda1      247G  184G   50G  79% /home/jovyan
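
The same comparison can also be made from within Python. This is a sketch that assumes the home directory /home/jovyan lives on the filesystem shown above:

import shutil

free_gb = shutil.disk_usage('/home/jovyan').free / 1024 ** 3
print(f'Available: {free_gb:.0f}GB, required: {total_size_in_gb:.2f}GB')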

At an average download speed of 2 MB/s, obtaining this data (not counting the time it takes to restore products from the Long-Term Archive) would take:

duration_in_hours = size_in_bytes.sum() / (1024 ** 2) / 2 / 60 / 60
print(f'{duration_in_hours:.2f} hours')
73.62 hours
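
For comparison with other connection speeds, the same estimate can be wrapped in a small helper. This is only a sketch; the function name and the 10 MB/s figure are illustrative and not part of the analysis above:

def download_hours(total_bytes, mb_per_second):
    # bytes -> megabytes, divide by the speed, then seconds -> hours
    return total_bytes / (1024 ** 2) / mb_per_second / 60 / 60

# a connection five times as fast would still need roughly 15 hours
print(f'{download_hours(size_in_bytes.sum(), 10):.2f} hours')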

The official forest fire season in Brandenburg is from March 1 until October 31 (source). The above calculations can be repeated for this restricted time frame:

season_start = date(2018, 3, 1)
season_end = date(2018, 10, 31)
results = api.query(footprint,
                    platformname='Sentinel-2',
                    processinglevel='Level-2A',
                    date=(season_start, season_end),
                    cloudcoverpercentage=cloud_coverage)

print(f'Found {len(results)} results')
Found 579 results
gdf = SentinelAPI.to_geodataframe(results)
size_in_bytes = gdf['size'].apply(human_readable_size_to_bytes)

total_size_in_gb = size_in_bytes.sum() / 1024 ** 3
print(f'Total size in GB: {total_size_in_gb:.2f}GB')
Total size in GB: 468.67GB
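
The fraction quoted below can be checked explicitly. The full-year total of 517.64 GB is hard-coded here because total_size_in_gb now holds the seasonal value:

full_year_gb = 517.64  # full-year total computed earlier
season_fraction = total_size_in_gb / full_year_gb
print(f'{season_fraction:.0%} of the full-year volume, '
      f'roughly {73.62 * season_fraction:.0f} hours at 2 MB/s')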

This amounts to approximately 91% of the data volume for the entire year and therefore to a download duration of roughly 67 hours. Even this figure represents only a fraction of the actual processing time needed, due to problems with data corruption and manual verification steps.