Interpretation and Visualization

This notebook shows how to access the content of the products downloaded in Download Process and plot a true color rendering. While the products already contain a True Color Image (TCI), this approach is useful for two reasons:

  1. It allows comparing the custom rendering with the rendering provided by official sources, which serves as a reference against which errors can be spotted.

  2. Generating a custom True Color Image can be useful for further image manipulations, such as changing the contrast or swapping single bands for others to highlight phenomena that are more visible at certain wavelengths.

First, the shape of Berlin is loaded from data previously downloaded from OpenStreetMap:

from pathlib import Path

import geopandas as gpd
import matplotlib.pyplot as plt
from tqdm.notebook import tqdm

resources = Path('resources')
berlin = gpd.read_file(resources / 'berlin.geojson')

fig, ax = plt.subplots(figsize=(6,6))
berlin.plot(ax=ax)
<AxesSubplot:>
../_images/01b-visualization_1_1.png

The previous notebook determined which product contains the least cloud cover.

raster_files = resources / 'true_color_image'
product = next(raster_files.glob('S2B_MSIL2A_20200602T100559*.zip'))
product
PosixPath('resources/true_color_image/S2B_MSIL2A_20200602T100559_N0214_R022_T33UUU_20200602T135324.zip')

The product path contains a lot of information:

  • S2B shows that the downloaded product was captured by the Sentinel-2 satellite B. At the moment there are two satellites in the mission, A and B.

  • MSI stands for Multi Spectral Instrument.

  • L2A is the processing level. Level 2A is the highest processing level and lower processing levels may need further processing to be useful.

  • The first timestamp, 20200602T100559, is the date and time at which the data was captured.

  • N0214 is the processing baseline. This is like a version string that encodes how the previous processing steps were performed.

  • R022 is the relative orbit number.

  • T33UUU is the unique name of the area which is covered by the product.

  • The second timestamp, 20200602T135324, “is the ‘<Product Discriminator>’ field, which […] is used to distinguish between different end user products from the same datatake.” (Source)
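
As a small illustration, the name can be split into these components programmatically. This is only a sketch; the dictionary keys below are chosen for readability and are not an official naming schema:

fields = ['mission', 'instrument_and_level', 'sensing_start', 'processing_baseline',
          'relative_orbit', 'tile', 'product_discriminator']
# `product.stem` is the filename without the `.zip` suffix
dict(zip(fields, product.stem.split('_')))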

Previewing Downloaded Files

The largest share of the product size is made up of *.jp2 files: raster images containing geocoded spectral band information at a fixed resolution. Data for each spectral band captured by the Sentinel-2 mission is distributed on a per-file basis, which is why most of these files contain only a single channel. An exception is the included True Color Image, which contains data in red, green and blue channels. There is a function that, given a resolution and one or more spectral band names, returns the correct file paths. It contains code to deal with compressed or uncompressed products.

from sentinel_helpers import scihub_band_paths
??scihub_band_paths
Signature: scihub_band_paths(p, bands, resolution=None)
Source:   
def scihub_band_paths(p, bands, resolution=None):
    '''
    Given a zip file or folder at `p`, returns the paths inside p to the raster files containing
    information for the given bands. Because some bands are available in more than one
    resolution, this can be filtered by providing a third parameter (e.g. resolution='10m').
    
    - `p` can be a string or a pathlib.Path.
    - `bands` can be a list of bands or a single band.
   
    The returned paths are formatted in the zip scheme as per Apache Commons VFS if necessary
    and can be directly opened by rasterio.
    '''
    if type(bands) != list:
        # allow passing in a single band more easily
        bands = [bands]
    
    p = Path(p) # make sure we're dealing with a pathlib.Path
    if p.suffix == '.zip':
        # when dealing with zip files we have to read the filenames from the
        # archive first
        with ZipFile(p) as f:
            files = f.namelist()
            rasters = [Path(f'zip+file://{p.absolute()}!/{f}') for f in files if f.endswith('.jp2')]
    else:
        # materialize the glob generator so it can be iterated once per band below
        rasters = list(p.glob('**/*.jp2'))
    
    # take only the paths that contain one of the given bands
    rasters = [raster for band in bands for raster in rasters if band in raster.name]
    
    # if a resolution is given, further discard the bands we don't need
    if resolution:
        rasters = [raster for raster in rasters if resolution in raster.name]
    
    return rasters
File:      ~/sources/sentinel_helpers.py
Type:      function

Working with the compressed zip file directly, while slightly inconvenient, makes sense: it saves disk space and avoids the extra step of decompressing every single downloaded product.
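
As a small illustration of this (not needed for the rest of the notebook), the archive contents can be inspected directly with the standard library, without extracting anything:

from zipfile import ZipFile

# list the raster files inside the archive without extracting it
with ZipFile(product) as archive:
    jp2_entries = [info for info in archive.infolist() if info.filename.endswith('.jp2')]
    print(f'{len(jp2_entries)} raster files in the archive')
    # show a few entries together with their compressed size
    for info in jp2_entries[:3]:
        print(f'{info.compress_size / 2**20:6.1f} MiB  {Path(info.filename).name}')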

The pre-rendered True Color Image can be plotted for an impression of the product’s contents:

low_res_tci = scihub_band_paths(product, 'TCI', '60m')[0]
low_res_tci
PosixPath('zip+file:/home/jovyan/sources/resources/true_color_image/S2B_MSIL2A_20200602T100559_N0214_R022_T33UUU_20200602T135324.zip!/S2B_MSIL2A_20200602T100559_N0214_R022_T33UUU_20200602T135324.SAFE/GRANULE/L2A_T33UUU_A016922_20200602T101204/IMG_DATA/R60m/T33UUU_20200602T100559_TCI_60m.jp2')

The rasterio library is used to open this image and render its contents:

import rasterio as r
import rasterio.plot as rplt

with r.open(low_res_tci) as tci:
    fig, ax = plt.subplots(figsize=(10,10))
    ax.grid(False)
    rplt.show(tci)
../_images/01b-visualization_10_0.png

This pattern appears repeatedly across many notebooks:

with r.open(...) as src:
    # ...

It is used to open one or more datasets at a time and close them as soon as execution leaves the block. This means data is only read inside the block, which is also where the processing usually takes place.
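
A minimal illustration of this behavior, reusing low_res_tci from above:

# the dataset can only be read while the `with` block is active;
# it is closed automatically as soon as the block is left
with r.open(low_res_tci) as src:
    print(src.closed)  # False - reading, e.g. src.read(1), is possible here
print(src.closed)      # True - the file handle has been released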

Creating a Custom True Color Composition

While the included TCI is sufficient for many use cases, knowing how to compose a True Color Image has additional merit, as explained above.

The blue, green, and red parts of the spectrum are represented in the raster files for bands 2, 3 and 4 respectively. sentinel_helpers.py contains a function wrapping scihub_band_paths to retrieve those bands at a resolution of choice:

from sentinel_helpers import scihub_bgr_paths
??scihub_bgr_paths
Signature: scihub_bgr_paths(product_path, resolution=None)
Source:   
def scihub_bgr_paths(product_path, resolution=None):
    '''
    A convenience function to return the paths to the blue, green and red bands
    in the downloaded product at `product_path`.
    '''
    return scihub_band_paths(product_path, ['B02', 'B03', 'B04'], resolution)
File:      ~/sources/sentinel_helpers.py
Type:      function
b02, b03, b04 = scihub_bgr_paths(product, '60m')
b02.name, b03.name, b04.name
('T33UUU_20200602T100559_B02_60m.jp2',
 'T33UUU_20200602T100559_B03_60m.jp2',
 'T33UUU_20200602T100559_B04_60m.jp2')

Full Range Plot

Next, the combination of these bands is plotted:

with r.open(b02) as blue, r.open(b03) as green, r.open(b04) as red:
    fig, ax = plt.subplots(figsize=(10, 10))
    ax.grid(False)
    
    # calling blue.read(1) returns the first raster band of the blue raster
    # file as a numpy array. each raster file can contain multiple bands and
    # unfortunately the different parts of the spectrum are called "bands" as well.
    rplt.show([red.read(1), green.read(1), blue.read(1)],
              # we need to pass the transform explicitly if we want the plot
              # to be labelled correctly: the numpy array does not contain
              # information about world coordinates, it only uses array indices
              transform=blue.transform)
Clipping input data to the valid range for imshow with RGB data ([0..1] for floats or [0..255] for integers).
../_images/01b-visualization_15_1.png

This plot shows an effect akin to an overexposed photograph. The intensity of the individual pixels seems too high, which can be verified by looking at the histogram of the data and printing a statistical summary.

import numpy as np

hist_args = {
    'alpha': 0.3,
    'stacked': False,
    'histtype': 'stepfilled',
    'bins': 50
}

with r.open(b02) as blue, r.open(b03) as green, r.open(b04) as red:
    fig, ax = plt.subplots(figsize=(10, 10))
    
    rgb = np.array([red.read(1), green.read(1), blue.read(1)])
    labels = ['Red spectral band', 'Green spectral band', 'Blue spectral band']
    rplt.show_hist(rgb, label=labels, ax=ax, **hist_args)
../_images/01b-visualization_17_0.png
for idx, band in enumerate(['red', 'green', 'blue']):
    print(f'{band} min:     {rgb[idx].min()}')
    print(f'{band} max:     {rgb[idx].max()}')
    print(f'{band} mean:    {rgb[idx].mean()}')
    print(f'{band} median:  {np.median(rgb[idx])}')
    print(f'{band} deciles: {np.percentile(rgb[idx], np.arange(0, 100, 10))}')
red min:     0
red max:     9057
red mean:    435.0680333840963
red median:  335.0
red deciles: [   0.    0.    0.    0.  271.  335.  417.  553.  751. 1022.]
green min:     0
green max:     8602
green mean:    482.2255373406193
green median:  478.0
green deciles: [  0.   0.   0.   0. 437. 478. 569. 687. 811. 971.]
blue min:     0
blue max:     8979
blue mean:    343.94775269491475
blue median:  317.0
blue deciles: [  0.   0.   0.   0. 283. 317. 370. 462. 571. 733.]

While the arrays are heavily sloped towards the lower end of the spectrum (the 90th percentile is around 1000 or lower), the range of values extends to over 9000. Contrast this with the Sentinel User Guide, which contains the following information about the included TCI:

The TCI is an RGB image built from the B02 (Blue), B03 (Green), and B04 (Red) Bands. The reflectances are coded between 1 and 255, 0 being reserved for ‘No Data’. The saturation level of 255 digital counts correspond to a level of 3558 for L1C products or 2000 for L2A products (0.3558 and 0.2 in reflectance value respectively).

Why then do the values in the products described above exceed this range of up to 2000?

The reflectance value - which is used in the arithmetic calculations - is a floating point number \(\in [0, 1]\). The products contain digital counts (DC) or digital numbers (DN). These values have to be converted with \(\frac{\text{DN}}{10\ 000}\), as the quote above implies.
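
As a quick sanity check of this conversion, the saturation level quoted above for L2A products can be converted back to a reflectance value:

dn = 2000            # saturation level for L2A products (digital number)
reflectance = dn / 10_000
reflectance          # 0.2, matching the reflectance value in the quote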

The sensors on the Sentinel-2 satellites capture a lot of light outside of the spectrum visible to the human eye. They are not modeled to match the sensitivity of the human eye - values which are too intense (i.e. \(> 2000\)) can therefore be discarded.1

def normalize_rgb(rgb):
    return ((np.clip(rgb, 0, 2000) / 2000) * 255).astype(np.uint8)

with r.open(b02) as blue, r.open(b03) as green, r.open(b04) as red, r.open(low_res_tci) as tci:
    fig, (ax1, ax2, ax3) = plt.subplots(1, 3, figsize=(18,6))
    fig.suptitle('True Color Image', fontsize=20)
    
    ax1.grid(False)
    ax2.grid(False)
    ax3.grid(False)
    
    transform = blue.transform
    
    # included true color image
    included_tci = tci.read()
    rplt.show(included_tci, transform=transform, ax=ax1)
    ax1.set_title('Included TCI', pad=20)
    
    # "overexposed" rgb image
    rplt.show(rgb, transform=transform, ax=ax2)
    ax2.set_title('Full RGB ND Range', pad=20)
    
    # normalized rgb image
    normalized_rgb = normalize_rgb(rgb)
    rplt.show(normalized_rgb, transform=transform, ax=ax3)
    ax3.set_title('Normalized RGB', pad=20)
    
    plt.show()
    fig.savefig(resources / 'berlin_true_color_image_naive_plot.png', dpi=300, bbox_inches='tight')
Clipping input data to the valid range for imshow with RGB data ([0..1] for floats or [0..255] for integers).
../_images/01b-visualization_20_1.png

A comparison of the histograms of included_tci and the normalized_rgb array offers more details:

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(18,9))
fig.suptitle('Histogram Comparison', fontsize=20)

rplt.show_hist(included_tci, label=labels, ax=ax1, **hist_args)
rplt.show_hist(normalized_rgb, label=labels, ax=ax2, **hist_args)

ax1.set_title('Included TCI', pad=20)
ax1.set_xlabel('Value')
ax2.set_title('Normalized RGB', pad=20)
ax2.set_xlabel('Value')
plt.savefig(resources / 'berlin_true_color_image_naive_histogram.png', dpi=300, bbox_inches='tight')
../_images/01b-visualization_22_0.png

The red, green and blue bands each have a higher peak in the prerendered TCI at a value of around 50; otherwise the curves match closely. Most of the pixels use the designated nodata value 0, which corresponds to the black stripe in the top left corner of the image.

Because the purpose of this visualization is not to create a one-to-one replica of the included TCI but rather to demonstrate how to interpret and manipulate the raster file contents, the approximation is sufficient.
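
If a quantitative comparison is nevertheless desired, a simple sketch could compute the per-band mean absolute difference between the two arrays (this assumes included_tci, normalized_rgb and labels from the previous cells are still in scope):

# per-band mean absolute difference between the included TCI and the custom
# normalized RGB composition; both arrays have shape (3, height, width) and
# are in red, green, blue order
diff = np.abs(included_tci.astype(np.int16) - normalized_rgb.astype(np.int16))
for label, band_diff in zip(labels, diff):
    print(f'{label}: mean absolute difference {band_diff.mean():.1f} of 255')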

Creating a Subset of the Area of Interest

It is rarely useful to plot the entire product, not least because the data in a product can be partially missing depending on the orbit position (see Coverage Analysis for more information).

The created image can be cropped using code adapted from the rasterio library's rio command line tool. This requires constructing a Window, a rectangle with an x- and y-offset that rasterio uses to read or write parts of a raster.
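
A small sketch may help to clarify what a Window is before applying it to the product (the numbers are arbitrary and only illustrate the interface):

from rasterio.windows import Window

# a window is defined by a column/row offset plus a width and height in pixels
w1 = Window(col_off=0, row_off=0, width=100, height=100)
w2 = Window(col_off=50, row_off=50, width=100, height=100)

# the overlapping region of both windows
w1.intersection(w2)  # Window(col_off=50, row_off=50, width=50, height=50)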

The position of the Window is calculated by transforming the area of interest berlin into the Coordinate Reference System that is used by src and then calculating the intersection:

from tempfile import NamedTemporaryFile
from rasterio.windows import Window
from tqdm.notebook import tqdm

# use a higher resolution
b02, b03, b04 = scihub_bgr_paths(product, '10m')

# this code is based on https://github.com/mapbox/rasterio/blob/d7b2dd3ae64c55978e265fa9230732e88b1dc9ae/rasterio/rio/clip.py
with r.open(b02) as blue, r.open(b03) as green, r.open(b04) as red:
    # OpenStreetMap data uses WGS84 (i.e. latitude and longitude) for its
    # coordinate system; the coordinate system used by the product depends on
    # the UTM tile which is covered
    bounds = berlin.to_crs(blue.crs).bounds.loc[0].values
    
    # create window from the bounding box of the target area
    out_window = blue.window(*bounds).round_lengths(op='floor')
    out_window = out_window.intersection(
        Window(0, 0, blue.width, blue.height) # a window identical to `src`
    )
    
    height = int(out_window.height)
    width = int(out_window.width)
    
    # arguments for writing the slice into a new file
    out_kwargs = blue.profile.copy()
    out_kwargs.update({
        # number of combined bands
        'count': 3,
        'height': height,
        'width': width,
        
        # use the transformation matrix of the intersection (the origin changed)
        'transform': blue.window_transform(out_window),

        'driver': 'COG', # save as a Cloud Optimized GeoTIFF
        'dtype': np.uint8 # the normalized values fit into unsigned 8-bit integers
    })

    # we're using a temporary file for demonstration purposes; if you want to
    # save the result somewhere else you'd probably use a `pathlib.Path`.
    with NamedTemporaryFile() as tmp:
        with r.open(tmp.name, 'w+', **out_kwargs) as out:
            src_bands = [red, green, blue]
            
            # we need to explicitly set the band which the source band is
            # written to, ranging from 1 to 3 (inclusive)
            for idx, src in tqdm(enumerate(src_bands, start=1), desc=f'Writing bands to {tmp.name}'):
                subset = normalize_rgb(src.read(
                    1,
                    window=out_window,
                    out_shape=(1, height, width)
                ))
                out.write(subset, idx)
                
            # plot the generated output
            fig, ax = plt.subplots(figsize=(20, 20))
            ax.grid(False)
            rplt.show(out.read(), transform=out_kwargs['transform'], ax=ax)
            berlin.to_crs(out_kwargs['crs']).plot(ax=ax, facecolor='none', edgecolor='#FBC15E', linewidth=2)

../_images/01b-visualization_24_2.png

The plot shows that the region of interest exceeds the boundaries of a single product. The following chapter will provide an analysis of product shapes and data availability.


1

Note that this is just one approach to plotting an image that approximates human color perception. There are other techniques that try to reconstruct human color perception, for example by working in different color spaces.