How LERC raster compression works
What is LERC compression?
LERC (Limited Error Raster Compression) is a format for compressing raster images.
If you’re using GDAL version 2.4 or later, you can LERC compress a raster just like you would with Deflate:
gdal_translate -co COMPRESS=LERC -co MAX_Z_ERROR=0.01 raster.tif raster.compressed.tif
LERC is a lossy compression algorithm: it can achieve very high compression ratios, but the data is modified slightly in the process. Unlike subjective lossy compression methods like JPEG, LERC provides a guarantee: the error in any pixel after compression won't exceed MAX_Z_ERROR.
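You can check that guarantee yourself with GDAL's Python bindings. Here's a minimal sketch, assuming a GDAL build that includes the LERC codec (the file names are placeholders):

```python
import numpy as np
from osgeo import gdal

max_z_error = 0.01

# Compress with LERC, equivalent to the gdal_translate command above.
gdal.Translate(
    "raster.compressed.tif",
    "raster.tif",
    creationOptions=["COMPRESS=LERC", f"MAX_Z_ERROR={max_z_error}"],
)

original = gdal.Open("raster.tif").ReadAsArray()
compressed = gdal.Open("raster.compressed.tif").ReadAsArray()

# The bounded-error guarantee: no pixel moved by more than MAX_Z_ERROR.
assert np.max(np.abs(original - compressed)) <= max_z_error
```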
As we’ll see later, LERC-compressed data has a number of additional properties that make it great for scientific analysis.
How LERC compression works
I was surprised to see that the LERC compression algorithm is fairly grokkable, so I’m including a section here on how it works. If you’re not interested, skip straight to Properties of LERC-compressed data.
tl;dr: data is rounded to the nearest multiple of 2 * MAX_Z_ERROR, then losslessly compressed.
Imagine we have this 3x4 raster we want to compress with a maximum error of 0.1:
35.254 | 35.254 | 41.039 | 50.369 |
35.254 | 35.254 | 40.837 | 48.253 |
29.836 | 35.254 | 39.598 | 46.181 |
The first step in the LERC algorithm is to rescale each pixel \(x_i\) by a factor of \(2 \cdot \mathrm{maxError}\) and shift it to start at 0.
\[s_i = \frac{x_i - \min(x)}{2 \cdot \mathrm{maxError}} = \frac{x_i - 29.836}{0.2}\]
27.090 | 27.090 | 56.015 | 102.665 |
27.090 | 27.090 | 55.005 | 92.085 |
0.000 | 27.090 | 48.810 | 81.725 |
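In numpy, this scaling step is a one-liner. This is a sketch of the idea, not the actual LERC implementation:

```python
import numpy as np

max_error = 0.1
x = np.array([
    [35.254, 35.254, 41.039, 50.369],
    [35.254, 35.254, 40.837, 48.253],
    [29.836, 35.254, 39.598, 46.181],
])

# Shift to start at 0, then rescale by 2 * maxError.
s = (x - x.min()) / (2 * max_error)
print(s)  # [[ 27.09   27.09   56.015 102.665], ...]
```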
Next, each scaled pixel \(s_i\) is rounded to the nearest integer \(r_i\). This is the only non-reversible step of the algorithm, and it’s where loss/error is introduced.
Because the original data was scaled by \(2 \cdot \mathrm{maxError}\), the rounding in this step will modify the data by a maximum of \(\mathrm{maxError}\) in either direction (in the unscaled units), preserving the bounded loss guarantee!
\[r_i = \lfloor s_i + 0.5 \rfloor\]
27 | 27 | 56 | 103 |
27 | 27 | 55 | 92 |
0 | 27 | 49 | 82 |
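Continuing the numpy sketch from above:

```python
# Round each scaled pixel to the nearest integer -- the only lossy step.
r = np.floor(s + 0.5).astype(np.uint32)
print(r)  # [[ 27  27  56 103], [ 27  27  55  92], [  0  27  49  82]]
```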
So far all we’ve done is convert our data from floating point numbers to integers. The last step is to actually compress those integers.
The integers produced by the lossy scaling process are usually far smaller than the largest 32-bit unsigned integer (4,294,967,295), so compression is done by using the fewest bits that can represent the largest scaled integer:
\[\mathrm{nBits} = \lceil \log_2 (\max(r_i) + 1) \rceil = \lceil \log_2 (104) \rceil = 7\]
Our example data can be represented as 7-bit integers instead of 32-bit floats, for a compression ratio of \(\frac{7}{32} \approx 0.22\), which is much better than standard lossless algorithms typically achieve!
0011011 | 0011011 | 0111000 | 1100111 |
0011011 | 0011011 | 0110111 | 1011100 |
0000000 | 0011011 | 0110001 | 1010010 |
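The bit width and the binary table above can be reproduced from the sketch (real LERC packs the bits into a contiguous buffer rather than printing them, of course):

```python
# Fewest bits needed to represent the largest quantized value (7 here).
n_bits = int(np.ceil(np.log2(r.max() + 1)))

# Reproduce the bit-packed table above.
for row in r:
    print(" | ".join(format(int(v), f"0{n_bits}b") for v in row))
```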
To decompress the data, you need to know \(\min(x)\), \(\mathrm{maxError}\), and \(\mathrm{nBits}\) (all of which are stored in a header by LERC) and invert the scaling:
\[\bar{x}_i = 2 \cdot \mathrm{maxError} \cdot r_i + \min(x)\]
35.236 | 35.236 | 41.036 | 50.436 |
35.236 | 35.236 | 40.836 | 48.236 |
29.836 | 35.236 | 39.636 | 46.236 |
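And the last step of the sketch, inverting the scaling:

```python
# Invert the scaling to reconstruct approximate values. Real LERC reads
# min(x), maxError, and nBits back from the header instead.
x_hat = 2 * max_error * r + x.min()
print(x_hat)      # [[35.236 35.236 41.036 50.436], ...]
print(x - x_hat)  # the errors shown below, all within +/- 0.1
```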
You can see that the new data values differ slightly from the original ones, but by no more than 0.1. Here are the errors:
0.018 | 0.018 | 0.003 | -0.067 |
0.018 | 0.018 | 0.001 | 0.017 |
0.000 | 0.018 | -0.038 | -0.055 |
How LERC compression actually works
I made a few simplifications in outlining the algorithm; here are the additional details:
- If \(\mathrm{nBits}\) is more than the number of bits used to store the original data, the data is stored uncompressed.
- If \(\mathrm{maxError}\) is 0, the data is stored uncompressed.
- Null values are removed before compression: a binary mask is built indicating which pixels are null, then that mask is compressed using run-length encoding (a toy sketch follows this list).
- Raster file formats like GeoTIFF split data into blocks/tiles/strips. LERC compresses these blocks independently: each gets its own header, null mask, and compressed data section.
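Here is the toy sketch of the null-mask idea mentioned above. The `rle` helper is hypothetical, and real LERC's mask encoding is a byte-level format, but the principle is the same:

```python
import numpy as np

def rle(bits):
    """Run-length encode a boolean sequence as (value, count) pairs."""
    runs, count = [], 1
    for prev, cur in zip(bits, bits[1:]):
        if cur == prev:
            count += 1
        else:
            runs.append((bool(prev), count))
            count = 1
    runs.append((bool(bits[-1]), count))
    return runs

data = np.array([1.5, np.nan, np.nan, np.nan, 2.1, 2.4])
valid = ~np.isnan(data)
print(rle(valid))   # [(True, 1), (False, 3), (True, 2)]
print(data[valid])  # only these values go through quantization
```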
Properties of LERC-compressed data
- LERC errors are deterministic: all pixels with a value \(z\) will have the same value \(\bar{z} = z + \mathrm{err}(z)\) after compression. This means that areas of constant value (like lakes in an elevation dataset) will still have constant value after compression.
  - This only applies within a TIFF block. You may want to increase the block size beyond the scale of constant-value features.
- When compressing random noise, errors are uniformly distributed between \(-\mathrm{maxErr}\) and \(\mathrm{maxErr}\).
  - Large areas of constant value may skew this distribution.
- Errors are introduced by rounding, so two values that differ by less than \(2 \cdot \mathrm{maxErr}\) will, after compression, either be identical or exactly \(2 \cdot \mathrm{maxErr}\) apart.
- LERC skips compression if it would increase size for that tile, so it’s fine to use opportunistically with a very small \(\mathrm{maxErr}\).
- LERC builds a compressed mask for NODATA values before compressing the valid values, so LERC works well with sparse data.
- LERC’s compression ratio depends on the range (max - min) of the values in each block. For data whose macro variations are large compared to its local noise, a tiled raster will compress better, since each tile spans a smaller range.
- LERC compression is stable: repeated compression with the same \(\mathrm{maxErr}\) value won’t change the values after the first compression (see the sketch after this list).
- There is no error on the smallest value in the data. So if you have a land-only DEM with a min value of 0 denoting water, it will still be 0 after LERC compression.
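Two of these properties, determinism and stability, are easy to demonstrate with a toy model of the lossy step (`lerc_quantize` here is a stand-in for illustration, not the real implementation):

```python
import numpy as np

def lerc_quantize(x, max_error):
    """Toy model of LERC's lossy step: snap values to a 2 * maxError grid."""
    r = np.floor((x - x.min()) / (2 * max_error) + 0.5)
    return 2 * max_error * r + x.min()

x = np.array([12.31, 12.31, 17.84, 9.02])
once = lerc_quantize(x, 0.1)
twice = lerc_quantize(once, 0.1)

# Deterministic: pixels that were equal stay equal.
assert once[0] == once[1]
# Stable: a second pass with the same maxError changes nothing.
assert np.array_equal(once, twice)
# No error on the minimum: 9.02 survives exactly.
assert once[3] == x[3]
```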
Thanks to Ákos Halmai for pointing out a mathematical mistake in this post (now fixed)!