Mean array scaling in Python

scikit-image’s resize_local_mean function is an easy way to rescale a 2D numpy array, when you want the value of the new pixels to represent the mean of the pixels they were covering:

import numpy as np
import skimage.transform

# Small test array.
x = np.linspace(0, 3, 7)
y = np.linspace(0, 1, 5)
xx, yy = np.meshgrid(x, y)
a = xx + yy
new_shape = (3, 4)

# Do the resize.
a_resized = skimage.transform.resize_local_mean(a, new_shape, grid_mode=True)

However, resize_local_mean runs very slowly on large arrays.

OpenCV has a similar algorithm that scales much better to large arrays! The function just becomes a little more complicated:

  • The dsize argument is reversed compared to the numpy ecosystem.
  • cv2.resize expects a normalised array with values between 0 and 1.
  • The normalisation also means checking for an ill-conditioned array.
import cv2


def opencv_resize_local_mean(a: np.ndarray, new_shape: tuple[int, int]) -> np.ndarray:
    a_min = np.min(a)
    a_max = np.max(a)

    # Early exit to avoid divide by zero.
    if a_min == a_max:
        return np.zeros(new_shape) + a_min

    # Rescale the array to [0, 1].
    a_norm = (a - a_min) / (a_max - a_min)
    a_norm = a_norm.astype(np.float32)

    # Do the resizing.
    new_shape_r = [new_shape[1], new_shape[0]]
    res_norm = cv2.resize(a_norm, new_shape_r, interpolation=cv2.INTER_AREA)

    # Unscale the array.
    res = res_norm * (a_max - a_min) + a_min
    assert np.array_equal(res.shape, new_shape)

For a test resizing of a 10,000 px DEM to 3,600 px, the average error between the two solutions is small. The max error was about 2 mm.

diff = res_skimage - res_cv2
np.mean(diff): 1.108501658070558e-05
np.median(diff): -7.279990974495831e-07
np.max(diff): 0.01960058534207576
np.min(diff): -1.3228989928393275e-05

The runtime is a lot better also for large arrays:

5,000px -> 3600px
  resize_local_mean 3.482 [s]
  cv2_area 0.229 [s]

10,000px -> 3,600px
  m_resize_local_mean 11.541 [s]
  cv2_area 0.743 [s]

20,000px -> 3,600px
  m_resize_local_mean 33.303 [s]
  cv2_area 2.822 [s]

And finally, the CPU usage is a lot lower also. While the latency of a single invocation is about 10x faster with OpenCV, if you’re saturating your CPU running these calculations (as I was!) I found my throughput increased by about 50x.


DEMs as a service

GPXZ’s raster API lets you specify your desired bounds and resolution: we do the resizing and warping for you!

If you need high-quality elevation DEMs, get in touch at [email protected]