Fixing sparse .xyz files
I’ve been working with the Northern Ireland DTM dataset, which is provided as .txt
files like this:
293465.0000 444005.0000 1.6954
293475.0000 444005.0000 0.0746
293485.0000 444005.0000 0.1014
...
At first glance it looks like an .xyz file, which is supported by gdal. However, gdal complains when parsing them:
gdal_translate -a_srs EPSG:29903 Sheet001v4.txt Sheet001v4.tif
ERROR 1: Ungridded dataset: At line 38676, too many stepY values
It turns out that .xyz files cannot be sparse: they have to contain a row for every x, y coordinate combination, while the NI DTM files omit cells with missing data.
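You can confirm the gaps with a quick row count: a fully gridded file needs one row per (x, y) pair, so a check along these lines (using the 10 m resolution of the sheets) should report fewer rows than a full grid needs:
import pandas as pd
# Count the rows actually present versus the size of a complete grid.
df = pd.read_csv('Sheet001v4.txt', sep=r'\s+', header=None, names=['x', 'y', 'z'])
resolution = 10
x_count = round((df.x.max() - df.x.min()) / resolution) + 1
y_count = round((df.y.max() - df.y.min()) / resolution) + 1
print(f'{len(df)} rows present, {x_count * y_count} needed for a full grid')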
I’m not the first person to come across this issue: there are a couple of answers on the GIS Stack Exchange suggesting gdal_grid, but I couldn’t get any of them to work properly.
Instead, I found it easier and faster to fill in the missing cells with pandas in python:
import numpy as np
import pandas as pd
# Setup.
input_file = 'Sheet001v4.txt'
output_file = 'Sheet001v4.csv'
resolution = 10
nodata_value = -9999
# Load txt file.
df = pd.read_csv(input_file, sep=r'\s+', header=None, names=['x', 'y', 'z'])
# Figure out which x and y values are needed.
x_vals = set(np.arange(df.x.min(), df.x.max() + resolution, resolution))
y_vals = set(np.arange(df.y.min(), df.y.max() + resolution, resolution))
# For each x value, find any missing y values, and add a NODATA row.
dfs = [df]
for x in x_vals:
    y_vals_missing = y_vals - set(df[df.x == x].y)
    if y_vals_missing:
        df_missing = pd.DataFrame({'x': x, 'y': sorted(y_vals_missing), 'z': nodata_value})
        dfs.append(df_missing)
# Build the full grid, and sort by y then x, as the xyz format expects.
df = pd.concat(dfs, ignore_index=True)
df = df.sort_values(['y', 'x'])
# Check.
assert len(df) == len(x_vals) * len(y_vals)
# Output.
df.to_csv(output_file, index=False, header=False)
The resulting file can be used with gdal normally.
gdal_translate -a_srs EPSG:29903 -a_nodata -9999 Sheet001v4.csv Sheet001v4.tif
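If you have more than one sheet to convert, the fill-in step can be wrapped in a function and run in a loop; a rough sketch along these lines should work (fill_sheet and the Sheet*.txt pattern are assumptions, not part of the dataset):
import glob
import subprocess
# Hypothetical wrapper: fill_sheet(txt_path, csv_path) is the pandas script
# above turned into a function.
for txt_path in sorted(glob.glob('Sheet*.txt')):
    csv_path = txt_path.replace('.txt', '.csv')
    tif_path = txt_path.replace('.txt', '.tif')
    fill_sheet(txt_path, csv_path)
    subprocess.run([
        'gdal_translate', '-a_srs', 'EPSG:29903',
        '-a_nodata', '-9999', csv_path, tif_path,
    ], check=True)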