Reducing the size of GeoJSON files with geojson-shave
Giant GeoJSON files can be a nightmare, crashing your IDE, GIS software or browser (and potentially causing you to tear your hair out in frustration!).
I use GeoJSON files quite often so I decided to create a command-line tool that reduces the size of GeoJSON files.
You can view the project homepage here.
You can install the tool using pip:
$ pip install geojson-shave
Usage
geojson-shave reduces the size of GeoJSON files by truncating latitude/longitude coordinates to a specified number of decimal places, eliminating unnecessary whitespace and (optionally) replacing the value of each properties key with null or an empty dictionary.
Simply pass the file path of your GeoJSON file and it will truncate the coordinates to 5 decimal places, writing the output to the current working directory:
$ geojson-shave roads.geojson
Alternatively, you can specify the number of decimal places you want the coordinates truncated to:
$ geojson-shave roads.geojson -d 3
You can also restrict processing to certain Geometry object types in the file:
$ geojson-shave roads.geojson -g LineString Polygon
Note that the -g option doesn't apply to objects nested within a GeometryCollection.
And to reduce the file size even further, you can nullify the properties value of Feature objects:
$ geojson-shave roads.geojson -p
To write the output somewhere other than the current working directory:
$ geojson-shave roads.geojson -o ../data/output.geojson
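To give a feel for where the savings come from, here is a minimal Python sketch of the three reductions described above applied to a single Point Feature. It is an illustration only, not the tool's actual code: the `shrink_feature` helper, its parameters and the sample data are all made up for this example.

```python
import json


# Hypothetical helper for illustration; this is not geojson-shave's own code.
def shrink_feature(feature, precision=5, drop_properties=False):
    """Return a copy of a Point Feature with rounded coordinates."""
    geometry = feature["geometry"]
    return {
        "type": "Feature",
        "geometry": {
            "type": geometry["type"],
            "coordinates": [round(value, precision) for value in geometry["coordinates"]],
        },
        # GeoJSON allows the properties member to be null.
        "properties": None if drop_properties else feature["properties"],
    }


# Made-up sample data.
feature = {
    "type": "Feature",
    "geometry": {"type": "Point", "coordinates": [100.123456789, 0.987654321]},
    "properties": {"name": "Example", "surface": "asphalt"},
}

# A pretty-printed file, as many tools export it.
original = json.dumps(feature, indent=4)

# separators=(",", ":") drops the spaces json.dumps normally adds after commas and colons.
compact = json.dumps(
    shrink_feature(feature, precision=3, drop_properties=True),
    separators=(",", ":"),
)

print(len(original), len(compact))
print(compact)
# {"type":"Feature","geometry":{"type":"Point","coordinates":[100.123,0.988]},"properties":null}
```

On a real file the same savings repeat for every coordinate pair and every Feature, which is why they add up so quickly.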
How I did it
To fully understand how the command-line tool works you can read the source code, but to truncate coordinates I used a recursive function:
def _create_coordinates(coordinates, precision):
    """Create truncated coordinates."""
    new_coordinates = []
    for item in coordinates:
        if isinstance(item, list):
            # Nested list (e.g. a ring or a line): recurse one level deeper.
            new_coordinates.append(_create_coordinates(item, precision))
        else:
            # A number: round it to the requested number of decimal places.
            item = round(item, precision)
            new_coordinates.append(float(item))
    return new_coordinates
Because there are different types of GeoJSON Geometry objects with varying levels of nested coordinates, recursion was critical to traversing these hierarchical data structures.
For example, you can see the difference between the coordinates of a Point and those of a Polygon:
{
    "type": "Point",
    "coordinates": [100.0, 0.0]
},
{
    "type": "Polygon",
    "coordinates": [
        [
            [100.0, 0.0],
            [101.0, 0.0],
            [101.0, 1.0],
            [100.0, 1.0],
            [100.0, 0.0]
        ],
        [
            [100.8, 0.8],
            [100.8, 0.2],
            [100.2, 0.2],
            [100.2, 0.8],
            [100.8, 0.8]
        ]
    ]
}
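To see the recursion handle both shapes, here is a small self-contained sketch. `round_coordinates` is a stand-in name that mirrors the function shown earlier, and the coordinate values are made up for the example.

```python
def round_coordinates(coordinates, precision):
    """Recursively round every number in a nested coordinate list."""
    rounded = []
    for item in coordinates:
        if isinstance(item, list):
            # Another level of nesting: recurse into it.
            rounded.append(round_coordinates(item, precision))
        else:
            # A bare number: round it and store it as a float.
            rounded.append(float(round(item, precision)))
    return rounded


# A Point holds a flat pair of numbers.
point = [100.123456, 0.654321]

# A Polygon holds a list of rings, each ring a list of positions.
polygon = [[[100.123456, 0.0], [101.0, 0.0], [101.0, 1.987654], [100.123456, 0.0]]]

print(round_coordinates(point, 2))
# [100.12, 0.65]
print(round_coordinates(polygon, 2))
# [[[100.12, 0.0], [101.0, 0.0], [101.0, 1.99], [100.12, 0.0]]]
```

The same call works at any depth because the function only ever asks whether the current item is a list or a number. Note that Python's round() rounds to the nearest value rather than cutting digits off, so 1.987654 becomes 1.99 here, not 1.98.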