Ben Nour

Reducing the size of GeoJSON files with geojson-shave


Giant GeoJSON files can be a nightmare, crashing your IDE, GIS software or browser (and potentially causing you to tear your hair out in frustration!).

I use GeoJSON files quite often so I decided to create a command-line tool that reduces the size of GeoJSON files.



You can view the project homepage here.

You can install the tool using pip:

$ pip install geojson-shave

Usage

geojson-shave reduces the size of GeoJSON files by truncating latitude/longitude coordinates to the specified number of decimal places, eliminating unnecessary whitespace and (optionally) replacing the value of the properties key with null or an empty dictionary.
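To give a sense of how much whitespace alone contributes, here is a minimal sketch using only Python's standard json module. It is not taken from geojson-shave's source; the sample feature and the variable names are made up for illustration.

```python
import json

# Hypothetical sample feature, for illustration only.
feature = {
    "type": "Feature",
    "geometry": {"type": "Point", "coordinates": [100.123456789, 0.987654321]},
    "properties": {"name": "Example"},
}

# Pretty-printed output carries indentation and newlines.
pretty = json.dumps(feature, indent=4)

# Compact separators drop the spaces after ',' and ':'.
compact = json.dumps(feature, separators=(",", ":"))

print(len(pretty), len(compact))
```

Both strings parse back to the same object, so the only thing lost is formatting.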

Simply pass the file path of your GeoJSON file and it will truncate the coordinates to 5 decimal places, outputting to the current working directory:

$ geojson-shave roads.geojson

Alternatively, you can specify the number of decimal places you want the coordinates truncated to:

$ geojson-shave roads.geojson -d 3
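When choosing a value for -d, it can help to know roughly what each decimal place is worth on the ground. The sketch below is a back-of-the-envelope calculation, not part of geojson-shave; it assumes the common rule-of-thumb figure of about 111,320 metres per degree of latitude, and the function name is hypothetical.

```python
# Approximate metres per degree of latitude (rule-of-thumb figure, assumed here).
METRES_PER_DEGREE = 111_320


def approx_resolution_metres(decimal_places):
    """Rough ground distance represented by the last decimal place kept."""
    return METRES_PER_DEGREE * 10 ** -decimal_places


for places in (3, 5, 7):
    print(places, round(approx_resolution_metres(places), 4))
```

By this estimate 3 decimal places is on the order of 100 metres and 5 decimal places is on the order of 1 metre. A degree of longitude covers less ground away from the equator, so treat these figures as an upper bound for longitude.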

You can also specify if you only want certain Geometry object types in the file to be processed:

$ geojson-shave roads.geojson -g LineString Polygon

Note that the -g option doesn't apply to objects nested within a GeometryCollection.

And to reduce the file size even further you can nullify the properties value of Feature objects:

$ geojson-shave roads.geojson -p
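For a rough picture of what nullifying properties means, here is a minimal sketch. It is an illustration only and not the tool's actual implementation; the helper name nullify_properties and the sample data are hypothetical. The GeoJSON specification allows a Feature's properties member to be JSON null, which is None in Python.

```python
import copy


def nullify_properties(geojson_object):
    """Return a copy with each Feature's properties set to None (hypothetical helper)."""
    result = copy.deepcopy(geojson_object)
    if result.get("type") == "Feature":
        result["properties"] = None
    elif result.get("type") == "FeatureCollection":
        for feature in result.get("features", []):
            feature["properties"] = None
    return result


# Hypothetical sample data, for illustration only.
collection = {
    "type": "FeatureCollection",
    "features": [
        {
            "type": "Feature",
            "geometry": {"type": "Point", "coordinates": [100.0, 0.0]},
            "properties": {"name": "Example road", "lanes": 2},
        }
    ],
}

shaved = nullify_properties(collection)
print(shaved["features"][0]["properties"])
```

The geometry is left untouched, so this is only worth doing when you don't need the attribute data downstream.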

To write the output somewhere other than the current working directory, pass an output path:

$ geojson-shave roads.geojson -o ../data/output.geojson

How I did it

To fully understand how the command-line tool works you can read the source code, but to truncate coordinates I used a recursive function:

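def _create_coordinates(coordinates, precision):
    """Create truncated coordinates (values are rounded to `precision` decimal places)."""
    new_coordinates = []
    for item in coordinates:
        if isinstance(item, list):
            # Nested list (e.g. a ring or a list of rings): recurse one level deeper.
            new_coordinates.append(_create_coordinates(item, precision))
        else:
            # A single number: round it, then make sure it is stored as a float.
            item = round(item, precision)
            new_coordinates.append(float(item))
    return new_coordinates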

Because there are different types of GeoJSON Geometry objects with varying levels of nested coordinates, recursion was critical to traversing these hierarchical data structures.

For example, you can see the difference between the coordinates of a Point and those of a Polygon:

{
    "type": "Point",
    "coordinates": [100.0, 0.0]
},
{
    "type": "Polygon",
    "coordinates": [
        [
            [100.0, 0.0],
            [101.0, 0.0],
            [101.0, 1.0],
            [100.0, 1.0],
            [100.0, 0.0]
        ],
        [
            [100.8, 0.8],
            [100.8, 0.2],
            [100.2, 0.2],
            [100.2, 0.8],
            [100.8, 0.8]
        ]
    ]
}
