Ben Nour

Reducing the size of GeoJSON files with geojson-shave


Giant GeoJSON files can be a nightmare, crashing your IDE, GIS software or browser (and potentially causing you to tear your hair out in frustration!).

I use GeoJSON files quite often so I decided to create a command-line tool that reduces the size of GeoJSON files.



You can view the project homepage here.

You can install the tool using pip:

$ pip install geojson-shave

Usage

geojson-shave reduces the size of GeoJSON files by truncating latitude/longitude coordinates to a specified number of decimal places, eliminating unnecessary whitespace and (optionally) replacing the value of the properties key with null or an empty dictionary.
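As a rough illustration of the whitespace part (this is a sketch using Python's standard json module, not necessarily how geojson-shave does it internally), compact separators alone can shrink a pretty-printed file noticeably. The sample feature below is made up for the example.

```python
import json

# Hypothetical sample Feature, used only for illustration.
feature = {
    "type": "Feature",
    "geometry": {"type": "Point", "coordinates": [100.0, 0.0]},
    "properties": {"name": "example"},
}

# Pretty-printed output, as many GeoJSON exports are written.
pretty = json.dumps(feature, indent=4)

# Compact output: no spaces after ',' or ':' and no newlines.
compact = json.dumps(feature, separators=(",", ":"))

print(len(pretty), len(compact))
print(compact)
```

Both strings parse back to the same data, so nothing is lost by dropping the whitespace.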

Simply pass the file path of your GeoJSON file and geojson-shave will truncate the coordinates to 5 decimal places, outputting to the current working directory:

$ geojson-shave roads.geojson

Alternatively, you can specify the number of decimal places you want the coordinates truncated to:

$ geojson-shave roads.geojson -d 3
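If you are unsure how many decimal places to keep, a back-of-the-envelope calculation can help. The sketch below assumes one degree is roughly 111 km on the ground (true near the equator; degrees of longitude get shorter towards the poles), and the helper name approx_resolution_m and the sample longitude are invented for this example.

```python
# Assumption: one degree is about 111.32 km at the equator.
METRES_PER_DEGREE = 111_320


def approx_resolution_m(decimal_places):
    """Approximate ground distance covered by the last kept decimal place."""
    return METRES_PER_DEGREE / 10 ** decimal_places


longitude = 144.96312345678  # hypothetical sample value

for places in (5, 3):
    shaved = round(longitude, places)
    print(places, shaved, f"~{approx_resolution_m(places):.1f} m")
```

Under that assumption, 5 decimal places keeps roughly metre-level detail, while 3 decimal places is closer to a hundred metres.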

You can also specify if you only want certain Geometry object types in the file to be processed:

$ geojson-shave roads.geojson -g LineString Polygon

Note that the -g option doesn't apply to objects nested within a GeometryCollection.

And to reduce the file size even further, you can nullify the properties value of Feature objects:

$ geojson-shave roads.geojson -p
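To show what nullifying means in practice, here is a minimal sketch. The function nullify_properties and the sample feature are hypothetical and are not taken from the tool's source; the sketch only shows the effect on a single Feature. GeoJSON allows a Feature's properties member to be null, so the result is still valid.

```python
import json


def nullify_properties(feature):
    """Return a shallow copy of a Feature with properties set to None (JSON null)."""
    shaved = dict(feature)  # copy so the input Feature is left untouched
    shaved["properties"] = None
    return shaved


# Hypothetical sample Feature with some attribute data attached.
feature = {
    "type": "Feature",
    "geometry": {"type": "Point", "coordinates": [100.0, 0.0]},
    "properties": {"name": "Main Street", "surface": "asphalt", "lanes": 2},
}

shaved = nullify_properties(feature)
print(json.dumps(shaved, separators=(",", ":")))
```

This is only worth doing when you need the shapes and not the attribute data, since the properties are gone from the output.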

You can also write the output to a path other than the current working directory:

$ geojson-shave roads.geojson -o ../data/output.geojson

How I did it

To fully understand how the command-line tool works you can read the source code, but to truncate coordinates I used a recursive function:

def _create_coordinates(coordinates, precision):
    """Create truncated coordinates."""
    new_coordinates = []
    for item in coordinates:
        if isinstance(item, list):
            # Nested list (e.g. a ring or a position): recurse into it.
            new_coordinates.append(_create_coordinates(item, precision))
        else:
            # A single number: reduce it to the requested precision.
            item = round(item, precision)
            new_coordinates.append(float(item))
    return new_coordinates

Because there are different types of GeoJSON Geometry objects with varying levels of nested coordinates, I had to use recursion, a technique I hadn't used before but had fun employing.

For example, you can see the difference between the coordinates of a Point and a Polygon object:

{
    "type": "Point",
    "coordinates": [100.0, 0.0]
},
{
    "type": "Polygon",
    "coordinates": [
        [
            [100.0, 0.0],
            [101.0, 0.0],
            [101.0, 1.0],
            [100.0, 1.0],
            [100.0, 0.0]
        ],
        [
            [100.8, 0.8],
            [100.8, 0.2],
            [100.2, 0.2],
            [100.2, 0.8],
            [100.8, 0.8]
        ]
    ]
}
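To make the nesting difference concrete, here is a small self-contained sketch that runs a function equivalent to the one above over Point-style and Polygon-style coordinates. The name round_coordinates and the sample values are illustrative assumptions. Note that round() rounds to the nearest value rather than simply chopping off digits.

```python
def round_coordinates(coordinates, precision):
    """Recursively round every number in a nested coordinate list."""
    new_coordinates = []
    for item in coordinates:
        if isinstance(item, list):
            # One level deeper: a position, a ring, or a list of rings.
            new_coordinates.append(round_coordinates(item, precision))
        else:
            # Base case: a single longitude or latitude value.
            new_coordinates.append(float(round(item, precision)))
    return new_coordinates


# Point: coordinates are a flat [longitude, latitude] pair (hypothetical values).
point = [144.9631234, -37.8136789]

# Polygon: a list of rings, each ring a list of positions (hypothetical values).
polygon = [[[100.00004, 0.00006], [101.12345678, 0.0], [100.00004, 0.00006]]]

print(round_coordinates(point, 3))    # [144.963, -37.814]
print(round_coordinates(polygon, 4))  # [[[100.0, 0.0001], [101.1235, 0.0], [100.0, 0.0001]]]
```

The same function handles both shapes because it only recurses when it meets another list, so the depth of nesting never has to be known in advance.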
