Ben Nour

Which Sydney LGAs have the lowest and highest unemployment rates?

Unemployment by LGA in Sydney, Australia

The Australian government's Jobs and Skills Australia has an insightful dataset I recently came across - quarterly Small Area Labour Markets (SALM) estimates of unemployment and the unemployment rate, broken out by local government area (LGA).

I decided to make an interactive map so users can see the different unemployment rates by local government area.

How I did it

I sourced a GeoJSON file of Greater Sydney LGAs from UNSW's CityData platform.

import folium
import requests
import pandas as pd

response = requests.get("https://staging.citydata.be.unsw.edu.au/geoserver/geonode/ows?service=WFS&version=1.0.0&request=GetFeature&typeName=geonode:LGAs_Sydney_and_surrounds&outputFormat=application%2Fjson")
geojson = response.json()

There was some typical data cleaning that I needed to apply to the unemployment dataset (making more readable column headers, converting the data type).

df = pd.read_csv('unemployment.csv')
df = df[['Data item', 'Local Government Area (LGA) (2023 ASGS)', 'Jun-23']]
df = df.query("`Data item` == 'Smoothed unemployment rate (%)'")
df = df.rename(columns={'Local Government Area (LGA) (2023 ASGS)':'LGA', 'Jun-23':'unemployment_rate'})
df['unemployment_rate'] = pd.to_numeric(df['unemployment_rate'], errors='coerce')

But the big issue was that unfortunately not all the LGA names in the CSV matched the LGA names in the GeoJSON file so I had manually updated these records to reflect the naming used in the GeoJSON file.

# Printing the names of the LGAs in the GeoJSON file.
lgas = []
for number in range(0, len(geojson['features'])):
    lga = geojson['features'][number]['properties']['NSW_LGA__3']
    lgas.append(lga)

lgas = list(set(lgas))
geojson_lgas = pd.Series(lgas)
geojson_lgas.name = 'LGA'
geojson_lgas
0         SUTHERLAND SHIRE
1                  BURWOOD
2                LANE COVE
3               WILLOUGHBY
4               INNER WEST
5              STRATHFIELD
6               CUMBERLAND
7           UNINCORPORATED
8                 WAVERLEY
9                  BAYSIDE
10              HAWKESBURY
11                  CAMDEN
12             WOLLONDILLY
13            CAMPBELLTOWN
14               LIVERPOOL
15          BLUE MOUNTAINS
16               FAIRFIELD
17             KU-RING-GAI
18              CANADA BAY
19                 PENRITH
20                 HORNSBY
21                    RYDE
22         THE HILLS SHIRE
23                  SYDNEY
24           CENTRAL COAST
25        NORTHERN BEACHES
26    CANTERBURY-BANKSTOWN
27            NORTH SYDNEY
28            HUNTERS HILL
29                  MOSMAN
30                RANDWICK
31               BLACKTOWN
32           GEORGES RIVER
33               WOOLLAHRA
34              PARRAMATTA
Name: LGA, dtype: object
# Updating LGA names in the unemployment DataFrame.
df['LGA'] = df['LGA'].str.upper()
df.at[1107, 'LGA'] = 'CAMPBELLTOWN'
df.at[1195, 'LGA'] = 'THE HILLS SHIRE'
df.at[1091, 'LGA'] = 'BAYSIDE'
df.at[1190, 'LGA'] = 'SUTHERLAND SHIRE'
df.at[1111, 'LGA'] = 'CENTRAL COAST'

df
Data item LGA unemployment_rate
1086 Smoothed unemployment rate (%) ALBURY 4.5
1087 Smoothed unemployment rate (%) ARMIDALE 2.8
1088 Smoothed unemployment rate (%) BALLINA 1.7
1089 Smoothed unemployment rate (%) BALRANALD 1.8
1090 Smoothed unemployment rate (%) BATHURST 1.6
... ... ... ...
1624 Smoothed unemployment rate (%) WAGAIT 2.3
1625 Smoothed unemployment rate (%) WEST ARNHEM 9.3
1626 Smoothed unemployment rate (%) WEST DALY 13.0
1627 Smoothed unemployment rate (%) UNINCORPORATED NT 5.2
1628 Smoothed unemployment rate (%) UNINCORPORATED ACT 3.0

543 rows × 3 columns

I also needed to modify and recreate the GeoJSON file, to:

  • Add the unemployment rate to the properties of each Feature so the data can be passed to a tooltip popup in the map.

  • Remove suburbs not in Greater Sydney (this dataset includes surrounding LGAs like Hawkesbury and Central Coast).

# Setting the index on the unemployment DataFrame to LGA in order to pass the unemployment rate
# to the GeoJSON properties.
df['LGA_index'] = df['LGA']
df = df.set_index('LGA_index')

new_geojson = {'type':'FeatureCollection'}
# Creating a new GeoJSON file consisiting of only Greater Sydney suburbs
# and adding unemployment_rate to the Feature properties.
features = []
for feature in geojson['features']:
    if (feature_lga:= feature['properties']['NSW_LGA__3']) not in ('UNINCORPORATED', 'CENTRAL COAST', 'BLUE MOUNTAINS', 'WOLLONDILLY', 'HAWKESBURY'):
        try:
            unemployment_rate = df.at[feature_lga, 'unemployment_rate']
            unemployment_rate = str(unemployment_rate) + '%'
            feature['properties']['unemployment_rate'] = unemployment_rate
            features.append(feature)
        except KeyError:
            pass
new_geojson['features'] = features

lgas = []
for number in range(0, len(new_geojson['features'])):
    lga = new_geojson['features'][number]['properties']['NSW_LGA__3']
    lgas.append(lga)

# Creating a Series object from the new GeoJSON file.
lgas = list(set(lgas))
geojson_lgas = pd.Series(lgas)
geojson_lgas.name = 'LGA'

Finally I just had to inner join the two DataFrames to filter out LGAs that aren't in Sydney and then I could create the map!

sydney_lgas_unemployment = pd.merge(geojson_lgas, df, left_on="LGA", right_on="LGA")
m = folium.Map(location=[-33.8688, 151.2093], zoom_start=9.5)

choropleth = folium.Choropleth(
    geo_data=new_geojson,
    data=sydney_lgas_unemployment,
    columns=["LGA", "unemployment_rate"],
    key_on="feature.properties.NSW_LGA__3",
    fill_opacity=0.7,
    line_weight=2,
    fill_color="YlOrRd",
    highlight=True,
    legend_name="Unemployment rate %"
).add_to(m)

tooltip = folium.GeoJsonTooltip(fields=["NSW_LGA__3", "unemployment_rate"], aliases=["LGA", "Unemployment rate"])
choropleth.geojson.add_child(tooltip)

m.save('sydney_unemployment.html')
m
Make this Notebook Trusted to load map: File -> Trust Notebook