Which Sydney LGAs have the lowest and highest unemployment rates?
The Australian government's Jobs and Skills Australia has an insightful dataset I recently came across - quarterly Small Area Labour Markets (SALM) estimates of unemployment and the unemployment rate, broken out by local government area (LGA).
I decided to make an interactive map so users can see the different unemployment rates by local government area.
How I did it
I sourced a GeoJSON file of Greater Sydney LGAs from UNSW's CityData platform.
import folium
import requests
import pandas as pd
response = requests.get("https://staging.citydata.be.unsw.edu.au/geoserver/geonode/ows?service=WFS&version=1.0.0&request=GetFeature&typeName=geonode:LGAs_Sydney_and_surrounds&outputFormat=application%2Fjson")
geojson = response.json()
There was some typical data cleaning that I needed to apply to the unemployment dataset (making more readable column headers, converting the data type).
df = pd.read_csv('unemployment.csv')
df = df[['Data item', 'Local Government Area (LGA) (2023 ASGS)', 'Jun-23']]
df = df.query("`Data item` == 'Smoothed unemployment rate (%)'")
df = df.rename(columns={'Local Government Area (LGA) (2023 ASGS)':'LGA', 'Jun-23':'unemployment_rate'})
df['unemployment_rate'] = pd.to_numeric(df['unemployment_rate'], errors='coerce')
But the big issue was that unfortunately not all the LGA names in the CSV matched the LGA names in the GeoJSON file so I had manually updated these records to reflect the naming used in the GeoJSON file.
# Printing the names of the LGAs in the GeoJSON file.
lgas = []
for number in range(0, len(geojson['features'])):
lga = geojson['features'][number]['properties']['NSW_LGA__3']
lgas.append(lga)
lgas = list(set(lgas))
geojson_lgas = pd.Series(lgas)
geojson_lgas.name = 'LGA'
geojson_lgas
0 SUTHERLAND SHIRE
1 BURWOOD
2 LANE COVE
3 WILLOUGHBY
4 INNER WEST
5 STRATHFIELD
6 CUMBERLAND
7 UNINCORPORATED
8 WAVERLEY
9 BAYSIDE
10 HAWKESBURY
11 CAMDEN
12 WOLLONDILLY
13 CAMPBELLTOWN
14 LIVERPOOL
15 BLUE MOUNTAINS
16 FAIRFIELD
17 KU-RING-GAI
18 CANADA BAY
19 PENRITH
20 HORNSBY
21 RYDE
22 THE HILLS SHIRE
23 SYDNEY
24 CENTRAL COAST
25 NORTHERN BEACHES
26 CANTERBURY-BANKSTOWN
27 NORTH SYDNEY
28 HUNTERS HILL
29 MOSMAN
30 RANDWICK
31 BLACKTOWN
32 GEORGES RIVER
33 WOOLLAHRA
34 PARRAMATTA
Name: LGA, dtype: object
# Updating LGA names in the unemployment DataFrame.
df['LGA'] = df['LGA'].str.upper()
df.at[1107, 'LGA'] = 'CAMPBELLTOWN'
df.at[1195, 'LGA'] = 'THE HILLS SHIRE'
df.at[1091, 'LGA'] = 'BAYSIDE'
df.at[1190, 'LGA'] = 'SUTHERLAND SHIRE'
df.at[1111, 'LGA'] = 'CENTRAL COAST'
df
Data item | LGA | unemployment_rate | |
---|---|---|---|
1086 | Smoothed unemployment rate (%) | ALBURY | 4.5 |
1087 | Smoothed unemployment rate (%) | ARMIDALE | 2.8 |
1088 | Smoothed unemployment rate (%) | BALLINA | 1.7 |
1089 | Smoothed unemployment rate (%) | BALRANALD | 1.8 |
1090 | Smoothed unemployment rate (%) | BATHURST | 1.6 |
... | ... | ... | ... |
1624 | Smoothed unemployment rate (%) | WAGAIT | 2.3 |
1625 | Smoothed unemployment rate (%) | WEST ARNHEM | 9.3 |
1626 | Smoothed unemployment rate (%) | WEST DALY | 13.0 |
1627 | Smoothed unemployment rate (%) | UNINCORPORATED NT | 5.2 |
1628 | Smoothed unemployment rate (%) | UNINCORPORATED ACT | 3.0 |
543 rows × 3 columns
I also needed to modify and recreate the GeoJSON file, to:
-
Add the unemployment rate to the properties of each Feature so the data can be passed to a tooltip popup in the map.
-
Remove suburbs not in Greater Sydney (this dataset includes surrounding LGAs like Hawkesbury and Central Coast).
# Setting the index on the unemployment DataFrame to LGA in order to pass the unemployment rate
# to the GeoJSON properties.
df['LGA_index'] = df['LGA']
df = df.set_index('LGA_index')
new_geojson = {'type':'FeatureCollection'}
# Creating a new GeoJSON file consisiting of only Greater Sydney suburbs
# and adding unemployment_rate to the Feature properties.
features = []
for feature in geojson['features']:
if (feature_lga:= feature['properties']['NSW_LGA__3']) not in ('UNINCORPORATED', 'CENTRAL COAST', 'BLUE MOUNTAINS', 'WOLLONDILLY', 'HAWKESBURY'):
try:
unemployment_rate = df.at[feature_lga, 'unemployment_rate']
unemployment_rate = str(unemployment_rate) + '%'
feature['properties']['unemployment_rate'] = unemployment_rate
features.append(feature)
except KeyError:
pass
new_geojson['features'] = features
lgas = []
for number in range(0, len(new_geojson['features'])):
lga = new_geojson['features'][number]['properties']['NSW_LGA__3']
lgas.append(lga)
# Creating a Series object from the new GeoJSON file.
lgas = list(set(lgas))
geojson_lgas = pd.Series(lgas)
geojson_lgas.name = 'LGA'
Finally I just had to inner join the two DataFrames to filter out LGAs that aren't in Sydney and then I could create the map!
sydney_lgas_unemployment = pd.merge(geojson_lgas, df, left_on="LGA", right_on="LGA")
m = folium.Map(location=[-33.8688, 151.2093], zoom_start=9.5)
choropleth = folium.Choropleth(
geo_data=new_geojson,
data=sydney_lgas_unemployment,
columns=["LGA", "unemployment_rate"],
key_on="feature.properties.NSW_LGA__3",
fill_opacity=0.7,
line_weight=2,
fill_color="YlOrRd",
highlight=True,
legend_name="Unemployment rate %"
).add_to(m)
tooltip = folium.GeoJsonTooltip(fields=["NSW_LGA__3", "unemployment_rate"], aliases=["LGA", "Unemployment rate"])
choropleth.geojson.add_child(tooltip)
m.save('sydney_unemployment.html')
m