How I created a Twitter bot that posts about science fiction books
As an avid reader, I thought it would be cool to combine my love of literature and data to create a Twitter bot that posts about sci-fi books.
Specifically, I wanted to recommend science fiction books that you can read free of charge, something made possible by Project Gutenberg, a volunteer-run organisation that hosts a collection of public-domain works.
Data source/collection
Project Gutenberg (PG) very helpfully offers a CSV feed of its catalog, which I used to download the sci-fi books and load them into a SQLite database:
import csv
import io
import sqlite3

import requests

# The PG catalog CSV feed and the local SQLite database file (the filename is illustrative).
URL = "https://www.gutenberg.org/cache/epub/feeds/pg_catalog.csv"
database = "books.db"

SQL = """
INSERT INTO books_catalog (book_id, title, authors)
VALUES (?, ?, ?)
"""

# Download the catalog feed and parse it as CSV.
response = requests.get(URL, stream=True, timeout=240)
content = response.content.decode("utf-8")
csv_file = io.StringIO(content)
csv_reader = csv.reader(csv_file)

# Keep only rows that are books ("Text") shelved under Science Fiction.
sf_books = [
    row for row in csv_reader if row[1] == "Text" and "Science Fiction" in row[8]
]

# Keep only the Text#, Title and Authors fields, stripping stray newlines.
processed_sf_books = []
for book in sf_books:
    processed_book = []
    for index, field in enumerate(book):
        if index in (0, 3, 5):
            field = field.replace("\n", " ")
            field = field.replace("\r", "")
            processed_book.append(field)
    processed_sf_books.append(processed_book)

# Load the cleaned rows into the books_catalog table.
with sqlite3.connect(database) as conn:
    cursor = conn.cursor()
    cursor.executemany(SQL, processed_sf_books)
    conn.commit()
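This assumes the books_catalog table (and the books_posted table used later to track what the bot has already tweeted) already exist. I haven't reproduced the exact schema here, so the following is only a minimal sketch, with column names taken from the SQL in this post and the types guessed:

import sqlite3

# A minimal guess at the schema; only the columns referenced in this post.
SCHEMA = """
CREATE TABLE IF NOT EXISTS books_catalog (
    book_id INTEGER PRIMARY KEY,
    title   TEXT,
    authors TEXT
);
CREATE TABLE IF NOT EXISTS books_posted (
    book_id INTEGER PRIMARY KEY,
    title   TEXT
);
"""

with sqlite3.connect("books.db") as conn:
    conn.executescript(SCHEMA)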
Here's a snippet of the catalog:
Text# | Type | Issued | Title | Language | Authors | Subjects | LoCC | Bookshelves |
---|---|---|---|---|---|---|---|---|
64 | Text | 1993-05-01 | The Gods of Mars | en | Burroughs, Edgar Rice, 1875-1950 | Science fiction; Mars (Planet) -- Fiction; Life on other planets -- Fiction; Carter, John (Fictitious character) -- Fiction; Dejah Thoris (Fictitious character) -- Fiction | PS | Science Fiction |
155 | Text | 2006-01-12 | The Moonstone | en | Collins, Wilkie, 1824-1889 | England -- Fiction; Country homes -- Fiction; Police -- England -- Fiction; Jewelry theft -- Fiction; East Indians -- England -- Fiction; Mystery fiction | PR | Detective Fiction; Mystery Fiction |
Cleaning the data
The Authors field is in the "surname-first" format used in academic and scientific writing, which is unfortunately not very readable for the intended end users of this bot. For example:
Text# | Title | Authors |
---|---|---|
28767 | The Defenders | Dick, Philip K., 1928-1982; Emshwiller, Ed, 1925-1990 [Illustrator] |
To clean this data, I created a clean_authors() function that makes use of regular expressions:
import re

def clean_authors(authors):
    """Clean the authors string into a more readable string."""
    PATTERN = r"\[(Illustrator|Editor|Translator|Contributor)\]"
    # Remove years from authors' names and split on the ";" separator.
    authors = [re.sub(" [0-9]{4}-[0-9]{4}", "", author) for author in authors.split(";")]
    # Clean each individual author's name.
    cleaned_authors = []
    for author in authors:
        if (match := re.search(PATTERN, author)) is not None:
            # Strip the role tag, reorder the name, then re-append the tag.
            new_author = re.sub(PATTERN, "", author)
            new_author = [word.strip() for word in new_author.split(",") if word.strip()]
            new_author.reverse()
            new_author.append(match.group())
            new_author = " ".join(new_author)
            cleaned_authors.append(new_author)
        else:
            new_author = [word.strip() for word in author.split(",") if word.strip()]
            new_author.reverse()
            new_author = " ".join(new_author)
            cleaned_authors.append(new_author)
    # Create the final string of authors' names.
    cleaned_authors = " and ".join(cleaned_authors)
    return cleaned_authors
Using The Defenders example from before, here is the cleaned data:

Text# | Title | Authors |
---|---|---|
28767 | The Defenders | Philip K. Dick and Ed Emshwiller [Illustrator] |
Choosing a book to post about
I won't go into great detail, but I utilised SQL extensively throughout the script, including an anti-join to get the list of books that had not yet been posted about:
SELECT
    bc.*
FROM books_catalog bc
LEFT JOIN books_posted bp
    ON bc.book_id = bp.book_id
    OR bc.title = bp.title
WHERE bp.book_id IS NULL
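The bot then needs to pick one of these unposted books at random. I'm not reproducing the script here, but a minimal sketch of that step might look like this (the books.db filename and variable names are illustrative; clean_authors is the function defined above):

import random
import sqlite3

# The anti-join from above, selecting only the columns we need.
ANTI_JOIN = """
SELECT bc.book_id, bc.title, bc.authors
FROM books_catalog bc
LEFT JOIN books_posted bp
    ON bc.book_id = bp.book_id
    OR bc.title = bp.title
WHERE bp.book_id IS NULL
"""

with sqlite3.connect("books.db") as conn:
    unposted_books = conn.execute(ANTI_JOIN).fetchall()

# Pick one at random and format it for the tweet.
book_id, title, authors = random.choice(unposted_books)
recommendation = f"{title} by {clean_authors(authors)}"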
Automation
To make my Twitter bot automatically post a random book recommendation twice a day (at 7 am and 7 pm UTC), I used GitHub Actions to create a customised workflow. This allowed me to automate selecting and posting a book recommendation without any manual intervention:
name: Post book tweet

on:
  schedule:
    - cron: "0 7 * * *"
    - cron: "0 19 * * *"

permissions: write-all

jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - name: Install Python
        uses: actions/setup-python@v4
        with:
          python-version: '3.10'
      - name: Checkout repository content
        uses: actions/checkout@master
      - name: Install requirements.txt
        run: |
          python -m pip install --upgrade pip
          pip install -r requirements.txt
      - name: Set environment secrets and execute twitter_bot.py
        env:
          PG_TWITTER_ACCESS_TOKEN: ${{ secrets.PG_TWITTER_ACCESS_TOKEN }}
          PG_TWITTER_ACCESS_TOKEN_SECRET: ${{ secrets.PG_TWITTER_ACCESS_TOKEN_SECRET }}
          PG_TWITTER_BEARER_TOKEN: ${{ secrets.PG_TWITTER_BEARER_TOKEN }}
          PG_TWITTER_CONSUMER_KEY: ${{ secrets.PG_TWITTER_CONSUMER_KEY }}
          PG_TWITTER_CONSUMER_SECRET: ${{ secrets.PG_TWITTER_CONSUMER_SECRET }}
        run: |
          python src/twitter_bot.py
      - name: update repo
        run: |
          git config user.email ${{ secrets.EMAIL }}
          git config user.name "Ben"
          git config user.username ben-n93
          git config user.password ${{ secrets.PERSONAL_ACCESS_TOKEN }}
          git add --all
          git commit -m "update"
          git push
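I haven't shown twitter_bot.py itself here, and this post doesn't go into which library it uses to talk to the Twitter API. As a rough sketch only, assuming Tweepy (one common choice), the posting step might look something like this, with the credentials read from the environment variables set in the workflow above (the post_tweet() helper and the tweet wording are purely illustrative):

import os

import tweepy  # assumption: the post doesn't name the Twitter library used

def post_tweet(book_id, title, authors):
    """Post a book recommendation using the credentials set by the workflow."""
    client = tweepy.Client(
        bearer_token=os.environ["PG_TWITTER_BEARER_TOKEN"],
        consumer_key=os.environ["PG_TWITTER_CONSUMER_KEY"],
        consumer_secret=os.environ["PG_TWITTER_CONSUMER_SECRET"],
        access_token=os.environ["PG_TWITTER_ACCESS_TOKEN"],
        access_token_secret=os.environ["PG_TWITTER_ACCESS_TOKEN_SECRET"],
    )
    # Illustrative tweet text; the real bot's wording may differ.
    text = f"{title} by {authors} - read it for free: https://www.gutenberg.org/ebooks/{book_id}"
    client.create_tweet(text=text)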
To ensure that my Twitter bot could post about any new sci-fi books added to the PG catalog, I also used GitHub Actions to automate the extraction of the catalog data on the first day of every month (I won't include the YAML file here, but you can view it in my repo).
You can view the full source code on GitHub.