Ben Nour

How I created a Twitter bot that posts about science fiction books

As an avid reader, I thought it would be cool to combine my love of literature and data to create a Twitter bot that posts about sci-fi books:

[Image: Twitter-bot]

Specifically, I wanted to recommend science fiction books that you can read free of charge, which is possible thanks to Project Gutenberg (PG), a volunteer-run organisation that hosts a collection of public domain works.

Data source/collection

Project Gutenberg very helpfully offers a CSV feed of its catalog, which I used to download the sci-fi books and load them into a SQLite database:

import csv
import io
import sqlite3

import requests

SQL = """
INSERT INTO books_catalog (book_id, title, authors)
VALUES (?, ?, ?)
"""

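# URL holds the address of the CSV feed (its definition is not shown here).
response = requests.get(URL, timeout=240)
response.raise_for_status()  # Stop early if the download failed.
content = response.content.decode("utf-8")
csv_file = io.StringIO(content)
csv_reader = csv.reader(csv_file)
# Keep text works on the "Science Fiction" bookshelf
# (column 1 is Type, column 8 is Bookshelves).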
sf_books = [
    row for row in csv_reader if row[1] == "Text" and "Science Fiction" in row[8]
]
processed_sf_books = []
for book in sf_books:
    processed_book = []
    for index, field in enumerate(book):
        if index in (0, 3, 5):
            field = field.replace("\n", " ")
            field = field.replace("\r", "")
            processed_book.append(field)
    processed_sf_books.append(processed_book)

with sqlite3.connect(database) as conn:
    cursor = conn.cursor()
    cursor.executemany(SQL, processed_sf_books)
    conn.commit()

Here's a snippet of the catalog:

Text# | Type | Issued | Title | Language | Authors | Subjects | LoCC | Bookshelves
64 | Text | 1993-05-01 | The Gods of Mars | en | Burroughs, Edgar Rice, 1875-1950 | Science fiction; Mars (Planet) -- Fiction; Life on other planets -- Fiction; Carter, John (Fictitious character) -- Fiction; Dejah Thoris (Fictitious character) -- Fiction | PS | Science Fiction
155 | Text | 2006-01-12 | The Moonstone | en | Collins, Wilkie, 1824-1889 | England -- Fiction; Country homes -- Fiction; Police -- England -- Fiction; Jewelry theft -- Fiction; East Indians -- England -- Fiction; Mystery fiction | PR | Detective Fiction; Mystery Fiction
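
To make the filtering step concrete, here is a minimal offline sketch that runs the same logic on two sample rows instead of the live feed. The `SAMPLE_CSV` string (based on the snippet above, with the Subjects shortened), the in-memory database and the `CREATE TABLE` statement are all assumptions for illustration; the real table definition isn't shown in this post.

```python
import csv
import io
import sqlite3

# Two rows based on the catalog snippet, written as CSV (Subjects shortened).
SAMPLE_CSV = '''Text#,Type,Issued,Title,Language,Authors,Subjects,LoCC,Bookshelves
64,Text,1993-05-01,The Gods of Mars,en,"Burroughs, Edgar Rice, 1875-1950","Science fiction; Mars (Planet) -- Fiction",PS,Science Fiction
155,Text,2006-01-12,The Moonstone,en,"Collins, Wilkie, 1824-1889","England -- Fiction; Mystery fiction",PR,Detective Fiction; Mystery Fiction
'''

rows = csv.reader(io.StringIO(SAMPLE_CSV))
# Same filter as in the script: text works on the "Science Fiction" bookshelf.
sf_rows = [row for row in rows if row[1] == "Text" and "Science Fiction" in row[8]]
# Keep Text#, Title and Authors (columns 0, 3 and 5).
sample_books = [(row[0], row[3], row[5]) for row in sf_rows]

# Assumed schema: only the column names come from the INSERT statement.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE books_catalog (book_id INTEGER, title TEXT, authors TEXT)")
conn.executemany(
    "INSERT INTO books_catalog (book_id, title, authors) VALUES (?, ?, ?)",
    sample_books,
)
stored = conn.execute("SELECT book_id, title, authors FROM books_catalog").fetchall()
print(stored)
```

Only The Gods of Mars ends up in the table: the header row fails the `Type` check, and The Moonstone sits on different bookshelves.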

Cleaning the data

The Authors field is in the "surname-first" format (used in academic and scientific writing), which is unfortunately not very readable for the intended end users of this bot. For example:

Text# | Title | Authors
28767 | The Defenders | Dick, Philip K., 1928-1982; Emshwiller, Ed, 1925-1990 [Illustrator]

To clean this data, I created a clean_authors() function that uses regular expressions:

import re


def clean_authors(authors):
    """Clean the authors string into a more
    readable string."""

    # Raw string, so the backslashes reach the regex engine untouched.
    PATTERN = r"\[(Illustrator|Editor|Translator|Contributor)\]"

    # Remove years from authors' names and split.
    authors = [re.sub(" [0-9]{4}-[0-9]{4}", "", author) for author in authors.split(";")]
    # Clean each individual author's name.
    cleaned_authors = []
    for author in authors:
        match = re.search(PATTERN, author)
        # Drop the role tag, then turn "Surname, Forenames" into "Forenames Surname".
        # Parts that are empty or only whitespace (left behind once the years
        # are removed) are skipped so they don't add stray spaces.
        name = re.sub(PATTERN, "", author)
        new_author = [word.strip() for word in name.split(",") if word.strip()]
        new_author.reverse()
        if match is not None:
            # Put the role tag back after the name.
            new_author.append(match.group())
        cleaned_authors.append(" ".join(new_author))
    # Create final string of authors' names.
    return " and ".join(cleaned_authors)

Using The Defenders example from before, here is the cleaned data:

[Image: Twitter-bot]
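
For a text version of that check, here is a condensed sketch of the same idea run on the author strings quoted in this post. `tidy_authors` is a hypothetical stand-in written for this example, not the function from the repo, and it assumes every entry has the "Surname, Forenames, birth-death" shape.

```python
import re

ROLE = re.compile(r"\[(Illustrator|Editor|Translator|Contributor)\]")
YEARS = re.compile(r" [0-9]{4}-[0-9]{4}")


def tidy_authors(authors):
    """Hypothetical helper: turn 'Surname, Forenames, years' into 'Forenames Surname'."""
    cleaned = []
    for author in authors.split(";"):
        author = YEARS.sub("", author)
        role = ROLE.search(author)
        # Split on commas, drop empty parts, and put the surname last.
        parts = [part.strip() for part in ROLE.sub("", author).split(",") if part.strip()]
        parts.reverse()
        if role:
            # Keep the role tag, placed after the name.
            parts.append(role.group())
        cleaned.append(" ".join(parts))
    return " and ".join(cleaned)


defenders = tidy_authors(
    "Dick, Philip K., 1928-1982; Emshwiller, Ed, 1925-1990 [Illustrator]"
)
print(defenders)  # Philip K. Dick and Ed Emshwiller [Illustrator]
print(tidy_authors("Burroughs, Edgar Rice, 1875-1950"))  # Edgar Rice Burroughs
print(tidy_authors("Collins, Wilkie, 1824-1889"))  # Wilkie Collins
```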

Choosing a book to post about

I won't go into great detail, but I used SQL extensively throughout the script, including an anti-join to get the list of books that had not yet been posted about:

SELECT
    bc.*
FROM books_catalog bc
    LEFT JOIN books_posted bp
    ON bc.book_id = bp.book_id
    OR bc.title = bp.title
WHERE bp.book_id IS NULL
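
To see the anti-join in action, here is a small self-contained sketch against an in-memory SQLite database. The `books_posted` columns, the sample rows (including the made-up ID 99999 for a second edition of the same title) and the `ORDER BY RANDOM()` pick are assumptions for illustration; the post doesn't show how the bot actually chooses among the unposted books.

```python
import sqlite3

ANTI_JOIN = """
SELECT bc.*
FROM books_catalog bc
    LEFT JOIN books_posted bp
    ON bc.book_id = bp.book_id
    OR bc.title = bp.title
WHERE bp.book_id IS NULL
"""

conn = sqlite3.connect(":memory:")
# Assumed schemas and sample data, for illustration only.
conn.executescript(
    """
    CREATE TABLE books_catalog (book_id INTEGER, title TEXT, authors TEXT);
    CREATE TABLE books_posted (book_id INTEGER, title TEXT);
    INSERT INTO books_catalog VALUES
        (64, 'The Gods of Mars', 'Burroughs, Edgar Rice, 1875-1950'),
        (28767, 'The Defenders', 'Dick, Philip K., 1928-1982'),
        (99999, 'The Defenders', 'Dick, Philip K., 1928-1982');
    INSERT INTO books_posted VALUES (28767, 'The Defenders');
    """
)

# Catalog rows with no match in books_posted, by ID or by title.
unposted = conn.execute(ANTI_JOIN).fetchall()
# One possible way to pick a random unposted book.
random_pick = conn.execute(ANTI_JOIN + " ORDER BY RANDOM() LIMIT 1").fetchone()
print(unposted)
print(random_pick)
```

Matching on the title as well as the ID means a second catalog entry for an already-posted title is excluded too, which is why only The Gods of Mars is left here.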

Automation

To have the bot post a random book recommendation twice a day (at 7am and 7pm UTC), I used GitHub Actions to create a scheduled workflow, so a book is selected and posted without any manual intervention:

name: Post book tweet

on: 
  schedule: 
  - cron: "0 7 * * *"
  - cron: "0 19 * * *"

permissions: write-all

jobs:
  build:
    runs-on: ubuntu-latest
    steps:
    - name: Install Python
      uses: actions/setup-python@v4
      with:
        python-version: '3.10'
    - name: Checkout repository content
      uses: actions/checkout@master
    - name: Install requirements.txt
      run: |
        python -m pip install --upgrade pip
        pip install -r requirements.txt
    - name: Set environment secrets and execute twitter_bot.py
      env:
          PG_TWITTER_ACCESS_TOKEN: ${{ secrets.PG_TWITTER_ACCESS_TOKEN }}
          PG_TWITTER_ACCESS_TOKEN_SECRET: ${{ secrets.PG_TWITTER_ACCESS_TOKEN_SECRET }}
          PG_TWITTER_BEARER_TOKEN: ${{ secrets.PG_TWITTER_BEARER_TOKEN }}
          PG_TWITTER_CONSUMER_KEY: ${{ secrets.PG_TWITTER_CONSUMER_KEY }}
          PG_TWITTER_CONSUMER_SECRET: ${{ secrets.PG_TWITTER_CONSUMER_SECRET }}
      run: |
          python src/twitter_bot.py
    - name: update repo
      run: |
        git config user.email ${{ secrets.EMAIL }}
        git config user.name "Ben"
        git config user.username ben-n93
        git config user.password ${{ secrets.PERSONAL_ACCESS_TOKEN}}
        git add --all
        git commit -m "update"
        git push
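
On the Python side, the script can pick these secrets up from the environment. The sketch below is one way to do that; `load_credentials` is a hypothetical helper (not taken from the repo), and only the variable names come from the workflow above. Failing early with a clear message should make a misconfigured secret easier to spot in the Actions log.

```python
import os

# Environment variable names set in the workflow's `env` block.
REQUIRED_VARS = (
    "PG_TWITTER_ACCESS_TOKEN",
    "PG_TWITTER_ACCESS_TOKEN_SECRET",
    "PG_TWITTER_BEARER_TOKEN",
    "PG_TWITTER_CONSUMER_KEY",
    "PG_TWITTER_CONSUMER_SECRET",
)


def load_credentials(environ=os.environ):
    """Hypothetical helper: return the secrets as a dict, or raise if any are missing."""
    missing = [name for name in REQUIRED_VARS if not environ.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {name: environ[name] for name in REQUIRED_VARS}


# Demonstration with placeholder values instead of real secrets.
fake_env = {name: "dummy-value" for name in REQUIRED_VARS}
credentials = load_credentials(fake_env)
print(sorted(credentials))
```

In the real script the call would simply be `load_credentials()`, which reads from `os.environ`.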

To make sure the bot can also post about new sci-fi books added to the PG catalog, I used GitHub Actions to automate the extraction of the catalog data on the first day of every month (I won't include the YAML file here, but you can view it in my repo).

You can view the full source code on GitHub.
