CT to Text Workflow Development YML

In this post, we’ll explore the GitHub Actions workflow responsible for taking newly published MorphoSource CT data and converting it into readable text summaries. We’ll focus specifically on the .yml configuration. This workflow automates:

  1. Checking for new morphosource-updates releases
  2. Generating a timestamp
  3. Calling a custom Python script to transform the release content into a concise summary
  4. Publishing a new release in GitHub with the generated text

Below is the workflow file, named CT to Text, annotated step-by-step.

1. Workflow Trigger: on: workflow_run

The workflow is triggered whenever another workflow named Parse MorphoSource Data completes. This ensures the CT-to-text analysis runs right after data extraction is done. The types: [completed] indicates we only care about the completion event of that workflow.

on:
  workflow_run:
    workflows: ["Parse MorphoSource Data"]
    types: [completed]

2. Job Configuration: jobs: ct_text_job

This is where we define our main job. It runs on Ubuntu and will only proceed if the previous workflow run was successful (if: $).

ct_text_job:
  runs-on: ubuntu-latest
  if: $

3. Step 1 – Check out your repository

We use actions/checkout to clone the repository so we can access our Python script and other files.

- name: Check out repo
  uses: actions/checkout@v3

4. Step 2 – Set up Python

We specify Python version 3.9 with actions/setup-python. This ensures the environment is ready to install our dependencies.

- name: Set up Python
  uses: actions/setup-python@v4
  with:
    python-version: "3.9"

5. Step 3 – Install Dependencies

We install the required Python packages—requests, beautifulsoup4, and openai. These libraries let us handle HTTP requests, parse HTML, and call OpenAI’s API, respectively.

- name: Install Dependencies
  run: pip install requests beautifulsoup4 openai

6. Step 4 – Fetch Latest Release

Using curl, we grab the latest release from the current repository’s GitHub Releases. We store the response in a JSON file (latest_release.json) for debugging and extract key data:

We save these values as step outputs, so subsequent steps can easily reference them.

- name: Fetch Latest Release
  id: fetch_release
  env:
    GITHUB_TOKEN: $
  run: |
    response=$(curl -sSL -H "Authorization: Bearer $GITHUB_TOKEN" \
      https://api.github.com/repos/$/releases/latest)
    ...
    echo "release_body<<EOF" >> "$GITHUB_OUTPUT"
    echo "$body" >> "$GITHUB_OUTPUT"
    echo "EOF" >> "$GITHUB_OUTPUT"
    echo "release_tag=$tag_name" >> "$GITHUB_OUTPUT"

7. Step 5 – Check if morphosource-updates

We inspect the tag_name to see if it starts with morphosource-updates-. If true, we mark an output is_morphosource=true; otherwise, we skip the rest of the pipeline.

- name: Check if morphosource-updates
  id: check_morpho
  run: |
    TAG_NAME="$"
    ...
    if [[ "$TAG_NAME" == morphosource-updates-* ]]; then
      echo "is_morphosource=true" >> "$GITHUB_OUTPUT"
    else
      echo "is_morphosource=false" >> "$GITHUB_OUTPUT"
    fi

8. Step 6 – Generate Timestamp

If we have a morphosource-updates release, we create a timestamp for versioning. This step outputs a formatted date/time string like 2025-01-08_13-45-22.

- name: Generate Timestamp
  id: gen_ts
  if: steps.check_morpho.outputs.is_morphosource == 'true'
  run: |
    TS=$(date +'%Y-%m-%d_%H-%M-%S')
    echo "timestamp=$TS" >> "$GITHUB_OUTPUT"

9. Step 7 – Run CT to Text

Here we run our custom Python script, ct_to_text.py. It takes the release_body and processes it with OpenAI. The multi-line text summary is stored in ct_output.txt and then passed along as an output (description) for later steps.

- name: Run CT to Text
  id: ct2text
  if: steps.check_morpho.outputs.is_morphosource == 'true'
  env:
    OPENAI_API_KEY: $
  run: |
    echo "$" > release_body.txt
    python .github/scripts/ct_to_text.py release_body.txt > ct_output.txt

    echo "description<<EOF" >> "$GITHUB_OUTPUT"
    cat ct_output.txt >> "$GITHUB_OUTPUT"
    echo "EOF" >> "$GITHUB_OUTPUT"

10. Step 8 – Create or Update Release with AI Summary

With our summary (description) in hand, we leverage the actions/create-release action to publish a new release. Its tag is prefixed with ct_to_text_analysis-, plus the timestamp. The body of the release is the AI-generated summary.

- name: Create or Update Release with AI Summary
  if: steps.check_morpho.outputs.is_morphosource == 'true'
  uses: actions/create-release@v1
  env:
    GITHUB_TOKEN: $
  with:
    tag_name: "ct_to_text_analysis-$"
    release_name: "CT to Text Analysis #$"
    body: $
    draft: false
    prerelease: false

11. Step 9 – Fallback step if not morphosource-updates

When no new morphosource-updates tag is found, this step logs that no CT-to-text analysis is performed.

- name: No new morphosource release
  if: steps.check_morpho.outputs.is_morphosource == 'false'
  run: echo "No 'morphosource-updates-*' release found. Skipping CT-to-text analysis."

Conclusion

This CT to Text GitHub Actions workflow automates the detection of new morphosource-updates releases, runs a custom Python script for AI-based text generation, and publishes the result as a GitHub release. By breaking down each step, you can adapt and extend this workflow for additional transformations, checks, or notifications. Stay tuned for our next post, where we’ll dive into the accompanying Python script (ct_to_text.py) to see how it transforms CT metadata into a human-readable summary.


← Previous Post $~~~~~~~~~~~$ Next Post →