CT to Text Workflow Development YML
In this post, we’ll explore the GitHub Actions workflow responsible for taking newly published MorphoSource CT data and converting it into readable text summaries. We’ll focus specifically on the .yml configuration. This workflow automates:
- Checking for new morphosource-updates releases
- Generating a timestamp
- Calling a custom Python script to transform the release content into a concise summary
- Publishing a new release in GitHub with the generated text
Below is the workflow file, named CT to Text, annotated step-by-step.
1. Workflow Trigger: on: workflow_run
The workflow is triggered whenever another workflow named Parse MorphoSource Data completes. This ensures the CT-to-text analysis runs right after data extraction is done. The types: [completed] indicates we only care about the completion event of that workflow.
on:
workflow_run:
workflows: ["Parse MorphoSource Data"]
types: [completed]
2. Job Configuration: jobs: ct_text_job
This is where we define our main job. It runs on Ubuntu and will only proceed if the previous workflow run was successful (if: $).
ct_text_job:
runs-on: ubuntu-latest
if: $
3. Step 1 – Check out your repository
We use actions/checkout to clone the repository so we can access our Python script and other files.
- name: Check out repo
uses: actions/checkout@v3
4. Step 2 – Set up Python
We specify Python version 3.9 with actions/setup-python. This ensures the environment is ready to install our dependencies.
- name: Set up Python
uses: actions/setup-python@v4
with:
python-version: "3.9"
5. Step 3 – Install Dependencies
We install the required Python packages—requests, beautifulsoup4, and openai. These libraries let us handle HTTP requests, parse HTML, and call OpenAI’s API, respectively.
- name: Install Dependencies
run: pip install requests beautifulsoup4 openai
6. Step 4 – Fetch Latest Release
Using curl, we grab the latest release from the current repository’s GitHub Releases. We store the response in a JSON file (latest_release.json) for debugging and extract key data:
- body: The release’s full text body
- tag_name: A short name or version for that release
We save these values as step outputs, so subsequent steps can easily reference them.
- name: Fetch Latest Release
id: fetch_release
env:
GITHUB_TOKEN: $
run: |
response=$(curl -sSL -H "Authorization: Bearer $GITHUB_TOKEN" \
https://api.github.com/repos/$/releases/latest)
...
echo "release_body<<EOF" >> "$GITHUB_OUTPUT"
echo "$body" >> "$GITHUB_OUTPUT"
echo "EOF" >> "$GITHUB_OUTPUT"
echo "release_tag=$tag_name" >> "$GITHUB_OUTPUT"
7. Step 5 – Check if morphosource-updates
We inspect the tag_name to see if it starts with morphosource-updates-. If true, we mark an output is_morphosource=true; otherwise, we skip the rest of the pipeline.
- name: Check if morphosource-updates
id: check_morpho
run: |
TAG_NAME="$"
...
if [[ "$TAG_NAME" == morphosource-updates-* ]]; then
echo "is_morphosource=true" >> "$GITHUB_OUTPUT"
else
echo "is_morphosource=false" >> "$GITHUB_OUTPUT"
fi
8. Step 6 – Generate Timestamp
If we have a morphosource-updates release, we create a timestamp for versioning. This step outputs a formatted date/time string like 2025-01-08_13-45-22.
- name: Generate Timestamp
id: gen_ts
if: steps.check_morpho.outputs.is_morphosource == 'true'
run: |
TS=$(date +'%Y-%m-%d_%H-%M-%S')
echo "timestamp=$TS" >> "$GITHUB_OUTPUT"
9. Step 7 – Run CT to Text
Here we run our custom Python script, ct_to_text.py. It takes the release_body and processes it with OpenAI. The multi-line text summary is stored in ct_output.txt and then passed along as an output (description) for later steps.
- name: Run CT to Text
id: ct2text
if: steps.check_morpho.outputs.is_morphosource == 'true'
env:
OPENAI_API_KEY: $
run: |
echo "$" > release_body.txt
python .github/scripts/ct_to_text.py release_body.txt > ct_output.txt
echo "description<<EOF" >> "$GITHUB_OUTPUT"
cat ct_output.txt >> "$GITHUB_OUTPUT"
echo "EOF" >> "$GITHUB_OUTPUT"
10. Step 8 – Create or Update Release with AI Summary
With our summary (description) in hand, we leverage the actions/create-release action to publish a new release. Its tag is prefixed with ct_to_text_analysis-, plus the timestamp. The body of the release is the AI-generated summary.
- name: Create or Update Release with AI Summary
if: steps.check_morpho.outputs.is_morphosource == 'true'
uses: actions/create-release@v1
env:
GITHUB_TOKEN: $
with:
tag_name: "ct_to_text_analysis-$"
release_name: "CT to Text Analysis #$"
body: $
draft: false
prerelease: false
11. Step 9 – Fallback step if not morphosource-updates
When no new morphosource-updates tag is found, this step logs that no CT-to-text analysis is performed.
- name: No new morphosource release
if: steps.check_morpho.outputs.is_morphosource == 'false'
run: echo "No 'morphosource-updates-*' release found. Skipping CT-to-text analysis."
Conclusion
This CT to Text GitHub Actions workflow automates the detection of new morphosource-updates releases, runs a custom Python script for AI-based text generation, and publishes the result as a GitHub release. By breaking down each step, you can adapt and extend this workflow for additional transformations, checks, or notifications. Stay tuned for our next post, where we’ll dive into the accompanying Python script (ct_to_text.py) to see how it transforms CT metadata into a human-readable summary.
← Previous Post $~~~~~~~~~~~$ Next Post →