Challenges Developing CT to Text in Production: Workflow Integration and AI Prompting
Moving our CT to Text pipeline from development to production revealed several challenges in workflow integration, data passing, and AI prompt engineering. Here’s how we tackled these challenges and created a robust production system.
Workflow Integration Challenges
1. Coordinating Multiple Workflows
The key challenge was getting two GitHub Actions workflows to work together seamlessly:
- Parse MorphoSource workflow: Extracts and processes CT data
- CT to Text workflow: Generates descriptive text using AI
# CT to Text workflow trigger
on:
workflow_run:
workflows: ["Parse MorphoSource Data"]
types: [completed]
2. Data Passing Between Workflows
We needed to reliably pass the release body text from Parse MorphoSource to CT to Text:
- name: Fetch Latest Release
id: fetch_release
env:
GITHUB_TOKEN: $
run: |
response=$(curl -sSL -H "Authorization: Bearer $GITHUB_TOKEN" \
https://api.github.com/repos/$/releases/latest)
body=$(echo "$response" | jq -r .body)
echo "release_body<<EOF" >> "$GITHUB_OUTPUT"
echo "$body" >> "$GITHUB_OUTPUT"
echo "EOF" >> "$GITHUB_OUTPUT"
OpenAI Integration Challenges
1. Prompt Engineering
Creating an effective prompt for scientific content was crucial:
def generate_text_for_records(records):
user_content = """Analyze these CT scan records and generate a scientific summary.
Focus on:
- Taxonomic classification
- Anatomical features
- Morphological significance
- Research implications
For each record, provide:
- ~200 word description
- Technical but accessible language
- Highlight key findings
Records:
"""
for record in records:
user_content += f"""
Record #{record['record_number']}:
- Title: {record['title']}
- Object: {record['Object']}
- Taxonomy: {record['Taxonomy']}
"""
2. Error Handling and Retries
We implemented robust error handling for API calls:
def call_openai_with_retry(prompt, max_retries=3):
for attempt in range(max_retries):
try:
response = client.chat.completions.create(
model="gpt-4-1106-preview",
messages=[
{"role": "system", "content": "You are a scientific writer..."},
{"role": "user", "content": prompt}
]
)
return response.choices[0].message.content
except Exception as e:
if attempt == max_retries - 1:
raise
time.sleep(2 ** attempt) # Exponential backoff
Release Management Challenges
1. Version Control and Naming
We implemented a systematic versioning approach:
- name: Generate Timestamp
id: gen_ts
run: |
TS=$(date +'%Y-%m-%d_%H-%M-%S')
echo "timestamp=$TS" >> "$GITHUB_OUTPUT"
- name: Create Release
with:
tag_name: "ct_to_text_analysis-$"
release_name: "CT to Text Analysis #$"
2. Release Body Formatting
Ensuring consistent formatting of AI-generated text in releases:
def format_release_body(generated_text):
sections = []
for record in parse_text_into_records(generated_text):
sections.append(f"""
## Record {record['number']}
{record['description']}
### Technical Details
- Object Type: {record['object_type']}
- Taxonomic Group: {record['taxonomy']}
""")
return "\n".join(sections)
Workflow Optimization Techniques
1. Intelligent Workflow Triggers
We implemented checks to prevent unnecessary processing:
- name: Check if morphosource-updates
id: check_morpho
run: |
TAG_NAME="$"
if [[ "$TAG_NAME" == morphosource-updates-* ]]; then
echo "is_morphosource=true" >> "$GITHUB_OUTPUT"
else
echo "is_morphosource=false" >> "$GITHUB_OUTPUT"
fi
2. Resource Management
Optimizing API usage and processing time:
def process_records_in_batches(records, batch_size=5):
"""Process records in smaller batches to manage resources"""
batches = [records[i:i + batch_size]
for i in range(0, len(records), batch_size)]
results = []
for batch in batches:
batch_text = generate_text_for_records(batch)
results.extend(parse_generated_text(batch_text))
time.sleep(1) # Rate limiting
return results
Best Practices Learned
- Workflow Design
- Clear dependency chains
- Explicit error states
- Comprehensive logging
- AI Integration
- Structured prompts
- Consistent output formats
- Error recovery mechanisms
- Release Management
- Systematic versioning
- Clear naming conventions
- Automated cleanup
Future Improvements
- Enhanced AI Processing
- More sophisticated prompts
- Context-aware summaries
- Multi-model verification
- Workflow Optimization
- Parallel processing
- Caching mechanisms
- Smarter retry logic
- Release Management
- Advanced version tracking
- Automated quality checks
- Change detection
Conclusion
Successfully integrating CT to Text processing in production required careful attention to workflow coordination, data passing, and AI prompt engineering. By implementing robust error handling, systematic versioning, and optimized processing patterns, we’ve created a reliable system for generating high-quality scientific summaries from CT scan data.
This foundation provides a solid base for future enhancements while maintaining the reliability needed for production use. As we continue to evolve the system, these core principles will guide our development of even more sophisticated CT data processing capabilities.
← Previous Post $~~~~~~~~~~~$ Next Post →