Metadata-to-Morphosource Compare

A user-friendly tool that lets researchers compare their specimen metadata with MorphoSource database records and verify voxel spacing values.

🤖 NEW: AI-Powered MorphoSource Query System

Try our interactive query system at: https://johntrue15.github.io/Metadata-to-Morphsource-compare/

Ask your questions in natural language.

The system uses GitHub Actions to process your queries through:

  1. ChatGPT Query Formatter - Converts natural language to optimized API queries
  2. MorphoSource API - Searches the database with formatted queries
  3. ChatGPT Response Processor - Analyzes results and provides natural language responses

How It Works

  1. Submit a Query: Visit the GitHub Pages site and enter your question
  2. Create Issue: Click to create a GitHub Issue (requires free GitHub account)
  3. Auto-Trigger: The issue automatically triggers the query processor workflow
  4. Sequential Processing:
    • Job 1: ChatGPT formats your natural language query into an optimized API call
    • Job 2: MorphoSource API searches for relevant data using the formatted query
    • Job 3: ChatGPT processes the results and generates a natural language response
  5. Get Results: Results are posted as a comment on your issue, and GitHub notifies you
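
The sequential processing above can be sketched as three functions chained together, each consuming the previous one's output. This is only an illustration of the data flow, not the repository's workflow code: run_query_pipeline and the three fake_* stages are hypothetical names, and the stand-in stages replace the real ChatGPT and MorphoSource API calls so the sketch runs without network access or API keys.

```python
from typing import Callable


def run_query_pipeline(
    question: str,
    format_query: Callable[[str], dict],
    search: Callable[[dict], list],
    summarize: Callable[[str, list], str],
) -> str:
    """Run the three jobs in order, feeding each job's output to the next."""
    api_query = format_query(question)    # Job 1: natural language -> API query
    records = search(api_query)           # Job 2: look up records with that query
    return summarize(question, records)   # Job 3: records -> natural language answer


# Hypothetical stand-in stages (assumptions, not the real services).
def fake_format(question: str) -> dict:
    return {"q": question.lower().rstrip("?")}


def fake_search(api_query: dict) -> list:
    catalog = [{"title": "lizard skull"}, {"title": "snake vertebra"}]
    words = api_query["q"].split()
    return [r for r in catalog if any(w in r["title"] for w in words)]


def fake_summarize(question: str, records: list) -> str:
    return f"Found {len(records)} record(s) for: {question}"


answer = run_query_pipeline("Lizard specimens?", fake_format, fake_search, fake_summarize)
print(answer)
```

Passing the stages in as callables keeps each job replaceable, which mirrors how the workflow separates formatting, searching, and response processing into distinct jobs.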

Why Issues? Queries go through GitHub’s native issue system rather than through an authenticated API call from the page, which avoids HTTP 401 (Unauthorized) errors.

Setting Up (For Repository Owner)

To enable the query system:

  1. Enable GitHub Pages:
    • Go to repository Settings → Pages
    • Under Source, select GitHub Actions
    • The site will be automatically deployed
  2. Configure API Keys:
    • Go to Settings → Secrets and variables → Actions
    • Add these secrets:
      • OPENAI_API_KEY - Your OpenAI API key
      • MORPHOSOURCE_API_KEY - Your MorphoSource API key (optional)

Using the Query System

  1. Visit the GitHub Pages site
  2. Enter your question in the text box
  3. Click “Prepare to Submit Query”
  4. Click the link to create a GitHub Issue (requires GitHub account)
  5. Submit the pre-filled issue
  6. Wait for results to be posted as a comment on your issue (usually within 1-2 minutes)
  7. Optionally download artifacts from the Actions tab for detailed JSON responses

For Researchers: Quick Start Guide

Step 1: Upload your CSV file to the data/csv/ folder in this repository

Step 2: Run the comparison workflow

Step 3: Access your results

That’s it! The system will match your specimen data against MorphoSource records and verify the voxel spacing values.

What This Tool Does

This repository helps researchers to:

  1. Compare your local specimen metadata with MorphoSource database records
  2. Match specimens based on catalog numbers and taxonomic information
  3. Verify that the CT scan voxel spacing values in your records agree with those in MorphoSource
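
To make the matching idea concrete, here is a minimal sketch of scoring one local record against one MorphoSource record using a catalog number and a taxon name. It is an assumption-laden illustration, not the logic in compare.py: the field names catalog_number and taxon, the normalization rule, and the point values are all hypothetical.

```python
import re


def normalize_catalog(value: str) -> str:
    """Lowercase the value and drop everything except letters and digits."""
    return re.sub(r"[^a-z0-9]", "", value.lower())


def match_score(local: dict, remote: dict) -> int:
    """Hypothetical score: 2 points for a catalog number match, 1 for a taxon match."""
    score = 0
    if normalize_catalog(local["catalog_number"]) == normalize_catalog(remote["catalog_number"]):
        score += 2
    if local["taxon"].strip().lower() == remote["taxon"].strip().lower():
        score += 1
    return score


# Made-up example records.
local = {"catalog_number": "UF-Herp 12345", "taxon": "Anolis carolinensis"}
remote = {"catalog_number": "uf:herp:12345", "taxon": "anolis carolinensis "}
other = {"catalog_number": "UF-Herp 99999", "taxon": "Anolis carolinensis"}

print(match_score(local, remote))  # catalog number and taxon both agree
print(match_score(local, other))   # only the taxon agrees
```

Normalizing before comparing means that differences in punctuation, spacing, or letter case between the two sources do not by themselves prevent a match.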

Understanding Your Results

You’ll receive two CSV files with your results:

  1. matched.csv - Shows which of your specimens were found in MorphoSource:
    • Contains all your original data
    • Adds a Match_Found column (yes/no)
    • Adds Morphosource_URL links to matching records
    • Adds Match_Score showing the confidence of each match (higher is better)
  2. confirmed_matches.csv - Verifies the voxel spacing values:
    • Includes a voxel_spacing_verified column showing whether the values match
    • Contains the voxel spacing values returned by the MorphoSource API for comparison
    • Helps identify any discrepancies between your data and MorphoSource
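
A short script can summarize matched.csv once you have downloaded it. The sketch below relies only on the Match_Found and Morphosource_URL columns described above; the exact spelling of the yes/no values is an assumption (the comparison ignores case), and the inline sample data, including the catalog_number column, is made up for illustration.

```python
import csv
import io


def summarize_matches(csv_file) -> dict:
    """Count matched rows and collect their MorphoSource URLs."""
    matched_urls = []
    total = 0
    for row in csv.DictReader(csv_file):
        total += 1
        # Assumes matches are flagged with "yes" (any letter case).
        if (row.get("Match_Found") or "").strip().lower() == "yes":
            matched_urls.append(row.get("Morphosource_URL") or "")
    return {"total": total, "matched": len(matched_urls), "urls": matched_urls}


# Inline sample standing in for a real matched.csv (values are invented).
sample = io.StringIO(
    "catalog_number,Match_Found,Morphosource_URL,Match_Score\n"
    "A1,yes,https://example.org/record/1,95\n"
    "A2,no,,0\n"
)
summary = summarize_matches(sample)
print(summary)
```

For a real file, pass an open file handle instead of the sample, for example open("data/output/matched.csv", newline=""), assuming the results were saved under data/output/.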

What The Verification Status Means

In your results, the voxel_spacing_verified column will show:

Technical Details

Required Files and Structure

The repository uses this file structure:

├── data/
│   ├── csv/          # Upload your CSV files here
│   ├── json/         # Contains the Morphosource database
│   └── output/       # Where results are saved (created automatically)
├── compare.py        # Comparison script
├── verify_pixel_spacing.py # Verification script
└── run_comparison.py # Helper script for local execution

Required API Keys (For Repository Maintainers)

To use the verification feature and AI assistant backend, configure these API keys:

  1. Go to repository Settings
  2. Select “Secrets and variables” → “Actions”
  3. Click “New repository secret”
  4. Add the following secrets:
    • Name: MORPHOSOURCE_API_KEY, Value: your MorphoSource API key
    • Name: OPENAI_API_KEY, Value: your OpenAI API key (for backend chat processing)
  5. Click “Add secret” for each

Note: The OPENAI_API_KEY is optional. The AI assistant works client-side with users providing their own keys. The repository secret is only needed if you want to implement a backend API to handle API keys centrally.

Running Locally (For Advanced Users)

If you prefer to run the tool locally rather than through GitHub Actions:

# Clone the repository
git clone https://github.com/yourusername/Metadata-to-Morphsource-compare.git
cd Metadata-to-Morphsource-compare

# Place your CSV in the data/csv directory

# Run the comparison
python run_comparison.py --csv "Your CSV Filename.csv" --api-key "your-api-key-here"

For Developers

Testing

This project includes a comprehensive test suite. To run tests locally:

# Install test dependencies
pip install -r requirements-test.txt

# Run all tests
pytest tests/

# Run tests with coverage
pytest tests/ --cov=. --cov-report=term

For detailed testing information, see TESTING.md.

Continuous Integration

Tests run automatically in GitHub Actions. View the results in the Actions tab.

Troubleshooting

Common Issues

  1. “File not found” errors:
    • Make sure your CSV file is correctly uploaded to the data/csv/ folder
    • Check that you entered the exact filename in the workflow
  2. Column name warnings:
    • The script looks for specific column names for voxel spacing data
    • If your CSV uses different column names, the system will still work but may not match all values
    • Columns with voxel spacing should ideally be named: x_voxel_spacing_mm, y_voxel_spacing_mm, z_voxel_spacing_mm
  3. No matches found:
    • Check that your specimen identifiers (catalog numbers) match the format used in MorphoSource
    • The system uses both catalog numbers and taxonomic information for matching
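
Before uploading, you can check whether your CSV header uses the recommended voxel spacing column names. This is a minimal sketch using only the three names listed above; missing_voxel_columns is a hypothetical helper, not part of the repository, and the sample header is invented.

```python
import csv
import io

EXPECTED_VOXEL_COLUMNS = ("x_voxel_spacing_mm", "y_voxel_spacing_mm", "z_voxel_spacing_mm")


def missing_voxel_columns(csv_file) -> list:
    """Return the recommended voxel spacing columns absent from the header row."""
    header = next(csv.reader(csv_file), [])
    present = {name.strip() for name in header}
    return [name for name in EXPECTED_VOXEL_COLUMNS if name not in present]


# Invented sample: one column is named as recommended, one is not, one is absent.
sample = io.StringIO("catalog_number,x_voxel_spacing_mm,Y Voxel Spacing\nA1,0.05,0.05\n")
missing = missing_voxel_columns(sample)
print(missing)
```

An empty list means all three recommended columns are present; any names returned are candidates for renaming in your CSV.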

Getting Help

If you encounter issues:

  1. Check the workflow run logs for error messages
  2. Verify that your CSV columns match the expected names
  3. Contact repository maintainers for assistance

License

[Your license information here]