Overview
Production Readiness
0.6
Novelty Score
0.5
Cost Impact Score
0.6
Citation Count
2
Why It Matters For Business
Automating BCO creation reduces the manual work of documenting legacy bioinformatics workflows and, with human verification applied, can speed evaluation, handoff, and regulatory review.
Summary TLDR
This paper builds a proof-of-concept tool (BCO assistant) that uses Retrieval-Augmented Generation (RAG) plus LLMs to auto-generate IEEE BioCompute Objects (BCOs) from scientific papers and optional GitHub repos. Key engineering choices: per-domain prompts, chunk+embed+vector-store retrieval, two-pass retrieval with a cross-encoder re-ranker, optional repo ingestion, and integrated automated and human evaluation. Code and docs are open on GitHub. The tool lowers manual work for retroactive documentation but still needs human review for missing repo-level details and to catch hallucinations.
Problem Statement
Creating standard-compliant BioCompute Objects for past bioinformatics studies is time-consuming. Papers often omit workflow details that live in external code repos. Manual BCO creation is a barrier to reproducibility and adoption of the IEEE BCO standard.
Main Contribution
A working BCO assistant that ingests a paper (PDF) and optional GitHub repo to auto-generate per-domain BCO JSON.
A RAG pipeline with chunking, embeddings, top-k retrieval, and a second-pass cross-encoder re-ranker to improve relevance.
Standardized per-domain prompts, with the retrieval query kept separate from the LLM prompt, to reduce hallucination and improve schema conformity.
Optional GitHub ingestion to capture parametric and description details missing from papers.
Integrated evaluation stack: automated metrics via DeepEval plus a human evaluation UI and parameter-search wrappers for testing.
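The two-pass retrieval idea listed above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the tool's implementation: the bag-of-words `embed` stands in for a learned embedding model, `pair_score` is a hypothetical stand-in for a cross-encoder, and all function names and defaults (`size`, `overlap`, `k`, `n`) are invented for the example.

```python
import math
import re
from collections import Counter

def chunk(text, size=12, overlap=4):
    """Split text into overlapping word windows (stand-in for a real chunker)."""
    words = text.split()
    if not words:
        return []
    step = size - overlap
    stop = max(len(words) - overlap, 1)
    return [" ".join(words[i:i + size]) for i in range(0, stop, step)]

def embed(text):
    """Toy bag-of-words vector; an assumption replacing a learned embedding."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def first_pass(query, chunks, k):
    """Pass 1: cheap similarity search over independently embedded chunks."""
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]

def pair_score(query, chunk_text):
    """Hypothetical cross-encoder stand-in: scores the (query, chunk) pair
    jointly, here as the fraction of query terms that appear in the chunk."""
    q_terms = set(embed(query))
    c_terms = set(embed(chunk_text))
    return len(q_terms & c_terms) / len(q_terms) if q_terms else 0.0

def two_pass_retrieve(query, chunks, k=5, n=2):
    """Pass 2: re-rank the top-k candidates and keep the best n."""
    candidates = first_pass(query, chunks, k)
    return sorted(candidates, key=lambda c: pair_score(query, c), reverse=True)[:n]
```

The design point is that the first pass is cheap and runs over every chunk, while the more expensive pairwise scorer only sees the `k` candidates that survive it.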
Key Findings
RAG plus LLMs can produce domain-specific BCO text from papers and repos.
Two-pass retrieval with a cross-encoder re-ranker improved output quality compared with embedding-only retrieval.
Important run parameters and detailed pipeline steps are often only present in linked GitHub repositories, not in the paper.
Results
Answer relevancy (automated)
Faithfulness (automated)
Who Should Care
What To Try In 7 Days
Clone the repo and run BCO assistant on one paper using default settings.
Index a paper plus its public GitHub repo to see how repo ingestion changes outputs.
Run the provided evaluation UI to compare generated vs human-curated domains for one workflow.
Optimization Features
Token Efficiency
- Per-domain generation reduces effective context needs
Infra Optimization
- Planned microservices architecture for scaling and experimentation
System Optimization
- Two-pass retrieval with cross-encoder reranker
- Text embedded for retrieval is kept separate from the LLM prompt, so prompt instructions do not pollute similarity scores
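One way to picture per-domain generation with a separate retrieval query is the sketch below. It is illustrative only: the `DomainSpec` class, the `build_llm_input` helper, and the query and prompt wording are all hypothetical, and `retrieve` is any caller-supplied function that maps a query string to a list of text chunks.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

@dataclass(frozen=True)
class DomainSpec:
    name: str
    retrieval_query: str    # short, content-focused text used only for similarity search
    generation_prompt: str  # longer instructions that are sent only to the LLM

# Hypothetical per-domain settings; the wording is invented for illustration.
DOMAINS: Dict[str, DomainSpec] = {
    "usability_domain": DomainSpec(
        name="usability_domain",
        retrieval_query="purpose and scientific use case of the workflow",
        generation_prompt="Summarize the intended use of the workflow as a JSON array of strings.",
    ),
    "parametric_domain": DomainSpec(
        name="parametric_domain",
        retrieval_query="tool parameters, flags, thresholds, and settings used in each step",
        generation_prompt="List every non-default parameter as JSON objects. Do not guess values.",
    ),
}

def build_llm_input(spec: DomainSpec, retrieve: Callable[[str], List[str]]) -> str:
    """Retrieve with the short query, then build the LLM input from the long prompt.

    Formatting instructions never reach the retriever, so they cannot skew
    the similarity scores."""
    context = retrieve(spec.retrieval_query)
    return f"{spec.generation_prompt}\n\nContext:\n" + "\n---\n".join(context)
```

Because each domain is generated on its own, only the chunks relevant to that domain need to fit in the context window.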
Reproducibility
Code Available
Open Source Status
- yes
Risks & Boundaries
Limitations
- Could not test largest frontier open-source LLMs due to compute and cost limits.
- Tool depends on external libraries (e.g., LlamaIndex) which may hide low-level behavior.
- Papers often omit parametric and file-location details; a public GitHub repository is required to fill those gaps.
- Automated evaluation is imperfect; human review remains necessary for accuracy and regulatory use.
- Forcing strict JSON output can harm LLM reasoning and content quality.
When Not To Use
- If the paper has no linked public code or data and precise run parameters are required.
- For final regulatory submissions without human verification.
- When the GitHub repository is private or inaccessible.
Failure Modes
- LLM hallucination leading to incorrect or fabricated parameter values.
- Missing parametric or file-location details when repos are not indexed.
- JSON formatting or schema validation errors from constrained outputs.
- Retrieval misses relevant content buried in long contexts.
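Some of these failure modes can be caught with simple checks before a human reviews the output. The sketch below is an assumption-laden example rather than part of the tool: it assumes the generated parametric domain is a JSON array of objects with `param`, `value`, and `step` keys (verify these against the BCO schema before relying on them), and the `check_parametric_domain` function and its literal string-match heuristic are hypothetical.

```python
import json
from typing import List

# Assumed key names for a parametric-domain entry; confirm against the schema.
REQUIRED_KEYS = ("param", "value", "step")

def check_parametric_domain(raw_output: str, source_text: str) -> List[str]:
    """Return a list of problems found in one generated parametric domain."""
    try:
        entries = json.loads(raw_output)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc.msg}"]
    if not isinstance(entries, list):
        return ["expected a JSON array of parameter entries"]

    problems = []
    for i, entry in enumerate(entries):
        if not isinstance(entry, dict):
            problems.append(f"entry {i}: not an object")
            continue
        missing = [key for key in REQUIRED_KEYS if key not in entry]
        if missing:
            problems.append(f"entry {i}: missing keys {missing}")
            continue
        # Crude hallucination guard: the value should appear verbatim in the
        # indexed paper or repo text; otherwise flag it for human review.
        if str(entry["value"]) not in source_text:
            problems.append(f"entry {i}: value {entry['value']!r} not found in source")
    return problems
```

A verbatim match is a weak signal: it will miss paraphrased values and can pass coincidental matches, so it narrows what a reviewer looks at rather than replacing the review.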
Core Entities
Models
- OpenAI API (unspecified LLMs used via API)
- Llama 3.1 (discussed but not used due to compute limits)
Metrics
- answer relevancy
- faithfulness
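Both metrics are typically computed as ratios over items labeled by an LLM judge. The sketch below shows only that final arithmetic, assuming the labels already exist; the function names are hypothetical, and the claim extraction and judging that DeepEval performs are not reproduced here.

```python
from typing import Sequence

def ratio_score(labels: Sequence[bool]) -> float:
    """Fraction of items marked acceptable; 0.0 when there are no items."""
    return sum(labels) / len(labels) if labels else 0.0

def faithfulness(claim_supported: Sequence[bool]) -> float:
    """Share of claims in the output that the retrieved context supports."""
    return ratio_score(claim_supported)

def answer_relevancy(statement_relevant: Sequence[bool]) -> float:
    """Share of statements in the output that address the input request."""
    return ratio_score(statement_relevant)
```

For example, an output with four extracted claims, three of them supported by the retrieved context, would score 0.75 on faithfulness under this definition.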

