Hackathon Showcase

AgentOne

Building Sage AI

Project Description

AI CO-SCIENTIST: ADVANCED MULTI-AGENT SCIENTIFIC HYPOTHESIS GENERATION SYSTEM

AI Co-Scientist is a multi-agent AI system designed to accelerate scientific research through automated hypothesis generation, evaluation, and refinement. The project uses Google’s Agent Development Kit (ADK) to orchestrate six specialized AI agents that work together to generate novel, scientifically rigorous hypotheses across diverse research domains.

The system pairs state-of-the-art language models with a systematic scientific workflow (generation, literature grounding, critique, ranking, refinement, and review) and aims to produce hypotheses that can withstand peer-review-style scrutiny.

MEETING JUDGING CRITERIA

LLM DEPLOYMENT

The project features comprehensive LLM deployment across multiple platforms and technologies:

Google Cloud Run with NVIDIA L4 GPU Deployment

Primary deployment: Gemma 3 12B model on Google Cloud Run
Hardware: NVIDIA L4 GPU (24GB VRAM, 7424 CUDA cores)
Infrastructure: 8 vCPUs, 32GB RAM with auto-scaling capabilities
Service endpoint: RESTful API with streaming support

Multi-Provider API Integration

GROQ API: High-performance inference for Llama 3.3 70B, Gemma 2 9B, and Qwen 3 32B
OpenAI API: Advanced reasoning with o3-mini for critical evaluation tasks
Tavily API: Real-time knowledge retrieval and literature search

Unified LLM Client Architecture

Strategic model selection based on task requirements
Automatic fallback mechanisms for reliability
Consistent interface across different API providers
Performance monitoring and optimization

AGENT DEVELOPMENT

The system implements a sophisticated 6-agent architecture using Google’s Agent Development Kit:

Generation Agent (Gemma 3 12B - Deployed on L4 GPU)

Generates novel scientific hypotheses using creative reasoning
Implements systematic literature grounding and ethical assessment
Produces 3-8 hypotheses per query with confidence scoring
Features interdisciplinary integration and theoretical foundation requirements

Proximity Agent (Llama 4 Scout 17B via GROQ)

Retrieves and analyzes relevant knowledge from multiple sources
Integrates Tavily search service for real-time literature access
Performs knowledge gap analysis and reproducibility assessment
Provides technology readiness and funding landscape evaluation

Reflection Agent (OpenAI o3-mini)

Conducts rigorous scientific critique and evaluation
Implements test-time compute scaling with recursive self-critique
Evaluates validity, novelty, feasibility, and impact potential
Provides uncertainty quantification with confidence intervals

Ranking Agent (Qwen 3 32B via GROQ)

Performs multi-criteria hypothesis ranking using weighted evaluation
Implements Elo rating system with Bayesian updating
Conducts strategic assessment and competitive landscape analysis
Provides portfolio-level optimization recommendations
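
As a rough illustration of weighted multi-criteria ranking, the sketch below scores each hypothesis as a weighted average of per-criterion scores. The criterion names are borrowed from the Reflection Agent's evaluation list (validity, novelty, feasibility, impact); the weights, the function names, and the sample scores are assumptions made for this example, not values taken from the project.

```python
# Hypothetical weights; the project's actual weighting is not documented here.
WEIGHTS = {"validity": 0.4, "novelty": 0.3, "feasibility": 0.2, "impact": 0.1}


def weighted_score(scores: dict[str, float], weights: dict[str, float] = WEIGHTS) -> float:
    """Weighted average of per-criterion scores, each assumed to lie in [0, 1]."""
    total_weight = sum(weights.values())
    return sum(weights[name] * scores[name] for name in weights) / total_weight


def rank_hypotheses(candidates: dict[str, dict[str, float]]) -> list[str]:
    """Return hypothesis ids ordered from highest to lowest weighted score."""
    return sorted(candidates, key=lambda hid: weighted_score(candidates[hid]), reverse=True)


# Made-up scores for three placeholder hypotheses.
candidates = {
    "A": {"validity": 0.9, "novelty": 0.5, "feasibility": 0.8, "impact": 0.6},
    "B": {"validity": 0.6, "novelty": 0.9, "feasibility": 0.5, "impact": 0.9},
    "C": {"validity": 0.4, "novelty": 0.4, "feasibility": 0.4, "impact": 0.4},
}
ranking = rank_hypotheses(candidates)
```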

Evolution Agent (Llama 3.3 70B via GROQ)

Iteratively refines top hypotheses through advanced evolutionary strategies
Implements multi-objective optimization with convergence tracking
Features collaboration integration and impact amplification
Produces enhanced hypotheses with quantitative improvement metrics

Meta-Review Agent (OpenAI o3-mini)

Performs comprehensive final evaluation and research planning
Creates detailed experimental plans with resource analysis
Provides strategic assessment and commercialization pathways
Generates executive summaries with go/no-go decision frameworks

AGENT DEPLOYMENT

The agent deployment showcases advanced orchestration and scalability:

Google Agent Development Kit (ADK) Integration

Production-grade agent framework with workflow orchestration
Sequential and parallel processing capabilities
Built-in memory services and session management
Tool integration for search, computation, and data access

FastAPI Production Deployment

RESTful API with comprehensive endpoint coverage
Interactive documentation (Swagger UI and ReDoc)
API key authentication and security measures
Real-time processing with advanced terminal UI

Scalable Infrastructure

MongoDB database integration with JSON fallback
Enhanced memory service with session tracking
Comprehensive logging and monitoring systems
Docker containerization support

Production-Ready Features

Error handling and graceful fallback mechanisms
Performance monitoring and optimization
Comprehensive test suite with unit and integration tests
Health checks and system diagnostics

UNIQUE FEATURES & USER EXPERIENCE

INNOVATIVE TECHNICAL FEATURES

Auto Query Generation System

Automated scientific query generation using deployed Gemma 3 12B
Covers diverse scientific domains (biomedical, materials, environmental, AI/ML, energy)
Produces high-quality, testable research questions
Simple /sample endpoint for easy integration and testing

Test-Time Compute Scaling

Recursive self-critique with a minimum of 3 iterations for quality improvement
Dynamic reasoning chains that adapt to hypothesis complexity
Convergence tracking to detect when further iterations stop improving the result
Enhanced reasoning capability through iterative refinement
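
One way to picture the self-critique loop described above is the sketch below. It assumes two caller-supplied functions, `critique` (returns a score and feedback) and `revise` (applies the feedback); both names, the `tolerance` threshold, and the toy stand-ins are hypothetical. In the real system these steps would be LLM calls.

```python
from typing import Callable


def refine_with_self_critique(
    draft: str,
    critique: Callable[[str], tuple[float, str]],
    revise: Callable[[str, str], str],
    min_iterations: int = 3,
    max_iterations: int = 6,
    tolerance: float = 0.01,
) -> tuple[str, list[float]]:
    """Critique and revise until the score stops changing.

    Always runs at least `min_iterations` critique rounds. After that, it stops
    once the score differs from the previous round by less than `tolerance`,
    or when `max_iterations` is reached.
    """
    scores: list[float] = []
    current = draft
    for round_index in range(1, max_iterations + 1):
        score, feedback = critique(current)
        scores.append(score)
        converged = len(scores) >= 2 and abs(scores[-1] - scores[-2]) < tolerance
        if (round_index >= min_iterations and converged) or round_index == max_iterations:
            break  # `current` is the text that received scores[-1]
        current = revise(current, feedback)
    return current, scores


# Toy stand-ins: each revision adds a marker, and the score saturates at 1.0.
def toy_critique(text: str) -> tuple[float, str]:
    return min(1.0, 0.25 + 0.25 * text.count("[revised]")), "tighten the mechanism"


def toy_revise(text: str, feedback: str) -> str:
    return text + " [revised]"


final_text, history = refine_with_self_critique("Initial hypothesis", toy_critique, toy_revise)
```

The score history doubles as the convergence trace: plotting or logging it shows how quickly additional critique rounds stop paying off.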

Advanced Memory System

MongoDB-based persistent storage with version control
Hypothesis evolution tracking and lineage management
Session analytics and performance metrics
Related hypothesis discovery using semantic similarity
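
Related-hypothesis discovery by semantic similarity is commonly done by comparing embedding vectors with cosine similarity. The sketch below assumes embeddings are already available as plain lists of floats; the three-dimensional vectors, the hypothesis ids, and the function names are made up for illustration and do not reflect the project's embedding model or storage schema.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def most_related(
    query_vector: list[float],
    stored: dict[str, list[float]],
    top_k: int = 2,
) -> list[tuple[str, float]]:
    """Return the `top_k` stored hypothesis ids most similar to the query vector."""
    scored = [(hid, cosine_similarity(query_vector, vec)) for hid, vec in stored.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top_k]


# Toy embeddings; real ones would come from an embedding model.
stored_embeddings = {
    "h1": [1.0, 0.0, 0.0],
    "h2": [0.9, 0.1, 0.0],
    "h3": [0.0, 1.0, 0.0],
}
related = most_related([1.0, 0.0, 0.0], stored_embeddings)
```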

Elo Rating System

Tournament-style hypothesis evaluation inspired by Google’s AI co-scientist
Dynamic ranking with confidence assessment
Bayesian updating for improved evaluation accuracy
Comparative analysis across multiple evaluation criteria
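
For readers unfamiliar with Elo, the sketch below shows the standard pairwise update applied to a small round of hypothesis comparisons. It covers only the classic Elo formula; the Bayesian updating mentioned above is not modelled. The K-factor of 32, the starting rating of 1200, and the match results are assumptions chosen for the example.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that A beats B under the standard Elo model."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))


def update_elo(rating_a: float, rating_b: float, score_a: float, k: float = 32.0) -> tuple[float, float]:
    """Return updated (rating_a, rating_b); score_a is 1 for a win, 0.5 draw, 0 loss."""
    delta = k * (score_a - expected_score(rating_a, rating_b))
    return rating_a + delta, rating_b - delta


def run_tournament(ratings: dict[str, float], matches: list[tuple[str, str]]) -> dict[str, float]:
    """Apply each (winner, loser) result in order and return the new ratings."""
    updated = dict(ratings)
    for winner, loser in matches:
        updated[winner], updated[loser] = update_elo(updated[winner], updated[loser], 1.0)
    return updated


# Three placeholder hypotheses, all starting at the same rating.
start = {"H1": 1200.0, "H2": 1200.0, "H3": 1200.0}
final = run_tournament(start, [("H1", "H2"), ("H1", "H3"), ("H2", "H3")])
ranking = sorted(final, key=final.get, reverse=True)
```

Because each update adds to one rating exactly what it subtracts from the other, the total rating in the pool stays constant, which makes ratings comparable across rounds.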

Multi-Model Strategic Architecture

Optimal model selection for each agent’s specialized task
Creative generation (Gemma 3 12B) vs. analytical reasoning (o3-mini)
High-performance inference (GROQ) vs. advanced reasoning (OpenAI)
Cost-performance optimization across the entire pipeline

ENHANCED USER EXPERIENCE

Real-Time Visual Feedback

Colored terminal output with progress bars and status indicators
Live agent activity logs with timing information
Hypothesis summaries with scoring metrics
Success/error indicators with detailed messages

Comprehensive API Design

RESTful endpoints with clear request/response patterns
Interactive documentation for easy exploration
Sample query generation for immediate testing
Health monitoring and system status endpoints

Scientific Rigor Interface

Peer-review level evaluation criteria
Confidence intervals and uncertainty quantification
Literature citations and theoretical grounding
Experimental plan generation with resource requirements

Flexible Integration Options

Python client libraries for programmatic access
JSON API for language-agnostic integration
Streaming support for real-time hypothesis generation
Batch processing capabilities for research workflows

TECHNICAL STACK & FRAMEWORKS

CORE TECHNOLOGIES

AI/ML Frameworks

Google Agent Development Kit (ADK) v1.7.0 for agent orchestration
Multiple state-of-the-art LLMs (Gemma 3, Llama 3.3, o3-mini, Qwen 3)
Advanced prompt engineering with scientific methodology integration
Vector embeddings and semantic search capabilities

Cloud Infrastructure

Google Cloud Run with NVIDIA L4 GPU for model hosting
Auto-scaling serverless architecture
MongoDB Atlas for production-grade data persistence
Multi-region deployment capability

API & Web Technologies

FastAPI v0.104.1 as the web framework
Pydantic v2.5.0 for data validation and serialization
Uvicorn (installed with its "standard" extras) as the ASGI server
CORS middleware for cross-origin request support

External Services Integration

GROQ API for high-performance LLM inference
OpenAI API for advanced reasoning capabilities
Tavily Search API for real-time knowledge retrieval
Google Cloud services for model deployment and scaling

LIBRARIES & TOOLS

Data & Storage

PyMongo v4.0.0+ for MongoDB database operations
Python-dotenv v1.0.0 for environment configuration
JSON serialization with custom datetime encoding
File-based fallback storage for reliability

AI & Search

Tavily-python v0.3.0+ for scientific literature search
Google-auth v2.40.3 for cloud service authentication
Advanced prompt templates with scientific methodology
Multi-criteria evaluation frameworks

Development & Quality

Comprehensive test suite with pytest
Type hints and Pydantic models for data integrity
Structured logging with configurable levels
Error handling with graceful degradation

User Interface

Jinja2 v3.1.2 for template rendering
Terminal color support for enhanced user experience
Progress tracking and status visualization
Interactive API documentation

SYSTEM ARCHITECTURE HIGHLIGHTS

WORKFLOW ORCHESTRATION

The system processes each scientific query through seven phases: an input-processing step followed by one phase for each of the six agents:

Query Processing: Input validation and parameter configuration
Hypothesis Generation: Creative idea generation using Gemma 3 12B
Knowledge Retrieval: Literature search and context gathering
Critical Analysis: Scientific evaluation and validity assessment
Ranking & Selection: Multi-criteria evaluation and prioritization
Evolution & Refinement: Iterative improvement of top hypotheses
Meta-Review: Final evaluation and experimental planning
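
The phases listed above run in sequence, each reading and extending a shared state. The sketch below shows that control flow only. It does not use the ADK API; `run_pipeline`, the state dictionary layout, and the stand-in step functions are hypothetical, and most phases are left as pass-through placeholders.

```python
from typing import Callable

Step = Callable[[dict], dict]


def run_pipeline(query: str, phases: list[tuple[str, Step]]) -> dict:
    """Run each phase in order, passing the state along and recording a trace."""
    state: dict = {"query": query, "trace": []}
    for name, step in phases:
        state = step(state)
        state["trace"].append(name)
    return state


def validate_query(state: dict) -> dict:
    """Reject empty input before any model is called."""
    if not state["query"].strip():
        raise ValueError("query must not be empty")
    return state


def generate_hypotheses(state: dict) -> dict:
    """Placeholder for the generation agent: emits three dummy candidates."""
    candidates = [f"{state['query']} / candidate {i}" for i in range(1, 4)]
    return {**state, "hypotheses": candidates}


def passthrough(state: dict) -> dict:
    """Placeholder for phases not modelled in this sketch."""
    return state


PHASES: list[tuple[str, Step]] = [
    ("Query Processing", validate_query),
    ("Hypothesis Generation", generate_hypotheses),
    ("Knowledge Retrieval", passthrough),
    ("Critical Analysis", passthrough),
    ("Ranking & Selection", passthrough),
    ("Evolution & Refinement", passthrough),
    ("Meta-Review", passthrough),
]

result = run_pipeline("Can enzyme X degrade PET at room temperature?", PHASES)
```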

DATA FLOW & PERSISTENCE

All queries and responses are stored in MongoDB with JSON fallback
Session tracking maintains complete research history
Hypothesis evolution is tracked with lineage information
Performance metrics and analytics are continuously collected
Memory service enables related hypothesis discovery
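
The "MongoDB with JSON fallback" behaviour can be sketched as follows, under stated assumptions. The primary store is passed in as a plain callable so the example runs without a database; in a real deployment it could be PyMongo's `Collection.insert_one`. The `save_record` function, the JSON Lines file layout, and the sample record are hypothetical. The encoder shows one common way to handle the custom datetime serialization mentioned in the technical stack.

```python
import json
import tempfile
from datetime import datetime, timezone
from pathlib import Path
from typing import Callable


class DateTimeEncoder(json.JSONEncoder):
    """JSON encoder that writes datetime objects as ISO 8601 strings."""

    def default(self, o):
        if isinstance(o, datetime):
            return o.isoformat()
        return super().default(o)


def save_record(record: dict, primary_insert: Callable[[dict], object], fallback_path: Path) -> str:
    """Try the primary store first; on any failure append to a JSON Lines file."""
    try:
        primary_insert(record)
        return "primary"
    except Exception:  # a real service would log the error and catch narrower types
        with fallback_path.open("a", encoding="utf-8") as handle:
            handle.write(json.dumps(record, cls=DateTimeEncoder) + "\n")
        return "fallback"


# Stand-in for an unreachable database.
def _unavailable_database(record: dict) -> None:
    raise ConnectionError("database unreachable")


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "records.jsonl"
    record = {
        "query": "example query",
        "created_at": datetime(2025, 1, 1, tzinfo=timezone.utc),
    }
    backend = save_record(record, _unavailable_database, path)
    stored = json.loads(path.read_text(encoding="utf-8").splitlines()[0])
```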

DEPLOYMENT & SCALABILITY

Containerized deployment with Docker support
Auto-scaling based on demand, scaling to zero instances when idle so that idle periods incur no compute cost
Multi-region deployment capability for global accessibility
Comprehensive monitoring and health checking systems
Production-ready error handling and recovery mechanisms

VALIDATION & QUALITY ASSURANCE

TESTING FRAMEWORK

Unit tests for individual agent functionality
Integration tests for complete workflow validation
Performance benchmarking and optimization
Error scenario testing and recovery validation
API endpoint testing with comprehensive coverage

SCIENTIFIC RIGOR

Peer-review level evaluation criteria implementation
Literature grounding requirements for all hypotheses
Ethical framework integration and bias assessment
Reproducibility standards and transparency protocols
Expert-in-the-loop validation framework

IMPACT & APPLICATIONS

RESEARCH ACCELERATION

The AI Co-Scientist system accelerates scientific discovery by:

Generating novel hypotheses in minutes rather than weeks
Providing systematic evaluation and ranking of research ideas
Offering detailed experimental plans and resource requirements
Connecting researchers with relevant literature and collaboration opportunities

DOMAIN COVERAGE

The system supports research across multiple scientific domains:

Biomedical research and drug discovery
Materials science and nanotechnology
Environmental science and sustainability
Artificial intelligence and machine learning
Energy systems and renewable technologies
Interdisciplinary research opportunities

CONCLUSION

AI Co-Scientist combines several state-of-the-art language models with a systematic scientific methodology to assist hypothesis-driven research. It addresses the three judging criteria as follows: LLM deployment through Gemma 3 12B on Google Cloud Run with an NVIDIA L4 GPU, agent development through six specialized agents built on Google’s ADK, and agent deployment through a FastAPI service with API key authentication, persistent storage, and monitoring.

The system’s unique features, including auto query generation, test-time compute scaling, advanced memory systems, and multi-model strategic architecture, provide an enhanced user experience that meets the rigorous standards expected in scientific research environments. The comprehensive technical stack leverages cutting-edge frameworks and tools to deliver a robust, scalable, and scientifically rigorous platform for hypothesis generation and evaluation.

This implementation positions AI Co-Scientist as a valuable tool for researchers, institutions, and organizations seeking to accelerate scientific discovery through AI-powered hypothesis generation and systematic evaluation methodologies.

Team

Harshil Patel

Roger Sentongo

Products & Tools

GPUs, Google Cloud

Additional Links

https://github.com/lucidopus/ai-co-scientist

Backend

https://github.com/lucidopus/sage-ai

Frontend
