Pinned repositories

  1. llm-prompt-engineering-guide (Public)

    Python 1

  2. LLM-Drift-Observatory (Public)

    A hands-on framework for detecting and visualizing **behavioral drift** in Large Language Models (LLMs) across versions and providers.

    Jupyter Notebook

  3. LLM-Scoring-Dashboard-Streamlit-OpenAI-eval-UI- (Public)

    A Streamlit-based interactive dashboard for evaluating LLM outputs on key qualitative metrics: factuality, clarity, and style.

    Python 1

  4. LLM-eval-benchmark-lab (Public)

    A modular, configurable benchmarking harness for evaluating LLM behavior across tasks, constraints, and model classes.

    Python 1

  5. Prompt-Efficiency-Sandbox (Public)

    Compares three versions of a prompt across GPT-3.5, Mistral, and a QLoRA-based small model. Metrics: token usage, latency, and eval score.

    Jupyter Notebook 1

  6. LLM-Evaluation-Toolkit (Public)

    A practical toolkit for evaluating LLM outputs using GPT-based auto-grading. Designed for product teams to benchmark factuality, coherence, and tone in real-world use cases.

    Python 1