py-why · grace-sng7 · May 19, 2025 · May 19, 2025 · May 19, 2025 · May 20, 2025
diff --git a/README.md b/README.md
@@ -28,6 +28,7 @@ PyWhy-LLM seamlessly integrates into your existing causal inference process. Imp
 from pywhyllm.suggesters.model_suggester import ModelSuggester 
 from pywhyllm.suggesters.identification_suggester import IdentificationSuggester
 from pywhyllm.suggesters.validation_suggester import ValidationSuggester
+from pywhyllm.suggesters.augmented_model_suggester import AugmentedModelSuggester
 from pywhyllm import RelationshipStrategy
 
 ```
@@ -49,11 +50,22 @@ domain_expertises = modeler.suggest_domain_expertises(all_factors)
 # Suggest a set of potential confounders
 suggested_confounders = modeler.suggest_confounders(treatment, outcome, all_factors, domain_expertises)
 
-# Suggest pair-wise relationship between variables
+# Suggest pair-wise relationships between variables
 suggested_dag = modeler.suggest_relationships(treatment, outcome, all_factors, domain_expertises, RelationshipStrategy.Pairwise)
 ```
 
+### Retrieval Augmented Generation (RAG)-based Modeler
 
+```python
+# Create instance of Modeler
+modeler = AugmentedModelSuggester('gpt-4')
+
+treatment = "smoking"
+outcome = "lung cancer"
+
+# Suggest pair-wise relationship between two given variables, utilizing CauseNet for RAGing the LLM
+suggested_relationship = modeler.suggest_relationships(treatment, outcome)
+```
 
 ### Identifier
 

diff --git a/docs/notebooks/augmented_model_suggester_examples.ipynb b/docs/notebooks/augmented_model_suggester_examples.ipynb
@@ -0,0 +1,156 @@
+{
+  "nbformat": 4,
+  "nbformat_minor": 0,
+  "metadata": {
+    "colab": {
+      "provenance": []
+    },
+    "kernelspec": {
+      "name": "python3",
+      "display_name": "Python 3"
+    },
+    "language_info": {
+      "name": "python"
+    }
+  },
+  "cells": [
+    {
+      "cell_type": "code",
+      "source": [
+        "pip install dotenv"
+      ],
+      "metadata": {
+        "id": "cmZerbMu6Uk4"
+      },
+      "execution_count": null,
+      "outputs": []
+    },
+    {
+      "cell_type": "code",
+      "execution_count": null,
+      "metadata": {
+        "id": "EulKv3Km4nMa"
+      },
+      "outputs": [],
+      "source": [
+        "from dotenv import load_dotenv\n",
+        "import os\n",
+        "\n",
+        "load_dotenv()\n",
+        "\n",
+        "os.environ[\"OPENAI_API_KEY\"] = '' # specify your key here"
+      ]
+    },
+    {
+      "cell_type": "code",
+      "source": [
+        "pip install pywhyllm"
+      ],
+      "metadata": {
+        "collapsed": true,
+        "id": "83sxVcP97xlH"
+      },
+      "execution_count": null,
+      "outputs": []
+    },
+    {
+      "cell_type": "markdown",
+      "source": [
+        "Here we introduce the AugmentedModelSuggester class. Creating an instance of it enables the chosen LLM to utilize Retrieval Augmented Generation (RAG) to determine causality. It currently does this by searching the CauseNet dataset for a relevant causal pair and augmenting the LLM with the corresponding evidence/information stored in the dataset.\n",
+        "\n",
+        "CauseNet is a large-scale knowledge base of causal relations extracted from the web, created by Heindorf et al. (2020). CauseNet is available at [causenet.org](https://causenet.org) and is licensed under Creative Commons Attribution 4.0 International (CC BY 4.0)."
+      ],
+      "metadata": {
+        "id": "DjYECuX84vbN"
+      }
+    },
+    {
+      "cell_type": "code",
+      "source": [
+        "from pywhyllm.suggesters.augmented_model_suggester import AugmentedModelSuggester\n",
+        "\n",
+        "model = AugmentedModelSuggester('gpt-4')"
+      ],
+      "metadata": {
+        "id": "VdfEKuDLEYcU"
+      },
+      "execution_count": null,
+      "outputs": []
+    },
+    {
+      "cell_type": "markdown",
+      "source": [
+        "AugmentedModelSuggester can suggest the pairwise relationship given two variables. If a relevant causal pair is found in CauseNet, the LLM is augmented with the aforementioned information in CauseNet. If not found, by default, the LLM will rely on its own knowledge."
+      ],
+      "metadata": {
+        "id": "dES0LwHV57eX"
+      }
+    },
+    {
+      "cell_type": "code",
+      "source": [
+        "result = model.suggest_pairwise_relationship(\"smoking\", \"lung cancer\")"
+      ],
+      "metadata": {
+        "id": "D85ec6Pk5JzA"
+      },
+      "execution_count": null,
+      "outputs": []
+    },
+    {
+      "cell_type": "code",
+      "source": [
+        "result"
+      ],
+      "metadata": {
+        "id": "W3bFehXh5SQl"
+      },
+      "execution_count": null,
+      "outputs": []
+    },
+    {
+      "cell_type": "code",
+      "source": [
+        "result = model.suggest_pairwise_relationship(\"income\", \"exercise level\")"
+      ],
+      "metadata": {
+        "id": "odFkp921hQsX"
+      },
+      "execution_count": null,
+      "outputs": []
+    },
+    {
+      "cell_type": "code",
+      "source": [
+        "result"
+      ],
+      "metadata": {
+        "id": "ZIeStj9OwIPe"
+      },
+      "execution_count": null,
+      "outputs": []
+    },
+    {
+      "cell_type": "code",
+      "source": [
+        "result = model.suggest_pairwise_relationship(\"flooding\", \"rain\")"
+      ],
+      "metadata": {
+        "id": "Fm5XCFrRwKsV"
+      },
+      "execution_count": null,
+      "outputs": []
+    },
+    {
+      "cell_type": "code",
+      "source": [
+        "result"
+      ],
+      "metadata": {
+        "id": "HDo098ICwzi7"
+      },
+      "execution_count": null,
+      "outputs": []
+    }
+  ]
+}