Creating augmented suggester #54

Open: wants to merge 20 commits into base: main

20 commits
a82e872
Created new augmented_model_suggester and corresponding utils.
grace-sng7 May 19, 2025
63e6d82
Updated dependencies according to augmented_model_suggester and corre…
grace-sng7 May 19, 2025
b81e676
Updated LLM query prompt.
grace-sng7 May 19, 2025
2af6350
Minor fixes after testing AugmentedModelSuggester.
grace-sng7 May 20, 2025
e349bf5
Edited CauseNet search function.
grace-sng7 May 25, 2025
633d76d
Updated README.md to include augmented_model_suggester
grace-sng7 May 25, 2025
a474e5d
Update README.md
grace-sng7 May 27, 2025
d31fc79
Merge pull request #53 from grace-sng7/creating_augmented_suggester
grace-sng7 May 27, 2025
6b71deb
Update README.md
grace-sng7 May 27, 2025
d763517
Added augmented model suggester examples notebook
grace-sng7 May 27, 2025
e2f57e4
Merge pull request #55 from grace-sng7/creating_augmented_suggester
grace-sng7 May 27, 2025
072adc4
Uploaded augmented model suggester examples notebook again.
grace-sng7 May 27, 2025
8d82bd9
Merge branch 'py-why:creating_augmented_suggester' into creating_augm…
grace-sng7 May 27, 2025
fd17322
Merge pull request #56 from grace-sng7/creating_augmented_suggester
grace-sng7 May 27, 2025
05d9aa9
Set to ignore notebook testing for augmented model suggester examples
grace-sng7 May 27, 2025
0d1c2b5
Merge pull request #57 from grace-sng7/augmented_suggester
grace-sng7 May 27, 2025
00b61a2
Updated augmented_model_suggester_examples notebooks, docstrings, and…
grace-sng7 Jun 9, 2025
83a968f
Merge pull request #58 from grace-sng7/augmented_suggester
grace-sng7 Jun 9, 2025
2c6c7c2
Updated citations
grace-sng7 Jun 11, 2025
bfde305
Merge pull request #59 from grace-sng7/augmented_suggester
grace-sng7 Jun 11, 2025
14 changes: 13 additions & 1 deletion README.md
@@ -28,6 +28,7 @@ PyWhy-LLM seamlessly integrates into your existing causal inference process. Imp
from pywhyllm.suggesters.model_suggester import ModelSuggester
from pywhyllm.suggesters.identification_suggester import IdentificationSuggester
from pywhyllm.suggesters.validation_suggester import ValidationSuggester
from pywhyllm.suggesters.augmented_model_suggester import AugmentedModelSuggester
from pywhyllm import RelationshipStrategy

```
@@ -49,11 +50,22 @@ domain_expertises = modeler.suggest_domain_expertises(all_factors)
# Suggest a set of potential confounders
suggested_confounders = modeler.suggest_confounders(treatment, outcome, all_factors, domain_expertises)

# Suggest pair-wise relationship between variables
# Suggest pair-wise relationships between variables
suggested_dag = modeler.suggest_relationships(treatment, outcome, all_factors, domain_expertises, RelationshipStrategy.Pairwise)
```

### Retrieval Augmented Generation (RAG)-based Modeler

```python
# Create an instance of the RAG-based modeler
modeler = AugmentedModelSuggester('gpt-4')

treatment = "smoking"
outcome = "lung cancer"

# Suggest the pair-wise relationship between two given variables, using CauseNet to retrieve evidence that augments the LLM
suggested_relationship = modeler.suggest_pairwise_relationship(treatment, outcome)
```
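
The retrieval step mentioned in the comment above can be pictured with a small, self-contained sketch. It is not the code from this PR: `TOY_PAIRS`, `normalize`, and `find_pair` are hypothetical names, and the in-memory dictionary only stands in for a causal-pair knowledge base such as CauseNet. It assumes the lookup should succeed regardless of the order in which the two variables are given.

```python
from typing import Optional

# Hypothetical stand-in for a causal-pair knowledge base such as CauseNet.
# Keys are (cause, effect); values are supporting sentences.
TOY_PAIRS = {
    ("smoking", "lung cancer"): ["Smoking causes lung cancer."],
    ("rain", "flooding"): ["Heavy rain led to flooding."],
}


def normalize(name: str) -> str:
    """Lowercase and collapse whitespace so lookups are less brittle."""
    return " ".join(name.lower().split())


def find_pair(a: str, b: str) -> Optional[tuple[str, str, list[str]]]:
    """Return (cause, effect, evidence) for either direction, or None."""
    a, b = normalize(a), normalize(b)
    # Try the pair as given first, then with the roles swapped.
    for cause, effect in ((a, b), (b, a)):
        evidence = TOY_PAIRS.get((cause, effect))
        if evidence is not None:
            return cause, effect, evidence
    return None


print(find_pair("Flooding", "rain"))
print(find_pair("income", "exercise level"))
```

In this sketch a reversed query such as ("flooding", "rain") still finds the stored ("rain", "flooding") entry, while a pair missing from the table yields `None`, so the caller can decide how to proceed without retrieved evidence.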

### Identifier

156 changes: 156 additions & 0 deletions docs/notebooks/augmented_model_suggester_examples.ipynb
@@ -0,0 +1,156 @@
{
"nbformat": 4,
"nbformat_minor": 0,
"metadata": {
"colab": {
"provenance": []
},
"kernelspec": {
"name": "python3",
"display_name": "Python 3"
},
"language_info": {
"name": "python"
}
},
"cells": [
{
"cell_type": "code",
"source": [
"pip install python-dotenv"
],
"metadata": {
"id": "cmZerbMu6Uk4"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "EulKv3Km4nMa"
},
"outputs": [],
"source": [
"from dotenv import load_dotenv\n",
"import os\n",
"\n",
"load_dotenv()\n",
"\n",
"os.environ[\"OPENAI_API_KEY\"] = '' # specify your key here"
]
},
{
"cell_type": "code",
"source": [
"pip install pywhyllm"
],
"metadata": {
"collapsed": true,
"id": "83sxVcP97xlH"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"Here we introduce the AugmentedModelSuggester class. An instance of it lets the chosen LLM use Retrieval Augmented Generation (RAG) when determining causality. It currently does this by searching the CauseNet dataset for a relevant causal pair and augmenting the LLM with the corresponding evidence stored in the dataset.\n",
"\n",
"CauseNet is a large-scale knowledge base of causal relations extracted from the web, created by Heindorf et al. (2020). CauseNet is available at [causenet.org](https://causenet.org) and is licensed under Creative Commons Attribution 4.0 International (CC BY 4.0)."
],
"metadata": {
"id": "DjYECuX84vbN"
}
},
{
"cell_type": "code",
"source": [
"from pywhyllm.suggesters.augmented_model_suggester import AugmentedModelSuggester\n",

Review comment (Member):

The notebook makes sense. Can you add some descriptive text to the notebook so that readers can understand:

  1. Motivation: we are introducing this retrieval-augmented suggester. It currently works on a specific cause-pair dataset.
  2. How it works: we fetch the pairs from that dataset.
  3. Documentation of the actual function call: say that users can pass any two variable names, and explain what happens if the pair is not found in the database.

"\n",
"model = AugmentedModelSuggester('gpt-4')"
],
"metadata": {
"id": "VdfEKuDLEYcU"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"AugmentedModelSuggester suggests the pairwise relationship between two given variables. If a relevant causal pair is found in CauseNet, the LLM is augmented with the evidence stored for that pair. If no pair is found, the LLM by default relies on its own knowledge."
],
"metadata": {
"id": "dES0LwHV57eX"
}
},
{
"cell_type": "code",
"source": [
"result = model.suggest_pairwise_relationship(\"smoking\", \"lung cancer\")"
],
"metadata": {
"id": "D85ec6Pk5JzA"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
"result"
],
"metadata": {
"id": "W3bFehXh5SQl"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
"result = model.suggest_pairwise_relationship(\"income\", \"exercise level\")"
],
"metadata": {
"id": "odFkp921hQsX"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
"result"
],
"metadata": {
"id": "ZIeStj9OwIPe"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
"result = model.suggest_pairwise_relationship(\"flooding\", \"rain\")"
],
"metadata": {
"id": "Fm5XCFrRwKsV"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
"result"
],
"metadata": {
"id": "HDo098ICwzi7"
},
"execution_count": null,
"outputs": []
}
]
}
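
The found / not-found behaviour described in the notebook text can be sketched as a prompt-building step. This is only an illustration under stated assumptions, not the prompt used by `AugmentedModelSuggester`: `build_prompt` and its wording are hypothetical, and the evidence is assumed to arrive as a list of sentences (or `None` when nothing was retrieved).

```python
from typing import Optional, Sequence


def build_prompt(var_a: str, var_b: str, evidence: Optional[Sequence[str]]) -> str:
    """Build an LLM prompt, adding retrieved evidence only when it exists."""
    question = (
        f"Which is more likely: '{var_a}' causes '{var_b}', "
        f"'{var_b}' causes '{var_a}', or neither?"
    )
    if not evidence:
        # Nothing retrieved: the model has to rely on its own knowledge.
        return question
    context = "\n".join(f"- {sentence}" for sentence in evidence)
    return f"Use the following retrieved evidence:\n{context}\n\n{question}"


print(build_prompt("smoking", "lung cancer", ["Smoking causes lung cancer."]))
print(build_prompt("income", "exercise level", None))
```

Keeping the fallback in the same function means the caller always gets a usable prompt, whether or not the pair was found.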