Changes from all commits (60 commits)
fadd879
Cleaned up README.md to be a bit more clean and clear (namely, removi…
Jan 27, 2025
081f799
Cleaned up README.md to be a bit more clean and clear (namely, removi…
Jan 27, 2025
a8d2a3f
Merge branch 'master' into kjo/documentation
Jan 29, 2025
e3ffbac
Added docstring for 'registered_data_hook'
Jan 29, 2025
b81bc6f
Updated docstrings for the ABCs for data hooks
Jan 29, 2025
f2f1d59
Updated the docstrings of data-encoding hooks
Jan 29, 2025
9e094d4
Updated DocStrings for the 'feature_selection.py' data hooks. Also fi…
Jan 30, 2025
07504f4
[Minor] Corrected indentation of use-case to match the rest of the do…
Jan 30, 2025
d4d00e5
Added the docstring for the Imputation data hooks (of which there is …
Jan 30, 2025
fa1c984
Added the docstring for the Standardization data hooks (of which ther…
Jan 30, 2025
9a7ef4f
Added missing docstring for the "evaluate_param" function, used by Sc…
Jan 30, 2025
0ae009d
Updated docstrings for the SciKit-Learn Ensemble models (mostly addin…
Jan 30, 2025
4a496c9
Updated docstring for the Linear models provided by this tool. Note t…
Jan 30, 2025
8406ba2
Fixed incorrect indentation in the use-case examples for Ensemble mod…
Jan 30, 2025
c58a613
Extended the docstring of KNeighborsClassifierManager to include exam…
Jan 30, 2025
cd459ab
Added example usage to the SVC docstring.
Jan 30, 2025
c926333
Extended the docstring of the tuning utility classes, to clarify thei…
Jan 30, 2025
acfdc2b
Update data/hooks/encoding.py
SomeoneInParticular Jan 31, 2025
699378e
Update data/hooks/encoding.py
SomeoneInParticular Jan 31, 2025
b870030
Update README.md
SomeoneInParticular Jan 31, 2025
21864e9
Merge branch 'master' into kjo/documentation
SomeoneInParticular Jan 31, 2025
1035ccb
Update data/hooks/encoding.py
SomeoneInParticular Jan 31, 2025
06805ba
Initial Sphinx docs addition. Still WIP, be gentle
Jan 31, 2025
103b286
Further improves to the documentation, mostly restructuring and clari…
Feb 3, 2025
955ad4e
Minor formatting correction to the FAQ page; its header should now ac…
Feb 3, 2025
5ed3a9c
Initial commit of the Getting Started page
Feb 3, 2025
d5e0bf0
Initial addition of the "walkthrough" for MOOP; still very much WIP
Feb 3, 2025
d3cccdc
Extended the data config documentation to discuss data hooks.
Feb 10, 2025
6f94f83
Initial addition of model configuration + parameter tuning walkthrough
Feb 11, 2025
4e21ebc
[Minor] Grammar correction in model config docs
Feb 11, 2025
cda4fff
Added model config template for easy user reference
Feb 11, 2025
724ec7b
Added the template file for the study config before I forget to later
Feb 12, 2025
5b4cfbf
Initial commit of the study config documentation (walk-through); last…
Feb 12, 2025
9bd1b8f
Some clean-up and re-wiring of cross-references in docs. Should be a …
Feb 12, 2025
6cc6c19
Trimmed the titles of the walkthrough, as they were too elaborate for…
Feb 12, 2025
d80001d
Added missing comma in the model config tutorial JSON example; oops
Feb 12, 2025
8becde2
Yet another missing comma aaaaa
Feb 12, 2025
61ca5c6
Slight correction to the parameters for the output path in the study …
Feb 12, 2025
d030016
Initial results interpretation discussion added to docs
Feb 25, 2025
fc29f8c
Added missing results.csv, using in result interpretation docs
Feb 25, 2025
2a0c6e4
First addition to results, showing common plots which can be generate…
Feb 26, 2025
5c75953
Extended results documentation, showing how to compare two different …
Feb 27, 2025
ed24fc3
Added warning about an edge case bug, where the data is erroneously c…
Feb 27, 2025
a1e28e8
Initial addition of statistical analyses in the walkthrough; still ve…
Feb 27, 2025
f0de7fe
Swapped to PyData theme; should help with eye strain via its dark mode.
Mar 11, 2025
ae23c33
[Minor] Corrected misleading comment in statistics documentation. Oops
Mar 11, 2025
6dddcac
Added section on calculating statistics for single runs, place before…
Mar 11, 2025
3ffe2a3
Added a section detailing some common statistical analyses that can b…
Mar 21, 2025
0ec06cb
Fixed ".. codeblock" appearing erroneously in some doc pages
Mar 27, 2025
937ac0f
Clarified an example of the dataset in question
SomeoneInParticular Mar 27, 2025
457c6d7
Initial ReadTheDocs commit!
Mar 27, 2025
b785516
Clarified null-like data formatting
SomeoneInParticular Mar 27, 2025
78e0865
Changed header of "index" section to remove duplicate adjacent headers
Mar 27, 2025
af2878c
Merge remote-tracking branch 'origin/kjo/documentation' into kjo/docu…
Mar 27, 2025
5435777
Fixed strategy used in docs
SomeoneInParticular Mar 28, 2025
e1158b8
Merge branch 'master' into kjo/documentation
SomeoneInParticular Apr 20, 2025
4b6f3e3
Added link to ReadTheDocs
Jun 3, 2025
becc4e3
Merge branch 'master' into kjo/documentation
SomeoneInParticular Jun 3, 2025
0edb829
Removed redundant statements in data config arguments.
Jun 3, 2025
bec1ced
Minor formatting and cleaning.
Jun 3, 2025
README.md (5 changes: 3 additions & 2 deletions)
@@ -1,8 +1,9 @@
 # Modular Analysis Framework for Large-Scale ML Analyses
 
 This is a framework for automating high-permutation, large scale ML analyses.
-While it was developed for research into prediction post-surgical outcomes for patients with DCM,
-it can be easily extended to allow for the analysis of any tabular dataset.
+While it was developed for research into prediction post-surgical outcomes for patients with DCM, it can be easily extended to allow for the analysis of any tabular dataset.
 
+For a detailed overview of how to use this tool, please [read the docs](https://modular-optuna-ml.readthedocs.io/en/latest/).
+
 ## Set Up
docs/source/walkthrough/data_config.rst (8 changes: 4 additions & 4 deletions)
@@ -14,7 +14,7 @@ Like any good data scientist, lets start by looking at the data we will be using
 Based on this, lets aim to run some simple pre-processing throughout the MOOPs run. This will include:
 
 * Drop the 'meta' column, as it contains data more likely to distract the model than to inform it.
-* Encoding 'bar' to be numeric with a One-Hot encoding
+* Encoding 'bar' to be numeric with a One-Hot encoding.
 * Scaling the data in each column so that it abides by a Unit Norm (has a mean of 0, and a standard deviation of 1).
 
 Lets start setting up the configuration file to do this!
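For readers unfamiliar with these operations, the three steps above correspond to standard pandas / scikit-learn transformations. The sketch below shows a minimal, framework-free equivalent; the toy DataFrame and its 'baz' column are illustrative assumptions, while 'meta' and 'bar' are the columns named in the walkthrough. In an actual MOOPs run, the equivalent work is performed by the configured data hooks rather than hand-written code.

```python
# Framework-free sketch of the three pre-processing steps above, using pandas
# and scikit-learn. The toy DataFrame (and its 'baz' column) is a made-up
# stand-in; only 'meta' and 'bar' come from the walkthrough's example data.
import pandas as pd
from sklearn.preprocessing import StandardScaler

df = pd.DataFrame({
    "meta": ["a1", "a2", "a3", "a4"],
    "bar": ["low", "high", "low", "high"],
    "baz": [1.0, 2.0, 3.0, 4.0],
})

# 1. Drop the 'meta' column.
df = df.drop(columns=["meta"])

# 2. One-hot encode the categorical 'bar' column.
df = pd.get_dummies(df, columns=["bar"])

# 3. Scale each column to a mean of 0 and a standard deviation of 1.
df = pd.DataFrame(StandardScaler().fit_transform(df), columns=df.columns, index=df.index)
```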
@@ -37,11 +37,11 @@ First thing's first, the config needs to specify the where the data it is config
 
 Each of these entries correspond to the following behaviour in MOOPS:
 
-* "label": How studies involving this dataset will be identified; corresponds to the {data} part of the {data}__{model}__{study} table IDs in the final database
+* "label": How studies involving this dataset will be identified; corresponds to the ``{data}`` part of the ``{data}__{model}__{study}`` table ID in the final database
 * "data_source": The path to the dataset you want to use for this MOOPs run, including the file name *and* its extension!
-* "format": The structure of the data in the data source you specified prior. Currently only 'tabular' is supported (representing data in csv-like formats, such as csv and tsv)
+* "format": The structure of the data in the data source you specified prior. Currently only ``tabular`` is supported (representing data in csv-like formats, such as ``csv`` and ``tsv``)
 
-Any arguments past these three will be specific to the format you specified; this is specific to each format type, and in our case adds two other config entries:
+Any arguments past these three will be specific to the format you specified; in our case adds two other config entries:
 
 * "separator": The character(s) used to separate the columns in the text file.
 * "index": The column to use as the index (label) for each sample. If not provided, each row's position is used instead.