Related to: https://github.com/uw-ssec/llmaven/issues/2

- [ ] Define our needs for an evaluation suite
- [ ] Collect the metrics we're using for benchmarking for various models using this evaluation suite
- [ ] Update the evaluation code for deepeval to leverage the modular retrieval / generation code