From 6f861b9142d172e888eb7875d2633e4ab45986b0 Mon Sep 17 00:00:00 2001
From: parker-research <166864283+parker-research@users.noreply.github.com>
Date: Mon, 29 Apr 2024 18:24:06 -0600
Subject: [PATCH] Review README

---
 README.md | 24 ++++++++++++++----------
 1 file changed, 14 insertions(+), 10 deletions(-)

diff --git a/README.md b/README.md
index a9c32b8..7dff59b 100644
--- a/README.md
+++ b/README.md
@@ -13,14 +13,14 @@ converted to text-only format.
 
 We closely follow guidance from [HumanEval](https://github.com/openai/human-eval/tree/master).
 
-Make sure to use python 3.7 or later:
-```
+Make sure to use Python 3.7 or later:
+```bash
 $ conda create -n codex python=3.7
 $ conda activate codex
 ```
 
 Install [ICARUS Verilog](https://github.com/steveicarus/iverilog):
-```
+```bash
 $ git clone https://github.com/steveicarus/iverilog.git && cd iverilog \
 && git checkout 01441687235135d1c12eeef920f75d97995da333 \
 && sh ./autoconf.sh && ./configure && make -j4\
@@ -32,14 +32,14 @@ which already pre-installed ICARUS Verilog Simulator. Using the docker container
 you would still need to complete the following step.
 
 Check out and install this repository:
-```
+```bash
 $ git clone https://github.com/NVlabs/verilog-eval
 $ pip install -e verilog-eval
 ```
 
 ## Usage
 
-**This program would make system calls to *iverilog* and *vvp* to simulate
+**This program makes system calls to `iverilog` and `vvp` to simulate
 untrusted model-generated code. Users are strongly encouraged not to do so
 outside of a robust security sandbox. The [execution
 call](https://github.com/NVlabs/verilog-eval/blob/main/verilog_eval/execution.py#L79-L112)
@@ -50,14 +50,16 @@ disclaimer before running code in a potentially unsafe manner.
 See the comment i
 
 After following the above instructions to enable execution, generate samples
 and save them in the following JSON Lines (jsonl) format, where each sample is
 formatted into a single line like so:
-```
+```json
 {"task_id": "Corresponding VerilogEval task ID", "completion": "Completion only without the prompt"}
 ```
 We provide examples under `data/example` to illustrate the format and help with
 debugging.
 
 To evaluate the samples, run
-```
+```bash
 $ evaluate_functional_correctness samples.jsonl --problem_file data/VerilogEval_Human.jsonl
+```
+```
 Reading samples...
 3120it [00:00, 16077.44it/s]
 Running test suites...
@@ -81,8 +83,10 @@ out", or "failed". As a quick sanity-check, the example samples should yield
 0.5 pass@1. The results can be verified against the provided output in
 `data/example/ExampleSolution.jsonl_reference.jsonl`.
 
-```
+```bash
 $ evaluate_functional_correctness data/example/ExampleSolution.jsonl --problem_file=data/example/ExampleEval.jsonl
+```
+```
 Reading samples...
 6it [00:00, 221.60it/s]
 Running example suites...
@@ -97,14 +101,14 @@
 Because there is no unbiased way of estimating pass@k when there are fewer
 samples than k, the script does not evaluate pass@k for these cases.
 
 To evaluate with other k values, pass `--k=`. For
 other options, see
-```
+```bash
 $ evaluate_functional_correctness --help
 ```
 However, we recommend that you use the default values for the rest.
 
 ## Issues
 
 Problem descriptions in `descriptions/VerilogDescription_Machine.jsonl` are machine
-generated and we can not guarantee the absense of ambiguity and errors. We do not plan
+generated and we cannot guarantee the absence of ambiguity and errors. We do not plan
 to maintain description correctness.
 
 Functional correctness are evaluated through comparing simulation outputs using
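
The JSON Lines sample format described in the patched README can be produced with a short script. This is a sketch only; the task IDs and completion strings below are hypothetical placeholders, not real VerilogEval task IDs:

```python
import json

# Hypothetical samples: task_id must match a problem in the chosen
# --problem_file; completion is the model output *without* the prompt.
samples = [
    {"task_id": "example_task_0", "completion": "  assign out = a & b;\nendmodule"},
    {"task_id": "example_task_1", "completion": "  assign out = a | b;\nendmodule"},
]

# JSON Lines: one JSON object per line, no enclosing array, no trailing commas.
with open("samples.jsonl", "w") as f:
    for sample in samples:
        f.write(json.dumps(sample) + "\n")
```

Because each record is a complete JSON object on its own line, the file can be streamed record-by-record, which is why the evaluation script reports progress as "3120it" style counts while reading.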
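
The README's remark that pass@k is skipped when there are fewer than k samples follows from the standard unbiased estimator used by HumanEval-style harnesses, pass@k = 1 - C(n-c, k) / C(n, k) for n samples with c correct. The function below is an illustrative sketch of that formula, not the repository's actual implementation:

```python
import math

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate from n total samples with c correct:
    pass@k = 1 - C(n - c, k) / C(n, k)."""
    if k > n:
        # Matches the README: no unbiased estimate exists with fewer samples than k.
        raise ValueError("pass@k is undefined when fewer than k samples exist")
    if n - c < k:
        # Every size-k draw must contain at least one correct sample.
        return 1.0
    return 1.0 - math.comb(n - c, k) / math.comb(n, k)

# Hypothetical tallies: 2 samples for a task, 1 of them passing.
print(pass_at_k(2, 1, 1))  # 0.5
```

With one passing sample out of two, pass@1 is 0.5, which is consistent with the sanity-check value the README quotes for the provided example samples.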