This repository was archived by the owner on Oct 3, 2021. It is now read-only.

Commit ec1813c: Added fine-tuning script
1 parent: 300e19c

18 files changed (+174,247 / -806 lines)

.gitignore (1 addition, 0 deletions)

@@ -1 +1,2 @@
 /demos/gpt2-models
+/demos/models

README.md (3 additions, 3 deletions)

@@ -67,8 +67,8 @@ from gpt2_client import GPT2Client
 
 gpt2 = GPT2Client('117M') # This could also be `345M`
 
-my_corpus = open('shakespeare.txt', 'r').read()
-gpt2.finetune(x_train, y_train, epochs=10, batch_size=32) # Load your custom dataset
+my_corpus = 'shakespeare.txt'
+custom_text = gpt2.finetune(my_corpus, return_text=True) # Load your custom dataset
 ```
 
-In order to fine-tune GPT-2 to your custom corpus or dataset, you must have a GPU or TPU at hand. [Google Colab](http://colab.research.google.com) is one such tool you can make use of to re-train your model to generate new forms of text.
+In order to fine-tune GPT-2 to your custom corpus or dataset, it's ideal to have a GPU or TPU at hand. [Google Colab](http://colab.research.google.com) is one such tool you can make use of to re-train/fine-tune your custom model.
4 binary files changed (4.97 KB, 5 KB, 6.55 KB, 2.46 KB): contents not shown.

demos/accumulate.py (0 additions, 35 deletions)
This file was deleted.

demos/checkpoint/run1/encoder.json (1 addition, 0 deletions)
Large diffs are not rendered by default.

166-byte binary file: not shown.
New file (7 additions, 0 deletions)

@@ -0,0 +1,7 @@
+{
+  "n_vocab": 50257,
+  "n_ctx": 1024,
+  "n_embd": 768,
+  "n_head": 12,
+  "n_layer": 12
+}
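By its contents, the added file is almost certainly the checkpoint's `hparams.json`; its name is not rendered in this view, so the path below is an assumption based on the `demos/checkpoint/run1/` layout above. The values match the released 117M GPT-2 configuration:

```python
# Hedged sketch: reading the hyperparameter file added in this commit.
# Assumption: it lives at demos/checkpoint/run1/hparams.json alongside
# the encoder.json shown above.
import json

with open('demos/checkpoint/run1/hparams.json') as f:
    hparams = json.load(f)

# 117M GPT-2: 12 Transformer layers, 12 attention heads, 768-dim
# embeddings, a 1024-token context window, and a 50,257-entry BPE vocab.
assert hparams['n_layer'] == 12 and hparams['n_embd'] == 768
print(hparams)
```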
