JesseLivezey
Contributor

This implementation is based on the arXiv [v4] version of the paper. I haven't run or tested it yet.

The paper seems to have at least one typo: \beta_2^t is used but never defined. For now I'm assuming it is just \beta_2. I'm also assuming that \beta_{1,t} is the same thing as \beta_1^t.

@goodfeli
Contributor

goodfeli commented Mar 6, 2015

Why not just use Alec Radford's implementation?
https://gist.github.com/Newmu/acb738767acb4788bac3

I've been using that plugged into Pylearn2 in my private repo and it works well.

@JesseLivezey
Contributor Author

I don't think Alec's version is consistent with the most recent version of the paper, but I haven't really tested this implementation vs. his, so I'm not sure how different the results will be.

@JesseLivezey
Contributor Author

It might just be that Alec's version doesn't decay beta1. However, the betas have also been redefined, and I haven't checked whether the rest of the math is equivalent.
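
For reference, here is a rough NumPy sketch of the kind of update being compared. It assumes the paper in question is Adam and uses the commonly published form of the update (beta1 = 0.9, beta2 = 0.999, bias correction by 1 - beta^t), which may not match the [v4] text or Alec's gist term for term. The function name `adam_step` and the decay parameter `lam` are hypothetical names chosen for illustration; `lam = 1.0` turns the beta1 decay off, which is one way to see how much the decay alone changes the result.

```python
import numpy as np


def adam_step(theta, grad, m, v, t, alpha=0.001, beta1=0.9, beta2=0.999,
              eps=1e-8, lam=1.0):
    """One Adam-style update at 1-indexed step t (illustrative sketch).

    Assumption: beta1 is decayed as beta1 * lam**(t - 1). With lam = 1.0
    the coefficient stays fixed at beta1.
    """
    beta1_t = beta1 * lam ** (t - 1)
    # Exponential moving averages of the gradient and the squared gradient.
    m = beta1_t * m + (1.0 - beta1_t) * grad
    v = beta2 * v + (1.0 - beta2) * grad ** 2
    # Bias correction; here beta ** t means beta raised to the power t.
    m_hat = m / (1.0 - beta1 ** t)
    v_hat = v / (1.0 - beta2 ** t)
    theta = theta - alpha * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v


# Usage: a few steps on f(x) = x**2, whose gradient is 2 * x.
x = np.array([1.0, -2.0])
m = np.zeros_like(x)
v = np.zeros_like(x)
for t in range(1, 6):
    x, m, v = adam_step(x, 2.0 * x, m, v, t)
```

Running both variants (with and without decay, or with the older beta convention) on the same gradients would be a quick way to check how far the results actually diverge.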
