📚 The doc issue
I want to replace Adam with SGD in Colossal-LLaMA-2 because I don't have enough GPU resources, but I do have time to tune hyperparameters. Are there any examples of using an SGD optimizer?