
[Benchmark] Stochastic Gradient Descent for Linear Model #13


Open
ogrisel opened this issue Jul 10, 2013 · 1 comment

Comments

@ogrisel
Contributor

ogrisel commented Jul 10, 2013

Linear model SGD or averaged perceptron as a benchmark for both dense array input and sparse input (bag of words representation of a text document) would be very nice to have.
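Not an official benchmark, but a minimal sketch of what this could look like, assuming scikit-learn's `SGDClassifier` and `Perceptron` (note scikit-learn's `Perceptron` is not averaged), with a synthetic dense dataset and a 20 newsgroups bag-of-words matrix as placeholder inputs:

```python
# Sketch of a dense vs. sparse SGD linear-model benchmark.
# Assumptions: scikit-learn is available; the datasets below (synthetic dense
# data, 20 newsgroups for bag-of-words) are placeholders and the sparse case
# downloads the corpus on first use.
from time import perf_counter

from sklearn.datasets import fetch_20newsgroups, make_classification
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import Perceptron, SGDClassifier


def bench(clf, X, y, name):
    tic = perf_counter()
    clf.fit(X, y)
    print(f"{name}: {perf_counter() - tic:.3f}s, train acc={clf.score(X, y):.3f}")


# Dense array input.
X_dense, y_dense = make_classification(n_samples=20_000, n_features=100, random_state=0)
bench(SGDClassifier(loss="hinge", max_iter=5, random_state=0), X_dense, y_dense, "SGD dense")
bench(Perceptron(max_iter=5, random_state=0), X_dense, y_dense, "Perceptron dense")

# Sparse bag-of-words input.
news = fetch_20newsgroups(subset="train")
X_sparse = CountVectorizer().fit_transform(news.data)
bench(SGDClassifier(loss="hinge", max_iter=5, random_state=0), X_sparse, news.target, "SGD sparse")
bench(Perceptron(max_iter=5, random_state=0), X_sparse, news.target, "Perceptron sparse")
```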

@varshinika2

varshinika2 commented May 15, 2025

Thank you for highlighting this use case. I would like to contribute to this benchmark of Stochastic Gradient Descent for a linear model. SGD is widely used in machine learning and deep learning, and I would like to outline where and why it is used:

1) Gradient descent minimizes a cost function that measures how wrong the model's predictions are: at each step we move in the direction of steepest descent, adjusting the model's parameters by a small step.

2) The parameter update rule is

θ_new = θ_old − α ∇J(θ_old)

where α is the learning rate and ∇J(θ_old) is the gradient of the cost function with respect to the parameters.
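As a minimal pure-NumPy sketch of this update for a linear model with a mean-squared-error cost (the data and learning rate are illustrative assumptions, not part of the proposal):

```python
# One full-batch gradient-descent update: theta_new = theta_old - alpha * grad J(theta_old).
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))          # n_samples x n_features
y = X @ np.array([1.0, -2.0, 0.5])     # targets from a known linear model
theta = np.zeros(3)                    # theta_old
alpha = 0.1                            # learning rate

grad = 2.0 / len(X) * X.T @ (X @ theta - y)  # gradient of J(theta) = mean squared error
theta = theta - alpha * grad                 # theta_new
```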

3) Traditional (full-batch) gradient descent is more time-consuming because every update processes the entire dataset, which requires more memory and leads to redundant computation.

4) Why is stochastic gradient descent often preferable to the traditional approach? It does not use the entire dataset for each update: it randomly selects a single data point, computes the gradient of the cost function on that point, and updates the parameters, which makes each step much cheaper than a full-batch update (a sketch follows below).
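A hedged illustration of that single-sample update, using the same assumed linear model and squared-error cost as above (illustrative data and learning rate):

```python
# Stochastic version of the gradient-descent update: each step uses a single
# randomly chosen sample instead of the whole dataset.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = X @ np.array([1.0, -2.0, 0.5])
theta = np.zeros(3)
alpha = 0.01

for step in range(1000):
    i = rng.integers(len(X))                     # pick one data point at random
    grad_i = 2.0 * X[i] * (X[i] @ theta - y[i])  # gradient of the per-sample squared error
    theta -= alpha * grad_i                      # cheap but noisy update
```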

5) There are some potential challenges:

  • Because data points are selected at random, the gradient estimates are noisy, so the loss can fluctuate from step to step.
  • The learning rate is critical: a rate that is too high leads to divergence, while one that is too low leads to slow convergence.

6) Potential contributions:

  • Implement learning-rate schedules.
  • Train on mini-batches instead of single points (see the sketch after this list).
  • Implement techniques for handling large datasets.
  • Add regularization such as L1 and L2 penalties.
  • Control overfitting and underfitting by managing the bias-variance trade-off.
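One possible sketch touching several of these points, assuming scikit-learn's `SGDClassifier`: L2 regularization, an inverse-scaling learning-rate schedule, and mini-batch updates via `partial_fit` (which also supports out-of-core learning on large datasets). The synthetic data and batch size are placeholders:

```python
# Mini-batch SGD with a decaying learning-rate schedule and L2 regularization.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier

X, y = make_classification(n_samples=10_000, n_features=50, random_state=0)
classes = np.unique(y)

clf = SGDClassifier(
    loss="hinge",                                        # linear SVM objective
    penalty="l2", alpha=1e-4,                            # L2 regularization strength
    learning_rate="invscaling", eta0=0.1, power_t=0.5,   # decaying schedule
    random_state=0,
)

batch_size = 256
for start in range(0, len(X), batch_size):
    batch = slice(start, start + batch_size)
    clf.partial_fit(X[batch], y[batch], classes=classes)  # one mini-batch update

print("train accuracy:", clf.score(X, y))
```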
