santra 🍊 (ved1beta)
Things I Do :)

  • Triton: writing custom Triton kernels for better optimization; currently working on some larger kernel projects
  • CUDA: studying CUDA architecture for a deeper understanding of kernels and Triton
  • Deep Learning: computer vision, NLP, etc. :)

Technical Skills 🛠️

  • Languages: Python, CUDA, C++
  • Frameworks & Libraries: PyTorch, Pandas, Matplotlib, Triton, mpi4py
  • Tools & Platforms: GitHub, Docker, Vercel, Neovim, VS Code, Jupyter Notebook, AWS
  • Machine Learning: proficient in statistical analysis, predictive modeling (regression, decision trees, random forests), and advanced algorithms (CatBoost, SGD), with a strong focus on optimization and accuracy.

Key Projects 📚

CUDA

  • GPU Sanghathan: small-scale distributed training of sequential deep learning models, built on NumPy and MPI.
  • CUDA writer: CUDA kernels written from scratch, from vec_add up to flash attention, plus model implementations from scratch.
  • Flash attention: implementation of flash attention in Triton.
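
The flash-attention project above targets standard scaled dot-product attention. As a reference point (not the project's actual code), the naive, non-fused computation that flash attention optimizes can be sketched in plain NumPy:

```python
import numpy as np

def naive_attention(Q, K, V):
    """Reference scaled dot-product attention: softmax(Q K^T / sqrt(d)) V.

    Flash attention produces the same result, but tiles the computation so
    the full N x N score matrix is never materialized in GPU memory.
    """
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)                 # (N, N) attention scores
    scores -= scores.max(axis=-1, keepdims=True)  # subtract row max for stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ V                            # weighted sum of values

rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((4, 8)) for _ in range(3))
out = naive_attention(Q, K, V)  # (4, 8): one output row per query
```

The fused Triton version computes the same softmax online, block by block, which is where the memory savings come from.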

Machine learning

  • Paligemma-Google: implementation of Google's PaliGemma vision-language model from scratch, following the paper.

  • Transformer: implementation of Google's Transformer language model from scratch, following the paper.

  • Mixture of Experts: a Mixture of Experts (MoE) model with a focus on efficient routing and expert utilization.

  • Triton/CUDA kernels in my free time :)
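
The MoE project centers on routing. As an illustration only (hypothetical shapes, plain NumPy, not the project's code), a minimal top-k gating step looks like this: each token's router logits are reduced to its k best experts, whose gate weights are renormalized to sum to one:

```python
import numpy as np

def top_k_routing(logits, k=2):
    """Pick the top-k experts per token and renormalize their gate weights.

    logits: (num_tokens, num_experts) router scores.
    Returns (idx, gates): chosen expert ids and their mixing weights.
    """
    idx = np.argsort(logits, axis=-1)[:, -k:]       # indices of top-k experts
    top = np.take_along_axis(logits, idx, axis=-1)  # their logits
    top -= top.max(axis=-1, keepdims=True)          # stable softmax
    gates = np.exp(top)
    gates /= gates.sum(axis=-1, keepdims=True)      # weights sum to 1 per token
    return idx, gates

# One token, four experts: experts 1 and 3 have the highest scores.
logits = np.array([[0.1, 2.0, -1.0, 0.5]])
idx, gates = top_k_routing(logits, k=2)
```

Each token's output is then the gate-weighted sum of its chosen experts' outputs; a real router adds a load-balancing loss so tokens spread across experts.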

Connect with Me 📬

  • 🐦 Twitter
  • 📫 Email
  • 🔗 LinkedIn

I'm looking forward to collaborating on projects at the intersection of technology and social good. Let's connect! 🌍

Pinned

  1. bitsandbytes-foundation/bitsandbytes

    Accessible large language models via k-bit quantization for PyTorch.

  2. vllm-project/llm-compressor

    Transformers-compatible library for applying various compression algorithms to LLMs for optimized deployment with vLLM.

  3. tinygrad/tinygrad

    You like pytorch? You like micrograd? You love tinygrad! ❤️

  4. GPU-sanghathan

    Small-scale distributed training of sequential deep learning models, built on NumPy and MPI.

  5. huggingface/accelerate

    🚀 A simple way to launch, train, and use PyTorch models on almost any device and distributed configuration, with automatic mixed precision (including fp8) and easy-to-configure FSDP and DeepSpeed support.

  6. Quanta

    Efficient and scalable solutions for PyTorch, enabling large language model quantization with k-bit precision for enhanced accessibility.