santra 🍊 (ved1beta)
Things I Do :)

  • Triton: writing custom Triton kernels for better optimization; currently working on some larger kernel projects
  • CUDA: studying CUDA architecture for a deeper understanding of kernels and Triton
  • Deep Learning: computer vision, NLP, etc. :)

Technical Skills 🛠️

  • Languages: Python, CUDA, C++
  • Frameworks & Libraries: PyTorch, Pandas, Matplotlib, Triton, mpi4py
  • Tools & Platforms: GitHub, Docker, Vercel, Neovim, VS Code, Jupyter Notebook, AWS
  • Machine Learning: proficient in statistical analysis, predictive modeling (regression, decision trees, random forests), and advanced algorithms (CatBoost, SGD), with a strong focus on optimization and accuracy.

Key Projects 📚

CUDA

  • GPU Sanghathan: small-scale distributed training of sequential deep learning models, built on NumPy and MPI.
  • CUDA writer: CUDA kernels written from scratch, from vec_add up to flash attention, plus model implementations from scratch.
  • Flash attention: implementation of flash attention in Triton.
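
The flash-attention project above targets standard scaled dot-product attention. As a reference point (not the project's actual code), the naive, non-fused computation that flash attention optimizes can be sketched in plain NumPy:

```python
import numpy as np

def naive_attention(Q, K, V):
    """Reference scaled dot-product attention: softmax(Q K^T / sqrt(d)) V.

    Flash attention produces the same result, but tiles the computation so
    the full N x N score matrix is never materialized in GPU memory.
    """
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)                 # (N, N) attention scores
    scores -= scores.max(axis=-1, keepdims=True)  # subtract row max for stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ V                            # weighted sum of values

rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((4, 8)) for _ in range(3))
out = naive_attention(Q, K, V)  # (4, 8): one output row per query
```

The fused Triton version computes the same softmax online, block by block, which is where the memory savings come from.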

Machine learning

  • Paligemma-Google: implementation of Google's PaliGemma vision-language model from scratch, following the paper.

  • Transformer: implementation of Google's Transformer language model from scratch, following the paper.

  • Mixture of Experts: a Mixture of Experts (MoE) model with a focus on efficient routing and expert utilization.

  • Triton/CUDA kernels in my free time :)
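
The MoE project centers on routing. As an illustration only (hypothetical shapes, plain NumPy, not the project's code), a minimal top-k gating step looks like this: each token's router logits are reduced to its k best experts, whose gate weights are renormalized to sum to one:

```python
import numpy as np

def top_k_routing(logits, k=2):
    """Pick the top-k experts per token and renormalize their gate weights.

    logits: (num_tokens, num_experts) router scores.
    Returns (idx, gates): chosen expert ids and their mixing weights.
    """
    idx = np.argsort(logits, axis=-1)[:, -k:]       # indices of top-k experts
    top = np.take_along_axis(logits, idx, axis=-1)  # their logits
    top -= top.max(axis=-1, keepdims=True)          # stable softmax
    gates = np.exp(top)
    gates /= gates.sum(axis=-1, keepdims=True)      # weights sum to 1 per token
    return idx, gates

# One token, four experts: experts 1 and 3 have the highest scores.
logits = np.array([[0.1, 2.0, -1.0, 0.5]])
idx, gates = top_k_routing(logits, k=2)
```

Each token's output is then the gate-weighted sum of its chosen experts' outputs; a real router adds a load-balancing loss so tokens spread across experts.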

Connect with Me 📬

  • 🐦 Twitter
  • 📫 Email
  • 🔗 LinkedIn

I'm looking forward to collaborating on projects at the intersection of technology and social good. Let's connect! 🌍

Pinned

  1. bitsandbytes-foundation/bitsandbytes

    Accessible large language models via k-bit quantization for PyTorch.

  2. vllm-project/llm-compressor

    Transformers-compatible library for applying various compression algorithms to LLMs for optimized deployment with vLLM.

  3. tinygrad/tinygrad

    You like pytorch? You like micrograd? You love tinygrad! ❤️

  4. GPU-sanghathan

    Small-scale distributed training of sequential deep learning models, built on NumPy and MPI.

  5. huggingface/accelerate

    🚀 A simple way to launch, train, and use PyTorch models on almost any device and distributed configuration, with automatic mixed precision (including fp8) and easy-to-configure FSDP and DeepSpeed support.

  6. Quanta

    Efficient and scalable solutions for PyTorch, enabling large language model quantization with k-bit precision for enhanced accessibility.