Data Engineer with expertise in building robust data pipelines, ETL processes, and data warehousing solutions. Passionate about transforming raw data into actionable insights and creating scalable data infrastructure.
- Languages: Python, SQL, Scala, Java
- Big Data: Apache Spark, Hadoop, Kafka
- Data Warehousing: Snowflake, BigQuery, Redshift
- Databases: PostgreSQL, MongoDB, Cassandra
- Cloud: AWS (S3, EMR, Glue, Redshift), GCP (BigQuery, Dataflow), Azure
- ETL/ELT Tools: Airflow, dbt, Fivetran, Matillion
- Version Control: Git, GitHub
- CI/CD: Jenkins, GitHub Actions
- Containerization: Docker, Kubernetes
This project implements an end-to-end data pipeline for Reddit data. It extracts posts from selected subreddits, loads them into AWS infrastructure, transforms the data with dbt, and visualizes the results in Power BI.
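
As a rough illustration of the step between extraction and loading, here is a minimal sketch of how raw post records might be normalized into CSV rows before being staged for the warehouse. The field names, `normalize_post`, and `posts_to_csv` are hypothetical and chosen for this example only; they are not taken from the project's code, and the sketch uses only the Python standard library rather than any Reddit or AWS client.

```python
import csv
import io
from datetime import datetime, timezone

# Hypothetical column set; the project's real schema is not shown in this README.
FIELDS = ["id", "subreddit", "title", "score", "num_comments", "created_utc"]


def normalize_post(raw: dict) -> dict:
    """Keep only the expected fields and convert the epoch timestamp to ISO 8601 (UTC)."""
    created = datetime.fromtimestamp(float(raw["created_utc"]), tz=timezone.utc)
    return {
        "id": str(raw["id"]),
        "subreddit": str(raw["subreddit"]).lower(),
        "title": str(raw.get("title", "")).strip(),
        "score": int(raw.get("score", 0)),
        "num_comments": int(raw.get("num_comments", 0)),
        "created_utc": created.isoformat(),
    }


def posts_to_csv(raw_posts: list[dict]) -> str:
    """Normalize posts, drop duplicate ids, and return CSV text ready to be staged."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    seen = set()
    for raw in raw_posts:
        row = normalize_post(raw)
        if row["id"] in seen:
            continue  # skip posts already written in this batch
        seen.add(row["id"])
        writer.writerow(row)
    return buffer.getvalue()
```

Deduplicating on `id` within a batch keeps repeated extraction runs from writing the same post twice; downstream dbt models could then assume one row per post.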
- AWS Certified Data Analytics - Specialty
- Window Functions: Powerful Tools for Data Analysis
- Email: [email protected]
- LinkedIn: Dharma Teja Samudrala
From dharmateja03