Amazon Web Services
Apr 2025 – Oct 2025
Software Development Engineer Intern
Contributed to the Connector for PyTorch, optimizing S3 access for large-scale distributed ML workloads. Scaled Distributed Checkpointing to over 200 EC2 instances (1M+ S3 requests/min) and implemented 'shadow copy' partitioning to reduce throttling, cutting 503 errors by over 90% in multi-node ML workloads.