About Me
Hi, I'm Wonook. My research interests lie in distributed systems for efficiently processing big data (batch & stream) and performing large-scale AI (training & serving). My recent focus is on efficient systems for data processing (e.g., resource optimization in diverse settings) and generative AI (e.g., parallelization, batching, caching, kernel optimization, and quantization). My work has been published at top-tier computer science conferences and journals, including USENIX ATC, ACM EuroSys, and ACM TOCS.
I finished my Ph.D. in computer science at Seoul National University in 2023, under the supervision of Professor Byung-Gon Chun. My dissertation is titled "Efficient and Adaptive Resource Management for Dynamically Optimizing Distributed Data Processing Systems". I also received my B.S. degree in computer science from Seoul National University in 2017. You can check out my list of works on Google Scholar as well.
Apart from work, I do a lot of sports (mainly tennis, golf, and BJJ nowadays), learn new languages, and meet new people. Having lived around the world, I am a motivation-driven, multicultural person with a flexible mind: open, adaptive, and quick to learn new things and get to know new people.
Education
Seoul National University, Republic of Korea 🇰🇷
Ph.D., Computer Science
2017 - 2023
Dissertation: Efficient and Adaptive Resource Management for Dynamically Optimizing Distributed Data Processing Systems
Seoul National University, Republic of Korea 🇰🇷
B.S., Computer Science
2013 - 2017
https://en.snu.ac.kr
École Polytechnique Fédérale de Lausanne, Switzerland 🇨🇭
Exchange program, Informatique (Computer Science)
2015 - 2016
https://www.epfl.ch
Hana Academy Seoul, Republic of Korea 🇰🇷
Honors High School/Secondary Diploma Program
2010 - 2013
http://eng.hana.hs.kr
American School of Paris, France 🇫🇷🇺🇸
International School
2003 - 2009
https://www.asparis.org
International Publications
*Blaze: Holistic Caching for Iterative Data Processing
2024. European Conference on Computer Systems (EuroSys '24). ACM. DOI https://doi.org/10.1145/3627703.3629558
Won Wook Song, Jeongyoon Eo, Taegeon Um, Myeongjae Jeon, and Byung-Gon Chun
*Sponge: Fast Reactive Scaling for Stream Processing with Serverless Frameworks
2023. USENIX Annual Technical Conference (ATC) '23. USENIX Association. ISBN 978-1-939133-35-9
Won Wook Song, Taegeon Um, Sameh Elnikety, Myeongjae Jeon, and Byung-Gon Chun
SWAN: WAN-aware Stream Processing on Geographically-distributed Clusters
2022. APSys (Asia-Pacific Workshop on Systems). ACM SIGOPS. ISBN 978-1-4503-9441-3. DOI https://doi.org/10.1145/3546591.3547524
Won Wook Song, Myeongjae Jeon, and Byung-Gon Chun
*Apache Nemo: A Framework for Optimizing Distributed Data Processing
2021. Transactions on Computer Systems (TOCS). ACM. DOI https://doi.org/10.1145/3468144
Won Wook Song, Youngseok Yang, Jeongyoon Eo, Jangho Seo, Joo Yeon Kim, Sanha Lee, Gyewon Lee, Taegeon Um, Haeyoon Cho, and Byung-Gon Chun
Harmony: A Scheduling Framework Optimized for Multiple Distributed Machine Learning Jobs
2021. International Conference on Distributed Computing Systems (ICDCS) '21. IEEE. DOI https://doi.org/10.1109/ICDCS51616.2021.00085
Woo-Yeon Lee, Yunseong Lee, Won Wook Song, Youngseok Yang, Joo Yeon Kim, and Byung-Gon Chun
Apache Nemo: A Framework for Building Distributed Dataflow Optimization Policies
2019. USENIX Annual Technical Conference (ATC) '19. USENIX Association. ISBN 978-1-939133-03-8. DOI https://dl.acm.org/doi/10.5555/3358807.3358824
Youngseok Yang, Jeongyoon Eo, Geon-Woo Kim, Joo Yeon Kim, Sanha Lee, Jangho Seo, Won Wook Song, and Byung-Gon Chun
Automating System Configuration of Distributed Machine Learning
2019. International Conference on Distributed Computing Systems (ICDCS) '19. IEEE. ISBN 978-1-7281-2520-6. DOI https://doi.org/10.1109/ICDCS.2019.00203
Woo-Yeon Lee, Yunseong Lee, Joo Seong Jeong, Gyeong-In Yu, Joo Yeon Kim, Ho Jin Park, Beomyeol Jeon, Won Wook Song, Gunhee Kim, Markus Weimer, Brian Cho, and Byung-Gon Chun
Pado: A Data Processing Engine for Harnessing Transient Resources in Datacenters
2017. European Conference on Computer Systems (EuroSys '17). ACM. ISBN 978-1-4503-4938-3. DOI https://doi.org/10.1145/3064176.3064181
Youngseok Yang, Geon-Woo Kim, Won Wook Song, Yunseong Lee, Andrew Chung, Zhengping Qian, Brian Cho, and Byung-Gon Chun
Experiences
- Start-up experience in building a resource-efficient system for serving AI and large language models (LLM) on GPGPUs (i.e., a closed-source system similar to vLLM/TensorRT-LLM).
- Our team built support for: continuous batching, KV caching, adaptive & dynamic inference / tensor & pipeline parallelization, kernel optimization (e.g., FlashAttn) / quantization, MoE / agents & tools (e.g., structured outputs), RAG (e.g., vector stores), and fine-tuning (e.g., (multi) LoRA and PEFT).
- Experience managing large-scale clusters with K8s and Docker (w/ various tools).
- Also the main author and editor-in-chief of the company's technical publications.
- Dissertation: Efficient and Adaptive Resource Management for Dynamically Optimizing Distributed Data Processing Systems
- My research record demonstrates that I can conduct research and publish papers at top-level conferences and journals as the first author, as well as collaborate with others as a co-author. Brief descriptions of each work follow.
- First-authored papers (in USENIX ATC, ACM EuroSys, TOCS, APSys)
- Blaze is a data processing system that optimizes caching for iterative workloads (e.g., ML and graph processing) by coordinating separate layers in a holistic manner to use memory resources effectively.
- Sponge is an adaptive and scalable stream processing system that handles bursty inputs by combining the benefits of powerful, stable VMs and fast-scaling serverless frameworks with special operators.
- SWAN is a stream processing system that adaptively selects optimal network connections across low-bandwidth, high-latency, globally distributed computing clusters.
- Published a journal paper (first-authored) and a conference paper (co-authored) on the Apache Nemo project, which flexibly builds dataflow optimization policies through customizable task scheduling and data communication.
- Co-authored papers (in USENIX ATC, ACM EuroSys, IEEE ICDCS)
- Cruise is a system for automatically adjusting system configurations for distributed ML jobs.
- Harmony is a scheduling framework for optimizing the execution of multiple distributed ML jobs for high utilization of CPU, memory, and network resources.
- Pado is a distributed data processing system that effectively uses preemptible spot (transient) resources while guaranteeing progress.
- Apart from the papers, I've also led an 8-year grant project funded by the Korean government (IITP of MSIT, No. 2015-0-00221) to successful completion.
Apache Software Foundation
Started as a GSoC student developer, then became a mentor, and am now a project committer and PMC member.
MAY 2016 - PRESENT
https://www.apache.org
- Primary committer and PMC member of the Apache Nemo incubator project.
- Served as a GSoC mentor for the ASF in 2017, 2018, 2020, 2021, and 2022, supervising student projects built on top of Apache Nemo.
- Also contributing to other Apache projects, such as Apache Beam since 2018.
- Experience with open-source contribution and community activities, including preparing and launching releases.
- Previously a Google Summer of Code student developer for the Apache REEF project, a library from Microsoft that facilitates building distributed systems in cluster environments with a master and multiple workers. Completed the development of an SSH-standalone mode for running REEF in 2016.
- Experience with tools like Apache Maven, JIRA, Git/GitHub, and many more.
- Worked with the Network Research Group at MSRA, mentored by Yongqiang Xiong and Wenxue Cheng, focusing on reinforcement learning (i.e., involving simulators and evaluators).
- Worked with OpenNetLab, an open platform for RL-based congestion control for real-time communication, using bandwidth estimation to optimize congestion control in production RTC workloads.
Past Experiences
- Part of the start-up team running the ButFitSeoul service, a platform for gathering people for group workout programs, with many thousands of participants.
- Used Django, VueJS, Fabric, Unicorn, and collaborated with web designers through Figma.
- Used Git, GitHub, AWS, GCP (Compute/EC2, RDBMS, Storage/S3, VPC, etc.)
- Used Django. Handled the server side (deployment and backend) for the start-up, located at EPFL, Lausanne.
- See https://artmyn.com
Systers, Anita Borg Institute
Google Summer of Code (GSoC) Student Developer
MAY 2015 - AUG. 2015
- Accepted as a Google Summer of Code student developer for a Rails application project.
- Used Rails, PostgreSQL, ReactJS, minitest-rails, Capistrano, Passenger.
- Used Slack, Travis CI, Git/GitHub, Agile for communication, CI, and development.
- Experience working with collaborators around the globe (Asia, America, and Europe).
YelloMobile
Software Engineer
JAN. 2015 - APRIL 2015
- Part of the team running the Dieter app, once ranked 4th in the health category on the Google Play Market in Korea, with over 1.5 million users.
- Used Rails, AngularJS, Capistrano, Unicorn, and collaborated with Android and iOS developers, as well as with web designers.
- Used Git, GitHub, AWS (EC2, RDBMS, Load Balancing, CloudWatch and Route 53).
Talks
Sponge: Fast Reactive Scaling for Stream Processing with Serverless Frameworks
USENIX ATC 2023 (Annual Technical Conference).
JULY 2023
https://youtu.be/mhx0aWhzP6w?si=0LUHQL7rq4D5s-q5
Building a Distributed System for Batch & Stream Processing
Spotify Engineering Conference 2022 (SpEC 2022).
MAY 2022
Undisclosed talk
Flexible Optimizations and Efficient Execution of Data Processing on Apache Nemo
ApacheCon Asia 2021.
AUG. 2021
https://youtu.be/lFCuiL9ZRWk
Running Apache Beam Programs on Apache Nemo
Apache Beam Summit 2019, Berlin, Germany.
JUNE 2019
https://youtu.be/DKxYE8YWF_o
Conference with Experts on Technology Trends and International Affairs (Korean).
Korean Ministry of Foreign Affairs, Seoul, South Korea.
NOV. 2020
Undisclosed talk
Fast and Efficient Data Processing with Apache Nemo. (Korean)
Naver Tech Talk 2020, Seoul, South Korea.
JULY 2020
https://youtu.be/Gc4-o8n762I?si=udAow673paGkDR0Q&t=1217
Executing Apache Beam Applications with Optimized Configurations on Specific Environments on Apache Nemo. (Korean)
KOSSCON 2018, Seoul, South Korea.
NOV. 2018
https://festa.io/events/152
A Flexible and an Extensible way of Big Data Processing. (Korean)
Naver DeView 2017, Seoul, South Korea.
NOV. 2017
https://tv.naver.com/v/2293702
Other Skills
Programming Languages
Python, Java, C++, Go, Scala, JavaScript, TypeScript, SQL, HTML/CSS, Ruby, OCaml, Scheme
Tools, Frameworks & More
Kubernetes, Docker / vLLM, TensorRT-LLM / Hadoop, YARN, Spark, Flink, Beam / PyTorch, TensorFlow, Ray, Gym / CUDA / Git, Jenkins, GitHub, Jira, Notion, Figma / AWS, GCP, Azure, CoreWeave / FastAPI, Flask, Django, VueJS / Maven / PostgreSQL, Redis / Terraform, Helm, Traefik, Argo, Prometheus, gRPC, etcd
Visa Status
My US green card is to be issued in early 2025 via EB2-NIW.
A Little More About Me
Due to my experiences across the globe 🇫🇷🇨🇭🇺🇸🇰🇷, I speak fluent English and Korean, and can socialize in French. I'm also learning Spanish and Mandarin. The Myers-Briggs Type Indicator test tells me that I have an ENFJ (protagonist) personality. Alongside my interest in computer systems, some of my other interests are:
- Skiing & Snowboarding
- Tennis
- Golfing
- Brazilian Jiu-Jitsu
- Running
- Cycling
- Swimming
- My Italian motorbike
- Philosophy
- Photography
- Travelling
- Wine
- Interior designing
- ... and a lot more