Terraforming Jupyter and Dask at Quansight: How to Get Your Own Cloud Data Science Platform on the Cheap
See how QHub, which combines HashiCorp Terraform, Helm, and GitHub Actions, allows you to build scalable data science platforms in under 15 minutes and deploy them to multiple cloud providers.
Quansight deploys open source data science environments for many clients, small to large. Provisioning and maintaining open source, auto-scaling JupyterHub and Dask clusters in the cloud is a difficult task. Throw Kubernetes in there and it can all get overwhelming quite fast. With that challenge in mind, we decided to build QHub, a platform that uses infrastructure as code and Terraform to handle and simplify such deployments. Terraform is the integral part that allows us to support multiple deployment targets: GCP, AWS, DigitalOcean, on-prem, and soon Azure.
What You'll Learn
In this talk, Chris Ostrouchov of Quansight will show how QHub allows teams and individuals to build scalable data science platforms in under 15 minutes and deploy them to multiple cloud providers. Combining Terraform, Helm, and GitHub Actions, QHub provides automated infrastructure as code that enables continuous deployment of scalable data science projects that are easy to share and collaborate on. Our goal is to give small and medium teams the ability to provision and manage a JupyterHub cluster effectively and cost-effectively, without a dedicated DevOps engineer or sysadmin.
Accompanying this talk is a demo of deploying QHub, showing how some of the recent improvements in Terraform 0.13 and 0.14 allowed for better expression of the dependencies between modules.
This talk is aimed at DevOps and sysadmin professionals. The audience is assumed to have basic knowledge of Terraform (an introduction will not be covered). Ostrouchov will walk through the fundamentals of Jupyter and Dask. Although the talk is at an intermediate level, all guests are welcome.
Speaker: Chris Ostrouchov