Running JupyterHub With Istio Service Mesh on Kubernetes — A Troubleshooting Journey

Harsimran Singh Maan
Published in The Startup
8 min read · Sep 15, 2020

JupyterHub is an open-source tool that spins up Jupyter notebook servers on demand. The notebooks can be used for data analysis or to create and execute machine learning models. Istio is a service mesh that offers a secure and observable communication mechanism between the services in a Kubernetes cluster.
One of the benefits of running JupyterHub in an Istio-enabled cluster is support for mTLS (mutual TLS) between the JupyterHub components. mTLS ensures that all communication between the hub and the user notebook servers is encrypted and safe from eavesdropping. This capability has been requested by many users in the JupyterHub community.

To follow along on this journey, it is important to know the basic component interactions in JupyterHub:

  • The Hub configures the proxy by calling proxy-api
  • The proxy forwards all requests to the Hub by default
  • The Hub handles login, and spawns single-user notebook servers on demand
  • The Hub configures the proxy to forward url prefixes to single-user notebook servers
Image Source: https://zero-to-jupyterhub.readthedocs.io/en/latest/administrator/architecture.html
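
The interactions listed above can be pictured as a small routing table that the Hub fills in and the proxy consults. The sketch below is only a toy model, not JupyterHub or configurable-http-proxy code: the class name, the notebook address, and the assumption that the longest matching prefix wins are all mine.

```python
class RouteTable:
    """Toy model of the proxy's routing table (illustrative only)."""

    def __init__(self, default_target):
        # The proxy starts with one default route that points at the Hub.
        self.routes = {"/": default_target}

    def add_route(self, prefix, target):
        # Stands in for the Hub calling the proxy API to register a route.
        self.routes[prefix] = target

    def resolve(self, path):
        # Assumption: the longest matching prefix wins; "/" always matches.
        matches = [p for p in self.routes if path.startswith(p)]
        return self.routes[max(matches, key=len)]


table = RouteTable("http://hub:8081")
# Hypothetical single-user notebook server address.
table.add_route("/user/alice/", "http://10.1.0.40:8888")

print(table.resolve("/hub/login"))       # falls through to the Hub
print(table.resolve("/user/alice/lab"))  # goes to the notebook server
```

Everything that does not match a user prefix falls back to the Hub, which is why a broken default route shows up as a failure on the very first page load.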

Setup

— Install istio

$ istioctl install --set profile=demo

— Install JupyterHub

Create the jupyterhub namespace for the JupyterHub installation. Set the istio-injection label so that the istio-proxy sidecar is injected automatically into pods that start in the namespace. Then set the mTLS mode for all services in the namespace.

$ kubectl create ns jupyterhub
$ kubectl label namespace jupyterhub istio-injection=enabled
$ kubectl apply -n jupyterhub -f - <<EOF
apiVersion: "security.istio.io/v1beta1"
kind: "PeerAuthentication"
metadata:
  name: "default"
spec:
  mtls:
    mode: STRICT
EOF

Next, set up the Helm chart repository.

$ helm repo add jupyterhub https://jupyterhub.github.io/helm-chart/
$ helm repo update

Set up the config for the Helm chart.

$ printf "proxy:\n  secretToken: '%s'\n" "$(openssl rand -hex 32)" > config.yaml
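
For readers who would rather not depend on shell quoting rules, the same two-line file can be produced from Python. This is a minimal sketch that assumes the chart only needs the proxy.secretToken key shown above; the function name is mine.

```python
import secrets


def render_config(token: str) -> str:
    # The same two-line YAML document that the shell command writes.
    return f"proxy:\n  secretToken: '{token}'\n"


# 32 random bytes rendered as 64 hex characters, like `openssl rand -hex 32`.
token = secrets.token_hex(32)
config_text = render_config(token)

if __name__ == "__main__":
    with open("config.yaml", "w") as fh:
        fh.write(config_text)
```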

Install JupyterHub in the jupyterhub namespace.

$ helm template jupyterhub/jupyterhub \
--version=0.9.0 \
--values config.yaml | kubectl -n jupyterhub apply -f -
$ # Skipping `helm install` is a personal preference; I prefer qbec for day-to-day use. Helm is used here because the community uses it to package JupyterHub for Kubernetes.

Next, we’ll verify the deployment to see if the pods are running. Both the hub and the proxy pods are running as expected.

$ kubectl -n jupyterhub get po
hub-fd88f65b6-6zqb9 2/2 Running 1 5m31s
proxy-98fdbb5fd-bv7nt 2/2 Running 0 5m31s

The 2/2 part shows that there are two containers in the pod - the main container and a sidecar istio-proxy container. kubectl -n jupyterhub describe po hub-fd88f65b6-6zqb9 shows that the hub pod has an istio-init container and an istio-proxy sidecar.

Network traffic is routed through the istio-proxy sidecar. To validate, look at the access logs from the sidecar.

$ kubectl -n jupyterhub logs hub-fd88f65b6-6zqb9 -c istio-proxy
[2020-09-15T03:50:42.650Z] "GET /api/routes HTTP/1.1" 200 - "-" "-" 0 87 2 1 "-" "Mozilla/5.0 (compatible; pycurl)" "389f6b9c-c966-96d7-8cc3-2a565f623ccd" "10.106.101.111:8001" "10.1.0.19:8001" outbound|8001||proxy-api.jupyterhub.svc.cluster.local 10.1.0.18:52182 10.106.101.111:8001 10.1.0.18:36004 - default
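
Sidecar access-log entries are dense, so it can help to split them into labelled fields while debugging. The sketch below assumes every entry has the same 21 positional fields as the samples in this article. The field labels are my own, inferred from those samples, and should be checked against the access-log format configured in your mesh.

```python
import shlex

# Labels are assumptions inferred from the sample entries in this article.
FIELDS = [
    "start_time", "request", "response_code", "response_flags",
    "metadata", "upstream_failure", "bytes_received", "bytes_sent",
    "duration_ms", "upstream_service_time", "x_forwarded_for", "user_agent",
    "request_id", "authority", "upstream_host", "upstream_cluster",
    "upstream_local_address", "downstream_local_address",
    "downstream_remote_address", "requested_server_name", "route_name",
]


def parse_access_log(line):
    # shlex keeps each double-quoted value together as a single token.
    tokens = shlex.split(line)
    return dict(zip(FIELDS, tokens))


sample = (
    '[2020-09-15T03:50:42.650Z] "GET /api/routes HTTP/1.1" 200 - "-" "-" '
    '0 87 2 1 "-" "Mozilla/5.0 (compatible; pycurl)" '
    '"389f6b9c-c966-96d7-8cc3-2a565f623ccd" '
    '"10.106.101.111:8001" "10.1.0.19:8001" '
    'outbound|8001||proxy-api.jupyterhub.svc.cluster.local '
    '10.1.0.18:52182 10.106.101.111:8001 10.1.0.18:36004 - default'
)

entry = parse_access_log(sample)
print(entry["response_code"], entry["authority"], entry["upstream_cluster"])
```

The response code, the response flags, the authority (Host header) and the upstream cluster are the fields that matter most in the investigation that follows.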

So far so good. It looks like we have everything we need. But navigating to the proxy-public service results in an unexpected error.

$ kubectl -n jupyterhub port-forward svc/proxy-public 8080:80
404 when accessing the hub in JupyterHub on the Istio service mesh

Port-forwarding directly to the proxy pod also fails with the same 404 error.

$ kubectl -n jupyterhub port-forward proxy-98fdbb5fd-265xq 8080:8000

Investigation

The 404 turns out to be a little tricky to debug. Everything checks out: the proxy container is able to connect to the hub container via the hub service, and the user request on the port-forward lands on the proxy container. Could this be an issue with Istio? Disabling the sidecar injection makes everything work again, as if by magic.

$ kubectl label namespace jupyterhub istio-injection=disabled --overwrite

Once the pods have been recreated without the sidecar, the port-forward lands at the login page.

Without istio-sidecar injection, the login page is served without issues

Digging deeper requires more networking expertise. Re-enabling sidecar injection and tailing the logs of the istio-proxy container (which is based on Envoy) in the proxy pod shows the event that records the failure.

$ kubectl -n jupyterhub logs proxy-98fdbb5fd-svnzl -c istio-proxy -f

[2020-09-15T04:37:03.884Z] "GET / HTTP/1.1" 404 NR "-" "-" 0 0 0 - "127.0.0.1" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/85.0.4183.102 Safari/537.36" "b0bd27fe-90c3-9bfd-8ae7-e716baa9eb6e" "localhost:8080" "-" - - 10.105.254.81:8081 127.0.0.1:0 - -

10.105.254.81:8081 is the destination (the Kubernetes service) where the hub is listening, but the istio-proxy logs on the hub pod contain no corresponding inbound event. The real hint here is the 404 NR. NR means “No Route”: Envoy finds no route for the request, so the target service is effectively unknown to it. Envoy therefore drops the outbound request and returns a 404 Not Found response. To understand what’s going on, let’s fire up a shell on the proxy, make some tailored requests, and reproduce the issue.

$ kubectl -n jupyterhub exec -it deploy/proxy -c chp -- sh
# curl hub:8081

The proxy sidecar logs show the event with a successful redirect, as expected (302 in this case).

[2020-09-15T04:51:18.816Z] "GET / HTTP/1.1" 302 - "-" "-" 0 0 2 1 "-" "curl/7.67.0" "923377f0-4cb1-9c35-8ba9-0c42f8e247da" "hub:8081" "10.1.0.37:8081" outbound|8081||hub.jupyterhub.svc.cluster.local 10.1.0.38:46732 10.105.254.81:8081 10.1.0.38:50312 - default

The hub sidecar logs also show the corresponding inbound event. The X-REQUEST-ID field can be used to track a request across services in an Istio service mesh.

[2020-09-15T04:51:18.816Z] "GET / HTTP/1.1" 302 - "-" "-" 0 0 1 1 "-" "curl/7.67.0" "923377f0-4cb1-9c35-8ba9-0c42f8e247da" "hub:8081" "127.0.0.1:8081" inbound|8081||hub.jupyterhub.svc.cluster.local 127.0.0.1:34838 10.1.0.37:8081 10.1.0.38:46732 outbound_.8081_._.hub.jupyterhub.svc.cluster.local default
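
Matching entries by request ID is easy to automate once the logs of both sidecars have been collected. The sketch below groups log lines by the request ID and reuses the two entries shown above. The token positions (12 for the request ID, 2 for the response code, 15 for the upstream cluster) are assumptions read off the sample entries, and the source labels "proxy" and "hub" are mine.

```python
import shlex
from collections import defaultdict

# Token positions are assumptions inferred from the sample entries.
RESPONSE_CODE, REQUEST_ID, UPSTREAM_CLUSTER = 2, 12, 15


def group_by_request_id(tagged_lines):
    """Group (source, log_line) pairs by the request ID in each line."""
    groups = defaultdict(list)
    for source, line in tagged_lines:
        tokens = shlex.split(line)
        groups[tokens[REQUEST_ID]].append(
            (source, tokens[RESPONSE_CODE], tokens[UPSTREAM_CLUSTER])
        )
    return dict(groups)


proxy_line = (
    '[2020-09-15T04:51:18.816Z] "GET / HTTP/1.1" 302 - "-" "-" 0 0 2 1 "-" '
    '"curl/7.67.0" "923377f0-4cb1-9c35-8ba9-0c42f8e247da" "hub:8081" '
    '"10.1.0.37:8081" outbound|8081||hub.jupyterhub.svc.cluster.local '
    '10.1.0.38:46732 10.105.254.81:8081 10.1.0.38:50312 - default'
)
hub_line = (
    '[2020-09-15T04:51:18.816Z] "GET / HTTP/1.1" 302 - "-" "-" 0 0 1 1 "-" '
    '"curl/7.67.0" "923377f0-4cb1-9c35-8ba9-0c42f8e247da" "hub:8081" '
    '"127.0.0.1:8081" inbound|8081||hub.jupyterhub.svc.cluster.local '
    '127.0.0.1:34838 10.1.0.37:8081 10.1.0.38:46732 '
    'outbound_.8081_._.hub.jupyterhub.svc.cluster.local default'
)

trace = group_by_request_id([("proxy", proxy_line), ("hub", hub_line)])
for request_id, hops in trace.items():
    print(request_id, hops)
```

A request ID that appears on the proxy side but never on the hub side is exactly the symptom seen with the failing browser request.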

Now let’s route the request through chp (configurable-http-proxy), the Node.js server that proxies the call. This results in a 404, the same error as the calls from the browser.

# curl localhost:8000 # the entry below is from proxy sidecar logs
[2020-09-15T04:54:19.216Z] "GET / HTTP/1.1" 404 NR "-" "-" 0 0 0 - "127.0.0.1" "curl/7.67.0" "0fcf1f52-a809-9855-bb58-50501a17f694" "localhost:8000" "-" - - 10.105.254.81:8081 127.0.0.1:0 - -

Setting the correct Host header on the request works as expected.

# curl localhost:8000 -H "Host: hub:8081" -H "X-REQUEST-ID: test"
[2020-09-15T05:01:35.988Z] "GET / HTTP/1.1" 302 - "-" "-" 0 0 2 2 "127.0.0.1" "curl/7.67.0" "test" "hub:8081" "10.1.0.37:8081" outbound|8081||hub.jupyterhub.svc.cluster.local 10.1.0.38:46732 10.105.254.81:8081 127.0.0.1:0 - default

So the issue with chp is a mismatched Host header: the outbound request still carries the original Host value (localhost:8000 in the entry above) rather than the hub's. Envoy finds no route for that host and drops the outbound request with an NR, so the proxy cannot reach the hub service when an external request is routed through it. (Some other tools and Node.js snippets used to narrow down the exact issue have been omitted for brevity.)
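
The behaviour can be summarised with a toy model of the sidecar's outbound lookup. This is not Envoy or Istio code. It assumes that the sidecar accepts the usual Kubernetes name variants for a service, each with and without the port, and it ignores other entries (such as the cluster IP) that a real configuration may contain.

```python
def known_hosts(service, namespace, port):
    """Host values a sidecar would accept for one service (simplified)."""
    names = [
        service,
        f"{service}.{namespace}",
        f"{service}.{namespace}.svc",
        f"{service}.{namespace}.svc.cluster.local",
    ]
    # Each name is accepted with and without the port suffix.
    return {h for name in names for h in (name, f"{name}:{port}")}


def route(host_header, hosts, cluster):
    # No matching host means no route: a 404 with the NR response flag.
    if host_header in hosts:
        return ("forward", cluster)
    return (404, "NR")


hub_hosts = known_hosts("hub", "jupyterhub", 8081)
cluster = "outbound|8081||hub.jupyterhub.svc.cluster.local"

print(route("localhost:8000", hub_hosts, cluster))  # what chp sent
print(route("hub:8081", hub_hosts, cluster))        # what the curl fix sent
```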

jupyterhub-istio-proxy to the rescue

While there are ways to hack the current proxy implementation(s), in some cases by falling back to a less secure variant, doing so is redundant because Istio (the underlying Envoy, to be precise) offers first-class support for network proxying. Moreover, the chp proxy becomes a bottleneck as soon as JupyterHub traffic grows: its technical limitations prevent it from being scaled beyond one pod. jupyterhub-istio-proxy can be used to configure Istio to do the actual network routing based on user interactions with JupyterHub. It also offers the horizontally scalable solution needed to run production workloads at scale.

Create an Istio gateway to handle ingress to the K8s cluster. The gateway is the entry point for network traffic.

$ kubectl -n jupyterhub apply -f https://gist.githubusercontent.com/harsimranmaan/4315477268fccea65accf8674f5c49ef/raw/0298f3f420365c7e56aedab7949ad39e00ffbcc3/jupyterhub-istio-proxy-gateway.yaml

Remove the proxy-public service as it is no longer needed.

$ kubectl -n jupyterhub delete svc proxy-public

Replace the proxy deployment with the jupyterhub-istio-proxy:

$ kubectl -n jupyterhub apply -f https://gist.githubusercontent.com/harsimranmaan/2e77cf65019439052122b7b89f926686/raw/d800b8c60c2ac10226d549c1fbc6d8d75e8e6142/jupyterhub-istio-proxy.yaml
Using splunk/jupyterhub-istio-proxy instead of default proxy

Once the above config is applied, a new virtual service appears.

$ kubectl -n jupyterhub get vs
jupyterhub-8a5edab282632443219e051e4ade2d1d5bbc671c781051bf1437897cbdfea0f1 [jupyterhub-gateway] [*] 37m

Everything should now work in theory (right?), but there are a few more issues to address. By default, the hub is configured to tell the proxy-api (jupyterhub-istio-proxy) to route traffic to its IP instead of its service name. This causes the Istio virtual service to be configured with the IP.

Hub log:
[I 2020-09-15 06:28:24.481 JupyterHub proxy:400] Adding default route for Hub: / => http://10.105.254.81:8081

resulting in an invalid VS config:

- destination:
    host: 10.105.254.81.jupyterhub.svc.cluster.local
    port:
      number: 8081
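
The bogus host looks like an IP address that was treated as a short service name and expanded with the namespace suffix. The sketch below reproduces that shape and shows a guard that would reject it. It is a hypothetical illustration only: the function names are mine and this is not the jupyterhub-istio-proxy source.

```python
import ipaddress


def is_ip(host):
    try:
        ipaddress.ip_address(host)
        return True
    except ValueError:
        return False


def expand_naive(host, namespace):
    # Treats every host as a short service name, which breaks for IPs.
    return f"{host}.{namespace}.svc.cluster.local"


def expand_checked(host, namespace):
    # Refuse to build a service FQDN out of an IP address.
    if is_ip(host):
        raise ValueError(f"{host} is an IP address, not a service name")
    return expand_naive(host, namespace)


print(expand_naive("10.105.254.81", "jupyterhub"))  # the invalid host above
print(expand_checked("hub", "jupyterhub"))          # the intended host
```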

Patch the JupyterHub config to set the JupyterHub.hub_connect_ip property to the service name instead of the IP. The PROXY_PUBLIC_SERVICE_HOST and PROXY_PUBLIC_SERVICE_PORT environment variables are no longer in use, so the references to them can be replaced with the external hostname and port (localhost:80 in this setup).

$ kubectl -n jupyterhub get cm/hub-config -o yaml \
| sed "s/os\.environ\['HUB_SERVICE_HOST'\]/'hub'/g" \
| sed "s/os\.environ\['PROXY_PUBLIC_SERVICE_HOST'\]/'localhost'/g" \
| sed "s/os\.environ\['PROXY_PUBLIC_SERVICE_PORT'\]/'80'/g" \
| kubectl -n jupyterhub apply -f -
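
To make the effect of the three substitutions concrete, here is the same rewrite expressed as plain string replacement. The input line is an illustrative stand-in, not a copy of the real ConfigMap content, so treat it as an assumption about what a matching line looks like.

```python
# The three environment lookups replaced by the sed pipeline above.
REPLACEMENTS = {
    "os.environ['HUB_SERVICE_HOST']": "'hub'",
    "os.environ['PROXY_PUBLIC_SERVICE_HOST']": "'localhost'",
    "os.environ['PROXY_PUBLIC_SERVICE_PORT']": "'80'",
}


def patch_config(text):
    # Swap each environment lookup for a literal value.
    for old, new in REPLACEMENTS.items():
        text = text.replace(old, new)
    return text


# Illustrative line only; the real ConfigMap is not reproduced here.
before = "c.JupyterHub.hub_connect_ip = os.environ['HUB_SERVICE_HOST']"
after = patch_config(before)
print(after)  # c.JupyterHub.hub_connect_ip = 'hub'
```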

Restart the hub pod to pick up the new config. The service name is set for the default route.

[I 2020-09-15 07:27:09.068 JupyterHub proxy:400] Adding default route for Hub: / => http://hub:8081

- destination:
    host: hub.jupyterhub.svc.cluster.local
    port:
      number: 8081

Follow the official guide to determine the Istio gateway URL. Navigate to http://YOUR_GATEWAY_URL and you’ll see JupyterHub running. TLS termination for web requests is not covered here but is fairly straightforward to set up with an Istio gateway. It is left as an exercise for the reader.

The last missing piece of the puzzle is to ensure that user notebook servers can be spun up and users can run their favourite notebooks.
This requires patching another JupyterHub component (kubespawner). The details can be found in this PR: https://github.com/jupyterhub/kubespawner/pull/425

Jupyter notebook server spawn after patching the kubespawner

Under the hood

jupyterhub-istio-proxy creates an Istio virtual service for every route request from the hub. The hub forwards routing requests to jupyterhub-istio-proxy, which sets up the desired destination route and waits for the route to warm up before sending the confirmation back to the hub. Once the route is created, the hub redirects the user to their notebook server.

Routing with jupyterhub-istio-proxy
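
To give a feel for what such a virtual service contains, the sketch below builds one as a plain dictionary for a given route prefix and destination. It is not the jupyterhub-istio-proxy implementation. The gateway name and the "*" host are taken from the kubectl output earlier, while the function name and the naming scheme (a SHA-256 hash of the prefix) are assumptions made for illustration.

```python
import hashlib
import json


def virtual_service(prefix, dest_host, dest_port,
                    gateway="jupyterhub-gateway", namespace="jupyterhub"):
    # Assumption: derive a stable resource name by hashing the route prefix.
    name = "jupyterhub-" + hashlib.sha256(prefix.encode()).hexdigest()
    return {
        "apiVersion": "networking.istio.io/v1beta1",
        "kind": "VirtualService",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "gateways": [gateway],
            "hosts": ["*"],
            "http": [{
                # Requests under this URL prefix go to the destination below.
                "match": [{"uri": {"prefix": prefix}}],
                "route": [{"destination": {"host": dest_host,
                                           "port": {"number": dest_port}}}],
            }],
        },
    }


vs = virtual_service("/", "hub.jupyterhub.svc.cluster.local", 8081)
print(json.dumps(vs, indent=2))
```

Since kubectl accepts JSON as well as YAML, the printed document could be piped to kubectl apply -f - for experimentation in a test namespace.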

If you have questions or would like to contribute to the development of jupyterhub-istio-proxy, drop a note or send in your contributions at https://github.com/splunk/jupyterhub-istio-proxy/issues
