Guides

Configuration, kernel modes, storage isolation, OAuth — the day-2 topics.

Configuration

Three layers, in order of precedence:

  1. Helm values — chart-level config (chart/values.yaml) for cluster topology, replica counts, resource limits, persistence.
  2. Environment variables — runtime config consumed by the backend container.
  3. Per-user kernel options — set inside the notebook UI (Spark packages, Iceberg warehouse).

Kernel modes

SparkLabX runs the same backend in three topologies. Pick via KERNEL_MODE:

Mode             Topology                                             Use case
shared           Single Jupyter Kernel Gateway shared by all users    Demo, single-user laptop
docker_per_user  One Docker container per user, spawned dynamically   Multi-user on a single VM
k8s_per_user     One Kubernetes Pod per user                          Production multi-tenant
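
As a rough illustration of how a backend might read this setting, the sketch below validates KERNEL_MODE against the three mode names listed above. The function name resolve_kernel_mode is hypothetical and not taken from the SparkLabX codebase. The source does not state a default mode, so the sketch fails on a missing value instead of guessing one.

```python
import os

# The three mode names come from the table above; the rest is illustrative.
KERNEL_MODES = ("shared", "docker_per_user", "k8s_per_user")


def resolve_kernel_mode(env=None):
    """Return the validated KERNEL_MODE value from an environment mapping."""
    env = os.environ if env is None else env
    raw = env.get("KERNEL_MODE", "").strip()
    if raw not in KERNEL_MODES:
        # Assumption: an unset or unknown mode is an error, not a silent default.
        raise ValueError(
            f"KERNEL_MODE must be one of {', '.join(KERNEL_MODES)}; got {raw!r}"
        )
    return raw


if __name__ == "__main__":
    print(resolve_kernel_mode({"KERNEL_MODE": "k8s_per_user"}))
```

Failing fast at startup keeps a typo such as k8s-per-user from silently selecting a different topology.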

Storage & isolation

Single MinIO bucket (workspace) with per-user prefix isolation enforced by MinIO IAM policies — not app-layer checks.

  • Private prefix at users/<username>/ — read/write
  • Shared public/ prefix — read for everyone, write for admins
  • S3-compatible credentials scoped to the user’s own prefix, valid only inside that user’s kernel pod
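
To make the prefix rules concrete, here is a minimal sketch of a policy document that would express them in the AWS-style JSON policy format that MinIO accepts. It is not the policy SparkLabX actually ships. The function name user_policy, the is_admin flag, and the choice to count s3:DeleteObject as part of "write" are all assumptions made for illustration.

```python
import json

BUCKET = "workspace"  # bucket name from the text above


def user_policy(username, is_admin=False):
    """Build an S3-style policy limiting a user to users/<username>/ and public/."""
    own_prefix = f"users/{username}/*"
    bucket_arn = f"arn:aws:s3:::{BUCKET}"

    # Everyone may read public/; only admins may also write there.
    public_actions = ["s3:GetObject"]
    if is_admin:
        public_actions += ["s3:PutObject", "s3:DeleteObject"]

    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Listing is restricted to the user's own prefix and public/.
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": [bucket_arn],
                "Condition": {
                    "StringLike": {"s3:prefix": [own_prefix, "public/*"]}
                },
            },
            {
                # Read/write on the private prefix (delete is an assumption).
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"],
                "Resource": [f"{bucket_arn}/{own_prefix}"],
            },
            {
                "Effect": "Allow",
                "Action": public_actions,
                "Resource": [f"{bucket_arn}/public/*"],
            },
        ],
    }


if __name__ == "__main__":
    print(json.dumps(user_policy("alice"), indent=2))
```

Because the rules live in the storage layer, a request for another user's prefix is refused by MinIO itself even if application code forgets to check.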

From Scala/PySpark: spark.read.parquet("s3a://workspace/users/alice/orders/") works because the kernel pod’s STS credentials are pre-injected.
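
A small helper can keep notebook code from hard-coding the prefix layout. The sketch below only builds the s3a:// path. The name user_path is hypothetical, and the Spark call appears as a comment because it needs a running kernel session.

```python
BUCKET = "workspace"  # bucket name from the text above


def user_path(username, *parts):
    """Return the s3a:// URI for a location under a user's private prefix."""
    # Drop empty segments and stray slashes so the result has single separators.
    segments = [p.strip("/") for p in parts if p.strip("/")]
    suffix = "".join(segment + "/" for segment in segments)
    return f"s3a://{BUCKET}/users/{username}/{suffix}"


if __name__ == "__main__":
    print(user_path("alice", "orders"))
    # Inside a notebook, assuming `spark` is the kernel's SparkSession:
    #   df = spark.read.parquet(user_path("alice", "orders"))
```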

OAuth login

Optional — leave the client IDs blank in .env to use username/password only. Google and Microsoft are both supported out of the box.

GOOGLE_CLIENT_ID=your-client-id.apps.googleusercontent.com
GOOGLE_CLIENT_SECRET=...
MICROSOFT_CLIENT_ID=...
MICROSOFT_TENANT_ID=...
ALLOWED_EMAIL_DOMAINS=yourcompany.com,partner.com

Frontend OAuth client IDs are baked in at build time via VITE_GOOGLE_CLIENT_ID, VITE_MICROSOFT_CLIENT_ID, and VITE_MICROSOFT_TENANT_ID, so changing them requires rebuilding the frontend.
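
The backend variables above suggest two checks: which providers are enabled, and whether a signed-in email belongs to an allowed domain. The sketch below shows one way such checks could look. The function names are hypothetical and this is not the SparkLabX implementation. Two behaviours are assumptions: an empty ALLOWED_EMAIL_DOMAINS means no restriction, and domains must match exactly, so subdomains are not accepted.

```python
def enabled_providers(env):
    """Return the OAuth providers whose client ID is set (blank means disabled)."""
    providers = []
    if env.get("GOOGLE_CLIENT_ID", "").strip():
        providers.append("google")
    if env.get("MICROSOFT_CLIENT_ID", "").strip():
        providers.append("microsoft")
    return providers


def allowed_domains(env):
    """Parse the comma-separated ALLOWED_EMAIL_DOMAINS value into a set."""
    raw = env.get("ALLOWED_EMAIL_DOMAINS", "")
    return {d.strip().lower() for d in raw.split(",") if d.strip()}


def email_allowed(email, env):
    """Check an email address against the allowed domains."""
    domains = allowed_domains(env)
    if not domains:
        return True  # assumption: an empty list means no domain restriction
    local, sep, domain = email.rpartition("@")
    # Reject addresses without a local part or an "@" separator.
    return bool(local and sep) and domain.lower() in domains


if __name__ == "__main__":
    env = {
        "GOOGLE_CLIENT_ID": "your-client-id.apps.googleusercontent.com",
        "MICROSOFT_CLIENT_ID": "",
        "ALLOWED_EMAIL_DOMAINS": "yourcompany.com,partner.com",
    }
    print(enabled_providers(env))
    print(email_allowed("alice@yourcompany.com", env))
```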