Guides
Configuration, kernel modes, storage isolation, OAuth — the day-2 topics.
Configuration
Three layers, in order of precedence:
- Helm values: chart-level config (chart/values.yaml) for cluster topology, replica counts, resource limits, persistence.
- Environment variables: runtime config consumed by the backend container.
- Per-user kernel options: set inside the notebook UI (Spark packages, Iceberg warehouse).
Kernel modes
SparkLabX runs the same backend in three topologies. Pick via KERNEL_MODE:
| Mode | Topology | Use case |
|---|---|---|
| shared | Single Jupyter Kernel Gateway, all users share it | Demo, single-user laptop |
| docker_per_user | One Docker container per user, dynamically spawned | Multi-user on a single VM |
| k8s_per_user ⭐ | One Kubernetes Pod per user | Production multi-tenant |
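As an illustration of how a deployment script might guard against a mistyped mode, here is a minimal sketch that validates KERNEL_MODE against the three values in the table. The function name `resolve_kernel_mode` and the fallback to `shared` are assumptions for this example; the source does not say whether SparkLabX defines a default.

```python
import os

# Mode names are taken from the table above. The helper itself and its
# default value are hypothetical, not part of SparkLabX.
VALID_KERNEL_MODES = ("shared", "docker_per_user", "k8s_per_user")


def resolve_kernel_mode(env=None, default="shared"):
    """Return a validated kernel mode read from an environment mapping."""
    env = os.environ if env is None else env
    mode = env.get("KERNEL_MODE", default).strip().lower()
    if mode not in VALID_KERNEL_MODES:
        raise ValueError(
            f"KERNEL_MODE={mode!r} is not one of: {', '.join(VALID_KERNEL_MODES)}"
        )
    return mode
```

Failing fast on an unknown value is a design choice for the sketch: it surfaces a typo at startup rather than silently running in an unintended topology.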
Storage & isolation
Single MinIO bucket (workspace) with per-user prefix isolation enforced by MinIO IAM policies — not app-layer checks.
- Private prefix at users/<username>/: read/write
- Shared public/ prefix: read for everyone, write for admins
- S3-compatible credentials scoped to the user's prefix, valid only inside that user's kernel pod
From Scala/PySpark: spark.read.parquet("s3a://workspace/users/alice/orders/") works because the kernel pod’s STS credentials are pre-injected.
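To make the prefix isolation concrete, the following sketch builds an S3-style policy document for a non-admin user, matching the layout described above. It is not the policy SparkLabX ships: the function name `user_policy` is hypothetical, and mapping "read/write" to `s3:GetObject` plus `s3:PutObject` (without delete) is an assumption.

```python
import json

BUCKET = "workspace"  # bucket name from the text above


def user_policy(username: str) -> dict:
    """Build an illustrative per-user policy document (hypothetical helper)."""
    bucket_arn = f"arn:aws:s3:::{BUCKET}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Listing is limited to the user's own prefix and public/.
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": [bucket_arn],
                "Condition": {
                    "StringLike": {
                        "s3:prefix": [f"users/{username}/*", "public/*"]
                    }
                },
            },
            {
                # Read/write on the private prefix (delete omitted: assumption).
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:PutObject"],
                "Resource": [f"{bucket_arn}/users/{username}/*"],
            },
            {
                # Read-only on the shared prefix for non-admin users.
                "Effect": "Allow",
                "Action": ["s3:GetObject"],
                "Resource": [f"{bucket_arn}/public/*"],
            },
        ],
    }


if __name__ == "__main__":
    print(json.dumps(user_policy("alice"), indent=2))
```

Under a policy of this shape, a read of s3a://workspace/users/alice/orders/ by alice falls inside the second statement, while a read of another user's prefix matches no statement and is denied by default.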
OAuth login
Optional — leave the client IDs blank in .env to use username/password only. Google and Microsoft are both supported out of the box.
GOOGLE_CLIENT_ID=your-client-id.apps.googleusercontent.com
GOOGLE_CLIENT_SECRET=...
MICROSOFT_CLIENT_ID=...
MICROSOFT_TENANT_ID=...
ALLOWED_EMAIL_DOMAINS=yourcompany.com,partner.com
Frontend OAuth client IDs are baked at build time via VITE_GOOGLE_CLIENT_ID and VITE_MICROSOFT_CLIENT_ID/TENANT_ID.
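The settings above suggest two checks a backend would need: which providers are enabled, and whether a signed-in email belongs to an allowed domain. The sketch below shows one way to express both. All function names are hypothetical, and two behaviors are assumptions rather than documented SparkLabX behavior: an empty ALLOWED_EMAIL_DOMAINS means no restriction, and subdomains are not matched.

```python
def enabled_oauth_providers(env: dict) -> list[str]:
    """List providers whose client ID is set; blank means disabled."""
    providers = []
    if env.get("GOOGLE_CLIENT_ID", "").strip():
        providers.append("google")
    if env.get("MICROSOFT_CLIENT_ID", "").strip():
        providers.append("microsoft")
    return providers


def parse_allowed_domains(raw: str) -> set[str]:
    """Split a comma-separated domain list, ignoring blanks and case."""
    return {d.strip().lower() for d in raw.split(",") if d.strip()}


def is_email_allowed(email: str, allowed: set[str]) -> bool:
    """Check an email's domain against the allow-list (exact match only)."""
    if not allowed:
        return True  # assumption: empty list imposes no restriction
    local, sep, domain = email.rpartition("@")
    if not sep or not local or not domain:
        return False  # malformed address
    return domain.lower() in allowed
```

For example, with ALLOWED_EMAIL_DOMAINS=yourcompany.com,partner.com, an address at yourcompany.com passes while one at sub.partner.com does not, because the sketch compares the full domain rather than a suffix.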