Guides

Configuration, kernel modes, storage isolation, OAuth — the day-2 topics.

Configuration

Three layers, in order of precedence:

Helm values — chart-level config (chart/values.yaml) for cluster topology, replica counts, resource limits, persistence.
Environment variables — runtime config consumed by the backend container.
Per-user kernel options — set inside the notebook UI (Spark packages, Iceberg warehouse).

Kernel modes

SparkLabX runs the same backend in three topologies. Pick via KERNEL_MODE:

Mode	Topology	Use case
`shared`	Single Jupyter Kernel Gateway, all users share it	Demo, single-user laptop
`docker_per_user`	One Docker container per user, dynamically spawned	Multi-user on a single VM
`k8s_per_user` ⭐	One Kubernetes Pod per user	Production multi-tenant
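
Because the mode is a single string, a typo in KERNEL_MODE is easy to miss. The following is a minimal sketch of validating the value at startup, not the backend's actual code. It assumes KERNEL_MODE is read from the process environment; the names KernelMode and resolve_kernel_mode are hypothetical. No default is applied, because the text above does not state one.

```python
import os
from enum import Enum


class KernelMode(str, Enum):
    """The three topologies listed in the table above."""
    SHARED = "shared"
    DOCKER_PER_USER = "docker_per_user"
    K8S_PER_USER = "k8s_per_user"


def resolve_kernel_mode(env=None) -> KernelMode:
    """Read KERNEL_MODE from a mapping (os.environ by default) and validate it."""
    env = os.environ if env is None else env
    raw = env.get("KERNEL_MODE", "").strip()
    try:
        return KernelMode(raw)
    except ValueError:
        valid = ", ".join(m.value for m in KernelMode)
        # Fail fast with the accepted values instead of silently falling back.
        raise ValueError(
            f"KERNEL_MODE must be one of: {valid} (got {raw!r})"
        ) from None
```

Passing a plain dict instead of os.environ, for example resolve_kernel_mode({"KERNEL_MODE": "shared"}), keeps the check easy to test.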

Storage & isolation

All users share a single MinIO bucket (workspace). Per-user prefix isolation is enforced by MinIO IAM policies, not by app-layer checks. Each user gets:

A private prefix at users/<username>/ with read/write access
The shared public/ prefix, readable by everyone and writable only by admins
S3-compatible credentials scoped to their prefix, valid only inside their kernel pod

From Scala/PySpark: spark.read.parquet("s3a://workspace/users/alice/orders/") works because the kernel pod’s STS credentials are pre-injected.
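
To make the prefix rules concrete, here is a sketch of what a per-user policy document could look like, built as a Python dict in the AWS-style IAM JSON format that MinIO accepts. This is an illustration, not the policy SparkLabX actually ships. The function name build_user_policy, the mapping of "read/write" to s3:GetObject and s3:PutObject, and the s3:ListBucket statement are all assumptions.

```python
import json

BUCKET = "workspace"  # bucket name taken from the text above


def build_user_policy(username: str) -> dict:
    """Return an IAM-style policy document for one user (illustrative only)."""
    # Reject characters that would let a username escape or widen its prefix.
    if not username or any(c in username for c in "/*?"):
        raise ValueError(f"unsafe username for a prefix: {username!r}")
    private = f"users/{username}/"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {   # read/write inside the user's private prefix
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:PutObject"],
                "Resource": [f"arn:aws:s3:::{BUCKET}/{private}*"],
            },
            {   # read-only access to the shared prefix
                "Effect": "Allow",
                "Action": ["s3:GetObject"],
                "Resource": [f"arn:aws:s3:::{BUCKET}/public/*"],
            },
            {   # assumption: listing is limited to the two visible prefixes
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": [f"arn:aws:s3:::{BUCKET}"],
                "Condition": {
                    "StringLike": {"s3:prefix": [f"{private}*", "public/*"]}
                },
            },
        ],
    }


if __name__ == "__main__":
    print(json.dumps(build_user_policy("alice"), indent=2))
```

Under a policy of this shape, a write to users/bob/ with alice's credentials would be rejected by the storage layer itself, which is the point of enforcing isolation outside the application.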

OAuth login

OAuth is optional: leave the client IDs blank in .env to use username/password login only. Google and Microsoft are both supported out of the box.

GOOGLE_CLIENT_ID=your-client-id.apps.googleusercontent.com
GOOGLE_CLIENT_SECRET=...
MICROSOFT_CLIENT_ID=...
MICROSOFT_TENANT_ID=...
ALLOWED_EMAIL_DOMAINS=yourcompany.com,partner.com
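
The variable name ALLOWED_EMAIL_DOMAINS suggests a comma-separated allowlist applied to the email address returned by the OAuth provider. The sketch below shows one way such a check could work. It is not the backend's implementation; the function names are hypothetical, and treating an empty list as "no restriction" is an assumption.

```python
import os


def parse_allowed_domains(raw: str) -> frozenset[str]:
    """Split a comma-separated value into a normalized set of domains."""
    return frozenset(d.strip().lower() for d in raw.split(",") if d.strip())


def is_email_allowed(email: str, allowed: frozenset[str]) -> bool:
    """Return True if the email's domain is on the allowlist."""
    if not allowed:
        return True  # assumption: an empty allowlist means no restriction
    local, sep, domain = email.rpartition("@")
    # Require both a local part and an "@" before comparing the domain.
    return bool(sep and local) and domain.lower() in allowed


if __name__ == "__main__":
    allowed = parse_allowed_domains(os.environ.get("ALLOWED_EMAIL_DOMAINS", ""))
    print(is_email_allowed("alice@yourcompany.com", allowed))
```

Matching is exact on the domain, so with this sketch a subdomain address such as dev.yourcompany.com would need its own entry.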

Frontend OAuth client IDs are baked in at build time via VITE_GOOGLE_CLIENT_ID and VITE_MICROSOFT_CLIENT_ID/TENANT_ID, so changing them requires rebuilding the frontend.
