Operations
The Operations section is the day-2 runbook for platform operators running Datalayer on Kubernetes.
It maps directly to the operation domains in this folder:
- Management: manage Jupyter CRDs (pools, environments, contents, users).
- Observability: OTEL-first telemetry operations, health checks, and troubleshooting.
- Scaling: adjust node pools and runtime capacity for traffic and cost targets.
- Availability: keep runtime access resilient and monitor platform status.
- Continuity: backup and disaster recovery procedures.
- Upgrades: staged rollout strategy for controlled version changes.
- Security: platform security posture and trust-center guidance.
Recommended Operations Sequence
For a new production environment, use this order:
1. Start with Management to validate CRDs and runtime control objects.
2. Enable Observability and baseline service health.
3. Tune Scaling for expected workloads.
4. Review Availability behavior for runtime continuity.
5. Implement Continuity backups and recovery drills.
6. Execute Upgrades as staged rollouts.
7. Apply Security controls and governance checks.
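The sequence above can be tracked as an ordered checklist. The following is a minimal sketch, not part of Datalayer tooling: the names `OPERATIONS_SEQUENCE` and `next_domain` are hypothetical, and the only thing taken from this page is the order of the domains.

```python
from typing import Optional

# Domain order taken from the recommended sequence on this page.
OPERATIONS_SEQUENCE = [
    "Management",
    "Observability",
    "Scaling",
    "Availability",
    "Continuity",
    "Upgrades",
    "Security",
]


def next_domain(completed: set[str]) -> Optional[str]:
    """Return the first domain in the sequence that is not yet completed.

    Hypothetical helper: the earliest unfinished domain is always returned,
    even if a later domain has already been marked as done.
    """
    unknown = completed - set(OPERATIONS_SEQUENCE)
    if unknown:
        raise ValueError(f"unknown domains: {sorted(unknown)}")
    for domain in OPERATIONS_SEQUENCE:
        if domain not in completed:
            return domain
    return None  # every domain has been worked through


if __name__ == "__main__":
    # With Management and Observability done, Scaling is next.
    print(next_domain({"Management", "Observability"}))
```

Returning the earliest gap rather than the item after the latest completed one keeps the checklist honest when domains are worked out of order, for example if Observability was enabled before Management was validated.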
The availability of online Datalayer services can be monitored on the public Status page.
If you observe slowdowns, incidents, or unexpected behavior, contact support.
- Management (4 items)
- Observability: Datalayer observability operations are centered on the OTEL service.
- Scaling: The platform operator provisions the Kubernetes cluster with the Jupyter Contents, Environments and Pools.
- Availability: To create new Runtimes, the Datalayer services must be up and running.
- Continuity: Disaster Recovery.
- Upgrades: Plan upgrades as controlled, staged rollouts.
- Security: The technical components we use, and the way we configure them, are secure by default.