
Operations

The Operations section is the day-2 runbook for platform operators running Datalayer on Kubernetes.

It is organized into the following operation domains:

  1. Management: manage Jupyter CRDs (pools, environments, contents, users).
  2. Observability: OpenTelemetry (OTEL)-first telemetry operations, health checks, and troubleshooting.
  3. Scaling: adjust node pools and runtime capacity for traffic and cost targets.
  4. Availability: keep runtime access resilient and monitor platform status.
  5. Continuity: backup and disaster recovery procedures.
  6. Upgrades: staged rollout strategy for controlled version changes.
  7. Security: platform security posture and trust-center guidance.

For a new production environment, use this order:

  1. Start with Management to validate CRDs and runtime control objects.
  2. Enable Observability and establish a baseline for service health.
  3. Tune Scaling for expected workloads.
  4. Review Availability behavior for runtime continuity.
  5. Implement Continuity backups and recovery drills.
  6. Execute Upgrades as staged rollouts.
  7. Apply Security controls and governance checks.
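One way to track progress through the recommended order is a small checklist helper. The sketch below is hypothetical: `BRING_UP_ORDER` and `next_domain` are illustrative names, not part of Datalayer. It also assumes that each domain should be finished before the next one starts, which is a stricter reading of the recommended order than the list itself states.

```python
from typing import Iterable, Optional

# Recommended bring-up order for a new production environment.
# Hypothetical helper, not a Datalayer API.
BRING_UP_ORDER = (
    "Management",
    "Observability",
    "Scaling",
    "Availability",
    "Continuity",
    "Upgrades",
    "Security",
)


def next_domain(completed: Iterable[str]) -> Optional[str]:
    """Return the first domain not yet completed, or None when all are done.

    Raises ValueError for unknown domain names, or when a later domain was
    marked complete before an earlier one (assumes strict ordering).
    """
    done = set(completed)
    unknown = done - set(BRING_UP_ORDER)
    if unknown:
        raise ValueError(f"unknown domains: {sorted(unknown)}")
    for i, domain in enumerate(BRING_UP_ORDER):
        if domain not in done:
            # Any completed domain after this one means a step was skipped.
            skipped_ahead = [d for d in BRING_UP_ORDER[i + 1:] if d in done]
            if skipped_ahead:
                raise ValueError(f"{domain} must be completed before {skipped_ahead}")
            return domain
    return None
```

For example, `next_domain(["Management", "Observability"])` returns `"Scaling"`, and marking `"Scaling"` complete while `"Observability"` is still pending raises a `ValueError`.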

The availability of online Datalayer services can be monitored on the public Status page.

If you observe slowdowns, incidents, or unexpected behavior, contact support.