Observability
Datalayer observability operations center on the OTEL service.
The OTEL stack is the current, recommended path for collecting and querying traces, metrics, and logs in Datalayer.
Warning: The legacy Observer stack is deprecated. Prefer OTEL for new deployments and for day-2 operations.
OTEL Operations Workflow
- Deploy or upgrade datalayer-otel.
- Confirm collector and API service health.
- Validate telemetry ingestion (traces, metrics, logs).
- Query telemetry data via OTEL API/CLI.
Deploy OTEL
plane up datalayer-otel
If you use Terraform-generated scripts:
cd terraform
./generated/services/deploy-datalayer-otel.sh
Health Checks
kubectl get pods -n datalayer-otel
kubectl get svc -n datalayer-otel
kubectl logs -n datalayer-otel deploy/datalayer-otel-fastapi --tail=200
kubectl logs -n datalayer-otel deploy/datalayer-otel-otel-collector --tail=200
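The commands above give a point-in-time view. For scripted checks, a minimal sketch that waits for both deployments to finish rolling out might look like the following. It assumes the deployment names used in the log commands above; `wait_for_otel` and the `KUBECTL` override variable are hypothetical names introduced here for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of the Datalayer tooling.
# KUBECTL can be overridden (for example, for a dry run); defaults to kubectl.
KUBECTL="${KUBECTL:-kubectl}"

# Block until both OTEL deployments report a completed rollout.
# Optional first argument: per-deployment timeout (default 120s).
wait_for_otel() {
  local timeout="${1:-120s}"
  local deploy
  for deploy in datalayer-otel-fastapi datalayer-otel-otel-collector; do
    # rollout status exits non-zero if the rollout does not complete in time
    $KUBECTL rollout status "deploy/${deploy}" -n datalayer-otel --timeout="$timeout" || return 1
  done
}
```

Calling `wait_for_otel 300s` after a deploy or upgrade returns non-zero as soon as either rollout fails to complete, which makes it usable as a gate in a pipeline.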
Validate Data Flow
Confirm OTEL topics and ingestion dependencies are healthy:
kubectl get pods -n datalayer-pulsar
kubectl get pods -n datalayer-otel
Then run end-to-end validation via CLI (from an environment with credentials and API access):
datalayer-otel smoke-test
Query and Troubleshoot
Use the OTEL API docs and CLI for day-2 operations:
- Swagger: https://<RUN_HOST>/api/otel/v1/docs
- ReDoc: https://<RUN_HOST>/api/otel/v1/redoc
- CLI queries: datalayer-otel traces, datalayer-otel metrics, datalayer-otel logs
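Before relying on the API docs, it can help to confirm that both pages are reachable. The sketch below builds the two URLs listed above from a host name and prints the HTTP status of each. The function names are hypothetical, and the host you pass in stands in for <RUN_HOST>.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the OTEL API docs URLs for a given host.
otel_docs_urls() {
  local host="$1"
  printf 'https://%s/api/otel/v1/docs\n' "$host"
  printf 'https://%s/api/otel/v1/redoc\n' "$host"
}

# Hypothetical helper: print each docs URL followed by its HTTP status code.
check_otel_docs() {
  local url
  for url in $(otel_docs_urls "$1"); do
    # -o /dev/null discards the body; -w prints only the status code
    printf '%s %s\n' "$url" "$(curl -sS -o /dev/null -w '%{http_code}' "$url")"
  done
}
```

Whether the docs pages require authentication depends on your deployment, so a 401 or 403 here may still indicate that the API service is up.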
If telemetry is missing:
- Verify OTEL_EXPORTER_OTLP_* environment variables in instrumented services.
- Verify collector endpoint connectivity on :4317/:4318.
- Check Pulsar topic flow and consumer lag.
- Check OTEL FastAPI logs for ingestion/query errors.
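The first two checks in this list can be scripted from inside an instrumented service's environment. The following is a minimal sketch, not an official tool: the function names are hypothetical, and it assumes an `nc` build that supports `-z` and `-w`. The collector host you pass in depends on your deployment.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list the OTLP exporter variables visible in this shell.
# Returns non-zero when none are set.
list_otlp_env() {
  local vars
  vars="$(env | grep '^OTEL_EXPORTER_OTLP_' | sort || true)"
  [ -n "$vars" ] || return 1
  printf '%s\n' "$vars"
}

# Hypothetical helper: probe a collector port (4317 or 4318) over TCP.
# Assumes nc supports -z (probe without sending data) and -w (timeout, seconds).
check_collector_port() {
  local host="$1" port="$2"
  nc -z -w 3 "$host" "$port"
}
```

For example, `list_otlp_env || echo "no OTLP exporter variables set"` flags an uninstrumented environment, and `check_collector_port <collector-host> 4317` reports connectivity through its exit status.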
Migration from Observer
If you still run datalayer-observer, migrate operational dashboards and checks to OTEL.
Use OTEL deployment and query paths as the operational baseline:
- Service documentation: OTEL service
- Deployments overview: Deployments