Solr

Apache Solr Operator

To deploy Solr, you first need to deploy the Solr Operator.

helm repo add apache-solr https://solr.apache.org/charts
helm repo update
plane up datalayer-solr-operator

Check the availability of the Solr CRDs.

kubectl explain solrcloud.spec.zookeeperRef.provided.config
kubectl explain solrcloud.spec.zookeeperRef.provided.persistence
kubectl explain solrcloud.spec.zookeeperRef.provided.persistence.spec
plane ls

Check the availability of the Solr Operator Pods.

kubectl get pods -n datalayer-solr-operator -l control-plane=solr-operator

Datalayer Solr Cluster

Prepare an AWS S3 bucket for the Solr backups (the DATALAYER_SOLR_BACKUP_S3_BUCKET_NAME environment variable should contain its name) and create a secret with write access to that bucket.
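If the bucket does not exist yet, it can be created with the AWS CLI. The bucket name and region below are illustrative values, not the real ones:

```shell
# Illustrative values -- replace with your actual bucket name and region.
export DATALAYER_SOLR_BACKUP_S3_BUCKET_NAME="datalayer-solr-backups"
export DATALAYER_SOLR_BACKUP_S3_BUCKET_REGION="us-east-1"
# aws s3api create-bucket \
#   --bucket "$DATALAYER_SOLR_BACKUP_S3_BUCKET_NAME" \
#   --region "$DATALAYER_SOLR_BACKUP_S3_BUCKET_REGION"
```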

kubectl create secret generic aws-creds \
  --from-literal=access-key-id=$AWS_ACCESS_KEY_ID \
  --from-literal=secret-access-key=$AWS_SECRET_ACCESS_KEY \
  --namespace=datalayer-solr
kubectl describe secret aws-creds -n datalayer-solr

Create a secret for the Solr authentication.

cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Secret
metadata:
  name: solr-basic-auth
  namespace: datalayer-solr
type: kubernetes.io/basic-auth
stringData:
  username: ${DATALAYER_SOLR_USERNAME}
  password: ${DATALAYER_SOLR_PASSWORD}
EOF
kubectl describe secret solr-basic-auth -n datalayer-solr
# Secret for the datalayer-api namespace.
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Secret
metadata:
  name: solr-basic-auth
  namespace: datalayer-api
type: kubernetes.io/basic-auth
stringData:
  username: ${DATALAYER_SOLR_USERNAME}
  password: ${DATALAYER_SOLR_PASSWORD}
EOF
kubectl describe secret solr-basic-auth -n datalayer-api

You are now ready to create a Solr cluster. Ensure the DATALAYER_SOLR_BACKUP_S3_BUCKET_NAME and DATALAYER_SOLR_BACKUP_S3_BUCKET_REGION environment variables are set to the name and region of the S3 bucket used for the backups.

You can also configure the size of the cluster; the following spec defines a 3-node cluster.

kubectl apply -f $PLANE_HOME/etc/specs/solr/datalayer.yaml
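The datalayer.yaml spec is not reproduced here; a minimal sketch of what such a SolrCloud resource typically contains is shown below (field values are illustrative — the actual spec in $PLANE_HOME/etc/specs/solr/datalayer.yaml is authoritative). Note how the s3 backup repository and the basic-auth secret created earlier are wired in:

```yaml
apiVersion: solr.apache.org/v1beta1
kind: SolrCloud
metadata:
  name: solr-datalayer
  namespace: datalayer-solr
spec:
  replicas: 3
  solrImage:
    tag: "9.0.0"
  zookeeperRef:
    provided:
      replicas: 3
  solrSecurity:
    authenticationType: Basic
    basicAuthSecret: solr-basic-auth
  backupRepositories:
    - name: s3
      s3:
        region: ${DATALAYER_SOLR_BACKUP_S3_BUCKET_REGION}
        bucket: ${DATALAYER_SOLR_BACKUP_S3_BUCKET_NAME}
        credentials:
          accessKeyIdSecret:
            name: aws-creds
            key: access-key-id
          secretAccessKeySecret:
            name: aws-creds
            key: secret-access-key
```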

Check the creation of the Solr Cluster Pods. It may take multiple minutes to get a completely up-and-running Solr cluster, mainly due to the time to provision the required storage.

kubectl get solrclouds -n datalayer-solr -w
# NAME             VERSION   TARGETVERSION   DESIREDNODES   NODES   READYNODES   UPTODATENODES   AGE
# solr-datalayer   9.0.0                     3              3       3            3               79s
kubectl get pods -n datalayer-solr -w
kubectl describe pods -n datalayer-solr | grep Node

Create Datalayer Solr Collections

Solr collections are organized into two sets that can be initialized independently.

Core Collections (17)

Platform collections used by the main Datalayer services:

Collection       Purpose
ai-agents        AI agents service records
contacts         Contact information
credits          Usage credits
datasources      Data source definitions
events           Agent lifecycle and system event records
iam              Identity and access management
iam-tokens       IAM authentication tokens
inbounds         Inbound integrations
invites          User invitations
library          Content library
notifications    User-facing notification records
outbounds        Outbound integrations
secrets          Encrypted secrets metadata
spaces           Workspace spaces
success          Success tracking
tool-approvals   Tool approval requests and decision state
usage            Platform usage metrics

Note: The notifications collection stores user-facing notifications, while events stores lifecycle and system events.

Runtimes Collections (2)

Collections specific to the runtimes subsystem:

Collection             Purpose
runtimes-checkpoints   Runtime checkpoint records (CRIU snapshots metadata)
runtimes-snapshots     Runtime snapshot records

Source of Truth

The collection arrays are defined in solr-collections.sh:

  • SOLR_COLLECTIONS_CORE — Core platform collections
  • SOLR_COLLECTIONS_RUNTIMES — Runtimes collections
  • SOLR_COLLECTIONS — All collections combined (union of the above two)
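The arrays above could be sketched as follows (the real solr-collections.sh is the source of truth; the names below are copied from the tables in this section):

```shell
# Hypothetical sketch of the arrays defined in solr-collections.sh.
SOLR_COLLECTIONS_CORE=(
  ai-agents contacts credits datasources events iam iam-tokens
  inbounds invites library notifications outbounds secrets spaces
  success tool-approvals usage
)
SOLR_COLLECTIONS_RUNTIMES=(runtimes-checkpoints runtimes-snapshots)
# All collections: union of the core and runtimes sets.
SOLR_COLLECTIONS=("${SOLR_COLLECTIONS_CORE[@]}" "${SOLR_COLLECTIONS_RUNTIMES[@]}")
echo "${#SOLR_COLLECTIONS[@]} collections"
```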

The solr-init script prompts for the collection set to create. This is useful when adding the runtimes collections to an existing cluster without re-creating the core ones.

plane solr-init
# Select:
# 1) Core - Platform collections (17 collections)
# 2) Runtimes - Runtimes collections (2 collections)
# 3) All - All collections (19 collections)
#
# The init pod is created, logs are streamed, and the pod is auto-cleaned.

Backup Datalayer Solr

Solr collections are backed up to an AWS S3 bucket using the Solr Operator SolrBackup CRD. The backup relies on the s3 repository configured in the SolrCloud spec (see Datalayer Solr Cluster above).

Three backup definitions are available, matching the collection sets:

File                                Scope                    CR Name
datalayer-backup-s3.yaml            All 19 collections       datalayer-solr-collection-backup
datalayer-backup-s3-core.yaml       17 core collections      datalayer-solr-collection-backup-core
datalayer-backup-s3-runtimes.yaml   2 runtimes collections   datalayer-solr-collection-backup-runtimes

All backups run daily at 1:00 AM UTC and retain up to 200 snapshots.
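A SolrBackup resource of this kind might look like the sketch below (illustrative — the actual specs under $PLANE_HOME/etc/specs/solr/ are authoritative), with the daily 1:00 AM UTC schedule and the 200-snapshot retention expressed in the recurrence block:

```yaml
apiVersion: solr.apache.org/v1beta1
kind: SolrBackup
metadata:
  name: datalayer-solr-collection-backup-runtimes
  namespace: datalayer-solr
spec:
  solrCloud: solr-datalayer
  repositoryName: s3
  collections:
    - runtimes-checkpoints
    - runtimes-snapshots
  recurrence:
    schedule: "0 1 * * *"   # daily at 1:00 AM UTC
    maxSaved: 200
```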

Prerequisites

  • The aws-creds secret must exist in the datalayer-solr namespace (see Datalayer Solr Cluster).
  • The DATALAYER_SOLR_BACKUP_S3_BUCKET_NAME and DATALAYER_SOLR_BACKUP_S3_BUCKET_REGION environment variables must be set.

Apply the Backup Schedule

# All collections.
kubectl apply -f $PLANE_HOME/etc/specs/solr/datalayer-backup-s3.yaml
# Core collections only.
kubectl apply -f $PLANE_HOME/etc/specs/solr/datalayer-backup-s3-core.yaml
# Runtimes collections only.
kubectl apply -f $PLANE_HOME/etc/specs/solr/datalayer-backup-s3-runtimes.yaml

Monitor Backups

# List all backups.
kubectl get solrbackups -n datalayer-solr
# Describe a specific backup for detailed status.
kubectl describe solrbackup datalayer-solr-collection-backup -n datalayer-solr

Restore Datalayer Solr

Restore Solr collections from an S3 backup using the plane solr-restore command. The restore calls the Solr Collections API RESTORE action asynchronously for each collection.

When restoring multiple collections, the script prompts which collection set to restore (core, runtimes, or all). Specifying a single collection on the command line bypasses the prompt.
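Under the hood, each per-collection restore is a call to the Solr Collections API RESTORE action. A hypothetical sketch of such a call (the service hostname and the s3 repository name are assumptions; plane solr-restore handles this for you):

```shell
# Assumed in-cluster Solr service hostname -- adjust to your deployment.
SOLR_HOST="http://solr-datalayer-solrcloud-common.datalayer-solr"
BACKUP_NAME="datalayer-solr-collection-backup"
COLLECTION="iam"
# Async RESTORE request for one collection from the s3 repository.
RESTORE_URL="${SOLR_HOST}/solr/admin/collections?action=RESTORE&name=${BACKUP_NAME}&collection=${COLLECTION}&repository=s3&async=restore-${COLLECTION}"
echo "$RESTORE_URL"
# curl -u "$DATALAYER_SOLR_USERNAME:$DATALAYER_SOLR_PASSWORD" "$RESTORE_URL"
```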

Prerequisites

  • The DATALAYER_SOLR_PASSWORD environment variable must be set.
  • A valid backup must exist in the S3 repository (check with kubectl get solrbackups -n datalayer-solr).

Restore Collections

The default backup name is datalayer-solr-collection-backup (matching the SolrBackup CR name).

# Restore collections (prompts for set: core / runtimes / all).
plane solr-restore
# Restore only a specific collection (no prompt).
plane solr-restore datalayer-solr-collection-backup iam

Monitor Restore Progress

Restore operations are asynchronous. Use the solr-restore-status script to check all collections at once, or query a single collection.

# Check the status of all restore operations.
plane solr-restore-status
# Check a single collection.
plane solr-restore-status iam

Scale Datalayer Solr

Scale Solr replicas if needed.

kubectl scale \
  --replicas=5 \
  solrcloud/solr-datalayer \
  -n datalayer-solr

Tear Down Datalayer Solr

Tear down the created Solr Cloud if needed.

kubectl delete solrcloud solr-datalayer -n datalayer-solr
kubectl get solrcloud -A

Tear down the Solr Operator if needed.

plane down datalayer-solr-operator