Solr
Apache Solr Operator
To deploy Solr, you first need to deploy the Solr Operator.
helm repo add apache-solr https://solr.apache.org/charts
helm repo update
- Plane
- Helm
plane up datalayer-solr-operator
cat << 'EOF' > /tmp/values.yaml
nodeSelector:
  role.datalayer.io/system: "true"
zookeeper-operator:
  nodeSelector:
    role.datalayer.io/system: "true"
EOF
export RELEASE=datalayer-solr-operator
export NAMESPACE=datalayer-solr-operator
kubectl create \
-n $NAMESPACE \
-f https://solr.apache.org/operator/downloads/crds/v0.8.0/all-with-dependencies.yaml
helm upgrade \
--install $RELEASE \
apache-solr/solr-operator \
--version 0.8.0 \
--create-namespace \
--namespace $NAMESPACE \
--values /tmp/values.yaml \
--timeout 5m
Check the availability of the Solr CRDs.
kubectl explain solrcloud.spec.zookeeperRef.provided.config
kubectl explain solrcloud.spec.zookeeperRef.provided.persistence
kubectl explain solrcloud.spec.zookeeperRef.provided.persistence.spec
- Plane
- Helm
plane ls
helm ls -A
Check the availability of the Solr Operator Pods.
kubectl get pods -n datalayer-solr-operator -l control-plane=solr-operator
Datalayer Solr Cluster
Prepare an AWS S3 bucket for the Solr backups (the DATALAYER_SOLR_BACKUP_S3_BUCKET_NAME environment variable should contain its name) and create a secret with write access to that bucket.
kubectl create secret generic aws-creds \
--from-literal=access-key-id=$AWS_ACCESS_KEY_ID \
--from-literal=secret-access-key=$AWS_SECRET_ACCESS_KEY \
--namespace=datalayer-solr
kubectl describe secret aws-creds -n datalayer-solr
Create a secret for the Solr authentication.
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Secret
metadata:
  name: solr-basic-auth
  namespace: datalayer-solr
type: kubernetes.io/basic-auth
stringData:
  username: ${DATALAYER_SOLR_USERNAME}
  password: ${DATALAYER_SOLR_PASSWORD}
EOF
kubectl describe secret solr-basic-auth -n datalayer-solr
# Secret for the datalayer-api namespace.
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Secret
metadata:
  name: solr-basic-auth
  namespace: datalayer-api
type: kubernetes.io/basic-auth
stringData:
  username: ${DATALAYER_SOLR_USERNAME}
  password: ${DATALAYER_SOLR_PASSWORD}
EOF
kubectl describe secret solr-basic-auth -n datalayer-api
You are now ready to create a Solr cluster. Ensure the DATALAYER_SOLR_BACKUP_S3_BUCKET_NAME and DATALAYER_SOLR_BACKUP_S3_BUCKET_REGION environment variables are set to the name and region of the S3 bucket for the backups.
You can also configure the size of the cluster; the following spec defines a 3-node cluster.
- Plane
- Bash
kubectl apply -f $PLANE_HOME/etc/specs/solr/datalayer.yaml
cat <<EOF | kubectl apply -f -
apiVersion: solr.apache.org/v1beta1
kind: SolrCloud
metadata:
  name: solr-datalayer
  namespace: datalayer-solr
spec:
  dataStorage:
    persistent:
      reclaimPolicy: Retain
      pvcTemplate:
        spec:
          resources:
            requests:
              storage: "200Gi"
  replicas: 3
  solrImage:
    tag: 9.0.0
  solrJavaMem: "-Xms1g -Xmx5g"
  solrModules:
    - s3-repository
  additionalLibs:
    - "/opt/solr/contrib/s3-repository/lib"
  backupRepositories:
    - name: s3
      s3:
        bucket: ${DATALAYER_SOLR_BACKUP_S3_BUCKET_NAME}
        region: ${DATALAYER_SOLR_BACKUP_S3_BUCKET_REGION}
        credentials:
          accessKeyIdSecret:
            name: aws-creds
            key: access-key-id
          secretAccessKeySecret:
            name: aws-creds
            key: secret-access-key
  customSolrKubeOptions:
    podOptions:
      nodeSelector:
        role.datalayer.io/solr: "true"
      envVars:
        - name: AWS_ACCESS_KEY_ID
          valueFrom:
            secretKeyRef:
              name: aws-creds
              key: access-key-id
        - name: AWS_SECRET_ACCESS_KEY
          valueFrom:
            secretKeyRef:
              name: aws-creds
              key: secret-access-key
        - name: AWS_DEFAULT_REGION
          value: us-east-1
      resources:
        limits:
          memory: "3G"
        requests:
          cpu: "65m"
          memory: "1G"
  zookeeperRef:
    provided:
      zookeeperPodPolicy:
        nodeSelector:
          role.datalayer.io/solr: "true"
        resources:
          limits:
            memory: "1G"
          requests:
            cpu: "65m"
            memory: "156Mi"
      persistence:
        reclaimPolicy: Delete
        spec:
          resources:
            requests:
              storage: "5Gi"
      replicas: 3
  solrOpts: "-Dsolr.autoSoftCommit.maxTime=10000"
  solrGCTune: "-XX:SurvivorRatio=4 -XX:TargetSurvivorRatio=90 -XX:MaxTenuringThreshold=8"
  solrSecurity:
    authenticationType: "Basic"
    basicAuthSecret: "solr-basic-auth"
    probesRequireAuth: false
EOF
Check the creation of the Solr Cluster Pods. It may take several minutes for the Solr cluster to become fully operational, mainly due to the time needed to provision the required storage.
kubectl get solrclouds -n datalayer-solr -w
# NAME VERSION TARGETVERSION DESIREDNODES NODES READYNODES UPTODATENODES AGE
# solr-datalayer 9.0.0 3 3 3 3 79s
kubectl get pods -n datalayer-solr -w
kubectl describe pods -n datalayer-solr | grep Node
Create Datalayer Solr Collections
Solr collections are organized into two sets that can be initialized independently.
Core Collections (17)
Platform collections used by the main Datalayer services:
| Collection | Purpose |
|---|---|
| ai-agents | AI agents service records |
| contacts | Contact information |
| credits | Usage credits |
| datasources | Data source definitions |
| events | Agent lifecycle and system event records |
| iam | Identity and access management |
| iam-tokens | IAM authentication tokens |
| inbounds | Inbound integrations |
| invites | User invitations |
| library | Content library |
| notifications | User-facing notification records |
| outbounds | Outbound integrations |
| secrets | Encrypted secrets metadata |
| spaces | Workspace spaces |
| success | Success tracking |
| tool-approvals | Tool approval requests and decision state |
| usage | Platform usage metrics |
Note: The notifications collection stores user-facing notifications, while events stores lifecycle and system events.
Runtimes Collections (2)
Collections specific to the runtimes subsystem:
| Collection | Purpose |
|---|---|
| runtimes-checkpoints | Runtime checkpoint records (CRIU snapshot metadata) |
| runtimes-snapshots | Runtime snapshot records |
Source of Truth
The collection arrays are defined in solr-collections.sh:
- SOLR_COLLECTIONS_CORE — Core platform collections
- SOLR_COLLECTIONS_RUNTIMES — Runtimes collections
- SOLR_COLLECTIONS — All collections combined (the union of the two sets above)
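For illustration, the arrays could look like the following sketch, reconstructed from the collection tables above; the actual solr-collections.sh is the authoritative source and may differ.
# Sketch of solr-collections.sh, reconstructed from the collection tables above.
SOLR_COLLECTIONS_CORE=(ai-agents contacts credits datasources events iam iam-tokens inbounds invites library notifications outbounds secrets spaces success tool-approvals usage)
SOLR_COLLECTIONS_RUNTIMES=(runtimes-checkpoints runtimes-snapshots)
# All collections: the union of the two sets above.
SOLR_COLLECTIONS=("${SOLR_COLLECTIONS_CORE[@]}" "${SOLR_COLLECTIONS_RUNTIMES[@]}")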
The solr-init script prompts you to select which collection set to create. This is useful when adding the runtimes collections to an existing cluster without re-creating the core ones.
- Plane
- Bash
plane solr-init
# Select:
# 1) Core - Platform collections (17 collections)
# 2) Runtimes - Runtimes collections (2 collections)
# 3) All - All collections (19 collections)
#
# The init pod is created, logs are streamed, and the pod is auto-cleaned.
Access Datalayer Solr
To connect from your host to the Solr user interface, add the following entries to your /etc/hosts file.
# /etc/hosts
127.0.0.1 solr-datalayer-solrcloud-0.solr-datalayer-solrcloud-headless.datalayer-solr solr-datalayer-solrcloud-1.solr-datalayer-solrcloud-headless.datalayer-solr solr-datalayer-solrcloud-2.solr-datalayer-solrcloud-headless.datalayer-solr
127.0.0.1 solr-datalayer-solrcloud-zookeeper-0.solr-datalayer-solrcloud-zookeeper-headless.datalayer-solr.svc.cluster.local solr-datalayer-solrcloud-zookeeper-1.solr-datalayer-solrcloud-zookeeper-headless.datalayer-solr.svc.cluster.local solr-datalayer-solrcloud-zookeeper-2.solr-datalayer-solrcloud-zookeeper-headless.datalayer-solr.svc.cluster.local
Launch two kubectl port-forward commands to the Kubernetes cluster.
# open http://localhost:8983/solr
# open http://solr-datalayer-solrcloud-0.solr-datalayer-solrcloud-headless.datalayer-solr:8983/solr
kubectl port-forward -n datalayer-solr service/solr-datalayer-solrcloud-zookeeper-client 2181:2181 &
kubectl port-forward -n datalayer-solr service/solr-datalayer-solrcloud-headless 8983:8983 &
wait
You will need Java and Apache Solr available on your system, as well as the configuration files.
export ZK_HOST=localhost:2181
export SOLR_HOME=/opt/solr
export SOLR_AUTH_TYPE="basic"
export SOLR_AUTHENTICATION_OPTS="-Dbasicauth=solr:${DATALAYER_SOLR_PASSWORD}"
export PATH=$SOLR_HOME/bin:$PATH
export YELLOW='\x1b[33m'
export RESET='\x1b[0m'
for COLLECTION in ai-agents contacts credits datasources events iam iam-tokens inbounds invites library notifications outbounds secrets spaces success tool-approvals usage runtimes-checkpoints runtimes-snapshots
do
echo
echo -e $YELLOW"Creating Solr collection $COLLECTION"$RESET
echo
$SOLR_HOME/bin/solr create -c $COLLECTION -shards 3 -replicationFactor 3 -d $PLANE_HOME/etc/dockerfiles/datalayer-solr/config -p 8983
done
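To verify the collections were created, you can list them through the Collections API over the same port-forward (a quick check, assuming the basic-auth credentials configured above):
curl -s -u "solr:${DATALAYER_SOLR_PASSWORD}" \
  "http://localhost:8983/solr/admin/collections?action=LIST"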
Backup Datalayer Solr
Solr collections are backed up to an AWS S3 bucket using the Solr Operator SolrBackup CRD. The backup relies on the s3 repository configured in the SolrCloud spec (see Datalayer Solr Cluster above).
Three backup definitions are available, matching the collection sets:
| File | Scope | CR Name |
|---|---|---|
| datalayer-backup-s3.yaml | All 19 collections | datalayer-solr-collection-backup |
| datalayer-backup-s3-core.yaml | 17 core collections | datalayer-solr-collection-backup-core |
| datalayer-backup-s3-runtimes.yaml | 2 runtimes collections | datalayer-solr-collection-backup-runtimes |
All backups run daily at 1:00 AM UTC and retain up to 200 snapshots.
Prerequisites
- The aws-creds secret must exist in the datalayer-solr namespace (see Datalayer Solr Cluster).
- The DATALAYER_SOLR_BACKUP_S3_BUCKET_NAME and DATALAYER_SOLR_BACKUP_S3_BUCKET_REGION environment variables must be set.
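As a quick sanity check before applying the schedule, confirm both prerequisites:
kubectl get secret aws-creds -n datalayer-solr
echo "$DATALAYER_SOLR_BACKUP_S3_BUCKET_NAME $DATALAYER_SOLR_BACKUP_S3_BUCKET_REGION"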
Apply the Backup Schedule
- Plane
- Bash
# All collections.
kubectl apply -f $PLANE_HOME/etc/specs/solr/datalayer-backup-s3.yaml
# Core collections only.
kubectl apply -f $PLANE_HOME/etc/specs/solr/datalayer-backup-s3-core.yaml
# Runtimes collections only.
kubectl apply -f $PLANE_HOME/etc/specs/solr/datalayer-backup-s3-runtimes.yaml
# Example: All collections.
cat <<EOF | kubectl apply -f -
apiVersion: solr.apache.org/v1beta1
kind: SolrBackup
metadata:
  name: datalayer-solr-collection-backup
  namespace: datalayer-solr
spec:
  repositoryName: s3
  solrCloud: solr-datalayer
  collections:
    - ai-agents
    - contacts
    - credits
    - datasources
    - events
    - iam
    - iam-tokens
    - inbounds
    - invites
    - library
    - notifications
    - runtimes-checkpoints
    - runtimes-snapshots
    - outbounds
    - secrets
    - spaces
    - success
    - tool-approvals
    - usage
  recurrence:
    schedule: "0 1 * * *" # every day at 1:00 AM
    maxSaved: 200
EOF
Monitor Backups
# List all backups.
kubectl get solrbackups -n datalayer-solr
# Describe a specific backup for detailed status.
kubectl describe solrbackup datalayer-solr-collection-backup -n datalayer-solr
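You can also confirm that backup data is landing in the bucket itself, for example with the AWS CLI (assuming it is installed and uses the same credentials):
aws s3 ls s3://$DATALAYER_SOLR_BACKUP_S3_BUCKET_NAME/ --recursive | head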
Restore Datalayer Solr
Restore Solr collections from an S3 backup using the plane solr-restore command. The restore calls the Solr Collections API RESTORE action asynchronously for each collection.
When restoring multiple collections, the script prompts which collection set to restore (core, runtimes, or all). Specifying a single collection on the command line bypasses the prompt.
Prerequisites
- The DATALAYER_SOLR_PASSWORD environment variable must be set.
- A valid backup must exist in the S3 repository (check with kubectl get solrbackups -n datalayer-solr).
Restore Collections
The default backup name is datalayer-solr-collection-backup (matching the SolrBackup CRD name).
- Plane
- Bash
# Restore collections (prompts for set: core / runtimes / all).
plane solr-restore
# Restore only a specific collection (no prompt).
plane solr-restore datalayer-solr-collection-backup iam
# Restore a single collection via kubectl exec into a Solr pod.
# The backup name per collection is: <backup-name>-<collection>
kubectl exec -n datalayer-solr solr-datalayer-solrcloud-0 -- \
curl -s -u "solr:${DATALAYER_SOLR_PASSWORD}" \
"http://localhost:8983/solr/admin/collections?action=RESTORE&name=datalayer-solr-collection-backup-iam&collection=iam&location=/&repository=s3&async=iam-restore"
Monitor Restore Progress
Restore operations are asynchronous. Use the solr-restore-status script to check all collections at once, or query a single collection.
- Plane
- Bash
# Check the status of all restore operations.
plane solr-restore-status
# Check a single collection.
plane solr-restore-status iam
# Check the status of a single restore request.
kubectl exec -n datalayer-solr solr-datalayer-solrcloud-0 -- \
curl -s -u "solr:${DATALAYER_SOLR_PASSWORD}" \
"http://localhost:8983/solr/admin/collections?action=REQUESTSTATUS&requestid=iam-restore"
Scale Datalayer Solr
Scale Solr replicas if needed.
kubectl scale \
--replicas=5 \
solrcloud/solr-datalayer \
-n datalayer-solr
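Watch the cluster roll out the additional nodes with the same command used during creation:
kubectl get solrclouds -n datalayer-solr -w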
Tear Down Datalayer Solr
Tear down the created Solr Cloud if needed.
kubectl delete solrcloud solr-datalayer -n datalayer-solr
kubectl get solrcloud -A
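Because the data storage uses reclaimPolicy: Retain, the Solr data PVCs may remain after the SolrCloud is deleted. List them and remove them manually if you also want to free the storage:
kubectl get pvc -n datalayer-solr
# Only delete a PVC if you are sure its data is no longer needed.
# kubectl delete pvc <pvc-name> -n datalayer-solr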
Tear down the Solr Operator if needed.
- Plane
- Helm
plane down datalayer-solr-operator
export RELEASE=datalayer-solr-operator
export NAMESPACE=datalayer-solr-operator
helm delete $RELEASE --namespace $NAMESPACE
kubectl delete \
-n $NAMESPACE \
-f https://solr.apache.org/operator/downloads/crds/v0.8.0/all-with-dependencies.yaml