Solr
Apache Solr Operator
To deploy Solr, you first need to deploy the Solr Operator.
helm repo add apache-solr https://solr.apache.org/charts
helm repo update
- Plane
- Helm
plane up datalayer-solr-operator
cat << 'EOF' > /tmp/values.yaml
nodeSelector:
  role.datalayer.io/system: "true"
zookeeper-operator:
  nodeSelector:
    role.datalayer.io/system: "true"
EOF
export RELEASE=datalayer-solr-operator
export NAMESPACE=datalayer-solr-operator
kubectl create \
-n $NAMESPACE \
-f https://solr.apache.org/operator/downloads/crds/v0.8.0/all-with-dependencies.yaml
helm upgrade \
--install $RELEASE \
apache-solr/solr-operator \
--version 0.8.0 \
--create-namespace \
--namespace $NAMESPACE \
--values /tmp/values.yaml \
--timeout 5m
Check the availability of the Solr CRDs.
kubectl explain solrcloud.spec.zookeeperRef.provided.config
kubectl explain solrcloud.spec.zookeeperRef.provided.persistence
kubectl explain solrcloud.spec.zookeeperRef.provided.persistence.spec
- Plane
- Helm
plane ls
helm ls -A
Check the availability of the Solr Operator Pods.
kubectl get pods -n datalayer-solr-operator -l control-plane=solr-operator
Datalayer Solr Cluster
Prepare an AWS S3 bucket for the Solr backups (the DATALAYER_SOLR_BACKUP_S3_BUCKET_NAME environment variable should contain its name) and create a secret with write access to that bucket.
kubectl create secret generic aws-creds \
--from-literal=access-key-id=$AWS_ACCESS_KEY_ID \
--from-literal=secret-access-key=$AWS_SECRET_ACCESS_KEY \
--namespace=datalayer-solr
kubectl describe secret aws-creds -n datalayer-solr
Create a secret for the Solr authentication.
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Secret
metadata:
  name: solr-basic-auth
  namespace: datalayer-solr
type: kubernetes.io/basic-auth
stringData:
  username: ${DATALAYER_SOLR_USERNAME}
  password: ${DATALAYER_SOLR_PASSWORD}
EOF
kubectl describe secret solr-basic-auth -n datalayer-solr
# Secret for the datalayer-api namespace.
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Secret
metadata:
  name: solr-basic-auth
  namespace: datalayer-api
type: kubernetes.io/basic-auth
stringData:
  username: ${DATALAYER_SOLR_USERNAME}
  password: ${DATALAYER_SOLR_PASSWORD}
EOF
kubectl describe secret solr-basic-auth -n datalayer-api
You are now ready to create a Solr cluster. Ensure the DATALAYER_SOLR_BACKUP_S3_BUCKET_NAME and DATALAYER_SOLR_BACKUP_S3_BUCKET_REGION environment variables are set to the name and region of the S3 bucket for the backups.
You can also configure the size of the cluster; the following spec defines a 3-node cluster.
- Plane
- Bash
kubectl apply -f $PLANE_HOME/etc/specs/solr/datalayer.yaml
cat <<EOF | kubectl apply -f -
apiVersion: solr.apache.org/v1beta1
kind: SolrCloud
metadata:
  name: solr-datalayer
  namespace: datalayer-solr
spec:
  dataStorage:
    persistent:
      reclaimPolicy: Retain
      pvcTemplate:
        spec:
          resources:
            requests:
              storage: "200Gi"
  replicas: 3
  solrImage:
    tag: 9.0.0
  solrJavaMem: "-Xms1g -Xmx5g"
  solrModules:
    - s3-repository
  additionalLibs:
    - "/opt/solr/contrib/s3-repository/lib"
  backupRepositories:
    - name: s3
      s3:
        bucket: ${DATALAYER_SOLR_BACKUP_S3_BUCKET_NAME}
        region: ${DATALAYER_SOLR_BACKUP_S3_BUCKET_REGION}
        credentials:
          accessKeyIdSecret:
            name: aws-creds
            key: access-key-id
          secretAccessKeySecret:
            name: aws-creds
            key: secret-access-key
  customSolrKubeOptions:
    podOptions:
      nodeSelector:
        role.datalayer.io/solr: "true"
      envVars:
        - name: AWS_ACCESS_KEY_ID
          valueFrom:
            secretKeyRef:
              name: aws-creds
              key: access-key-id
        - name: AWS_SECRET_ACCESS_KEY
          valueFrom:
            secretKeyRef:
              name: aws-creds
              key: secret-access-key
        - name: AWS_DEFAULT_REGION
          value: us-east-1
      resources:
        limits:
          memory: "3G"
        requests:
          cpu: "65m"
          memory: "1G"
  zookeeperRef:
    provided:
      zookeeperPodPolicy:
        nodeSelector:
          role.datalayer.io/solr: "true"
        resources:
          limits:
            memory: "1G"
          requests:
            cpu: "65m"
            memory: "156Mi"
      persistence:
        reclaimPolicy: Delete
        spec:
          resources:
            requests:
              storage: "5Gi"
      replicas: 3
  solrOpts: "-Dsolr.autoSoftCommit.maxTime=10000"
  solrGCTune: "-XX:SurvivorRatio=4 -XX:TargetSurvivorRatio=90 -XX:MaxTenuringThreshold=8"
  solrSecurity:
    authenticationType: "Basic"
    basicAuthSecret: "solr-basic-auth"
    probesRequireAuth: false
EOF
Check the creation of the Solr Cluster Pods. It may take several minutes for the Solr cluster to become fully operational, mainly due to the time needed to provision the required storage.
kubectl get solrclouds -n datalayer-solr -w
# NAME VERSION TARGETVERSION DESIREDNODES NODES READYNODES UPTODATENODES AGE
# solr-datalayer 9.0.0 3 3 3 3 79s
kubectl get pods -n datalayer-solr -w
kubectl describe pods -n datalayer-solr | grep Node
Create Datalayer Solr Collections
Solr collections are organized into two sets that can be initialized independently.
Core Collections (17)
Platform collections used by the main Datalayer services:
| Collection | Purpose |
|---|---|
| ai-agents | AI agents service records |
| contacts | Contact information |
| credits | Usage credits |
| datasources | Data source definitions |
| events | Agent lifecycle and system event records |
| iam | Identity and access management |
| iam-tokens | IAM authentication tokens |
| inbounds | Inbound integrations |
| invites | User invitations |
| library | Content library |
| notifications | User-facing notification records |
| outbounds | Outbound integrations |
| secrets | Encrypted secrets metadata |
| spaces | Workspace spaces |
| success | Success tracking |
| tool-approvals | Tool approval requests and decision state |
| usage | Platform usage metrics |
Note: The notifications collection stores user-facing notifications, while events stores lifecycle and system events.
Runtimes Collections (2)
Collections specific to the runtimes subsystem:
| Collection | Purpose |
|---|---|
| runtimes-checkpoints | Runtime checkpoint records (CRIU snapshot metadata) |
| runtimes-snapshots | Runtime snapshot records |
Source of Truth
The collection arrays are defined in solr-collections.sh:
- SOLR_COLLECTIONS_CORE — Core platform collections
- SOLR_COLLECTIONS_RUNTIMES — Runtimes collections
- SOLR_COLLECTIONS — All collections combined (the union of the two sets above)
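For illustration, the arrays could look like the following sketch, reconstructed from the collection tables above; the actual solr-collections.sh is the authoritative source and may differ.
# Sketch of solr-collections.sh, reconstructed from the collection tables above.
SOLR_COLLECTIONS_CORE=(ai-agents contacts credits datasources events iam iam-tokens inbounds invites library notifications outbounds secrets spaces success tool-approvals usage)
SOLR_COLLECTIONS_RUNTIMES=(runtimes-checkpoints runtimes-snapshots)
# All collections: the union of the two sets above.
SOLR_COLLECTIONS=("${SOLR_COLLECTIONS_CORE[@]}" "${SOLR_COLLECTIONS_RUNTIMES[@]}")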
The solr-init script prompts you to select which collection set to create. This is useful when adding the runtimes collections to an existing cluster without re-creating the core ones.
- Plane
- Bash
plane solr-init
# Select:
# 1) Core - Platform collections (17 collections)
# 2) Runtimes - Runtimes collections (2 collections)
# 3) All - All collections (19 collections)
#
# The init pod is created, logs are streamed, and the pod is auto-cleaned.
Access Datalayer Solr
To connect from your host to the Solr user interface, add the following entries to your /etc/hosts file.
# /etc/hosts
127.0.0.1 solr-datalayer-solrcloud-0.solr-datalayer-solrcloud-headless.datalayer-solr solr-datalayer-solrcloud-1.solr-datalayer-solrcloud-headless.datalayer-solr solr-datalayer-solrcloud-2.solr-datalayer-solrcloud-headless.datalayer-solr
127.0.0.1 solr-datalayer-solrcloud-zookeeper-0.solr-datalayer-solrcloud-zookeeper-headless.datalayer-solr.svc.cluster.local solr-datalayer-solrcloud-zookeeper-1.solr-datalayer-solrcloud-zookeeper-headless.datalayer-solr.svc.cluster.local solr-datalayer-solrcloud-zookeeper-2.solr-datalayer-solrcloud-zookeeper-headless.datalayer-solr.svc.cluster.local
Launch two kubectl port-forward commands to the Kubernetes cluster.
# open http://localhost:8983/solr
# open http://solr-datalayer-solrcloud-0.solr-datalayer-solrcloud-headless.datalayer-solr:8983/solr
kubectl port-forward -n datalayer-solr service/solr-datalayer-solrcloud-zookeeper-client 2181:2181 &
kubectl port-forward -n datalayer-solr service/solr-datalayer-solrcloud-headless 8983:8983 &
wait
You will need Java and Apache Solr available on your system, as well as the configuration files.
export ZK_HOST=localhost:2181
export SOLR_HOME=/opt/solr
export SOLR_AUTH_TYPE="basic"
export SOLR_AUTHENTICATION_OPTS="-Dbasicauth=solr:${DATALAYER_SOLR_PASSWORD}"
export PATH=$SOLR_HOME/bin:$PATH
export YELLOW='\x1b[33m'
export RESET='\x1b[0m'
for COLLECTION in ai-agents contacts credits datasources events iam iam-tokens inbounds invites library notifications outbounds secrets spaces success tool-approvals usage runtimes-checkpoints runtimes-snapshots
do
echo
echo -e $YELLOW"Creating Solr collection $COLLECTION"$RESET
echo
$SOLR_HOME/bin/solr create -c $COLLECTION -shards 3 -replicationFactor 3 -d $PLANE_HOME/etc/dockerfiles/datalayer-solr/config -p 8983
done
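To verify the collections were created, you can list them through the Collections API over the same port-forward (a quick check, assuming the basic-auth credentials configured above):
curl -s -u "solr:${DATALAYER_SOLR_PASSWORD}" \
  "http://localhost:8983/solr/admin/collections?action=LIST"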
Backup Datalayer Solr
Solr collections are backed up to an AWS S3 bucket using the Solr Operator SolrBackup CRD. The backup relies on the s3 repository configured in the SolrCloud spec (see Datalayer Solr Cluster above).
Three backup definitions are available, matching the collection sets:
| File | Scope | CR Name |
|---|---|---|
| datalayer-backup-s3.yaml | All 19 collections | datalayer-solr-collection-backup |
| datalayer-backup-s3-core.yaml | 17 core collections | datalayer-solr-collection-backup-core |
| datalayer-backup-s3-runtimes.yaml | 2 runtimes collections | datalayer-solr-collection-backup-runtimes |
All backups run daily at 1:00 AM UTC and retain up to 200 snapshots.
Prerequisites
- The aws-creds secret must exist in the datalayer-solr namespace (see Datalayer Solr Cluster).
- The DATALAYER_SOLR_BACKUP_S3_BUCKET_NAME and DATALAYER_SOLR_BACKUP_S3_BUCKET_REGION environment variables must be set.
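As a quick sanity check before applying the schedule, confirm both prerequisites:
kubectl get secret aws-creds -n datalayer-solr
echo "$DATALAYER_SOLR_BACKUP_S3_BUCKET_NAME $DATALAYER_SOLR_BACKUP_S3_BUCKET_REGION"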
Apply the Backup Schedule
- Plane
- Bash
# All collections.
kubectl apply -f $PLANE_HOME/etc/specs/solr/datalayer-backup-s3.yaml
# Core collections only.
kubectl apply -f $PLANE_HOME/etc/specs/solr/datalayer-backup-s3-core.yaml
# Runtimes collections only.
kubectl apply -f $PLANE_HOME/etc/specs/solr/datalayer-backup-s3-runtimes.yaml
# Example: All collections.
cat <<EOF | kubectl apply -f -
apiVersion: solr.apache.org/v1beta1
kind: SolrBackup
metadata:
  name: datalayer-solr-collection-backup
  namespace: datalayer-solr
spec:
  repositoryName: s3
  solrCloud: solr-datalayer
  collections:
    - ai-agents
    - contacts
    - credits
    - datasources
    - events
    - iam
    - iam-tokens
    - inbounds
    - invites
    - library
    - notifications
    - runtimes-checkpoints
    - runtimes-snapshots
    - outbounds
    - secrets
    - spaces
    - success
    - tool-approvals
    - usage
  recurrence:
    schedule: "0 1 * * *" # every day at 1:00 AM
    maxSaved: 200
EOF
Monitor Backups
# List all backups.
kubectl get solrbackups -n datalayer-solr
# Describe a specific backup for detailed status.
kubectl describe solrbackup datalayer-solr-collection-backup -n datalayer-solr
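You can also confirm that backup data is landing in the bucket itself, for example with the AWS CLI (assuming it is installed and uses the same credentials):
aws s3 ls s3://$DATALAYER_SOLR_BACKUP_S3_BUCKET_NAME/ --recursive | head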
Restore Datalayer Solr
Restore Solr collections from an S3 backup using the plane solr-restore command. The restore calls the Solr Collections API RESTORE action asynchronously for each collection.
When restoring multiple collections, the script prompts which collection set to restore (core, runtimes, or all). Specifying a single collection on the command line bypasses the prompt.
Prerequisites
- The DATALAYER_SOLR_PASSWORD environment variable must be set.
- A valid backup must exist in the S3 repository (check with kubectl get solrbackups -n datalayer-solr).
Restore Collections
The default backup name is datalayer-solr-collection-backup (matching the SolrBackup CRD name).
- Plane
- Bash
# Restore collections (prompts for set: core / runtimes / all).
plane solr-restore
# Restore only a specific collection (no prompt).
plane solr-restore datalayer-solr-collection-backup iam
# Restore a single collection via kubectl exec into a Solr pod.
# The backup name per collection is: <backup-name>-<collection>
kubectl exec -n datalayer-solr solr-datalayer-solrcloud-0 -- \
curl -s -u "solr:${DATALAYER_SOLR_PASSWORD}" \
"http://localhost:8983/solr/admin/collections?action=RESTORE&name=datalayer-solr-collection-backup-iam&collection=iam&location=/&repository=s3&async=iam-restore"
Monitor Restore Progress
Restore operations are asynchronous. Use the solr-restore-status script to check all collections at once, or query a single collection.
- Plane
- Bash
# Check the status of all restore operations.
plane solr-restore-status
# Check a single collection.
plane solr-restore-status iam
# Check the status of a single restore request.
kubectl exec -n datalayer-solr solr-datalayer-solrcloud-0 -- \
curl -s -u "solr:${DATALAYER_SOLR_PASSWORD}" \
"http://localhost:8983/solr/admin/collections?action=REQUESTSTATUS&requestid=iam-restore"
Scale Datalayer Solr
Scale Solr replicas if needed.
kubectl scale \
--replicas=5 \
solrcloud/solr-datalayer \
-n datalayer-solr
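Watch the cluster roll out the additional nodes with the same command used during creation:
kubectl get solrclouds -n datalayer-solr -w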
Tear Down Datalayer Solr
Tear down the created Solr Cloud if needed.
kubectl delete solrcloud solr-datalayer -n datalayer-solr
kubectl get solrcloud -A
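Because the data storage uses reclaimPolicy: Retain, the Solr data PVCs may remain after the SolrCloud is deleted. List them and remove them manually if you also want to free the storage:
kubectl get pvc -n datalayer-solr
# Only delete a PVC if you are sure its data is no longer needed.
# kubectl delete pvc <pvc-name> -n datalayer-solr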
Tear down the Solr Operator if needed.
- Plane
- Helm
plane down datalayer-solr-operator
export RELEASE=datalayer-solr-operator
export NAMESPACE=datalayer-solr-operator
helm delete $RELEASE --namespace $NAMESPACE
kubectl delete \
-n $NAMESPACE \
-f https://solr.apache.org/operator/downloads/crds/v0.8.0/all-with-dependencies.yaml