Apache Solr Operator

To deploy Solr, you first need to deploy the Solr Operator.

helm repo add apache-solr https://solr.apache.org/charts
helm repo update
plane up datalayer-solr-operator

Check the availability of the Solr CRDs.

kubectl explain solrcloud.spec.zookeeperRef.provided.config
kubectl explain solrcloud.spec.zookeeperRef.provided.persistence
kubectl explain solrcloud.spec.zookeeperRef.provided.persistence.spec
plane ls

Check the availability of the Solr Operator Pods.

kubectl get pods -n datalayer-solr-operator -l control-plane=solr-operator

Datalayer Solr Cluster

Prepare an AWS S3 bucket for the Solr backups (its name should be stored in the DATALAYER_SOLR_BACKUP_S3_BUCKET_NAME environment variable) and create a secret with write access to that S3 bucket.

kubectl create secret generic aws-creds \
  --from-literal=access-key-id=$AWS_ACCESS_KEY_ID \
  --from-literal=secret-access-key=$AWS_SECRET_ACCESS_KEY \
  --namespace=datalayer-solr
kubectl describe secret aws-creds -n datalayer-solr

Create a secret for the Solr authentication.

cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Secret
metadata:
  name: solr-basic-auth
  namespace: datalayer-solr
type: kubernetes.io/basic-auth
stringData:
  username: ${DATALAYER_SOLR_USERNAME}
  password: ${DATALAYER_SOLR_PASSWORD}
EOF
kubectl describe secret solr-basic-auth -n datalayer-solr
# Secret for the datalayer-api namespace.
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Secret
metadata:
  name: solr-basic-auth
  namespace: datalayer-api
type: kubernetes.io/basic-auth
stringData:
  username: ${DATALAYER_SOLR_USERNAME}
  password: ${DATALAYER_SOLR_PASSWORD}
EOF
kubectl describe secret solr-basic-auth -n datalayer-api

You are now ready to create a Solr cluster. Ensure the DATALAYER_SOLR_BACKUP_S3_BUCKET_NAME and DATALAYER_SOLR_BACKUP_S3_BUCKET_REGION environment variables are set to the name and region of the S3 bucket used for the backups.
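A minimal guard can fail fast when these variables are missing before you apply the spec. The `require_env` helper below is a hypothetical convenience, not part of plane:

```shell
# Hypothetical helper (not part of plane): verify that the listed
# environment variables are set, printing each missing name to stderr.
require_env() {
  for name in "$@"; do
    if [ -z "$(printenv "$name")" ]; then
      echo "missing: $name" >&2
      return 1
    fi
  done
}

if require_env DATALAYER_SOLR_BACKUP_S3_BUCKET_NAME DATALAYER_SOLR_BACKUP_S3_BUCKET_REGION; then
  echo "backup variables OK"
else
  echo "set the backup variables before applying the spec" >&2
fi
```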

You can also configure the size of the cluster; the following spec defines a 3-node replica cluster.
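The shape of that spec is roughly the following sketch, using the Solr Operator `solr.apache.org/v1beta1` SolrCloud API with the `s3` backup repository and the secrets created above. Field values are assumptions; the exact spec shipped in $PLANE_HOME/etc/specs/solr/datalayer.yaml may differ:

```yaml
apiVersion: solr.apache.org/v1beta1
kind: SolrCloud
metadata:
  name: solr-datalayer
  namespace: datalayer-solr
spec:
  replicas: 3
  solrSecurity:
    authenticationType: Basic
    basicAuthSecret: solr-basic-auth
  backupRepositories:
    - name: s3
      s3:
        region: ${DATALAYER_SOLR_BACKUP_S3_BUCKET_REGION}
        bucket: ${DATALAYER_SOLR_BACKUP_S3_BUCKET_NAME}
        credentials:
          accessKeyIdSecret:
            name: aws-creds
            key: access-key-id
          secretAccessKeySecret:
            name: aws-creds
            key: secret-access-key
```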

kubectl apply -f $PLANE_HOME/etc/specs/solr/datalayer.yaml

Check the creation of the Solr Cluster Pods. It may take multiple minutes to get a completely up-and-running Solr cluster, mainly due to the time to provision the required storage.

kubectl get solrclouds -n datalayer-solr -w
# NAME             VERSION   TARGETVERSION   DESIREDNODES   NODES   READYNODES   UPTODATENODES   AGE
# solr-datalayer   9.0.0                     3              3       3            3               79s
kubectl get pods -n datalayer-solr -w
kubectl describe pods -n datalayer-solr | grep Node

Create Datalayer Solr Collections

Create the Solr collections.

plane solr-init
# Check the solr init pod.
kubectl get pod datalayer-solr-init -n datalayer-system -w
# Check the logs and once initialization successfully completed, delete the pod.
kubectl logs datalayer-solr-init -n datalayer-system -f
# Delete the solr init container.
kubectl delete pod datalayer-solr-init -n datalayer-system

Backup Datalayer Solr

Solr collections are backed up to an AWS S3 bucket using the Solr Operator SolrBackup CRD. The backup relies on the s3 repository configured in the SolrCloud spec (see Datalayer Solr Cluster above).

The backup is configured as a recurring job that runs daily at 1:00 AM UTC and retains up to 200 snapshots. All 14 Datalayer collections are included: contacts, credits, datasources, iam, iam-tokens, inbounds, invites, library, runtimes-snapshots, outbounds, secrets, spaces, success, and usage.
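The applied spec is expected to look roughly like the sketch below, using the Solr Operator `solr.apache.org/v1beta1` SolrBackup API with its `recurrence` schedule; the exact contents of datalayer-backup-s3.yaml may differ:

```yaml
apiVersion: solr.apache.org/v1beta1
kind: SolrBackup
metadata:
  name: datalayer-solr-collection-backup
  namespace: datalayer-solr
spec:
  solrCloud: solr-datalayer
  repositoryName: s3
  collections:
    - contacts
    - credits
    # ...and the remaining collections listed above
  recurrence:
    schedule: "0 1 * * *"   # daily at 1:00 AM UTC
    maxSaved: 200
```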

Prerequisites

  • The aws-creds secret must exist in the datalayer-solr namespace (see Datalayer Solr Cluster).
  • The DATALAYER_SOLR_BACKUP_S3_BUCKET_NAME and DATALAYER_SOLR_BACKUP_S3_BUCKET_REGION environment variables must be set.

Apply the Backup Schedule

kubectl apply -f $PLANE_HOME/etc/specs/solr/datalayer-backup-s3.yaml

Monitor Backups

# List all backups.
kubectl get solrbackups -n datalayer-solr
# Describe a specific backup for detailed status.
kubectl describe solrbackup datalayer-solr-collection-backup -n datalayer-solr

Restore Datalayer Solr

Restore Solr collections from an S3 backup using the plane solr-restore command. The restore calls the Solr Collections API RESTORE action asynchronously for each collection.

Prerequisites

  • The DATALAYER_SOLR_PASSWORD environment variable must be set.
  • A valid backup must exist in the S3 repository (check with kubectl get solrbackups -n datalayer-solr).
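The request that plane solr-restore issues per collection can be sketched as follows, using the documented Solr Collections API RESTORE action. The in-cluster service URL (following the operator's `<name>-solrcloud-common` naming convention) and the async request id format are assumptions:

```shell
# Assumed in-cluster service URL for the SolrCloud named solr-datalayer.
SOLR_URL="http://solr-datalayer-solrcloud-common.datalayer-solr/solr"
BACKUP_NAME="datalayer-solr-collection-backup"

# Build the Collections API RESTORE request for one collection ($1),
# tagging it with an async request id so it can be polled later.
restore_url() {
  echo "${SOLR_URL}/admin/collections?action=RESTORE&repository=s3&name=${BACKUP_NAME}&collection=$1&async=restore-$1"
}

restore_url iam
# The actual call is authenticated with the solr-basic-auth credentials:
# curl -u "${DATALAYER_SOLR_USERNAME}:${DATALAYER_SOLR_PASSWORD}" "$(restore_url iam)"
```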

Restore Collections

The default backup name is datalayer-solr-collection-backup (matching the SolrBackup CRD name).

# Restore all collections from the default backup.
plane solr-restore
# Restore only a specific collection.
plane solr-restore datalayer-solr-collection-backup iam

Monitor Restore Progress

Restore operations are asynchronous. Use the solr-restore-status script to check all collections at once, or query a single collection.

# Check the status of all restore operations.
plane solr-restore-status
# Check a single collection.
plane solr-restore-status iam
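Under the hood, each async request id is presumably polled with the Collections API REQUESTSTATUS action, whose "state" field reports running, completed, failed, or notfound. A sketch, with the service URL and request id format as assumptions:

```shell
# Assumed in-cluster service URL for the SolrCloud named solr-datalayer.
SOLR_URL="http://solr-datalayer-solrcloud-common.datalayer-solr/solr"

# Build the REQUESTSTATUS query for one collection's async restore ($1).
status_url() {
  echo "${SOLR_URL}/admin/collections?action=REQUESTSTATUS&requestid=restore-$1"
}

# The authenticated call would be:
# curl -u "${DATALAYER_SOLR_USERNAME}:${DATALAYER_SOLR_PASSWORD}" "$(status_url iam)"
# A completed restore returns a JSON body like the fragment below;
# extract the state field to check progress.
sample='{"status":{"state":"completed","msg":"found [restore-iam] in completed tasks"}}'
echo "$sample" | grep -o '"state":"[a-z]*"'
```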

Scale Datalayer Solr

Scale Solr replicas if needed.

kubectl scale \
  --replicas=5 \
  solrcloud/solr-datalayer \
  -n datalayer-solr

Tear Down Datalayer Solr

Tear down the created Solr Cloud if needed.

kubectl delete solrcloud solr-datalayer -n datalayer-solr
kubectl get solrcloud -A

Tear down the Solr Operator if needed.

plane down datalayer-solr-operator