Disaster Recovery for PGVector Langchain Application with Percona PostgreSQL
Prerequisites
- AWS CLI
- yq
- kubectl
- Nova Control Plane installed with 3 workload clusters connected
File paths below are relative to the try-nova root directory.
First, export these environment variables; the remaining steps in this tutorial assume they are set.
export NOVA_NAMESPACE=elotl
export NOVA_CONTROLPLANE_CONTEXT=nova
export NOVA_WORKLOAD_CLUSTER_1=wlc-1
export NOVA_WORKLOAD_CLUSTER_2=wlc-2
Export these additional environment variables if you installed Nova using the tarball.
export K8S_HOSTING_CLUSTER_CONTEXT=k8s-cluster-hosting-cp
export K8S_CLUSTER_CONTEXT_1=wlc-1
export K8S_CLUSTER_CONTEXT_2=wlc-2
Alternatively export these environment variables if you installed Nova using setup scripts provided in the try-nova repository.
export K8S_HOSTING_CLUSTER_CONTEXT=kind-hosting-cluster
export K8S_CLUSTER_CONTEXT_1=kind-wlc-1
export K8S_CLUSTER_CONTEXT_2=kind-wlc-2
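Before continuing, you can sanity-check that the variables are set. A small optional guard (the exports are repeated here with the tarball values so the snippet is self-contained; adjust them to your install method):

```shell
# Values from the exports above (tarball install shown); adjust to your setup.
export NOVA_NAMESPACE=elotl
export NOVA_CONTROLPLANE_CONTEXT=nova
export NOVA_WORKLOAD_CLUSTER_1=wlc-1
export NOVA_WORKLOAD_CLUSTER_2=wlc-2

# Flag any required variable that is unset or empty.
ENV_CHECK=ok
for v in NOVA_NAMESPACE NOVA_CONTROLPLANE_CONTEXT NOVA_WORKLOAD_CLUSTER_1 NOVA_WORKLOAD_CLUSTER_2; do
  if [ -z "$(printenv "$v")" ]; then
    echo "missing: $v" >&2
    ENV_CHECK=failed
  fi
done
echo "environment check: $ENV_CHECK"
```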
Setting Up S3 Access for Backups
Our first step involves setting up an S3 bucket for backups. Follow these commands to create a bucket and configure access:
- Create S3 bucket
REGION=eu-west-2
aws s3api create-bucket \
--bucket nova-postgresql-backup \
--region $REGION \
--create-bucket-configuration LocationConstraint=$REGION
- Create IAM Policy:
aws iam create-policy \
--policy-name read-write-list-s3-nova-postgresql-backup \
--policy-document file://examples/pgvector-disaster-recovery/s3-policy.json
- List Policies to Verify:
aws iam list-policies --query 'Policies[?PolicyName==`read-write-list-s3-nova-postgresql-backup`].Arn' --output text
- Create User and Attach Policy:
aws iam create-user --no-cli-pager --user-name s3-backup-service-account
POLICYARN=$(aws iam list-policies --query 'Policies[?PolicyName==`read-write-list-s3-nova-postgresql-backup`].Arn' --output text)
aws iam attach-user-policy \
--policy-arn $POLICYARN \
--user-name s3-backup-service-account
aws iam create-access-key --user-name s3-backup-service-account
NOTE: Before rerunning this tutorial, make sure the bucket is empty.
The last command returns the new access key:
{
"AccessKey": {
"UserName": "s3-backup-service-account",
"AccessKeyId": "AKIAXXXX",
"Status": "Active",
"SecretAccessKey": "VaC0xxxx",
"CreateDate": "2023-12-13T13:59:34+00:00"
}
}
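For reference, the IAM policy attached above (examples/pgvector-disaster-recovery/s3-policy.json) presumably grants read/write/list access scoped to the backup bucket. An illustrative sketch, not the authoritative file:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:ListBucket"],
      "Resource": "arn:aws:s3:::nova-postgresql-backup"
    },
    {
      "Effect": "Allow",
      "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"],
      "Resource": "arn:aws:s3:::nova-postgresql-backup/*"
    }
  ]
}
```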
Note down the AccessKeyId and SecretAccessKey values and substitute them into examples/pgvector-disaster-recovery/template-s3-bucket-access-key-secret.txt, then base64-encode the file:
base64 -i examples/pgvector-disaster-recovery/template-s3-bucket-access-key-secret.txt
Place the output in examples/pgvector-disaster-recovery/s3-access-secret.yaml.
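The resulting s3-access-secret.yaml wraps that base64 blob in an ordinary Kubernetes Secret. An illustrative sketch — the secret name and data key shown here are assumptions; keep whatever the repository's template and the cluster CRs actually use:

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: cluster1-pgbackrest-secrets   # assumed name; match the repo template
type: Opaque
data:
  # Paste the base64 output of the command above; the decoded content is the
  # filled-in template-s3-bucket-access-key-secret.txt.
  s3.conf: <base64-output-from-the-command-above>
```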
Uploading pgvector Postgres extension
To use pgvector, we need to place the extension archive in S3 so that the Percona operator can fetch and install it. For simplicity, this example reuses the backup bucket. Upload the archive, adjusting --profile and --endpoint-url for your environment (the endpoint shown below points at a local Minio):
aws s3 cp --profile my-aws-profile --endpoint-url http://172.18.255.240:9000 examples/pgvector-disaster-recovery/pgvector-pg15-0.5.1.tar.gz s3://nova-postgresql-backup/pgvector-pg15-0.5.1.tar.gz
Installing Percona PostgreSQL Operator
Now let's install the Percona PostgreSQL Operator and set up the clusters:
- Create Schedule Policies: The policies below schedule the PostgreSQL Operator to clusters 1 and 2, the primary PostgreSQL cluster to cluster 1, and the standby to cluster 2. HAProxy will also be scheduled to cluster 2.
kubectl --context=${NOVA_CONTROLPLANE_CONTEXT} create -f examples/pgvector-disaster-recovery/schedule-policies.yaml
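For orientation, each policy in schedule-policies.yaml is a Nova SchedulePolicy of roughly this shape. This is an illustrative sketch only — the field names follow Nova's policy API and may differ between Nova versions, and the selector labels shown are assumptions; the repository file is authoritative:

```yaml
apiVersion: policy.elotl.co/v1alpha1
kind: SchedulePolicy
metadata:
  name: psql-cluster-1
spec:
  # Send resources labeled psql-cluster=cluster-1 to workload cluster 1.
  clusterSelector:
    matchLabels:
      kubernetes.io/metadata.name: wlc-1
  resourceSelectors:
    labelSelectors:
      - matchLabels:
          psql-cluster: cluster-1
```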
- Clone Percona PostgreSQL Repository:
REPO_DIR="percona-postgresql-operator"
REPO_URL="https://github.com/percona/percona-postgresql-operator"
REPO_BRANCH="v2.3.0"
if [ -d "$REPO_DIR" ]; then
rm -rf $REPO_DIR
fi
git clone -b $REPO_BRANCH $REPO_URL
- Install the Percona PostgreSQL Operator:
echo "Creating operator namespace"
kubectl --context=${NOVA_CONTROLPLANE_CONTEXT} create ns pgvector-operator --dry-run=client -o yaml | yq e ".metadata.labels.psql-cluster = \"all\"" | kubectl --context=${NOVA_CONTROLPLANE_CONTEXT} apply -f -
echo "Installing operator to cluster all"
cat percona-postgresql-operator/deploy/bundle.yaml | python3 add_labels.py namespace psql-cluster all | python3 add_labels.py cluster psql-cluster all | kubectl --context=${NOVA_CONTROLPLANE_CONTEXT} --namespace pgvector-operator create -f -
When running on AWS use:
# echo "Setting up s3 access"
cat examples/pgvector-disaster-recovery/s3-access-secret.yaml | python3 add_labels.py namespace psql-cluster all | kubectl --context=${NOVA_CONTROLPLANE_CONTEXT} --namespace pgvector-operator create -f -
and when running locally with Minio:
# echo "Setting up s3 access"
cat examples/pgvector-disaster-recovery/s3-access-secret-minio.yaml | python3 add_labels.py namespace psql-cluster all | kubectl --context=${NOVA_CONTROLPLANE_CONTEXT} --namespace pgvector-operator create -f -
- Configure 2 PostgreSQL clusters
cat examples/pgvector-disaster-recovery/cluster_1_cr.yaml | python3 add_labels.py namespace psql-cluster cluster-1 | kubectl --context=${NOVA_CONTROLPLANE_CONTEXT} --namespace pgvector-operator create -f -
cat examples/pgvector-disaster-recovery/cluster_2_cr.yaml | python3 add_labels.py namespace psql-cluster cluster-2 | kubectl --context=${NOVA_CONTROLPLANE_CONTEXT} --namespace pgvector-operator create -f -
- Set up a load balancer in front of our databases. The load balancer keeps client connections working after a recovery switch is made. For this example we'll use HAProxy. We'll need the address of the active PostgreSQL cluster; to get it, run:
kubectl wait perconapgcluster/cluster1 -n pgvector-operator --context=${K8S_CLUSTER_CONTEXT_1} '--for=jsonpath={.status.host}' --timeout=300s
DB_HOST=$(kubectl --context=${K8S_CLUSTER_CONTEXT_1} get perconapgcluster/cluster1 -n pgvector-operator -o jsonpath='{.status.host}')
envsubst < "examples/pgvector-disaster-recovery/haproxy.cfg" > "./haproxy.cfg"
kubectl --context=${NOVA_CONTROLPLANE_CONTEXT} create configmap haproxy-config --from-file=haproxy.cfg=./haproxy.cfg --dry-run=client -o yaml | python3 add_labels.py namespace cluster cluster-ha-proxy | kubectl --context=${NOVA_CONTROLPLANE_CONTEXT} apply -f -
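For orientation, the haproxy.cfg template presumably amounts to a TCP pass-through on the PostgreSQL port, with ${DB_HOST} filled in by the envsubst step above. An illustrative sketch; the shipped examples/pgvector-disaster-recovery/haproxy.cfg is authoritative:

```
defaults
    mode tcp
    timeout connect 5s
    timeout client  30m
    timeout server  30m

frontend postgres
    bind *:5432
    default_backend active_db

backend active_db
    # ${DB_HOST} is substituted by envsubst before the ConfigMap is created.
    server pg1 ${DB_HOST}:5432 check
```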
Then apply the HAProxy Deployment and Service:
kubectl --context=${NOVA_CONTROLPLANE_CONTEXT} create -f examples/pgvector-disaster-recovery/haproxy.yaml