Version: v0.8.0

Disaster Recovery for PGVector Langchain Application with Percona PostgreSQL

Prerequisites

AWS CLI
yq
kubectl
git
python3
Nova Control Plane installed with 3 workload clusters connected

All file paths below are relative to the try-nova root directory.

Setting Up S3 Access for Backups

The first step is to create an S3 bucket for backups and configure access to it:

Create S3 Bucket:

REGION=eu-west-2

aws s3api create-bucket \
    --bucket nova-postgresql-backup \
    --region $REGION \
    --create-bucket-configuration LocationConstraint=$REGION
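The LocationConstraint argument above is required for regions other than us-east-1, but S3 rejects it when the region is us-east-1. If you pick a different region, a small helper can cover both cases. This is a sketch that assumes a configured AWS CLI; create_backup_bucket is a hypothetical name, not part of try-nova:

```shell
# Hypothetical helper: create an S3 bucket in the given region.
# us-east-1 rejects an explicit LocationConstraint, so the
# --create-bucket-configuration argument is only passed for other regions.
create_backup_bucket() {
    bucket="$1"
    region="$2"
    if [ "$region" = "us-east-1" ]; then
        aws s3api create-bucket --bucket "$bucket" --region "$region"
    else
        aws s3api create-bucket --bucket "$bucket" --region "$region" \
            --create-bucket-configuration "LocationConstraint=$region"
    fi
}
```

For this tutorial the call would be create_backup_bucket nova-postgresql-backup "$REGION".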

Create IAM Policy:

aws iam create-policy \
    --policy-name read-write-list-s3-nova-postgresql-backup \
    --policy-document file://examples/percona-disaster-recovery/s3-policy.json
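The contents of s3-policy.json are not reproduced in this guide. As a sketch of what a read, write and list policy scoped to the backup bucket could look like, the hypothetical helper below writes one. The action list is an assumption (DeleteObject is included on the assumption that backup retention has to remove old files), so compare it with the file shipped in the repository before using it:

```shell
# Hypothetical helper: write an IAM policy limited to one bucket.
# The real examples/percona-disaster-recovery/s3-policy.json may differ.
write_s3_policy() {
    bucket="$1"
    out="$2"
    cat > "$out" <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:ListBucket"],
      "Resource": ["arn:aws:s3:::${bucket}"]
    },
    {
      "Effect": "Allow",
      "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"],
      "Resource": ["arn:aws:s3:::${bucket}/*"]
    }
  ]
}
EOF
}
```

For example, write_s3_policy nova-postgresql-backup /tmp/s3-policy.json, then pass --policy-document file:///tmp/s3-policy.json to aws iam create-policy.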

List Policies to Verify:

aws iam list-policies --query 'Policies[?PolicyName==`read-write-list-s3-nova-postgresql-backup`].Arn' --output text

Create User and Attach Policy:

aws iam create-user --no-cli-pager --user-name s3-backup-service-account

POLICYARN=$(aws iam list-policies --query 'Policies[?PolicyName==`read-write-list-s3-nova-postgresql-backup`].Arn' --output text)
aws iam attach-user-policy \
    --policy-arn $POLICYARN \
    --user-name s3-backup-service-account
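If the policy lookup returns nothing (for example because create-policy failed), POLICYARN is empty and attach-user-policy fails with a less obvious error. A sketch of a guarded variant, assuming the AWS CLI; attach_policy_by_name is hypothetical, and --scope Local limits the listing to customer-managed policies:

```shell
# Hypothetical helper: look up a customer-managed policy by name and attach it
# to a user, failing loudly instead of passing an empty --policy-arn.
attach_policy_by_name() {
    policy_name="$1"
    user_name="$2"
    arn=$(aws iam list-policies --scope Local \
        --query "Policies[?PolicyName=='${policy_name}'].Arn" --output text)
    if [ -z "$arn" ] || [ "$arn" = "None" ]; then
        echo "policy ${policy_name} not found" >&2
        return 1
    fi
    aws iam attach-user-policy --policy-arn "$arn" --user-name "$user_name"
}
```

Usage: attach_policy_by_name read-write-list-s3-nova-postgresql-backup s3-backup-service-account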

aws iam create-access-key --user-name s3-backup-service-account

The command prints the new access key (values truncated here). The SecretAccessKey is only returned at creation time, so save it now:

{
    "AccessKey": {
        "UserName": "s3-backup-service-account",
        "AccessKeyId": "AKIAXXXX",
        "Status": "Active",
        "SecretAccessKey": "VaC0xxxx",
        "CreateDate": "2023-12-13T13:59:34+00:00"
    }
}

NOTE Before rerunning this tutorial, make sure that the bucket used is empty.

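Copy the AccessKeyId and SecretAccessKey values into examples/percona-disaster-recovery/template-s3-bucket-access-key-secret.txt, then base64-encode the file:

base64 -i examples/percona-disaster-recovery/template-s3-bucket-access-key-secret.txt

The -i flag is the macOS (BSD) syntax. With GNU coreutils on Linux, use base64 -w0 examples/percona-disaster-recovery/template-s3-bucket-access-key-secret.txt instead.

Paste the output into examples/percona-disaster-recovery/s3-access-secret.yaml.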
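The template and s3-access-secret.yaml are not shown in this guide. Assuming the template follows pgBackRest's S3 credential format (a [global] section with repo1-s3-key and repo1-s3-key-secret, as in Percona's examples) and that the Secret stores it under an s3.conf key, the encode-and-paste steps could be scripted as below. The helper name, the secret name and the key are assumptions; compare the output with the shipped s3-access-secret.yaml before replacing it:

```shell
# Hypothetical helper: render a Secret manifest holding pgBackRest S3 credentials.
# The secret name, the "s3.conf" key and the repo1-* option names are assumptions.
render_s3_secret() {
    secret_name="$1"
    key_id="$2"
    secret_key="$3"
    # Encode without line wrapping; piping through tr works with GNU and BSD base64.
    encoded=$(printf '[global]\nrepo1-s3-key=%s\nrepo1-s3-key-secret=%s\n' \
        "$key_id" "$secret_key" | base64 | tr -d '\n')
    cat <<EOF
apiVersion: v1
kind: Secret
metadata:
  name: ${secret_name}
type: Opaque
data:
  s3.conf: ${encoded}
EOF
}
```

For example: render_s3_secret cluster1-pgbackrest-secrets AKIAXXXX VaC0xxxx > /tmp/s3-access-secret.yaml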

Uploading pgvector Postgres extension

To use pgvector, the extension archive has to be in S3 so that the Percona operator can find and install it. For simplicity, this example uses the same bucket as the backups:

aws s3 cp --profile my-aws-profile --endpoint-url http://172.18.255.240:9000 examples/percona-disaster-recovery/pgvector-pg15-0.5.1.tar.gz s3://nova-postgresql-backup/pgvector-pg15-0.5.1.tar.gz

Replace my-aws-profile with your own AWS CLI profile. The --endpoint-url option points the CLI at a local MinIO instance; omit it when uploading to AWS S3.
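Because the MinIO and AWS uploads differ only in the endpoint, a hypothetical helper can build either command. This sketch assumes the AWS CLI takes credentials from AWS_PROFILE or its default profile:

```shell
# Hypothetical helper: upload the pgvector archive to the bucket.
# Pass an endpoint URL only for S3-compatible stores such as MinIO.
upload_extension() {
    archive="$1"
    bucket="$2"
    endpoint="$3"
    dest="s3://${bucket}/$(basename "$archive")"
    if [ -n "$endpoint" ]; then
        aws s3 cp --endpoint-url "$endpoint" "$archive" "$dest"
    else
        aws s3 cp "$archive" "$dest"
    fi
}
```

For MinIO, run export AWS_PROFILE=my-aws-profile and then upload_extension examples/percona-disaster-recovery/pgvector-pg15-0.5.1.tar.gz nova-postgresql-backup http://172.18.255.240:9000; for AWS, drop the last argument. Afterwards, aws s3 ls s3://nova-postgresql-backup/ (plus --endpoint-url for MinIO) should list the archive.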

Installing Percona PostgreSQL Operator

Now let's install the Percona PostgreSQL Operator and set up the clusters:

Create Schedule Policies: the policies below schedule the PostgreSQL operator to clusters 1 and 2, the primary PostgreSQL cluster to cluster 1, and the standby to cluster 2. HAProxy is also scheduled to cluster 2.

kubectl --context nova create -f examples/percona-disaster-recovery/schedule-policies.yaml

Clone Percona PostgreSQL Repository:

REPO_DIR="percona-postgresql-operator"
REPO_URL="https://github.com/percona/percona-postgresql-operator"
REPO_BRANCH="v2.3.0"

# Start from a fresh checkout of the pinned release
if [ -d "$REPO_DIR" ]; then
    rm -rf "$REPO_DIR"
fi

git clone -b "$REPO_BRANCH" "$REPO_URL" "$REPO_DIR"

Then install the Percona PostgreSQL Operator:

echo "Creating operator namespace"
kubectl --context nova create ns psql-operator --dry-run=client -o yaml | yq e ".metadata.labels.psql-cluster = \"all\"" | kubectl --context nova apply -f -

echo "Installing operator to cluster all"
cat percona-postgresql-operator/deploy/bundle.yaml | python3 add_labels.py namespace psql-cluster all | python3 add_labels.py cluster psql-cluster all | kubectl --context nova --namespace psql-operator create -f -

When running on AWS, use:

echo "Setting up S3 access"
cat examples/percona-disaster-recovery/s3-access-secret.yaml | python3 add_labels.py namespace psql-cluster all | kubectl --context nova --namespace psql-operator create -f -

When running locally with MinIO, use:

echo "Setting up S3 access"
cat examples/percona-disaster-recovery/s3-access-secret-minio.yaml | python3 add_labels.py namespace psql-cluster all | kubectl --context nova --namespace psql-operator create -f -
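The cluster manifests in the next step can only be created once the PerconaPGCluster CRD from bundle.yaml is registered. A sketch of a wait, assuming kubectl; wait_for_pg_crd is hypothetical, and the CRD name is derived from the perconapgclusters resource and the pgv2.percona.com group used elsewhere in this tutorial:

```shell
# Hypothetical helper: block until the PerconaPGCluster CRD is established,
# so that creating cluster custom resources does not fail with "no matches for kind".
wait_for_pg_crd() {
    context="$1"
    kubectl --context "$context" wait \
        --for=condition=Established \
        crd/perconapgclusters.pgv2.percona.com --timeout=120s
}
```

Usage: wait_for_pg_crd nova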

Create the two PostgreSQL clusters, cluster 1 (primary) and cluster 2 (standby):

cat examples/percona-disaster-recovery/cluster_1_cr.yaml | python3 add_labels.py namespace psql-cluster cluster-1 | kubectl --context nova --namespace psql-operator create -f -

cat examples/percona-disaster-recovery/cluster_2_cr.yaml | python3 add_labels.py namespace psql-cluster cluster-2 | kubectl --context nova --namespace psql-operator create -f -
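Before setting up the load balancer, you may want to wait until both clusters are up. A sketch, assuming the operator reports progress in .status.state and uses the value ready (check the kubectl get perconapgcluster output for your operator version); wait_for_pg_clusters is hypothetical:

```shell
# Hypothetical helper: wait until each named PerconaPGCluster reports "ready".
# The .status.state field and its "ready" value are assumptions.
wait_for_pg_clusters() {
    context="$1"
    shift
    for cluster in "$@"; do
        kubectl --context "$context" -n psql-operator wait \
            "perconapgclusters/${cluster}" \
            --for=jsonpath='{.status.state}'=ready --timeout=600s || return 1
    done
}
```

Usage: wait_for_pg_clusters nova cluster1 cluster2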

Set up a load balancer in front of the databases. It lets clients keep using the same address after the recovery switch is made. This example uses HAProxy, which needs the address of the active (primary) PostgreSQL cluster. To get it, run:

kubectl --context nova --namespace psql-operator get perconapgcluster

Then substitute the primary cluster's address and port for <address>:<port> on the db1 server line of the HAProxy config:

defaults
  mode tcp
  timeout connect 5000ms
  timeout client 50000ms
  timeout server 50000ms

frontend fe_main
  bind *:5432
  default_backend be_db_1

backend be_db_1
  server db1 <address>:<port> check

Save this config as examples/percona-disaster-recovery/haproxy.cfg and run the following command to create the ConfigMap:

kubectl --context nova create configmap haproxy-config --from-file=haproxy.cfg=examples/percona-disaster-recovery/haproxy.cfg --dry-run=client -o yaml | python3 add_labels.py namespace cluster cluster-ha-proxy | kubectl --context nova apply -f -
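Instead of editing the file by hand, the config can be rendered from the cluster host. A sketch, assuming the primary's address is exposed in .status.host (the field the recovery plan reads later) and PostgreSQL listens on port 5432; render_haproxy_cfg is hypothetical:

```shell
# Hypothetical helper: print an HAProxy config that forwards TCP port 5432
# to a single PostgreSQL server, using the same layout as the config above.
render_haproxy_cfg() {
    backend="$1"    # e.g. be_db_1
    server="$2"     # e.g. db1
    address="$3"    # host reported in .status.host
    port="${4:-5432}"
    cat <<EOF
defaults
  mode tcp
  timeout connect 5000ms
  timeout client 50000ms
  timeout server 50000ms

frontend fe_main
  bind *:5432
  default_backend ${backend}

backend ${backend}
  server ${server} ${address}:${port} check
EOF
}
```

For example, set HOST=$(kubectl --context nova -n psql-operator get perconapgclusters/cluster1 -o jsonpath='{.status.host}'), run render_haproxy_cfg be_db_1 db1 "$HOST" > examples/percona-disaster-recovery/haproxy.cfg, and then create the ConfigMap as shown above.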

Then create the HAProxy Deployment and Service:

kubectl --context nova create -f examples/percona-disaster-recovery/haproxy.yaml
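To confirm that clients can reach PostgreSQL through HAProxy, you can poll the proxy with pg_isready from the PostgreSQL client tools. A sketch, assuming the HAProxy Service address is reachable from your machine; wait_for_postgres is hypothetical and the address you pass in is a placeholder:

```shell
# Hypothetical helper: poll a PostgreSQL endpoint until it accepts
# connections or the number of attempts runs out (2 seconds apart).
wait_for_postgres() {
    host="$1"
    port="${2:-5432}"
    attempts="${3:-30}"
    i=0
    while [ "$i" -lt "$attempts" ]; do
        if pg_isready -h "$host" -p "$port" >/dev/null 2>&1; then
            return 0
        fi
        i=$((i + 1))
        sleep 2
    done
    return 1
}
```

Usage: wait_for_postgres <haproxy-address> 5432 && echo "HAProxy is forwarding connections"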

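Set up RecoveryPlan

The recovery plan below is triggered by alerts labeled app: percona-postgresql-cluster-1. Its steps switch cluster1 to standby, promote cluster2 to primary, read cluster2's host into Cluster2IP, and rewrite the HAProxy ConfigMap so that its backend points at that host.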

apiVersion: recovery.elotl.co/v1alpha1
kind: RecoveryPlan
metadata:
  name: psql-primary-failover-plan
spec:
  alertLabels:
    app: percona-postgresql-cluster-1
  steps:
    - type: patch  # set cluster 1 to standby
      patch:
        apiVersion: "pgv2.percona.com/v2"
        resource: "perconapgclusters"
        namespace: "psql-operator"
        name: "cluster1"
        override:
          fieldPath: "spec.standby.enabled"
          value:
            raw: true
        patchType: "application/merge-patch+json"
    - type: patch  # set cluster 2 as new primary
      patch:
        apiVersion: "pgv2.percona.com/v2"
        resource: "perconapgclusters"
        namespace: "psql-operator"
        name: "cluster2"
        override:
          fieldPath: "spec.standby.enabled"
          value:
            raw: false
        patchType: "application/merge-patch+json"
    - type: readField  # read cluster 2 host
      readField:
        apiVersion: "pgv2.percona.com/v2"
        resource: "perconapgclusters"
        namespace: "psql-operator"
        name: "cluster2"
        fieldPath: "status.host"
        outputKey: "Cluster2IP"
    - type: patch  # update HAProxy to point to cluster 2
      patch:
        apiVersion: "v1"
        resource: "configmaps"
        namespace: "default"
        name: "haproxy-config"
        override:
          fieldPath: "data"
          value:
            raw: {"haproxy.cfg": "defaults\n    mode tcp\n    timeout connect 5000ms\n    timeout client 50000ms\n    timeout server 50000ms\n\nfrontend fe_main\n    bind *:5432\n    default_backend be_db_2\n\nbackend be_db_2\n    server db2 {{ .Values.Cluster2IP }}:5432 check"}
        patchType: "application/merge-patch+json"

Let's run

The recovery plan reads the host of the standby cluster, so make sure the host has been assigned before proceeding:

kubectl wait perconapgclusters/cluster2 -n psql-operator --context nova '--for=jsonpath={.status.host}' --timeout=180s
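Waiting on a JSONPath without an expected value may be rejected by older kubectl releases. In that case a polling loop gives the same result. A sketch; wait_for_field is hypothetical, and its defaults (36 attempts, 5 seconds apart) roughly match the 180 second timeout above:

```shell
# Hypothetical helper: poll a JSONPath on a psql-operator resource until it
# returns a non-empty value, then print that value.
wait_for_field() {
    resource="$1"    # e.g. perconapgclusters/cluster2
    jsonpath="$2"    # e.g. {.status.host}
    attempts="${3:-36}"
    n=0
    while [ "$n" -lt "$attempts" ]; do
        value=$(kubectl --context nova -n psql-operator get "$resource" \
            -o "jsonpath=${jsonpath}" 2>/dev/null)
        if [ -n "$value" ]; then
            echo "$value"
            return 0
        fi
        n=$((n + 1))
        sleep 5
    done
    return 1
}
```

Usage: wait_for_field perconapgclusters/cluster2 '{.status.host}'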

Create the recovery plan:

kubectl --context nova create -f examples/percona-disaster-recovery/recovery-plan.yaml

In production, alerts are sent to Nova's recovery webhook by a monitoring system such as Prometheus with Alertmanager. To keep this tutorial simple, we simulate receiving an alert by creating it directly in Nova. When an alert is added, Nova looks for a recovery plan whose alertLabels match the alert's labels and, once it finds one, executes it:

kubectl --context nova create -f examples/percona-disaster-recovery/received-alert.yaml

Let's verify if recovery succeeded

Check that cluster 1 (the cluster we assume has failed) has been switched to standby:

kubectl wait perconapgclusters/cluster1 -n psql-operator --context nova '--for=jsonpath={.spec.standby.enabled}'=true --timeout=180s

Check that cluster 2 has taken over as primary, that is, standby is disabled:

kubectl wait perconapgclusters/cluster2 -n psql-operator --context nova '--for=jsonpath={.spec.standby.enabled}'=false --timeout=180s

Check that HAProxy now points to the new primary, cluster 2:

kubectl get cm/haproxy-config --context nova -n default -o jsonpath='{.data.haproxy\.cfg}' | grep 'server db2'

Expected output (the address will differ in your environment):

server db2 172.18.255.240:5432 check
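The three checks can be combined into one command that reports which part of the failover, if any, did not happen. A sketch, assuming the same context and namespaces as the commands above; verify_failover is hypothetical and reads the current values once instead of waiting:

```shell
# Hypothetical helper: verify the failover in one go and print the first failure.
verify_failover() {
    ctx="--context=nova"
    c1=$(kubectl "$ctx" -n psql-operator get perconapgclusters/cluster1 \
        -o jsonpath='{.spec.standby.enabled}')
    c2=$(kubectl "$ctx" -n psql-operator get perconapgclusters/cluster2 \
        -o jsonpath='{.spec.standby.enabled}')
    cfg=$(kubectl "$ctx" -n default get cm/haproxy-config \
        -o jsonpath='{.data.haproxy\.cfg}')
    [ "$c1" = "true" ] || { echo "cluster1 is not in standby" >&2; return 1; }
    [ "$c2" = "false" ] || { echo "cluster2 is still in standby" >&2; return 1; }
    printf '%s\n' "$cfg" | grep -q 'server db2 ' \
        || { echo "HAProxy does not point at db2" >&2; return 1; }
    echo "failover verified"
}
```

Usage: verify_failover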

Cleanup

Remove the resources created in this tutorial:

kubectl --context nova delete -f examples/percona-disaster-recovery/received-alert.yaml

kubectl --context nova delete -f examples/percona-disaster-recovery/recovery-plan.yaml

kubectl --context nova delete -f examples/percona-disaster-recovery/haproxy.yaml

kubectl --context nova create configmap haproxy-config --from-file=haproxy.cfg=examples/percona-disaster-recovery/haproxy.cfg --dry-run=client -o yaml | python3 add_labels.py namespace cluster cluster-ha-proxy | kubectl --context nova delete -f -

cat examples/percona-disaster-recovery/cluster_1_cr.yaml | python3 add_labels.py namespace psql-cluster cluster-1 | kubectl --context nova --namespace psql-operator delete -f -

cat examples/percona-disaster-recovery/cluster_2_cr.yaml | python3 add_labels.py namespace psql-cluster cluster-2 | kubectl --context nova --namespace psql-operator delete -f -

cat percona-postgresql-operator/deploy/bundle.yaml | python3 add_labels.py namespace psql-cluster all | python3 add_labels.py cluster psql-cluster all | kubectl --context nova --namespace psql-operator delete -f -

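cat examples/percona-disaster-recovery/s3-access-secret.yaml | python3 add_labels.py namespace psql-cluster all | kubectl --context nova --namespace psql-operator delete -f -

If you used MinIO, delete examples/percona-disaster-recovery/s3-access-secret-minio.yaml instead.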

kubectl --context nova create ns psql-operator --dry-run=client -o yaml | yq e ".metadata.labels.psql-cluster = \"all\"" | kubectl --context nova delete -f -

kubectl --context nova delete -f examples/percona-disaster-recovery/schedule-policies.yaml
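Before rerunning the tutorial, the backup bucket has to be empty. A sketch of a helper that empties it, assuming the AWS CLI; empty_backup_bucket is hypothetical. It also removes the pgvector archive uploaded earlier, which then has to be uploaded again:

```shell
# Hypothetical helper: delete every object in the backup bucket, optionally
# against an S3-compatible endpoint such as MinIO.
empty_backup_bucket() {
    bucket="$1"
    endpoint="$2"
    if [ -n "$endpoint" ]; then
        aws s3 rm "s3://${bucket}" --recursive --endpoint-url "$endpoint"
    else
        aws s3 rm "s3://${bucket}" --recursive
    fi
}
```

Usage: empty_backup_bucket nova-postgresql-backup (add http://172.18.255.240:9000 as a second argument for MinIO). If some resources were never created, adding --ignore-not-found to the kubectl delete commands above keeps them from failing.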
