Great – here is the updated GitLab issue draft incorporating your inputs:
Title
etcd Upgrade for PostgreSQL Clusters – 39 Servers (Weekend Maintenance)
Description
We need to upgrade the etcd version used by the PostgreSQL Patroni clusters across our environment to address security vulnerabilities and improve stability.
The etcd version currently deployed on multiple clusters is outdated and must be upgraded to a supported, secure release.
This activity covers validation and upgrade of the 39 servers hosting etcd for PostgreSQL clusters.
Scope
Total servers in scope: 39
Environments: Production / Staging / Non-Production
Upgrade of etcd on all PostgreSQL Patroni clusters
Health validation of PostgreSQL services post-upgrade
Testing of failover and cluster stability
Reason for Upgrade
Mitigation of known vulnerabilities in existing etcd versions
Stability and performance improvements
Compatibility with recommended Patroni/PostgreSQL HA configurations
Alignment with global infrastructure standards
Reference:
https://dev.oud.ubc.net/bbs/fs/hosting/database/apac-db-reliability/issues/224
Maintenance Window
Activity to be performed during weekend maintenance window
Exact schedule to be coordinated with application teams
Changes will be executed in a phased and controlled manner
Implementation Plan
Identify current etcd version on all 39 servers
Notify stakeholders and finalize maintenance schedule
Take configuration and data backups
Perform rolling upgrade of etcd nodes cluster by cluster
Validate:
etcd cluster health
Patroni status
PostgreSQL connectivity
Perform controlled failover tests
Post-upgrade monitoring and verification
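The first step of the plan (identifying the current etcd version on every server) could be sketched in Python roughly as follows. This is only an illustration under stated assumptions: it reads the `etcdserver` field from the JSON that etcd's `/version` endpoint returns, it assumes plain numeric version strings such as 3.4.13, and the cluster names, hostnames, and the target version 3.5.9 are hypothetical placeholders because the real current and target versions are not yet confirmed.

```python
import json
from collections import defaultdict


def parse_version(text):
    """Turn '3.4.13' (or 'v3.4.13') into (3, 4, 13) for numeric comparison."""
    return tuple(int(part) for part in text.strip().lstrip("v").split("."))


def server_version(version_json):
    """Extract the server version from an etcd /version response body."""
    return json.loads(version_json)["etcdserver"]


def upgrade_plan(inventory, target):
    """Group the servers that are below the target version by cluster.

    inventory: iterable of (cluster, host, version_string) tuples.
    Returns {cluster: [host, ...]} with clusters and hosts in sorted order.
    """
    wanted = parse_version(target)
    plan = defaultdict(list)
    for cluster, host, version in inventory:
        if parse_version(version) < wanted:
            plan[cluster].append(host)
    return {cluster: sorted(hosts) for cluster, hosts in sorted(plan.items())}


# Hypothetical sample data; real values would be collected from each server.
sample_response = '{"etcdserver": "3.4.13", "etcdcluster": "3.4.0"}'
inventory = [
    ("pg-cluster-a", "etcd-a2", "3.4.13"),
    ("pg-cluster-a", "etcd-a1", server_version(sample_response)),
    ("pg-cluster-b", "etcd-b1", "3.5.9"),
    ("pg-cluster-b", "etcd-b2", "3.3.27"),
]

# "3.5.9" is a placeholder target, not a confirmed version.
print(upgrade_plan(inventory, "3.5.9"))
```

Grouping by cluster keeps the output aligned with the cluster-by-cluster rollout described in the plan, and servers already on the target version drop out of the list.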
Rollback Plan
Revert etcd to previous version if any instability occurs
Restore prior configuration backups
Revalidate cluster functionality
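The interaction between the rolling upgrade and this rollback plan can be sketched as a health-gated loop in Python. This is a minimal sketch, not a tested runbook: `upgrade`, `is_healthy`, and `rollback` are hypothetical callables that the team would supply (for example wrappers around package installation and an etcd health check), and whether reverting a binary alone is sufficient depends on the etcd versions involved, which are not yet known.

```python
def rolling_upgrade(members, upgrade, is_healthy, rollback):
    """Upgrade members one at a time, gating each step on cluster health.

    Returns (ok, upgraded). ok is False when a health check failed;
    upgraded lists the members left on the new version afterwards.
    """
    # Do not start on a cluster that is already unhealthy.
    if not is_healthy():
        return False, []

    upgraded = []
    for member in members:
        upgrade(member)
        upgraded.append(member)
        if not is_healthy():
            # Revert every member touched so far, most recent first.
            for done in reversed(upgraded):
                rollback(done)
            return False, []
    return True, upgraded


# Demonstration with fake callables: the health check fails after the
# second member, so both upgraded members are rolled back.
events = []
checks = iter([True, True, False])
result = rolling_upgrade(
    ["etcd-a1", "etcd-a2", "etcd-a3"],  # hypothetical member names
    upgrade=lambda m: events.append(("upgrade", m)),
    is_healthy=lambda: next(checks),
    rollback=lambda m: events.append(("rollback", m)),
)
print(result, events)
```

Passing the actions in as callables keeps the control flow separate from the site-specific commands, so the same loop can be rehearsed in Staging with stubs before it is used in Production.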
Teams Involved
Database Operations Team
Global Infrastructure Team
Respective Application Owners (for validation)
Acceptance Criteria
All 39 servers successfully upgraded to target etcd version
PostgreSQL clusters fully operational post-upgrade
Patroni leader election and failover working as expected
No unplanned downtime or data inconsistency
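Some of these criteria could be checked mechanically. The Python sketch below is illustrative only: it assumes the member list has the shape returned by Patroni's REST `/cluster` endpoint (a `members` array with `name`, `role`, and `state` fields), treats `running` and `streaming` as healthy states, and uses hypothetical member names and a placeholder target version of 3.5.9. Role and state names can differ between Patroni releases, so they should be confirmed against the deployed version.

```python
import json

# Assumed role names for the member that holds the leader lock.
LEADER_ROLES = {"leader", "standby_leader"}


def check_patroni_cluster(cluster_json, healthy_states=("running", "streaming")):
    """Return a list of problems found in a Patroni /cluster response."""
    members = json.loads(cluster_json).get("members", [])
    problems = []

    leaders = [m["name"] for m in members if m.get("role") in LEADER_ROLES]
    if len(leaders) != 1:
        problems.append(f"expected exactly one leader, found {len(leaders)}")

    for m in members:
        if m.get("state") not in healthy_states:
            problems.append(f"{m['name']} is in state {m.get('state')!r}")
    return problems


def servers_not_on_target(versions, target):
    """Return the hosts whose etcd version differs from the target."""
    return sorted(host for host, version in versions.items() if version != target)


# Hypothetical sample data for a quick demonstration.
healthy = json.dumps({"members": [
    {"name": "pg1", "role": "leader", "state": "running"},
    {"name": "pg2", "role": "replica", "state": "streaming"},
]})
degraded = json.dumps({"members": [
    {"name": "pg1", "role": "replica", "state": "running"},
    {"name": "pg2", "role": "replica", "state": "stopped"},
]})

print(check_patroni_cluster(healthy))
print(check_patroni_cluster(degraded))
print(servers_not_on_target({"etcd-a1": "3.5.9", "etcd-a2": "3.4.13"}, "3.5.9"))
```

An empty problem list for every cluster, together with an empty list of servers off the target version, would correspond to the first three criteria; downtime and data consistency still need to be confirmed by the application owners.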
Priority
High
Timeline
Planning and communication:
Execution window: Weekend
Post validation: Following business day
If you share the current etcd version and target version, I can refine the issue further with exact technical details 👍