As part of the preemptive troubleshooting effort for the PostgreSQL issue scheduled this weekend, I conducted a database restoration simulation to identify possible resolutions. Below are the findings and key steps documented:
Significance of
postgresql.conf_with_recovery_command:- This file is a new addition in PostgreSQL versions greater than 12. When a database is restored, it generates this configuration file with critical recovery parameters.
- To ensure a successful startup, these parameters need to be manually copied into the primary
postgresql.conffile. - Official documentation reference: Commvault PostgreSQL Restore Documentation
- Detailed simulation and steps are documented in Confluence: [internal link].
Base Configuration Update Requirement:
- If the database is opened using
pg_ctlwithout copying the recovery parameters, it fails to initialize properly. Updating the base configuration manually resolves this issue.
- If the database is opened using
Recovery Mode Persistence and WAL File Validation:
- In cases where the database remains in recovery mode (both with
pg_ctland Patroni), a background process validation can pinpoint the missing WAL file. - If the WAL file no longer exists, the database gets stuck in a replica+starting mode.
- Resolution: Manually promoting the instance using
pg_ctlsuccessfully fixed the issue. Patroni, however, did not permit the promotion. This behavior appears analogous to an "incomplete cancel-based recovery." - Additional details are available in [internal page].
- In cases where the database remains in recovery mode (both with
Next Steps and Recommendations:
It would be helpful if more SREs could test these scenarios across various clusters. If we observe a consistent pattern in solutions, we can standardize a common SOP for PostgreSQL recovery. This would greatly enhance efficiency during future restoration efforts.
Please let me know if you'd like to discuss this further or need additional details.





0 comments:
Post a Comment