Thursday, 5 December 2024

Description of the Issue:

We are experiencing replication latency in our Patroni-managed PostgreSQL cluster. The application team reported the issue after querying a view on a replica node: rows that were already visible on the leader (primary) were missing on the replica and appeared there only after a delay.

This latency is a concern for application features that depend on near real-time replication.

Setup Details:

  • Environment: Patroni-managed PostgreSQL High Availability (HA) Cluster
  • Replication Mode: Asynchronous
  • Primary-Replica Configuration: 1 leader (primary) and 2 replicas
  • Patroni Version: [Specify Version]
  • PostgreSQL Version: [Specify Version]
  • Replication Slots: Enabled/Disabled (specify)
  • Network Setup: Nodes are within the same data center / across regions (specify).

Observations:

  1. pg_last_xact_replay_timestamp():

    • On the replica, the last replayed transaction timestamp was older than expected and lagged behind the leader, even while transactions were actively committing on the primary.
  2. Behavior:

    • Data committed on the leader was visible immediately.
    • A noticeable delay occurred before the data was available on the replica.
    • No significant load or downtime was observed during this period.
  3. WAL Replication:

    • WAL records appear to be streaming to the replicas, but replay (apply) of the received WAL on the replica may be delayed.
  4. System Logs:

    • No errors or warnings related to replication in PostgreSQL logs or Patroni logs during the observed delay.
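To put numbers on observations 1 and 3, the sketch below separates "WAL received but not yet replayed" from plain timestamp age. It assumes the four values are fetched on the replica with the query shown (any PostgreSQL driver would do); `lsn_to_int` and `replica_lag` are hypothetical helper names, not part of PostgreSQL or Patroni. Note that a replica cannot see WAL it has not received yet, so a zero result here only rules out a replay backlog.

```python
from datetime import datetime, timezone

# Run on the replica. These functions exist under these names in PostgreSQL 10+.
REPLICA_QUERY = """
SELECT pg_last_wal_receive_lsn(),
       pg_last_wal_replay_lsn(),
       pg_last_xact_replay_timestamp(),
       now();
"""

def lsn_to_int(lsn: str) -> int:
    """Convert an LSN such as '16/B374D848' to an absolute byte position."""
    high, low = lsn.split("/")
    return (int(high, 16) << 32) + int(low, 16)

def replica_lag(receive_lsn, replay_lsn, replay_ts, now):
    """Return (bytes received but not replayed, replay delay in seconds).

    When everything received has been replayed, the delay is reported as 0.0:
    an old replay timestamp then only means no newer commit has arrived.
    """
    pending = lsn_to_int(receive_lsn) - lsn_to_int(replay_lsn)
    if pending == 0:
        return 0, 0.0
    if replay_ts is None:  # nothing has been replayed since the replica started
        return pending, None
    return pending, (now - replay_ts).total_seconds()
```

A persistently non-zero first value alongside a growing second value would point at replay on the replica rather than at the network or the WAL sender.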

Actions Taken So Far:

  • Verified replication status on the leader using the pg_stat_replication view.
  • Monitored resource utilization (CPU, memory, I/O) on the replica nodes; no significant bottlenecks were found.
  • Checked network latency between the leader and the replicas; no anomalies were detected.
  • Ensured that WAL archiving and replication settings align with recommended configurations.
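The pg_stat_replication check above can be taken one step further by breaking the gap down per stage. The following is a minimal sketch, assuming one row per replica is fetched on the leader with the query shown and passed in as a dict of LSN strings; `stage_gaps` and `slowest_stage` are hypothetical names chosen for illustration.

```python
# Run on the leader. Column names are those of pg_stat_replication in PostgreSQL 10+.
PRIMARY_QUERY = """
SELECT application_name,
       pg_current_wal_lsn() AS current_lsn,
       sent_lsn, write_lsn, flush_lsn, replay_lsn
FROM pg_stat_replication;
"""

def lsn_to_int(lsn: str) -> int:
    """Convert an LSN such as '0/5000400' to an absolute byte position."""
    high, low = lsn.split("/")
    return (int(high, 16) << 32) + int(low, 16)

def stage_gaps(row: dict) -> dict:
    """Bytes of WAL pending at each replication stage for one replica."""
    current, sent, write, flush, replay = (
        lsn_to_int(row[key])
        for key in ("current_lsn", "sent_lsn", "write_lsn", "flush_lsn", "replay_lsn")
    )
    return {
        "not_sent": current - sent,       # generated on the leader, not yet sent
        "not_written": sent - write,      # sent, not yet written by the replica
        "not_flushed": write - flush,     # written, not yet flushed to disk
        "not_replayed": flush - replay,   # flushed, not yet applied
    }

def slowest_stage(row: dict) -> str:
    """Name of the stage holding the most pending WAL."""
    gaps = stage_gaps(row)
    return max(gaps, key=gaps.get)
```

Sampling this every few seconds during an incident would show whether the backlog sits in sending, in replica disk writes, or in replay. The write_lag, flush_lag, and replay_lag columns of the same view give the equivalent picture as time intervals.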

Request for Assistance:

We would like assistance from the EDB team in investigating the following:

  1. Possible causes of replication delays in this setup.
  2. Guidance on tuning replication in a Patroni-managed PostgreSQL cluster to minimize latency.
  3. Suggestions for any additional diagnostic steps or logs to examine for identifying root causes.
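As one possible additional diagnostic step, the sketch below flags replica settings that can delay WAL replay even when streaming is healthy. It is an assumption on our side that these are relevant here, not a confirmed root cause. The settings are read from pg_settings on the replica (recovery_min_apply_delay appears there from PostgreSQL 12; older versions keep it in recovery.conf), and `replay_delay_suspects` is a hypothetical helper name.

```python
# Run on the replica. Both settings are reported by pg_settings in milliseconds.
SETTINGS_QUERY = """
SELECT name, setting
FROM pg_settings
WHERE name IN ('recovery_min_apply_delay', 'max_standby_streaming_delay');
"""
PAUSED_QUERY = "SELECT pg_is_wal_replay_paused();"

def replay_delay_suspects(settings: dict, replay_paused: bool) -> dict:
    """Map each suspicious item to a short explanation.

    `settings` maps setting names to the string values returned by pg_settings.
    """
    suspects = {}
    if replay_paused:
        suspects["replay_paused"] = "WAL replay is currently paused on this replica"
    apply_delay = int(settings.get("recovery_min_apply_delay", "0"))
    if apply_delay > 0:
        suspects["recovery_min_apply_delay"] = (
            f"replay is deliberately delayed by {apply_delay} ms"
        )
    conflict_wait = int(settings.get("max_standby_streaming_delay", "30000"))
    if conflict_wait == -1:
        suspects["max_standby_streaming_delay"] = (
            "replay waits indefinitely for conflicting replica queries"
        )
    elif conflict_wait > 0:
        suspects["max_standby_streaming_delay"] = (
            f"replay may wait up to {conflict_wait} ms for conflicting replica queries"
        )
    return suspects
```

Because the delay was noticed while querying a view on the replica, the max_standby_streaming_delay case may be worth comparing against the counters in pg_stat_database_conflicts on that node.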

Please let us know if you need further details about our environment or configurations.

Urgency Level: Medium / High (based on application needs)

Thank you for your assistance in resolving this issue.
