September 2024 ~ JAVA J2EE 4 U

Monday, 30 September 2024

It looks like the vendor replied on August 27th with some details related to the information I shared with them last Friday. The case remained unacknowledged on our end. I've responded to them today and provided further details to move things forward.

Posted By Nikhil19:53

Friday, 27 September 2024

Posted By Nikhil20:14

Posted By Nikhil20:12

Thursday, 26 September 2024

#!/bin/bash

# Variables
DB_NAME="your_database_name"         # Replace with your PostgreSQL database name
DB_USER="your_db_user"               # Replace with the PostgreSQL user
OUTPUT_FILE="/path/to/flat_file.log" # Define the path to the flat file
HOST="localhost"                     # Hostname of the PostgreSQL server, adjust if necessary
PORT="5432"                          # Port for PostgreSQL (default: 5432)

# Date and time format
DATE=$(date '+%Y-%m-%d')
TIME=$(date '+%H:%M:%S')

# Capture the database size using psql (-t prints tuples only; tr strips the padding spaces)
DB_SIZE=$(psql -U "$DB_USER" -h "$HOST" -p "$PORT" -d "$DB_NAME" -t -c "SELECT pg_size_pretty(pg_database_size('$DB_NAME'));" | tr -d ' ')

# Check if DB_SIZE is not empty or null
if [[ -z "$DB_SIZE" ]]; then
    echo "[$DATE $TIME] ERROR: Could not fetch the database size" >> "$OUTPUT_FILE"
else
    # Write date, time, and size to the output file
    echo "[$DATE $TIME] Database: $DB_NAME Size: $DB_SIZE" >> "$OUTPUT_FILE"
fi
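
The flat file stores a pretty-printed size, which is easy to read but awkward to do arithmetic on. A minimal sketch of how growth between two runs could be computed, assuming the log also carried a raw byte count in a `Bytes:` field (for example from `SELECT pg_database_size('$DB_NAME')`). The `Bytes:` field and the function name `growth_between_last_two` are hypothetical and not part of the script above.

```bash
#!/bin/bash
# Hypothetical helper: print the growth in bytes between the last two readings.
# Assumes log lines such as:
#   [2024-09-26 19:16:00] Database: appdb Bytes: 1048576
# The "Bytes:" field is an assumption; the original script logs "Size:" only.
growth_between_last_two() {
    local log_file="$1" prev="" curr="" line
    while IFS= read -r line; do
        # Only lines that carry a numeric "Bytes:" field count as readings.
        if [[ "$line" =~ Bytes:\ ([0-9]+) ]]; then
            prev="$curr"
            curr="${BASH_REMATCH[1]}"
        fi
    done < "$log_file"

    if [[ -z "$prev" ]]; then
        echo "not enough readings" >&2
        return 1
    fi
    # Bash arithmetic is 64-bit, so multi-gigabyte byte counts are fine.
    echo $(( curr - prev ))
}
```

Calling `growth_between_last_two /path/to/flat_file.log` after each run would give a per-interval growth figure that can be attached to a vendor case.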

Posted By Nikhil19:16

I've noticed that the password file for our database has been changing weekly. I'm reaching out to see if there is a specific process or reason for these changes. While I'm not certain if these modifications were made intentionally, they are causing the standby database to fall out of sync and triggering alerts related to the database being down.

If there’s a procedure in place that requires these updates, please let me know. Understanding the context would be really helpful in addressing the alerts and ensuring everything remains stable.

Posted By Nikhil19:00

Wednesday, 25 September 2024

I wanted to provide a quick handover regarding the recent database incident.

Incident Summary

The database was in a starting state on the leader site, and the standby was not in sync.
We engaged the vendor via a call, and they requested the logs, which have already been uploaded.

Current Actions

The instance has been restarted, and the new leader is [New Leader Name].
The rebuild of the other site is currently in progress.
We have decided to stop the old leader, [Old Leader Name], and will not restart it.

Next Steps

I raised a separate case with the vendor to investigate the following:
- Database inflation
- Possible solutions for out-of-sync issues at specific times
- Checkpointing messages noted in the log file

Please let me know if you need any further information.

Posted By Nikhil22:42

Request for Support: Investigation of Rapid Database Growth and Checkpoint Issues

Dear [Vendor's Support Team/Specific Contact Name],

I hope this message finds you well. We are reaching out to seek your expertise regarding some issues we've been experiencing with our PostgreSQL database that require further investigation.

Issue Overview

Our PostgreSQL database has been growing rapidly, which is a concern. Our initial suspicion is temporary file usage or table bloat, but we would appreciate your guidance on identifying all possible causes of this growth and the best practices to mitigate it.

Specific Observations

Rapid Database Growth:
- The database size has increased significantly in a short period. We are particularly interested in understanding the contributing factors, including:
  - Tempfile creation
  - Table bloat (especially in frequently updated tables)
  - Large transactions or queries that might be creating excessive temporary data
Frequent Checkpoints:
- We have also noticed that the database experiences a higher number of checkpoints at specific times, which seems to coincide with periods when the database goes out of sync temporarily. We would like to understand:
  - The relationship between checkpoint frequency and database performance
  - Potential configuration adjustments to optimize checkpoint settings
  - Best practices for maintaining database consistency during high-load scenarios

Request for Support

We would greatly appreciate your insights into the following:

- Possible reasons behind the rapid growth of our database.
- Recommendations for monitoring and addressing tempfiles and bloat.
- Guidance on configuring checkpoint settings to minimize disruption and ensure data synchronization.
- Any additional best practices you might suggest for maintaining overall database health.

We look forward to your advice and appreciate your support in helping us resolve these issues. Please let us know if you need any specific logs or metrics from our side to assist in the investigation.

Thank you for your attention to this matter.
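
Metrics like the following could be attached to such a request. This is a minimal sketch, assuming the standard PostgreSQL statistics views are available. The function names are hypothetical, and each function only prints a query so it can be reviewed before being piped to `psql`.

```bash
#!/bin/bash
# Each function only prints a query; nothing here connects to a database.

# Temporary file usage accumulated for the current database.
temp_usage_sql() {
    cat <<'SQL'
SELECT datname, temp_files, pg_size_pretty(temp_bytes) AS temp_size
FROM pg_stat_database
WHERE datname = current_database();
SQL
}

# Tables with the most dead tuples, a rough first indicator of bloat.
dead_tuples_sql() {
    cat <<'SQL'
SELECT relname, n_live_tup, n_dead_tup, last_autovacuum
FROM pg_stat_user_tables
ORDER BY n_dead_tup DESC
LIMIT 10;
SQL
}

# Timed versus requested checkpoints.
# Assumption: PostgreSQL 16 or earlier; version 17 moved these counters
# to the pg_stat_checkpointer view.
checkpoint_sql() {
    cat <<'SQL'
SELECT checkpoints_timed, checkpoints_req
FROM pg_stat_bgwriter;
SQL
}
```

A possible invocation is `checkpoint_sql | psql -X -U "$DB_USER" -d "$DB_NAME"`. A `checkpoints_req` count that is high relative to `checkpoints_timed` suggests checkpoints are being triggered by WAL volume rather than by the timeout, which is worth mentioning to the vendor.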

Posted By Nikhil21:48

Tuesday, 24 September 2024

Write a mail to your team about the recent incident with the Postgres cluster. The Postgres cluster appeared in a starting state today while the replicas were not in sync; this is potentially why the leader was not able to switch to the available sites. No alert was received for this; the situation was only noticed when I had to provide inputs to the vendor for a previous case raised last week. Currently you have raised a case with EDB to find out what can be done to fix it, and you have also opened a channel for application owners to inform them and confirm the impact. Looking at the logs, it appears that there were excessive checkpoints, which potentially caused the standby to fall out of sync, but the reason for the termination of the leader instance is still unknown.

Subject: Update on Postgres Cluster Incident

Team,

I wanted to inform you about a recent incident involving our Postgres cluster. Today, the cluster was found in a starting state, and the replicas were not in sync. This may explain why the leader instance was unable to switch to the available sites.

Unfortunately, no alerts were received regarding this issue. I discovered it while preparing inputs for a vendor related to a previous case raised last week.

I have opened a case with EDB to explore potential solutions and have also created a channel for application owners to keep them informed and to assess the impact of this situation.

Initial log analysis suggests that excessive checkpoint activity may have caused the standby to fall out of sync, but the termination of the leader instance remains unclear at this time.

I will keep you updated as more information becomes available. Please let me know if you have any questions or need further clarification.

Posted By Nikhil19:59

Thursday, 19 September 2024

We received an alert indicating that the filesystem threshold was reached. The base directory and database sizes are large, but currently, we are below the threshold, so no action is needed at this time. The pg_wal has been drained as the replica database is now in sync.

I've opened a case with the vendor to explore precautionary measures and potential reasons for this issue.
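
A check like the one that raised this alert can be sketched in a few lines. This is only an illustration, assuming a POSIX `df`. The function names and the threshold values are hypothetical and are not the actual monitoring configuration.

```bash
#!/bin/bash
# Print the used percentage (digits only) of the filesystem holding a path.
fs_usage_percent() {
    # df -P prints one POSIX-format line per filesystem; column 5 is the
    # capacity, for example "42%".
    df -P "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Compare the usage against a limit; return 1 when the limit is reached.
check_threshold() {
    local path="$1" limit="$2" used
    used=$(fs_usage_percent "$path")
    # Guard against an empty or non-numeric reading (for example a bad path).
    [[ "$used" =~ ^[0-9]+$ ]] || return 2
    if (( used >= limit )); then
        echo "WARN: $path is ${used}% full (limit ${limit}%)"
        return 1
    fi
    echo "OK: $path is ${used}% full (limit ${limit}%)"
}
```

For example, `check_threshold "$PGDATA/pg_wal" 80` would flag a `pg_wal` directory that is filling up while a replica is behind, assuming `PGDATA` points at the data directory.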

Posted By Nikhil21:11

One of the application databases is growing fast and has filled the filesystem. Write a mail to the application team to inform them of this and request that the space be increased with the help of the vism team. Explain the risk: the extra space is necessary, otherwise there may be an outage once the filesystem is completely full. Also mention that a large number of operations seem to have happened over the last few days, which could have caused the size to grow and the replica to fall out of sync, and ask them about this. Also ask whether they have been creating LOBs and, if so, to clear the orphaned LOBs that are not in use. Also suggest that it is advisable to perform a vacuum once the space is added.

Subject: Urgent: Action Required for Database Space and Synchronization Issues

Dear Application Team,

I hope this message finds you well. I want to bring to your attention that our application database is growing rapidly and has nearly occupied the entire filesystem. To prevent a potential outage, it is crucial to increase the available space in collaboration with the VISM team as soon as possible.

Additionally, I’ve noticed a significant number of operations occurring over the past few days, which may have contributed to the database size increase. This activity might also be causing our replica to fall out of sync. Could you please provide insights into the recent operations?

Furthermore, if LOBs have been created, please ensure that any orphaned LOBs that are no longer in use are cleared out to free up space.

Once additional space has been allocated, I recommend performing a vacuum so that the space held by dead rows can be reused and table bloat is kept in check.

Thank you for your prompt attention to these matters. Please let me know if you need any assistance in coordinating with the VISM team.
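
The LOB cleanup and vacuum steps could look roughly like this. It is a sketch under two assumptions: that "LOBs" here means PostgreSQL large objects, and that the `vacuumlo` contrib utility is installed alongside `vacuumdb`. The wrapper names `run` and `cleanup_after_space_added` are hypothetical, and by default the script only prints the commands.

```bash
#!/bin/bash
# With DRY_RUN=1 (the default) commands are printed, not executed.
run() {
    if [[ "${DRY_RUN:-1}" == "1" ]]; then
        echo "+ $*"
    else
        "$@"
    fi
}

cleanup_after_space_added() {
    local db="$1"
    # vacuumlo -n only reports orphaned large objects without removing them;
    # drop -n once the report has been reviewed with the application team.
    run vacuumlo -n -v "$db"
    # Vacuum and refresh planner statistics for the whole database.
    run vacuumdb --analyze --verbose -d "$db"
}
```

Running `cleanup_after_space_added appdb` prints both commands for review, and `DRY_RUN=0 cleanup_after_space_added appdb` executes them. The database name `appdb` is a placeholder.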

Posted By Nikhil19:51

You were asked whether any tickets were generated for this database. Write a mail to explain this. Yes, tickets were created, and the following are the tickets, but they remained unacknowledged. Since we introduced new correlation during the bigpsnd migration, it has been noticed that new tickets are created 48 hours after correlation even though a ticket is already in an open state. Because of this, tickets eventually remain unacknowledged on the assumption that other DBAs are working on them. It was proposed a few times to remove this correlation logic. Here are some sample tickets from the past. Such tickets are only actioned with comments referencing other tickets and often go in a loop, but the actual issue is only fixed when the backup team RO sends the list of databases where the backup has been failing for a long time.

Subject: Update on Ticket Generation for the Database

Dear [Recipient's Name],

I wanted to provide an update regarding the ticket generation for our database. Yes, tickets were created, but unfortunately, they have remained unacknowledged.

Since we introduced new correlation logic during the BigPond migration, we’ve observed that new tickets are being created 48 hours after the correlation, even when there is an existing ticket in an open state. As a result, tickets end up unacknowledged because each DBA assumes that someone else is already working on them.

We have proposed several times to consider removing this correlation logic, as it seems to contribute to the issue.

Here are some examples of past tickets for your reference. These tickets typically only receive comments referencing other tickets, leading to a looping situation. The actual issues tend to be resolved only when the backup team’s RO provides a list of databases where backups have been failing for an extended period.

Please let me know if you’d like to discuss this further.

Best regards,

Posted By Nikhil18:17
