Wednesday, 4 March 2026

 Hi [Name],

Thank you for your response.

Please find our clarification below:

1. Patch Set Verification

The patch set verification falls under OS patching, which is managed by the Linux team. As you are aware, this is not related to any database patching activity. We do not have visibility into OS-level patch baselines.

Kindly verify and confirm the applied OS patch set from your end.

2. Permission Loss After Reboot

The core issue we reported was that permissions were being reset after a server reboot. As per the recent update, this has been fixed.

Since this is OS-level behavior, the Linux team should ideally validate that the permissions persist across reboots. If required, please involve the appropriate resource from your team to validate.
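As an illustration only, persistence could be validated along the following lines: capture ownership and mode bits before the reboot, capture them again afterwards, and compare. This is a minimal sketch, not an agreed procedure. The function names and the temporary demo file are hypothetical, and the real paths to check would need to come from the Linux team.

```python
import json
import os
import stat
import tempfile


def snapshot(paths):
    """Record owner, group and mode string for each path."""
    result = {}
    for path in paths:
        st = os.stat(path)
        result[path] = {
            "uid": st.st_uid,
            "gid": st.st_gid,
            "mode": stat.filemode(st.st_mode),  # e.g. "-rw-r-----"
        }
    return result


def diff_snapshots(before, after):
    """Return the paths whose ownership or mode changed (or which vanished)."""
    return {
        path: {"before": before[path], "after": after.get(path)}
        for path in before
        if before[path] != after.get(path)
    }


if __name__ == "__main__":
    # Demo on a throwaway file; a chmod stands in for the reboot.
    with tempfile.NamedTemporaryFile(delete=False) as handle:
        demo_path = handle.name
    os.chmod(demo_path, 0o600)
    # JSON round trip, as if the snapshot had been saved to disk before the reboot.
    before = json.loads(json.dumps(snapshot([demo_path])))
    os.chmod(demo_path, 0o640)  # simulate a permission reset
    print(diff_snapshots(before, snapshot([demo_path])))
    os.remove(demo_path)
```

An empty result after a real reboot would indicate that the permissions persisted for the paths that were checked.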

From the database side, we have already accommodated multiple reactive validations. Since the fix was implemented yesterday, and considering this impacts production backups, we cannot risk another failure scenario.

3. Service Status Checks

As with point 2, these validations are OS/service-level checks. Now that the fix has reportedly been implemented, the Linux team should validate service behavior post-reboot and confirm stability.
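For illustration, a post-reboot service check could look something like the sketch below. It assumes the hosts use systemd and relies on `systemctl is-active`, which prints the unit state. The function names and any unit names passed in are hypothetical, since the affected services are not listed here.

```python
import subprocess


def unit_state(unit, run=subprocess.run):
    """Return the state printed by `systemctl is-active` for one unit."""
    proc = run(
        ["systemctl", "is-active", unit],
        capture_output=True,
        text=True,
        check=False,  # is-active exits non-zero for units that are not active
    )
    return proc.stdout.strip() or "unknown"


def inactive_units(units, run=subprocess.run):
    """Map every unit that is not 'active' to the state it reported."""
    states = {unit: unit_state(unit, run) for unit in units}
    return {unit: state for unit, state in states.items() if state != "active"}


# Example call after a reboot (unit names are placeholders):
#     inactive_units(["example-backup-agent.service", "example-db.service"])
```

An empty dictionary would mean all listed units came back up. The `run` parameter is there only so the check can be exercised without a live systemd host.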

We would like to highlight that this issue has been ongoing for over a month, with an earlier commitment that it would be resolved by end of February. However, we are still seeing reactive queries directed toward the database team, even though the root cause and ownership lie within the OS layer.

We request proactive validation and confirmation from the Linux team to ensure closure and avoid further production backup impact.

Looking forward to your confirmation.

Posted By Nikhil 04:46

Friday, 27 February 2026

 I’ve enhanced the internal BCM automation articles to be more structured and AI-friendly.

Now the internal AI can accurately handle queries like task failures or escalation contacts and provide the right responses directly — improving self-service and reducing manual handling.

Isn’t this exactly the direction we’re heading with AI & automation integration? 👍🏼🙂

Posted By Nikhil 06:52

 Assist has now been fed with all FAQs (user queries). Going forward, if users have any queries related to our BCM automation, please redirect them to goto/red or Copilot. In most cases, AI should be able to provide the appropriate response directly.

Let’s guide users there to reduce manual interactions and query handling.

Thanks to Ranjit for the suggestion — this has now been implemented.

Posted By Nikhil 05:55

Thursday, 26 February 2026

 Objective 1: Implement Unified Grafana Dashboard (LGTM Project)

Objective Statement:

Design and implement centralized Grafana dashboards for Oracle and PostgreSQL databases under the LGTM initiative to improve real-time visibility and operational decision-making.

Key Deliverables:

Integrate Oracle and PostgreSQL metrics into the LGTM stack (Loki, Grafana, Tempo, Mimir/Prometheus).

Develop standardized dashboards covering availability, performance, replication lag, storage, sessions, and alert trends.

Enable role-based access for application and operations teams.

Conduct at least 1 knowledge-sharing/demo session for stakeholders.

Success Metrics:

Dashboards live in production by [target quarter].

≥ 80% of database operational metrics accessible via Grafana.

Reduction in ad-hoc monitoring requests by 30%.

Positive feedback from users on accessibility and usability.

Business Impact:

Improves transparency, reduces dependency on DBAs for basic health checks, and enhances proactive monitoring.

Objective 2: Develop Actionable Data Mesh Reporting Framework

Objective Statement:

Design and implement a structured reporting solution using Data Mesh principles to generate meaningful, actionable insights supporting daily database operations and decision-making.

Key Deliverables:

Identify key operational data domains (backup status, replication health, performance trends, storage growth).

Develop automated report generation using centralized data sources.

Ensure reports are standardized, easy to interpret, and aligned with operational KPIs.

Implement scheduled distribution or dashboard-based access.

Success Metrics:

Deliver at least 3 core operational reports.

Reduce manual data compilation effort by 40%.

Improve response time for operational decision-making.

Adoption by operations/application teams for day-to-day reference.

Business Impact:

Enables data-driven decisions, reduces manual effort, and increases operational efficiency.

Posted By Nikhil 20:32

 Hi Pawel, thanks for reaching out. I had already signed off yesterday.

Happy to connect today — just let me know a time that works for you, and I’ll be around.

Posted By Nikhil 01:41

Wednesday, 25 February 2026

 Hi Team,

Thank you for raising the change request for the database switchover as part of your BCM plan.

As per the standard process, BCM switchovers are executed via automation. Please raise one RITM ticket (Automated – BCM Database Switchover). Once submitted, this will trigger the automation workflow and give you the control needed to initiate the database switchover independently.

You can find the detailed steps and process documentation here:

[Insert Link Here]

Kindly ensure the RITM is raised well in advance of the planned implementation window to avoid any delays.

On the day of implementation, if you encounter any issues while executing the BCM activity, the Database Team will be available on #channel-name for immediate support.

Please let us know once the RITM is raised.

Posted By Nikhil 17:33

Monday, 16 February 2026

 Hi Team,


I’ve noticed that in recent automated PostgreSQL provisioning runs, the "pg_krb5.conf" file is no longer present on newly built instances.


From an SRE standpoint, this should not impact us since:


- Our primary authentication mechanism is SSL certificate-based.

- LDAP usage (in our case) does not rely on Kerberos/GSSAPI.

- "krb5.conf" is only required when Kerberos (GSSAPI) authentication is enabled via "pg_hba.conf".


Could you please confirm whether this removal is intentional?

Just want to ensure there are no implications for any edge cases or future auth integrations.


Thanks,

Nikhil



Hi Team,


I hope you’re doing well.


I wanted to highlight an observation from the recent PostgreSQL server provisioning runs (automated builds in our environment). I’ve noticed that the "pg_krb5.conf" file is no longer being provisioned on newly created instances.


As part of a quick technical review from the SRE perspective:


- Our authentication model is predominantly certificate-based (SSL client certs), which does not depend on Kerberos configuration.

- LDAP authentication is used occasionally, and standard LDAP (without GSSAPI/Kerberos binding) also does not require a "krb5.conf" file.

- The "krb5.conf" file becomes relevant only if GSSAPI/Kerberos authentication is enabled (e.g., "gss" entries in "pg_hba.conf") or if LDAP is configured with SASL/GSSAPI.

- In the absence of Kerberos-based authentication, the file should not have any functional impact on PostgreSQL connectivity.
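For reference, the point about "pg_hba.conf" can be checked mechanically. The snippet below is a rough heuristic sketch rather than a full parser: it flags a rule set as needing Kerberos if any active line uses the "gss" method or the "hostgssenc" connection type. The function name and the sample rules are made up for illustration, and it does not follow include directives or handle quoted fields containing "#".

```python
def needs_kerberos(hba_text: str) -> bool:
    """Heuristic: True if any active pg_hba.conf rule relies on GSSAPI."""
    for raw_line in hba_text.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop comments
        if not line:
            continue
        fields = line.split()
        if fields[0] == "hostgssenc":  # GSSAPI-encrypted connections only
            return True
        # Skip TYPE, DATABASE and USER; look for the gss method after them.
        if "gss" in fields[3:]:
            return True
    return False


# Hypothetical rules resembling a cert-plus-LDAP setup.
SAMPLE_HBA = """\
# TYPE  DATABASE  USER  ADDRESS       METHOD
local   all       all                 peer
hostssl all       all   10.0.0.0/8    cert
host    all       all   10.0.0.0/8    ldap ldapserver=ldap.example.com ldapprefix="uid="
"""

if __name__ == "__main__":
    print(needs_kerberos(SAMPLE_HBA))
```

Running this against the live "pg_hba.conf" contents on a newly built instance would show whether any rule there actually depends on Kerberos configuration.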


That said, I wanted to confirm:


- Is the omission of "pg_krb5.conf" intentional as part of a security hardening or configuration simplification effort?

- Or should it still be provisioned for fallback / future Kerberos-based integrations?


Just seeking confirmation to ensure there are no unintended side effects in edge cases or future auth model changes.


Thanks in advance for the clarification.


Best regards,

Nikhil

Posted By Nikhil 06:33