Friday, 12 June 2026

Filled under:

Hi Team, sharing a quick update from today’s BCM event: all PostgreSQL BCM activities have been completed successfully.


This time, the PostgreSQL switchover/switchback was executed via automation, and all validations were successful. Thanks everyone for the support.

Posted By Nikhil 23:08

Thursday, 11 June 2026

We are currently working on the LGTM deployment, which involves installing certificates on the target hosts.

The deployment was successful in the Development environment. In the Production and TE2 environments, however, it did not proceed directly and was routed for approval instead. To avoid this approval dependency, the required hosts were whitelisted under reference number: .

Even after whitelisting the hosts, the deployment still went for approval. We have now scheduled a call with the CREDS team to understand what additional configuration or mapping is required.

As per the available FAQ, the technical account may also need to be mapped in ISAAC for the deployment to proceed without approval. We need support to confirm the complete requirement and identify what is currently missing, so that future LGTM deployments to Production and TE2 can proceed without manual approval for this account.

Request:

Please help review the current setup and confirm:

  • Whether the host whitelisting has been correctly applied.
  • Whether the technical account needs to be mapped in ISAAC.
  • Any additional approval bypass configuration required for Production and TE2.
  • What changes are needed so that LGTM certificate deployment can proceed without approval for this account.

Suggested title: LGTM deployment going for approval despite host whitelisting

Posted By Nikhil 01:37

Wednesday, 10 June 2026

Release summary for each item:

1. Policy to disable HA for lower environments (Audit Mode)

Introduced an audit-mode policy to identify HA-enabled instances across AMR, Redis, Cosmos DB, and DocumentDB in dev/test environments. This will help review unnecessary HA usage in lower environments and support future cost-optimization actions.

2. Azure Cosmos: ICC Evidence Document Update

Updated the Azure Cosmos ICC evidence documentation with the latest required evidence triggered by MERs. This ensures the relevant compliance artefacts remain current and aligned with ICC requirements.

3. Cosmos: Prod Database Refresh Script

Developed a script to copy required metadata from production to lower environments. This provides a more production-like setup in non-prod, improving testing reliability and validation quality.

4. Azure Cosmos vCore & AMR: Migration to Python / AKS Analysis

Completed analysis to support role assignment for Azure Cosmos vCore and Azure Managed Redis using a Python-based application approach. This is a step towards enabling Azure Function-based execution and reducing dependency on ADO pipelines.

5. Automation Solution for Pipeline Failure: Analysis

Analyzed how to improve the solution design for pipeline failure detection. The objective is to reduce false positives and improve identification of actual pipeline failures.

6. Agent Pool Fix: Invoke Pyfunc Pipeline

Worked on the agent pool fix required for the Invoke Pyfunc pipeline execution. This improves pipeline stability and ensures smoother automation runs.
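To make the audit-mode idea from item 1 concrete, here is a minimal Python sketch of the kind of check such a policy performs: it only reports HA-enabled instances in lower environments and changes nothing. The `Instance` record, its field names, and the sample inventory are hypothetical and do not reflect the actual policy definition or any Azure schema.

```python
from dataclasses import dataclass


# Hypothetical inventory record; field names are illustrative only.
@dataclass(frozen=True)
class Instance:
    name: str
    service: str       # e.g. "AMR", "Redis", "Cosmos DB", "DocumentDB"
    environment: str   # e.g. "dev", "test", "prod"
    ha_enabled: bool


# Assumption: "lower environments" means dev and test, as in the summary.
LOWER_ENVIRONMENTS = {"dev", "test"}


def audit_ha(instances):
    """Return HA-enabled instances in lower environments (report only, no changes)."""
    return [
        inst for inst in instances
        if inst.environment in LOWER_ENVIRONMENTS and inst.ha_enabled
    ]


# Made-up sample data for illustration.
inventory = [
    Instance("redis-dev-01", "Redis", "dev", True),
    Instance("cosmos-test-01", "Cosmos DB", "test", False),
    Instance("cosmos-prod-01", "Cosmos DB", "prod", True),
]

flagged = audit_ha(inventory)
for inst in flagged:
    print(f"AUDIT: {inst.name} ({inst.service}, {inst.environment}) has HA enabled")
```

Keeping the check read-only mirrors the audit-mode intent: the findings can be reviewed before any enforcement is considered.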

Posted By Nikhil 22:14

“We have one hour blocked, but if we cover the previous action items, today’s feedback, and the next actions sooner, we can close early.”

“Thanks, let’s now move to this iteration’s retro and capture fresh inputs on what went well, what did not go well, and what we can improve.”

----

Hi Team, thanks for joining. We’ll use this retro to quickly review the previous action items, then capture inputs for this iteration — what went well, what did not go well, and what we can improve. We don’t need to use the full hour if we finish early.

“Before we start fresh inputs, let’s quickly revisit the takeaways from the previous retro and check whether we saw any improvement.”

Then close

“Thanks, I’ll carry forward only the items that still need action. Let’s now move to this iteration’s retro inputs.”

Then for 10 mins

Ending

“Thanks everyone for the inputs. I’ll summarize the key points and action items after the call. We’ll review progress on these in the next retro.”

“We have covered the main points and actions, so we can give some time back. Thanks everyone.”

Posted By Nikhil 20:04

Hi Team,

This request has been assigned to DBA for running a query on application tables.

Could you please confirm if the expectation is really for DBA to execute this? As per usual controls, DBAs should not run application-level queries using privileged access on application-owned tables.

It looks like this may have been assigned to DBA a bit too quickly, so checking before we proceed.

Please confirm the application owner and the approved account/process to run this query.

Regards,
Nikhil

Posted By Nikhil 09:21

Title: Standard ID Remediation for Oracle LGTM Onboarding in Lower Environments

As a DBA / SRE working on Oracle LGTM onboarding,
I want to identify and remediate Standard ID related issues on lower environment Oracle hosts,
So that SRVAPD account deployment can complete successfully and LGTM onboarding can continue without repeated failures.


Background

As part of the LGTM onboarding activity, we are currently working on Oracle databases. After completing a few batches, we started seeing failures during the deployment of the SRVAPD account.

Initial investigation showed that the failures were not related to the LGTM onboarding flow itself, but were caused by underlying Standard ID issues on several lower environment servers.

The following issues were observed across different hosts:

  • Standard ID was in a broken state.
  • Standard ID details were missing from the expected source.
  • Standard ID certificate was expired on some hosts.
  • Pisa package related issues were also seen on a few servers.

Based on the investigation, we decided to segregate the issues by category, identify the categories affecting the most hosts, and remediate the most common and high-impact cases first.
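The categorization step can be sketched in Python as a simple count per issue type, ranked by how many hosts each one affects. This is only an illustration: the host names, category labels, and counts below are made up, and the real findings would come from the investigation output.

```python
from collections import Counter

# Hypothetical findings as (host, issue category) pairs; all values are made up.
findings = [
    ("host-a", "broken_state"),
    ("host-b", "cert_expired"),
    ("host-c", "broken_state"),
    ("host-d", "details_missing"),
    ("host-e", "broken_state"),
    ("host-f", "pisa_package"),
    ("host-g", "cert_expired"),
]


def rank_categories(records):
    """Return (category, host count, percentage share) tuples, most common first."""
    counts = Counter(category for _, category in records)
    total = sum(counts.values())
    return [
        (category, n, round(100 * n / total, 1))
        for category, n in counts.most_common()
    ]


ranking = rank_categories(findings)
for category, n, share in ranking:
    print(f"{category:<16} {n:>3} hosts  {share:>5}%")
```

Ranking by host count is one reasonable reading of "impact"; a weighting by environment or application criticality could be layered on top if needed.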


Scope of Work

  • Identify lower environment Oracle hosts impacted by Standard ID issues.
  • Segregate the failures based on the type of issue observed.
  • Validate hosts where SRVAPD deployment failed due to Standard ID problems.
  • Perform trial remediation on selected hosts.
  • Develop a script to remediate Standard ID issues.
  • Add required validations in the script to confirm the remediation is successful.
  • Test the script on multiple hosts before wider rollout.
  • Work with the automation team to deploy the script through Amelia.
  • Use Amelia automation to remediate lower environment hosts.

Acceptance Criteria

  • Affected hosts are identified and categorized based on the Standard ID issue.
  • Major failure patterns are separated from one-off host-specific issues.
  • Trial remediation is completed successfully on selected hosts.
  • Remediation script is developed and tested.
  • Script performs required pre-checks and post-checks.
  • Script is deployed through Amelia automation.
  • Lower environment hosts can be remediated using Amelia.
  • SRVAPD account deployment is revalidated after remediation.
  • Remediated hosts are ready to continue with LGTM onboarding.
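The pre-check and post-check criteria above could take roughly the following shape. This is a hedged Python sketch, not the actual remediation script: `remediate_with_checks`, the `is_healthy` and `remediate` callables, and the in-memory `state` dictionary are all hypothetical stand-ins for the real Standard ID checks and fix.

```python
from typing import Callable


def remediate_with_checks(
    host: str,
    is_healthy: Callable[[str], bool],
    remediate: Callable[[str], None],
) -> str:
    """Run pre-check, remediation, and post-check for one host; return the outcome."""
    if is_healthy(host):
        return "skipped"  # pre-check passed, nothing to fix
    remediate(host)
    # Post-check: only report success if the host now validates cleanly.
    return "remediated" if is_healthy(host) else "failed"


# Stand-in state for the demo; a real check would query the host itself.
state = {"host-a": "broken", "host-b": "ok", "host-c": "broken"}


def demo_is_healthy(host: str) -> bool:
    return state[host] == "ok"


def demo_remediate(host: str) -> None:
    # Simulate one host (host-c) where the fix does not take effect.
    if host != "host-c":
        state[host] = "ok"


results = {
    host: remediate_with_checks(host, demo_is_healthy, demo_remediate)
    for host in list(state)
}
print(results)
```

Separating "skipped", "remediated", and "failed" makes it easier to tell already-healthy hosts from hosts that still need manual attention before SRVAPD deployment is revalidated.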

Comments / Checklist

  • [ ] Identified that SRVAPD account deployment started failing after a few LGTM onboarding batches.
  • [ ] Investigated and confirmed that multiple lower environment servers had Standard ID in a broken state.
  • [ ] Observed cases where Standard ID details were missing from the required source.
  • [ ] Observed cases where Standard ID certificates were expired.
  • [ ] Observed cases where Pisa package related issues were present.
  • [ ] Segregated the issues based on failure category and impact.
  • [ ] Performed trial remediation on various hosts to validate the fix.
  • [ ] Developed a remediation script for Standard ID related issues.
  • [ ] Tested the script on multiple hosts to ensure it works as expected.
  • [ ] Added validations in the script to confirm Standard ID status before and after remediation.
  • [ ] Worked with the automation team to deploy the script in Amelia.
  • [ ] Plan to use Amelia automation for lower environment remediation.
  • [ ] Revalidate SRVAPD deployment and LGTM onboarding after remediation.

Posted By Nikhil 07:40

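-- SQL*Plus session settings for wide, unpaginated output
SET LINESIZE 300
SET PAGESIZE 50000
-- Allow large LONG/CLOB values to be displayed in full
SET LONG 1000000
SET LONGCHUNKSIZE 1000000
-- Trim trailing blanks in spooled output; wrap long lines instead of truncating
SET TRIMSPOOL ON
SET WRAP ON

-- Column display widths for the report
COLUMN hostname FORMAT A45
COLUMN script_result FORMAT A200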

Posted By Nikhil 07:34