Title: Standard ID Remediation for Oracle LGTM Onboarding in Lower Environments
As a DBA / SRE working on Oracle LGTM onboarding,
I want to identify and remediate Standard ID issues on lower environment Oracle hosts,
So that SRVAPD account deployment can complete successfully and LGTM onboarding can continue without repeated failures.
Background
As part of the LGTM onboarding activity, we are currently onboarding Oracle databases. After a few batches were completed, deployment of the SRVAPD account started failing.
Initial investigation showed that the failures were not related to the LGTM onboarding flow itself, but were caused by underlying Standard ID issues on several lower environment servers.
The following issues were observed across different hosts:
- Standard ID was in a broken state.
- Standard ID details were missing from the expected source.
- Standard ID certificate was expired on some hosts.
- Pisa package issues were seen on a few servers.
Based on the investigation, we decided to group the issues by category, identify the categories that account for the most failures, and remediate the most common and highest-impact cases first.
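The prioritization described above can be sketched as a simple tally of hosts per issue category. This is only an illustration: the category labels, the rank_categories helper, and the host names are hypothetical and are not taken from any existing tooling.

```python
from collections import Counter

# Hypothetical labels for the four issue types listed in the Background.
CATEGORIES = (
    "broken_state",
    "missing_details",
    "expired_certificate",
    "pisa_package",
)


def rank_categories(findings):
    """Return (category, host_count) pairs, most common first.

    `findings` maps a hostname to the issue category observed on it.
    """
    unknown = {c for c in findings.values() if c not in CATEGORIES}
    if unknown:
        raise ValueError(f"unrecognized categories: {sorted(unknown)}")
    return Counter(findings.values()).most_common()


# Made-up hosts, for illustration only.
sample = {
    "host-a": "broken_state",
    "host-b": "expired_certificate",
    "host-c": "broken_state",
    "host-d": "pisa_package",
}

for category, count in rank_categories(sample):
    print(f"{category}: {count} host(s)")
```

Categories at the top of the ranking are the candidates for scripted remediation; categories with only one or two hosts may be cheaper to handle manually.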
Scope of Work
- Identify lower environment Oracle hosts impacted by Standard ID issues.
- Segregate the failures based on the type of issue observed.
- Validate hosts where SRVAPD deployment failed due to Standard ID problems.
- Perform trial remediation on selected hosts.
- Develop a script to remediate Standard ID issues.
- Add required validations in the script to confirm the remediation is successful.
- Test the script on multiple hosts before wider rollout.
- Work with the automation team to deploy the script through Amelia.
- Use Amelia automation to remediate lower environment hosts.
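One possible shape for the remediation script's control flow is sketched below: a pre-check, the fix, and a post-check that confirms the result. It assumes the actual health check and fix are supplied as functions; is_healthy, apply_fix, Result, and remediate_host are hypothetical names, and the real Standard ID commands are intentionally not shown because the ticket does not specify them.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Result:
    host: str
    status: str  # "skipped", "remediated", or "failed"
    detail: str = ""


def remediate_host(
    host: str,
    is_healthy: Callable[[str], bool],
    apply_fix: Callable[[str], None],
) -> Result:
    """Run pre-check, fix, and post-check for a single host."""
    # Pre-check: leave hosts alone if Standard ID is already healthy.
    if is_healthy(host):
        return Result(host, "skipped", "pre-check passed; nothing to fix")

    # Remediation: report any error instead of aborting the whole batch.
    try:
        apply_fix(host)
    except Exception as exc:
        return Result(host, "failed", f"remediation error: {exc}")

    # Post-check: the fix only counts if the same check now passes.
    if is_healthy(host):
        return Result(host, "remediated", "post-check passed")
    return Result(host, "failed", "post-check still reports an unhealthy Standard ID")
```

Reusing the same check before and after the fix keeps the script safe to re-run: hosts that are already healthy are skipped, and a host is reported as remediated only when the post-check passes.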
Acceptance Criteria
- Affected hosts are identified and categorized based on the Standard ID issue.
- Major failure patterns are separated from one-off host-specific issues.
- Trial remediation is completed successfully on selected hosts.
- Remediation script is developed and tested.
- Script performs required pre-checks and post-checks.
- Script is deployed through Amelia automation.
- Lower environment hosts can be remediated using Amelia.
- SRVAPD account deployment is revalidated after remediation.
- Remediated hosts are ready to continue with LGTM onboarding.
Comments / Checklist
- [ ] Identified that SRVAPD account deployment started failing after a few LGTM onboarding batches.
- [ ] Investigated and confirmed that multiple lower environment servers had Standard ID in a broken state.
- [ ] Observed cases where Standard ID details were missing from the required source.
- [ ] Observed cases where Standard ID certificates were expired.
- [ ] Observed cases where Pisa package issues were present.
- [ ] Segregated the issues based on failure category and impact.
- [ ] Performed trial remediation on various hosts to validate the fix.
- [ ] Developed a remediation script for Standard ID issues.
- [ ] Tested the script on multiple hosts to ensure it works as expected.
- [ ] Added validations in the script to confirm Standard ID status before and after remediation.
- [ ] Worked with the automation team to deploy the script in Amelia.
- [ ] Plan to use Amelia automation for lower environment remediation.
- [ ] Revalidate SRVAPD deployment and LGTM onboarding after remediation.