Sunday, 30 November 2025

Filled under:

 How do you approach tuning a slow query in PostgreSQL when EXPLAIN ANALYZE shows high sequential scan costs but indexes seem to exist?

In Oracle, what are the key differences between undoing with UNDO tablespace vs. redoing with REDO logs? How would you size them for a high-transaction OLTP system?

Compare PostgreSQL logical replication vs. physical replication (streaming/WAL). When would you choose one over the other in an SRE context?

What are the most common causes of ORA-04031 (shared pool memory) errors and how do you diagnose and resolve them in production without downtime?

How does PostgreSQL handle VACUUM vs. AUTOVACUUM? What can go wrong at scale and how do you monitor bloat?

5. Reliability & High Availability

You need to achieve < 5 min RPO and < 15 min RTO for a 10 TB PostgreSQL database with almost zero downtime allowed. Walk us through your architecture.

How would you design Oracle Data Guard in “Maximum Performance” vs. “Maximum Availability” vs. “Maximum Protection” modes? Trade-offs from an SRE perspective?

You get paged at 3 AM: primary PostgreSQL node is down and the standby is not promoting. What’s your checklist?

Describe how you would implement zero-downtime migrations from Oracle 12c/19c to PostgreSQL (or vice-versa) in production.

What SRE metrics (SLOs/SLIs/error budgets) would you define for a mission-critical database tier?

6. Automation & Infrastructure as Code

How do you manage PostgreSQL/Oracle configuration at scale (100+ instances) with zero manual drift? (Expect Ansible, Terraform + Vault, Chef/Puppet, etc.)

Show me an example (or describe) how you use Terraform + Helm/operator to provision and manage PostgreSQL on Kubernetes (or Oracle on OCI/EC2).

How would you automate Oracle patching (PSU/BP + OJVM) with zero or near-zero downtime across a fleet?

7. Observability & Monitoring

What are your must-have metrics and alerts for PostgreSQL and Oracle in production? (e.g., pg_stat_activity long-running queries, Oracle AWR top wait events, connection pool exhaustion, archive lag, etc.)

How do you set up centralized logging and tracing for both PostgreSQL (pg_stat_statements, log_min_duration) and Oracle (audit trail, trace files) and feed them into the observability stack (Prometheus + Grafana, Datadog, ELK, etc.)?

You see sudden spiking of “latch: cache buffers chains” contention in Oracle. How do you debug it in real time?

8. Incident Response & Postmortems

Tell us about a serious database-related outage you were involved in (preferably Oracle or PostgreSQL). What was the root cause, how did you detect it, and what blameless postmortem actions came out?

You accidentally ran DROP TABLE production_table in PostgreSQL with no WHERE clause and it succeeded. Walk us through recovery options and timeline.

9. Security & Compliance

How do you handle secrets (DB passwords, TDE wallets, PGP keys) in an SRE environment?

Compare Oracle TDE vs. PostgreSQL pgcrypto / transparent data encryption options. How would you enforce encryption at rest and in transit at scale?

How do you implement and rotate Oracle DB vault or PostgreSQL row-level security in a highly regulated environment?

10. SRE Philosophy & Culture

As a former DBA transitioning to SRE, how do you feel about letting developers run ALTER TABLE ... ADD COLUMN directly in production via CI/CD (with proper Liquibase/Flyway)?

How do you balance “toil reduction” with keeping deep expertise in Oracle/PostgreSQL when most new services move to managed cloud databases (Amazon RDS, Cloud SQL, Azure Postgres)?

Error budgets: If your database SLO is 99.95% and you just burned half the monthly budget in one incident, what do you do?

Bonus “hands-on” questions (if you do live exercises)

Write a Prometheus exporter query (or describe) for PostgreSQL replication lag.

Given an Oracle AWR report snippet, identify the top 3 performance bottlenecks.

Design a GitOps-based workflow for PostgreSQL major version upgrade using Patroni + Zalando operator.

These questions will quickly separate pure traditional DBAs from candidates who truly think and operate like Site Reliability Engineers while still retaining strong Oracle and PostgreSQL skills. Good luck with the interviews!

0 comments:

Post a Comment