Disaster recovery planning

What Disaster Recovery Planning Is

Disaster recovery planning (DR planning) is the structured work of defining how you will restore critical systems after a major disruption, and of proving through testing that the plan works. In EV charging, DR planning covers the full stack: the charge point management system (CPMS), charger connectivity, authentication, payments, roaming, monitoring, firmware/configuration services, databases, and support operations.

Why DR Planning Matters for EV Charging Networks

Charging networks behave like utility infrastructure: downtime quickly impacts revenue, SLAs, and customer trust. DR planning helps you:
– Recover service within defined limits (RTO)
– Limit data loss (RPO) for transactions and billing
– Reduce blast radius from mis-deployments or cyber incidents
– Keep fleets operational (especially depots with departure deadlines)
– Meet enterprise procurement and cybersecurity requirements

Define the Targets First

A solid DR plan starts by setting measurable objectives per service:

RTO (Recovery Time Objective): maximum acceptable downtime
RPO (Recovery Point Objective): maximum acceptable data loss window
Service tiers: what must be restored first vs later

Example tiering for charging:
– Tier 0: core authorization + charging start/stop
– Tier 1: monitoring, alarms, support tooling
– Tier 2: billing, invoicing, reporting, analytics
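One way to keep these targets measurable is to record them as data rather than prose. The following is a minimal sketch, not a prescribed format: the `RecoveryTarget` class, the service names, and every RTO/RPO value are hypothetical placeholders chosen for illustration.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass(frozen=True)
class RecoveryTarget:
    service: str
    tier: int        # 0 is restored first
    rto: timedelta   # maximum acceptable downtime
    rpo: timedelta   # maximum acceptable data loss window


# Illustrative values only; real targets come from business and SLA requirements.
TARGETS = [
    RecoveryTarget("billing", tier=2, rto=timedelta(hours=24), rpo=timedelta(minutes=5)),
    RecoveryTarget("authorization", tier=0, rto=timedelta(minutes=15), rpo=timedelta(minutes=1)),
    RecoveryTarget("monitoring", tier=1, rto=timedelta(hours=2), rpo=timedelta(minutes=15)),
]


def restore_order(targets):
    """Return service names sorted by tier, then by tightest RTO within a tier."""
    ordered = sorted(targets, key=lambda t: (t.tier, t.rto))
    return [t.service for t in ordered]


print(restore_order(TARGETS))
```

Keeping the targets in one structure like this makes it straightforward to compare them against drill results later.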

Scope: What Must Be Recoverable

DR planning is not only “the cloud CPMS.” A complete scope typically includes:
– Device connectivity (OCPP endpoints, TLS cert validation, DNS)
– Identity & access (RFID/app tokens, whitelists, user accounts)
– Payments (tariffs, payment provider integration, settlement)
– Roaming (OCPI or roaming hub links, session exchange)
– Databases (sessions, meter values, customer data, audit logs)
– Configuration & device twins (desired state, site policies, load caps)
– OTA firmware pipeline (update hosting, signing keys, rollback)
– Logging and observability (metrics, traces, SIEM feeds)
– Support operations (ticketing, call center scripts, incident comms)

Core DR Design Decisions

A DR plan must specify how failover and recovery will work:

Active-passive: standby environment is ready, activated during disaster
Active-active: multiple regions run simultaneously, fastest failover
Cold/warm/hot standby: trade-off between cost and recovery speed
Data replication: synchronous vs asynchronous (affects achievable RPO)
Key management: how secrets, certificates, and signing keys are protected and restored
Network dependencies: DNS, APN/SIM providers, firewalls, VPNs
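To illustrate why the replication choice bounds the achievable RPO, here is a simplified sketch. It assumes that synchronous replication acknowledges a commit only after the standby has durably confirmed it (so no acknowledged data is lost), and that asynchronous replication can lose up to the worst observed replication lag. The function names and the lag figure are hypothetical.

```python
from datetime import timedelta


def worst_case_data_loss(mode: str, max_replication_lag: timedelta) -> timedelta:
    """Estimate the worst-case data loss window for a replication mode.

    Simplifying assumption: synchronous replication loses no acknowledged
    writes, while asynchronous replication can lose everything written
    during the largest observed replication lag.
    """
    if mode == "synchronous":
        return timedelta(0)
    if mode == "asynchronous":
        return max_replication_lag
    raise ValueError(f"unknown replication mode: {mode}")


def meets_rpo(mode: str, max_replication_lag: timedelta, rpo: timedelta) -> bool:
    """Check whether the estimated worst-case loss fits inside the RPO."""
    return worst_case_data_loss(mode, max_replication_lag) <= rpo


# Example: an asynchronous replica that has lagged by up to 90 seconds
# cannot satisfy a 1-minute RPO under this model.
print(meets_rpo("asynchronous", timedelta(seconds=90), timedelta(minutes=1)))
```

In practice the observed lag should come from monitoring over a representative period, including peak load, rather than from a single measurement.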

Charger Behavior During Backend Outage

For EV charging, DR planning must define “what chargers do when the brain is down”:
– Offline start rules: local whitelist, cached tokens, or restricted mode
– Session continuity: buffering meter values and events until reconnect
– Safe fallback: “deny by default” vs “free vend” policies per site type
– Reconciliation rules after recovery (avoid double billing, ensure auditability)
– Depot-specific policies: prioritise keeping charging possible for critical departures
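The offline rules above can be sketched as a small decision function plus an idempotent merge for buffered records. This is an illustrative model only: the policy names, site types, and record fields (`charger_id`, `transaction_id`, `timestamp`) are assumptions and are not taken from OCPP or any specific CPMS.

```python
from enum import Enum


class OfflinePolicy(Enum):
    DENY = "deny"                        # refuse all offline starts
    LOCAL_WHITELIST = "local_whitelist"  # allow only locally cached tokens
    FREE_VEND = "free_vend"              # allow any start while offline


# Hypothetical per-site-type policy table; unknown site types deny by default.
SITE_POLICIES = {
    "public": OfflinePolicy.LOCAL_WHITELIST,
    "depot": OfflinePolicy.FREE_VEND,
}


def authorize_offline(site_type: str, token: str, local_whitelist: set) -> bool:
    """Decide whether a charger may start a session while the backend is down."""
    policy = SITE_POLICIES.get(site_type, OfflinePolicy.DENY)
    if policy is OfflinePolicy.FREE_VEND:
        return True
    if policy is OfflinePolicy.LOCAL_WHITELIST:
        return token in local_whitelist
    return False


def reconcile(stored: dict, buffered: list) -> int:
    """Merge buffered records into `stored`, skipping records already present.

    Records are keyed by (charger_id, transaction_id, timestamp), so a
    charger that resends its buffer after reconnecting does not create
    duplicate entries. Returns the number of newly added records.
    """
    added = 0
    for record in buffered:
        key = (record["charger_id"], record["transaction_id"], record["timestamp"])
        if key not in stored:
            stored[key] = record
            added += 1
    return added
```

Keying the merge on a stable identifier is what makes replaying the buffer safe: sending the same records twice leaves the stored data unchanged, which helps avoid double billing.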

DR Playbooks You Should Have

Practical DR planning produces step-by-step playbooks, not slides:
– Regional cloud outage failover (switch endpoints, update DNS, validate mTLS)
– Database restore (point-in-time recovery, integrity checks, reindex)
– Credential compromise (rotate secrets, revoke certificates, block devices)
– Bad configuration rollout (rollback desired state, freeze changes)
– Bad firmware rollout (halt campaign, rollback channel, isolate affected models)
– Payment provider outage (fallback tariffs, offline receipts, queue transactions)
– Roaming outage (local auth options, user messaging, settlement queue)

Each playbook should specify:
– Trigger criteria (what counts as disaster)
– Roles and approvals (who can declare and execute)
– Exact steps, commands, and validation checks
– Communication templates (status page, customers, partners)
– Post-incident actions (root cause, improvements)
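A simple way to enforce this checklist is to store playbooks in a structured form and flag empty sections automatically. The sketch below assumes a hypothetical `Playbook` class whose fields mirror the five items above; it is one possible layout, not a standard.

```python
from dataclasses import dataclass, field, fields


@dataclass
class Playbook:
    name: str
    trigger_criteria: list = field(default_factory=list)
    roles_and_approvals: list = field(default_factory=list)
    steps: list = field(default_factory=list)
    communication_templates: list = field(default_factory=list)
    post_incident_actions: list = field(default_factory=list)


def missing_sections(playbook: Playbook) -> list:
    """Return the names of required sections that are still empty."""
    return [
        f.name
        for f in fields(playbook)
        if f.name != "name" and not getattr(playbook, f.name)
    ]


# Example with placeholder content: three sections are still unfilled.
draft = Playbook(
    name="Database restore",
    trigger_criteria=["Primary database unavailable beyond agreed threshold"],
    steps=["Start point-in-time recovery", "Run integrity checks"],
)
print(missing_sections(draft))
```

A check like this could run in a documentation pipeline so that an incomplete playbook is caught before an incident rather than during one.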

Backups and Recovery Readiness

Your DR plan should explicitly document:
– Backup frequency and retention per dataset
– Where backups are stored (separate account/tenant, immutable storage)
– Encryption and access controls (who can restore)
– Restore testing schedule and success criteria
– Dependencies: schemas, migrations, feature flags
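As an example of turning the documented backup frequency into an automated check, the following sketch flags datasets whose most recent backup is older than allowed, or missing entirely. The dataset names and age limits are hypothetical, and the timestamps would in practice come from your backup tooling.

```python
from datetime import datetime, timedelta, timezone


def stale_backups(last_backup_at: dict, max_age: dict, now: datetime) -> list:
    """Return datasets with no backup or a backup older than its allowed age.

    `last_backup_at` maps dataset name to the time of its latest backup;
    `max_age` maps dataset name to the maximum acceptable backup age.
    """
    stale = []
    for dataset, limit in max_age.items():
        last = last_backup_at.get(dataset)
        if last is None or now - last > limit:
            stale.append(dataset)
    return sorted(stale)


# Example with illustrative values.
now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
last_backup_at = {
    "sessions": now - timedelta(minutes=10),
    "audit_logs": now - timedelta(hours=30),
}
max_age = {
    "sessions": timedelta(minutes=15),
    "audit_logs": timedelta(hours=24),
    "meter_values": timedelta(hours=1),  # no backup recorded at all
}
print(stale_backups(last_backup_at, max_age, now))
```

Note that a fresh backup only shows that a backup job ran; it does not show that the backup can be restored, which is why restore testing is listed separately.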

Testing and DR Drills

A DR plan is only real if tested. Common drill types:
– Tabletop exercises (walk through scenarios and decision paths)
– Game days (controlled outages and failovers in production-like staging)
– Live failover tests (planned regional switch)
– Restore tests (prove RPO and data integrity using real backups)

Key measures to record:
– Actual recovery time vs RTO
– Data loss vs RPO
– Operational friction points (missing access, unclear ownership, broken docs)
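The first two measures can be recorded and compared against targets in a few lines. This is a minimal sketch under assumed names: `DrillResult` and its fields are hypothetical, and the figures in the example are invented for illustration.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass
class DrillResult:
    service: str
    actual_recovery: timedelta   # measured time to restore service
    actual_data_loss: timedelta  # measured window of lost data
    rto: timedelta               # target for this service
    rpo: timedelta               # target for this service


def evaluate(result: DrillResult) -> dict:
    """Compare measured drill outcomes with the RTO and RPO targets."""
    overrun = max(result.actual_recovery - result.rto, timedelta(0))
    return {
        "service": result.service,
        "rto_met": result.actual_recovery <= result.rto,
        "rpo_met": result.actual_data_loss <= result.rpo,
        "rto_overrun": overrun,  # zero when the RTO was met
    }


# Example: recovery took 22 minutes against a 15-minute RTO.
drill = DrillResult(
    service="authorization",
    actual_recovery=timedelta(minutes=22),
    actual_data_loss=timedelta(seconds=30),
    rto=timedelta(minutes=15),
    rpo=timedelta(minutes=1),
)
print(evaluate(drill))
```

Friction points are harder to quantify and are usually captured as free-text notes alongside these numbers.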

Common Pitfalls

– Backups exist but restores aren’t tested
– DR covers CPMS but ignores payments, roaming, identity, or firmware signing keys
– Single points of failure: one DNS provider, one region, one database cluster
– No “kill switch” for mass configuration or firmware errors
– Chargers have no defined offline policy, which leads to unpredictable behavior and customer confusion
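As an illustration of the "kill switch" idea, a rollout can be halted automatically once the observed failure rate crosses a threshold. The sketch below is one possible rule, not an established standard: the function name, the 5% threshold, and the minimum sample size are all assumptions.

```python
def should_halt(updated: int, failed: int,
                max_failure_rate: float = 0.05, min_sample: int = 20) -> bool:
    """Decide whether to stop a configuration or firmware rollout.

    The rollout continues until at least `min_sample` devices have been
    updated, so that one early failure does not stop the campaign. After
    that, it halts when the share of failed devices exceeds
    `max_failure_rate`.
    """
    if updated < min_sample:
        return False
    return failed / updated > max_failure_rate


# Example: 6 failures out of 100 updated chargers exceeds the 5% limit.
print(should_halt(updated=100, failed=6))
```

A rule like this works best with staged rollouts, where a small first wave limits how many chargers are exposed before the check can trigger.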

Related Terms

Disaster recovery
Business continuity
High availability (HA)
Failover
Backup and restore
Incident response
Secure update pipeline
Device authentication