Disaster recovery

What Disaster Recovery Is

Disaster recovery (DR) is the set of processes, tools, and plans used to restore critical systems after a major disruption, such as a cloud outage, cyberattack, data corruption, or physical damage to infrastructure. In EV charging, DR ensures that a charging network and its supporting services (the charge point management system, or CPMS, plus payments, roaming, monitoring, and device management) can return to operation within defined time and data-loss limits.

Why Disaster Recovery Matters in EV Charging

EV charging networks are service infrastructure. If backend systems fail, chargers may become unavailable, transactions may be lost, and customer trust erodes quickly. DR helps operators:
– Keep charging available during major incidents
– Restore service quickly and predictably
– Protect business continuity (revenue, SLAs, fleet readiness)
– Reduce safety and compliance risks caused by uncontrolled outages
– Limit data loss for transactions, billing, and support investigations

What Disaster Recovery Typically Covers

A complete DR scope usually includes:
– CPMS availability (device connectivity, commands, session records)
– Authentication and access control (RFID, app tokens, whitelists)
– Payments and billing (tariffs, invoicing, refunds, settlement)
– Roaming interfaces (interoperability and session exchange)
– Monitoring and diagnostics (alerts, logs, device twins)
– Firmware and configuration management (OTA pipeline, desired state)
– Databases and storage (session history, customer data, audit logs)
– Support operations (ticketing, call center tooling, incident comms)

Key DR Concepts

Two metrics (RTO and RPO) set the targets of a DR plan, and two mechanisms (failover, backup and restore) are how those targets are met:

RTO (Recovery Time Objective)
Maximum acceptable time to restore the service after a disaster.

RPO (Recovery Point Objective)
Maximum acceptable data loss, measured as time (e.g., “no more than 15 minutes of data lost”).

Failover
Switching from a primary system/region to a backup system/region.

Backup and restore
Regular snapshots of data and the ability to restore them reliably.
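As an illustration of how RTO and RPO are applied, the sketch below scores a single incident or drill against both targets. It is a minimal example, not a standard tool: the names `Objectives` and `evaluate_recovery` and all timestamps are hypothetical, and it assumes the data-loss window is the time between the last recoverable copy and the moment of failure.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Objectives:
    """Hypothetical container for one service's DR targets."""
    rto: timedelta  # maximum acceptable time to restore the service
    rpo: timedelta  # maximum acceptable window of lost data


def evaluate_recovery(objectives: Objectives,
                      last_good_copy: datetime,
                      failure_at: datetime,
                      restored_at: datetime) -> dict:
    """Compare the outcome of an incident (or drill) with the RTO/RPO targets."""
    # Downtime runs from the failure until service is restored.
    downtime = restored_at - failure_at
    # Data written after the last recoverable copy is assumed lost.
    data_loss_window = failure_at - last_good_copy
    return {
        "downtime": downtime,
        "data_loss_window": data_loss_window,
        "rto_met": downtime <= objectives.rto,
        "rpo_met": data_loss_window <= objectives.rpo,
    }
```

For example, with an RTO of one hour and an RPO of 15 minutes, a failure at 11:00 with the last good copy taken at 10:50 and service restored at 11:45 meets both targets: 45 minutes of downtime and a 10-minute data-loss window.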

Common Disaster Scenarios for Charging Operators

– Cloud region outage affecting the CPMS
– Database corruption or accidental deletion
– Cyber incidents (ransomware, credential compromise, supply-chain attack)
– Payment provider outage or API failure
– Roaming hub outage causing interoperability failures
– Mis-deployed configuration or firmware affecting many chargers at once
– Network-wide connectivity issues (SIM/APN failures, DNS issues)

DR Strategies Used in Practice

Different operators choose different DR levels based on scale and SLA needs:

Active-passive
Primary system runs normally; backup is ready to take over if needed.

Active-active
Two regions run in parallel and both serve traffic; if one fails, traffic shifts to the other with little or no interruption.

Cold / warm / hot standby
Backup environment ranges from “needs manual start” (cold), through “partially running at reduced capacity” (warm), to “fully running” (hot).

Graceful degradation
If the backend is down, chargers may continue in limited modes (e.g., free vend, local whitelist, offline transactions) depending on local capabilities and configuration.
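To make the active-passive pattern above more concrete, here is a minimal sketch of a failover trigger that switches to the standby only after several consecutive failed health checks, so a single missed probe does not cause a switch. The class name `FailoverController`, the threshold of three, and the "primary"/"standby" labels are assumptions made for illustration. In practice, operators often rely on load balancer or DNS health checks and may require a human to declare the disaster.

```python
class FailoverController:
    """Hypothetical active-passive switch driven by health-check results."""

    def __init__(self, failure_threshold: int = 3):
        self.failure_threshold = failure_threshold
        self.consecutive_failures = 0
        self.active = "primary"

    def record_health_check(self, primary_healthy: bool) -> str:
        """Record one probe of the primary and return the system that should serve traffic."""
        if self.active != "primary":
            # Failing back is left as a deliberate manual step in this sketch.
            return self.active
        if primary_healthy:
            # Any successful probe resets the failure streak.
            self.consecutive_failures = 0
        else:
            self.consecutive_failures += 1
            if self.consecutive_failures >= self.failure_threshold:
                self.active = "standby"
        return self.active
```

Keeping failback manual is a design choice in this sketch: it avoids flapping between regions while the primary is only partially recovered.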

DR for Chargers in the Field

Because chargers are distributed, DR planning must include device behavior during backend outages:
– Offline authorization logic (local whitelist, cached tokens)
– Session continuity and meter value buffering
– Safe fallback modes (restrictive vs permissive)
– Reconciliation rules when connectivity returns (avoid double billing)
– Clear operator policy on what services remain available during incidents
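Two of the points above, offline authorization and reconciliation, can be sketched in a few lines. This is an illustrative example under stated assumptions rather than any charger's or CPMS's actual logic: the names `FallbackMode`, `authorize_offline`, `BufferedSession`, and `BillingLedger` are hypothetical, and sessions are assumed to be uniquely identified by the pair of charger ID and charger-local transaction ID.

```python
from dataclasses import dataclass, field
from enum import Enum


class FallbackMode(Enum):
    RESTRICTIVE = "restrictive"  # reject tokens that are not known locally
    PERMISSIVE = "permissive"    # accept unknown tokens while the backend is unreachable


def authorize_offline(token_id: str, local_whitelist: set, mode: FallbackMode) -> bool:
    """Decide on the charger whether to start a session without backend access."""
    if token_id in local_whitelist:
        return True
    return mode is FallbackMode.PERMISSIVE


@dataclass(frozen=True)
class BufferedSession:
    """A session recorded on the charger during the outage (hypothetical shape)."""
    charger_id: str
    local_transaction_id: int
    token_id: str
    energy_wh: int


@dataclass
class BillingLedger:
    """Backend-side record of sessions that have already been billed."""
    billed: dict = field(default_factory=dict)

    def reconcile(self, sessions: list) -> list:
        """Bill each buffered session once, even if the charger uploads it again."""
        newly_billed = []
        for session in sessions:
            key = (session.charger_id, session.local_transaction_id)
            if key in self.billed:
                continue  # already processed: skipping avoids double billing
            self.billed[key] = session
            newly_billed.append(session)
        return newly_billed
```

Because reconciliation is keyed on a stable identifier, a charger can safely resend its whole buffer after reconnecting, and only sessions the backend has not yet seen are billed.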

Best Practices

– Define RTO/RPO per service (CPMS, payments, roaming)
– Separate backups from production credentials and access
– Encrypt backups and test restore procedures regularly
– Use infrastructure-as-code to rebuild environments consistently
– Run DR drills (tabletop + live failover testing)
– Maintain an incident communication plan (status page, customer updates)
– Keep strong audit logs for post-incident analysis and compliance
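As a small illustration of testing restores rather than only taking backups, the sketch below copies a database and then verifies the copy by opening it, running an integrity check, and comparing a row count. SQLite from the Python standard library stands in for a production database purely to keep the example self-contained. The `sessions` table and the function names are assumptions, and a real restore test would also cover encryption keys, access separation, and restore timing.

```python
import sqlite3


def backup_database(source: sqlite3.Connection, backup_path: str) -> None:
    """Copy the source database into a separate file using SQLite's backup API."""
    target = sqlite3.connect(backup_path)
    try:
        source.backup(target)
    finally:
        target.close()


def verify_restore(backup_path: str, expected_session_count: int) -> bool:
    """Open the backup as if restoring it and check that it is usable and complete."""
    restored = sqlite3.connect(backup_path)
    try:
        # integrity_check returns a single row containing "ok" for a healthy database.
        integrity = restored.execute("PRAGMA integrity_check").fetchone()[0]
        # "sessions" is a hypothetical table name used only for this sketch.
        count = restored.execute("SELECT COUNT(*) FROM sessions").fetchone()[0]
    finally:
        restored.close()
    return integrity == "ok" and count == expected_session_count
```

Running a check like this on a schedule, against a restored copy rather than the production database, is what turns "backups exist" into "restores are known to work".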

Common Pitfalls

– Backups exist but restores are never tested
– DR plan covers the CPMS but not payments, roaming, or identity services
– No rollback strategy for mass configuration/firmware deployment errors
– Single points of failure (one region, one database, one DNS provider)
– Unclear decision-making: who declares a disaster and triggers failover
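One way to address the missing-rollback pitfall is to deploy configuration or firmware in small waves and revert automatically when a wave fails too often. The sketch below is a simplified illustration: `staged_rollout`, the `apply_update` and `rollback` callbacks, the wave size, and the failure threshold are all hypothetical and would need to match the operator's own deployment tooling.

```python
from typing import Callable, Sequence


def staged_rollout(charger_ids: Sequence[str],
                   apply_update: Callable[[str], bool],
                   rollback: Callable[[str], None],
                   wave_size: int = 2,
                   max_failure_rate: float = 0.2) -> dict:
    """Update chargers wave by wave; revert every touched charger if a wave fails too often."""
    touched = []
    for start in range(0, len(charger_ids), wave_size):
        wave = charger_ids[start:start + wave_size]
        failures = 0
        for charger_id in wave:
            ok = apply_update(charger_id)  # returns True when the update succeeded
            touched.append(charger_id)
            if not ok:
                failures += 1
        if failures / len(wave) > max_failure_rate:
            # Stop before the next wave and undo changes in reverse order.
            for charger_id in reversed(touched):
                rollback(charger_id)
            return {"status": "rolled_back", "touched": touched}
    return {"status": "completed", "touched": touched}
```

With this approach a bad release reaches only the chargers in the waves already processed, instead of the whole network at once.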

Related Terms

Business continuity
High availability (HA)
Failover
Backup and restore
Incident response
Secure update pipeline
Device provisioning
Diagnostics