Data Node Failure
In this failure case, one of the data nodes completely fails.

Stage | Details |
---|---|
Preconditions | Continually monitor the replication lag of the replica database to make sure that it stays in sync with the primary database. To do so, monitor the output of the following command: sudo -u ilo-pce illumio-pce-db-management show-replication-info |
Failure Behavior | PCE; VENs |
Recovery | |
Full Recovery | When the failed data node is recovered or a new node is provisioned, it registers with the PCE and is added as an active member of the cluster. This node is designated as the replica database and replicates all data from the primary database. |
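
The replication-lag precondition above can be automated with a small polling loop. The following is a minimal sketch, not an Illumio-provided tool. The function name monitor_replication and its parameters are hypothetical. The function only timestamps and records the raw output of the documented show-replication-info command and does not parse it, because this section does not describe that output's format.

```shell
#!/usr/bin/env bash
# Minimal sketch: capture replication info at a fixed interval so the
# replica's lag can be reviewed over time. The function name and its
# parameters are illustrative assumptions, not part of the Illumio tooling.

# Documented command for checking replication status.
DEFAULT_REPLICATION_CMD="sudo -u ilo-pce illumio-pce-db-management show-replication-info"

# monitor_replication INTERVAL_SECONDS ITERATIONS [COMMAND...]
#   ITERATIONS=0 runs until interrupted (Ctrl-C).
#   COMMAND defaults to the documented replication-info command.
monitor_replication() {
  local interval=$1 iterations=$2
  shift 2
  local -a cmd
  if [ "$#" -gt 0 ]; then
    cmd=("$@")
  else
    # Split the default command string into words.
    read -r -a cmd <<< "$DEFAULT_REPLICATION_CMD"
  fi

  local count=0
  while :; do
    # Timestamp each snapshot (UTC) so successive outputs can be compared.
    echo "=== $(date -u +%Y-%m-%dT%H:%M:%SZ) ==="
    if ! "${cmd[@]}"; then
      echo "WARNING: replication info command failed" >&2
    fi
    count=$((count + 1))
    if [ "$iterations" -gt 0 ] && [ "$count" -ge "$iterations" ]; then
      break
    fi
    sleep "$interval"
  done
}

# Example: record a snapshot every 60 seconds until interrupted.
#   monitor_replication 60 0 >> replication.log
```

Recording the raw output keeps the sketch independent of the exact report format. Alerting on a specific lag threshold would require parsing that output, which this section does not document.
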
Primary Database Doesn't Start
In this failure case, the primary database node fails to start.

Stage | Details |
---|---|
Preconditions | The primary database node does not start. |
Failure Behavior | The database cannot be started. Therefore, the entire PCE cluster cannot be started. |
Full Recovery | Recovery type: Manual. You have two recovery options: Warning: Promoting a replica to primary risks data loss. Illumio strongly recommends promoting a replica only as a last resort because of the potential for data loss. When the PCE Supercluster is affected by this problem, you must also restore data on the promoted primary database. |
Primary Database Doesn't Start When PCE Starts
In this failure case, the primary database node fails to start when the PCE starts or restarts.
The following recovery information applies only when the PCE starts or restarts. If the primary database node fails while the PCE is already running, database failover occurs automatically and the replica database node becomes the primary node.

Stage | Details |
---|---|
Preconditions | The primary database node does not start during PCE startup. This can be caused by an error on the primary node. It can also happen without any error. For example, if you start the replica node first and are then interrupted, the primary node might not be started before the startup timeout expires. |
Failure Behavior | The database cannot be started. Therefore, the entire PCE cluster cannot be started. |
Full Recovery | Recovery type: Manual. You have two recovery options: Warning: Promoting a replica to primary risks data loss. Consider this option a last resort because of the potential for data loss, which depends on the replication lag. If you choose the second option, run the following command on the replica database node: sudo -u ilo-pce illumio-pce-ctl promote-data-node <core-node-ip-address> This command promotes the node to primary database for the cluster whose leader is at the specified IP address. |
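
The potential data loss depends on replication lag, so it can help to check replication status immediately before promoting. The following is a minimal sketch of such a guard, not an Illumio tool. The function name promote_replica and its confirmation prompt are hypothetical. The sketch assumes that both documented commands are run as the ilo-pce user with sudo -u ilo-pce, and that you run it on the replica database node.

```shell
#!/usr/bin/env bash
# Minimal sketch: review replication status, then require explicit
# confirmation before promoting this replica to primary.
# The function name and prompt are illustrative; run it on the replica node.

# promote_replica CORE_NODE_IP
promote_replica() {
  if [ "$#" -ne 1 ]; then
    echo "usage: promote_replica <core-node-ip-address>" >&2
    return 2
  fi
  local core_ip=$1

  # Show replication status first so the potential data loss can be judged.
  # Continue if it fails: with the primary down, the report may be unavailable.
  if ! sudo -u ilo-pce illumio-pce-db-management show-replication-info; then
    echo "WARNING: could not read replication info; lag is unknown." >&2
  fi

  local answer
  read -r -p "Promote this node to primary for the cluster led by ${core_ip}? Type 'yes' to continue: " answer
  if [ "$answer" != "yes" ]; then
    echo "Aborted: node was not promoted." >&2
    return 1
  fi

  # Documented promotion command; targets the cluster whose leader is at core_ip.
  sudo -u ilo-pce illumio-pce-ctl promote-data-node "$core_ip"
}

# Example:
#   promote_replica <core-node-ip-address>
```

The function requires a literal "yes" rather than accepting a default. This makes an accidental promotion, with its risk of data loss, less likely.
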