Core Node Failure

In this failure case, one of the core nodes fails completely. This situation occurs whenever a node stops communicating with all of the other nodes in the cluster; for example, the node is destroyed, the node's SDS fails, or the node is powered off or disconnected from the cluster.

Stage	Details
Preconditions	The load balancer must be able to run application-level health checks on each core node in the PCE cluster so that it always knows whether a node is available. Important: When you use a DNS load balancer and need to provision a new core node to recover from this failure, the `cluster_public_ips` parameter in the `runtime_env.yml` file must include the IP addresses of your existing core nodes and the IP addresses of the replacement nodes. When this parameter is not configured correctly, VENs do not have outbound rules programmed to allow them to connect to the IP address of the replacement node. Illumio recommends that you preallocate these IP addresses so that, in the event of a failure, you can restore the cluster and the VENs can communicate with the replacement node.
Failure Behavior	PCE: The PCE is temporarily unavailable. Users might be unable to log in to the PCE web console. The PCE might return an HTTP 502 response, and the `/node_available` API call might return an HTTP 404 error. Other services that depend on the failed services might be restarted within the cluster. VENs: VENs are not affected and continue to enforce the current policy. When a VEN misses a heartbeat to the PCE, it retries in 5 minutes.
Recovery	Recovery type: Automatic. The cluster has multiple active core nodes for redundancy. Recovery procedure: None required. RTO: 5 minutes. RPO: Zero. No data loss occurs because the core nodes are stateless.
Full Recovery	Either recover the failed node or provision a new node and join it to the cluster.
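
The two checks described in the table (an application-level health check per core node, and confirming that the preallocated replacement IP addresses are covered by `cluster_public_ips`) can be sketched as follows. This is a minimal illustration, not Illumio tooling. The function names are hypothetical, the full URL of the `/node_available` endpoint is supplied by the caller, and treating only HTTP 200 as "available" is an assumption. The sketch also does not parse `runtime_env.yml`; the caller passes in the configured addresses as a plain list.

```python
import ipaddress
import urllib.error
import urllib.request
from typing import Callable, Iterable, List, Optional


def probe_status(url: str, timeout: float = 5.0) -> Optional[int]:
    """Return the HTTP status code for url, or None if the node is unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # Error responses such as 404 or 502 are raised as HTTPError.
        return err.code
    except (urllib.error.URLError, OSError):
        # Connection refused, DNS failure, timeout, and similar errors.
        return None


def is_available(status: Optional[int]) -> bool:
    """Assumption: only HTTP 200 means the node should receive traffic."""
    return status == 200


def available_nodes(
    urls: Iterable[str],
    probe: Callable[[str], Optional[int]] = probe_status,
) -> List[str]:
    """Return the health-check URLs whose nodes currently report as available."""
    return [url for url in urls if is_available(probe(url))]


def missing_public_ips(configured: Iterable[str], required: Iterable[str]) -> List[str]:
    """Return required IP addresses that are absent from the configured list.

    configured: addresses currently listed under cluster_public_ips.
    required: existing core node addresses plus preallocated replacements.
    """
    have = {ipaddress.ip_address(ip) for ip in configured}
    need = {ipaddress.ip_address(ip) for ip in required}
    return sorted(str(ip) for ip in need - have)
```

A load balancer or monitoring job could call `available_nodes` on a schedule and route traffic only to the returned nodes. An empty result from `missing_public_ips` indicates that every existing and replacement address is already listed, so VENs can be programmed with outbound rules for the replacement node before a failure occurs.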

Illumio Core 25.1 Administration Guide
