Illumio Core 25.2.10 Install, Configure, Upgrade

Restore a PCE or Entire Supercluster

Learn how to restore a single failed PCE, either leader or member, and rejoin it to a Supercluster. You will also learn how to restore the entire Supercluster.

Restore a Single PCE in a Supercluster

Learn how to restore a leader or member PCE in a Supercluster. Isolate that PCE from the Supercluster, restore it, and rejoin it to the Supercluster.

The following steps restore a single PCE in a Supercluster.

Step 1: Prepare to Restore a Single PCE
  • Back up the failed PCE. See Back Up Supercluster.

  • Back up the failed PCE's runtime_env.yml configuration file.

  • Make a list of the IP address, ports, and FQDN of the failed PCE. You must use these same values to reconfigure the repaired PCE.
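
The runtime_env.yml backup in this step can be sketched as a small shell helper. This is a minimal sketch, not an Illumio-provided tool: the function names runtime_env_path and backup_runtime_env and the destination directory are hypothetical. Only the default file path and the ILLUMIO_RUNTIME_ENV variable come from this procedure (they are described in Step 3).

```shell
# Sketch: copy the PCE runtime environment file to a backup directory.
# Function names and the destination directory are illustrative assumptions.

# Print the location of runtime_env.yml: the value of ILLUMIO_RUNTIME_ENV
# when it is set and non-empty, otherwise the documented default path.
runtime_env_path() {
    printf '%s\n' "${ILLUMIO_RUNTIME_ENV:-/etc/illumio-pce/runtime_env.yml}"
}

# Copy the file into the directory given as $1, preserving mode and timestamps.
backup_runtime_env() {
    dest_dir=$1
    src=$(runtime_env_path)
    if [ ! -f "$src" ]; then
        echo "runtime_env.yml not found at $src" >&2
        return 1
    fi
    mkdir -p "$dest_dir" && cp -p "$src" "$dest_dir/runtime_env.yml"
}
```

For example, backup_runtime_env /var/tmp/pce-backup (a hypothetical directory) would place a copy of the file in that directory. Run it as a user that can read the runtime environment file.
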

Step 2: Isolate the Affected PCE from the Supercluster

Before restoring a single PCE, isolate that PCE from the Supercluster.

  1. On all nodes, shut down the affected PCE:

    sudo -u ilo-pce illumio-pce-ctl stop
  2. On a core node of each surviving PCE in the Supercluster, set the PCE to runlevel 2:

    sudo -u ilo-pce illumio-pce-ctl set-runlevel 2

    Note

    You must set all PCEs to runlevel 2 before proceeding to the next step.

  3. On any core node of a surviving PCE, drop the failed PCE from the Supercluster:

    sudo -u ilo-pce illumio-pce-ctl supercluster-drop fqdn_of_failed_pce
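
Because the order of the isolation commands matters, it can help to print them as a checklist before running anything. The following is a minimal sketch that only prints the commands from this step and executes nothing. The function name print_isolation_plan is hypothetical.

```shell
# Sketch: print the isolation commands in the order this step requires.
# Nothing is executed; the function name is an illustrative assumption.
print_isolation_plan() {
    failed_fqdn=$1
    if [ -z "$failed_fqdn" ]; then
        echo "usage: print_isolation_plan fqdn_of_failed_pce" >&2
        return 1
    fi
    echo "# 1. On all nodes of the failed PCE:"
    echo "sudo -u ilo-pce illumio-pce-ctl stop"
    echo "# 2. On a core node of each surviving PCE (finish all before 3):"
    echo "sudo -u ilo-pce illumio-pce-ctl set-runlevel 2"
    echo "# 3. On any core node of one surviving PCE:"
    echo "sudo -u ilo-pce illumio-pce-ctl supercluster-drop $failed_fqdn"
}
```

Calling print_isolation_plan with the FQDN of the failed PCE produces a three-command checklist that can be pasted into a change record.
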
Step 3: Install a New PCE or Reuse the Affected PCE

Determine whether you want to reinstall the PCE on new hardware or reuse the PCE installation that is already on the affected system.

Note

In both cases, you must re-establish the FQDN of the affected PCE so that VENs can continue to communicate with the Supercluster. When you have any VENs in enforcement, or you rely on DNS-based load balancing, the new IP addresses of the PCE nodes can be different, as long as the new IP addresses were already in the appropriate settings in the runtime_env.yml file on all PCE core nodes. See Pre-configure New IP addresses.

  • To reinstall the PCE on new hardware, see Deploy a PCE Supercluster.

  • To reuse the affected PCE installation, complete the following steps.

When you decide to reuse the PCE's pre-failure installation, refresh the installation as a standalone PCE.

  1. Power on the PCE nodes.

  2. On all nodes of the affected PCE, run the following command to delete pre-failure directories:

    sudo -u ilo-pce illumio-pce-ctl reset

    Note

    You must run this command on all nodes before proceeding to the next step.

  3. Copy your backed-up copy of the failed PCE's runtime_env.yml file to its location on the newly repaired PCE. See Back Up Leader and Member Runtime Environment Files.

    The default location of the PCE Runtime Environment File is /etc/illumio-pce/runtime_env.yml. When the location is different on your system, locate the file by checking the value of the ILLUMIO_RUNTIME_ENV environment variable.

  4. On all nodes, bring the nodes to runlevel 1:

    sudo -u ilo-pce illumio-pce-ctl start --runlevel 1
  5. On any node, verify the nodes are at runlevel 1:

    sudo -u ilo-pce illumio-pce-ctl cluster-status -w
  6. On any node, run the following command:

    sudo -u ilo-pce illumio-pce-db-management setup
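
Before starting the nodes at runlevel 1, it can be useful to confirm that the runtime_env.yml file copied in step 3 is identical to the backup. The following is a minimal sketch under that assumption. The function name verify_runtime_env_restored is hypothetical, and the check uses only the standard cmp utility, not any Illumio command.

```shell
# Sketch: confirm the restored runtime_env.yml matches the backup copy.
# The function name is an illustrative assumption.
verify_runtime_env_restored() {
    backup_file=$1
    # Use ILLUMIO_RUNTIME_ENV when set, otherwise the documented default path.
    target=${ILLUMIO_RUNTIME_ENV:-/etc/illumio-pce/runtime_env.yml}
    if [ ! -f "$backup_file" ]; then
        echo "backup file not found: $backup_file" >&2
        return 2
    fi
    # cmp -s is silent and returns non-zero when the files differ
    # or when the target cannot be read.
    if cmp -s "$backup_file" "$target"; then
        echo "OK: $target matches $backup_file"
    else
        echo "MISMATCH: $target differs from $backup_file or is missing" >&2
        return 1
    fi
}
```

Run the check on each node where the file was restored, passing the path of the backup copy as the only argument.
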
Step 4: Restore the Affected PCE's Supercluster Data
  1. On any node of the affected PCE, verify the runlevel is still 1:

    sudo -u ilo-pce illumio-pce-ctl cluster-status -w
  2. On the data0 node, run the following command:

    sudo -u ilo-pce illumio-pce-db-management supercluster-data-restore --local-pce-file path_to_backup_file --restore-type single_pce

    The restore operation can take time to complete. Wait for it to finish before you move to the next step.

  3. On the data1 node, run the following command:

    sudo -u ilo-pce illumio-pce-db-management supercluster-data-restore --skip-db-restore --local-pce-file path_to_backup_file --restore-type single_pce

    The --skip-db-restore option prevents the command from unnecessarily repeating work that has already been done by previous commands.

  4. On any core node, reinstall any previously installed VEN bundles per the compatibility matrix. VEN bundles are not restored with the rest of the Supercluster data. See Ways to Install the VEN.
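
The data0 and data1 commands in this step differ only in the --skip-db-restore option. The following minimal sketch prints the right command for a given data node so that the difference is explicit. It executes nothing, and the function name single_pce_data_restore_cmd is hypothetical. Backup paths that contain spaces would need additional quoting.

```shell
# Sketch: print the supercluster-data-restore command for a data node
# when restoring a single PCE. The function name is an assumption.
single_pce_data_restore_cmd() {
    node=$1          # data0 or data1
    backup_file=$2   # path to this PCE's backup file
    base="sudo -u ilo-pce illumio-pce-db-management supercluster-data-restore"
    if [ -z "$backup_file" ]; then
        echo "missing path to the backup file" >&2
        return 1
    fi
    case $node in
        data0)
            echo "$base --local-pce-file $backup_file --restore-type single_pce"
            ;;
        data1)
            # data1 runs after data0 finishes and skips the database restore.
            echo "$base --skip-db-restore --local-pce-file $backup_file --restore-type single_pce"
            ;;
        *)
            echo "unknown node '$node' (expected data0 or data1)" >&2
            return 1
            ;;
    esac
}
```
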

Step 5: Rejoin the PCE to the Supercluster
  1. On all Supercluster PCEs, set the runlevel to 2:

    sudo -u ilo-pce illumio-pce-ctl set-runlevel 2

    Setting the runlevel may take some time to complete.

  2. Check the progress to see when the status is RUNNING on all nodes:

    sudo -u ilo-pce illumio-pce-ctl cluster-status -w
  3. Rejoin the PCE to the Supercluster.

    This command can take some time, depending on the number of PCEs in the Supercluster and the size of the PCE databases.

    Choose one of the following options, depending on whether you are working on a leader or member.

    Rejoining the Leader PCE

    On any core node, run the following command:

    sudo -u ilo-pce illumio-pce-ctl supercluster-restore fqdn_of_failed_cluster --restore-type single_pce

    While this command is running, the PCE temporarily sets the runlevel to 1. If the command is interrupted, you might see runlevel 1 unexpectedly.

    Rejoining a Member PCE

    On any core node, run the following command:

    sudo -u ilo-pce illumio-pce-ctl supercluster-restore fqdn_of_failed_cluster fqdn_of_supercluster_leader --restore-type single_pce

    While this command is running, the PCE temporarily sets the runlevel to 1. If the command is interrupted, you might see runlevel 1 unexpectedly.

  4. On every PCE, set the runlevel to 5.

    sudo -u ilo-pce illumio-pce-ctl set-runlevel 5
  5. Verify the runlevel.

    sudo -u ilo-pce illumio-pce-ctl cluster-status -w
  6. Verify that the restored PCE has rejoined the Supercluster and is fully operational.

    1. Log in to the leader PCE web console.

    2. Go to the PCE Health page and verify that the PCE health status is Normal.

  7. Check the status of the paired VENs on each PCE.

    From the PCE web console, choose Workloads and VENs > VENs. After all VENs change status from Active (Syncing) to Active, run the following command on one PCE at a time:

    sudo -u ilo-pce illumio-pce-ctl listen-only-mode disable
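
The leader and member rejoin commands in step 3 differ in one argument: a member rejoin also names the Supercluster leader. The following minimal sketch prints the matching command for each role. It executes nothing, and the function name single_pce_rejoin_cmd is hypothetical.

```shell
# Sketch: print the supercluster-restore command used to rejoin a single
# restored PCE. Nothing is executed; the function name is an assumption.
single_pce_rejoin_cmd() {
    role=$1          # leader or member
    pce_fqdn=$2      # FQDN of the restored PCE
    leader_fqdn=$3   # FQDN of the Supercluster leader (member only)
    base="sudo -u ilo-pce illumio-pce-ctl supercluster-restore"
    if [ -z "$pce_fqdn" ]; then
        echo "missing FQDN of the restored PCE" >&2
        return 1
    fi
    case $role in
        leader)
            echo "$base $pce_fqdn --restore-type single_pce"
            ;;
        member)
            if [ -z "$leader_fqdn" ]; then
                echo "a member rejoin also needs the leader FQDN" >&2
                return 1
            fi
            echo "$base $pce_fqdn $leader_fqdn --restore-type single_pce"
            ;;
        *)
            echo "unknown role '$role' (expected leader or member)" >&2
            return 1
            ;;
    esac
}
```
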
Restore an Entire Supercluster
  1. Prepare to restore the entire Supercluster.

  2. Shut down the entire Supercluster.

  3. Install new PCEs or reuse PCEs.

  4. Restore the PCEs. Repeat the following steps for all PCEs in the Supercluster, either one at a time or in parallel.

    1. Reinstall the PCE on new hardware or reuse the installations and runtime_env.yml files.

    2. Restore the Supercluster data from backup.

  5. Join the repaired PCEs to the Supercluster one at a time.

  6. Finish and verify the full Supercluster restore.

Step 1: Prepare to Restore the Entire Supercluster
  • Have the backups of all PCEs ready to use. For each member PCE, have that member's own backup and, on its data0 node, a copy of the backup file from the leader. The leader needs only its own backup. See Back Up Supercluster.

  • Have a copy of each PCE's runtime_env.yml file.

  • Know the IP address, ports, and DNS name of all PCEs in the Supercluster. You must use the same values when you rejoin the PCEs to the Supercluster.

Step 2: Shut Down the Entire Supercluster

On all nodes of every PCE in the Supercluster, run the following command:

sudo -u ilo-pce illumio-pce-ctl stop
Step 3: Install New PCEs or Reuse PCEs

Decide whether you want to completely reinstall the PCEs on new hardware or to reuse the PCE installations.

Note

In both cases, you must re-establish the FQDN of the affected PCE so that VENs can continue to communicate with the Supercluster. When you have any VENs in enforcement, or you rely on DNS-based load balancing, the new IP addresses of the PCE nodes can be different, as long as the new IP addresses were already in the appropriate settings in the runtime_env.yml file on all PCE core nodes. See Pre-configure New IP addresses for information.

  • To reinstall the PCEs on new hardware, see Deploy a PCE Supercluster.

  • To reuse the PCE installations, complete the following steps.

When you decide to reuse the PCE's pre-failure installation, refresh the installation as a standalone PCE:

  1. On all nodes of the PCE, reset the nodes:

    sudo -u ilo-pce illumio-pce-ctl reset

    Note

    You must run this command on all nodes before proceeding to the next step.

  2. Copy your backed-up copy of the failed PCE's runtime_env.yml file to its location on the newly repaired PCE. See Back Up Leader and Member Runtime Environment Files. The default location of the PCE Runtime Environment File is /etc/illumio-pce/runtime_env.yml. When the location is different on your system, locate the file by checking the value of the ILLUMIO_RUNTIME_ENV environment variable.

  3. Bring all nodes to runlevel 1:

    sudo -u ilo-pce illumio-pce-ctl start --runlevel 1
  4. On any node, verify runlevel 1:

    sudo -u ilo-pce illumio-pce-ctl cluster-status -w
  5. On any node, run the following command:

    sudo -u ilo-pce illumio-pce-db-management setup
  6. Repeat these steps for all PCEs in the Supercluster.

Step 4: Restore the Supercluster Data

Perform the following steps for all PCEs in the Supercluster one at a time or all in parallel.

  1. On any node of the PCE, verify the runlevel is still 1:

    sudo -u ilo-pce illumio-pce-ctl cluster-status -w
  2. On the data0 node, run the following depending on whether you are restoring a member or leader PCE.

    Member PCE

    In --local-pce-file, enter the path to the member PCE's backup file. In --restoring-pce-file, enter the path to the leader PCE's backup file, which should already be present on the PCE from when you followed the steps in Copy Leader Backup to Members.

    Note

    If necessary, copy the leader PCE's backup file to the data0 node of this PCE before running this command. Also, be sure that all member PCEs are using the same version of the leader PCE backup file. Using different versions of the leader PCE backup can cause data replication to fail after the Supercluster restore is complete.

    sudo -u ilo-pce illumio-pce-db-management supercluster-data-restore --local-pce-file path_to_backup_file --restoring-pce-file path_to_leader_pce_backup_file --restore-type entire_supercluster

    Leader PCE

    In --local-pce-file, enter the path to this leader PCE's backup file:

    sudo -u ilo-pce illumio-pce-db-management supercluster-data-restore --local-pce-file path_to_backup_file --restore-type entire_supercluster

    Note

    The restore operation can take some time to complete.

  3. On the data1 node, run the following command after the restore operation finishes:

    sudo -u ilo-pce illumio-pce-db-management supercluster-data-restore --skip-db-restore --local-pce-file path_to_backup_file --restore-type entire_supercluster

    The --skip-db-restore option prevents the command from unnecessarily repeating work that has already been done by previous commands.

  4. Set the runlevel to 2:

    sudo -u ilo-pce illumio-pce-ctl set-runlevel 2

    Setting the runlevel might take some time to complete.

  5. Check the progress to see when the status is RUNNING on all nodes:

    sudo -u ilo-pce illumio-pce-ctl cluster-status -w

For all PCEs in the Supercluster, you must complete all steps in Install New PCEs or Reuse PCEs and Restore Supercluster Data. When finished, proceed to the next task.
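
One way to confirm that every member PCE holds the same version of the leader backup file is to compute a digest once on the leader and check it on each member's data0 node. The following is a minimal sketch under that assumption and is not part of the Illumio tooling. The function names backup_digest and check_leader_backup are hypothetical, and the sketch assumes that sha256sum (GNU coreutils) or shasum is installed.

```shell
# Sketch: verify a copy of the leader backup against a known SHA-256 digest.
# Function names are illustrative assumptions.

# Print only the SHA-256 digest of the file given as $1.
backup_digest() {
    if command -v sha256sum >/dev/null 2>&1; then
        sha256sum "$1" | awk '{print $1}'
    else
        shasum -a 256 "$1" | awk '{print $1}'
    fi
}

# Return 0 when the digest of file $2 equals the expected digest $1.
check_leader_backup() {
    expected=$1
    file=$2
    if [ ! -f "$file" ]; then
        echo "backup file not found: $file" >&2
        return 2
    fi
    actual=$(backup_digest "$file")
    if [ "$actual" = "$expected" ]; then
        echo "OK: $file matches the leader digest"
    else
        echo "MISMATCH: $file has digest $actual" >&2
        return 1
    fi
}
```

On the leader, backup_digest prints the digest of the backup file. On each member's data0 node, check_leader_backup takes that digest and the path of the local copy. A mismatch means the member holds a different version of the leader backup.
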

Step 5: Rejoin the PCEs to the Supercluster

Warning

When you rejoin PCEs to a Supercluster, be sure to follow this order:

  1. First, rejoin the leader PCE.

  2. Next, rejoin the member PCEs one at a time in any order.

  1. Rejoin the leader PCE to the Supercluster. This command can take some time depending on the number of PCEs in the Supercluster and the size of the PCE databases.

    On any core node, run this command:

    sudo -u ilo-pce illumio-pce-ctl supercluster-restore fqdn_of_failed_cluster --restore-type entire_supercluster

    While the command is running, it temporarily sets the runlevel to 1. If the command is interrupted, you might see runlevel 1 unexpectedly.

  2. Rejoin each member PCE to the Supercluster. This command can take some time depending on the number of PCEs in the Supercluster and the size of the PCE databases.

    On any core node, run the following command:

    sudo -u ilo-pce illumio-pce-ctl supercluster-restore fqdn_of_failed_cluster fqdn_of_supercluster_leader --restore-type entire_supercluster

    While this command is running, it temporarily sets the runlevel to 1. If the command is interrupted, you might see runlevel 1 unexpectedly.

  3. Repeat step 2 until all PCEs are rejoined to the Supercluster.
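
The rejoin order in this step (leader first, then members one at a time) can be written out as an ordered command list before you start. The following minimal sketch prints the commands and executes nothing. The function name print_rejoin_plan is hypothetical, and each printed command is meant to be run on a core node of the PCE being rejoined, waiting for it to finish before running the next one.

```shell
# Sketch: print supercluster-restore commands in the required order for a
# full Supercluster restore. Nothing is executed; the name is an assumption.
print_rejoin_plan() {
    leader_fqdn=$1
    if [ -z "$leader_fqdn" ]; then
        echo "usage: print_rejoin_plan leader_fqdn [member_fqdn ...]" >&2
        return 1
    fi
    shift
    base="sudo -u ilo-pce illumio-pce-ctl supercluster-restore"
    # The leader is always rejoined first.
    echo "$base $leader_fqdn --restore-type entire_supercluster"
    # Members follow one at a time, in the order given.
    for member_fqdn in "$@"; do
        echo "$base $member_fqdn $leader_fqdn --restore-type entire_supercluster"
    done
}
```
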

Step 6: Finish and Verify the Full Supercluster Restore

After rejoining all PCEs in the Supercluster:

  1. On all PCEs, set the runlevel to 5:

    sudo -u ilo-pce illumio-pce-ctl set-runlevel 5
  2. On all PCEs, verify the runlevel:

    sudo -u ilo-pce illumio-pce-ctl cluster-status -w
  3. Verify that the restored PCEs have rejoined the Supercluster and are fully operational.

    1. Log in to the leader PCE web console.

    2. Go to the PCE Health page and verify that the PCE health status is Normal.

  4. Check the status of the paired VENs on each PCE. From the PCE web console, choose Workloads and VENs > VENs. After all VENs change status from Active (Syncing) to Active, run the following command on one PCE at a time:

    sudo -u ilo-pce illumio-pce-ctl listen-only-mode disable