Illumio Core 21.5 Install, Configure, Upgrade

Restore a PCE or Entire Supercluster

This section describes how to restore a single failed PCE, either leader or member, and rejoin it to a Supercluster. It also describes how to restore the entire Supercluster.

Restore a Single PCE in a Supercluster

This section explains how to restore a leader or member PCE in a Supercluster. Isolate that PCE from the Supercluster, restore it, and rejoin it to the Supercluster.

Summary

The following steps are an overview of how to restore a single PCE in a Supercluster. For detailed instructions, read the rest of this section.

  1. Preparation:

    1. Have the backups and the copy of the affected PCE's runtime_env.yml configuration file ready to use.

    2. Know the IP address, ports, and DNS name of the affected PCE. You must use the same values when you rejoin the PCE to the Supercluster.

  2. Isolate the affected PCE from the Supercluster.

  3. Install the PCE on new hardware or reuse the installation and runtime_env.yml file of the affected PCE.

  4. Restore the Supercluster data from backup.

  5. Join the repaired PCE to the Supercluster.

Prepare to Restore a Single PCE

Have the following items available:

  • A backup of the failed PCE; see Back Up Supercluster for information.

  • A record of the VEN versions currently installed on the PCE. This procedure removes all the VENs on the PCE, so note the versions before you begin so that you can reinstall them to the VEN Library during the Finish and Verify Full Supercluster Restore phase.

  • A backup copy of the failed PCE's runtime_env.yml file.

  • A list of the IP address, ports, and FQDN of the failed PCE. You will use these values to reconfigure the repaired PCE.
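
The runtime_env.yml backup in the list above can be sketched as a small shell helper. This is a minimal sketch, not an Illumio tool: it assumes the file is at /etc/illumio-pce/runtime_env.yml unless the ILLUMIO_RUNTIME_ENV environment variable points elsewhere, as this section describes for that file. The function names and the backup directory layout are hypothetical.

```shell
#!/bin/sh
# Resolve the runtime environment file: use ILLUMIO_RUNTIME_ENV when it is
# set and non-empty, otherwise fall back to the documented default location.
runtime_env_path() {
    printf '%s\n' "${ILLUMIO_RUNTIME_ENV:-/etc/illumio-pce/runtime_env.yml}"
}

# Copy the file into a backup directory (hypothetical layout). The node name
# is kept in the file name so copies from different nodes are not confused.
backup_runtime_env() {
    dest_dir=$1
    src=$(runtime_env_path)
    [ -r "$src" ] || { echo "cannot read $src" >&2; return 1; }
    mkdir -p "$dest_dir" || return 1
    cp -p "$src" "$dest_dir/runtime_env.$(uname -n).yml"
}
```

For example, running `backup_runtime_env /var/tmp/pce-restore` on a node would place a copy named after that node in the given directory; the directory shown here is only an illustration.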

Isolate the Affected PCE

Before restoring a single PCE, isolate that PCE from the Supercluster.

  1. On all nodes, shut down the affected PCE:

    sudo -u ilo-pce illumio-pce-ctl stop
  2. On a core node of each surviving PCE in the Supercluster, set the PCE to runlevel 2:

    sudo -u ilo-pce illumio-pce-ctl set-runlevel 2

    Note

    You must set all PCEs to runlevel 2 before proceeding to the next step.

  3. On any core node of a surviving PCE, drop the failed PCE from the Supercluster:

    sudo -u ilo-pce illumio-pce-ctl supercluster-drop fqdn_of_failed_pce
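
Because the order of these three steps matters, it can help to print the plan before running anything. The following is a dry-run sketch only: it prints the commands from the steps above and executes nothing. The function name and the example FQDN are hypothetical.

```shell
#!/bin/sh
# Dry run: print the isolation commands in the documented order.
# Nothing is executed; copy each command to the nodes named on its line.
PCE_CTL='sudo -u ilo-pce illumio-pce-ctl'

isolation_plan() {
    failed_fqdn=$1
    [ -n "$failed_fqdn" ] || { echo "usage: isolation_plan FAILED_PCE_FQDN" >&2; return 1; }
    echo "1. all nodes of the failed PCE: $PCE_CTL stop"
    echo "2. one core node of each surviving PCE: $PCE_CTL set-runlevel 2"
    echo "3. one core node of one surviving PCE: $PCE_CTL supercluster-drop $failed_fqdn"
}

# Example with a made-up FQDN.
isolation_plan pce2.example.com
```

Step 3 should be run only after every surviving PCE has reached runlevel 2, as the note above requires.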

Install New PCE or Reuse Affected PCE

Decide whether to completely reinstall the PCE on new hardware or reuse the PCE installation that is already on the affected system.

Note

In both cases, you must reestablish the FQDN of the affected PCE so that VENs can continue to communicate with the Supercluster. If any VENs are in enforcement, or you rely on DNS-based load balancing, the PCE nodes can have new IP addresses only if those addresses were already present in the appropriate settings in the runtime_env.yml file on all PCE core nodes. See Pre-configure New IP addresses for information.

  • To reinstall the PCE on new hardware, see Deploy a PCE Supercluster.

  • To reuse the affected PCE installation, complete the following steps.

To reuse the PCE's pre-failure installation, refresh the installation as a standalone PCE:

  1. Power on the PCE nodes.

  2. On all nodes of the affected PCE, run the following command to delete pre-failure directories:

    sudo -u ilo-pce illumio-pce-ctl reset

    Note

    You must run this command on all nodes before proceeding to the next step.

  3. Copy your backed-up copy of the failed PCE's runtime_env.yml file to its location on the newly repaired PCE. See Back Up Leader and Member Runtime Environment Files for information.

    The default location of the PCE Runtime Environment File is /etc/illumio-pce/runtime_env.yml. When the location is different on your system, locate the file by checking the value of the ILLUMIO_RUNTIME_ENV environment variable.

  4. On all nodes, bring the nodes to runlevel 1:

    sudo -u ilo-pce illumio-pce-ctl start --runlevel 1
  5. On any node, verify the nodes are at runlevel 1:

    sudo -u ilo-pce illumio-pce-ctl cluster-status -w
  6. On any node, run the following command:

    sudo -u ilo-pce illumio-pce-db-management setup

Restore Affected PCE's Supercluster Data

  1. On any node of the affected PCE, verify the runlevel is still 1:

    sudo -u ilo-pce illumio-pce-ctl cluster-status -w
  2. On the data0 node, run the following command:

    sudo -u ilo-pce illumio-pce-db-management supercluster-data-restore --local-pce-file path_to_backup_file --restore-type single_pce

    The restore operation can take some time to complete. Wait until it finishes before proceeding to the next step.

  3. On the data1 node, run the following command:

    sudo -u ilo-pce illumio-pce-db-management supercluster-data-restore --skip-db-restore --local-pce-file path_to_backup_file --restore-type single_pce

    The --skip-db-restore option prevents the command from unnecessarily repeating work that has already been done by previous commands.

  4. On any core node, reinstall any previously installed VEN bundles according to the compatibility matrix. VEN bundles are not restored with the rest of the Supercluster data. See Ways to Install the VEN.
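
The only difference between the data0 and data1 commands in steps 2 and 3 is the --skip-db-restore option. A minimal sketch of that rule follows; it builds the command line as text and does not run it. The function name and the example backup path are hypothetical.

```shell
#!/bin/sh
# Build (not run) the restore command for one data node of the affected PCE.
# node:   data0 or data1
# backup: path to this PCE's backup file
single_pce_restore_cmd() {
    node=$1
    backup=$2
    base='sudo -u ilo-pce illumio-pce-db-management supercluster-data-restore'
    case $node in
        data0) skip='' ;;
        data1) skip=' --skip-db-restore' ;;   # avoid repeating work done on data0
        *) echo "node must be data0 or data1" >&2; return 1 ;;
    esac
    printf '%s%s --local-pce-file %s --restore-type single_pce\n' "$base" "$skip" "$backup"
}

# Example with a made-up backup path.
single_pce_restore_cmd data0 /var/tmp/pce-backup.tar
single_pce_restore_cmd data1 /var/tmp/pce-backup.tar
```

Run the data1 command only after the data0 restore has finished.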

Rejoin PCE to Supercluster

  1. On all Supercluster PCEs, set the runlevel to 2:

    sudo -u ilo-pce illumio-pce-ctl set-runlevel 2

    Setting the runlevel might take some time to complete.

  2. Check the progress to see when the status is RUNNING on all nodes:

    sudo -u ilo-pce illumio-pce-ctl cluster-status -w
  3. Rejoin the PCE to the Supercluster. This command can take some time, depending on the number of PCEs in the Supercluster and the size of the PCE databases.

    Choose one of the following options, depending on whether you are working on a leader or member.

    Rejoining the Leader PCE

    On any core node, run the following command:

    sudo -u ilo-pce illumio-pce-ctl supercluster-restore fqdn_of_failed_cluster --restore-type single_pce

    While this command is running, the PCE temporarily sets the runlevel to 1. If the command is interrupted, you might see runlevel 1 unexpectedly.

    Rejoining a Member PCE

    On any core node, run the following command:

    sudo -u ilo-pce illumio-pce-ctl supercluster-restore fqdn_of_failed_cluster fqdn_of_supercluster_leader --restore-type single_pce

    While this command is running, the PCE temporarily sets the runlevel to 1. If the command is interrupted, you might see runlevel 1 unexpectedly.

  4. On every PCE, set the runlevel to 5:

    sudo -u ilo-pce illumio-pce-ctl set-runlevel 5
  5. Verify the runlevel:

    sudo -u ilo-pce illumio-pce-ctl cluster-status -w
  6. Verify that the restored PCE has rejoined the Supercluster and is fully operational:

    1. Log in to the leader PCE web console.

    2. Go to the PCE Health page and verify that the PCE health status is Normal.

  7. Check the status of the paired VENs on each PCE. From the PCE web console, choose Workloads and VENs > VENs. After all VENs change status from Active (Syncing) to Active, run the following command on one PCE at a time:

    sudo -u ilo-pce illumio-pce-ctl listen-only-mode disable
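
The leader and member forms of the rejoin command in step 3 differ only in their arguments. The sketch below builds the command text for either role without running it. It assumes the member form takes two separate arguments, the restored PCE's FQDN followed by the leader's FQDN; the function name and example FQDNs are hypothetical.

```shell
#!/bin/sh
# Build (not run) the rejoin command for a single restored PCE.
# role: leader or member.
# Assumption: the member form takes the restored PCE's FQDN and then the
# Supercluster leader's FQDN as two separate arguments.
rejoin_cmd() {
    role=$1
    restored_fqdn=$2
    leader_fqdn=$3
    base='sudo -u ilo-pce illumio-pce-ctl supercluster-restore'
    case $role in
        leader)
            printf '%s %s --restore-type single_pce\n' "$base" "$restored_fqdn" ;;
        member)
            [ -n "$leader_fqdn" ] || { echo "member rejoin needs the leader FQDN" >&2; return 1; }
            printf '%s %s %s --restore-type single_pce\n' "$base" "$restored_fqdn" "$leader_fqdn" ;;
        *)
            echo "role must be leader or member" >&2; return 1 ;;
    esac
}

# Examples with made-up FQDNs.
rejoin_cmd leader pce1.example.com
rejoin_cmd member pce2.example.com pce1.example.com
```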

Restore an Entire Supercluster

Restoring an entire Supercluster follows this high-level process:

  1. Preparation:

    1. Have the backups of all PCEs ready to use. For each member PCE, have that member's backup and, on the data0 node, a copy of the backup file from the leader. The leader only needs its own backup.

    2. Have copies of every PCE's runtime_env.yml configuration file ready to use.

    3. Know the IP address, ports, and DNS name of all PCEs in the Supercluster. You must use the same values when you rejoin the PCEs to the Supercluster.

  2. Shut down the entire Supercluster.

  3. Restore the PCEs. Repeat the following steps for all PCEs in the Supercluster, either one at a time or in parallel:

    1. Reinstall the PCE on new hardware or reuse the installations and runtime_env.yml files.

    2. Restore the Supercluster data from backup.

  4. Join the repaired PCEs to the Supercluster one at a time.

Prepare to Restore Entire Supercluster

Have the following items ready:

  • Backup of each PCE, and the leader's backup copied to each member. See Back Up Supercluster for information.

  • Copy of each PCE's runtime_env.yml file.

  • List of the new IP address, ports, and DNS name for all Supercluster members.

Shut Down Entire Supercluster

On all nodes of every PCE in the Supercluster, run the following command:

sudo -u ilo-pce illumio-pce-ctl stop

Install New PCEs or Reuse PCEs

Decide whether you want to completely reinstall the PCEs on new hardware or to reuse the PCE installations.

Note

In both cases, you must reestablish the FQDN of each affected PCE so that VENs can continue to communicate with the Supercluster. If any VENs are in enforcement, or you rely on DNS-based load balancing, the PCE nodes can have new IP addresses only if those addresses were already present in the appropriate settings in the runtime_env.yml file on all PCE core nodes. See Pre-configure New IP addresses for information.

  • To reinstall the PCEs on new hardware, see Deploy a PCE Supercluster for information.

  • To reuse the PCE installations, complete the following steps.

To reuse a PCE's pre-failure installation, refresh the installation as a standalone PCE:

  1. On all nodes of the PCE, reset the nodes:

    sudo -u ilo-pce illumio-pce-ctl reset

    Note

    You must run this command on all nodes before proceeding to the next step.

  2. Copy your backed-up copy of the failed PCE's runtime_env.yml file to its location on the newly repaired PCE. See Back Up Leader and Member Runtime Environment Files. The default location of the PCE Runtime Environment File is /etc/illumio-pce/runtime_env.yml. When the location is different on your system, locate the file by checking the value of the ILLUMIO_RUNTIME_ENV environment variable.

  3. Bring all nodes to runlevel 1:

    sudo -u ilo-pce illumio-pce-ctl start --runlevel 1
  4. On any node, verify runlevel 1:

    sudo -u ilo-pce illumio-pce-ctl cluster-status -w
  5. On any node, run the following command:

    sudo -u ilo-pce illumio-pce-db-management setup
  6. Repeat these steps for all PCEs in the Supercluster.

Restore Supercluster Data

Perform the following steps for all PCEs in the Supercluster one at a time or all in parallel.

  1. On any node of the PCE, verify the runlevel is still 1:

    sudo -u ilo-pce illumio-pce-ctl cluster-status -w
  2. On the data0 node, run one of the following commands, depending on whether you are restoring a member or leader PCE.

    Member PCE

    In --local-pce-file, enter the path to the member PCE's backup file. In --restoring-pce-file, enter the path to the leader PCE's backup file, which should already be present on the PCE from when you followed the steps in Copy Leader Backup to Members.

    Note

    If necessary, copy the leader PCE's backup file to the data0 node of this PCE before running this command.

    sudo -u ilo-pce illumio-pce-db-management supercluster-data-restore --local-pce-file path_to_backup_file --restoring-pce-file path_to_leader_pce_backup_file  --restore-type entire_supercluster

    Leader PCE

    In --local-pce-file, enter the path to this leader PCE's backup file:

    sudo -u ilo-pce illumio-pce-db-management supercluster-data-restore --local-pce-file path_to_backup_file --restore-type entire_supercluster

    Note

    The restore operation can take some time to complete.

  3. On the data1 node, run the following command after the restore operation finishes:

    sudo -u ilo-pce illumio-pce-db-management supercluster-data-restore --skip-db-restore --local-pce-file path_to_backup_file --restore-type entire_supercluster

    The --skip-db-restore option prevents the command from unnecessarily repeating work that has already been done by previous commands.

  4. Set the runlevel to 2:

    sudo -u ilo-pce illumio-pce-ctl set-runlevel 2

    Setting the runlevel might take some time to complete.

  5. Check the progress to see when the status is RUNNING on all nodes:

    sudo -u ilo-pce illumio-pce-ctl cluster-status -w

For all PCEs in the Supercluster, you must complete all steps in Install New PCEs or Reuse PCEs and Restore Supercluster Data. When finished, proceed to the next task.
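
For the data0 step, a member PCE needs two backup files while the leader needs only its own. The following sketch checks that the files exist and then builds the matching command as text; it does not run the restore. The function name and the example paths in the comments are hypothetical.

```shell
#!/bin/sh
# Build (not run) the data0 restore command for a full Supercluster restore.
# Leader: pass only its own backup file.
# Member: pass its own backup file, then the leader's backup file.
entire_restore_cmd() {
    local_backup=$1
    leader_backup=$2          # leave empty when restoring the leader itself
    [ -f "$local_backup" ] || { echo "missing backup: $local_backup" >&2; return 1; }
    cmd="sudo -u ilo-pce illumio-pce-db-management supercluster-data-restore --local-pce-file $local_backup"
    if [ -n "$leader_backup" ]; then
        [ -f "$leader_backup" ] || { echo "missing leader backup: $leader_backup" >&2; return 1; }
        cmd="$cmd --restoring-pce-file $leader_backup"
    fi
    printf '%s --restore-type entire_supercluster\n' "$cmd"
}

# Usage (made-up paths):
#   entire_restore_cmd /var/tmp/leader-backup.tar
#   entire_restore_cmd /var/tmp/member-backup.tar /var/tmp/leader-backup.tar
```

Checking for the leader's backup file before starting catches the case where it was never copied to the member's data0 node.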

Rejoin the PCEs to the Supercluster

Rejoin the leader PCE, then rejoin the member PCEs one at a time in any order.

  1. Rejoin the leader PCE to the Supercluster. This command can take some time depending on the number of PCEs in the Supercluster and size of the PCE databases.

    On any core node, run the following command:

    sudo -u ilo-pce illumio-pce-ctl supercluster-restore fqdn_of_failed_cluster --restore-type entire_supercluster

    While this command is running, it temporarily sets the runlevel to 1. If the command is interrupted, you might see runlevel 1 unexpectedly.

  2. Rejoin each member PCE to the Supercluster. This command can take some time depending on the number of PCEs in the Supercluster and size of the PCE databases.

    On any core node, run the following command:

    sudo -u ilo-pce illumio-pce-ctl supercluster-restore fqdn_of_failed_cluster fqdn_of_supercluster_leader --restore-type entire_supercluster

    While this command is running, it temporarily sets the runlevel to 1. If the command is interrupted, you might see runlevel 1 unexpectedly.

  3. Repeat step 2 until all PCEs are rejoined to the Supercluster.
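
The rejoin order (leader first, then members one at a time) can be laid out in advance with a dry run. This sketch prints the commands in order and executes nothing. It assumes the member form takes the member's FQDN followed by the leader's FQDN as two separate arguments; the function name and example FQDNs are hypothetical.

```shell
#!/bin/sh
# Dry run: print the rejoin order for a full Supercluster restore.
# First argument is the leader FQDN; the rest are member FQDNs in any order.
# Assumption: the member command takes the member FQDN, then the leader FQDN.
rejoin_plan() {
    leader=$1
    [ -n "$leader" ] || { echo "usage: rejoin_plan LEADER_FQDN [MEMBER_FQDN...]" >&2; return 1; }
    shift
    base='sudo -u ilo-pce illumio-pce-ctl supercluster-restore'
    echo "rejoin leader $leader: $base $leader --restore-type entire_supercluster"
    for member in "$@"; do
        echo "rejoin member $member: $base $member $leader --restore-type entire_supercluster"
    done
}

# Example with made-up FQDNs.
rejoin_plan pce1.example.com pce2.example.com pce3.example.com
```

Wait for each command to finish before starting the next one, since the PCEs must be rejoined one at a time.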

Finish and Verify Full Supercluster Restore

After rejoining all PCEs in the Supercluster:

  1. Reinstall all of the VEN versions that were previously installed in the VEN Library (see Prepare to Restore a Single PCE). Otherwise, installing or upgrading VENs with a Pairing Profile won't work and the VEN Library will be broken.

  2. On all PCEs, set the runlevel to 5:

    sudo -u ilo-pce illumio-pce-ctl set-runlevel 5
  3. On all PCEs, verify the runlevel:

    sudo -u ilo-pce illumio-pce-ctl cluster-status -w
  4. Verify that the restored PCEs have rejoined the Supercluster and are fully operational.

    1. Log in to the leader PCE web console.

    2. Go to the PCE Health page and verify that the PCE health status is Normal.

  5. Check the status of the paired VENs on each PCE. From the PCE web console, choose Workloads and VENs > VENs. After all VENs change status from Active (Syncing) to Active, run the following command on one PCE at a time:

    sudo -u ilo-pce illumio-pce-ctl listen-only-mode disable