Illumio Core 24.5 Administration Guide

Update PCE Configuration

This section describes how to change the configuration of a PCE at any time after the initial configuration is set during PCE installation.

Back up PCE Runtime File

Store a copy of each node's runtime_env.yml file on a system that is not part of the Supercluster. The default location of the PCE Runtime Environment File is /etc/illumio-pce/runtime_env.yml.

Update Runtime Configuration

Update the runtime_env.yml file with the configuration changes.

Run the following command to validate the runtime_env.yml file:

sudo -u ilo-pce illumio-pce-env check

Run the following command to restart the node with the configuration changes:

sudo -u ilo-pce illumio-pce-ctl restart
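The validate-then-restart sequence above can be scripted so that a node is not restarted with an invalid file. The following is a minimal sketch, not an official Illumio script. It assumes that illumio-pce-env check exits with a non-zero status when validation fails, which this section does not state. PCE_ENV_CMD, PCE_CTL_CMD, and apply_runtime_config are hypothetical names introduced for illustration.

```shell
# Hypothetical wrappers around the two commands shown in this section.
# Override them (for example, with "echo") to dry-run the function.
PCE_ENV_CMD="${PCE_ENV_CMD:-sudo -u ilo-pce illumio-pce-env}"
PCE_CTL_CMD="${PCE_CTL_CMD:-sudo -u ilo-pce illumio-pce-ctl}"

# Validate runtime_env.yml, then restart the node only if validation passed.
apply_runtime_config() {
    # Assumption: "check" returns non-zero when the file is invalid.
    if $PCE_ENV_CMD check; then
        $PCE_CTL_CMD restart
    else
        echo "runtime_env.yml failed validation; node not restarted" >&2
        return 1
    fi
}
```

After editing runtime_env.yml on a node, calling apply_runtime_config runs both steps in order and stops at the first failure.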
Get Current PCE Runlevel

When you first install the PCE software and start the PCE application, the runlevel is set to 1 by default. At runlevel 1, only the database services are running. This setting allows you to set up the database before the entire PCE application starts running.

Runlevel 1 is also used for upgrading the PCE software. When you upgrade the PCE, set the PCE runlevel to 1 before you migrate the PCE database. After the database migration finishes, set the PCE runlevel back to 5 to start the entire PCE application.

When the PCE software is already at runlevel 5, setting the runlevel to 1 takes effect the next time the software is started.

For more information about upgrading the PCE software, see the PCE Installation and Upgrade Guide.

Run this command to display the current Illumio PCE runlevel:

sudo -u ilo-pce illumio-pce-ctl get-runlevel
Set PCE Runlevel

Run this command to start the PCE cluster at one of the following runlevels:

  • Runlevel 1, which only starts the PCE database

  • Runlevel 5, which starts the entire PCE cluster

sudo -u ilo-pce illumio-pce-ctl set-runlevel [1 or 5]
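Because this section describes only runlevels 1 and 5, a small guard can reject any other value before the command runs. The following is a hedged sketch; set_pce_runlevel and PCE_CTL_CMD are hypothetical names, and the restriction to 1 and 5 reflects only what this section documents.

```shell
# Hypothetical wrapper around the command shown in this section.
PCE_CTL_CMD="${PCE_CTL_CMD:-sudo -u ilo-pce illumio-pce-ctl}"

# Set the PCE runlevel, accepting only the two values described above.
set_pce_runlevel() {
    case "${1:-}" in
        1|5)
            $PCE_CTL_CMD set-runlevel "$1"
            ;;
        *)
            echo "runlevel must be 1 or 5, got: '${1:-}'" >&2
            return 2
            ;;
    esac
}
```

For example, set_pce_runlevel 1 before a database migration and set_pce_runlevel 5 afterward.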
Update PCE Certificates

When a PCE certificate is renewed or replaced, you must install the new certificate on all PCE nodes. Use the following steps.

  1. Obtain the new certificate. The certificate must meet the certificate requirements described in the PCE Installation and Upgrade Guide.

  2. Stop all nodes in your deployment:

    sudo -u ilo-pce illumio-pce-ctl stop
  3. On all nodes, load the certificate into the correct directory.

    For example:

    /var/lib/illumio_pce/cert
  4. When the name of the new certificate is different from the name of the old certificate, update the file names in your runtime_env.yml file on every node.

  5. On all nodes, validate the certificate:

    sudo -u ilo-pce illumio-pce-env check
  6. Start all nodes in your deployment:

    sudo -u ilo-pce illumio-pce-ctl start
Change the PCE FQDN

To change the PCE FQDN:

  • Back up the database and restore the database with the change-fqdn option.

  • Configure runtime_env.yml before the restore and make sure the web certificate has the new FQDN.

Ideally, the new certificate lists the old FQDN as a Subject Alternative Name (SAN). This way, the VENs can still connect to the PCE and update the FQDN in their own configuration. Whether this is possible depends on the reason the FQDN is being changed.

Warning

Before starting this process, obtain or generate a certificate that includes the new FQDN. If you skip this step, your cluster will stay down because it still has the old certificates.
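One way to confirm which names a certificate covers before starting is to inspect its Subject Alternative Name list. The following is a minimal sketch that reads the text form of a certificate, as printed by openssl x509 -noout -text, and checks for an exact DNS entry. san_contains is a hypothetical helper name, and the host names in the usage note are placeholders.

```shell
# Read "openssl x509 -noout -text" output on stdin and succeed only if the
# given FQDN appears as an exact DNS entry (for example, in the SAN list).
san_contains() {
    # Split comma-separated entries onto separate lines, trim surrounding
    # whitespace, then require an exact whole-line match on "DNS:<fqdn>".
    tr ',' '\n' \
        | sed 's/^[[:space:]]*//; s/[[:space:]]*$//' \
        | grep -Fx "DNS:${1:-}" >/dev/null
}
```

A possible use is to run openssl x509 -in /path/to/new-cert.pem -noout -text | san_contains pce-new.example.com, and then repeat the check with the old FQDN if the old name is meant to stay on the certificate.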

You can change the fully-qualified domain name (FQDN) of a PCE as long as the PCE is not part of a Supercluster.

  1. On any node, shut down all PCE nodes:

    sudo -u ilo-pce illumio-pce-ctl cluster-stop
  2. Open the file runtime_env.yml.

  3. Modify the parameter pce_fqdn and save the file.

  4. Validate the runtime_env.yml file:

    sudo -u ilo-pce illumio-pce-env check

    Note

    Workloads that were paired with the old FQDN automatically detect and pair with the new FQDN as long as the PCE was stopped long enough for each VEN to attempt and fail at least one heartbeat.

  5. On any node, restart the PCE:

    sudo -u ilo-pce illumio-pce-ctl cluster-restart
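Steps 2 and 3 above can also be done without an editor. The following is a minimal sketch that keeps a backup copy and rewrites the pce_fqdn line. It assumes that pce_fqdn appears as an unindented "key: value" line in runtime_env.yml; set_pce_fqdn is a hypothetical helper name, and the FQDN in the usage note is a placeholder.

```shell
# Rewrite the pce_fqdn line in a runtime_env.yml file.
#   $1 = path to runtime_env.yml
#   $2 = new FQDN (letters, digits, hyphens, and dots only)
set_pce_fqdn() {
    # Keep a copy of the original next to the file before changing it.
    cp -p "$1" "$1.bak" || return 1
    # Write back through the existing file so its owner and mode are kept.
    sed "s/^pce_fqdn:.*/pce_fqdn: $2/" "$1.bak" > "$1"
}
```

For example, set_pce_fqdn /etc/illumio-pce/runtime_env.yml pce-new.example.com, followed by the validation command in step 4.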
Upgrade the OS on a Running PCE

You can upgrade the operating system on a running PCE cluster without stopping the entire cluster. Isolate one node at a time, wipe its disk, and install the new operating system while the other nodes in the PCE cluster continue to operate. The PCE can function with a mix of operating system versions on the different nodes.

Use this procedure when upgrading from one operating system version to another. If you are merely installing an operating system patch, you do not need to wipe the disk.

The general steps are as follows:

  1. Back up the PCE databases.

  2. Remove one node from the cluster.

  3. Wipe the disk and install the new operating system version.

  4. Install and configure the PCE software.

  5. Restore the node to the cluster.

  6. Repeat this procedure for the other nodes in the PCE cluster.

Back Up the PCE
  1. Back up the PCE policy and traffic databases and runtime_env.yml file. Follow the steps in PCE Database Backup. For a Supercluster, follow the steps in Back Up Supercluster in PCE Supercluster Deployment Guide.

  2. Save a copy of the PCE certificate in a safe location (not on the PCE node). Take note of the directory path where the certificate was stored. You will need to replace the certificate in the same location later.

  3. Save a copy of the private key in a safe location. Take note of the directory path where the key file was stored. You will need to replace the key in the same location later.

Remove a Node From the Cluster

Remove one node from the PCE cluster so you can update its operating system. The cluster will continue to operate using the remaining nodes.

Remove and upgrade the nodes in this order:

  • Core nodes

  • Replica data node

  • Primary data node

Caution

Remove and upgrade the policy database primary data node last to avoid unnecessary failover. To find the primary data node, run the following command on any node in the PCE cluster:

sudo -u ilo-pce illumio-pce-db-management show-master
  1. Verify that the cluster is running and healthy. If you remove a node from a PCE that is not in a healthy state, it can cause downtime. There are several ways to check the health of the PCE cluster; see Monitor PCE Health.

    One way to check PCE health is to run the following command:

    sudo -u ilo-pce illumio-pce-ctl cluster-status
  2. On the node that is to be removed, stop the PCE software:

    sudo -u ilo-pce illumio-pce-ctl stop

    Stopping the PCE software causes PCE services to fail over to their backup node.

  3. Check to be sure the PCE node is stopped.

    sudo -u ilo-pce illumio-pce-ctl cluster-status

    Expected output:

    Checking Illumio Runtime                         STOPPED 1.76s
  4. When you are removing the leader node, wait until the PCE has promoted another node to the leader before proceeding. Run the following command to determine the new leader node:

    sudo -u ilo-pce illumio-pce-ctl cluster-leader
  5. On the leader node, run the following command to be sure the data nodes are synchronized.

    Caution

    To avoid data loss, the data nodes must be synchronized before removing the node from the PCE cluster. Be sure the output from this command shows that the nodes are synchronized.

    sudo -u ilo-pce illumio-pce-ctl cluster-status

    Expected output is similar to the following:

    Reading /etc/illumio-pce/runtime_env.yml.
    SERVICES (runlevel: 5)               NODES (Reachable: 3 of 4)
    ======================               =========================
    agent_background_worker_service      192.0.2.241      192.0.2.242
    agent_service                        192.0.2.241      192.0.2.242
    agent_traffic_redis_cache            192.0.2.240
    agent_traffic_redis_server           192.0.2.240
    agent_traffic_service                192.0.2.241      192.0.2.241      192.0.2.242      192.0.2.242
    app_gateway_service                  192.0.2.240      192.0.2.241      192.0.2.242
    auditable_events_service             192.0.2.241      192.0.2.242
    citus_coordinator_replica_service    NOT RUNNING
    citus_coordinator_service            192.0.2.240
    cluster_management_service           192.0.2.241      192.0.2.242
    collector_service                    192.0.2.241      192.0.2.241      192.0.2.242      192.0.2.242
    data_job_queue_redis_replica_service NOT RUNNING
    data_job_queue_redis_service         192.0.2.240
    data_job_queue_service               192.0.2.241      192.0.2.241      192.0.2.242      192.0.2.242
    database_monitor                     192.0.2.240
    database_service                     192.0.2.240
    database_slave_service               NOT RUNNING
    db_cache_manager_service             192.0.2.240
    ev_service                           192.0.2.241      192.0.2.242
    events_background_worker_service     192.0.2.241      192.0.2.242
    executor_service                     192.0.2.241      192.0.2.242
    fileserver_service                   192.0.2.240
    fileserver_slave_service             NOT RUNNING
    flow_analytics_monitor_service       192.0.2.240
    flow_analytics_service               192.0.2.240      192.0.2.240
    fluentd_data_service                 192.0.2.240
    fluentd_source_service               192.0.2.241      192.0.2.242
    fluentd_sys_event_fwd_service        192.0.2.240      192.0.2.241      192.0.2.242
    login_service                        192.0.2.241      192.0.2.242
    memcached                            192.0.2.241      192.0.2.242
    network_device_service               192.0.2.241      192.0.2.242
    node_monitor                         192.0.2.240      192.0.2.241      192.0.2.242
    report_generator_service             192.0.2.241      192.0.2.242
    report_monitor_service               192.0.2.240
    reporting_database_monitor           192.0.2.240
    reporting_database_replica_service   NOT RUNNING
    reporting_database_service           192.0.2.240
    reporting_etl_service                192.0.2.241
    reporting_management_service         192.0.2.241      192.0.2.242
    search_index_service                 192.0.2.241      192.0.2.242
    server_load_balancer                 192.0.2.241      192.0.2.242
    service_discovery_agent              NOT RUNNING
    service_discovery_server             192.0.2.240      192.0.2.241      192.0.2.242
    set_server_redis_server              192.0.2.240
    traffic_database_monitor             192.0.2.240
    traffic_query_service                192.0.2.240
    traffic_worker_service               192.0.2.241      192.0.2.241      192.0.2.242      192.0.2.242
    web_server                           192.0.2.241      192.0.2.242
    
    Cluster status: RUNNING
  6. Wait until the cluster status has returned to RUNNING.

  7. On the leader node, remove the node. For ip_address, substitute the IP address of the node you are removing:

    sudo -u ilo-pce illumio-pce-ctl cluster-leave ip_address

    Expected output:

    Removed node successfully.
  8. Check the status of the PCE again to confirm it is still running normally:

    sudo -u ilo-pce illumio-pce-ctl cluster-status

    Expected output is similar to that shown in step 5.
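The cluster-status checks in steps 5, 6, and 8 can be reduced to two questions: what does the final "Cluster status:" line say, and which services are listed as NOT RUNNING? The following is a minimal sketch that answers both from the command output. It assumes the output layout shown in step 5; cluster_state and not_running_services are hypothetical helper names.

```shell
# Read cluster-status output on stdin and print the value of the last
# "Cluster status:" line (for example, RUNNING).
cluster_state() {
    sed -n 's/^[[:space:]]*Cluster status:[[:space:]]*//p' | tail -n 1
}

# Read cluster-status output on stdin and print the name of each service
# whose node column reads NOT RUNNING.
not_running_services() {
    awk '/NOT RUNNING[[:space:]]*$/ { print $1 }'
}
```

For example, piping sudo -u ilo-pce illumio-pce-ctl cluster-status into cluster_state prints the overall state, and piping it into not_running_services lists the services to review while one node is out of the cluster.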

Remove OS and Install New

Remove the old operating system version. Then install the new version. Use the documentation provided by your operating system vendor.

Reinstall the PCE
  • Install the PCE software and configure its runtime parameters.

    Important

    Do not start the PCE yet.

    • Be sure the PCE FQDN (hostname) is the same as before the upgrade.

    • Be sure the IP addresses for all NICs are the same as before the upgrade.

    • Set up NTP and IPTables.

Restore PCE Files
  1. Copy the runtime_env.yml file to the same location where it was before.

  2. Restore the certificate and key files to the same directory path where they were before.

  3. Compare the certificate and key file locations to the specified locations in the runtime_env.yml file to be sure they match.
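The comparison in step 3 can be sketched as a check that each path named in runtime_env.yml points to a readable file. The helper below is hypothetical, and it assumes each setting is an unindented, unquoted "key: /path" line. The key names in the usage note (web_service_certificate and web_service_private_key) are also assumptions; use the names that appear in your own runtime_env.yml.

```shell
# For each given key, read its value from the YAML file and verify that
# the value is the path of a readable file.
#   $1 = path to runtime_env.yml; remaining arguments = key names
check_yaml_paths() {
    file=$1
    shift
    rc=0
    for key in "$@"; do
        # Take the first unindented "key: value" line for this key.
        path=$(sed -n "s/^$key:[[:space:]]*//p" "$file" | head -n 1)
        if [ -z "$path" ] || [ ! -r "$path" ]; then
            echo "missing or unreadable: $key -> ${path:-<not set>}" >&2
            rc=1
        fi
    done
    return $rc
}
```

For example, check_yaml_paths /etc/illumio-pce/runtime_env.yml web_service_certificate web_service_private_key returns a non-zero status if either file is absent from the location the configuration expects.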

Restore Node to Cluster

Restore the node to the cluster.

  1. On the node where you just upgraded the OS, run the following command. For ip_address, substitute the IP address of any running node in the PCE cluster:

    sudo -u ilo-pce illumio-pce-ctl cluster-join ip_address

    After the node successfully joins the PCE cluster, the PCE software is started.

  2. Verify that the cluster is functional and data has been synchronized to all data nodes.

    sudo -u ilo-pce illumio-pce-ctl cluster-status -w

    Wait until this command returns output that shows all services are running. The output concludes with this line:

    Cluster status: RUNNING
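The two steps above can be chained so that the status check runs only after the join succeeds. The following is a minimal sketch; it assumes that cluster-join exits with a non-zero status on failure, which this section does not state. rejoin_cluster and PCE_CTL_CMD are hypothetical names.

```shell
# Hypothetical wrapper around the command shown in this section.
PCE_CTL_CMD="${PCE_CTL_CMD:-sudo -u ilo-pce illumio-pce-ctl}"

# Join the cluster through a running node, then wait on cluster status.
#   $1 = IP address of any running node in the PCE cluster
rejoin_cluster() {
    if [ -z "${1:-}" ]; then
        echo "usage: rejoin_cluster ip_address" >&2
        return 2
    fi
    # Assumption: cluster-join returns non-zero when the join fails.
    $PCE_CTL_CMD cluster-join "$1" && $PCE_CTL_CMD cluster-status -w
}
```

Run it on the node whose operating system was just upgraded, passing the IP address of a node that is still in the cluster.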
Upgrade and Restore Remaining Nodes

Repeat this procedure for the other nodes in the PCE cluster. Reminder: Upgrade the primary database node last.