Update PCE Configuration
This section describes how to change the configuration of a PCE at any time after the initial configuration is set during PCE installation.
Back up PCE Runtime File
Store a copy of each node's runtime_env.yml file on a system that is not part of the Supercluster. The default location of the PCE Runtime Environment File is /etc/illumio-pce/runtime_env.yml.
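As a minimal sketch of the backup step, the function below copies the runtime file to a directory under a name that records the node and the time. The function name backup_runtime_env and the naming scheme are illustrative assumptions, not part of the PCE tooling; the destination directory still has to end up on a system outside the cluster (for example, a mounted share or a later transfer).

```shell
#!/usr/bin/env bash
# Hypothetical helper: copy runtime_env.yml to a backup directory under a
# node- and time-stamped name. Prints the path of the copy on success.
backup_runtime_env() {
  local src="${1:-/etc/illumio-pce/runtime_env.yml}"
  local dest_dir="$2"
  local dest

  [ -n "$dest_dir" ] || { echo "usage: backup_runtime_env SRC DEST_DIR" >&2; return 2; }
  [ -r "$src" ] || { echo "cannot read $src" >&2; return 1; }

  mkdir -p "$dest_dir" || return 1
  dest="${dest_dir}/runtime_env.$(uname -n).$(date +%Y%m%dT%H%M%S).yml"

  # -p preserves mode and timestamps so the copy can be restored unchanged.
  cp -p "$src" "$dest" || return 1
  echo "$dest"
}
```

For example, backup_runtime_env /etc/illumio-pce/runtime_env.yml /mnt/backup would write the copy under /mnt/backup, where /mnt/backup is a placeholder path.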
Update Runtime Configuration
Update the runtime_env.yml file with the configuration changes.
Run the following command to validate the runtime_env.yml file:
sudo -u ilo-pce illumio-pce-env check
Run the following command to restart the node with the configuration changes:
sudo -u ilo-pce illumio-pce-ctl restart
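The two commands above can be chained so that the node restarts only when validation passes. This is a sketch under one assumption that should be confirmed for your PCE version: that illumio-pce-env check exits with a non-zero status when validation fails. The function name and the PCE_CHECK_CMD and PCE_RESTART_CMD variables are hypothetical and exist only so the commands can be overridden for a dry run.

```shell
#!/usr/bin/env bash
# Commands taken from the steps above; the variables are illustrative
# overrides, not settings that the PCE itself reads.
PCE_CHECK_CMD="${PCE_CHECK_CMD:-sudo -u ilo-pce illumio-pce-env check}"
PCE_RESTART_CMD="${PCE_RESTART_CMD:-sudo -u ilo-pce illumio-pce-ctl restart}"

apply_runtime_config() {
  # Assumption: a failed validation is reported through the exit status.
  if ! $PCE_CHECK_CMD; then
    echo "runtime_env.yml failed validation; node not restarted" >&2
    return 1
  fi
  $PCE_RESTART_CMD
}
```

Calling apply_runtime_config on a node then runs the check and, only on success, the restart.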
Get Current PCE Runlevel
When you first install the PCE software and start the PCE application, the runlevel is set to 1 by default. At runlevel 1, only the database services are running. This setting allows you to set up the database before the entire PCE application starts running.
Runlevel 1 is also used for upgrading the PCE software. When you upgrade the PCE, you need to set the PCE runlevel to 1 before you migrate the PCE database. After the database migration finishes, you can set the PCE runlevel back to 5 to start the entire PCE application.
When the PCE software is already at runlevel 5, setting the runlevel to 1 takes effect the next time the software is started.
For more information about upgrading the PCE software, see PCE Installation and Upgrade Guide.
Run this command to display the current Illumio PCE runlevel:
sudo -u ilo-pce illumio-pce-ctl get-runlevel
Set PCE Runlevel
Run this command to start the PCE cluster at one of the following runlevels:
Runlevel 1, which only starts the PCE database
Runlevel 5, which starts the entire PCE cluster
sudo -u ilo-pce illumio-pce-ctl set-runlevel [1 or 5]
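Because this section documents only runlevels 1 and 5, a small wrapper can reject any other value before the command is run. The following is a sketch; set_pce_runlevel and the PCE_CTL override variable are hypothetical names introduced for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: accept only the two runlevels described above.
set_pce_runlevel() {
  local level="$1"
  case "$level" in
    1|5) ;;  # 1 = database only, 5 = entire PCE cluster
    *)
      echo "runlevel must be 1 or 5, got '${level}'" >&2
      return 2
      ;;
  esac
  # PCE_CTL is an illustrative override; the default is the documented command.
  ${PCE_CTL:-sudo -u ilo-pce illumio-pce-ctl} set-runlevel "$level"
}
```

For example, set_pce_runlevel 5 runs the documented command, while set_pce_runlevel 3 returns an error without calling it.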
Update PCE Certificates
Whenever the PCE certificates are updated, you must obtain the new certificate and update it on all PCE nodes. Use the following steps.
Obtain the new certificate. The certificate must meet certificate requirements described in PCE Installation and Upgrade Guide.
Stop all nodes in your deployment:
sudo -u ilo-pce illumio-pce-ctl stop
On all nodes, load the certificate into the correct directory.
For example:
/var/lib/illumio_pce/cert
When the name of the new certificate is different from the name of the old certificate, update the file names in your runtime_env.yml file on every node.
On all nodes, validate the certificate:
sudo -u ilo-pce illumio-pce-env check
Start all nodes in your deployment:
sudo -u ilo-pce illumio-pce-ctl start
Change the PCE FQDN
To change the PCE FQDN:
Back up the database and restore the database with the change-fqdn option.
Configure runtime_env prior to the restore and make sure the web certificate has the new FQDN.
Ideally, the old FQDN is included as a Subject Alternative Name on the new certificate. This way, the VENs can still connect to the PCE and update the FQDN in their own configuration. Whether this is possible depends on the reason the FQDN is being changed.
Warning
Before starting this process, add or generate another certificate with a new FQDN. If you skip this step, your cluster will stay down with old certificates.
You can change the fully-qualified domain name (FQDN) of a PCE as long as the PCE is not part of a Supercluster.
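One way to confirm that a certificate lists a given FQDN as a Subject Alternative Name is to search the text dump produced by openssl x509. The sketch below assumes OpenSSL is installed on the node; san_lists_fqdn is a hypothetical helper that matches exact DNS entries only and does not evaluate wildcard names.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `openssl x509 -noout -text` output on stdin and
# succeed if an exact "DNS:<fqdn>" entry is present (case-insensitive).
san_lists_fqdn() {
  local fqdn="$1"
  # Put each comma- or space-separated token on its own line, then require
  # a whole-line, fixed-string match so substrings do not count.
  tr ', ' '\n\n' | grep -Fxqi "DNS:${fqdn}"
}
```

For example, with placeholder file and host names:

openssl x509 -in new-cert.crt -noout -text | san_lists_fqdn old-pce.example.com && echo "old FQDN present"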
On any node, shut down all PCE nodes:
sudo -u ilo-pce illumio-pce-ctl cluster-stop
Open the file runtime_env.yml.
Modify the parameter pce_fqdn and save the file.
Validate the runtime_env.yml file:
sudo -u ilo-pce illumio-pce-env check
Note
Workloads that were paired with the old FQDN automatically detect and pair with the new FQDN as long as the PCE was stopped long enough for each VEN to attempt and fail at least one heartbeat.
On any node, restart the PCE:
sudo -u ilo-pce illumio-pce-ctl cluster-restart
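The edit to pce_fqdn in the procedure above can also be scripted. The following sketch assumes that pce_fqdn appears once in runtime_env.yml as a top-level key at the start of a line; set_pce_fqdn is a hypothetical helper, and it keeps a .bak copy next to the file before changing it.

```shell
#!/usr/bin/env bash
# Hypothetical helper: replace the value of the top-level pce_fqdn key.
set_pce_fqdn() {
  local file="$1" new_fqdn="$2" tmp

  [ -n "$new_fqdn" ] || { echo "usage: set_pce_fqdn FILE NEW_FQDN" >&2; return 2; }
  grep -q '^pce_fqdn:' "$file" || { echo "pce_fqdn not found in $file" >&2; return 1; }

  cp -p "$file" "${file}.bak" || return 1
  tmp="$(mktemp)" || return 1
  sed "s|^pce_fqdn:.*|pce_fqdn: ${new_fqdn}|" "$file" > "$tmp" || { rm -f "$tmp"; return 1; }

  # Write back through the existing file so its owner and mode are unchanged.
  cat "$tmp" > "$file" && rm -f "$tmp"
}
```

After the edit, validate the file with illumio-pce-env check as described in the steps above.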
Upgrade the OS on a Running PCE
You can upgrade the operating system on a running PCE cluster without stopping the entire cluster. Isolate one node at a time, wipe its disk, and install the new operating system while the other nodes in the PCE cluster continue to operate. The PCE can function with a mix of operating system versions on the different nodes.
Use this procedure when upgrading from one operating system version to another. If you are merely installing an operating system patch, you do not need to wipe the disk.
The general steps are as follows:
Back up the PCE databases.
Remove one node from the cluster.
Wipe the disk and install the new operating system version.
Install and configure the PCE software.
Restore the node to the cluster.
Repeat this procedure for the other nodes in the PCE cluster.
Back Up the PCE
Back up the PCE policy and traffic databases and runtime_env.yml file. Follow the steps in PCE Database Backup. For a Supercluster, follow the steps in Back Up Supercluster in PCE Supercluster Deployment Guide.
Save a copy of the PCE certificate in a safe location (not on the PCE node). Take note of the directory path where the certificate was stored. You will need to replace the certificate in the same location later.
Save a copy of the private key in a safe location. Take note of the directory path where the key file was stored. You will need to replace the key in the same location later.
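To take note of the original paths in a form that can be checked later, one option is a checksum manifest. This sketch assumes GNU coreutils sha256sum is available; record_manifest and verify_manifest are hypothetical helpers. Pass absolute paths so that the manifest records the full directory path of each file.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: record the path and SHA-256 checksum of each file,
# then verify later that the same files exist at the same paths.
record_manifest() {
  local manifest="$1" f
  shift
  : > "$manifest" || return 1
  for f in "$@"; do
    [ -f "$f" ] || { echo "missing: $f" >&2; return 1; }
    sha256sum "$f" >> "$manifest" || return 1
  done
}

verify_manifest() {
  # Fails if any listed file is missing or its content has changed.
  sha256sum -c --quiet "$1"
}
```

Store the manifest with the backup copies. After the files are restored, verify_manifest reports any file that is absent from its original path or differs from the saved copy.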
Remove a Node From the Cluster
Remove one node from the PCE cluster so you can update its operating system. The cluster will continue to operate using the remaining nodes.
Remove and upgrade the nodes in this order:
Core nodes
Replica data node
Primary data node
Caution
Remove and upgrade the policy database primary data node last to avoid unnecessary failover. To find the primary data node, run the following command on any node in the PCE cluster:
sudo -u ilo-pce illumio-pce-db-management show-master
Verify that the cluster is running and healthy. If you remove a node from a PCE that is not in a healthy state, it can cause downtime. There are several ways to check the health of the PCE cluster; see Monitor PCE Health.
One way to check PCE health is to run the following command:
sudo -u ilo-pce illumio-pce-ctl cluster-status
On the node that is to be removed, stop the PCE software:
sudo -u ilo-pce illumio-pce-ctl stop
Stopping the PCE software causes PCE services to fail over to their backup node.
Check to be sure the PCE node is stopped.
sudo -u ilo-pce illumio-pce-ctl cluster-status
Expected output:
Checking Illumio Runtime STOPPED 1.76s
When you are removing the leader node, wait until the PCE has promoted another node to the leader before proceeding. Run the following command to determine the new leader node:
sudo -u ilo-pce illumio-pce-ctl cluster-leader
On the leader node, run the following command to be sure the data nodes are synchronized.
Caution
To avoid data loss, the data nodes must be synchronized before removing the node from the PCE cluster. Be sure the output from this command shows that the nodes are synchronized.
sudo -u ilo-pce illumio-pce-ctl cluster-status
Expected output is similar to the following:
Reading /etc/illumio-pce/runtime_env.yml.
SERVICES (runlevel: 5)  NODES (Reachable: 3 of 4)
======================  =========================
agent_background_worker_service  192.0.2.241 192.0.2.242
agent_service  192.0.2.241 192.0.2.242
agent_traffic_redis_cache  192.0.2.240
agent_traffic_redis_server  192.0.2.240
agent_traffic_service  192.0.2.241 192.0.2.241 192.0.2.242 192.0.2.242
app_gateway_service  192.0.2.240 192.0.2.241 192.0.2.242
auditable_events_service  192.0.2.241 192.0.2.242
citus_coordinator_replica_service  NOT RUNNING
citus_coordinator_service  192.0.2.240
cluster_management_service  192.0.2.241 192.0.2.242
collector_service  192.0.2.241 192.0.2.241 192.0.2.242 192.0.2.242
data_job_queue_redis_replica_service  NOT RUNNING
data_job_queue_redis_service  192.0.2.240
data_job_queue_service  192.0.2.241 192.0.2.241 192.0.2.242 192.0.2.242
database_monitor  192.0.2.240
database_service  192.0.2.240
database_slave_service  NOT RUNNING
db_cache_manager_service  192.0.2.240
ev_service  192.0.2.241 192.0.2.242
events_background_worker_service  192.0.2.241 192.0.2.242
executor_service  192.0.2.241 192.0.2.242
fileserver_service  192.0.2.240
fileserver_slave_service  NOT RUNNING
flow_analytics_monitor_service  192.0.2.240
flow_analytics_service  192.0.2.240 192.0.2.240
fluentd_data_service  192.0.2.240
fluentd_source_service  192.0.2.241 192.0.2.242
fluentd_sys_event_fwd_service  192.0.2.240 192.0.2.241 192.0.2.242
login_service  192.0.2.241 192.0.2.242
memcached  192.0.2.241 192.0.2.242
network_device_service  192.0.2.241 192.0.2.242
node_monitor  192.0.2.240 192.0.2.241 192.0.2.242
report_generator_service  192.0.2.241 192.0.2.242
report_monitor_service  192.0.2.240
reporting_database_monitor  192.0.2.240
reporting_database_replica_service  NOT RUNNING
reporting_database_service  192.0.2.240
reporting_etl_service  192.0.2.241
reporting_management_service  192.0.2.241 192.0.2.242
search_index_service  192.0.2.241 192.0.2.242
server_load_balancer  192.0.2.241 192.0.2.242
service_discovery_agent  NOT RUNNING
service_discovery_server  192.0.2.240 192.0.2.241 192.0.2.242
set_server_redis_server  192.0.2.240
traffic_database_monitor  192.0.2.240
traffic_query_service  192.0.2.240
traffic_worker_service  192.0.2.241 192.0.2.241 192.0.2.242 192.0.2.242
web_server  192.0.2.241 192.0.2.242
Cluster status: RUNNING
Wait until the cluster status has returned to RUNNING.
On the leader node, remove the node. For ip_address, substitute the IP address of the node you are removing:
sudo -u ilo-pce illumio-pce-ctl cluster-leave ip_address
Expected output:
Removed node successfully.
Check the status of the PCE again to confirm it is still running normally:
sudo -u ilo-pce illumio-pce-ctl cluster-status
Expected output is similar to that shown in step 5.
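When reviewing cluster-status output during this procedure, it can help to reduce it to the reachable node count and the services reported as NOT RUNNING. The sketch below assumes the output layout shown in the sample in this section (one service per line, followed by its node addresses or NOT RUNNING); the function names are hypothetical, and both functions read the command output on standard input.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for cluster-status output supplied on stdin.

# Print the name of every service whose node column reads "NOT RUNNING".
not_running_services() {
  awk '$2 == "NOT" && $3 == "RUNNING" { print $1 }'
}

# Print "<reachable> <total>" taken from the "NODES (Reachable: X of Y)" header.
reachable_nodes() {
  sed -n 's/.*Reachable: \([0-9][0-9]*\) of \([0-9][0-9]*\).*/\1 \2/p'
}
```

For example:

sudo -u ilo-pce illumio-pce-ctl cluster-status | not_running_services

In the sample output in this section, taken while one node is stopped, this lists the replica and slave services plus service_discovery_agent.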
Remove OS and Install New
Remove the old operating system version. Then install the new version. Use the documentation provided by your operating system vendor.
Reinstall the PCE
Install the PCE software and configure its runtime parameters.
Important
Do not start the PCE yet.
Be sure the PCE FQDN (hostname) is the same as before the upgrade.
Be sure the IP addresses for all NICs are the same as before the upgrade.
Set up NTP and IPTables.
Restore PCE Files
Copy the runtime_env.yml file to the same location where it was before.
Replace the certificate and key files in the same directory path where they were before.
Compare the certificate and key file locations to the specified locations in the runtime_env.yml file to be sure they match.
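The comparison can be sketched as a check that every certificate and key path named in runtime_env.yml exists on disk. The key names web_service_certificate and web_service_private_key are assumptions about the runtime file and should be adjusted to match the parameters your file actually uses; check_cert_paths is a hypothetical helper that expects simple "key: value" lines at the top level.

```shell
#!/usr/bin/env bash
# Hypothetical helper: confirm that the files referenced in runtime_env.yml exist.
# Assumption: the certificate and key are configured under the two keys below.
check_cert_paths() {
  local file="${1:-/etc/illumio-pce/runtime_env.yml}"
  local key path rc=0

  for key in web_service_certificate web_service_private_key; do
    # Take the value after "key:", dropping any surrounding quotes.
    path="$(sed -n "s/^${key}:[[:space:]]*//p" "$file" | head -n 1 | tr -d "\"'")"
    if [ -z "$path" ]; then
      echo "$key: not set in $file" >&2
      rc=1
    elif [ ! -f "$path" ]; then
      echo "$key: $path does not exist" >&2
      rc=1
    else
      echo "$key: $path ok"
    fi
  done
  return "$rc"
}
```

A non-zero return status means at least one configured path is missing or unset, so the files or the runtime file need correcting before the node rejoins the cluster.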
Restore Node to Cluster
Restore the node to the cluster.
On the node where you just upgraded the OS, run the following command. For ip_address, substitute the IP address of any running node in the PCE cluster:
sudo -u ilo-pce illumio-pce-ctl cluster-join ip_address
After the node successfully joins the PCE cluster, the PCE software is started.
Verify that the cluster is functional and data has been synchronized to all data nodes.
sudo -u ilo-pce illumio-pce-ctl cluster-status -w
Wait until this command returns output that shows all services are running. The output concludes with this line:
Cluster status: RUNNING
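If you script this step and want a bounded wait rather than an open-ended one, the status command can be polled with a retry limit. This is a sketch, not a replacement for the -w option; wait_for_cluster_running and the PCE_STATUS_CMD override variable are hypothetical names, and the retry count and delay are arbitrary defaults.

```shell
#!/usr/bin/env bash
# Hypothetical helper: poll cluster-status until its output contains the
# "Cluster status: RUNNING" line, giving up after a fixed number of tries.
wait_for_cluster_running() {
  local tries="${1:-30}" delay="${2:-10}" i

  for ((i = 1; i <= tries; i++)); do
    # PCE_STATUS_CMD is an illustrative override; the default is the
    # documented status command without -w.
    if ${PCE_STATUS_CMD:-sudo -u ilo-pce illumio-pce-ctl cluster-status} 2>/dev/null \
        | grep -q '^Cluster status: RUNNING'; then
      return 0
    fi
    # Pause between attempts, but not after the last one.
    if [ "$i" -lt "$tries" ]; then
      sleep "$delay"
    fi
  done
  return 1
}
```

For example, wait_for_cluster_running 60 5 polls up to 60 times at 5-second intervals and returns non-zero if the cluster has not reported RUNNING by then.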
Upgrade and Restore Remaining Nodes
Repeat this procedure for the other nodes in the PCE cluster. Reminder: Upgrade the primary database node last.