PCE Supercluster Deployment Planning
This section describes the requirements you must meet before deploying a PCE Supercluster.
Plan Supercluster FQDNs Carefully
Plan the fully qualified domain names (FQDNs) you want to use for your Supercluster PCEs, and define these names exactly how you want them before you deploy the Supercluster. Changing FQDNs after deploying a Supercluster is possible but time-consuming. The PCE FQDNs are set in the `pce_fqdn` parameter in `runtime_env.yml`.
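As an illustration, the per-PCE FQDN setting might look like this in `runtime_env.yml`; the FQDN value is a hypothetical example:

```yaml
# runtime_env.yml on each node of this PCE
# (illumio-eu.bigco.com is a hypothetical example FQDN)
pce_fqdn: illumio-eu.bigco.com
```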
For example, you might include identifying strings in the FQDNs that indicate the geographic location of each member of the Supercluster:

- illumio-eu.bigco.com: `eu` in the hostname indicates Europe.
- illumio.na.bigco.com: `na` as a separate domain indicates North America.
You can also configure a global FQDN for the Supercluster. The global FQDN is used by the VENs rather than individual PCE FQDNs. The global Supercluster FQDN is set in the `supercluster_fqdn` parameter in `runtime_env.yml`.
When set, the PCE provides this FQDN instead of its own FQDN to VENs during pairing. This parameter must be set on all nodes in each PCE of the Supercluster. When you configure this option, each PCE server certificate must include the global FQDN in the SAN field. For example:
illumio-supercluster.bigco.com
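A minimal sketch of this setting, using the hypothetical name above; it goes in `runtime_env.yml` on every node of every PCE in the Supercluster:

```yaml
# Set on all nodes of all PCEs in the Supercluster.
# Each PCE's server certificate must include this name in its SAN field.
supercluster_fqdn: illumio-supercluster.bigco.com
```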
Number of Supercluster PCEs
A PCE Supercluster consists of a minimum of two and a maximum of eight PCEs. One PCE is always the Supercluster leader; the others are Supercluster members.
Capacity Planning for Supercluster PCEs
Recommended CPU, Memory, and Storage
Maximum Flow Capacity
Storage Device Layout
Runtime Parameters for Two-Storage-Device Configuration
In the two-storage-device configuration, to accommodate growth in the traffic data store, set the following parameters in `runtime_env.yml`:
Note
When you are deploying the two-storage-device configuration, you must set these parameters.
Under `traffic_datastore`:
- `data_dir`: path_to_second_disk (the path to the second storage device)
- `max_disk_usage_gb`: Set this parameter according to the table below.
- `partition_fraction`: Set this parameter according to the table below.
- `time_bucket_type`: Set this parameter according to the table below.
The recommended values for these parameters, based on PCE node cluster type (2x2 or 4x2) and the estimated number of workloads (VENs), are as follows:

Setting | 2x2, 2,500 VENs | 2x2, 10,000 VENs | 4x2, 25,000 VENs | Note |
---|---|---|---|---|
max_disk_usage_gb | 100 GB | 400 GB | 400 GB | This size reflects only part of the required total size, as detailed in "PCE Capacity Planning" in the PCE Installation and Upgrade Guide. |
partition_fraction | 0.5 | 0.5 | 0.5 | |
time_bucket_type | Day | Day | Day | |
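Putting the parameters together, a `runtime_env.yml` fragment for a 2x2 cluster with up to 10,000 VENs might look like the following sketch; the mount path is a hypothetical example:

```yaml
traffic_datastore:
  data_dir: /var/lib/illumio-pce-flow  # hypothetical mount point of the second disk
  max_disk_usage_gb: 400               # 2x2 cluster, 10,000 VENs
  partition_fraction: 0.5
  time_bucket_type: Day
```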
Network Traffic Between PCEs
PCEs in the Supercluster communicate via the following ports. Any network firewalls between the PCEs must be configured to allow this traffic.
Ports | Sources | Destinations |
---|---|---|
TCP 8443 (the default) or the management port configured for the PCE web console and REST API in runtime_env.yml. This port must be the same on all PCEs in the Supercluster. | Core nodes of leader PCE | PCE FQDN of all member PCEs |
TCP 5432 | All nodes of all PCEs | IP addresses of all other PCE data nodes |
TCP 5532 | Core nodes of leader PCE | IP addresses of all other PCE data nodes |
TCP 8302 | All nodes of all PCEs | PCE FQDN of all other PCEs and IP address of all nodes of all other PCEs |
UDP 8302 | All nodes of all PCEs | IP address of all nodes of all other PCEs |
TCP 8300 | All nodes of all PCEs | IP address of all nodes of all other PCEs |
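Before joining PCEs, you can sanity-check the firewall openings between clusters. The following is a minimal sketch, not an official Illumio tool; the TCP port list is taken from the table above, and only TCP reachability is probed (UDP 8302 cannot be verified with a simple connect):

```python
import socket

# TCP ports that must be reachable between Supercluster PCEs (see table above).
SUPERCLUSTER_TCP_PORTS = [8443, 5432, 5532, 8302, 8300]

def check_tcp_port(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

def check_peer(host: str) -> dict:
    """Probe all required TCP ports on a peer PCE node; returns {port: reachable}."""
    return {port: check_tcp_port(host, port) for port in SUPERCLUSTER_TCP_PORTS}
```

Run `check_peer()` against every node of every other PCE; any `False` entry points to a firewall rule that still needs to be opened. Note that 8443 is only the default management port; substitute your configured port if it differs.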
Load Balancers
As with a single PCE, all PCEs in the Supercluster must be front-ended with a load balancer (DNS or L4) to distribute requests across the PCEs' core nodes.
GSLB or a manual DNS update can be used to fail over VENs to a different PCE. See GSLB Requirements and High Availability and Disaster Recovery.
Traffic Load Balancer Configuration
When you use L4 load balancers in front of the PCEs, the load balancers should already be configured to forward inbound connections on TCP 8443 (the default) or the management port configured for the PCE web console and REST API in `runtime_env.yml`, and on TCP 8444, to an available, healthy core node.
In a Supercluster, the L4 load balancer must also be configured to forward additional inbound TCP 8302 connections originating from the other PCEs to an available, healthy core node.
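As one concrete, hypothetical way to meet these requirements, an HAProxy L4 configuration in front of a PCE's core nodes might look like the following sketch; the node addresses are placeholders, and 8443 assumes the default management port:

```
# haproxy.cfg fragment (illustrative only)
defaults
    mode tcp
    timeout connect 5s
    timeout client  30s
    timeout server  30s

frontend pce_mgmt
    bind *:8443          # or your configured management port
    default_backend core_nodes_8443

frontend pce_ven
    bind *:8444
    default_backend core_nodes_8444

frontend pce_cluster
    bind *:8302          # Supercluster PCE-to-PCE traffic
    default_backend core_nodes_8302

backend core_nodes_8443
    server core0 10.0.0.10:8443 check
    server core1 10.0.0.11:8443 check

backend core_nodes_8444
    server core0 10.0.0.10:8444 check
    server core1 10.0.0.11:8444 check

backend core_nodes_8302
    server core0 10.0.0.10:8302 check
    server core1 10.0.0.11:8302 check
```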
GSLB Requirements
Workloads can be paired to a specific PCE, or you can optionally use a GSLB to route workloads to the required PCE in your Supercluster.
When you are using a GSLB to route workloads, consider the following general guidelines.
For normal operations:
When all PCEs are available, workloads should be routed to the nearest PCE based on proximity and geolocation.
GSLB persistence (also known as “stickiness”) must be enabled so workloads are always routed to the same PCE that they are paired with (non-failure case). Balancing workloads across multiple PCEs is not supported.
For failover:
Recommended: A dedicated failover PCE joined to the Supercluster that has no other VENs.
Failover to any other PCE in the Supercluster. In this case, take care to prevent overloading the PCE beyond its rated capacity and to avoid cascading failures. One strategy is to configure a “buddy” PCE for each PCE that the GSLB uses for failover.
Workload failover time depends on the DNS time-to-live (TTL) configured in the GSLB.
Illumio strongly recommends that you do not automate workload failover using GSLB and instead initiate it manually.
Configure SAML IdP for User Login
After installation, you can configure the PCE to rely on an external, third-party SAML identity provider (IdP). See "Single Sign-On Configuration" in the PCE Administration Guide, which provides setup instructions for a wide variety of IdPs.
For the PCE Supercluster, you configure the details in the leader PCE web console exactly as you do for a standalone PCE, with one exception: you are presented with an intermediate page that lists all the PCEs in the Supercluster, including the leader and all members. Follow the same process detailed in the PCE Administration Guide to configure all the Supercluster PCEs, both leader and members.
Certificate Requirements
PCE-to-PCE communication occurs over TLS v1.2. The root CA certificate that signed each PCE's certificate must be in the root CA bundle on all other PCEs in the Supercluster.
Object Limits and Supercluster
The PCE enforces certain soft and hard limits to restrict the total number of system objects you can create. These limits are based on tested performance and capacity limits of the PCE. Most PCE object limits apply to the entire Supercluster. The limits are enforced by the leader when objects are created.
The object limit for the number of VENs per PCE (active_agents_per_pce) is not cluster-wide and applies to each individual PCE. When the VENs-per-PCE limit is reached, no more VENs can be paired to that PCE. This limit is also enforced when VENs are moved from one PCE to another via the REST API.
An exception is made when the system itself fails VENs over from one PCE to another PCE in the cluster. VENs that fail over do not count toward the limit, allowing you to temporarily exceed the VENs-per-PCE limit during an extended outage of a PCE in the Supercluster.
Changes to the object limit for the number of VENs per PCE (active_agents_per_pce) made on the Supercluster leader are propagated to the members within 30 minutes.
For more information about object limits and how to view your current object limit usage, see the PCE Administration Guide and the command `illumio-pce-ctl obj-limits list`.
RBAC Permissions: Leader or Member
In general, when you are using the Illumio PCE web console or the Illumio REST API, the types of operations you can perform depend on your PCE role-based access control (RBAC) permissions and whether you have logged into the leader or a member, as shown in the table below.
User Role | Operations | Leader | Members |
---|---|---|---|
Any Role | View objects | Yes | Yes |
Global Administrator & User Manager (Organization Owner) | Add and delete users; add, modify, delete, and provision system objects and rulesets (includes creating a pairing script). | Yes | No |
Global Administrator | Add, modify, delete, and provision system objects and rulesets (includes creating a pairing script) | Yes | No |
Global read-only | View all objects | Yes | Yes |
Global Policy Object Provisioner | Provision system objects | Yes | No |
Ruleset Manager | Create, update, and delete rulesets within defined scopes. | Yes | No |
Ruleset Provisioner | Provision rulesets within defined scopes. | Yes | No |
Process, File Limits, and Kernel Parameters
Even if you are running systemd, the file and kernel limits must be set as outlined in our build document for init.d systems. Servers with systemd also need the configuration changes outlined for init.d systems because some of the supercluster command-line tool commands are hard-coded to reference init.d security limits. Therefore, set the file and process limits in both configuration files. Refer to our build documentation for the required settings.
For reference, see "Requirements for PCE Installation" in "PCE Installation and Upgrade".
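For illustration only, such limits are typically set in a drop-in file under /etc/security/limits.d/; the account name and values below are placeholders, so use the exact values from the Illumio build documentation:

```
# /etc/security/limits.d/99-illumio-pce.conf (placeholder account name and values)
ilo-pce  soft  nofile  65535
ilo-pce  hard  nofile  65535
ilo-pce  soft  nproc   65535
ilo-pce  hard  nproc   65535
```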
Configure PCE Internal Syslog on Leader
You can configure the PCE's internal syslog service in the PCE web console on the Supercluster leader, for both the leader and the member PCEs. The internal syslog cannot be configured on a member PCE.
Note
When a standalone PCE is installed, a local destination for the PCE internal syslog is created to record events. When the PCE is joined as a member of the Supercluster, this local destination is removed.
After joining a member, log in to the Supercluster leader and configure the internal syslog for each member individually.
If you need to preserve events that occurred before a PCE was joined as a member, back up the PCE before you join it to the Supercluster.
See PCE Installation and Upgrade Guide for information about the PCE internal syslog.