VEN-to-PCE Communication
This topic discusses how the VEN communicates with the PCE for both Illumio Core Cloud customers and Illumio Core On-Premises customers.
Details about VEN-to-PCE Communication
On-Premises
By default, when the PCE is installed in a customer's data center (On-Premises), the VEN communicates with it over the following ports:
Port 8443 - HTTPS requests
Port 8444 - long-lived TLS-over-TCP connection
SaaS
The VEN communicates with the Illumio Core Cloud PCE over Port 443 for both HTTPS requests and the long-lived TLS-over-TCP connection.
The VEN uses Transport Layer Security (TLS) to connect to the PCE. The VEN must trust the PCE certificate before communication can occur.
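As a quick reachability check from a workload, the following Python sketch opens a TLS connection to each of the ports listed above and verifies the PCE certificate. It is an illustration only, not Illumio tooling: the host name pce.example.com, the PCE_PORTS table, and the function names are assumptions. Python verifies against its own default trust store, which may differ from the store the VEN uses.

```python
from __future__ import annotations

import socket
import ssl

# Port numbers come from the text above; the table layout is illustrative.
PCE_PORTS = {
    "on-premises": {"https": 8443, "long_lived": 8444},
    "saas": {"https": 443, "long_lived": 443},
}


def ports_for(deployment: str) -> list[int]:
    """Return the distinct PCE ports a VEN needs for a deployment type."""
    return sorted(set(PCE_PORTS[deployment].values()))


def check_tls(host: str, port: int, timeout: float = 5.0) -> str | None:
    """Open a TLS connection and return the negotiated protocol version.

    Raises ssl.SSLCertVerificationError if the certificate chain or host
    name cannot be verified, and OSError if the port is unreachable.
    """
    context = ssl.create_default_context()  # verifies chain and host name
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.version()


# Example use (hypothetical PCE FQDN):
#   for port in ports_for("on-premises"):
#       print(port, check_tls("pce.example.com", port))
```

A certificate verification error from check_tls on a path that crosses a TLS-inspecting device is one symptom of the incomplete-chain problem described under PCE Certificate Verification.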
The VEN sends the following details to the PCE:
Regular heartbeat with the latest hostname and other properties of the workload
Traffic log
Network interfaces
Processes
Open ports
Interactive users (Windows only)
Container workload information (C-VEN only)
The VEN receives the following details from the PCE:
Firewall policy
Lightning bolts/heartbeat responses with action to perform, such as sending a support report
PCE Certificate Verification
Keep in mind the following:
The VEN requires that the full certificate chain, up to but not including a self-signed root certificate trusted by the OS, be sent as part of the TLS handshake with the PCE.
The PCE will always send the full certificate chain, minus the root certificate.
If a "man-in-the-middle" (MITM) device with TLS inspection capability is deployed on the path between the VEN and the PCE, Illumio recommends bypassing TLS inspection for VEN-to-PCE communication:
Some MITM devices that forge the PCE certificate will not send the full certificate chain, resulting in a TLS failure with some VEN and OS combinations.
Illumio does not test coexistence with any MITM devices. With respect to compatibility with partial certificate chains in the TLS handshake, the behavior of the VEN and the behavior of the MITM device may change at any time without notice on either side.
Configurable Time for Heartbeat Warning
By specifying a custom time through the PCE User Interface, you can change how long the VEN can go without heartbeating before it enters the Warning state:
Go to Settings > Offline Timers.
Select the Server or Endpoint tab.
Click Edit.
In the Disconnect and Quarantine section, select Custom Timeout.
Specify a wait time.
Click Save.
VEN Connectivity
Online: The workload is connected to the network and can communicate with the PCE.
Offline: The workload is not connected to the network and cannot communicate with the PCE.
Suspended: The VEN is in the suspended state, and any rules programmed into the workload's iptables (including custom iptables rules) or Windows Filtering Platform firewall are removed completely. No Illumio-related processes are running on the workload.
VEN Support for IPv6 Traffic
You can configure how VENs support IPv6 traffic. Go to Settings > Security and click the General tab:
For VEN releases 20.2.0 and later, choose one of these options:
Allow IPv6 traffic according to your policy
Block IPv6 traffic only when in Full Enforcement. (Traffic will always be allowed on AIX and Solaris workstations.)
For VEN releases pre-20.2.0, choose one of these options:
Allow all IPv6 traffic
Block IPv6 traffic only when in Full Enforcement. (Traffic will always be allowed on AIX and Solaris workstations.)
Communication Frequency
The following table shows the frequency of communications to the PCE for common VEN operations. See PCE Administration Guide for more details about these intervals and their effects.
Function | Frequency | Notes |
---|---|---|
Firewall policy updates | Real-time if lightning bolts are enabled. | If lightning bolts are disabled or the channel is not functional, policy updates are communicated to the VEN by a heartbeat action. |
Active service reporting | See note. | |
Interface reports and changes | Event driven. | Only if there are changes to the interfaces; otherwise, no data are sent. |
Traffic flow log | Every 10 minutes. | |
Heartbeat | Every 5 minutes. | If the PCE does not receive three consecutive heartbeats, an event is written to the PCE's event log. See also VEN Heartbeats and Lost Agents. |
Dead-peer interval | Configurable. | For the default values, see VEN Offline Timers and Isolation. |
VEN tampering detection | Within a few seconds on Windows and Linux. | For more information, see Host Firewall Tampering Protection. |
VEN Heartbeats and Lost Agents
The VEN sends a heartbeat message every five minutes to the PCE to inform the PCE that it is up and running. If the VEN fails to send a heartbeat, check the workload where the VEN is installed and investigate any connectivity issues. If the VEN continues to fail to send a heartbeat, it eventually is marked Offline, which means it can no longer communicate with the PCE or other managed workloads.
PCE down or network issue and the VEN degraded state
If the VEN cannot connect to the PCE either because the PCE is down or because of a network issue, the VEN continues to enforce the last-known-good policy while it tries to reconnect with the PCE.
After missing three heartbeats, the VEN enters the degraded state. In the degraded state, the VEN ignores all the asynchronous commands received as lightning bolts from the PCE except the commands for software upgrades and support reports.
After connectivity to the PCE is restored, the VEN comes out of the degraded state after three successful heartbeats.
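The entry and exit rules for the degraded state can be sketched as a small counter. This is a minimal illustration of the behavior described above, not the VEN's implementation. The class and field names are hypothetical, and it assumes that the three heartbeats must be consecutive in both directions.

```python
from dataclasses import dataclass

THRESHOLD = 3  # three misses to enter, three successes to exit (from the text)


@dataclass
class HeartbeatTracker:
    """Hypothetical tracker for the degraded state of a single VEN."""

    degraded: bool = False
    missed: int = 0     # consecutive failed heartbeats
    succeeded: int = 0  # consecutive successful heartbeats

    def record(self, ok: bool) -> bool:
        """Record one heartbeat result and return the degraded flag."""
        if ok:
            self.succeeded += 1
            self.missed = 0
            if self.degraded and self.succeeded >= THRESHOLD:
                self.degraded = False
        else:
            self.missed += 1
            self.succeeded = 0
            if self.missed >= THRESHOLD:
                self.degraded = True
        return self.degraded
```

With a five-minute heartbeat interval, three missed heartbeats correspond to roughly 15 minutes without contact before the tracker reports the degraded state.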
Failed authentication and the VEN minimal state
If the VEN enters the degraded state because of failed authentications, the VEN enters a state called minimal. In the minimal state, the VEN only attempts to connect with the PCE every four hours through a heartbeat.
If the authentication failure was temporary, the VEN exits the minimal state after its first successful connection to the PCE. Whenever the VEN enters the minimal state, it stops the VTAP service. VTAP is then restarted when the VEN exits the minimal state.
If Kerberos authentication is used, the VEN attempts to refresh the agent token with a new Kerberos ticket before sending a heartbeat. If the authentication error is not recovered after four hours, the VEN sends a lost-agent message to the PCE which then logs a message in the Organization Events. The message informs the user that the VEN needs to be uninstalled or reinstalled manually on this workload.
VEN Offline Timers and Isolation
When the VEN on a workload is stopped, the VEN makes a "best effort" REST API goodbye call to the PCE. After a delay specified by the "workload goodbye timer" (default: 15 minutes for Server VENs, 1 day for Endpoint VENs), the PCE marks the workload offline and removes it from the policy.
If the REST API call (goodbye) fails, or if the workload goes offline abruptly (for example, due to a power outage), the PCE stops receiving heartbeats from the workload. After the period of time configured in the PCE web console Settings > Offline Timers elapses, the PCE marks the workload offline and recomputes policies for the peer workloads to isolate the offline workload. If no time period has been configured, the defaults are:
Server VENs: 60 minutes, or 12 heartbeats
Endpoint VENs: 24 hours
The system_task.agent_missed_heartbeats_check alert is sent when 25% of the time configured in the offline timer has elapsed without a heartbeat. For example, if the offline timer is set to 1 hour, the alert is sent after the VEN has not sent a heartbeat for 15 minutes; if the offline timer is set to 4 hours, the alert is sent after 1 hour. The same 25% rule applies to a customized timer.
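The 25% rule is simple arithmetic, shown here as a short Python sketch. The function and dictionary names are illustrative; the default timer values are the ones listed above.

```python
from datetime import timedelta

# Default offline timers from the text above.
DEFAULT_OFFLINE_TIMERS = {
    "server": timedelta(minutes=60),
    "endpoint": timedelta(hours=24),
}


def missed_heartbeat_alert_after(offline_timer: timedelta) -> timedelta:
    """Return how long a VEN can be silent before the alert is sent.

    The alert fires at 25% of the offline timer, that is, timer / 4.
    """
    return offline_timer / 4


for kind, timer in DEFAULT_OFFLINE_TIMERS.items():
    print(kind, "alert after", missed_heartbeat_alert_after(timer))
```

With the defaults, this gives 15 minutes for Server VENs and 6 hours for Endpoint VENs.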
Sampling Mode for VENs
If the VEN receives a sustained high rate of traffic from many individual connections, the VEN enters Sampling Mode to reduce the load. Sampling Mode is a protection mechanism that keeps the VEN from contributing to excessive CPU consumption. In Sampling Mode, not every flow is reported. Instead, flows are periodically sampled and logged.
After CPU usage on the VEN decreases, Sampling Mode is disabled and the VEN resumes reporting every connection. The VEN enters and exits Sampling Mode automatically, depending on its load.
Details about entering and exiting Sampling Mode are captured in /opt/illumio_ven_data/log/vtap.log. Look for Entering and Exiting throttle state.
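A small Python sketch can pull the Sampling Mode transitions out of the log. The exact log line format is not documented here, so the sketch assumes that both messages contain the phrase "throttle state" preceded somewhere on the line by Entering or Exiting. The sample lines in the usage comment are invented for illustration.

```python
from pathlib import Path

VTAP_LOG = Path("/opt/illumio_ven_data/log/vtap.log")  # path from the text
MARKERS = ("Entering", "Exiting")


def sampling_events(lines):
    """Yield (event, line) for lines that mention a throttle-state change.

    Assumption: each transition line contains "throttle state" together
    with either "Entering" or "Exiting".
    """
    for line in lines:
        if "throttle state" not in line:
            continue
        for marker in MARKERS:
            if marker in line:
                yield marker.lower(), line.rstrip("\n")
                break


# Example use on a workload (requires read access to the log):
#   with VTAP_LOG.open(errors="replace") as fh:
#       for event, line in sampling_events(fh):
#           print(event, line)
```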
Linux TCP Timeout Variable
For VENs installed on Linux workloads, the VEN relies on conntrack to manage the nf_conntrack_tcp_timeout_established variable.
By default, as soon as the VEN is installed, it sets nf_conntrack_tcp_timeout_established to eight hours (28,800 seconds). This timeout manages workload memory by removing unused connections from the connection tracking table, thereby improving performance.
If you change the value via sysctl, it is reverted the next time the workload is rebooted or the next time the VEN's configuration file is read.
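To confirm the value currently in effect on a Linux workload, you can read the setting from procfs. The sketch below is illustrative: the helper names are hypothetical, and the path is the standard procfs location for this sysctl, which is present only when the conntrack kernel module is loaded.

```python
from pathlib import Path

# Standard procfs location of the sysctl on Linux.
SYSCTL_PATH = Path("/proc/sys/net/netfilter/nf_conntrack_tcp_timeout_established")
VEN_DEFAULT_SECONDS = 8 * 60 * 60  # eight hours = 28,800 seconds (from the text)


def parse_timeout(text: str) -> int:
    """Parse the raw procfs content into a number of seconds."""
    return int(text.strip())


def is_ven_default(text: str) -> bool:
    """Return True if the parsed timeout equals the VEN default."""
    return parse_timeout(text) == VEN_DEFAULT_SECONDS


# Example use on a Linux workload:
#   print(is_ven_default(SYSCTL_PATH.read_text()))
```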
Wireless Connections and VPNs
The VEN supports wireless connections when it is installed on endpoints in Illumio Core.
For more information about installing the VEN on an endpoint, and supporting a wireless network connection, see Illumio Endpoint Segmentation.
Note
Wireless network support is only available for endpoints in Illumio Core. It is not available for other supported server types, such as bare-metal servers, virtual machines (VMs), or container hosts.
Show Amount of Data Transfer
The "show amount of data transfer" capability on the PCE is a preview feature available with the 20.2.0 release. The PCE reports the amount of data transferred into and out of workloads and applications in a datacenter. The number of bytes sent by and the number received by the provider of an application are reported separately. You can see these values in the traffic flow summaries streamed out of the PCE. You can enable this capability on a per-workload basis on the Workload page. You can also enable it in the pairing profile so that workloads are paired directly into this mode.
After the feature is enabled, the VEN starts reporting the number of bytes transferred over the connections. The PCE collects this data, adds relevant information such as labels, and sends the traffic flow summaries out of the PCE.
The direction reported in the flow summary is from the viewpoint of the provider of the flow.
Destination Total Bytes Out (dst_tbo): Number of bytes transferred out of the provider (connection responder)
Destination Total Bytes In (dst_tbi): Number of bytes transferred into the provider (connection responder)
The number of bytes includes:
L3 and L4 header sizes of each packet (IP Header and TCP Header)
Sizes of multiple headers that may be included in communication (when SecureConnect is enabled)
Retransmitted packets.
All bytes carried in the packets of a connection are included in the measurement. This is similar to how firewalls, SPAN-port measurement tools, and other network traffic measurement tools count traffic.
Term | Description |
---|---|
dst_tbi | Destination Total Bytes In: Total bytes received till now by the destination over the flows included in this flow-summary in the latest sampled interval. This is the same as bytes sent by the source. Present in 'A', 'C', and 'T' flow-summaries. Source = client = connection initiator; destination = server = connection responder. |
dst_tbo | Destination Total Bytes Out: Total bytes sent till now by the destination over the flows included in this flow-summary in the latest sampled interval. This is the same as bytes received by the source. Present in 'A', 'C', and 'T' flow-summaries. Source = client = connection initiator; destination = server = connection responder. |
dst_dbi | Destination Delta Bytes In: Number of bytes received by the destination in the latest sampled interval, over the flows included in this flow-summary. This is the same as bytes sent by the source. Present in 'A', 'C', and 'T' flow-summaries. Source = client = connection initiator; destination = server = connection responder. |
dst_dbo | Destination Delta Bytes Out: Number of bytes sent by the destination in the latest sampled interval, over the flows included in this flow-summary. This is the same as bytes received by the source. Present in 'A', 'C', and 'T' flow-summaries. Source = client = connection initiator; destination = server = connection responder. |
interval_sec T | Time Interval in Seconds: Duration of the latest sampled interval over which the above metrics are valid. |
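As an illustration of how the delta fields and the interval combine, the following Python sketch derives average byte rates from one flow summary. The JSON shape and the sample values are assumptions made for this example, as is the use of interval_sec as the key name; only the field meanings come from the table above.

```python
import json


def throughput(record: dict) -> dict:
    """Return average bytes per second in and out of the destination.

    Uses the delta counters (dst_dbi, dst_dbo), which cover only the
    latest sampled interval, divided by that interval's duration.
    """
    interval = record["interval_sec"]
    if interval <= 0:
        raise ValueError("interval_sec must be positive")
    return {
        "in_bytes_per_sec": record["dst_dbi"] / interval,
        "out_bytes_per_sec": record["dst_dbo"] / interval,
    }


# Hypothetical flow summary with invented values.
sample = json.loads(
    '{"dst_tbi": 12000, "dst_tbo": 48000,'
    ' "dst_dbi": 3000, "dst_dbo": 6000, "interval_sec": 600}'
)
print(throughput(sample))
```

The total counters (dst_tbi, dst_tbo) are not used for the rate because they accumulate over the life of the flows rather than over the latest interval.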
Connection State | Description |
---|---|
A | Active: The connection is still active at the time the record was posted. Typically observed with long-lived flows on source and destination side of communication. |
T | Timed Out: Flow does not exist any more. It has timed out. Typically observed on destination side of communication. |
C | Closed: Flow does not exist any more. It has been closed. Typically observed on source side of communication. |
S | Snapshot: Connection was active at the time VEN sampled the flow. Typically observed when the VEN is in Idle state. |