NPEP-133: FQDN Selector for Egress Traffic¶
- Issue: #133
- Status: Implementable
TLDR¶
This enhancement proposes adding a new optional selector to specify egress peers using Fully Qualified Domain Names (FQDNs).
Goals¶
- Provide a selector to specify egress peers using a Fully Qualified Domain Name (for example kubernetes.io).
- Support basic wildcard matching capabilities when specifying FQDNs (for example *.cloud-provider.io).
- Currently only ACCEPT type rules are proposed.
  - Safely enforcing DENY rules based on FQDN selectors is difficult as there is no guarantee a Network Policy plugin is aware of all IPs backing a FQDN policy. If a Network Policy plugin has incomplete information, it may accidentally allow traffic to an IP belonging to a denied domain. This would constitute a security breach.
    By contrast, ACCEPT rules, which may also have an incomplete list of IPs, would not create a security breach. In case of incomplete information, valid traffic would be dropped as the plugin believes the destination IP does not belong to the domain. While this is definitely undesirable, it is at least not an unsafe failure.
- DomainNames is restricted to the Admin tier of ClusterNetworkPolicy only.
  - Since Kubernetes NetworkPolicy does not have a FQDN selector, using domainNames in the Baseline tier would allow writing baseline rules that can't be replicated by an overriding NetworkPolicy. For example, if a Baseline-tier ClusterNetworkPolicy allows traffic to example.io, but the namespace admin installs a Kubernetes NetworkPolicy, the namespace admin has no way to replicate the example.io selector using just Kubernetes NetworkPolicies. This breaks the fundamental tier override model where NetworkPolicy can always override Baseline-tier rules.
Non-Goals¶
- This enhancement does not include a FQDN selector for allowing ingress traffic.
- This enhancement only describes enhancements to the existing L4 filtering as provided by ClusterNetworkPolicy. It does not propose any new L7 matching or filtering capabilities, like matching HTTP traffic or URL paths.
- This selector should not control what DNS records are resolvable from a particular workload.
- This selector provides no capability to detect traffic destined for different domains backed by the same IP (e.g. CDN or load balancers).
- This enhancement does not add any new mechanisms for specifying how traffic is routed to a destination (egress gateways, alternative SNAT IPs, etc). It just adds a new way of specifying packets to be allowed or dropped on the normal egress data path.
- This enhancement does not require any mechanism for securing DNS resolution (e.g. DNSSEC or DNS-over-TLS). Unsecured DNS requests are expected to be sufficient for looking up FQDNs.
Introduction¶
FQDN-based egress controls are a common enterprise security practice. Administrators often prefer to write security policies using DNS names such as “www.kubernetes.io” instead of capturing all the IP addresses the DNS name might resolve to. Keeping up with changing IP addresses is a maintenance burden, and hampers the readability of the network policies.
User Stories¶
- As a cluster admin, I want to allow all Pods in the cluster to send traffic to an external service specified by a well-known domain name. For example, all Pods must be able to talk to my-service.com.
- As a cluster admin, I want to allow Pods in the "monitoring" namespace to be able to send traffic to a logs-sink, hosted at logs-storage.com.
- As a cluster admin, I want to allow all Pods in the cluster to send traffic to any of the managed services provided by my Cloud Provider. Since the cloud provider has a well-known parent domain, I want to allow Pods to send traffic to all sub-domains using a wild-card selector -- *.my-cloud-provider.com.
- As a cluster admin, I want to allow Pods in the cluster to send traffic to an entire tree of domains. For example, our CDN has domains of the format <session>.<random>.<region>.my-app.cdn.com. I want to be able to use a wild-card selector to allow the full tree of subdomains below *.my-app.cdn.com.
Future User Stories¶
These are some user stories we want to keep in mind, but due to limitations of the existing Network Policy API, cannot be implemented currently. The design goal in this case is to ensure we do not make these unimplementable down the line.
- As a cluster admin, I want to switch the default disposition of the cluster to be default deny. This is enforced using a Baseline-tier ClusterNetworkPolicy. I also want individual namespace owners to be able to specify their egress peers. Namespace admins would then use a FQDN selector in the Kubernetes NetworkPolicy objects to allow my-service.com.
API¶
This NPEP proposes adding a DomainNames field to
ClusterNetworkPolicyEgressPeer which allows specifying domain names as
egress peers. DomainNames is only available with Accept rules in the Admin
tier of ClusterNetworkPolicy.
These restrictions are enforced via CEL validation on
ClusterNetworkPolicySpec (Baseline tier) and
ClusterNetworkPolicyEgressRule (Accept-only):
// +kubebuilder:validation:XValidation:rule="self.tier == 'Baseline' ? !self.egress.exists(rule, rule.to.exists(peer, has(peer.domainNames))) : true",message="domainNames cannot be used in Baseline tier as NetworkPolicy cannot override FQDN rules"
type ClusterNetworkPolicySpec struct { ... }
// +kubebuilder:validation:XValidation:rule="self.action != 'Accept' ? !self.to.exists(peer, has(peer.domainNames)) : true",message="domainNames may only be used with Accept action"
type ClusterNetworkPolicyEgressRule struct { ... }
// DomainName describes one or more domain names to be used as a peer.
//
// DomainName can be an exact match, or use the wildcard specifier '*' to match
// one or more labels.
//
// '*', the wildcard specifier, matches one or more entire labels. It does not
// support partial matches. '*' may only be specified as a prefix.
//
// Examples:
// - `kubernetes.io` matches only `kubernetes.io`.
// It does not match "www.kubernetes.io", "blog.kubernetes.io",
// "my-kubernetes.io", or "wikipedia.org".
// - `blog.kubernetes.io` matches only "blog.kubernetes.io".
// It does not match "www.kubernetes.io" or "kubernetes.io".
// - `*.kubernetes.io` matches subdomains of kubernetes.io.
// "www.kubernetes.io", "blog.kubernetes.io", and
// "latest.blog.kubernetes.io" match, however "kubernetes.io", and
// "wikipedia.org" do not.
//
// +kubebuilder:validation:Pattern=`^(\*\.)?([a-zA-Z0-9]([-a-zA-Z0-9_]*[a-zA-Z0-9])?\.)+[a-zA-Z0-9]([-a-zA-Z0-9_]*[a-zA-Z0-9])?\.?$`
type DomainName string
type ClusterNetworkPolicyEgressPeer struct {
<snipped>
// DomainNames provides a way to specify domain names as peers.
//
// DomainNames is only supported for Accept rules in the Admin tier.
// In order to control access, DomainNames Accept rules should be used
// with a lower precedence egress deny -- this allows the admin to
// maintain an explicit "allowlist" of reachable domains.
//
// DomainNames cannot be used in the Baseline tier because Kubernetes
// NetworkPolicy has no FQDN selector, so a Baseline FQDN rule cannot
// be overridden by a NetworkPolicy.
//
// Support: Extended
//
// <network-policy-api:experimental>
// +optional
// +listType=set
// +kubebuilder:validation:MinItems=1
// +kubebuilder:validation:MaxItems=25
DomainNames []DomainName `json:"domainNames,omitempty"`
}
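The matching semantics described in the DomainName comments can be sketched in Go as follows. This is a minimal illustration, not the API's reference implementation: the function names `normalize` and `MatchesDomainName` are hypothetical, the regular expression is modeled on the kubebuilder pattern with letter ranges written as `a-zA-Z`, and case-insensitive comparison plus trailing-dot trimming are assumptions based on general DNS conventions.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// domainNamePattern is modeled on the kubebuilder Pattern for DomainName.
var domainNamePattern = regexp.MustCompile(
	`^(\*\.)?([a-zA-Z0-9]([-a-zA-Z0-9_]*[a-zA-Z0-9])?\.)+[a-zA-Z0-9]([-a-zA-Z0-9_]*[a-zA-Z0-9])?\.?$`)

// normalize lowercases a name and drops one trailing dot so that
// "Kubernetes.IO." and "kubernetes.io" compare equal (an assumption).
func normalize(name string) string {
	return strings.ToLower(strings.TrimSuffix(name, "."))
}

// MatchesDomainName reports whether queried is selected by pattern.
// A pattern without a wildcard must match exactly. A "*." prefix matches
// one or more whole labels in front of the remaining suffix.
func MatchesDomainName(pattern, queried string) bool {
	if !domainNamePattern.MatchString(pattern) {
		return false // syntactically invalid selectors never match
	}
	p, q := normalize(pattern), normalize(queried)
	if !strings.HasPrefix(p, "*.") {
		return p == q
	}
	suffix := p[1:] // e.g. ".kubernetes.io", keeping the leading dot
	// The queried name must be strictly longer than the suffix, so the
	// parent domain itself ("kubernetes.io") is not matched.
	return len(q) > len(suffix) && strings.HasSuffix(q, suffix)
}

func main() {
	names := []string{
		"kubernetes.io", "www.kubernetes.io",
		"latest.blog.kubernetes.io", "my-kubernetes.io", "wikipedia.org",
	}
	for _, name := range names {
		fmt.Printf("%-28s exact=%v wildcard=%v\n", name,
			MatchesDomainName("kubernetes.io", name),
			MatchesDomainName("*.kubernetes.io", name))
	}
}
```

Keeping the leading dot in the suffix is what prevents `my-kubernetes.io` from matching `*.kubernetes.io`, since the wildcard only ever stands for whole labels.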
Examples¶
Pods in monitoring namespace can talk to my-service.com and *.cloud-provider.io¶
apiVersion: policy.networking.k8s.io/v1alpha2
kind: ClusterNetworkPolicy
metadata:
  name: allow-my-service-egress
spec:
  tier: Admin
  priority: 55
  subject:
    namespaces:
      matchLabels:
        kubernetes.io/metadata.name: "monitoring"
  egress:
  - name: "allow-to-my-service"
    action: "Accept"
    to:
    - domainNames:
      - "my-service.com"
      - "*.cloud-provider.io"
    protocols:
    - tcp:
        destinationPort:
          number: 443
Maintaining an allowlist of domains¶
There are a couple of ways to maintain an allowlist:
This example includes the Deny rule in the same ClusterNetworkPolicy object.
It's also possible to use another ClusterNetworkPolicy object with a lower
priority (e.g. 100 in this example):
apiVersion: policy.networking.k8s.io/v1alpha2
kind: ClusterNetworkPolicy
metadata:
  name: allow-my-service-egress
spec:
  tier: Admin
  priority: 55
  subject:
    namespaces:
      matchLabels:
        kubernetes.io/metadata.name: "monitoring"
  egress:
  - name: "allow-to-my-service"
    action: "Accept"
    to:
    - domainNames:
      - "my-service.com"
      - "*.cloud-provider.io"
    protocols:
    - tcp:
        destinationPort:
          number: 443
  - name: "default-deny"
    action: "Deny"
    to:
    - networks:
      - "0.0.0.0/0"
      - "::/0"
This example uses a Baseline-tier default-deny ClusterNetworkPolicy to create the allowlist:
apiVersion: policy.networking.k8s.io/v1alpha2
kind: ClusterNetworkPolicy
metadata:
  name: allow-my-service-egress
spec:
  tier: Admin
  priority: 55
  subject:
    namespaces:
      matchLabels:
        kubernetes.io/metadata.name: "monitoring"
  egress:
  - name: "allow-to-my-service"
    action: "Accept"
    to:
    - domainNames:
      - "my-service.com"
      - "*.cloud-provider.io"
    protocols:
    - tcp:
        destinationPort:
          number: 443
---
apiVersion: policy.networking.k8s.io/v1alpha2
kind: ClusterNetworkPolicy
metadata:
  name: default-deny
spec:
  tier: Baseline
  priority: 0
  subject:
    namespaces: {}
  egress:
  - name: "default-deny"
    action: "Deny"
    to:
    - networks:
      - "0.0.0.0/0"
      - "::/0"
Expected Behavior¶
- A FQDN egress policy does not grant the workload permission to communicate with any in-cluster DNS services (like kube-dns). A separate rule needs to be configured to allow traffic to any DNS servers. FQDN policies are not expected to work if the pod cannot reach DNS.
- FQDN policies should not affect the ability of workloads to resolve domains, only their ability to communicate with the IP backing them. Put another way, FQDN policies should not result in any form of DNS filtering.
  - For example, if a policy allows traffic to kubernetes.io, any selected Pods can still resolve wikipedia.org or my-services.default.svc.cluster.local, but cannot send traffic to them unless allowed by a different rule.
- Each implementation will provide guidance on which DNS name-server is considered authoritative for resolving domain names. This could be the kube-dns Service or potentially some other DNS provider specified in the implementation's configuration.
- Pods are expected to use the DNS configuration provided via resolv.conf (i.e. the canonical cluster DNS server). If a pod uses a different DNS server (e.g. hardcoded 8.8.8.8), FQDN rule processing is not guaranteed to work.
- DNS record querying and lifetimes:
  - Pods are expected to make a DNS query for a domain before sending traffic to it. If the Pod fails to send a DNS request and instead just sends traffic to the IP (either because of caching or a static config), traffic is not guaranteed to flow.
  - Pods should respect the TTL of DNS records they receive. Trying to establish a new connection using DNS records that are expired is not guaranteed to work.
  - When the TTL for a DNS record expires, the implementor should stop allowing new connections to that IP. Existing connections will still be allowed (that's consistent with NetworkPolicy behavior on long-running connections).
  - If a DNS record is refreshed before the TTL expires (e.g., due to a new DNS query from the workload), the TTL timer should be reset based on the new response.
- Implementations must support at least 100 unique IPs (either IPv4 or IPv6) for each domain. This is true for both explicitly specified domains, as well as for each domain selected by a wild-card rule. For example, the rule *.kubernetes.io supports 100 IPs each for both docs.kubernetes.io and blog.kubernetes.io.
- PTR records are not required to properly configure a FQDN selector. For example, as long as an A record exists mapping my-hostname to 1.2.3.4, the Network Policy implementation should allow traffic to 1.2.3.4. There is no requirement that a PTR record for 4.3.2.1.in-addr.arpa exist or that it points to my-hostname (it is allowed to point to other-host).
- Targeting in-cluster endpoints with FQDN selector is not recommended. There are other selectors which can more precisely capture intent. However, if in-cluster endpoints are selected:
  - ✅︎ Supported:
    - Selecting Pods using their generated DNS record (for example pod-ip-address.my-namespace.pod.cluster.local). This is analogous to selecting the Pod by its IP address using the Network selector.
    - Headless Services can be selected using their generated DNS record because the generated DNS records contain a list of all the Pod IPs that back the service.
  - ❌ Not Supported:
    - ClusterIP Services cannot be selected using their generated DNS record (for example my-svc.my-namespace.svc.cluster.local). This is consistent with the behavior when selecting the Service VIP using the Network selector.
    - ExternalName Services return a CNAME record. See the entry below about CNAME support.
    - Any record which points to the IPs used for LoadBalancer type services. This includes the externalIPs and the .status.loadBalancer.ingress fields.
- If the specified domain in a FQDN selector resolves to a CNAME record, the behavior of the implementor depends on the returned response.
  If the upstream resolver used CNAME chasing to fully resolve the domain to an A/AAAA record and returns the resulting chain, the implementor can use this information to allow traffic to the specified IPs. However, the implementor does not need to perform their own CNAME chasing or to understand resolutions across multiple DNS requests.
  For example, if the FQDN selector is allowing traffic to www.kubernetes.io:
  * If a DNS query to the upstream resolver returns a single response with the following records:
        www.kubernetes.io -- CNAME to kubernetes.io
        kubernetes.io -- A to 1.2.3.4
    the implementor can allow traffic to 1.2.3.4.
  * If a DNS query only responds with a CNAME record, the implementor is not required to allow traffic even if subsequent requests resolve the full chain:
        # REQUEST 1
        www.kubernetes.io -- CNAME to kubernetes.io
        # REQUEST 2
        kubernetes.io -- A to 1.2.3.4
    The implementor is not required to allow traffic to 1.2.3.4 because no single response contained the full chain required to resolve the domain.
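The single-response CNAME rule above can be illustrated with a short Go sketch. The `Record` type and `ResolveFromResponse` function are hypothetical names chosen for this example, and real DNS messages carry more detail (classes, TTLs, multiple sections) than this simplified model. The function follows CNAME records only within the answer set of one response and never correlates across requests.

```go
package main

import (
	"fmt"
	"strings"
)

// Record is a simplified DNS answer record.
type Record struct {
	Name   string
	Type   string // "A", "AAAA", or "CNAME"
	Target string // an IP for A/AAAA, a domain name for CNAME
}

// ResolveFromResponse returns the IPs that name resolves to using only the
// records contained in a single response. It returns nil when the response
// does not contain the full chain down to an A/AAAA record.
func ResolveFromResponse(name string, answer []Record) []string {
	canon := func(s string) string {
		return strings.ToLower(strings.TrimSuffix(s, "."))
	}
	current := canon(name)
	seen := map[string]bool{} // guards against CNAME loops
	for !seen[current] {
		seen[current] = true
		var ips []string
		next := ""
		for _, r := range answer {
			if canon(r.Name) != current {
				continue
			}
			switch r.Type {
			case "A", "AAAA":
				ips = append(ips, r.Target)
			case "CNAME":
				next = canon(r.Target)
			}
		}
		if len(ips) > 0 {
			return ips
		}
		if next == "" {
			return nil // chain is incomplete within this response
		}
		current = next
	}
	return nil
}

func main() {
	full := []Record{
		{Name: "www.kubernetes.io", Type: "CNAME", Target: "kubernetes.io"},
		{Name: "kubernetes.io", Type: "A", Target: "1.2.3.4"},
	}
	fmt.Println("single response:", ResolveFromResponse("www.kubernetes.io", full))

	request1 := full[:1] // only the CNAME record
	request2 := full[1:] // only the A record, from a later query
	fmt.Println("request 1:", ResolveFromResponse("www.kubernetes.io", request1))
	fmt.Println("request 2:", ResolveFromResponse("www.kubernetes.io", request2))
}
```

Because each response is processed in isolation, the split case yields no IPs for `www.kubernetes.io`, which matches the behavior an implementation is permitted to have.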
Recommended Behavior¶
The following recommendations are based on operational experience from existing implementations and feedback gathered during KubeCon Atlanta 2025.
- Pods should only make DNS requests to the canonical DNS server, because FQDN rules for a pod are only guaranteed to work after that pod makes an appropriate DNS request to the canonical DNS server (see Expected Behavior). If a pod queries an alternate DNS server (e.g. hardcoded 8.8.8.8), the implementation may not observe the DNS response, and the FQDN Accept rule will simply not take effect. This is a functionality issue, not a security breach -- the pod's traffic is denied, not accidentally allowed. For example, when a pod queries the canonical DNS server for my-service.com and receives 1.2.3.4, the implementation observes that response and learns that my-service.com maps to 1.2.3.4. It can then enforce the FQDN Accept rule by allowing traffic to 1.2.3.4. If the pod instead queries 8.8.8.8, the implementation never sees the response and has no IP to allow -- the rule simply doesn't take effect.
- Administrators should create a high-precedence Admin-tier rule allowing egress DNS traffic to the canonical DNS server (e.g. kube-dns in kube-system), to ensure that DNS is not blocked by other deny rules. For example:
    apiVersion: policy.networking.k8s.io/v1alpha2
    kind: ClusterNetworkPolicy
    metadata:
      name: allow-dns
    spec:
      tier: Admin
      priority: 50
      subject:
        namespaces:
          matchLabels:
            requires-fqdn-policy: "true"
      egress:
      - name: "allow-dns"
        action: "Accept"
        to:
        - namespaces:
            matchLabels:
              kubernetes.io/metadata.name: "kube-system"
        protocols:
        - udp:
            destinationPort:
              number: 53
        - tcp:
            destinationPort:
              number: 53
- Implementations that operate by snooping DNS responses on the wire MUST only trust responses originating from the canonical DNS server. Trusting responses from arbitrary sources is a security concern: a malicious actor could forge DNS responses to trick the implementation into allowing traffic to unintended IPs.
- Implementations MAY provide a configurable grace period beyond the TTL to accommodate DNS propagation delays and client-side caching.
- Although securing DNS resolution is a non-goal of this NPEP, implementations are recommended to consider mitigations for DNS cache poisoning (e.g., DNSSEC validation by the canonical DNS server) when documenting their trust model.
Connection Lifecycle on Policy Updates¶
When FQDN policies change, implementations must handle in-flight connections gracefully. See the Expected Behavior section for DNS record updates.
- Policy addition: New connections matching the FQDN are allowed once DNS resolution for the domain has been observed. Connections to IPs that haven't been observed via DNS are not guaranteed to be allowed.
- Policy removal: Existing established connections SHOULD be allowed to complete gracefully. New connections SHOULD be denied after the policy is removed. This is consistent with the general NetworkPolicy behavior for long-running connections.
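The lifecycle rules above separate the verdict for new connections from the treatment of connections that are already established. The following Go sketch shows one way to model that split. The `Flow` and `ConnTracker` types are hypothetical, and real datapaths typically rely on the kernel or dataplane connection tracker rather than a map like this.

```go
package main

import "fmt"

// Flow identifies a connection from a pod to an external destination.
type Flow struct {
	SrcPod  string
	DstIP   string
	DstPort uint16
}

// ConnTracker remembers which flows were admitted while a policy allowed them.
type ConnTracker struct {
	established map[Flow]bool
}

func NewConnTracker() *ConnTracker {
	return &ConnTracker{established: map[Flow]bool{}}
}

// Admit decides whether traffic for f may pass. policyAllows is the current
// verdict for NEW connections (for example, FQDN rule present and IP learned).
// Flows admitted earlier keep working even if policyAllows later turns false.
func (t *ConnTracker) Admit(f Flow, policyAllows bool) bool {
	if t.established[f] {
		return true
	}
	if policyAllows {
		t.established[f] = true
		return true
	}
	return false
}

// Close forgets a flow once the connection has ended.
func (t *ConnTracker) Close(f Flow) {
	delete(t.established, f)
}

func main() {
	tracker := NewConnTracker()
	longRunning := Flow{SrcPod: "monitoring/agent", DstIP: "1.2.3.4", DstPort: 443}
	later := Flow{SrcPod: "monitoring/agent", DstIP: "1.2.3.4", DstPort: 8443}

	fmt.Println("policy present, new flow:", tracker.Admit(longRunning, true))
	// The FQDN policy is removed, so new connections are no longer allowed.
	fmt.Println("policy removed, existing flow:", tracker.Admit(longRunning, false))
	fmt.Println("policy removed, new flow:", tracker.Admit(later, false))
}
```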
Alternatives¶
IP Block Selector¶
IP blocks are an important tool for specifying Network Policies. However, they do not address all user needs and have a few shortcomings when compared to FQDN selectors:
- IP-based selectors can become verbose if a single logical service has numerous IPs backing it.
- IP-based selectors pose an ongoing maintenance burden for administrators, who need to be aware of changing IPs.
- IP-based selectors can result in policies that are difficult to read and audit.
L4 Proxy¶
Users can also configure a L4 Proxy (e.g. using SOCKS) to inspect their traffic and implement egress firewalls. L4 proxies present a few trade-offs when compared to a FQDN selector:
- Additional configuration and maintenance burden of the proxy application itself
- Configuring new routes to direct traffic leaving the application to the L4 proxy.
L7 Policy¶
Another alternative is to provide a L7 selector, similar to the policies provided by Service Mesh providers. While L7 selectors can offer more expressivity, they often come with trade-offs that are not suitable for all users:
- L7 selectors necessarily support a select set of protocols. Users may be using a custom protocol for application-level communication, but still want the ability to specify endpoints using DNS.
- L7 selectors often require proxies to perform deep packet inspection and enforce the policies. These proxies can introduce undesirable latency in the datapath of applications.
References¶
- NPEP #126: Egress Control in ANP
Implementations¶
The following is a best-effort breakdown of capabilities of different NetworkPolicy providers, as of 2023-09-25. This information may be out-of-date, or inaccurate.
| | Antrea | Calico | Cilium | OpenShift (current) | OpenShift (future) |
|---|---|---|---|---|---|
| Implementation | DNS Snooping + Async DNS | DNS Snooping | DNS Snooping | Async DNS | DNS Snooping |
| Wildcards | ✅︎ | ✅︎ | ✅︎ | ❌ | ✅︎ |
| Egress Rules | ✅︎ | ✅︎ | ✅︎ | ✅︎ | ✅︎ |
| Ingress Rules | ❌ | ❌ | ❌ | ❌ | ❌ |
| Allow Rules | ✅︎ | ✅︎ | ✅︎ | ✅︎ | ✅︎ |
| Deny Rules | ✅︎ | ❌(?) | ❌ | ✅︎ | ❌(?) |
Appendix¶
CNAME Records¶
CNAME records are a type of DNS record (like A or AAAA) that direct the
resolver to query another name to retrieve the actual A/AAAA records.
For example:
$ dig www.kubernetes.io
... Omitted output ...
;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;www.kubernetes.io. IN A
;; ANSWER SECTION:
www.kubernetes.io. 3600 IN CNAME kubernetes.io.
kubernetes.io. 3600 IN A 147.75.40.148
... Omitted Output ...
CNAME Chasing¶
CNAME chasing refers to an optional behavior for DNS resolvers whereby they
perform subsequent lookups to resolve CNAMEs returned for a particular query. In
the above example, querying for www.kubernetes.io. returned a CNAME record for
kubernetes.io.. When CNAME chasing is enabled, the DNS server will
automatically resolve kubernetes.io. and return both records as the DNS
response.