I've set up an AKS Kubernetes cluster (hybrid Windows and Linux containers) and I'm configuring network policies to better control network access from the pods in my cluster. Following various 'best practice' examples from around the internet, I first applied a global policy that denies all ingress and egress (except DNS):
apiVersion: crd.projectcalico.org/v1
kind: GlobalNetworkPolicy
metadata:
  name: default-deny-all-except-dns
spec:
  namespaceSelector: has(kubernetes.io/metadata.name) && kubernetes.io/metadata.name not in {"kube-system", "calico-system", "traefik", "cert-manager"}
  types:
    - Ingress
    - Egress
  egress:
    # allow all namespaces to communicate to DNS pods
    - action: Allow
      protocol: UDP
      destination:
        selector: k8s-app == "kube-dns"
        ports:
          - 53
This works: ingress and egress from my pods are denied, and DNS still resolves. I then set up the appropriate policies to allow ingress into the pods, which also work fine.
I've now started building the egress policies and have run into a confusing stumbling block. I want an egress policy that allows labelled pods to reach our Azure SQL instances, shown below:
kind: NetworkPolicy
apiVersion: networking.k8s.io/v1
metadata:
  name: allow-database-egress
spec:
  podSelector:
    matchLabels:
      allow-database-egress: "true"
  policyTypes:
    - Egress
  egress:
    - to:
        - ipBlock:
            cidr: 0.0.0.0/0
            except:
              #- 10.0.0.0/8
              #- 192.168.0.0/16
              #- 172.16.0.0/20
      ports:
        - protocol: TCP
          port: 1433
        - protocol: TCP
          port: 11000
          endPort: 11999
If I apply the above as-is, it works: my pod can connect to Azure SQL. However, the moment I uncomment any entry in the 'except' array, it stops working. I've tested this with IPs that I control as well as local ranges, in case it was some strange DNAT problem, and the value doesn't seem to matter: as soon as the except array contains any entry, the outbound traffic is blocked. Pods hosted on the Linux node pool appear to behave correctly; those on the Windows node pool do not.
There is a note on the AKS Network Policy documentation page stating that 'CIDR with exceptions' is not supported by Azure NPM on Windows. However, I don't believe I'm using NPM, as I'm using Calico. Is the documentation incomplete, meaning this limitation also applies to Calico on Windows and I shouldn't expect this to work, or has something been missed in my configuration?
(I appreciate that I could specify the region-specific CIDRs for Azure SQL instead, but that felt more fragile.)
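For reference, the region-specific variant I'm trying to avoid would look roughly like the sketch below. This is only an illustration: the policy name is hypothetical, and the two CIDRs are placeholders from the reserved documentation ranges, not real Azure SQL address ranges.

```yaml
# Sketch only: allow-list explicit destination ranges instead of
# 0.0.0.0/0 with an 'except' list.
kind: NetworkPolicy
apiVersion: networking.k8s.io/v1
metadata:
  name: allow-database-egress-explicit   # hypothetical name
spec:
  podSelector:
    matchLabels:
      allow-database-egress: "true"
  policyTypes:
    - Egress
  egress:
    - to:
        # Placeholder CIDRs (documentation ranges); replace with the
        # ranges actually used by the Azure SQL region in question.
        - ipBlock:
            cidr: 203.0.113.0/24
        - ipBlock:
            cidr: 198.51.100.0/24
      ports:
        - protocol: TCP
          port: 1433
        - protocol: TCP
          port: 11000
          endPort: 11999
```

This avoids 'except' entirely, but the list would need to be updated by hand whenever the underlying ranges change, which is the fragility I mentioned.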