K8S: An Analysis of How iptables Is Used in K8s


An analysis of how iptables is used in K8s

https://jishuin.proginn.com/p/763bfbd2bf02

kube-proxy modifies the filter and nat tables. It extends the iptables chains with five custom chains (KUBE-SERVICES, KUBE-NODEPORTS, KUBE-POSTROUTING, KUBE-MARK-MASQ, and KUBE-MARK-DROP) and configures traffic routing mainly by adding rules to the KUBE-SERVICES chain, which is attached to PREROUTING and OUTPUT.

    const (
        // the services chain
        kubeServicesChain utiliptables.Chain = "KUBE-SERVICES"

        // the external services chain
        kubeExternalServicesChain utiliptables.Chain = "KUBE-EXTERNAL-SERVICES"

        // the nodeports chain
        kubeNodePortsChain utiliptables.Chain = "KUBE-NODEPORTS"

        // the kubernetes postrouting chain
        kubePostroutingChain utiliptables.Chain = "KUBE-POSTROUTING"

        // the mark-for-masquerade chain
        // Sets mark 0x4000 on qualifying packets; packets carrying this mark
        // are all MASQUERADEd in the KUBE-POSTROUTING chain.
        KubeMarkMasqChain utiliptables.Chain = "KUBE-MARK-MASQ"

        // the mark-for-drop chain
        // Sets mark 0x8000 on traffic that matched no jump rule; packets
        // carrying this mark are dropped in the filter table.
        KubeMarkDropChain utiliptables.Chain = "KUBE-MARK-DROP"

        // the kubernetes forward chain
        kubeForwardChain utiliptables.Chain = "KUBE-FORWARD"
    )


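KUBE-MARK-MASQ and KUBE-MARK-DROP
These two chains mark the packets that pass through them so that marked packets can be handled accordingly later. Both KUBE-MARK-DROP and KUBE-MARK-MASQ simply use the iptables MARK target. The marking rules are: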

    -A KUBE-MARK-DROP -j MARK --set-xmark 0x8000/0x8000    # drop
    -A KUBE-MARK-MASQ -j MARK --set-xmark 0x4000/0x4000    # masquerade
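To make the two marking rules concrete, here is a minimal Python sketch of what `--set-xmark value/mask` does to a packet mark, following the iptables MARK target description (clear the bits in the mask, then XOR in the value). The function names are illustrative only and are not part of kube-proxy or iptables.

```python
MASQ_BIT = 0x4000  # value/mask used by KUBE-MARK-MASQ in the rules above
DROP_BIT = 0x8000  # value/mask used by KUBE-MARK-DROP in the rules above


def set_xmark(mark: int, value: int, mask: int) -> int:
    """Model of `-j MARK --set-xmark value/mask` on a 32-bit packet mark."""
    return ((mark & ~mask) ^ value) & 0xFFFFFFFF


def kube_mark_masq(mark: int) -> int:
    """Effect of passing through KUBE-MARK-MASQ."""
    return set_xmark(mark, MASQ_BIT, MASQ_BIT)


def kube_mark_drop(mark: int) -> int:
    """Effect of passing through KUBE-MARK-DROP."""
    return set_xmark(mark, DROP_BIT, DROP_BIT)
```

Because value and mask are identical in both rules, the operation only turns one bit on and leaves every other bit of the mark alone, which is why `iptables -L` displays it as `MARK or 0x4000`.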

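KUBE-SVC and KUBE-SEP

kube-proxy then creates a "KUBE-SVC-" chain for each service and, in the nat table, directs every packet in the KUBE-SERVICES chain whose destination is that service into its "KUBE-SVC-" chain. If no endpoint has been created yet, the KUBE-SVC- chain contains no rules, and any incoming packet that fails to match is sent to KUBE-MARK-DROP. The iptables filter table handles the result: when KUBE-SVC processing fails, the packet is dropped via KUBE-FIREWALL.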

    # KUBE-SERVICES -> KUBE-SVC-4CRUJHTV5RT5YMFY
    -A KUBE-SERVICES -d 10.109.53.21/32 -p tcp -m comment --comment "kubernetes-dashboard/kubernetes-dashboard: cluster IP" -m tcp --dport 443 -j KUBE-SVC-4CRUJHTV5RT5YMFY
    # KUBE-SVC-4CRUJHTV5RT5YMFY -> KUBE-SEP-JHLKQBYX2MBZKAVY
    -A KUBE-SVC-4CRUJHTV5RT5YMFY -m comment --comment "kubernetes-dashboard/kubernetes-dashboard:" -j KUBE-SEP-JHLKQBYX2MBZKAVY
    # KUBE-SEP-JHLKQBYX2MBZKAVY -> KUBE-MARK-MASQ
    -A KUBE-SEP-JHLKQBYX2MBZKAVY -s 10.244.1.8/32 -m comment --comment "kubernetes-dashboard/kubernetes-dashboard:" -j KUBE-MARK-MASQ
    # KUBE-SEP-JHLKQBYX2MBZKAVY -> DNAT
    -A KUBE-SEP-JHLKQBYX2MBZKAVY -p tcp -m comment --comment "kubernetes-dashboard/kubernetes-dashboard:" -m tcp -j DNAT --to-destination 10.244.1.8:8443

The purpose of the KUBE-MARK-MASQ match (source 10.244.1.8/32) is to have Kubernetes perform SNAT on those packets.
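The chain-to-chain jumps in the four rules above can also be extracted mechanically. The following is a small sketch, not a general iptables parser: it assumes `iptables-save` style `-A <chain> ... -j <target>` lines and uses only the Python standard library. The helper names `parse_jumps` and `paths` are made up for this illustration.

```python
import shlex

RULES = """\
-A KUBE-SERVICES -d 10.109.53.21/32 -p tcp -m comment --comment "kubernetes-dashboard/kubernetes-dashboard: cluster IP" -m tcp --dport 443 -j KUBE-SVC-4CRUJHTV5RT5YMFY
-A KUBE-SVC-4CRUJHTV5RT5YMFY -m comment --comment "kubernetes-dashboard/kubernetes-dashboard:" -j KUBE-SEP-JHLKQBYX2MBZKAVY
-A KUBE-SEP-JHLKQBYX2MBZKAVY -s 10.244.1.8/32 -m comment --comment "kubernetes-dashboard/kubernetes-dashboard:" -j KUBE-MARK-MASQ
-A KUBE-SEP-JHLKQBYX2MBZKAVY -p tcp -m comment --comment "kubernetes-dashboard/kubernetes-dashboard:" -m tcp -j DNAT --to-destination 10.244.1.8:8443
"""


def parse_jumps(text):
    """Return (chain, target) for every `-A chain ... -j target` line."""
    jumps = []
    for line in text.splitlines():
        tokens = shlex.split(line)  # keeps each quoted comment as one token
        if len(tokens) < 2 or tokens[0] != "-A" or "-j" not in tokens:
            continue
        idx = tokens.index("-j")
        if idx + 1 < len(tokens):
            jumps.append((tokens[1], tokens[idx + 1]))
    return jumps


def paths(jumps, start):
    """List every jump path from `start` to a target with no rules of its own.

    Assumes the jump graph has no cycles, which holds for the rules above.
    """
    graph = {}
    for chain, target in jumps:
        graph.setdefault(chain, []).append(target)

    def walk(node, trail):
        trail = trail + [node]
        if node not in graph:
            return [trail]
        found = []
        for nxt in graph[node]:
            found.extend(walk(nxt, trail))
        return found

    return walk(start, [])


if __name__ == "__main__":
    for p in paths(parse_jumps(RULES), "KUBE-SERVICES"):
        print(" -> ".join(p))
```

Run against the sample rules, this lists two paths from KUBE-SERVICES: one ending in KUBE-MARK-MASQ and one ending in DNAT.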

https://www.csdn.net/tags/OtDaEgwsOTExMzgtYmxvZwO0O0OO0O0O.html#61_PREROUTING_325

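Inbound packet flow (entering the host)

(1) PREROUTING and OUTPUT in the nat table intercept all packets and send them to the KUBE-SERVICES chain.
(2) Packets that will need SNAT on the way out are marked.
(3) Packets are matched on destination address and destination port and enter the KUBE-SVC-HASH chain. Here HASH stands for the generated suffix in chain names such as KUBE-SVC-4CRUJHTV5RT5YMFY.
(4) Inside KUBE-SVC-HASH, a packet enters one KUBE-SEP-HASH chain by probability and is DNATed, translating the destination address to RIP:RPORT.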

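Outbound packet flow (leaving the host)

(1) POSTROUTING in the nat table intercepts all packets and sends them to the KUBE-POSTROUTING chain.
(2) The KUBE-POSTROUTING chain performs SNAT on packets that carry the mark.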
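The inbound and outbound flows can be put together in a toy model. This is only a sketch under stated assumptions: the `Packet` class, the `SERVICES` table, and the node address passed to `postrouting` are hypothetical, marking is reduced to the hairpin case (a pod reaching its own service), and real MASQUERADE picks the address of the outgoing interface rather than a fixed value.

```python
import random
from dataclasses import dataclass, replace

MASQ_MARK = 0x4000


@dataclass(frozen=True)
class Packet:
    src: str
    dst: str
    dport: int
    mark: int = 0


# Hypothetical service table: (cluster IP, port) -> [(pod IP, pod port), ...]
SERVICES = {("10.109.53.21", 443): [("10.244.1.8", 8443)]}


def prerouting(pkt, services, rng):
    """Inbound steps: match the service, pick an endpoint, mark, DNAT."""
    endpoints = services.get((pkt.dst, pkt.dport))
    if not endpoints:
        return pkt  # not service traffic, left untouched
    ip, port = rng.choice(endpoints)  # stands in for the probability rules
    mark = pkt.mark
    if pkt.src == ip:  # endpoint talking to its own service
        mark |= MASQ_MARK
    return replace(pkt, dst=ip, dport=port, mark=mark)


def postrouting(pkt, node_ip):
    """Outbound step: SNAT only the packets that carry the mark."""
    if pkt.mark & MASQ_MARK:
        return replace(pkt, src=node_ip)
    return pkt
```

For example, `postrouting(prerouting(Packet("10.244.1.8", "10.109.53.21", 443), SERVICES, random.Random(0)), "192.168.0.10")` yields a packet whose destination is the pod and whose source has been rewritten to the assumed node address.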

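Before kube-proxy installs any Service-specific rules, it first appends jump rules for the KUBE-SERVICES, KUBE-EXTERNAL-SERVICES, and KUBE-FORWARD chains to the filter table, and jump rules for the KUBE-SERVICES, KUBE-NODEPORTS, KUBE-POSTROUTING, and KUBE-MARK-MASQ chains to the nat table. It also adds default rules to the KUBE-POSTROUTING and KUBE-MARK-MASQ chains.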

    # KUBE-POSTROUTING
    [root@node1 ~]# iptables-save |grep KUBE-POSTROUTING
    -A POSTROUTING -m comment --comment "kubernetes postrouting rules" -j KUBE-POSTROUTING
    [root@node1 ~]# iptables -t nat -nL KUBE-POSTROUTING
    Chain KUBE-POSTROUTING (1 references)
    target      prot opt source     destination
    MASQUERADE  all  --  0.0.0.0/0  0.0.0.0/0    /* kubernetes service traffic requiring SNAT */ mark match 0x4000/0x4000
    -----------------------------------------------------------------------------------
    # KUBE-MARK-MASQ
    [root@node1 ~]# iptables-save |grep KUBE-POSTROUTING
    -A KUBE-POSTROUTING -m comment --comment "kubernetes service traffic requiring SNAT" -m mark --mark 0x4000/0x4000 -j MASQUERADE
    [root@node1 ~]# iptables -t nat -L KUBE-MARK-MASQ
    Chain KUBE-MARK-MASQ (21 references)
    target  prot opt source    destination
    MARK    all  --  anywhere  anywhere     MARK or 0x4000

PREROUTING

PREROUTING ->KUBE-SERVICES

A jump rule to the KUBE-SERVICES chain is added to the PREROUTING chain of the nat table, so every packet is sent to KUBE-SERVICES for processing.

    # Add Rule
    -A PREROUTING -m comment --comment "kubernetes service portals" -j KUBE-SERVICES
    # Show Rule
    iptables -t nat -L PREROUTING
    # Show Result
    Chain PREROUTING (policy ACCEPT)
    target         prot opt source    destination
    KUBE-SERVICES  all  --  anywhere  anywhere     /* kubernetes service portals */

KUBE-SERVICES

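Packets that match the rule in the nat table's PREROUTING chain enter the KUBE-SERVICES chain for processing.

ClusterIP:
KUBE-SERVICES->KUBE-MARK-MASQ

kube-proxy iterates over every svcInfo and, based on the clusterIP and the startup flags --cluster-cidr and --masquerade-all, configures the match rules that jump to the KUBE-MARK-MASQ chain.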

    -A KUBE-SERVICES -d 10.68.81.30/32 -p tcp -m comment --comment "demo/s-test-mq:tcp-5672 cluster IP" -m tcp --dport 5672 -j KUBE-MARK-MASQ

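The rule inside the KUBE-MARK-MASQ chain sets the 0x4000/0x4000 mark on every packet that enters the chain (after this action the packet continues to be compared against the remaining rules). With --masquerade-all configured for masquerading, traffic from outside is marked.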

    [root@node1 ~]# iptables -t nat -L KUBE-MARK-MASQ
    Chain KUBE-MARK-MASQ (21 references)
    target  prot opt source    destination
    MARK    all  --  anywhere  anywhere     MARK or 0x4000
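How the two flags influence the KUBE-MARK-MASQ jump can be sketched as a small decision function. This reflects my understanding of the kube-proxy iptables mode that this article appears to describe (mark everything with --masquerade-all, otherwise mark only sources outside --cluster-cidr); it is an assumption rather than something shown in the listings, and other kube-proxy versions may decide differently. The function name is hypothetical.

```python
import ipaddress
from typing import Optional


def needs_masq_mark(src_ip: str, masquerade_all: bool,
                    cluster_cidr: Optional[str]) -> bool:
    """Decide whether cluster-IP traffic from src_ip jumps to KUBE-MARK-MASQ.

    Assumed logic: --masquerade-all marks every packet; otherwise, when
    --cluster-cidr is set, only off-cluster sources are marked.
    """
    if masquerade_all:
        return True
    if cluster_cidr:
        source = ipaddress.ip_address(src_ip)
        return source not in ipaddress.ip_network(cluster_cidr)
    return False
```

Under this assumption, a rule without any source restriction, like the one for 10.68.81.30 above, is what one would expect to see when --masquerade-all is enabled.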

KUBE-SERVICES->KUBE-SVC-HASH

In KUBE-SERVICES, kube-proxy builds a jump rule to KUBE-SVC-HASH for every svcPort that has a clusterIP, directing packets addressed to that svcPort into the KUBE-SVC-HASH chain.

    -A KUBE-SERVICES -d 10.103.54.188/32 -p tcp -m comment --comment "kubernetes-dashboard/dashboard-metrics-scraper: cluster IP" -m tcp --dport 8000 -j KUBE-SVC-NDSMHFCKXJRPU4FV
    [root@node1 ~]# iptables -t nat -L KUBE-SVC-NDSMHFCKXJRPU4FV
    Chain KUBE-SVC-NDSMHFCKXJRPU4FV (1 references)
    target                     prot opt source    destination
    KUBE-SEP-CEDDRRUEZB7B4HBW  all  --  anywhere  anywhere     /* kubernetes-dashboard/dashboard-metrics-scraper: */

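External IP:

For an External IP, kube-proxy installs rules only if a device on the current physical host holds that address. kube-proxy first opens a local port using the External IP, svcPort, and protocol, and then installs the rules.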

KUBE-SERVICES->KUBE-MARK-MASQ

Inside KUBE-SERVICES, kube-proxy adds a KUBE-MARK-MASQ jump rule for every External IP, so packets whose destination is the External IP are sent to KUBE-MARK-MASQ to be marked.

    # Query
    iptables-save -t nat | grep external
    # Add rule
    -A KUBE-SERVICES -d <public-IP>/32 -p tcp -m comment --comment "demo/hk-nginx-hello:tcp-80 external IP" -m tcp --dport 80 -j KUBE-MARK-MASQ
    # Query
    iptables -t nat -L KUBE-SERVICES | grep "external IP" | grep MARK
    # Result
    KUBE-MARK-MASQ  tcp  --  anywhere  <hostname shown for the address>  /* demo/hk-nginx-hello:tcp-80 external IP */ tcp dpt:http

KUBE-SERVICES->KUBE-SVC-HASH

Inside KUBE-SERVICES, kube-proxy also adds KUBE-SVC-HASH jump rules for every External IP.

    # Add rule
    -A KUBE-SERVICES -d <public-IP>/32 -p tcp -m comment --comment "demo/hk-nginx-hello:tcp-80 external IP" -m tcp --dport 80 -m physdev ! --physdev-is-in -m addrtype ! --src-type LOCAL -j KUBE-SVC-UEOQSLEZ4LUM4H7G
    -A KUBE-SERVICES -d <public-IP>/32 -p tcp -m comment --comment "demo/hk-nginx-hello:tcp-80 external IP" -m tcp --dport 80 -m addrtype --dst-type LOCAL -j KUBE-SVC-UEOQSLEZ4LUM4H7G
    # Show rule
    iptables -t nat -L KUBE-SERVICES | grep "external IP" | grep -v MARK
    # Non-local requests, forwarded to the svc
    KUBE-SVC-UEOQSLEZ4LUM4H7G  tcp  --  anywhere  <hostname shown for the address>  /* demo/hk-nginx-hello:tcp-80 external IP */ tcp dpt:http PHYSDEV match ! --physdev-is-in ADDRTYPE match src-type !LOCAL
    # Local requests, forwarded to the svc
    KUBE-SVC-UEOQSLEZ4LUM4H7G  tcp  --  anywhere  <hostname shown for the address>  /* demo/hk-nginx-hello:tcp-80 external IP */ tcp dpt:http ADDRTYPE match dst-type LOCAL

LB IP
KUBE-SERVICES->KUBE-FW-HASH

kube-proxy builds a KUBE-FW-HASH jump rule for every svcPort that has an LB IP, directing packets addressed to LBIP:svcPort (protocol plus port) into the KUBE-FW-HASH chain.


    # Add Rule
    -A KUBE-SERVICES -d <public-IP>/32 -p tcp -m comment --comment "demo/netutil-2:tcp-8081 loadbalancer IP" -m tcp --dport 8081 -j KUBE-FW-QUZXUNUIPD3MZETI
    # Show Rule
    KUBE-FW-QUZXUNUIPD3MZETI  tcp  --  anywhere  <public-IP>  /* demo/netutil-2:tcp-8081 loadbalancer IP */ tcp dpt:tproxy

KUBE-FW-HASH->KUBE-MARK-MASQ

Every packet that enters the KUBE-FW-HASH chain is sent through KUBE-MARK-MASQ and receives the 0x4000/0x4000 mark.

KUBE-FW-HASH->KUBE-SVC-HASH

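After being marked for MASQ, packets enter the KUBE-SVC-QUZXUNUIPD3MZETI chain (if the service has backends at that point, the packet is handed by probability to one of that chain's KUBE-SEP-HASH chains).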

    # Show Rule
    iptables -t nat -L KUBE-FW-QUZXUNUIPD3MZETI
    # Show Result
    Chain KUBE-FW-QUZXUNUIPD3MZETI (1 references)
    target                     prot opt source    destination
    KUBE-MARK-MASQ             all  --  anywhere  anywhere     /* demo/netutil-2:tcp-8081 loadbalancer IP */
    KUBE-SVC-QUZXUNUIPD3MZETI  all  --  anywhere  anywhere     /* demo/netutil-2:tcp-8081 loadbalancer IP */
    KUBE-MARK-DROP             all  --  anywhere  anywhere     /* demo/netutil-2:tcp-8081 loadbalancer IP */

KUBE-FW-HASH->KUBE-MARK-DROP

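Packets for a service with no endpoints match nothing in KUBE-SVC-HASH, so they fall through to the KUBE-MARK-DROP chain and receive the 0x8000/0x8000 mark (these packets are then filtered out and dropped in the filter table).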
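The order of the three KUBE-FW-HASH rules matters: marking is non-terminating, while a successful DNAT ends processing. A minimal sketch of that sequence follows; the function name and the endpoint strings are illustrative, and taking the first endpoint merely stands in for the probability-based choice.

```python
MASQ_MARK = 0x4000
DROP_MARK = 0x8000


def kube_fw(mark, endpoints):
    """Return (mark, dnat_target) after the three KUBE-FW rules run in order."""
    # Rule 1: KUBE-MARK-MASQ sets the mark, then evaluation continues.
    mark |= MASQ_MARK
    # Rule 2: KUBE-SVC-HASH. With endpoints present the packet is DNATed,
    # which is terminating, so rule 3 is never reached.
    if endpoints:
        return mark, endpoints[0]
    # Rule 3: KUBE-MARK-DROP, reached only when rule 2 matched nothing.
    mark |= DROP_MARK
    return mark, None
```

With one endpoint the result carries only the masquerade mark; with an empty endpoint list the packet ends up with both bits set and no DNAT target.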

NodePort
KUBE-SERVICES->KUBE-NODEPORTS

kube-proxy adds a jump to the KUBE-NODEPORTS chain in the KUBE-SERVICES chain, so packets that match no ClusterIP or LB rule are handled in KUBE-NODEPORTS.

iptables -t nat -L KUBE-SERVICES| grep KUBE-NODEPORTS 
KUBE-NODEPORTS->KUBE-MARK-MASQ

kube-proxy adds a rule in KUBE-NODEPORTS that checks whether the destination port is a NodePort; if it is, the packet is sent to KUBE-MARK-MASQ and receives the MASQ mark.

KUBE-NODEPORTS->KUBE-SVC-HASH

kube-proxy adds a rule in KUBE-NODEPORTS that checks whether the destination port is a NodePort; if it is, the packet is directed into the KUBE-SVC-HASH chain.

    iptables -t nat -L KUBE-NODEPORTS

KUBE-SVC-HASH -> KUBE-SEP-HASH

Inside the KUBE-SVC-HASH chain, kube-proxy builds a probability-based jump rule to a KUBE-SEP-HASH chain for each endpoint of the svcPort, and builds the rules of each endpoint's KUBE-SEP-HASH chain.

    # Add Rule
    -A KUBE-SVC-QUZXUNUIPD3MZETI -m statistic --mode random --probability 0. -j KUBE-SEP-AMWXJCMBQ4RG26Q5
    -A KUBE-SVC-QUZXUNUIPD3MZETI -m statistic --mode random --probability 0. -j KUBE-SEP-2OQUDCMHKG5JUGZU
    -A KUBE-SVC-QUZXUNUIPD3MZETI -j KUBE-SEP-Z2UJWNX76VNMX7FW
    # Show Rule
    iptables -t nat -L KUBE-SVC-QUZXUNUIPD3MZETI
    # Show Result
    Chain KUBE-SVC-QUZXUNUIPD3MZETI (2 references)
    target                     prot opt source    destination
    KUBE-SEP-AMWXJCMBQ4RG26Q5  all  --  anywhere  anywhere     statistic mode random probability 0.
    KUBE-SEP-2OQUDCMHKG5JUGZU  all  --  anywhere  anywhere     statistic mode random probability 0.
    KUBE-SEP-Z2UJWNX76VNMX7FW  all  --  anywhere  anywhere

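The probability values in the listing above are cut off in the source, so they are left as they appear. As an illustration of how sequential random rules can still spread traffic evenly, the sketch below assumes the commonly described scheme in which rule i of n (counting from 0) matches with probability 1/(n - i) and the last rule is unconditional. That scheme is an assumption here, not something the truncated listing confirms, and the function names are made up.

```python
from fractions import Fraction


def rule_probabilities(n):
    """Per-rule match probability for n endpoints, in rule order."""
    if n < 1:
        raise ValueError("need at least one endpoint")
    return [Fraction(1, n - i) for i in range(n)]


def overall_share(n):
    """Fraction of all traffic that ends up at each endpoint."""
    shares = []
    remaining = Fraction(1)  # traffic that has not matched an earlier rule
    for p in rule_probabilities(n):
        shares.append(remaining * p)
        remaining *= 1 - p
    return shares
```

For three endpoints the per-rule probabilities come out as 1/3, 1/2, and 1, and each endpoint receives one third of the traffic overall, because later rules only see the packets that earlier rules passed over.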
KUBE-SEP-HASH

(1) Packets whose source IP is 172.20.4.10 are marked. This covers the case where an endpoint accesses its own Service.

(2) The packet is DNATed, translating the destination IP from the Service ClusterIP to the Pod IP.

    # Add Rule
    -A KUBE-SEP-AMWXJCMBQ4RG26Q5 -s 172.20.4.10/32 -j KUBE-MARK-MASQ
    -A KUBE-SEP-AMWXJCMBQ4RG26Q5 -p tcp -m tcp -j DNAT --to-destination 172.20.4.10:8091
    # Show Rule
    iptables -t nat -L KUBE-SEP-AMWXJCMBQ4RG26Q5
    # Show Result
    Chain KUBE-SEP-AMWXJCMBQ4RG26Q5 (1 references)
    target          prot opt source       destination
    KUBE-MARK-MASQ  all  --  172.20.4.10  anywhere
    DNAT            tcp  --  anywhere     anywhere     tcp to:172.20.4.10:8091

INPUT

The INPUT chain of the filter table has jump rules to KUBE-SERVICES and KUBE-EXTERNAL-SERVICES (if nothing in KUBE-SERVICES matches, KUBE-EXTERNAL-SERVICES is tried next).

iptables -t filter -L INPUT 

FILTER INPUT ->KUBE-SERVICES

Reject packets whose destination is one of the listed IPs (the Services behind these IPs have no endpoints; the destination IP here may be a Cluster IP or an LB IP).

    # Show Rules
    iptables -t filter -L KUBE-SERVICES
    # Show Result
    Chain KUBE-SERVICES (3 references)
    target  prot opt source    destination
    REJECT  tcp  --  anywhere  10.68.236.66   /* default/myservice: has no endpoints */ tcp dpt:ssh reject-with icmp-port-unreachable
    REJECT  tcp  --  anywhere  10.68.121.211  /* demo/test-lb:tcp-80 has no endpoints */ tcp dpt:http reject-with icmp-port-unreachable
    REJECT  tcp  --  anywhere  <public-IP>    /* demo/test-lb:tcp-80 has no endpoints */ tcp dpt:http reject-with icmp-port-unreachable
    REJECT  tcp  --  anywhere  10.68.220.136  /* demo/test3:tcp-80 has no endpoints */ tcp dpt:http reject-with icmp-port-unreachable
    REJECT  tcp  --  anywhere  <public-IP>    /* demo/test3:tcp-80 has no endpoints */ tcp dpt:http reject-with icmp-port-unreachable
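As a rough illustration of where such REJECT entries come from, the sketch below derives one rule per Service that has no endpoints. The input dictionaries and the exact rule text are assumptions modeled on the rule style used elsewhere in this article; this is not kube-proxy's actual rule generator.

```python
def reject_rules(services):
    """Build one filter-table REJECT rule for every service without endpoints.

    `services` is an iterable of dicts with the hypothetical keys
    name, ip, port, protocol and endpoints.
    """
    rules = []
    for svc in services:
        if svc["endpoints"]:
            continue  # traffic for this service is DNATed in the nat table
        rules.append(
            f'-A KUBE-SERVICES -d {svc["ip"]}/32 -p {svc["protocol"]} '
            f'-m comment --comment "{svc["name"]} has no endpoints" '
            f'-m {svc["protocol"]} --dport {svc["port"]} -j REJECT'
        )
    return rules


SAMPLE = [
    {"name": "default/myservice:", "ip": "10.68.236.66", "port": 22,
     "protocol": "tcp", "endpoints": []},
    {"name": "demo/netutil-2:tcp-8081", "ip": "10.68.1.1", "port": 8081,
     "protocol": "tcp", "endpoints": ["172.20.4.10:8091"]},
]
```

Only the first sample entry produces a rule; the second has an endpoint and is skipped. The address 10.68.1.1 in the second entry is invented for the example.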

FILTER INPUT ->KUBE-EXTERNAL-SERVICES

Reject packets whose destination is one of the listed External IPs (the Services behind these IPs have no endpoints; the IP here may be an External IP or a Node IP).

iptables -t filter -L KUBE-EXTERNAL-SERVICES 

FORWARD

The FORWARD chain of the filter table has jump rules to KUBE-FORWARD and KUBE-SERVICES (if nothing in KUBE-FORWARD matches, KUBE-SERVICES is tried next).

    iptables -t filter -L FORWARD

FORWARD-> KUBE-FORWARD

(1) Drop packets in the INVALID state, because they can lead to unexpected connection resets.

(2) Forward all packets that carry the 0x4000/0x4000 mark.

(3) The last two rules ensure that traffic following the initial packet accepted by the "kubernetes forwarding rules" rule is also accepted. To be as specific as possible, the traffic must originate from or be destined for the clusterCIDR (to or from a pod), which guarantees that Pod traffic can be forwarded (these rules are installed only when --cluster-cidr is configured).

    # Show Rules
    iptables -t filter -L KUBE-FORWARD
    # Show Result
    Chain KUBE-FORWARD (1 references)
    target  prot opt source         destination
    DROP    all  --  anywhere       anywhere       ctstate INVALID
    ACCEPT  all  --  anywhere       anywhere       /* kubernetes forwarding rules */ mark match 0x4000/0x4000
    ACCEPT  all  --  172.20.0.0/16  anywhere       /* kubernetes forwarding conntrack pod source rule */ ctstate RELATED,ESTABLISHED
    ACCEPT  all  --  anywhere       172.20.0.0/16  /* kubernetes forwarding conntrack pod destination rule */ ctstate RELATED,ESTABLISHED
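The four KUBE-FORWARD rules are evaluated top to bottom, and the first match decides. A minimal sketch of that evaluation, using the 172.20.0.0/16 range from the listing as the cluster CIDR, might look like this. The function name and the plain-string connection state are simplifications made for the example.

```python
import ipaddress

MASQ_MARK = 0x4000


def kube_forward(src, dst, mark, ctstate, cluster_cidr="172.20.0.0/16"):
    """Return the verdict of the first matching KUBE-FORWARD rule.

    "RETURN" means no rule matched and the calling chain continues.
    """
    net = ipaddress.ip_network(cluster_cidr)
    if ctstate == "INVALID":
        return "DROP"
    if (mark & MASQ_MARK) == MASQ_MARK:
        return "ACCEPT"
    if ctstate in ("RELATED", "ESTABLISHED"):
        if ipaddress.ip_address(src) in net:  # pod source rule
            return "ACCEPT"
        if ipaddress.ip_address(dst) in net:  # pod destination rule
            return "ACCEPT"
    return "RETURN"
```

A new, unmarked connection from a pod is therefore not accepted by this chain on its own; it falls through to whatever rules follow in FORWARD.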

FORWARD-> KUBE-SERVICES

Reject packets whose destination is one of the listed IPs (the Services behind these IPs have no endpoints; the destination IP here may be a Cluster IP or an LB IP).

OUTPUT

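Jump rules to the KUBE-SERVICES chain are added in both the nat table and the filter table. Packets traverse the nat chain first and the filter chain afterwards.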

NAT OUTPUT-> KUBE-SERVICES

All packets enter the KUBE-SERVICES chain of the nat table. Processing there is the same as described under PREROUTING, where packets enter the nat table from the PREROUTING chain.

    Chain OUTPUT (policy ACCEPT)
    target         prot opt source    destination
    KUBE-SERVICES  all  --  anywhere  anywhere     /* kubernetes service portals */

FILTER OUTPUT-> KUBE-SERVICES

Packets enter the KUBE-SERVICES chain of the filter table. Processing there is the same as described under FILTER INPUT ->KUBE-SERVICES.

POSTROUTING

kube-proxy adds a jump rule to KUBE-POSTROUTING in the POSTROUTING chain of the nat table.

-A POSTROUTING -m comment --comment "kubernetes postrouting rules" -j KUBE-POSTROUTING 

NAT POSTROUTING -> KUBE-POSTROUTING

In the KUBE-POSTROUTING chain, packets that carry the 0x4000/0x4000 mark are MASQUERADEd, that is, SNATed.

    # Add Rule
    -A KUBE-POSTROUTING -m comment --comment "kubernetes service traffic requiring SNAT" -m mark --mark 0x4000/0x4000 -j MASQUERADE
    # Show Rule
    iptables -t nat -L KUBE-POSTROUTING
    # Show Result
    Chain KUBE-POSTROUTING (1 references)
    target      prot opt source    destination
    MASQUERADE  all  --  anywhere  anywhere     /* kubernetes service traffic requiring SNAT */ mark match 0x4000/0x4000
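The `-m mark --mark value/mask` test in this rule compares only the masked bits of the packet mark. A short sketch of that comparison, with illustrative function names, shows which marks trigger MASQUERADE:

```python
def mark_matches(mark, value, mask):
    """Model of `-m mark --mark value/mask`: compare only the masked bits."""
    return (mark & mask) == value


def kube_postrouting(mark):
    """Verdict of the single KUBE-POSTROUTING rule for a given packet mark."""
    if mark_matches(mark, 0x4000, 0x4000):
        return "MASQUERADE"
    return "RETURN"  # unmarked packets leave with their source unchanged
```

A packet marked 0x4000 or 0xC001 is masqueraded, while one marked 0 or only 0x8000 is not, since the other bits of the mark are ignored by the mask.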