When we talk about exposing a Kubernetes application to the external world, there are two main ways to do so: the service load balancer or ingress.
We are going to talk about the service load balancer function here. Kubernetes does not provide a load balancer component directly, so it is up to users to integrate one into their respective K8s deployments. There are multiple ways to achieve this.
The first way is to host the LB outside the k8s cluster, which means the LB lifecycle is not managed by Kubernetes directly. The Kubernetes cluster integrates with the cloud provider's infrastructure through the CCM or the load balancer spec APIs. In this approach, there is a clear demarcation between external networking and internal cluster networking, which allows better control over ingress traffic and provides an additional layer of security. This is also the de-facto way for public-cloud providers to deploy load balancers; as a matter of fact, it is the very reason the "external" load-balancer service type is called so in Kubernetes.
The second way is to offer the load balancer as a service inside the Kubernetes cluster itself. This approach eases LB life-cycle management but makes it a little more challenging to manage external and internal networking together. On-prem users who wish to have all services and features packaged inside the k8s cluster, or who want to deploy a relatively small cluster, prefer this way. Cost is another factor, as managing external LB boxes/software might incur additional charges.
Readers of our previous blog series will be well aware of how LoxiLB is deployed outside the cluster to manage LB services, but we have come across many user requests from those who would love to run LoxiLB inside the cluster, be it for ease of management, limited resources or deployment architecture.
As the famous quote by Tony Robbins says, "The challenge of resourcefulness lies in turning limitations into opportunities." So here we are with a blog about running the LoxiLB load balancer in in-cluster mode.
With in-cluster mode support, LoxiLB joins the ranks of the select few that can support either mode seamlessly. For starters, LoxiLB is a completely new take on proxy-less load balancing (using eBPF) which replaces traditional frameworks like ipvs/iptables. And unlike other load balancers, which usually simply rebadge these frameworks, loxilb is built from the ground up and performs much better than traditional/legacy load balancers.
This blog will explain how to set up a 4-node K3s cluster with the flannel CNI, run LoxiLB (loxilb-lb) as a DaemonSet and kube-loxilb as a Deployment. For this blog, we also deploy LoxiLB in a special peering mode (loxilb-peer) as a DaemonSet; it runs on the worker nodes and connects with the LoxiLB (loxilb-lb) instances to exchange connectivity info. Popular CNIs such as Calico provide their own BGP implementations (though not enabled by default), while some CNIs don't provide one at all. Hence, loxilb-peer is an optional component for when such an option is not available or when users want an optimized BGP implementation. It can even run side-by-side with other BGP implementations if need be.
Design considerations for in-cluster LB
In order to provide in-cluster service load balancing, we deploy three LoxiLB components. The first is loxilb-lb, which runs on the master node(s) and takes care of the actual service load balancing. The reason for running it on the master node(s) is to stay in line with the master/worker node roles: master nodes are usually meant to run control-plane applications, while worker nodes run application workloads. Also, master nodes are recommended to be deployed in multiples to ensure high availability, so running loxilb-lb on all master nodes ensures high availability for the service load balancer function as well. Having said that, there is absolutely no problem running it on any other node; that can easily be achieved by tinkering with labels and pod affinity, as sketched below.
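For example, if you wanted to run loxilb-lb on nodes carrying a custom label instead of the master/control-plane role, the DaemonSet's node affinity could be adjusted roughly as follows (a sketch only; the label loxilb.io/lb-node is a placeholder of our choosing, not something Kubernetes sets for you):
# Label the chosen nodes first, e.g.:
#   kubectl label node worker1 loxilb.io/lb-node=yes
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
              - matchExpressions:
                  - key: "loxilb.io/lb-node"   # custom label instead of node-role.kubernetes.io/master
                    operator: In
                    values:
                      - "yes"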
The second component is loxilb-peer, which runs on the worker nodes. This component is non-intrusive and, together with loxilb-lb, creates a BGP mesh to ensure service IP and endpoint reachability to/from the LoxiLB instances.
Last but not least is kube-loxilb, which provides the Kubernetes service load-balancer spec interface and its implementation. It is now additionally responsible for auto-configuring/managing the BGP mesh as well as arbitrating role selection for the different loxilb-lb pods.
Finally, readers might wonder how the loxilb-lb pods get their hands on ingress packets in the presence of iptables/CNI rules etc. loxilb-lb uses eBPF to intercept packets much earlier than regular Linux kernel processing and is thereby able to act on ingress packets as per the configured LB rules. Also, loxilb-lb works on the system interfaces directly (please check the yaml file definitions), hence it does not need multiple interfaces assigned via multus etc.
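If you are curious about which hooks loxilb actually attached on a node, a quick way to peek is with standard tooling from the host shell (these are generic inspection commands and assume bpftool and iproute2 are installed on the node):
# List eBPF programs attached to network interfaces (XDP and tc hooks)
$ sudo bpftool net show
# Or inspect the tc ingress filters on a specific interface, e.g. eth1
$ sudo tc filter show dev eth1 ingress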
Bring up the Kubernetes Cluster
We will use the Vagrant tool to quickly spin up a complete test topology in less than 5 minutes. The following Vagrantfile is used to set up the K3s cluster:
# -*- mode: ruby -*-
# vi: set ft=ruby :

workers = (ENV['WORKERS'] || "2").to_i
#box_name = (ENV['VAGRANT_BOX'] || "ubuntu/focal64")
box_name = (ENV['VAGRANT_BOX'] || "sysnet4admin/Ubuntu-k8s")
box_version = "0.7.1"

Vagrant.configure("2") do |config|
  config.vm.box = "#{box_name}"
  config.vm.box_version = "#{box_version}"

  if Vagrant.has_plugin?("vagrant-vbguest")
    config.vbguest.auto_update = false
  end

  config.vm.define "master1" do |master|
    master.vm.hostname = 'master1'
    master.vm.network :private_network, ip: "192.168.80.10", :netmask => "255.255.255.0"
    master.vm.network :private_network, ip: "192.168.90.10", :netmask => "255.255.255.0"
    master.vm.provision :shell, :path => "master1.sh"
    master.vm.provider :virtualbox do |vbox|
      vbox.customize ["modifyvm", :id, "--memory", 8192]
      vbox.customize ["modifyvm", :id, "--cpus", 4]
    end
  end

  config.vm.define "master2" do |master|
    master.vm.hostname = 'master2'
    master.vm.network :private_network, ip: "192.168.80.11", :netmask => "255.255.255.0"
    master.vm.network :private_network, ip: "192.168.90.11", :netmask => "255.255.255.0"
    master.vm.provision :shell, :path => "master2.sh"
    master.vm.provider :virtualbox do |vbox|
      vbox.customize ["modifyvm", :id, "--memory", 8192]
      vbox.customize ["modifyvm", :id, "--cpus", 4]
    end
  end

  (1..workers).each do |node_number|
    config.vm.define "worker#{node_number}" do |worker|
      worker.vm.hostname = "worker#{node_number}"
      ip = node_number + 100
      worker.vm.network :private_network, ip: "192.168.80.#{ip}", :netmask => "255.255.255.0"
      worker.vm.provision :shell, :path => "worker.sh"
      worker.vm.provider :virtualbox do |vbox|
        vbox.customize ["modifyvm", :id, "--memory", 4096]
        vbox.customize ["modifyvm", :id, "--cpus", 2]
      end
    end
  end
end
The scripts master1.sh, master2.sh and worker.sh can be found here. The cluster can then be brought up with a single vagrant command:
$ vagrant up
...
...
...
worker2: [INFO] systemd: Starting k3s-agent
worker2: Cluster is ready
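Under the hood, worker.sh joins each worker VM to the K3s cluster as an agent. A rough sketch of what such a script typically contains is shown below; the actual scripts linked above handle the server IP and node token, and may differ in detail:
#!/bin/bash
# Hypothetical K3s agent join step (the real worker.sh may differ):
# NODE_TOKEN is the token generated on master1 during server installation.
curl -sfL https://get.k3s.io | K3S_URL=https://192.168.80.10:6443 \
    K3S_TOKEN=${NODE_TOKEN} sh -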
Deploy kube-loxilb
In this blog, we will connect with an external client over BGP. We need to specify the client's IP address and AS number in kube-loxilb.yml. Add the config below to the args section of this yaml file:
        args:
          - --cidrPools=defaultPool=123.123.123.1/24
          - --setBGP=64512
          - --setRoles
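Our reading of these flags, for reference (worth double-checking against the kube-loxilb documentation for the version you deploy):
# --cidrPools=defaultPool=123.123.123.1/24 : pool from which external service IPs (EXTERNAL-IP) are allocated
# --setBGP=64512                           : local BGP AS number the loxilb instances will use
# --setRoles                               : let kube-loxilb arbitrate active/backup roles among the loxilb-lb pods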
Now apply the modified yaml file.
vagrant@master1:~$ sudo kubectl apply -f /vagrant/kube-loxilb.yml
serviceaccount/kube-loxilb created
clusterrole.rbac.authorization.k8s.io/kube-loxilb created
clusterrolebinding.rbac.authorization.k8s.io/kube-loxilb created
deployment.apps/kube-loxilb created
vagrant@master1:~$ sudo kubectl get pods -A
NAMESPACE NAME READY STATUS RESTARTS AGE
kube-system local-path-provisioner-957fdf8bc-vmndm 1/1 Running 0 10m
kube-system coredns-77ccd57875-2md2m 1/1 Running 0 10m
kube-system metrics-server-648b5df564-44wnc 1/1 Running 0 10m
kube-system loxilb-lb-7v8qm 1/1 Running 0 4m2s
kube-system kube-loxilb-5c5f686ccf-knw2p 1/1 Running 0 28s
Get LoxiLB Up and Running
Once the cluster is all set, it is time to run LoxiLB as a DaemonSet. Below is the yaml file we used for this blog:
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: loxilb-lb
  namespace: kube-system
spec:
  selector:
    matchLabels:
      app: loxilb-app
  template:
    metadata:
      name: loxilb-lb
      labels:
        app: loxilb-app
    spec:
      hostNetwork: true
      dnsPolicy: ClusterFirstWithHostNet
      tolerations:
        - key: "node-role.kubernetes.io/master"
          operator: Exists
        - key: "node-role.kubernetes.io/control-plane"
          operator: Exists
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
              - matchExpressions:
                  - key: "node-role.kubernetes.io/master"
                    operator: Exists
                  - key: "node-role.kubernetes.io/control-plane"
                    operator: Exists
      containers:
        - name: loxilb-app
          image: "ghcr.io/loxilb-io/loxilb:latest"
          imagePullPolicy: Always
          command: [ "/root/loxilb-io/loxilb/loxilb", "--bgp", "--egr-hooks", "--blacklist=cni[0-9a-z]|veth.|flannel." ]
          ports:
            - containerPort: 11111
            - containerPort: 179
          securityContext:
            privileged: true
            capabilities:
              add:
                - SYS_ADMIN
---
apiVersion: v1
kind: Service
metadata:
  name: loxilb-lb-service
  namespace: kube-system
spec:
  clusterIP: None
  selector:
    app: loxilb-app
  ports:
    - name: loxilb-app
      port: 11111
      targetPort: 11111
      protocol: TCP
    - name: loxilb-app-bgp
      port: 179
      targetPort: 179
      protocol: TCP
Let's take a closer look at this line in the yaml file:
command: [ "/root/loxilb-io/loxilb/loxilb", "--bgp", "--egr-hooks", "--blacklist=cni[0-9a-z]|veth.|flannel." ]
Argument "--bgp" indicates that loxilb will be running with bgp instance and will be advertising the service IP to the external peer or loxilb-peer.
Argument "--egr-hooks" is required for those cases in which workloads can be scheduled in the master nodes. No need to mention this argument when you are managing the workload scheduling to worker nodes.
Argument "--blacklist=cni[0-9a-z]|veth.|flannel." is mandatory for running in in-cluster mode. As loxilb attaches it's ebpf programs on all the interfaces but since we running it in the default namespace then all the interfaces including CNI interfaces will be exposed and loxilb will attach it's ebpf program in those interfaces which is definitely not desired. So, user needs to mention a regex for excluding all those interfaces. The regex in the given example will exclude the flannel interfaces. "--blacklist=cali.|tunl.|vxlan[.]calico|veth.|cni[0-9a-z]" regex must be used with calico CNI.
Apply the loxilb.yaml file from the master node to create the loxilb-lb DaemonSet and "loxilb-lb-service":
$ vagrant ssh master1
Welcome to Ubuntu 22.04.1 LTS (GNU/Linux 5.15.0-52-generic x86_64)
* Documentation: https://help.ubuntu.com
* Management: https://landscape.canonical.com
* Support: https://ubuntu.com/advantage
* Strictly confined Kubernetes makes edge and IoT secure. Learn how MicroK8s
just raised the bar for easy, resilient and secure K8s cluster deployment.
https://ubuntu.com/engage/secure-kubernetes-at-the-edge
Last login: Sat Mar 20 18:04:46 2021 from 10.0.2.2
vagrant@master1:~$ sudo kubectl apply -f /vagrant/loxilb.yaml
daemonset.apps/loxilb-lb created
service/loxilb-lb-service created
vagrant@master1:~$ sudo kubectl get pods -A
NAMESPACE NAME READY STATUS RESTARTS AGE
kube-system coredns-77ccd57875-dwrsm 1/1 Running 0 129m
kube-system kube-loxilb-5c5f686ccf-knw2p 1/1 Running 0 39m
kube-system local-path-provisioner-957fdf8bc-72kcx 1/1 Running 0 129m
kube-system loxilb-lb-9s5qw 1/1 Running 0 19m
kube-system loxilb-lb-sk9cd 1/1 Running 0 19m
kube-system metrics-server-648b5df564-mfg2j 1/1 Running 0 129m
Deploy loxilb-peer
LoxiLB peer mode can be deployed with the yaml file below, which we used for this blog:
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: loxilb-peer
  namespace: kube-system
spec:
  selector:
    matchLabels:
      app: loxilb-peer-app
  template:
    metadata:
      name: loxilb-peer
      labels:
        app: loxilb-peer-app
    spec:
      hostNetwork: true
      dnsPolicy: ClusterFirstWithHostNet
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
              - matchExpressions:
                  - key: "node-role.kubernetes.io/master"
                    operator: DoesNotExist
                  - key: "node-role.kubernetes.io/control-plane"
                    operator: DoesNotExist
      containers:
        - name: loxilb-peer-app
          image: "ghcr.io/loxilb-io/loxilb:latest"
          imagePullPolicy: Always
          command: [ "/root/loxilb-io/loxilb/loxilb", "--peer" ]
          ports:
            - containerPort: 11111
            - containerPort: 179
          securityContext:
            privileged: true
            capabilities:
              add:
                - SYS_ADMIN
---
apiVersion: v1
kind: Service
metadata:
  name: loxilb-peer-service
  namespace: kube-system
spec:
  clusterIP: None
  selector:
    app: loxilb-peer-app
  ports:
    - name: loxilb-peer-app
      port: 11111
      targetPort: 11111
      protocol: TCP
    - name: loxilb-peer-bgp
      port: 179
      targetPort: 179
      protocol: TCP
If you wish to use your CNI's BGP speakers instead, that is totally fine; there is no need to deploy loxilb-peer.yml in that case. Just remove "--bgp" from loxilb.yaml as below and then apply it:
        - name: loxilb-app
          image: "ghcr.io/loxilb-io/loxilb:latest"
          imagePullPolicy: Always
          command: [ "/root/loxilb-io/loxilb/loxilb" ]
Apply the loxilb-peer.yml file to create "loxilb-peer-service":
vagrant@master1:~$ sudo kubectl apply -f /vagrant/loxilb-peer.yml
daemonset.apps/loxilb-peer created
service/loxilb-peer-service created
vagrant@master1:~$ sudo kubectl get pods -A
NAMESPACE NAME READY STATUS RESTARTS AGE
kube-system coredns-77ccd57875-dwrsm 1/1 Running 0 154m
kube-system kube-loxilb-5c5f686ccf-knw2p 1/1 Running 0 64m
kube-system local-path-provisioner-957fdf8bc-72kcx 1/1 Running 0 154m
kube-system loxilb-lb-9s5qw 1/1 Running 0 44m
kube-system loxilb-lb-sk9cd 1/1 Running 0 44m
kube-system loxilb-peer-8bh9b 1/1 Running 0 105s
kube-system loxilb-peer-f5fmt 1/1 Running 0 105s
kube-system metrics-server-648b5df564-mfg2j 1/1 Running 0 154m
Let's verify the BGP (auto) configuration in LoxiLB instances:
vagrant@master1:~$ sudo kubectl exec -it loxilb-lb-9s5qw -n kube-system -- bash
root@master1:/# gobgp neigh
Peer AS Up/Down State |#Received Accepted
192.168.80.1 65101 00:34:38 Establ | 1 0
192.168.80.11 64512 00:34:46 Establ | 0 0
192.168.80.101 64512 00:03:58 Establ | 0 0
192.168.80.102 64512 00:04:03 Establ | 0 0
root@master1:/# gobgp global policy
Import policy:
Default: ACCEPT
Export policy:
Default: ACCEPT
Name set-next-hop-self-gpolicy:
StatementName set-next-hop-self-gstmt:
Conditions:
Actions:
Nexthop: self
vagrant@master1:~$ sudo kubectl exec -it loxilb-lb-sk9cd -n kube-system -- bash
root@master2:/# gobgp global
AS: 64512
Router-ID: 192.168.80.11
Listening Port: 179, Addresses: 0.0.0.0
root@master2:/# gobgp neigh
Peer AS Up/Down State |#Received Accepted
192.168.80.1 65101 00:36:18 Establ | 1 0
192.168.80.10 64512 00:36:51 Establ | 0 0
192.168.80.101 64512 00:06:04 Establ | 0 0
192.168.80.102 64512 00:06:06 Establ | 0 0
root@master2:/# gobgp global policy
Import policy:
Default: ACCEPT
Export policy:
Default: ACCEPT
Name set-next-hop-self-gpolicy:
StatementName set-next-hop-self-gstmt:
Conditions:
Actions:
Nexthop: self
BGP Configuration in LoxiLB peer pods:
vagrant@master1:~$ sudo kubectl exec -it loxilb-peer-8bh9b -n kube-system -- bash
root@worker1:/# gobgp neigh
Peer AS Up/Down State |#Received Accepted
192.168.80.10 64512 00:10:35 Establ | 0 0
192.168.80.11 64512 00:10:36 Establ | 0 0
192.168.80.102 64512 00:10:38 Establ | 0 0
vagrant@master1:~$ sudo kubectl exec -it loxilb-peer-f5fmt -n kube-system -- bash
root@worker2:/# gobgp neigh
Peer AS Up/Down State |#Received Accepted
192.168.80.10 64512 00:11:14 Establ | 0 0
192.168.80.11 64512 00:11:12 Establ | 0 0
192.168.80.101 64512 00:11:12 Establ | 0 0
Deploy Services
Create TCP, UDP and SCTP services in Kubernetes:
vagrant@master1:~$ sudo kubectl apply -f /vagrant/nginx.yml
service/nginx-lb1 created
pod/nginx-test created
vagrant@master1:~$ sudo kubectl apply -f /vagrant/udp.yml
service/udp-lb1 created
pod/udp-test created
vagrant@master1:~$ sudo kubectl apply -f /vagrant/sctp.yml
service/sctp-lb1 created
pod/sctp-test created
vagrant@master1:~$ sudo kubectl get pods -A
NAMESPACE NAME READY STATUS RESTARTS AGE
default nginx-test 1/1 Running 0 19m
default sctp-test 1/1 Running 0 32s
default udp-test 1/1 Running 0 113s
kube-system coredns-77ccd57875-dwrsm 1/1 Running 0 3h2m
kube-system kube-loxilb-5c5f686ccf-knw2p 1/1 Running 0 60m
kube-system local-path-provisioner-957fdf8bc-72kcx 1/1 Running 0 3h2m
kube-system loxilb-lb-9s5qw 1/1 Running 0 72m
kube-system loxilb-lb-sk9cd 1/1 Running 0 72m
kube-system loxilb-peer-8bh9b 1/1 Running 0 29m
kube-system loxilb-peer-f5fmt 1/1 Running 0 29m
kube-system metrics-server-648b5df564-mfg2j 1/1 Running 0 3h2m
Let's verify the services in the Kubernetes cluster:
vagrant@master1:~$ sudo kubectl get svc
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
kubernetes ClusterIP 10.43.0.1 <none> 443/TCP 14m
nginx-lb1 LoadBalancer 10.43.91.80 123.123.123.1 55002:32694/TCP 3m11s
sctp-lb1 LoadBalancer 10.43.149.41 123.123.123.1 55004:31402/SCTP 3m57s
udp-lb1 LoadBalancer 10.43.149.142 123.123.123.1 55003:30165/UDP 3m18s
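For reference, nginx.yml would look roughly like the sketch below. The loadBalancerClass value, the what: nginx-test label and the image tag are illustrative assumptions; please check the actual files linked at the end of this blog:
apiVersion: v1
kind: Service
metadata:
  name: nginx-lb1
spec:
  type: LoadBalancer
  loadBalancerClass: loxilb.io/loxilb   # assumed: hands this service over to loxilb/kube-loxilb
  selector:
    what: nginx-test                    # assumed label tying the service to the pod below
  ports:
    - port: 55002                       # matches the service port seen in "kubectl get svc"
      targetPort: 80
---
apiVersion: v1
kind: Pod
metadata:
  name: nginx-test
  labels:
    what: nginx-test
spec:
  containers:
    - name: nginx-test
      image: nginx:stable               # assumed image
      ports:
        - containerPort: 80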
In LoxiLB:
vagrant@master1:~$ sudo kubectl exec -it loxilb-lb-9s5qw -n kube-system -- bash
root@master1:/# loxicmd get lb
| EXTERNAL IP | PORT | PROTOCOL | BLOCK | SELECT | MODE |# OF ENDPOINTS| MONITOR |
|---------------|-------|----------|-------|--------|--------|--------------|---------|
| 123.123.123.1 | 55002 | tcp | 0 | rr |fullnat | 1 | Off |
| 123.123.123.1 | 55003 | udp | 0 | rr |fullnat | 1 | On |
| 123.123.123.1 | 55004 | sctp | 0 | rr |fullnat | 1 | On |
The LoxiLB instances announce the service IPs to their configured peers. Since the client is running a BGP server and the worker nodes are running the loxilb-peer service, they will install all the advertised routes. We can verify this in the client and the worker nodes.
In the Client:
$ ip route
default via 192.168.20.1 dev eno1 proto static metric 100
123.123.123.1 via 192.168.80.10 dev vboxnet2 proto bird metric 32
169.254.0.0/16 dev eno1 scope link metric 1000
172.17.0.0/16 dev docker0 proto kernel scope link src 172.17.0.1
192.168.20.0/24 dev eno1 proto kernel scope link src 192.168.20.55 metric 100
192.168.80.0/24 dev vboxnet2 proto kernel scope link src 192.168.80.1
192.168.90.0/24 dev vboxnet0 proto kernel scope link src 192.168.90.1
In the Worker node:
vagrant@worker2:~$ ip route
default via 10.0.2.2 dev eth0
default via 10.0.2.2 dev eth0 proto dhcp src 10.0.2.15 metric 100
10.0.2.0/24 dev eth0 proto kernel scope link src 10.0.2.15 metric 100
10.0.2.2 dev eth0 proto dhcp scope link src 10.0.2.15 metric 100
10.0.2.3 dev eth0 proto dhcp scope link src 10.0.2.15 metric 100
10.42.0.0/24 via 10.42.0.0 dev flannel.1 onlink
10.42.1.0/24 via 10.42.1.0 dev flannel.1 onlink
10.42.2.0/24 via 10.42.2.0 dev flannel.1 onlink
123.123.123.1 via 192.168.80.10 dev eth1 proto bgp
192.168.80.0/24 dev eth1 proto kernel scope link src 192.168.80.102
Time to validate the results
Let's verify that the client can access the TCP service using the external service IP:
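A minimal check from the external client (assuming curl is available there; the nginx service was exposed on port 55002 as shown in the service listing above):
# From the client (192.168.80.1), hit the advertised service IP
$ curl http://123.123.123.1:55002
# A successful run should return the default nginx welcome page; the UDP and
# SCTP services can be exercised similarly with suitable clients.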
Conclusion
Hopefully, this blog gives readers a good idea of how to deploy LoxiLB inside a Kubernetes cluster, along with some interesting tidbits about in-cluster LB based services in K8s. If you like our work, please don't forget to support us by going to our Github page and giving us a star. You can also reach us through our slack channel to share your valuable feedback and ideas.
Note: Want to try it out yourself? All the scripts and configurations used for this blog are available here. Download all the scripts to a folder and follow the steps below:
$ ./config.sh
$ ./validation.sh
# Cleanup
$ ./rmconfig.sh