
Building a resilient EKS Cluster with in-cluster auto-scaled LoxiLB


In this blog, we will demonstrate how to deploy an EKS cluster across AWS regions/local zones with LoxiLB as an auto-scalable, in-cluster network load-balancer. We will provide a complete set of steps to bring up such a cluster, fully automated with Terraform. We will further elaborate on the various benefits of such a deployment from an end-user perspective and the compelling business value it can create.


Deployment Architecture and its Benefits


Let's take a quick look at the benefits of this whole use-case, including LoxiLB:


  • Cut down costs and gain flexibility with LoxiLB and auto-scaled node-groups

LoxiLB can reduce your costs by running in-cluster, as compared to ELB. ELB services operate independently of your Kubernetes cluster. In this use-case, LoxiLB is scheduled on an auto-scaling node-group which runs as part of the cluster. With fine-grained policies, the LB nodes can be scaled up or down in line with business needs.


  • Optimized for multi-homed networking

For clusters using Multus (secondary network interfaces), LoxiLB becomes even more effective, as it manages traffic routing across multiple networks within the cluster. Many workloads, such as Telco apps or KubeVirt-based apps, need multi-networked pods.


  • Highly Performant

LoxiLB is already highly performant thanks to its efficient eBPF implementation. Here, it provides further optimization by utilizing the EKS VPC CNI feature which makes pod IPs directly reachable inside a VPC. Hence, we are able to streamline traffic ingressing into the EKS cluster by bypassing unnecessary Kubernetes networking layers.


  • Not all Local Zones have managed ELB support

AWS Local Zones are a type of AWS infrastructure deployment that places compute, storage, database, and other select AWS services closer to large population centers, industries, and IT hubs. The primary goal of AWS Local Zones is to provide ultra-low latency access to applications and services, improving performance for specific use cases such as real-time gaming, video streaming, augmented/virtual reality (AR/VR), and machine learning at the edge. Not all Local Zones offer ELB services such as NLB. For such Local Zones, this approach provides much-needed relief to users who require load balancing for their workloads.


  • Full integration with Route53

This use-case is based on an active-active HA model. The services created in Kubernetes can get directly updated in Route53 records. Since an instance's elastic IP lives outside EKS, there has been no straightforward way to integrate it with EKS. We have done extensive integration/automation with LoxiLB, external-dns and Route53 to achieve this.


Last but not least, one also gets an on-prem style LB in their EKS deployments. The overall deployment topology will be similar to the following figure:



Prerequisites before starting


Make sure you have the latest versions of the awscli, eksctl, kubectl and terraform tools configured on the host. The host should also have sufficient IAM privileges to perform cluster operations, among others.
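
A quick way to verify the tool versions and confirm which IAM identity the AWS CLI is using before proceeding:

$ aws --version
$ eksctl version
$ kubectl version --client
$ terraform version
$ aws sts get-caller-identity   # shows the IAM identity whose privileges will be used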


Create an EKS cluster


We will create an EKS cluster in the main AWS regional zone, with 3 worker node-groups. One will be created in the AWS main region (with one node). The other two node-groups (with two nodes each) will be used to run LoxiLB and workload pods respectively. This has been completely automated with Terraform. The Terraform scripts set up the cluster and also the IAM roles/Kubernetes service accounts necessary for cluster access from inside the cluster using an OIDC-based scheme. Terraform variables are set to create a cluster in the "us-east-1" region. Please feel free to check the GitHub repo and change as per your need.

$ git clone https://github.com/loxilb-io/demo-examples
$ cd demo-examples/terraform/eks-inclb
$ terraform init
$ terraform apply
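
Once terraform apply completes, point kubectl at the new cluster. The cluster name below is a placeholder; use the name defined in the Terraform variables:

$ aws eks update-kubeconfig --region us-east-1 --name <cluster-name>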

Span the cluster across a Local Zone (Optional)


This cluster can also span across an AWS Local Zone. If you want to set up LoxiLB in a local zone, then the local AZ in your region needs to be enabled (e.g. "us-east-1-atl-2a" is a local zone in the "us-east-1" region). The terraform script in this blog does not create a NodeGroup in the local zone. However, one can follow other examples such as the one found here.
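
For reference, available Local Zones can be listed and the corresponding zone group opted into with the AWS CLI; the group name below is just an example matching "us-east-1-atl-2a":

$ aws ec2 describe-availability-zones --region us-east-1 \
    --all-availability-zones --filters Name=zone-type,Values=local-zone
$ aws ec2 modify-availability-zone-group --region us-east-1 \
    --group-name us-east-1-atl-2 --opt-in-status opted-in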


Check the EKS cluster status


Let's get back to the cluster we created. At this point, we can check the status of the cluster and its nodes:

$ kubectl get nodes
NAME                             STATUS   ROLES    AGE    VERSION
ip-192-168-68-85.ec2.internal    Ready    <none>   84m    v1.31.0-eks-a737599
ip-192-168-73-205.ec2.internal   Ready    <none>   71m    v1.31.0-eks-a737599
ip-192-168-81-126.ec2.internal   Ready    <none>   71m    v1.31.0-eks-a737599
ip-192-168-85-237.ec2.internal   Ready    <none>   125m   v1.31.0-eks-a737599
ip-192-168-90-199.ec2.internal   Ready    <none>   84m    v1.31.0-eks-a737599
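
Assuming the node-groups are EKS managed node-groups, it is also possible to see which node-group and zone each node belongs to via the standard node labels:

$ kubectl get nodes -L eks.amazonaws.com/nodegroup -L topology.kubernetes.io/zone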

Create LoxiLB CRD

Deploy LoxiLB in-cluster

$ kubectl apply -f yaml/loxilb.yaml
daemonset.apps/loxilb-lb created
service/loxilb-lb-service created

This is pretty straightforward apart from the fact that it uses an InitContainer to get instance metadata to populate K8s CRDs.
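
For illustration, here is a minimal sketch of what such an initContainer could look like, assuming IMDSv2 is reachable from the pod; the actual loxilb.yaml in the repo is authoritative:

initContainers:
  - name: fetch-instance-meta        # hypothetical example, not the repo manifest
    image: curlimages/curl
    command: ["/bin/sh", "-c"]
    args:
      - |
        # Get an IMDSv2 token, then read this instance's public IPv4 address
        TOKEN=$(curl -sX PUT "http://169.254.169.254/latest/api/token" \
          -H "X-aws-ec2-metadata-token-ttl-seconds: 21600")
        curl -s -H "X-aws-ec2-metadata-token: $TOKEN" \
          http://169.254.169.254/latest/meta-data/public-ipv4 > /opt/meta/public-ip
    volumeMounts:
      - name: meta                    # shared volume read later by the loxilb container
        mountPath: /opt/meta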


Deploy kube-loxilb component (LoxiLB's operator)

$ kubectl apply -f yaml/kube-loxilb.yaml 
serviceaccount/kube-loxilb created
clusterrole.rbac.authorization.k8s.io/kube-loxilb created
clusterrolebinding.rbac.authorization.k8s.io/kube-loxilb created
deployment.apps/kube-loxilb created

Check LoxiLB CRD driven node publicIP registration in EKS


By now, LoxiLB would have updated its nodes' public IPs to Kubernetes via loxi CRDs, as can be verified below:

$ kubectl describe loxiurl | grep "Loxi URL"
  Loxi URL:  54.234.13.xxx
  Loxi URL:  34.229.17.xxx

External-DNS/Route53 Setup


The following steps need to be followed to make sure external-dns is able to communicate with Route53:


Setup IAM Permissions


Create a policy to set up IAM permissions that will allow ExternalDNS to update Route53 DNS records, and save it as route53_policy.json:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "route53:ChangeResourceRecordSets"
      ],
      "Resource": [
        "arn:aws:route53:::hostedzone/*"
      ]
    },
    {
      "Effect": "Allow",
      "Action": [
        "route53:ListHostedZones",
        "route53:ListResourceRecordSets",
        "route53:ListTagsForResource"
      ],
      "Resource": [
        "*"
      ]
    }
  ]
}

Use AWS CLI to create the policy with the following command:

$ aws iam create-policy --policy-name "AllowExternalDNSUpdates" --policy-document file://route53_policy.json
{
    "Policy": {
        "PolicyName": "AllowExternalDNSUpdates",
        "PolicyId": "ANPA4CF3XA2FPM25QT3TB",
        "Arn": "arn:aws:iam::829322364554:policy/AllowExternalDNSUpdates",
        "Path": "/",
        "DefaultVersionId": "v1",
        "AttachmentCount": 0,
        "PermissionsBoundaryUsageCount": 0,
        "IsAttachable": true,
        "CreateDate": "2024-10-17T07:14:35+00:00",
        "UpdateDate": "2024-10-17T07:14:35+00:00"
    }
}

$ export POLICY_ARN=$(aws iam list-policies \
 --query 'Policies[?PolicyName==`AllowExternalDNSUpdates`].Arn' --output text)

Create an IAM role bound to service account

$ eksctl create iamserviceaccount \
  --cluster demo \
  --region us-east-1 \
  --name "external-dns" \
  --namespace "default" \
  --attach-policy-arn $POLICY_ARN \
  --approve

Next, we need to check if RBAC is enabled in your cluster with the following command:

$ kubectl api-versions | grep rbac.authorization.k8s.io
rbac.authorization.k8s.io/v1

If RBAC is turned on, get the eks role-arn:

$ kubectl describe sa external-dns
Name:                external-dns
Namespace:           default
Labels:              app.kubernetes.io/managed-by=eksctl
Annotations:         eks.amazonaws.com/role-arn: arn:aws:iam::829322364554:role/eksctl-eks-loxilb-lz-cluster-addon-iamserviceaccou-Role1-hHpB9SHbHTUu
Image pull secrets:  <none>
Mountable secrets:   <none>
Tokens:              <none>
Events:              <none>

Then, use the manifest file in yaml/external-dns-with-rbac.yaml after replacing the role-arn to deploy ExternalDNS.


If RBAC is not enabled, we need to use the manifest file yaml/external-dns-with-no-rbac.yaml instead.
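
Either way, the relevant parts of an external-dns deployment for Route53 typically look like the excerpt below; treat it as an illustrative sketch (the manifests in the repo are authoritative, and the domain filter and owner-id shown here are placeholders):

containers:
  - name: external-dns
    image: registry.k8s.io/external-dns/external-dns:v0.14.2
    args:
      - --source=service                          # watch Kubernetes Services
      - --provider=aws                            # manage records in Route53
      - --domain-filter=multi-xxx-domain.com      # restrict to your hosted zone
      - --policy=upsert-only                      # never delete records it does not own
      - --registry=txt
      - --txt-owner-id=my-eks-cluster             # placeholder identifier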


Create the externalDNS deployment

$ kubectl apply -f yaml/external-dns-xxx.yaml

Verify the externalDNS deployment

$ kubectl get deployments
NAME           READY   UP-TO-DATE   AVAILABLE   AGE
external-dns   1/1     1            1           13h

Test !!!

We use a test nginx pod with the following yaml:

apiVersion: v1
kind: Service
metadata:
  name: nginx-lb1
  annotations:
    external-dns.alpha.kubernetes.io/hostname: www.multi-xxx-domain.com
    loxilb.io/usepodnetwork : "yes"
spec:
  externalTrafficPolicy: Local
  loadBalancerClass: loxilb.io/loxilb
  selector:
    what: nginx-test
  ports:
    - port: 80
      targetPort: 80
  type: LoadBalancer
---
apiVersion: v1
kind: Pod
metadata:
  name: nginx-test
  labels:
    what: nginx-test
spec:
  nodeSelector:
    node: wlznode02
  containers:
    - name: nginx-test
      image: nginx
      imagePullPolicy: Always
      ports:
        - containerPort: 80

We need to note a couple of annotations here: external-dns.alpha.kubernetes.io/hostname tells external-dns which domain record to create/update in Route53, while loxilb.io/usepodnetwork: "yes" tells LoxiLB to forward traffic directly to pod IPs, which the VPC CNI makes reachable inside the VPC.


And apply it:

$ kubectl apply -f yaml/nginx.yaml
service/nginx-lb1 created
pod/nginx-test created

Kindly note that one would need to edit the cluster security groups to allow the traffic (this is not handled by Terraform). From the AWS console: EKS -> Clusters -> <Name> -> Networking -> ClusterSecurityGroup.
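
The same can be done from the CLI; the security-group ID and CIDR below are placeholders:

$ aws ec2 authorize-security-group-ingress --group-id <cluster-sg-id> \
    --protocol tcp --port 80 --cidr 0.0.0.0/0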

Let's check the created K8s services:

$ kubectl get svc 
NAME         TYPE         CLUSTER-IP      EXTERNAL-IP                 PORT(S)        AGE
kubernetes   ClusterIP    10.100.0.1      <none>                      443/TCP        49m
nginx-lb1    LoadBalancer 10.100.144.82   34.229.17.XX,52.90.160.XX   80:31800/TCP   28m

So, we are able to list the public IPs of all the nodes that have LoxiLB scheduled. We can reach the service via each of these external IPs, or use the domain name, which is set up for auto-failover in an active-active HA setup:

Test Access with Domain-Name

$ curl http://www.multi-xxx-domain.com
<!DOCTYPE html>
<html>
<head>
<title>Welcome to nginx!</title>
<style>
html { color-scheme: light dark; }
body { width: 35em; margin: 0 auto;
font-family: Tahoma, Verdana, Arial, sans-serif; }
</style>
</head>
<body>
<h1>Welcome to nginx!</h1>
<p>If you see this page, the nginx web server is successfully installed and
working. Further configuration is required.</p>

<p>For online documentation and support please refer to
<a href="http://nginx.org/">nginx.org</a>.<br/>
Commercial support is available at
<a href="http://nginx.com/">nginx.com</a>.</p>

<p><em>Thank you for using nginx.</em></p>
</body>
</html>

Test Access with PublicIP

$ curl http://34.229.17.XX
<!DOCTYPE html>
<html>
<head>
<title>Welcome to nginx!</title>
<style>
html { color-scheme: light dark; }
body { width: 35em; margin: 0 auto;
font-family: Tahoma, Verdana, Arial, sans-serif; }
</style>
</head>
<body>
<h1>Welcome to nginx!</h1>
<p>If you see this page, the nginx web server is successfully installed and
working. Further configuration is required.</p>

<p>For online documentation and support please refer to
<a href="http://nginx.org/">nginx.org</a>.<br/>
Commercial support is available at
<a href="http://nginx.com/">nginx.com</a>.</p>

<p><em>Thank you for using nginx.</em></p>
</body>
</html>

Performance


For testing the performance, we used the same setup as above. Additionally, we launched an EC2 host in the same zone/subnet as the loxilb and worker nodes and ran a series of tests. We measured the performance of LoxiLB vs the NodePort exposed by EKS/Kubernetes. NodePort (although not a production option) was chosen as a baseline because it is available inside the Kubernetes cluster via kube-proxy and is supposed to give the best numbers possible for comparison.
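
The blog does not prescribe a specific load-generation tool, but a comparison along these lines can be reproduced with a standard HTTP benchmark such as wrk; the endpoints below are placeholders for the LoxiLB external IP/domain and a worker node's NodePort:

$ wrk -t8 -c200 -d60s --latency http://www.multi-xxx-domain.com/
$ wrk -t8 -c200 -d60s --latency http://<worker-node-ip>:31800/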


As seen from the above charts, LoxiLB-based workloads performed better or equal in almost all the tests performed. LoxiLB did exceptionally well in requests per second as well as overall request latency, which is crucial for various applications.


Conclusion


In this blog, we learned how LoxiLB deployed within an auto-scaled node group in AWS region/Local Zones, integrated with Route 53, offers a robust and scalable solution for low-latency, high-performance applications. This kind of setup ensures seamless traffic distribution and dynamic scaling, allowing your infrastructure to efficiently handle fluctuating workloads. The integration with Route 53 enables intelligent routing and global DNS management, further enhancing application availability and performance.


Additionally, by leveraging AWS Local Zones for proximity to end-users and LoxiLB’s efficient load balancing capabilities, this architecture delivers improved responsiveness, cost optimization through autoscaling, and a highly resilient infrastructure for modern, demanding applications.


We hope you liked our blog. Don't forget to visit our GitHub and check our website to know more!


Credits


Special thanks to Saravanan Shanmugan for his collaboration, unwavering support, and invaluable feedback on this post. As a Hybrid Cloud and Networking Expert leading innovative solutions at Amazon Web Services, his insights and expertise made this possible.

