EKS Cluster upgrade

🚀 EKS Cluster Upgrade on AWS: Best Practices & Key Considerations 🚀

Upgrading your Kubernetes (K8s) cluster might seem intimidating at first, but with the right preparation and process, it can be a breeze. Whether you are managing your own cluster or using Amazon EKS, this guide has the essential best practices and key considerations to help you upgrade seamlessly.


🔧 Pre-Upgrade Checklist (Prerequisites)

Before you dive into upgrading your EKS cluster, make sure you are prepared with these essentials:

  1. Cordon Your Nodes
    Mark each node as unschedulable (kubectl cordon) right before you drain or replace it, so no new pods land on it mid-upgrade. Managed node groups do this for you; on self-managed nodes it is a manual step. The whole upgrade typically takes 1-2 hours, so be sure to inform your team.

  2. Read the Release Notes
    Don't skip this step! Check the release notes for new features, bug fixes, and deprecated or removed APIs between your current and target versions (e.g., moving from K8s 1.30 to 1.31).

  3. Test in Lower Environments
    An EKS control plane upgrade cannot be rolled back to the previous version, so test it thoroughly in a non-production environment first. Allow a grace period of up to two weeks to monitor resources for any unusual behavior.

  4. Ensure Version Consistency
    Keep your control plane and nodes in sync: before you start, both should be running the same Kubernetes minor version. In particular, the kubelet on a node must never be newer than the control plane, so always upgrade the control plane first and the nodes after it.

  5. Update the Cluster Autoscaler
    If you're using the Cluster Autoscaler, update it to the release that matches the cluster's new Kubernetes minor version so scaling keeps working as expected.

  6. Check IP Availability
    EKS needs up to 5 available IP addresses in the subnets you specified when creating the cluster in order to roll out the new control plane. Don't overlook this!

  7. Backup Your Data
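    Backups are a must, especially for critical state such as cluster resources and application data. On EKS, etcd is managed by AWS and you cannot snapshot it directly, so back up your Kubernetes manifests and persistent volumes instead, for example with a tool like Velero or with EBS snapshots.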
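
To make items 4 and 6 of the checklist concrete, here is a minimal pre-flight sketch in Python. It is only an illustration: the function names, node names, and subnet IDs are made up for this example, and it assumes you have already collected the kubelet versions (for instance from kubectl get nodes) and the free IP count per subnet (the AvailableIpAddressCount field that aws ec2 describe-subnets reports).

```python
from typing import Dict, List, Tuple

# The "5 available IPs" figure comes from the checklist above.
MIN_FREE_IPS = 5


def parse_minor(version: str) -> Tuple[int, int]:
    """Turn 'v1.30.4-eks-abc123' or '1.30' into (1, 30)."""
    major, minor = version.lstrip("v").split(".")[:2]
    return int(major), int(minor)


def preflight(control_plane: str,
              kubelets: Dict[str, str],
              free_ips: Dict[str, int]) -> List[str]:
    """Return human-readable problems; an empty list means the checks passed."""
    problems = []
    cp = parse_minor(control_plane)

    # Checklist item 4: every kubelet should match the control plane minor version.
    for node, version in sorted(kubelets.items()):
        node_version = parse_minor(version)
        if node_version > cp:
            problems.append(f"{node}: kubelet {version} is newer than control plane {control_plane}")
        elif node_version < cp:
            problems.append(f"{node}: kubelet {version} is older than control plane {control_plane}")

    # Checklist item 6: enough free IP addresses in each cluster subnet.
    for subnet, count in sorted(free_ips.items()):
        if count < MIN_FREE_IPS:
            problems.append(f"{subnet}: only {count} free IPs, need at least {MIN_FREE_IPS}")

    return problems


if __name__ == "__main__":
    # Hypothetical inputs, hard-coded for illustration only.
    issues = preflight(
        "1.30",
        {"node-a": "v1.30.2", "node-b": "v1.29.8"},
        {"subnet-aaa": 12, "subnet-bbb": 3},
    )
    for issue in issues:
        print(issue)
```

An empty result means both checks passed; anything else is worth fixing before you touch the control plane.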


🌐 High Availability & Disaster Recovery

Amazon EKS runs the control plane across multiple Availability Zones, so it stays available even if one zone fails. Keep in mind that this covers the control plane only; your own workloads stay available through a zone failure only if your nodes and pod replicas are spread across zones as well.


🔄 The Upgrade Process for Managed K8s (EKS/AKS/GKE)

Here's a breakdown of the upgrade process for your managed Kubernetes cluster:

  1. Control Plane Upgrade
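    You can upgrade the control plane via the AWS CLI, eksctl, or directly through the AWS Console. EKS moves one minor version at a time, so going from 1.29 to 1.31 means two upgrades. Don't worry about scaling or disaster recovery of the control plane during the upgrade; the AWS platform has you covered.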

  2. Data Plane Upgrade
    Now it's time to upgrade your nodes (or Fargate). Pay close attention to node groups and launch templates to ensure they're aligned with the upgrade; for Fargate, pods pick up the new version only when they are redeployed.

  3. Update Your Add-ons
    Don't forget to update important add-ons like kube-proxy, CoreDNS, and the Amazon VPC CNI. They are not upgraded together with the control plane, and they are crucial for everything to run as expected after the upgrade.
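
The order of the three steps above can be sketched as a small dry-run planner in Python. Treat it as a sketch under assumptions: the cluster name, node group name, and add-on version strings are placeholders invented for this example, and the script only prints AWS CLI commands instead of running them. In a real rollout you would wait for each step to finish before starting the next, and node groups that use a launch template with a custom AMI may need a different invocation.

```python
from typing import Dict, List


def minor_steps(current: str, target: str) -> List[str]:
    """List every minor version after `current` up to and including `target`.

    Both arguments are expected in 'MAJOR.MINOR' form, e.g. '1.29'.
    """
    cur_major, cur_minor = (int(part) for part in current.split("."))
    tgt_major, tgt_minor = (int(part) for part in target.split("."))
    if cur_major != tgt_major or tgt_minor <= cur_minor:
        raise ValueError(f"cannot plan an upgrade from {current} to {target}")
    return [f"{cur_major}.{minor}" for minor in range(cur_minor + 1, tgt_minor + 1)]


def upgrade_plan(cluster: str,
                 version: str,
                 nodegroups: List[str],
                 addons: Dict[str, str]) -> List[str]:
    """Return the CLI commands for ONE minor-version hop, in order."""
    # 1. Control plane first.
    plan = [f"aws eks update-cluster-version --name {cluster} --kubernetes-version {version}"]
    # 2. Then the data plane (managed node groups).
    for nodegroup in nodegroups:
        plan.append(
            f"aws eks update-nodegroup-version --cluster-name {cluster} "
            f"--nodegroup-name {nodegroup} --kubernetes-version {version}"
        )
    # 3. Finally the add-ons; versions are supplied by the caller.
    for addon, addon_version in addons.items():
        plan.append(
            f"aws eks update-addon --cluster-name {cluster} "
            f"--addon-name {addon} --addon-version {addon_version}"
        )
    return plan


if __name__ == "__main__":
    # Placeholder names and versions, for illustration only.
    for hop in minor_steps("1.29", "1.31"):
        print(f"# --- upgrade to {hop} ---")
        for command in upgrade_plan("demo-cluster", hop, ["ng-general"],
                                    {"kube-proxy": "<version-for-" + hop + ">"}):
            print(command)
```

Printing the plan first makes it easy to review the sequence with your team before anything changes in the cluster.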


⚡ Key Upgrade Considerations

Before hitting the "upgrade" button, here are a few things to keep in mind:

  • Node Upgrade Strategy

    • Managed Node Groups with Launch Templates: AWS runs a rolling update for you, cordoning, draining, and replacing nodes one by one by default. Zero downtime still depends on your workloads running multiple replicas and tolerating pod evictions.

    • Self-managed Nodes: You'll need to cordon, drain, and upgrade nodes manually, one at a time, to prevent downtime.

    • Hybrid Setups: If your cluster mixes managed and self-managed node groups, apply the matching approach to each group.

  • Managing Controllers
    For tools like Helm, Argo CD, or Prometheus, you don't necessarily need to stop them during the upgrade. Just make sure you've tested them against the target Kubernetes version in a lower environment first to minimize any risks when you go live.
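
For the self-managed case in the node upgrade strategy above, the one-node-at-a-time loop can be sketched like this in Python. It is a rough illustration, not a complete procedure: the node names are hypothetical, the helper only builds kubectl command strings rather than executing them, and the replacement step is left as a comment because it depends on how your nodes are provisioned.

```python
from typing import List


def rolling_node_commands(nodes: List[str]) -> List[List[str]]:
    """Build, per node, the commands for a one-at-a-time rollout."""
    batches = []
    for node in nodes:
        batches.append([
            # Stop new pods from being scheduled onto the node.
            f"kubectl cordon {node}",
            # Evict running pods; DaemonSet pods are skipped, emptyDir data is discarded.
            f"kubectl drain {node} --ignore-daemonsets --delete-emptydir-data",
            # Placeholder: how the node is replaced depends on your setup.
            f"# replace {node}, then wait for the new node to report Ready",
        ])
    return batches


if __name__ == "__main__":
    # Hypothetical node names, for illustration only.
    for batch in rolling_node_commands(["ip-10-0-1-10", "ip-10-0-2-20"]):
        print("\n".join(batch))
        print()
```

Finishing one node completely before moving to the next is what keeps capacity loss down to a single node at any moment.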


💡 Pro Tip

Know your cluster inside and out. Before upgrading, ensure you understand your K8s components: add-ons, controllers, networking, and everything else. This knowledge will save you time and headaches during the upgrade.


Final Thought

Upgrading Kubernetes might feel complex, but with thorough preparation, testing, and phased upgrades, you can make the process smooth and stress-free. Backup, test, and upgrade wisely to ensure your production environment is rock-solid post-upgrade.
