Overview
Week 3 of the CloudNet@ AEWS study covers EKS Storage & Node management.
Accordingly, this week's lab environment uses EFS.
Fellow study member 이지오 kindly put together a brief summary of the study material.
0. Deploying the Lab Environment
Amazon EKS one-click deployment (with EFS creation added) & basic setup
# Download the YAML file
curl -O https://s3.ap-northeast-2.amazonaws.com/cloudformation.cloudneta.net/K8S/eks-oneclick2.yaml
# Deploy the CloudFormation stack
Example) aws cloudformation deploy --template-file eks-oneclick2.yaml --stack-name myeks --parameter-overrides KeyName=somaz-key SgIngressSshCidr=$(curl -s ipinfo.io/ip)/32 MyIamUserAccessKeyID=AKIA5... MyIamUserSecretAccessKey='CVNa2...' ClusterBaseName=myeks --region ap-northeast-2
# Print the bastion EC2 IP once the CloudFormation stack deployment completes
aws cloudformation describe-stacks --stack-name myeks --query 'Stacks[*].Outputs[0].OutputValue' --output text
3.38.104.49
# SSH into the bastion EC2
ssh -i ~/.ssh/somaz-key.pem ec2-user@$(aws cloudformation describe-stacks --stack-name myeks --query 'Stacks[*].Outputs[0].OutputValue' --output text)
The authenticity of host '3.38.104.49 (3.38.104.49)' can't be established.
ED25519 key fingerprint is SHA256:MjVpvGG9nVtekjhovOMdTcdFRWIHiMBvJBs5U1YWG30.
This key is not known by any other names
Are you sure you want to continue connecting (yes/no/[fingerprint])? yes
Warning: Permanently added '3.38.104.49' (ED25519) to the list of known hosts.
__| __|_ )
_| ( / Amazon Linux 2 AMI
___|\___|___|
https://aws.amazon.com/amazon-linux-2/
5 package(s) needed for security, out of 17 available
Run "sudo yum update" to apply all updates.
(somaz@myeks:N/A) [root@myeks-bastion-EC2 ~]
Basic setup and EFS verification
kubectl get nodes
NAME STATUS ROLES AGE VERSION
ip-192-168-1-145.ap-northeast-2.compute.internal Ready <none> 2m34s v1.24.11-eks-a59e1f0
ip-192-168-2-193.ap-northeast-2.compute.internal Ready <none> 2m32s v1.24.11-eks-a59e1f0
ip-192-168-3-8.ap-northeast-2.compute.internal Ready <none> 2m31s v1.24.11-eks-a59e1f0
# Switch to the default namespace
kubectl ns default
# (Optional) rename the context
NICK=<your nickname>
NICK=somaz
kubectl ctx
somaz@myeks.ap-northeast-2.eksctl.io
kubectl config rename-context admin@myeks.ap-northeast-2.eksctl.io $NICK@myeks
# Check EFS: also verify it in the AWS management console
#mount -t nfs4 -o nfsvers=4.1,rsize=1048576,wsize=1048576,hard,timeo=600,retrans=2,noresvport <EFS FS ID>.efs.ap-northeast-2.amazonaws.com:/ /mnt/myefs
mount -t nfs4 -o nfsvers=4.1,rsize=1048576,wsize=1048576,hard,timeo=600,retrans=2,noresvport fs-0712490dce491d5a1.efs.ap-northeast-2.amazonaws.com:/ /mnt/myefs
df -hT --type nfs4
Filesystem Type Size Used Avail Use% Mounted on
fs-0712490dce491d5a1.efs.ap-northeast-2.amazonaws.com:/ nfs4 8.0E 0 8.0E 0% /mnt/myefs
mount | grep nfs4
echo "efs file test" > /mnt/myefs/memo.txt
cat /mnt/myefs/memo.txt
efs file test
rm -f /mnt/myefs/memo.txt
# Check storage classes and CSI nodes
kubectl get sc
NAME PROVISIONER RECLAIMPOLICY VOLUMEBINDINGMODE ALLOWVOLUMEEXPANSION AGE
gp2 (default) kubernetes.io/aws-ebs Delete WaitForFirstConsumer false 13m
kubectl get sc gp2 -o yaml | yh
kubectl get csinodes
NAME DRIVERS AGE
ip-192-168-1-145.ap-northeast-2.compute.internal 0 5m40s
ip-192-168-2-193.ap-northeast-2.compute.internal 0 5m38s
ip-192-168-3-8.ap-northeast-2.compute.internal 0 5m37s
# Check node info
kubectl get node --label-columns=node.kubernetes.io/instance-type,eks.amazonaws.com/capacityType,topology.kubernetes.io/zone
NAME STATUS ROLES AGE VERSION INSTANCE-TYPE CAPACITYTYPE ZONE
ip-192-168-1-145.ap-northeast-2.compute.internal Ready <none> 5m51s v1.24.11-eks-a59e1f0 t3.medium ON_DEMAND ap-northeast-2a
ip-192-168-2-193.ap-northeast-2.compute.internal Ready <none> 5m49s v1.24.11-eks-a59e1f0 t3.medium ON_DEMAND ap-northeast-2b
ip-192-168-3-8.ap-northeast-2.compute.internal Ready <none> 5m48s v1.24.11-eks-a59e1f0 t3.medium ON_DEMAND ap-northeast-2c
eksctl get iamidentitymapping --cluster myeks
ARN USERNAME GROUPS ACCOUNT
arn:aws:iam::61184xxxxxx:role/eksctl-myeks-nodegroup-ng1-NodeInstanceRole-1C8RIIG7INSIK system:node:{{EC2PrivateDNSName}} system:bootstrappers,system:nodes
# Check node IPs and set private IP variables
N1=$(kubectl get node --label-columns=topology.kubernetes.io/zone --selector=topology.kubernetes.io/zone=ap-northeast-2a -o jsonpath={.items[0].status.addresses[0].address})
N2=$(kubectl get node --label-columns=topology.kubernetes.io/zone --selector=topology.kubernetes.io/zone=ap-northeast-2b -o jsonpath={.items[0].status.addresses[0].address})
N3=$(kubectl get node --label-columns=topology.kubernetes.io/zone --selector=topology.kubernetes.io/zone=ap-northeast-2c -o jsonpath={.items[0].status.addresses[0].address})
echo "export N1=$N1" >> /etc/profile
echo "export N2=$N2" >> /etc/profile
echo "export N3=$N3" >> /etc/profile
echo $N1, $N2, $N3
192.168.1.145, 192.168.2.193, 192.168.3.8
# Check the node security group ID
NGSGID=$(aws ec2 describe-security-groups --filters Name=group-name,Values=*ng1* --query "SecurityGroups[*].[GroupId]" --output text)
aws ec2 authorize-security-group-ingress --group-id $NGSGID --protocol '-1' --cidr 192.168.1.100/32
{
"Return": true,
"SecurityGroupRules": [
{
"SecurityGroupRuleId": "sgr-07f7d08144f00b845",
"GroupId": "sg-0e8ef8273c1e585e2",
"GroupOwnerId": "611841095956",
"IsEgress": false,
"IpProtocol": "-1",
"FromPort": -1,
"ToPort": -1,
"CidrIpv4": "192.168.1.100/32"
}
]
}
# SSH into the worker nodes
ssh ec2-user@$N1 hostname
ssh ec2-user@$N2 hostname
ssh ec2-user@$N3 hostname
# Install tools on the nodes
ssh ec2-user@$N1 sudo yum install links tree jq tcpdump sysstat -y
ssh ec2-user@$N2 sudo yum install links tree jq tcpdump sysstat -y
ssh ec2-user@$N3 sudo yum install links tree jq tcpdump sysstat -y
Installing AWS LB Controller / ExternalDNS and kube-ops-view
# AWS LB Controller
helm repo add eks https://aws.github.io/eks-charts
helm repo update
helm install aws-load-balancer-controller eks/aws-load-balancer-controller -n kube-system --set clusterName=$CLUSTER_NAME \
--set serviceAccount.create=false --set serviceAccount.name=aws-load-balancer-controller
NAME: aws-load-balancer-controller
LAST DEPLOYED: Sun May 7 20:19:17 2023
NAMESPACE: kube-system
STATUS: deployed
REVISION: 1
TEST SUITE: None
NOTES:
AWS Load Balancer controller installed!
# ExternalDNS
MyDomain=<your domain>
MyDomain=somaz.link
MyDnzHostedZoneId=$(aws route53 list-hosted-zones-by-name --dns-name "${MyDomain}." --query "HostedZones[0].Id" --output text)
echo $MyDomain, $MyDnzHostedZoneId
somaz.link, /hostedzone/Z03204211VEUZG9O0RLE5
curl -s -O https://raw.githubusercontent.com/gasida/PKOS/main/aews/externaldns.yaml
MyDomain=$MyDomain MyDnzHostedZoneId=$MyDnzHostedZoneId envsubst < externaldns.yaml | kubectl apply -f -
# kube-ops-view
helm repo add geek-cookbook https://geek-cookbook.github.io/charts/
helm install kube-ops-view geek-cookbook/kube-ops-view --version 1.2.2 --set env.TZ="Asia/Seoul" --namespace kube-system
kubectl patch svc -n kube-system kube-ops-view -p '{"spec":{"type":"LoadBalancer"}}'
kubectl annotate service kube-ops-view -n kube-system "external-dns.alpha.kubernetes.io/hostname=kubeopsview.$MyDomain"
echo -e "Kube Ops View URL = http://kubeopsview.$MyDomain:8080/#scale=1.5"
Kube Ops View URL = http://kubeopsview.somaz.link:8080/#scale=1.5
k get svc -n kube-system
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
aws-load-balancer-webhook-service ClusterIP 10.100.112.95 <none> 443/TCP 87s
kube-dns ClusterIP 10.100.0.10 <none> 53/UDP,53/TCP 18m
kube-ops-view LoadBalancer 10.100.16.12 a9efa1d7a8d2b44b0a03d72754f9931f-179953017.ap-northeast-2.elb.amazonaws.com 8080:30969/TCP 32s
Verify the installation
# Check image info
kubectl get pods --all-namespaces -o jsonpath="{.items[*].spec.containers[*].image}" | tr -s '[[:space:]]' '\n' | sort | uniq -c
3 602401xxxxxx.dkr.ecr.ap-northeast-2.amazonaws.com/amazon-k8s-cni:v1.12.6-eksbuild.1
2 602401xxxxxx.dkr.ecr.ap-northeast-2.amazonaws.com/eks/coredns:v1.9.3-eksbuild.3
3 602401xxxxxx.dkr.ecr.ap-northeast-2.amazonaws.com/eks/kube-proxy:v1.24.10-minimal-eksbuild.2
1 hjacobs/kube-ops-view:20.4.0
2 public.ecr.aws/eks/aws-load-balancer-controller:v2.5.1
1 registry.k8s.io/external-dns/external-dns:v0.13.
# Check the addons installed/updated by eksctl
eksctl get addon --cluster $CLUSTER_NAME
...
NAME VERSION STATUS ISSUES IAMROLE UPDATE AVAILABLE CONFIGURATION VALUES
coredns v1.9.3-eksbuild.3 ACTIVE 0
kube-proxy v1.24.10-eksbuild.2 ACTIVE 0
vpc-cni v1.12.6-eksbuild.1 ACTIVE 0 arn:aws:iam::61184xxxxxxx:role/eksctl-myeks-addon-vpc-cni-Role1-R4FIFPTW7DJL
# Check IRSA
eksctl get iamserviceaccount --cluster $CLUSTER_NAME
NAMESPACE NAME ROLE ARN
kube-system aws-load-balancer-controller arn:aws:iam::61184xxxxxxx:role/eksctl-myeks-addon-iamserviceaccount-kube-sy-Role1-1K1FTMHEUMTMF
1. Understanding Storage
Background
Data inside a pod is deleted when the pod stops.
This is because pods are, by default, stateless applications.
Some pods, such as databases, need their data preserved.
These are stateful applications, and they can attach persistent storage by binding a PV & PVC.
- ex) NFS, AWS EBS, Ceph
The feature that automatically provisions a volume and mounts it into a pod when the pod is created is called Dynamic Provisioning.
What happens to a persistent volume once it is no longer used can be configured separately; Kubernetes calls this the Reclaim Policy.
The main Reclaim Policy options are Retain (keep the volume) and Delete (remove it, in which case the backing EBS volume is also deleted).
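For example, the reclaim policy can be pinned on a StorageClass so that every volume it provisions inherits it; a minimal sketch (the retain-sc name and gp3 type here are illustrative only, not part of the lab):
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: retain-sc              # hypothetical name, for illustration only
provisioner: ebs.csi.aws.com
reclaimPolicy: Retain          # the PV (and the backing EBS volume) survives PVC deletion
parameters:
  type: gp3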
Storage introduction: source - 김태민's tech blog
There are three kinds of volumes: emptyDir, hostPath, and PV/PVC (see the emptyDir sketch below).
A variety of volume backends can be used: Kubernetes built-ins (hostPath, local), on-prem solutions (Ceph, etc.), NFS, and cloud storage (AWS EBS, etc.).
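As a quick contrast with PV/PVC, an emptyDir volume shares the pod's lifecycle and disappears with it; a minimal sketch (the emptydir-demo pod below is illustrative and not part of the lab):
apiVersion: v1
kind: Pod
metadata:
  name: emptydir-demo          # hypothetical example pod
spec:
  containers:
  - name: app
    image: busybox
    command: ["sh", "-c", "date > /cache/created-at && sleep 3600"]
    volumeMounts:
    - name: cache
      mountPath: /cache
  volumes:
  - name: cache
    emptyDir: {}               # removed together with the pod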
Dynamic provisioning, volume states, and ReclaimPolicy
Introduction to CSI (Container Storage Interface)
First, some background on the CSI driver: the AWS EBS provisioner that lived inside the Kubernetes source tree was released along with the Kubernetes release lifecycle, so using new provisioner features required upgrading the Kubernetes version itself.
To remove this constraint, the Kubernetes developers deprecated the built-in (in-tree) provisioners and moved dynamic provisioning into separate controller pods. This is the CSI (Container Storage Interface) driver.
With CSI, Kubernetes exposes a common CSI interface through which many different providers can be plugged in.
This is the typical structure of a CSI driver, and the AWS EBS CSI driver follows it as well: the controller pod, deployed as a StatefulSet or Deployment, calls the AWS API to actually create EBS volumes.
The node pod, deployed as a DaemonSet, calls the AWS API to attach EBS volumes to the Kubernetes nodes (EC2 instances).
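A quick way to see which CSI drivers are registered in a cluster and what each node's kubelet reports (standard kubectl commands, not study-specific):
kubectl get csidrivers                 # CSI drivers registered in the cluster
kubectl describe csinode | head -20    # per-node driver registration details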
Node-specific Volume Limits
The maximum number of attachable volumes depends on the AWS EC2 instance type: roughly 25 to 39.
# Check
kubectl describe node | grep Allocatable: -A1
Allocatable:
attachable-volumes-aws-ebs: 25
- This limit can be changed by setting the KUBE_MAX_PD_VOLS environment variable and restarting the scheduler.
Using the default container's ephemeral filesystem
If you delete a pod and redeploy it, anything written previously is gone.
# Deploy the pod
# The date command appends the current time to /home/pod-out.txt every 10 seconds
curl -s -O https://raw.githubusercontent.com/gasida/PKOS/main/3/date-busybox-pod.yaml
cat date-busybox-pod.yaml | yh
apiVersion: v1
kind: Pod
metadata:
name: busybox
spec:
terminationGracePeriodSeconds: 3
containers:
- name: busybox
image: busybox
command:
- "/bin/sh"
- "-c"
- "while true; do date >> /home/pod-out.txt; cd /home; sync; sync; sleep 10; done"
kubectl apply -f date-busybox-pod.yaml
pod/busybox created
# Check the file
kubectl get pod
kubectl exec busybox -- tail -f /home/pod-out.txt
Sun May 7 11:23:36 UTC 2023
Sun May 7 11:23:46 UTC 2023
Sun May 7 11:23:56 UTC 2023
Sun May 7 11:24:06 UTC 2023
...
# Delete the pod, recreate it, and check the file again > is the previous data preserved?
kubectl delete pod busybox
kubectl apply -f date-busybox-pod.yaml
kubectl exec busybox -- tail -f /home/pod-out.txt
...
Sun May 7 11:24:38 UTC 2023
Sun May 7 11:24:48 UTC 2023
# Clean up after the exercise
kubectl delete pod busybox
PV/PVC backed by a host path: deploying the local-path-provisioner storage class
# Deploy
curl -s -O https://raw.githubusercontent.com/rancher/local-path-provisioner/master/deploy/local-path-storage.yaml
kubectl apply -f local-path-storage.yaml
namespace/local-path-storage created
serviceaccount/local-path-provisioner-service-account created
clusterrole.rbac.authorization.k8s.io/local-path-provisioner-role created
clusterrolebinding.rbac.authorization.k8s.io/local-path-provisioner-bind created
deployment.apps/local-path-provisioner created
storageclass.storage.k8s.io/local-path created
configmap/local-path-config created
# Check
kubectl get-all -n local-path-storage
NAME NAMESPACE AGE
configmap/kube-root-ca.crt local-path-storage 13s
configmap/local-path-config local-path-storage 13s
pod/local-path-provisioner-759f6bd7c9-szvz8 local-path-storage 13s
serviceaccount/default local-path-storage 13s
serviceaccount/local-path-provisioner-service-account local-path-storage 13s
deployment.apps/local-path-provisioner local-path-storage 13s
replicaset.apps/local-path-provisioner-759f6bd7c9 local-path-storage 13s
kubectl get pod -n local-path-storage -owide
kubectl describe cm -n local-path-storage local-path-config
kubectl get sc
kubectl get sc local-path
NAME PROVISIONER RECLAIMPOLICY VOLUMEBINDINGMODE ALLOWVOLUMEEXPANSION AGE
local-path rancher.io/local-path Delete WaitForFirstConsumer false
Create a pod that uses the PV/PVC.
# Create a PVC
curl -s -O https://raw.githubusercontent.com/gasida/PKOS/main/3/localpath1.yaml
cat localpath1.yaml | yh
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
name: localpath-claim
spec:
accessModes:
- ReadWriteOnce
resources:
requests:
storage: 1Gi
storageClassName: "local-path"
kubectl apply -f localpath1.yaml
persistentvolumeclaim/localpath-claim created
# Check the PVC
kubectl get pvc
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE
localpath-claim Pending local-path 8s
kubectl describe pvc
Name: localpath-claim
Namespace: default
StorageClass: local-path
Status: Pending
Volume:
Labels: <none>
Annotations: <none>
Finalizers: [kubernetes.io/pvc-protection]
Capacity:
Access Modes:
VolumeMode: Filesystem
Used By: <none>
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal WaitForFirstConsumer 10s (x3 over 28s) persistentvolume-controller waiting for first consumer to be created before binding
# Create the pod
curl -s -O https://raw.githubusercontent.com/gasida/PKOS/main/3/localpath2.yaml
cat localpath2.yaml | yh
kubectl apply -f localpath2.yaml
pod/app created
# Check the pod
kubectl get pod,pv,pvc
NAME READY STATUS RESTARTS AGE
pod/app 1/1 Running 0 21s
NAME CAPACITY ACCESS MODES RECLAIM POLICY STATUS CLAIM STORAGECLASS REASON AGE
persistentvolume/pvc-afbfc060-9bf6-47a8-a744-e26e9cfe9129 1Gi RWO Delete Bound default/localpath-claim local-path 16s
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE
persistentvolumeclaim/localpath-claim Bound pvc-afbfc060-9bf6-47a8-a744-e26e9cfe9129 1Gi RWO local-path 103s
kubectl describe pv # check the Node Affinity
kubectl exec -it app -- tail -f /data/out.txt
Sun May 7 11:29:10 UTC 2023
Sun May 7 11:29:15 UTC 2023
Sun May 7 11:29:20 UTC 2023
Sun May 7 11:29:25 UTC 2023
...
# On the worker node where the pod is running, verify that out.txt exists at the path below
ssh ec2-user@$N3 tree /opt/local-path-provisioner
/opt/local-path-provisioner
└── pvc-afbfc060-9bf6-47a8-a744-e26e9cfe9129_default_localpath-claim
└── out.txt
1 directory, 1 file
# Check out.txt directly on that worker node: the PVC directory name below will differ in your environment
ssh ec2-user@$N3 tail -f /opt/local-path-provisioner/pvc-afbfc060-9bf6-47a8-a744-e26e9cfe9129_default_localpath-claim/out.txt
Sun May 7 11:30:10 UTC 2023
Sun May 7 11:30:15 UTC 2023
Sun May 7 11:30:20 UTC 2023
Sun May 7 11:30:25 UTC 2023
Sun May 7 11:30:30 UTC 2023
Sun May 7 11:30:35 UTC 2023
Sun May 7 11:30:40 UTC 2023
Sun May 7 11:30:45 UTC 2023
Sun May 7 11:30:50 UTC 2023
Sun May 7 11:30:55 UTC 2023
...
Delete the pod, recreate it, and verify that the data persists.
# Delete the pod, then check the PV/PVC
kubectl delete pod app
kubectl get pod,pv,pvc
NAME CAPACITY ACCESS MODES RECLAIM POLICY STATUS CLAIM STORAGECLASS REASON AGE
persistentvolume/pvc-afbfc060-9bf6-47a8-a744-e26e9cfe9129 1Gi RWO Delete Bound default/localpath-claim local-path 2m23s
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE
persistentvolumeclaim/localpath-claim Bound pvc-afbfc060-9bf6-47a8-a744-e26e9cfe9129 1Gi RWO local-path 3m50s
ssh ec2-user@$N3 tree /opt/local-path-provisioner
/opt/local-path-provisioner
└── pvc-afbfc060-9bf6-47a8-a744-e26e9cfe9129_default_localpath-claim
└── out.txt
1 directory, 1 file
# Run the pod again
kubectl apply -f localpath2.yaml
# Check
kubectl exec -it app -- head /data/out.txt
Sun May 7 11:29:10 UTC 2023
Sun May 7 11:29:15 UTC 2023
Sun May 7 11:29:20 UTC 2023
Sun May 7 11:29:25 UTC 2023
Sun May 7 11:29:30 UTC 2023
Sun May 7 11:29:35 UTC 2023
Sun May 7 11:29:40 UTC 2023
Sun May 7 11:29:45 UTC 2023
Sun May 7 11:29:50 UTC 2023
Sun May 7 11:29:55 UTC 2023
kubectl exec -it app -- tail -f /data/out.txt
Sun May 7 11:30:50 UTC 2023
Sun May 7 11:30:55 UTC 2023
Sun May 7 11:31:00 UTC 2023
Sun May 7 11:31:05 UTC 2023
Sun May 7 11:31:10 UTC 2023
Sun May 7 11:31:15 UTC 2023
Sun May 7 11:31:58 UTC 2023
Sun May 7 11:32:03 UTC 2023
Sun May 7 11:32:08 UTC 2023
Sun May 7 11:32:13 UTC 2023
Sun May 7 11:32:18 UTC 2023
Delete the pod and PVC for the next exercise.
# Delete the pod and PVC
kubectl delete pod app
kubectl get pv,pvc
kubectl delete pvc localpath-claim
# Check
kubectl get pv
No resources found
ssh ec2-user@$N3 tree /opt/local-path-provisioner
/opt/local-path-provisioner
0 directories, 0 files
If you want to change the /opt/ path, edit the local-path storage ConfigMap (see the sketch below).
k get cm -n local-path-storage
NAME DATA AGE
kube-root-ca.crt 1 9m2s
local-path-config 4 9m2s
k get cm -n local-path-storage -o yaml | grep opt
"paths":["/opt/local-path-provisioner"]
2. AWS EBS Controller
Volume (ebs-csi-controller)
- EBS CSI driver operation: creates volumes and attaches them to pods - link
- The accessModes of the PersistentVolume and PersistentVolumeClaim must be set to ReadWriteOnce.
- Why?
- Because an EBS volume can only be attached to an EC2 instance (and thus the pods running on it) in the same AZ.
- Why? And how should pods be scheduled then? (see the sketch below)
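One common approach (a sketch, not the study's prescribed answer): rely on the WaitForFirstConsumer binding mode so the volume is created in the AZ where the pod lands, and/or pin the pod to the volume's AZ with a nodeSelector. The pod name and zone below are illustrative:
apiVersion: v1
kind: Pod
metadata:
  name: az-pinned-app                              # hypothetical example pod
spec:
  nodeSelector:
    topology.kubernetes.io/zone: ap-northeast-2b   # the AZ where the EBS volume lives
  containers:
  - name: app
    image: busybox
    command: ["sh", "-c", "sleep 3600"]
    volumeMounts:
    - name: data
      mountPath: /data
  volumes:
  - name: data
    persistentVolumeClaim:
      claimName: ebs-claim                         # the PVC created later in this section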
Install the Amazon EBS CSI driver as an Amazon EKS add-on
# List all aws-ebs-csi-driver versions and the default install version (True)
aws eks describe-addon-versions \
--addon-name aws-ebs-csi-driver \
--kubernetes-version 1.24 \
--query "addons[].addonVersions[].[addonVersion, compatibilities[].defaultVersion]" \
--output text
v1.18.0-eksbuild.1
True
v1.17.0-eksbuild.1
False
...
# IRSA setup: uses the AWS managed policy AmazonEBSCSIDriverPolicy
eksctl create iamserviceaccount \
--name ebs-csi-controller-sa \
--namespace kube-system \
--cluster ${CLUSTER_NAME} \
--attach-policy-arn arn:aws:iam::aws:policy/service-role/AmazonEBSCSIDriverPolicy \
--approve \
--role-only \
--role-name AmazonEKS_EBS_CSI_DriverRole
# Check IRSA
kubectl get sa -n kube-system ebs-csi-controller-sa -o yaml | head -5
eksctl get iamserviceaccount --cluster myeks
NAMESPACE NAME ROLE ARN
kube-system aws-load-balancer-controller arn:aws:iam::61184xxxxxx:role/eksctl-myeks-addon-iamserviceaccount-kube-sy-Role1-1K1FTMHEUMTMF
kube-system ebs-csi-controller-sa arn:aws:iam::61184xxxxxx:role/AmazonEKS_EBS_CSI_DriverRole
# Add the Amazon EBS CSI driver addon
eksctl create addon --name aws-ebs-csi-driver --cluster ${CLUSTER_NAME} --service-account-role-arn arn:aws:iam::${ACCOUNT_ID}:role/AmazonEKS_EBS_CSI_DriverRole --force
# Check
eksctl get addon --cluster ${CLUSTER_NAME}
NAME VERSION STATUS ISSUES IAMROLE UPDATE AVAILABLE CONFIGURATION VALUES
aws-ebs-csi-driver v1.18.0-eksbuild.1 CREATING 0 arn:aws:iam::61184xxxxxxx:role/AmazonEKS_EBS_CSI_DriverRole
coredns v1.9.3-eksbuild.3 ACTIVE 0
kube-proxy v1.24.10-eksbuild.2 ACTIVE 0
vpc-cni v1.12.6-eksbuild.1 ACTIVE 0 arn:aws:iam::61184xxxxxxx:role/eksctl-myeks-addon-vpc-cni-Role1-R4FIFPTW7DJL
kubectl get deploy,ds -l=app.kubernetes.io/name=aws-ebs-csi-driver -n kube-system
kubectl get pod -n kube-system -l 'app in (ebs-csi-controller,ebs-csi-node)'
NAME READY STATUS RESTARTS AGE
ebs-csi-controller-67658f895c-lhd7n 6/6 Running 0 44s
ebs-csi-controller-67658f895c-lpxm6 6/6 Running 0 44s
ebs-csi-node-56hdd 3/3 Running 0 44s
ebs-csi-node-7vcv5 3/3 Running 0 44s
ebs-csi-node-g7llr 3/3 Running 0 44s
kubectl get pod -n kube-system -l app.kubernetes.io/component=csi-driver
# Verify the 6 containers in the ebs-csi-controller pod
kubectl get pod -n kube-system -l app=ebs-csi-controller -o jsonpath='{.items[0].spec.containers[*].name}' ; echo
ebs-plugin csi-provisioner csi-attacher csi-snapshotter csi-resizer liveness-probe
# Check csinodes
kubectl get csinodes
# Create a gp3 storage class
kubectl get sc
NAME PROVISIONER RECLAIMPOLICY VOLUMEBINDINGMODE ALLOWVOLUMEEXPANSION AGE
gp2 (default) kubernetes.io/aws-ebs Delete WaitForFirstConsumer false 38m
local-path rancher.io/local-path Delete WaitForFirstConsumer false 15m
cat <<EOT > gp3-sc.yaml
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
name: gp3
allowVolumeExpansion: true
provisioner: ebs.csi.aws.com
volumeBindingMode: WaitForFirstConsumer
parameters:
type: gp3
allowAutoIOPSPerGBIncrease: 'true'
encrypted: 'true'
EOT
kubectl apply -f gp3-sc.yaml
kubectl get sc
NAME PROVISIONER RECLAIMPOLICY VOLUMEBINDINGMODE ALLOWVOLUMEEXPANSION AGE
gp2 (default) kubernetes.io/aws-ebs Delete WaitForFirstConsumer false 39m
gp3 ebs.csi.aws.com Delete WaitForFirstConsumer true 7s
local-path rancher.io/local-path Delete WaitForFirstConsumer false 15m
kubectl describe sc gp3 | grep Parameters
Parameters: allowAutoIOPSPerGBIncrease=true,encrypted=true,type=gp3
- allowVolumeExpansion enables the volume to be expanded later.
- volumeBindingMode: WaitForFirstConsumer delays binding and provisioning of the PV until a pod that uses the PVC is created, so the volume is provisioned in the AZ where the pod is scheduled.
PVC/PV pod test
# Check the worker nodes' EBS volumes: filter by tag (key/value) - link
aws ec2 describe-volumes --filters Name=tag:Name,Values=$CLUSTER_NAME-ng1-Node --output table
aws ec2 describe-volumes --filters Name=tag:Name,Values=$CLUSTER_NAME-ng1-Node --query "Volumes[*].Attachments" | jq
aws ec2 describe-volumes --filters Name=tag:Name,Values=$CLUSTER_NAME-ng1-Node --query "Volumes[*].{ID:VolumeId,Tag:Tags}" | jq
aws ec2 describe-volumes --filters Name=tag:Name,Values=$CLUSTER_NAME-ng1-Node --query "Volumes[].[VolumeId, VolumeType, Attachments[].[InstanceId, State][]][]" | jq
aws ec2 describe-volumes --filters Name=tag:Name,Values=$CLUSTER_NAME-ng1-Node --query "Volumes[].{VolumeId: VolumeId, VolumeType: VolumeType, InstanceId: Attachments[0].InstanceId, State: Attachments[0].State}" | jq
[
{
"VolumeId": "vol-0bbdc20c3154d8b99",
"VolumeType": "gp3",
"InstanceId": "i-015565a6e6bc7d99a",
"State": "attached"
},
{
"VolumeId": "vol-0fe2e5af33eece181",
"VolumeType": "gp3",
"InstanceId": "i-05692f52427286cd1",
"State": "attached"
},
{
"VolumeId": "vol-0f55c2ee0693471fa",
"VolumeType": "gp3",
"InstanceId": "i-037b799a444c56ff8",
"State": "attached"
}
]
# Check the EBS volumes added for pods on the worker nodes
aws ec2 describe-volumes --filters Name=tag:ebs.csi.aws.com/cluster,Values=true --output table
aws ec2 describe-volumes --filters Name=tag:ebs.csi.aws.com/cluster,Values=true --query "Volumes[*].{ID:VolumeId,Tag:Tags}" | jq
aws ec2 describe-volumes --filters Name=tag:ebs.csi.aws.com/cluster,Values=true --query "Volumes[].{VolumeId: VolumeId, VolumeType: VolumeType, InstanceId: Attachments[0].InstanceId, State: Attachments[0].State}" | jq
# Monitor the EBS volumes added for pods on the worker nodes
while true; do aws ec2 describe-volumes --filters Name=tag:ebs.csi.aws.com/cluster,Values=true --query "Volumes[].{VolumeId: VolumeId, VolumeType: VolumeType, InstanceId: Attachments[0].InstanceId, State: Attachments[0].State}" --output text; date; sleep 1; done
# Create a PVC
cat <<EOT > awsebs-pvc.yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
name: ebs-claim
spec:
accessModes:
- ReadWriteOnce
resources:
requests:
storage: 4Gi
storageClassName: gp3
EOT
kubectl apply -f awsebs-pvc.yaml
persistentvolumeclaim/ebs-claim created
kubectl get pvc,pv
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE
persistentvolumeclaim/ebs-claim Pending gp3 30s
# Create the pod
cat <<EOT > awsebs-pod.yaml
apiVersion: v1
kind: Pod
metadata:
name: app
spec:
terminationGracePeriodSeconds: 3
containers:
- name: app
image: centos
command: ["/bin/sh"]
args: ["-c", "while true; do echo $(date -u) >> /data/out.txt; sleep 5; done"]
volumeMounts:
- name: persistent-storage
mountPath: /data
volumes:
- name: persistent-storage
persistentVolumeClaim:
claimName: ebs-claim
EOT
kubectl apply -f awsebs-pod.yaml
# Check the PVC and the pod
kubectl get pvc,pv,pod
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE
persistentvolumeclaim/ebs-claim Bound pvc-f8144d29-4d84-4652-be59-a5b1bb600c89 4Gi RWO gp3 2m18s
NAME CAPACITY ACCESS MODES RECLAIM POLICY STATUS CLAIM STORAGECLASS REASON AGE
persistentvolume/pvc-f8144d29-4d84-4652-be59-a5b1bb600c89 4Gi RWO Delete Bound default/ebs-claim gp3 80s
NAME READY STATUS RESTARTS AGE
pod/app 1/1 Running 0 84s
# Check the details of the added EBS volume
aws ec2 describe-volumes --volume-ids $(kubectl get pv -o jsonpath="{.items[0].spec.csi.volumeHandle}") | jq
# Check the PV details: what does the nodeAffinity section mean?
kubectl get pv -o yaml | yh
...
nodeAffinity:
required:
nodeSelectorTerms:
- matchExpressions:
- key: topology.ebs.csi.aws.com/zone
operator: In
values:
- ap-northeast-2b
...
kubectl get node --label-columns=topology.ebs.csi.aws.com/zone,topology.kubernetes.io/zone
NAME STATUS ROLES AGE VERSION ZONE ZONE
ip-192-168-1-145.ap-northeast-2.compute.internal Ready <none> 38m v1.24.11-eks-a59e1f0 ap-northeast-2a ap-northeast-2a
ip-192-168-2-193.ap-northeast-2.compute.internal Ready <none> 38m v1.24.11-eks-a59e1f0 ap-northeast-2b ap-northeast-2b
ip-192-168-3-8.ap-northeast-2.compute.internal Ready <none> 38m v1.24.11-eks-a59e1f0 ap-northeast-2c ap-northeast-2c
kubectl describe node | more
# Verify that new lines keep being appended to the file
kubectl exec app -- tail -f /data/out.txt
Sun May 7 11:47:05 UTC 2023
Sun May 7 11:47:05 UTC 2023
Sun May 7 11:47:05 UTC 2023
Sun May 7 11:47:05 UTC 2023
Sun May 7 11:47:05 UTC 2023
Sun May 7 11:47:05 UTC 2023
Sun May 7 11:47:05 UTC 2023
Sun May 7 11:47:05 UTC 2023
Sun May 7 11:47:05 UTC 2023
Sun May 7 11:47:05 UTC 2023
# The command below may take a while to return
kubectl df-pv
PV NAME PVC NAME NAMESPACE NODE NAME POD NAME VOLUME MOUNT NAME SIZE USED AVAILABLE %USED IUSED IFREE %IUSED
pvc-f8144d29-4d84-4652-be59-a5b1bb600c89 ebs-claim default ip-192-168-3-8.ap-northeast-2.compute.internal app persistent-storage 3Gi 28Ki 3Gi 0.00 12 262132 0.00
## Check volume info from inside the pod
kubectl exec -it app -- sh -c 'df -hT --type=overlay'
kubectl exec -it app -- sh -c 'df -hT --type=ext4'
Filesystem Type Size Used Avail Use% Mounted on
/dev/nvme1n1 ext4 3.8G 28K 3.8G 1% /data
Check the PV details: what does the nodeAffinity section mean?
- nodeAffinity: restricts which nodes a pod can be scheduled on, based on node labels.
- required: the node must satisfy the defined conditions for the pod to be scheduled.
- nodeSelectorTerms: a list of selector terms; the terms are ORed together.
- matchExpressions: requirements on node labels; every expression inside a term must be satisfied for the pod to run on the node, i.e., they are ANDed.
- key: topology.ebs.csi.aws.com/zone: the label key the scheduler checks on the node. This particular key comes from the AWS EBS CSI driver, the plugin that lets Kubernetes manage AWS EBS volumes.
- operator: In: defines the relation between the key and the values. In means the node's label value (for the given key) must be in the listed set for the pod to run on that node.
- values: - ap-northeast-2b: the allowed label values. For the pod to run on a node, the node must carry the label topology.ebs.csi.aws.com/zone with the value "ap-northeast-2b".
Volume expansion - link
⇒ Volumes can be grown but never shrunk! - link
# Grow the current PV from 4Gi to 10Gi: change .spec.resources.requests.storage from 4Gi to 10Gi
kubectl get pvc ebs-claim -o jsonpath={.spec.resources.requests.storage} ; echo
kubectl get pvc ebs-claim -o jsonpath={.status.capacity.storage} ; echo
kubectl patch pvc ebs-claim -p '{"spec":{"resources":{"requests":{"storage":"10Gi"}}}}'
persistentvolumeclaim/ebs-claim patched
kubectl patch pvc ebs-claim -p '{"status":{"capacity":{"storage":"10Gi"}}}' # status 는 바로 위 커멘드 적용 후 EBS 10Gi 확장 후 알아서 10Gi 반영됨
# Check: the resize has to propagate, so the new size may take a moment to show up
kubectl exec -it app -- sh -c 'df -hT --type=ext4'
Filesystem Type Size Used Avail Use% Mounted on
/dev/nvme1n1 ext4 9.8G 28K 9.7G 1% /data
kubectl df-pv
PV NAME PVC NAME NAMESPACE NODE NAME POD NAME VOLUME MOUNT NAME SIZE USED AVAILABLE %USED IUSED IFREE %IUSED
pvc-f8144d29-4d84-4652-be59-a5b1bb600c89 ebs-claim default ip-192-168-3-8.ap-northeast-2.compute.internal app persistent-storage 9Gi 28Ki 9Gi 0.00 12 655348 0.00
aws ec2 describe-volumes --volume-ids $(kubectl get pv -o jsonpath="{.items[0].spec.csi.volumeHandle}") | jq
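(Optional) You can also watch the resize from the PVC's point of view; while the filesystem expansion is still pending, a condition such as FileSystemResizePending typically appears (standard Kubernetes behavior, not study-specific):
kubectl describe pvc ebs-claim | grep -A3 Conditions
kubectl get pvc ebs-claim -o jsonpath='{.status.conditions}' ; echo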
Cleanup
kubectl delete pod app && kubectl delete pvc ebs-claim
3. AWS Volume Snapshots Controller
Installing the Volume Snapshots controller
# (Note) snapshot support looks likely to be bundled into the EBS CSI Driver itself
kubectl describe pod -n kube-system -l app=ebs-csi-controller
# Install Snapshot CRDs
curl -s -O https://raw.githubusercontent.com/kubernetes-csi/external-snapshotter/master/client/config/crd/snapshot.storage.k8s.io_volumesnapshots.yaml
curl -s -O https://raw.githubusercontent.com/kubernetes-csi/external-snapshotter/master/client/config/crd/snapshot.storage.k8s.io_volumesnapshotclasses.yaml
curl -s -O https://raw.githubusercontent.com/kubernetes-csi/external-snapshotter/master/client/config/crd/snapshot.storage.k8s.io_volumesnapshotcontents.yaml
kubectl apply -f snapshot.storage.k8s.io_volumesnapshots.yaml,snapshot.storage.k8s.io_volumesnapshotclasses.yaml,snapshot.storage.k8s.io_volumesnapshotcontents.yaml
kubectl get crd | grep snapshot
volumesnapshotclasses.snapshot.storage.k8s.io 2023-05-07T11:57:33Z
volumesnapshotcontents.snapshot.storage.k8s.io 2023-05-07T11:57:33Z
volumesnapshots.snapshot.storage.k8s.io 2023-05-07T11:57:33Z
kubectl api-resources | grep snapshot
volumesnapshotclasses vsclass,vsclasses snapshot.storage.k8s.io/v1 false VolumeSnapshotClass
volumesnapshotcontents vsc,vscs snapshot.storage.k8s.io/v1 false VolumeSnapshotContent
volumesnapshots vs snapshot.storage.k8s.io/v1 true VolumeSnapshot
# Install Common Snapshot Controller
curl -s -O https://raw.githubusercontent.com/kubernetes-csi/external-snapshotter/master/deploy/kubernetes/snapshot-controller/rbac-snapshot-controller.yaml
curl -s -O https://raw.githubusercontent.com/kubernetes-csi/external-snapshotter/master/deploy/kubernetes/snapshot-controller/setup-snapshot-controller.yaml
kubectl apply -f rbac-snapshot-controller.yaml,setup-snapshot-controller.yaml
kubectl get deploy -n kube-system snapshot-controller
kubectl get pod -n kube-system -l app=snapshot-controller
NAME READY STATUS RESTARTS AGE
snapshot-controller-76494bf6c9-25lwj 1/1 Running 0 10s
snapshot-controller-76494bf6c9-nw5dq 1/1 Running 0 10s
# Install Snapshotclass
curl -s -O https://raw.githubusercontent.com/kubernetes-sigs/aws-ebs-csi-driver/master/examples/kubernetes/snapshot/manifests/classes/snapshotclass.yaml
kubectl apply -f snapshotclass.yaml
kubectl get vsclass # or volumesnapshotclasses
NAME DRIVER DELETIONPOLICY AGE
csi-aws-vsc ebs.csi.aws.com Delete 12s
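Before using it, it helps to know what a VolumeSnapshot object looks like; a minimal sketch based on the snapshot.storage.k8s.io/v1 API (the actual ebs-volume-snapshot.yaml downloaded below may differ slightly):
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
  name: ebs-volume-snapshot
spec:
  volumeSnapshotClassName: csi-aws-vsc    # the class created just above
  source:
    persistentVolumeClaimName: ebs-claim  # the PVC to snapshot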
Trying out Volume Snapshots
Create a test PVC and pod.
# Create the PVC
kubectl apply -f awsebs-pvc.yaml
# Create the pod
kubectl apply -f awsebs-pod.yaml
# Create a VolumeSnapshot referencing the PersistentVolumeClaim name >> then check the EBS snapshot
curl -s -O https://raw.githubusercontent.com/gasida/PKOS/main/3/ebs-volume-snapshot.yaml
cat ebs-volume-snapshot.yaml | yh
kubectl apply -f ebs-volume-snapshot.yaml
volumesnapshot.snapshot.storage.k8s.io/ebs-volume-snapshot created
# Check the VolumeSnapshot
kubectl get volumesnapshot
NAME READYTOUSE SOURCEPVC SOURCESNAPSHOTCONTENT RESTORESIZE SNAPSHOTCLASS SNAPSHOTCONTENT CREATIONTIME AGE
ebs-volume-snapshot true ebs-claim 4Gi csi-aws-vsc snapcontent-6a6a074f-66a7-4a86-9c1a-04cc8aedf13d 21s 21s
kubectl get volumesnapshot ebs-volume-snapshot -o jsonpath={.status.boundVolumeSnapshotContentName} ; echo
kubectl describe volumesnapshot.snapshot.storage.k8s.io ebs-volume-snapshot
kubectl get volumesnapshotcontents
# Check the VolumeSnapshot ID
kubectl get volumesnapshotcontents -o jsonpath='{.items[*].status.snapshotHandle}' ; echo
snap-09b75e990eeab8919
# Check the AWS EBS snapshot
aws ec2 describe-snapshots --owner-ids self | jq
aws ec2 describe-snapshots --owner-ids self --query 'Snapshots[]' --output table
# Check the data
kubectl exec app -- cat /data/out.txt
Sun May 7 11:47:05 UTC 2023
Sun May 7 11:47:05 UTC 2023
Sun May 7 11:47:05 UTC 2023
Sun May 7 11:47:05 UTC 2023
Sun May 7 11:47:05 UTC 2023
Forcibly simulate a failure, then restore from the snapshot.
# Remove the app & pvc: forcibly simulate the failure
kubectl delete pod app && kubectl delete pvc ebs-claim
# Restore a PVC from the snapshot
kubectl get pvc,pv
cat <<EOT > ebs-snapshot-restored-claim.yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
name: ebs-snapshot-restored-claim
spec:
storageClassName: gp3
accessModes:
- ReadWriteOnce
resources:
requests:
storage: 4Gi
dataSource:
name: ebs-volume-snapshot
kind: VolumeSnapshot
apiGroup: snapshot.storage.k8s.io
EOT
cat ebs-snapshot-restored-claim.yaml | yh
kubectl apply -f ebs-snapshot-restored-claim.yaml
# Check
kubectl get pvc,pv
# Create the pod
curl -s -O https://raw.githubusercontent.com/gasida/PKOS/main/3/ebs-snapshot-restored-pod.yaml
cat ebs-snapshot-restored-pod.yaml | yh
kubectl apply -f ebs-snapshot-restored-pod.yaml
# Verify the file contents: everything written before the pod was deleted is still there, and new entries keep being appended after recreation
kubectl exec app -- cat /data/out.txt
...
Sun May 7 11:47:05 UTC 2023
Sun May 7 11:47:05 UTC 2023
Sun May 7 12:11:42 UTC 2023
...
# Cleanup
kubectl delete pod app && kubectl delete pvc ebs-snapshot-restored-claim && kubectl delete volumesnapshots ebs-volume-snapshot
pod "app" deleted
persistentvolumeclaim "ebs-snapshot-restored-claim" deleted
volumesnapshot.snapshot.storage.k8s.io "ebs-volume-snapshot" deleted
4. AWS EFS Controller
Checking the EFS filesystem and installing the EFS Controller
# Check EFS info
aws efs describe-file-systems --query "FileSystems[*].FileSystemId" --output text
fs-0712490dce491d5a1
# Create an IAM policy
curl -s -O https://raw.githubusercontent.com/kubernetes-sigs/aws-efs-csi-driver/master/docs/iam-policy-example.json
aws iam create-policy --policy-name AmazonEKS_EFS_CSI_Driver_Policy --policy-document file://iam-policy-example.json
{
"Policy": {
"PolicyName": "AmazonEKS_EFS_CSI_Driver_Policy",
"PolicyId": "ANPAY45ESHEKCRCMUU2IY",
"Arn": "arn:aws:iam::6118xxxxxxx:policy/AmazonEKS_EFS_CSI_Driver_Policy",
"Path": "/",
"DefaultVersionId": "v1",
"AttachmentCount": 0,
"PermissionsBoundaryUsageCount": 0,
"IsAttachable": true,
"CreateDate": "2023-05-07T12:59:31+00:00",
"UpdateDate": "2023-05-07T12:59:31+00:00"
}
}
# IRSA setup: uses the customer-managed policy AmazonEKS_EFS_CSI_Driver_Policy
eksctl create iamserviceaccount \
--name efs-csi-controller-sa \
--namespace kube-system \
--cluster ${CLUSTER_NAME} \
--attach-policy-arn arn:aws:iam::6118xxxxxxxx:policy/AmazonEKS_EFS_CSI_Driver_Policy \
--approve
# Check IRSA
kubectl get sa -n kube-system efs-csi-controller-sa -o yaml | head -5
apiVersion: v1
kind: ServiceAccount
metadata:
annotations:
eks.amazonaws.com/role-arn: arn:aws:iam::6118xxxxxxxx:role/eksctl-myeks-addon-iamserviceaccount-kube-sy-Role1-14BWTQFZSF8YE
eksctl get iamserviceaccount --cluster myeks
NAMESPACE NAME ROLE ARN
kube-system aws-load-balancer-controller arn:aws:iam::6118xxxxxxxx:role/eksctl-myeks-addon-iamserviceaccount-kube-sy-Role1-1K1FTMHEUMTMF
kube-system ebs-csi-controller-sa arn:aws:iam::6118xxxxxxxx:role/AmazonEKS_EBS_CSI_DriverRole
kube-system efs-csi-controller-sa arn:aws:iam::6118xxxxxxxx:role/eksctl-myeks-addon-iamserviceaccount-kube-sy-Role1-14BWTQFZSF8YE
- The EFS driver uses a customer-managed policy.
- A customer-managed policy ARN includes the account ID.
# Install the EFS Controller
helm repo add aws-efs-csi-driver https://kubernetes-sigs.github.io/aws-efs-csi-driver/
helm repo update
helm upgrade -i aws-efs-csi-driver aws-efs-csi-driver/aws-efs-csi-driver \
--namespace kube-system \
--set image.repository=602401143452.dkr.ecr.${AWS_DEFAULT_REGION}.amazonaws.com/eks/aws-efs-csi-driver \
--set controller.serviceAccount.create=false \
--set controller.serviceAccount.name=efs-csi-controller-sa
# Check
helm list -n kube-system
NAME NAMESPACE REVISION UPDATED STATUS CHART APP VERSION
aws-efs-csi-driver kube-system 1 2023-05-07 22:03:42.996474877 +0900 KST deployed aws-efs-csi-driver-2.4.1 1.5.4
aws-load-balancer-controller kube-system 1 2023-05-07 20:19:17.64763575 +0900 KST deployed aws-load-balancer-controller-1.5.2 v2.5.1
kube-ops-view kube-system 1 2023-05-07 20:20:14.120093903 +0900 KST deployed kube-ops-view-1.2.2
kubectl get pod -n kube-system -l "app.kubernetes.io/name=aws-efs-csi-driver,app.kubernetes.io/instance=aws-efs-csi-driver"
NAME READY STATUS RESTARTS AGE
efs-csi-controller-64d67ffcdd-8xj9k 3/3 Running 0 14s
efs-csi-controller-64d67ffcdd-p9g27 3/3 Running 0 14s
efs-csi-node-hdgbl 3/3 Running 0 14s
efs-csi-node-nbrcz 3/3 Running 0 14s
efs-csi-node-p7px7 3/3 Running 0 14s
Configure multiple pods to share the EFS filesystem (add an empty StorageClass, from the static provisioning example)
# Monitor
watch 'kubectl get sc efs-sc; echo; kubectl get pv,pvc,pod'
# Clone the example code
git clone https://github.com/kubernetes-sigs/aws-efs-csi-driver.git /root/efs-csi
cd /root/efs-csi/examples/kubernetes/multiple_pods/specs && tree
.
├── claim.yaml
├── pod1.yaml
├── pod2.yaml
├── pv.yaml
└── storageclass.yaml
0 directories, 5 files
# Create and check the EFS storage class
cat storageclass.yaml | yh
kubectl apply -f storageclass.yaml
kubectl get sc efs-sc
NAME PROVISIONER RECLAIMPOLICY VOLUMEBINDINGMODE ALLOWVOLUMEEXPANSION AGE
efs-sc efs.csi.aws.com Delete Immediate false 5s
# Create and check the PV: replace volumeHandle with your own EFS filesystem ID
EfsFsId=$(aws efs describe-file-systems --query "FileSystems[*].FileSystemId" --output text)
sed -i "s/fs-4af69aab/$EfsFsId/g" pv.yaml
cat pv.yaml | yh
apiVersion: v1
kind: PersistentVolume
metadata:
name: efs-pv
spec:
capacity:
storage: 5Gi
volumeMode: Filesystem
accessModes:
- ReadWriteMany
persistentVolumeReclaimPolicy: Retain
storageClassName: efs-sc
csi:
driver: efs.csi.aws.com
volumeHandle: fs-0712490dce491d5a1
kubectl apply -f pv.yaml
kubectl get pv; kubectl describe pv
# Create and check the PVC
cat claim.yaml | yh
kubectl apply -f claim.yaml
kubectl get pvc
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE
efs-claim Bound efs-pv 5Gi RWX efs-sc 6s
# Create the pods: /data inside each pod uses EFS
cat pod1.yaml pod2.yaml | yh
kubectl apply -f pod1.yaml,pod2.yaml
pod/app1 created
pod/app2 created
kubectl df-pv
# Check pod info: what explains the difference between 5Gi on the PV and the 8.0E NFS4 volume size seen inside the pod? Can the pods store more than 6Gi?
kubectl get pods
NAME READY STATUS RESTARTS AGE
app1 1/1 Running 0 25s
app2 1/1 Running 0 25s
kubectl exec -ti app1 -- sh -c "df -hT -t nfs4"
kubectl exec -ti app2 -- sh -c "df -hT -t nfs4"
Filesystem Type Size Used Available Use% Mounted on
127.0.0.1:/ nfs4 8.0E 0 8.0E 0% /data
# Verify writes to the shared storage
tree /mnt/myefs # check from the bastion EC2
/mnt/myefs
├── out1.txt
└── out2.txt
0 directories, 2 files
tail -f /mnt/myefs/out1.txt # check from the bastion EC2
kubectl exec -ti app1 -- tail -f /data/out1.txt
Sun May 7 13:07:44 UTC 2023
Sun May 7 13:07:49 UTC 2023
Sun May 7 13:07:54 UTC 2023
...
kubectl exec -ti app2 -- tail -f /data/out2.txt
Clean up after the exercise.
# Delete the Kubernetes resources
kubectl delete pod app1 app2
kubectl delete pvc efs-claim && kubectl delete pv efs-pv && kubectl delete sc efs-sc
Configure multiple pods to share the EFS filesystem
Dynamic provisioning using EFS ← not currently supported on Fargate nodes
# Monitor
watch 'kubectl get sc efs-sc; echo; kubectl get pv,pvc,pod'
# Create and check the EFS storage class
curl -s -O https://raw.githubusercontent.com/kubernetes-sigs/aws-efs-csi-driver/master/examples/kubernetes/dynamic_provisioning/specs/storageclass.yaml
cat storageclass.yaml | yh
sed -i "s/fs-92107410/$EfsFsId/g" storageclass.yaml
kubectl apply -f storageclass.yaml
kubectl get sc efs-sc
NAME PROVISIONER RECLAIMPOLICY VOLUMEBINDINGMODE ALLOWVOLUMEEXPANSION AGE
efs-sc efs.csi.aws.com Delete Immediate false 2s
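For reference, the dynamic-provisioning StorageClass roughly contains the following (a sketch based on the aws-efs-csi-driver dynamic_provisioning example; parameters other than fileSystemId may vary by driver version):
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: efs-sc
provisioner: efs.csi.aws.com
parameters:
  provisioningMode: efs-ap                # each PV is backed by its own EFS access point
  fileSystemId: fs-0712490dce491d5a1      # replaced by the sed above with your own ID
  directoryPerms: "700"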
# Create and check the PVC/pod
curl -s -O https://raw.githubusercontent.com/kubernetes-sigs/aws-efs-csi-driver/master/examples/kubernetes/dynamic_provisioning/specs/pod.yaml
cat pod.yaml | yh
kubectl apply -f pod.yaml
kubectl get pvc,pv,pod
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE
persistentvolumeclaim/efs-claim Bound pvc-13090653-9897-42e2-a540-4e11de4ce075 5Gi RWX efs-sc 16s
NAME CAPACITY ACCESS MODES RECLAIM POLICY STATUS CLAIM STORAGECLASS REASON AGE
persistentvolume/pvc-13090653-9897-42e2-a540-4e11de4ce075 5Gi RWX Delete Bound default/efs-claim efs-sc 15s
NAME READY STATUS RESTARTS AGE
pod/efs-app 1/1 Running 0 16s
# Check the PVC/PV provisioning logs
kubectl logs -n kube-system -l app=efs-csi-controller -c csi-provisioner -f
...
I0507 13:03:50.684727 1 controller.go:732] Using saving PVs to API server in background
I0507 13:03:50.685002 1 leaderelection.go:248] attempting to acquire leader lease kube-system/efs-csi-aws-com...
# Check pod info
kubectl exec -it efs-app -- sh -c "df -hT -t nfs4"
Filesystem Type Size Used Available Use% Mounted on
127.0.0.1:/ nfs4 8.0E 0 8.0E 0% /data
# Verify writes to the shared storage
tree /mnt/myefs # check from the bastion EC2
kubectl exec efs-app -- bash -c "cat data/out"
...
Sun May 7 13:11:35 UTC 2023
Sun May 7 13:11:40 UTC 2023
Sun May 7 13:11:45 UTC 2023
Check EFS → Access points in the console.
Clean up after the exercise.
# Delete the Kubernetes resources
kubectl delete -f pod.yaml
persistentvolumeclaim "efs-claim" deleted
pod "efs-app" deleted
kubectl delete -f storageclass.yaml
storageclass.storage.k8s.io "efs-sc" deleted
5. Fargate
Introduction to Fargate
EKS (control plane) + Fargate (data plane) makes the cluster fully serverless (AWS-managed).
No Cluster Autoscaler is needed, and VM-level isolation is available (VM isolation at the pod level).
You create a Fargate Profile (the subnets, namespaces, and pod label conditions) so that the specified pods run on Fargate.
The EKS scheduler decides which node a pod runs on based on these conditions,
or specific settings can make pods run on specific nodes; a profile sketch follows below.
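For reference, a Fargate profile can be declared in the eksctl ClusterConfig; a minimal sketch (the profile name, namespace, and label below are illustrative):
fargateProfiles:
  - name: fp-dev                 # hypothetical profile name
    selectors:
      - namespace: dev           # pods in this namespace...
        labels:
          run-on: fargate        # ...carrying this label run on Fargate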
6. EKS Persistent Volumes for Instance Store & Add NodeGroup
Create a new node group ng2 and configure the c5d.large EC2 instance store (ephemeral block storage).
Data loss:
the data is lost when the underlying disk drive fails, the instance is stopped, the instance hibernates, or the instance is terminated.
Instance store volumes are not shown in the EC2 storage (EBS) information.
#
# Storage size of every c5 type that has an instance store volume
aws ec2 describe-instance-types \
--filters "Name=instance-type,Values=c5*" "Name=instance-storage-supported,Values=true" \
--query "InstanceTypes[].[InstanceType, InstanceStorageInfo.TotalSizeInGB]" \
--output table
--------------------------
| DescribeInstanceTypes |
+---------------+--------+
| c5d.large | 50 |
| c5d.12xlarge | 1800 |
...
#
eksctl create nodegroup --help
eksctl create nodegroup -c $CLUSTER_NAME -r $AWS_DEFAULT_REGION --subnet-ids "$PubSubnet1","$PubSubnet2","$PubSubnet3" --ssh-access \
-n ng2 -t c5d.large -N 1 -m 1 -M 1 --node-volume-size=30 --node-labels disk=nvme --max-pods-per-node 100 --dry-run > myng2.yaml
# This installs nvme tooling, creates a filesystem on /dev/nvme1n1, mounts it on /data, and sets up auto-mount on reboot.
cat <<EOT >> nvme.yaml
preBootstrapCommands:
- |
# Install Tools
yum install nvme-cli links tree jq tcpdump sysstat -y
# Filesystem & Mount
mkfs -t xfs /dev/nvme1n1
mkdir /data
mount /dev/nvme1n1 /data
# Get disk UUID
uuid=\$(blkid -o value -s UUID /dev/nvme1n1)
# Mount the disk during a reboot
echo /dev/nvme1n1 /data xfs defaults,noatime 0 2 >> /etc/fstab
EOT
sed -i -n -e '/volumeType/r nvme.yaml' -e '1,$p' myng2.yaml
eksctl create nodegroup -f myng2.yaml
2023-05-07 22:14:13 [ℹ] nodegroup "ng2" will use "" [AmazonLinux2/1.24]
2023-05-07 22:14:13 [ℹ] using SSH public key "/root/.ssh/id_rsa.pub" as "eksctl-myeks-nodegroup-ng2-86:9e:a1:95:22:b9:d1:0c:74:c6:86:6e:b5:21:c2:3b"
2023-05-07 22:14:14 [ℹ] 1 existing nodegroup(s) (ng1) will be excluded
2023-05-07 22:14:14 [ℹ] 1 nodegroup (ng2) was included (based on the include/exclude rules)
2023-05-07 22:14:14 [ℹ] will create a CloudFormation stack for each of 1 managed nodegroups in cluster "myeks"
2023-05-07 22:14:15 [ℹ]
2 sequential tasks: { fix cluster compatibility, 1 task: { 1 task: { create managed nodegroup "ng2" } }
}
2023-05-07 22:14:15 [ℹ] checking cluster stack for missing resources
2023-05-07 22:14:15 [ℹ] cluster stack has all required resources
2023-05-07 22:14:15 [ℹ] building managed nodegroup stack "eksctl-myeks-nodegroup-ng2"
2023-05-07 22:14:15 [ℹ] deploying stack "eksctl-myeks-nodegroup-ng2"
2023-05-07 22:14:15 [ℹ] waiting for CloudFormation stack "eksctl-myeks-nodegroup-ng2"
2023-05-07 22:14:45 [ℹ] waiting for CloudFormation stack "eksctl-myeks-nodegroup-ng2"
2023-05-07 22:15:44 [ℹ] waiting for CloudFormation stack "eksctl-myeks-nodegroup-ng2"
2023-05-07 22:16:14 [ℹ] waiting for CloudFormation stack "eksctl-myeks-nodegroup-ng2"
2023-05-07 22:17:29 [ℹ] waiting for CloudFormation stack "eksctl-myeks-nodegroup-ng2"
2023-05-07 22:17:29 [ℹ] no tasks
2023-05-07 22:17:29 [✔] created 0 nodegroup(s) in cluster "myeks"
2023-05-07 22:17:29 [ℹ] nodegroup "ng2" has 1 node(s)
2023-05-07 22:17:29 [ℹ] node "ip-192-168-1-172.ap-northeast-2.compute.internal" is ready
2023-05-07 22:17:29 [ℹ] waiting for at least 1 node(s) to become ready in "ng2"
2023-05-07 22:17:29 [ℹ] nodegroup "ng2" has 1 node(s)
2023-05-07 22:17:29 [ℹ] node "ip-192-168-1-172.ap-northeast-2.compute.internal" is ready
2023-05-07 22:17:29 [✔] created 1 managed nodegroup(s) in cluster "myeks"
2023-05-07 22:17:30 [ℹ] checking security group configuration for all nodegroups
2023-05-07 22:17:30 [ℹ] all nodegroups have up-to-date cloudformation templates
# Check the node security group ID
NG2SGID=$(aws ec2 describe-security-groups --filters Name=group-name,Values=*ng2* --query "SecurityGroups[*].[GroupId]" --output text)
aws ec2 authorize-security-group-ingress --group-id $NG2SGID --protocol '-1' --cidr 192.168.1.100/32
{
"Return": true,
"SecurityGroupRules": [
{
"SecurityGroupRuleId": "sgr-0d51193ff1a1318ae",
"GroupId": "sg-04f5a2e98bb3d048c",
"GroupOwnerId": "61184xxxxxxx",
"IsEgress": false,
"IpProtocol": "-1",
"FromPort": -1,
"ToPort": -1,
"CidrIpv4": "192.168.1.100/32"
}
]
}
# Check worker node IPs
kubectl get node
NAME STATUS ROLES AGE VERSION
ip-192-168-1-145.ap-northeast-2.compute.internal Ready <none> 128m v1.24.11-eks-a59e1f0
ip-192-168-1-172.ap-northeast-2.compute.internal Ready <none> 2m40s v1.24.11-eks-a59e1f0
ip-192-168-2-193.ap-northeast-2.compute.internal Ready <none> 128m v1.24.11-eks-a59e1f0
ip-192-168-3-8.ap-northeast-2.compute.internal Ready <none> 128m v1.24.11-eks-a59e1f0
# SSH into the new worker node
N4=192.168.1.172
ssh ec2-user@$N4 hostname
# Check
ssh ec2-user@$N4 sudo nvme list
Node SN Model Namespace Usage Format FW Rev
---------------- -------------------- ---------------------------------------- --------- -------------------------- ---------------- --------
/dev/nvme0n1 vol0f5281507cfa5e25c Amazon Elastic Block Store 1 32.21 GB / 32.21 GB 512 B + 0 B 1.0
/dev/nvme1n1 AWS2DDB3D868F0A89DD2 Amazon EC2 NVMe Instance Storage 1 50.00 GB / 50.00 GB 512 B + 0 B 0
ssh ec2-user@$N4 sudo lsblk -e 7 -d
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
nvme1n1 259:0 0 46.6G 0 disk /data
nvme0n1 259:1 0 30G 0 disk
ssh ec2-user@$N4 sudo df -hT -t xfs
Filesystem Type Size Used Avail Use% Mounted on
/dev/nvme0n1p1 xfs 30G 3.2G 27G 11% /
/dev/nvme1n1 xfs 47G 365M 47G 1% /data
ssh ec2-user@$N4 sudo tree /data
ssh ec2-user@$N4 sudo cat /etc/fstab
#
UUID=0ccd3c9e-3e0f-4e59-ae3c-9498ef40c541 / xfs defaults,noatime 1 1
/dev/nvme1n1 /data xfs defaults,noatime 0 2
# (Optional)
kubectl describe node -l disk=nvme | grep Allocatable: -A7
Allocatable:
attachable-volumes-aws-ebs: 25
cpu: 1930m
ephemeral-storage: 27905944324
hugepages-1Gi: 0
hugepages-2Mi: 0
memory: 3097552Ki
pods: 100
# (Optional) check the kubelet daemon parameters: --max-pods=29 --max-pods=100
ssh ec2-user@$N4 sudo ps -ef | grep kubelet
root 2973 1 1 13:16 ? 00:00:02 /usr/bin/kubelet --config /etc/kubernetes/kubelet/kubelet-config.json --kubeconfig /var/lib/kubelet/kubeconfig --container-runtime-endpoint unix:///run/containerd/containerd.sock --image-credential-provider-config /etc/eks/image-credential-provider/config.json --image-credential-provider-bin-dir /etc/eks/image-credential-provider --node-ip=192.168.1.172 --pod-infra-container-image=602401143452.dkr.ecr.ap-northeast-2.amazonaws.com/eks/pause:3.5 --v=2 --cloud-provider=aws --container-runtime=remote --node-labels=eks.amazonaws.com/sourceLaunchTemplateVersion=1,alpha.eksctl.io/cluster-name=myeks,alpha.eksctl.io/nodegroup-name=ng2,disk=nvme,eks.amazonaws.com/nodegroup-image=ami-0da378ed846e950a4,eks.amazonaws.com/capacityType=ON_DEMAND,eks.amazonaws.com/nodegroup=ng2,eks.amazonaws.com/sourceLaunchTemplateId=lt-0e533f7dba051cd0f --max-pods=29 --max-pods=100
Recreate the storage class.
# Delete the existing one
#curl -s -O https://raw.githubusercontent.com/rancher/local-path-provisioner/master/deploy/local-path-storage.yaml
kubectl delete -f local-path-storage.yaml
#
sed -i 's/opt/data/g' local-path-storage.yaml
kubectl apply -f local-path-storage.yaml
# Monitor
watch 'kubectl get pod -owide;echo;kubectl get pv,pvc'
ssh ec2-user@$N4 iostat -xmdz 1 -p nvme1n1
Linux 5.10.178-162.673.amzn2.x86_64 (ip-192-168-1-172.ap-northeast-2.compute.internal) 05/07/2023 _x86_64_ (2 CPU)
Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
nvme1n1 0.00 0.00 0.65 0.66 0.01 0.06 115.97 0.00 0.81 0.15 1.47 0.51 0.07
# Measure: read
#curl -s -O https://raw.githubusercontent.com/wikibook/kubepractice/main/ch10/fio-read.fio
kubestr fio -f fio-read.fio -s local-path --size 10G --nodeselector disk=nvme
...
Delete the resources.
# Delete local-path-storage
kubectl delete -f local-path-storage.yaml
namespace "local-path-storage" deleted
serviceaccount "local-path-provisioner-service-account" deleted
clusterrole.rbac.authorization.k8s.io "local-path-provisioner-role" deleted
clusterrolebinding.rbac.authorization.k8s.io "local-path-provisioner-bind" deleted
deployment.apps "local-path-provisioner" deleted
storageclass.storage.k8s.io "local-path" deleted
configmap "local-path-config" deleted
# Delete the node group
eksctl delete nodegroup -c $CLUSTER_NAME -n ng2
2023-05-07 22:24:06 [ℹ] 1 nodegroup (ng2) was included (based on the include/exclude rules)
2023-05-07 22:24:06 [ℹ] will drain 1 nodegroup(s) in cluster "myeks"
2023-05-07 22:24:06 [ℹ] starting parallel draining, max in-flight of 1
2023-05-07 22:24:06 [ℹ] cordon node "ip-192-168-1-172.ap-northeast-2.compute.internal"
2023-05-07 22:24:06 [✔] drained all nodes: [ip-192-168-1-172.ap-northeast-2.compute.internal]
2023-05-07 22:24:06 [ℹ] will delete 1 nodegroups from cluster "myeks"
2023-05-07 22:24:06 [ℹ] 1 task: { 1 task: { delete nodegroup "ng2" [async] } }
2023-05-07 22:24:06 [ℹ] will delete stack "eksctl-myeks-nodegroup-ng2"
2023-05-07 22:24:06 [ℹ] will delete 0 nodegroups from auth ConfigMap in cluster "myeks"
2023-05-07 22:24:06 [✔] deleted 1 nodegroup(s) from cluster "myeks"
(After finishing the exercises) Delete the resources
Common: delete the IRSA stacks in AWS CloudFormation
helm uninstall -n kube-system kube-ops-view
aws cloudformation delete-stack --stack-name eksctl-$CLUSTER_NAME-addon-iamserviceaccount-kube-system-efs-csi-controller-sa
aws cloudformation delete-stack --stack-name eksctl-$CLUSTER_NAME-addon-iamserviceaccount-kube-system-ebs-csi-controller-sa
aws cloudformation delete-stack --stack-name eksctl-$CLUSTER_NAME-addon-iamserviceaccount-kube-system-aws-load-balancer-controller
Delete the CloudFormation stack.
eksctl delete cluster --name $CLUSTER_NAME && aws cloudformation delete-stack --stack-name $CLUSTER_NAME
Study Retrospective
It's already week 3 of the study. Since the last study I changed jobs and now use AWS as a DevOps Engineer, so the material clicks much better than last time. I'll keep at it until the end.
CloudNet@ team and Gasida, thank you as always for all the work you put into preparing this!
Reference
(Note) Kubestr & sar monitoring and performance measurement (NVMe SSD) - link, GitHub, Korean, CloudStorage
악분's blog - EKS study week 3, part 1: how EKS handles AWS storage
Notes from the CloudNet@ team's AWS EKS Workshop Study