Placing Specific Pods on a K8s Worker Node (Taints, Tolerations)

Somaz 2024. 6. 17. 12:08

Overview

A particular Pod consumed too much memory and CPU, which caused a Worker Node failure.

This post therefore looks at how to place specific Pods on a designated Worker Node.

 


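Placing Specific Pods on a Worker Node

Taints and Tolerations offer a simple way to place specific Pods on a particular Worker Node.

A Taint is applied to a node and repels Pods that do not have a matching Toleration.

A Toleration is set on a Pod so that it can be scheduled onto a node that carries the Taint.

Add the `key=value:NoSchedule` taint to node1.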

kubectl taint nodes node1 key=value:NoSchedule
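A taint can be removed again by repeating the same specification with a trailing `-`. The following is a minimal sketch of both directions; the helper names `add_taint` and `remove_taint` are hypothetical wrappers introduced here for illustration and are not part of the original setup.

```shell
#!/bin/bash

# Hypothetical helpers (not from the original post) wrapping `kubectl taint`.

# Add a taint to a node.
# Usage: add_taint <node> <key=value:effect>
add_taint() {
  kubectl taint nodes "$1" "$2"
}

# Remove a taint: the same specification followed by a trailing "-".
# Usage: remove_taint <node> <key=value:effect>
remove_taint() {
  kubectl taint nodes "$1" "$2-"
}
```

For example, `remove_taint node1 key=value:NoSchedule` would run `kubectl taint nodes node1 key=value:NoSchedule-`.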

 

Write a simple script that checks the taints on every node.

## check-node-taint.sh

#!/bin/bash

# List all nodes
nodes=$(kubectl get nodes -o jsonpath='{.items[*].metadata.name}')

for node in $nodes; do
  # Get the taints on each node
  taints=$(kubectl get node "$node" -o jsonpath='{.spec.taints}')

  if [ -n "$taints" ]; then
    # If taints are present, print them
    echo "Node $node has taints: $taints"
  else
    # If no taints are present
    echo "Node $node has no taints"
  fi
done
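If a tabular overview is enough, the same information can probably be obtained with a single `kubectl get nodes` call using `-o custom-columns`. This is only a sketch of an alternative to the loop above; the function name `list_node_taints` and the column titles are assumptions chosen for illustration.

```shell
#!/bin/bash

# Hypothetical helper: print every node with the keys and effects of its taints
# in one call. Nodes without taints show <none> in both taint columns.
list_node_taints() {
  kubectl get nodes \
    -o custom-columns='NODE:.metadata.name,TAINT_KEYS:.spec.taints[*].key,EFFECTS:.spec.taints[*].effect'
}
```

Calling `list_node_taints` prints one row per node, which is easier to scan than raw JSON when a cluster has many nodes.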

 

Check the taint results.

./check-node-taint.sh
Node master0 has taints: [{"effect":"NoSchedule","key":"node-role.kubernetes.io/master"}]
Node node1 has taints: [{"effect":"NoSchedule","key":"key","value":"value"}]
Node node2 has no taints
Node node3 has no taints
Node node4 has no taints
Node node5 has no taints

 

 

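Then define the following in the Helm chart. The nodeSelector pins the Pod to node1, and the toleration allows it to be scheduled there despite the taint.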

nodeSelector:
  kubernetes.io/hostname: node1

tolerations:
  - key: "key"
    operator: "Exists"
    effect: "NoSchedule"
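Note that `operator: "Exists"` tolerates any taint with the key `key`, regardless of its value. If the toleration should match only the exact `key=value` taint, `operator: "Equal"` with an explicit `value` can be used instead. The sketch below prints such a values fragment; the function name `render_placement_values` is hypothetical, and it assumes the chart reads top-level `nodeSelector` and `tolerations` as in the snippet above.

```shell
#!/bin/bash

# Hypothetical helper: print a Helm values fragment that pins a workload to one
# node and tolerates exactly one key=value:effect taint.
# Usage: render_placement_values <hostname> <key> <value> <effect>
render_placement_values() {
  cat <<EOF
nodeSelector:
  kubernetes.io/hostname: $1

tolerations:
  - key: "$2"
    operator: "Equal"
    value: "$3"
    effect: "$4"
EOF
}
```

For instance, `render_placement_values node1 key value NoSchedule > placement-values.yaml` would write a fragment matching the taint added earlier (the file name is only an example).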

 

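Inspecting the Deployment confirms that the settings were applied correctly (`kubectl neat` requires the kubectl-neat plugin).

kubectl get deployments.apps -n somaz -o yaml | kubectl neat | grep tolerations -A6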
        tolerations:
        - effect: NoSchedule
          key: key
          operator: Exists
        volumes:
        - name: somaz-local
          persistentVolumeClaim:
 
kubectl get deployments.apps -n somaz -o yaml | kubectl neat | grep nodeSelector -A6
        nodeSelector:
          kubernetes.io/hostname: node1
        restartPolicy: Always
        schedulerName: default-scheduler
        serviceAccount: somaz-server
        serviceAccountName: somaz-server
        terminationGracePeriodSeconds: 30
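Where the kubectl-neat plugin is not installed, the same two fields can likely be read directly with a JSONPath query. This is a sketch under that assumption; `show_placement` is a hypothetical helper, and the Deployment name has to be supplied by the reader because the original output does not state it.

```shell
#!/bin/bash

# Hypothetical helper: print the nodeSelector and the tolerations of one
# Deployment's Pod template, each on its own line.
# Usage: show_placement <namespace> <deployment>
show_placement() {
  kubectl get deployment "$2" -n "$1" \
    -o jsonpath='{.spec.template.spec.nodeSelector}{"\n"}{.spec.template.spec.tolerations}{"\n"}'
}
```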

 

 

Write a simple script that lists the Pods deployed on a node.

## multi-list-pod-on-node.sh

#!/bin/bash

# Function to list all nodes and prompt the user to select one or more, excluding master by default
select_node() {
  echo "Available nodes (excluding master nodes):"
  # Exclude master nodes unless they are explicitly requested
  kubectl get nodes --selector='!node-role.kubernetes.io/master' -o name
  echo ""
  echo "Enter the name of the node(s) separated by commas (e.g., node/node1,node/node2) or type 'all' to select all non-master nodes."
  read -r -p "Enter your choice: " INPUT

  if [ -z "$INPUT" ]; then
    echo "No input provided. Exiting."
    exit 1
  elif [ "$INPUT" == "all" ]; then
    NODES=$(kubectl get nodes --selector='!node-role.kubernetes.io/master' -o jsonpath='{.items[*].metadata.name}')
  else
    NODES=$(echo "$INPUT" | tr ',' '\n')
  fi
  echo "Selected nodes: $NODES"
}

# Function to list pods on the selected node(s)
list_pods_on_node() {
  for NODE in $NODES; do
    echo "Pods running on $NODE:"
    # Ensuring we're querying correctly by logging the field selector
    NODE_NAME="${NODE#node/}"  # strip the optional "node/" prefix
    echo "Running command: kubectl get pods --all-namespaces -o wide --field-selector spec.nodeName=$NODE_NAME"
    kubectl get pods --all-namespaces -o wide --field-selector "spec.nodeName=$NODE_NAME"
    echo ""
  done
}

# Main script execution
select_node
list_pods_on_node
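For use in automation, where an interactive prompt is inconvenient, the listing logic could also take node names as arguments. The following is a minimal sketch of such a variant; `pods_on_nodes` is a hypothetical function name, and it accepts both `node1` and `node/node1` forms.

```shell
#!/bin/bash

# Hypothetical non-interactive variant: node names are passed as arguments.
# Accepts both "node1" and "node/node1"; the "node/" prefix is stripped.
pods_on_nodes() {
  local node
  for node in "$@"; do
    node="${node#node/}"
    echo "Pods running on $node:"
    kubectl get pods --all-namespaces -o wide --field-selector "spec.nodeName=$node"
    echo ""
  done
}
```

A call such as `pods_on_nodes node/node1 node2` lists the Pods on both nodes without prompting.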

 

 

Checking node1 now shows the following Pods. The Pods that had previously been running there were all restarted and moved to other nodes.

./multi-list-pod-on-node.sh
Available nodes (excluding master nodes):
node/node1
node/node2
node/node3
node/node4
node/node5

Enter the name of the node(s) separated by commas (e.g., node/node1,node/node2) or type 'all' to select all non-master nodes.
Enter your choice: node/node1
Selected nodes: node/node1
Pods running on node/node1:
Running command: kubectl get pods --all-namespaces -o wide --field-selector spec.nodeName=node1
NAMESPACE            NAME                                  READY   STATUS    RESTARTS        AGE     IP              NODE    NOMINATED NODE   READINESS GATES
somaz                somaz-server                          2/2     Running   0               22h     10.233.90.60    node1   <none>           <none>
kube-system          calico-node-qc2d6                     1/1     Running   16 (59d ago)    2y22d   10.10.100.21    node1   <none>           <none>
kube-system          kube-proxy-b8vc7                      1/1     Running   7 (147d ago)    2y22d   10.10.100.21    node1   <none>           <none>
kube-system          metrics-server-dcb8c9c5b-mxrwv        1/1     Running   11 (2d6h ago)   21d     10.233.90.201   node1   <none>           <none>
kube-system          nginx-proxy-node1                     1/1     Running   22 (3d6h ago)   2y22d   10.10.100.21    node1   <none>           <none>
kube-system          nodelocaldns-27682                    1/1     Running   0               127d    10.10.100.21    node1   <none>           <none>
metallb-system       speaker-zt72w                         1/1     Running   6 (147d ago)    2y22d   10.10.100.21    node1   <none>           <none>
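The `NoSchedule` effect only blocks new scheduling and does not evict Pods that are already running on the node, which is why the existing Pods had to be restarted. The original post does not say how the restart was performed; one common option for Deployment-managed Pods is `kubectl rollout restart`, sketched below with a hypothetical helper named `restart_deployment`.

```shell
#!/bin/bash

# Hypothetical helper: trigger a rolling restart of one Deployment and wait
# until the rollout has finished. Pods without a matching toleration are then
# rescheduled onto untainted nodes.
# Usage: restart_deployment <namespace> <deployment>
restart_deployment() {
  kubectl rollout restart "deployment/$2" -n "$1"
  kubectl rollout status "deployment/$2" -n "$1"
}
```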

 

 


