K8s Kubernetes incubating
Architecture
graph LR; c1(Cluster) --> n1(node1) --> p11[pod 1]; n1 --> p12[pod 2]; c1 --> n2(node2) --> p21[pod 1];
API docs - https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.24/
Insert diagram about control nodes + worker nodes + api servers etc.
#refine https://kubernetes.io/docs/concepts/architecture/
Components
Metric Server
Use HostNetwork
Helps monitor resource usage of pods and nodes
kubectl top pod - check resource usage for pods
kubectl top nodes - check resource usage for nodes
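To find the heaviest consumers, the output can be sorted (a sketch; assumes metrics-server is installed and a kubectl version that supports --sort-by for top)
kubectl top pod --all-namespaces --sort-by=memory
kubectl top node --sort-by=cpu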
CNCF
Project status
graph LR; a("Sandbox<br>New") --> b("Incubating<br>More wide-spread adoption, active development") --> c["Graduated<br>Mature, stable part of k8s core"]
Kubernetes Plugins?
CRI-O
A lightweight container runtime for Kubernetes #readmore
CNI
Container Network Interface
Jaeger
Distributed tracing; its Kubernetes Operator manages packaging, deploying and managing the application
Rook
Storage Orchestrator
Cluster Autoscaler
https://github.com/kubernetes/autoscaler
Kubernetes Distributions
Rancher
Red Hat OpenShift
SUSE Containers as a Service
Kubernetes Managed Services
AWS Elastic Kubernetes Service
Azure Kubernetes Service
Google Kubernetes Engine
Certifications
CKAD
Developers
CKA
Admins
Kubernetes Dashboard
Kubernetes Database
etcd
essentially a key-value store
all k8s resources are stored in etcd in JSON format
JSON is not very human friendly, so YAML is the de-facto choice for k8s config files, which are called manifests.
A manifest file broadly contains -
apiVersion: # v1, v1beta1, v1beta2 etc.
kind: # pod, deployment, secret, configmap etc.
metadata:
annotations: # used for configurations sometimes
labels:
selector:
name: # name of the object
resourceVersion: # value changes with each update
data: # found in secret, and configmap objects
spec: # configs, varies by object, absent for some like secret, configmaps
kubectl
Config file ~/.kube/config
Structure of the config file, and the values that need to be specified -
apiVersion: v1
clusters:
- cluster:
certificate-authority-data: xxxx
server: https://172.22.28.5:6443
name: kubernetes
contexts: # combination of cluster, username and namespace
- context:
cluster: kubernetes
user: kubernetes-admin
name: kubernetes-admin@kubernetes
current-context: kubernetes-admin@kubernetes
kind: Config
preferences: {}
users:
- name: kubernetes-admin
user:
client-certificate-data: xxxx
client-key-data: xxxx
If this file is not present or has invalid details of a cluster, you might see an error like
$ kubectl get all
E1223 11:43:00.538822 14558 memcache.go:265] couldn't get current server API group list: Get "http://localhost:8080/api?timeout=32s": dial tcp 127.0.0.1:8080: connect: connection refused
This file is usually generated with the help of the /etc/kubernetes/admin.conf
file from the control node. That file's user is kubernetes-admin
Commands
kubectl create deployment
replicas can be specified with the --replicas flag (an earlier note here claiming otherwise was incorrect)
kubectl create deployment my-dep --image=nginx --replicas=3
kubectl explain pod
shows all the fields that are necessary to configure a pod.
To deep dive into a particular property, use
kubectl explain pod.spec
To find out specific fields to specify for configuring [[K8S Scheduling#Node Affinity|nodeAffinity]], use
kubectl explain pod.spec.affinity.nodeAffinity
Generate yaml
from existing resources, use
kubectl get <resource> -o yaml
Remember to clean up the output (metadata and status fields), as these are added automatically when the resource is created
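A clean manifest can also be generated without creating anything, using a client-side dry run (a sketch; the deployment name and image are illustrative)
kubectl create deployment my-dep --image=nginx --dry-run=client -o yaml > my-dep.yaml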
kubectl delete pod/podname --grace-period=0 --force
to delete pod immediately
Troubleshooting and Debugging
When creating a pod, kubernetes first adds it to the etcd store.
kubectl describe
can highlight problems if there is an issue during this initial step. Once the pod is added to etcd, it’s then started up.
kubectl describe pod <podname>
Containers[].State shows the current state of containers in the pod
Containers:
busybox:
Container ID: containerd://
Image: busybox
Image ID: docker.io/library/busybox@sha256:ver
Port: <none>
Host Port: <none>
State: Waiting
Reason: CrashLoopBackOff
Last State: Terminated
Reason: Completed # this can hint at a problem where the container has exited after completing its task
Exit Code: 0
Started: Sun, 10 Dec 2023 00:06:04 +0000
Finished: Sun, 10 Dec 2023 00:06:04 +0000
Ready: False
Restart Count: 7
Environment: <none>
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-sfpsw (ro)
The Events section shows any errors, such as CrashLoopBackOff. Latest events are at the bottom.
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
...
Warning BackOff 3m24s (x48 over 13m) kubelet Back-off restarting failed container busybox in pod mydep-8677c6d8bd-8c2c6_default(72a66cb8-e29a-4be3-ac99-8f88565327f3)
Once the pod is running, in addition to kubectl describe
more information can be found out using the following commands.
kubectl get pods
- high level view of pods in a namespace
kubectl get pod/pod-id -o yaml
another way of getting similar info as kubectl describe; here the interesting fields to watch are status.conditions
and status.containerStatuses:
Restart a pod?
kubectl scale deployment/name --replicas=0, then scale back up
kubectl rollout restart deployment name
kubectl delete pod name
kubectl replace --force -f pod.yaml (delete and re-create from the manifest)
To find problems when a container/pod is running, use the following commands -
kubectl logs podname --all-containers
get logs from all containers in pod podname
kubectl logs podname -c container
get logs from container
in pod podname
kubectl exec -it podname -- /bin/sh
get a session into the container, if the container has a shell
Even if a container has a shell, you will find many of the regular utilities missing since images are usually optimized for runtime.
The [[Linux#Proc|proc]] file system can still help in such a case to find running processes etc.
Helpful debugging kubectl
commands for most objects -
describe - Show details of a specific resource or group of resources
logs - Print the logs for a container in a pod
events - List events
attach - Attach to a running container
exec - Execute a command in a container
cp - Copy files and directories to and from containers
port-forward - Forward one or more local ports to a pod
proxy - Run a proxy to the Kubernetes API server
auth - Inspect authorization
debug - Create debugging sessions for troubleshooting workloads and nodes
Problems with Nodes
kubectl get nodes
shows which nodes are available and in a ready state
kubectl cordon
- Use to mark node(s) unschedulable; a selector can be used. Use uncordon once the maintenance is done.
kubectl drain
- Prepare a node for maintenance by removing running pods gracefully and marking it unschedulable for new pods (example commands below).
The behaviour differs based on how the pod is started on the node -
- If controlled by a daemon-set, the pods are ignored! The daemonset controller ignores the unschedulable node state.
- If controlled by a deployment, replica-set, stateful-set, job, or replication controller, drain will either evict the pods (if supported by the API server), or delete them.
- Standalone pods won't be deleted or evicted unless the --force flag is specified.
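A typical maintenance flow (a sketch; the node name is illustrative)
kubectl cordon node01
kubectl drain node01 --ignore-daemonsets --delete-emptydir-data
# ... perform maintenance ...
kubectl uncordon node01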
If a node is NotReady,
- Check if kubelet is running on the node.
- Check the networking plugin is set up properly and running.
Q: How to port forward to local, when running kubectl
in docker?
A: start the kubectl
container on docker, and expose a port
docker run -it --name kubectl -p 8000:8000 kubectl:latest
now run port forward as normal, but listen on 0.0.0.0
in addition to localhost
.
kubectl -n workload port-forward svc/workload --address localhost,0.0.0.0 8000:8000
Kubernetes Objects
Diagram
![[k8s-objects.png]]
Deployments
Adds scalability, high availability, self healing capabilities to a pod by defining replication strategy and update strategy
kubectl create deployment my-dep --image=nginx --replicas=3
Example declaration
apiVersion: apps/v1
kind: Deployment
metadata:
name: nginx-deployment
spec:
selector:
matchLabels:
app: nginx
replicas: 2 # tells deployment to run 2 pods matching the template
template:
metadata:
labels:
app: nginx
spec:
containers:
- name: nginx
image: nginx:1.14.2
ports:
- containerPort: 80
kubectl rollout history deployments
provides recent rollout events including reason for change (scale out/in not included)
kubectl rollout history deployment/my-app
Rollback a failed deployment to previous version
kubectl rollout undo deployment/my-app --to-revision=1
Update Strategy
Specify RollingUpdate
or Recreate
(Recreate can cause temporary disruption)
| Deployment | rollingUpdate | recreate |
| --- | --- | --- |
| Note | deploys new replicaset, then removes old replicaset | removes all old pods, then creates new ones |
| Disruption | no | yes |
| Useful for | add examples | add examples |
apiVersion: apps/v1
kind: Deployment
spec:
  strategy:
    type: RollingUpdate # or Recreate
ReplicaSet
ReplicaSets use labels to monitor pods. If you remove a matching label from a pod, another pod comes up within seconds - compare the 1st pod (label removed, no longer matched) and the 3rd pod (the 4-second-old replacement) in the output below.
root@controlplane:~$ kubectl get pods --show-labels
NAME READY STATUS RESTARTS AGE LABELS
my-dep-7674c564c-9t2wk 1/1 Running 0 6m39s pod-template-hash=7674c564c,test=worksok
my-dep-7674c564c-gxzb7 1/1 Running 0 6m39s app=my-dep,pod-template-hash=7674c564c
my-dep-7674c564c-svzgv 1/1 Running 0 4s app=my-dep,pod-template-hash=7674c564c
my-dep-7674c564c-v4mnc 1/1 Running 0 6m39s app=my-dep,pod-template-hash=7674c564c
DaemonSet
StatefulSet
Pods
Usually a group of containers, volume declarations
Smallest app building block in k8s, replicated across nodes to achieve the app’s desired availability, scalability, performance, capacity requirements.
Smallest unit of compute that can be deployed.
A Pod is similar to a set of containers with shared namespaces and shared filesystem volumes
Offers similar isolation as [[Containers]] using cgroups, namespaces etc.
Execute a command in a container contained in the pod -
kubectl exec -it <podname> -c <container-name> -- /bin/sh
There is no network isolation within a pod; all containers running within a pod share the same IP.
Run a single standalone pod, change its default image “command”, and check the output using kubectl logs
kubectl run busybox --image busybox --command -- nslookup kubernetes
kubectl run busybox --image busybox --command -- sleep 3600
kubectl exec busybox -it -- nslookup kubernetes
Double dash --
separates the kubectl
command from the command you want to run in the container. Use -n namespace
immediately after kubectl
to avoid passing this argument to the container command instead.
#ask can you do this in a single step? run a pod/container, and get the output on command line?
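One way to do it in a single step (a sketch): run a one-off pod that attaches to the terminal and is deleted on exit
kubectl run busybox --image busybox -it --rm --restart=Never -- nslookup kubernetes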
Labels
Add identifying information to an object. This information can then be used to query and select objects. Labels help add information to objects that is relevant to users, so are useful in UI or CLI.
Labels allow users to map their own org structure on system resources. Things like environment, team etc.
A label key and value must begin with a letter or number, and may contain letters, numbers, hyphens, dots, and underscores, up to 63 characters each.
Optionally, the key can begin with a DNS subdomain prefix and a single ‘/’, like example.com/my-app
.
It appears under the metadata field
apiVersion:
kind:
metadata:
labels:
app: myawesomeapp
List labels applied to a pod. By default, show-labels=false
kubectl get pod/nginx --show-labels
Apply label to a pod
kubectl label pod/podid newlabel=value
Remove an existing label from a pod
kubectl label pod/podid newlabel-
Note the trailing -
Update an existing label
kubectl label pod/podid oldlabel=newvalue --overwrite
without --overwrite
flag label is not updated
If --overwrite
is true, then existing labels can be overwritten, otherwise attempting to overwrite a label will result in an error.
Inspect labels applied to all objects
kubectl get all --all-namespaces --show-labels
Use Selector flag to list only resources with a specific label
kubectl get all --selector app=my-dep
Some labels are applied automatically, example on a [[#Namespace]], kubernetes.io/metadata.name=namespacename
If --resource-version
is specified, then updates will use this resource version, otherwise the existing resource-version will be used. This resource-version is available under metadata.resourceVersion.
Selector
Appears under spec
apiVersion:
kind:
metadata:
spec:
selector:
matchLabels:
app: myawesomeapp
Use Selector flag to list only resources with a specific label
kubectl get all --selector app=my-dep
Annotations
Add non-identifying information/metadata to objects. Annotations cannot be used to query and select objects. Information or metadata added as annotations is mostly for use by machines, e.g. IAM role annotations in the case of IRSA.
Deployment versions are added as annotations to the metadata
field in the manifest yml
apiVersion: apps/v1
kind: Deployment
metadata:
annotations:
deployment.kubernetes.io/revision: "1"
Labels vs Annotations
- Labels = identifying information, Annotations = non-identifying information
- Labels can be used to select objects or collection of objects, annotations cannot be used to identify or select objects
- Annotations can contain characters not allowed by labels
| Property | Labels | Annotations |
| --- | --- | --- |
| Identifying information | yes | no |
| Limited characters | yes | no |
| Use with selector | yes | no |
| User friendly | yes | no |
Namespace
Some objects are namespaced scoped while others are cluster wide
Objects can have same name across namespaces, but must be unique within a namespace
Provides isolation for resources
All namespaces
Use --all-namespaces
and -n
flags to work with all, or a specific namespace
kubectl [verb] [resource] --all-namespaces
kubectl [verb] [resource] -A
kubectl [verb] [resource] -n namespace
Existing namespaces
kubectl get ns
on a fresh cluster will show these 4 existing namespaces
default Active 48m
kube-node-lease Active 48m
kube-public Active 48m
kube-system Active 48m
Namespace Issue
When creating a [[#Service]], a corresponding DNS entry like service.namespace.svc.cluster.local
is created. Due to this, all namespace names must be valid DNS names.
To connect to a service in the same namespace, just specifying service
is enough. It will be resolved locally within the same namespace. This is useful to launch multiple environments with the same config without much modifications.
To connect to a service in a different namespace, fully qualified name service.othernamespace.svc.cluster.local
must be used.
[!danger] Be careful about namespaces matching public domain names.
Suppose a namespace is named com, and it contains a service called google.com.svc.cluster.local. If another service, foo, in the same namespace tries to reach the public google.com, it will get resolved to the local service instead. Restrict permissions to create namespaces, and use admission controllers to further enforce this.
[!Test] Launch a service called landing in ai namespace, are other services in that space able to reach the public landing.ai service?
I wasn’t able to reproduce this behaviour :(
Update: I don’t see this happening with the busybox
image, BUT this can be seen with the dnsutils
image. All properties are exactly the same between both pods, so it might be down to the OS used in each image 🤷
$ kubectl -n ai get svc
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
landing ClusterIP 10.97.252.177 <none> 80/TCP 21m
# this still resolves to the public landing.ai service
$ kubectl exec busybox -it -- nslookup landing.ai
Server: 10.96.0.10
Address: 10.96.0.10:53
Non-authoritative answer:
Name: landing.ai
Address: 35.196.113.152
Non-authoritative answer:
# this resolves to the private landing.ai service
$ kubectl exec busybox -it -- nslookup landing.ai.svc.cluster.local
Server: 10.96.0.10
Address: 10.96.0.10:53
Name: landing.ai.svc.cluster.local
Address: 10.97.252.177
$ kubectl exec -it busybox -- cat /etc/resolv.conf
search default.svc.cluster.local svc.cluster.local cluster.local
nameserver 10.96.0.10
options ndots:5
$ kubectl exec -it dnsutils -- nslookup launch.ai
Server: 10.96.0.10
Address: 10.96.0.10#53
Name: launch.ai.svc.cluster.local
Address: 10.106.142.53
$ kubectl exec -it dnsutils -- nslookup launch
Server: 10.96.0.10
Address: 10.96.0.10#53
** server can't find launch: NXDOMAIN
$ kubectl exec -it dnsutils -- cat /etc/resolv.conf
search default.svc.cluster.local svc.cluster.local cluster.local
nameserver 10.96.0.10
options ndots:5
Service
Almost like a virtual load balancer, connected to [[#Deployments]] using [[#Labels]]
Properties - IP address, target port, endpoints, session affinity?
It connects to the nodes which run kube-proxy. kube-proxy uses iptables to route traffic to the pods running on the nodes.
The service object ensures the traffic is redirected to one of the pods.
kubectl get svc -A
shows all services running in a cluster
kubectl expose
creates a service by looking up a deployment, replica set, replication controller, pod or another service by name and using the selector of the resource.
kubectl expose deployment nginx --port=80 --target-port=8000
Port vs Target Port? targetPort is the port on the pod that the service targets; port is the port that the service exposes (example below)
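A minimal Service manifest showing both fields (a sketch; names and ports are illustrative, matching the expose command above)
apiVersion: v1
kind: Service
metadata:
  name: nginx
spec:
  selector:
    app: nginx
  ports:
  - port: 80         # port the service exposes
    targetPort: 8000 # port on the pod that traffic is sent to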
Cluster IP
default, internal access only
NodePort
exposes the service on a static port on each node, making it accessible from outside the cluster
LoadBalancer
Public cloud load balancers
ExternalName
uses DNS names, redirection happens at DNS level
Service without selector
use for direct connections based on ip/port; since there is no selector, endpoints are not created automatically and must be added manually. Useful for external databases, and for pointing at endpoints in other namespaces (sketch below)
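A sketch of a selector-less service pointing at an external database; since there is no selector, the endpoints have to be created by hand (names and IP are illustrative)
apiVersion: v1
kind: Service
metadata:
  name: external-db
spec:
  ports:
  - port: 5432
---
apiVersion: v1
kind: Endpoints
metadata:
  name: external-db # must match the service name
subsets:
- addresses:
  - ip: 10.0.0.42
  ports:
  - port: 5432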
#ask Can I not use a service for resources with no labels?
#ask What is a headless service?
Ingress
Successor [[GatewayApi]]
Provides a http route from outside the cluster to services running in the cluster. It can also handle ssl termination, load balancing and name based virtual hosting.
Example ingress sending all traffic to a single service
graph LR; client([client])-. Ingress-managed load balancer .->ingress[Ingress]; ingress-->|routing rule|service[Service]; subgraph cluster ingress; service-->pod1[Pod]; service-->pod2[Pod]; end classDef plain fill:#ddd,stroke:#fff,stroke-width:4px,color:#000; classDef k8s fill:#326ce5,stroke:#fff,stroke-width:4px,color:#fff; classDef cluster fill:#fff,stroke:#bbb,stroke-width:2px,color:#326ce5; class ingress,service,pod1,pod2 k8s; class client plain; class cluster cluster;
Example ingress config
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
name: minimal-ingress
annotations:
nginx.ingress.kubernetes.io/rewrite-target: /
spec:
ingressClassName: nginx-example
rules:
- http:
paths:
- path: /testpath
pathType: Prefix
backend:
service:
name: test
port:
number: 80
Ingress spec has rules which are matched against all incoming http requests, and the traffic is directed accordingly.
Ingress Annotations are often used to configure certain properties depending on the ingress controller in use.
If no host
is specified in rules as in the example above, it matches all hosts.
Backend can also be a resource
, but you cannot specify both resource
and service
for a path. resource
backend is useful for directing requests for static assets to an object storage.
pathType
can be one of Prefix
, Exact
, or ImplementationSpecific
(up to the IngressClass)
For exposing arbitrary protocols and ports, [[NodePort]] or LoadBalancer service types can be used.
An ingress resource on its own doesn’t mean anything, it needs an [[Ingress Controller]] to be present on the cluster to provide the required functionality.
For handling TLS, the ingress spec should refer to a secret which provides the cert and secret key. For TLS to work properly, the host
values in spec.tls.hosts
must match spec.rules.host
.
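A sketch of the TLS section of an Ingress; the secret name is illustrative and must reference a kubernetes.io/tls secret containing tls.crt and tls.key
spec:
  tls:
  - hosts:
    - myapp.example.com
    secretName: myapp-tls
  rules:
  - host: myapp.example.com # must match spec.tls.hosts
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: test
            port:
              number: 80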
#find how is the ingress configured in the general eks cluster?
Ingress Controller
Various options like nginx, aws alb, istio etc.
Each ingress controller implements a particular ingress class. For ex, for aws load balancer controller, it is alb
. (ref)
Networking
A node contains pods, controlled by a deployment; each pod has an IP. But
a service is connected to a deployment using labels, not IPs.
IP is a pod property, not a container property. kubectl describe pod
shows the IP assigned to a pod, or use kubectl get pods -o wide
4 major problems -
- container to container communication - handled within the [[#Pods|pod]], via localhost
- pod to pod communication - explained below
- pod to service communication - handled by [[#Service|services]]
- external to service communication - handled by [[#Service|services]]
Plugin
When changing a network plugin - ensure the network cidr stays the same
DNS
# service
NAMESPACE NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
kube-system kube-dns ClusterIP 10.96.0.10 <none> 53/UDP,53/TCP,9153/TCP 33s
#pods
NAMESPACE NAME READY STATUS RESTARTS AGE
kube-system coredns-5dd5756b68-2l28z 1/1 Running 0 33s
kube-system coredns-5dd5756b68-t55kw 1/1 Running 0 33
What objects get dns names?
- [[#Service]]
service.namespace.svc.cluster.local
- [[#Pods]]
pod-ipv4.namespace.pod.cluster.local
Each pod has a DNS policy defined under pod.spec.dnsPolicy, with one of these values:
- Default - inherits from the node
- ClusterFirst - any query not matching the cluster domain is forwarded to upstream DNS servers
- ClusterFirstWithHostNet - for pods running with hostNetwork: true
- None - specify DNS configs under pod.spec.dnsConfig
Note: Default is NOT the default DNS policy. If no policy is specified, ClusterFirst is used.
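A sketch of a pod overriding DNS entirely with dnsPolicy: None and an explicit dnsConfig (nameserver and search values are illustrative)
apiVersion: v1
kind: Pod
metadata:
  name: custom-dns
spec:
  containers:
  - name: busybox
    image: busybox
    command: ["sleep", "3600"]
  dnsPolicy: "None"
  dnsConfig:
    nameservers:
    - 1.1.1.1
    searches:
    - ns1.svc.cluster.local
    options:
    - name: ndots
      value: "2"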
Do an nslookup on kubernetes (e.g. kubectl run busybox --image busybox --command -- nslookup kubernetes)
pod/busybox created
$ kubectl logs pod/busybox
Server: 10.96.0.10
Address: 10.96.0.10:53
** server can't find kubernetes.cluster.local: NXDOMAIN
** server can't find kubernetes.cluster.local: NXDOMAIN
Name: kubernetes.default.svc.cluster.local
Address: 10.96.0.1
** server can't find kubernetes.svc.cluster.local: NXDOMAIN
** server can't find kubernetes.svc.cluster.local: NXDOMAIN
This provides the ip of the kubernetes service which can be verified using kubectl describe svc/kubernetes
Note: lookup only works within the namespace. Outside the namespace, you won’t get the result!
$ kubectl run dnsnginx --image busybox --command -- nslookup nginx
pod/dnsnginx created
$ kubectl run dnskube --image busybox --command -- nslookup kube-dns
pod/dnskube created
$
$ kubectl logs pod/dnsnginx
Server: 10.96.0.10
Address: 10.96.0.10:53
** server can't find nginx.cluster.local: NXDOMAIN
** server can't find nginx.cluster.local: NXDOMAIN
** server can't find nginx.default.svc.cluster.local: NXDOMAIN
** server can't find nginx.default.svc.cluster.local: NXDOMAIN
$ kubectl -n nginx run dnsnginx --image busybox --command -- nslookup nginx
pod/dnsnginx created
root@controlplane:~$ kubectl logs pod/dnsnginx -n nginx
Server: 10.96.0.10
Address: 10.96.0.10:53
Name: nginx.nginx.svc.cluster.local
Address: 10.99.230.232
** server can't find nginx.cluster.local: NXDOMAIN
** server can't find nginx.cluster.local: NXDOMAIN
Why is that? Check the dns config inserted into a pod -
$ kubectl -n nginx exec -it dnsnginx -- /bin/sh
/ # cat /etc/resolv.conf
search nginx.svc.cluster.local svc.cluster.local cluster.local
nameserver 10.96.0.10
options ndots:5
The name server 10.96.0.10
points to the kube-dns
service running in the kube-system
namespace
$ kubectl get svc -A
NAMESPACE NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
default kubernetes ClusterIP 10.96.0.1 <none> 443/TCP 52m
kube-system kube-dns ClusterIP 10.96.0.10 <none> 53/UDP,53/TCP,9153/TCP 52m
nginx nginx ClusterIP 10.99.230.232 <none> 80/TCP 43m
Q: How to connect to a service running in namespace B if it can’t be queried from pods in namespace A?
A: Service name can be queried using the format servicename.namespace
from any namespace in the cluster
https://kubernetes.io/docs/tasks/administer-cluster/dns-debugging-resolution/
Storage
kubectl explain pod.spec.volumes
shows the different volume types that are available for use.
Volumes
Volumes can be ephemeral or persistent.
To use a volume within a pod’s containers, you need to specify spec.volumes
and spec.containers[*].volumeMounts
. The container so created sees the data contained in the image + any data mounted as a volume.
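A minimal sketch wiring the two halves together, using an emptyDir volume shared by two containers in the same pod (names are illustrative)
apiVersion: v1
kind: Pod
metadata:
  name: shared-vol
spec:
  volumes:
  - name: scratch
    emptyDir: {}
  containers:
  - name: writer
    image: busybox
    command: ["sh", "-c", "echo hello > /data/msg && sleep 3600"]
    volumeMounts:
    - name: scratch       # must match spec.volumes[].name
      mountPath: /data
  - name: reader
    image: busybox
    command: ["sh", "-c", "sleep 3600"]
    volumeMounts:
    - name: scratch
      mountPath: /data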
Specified for a pod in spec.volumes
, to check all the available configuration options, use kubectl explain pod.spec.volumes
Cloud-specific in-tree volume types have now been deprecated in favor of 3rd party [[#storage drivers]] instead. The following volume types are still valid -
- Secret (always mounted as RO, don’t use as subpath to receive updates)
- ConfigMap (always mounted as RO, don’t use as subpath to receive updates)
- Local, Empty Dir, Host Path relate to local filesystems of the node.
- PVC
- Projected
- Downward API - check coredns pods
graph LR; subgraph pod subgraph container1 m1[volMount] end subgraph container2 m2[volMount] end subgraph volumes v[vol] end end subgraph storage pv[pv] end subgraph claim pvc[pvc] end pv --bound--> pvc v --> m1 v --> m2 pvc --> v
PV Persistent Volumes decouple the storage requirements from pod development. PVs use properties like accessModes, capacity, mountOptions, persistentVolumeReclaimPolicy, volumeMode etc. to mount the persistent volume to the pod.
PV can be created manually (manifest) or dynamically (using a storage class)
Access Modes can be one of the following
- ReadWriteOnce (RWO) - A single node can mount this volume as read write. Many pods on this node can still use the volume.
- ReadOnlyMany (ROX) - Many pods can mount the volume as read only.
- ReadWriteMany (RWX) - Many pods can mount the volume as read, write.
- ReadWriteOncePod (RWOP) - A single pod can mount the volume as read, write (version v1.22 onwards only).
PVC Persistent volume claims are used by pod authors to add storage needs in a declarative way, without worrying about storage specifics.
PVC use properties like accessModes, volumeMode, storageClassName, resources, selector to provision the storage as per the requirements.
kubectl explain pvc.spec
to know about all the properties.
Simple local example
# pv.yaml
kind: PersistentVolume
apiVersion: v1
metadata:
name: pv-vol # not used anywhere
labels:
type: local
spec:
accessModes:
- ReadWriteOnce
capacity:
storage: 2Gi
hostPath:
path: "/data" # this should exist on host
# pvc.yaml
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
name: pv-claim # used in pod.spec.volumes[].pvc.claimName
spec:
accessModes:
- ReadWriteOnce
resources:
requests:
storage: 1Gi # <= pv.spec.capacity.storage
# pod.yaml
kind: Pod
apiVersion: v1
metadata:
name: pv-pod
spec:
containers:
- name: pv-container
image: nginx
ports:
- containerPort: 80
name: nginxhttp
volumeMounts:
- mountPath: "/usr/share/nginx/html"
name: cvol # from pod.spec.volumes[].name
volumes:
- name: cvol
persistentVolumeClaim:
claimName: pv-claim # from pvc.metadata.name
#ask what happens if pvc.spec.requests.storage > pv.spec.capacity.storage?
Storage Class
- can be grouped according to anything - capacity, type, location etc.
- Uses provisioner to connect to the storage
- When a PVC does not specify a storageClassName, the default StorageClass is used.
- The cluster can only have one default StorageClass. If more than one default StorageClass is set, the newest default is used.
Example
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
name: standard
provisioner: kubernetes.io/aws-ebs
parameters: # provisioner specific parameters
type: gp2
reclaimPolicy: Retain
allowVolumeExpansion: true
mountOptions:
- debug
volumeBindingMode: Immediate
ConfigMap
Decouple configuration from application
example, notice it uses data
instead of the usual spec
apiVersion: v1
kind: ConfigMap
metadata:
name: nginxcm
data:
# use in `pod.spec.volumes[].configMap.items[].key`
nginx-custom-config.conf: |
server {
listen 8080;
server_name localhost;
location / {
root /usr/share/nginx/html;
index index.html index.htm;
}
}
Use it in a pod
apiVersion: v1
kind: Pod
metadata:
name: nginx
spec:
containers:
- name: nginx
image: nginx
volumeMounts:
- name: conf
mountPath: /etc/nginx/conf.d/
volumes:
- name: conf
configMap:
name: nginxcm
items:
# key as in configMap.data.key
- key: nginx-custom-config.conf
# path within the container
path: default.conf
Secrets
Decouple sensitive variables from application
example - notice it uses data
instead of the usual spec
apiVersion: v1
kind: Secret
metadata:
name: secret
data:
username: encodedusername
password: encodedpassword
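The values under data must be base64-encoded; kubectl can do the encoding for you (a sketch, names and values are illustrative)
kubectl create secret generic secret --from-literal=username=admin --from-literal=password='s3cr3t'
echo -n 'admin' | base64   # YWRtaW4= - the form expected under data: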
Kubernetes API
Collection of [[RESTful APIs]], supporting the standard HTTP verbs (GET, POST, PUT, PATCH, DELETE). It is crucial to identify the API version to use.
#ask why did kubernetes project choose this RESTful API approach?
To allow the system to continuously evolve and grow.
New features can be easily added without impacting existing clients as alpha, and moved to beta, then stable version as they mature.
It also allows the project to maintain compatibility with existing clients by offering both beta and stable version of an API simultaneously (for a length of time).
Versioning is done at the API level rather than at the resource or field level to ensure that the API presents a clear, consistent view of system resources and behavior, and to enable controlling access to end-of-life and/or experimental APIs.
The API server handles the conversion between API versions transparently: all the different versions are actually representations of the same persisted data. The API server may serve the same underlying data through multiple API versions.
So, if I create a resource using an API version v1beta1
, I can later use v1
version to query or manage it (within the deprecation period). Some fields may need updating due to the API graduating to v1
, but, I can still migrate to the newer version of the API without having to destroy and recreate the resource.
API versions cannot be removed in future versions until this issue is fixed.
API access is controlled by the API server.
It saves the serialized objects in [[etcd]].
API resources are distinguished by their API group, resource type, namespace (for namespaced resources), and name.
Monitor deprecated API requests - apiserver_requested_deprecated_apis
metric. This can help identify if there are objects in the cluster still using deprecated APIs.
#ask kube-proxy, where is it hosted, how does it work?
graph LR subgraph server api[api/etcd] end cr[curl] --> kp[kube-proxy] --> api
List available resource APIs - their kind, group, version, whether namespaced (bool), any shortnames etc. - use
kubectl api-resources -o wide
API groups can be enabled or disabled using --runtime-config
flag on API server
$ kubectl api-resources
NAME SHORTNAMES APIVERSION NAMESPACED KIND
..
configmaps cm v1 true ConfigMap
...
namespaces ns v1 false Namespace
nodes no v1 false Node
persistentvolumeclaims pvc v1 true PersistentVolumeClaim
persistentvolumes pv v1 false PersistentVolume
pods po v1 true Pod
...
secrets v1 true Secret
serviceaccounts sa v1 true ServiceAccount
services svc v1 true Service
...
networking.k8s.io/v1 false IngressClass
ingresses ing networking.k8s.io/v1 true Ingress
networkpolicies netpol
...
List versions of available API
kubectl api-versions
$ kubectl api-versions
..
apps/v1
authentication.k8s.io/v1
authorization.k8s.io/v1
autoscaling/v1
autoscaling/v2
..
flowcontrol.apiserver.k8s.io/v1beta2
flowcontrol.apiserver.k8s.io/v1beta3
networking.k8s.io/v1
node.k8s.io/v1
policy/v1
rbac.authorization.k8s.io/v1
scheduling.k8s.io/v1
storage.k8s.io/v1
v1
Find properties required for an object
kubectl explain <object.property>
Find namespace scoped APIs or cluster wide APIs
kubectl api-resources --namespaced=true
kubectl api-resources --namespaced=false
Run kubectl proxy to access the API more easily using [[Curl]]
#ask But why would you do this?
If an app needs to interact with kubernetes, it can simply use the language specific http library to do this directly instead of going through kubectl
kubectl proxy --port=8080
$ curl http://localhost:8080/version
{
"major": "1",
"minor": "28",
"gitVersion": "v1.28.2",
"gitCommit": "89a4ea3e1e4ddd7f7572286090359983e0387b2f",
"gitTreeState": "clean",
"buildDate": "2023-09-13T09:29:07Z",
"goVersion": "go1.20.8",
"compiler": "gc",
"platform": "linux/amd64"
}
Get pods from kube-system
namespace (truncated output)
$ curl http://localhost:8080/api/v1/namespaces/kube-system/pods | less
{
"kind": "PodList",
"apiVersion": "v1",
"metadata": {
"resourceVersion": "1555"
"name": "coredns-5dd5756b68-fcz42",
"generateName": "coredns-5dd5756b68-",
"namespace": "kube-system",
"uid": "326ba1b7-31b6-4d6c-9978-1057f6734154",
"resourceVersion": "553",
..
Check the openapi v3 specification (truncated output) on /openapi/v3
, and v2 specification on /openapi/v2
$ curl http://localhost:8080/openapi/v3
{
"paths": {
".well-known/openid-configuration": {
"serverRelativeURL": "/openapi/v3/.well-known/openid-configuration?hash=4488--"
},
"api": {
"serverRelativeURL": "/openapi/v3/api?hash=929E--"
},
"api/v1": {
"serverRelativeURL": "/openapi/v3/api/v1?hash=5133--"
},
"apis": {
"serverRelativeURL": "/openapi/v3/apis?hash=27E0--"
},
"apis/admissionregistration.k8s.io": {
"serverRelativeURL": "/openapi/v3/apis/admissionregistration.k8s.io?hash=E8D5.."
}
}
API Extensions
This is a way to make the API server recognize new non-standard Kubernetes objects.
Example - Prometheus Operator uses a number of CRDs to manage the deployment in a cluster.
Needs to be enabled and then runs in-process in the kube-apiserver.
You first need to create an APIService object, say myawesomeapi, at a path, say apis/myawesomeapi/v1beta1/. The aggregation layer then proxies any requests the API server receives for this API to the registered APIService (sketch below).
Example - metrics server
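A sketch of what the registration object looks like (group, version, and service names are illustrative; metrics-server registers a similar APIService for metrics.k8s.io)
apiVersion: apiregistration.k8s.io/v1
kind: APIService
metadata:
  name: v1beta1.myawesomeapi
spec:
  group: myawesomeapi
  version: v1beta1
  service:                      # in-cluster service that serves this API
    name: myawesomeapi-server
    namespace: kube-system
  groupPriorityMinimum: 100
  versionPriority: 15
  insecureSkipTLSVerify: true   # sketch only; use caBundle in practice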
Create a Cluster Manually!
Kubernetes releases before v1.24 included a direct integration with Docker Engine, using a component named dockershim.
What is dockershim?
To provide support for multiple container runtimes, the CRI API/specification was developed. But since docker was the first container runtime k8s supported, and to maintain backward compatibility, dockershim
was developed, which allowed kubelet to interact with the docker runtime via the CRI API, sort of like a proxy?
graph LR; kb[kubelet] <--cri--> ds[dockershim] <--> dc[docker] <--> cd[containerd] --> c1[container 1]; cd --> c2[container 2]; cd --> cn[container n];
graph LR; kb[kubelet] <--cri--> ccd[cri-containerd] <--> cd[containerd] --> c1[container 1]; cd --> c2[container 2]; cd --> cn[container n];
Create a 3 node cluster - 1 control node, and 2 worker nodes.
- On control node, kubeadm init
- On control node, networking
- On worker node, kubeadm join
Control node
- Install a container runtime like docker, containerd, or cri-o.
- Install kube tools: kubeadm, kubelet, kubectl
- kubeadm init (Ref)
- Setup $HOME/.kube/config
- Verify all hosts are present under /etc/hosts
- Install a pod network add-on - any one of calico, cilium, flannel etc. (sketch of these steps below)
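A rough sketch of the control node steps (the CIDR is illustrative and depends on the chosen network add-on)
kubeadm init --pod-network-cidr=10.244.0.0/16
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
kubeadm init prints the kubeadm join command (with token and discovery hash) to run on each worker node.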
Worker nodes
- Install a container runtime like docker, containerd, or cri-o.
- Install kube tools: kubeadm, kubelet, kubectl
- kubeadm join --token xx --discovery-cert xx
To create high availability - use 3 controller nodes, each running with etcd, or use a dedicated etcd cluster
Older, needs refining
Pod Disruption Budgets https://kubernetes.io/docs/tasks/run-application/configure-pdb/
Based on the value of maxUnavailable
for specific pods, cluster autoscaler will either ignore a node, or scale it down.
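A minimal PodDisruptionBudget sketch (the app label is illustrative)
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: my-dep-pdb
spec:
  maxUnavailable: 1   # or use minAvailable
  selector:
    matchLabels:
      app: my-dep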
Pod Affinity https://kubernetes.io/docs/concepts/scheduling-eviction/assign-pod-node/ uses pod labels
example
apiVersion: v1
kind: Pod
metadata:
name: label-demo
labels:
environment: production
app: nginx
spec: . . .
Pods Anti Affinity https://kubernetes.io/docs/concepts/scheduling-eviction/assign-pod-node/#inter-pod-affinity-and-anti-affinity
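A sketch combining both, based on pod labels (keys and values are illustrative): schedule near app=cache pods, and spread away from other app=web pods
apiVersion: v1
kind: Pod
metadata:
  name: with-pod-affinity
  labels:
    app: web
spec:
  affinity:
    podAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchLabels:
            app: cache
        topologyKey: kubernetes.io/hostname
    podAntiAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100
        podAffinityTerm:
          labelSelector:
            matchLabels:
              app: web
          topologyKey: kubernetes.io/hostname
  containers:
  - name: nginx
    image: nginx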
labels allow us to use selectors
labels are also useful to slice/dice resources when using kubectl
kubectl get pods -Lapp -Ltier -Lrole
-L
displays an extra column in kubectl
output
-l
selects resources matching a label, e.g. to filter kubectl get output, or to choose which resources a kubectl label update applies to.
Selectors can be of 2 types
Equality based (accelerator=nvidia-tesla-p100)
apiVersion: v1
kind: Pod
metadata:
name: cuda-test
spec:
containers:
- name: cuda-test
image: "registry.k8s.io/cuda-vector-add:v0.1"
resources:
limits:
nvidia.com/gpu: 1
nodeSelector:
accelerator: nvidia-tesla-p100
Set based
apiVersion: v1
kind: Pod
metadata:
name: cuda-test
spec:
containers:
- name: cuda-test
image: "registry.k8s.io/cuda-vector-add:v0.1"
resources:
limits:
nvidia.com/gpu: 1
  # nodeSelector only supports equality; set-based requirements
  # are expressed via nodeAffinity matchExpressions instead
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:       # expressions in one term are ANDed
          - key: environment
            operator: In
            values: [qa, qa1]
          - key: accelerator
            operator: In
            values: [nvidia, intel]
service, replicationcontroller format for selector
selector:
component: redis
daemonset, replicaset, deployment, job format for selector
selector:
matchLabels:
component: redis
matchExpressions:
- {key: component, operator: In, values: [redis]}
Labels
Standard or default
kubernetes.io/arch
kubernetes.io/hostname # cloud provider specific
kubernetes.io/os
node.kubernetes.io/instance-type # if available to kubelet
topology.kubernetes.io/region #
topology.kubernetes.io/zone #
Labels and selectors
[[K8S Scheduling]]
Troubleshooting
Pod Error Alerts
sum (kube_pod_container_status_waiting_reason{reason=~"CrashLoopBackOff|ImagePullBackOff|ErrImagePull.+"}) by (namespace, container, reason)
Questions
1. What is the value of kubernetes.io/hostname
in [[eks]]?
I know it’s part of standard labels #todo (link it) but, not seen this tag really on [[eks]]. Found following tags instead 🤷
- kubernetes.io/cluster/myawesomecluster=owned
- aws:eks:cluster-name=myawesomecluster
2. How do pods communicate with each other in a cluster?
3. How will you control which pod runs on which node(s)?
Mix of scheduling options like node selector, affinity/anti-affinity, taints, tolerations etc.