JENKINS-62794

Pipeline jobs running in K8s agents hang if the server pod is restarted 2 or more times


    • Type: Bug
    • Resolution: Unresolved
    • Priority: Major
    • Component: kubernetes-plugin
      Jenkins Version: v2.222.1 LTS (used the image from Docker Hub: jenkins/jenkins:2.222.1)
      Jenkins Kubernetes plugin version: 1.25.7
      Kubernetes version: 1.18.3

      We have a K8s setup running the Jenkins master as one of the deployments, and Jenkins is configured to run dynamic agents via the Kubernetes plugin. A rough sketch of such a setup is shown below.
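
      (The commands below are only an illustration, not our actual manifests; the namespace, deployment, and service names are placeholders, and things like persistent storage for the Jenkins home are omitted.)

      kubectl create namespace jenkins
      kubectl -n jenkins create deployment jenkins --image=jenkins/jenkins:2.222.1
      # expose the web UI port, plus the inbound agent port (50000 by default)
      # so agent pods can connect back to the master
      kubectl -n jenkins expose deployment jenkins --name=jenkins-ui --port=8080 --target-port=8080
      kubectl -n jenkins expose deployment jenkins --name=jenkins-agent --port=50000 --target-port=50000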

      If the Jenkins server pod is restarted once while a job is running in an agent pod (e.g. during an upgrade/restart the pod is deleted and Kubernetes starts it again), the job resumes after the server is fully up and running.

      If, for some reason, the Jenkins server pod is restarted 2 or more times before the Jenkins container is fully up (restarted either manually or by other means, e.g. Kubernetes kills and restarts the pod) while a pipeline job is running in an agent pod, the running job hangs and fails to resume, with the error below:

      ERROR: Issue with creating launcher for agent k8s-pod-label-xxxxx-yyyyy. The agent has not been fully initialized yet

      This issue is seen with all kinds of agent pods.

      How to reproduce this issue:

      1. Set up K8s and create a deployment to run Jenkins. This runs a pod that provisions Jenkins. Configure Jenkins to use Kubernetes agents.
      2. Create a pipeline job that runs on Kubernetes agents, trigger it, and let it start the agent pod and run its stages and steps.
      3. While a pipeline step is executing, delete the pod that runs the server (see the command sketch after this list).
      4. K8s will start a new pod to provision Jenkins. Delete the newly started pod as well.
      5. Wait for K8s to finish provisioning the pod again and check the pipeline job's console: the job hangs forever, whereas it is expected to resume and complete.
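
      Roughly, steps 3-5 map to commands like these (assuming the deployment from step 1 carries the default app=jenkins label; adjust names to your setup):

      # step 3: first restart, issued while the job is inside its sleep step
      kubectl -n jenkins delete pod -l app=jenkins
      # step 4: second restart, issued before the replacement container is fully up
      kubectl -n jenkins delete pod -l app=jenkins
      # step 5: wait for the new pod to become Ready, then check the job's console
      kubectl -n jenkins get pods -w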

      In my case, the Jenkins pod was restarted multiple times by K8s due to OOM errors or other reasons.
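
      Whether a given restart was OOM-related can be checked from the pod's last state, e.g.:

      kubectl -n jenkins describe pod <jenkins-pod-name>
      # "Last State" shows Reason: OOMKilled when the container exceeded its
      # memory limit, and "Restart Count" shows how many restarts occurred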

      The test pipeline job has the following definition:

      cloudprovider.yaml
      apiVersion: v1
      kind: Pod
      metadata:
        name: test
      spec:
        containers:
          - name: ubuntu
            image: 'debian:buster-slim'
            imagePullPolicy: Always
            command: ['cat']
            tty: true
            resources:
              requests:
                cpu: 200m
                memory: 500Mi
              limits:
                cpu: 500m
                memory: 1Gi
      
      Jenkinsfile
      pipeline {
          agent {
              kubernetes {
                  label "test-container"
                  yamlFile 'cloudprovider.yaml'
                  defaultContainer 'ubuntu'
              }
          }

          stages {
              stage ('Build') {
                  steps {
                      sh '''
                         echo "Start"
                         sleep 100
                         ls -lah
                         echo "End"
                      '''
                  }
              }
          }
      }
      

            Assignee: Unassigned
            Reporter: Shankar Ramasamy (shankar128)