JENKINS-56673

Better handling of ChannelClosedException in Declarative pipeline

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Minor
    • Resolution: Duplicate
    • Component/s: kubernetes-plugin
    • Labels:
      None
    • Environment:
      Jenkins: 2.150.2, k8s plugin version: 1.14.3

      Description

      When a pod gets deleted for any reason, an exception like the following is logged:

      hudson.remoting.ChannelClosedException: Channel "unknown": Remote call on JNLP4-connect connection from .... failed. The channel is closing down or has closed down 

      The job then appears to hang indefinitely until a timeout is reached or it's stopped manually.

      In our use case (k8s on preemptible VMs) we actually expect pods to be deleted mid-build and want to be able to handle pod deletion with a retry.

      I have not been able to find a way to handle this in declarative syntax.

      For testing, I am using a very simple declarative example:

          stages {
              stage('Try test') {
                  steps {
                      container('jnlp') {
                          sh """
                          echo Kill the pod now
                          sleep 5m
                          """
                      }
                  }
                  post {
                      failure {
                          echo "Failuuure"
                      }
                  }
              }
          }

      But the exception does not actually trigger the failure block when the pod is killed.

      Is there currently any best practice to handle the deletion of a pod? Are there any timeout parameters that would be useful in this case?
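
      For illustration, one possible workaround (not verified as best practice) would be to drop to scripted syntax and wrap the whole pod allocation in retry, so that a replacement pod is requested if the first one is deleted, with a timeout to bound the hang described above. The 'preemptible-build' label and the step contents below are placeholders:

          // Minimal scripted-syntax sketch of a possible workaround.
          // Assumptions: 'preemptible-build' is a placeholder label and the
          // shell step stands in for the real build work.
          retry(3) {
              podTemplate(label: 'preemptible-build') {
                  node('preemptible-build') {
                      container('jnlp') {
                          // Bound the hang so the retry can eventually fire
                          timeout(time: 10, unit: 'MINUTES') {
                              sh '''
                              echo Kill the pod now
                              sleep 5m
                              '''
                          }
                      }
                  }
              }
          }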

      I'm happy to submit a PR to the README once I learn the recommended approach.


            Activity

            csanchez Carlos Sanchez added a comment -

            I think this is just another version of JENKINS-55392.
            You can't catch these exceptions, as they are underlying infra issues.

            bkmeneguello Bruno Meneguello added a comment -

            Carlos Sanchez
            I don't think this is the same case.
            I've opened a ticket with the same problem (sorry).
            What I've tracked down is that when my pods are killed (usually by the OOMKiller), the job doesn't get aborted; it hangs indefinitely. The node is marked "offline" and the node log shows the message from the original post.
            If I click the abort button, the job is aborted immediately. So why doesn't this happen automatically when the node is detected to be offline?

            jglick Jesse Glick added a comment -

            I believe the patch for JENKINS-49707 addresses this.
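
            For context, a sketch of how this might look once agent loss surfaces as a prompt step failure rather than a hang (assuming that behaviour from the referenced patch; the agent label, retry count, and stage contents are illustrative). With the failure propagating, a stage-level retry option in Declarative can re-run the stage, and with the agent declared at stage level the retry requests a fresh pod:

                // Hedged Declarative sketch; assumes agent loss now fails the
                // running step instead of hanging. All values are illustrative.
                pipeline {
                    agent none
                    stages {
                        stage('Try test') {
                            agent {
                                kubernetes {
                                    label 'preemptible-build'
                                }
                            }
                            options {
                                retry(2)                            // re-run the stage on a new pod if this one is lost
                                timeout(time: 10, unit: 'MINUTES')  // safety net against residual hangs
                            }
                            steps {
                                container('jnlp') {
                                    sh 'echo Kill the pod now && sleep 5m'
                                }
                            }
                        }
                    }
                }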


              People

              • Assignee:
                Unassigned
              • Reporter:
                cfebs Collin Lefeber
              • Votes:
                1
              • Watchers:
                4

                Dates

                • Created:
                • Updated:
                • Resolved: