  Jenkins / JENKINS-54643

A connection interruption causes the pipeline to fail when USE_WATCHING=true

      Description

      Run Jenkins with -Dorg.jenkinsci.plugins.workflow.steps.durable_task.DurableTaskStep.USE_WATCHING=true. Add an agent launched via SSH (the launch method may not be important; this is just what I've observed the issue with).
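      As a minimal sketch of the first step, the property can be passed on the Java command line when starting Jenkins from the WAR file. The `jenkins.war` path and the `WATCH_FLAG` variable name are assumptions for illustration; a packaged installation would set the same property through its own service configuration instead.

      ```shell
      # The system property named in this report, kept in a variable for readability.
      WATCH_FLAG='-Dorg.jenkinsci.plugins.workflow.steps.durable_task.DurableTaskStep.USE_WATCHING=true'

      # Assumption: jenkins.war is in the current directory.
      # The command is printed rather than executed so it can be reviewed first.
      echo java "$WATCH_FLAG" -jar jenkins.war
      ```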

      Add a pipeline job with this script:

      node('mynode') {
          sh '''#!/bin/sh -e
              for n in $(seq 100); do
                  echo "$n"
                  sleep 1
              done
          '''
          sh 'echo OK'
      }
      

      Run the pipeline. When it starts printing numbers to the log, disconnect the master from the network. After 30 seconds, reconnect it.

      For a while (I haven't measured, but it feels like a couple of minutes) nothing new appears in the log. After that, the job completes instantly, but:

      • Some of the output is missing from the log.
      • The "echo OK" step doesn't run.
      • The pipeline fails with an EOFException.

      I'm attaching a full example log.

      By contrast, with USE_WATCHING=false the log resumes a few seconds after the reconnection, no output is skipped and the job succeeds.
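
      To quantify the skipped output, one option is to save the console log to a file and list which of the expected numbers never appeared. The sketch below assumes that each echoed number sits on its own line with no timestamp or other prefix; `missing_numbers` is a hypothetical helper name, not part of Jenkins.

      ```shell
      # Hypothetical helper: print every number from 1 to 100 that does not
      # appear as a complete line in the given log file.
      missing_numbers() {
          log_file=$1
          for n in $(seq 100); do
              # -x matches whole lines only, so "1" does not match "10" or "100".
              grep -qx "$n" "$log_file" || echo "$n"
          done
      }
      ```

      Running `missing_numbers console.log` against a saved log prints one missing number per line; empty output means nothing was skipped.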

            Activity

            svanoort Sam Van Oort added a comment -

            Jesse Glick Have you seen this one?

            jglick Jesse Glick added a comment -

            We have a functional test for a similar scenario which does not display this issue, but it is probably too simple.

            jglick Jesse Glick added a comment -

            The failure of the second sh step sounds like JENKINS-41854. Why watch mode would trigger that, I am not sure. The channel is getting closed, which is unsurprising if the network is unplugged (for example, a ping thread would be expected to fail); the more interesting question is why it does not get closed when in polling mode.

            Loss of some output in the face of network outages is hard to avoid in watch mode; this is simply a tradeoff for far more efficient network and master CPU utilization. PR 86 discussed possible alternative approaches that would adjust the tradeoffs.

            jglick Jesse Glick added a comment -

            Filed JENKINS-56851 for loss of output.

            jglick Jesse Glick added a comment -

            Considering a duplicate of JENKINS-41854 since that was the primary reported problem.


              People

              • Assignee: jglick Jesse Glick
              • Reporter: rdonchen_intel Roman Donchenko
              • Votes: 0
              • Watchers: 3