JENKINS-43038: Intermittent error "Cannot contact node123: java.lang.InterruptedException" in Jenkins

      Description

      We intermittently hit the connection error below while running jobs on node123.

      The error in the build log is: Cannot contact node123: java.lang.InterruptedException

      I don't see any errors in the thread dump or in any other logs related to this node.

      I also see no connection drop between the master and the node.

      As far as I can see, the slave has been running for more than 24 hours now.


            Activity

            shahmishal mishal shah added a comment -

            Renjith Pillai Did you find a workaround for the x4 slowness? 

            oleg_nenashev Oleg Nenashev added a comment -

            Unfortunately, I have no capacity to work on Remoting in the medium term, so I will unassign this issue and let others take it. If somebody is interested in submitting a pull request, I will be happy to help get it reviewed and released.

            svanoort Sam Van Oort added a comment - edited

            Manish Sawlani, mishal shah, Tsvi Mostovicz: if you update to the latest Pipeline plugins, and especially the support-core plugin, and use the suggested GC settings (https://jenkins.io/blog/2016/11/21/gc-tuning/), you should find that the InterruptedExceptions are pretty much gone. They are generally the result of timeouts in remoting-related operations. I believe the only cases where they should still happen are actual hardware, system, or network issues.
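
            As a rough illustration of how a timeout can surface as an InterruptedException, here is a minimal Java sketch. It is not Jenkins remoting code: the class name TimeoutInterruptSketch and the helper callWithTimeout are hypothetical, and Thread.sleep stands in for a slow remote operation. When the caller stops waiting and cancels the call, the blocked worker thread is interrupted.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical sketch; names are illustrative and not part of Jenkins.
public class TimeoutInterruptSketch {

    // Runs a stand-in for a slow remote call and reports what the worker thread observed.
    static String callWithTimeout(long workMillis, long timeoutMillis) throws Exception {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        AtomicReference<String> seen = new AtomicReference<>("not started");
        CountDownLatch finished = new CountDownLatch(1);
        try {
            Future<?> call = pool.submit(() -> {
                try {
                    Thread.sleep(workMillis); // stand-in for a blocking remote operation
                    seen.set("completed");
                } catch (InterruptedException e) {
                    // This is what the blocked thread sees when the caller gives up.
                    seen.set(e.getClass().getName());
                } finally {
                    finished.countDown();
                }
            });
            try {
                call.get(timeoutMillis, TimeUnit.MILLISECONDS);
            } catch (TimeoutException e) {
                call.cancel(true); // true = interrupt the thread running the task
            }
            // Wait (bounded) for the worker to record its outcome.
            finished.await(5, TimeUnit.SECONDS);
            return seen.get();
        } finally {
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(callWithTimeout(50, 2000));  // work finishes well within the timeout
        System.out.println(callWithTimeout(5000, 100)); // timeout fires first, worker is interrupted
    }
}
```

            In this sketch, anything that stalls the "remote" side for longer than the timeout (a long GC pause, an overloaded machine, a network problem) produces the same exception, which is why the exception alone says little about the root cause.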

            In the last quarter of 2017 we made a big change to the way Pipeline's durable tasks interact with remoting, which should avoid many of these issues.

            Edit: There was an additional issue in support-core that caused problems and was recently fixed. Specifically, version 2.42 of the support-core plugin added heap histogram analysis for diagnostics, but this had the unexpected side effect of introducing periodic, catastrophically long GC pauses that made the Jenkins master unresponsive for long periods and triggered timeouts (and thus the InterruptedException seen here).

            Please see https://issues.jenkins-ci.org/browse/JENKINS-49931 for more details of that.
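
            To get a first impression of whether garbage collection is taking a suspicious amount of time, the standard java.lang.management API can be polled. The sketch below is a standalone Java program that measures its own JVM, not the Jenkins master; the class name GcPauseSketch and its helper methods are hypothetical. The reported value is accumulated collection time, which only approximates pause time for collectors that do part of their work concurrently.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical sketch; measures the JVM it runs in, not a Jenkins master.
public class GcPauseSketch {

    // Total number of collections reported by all collectors in this JVM.
    static long totalGcCount() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long count = gc.getCollectionCount(); // -1 means undefined for this collector
            if (count > 0) {
                total += count;
            }
        }
        return total;
    }

    // Approximate accumulated collection time in milliseconds across all collectors.
    static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long millis = gc.getCollectionTime(); // -1 means undefined for this collector
            if (millis > 0) {
                total += millis;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        long countBefore = totalGcCount();
        long millisBefore = totalGcMillis();

        // Create some short-lived garbage, then request a collection.
        for (int i = 0; i < 200; i++) {
            byte[] junk = new byte[1024 * 1024];
            junk[0] = 1;
        }
        System.gc(); // only a request; the JVM may ignore it

        System.out.println("collections: " + (totalGcCount() - countBefore));
        System.out.println("approx. GC time (ms): " + (totalGcMillis() - millisBefore));
    }
}
```

            Sampling these counters at intervals and comparing the deltas gives a crude picture; GC logs remain the more precise source for individual pause lengths.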

            For now I'm going to transition this to "closed" because, when working with several users who showed this among other symptoms, the suggestions above resolved the issues. I'm happy to re-open this if you still experience problems after applying the above (please reply here to say so).

            joebarber Joe Barber added a comment -

            Hi, I have recently started seeing the same "Cannot contact node123: java.lang.InterruptedException" error, but only during parallel stages in a Pipeline job.

            I have created a brand new Jenkins environment (Jenkins version 2.121.1) with all plugins updated and the GC settings applied according to the gc-tuning page from the above comment.
            The issue is intermittent (about 1 in every 8 builds or so).

            Support-Core version 2.48
            Pipeline version 2.5

            Any other advice?

            Thanks,

            svanoort Sam Van Oort added a comment -

            Joe Barber What you describe sounds a lot like https://issues.jenkins-ci.org/browse/JENKINS-46507 but we have not had a consistent way to reproduce the issue, so it's very hard to debug. If you can provide a simple, self-contained sample Pipeline in the comments of that ticket that will reproduce the issue, that would be very helpful. Thanks!


              People

              • Assignee:
                svanoort Sam Van Oort
                Reporter:
                msavlani1 Manish Sawlani
              • Votes: 25
                Watchers: 42