Jenkins / JENKINS-49707

Auto retry for elastic agents after channel closure

    Details

      Description

      While my pipeline was running, the node that was executing logic terminated. I see this at the bottom of my console output:

      Cannot contact ip-172-31-242-8.us-west-2.compute.internal: java.io.IOException: remote file operation failed: /ebs/jenkins/workspace/common-pipelines-nodeploy at hudson.remoting.Channel@48503f20:ip-172-31-242-8.us-west-2.compute.internal: hudson.remoting.ChannelClosedException: Channel "unknown": Remote call on ip-172-31-242-8.us-west-2.compute.internal failed. The channel is closing down or has closed down
      

      There's a spinning arrow below it.

      I have a cron script that uses the Jenkins master CLI to remove nodes that have stopped responding. When I examine this node's page in the Jenkins web UI, it looks like the node is still running that job, and I see an orange label that says "Feb 22, 2018 5:16:02 PM Node is being removed".

      I'm wondering what would be a better way to say "If the channel closes down, retry the work on another node with the same label"?

      Things seem stuck. Please advise.
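
      To make the requested behaviour concrete, here is a minimal Java sketch of "retry on another node with the same label when the channel closes". It is not a Jenkins API and not how any plugin implements this: `RetryOnChannelClosure`, `AgentTask`, `runWithRetry`, the agent names, and the local `ChannelClosedException` stand-in are all hypothetical names invented for illustration. It also assumes the closed channel surfaces as an exception instead of a hang, which is exactly what does not happen in the report above.

```java
import java.io.IOException;
import java.util.List;

/** Hypothetical sketch only; none of these types are Jenkins APIs. */
public class RetryOnChannelClosure {

    /** Local stand-in for a "channel closed" failure (assumption, not the real remoting class). */
    static class ChannelClosedException extends IOException {
        ChannelClosedException(String message) {
            super(message);
        }
    }

    /** Work that runs on a named agent and may lose its channel part-way through. */
    interface AgentTask<T> {
        T runOn(String agentName) throws IOException;
    }

    /**
     * Tries the task on each candidate agent in turn. All candidates are assumed
     * to carry the same label. Only a closed channel triggers a move to the next
     * agent; any other IOException propagates immediately.
     */
    static <T> T runWithRetry(List<String> agentsWithLabel, AgentTask<T> task) throws IOException {
        ChannelClosedException last = null;
        for (String agent : agentsWithLabel) {
            try {
                return task.runOn(agent);
            } catch (ChannelClosedException e) {
                last = e; // this agent went away; try the next one
            }
        }
        if (last != null) {
            throw last; // every candidate lost its channel
        }
        throw new IOException("no agents available for this label");
    }

    public static void main(String[] args) throws IOException {
        List<String> agents = List.of("agent-a", "agent-b");
        String result = runWithRetry(agents, name -> {
            if (name.equals("agent-a")) {
                throw new ChannelClosedException("channel closed on " + name);
            }
            return "built on " + name;
        });
        System.out.println(result); // prints "built on agent-b"
    }
}
```

      In this sketch the work is simply restarted from the beginning on the next agent, so it only suits steps that are safe to repeat.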

        Attachments

        1. grub.remoting.logs.zip
          3 kB
        2. grubSystemInformation.html
          67 kB
        3. image-2018-02-22-17-27-31-541.png
          56 kB
        4. image-2018-02-22-17-28-03-053.png
          30 kB
        5. JavaMelodyGrubHeapDump_4_07_18.pdf
          220 kB
        6. JavaMelodyNodeGrubThreads_4_07_18.pdf
          9 kB
        7. jenkins_agent_devbuild9_remoting_logs.zip
          4 kB
        8. jenkins_Agent_devbuild9_System_Information.html
          66 kB
        9. jenkins_agents_Thread_dump.html
          172 kB
        10. jenkins_support_2018-06-29_01.14.18.zip
          1.26 MB
        11. jenkins.log
          984 kB
        12. jobConsoleOutput.txt
          12 kB
        13. jobConsoleOutput.txt
          12 kB
        14. MonitoringJavaelodyOnNodes.html
          44 kB
        15. NetworkAndMachineStats.png
          224 kB
        16. slaveLogInMaster.grub.zip
          8 kB
        17. support_2018-07-04_07.35.22.zip
          956 kB
        18. threadDump.txt
          98 kB
        19. Thread dump [Jenkins].html
          219 kB

            Activity

            piratejohnny Jon B created issue -
            piratejohnny Jon B made changes -
            Field Original Value New Value
            Description Console output moved from a ``` fence to a {code:java} block; wording otherwise unchanged.
            piratejohnny Jon B made changes -
            Description Blank lines around the {code:java} block removed; "Please advise." changed to "Things seem stuck. Please advise."
            oleg_nenashev Oleg Nenashev made changes -
            Component/s pipeline [ 21692 ]
            Component/s core [ 15593 ]
            abayer Andrew Bayer made changes -
            Component/s workflow-durable-task-step-plugin [ 21715 ]
            Component/s pipeline [ 21692 ]
            piratejohnny Jon B made changes -
            Summary Pipeline stuck: "The channel is closing down or has closed down" -> Pipeline hangs: "The channel is closing down or has closed down"
            piratejohnny Jon B made changes -
            Component/s remoting [ 15489 ]
            Component/s workflow-durable-task-step-plugin [ 21715 ]
            oleg_nenashev Oleg Nenashev made changes -
            Component/s _unsorted [ 19622 ]
            slaughter550 Alex Slaughter made changes -
            Priority Minor [ 4 ] Major [ 3 ]
            fnaum Federico Naum made changes -
            Assignee Federico Naum [ fnaum ]
            fnaum Federico Naum made changes -
            Attachment jenkins_agent_devbuild9_remoting_logs.zip [ 43244 ]
            Attachment jenkins_agents_Thread_dump.html [ 43245 ]
            Attachment jenkins_Agent_devbuild9_System_Information.html [ 43246 ]
            Attachment jenkins_support_2018-06-29_01.14.18.zip [ 43247 ]
            fnaum Federico Naum made changes -
            Attachment jenkins_agents_Thread_dump.html [ 43248 ]
            fnaum Federico Naum made changes -
            Attachment jenkins_Agent_devbuild9_System_Information.html [ 43249 ]
            fnaum Federico Naum made changes -
            fnaum Federico Naum made changes -
            Attachment jenkins_support_2018-06-29_01.14.18.zip [ 43247 ]
            fnaum Federico Naum made changes -
            Attachment jenkins_agents_Thread_dump.html [ 43245 ]
            fnaum Federico Naum made changes -
            Attachment jenkins_Agent_devbuild9_System_Information.html [ 43249 ]
            fnaum Federico Naum made changes -
            Attachment jobConsoleOutput.txt [ 43294 ]
            fnaum Federico Naum made changes -
            Attachment jobConsoleOutput.txt [ 43295 ]
            Attachment grub.remoting.logs.zip [ 43296 ]
            Attachment NetworkAndMachineStats.png [ 43297 ]
            Attachment JavaMelodyGrubHeapDump_4_07_18.pdf [ 43298 ]
            Attachment JavaMelodyNodeGrubThreads_4_07_18.pdf [ 43299 ]
            Attachment MonitoringJavaelodyOnNodes.html [ 43300 ]
            Attachment grubSystemInformation.html [ 43301 ]
            Attachment Thread dump [Jenkins].html [ 43302 ]
            Attachment support_2018-07-04_07.35.22.zip [ 43303 ]
            Attachment slaveLogInMaster.grub.zip [ 43304 ]
            Attachment jenkins.log [ 43305 ]
            tom_ghyselinck Tom Ghyselinck made changes -
            Assignee Federico Naum [ fnaum ] Oleg Nenashev [ oleg_nenashev ]
            tom_ghyselinck Tom Ghyselinck made changes -
            Link This issue is duplicated by JENKINS-49241 [ JENKINS-49241 ]
            tom_ghyselinck Tom Ghyselinck made changes -
            Link This issue is duplicated by JENKINS-47868 [ JENKINS-47868 ]
            oleg_nenashev Oleg Nenashev made changes -
            Assignee Oleg Nenashev [ oleg_nenashev ] Jeff Thompson [ jthompson ]
            oleg_nenashev Oleg Nenashev made changes -
            Component/s _unsorted [ 19622 ]
            tom_ghyselinck Tom Ghyselinck made changes -
            Link This issue is related to JENKINS-41854 [ JENKINS-41854 ]
            jglick Jesse Glick made changes -
            Summary Pipeline hangs: "The channel is closing down or has closed down" -> Auto retry for elastic agents after channel closure
            Issue Type Bug [ 1 ] New Feature [ 2 ]
            Component/s workflow-durable-task-step-plugin [ 21715 ]
            Component/s remoting [ 15489 ]
            Assignee Jeff Thompson [ jthompson ]
            jglick Jesse Glick made changes -
            Link This issue relates to JENKINS-36013 [ JENKINS-36013 ]
            jglick Jesse Glick made changes -
            Link This issue is duplicated by JENKINS-43607 [ JENKINS-43607 ]
            dubrsl Viacheslav Dubrovskyi made changes -
            Priority Major [ 3 ] Critical [ 2 ]
            amirbarkal Amir Barkal made changes -
            Attachment threadDump.txt [ 45197 ]
            jglick Jesse Glick made changes -
            Remote Link This issue links to "workflow-durable-task-step #104 (Web Link)" [ 22737 ]
            jglick Jesse Glick made changes -
            Link This issue relates to INFRA-2140 [ INFRA-2140 ]
            jglick Jesse Glick made changes -
            Link This issue is duplicated by JENKINS-57675 [ JENKINS-57675 ]
            jglick Jesse Glick made changes -
            Link This issue is duplicated by JENKINS-56673 [ JENKINS-56673 ]
            vlatombe Vincent Latombe made changes -
            Assignee Jesse Glick [ jglick ]
            vlatombe Vincent Latombe made changes -
            Remote Link This issue links to "kubernetes-plugin PR #461 (Web Link)" [ 23226 ]
            jglick Jesse Glick made changes -
            Assignee Jesse Glick [ jglick ]
            allan_burdajewicz Allan BURDAJEWICZ made changes -
            Link This issue relates to JENKINS-59340 [ JENKINS-59340 ]
            jglick Jesse Glick made changes -
            Link This issue relates to JENKINS-61387 [ JENKINS-61387 ]

              People

              • Assignee:
                Unassigned
                Reporter:
                piratejohnny Jon B
              • Votes:
                33
                Watchers:
                47