Jenkins / JENKINS-46680

Computer offline by ping thread leaves the channel half open


    Details


      Description

      Reproducer:

      Launch a local agent using the SSH or command launcher and suspend its process with kill -TSTP $PID. The agent stops responding; Jenkins eventually notices and closes its connection with a clear exception.

      Actual behavior:

      • The channel is never disassociated from its computer, so long-running operations and other clients that only check computer.channel != null keep using it, throwing exceptions everywhere. EDIT: The computer is not even temporarily offline, and the situation does not improve after all node monitors have run, as they all choke on the closed channel.
      • The channel is stuck midway through its closing procedure: it is outClosed but not inClosed. The other end, being suspended, never sends the close command, so the channel is never fully closed. I suspect this is specifically why SlaveComputer#closeChannel() is never called, which causes the previous problem.
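      The half-closed state, and the reporter's hypothesis that cleanup never runs because of it, can be illustrated with a small self-contained model. This is a hypothetical sketch, not the Jenkins remoting implementation: HalfClosedChannelDemo, its Channel type, and the addTerminationListener/onPeerClose methods are invented for illustration, and the model assumes that cleanup listeners fire only once both directions are closed.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of a two-phase channel close; NOT the Jenkins remoting API.
public class HalfClosedChannelDemo {

    static class Channel {
        private boolean outClosed; // our close command has been sent
        private boolean inClosed;  // the peer's close command has arrived
        private final List<Runnable> terminationListeners = new ArrayList<>();

        void addTerminationListener(Runnable listener) {
            terminationListeners.add(listener);
        }

        // Local side initiates the close (e.g. after a ping timeout).
        void close() {
            outClosed = true;
            maybeTerminate();
        }

        // Invoked when the peer's close command is received.
        void onPeerClose() {
            inClosed = true;
            maybeTerminate();
        }

        boolean isOutClosed() { return outClosed; }
        boolean isInClosed() { return inClosed; }

        // Assumption: cleanup listeners fire only once BOTH directions are closed.
        private void maybeTerminate() {
            if (outClosed && inClosed) {
                for (Runnable listener : terminationListeners) {
                    listener.run();
                }
                terminationListeners.clear();
            }
        }
    }

    public static void main(String[] args) {
        Channel channel = new Channel();
        boolean[] detachedFromComputer = {false};
        // Stands in for the cleanup that would disassociate the channel from its computer.
        channel.addTerminationListener(() -> detachedFromComputer[0] = true);

        channel.close(); // we give up on the agent and close our side
        // A suspended agent (kill -TSTP) never sends its close command,
        // so onPeerClose() is never called and the cleanup never runs.
        System.out.println("outClosed=" + channel.isOutClosed()
                + " inClosed=" + channel.isInClosed()
                + " detached=" + detachedFromComputer[0]);
    }
}
```

      In this model, calling onPeerClose() afterwards would complete the close and run the cleanup; a suspended agent never does that, so the channel stays attached.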

      Expected behavior:

      • The broken, half-closed, or fully closed channel is disassociated from the computer, which therefore appears disconnected to all clients.


            Activity

            olivergondza Oliver Gondža added a comment -

            Fix proposed.

            scm_issue_link SCM/JIRA link daemon added a comment -

            Code changed in jenkins
            User: Oliver Gondža
            Path:
            core/src/main/java/hudson/slaves/ChannelPinger.java
            core/src/test/java/hudson/slaves/ChannelPingerTest.java
            test/src/test/java/hudson/slaves/PingThreadTest.java
            http://jenkins-ci.org/commit/jenkins/dbb5e443b96ddc7472207862e9e60d807666f72c
            Log:
            JENKINS-46680 Disconnect computer on ping timeout (#3005)

            • JENKINS-46680 Reproduce in unittest
            • [FIX JENKINS-46680] Reset SlaveComputer channel before closing it on ping timeout
            • JENKINS-46680 Attach channel termination offline cause on ping timeouts
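            The ordering named in the commit log (reset the computer's channel first, record an offline cause, then close the channel) can be sketched as follows. This is a hypothetical model, not the actual ChannelPinger or SlaveComputer code: PingTimeoutFixSketch, Computer, Channel, onPingTimeout, and the offline-cause string are illustrative assumptions. Detaching first means clients that only check for a non-null channel see the computer as disconnected even when close() leaves the channel half-closed.

```java
import java.io.IOException;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical sketch of "reset the channel before closing it"; NOT the actual
// ChannelPinger/SlaveComputer code. All names here are illustrative.
public class PingTimeoutFixSketch {

    interface Channel {
        void close() throws IOException;
    }

    static class Computer {
        private final AtomicReference<Channel> channel = new AtomicReference<>();
        private volatile String offlineCause;

        void setChannel(Channel c) { channel.set(c); }
        Channel getChannel() { return channel.get(); }
        String getOfflineCause() { return offlineCause; }

        void onPingTimeout(String reason) {
            // 1. Detach first: clients checking getChannel() != null now see
            //    the computer as disconnected, whatever close() does next.
            Channel c = channel.getAndSet(null);
            if (c == null) {
                return; // already detached by an earlier timeout
            }
            // 2. Record why the computer went offline.
            offlineCause = "Ping timeout: " + reason;
            // 3. Only then close; this may leave the channel half-closed.
            try {
                c.close();
            } catch (IOException e) {
                // Closing a broken channel can fail; the computer is already detached.
            }
        }
    }

    public static void main(String[] args) {
        Computer computer = new Computer();
        // A channel whose close() returns without the peer ever acknowledging it.
        computer.setChannel(() -> { });
        computer.onPingTimeout("agent did not respond");
        System.out.println("channel=" + computer.getChannel()
                + " cause=" + computer.getOfflineCause());
    }
}
```

            Using getAndSet(null) makes the detach atomic, so in this sketch concurrent ping timeouts close the channel at most once.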
            olivergondza Oliver Gondža added a comment -

            Postponing the backport to 2.73.3, as the fix is fairly new for 2.73.2 and I would like to see it soak properly.

            scm_issue_link SCM/JIRA link daemon added a comment -

            Code changed in jenkins
            User: Oliver Gondža
            Path:
            core/src/main/java/hudson/slaves/ChannelPinger.java
            core/src/test/java/hudson/slaves/ChannelPingerTest.java
            test/src/test/java/hudson/slaves/PingThreadTest.java
            http://jenkins-ci.org/commit/jenkins/06b0cd637c79728d7a9b552c36ca59f5c0260e26
            Log:
            JENKINS-46680 Disconnect computer on ping timeout (#3005)

            • JENKINS-46680 Reproduce in unittest
            • [FIX JENKINS-46680] Reset SlaveComputer channel before closing it on ping timeout
            • JENKINS-46680 Attach channel termination offline cause on ping timeouts

            (cherry picked from commit dbb5e443b96ddc7472207862e9e60d807666f72c)

              People

              • Assignee: olivergondza Oliver Gondža
              • Reporter: olivergondza Oliver Gondža
              • Votes: 0
              • Watchers: 2
