Uploaded image for project: 'Jenkins'
  1. Jenkins
  2. JENKINS-21251

Slave channel timeout seemingly caused by usage of System.currentTimeMillis()

    Details

    • Similar Issues:

      Description

      My projects run on separate slave machines/nodes.
      Some of these projects have testcases that change the system clock.
      This causes Jenkins and the slave to immediately disconnect. It always happens.

      I've tried to dig into it a bit and here's what I have:
      I'm looking at hudson.remoting.PingThread.java
      The timeout on the channel ping is enforced by calculating the difference between 2 counter values, each obtained with System.currentTimeMillis().
      However System.currentTimeMillis() is not the right way of obtaining such counters - the values are not monotonically increasing.
      If wall-clock time changes for any reason (like maybe Daylight savings or a manual change) then the value returned might become smaller than a moment ago.
      Obtaining the counter values with System.nanoTime() would solve this.

      STACK TRACE below:

      FATAL: Unable to delete script file C:\WINDOWS\TEMP\hudson2959765976006672203.bat
      00:44:00 hudson.util.IOException2: remote file operation failed: C:\WINDOWS\TEMP\hudson2959765976006672203.bat at hudson.remoting.Channel@62ec4e:UTM-RT5AUTOM027
      00:44:00 at hudson.FilePath.act(FilePath.java:901)
      00:44:00 at hudson.FilePath.act(FilePath.java:878)
      00:44:00 at hudson.FilePath.delete(FilePath.java:1263)
      00:44:00 at hudson.tasks.CommandInterpreter.perform(CommandInterpreter.java:101)
      00:44:00 at hudson.tasks.CommandInterpreter.perform(CommandInterpreter.java:60)
      00:44:00 at hudson.tasks.BuildStepMonitor$1.perform(BuildStepMonitor.java:19)
      00:44:00 at hudson.model.AbstractBuild$AbstractBuildExecution.perform(AbstractBuild.java:802)
      00:44:00 at hudson.model.Build$BuildExecution.build(Build.java:199)
      00:44:00 at hudson.model.Build$BuildExecution.doRun(Build.java:160)
      00:44:00 at hudson.model.AbstractBuild$AbstractBuildExecution.run(AbstractBuild.java:584)
      00:44:00 at hudson.model.Run.execute(Run.java:1592)
      00:44:00 at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:46)
      00:44:00 at hudson.model.ResourceController.execute(ResourceController.java:88)
      00:44:00 at hudson.model.Executor.run(Executor.java:237)
      00:44:00 Caused by: hudson.remoting.ChannelClosedException: channel is already closed
      00:44:00 at hudson.remoting.Channel.send(Channel.java:494)
      00:44:00 at hudson.remoting.Request.call(Request.java:129)
      00:44:00 at hudson.remoting.Channel.call(Channel.java:672)
      00:44:00 at hudson.FilePath.act(FilePath.java:894)
      00:44:00 ... 13 more
      00:44:00 Caused by: java.io.IOException: Unexpected termination of the channel
      00:44:00 at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:50)
      00:44:00 Caused by: java.io.EOFException
      00:44:00 at java.io.ObjectInputStream$BlockDataInputStream.peekByte(ObjectInputStream.java:2554)
      00:44:00 at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1297)
      00:44:00 at java.io.ObjectInputStream.readObject(ObjectInputStream.java:351)
      00:44:00 at hudson.remoting.Command.readFrom(Command.java:92)
      00:44:00 at hudson.remoting.ClassicCommandTransport.read(ClassicCommandTransport.java:59)
      00:44:00 at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:48)
      00:44:00 FATAL: hudson.remoting.RequestAbortedException: java.io.IOException: Unexpected termination of the channel
      00:44:00 hudson.remoting.RequestAbortedException: hudson.remoting.RequestAbortedException: java.io.IOException: Unexpected termination of the channel
      00:44:00 at hudson.remoting.Request.call(Request.java:174)
      00:44:00 at hudson.remoting.Channel.call(Channel.java:672)
      00:44:00 at hudson.remoting.RemoteInvocationHandler.invoke(RemoteInvocationHandler.java:158)
      00:44:00 at $Proxy45.join(Unknown Source)
      00:44:00 at hudson.Launcher$RemoteLauncher$ProcImpl.join(Launcher.java:925)
      00:44:00 at hudson.Launcher$ProcStarter.join(Launcher.java:360)
      00:44:00 at hudson.tasks.CommandInterpreter.perform(CommandInterpreter.java:91)
      00:44:00 at hudson.tasks.CommandInterpreter.perform(CommandInterpreter.java:60)
      00:44:00 at hudson.tasks.BuildStepMonitor$1.perform(BuildStepMonitor.java:19)
      00:44:00 at hudson.model.AbstractBuild$AbstractBuildExecution.perform(AbstractBuild.java:802)
      00:44:00 at hudson.model.Build$BuildExecution.build(Build.java:199)
      00:44:00 at hudson.model.Build$BuildExecution.doRun(Build.java:160)
      00:44:00 at hudson.model.AbstractBuild$AbstractBuildExecution.run(AbstractBuild.java:584)
      00:44:00 at hudson.model.Run.execute(Run.java:1592)
      00:44:00 at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:46)
      00:44:00 at hudson.model.ResourceController.execute(ResourceController.java:88)
      00:44:00 at hudson.model.Executor.run(Executor.java:237)
      00:44:00 Caused by: hudson.remoting.RequestAbortedException: java.io.IOException: Unexpected termination of the channel
      00:44:00 at hudson.remoting.Request.abort(Request.java:299)
      00:44:00 at hudson.remoting.Channel.terminate(Channel.java:732)
      00:44:00 at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:69)
      00:44:00 Caused by: java.io.IOException: Unexpected termination of the channel
      00:44:00 at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:50)
      00:44:00 Caused by: java.io.EOFException
      00:44:00 at java.io.ObjectInputStream$BlockDataInputStream.peekByte(ObjectInputStream.java:2554)
      00:44:00 at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1297)
      00:44:00 at java.io.ObjectInputStream.readObject(ObjectInputStream.java:351)
      00:44:00 at hudson.remoting.Command.readFrom(Command.java:92)
      00:44:00 at hudson.remoting.ClassicCommandTransport.read(ClassicCommandTransport.java:59)
      00:44:00 at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:48)

        Attachments

          Issue Links

            Activity

            Hide
            scm_issue_link SCM/JIRA link daemon added a comment -

            Code changed in jenkins
            User: James Nord
            Path:
            src/main/java/hudson/remoting/AtmostOneThreadExecutor.java
            src/main/java/hudson/remoting/Channel.java
            src/main/java/hudson/remoting/PingThread.java
            src/main/java/hudson/remoting/Request.java
            src/main/java/hudson/remoting/SingleLaneExecutorService.java
            src/main/java/hudson/remoting/SynchronousExecutorService.java
            http://jenkins-ci.org/commit/remoting/75fb3559e354418000760573eac19efcd86c267f
            Log:
            JENKINS-21251 Initial calculation of wait time was susseptable to
            clock drift.

            The calculation on how long to wait in various places on the code was
            suseptable to the clock changing as multiple calls where made to
            System.currentTimeMillis() for the initial calculation.

            Changed this so that we use a single call to System.nanoTime and made just
            a single call for the initial calculation.

            There is still a potential issue for any callers of Channel.getLastHeard()
            this will be addressed in a future change and has only been noted in this
            commit.

            Show
            scm_issue_link SCM/JIRA link daemon added a comment - Code changed in jenkins User: James Nord Path: src/main/java/hudson/remoting/AtmostOneThreadExecutor.java src/main/java/hudson/remoting/Channel.java src/main/java/hudson/remoting/PingThread.java src/main/java/hudson/remoting/Request.java src/main/java/hudson/remoting/SingleLaneExecutorService.java src/main/java/hudson/remoting/SynchronousExecutorService.java http://jenkins-ci.org/commit/remoting/75fb3559e354418000760573eac19efcd86c267f Log: JENKINS-21251 Initial calculation of wait time was susseptable to clock drift. The calculation on how long to wait in various places on the code was suseptable to the clock changing as multiple calls where made to System.currentTimeMillis() for the initial calculation. Changed this so that we use a single call to System.nanoTime and made just a single call for the initial calculation. There is still a potential issue for any callers of Channel.getLastHeard() this will be addressed in a future change and has only been noted in this commit.
            Hide
            teilo James Nord added a comment -

            lacostej -
            if you use ntp deamon then probably not as it corrects the clock over a longer time.
            if you use cron and ntpdate then yes.

            Show
            teilo James Nord added a comment - lacostej - if you use ntp deamon then probably not as it corrects the clock over a longer time. if you use cron and ntpdate then yes.
            Hide
            scm_issue_link SCM/JIRA link daemon added a comment -

            Code changed in jenkins
            User: James Nord
            Path:
            src/main/java/hudson/remoting/AtmostOneThreadExecutor.java
            src/main/java/hudson/remoting/Channel.java
            src/main/java/hudson/remoting/PingThread.java
            src/main/java/hudson/remoting/Request.java
            src/main/java/hudson/remoting/SingleLaneExecutorService.java
            src/main/java/hudson/remoting/SynchronousExecutorService.java
            http://jenkins-ci.org/commit/remoting/7f35e250f5d535a9dbe16e67a47443aab568e7c4
            Log:
            Merge pull request #33 from jenkinsci/JENKINS-21251_take2

            JENKINS-21251 Slave channel timeout seemingly caused by usage of System.currentTimeMillis()

            Compare: https://github.com/jenkinsci/remoting/compare/c4df5cb9c458...7f35e250f5d5

            Show
            scm_issue_link SCM/JIRA link daemon added a comment - Code changed in jenkins User: James Nord Path: src/main/java/hudson/remoting/AtmostOneThreadExecutor.java src/main/java/hudson/remoting/Channel.java src/main/java/hudson/remoting/PingThread.java src/main/java/hudson/remoting/Request.java src/main/java/hudson/remoting/SingleLaneExecutorService.java src/main/java/hudson/remoting/SynchronousExecutorService.java http://jenkins-ci.org/commit/remoting/7f35e250f5d535a9dbe16e67a47443aab568e7c4 Log: Merge pull request #33 from jenkinsci/ JENKINS-21251 _take2 JENKINS-21251 Slave channel timeout seemingly caused by usage of System.currentTimeMillis() Compare: https://github.com/jenkinsci/remoting/compare/c4df5cb9c458...7f35e250f5d5
            Hide
            scm_issue_link SCM/JIRA link daemon added a comment -

            Code changed in jenkins
            User: Kohsuke Kawaguchi
            Path:
            changelog.html
            core/pom.xml
            pom.xml
            http://jenkins-ci.org/commit/jenkins/72e0b9c53a15306e0a7b7117100f4d1b7ec250b8
            Log:
            [FIXED JENKINS-6167 JENKINS-21251 JENKINS-21160]

            Integrated a newer version of remoting & args4j

            Show
            scm_issue_link SCM/JIRA link daemon added a comment - Code changed in jenkins User: Kohsuke Kawaguchi Path: changelog.html core/pom.xml pom.xml http://jenkins-ci.org/commit/jenkins/72e0b9c53a15306e0a7b7117100f4d1b7ec250b8 Log: [FIXED JENKINS-6167 JENKINS-21251 JENKINS-21160] Integrated a newer version of remoting & args4j
            Hide
            oleg_nenashev Oleg Nenashev added a comment -

            Moving to Resolved since there is a "take2". Please reopen if the issue still happens

            Show
            oleg_nenashev Oleg Nenashev added a comment - Moving to Resolved since there is a "take2". Please reopen if the issue still happens

              People

              • Assignee:
                teilo James Nord
                Reporter:
                benc666 Ben Combrink
              • Votes:
                3 Vote for this issue
                Watchers:
                7 Start watching this issue

                Dates

                • Created:
                  Updated:
                  Resolved: