Uploaded image for project: 'Jenkins'
  1. Jenkins
  2. JENKINS-19619

Collecting findbugs analysis results occasionally causes ssh slave to go offline causing job to abort

    Details

    • Similar Issues:

      Description

      Collecting findbugs analysis results occasionally causes ssh slave to go offline causing job to abort

      Some of our legacy builds have a large number of Findbugs warnings.
      Parsing these results in the post build action sometimes causes the slave to go offline, and jenkins quickly brings the slave back online but it causes the job to have failed.

      I notice that findbugs outputs a lot of information to the slave.log on the jenkins master
      By the time I check, the slave.log has rotated to slave.log.1 but a new slave.log has not been created.

      From tailing slave.log on Jenkins master:

      Sep 17, 2013 11:52:39 AM hudson.plugins.findbugs.parser.FindBugsParser findSourceFile
      WARNING: Can't resolve absolute file name for file CallbackInterceptorConfigurer.java, dir list = [/tr/j/jh/workspace/tws_trunk_nightly_build/com.aepona.tws.build/build/test-reports/findbugs.xml/src/main/java, /tr/j/jh/workspace/tws_trunk_nightly_build/com.aepona.tws.build/build/test-reports/findbugs.xml/src/test/java, /tr/j/jh/workspace/tws_trunk_nightly_build/com.aepona.tws.build/build/test-reports/findbugs.xml/src]
      tail: `slave.log' has become inaccessible: No such file or directory
      

      list of slave logs: notice how slave.log no longer exists
      slave.log has rolled over to slave.log.1

      $ ll -h
      total 764K
      -rw-r--r-- 1 rcbuild_user cs_sl025 461K Sep 17 11:52 slave.log.1
      -rw-r--r-- 1 rcbuild_user cs_sl025 3.1K Sep 11 10:28 slave.log.10
      -rw-r--r-- 1 rcbuild_user cs_sl025 2.6K Sep 17 08:42 slave.log.2
      -rw-r--r-- 1 rcbuild_user cs_sl025 2.6K Sep 17 08:03 slave.log.3
      -rw-r--r-- 1 rcbuild_user cs_sl025 2.8K Sep 17 07:53 slave.log.4
      -rw-r--r-- 1 rcbuild_user cs_sl025 267K Sep 13 10:33 slave.log.5
      -rw-r--r-- 1 rcbuild_user cs_sl025 2.6K Sep 12 10:03 slave.log.6
      -rw-r--r-- 1 rcbuild_user cs_sl025 2.6K Sep 12 09:37 slave.log.7
      

      From Jenkins log:

      Sep 17, 2013 11:52:38 AM hudson.model.Run execute
      INFO: tws_trunk_nightly_build #306 main build action completed: SUCCESS
      Sep 17, 2013 11:52:39 AM hudson.remoting.SynchronousCommandTransport$ReaderThread run
      SEVERE: I/O error in channel Neshi
      java.io.IOException: Unexpected termination of the channel
              at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:50)
      Caused by: java.io.EOFException
              at java.io.ObjectInputStream$BlockDataInputStream.peekByte(ObjectInputStream.java:2596)
              at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1316)
              at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
              at hudson.remoting.Command.readFrom(Command.java:92)
              at hudson.remoting.ClassicCommandTransport.read(ClassicCommandTransport.java:71)
              at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:48)
      
      Sep 17, 2013 11:52:39 AM hudson.model.AbstractBuild$AbstractBuildExecution performAllBuildSteps
      WARNING: Publisher hudson.plugins.findbugs.FindBugsPublisher aborted due to exception
      hudson.remoting.RequestAbortedException: hudson.remoting.RequestAbortedException: java.io.IOException: Unexpected termination of the channel
              at hudson.remoting.RequestAbortedException.wrapForRethrow(RequestAbortedException.java:41)
              at hudson.remoting.RequestAbortedException.wrapForRethrow(RequestAbortedException.java:34)
              at hudson.remoting.Request.call(Request.java:174)
              at hudson.remoting.Channel.call(Channel.java:714)
              at hudson.FilePath.act(FilePath.java:898)
              at hudson.FilePath.act(FilePath.java:882)
              at hudson.plugins.findbugs.FindBugsPublisher.perform(FindBugsPublisher.java:161)
              at hudson.plugins.analysis.core.HealthAwarePublisher.perform(HealthAwarePublisher.java:144)
              at hudson.plugins.analysis.core.HealthAwareRecorder.perform(HealthAwareRecorder.java:333)
              at hudson.tasks.BuildStepMonitor$1.perform(BuildStepMonitor.java:20)
              at hudson.model.AbstractBuild$AbstractBuildExecution.perform(AbstractBuild.java:782)
              at hudson.model.AbstractBuild$AbstractBuildExecution.performAllBuildSteps(AbstractBuild.java:754)
              at hudson.model.Build$BuildExecution.post2(Build.java:183)
              at hudson.model.AbstractBuild$AbstractBuildExecution.post(AbstractBuild.java:707)
              at hudson.model.Run.execute(Run.java:1629)
              at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:46)
              at hudson.model.ResourceController.execute(ResourceController.java:88)
              at hudson.model.Executor.run(Executor.java:246)
      Caused by: hudson.remoting.RequestAbortedException: java.io.IOException: Unexpected termination of the channel
              at hudson.remoting.Request.abort(Request.java:299)
              at hudson.remoting.Channel.terminate(Channel.java:774)
              at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:69)
      Caused by: java.io.IOException: Unexpected termination of the channel
              at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:50)
      Caused by: java.io.EOFException
              at java.io.ObjectInputStream$BlockDataInputStream.peekByte(ObjectInputStream.java:2596)
              at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1316)
              at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
              at hudson.remoting.Command.readFrom(Command.java:92)
              at hudson.remoting.ClassicCommandTransport.read(ClassicCommandTransport.java:71)
              at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:48)
      

      Slave Reconnect from jenkins.log

      Sep 17, 2013 11:53:09 AM hudson.slaves.SlaveComputer tryReconnect
      INFO: Attempting to reconnect xxx
      

      Log from build:

      11:52:38 BUILD SUCCESSFUL
      11:52:38 Total time: 8 seconds
      11:52:38 [FINDBUGS] Collecting findbugs analysis files...
      11:52:39 ERROR: Publisher hudson.plugins.findbugs.FindBugsPublisher aborted due to exception
      11:52:39 hudson.remoting.RequestAbortedException: hudson.remoting.RequestAbortedException: java.io.IOException: Unexpected termination of the channel
      11:52:39 	at hudson.remoting.RequestAbortedException.wrapForRethrow(RequestAbortedException.java:41)
      11:52:39 	at hudson.remoting.RequestAbortedException.wrapForRethrow(RequestAbortedException.java:34)
      11:52:39 	at hudson.remoting.Request.call(Request.java:174)
      11:52:39 	at hudson.remoting.Channel.call(Channel.java:714)
      11:52:39 	at hudson.FilePath.act(FilePath.java:898)
      11:52:39 	at hudson.FilePath.act(FilePath.java:882)
      11:52:39 	at hudson.plugins.findbugs.FindBugsPublisher.perform(FindBugsPublisher.java:161)
      11:52:39 	at hudson.plugins.analysis.core.HealthAwarePublisher.perform(HealthAwarePublisher.java:144)
      11:52:39 	at hudson.plugins.analysis.core.HealthAwareRecorder.perform(HealthAwareRecorder.java:333)
      11:52:39 	at hudson.tasks.BuildStepMonitor$1.perform(BuildStepMonitor.java:20)
      11:52:39 	at hudson.model.AbstractBuild$AbstractBuildExecution.perform(AbstractBuild.java:782)
      11:52:39 	at hudson.model.AbstractBuild$AbstractBuildExecution.performAllBuildSteps(AbstractBuild.java:754)
      11:52:39 	at hudson.model.Build$BuildExecution.post2(Build.java:183)
      11:52:39 	at hudson.model.AbstractBuild$AbstractBuildExecution.post(AbstractBuild.java:707)
      11:52:39 	at hudson.model.Run.execute(Run.java:1629)
      11:52:39 	at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:46)
      11:52:39 	at hudson.model.ResourceController.execute(ResourceController.java:88)
      11:52:39 	at hudson.model.Executor.run(Executor.java:246)
      11:52:39 Caused by: hudson.remoting.RequestAbortedException: java.io.IOException: Unexpected termination of the channel
      11:52:39 	at hudson.remoting.Request.abort(Request.java:299)
      11:52:39 	at hudson.remoting.Channel.terminate(Channel.java:774)
      11:52:39 	at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:69)
      11:52:39 Caused by: java.io.IOException: Unexpected termination of the channel
      11:52:39 	at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:50)
      11:52:39 Caused by: java.io.EOFException
      11:52:39 	at java.io.ObjectInputStream$BlockDataInputStream.peekByte(ObjectInputStream.java:2596)
      11:52:39 	at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1316)
      11:52:39 	at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
      11:52:39 	at hudson.remoting.Command.readFrom(Command.java:92)
      11:52:39 	at hudson.remoting.ClassicCommandTransport.read(ClassicCommandTransport.java:71)
      11:52:39 	at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:48)
      11:52:39 [PMD] Skipping publisher since build result is FAILURE
      

        Attachments

          Issue Links

            Activity

            gcummings Geoff Cummings created issue -
            gcummings Geoff Cummings made changes -
            Field Original Value New Value
            Link This issue is related to JENKINS-18879 [ JENKINS-18879 ]
            gcummings Geoff Cummings made changes -
            Link This issue is related to JENKINS-18836 [ JENKINS-18836 ]
            Hide
            gcummings Geoff Cummings added a comment -

            For some of our jos the following logging in FindBugsParser.java causes a lot of logging in the slave.log on the jenkins master.

            Logger.getLogger(getClass().getName()).log(Level.WARNING,
                "Can't resolve absolute file name for file " + sourceLine.getSourceFile()
                + ", dir list = " + project.getSourceDirList().toString());
            

            This logging eventually causes problems with Jenkins causing the ssh slave to be disconnected.
            Not sure if it is related to slave.log size, or if it may be the large amount of logging being transferred causes issues with the jenkins master slave communication>

            This may be a core issue around the slave.log, but changing this log level to FINE does not result in the log entries being recorded in the slave.log
            As a result we no longer experience the errors reported and the slave does not get disconnected.

            Show
            gcummings Geoff Cummings added a comment - For some of our jos the following logging in FindBugsParser.java causes a lot of logging in the slave.log on the jenkins master. Logger.getLogger(getClass().getName()).log(Level.WARNING, "Can't resolve absolute file name for file " + sourceLine.getSourceFile() + ", dir list = " + project.getSourceDirList().toString()); This logging eventually causes problems with Jenkins causing the ssh slave to be disconnected. Not sure if it is related to slave.log size, or if it may be the large amount of logging being transferred causes issues with the jenkins master slave communication> This may be a core issue around the slave.log, but changing this log level to FINE does not result in the log entries being recorded in the slave.log As a result we no longer experience the errors reported and the slave does not get disconnected.
            Hide
            drulli Ulli Hafner added a comment -

            Can you please check why FindBugs can't resolve the file names? Normally, FindBugs reports the absolute paths so that the files will be found. How is FindBugs started? I think this would help to track down the problem...

            Show
            drulli Ulli Hafner added a comment - Can you please check why FindBugs can't resolve the file names? Normally, FindBugs reports the absolute paths so that the files will be found. How is FindBugs started? I think this would help to track down the problem...
            Hide
            gcummings Geoff Cummings added a comment -

            First Project which causes this.

            hmm.. it's an old project that uses ant, so I had to go have a look around... and I'm Not overly familiar with how findbugs works with ant.

            The workspace has ~400 projects which each create a jar file.
            The ant build script builds these each project and copies the jars to a build folder and later runs findbugs on the jars.

            workspace/
             - build_project
               - build 
                 - test_reports
                   - findbugs.xml  (findbugs report)
                 - project_1.jar
                 - project_2.jar
             - project_1
             - project_2
             - project_3
            

            looking at the findbugs.xml results, the sourcepath is relative and not absolute.
            so for a file that is in project_1, the sourcepath would be com/company/xxx/MyJavaFile.java
            This causes the plugin to try looking in
            $workspace/build_project/build/test-reports/findbugs.xml/src/test/java

            The file is really in
            $workspace/project_1/src/test/java

            i'll try and get time to get it working correctly...

            Show
            gcummings Geoff Cummings added a comment - First Project which causes this. hmm.. it's an old project that uses ant, so I had to go have a look around... and I'm Not overly familiar with how findbugs works with ant. The workspace has ~400 projects which each create a jar file. The ant build script builds these each project and copies the jars to a build folder and later runs findbugs on the jars. workspace/ - build_project - build - test_reports - findbugs.xml (findbugs report) - project_1.jar - project_2.jar - project_1 - project_2 - project_3 looking at the findbugs.xml results, the sourcepath is relative and not absolute. so for a file that is in project_1, the sourcepath would be com/company/xxx/MyJavaFile.java This causes the plugin to try looking in $workspace/build_project/build/test-reports/findbugs.xml/src/test/java The file is really in $workspace/project_1/src/test/java i'll try and get time to get it working correctly...
            Hide
            gcummings Geoff Cummings added a comment -

            The other project which causes this is giving the following error:

            WARNING: Can't resolve absolute file name for file <Unknown>

            I'll need to spend some time looking into this one in more detail.

            Show
            gcummings Geoff Cummings added a comment - The other project which causes this is giving the following error: WARNING: Can't resolve absolute file name for file <Unknown> I'll need to spend some time looking into this one in more detail.
            Hide
            gcummings Geoff Cummings added a comment -

            This is an old project and it just means I cannot see the findbugs errors in context of the java file, so not a high priority.
            More concerned that it causes the slave to go offline and fail the build.

            main problem might be how jenkins core manages the slave logs.
            changing the logging level just stops findbugs plugin from triggering the core issue...

            Show
            gcummings Geoff Cummings added a comment - This is an old project and it just means I cannot see the findbugs errors in context of the java file, so not a high priority. More concerned that it causes the slave to go offline and fail the build. main problem might be how jenkins core manages the slave logs. changing the logging level just stops findbugs plugin from triggering the core issue...
            Hide
            drulli Ulli Hafner added a comment -

            I see. In the first project, it is clear that the file resolution does not work. Here it could help, that in the ant script the source path is provided for findbugs. I'm using that information afterwards to resolve the file names.

            But you are right, the actual problem still remains. I only suggested the workaround since I can't change the affected code in Jenkins core (or in the ssh plug-in). I already posted a question in the dev list but got no answer so far from the developers. (Actually I think the only one who can help here is Kohsuke since he wrote most of that code). So if there is anything I can do in my plug-in, let me know. Otherwise we need to wait until someone else has the time to look into that part of Jenkins.

            Show
            drulli Ulli Hafner added a comment - I see. In the first project, it is clear that the file resolution does not work. Here it could help, that in the ant script the source path is provided for findbugs. I'm using that information afterwards to resolve the file names. But you are right, the actual problem still remains. I only suggested the workaround since I can't change the affected code in Jenkins core (or in the ssh plug-in). I already posted a question in the dev list but got no answer so far from the developers. (Actually I think the only one who can help here is Kohsuke since he wrote most of that code). So if there is anything I can do in my plug-in, let me know. Otherwise we need to wait until someone else has the time to look into that part of Jenkins.
            drulli Ulli Hafner made changes -
            Description
            Collecting findbugs analysis results occasionally causes ssh slave to go offline causing job to abort

            Some of our legacy builds have a large number of Findbugs warnings.
            Parsing these results in the post build action sometimes causes the slave to go offline, and jenkins quickly brings the slave back online but it causes the job to have failed.

            I notice that findbugs outputs a lot of information to the slave.log on the jenkins master
            By the time I check, the slave.log has rotated to slave.log.1 but a new slave.log has not been created.




            From tailing slave.log on Jenkins master:
            {code}
            Sep 17, 2013 11:52:39 AM hudson.plugins.findbugs.parser.FindBugsParser findSourceFile
            WARNING: Can't resolve absolute file name for file CallbackInterceptorConfigurer.java, dir list = [/tr/j/jh/workspace/tws_trunk_nightly_build/com.aepona.tws.build/build/test-reports/findbugs.xml/src/main/java, /tr/j/jh/workspace/tws_trunk_nightly_build/com.aepona.tws.build/build/test-reports/findbugs.xml/src/test/java, /tr/j/jh/workspace/tws_trunk_nightly_build/com.aepona.tws.build/build/test-reports/findbugs.xml/src]
            tail: `slave.log' has become inaccessible: No such file or directory
            {code}

            list of slave logs: notice how slave.log no longer exists
            slave.log has rolled over to slave.log.1
            {code}
            $ ll -h
            total 764K
            -rw-r--r-- 1 rcbuild_user cs_sl025 461K Sep 17 11:52 slave.log.1
            -rw-r--r-- 1 rcbuild_user cs_sl025 3.1K Sep 11 10:28 slave.log.10
            -rw-r--r-- 1 rcbuild_user cs_sl025 2.6K Sep 17 08:42 slave.log.2
            -rw-r--r-- 1 rcbuild_user cs_sl025 2.6K Sep 17 08:03 slave.log.3
            -rw-r--r-- 1 rcbuild_user cs_sl025 2.8K Sep 17 07:53 slave.log.4
            -rw-r--r-- 1 rcbuild_user cs_sl025 267K Sep 13 10:33 slave.log.5
            -rw-r--r-- 1 rcbuild_user cs_sl025 2.6K Sep 12 10:03 slave.log.6
            -rw-r--r-- 1 rcbuild_user cs_sl025 2.6K Sep 12 09:37 slave.log.7
            {code}



            From Jenkins log:
            {code}
            Sep 17, 2013 11:52:38 AM hudson.model.Run execute
            INFO: tws_trunk_nightly_build #306 main build action completed: SUCCESS
            Sep 17, 2013 11:52:39 AM hudson.remoting.SynchronousCommandTransport$ReaderThread run
            SEVERE: I/O error in channel Neshi
            java.io.IOException: Unexpected termination of the channel
                    at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:50)
            Caused by: java.io.EOFException
                    at java.io.ObjectInputStream$BlockDataInputStream.peekByte(ObjectInputStream.java:2596)
                    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1316)
                    at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
                    at hudson.remoting.Command.readFrom(Command.java:92)
                    at hudson.remoting.ClassicCommandTransport.read(ClassicCommandTransport.java:71)
                    at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:48)

            Sep 17, 2013 11:52:39 AM hudson.model.AbstractBuild$AbstractBuildExecution performAllBuildSteps
            WARNING: Publisher hudson.plugins.findbugs.FindBugsPublisher aborted due to exception
            hudson.remoting.RequestAbortedException: hudson.remoting.RequestAbortedException: java.io.IOException: Unexpected termination of the channel
                    at hudson.remoting.RequestAbortedException.wrapForRethrow(RequestAbortedException.java:41)
                    at hudson.remoting.RequestAbortedException.wrapForRethrow(RequestAbortedException.java:34)
                    at hudson.remoting.Request.call(Request.java:174)
                    at hudson.remoting.Channel.call(Channel.java:714)
                    at hudson.FilePath.act(FilePath.java:898)
                    at hudson.FilePath.act(FilePath.java:882)
                    at hudson.plugins.findbugs.FindBugsPublisher.perform(FindBugsPublisher.java:161)
                    at hudson.plugins.analysis.core.HealthAwarePublisher.perform(HealthAwarePublisher.java:144)
                    at hudson.plugins.analysis.core.HealthAwareRecorder.perform(HealthAwareRecorder.java:333)
                    at hudson.tasks.BuildStepMonitor$1.perform(BuildStepMonitor.java:20)
                    at hudson.model.AbstractBuild$AbstractBuildExecution.perform(AbstractBuild.java:782)
                    at hudson.model.AbstractBuild$AbstractBuildExecution.performAllBuildSteps(AbstractBuild.java:754)
                    at hudson.model.Build$BuildExecution.post2(Build.java:183)
                    at hudson.model.AbstractBuild$AbstractBuildExecution.post(AbstractBuild.java:707)
                    at hudson.model.Run.execute(Run.java:1629)
                    at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:46)
                    at hudson.model.ResourceController.execute(ResourceController.java:88)
                    at hudson.model.Executor.run(Executor.java:246)
            Caused by: hudson.remoting.RequestAbortedException: java.io.IOException: Unexpected termination of the channel
                    at hudson.remoting.Request.abort(Request.java:299)
                    at hudson.remoting.Channel.terminate(Channel.java:774)
                    at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:69)
            Caused by: java.io.IOException: Unexpected termination of the channel
                    at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:50)
            Caused by: java.io.EOFException
                    at java.io.ObjectInputStream$BlockDataInputStream.peekByte(ObjectInputStream.java:2596)
                    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1316)
                    at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
                    at hudson.remoting.Command.readFrom(Command.java:92)
                    at hudson.remoting.ClassicCommandTransport.read(ClassicCommandTransport.java:71)
                    at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:48)
            {code}

            Slave Reconnect from jenkins.log
            {code}
            Sep 17, 2013 11:53:09 AM hudson.slaves.SlaveComputer tryReconnect
            INFO: Attempting to reconnect xxx
            {code}

            Log from build:
            {code}
            11:52:38 BUILD SUCCESSFUL
            11:52:38 Total time: 8 seconds
            11:52:38 [FINDBUGS] Collecting findbugs analysis files...
            11:52:39 ERROR: Publisher hudson.plugins.findbugs.FindBugsPublisher aborted due to exception
            11:52:39 hudson.remoting.RequestAbortedException: hudson.remoting.RequestAbortedException: java.io.IOException: Unexpected termination of the channel
            11:52:39 at hudson.remoting.RequestAbortedException.wrapForRethrow(RequestAbortedException.java:41)
            11:52:39 at hudson.remoting.RequestAbortedException.wrapForRethrow(RequestAbortedException.java:34)
            11:52:39 at hudson.remoting.Request.call(Request.java:174)
            11:52:39 at hudson.remoting.Channel.call(Channel.java:714)
            11:52:39 at hudson.FilePath.act(FilePath.java:898)
            11:52:39 at hudson.FilePath.act(FilePath.java:882)
            11:52:39 at hudson.plugins.findbugs.FindBugsPublisher.perform(FindBugsPublisher.java:161)
            11:52:39 at hudson.plugins.analysis.core.HealthAwarePublisher.perform(HealthAwarePublisher.java:144)
            11:52:39 at hudson.plugins.analysis.core.HealthAwareRecorder.perform(HealthAwareRecorder.java:333)
            11:52:39 at hudson.tasks.BuildStepMonitor$1.perform(BuildStepMonitor.java:20)
            11:52:39 at hudson.model.AbstractBuild$AbstractBuildExecution.perform(AbstractBuild.java:782)
            11:52:39 at hudson.model.AbstractBuild$AbstractBuildExecution.performAllBuildSteps(AbstractBuild.java:754)
            11:52:39 at hudson.model.Build$BuildExecution.post2(Build.java:183)
            11:52:39 at hudson.model.AbstractBuild$AbstractBuildExecution.post(AbstractBuild.java:707)
            11:52:39 at hudson.model.Run.execute(Run.java:1629)
            11:52:39 at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:46)
            11:52:39 at hudson.model.ResourceController.execute(ResourceController.java:88)
            11:52:39 at hudson.model.Executor.run(Executor.java:246)
            11:52:39 Caused by: hudson.remoting.RequestAbortedException: java.io.IOException: Unexpected termination of the channel
            11:52:39 at hudson.remoting.Request.abort(Request.java:299)
            11:52:39 at hudson.remoting.Channel.terminate(Channel.java:774)
            11:52:39 at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:69)
            11:52:39 Caused by: java.io.IOException: Unexpected termination of the channel
            11:52:39 at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:50)
            11:52:39 Caused by: java.io.EOFException
            11:52:39 at java.io.ObjectInputStream$BlockDataInputStream.peekByte(ObjectInputStream.java:2596)
            11:52:39 at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1316)
            11:52:39 at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
            11:52:39 at hudson.remoting.Command.readFrom(Command.java:92)
            11:52:39 at hudson.remoting.ClassicCommandTransport.read(ClassicCommandTransport.java:71)
            11:52:39 at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:48)
            11:52:39 [PMD] Skipping publisher since build result is FAILURE
            {code}
            Collecting findbugs analysis results occasionally causes ssh slave to go offline causing job to abort

            Some of our legacy builds have a large number of Findbugs warnings.
            Parsing these results in the post build action sometimes causes the slave to go offline, and jenkins quickly brings the slave back online but it causes the job to have failed.

            I notice that findbugs outputs a lot of information to the slave.log on the jenkins master
            By the time I check, the slave.log has rotated to slave.log.1 but a new slave.log has not been created.




            From tailing slave.log on Jenkins master:
            {code}
            Sep 17, 2013 11:52:39 AM hudson.plugins.findbugs.parser.FindBugsParser findSourceFile
            WARNING: Can't resolve absolute file name for file CallbackInterceptorConfigurer.java, dir list = [/tr/j/jh/workspace/tws_trunk_nightly_build/com.aepona.tws.build/build/test-reports/findbugs.xml/src/main/java, /tr/j/jh/workspace/tws_trunk_nightly_build/com.aepona.tws.build/build/test-reports/findbugs.xml/src/test/java, /tr/j/jh/workspace/tws_trunk_nightly_build/com.aepona.tws.build/build/test-reports/findbugs.xml/src]
            tail: `slave.log' has become inaccessible: No such file or directory
            {code}

            list of slave logs: notice how slave.log no longer exists
            slave.log has rolled over to slave.log.1
            {code}
            $ ll -h
            total 764K
            -rw-r--r-- 1 rcbuild_user cs_sl025 461K Sep 17 11:52 slave.log.1
            -rw-r--r-- 1 rcbuild_user cs_sl025 3.1K Sep 11 10:28 slave.log.10
            -rw-r--r-- 1 rcbuild_user cs_sl025 2.6K Sep 17 08:42 slave.log.2
            -rw-r--r-- 1 rcbuild_user cs_sl025 2.6K Sep 17 08:03 slave.log.3
            -rw-r--r-- 1 rcbuild_user cs_sl025 2.8K Sep 17 07:53 slave.log.4
            -rw-r--r-- 1 rcbuild_user cs_sl025 267K Sep 13 10:33 slave.log.5
            -rw-r--r-- 1 rcbuild_user cs_sl025 2.6K Sep 12 10:03 slave.log.6
            -rw-r--r-- 1 rcbuild_user cs_sl025 2.6K Sep 12 09:37 slave.log.7
            {code}



            From Jenkins log:
            {code}
            Sep 17, 2013 11:52:38 AM hudson.model.Run execute
            INFO: tws_trunk_nightly_build #306 main build action completed: SUCCESS
            Sep 17, 2013 11:52:39 AM hudson.remoting.SynchronousCommandTransport$ReaderThread run
            SEVERE: I/O error in channel Neshi
            java.io.IOException: Unexpected termination of the channel
                    at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:50)
            Caused by: java.io.EOFException
                    at java.io.ObjectInputStream$BlockDataInputStream.peekByte(ObjectInputStream.java:2596)
                    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1316)
                    at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
                    at hudson.remoting.Command.readFrom(Command.java:92)
                    at hudson.remoting.ClassicCommandTransport.read(ClassicCommandTransport.java:71)
                    at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:48)

            Sep 17, 2013 11:52:39 AM hudson.model.AbstractBuild$AbstractBuildExecution performAllBuildSteps
            WARNING: Publisher hudson.plugins.findbugs.FindBugsPublisher aborted due to exception
            hudson.remoting.RequestAbortedException: hudson.remoting.RequestAbortedException: java.io.IOException: Unexpected termination of the channel
                    at hudson.remoting.RequestAbortedException.wrapForRethrow(RequestAbortedException.java:41)
                    at hudson.remoting.RequestAbortedException.wrapForRethrow(RequestAbortedException.java:34)
                    at hudson.remoting.Request.call(Request.java:174)
                    at hudson.remoting.Channel.call(Channel.java:714)
                    at hudson.FilePath.act(FilePath.java:898)
                    at hudson.FilePath.act(FilePath.java:882)
                    at hudson.plugins.findbugs.FindBugsPublisher.perform(FindBugsPublisher.java:161)
                    at hudson.plugins.analysis.core.HealthAwarePublisher.perform(HealthAwarePublisher.java:144)
                    at hudson.plugins.analysis.core.HealthAwareRecorder.perform(HealthAwareRecorder.java:333)
                    at hudson.tasks.BuildStepMonitor$1.perform(BuildStepMonitor.java:20)
                    at hudson.model.AbstractBuild$AbstractBuildExecution.perform(AbstractBuild.java:782)
                    at hudson.model.AbstractBuild$AbstractBuildExecution.performAllBuildSteps(AbstractBuild.java:754)
                    at hudson.model.Build$BuildExecution.post2(Build.java:183)
                    at hudson.model.AbstractBuild$AbstractBuildExecution.post(AbstractBuild.java:707)
                    at hudson.model.Run.execute(Run.java:1629)
                    at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:46)
                    at hudson.model.ResourceController.execute(ResourceController.java:88)
                    at hudson.model.Executor.run(Executor.java:246)
            Caused by: hudson.remoting.RequestAbortedException: java.io.IOException: Unexpected termination of the channel
                    at hudson.remoting.Request.abort(Request.java:299)
                    at hudson.remoting.Channel.terminate(Channel.java:774)
                    at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:69)
            Caused by: java.io.IOException: Unexpected termination of the channel
                    at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:50)
            Caused by: java.io.EOFException
                    at java.io.ObjectInputStream$BlockDataInputStream.peekByte(ObjectInputStream.java:2596)
                    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1316)
                    at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
                    at hudson.remoting.Command.readFrom(Command.java:92)
                    at hudson.remoting.ClassicCommandTransport.read(ClassicCommandTransport.java:71)
                    at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:48)
            {code}

            Slave Reconnect from jenkins.log
            {code}
            Sep 17, 2013 11:53:09 AM hudson.slaves.SlaveComputer tryReconnect
            INFO: Attempting to reconnect xxx
            {code}

            Log from build:
            {code}
            11:52:38 BUILD SUCCESSFUL
            11:52:38 Total time: 8 seconds
            11:52:38 [FINDBUGS] Collecting findbugs analysis files...
            11:52:39 ERROR: Publisher hudson.plugins.findbugs.FindBugsPublisher aborted due to exception
            11:52:39 hudson.remoting.RequestAbortedException: hudson.remoting.RequestAbortedException: java.io.IOException: Unexpected termination of the channel
            11:52:39 at hudson.remoting.RequestAbortedException.wrapForRethrow(RequestAbortedException.java:41)
            11:52:39 at hudson.remoting.RequestAbortedException.wrapForRethrow(RequestAbortedException.java:34)
            11:52:39 at hudson.remoting.Request.call(Request.java:174)
            11:52:39 at hudson.remoting.Channel.call(Channel.java:714)
            11:52:39 at hudson.FilePath.act(FilePath.java:898)
            11:52:39 at hudson.FilePath.act(FilePath.java:882)
            11:52:39 at hudson.plugins.findbugs.FindBugsPublisher.perform(FindBugsPublisher.java:161)
            11:52:39 at hudson.plugins.analysis.core.HealthAwarePublisher.perform(HealthAwarePublisher.java:144)
            11:52:39 at hudson.plugins.analysis.core.HealthAwareRecorder.perform(HealthAwareRecorder.java:333)
            11:52:39 at hudson.tasks.BuildStepMonitor$1.perform(BuildStepMonitor.java:20)
            11:52:39 at hudson.model.AbstractBuild$AbstractBuildExecution.perform(AbstractBuild.java:782)
            11:52:39 at hudson.model.AbstractBuild$AbstractBuildExecution.performAllBuildSteps(AbstractBuild.java:754)
            11:52:39 at hudson.model.Build$BuildExecution.post2(Build.java:183)
            11:52:39 at hudson.model.AbstractBuild$AbstractBuildExecution.post(AbstractBuild.java:707)
            11:52:39 at hudson.model.Run.execute(Run.java:1629)
            11:52:39 at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:46)
            11:52:39 at hudson.model.ResourceController.execute(ResourceController.java:88)
            11:52:39 at hudson.model.Executor.run(Executor.java:246)
            11:52:39 Caused by: hudson.remoting.RequestAbortedException: java.io.IOException: Unexpected termination of the channel
            11:52:39 at hudson.remoting.Request.abort(Request.java:299)
            11:52:39 at hudson.remoting.Channel.terminate(Channel.java:774)
            11:52:39 at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:69)
            11:52:39 Caused by: java.io.IOException: Unexpected termination of the channel
            11:52:39 at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:50)
            11:52:39 Caused by: java.io.EOFException
            11:52:39 at java.io.ObjectInputStream$BlockDataInputStream.peekByte(ObjectInputStream.java:2596)
            11:52:39 at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1316)
            11:52:39 at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
            11:52:39 at hudson.remoting.Command.readFrom(Command.java:92)
            11:52:39 at hudson.remoting.ClassicCommandTransport.read(ClassicCommandTransport.java:71)
            11:52:39 at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:48)
            11:52:39 [PMD] Skipping publisher since build result is FAILURE
            {code}
            Component/s core [ 15593 ]
            gcummings Geoff Cummings made changes -
            Link This issue is related to JENKINS-19026 [ JENKINS-19026 ]
            Hide
            gcummings Geoff Cummings added a comment -

            Thanks Ulli, I have built a copy of the findbugs plugin with that logging set to FINE to work around it for now.
            Better to get it fixed in core.

            Show
            gcummings Geoff Cummings added a comment - Thanks Ulli, I have built a copy of the findbugs plugin with that logging set to FINE to work around it for now. Better to get it fixed in core.
            stephenconnolly Stephen Connolly made changes -
            Link This issue duplicates JENKINS-18879 [ JENKINS-18879 ]
            Hide
            scm_issue_link SCM/JIRA link daemon added a comment -

            Code changed in jenkins
            User: Stephen Connolly
            Path:
            src/com/trilead/ssh2/channel/Channel.java
            http://jenkins-ci.org/commit/trilead-ssh2/f1353cc0e0aa1b1e6bc845236e4a2530ea3103fd
            Log:
            [FIXED JENKINS-18836][FIXED JENKINS-18879][FIXED JENKINS-19619] remove double call of freeupWindow(len); when using ssh-slaves 0.27+

            • the more performant code path is only followed when using SSH Slaves 0.27+
            • the double call causes the channel to get torn down
            • thus excessive logging to stderr on the slave side of the connection will cause the connection to tear down
            • removing the duplicate call resolves the issue
            Show
            scm_issue_link SCM/JIRA link daemon added a comment - Code changed in jenkins User: Stephen Connolly Path: src/com/trilead/ssh2/channel/Channel.java http://jenkins-ci.org/commit/trilead-ssh2/f1353cc0e0aa1b1e6bc845236e4a2530ea3103fd Log: [FIXED JENKINS-18836] [FIXED JENKINS-18879] [FIXED JENKINS-19619] remove double call of freeupWindow(len); when using ssh-slaves 0.27+ the more performant code path is only followed when using SSH Slaves 0.27+ the double call causes the channel to get torn down thus excessive logging to stderr on the slave side of the connection will cause the connection to tear down removing the duplicate call resolves the issue
            Hide
            scm_issue_link SCM/JIRA link daemon added a comment -

            Code changed in jenkins
            User: Stephen Connolly
            Path:
            changelog.html
            core/pom.xml
            http://jenkins-ci.org/commit/jenkins/bb265c5e95b0fe39128720b903914236962db41b
            Log:
            [FIXED JENKINS-18836][FIXED JENKINS-18879][FIXED JENKINS-19619] Upgrade trilead-ssh to version with the fix

            Show
            scm_issue_link SCM/JIRA link daemon added a comment - Code changed in jenkins User: Stephen Connolly Path: changelog.html core/pom.xml http://jenkins-ci.org/commit/jenkins/bb265c5e95b0fe39128720b903914236962db41b Log: [FIXED JENKINS-18836] [FIXED JENKINS-18879] [FIXED JENKINS-19619] Upgrade trilead-ssh to version with the fix
            Hide
            stephenconnolly Stephen Connolly added a comment -

            Fixed in Jenkins core 1.536

            Show
            stephenconnolly Stephen Connolly added a comment - Fixed in Jenkins core 1.536
            stephenconnolly Stephen Connolly made changes -
            Status Open [ 1 ] Resolved [ 5 ]
            Resolution Fixed [ 1 ]
            Hide
            dogfood dogfood added a comment -

            Integrated in jenkins_main_trunk #2938
            [FIXED JENKINS-18836][FIXED JENKINS-18879][FIXED JENKINS-19619] Upgrade trilead-ssh to version with the fix (Revision bb265c5e95b0fe39128720b903914236962db41b)

            Result = UNSTABLE
            Stephen Connolly : bb265c5e95b0fe39128720b903914236962db41b
            Files :

            • changelog.html
            • core/pom.xml
            Show
            dogfood dogfood added a comment - Integrated in jenkins_main_trunk #2938 [FIXED JENKINS-18836] [FIXED JENKINS-18879] [FIXED JENKINS-19619] Upgrade trilead-ssh to version with the fix (Revision bb265c5e95b0fe39128720b903914236962db41b) Result = UNSTABLE Stephen Connolly : bb265c5e95b0fe39128720b903914236962db41b Files : changelog.html core/pom.xml
            Hide
            drulli Ulli Hafner added a comment -

            Thanks for fixing that Stephen!

            Show
            drulli Ulli Hafner added a comment - Thanks for fixing that Stephen!
            stephenconnolly Stephen Connolly made changes -
            Labels lts-candidate
            Hide
            scm_issue_link SCM/JIRA link daemon added a comment -

            Code changed in jenkins
            User: Stephen Connolly
            Path:
            core/pom.xml
            http://jenkins-ci.org/commit/jenkins/1bb06ada301496ebed6d212188d1b7c9d006317b
            Log:
            [FIXED JENKINS-18836][FIXED JENKINS-18879][FIXED JENKINS-19619] Upgrade trilead-ssh to version with the fix

            (cherry picked from commit bb265c5e95b0fe39128720b903914236962db41b)

            Conflicts:
            changelog.html

            Show
            scm_issue_link SCM/JIRA link daemon added a comment - Code changed in jenkins User: Stephen Connolly Path: core/pom.xml http://jenkins-ci.org/commit/jenkins/1bb06ada301496ebed6d212188d1b7c9d006317b Log: [FIXED JENKINS-18836] [FIXED JENKINS-18879] [FIXED JENKINS-19619] Upgrade trilead-ssh to version with the fix (cherry picked from commit bb265c5e95b0fe39128720b903914236962db41b) Conflicts: changelog.html
            olivergondza Oliver Gondža made changes -
            Labels lts-candidate 1.532.1-fixed
            Hide
            scm_issue_link SCM/JIRA link daemon added a comment -

            Code changed in jenkins
            User: Stephen Connolly
            Path:
            src/com/trilead/ssh2/channel/Channel.java
            http://jenkins-ci.org/commit/trilead-ssh2/5811ddd7ae15670a4f9ad345352613b3f2f2db97
            Log:
            JENKINS-22938 SSH slave connections die after the slave outputs 4MB of stderr, usually during findbugs analysis

            The fix for JENKINS-18836, JENKINS-18879, JENKINS-19619 was incorrect in its analysis.

            • There is no call to getChannelData() on the new code path, so thus you cannot have two calls of freeupWindow()
            • The problem with the original call to freeupWindow() is that it is on the receiver thread. You should not mix the responsibilities. Blocking the receiver thread to send a message will negatively impact performance and connection stability.
            • The correct solution is to push the freeupWindow onto the async queue thus the ACK gets sent and the purity of the receiving thread can be maintained.
            Show
            scm_issue_link SCM/JIRA link daemon added a comment - Code changed in jenkins User: Stephen Connolly Path: src/com/trilead/ssh2/channel/Channel.java http://jenkins-ci.org/commit/trilead-ssh2/5811ddd7ae15670a4f9ad345352613b3f2f2db97 Log: JENKINS-22938 SSH slave connections die after the slave outputs 4MB of stderr, usually during findbugs analysis The fix for JENKINS-18836 , JENKINS-18879 , JENKINS-19619 was incorrect in its analysis. There is no call to getChannelData() on the new code path, so thus you cannot have two calls of freeupWindow() The problem with the original call to freeupWindow() is that it is on the receiver thread. You should not mix the responsibilities. Blocking the receiver thread to send a message will negatively impact performance and connection stability. The correct solution is to push the freeupWindow onto the async queue thus the ACK gets sent and the purity of the receiving thread can be maintained.
            rtyler R. Tyler Croy made changes -
            Workflow JNJira [ 151136 ] JNJira + In-Review [ 193800 ]

              People

              • Assignee:
                drulli Ulli Hafner
                Reporter:
                gcummings Geoff Cummings
              • Votes:
                1 Vote for this issue
                Watchers:
                6 Start watching this issue

                Dates

                • Created:
                  Updated:
                  Resolved: