  Jenkins / JENKINS-42428

Jenkins master throwing java.io.IOException when running pipeline in swarm client

    Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Component/s: swarm-plugin
    • Labels:
    • Environment:
      Master:
      Jenkins Version : 2.32.2
      Running on Windows Server 2012 R2
      Pipeline: Nodes and Processes 2.10 (works fine in 2.8)

      Client:
      Swarm Client 3.3 on AIX 7.1 / JDK 8

      Description

      I'm trying to run a pipeline job on an agent that connects using the Swarm client. The job runs fine, but I'm getting a lot of error messages like the following in the build log:

      Cannot contact tst_db2: java.io.IOException: Remote call on Channel to /XX.XX.XX.XXX failed
      

      (actual IP address replaced with XX)

      From what I can see, the master throws these errors while it is waiting for the script running on the agent. Again, the pipeline job runs perfectly; the only problem is these errors in the pipeline log.
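To picture the observation above, here is a hypothetical model (not the plugin's actual code) of a master that repeatedly polls an agent for the status of a running script. Polling is an assumption made for illustration. Each poll whose remote call fails with an IOException produces one "Cannot contact ..." line, which would explain why the same message can appear many times during a single long-running `sh` step while the job still succeeds. The `RemoteCheck` interface, the `poll` method and its `reportErrors` flag are all invented names; `reportErrors = false` stands in for a quieter behavior in which such failures never reach the build log.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of a master polling an agent for script status;
// this is NOT the workflow-durable-task-step implementation.
class PollReportDemo {
    /** Hypothetical stand-in for a remote status call made over the agent channel. */
    interface RemoteCheck {
        boolean isDone() throws IOException;
    }

    /**
     * Polls up to maxPolls times, stopping once the task reports completion.
     * Each failed remote call adds one line to the returned "build log" when
     * reportErrors is true; when false, the failure is swallowed silently.
     */
    static List<String> poll(String node, RemoteCheck check, int maxPolls, boolean reportErrors) {
        List<String> buildLog = new ArrayList<>();
        for (int i = 0; i < maxPolls; i++) {
            try {
                if (check.isDone()) {
                    break; // script finished; stop polling
                }
            } catch (IOException x) {
                if (reportErrors) {
                    // IOException.toString() yields "java.io.IOException: <message>"
                    buildLog.add("Cannot contact " + node + ": " + x);
                }
                // keep polling: a failed status check does not fail the build
            }
        }
        return buildLog;
    }

    public static void main(String[] args) {
        int[] calls = {0};
        // Simulated agent: the first three status calls fail, the fourth succeeds.
        RemoteCheck flaky = () -> {
            calls[0]++;
            if (calls[0] < 4) {
                throw new IOException("Remote call on worker_node1 failed");
            }
            return true;
        };
        poll("worker_node1", flaky, 10, true).forEach(System.out::println);
    }
}
```

Running `main` prints "Cannot contact worker_node1: java.io.IOException: Remote call on worker_node1 failed" three times, once per failed poll, and the loop then ends normally, mirroring a job that succeeds despite the noise.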

      Below is my pipeline script:

      pipeline {
          agent none
          stages {
              stage('Recreate DB') {
                  agent { label 'tst_db2'}
                  steps {
                      checkout([$class: 'SubversionSCM', 
                        additionalCredentials: [], 
                        excludedCommitMessages: '', 
                        excludedRegions: '', 
                        excludedRevprop: '', 
                        excludedUsers: '', 
                        filterChangelog: false, 
                        ignoreDirPropChanges: false, 
                        includedRegions: '', 
                        locations: [[credentialsId: 'a84f7197-929a-437e-9aac-ca09fcd4c63a', 
                                     depthOption: 'infinity', 
                                     ignoreExternalsOption: true, 
                                     local: '', 
                                     remote: 'svn://XXXXX/XXX/tags/CR/Rebuild_VCRDWD01']], 
                        workspaceUpdater: [$class: 'CheckoutUpdater']])  
      
                       sh 'Rebuild_VCRDWD01/recreate_db.sh'
                  }
              }       
          }
      }   
      

      Is there any way we can get rid of these errors?

            Activity

            jglick Jesse Glick added a comment -

            A recent version of workflow-durable-task-step began reporting connectivity errors that had previously been suppressed unless you happened to have a FINE logger on DurableTaskStep. The error is somewhere in the Remoting layer, generally specific to the agent connection method.
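As a rough illustration of why these messages were previously invisible: java.util.logging drops a FINE record unless the logger's effective level and the level of the handler receiving the record are both FINE or lower. The plain-Java sketch below, run outside Jenkins, shows that effect for a logger named after DurableTaskStep. The fully qualified name `org.jenkinsci.plugins.workflow.steps.durable_task.DurableTaskStep` is our assumption about the plugin's package, and `FineLoggerDemo`, `CollectingHandler` and `emitWithLevel` are hypothetical names. Inside Jenkins, the equivalent is normally done under Manage Jenkins > System Log by adding a log recorder for that logger name at level FINE.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.logging.Handler;
import java.util.logging.Level;
import java.util.logging.LogRecord;
import java.util.logging.Logger;

class FineLoggerDemo {
    // Assumed logger name: the DurableTaskStep class in workflow-durable-task-step.
    static final String LOGGER_NAME =
        "org.jenkinsci.plugins.workflow.steps.durable_task.DurableTaskStep";

    /** Handler that keeps every record passing its level check, for inspection. */
    static final class CollectingHandler extends Handler {
        final List<LogRecord> records = new ArrayList<>();
        @Override public void publish(LogRecord r) { if (isLoggable(r)) records.add(r); }
        @Override public void flush() {}
        @Override public void close() {}
    }

    /** Sets logger and handler to the given level, emits one FINE message, returns what was captured. */
    static List<String> emitWithLevel(Level level) {
        Logger logger = Logger.getLogger(LOGGER_NAME);
        CollectingHandler handler = new CollectingHandler();
        handler.setLevel(level);
        logger.setLevel(level);
        logger.setUseParentHandlers(false); // keep the demo quiet on the console
        logger.addHandler(handler);
        try {
            // A hypothetical connectivity message logged at FINE.
            logger.fine("Cannot contact agent: java.io.IOException: Remote call failed");
        } finally {
            logger.removeHandler(handler);
        }
        List<String> out = new ArrayList<>();
        for (LogRecord r : handler.records) out.add(r.getMessage());
        return out;
    }

    public static void main(String[] args) {
        System.out.println("INFO: " + emitWithLevel(Level.INFO).size()); // FINE record dropped
        System.out.println("FINE: " + emitWithLevel(Level.FINE).size()); // FINE record kept
    }
}
```

Running `main` prints `INFO: 0` and then `FINE: 1`: the same message is only captured once the level is lowered to FINE.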

            jglick Jesse Glick added a comment -

            Quoting Brent Laster:

            Even seems to happen in the very simplest test case where you have a Linux slave on the same system as a master (no swarm involved).

            Then you are probably seeing some unrelated issue. Use the logger to diagnose more precisely.

            bclaster Brent Laster added a comment -

            Jesse Glick: I am getting the same error message repeatedly under the same circumstances, whenever there is activity on an agent (on every agent, actually). And this same error message goes away when I revert the same plugin.

            That certainly doesn't seem like an unrelated issue. How do I use the logger to diagnose more precisely?

            Cannot contact worker_node1: java.io.IOException: Remote call on worker_node1 failed
            Cannot contact worker_node1: java.io.IOException: Remote call on worker_node1 failed
            Cannot contact worker_node1: java.io.IOException: Remote call on worker_node1 failed
            Cannot contact worker_node1: java.io.IOException: Remote call on worker_node1 failed
            Cannot contact worker_node1: java.io.IOException: Remote call on worker_node1 failed
            Cannot contact worker_node1: java.io.IOException: Remote call on worker_node1 failed
            Cannot contact worker_node1: java.io.IOException: Remote call on worker_node1 failed
            Cannot contact worker_node1: java.io.IOException: Remote call on worker_node1 failed
            Cannot contact worker_node1: java.io.IOException: Remote call on worker_node1 failed
            Running shell script
            [worker2] + /usr/share/gradle/bin/gradle -D test.single=TestExample3 test
            [worker2] Cannot contact worker_node2: java.io.IOException: Remote call on worker_node2 failed
            [worker2] Cannot contact worker_node2: java.io.IOException: Remote call on worker_node2 failed
            [worker2] Cannot contact worker_node2: java.io.IOException: Remote call on worker_node2 failed
            [worker2] Cannot contact worker_node2: java.io.IOException: Remote call on worker_node2 failed
            [worker2] Cannot contact worker_node2: java.io.IOException: Remote call on worker_node2 failed
            [worker2] Cannot contact worker_node2: java.io.IOException: Remote call on worker_node2 failed
            [worker2] Cannot contact worker_node2: java.io.IOException: Remote call on worker_node2 failed
            [worker2] Cannot contact worker_node2: java.io.IOException: Remote call on worker_node2 failed

            jglick Jesse Glick added a comment -

            See logging instructions in my previous comment.

            tparker1 Tammy Parker added a comment - - edited

            I am having a similar problem, but seemingly with additional ramifications. My Jenkins master is running on Linux with Jenkins 2.32.3.

            I have a pipeline job based upon the parallel multiple nodes example found at: 

            https://jenkins.io/doc/pipeline/examples/

            // Parallel JNI Build
            if ("${run_jni}" == "true") {
                
            stage ('Run jni builds on each platform') {    
            
            def labels = ['winky', 'harry', 'hagrid', 'lnxec333']
            //def labels = ['winky', 'hannah', 'moss', 'lnxec651']
            def ws_list = ['rm_lnx_86dv', 'rm_win_86dv', 'rm_aix_86dv', 'rm_zlnx_86dv']
            
            Integer i=0
            def builders = [:]
            for ( x in labels ) {
                def label = x
                def ws = ws_list[i]
                builders[label] = {
                    node(label) {
                                    
                        stage ('Checkout the code on ' + label) {
                          if (isUnix()) {
                            checkout([$class: 'RTCScm', avoidUsingToolkit: false, buildTool: '4.0.2 Toolkit', buildType: [buildWorkspace: ws, clearLoadDirectory: true, loadDirectory: SB_ROOT_unix, value: 'buildWorkspace'], credentialsId: 'efe12e69-483c-4ed0-becd-a63180224c00', overrideGlobal: false, serverURI: 'https://xxxx', timeout: 480])
                          } else {
                              env.SB_ROOT_win = "${SB_ROOT_win}"
                              bat '''rd /s/q %SB_ROOT_win%
                                    exit 0'''
                            checkout([$class: 'RTCScm', avoidUsingToolkit: false, buildTool: '4.0.2 Toolkit', buildType: [buildWorkspace: ws, clearLoadDirectory: false, loadDirectory: SB_ROOT_win, value: 'buildWorkspace'], credentialsId: 'efe12e69-483c-4ed0-becd-a63180224c00', overrideGlobal: true, serverURI: 'https://xxxx', timeout: 480])
                          }
                        }
                        
                        stage ('Run the build on ' + label) {
                            
                              switch(label) {
                                  case 'hagrid':
                                      load "${SB_ROOT_unix}/env_aix.properties"
                                      withCredentials([usernamePassword(credentialsId: 'efe12e69-483c-4ed0-becd-a63180224c00', passwordVariable: 'rtcPassword', usernameVariable: 'rtcUser')]) {
                                       withEnv(["JAVA_HOME=${env.JAVA_AIX_HOME}","TSM_HOME=${env.TSM_HOME_aix}","ANT_HOME=${env.ANT_HOME_aix}","COMPILER_HOME=${env.COMPILER_HOME_aix}","PATH=${env.JAVA_AIX_HOME}/bin:${env.ANT_HOME_aix}/bin:${env.TSM_HOME_aix}/api/bin:${env.COMPILER_HOME_aix}/bin:${env.PATH}"]) {
                                        sh '''cd ${SB_ROOT_unix}
                                           './gradlew' -PJAVA_HOME=${JAVA_HOME} -PnexusUsername=${nexusUsername} -PnexusUrl=${nexusUrl} -PnexusPassword=$rtcPassword -PPATH=$PATH showEnv rmjni:clean rmjni:build rmjni:upload rmjni:clean'''
            
                                        }
                                      }
                                      break;
                                  case ['winky', 'lnxec651'] :
                                      load "${SB_ROOT_unix}/env_lnx.properties"
                                      withCredentials([usernamePassword(credentialsId: 'efe12e69-483c-4ed0-becd-a63180224c00', passwordVariable: 'rtcPassword', usernameVariable: 'rtcUser')]) {
                                        withEnv(["JAVA_HOME=${env.JAVA_LNX_HOME}","TSM_HOME=${env.TSM_HOME_lnx}","ANT_HOME=${env.ANT_HOME_lnx}","COMPILER_HOME=${env.COMPILER_HOME_lnx}","PATH=${env.JAVA_LNX_HOME}/bin:${env.ANT_HOME_lnx}/bin:${env.TSM_HOME_lnx}/api/bin:${env.COMPILER_HOME_lnx}/bin:${env.PATH}"]) {
                                        sh '''cd ${SB_ROOT_unix}
                                            './gradlew' -PJAVA_HOME=${JAVA_HOME} -PnexusUsername=${nexusUsername} -PnexusUrl=${nexusUrl} -PnexusPassword=$rtcPassword rmjni:clean rmjni:build rmjni:upload rmjni:clean'''
                                        }
                                      }
                                      break;
                                 case 'lnxec333' :
                                      load "${SB_ROOT_unix}/env_zlnx.properties"
                                      withCredentials([usernamePassword(credentialsId: 'efe12e69-483c-4ed0-becd-a63180224c00', passwordVariable: 'rtcPassword', usernameVariable: 'rtcUser')]) {
                                        withEnv(["JAVA_HOME=${env.JAVA_LNX_HOME}","TSM_HOME=${env.TSM_HOME_zlnx}","ANT_HOME=${env.ANT_HOME_zlnx}","COMPILER_HOME=${env.COMPILER_HOME_zlnx}","PATH=${env.JAVA_LNX_HOME}/bin:${env.ANT_HOME_zlnx}/bin:${env.TSM_HOME_zlnx}/api/bin:${env.COMPILER_HOME_zlnx}/bin:${env.PATH}"]) {
                                        sh '''export BUILD_NUMBER=$BUILD_NUMBER
                                            cd ${SB_ROOT_unix}
                                            './gradlew' -PJAVA_HOME=${JAVA_HOME} -PnexusUsername=${nexusUsername} -PnexusUrl=${nexusUrl} -PnexusPassword=$rtcPassword rmjni:clean rmjni:build rmjni:upload rmjni:clean'''
                                        }
                                      }
                                      break;      
                                  case ['hannah', 'harry'] :
                                      env.SB_ROOT_win = "${SB_ROOT_win}"
                                      env.nexusUsername = "${nexusUsername}"
                                      env.nexusUrl = "${nexusUrl}"
                                      load "${SB_ROOT_win}/env_win.properties"
                                      withCredentials([usernamePassword(credentialsId: 'efe12e69-483c-4ed0-becd-a63180224c00', passwordVariable: 'rtcPassword', usernameVariable: 'rtcUser')]) {
                                          withEnv(["JAVA_HOME=${env.JAVA_WIN_HOME}","TSM_HOME=${env.TSM_HOME_win}","ANT_HOME=${env.ANT_HOME_win}","COMPILER_HOME=${env.COMPILER_HOME_win}","PATH=${env.JAVA_WIN_HOME}\\bin;${env.ANT_HOME_win}\\bin;${env.TSM_HOME_win};${env.COMPILER_HOME_win}\\bin;C:\\UNXTOOLS\\usr\\local\\wbin;c:\\WinZip;c:\\grep;C:\\WINDOWS\\SYSTEM32;C:\\WINDOWS;c:\\WINDOWS\\SYSTEM32\\WBEM;c:\\msvs2008\\Common7\\Tools;c:\\msvs2008\\Common7\\Tools\\Bin;c:\\msvs2008\\vc\\bin;c:\\msvs2008\\common7\\ide;c:\\msvs2008\\common7\\tools;C:\\PROGRA~1\\MIA713~1\\Windows\\v6.0A\\bin"]) {
                                          bat '''cd %SB_ROOT_win%
                                                 gradlew.bat -PnexusUsername=%nexusUsername% -PnexusUrl=%nexusUrl% -PnexusPassword=%rtcPassword% -PCOMPILER_HOME=%COMPILER_HOME% -PPATH=%PATH% rmjni:clean rmjni:build rmjni:upload rmjni:clean'''
                                          }        
                                      }  
                                      break;
                                  default:
                                    echo "do nothing"
                                    break;
                              }
                                         
                        }
                    }
                }
                i++
            }
            
            parallel builders
            
            }
            }
            

            The nodes are all on different platforms (Windows, Linux, AIX and s390 Linux), and things worked fine when I was using a set of nodes that had been set up to build a prior release of our product. So far, I have set up new machines/nodes for Windows, AIX and s390 Linux.
            When I use those new nodes, the AIX and s390 Linux nodes show the problem described in this issue. That in itself is not that bad, but the same nodes also somehow forget/lose the build number partway through their build. This is a problem because I use the build number as part of the artifact name that gets uploaded to our Nexus repository, and the upload fails because Nexus thinks I am trying to update a previous artifact.
            If I reboot these systems, the build passes (it doesn't 'lose' the build number), but as soon as I run a subsequent build, it breaks again with the same issue.

            I am currently using Pipeline: Nodes and Processes 2.10; I can try downgrading to 2.8. It is just strange that this is only an issue for my new nodes. I can see that the old and new nodes are using the same version of slave.jar (3.4.1).
            I suspect it must be some configuration issue that I am missing.


              People

              • Assignee: Unassigned
              • Reporter: Marlon Cenita (mcenita)
              • Votes: 15
              • Watchers: 26
