Uploaded image for project: 'Jenkins'
  1. Jenkins
  2. JENKINS-38778

Reserved node leaked from 'DisposableQueue'

XMLWordPrintable

      I've detected a situation when reserved node wasn't put into disposable queue and was reserved forever. I've tried to find out the possible path/place where an exception occurred and ForemanSharedNode.terminate() wasn't invoked, but I failed.

      What I can see from the log is:

      [ForemanAPI] Reserving host X
      [ForemanAPI] Retrieving ip="AA.BB.CC.DD" for X
      [ForemanSharedNode] Instancing a new ForemanSharedNode: name='X', label='LABEL'
      [hudson.slaves.NodeProvisioner] (jenkins.util.Timer [#10]) Started provisioning LABEL from Foreman-test with 1 executors. Remaining excess workload: 0
      

      X, AA.BB.CC.DD and LABEL are obfuscated only. The third reported line is produced by not released branch yet (see here).

      Nothing else was reported for node X. In that time 3 nodes were reserved while network to Foreman was dropped, the first detection was in:

      2016-10-06 03:50:34,331 INFO  [STDOUT] (jenkins.util.Timer [#10]) 03:50:34,330 ERROR [ForemanAPI] Unhandled exception getting JENKINS_LABEL for Y
      
      2016-10-06 03:50:34,331 INFO  [STDOUT] (jenkins.util.Timer [#10]) 03:50:34,331 ERROR [ForemanSharedNodeCloud] com.fasterxml.jackson.databind.JsonMappingException: No content to map due to end-of-input
       at [Source: ; line: 1, column: 1]
      2016-10-06 03:50:34,331 ERROR [STDERR] (jenkins.util.Timer [#10]) com.fasterxml.jackson.databind.JsonMappingException: No content to map due to end-of-input
       at [Source: ; line: 1, column: 1]
      2016-10-06 03:50:34,331 ERROR [STDERR] (jenkins.util.Timer [#10])       at com.fasterxml.jackson.databind.JsonMappingException.from(JsonMappingException.java:148)
      2016-10-06 03:50:34,331 ERROR [STDERR] (jenkins.util.Timer [#10])       at com.fasterxml.jackson.databind.ObjectMapper._initForReading(ObjectMapper.java:3781)
      2016-10-06 03:50:34,331 ERROR [STDERR] (jenkins.util.Timer [#10])       at com.fasterxml.jackson.databind.ObjectMapper._readMapAndClose(ObjectMapper.java:3721)
      2016-10-06 03:50:34,331 ERROR [STDERR] (jenkins.util.Timer [#10])       at com.fasterxml.jackson.databind.ObjectMapper.readValue(ObjectMapper.java:2726)
      2016-10-06 03:50:34,331 ERROR [STDERR] (jenkins.util.Timer [#10])       at com.redhat.foreman.ForemanAPI.getHostParameterValue(ForemanAPI.java:200)
      2016-10-06 03:50:34,331 ERROR [STDERR] (jenkins.util.Timer [#10])       at com.redhat.foreman.ForemanAPI.getHostForQuery(ForemanAPI.java:305)
      2016-10-06 03:50:34,331 ERROR [STDERR] (jenkins.util.Timer [#10])       at com.redhat.foreman.ForemanAPI.getCompatibleHosts(ForemanAPI.java:322)
      2016-10-06 03:50:34,331 ERROR [STDERR] (jenkins.util.Timer [#10])       at com.redhat.foreman.ForemanSharedNodeCloud.canProvision(ForemanSharedNodeCloud.java:159)
      2016-10-06 03:50:34,331 ERROR [STDERR] (jenkins.util.Timer [#10])       at hudson.model.Label.getClouds(Label.java:227)
      2016-10-06 03:50:34,332 ERROR [STDERR] (jenkins.util.Timer [#10])       at hudson.model.Label.isEmpty(Label.java:435)
      2016-10-06 03:50:34,332 ERROR [STDERR] (jenkins.util.Timer [#10])       at jenkins.model.Jenkins.trimLabels(Jenkins.java:1725)
      2016-10-06 03:50:34,332 ERROR [STDERR] (jenkins.util.Timer [#10])       at jenkins.model.Nodes$2.run(Nodes.java:137)
      2016-10-06 03:50:34,332 ERROR [STDERR] (jenkins.util.Timer [#10])       at hudson.model.Queue._withLock(Queue.java:1289)
      2016-10-06 03:50:34,332 ERROR [STDERR] (jenkins.util.Timer [#10])       at hudson.model.Queue.withLock(Queue.java:1172)
      2016-10-06 03:50:34,332 ERROR [STDERR] (jenkins.util.Timer [#10])       at jenkins.model.Nodes.addNode(Nodes.java:132)
      2016-10-06 03:50:34,332 ERROR [STDERR] (jenkins.util.Timer [#10])       at jenkins.model.Jenkins.addNode(Jenkins.java:1693)
      2016-10-06 03:50:34,332 ERROR [STDERR] (jenkins.util.Timer [#10])       at hudson.slaves.NodeProvisioner$2.run(NodeProvisioner.java:226)
      2016-10-06 03:50:34,332 ERROR [STDERR] (jenkins.util.Timer [#10])       at hudson.model.Queue._withLock(Queue.java:1289)
      2016-10-06 03:50:34,332 ERROR [STDERR] (jenkins.util.Timer [#10])       at hudson.model.Queue.withLock(Queue.java:1172)
      2016-10-06 03:50:34,332 ERROR [STDERR] (jenkins.util.Timer [#10])       at hudson.slaves.NodeProvisioner.update(NodeProvisioner.java:208)
      2016-10-06 03:50:34,332 ERROR [STDERR] (jenkins.util.Timer [#10])       at hudson.slaves.NodeProvisioner.access$000(NodeProvisioner.java:57)
      2016-10-06 03:50:34,332 ERROR [STDERR] (jenkins.util.Timer [#10])       at hudson.slaves.NodeProvisioner$NodeProvisionerInvoker.doRun(NodeProvisioner.java:775)
      2016-10-06 03:50:34,332 ERROR [STDERR] (jenkins.util.Timer [#10])       at hudson.triggers.SafeTimerTask.run(SafeTimerTask.java:51)
      2016-10-06 03:50:34,332 ERROR [STDERR] (jenkins.util.Timer [#10])       at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439)
      2016-10-06 03:50:34,332 ERROR [STDERR] (jenkins.util.Timer [#10])       at java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:317)
      2016-10-06 03:50:34,332 ERROR [STDERR] (jenkins.util.Timer [#10])       at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:150)
      2016-10-06 03:50:34,333 ERROR [STDERR] (jenkins.util.Timer [#10])       at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$101(ScheduledThreadPoolExecutor.java:98)
      2016-10-06 03:50:34,333 ERROR [STDERR] (jenkins.util.Timer [#10])       at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.runPeriodic(ScheduledThreadPoolExecutor.java:180)
      2016-10-06 03:50:34,333 ERROR [STDERR] (jenkins.util.Timer [#10])       at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:204)
      2016-10-06 03:50:34,333 ERROR [STDERR] (jenkins.util.Timer [#10])       at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
      2016-10-06 03:50:34,333 ERROR [STDERR] (jenkins.util.Timer [#10])       at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
      2016-10-06 03:50:34,333 ERROR [STDERR] (jenkins.util.Timer [#10])       at java.lang.Thread.run(Thread.java:682)
      

      Y was another slave than X. Each following request to Foreman ended in:

      ERROR [ForemanAPI] java.net.SocketException: Unexpected end of file from server

      As I wrote, 3 nodes were reserved in that time (X, Y, Z) and the Y and Z were correctly released when I established network back. From the log above, there has to be a path during/after provisioning a slave which doesn't ended in ForemanSharedNode.terminate(), but I don't see it in the code.

            pajasoft Pavel Janoušek
            pajasoft Pavel Janoušek
            Votes:
            0 Vote for this issue
            Watchers:
            2 Start watching this issue

              Created:
              Updated:
              Resolved: