Jenkins / JENKINS-57805

Exception thrown from ExecutorStepExecution.PlaceHolderTask.getAffinityKey breaks Queue maintenance

    Details

    • Released As: workflow-api 2.35, Jenkins 2.181

      Description

      I have seen the following exception break Queue maintenance and make it impossible for new jobs to be scheduled:

      2019-04-29 14:24:14.748+0000 [id=118]	SEVERE	hudson.triggers.SafeTimerTask#run: Timer task hudson.model.Queue$MaintainTask@7edb0f64 failed
      java.lang.IndexOutOfBoundsException: Index: 0
      	at java.util.Collections$EmptyList.get(Collections.java:4454)
      	at org.jenkinsci.plugins.workflow.graph.StandardGraphLookupView.bruteForceScanForEnclosingBlock(StandardGraphLookupView.java:150)
      	at org.jenkinsci.plugins.workflow.graph.StandardGraphLookupView.findEnclosingBlockStart(StandardGraphLookupView.java:197)
      	at org.jenkinsci.plugins.workflow.graph.StandardGraphLookupView.findAllEnclosingBlockStarts(StandardGraphLookupView.java:217)
      	at org.jenkinsci.plugins.workflow.flow.FlowExecution.findAllEnclosingBlockStarts(FlowExecution.java:299)
      	at org.jenkinsci.plugins.workflow.graph.FlowNode.getEnclosingBlocks(FlowNode.java:195)
      	at org.jenkinsci.plugins.workflow.support.steps.ExecutorStepExecution$PlaceholderTask.concatenateAllEnclosingLabels(ExecutorStepExecution.java:527)
      	at org.jenkinsci.plugins.workflow.support.steps.ExecutorStepExecution$PlaceholderTask.getAffinityKey(ExecutorStepExecution.java:545)
      	at hudson.model.LoadBalancer$1.assignGreedily(LoadBalancer.java:115)
      	at hudson.model.LoadBalancer$1.map(LoadBalancer.java:105)
      	at hudson.model.LoadBalancer$2.map(LoadBalancer.java:157)
      	at hudson.model.Queue.maintain(Queue.java:1629)
      	at hudson.model.Queue$MaintainTask.doRun(Queue.java:2887)
      	at hudson.triggers.SafeTimerTask.run(SafeTimerTask.java:72)
      	at jenkins.security.ImpersonatingScheduledExecutorService$1.run(ImpersonatingScheduledExecutorService.java:58)
      	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
      	at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
      	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
      	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
      	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
      	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
      	at java.lang.Thread.run(Thread.java:748)
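
      The IndexOutOfBoundsException above comes from calling get(0) on an immutable empty list. A minimal sketch (not Jenkins code; the class and method names are illustrative stand-ins) of how a node whose parents could not be loaded triggers it:

```java
import java.util.Collections;
import java.util.List;

// Sketch of the failure mode: when a FlowNode's parents fail to
// deserialize, callers can end up holding an empty parent list, and code
// that assumes at least one parent fails exactly like the trace above.
public class EmptyParentsDemo {

    // Stand-in for a corrupt node whose parent list could not be loaded.
    static List<String> parentsOfCorruptNode() {
        return Collections.emptyList();
    }

    // Returns the message of the exception raised by the bad get(0) call.
    static String demo() {
        List<String> parents = parentsOfCorruptNode();
        try {
            // StandardGraphLookupView.bruteForceScanForEnclosingBlock
            // effectively does parents.get(0) while walking up the graph.
            return parents.get(0);
        } catch (IndexOutOfBoundsException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println("IndexOutOfBoundsException: " + demo());
    }
}
```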
      

      As best I can tell, the root cause was that a channel reading a flow node's parent node XML file was interrupted, which threw an exception and left the flow graph corrupted. (I am not sure how or why the read was interrupted, but since the same thing would happen if the file were corrupted on disk, it seems like a reasonable failure mode to handle.) Here is the stack trace for that issue:

      2019-04-29 14:24:14.745+0000 [id=118]	WARNING	o.j.p.workflow.graph.FlowNode#loadParents: failed to load parents of 96
      java.nio.channels.ClosedByInterruptException
      	at java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:202)
      	at sun.nio.ch.FileChannelImpl.read(FileChannelImpl.java:164)
      	at sun.nio.ch.ChannelInputStream.read(ChannelInputStream.java:65)
      	at sun.nio.ch.ChannelInputStream.read(ChannelInputStream.java:109)
      	at sun.nio.ch.ChannelInputStream.read(ChannelInputStream.java:103)
      	at java.io.BufferedInputStream.fill(BufferedInputStream.java:246)
      	at java.io.BufferedInputStream.read(BufferedInputStream.java:265)
      	at java.io.FilterInputStream.read(FilterInputStream.java:83)
      	at java.io.PushbackInputStream.read(PushbackInputStream.java:139)
      	at com.thoughtworks.xstream.core.util.XmlHeaderAwareReader.getHeader(XmlHeaderAwareReader.java:79)
      	at com.thoughtworks.xstream.core.util.XmlHeaderAwareReader.<init>(XmlHeaderAwareReader.java:61)
      	at com.thoughtworks.xstream.io.xml.AbstractXppDriver.createReader(AbstractXppDriver.java:65)
      Caused: com.thoughtworks.xstream.io.StreamException:  : null
      	at com.thoughtworks.xstream.io.xml.AbstractXppDriver.createReader(AbstractXppDriver.java:69)
      	at com.thoughtworks.xstream.XStream.fromXML(XStream.java:1053)
      	at hudson.XmlFile.read(XmlFile.java:147)
      Caused: java.io.IOException: Unable to read /var/jenkins_home/jobs/foo/jobs/bar/builds/163/workflow/95.xml
      	at hudson.XmlFile.read(XmlFile.java:149)
      	at org.jenkinsci.plugins.workflow.support.storage.SimpleXStreamFlowNodeStorage.load(SimpleXStreamFlowNodeStorage.java:204)
      	at org.jenkinsci.plugins.workflow.support.storage.SimpleXStreamFlowNodeStorage.access$000(SimpleXStreamFlowNodeStorage.java:71)
      	at org.jenkinsci.plugins.workflow.support.storage.SimpleXStreamFlowNodeStorage$1.load(SimpleXStreamFlowNodeStorage.java:76)
      	at org.jenkinsci.plugins.workflow.support.storage.SimpleXStreamFlowNodeStorage$1.load(SimpleXStreamFlowNodeStorage.java:74)
      	at com.google.common.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3568)
      	at com.google.common.cache.LocalCache$Segment.loadSync(LocalCache.java:2350)
      	at com.google.common.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2313)
      	at com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2228)
      	at com.google.common.cache.LocalCache.get(LocalCache.java:3965)
      	at com.google.common.cache.LocalCache.getOrLoad(LocalCache.java:3969)
      	at com.google.common.cache.LocalCache$LocalManualCache.get(LocalCache.java:4829)
      	at org.jenkinsci.plugins.workflow.support.storage.SimpleXStreamFlowNodeStorage.getNode(SimpleXStreamFlowNodeStorage.java:101)
      Caused: java.io.IOException
      	at org.jenkinsci.plugins.workflow.support.storage.SimpleXStreamFlowNodeStorage.getNode(SimpleXStreamFlowNodeStorage.java:109)
      	at org.jenkinsci.plugins.workflow.cps.CpsFlowExecution$TimingFlowNodeStorage.getNode(CpsFlowExecution.java:1791)
      	at org.jenkinsci.plugins.workflow.cps.CpsFlowExecution.getNode(CpsFlowExecution.java:1179)
      	at org.jenkinsci.plugins.workflow.graph.FlowNode.loadParents(FlowNode.java:165)
      	at org.jenkinsci.plugins.workflow.graph.FlowNode.getParents(FlowNode.java:156)
      	at org.jenkinsci.plugins.workflow.graph.StandardGraphLookupView.bruteForceScanForEnclosingBlock(StandardGraphLookupView.java:150)
      	at org.jenkinsci.plugins.workflow.graph.StandardGraphLookupView.findEnclosingBlockStart(StandardGraphLookupView.java:197)
      	at org.jenkinsci.plugins.workflow.graph.StandardGraphLookupView.findAllEnclosingBlockStarts(StandardGraphLookupView.java:217)
      	at org.jenkinsci.plugins.workflow.flow.FlowExecution.findAllEnclosingBlockStarts(FlowExecution.java:299)
      	at org.jenkinsci.plugins.workflow.graph.FlowNode.getEnclosingBlocks(FlowNode.java:195)
      	at org.jenkinsci.plugins.workflow.support.steps.ExecutorStepExecution$PlaceholderTask.concatenateAllEnclosingLabels(ExecutorStepExecution.java:527)
      	at org.jenkinsci.plugins.workflow.support.steps.ExecutorStepExecution$PlaceholderTask.getAffinityKey(ExecutorStepExecution.java:545)
      	at hudson.model.LoadBalancer$1.assignGreedily(LoadBalancer.java:115)
      	at hudson.model.LoadBalancer$1.map(LoadBalancer.java:105)
      	at hudson.model.LoadBalancer$2.map(LoadBalancer.java:157)
      	at hudson.model.Queue.maintain(Queue.java:1629)
      	at hudson.model.Queue$MaintainTask.doRun(Queue.java:2887)
      	at hudson.triggers.SafeTimerTask.run(SafeTimerTask.java:72)
      	at jenkins.security.ImpersonatingScheduledExecutorService$1.run(ImpersonatingScheduledExecutorService.java:58)
      	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
      	at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
      	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
      	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
      	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
      	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
      	at java.lang.Thread.run(Thread.java:748)
      

      We should make changes on the Pipeline side so that a deserialization failure after the program has already been loaded is handled more gracefully, but I think we should also harden core against getAffinityKey implementations that throw RuntimeException, so that they cannot break Queue maintenance entirely.
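
      On the core side, the hardening could look something like the following sketch. This is illustrative only, not the actual core patch: the Task interface here is a stand-in for hudson.model.Queue.Task, and the display-name fallback is an assumed choice.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

// Sketch of hardening Queue maintenance: catch RuntimeException from
// getAffinityKey and fall back to a stable per-task key, so one broken
// task cannot abort the whole Queue.maintain() pass.
public class SafeAffinityKeyDemo {
    private static final Logger LOGGER = Logger.getLogger(SafeAffinityKeyDemo.class.getName());

    // Stand-in for hudson.model.Queue.Task (illustrative, not the real API).
    interface Task {
        String getAffinityKey();
        String getFullDisplayName();
    }

    static String safeAffinityKey(Task task) {
        try {
            return task.getAffinityKey();
        } catch (RuntimeException e) {
            LOGGER.log(Level.WARNING, "getAffinityKey failed; falling back to display name", e);
            return task.getFullDisplayName();
        }
    }

    static String demo() {
        Task broken = new Task() {
            public String getAffinityKey() {
                // Simulates the corrupt-flow-graph failure from the traces above.
                throw new IndexOutOfBoundsException("Index: 0");
            }
            public String getFullDisplayName() {
                return "foo/bar #163";
            }
        };
        return safeAffinityKey(broken);
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```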

      On the Pipeline side, I'm not sure how best to handle this issue, since none of the call sites in the stack trace look like a great place to actually handle the error. Here are some ideas:

      1. Harden StandardGraphLookupView.bruteForceScanForEnclosingBlock so that it returns null if there is an error deserializing a node or we reach a node with no parents. This would work, but it does not really seem like the right place to handle the error, since other code would still have to deal with the same corrupt FlowNode elsewhere.
      2. Add a try/catch inside CpsFlowExecution.getNode and, if an exception is caught, treat it as a fatal error for the execution, handling it the way an error loading flow heads is handled in CpsFlowExecution.onLoad. In this case the program is already loaded and potentially executing, so I am not sure there is a clean way to shut it down from that method, and the method might be used in contexts where that is not the desired behavior.
      3. Eagerly load all flow nodes in CpsFlowExecution.onLoad. That would probably have bad performance implications (although maybe not as bad as I imagine, since many graph traversals load all nodes anyway), and since loaded nodes are cached with soft references, they might need to be reloaded from disk later, in which case the same problem could occur.
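
      A rough sketch of what idea 1 might look like. Node is an illustrative stand-in for FlowNode, and the real StandardGraphLookupView also tracks BlockStartNode/BlockEndNode nesting, which is omitted here; only the defensive guards are shown.

```java
import java.util.Collections;
import java.util.List;

// Sketch of idea 1: make the enclosing-block scan defensive instead of
// assuming every non-start node has at least one loadable parent.
public class LookupHardeningDemo {

    // Illustrative stand-in for FlowNode.
    interface Node {
        List<Node> getParents(); // empty (or throwing) if the node is corrupt
    }

    // Returns the first ancestor, or null instead of throwing when the
    // node has no parents or its parents cannot be deserialized.
    static Node enclosingCandidate(Node node) {
        List<Node> parents;
        try {
            parents = node.getParents();
        } catch (RuntimeException e) {
            return null; // deserialization failure: treat as "no enclosing block"
        }
        if (parents.isEmpty()) {
            return null; // start node or corrupt node: nothing encloses it
        }
        return parents.get(0);
    }

    public static void main(String[] args) {
        Node corrupt = Collections::emptyList;
        // Returns null rather than throwing IndexOutOfBoundsException.
        System.out.println(enclosingCandidate(corrupt));
    }
}
```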

            Activity

            dnusbaum Devin Nusbaum created issue -
            dnusbaum Devin Nusbaum added a comment -

            Opened a PR against core to fix the symptoms, but I'm still not sure about the best way to fix the error on Pipeline's side.

            dnusbaum Devin Nusbaum made changes -
            Field Original Value New Value
            Remote Link This issue links to "jenkinsci/jenkins#4055 (Web Link)" [ 23039 ]
            dnusbaum Devin Nusbaum made changes -
            Status Open [ 1 ] In Progress [ 3 ]
            dnusbaum Devin Nusbaum made changes -
            Description [old and new values omitted; duplicates of the description above]
            dnusbaum Devin Nusbaum made changes -
            Description I have seen the following exception break Queue maintenance and make it impossible for new jobs to be scheduled:
            {noformat}
            2019-04-29 14:24:14.748+0000 [id=118] SEVERE hudson.triggers.SafeTimerTask#run: Timer task hudson.model.Queue$MaintainTask@7edb0f64 failed
            java.lang.IndexOutOfBoundsException: Index: 0
            at java.util.Collections$EmptyList.get(Collections.java:4454)
            at org.jenkinsci.plugins.workflow.graph.StandardGraphLookupView.bruteForceScanForEnclosingBlock(StandardGraphLookupView.java:150)
            at org.jenkinsci.plugins.workflow.graph.StandardGraphLookupView.findEnclosingBlockStart(StandardGraphLookupView.java:197)
            at org.jenkinsci.plugins.workflow.graph.StandardGraphLookupView.findAllEnclosingBlockStarts(StandardGraphLookupView.java:217)
            at org.jenkinsci.plugins.workflow.flow.FlowExecution.findAllEnclosingBlockStarts(FlowExecution.java:299)
            at org.jenkinsci.plugins.workflow.graph.FlowNode.getEnclosingBlocks(FlowNode.java:195)
            at org.jenkinsci.plugins.workflow.support.steps.ExecutorStepExecution$PlaceholderTask.concatenateAllEnclosingLabels(ExecutorStepExecution.java:527)
            at org.jenkinsci.plugins.workflow.support.steps.ExecutorStepExecution$PlaceholderTask.getAffinityKey(ExecutorStepExecution.java:545)
            at hudson.model.LoadBalancer$1.assignGreedily(LoadBalancer.java:115)
            at hudson.model.LoadBalancer$1.map(LoadBalancer.java:105)
            at hudson.model.LoadBalancer$2.map(LoadBalancer.java:157)
            at hudson.model.Queue.maintain(Queue.java:1629)
            at hudson.model.Queue$MaintainTask.doRun(Queue.java:2887)
            at hudson.triggers.SafeTimerTask.run(SafeTimerTask.java:72)
            at jenkins.security.ImpersonatingScheduledExecutorService$1.run(ImpersonatingScheduledExecutorService.java:58)
            at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
            at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
            at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
            at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
            at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
            at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
            at java.lang.Thread.run(Thread.java:748)
            {noformat}
            As best as I can tell, the root cause was that a channel reading a flow node's parent node XML file was interrupted, which threw an exception and left the flow graph corrupted. (I am not sure how or why the read was interrupted, but since the same thing would happen if the file was corrupted on disk it seems like a reasonable failure mode to handle.) Here is the stack trace for that issue:
            {noformat}
            2019-04-29 14:24:14.745+0000 [id=118] WARNING o.j.p.workflow.graph.FlowNode#loadParents: failed to load parents of 96
            java.nio.channels.ClosedByInterruptException
            at java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:202)
            at sun.nio.ch.FileChannelImpl.read(FileChannelImpl.java:164)
            at sun.nio.ch.ChannelInputStream.read(ChannelInputStream.java:65)
            at sun.nio.ch.ChannelInputStream.read(ChannelInputStream.java:109)
            at sun.nio.ch.ChannelInputStream.read(ChannelInputStream.java:103)
            at java.io.BufferedInputStream.fill(BufferedInputStream.java:246)
            at java.io.BufferedInputStream.read(BufferedInputStream.java:265)
            at java.io.FilterInputStream.read(FilterInputStream.java:83)
            at java.io.PushbackInputStream.read(PushbackInputStream.java:139)
            at com.thoughtworks.xstream.core.util.XmlHeaderAwareReader.getHeader(XmlHeaderAwareReader.java:79)
            at com.thoughtworks.xstream.core.util.XmlHeaderAwareReader.<init>(XmlHeaderAwareReader.java:61)
            at com.thoughtworks.xstream.io.xml.AbstractXppDriver.createReader(AbstractXppDriver.java:65)
            Caused: com.thoughtworks.xstream.io.StreamException: : null
            at com.thoughtworks.xstream.io.xml.AbstractXppDriver.createReader(AbstractXppDriver.java:69)
            at com.thoughtworks.xstream.XStream.fromXML(XStream.java:1053)
            at hudson.XmlFile.read(XmlFile.java:147)
            Caused: java.io.IOException: Unable to read /var/jenkins_home/jobs/foo/jobs/bar/builds/163/workflow/95.xml
            at hudson.XmlFile.read(XmlFile.java:149)
            at org.jenkinsci.plugins.workflow.support.storage.SimpleXStreamFlowNodeStorage.load(SimpleXStreamFlowNodeStorage.java:204)
            at org.jenkinsci.plugins.workflow.support.storage.SimpleXStreamFlowNodeStorage.access$000(SimpleXStreamFlowNodeStorage.java:71)
            at org.jenkinsci.plugins.workflow.support.storage.SimpleXStreamFlowNodeStorage$1.load(SimpleXStreamFlowNodeStorage.java:76)
            at org.jenkinsci.plugins.workflow.support.storage.SimpleXStreamFlowNodeStorage$1.load(SimpleXStreamFlowNodeStorage.java:74)
            at com.google.common.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3568)
            at com.google.common.cache.LocalCache$Segment.loadSync(LocalCache.java:2350)
            at com.google.common.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2313)
            at com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2228)
            at com.google.common.cache.LocalCache.get(LocalCache.java:3965)
            at com.google.common.cache.LocalCache.getOrLoad(LocalCache.java:3969)
            at com.google.common.cache.LocalCache$LocalManualCache.get(LocalCache.java:4829)
            at org.jenkinsci.plugins.workflow.support.storage.SimpleXStreamFlowNodeStorage.getNode(SimpleXStreamFlowNodeStorage.java:101)
            Caused: java.io.IOException
            at org.jenkinsci.plugins.workflow.support.storage.SimpleXStreamFlowNodeStorage.getNode(SimpleXStreamFlowNodeStorage.java:109)
            at org.jenkinsci.plugins.workflow.cps.CpsFlowExecution$TimingFlowNodeStorage.getNode(CpsFlowExecution.java:1791)
            at org.jenkinsci.plugins.workflow.cps.CpsFlowExecution.getNode(CpsFlowExecution.java:1179)
            at org.jenkinsci.plugins.workflow.graph.FlowNode.loadParents(FlowNode.java:165)
            at org.jenkinsci.plugins.workflow.graph.FlowNode.getParents(FlowNode.java:156)
            at org.jenkinsci.plugins.workflow.graph.StandardGraphLookupView.bruteForceScanForEnclosingBlock(StandardGraphLookupView.java:150)
            at org.jenkinsci.plugins.workflow.graph.StandardGraphLookupView.findEnclosingBlockStart(StandardGraphLookupView.java:197)
            at org.jenkinsci.plugins.workflow.graph.StandardGraphLookupView.findAllEnclosingBlockStarts(StandardGraphLookupView.java:217)
            at org.jenkinsci.plugins.workflow.flow.FlowExecution.findAllEnclosingBlockStarts(FlowExecution.java:299)
            at org.jenkinsci.plugins.workflow.graph.FlowNode.getEnclosingBlocks(FlowNode.java:195)
            at org.jenkinsci.plugins.workflow.support.steps.ExecutorStepExecution$PlaceholderTask.concatenateAllEnclosingLabels(ExecutorStepExecution.java:527)
            at org.jenkinsci.plugins.workflow.support.steps.ExecutorStepExecution$PlaceholderTask.getAffinityKey(ExecutorStepExecution.java:545)
            at hudson.model.LoadBalancer$1.assignGreedily(LoadBalancer.java:115)
            at hudson.model.LoadBalancer$1.map(LoadBalancer.java:105)
            at hudson.model.LoadBalancer$2.map(LoadBalancer.java:157)
            at hudson.model.Queue.maintain(Queue.java:1629)
            at hudson.model.Queue$MaintainTask.doRun(Queue.java:2887)
            at hudson.triggers.SafeTimerTask.run(SafeTimerTask.java:72)
            at jenkins.security.ImpersonatingScheduledExecutorService$1.run(ImpersonatingScheduledExecutorService.java:58)
            at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
            at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
            at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
            at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
            at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
            at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
            at java.lang.Thread.run(Thread.java:748)
            {noformat}
            We should make changes over on the Pipeline side so that a deserialization failure after the program is already loaded is handled more gracefully, but I think we should also harden core against {{getAffinityKey}} that throw {{RuntimeException}} so that they do not break Queue maintenance entirely.

            On the Pipeline side, I'm not sure how best to handle this issue, since none of the call sites in the stack trace look like a great place to actually handle the error. Here are some ideas:
            * Make the parents field in a FlowNode a critical field in XStream, so if it can't be deserialized the whole node fails to be deserialized rather than returning a corrupt node. I think this is a good idea (maybe for other fields in FlowNode as well), but as far as the symptoms here think it would just change where the error is thrown in {{StandardGraphLookupView.bruteForceScanForEnclosingBlock}}.
            * Add a try/catch inside of {{CpsFlowExecution.getNode}} and if an exception is caught, treat that as a fatal error for the execution and handle it like we would an error loading flow heads in {{CpsFlowExecution.onLoad}}. In this case the program is already loaded and potentially executing, so I'm not sure if there would be a clean way to shut it down from that method, and the method might be used in contexts where that is not the desired behavior.
            * Eagerly load all flow nodes in {{CpsFlowExecution.onLoad}}. That would probably have bad performance implications (although maybe not as bad as I am imagining since that will happen with any flow graph traversal is used anyway), and since loaded nodes are cached with soft references, they could need to be loaded again from disk anyway in which case the same problem could occur.
            * Harden {{StandardGraphLookupView.bruteForceScanForEnclosingBlock}} to make it return null if there is an error deserializing a node. Would work fine, but it doesn't really seem right for the error to be handled here. In retrospect the method should probably throw IOException instead of rethrowing internal exceptions as a RuntimeException [here|https://github.com/jenkinsci/workflow-api-plugin/blob/c84dbab8cd35c90f70d84390dc711901fa73b7ad/src/main/java/org/jenkinsci/plugins/workflow/graph/StandardGraphLookupView.java#L143] (even though that is not actually where the error is thrown in this case).
            I have seen the following exception break Queue maintenance and make it impossible for new jobs to be scheduled:
            {noformat}
            2019-04-29 14:24:14.748+0000 [id=118] SEVERE hudson.triggers.SafeTimerTask#run: Timer task hudson.model.Queue$MaintainTask@7edb0f64 failed
            java.lang.IndexOutOfBoundsException: Index: 0
            at java.util.Collections$EmptyList.get(Collections.java:4454)
            at org.jenkinsci.plugins.workflow.graph.StandardGraphLookupView.bruteForceScanForEnclosingBlock(StandardGraphLookupView.java:150)
            at org.jenkinsci.plugins.workflow.graph.StandardGraphLookupView.findEnclosingBlockStart(StandardGraphLookupView.java:197)
            at org.jenkinsci.plugins.workflow.graph.StandardGraphLookupView.findAllEnclosingBlockStarts(StandardGraphLookupView.java:217)
            at org.jenkinsci.plugins.workflow.flow.FlowExecution.findAllEnclosingBlockStarts(FlowExecution.java:299)
            at org.jenkinsci.plugins.workflow.graph.FlowNode.getEnclosingBlocks(FlowNode.java:195)
            at org.jenkinsci.plugins.workflow.support.steps.ExecutorStepExecution$PlaceholderTask.concatenateAllEnclosingLabels(ExecutorStepExecution.java:527)
            at org.jenkinsci.plugins.workflow.support.steps.ExecutorStepExecution$PlaceholderTask.getAffinityKey(ExecutorStepExecution.java:545)
            at hudson.model.LoadBalancer$1.assignGreedily(LoadBalancer.java:115)
            at hudson.model.LoadBalancer$1.map(LoadBalancer.java:105)
            at hudson.model.LoadBalancer$2.map(LoadBalancer.java:157)
            at hudson.model.Queue.maintain(Queue.java:1629)
            at hudson.model.Queue$MaintainTask.doRun(Queue.java:2887)
            at hudson.triggers.SafeTimerTask.run(SafeTimerTask.java:72)
            at jenkins.security.ImpersonatingScheduledExecutorService$1.run(ImpersonatingScheduledExecutorService.java:58)
            at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
            at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
            at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
            at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
            at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
            at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
            at java.lang.Thread.run(Thread.java:748)
            {noformat}
            As best as I can tell, the root cause was that a channel reading a flow node's parent node XML file was interrupted, which threw an exception and left the flow graph corrupted. (I am not sure how or why the read was interrupted, but since the same thing would happen if the file was corrupted on disk it seems like a reasonable failure mode to handle.) Here is the stack trace for that issue:
            {noformat}
            2019-04-29 14:24:14.745+0000 [id=118] WARNING o.j.p.workflow.graph.FlowNode#loadParents: failed to load parents of 96
            java.nio.channels.ClosedByInterruptException
            at java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:202)
            at sun.nio.ch.FileChannelImpl.read(FileChannelImpl.java:164)
            at sun.nio.ch.ChannelInputStream.read(ChannelInputStream.java:65)
            at sun.nio.ch.ChannelInputStream.read(ChannelInputStream.java:109)
            at sun.nio.ch.ChannelInputStream.read(ChannelInputStream.java:103)
            at java.io.BufferedInputStream.fill(BufferedInputStream.java:246)
            at java.io.BufferedInputStream.read(BufferedInputStream.java:265)
            at java.io.FilterInputStream.read(FilterInputStream.java:83)
            at java.io.PushbackInputStream.read(PushbackInputStream.java:139)
            at com.thoughtworks.xstream.core.util.XmlHeaderAwareReader.getHeader(XmlHeaderAwareReader.java:79)
            at com.thoughtworks.xstream.core.util.XmlHeaderAwareReader.<init>(XmlHeaderAwareReader.java:61)
            at com.thoughtworks.xstream.io.xml.AbstractXppDriver.createReader(AbstractXppDriver.java:65)
            Caused: com.thoughtworks.xstream.io.StreamException: : null
            at com.thoughtworks.xstream.io.xml.AbstractXppDriver.createReader(AbstractXppDriver.java:69)
            at com.thoughtworks.xstream.XStream.fromXML(XStream.java:1053)
            at hudson.XmlFile.read(XmlFile.java:147)
            Caused: java.io.IOException: Unable to read /var/jenkins_home/jobs/foo/jobs/bar/builds/163/workflow/95.xml
            at hudson.XmlFile.read(XmlFile.java:149)
            at org.jenkinsci.plugins.workflow.support.storage.SimpleXStreamFlowNodeStorage.load(SimpleXStreamFlowNodeStorage.java:204)
            at org.jenkinsci.plugins.workflow.support.storage.SimpleXStreamFlowNodeStorage.access$000(SimpleXStreamFlowNodeStorage.java:71)
            at org.jenkinsci.plugins.workflow.support.storage.SimpleXStreamFlowNodeStorage$1.load(SimpleXStreamFlowNodeStorage.java:76)
            at org.jenkinsci.plugins.workflow.support.storage.SimpleXStreamFlowNodeStorage$1.load(SimpleXStreamFlowNodeStorage.java:74)
            at com.google.common.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3568)
            at com.google.common.cache.LocalCache$Segment.loadSync(LocalCache.java:2350)
            at com.google.common.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2313)
            at com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2228)
            at com.google.common.cache.LocalCache.get(LocalCache.java:3965)
            at com.google.common.cache.LocalCache.getOrLoad(LocalCache.java:3969)
            at com.google.common.cache.LocalCache$LocalManualCache.get(LocalCache.java:4829)
            at org.jenkinsci.plugins.workflow.support.storage.SimpleXStreamFlowNodeStorage.getNode(SimpleXStreamFlowNodeStorage.java:101)
            Caused: java.io.IOException
            at org.jenkinsci.plugins.workflow.support.storage.SimpleXStreamFlowNodeStorage.getNode(SimpleXStreamFlowNodeStorage.java:109)
            at org.jenkinsci.plugins.workflow.cps.CpsFlowExecution$TimingFlowNodeStorage.getNode(CpsFlowExecution.java:1791)
            at org.jenkinsci.plugins.workflow.cps.CpsFlowExecution.getNode(CpsFlowExecution.java:1179)
            at org.jenkinsci.plugins.workflow.graph.FlowNode.loadParents(FlowNode.java:165)
            at org.jenkinsci.plugins.workflow.graph.FlowNode.getParents(FlowNode.java:156)
            at org.jenkinsci.plugins.workflow.graph.StandardGraphLookupView.bruteForceScanForEnclosingBlock(StandardGraphLookupView.java:150)
            at org.jenkinsci.plugins.workflow.graph.StandardGraphLookupView.findEnclosingBlockStart(StandardGraphLookupView.java:197)
            at org.jenkinsci.plugins.workflow.graph.StandardGraphLookupView.findAllEnclosingBlockStarts(StandardGraphLookupView.java:217)
            at org.jenkinsci.plugins.workflow.flow.FlowExecution.findAllEnclosingBlockStarts(FlowExecution.java:299)
            at org.jenkinsci.plugins.workflow.graph.FlowNode.getEnclosingBlocks(FlowNode.java:195)
            at org.jenkinsci.plugins.workflow.support.steps.ExecutorStepExecution$PlaceholderTask.concatenateAllEnclosingLabels(ExecutorStepExecution.java:527)
            at org.jenkinsci.plugins.workflow.support.steps.ExecutorStepExecution$PlaceholderTask.getAffinityKey(ExecutorStepExecution.java:545)
            at hudson.model.LoadBalancer$1.assignGreedily(LoadBalancer.java:115)
            at hudson.model.LoadBalancer$1.map(LoadBalancer.java:105)
            at hudson.model.LoadBalancer$2.map(LoadBalancer.java:157)
            at hudson.model.Queue.maintain(Queue.java:1629)
            at hudson.model.Queue$MaintainTask.doRun(Queue.java:2887)
            at hudson.triggers.SafeTimerTask.run(SafeTimerTask.java:72)
            at jenkins.security.ImpersonatingScheduledExecutorService$1.run(ImpersonatingScheduledExecutorService.java:58)
            at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
            at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
            at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
            at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
            at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
            at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
            at java.lang.Thread.run(Thread.java:748)
            {noformat}
            We should make changes over on the Pipeline side so that a deserialization failure after the program is already loaded is handled more gracefully, but I think we should also harden core against {{getAffinityKey}} that throw {{RuntimeException}} so that they do not break Queue maintenance entirely.

            On the Pipeline side, I'm not sure how best to handle this issue, since none of the call sites in the stack trace look like a great place to actually handle the error. Here are some ideas:
             * Make the parents field in a FlowNode a critical field in XStream, so if it can't be deserialized the whole node fails to be deserialized rather than returning a corrupt node. I think this is a good idea (maybe for other fields in FlowNode as well), but as far as the symptoms here think it would just change where the error is thrown in {{StandardGraphLookupView.bruteForceScanForEnclosingBlock}}.
             * Add a try/catch inside of {{CpsFlowExecution.getNode}} and if an exception is caught, treat that as a fatal error for the execution and handle it like we would an error loading flow heads in {{CpsFlowExecution.onLoad}}. In this case the program is already loaded and potentially executing, so I'm not sure if there would be a clean way to shut it down from that method, and the method might be used in contexts where that is not the desired behavior.
             * Eagerly load all flow nodes in {{CpsFlowExecution.onLoad}}. That would probably have bad performance implications (although maybe not as bad as I am imagining since that will happen with any flow graph traversal is used anyway), and since loaded nodes are cached with soft references, they could need to be loaded again from disk anyway in which case the same problem could occur.
             * Harden {{StandardGraphLookupView.bruteForceScanForEnclosingBlock}} to make it return null if there is an error deserializing a node. Would work fine, but it doesn't really seem right for the error to be handled here, since other code would have to deal with the same corrupt {{FlowNode}} elsewhere.
            dnusbaum Devin Nusbaum made changes -
            Description I have seen the following exception break Queue maintenance and make it impossible for new jobs to be scheduled:
            {noformat}
            2019-04-29 14:24:14.748+0000 [id=118] SEVERE hudson.triggers.SafeTimerTask#run: Timer task hudson.model.Queue$MaintainTask@7edb0f64 failed
            java.lang.IndexOutOfBoundsException: Index: 0
            at java.util.Collections$EmptyList.get(Collections.java:4454)
            at org.jenkinsci.plugins.workflow.graph.StandardGraphLookupView.bruteForceScanForEnclosingBlock(StandardGraphLookupView.java:150)
            at org.jenkinsci.plugins.workflow.graph.StandardGraphLookupView.findEnclosingBlockStart(StandardGraphLookupView.java:197)
            at org.jenkinsci.plugins.workflow.graph.StandardGraphLookupView.findAllEnclosingBlockStarts(StandardGraphLookupView.java:217)
            at org.jenkinsci.plugins.workflow.flow.FlowExecution.findAllEnclosingBlockStarts(FlowExecution.java:299)
            at org.jenkinsci.plugins.workflow.graph.FlowNode.getEnclosingBlocks(FlowNode.java:195)
            at org.jenkinsci.plugins.workflow.support.steps.ExecutorStepExecution$PlaceholderTask.concatenateAllEnclosingLabels(ExecutorStepExecution.java:527)
            at org.jenkinsci.plugins.workflow.support.steps.ExecutorStepExecution$PlaceholderTask.getAffinityKey(ExecutorStepExecution.java:545)
            at hudson.model.LoadBalancer$1.assignGreedily(LoadBalancer.java:115)
            at hudson.model.LoadBalancer$1.map(LoadBalancer.java:105)
            at hudson.model.LoadBalancer$2.map(LoadBalancer.java:157)
            at hudson.model.Queue.maintain(Queue.java:1629)
            at hudson.model.Queue$MaintainTask.doRun(Queue.java:2887)
            at hudson.triggers.SafeTimerTask.run(SafeTimerTask.java:72)
            at jenkins.security.ImpersonatingScheduledExecutorService$1.run(ImpersonatingScheduledExecutorService.java:58)
            at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
            at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
            at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
            at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
            at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
            at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
            at java.lang.Thread.run(Thread.java:748)
            {noformat}
            As best as I can tell, the root cause was that a channel reading a flow node's parent node XML file was interrupted, which threw an exception and left the flow graph corrupted. (I am not sure how or why the read was interrupted, but since the same thing would happen if the file was corrupted on disk it seems like a reasonable failure mode to handle.) Here is the stack trace for that issue:
            {noformat}
            2019-04-29 14:24:14.745+0000 [id=118] WARNING o.j.p.workflow.graph.FlowNode#loadParents: failed to load parents of 96
            java.nio.channels.ClosedByInterruptException
            at java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:202)
            at sun.nio.ch.FileChannelImpl.read(FileChannelImpl.java:164)
            at sun.nio.ch.ChannelInputStream.read(ChannelInputStream.java:65)
            at sun.nio.ch.ChannelInputStream.read(ChannelInputStream.java:109)
            at sun.nio.ch.ChannelInputStream.read(ChannelInputStream.java:103)
            at java.io.BufferedInputStream.fill(BufferedInputStream.java:246)
            at java.io.BufferedInputStream.read(BufferedInputStream.java:265)
            at java.io.FilterInputStream.read(FilterInputStream.java:83)
            at java.io.PushbackInputStream.read(PushbackInputStream.java:139)
            at com.thoughtworks.xstream.core.util.XmlHeaderAwareReader.getHeader(XmlHeaderAwareReader.java:79)
            at com.thoughtworks.xstream.core.util.XmlHeaderAwareReader.<init>(XmlHeaderAwareReader.java:61)
            at com.thoughtworks.xstream.io.xml.AbstractXppDriver.createReader(AbstractXppDriver.java:65)
            Caused: com.thoughtworks.xstream.io.StreamException: : null
            at com.thoughtworks.xstream.io.xml.AbstractXppDriver.createReader(AbstractXppDriver.java:69)
            at com.thoughtworks.xstream.XStream.fromXML(XStream.java:1053)
            at hudson.XmlFile.read(XmlFile.java:147)
            Caused: java.io.IOException: Unable to read /var/jenkins_home/jobs/foo/jobs/bar/builds/163/workflow/95.xml
            at hudson.XmlFile.read(XmlFile.java:149)
            at org.jenkinsci.plugins.workflow.support.storage.SimpleXStreamFlowNodeStorage.load(SimpleXStreamFlowNodeStorage.java:204)
            at org.jenkinsci.plugins.workflow.support.storage.SimpleXStreamFlowNodeStorage.access$000(SimpleXStreamFlowNodeStorage.java:71)
            at org.jenkinsci.plugins.workflow.support.storage.SimpleXStreamFlowNodeStorage$1.load(SimpleXStreamFlowNodeStorage.java:76)
            at org.jenkinsci.plugins.workflow.support.storage.SimpleXStreamFlowNodeStorage$1.load(SimpleXStreamFlowNodeStorage.java:74)
            at com.google.common.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3568)
            at com.google.common.cache.LocalCache$Segment.loadSync(LocalCache.java:2350)
            at com.google.common.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2313)
            at com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2228)
            at com.google.common.cache.LocalCache.get(LocalCache.java:3965)
            at com.google.common.cache.LocalCache.getOrLoad(LocalCache.java:3969)
            at com.google.common.cache.LocalCache$LocalManualCache.get(LocalCache.java:4829)
            at org.jenkinsci.plugins.workflow.support.storage.SimpleXStreamFlowNodeStorage.getNode(SimpleXStreamFlowNodeStorage.java:101)
            Caused: java.io.IOException
            at org.jenkinsci.plugins.workflow.support.storage.SimpleXStreamFlowNodeStorage.getNode(SimpleXStreamFlowNodeStorage.java:109)
            at org.jenkinsci.plugins.workflow.cps.CpsFlowExecution$TimingFlowNodeStorage.getNode(CpsFlowExecution.java:1791)
            at org.jenkinsci.plugins.workflow.cps.CpsFlowExecution.getNode(CpsFlowExecution.java:1179)
            at org.jenkinsci.plugins.workflow.graph.FlowNode.loadParents(FlowNode.java:165)
            at org.jenkinsci.plugins.workflow.graph.FlowNode.getParents(FlowNode.java:156)
            at org.jenkinsci.plugins.workflow.graph.StandardGraphLookupView.bruteForceScanForEnclosingBlock(StandardGraphLookupView.java:150)
            at org.jenkinsci.plugins.workflow.graph.StandardGraphLookupView.findEnclosingBlockStart(StandardGraphLookupView.java:197)
            at org.jenkinsci.plugins.workflow.graph.StandardGraphLookupView.findAllEnclosingBlockStarts(StandardGraphLookupView.java:217)
            at org.jenkinsci.plugins.workflow.flow.FlowExecution.findAllEnclosingBlockStarts(FlowExecution.java:299)
            at org.jenkinsci.plugins.workflow.graph.FlowNode.getEnclosingBlocks(FlowNode.java:195)
            at org.jenkinsci.plugins.workflow.support.steps.ExecutorStepExecution$PlaceholderTask.concatenateAllEnclosingLabels(ExecutorStepExecution.java:527)
            at org.jenkinsci.plugins.workflow.support.steps.ExecutorStepExecution$PlaceholderTask.getAffinityKey(ExecutorStepExecution.java:545)
            at hudson.model.LoadBalancer$1.assignGreedily(LoadBalancer.java:115)
            at hudson.model.LoadBalancer$1.map(LoadBalancer.java:105)
            at hudson.model.LoadBalancer$2.map(LoadBalancer.java:157)
            at hudson.model.Queue.maintain(Queue.java:1629)
            at hudson.model.Queue$MaintainTask.doRun(Queue.java:2887)
            at hudson.triggers.SafeTimerTask.run(SafeTimerTask.java:72)
            at jenkins.security.ImpersonatingScheduledExecutorService$1.run(ImpersonatingScheduledExecutorService.java:58)
            at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
            at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
            at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
            at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
            at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
            at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
            at java.lang.Thread.run(Thread.java:748)
            {noformat}
            We should make changes over on the Pipeline side so that a deserialization failure after the program is already loaded is handled more gracefully, but I think we should also harden core against {{getAffinityKey}} that throw {{RuntimeException}} so that they do not break Queue maintenance entirely.

            On the Pipeline side, I'm not sure how best to handle this issue, since none of the call sites in the stack trace look like a great place to actually handle the error. Here are some ideas:
             * Make the parents field in a FlowNode a critical field in XStream, so if it can't be deserialized the whole node fails to be deserialized rather than returning a corrupt node. I think this is a good idea (maybe for other fields in FlowNode as well), but as far as the symptoms here think it would just change where the error is thrown in {{StandardGraphLookupView.bruteForceScanForEnclosingBlock}}.
             * Add a try/catch inside of {{CpsFlowExecution.getNode}} and if an exception is caught, treat that as a fatal error for the execution and handle it like we would an error loading flow heads in {{CpsFlowExecution.onLoad}}. In this case the program is already loaded and potentially executing, so I'm not sure if there would be a clean way to shut it down from that method, and the method might be used in contexts where that is not the desired behavior.
             * Eagerly load all flow nodes in {{CpsFlowExecution.onLoad}}. That would probably have bad performance implications (although maybe not as bad as I am imagining, since it happens whenever any flow graph traversal is used anyway), and since loaded nodes are cached with soft references, they could need to be loaded again from disk later anyway, in which case the same problem could occur.
             * Harden {{StandardGraphLookupView.bruteForceScanForEnclosingBlock}} to return null if there is an error deserializing a node. This would work, but it doesn't really seem right for the error to be handled here, since other code would have to deal with the same corrupt {{FlowNode}} elsewhere.
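            As a rough illustration of the last idea, a hardened parent scan could swallow the deserialization failure and report "no enclosing block found". This is only a sketch; {{Node}} is a simplified stand-in for the real {{FlowNode}} and the real {{StandardGraphLookupView}} logic:

```java
import java.io.IOException;
import java.util.List;

class GraphScanSketch {
    /** Simplified stand-in for FlowNode: loading parents may fail on corrupt XML. */
    interface Node {
        List<Node> getParents() throws IOException;
        boolean isBlockStart();
    }

    /** Walks parent links looking for the enclosing block start; null if none or the graph is corrupt. */
    static Node findEnclosingBlockStart(Node node) {
        Node current = node;
        while (current != null) {
            List<Node> parents;
            try {
                parents = current.getParents();
            } catch (IOException | RuntimeException e) {
                // Corrupt node: give up here rather than propagate into Queue maintenance.
                return null;
            }
            if (parents.isEmpty()) {
                return null; // reached the flow start without finding a block start
            }
            current = parents.get(0);
            if (current.isBlockStart()) {
                return current;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        Node corrupt = new Node() {
            public List<Node> getParents() throws IOException {
                throw new IOException("Unable to read 95.xml");
            }
            public boolean isBlockStart() { return false; }
        };
        System.out.println(findEnclosingBlockStart(corrupt)); // prints "null"
    }
}
```

The downside noted above still applies: every other flow graph consumer would need the same defensive handling for the corrupt node.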
            As best I can tell, the root cause was that a channel reading a flow node's parent XML file was interrupted, which threw an exception and left the flow graph corrupted. (I am not sure how or why the read was interrupted, but since the same thing would happen if the file were corrupted on disk, it seems like a reasonable failure mode to handle.) Here is the stack trace for that issue:
            {noformat}
            2019-04-29 14:24:14.745+0000 [id=118] WARNING o.j.p.workflow.graph.FlowNode#loadParents: failed to load parents of 96
            java.nio.channels.ClosedByInterruptException
            at java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:202)
            at sun.nio.ch.FileChannelImpl.read(FileChannelImpl.java:164)
            at sun.nio.ch.ChannelInputStream.read(ChannelInputStream.java:65)
            at sun.nio.ch.ChannelInputStream.read(ChannelInputStream.java:109)
            at sun.nio.ch.ChannelInputStream.read(ChannelInputStream.java:103)
            at java.io.BufferedInputStream.fill(BufferedInputStream.java:246)
            at java.io.BufferedInputStream.read(BufferedInputStream.java:265)
            at java.io.FilterInputStream.read(FilterInputStream.java:83)
            at java.io.PushbackInputStream.read(PushbackInputStream.java:139)
            at com.thoughtworks.xstream.core.util.XmlHeaderAwareReader.getHeader(XmlHeaderAwareReader.java:79)
            at com.thoughtworks.xstream.core.util.XmlHeaderAwareReader.<init>(XmlHeaderAwareReader.java:61)
            at com.thoughtworks.xstream.io.xml.AbstractXppDriver.createReader(AbstractXppDriver.java:65)
            Caused: com.thoughtworks.xstream.io.StreamException: : null
            at com.thoughtworks.xstream.io.xml.AbstractXppDriver.createReader(AbstractXppDriver.java:69)
            at com.thoughtworks.xstream.XStream.fromXML(XStream.java:1053)
            at hudson.XmlFile.read(XmlFile.java:147)
            Caused: java.io.IOException: Unable to read /var/jenkins_home/jobs/foo/jobs/bar/builds/163/workflow/95.xml
            at hudson.XmlFile.read(XmlFile.java:149)
            at org.jenkinsci.plugins.workflow.support.storage.SimpleXStreamFlowNodeStorage.load(SimpleXStreamFlowNodeStorage.java:204)
            at org.jenkinsci.plugins.workflow.support.storage.SimpleXStreamFlowNodeStorage.access$000(SimpleXStreamFlowNodeStorage.java:71)
            at org.jenkinsci.plugins.workflow.support.storage.SimpleXStreamFlowNodeStorage$1.load(SimpleXStreamFlowNodeStorage.java:76)
            at org.jenkinsci.plugins.workflow.support.storage.SimpleXStreamFlowNodeStorage$1.load(SimpleXStreamFlowNodeStorage.java:74)
            at com.google.common.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3568)
            at com.google.common.cache.LocalCache$Segment.loadSync(LocalCache.java:2350)
            at com.google.common.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2313)
            at com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2228)
            at com.google.common.cache.LocalCache.get(LocalCache.java:3965)
            at com.google.common.cache.LocalCache.getOrLoad(LocalCache.java:3969)
            at com.google.common.cache.LocalCache$LocalManualCache.get(LocalCache.java:4829)
            at org.jenkinsci.plugins.workflow.support.storage.SimpleXStreamFlowNodeStorage.getNode(SimpleXStreamFlowNodeStorage.java:101)
            Caused: java.io.IOException
            at org.jenkinsci.plugins.workflow.support.storage.SimpleXStreamFlowNodeStorage.getNode(SimpleXStreamFlowNodeStorage.java:109)
            at org.jenkinsci.plugins.workflow.cps.CpsFlowExecution$TimingFlowNodeStorage.getNode(CpsFlowExecution.java:1791)
            at org.jenkinsci.plugins.workflow.cps.CpsFlowExecution.getNode(CpsFlowExecution.java:1179)
            at org.jenkinsci.plugins.workflow.graph.FlowNode.loadParents(FlowNode.java:165)
            at org.jenkinsci.plugins.workflow.graph.FlowNode.getParents(FlowNode.java:156)
            at org.jenkinsci.plugins.workflow.graph.StandardGraphLookupView.bruteForceScanForEnclosingBlock(StandardGraphLookupView.java:150)
            at org.jenkinsci.plugins.workflow.graph.StandardGraphLookupView.findEnclosingBlockStart(StandardGraphLookupView.java:197)
            at org.jenkinsci.plugins.workflow.graph.StandardGraphLookupView.findAllEnclosingBlockStarts(StandardGraphLookupView.java:217)
            at org.jenkinsci.plugins.workflow.flow.FlowExecution.findAllEnclosingBlockStarts(FlowExecution.java:299)
            at org.jenkinsci.plugins.workflow.graph.FlowNode.getEnclosingBlocks(FlowNode.java:195)
            at org.jenkinsci.plugins.workflow.support.steps.ExecutorStepExecution$PlaceholderTask.concatenateAllEnclosingLabels(ExecutorStepExecution.java:527)
            at org.jenkinsci.plugins.workflow.support.steps.ExecutorStepExecution$PlaceholderTask.getAffinityKey(ExecutorStepExecution.java:545)
            at hudson.model.LoadBalancer$1.assignGreedily(LoadBalancer.java:115)
            at hudson.model.LoadBalancer$1.map(LoadBalancer.java:105)
            at hudson.model.LoadBalancer$2.map(LoadBalancer.java:157)
            at hudson.model.Queue.maintain(Queue.java:1629)
            at hudson.model.Queue$MaintainTask.doRun(Queue.java:2887)
            at hudson.triggers.SafeTimerTask.run(SafeTimerTask.java:72)
            at jenkins.security.ImpersonatingScheduledExecutorService$1.run(ImpersonatingScheduledExecutorService.java:58)
            at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
            at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
            at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
            at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
            at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
            at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
            at java.lang.Thread.run(Thread.java:748)
            {noformat}
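            The interrupted-read failure mode is easy to reproduce in isolation: per the {{InterruptibleChannel}} contract, a blocking read on a {{FileChannel}} performed while the thread's interrupt status is set closes the channel and throws {{ClosedByInterruptException}}, which is exactly what the trace above shows underneath {{XmlFile.read}}. A standalone sketch:

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.ClosedByInterruptException;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

class InterruptedReadDemo {
    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("flownode", ".xml");
        Files.write(file, "<node/>".getBytes(StandardCharsets.UTF_8));
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            // Simulate an interrupt arriving while the node XML is being loaded:
            // a FileChannel read with the interrupt flag already set fails immediately.
            Thread.currentThread().interrupt();
            try {
                ch.read(ByteBuffer.allocate(16));
                System.out.println("read succeeded");
            } catch (ClosedByInterruptException e) {
                System.out.println("ClosedByInterruptException");
            }
        } finally {
            Thread.interrupted(); // clear the flag before cleanup
            Files.deleteIfExists(file);
        }
    }
}
```

This suggests the Queue maintenance thread (or whichever thread first touched the soft-reference cache miss) was interrupted mid-load, after which the corrupt in-memory node kept throwing on every traversal.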
            dnusbaum Devin Nusbaum made changes -
            Description I have seen the following exception break Queue maintenance and make it impossible for new jobs to be scheduled:
            {noformat}
            2019-04-29 14:24:14.748+0000 [id=118] SEVERE hudson.triggers.SafeTimerTask#run: Timer task hudson.model.Queue$MaintainTask@7edb0f64 failed
            java.lang.IndexOutOfBoundsException: Index: 0
            at java.util.Collections$EmptyList.get(Collections.java:4454)
            at org.jenkinsci.plugins.workflow.graph.StandardGraphLookupView.bruteForceScanForEnclosingBlock(StandardGraphLookupView.java:150)
            at org.jenkinsci.plugins.workflow.graph.StandardGraphLookupView.findEnclosingBlockStart(StandardGraphLookupView.java:197)
            at org.jenkinsci.plugins.workflow.graph.StandardGraphLookupView.findAllEnclosingBlockStarts(StandardGraphLookupView.java:217)
            at org.jenkinsci.plugins.workflow.flow.FlowExecution.findAllEnclosingBlockStarts(FlowExecution.java:299)
            at org.jenkinsci.plugins.workflow.graph.FlowNode.getEnclosingBlocks(FlowNode.java:195)
            at org.jenkinsci.plugins.workflow.support.steps.ExecutorStepExecution$PlaceholderTask.concatenateAllEnclosingLabels(ExecutorStepExecution.java:527)
            at org.jenkinsci.plugins.workflow.support.steps.ExecutorStepExecution$PlaceholderTask.getAffinityKey(ExecutorStepExecution.java:545)
            at hudson.model.LoadBalancer$1.assignGreedily(LoadBalancer.java:115)
            at hudson.model.LoadBalancer$1.map(LoadBalancer.java:105)
            at hudson.model.LoadBalancer$2.map(LoadBalancer.java:157)
            at hudson.model.Queue.maintain(Queue.java:1629)
            at hudson.model.Queue$MaintainTask.doRun(Queue.java:2887)
            at hudson.triggers.SafeTimerTask.run(SafeTimerTask.java:72)
            at jenkins.security.ImpersonatingScheduledExecutorService$1.run(ImpersonatingScheduledExecutorService.java:58)
            at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
            at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
            at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
            at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
            at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
            at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
            at java.lang.Thread.run(Thread.java:748)
            {noformat}
            As best as I can tell, the root cause was that a channel reading a flow node's parent node XML file was interrupted, which threw an exception and left the flow graph corrupted. (I am not sure how or why the read was interrupted, but since the same thing would happen if the file was corrupted on disk it seems like a reasonable failure mode to handle.) Here is the stack trace for that issue:
            {noformat}
            2019-04-29 14:24:14.745+0000 [id=118] WARNING o.j.p.workflow.graph.FlowNode#loadParents: failed to load parents of 96
            java.nio.channels.ClosedByInterruptException
            at java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:202)
            at sun.nio.ch.FileChannelImpl.read(FileChannelImpl.java:164)
            at sun.nio.ch.ChannelInputStream.read(ChannelInputStream.java:65)
            at sun.nio.ch.ChannelInputStream.read(ChannelInputStream.java:109)
            at sun.nio.ch.ChannelInputStream.read(ChannelInputStream.java:103)
            at java.io.BufferedInputStream.fill(BufferedInputStream.java:246)
            at java.io.BufferedInputStream.read(BufferedInputStream.java:265)
            at java.io.FilterInputStream.read(FilterInputStream.java:83)
            at java.io.PushbackInputStream.read(PushbackInputStream.java:139)
            at com.thoughtworks.xstream.core.util.XmlHeaderAwareReader.getHeader(XmlHeaderAwareReader.java:79)
            at com.thoughtworks.xstream.core.util.XmlHeaderAwareReader.<init>(XmlHeaderAwareReader.java:61)
            at com.thoughtworks.xstream.io.xml.AbstractXppDriver.createReader(AbstractXppDriver.java:65)
            Caused: com.thoughtworks.xstream.io.StreamException: : null
            at com.thoughtworks.xstream.io.xml.AbstractXppDriver.createReader(AbstractXppDriver.java:69)
            at com.thoughtworks.xstream.XStream.fromXML(XStream.java:1053)
            at hudson.XmlFile.read(XmlFile.java:147)
            Caused: java.io.IOException: Unable to read /var/jenkins_home/jobs/foo/jobs/bar/builds/163/workflow/95.xml
            at hudson.XmlFile.read(XmlFile.java:149)
            at org.jenkinsci.plugins.workflow.support.storage.SimpleXStreamFlowNodeStorage.load(SimpleXStreamFlowNodeStorage.java:204)
            at org.jenkinsci.plugins.workflow.support.storage.SimpleXStreamFlowNodeStorage.access$000(SimpleXStreamFlowNodeStorage.java:71)
            at org.jenkinsci.plugins.workflow.support.storage.SimpleXStreamFlowNodeStorage$1.load(SimpleXStreamFlowNodeStorage.java:76)
            at org.jenkinsci.plugins.workflow.support.storage.SimpleXStreamFlowNodeStorage$1.load(SimpleXStreamFlowNodeStorage.java:74)
            at com.google.common.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3568)
            at com.google.common.cache.LocalCache$Segment.loadSync(LocalCache.java:2350)
            at com.google.common.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2313)
            at com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2228)
            at com.google.common.cache.LocalCache.get(LocalCache.java:3965)
            at com.google.common.cache.LocalCache.getOrLoad(LocalCache.java:3969)
            at com.google.common.cache.LocalCache$LocalManualCache.get(LocalCache.java:4829)
            at org.jenkinsci.plugins.workflow.support.storage.SimpleXStreamFlowNodeStorage.getNode(SimpleXStreamFlowNodeStorage.java:101)
            Caused: java.io.IOException
            at org.jenkinsci.plugins.workflow.support.storage.SimpleXStreamFlowNodeStorage.getNode(SimpleXStreamFlowNodeStorage.java:109)
            at org.jenkinsci.plugins.workflow.cps.CpsFlowExecution$TimingFlowNodeStorage.getNode(CpsFlowExecution.java:1791)
            at org.jenkinsci.plugins.workflow.cps.CpsFlowExecution.getNode(CpsFlowExecution.java:1179)
            at org.jenkinsci.plugins.workflow.graph.FlowNode.loadParents(FlowNode.java:165)
            at org.jenkinsci.plugins.workflow.graph.FlowNode.getParents(FlowNode.java:156)
            at org.jenkinsci.plugins.workflow.graph.StandardGraphLookupView.bruteForceScanForEnclosingBlock(StandardGraphLookupView.java:150)
            at org.jenkinsci.plugins.workflow.graph.StandardGraphLookupView.findEnclosingBlockStart(StandardGraphLookupView.java:197)
            at org.jenkinsci.plugins.workflow.graph.StandardGraphLookupView.findAllEnclosingBlockStarts(StandardGraphLookupView.java:217)
            at org.jenkinsci.plugins.workflow.flow.FlowExecution.findAllEnclosingBlockStarts(FlowExecution.java:299)
            at org.jenkinsci.plugins.workflow.graph.FlowNode.getEnclosingBlocks(FlowNode.java:195)
            at org.jenkinsci.plugins.workflow.support.steps.ExecutorStepExecution$PlaceholderTask.concatenateAllEnclosingLabels(ExecutorStepExecution.java:527)
            at org.jenkinsci.plugins.workflow.support.steps.ExecutorStepExecution$PlaceholderTask.getAffinityKey(ExecutorStepExecution.java:545)
            at hudson.model.LoadBalancer$1.assignGreedily(LoadBalancer.java:115)
            at hudson.model.LoadBalancer$1.map(LoadBalancer.java:105)
            at hudson.model.LoadBalancer$2.map(LoadBalancer.java:157)
            at hudson.model.Queue.maintain(Queue.java:1629)
            at hudson.model.Queue$MaintainTask.doRun(Queue.java:2887)
            at hudson.triggers.SafeTimerTask.run(SafeTimerTask.java:72)
            at jenkins.security.ImpersonatingScheduledExecutorService$1.run(ImpersonatingScheduledExecutorService.java:58)
            at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
            at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
            at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
            at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
            at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
            at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
            at java.lang.Thread.run(Thread.java:748)
            {noformat}
            We should make changes over on the Pipeline side so that a deserialization failure after the program is already loaded is handled more gracefully, but I think we should also harden core against {{getAffinityKey}} that throw {{RuntimeException}} so that they do not break Queue maintenance entirely.

            On the Pipeline side, I'm not sure how best to handle this issue, since none of the call sites in the stack trace look like a great place to actually handle the error. Here are some ideas:
             # Make the parents field in a FlowNode a critical field in XStream, so if it can't be deserialized the whole node fails to be deserialized rather than returning a corrupt node. I think this is a good idea (maybe for other fields in FlowNode as well), but as far as the symptoms here think it would just change where the error is thrown in {{StandardGraphLookupView.bruteForceScanForEnclosingBlock}}.
             # Harden {{StandardGraphLookupView.bruteForceScanForEnclosingBlock}} to make it return null if there is an error deserializing a node. Would work fine, but it doesn't really seem right for the error to be handled here, since other code would have to deal with the same corrupt {{FlowNode}} elsewhere.
             # Add a try/catch inside of {{CpsFlowExecution.getNode}} and if an exception is caught, treat that as a fatal error for the execution and handle it like we would an error loading flow heads in {{CpsFlowExecution.onLoad}}. In this case the program is already loaded and potentially executing, so I'm not sure if there would be a clean way to shut it down from that method, and the method might be used in contexts where that is not the desired behavior.
             # Eagerly load all flow nodes in {{CpsFlowExecution.onLoad}}. That would probably have bad performance implications (although maybe not as bad as I am imagining since that will happen with any flow graph traversal is used anyway), and since loaded nodes are cached with soft references, they could need to be loaded again from disk anyway in which case the same problem could occur.
            I have seen the following exception break Queue maintenance and make it impossible for new jobs to be scheduled:
            {noformat}
            2019-04-29 14:24:14.748+0000 [id=118] SEVERE hudson.triggers.SafeTimerTask#run: Timer task hudson.model.Queue$MaintainTask@7edb0f64 failed
            java.lang.IndexOutOfBoundsException: Index: 0
            at java.util.Collections$EmptyList.get(Collections.java:4454)
            at org.jenkinsci.plugins.workflow.graph.StandardGraphLookupView.bruteForceScanForEnclosingBlock(StandardGraphLookupView.java:150)
            at org.jenkinsci.plugins.workflow.graph.StandardGraphLookupView.findEnclosingBlockStart(StandardGraphLookupView.java:197)
            at org.jenkinsci.plugins.workflow.graph.StandardGraphLookupView.findAllEnclosingBlockStarts(StandardGraphLookupView.java:217)
            at org.jenkinsci.plugins.workflow.flow.FlowExecution.findAllEnclosingBlockStarts(FlowExecution.java:299)
            at org.jenkinsci.plugins.workflow.graph.FlowNode.getEnclosingBlocks(FlowNode.java:195)
            at org.jenkinsci.plugins.workflow.support.steps.ExecutorStepExecution$PlaceholderTask.concatenateAllEnclosingLabels(ExecutorStepExecution.java:527)
            at org.jenkinsci.plugins.workflow.support.steps.ExecutorStepExecution$PlaceholderTask.getAffinityKey(ExecutorStepExecution.java:545)
            at hudson.model.LoadBalancer$1.assignGreedily(LoadBalancer.java:115)
            at hudson.model.LoadBalancer$1.map(LoadBalancer.java:105)
            at hudson.model.LoadBalancer$2.map(LoadBalancer.java:157)
            at hudson.model.Queue.maintain(Queue.java:1629)
            at hudson.model.Queue$MaintainTask.doRun(Queue.java:2887)
            at hudson.triggers.SafeTimerTask.run(SafeTimerTask.java:72)
            at jenkins.security.ImpersonatingScheduledExecutorService$1.run(ImpersonatingScheduledExecutorService.java:58)
            at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
            at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
            at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
            at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
            at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
            at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
            at java.lang.Thread.run(Thread.java:748)
            {noformat}
            As best as I can tell, the root cause was that a channel reading a flow node's parent node XML file was interrupted, which threw an exception and left the flow graph corrupted. (I am not sure how or why the read was interrupted, but since the same thing would happen if the file was corrupted on disk it seems like a reasonable failure mode to handle.) Here is the stack trace for that issue:
            {noformat}
            2019-04-29 14:24:14.745+0000 [id=118] WARNING o.j.p.workflow.graph.FlowNode#loadParents: failed to load parents of 96
            java.nio.channels.ClosedByInterruptException
            at java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:202)
            at sun.nio.ch.FileChannelImpl.read(FileChannelImpl.java:164)
            at sun.nio.ch.ChannelInputStream.read(ChannelInputStream.java:65)
            at sun.nio.ch.ChannelInputStream.read(ChannelInputStream.java:109)
            at sun.nio.ch.ChannelInputStream.read(ChannelInputStream.java:103)
            at java.io.BufferedInputStream.fill(BufferedInputStream.java:246)
            at java.io.BufferedInputStream.read(BufferedInputStream.java:265)
            at java.io.FilterInputStream.read(FilterInputStream.java:83)
            at java.io.PushbackInputStream.read(PushbackInputStream.java:139)
            at com.thoughtworks.xstream.core.util.XmlHeaderAwareReader.getHeader(XmlHeaderAwareReader.java:79)
            at com.thoughtworks.xstream.core.util.XmlHeaderAwareReader.<init>(XmlHeaderAwareReader.java:61)
            at com.thoughtworks.xstream.io.xml.AbstractXppDriver.createReader(AbstractXppDriver.java:65)
            Caused: com.thoughtworks.xstream.io.StreamException: : null
            at com.thoughtworks.xstream.io.xml.AbstractXppDriver.createReader(AbstractXppDriver.java:69)
            at com.thoughtworks.xstream.XStream.fromXML(XStream.java:1053)
            at hudson.XmlFile.read(XmlFile.java:147)
            Caused: java.io.IOException: Unable to read /var/jenkins_home/jobs/foo/jobs/bar/builds/163/workflow/95.xml
            at hudson.XmlFile.read(XmlFile.java:149)
            at org.jenkinsci.plugins.workflow.support.storage.SimpleXStreamFlowNodeStorage.load(SimpleXStreamFlowNodeStorage.java:204)
            at org.jenkinsci.plugins.workflow.support.storage.SimpleXStreamFlowNodeStorage.access$000(SimpleXStreamFlowNodeStorage.java:71)
            at org.jenkinsci.plugins.workflow.support.storage.SimpleXStreamFlowNodeStorage$1.load(SimpleXStreamFlowNodeStorage.java:76)
            at org.jenkinsci.plugins.workflow.support.storage.SimpleXStreamFlowNodeStorage$1.load(SimpleXStreamFlowNodeStorage.java:74)
            at com.google.common.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3568)
            at com.google.common.cache.LocalCache$Segment.loadSync(LocalCache.java:2350)
            at com.google.common.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2313)
            at com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2228)
            at com.google.common.cache.LocalCache.get(LocalCache.java:3965)
            at com.google.common.cache.LocalCache.getOrLoad(LocalCache.java:3969)
            at com.google.common.cache.LocalCache$LocalManualCache.get(LocalCache.java:4829)
            at org.jenkinsci.plugins.workflow.support.storage.SimpleXStreamFlowNodeStorage.getNode(SimpleXStreamFlowNodeStorage.java:101)
            Caused: java.io.IOException
            at org.jenkinsci.plugins.workflow.support.storage.SimpleXStreamFlowNodeStorage.getNode(SimpleXStreamFlowNodeStorage.java:109)
            at org.jenkinsci.plugins.workflow.cps.CpsFlowExecution$TimingFlowNodeStorage.getNode(CpsFlowExecution.java:1791)
            at org.jenkinsci.plugins.workflow.cps.CpsFlowExecution.getNode(CpsFlowExecution.java:1179)
            at org.jenkinsci.plugins.workflow.graph.FlowNode.loadParents(FlowNode.java:165)
            at org.jenkinsci.plugins.workflow.graph.FlowNode.getParents(FlowNode.java:156)
            at org.jenkinsci.plugins.workflow.graph.StandardGraphLookupView.bruteForceScanForEnclosingBlock(StandardGraphLookupView.java:150)
            at org.jenkinsci.plugins.workflow.graph.StandardGraphLookupView.findEnclosingBlockStart(StandardGraphLookupView.java:197)
            at org.jenkinsci.plugins.workflow.graph.StandardGraphLookupView.findAllEnclosingBlockStarts(StandardGraphLookupView.java:217)
            at org.jenkinsci.plugins.workflow.flow.FlowExecution.findAllEnclosingBlockStarts(FlowExecution.java:299)
            at org.jenkinsci.plugins.workflow.graph.FlowNode.getEnclosingBlocks(FlowNode.java:195)
            at org.jenkinsci.plugins.workflow.support.steps.ExecutorStepExecution$PlaceholderTask.concatenateAllEnclosingLabels(ExecutorStepExecution.java:527)
            at org.jenkinsci.plugins.workflow.support.steps.ExecutorStepExecution$PlaceholderTask.getAffinityKey(ExecutorStepExecution.java:545)
            at hudson.model.LoadBalancer$1.assignGreedily(LoadBalancer.java:115)
            at hudson.model.LoadBalancer$1.map(LoadBalancer.java:105)
            at hudson.model.LoadBalancer$2.map(LoadBalancer.java:157)
            at hudson.model.Queue.maintain(Queue.java:1629)
            at hudson.model.Queue$MaintainTask.doRun(Queue.java:2887)
            at hudson.triggers.SafeTimerTask.run(SafeTimerTask.java:72)
            at jenkins.security.ImpersonatingScheduledExecutorService$1.run(ImpersonatingScheduledExecutorService.java:58)
            at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
            at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
            at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
            at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
            at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
            at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
            at java.lang.Thread.run(Thread.java:748)
            {noformat}
            We should make changes over on the Pipeline side so that a deserialization failure after the program is already loaded is handled more gracefully, but I think we should also harden core against {{getAffinityKey}} that throw {{RuntimeException}} so that they do not break Queue maintenance entirely.

            On the Pipeline side, I'm not sure how best to handle this issue, since none of the call sites in the stack trace look like a great place to actually handle the error. Here are some ideas:
             # Harden {{StandardGraphLookupView.bruteForceScanForEnclosingBlock}} to make it return null if there is an error deserializing a node or we come upon a node with no parents. Would work fine, but it doesn't really seem right for the error to be handled here, since other code would have to deal with the same corrupt {{FlowNode}} elsewhere.
             # Add a try/catch inside of {{CpsFlowExecution.getNode}} and if an exception is caught, treat that as a fatal error for the execution and handle it like we would an error loading flow heads in {{CpsFlowExecution.onLoad}}. In this case the program is already loaded and potentially executing, so I'm not sure if there would be a clean way to shut it down from that method, and the method might be used in contexts where that is not the desired behavior.
             # Eagerly load all flow nodes in {{CpsFlowExecution.onLoad}}. That would probably have bad performance implications (although maybe not as bad as I am imagining, since that happens whenever any flow graph traversal is used anyway), and since loaded nodes are cached with soft references, they could need to be loaded again from disk anyway, in which case the same problem could occur.
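As a rough illustration of idea 1, here is a simplified stand-in showing a brute-force parent scan that returns null on a corrupt chain instead of throwing {{IndexOutOfBoundsException}}. The {{Node}} class below is hypothetical, not the real workflow-api {{FlowNode}}; an empty parent list models a failed or corrupt parent load:

```java
import java.util.Collections;
import java.util.List;

public class SafeEnclosingScan {
    static class Node {
        final String id;
        final List<Node> parents; // empty list models a failed/corrupt parent load
        final boolean blockStart;
        Node(String id, List<Node> parents, boolean blockStart) {
            this.id = id; this.parents = parents; this.blockStart = blockStart;
        }
    }

    /**
     * Walks up the first-parent chain looking for an enclosing block start.
     * Hardened: returns null if the chain ends without one, where a naive
     * parents.get(0) on an empty list would throw IndexOutOfBoundsException.
     */
    static Node findEnclosingBlockStart(Node node) {
        Node current = node;
        while (true) {
            List<Node> parents = current.parents;
            if (parents.isEmpty()) {
                return null; // corrupt or truncated chain: give up instead of throwing
            }
            current = parents.get(0);
            if (current.blockStart) {
                return current;
            }
        }
    }

    public static void main(String[] args) {
        // Node "96" lost its parents when the XML read was interrupted.
        Node corrupt = new Node("96", Collections.emptyList(), false);
        Node child = new Node("97", List.of(corrupt), false);
        System.out.println(findEnclosingBlockStart(child)); // prints "null"
    }
}
```

Callers of the lookup view already have to handle a null result for nodes outside any block, which is why this shape is tempting, even if (as noted above) it papers over corruption that other code will still trip on.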
            at org.jenkinsci.plugins.workflow.support.steps.ExecutorStepExecution$PlaceholderTask.getAffinityKey(ExecutorStepExecution.java:545)
            at hudson.model.LoadBalancer$1.assignGreedily(LoadBalancer.java:115)
            at hudson.model.LoadBalancer$1.map(LoadBalancer.java:105)
            at hudson.model.LoadBalancer$2.map(LoadBalancer.java:157)
            at hudson.model.Queue.maintain(Queue.java:1629)
            at hudson.model.Queue$MaintainTask.doRun(Queue.java:2887)
            at hudson.triggers.SafeTimerTask.run(SafeTimerTask.java:72)
            at jenkins.security.ImpersonatingScheduledExecutorService$1.run(ImpersonatingScheduledExecutorService.java:58)
            at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
            at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
            at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
            at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
            at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
            at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
            at java.lang.Thread.run(Thread.java:748)
            {noformat}
            As best I can tell, the root cause was that a channel reading a flow node's parent node XML file was interrupted, which threw an exception and left the flow graph corrupted. (I am not sure how or why the read was interrupted, but since the same thing would happen if the file were corrupted on disk, it seems like a reasonable failure mode to handle.) Here is the stack trace for that issue:
            {noformat}
            2019-04-29 14:24:14.745+0000 [id=118] WARNING o.j.p.workflow.graph.FlowNode#loadParents: failed to load parents of 96
            java.nio.channels.ClosedByInterruptException
            at java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:202)
            at sun.nio.ch.FileChannelImpl.read(FileChannelImpl.java:164)
            at sun.nio.ch.ChannelInputStream.read(ChannelInputStream.java:65)
            at sun.nio.ch.ChannelInputStream.read(ChannelInputStream.java:109)
            at sun.nio.ch.ChannelInputStream.read(ChannelInputStream.java:103)
            at java.io.BufferedInputStream.fill(BufferedInputStream.java:246)
            at java.io.BufferedInputStream.read(BufferedInputStream.java:265)
            at java.io.FilterInputStream.read(FilterInputStream.java:83)
            at java.io.PushbackInputStream.read(PushbackInputStream.java:139)
            at com.thoughtworks.xstream.core.util.XmlHeaderAwareReader.getHeader(XmlHeaderAwareReader.java:79)
            at com.thoughtworks.xstream.core.util.XmlHeaderAwareReader.<init>(XmlHeaderAwareReader.java:61)
            at com.thoughtworks.xstream.io.xml.AbstractXppDriver.createReader(AbstractXppDriver.java:65)
            Caused: com.thoughtworks.xstream.io.StreamException: : null
            at com.thoughtworks.xstream.io.xml.AbstractXppDriver.createReader(AbstractXppDriver.java:69)
            at com.thoughtworks.xstream.XStream.fromXML(XStream.java:1053)
            at hudson.XmlFile.read(XmlFile.java:147)
            Caused: java.io.IOException: Unable to read /var/jenkins_home/jobs/foo/jobs/bar/builds/163/workflow/95.xml
            at hudson.XmlFile.read(XmlFile.java:149)
            at org.jenkinsci.plugins.workflow.support.storage.SimpleXStreamFlowNodeStorage.load(SimpleXStreamFlowNodeStorage.java:204)
            at org.jenkinsci.plugins.workflow.support.storage.SimpleXStreamFlowNodeStorage.access$000(SimpleXStreamFlowNodeStorage.java:71)
            at org.jenkinsci.plugins.workflow.support.storage.SimpleXStreamFlowNodeStorage$1.load(SimpleXStreamFlowNodeStorage.java:76)
            at org.jenkinsci.plugins.workflow.support.storage.SimpleXStreamFlowNodeStorage$1.load(SimpleXStreamFlowNodeStorage.java:74)
            at com.google.common.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3568)
            at com.google.common.cache.LocalCache$Segment.loadSync(LocalCache.java:2350)
            at com.google.common.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2313)
            at com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2228)
            at com.google.common.cache.LocalCache.get(LocalCache.java:3965)
            at com.google.common.cache.LocalCache.getOrLoad(LocalCache.java:3969)
            at com.google.common.cache.LocalCache$LocalManualCache.get(LocalCache.java:4829)
            at org.jenkinsci.plugins.workflow.support.storage.SimpleXStreamFlowNodeStorage.getNode(SimpleXStreamFlowNodeStorage.java:101)
            Caused: java.io.IOException
            at org.jenkinsci.plugins.workflow.support.storage.SimpleXStreamFlowNodeStorage.getNode(SimpleXStreamFlowNodeStorage.java:109)
            at org.jenkinsci.plugins.workflow.cps.CpsFlowExecution$TimingFlowNodeStorage.getNode(CpsFlowExecution.java:1791)
            at org.jenkinsci.plugins.workflow.cps.CpsFlowExecution.getNode(CpsFlowExecution.java:1179)
            at org.jenkinsci.plugins.workflow.graph.FlowNode.loadParents(FlowNode.java:165)
            at org.jenkinsci.plugins.workflow.graph.FlowNode.getParents(FlowNode.java:156)
            at org.jenkinsci.plugins.workflow.graph.StandardGraphLookupView.bruteForceScanForEnclosingBlock(StandardGraphLookupView.java:150)
            at org.jenkinsci.plugins.workflow.graph.StandardGraphLookupView.findEnclosingBlockStart(StandardGraphLookupView.java:197)
            at org.jenkinsci.plugins.workflow.graph.StandardGraphLookupView.findAllEnclosingBlockStarts(StandardGraphLookupView.java:217)
            at org.jenkinsci.plugins.workflow.flow.FlowExecution.findAllEnclosingBlockStarts(FlowExecution.java:299)
            at org.jenkinsci.plugins.workflow.graph.FlowNode.getEnclosingBlocks(FlowNode.java:195)
            at org.jenkinsci.plugins.workflow.support.steps.ExecutorStepExecution$PlaceholderTask.concatenateAllEnclosingLabels(ExecutorStepExecution.java:527)
            at org.jenkinsci.plugins.workflow.support.steps.ExecutorStepExecution$PlaceholderTask.getAffinityKey(ExecutorStepExecution.java:545)
            at hudson.model.LoadBalancer$1.assignGreedily(LoadBalancer.java:115)
            at hudson.model.LoadBalancer$1.map(LoadBalancer.java:105)
            at hudson.model.LoadBalancer$2.map(LoadBalancer.java:157)
            at hudson.model.Queue.maintain(Queue.java:1629)
            at hudson.model.Queue$MaintainTask.doRun(Queue.java:2887)
            at hudson.triggers.SafeTimerTask.run(SafeTimerTask.java:72)
            at jenkins.security.ImpersonatingScheduledExecutorService$1.run(ImpersonatingScheduledExecutorService.java:58)
            at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
            at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
            at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
            at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
            at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
            at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
            at java.lang.Thread.run(Thread.java:748)
            {noformat}
            We should make changes on the Pipeline side so that a deserialization failure after the program has already loaded is handled more gracefully, but I think we should also harden core against {{getAffinityKey}} implementations that throw {{RuntimeException}}, so that they do not break Queue maintenance entirely.
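On the core side, the hardening could be as simple as catching {{RuntimeException}} around the {{getAffinityKey}} call and falling back to a stable per-task key. A minimal sketch, using a simplified {{TaskLike}} stand-in (hypothetical; the real fix would guard the call to {{hudson.model.Queue.Task#getAffinityKey}} inside the load balancer):

```java
// Hypothetical, pared-down stand-in for Queue.Task, for illustration only.
interface TaskLike {
    String getAffinityKey();
    String getFullDisplayName();
}

final class SafeAffinityKey {
    /**
     * Calls getAffinityKey() but never lets a RuntimeException escape into
     * Queue maintenance; degrades to the task's display name instead.
     */
    static String of(TaskLike task) {
        try {
            return task.getAffinityKey();
        } catch (RuntimeException e) {
            // Log and fall back rather than aborting the whole maintenance cycle.
            return task.getFullDisplayName();
        }
    }
}
```

The key point is only that the exception stops at the call site, so one corrupt Pipeline build cannot stall scheduling for every other job.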

            On the Pipeline side, I'm not sure how best to handle this issue, since none of the call sites in the stack trace look like a great place to actually handle the error. Here are some ideas:
             # Harden {{StandardGraphLookupView.bruteForceScanForEnclosingBlock}} to make it return null if there is an error deserializing a node or we encounter a node with no parents. This would work, but handling the error here doesn't really seem right, since other code would still have to deal with the same corrupt {{FlowNode}} elsewhere.
             # Add a try/catch inside {{CpsFlowExecution.getNode}}, and if an exception is caught, treat it as a fatal error for the execution, handling it the same way as an error loading flow heads in {{CpsFlowExecution.onLoad}}. In this case the program is already loaded and potentially executing, so I'm not sure there is a clean way to shut it down from that method, and the method might be used in contexts where that is not the desired behavior.
             # Eagerly load all flow nodes in {{CpsFlowExecution.onLoad}}. That would probably have bad performance implications (although maybe not as bad as I am imagining, since loading happens with many graph traversals anyway), and since loaded nodes are cached with soft references, they might need to be reloaded from disk later, in which case the same problem could recur.
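Idea 1 would amount to something like the following sketch. The {{Node}} and {{GraphScan}} types are hypothetical, pared-down stand-ins for the workflow-api classes, and the traversal is simplified (the real scan also tracks nested block ends):

```java
import java.util.List;

// Hypothetical stand-in for FlowNode/BlockStartNode, for illustration only.
class Node {
    final List<Node> parents;
    final boolean blockStart;
    Node(List<Node> parents, boolean blockStart) {
        this.parents = parents;
        this.blockStart = blockStart;
    }
}

final class GraphScan {
    /**
     * Walks up the parent chain looking for an enclosing block start.
     * Returns null instead of throwing when a node has no parents,
     * whether because it is the start node or because the graph is corrupt.
     */
    static Node enclosingBlockStart(Node current) {
        while (true) {
            if (current.parents.isEmpty()) {
                return null; // start node, or corrupt graph: stop the scan
            }
            Node parent = current.parents.get(0);
            if (parent.blockStart) {
                return parent;
            }
            current = parent;
        }
    }
}
```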
            dnusbaum Devin Nusbaum added a comment -

            Jesse Glick Any thoughts on a good place in Pipeline code to handle the error shown by the second stack trace in the description? (assuming you think it makes sense to try to handle this kind of error) Since {{FlowNode}}s are not eagerly loaded on restart and may need to be reloaded if they are evicted from the cache, it seems like anything that loads {{FlowNode}}s needs to throw {{IOException}}, but there are places where we swallow these kinds of errors or rethrow them as a {{RuntimeException}}, potentially leaving the flow graph corrupt for other users (see {{FlowNode.getParents}} and {{StandardGraphLookupView.bruteForceScanForEnclosingBlock}}).

            Does my second idea (make {{CpsFlowExecution.getNode}} try to stop the execution when a node cannot be loaded) seem like something worth looking into? If it is even possible, I am a little concerned that it might cause executions that could have otherwise completed successfully, or with minor data loss, to fail completely.

            jglick Jesse Glick made changes -
            Link This issue is caused by JENKINS-36547 [ JENKINS-36547 ]
            jglick Jesse Glick made changes -
            Labels robustness
            jglick Jesse Glick added a comment -

            For the first stack trace, I would start by fixing bruteForceScanForEnclosingBlock to not assume !current.getParents().isEmpty(). You could probably remove the special-case handling of FlowStartNode in that case: when the parents list is empty for any reason, break the recursion and return null.

            Halting execution if there is a problem in the node graph seems reasonable to me. Something more important is likely to go wrong with the build anyway, or have gone wrong already.

            dnusbaum Devin Nusbaum made changes -
            Remote Link This issue links to "jenkinsci/workflow-api-plugin#94 (Web Link)" [ 23060 ]
            dnusbaum Devin Nusbaum added a comment -

            Thanks, I filed https://github.com/jenkinsci/workflow-api-plugin/pull/94 to harden bruteForceScanForEnclosingBlock.

            {quote}Halting execution if there is a problem in the node graph seems reasonable to me{quote}

            Ok, and you think it makes sense to do this from {{CpsFlowExecution.getNode}}, so that that method would no longer throw exceptions and would instead terminate the program and return null if loading the node throws an error?
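For reference, the behavior being proposed could be sketched like this, with a hypothetical {{NodeStorage}} interface standing in for the real flow node storage (illustrative only, not the actual {{CpsFlowExecution}} internals):

```java
import java.io.IOException;

// Hypothetical stand-in for the flow node storage layer, for illustration only.
interface NodeStorage {
    Object getNode(String id) throws IOException;
}

class Execution {
    private final NodeStorage storage;
    private boolean halted;
    Execution(NodeStorage storage) { this.storage = storage; }

    /**
     * Loads a node by id; on an I/O failure, marks the execution as fatally
     * broken and returns null instead of propagating the exception to callers
     * such as Queue maintenance.
     */
    Object getNode(String id) {
        try {
            return storage.getNode(id);
        } catch (IOException e) {
            halted = true; // would trigger the same cleanup as a head-loading failure
            return null;
        }
    }

    boolean isHalted() { return halted; }
}
```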

            jglick Jesse Glick added a comment -

            Well you can try.

            dnusbaum Devin Nusbaum made changes -
            Status In Progress [ 3 ] In Review [ 10005 ]
            dnusbaum Devin Nusbaum added a comment -

            Ok, considering that the PRs to core and workflow-api should fix the immediate problems described by this ticket, and that halting Pipeline execution is a bit more experimental and would require more time to investigate than I have at the moment, I filed JENKINS-57913 to track that as a separate improvement.

            dnusbaum Devin Nusbaum made changes -
            Link This issue relates to JENKINS-57913 [ JENKINS-57913 ]
            dnusbaum Devin Nusbaum made changes -
            Released As workflow-api 2.35
            oleg_nenashev Oleg Nenashev made changes -
            Labels robustness lts-candidate robustness
            Released As workflow-api 2.35 workflow-api 2.35, Jenkins 2.181
            dnusbaum Devin Nusbaum made changes -
            Status In Review [ 10005 ] Resolved [ 5 ]
            Resolution Fixed [ 1 ]
            danielbeck Daniel Beck made changes -
            Labels lts-candidate robustness robustness

              People

              • Assignee:
                dnusbaum Devin Nusbaum
                Reporter:
                dnusbaum Devin Nusbaum
              • Votes: 0
                Watchers: 2
