Uploaded image for project: 'Jenkins'
  1. Jenkins
  2. JENKINS-25623

timeout step should be able to kill infinite loop

    Details

    • Similar Issues:

      Description

      Timeout step is currently only able to interrupt executions that are pausing (such as the 'sh' step.)

      It would be nice if it can also interrupt infinite loops like while(true) ; by magically throwing an exception.

      To safely doing this without race seems to require that CpsThread.runNextChunk spontaneously return even without the CPS code explicitly yielding. When resuming, any abnormal Outcome should act as if it's throwing the exception, and normal Outcome should get ignored and its natural result should be used instead.

      The spontaneous return could happen every N instructions. Or CpsTransformer can insert Block that does it, like HotSpot inserts safepoint checks at the beginning of every function and every loop.

        Attachments

          Issue Links

            Activity

            kohsuke Kohsuke Kawaguchi created issue -
            rtyler R. Tyler Croy made changes -
            Field Original Value New Value
            Workflow JNJira [ 159597 ] JNJira + In-Review [ 180047 ]
            jglick Jesse Glick made changes -
            Link This issue is related to JENKINS-32986 [ JENKINS-32986 ]
            Hide
            jglick Jesse Glick added a comment -

            We probably need a multistaged approach here.

            • A FlowInterruptedException arising from a containing step like timeout should be able to interrupt CPS-transformed code not inside a StepExecution at a safe point (runNextChunk). I suppose similar to how we implemented the pause function, except aborting rather than pausing.
            • A build interrupt should be able to do the same. This is coming in via different code path but should delegate to the same implementation.
            • If the CPS VM is running native code, Thread.interrupt should be called. It should be given a limited grace period—say, a few seconds—to terminate; after that, resort to Thread.stop¹, making sure we are able to provide a fresh Thread for the pool so we can still run finally blocks or whatever.
            • We may also need some sort of per-build CPS VM CPU quota, distinct from timeout in that we do not care about wall clock time spent running a shell script on an agent, we just care about not overloading the master. Alternately, if a given build starts taking too much CPU time (measurable via System.nanoTime around runNextChunk), gradually being delaying its chunk execution (i.e., CpsThreadGroup.scheduleRun may call schedule rather than submit) so that it does not hog the system, and also institute a hard time limit for individual chunks (such as slow native methods).

            ¹Ignore the stern deprecation warnings. In practice it works fine—have used it for years in NetBeans “internal execution”, such as is used for Ant scripts—and any harm it might cause is AFAIK purely theoretical and a much lesser evil than JENKINS-32986.

            Show
            jglick Jesse Glick added a comment - We probably need a multistaged approach here. A FlowInterruptedException arising from a containing step like timeout should be able to interrupt CPS-transformed code not inside a StepExecution at a safe point ( runNextChunk ). I suppose similar to how we implemented the pause function, except aborting rather than pausing. A build interrupt should be able to do the same. This is coming in via different code path but should delegate to the same implementation. If the CPS VM is running native code, Thread.interrupt should be called. It should be given a limited grace period—say, a few seconds—to terminate; after that, resort to Thread.stop ¹, making sure we are able to provide a fresh Thread for the pool so we can still run finally blocks or whatever. We may also need some sort of per-build CPS VM CPU quota, distinct from timeout in that we do not care about wall clock time spent running a shell script on an agent, we just care about not overloading the master. Alternately, if a given build starts taking too much CPU time (measurable via System.nanoTime around runNextChunk ), gradually being delaying its chunk execution (i.e., CpsThreadGroup.scheduleRun may call schedule rather than submit ) so that it does not hog the system, and also institute a hard time limit for individual chunks (such as slow native methods). ¹Ignore the stern deprecation warnings. In practice it works fine—have used it for years in NetBeans “internal execution”, such as is used for Ant scripts—and any harm it might cause is AFAIK purely theoretical and a much lesser evil than JENKINS-32986 .
            jglick Jesse Glick made changes -
            Epic Link JENKINS-35396 [ 171189 ]
            Hide
            jglick Jesse Glick added a comment -

            Another issue is unbounded heap consumption (without consuming much CPU). I doubt we can do anything about this, as Java does not even provide a means to track it. (From CPS-transformed code, perhaps, but not from native code including @NonCPS.)

            Show
            jglick Jesse Glick added a comment - Another issue is unbounded heap consumption (without consuming much CPU). I doubt we can do anything about this, as Java does not even provide a means to track it. (From CPS-transformed code, perhaps, but not from native code including @NonCPS .)
            Hide
            kohsuke Kohsuke Kawaguchi added a comment -

            I came here to mention safepoints like HotSpot.

            Show
            kohsuke Kohsuke Kawaguchi added a comment - I came here to mention safepoints like HotSpot.
            Hide
            scm_issue_link SCM/JIRA link daemon added a comment -

            Code changed in jenkins
            User: Kohsuke Kawaguchi
            Path:
            pom.xml
            src/main/java/org/jenkinsci/plugins/workflow/cps/CpsGroovyShellFactory.java
            src/main/java/org/jenkinsci/plugins/workflow/cps/CpsThread.java
            src/main/java/org/jenkinsci/plugins/workflow/cps/CpsThreadGroup.java
            src/main/java/org/jenkinsci/plugins/workflow/cps/Safepoint.java
            src/test/java/org/jenkinsci/plugins/workflow/WorkflowJobNonRestartingTest.java
            http://jenkins-ci.org/commit/workflow-cps-plugin/95e73725ec0479685916ace0e1b2d80801519e43
            Log:
            JENKINS-25623

            previously, CpsThreadGroup.run() was greedy. It was running as much as
            it can before it returns. This means if the Pipeline script in question
            has an infinite loop, this method never returns. This prevents other
            activities to take place on CPS VM thread, most notably the attempt to
            kill the execution.

            In this change, CpsThreadGroup.run() is modified to execute just a
            little bit, not as much as it can, by using the new safepoint capability
            in groovy-cps. CpsThreadGroup.scheduleRun() is modified so that if the
            Pipeline Script is still runnable, the next chunk of execution is
            scheduled.

            Future implementation from scheduleRun() was simplified a bit. This code
            is really only used for testing, so it's not that important that the
            cancel() and other methods work correctly.

            Show
            scm_issue_link SCM/JIRA link daemon added a comment - Code changed in jenkins User: Kohsuke Kawaguchi Path: pom.xml src/main/java/org/jenkinsci/plugins/workflow/cps/CpsGroovyShellFactory.java src/main/java/org/jenkinsci/plugins/workflow/cps/CpsThread.java src/main/java/org/jenkinsci/plugins/workflow/cps/CpsThreadGroup.java src/main/java/org/jenkinsci/plugins/workflow/cps/Safepoint.java src/test/java/org/jenkinsci/plugins/workflow/WorkflowJobNonRestartingTest.java http://jenkins-ci.org/commit/workflow-cps-plugin/95e73725ec0479685916ace0e1b2d80801519e43 Log: JENKINS-25623 previously, CpsThreadGroup.run() was greedy. It was running as much as it can before it returns. This means if the Pipeline script in question has an infinite loop, this method never returns. This prevents other activities to take place on CPS VM thread, most notably the attempt to kill the execution. In this change, CpsThreadGroup.run() is modified to execute just a little bit, not as much as it can, by using the new safepoint capability in groovy-cps. CpsThreadGroup.scheduleRun() is modified so that if the Pipeline Script is still runnable, the next chunk of execution is scheduled. Future implementation from scheduleRun() was simplified a bit. This code is really only used for testing, so it's not that important that the cancel() and other methods work correctly.
            Hide
            kohsuke Kohsuke Kawaguchi added a comment -

            Implemented the safepoints & kill.

            This change does:

            • allow interruption to kill infinite loops and recursions
            • timeout to work correctly even when the thread is not paused at step
              This change does not:
            • allow interruption to escalate to Thread.interrup() / Thread.stop() to forcibly stop an execution that's happening in non-CPS code
            • do any CPU usage accounting.
            Show
            kohsuke Kohsuke Kawaguchi added a comment - Implemented the safepoints & kill. This change does: allow interruption to kill infinite loops and recursions timeout to work correctly even when the thread is not paused at step This change does not: allow interruption to escalate to Thread.interrup() / Thread.stop() to forcibly stop an execution that's happening in non-CPS code do any CPU usage accounting.
            kohsuke Kohsuke Kawaguchi made changes -
            Remote Link This issue links to "PR #46 (Web Link)" [ 14746 ]
            jglick Jesse Glick made changes -
            Status Open [ 1 ] In Progress [ 3 ]
            jglick Jesse Glick made changes -
            Status In Progress [ 3 ] In Review [ 10005 ]
            Hide
            jglick Jesse Glick added a comment -

            Maybe we can leave the open tasks to JENKINS-32986.

            Show
            jglick Jesse Glick added a comment - Maybe we can leave the open tasks to JENKINS-32986 .
            abayer Andrew Bayer made changes -
            Component/s pipeline-general [ 21692 ]
            abayer Andrew Bayer made changes -
            Component/s workflow-plugin [ 18820 ]
            jglick Jesse Glick made changes -
            Component/s workflow-cps-plugin [ 21713 ]
            Component/s pipeline [ 21692 ]
            jglick Jesse Glick made changes -
            Labels robustness
            jglick Jesse Glick made changes -
            Status In Review [ 10005 ] Resolved [ 5 ]
            Resolution Fixed [ 1 ]
            Hide
            scm_issue_link SCM/JIRA link daemon added a comment -

            Code changed in jenkins
            User: Jesse Glick
            Path:
            pom.xml
            src/main/java/org/jenkinsci/plugins/workflow/cps/CpsBodyExecution.java
            src/main/java/org/jenkinsci/plugins/workflow/cps/CpsFlowExecution.java
            src/main/java/org/jenkinsci/plugins/workflow/cps/CpsGroovyShellFactory.java
            src/main/java/org/jenkinsci/plugins/workflow/cps/CpsThread.java
            src/main/java/org/jenkinsci/plugins/workflow/cps/CpsThreadGroup.java
            src/main/java/org/jenkinsci/plugins/workflow/cps/Safepoint.java
            src/test/java/org/jenkinsci/plugins/workflow/WorkflowJobNonRestartingTest.java
            http://jenkins-ci.org/commit/workflow-cps-plugin/f4a660726b6d7a6d149c919ae7c5971ac9bb4818
            Log:
            Merge pull request #46 from jenkinsci/JENKINS-25623

            JENKINS-25623 time out & kill infinite recursion

            Compare: https://github.com/jenkinsci/workflow-cps-plugin/compare/1100650f3c9b...f4a660726b6d

            Show
            scm_issue_link SCM/JIRA link daemon added a comment - Code changed in jenkins User: Jesse Glick Path: pom.xml src/main/java/org/jenkinsci/plugins/workflow/cps/CpsBodyExecution.java src/main/java/org/jenkinsci/plugins/workflow/cps/CpsFlowExecution.java src/main/java/org/jenkinsci/plugins/workflow/cps/CpsGroovyShellFactory.java src/main/java/org/jenkinsci/plugins/workflow/cps/CpsThread.java src/main/java/org/jenkinsci/plugins/workflow/cps/CpsThreadGroup.java src/main/java/org/jenkinsci/plugins/workflow/cps/Safepoint.java src/test/java/org/jenkinsci/plugins/workflow/WorkflowJobNonRestartingTest.java http://jenkins-ci.org/commit/workflow-cps-plugin/f4a660726b6d7a6d149c919ae7c5971ac9bb4818 Log: Merge pull request #46 from jenkinsci/ JENKINS-25623 JENKINS-25623 time out & kill infinite recursion Compare: https://github.com/jenkinsci/workflow-cps-plugin/compare/1100650f3c9b...f4a660726b6d
            jglick Jesse Glick made changes -
            Link This issue relates to JENKINS-38157 [ JENKINS-38157 ]
            Hide
            scm_issue_link SCM/JIRA link daemon added a comment -

            Code changed in jenkins
            User: Jesse Glick
            Path:
            src/main/java/org/jenkinsci/plugins/workflow/cps/CpsThreadGroup.java
            http://jenkins-ci.org/commit/workflow-cps-plugin/22fa38ac38ea8fd61c9b615779f5b793cd381214
            Log:
            Avoid saving program.dat if we are about to run another chunk anyway.
            Reverts this aspect of behavior to that prior to the fix of JENKINS-25623.
            Also logging time spent saving.

            Show
            scm_issue_link SCM/JIRA link daemon added a comment - Code changed in jenkins User: Jesse Glick Path: src/main/java/org/jenkinsci/plugins/workflow/cps/CpsThreadGroup.java http://jenkins-ci.org/commit/workflow-cps-plugin/22fa38ac38ea8fd61c9b615779f5b793cd381214 Log: Avoid saving program.dat if we are about to run another chunk anyway. Reverts this aspect of behavior to that prior to the fix of JENKINS-25623 . Also logging time spent saving.
            jglick Jesse Glick made changes -
            Link This issue is duplicated by JENKINS-32228 [ JENKINS-32228 ]

              People

              • Assignee:
                kohsuke Kohsuke Kawaguchi
                Reporter:
                kohsuke Kohsuke Kawaguchi
              • Votes:
                1 Vote for this issue
                Watchers:
                8 Start watching this issue

                Dates

                • Created:
                  Updated:
                  Resolved: