Uploaded image for project: 'Jenkins'
  1. Jenkins
  2. JENKINS-50784

"Replay" pipeline button/link is not (always?) available (anymore?)

    Details

    • Type: Bug
    • Status: Closed (View Workflow)
    • Priority: Major
    • Resolution: Done
    • Labels:
    • Environment:
    • Similar Issues:

      Description

      Based on https://groups.google.com/forum/?#!msg/jenkinsci-users/fsU3BnOpGOk/9q1bBAxRCgAJ (Sorry about kind of cross-posting; and sorry for my naive guess of involved "component/s")

      Since (felt) very, very recently the "Replay" button is not shown anymore for some slightly older (pipeline) builds.

      What does it depend on?

      • It does NOT seem to be available based on (a) work day, or (jenkins log inspired suspicions) (b) jobAnalytics or (c) Workspace clean-up.
      • And also not Jenkins (master) restart dependent.

      (Maybe it always behaved like this and I have really never ever tried to replay a slightly older build. Which I strongly doubt at the moment.)

        Attachments

          Issue Links

            Activity

            reinholdfuereder Reinhold Füreder created issue -
            Hide
            bherfurth Ben Herfurth added a comment -

            I just experienced the same problem ...

            Could it be, that if a build slave is not available anymore, one can't run the build again?

            Show
            bherfurth Ben Herfurth added a comment - I just experienced the same problem ... Could it be, that if a build slave is not available anymore, one can't run the build again?
            Hide
            svanoort Sam Van Oort added a comment -

            Andrew Bayer Could this be related to the permissions changes at all?

            Show
            svanoort Sam Van Oort added a comment - Andrew Bayer Could this be related to the permissions changes at all?
            Hide
            abayer Andrew Bayer added a comment -

            The permission changes from a few months ago would have just made the "Rebuild" button show up instead of "Replay" if the user didn't have config permissions - I think this is actually more likely to be an annoying side effect of not lazy-loading the FlowExecution any more - if run.asFlowExecutionOwner().getOrNull() is null, the run wouldn't be viewed as rebuildable and so neither "Rebuild" nor "Replay" would show up. Grr. We'll need to think on how to handle this.

            Show
            abayer Andrew Bayer added a comment - The permission changes from a few months ago would have just made the "Rebuild" button show up instead of "Replay" if the user didn't have config permissions - I think this is actually more likely to be an annoying side effect of not lazy-loading the FlowExecution any more - if run.asFlowExecutionOwner().getOrNull() is null, the run wouldn't be viewed as rebuildable and so neither "Rebuild" nor "Replay" would show up. Grr. We'll need to think on how to handle this.
            Hide
            svanoort Sam Van Oort added a comment -

            Dang, you're right. I think this opens up a whole other class of ugliess around the execution promise – because why just settle for a simple concurrency model using either synchronization or promises when you can combine synchronization, forced use of a single thread for the CpsVmThread, AND promises in one class?

            Show
            svanoort Sam Van Oort added a comment - Dang, you're right. I think this opens up a whole other class of ugliess around the execution promise – because why just settle for a simple concurrency model using either synchronization or promises when you can combine synchronization, forced use of a single thread for the CpsVmThread, AND promises in one class?
            Hide
            abayer Andrew Bayer added a comment -

            At first glance, I'm wondering if we should be changing WorkflowRun.Owner#getOrNull() to fall back on returning the existing WorkflowRun#execution? Except I'm not clear yet whether that will be null when completed is true. Fun.

            Show
            abayer Andrew Bayer added a comment - At first glance, I'm wondering if we should be changing WorkflowRun.Owner#getOrNull() to fall back on returning the existing WorkflowRun#execution ? Except I'm not clear yet whether that will be null when completed is true. Fun.
            Hide
            svanoort Sam Van Oort added a comment -

            Andrew Bayer It may still be null. Unfortunately we still have cases where it will be nulled out for old and busted but completed builds.

            I almost want to fall back to the standard #get() method there, which will force a slower return but at least not rely on the excutionPromise working correctly... but am going to have to trace through the consequences of that.

            Show
            svanoort Sam Van Oort added a comment - Andrew Bayer It may still be null. Unfortunately we still have cases where it will be nulled out for old and busted but completed builds. I almost want to fall back to the standard #get() method there, which will force a slower return but at least not rely on the excutionPromise working correctly... but am going to have to trace through the consequences of that.
            Hide
            svanoort Sam Van Oort added a comment - - edited

            So, as best I can determine:

            1. WorkflowRun.Owner#getOrNull() is pretty much incompatible with lazy loading of the FlowExecution-- either we violate its premise of not blocking (in which case, why bother with it) or we accept bogus nulls when lazy load hasn't happened. We may consider switching consumers of that API to blocking calls maybe (it is only used a tiny number of places).
            2. The WorkflowRun.executionPromise field can be null freely since it is only initialized on first run... we could solve this with a lazy-initialization on the getter, but that really mandates synchronization and I cannot determine which of 3 objects we should be synchronizing on (the run, the copyLogsGuard, or LOADING_RUNS). NPEs that could be thrown within getOrNull when loading the build are, um, swallowed if your log level is >FINE (it only logs them).
            3. We could add null checks for the executionPromise in the first few places it's touched, and have getOrNull take advantage of that
            4. WorkflowRun#getExecutionPromise is (aside from the Owner.getOrNull) exclusively used in tests to wait until the run begins.

            Show
            svanoort Sam Van Oort added a comment - - edited So, as best I can determine: 1. WorkflowRun.Owner#getOrNull() is pretty much incompatible with lazy loading of the FlowExecution-- either we violate its premise of not blocking (in which case, why bother with it) or we accept bogus nulls when lazy load hasn't happened. We may consider switching consumers of that API to blocking calls maybe (it is only used a tiny number of places). 2. The WorkflowRun.executionPromise field can be null freely since it is only initialized on first run... we could solve this with a lazy-initialization on the getter, but that really mandates synchronization and I cannot determine which of 3 objects we should be synchronizing on (the run, the copyLogsGuard, or LOADING_RUNS). NPEs that could be thrown within getOrNull when loading the build are, um, swallowed if your log level is >FINE (it only logs them). 3. We could add null checks for the executionPromise in the first few places it's touched, and have getOrNull take advantage of that 4. WorkflowRun#getExecutionPromise is (aside from the Owner.getOrNull) exclusively used in tests to wait until the run begins.
            svanoort Sam Van Oort made changes -
            Field Original Value New Value
            Labels regression
            Hide
            svanoort Sam Van Oort added a comment -

            Comment 2: the behavior of the ExecutionPromise is poorly defined with regards to loading the execution after a restart of Jenkins AFAICT.

            And in general there's not a sensible way to distinguish the following cases:

            1. FlowExecution is null because we're just beginning the Run and it hasn't been created yet
            2. FlowExecution is null because something went horribly wrong and it had to be trashed

            I'm also wondering if this may relate somehow to https://issues.jenkins-ci.org/browse/JENKINS-50199 and https://issues.jenkins-ci.org/browse/JENKINS-49686 in some vague way since it seems to open up a possibility that the Owner.get() method might be failing to invoke the FlowExecution.onLoad.

            Show
            svanoort Sam Van Oort added a comment - Comment 2: the behavior of the ExecutionPromise is poorly defined with regards to loading the execution after a restart of Jenkins AFAICT. And in general there's not a sensible way to distinguish the following cases: 1. FlowExecution is null because we're just beginning the Run and it hasn't been created yet 2. FlowExecution is null because something went horribly wrong and it had to be trashed I'm also wondering if this may relate somehow to https://issues.jenkins-ci.org/browse/JENKINS-50199 and https://issues.jenkins-ci.org/browse/JENKINS-49686 in some vague way since it seems to open up a possibility that the Owner.get() method might be failing to invoke the FlowExecution.onLoad.
            Hide
            jglick Jesse Glick added a comment -

            Sam Van Oort when referring to other issues, just paste in the ID, not the link, or JIRA will not properly provide tool tips. So: JENKINS-50199 and JENKINS-49686

            Show
            jglick Jesse Glick added a comment - Sam Van Oort when referring to other issues, just paste in the ID, not the link, or JIRA will not properly provide tool tips. So: JENKINS-50199 and JENKINS-49686
            svanoort Sam Van Oort made changes -
            Link This issue is duplicated by JENKINS-50820 [ JENKINS-50820 ]
            Hide
            jglick Jesse Glick added a comment -

            I am not sure what the concerns surrounding corner cases are, but if the issue is simply that for a build which was loaded from disk (not started in this session) for which we did not independently load the FlowExecution, that ReplayAction.isEnabled is returning false because WorkflowRun.Owner.getOrNull is returning null for reasons of laziness, then the fix is simple: make the action enabled whenever we have a FlowExecutionOwner, and if and when the user invokes the action, then call get() (blocking), check RUN_SCRIPTS for non-sandbox mode, and provide some sort of UI notification if these preconditions are not met. Could all happen in ReplayAction/index.jelly in fact.

            Show
            jglick Jesse Glick added a comment - I am not sure what the concerns surrounding corner cases are, but if the issue is simply that for a build which was loaded from disk (not started in this session) for which we did not independently load the FlowExecution , that ReplayAction.isEnabled is returning false because WorkflowRun.Owner.getOrNull is returning null for reasons of laziness, then the fix is simple: make the action enabled whenever we have a FlowExecutionOwner , and if and when the user invokes the action, then call get() (blocking), check RUN_SCRIPTS for non-sandbox mode, and provide some sort of UI notification if these preconditions are not met. Could all happen in ReplayAction/index.jelly in fact.
            Show
            jglick Jesse Glick added a comment - Also, the following calls to getOrNull should be reviewed to see whether they will still produce reasonable behavior given the lazy-loading change: https://github.com/jenkinsci/workflow-cps-plugin/blob/9b0848c4efd71f051c653fc288492a2d70a84e58/src/main/java/org/jenkinsci/plugins/workflow/cps/CpsFlowExecution.java#L1779 (arguably a regression) https://github.com/jenkinsci/workflow-cps-plugin/blob/41ac59c8f0878645ee5d697cdf8f37ce88c68a7d/src/main/java/org/jenkinsci/plugins/workflow/cps/RunningFlowActions.java#L51 (I think this one is fine) https://github.com/jenkinsci/workflow-multibranch-plugin/blob/dd6fbd2c1fdba58bc0a8b0404b72304beb3e09be/src/main/java/org/jenkinsci/plugins/workflow/multibranch/JobPropertyStep.java#L104 (looks like a regression) https://github.com/jenkinsci/parallel-test-executor-plugin/blob/89aad65707343dc98339b006fa4fe07ba1fe58b0/src/main/java/org/jenkinsci/plugins/parallel_test_executor/ParallelTestExecutor.java#L170 (probably fine) https://github.com/jenkinsci/junit-realtime-test-reporter-plugin/blob/b929d6924439f4c31775c704d3d83f8b052533a5/src/test/java/org/jenkinsci/plugins/junitrealtimetestreporter/RealtimeJUnitStepTest.java#L218 (probably fine) https://github.com/jenkinsci/throttle-concurrent-builds-plugin/blob/d105bb791364f3d8bfe42dcad448b14f9b0ed8b7/src/main/java/hudson/plugins/throttleconcurrents/ThrottleJobProperty.java#L327 (probably fine)
            Hide
            jglick Jesse Glick added a comment -

            Note that you could trigger a background load if and when someone calls getOrNull (but return null for now). Whether or not that is too late depends on the caller. In the case of Replay, it would mean that the action would not initially appear in the sidebar, but then it would appear in the context menu a moment later. For PipelineTimings and JobPropertyStep, this would not really help—in both cases it probably makes more sense to just use get and block if necessary. And I think the originally suggested change to ReplayAction (+ its index.jelly) would be desirable anyway—more transparent to have the link always be there, and if it cannot be used, explain why.

            Show
            jglick Jesse Glick added a comment - Note that you could trigger a background load if and when someone calls getOrNull (but return null for now). Whether or not that is too late depends on the caller. In the case of Replay , it would mean that the action would not initially appear in the sidebar, but then it would appear in the context menu a moment later. For PipelineTimings and JobPropertyStep , this would not really help—in both cases it probably makes more sense to just use get and block if necessary. And I think the originally suggested change to ReplayAction (+ its index.jelly ) would be desirable anyway—more transparent to have the link always be there, and if it cannot be used, explain why.
            Hide
            svanoort Sam Van Oort added a comment -

            So, we were right that it's related to the lazy load functionality, but wrong about why... it has to do with the FlowExecutionOwner doing The Wrong Thing. AFAICT the ReplayAction actually works in testing currently if we apply the workflow-job patch proposed in the linked PR – my guess is something upstream triggers the full execution loading (looking into what, though).

            Show
            svanoort Sam Van Oort added a comment - So, we were right that it's related to the lazy load functionality, but wrong about why... it has to do with the FlowExecutionOwner doing The Wrong Thing. AFAICT the ReplayAction actually works in testing currently if we apply the workflow-job patch proposed in the linked PR – my guess is something upstream triggers the full execution loading (looking into what, though).
            svanoort Sam Van Oort made changes -
            Remote Link This issue links to "Workflow-job #95 (Web Link)" [ 20466 ]
            svanoort Sam Van Oort made changes -
            Status Open [ 1 ] In Progress [ 3 ]
            svanoort Sam Van Oort made changes -
            Assignee Sam Van Oort [ svanoort ]
            Hide
            svanoort Sam Van Oort added a comment -

            Vetting the getOrNull uses:

            • The multibranch one is indeed an issue, so is the parallelTestExecutor, and the Pipeline Timings, others are OK
            • Replay needs to fully fetch executions if you are fetching execution info (addressing via PR).
            • We do need to address the use of getOrNull in ReplayAction as well
            Show
            svanoort Sam Van Oort added a comment - Vetting the getOrNull uses: The multibranch one is indeed an issue, so is the parallelTestExecutor, and the Pipeline Timings, others are OK Replay needs to fully fetch executions if you are fetching execution info (addressing via PR). We do need to address the use of getOrNull in ReplayAction as well
            svanoort Sam Van Oort made changes -
            Remote Link This issue links to "workflow-multibranch-plugin #73 (Web Link)" [ 20467 ]
            svanoort Sam Van Oort made changes -
            Remote Link This issue links to "parallel-test-executor-plugin #35 (Web Link)" [ 20468 ]
            svanoort Sam Van Oort made changes -
            Remote Link This issue links to "workflow-cps-plugin #222 (Web Link)" [ 20469 ]
            Hide
            scm_issue_link SCM/JIRA link daemon added a comment -

            Code changed in jenkins
            User: Sam Van Oort
            Path:
            src/main/java/org/jenkinsci/plugins/workflow/multibranch/JobPropertyStep.java
            http://jenkins-ci.org/commit/workflow-multibranch-plugin/43f130cbc0a339a02233e9bab943752d4cf44007
            Log:
            Fix JENKINS-50784 by using a blocking load of execution where needed

            Show
            scm_issue_link SCM/JIRA link daemon added a comment - Code changed in jenkins User: Sam Van Oort Path: src/main/java/org/jenkinsci/plugins/workflow/multibranch/JobPropertyStep.java http://jenkins-ci.org/commit/workflow-multibranch-plugin/43f130cbc0a339a02233e9bab943752d4cf44007 Log: Fix JENKINS-50784 by using a blocking load of execution where needed
            Hide
            scm_issue_link SCM/JIRA link daemon added a comment -

            Code changed in jenkins
            User: Sam Van Oort
            Path:
            src/main/java/org/jenkinsci/plugins/workflow/multibranch/JobPropertyStep.java
            http://jenkins-ci.org/commit/workflow-multibranch-plugin/bd2a61ffe01a96d3d9677a202c761c06d9bfc433
            Log:
            Merge pull request #73 from svanoort/fix-flowexecutionowner-lazyload

            Fix JENKINS-50784 by using a blocking load of execution where needed

            Compare: https://github.com/jenkinsci/workflow-multibranch-plugin/compare/60845a36c129...bd2a61ffe01a

            Show
            scm_issue_link SCM/JIRA link daemon added a comment - Code changed in jenkins User: Sam Van Oort Path: src/main/java/org/jenkinsci/plugins/workflow/multibranch/JobPropertyStep.java http://jenkins-ci.org/commit/workflow-multibranch-plugin/bd2a61ffe01a96d3d9677a202c761c06d9bfc433 Log: Merge pull request #73 from svanoort/fix-flowexecutionowner-lazyload Fix JENKINS-50784 by using a blocking load of execution where needed Compare: https://github.com/jenkinsci/workflow-multibranch-plugin/compare/60845a36c129...bd2a61ffe01a
            Hide
            scm_issue_link SCM/JIRA link daemon added a comment -

            Code changed in jenkins
            User: Sam Van Oort
            Path:
            pom.xml
            src/main/java/org/jenkinsci/plugins/workflow/cps/CpsFlowExecution.java
            src/main/java/org/jenkinsci/plugins/workflow/cps/replay/ReplayAction.java
            src/test/java/org/jenkinsci/plugins/workflow/cps/replay/ReplayActionTest.java
            http://jenkins-ci.org/commit/workflow-cps-plugin/64848b1f7f83d7846f607181b675e04aaf8caed0
            Log:
            Fixes to JENKINS-50784 to ensure we hard-fetch FlowExecutions when we need to

            Show
            scm_issue_link SCM/JIRA link daemon added a comment - Code changed in jenkins User: Sam Van Oort Path: pom.xml src/main/java/org/jenkinsci/plugins/workflow/cps/CpsFlowExecution.java src/main/java/org/jenkinsci/plugins/workflow/cps/replay/ReplayAction.java src/test/java/org/jenkinsci/plugins/workflow/cps/replay/ReplayActionTest.java http://jenkins-ci.org/commit/workflow-cps-plugin/64848b1f7f83d7846f607181b675e04aaf8caed0 Log: Fixes to JENKINS-50784 to ensure we hard-fetch FlowExecutions when we need to
            Hide
            scm_issue_link SCM/JIRA link daemon added a comment -

            Code changed in jenkins
            User: Sam Van Oort
            Path:
            src/main/java/org/jenkinsci/plugins/workflow/cps/replay/ReplayAction.java
            src/main/resources/org/jenkinsci/plugins/workflow/cps/replay/ReplayAction/index.jelly
            src/test/java/org/jenkinsci/plugins/workflow/cps/replay/ReplayActionTest.java
            http://jenkins-ci.org/commit/workflow-cps-plugin/b29500120684e54eb202f2dad46ad7269e631ab4
            Log:
            Full test and fix for lazy-load issues with ReplayAction JENKINS-50784

            Show
            scm_issue_link SCM/JIRA link daemon added a comment - Code changed in jenkins User: Sam Van Oort Path: src/main/java/org/jenkinsci/plugins/workflow/cps/replay/ReplayAction.java src/main/resources/org/jenkinsci/plugins/workflow/cps/replay/ReplayAction/index.jelly src/test/java/org/jenkinsci/plugins/workflow/cps/replay/ReplayActionTest.java http://jenkins-ci.org/commit/workflow-cps-plugin/b29500120684e54eb202f2dad46ad7269e631ab4 Log: Full test and fix for lazy-load issues with ReplayAction JENKINS-50784
            Hide
            scm_issue_link SCM/JIRA link daemon added a comment -

            Code changed in jenkins
            User: Sam Van Oort
            Path:
            pom.xml
            src/main/java/org/jenkinsci/plugins/workflow/cps/CpsFlowExecution.java
            src/main/java/org/jenkinsci/plugins/workflow/cps/replay/ReplayAction.java
            src/main/java/org/jenkinsci/plugins/workflow/cps/replay/ReplayPipelineCommand.java
            src/main/resources/org/jenkinsci/plugins/workflow/cps/replay/ReplayAction/index.jelly
            src/test/java/org/jenkinsci/plugins/workflow/cps/replay/ReplayActionTest.java
            http://jenkins-ci.org/commit/workflow-cps-plugin/9a54fd25886a301502121918f91bc0b5be29ca13
            Log:
            Merge pull request #222 from svanoort/lazy-load-fixToGetOrNull-JENKINS-50784

            Lazy load fix to get or null JENKINS-50784

            Compare: https://github.com/jenkinsci/workflow-cps-plugin/compare/56f3fde1d5c6...9a54fd25886a

            Show
            scm_issue_link SCM/JIRA link daemon added a comment - Code changed in jenkins User: Sam Van Oort Path: pom.xml src/main/java/org/jenkinsci/plugins/workflow/cps/CpsFlowExecution.java src/main/java/org/jenkinsci/plugins/workflow/cps/replay/ReplayAction.java src/main/java/org/jenkinsci/plugins/workflow/cps/replay/ReplayPipelineCommand.java src/main/resources/org/jenkinsci/plugins/workflow/cps/replay/ReplayAction/index.jelly src/test/java/org/jenkinsci/plugins/workflow/cps/replay/ReplayActionTest.java http://jenkins-ci.org/commit/workflow-cps-plugin/9a54fd25886a301502121918f91bc0b5be29ca13 Log: Merge pull request #222 from svanoort/lazy-load-fixToGetOrNull- JENKINS-50784 Lazy load fix to get or null JENKINS-50784 Compare: https://github.com/jenkinsci/workflow-cps-plugin/compare/56f3fde1d5c6...9a54fd25886a
            Hide
            svanoort Sam Van Oort added a comment -

            Solved as of workflow-cps 2.49, workflow-job 2.20 (plus some ancillary PRs)

            Show
            svanoort Sam Van Oort added a comment - Solved as of workflow-cps 2.49, workflow-job 2.20 (plus some ancillary PRs)
            svanoort Sam Van Oort made changes -
            Status In Progress [ 3 ] Closed [ 6 ]
            Resolution Done [ 10000 ]
            Hide
            reinholdfuereder Reinhold Füreder added a comment -

            Yes, indeed: it works! (Sorry for the delay)

            Thanks Sam Van Oort

            Show
            reinholdfuereder Reinhold Füreder added a comment - Yes, indeed: it works! (Sorry for the delay) Thanks Sam Van Oort

              People

              • Assignee:
                svanoort Sam Van Oort
                Reporter:
                reinholdfuereder Reinhold Füreder
              • Votes:
                3 Vote for this issue
                Watchers:
                11 Start watching this issue

                Dates

                • Created:
                  Updated:
                  Resolved: