Jenkins / JENKINS-22427

Parameterized Remote Trigger Plugin fails when remote job waits for available executor

      Description

      When the remote job is still waiting for an available executor, the job that triggered it reports a failed build.

      Triggering this remote job: *****
      Checking that the remote job ***** is not currently building.
      Remote job remote job ***** is not currenlty building.
      This job is build #[**] on the remote server.
      Triggering remote job now.
      Blocking local job until remote job completes
      ERROR: Remote build failed for the following reason:
      ERROR: http://****/job/***/**/api/json
      Finished: FAILURE

          Activity

          v969540 Kevin Van Poppel created issue -
          v969540 Kevin Van Poppel added a comment -

          Fixed it myself.
          When the build is waiting for an executor, the constructed URL returns a 404.
          I added a loop that retries the request (every 5 seconds, for at most 5 minutes) instead of immediately reporting a failed build.
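
          The retry loop described above could look roughly like the following. This is a minimal sketch, not the patch from the linked repository: the class name QueuedBuildPoller, the method waitForBuildPage, and the IntSupplier standing in for the real HTTP request are all hypothetical.

```java
import java.util.function.IntSupplier;

/** Hypothetical sketch: keep polling while the build URL still answers 404. */
class QueuedBuildPoller {
    static final int HTTP_NOT_FOUND = 404;

    /**
     * Polls statusSource until it returns something other than 404 or until
     * maxWaitMillis of sleeping has accumulated. Returns the last status seen,
     * which is still 404 if the time budget ran out.
     */
    static int waitForBuildPage(IntSupplier statusSource, long intervalMillis, long maxWaitMillis) {
        long waited = 0;
        int status = statusSource.getAsInt();
        while (status == HTTP_NOT_FOUND && waited < maxWaitMillis) {
            try {
                Thread.sleep(intervalMillis);
            } catch (InterruptedException e) {
                // Preserve the interrupt flag and stop polling early.
                Thread.currentThread().interrupt();
                return status;
            }
            waited += intervalMillis;
            status = statusSource.getAsInt();
        }
        return status;
    }

    public static void main(String[] args) {
        // Simulated remote server: two 404s while the build is queued, then 200.
        int[] responses = {404, 404, 200};
        int[] index = {0};
        IntSupplier fakeServer = () -> responses[Math.min(index[0]++, responses.length - 1)];
        System.out.println("Final status: " + waitForBuildPage(fakeServer, 10, 1000));
    }
}
```

          With the values from the comment, the call would be waitForBuildPage(source, 5_000, 300_000). A caller still has to treat a final 404 as a failure, since the build may never have been queued at all.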

          v969540 Kevin Van Poppel made changes -
          Field: Original Value -> New Value
          Status: Open [ 1 ] -> Resolved [ 5 ]
          Assignee: Maurice W. [ morficus ] -> Kevin Van Poppel [ v969540 ]
          Resolution: (empty) -> Fixed [ 1 ]
          morficus Maurice W. added a comment -

          Kevin - could you share your patch, either as a gist (http://gist.github.com) or as a PR?
          Thanks

          morficus Maurice W. added a comment - - edited

          My alternate solution is to have RemoteBuildConfiguration::sendHTTPCall() treat 404s as non-errors when it is called from RemoteBuildConfiguration::getBuildStatus().

          It might not be the safest solution (since it walks up the call stack to figure out who called it), but it certainly seems to be the most effective so far.

          Either way, I would still like to see the code you implemented.
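
          For illustration, a caller check that walks the call stack could be sketched as below. This is an assumption about the general technique, not the code that went into the plugin: the class CallerAwareStatus and its methods are hypothetical, and only the method name getBuildStatus is taken from the comment above.

```java
/** Hypothetical sketch: decide whether a 404 is an error based on the caller. */
class CallerAwareStatus {
    /** Returns true if any frame on the current call stack has the given method name. */
    static boolean calledFrom(String methodName) {
        for (StackTraceElement frame : Thread.currentThread().getStackTrace()) {
            if (frame.getMethodName().equals(methodName)) {
                return true;
            }
        }
        return false;
    }

    /** Treats 4xx/5xx as errors, except a 404 seen while polling build status. */
    static boolean isHttpError(int status) {
        if (status == 404 && calledFrom("getBuildStatus")) {
            return false; // the build page may simply not exist yet
        }
        return status >= 400;
    }

    /** Stand-in for the status-polling path; here it only forwards the check. */
    static boolean getBuildStatus(int status) {
        return isHttpError(status);
    }
}
```

          Matching on a bare method name is fragile (any method with the same name on the stack matches), which is one reason an explicit parameter is usually a safer way to pass this information.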

          morficus Maurice W. made changes -
          Resolution: Fixed [ 1 ] -> (empty)
          Status: Resolved [ 5 ] -> Reopened [ 4 ]
          Assignee: Kevin Van Poppel [ v969540 ] -> Maurice W. [ morficus ]
          v969540 Kevin Van Poppel added a comment -

          Hi Maurice,

          Here is the link to the code:
          https://github.com/v969540/ParamRemoteTrigger

          It would be nice if you could implement it in the official code, or implement a better solution than mine if you know one.
          I might have made small changes to the POM file etc., but these can be reverted if you like.

          I don't think it is safe to treat every 404 as a non-error. I first thought of querying the build queue, but didn't find a way to do this.

          Greetings,

          Kevin

          canuck1987 Tim Brown added a comment -

          I had a similar issue here where we had network problems. I added a retry in sendHTTPCall (triggered when an IOException is caught). It currently waits 'pollInterval' seconds and then sends the HTTP call again. It retries up to this.getConnectionRetryLimit() times, which is currently hardcoded to 5. It would be good to make this configurable if the approach is used.

          This allows any call to be retried up to a specific number of times with a configurable interval.

          GIST: https://gist.github.com/timbrown5/ec4add8797fbf4d9cd19
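
          The shape of such a recursive retry might be sketched as follows. This is not the code from the gist: RetryingHttpCall, HttpCall and sendWithRetry are hypothetical names, the limit of 5 mirrors the hard-coded value mentioned above, and the final failure is rethrown as an UncheckedIOException purely to keep the example short.

```java
import java.io.IOException;
import java.io.UncheckedIOException;

/** Hypothetical sketch of a recursive retry around an HTTP call. */
class RetryingHttpCall {
    /** Stand-in for the real request; not the plugin's API. */
    interface HttpCall {
        String send() throws IOException;
    }

    /** Mirrors the hard-coded limit of 5 mentioned in the comment. */
    static final int CONNECTION_RETRY_LIMIT = 5;

    /**
     * Sends the call; on IOException waits pollIntervalMillis and recurses.
     * Start with numberOfAttempts = 0. After CONNECTION_RETRY_LIMIT retries
     * (so 1 initial call plus 5 retries) the last IOException is rethrown.
     */
    static String sendWithRetry(HttpCall call, long pollIntervalMillis, int numberOfAttempts) {
        try {
            return call.send();
        } catch (IOException e) {
            if (numberOfAttempts >= CONNECTION_RETRY_LIMIT) {
                throw new UncheckedIOException(e);
            }
            try {
                Thread.sleep(pollIntervalMillis);
            } catch (InterruptedException interrupted) {
                // Preserve the interrupt flag and give up instead of retrying.
                Thread.currentThread().interrupt();
                throw new UncheckedIOException(e);
            }
            return sendWithRetry(call, pollIntervalMillis, numberOfAttempts + 1);
        }
    }
}
```

          A caller would start the chain with sendWithRetry(call, pollIntervalMillis, 0). Because the recursion depth is bounded by the retry limit, stack growth is not a concern here.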

          morficus Maurice W. added a comment -

          Tim - I like your solution of a recursive function that tracks "numberOfAttempts". It requires minimal change to the current code base and is pretty easy to follow. And it still follows the same concept as Kevin outlined (a retry loop).

          But the flaw they both have is that the local build can still fail if the remote build does not complete before hitting the max number of retries. At the moment I have just added a new exception to give a clear indication as to why the build failed (you can see it here: https://gist.github.com/morficus/5bd94e330bf4212679b5).

          A work-around that would require no code change... is to increase the default "poll interval" from 10sec to 30sec or even 60sec. That at least decreases the odds of hitting the max number of retries (still hard-coded to 5) if the remote server does have a long-running build. What do you guys think of that?

          Also, all current changes are in this branch: https://github.com/morficus/Parameterized-Remote-Trigger-Plugin/tree/dev-v2.1.x
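
          As a rough idea of what a dedicated exception for the exhausted-retries case can look like, here is a minimal sketch. It is not the code from the gist above: the names ExceededRetryLimitException and RetryBudget, and the message wording, are hypothetical.

```java
/** Hypothetical exception that says explicitly why the local build failed. */
class ExceededRetryLimitException extends RuntimeException {
    ExceededRetryLimitException(String url, int attempts) {
        super("Gave up on " + url + " after " + attempts
                + " attempts; the remote build may still be queued or running.");
    }
}

/** Hypothetical helper that enforces the retry limit. */
class RetryBudget {
    /** Throws once the number of attempts made exceeds the allowed limit. */
    static void check(String url, int attempts, int limit) {
        if (attempts > limit) {
            throw new ExceededRetryLimitException(url, attempts);
        }
    }
}
```

          The point of the separate type is only diagnostic: the console log then distinguishes "ran out of retries" from a genuine failure of the remote build.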

          v969540 Kevin Van Poppel added a comment -

          Changing the poll interval didn't change anything here.
          The job already failed before polling the first time, because you get the 404 error.

          Do any of you guys know a way to check if the triggered build is in the remote server's queue?

          canuck1987 Tim Brown added a comment -

          Are you using 'wait to trigger remote builds until no other builds are running'?
          If not, does enabling it fix your issue? I found I needed both working together for it to be reliable.

          If so, is the problem that the build gets triggered but is waiting in the queue (because it is waiting for an executor)?

          I will see if I can have a look tomorrow. I have had a lot on this last week.

          canuck1987 Tim Brown added a comment -

          It looks like the root issue is that we are not getting a response from the REST API. This is because a build won't have a REST API page until it gets an executor. When we try to call the page we get a null response. There is an update coming for the JSON issue you're seeing, thanks to @scotthains, which will likely fix your issue.

          I think this should work because the getBuildStatus method takes a null response (from sendHTTPCall) to mean the build has not yet started. The problem with this, though, is that if someone cancels the job before it gets an executor, the Remote Trigger plugin will wait indefinitely.

          The best way I can see to solve this is to try to get Jenkins Core to give jobs an API page before they get an executor. I'm not sure how easy that would be because, if I remember correctly, the job is a different class of object before and after it gets an executor.

          Did you check with the link Maurice posted?

          v969540 Kevin Van Poppel added a comment -

          'wait to trigger remote builds until no other builds are running' doesn't fix the issue for me.

          Do you have any idea when this JSON update is getting released?

          scm_issue_link SCM/JIRA link daemon added a comment -

          Code changed in jenkins
          User: Maurice Williams
          Path:
          CHANGELOG.md
          src/main/java/org/jenkinsci/plugins/ParameterizedRemoteTrigger/RemoteBuildConfiguration.java
          http://jenkins-ci.org/commit/parameterized-remote-trigger-plugin/0f704da096418c11c543c36382efc997b6125a2a
          Log:
          fixing JENKINS-22427 (https://issues.jenkins-ci.org/browse/JENKINS-22427)

          morficus Maurice W. added a comment -

          This fix is part of the 2.1.2 release, published on April 26th.

          morficus Maurice W. made changes -
          Status: Reopened [ 4 ] -> Resolved [ 5 ]
          Fix Version/s: (empty) -> current [ 10162 ]
          Resolution: (empty) -> Fixed [ 1 ]
          rtyler R. Tyler Croy made changes -
          Workflow: JNJira [ 154528 ] -> JNJira + In-Review [ 194935 ]

            People

            • Assignee:
              morficus Maurice W.
              Reporter:
              v969540 Kevin Van Poppel