Jenkins / JENKINS-55038

`Parameterized Remote Trigger Plugin` sometimes fails when the poll interval is more than 5 minutes


    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Labels:
      None
    • Environment:
      Jenkins version - 2.89.4 (Ubuntu 16.04.5 LTS x64, openjdk version "1.8.0_151")
      Parameterized Remote Trigger Plugin version - 3.0.5
      Description

       

      There is a problem when the poll interval parameter is set to more than 300 seconds (5 minutes). We have a Jenkins pipeline which may take from 30 to 60 minutes to complete.

      In some cases a Jenkins `queued` item may move to the `pending` state, as described here:
      https://github.com/jenkinsci/jenkins/blob/master/core/src/main/java/hudson/model/Queue.java#L139

      In that case the plugin uses the user-specified poll interval to check the queued item's state:

      https://github.com/jenkinsci/parameterized-remote-trigger-plugin/blob/master/src/main/java/org/jenkinsci/plugins/ParameterizedRemoteTrigger/RemoteBuildConfiguration.java#L696

      But `queued` items in Jenkins have a time to live of only 5 minutes, as you can see here:
      https://github.com/jenkinsci/jenkins/blob/master/core/src/main/java/hudson/model/Queue.java#L218
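The timing mismatch can be illustrated with a minimal sketch (the class and method names below are hypothetical, not the plugin's actual code; the constant mirrors the 5-minute retention described above): once the poll interval is at least as long as the queue item's lifetime, a poll can arrive after the item has already been discarded, which the plugin then sees as an HTTP 404.

```java
// Minimal simulation of the timing mismatch (hypothetical names, not the
// plugin's real code). A queue item on the remote Jenkins is retained for
// at most 5 minutes; polling with an interval >= that lifetime can miss
// the window entirely and observe a 404 instead of the queue item.
public class QueuePollSimulation {
    // Seconds a queue item is retained (mirrors the 5-minute value in Queue.java)
    static final int QUEUE_ITEM_LIFETIME = 300;

    /** True if a poll at this interval may arrive only after the queue
     *  item has already expired, i.e. the poll can miss the item. */
    static boolean pollCanMissItem(int pollIntervalSeconds) {
        return pollIntervalSeconds >= QUEUE_ITEM_LIFETIME;
    }

    public static void main(String[] args) {
        System.out.println(pollCanMissItem(10));   // default interval: false (safe)
        System.out.println(pollCanMissItem(300));  // reporter's interval: true (can 404)
    }
}
```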

      As a result, the `build` is triggered on the remote Jenkins server, but the build fails (with a `Max number of connection retries have been exeeded` error from the plugin) on the main Jenkins server:

      Triggering parameterized remote job 'http://192.168.200.8:6680/job/RunTests'
       Using job-level defined 'Credentials Authentication' as user '***' (Credentials ID '***')
      Triggering remote job now.
      CSRF protection is enabled on the remote server.
      The remote job is pending. Waiting for next available executor on master.
       Remote job queue number: 124200
      Waiting for remote build to be executed...
      Waiting for 300 seconds until next poll.
      Connection to remote server failed [404], waiting for to retry - 300 seconds until next attempt. URL: http://192.168.200.8:6680//queue/item/124200/api/json/, parameters: 
      Retry attempt #1 out of 5
      Connection to remote server failed [404], waiting for to retry - 300 seconds until next attempt. URL: http://192.168.200.8:6680//queue/item/124200/api/json/, parameters: 
      Retry attempt #2 out of 5
      Connection to remote server failed [404], waiting for to retry - 300 seconds until next attempt. URL: http://192.168.200.8:6680//queue/item/124200/api/json/, parameters: 
      Retry attempt #3 out of 5
      Connection to remote server failed [404], waiting for to retry - 300 seconds until next attempt. URL: http://192.168.200.8:6680//queue/item/124200/api/json/, parameters: 
      Retry attempt #4 out of 5
      Connection to remote server failed [404], waiting for to retry - 300 seconds until next attempt. URL: http://192.168.200.8:6680//queue/item/124200/api/json/, parameters: 
      Retry attempt #5 out of 5
      Max number of connection retries have been exeeded.

       

      From my point of view there is a simple fix: use the default poll interval value (10 seconds) to check the `queued` item's state, because the `queued` item stays in the `pending` state for only a few seconds:

       

      // Cap the poll interval while the item is still queued, so the queue
      // item is not discarded on the remote server between two polls.
      int pollIntervalForQueuedItem = this.pollInterval;
      if (pollIntervalForQueuedItem > DEFAULT_POLLINTERVALL) {
          pollIntervalForQueuedItem = DEFAULT_POLLINTERVALL;
      }
      while (buildInfo.isQueued()) {
          context.logger.println("Waiting for " + pollIntervalForQueuedItem + " seconds until next poll.");
          Thread.sleep(pollIntervalForQueuedItem * 1000);
          buildInfo = updateBuildInfo(buildInfo, context);
          handle.setBuildInfo(buildInfo);
      }
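The capping logic above can be sketched as a standalone helper (a minimal sketch, assuming `DEFAULT_POLLINTERVALL` is the plugin's 10-second default; the class name is illustrative):

```java
// Standalone sketch of the proposed interval capping (illustrative names).
public class PollIntervalClamp {
    // Assumed plugin default poll interval, in seconds
    static final int DEFAULT_POLLINTERVALL = 10;

    /** Cap the user-configured interval while the build is still queued;
     *  the full configured interval applies again once the build runs. */
    static int clampForQueuedItem(int configuredIntervalSeconds) {
        return Math.min(configuredIntervalSeconds, DEFAULT_POLLINTERVALL);
    }

    public static void main(String[] args) {
        System.out.println(clampForQueuedItem(300)); // capped to 10
        System.out.println(clampForQueuedItem(5));   // below default: unchanged
    }
}
```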

       

      I have just created a pull request:
      https://github.com/lifemanship/Parameterized-Remote-Trigger-Plugin/commit/f65dae8077a0e16b86ce08512cb984a5a879e555

       

        Attachments

          Activity

          cashlalala KaiHsiang Chang added a comment -

          Nick Korsakov thanks for your help, 

          I'll handle the PR asap

          cashlalala KaiHsiang Chang added a comment -

          https://github.com/jenkinsci/parameterized-remote-trigger-plugin/pull/49
          lifemanship Nick Korsakov added a comment -

          KaiHsiang Chang thank you very much for your work. We will wait for the next plugin release.

          cashlalala KaiHsiang Chang added a comment -

          released

          lfiorino Lou Fiorino added a comment -

          There's an additional (and apparently SLIPPERY) problem with checking the status of the submitted job... I believe it lies in acquiring the remote context.
          We've got multiple dynamic instances sitting behind a load balancer (HAProxy).

          For the sake of this discussion let's use the following configuration:
          haproxy.mydomain.com with listeners on ports 9001 ->myjenkins1.mycompany.com:8443, 9002 ->myjenkins2.mycompany.com:8443, 9003 ->myjenkins3.mycompany.com:8443

           

          The SSL certificates installed on the 3 back-end servers are the same cert used for the proxy (haproxy)... as the back-end servers are rebuilt on demand when there's a revision to the LTS release or significant plugin updates.

          The instance on myjenkins1 attempts to trigger a job on myjenkins2 (i.e. https://haproxy.mydomain.com:*9002*/remote_job_to_be_triggered) via the Parameterized Remote Trigger plugin in a pipeline job. The job SUCCESSFULLY triggers; however, when the pipeline attempts to check the status of the SUCCESSFULLY triggered job, instead of querying https://haproxy.mydomain.com:*9002*/job/remote_job_to_be_triggered/\{build#}/api/json/?seed={seed#}, it seems to pull the port number from the remote context when constructing the URL to query (evidenced by the presence of "GOT CONTEXT for Buildand Deploy" in the logs on myjenkins1), and instead fails querying https://haproxy.mydomain.com:*8443*/job/remote_job_to_be_triggered/\{build#}/api/json/?seed={seed#}.

          The error will either be an HTTP 404 (not found)... or if there DOES happen to be a listener available on port 8443 on haproxy for a DIFFERENT jenkins instance but the requesting user does not have login access, an HTTP 401 (unauthorized).

          This may be an odd configuration, but I have seen other users complaining of similar problems (i.e. using a VIEW-based URL, web server front end, proxy, etc.)... and the heart of the problem here is the use of an inconsistent base URL between the triggering request and the polling for job status. While having the remote context reported back consistently under "odd" configurations may be complex, a simple solution would be to pull the base URL (protocol/host/port) from the trigger request instead of from the remote context.
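The suggested approach can be sketched with `java.net.URI` (a minimal sketch under the commenter's proposal; the class and method names here are illustrative, not the plugin's actual API, and the build number `42` is a made-up placeholder):

```java
import java.net.URI;

// Sketch: reuse the scheme/host/port of the original trigger request when
// building the status-poll URL, instead of the host/port the remote
// context reports (illustrative names, not the plugin's real API).
public class BaseUrlFromTrigger {
    static String pollUrl(String triggerUrl, String pathAndQuery) {
        URI trigger = URI.create(triggerUrl);
        String base = trigger.getScheme() + "://" + trigger.getHost()
                + (trigger.getPort() != -1 ? ":" + trigger.getPort() : "");
        return base + pathAndQuery;
    }

    public static void main(String[] args) {
        String trigger = "https://haproxy.mydomain.com:9002/job/remote_job_to_be_triggered/build";
        // Keeps port 9002 from the trigger request, not 8443 from the remote context
        System.out.println(pollUrl(trigger, "/job/remote_job_to_be_triggered/42/api/json/"));
    }
}
```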

          I don't have the bandwidth at the moment to do a deep dive into the code, but I cannot imagine this would be a difficult fix... in the meantime we have resorted to performing remote triggers via Groovy scripting over the SSH listener and polling for job status inline (a silly workaround for what should be a quick fix). Hope this provides a little better insight as to the root cause.

          cashlalala KaiHsiang Chang added a comment -

          Thanks for your report, but it would be nice if you could provide me a docker-compose environment, because I'm not able to reproduce the issue without all of your configuration.

          With a testable environment, maybe I can handle it when I'm free.


            People

            • Assignee:
              cashlalala KaiHsiang Chang
              Reporter:
              lifemanship Nick Korsakov
            • Votes:
              0 Vote for this issue
              Watchers:
              4 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved: