  1. Jenkins
  2. JENKINS-16879

More robust display detection needed - builds fail when many builds require Xvnc

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Component/s: xvnc-plugin
    • Labels:
      None
    • Environment:
      Ubuntu 12.04, Jenkins 1.500, XVNC plugin 1.10

      Description

      We're having issues with failing builds. We run several builds in parallel on the same Jenkins machine, many of them using Xvnc, and I assume the failures are related to this.

      Builds fail several times a day with the following error (though displays might differ, obviously):

      Starting xvnc
      [my-build] $ /usr/bin/vncserver :37 -geometry 1920x1280
      A VNC server is already running as :37
      Starting xvnc
      [my-build] $ /usr/bin/vncserver :49 -geometry 1920x1280
      A VNC server is already running as :49
      Starting xvnc
      [my-build] $ /usr/bin/vncserver :50 -geometry 1920x1280
      A VNC server is already running as :50
      Starting xvnc
      [my-build] $ /usr/bin/vncserver :51 -geometry 1920x1280
      A VNC server is already running as :51
      FATAL: Failed to run '/usr/bin/vncserver :51 -geometry 1920x1280' (exit code 98), blacklisting display #51; consider checking the "Clean up before start" option
      java.io.IOException: Failed to run '/usr/bin/vncserver :51 -geometry 1920x1280' (exit code 98), blacklisting display #51; consider checking the "Clean up before start" option
      	at hudson.plugins.xvnc.Xvnc.doSetUp(Xvnc.java:100)
      	at hudson.plugins.xvnc.Xvnc.doSetUp(Xvnc.java:98)
      	at hudson.plugins.xvnc.Xvnc.doSetUp(Xvnc.java:98)
      	at hudson.plugins.xvnc.Xvnc.doSetUp(Xvnc.java:98)
      	at hudson.plugins.xvnc.Xvnc.setUp(Xvnc.java:73)
      	at hudson.model.Build$BuildExecution.doRun(Build.java:154)
      	at hudson.model.AbstractBuild$AbstractBuildExecution.run(AbstractBuild.java:592)
      	at hudson.model.Run.execute(Run.java:1557)
      	at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:46)
      	at hudson.model.ResourceController.execute(ResourceController.java:88)
      	at hudson.model.Executor.run(Executor.java:236)
      

      It seems to me that displays that are in use get blacklisted, which is undesirable since we currently don't have any stale locks in /tmp/.X11-unix/ (where our VNC locks are placed).

      Could locks in /tmp/.X*-lock and /tmp/.X11-unix/X* be considered when trying to start a new display? Could we allow more retries than the current 3? Or do you have any other ideas on how to address this issue?
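
The lock-file check asked about here could be pictured with the following minimal Python sketch. It is not the plugin's code (the plugin itself is Java); the function names `display_in_use` and `first_free_display` and the `tmp_dir` parameter are assumptions made for illustration. A display is treated as taken if either `/tmp/.X<n>-lock` or `/tmp/.X11-unix/X<n>` exists.

```python
import os


def display_in_use(display, tmp_dir="/tmp"):
    """Return True if an X lock file or socket exists for this display number."""
    lock_file = os.path.join(tmp_dir, ".X%d-lock" % display)
    socket_path = os.path.join(tmp_dir, ".X11-unix", "X%d" % display)
    # lexists() also reports dangling symlinks, which still block the display.
    return os.path.lexists(lock_file) or os.path.lexists(socket_path)


def first_free_display(candidates, tmp_dir="/tmp"):
    """Return the first candidate display without a lock or socket, else None."""
    for display in candidates:
        if not display_in_use(display, tmp_dir):
            return display
    return None
```

A check like this only narrows the window: another process can still claim the display between the check and the `vncserver` call, so a retry on failure would remain necessary.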

          Activity

          davidparsson David Pärsson added a comment -

          If necessary I could contribute a patch, but in that case I'd appreciate it if you could point me in the right direction.

          jglick Jesse Glick added a comment -

          Not sure what the root cause is. The plugin maintains a list of free display numbers so it should not be attempting to reuse one unless that build is done. Perhaps the vncserver -kill at the end is failing?
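
The bookkeeping described here (a pool of display numbers handed out to builds and returned when a build finishes) could look roughly like the sketch below. It is an illustrative Python model, not the plugin's Java implementation; the class `DisplayAllocator`, its methods, the `start_with_retries` helper, and the default value of `max_attempts` are all hypothetical.

```python
class DisplayAllocator:
    """Hands out display numbers from a fixed range and tracks which are busy."""

    def __init__(self, min_display, max_display):
        self.min_display = min_display
        self.max_display = max_display
        self.allocated = set()    # displays currently used by running builds
        self.blacklisted = set()  # displays that failed to start

    def allocate(self):
        """Reserve and return the lowest display that is neither busy nor blacklisted."""
        for n in range(self.min_display, self.max_display + 1):
            if n not in self.allocated and n not in self.blacklisted:
                self.allocated.add(n)
                return n
        raise RuntimeError("no free display numbers left")

    def free(self, n):
        """Return a display to the pool once its build is done."""
        self.allocated.discard(n)

    def blacklist(self, n):
        """Exclude a display that could not be started from future allocation."""
        self.allocated.discard(n)
        self.blacklisted.add(n)


def start_with_retries(allocator, start_server, max_attempts=3):
    """Try up to max_attempts displays; blacklist each one that fails to start."""
    for _ in range(max_attempts):
        display = allocator.allocate()
        if start_server(display):
            return display
        allocator.blacklist(display)
    raise RuntimeError("could not start a VNC server after %d attempts" % max_attempts)
```

In this model a display becomes reusable only when `free` is called, so a cleanup step that is skipped or fails would gradually drain the pool.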

          davidparsson David Pärsson added a comment - edited

          I think the actual cause in this case was that we ran out of TCP ports in VNC's port range because of a misconfigured Jenkins machine under heavy load, though I think I also saw this a few times before that bad configuration was applied.

          Is starting a VNC server so expensive that we can't afford to try more than three times? And why are displays never reused?

          jglick Jesse Glick added a comment -

          Trying more than three times would probably not hurt but I doubt it would help. The real problem is that displays are not being reused in your case. I have no idea why; you will need to debug it.

          I just committed (but have not yet released) a fix for JENKINS-12431; probably unrelated but worth checking just in case.

          jglick Jesse Glick added a comment -

          https://github.com/jenkinsci/xvnc-plugin/pull/2 purports to fix this or something similar but the root cause of the problem is not explained or directly addressed.

          davidparsson David Pärsson added a comment -

          That's from a colleague of mine, and the fix seems to have resolved the problem for us. The root cause was related to external factors. From my point of view this issue can be considered resolved.

          levsa Levon Saldamli added a comment -

          Resolved in xvnc-1.12


            People

            • Assignee:
              Unassigned
            • Reporter:
              davidparsson David Pärsson
            • Votes:
              1
            • Watchers:
              4
