JENKINS-59140 (project: Jenkins)

Deadlock on ZipInstaller during installIfNecessaryFrom

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Minor
    • Component/s: core
    • None

      We're seeing a deadlock on agents (more precisely an indefinite hang: the thread in the dump below is RUNNABLE, blocked in a socket read) when ZipExtractionInstaller checks whether the tool needs to be installed again.

      A relevant thread dump:

       "org.jenkinsci.plugins.workflow.steps.SynchronousNonBlockingStepExecution [#30]" - Thread t@99095
          java.lang.Thread.State: RUNNABLE
              at java.net.SocketInputStream.socketRead0(Native Method)
              at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
              at java.net.SocketInputStream.read(SocketInputStream.java:170)
              at java.net.SocketInputStream.read(SocketInputStream.java:141)
              at sun.security.ssl.InputRecord.readFully(InputRecord.java:465)
              at sun.security.ssl.InputRecord.read(InputRecord.java:503)
              at sun.security.ssl.SSLSocketImpl.readRecord(SSLSocketImpl.java:973)
              at sun.security.ssl.SSLSocketImpl.readDataRecord(SSLSocketImpl.java:930)
              at sun.security.ssl.AppInputStream.read(AppInputStream.java:105)
              at java.io.BufferedInputStream.fill(BufferedInputStream.java:246)
              at java.io.BufferedInputStream.read1(BufferedInputStream.java:286)
              at java.io.BufferedInputStream.read(BufferedInputStream.java:345)
              at sun.net.www.http.HttpClient.parseHTTPHeader(HttpClient.java:704)
              at sun.net.www.http.HttpClient.parseHTTP(HttpClient.java:647)
              at sun.net.www.protocol.http.HttpURLConnection.getInputStream0(HttpURLConnection.java:1569)
              at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1474)
              at java.net.HttpURLConnection.getResponseCode(HttpURLConnection.java:480)
              at sun.net.www.protocol.https.HttpsURLConnectionImpl.getResponseCode(HttpsURLConnectionImpl.java:338)
              at hudson.FilePath.installIfNecessaryFrom(FilePath.java:874)
              at hudson.FilePath.installIfNecessaryFrom(FilePath.java:846)
              at hudson.tools.ZipExtractionInstaller.performInstallation(ZipExtractionInstaller.java:83)
              at hudson.tools.InstallerTranslator.getToolHome(InstallerTranslator.java:69)
              at hudson.tools.ToolLocationNodeProperty.getToolHome(ToolLocationNodeProperty.java:109)
              at hudson.tools.ToolInstallation.translateFor(ToolInstallation.java:206)
              at hudson.model.JDK.forNode(JDK.java:147)
              at hudson.model.JDK.forNode(JDK.java:60)
              at org.jenkinsci.plugins.workflow.steps.ToolStep$Execution.run(ToolStep.java:152)
              at org.jenkinsci.plugins.workflow.steps.ToolStep$Execution.run(ToolStep.java:133)
              at org.jenkinsci.plugins.workflow.steps.SynchronousNonBlockingStepExecution.lambda$start$0(SynchronousNonBlockingStepExecution.java:47)
              at org.jenkinsci.plugins.workflow.steps.SynchronousNonBlockingStepExecution$$Lambda$271/291985104.run(Unknown Source)
              at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
              at java.util.concurrent.FutureTask.run(FutureTask.java:266)
              at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
              at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
              at java.lang.Thread.run(Thread.java:745)

      We believe this comes down to the connection's getResponseCode() call waiting indefinitely for the server to respond, called from here: https://github.com/jenkinsci/jenkins/blob/master/core/src/main/java/hudson/FilePath.java#L876
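      To make the failure mode concrete, here is a minimal Java sketch that reproduces the same kind of wait outside Jenkins. It uses plain HTTP against a local stand-in server rather than HTTPS against Artifactory, and the class name, method names and /jdk.zip path are hypothetical. It relies on documented JDK behavior: URLConnection.getReadTimeout() returns 0, meaning no timeout, unless one has been configured, so getResponseCode() keeps waiting for as long as the server stays silent. The HTTPS connection in the trace ends up in the same kind of blocking socket read.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.ServerSocket;
import java.net.URI;
import java.net.URL;

public class SilentServerRepro {

    // Sends a request to a server that accepts TCP connections but never
    // answers, and reports whether getResponseCode() is still blocked after
    // waitMillis. No timeouts are set, i.e. HttpURLConnection defaults.
    static boolean getResponseCodeStillBlocked(long waitMillis) {
        // Port 0 picks a free port. accept() is never called: the kernel still
        // completes the TCP handshake for queued connections, so the client
        // sends its request and then waits for a response that never comes.
        try (ServerSocket silent = new ServerSocket(0)) {
            URL url = URI.create("http://127.0.0.1:" + silent.getLocalPort() + "/jdk.zip").toURL();
            HttpURLConnection con = (HttpURLConnection) url.openConnection();
            Thread client = new Thread(() -> {
                try {
                    con.getResponseCode(); // blocks in a socket read, as at FilePath.java:874
                } catch (IOException ignored) {
                    // Only reached once the server socket has been closed.
                }
            });
            client.setDaemon(true); // don't keep the JVM alive if the read never returns
            client.start();
            client.join(waitMillis);
            return client.isAlive();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    // Per the URLConnection Javadoc, 0 means the read timeout is disabled (infinite).
    static int defaultReadTimeout() {
        try {
            return URI.create("http://127.0.0.1/").toURL().openConnection().getReadTimeout();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("default read timeout: " + defaultReadTimeout());
        System.out.println("still blocked after 2 s: " + getResponseCodeStillBlocked(2_000));
    }
}
```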

      We're using ZipExtractionInstaller to install a JDK from our Artifactory repository (instead of from the Oracle JDK website).

      Proposal

      Add a timeout around the getResponseCode() call (and possibly around the download further down), so that a stalled connection cannot hang the step indefinitely.
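      A minimal sketch of the proposal, assuming the fix is simply to set connect and read timeouts on the connection before getResponseCode() is called. openWithTimeouts, the timeout values and the local silent server are hypothetical illustrations, not existing Jenkins code; URLConnection.setConnectTimeout and setReadTimeout are standard JDK methods. Against a server that never answers, getResponseCode() then fails with SocketTimeoutException instead of blocking forever.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.ServerSocket;
import java.net.SocketTimeoutException;
import java.net.URI;
import java.net.URL;

public class ReadTimeoutSketch {

    // Hypothetical helper, not an existing Jenkins API. Setting both timeouts
    // before the request is sent bounds how long connecting and waiting for the
    // response headers in getResponseCode() can block.
    static HttpURLConnection openWithTimeouts(URL url, int connectTimeoutMs, int readTimeoutMs)
            throws IOException {
        HttpURLConnection con = (HttpURLConnection) url.openConnection();
        con.setConnectTimeout(connectTimeoutMs);
        con.setReadTimeout(readTimeoutMs);
        return con;
    }

    // Returns true if getResponseCode() against a server that never answers
    // failed with SocketTimeoutException instead of hanging.
    static boolean timesOutAgainstSilentServer(int readTimeoutMs) {
        try (ServerSocket silent = new ServerSocket(0)) { // never accept()ed, never answers
            URL url = URI.create("http://127.0.0.1:" + silent.getLocalPort() + "/jdk.zip").toURL();
            HttpURLConnection con = openWithTimeouts(url, 10_000, readTimeoutMs); // hypothetical values
            try {
                con.getResponseCode();
                return false; // the silent server can never produce a status line
            } catch (SocketTimeoutException e) {
                return true; // "Read timed out" after roughly readTimeoutMs
            } finally {
                con.disconnect();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("timed out: " + timesOutAgainstSilentServer(1_000));
    }
}
```

      The read timeout also applies to each read from the response body, so setting it once on the connection would bound stalls during the download further down as well (it limits idle time per read, not the total download time). Which timeout values to use, and what the caller should do when one fires, is left open here.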

      I have a screenshot of a heap dump that shows which URL in our infrastructure the connection was stuck on.

            Assignee: Unassigned
            Reporter: Jan Monterrubio (anemortalkid)
            Votes: 0
            Watchers: 1
