Jenkins / JENKINS-50597

Verify behavior of timeouts, interrupts, and network disconnections in S3 storage



      Description

      Sam Van Oort reminds me that we need to examine the behavior of this plugin with respect to timeouts and network failures and the like. Specifically, we can classify anomalous events as follows:

      • Network failures, throwing an exception from some socket call typically.
      • Network hangs (perhaps due to misconfigured TCP settings), where a socket call simply blocks indefinitely (blocking java.io socket calls typically cannot be interrupted by anything short of Thread.stop, alas).
      • User-initiated interrupt: Stop button is clicked.
      • System-initiated interrupt, such as via the timeout step.
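
To illustrate the difference between a network failure and a network hang, here is a minimal, self-contained Java sketch using only the JDK (it is not plugin code). It connects to a local server socket that never sends anything, which simulates a hang. A plain `InputStream.read()` on such a socket would block indefinitely and would not respond to `Thread.interrupt()`; setting a read timeout with `Socket.setSoTimeout` turns the hang into a `SocketTimeoutException`, i.e. into an ordinary network failure. The class name `ReadTimeoutDemo` and the 200 ms value are illustrative assumptions.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

public class ReadTimeoutDemo {
    /**
     * Connects to a local server that completes the TCP handshake (via the
     * listen backlog) but never sends a byte: a simulated network hang.
     * Returns true if the read was cut short by the read timeout.
     */
    static boolean readTimesOut(int readTimeoutMillis) throws IOException {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        try (ServerSocket silent = new ServerSocket(0, 1, loopback);
             Socket client = new Socket(loopback, silent.getLocalPort())) {
            // Without this, read() below would block forever and
            // Thread.interrupt() would not wake it up.
            client.setSoTimeout(readTimeoutMillis);
            InputStream in = client.getInputStream();
            try {
                in.read();
                return false; // unexpected: the server never writes
            } catch (SocketTimeoutException e) {
                return true; // the hang became a catchable failure
            }
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("timed out: " + readTimesOut(200));
    }
}
```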

      The code which would be impacted by such events can also be classified:

      • Master-side S3 metadata calls made in the course of a build, such as for archiveArtifacts, typically inside SynchronousNonBlockingStepExecution.
      • Master-side S3 metadata calls made in the context of a build but not inside a build step:
        • artifact & stash deletion during log rotation of old builds
        • stash deletion at the end of a build
        • artifact & stash copy during checkpoint resumption
      • Master-side S3 metadata calls made completely outside the context of a build:
        • artifact browsing from classic UI
        • same but from Blue Ocean
      • Agent-side URL GET or POST calls made from a build step.

      Draft acceptance criteria:

      • Build steps may hang or fail due to network issues, but timeouts and manual interrupts must be honored promptly. (The retry step can be used for critical builds when problems are expected in advance; checkpoints can also be used for manual intervention.)
      • Operations associated with a build but outside the context of a build step must apply some reasonable timeout, and if this is exceeded, either fail or issue a warning, according to the nature of the API.
      • Operations associated with an HTTP request thread in classic UI may block on the network, though if some reasonable timeout is exceeded an HTTP error should be returned and the thread returned to the pool.
      • Blue Ocean behavior is TBD. Ideally these REST calls would be asynchronous and not block rendering of the Artifacts tab.
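
One way to meet the first two criteria, sketched here under the assumption that the blocking call can be moved onto a worker thread, is to wait for it with `Future.get(timeout, unit)`. The caller's wait is interruptible even when the worker is stuck in uninterruptible java.io, so a Stop click or the timeout step is honored promptly; `runOrWarn` shows a warn-only variant for APIs such as log rotation that cannot propagate a failure. `BoundedCall`, `callWithTimeout`, and `runOrWarn` are hypothetical names, not part of the plugin or of Jenkins core.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.logging.Level;
import java.util.logging.Logger;

public class BoundedCall {
    private static final Logger LOGGER = Logger.getLogger(BoundedCall.class.getName());

    // Daemon threads, so a worker stuck in a hung socket call cannot keep the JVM alive.
    private static final ExecutorService POOL = Executors.newCachedThreadPool(r -> {
        Thread t = new Thread(r, "bounded-call");
        t.setDaemon(true);
        return t;
    });

    /**
     * Runs a call that might hang and waits at most the given time for it.
     * Throws TimeoutException if the limit is exceeded, or InterruptedException
     * if the calling thread is interrupted (e.g. Stop button, timeout step).
     */
    static <T> T callWithTimeout(Callable<T> call, long timeout, TimeUnit unit)
            throws InterruptedException, ExecutionException, TimeoutException {
        Future<T> future = POOL.submit(call);
        try {
            return future.get(timeout, unit);
        } catch (TimeoutException | InterruptedException e) {
            // Best effort: interrupts the worker, which may ignore it if blocked in java.io.
            future.cancel(true);
            throw e;
        }
    }

    /** Warn-only variant, for callers (such as cleanup) that cannot report a failure. */
    static boolean runOrWarn(String what, Runnable task, long timeout, TimeUnit unit) {
        try {
            callWithTimeout(() -> {
                task.run();
                return null;
            }, timeout, unit);
            return true;
        } catch (TimeoutException e) {
            LOGGER.log(Level.WARNING, "{0} timed out after {1} {2}", new Object[] {what, timeout, unit});
        } catch (ExecutionException e) {
            LOGGER.log(Level.WARNING, what + " failed", e.getCause());
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for the caller
        }
        return false;
    }
}
```

The trade-off is that a worker truly stuck in a socket read is abandoned rather than stopped; pairing this pattern with socket-level connect and read timeouts keeps such threads from accumulating.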


            Activity

            SCM/JIRA link daemon added a comment:

            Code changed in jenkins
            User: Carlos Sanchez
            Path:
            pom.xml
            src/main/java/io/jenkins/plugins/artifact_manager_jclouds/JCloudsArtifactManager.java
            src/test/java/io/jenkins/plugins/artifact_manager_jclouds/MockBlobStore.java
            src/test/java/io/jenkins/plugins/artifact_manager_jclouds/NetworkTest.java
            http://jenkins-ci.org/commit/artifact-manager-s3-plugin/2fece887119dd8ad512aa6213ddb6079908ebe6b
            Log:
            Merge pull request #41 from jenkinsci/network-JENKINS-50597

            JENKINS-50597 Network behavior tuning III

            Compare: https://github.com/jenkinsci/artifact-manager-s3-plugin/compare/bb65de81dfd5...2fece887119d
            *NOTE:* This service has been marked for deprecation: https://developer.github.com/changes/2018-04-25-github-services-deprecation/

            Functionality will be removed from GitHub.com on January 31st, 2019.

            Jesse Glick (jglick) added a comment:

            From code inspection and such experiments as I can run, there are these basic cases:

            • Master creates a presigned URL (no network operation) and agent uploads to or downloads from it. We need to have custom code to handle network errors, hangs, and HTTP errors distinguishing 4xx (fatal) from 5xx (retryable).
            • Master makes a metadata call. jclouds itself handles timeouts and retries. While we could probably influence its strategies if we needed to, it seems to bake in reasonable defaults, so unless we observe some serious problem from the field, leave well enough alone.
            • Master downloads bits. This only happens in some relatively unusual cases, from HTTP threads. It is not obvious what jclouds does if, say, the network hangs mid-download, but at worst this would block an HTTP handler thread, and the servlet container probably imposes some limits.
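
For the first case, the retry policy described above could look roughly like the following sketch. It is a hypothetical helper, not the plugin's actual code: each attempt reports an HTTP status or throws `IOException` on a network error; 2xx succeeds, 5xx and network errors are retried with exponential backoff, and any other status (notably 4xx, such as a 403 from an expired presigned URL) fails immediately. The backoff uses `Thread.sleep`, which is interruptible, so an abort is not delayed by the retry loop. `PresignedUrlRetry`, `Attempt`, and `FatalHttpException` are illustrative names.

```java
import java.io.IOException;

public class PresignedUrlRetry {
    /** One HTTP attempt: returns the status code, or throws IOException on a network error. */
    @FunctionalInterface
    interface Attempt {
        int run() throws IOException;
    }

    /** Raised for statuses that retrying cannot fix (e.g. 4xx). */
    static final class FatalHttpException extends IOException {
        FatalHttpException(int status) {
            super("HTTP " + status + " (not retried)");
        }
    }

    /**
     * Runs the attempt up to maxAttempts times. 2xx returns the status;
     * 5xx and IOExceptions are retried with exponential backoff;
     * any other status fails at once with FatalHttpException.
     */
    static int execute(Attempt attempt, int maxAttempts, long initialBackoffMillis)
            throws IOException, InterruptedException {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        long backoff = initialBackoffMillis;
        IOException last = null;
        for (int i = 1; i <= maxAttempts; i++) {
            try {
                int status = attempt.run();
                if (status >= 200 && status < 300) {
                    return status;
                }
                if (status < 500) {
                    throw new FatalHttpException(status); // 4xx etc.: give up now
                }
                last = new IOException("HTTP " + status); // 5xx: server-side, retry
            } catch (FatalHttpException e) {
                throw e;
            } catch (IOException e) {
                last = e; // connection reset, read timeout, etc.: retry
            }
            if (i < maxAttempts) {
                Thread.sleep(backoff); // interruptible, so aborts are not delayed
                backoff *= 2;
            }
        }
        throw last;
    }
}
```

In real code each `Attempt` would typically open an `HttpURLConnection` to the presigned URL with `setConnectTimeout` and `setReadTimeout` set, so that hangs also surface as `IOException` and get retried.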

            Still checking Blue Ocean behavior.

            Jesse Glick (jglick) added a comment:

            B.O. behavior seems less than ideal but OK. If you, say, disconnect your network before opening the main page for a build, you get a brief delay while jclouds retries the connection, and then Run.getArtifactsUpTo warns you about the error. This seems to be done by PipelineStatePreloader, but it does not seem to block general page rendering, unless I misread the Chrome timing graph. If you then go to the Artifacts tab, it tries again, this time from /blue/organizations/jenkins/smokes/detail/…/artifacts/. Actual artifact downloads use the classic URL, which does a redirect, so that is fine.

            SCM/JIRA link daemon added a comment:

            Code changed in jenkins
            User: Carlos Sanchez
            Path:
            pom.xml
            src/main/java/io/jenkins/plugins/artifact_manager_jclouds/JCloudsArtifactManager.java
            src/main/java/io/jenkins/plugins/artifact_manager_jclouds/JCloudsVirtualFile.java
            src/test/java/io/jenkins/plugins/artifact_manager_jclouds/MockApiMetadata.java
            src/test/java/io/jenkins/plugins/artifact_manager_jclouds/NetworkTest.java
            src/test/java/io/jenkins/plugins/artifact_manager_s3/JCloudsArtifactManagerTest.java
            http://jenkins-ci.org/commit/artifact-manager-s3-plugin/0a012ef1c974fcde11328a5f66f6e58634f55fee
            Log:
            Merge pull request #42 from jenkinsci/network-JENKINS-50597

            JENKINS-50597 Network behavior tuning IV

            Compare: https://github.com/jenkinsci/artifact-manager-s3-plugin/compare/2561a7ad88ee...0a012ef1c974

            Jesse Glick (jglick) added a comment:

            Main work done. Cannot close without approval from ikedam.


              People

              • Assignee: Jesse Glick (jglick)
              • Reporter: Jesse Glick (jglick)
              • Votes: 1
              • Watchers: 6
