Bug
Resolution: Fixed
Major
None
Jenkins 1.6.4.1 on Ubuntu 14.04
On every Jenkins restart (or start) the ECS plugin throws a "The referenced task was not found" exception, which prevents Jenkins from starting correctly.
I've also noticed that this sometimes happens when you apply or save changes in the Jenkins configuration; the exception bubbles up to the UI and stops you from saving.
The full stack trace is here:
hudson.util.HudsonFailedToLoad: com.amazonaws.services.ecs.model.ClientException: The referenced task was not found. (Service: AmazonECS; Status Code: 400; Error Code: ClientException; Request ID: 041ee786-b4c1-11e5-a864-d7bdaaa4b5cd)
    at hudson.WebAppMain$3.run(WebAppMain.java:237)
Caused by: com.amazonaws.services.ecs.model.ClientException: The referenced task was not found. (Service: AmazonECS; Status Code: 400; Error Code: ClientException; Request ID: 041ee786-b4c1-11e5-a864-d7bdaaa4b5cd)
    at com.amazonaws.http.AmazonHttpClient.handleErrorResponse(AmazonHttpClient.java:1181)
    at com.amazonaws.http.AmazonHttpClient.executeOneRequest(AmazonHttpClient.java:766)
    at com.amazonaws.http.AmazonHttpClient.executeHelper(AmazonHttpClient.java:485)
    at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:306)
    at com.amazonaws.services.ecs.AmazonECSClient.invoke(AmazonECSClient.java:2199)
    at com.amazonaws.services.ecs.AmazonECSClient.stopTask(AmazonECSClient.java:1874)
    at com.cloudbees.jenkins.plugins.amazonecs.ECSCloud.deleteTask(ECSCloud.java:205)
    at com.cloudbees.jenkins.plugins.amazonecs.ECSSlave._terminate(ECSSlave.java:90)
    at hudson.slaves.AbstractCloudSlave.terminate(AbstractCloudSlave.java:67)
    at hudson.slaves.CloudRetentionStrategy.check(CloudRetentionStrategy.java:58)
    at hudson.slaves.CloudRetentionStrategy.check(CloudRetentionStrategy.java:42)
    at hudson.slaves.SlaveComputer$4.run(SlaveComputer.java:717)
    at hudson.model.Queue._withLock(Queue.java:1346)
    at hudson.model.Queue.withLock(Queue.java:1229)
    at hudson.slaves.SlaveComputer.setNode(SlaveComputer.java:714)
    at hudson.model.AbstractCIBase.updateComputer(AbstractCIBase.java:118)
    at hudson.model.AbstractCIBase.access$000(AbstractCIBase.java:44)
    at hudson.model.AbstractCIBase$2.run(AbstractCIBase.java:186)
    at hudson.model.Queue._withLock(Queue.java:1346)
    at hudson.model.Queue.withLock(Queue.java:1229)
    at hudson.model.AbstractCIBase.updateComputerList(AbstractCIBase.java:169)
    at jenkins.model.Jenkins.updateComputerList(Jenkins.java:1247)
    at jenkins.model.Jenkins.<init>(Jenkins.java:844)
    at hudson.model.Hudson.<init>(Hudson.java:83)
    at hudson.model.Hudson.<init>(Hudson.java:79)
    at hudson.WebAppMain$3.run(WebAppMain.java:225)
When your containers / tasks execute correctly and aren't spawning a lot of builds, it happens less often. But when the `jenkins-slave` entrypoint fails to come up correctly, or the server rejects the JNLP agent (because of a key issue or something), it happens a lot and basically stops you from using the configuration UI.
A workaround on Jenkins startup is to delete /var/lib/jenkins/plugins/amazon-ecs; after that, startup proceeds normally.
I'm thinking of putting a try / catch around com.cloudbees.jenkins.plugins.amazonecs.ECSCloud.deleteTask(ECSCloud.java:205) but am not sure whether this is appropriate (it'll certainly stop the issues above though)
Thoughts before I make a PR?
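Roughly, the try / catch idea could look like the sketch below. This is not the plugin's actual code: `TaskStopper`, `TaskCleanup`, `deleteTaskQuietly` and the local `ClientException` class are hypothetical stand-ins so the example compiles without the AWS SDK, and the `stopTask(cluster, taskArn)` signature is simplified for illustration. The point is only that a "task not found" error during cleanup gets logged instead of propagating up into Jenkins startup or the configuration save.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical stand-in for com.amazonaws.services.ecs.model.ClientException,
// declared locally so this sketch compiles without the AWS SDK.
class ClientException extends RuntimeException {
    ClientException(String message) {
        super(message);
    }
}

// Hypothetical, simplified view of the ECS client: just the stop call
// that appears in the stack trace.
interface TaskStopper {
    void stopTask(String cluster, String taskArn);
}

class TaskCleanup {
    private static final Logger LOGGER = Logger.getLogger(TaskCleanup.class.getName());

    // Tries to stop the task. Returns true if the stop call succeeded,
    // false if ECS answered with a client error (for example because the
    // task no longer exists). The error is logged, not rethrown.
    static boolean deleteTaskQuietly(TaskStopper ecs, String cluster, String taskArn) {
        try {
            ecs.stopTask(cluster, taskArn);
            return true;
        } catch (ClientException e) {
            LOGGER.log(Level.INFO, "Could not stop ECS task {0}: {1}",
                    new Object[] {taskArn, e.getMessage()});
            return false;
        }
    }
}
```

Only the client error is caught here, so other failures (network problems, bad credentials surfacing as different exception types) would still propagate. Whether swallowing every `ClientException` is too broad is part of what I'd like feedback on.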