Type: Bug
Resolution: Fixed
Priority: Minor
Component/s: kubernetes-plugin
Labels:
None

Released As:
kubernetes 1.27.0

The feature previously implemented in [PR-497|https://github.com/jenkinsci/kubernetes-plugin/pull/497] no longer works as intended.

The regression follows [PR-772|https://github.com/jenkinsci/kubernetes-plugin/pull/772], which moved the responsibility for cleaning up terminated pods to a Reaper class. The expected behavior is that when an invalid Docker image is used for a container and the pod fails with an ImagePullBackoff, a corresponding error message is printed to the calling build's console output and the build is canceled/aborted.
The error message is still printed, but the build is no longer canceled. As a result, the build loops continuously: it requests a worker pod, the pod fails and terminates, and the build then requests a new one.

The problem occurs because there are no items in the Queue when the Reaper receives the pod failure event. When the Reaper checks the Queue (here), it cannot locate the corresponding Queue Item, and without the Queue Item it cannot get a reference to the original job in order to cancel it.

Before the change, the Queue Item search was handled by the AllContainersRunningPodWatcher.areAllContainersRunning() method, and checking the Queue at that point returned a Queue Item.

So, because the responsibility for cleaning up terminating pods was moved from AllContainersRunningPodWatcher to Reaper, the Queue Item responsible for the pod creation has already been removed by the time the Reaper is notified of the event. The Reaper cannot find the corresponding build to cancel, which results in an infinite loop of requesting new pods that then fail.
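The ordering problem described above can be illustrated with a small, self-contained sketch. It is not the plugin's code: `QueueItem`, `BuildQueue`, `onPodFailed`, and `simulate` are hypothetical stand-ins for the Jenkins Queue and the Reaper's failure handler, and the pod label is made up. It only shows that a cancel-by-lookup handler silently does nothing once the item has left the queue.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

// Hypothetical model of the race; none of these types are the real
// Jenkins or kubernetes-plugin classes.
public class ReaperQueueRace {

    /** A queued request, identified here by the pod it is waiting for. */
    static final class QueueItem {
        final String podLabel;
        boolean cancelled = false;
        QueueItem(String podLabel) { this.podLabel = podLabel; }
    }

    /** Simplified stand-in for the build queue. */
    static final class BuildQueue {
        private final List<QueueItem> items = new ArrayList<>();
        void add(QueueItem item) { items.add(item); }
        void remove(QueueItem item) { items.remove(item); }
        Optional<QueueItem> findByPodLabel(String label) {
            return items.stream().filter(i -> i.podLabel.equals(label)).findFirst();
        }
    }

    /**
     * Stand-in for the pod failure handler: cancels the matching queue item.
     * Returns false when no item is found, in which case nothing is cancelled.
     */
    static boolean onPodFailed(BuildQueue queue, String podLabel) {
        Optional<QueueItem> item = queue.findByPodLabel(podLabel);
        item.ifPresent(i -> i.cancelled = true);
        return item.isPresent();
    }

    /** Runs one failure event, with the item either still queued or already removed. */
    static boolean simulate(boolean itemStillQueued) {
        BuildQueue queue = new BuildQueue();
        QueueItem item = new QueueItem("pod-with-bad-image");
        queue.add(item);
        if (!itemStillQueued) {
            // Models the reported situation: the item is gone before the event arrives.
            queue.remove(item);
        }
        return onPodFailed(queue, item.podLabel);
    }

    public static void main(String[] args) {
        System.out.println("item still queued -> build cancelled: " + simulate(true));
        System.out.println("item already removed -> build cancelled: " + simulate(false));
    }
}
```

In this model the second case returns `false`, which corresponds to the build never being canceled and going on to request another pod.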

We're currently trying to submit a fix, but we would appreciate help figuring out one of the following:

A way to keep the Queue Item for creating the job in the Queue long enough for the Reaper to use it.
A way for the Reaper to be made aware of the event before the Queue Item is removed from the Queue.
Whether we need to move the build-canceling functionality out of the Reaper and back into the AllContainersRunningPodWatcher.

Steps to Recreate Issue:

Create a Jenkinsfile pipeline with a kubernetes agent that specifies a container using a nonexistent Docker image.
Build the job.
Observe that the build loops indefinitely, requesting a new pod each time the previous one fails.

Assignee: Unassigned

Reporter: Pierson Yieh

Votes: 0

Watchers: 1

Created: 2020-08-19 01:19

Updated: 2020-08-28 09:27

Resolved: 2020-08-26 20:49
