JENKINS-60087

Failed Kubernetes nodes are not removed


    • Type: Bug
    • Resolution: Unresolved
    • Priority: Minor
    • Component/s: core, kubernetes-plugin
    • Labels: None
    • Environment: Jenkins LTS 2.190.2, Kubernetes Plugin 1.20.2

      When my pods are OOM-killed, the corresponding nodes aren't removed. This pollutes the interface and leaves the job running as a zombie.

      If I click to abort the job it prints "Are you sure you want to abort null?"

      This message comes from executors.jelly when executor.currentExecutable.fullDisplayName is null.

      If I proceed, the node is deleted, as expected.

      In the logs I found these entries:

      INFO	o.c.j.p.k.pod.retention.Reaper#eventReceived: default/infra-mf3jg was just deleted, so removing corresponding Jenkins agent
      INFO	j.s.DefaultJnlpSlaveReceiver#channelClosed: IOHub#1: Worker[channel:java.nio.channels.SocketChannel[connected local=/172.17.0.2:50000 remote=ip-172-16-29-221.ec2.internal/172.16.29.221:39454]] / Computer.threadPoolForRemoting [#12347] for infra-mf3jg terminated: java.nio.channels.ClosedChannelException
      

      I think it's related to the Reaper class: when a DELETED event is received, it calls Node#removeNode. There I found this comment: "If the node instance is not in the list of nodes, then this will be a no-op, even if there is another instance with the same".
      I think that, for some reason, the instance passed by Reaper is not the same instance as the registered node, which causes the removal to be ignored.
      The OfflineCause for the node is "Node is being removed".
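
To illustrate the suspected failure mode, here is a minimal sketch of a registry whose removal is a no-op when it is handed a different instance with the same name, as the quoted comment describes. It is a simplified model, not Jenkins code: `FakeNode`, `FakeNodeRegistry`, and the identity check inside `remove` are hypothetical stand-ins, and only the node name `infra-mf3jg` comes from the logs above.

```java
import java.util.HashMap;
import java.util.Map;

public class RemoveNodeSketch {
    /** Hypothetical stand-in for a Jenkins node; only the name matters here. */
    static final class FakeNode {
        final String name;
        FakeNode(String name) { this.name = name; }
    }

    /** Hypothetical registry modelling the no-op behaviour described in the quoted comment. */
    static final class FakeNodeRegistry {
        private final Map<String, FakeNode> nodes = new HashMap<>();

        void add(FakeNode node) { nodes.put(node.name, node); }

        /** Removes the node only if this exact instance is registered; returns true if removed. */
        boolean remove(FakeNode node) {
            // Assumption: the comparison is by reference identity, not by name.
            if (nodes.get(node.name) != node) {
                return false; // same name, different instance: silently ignored
            }
            nodes.remove(node.name);
            return true;
        }

        boolean contains(String name) { return nodes.containsKey(name); }
    }

    public static void main(String[] args) {
        FakeNodeRegistry registry = new FakeNodeRegistry();
        FakeNode registered = new FakeNode("infra-mf3jg");
        registry.add(registered);

        // A second object with the same name, e.g. a reference held by another component.
        FakeNode stale = new FakeNode("infra-mf3jg");

        System.out.println(registry.remove(stale));            // prints "false"
        System.out.println(registry.contains("infra-mf3jg"));  // prints "true": node still listed
        System.out.println(registry.remove(registered));       // prints "true"
        System.out.println(registry.contains("infra-mf3jg"));  // prints "false"
    }
}
```

If this is what happens, the log line "removing corresponding Jenkins agent" would still be printed even though the node stays in the list, which matches the behaviour I see.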

            Assignee: Unassigned
            Reporter: Bruno Meneguello (bkmeneguello)
            Votes: 0
            Watchers: 1

              Created:
              Updated: