[JENKINS-27922] Jenkins job execution becomes unstable - jobs fail with OOM: unable to create new native thread - Jenkins Jira

Type: Bug
Resolution: Duplicate
Priority: Critical
Component/s: ssh-agent-plugin
Labels:
- outofmemoryerror
Environment:

Jenkins 1.598
Linux 2.6.32-431.5.1.el6.x86_64
CentOS release 6.5 (Final)

Plugins:
Attached file saved from pluginManager/api/xml?tree=plugins[shortName,version]&pretty

Load:
Single master node with 5 executors, no slaves
Average job throughput ~23 jobs per hour
Number of configured jobs: 66



After Jenkins has been running for 2-3 days, jobs no longer launch.

The console outputs usually just say that fetching from git failed, but sometimes contain other unusual errors.

The Jenkins system log reports:

java.lang.OutOfMemoryError: unable to create new native thread

I was able to get a heap dump but due to the potential inclusion of sensitive data cannot post it.

In VisualVM analysis of the heap dump, I noticed that there are almost 1000 instances of AgentServer and AgentServer$1. The threads don't show up in the thread monitor, but are still referenced somehow.

Unfortunately the parent references are numerous and hard to decipher. The proximate parent is the ThreadGroup.threads array in the main ThreadGroup instance. This seems unlikely to be the true root cause.

I also noticed about the same number of ThreadLocalMap instances, so the leak may be related to incorrect use of ThreadLocal.
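A census of live threads can complement the heap dump analysis above. The following is a minimal sketch, not code from Jenkins or the ssh-agent plugin: the class name `ThreadCensus` and the digit-collapsing grouping key are illustrative assumptions, and it assumes Java 8 or later. It groups the JVM's live threads by name, so a bucket that keeps growing between runs would point at whatever is creating them. Since the AgentServer threads did not appear in the thread monitor, this may only confirm that the leaked objects are no longer live threads.

```java
import java.util.Map;
import java.util.TreeMap;

public class ThreadCensus {

    /**
     * Counts the live threads of the current JVM, grouped by thread name.
     * Runs of digits in the name are collapsed to '#', so that names such as
     * "worker-1" and "worker-2" land in the same bucket.
     */
    public static Map<String, Integer> countByName() {
        Map<String, Integer> counts = new TreeMap<>();
        // getAllStackTraces() returns one entry per live thread.
        for (Thread t : Thread.getAllStackTraces().keySet()) {
            String key = t.getName().replaceAll("\\d+", "#");
            counts.merge(key, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Print one line per bucket: count, then the normalized name.
        countByName().forEach((name, n) -> System.out.println(n + "\t" + name));
    }
}
```

Run standalone, this only lists the threads of its own process; to say anything about this issue, the same loop would have to execute inside the Jenkins JVM.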

I have attached a screenshot of the AgentServer$1 instances in VisualVM and the Jenkins system log.

Please let me know if there is any other analysis I can provide.

I am entering this bug as a blocker because I don't currently have a workaround in place. I use Jenkins in conjunction with an external PHP application that needs to post jobs to the Jenkins build queue. To work around the problem, I would need to implement a controlled shutdown process and restart Jenkins at a daily or semi-daily interval. This would ultimately require the calling application to retry, which is probably a good idea anyway but is not yet implemented.
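The retry on the calling side could look roughly like the sketch below. It is written in Java for illustration only (the actual caller is a PHP application), and the class name `RetryingSubmitter`, the method `withRetry`, and the doubling delay are assumptions, not anything taken from Jenkins or the reporter's code. The action passed in stands for whatever call posts the job to the build queue.

```java
import java.util.concurrent.Callable;

public class RetryingSubmitter {

    /**
     * Runs the action until it succeeds or maxAttempts is reached.
     * The wait between attempts starts at initialDelayMs and doubles each time.
     * If every attempt fails, the last exception is rethrown.
     */
    public static <T> T withRetry(Callable<T> action, int maxAttempts, long initialDelayMs)
            throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        long delay = initialDelayMs;
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return action.call();
            } catch (InterruptedException e) {
                // Do not retry when the caller is being interrupted.
                Thread.currentThread().interrupt();
                throw e;
            } catch (Exception e) {
                last = e;
                if (attempt < maxAttempts) {
                    Thread.sleep(delay);
                    delay *= 2;
                }
            }
        }
        throw last;
    }
}
```

With a scheduled restart in place, the delays would need to add up to more than the expected restart time; the right values depend on how long Jenkins takes to come back up.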


Attachments:
- jenkins.log (146 kB, 2015-04-13 18:06)
- plugins.xml (3 kB, 2015-04-13 18:06)
- Screen Shot 2015-04-13 at 11.02.27 AM.png (314 kB, 2015-04-13 18:06)
- Thread dump [Jenkins].html (485 kB, 2015-04-14 22:17)
- Thread dump [Jenkins].html (348 kB, 2015-04-14 18:13)

Issue Links:
- duplicates JENKINS-27555 ssh-agent plugin leaking file descriptors leaving behind jenkinsXXXXXX.jnr socket files (Resolved)

Assignee: Unassigned
Reporter: Jamie Doornbos
Votes: 0
Watchers: 1
Created: 2015-04-13 18:06
Updated: 2015-06-15 17:03
Resolved: 2015-06-15 17:03
