Jenkins / JENKINS-12297

Excessive number of postgres processes kept around by mirrorbrain

    Details

    • Type: Task
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Component/s: other
    • Labels:
      None

      Description

      On cucumber, I noticed that there are a large number of postgres processes running around with ps output showing something like this:

      root@cucumber:/var/log/apache2# ps -ef | grep post
      postgres   931   998  0 12:02 ?        00:00:04 postgres: mirrorbrain mirrorbrain 127.0.0.1(44778) idle                                                                     
      postgres   998     1  0  2011 ?        00:07:54 /usr/lib/postgresql/8.4/bin/postgres -D /var/lib/postgresql/8.4/main -c config_file=/etc/postgresql/8.4/main/postgresql.conf
      postgres  1030   998  0  2011 ?        00:03:41 postgres: writer process                                                                                                    
      postgres  1031   998  0  2011 ?        00:01:59 postgres: wal writer process                                                                                                
      postgres  1032   998  0  2011 ?        00:02:48 postgres: autovacuum launcher process                                                                                       
      postgres  1033   998  0  2011 ?        00:11:28 postgres: stats collector process                                                                                           
      postgres  7588   998  0 12:16 ?        00:00:00 postgres: mirrorbrain mirrorbrain ::1(58442) idle                                                                           
      postgres  7990   998  0 12:16 ?        00:00:00 postgres: mirrorbrain mirrorbrain ::1(58631) idle                                                                           
      postgres  7994   998  0 12:16 ?        00:00:00 postgres: mirrorbrain mirrorbrain ::1(58635) idle                                                                           
      postgres  7996   998  0 12:16 ?        00:00:00 postgres: mirrorbrain mirrorbrain ::1(58639) idle                                                                           
      postgres  7999   998  0 12:16 ?        00:00:00 postgres: mirrorbrain mirrorbrain ::1(58661) idle                                                                           
      postgres  8009   998  0 12:16 ?        00:00:00 postgres: mirrorbrain mirrorbrain ::1(58662) idle                                                                           
      postgres  8075   998  0 12:16 ?        00:00:00 postgres: mirrorbrain mirrorbrain ::1(58722) idle                                                                           
      postgres  8079   998  0 12:16 ?        00:00:00 postgres: mirrorbrain mirrorbrain ::1(58726) idle                                                                           
      postgres  8168   998  0 12:17 ?        00:00:00 postgres: mirrorbrain mirrorbrain ::1(48148) idle                                                                           
      postgres  8181   998  0 12:17 ?        00:00:00 postgres: mirrorbrain mirrorbrain ::1(48498) idle                                                                           
      postgres  8225   998  0 12:18 ?        00:00:00 postgres: mirrorbrain mirrorbrain ::1(49804) idle                                                                           
      postgres  8266   998  0 12:19 ?        00:00:00 postgres: mirrorbrain mirrorbrain ::1(51263) idle                                                                           
      postgres  8279   998  0 12:19 ?        00:00:00 postgres: mirrorbrain mirrorbrain ::1(51566) idle                                                                           
      postgres  8286   998  0 12:19 ?        00:00:00 postgres: mirrorbrain mirrorbrain ::1(51641) idle                                                                           
      postgres  8311   998  0 12:19 ?        00:00:00 postgres: mirrorbrain mirrorbrain ::1(52046) idle                                                                           
      postgres  8313   998  0 12:19 ?        00:00:00 postgres: mirrorbrain mirrorbrain ::1(52050) idle                                                                           
      postgres  8381   998  0 12:20 ?        00:00:00 postgres: mirrorbrain mirrorbrain ::1(52961) idle                                                                           
      postgres  8386   998  0 12:20 ?        00:00:00 postgres: mirrorbrain mirrorbrain ::1(53005) idle                                                                           
      postgres  8388   998  0 12:20 ?        00:00:00 postgres: mirrorbrain mirrorbrain 127.0.0.1(39663) idle                                                                     
      postgres  8399   998  0 12:20 ?        00:00:00 postgres: mirrorbrain mirrorbrain ::1(53548) idle                                                                           
      postgres  8404   998  0 12:20 ?        00:00:00 postgres: mirrorbrain mirrorbrain ::1(53761) idle                                                                           
      postgres  8408   998  0 12:20 ?        00:00:00 postgres: mirrorbrain mirrorbrain ::1(53805) idle                                                                           
      postgres  8409   998  0 12:20 ?        00:00:00 postgres: mirrorbrain mirrorbrain ::1(53806) idle                                                                           
      postgres 31053   998  0 11:45 ?        00:00:00 postgres: mirrorbrain mirrorbrain 127.0.0.1(41433) idle                                                                     
      postgres 31056   998  0 11:45 ?        00:00:00 postgres: mirrorbrain mirrorbrain 127.0.0.1(41442) idle                                                                     
      

      Postgres apparently has a built-in limit on the total number of concurrent connections it will serve, and it appears to be around 100. As the number of these idle processes climbs, Drupal starts to fail, reporting that the database connection failed, which renders the whole of http://jenkins-ci.org/ unusable. I noticed this Drupal error last night, and a little investigation led me to these idle processes.

      I ran a shell script that counted the number of idle processes every 10 seconds for the past 3 hours, and created a chart out of it. About 10 times in those 3 hours, the number of processes spiked to a level dangerously close to the ceiling.
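
The original counting script is not included in this issue. As a rough sketch of the same idea, the following function counts idle client backends in `ps -ef` output. The function name, the regular expression, and the optional `db_user` filter are assumptions made for illustration; the process-title format is taken from the listing above.

```python
import re

# Matches the process title of a client backend as shown in the ps listing,
# e.g. "postgres: mirrorbrain mirrorbrain ::1(58442) idle".
# Groups: database user, database name, client host, client port, state.
BACKEND_RE = re.compile(r"postgres: (\S+) (\S+) (\S+)\((\d+)\) (.+?)\s*$")


def count_idle_backends(ps_output, db_user=None):
    """Count backends whose state is exactly "idle" (hypothetical helper).

    If db_user is given, only backends for that database user are counted.
    Server processes such as "writer process" do not match the pattern.
    """
    count = 0
    for line in ps_output.splitlines():
        match = BACKEND_RE.search(line)
        if match is None:
            continue
        user, _database, _host, _port, state = match.groups()
        if state == "idle" and (db_user is None or user == db_user):
            count += 1
    return count
```

To reproduce the sampling described above, one could feed it the output of `subprocess.run(["ps", "-ef"], capture_output=True, text=True).stdout` every 10 seconds and log the result.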

      I think we need to investigate that before this becomes a major issue.

        Activity

        kohsuke Kohsuke Kawaguchi created issue -
        kohsuke Kohsuke Kawaguchi added a comment -

        Most of those processes stay around for up to 3-5 minutes.

        oldelvet Richard Mortimer added a comment -

        max_connections is tunable in postgresql.conf. I'm not sure whether any other settings need to be increased as well, but raising it may be a workaround until mirrorbrain can be "fixed".

        On my Ubuntu 10.04 system /etc/postgresql/8.4/main/postgresql.conf has

        max_connections = 100                   # (change requires restart)
        # Note:  Increasing max_connections costs ~400 bytes of shared memory per
        # connection slot, plus lock space (see max_locks_per_transaction).  You might
        # also need to raise shared_buffers to support more connections.
        
        kohsuke Kohsuke Kawaguchi added a comment -

        Had this chat in #mirrorbrain:

        (09:40:10 AM) kohsuke: I'm trying to understand large number of idle postgres processes that appear to be kept around by mirrorbrain: https://issues.jenkins-ci.org/browse/JENKINS-12297
        (09:40:46 AM) kohsuke: I'm seeing 30-40 of them constantly, and it occasionally goes up to 100 and causes starvation with other clients of postgres.
        (09:41:03 AM) kohsuke: I wonder if anyone can kindly tell me where I should look.
        (02:00:34 PM) poeml: kohsuke: 
        (02:00:36 PM) poeml: hi
        (02:02:16 PM) poeml: idle postgres processes shouldn't harm as such, as long as they don't use resources. However, often postgresql has a connection limit configured, like 100, and if clients try to use that much connections, one runs into a problem.
        (02:03:36 PM) poeml: are you using the connections via mod_mirrorbrain, or via another way (Drupal)?
        (02:04:29 PM) kohsuke: poeml: we are hitting that 100 cap indeed
        (02:04:35 PM) poeml: mod_mirrorbrain uses connection pooling, so it needs only a handful of connections - provided that Apache runs threaded (threaded MPM).
        (02:05:09 PM) poeml: Drupal means PHP, so you maybe use prefork. Then you can easily have 100 Apache (preforked) processes where each of them wants to open a postgresql connection.
        (02:05:23 PM) kohsuke: We are using mod_mirrorbrain, I think, but I think we are forking Apache like mad
        (02:05:26 PM) poeml: or do you use PHP via fastcgi or something else?
        (02:05:44 PM) kohsuke: So I guess that'd be the quick fix --- to change to threaded MPM
        (02:06:11 PM) poeml: you can either increase the number of allowed connections in postgresql - as long as there are the resources for it (cpu and ram).
        (02:06:17 PM) kohsuke: right
        (02:06:27 PM) poeml: or you can limit the number of Apache processes that can be spawned (that's useful anyway).
        (02:07:16 PM) kohsuke: Got it. Come to think of it, it all makes sense
        (02:07:28 PM) poeml: If you have to keep prefork MPM for the reason of PHP, you can still limit the number of processes in the Apache process pool to a number that fits the number of allowed postgresql processes.
        (02:07:32 PM) kohsuke: at least one connection needs to be kept around by apache process to be able to serve requests rapidly
        (02:08:06 PM) poeml: anyway, if you have so many Apache processes lingering around, odds are that you don't really need them (I would expect most of them to be idle, or rather in keepalive state)
        (02:08:53 PM) poeml: get rid of keepalive, or at least limit the keepalive time to 2 seconds at most. If you switch keepalive off even, you'll have dramatically fewer Apache processes -- I predict ;)
        (02:09:41 PM) poeml: the small benefits of keepalive are more than outweighed by the benefit of keeping Apache small and responsive for everybody
        (02:10:26 PM) kohsuke: poeml: thank you very much for your help. I know what to tweak now.
        

        Come to think of it, this is obvious. Given that we are forking Apache, it makes sense that we need to allow at least as many database connections as there are Apache processes.
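
The sizing rule above can be written as a small calculation. This is only a sketch: the function name is hypothetical, the reserve of 10 connections for other clients is an assumed figure, and one connection per Apache process is the assumption discussed in the chat for prefork. Only the limit of 100 comes from this issue.

```python
def max_apache_workers(max_connections, reserved_for_others, conns_per_worker=1):
    """Largest number of prefork Apache processes whose database
    connections still fit under the PostgreSQL connection limit.

    max_connections     -- PostgreSQL's max_connections setting
    reserved_for_others -- connections kept free for other clients
                           (assumed value, not measured)
    conns_per_worker    -- connections each Apache process may hold open
    """
    if conns_per_worker < 1:
        raise ValueError("conns_per_worker must be at least 1")
    available = max_connections - reserved_for_others
    if available < 0:
        raise ValueError("reserve exceeds max_connections")
    return available // conns_per_worker


# With the limit of 100 seen here and an assumed reserve of 10:
print(max_apache_workers(max_connections=100, reserved_for_others=10))
```

The result is the ceiling to apply to Apache's process limit (for example `MaxClients` under the prefork MPM in Apache 2.2). Raising `max_connections` instead moves the same inequality from the other side.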

        kohsuke Kohsuke Kawaguchi added a comment -

        Fixed by reducing the maximum number of Apache processes to 90, which should leave room for other processes (like the MB scanner) to use some database connections.

        I'll leave it up to Tyler to decide whether he wants to tweak the Keep-Alive setting.

        kohsuke Kohsuke Kawaguchi added a comment -

        Marking as resolved.

        kohsuke Kohsuke Kawaguchi made changes -
        Field Original Value New Value
        Status Open [ 1 ] Resolved [ 5 ]
        Resolution Fixed [ 1 ]
        ircbot IRCbot Run by Kohsuke made changes -
        Component/s other [ 15490 ]
        Component/s infrastructure [ 15687 ]

          People

          • Assignee:
            rtyler R. Tyler Croy
            Reporter:
            kohsuke Kohsuke Kawaguchi
          • Votes:
            0
            Watchers:
            0

            Dates

            • Created:
              Updated:
              Resolved: