Thread: Performance Woes
Hi,
I have several databases. They are each about 35 GB in size and have about 10.5K relations (count from pg_stat_all_tables) in them. pg_class is about 26k rows and the data directory contains about 70k files. These are busy machines; they run about 50 transactions per second (approximately 500 rows inserted/updated/deleted per second).
We started getting errors about the number of open file descriptors:
2007-05-09 03:07:50.083 GMT 1146975740: LOG: 53000: out of file descriptors: Too many open files; release and retry
2007-05-09 03:07:50.083 GMT 1146975740: CONTEXT: SQL statement "insert ….. "
PL/pgSQL function "trigfunc_whatever" line 50 at execute statement
2007-05-09 03:07:50.083 GMT 1146975740: LOCATION: BasicOpenFile, fd.c:471
2007-05-09 03:07:50.091 GMT 0: LOG: 00000: duration: 12.362 ms
2007-05-09 03:07:50.091 GMT 0: LOCATION: exec_simple_query, postgres.c:1090
So we decreased max_files_per_process to 800. This took care of the error *BUT* roughly quadrupled the IO wait happening on the machine: it went from a peak of about 50% to peaks of over 200% (4-processor machines, 4 GB RAM, RAID). The load on the machine remained constant.
I am really trying to get an understanding of exactly what this setting is, what exactly is out of file descriptors, and how I can fix that. I need to bring that IO back down.
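A quick way to see how many descriptors each backend actually holds is to count the entries in /proc/<pid>/fd (Linux only). This helper is a sketch; the process-name match (`postgres`/`postmaster`) is an assumption and may need adjusting for your install:

```python
import os

def open_fd_count(pid):
    """Count open file descriptors for a process via /proc (Linux)."""
    try:
        return len(os.listdir('/proc/%d/fd' % pid))
    except OSError:
        return None  # process vanished or permission denied

def postgres_backends():
    """Yield PIDs whose command name looks like a PostgreSQL process.
    The name test here is illustrative, not authoritative."""
    for entry in os.listdir('/proc'):
        if not entry.isdigit():
            continue
        pid = int(entry)
        try:
            with open('/proc/%d/comm' % pid) as f:
                if f.read().strip() in ('postgres', 'postmaster'):
                    yield pid
        except IOError:
            continue

if __name__ == '__main__':
    for pid in postgres_backends():
        print(pid, open_fd_count(pid))
```

If the per-backend counts hover near max_files_per_process, the working set really is that large and lowering the setting will force extra open/close churn.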
Thanks for any help.
Ralph
--
Internal Virus Database is out-of-date.
Checked by AVG Free Edition.
Version: 7.5.432 / Virus Database: 268.15.9/573 - Release Date: 05/12/2006 16:07
> I have several databases. They are each about 35 GB in size and have about 10.5K relations (count from pg_stat_all_tables) in them. pg_class is about 26k rows and the data directory contains about 70k files. These are busy machines; they run about 50 transactions per second (approximately 500 rows inserted/updated/deleted per second).
>
> We started getting errors about the number of open file descriptors:
> 2007-05-09 03:07:50.083 GMT 1146975740: LOG: 53000: out of file descriptors: Too many open files; release and retry
> 2007-05-09 03:07:50.083 GMT 1146975740: CONTEXT: SQL statement "insert ….. "
> PL/pgSQL function "trigfunc_whatever" line 50 at execute statement
> 2007-05-09 03:07:50.083 GMT 1146975740: LOCATION: BasicOpenFile, fd.c:471
> 2007-05-09 03:07:50.091 GMT 0: LOG: 00000: duration: 12.362 ms
> 2007-05-09 03:07:50.091 GMT 0: LOCATION: exec_simple_query, postgres.c:1090
>
> So we decreased max_files_per_process to 800. This took care of the error *BUT* roughly quadrupled the IO wait happening on the machine: it went from a peak of about 50% to peaks of over 200% (4-processor machines, 4 GB RAM, RAID). The load on the machine remained constant.
What version of Pg/OS? What is your hardware config?
I had seen these errors with earlier versions of Pg 7.4.x; according to the changelogs, this was fixed in later releases.
> 2007-05-09 03:07:50.083 GMT 1146975740: LOCATION: BasicOpenFile, fd.c:471
>
> 2007-05-09 03:07:50.091 GMT 0: LOG: 00000: duration: 12.362 ms
>
> 2007-05-09 03:07:50.091 GMT 0: LOCATION: exec_simple_query, postgres.c:1090
>
> So we decreased max_files_per_process to 800. This took care of the error *BUT* roughly quadrupled the IO wait happening on the machine. It went from a peak of about 50% to peaks of over 200% (4-processor machines, 4 GB RAM, RAID). The load on the machine remained constant.

Sounds to me like you just need to up the total amount of open files allowed by the operating system.

Joshua D. Drake

--
=== The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
Providing the most comprehensive PostgreSQL solutions since 1997
http://www.commandprompt.com/

Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
PostgreSQL Replication: http://www.commandprompt.com/products/
From: pgsql-performance-owner@postgresql.org [mailto:pgsql-performance-owner@postgresql.org] On Behalf Of CAJ CAJ
Sent: 10 May 2007 12:26
To: Ralph Mason
Cc: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Performance Woes
> I have several databases. They are each about 35 GB in size and have about 10.5K relations (count from pg_stat_all_tables) in them. pg_class is about 26k rows and the data directory contains about 70k files. These are busy machines; they run about 50 transactions per second (approximately 500 rows inserted/updated/deleted per second).
>
> We started getting errors about the number of open file descriptors:
> 2007-05-09 03:07:50.083 GMT 1146975740: LOG: 53000: out of file descriptors: Too many open files; release and retry
> 2007-05-09 03:07:50.083 GMT 1146975740: CONTEXT: SQL statement "insert ….. "
> PL/pgSQL function "trigfunc_whatever" line 50 at execute statement
> 2007-05-09 03:07:50.083 GMT 1146975740: LOCATION: BasicOpenFile, fd.c:471
> 2007-05-09 03:07:50.091 GMT 0: LOG: 00000: duration: 12.362 ms
> 2007-05-09 03:07:50.091 GMT 0: LOCATION: exec_simple_query, postgres.c:1090
>
> So we decreased max_files_per_process to 800. This took care of the error *BUT* roughly quadrupled the IO wait happening on the machine: it went from a peak of about 50% to peaks of over 200% (4-processor machines, 4 GB RAM, RAID). The load on the machine remained constant.
> What version of Pg/OS? What is your hardware config?
> I had seen these errors with earlier versions of Pg 7.4.x which was fixed in later releases according to the changelogs
"PostgreSQL 8.1.4 on x86_64-redhat-linux-gnu, compiled by GCC x86_64-redhat-linux-gcc (GCC) 4.1.0 20060304 (Red Hat 4.1.0-3)"
su postgres -c 'ulimit -a'
core file size (blocks, -c) 0
data seg size (kbytes, -d) unlimited
max nice (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 49152
max locked memory (kbytes, -l) 32
max memory size (kbytes, -m) unlimited
open files (-n) 1000000
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
max rt priority (-r) 0
stack size (kbytes, -s) 10240
cpu time (seconds, -t) unlimited
max user processes (-u) 49152
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited
Seems like I should be able to use lots and lots of open files.
Machines are quad-processor Opterons with 4 GB RAM, with data on RAID 5 and logging to a RAID 0.
Ralph
--
Hello,

You likely need to increase your file-max parameter using sysctl.conf.

Sincerely,

Joshua D. Drake
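For reference, the kind of change Joshua suggests would look something like the following; the value shown is purely illustrative, not a recommendation for this workload:

```conf
# /etc/sysctl.conf -- raise the system-wide open-file limit (illustrative value)
fs.file-max = 500000
```

Apply with `sysctl -p` and verify the new ceiling in the third field of `/proc/sys/fs/file-nr`.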
On Wed, 2007-05-09 at 17:29 -0700, Joshua D. Drake wrote:
> > 2007-05-09 03:07:50.083 GMT 1146975740: LOCATION: BasicOpenFile, fd.c:471
> >
> > 2007-05-09 03:07:50.091 GMT 0: LOG: 00000: duration: 12.362 ms
> >
> > 2007-05-09 03:07:50.091 GMT 0: LOCATION: exec_simple_query, postgres.c:1090
> >
> > So we decreased max_files_per_process to 800. This took care of the error *BUT* roughly quadrupled the IO wait happening on the machine. It went from a peak of about 50% to peaks of over 200% (4-processor machines, 4 GB RAM, RAID). The load on the machine remained constant.
>
> Sounds to me like you just need to up the total amount of open files allowed by the operating system.

It looks more like the opposite; here are the docs for max_files_per_process:

"Sets the maximum number of simultaneously open files allowed to each server subprocess. The default is one thousand files. If the kernel is enforcing a safe per-process limit, you don't need to worry about this setting. But on some platforms (notably, most BSD systems), the kernel will allow individual processes to open many more files than the system can really support when a large number of processes all try to open that many files. If you find yourself seeing "Too many open files" failures, try reducing this setting. This parameter can only be set at server start."

To me, that means that his machine is allowing the new FD to be created, but then can't really support that many, so it gives an error.

Ralph, how many connections do you have open at once? It seems like the machine perhaps just can't handle that many FDs in all of those processes at once.

That is a lot of tables. Maybe a different OS will handle it better? Maybe there's some way that you can use fewer connections and then the OS could still handle it?

Regards,
Jeff Davis
> To me, that means that his machine is allowing the new FD to be created,
> but then can't really support that many so it gives an error.

file-max is 297834; ulimit is 1000000 (which doesn't make sense, but there you go). What I don't really understand is that with max_files_per_process at 800 we don't get the problem, but with 1000 we do.

$ lsof | wc -l
14944

$ cat /proc/sys/fs/file-nr
12240   0       297834

> Ralph, how many connections do you have open at once? It seems like the
> machine perhaps just can't handle that many FDs in all of those
> processes at once.

There are only 30 connections, of which probably only 10 are really active. It doesn't seem like we should be stressing this machine.

> That is a lot of tables. Maybe a different OS will handle it better?
> Maybe there's some way that you can use fewer connections and then the
> OS could still handle it?

It would be less, but then you can't maintain the db because of the constant vacuuming needed :-(

I think the Linux folks would get up in arms if you told them they couldn't handle that many open files ;-)

Thanks,
Ralph
We have the kernel file-max setting set to 297834 (256 per 4 MB of RAM).
/proc/sys/fs/file-nr tells us that we have roughly 13000 allocated handles, of which none are ever free.
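For reference, the three fields in /proc/sys/fs/file-nr are the allocated handles, the allocated-but-unused handles, and the system-wide maximum (fs.file-max). A small sketch to decode them (Linux only; the function name is illustrative):

```python
def read_file_nr(path='/proc/sys/fs/file-nr'):
    """Parse the three whitespace-separated fields of file-nr (Linux)."""
    with open(path) as f:
        allocated, free, maximum = (int(x) for x in f.read().split())
    return {'allocated': allocated, 'free': free, 'max': maximum}
```

On the numbers above, roughly 13k allocated against a 297834 ceiling suggests the system-wide limit is not what is being hit.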
--
Scott Mohekey
Systems Administrator
Telogis
Intelligent Location Technologies
Ralph Mason wrote:
> I have several databases. They are each about 35 GB in size and have about
> 10.5K relations (count from pg_stat_all_tables) in them. pg_class is about
> 26k rows and the data directory contains about 70k files. These are busy
> machines; they run about 50 transactions per second (approximately 500 rows
> inserted/updated/deleted per second).

Is the problematic one always the same trigger? Is it just PL/pgSQL, or do you have something else? Something that may be trying to open additional files, for example? Something that may be trying to open files behind your back? PL/Perl with funky operators or temp files?

Also, what PG version is this?

> So we decreased max_files_per_process to 800. This took care of the
> error *BUT* roughly quadrupled the IO wait happening on the machine.
> It went from a peak of about 50% to peaks of over 200% (4-processor
> machines, 4 GB RAM, RAID). The load on the machine remained constant.

The max_files_per_process setting controls how many actual file descriptors each process is allowed to have. Postgres internally uses a "virtual file descriptor", which normally has one kernel file descriptor open. However, if your transactions need to access lots of files, the VFDs will close kernel FDs to allow other VFDs to open theirs.

So it sounds like your transaction has more than 800 files open. The extra IO wait could be caused by the additional system calls to open and close those files as needed. I would actually expect it to cause extra "system" load (as opposed to "user") rather than IO, but I'm not sure.

--
Alvaro Herrera                                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.
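The virtual-file-descriptor behavior Alvaro describes can be sketched as an LRU cache of real kernel FDs. This is a toy illustration, not PostgreSQL's actual fd.c code; all names here are made up:

```python
from collections import OrderedDict

class FDPool:
    """Toy LRU pool: hand out files freely, but keep at most `limit`
    real kernel FDs open, evicting the least recently used one."""
    def __init__(self, limit):
        self.limit = limit               # analogous to max_files_per_process
        self.open_files = OrderedDict()  # path -> file object, LRU order
        self.reopens = 0                 # open() calls, including re-opens

    def get(self, path):
        if path in self.open_files:
            self.open_files.move_to_end(path)  # mark recently used
            return self.open_files[path]
        if len(self.open_files) >= self.limit:
            # Evict the LRU kernel FD; the "virtual" file still exists
            # and will be reopened on its next access.
            _, lru = self.open_files.popitem(last=False)
            lru.close()
        f = open(path, 'rb')
        self.reopens += 1
        self.open_files[path] = f
        return f
```

When the working set of files exceeds the limit, nearly every access pays a close+open pair, which is consistent with the extra load Ralph saw after dropping the setting from 1000 to 800.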
Jeff Davis <pgsql@j-davis.com> writes:
> On Wed, 2007-05-09 at 17:29 -0700, Joshua D. Drake wrote:
>> Sounds to me like you just need to up the total amount of open files
>> allowed by the operating system.

> It looks more like the opposite, here's the docs for
> max_files_per_process:

I think Josh has got the right advice. The manual is just saying that you can reduce max_files_per_process to avoid the failure, but it's not making any promises about the performance penalty for doing that. Apparently Ralph's app needs a working set of between 800 and 1000 open files to have reasonable performance.

> That is a lot of tables. Maybe a different OS will handle it better?
> Maybe there's some way that you can use fewer connections and then the
> OS could still handle it?

Also, it might be worth rethinking the database structure to reduce the number of tables. But for a quick fix, increasing the kernel limit seems like the easiest answer.

			regards, tom lane