Thread: Postgres 7.3.1 poor insert/update/search performance esp WRT Oracle

Postgres 7.3.1 poor insert/update/search performance esp WRT Oracle

From
"Seth Robertson"
Date:
I'm trying to get converted over to Postgres from Oracle (Postgres is
billions of times more straightforward and pragmatically clean than
Oracle), but I'm having some severe performance problems on what
naively appears to be a very straightforward dead-simple test.

The test consists of two parts: a write part which attempts to
accumulate (sum) numbers by distinct keys, and a read part which
searches for keys in the database (some of which will be present, some
of which will not).  In a more realistic scenario, both would be
happening all of the time, but we can start off easy.

However, performance is terrible: around 39 write transactions/second
and 69 searches/second.  Oracle, by comparison, writes at 314 and
reads at 395--practically an order of magnitude better performance.
Both are using the same hardware (obviously not at the same time): a
dual-processor AMD 2000+ with 3GB of memory, with both Oracle and
Postgres loaded on a 105GB ``MD'' stripe (no redundancy) across 2 SCSI
disks running an ext3 fs (no special flags) under Linux 2.4.18-10smp.

I actually have seven different schemes for performing the writes
using Postgres:

----------------------------------------------------------------------
"normal" C libpq            39 t/s
"normal" Perl DBI            39 t/s
"DBI Prepared Statement" Perl DBI    39 t/s
"Batching" Perl DBI            45 t/s
"arrays" Perl DBI            26 t/s
"server-side function" Perl DBI        39 t/s
"server-side trigger" Perl DBI        39 t/s
"normal" Perl DBI read            69 t/s
"normal" Perl DBI for Oracle        314 t/s
"normal" Perl DBI read for Oracle    395 t/s
----------------------------------------------------------------------

Only batching had a statistically significant improvement, and it
wasn't that major.  I couldn't use true Postgres prepared statements
since you cannot determine the success/failure of the statements yet.
I was planning on using arrays as well, but the additional 33%
performance impact is not amusing (though I suppose it is only an
additional 3% if you consider the 87% performance drop of Postgres
from Oracle).

I'll include all methods in the attached file, but since there was no
significant difference, I'll concentrate on the basic one:

Example table:
----------------------------------------------------------------------
CREATE TABLE test (
 val BIGINT PRIMARY KEY,             -- "vals" may be between 0 and 2^32-1
 accum INTEGER
);
----------------------------------------------------------------------

Basic algorithm for writes
----------------------------------------------------------------------
  while (<>)
  {
    chomp;
    @A = split;

    if (dosql($dbh, "UPDATE test SET accum = accum + $A[1] WHERE val = '$A[0]';",0) eq "0E0")
    {
      dosql($dbh, "INSERT INTO test VALUES ( $A[0], $A[1] );");
    }
  }
----------------------------------------------------------------------

Basic algorithm for reads
----------------------------------------------------------------------
while (<>)
{
  chomp;
  @A = split;
  $sth = querysql($dbh,"SELECT accum FROM test WHERE val = $A[0];");
  $hit++ if ($sth && ($row = $sth->fetchrow_arrayref));
  $tot++;
}
----------------------------------------------------------------------
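
(For reference, dosql() and querysql() are small wrappers defined in the
attached file; roughly speaking they just run the statement and hand back
DBI's status.  A sketch of what they do -- not the exact code, which is in
the tarball -- looks like this:)
----------------------------------------------------------------------
# Rough sketch of the helpers, not the exact versions from the tarball.
# dosql() runs a statement and returns $dbh->do()'s row count ("0E0"
# means zero rows touched); querysql() prepares/executes a SELECT and
# returns the statement handle.
sub dosql
{
  my ($dbh, $sql, $fatal) = @_;
  my $rv = $dbh->do($sql);
  die $dbh->errstr if (!defined($rv) && $fatal);
  return $rv;
}

sub querysql
{
  my ($dbh, $sql) = @_;
  my $sth = $dbh->prepare($sql) or return undef;
  $sth->execute or return undef;
  return $sth;
}
----------------------------------------------------------------------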

What could be simpler?

In my randomly generated write data, I usually have about 18K inserts
and 82K updates.  In my randomly generated read data, I have 100K keys
which will be found and 100K keys which will not be found.
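
(The data generators are also in the tarball; conceptually they just emit
"key value" lines, with the write keys drawn from a small enough pool that
most lines hit an existing row.  Something roughly like this sketch -- the
pool size and value range here are made up, not the real parameters:)
----------------------------------------------------------------------
# Hypothetical generator sketch, not the real script from the tarball.
my $pool = 20_000;                    # assumed key pool, not the real size
for (1 .. 100_000)
{
  printf "%d %d\n", int(rand($pool)), int(rand(1000));
}
----------------------------------------------------------------------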

The postgresql.conf file is default (my sysadmin nuked all of my
changes when he upgraded to 7.3.1--grr) and there are some shared
memory configs: kernel.sem = 250 32000 100 128, kernel.shmmax =
2147483648, kernel.shmmni = 100, kernel.shmmax = 134217728.  The
WAL is not separated (but see below).

A "vacuum analyze" is performed between the write phase and the read
phase.  However, for your analysis pleasure, here are the results
of a full verbose analyze and some explain results (both before and after).

/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\
seth=> explain update test set accum = accum + 53 where val = '5';
                     QUERY PLAN
-----------------------------------------------------
 Seq Scan on test  (cost=0.00..0.00 rows=1 width=18)
   Filter: (val = 5::bigint)
(2 rows)
seth=> explain insert into test values (5, 53);
                QUERY PLAN
------------------------------------------
 Result  (cost=0.00..0.01 rows=1 width=0)
(1 row)
seth=> vacuum full verbose analyze test;
INFO:  --Relation public.test--
INFO:  Pages 541: Changed 2, reaped 540, Empty 0, New 0; Tup 18153: Vac 81847, Keep/VTL 0/0, UnUsed 0, MinLen 40,
MaxLen 40; Re-using: Free/Avail. Space 3294932/3294932; EndEmpty/Avail. Pages 0/541.
        CPU 0.00s/0.03u sec elapsed 0.02 sec.
INFO:  Index test_pkey: Pages 355; Tuples 18153: Deleted 81847.
        CPU 0.03s/0.34u sec elapsed 0.65 sec.
INFO:  Rel test: Pages: 541 --> 99; Tuple(s) moved: 18123.
        CPU 1.01s/0.31u sec elapsed 9.65 sec.
INFO:  Index test_pkey: Pages 355; Tuples 18153: Deleted 18123.
        CPU 0.02s/0.06u sec elapsed 0.19 sec.
INFO:  Analyzing public.test
VACUUM

seth=> explain select accum from test where val = 5;
                              QUERY PLAN
----------------------------------------------------------------------
 Index Scan using test_pkey on test  (cost=0.00..5.99 rows=1 width=4)
   Index Cond: (val = 5)
(2 rows)
seth=> explain update test set accum = accum + 53 where val = '5';
                              QUERY PLAN
-----------------------------------------------------------------------
 Index Scan using test_pkey on test  (cost=0.00..5.99 rows=1 width=18)
   Index Cond: (val = 5::bigint)
(2 rows)
seth=> explain insert into test values (5, 53);
                QUERY PLAN
------------------------------------------
 Result  (cost=0.00..0.01 rows=1 width=0)
(1 row)
/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\

I certainly understand that getting an index scan might well speed
things up WRT the updates, but considering that the search performance
above is post-analyze (pre-analyze it is even more deadly slow), I am
dubious that getting one during the updates will put me within striking
distance of Oracle, since read performance has got to be better than
write performance, right?  This is also why I am dubious that moving
the WAL to another filesystem or futzing with the fsync policy will do
anything.

I will include below a compressed tarball of the programs I used (and
the corresponding RUNME script) in case you wish to play along at
home.  I don't claim they are pretty, BTW :-)

                                        -Seth Robertson


Attachment

Re: Postgres 7.3.1 poor insert/update/search performance

From
Stephan Szabo
Date:
On Tue, 21 Jan 2003, Seth Robertson wrote:

> The postgresql.conf file is default (my sysadmin nuked all of my
> changes when he upgraded to 7.3.1--grr) and there are some shared
> memory configs: kernel.sem = 250 32000 100 128, kernel.shmmax =
> 2147483648, kernel.shmmni = 100, kernel.shmmax = 134217728.  The
> WAL is not separated (but see below).

You almost certainly want to raise shared_buffers from the default (64?)
to say 1k-10k.  I'm not sure how much that'll help but it should help
some.
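
(A quick way to check what the server is actually running with -- SHOW
returns an ordinary result set as of 7.3, so for example from DBI; a
hypothetical snippet, not from the test code:)
----------------------------------------------------------------------
# Report the running setting; a stock 7.3 install shows 64.
my ($buf) = $dbh->selectrow_array("SHOW shared_buffers");
print "shared_buffers = $buf\n";
----------------------------------------------------------------------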

> A "vacuum analyze" is performed between the write phase and the read
> phase.  However, for your analysis pleasure, here are the results
> of a full verbose analyze and some explain results (both before and after).

BTW: what does explain analyze (rather than plain explain) show?


Re: Postgres 7.3.1 poor insert/update/search performance

From
Seth Robertson
Date:
In message <20030121134242.Q84028-100000@megazone23.bigpanda.com>, Stephan Szabo writes:

    On Tue, 21 Jan 2003, Seth Robertson wrote:

    > The postgresql.conf file is default (my sysadmin nuked all of my
    > changes when he upgraded to 7.3.1--grr) and there are some shared
    > memory configs: kernel.sem = 250 32000 100 128, kernel.shmmax =
    > 2147483648, kernel.shmmni = 100, kernel.shmmax = 134217728.  The
    > WAL is not separated (but see below).

    You almost certainly want to raise shared_buffers from the default (64?)
    to say 1k-10k.  I'm not sure how much that'll help but it should help
    some.

I'll try that and report back later, but I was under the (false?)
impression that it was primarily important when you had multiple
database connections using the same table.

    > A "vacuum analyze" is performed between the write phase and the
    > read phase.  However, for your analysis pleasure, here are the
    > results of a full verbose analyze and some explain results (both
    > before and after).

    BTW: what does explain analyze (rather than plain explain) show?


/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\
seth=> explain analyze select accum from test where val = 5;
                                          QUERY PLAN
----------------------------------------------------------------------------------------------
 Seq Scan on test  (cost=0.00..0.00 rows=1 width=4) (actual time=94.55..94.55 rows=0 loops=1)
   Filter: (val = 5)
 Total runtime: 99.20 msec
(3 rows)

seth=> explain analyze update test set accum = accum + 53 where val = '5';
                                          QUERY PLAN
-----------------------------------------------------------------------------------------------
 Seq Scan on test  (cost=0.00..0.00 rows=1 width=18) (actual time=31.95..31.95 rows=0 loops=1)
   Filter: (val = 5::bigint)
 Total runtime: 32.04 msec
(3 rows)

seth=> explain analyze insert into test values (5, 53);
                                    QUERY PLAN
----------------------------------------------------------------------------------
 Result  (cost=0.00..0.01 rows=1 width=0) (actual time=0.01..0.01 rows=1 loops=1)
 Total runtime: 7.50 msec
(2 rows)

seth=> vacuum full verbose analyze test
seth-> ;
INFO:  --Relation public.test--
INFO:  Pages 541: Changed 1, reaped 539, Empty 0, New 0; Tup 18071: Vac 81930, Keep/VTL 0/0, UnUsed 0, MinLen 40,
MaxLen 40; Re-using: Free/Avail. Space 3298208/3298176; EndEmpty/Avail. Pages 0/540.
        CPU 0.03s/0.00u sec elapsed 0.02 sec.
INFO:  Index test_pkey: Pages 355; Tuples 18071: Deleted 81930.
        CPU 0.04s/0.41u sec elapsed 1.96 sec.
INFO:  Rel test: Pages: 541 --> 98; Tuple(s) moved: 18046.
        CPU 0.95s/0.42u sec elapsed 12.74 sec.
INFO:  Index test_pkey: Pages 355; Tuples 18071: Deleted 18046.
        CPU 0.02s/0.05u sec elapsed 0.31 sec.
INFO:  Analyzing public.test
VACUUM
seth=> explain analyze select accum from test where val = 5;
                                          QUERY PLAN
-----------------------------------------------------------------------------------------------
 Seq Scan on test  (cost=0.00..323.89 rows=1 width=4) (actual time=0.13..14.20 rows=1 loops=1)
   Filter: (val = 5)
 Total runtime: 14.26 msec
(3 rows)

seth=> explain analyze select accum from test where val = 2147483648;
                                                  QUERY PLAN
--------------------------------------------------------------------------------------------------------------
 Index Scan using test_pkey on test  (cost=0.00..5.99 rows=1 width=4) (actual time=0.11..0.11 rows=0 loops=1)
   Index Cond: (val = 2147483648::bigint)
 Total runtime: 0.16 msec
(3 rows)

seth=> explain analyze update test set accum = accum + 53 where val = '5';
                                                  QUERY PLAN
---------------------------------------------------------------------------------------------------------------
 Index Scan using test_pkey on test  (cost=0.00..5.99 rows=1 width=18) (actual time=0.24..0.24 rows=1 loops=1)
   Index Cond: (val = 5::bigint)
 Total runtime: 0.39 msec
(3 rows)

seth=> explain analyze insert into test values (6, 53);
                                    QUERY PLAN
----------------------------------------------------------------------------------
 Result  (cost=0.00..0.01 rows=1 width=0) (actual time=0.01..0.01 rows=1 loops=1)
 Total runtime: 0.08 msec
(2 rows)

seth=> explain analyze insert into test values (2147483647, 53);
                                    QUERY PLAN
----------------------------------------------------------------------------------
 Result  (cost=0.00..0.01 rows=1 width=0) (actual time=0.01..0.01 rows=1 loops=1)
 Total runtime: 0.33 msec
(2 rows)
/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\

                                        -Seth Robertson

Re: Postgres 7.3.1 poor insert/update/search performance

From
Tom Lane
Date:
Seth Robertson <pgsql-performance@sysd.com> writes:
> I'll try that and report back later, but I was under the (false?)
> impression that it was primarily important when you had multiple
> database connections using the same table.

Definitely false.  shared_buffers needs to be 1000 or so for
production-grade performance.  There are varying schools of thought
about whether it's useful to raise it even higher, but in any case
64 is just a toy-installation setting.

> seth=> explain analyze select accum from test where val = 5;
>                                           QUERY PLAN
> -----------------------------------------------------------------------------------------------
>  Seq Scan on test  (cost=0.00..323.89 rows=1 width=4) (actual time=0.13..14.20 rows=1 loops=1)
>    Filter: (val = 5)
>  Total runtime: 14.26 msec
> (3 rows)

> seth=> explain analyze update test set accum = accum + 53 where val = '5';
>                                                   QUERY PLAN
> ---------------------------------------------------------------------------------------------------------------
>  Index Scan using test_pkey on test  (cost=0.00..5.99 rows=1 width=18) (actual time=0.24..0.24 rows=1 loops=1)
>    Index Cond: (val = 5::bigint)
>  Total runtime: 0.39 msec
> (3 rows)

The quotes are important when you are dealing with BIGINT indexes.
You won't get an indexscan if the constant looks like int4 rather than int8.
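
For example, with a hypothetical $dbh and $key (a sketch, not taken from
the attached test programs), the only difference between the slow and fast
forms is how the literal is written:
----------------------------------------------------------------------
# Unquoted, the literal is typed int4 and the int8 index is not matched:
$dbh->selectrow_array("SELECT accum FROM test WHERE val = $key");    # seq scan
# Quoted (or explicitly cast to int8), the index can be used:
$dbh->selectrow_array("SELECT accum FROM test WHERE val = '$key'");  # index scan
$dbh->selectrow_array("SELECT accum FROM test WHERE val = CAST($key AS bigint)");
----------------------------------------------------------------------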

            regards, tom lane

Re: Postgres 7.3.1 poor insert/update/search performance

From
Brian Hirt
Date:
Tom and others:

There has been a lot of talk about shared memory size recently, along
with many conflicting statements from various people.  Earlier threads
said that setting the shared buffer to a high value (like 512MB on a
2GB dedicated DB server) is not a good idea.  A couple of reasons were
mentioned: a) potential inefficiencies with the kernel and VM system,
b) modern kernels' aggressive caching with all free memory, and c) the
shared memory stealing from memory the kernel would use to cache, etc.

So my question is: if the kernel is caching all this data, what's the
benefit of setting this to 1000 or higher?  Why wouldn't I just set it
to 0 if I believe my kernel is doing a good job?


From all the discussion on this topic, it's still not clear to me how
to calculate what value this should be set at and why.  I've read these
documents and others and have yet to find explanations and
recommendations that I can use.

http://www.postgresql.org/docs/momjian/hw_performance.pdf
http://www.postgresql.org/idocs/index.php?runtime-config.html
http://www.postgresql.org/idocs/index.php?kernel-resources.html
http://www.postgresql.org/idocs/index.php?performance-tips.html
http://www.ca.postgresql.org/docs/momjian/hw_performance/node6.html
http://www.ca.postgresql.org/docs/momjian/hw_performance/node5.html
http://www.ca.postgresql.org/docs/faq-english.html#3.6

This is such a common topic, it would be nice to see a more definitive
and comprehensive section in the docs for tuning.  Google searches for
"shared_buffers site:www.postgresql.org" and "tuning
site:www.postgresql.org" come up with little info.

FYI: I've been running our database, which is mostly read-only, with
1500 buffers.  On the whole, we see very little IO.  PostgreSQL performs
many millions of queries a day, many simple, many complex, though the
database is relatively small, around 3GB.

--brian

On Tuesday, January 21, 2003, at 03:31 PM, Tom Lane wrote:

> Seth Robertson <pgsql-performance@sysd.com> writes:
>> I'll try that and report back later, but I was under the (false?)
>> impression that it was primarily important when you had multiple
>> database connections using the same table.
>
> Definitely false.  shared_buffers needs to be 1000 or so for
> production-grade performance.  There are varying schools of thought
> about whether it's useful to raise it even higher, but in any case
> 64 is just a toy-installation setting.
>
>> seth=> explain analyze select accum from test where val = 5;
>>                                           QUERY PLAN
>> -----------------------------------------------------------------------------------------------
>>  Seq Scan on test  (cost=0.00..323.89 rows=1 width=4) (actual time=0.13..14.20 rows=1 loops=1)
>>    Filter: (val = 5)
>>  Total runtime: 14.26 msec
>> (3 rows)
>
>> seth=> explain analyze update test set accum = accum + 53 where val = '5';
>>                                                   QUERY PLAN
>> ---------------------------------------------------------------------------------------------------------------
>>  Index Scan using test_pkey on test  (cost=0.00..5.99 rows=1 width=18) (actual time=0.24..0.24 rows=1 loops=1)
>>    Index Cond: (val = 5::bigint)
>>  Total runtime: 0.39 msec
>> (3 rows)
>
> The quotes are important when you are dealing with BIGINT indexes.
> You won't get an indexscan if the constant looks like int4 rather than
> int8.
>
>             regards, tom lane


Re: Postgres 7.3.1 poor insert/update/search performance

From
Seth Robertson
Date:
In message <13165.1043188295@sss.pgh.pa.us>, Tom Lane writes:

    Seth Robertson <pgsql-performance@sysd.com> writes:
    > I'll try that and report back later, but I was under the (false?)
    > impression that it was primarily important when you had multiple
    > database connections using the same table.

    Definitely false.  shared_buffers needs to be 1000 or so for
    production-grade performance.  There are varying schools of thought
    about whether it's useful to raise it even higher, but in any case
    64 is just a toy-installation setting.

Increasing the setting to 4096 improved write performance by 20%.
Increasing the setting to 8192 had no additional effect.  I could try
a few more probes if anyone cared.

    The quotes are important when you are dealing with BIGINT indexes.
    You won't get an indexscan if the constant looks like int4 rather
    than int8.

You are not kidding!!!!  Changing this increased the search
performance to 2083 transactions/second.  This is 30 times faster than
before, and 5 times faster than Oracle!  Go Tom Lane!!!

Unfortunately, the update already happened to use the quoting, so this
tip did not directly help the write case.  However, it did inspire me
to check some other suggestions I have read, since obviously there was
performance to be had.

----------------------------------------------------------------------
Oracle read performance:                                         395
Original read performance:                                        69
shared_buffers = 4096                                            118
+ quoted where (WHERE val = '5')                                2083
----------------------------------------------------------------------

----------------------------------------------------------------------
Oracle write performance:                                        314
Original write performance:                                       39
shared_buffers = 4096:                                            47
+ Occasional (@ 10K & 60K vectors) vacuum analyze in bg:         121
+ Periodic (every 10K vectors) vacuum analyze in background:     124
+ wal_buffers = 24:                                              125
+ wal_sync_method = fdatasync:                                   127
+ wal_sync_method = open_sync:                                   248
+ wal_sync_method = open_datasync:                     Not Supported
+ fsync = false:                                                 793
----------------------------------------------------------------------

Just to round out my report, using the fastest safe combination I was
able to find (open_sync *is* safe, isn't it?), I reran all 7
performance tests to see if there was any difference between the
different access methods:

----------------------------------------------------------------------
"normal" C libpq                        256 t/s
"normal" Perl DBI                       251 t/s
"DBI Prepared Statement" Perl DBI       254 t/s
"Batching" Perl DBI                     1149 t/s
"arrays" Perl DBI                       43 t/s
"server-side function" Perl DBI         84 t/s
"server-side trigger" Perl DBI          84 t/s
"normal" Perl DBI read                  1960 t/s
"normal" Perl DBI for Oracle            314 t/s
"normal" Perl DBI read for Oracle       395 t/s
----------------------------------------------------------------------

With a batching update rate of 1149 transactions per second (a 2900%
improvement), I am willing to call it a day unless anyone else has any
brilliant ideas.  However, it looks like my hope to use arrays is
doomed; I'm not sure I can handle the performance penalty.
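
(For anyone playing along at home: the actual batching code is in the
tarball.  Purely as an illustration of one way to batch writes with DBI --
grouping many rows per transaction so one COMMIT, and thus one fsync,
covers a whole batch -- the shape is roughly this sketch, which is not the
literal test code:)
----------------------------------------------------------------------
# Sketch only: with AutoCommit off, many rows share one COMMIT.
$dbh->{AutoCommit} = 0;
my $n = 0;
while (<>)
{
  chomp;
  my @A = split;
  if (dosql($dbh, "UPDATE test SET accum = accum + $A[1] WHERE val = '$A[0]';", 0) eq "0E0")
  {
    dosql($dbh, "INSERT INTO test VALUES ( $A[0], $A[1] );");
  }
  $dbh->commit if ((++$n % 1000) == 0);   # batch size picked arbitrarily
}
$dbh->commit;
$dbh->{AutoCommit} = 1;
----------------------------------------------------------------------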

                                        -Seth Robertson

Re: Postgres 7.3.1 poor insert/update/search performance

From
Andrew Sullivan
Date:
On Tue, Jan 21, 2003 at 06:44:57PM -0700, Brian Hirt wrote:
>
> So my question is:  if the kernel is caching all this data, what's the
> benefit of setting this to 1000 or higher?  Why wouldn't I just set it
> to 0 if I believe my kernel is doing a good job?

If Postgres tries to fetch a bit of data which is in its own shared
buffer, it does not even need to make a system call in order to fetch
it.  The data fetch is extremely fast.

The problem is that managing that shared memory comes at some cost.

If the data is not in a shared buffer, then Postgres makes exactly
the same call, no matter what, to the OS kernel, asking for the data
from disk.  It might happen, however, that the kernel will have the
data in its disk cache.  The total cost of the operation, therefore,
is much lower when the data is in the kernel's disk cache than when it
is actually on the disk.  It is nevertheless still higher, for each
individual fetch, than getting the data from Postgres's own shared
buffer.

So the question is this: where is the "sweet spot" at which the cost
of having Postgres manage the shared buffer is low enough that the
saving from avoiding a system call is worth it?  (As you point out,
this calculation is complicated by the potential to waste memory by
caching the data twice -- once in the shared buffer and once in the
disk cache.  Some systems, like Solaris, allow you to turn off the
disk cache, so the problem may not be one you face.)  The trouble is
that there is no agreement on the answer to that question, and
precious little evidence which seems to settle it.

The only way to determine the best setting, then, is to use your
system with more-or-less simulated production loads, and see where
the best setting lies.  You have to do this over time, because
sometimes inefficiencies turn up only after running for a while.  In
an experiment we tried, we used a 2G shared buffer on a 12G machine.
It looked brilliantly fast at first, but 48 hours later was
_crawling_; that indicates a problem with shared-buffer management on
the part of Postgres, I guess, but it was hard to say more than that.
We ultimately settled on a value somewhere less than 1 G as
appropriate for our use.  But if I had to justify the number I picked
(as opposed to one a few hundred higher or lower), I'd have a tough
time.

> From all the discussion on this topic, it's still not clear to me how
> to calculate what value this should be set at and why.  I've read these
> documents and others and have yet to find explanations and
> recommendations that I can use.

I'm afraid what I'm saying is that it's a bit of a black art.  The
pg_autotune project is an attempt to help make this a little more
scientific.  It relies on pgbench, which has its own problems,
however.

Hope that's helpful, but I fear it doesn't give you the answer you'd
like.

A

--
----
Andrew Sullivan                         204-4141 Yonge Street
Liberty RMS                           Toronto, Ontario Canada
<andrew@libertyrms.info>                              M2P 2A8
                                         +1 416 646 3304 x110


Re: Postgres 7.3.1 poor insert/update/search performance

From
Tom Lane
Date:
Brian Hirt <bhirt@mobygames.com> writes:
> So my question is:  if the kernel is caching all this data, what's the
> benefit of setting this to 1000 or higher?  Why wouldn't I just set it
> to 0 if I believe my kernel is doing a good job?

Well, setting it to 0 won't work ;-).  There's some minimum number of
buffers needed for PG to work at all; depending on complexity of your
queries and number of active backends it's probably around 25-100.
(You'll get "out of buffers" failures if too few.)  But more to the
point, when shared_buffers is too small you'll waste CPU cycles on
unnecessary data transfers between kernel and user memory.  It seems to
be pretty well established that 64 is too small for most applications.
I'm not sure how much is enough, but I suspect that a few thousand is
plenty to get past the knee of the performance curve in most scenarios.

            regards, tom lane

Re: Postgres 7.3.1 poor insert/update/search performance

From
Seth Robertson
Date:
Seth Robertson writes:


    However, it looks like my hope to use arrays is
    doomed; I'm not sure I can handle the performance penalty.

Just in case I have gotten the person who implemented arrays annoyed or
worried: I did not properly modify the "array" test and was vacuuming
the wrong table every 10,000 vectors during the test.  I realized that
this morning and the new array results are listed below.  I also
experimented with batching read operations, and I was surprised to find
that this helps a great deal as well.

----------------------------------------------------------------------
"normal" C libpq                        256 t/s
"normal" Perl DBI                       251 t/s
"DBI Prepared Statement" Perl DBI       254 t/s
"Batching" Perl DBI                     1149 t/s
"arrays" Perl DBI                       250 t/s  (*)
"arrays with batching" Perl DBI         1020 t/s (*)
"server-side function" Perl DBI         84 t/s
"server-side trigger" Perl DBI          84 t/s
"normal" Perl DBI read                  1960 t/s
"batched" Perl DBI read            3076 t/s (*)
"array" Perl DBI read                   1754 t/s (*)
"batched array" Perl DBI read           2702 t/s (*)
"normal" Perl DBI for Oracle            314 t/s
"normal" Perl DBI read for Oracle       395 t/s
----------------------------------------------------------------------
(*) New/updated from this morning

This brings the array code to within 11% of the performance of batched
non-arrays, which is close enough to be an option.  I may well be doing
something wrong with the server-side functions, but I don't see
anything quite so obviously wrong there.
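
(As an illustration of one way to batch the reads -- probing many keys per
round trip with an IN list -- the shape is roughly the sketch below; it is
not the literal test code.  The keys are quoted so the comparisons stay
int8, per Tom's earlier note:)
----------------------------------------------------------------------
# Sketch only: look up many keys per query instead of one at a time.
my @keys;

sub flush_batch
{
  return unless @keys;
  my $in = join(",", map { "'$_'" } @keys);
  my $rows = $dbh->selectall_arrayref("SELECT val, accum FROM test WHERE val IN ($in);");
  $hit += scalar(@$rows);
  $tot += scalar(@keys);
  @keys = ();
}

while (<>)
{
  chomp;
  my @A = split;
  push @keys, $A[0];
  flush_batch() if (@keys >= 1000);
}
flush_batch();
----------------------------------------------------------------------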

                                        -Seth Robertson

Re: Postgres 7.3.1 poor insert/update/search performance

From
Neil Conway
Date:
On Wed, 2003-01-22 at 07:05, Andrew Sullivan wrote:
> (As you point out, this calculation is complicated by the potential to
> waste memory by caching the data twice

If we had a good buffer replacement algorithm (which we currently do
not), ISTM that hot pages retained in PostgreSQL's buffer cache would
never get loaded from the OS's IO cache, thus causing those pages to
eventually be evicted from the OS's cache. So the "cache the data twice"
problem doesn't apply in all circumstances.

> Some systems, like Solaris, allow you to turn off the
> disk cache, so the problem may not be one you face.)

I think it would be interesting to investigate disabling the OS' cache
for all relation I/O (i.e. heap files, index files). That way we could
both improve performance (by moving all the caching into PostgreSQL's
domain, where there is more room for optimization), as well as make
configuration simpler: in an ideal world, it would remove the need to
consider the OS' caching when configuring the amount of shared memory to
allocate to PostgreSQL.

Can this be done using O_DIRECT? If so, is it portable?

BTW, if anyone has any information on actually *using* O_DIRECT, I'd be
interested in it. I tried to quickly hack PostgreSQL to use it, without
success...

Cheers,

Neil
--
Neil Conway <neilc@samurai.com> || PGP Key ID: DB3C29FC




Re: Postgres 7.3.1 poor insert/update/search performance

From
Curt Sampson
Date:
On Fri, 30 Jan 2003, Neil Conway wrote:

> If we had a good buffer replacement algorithm (which we currently do
> not), ISTM that hot pages retained in PostgreSQL's buffer cache would
> never get loaded from the OS's IO cache, thus causing those pages to
> eventually be evicted from the OS's cache. So the "cache the data twice"
> problem doesn't apply in all circumstances.

No, but it does apply to every block at some point: during the
initial load the block is present in both caches, and it has to be
flushed from the OS's cache eventually.

> > Some systems, like Solaris, allow you to turn off the
> > disk cache, so the problem may not be one you face.)
>
> I think it would be interesting to investigate disabling the OS' cache
> for all relation I/O (i.e. heap files, index files). That way we could
> both improve performance (by moving all the caching into PostgreSQL's
> domain, where there is more room for optimization)...

I'm not so sure that there is all that much more room for optimization.
But take a look at what Solaris and FFS do now, and consider how much
work it would be to rewrite it, and then see if you even want to do that
before adding stuff to improve performance.

> , as well as make configuration simpler: in an ideal world, it would
> remove the need to consider the OS' caching when configuring the
> amount of shared memory to allocate to PostgreSQL.

We could do that much more simply by using mmap.

> Can this be done using O_DIRECT?

It can, but you're going to lose some of the advantages that you'd get
from using raw devices instead. In particular, you have no way to know
the physical location of blocks on the disk, because those locations are
often different from the location in the file.

> If so, is it portable?

O_DIRECT is not all that portable, I don't think. Certainly not as
portable as mmap.

cjs
--
Curt Sampson  <cjs@cynic.net>   +81 90 7737 2974   http://www.netbsd.org
    Don't you know, in this new Dark Age, we're all light.  --XTC

Re: Postgres 7.3.1 poor insert/update/search performance

From
Bruce Momjian
Date:
Curt Sampson wrote:
> > > Some systems, like Solaris, allow you to turn off the
> > > disk cache, so the problem may not be one you face.)
> >
> > I think it would be interesting to investigate disabling the OS' cache
> > for all relation I/O (i.e. heap files, index files). That way we could
> > both improve performance (by moving all the caching into PostgreSQL's
> > domain, where there is more room for optimization)...
>
> I'm not so sure that there is all that much more room for optimization.
> But take a look at what Solaris and FFS do now, and consider how much
> work it would be to rewrite it, and then see if you even want to do that
> before adding stuff to improve performance.

We need free-behind for large sequential scans, like Solaris has.  Do we
have LRU-2 or LRU-K now?

> > If so, is it portable?
>
> O_DIRECT is not all that portable, I don't think. Certainly not as
> portable as mmap.

As I remember, DIRECT doesn't return until the data hits the disk
(because there is no OS cache), so if you want to write a page so you
can reuse the buffer, DIRECT would be quite slow.

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073

Re: Postgres 7.3.1 poor insert/update/search performance

From
Neil Conway
Date:
On Sun, 2003-02-02 at 05:39, Bruce Momjian wrote:
> We need free-behind for large sequential scans, like Solaris has.  Do we
> have LRU-2 or LRU-K now?

No.

> As I remember, DIRECT doesn't return until the data hits the disk
> (because there is no OS cache), so if you want to write a page so you
> can reuse the buffer, DIRECT would be quite slow.

Why? If there is a finite amount of memory for doing buffering, the data
needs to be written to disk at *some* point, anyway. And if we didn't
use the OS cache, the size of the PostgreSQL shared buffer would be much
larger (I'd think 80% or more of the physical RAM in a typical high-end
machine, for dedicated PostgreSQL usage).

One possible problem would be the fact that it might mean that writing
out dirty pages would become part of some key code paths in PostgreSQL
(rather than assuming that the OS can write out dirty pages in the
background, as it chooses to). But there are lots of ways to work around
this, notably by using a daemon to periodically write out some of the
pages in the background.

Cheers,

Neil
--
Neil Conway <neilc@samurai.com> || PGP Key ID: DB3C29FC




Re: Postgres 7.3.1 poor insert/update/search performance

From
Curt Sampson
Date:
On Mon, 2 Feb 2003, Neil Conway wrote:

> > As I remember, DIRECT doesn't return until the data hits the disk
> > (because there is no OS cache), so if you want to write a page so you
> > can reuse the buffer, DIRECT would be quite slow.
> ...
> One possible problem would be the fact that it might mean that writing
> out dirty pages would become part of some key code paths in PostgreSQL
> (rather than assuming that the OS can write out dirty pages in the
> background, as it chooses to). But there are lots of ways to work around
> this, notably by using a daemon to periodically write out some of the
> pages in the background.

If you're doing blocking direct I/O, you really have to have and use
what I guess I'd call "scatter-scatter I/O": you need to choose a large
number of blocks scattered over various positions in all your open
files, and be able to request a write for all of them at once.

If you write one by one, with each write going to disk before your
request returns, you're going to be forcing the physical order of the
writes with no knowledge of where the blocks physically reside on the
disk, and you stand a snowball's chance in you-know-where of getting
a write ordering that will maximize your disk throughput.

This is why systems that use direct I/O, for the most part, use a raw
partition and their own "filesystem" as well; you need to know the
physical layout of the blocks to create efficient write strategies.

(MS SQL Server on Windows NT is a notable exception to this. They do,
however, make you pre-create the data file in advance, and they suggest
doing it on an empty partition, which at the very least would get you
long stretches of the file in contiguous order. They may also be using
tricks to make sure the file gets created in contiguous order, or they
may be able to get information from the OS about the physical block
numbers corresponding to logical block numbers in the file.)

cjs
--
Curt Sampson  <cjs@cynic.net>   +81 90 7737 2974   http://www.netbsd.org
    Don't you know, in this new Dark Age, we're all light.  --XTC

Re: Postgres 7.3.1 poor insert/update/search performance

From
Josh Berkus
Date:
Curt,

> (MS SQL Server on Windows NT is a notable exception to this. They do,
> however, make you pre-create the data file in advance, and they suggest
> doing it on an empty partition, which at the very least would get you
> long stretches of the file in contiguous order. They may also be using
> tricks to make sure the file gets created in contiguous order, or they
> may be able to get information from the OS about the physical block
> numbers corresponding to logical block numbers in the file.)

MSSQL is, in fact, doing some kind of direct-block-addressing.   If you
attempt to move a partition on the disk containing MSSQL databases, SQL
Server will crash on restart and be unrecoverable ... even if the other files
on that disk are fine.  Nor can you back up MSSQL by using disk imaging,
unless you can recover to an identical model disk/array.

--
Josh Berkus
Aglio Database Solutions
San Francisco

Re: Postgres 7.3.1 poor insert/update/search performance

From
Bruce Momjian
Date:
Neil Conway wrote:
> On Sun, 2003-02-02 at 05:39, Bruce Momjian wrote:
> > We need free-behind for large sequential scans, like Solaris has.  Do we
> > have LRU-2 or LRU-K now?
>
> No.
>
> > As I remember, DIRECT doesn't return until the data hits the disk
> > (because there is no OS cache), so if you want to write a page so you
> > can reuse the buffer, DIRECT would be quite slow.
>
> Why? If there is a finite amount of memory for doing buffering, the data
> needs to be written to disk at *some* point, anyway. And if we didn't
> use the OS cache, the size of the PostgreSQL shared buffer would be much
> larger (I'd think 80% or more of the physical RAM in a typical high-end
> machine, for dedicated PostgreSQL usage).
>
> One possible problem would be the fact that it might mean that writing
> out dirty pages would become part of some key code paths in PostgreSQL
> (rather than assuming that the OS can write out dirty pages in the
> background, as it chooses to). But there are lots of ways to work around
> this, notably by using a daemon to periodically write out some of the
> pages in the background.

Right.  This is what we _don't_ want to do.  If we need a buffer, we
need it now.  We can't wait for some other process to write the buffer
directly to disk, nor do we want to group the writes somehow.

And the other suggestion, that we have to group writes ourselves, raises
the same issue --- we would be bypassing the kernel buffers, which know
more than we do.  I can see the advantage of preventing double buffering
_quickly_ being overtaken by the extra overhead of direct I/O.

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073