Thread: day 2 results

day 2 results

From
Paul A Vixie
Date:
the prior results (http://www.vix.com/~vixie/pgsql-results.png) showed:
~70ms usual INSERT time (~1.5sec -> ~1.25sec occasional)~250ms usual SELECT time (~1.5sec occasional)

changing the attribute i key by to be PRIMARY KEY improved things a lot;
the new results (http://www.vix.com/~vixie/pgsql-indexed.png) show:
~80ms usual INSERT time (~1.28sec -> ~1.18sec occasional)~100ms usual SELECT time (~1.18sec occasional)

VACUUM ANALYZE after the INSERTs made no performance difference at all,
which is good since no other modern database requires anything to be done
to improve performance after a large number of INSERTs.  (i can understand
why COPY would need it, but not INSERT.)

the occasional 1.2sec has got to be due to some kind of scheduling or I/O
irregularity.  i'm going to try it on a 500MB "MFS partition" next.  it
turns out that MAPS RSS could actually live with "occasional 1.2sec" but
i want to make sure that its cause isn't trivial or my-stupidity-related.

just to let everybody know where i'm at with this.  and-- THANKS for pgsql!


Re: day 2 results

From
Tom Lane
Date:
Paul A Vixie <vixie@mfnx.net> writes:
> the occasional 1.2sec has got to be due to some kind of scheduling or I/O
> irregularity.

Hmm, could it just be delay when your syncer process runs?  Under WAL,
I believe we don't fsync anything except the WAL log file, so a bulk
insert operation would probably create lots and lots of dirty kernel
buffers that syncer would decide to shove out to disk every 30 sec or
so.  Is there any way to correlate the timing spikes against syncer
activity?
        regards, tom lane


Re: day 2 results

From
Thomas Lockhart
Date:
> VACUUM ANALYZE after the INSERTs made no performance difference at all,
> which is good since no other modern database requires anything to be done
> to improve performance after a large number of INSERTs.  (i can understand
> why COPY would need it, but not INSERT.)

afaik every modern database requires something like this to update
optimizer stats, since on-the-fly stats accumulation can be expensive
and inaccurate. But most of my recent experience has been with
PostgreSQL and perhaps some other DBs have added some hacks to get
around this. Of course, some databases advertised as modern don't do
much optimization, so don't need the stats.

The biggest effect seen is when growing from an empty database, when the
stats would be changing the most. Once populated, the stats update
usually has little effect, but shouldn't be ignored forever.

Glad to see the tests are going well...
                    - Thomas

btw, I'll guess that "no other" and "every" could both be
overstatements, but it sure makes a better sentence, eh? ;)