Thread: 8.x index insert performance

8.x index insert performance

From
Kelly Burkhart
Date:
Greetings,

We are running some performance tests in which we are attempting to
insert about 100,000,000 rows in a database at a sustained rate.  About
50M rows in, our performance drops dramatically.

This test is with data that we believe to be close to what we will
encounter in production.  However in tests with purely generated,
sequential data, we did not notice this slowdown.  I'm trying to figure
out what patterns in the "real" data may be causing us problems.

I have log, data, and indexes on separate LUNs on an EMC SAN.  Prior to
slowdown, each partition is writing at a consistent rate.  Index
partition is reading at a much lower rate.  At the time of slowdown,
index partition read rate increases, all write rates decrease.  CPU
utilization drops.

The server is doing nothing aside from running the DB.  It is a dual
opteron (dual core, looks like 4 cpus) with 4GB RAM.  shared_buffers =
32768.  fsync = off.  Postgres version is 8.1.b4.  OS is SuSE Enterprise
server 9.

My leading hypothesis is that one indexed column may be leading to our
issue.  The column in question is a varchar(12) column which is non-null
in about 2% of the rows.  The value of this column is 5 characters which
are the same for every row, followed by a 7 character zero filled base
36 integer.  Thus, every value of this field will be exactly 12 bytes
long, and will be substantially the same down to the last bytes.

Could this pattern be pessimal for a postgresql btree index?  I'm
running a test now to see if I can verify, but my runs take quite a long
time...

If this sounds like an unlikely culprit how can I go about tracking down
the issue?

Thanks,

-K

Re: 8.x index insert performance

From
"Merlin Moncure"
Date:
Kelly wrote:
> We are running some performance tests in which we are attempting to
> insert about 100,000,000 rows in a database at a sustained rate.  About
> 50M rows in, our performance drops dramatically.
>
> This test is with data that we believe to be close to what we will
> encounter in production.  However in tests with purely generated,
> sequential data, we did not notice this slowdown.  I'm trying to figure
> out what patterns in the "real" data may be causing us problems.
>
> I have log, data, and indexes on separate LUNs on an EMC SAN.  Prior to
> slowdown, each partition is writing at a consistent rate.  Index
> partition is reading at a much lower rate.  At the time of slowdown,
> index partition read rate increases, all write rates decrease.  CPU
> utilization drops.
>
> The server is doing nothing aside from running the DB.  It is a dual
> opteron (dual core, looks like 4 cpus) with 4GB RAM.  shared_buffers =
> 32768.  fsync = off.  Postgres version is 8.1.b4.  OS is SuSE Enterprise
> server 9.
>
> My leading hypothesis is that one indexed column may be leading to our
> issue.  The column in question is a varchar(12) column which is non-null
> in about 2% of the rows.  The value of this column is 5 characters which
> are the same for every row, followed by a 7 character zero filled base
> 36 integer.  Thus, every value of this field will be exactly 12 bytes
> long, and will be substantially the same down to the last bytes.
>
> Could this pattern be pessimal for a postgresql btree index?  I'm
> running a test now to see if I can verify, but my runs take quite a long
> time...
>
> If this sounds like an unlikely culprit how can I go about tracking
> down the issue?

well, can you defer index generation until after loading the set (or use
COPY?)

if that index is causing the problem, you may want to consider setting
up partial index to exclude null values.
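
e.g. something like this, with names invented to match your description:

create index sometable_code_idx on sometable (code)
    where code is not null;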

One interesting thing to do would be to run your inserting process until
slowdown happens, stop the process, and reindex the table and then
resume it, and see if this helps.

Merlin




Re: 8.x index insert performance

From
Kelly Burkhart
Date:
On Mon, 2005-10-31 at 12:32 -0500, Merlin Moncure wrote:
> well, can you defer index generation until after loading the set (or use
> COPY?)

I cannot defer index generation.

We are using the copy API.  Copying 10000 rows in a batch.
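
The inner loop is essentially this (a simplified sketch: error handling
and the buffer building are elided, and exit_nicely() is a stand-in):

    /* start the COPY */
    PGresult *res = PQexec(conn, "COPY order_transition FROM stdin");
    if (PQresultStatus(res) != PGRES_COPY_IN)
        exit_nicely(conn);
    PQclear(res);

    /* buf holds ~10000 rows of copy-formatted text */
    if (PQputCopyData(conn, buf, buflen) != 1)
        exit_nicely(conn);
    if (PQputCopyEnd(conn, NULL) != 1)
        exit_nicely(conn);

    /* fetch the final COPY command status */
    res = PQgetResult(conn);
    if (PQresultStatus(res) != PGRES_COMMAND_OK)
        exit_nicely(conn);
    PQclear(res);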

>
> if that index is causing the problem, you may want to consider setting
> up partial index to exclude null values.

This is a single column index.  I assumed that null column values were
not indexed.  Is my assumption incorrect?

-K

Re: 8.x index insert performance

From
Claus Guttesen
Date:
> We are running some performance tests in which we are attempting to
> insert about 100,000,000 rows in a database at a sustained rate.  About
> 50M rows in, our performance drops dramatically.
>
> This test is with data that we believe to be close to what we will
> encounter in production.  However in tests with purely generated,
> sequential data, we did not notice this slowdown.  I'm trying to figure
> out what patterns in the "real" data may be causing us problems.
>
> I have log, data, and indexes on separate LUNs on an EMC SAN.  Prior to
> slowdown, each partition is writing at a consistent rate.  Index
> partition is reading at a much lower rate.  At the time of slowdown,
> index partition read rate increases, all write rates decrease.  CPU
> utilization drops.

I'm doing some test inserts (albeit with far fewer records) into
8.0.4 (on FreeBSD 6.0 RC1), and the import time decreased three-fold
when I increased the values mentioned below:

shared_buffers = 8192
commit_delay = 100000
commit_siblings = 1000

When I increased shared_buffers the kernel needed minor tweaking.
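
On FreeBSD that means raising the SysV shared memory limits, e.g. in
/etc/sysctl.conf (illustrative values, sized for 8192 8kB buffers plus
some slack):

kern.ipc.shmmax=134217728
kern.ipc.shmall=32768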

regards
Claus

Re: 8.x index insert performance

From
"Merlin Moncure"
Date:
> > if that index is causing the problem, you may want to consider setting
> > up partial index to exclude null values.
>
> This is a single column index.  I assumed that null column values were
> not indexed.  Is my assumption incorrect?
>
> -K
It seems your assumption is right, or it certainly appears to be.  I
didn't know that :).  So a partial index will probably not help for
null exclusion...

It would be interesting to see if you are getting spills to temp files
(check pgsql_tmp) when performance breaks down.  That is an easy fix:
bump work_mem.
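
To check, look for spill files under the data directory (one pgsql_tmp
directory per database):

ls -l $PGDATA/base/*/pgsql_tmp

If files are accumulating there, raise work_mem (in 8.1 the unit is KB),
in postgresql.conf or per session:

set work_mem = 65536;  -- 64MB per sort/hash, per backend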

Merlin

Re: 8.x index insert performance

From
mark@mark.mielke.cc
Date:
On Mon, Oct 31, 2005 at 12:32:03PM -0500, Merlin Moncure wrote:
> if that index is causing the problem, you may want to consider setting
> up partial index to exclude null values.

Hey all.

Pardon my ignorance. :-)

I've been trying to figure out whether null values are indexed or not from
the documentation.  I was under the impression that null values are not
stored in the index.  Occasionally, though, I see a suggestion such as the
above, which seems to indicate that null values *are* stored in the index,
allowing the 'exclude null values' suggestion to have an effect?

Which is it? :-)

Thanks,
mark



Re: 8.x index insert performance

From
"Merlin Moncure"
Date:
> On Mon, Oct 31, 2005 at 12:32:03PM -0500, Merlin Moncure wrote:
> > if that index is causing the problem, you may want to consider setting
> > up partial index to exclude null values.
>
> Hey all.
>
> Pardon my ignorance. :-)
>
> I've been trying to figure out whether null values are indexed or not from
> the documentation.  I was under the impression that null values are not
> stored in the index.  Occasionally, though, I see a suggestion such as the
> above, which seems to indicate that null values *are* stored in the index,
> allowing the 'exclude null values' suggestion to have an effect?
>
> Which is it? :-)

I think I'm the ignorant one... run EXPLAIN on any lookup on an indexed
field where the field value is null, and you get a seqscan.

Merlin

Re: 8.x index insert performance

From
Tom Lane
Date:
mark@mark.mielke.cc writes:
> I've been trying to figure out whether null values are indexed or not from
> the documentation. I was under the impression, that null values are not
> stored in the index.

You're mistaken, at least with regard to btree indexes.

            regards, tom lane

Re: 8.x index insert performance

From
Kelly Burkhart
Date:
On Mon, 2005-10-31 at 15:30 -0500, Tom Lane wrote:
> mark@mark.mielke.cc writes:
> > I've been trying to figure out whether null values are indexed or not from
> > the documentation. I was under the impression, that null values are not
> > stored in the index.
>
> You're mistaken, at least with regard to btree indexes.

Ha!  So I'm creating an index 98% full of nulls!  Looks like this is
easily fixed with partial indexes.

-K

Re: 8.x index insert performance

From
"Merlin Moncure"
Date:
> mark@mark.mielke.cc writes:
> > I've been trying to figure out whether null values are indexed or not from
> > the documentation.  I was under the impression that null values are not
> > stored in the index.
>
> You're mistaken, at least with regard to btree indexes.

hmm. I tried several different ways to filter/extract null values from
an indexed key and got a seq scan every time.  The only way I could
query for/against null values was to convert to bool via function.

However, I did create a partial index excluding nulls on a really big
table that is only 1% non-null, and the index size dropped as expected.

Merlin

Re: 8.x index insert performance

From
Scott Marlowe
Date:
On Mon, 2005-10-31 at 13:13, Merlin Moncure wrote:
> > > if that index is causing the problem, you may want to consider setting
> > > up partial index to exclude null values.
> >
> > This is a single column index.  I assumed that null column values were
> > not indexed.  Is my assumption incorrect?
> >
> > -K
> It seems your assumption is right, or it certainly appears to be.  I
> didn't know that :).  So a partial index will probably not help for
> null exclusion...
>
> It would be interesting to see if you are getting spills to temp files
> (check pgsql_tmp) when performance breaks down.  That is an easy fix:
> bump work_mem.

OK, here's the issue in a nutshell.

NULLs, like everything else, are indexed.  HOWEVER, there's no way for
them to be used by a normal query, since =NULL is not a legal
construct.  So, you can't do something like:

select * from sometable where somefield = NULL

because you won't get any answers, since nothing can equal NULL.  And

select * from sometable where somefield IS NULL

won't work because IS is not a normally indexable operator.

Which is why you can create two indexes on a table to get around this,
like so:

create index iname1 on sometable (somefield) where somefield IS NULL;

and

create index iname2 on sometable (somefield) where somefield IS NOT NULL;

And then the nulls are indexable by IS / IS NOT NULL.

Re: 8.x index insert performance

From
Tom Lane
Date:
"Merlin Moncure" <merlin.moncure@rcsonline.com> writes:
>> You're mistaken, at least with regard to btree indexes.

> hmm. I tried several different ways to filter/extract null values from
> an indexed key and got a seq scan every time.

I said they were stored, not that you could query against them ;-)
IS NULL isn't considered an indexable operator, mainly because it's
not an operator at all in the strict sense of the word; and our index
access APIs only support querying on indexable operators.

The reason they're stored is that they have to be in order to make
multi-column indexes work right.  I suppose we could special-case
single-column indexes, but we don't.  In any case, it's more likely
that someone would one day get around to making IS NULL an indexable
operator than that we'd insert a special case like that.

            regards, tom lane

Re: 8.x index insert performance

From
"Merlin Moncure"
Date:
> select * from sometable where somefield IS NULL won't work because IS is
> not a normally indexable operator.

Ah, I didn't know that.  So there is no real reason not to exclude null
values from all your indexes :).  Reading Tom's recent comments
everything is clear now.

Instead of using your two index approach I prefer to:

create function nullidx(anyelement) returns boolean as $$
    select $1 is null;
$$ language sql immutable;

create index t_nullidx on t (nullidx(f)); -- etc
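
To search on it, write the clause as an equality against the same
expression so it can match the index (an untested sketch):

select * from t where nullidx(f) = true;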

Merlin

Re: 8.x index insert performance

From
Tom Lane
Date:
Kelly Burkhart <kelly@tradebotsystems.com> writes:
> Ha!  So I'm creating an index 98% full of nulls!  Looks like this is
> easily fixed with partial indexes.

Still, though, it's not immediately clear why you'd be seeing a severe
dropoff in insert performance after 50M rows.  Even though there are
lots of nulls, I don't see why they'd behave any worse for insert speed
than real data.  One would like to think that the insert speed would
follow a nice O(log N) rule.

Are you doing the inserts all in one transaction, or several?  If
several, could you get a gprof profile of inserting the same number of
rows (say a million or so) both before and after the unexpected dropoff
occurs?

            regards, tom lane

Re: 8.x index insert performance

From
mark@mark.mielke.cc
Date:
On Mon, Oct 31, 2005 at 03:27:31PM -0500, Merlin Moncure wrote:
> > On Mon, Oct 31, 2005 at 12:32:03PM -0500, Merlin Moncure wrote:
> > > if that index is causing the problem, you may want to consider setting
> > > up partial index to exclude null values.
> > Hey all.
> > Pardon my ignorance. :-)
> > I've been trying to figure out whether null values are indexed or not from
> > the documentation. I was under the impression, that null values are not
> > stored in the index. Occassionally, though, I then see a suggestion such
> > as the above, that seems to indicate to me that null values *are* stored
> > in the index, allowing for the 'exclude null values' to have effect?
> > Which is it? :-)
> I think I'm the ignorant one...do explain on any lookup on an indexed
> field where the field value is null and you get a seqscan.

Nahhh... I think the documentation could use a more explicit or obvious
explanation.  Or, I could have checked the source code to see.  In any
case, I expect we aren't the only ones who lacked confidence.

Tom was kind enough to point out that null values are stored. I expect
that the seqscan is used if the null values are not selective enough,
the same as any other value that isn't selective enough.

Now we can both have a little more confidence! :-)

Cheers,
mark



Re: 8.x index insert performance

From
Kelly Burkhart
Date:
On Mon, 2005-10-31 at 16:18 -0500, Tom Lane wrote:
> Kelly Burkhart <kelly@tradebotsystems.com> writes:
> > Ha!  So I'm creating an index 98% full of nulls!  Looks like this is
> > easily fixed with partial indexes.
>
> Still, though, it's not immediately clear why you'd be seeing a severe
> dropoff in insert performance after 50M rows.  Even though there are
> lots of nulls, I don't see why they'd behave any worse for insert speed
> than real data.  One would like to think that the insert speed would
> follow a nice O(log N) rule.
>
> Are you doing the inserts all in one transaction, or several?  If
> several, could you get a gprof profile of inserting the same number of
> rows (say a million or so) both before and after the unexpected dropoff
> occurs?

I'm doing the inserts via libpq copy.  Commits are in batches of approx
15000 rows.  I did a run last night after modifying the indexes and saw
the same pattern.  I'm dumping the database now and will modify my test
program to copy data from the dump rather than purely generated data.
Hopefully, this will allow me to reproduce the problem in a way that
takes less time to set up and run.

Tom, I'd be happy to profile the backend at several points in the run if
you think that would be helpful.  What compiler flags should I use?
Current settings in Makefile.global are:

CFLAGS = -O2 -Wall -Wmissing-prototypes -Wpointer-arith -Winline
-Wendif-labels -fno-strict-aliasing

Should I change this to:

CFLAGS = -g -pg -Wall ...

Or should I leave the -O2 in?

It may be the weekend by the time I get this done.

-K

Re: 8.x index insert performance

From
Tom Lane
Date:
Kelly Burkhart <kelly@tradebotsystems.com> writes:
> Tom, I'd be happy to profile the backend at several points in the run if
> you think that would be helpful.  What compiler flags should I use?

Add -g -pg and leave the rest alone.  Also, if you're on Linux note that
you need -DLINUX_PROFILE.
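
In Makefile.global terms (keeping the existing flags), that would look
like:

CFLAGS = -O2 -g -pg -DLINUX_PROFILE -Wall -Wmissing-prototypes \
    -Wpointer-arith -Winline -Wendif-labels -fno-strict-aliasing

Each profiled backend writes a gmon.out into its current directory (the
data directory); feed that to gprof against the postgres binary, with
illustrative paths:

gprof /usr/local/pgsql/bin/postgres $PGDATA/gmon.out > gprof.txt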

> It may be the weekend by the time I get this done.

Well, it's probably too late to think of tweaking 8.1 anyway...

            regards, tom lane

Re: 8.x index insert performance

From
Ron Peacetree
Date:
I'm surprised that no one seems to have yet suggested the following
simple experiment:

Increase the RAM 4GB -> 8GB, tune for best performance, and
repeat your 100M row insert experiment.

Does overall insert performance change?  Does the performance
drop <foo> rows in still occur?  Does it occur in ~ the same place?
Etc.

If the effect does seem to be sensitive to the amount of RAM in the
server, it might be worth redoing the experiment(s) with 2GB and
16GB as well...

ron



Re: 8.x index insert performance

From
Greg Stark
Date:
"Merlin Moncure" <merlin.moncure@rcsonline.com> writes:

> > select * from sometable where somefield IS NULL won't work because IS is
> > not a normally indexable operator.
>
> Ah, I didn't know that.  So there is no real reason not to exclude null
> values from all your indexes :).  Reading Tom's recent comments
> everything is clear now.

There are other reasons.  If you want a query like

  SELECT * FROM tab ORDER BY col LIMIT 10

to use an index on col, then it can't exclude NULLs or else it wouldn't be
useful.  (Oracle actually has this problem; you frequently have to add
"WHERE col IS NOT NULL" in order to let it use an index.)
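
So if you do exclude the nulls, the same workaround applies on the
Postgres side too, e.g. (illustrative):

  create index tab_col_idx on tab (col) where col is not null;

  SELECT * FROM tab WHERE col IS NOT NULL ORDER BY col LIMIT 10;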


--
greg

Re: 8.x index insert performance

From
Kelly Burkhart
Date:
On Tue, 2005-11-01 at 10:37 -0500, Ron Peacetree wrote:
> I'm surprised that no one seems to have yet suggested the following
> simple experiment:
>
> Increase the RAM 4GB -> 8GB, tune for best performance, and
> repeat your 100M row insert experiment.
>
> Does overall insert performance change?  Does the performance
> drop <foo> rows in still occur?  Does it occur in ~ the same place?
> Etc.
>
> If the effect does seem to be sensitive to the amount of RAM in the
> server, it might be worth redoing the experiment(s) with 2GB and
> 16GB as well...

Ron,

I would like to try this, however, since I'm sitting about 1000 miles
away from the server, tweaking things is not as simple as one might
hope.  I would also like to understand what is going on before I start
changing things.  If I can't get a satisfactory explanation for what I'm
seeing with current hardware, I'll have memory added and see what
happens.

-K

Re: 8.x index insert performance

From
Kelly Burkhart
Date:
Second try... no attachment this time.

I've finally gotten around to profiling the back end.  Here is a more
precise description of what I'm doing:

I am copying data into two tables, order_main and order_transition
(table defs at the end of this post).  The order_transition table has
roughly double the number of rows as the order_main table.

My program is a C program using the libpq copy api which effectively
simulates our real application.  It reads data from two data files, and
appends copy-formatted data into two in-memory buffers.  After 10,000
order_transitions, it copies the order_main data, then the
order_transition data, then commits.  The test program is running on a
different machine than the DB.

After each batch it writes a record to stdout with the amount of time it
took to copy and commit the data (time only includes pg time, not the
time it took to build the buffers).  A graph showing the performance
characteristics is here:

<http://kkcsm.net/pgcpy.jpg>

The horizontal axis is number of transitions * 10000 that have been
written.  The vertical axis is time in milliseconds to copy and commit
the data.  The commit time is very consistent up until about 60,000,000
rows, then performance drops and times become much less consistent.

I profiled the backend at three points, on batches 4, 6042 and 6067.
The first is right after start, the second is right before we hit the
wall, and the third is one of the initial slow batches.

I'm including inline the first 20 lines of gprof output for each batch.
Please let me know if this is insufficient.  I'll supply any necessary
further info.

Since this thread is stale, I'll repeat relevant hardware/software
stats:  server is a dual, dual-core opteron with 4GB RAM.  Disk is an
EMC Symmetrix connected via FC.  Data, index, logs on three separate
LUNS.  OS is SuSE Enterprise 9.  Postgres version is 8.1.b4.
shared_buffers=32768, fsync=off.

Thanks in advance for your help.

-K

---------------------------
> head -n 20 gprof.txt.4.777.47
Flat profile:

Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total
 time   seconds   seconds    calls   s/call   s/call  name
 10.92      0.38     0.38    55027     0.00     0.00  XLogInsert
  6.90      0.62     0.24   702994     0.00     0.00  _bt_compare
  5.46      0.81     0.19        2     0.10     1.64  DoCopy
  4.60      0.97     0.16    16077     0.00     0.00  CopyReadLine
  3.74      1.10     0.13   484243     0.00     0.00  bttextcmp
  2.87      1.20     0.10    93640     0.00     0.00  _bt_binsrch
  2.59      1.29     0.09   484243     0.00     0.00  varstr_cmp
  2.59      1.38     0.09   364292     0.00     0.00  LWLockRelease
  2.30      1.46     0.08   703394     0.00     0.00  FunctionCall2
  2.01      1.53     0.07   138025     0.00     0.00  hash_any
  2.01      1.60     0.07   133176     0.00     0.00  ReadBuffer
  2.01      1.67     0.07   364110     0.00     0.00  LWLockAcquire
  2.01      1.74     0.07   132563     0.00     0.00  PinBuffer
  1.72      1.80     0.06    38950     0.00     0.00  _bt_insertonpg
  1.72      1.86     0.06    38767     0.00     0.00  _bt_mkscankey

---------------------------
> head -n 20 gprof.txt.6042.1344.84
Flat profile:

Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total
 time   seconds   seconds    calls   s/call   s/call  name
  9.67      0.52     0.52    50431     0.00     0.00  XLogInsert
  7.71      0.94     0.42  1045427     0.00     0.00  _bt_compare
  5.95      1.26     0.32   713392     0.00     0.00  bttextcmp
  4.28      1.49     0.23  1045814     0.00     0.00  FunctionCall2
  3.35      1.67     0.18   155756     0.00     0.00  _bt_binsrch
  2.60      1.81     0.14   713392     0.00     0.00  varstr_cmp
  2.60      1.95     0.14   475524     0.00     0.00  LWLockAcquire
  2.60      2.09     0.14   191837     0.00     0.00  ReadBuffer
  2.60      2.23     0.14        2     0.07     2.52  DoCopy
  2.60      2.37     0.14   197393     0.00     0.00  hash_search
  2.60      2.51     0.14   197205     0.00     0.00  hash_any
  2.23      2.63     0.12   190481     0.00     0.00  PinBuffer
  2.04      2.74     0.11   345866     0.00     0.00  AllocSetAlloc
  1.86      2.84     0.10   475788     0.00     0.00  LWLockRelease
  1.86      2.94     0.10    29620     0.00     0.00  pg_localtime

---------------------------
> head -n 20 gprof.txt.6067.9883.31
Flat profile:

Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total
 time   seconds   seconds    calls   s/call   s/call  name
 17.17      1.14     1.14    51231     0.00     0.00  XLogInsert
 10.82      1.85     0.72  1065556     0.00     0.00  _bt_compare
  4.77      2.17     0.32   158378     0.00     0.00  _bt_binsrch
  3.18      2.38     0.21   202921     0.00     0.00  hash_search
  3.18      2.59     0.21   742891     0.00     0.00  bttextcmp
  2.87      2.78     0.19  1485787     0.00     0.00  pg_detoast_datum
  2.87      2.97     0.19  1065325     0.00     0.00  FunctionCall2
  2.65      3.14     0.18   490373     0.00     0.00  LWLockAcquire
  2.27      3.29     0.15        2     0.08     3.08  DoCopy
  2.27      3.44     0.15   490908     0.00     0.00  LWLockRelease
  1.97      3.57     0.13   195049     0.00     0.00  ReadBuffer
  1.97      3.70     0.13   742891     0.00     0.00  varstr_cmp
  1.66      3.81     0.11   462134     0.00     0.00  LockBuffer
  1.51      3.91     0.10   191345     0.00     0.00  PinBuffer
  1.51      4.01     0.10   195049     0.00     0.00  UnpinBuffer

---------------------------
create table order_main (
    ord_id varchar(12) not null,
    firm_id varchar not null,
    firm_sub_id varchar not null,
    cl_ord_id varchar not null,
    clearing_firm varchar not null,
    clearing_account varchar not null,
    symbol varchar not null,
    side varchar(1) not null,
    size integer not null,
    price numeric(10,4) not null,
    expire_time timestamp with time zone,
    flags varchar(7) not null
);

create unique index order_main_pk on order_main (
    ord_id
) tablespace idx_space;

create index order_main_ak1 on order_main (
    cl_ord_id
) tablespace idx_space;


create table order_transition (
    collating_seq bigint not null,
    ord_id varchar(12) not null,
    cl_ord_id varchar,
    sending_time timestamp with time zone not null,
    transact_time timestamp with time zone not null,
    flags varchar(6) not null,
    exec_id varchar(12),
    size integer,
    price numeric(10,4),
    remainder integer,
    contra varchar
);

create unique index order_transition_pk on order_transition (
    collating_seq
) tablespace idx_space;

create index order_transition_ak1 on order_transition (
    ord_id
) tablespace idx_space;

create index order_transition_ak2 on order_transition (
    cl_ord_id
)
tablespace idx_space
where cl_ord_id is not null;

create index order_transition_ak3 on order_transition (
    exec_id
)
tablespace idx_space
where exec_id is not null;



Re: 8.x index insert performance

From
Tom Lane
Date:
Kelly Burkhart <kelly@tradebotsystems.com> writes:
> I've finally gotten around to profiling the back end.

Thanks for following up.

The sudden appearance of pg_detoast_datum() in the top ten in the third
profile is suspicious.  I wouldn't expect that to get called at all,
really, during a normal COPY IN process.  The only way I can imagine it
getting called is if you have index entries that require toasting, which
seems a bit unlikely to start happening only after 60 million rows.
Is it possible that the index keys are getting longer and longer as your
test run proceeds?

Could you send me (off list) the complete gprof output files?

            regards, tom lane

Re: 8.x index insert performance

From
Tom Lane
Date:
Kelly Burkhart <kelly@tradebotsystems.com> writes:
> On Thu, 2005-11-10 at 17:18 -0500, Tom Lane wrote:
>> Could you send me (off list) the complete gprof output files?

> Sure,

Thanks.  Right offhand I can see no smoking gun here.  The
pg_detoast_datum entry I was worried about seems to be just measurement
noise --- the gprof trace shows that it's called a proportional number
of times in both cases, and it falls through without actually doing
anything in all cases.

The later trace involves a slightly larger amount of time spent
inserting into the indexes, which is what you'd expect as the indexes
get bigger, but it doesn't seem that CPU time per se is the issue.
The just-before-the-cliff trace shows total CPU of 5.38 sec and the
after-the-cliff one shows 6.61 sec.

What I now suspect is happening is that you "hit the wall" at the point
where the indexes no longer fit into main memory and it starts taking
significant I/O to search and update them.  Have you tried watching
iostat or vmstat output to see if there's a noticeable increase in I/O
at the point where things slow down?  Can you check the physical size of
the indexes at that point, and see if it seems related to your available
RAM?
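
In 8.1 the sizes can be checked from SQL; a sketch (adjust the name
pattern as needed):

select relname, pg_size_pretty(pg_relation_size(oid))
from pg_class
where relname like 'order_%'
order by pg_relation_size(oid) desc;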

If that is the correct explanation, then the only solutions I can see
are (1) buy more RAM or (2) avoid doing incremental index updates;
that is, drop the indexes before bulk load and rebuild them afterwards.

One point to consider is that an index will be randomly accessed only
if its data is being loaded in random order.  If you're loading keys in
sequential order then only the "right-hand edge" of the index would get
touched, and it wouldn't need much RAM.  So, depending on what order
you're loading data in, the primary key index may not be contributing
to the problem.

            regards, tom lane

Re: 8.x index insert performance

From
Tom Lane
Date:
Kelly Burkhart <kelly@tradebotsystems.com> writes:
> ...  A graph showing the performance
> characteristics is here:

> <http://kkcsm.net/pgcpy.jpg>

I hadn't looked at this chart till just now, but it sure seems to put a
crimp in my theory that you are running out of room to hold the indexes
in RAM.  That theory would predict that once you fall over the knee of
the curve, performance would get steadily worse; instead it gets
markedly worse and then improves a bit.  And there's another cycle of
worse-and-better around 80M rows.  I have *no* idea what's up with that.
Anyone?  Kelly, could there be any patterns in the data that might be
related?

The narrow spikes look like they are probably induced by checkpoints.
You could check that by seeing if their spacing changes when you alter
checkpoint_segments and checkpoint_timeout.  It might also be
entertaining to make the bgwriter parameters more aggressive to see
if you can ameliorate the spikes.
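
For reference, the 8.1 parameters in question, with their defaults (the
values below are just the defaults, not recommendations):

checkpoint_segments = 3        # WAL segments between checkpoints
checkpoint_timeout = 300       # seconds
bgwriter_delay = 200           # milliseconds between bgwriter rounds
bgwriter_lru_percent = 1.0
bgwriter_lru_maxpages = 5
bgwriter_all_percent = 0.333
bgwriter_all_maxpages = 5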

            regards, tom lane

Re: 8.x index insert performance

From
Kelly Burkhart
Date:
On Thu, 2005-11-10 at 19:13 -0500, Tom Lane wrote:
> Kelly Burkhart <kelly@tradebotsystems.com> writes:
> > ...  A graph showing the performance
> > characteristics is here:
>
> > <http://kkcsm.net/pgcpy.jpg>
>
> I hadn't looked at this chart till just now, but it sure seems to put a
> crimp in my theory that you are running out of room to hold the indexes
> in RAM.  That theory would predict that once you fall over the knee of
> the curve, performance would get steadily worse; instead it gets
> markedly worse and then improves a bit.  And there's another cycle of
> worse-and-better around 80M rows.  I have *no* idea what's up with that.
> Anyone?  Kelly, could there be any patterns in the data that might be
> related?

I modified my original program to insert generated, sequential data.
The following graph shows the results to be flat:

<http://kkcsm.net/pgcpy_20051111_1.jpg>

Thus, hardware is sufficient to handle predictably sequential data.
There very well could be a pattern in the data which could affect
things, however, I'm not sure how to identify it in 100K rows out of
100M.

If I could identify a pattern, what could I do about it?  Could I do
some kind of a reversible transform on the data?  Is it better to insert
nearly random values?  Or nearly sequential?


I now have an 8G and a 16G machine I'm loading the data into.  I'll
report back after that's done.

I also want to try eliminating the order_main table, moving fields to
the transition table.  This will reduce the number of index updates
significantly at the cost of some wasted space in the table...

-K

Re: 8.x index insert performance

From
"Kevin Grittner"
Date:
That sure seems to bolster the theory that performance is degrading
because you exhaust the cache space and need to start reading
index pages.  When inserting sequential data, you don't need to
randomly access pages all over the index tree.

-Kevin


>>> Kelly Burkhart <kelly@tradebotsystems.com>  >>>

I modified my original program to insert generated, sequential data.
The following graph shows the results to be flat:

<http://kkcsm.net/pgcpy_20051111_1.jpg>


Re: 8.x index insert performance

From
Tom Lane
Date:
Kelly Burkhart <kelly@tradebotsystems.com> writes:
> On Thu, 2005-11-10 at 19:13 -0500, Tom Lane wrote:
>> Kelly, could there be any patterns in the data that might be
>> related?

> I modified my original program to insert generated, sequential data.
> The following graph shows the results to be flat:
> <http://kkcsm.net/pgcpy_20051111_1.jpg>
> Thus, hardware is sufficient to handle predictably sequential data.

Yeah, inserting sequentially increasing data would only ever touch the
right-hand edge of the btree, so memory requirements would be pretty low
and constant.

> There very well could be a pattern in the data which could affect
> things, however, I'm not sure how to identify it in 100K rows out of
> 100M.

I conjecture that the problem areas represent places where the key
sequence is significantly "more random" than it is elsewhere.  Hard
to be more specific than that though.

            regards, tom lane

Re: 8.x index insert performance

From
Kelly Burkhart
Date:
On Fri, 2005-11-11 at 18:02 -0500, Tom Lane wrote:
> > There very well could be a pattern in the data which could affect
> > things, however, I'm not sure how to identify it in 100K rows out of
> > 100M.
>
> I conjecture that the problem areas represent places where the key
> sequence is significantly "more random" than it is elsewhere.  Hard
> to be more specific than that though.
>

OK, I understand the pattern now.

My two tables hold orders, and order state transitions.  Most orders
have two transitions: creation and termination.  The problem happens
when there is a significant number of orders where termination is
happening a long time after creation, causing order_transition rows with
old ord_id values to be inserted.

This is valid, so I have to figure out a way to accommodate it.

You mentioned playing with checkpointing and bgwriter earlier in this
thread.  I experimented with the bgwriter through the weekend, but I
don't have a good idea what sensible parameter changes are...

Re: checkpointing, currently my checkpoints are happening every 5
minutes (if I turn on fsync, the graph shows checkpoints dramatically).
If I increase the checkpoint_timeout, could that be beneficial?  Or
would I just have more time between larger spikes?

-K

Re: 8.x index insert performance

From
Ron
Date:
At 09:43 AM 11/14/2005, Kelly Burkhart wrote:
>On Fri, 2005-11-11 at 18:02 -0500, Tom Lane wrote:
> > > There very well could be a pattern in the data which could affect
> > > things, however, I'm not sure how to identify it in 100K rows out of
> > > 100M.
> >
> > I conjecture that the problem areas represent places where the key
> > sequence is significantly "more random" than it is elsewhere.  Hard
> > to be more specific than that though.
> >
>
>OK, I understand the pattern now.
>
>My two tables hold orders, and order state transitions.  Most orders
>have two transitions: creation and termination.  The problem happens
>when there is a significant number of orders where termination is
>happening a long time after creation, causing order_transition rows with
>old ord_id values to be inserted.
>
>This is valid, so I have to figure out a way to accommodate it.
Perhaps a small schema change would help?  Instead of having the
order state transitions explicitly listed in the table, why not
create two new tables: one for created orders and one for terminated
orders.  When an order is created, its ord_id goes into the
CreatedOrders table.  When an order is terminated, its ord_id is
added to the TerminatedOrders table and then deleted from the
CreatedOrders table.
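
A minimal sketch of that shape (column definitions and the example key
are illustrative):

create table created_orders (
    ord_id varchar(12) primary key,
    created timestamp with time zone not null
);

create table terminated_orders (
    ord_id varchar(12) primary key,
    terminated timestamp with time zone not null
);

-- on termination, move the order in one transaction:
begin;
insert into terminated_orders values ('ABCDE0000001', now());
delete from created_orders where ord_id = 'ABCDE0000001';
commit;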

Downsides to this approach are some extra complexity, and that system
disaster recovery will have to make sure that no ord_id appears in both
the CreatedOrders and TerminatedOrders tables.  Upsides are that the
insert problem goes away, and certain kinds of accounting and inventory
reports are now easier to create.

Ron