Thread: 8.x index insert performance
Greetings,

We are running some performance tests in which we are attempting to insert about 100,000,000 rows in a database at a sustained rate. About 50M rows in, our performance drops dramatically.

This test is with data that we believe to be close to what we will encounter in production. However in tests with purely generated, sequential data, we did not notice this slowdown. I'm trying to figure out what patterns in the "real" data may be causing us problems.

I have log, data and indexes on separate LUNs on an EMC SAN. Prior to slowdown, each partition is writing at a consistent rate. Index partition is reading at a much lower rate. At the time of slowdown, index partition read rate increases, all write rates decrease. CPU utilization drops.

The server is doing nothing aside from running the DB. It is a dual opteron (dual core, looks like 4 cpus) with 4GB RAM. shared_buffers = 32768. fsync = off. Postgres version is 8.1.b4. OS is SuSE Enterprise server 9.

My leading hypothesis is that one indexed column may be leading to our issue. The column in question is a varchar(12) column which is non-null in about 2% of the rows. The value of this column is 5 characters which are the same for every row, followed by a 7 character zero filled base 36 integer. Thus, every value of this field will be exactly 12 bytes long, and will be substantially the same down to the last bytes.

Could this pattern be pessimal for a postgresql btree index? I'm running a test now to see if I can verify, but my runs take quite a long time...

If this sounds like an unlikely culprit, how can I go about tracking down the issue?

Thanks,
-K
Kelly wrote:
> We are running some performance tests in which we are attempting to insert about 100,000,000 rows in a database at a sustained rate. About 50M rows in, our performance drops dramatically.
>
> This test is with data that we believe to be close to what we will encounter in production. However in tests with purely generated, sequential data, we did not notice this slowdown. I'm trying to figure out what patterns in the "real" data may be causing us problems.
>
> I have log, data and indexes on separate LUNs on an EMC SAN. Prior to slowdown, each partition is writing at a consistent rate. Index partition is reading at a much lower rate. At the time of slowdown, index partition read rate increases, all write rates decrease. CPU utilization drops.
>
> The server is doing nothing aside from running the DB. It is a dual opteron (dual core, looks like 4 cpus) with 4GB RAM. shared_buffers = 32768. fsync = off. Postgres version is 8.1.b4. OS is SuSE Enterprise server 9.
>
> My leading hypothesis is that one indexed column may be leading to our issue. The column in question is a varchar(12) column which is non-null in about 2% of the rows. The value of this column is 5 characters which are the same for every row, followed by a 7 character zero filled base 36 integer. Thus, every value of this field will be exactly 12 bytes long, and will be substantially the same down to the last bytes.
>
> Could this pattern be pessimal for a postgresql btree index? I'm running a test now to see if I can verify, but my runs take quite a long time...
>
> If this sounds like an unlikely culprit, how can I go about tracking down the issue?

Well, can you defer index generation until after loading the set (or use COPY)?

If that index is causing the problem, you may want to consider setting up a partial index to exclude null values.

One interesting thing to do would be to run your inserting process until the slowdown happens, stop the process, reindex the table, and then resume it, and see if this helps.

Merlin
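For illustration, the kind of partial index and reindex step being suggested here might look like the following; the table and column names are placeholders, not the poster's actual schema:

create index orders_alt_id_idx on orders (alt_ord_id)
    where alt_ord_id is not null;   -- index only the ~2% of rows that have a value

-- rebuild an index that has become inefficient partway through the load
reindex index orders_alt_id_idx;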
On Mon, 2005-10-31 at 12:32 -0500, Merlin Moncure wrote:
> Well, can you defer index generation until after loading the set (or use COPY)?

I cannot defer index generation. We are using the copy API, copying 10000 rows in a batch.

> If that index is causing the problem, you may want to consider setting up a partial index to exclude null values.

This is a single column index. I assumed that null column values were not indexed. Is my assumption incorrect?

-K
> We are running some performance tests in which we are attempting to insert about 100,000,000 rows in a database at a sustained rate. About 50M rows in, our performance drops dramatically.
>
> This test is with data that we believe to be close to what we will encounter in production. However in tests with purely generated, sequential data, we did not notice this slowdown. I'm trying to figure out what patterns in the "real" data may be causing us problems.
>
> I have log, data and indexes on separate LUNs on an EMC SAN. Prior to slowdown, each partition is writing at a consistent rate. Index partition is reading at a much lower rate. At the time of slowdown, index partition read rate increases, all write rates decrease. CPU utilization drops.

I'm doing some test-inserts (albeit with far fewer records) into 8.0.4 (on FreeBSD 6.0 RC1) and the import time decreased three-fold when I increased the values below:

shared_buffers = 8192
commit_delay = 100000
commit_siblings = 1000

When I increased shared_buffers the kernel needed minor tweaking.

regards
Claus
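For readers wondering what the "minor tweaking" amounts to: on FreeBSD it is normally just a matter of raising the kernel's SysV shared memory limits so the larger shared_buffers segment can be allocated, roughly along these lines (the exact numbers depend on the shared_buffers setting):

# /etc/sysctl.conf
kern.ipc.shmmax=134217728    # max shared memory segment size, in bytes (128MB here)
kern.ipc.shmall=32768        # total shared memory, in 4kB pages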
> > If that index is causing the problem, you may want to consider setting up a partial index to exclude null values.
>
> This is a single column index. I assumed that null column values were not indexed. Is my assumption incorrect?
>
> -K

It turns out it is, or it certainly seems to be. I didn't know that :). So a partial index will probably not help for null exclusion...

It would be interesting to see if you are getting swaps (check pg_tmp) when performance breaks down. That is an easy fix: bump work_mem.

Merlin
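As a concrete sketch of that last suggestion (the value here is arbitrary; in 8.1 work_mem is specified in kilobytes):

-- raise per-operation sort/hash memory for the loading session only
set work_mem = 65536;    -- 64MB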
On Mon, Oct 31, 2005 at 12:32:03PM -0500, Merlin Moncure wrote:
> If that index is causing the problem, you may want to consider setting up a partial index to exclude null values.

Hey all.

Pardon my ignorance. :-)

I've been trying to figure out whether null values are indexed or not from the documentation. I was under the impression that null values are not stored in the index. Occasionally, though, I then see a suggestion such as the above, that seems to indicate to me that null values *are* stored in the index, allowing the 'exclude null values' to have an effect?

Which is it? :-)

Thanks,
mark

--
mark@mielke.cc / markm@ncf.ca / markm@nortel.com
http://mark.mielke.cc/
> On Mon, Oct 31, 2005 at 12:32:03PM -0500, Merlin Moncure wrote:
> > If that index is causing the problem, you may want to consider setting up a partial index to exclude null values.
>
> Hey all.
>
> Pardon my ignorance. :-)
>
> I've been trying to figure out whether null values are indexed or not from the documentation. I was under the impression that null values are not stored in the index. Occasionally, though, I then see a suggestion such as the above, that seems to indicate to me that null values *are* stored in the index, allowing the 'exclude null values' to have an effect?
>
> Which is it? :-)

I think I'm the ignorant one... run EXPLAIN on any lookup on an indexed field where the field value is null and you get a seqscan.

Merlin
mark@mark.mielke.cc writes:
> I've been trying to figure out whether null values are indexed or not from the documentation. I was under the impression that null values are not stored in the index.

You're mistaken, at least with regard to btree indexes.

            regards, tom lane
On Mon, 2005-10-31 at 15:30 -0500, Tom Lane wrote:
> mark@mark.mielke.cc writes:
> > I've been trying to figure out whether null values are indexed or not from the documentation. I was under the impression that null values are not stored in the index.
>
> You're mistaken, at least with regard to btree indexes.

Ha! So I'm creating an index 98% full of nulls! Looks like this is easily fixed with partial indexes.

-K
> mark@mark.mielke.cc writes:
> > I've been trying to figure out whether null values are indexed or not from the documentation. I was under the impression that null values are not stored in the index.
>
> You're mistaken, at least with regard to btree indexes.

Hmm. I tried several different ways to filter/extract null values from an indexed key and got a seq scan every time. The only way I could query for/against null values was to convert to bool via a function. However, I did try a partial index excluding nulls on a really big table where only 1% of the values are non-null, and the index size dropped as expected.

Merlin
On Mon, 2005-10-31 at 13:13, Merlin Moncure wrote:
> > > If that index is causing the problem, you may want to consider setting up a partial index to exclude null values.
> >
> > This is a single column index. I assumed that null column values were not indexed. Is my assumption incorrect?
> >
> > -K
>
> It turns out it is, or it certainly seems to be. I didn't know that :). So a partial index will probably not help for null exclusion...
>
> It would be interesting to see if you are getting swaps (check pg_tmp) when performance breaks down. That is an easy fix: bump work_mem.

OK, here's the issue in a nutshell. NULLs, like everything else, are indexed. HOWEVER, there's no way for them to be used by a normal query, since =NULL is not a legal construct. So, you can't do something like:

select * from sometable where somefield = NULL

because you won't get any answers, since nothing can equal NULL, and

select * from sometable where somefield IS NULL

won't work because IS is not a normally indexable operator.

Which is why you can create two indexes on a table to get around this, like so:

create index iname1 on table (field) where field IS NULL

and

create index iname2 on table (field) where field IS NOT NULL

And then the nulls are indexable by IS / IS NOT NULL.

"Merlin Moncure" <merlin.moncure@rcsonline.com> writes:
>> You're mistaken, at least with regard to btree indexes.

> hmm. I tried several different ways to filter/extract null values from an indexed key and got a seq scan every time.

I said they were stored, not that you could query against them ;-)

IS NULL isn't considered an indexable operator, mainly because it's not an operator at all in the strict sense of the word; and our index access APIs only support querying on indexable operators.

The reason they're stored is that they have to be in order to make multi-column indexes work right. I suppose we could special-case single-column indexes, but we don't. In any case, it's more likely that someone would one day get around to making IS NULL an indexable operator than that we'd insert a special case like that.

            regards, tom lane
> select * from sometable where somefield IS NULL won't work because IS is not a normally indexable operator.

Ah, I didn't know that. So there is no real reason not to exclude null values from all your indexes :). Reading Tom's recent comments, everything is clear now.

Instead of using your two-index approach I prefer to:

create function nullidx(anyelement) returns boolean as
$$ select $1 is null; $$ language sql immutable;

create index t_nullidx on t (nullidx(f)); -- etc

Merlin
Kelly Burkhart <kelly@tradebotsystems.com> writes:
> Ha! So I'm creating an index 98% full of nulls! Looks like this is easily fixed with partial indexes.

Still, though, it's not immediately clear why you'd be seeing a severe dropoff in insert performance after 50M rows. Even though there are lots of nulls, I don't see why they'd behave any worse for insert speed than real data. One would like to think that the insert speed would follow a nice O(log N) rule.

Are you doing the inserts all in one transaction, or several? If several, could you get a gprof profile of inserting the same number of rows (say a million or so) both before and after the unexpected dropoff occurs?

            regards, tom lane
On Mon, Oct 31, 2005 at 03:27:31PM -0500, Merlin Moncure wrote:
> > On Mon, Oct 31, 2005 at 12:32:03PM -0500, Merlin Moncure wrote:
> > > If that index is causing the problem, you may want to consider setting up a partial index to exclude null values.
> >
> > Hey all.
> >
> > Pardon my ignorance. :-)
> >
> > I've been trying to figure out whether null values are indexed or not from the documentation. I was under the impression that null values are not stored in the index. Occasionally, though, I then see a suggestion such as the above, that seems to indicate to me that null values *are* stored in the index, allowing the 'exclude null values' to have an effect?
> >
> > Which is it? :-)
>
> I think I'm the ignorant one... run EXPLAIN on any lookup on an indexed field where the field value is null and you get a seqscan.

Nahhh... I think the documentation could use a more explicit or obvious explanation. Or, I could have checked the source code to see. In any case, I expect we aren't the only ones that lacked confidence.

Tom was kind enough to point out that null values are stored. I expect that the seqscan is used if the null values are not selective enough, the same as any other value that isn't selective enough.

Now we can both have a little more confidence! :-)

Cheers,
mark

--
mark@mielke.cc / markm@ncf.ca / markm@nortel.com
http://mark.mielke.cc/
On Mon, 2005-10-31 at 16:18 -0500, Tom Lane wrote:
> Kelly Burkhart <kelly@tradebotsystems.com> writes:
> > Ha! So I'm creating an index 98% full of nulls! Looks like this is easily fixed with partial indexes.
>
> Still, though, it's not immediately clear why you'd be seeing a severe dropoff in insert performance after 50M rows. Even though there are lots of nulls, I don't see why they'd behave any worse for insert speed than real data. One would like to think that the insert speed would follow a nice O(log N) rule.
>
> Are you doing the inserts all in one transaction, or several? If several, could you get a gprof profile of inserting the same number of rows (say a million or so) both before and after the unexpected dropoff occurs?

I'm doing the inserts via libpq copy. Commits are in batches of approx 15000 rows. I did a run last night after modifying the indexes and saw the same pattern. I'm dumping the database now and will modify my test program to copy data from the dump rather than purely generated data. Hopefully, this will allow me to reproduce the problem in a way that takes less time to set up and run.

Tom, I'd be happy to profile the backend at several points in the run if you think that would be helpful. What compiler flags should I use? Current settings in Makefile.global are:

CFLAGS = -O2 -Wall -Wmissing-prototypes -Wpointer-arith -Winline -Wendif-labels -fno-strict-aliasing

Should I change this to:

CFLAGS = -g -pg -Wall ...

Or should I leave the -O2 in? It may be the weekend by the time I get this done.

-K
Kelly Burkhart <kelly@tradebotsystems.com> writes:
> Tom, I'd be happy to profile the backend at several points in the run if you think that would be helpful. What compiler flags should I use?

Add -g -pg and leave the rest alone. Also, if you're on Linux note that you need -DLINUX_PROFILE.

> It may be the weekend by the time I get this done.

Well, it's probably too late to think of tweaking 8.1 anyway...

            regards, tom lane
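Putting this advice together with the Makefile.global line quoted above, the profiling build would presumably use something like:

CFLAGS = -O2 -g -pg -DLINUX_PROFILE -Wall -Wmissing-prototypes -Wpointer-arith \
         -Winline -Wendif-labels -fno-strict-aliasing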
I'm surprised that no one seems to have yet suggested the following simple experiment:

Increase the RAM 4GB -> 8GB, tune for best performance, and repeat your 100M row insert experiment.

Does overall insert performance change? Does the performance drop <foo> rows in still occur? Does it occur in ~ the same place? Etc.

If the effect does seem to be sensitive to the amount of RAM in the server, it might be worth redoing the experiment(s) with 2GB and 16GB as well...

ron

"Merlin Moncure" <merlin.moncure@rcsonline.com> writes:
> > select * from sometable where somefield IS NULL won't work because IS is not a normally indexable operator.
>
> Ah, I didn't know that. So there is no real reason not to exclude null values from all your indexes :). Reading Tom's recent comments, everything is clear now.

There are other reasons. If you want a query like

    SELECT * FROM tab ORDER BY col LIMIT 10

to use an index on col then it can't exclude NULLs or else it wouldn't be useful. (Oracle actually has this problem; you frequently have to add "WHERE col IS NOT NULL" in order to let it use an index.)

--
greg
On Tue, 2005-11-01 at 10:37 -0500, Ron Peacetree wrote:
> I'm surprised that no one seems to have yet suggested the following simple experiment:
>
> Increase the RAM 4GB -> 8GB, tune for best performance, and repeat your 100M row insert experiment.
>
> Does overall insert performance change? Does the performance drop <foo> rows in still occur? Does it occur in ~ the same place? Etc.
>
> If the effect does seem to be sensitive to the amount of RAM in the server, it might be worth redoing the experiment(s) with 2GB and 16GB as well...

Ron, I would like to try this; however, since I'm sitting about 1000 miles away from the server, tweaking things is not as simple as one might hope. I would also like to understand what is going on before I start changing things.

If I can't get a satisfactory explanation for what I'm seeing with current hardware, I'll have memory added and see what happens.

-K
Second try... no attachment this time.

I've finally gotten around to profiling the back end. Here is a more precise description of what I'm doing:

I am copying data into two tables, order_main and order_transition (table defs at the end of this post). The order_transition table has roughly double the number of rows as the order_main table.

My program is a C program using the libpq copy API which effectively simulates our real application. It reads data from two data files, and appends copy-formatted data into two in-memory buffers. After 10,000 order_transitions, it copies the order_main data, then the order_transition data, then commits. The test program is running on a different machine than the DB. After each batch it writes a record to stdout with the amount of time it took to copy and commit the data (time only includes pg time, not the time it took to build the buffers).

A graph showing the performance characteristics is here:

<http://kkcsm.net/pgcpy.jpg>

The horizontal axis is number of transitions * 10000 that have been written. The vertical axis is time in milliseconds to copy and commit the data. The commit time is very consistent up until about 60,000,000 rows, then performance drops and times become much less consistent.

I profiled the backend at three points, on batches 4, 6042 and 6067. The first is right after start, the second is right before we hit the wall, and the third is one of the initial slow batches. I'm including inline the first 20 lines of gprof output for each batch. Please let me know if this is insufficient. I'll supply any necessary further info.

Since this thread is stale, I'll repeat relevant hardware/software stats: server is a dual, dual-core opteron with 4GB RAM. Disk is an EMC Symmetrix connected via FC. Data, index, logs on three separate LUNs. OS is SuSE Enterprise 9. Postgres version is 8.1.b4. shared_buffers=32768, fsync=off.

Thanks in advance for your help.

-K

---------------------------
> head -n 20 gprof.txt.4.777.47
Flat profile:

Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total
 time   seconds   seconds    calls   s/call   s/call  name
10.92      0.38     0.38    55027     0.00     0.00  XLogInsert
 6.90      0.62     0.24   702994     0.00     0.00  _bt_compare
 5.46      0.81     0.19        2     0.10     1.64  DoCopy
 4.60      0.97     0.16    16077     0.00     0.00  CopyReadLine
 3.74      1.10     0.13   484243     0.00     0.00  bttextcmp
 2.87      1.20     0.10    93640     0.00     0.00  _bt_binsrch
 2.59      1.29     0.09   484243     0.00     0.00  varstr_cmp
 2.59      1.38     0.09   364292     0.00     0.00  LWLockRelease
 2.30      1.46     0.08   703394     0.00     0.00  FunctionCall2
 2.01      1.53     0.07   138025     0.00     0.00  hash_any
 2.01      1.60     0.07   133176     0.00     0.00  ReadBuffer
 2.01      1.67     0.07   364110     0.00     0.00  LWLockAcquire
 2.01      1.74     0.07   132563     0.00     0.00  PinBuffer
 1.72      1.80     0.06    38950     0.00     0.00  _bt_insertonpg
 1.72      1.86     0.06    38767     0.00     0.00  _bt_mkscankey

---------------------------
> head -n 20 gprof.txt.6042.1344.84
Flat profile:

Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total
 time   seconds   seconds    calls   s/call   s/call  name
 9.67      0.52     0.52    50431     0.00     0.00  XLogInsert
 7.71      0.94     0.42  1045427     0.00     0.00  _bt_compare
 5.95      1.26     0.32   713392     0.00     0.00  bttextcmp
 4.28      1.49     0.23  1045814     0.00     0.00  FunctionCall2
 3.35      1.67     0.18   155756     0.00     0.00  _bt_binsrch
 2.60      1.81     0.14   713392     0.00     0.00  varstr_cmp
 2.60      1.95     0.14   475524     0.00     0.00  LWLockAcquire
 2.60      2.09     0.14   191837     0.00     0.00  ReadBuffer
 2.60      2.23     0.14        2     0.07     2.52  DoCopy
 2.60      2.37     0.14   197393     0.00     0.00  hash_search
 2.60      2.51     0.14   197205     0.00     0.00  hash_any
 2.23      2.63     0.12   190481     0.00     0.00  PinBuffer
 2.04      2.74     0.11   345866     0.00     0.00  AllocSetAlloc
 1.86      2.84     0.10   475788     0.00     0.00  LWLockRelease
 1.86      2.94     0.10    29620     0.00     0.00  pg_localtime

---------------------------
> head -n 20 gprof.txt.6067.9883.31
Flat profile:

Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total
 time   seconds   seconds    calls   s/call   s/call  name
17.17      1.14     1.14    51231     0.00     0.00  XLogInsert
10.82      1.85     0.72  1065556     0.00     0.00  _bt_compare
 4.77      2.17     0.32   158378     0.00     0.00  _bt_binsrch
 3.18      2.38     0.21   202921     0.00     0.00  hash_search
 3.18      2.59     0.21   742891     0.00     0.00  bttextcmp
 2.87      2.78     0.19  1485787     0.00     0.00  pg_detoast_datum
 2.87      2.97     0.19  1065325     0.00     0.00  FunctionCall2
 2.65      3.14     0.18   490373     0.00     0.00  LWLockAcquire
 2.27      3.29     0.15        2     0.08     3.08  DoCopy
 2.27      3.44     0.15   490908     0.00     0.00  LWLockRelease
 1.97      3.57     0.13   195049     0.00     0.00  ReadBuffer
 1.97      3.70     0.13   742891     0.00     0.00  varstr_cmp
 1.66      3.81     0.11   462134     0.00     0.00  LockBuffer
 1.51      3.91     0.10   191345     0.00     0.00  PinBuffer
 1.51      4.01     0.10   195049     0.00     0.00  UnpinBuffer

---------------------------
create table order_main (
    ord_id varchar(12) not null,
    firm_id varchar not null,
    firm_sub_id varchar not null,
    cl_ord_id varchar not null,
    clearing_firm varchar not null,
    clearing_account varchar not null,
    symbol varchar not null,
    side varchar(1) not null,
    size integer not null,
    price numeric(10,4) not null,
    expire_time timestamp with time zone,
    flags varchar(7) not null
);

create unique index order_main_pk on order_main (ord_id) tablespace idx_space;
create index order_main_ak1 on order_main (cl_ord_id) tablespace idx_space;

create table order_transition (
    collating_seq bigint not null,
    ord_id varchar(12) not null,
    cl_ord_id varchar,
    sending_time timestamp with time zone not null,
    transact_time timestamp with time zone not null,
    flags varchar(6) not null,
    exec_id varchar(12),
    size integer,
    price numeric(10,4),
    remainder integer,
    contra varchar
);

create unique index order_transition_pk on order_transition (collating_seq) tablespace idx_space;
create index order_transition_ak1 on order_transition (ord_id) tablespace idx_space;
create index order_transition_ak2 on order_transition (cl_ord_id) tablespace idx_space where cl_ord_id is not null;
create index order_transition_ak3 on order_transition (exec_id) tablespace idx_space where exec_id is not null;
Kelly Burkhart <kelly@tradebotsystems.com> writes:
> I've finally gotten around to profiling the back end.

Thanks for following up.

The sudden appearance of pg_detoast_datum() in the top ten in the third profile is suspicious. I wouldn't expect that to get called at all, really, during a normal COPY IN process. The only way I can imagine it getting called is if you have index entries that require toasting, which seems a bit unlikely to start happening only after 60 million rows. Is it possible that the index keys are getting longer and longer as your test run proceeds?

Could you send me (off list) the complete gprof output files?

            regards, tom lane
Kelly Burkhart <kelly@tradebotsystems.com> writes:
> On Thu, 2005-11-10 at 17:18 -0500, Tom Lane wrote:
>> Could you send me (off list) the complete gprof output files?

> Sure,

Thanks. Right offhand I can see no smoking gun here. The pg_detoast_datum entry I was worried about seems to be just measurement noise --- the gprof trace shows that it's called a proportional number of times in both cases, and it falls through without actually doing anything in all cases. The later trace involves a slightly larger amount of time spent inserting into the indexes, which is what you'd expect as the indexes get bigger, but it doesn't seem that CPU time per se is the issue. The just-before-the-cliff trace shows total CPU of 5.38 sec and the after-the-cliff one shows 6.61 sec.

What I now suspect is happening is that you "hit the wall" at the point where the indexes no longer fit into main memory and it starts taking significant I/O to search and update them. Have you tried watching iostat or vmstat output to see if there's a noticeable increase in I/O at the point where things slow down? Can you check the physical size of the indexes at that point, and see if it seems related to your available RAM?

If that is the correct explanation, then the only solutions I can see are (1) buy more RAM or (2) avoid doing incremental index updates; that is, drop the indexes before bulk load and rebuild them afterwards.

One point to consider is that an index will be randomly accessed only if its data is being loaded in random order. If you're loading keys in sequential order then only the "right-hand edge" of the index would get touched, and it wouldn't need much RAM. So, depending on what order you're loading data in, the primary key index may not be contributing to the problem.

            regards, tom lane
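To put a number on the physical size of the indexes, something along these lines should work (assuming the built-in size functions are available in this 8.1 beta; the index names are the ones from the schema posted earlier):

select relname,
       pg_size_pretty(pg_relation_size(oid)) as index_size
  from pg_class
 where relname in ('order_main_pk', 'order_main_ak1',
                   'order_transition_pk', 'order_transition_ak1',
                   'order_transition_ak2', 'order_transition_ak3');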
Kelly Burkhart <kelly@tradebotsystems.com> writes:
> ... A graph showing the performance characteristics is here:
> <http://kkcsm.net/pgcpy.jpg>

I hadn't looked at this chart till just now, but it sure seems to put a crimp in my theory that you are running out of room to hold the indexes in RAM. That theory would predict that once you fall over the knee of the curve, performance would get steadily worse; instead it gets markedly worse and then improves a bit. And there's another cycle of worse-and-better around 80M rows. I have *no* idea what's up with that. Anyone? Kelly, could there be any patterns in the data that might be related?

The narrow spikes look like they are probably induced by checkpoints. You could check that by seeing if their spacing changes when you alter checkpoint_segments and checkpoint_timeout. It might also be entertaining to make the bgwriter parameters more aggressive to see if you can ameliorate the spikes.

            regards, tom lane
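The knobs mentioned here live in postgresql.conf; a sketch of the kind of change being described, with purely illustrative values rather than recommendations:

checkpoint_segments = 30     # default 3; checkpoints triggered less often by WAL volume
checkpoint_timeout = 900     # seconds; default 300
bgwriter_delay = 100         # ms between bgwriter rounds (default 200)
bgwriter_lru_percent = 5.0   # scan/write more dirty buffers per round
bgwriter_lru_maxpages = 100
bgwriter_all_percent = 1.0
bgwriter_all_maxpages = 10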
On Thu, 2005-11-10 at 19:13 -0500, Tom Lane wrote:
> Kelly Burkhart <kelly@tradebotsystems.com> writes:
> > ... A graph showing the performance characteristics is here:
> > <http://kkcsm.net/pgcpy.jpg>
>
> I hadn't looked at this chart till just now, but it sure seems to put a crimp in my theory that you are running out of room to hold the indexes in RAM. That theory would predict that once you fall over the knee of the curve, performance would get steadily worse; instead it gets markedly worse and then improves a bit. And there's another cycle of worse-and-better around 80M rows. I have *no* idea what's up with that. Anyone? Kelly, could there be any patterns in the data that might be related?

I modified my original program to insert generated, sequential data. The following graph shows the results to be flat:

<http://kkcsm.net/pgcpy_20051111_1.jpg>

Thus, hardware is sufficient to handle predictably sequential data.

There very well could be a pattern in the data which could affect things, however, I'm not sure how to identify it in 100K rows out of 100M.

If I could identify a pattern, what could I do about it? Could I do some kind of a reversible transform on the data? Is it better to insert nearly random values? Or nearly sequential?

I now have an 8G and a 16G machine I'm loading the data into. I'll report back after that's done.

I also want to try eliminating the order_main table, moving fields to the transition table. This will reduce the number of index updates significantly at the cost of some wasted space in the table...

-K
That sure seems to bolster the theory that performance is degrading because you exhaust the cache space and need to start reading index pages. When inserting sequential data, you don't need to randomly access pages all over the index tree.

-Kevin

>>> Kelly Burkhart <kelly@tradebotsystems.com> >>>
I modified my original program to insert generated, sequential data. The following graph shows the results to be flat:

<http://kkcsm.net/pgcpy_20051111_1.jpg>
Kelly Burkhart <kelly@tradebotsystems.com> writes:
> On Thu, 2005-11-10 at 19:13 -0500, Tom Lane wrote:
>> Kelly, could there be any patterns in the data that might be related?

> I modified my original program to insert generated, sequential data. The following graph shows the results to be flat:
> <http://kkcsm.net/pgcpy_20051111_1.jpg>
> Thus, hardware is sufficient to handle predictably sequential data.

Yeah, inserting sequentially increasing data would only ever touch the right-hand edge of the btree, so memory requirements would be pretty low and constant.

> There very well could be a pattern in the data which could affect things, however, I'm not sure how to identify it in 100K rows out of 100M.

I conjecture that the problem areas represent places where the key sequence is significantly "more random" than it is elsewhere. Hard to be more specific than that though.

            regards, tom lane
On Fri, 2005-11-11 at 18:02 -0500, Tom Lane wrote:
> > There very well could be a pattern in the data which could affect things, however, I'm not sure how to identify it in 100K rows out of 100M.
>
> I conjecture that the problem areas represent places where the key sequence is significantly "more random" than it is elsewhere. Hard to be more specific than that though.

OK, I understand the pattern now.

My two tables hold orders and order state transitions. Most orders have two transitions: creation and termination. The problem happens when there is a significant number of orders where termination is happening a long time after creation, causing order_transition rows with old ord_id values to be inserted.

This is valid, so I have to figure out a way to accommodate it.

You mentioned playing with checkpointing and the bgwriter earlier in this thread. I experimented with the bgwriter through the weekend, but I don't have a good idea what sensible parameter changes are...

Re: checkpointing, currently my checkpoints are happening every 5 minutes (if I turn on fsync, the graph shows checkpoints dramatically). If I increase checkpoint_timeout, could that be beneficial? Or would I just have more time between larger spikes?

-K
At 09:43 AM 11/14/2005, Kelly Burkhart wrote:
> On Fri, 2005-11-11 at 18:02 -0500, Tom Lane wrote:
> > > There very well could be a pattern in the data which could affect things, however, I'm not sure how to identify it in 100K rows out of 100M.
> >
> > I conjecture that the problem areas represent places where the key sequence is significantly "more random" than it is elsewhere. Hard to be more specific than that though.
>
> OK, I understand the pattern now.
>
> My two tables hold orders and order state transitions. Most orders have two transitions: creation and termination. The problem happens when there is a significant number of orders where termination is happening a long time after creation, causing order_transition rows with old ord_id values to be inserted.
>
> This is valid, so I have to figure out a way to accommodate it.

Perhaps a small schema change would help? Instead of having the order state transitions explicitly listed in the table, why not create two new tables: one for created orders and one for terminated orders.

When an order is created, its ord_id goes into the CreatedOrders table. When an order is terminated, its ord_id is added to the TerminatedOrders table and then deleted from the CreatedOrders table.

Downsides to this approach are some extra complexity, and that system disaster recovery will have to make sure that no ord_id appears in both the CreatedOrders and TerminatedOrders tables. Upsides are that the insert problem goes away and certain kinds of accounting and inventory reports are now easier to create.

Ron
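A rough sketch of the schema being described here, reusing the ord_id type from the earlier schema post (the lowercase table names stand in for the CreatedOrders/TerminatedOrders tables; the example key value and the omitted columns are illustrative only):

create table created_orders (
    ord_id varchar(12) primary key
    -- plus whatever attributes are needed while an order is live
);

create table terminated_orders (
    ord_id varchar(12) primary key
    -- plus termination details
);

-- on termination, move the row instead of inserting an old key
-- into an ever-growing transition index
begin;
insert into terminated_orders (ord_id) values ('ABCDE0001ZZZ');
delete from created_orders where ord_id = 'ABCDE0001ZZZ';
commit;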