Thread: Reducing memory usage of insert into select operations?
Redhat 4, postgresql 8.3.3, Memory: 12GB

While doing a couple of operations of the type

insert into <table> select from <some other table>

the OS triggered the out of memory killer (oom-killer). After some research and trial/error I found it was the inserts. I see one of the inserts is using up 12GB! How can I reduce the usage?

postgresql.conf settings:
shared_buffers = 3GB
temp_buffers = 64MB           # min 800kB
work_mem = 256MB              # min 64kB
maintenance_work_mem = 1GB

Would reducing work_mem help?

The table I am selecting from has a few hundred million rows. The table I am inserting into has partitions. I am benchmarking breaking up a large table into smaller partitions.
On Thu, Jul 17, 2008 at 7:21 PM, Francisco Reyes <lists@stringsutils.com> wrote:
> Redhat 4
> postgresql 8.3.3
> Memory: 12GB
>
> While doing a couple of operations of the type
> insert into <table> select from <some other table>
>
> The OS triggered the out of memory killer (oom-killer).

Is this a 32-bit installation or 64-bit?  3GB of shared_buffers is way
too big for a 32-bit setup.

-Doug
On Thu, Jul 17, 2008 at 9:27 PM, Francisco Reyes <lists@stringsutils.com> wrote:
> Douglas McNaught writes:
>
>> Is this a 32-bit installation or 64-bit? 3GB of shared_buffers is way
>> too big for a 32-bit setup.
>
> 64-bit.
> The machine has 12GB of RAM so shared_buffers is about 1/3.
> Dedicated DB server.

Ahh, good.  Just wanted to answer the obvious question first.  Some
people set shared_buffers really high on 32-bit systems and then are
surprised when it doesn't work well.

It does seem that reducing work_mem might help you, but others on this
list are much more expert than I in diagnosing this sort of problem.
It would probably be helpful for you to post the EXPLAIN output from
your query, so they can see which part of the plan causes the large
memory usage.

-Doug
Douglas McNaught writes:
> It does seem that reducing work_mem might help you, but others on this

I reduced it from 256MB to 64MB and it seems to be helping. At 256MB the
usage per DB connection was upwards of 12GB. At 64MB so far it is around
7GB. I just reduced it further to 32MB to see how that works.

> It would probably be helpful for you to post the EXPLAIN output from

Hash Join  (cost=712213.57..27293913.33 rows=234402352 width=24)
  Hash Cond: (coh.id = ids.id)
  ->  Hash Join  (cost=551387.26..18799378.16 rows=234402352 width=22)
        Hash Cond: (coh.user_id = ca.user_id)
        ->  Seq Scan on customer_original_historical coh  (cost=0.00..6702501.40 rows=234402352 width=47)
              Filter: (yearmo > '200703'::bpchar)
        ->  Hash  (cost=268355.67..268355.67 rows=14637567 width=32)
              ->  Seq Scan on cards ca  (cost=0.00..268355.67 rows=14637567 width=32)
  ->  Hash  (cost=77883.25..77883.25 rows=5055525 width=6)
        ->  Seq Scan on customer_ids ids  (cost=0.00..77883.25 rows=5055525 width=6)
Douglas McNaught writes:
> Is this a 32-bit installation or 64-bit? 3GB of shared_buffers is way
> too big for a 32-bit setup.

64-bit.
The machine has 12GB of RAM so shared_buffers is about 1/3.
Dedicated DB server.
Francisco Reyes wrote:
> The OS triggered the out of memory killer (oom-killer).
> The table I am selecting from has a few hundred million rows.
> The table I am inserting into has partitions. I am benchmarking breaking
> up a large table into smaller partitions.

Is the partition split done with triggers or rules?

-- 
  Richard Huxton
  Archonet Ltd
On Fri, Jul 18, 2008 at 12:18 AM, Francisco Reyes <lists@stringsutils.com> wrote:
> Douglas McNaught writes:
>
>> It does seem that reducing work_mem might help you, but others on this
>
> I reduced it from 256MB to 64MB. It seems it is helping.

You should also look at your memory overcommit settings (in
/proc/sys/vm).  You can set things up so that Postgres gets a malloc()
failure (which it is generally prepared to cope with cleanly) when the
system runs out of RAM, rather than having the OOM killer go off and
hit it with SIGKILL.  Overcommit is useful in some contexts (Java apps
tend to map a lot more memory than they actually use) but for a
dedicated database server you really don't ever want to have the OOM
killer triggered.

-Doug
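[Editor's note: for readers following along, the overcommit behaviour Doug describes is controlled by a kernel setting; a minimal sketch of disabling it on a Linux box of this era (mode 2 means strict accounting; verify the exact values against your kernel's documentation):]

```shell
# Disable memory overcommit so a runaway backend gets a malloc()
# failure instead of a SIGKILL from the OOM killer (requires root).
sysctl -w vm.overcommit_memory=2

# To persist across reboots, add to /etc/sysctl.conf:
#   vm.overcommit_memory = 2
```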
On 9:53 am 07/18/08 "Douglas McNaught" <doug@mcnaught.org> wrote:
> dedicated database server you really don't ever want to have the OOM
> killer triggered.

Found that yesterday (vm.overcommit_memory=2).
Agree that this is better than OOM. I still ran out of memory last night,
and postgres just failed on the malloc(), which as you mentioned is better.

Reduced work_mem to 8MB and trying again.
On 8:13 am 07/18/08 Richard Huxton <dev@archonet.com> wrote:
> Is the partition split done with triggers or rules?

I have a single trigger+function combo that dynamically computes which
partition the data has to go to.
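[Editor's note: the poster's actual function is not shown; for readers unfamiliar with 8.3-style inheritance partitioning, a minimal sketch of a routing trigger follows. All table, column, and function names here are hypothetical, and a real "dynamic" version would compute the target partition rather than hard-coding branches:]

```sql
-- Hypothetical sketch of a partition-routing trigger for 8.3-style
-- inheritance partitioning (names invented for illustration).
CREATE OR REPLACE FUNCTION customer_transactions_route() RETURNS trigger AS $$
BEGIN
    IF NEW.groupid = 1 THEN
        INSERT INTO customer_transactions_g1 VALUES (NEW.*);
    ELSIF NEW.groupid = 2 THEN
        INSERT INTO customer_transactions_g2 VALUES (NEW.*);
    ELSE
        RAISE EXCEPTION 'no partition for groupid %', NEW.groupid;
    END IF;
    RETURN NULL;  -- row already routed; skip the insert into the parent
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER route_insert BEFORE INSERT ON customer_transactions
    FOR EACH ROW EXECUTE PROCEDURE customer_transactions_route();
```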
Francisco Reyes wrote:
> On 8:13 am 07/18/08 Richard Huxton <dev@archonet.com> wrote:
>> Is the partition split done with triggers or rules?
>
> I have a single trigger+function combo that dynamically computes which
> partition the data has to go to.

I'm wondering whether it's memory usage either for the trigger itself, or
for the function (pl/pgsql?). If you're doing something like:

  INSERT INTO partitioned_table SELECT * FROM big_table

then that's not only taking place within a single transaction, but within
a single statement.

Without being a hacker, I'd say it's entirely plausible that PG might not
clean up trigger events until the end of a statement, meaning you would
need memory for 200million+ triggers.

Alternatively, it could be a memory leak somewhere in the pl/pgsql or
trigger code. It wouldn't have to be much to affect this particular case.

What happens if you do the insert/select in stages, but all in one
transaction? Do you see PG's memory requirement stay constant, or grow in
steps? That will show whether the memory is growing over the duration of a
statement or a transaction.

  BEGIN;
  INSERT ... SELECT ... WHERE id BETWEEN 0 AND 999999;
  INSERT ... SELECT ... WHERE id BETWEEN 1000000 AND 1999999;
  ...
  COMMIT;

-- 
  Richard Huxton
  Archonet Ltd
On 11:25 am 07/18/08 Richard Huxton <dev@archonet.com> wrote:
> I'm wondering whether it's memory usage either for the trigger
> itself, or for the function (pl/pgsql?).

Good point.

> If you're doing something like:
> INSERT INTO partitioned_table SELECT * FROM big_table
> then that's not only taking place within a single transaction, but
> within a single statement.

Correct.
I have kept decreasing work_mem and that does not seem to help.

> Without being a hacker, I'd say it's entirely plausible that PG might
> clean up triggers at the end of a statement meaning you would need
> memory for 200million+ triggers.

Sure hope that is not the case.

> Alternatively, it could be a memory-leak somewhere in the pl/pgsql or
> trigger code. Wouldn't have to be much to affect this particular case.

Will post an strace.

> What happens if you do the insert/select in stages but all in one
> transaction?

Will test.
The data is about a year's worth. I will try to do one month at a time,
within a single transaction. A single month finishes fine.

> Do you see PG's memory requirement stay constant or grow
> in steps. That will show whether the memory is growing over the
> duration of a statement or a transaction.

Right now, for the single statement/transaction (the one big process), it
is growing slowly over time. It may be a leak. It seems to start growing
somewhere between the first and second hour, and it always seems to fail
around four hours. I wrote a little process that shows the amount of free
memory every 15 minutes.

I will post an strace for the big process, and then will try breaking the
process down by month, but within a single transaction, and report results
later.
On 11:25 am 07/18/08 Richard Huxton <dev@archonet.com> wrote:

Strace of the single/large process.
Again, all the query is doing is

insert into <table> select <subquery>

The strace is pretty much a repetition of the lines below.

semop(557057, 0x7fbfffdfb0, 1) = 0
lseek(100, 0, SEEK_END) = 671719424
write(100, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 8192) = 8192
lseek(508, 0, SEEK_END) = 55697408
write(508, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 8192) = 8192
read(381, "\0\0\0&\0\224\21\0\225o\10\0\30\331c\0c\225%w(\0\0\0\0\0\0\0\0\0\5\0"..., 8192) = 8192
semop(557057, 0x7fbfffd1a0, 1) = 0
lseek(100, 0, SEEK_END) = 671727616
write(100, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 8192) = 8192
semop(557057, 0x7fbfffd1c0, 1) = 0
semop(557057, 0x7fbfffd1a0, 1) = 0
semop(557057, 0x7fbfffd1c0, 1) = 0
read(381, "w\317\21\0<]9\0\177\246eA(\0\0\0\0\0\0\0\0\0\5\0\2\0\30\0.\v\0\0"..., 8192) = 8192
semop(557057, 0x7fbfffd1a0, 1) = 0
lseek(512, 0, SEEK_END) = 48144384
write(512, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 8192) = 8192
semop(557057, 0x7fbfffd1c0, 1) = 0
lseek(100, 0, SEEK_END) = 671735808
write(100, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 8192) = 8192
lseek(517, 0, SEEK_END) = 89309184
write(517, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 8192) = 8192
semop(557057, 0x7fbfffd1c0, 1) = 0
semop(557057, 0x7fbfffddd0, 1) = 0
lseek(100, 0, SEEK_END) = 671744000
write(100, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 8192) = 8192
read(381, ">\212\225\202(\0\0\0\0\0\0\0\0\0\5\0\2\0\30\0000\v\0\0\1\0\23\2\0\0\0\t"..., 8192) = 8192
lseek(510, 0, SEEK_END) = 29351936
write(510, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 8192) = 8192
lseek(100, 0, SEEK_END) = 671752192
write(100, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 8192) = 8192
semop(557057, 0x7fbfffddf0, 1) = 0
read(381, "\0\0\0\0\0\0\5\0\2\0\30\0001\v\0\0\0\0\23\2\0\0\0\30\0\4\20\0\302\326\0\0"..., 8192) = 8192
lseek(513, 0, SEEK_END) = 19316736
write(513, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 8192) = 8192
lseek(100, 0, SEEK_END) = 671760384
write(100, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 8192) = 8192
read(381, "\2\0\30\0004\v\0\0\0\0\23\2\0\0\0\16\0\254\r\0\t+\35\0\301\217@\0\304\346U\241"..., 8192) = 8192
lseek(100, 0, SEEK_END) = 671768576
write(100, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 8192) = 8192
lseek(518, 0, SEEK_END) = 55025664
write(518, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 8192) = 8192
semop(557057, 0x7fbfffd1c0, 1) = 0
semop(557057, 0x7fbfffd1c0, 1) = 0
semop(557057, 0x7fbfffd1c0, 1) = 0
lseek(100, 0, SEEK_END) = 671776768
On Fri, Jul 18, 2008 at 10:40:02AM -0400, Francisco Reyes wrote:
> Found that yesterday (vm.overcommit_memory=2).
> Agree that this is better than OOM. I still ran out of memory last night
> and postgres just failed on the malloc(), which as you mentioned is better.
>
> Reduced work_mem to 8MB and trying again.

Perhaps you can try reducing shared_buffers, to see if that helps more?
8MB is quite small for work_mem. More shared_buffers is not necessarily
better.

Also, how much swap are you running? Overcommit disabled while not having
any swap set up is a great way to ensure you run out of memory quickly.

Have a nice day,
-- 
Martijn van Oosterhout <kleptog@svana.org>   http://svana.org/kleptog/
> Please line up in a tree and maintain the heap invariant while
> boarding. Thank you for flying nlogn airlines.
Francisco Reyes wrote:
> On 11:25 am 07/18/08 Richard Huxton <dev@archonet.com> wrote:
>
> Strace of the single/large process.
> Again, all the query is doing is
> insert into <table> select <subquery>
>
> The strace is pretty much a repetition of the lines below.

Do you have long-running transactions?  (For example transactions that
have been idle for a long time.)

-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.
On 12:03 pm 07/18/08 Martijn van Oosterhout <kleptog@svana.org> wrote:
> Perhaps you can try reducing the shared_buffers, to see if that helps
> more?

Will try.

> 8MB is quite small for work_mem. More shared_buffers is not
> necessarily better.

Ok, but from everything I had read, shared_buffers of 1/4 of RAM seemed
like a reasonable starting point. Will try reducing it to 2GB.

> Also, how much swap are you running?

Started out with 12GB (same as memory) and last night I added 24GB more.
I had 2 instances of the inserts going, so each exhausted about 18GB of RAM!
On 12:23 pm 07/18/08 Alvaro Herrera <alvherre@commandprompt.com> wrote:
> Do you have long-running transactions? (For example transactions that
> have been idle for a long time).

No. The two inserts I was running were the only processes. I even did a
restart to make sure there was absolutely nothing else running and to make
sure all my postgresql.conf settings were in effect.

Given that memory grows over time, I am beginning to wonder if it is some
type of memory leak.

Just installed the postgresql debug rpm, but I am not sure it did
anything; the strace doesn't look any different:

read(81, "2\1\0\0\260~!\16\1\0\0\0\370\1\0\2\0 \4 \0\0\0\0\300\237r\0\200\237r\0"..., 8192) = 8192
write(191, "Q=J\313\253]1\0\0\0\1\0007\33\4\0\2\0\2\t\30\0\3\302\204\0;a1OjG"..., 8192) = 8192
write(160, "XQxbqQEx+yo=H\333o\2371\0\0\0\1\0.\33C\0\2\0\2\t\30\0"..., 8192) = 8192
read(81, "2\1\0\0\320(\301\17\1\0\0\0\370\1\0\2\0 \4 \0\0\0\0\300\237r\0\200\237r\0"..., 8192) = 8192
"Francisco Reyes" <lists@stringsutils.com> writes: > Given that memory grows over time I am beggining to wonder if it is some > type of memory leak. Are there any AFTER triggers (including foreign key constraints) on the table being inserted into? If so the list of pending trigger events might be your problem. If you can get Postgres to report an actual out-of-memory error (as opposed to crashing from OOM kill) then it should dump a memory usage map into the postmaster log. Looking at that would be informative. regards, tom lane
On 1:00 pm 07/18/08 Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Are there any AFTER triggers (including foreign key constraints)

I have two foreign key constraints.

> the table being inserted into? If so the list of pending trigger
> events might be your problem.

I guess I can try disabling the foreign keys, but that would be less than
ideal for production. This is an analytics environment so all operations
are in bulk.

> If you can get Postgres to report an actual out-of-memory error (as
> opposed to crashing from OOM kill)

Disabled oom with vm.overcommit_memory=2.

> then it should dump a memory usage
> map into the postmaster log. Looking at that would be informative.

Got it.
----------------------
AfterTriggerEvents: 10553909248 total in 1268 blocks; 20432 free (6 chunks); 10553888816 used
ExecutorState: 122880 total in 4 blocks; 68040 free (8 chunks); 54840 used
Operator lookup cache: 24576 total in 2 blocks; 11888 free (5 chunks); 12688 used
Operator class cache: 8192 total in 1 blocks; 1680 free (0 chunks); 6512 used
MessageContext: 131072 total in 5 blocks; 50712 free (291 chunks); 80360 used
smgr relation table: 24576 total in 2 blocks; 3584 free (4 chunks); 20992 used
TransactionAbortContext: 32768 total in 1 blocks; 32736 free (0 chunks); 32 used
Portal hash: 8192 total in 1 blocks; 1680 free (0 chunks); 6512 used
PortalMemory: 8192 total in 1 blocks; 7888 free (0 chunks); 304 used
PortalHeapMemory: 1024 total in 1 blocks; 768 free (0 chunks); 256 used
ExecutorState: 98784 total in 8 blocks; 24064 free (22 chunks); 74720 used
ExprContext: 8192 total in 1 blocks; 8016 free (0 chunks); 176 used
HashTableContext: 8192 total in 1 blocks; 8064 free (1 chunks); 128 used
HashBatchContext: 532676656 total in 74 blocks; 1863936 free (5 chunks); 530812720 used
HashTableContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used
HashBatchContext: 415227952 total in 59 blocks; 6589744 free (5 chunks); 408638208 used
ExprContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used
ExprContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used
ExprContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used
ExprContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used
ExprContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used
ExprContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used
ExprContext: 8192 total in 1 blocks; 8136 free (0 chunks); 56 used
Relcache by OID: 24576 total in 2 blocks; 8672 free (3 chunks); 15904 used
CacheMemoryContext: 2390256 total in 22 blocks; 751904 free (2 chunks); 1638352 used
CachedPlan: 1024 total in 1 blocks; 336 free (0 chunks); 688 used
CachedPlanSource: 1024 total in 1 blocks; 80 free (0 chunks); 944 used
SPI Plan: 1024 total in 1 blocks; 808 free (0 chunks); 216 used
CachedPlan: 7168 total in 3 blocks; 3120 free (0 chunks); 4048 used
CachedPlanSource: 7168 total in 3 blocks; 1816 free (0 chunks); 5352 used
SPI Plan: 1024 total in 1 blocks; 784 free (0 chunks); 240 used
CachedPlan: 3072 total in 2 blocks; 792 free (0 chunks); 2280 used
CachedPlanSource: 7168 total in 3 blocks; 3600 free (0 chunks); 3568 used
SPI Plan: 1024 total in 1 blocks; 800 free (0 chunks); 224 used
pg_cast_source_target_index: 2048 total in 1 blocks; 608 free (0 chunks); 1440 used
pg_language_oid_index: 2048 total in 1 blocks; 704 free (0 chunks); 1344 used
pg_toast_2619_index: 2048 total in 1 blocks; 608 free (0 chunks); 1440 used
pg_amop_opr_fam_index: 2048 total in 1 blocks; 608 free (0 chunks); 1440 used
tcf_mnfoids_partid: 2048 total in 1 blocks; 752 free (0 chunks); 1296 used
tcf_mnfoids_pkey: 2048 total in 1 blocks; 752 free (0 chunks); 1296 used
cards_cardnum_key: 2048 total in 1 blocks; 752 free (0 chunks); 1296 used
cards_pkey: 2048 total in 1 blocks; 752 free (0 chunks); 1296 used
tcf_original_trans_partid_cardnum: 2048 total in 1 blocks; 656 free (0 chunks); 1392 used
tcf_original_trans_yearmo: 2048 total in 1 blocks; 752 free (0 chunks); 1296 used
pg_constraint_contypid_index: 2048 total in 1 blocks; 704 free (0 chunks); 1344 used
pg_constraint_conname_nsp_index: 2048 total in 1 blocks; 608 free (0 chunks); 1440 used
pg_operator_oprname_l_r_n_index: 2048 total in 1 blocks; 392 free (0 chunks); 1656 used
pg_proc_proname_args_nsp_index: 2048 total in 1 blocks; 584 free (0 chunks); 1464 used
pg_proc_oid_index: 2048 total in 1 blocks; 704 free (0 chunks); 1344 used
pg_shdepend_reference_index: 2048 total in 1 blocks; 608 free (0 chunks); 1440 used
pg_namespace_oid_index: 2048 total in 1 blocks; 704 free (0 chunks); 1344 used
pg_statistic_relid_att_index: 2048 total in 1 blocks; 608 free (0 chunks); 1440 used
pg_inherits_relid_seqno_index: 2048 total in 1 blocks; 608 free (0 chunks); 1440 used
pg_constraint_oid_index: 2048 total in 1 blocks; 704 free (0 chunks); 1344 used
pg_attribute_relid_attnam_index: 2048 total in 1 blocks; 608 free (0 chunks); 1440 used
pg_attrdef_oid_index: 2048 total in 1 blocks; 704 free (0 chunks); 1344 used
pg_shdepend_depender_index: 2048 total in 1 blocks; 584 free (0 chunks); 1464 used
pg_type_typname_nsp_index: 2048 total in 1 blocks; 608 free (0 chunks); 1440 used
pg_type_oid_index: 2048 total in 1 blocks; 704 free (0 chunks); 1344 used
pg_depend_depender_index: 2048 total in 1 blocks; 584 free (0 chunks); 1464 used
pg_depend_reference_index: 2048 total in 1 blocks; 584 free (0 chunks); 1464 used
pg_index_indrelid_index: 2048 total in 1 blocks; 704 free (0 chunks); 1344 used
pg_description_o_c_o_index: 2048 total in 1 blocks; 584 free (0 chunks); 1464 used
pg_constraint_conrelid_index: 2048 total in 1 blocks; 704 free (0 chunks); 1344 used
pg_attrdef_adrelid_adnum_index: 2048 total in 1 blocks; 608 free (0 chunks); 1440 used
pg_class_relname_nsp_index: 2048 total in 1 blocks; 608 free (0 chunks); 1440 used
pg_namespace_nspname_index: 2048 total in 1 blocks; 704 free (0 chunks); 1344 used
pg_authid_oid_index: 2048 total in 1 blocks; 704 free (0 chunks); 1344 used
pg_opclass_oid_index: 3072 total in 2 blocks; 1696 free (2 chunks); 1376 used
pg_amop_fam_strat_index: 3072 total in 2 blocks; 1384 free (2 chunks); 1688 used
pg_trigger_tgrelid_tgname_index: 3072 total in 2 blocks; 1600 free (2 chunks); 1472 used
pg_attribute_relid_attnum_index: 3072 total in 2 blocks; 1600 free (2 chunks); 1472 used
pg_class_oid_index: 3072 total in 2 blocks; 1696 free (2 chunks); 1376 used
pg_amproc_fam_proc_index: 3072 total in 2 blocks; 1384 free (2 chunks); 1688 used
pg_index_indexrelid_index: 3072 total in 2 blocks; 1696 free (2 chunks); 1376 used
pg_rewrite_rel_rulename_index: 3072 total in 2 blocks; 1648 free (2 chunks); 1424 used
pg_operator_oid_index: 3072 total in 2 blocks; 1696 free (2 chunks); 1376 used
MdSmgr: 24576 total in 2 blocks; 15872 free (0 chunks); 8704 used
LOCALLOCK hash: 57344 total in 3 blocks; 36384 free (11 chunks); 20960 used
Timezones: 53584 total in 2 blocks; 3744 free (0 chunks); 49840 used
ErrorContext: 8192 total in 1 blocks; 8160 free (0 chunks); 32 used
ERROR: out of memory
DETAIL: Failed on request of size 40.
"Francisco Reyes" <lists@stringsutils.com> writes: > On 1:00 pm 07/18/08 Tom Lane <tgl@sss.pgh.pa.us> wrote: >> If you can get Postgres to report an actual out-of-memory error (as >> opposed to crashing from OOM kill) >> then it should dump a memory usage >> map into the postmaster log. Looking at that would be informative. > Got it. > AfterTriggerEvents: 10553909248 total in 1268 blocks; 20432 free (6 > chunks); 10553888816 used Well, that's definitely your problem ... > HashBatchContext: 532676656 total in 74 blocks; 1863936 free (5 > chunks); 530812720 used > HashBatchContext: 415227952 total in 59 blocks; 6589744 free (5 > chunks); 408638208 used although these numbers seem way outta line too. What did you say you had work_mem set to? regards, tom lane
On 3:55 pm 07/18/08 Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> AfterTriggerEvents: 10553909248 total in 1268 blocks; 20432 free (6
>> chunks); 10553888816 used
>
> Well, that's definitely your problem ...

So I need to remove the foreign key constraints?

>> HashBatchContext: 415227952 total in 59 blocks; 6589744
>> free (5 chunks); 408638208 used
>
> although these numbers seem way outta line too. What did you say you
> had work_mem set to?

Initially, on the first crash, it was 256MB. I believe at the time of the
crash for which I got the dump it was down to 64MB or 8MB; I kept trying
lower values. Also tried reducing shared_buffers, as someone suggested.

I will bump my shared_buffers back to 3GB and work_mem back to 64MB.
On 3:55 pm 07/18/08 Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> AfterTriggerEvents: 10553909248 total in 1268 blocks; 20432 free (6
>> chunks); 10553888816 used
>
> Well, that's definitely your problem ...

What is the overhead for each AfterTriggerEvent? I guess I can write a
program to process so many rows at a time, if I know how much overhead
each AfterTriggerEvent uses. I know 15 million at a time worked fine, so
I could do 5 or 10 million at a time.

When does the memory usage for those AfterTriggerEvents get released? At
commit?
"Francisco Reyes" <lists@stringsutils.com> writes: > On 3:55 pm 07/18/08 Tom Lane <tgl@sss.pgh.pa.us> wrote: >>> AfterTriggerEvents: 10553909248 total in 1268 blocks; 20432 free (6 >>> chunks); 10553888816 used >> >> Well, that's definitely your problem ... > So I need to remove the foreign constraints? Either that or do the update in sections. But working through umpteen gig of pending trigger events would take forever anyway --- dropping and re-adding the FK constraint is almost certainly a better way. >>> HashBatchContext: 415227952 total in 59 blocks; 6589744 >>> free (5 chunks); 408638208 used >> >> although these numbers seem way outta line too. What did you say you >> had work_mem set to? > Initially on the first crash it was 256MB. I believe at the time of the > crash I got the dump for it was down to 64MB or 8MB. Something fishy about that. The max size of a HashBatchContext should be work_mem, more or less (the accounting isn't perfectly accurate I think, but it's not off by an order of magnitude). The only thing I can think of is that you had a huge number of rows with all the same hash value, so that there wasn't any way to split the batch into smaller sections. What are the join keys exactly in this query, and what can you tell us about their data distributions? regards, tom lane
"Francisco Reyes" <lists@stringsutils.com> writes: > What is the overhead for each AfterTriggerEvent? On a 64-bit machine it looks like they'd cost you about 80 bytes each :-(. A good deal of that is palloc overhead --- I wonder if we should get rid of the separate-palloc-for-each-event design? > When does the memory usage for those AfterTriggerEvents gets released? At > commit? Whenever the check is done; you'd have to read the rules about deferred constraints ... regards, tom lane
On 4:55 pm 07/18/08 Tom Lane <tgl@sss.pgh.pa.us> wrote:
> The only thing I can think of is that you had a huge number of rows
> with all the same hash value, so that there wasn't any way to split
> the batch into smaller sections. What are the join keys exactly in
> this query, and what can you tell us about their data distributions?

I can't put actual table or column names, so I am putting the actual
select and explain with all names changed.

insert into customer_transactions
    (record_id, date, type, amount, ids, groupid)
select ca.record_id, coh.date, coh.type, coh.amount, coh.ids, ids.groupid
from customer_original_historical coh,
     cards ca,
     customer_ids ids
where ca.natural_key = coh.natural_key
and ids.ids = coh.ids
and coh.yearmo > '200703';

Hash Join  (cost=712213.57..27293913.33 rows=234402352 width=24)
  Hash Cond: (coh.id = ids.id)
  ->  Hash Join  (cost=551387.26..18799378.16 rows=234402352 width=22)
        Hash Cond: (coh.user_id = ca.user_id)
        ->  Seq Scan on customer_original_historical coh  (cost=0.00..6702501.40 rows=234402352 width=47)
              Filter: (yearmo > '200703'::bpchar)
        ->  Hash  (cost=268355.67..268355.67 rows=14637567 width=32)
              ->  Seq Scan on cards ca  (cost=0.00..268355.67 rows=14637567 width=32)
  ->  Hash  (cost=77883.25..77883.25 rows=5055525 width=6)
        ->  Seq Scan on customer_ids ids  (cost=0.00..77883.25 rows=5055525 width=6)

There was a single table, customer_original_historical, which used a
natural key with a text field. Most queries used
customer_original_historical by itself, or joined against a single other
table, which we should call "area".

The new schema I am testing splits that one single table into 12 tables
per month. In addition I replaced the natural keys with a synthetic
integer key. I also replaced the "area" table with a customer_ids table
which has only two columns: a synthetic key for historical and a region.
In order to have 12 tables per month I grouped all the regions into 12
groups.
Queries are usually within a single region, so what I am trying to
benchmark is whether dividing 24 months of data into 24 sets of 12
regions will perform better than a single large table.

The distribution of the joins is:
There are about 1,000,000 unique natural keys.
Each natural key has on average 15 rows per month.
ids are regions where the natural_keys are. Figure tens of thousands of
natural_keys to an id.

Is that along the lines of what you were looking for?
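[Editor's note: as a sanity check, the distribution described above is consistent with the planner's row estimate from the earlier EXPLAIN, assuming yearmo > '200703' covers the period up to roughly mid-2008:]

```python
# Cross-check the stated data distribution against the EXPLAIN estimate.
keys = 1_000_000              # unique natural keys
rows_per_key_per_month = 15   # stated average
estimated_rows = 234_402_352  # Seq Scan row estimate from the plan

months = estimated_rows / (keys * rows_per_key_per_month)
print(round(months, 1))  # ~15.6 months of data after 2007-03
```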
On Fri, Jul 18, 2008 at 04:48:26PM -0400, Francisco Reyes wrote:
> On 3:55 pm 07/18/08 Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> AfterTriggerEvents: 10553909248 total in 1268 blocks; 20432 free (6
>>> chunks); 10553888816 used
>>
>> Well, that's definitely your problem ...
>
> What is the overhead for each AfterTriggerEvent?

Can you make them not deferred? Then you don't need the memory either.

Have a nice day,
-- 
Martijn van Oosterhout <kleptog@svana.org>   http://svana.org/kleptog/
> Please line up in a tree and maintain the heap invariant while
> boarding. Thank you for flying nlogn airlines.
Martijn van Oosterhout wrote:
> Can you make them not deferred?

How?

I found the issue. I had the foreign key on the master table instead of
the children. Deleted the RI from the master table and put it on the
inherited partitions.

My whole 230 million rows merged in about an hour! And I even had two of
those running at the same time (one setup with 14 partitions per month
and another with 5 partitions per month, to test the difference in
performance). It was so fast I even had to do a count(*) to make sure
both actually merged.

That is 117K rows per second for rows that were about 33 bytes long. That
only comes to about 3 MB/sec plus overhead, but still, 117K rows/sec is
not too shabby.

In case it is of interest to anyone:
2 AMD dual-core 2GHz CPUs
12GB of RAM
shared_buffers = 3GB
work_mem = 64MB
checkpoint_segments = 256
checkpoint_timeout = 10min
LSI controller with 128MB cache with BBU. Write cache enabled.

Many thanks to all who offered suggestions during the troubleshooting.
Francisco Reyes wrote:
> I had the foreign key in the master table instead of the children.
> Deleted RI from master table and put into the inherited partitions.
> My whole 230 million rows merged in about an hour!

Heh -- but are the FKs now checked?  Try inserting something that
violates the constraints and see if they are rejected.

-- 
Alvaro Herrera                       http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Alvaro Herrera writes:
> Heh -- but are the FKs now checked? Try inserting something that
> violates the constraints and see if they are rejected.

I knew it sounded too good to be true.
1- The trigger was not set on the master (i.e. nothing went to the children).
2- The master had no index and no RI, so it was a straight insert.

I corrected this (i.e. set the trigger on the master and the RI on the
children). It has been running for 10 hours and has not finished. The
good news is that memory doesn't seem to be going up.

I will give it till tomorrow AM, and if it hasn't finished I will turn
off the foreign keys on the children. I have already modified the scripts
so I can easily build/drop the foreign keys as needed.
Francisco Reyes wrote:
> I knew it sounded too good to be true.
> 1- The trigger was not set in the master (ie nothing went to the children).
> 2- The master had no index and no RI.. so it was a straight insert.
>
> I corrected (ie set the trigger in the master and RI in the children).
> Has been running for 10 hours and has not finished.

FWIW it tends to be faster to do the bulk load first and add the indexes
and constraints later.  (Though obviously you must be prepared to cope
with the failing rows, if any.)  However, if you do this INSERT/SELECT
thing frequently, this is probably not very workable.

-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.
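[Editor's note: the load-first pattern Alvaro mentions usually takes this shape; all object names in the sketch are hypothetical:]

```sql
-- Sketch: bulk load with constraints and indexes off, then rebuild.
ALTER TABLE customer_transactions_g1 DROP CONSTRAINT ct_g1_ids_fkey;
DROP INDEX ct_g1_ids_idx;

INSERT INTO customer_transactions_g1
    SELECT * FROM staging_table;          -- the bulk load

CREATE INDEX ct_g1_ids_idx ON customer_transactions_g1 (ids);
ALTER TABLE customer_transactions_g1
    ADD CONSTRAINT ct_g1_ids_fkey
    FOREIGN KEY (ids) REFERENCES customer_ids (ids);  -- one-pass validation
```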