Thread: creating of temporary table takes very long
create temporary table c_chkpfw_hr_tr_updates as
select * from c_chkpfw_hr_tr a
where exists(select 1 from chkpfw_tr_hr_dimension b
WHERE a.firstoccurrence = b.firstoccurrence
AND a.sentryid_id = b.sentryid_id
AND a.node_id = b.node_id
AND a.customerid_id = b.customerid_id
AND coalesce(a.interface_id,0) = coalesce(b.interface_id,0)
AND coalesce(a.source_id,0) = coalesce(b.source_id,0)
AND coalesce(a.destination_id,0) = coalesce(b.destination_id,0)
AND coalesce(a.sourceport_id,0) = coalesce(b.sourceport_id,0)
AND coalesce(a.destinationport_id,0) = coalesce(b.destinationport_id,0)
AND coalesce(a.inoutbound_id,0) = coalesce(b.inoutbound_id,0)
AND coalesce(a.action_id,0) = coalesce(b.action_id,0)
AND coalesce(a.protocol_id,0) = coalesce(b.protocol_id,0)
AND coalesce(a.service_id,0) = coalesce(b.service_id,0)
AND coalesce(a.sourcezone_id,0) = coalesce(b.sourcezone_id,0)
AND coalesce(a.destinationzone_id,0) = coalesce(b.destinationzone_id,0));
This takes forever (I have to cancel the statement each time)
c_chkpfw_hr_tr has about 20000 rows
chkpfw_tr_hr_dimension has 150K rows
c_chkpfw_hr_tr has same indexes as chkpfw_tr_hr_dimension
For such a small data set, this seems like a mystery. The only other alternative I have is to use cursors which are also very slow for row sets of 10- 15K or more.
Explain analyze on the select statement that is the basis for temp table data takes forever. I turned off enable_seqscan but it did not have an effect
From: pgsql-performance-owner@postgresql.org [mailto:pgsql-performance-owner@postgresql.org] On Behalf Of Sriram Dandapani
Sent: Monday, April 17, 2006 11:37 AM
To: Pgsql-Performance (E-mail)
Subject: [PERFORM] creating of temporary table takes very long
create temporary table c_chkpfw_hr_tr_updates as
select * from c_chkpfw_hr_tr a
where exists(select 1 from chkpfw_tr_hr_dimension b
WHERE a.firstoccurrence = b.firstoccurrence
AND a.sentryid_id = b.sentryid_id
AND a.node_id = b.node_id
AND a.customerid_id = b.customerid_id
AND coalesce(a.interface_id,0) = coalesce(b.interface_id,0)
AND coalesce(a.source_id,0) = coalesce(b.source_id,0)
AND coalesce(a.destination_id,0) = coalesce(b.destination_id,0)
AND coalesce(a.sourceport_id,0) = coalesce(b.sourceport_id,0)
AND coalesce(a.destinationport_id,0) = coalesce(b.destinationport_id,0)
AND coalesce(a.inoutbound_id,0) = coalesce(b.inoutbound_id,0)
AND coalesce(a.action_id,0) = coalesce(b.action_id,0)
AND coalesce(a.protocol_id,0) = coalesce(b.protocol_id,0)
AND coalesce(a.service_id,0) = coalesce(b.service_id,0)
AND coalesce(a.sourcezone_id,0) = coalesce(b.sourcezone_id,0)
AND coalesce(a.destinationzone_id,0) = coalesce(b.destinationzone_id,0));
This takes forever (I have to cancel the statement each time)
c_chkpfw_hr_tr has about 20000 rows
chkpfw_tr_hr_dimension has 150K rows
c_chkpfw_hr_tr has same indexes as chkpfw_tr_hr_dimension
For such a small data set, this seems like a mystery. The only other alternative I have is to use cursors which are also very slow for row sets of 10- 15K or more.
"Sriram Dandapani" <sdandapani@counterpane.com> writes: > [ query snipped ] > This takes forever (I have to cancel the statement each time) How long did you wait? > c_chkpfw_hr_tr has same indexes as chkpfw_tr_hr_dimension Which would be what exactly? What does EXPLAIN show for that SELECT? (I won't make you post EXPLAIN ANALYZE, if you haven't got the patience to let it finish, but you should at least provide EXPLAIN results.) regards, tom lane
Explain output. I tried explain analyze but pgadmin froze after 10 minutes. QUERY PLAN "Seq Scan on c_chkpfw_hr_tr a (cost=0.00..225975659.89 rows=11000 width=136)" " Filter: (subplan)" " SubPlan" " -> Bitmap Heap Scan on chkpfw_tr_hr_dimension b (cost=1474.64..10271.13 rows=1 width=0)" " Recheck Cond: (($0 = firstoccurrence) AND ($1 = sentryid_id) AND ($2 = node_id))" " Filter: (($3 = customerid_id) AND (COALESCE($4, 0) = COALESCE(interface_id, 0)) AND (COALESCE($5, 0) = COALESCE(source_id, 0)) AND (COALESCE($6, 0) = COALESCE(destination_id, 0)) AND (COALESCE($7, 0) = COALESCE(sourceport_id, 0)) AND (COALESCE($8, 0) = COALESCE(destinationport_id, 0)) AND (COALESCE($9, 0) = COALESCE(inoutbound_id, 0)) AND (COALESCE($10, 0) = COALESCE(action_id, 0)) AND (COALESCE($11, 0) = COALESCE(protocol_id, 0)) AND (COALESCE($12, 0) = COALESCE(service_id, 0)) AND (COALESCE($13, 0) = COALESCE(sourcezone_id, 0)) AND (COALESCE($14, 0) = COALESCE(destinationzone_id, 0)))" " -> Bitmap Index Scan on chkpfw_tr_hr_idx1 (cost=0.00..1474.64 rows=38663 width=0)" " Index Cond: (($0 = firstoccurrence) AND ($1 = sentryid_id) AND ($2 = node_id))" -----Original Message----- From: Tom Lane [mailto:tgl@sss.pgh.pa.us] Sent: Monday, April 17, 2006 12:29 PM To: Sriram Dandapani Cc: Pgsql-Performance (E-mail) Subject: Re: [PERFORM] creating of temporary table takes very long "Sriram Dandapani" <sdandapani@counterpane.com> writes: > [ query snipped ] > This takes forever (I have to cancel the statement each time) How long did you wait? > c_chkpfw_hr_tr has same indexes as chkpfw_tr_hr_dimension Which would be what exactly? What does EXPLAIN show for that SELECT? (I won't make you post EXPLAIN ANALYZE, if you haven't got the patience to let it finish, but you should at least provide EXPLAIN results.) regards, tom lane
Got an explain analyze output..Here it is "Seq Scan on c_chkpfw_hr_tr a (cost=0.00..225975659.89 rows=11000 width=136) (actual time=2.345..648070.474 rows=22001 loops=1)" " Filter: (subplan)" " SubPlan" " -> Bitmap Heap Scan on chkpfw_tr_hr_dimension b (cost=1474.64..10271.13 rows=1 width=0) (actual time=29.439..29.439 rows=1 loops=22001)" " Recheck Cond: (($0 = firstoccurrence) AND ($1 = sentryid_id) AND ($2 = node_id))" " Filter: (($3 = customerid_id) AND (COALESCE($4, 0) = COALESCE(interface_id, 0)) AND (COALESCE($5, 0) = COALESCE(source_id, 0)) AND (COALESCE($6, 0) = COALESCE(destination_id, 0)) AND (COALESCE($7, 0) = COALESCE(sourceport_id, 0)) AND (COALESCE($8 (..)" " -> Bitmap Index Scan on chkpfw_tr_hr_idx1 (cost=0.00..1474.64 rows=38663 width=0) (actual time=12.144..12.144 rows=33026 loops=22001)" " Index Cond: (($0 = firstoccurrence) AND ($1 = sentryid_id) AND ($2 = node_id))" "Total runtime: 648097.800 ms" Regards Sriram -----Original Message----- From: Tom Lane [mailto:tgl@sss.pgh.pa.us] Sent: Monday, April 17, 2006 12:29 PM To: Sriram Dandapani Cc: Pgsql-Performance (E-mail) Subject: Re: [PERFORM] creating of temporary table takes very long "Sriram Dandapani" <sdandapani@counterpane.com> writes: > [ query snipped ] > This takes forever (I have to cancel the statement each time) How long did you wait? > c_chkpfw_hr_tr has same indexes as chkpfw_tr_hr_dimension Which would be what exactly? What does EXPLAIN show for that SELECT? (I won't make you post EXPLAIN ANALYZE, if you haven't got the patience to let it finish, but you should at least provide EXPLAIN results.) regards, tom lane
"Sriram Dandapani" <sdandapani@counterpane.com> writes: > Got an explain analyze output..Here it is > "Seq Scan on c_chkpfw_hr_tr a (cost=0.00..225975659.89 rows=11000 > width=136) (actual time=2.345..648070.474 rows=22001 loops=1)" > " Filter: (subplan)" > " SubPlan" > " -> Bitmap Heap Scan on chkpfw_tr_hr_dimension b > (cost=1474.64..10271.13 rows=1 width=0) (actual time=29.439..29.439 > rows=1 loops=22001)" > " Recheck Cond: (($0 = firstoccurrence) AND ($1 = sentryid_id) > AND ($2 = node_id))" > " Filter: (($3 = customerid_id) AND (COALESCE($4, 0) = > COALESCE(interface_id, 0)) AND (COALESCE($5, 0) = COALESCE(source_id, > 0)) AND (COALESCE($6, 0) = COALESCE(destination_id, 0)) AND > (COALESCE($7, 0) = COALESCE(sourceport_id, 0)) AND (COALESCE($8 (..)" > " -> Bitmap Index Scan on chkpfw_tr_hr_idx1 > (cost=0.00..1474.64 rows=38663 width=0) (actual time=12.144..12.144 > rows=33026 loops=22001)" > " Index Cond: (($0 = firstoccurrence) AND ($1 = > sentryid_id) AND ($2 = node_id))" > "Total runtime: 648097.800 ms" That's probably about as good a query plan as you can hope for given the way the query is written. Those COALESCE comparisons are all unindexable (unless you make functional indexes on the COALESCE expressions). You might get somewhere by converting the EXISTS to an IN, though. regards, tom lane
Thx Tom I guess I have to abandon the bulk update. The columns in the where clause comprise 80% of the table columns..So indexing all may not help. The target table will have on average 60-180 million rows. I will attempt the in instead of exist and let you know the result -----Original Message----- From: Tom Lane [mailto:tgl@sss.pgh.pa.us] Sent: Tuesday, April 18, 2006 9:10 AM To: Sriram Dandapani Cc: Pgsql-Performance (E-mail) Subject: Re: [PERFORM] creating of temporary table takes very long "Sriram Dandapani" <sdandapani@counterpane.com> writes: > Got an explain analyze output..Here it is > "Seq Scan on c_chkpfw_hr_tr a (cost=0.00..225975659.89 rows=11000 > width=136) (actual time=2.345..648070.474 rows=22001 loops=1)" > " Filter: (subplan)" > " SubPlan" > " -> Bitmap Heap Scan on chkpfw_tr_hr_dimension b > (cost=1474.64..10271.13 rows=1 width=0) (actual time=29.439..29.439 > rows=1 loops=22001)" > " Recheck Cond: (($0 = firstoccurrence) AND ($1 = sentryid_id) > AND ($2 = node_id))" > " Filter: (($3 = customerid_id) AND (COALESCE($4, 0) = > COALESCE(interface_id, 0)) AND (COALESCE($5, 0) = COALESCE(source_id, > 0)) AND (COALESCE($6, 0) = COALESCE(destination_id, 0)) AND > (COALESCE($7, 0) = COALESCE(sourceport_id, 0)) AND (COALESCE($8 (..)" > " -> Bitmap Index Scan on chkpfw_tr_hr_idx1 > (cost=0.00..1474.64 rows=38663 width=0) (actual time=12.144..12.144 > rows=33026 loops=22001)" > " Index Cond: (($0 = firstoccurrence) AND ($1 = > sentryid_id) AND ($2 = node_id))" > "Total runtime: 648097.800 ms" That's probably about as good a query plan as you can hope for given the way the query is written. Those COALESCE comparisons are all unindexable (unless you make functional indexes on the COALESCE expressions). You might get somewhere by converting the EXISTS to an IN, though. regards, tom lane
You might try rewriting the coalesces into a row comparison... WHERE row($4, $5, ...) IS NOT DISTINCT FROM row(interface_id, source_id, ...) See http://www.postgresql.org/docs/8.1/interactive/functions-comparisons.html#AEN13408 Note that the docs only show IS DISTINCT FROM, so you might have to do WHERE NOT row(...) IS DISTINCT FROM row(...) On Tue, Apr 18, 2006 at 09:13:04AM -0700, Sriram Dandapani wrote: > Thx Tom > > I guess I have to abandon the bulk update. The columns in the where > clause comprise 80% of the table columns..So indexing all may not help. > The target table will have on average 60-180 million rows. > > I will attempt the in instead of exist and let you know the result > > -----Original Message----- > From: Tom Lane [mailto:tgl@sss.pgh.pa.us] > Sent: Tuesday, April 18, 2006 9:10 AM > To: Sriram Dandapani > Cc: Pgsql-Performance (E-mail) > Subject: Re: [PERFORM] creating of temporary table takes very long > > "Sriram Dandapani" <sdandapani@counterpane.com> writes: > > Got an explain analyze output..Here it is > > "Seq Scan on c_chkpfw_hr_tr a (cost=0.00..225975659.89 rows=11000 > > width=136) (actual time=2.345..648070.474 rows=22001 loops=1)" > > " Filter: (subplan)" > > " SubPlan" > > " -> Bitmap Heap Scan on chkpfw_tr_hr_dimension b > > (cost=1474.64..10271.13 rows=1 width=0) (actual time=29.439..29.439 > > rows=1 loops=22001)" > > " Recheck Cond: (($0 = firstoccurrence) AND ($1 = > sentryid_id) > > AND ($2 = node_id))" > > " Filter: (($3 = customerid_id) AND (COALESCE($4, 0) = > > COALESCE(interface_id, 0)) AND (COALESCE($5, 0) = COALESCE(source_id, > > 0)) AND (COALESCE($6, 0) = COALESCE(destination_id, 0)) AND > > (COALESCE($7, 0) = COALESCE(sourceport_id, 0)) AND (COALESCE($8 (..)" > > " -> Bitmap Index Scan on chkpfw_tr_hr_idx1 > > (cost=0.00..1474.64 rows=38663 width=0) (actual time=12.144..12.144 > > rows=33026 loops=22001)" > > " Index Cond: (($0 = firstoccurrence) AND ($1 = > > sentryid_id) AND ($2 = node_id))" > > "Total runtime: 648097.800 ms" > > That's probably about as good a query plan as you can hope for given > the way the query is written. Those COALESCE comparisons are all > unindexable (unless you make functional indexes on the COALESCE > expressions). You might get somewhere by converting the EXISTS > to an IN, though. > > regards, tom lane > > ---------------------------(end of broadcast)--------------------------- > TIP 2: Don't 'kill -9' the postmaster > -- Jim C. Nasby, Sr. Engineering Consultant jnasby@pervasive.com Pervasive Software http://pervasive.com work: 512-231-6117 vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461
"Jim C. Nasby" <jnasby@pervasive.com> writes: > You might try rewriting the coalesces into a row comparison... > WHERE row($4, $5, ...) IS NOT DISTINCT FROM row(interface_id, source_id, ...) That would be notationally nicer, but no help performance-wise; I'm fairly sure that IS DISTINCT doesn't get optimized in any fashion whatsoever :-( What might be worth trying is functional indexes on the COALESCE(foo,0) expressions. Or if possible, consider revising your data schema to avoid using NULLs in a way that requires assuming that NULL = NULL. regards, tom lane