Home > mailing lists

Thread: creating of temporary table takes very long

creating of temporary table takes very long

From

"Sriram Dandapani"

Date:

17 April 2006, 15:36:56

create temporary table c_chkpfw_hr_tr_updates as

select * from c_chkpfw_hr_tr a

where exists(select 1 from chkpfw_tr_hr_dimension b

WHERE a.firstoccurrence = b.firstoccurrence

AND a.sentryid_id = b.sentryid_id

AND a.node_id = b.node_id

AND a.customerid_id = b.customerid_id

AND coalesce(a.interface_id,0) = coalesce(b.interface_id,0)

AND coalesce(a.source_id,0) = coalesce(b.source_id,0)

AND coalesce(a.destination_id,0) = coalesce(b.destination_id,0)

AND coalesce(a.sourceport_id,0) = coalesce(b.sourceport_id,0)

AND coalesce(a.destinationport_id,0) = coalesce(b.destinationport_id,0)

AND coalesce(a.inoutbound_id,0) = coalesce(b.inoutbound_id,0)

AND coalesce(a.action_id,0) = coalesce(b.action_id,0)

AND coalesce(a.protocol_id,0) = coalesce(b.protocol_id,0)

AND coalesce(a.service_id,0) = coalesce(b.service_id,0)

AND coalesce(a.sourcezone_id,0) = coalesce(b.sourcezone_id,0)

AND coalesce(a.destinationzone_id,0) = coalesce(b.destinationzone_id,0));

This takes forever (I have to cancel the statement each time)

c_chkpfw_hr_tr has about 20000 rows

chkpfw_tr_hr_dimension has 150K rows

c_chkpfw_hr_tr has same indexes as chkpfw_tr_hr_dimension

For such a small data set, this seems like a mystery. The only other alternative I have is to use cursors which are also very slow for row sets of 10- 15K or more.

Re: creating of temporary table takes very long

From

"Sriram Dandapani"

Date:

17 April 2006, 15:52:40

Explain analyze on the select statement that is the basis for temp table data takes forever. I turned off enable_seqscan but it did not have an effect

From: pgsql-performance-owner@postgresql.org [mailto:pgsql-performance-owner@postgresql.org] On Behalf Of Sriram Dandapani
Sent: Monday, April 17, 2006 11:37 AM
To: Pgsql-Performance (E-mail)
Subject: [PERFORM] creating of temporary table takes very long

create temporary table c_chkpfw_hr_tr_updates as

select * from c_chkpfw_hr_tr a

where exists(select 1 from chkpfw_tr_hr_dimension b

WHERE a.firstoccurrence = b.firstoccurrence

AND a.sentryid_id = b.sentryid_id

AND a.node_id = b.node_id

AND a.customerid_id = b.customerid_id

AND coalesce(a.interface_id,0) = coalesce(b.interface_id,0)

AND coalesce(a.source_id,0) = coalesce(b.source_id,0)

AND coalesce(a.destination_id,0) = coalesce(b.destination_id,0)

AND coalesce(a.sourceport_id,0) = coalesce(b.sourceport_id,0)

AND coalesce(a.destinationport_id,0) = coalesce(b.destinationport_id,0)

AND coalesce(a.inoutbound_id,0) = coalesce(b.inoutbound_id,0)

AND coalesce(a.action_id,0) = coalesce(b.action_id,0)

AND coalesce(a.protocol_id,0) = coalesce(b.protocol_id,0)

AND coalesce(a.service_id,0) = coalesce(b.service_id,0)

AND coalesce(a.sourcezone_id,0) = coalesce(b.sourcezone_id,0)

AND coalesce(a.destinationzone_id,0) = coalesce(b.destinationzone_id,0));

This takes forever (I have to cancel the statement each time)

c_chkpfw_hr_tr has about 20000 rows

chkpfw_tr_hr_dimension has 150K rows

c_chkpfw_hr_tr has same indexes as chkpfw_tr_hr_dimension

For such a small data set, this seems like a mystery. The only other alternative I have is to use cursors which are also very slow for row sets of 10- 15K or more.

Re: creating of temporary table takes very long

From

Tom Lane

Date:

17 April 2006, 16:28:44

"Sriram Dandapani" <sdandapani@counterpane.com> writes:
> [ query snipped ]
> This takes forever (I have to cancel the statement each time)

How long did you wait?

> c_chkpfw_hr_tr has same indexes as chkpfw_tr_hr_dimension

Which would be what exactly?  What does EXPLAIN show for that SELECT?
(I won't make you post EXPLAIN ANALYZE, if you haven't got the patience
to let it finish, but you should at least provide EXPLAIN results.)

            regards, tom lane

Re: creating of temporary table takes very long

From

"Sriram Dandapani"

Date:

17 April 2006, 16:49:24

Explain output. I tried explain analyze but pgadmin froze after 10
minutes.


QUERY PLAN
"Seq Scan on c_chkpfw_hr_tr a  (cost=0.00..225975659.89 rows=11000
width=136)"
"  Filter: (subplan)"
"  SubPlan"
"    ->  Bitmap Heap Scan on chkpfw_tr_hr_dimension b
(cost=1474.64..10271.13 rows=1 width=0)"
"          Recheck Cond: (($0 = firstoccurrence) AND ($1 = sentryid_id)
AND ($2 = node_id))"
"          Filter: (($3 = customerid_id) AND (COALESCE($4, 0) =
COALESCE(interface_id, 0)) AND (COALESCE($5, 0) = COALESCE(source_id,
0)) AND (COALESCE($6, 0) = COALESCE(destination_id, 0)) AND
(COALESCE($7, 0) = COALESCE(sourceport_id, 0)) AND (COALESCE($8, 0) =
COALESCE(destinationport_id, 0)) AND (COALESCE($9, 0) =
COALESCE(inoutbound_id, 0)) AND (COALESCE($10, 0) = COALESCE(action_id,
0)) AND (COALESCE($11, 0) = COALESCE(protocol_id, 0)) AND (COALESCE($12,
0) = COALESCE(service_id, 0)) AND (COALESCE($13, 0) =
COALESCE(sourcezone_id, 0)) AND (COALESCE($14, 0) =
COALESCE(destinationzone_id, 0)))"
"          ->  Bitmap Index Scan on chkpfw_tr_hr_idx1
(cost=0.00..1474.64 rows=38663 width=0)"
"                Index Cond: (($0 = firstoccurrence) AND ($1 =
sentryid_id) AND ($2 = node_id))"

-----Original Message-----
From: Tom Lane [mailto:tgl@sss.pgh.pa.us]
Sent: Monday, April 17, 2006 12:29 PM
To: Sriram Dandapani
Cc: Pgsql-Performance (E-mail)
Subject: Re: [PERFORM] creating of temporary table takes very long

"Sriram Dandapani" <sdandapani@counterpane.com> writes:
> [ query snipped ]
> This takes forever (I have to cancel the statement each time)

How long did you wait?

> c_chkpfw_hr_tr has same indexes as chkpfw_tr_hr_dimension

Which would be what exactly?  What does EXPLAIN show for that SELECT?
(I won't make you post EXPLAIN ANALYZE, if you haven't got the patience
to let it finish, but you should at least provide EXPLAIN results.)

            regards, tom lane

Re: creating of temporary table takes very long

From

"Sriram Dandapani"

Date:

17 April 2006, 17:05:41

Got an explain analyze output..Here it is


"Seq Scan on c_chkpfw_hr_tr a  (cost=0.00..225975659.89 rows=11000
width=136) (actual time=2.345..648070.474 rows=22001 loops=1)"
"  Filter: (subplan)"
"  SubPlan"
"    ->  Bitmap Heap Scan on chkpfw_tr_hr_dimension b
(cost=1474.64..10271.13 rows=1 width=0) (actual time=29.439..29.439
rows=1 loops=22001)"
"          Recheck Cond: (($0 = firstoccurrence) AND ($1 = sentryid_id)
AND ($2 = node_id))"
"          Filter: (($3 = customerid_id) AND (COALESCE($4, 0) =
COALESCE(interface_id, 0)) AND (COALESCE($5, 0) = COALESCE(source_id,
0)) AND (COALESCE($6, 0) = COALESCE(destination_id, 0)) AND
(COALESCE($7, 0) = COALESCE(sourceport_id, 0)) AND (COALESCE($8 (..)"
"          ->  Bitmap Index Scan on chkpfw_tr_hr_idx1
(cost=0.00..1474.64 rows=38663 width=0) (actual time=12.144..12.144
rows=33026 loops=22001)"
"                Index Cond: (($0 = firstoccurrence) AND ($1 =
sentryid_id) AND ($2 = node_id))"
"Total runtime: 648097.800 ms"

Regards

Sriram

-----Original Message-----
From: Tom Lane [mailto:tgl@sss.pgh.pa.us]
Sent: Monday, April 17, 2006 12:29 PM
To: Sriram Dandapani
Cc: Pgsql-Performance (E-mail)
Subject: Re: [PERFORM] creating of temporary table takes very long

"Sriram Dandapani" <sdandapani@counterpane.com> writes:
> [ query snipped ]
> This takes forever (I have to cancel the statement each time)

How long did you wait?

> c_chkpfw_hr_tr has same indexes as chkpfw_tr_hr_dimension

Which would be what exactly?  What does EXPLAIN show for that SELECT?
(I won't make you post EXPLAIN ANALYZE, if you haven't got the patience
to let it finish, but you should at least provide EXPLAIN results.)

            regards, tom lane

Re: creating of temporary table takes very long

From

Tom Lane

Date:

18 April 2006, 13:10:10

"Sriram Dandapani" <sdandapani@counterpane.com> writes:
> Got an explain analyze output..Here it is
> "Seq Scan on c_chkpfw_hr_tr a  (cost=0.00..225975659.89 rows=11000
> width=136) (actual time=2.345..648070.474 rows=22001 loops=1)"
> "  Filter: (subplan)"
> "  SubPlan"
> "    ->  Bitmap Heap Scan on chkpfw_tr_hr_dimension b
> (cost=1474.64..10271.13 rows=1 width=0) (actual time=29.439..29.439
> rows=1 loops=22001)"
> "          Recheck Cond: (($0 = firstoccurrence) AND ($1 = sentryid_id)
> AND ($2 = node_id))"
> "          Filter: (($3 = customerid_id) AND (COALESCE($4, 0) =
> COALESCE(interface_id, 0)) AND (COALESCE($5, 0) = COALESCE(source_id,
> 0)) AND (COALESCE($6, 0) = COALESCE(destination_id, 0)) AND
> (COALESCE($7, 0) = COALESCE(sourceport_id, 0)) AND (COALESCE($8 (..)"
> "          ->  Bitmap Index Scan on chkpfw_tr_hr_idx1
> (cost=0.00..1474.64 rows=38663 width=0) (actual time=12.144..12.144
> rows=33026 loops=22001)"
> "                Index Cond: (($0 = firstoccurrence) AND ($1 =
> sentryid_id) AND ($2 = node_id))"
> "Total runtime: 648097.800 ms"

That's probably about as good a query plan as you can hope for given
the way the query is written.  Those COALESCE comparisons are all
unindexable (unless you make functional indexes on the COALESCE
expressions).  You might get somewhere by converting the EXISTS
to an IN, though.

            regards, tom lane

Re: creating of temporary table takes very long

From

"Sriram Dandapani"

Date:

18 April 2006, 13:13:26

Thx Tom

I guess I have to abandon the bulk update. The columns in the where
clause comprise 80% of the table columns..So indexing all may not help.
The target table will have on average 60-180 million rows.

I will attempt the in instead of exist and let you know the result

-----Original Message-----
From: Tom Lane [mailto:tgl@sss.pgh.pa.us]
Sent: Tuesday, April 18, 2006 9:10 AM
To: Sriram Dandapani
Cc: Pgsql-Performance (E-mail)
Subject: Re: [PERFORM] creating of temporary table takes very long

"Sriram Dandapani" <sdandapani@counterpane.com> writes:
> Got an explain analyze output..Here it is
> "Seq Scan on c_chkpfw_hr_tr a  (cost=0.00..225975659.89 rows=11000
> width=136) (actual time=2.345..648070.474 rows=22001 loops=1)"
> "  Filter: (subplan)"
> "  SubPlan"
> "    ->  Bitmap Heap Scan on chkpfw_tr_hr_dimension b
> (cost=1474.64..10271.13 rows=1 width=0) (actual time=29.439..29.439
> rows=1 loops=22001)"
> "          Recheck Cond: (($0 = firstoccurrence) AND ($1 =
sentryid_id)
> AND ($2 = node_id))"
> "          Filter: (($3 = customerid_id) AND (COALESCE($4, 0) =
> COALESCE(interface_id, 0)) AND (COALESCE($5, 0) = COALESCE(source_id,
> 0)) AND (COALESCE($6, 0) = COALESCE(destination_id, 0)) AND
> (COALESCE($7, 0) = COALESCE(sourceport_id, 0)) AND (COALESCE($8 (..)"
> "          ->  Bitmap Index Scan on chkpfw_tr_hr_idx1
> (cost=0.00..1474.64 rows=38663 width=0) (actual time=12.144..12.144
> rows=33026 loops=22001)"
> "                Index Cond: (($0 = firstoccurrence) AND ($1 =
> sentryid_id) AND ($2 = node_id))"
> "Total runtime: 648097.800 ms"

That's probably about as good a query plan as you can hope for given
the way the query is written.  Those COALESCE comparisons are all
unindexable (unless you make functional indexes on the COALESCE
expressions).  You might get somewhere by converting the EXISTS
to an IN, though.

            regards, tom lane

Re: creating of temporary table takes very long

From

"Jim C. Nasby"

Date:

18 April 2006, 19:34:21

You might try rewriting the coalesces into a row comparison...

WHERE row($4, $5, ...) IS NOT DISTINCT FROM row(interface_id, source_id, ...)

See
http://www.postgresql.org/docs/8.1/interactive/functions-comparisons.html#AEN13408

Note that the docs only show IS DISTINCT FROM, so you might have to do

WHERE NOT row(...) IS DISTINCT FROM row(...)

On Tue, Apr 18, 2006 at 09:13:04AM -0700, Sriram Dandapani wrote:
> Thx Tom
>
> I guess I have to abandon the bulk update. The columns in the where
> clause comprise 80% of the table columns..So indexing all may not help.
> The target table will have on average 60-180 million rows.
>
> I will attempt the in instead of exist and let you know the result
>
> -----Original Message-----
> From: Tom Lane [mailto:tgl@sss.pgh.pa.us]
> Sent: Tuesday, April 18, 2006 9:10 AM
> To: Sriram Dandapani
> Cc: Pgsql-Performance (E-mail)
> Subject: Re: [PERFORM] creating of temporary table takes very long
>
> "Sriram Dandapani" <sdandapani@counterpane.com> writes:
> > Got an explain analyze output..Here it is
> > "Seq Scan on c_chkpfw_hr_tr a  (cost=0.00..225975659.89 rows=11000
> > width=136) (actual time=2.345..648070.474 rows=22001 loops=1)"
> > "  Filter: (subplan)"
> > "  SubPlan"
> > "    ->  Bitmap Heap Scan on chkpfw_tr_hr_dimension b
> > (cost=1474.64..10271.13 rows=1 width=0) (actual time=29.439..29.439
> > rows=1 loops=22001)"
> > "          Recheck Cond: (($0 = firstoccurrence) AND ($1 =
> sentryid_id)
> > AND ($2 = node_id))"
> > "          Filter: (($3 = customerid_id) AND (COALESCE($4, 0) =
> > COALESCE(interface_id, 0)) AND (COALESCE($5, 0) = COALESCE(source_id,
> > 0)) AND (COALESCE($6, 0) = COALESCE(destination_id, 0)) AND
> > (COALESCE($7, 0) = COALESCE(sourceport_id, 0)) AND (COALESCE($8 (..)"
> > "          ->  Bitmap Index Scan on chkpfw_tr_hr_idx1
> > (cost=0.00..1474.64 rows=38663 width=0) (actual time=12.144..12.144
> > rows=33026 loops=22001)"
> > "                Index Cond: (($0 = firstoccurrence) AND ($1 =
> > sentryid_id) AND ($2 = node_id))"
> > "Total runtime: 648097.800 ms"
>
> That's probably about as good a query plan as you can hope for given
> the way the query is written.  Those COALESCE comparisons are all
> unindexable (unless you make functional indexes on the COALESCE
> expressions).  You might get somewhere by converting the EXISTS
> to an IN, though.
>
>             regards, tom lane
>
> ---------------------------(end of broadcast)---------------------------
> TIP 2: Don't 'kill -9' the postmaster
>

--
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461

Re: creating of temporary table takes very long

From

Tom Lane

Date:

18 April 2006, 19:48:41

"Jim C. Nasby" <jnasby@pervasive.com> writes:
> You might try rewriting the coalesces into a row comparison...
> WHERE row($4, $5, ...) IS NOT DISTINCT FROM row(interface_id, source_id, ...)

That would be notationally nicer, but no help performance-wise; I'm
fairly sure that IS DISTINCT doesn't get optimized in any fashion
whatsoever :-(

What might be worth trying is functional indexes on the COALESCE(foo,0)
expressions.  Or if possible, consider revising your data schema to
avoid using NULLs in a way that requires assuming that NULL = NULL.

            regards, tom lane