Thread: Too slow
How can I improve speed on my queries. For example this query takes one day executing itself and it has not finalized !!! "create table tmp_partes as select * from partes where identificacion not in (select cedula from sujetos)" partes have 1888000 rows, an index on identificacion sujetos have 5500000 rows, an index on cedula
PG version? Maybe worth to try NOT EXISTS instead of NOT IN -----Original Message----- From: pgsql-admin-owner@postgresql.org [mailto:pgsql-admin-owner@postgresql.org] On Behalf Of Sabio - PSQL Sent: Tuesday, March 22, 2005 7:23 AM To: PostgreSQL Admin Subject: [ADMIN] Too slow How can I improve speed on my queries. For example this query takes one day executing itself and it has not finalized !!! "create table tmp_partes as select * from partes where identificacion not in (select cedula from sujetos)" partes have 1888000 rows, an index on identificacion sujetos have 5500000 rows, an index on cedula ---------------------------(end of broadcast)--------------------------- TIP 4: Don't 'kill -9' the postmaster
Sabio - PSQL wrote: > How can I improve speed on my queries. For example this query takes > one day executing itself and it has not finalized !!! > "create table tmp_partes as select * from partes where identificacion > not in (select cedula from sujetos)" > > partes have 1888000 rows, an index on identificacion > sujetos have 5500000 rows, an index on cedula > > > > > > ---------------------------(end of broadcast)--------------------------- > TIP 4: Don't 'kill -9' the postmaster > > > try create table tmp_partes as select * from partes where not exists (select cedula from sujetos where cedula = partes.identificacion); The "not in (subselect)" is very slow in postgresql. HTH, chris
PG version = 8.0 over Linux Fedora Core 3 PostgreSQL was compiled from source Lee Wu wrote: >PG version? > >Maybe worth to try NOT EXISTS instead of NOT IN > >-----Original Message----- >From: pgsql-admin-owner@postgresql.org >[mailto:pgsql-admin-owner@postgresql.org] On Behalf Of Sabio - PSQL >Sent: Tuesday, March 22, 2005 7:23 AM >To: PostgreSQL Admin >Subject: [ADMIN] Too slow > >How can I improve speed on my queries. For example this query takes one >day executing itself and it has not finalized !!! >"create table tmp_partes as select * from partes where identificacion >not in (select cedula from sujetos)" > >partes have 1888000 rows, an index on identificacion >sujetos have 5500000 rows, an index on cedula > > > > > >---------------------------(end of broadcast)--------------------------- >TIP 4: Don't 'kill -9' the postmaster > > > >
WITH: select * from partes where cedula not in (select cedula from sujetos) Seq Scan on partes (cost=0.00..168063925339.69 rows=953831 width=109) Filter: (NOT (subplan)) SubPlan -> Seq Scan on sujetos (cost=0.00..162348.43 rows=5540143 width=15) WITH: select * from partes where not exists (select cedula from sujetos where cedula=partes.cedula) Seq Scan on partes (cost=0.00..7373076.94 rows=953831 width=109) Filter: (NOT (subplan)) SubPlan -> Index Scan using sujetos_pkey on sujetos (cost=0.00..3.84 rows=1 width=15) Index Cond: ((cedula)::text = ($0)::text) Thomas F. O'Connell wrote: > Please post the results of that query as run through EXPLAIN ANALYZE. > > Also, I'm going to reply to this on pgsql-performance, which is > probably where it better belongs. > > -tfo > > -- > Thomas F. O'Connell > Co-Founder, Information Architect > Sitening, LLC > http://www.sitening.com/ > 110 30th Avenue North, Suite 6 > Nashville, TN 37203-6320 > 615-260-0005 > > On Mar 22, 2005, at 8:23 AM, Sabio - PSQL wrote: > >> How can I improve speed on my queries. For example this query takes >> one day executing itself and it has not finalized !!! >> "create table tmp_partes as select * from partes where identificacion >> not in (select cedula from sujetos)" >> >> partes have 1888000 rows, an index on identificacion >> sujetos have 5500000 rows, an index on cedula > > > >
"Chris Hoover" <revoohc@sermonaudio.com> writes: > The "not in (subselect)" is very slow in postgresql. It's OK as long as the subselect result is small enough to hash, but with 5500000 rows that's not going to happen :-(. Another issue is that if there are any NULLs in the subselect then you will probably not like the results. They are correct per spec but not very intuitive. Personally I'd try ye olde outer join trick: select partes.* from partes left join sujetos on (identificacion = cedula) where cedula is null; A merge join on this would likely be the most effective solution. regards, tom lane