Thread: Too slow

Too slow

From
Sabio - PSQL
Date:
How can I improve speed on my queries. For example this query takes one
day executing itself and it has not finalized !!!
"create table tmp_partes as select * from partes where identificacion
not in (select cedula from sujetos)"

partes have 1888000 rows, an index on identificacion
sujetos have 5500000 rows, an index on cedula





Re: Too slow

From
"Lee Wu"
Date:
PG version?

Maybe worth to try NOT EXISTS instead of NOT IN

-----Original Message-----
From: pgsql-admin-owner@postgresql.org
[mailto:pgsql-admin-owner@postgresql.org] On Behalf Of Sabio - PSQL
Sent: Tuesday, March 22, 2005 7:23 AM
To: PostgreSQL Admin
Subject: [ADMIN] Too slow

How can I improve speed on my queries. For example this query takes one
day executing itself and it has not finalized !!!
"create table tmp_partes as select * from partes where identificacion
not in (select cedula from sujetos)"

partes have 1888000 rows, an index on identificacion
sujetos have 5500000 rows, an index on cedula





---------------------------(end of broadcast)---------------------------
TIP 4: Don't 'kill -9' the postmaster

Re: Too slow

From
"Chris Hoover"
Date:
Sabio - PSQL wrote:

> How can I improve speed on my queries. For example this query takes
> one day executing itself and it has not finalized !!!
> "create table tmp_partes as select * from partes where identificacion
> not in (select cedula from sujetos)"
>
> partes have 1888000 rows, an index on identificacion
> sujetos have 5500000 rows, an index on cedula
>
>
>
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 4: Don't 'kill -9' the postmaster
>
>
>
try create table tmp_partes as select * from partes where not exists
(select cedula from sujetos where cedula = partes.identificacion);

The "not in (subselect)" is very slow in postgresql.

HTH,

chris


Re: Too slow

From
Sabio - PSQL
Date:
PG version = 8.0 over Linux Fedora Core 3
PostgreSQL was compiled from source

Lee Wu wrote:

>PG version?
>
>Maybe worth to try NOT EXISTS instead of NOT IN
>
>-----Original Message-----
>From: pgsql-admin-owner@postgresql.org
>[mailto:pgsql-admin-owner@postgresql.org] On Behalf Of Sabio - PSQL
>Sent: Tuesday, March 22, 2005 7:23 AM
>To: PostgreSQL Admin
>Subject: [ADMIN] Too slow
>
>How can I improve speed on my queries. For example this query takes one
>day executing itself and it has not finalized !!!
>"create table tmp_partes as select * from partes where identificacion
>not in (select cedula from sujetos)"
>
>partes have 1888000 rows, an index on identificacion
>sujetos have 5500000 rows, an index on cedula
>
>
>
>
>
>---------------------------(end of broadcast)---------------------------
>TIP 4: Don't 'kill -9' the postmaster
>
>
>
>



Re: Too slow

From
Sabio - PSQL
Date:
WITH: select * from partes where cedula not in (select cedula from sujetos)
Seq Scan on partes  (cost=0.00..168063925339.69 rows=953831 width=109)
  Filter: (NOT (subplan))
  SubPlan
    ->  Seq Scan on sujetos  (cost=0.00..162348.43 rows=5540143 width=15)

WITH: select * from partes where not exists (select cedula from sujetos
where cedula=partes.cedula)
Seq Scan on partes  (cost=0.00..7373076.94 rows=953831 width=109)
  Filter: (NOT (subplan))
  SubPlan
    ->  Index Scan using sujetos_pkey on sujetos  (cost=0.00..3.84
rows=1 width=15)
          Index Cond: ((cedula)::text = ($0)::text)

Thomas F. O'Connell wrote:

> Please post the results of that query as run through EXPLAIN ANALYZE.
>
> Also, I'm going to reply to this on pgsql-performance, which is
> probably where it better belongs.
>
> -tfo
>
> --
> Thomas F. O'Connell
> Co-Founder, Information Architect
> Sitening, LLC
> http://www.sitening.com/
> 110 30th Avenue North, Suite 6
> Nashville, TN 37203-6320
> 615-260-0005
>
> On Mar 22, 2005, at 8:23 AM, Sabio - PSQL wrote:
>
>> How can I improve speed on my queries. For example this query takes
>> one day executing itself and it has not finalized !!!
>> "create table tmp_partes as select * from partes where identificacion
>> not in (select cedula from sujetos)"
>>
>> partes have 1888000 rows, an index on identificacion
>> sujetos have 5500000 rows, an index on cedula
>
>
>
>



Re: Too slow

From
Tom Lane
Date:
"Chris Hoover" <revoohc@sermonaudio.com> writes:
> The "not in (subselect)" is very slow in postgresql.

It's OK as long as the subselect result is small enough to hash, but
with 5500000 rows that's not going to happen :-(.

Another issue is that if there are any NULLs in the subselect then you
will probably not like the results.  They are correct per spec but not
very intuitive.

Personally I'd try ye olde outer join trick:

select partes.*
  from partes left join sujetos on (identificacion = cedula)
  where cedula is null;

A merge join on this would likely be the most effective solution.

            regards, tom lane