Thread: Query take 101 minutes, help, please

Query take 101 minutes, help, please

From
Christian Compagnon
Date:
Hello,

I'm a newbie in postgresql, I've installed it on a Windows XP machine
( I can't use linux, it's a company machine ), I'm courious why this
query takes so long

SELECT "Rut Cliente"
FROM "Internet_Abril"
WHERE "Rut Cliente"  NOT IN ((SELECT "Rut Cliente"  FROM
"Internet_Enero") UNION (SELECT "Rut Cliente"  FROM
"Internet_Febrero") UNION (SELECT "Rut Cliente"  FROM
"Internet_Marzo"));

it takes about 100 minutes to complete the query.
All tables has index created ( Rut Cliente is a VarChar ), and tables
has 50.000 records each.

The explain for the query tells the following

"QUERY PLAN
 Seq Scan on "Internet_Abril"  (cost=19406.67..62126112.70 rows=24731 width=13)
  Filter: (NOT (subplan))
   SubPlan
       ->  Materialize  (cost=19406.67..21576.07 rows=136740 width=13)
        ->  Unique  (cost=17784.23..18467.93 rows=136740 width=13)
        ->  Sort  (cost=17784.23..18126.08 rows=136740 width=13)
                                                               Sort
Key: "Rut Cliente"
          ->  Append  (cost=0.00..3741.80 rows=136740 width=13)
          ->  Subquery Scan "*SELECT* 1"  (cost=0.00..1233.38
rows=45069 width=13)
           ->  Seq Scan on "Internet_Enero"  (cost=0.00..782.69
rows=45069 width=13)
           ->  Subquery Scan "*SELECT* 2"  (cost=0.00..1104.06
rows=40353 width=13)
             ->  Seq Scan on "Internet_Febrero"  (cost=0.00..700.53
rows=40353 width=13)
           ->  Subquery Scan "*SELECT* 3"  (cost=0.00..1404.36
rows=51318 width=13)
            ->  Seq Scan on "Internet_Marzo"  (cost=0.00..891.18
rows=51318 width=13)

Any help will be apreciated, It's for my thesis


saludos
Christian

Re: Query take 101 minutes, help, please

From
"Steinar H. Gunderson"
Date:
On Wed, Sep 07, 2005 at 12:22:27PM -0400, Christian Compagnon wrote:
> I'm a newbie in postgresql, I've installed it on a Windows XP machine
> ( I can't use linux, it's a company machine ), I'm courious why this
> query takes so long

It sounds like you've set work_mem too low; increasing it might help. Also
try rewriting your query to

  SELECT "Rut Cliente"
  FROM "Internet_Abril"
  WHERE
    "Rut Cliente" NOT IN ( SELECT "Rut Cliente" FROM "Internet_Enero" )
    AND "Rut Cliente" NOT IN ( SELECT "Rut Cliente" FROM "Internet_Febrero" )
    AND "Rut Cliente" NOT IN ( SELECT "Rut Cliente" FROM "Internet_Marzo" )

(I'm not sure how optimized UNION inside an IN/NOT IN is.)

/* Steinar */
--
Homepage: http://www.sesse.net/

Re: Query take 101 minutes, help, please

From
Meetesh Karia
Date:
PG is creating the union of January, February and March tables first and that doesn't have an index on it.  If you're going to do many queries using the union of those three tables, you might want to place their contents into one table and create an index on it.

Otherwise, try something like this:

SELECT "Rut Cliente"
FROM "Internet_Abril"
WHERE "Rut Cliente"  NOT IN (SELECT "Rut Cliente"  FROM
"Internet_Enero")
AND "Rut Cliente" NOT IN (SELECT "Rut Cliente"  FROM
"Internet_Febrero")
AND "Rut Cliente" NOT IN (SELECT "Rut Cliente"  FROM
"Internet_Marzo");

You could also compare the performance of that to this and choose the one that works the best:

SELECT "Rut Cliente"
FROM "Internet_Abril" a
LEFT JOIN "Internet_Enero" e ON a."Rut Cliente" = e."Rut Cliente"
LEFT JOIN "Internet_Febrero" f ON a."Rut Cliente" = f."Rut Cliente"
LEFT JOIN "Internet_Marzo" m ON a."Rut Cliente" = m."Rut Cliente"
WHERE e."Rut Cliente" IS NULL AND f."Rut Cliente" IS NULL and m."Rut Cliente" IS NULL;

Meetesh

On 9/7/05, Christian Compagnon <ccompagnon@gmail.com> wrote:
Hello,

I'm a newbie in postgresql, I've installed it on a Windows XP machine
( I can't use linux, it's a company machine ), I'm courious why this
query takes so long

SELECT "Rut Cliente"
FROM "Internet_Abril"
WHERE "Rut Cliente"  NOT IN ((SELECT "Rut Cliente"  FROM
"Internet_Enero") UNION (SELECT "Rut Cliente"  FROM
"Internet_Febrero") UNION (SELECT "Rut Cliente"  FROM
"Internet_Marzo"));

it takes about 100 minutes to complete the query.
All tables has index created ( Rut Cliente is a VarChar ), and tables
has 50.000 records each.

The explain for the query tells the following

"QUERY PLAN
Seq Scan on "Internet_Abril"  (cost=19406.67..62126112.70 rows=24731 width=13)
  Filter: (NOT (subplan))
   SubPlan
       ->  Materialize  (cost=19406.67..21576.07 rows=136740 width=13)
        ->  Unique  (cost=17784.23..18467.93 rows=136740 width=13)
        ->  Sort  (cost=17784.23..18126.08 rows=136740 width=13)
                                                               Sort
Key: "Rut Cliente"
          ->  Append  (cost=0.00..3741.80 rows=136740 width=13)
          ->  Subquery Scan "*SELECT* 1"  (cost=0.00..1233.38
rows=45069 width=13)
           ->  Seq Scan on "Internet_Enero"  (cost=0.00..782.69
rows=45069 width=13)
           ->  Subquery Scan "*SELECT* 2"  (cost=0.00..1104.06
rows=40353 width=13)
             ->  Seq Scan on "Internet_Febrero"  (cost=0.00..700.53
rows=40353 width=13)
           ->  Subquery Scan "*SELECT* 3"  (cost=0.00..1404.36
rows=51318 width=13)
            ->  Seq Scan on "Internet_Marzo"  (cost=0.00..891.18
rows=51318 width=13)

Any help will be apreciated, It's for my thesis


saludos
Christian

---------------------------(end of broadcast)---------------------------
TIP 1: if posting/reading through Usenet, please send an appropriate
       subscribe-nomail command to majordomo@postgresql.org so that your
       message can get through to the mailing list cleanly

Re: Query take 101 minutes, help, please

From
Alex Hayward
Date:
On Wed, 7 Sep 2005, Meetesh Karia wrote:

> PG is creating the union of January, February and March tables first and
> that doesn't have an index on it. If you're going to do many queries using
> the union of those three tables, you might want to place their contents into
> one table and create an index on it.
>
> Otherwise, try something like this:
>
> SELECT "Rut Cliente"
> FROM "Internet_Abril"
> WHERE "Rut Cliente" NOT IN (SELECT "Rut Cliente" FROM
> "Internet_Enero")
> AND "Rut Cliente" NOT IN (SELECT "Rut Cliente" FROM
> "Internet_Febrero")
> AND "Rut Cliente" NOT IN (SELECT "Rut Cliente" FROM
> "Internet_Marzo");

You may also wish to try:

SELECT "Rut Cliente"
FROM "Internet_Abril"
WHERE NOT EXISTS
  (SELECT 1 FROM "Internet_Enero"
   WHERE "Internet_Enero"."Rut Cliente"="Internet_Abril"."Rut Cliente")
AND NOT EXISTS
  (SELECT 1 FROM "Internet_Febrero"
   WHERE "Internet_Febrero"."Rut Cliente"="Internet_Abril"."Rut Cliente")
AND NOT EXISTS
  (SELECT 1 FROM "Internet_Marzo"
   WHERE "Internet_Marzo"."Rut Cliente"="Internet_Abril"."Rut Cliente")

which will probably scan the indexes on the January, February and March
indexes once for each row in the April table.


Re: Query take 101 minutes, help, please

From
Tom Lane
Date:
"Steinar H. Gunderson" <sgunderson@bigfoot.com> writes:
> (I'm not sure how optimized UNION inside an IN/NOT IN is.)

NOT IN is pretty nonoptimal, period.  It'd help a lot to boost work_mem
to the point where the planner figures it can use a hashtable (look for
EXPLAIN to say "hashed subplan" rather than just "subplan").  Of course,
if there's enough stuff in the UNION that that drives you into swapping,
it's gonna be painful anyway.

Using UNION ALL instead of UNION might save a few cycles too.

If you're willing to rewrite the query wholesale, you could try the old
trick of a LEFT JOIN where you discard rows for which there's a match,
ie, the righthand join value isn't NULL.

            regards, tom lane