Thread: psql nested queries with 2000+ records

psql nested queries with 2000+ records

From
Coronach
Date:
I have noted with 2000+ records in a table (using a btree unique index on
the primary key, btw) performance of a query takes 15+ minutes to respond.

The query is in the structure of this

select foo from bar where foo in (select foo from bar where bazzfoo like
'Word%');

Just reporting a possible bug..

With smaller amounts of records, responce time is within a few seconds.

Other information you might want:

postgres version: 6.3
os: BSDI 3.1

Thanks
-coronach@hill-b-073.resnet.purdue.edu


Re: [HACKERS] psql nested queries with 2000+ records

From
Bruce Momjian
Date:
>
> I have noted with 2000+ records in a table (using a btree unique index on
> the primary key, btw) performance of a query takes 15+ minutes to respond.
>
> The query is in the structure of this
>
> select foo from bar where foo in (select foo from bar where bazzfoo like
> 'Word%');

Very strange.  15+ minutes?  Wow, that is terrible, even longer than a
sequential scan of the table.  Try EXPLAIN and tell us what it says in
the two cases.

>
> Just reporting a possible bug..
>
> With smaller amounts of records, responce time is within a few seconds.
>
> Other information you might want:
>
> postgres version: 6.3
> os: BSDI 3.1

Same OS here.
--
Bruce Momjian                          |  830 Blythe Avenue
maillist@candle.pha.pa.us              |  Drexel Hill, Pennsylvania 19026
  +  If your life is a hard drive,     |  (610) 353-9879(w)
  +  Christ can be your backup.        |  (610) 853-3000(h)

Re: [HACKERS] psql nested queries with 2000+ records

From
Coronach
Date:
At 12:24 PM 3/20/98 -0500, Bruce Momjian wrote:

>Very strange.  15+ minutes?  Wow, that is terrible, even longer than a
>sequential scan of the table.  Try EXPLAIN and tell us what it says in
>the two cases.

This is within the 2717 record database.

explain select name from games where name in (select name from games where
name like 'A%');

NOTICE:  QUERY PLAN:

Seq Scan on games  (cost=207.95 size=446 width=12)
  SubPlan
    ->  Seq Scan on games  (cost=207.95 size=1 width=12)

NOTICE:  QUERY PLAN:

Seq Scan on games  (cost=207.95 size=446 width=12)
  SubPlan
    ->  Seq Scan on games  (cost=207.95 size=1 width=12)

EXPLAIN

This is within the 24 record database of the same type of data

explain select name from games where name in (select name from game
s where name like 'A%');
NOTICE:  QUERY PLAN:

Seq Scan on games  (cost=207.95 size=446 width=12)
  SubPlan
    ->  Seq Scan on games  (cost=207.95 size=1 width=12)

NOTICE:  QUERY PLAN:

Seq Scan on games  (cost=207.95 size=446 width=12)
  SubPlan
    ->  Seq Scan on games  (cost=207.95 size=1 width=12)

EXPLAIN

Obviously, I get the same information, any suggestions?

-Coronach@hill-b-073.resnet.purdue.edu


Re: [HACKERS] psql nested queries with 2000+ records

From
Bruce Momjian
Date:
OK, I think I see the problem.  Indexes are not being used in this case.
Basically for every row in the outer query, you are doing a sequential
scan in the inner table.

Can you try this:

select * into games2 from games;
explain select name from games where name in (select name from games2 where
name like 'A%');


I would like to know if the use of the same table in the subquery is
causing the problem.  I assume you have run vacuum and vacuum analyze.


>
> At 12:24 PM 3/20/98 -0500, Bruce Momjian wrote:
>
> >Very strange.  15+ minutes?  Wow, that is terrible, even longer than a
> >sequential scan of the table.  Try EXPLAIN and tell us what it says in
> >the two cases.
>
> This is within the 2717 record database.
>
> explain select name from games where name in (select name from games where
> name like 'A%');
>
> NOTICE:  QUERY PLAN:
>
> Seq Scan on games  (cost=207.95 size=446 width=12)
>   SubPlan
>     ->  Seq Scan on games  (cost=207.95 size=1 width=12)
>
> NOTICE:  QUERY PLAN:
>
> Seq Scan on games  (cost=207.95 size=446 width=12)
>   SubPlan
>     ->  Seq Scan on games  (cost=207.95 size=1 width=12)
>
> EXPLAIN
>
> This is within the 24 record database of the same type of data
>
> explain select name from games where name in (select name from game
> s where name like 'A%');
> NOTICE:  QUERY PLAN:
>
> Seq Scan on games  (cost=207.95 size=446 width=12)
>   SubPlan
>     ->  Seq Scan on games  (cost=207.95 size=1 width=12)
>
> NOTICE:  QUERY PLAN:
>
> Seq Scan on games  (cost=207.95 size=446 width=12)
>   SubPlan
>     ->  Seq Scan on games  (cost=207.95 size=1 width=12)
>
> EXPLAIN
>
> Obviously, I get the same information, any suggestions?
>
> -Coronach@hill-b-073.resnet.purdue.edu
>
>


--
Bruce Momjian                          |  830 Blythe Avenue
maillist@candle.pha.pa.us              |  Drexel Hill, Pennsylvania 19026
  +  If your life is a hard drive,     |  (610) 353-9879(w)
  +  Christ can be your backup.        |  (610) 853-3000(h)

Re: [HACKERS] psql nested queries with 2000+ records

From
Coronach
Date:
>select * into games2 from games;

amusements=> select * into games2 from games;
ERROR:  parser: parse error at or near "games2"

But, I just dumped it and changed a few things to fit the scenario...

>explain select name from games where name in (select name from games2 where
>name like 'A%');

amusements=> explain select name from games where name in (select name from
game
s2 where name like 'A%');
NOTICE:  QUERY PLAN:

Seq Scan on games  (cost=207.95 size=446 width=12)
  SubPlan
    ->  Seq Scan on games2  (cost=10.23 size=1 width=12)

EXPLAIN

Upon execution of the query, it still took quite a long time.

>causing the problem.  I assume you have run vacuum and vacuum analyze.

Yep, ran vacuum on it and also had it work with newly generated databases too.


Any other suggestions would be welcome, thanks

-Coronach@hill-b-073.resnet.purdue.edu


Re: [HACKERS] psql nested queries with 2000+ records

From
"Vadim B. Mikheev"
Date:
Coronach wrote:
>
> I have noted with 2000+ records in a table (using a btree unique index on
> the primary key, btw) performance of a query takes 15+ minutes to respond.
>
> The query is in the structure of this
>
> select foo from bar where foo in (select foo from bar where bazzfoo like
> 'Word%');

EXPLAIN select foo from bar where bazzfoo like 'Word%';

Is index used ?

Vadim

Re: [HACKERS] psql nested queries with 2000+ records

From
Bruce Momjian
Date:
>
> >select * into games2 from games;
>
> amusements=> select * into games2 from games;
> ERROR:  parser: parse error at or near "games2"

Oops, we haven't released 6.3p1 yet, so 'select * into table games2 from
games;'

>
> But, I just dumped it and changed a few things to fit the scenario...
>
> >explain select name from games where name in (select name from games2 where
> >name like 'A%');
>
> amusements=> explain select name from games where name in (select name from
> game
> s2 where name like 'A%');
> NOTICE:  QUERY PLAN:
>
> Seq Scan on games  (cost=207.95 size=446 width=12)
>   SubPlan
>     ->  Seq Scan on games2  (cost=10.23 size=1 width=12)
>
> EXPLAIN
>
> Upon execution of the query, it still took quite a long time.
>
> >causing the problem.  I assume you have run vacuum and vacuum analyze.
>
> Yep, ran vacuum on it and also had it work with newly generated databases too.

Try it without the LIKE, with just an equals.

--
Bruce Momjian                          |  830 Blythe Avenue
maillist@candle.pha.pa.us              |  Drexel Hill, Pennsylvania 19026
  +  If your life is a hard drive,     |  (610) 353-9879(w)
  +  Christ can be your backup.        |  (610) 853-3000(h)

Re: [HACKERS] psql nested queries with 2000+ records

From
Coronach
Date:
At 09:54 AM 3/21/98 -0500, Bruce Momjian wrote:

>Try it without the LIKE, with just an equals.

amusements=> explain select name from games where name in (select name from
game
s where mfr = '');
NOTICE:  QUERY PLAN:

Seq Scan on games  (cost=207.95 size=446 width=12)
  SubPlan
    ->  Seq Scan on games  (cost=207.95 size=1 width=12)

EXPLAIN

--

amusements=> explain select name from games where name in (select name from
game
s2 where mfr = '');
NOTICE:  QUERY PLAN:

Seq Scan on games  (cost=207.95 size=446 width=12)
  SubPlan
    ->  Seq Scan on games2  (cost=10.23 size=1 width=12)

EXPLAIN

The first query approved to be lengthy, but the last query was executed and
returned rows in just a few.

I then did an explain on a query that refered to the games table within the
sub query.

amusements=> explain select name from games where name in (select name from
game
s where mfr = '');
NOTICE:  QUERY PLAN:

Seq Scan on games  (cost=207.95 size=446 width=12)
  SubPlan
    ->  Seq Scan on games  (cost=207.95 size=1 width=12)

EXPLAIN


For the nature of the searches that the front end send to postgres (the
database is used with a web interface), it is nec. to use the like
expression.  Where should I go from here?

Thanks once again,

-Coronach@hill-b-073.resnet.purdue.edu


Re: [HACKERS] psql nested queries with 2000+ records

From
Bruce Momjian
Date:
What does this show in explain?  Does it use an index?

    select name from games where mfr = ''

>
> At 09:54 AM 3/21/98 -0500, Bruce Momjian wrote:
>
> >Try it without the LIKE, with just an equals.
>
> amusements=> explain select name from games where name in (select name from
> game
> s where mfr = '');
> NOTICE:  QUERY PLAN:
>
> Seq Scan on games  (cost=207.95 size=446 width=12)
>   SubPlan
>     ->  Seq Scan on games  (cost=207.95 size=1 width=12)
>
> EXPLAIN

I ran this test:

    test=> explain select * from pg_proc where oid in (select oid from
    pg_proc where oid =3);
    NOTICE:  QUERY PLAN:

    Seq Scan on pg_proc  (cost=48.90 size=91 width=122)
      SubPlan
        ->  Index Scan on pg_proc  (cost=2.05 size=1 width=4)

and it used the index in the subquery.  Please let us about the above
test.


--
Bruce Momjian                          |  830 Blythe Avenue
maillist@candle.pha.pa.us              |  Drexel Hill, Pennsylvania 19026
  +  If your life is a hard drive,     |  (610) 353-9879(w)
  +  Christ can be your backup.        |  (610) 853-3000(h)

Re: [HACKERS] psql nested queries with 2000+ records

From
Coronach
Date:
Aparently, the postgresql server was moved from the BSDI box to a linux
2.0.30 box, so the numbers that where given where not from the bsdi box. So
now I am doing the queries personally.  I'm sorry for the run around.

This is the responce from the explain on the BSDI 3.1 box.

amusements=> explain select name from games where name in (select name from
game
s where name like 'A%');
NOTICE:  QUERY PLAN:

Seq Scan on games  (cost=0.00 size=0 width=12)
  SubPlan
    ->  Seq Scan on games  (cost=0.00 size=0 width=12)

EXPLAIN

amusements=> explain select name from games where name in (select name from
game
s2 where mfr = '');
NOTICE:  QUERY PLAN:

Seq Scan on games  (cost=0.00 size=0 width=12)
  SubPlan
    ->  Seq Scan on games2  (cost=0.00 size=0 width=12)

EXPLAIN

amusements=> explain select * from games where mfr = '';
NOTICE:  QUERY PLAN:

Seq Scan on games  (cost=0.00 size=0 width=160)

EXPLAIN

Once again, I appologize for this misinformation and hope this sheds some
light to the original problem.  I'm making a note to change permissions on
who can do what from now on.

On a side note, the query was done on the 2.0.30 box in 5 minutes compared
to the BSDI box at 15+.

-Coronach@hill-b-073.resnet.purdue.edu


Re: [HACKERS] psql nested queries with 2000+ records

From
Bruce Momjian
Date:
> amusements=> explain select * from games where mfr = '';
> NOTICE:  QUERY PLAN:
>
> Seq Scan on games  (cost=0.00 size=0 width=160)
>
> EXPLAIN

OK. This is the base of the problem.  You are vacuum.  How varied is the
data in the mfr field.  Is the usually just a few unique values.  If so,
the index really does you no good, and perhaps that is why it is not
being used.

--
Bruce Momjian                          |  830 Blythe Avenue
maillist@candle.pha.pa.us              |  Drexel Hill, Pennsylvania 19026
  +  If your life is a hard drive,     |  (610) 353-9879(w)
  +  Christ can be your backup.        |  (610) 853-3000(h)

Re: [HACKERS] psql nested queries with 2000+ records

From
dg@illustra.com (David Gould)
Date:
Coronach@hill-b-073.resnet.purdue.edu blushes and says:
> Aparently, the postgresql server was moved from the BSDI box to a linux
> 2.0.30 box, so the numbers that where given where not from the bsdi box. So
> now I am doing the queries personally.  I'm sorry for the run around.
>
> This is the responce from the explain on the BSDI 3.1 box.
>
> amusements=> explain select name from games where name in (select name from
> game
> s where name like 'A%');
> NOTICE:  QUERY PLAN:
>
> Seq Scan on games  (cost=0.00 size=0 width=12)
>   SubPlan
>     ->  Seq Scan on games  (cost=0.00 size=0 width=12)
>
> EXPLAIN
>
> amusements=> explain select name from games where name in (select name from
> game
> s2 where mfr = '');
> NOTICE:  QUERY PLAN:
>
> Seq Scan on games  (cost=0.00 size=0 width=12)
>   SubPlan
>     ->  Seq Scan on games2  (cost=0.00 size=0 width=12)
>
> EXPLAIN
>
> amusements=> explain select * from games where mfr = '';
> NOTICE:  QUERY PLAN:
>
> Seq Scan on games  (cost=0.00 size=0 width=160)
>
> EXPLAIN
>
> Once again, I appologize for this misinformation and hope this sheds some
> light to the original problem.  I'm making a note to change permissions on
> who can do what from now on.
>
> On a side note, the query was done on the 2.0.30 box in 5 minutes compared
> to the BSDI box at 15+.

Well of course... ;-)

I am sure this has been mentioned, but could you post the exact schema for
these tables and their indexes? It might even be worth dropping and
recreating the indexes just to be _sure_.

-dg

David Gould            dg@illustra.com           510.628.3783 or 510.305.9468
Informix Software  (No, really)         300 Lakeside Drive  Oakland, CA 94612
 - I realize now that irony has no place in business communications.