Thread: slow SELECT ... LIMIT query

slow SELECT ... LIMIT query

From
"Simon Stanlake"
Date:
Hi,

I have a fairly large table (1 million records) with the following
structure...

sampleid int4
unitid int4
datetimestamp timestamp
data1 float8
data2 float8

btree indexes on sampleid, unitid, and datetimestamp.

I want to be able to pull out the most recent record for a certain unit.

the query that seemed most reasonable was
SELECT * FROM MYTABLE WHERE UNITID = unit_id ORDER BY DATETIMESTAMP DESC
LIMIT 1;

some strange results...

1) for units that have a lot of records (100K), the query is reasonably fast
(~ 0.5 seconds) but for units with not too many records (100) the query is
REALLY SLOW, like 15 seconds.  the explain plan says INDEX SCAN BACKWARDS so
I guess it is using the index.

2) when I take away the LIMIT 1 and run the query it returns the results
almost immediately no matter how many records the unit has.

It looks like the indexes are not being used properly when the LIMIT clause
is inserted.  I didn't do anything fancy when I created the indexes, just
the standard CREATE INDEX statement.  Ran VACUUM ANALYZE and everything.

Is this common when you use the LIMIT clause?  Is there a work around?

Thanks,
Simon


Re: slow SELECT ... LIMIT query

From
Tom Lane
Date:
What query plans are you getting for these various combinations?

            regards, tom lane

Re: slow SELECT ... LIMIT query

From
Martijn van Oosterhout
Date:
On Wed, Jul 18, 2001 at 10:49:29AM -0700, Simon Stanlake wrote:
> Hi,
>
> I have a fairly large table (1 million records) with the following
> structure...
>
[snip]

Well, it's fai;rly hard to guess at questions like this unless you provide
some more details. Do you think you could post EXPLAIN outputs for both
queries?

> Is this common when you use the LIMIT clause?  Is there a work around?

It shouldn't happen but something that might work is to create a cursor and
then do fetch 1;

--
Martijn van Oosterhout <kleptog@svana.org>
http://svana.org/kleptog/
> It would be nice if someone came up with a certification system that
> actually separated those who can barely regurgitate what they crammed over
> the last few weeks from those who command secret ninja networking powers.

RE: slow SELECT ... LIMIT query

From
"Simon Stanlake"
Date:
Thanks for the response...

here's the explain for the two queries...


with limit
postgres=# explain select * from myTable where unitid = 2 order by
datetimestamp desc limit 1;
NOTICE:  QUERY PLAN:

Limit  (cost=0.00..29.33 rows=1 width=32)
  ->  Index Scan Backward using datetimestamp_idx on myTable
(cost=0.00..159464.46 rows=5436 width=32)

EXPLAIN


and without limit...
postgres=# explain select * from myTable where unitid = 2 order by
datetimestamp desc;
NOTICE:  QUERY PLAN:

Sort  (cost=15977.53..15977.53 rows=5436 width=32)
  ->  Index Scan using unitid_idx on myTable (cost=0.00..15640.27 rows=5436
width=32)

hmmm, looks like either using the datetimestamp index or doing the backward
scan could be messing it up.

tried
select * from (select * from myTable where unitid = 2 order by datetimestamp
desc) a limit 1;

and it's super fast.  the explain for this one is...
Limit  (cost=15977.53..15977.53 rows=1 width=32)
  ->  Subquery Scan a  (cost=15977.53..15977.53 rows=5436 width=32)
        ->  Sort  (cost=15977.53..15977.53 rows=5436 width=32)
              ->  Index Scan using unitid_idx on locationevent
(cost=0.00..15640.27 rows=5436 width=32

EXPLAIN

can anyone tell me what slowed down the first query?

thanks,
simon


-----Original Message-----
From: Tom Lane [mailto:tgl@sss.pgh.pa.us]
Sent: Wednesday, July 18, 2001 2:44 PM
To: Simon Stanlake
Cc: pgsql-general@postgresql.org
Subject: Re: [GENERAL] slow SELECT ... LIMIT query


What query plans are you getting for these various combinations?

            regards, tom lane


Re: slow SELECT ... LIMIT query

From
Tom Lane
Date:
"Simon Stanlake" <stanlake@hi.ca> writes:
> can anyone tell me what slowed down the first query?

Well, the first one is going to scan (backwards) in datetimestamp order
until it finds a row with unitid = 2.  The second one is going to use
the unitid index to pick out just the rows with unitid = 2, and then
sort them by timestamp.  Depending on how many rows have unitid = 2
and how new the latest one is, I could see either of these being way
faster than the other.

            regards, tom lane