Thread: Question whether this is a known problem in 7.1.2

Question whether this is a known problem in 7.1.2

From
Rachit Siamwalla
Date:
This problem was discovered in 7.1.2. Was wondering whether this is a known problem or not; we plan to test this on the
latestpostgres sometime later.
 

We have a large table, lets call it A, millions of rows. And in the table is a field called time, which is TIMESTAMP
type.We have an index on it.
 

Oftentimes we like to get the latest row inserted by time on a given constraint. So we do a:


SELECT * FROM A WHERE someconstraint = somerandomnumber ORDER BY time desc limit 1;

Postgres intellegently uses the index to scan through the table from the end forward.

If there are no items that fit the constraint, the query will take a long time (cause it has to scan the whole table).

If there are items (plural important here, read below) that fit the constraint, the database finds the first item, and
returnsit right away (fairly quickly if the item is near the end).
 

However, if there is only ONE item, postgres still scans the whole database. Not sure why. We also find out that if:

There are 2 items that match the criteria, and you do a LIMIT 2, it scans the whole table as well. Limit 1 returns
quickly.Basically it seems like postgres is looking for one more item than it needs to.
 

-rchit


Re: Question whether this is a known problem in 7.1.2

From
Rachit Siamwalla
Date:
By the way, a colleague just reproduced this problem on a 7.2.1 postgres.

-----Original Message-----
From: Rachit Siamwalla [mailto:rachit@ensim.com]
Sent: Friday, June 07, 2002 4:27 PM
To: pgsql-hackers; Paul Menage
Subject: [HACKERS] Question whether this is a known problem in 7.1.2



This problem was discovered in 7.1.2. Was wondering whether this is a known problem or not; we plan to test this on the
latestpostgres sometime later.
 

We have a large table, lets call it A, millions of rows. And in the table is a field called time, which is TIMESTAMP
type.We have an index on it.
 

Oftentimes we like to get the latest row inserted by time on a given constraint. So we do a:


SELECT * FROM A WHERE someconstraint = somerandomnumber ORDER BY time desc limit 1;

Postgres intellegently uses the index to scan through the table from the end forward.

If there are no items that fit the constraint, the query will take a long time (cause it has to scan the whole table).

If there are items (plural important here, read below) that fit the constraint, the database finds the first item, and
returnsit right away (fairly quickly if the item is near the end).
 

However, if there is only ONE item, postgres still scans the whole database. Not sure why. We also find out that if:

There are 2 items that match the criteria, and you do a LIMIT 2, it scans the whole table as well. Limit 1 returns
quickly.Basically it seems like postgres is looking for one more item than it needs to.
 

-rchit

---------------------------(end of broadcast)---------------------------
TIP 4: Don't 'kill -9' the postmaster


Re: Question whether this is a known problem in 7.1.2

From
Tom Lane
Date:
Rachit Siamwalla <rachit@ensim.com> writes:
> There are 2 items that match the criteria, and you do a LIMIT 2, it
> scans the whole table as well. Limit 1 returns quickly. Basically it
> seems like postgres is looking for one more item than it needs to. 

This is not a bug; or at least it's not something I'm prepared to break
other things to change.

If you can figure out a way to change nodeLimit.c to not get confused
about change-of-fetch-direction without the extra fetch, then send a
patch.
        regards, tom lane