Thread: Order-by and indexes
I need to get the latest entry of a large table matching a certain criteria. This is my query:
SELECT * FROM "data" WHERE "data"."fk" = 238496 ORDER BY "data"."id" DESC LIMIT 1
This query is quite slow. If I do a explain on it, it seems that it uses an Index Scan Backward.
If I omit the order by on the query:
SELECT * FROM "data" WHERE "data"."fk" = 238496 LIMIT 1
It is very fast. And the explain says that it uses Index scan. This is also very fast if there aren't any matches. But I've read that I'm not guaranteed to get the correct match If I do not use a order by, postgres just returns its fastest possible match. Is this right? But will not the fastest possible match always be the first match in the index? Is there another way to make the order by query go faster?
Thanks!
Odd-R.
SELECT * FROM "data" WHERE "data"."fk" = 238496 ORDER BY "data"."id" DESC LIMIT 1
This query is quite slow. If I do a explain on it, it seems that it uses an Index Scan Backward.
If I omit the order by on the query:
SELECT * FROM "data" WHERE "data"."fk" = 238496 LIMIT 1
It is very fast. And the explain says that it uses Index scan. This is also very fast if there aren't any matches. But I've read that I'm not guaranteed to get the correct match If I do not use a order by, postgres just returns its fastest possible match. Is this right? But will not the fastest possible match always be the first match in the index? Is there another way to make the order by query go faster?
Thanks!
Odd-R.
Dear Odd, I am only a novice, but from what I understand from your email, you want to write a query that selects the newest record from the table for a certain set of data? To do this, I would use the MIN () function on a column such as 'time_of_entry' or 'order_id' if the 'order_id' is a sequence? Perhaps something like the below maybe...? Though my construction of the query is probably incorrect - I'm only learning! SELECT * FROM "data" WHERE "data"."fk" = '238496' AND "data"."id" = (SELECT MIN("data"."id") Cheers James On 29 June 2011 14:48, Odd Hogstad <odd.hogstad@smartm.no> wrote: > I need to get the latest entry of a large table matching a certain criteria. > This is my query: > > SELECT * FROM "data" WHERE "data"."fk" = 238496 ORDER BY "data"."id" DESC > LIMIT 1 > > This query is quite slow. If I do a explain on it, it seems that it uses an > Index Scan Backward. > > If I omit the order by on the query: > > SELECT * FROM "data" WHERE "data"."fk" = 238496 LIMIT 1 > > It is very fast. And the explain says that it uses Index scan. This is also > very fast if there aren't any matches. But I've read that I'm not guaranteed > to get the correct match If I do not use a order by, postgres just returns > its fastest possible match. Is this right? But will not the fastest possible > match always be the first match in the index? Is there another way to make > the order by query go faster? > > Thanks! > > Odd-R. > > >
On Wed, 29 Jun 2011 15:48:56 +0200, Odd Hogstad <odd.hogstad@smartm.no> wrote: > SELECT * FROM "data" WHERE "data"."fk" = 238496 ORDER BY "data"."id" DESC > LIMIT 1 > > This query is quite slow. If I do a explain on it, it seems that it uses an > Index Scan Backward. > > If I omit the order by on the query: > > SELECT * FROM "data" WHERE "data"."fk" = 238496 LIMIT 1 > > It is very fast. And the explain says that it uses Index scan. This is also > very fast if there aren't any matches. But I've read that I'm not guaranteed > to get the correct match If I do not use a order by, postgres just returns > its fastest possible match. Is this right? But will not the fastest possible > match always be the first match in the index? Is there another way to make > the order by query go faster? Unfortunately (and AFAIK), you don't have any other solution as you want the *latest* row; may be often clustering this table in this order would help a bit. Perhaps creating fragmented indexes could also help (1 >= data.fk < 50001, and so on) JY -- He asked me if I knew what time it was -- I said yes, but not right now. -- Steven Wright
> SELECT * FROM "data" WHERE "data"."fk" = 238496 ORDER BY "data"."id" DESC
From the docs:
By default, B-tree indexes store their entries in ascending order with nulls last. This means that a forward scan of an index on column x produces output satisfying ORDER BY x (or more verbosely, ORDER BY x ASC NULLS LAST). The index can also be scanned backward, producing output satisfying ORDER BY x DESC (or more verbosely, ORDER BY x DESC NULLS FIRST, since NULLS FIRST is the default for ORDER BY DESC).
Doesn't this mean that when I'm not using the order by clause, and it uses a Index Scan, I will always get the latest value in return? Also I don't understand why the order by query is scanning backwards, when the record I want is in the other end?
Thanks!
Odd-R.
> LIMIT 1Unfortunately (and AFAIK), you don't have any other solution as you want the
>
> This query is quite slow. If I do a explain on it, it seems that it uses an
> Index Scan Backward.
>
> If I omit the order by on the query:
>
> SELECT * FROM "data" WHERE "data"."fk" = 238496 LIMIT 1
>
> It is very fast. And the explain says that it uses Index scan. This is also
> very fast if there aren't any matches. But I've read that I'm not guaranteed
> to get the correct match If I do not use a order by, postgres just returns
> its fastest possible match. Is this right? But will not the fastest possible
> match always be the first match in the index? Is there another way to make
> the order by query go faster?
*latest* row; may be often clustering this table in this order would help a
bit.
Perhaps creating fragmented indexes could also help (1 >= data.fk < 50001, and so on)
From the docs:
By default, B-tree indexes store their entries in ascending order with nulls last. This means that a forward scan of an index on column x produces output satisfying ORDER BY x (or more verbosely, ORDER BY x ASC NULLS LAST). The index can also be scanned backward, producing output satisfying ORDER BY x DESC (or more verbosely, ORDER BY x DESC NULLS FIRST, since NULLS FIRST is the default for ORDER BY DESC).
Doesn't this mean that when I'm not using the order by clause, and it uses a Index Scan, I will always get the latest value in return? Also I don't understand why the order by query is scanning backwards, when the record I want is in the other end?
Thanks!
Odd-R.
On Wed, 29 Jun 2011 16:42:36 +0200, Odd Hogstad <odd.hogstad@smartm.no> wrote: ... > By default, B-tree indexes store their entries in ascending order with nulls > last. This means that a forward scan of an index on column x produces output > satisfying ORDER BY x (or more verbosely, ORDER BY x ASC NULLS LAST). The > index can also be scanned backward, producing output satisfying ORDER BY x > DESC (or more verbosely, ORDER BY x DESC NULLS FIRST, since NULLS FIRST is > the default for ORDER BY DESC). > > Doesn't this mean that when I'm not using the order by clause, and it uses a > Index Scan, I will always get the latest value in return? Yes, but you're ordering by column id while you seek value into column fk. This is very different from the doc you quote: in your case, ordering by id returns you all values of fk BUT the ordering of fk is absolutely undefined. And, as I suppose fk stands for Foreign Key, you have many row using same values for fk (?) > Also I don't > understand why the order by query is scanning backwards, when the record I > want is in the other end? Because id is the primary key (I guess:) and ordering DESC puts id latest rows first in list, so limiting select to 1 returns the last one. -- BOFH excuse #287: Telecommunications is downshifting.
Odd Hogstad <odd.hogstad@smartm.no> writes: > I need to get the latest entry of a large table matching a certain criteria. > This is my query: > SELECT * FROM "data" WHERE "data"."fk" = 238496 ORDER BY "data"."id" DESC > LIMIT 1 A two-column index on (fk, id) would help for this. If you think about the ordering of the index entries you'll see why. regards, tom lane
2011/6/29 Jean-Yves F. Barbier <12ukwn@gmail.com>
The ordering of the fk doesn't matter to me now. Yes, there might be (and are) several ones with the same value for this. I just want the latest added one that matches. And I don't understand why this is not always the first one matching a forward scan, as new entries are put in front?
> Also I don't
> understand why the order by query is scanning backwards, when the record I
> want is in the other end?
Then why is it slow?
On Wed, 29 Jun 2011 16:42:36 +0200, Odd Hogstad <odd.hogstad@smartm.no> wrote:
...> By default, B-tree indexes store their entries in ascending order with nullsYes, but you're ordering by column id while you seek value into column fk.
> last. This means that a forward scan of an index on column x produces output
> satisfying ORDER BY x (or more verbosely, ORDER BY x ASC NULLS LAST). The
> index can also be scanned backward, producing output satisfying ORDER BY x
> DESC (or more verbosely, ORDER BY x DESC NULLS FIRST, since NULLS FIRST is
> the default for ORDER BY DESC).
>
> Doesn't this mean that when I'm not using the order by clause, and it uses a
> Index Scan, I will always get the latest value in return?
This is very different from the doc you quote: in your case, ordering by id
returns you all values of fk BUT the ordering of fk is absolutely undefined.
And, as I suppose fk stands for Foreign Key, you have many row using same
values for fk (?)
The ordering of the fk doesn't matter to me now. Yes, there might be (and are) several ones with the same value for this. I just want the latest added one that matches. And I don't understand why this is not always the first one matching a forward scan, as new entries are put in front?
> Also I don't
> understand why the order by query is scanning backwards, when the record I
> want is in the other end?
Because id is the primary key (I guess:) and ordering DESC puts id latest
rows first in list, so limiting select to 1 returns the last one.
Then why is it slow?
On Wed, 29 Jun 2011 17:19:00 +0200, Odd Hogstad <odd.hogstad@smartm.no> wrote: ... > > The ordering of the fk doesn't matter to me now. It should: it gives the condition that let you get the latest fk... > Yes, there might be (and > are) several ones with the same value for this. I just want the latest added > one that matches. And I don't understand why this is not always the first > one matching a forward scan, as new entries are put in front? Because, according to the docs, such a query as you told about *cannot* guaranty the order of the rows (logical: you ask for all fk=111 but nothing except sorting on id can insure you'll have fk rows in the right order.) > > Also I don't > > understand why the order by query is scanning backwards, when the record I > > want is in the other end? Take a sheet of paper and a pencil, write the whole shebang down, make this model run'by'hand and you'll see why. > Because id is the primary key (I guess:) and ordering DESC puts id latest > > rows first in list, so limiting select to 1 returns the last one. Anyway, Tom gave you the answer to speed up your query. -- <stu> Stupid nick highlighting <stu> Whenever someone starts with "stupid" it highlights the nick. Hmm. -- #Debian
>> I need to get the latest entry of a large table matching a certain criteria.
>> This is my query:
>> SELECT * FROM "data" WHERE "data"."fk" = 238496 ORDER BY "data"."id" DESC
>> LIMIT 1
>A two-column index on (fk, id) would help for this. If you think about
>the ordering of the index entries you'll see why.
I added an index for this, and now it goes much faster, thanks. The ones that contain data is now superfast, 1-2 ms, but the ones that does not match, and returns empty sets, are considerably slower, 150 -200 ms. Is this something I have to live with, or are there any tricks that will make these queries fast to?
Thanks again!
Odd-R.
>> This is my query:
>> SELECT * FROM "data" WHERE "data"."fk" = 238496 ORDER BY "data"."id" DESC
>> LIMIT 1
>A two-column index on (fk, id) would help for this. If you think about
>the ordering of the index entries you'll see why.
I added an index for this, and now it goes much faster, thanks. The ones that contain data is now superfast, 1-2 ms, but the ones that does not match, and returns empty sets, are considerably slower, 150 -200 ms. Is this something I have to live with, or are there any tricks that will make these queries fast to?
Thanks again!
Odd-R.