Thread: best way to fetch next/prev record based on index

best way to fetch next/prev record based on index

From
"Merlin Moncure"
Date:
> SELECT * FROM t WHERE
> (a >= a1 AND b>=b1 AND c>=c1) ORDER BY a,b,c LIMIT 1 OFFSET 1;
>
> using the way LIMIT cuts down on sort time (I've never tried it with
> both LIMIT and OFFSET, though; you could always use LIMIT 2 and skip a
> record client-side if that works better).

Don't want to further clutter the list (answered this question several
times already), but your query does not work.  What I meant to write
was:

select * from t where
    a >= a1 and
     (a >  a1 or b >= b1) and
     (a >  a1 or b > b1 or c > c1)
    order by a, b, c limit 1

The problem with your query is it excludes all values of c >= c1
regardless of values of a and b.
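[Editorially added sketch: the difference between the two predicates is easy to check outside the database. Illustrative Python only, with made-up table values, comparing the corrected expansion against the naive all-greater-or-equal form.]

```python
def next_key_pred(a, b, c, a1, b1, c1):
    """Merlin's expansion of (a,b,c) > (a1,b1,c1)."""
    return (a >= a1 and
            (a > a1 or b >= b1) and
            (a > a1 or b > b1 or c > c1))

def naive_pred(a, b, c, a1, b1, c1):
    """The broken form: excludes rows with small c regardless of a, b."""
    return a >= a1 and b >= b1 and c >= c1

key = (2, 3, 4)
rows = [(1, 9, 9), (2, 3, 4), (2, 3, 5), (4, 2, 1), (5, 0, 0)]

expanded = [r for r in rows if next_key_pred(*r, *key)]
naive = [r for r in rows if naive_pred(*r, *key)]

print(expanded)  # exactly the rows strictly after (2,3,4) in index order
print(naive)     # wrongly drops (4,2,1), because c=1 is not >= 4
```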

Merlin

Re: best way to fetch next/prev record based on index

From
Tom Lane
Date:
I said:
> Oh, wait, you're right --- I'm mis-visualizing the situation.
> Hmm, it sure seems like there ought to be an easy way to do this...

The problem is that a multi-column index doesn't actually have the
semantics you want.  If you are willing to consider adding another
index (or replacing the existing 3-column guy), how about

create index ti on t((array[a,b,c]));

select * from t where array[a,b,c] >= array[a1,b1,c1]
order by array[a,b,c]
limit 1 offset 1;

This seems to do the right thing in 7.4 and later.  It does require that
all three columns have the same datatype though; you weren't specific
about the data model ...
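[Editorially added sketch: the reason the array trick works is that a btree on array[a,b,c] orders entries lexicographically, so "next record" is a single index probe. Illustrative Python only, with made-up values and bisect standing in for the btree.]

```python
import bisect

# A sorted list plays the role of the btree on array[a,b,c].
index = sorted([(1, 9, 9), (2, 3, 4), (2, 3, 5), (4, 2, 1), (5, 0, 0)])

def fetch_next(key):
    """First entry strictly greater than key, analogous to
    WHERE array[a,b,c] >= array[a1,b1,c1] ORDER BY ... LIMIT 1 OFFSET 1
    when key itself is present in the index."""
    pos = bisect.bisect_right(index, key)
    return index[pos] if pos < len(index) else None

print(fetch_next((2, 3, 4)))  # (2, 3, 5)
```

Note the OFFSET 1 in the SQL is only there to skip the starting key itself; the bisect equivalent does that directly.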

            regards, tom lane

Re: best way to fetch next/prev record based on index

From
"Merlin Moncure"
Date:
Greg wrote:
> One thing that can help is to add ORDER BY a,b,c LIMIT 1 to your
> query.  That will virtually guarantee that it uses an index scan,
> which will at least avoid making it scan all the records *after*
> finding the match.  However it still doesn't seem to make Postgres use
> an Index Cond to allow it to do an instant lookup.

Yes, order by/limit was accidentally left off my original example.  My
problem is with the word 'virtually'.

> do it for multi-column keys. It seems it would be nice if some syntax
> similar
> to (a,b,c) > (a1,b1,c1) worked for this.

'nice' would be an understatement...
if the above syntax is not defined in the standard, I would humbly
suggest, well, beg for it to work as you thought it did.  That would be
GREAT!  ISTM it may be that that is in fact standard...(I don't have it,
so I don't know).

Merlin



Re: best way to fetch next/prev record based on index

From
Tom Lane
Date:
"Merlin Moncure" <merlin.moncure@rcsonline.com> writes:
> Plus, your where clause does not guarantee results.

No, but in combination with the ORDER BY it does.  Please note also
that the offset would *always* be one, so your gripe about it not
scaling seems misguided to me.

            regards, tom lane

Re: best way to fetch next/prev record based on index

From
andrew@pillette.com
Date:
"Merlin Moncure" <merlin.moncure@rcsonline.com> wrote ..
[snip]
> select * from t where
>     a >= a1 and
>      (a >  a1 or b >= b1) and
>      (a >  a1 or b > b1 or c > c1)

I don't see why this is guaranteed to work without an ORDER BY clause, even if TABLE t is clustered on the correct
index.  Am I missing something?  I have two suggestions:

(1) I think I would have written

SELECT * FROM t WHERE
(a >= a1 AND b>=b1 AND c>=c1) ORDER BY a,b,c LIMIT 1 OFFSET 1;

using the way LIMIT cuts down on sort time (I've never tried it with both LIMIT and OFFSET, though; you could always
use LIMIT 2 and skip a record client-side if that works better).

(2) I've seen code where depending on the types and values of the fields, it was possible to construct a string from a,
b, c by some sort of concatenation where the index now agreed with the lexicographic (dictionary) ordering on the
string.  Postgres could do that with a functional index, if your values can be used with this trick.

Re: best way to fetch next/prev record based on index

From
Markus Schaber
Date:
Hi, Merlin,

On Tue, 27 Jul 2004 10:21:32 -0400
"Merlin Moncure" <merlin.moncure@rcsonline.com> wrote:

> The basic problem is the planner can't always match the query to the
> index.  So, either the planner has to be helped/fixed or I have to
> explore another solution.  This seems to happen most when the 'a'
> column has very poor selectivity.  In this case, the planner will only
> examine the 'a' part of the key.

So it may help to add some more indices so that you have an index for
every permutation:

Create an index on (a,b,c), (a,c,b), (b,c,a), (b,a,c), (c,a,b) and (c,b,a).

Then, as long as one of the columns has enough selectivity, the planner
should be able to select the correct index.  Maybe increasing the number
of statistics samples for the columns is useful as well.

HTH,
Markus

--
markus schaber | dipl. informatiker
logi-track ag | rennweg 14-16 | ch 8001 zürich
phone +41-43-888 62 52 | fax +41-43-888 62 53
mailto:schabios@logi-track.com | www.logi-track.com

Re: best way to fetch next/prev record based on index

From
"Merlin Moncure"
Date:
> > So, for a table t with a three part key over columns a,b,c, the
> > query to read the next value from t for given values a1, b1, c1 is
> >
> > select * from t where
> >     a >= a1 and
> >      (a >  a1 or b >= b1) and
> >      (a >  a1 or b > b1 or c > c1)
>
> You must not rely on such trickery to get any ordering, as the SQL data
> model contains no ordering, and a query optimizer is free to deliver
> you the tuples in any order it feels like.
>
> Why don't you add a 'ORDER BY a,b,c ASC' to your query?

Left that part out (oops) :).  My queries always have that at the end
(or they will give incorrect results!).  All are suffixed with order by
a,b,c limit n.  n is manipulated in some cases for progressive
read-ahead (kind of like fetch 'n' in cursors).

The basic problem is the planner can't always match the query to the
index.  So, either the planner has to be helped/fixed or I have to
explore another solution.  This seems to happen most when the 'a' column
has very poor selectivity.  In this case, the planner will only examine
the 'a' part of the key.

Merlin

Re: best way to fetch next/prev record based on index

From
Tom Lane
Date:
Tom Lane <tgl@sss.pgh.pa.us> writes:
> "Merlin Moncure" <merlin.moncure@rcsonline.com> writes:
>> Plus, your where clause does not guarantee results.

> No, but in combination with the ORDER BY it does.

Oh, wait, you're right --- I'm mis-visualizing the situation.

Hmm, it sure seems like there ought to be an easy way to do this...

            regards, tom lane

Re: best way to fetch next/prev record based on index

From
"Merlin Moncure"
Date:
> Hmm, it sure seems like there ought to be an easy way to do this...

Here is the only alternative that I see:
create function column_stacker(columns text[], types text[]) returns text
[...]
language 'C' immutable;

The above function stacks the columns together into a single string for
easy range indexing.

create index t_idx on t (column_stacker(array[a::text, b::text, c::text],
    array['int', 'int', 'char(2)']));

This is a lot more complicated than it sounds, but it can be done.  The
use of arrays is forced because of limitations in the way pg handles
parameters (no big deal).  The real disadvantage here is that these
indexes don't help with normal queries, so every key gets two indexes :(.

I'm just looking for a nudge in the right direction here...if the answer
is GIST, I'll start researching that, etc.  The ideal solution for me
would be a smarter planner or a simple C function to get the next record
out of the index (exposed through a UDF).

Everything has to stay generic...the ultimate goal is an ISAM driver for
pg.

Merlin



Re: best way to fetch next/prev record based on index

From
Greg Stark
Date:
"Merlin Moncure" <merlin.moncure@rcsonline.com> writes:

> > do it for multi-column keys. It seems it would be nice if some syntax
> > similar to (a,b,c) > (a1,b1,c1) worked for this.
>
> 'nice' would be an understatement...
>
> if the above syntax is not defined in the standard, I would humbly suggest,
> well, beg for it to work as you thought it did. That would be GREAT! ISMT it
> may be that that is in fact standard...(I don't have it, so I don't know).


Hum. It would seem my intuition matches the SQL92 spec and Postgres gets this
wrong.

From page 208 (Section 8.2.7) of
 http://www.contrib.andrew.cmu.edu/~shadow/sql/sql1992.txt


         7) Let Rx and Ry be the two <row value constructor>s of the <com-
            parison predicate> and let RXi and RYi be the i-th <row value
            constructor element>s of Rx and Ry, respectively. "Rx <comp op>
            Ry" is true, false, or unknown as follows:

            a) "x = Ry" is true if and only if RXi = RYi for all i.

            b) "x <> Ry" is true if and only if RXi <> RYi for some i.

            c) "x < Ry" is true if and only if RXi = RYi for all i < n and
              RXn < RYn for some n.

            d) "x > Ry" is true if and only if RXi = RYi for all i < n and
              RXn > RYn for some n.

            ...


(This is a July 10, 1992 proposed revision; I don't know how far it differs
from the final.  I imagine they mean "Rx" in all the places they use "x" alone.)

That fairly clearly specifies (a,b,c) < (a1,b1,c1) to work the way you want it
to. Less-than-or-equal is then defined based on the above definition.
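[Editorially added sketch: the spec's definition can be transcribed directly, and it coincides with ordinary lexicographic tuple comparison. Illustrative Python only.]

```python
def row_lt(rx, ry):
    """SQL92 8.2.7: Rx < Ry iff RXi = RYi for all i < n and RXn < RYn
    for some n, i.e. the rows agree on a prefix and the first
    mismatching element decides."""
    for x, y in zip(rx, ry):
        if x != y:
            return x < y  # first mismatch decides
    return False  # all elements equal: not less-than

# This is exactly lexicographic order, i.e. Python's own tuple <:
cases = [((1, 2, 3), (1, 2, 4)), ((1, 2, 3), (1, 2, 3)), ((2, 0, 0), (1, 9, 9))]
for rx, ry in cases:
    assert row_lt(rx, ry) == (rx < ry)
```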


Even if Postgres did this right I'm not sure that would solve your index woes.
I imagine the first thing Postgres would do is rewrite it into regular scalar
expressions. Ideally the optimizer should be capable of then deducing from the
scalar expressions that an index scan would be useful.

--
greg

Re: best way to fetch next/prev record based on index

From
"Merlin Moncure"
Date:
Markus wrote:
> > The basic problem is the planner can't always match the query to the
> > index.  So, either the planner has to be helped/fixed or I have to
> > explore another solution.  This seems to happen most when the 'a'
> > column has very poor selectivity.  In this case, the planner will
> > only examine the 'a' part of the key.
>
> So it may help to add some more indices so you have an index for all
> permutations,
>
> Create an index on (a,b,c), (a,c,b), (b,c,a), (b,a,c), (c,a,b) and
> (c,b,a).

It is mathematically impossible for any index except (a,b,c) to
work.  (In theory, (a,b) could be used, but that wouldn't help.)  In any
event, creating 1000s and 1000s of extra indices is not an option.

Here are some log snippets illustrating my problem.

First, a snippet showing things when everything is working ok
(notice the sub-millisecond times):

prepare data3_read_next_menu_item_recent_file_0 (character varying,
numeric, numeric, numeric)
    as select xmin, xmax, *
    from data3.menu_item_recent_file
    where mir_user_id >= $1 and
        (mir_user_id >  $1 or  mir_menu_item_id >= $2) and
        (mir_user_id >  $1 or  mir_menu_item_id >  $2 or
mir_sequence_no >  $3)
    order by mir_user_id, mir_menu_item_id, mir_sequence_no
    limit $4 0.000849704 sec
data3_read_next_menu_item_recent_file_0 0.000435999 sec params:
$1=MERLIN $2=00057 $3=00000001 $4=1
data3_read_next_menu_item_recent_file_0 0.0117151 sec params: $1=MERLIN
$2=00058 $3=00000002 $4=2
data3_read_next_menu_item_recent_file_0 0.0385374 sec params: $1=MERLIN
$2=00203 $3=00000005 $4=3
data3_read_next_menu_item_recent_file_0 0.0211677 sec params: $1=MERLIN
$2=00449 $3=00000010 $4=4
data3_read_next_menu_item_recent_file_0 0.000818999 sec params:
$1=MERLIN $2=00813 $3=00000008 $4=5

Here is a log snippet when there is a problem:
data3_start_nl_line_file_0 37.2677 sec params: $1= $2=008768 $3=003 $4=1

prepare data3_read_next_line_file_0 (character, numeric, numeric,
numeric)
    as select xmin, xmax, *
    from data3.line_file
    where li_quote_flag >= $1 and
        (li_quote_flag >  $1 or  li_order_no >= $2) and
        (li_quote_flag >  $1 or  li_order_no >  $2 or  li_seq_no
>  $3)
    order by li_quote_flag, li_order_no, li_seq_no
    limit $4 0.000839501 sec
data3_read_next_line_file_0 0.313869 sec params: $1= $2=008768 $3=005
$4=1
data3_read_next_line_file_0 0.343179 sec params: $1= $2=008768 $3=006
$4=2
data3_read_next_line_file_0 0.308703 sec params: $1= $2=008768 $3=008
$4=3
data3_read_next_line_file_0 0.306802 sec params: $1= $2=008768 $3=011
$4=4
data3_read_next_line_file_0 0.311033 sec params: $1= $2=008768 $3=015
$4=5

In the above statements, .3 sec to return a single row is very poor.
Explain only matches li_quote_flag to the index which offers very poor
selectivity.  li_quote_flag is a char(1) and there is an index on
line_file on the three where columns in the proper order.

Merlin

Re: best way to fetch next/prev record based on index

From
"Merlin Moncure"
Date:
> > select * from t where
> >     a >= a1 and
> >      (a >  a1 or b >= b1) and
> >      (a >  a1 or b > b1 or c > c1)
>
> > In about 95% of cases, the planner correctly selects the index
> > t(a,b,c) and uses it.
>
> I'm surprised it's that good.  Why not do

It is.  In fact, it's so good, I mistakenly assumed it would get it
right all the time.  That led me directly to my current situation.

>     select * from t where a >= a1 and b >= b1 and c >= c1
>     order by a,b,c
>     limit 1 offset 1;
Note: I left off the limit/order part of the query in my original
example.

My previous experience with offset was that it's not practical for this
type of use.  Query time degrades as the offset gets large: walking a
table that way is basically n^2/2 total work.  If offset were O(1)
for any sized offset, the problem would be trivial.
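[Editorially added sketch: the n^2/2 figure follows because a fetch at OFFSET k still has to walk past k rows. An illustrative back-of-envelope model in Python.]

```python
# Cost model: reading row k via "... LIMIT 1 OFFSET k" touches k+1 rows,
# so walking all n rows that way touches roughly n*(n+1)/2 rows in total.
n = 1000
offset_cost = sum(k + 1 for k in range(n))  # total rows touched via OFFSET
keyed_cost = n                              # next-key probe: ~1 row per fetch

print(offset_cost)  # 500500, quadratic growth
print(keyed_cost)   # 1000, linear growth
```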

Plus, your where clause does not guarantee results.

Imagine:
a  b  c
2  3  4
4  2  1

Here (4,2,1) is the next record after (2,3,4), but your where clause
excludes it because c (1) is not >= c1 (4).

The only other way to rewrite the query is thus (pg has much more
trouble with this form):
select * from t where
    a >   a1 or
     (a >=  a1 and b > b1) or
     (a >=  a1 and b >= b1 and c > c1)

Merlin

Re: best way to fetch next/prev record based on index

From
"Merlin Moncure"
Date:
> Greg Stark <gsstark@MIT.EDU> writes:
> This approach won't get the optimizer to actually use an index for
> these comparisons, but it will fix the semantics to match the spec.
> Later we can either improve the optimizer to detect expressions like
> this (which I think would be cooler since some users may write them by
> hand and not use the row-expression approach, but I don't see how to
> do it), or introduce a new run-time node and have the optimizer handle
> it.  But at least we won't have to worry about backwards-compatibility
> issues with the semantics changing.
>
> Oh, I tried to stick to the style, but sometimes I couldn't help
> myself.  I suppose I would have to fix up the style the rest of the
> way if I got it working and you wanted a patch to apply.

Regarding the <= and >= operators:  can you apply them in the complex
pass? If you can, this might be more efficient.

> /*
>  * Handles something like:
>  * (A,B,C) > (X,Y,Z)
>  *
>  * By constructing something like:
>  * ( ( A > X) OR (A=X AND B>Y) OR (A=X AND B=Y AND C>Z) )
>  *                                                  ^
>  */                                                 |

The last comparison of the last major clause (or the only comparison for
a single-field row construct) is a special case.  In > cases use >, in
>= cases use >=, etc.; this is logically equivalent to or-ing the simple
= case with the complex > case.
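[Editorially added sketch: the transformation being described can be written as a small rewrite rule. Illustrative Python only, with a hypothetical helper name; only the final comparison of the final clause carries the caller's operator.]

```python
def expand_row_cmp(cols, last_op):
    """Expand (c1,..,cn) <op> (v1,..,vn) into OR-of-AND clauses.
    last_op is '>' or '>='; every non-final comparison stays strict."""
    clauses = []
    for i in range(len(cols)):
        # equality on the prefix columns ...
        parts = ["%s = %s1" % (c, c) for c in cols[:i]]
        # ... strict comparison, except the very last one uses last_op
        op = last_op if i == len(cols) - 1 else ">"
        parts.append("%s %s %s1" % (cols[i], op, cols[i]))
        clauses.append("(" + " AND ".join(parts) + ")")
    return " OR ".join(clauses)

print(expand_row_cmp(["a", "b", "c"], ">="))
# (a > a1) OR (a = a1 AND b > b1) OR (a = a1 AND b = b1 AND c >= c1)
```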

Is this step of the transformation visible to the optimizer/planner?
For purely selfish reasons, it would be really nice if a field by field
row construction could get a fast path to the index if the fields match
the index fields.

Merlin

Re: best way to fetch next/prev record based on index

From
"Merlin Moncure"
Date:
> Greg Stark <gsstark@mit.edu> writes:
> > Removing <,<=,>,>= would be trivial.
>
> ... and not backwards-compatible.  If we did that then cases involving
> unlabeled row expressions would no longer work as they did in prior
> releases.  For example
>
>     select (1,2,3) < (4,5,6);
>
> is accepted by all releases back to 7.0, and probably much further
> (but 7.0 is the oldest I have handy to test).  The only reason the
> code in parse_expr.c appears new is that the functionality used to be
> in gram.y.
>
> I'd like to see this fixed to comply with the spec, but given the lack
> of complaints about the existing behavior over so many years, ripping
> it out meanwhile doesn't seem appropriate.

Just to clarify:
I think Greg is arguing to bring pg to SQL 92 spec and not remove
anything.  ISTM the SQL standard was designed with exactly my problem in
mind: how to get the next key in a table.

IMHO, relying on

select (1,2,3) < (4,5,6);

to give a result which is neither standard nor documented seems to be
bad style.  The current methodology could cause pg to give incorrect
results in TPC benchmarks...not good.  Also, it's trivial to rewrite
that comparison with the old behavior using 'and'.  OTOH, it is not
trivial to rewrite the comparison to do it the correct way...it's kind
of an SQL 'trick question'.  Most likely, a very small minority of pg
users are even aware of the above syntax anyway.

To be fair, I'm a big fan of deprecating features for at least one
release for compatibility reasons.  It's no big deal to me, because I'm
already writing the queries out the long way anyway.  My interests are
in the optimizer.  If there is a way to enhance it so that it handles
multi-column comparisons in a logical way, that would be great.  Is this
theoretically possible (probable)?

Merlin


Join performance

From
Pierre-Frédéric Caillaud
Date:
    Hello,

    I'm building a kind of messaging/forum application with postgresql. There
are users which will send each others messages, send messages to forums,
etc.
    I would like to put the messages in folders, so each user has several
folders (inbox, sent...), and forums are also folders.

    A message will appear in the sender's "sent" folder and the receiver's
inbox, or receiving forum folder.

    There are two ways I can do this :
    - either by placing two folder fields (outbox_folder_id and
receiving_folder_id) in the messages table, which can both point to a
folder or be null. When a user sends a message to another user/folder,
these fields are set appropriately.
    - or by having a messages table, and a link table linking messages to
folders.

    I have built a test database with about 20000 messages in 2000 folders
(10 messages per folder).

    Finding all the messages in a folder takes more than 600 times longer
with the link table than with the single table approach (66 ms vs. 0.12
ms).

    Is this normal?  I have checked explain analyze and it uses indexes in
all cases.  The query plans look right to me.


Re: best way to fetch next/prev record based on index

From
"Merlin Moncure"
Date:
Just an update on this:
queries in the 'next key' form on fields (a,b,c)
only ever use the index for the first field (a).  I just never noticed
that before...in most cases this is selective enough.  In most logical
cases with multi-part keys, the most important stuff comes first.

Curiously, queries in the Boolean reversed form (switching and with or,
etc.) find the index but do not apply any condition, kind of like an
indexed sequential scan...

Well, if and when the rowtype comparison can be made to work over multi
part keys (and the optimizer is made to do tricks there), postgres can
be made to give much better generalized ISAM access.  In the meantime,
I'll just troubleshoot specific cases via application specific behavior
as they come up.  In any case, many thanks to Greg and Tom for taking
the time to pick this apart.

Merlin

Re: best way to fetch next/prev record based on index

From
Greg Stark
Date:
"Merlin Moncure" <merlin.moncure@rcsonline.com> writes:

> Well, if and when the rowtype comparison can be made to work over multi
> part keys (and the optimizer is made to do tricks there), postgres can
> be made to give much better generalized ISAM access.  In the meantime,
> I'll just troubleshoot specific cases via application specific behavior
> as they come up.  In any case, many thanks to Greg and Tom for taking
> the time to pick this apart.

Well I'm not sure whether you caught it, but Tom did come up with a
work-around that works with the current infrastructure if all the columns
involved are the same datatype.

You can create a regular btree index on the expression array[a,b,c] and then
do your lookup using array[a,b,c] > array[a1,b1,c1].

This will only work in 7.4, not previous releases, for several reasons.

--
greg

Re: best way to fetch next/prev record based on index

From
"Merlin Moncure"
Date:
Greg Stark wrote:
> Well I'm not sure whether you caught it, but Tom did come up with a
> work-around that works with the current infrastructure if all the
> columns involved are the same datatype.
>
> You can create a regular btree index on the expression array[a,b,c]
> and then do your lookup using array[a,b,c] > array[a1,b1,c1].

Unfortunately, ISAM files allow keys based on combinations of fields on
any type.  So this is not an option. (I have spent over 6 months
researching this problem).

However, this would work:

create index t_idx on t (stackparam(array[a::text, b::text, c::text],
    array['char(2)', 'int', 'date']));

With the 'type strings' queried out in advance.  stackparam(text[],
text[]) is a C function which uses the types and cats the strings
together in such a way that preserves sorting.  In any case, this is an
ugly and inefficient mess, and I have no desire to do this unless there
is no other way.  I would much rather see postgres 'get' (a,b,c) > (a1,
b1, c1)...if there is even a chance this is possible, I'll direct my
efforts there.  IMNSHO, this form was invented by the SQL folks for
dealing with data in an ISAM manner; postgres should be able to do it
and do it well.

Merlin

Re: best way to fetch next/prev record based on index

From
Tom Lane
Date:
"Merlin Moncure" <merlin.moncure@rcsonline.com> writes:
> I would much rather see postgres 'get' (a,b,c) > (a1,
> b1, c1)...if there is even a chance this is possible, I'll direct my
> efforts there.

For the ISAM context this whole discussion is kinda moot, because you
really don't want to have to plan and execute a fairly expensive query
for every record fetch.  If we had all the improvements discussed, it
would be a reasonably cheap operation by the standards of "execute
an independent query", but by the standards of "fetch next record
in my query" it'll still suck.  (Using PREPARE might help, but only
somewhat.)

It strikes me that what you really want for ISAM is to improve the
cursor mechanism so it will do the things you need.  I'm not sure
what's involved, but let's talk about that angle for a bit.

            regards, tom lane

Re: best way to fetch next/prev record based on index

From
Greg Stark
Date:
"Merlin Moncure" <merlin.moncure@rcsonline.com> writes:

> However, this would work:
> Create index on t(stackparam(array[a::text,b::text,c::text),
> array['char(2)', 'int', 'date')];

Well, I fear not all datatypes sort properly when treated as text. Notably
integers don't. "10" sorts before "2" for example. You could probably deal
with this with careful attention to each datatype you're converting if you're
interested in going to that length.

--
greg

Re: best way to fetch next/prev record based on index

From
"Merlin Moncure"
Date:
> "Merlin Moncure" <merlin.moncure@rcsonline.com> writes:
> > I would much rather see postgres 'get' (a,b,c) > (a1,
> > b1, c1)...if there is even a chance this is possible, I'll direct my
> > efforts there.
>
> For the ISAM context this whole discussion is kinda moot, because you
> really don't want to have to plan and execute a fairly expensive query
> for every record fetch.  If we had all the improvements discussed, it
> would be a reasonably cheap operation by the standards of "execute
> an independent query", but by the standards of "fetch next record
> in my query" it'll still suck.  (Using PREPARE might help, but only
> somewhat.)
>
> It strikes me that what you really want for ISAM is to improve the
> cursor mechanism so it will do the things you need.  I'm not sure
> what's involved, but let's talk about that angle for a bit.

Sorry for the long post, but here's an explanation of why I think things
are better off where they are.

I've created a middleware that translates ISAM file I/O on the fly to
SQL and uses prepared statements over parse/bind to execute them.  This
is why I was so insistent against scoping prepared statement lifetime to
transaction level.

Cursors seem attractive at first, but they are a decidedly mixed bag.
First of all, PostgreSQL cursors are insensitive, which absolutely
precludes their use.  Supposing they weren't though, I'm not so sure I'd
use them if it was possible to do the same things via vanilla queries.

It turns out that prepared statements get the job done.  Moving them to
parse/bind got me a 30% drop in server cpu time and statement execution
times run between .1 and .5 ms for random reads.  Sequential reads go
from that fast to slower depending on index performance.  So, I don't
have performance issues except where the index doesn't deliver.

2000-10000 reads/sec is competitive with commercial ISAM filesystems on
the pc (assuming the application is not running on the server), and it
is far better than any other commercial ISAM emulation I've played with
up to this point.  I don't have concrete #s, but the server can deliver
2-3 times that in concurrency situations; in many cases the application
performance is actually network bound...this is all on pc hardware.  Of
course, mainframes can do much, much better than this but that's really
outside the scope of what I'm trying to do.

So, things run pretty fast as they are, and keeping things running
through queries keeps things generic and flexible.  Also, queries
consume zero resources on the server except for the time it takes to
process and deal with them.  Cursors, OTOH, have to be opened up for
every table for every user.  Cursors are read-only always, whereas views
can be updated with rules.  It's worth noting that other commercial ISAM
emulation systems (for example Acu4GL by AcuCorp) cut queries on the fly
as well, even when cursor options are available.

If cursors became sensitive, they would be worth consideration.  I've
never really had 1000 large ones open at once; it would be interesting
to see how that worked.  An ideal object would be a 'pseudo cursor': an
insensitive cursor shared by multiple users, with each having their own
record pointer.  This might be better handled via middleware though
(this might also give better options for handling different locking
scenarios).

Merlin