Thread: best way to fetch next/prev record based on index
> SELECT * FROM t WHERE
>     (a >= a1 AND b >= b1 AND c >= c1)
> ORDER BY a,b,c LIMIT 1 OFFSET 1;
>
> using the way LIMIT cuts down on sort time (I've never tried it with
> both LIMIT and OFFSET, though; you could always use LIMIT 2 and skip a
> record client-side if that works better).

Don't want to further clutter the list (I've answered this question
several times already), but your query does not work.  What I meant to
write was:

    select * from t where
        a >= a1 and
        (a > a1 or b >= b1) and
        (a > a1 or b > b1 or c > c1)
    order by a, b, c limit 1

The problem with your query is that it demands c >= c1 regardless of the
values of a and b, so it wrongly excludes rows with c < c1 that come
later in (a,b,c) order.

Merlin
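To make the failure concrete, here is a tiny invented example (the table
and values are hypothetical):

    create table t (a int, b int, c int);
    insert into t values (1, 1, 5);
    insert into t values (2, 1, 1);

    -- The record after (1,1,5) in (a,b,c) order is (2,1,1), but the
    -- naive form demands c >= 5 and skips it:
    select * from t where a >= 1 and b >= 1 and c >= 5
    order by a, b, c limit 1 offset 1;
    -- (no rows)

    -- The corrected form finds it:
    select * from t where
        a >= 1 and
        (a > 1 or b >= 1) and
        (a > 1 or b > 1 or c > 5)
    order by a, b, c limit 1;
    -- (2, 1, 1)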
I said:
> Oh, wait, you're right --- I'm mis-visualizing the situation.
> Hmm, it sure seems like there ought to be an easy way to do this...

The problem is that a multi-column index doesn't actually have the
semantics you want.  If you are willing to consider adding another index
(or replacing the existing 3-column guy), how about

    create index ti on t((array[a,b,c]));

    select * from t
    where array[a,b,c] >= array[a1,b1,c1]
    order by array[a,b,c]
    limit 1 offset 1;

This seems to do the right thing in 7.4 and later.  It does require that
all three columns have the same datatype, though; you weren't specific
about the data model ...

			regards, tom lane
Greg wrote:
> One thing that can help is to add ORDER BY a,b,c LIMIT 1 to your query.
> That will virtually guarantee that it uses an index scan, which will at
> least avoid making it scan all the records *after* finding the match.
> However it still doesn't seem to make Postgres use an Index Cond to
> allow it to do an instant lookup.

Yes, order by/limit was accidentally left off my original example.  My
problem is with the word 'virtually'.

> do it for multi-column keys.  It seems it would be nice if some syntax
> similar to (a,b,c) > (a1,b1,c1) worked for this.

'nice' would be an understatement...

If the above syntax is not defined in the standard, I would humbly
suggest, well, beg for it to work as you thought it did.  That would be
GREAT!  ISTM it may in fact be standard... (I don't have it, so I don't
know).

Merlin
"Merlin Moncure" <merlin.moncure@rcsonline.com> writes: > Plus, your where clause does not guarantee results. No, but in combination with the ORDER BY it does. Please note also that the offset would *always* be one, so your gripe about it not scaling seems misguided to me. regards, tom lane
"Merlin Moncure" <merlin.moncure@rcsonline.com> wrote .. [snip] > select * from t where > a >= a1 and > (a > a1 or b >= b1) and > (a > a1 or b > b1 or c > c1) I don't see why this is guaranteed to work without an ORDER BY clause, even if TABLE t is clustered on the correct index.Am I missing something? I have two suggestions: (1) I think I would have written SELECT * FROM t WHERE (a >= a1 AND b>=b1 AND c>=c1) ORDER BY a,b,c LIMIT 1 OFFSET 1; using the way LIMIT cuts down on sort time (I've never tried it with both LIMIT and OFFSET, though; you could always useLIMIT 2 and skip a record client-side if that works better). (2) I've seen code where depending on the types and values of the fields, it was possible to construct a string from a, b,c by some sort of concatenation where the index now agreed with the lexicographic (dictionary) ordering on the string.Postgres could do that with a functional index, if your values can be used with this trick.
Hi, Merlin,

On Tue, 27 Jul 2004 10:21:32 -0400
"Merlin Moncure" <merlin.moncure@rcsonline.com> wrote:

> The basic problem is the planner can't always match the query to the
> index.  So, either the planner has to be helped/fixed or I have to
> explore another solution.  This seems to happen most when the 'a'
> column has very poor selectivity.  In this case, the planner will only
> examine the 'a' part of the key.

So it may help to add some more indices, so that you have an index for
every permutation: create an index on (a,b,c), (a,c,b), (b,c,a), (b,a,c),
(c,a,b) and (c,b,a), as spelled out below.  Then, as long as one of the
columns has enough selectivity, the planner should be able to select the
correct index.

Maybe increasing the number of random samples for the columns is useful,
too.

HTH,
Markus

--
markus schaber | dipl. informatiker
logi-track ag | rennweg 14-16 | ch 8001 zürich
phone +41-43-888 62 52 | fax +41-43-888 62 53
mailto:schabios@logi-track.com | www.logi-track.com
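Spelled out (index names invented):

    create index t_abc on t (a, b, c);
    create index t_acb on t (a, c, b);
    create index t_bca on t (b, c, a);
    create index t_bac on t (b, a, c);
    create index t_cab on t (c, a, b);
    create index t_cba on t (c, b, a);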
> > So, for a table t with a three part key over columns a,b,c, the query
> > to read the next value from t for given values a1, b1, c1 is
> >
> > select * from t where
> >     a >= a1 and
> >     (a > a1 or b >= b1) and
> >     (a > a1 or b > b1 or c > c1)
>
> You must not rely on such trickery to get any ordering, as the SQL data
> model contains no ordering, and a query optimizer is free to deliver
> you the tuples in any order it feels like.
>
> Why don't you add an 'ORDER BY a,b,c ASC' to your query?

Left that part out (oops) :).  My queries always have that at the end (or
they will give incorrect results!).  All are suffixed with 'order by
a, b, c limit n', where n is manipulated in some cases for progressive
read-ahead (kind of like FETCH n in cursors).

The basic problem is the planner can't always match the query to the
index.  So, either the planner has to be helped/fixed or I have to
explore another solution.  This seems to happen most when the 'a' column
has very poor selectivity.  In this case, the planner will only examine
the 'a' part of the key.

Merlin
Tom Lane <tgl@sss.pgh.pa.us> writes:
> "Merlin Moncure" <merlin.moncure@rcsonline.com> writes:
>> Plus, your where clause does not guarantee results.

> No, but in combination with the ORDER BY it does.

Oh, wait, you're right --- I'm mis-visualizing the situation.

Hmm, it sure seems like there ought to be an easy way to do this...

			regards, tom lane
> Hmm, it sure seems like there ought to be an easy way to do this...

Here is the only alternative that I see:

    create function column_stacker(text[] columns, text[] types)
        returns text [...] language 'C' immutable;

The above function stacks the columns together in a single string for
easy range indexing:

    create index t_idx on t (column_stacker(
        array[t.a::text, t.b::text, t.c::text],
        array['int', 'int', 'char(2)']));

This is a lot more complicated than it sounds, but it can be done.  The
use of arrays is forced because of limitations in the way pg handles
parameters (no big deal).  The real disadvantage is that these indexes
don't help with normal queries, so every key gets two indexes :(.

I'm just looking for a nudge in the right direction here... if the answer
is GiST, I'll start researching that, etc.  The ideal solution for me
would be a smarter planner or a simple C function to get the next record
out of the index (exposed through a UDF).  Everything has to stay
generic... the ultimate goal is an ISAM driver for pg.

Merlin
"Merlin Moncure" <merlin.moncure@rcsonline.com> writes: > > do it for multi-column keys. It seems it would be nice if some syntax > > similar to (a,b,c) > (a1,b1,c1) worked for this. > > 'nice' would be an understatement... > > if the above syntax is not defined in the standard, I would humbly suggest, > well, beg for it to work as you thought it did. That would be GREAT! ISMT it > may be that that is in fact standard...(I don't have it, so I don't know). Hum. It would seem my intuition matches the SQL92 spec and Postgres gets this wrong. From page 208 (Section 8.2.7) of http://www.contrib.andrew.cmu.edu/~shadow/sql/sql1992.txt 7) Let Rx and Ry be the two <row value constructor>s of the <com- parison predicate> and let RXi and RYi be the i-th <row value constructor element>s of Rx and Ry, respectively. "Rx <comp op> Ry" is true, false, or unknown as follows: a) "x = Ry" is true if and only if RXi = RYi for all i. b) "x <> Ry" is true if and only if RXi <> RYi for some i. c) "x < Ry" is true if and only if RXi = RYi for all i < n and RXn < RYn for some n. d) "x > Ry" is true if and only if RXi = RYi for all i < n and RXn > RYn for some n. ... (This is A July 10, 1992 Proposed revision, I don't know how far it differs from the final. I imagine they mean "Rx" in all the places they use "x" alone) That fairly clearly specifies (a,b,c) < (a1,b1,c1) to work the way you want it to. Less-than-or-equal is then defined based on the above definition. Even if Postgres did this right I'm not sure that would solve your index woes. I imagine the first thing Postgres would do is rewrite it into regular scalar expressions. Ideally the optimizer should be capable of then deducing from the scalar expressions that an index scan would be useful. -- greg
Markus wrote:
> > The basic problem is the planner can't always match the query to the
> > index.  So, either the planner has to be helped/fixed or I have to
> > explore another solution.  This seems to happen most when the 'a'
> > column has very poor selectivity.  In this case, the planner will
> > only examine the 'a' part of the key.
>
> So it may help to add some more indices so you have an index for all
> permutations,
>
> Create an index on (a,b,c), (a,c,b), (b,c,a), (b,a,c), (c,a,b) and
> (c,b,a).

It is mathematically impossible for any index except (a,b,c) to work.
Although, in theory, (a,b) could be used... but that wouldn't help.  In
any event, creating 1000s and 1000s of extra indices is not an option.

Here are some log snippets illustrating my problem.  First, things when
everything is working ok (notice the sub-millisecond times):

    prepare data3_read_next_menu_item_recent_file_0 (character varying,
        numeric, numeric, numeric) as
    select xmin, xmax, * from data3.menu_item_recent_file where
        mir_user_id >= $1 and
        (mir_user_id > $1 or mir_menu_item_id >= $2) and
        (mir_user_id > $1 or mir_menu_item_id > $2 or mir_sequence_no > $3)
    order by mir_user_id, mir_menu_item_id, mir_sequence_no limit $4
    0.000849704 sec

    data3_read_next_menu_item_recent_file_0  0.000435999 sec
        params: $1=MERLIN $2=00057 $3=00000001 $4=1
    data3_read_next_menu_item_recent_file_0  0.0117151 sec
        params: $1=MERLIN $2=00058 $3=00000002 $4=2
    data3_read_next_menu_item_recent_file_0  0.0385374 sec
        params: $1=MERLIN $2=00203 $3=00000005 $4=3
    data3_read_next_menu_item_recent_file_0  0.0211677 sec
        params: $1=MERLIN $2=00449 $3=00000010 $4=4
    data3_read_next_menu_item_recent_file_0  0.000818999 sec
        params: $1=MERLIN $2=00813 $3=00000008 $4=5

Here is a snippet when there is a problem:

    data3_start_nl_line_file_0  37.2677 sec
        params: $1= $2=008768 $3=003 $4=1

    prepare data3_read_next_line_file_0 (character, numeric, numeric,
        numeric) as
    select xmin, xmax, * from data3.line_file where
        li_quote_flag >= $1 and
        (li_quote_flag > $1 or li_order_no >= $2) and
        (li_quote_flag > $1 or li_order_no > $2 or li_seq_no > $3)
    order by li_quote_flag, li_order_no, li_seq_no limit $4
    0.000839501 sec

    data3_read_next_line_file_0  0.313869 sec
        params: $1= $2=008768 $3=005 $4=1
    data3_read_next_line_file_0  0.343179 sec
        params: $1= $2=008768 $3=006 $4=2
    data3_read_next_line_file_0  0.308703 sec
        params: $1= $2=008768 $3=008 $4=3
    data3_read_next_line_file_0  0.306802 sec
        params: $1= $2=008768 $3=011 $4=4
    data3_read_next_line_file_0  0.311033 sec
        params: $1= $2=008768 $3=015 $4=5

In the above statements, 0.3 sec to return a single row is very poor.
EXPLAIN matches only li_quote_flag to the index, which offers very poor
selectivity.  li_quote_flag is a char(1), and there is an index on
line_file on the three where-clause columns in the proper order.

Merlin
> > select * from t where
> >     a >= a1 and
> >     (a > a1 or b >= b1) and
> >     (a > a1 or b > b1 or c > c1)
>
> > In about 95% of cases, the planner correctly selects the index
> > t(a,b,c) and uses it.
>
> I'm surprised it's that good.  Why not do

It is.  In fact, it's so good, I mistakenly assumed it would get it right
all the time.  That led me directly to my current situation.

> select * from t where a >= a1 and b >= b1 and c >= c1
> order by a,b,c
> limit 1 offset 1;

Note: I left off the limit/order part of the query in my original
example.  My previous experience with OFFSET was that it's not practical
for this type of use.  Query time degrades when the offset gets large...
it's basically n^2/2 for a scan of a table.  If OFFSET were O(1) for any
sized offset, the problem would be trivial.

Plus, your where clause does not guarantee results.  Imagine:

    a  b  c
    2  3  4
    4  2  1

With (a1,b1,c1) = (2,3,4), the second row has c !> c1, so your query
excludes it even though it is the next row in index order.

The only other way to rewrite the query is thus (pg has much more trouble
with this form):

    select * from t where
        a > a1 or
        (a >= a1 and b > b1) or
        (a >= a1 and b >= b1 and c > c1)

Merlin
> Greg Stark <gsstark@MIT.EDU> writes:
> This approach won't get the optimizer to actually use an index for
> these comparisons, but it will fix the semantics to match the spec.
> Later we can either improve the optimizer to detect expressions like
> this (which I think would be cooler since some users may write them by
> hand and not use the row-expression approach, but I don't see how to
> do it), or introduce a new run-time node and have the optimizer handle
> it.  But at least we won't have to worry about backwards-compatibility
> issues with the semantics changing.
>
> Oh, I tried to stick to the style, but sometimes I couldn't help
> myself.  I suppose I would have to fix up the style the rest of the
> way if I got it working and you wanted a patch to apply.

Regarding the <= and >= operators: can you apply them in the complex
pass?  If you can, this might be more efficient.

> /*
>  * Handles something like:
>  *   (A,B,C) > (X,Y,Z)
>  *
>  * By constructing something like:
>  *   ( (A > X) OR (A = X AND B > Y) OR (A = X AND B = Y AND C > Z) )
>  *                                                         ^
>  */

The marked comparison, i.e. the last comparison of the last major clause
(or the only comparison for a single-field row construct), is a special
case.  In > cases use >, in >= cases use >=, etc.; this is logically
equivalent to taking the OR of the simple = form with the complex > form.

Is this step of the transformation visible to the optimizer/planner?  For
purely selfish reasons, it would be really nice if a field-by-field row
construction could get a fast path to the index if the fields match the
index fields.

Merlin
> Greg Stark <gsstark@mit.edu> writes:
> > Removing <,<=,>,>= would be trivial.
>
> ... and not backwards-compatible.  If we did that then cases involving
> unlabeled row expressions would no longer work as they did in prior
> releases.  For example
>
>     select (1,2,3) < (4,5,6);
>
> is accepted by all releases back to 7.0, and probably much further
> (but 7.0 is the oldest I have handy to test).  The only reason the
> code in parse_expr.c appears new is that the functionality used to be
> in gram.y.
>
> I'd like to see this fixed to comply with the spec, but given the lack
> of complaints about the existing behavior over so many years, ripping
> it out meanwhile doesn't seem appropriate.

Just to clarify: I think Greg is arguing to bring pg to the SQL92 spec,
not to remove anything.  ISTM the SQL standard was designed with exactly
my problem in mind: how to get the next key in a table.

IMHO, relying on

    select (1,2,3) < (4,5,6);

to give a result which is neither standard nor documented seems to be bad
style.  The current behavior could cause pg to give incorrect results in
TPC benchmarks... not good.  Also, it's trivial to rewrite that
comparison with the old behavior using 'and'.  OTOH, it is not trivial to
rewrite the comparison to do it the correct way... it's kind of an SQL
'trick question'.  Most likely, a very small minority of pg users are
even aware of the above syntax anyway.

To be fair, I'm a big fan of deprecating features for at least one
release for compatibility reasons.  It's no big deal to me, because I'm
already writing the queries out the long way anyway.  My interests are in
the optimizer.  If there is a way to enhance it so that it handles
multi-column comparisons in a logical way, that would be great.  Is this
theoretically possible (probable)?

Merlin
Just an update on this: queries in the 'next key' form on fields (a,b,c)
only ever use the index for the first field (a).  I just never noticed
that before... in most cases this is selective enough.  In most logical
cases in multi-part keys, the most important stuff comes first.

Curiously, queries in the Boolean-reversed form (switching and with or,
etc.) find the index but do not apply any condition, kind of like an
indexed sequential scan...

Well, if and when the rowtype comparison can be made to work over
multi-part keys (and the optimizer is made to do tricks there), postgres
can be made to give much better generalized ISAM access.  In the
meantime, I'll just troubleshoot specific cases via application-specific
behavior as they come up.

In any case, many thanks to Greg and Tom for taking the time to pick this
apart.

Merlin
"Merlin Moncure" <merlin.moncure@rcsonline.com> writes: > Well, if and when the rowtype comparison can be made to work over multi > part keys (and the optimizer is made to do tricks there), postgres can > be made to give much better generalized ISAM access. In the meantime, > I'll just troubleshoot specific cases via application specific behavior > as they come up. In any case, many thanks to Greg and Tom for taking > the time to pick this apart. Well I'm not sure whether you caught it, but Tom did come up with a work-around that works with the current infrastructure if all the columns involved are the same datatype. You can create a regular btree index on the expression array[a,b,c] and then do your lookup using array[a,b,c] > array[a1,b1,c1]. This will only work in 7.4, not previous releases, for several reasons. -- greg
Greg Stark wrote:
> Well I'm not sure whether you caught it, but Tom did come up with a
> work-around that works with the current infrastructure if all the
> columns involved are the same datatype.
>
> You can create a regular btree index on the expression array[a,b,c]
> and then do your lookup using array[a,b,c] > array[a1,b1,c1].

Unfortunately, ISAM files allow keys based on combinations of fields of
any type, so this is not an option.  (I have spent over 6 months
researching this problem.)  However, this would work:

    create index t_idx on t (stackparam(
        array[a::text, b::text, c::text],
        array['char(2)', 'int', 'date']));

with the 'type strings' queried out in advance.  stackparam(text[],
text[]) is a C function which uses the types and cats the strings
together in such a way that preserves sorting.

In any case, this is an ugly and inefficient mess, and I have no desire
to do this unless there is no other way.  I would much rather see
postgres 'get' (a,b,c) > (a1,b1,c1)... if there is even a chance this is
possible, I'll direct my efforts there.  IMNSHO, this form was invented
by the SQL folks for dealing with data in an ISAM manner; postgres should
be able to do it and do it well.

Merlin
"Merlin Moncure" <merlin.moncure@rcsonline.com> writes: > I would much rather see postgres 'get' (a,b,c) > (a1, > b1, c1)...if there is even a chance this is possible, I'll direct my > efforts there. For the ISAM context this whole discussion is kinda moot, because you really don't want to have to plan and execute a fairly expensive query for every record fetch. If we had all the improvements discussed, it would be a reasonably cheap operation by the standards of "execute an independent query", but by the standards of "fetch next record in my query" it'll still suck. (Using PREPARE might help, but only somewhat.) It strikes me that what you really want for ISAM is to improve the cursor mechanism so it will do the things you need. I'm not sure what's involved, but let's talk about that angle for a bit. regards, tom lane
"Merlin Moncure" <merlin.moncure@rcsonline.com> writes: > However, this would work: > Create index on t(stackparam(array[a::text,b::text,c::text), > array['char(2)', 'int', 'date')]; Well, I fear not all datatypes sort properly when treated as text. Notably integers don't. "10" sorts before "2" for example. You could probably deal with this with careful attention to each datatype you're converting if you're interested in going to that length. -- greg
> "Merlin Moncure" <merlin.moncure@rcsonline.com> writes: > > I would much rather see postgres 'get' (a,b,c) > (a1, > > b1, c1)...if there is even a chance this is possible, I'll direct my > > efforts there. > > For the ISAM context this whole discussion is kinda moot, because you > really don't want to have to plan and execute a fairly expensive query > for every record fetch. If we had all the improvements discussed, it > would be a reasonably cheap operation by the standards of "execute > an independent query", but by the standards of "fetch next record > in my query" it'll still suck. (Using PREPARE might help, but only > somewhat.) > > It strikes me that what you really want for ISAM is to improve the > cursor mechanism so it will do the things you need. I'm not sure > what's involved, but let's talk about that angle for a bit. Sorry for the long post, but here's an explanation of why I think things are better off where they are. I've created a middleware that translates ISAM file I/O on the fly to SQL and uses prepared statements over parse/bind to execute them. This is why I was so insistent against scoping prepared statement lifetime to transaction level. Cursors seem attractive at first but they are a decidedly mixed bag. First of all, PostgreSQL cursors are insensitive, which absolutely precludes their use. Supposing they weren't though, I'm not so sure I'd use them if it was possible to do the same things via vanilla queries. It turns out that prepared statements get the job done. Moving them to parse/bind got me a 30% drop in server cpu time and statement execution times run between .1 and .5 ms for random reads. Sequential reads go from that fast to slower depending on index performance. So, I don't have performance issues except where the index doesn't deliver. 2000-10000 reads/sec is competitive with commercial ISAM filesystems on the pc (assuming application is not running on the server), and it is far better than any other commercial ISAM emulation I've played with up to this point. Don't have concrete #s, but the server can deliver 2-3 times that in concurrency situations, in many cases the application performance is actually network bound...this is all on pc hardware. Of course, mainframes can do much, much better than this but that's really outside the scope of what I'm trying to do. So, things run pretty fast as they are, and keeping things running through queries keeps things generic and flexible. Also, queries consume zero resources on the server except for the time it takes to process and deal with them. Cursors, OTOH, have to be opened up for every table for every user. Cursors are read-only always, whereas views can be updated with rules. It's worth noting that other commercial ISAM emulation systems (for example Acu4GL by AcuCorp) cut queries on the fly as well, even when cursor options are available. If cursors became sensitive, they would be worth consideration. I've never really had 1000 large ones open at once, be interesting to see how that worked. In ideal object would be a 'pseudo cursor', an insensitive cursor shared by multiple users with each having their own record pointer. This might be better handled via middleware though (this might also give better options handling different locking scenarios). Merlin