Thread: Scalar in a range (but textual not numeric)
Large table representing non-overlapping blocks: blocks(id int4, min varchar, max varchar) SELECT * FROM blocks WHERE 'ABCDE' BETWEEN min AND max; The estimator gets the wrong plan because it doesn't realise there's (at most) only one block that can match. Can't use any of the geometry related types since we've got text here not numbers. Nothing in the archives seems quite right (AFAICT). Any smart ideas? I'm happy to trade time when updating the blocks table against lookup speed. -- Richard Huxton Archonet Ltd
On Wednesday 25 February 2004 19:18, Richard Huxton wrote: > Large table representing non-overlapping blocks: > > blocks(id int4, min varchar, max varchar) > > SELECT * FROM blocks WHERE 'ABCDE' BETWEEN min AND max; > > The estimator gets the wrong plan because it doesn't realise there's (at > most) only one block that can match. Well, replying to myself (just one of my many bad habits) the best I've come up with so far is to add another column with a trimmed string and do a direct comparison against that too: SELECT * FROM blocks WHERE substring('ABCDE',1,3)=block_segment AND 'ABCDE' BETWEEN min AND max This gives the planner something to work with, and on 7.4 it even renders it down to 'ABC' first too (nice :-) That's not quite the same though, because it means I need to split ABCAA..ABDBB into ABCAA..ABCZZ and ABDAA..ABDZZ but it's close enough unless someone is feeling clever this evening. -- Richard Huxton Archonet Ltd
Richard Huxton wrote: > That's not quite the same though, because it means I need to split > ABCAA..ABDBB into ABCAA..ABCZZ and ABDAA..ABDZZ but it's close enough unless > someone is feeling clever this evening. > Would (a series of) partial indexes help? Joe
Richard Huxton <dev@archonet.com> writes: > Large table representing non-overlapping blocks: > blocks(id int4, min varchar, max varchar) > SELECT * FROM blocks WHERE 'ABCDE' BETWEEN min AND max; > The estimator gets the wrong plan because it doesn't realise there's > (at most) only one block that can match. Even if it did realize that, it couldn't do much, because this query isn't indexable as it stands. I wonder whether you could adapt the "line segment" datatype (see contrib/seg/) into a sort of "text segment" thingy and use the GiST indexing support on that. You'd have a query likeWHERE min_max_object overlaps-operator 'ABCDE' and the overlaps operator would be a GiST-indexable one. regards, tom lane
On Wednesday 25 February 2004 21:32, Tom Lane wrote: > Richard Huxton <dev@archonet.com> writes: > > Large table representing non-overlapping blocks: > > blocks(id int4, min varchar, max varchar) > > > > SELECT * FROM blocks WHERE 'ABCDE' BETWEEN min AND max; > > > > The estimator gets the wrong plan because it doesn't realise there's > > (at most) only one block that can match. > > Even if it did realize that, it couldn't do much, because this query > isn't indexable as it stands. Well, it is in the sense that an index can be used. Here I'd defined pkey as (min,max,id) and set enable_seqscan=off -> Index Scan using prnblock_range_pkey on prnblock_range (cost=0.00..1403.99 rows=892 width=33) (actual time=23.88..24.07 rows=1 loops=1) Index Cond: (('09050091234'::character varying >= pr_min) AND ('09050091234'::character varying <= pr_max)) Of course, what I really want is a "varchar_range" type with its own indexing... > I wonder whether you could adapt the "line segment" datatype > (see contrib/seg/) into a sort of "text segment" thingy and use the > GiST indexing support on that. You'd have a query like > WHERE min_max_object overlaps-operator 'ABCDE' > and the overlaps operator would be a GiST-indexable one. Yep, that's the sort of thing I was wanting, just not worth the trouble in this case. It's not the heart of the system, only a corner case. Thanks anyway Tom -- Richard Huxton Archonet Ltd
On Wednesday 25 February 2004 20:56, Joe Conway wrote: > Richard Huxton wrote: > > That's not quite the same though, because it means I need to split > > ABCAA..ABDBB into ABCAA..ABCZZ and ABDAA..ABDZZ but it's close enough > > unless someone is feeling clever this evening. > > Would (a series of) partial indexes help? Not here - it's the irritation of splitting my single range into two at some arbitrary boundary that bugs me with this solution. Ah well - if it was that straightforward, I wouldn't be able to charge for it. Thanks Joe -- Richard Huxton Archonet Ltd
Richard Huxton <dev@archonet.com> writes: > On Wednesday 25 February 2004 21:32, Tom Lane wrote: >>> SELECT * FROM blocks WHERE 'ABCDE' BETWEEN min AND max; >> >> Even if it did realize that, it couldn't do much, because this query >> isn't indexable as it stands. > Well, it is in the sense that an index can be used. Here I'd defined pkey as > (min,max,id) and set enable_seqscan=off > -> Index Scan using prnblock_range_pkey on prnblock_range > (cost=0.00..1403.99 rows=892 width=33) (actual time=23.88..24.07 rows=1 > loops=1) > Index Cond: (('09050091234'::character varying >= pr_min) AND > ('09050091234'::character varying <= pr_max)) You're mistaking "I can put all the relevant columns into an index" for "this index is useful". There's a reason why the planner will not use that plan without a gun to its head, and it is that you have only a one-sided restriction on the first index column. At runtime, the index machinery will have to scan starting at pr_min = '09050091234' and continuing clear to the end of the index. It will avoid visiting the heap for the index entries past the desired range, but since it does not understand that there's any relation between the pr_min and pr_max columns, it won't realize that it doesn't need to continue the index scan past the first failure of '09050091234' <= pr_max. For all it knows, there could be lots of entries with larger pr_min and smaller pr_max. Now that I think about it, you can exploit your knowledge that the min/max subranges don't overlap without building a new index type. What you have to do is put the knowledge into the query. Instead ofWHERE 'ABCDE' >= pr_min AND 'ABCDE' <= pr_max try writingWHERE 'ABCDE' >= pr_min AND 'ABCDE' <= pr_max AND pr_min < (SELECT pr_min FROM table WHERE pr_min> 'ABCDE' ORDER BY pr_min LIMIT 1) The idea here is to add an upper bound on pr_min to the index scan conditions, so that the scan can stop short of the end of the index. The sub-SELECT will efficiently use the index to pick out a safe stopping point. regards, tom lane
I wrote: > try writing > WHERE 'ABCDE' >= pr_min AND 'ABCDE' <= pr_max > AND pr_min < (SELECT pr_min FROM table > WHERE pr_min > 'ABCDE' > ORDER BY pr_min LIMIT 1) > The idea here is to add an upper bound on pr_min to the index scan > conditions, so that the scan can stop short of the end of the index. Argh, got that backwards. What you are missing is a *lower* bound on pr_min, and the index scan will therefore run from the start of the index up to pr_min = 'ABCDE'. So reverse the sense of the added test: AND pr_min >= (SELECT pr_min FROM table WHERE pr_min <= 'ABCDE' ORDER BY pr_min DESC LIMIT1) regards, tom lane
On Wednesday 25 February 2004 22:37, Tom Lane wrote: > I wrote: > > try writing > > WHERE 'ABCDE' >= pr_min AND 'ABCDE' <= pr_max > > AND pr_min < (SELECT pr_min FROM table > > WHERE pr_min > 'ABCDE' > > ORDER BY pr_min LIMIT 1) > > The idea here is to add an upper bound on pr_min to the index scan > > conditions, so that the scan can stop short of the end of the index. > > Argh, got that backwards. What you are missing is a *lower* bound on > pr_min, and the index scan will therefore run from the start of the > index up to pr_min = 'ABCDE'. So reverse the sense of the added test: > > AND pr_min >= (SELECT pr_min FROM table > WHERE pr_min <= 'ABCDE' > ORDER BY pr_min DESC LIMIT 1) Aha! I was trying something like that, but couldn't get it quite right and it was getting too late for me to see clearly. Thanks Tom, I'll have a play with this later today. -- Richard Huxton Archonet Ltd
On Wed, 2004-02-25 at 12:18, Richard Huxton wrote: As a complete aside: Is there any advantage to use varchar instead of type text? Thanks
On Thu, Feb 26, 2004 at 07:55:14 -0700, Edmund Bacon <ebacon@onesystem.com> wrote: > On Wed, 2004-02-25 at 12:18, Richard Huxton wrote: > > As a complete aside: > > Is there any advantage to use varchar instead of type text? Only if there is a business rule that limits the length of the data.