Thread: Scalar in a range (but textual not numeric)

Scalar in a range (but textual not numeric)

From

Richard Huxton

Date:

25 February 2004, 15:18:30

Large table representing non-overlapping blocks:

blocks(id int4, min varchar, max varchar)

SELECT * FROM blocks WHERE 'ABCDE' BETWEEN min AND max;

The estimator gets the wrong plan because it doesn't realise there's (at most) 
only one block that can match.

Can't use any of the geometry related types since we've got text here not 
numbers. Nothing in the archives seems quite right (AFAICT).

Any smart ideas? I'm happy to trade time when updating the blocks table 
against lookup speed.

--  Richard Huxton Archonet Ltd

Re: Scalar in a range (but textual not numeric)

From

Richard Huxton

Date:

25 February 2004, 16:36:49

On Wednesday 25 February 2004 19:18, Richard Huxton wrote:
> Large table representing non-overlapping blocks:
>
> blocks(id int4, min varchar, max varchar)
>
> SELECT * FROM blocks WHERE 'ABCDE' BETWEEN min AND max;
>
> The estimator gets the wrong plan because it doesn't realise there's (at
> most) only one block that can match.

Well, replying to myself (just one of my many bad habits) the best I've come 
up with so far is to add another column with a trimmed string and do a direct 
comparison against that too:

SELECT * FROM blocks WHERE substring('ABCDE',1,3)=block_segment AND 'ABCDE' 
BETWEEN min AND max

This gives the planner something to work with, and on 7.4 it even renders it 
down to 'ABC' first too (nice :-)

That's not quite the same though, because it means I need to split 
ABCAA..ABDBB into ABCAA..ABCZZ and ABDAA..ABDZZ but it's close enough unless 
someone is feeling clever this evening.

--  Richard Huxton Archonet Ltd

Re: Scalar in a range (but textual not numeric)

From

Joe Conway

Date:

25 February 2004, 16:57:30

Richard Huxton wrote:
> That's not quite the same though, because it means I need to split 
> ABCAA..ABDBB into ABCAA..ABCZZ and ABDAA..ABDZZ but it's close enough unless 
> someone is feeling clever this evening.
> 

Would (a series of) partial indexes help?

Joe

Re: Scalar in a range (but textual not numeric)

From

Tom Lane

Date:

25 February 2004, 17:32:18

Richard Huxton <dev@archonet.com> writes:
> Large table representing non-overlapping blocks:
> blocks(id int4, min varchar, max varchar)

> SELECT * FROM blocks WHERE 'ABCDE' BETWEEN min AND max;

> The estimator gets the wrong plan because it doesn't realise there's
> (at most) only one block that can match.

Even if it did realize that, it couldn't do much, because this query
isn't indexable as it stands.

I wonder whether you could adapt the "line segment" datatype
(see contrib/seg/) into a sort of "text segment" thingy and use the
GiST indexing support on that.  You'd have a query likeWHERE min_max_object overlaps-operator 'ABCDE'
and the overlaps operator would be a GiST-indexable one.
        regards, tom lane

Re: Scalar in a range (but textual not numeric)

From

Richard Huxton

Date:

25 February 2004, 18:09:03

On Wednesday 25 February 2004 21:32, Tom Lane wrote:
> Richard Huxton <dev@archonet.com> writes:
> > Large table representing non-overlapping blocks:
> > blocks(id int4, min varchar, max varchar)
> >
> > SELECT * FROM blocks WHERE 'ABCDE' BETWEEN min AND max;
> >
> > The estimator gets the wrong plan because it doesn't realise there's
> > (at most) only one block that can match.
>
> Even if it did realize that, it couldn't do much, because this query
> isn't indexable as it stands.

Well, it is in the sense that an index can be used. Here I'd defined pkey as 
(min,max,id) and set enable_seqscan=off

->  Index Scan using prnblock_range_pkey on prnblock_range  
(cost=0.00..1403.99 rows=892 width=33) (actual time=23.88..24.07 rows=1 
loops=1)   Index Cond: (('09050091234'::character varying >= pr_min) AND 
('09050091234'::character varying <= pr_max))

Of course, what I really want is a "varchar_range" type with its own 
indexing...

> I wonder whether you could adapt the "line segment" datatype
> (see contrib/seg/) into a sort of "text segment" thingy and use the
> GiST indexing support on that.  You'd have a query like
>     WHERE min_max_object overlaps-operator 'ABCDE'
> and the overlaps operator would be a GiST-indexable one.

Yep, that's the sort of thing I was wanting, just not worth the trouble in 
this case. It's not the heart of the system, only a corner case.

Thanks anyway Tom
--  Richard Huxton Archonet Ltd

Re: Scalar in a range (but textual not numeric)

From

Richard Huxton

Date:

25 February 2004, 18:10:54

On Wednesday 25 February 2004 20:56, Joe Conway wrote:
> Richard Huxton wrote:
> > That's not quite the same though, because it means I need to split
> > ABCAA..ABDBB into ABCAA..ABCZZ and ABDAA..ABDZZ but it's close enough
> > unless someone is feeling clever this evening.
>
> Would (a series of) partial indexes help?

Not here - it's the irritation of splitting my single range into two at some 
arbitrary boundary that bugs me with this solution.

Ah well - if it was that straightforward, I wouldn't be able to charge for it.

Thanks Joe
--  Richard Huxton Archonet Ltd

Re: Scalar in a range (but textual not numeric)

From

Tom Lane

Date:

25 February 2004, 18:32:36

Richard Huxton <dev@archonet.com> writes:
> On Wednesday 25 February 2004 21:32, Tom Lane wrote:
>>> SELECT * FROM blocks WHERE 'ABCDE' BETWEEN min AND max;
>>
>> Even if it did realize that, it couldn't do much, because this query
>> isn't indexable as it stands.

> Well, it is in the sense that an index can be used. Here I'd defined pkey as 
> (min,max,id) and set enable_seqscan=off

> ->  Index Scan using prnblock_range_pkey on prnblock_range  
> (cost=0.00..1403.99 rows=892 width=33) (actual time=23.88..24.07 rows=1 
> loops=1)
>     Index Cond: (('09050091234'::character varying >= pr_min) AND 
> ('09050091234'::character varying <= pr_max))

You're mistaking "I can put all the relevant columns into an index" for
"this index is useful".  There's a reason why the planner will not use
that plan without a gun to its head, and it is that you have only a
one-sided restriction on the first index column.  At runtime, the index
machinery will have to scan starting at pr_min = '09050091234' and
continuing clear to the end of the index.  It will avoid visiting the
heap for the index entries past the desired range, but since it does not
understand that there's any relation between the pr_min and pr_max
columns, it won't realize that it doesn't need to continue the index
scan past the first failure of '09050091234' <= pr_max.  For all it
knows, there could be lots of entries with larger pr_min and smaller
pr_max.

Now that I think about it, you can exploit your knowledge that the
min/max subranges don't overlap without building a new index type.
What you have to do is put the knowledge into the query.  Instead ofWHERE 'ABCDE' >= pr_min AND 'ABCDE' <= pr_max
try writingWHERE 'ABCDE' >= pr_min AND 'ABCDE' <= pr_max      AND pr_min < (SELECT pr_min FROM table            WHERE
pr_min> 'ABCDE'            ORDER BY pr_min LIMIT 1)

The idea here is to add an upper bound on pr_min to the index scan
conditions, so that the scan can stop short of the end of the index.
The sub-SELECT will efficiently use the index to pick out a safe
stopping point.
        regards, tom lane

Re: Scalar in a range (but textual not numeric)

From

Tom Lane

Date:

25 February 2004, 18:37:33

I wrote:
> try writing
>     WHERE 'ABCDE' >= pr_min AND 'ABCDE' <= pr_max
>           AND pr_min < (SELECT pr_min FROM table
>                 WHERE pr_min > 'ABCDE'
>                 ORDER BY pr_min LIMIT 1)
> The idea here is to add an upper bound on pr_min to the index scan
> conditions, so that the scan can stop short of the end of the index.

Argh, got that backwards.  What you are missing is a *lower* bound on
pr_min, and the index scan will therefore run from the start of the
index up to pr_min = 'ABCDE'.  So reverse the sense of the added test:
      AND pr_min >= (SELECT pr_min FROM table             WHERE pr_min <= 'ABCDE'             ORDER BY pr_min DESC
LIMIT1)
 
        regards, tom lane

Re: Scalar in a range (but textual not numeric)

From

Richard Huxton

Date:

26 February 2004, 04:03:25

On Wednesday 25 February 2004 22:37, Tom Lane wrote:
> I wrote:
> > try writing
> >     WHERE 'ABCDE' >= pr_min AND 'ABCDE' <= pr_max
> >           AND pr_min < (SELECT pr_min FROM table
> >                 WHERE pr_min > 'ABCDE'
> >                 ORDER BY pr_min LIMIT 1)
> > The idea here is to add an upper bound on pr_min to the index scan
> > conditions, so that the scan can stop short of the end of the index.
>
> Argh, got that backwards.  What you are missing is a *lower* bound on
> pr_min, and the index scan will therefore run from the start of the
> index up to pr_min = 'ABCDE'.  So reverse the sense of the added test:
>
>           AND pr_min >= (SELECT pr_min FROM table
>                  WHERE pr_min <= 'ABCDE'
>                  ORDER BY pr_min DESC LIMIT 1)

Aha! I was trying something like that, but couldn't get it quite right and it 
was getting too late for me to see clearly.

Thanks Tom, I'll have a play with this later today.
--  Richard Huxton Archonet Ltd

Re: Scalar in a range (but textual not numeric)

From

Edmund Bacon

Date:

26 February 2004, 10:55:27

On Wed, 2004-02-25 at 12:18, Richard Huxton wrote:

As a complete aside:

Is there any advantage to use varchar instead of type text?

Thanks

Re: Scalar in a range (but textual not numeric)

From

Bruno Wolff III

Date:

26 February 2004, 11:23:53

On Thu, Feb 26, 2004 at 07:55:14 -0700, Edmund Bacon <ebacon@onesystem.com> wrote:
> On Wed, 2004-02-25 at 12:18, Richard Huxton wrote:
> 
> As a complete aside:
> 
> Is there any advantage to use varchar instead of type text?

Only if there is a business rule that limits the length of the data.