Re: POSTGRES DB 3 800 000 rows table, speed up? - Mailing list pgsql-general

From Klein Balázs
Subject Re: POSTGRES DB 3 800 000 rows table, speed up?
Msg-id 20051229022442.0E6EE3E3962@graveyard.mail.t-online.hu
In response to Re: POSTGRES DB 3 800 000 rows table, speed up?  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: POSTGRES DB 3 800 000 rows table, speed up?
List pgsql-general
Could you explain this a little bit more?
What are the conditions in this situation that make a b-tree index ineffective?

Thanks
SWK

-----Original Message-----
From: pgsql-general-owner@postgresql.org
[mailto:pgsql-general-owner@postgresql.org] On Behalf Of Tom Lane
Sent: December 28, 2005 20:04
To: Jim C. Nasby
Cc: Eugene; pgsql-general@postgresql.org
Subject: Re: [GENERAL] POSTGRES DB 3 800 000 rows table, speed up?

"Jim C. Nasby" <jnasby@pervasive.com> writes:
> On Tue, Dec 27, 2005 at 11:25:37PM +0200, Eugene wrote:
>> I ask db like this:  SELECT * FROM ipdb2 WHERE '3229285376' BETWEEN ipfrom AND ipto;

> I'm pretty sure PostgreSQL won't be able to use any indexes for this
> (EXPLAIN ANALYZE would verify that). Instead, expand the between out:

> WHERE ipfrom <= '...' AND ipto >= '...'

That won't help (it is in fact exactly the same query, because BETWEEN
is just rewritten into that).  The real problem is that btree indexes
are ill-suited to this type of condition.  If the typical row has only
a small distance between ipfrom and ipto then the query is actually
pretty selective, but there is no way to capture that selectivity in
a btree search, because neither of the single-column comparisons is
selective at all.  The planner realizes this and doesn't bother with
the index; instead it just does a seqscan.
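To make the selectivity argument concrete, here is a small Python model (not PostgreSQL code) of a hypothetical table of 1000 adjacent, non-overlapping ranges of 256 addresses each. It counts how many rows pass each single-column comparison on its own, versus the combined BETWEEN test, for one probe address. The table shape and sizes are assumptions chosen purely for illustration.

```python
def make_ranges(n, width):
    """Hypothetical table: n adjacent, non-overlapping (ipfrom, ipto) ranges,
    each covering `width` addresses, starting at address 0."""
    return [(i * width, i * width + width - 1) for i in range(n)]

def count_matches(ranges, x):
    """Count rows passing each single-column test and the combined BETWEEN test."""
    from_ok = sum(1 for ipfrom, _ in ranges if ipfrom <= x)            # ipfrom <= x alone
    to_ok = sum(1 for _, ipto in ranges if ipto >= x)                  # ipto >= x alone
    both_ok = sum(1 for ipfrom, ipto in ranges if ipfrom <= x <= ipto)  # x BETWEEN ipfrom AND ipto
    return from_ok, to_ok, both_ok

if __name__ == "__main__":
    rows = make_ranges(1000, 256)
    probe = 500 * 256 + 10
    print(count_matches(rows, probe))  # prints (501, 500, 1)
```

In this model a btree on ipfrom alone can only narrow the search to about half the table (the 501 rows with ipfrom <= x), and a btree on ipto is no better; only the conjunction of the two picks out a single row, which is why a seqscan looks cheaper to the planner.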

You could probably get somewhere by casting the problem as an rtree
or GIST overlap/containment query, but with the currently available
tools it would be a pretty unnatural-looking query ... probably
something like
    box(point(ipfrom,ipfrom),point(ipto,ipto)) ~
    box(point(3229285376,3229285376),point(3229285376,3229285376))
after creating an rtree or GIST index on
    box(point(ipfrom,ipfrom),point(ipto,ipto))
(haven't tried this but there is a solution lurking somewhere in this
general vicinity).
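The box trick works because a degenerate box on the diagonal, from (ipfrom, ipfrom) to (ipto, ipto), contains the single-point box at (x, x) exactly when ipfrom <= x <= ipto. The Python sketch below only models that geometry to check the equivalence; it is not PostgreSQL's box implementation, and it assumes ipfrom <= ipto in every row. In the query above `~` is used as the box "contains" operator (later PostgreSQL releases spell it `@>`).

```python
from typing import NamedTuple

class Box(NamedTuple):
    """Axis-aligned box given by its lower-left and upper-right corners."""
    x_lo: int
    y_lo: int
    x_hi: int
    y_hi: int

def box_from_range(lo, hi):
    """Model of box(point(lo, lo), point(hi, hi)); assumes lo <= hi."""
    return Box(lo, lo, hi, hi)

def contains(outer, inner):
    """True if inner lies entirely within outer, boundaries included."""
    return (outer.x_lo <= inner.x_lo and inner.x_hi <= outer.x_hi
            and outer.y_lo <= inner.y_lo and inner.y_hi <= outer.y_hi)

def between(x, lo, hi):
    """x BETWEEN lo AND hi, i.e. lo <= x AND x <= hi."""
    return lo <= x <= hi

if __name__ == "__main__":
    ranges = [(3229285376, 3229285631), (3229285632, 3229285887), (0, 255)]
    probes = [0, 255, 256, 3229285375, 3229285376, 3229285631, 3229285632]
    for lo, hi in ranges:
        for x in probes:
            # The containment test and the BETWEEN test agree for every probe.
            assert contains(box_from_range(lo, hi), box_from_range(x, x)) == between(x, lo, hi)
    print("box containment matches BETWEEN for all probes")
```

The point of recasting the condition this way is that containment of boxes is a single two-dimensional predicate an rtree or GiST index can search on, whereas the two separate scalar comparisons are not.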

Is there a good reason why the data is stored this way, and not as,
say, a single "cidr" column containing subnet addresses?  Querying
    WHERE '192.122.252.0' << cidrcolumn
would be a much more transparent way of expressing your problem.
We don't currently have an easy indexing solution for that one either,
but we might in the future.
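As an illustration of the cidr alternative, Python's standard ipaddress module shows that the integer in the original query is the address 192.122.252.0, and it can split an integer ipfrom..ipto range into the CIDR blocks that cover it. This is a sketch for converting existing rows, with hypothetical helper names; the `in` membership test only mimics what `<<` asks of each row, since in PostgreSQL that operator is evaluated by the server, not by code like this.

```python
import ipaddress

def int_to_ip(n):
    """Convert the integer form used in the ipfrom/ipto columns to an IPv4 address."""
    return ipaddress.IPv4Address(n)

def range_to_cidrs(ipfrom, ipto):
    """Cover the inclusive range ipfrom..ipto with the fewest CIDR blocks."""
    return list(ipaddress.summarize_address_range(int_to_ip(ipfrom), int_to_ip(ipto)))

if __name__ == "__main__":
    addr = int_to_ip(3229285376)
    print(addr)                                # prints 192.122.252.0
    blocks = range_to_cidrs(3229285376, 3229285631)
    print(blocks)                              # prints [IPv4Network('192.122.252.0/24')]
    # Roughly what WHERE '192.122.252.0' << cidrcolumn asks of each row:
    print(any(addr in net for net in blocks))  # prints True
```

A range that is not aligned on a single prefix boundary, such as 192.122.252.0 through 192.122.254.255, comes back as more than one block (a /23 plus a /24), so converting such data could need several cidr rows per original ipfrom/ipto row.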

            regards, tom lane


