Re: Indexes? - Mailing list pgsql-general

From Martijn van Oosterhout
Subject Re: Indexes?
Date
Msg-id 20031015022813.GA22182@svana.org
Whole thread Raw
In response to Indexes?  (Mike Leahy <mgleahy@fes.uwaterloo.ca>)
List pgsql-general
In general, the <> operator can't be indexed since it will match most rows,
in which case you're better off scanning the table anyway.

If you really run that query often, perhaps the answer is a partial index.

On Tue, Oct 14, 2003 at 09:57:05PM -0400, Mike Leahy wrote:
> Hello all,
>
> I have a question related to the use of indexes.  One of my tables is part of
> a census, with over 2.5 million records in it.  Clearly, just about any query
> takes a rather long time to complete.  I was hoping to reduce the amount of
> time by using indexes.  However, no matter what I do, PostgreSQL never seems
> to use them.  It would seem to make sense that if I calculate something
> grouped by a set of variables that have an index created for them, that the
> index should be used rather than a sequential search, but it doesn't.  I have
> a table with an index such as the one created below:
>
> CREATE INDEX tbl_censo_poblacion_1993_manzana_idx ON tbl_censo_poblacion_1993
> ( dubicacion, zona, manzana );
>
> Then I try a query such as:
>
> select count(*) as POBLACION, (dubicacion || zona || manzana) as COD_MANZANA
> from tbl_censo_poblacion_1993 group by dubicacion, zona, manzana;
>
> The results from explain indicate that a sequential scan is used (as far as I
> can tell).  I tried adding where statement:
>
> select count(*) as POBLACION, (dubicacion || zona || manzana) as COD_MANZANA
> from tbl_censo_poblacion_1993 where dubicacion <> '' and zona <> '' and
> manzana <> '' group by dubicacion, zona, manzana;
>
> The EXPLAIN analysis results appear as follows:
>
> Aggregate  (cost=847587.90..879024.28 rows=251491 width=27) (actual
> time=272782.00..279458.00 rows=21459 loops=1)
>    ->  Group  (cost=847587.90..872737.01 rows=2514911 width=27) (actual
> time=272782.00..278546.00 rows=2553015 loops=1)
>          ->  Sort  (cost=847587.90..853875.18 rows=2514911 width=27) (actual
> time=272782.00..274533.00 rows=2553015 loop
> s=1)
>                Sort Key: dubicacion, zona, manzana
>                ->  Seq Scan on tbl_censo_poblacion_1993  (cost=0.00..328346.76
> rows=2514911 width=27) (actual time=0.00. .189570.00 rows=2553015 loops=1)
>                      Filter: ((dubicacion <> ''::character varying) AND (zona
> <> ''::character varying) AND (manzana <> ''::character varying))  Total
> runtime: 279494.00 msec (7 rows)
>
>
> So...can anyone suggest to me what I could do?  I'm using PostgreSQL 7.3.4-2
> in Cygwin on a WinXP platform.
>
> As an aside, is there any way to increase the amount of memory allocated to
> the postmaster.exe process?  It seems to me that if I could add more than 4MB
> that it has by default, then maybe that could increase the performance.
>
> Any help is appreciated.
>
> Kind regards,
> Mike
>
>
>
>
> ----------------------------------------
> This mail sent through www.mywaterloo.ca
>
> ---------------------------(end of broadcast)---------------------------
> TIP 4: Don't 'kill -9' the postmaster

--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> "All that is needed for the forces of evil to triumph is for enough good
> men to do nothing." - Edmond Burke
> "The penalty good people pay for not being interested in politics is to be
> governed by people worse than themselves." - Plato

Attachment

pgsql-general by date:

Previous
From: Mike Leahy
Date:
Subject: Indexes?
Next
From: Greg Stark
Date:
Subject: Re: Question