Re: Seq scan vs. Index scan with different query - Mailing list pgsql-performance

From Andrew McMillan
Subject Re: Seq scan vs. Index scan with different query
Date
Msg-id 1089055325.6664.220.camel@lamb.mcmillan.net.nz
Whole thread Raw
In response to Re: Seq scan vs. Index scan with different query conditions  (eleven@ludojad.itpp.pl)
List pgsql-performance
On Mon, 2004-07-05 at 15:46 +0200, eleven@ludojad.itpp.pl wrote:
> On Mon, Jul 05, 2004 at 11:44:13PM +1200, Andrew McMillan wrote:
>
> > > DateTimeIndex was created on both columns (Date/Time):
> > > CREATE INDEX "DateTimeIndex" ON "tablex" USING btree ("Date", "Time");
> > PostgreSQL is always going to switch at some point, where the number of
> > rows that have to be read from the table exceed some percentage of the
> > total rows in the table.
> > We can possibly be more helpful if you send EXPLAIN ANALYZE, rather than
> > just EXPLAIN.
>
> Unfortunately that seq scan vs. index scan
> heuristic was wrong - full scan kills the machine
> in no time due to large amount of INSERTs happening
> in the background (I/O bottleneck).

In that case you could perhaps consider tweaking various parameters in
your postgresql.conf - with an ideal setup the switch should happen when
the costs are roughly equal.

Have you gone through the information here:
http://www.varlena.com/varlena/GeneralBits/Tidbits/perf.html
http://www.varlena.com/varlena/GeneralBits/Tidbits/annotated_conf_e.html

Also, if table rows are regularly DELETEd or UPDATEd then you will need
to ensure it is regularly vacuumed.  Does a "VACUUM VERBOSE tablex" show
a large number of dead tuples?  Are you running pg_autovacuum?  Do you
get similar results immediately after a "VACUUM FULL ANALYZE tablex"?

Possibly there is an uneven distribution of rows in the table.  You
could consider increasing the statistics target:
ALTER TABLE tablex ALTER COLUMN "Date" SET STATISTICS;
ANALYZE tablex;


> > - Is this supposed to be a slice of midnight to 6pm, for each day
> > between 28 June and 4 July?  If you want a continuous period from
> > Midnight 28 June -> 6pm 4 July you're better to have a single timestamp
> > field.
> > - It is unlikely that the , "Time" on your index is adding much to your
> > selectivity, and it may be that you would be better off without it.
>
> Yes, we've figured out that index on Date + Time is rather useless.
> Thanks for the tip, we've created index upon Date column instead and
> it should be enough.

It may be that you are better with a single timestamp column with an
index on it in any case, if you want the data sorted in timestamp order.
Then you can ORDER BY <timestamp> as well, which will encourage the
index use further (although this advantage tends to get lost with the
DISTINCT).  You can still access the time part for a separate comparison
just with a cast.


> > - the DISTINCT can screw up your results, and it usually means that the
> > SQL is not really the best it could be.  A _real_ need for DISTINCT is
> > quite rare in my experience, and from what I have seen it adds overhead
> > and tends to encourage bad query plans when used unnecessarily.
>
> What do you mean? The reason for which there's DISTINCT in that query is
> because I want to know how many unique rows is in the table.
> Do you suggest selecting all rows and doing "DISTINCT"/counting
> on the application level?

That's fine, I've just seen it used far too many times as a substitute
for having an extra join, or an application that should only be
inserting unique rows in the first place.  Things like that.  It's just
one of those things that always sets off alarm bells when I'm reviewing
someone else's work, and on most of these occasions it has not been
justified when reexamined.

Cheers,
                    Andrew.

-------------------------------------------------------------------------
Andrew @ Catalyst .Net .NZ  Ltd,  PO Box 11-053, Manners St,  Wellington
WEB: http://catalyst.net.nz/            PHYS: Level 2, 150-154 Willis St
DDI: +64(4)803-2201      MOB: +64(272)DEBIAN      OFFICE: +64(4)499-2267
It is truth which you cannot contradict; you can without any difficulty
                      contradict Socrates. - Plato
-------------------------------------------------------------------------

Attachment

pgsql-performance by date:

Previous
From: eleven@ludojad.itpp.pl
Date:
Subject: Re: Seq scan vs. Index scan with different query conditions
Next
From: Rodrigo Filgueira
Date:
Subject: to fsync or not to fsync