Re: Poor performance on seq scan - Mailing list pgsql-performance

From Guido Neitzer
Subject Re: Poor performance on seq scan
Date
Msg-id fbbe50e0609180150k66e1462fg6591db841e490dc1@mail.gmail.com
Whole thread Raw
In response to Re: Poor performance on seq scan  (Piotr Kołaczkowski <P.Kolaczkowski@elka.pw.edu.pl>)
List pgsql-performance
Because there is no MVCC information in the index.

cug

2006/9/12, Piotr Kołaczkowski <P.Kolaczkowski@elka.pw.edu.pl>:
> On Tuesday 12 September 2006 12:47, Heikki Linnakangas wrote:
> > Laszlo Nagy wrote:
> > > I made another test. I create a file with the identifiers and names of
> > > the products:
> > >
> > > psql#\o products.txt
> > > psql#select id,name from product;
> > >
> > > Then I can search using grep:
> > >
> > > grep "Mug" products.txt | cut -f1 -d\|
> > >
> > > There is a huge difference. This command runs within 0.5 seconds. That
> > > is, at least 76 times faster than the seq scan. It is the same if I
> > > vacuum, backup and restore the database. I thought that the table is
> > > stored in one file, and the seq scan will be actually faster than
> > > grepping the file. Can you please tell me what am I doing wrong? I'm
> > > not sure if I can increase the performance of a seq scan by adjusting
> > > the values in postgresql.conf. I do not like the idea of exporting the
> > > product table periodically into a txt file, and search with grep. :-)
> >
> > Is there any other columns besides id and name in the table? How big is
> > products.txt compared to the heap file?
> >
> > > Another question: I have a btree index on product(name). It contains
> > > all product names and the identifiers of the products. Wouldn't it be
> > > easier to seq scan the index instead of seq scan the table? The index
> > > is only 66MB, the table is 1123MB.
> >
> > Probably, but PostgreSQL doesn't know how to do that. Even if it did, it
> > depends on how many matches there is. If you scan the index and then
> > fetch the matching rows from the heap, you're doing random I/O to the
> > heap. That becomes slower than scanning the heap sequentially if you're
> > going to get more than a few hits.
>
> Why match rows from the heap if ALL required data are in the index itself?
> Why look at the heap at all?
>
> This is the same performance problem in PostgreSQL I noticed when doing
> some "SELECT count(*)" queries. Look at this:
>
> explain analyze select count(*) from transakcja where data > '2005-09-09' and
> miesiac >= (9 + 2005 * 12) and kwota < 50;
>
> QUERY PLAN
>
------------------------------------------------------------------------------------------------------------------------------------------------------------------
>  Aggregate  (cost=601557.86..601557.87 rows=1 width=0) (actual
> time=26733.479..26733.484 rows=1 loops=1)
>    ->  Bitmap Heap Scan on transakcja  (cost=154878.00..596928.23 rows=1851852
> width=0) (actual time=9974.208..18796.060 rows=1654218 loops=1)
>          Recheck Cond: ((miesiac >= 24069) AND (kwota < 50::double precision))
>          Filter: (data > '2005-09-09 00:00:00'::timestamp without time zone)
>          ->  Bitmap Index Scan on idx_transakcja_miesiac_kwota
> (cost=0.00..154878.00 rows=5555556 width=0) (actual time=9919.967..9919.967
> rows=1690402 loops=1)
>                Index Cond: ((miesiac >= 24069) AND (kwota < 50::double
> precision))
>  Total runtime: 26733.980 ms
> (7 rows)
>
> The actual time retrieving tuples from the index is less than 10 seconds, but
> the system executes needless heap scan that takes up additional 16 seconds.
>
> Best regards,
> Peter
>
>
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 9: In versions below 8.0, the planner will ignore your desire to
>        choose an index scan if your joining column's datatypes do not
>        match
>


--
PostgreSQL Bootcamp, Big Nerd Ranch Europe, Nov 2006
http://www.bignerdranch.com/news/2006-08-21.shtml

pgsql-performance by date:

Previous
From: Markus Schaber
Date:
Subject: Re: Poor performance on seq scan
Next
From: Jérôme BENOIS
Date:
Subject: Re: High CPU Load