Thread: Poor performance on seq scan

Poor performance on seq scan

From: Laszlo Nagy

Hello,

I have a big table called products. Table size: 1123MB. Toast table
size: 32MB. Index size: 380MB.
I am trying to run a query like this:

select id,name from products where name like '%Mug%';

Yes, I know that tsearch2 is better for this, but please read on. The
above query gives this plan:

Seq Scan on product  (cost=0.00..153489.52 rows=31390 width=40)
  Filter: (name ~~ '%Mug%'::text)

When I run this with explain analyze:

"Seq Scan on product  (cost=0.00..153489.52 rows=31390 width=40) (actual
time=878.873..38300.588 rows=72567 loops=1)"
"  Filter: (name ~~ '%Mug%'::text)"
"Total runtime: 38339.026 ms"

Meanwhile, "iostat 5" gives something like this:

 tin tout  KB/t tps  MB/s   KB/t tps  MB/s  us ni sy in id
   1   14 128.00   1  0.10  128.00   1  0.10   5  0 94  1  0
   0   12 123.98 104 12.56  123.74 104 12.56   8  0 90  2  0
   0   12 125.66 128 15.75  125.26 128 15.68  10  0 85  6  0
   0   12 124.66 129 15.67  124.39 129 15.64  12  0 85  3  0
   0   12 117.13 121 13.87  117.95 121 13.96  12  0 84  5  0
   0   12 104.84 118 12.05  105.84 118 12.19  10  0 87  2  0

130 transfers per second with 12-15MB/sec transfer speed. (FreeBSD 6.1
with two SATA150 drives in gmirror RAID1.)

I made another test. I created a file with the identifiers and names of
the products:

psql#\o products.txt
psql#select id,name from product;

Then I can search using grep:

grep "Mug" products.txt | cut -f1 -d\|

There is a huge difference. This command runs within 0.5 seconds. That
is, at least 76 times faster than the seq scan. It is the same if I
vacuum, back up and restore the database. I thought that the table is
stored in one file, and that the seq scan would actually be faster than
grepping the file. Can you please tell me what I am doing wrong? I'm not
sure if I can increase the performance of a seq scan by adjusting the
values in postgresql.conf. I do not like the idea of exporting the
product table periodically into a txt file and searching it with grep. :-)

Another question: I have a btree index on product(name). It contains all
product names and the identifiers of the products. Wouldn't it be easier
to seq scan the index instead of the table? The index is only
66MB, the table is 1123MB.

I'm new to this list and I only recently started tuning PostgreSQL, so
please forgive me if this is a dumb question.

Regards,

   Laszlo

Re: Poor performance on seq scan

From: Heikki Linnakangas
Laszlo Nagy wrote:
> I made another test. I created a file with the identifiers and names of
> the products:
>
> psql#\o products.txt
> psql#select id,name from product;
>
> Then I can search using grep:
>
> grep "Mug" products.txt | cut -f1 -d\|
>
> There is a huge difference. This command runs within 0.5 seconds. That
> is, at least 76 times faster than the seq scan. It is the same if I
> vacuum, back up and restore the database. I thought that the table is
> stored in one file, and that the seq scan would actually be faster than
> grepping the file. Can you please tell me what I am doing wrong? I'm
> not sure if I can increase the performance of a seq scan by adjusting
> the values in postgresql.conf. I do not like the idea of exporting the
> product table periodically into a txt file and searching it with grep. :-)

Are there any other columns besides id and name in the table? How big is
products.txt compared to the heap file?
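
One way to check the heap size from the catalogs - a sketch only;
relpages is counted in 8KB pages, and the index name here is
hypothetical:

select relname, relpages, relpages * 8 / 1024 as approx_mb
from pg_class
where relname in ('product', 'product_name_idx');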

> Another question: I have a btree index on product(name). It contains
> all product names and the identifiers of the products. Wouldn't it be
> easier to seq scan the index instead of the table? The index
> is only 66MB, the table is 1123MB.

Probably, but PostgreSQL doesn't know how to do that. Even if it did, it
depends on how many matches there are. If you scan the index and then
fetch the matching rows from the heap, you're doing random I/O to the
heap. That becomes slower than scanning the heap sequentially if you're
going to get more than a few hits.
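
As a rough illustration with the numbers from this thread (assuming
something like 8 ms per random seek): 72567 matches x 8 ms is roughly
580 seconds of random I/O, whereas the sequential scan actually took
about 38 seconds.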


--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

Re: Poor performance on seq scan

From: Luke Lonergan
Laszlo,

> Meanwhile, "iostat 5" gives something like this:
>
>  tin tout  KB/t tps  MB/s   KB/t tps  MB/s  us ni sy in id
>    1   14 128.00   1  0.10  128.00   1  0.10   5  0 94  1  0
>    0   12 123.98 104 12.56  123.74 104 12.56   8  0 90  2  0

This is your problem.  Do the following and report the results here:

Take the number of GB of memory you have (say 2 for 2GB) and multiply it
by 250,000.  This is the number of 8KB pages in twice your RAM.
Let's say you have 2GB - the result is 500,000.

Use that number to do the following test on your database directory:
  time bash -c "dd if=/dev/zero of=/<dbdir>/bigfile bs=8k
count=<number_from_above> && sync"

Then do this:
  time bash -c "dd if=/<dbdir>/bigfile of=/dev/null bs=8k"

>
> I made another test. I created a file with the identifiers and
> names of the products:
>
> psql#\o products.txt
> psql#select id,name from product;
>
> Then I can search using grep:
>
> grep "Mug" products.txt | cut -f1 -d\|
>
> There is a huge difference. This command runs within 0.5
> seconds. That is, at least 76 times faster than the seq scan.

The file probably fits in the I/O cache.  Your disks will do at most
60-80MB/s, i.e. 5-7 times faster than what you see now.  RAID 1 with one
query will only deliver one disk's worth of bandwidth.

- Luke


Re: Poor performance on seq scan

From: Laszlo Nagy
Luke Lonergan wrote:
> Laszlo,
>
>
>> Meanwhile, "iostat 5" gives something like this:
>>
>>  tin tout  KB/t tps  MB/s   KB/t tps  MB/s  us ni sy in id
>>    1   14 128.00   1  0.10  128.00   1  0.10   5  0 94  1  0
>>    0   12 123.98 104 12.56  123.74 104 12.56   8  0 90  2  0
>>
>
> This is your problem.  Do the following and report the results here:
>
> Take the number of GB of memory you have (say 2 for 2GB) and multiply it
> by 250,000.  This is the number of 8KB pages in twice your RAM.
> Let's say you have 2GB - the result is 500,000.
>
> Use that number to do the following test on your database directory:
>   time bash -c "dd if=/dev/zero of=/<dbdir>/bigfile bs=8k
> count=<number_from_above> && sync"
>
I have 1GB RAM. The data directory is in /usr/local/pgsql/data. The root
of this fs is /usr.

time sh -c "dd if=/dev/zero of=/usr/test/bigfile bs=8k count=250000 &&
sync "

250000+0 records in
250000+0 records out
2048000000 bytes transferred in 48.030627 secs (42639460 bytes/sec)
0.178u 8.912s 0:48.31 18.7%     9+96k 37+15701io 0pf+0w


> Then do this:
>   time bash -c "dd if=/<dbdir>/bigfile of=/dev/null bs=8k"
>
time sh -c "dd if=/usr/test/bigfile of=/dev/null bs=8k"

250000+0 records in
250000+0 records out
2048000000 bytes transferred in 145.293473 secs (14095609 bytes/sec)
0.110u 5.857s 2:25.31 4.1%      10+99k 32923+0io 0pf+0w

At this point I thought there was another process doing I/O, so I
retried:

250000+0 records in
250000+0 records out
2048000000 bytes transferred in 116.395211 secs (17595226 bytes/sec)
0.137u 5.658s 1:56.51 4.9%      10+103k 29082+0io 0pf+1w

and again:

250000+0 records in
250000+0 records out
2048000000 bytes transferred in 120.198224 secs (17038521 bytes/sec)
0.063u 5.780s 2:00.21 4.8%      10+98k 29776+0io 0pf+0w

This is a mirrored disk with two SATA disks. In theory, writing should
be slower than reading. Is this a hardware problem? Or is it that "sync"
did not do the sync?

  Laszlo


Re: Poor performance on seq scan

From: Laszlo Nagy
Heikki Linnakangas wrote:
>
> Are there any other columns besides id and name in the table? How big
> is products.txt compared to the heap file?
Yes, there are many other columns. The products.txt file is only 59MB,
which is similar to the index size (66MB).
>
>> Another question: I have a btree index on product(name). It contains
>> all product names and the identifiers of the products. Wouldn't it be
>> easier to seq scan the index instead of the table? The index
>> is only 66MB, the table is 1123MB.
>
> Probably, but PostgreSQL doesn't know how to do that. Even if it did,
> it depends on how many matches there is. If you scan the index and
> then fetch the matching rows from the heap, you're doing random I/O to
> the heap. That becomes slower than scanning the heap sequentially if
> you're going to get more than a few hits.
I have 700 000 rows in the table, and usually there are fewer than 500
hits. So probably using a "seq index scan" would be faster. :-) Now I
also tried this:

create table test(id int8 not null primary key, name text);
insert into test select id,name from product;

And then:

zeusd1=> explain analyze select id,name from test where name like
'%Tiffany%';
                                               QUERY PLAN
---------------------------------------------------------------------------------------------------------
 Seq Scan on test  (cost=0.00..26559.62 rows=79 width=40) (actual
time=36.595..890.903 rows=117 loops=1)
   Filter: (name ~~ '%Tiffany%'::text)
 Total runtime: 891.063 ms
(3 rows)

But this might be coming from the disk cache. Thank you for your
comments. We are making progress.

   Laszlo


Re: Poor performance on seq scan

From: Guillaume Cottenceau
Laszlo Nagy <gandalf 'at' designaproduct.biz> writes:

> This is a mirrored disk with two SATA disks. In theory, writing should
> be slower than reading. Is this a hardware problem? Or is it that
> "sync" did not do the sync?

SATA disks are supposed to be capable of lying to pg's fsync (pg asks
the kernel to synchronize a write and waits until it is finished). The
same can probably happen with the "sync" command.

--
Guillaume Cottenceau
Create your personal SMS or WAP Service - visit http://mobilefriends.ch/

Re: Poor performance on seq scan

From: Guillaume Cottenceau
Laszlo Nagy <gandalf 'at' designaproduct.biz> writes:

> > Probably, but PostgreSQL doesn't know how to do that. Even if it
> > did, it depends on how many matches there are. If you scan the index
> > and then fetch the matching rows from the heap, you're doing random
> > I/O to the heap. That becomes slower than scanning the heap
> > sequentially if you're going to get more than a few hits.
> I have 700 000 rows in the table, and usually there are fewer than 500
> hits. So probably using a "seq index scan" would be faster. :-) Now I

You can confirm this idea by temporarily disabling sequential
scans. Have a look at this chapter:

http://www.postgresql.org/docs/7.4/interactive/runtime-config.html#RUNTIME-CONFIG-QUERY
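
For example, within a single session (a sketch - remember to re-enable
it afterwards):

set enable_seqscan = off;
explain analyze select id,name from product where name like '%Mug%';
set enable_seqscan = on;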

--
Guillaume Cottenceau
Create your personal SMS or WAP Service - visit http://mobilefriends.ch/

Re: Poor performance on seq scan

From: Heikki Linnakangas
Guillaume Cottenceau wrote:
> You can confirm this idea by temporarily disabling sequential
> scans. Have a look at this chapter:
>

I don't think it will do a "seq index scan" as Laszlo envisions anyway.
PostgreSQL cannot do "fetch index tuple and apply %Mug% to it; if it
matches, fetch the heap tuple". Even if you disable sequential scans, it's
still going to fetch every heap tuple to see if it matches "%Mug%". It's
just going to do it in index order, which is slower than a seq scan.

BTW: in addition to setting enable_seqscan=false, you probably have to
add a dummy where-clause like "name > ''" to force the index scan.
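Something like this (a sketch):

set enable_seqscan = off;
explain analyze select id,name from product
 where name > '' and name like '%Mug%';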

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com


Re: Poor performance on seq scan

From: Alvaro Herrera
Heikki Linnakangas wrote:
> I don't think it will do a "seq index scan" as Laszlo envisions anyway.
> PostgreSQL cannot do "fetch index tuple and apply %Mug% to it; if it
> matches, fetch the heap tuple". Even if you disable sequential scans, it's
> still going to fetch every heap tuple to see if it matches "%Mug%". It's
> just going to do it in index order, which is slower than a seq scan.

Are you saying that an indexscan "Filter" only acts after getting the
heap tuple?  If that's the case, then there's room for optimization
here, namely if the affected column is part of the index key, then we
could do the filtering before fetching the heap tuple.

--
Alvaro Herrera                                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

Re: Poor performance on seq scan

From: Heikki Linnakangas
Alvaro Herrera wrote:
> Are you saying that an indexscan "Filter" only acts after getting the
> heap tuple? If that's the case, then there's room for optimization
> here, namely if the affected column is part of the index key, then we
> could do the filtering before fetching the heap tuple.

That's right. Yes, there's definitely room for optimization. In general,
it seems we should detach the index scan and heap fetch more. Perhaps
make them two different nodes, like the bitmap index scan and bitmap
heap scan. It would allow us to do the above. It's also going to be
necessary if we ever get to implement index-only scans.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

Re: Poor performance on seq scan

From: Tom Lane
Alvaro Herrera <alvherre@commandprompt.com> writes:
> Are you saying that an indexscan "Filter" only acts after getting the
> heap tuple?

Correct.

> If that's the case, then there's room for optimization
> here, namely if the affected column is part of the index key, then we
> could do the filtering before fetching the heap tuple.

Only if the index is capable of disgorging the original value of the
indexed column, a fact not in evidence in general (counterexample:
polygons indexed by their bounding boxes in an r-tree).  But yeah,
it's interesting to think about applying filters at the index fetch
step for index types that can hand back full values.  This has been
discussed before --- I think we had gotten as far as speculating about
doing joins with just index values.  See eg here:
http://archives.postgresql.org/pgsql-hackers/2004-05/msg00944.php
A lot of the low-level concerns have already been dealt with in order to
support bitmap indexscans, but applying non-indexable conditions before
fetching from the heap is still not done.

            regards, tom lane

Re: Poor performance on seq scan

From: Luke Lonergan
Laszlo,

You can ignore tuning postgres and trying to use indexes; your problem is
a bad hardware / OS configuration.  The disks you are using should read
4-5 times faster than they are doing.  Look to the SATA chipset driver in
your FreeBSD config - perhaps upgrading your kernel would help.

Still, the most you should expect is a 5-6 times faster query than
before.  The data in your table (1123MB) is slightly larger than your RAM
(1GB).  When you took it out of the DBMS it was smaller than RAM, so it
fit in the I/O cache.

With a text scan query you are stuck with a seqscan unless you use a text
index like tsearch.  Buy more disks and a RAID controller and use RAID5
or RAID10.

- Luke

Msg is shrt cuz m on ma treo


Re: Poor performance on seq scan

From: Tom Lane
Laszlo Nagy <gandalf@designaproduct.biz> writes:
> Meanwhile, "iostat 5" gives something like this:

>  tin tout  KB/t tps  MB/s   KB/t   tps  MB/s  us ni sy in id
>    1   14 128.00   1  0.10  128.00   1  0.10   5  0 94  1  0
>    0   12 123.98 104 12.56  123.74 104 12.56   8  0 90  2  0
>    0   12 125.66 128 15.75  125.26 128 15.68  10  0 85  6  0
>    0   12 124.66 129 15.67  124.39 129 15.64  12  0 85  3  0
>    0   12 117.13 121 13.87  117.95 121 13.96  12  0 84  5  0
>    0   12 104.84 118 12.05  105.84 118 12.19  10  0 87  2  0

Why is that showing 85+ percent *system* CPU time??  I could believe a
lot of idle CPU if the query is I/O bound, or a lot of user time if PG
was being a hog about doing the ~~ comparisons (not too unlikely BTW).
But if the kernel is eating all the CPU, there's something very wrong,
and I don't think it's Postgres' fault.

            regards, tom lane

tsearch2 question (was: Poor performance on seq scan)

From: Laszlo Nagy
Tom Lane wrote:
> Only if the index is capable of disgorging the original value of the
> indexed column, a fact not in evidence in general (counterexample:
> polygons indexed by their bounding boxes in an r-tree).  But yeah,
> it's interesting to think about applying filters at the index fetch
> step for index types that can hand back full values.  This has been
> discussed before --- I think we had gotten as far as speculating about
> doing joins with just index values.  See eg here:
> http://archives.postgresql.org/pgsql-hackers/2004-05/msg00944.php
> A lot of the low-level concerns have already been dealt with in order to
> support bitmap indexscans, but applying non-indexable conditions before
> fetching from the heap is still not done.
>
To overcome this problem, I created a smaller "shadow" table:

CREATE TABLE product_search
(
  id int8 NOT NULL,
  name_desc text,
  CONSTRAINT pk_product_search PRIMARY KEY (id)
);

insert into product_search
  select
    id,
    name || ' ' || coalesce(description,'')
  from product;


Obviously, this is almost like an index, but I need to maintain it
manually (a sketch of keeping it in sync with a trigger appears further
below). I'm able to search it with:

zeusd1=> explain analyze select id from product_search where name_desc
like '%Mug%';
                                                       QUERY PLAN
------------------------------------------------------------------------------------------------------------------------
 Seq Scan on product_search  (cost=0.00..54693.34 rows=36487 width=8)
(actual time=20.036..2541.971 rows=91399 loops=1)
   Filter: (name_desc ~~ '%Mug%'::text)
 Total runtime: 2581.272 ms
(3 rows)

The total runtime remains below 3 sec in all cases. Of course I still
need to join the main table to the result:

explain analyze select s.id,p.name from product_search s inner join
product p on (p.id = s.id) where s.name_desc like '%Tiffany%'

                                                              QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=0.00..55042.84 rows=58 width=40) (actual
time=164.437..3982.610 rows=117 loops=1)
   ->  Seq Scan on product_search s  (cost=0.00..54693.34 rows=58
width=8) (actual time=103.651..2717.914 rows=117 loops=1)
         Filter: (name_desc ~~ '%Tiffany%'::text)
   ->  Index Scan using pk_product_id on product p  (cost=0.00..6.01
rows=1 width=40) (actual time=10.793..10.796 rows=1 loops=117)
         Index Cond: (p.id = "outer".id)
 Total runtime: 4007.283 ms
(6 rows)

Took 4 seconds. Awesome! With the original table, it used to be one or
two minutes!
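
For reference, here is how the shadow table could be kept in sync
automatically - a sketch only, assuming PL/pgSQL is installed; deletes
on product would need an extra trigger:

CREATE OR REPLACE FUNCTION product_search_sync() RETURNS trigger AS $$
BEGIN
  -- drop any stale shadow row, then insert the fresh one
  DELETE FROM product_search WHERE id = NEW.id;
  INSERT INTO product_search (id, name_desc)
    VALUES (NEW.id, NEW.name || ' ' || coalesce(NEW.description, ''));
  RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER trg_product_search_sync
  AFTER INSERT OR UPDATE ON product
  FOR EACH ROW EXECUTE PROCEDURE product_search_sync();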

Now you may ask, why am I not using tsearch2 for this? Here is the answer:

CREATE TABLE product_search
(
  id int8 NOT NULL,
  ts_name_desc tsvector,
  CONSTRAINT pk_product_search PRIMARY KEY (id)
);

insert into product_search
  select
    id,
    to_tsvector(name || ' ' || coalesce(description,''))
  from product;

CREATE INDEX idx_product_search_ts_name_desc  ON product_search  USING
gist  (ts_name_desc);
VACUUM product_search;

zeusd1=> explain analyze select id from product_search where
ts_name_desc @@ to_tsquery('mug');
                                                                   QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on product_search  (cost=25.19..3378.20 rows=912
width=8) (actual time=954.669..13112.009 rows=91434 loops=1)
  Filter: (ts_name_desc @@ '''mug'''::tsquery)
  ->  Bitmap Index Scan on idx_product_search_ts_name_desc
(cost=0.00..25.19 rows=912 width=0) (actual time=932.455..932.455
rows=91436 loops=1)
        Index Cond: (ts_name_desc @@ '''mug'''::tsquery)
Total runtime: 13155.724 ms
(5 rows)

zeusd1=> explain analyze select id from product_search where
ts_name_desc @@ to_tsquery('tiffany');

                                                                   QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on product_search  (cost=25.19..3378.20 rows=912
width=8) (actual time=13151.725..13639.112 rows=76 loops=1)
  Filter: (ts_name_desc @@ '''tiffani'''::tsquery)
  ->  Bitmap Index Scan on idx_product_search_ts_name_desc
(cost=0.00..25.19 rows=912 width=0) (actual time=13123.705..13123.705
rows=81 loops=1)
        Index Cond: (ts_name_desc @@ '''tiffani'''::tsquery)
Total runtime: 13639.478 ms
(5 rows)

At least 13 seconds, and the main table is not joined yet. Can anybody
explain to me why the seq scan is faster than the bitmap index scan? In
the last example only 81 rows were returned, but it took more than 13
seconds. :(  Even if the whole table could be cached in memory (which
isn't the case), the bitmap index scan should be much faster. There is
probably a big problem with my schema, but I cannot find it. What am I doing wrong?

Thanks,

   Laszlo


Re: Poor performance on seq scan

From: Laszlo Nagy
Tom Lane wrote:
> Why is that showing 85+ percent *system* CPU time??  I could believe a
> lot of idle CPU if the query is I/O bound, or a lot of user time if PG
> was being a hog about doing the ~~ comparisons (not too unlikely BTW).
>
I'm sorry, this was really confusing. I don't know what it was -
probably a background system process, started from cron (?). I retried
the same query and I got this:

zeusd1=> explain analyze select id,name from product where name like
'%Mug%';
                                                     QUERY PLAN
--------------------------------------------------------------------------------------------------------------------
 Seq Scan on product  (cost=0.00..206891.34 rows=36487 width=40) (actual
time=17.188..44585.176 rows=91399 loops=1)
   Filter: (name ~~ '%Mug%'::text)
 Total runtime: 44631.150 ms
(3 rows)

     tty             ad4              ad6             cpu
 tin tout  KB/t tps  MB/s   KB/t tps  MB/s  us ni sy in id
   0   62 115.25 143 16.06  116.03 143 16.17   3  0  9  3 85
   0   62 122.11 144 17.12  121.78 144 17.07   6  0  3  2 89
   0   62 126.18 158 19.45  125.86 157 19.28   5  0 11  6 79
   0   62 126.41 131 16.13  127.52 132 16.39   5  0  9  6 80
   0   62 127.80 159 19.81  126.89 158 19.55   5  0  9  0 86
   0   62 125.29 165 20.15  126.26 165 20.30   5  0 14  2 80
   0   62 127.22 164 20.32  126.74 165 20.37   5  0  9  0 86
   0   62 121.34 150 17.75  120.76 149 17.54   1  0 13  3 82
   0   62 121.40 143 16.92  120.33 144 16.89   5  0 11  3 82
   0   62 127.38 154 19.12  127.17 154 19.09   8  0  8  5 80
   0   62 126.88 129 15.95  127.00 128 15.84   5  0  9  5 82
   0   62 118.48 121 13.97  119.28 121 14.06   6  0 17  3 74
   0   62 127.23 146 18.10  126.79 146 18.04   9  0 20  2 70
   0   62 127.27 153 18.98  128.00 154 19.21   5  0 17  0 79
   0   62 127.02 130 16.09  126.28 130 16.00  10  0 16  3 70
   0   62 123.17 125 15.00  122.40 125 14.91   5  0 14  2 80
   0   62 112.37 130 14.24  112.62 130 14.27   0  0 14  3 83
   0   62 115.83 138 15.58  113.97 138 15.33   3  0 18  0 79

A somewhat better transfer rate, but nothing significant.

Regards,

   Laszlo



Re: tsearch2 question (was: Poor performance on seq scan)

From: Luke Lonergan
Laszlo,

On 9/12/06 10:01 AM, "Laszlo Nagy" <gandalf@designaproduct.biz> wrote:

> zeusd1=> explain analyze select id from product_search where name_desc
> like '%Mug%';
>                                                        QUERY PLAN
> ------------------------------------------------------------------------------------------------------
>  Seq Scan on product_search  (cost=0.00..54693.34 rows=36487 width=8)
> (actual time=20.036..2541.971 rows=91399 loops=1)
>    Filter: (name_desc ~~ '%Mug%'::text)
>  Total runtime: 2581.272 ms
> (3 rows)
>
> The total runtime remains below 3 sec in all cases.

By creating a table with only the name field you are searching, you have
reduced the row size so that the whole table fits in memory.  That is why
your query runs faster.

If your searched data doesn't grow, this is fine.  If it does, you will
need to fix your disk / OS problem.

- Luke



Re: Poor performance on seq scan

From: Tom Lane
Laszlo Nagy <gandalf@designaproduct.biz> writes:
> Tom Lane wrote:
>> Why is that showing 85+ percent *system* CPU time??

> I'm sorry, this was really confusing. I don't know what it was -
> probably a background system process, started from cron (?). I retried
> the same query and I got this:
> [ around 80% idle CPU, 10% system, < 10% user ]

OK, so then the thing really is I/O bound, and Luke is barking up the
right tree.  The system CPU percentage still seems high though.
I wonder if there is a software aspect to your I/O speed woes ...
could the thing be doing PIO instead of DMA for instance?

            regards, tom lane

Re: Poor performance on seq scan

From: Craig A. James
>>  tin tout  KB/t tps  MB/s   KB/t   tps  MB/s  us ni sy in id
>>    1   14 128.00   1  0.10  128.00   1  0.10   5  0 94  1  0
>>    0   12 123.98 104 12.56  123.74 104 12.56   8  0 90  2  0
>>    0   12 125.66 128 15.75  125.26 128 15.68  10  0 85  6  0
>>    0   12 124.66 129 15.67  124.39 129 15.64  12  0 85  3  0
>>    0   12 117.13 121 13.87  117.95 121 13.96  12  0 84  5  0
>>    0   12 104.84 118 12.05  105.84 118 12.19  10  0 87  2  0
>
> Why is that showing 85+ percent *system* CPU time??  I could believe a
> lot of idle CPU if the query is I/O bound, or a lot of user time if PG
> was being a hog about doing the ~~ comparisons (not too unlikely BTW).
> But if the kernel is eating all the CPU, there's something very wrong,
> and I don't think it's Postgres' fault.

There IS a bug for SATA disk drives in some versions of the Linux kernel.
On a lark I ran some of the I/O tests in this thread, and much to my
surprise discovered my write speed was 6 MB/sec ... ouch!  On an
identical machine, different kernel, the write speed was 54 MB/sec.

A couple of hours of research turned up this:

   https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=168363

The fix for me was to edit /boot/grub/grub.conf, like this:

   kernel /vmlinuz-2.6.12-1.1381_FC3 ro root=LABEL=/ rhgb quiet \
       ramdisk_size=12000000 ide0=noprobe ide1=noprobe

Notice the "ideX=noprobe".  Instant fix -- after reboot the disk write speed jumped to what I expected.

Craig


Re: Poor performance on seq scan

From: Laszlo Nagy
Craig A. James wrote:
>
> There IS a bug for SATA disk drives in some versions of the Linux
> kernel.  On a lark I ran some of the I/O tests in this thread, and
> much to my surprise discovered my write speed was 6 MB/sec ... ouch!
> On an identical machine, different kernel, the write speed was 54 MB/sec.
My disks are running in SATA150 mode, whatever that means.

I'm using FreeBSD, and not just because it dynamically alters the
priority of long-running processes. :-)

  Laszlo


Re: Poor performance on seq scan

From: Luke Lonergan
Laszlo,

On 9/12/06 2:49 PM, "Laszlo Nagy" <gandalf@designaproduct.biz> wrote:

> I'm using FreeBSD, and not just because it dynamically alters the
> priority of long-running processes. :-)

Understood.

Linux and FreeBSD often share some driver technology.

I have had extremely bad performance historically with onboard SATA chipsets
on Linux.  The one exception has been with the Intel based chipsets (not the
CPU, the I/O chipset).

It is very likely that you are having problems with the driver for the
chipset.

Are you running RAID1 in hardware?  If so, turn it off and see what the
performance is.  The onboard hardware RAID is worse than useless; it
actually slows the I/O down.

If you want RAID with onboard chipsets, use software RAID, or buy an adapter
from 3Ware or Areca for $200.

- Luke



Re: Poor performance on seq scan

From: Mark Kirkwood
Laszlo Nagy wrote:
> Craig A. James wrote:
>>
>> There IS a bug for SATA disk drives in some versions of the Linux
>> kernel.  On a lark I ran some of the I/O tests in this thread, and
>> much to my surprise discovered my write speed was 6 MB/sec ... ouch!
>> On an identical machine, different kernel, the write speed was 54 MB/sec.
> My disks are running in SATA150 mode, whatever that means.
>
> I'm using FreeBSD, and not just because it dynamically alters the
> priority of long-running processes. :-)
>

I dunno if this has been suggested, but try changing the sysctl
vfs.read_max. The default is 8 and results in horrible RAID performance
(having said that, I'm not sure if RAID1 is affected, only striped RAID
levels...). Anyway, try 16 or 32 and see if your seq IO rate improves at
all (though the underlying problem does look like a poor SATA
chipset/driver combination).

I also found that building your ufs2 filesystems with 32K blocks and 4K
fragments improved sequential performance considerably (even for 8K reads).

Cheers

Mark

Re: Poor performance on seq scan

From: Laszlo Nagy
> I have had extremely bad performance historically with onboard SATA chipsets
> on Linux.  The one exception has been with the Intel based chipsets (not the
> CPU, the I/O chipset).
>
This board has an Intel chipset. I cannot remember the exact type, but
it was not in the low-end category.
dmesg says:

<Intel ICH7 SATA300 controller>
kernel: ad4: 152626MB <SAMSUNG HD160JJ ZM100-33> at ata2-master SATA150
kernel: ad4: 152627MB <SAMSUNG HD160JJ ZM100-33> at ata3-master SATA150

> It is very likely that you are having problems with the driver for the
> chipset.
>
> Are you running RAID1 in hardware?  If so, turn it off and see what the
> performance is.  The onboard hardware RAID is worse than useless, it
> actually slows the I/O down.
>
I'm using software raid, namely gmirror:

GEOM_MIRROR: Device gm0 created (id=2574033628).
GEOM_MIRROR: Device gm0: provider ad4 detected.
GEOM_MIRROR: Device gm0: provider ad6 detected.
GEOM_MIRROR: Device gm0: provider ad4 activated.
GEOM_MIRROR: Device gm0: provider ad6 activated.

#gmirror list
Geom name: gm0
State: COMPLETE
Components: 2
Balance: round-robin
Slice: 4096
Flags: NONE
GenID: 0
SyncID: 1
ID: 2574033628
Providers:
1. Name: mirror/gm0
   Mediasize: 160040803328 (149G)
   Sectorsize: 512
   Mode: r5w5e6
Consumers:
1. Name: ad4
   Mediasize: 160040803840 (149G)
   Sectorsize: 512
   Mode: r1w1e1
   State: ACTIVE
   Priority: 0
   Flags: DIRTY
   GenID: 0
   SyncID: 1
   ID: 1153981856
2. Name: ad6
   Mediasize: 160041885696 (149G)
   Sectorsize: 512
   Mode: r1w1e1
   State: ACTIVE
   Priority: 0
   Flags: DIRTY
   GenID: 0
   SyncID: 1
   ID: 3520427571


I tried to do:

#sysctl vfs.read_max=32
vfs.read_max: 6 -> 32

but I could not achieve better disk read performance.

Thank you for your suggestions. Looks like I need to buy SCSI disks.

Regards,

   Laszlo


Re: Poor performance on seq scan

From: Dave Cramer
On 13-Sep-06, at 6:16 AM, Laszlo Nagy wrote:

> Thank you for your suggestions. Looks like I need to buy SCSI disks.

Well, before you go do that, try the Areca SATA RAID card.


Re: Poor performance on seq scan

From: Luke Lonergan
Laszlo,

>> Thank you for your suggestions. Looks like I need to buy SCSI disks.
>
> Well, before you go do that, try the Areca SATA RAID card.

Yes, by all means spend $200 and buy the Areca or 3Ware RAID card - it's a
simple switch out of the cables and you should be golden.

Again - you should only expect an increase in performance of 4-6 times
over what you are getting now unless you increase the number of disks.

- Luke



Re: Poor performance on seq scan

From: Ivan Voras
Laszlo Nagy wrote:

> This board has an Intel chipset. I cannot remember the exact type, but
> it was not in the low-end category.
> dmesg says:
>
> <Intel ICH7 SATA300 controller>
> kernel: ad4: 152626MB <SAMSUNG HD160JJ ZM100-33> at ata2-master SATA150
> kernel: ad4: 152627MB <SAMSUNG HD160JJ ZM100-33> at ata3-master SATA150

There have been reported problems with ICH7 on FreeBSD mailing lists,
though I can't find any that affect performance.

> Components: 2
> Balance: round-robin
> Slice: 4096

See if changing the balance algorithm to "split" and the slice size to
8192 or more, while keeping vfs.read_max at 16 or more, helps your
performance.

(e.g. gmirror configure -b split -s 8192 gm0)

Also, how is your file system mounted? (What does the output of 'mount' say?)

Re: Poor performance on seq scan

From: Piotr Kołaczkowski
On Tuesday 12 September 2006 12:47, Heikki Linnakangas wrote:
> > Another question: I have a btree index on product(name). It contains
> > all product names and the identifiers of the products. Wouldn't it be
> > easier to seq scan the index instead of the table? The index
> > is only 66MB, the table is 1123MB.
>
> Probably, but PostgreSQL doesn't know how to do that. Even if it did, it
> depends on how many matches there are. If you scan the index and then
> fetch the matching rows from the heap, you're doing random I/O to the
> heap. That becomes slower than scanning the heap sequentially if you're
> going to get more than a few hits.

Why match rows from the heap if ALL required data are in the index itself?
Why look at the heap at all?

This is the same performance problem in PostgreSQL I noticed when doing
some "SELECT count(*)" queries. Look at this:

explain analyze select count(*) from transakcja where data > '2005-09-09' and
miesiac >= (9 + 2005 * 12) and kwota < 50;

                                                                    QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=601557.86..601557.87 rows=1 width=0) (actual
time=26733.479..26733.484 rows=1 loops=1)
   ->  Bitmap Heap Scan on transakcja  (cost=154878.00..596928.23 rows=1851852
width=0) (actual time=9974.208..18796.060 rows=1654218 loops=1)
         Recheck Cond: ((miesiac >= 24069) AND (kwota < 50::double precision))
         Filter: (data > '2005-09-09 00:00:00'::timestamp without time zone)
         ->  Bitmap Index Scan on idx_transakcja_miesiac_kwota
(cost=0.00..154878.00 rows=5555556 width=0) (actual time=9919.967..9919.967
rows=1690402 loops=1)
               Index Cond: ((miesiac >= 24069) AND (kwota < 50::double
precision))
 Total runtime: 26733.980 ms
(7 rows)

The actual time retrieving tuples from the index is less than 10 seconds,
but the system executes a needless heap scan that takes an additional 16
seconds.

Best regards,
Peter




Re: Poor performance on seq scan

From: Markus Schaber
Hi, Piotr,

Piotr Kołaczkowski wrote:

> Why match rows from the heap if ALL required data are in the index itself?
> Why look at the heap at all?

Because the index does not contain any transaction information, the
executor has to look at the heap to find out which of the rows are
current.

This is one of the more debated points of PostgreSQL's MVCC
implementation.
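
A sketch of why this matters (the id here is hypothetical): until the
transaction below commits, other sessions must still see the row, yet
its index entry is physically unchanged, so the index alone cannot tell
who should see what:

BEGIN;
DELETE FROM product WHERE id = 42;
-- the index entry for id 42 is still physically present;
-- another session still sees the row, because only the heap tuple's
-- visibility information records the pending delete
ROLLBACK;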


Markus


--
Markus Schaber | Logical Tracking&Tracing International AG
Dipl. Inf.     | Software Development GIS

Fight against software patents in Europe! www.ffii.org
www.nosoftwarepatents.org


Re: Poor performance on seq scan

From: Guido Neitzer
Because there is no MVCC information in the index.

cug

2006/9/12, Piotr Kołaczkowski <P.Kolaczkowski@elka.pw.edu.pl>:
> Why match rows from the heap if ALL required data are in the index itself?
> Why look at the heap at all?


--
PostgreSQL Bootcamp, Big Nerd Ranch Europe, Nov 2006
http://www.bignerdranch.com/news/2006-08-21.shtml