Re: Slow count(*) again... - Mailing list pgsql-performance

From Neil Whelchel
Subject Re: Slow count(*) again...
Date
Msg-id 201010130138.39893.neil.whelchel@gmail.com
In response to Re: Slow count(*) again...  (Mark Kirkwood <mark.kirkwood@catalyst.net.nz>)
Responses Re: Slow count(*) again...
List pgsql-performance
On Wednesday 13 October 2010 00:19:26 Mark Kirkwood wrote:
> On 13/10/10 19:47, Neil Whelchel wrote:
> > Nope...
> > So, possible conclusions are:
> > 1. Even with VACUUM database table speed degrades as tables are updated.
> > 2. Time testing on a freshly INSERTed table gives results that are not
> > real- world.
> > 3. Filesystem defragmentation helps (some).
> > 4. Cache only makes a small difference once a table has been UPDATEd.
> >
> > I am going to leave this configuration running for the next day or so.
> > This way I can try any suggestions and play with any more ideas that I
> > have. I will try these same tests on ext4 later, along with any good
> > suggested tests.
> > I will try MySQL with the same data with both XFS and ext4.
> > -Neil-
>
> I think the major effect you are seeing here is that the UPDATE has
> made the table twice as big on disk (even after VACUUM etc), and it has
> gone from fitting in ram to not fitting in ram - so cannot be
> effectively cached anymore.
>
> This would not normally happen in real life (assuming UPDATEs only
> modify a small part of a table per transaction). However administration
> updates (e.g. 'oh! - ref 1 should now be ref 2, please update
> everything') *will* cause the table size to double.
>
> This is an artifact of Postgres's non overwriting storage manager -
> Mysql will update in place and you will not see this.
>
> Try VACUUM FULL on the table and retest.
>
> regards
>
> Mark

There seems to be a lot of discussion about VACUUM FULL and its problems. The
overall buzz seems to be that VACUUM FULL is a bad idea (I could be wrong
here). It has been some time since I read the changelogs, but I seem to
remember that there have been some major changes to VACUUM FULL recently.
Maybe this needs to be revisited in the documentation.

crash:~# time psql -U test test -c "VACUUM FULL log;"
VACUUM

real    4m49.055s
user    0m0.000s
sys     0m0.000s

crash:~# time psql -U test test -c "SELECT count(*) FROM log;"
  count
----------
 10050886
(1 row)

real    0m9.665s
user    0m0.000s
sys     0m0.004s

A huge improvement from the minute and a half before the VACUUM FULL.
crash:~# time psql -U test test -c "SELECT count(*) FROM log;"
  count
----------
 10050886
(1 row)

real    0m3.786s
user    0m0.000s
sys     0m0.000s

And the cache helps...
So, we are right back to within 10 ms of where we started after INSERTing the
data, but it took a VACUUM FULL to accomplish this (by making the table fit in
RAM again).
This is a big problem on a production machine, as VACUUM FULL takes an
exclusive lock on the table and is likely to get in the way of INSERTing
real-time data into it.
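For cases where an exact count is not actually required, there is a way to
avoid the scan entirely: read the planner's row estimate from the system
catalog. This is only an estimate, refreshed by VACUUM/ANALYZE/autovacuum,
and the table name "log" here is just the one from the test above:

    -- Approximate row count from planner statistics (no table scan).
    -- reltuples is an estimate maintained by VACUUM / ANALYZE, not an
    -- exact count, so it can lag behind recent INSERTs and DELETEs.
    SELECT reltuples::bigint AS estimated_rows
    FROM pg_class
    WHERE relname = 'log';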

So to add to the conclusion pile:
5. When you have no control over the WHERE clause, a count(*) may have to
scan more rows than would fit in RAM, and performance will be unacceptably
slow; count(*) is missing a LIMIT-style feature to bound this.
6. Keep tables that are to be updated frequently as narrow as possible: Link
them to wider tables to store the columns that are less frequently updated.
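A cap like the one conclusion 5 asks for can actually be emulated today by
counting over a limited subquery, so the scan stops after a threshold. This
is only a sketch: the column name "device" and the 100000 threshold are
hypothetical, not from the test table above:

    -- Stop counting at 100000 matching rows: the subquery's LIMIT caps
    -- how many rows the outer count(*) ever sees, so the scan can stop
    -- early instead of reading the whole table.
    SELECT count(*) AS capped_count
    FROM (SELECT 1 FROM log
          WHERE device = 5   -- hypothetical filter column
          LIMIT 100000) AS sub;

If capped_count comes back equal to the limit, the real count is "100000 or
more", which is often all an application needs to know.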

So with our conclusion pile so far, we can deduce that if we were to keep all
of our data in two-column tables (one column to link them together, and the
other to store one column of data), we would stand a much better chance of
making the entire table to be counted fit in RAM. We would then apply the
WHERE clause to a specific narrow table rather than to a column within a
wider table... But this seems to defeat the entire goal of a relational
database...
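As a concrete (and deliberately less extreme) sketch of that design, a
hypothetical wide log row could be split into a narrow, frequently-updated
table plus a wide, rarely-updated one; all the table and column names below
are made up for illustration:

    -- Narrow table: only the frequently updated column plus a key, so
    -- UPDATE churn and the resulting dead tuples stay in a small relation.
    CREATE TABLE log_status (
        log_id  bigint PRIMARY KEY,
        status  integer NOT NULL
    );

    -- Wide table: the rarely updated columns, linked by the same key.
    CREATE TABLE log_detail (
        log_id   bigint PRIMARY KEY REFERENCES log_status (log_id),
        raw_line text,
        source   text,
        logged   timestamptz
    );

The narrow table stays small enough to cache even after heavy UPDATE
traffic, while the wide columns are written once and left alone.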

-Neil-
