Thread: SELECT count(*) Generating Lots of Write Activity

SELECT count(*) Generating Lots of Write Activity

From
"Logan Bowers"
Date:

Hello,

 

I’m potentially having a strange performance problem.  I have a BIG table: ~100M, ~1KB rows.  I do a SELECT count(*) from it (I know it will be slow) and as I watch procinfo on my DB server I see a huge amount of write activity.  Thus,

 

1)       Why does this statement generate any writes at all?

2)       Is it possible to tweak something to make it avoid doing so? 

3)       Are my tools lying to me (i.e. procinfo is wrong)?

 

Thanks! 

 

Oh, and our server configuration is:

Postgres 8.0.1, Linux 2.6, AMD64, 4GB RAM, and 3TB of storage (two RAID5 volumes striped together)

 

 

Logan Bowers

Re: SELECT count(*) Generating Lots of Write Activity

From
Tom Lane
Date:
"Logan Bowers" <logan@zillow.com> writes:
> I'm potentially having a strange performance problem.  I have a BIG
> table: ~100M, ~1KB rows.  I do a SELECT count(*) from it (I know it will
> be slow) and as I watch procinfo on my DB server I see a huge amount of
> write activity.  Thus,

> 1)       Why does this statement generate any writes at all?

It could be that it's evicting unrelated dirty pages from cache
(although PG 8.0 is supposed to try to avoid doing that during a simple
seqscan).  Another theory is that the table has a whole lot of
recently-changed rows, and the writes are a side effect of the SELECT
setting commit hint bits to tell future transactions what it found out
about the commit status of the rows.

I dunno what procinfo is --- personally I would use strace and see
exactly which file(s) the database processes are issuing writes against.
Also check whether a second SELECT against the same table continues
to issue writes...

            regards, tom lane

Re: SELECT count(*) Generating Lots of Write Activity

From
Douglas McNaught
Date:
"Logan Bowers" <logan@zillow.com> writes:

> I'm potentially having a strange performance problem.  I have a BIG table:
> ~100M, ~1KB rows.  I do a SELECT count(*) from it (I know it will be slow)
> and as I watch procinfo on my DB server I see a huge amount of write
> activity.  Thus,

The only thing I can think of is that you have a lot of dirty page
buffers (either in PG's shared buffer area or the kernel's page cache)
due to prior write activity, and they are getting written out to make
room for the sequential scan's incoming buffers.

> 3)       Are my tools lying to me (i.e. procinfo is wrong)?

Perhaps; if you're on Linux try 'vmstat' instead of procinfo as a
check.

-Doug

Re: SELECT count(*) Generating Lots of Write Activity

From
"Logan Bowers"
Date:
Thanks for the quick responses everyone. It did turn out to be the
commit bit as the table was just loaded (never accessed) and subsequent
SELECTs did not incur any write activity.

As a side note, I saw in the archives a past conversation about adding
an option to disable touching the commit hint bit.  There were some
conversations about possible uses for such a feature and I'd like to
propose this as a common one:
-Load a whole bunch of "raw" data into big table
-Munge/Transform data and insert it into existing, normalized schema
-Drop table of "raw" data

In my case, the "raw" data is on the order of hundreds of gigabytes and
the increased write activity is a HUGE penalty.  Even with smaller data
sets, this relatively common usage pattern could benefit greatly.

Logan Bowers

-----Original Message-----
From: Tom Lane [mailto:tgl@sss.pgh.pa.us]
Sent: Monday, August 01, 2005 7:09 PM
To: Logan Bowers
Cc: pgsql-general@postgresql.org
Subject: Re: [GENERAL] SELECT count(*) Generating Lots of Write Activity


"Logan Bowers" <logan@zillow.com> writes:
> I'm potentially having a strange performance problem.  I have a BIG
> table: ~100M, ~1KB rows.  I do a SELECT count(*) from it (I know it
will
> be slow) and as I watch procinfo on my DB server I see a huge amount
of
> write activity.  Thus,

> 1)       Why does this statement generate any writes at all?

It could be that it's evicting unrelated dirty pages from cache
(although PG 8.0 is supposed to try to avoid doing that during a simple
seqscan).  Another theory is that the table has a whole lot of
recently-changed rows, and the writes are a side effect of the SELECT
setting commit hint bits to tell future transactions what it found out
about the commit status of the rows.

I dunno what procinfo is --- personally I would use strace and see
exactly which file(s) the database processes are issuing writes against.
Also check whether a second SELECT against the same table continues
to issue writes...

            regards, tom lane

Re: SELECT count(*) Generating Lots of Write Activity

From
Tom Lane
Date:
"Logan Bowers" <logan@zillow.com> writes:
> In my case, the "raw" data is on the order of hundreds of gigabytes and
> the increased write activity is a HUGE penalty.

And you think the extra activity from repeated clog tests would not be a
huge penalty?

AFAICS this would only be likely to be a win if you were sure that no
row would be visited more than once before you drop (or truncate) the
containing table.  Which leads me to wonder why you inserted the row
into the database in the first place, instead of doing the data
aggregation on the client side.

            regards, tom lane