Re: big select is resulting in a large amount of disk writing by kjournald - Mailing list pgsql-performance

From Joseph S
Subject Re: big select is resulting in a large amount of disk writing by kjournald
Date
Msg-id hfp83f$1dcd$1@news.hub.org
Whole thread Raw
In response to Re: big select is resulting in a large amount of disk writing by kjournald  (Greg Smith <greg@2ndquadrant.com>)
List pgsql-performance
Greg Smith wrote:
> Joseph S wrote:
>> Greg Smith wrote:
>>> Joseph S wrote:
>>>> So I run "select count(*) from large_table" and I see in xosview a
>>>> solid block of write activity. Runtime is 28125.644 ms for the first
>>>> run.  The second run does not show a block of write activity and
>>>> takes 3327.441 ms
>>> http://wiki.postgresql.org/wiki/Hint_Bits
>>>
>>
>> Hmm. A large select results in a lot of writes? This seems broken.
>> And if we are setting these hint bits then what do we need VACUUM
>> for?  Is there any way to tune this behavior? Is there any way to get
>> stats on how many rows/pages would need hint bits set?
> Basically, the idea is that if you're pulling a page in for something
> else that requires you to compute the hint bits, just do it now so
> VACUUM doesn't have to later, while you're in there anyway.  Why make
> VACUUM do the work later if you're doing part of it now anyway?  If you

Then why not do all the work the VACUUM does?  What does VACUUM do anyway?

> reorganize your test to VACUUM first *before* running the "select (*)
> from...", you'll discover the writes during SELECT go away.  You're
> running into the worst-case behavior.  For example, if you inserted a
> bunch of things more slowly, you might discover that autovacuum would do
> this cleanup before you even got to looking at the data.

I think autovacuum did hit these tables after slony copied them (I
remember seeing them running). Would the hint bits be set during an
reindex?  For example the indexing slony does after the initial copy?
I'm not sure if slony commits the transaction before it does the
reindex. It probably doesn't.
>

> downsides.  This situation shows up a lot when you're benchmarking
> things, but not as much in the real world, so it's hard to justify
> improving.
>

Actually I think I have been running into this situation.  There were
many reports that ran much faster the second time around than the first
and I assumed it was just because the data was in memory cache.  Now I'm
thinking I was running into this.

pgsql-performance by date:

Previous
From: Greg Smith
Date:
Subject: Re: big select is resulting in a large amount of disk writing by kjournald
Next
From: Mark Stosberg
Date:
Subject: Fw: Help me put 2 Gigs of RAM to use