Re: [HACKERS] More detail on settings for pgavd? - Mailing list pgsql-performance

From Josh Berkus
Subject Re: [HACKERS] More detail on settings for pgavd?
Date
Msg-id 200311210909.00978.josh@agliodbs.com
Whole thread Raw
In response to More detail on settings for pgavd?  (Josh Berkus <josh@agliodbs.com>)
Responses Re: [HACKERS] More detail on settings for pgavd?  ("Matthew T. O'Connor" <matthew@zeut.net>)
List pgsql-performance
Matthew,

> As long as pg_autovacuum remains a contrib module, I don't think any
> changes to the system catelogs will be make.  If  pg_autovacuum is
> deemed ready to move out of contrib, then we can talk about the above.

But we could create a config file that would store stuff in a flatfile table,
OR we could add our own "system table" that would be created when one
"initializes" pg_avd.

Just an idea.  Mind you, I'm not so sure that we want to focus immediately on
per-table settings.   I think that we want to get the "automatic" settings
working fairly well first; a lot of new DBAs would use the per-table settings
to shoot themselves in the foot.  So we need to be able to make a strong
recommendation to "try the automatic settings first."

> Are you saying that you can vacuum a 1 million row table in 2-4
> minutes?  While a vacuum of the same table with an additional 1 million
> dead tuples would take an hour?

I'm probably exaggerating.  I do know that I can vacuum a fairly clean 1-5
million row table in less than 4 mintues.   I've never let such a table get
to 50% dead tuples, so I don't really know how long that takes.  Call me a
coward if you  like ...

> >I'd be really reluctant to base pv-avd frequency on the fsm settings
> > instead. What if the user loads 8GB of data but leaves fsm_pages at the
> > default of 10,000?  You can't do much with that; you'd have to vacuum if
> > even 1% of the data changed.
>
> Ok, but as you said above it's very easy to set the FSM once you know
> your db size.

Actually, thinking about this I realize that PG_AVD and the Perl-based
postgresql.conf configuration script I was working on (darn, who was doing
that with me?) need to go togther.   With pg_avd, setting max_fsm_pages is
very easy; without it its a bit of guesswork.

So I think we can do this:  for 'auto' settings:

If max_fsm_pages is between 13% and 100% of the total database pages, then set
the vacuum scale factor to match 3/4 of the fsm_pages setting, e.g.
database = 18,000,000 data pages;
max_fsm_pages = 3,600,000;
set vacuum scale factor = 3.6mil/18mil * 3/4 = 0.15

If max_fsm_pages is less than 13% of database pages, issue a warning to the
user (log it, if possible) and set scale factor to 0.1.   If it's greater
than 100% set it to 1 and leave it alone.

> I don't have a good plan as to how to incorporate
> this data, but to a large extent the FSM already tracks table activity
> and gives us the most accurate answer about storage growth (short of
> using something like contrib/pgstattuple which takes nearly the same
> amount of time as an actual vacuum)

I don't really think we need to do dynamic monitoring at this point.   It
would be a lot of engineering to check data page pollution without having
significant performance impact.   It's doable, but something I think we
should hold off until version 3.  It would mean hacking the FSM, which is a
little beyond me right now.

> In my testing, I never changed the default statistics settings.

Ah.  Well, a lot of users do to resolve query problems.

> But we track tuples because we can compare against the count given by
> the stats system.  I don't know of a way (other than looking at the FSM,
> or contrib/pgstattuple ) to see how many dead pages exist.

No, but for scaling you don't need the dynamic count of tuples or of dead
tuples; pg_class holds a reasonable accurate count of pages per table as of
last vacuum.

--
Josh Berkus
Aglio Database Solutions
San Francisco

pgsql-performance by date:

Previous
From: "Matthew T. O'Connor"
Date:
Subject: Re: [HACKERS] More detail on settings for pgavd?
Next
From: "Matthew T. O'Connor"
Date:
Subject: Re: [HACKERS] More detail on settings for pgavd?