Thread: Re: [PERFORM] unusual performance for vac following 8.2 upgrade

Re: [PERFORM] unusual performance for vac following 8.2 upgrade

From
Tom Lane
Date:
Kim <kim@myemma.com> writes:
> We were running on 8.1.1 previous to upgrading to 8.2, and yes, we
> definitely have a heafty pg_class. The inheritance model is heavily used
> in our schema (the results of the group by you wanted to see are down
> below).  However, no significant problems were seen with vacs while we
> were on 8.1.

Odd, because the 8.1 code looks about the same, and it is perfectly
obvious in hindsight that its runtime is about O(N^2) in the number of
relations :-(.  At least that'd be the case if the stats collector
output were fully populated.  Did you have either stats_block_level or
stats_row_level turned on in 8.1?  If not, maybe the reason for the
change is that in 8.2, that table *will* be pretty fully populated,
because now it's got a last-vacuum-time entry that gets made even if the
stats are otherwise turned off.  Perhaps making that non-disablable
wasn't such a hot idea :-(.

What I think we need to do about this is

(1) fix pgstat_vacuum_tabstats to have non-O(N^2) behavior; I'm thinking
of using a hash table for the OIDs instead of a linear list.  Should be
a pretty small change; I'll work on it today.

(2) Reconsider whether last-vacuum-time should be sent to the collector
unconditionally.

Comments from hackers?

            regards, tom lane

Re: [PERFORM] unusual performance for vac following 8.2 upgrade

From
Alvaro Herrera
Date:
Tom Lane wrote:

> What I think we need to do about this is
>
> (1) fix pgstat_vacuum_tabstats to have non-O(N^2) behavior; I'm thinking
> of using a hash table for the OIDs instead of a linear list.  Should be
> a pretty small change; I'll work on it today.
>
> (2) Reconsider whether last-vacuum-time should be sent to the collector
> unconditionally.

(2) seems a perfectly reasonably answer, but ISTM (1) would be good to
have anyway (at least in HEAD).

--
Alvaro Herrera                                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

Re: [PERFORM] unusual performance for vac following 8.2 upgrade

From
"Simon Riggs"
Date:
On Thu, 2007-01-11 at 14:45 -0500, Tom Lane wrote:
> Kim <kim@myemma.com> writes:
> > We were running on 8.1.1 previous to upgrading to 8.2, and yes, we
> > definitely have a heafty pg_class. The inheritance model is heavily used
> > in our schema (the results of the group by you wanted to see are down
> > below).  However, no significant problems were seen with vacs while we
> > were on 8.1.
>
> Odd, because the 8.1 code looks about the same, and it is perfectly
> obvious in hindsight that its runtime is about O(N^2) in the number of
> relations :-(.  At least that'd be the case if the stats collector
> output were fully populated.  Did you have either stats_block_level or
> stats_row_level turned on in 8.1?  If not, maybe the reason for the
> change is that in 8.2, that table *will* be pretty fully populated,
> because now it's got a last-vacuum-time entry that gets made even if the
> stats are otherwise turned off.  Perhaps making that non-disablable
> wasn't such a hot idea :-(.
>
> What I think we need to do about this is
>
> (1) fix pgstat_vacuum_tabstats to have non-O(N^2) behavior; I'm thinking
> of using a hash table for the OIDs instead of a linear list.  Should be
> a pretty small change; I'll work on it today.
>
> (2) Reconsider whether last-vacuum-time should be sent to the collector
> unconditionally.
>
> Comments from hackers?

It's not clear to me how this fix will alter the INSERT issue Kim
mentions. Are those issues connected? Or are you thinking that handling
stats in a tight loop is slowing down other aspects of the system?

--
  Simon Riggs
  EnterpriseDB   http://www.enterprisedb.com



Re: [PERFORM] unusual performance for vac following 8.2 upgrade

From
Tom Lane
Date:
"Simon Riggs" <simon@2ndquadrant.com> writes:
> It's not clear to me how this fix will alter the INSERT issue Kim
> mentions.

I didn't say that it would; we have no information on the INSERT issue,
so I'm just concentrating on the problem that he did provide info on.

(BTW, I suppose the slow-\d issue is the regex planning problem we
already knew about.)

I'm frankly not real surprised that there are performance issues with
such a huge pg_class; it's not a regime that anyone's spent any time
optimizing.  It is interesting that 8.2 seems to have regressed but
I can think of several places that would've been bad before.  One is
that there are seqscans of pg_inherits ...

            regards, tom lane

Re: [PERFORM] unusual performance for vac following 8.2 upgrade

From
"Jim C. Nasby"
Date:
On Thu, Jan 11, 2007 at 04:49:28PM -0300, Alvaro Herrera wrote:
> Tom Lane wrote:
>
> > What I think we need to do about this is
> >
> > (1) fix pgstat_vacuum_tabstats to have non-O(N^2) behavior; I'm thinking
> > of using a hash table for the OIDs instead of a linear list.  Should be
> > a pretty small change; I'll work on it today.
> >
> > (2) Reconsider whether last-vacuum-time should be sent to the collector
> > unconditionally.
>
> (2) seems a perfectly reasonably answer, but ISTM (1) would be good to
> have anyway (at least in HEAD).

Actually, I'd rather see the impact #1 has before adding #2... If #1
means we're good for even someone with 10M relations, I don't see much
point in #2.

BTW, we're now starting to see more users with a large number of
relations, thanks to partitioning. It would probably be wise to expand
test coverage for that case, especially when it comes to performance.
--
Jim Nasby                                            jim@nasby.net
EnterpriseDB      http://enterprisedb.com      512.569.9461 (cell)

Re: [PERFORM] unusual performance for vac following 8.2 upgrade

From
Tom Lane
Date:
I wrote:
> (2) Reconsider whether last-vacuum-time should be sent to the collector
> unconditionally.

Actually, now that I look, the collector already contains this logic:

    /*
     * Don't create either the database or table entry if it doesn't already
     * exist.  This avoids bloating the stats with entries for stuff that is
     * only touched by vacuum and not by live operations.
     */

and ditto for analyze messages.  So my idea that the addition of
last-vac-time was causing an increase in the statistics file size
compared to 8.1 seems wrong.

How large is your $PGDATA/global/pgstat.stat file, anyway?

            regards, tom lane

Re: [PERFORM] unusual performance for vac following 8.2 upgrade

From
Kim
Date:
Our pgstats.stat file is 40M for 8.2, on 8.1 it was 33M. Our schema size hasn't grown *that* much in the two weeks since we upgraded

I'm not sure if this sheds any more light on the situation, but in scanning down through the process output from truss, it looks like the first section of output was a large chunk of reads on pgstat.stat, followed by a larger chunk of reads on the global directory and directories under base - this whole section probably went on for a good 6-7 minutes, though I would say the reads on pgstat likely finished within a couple of minutes or so. Following this there was a phase were it did a lot of seeks and reads on files under pg_clog, and it was while doing this (or perhaps it had finished whatever it wanted with clogs) it dropped into the send()/SIGUSR1 loop that goes for another several minutes.

Kim


Tom Lane wrote:
I wrote: 
(2) Reconsider whether last-vacuum-time should be sent to the collector
unconditionally.   
Actually, now that I look, the collector already contains this logic:
   /*    * Don't create either the database or table entry if it doesn't already    * exist.  This avoids bloating the stats with entries for stuff that is    * only touched by vacuum and not by live operations.    */

and ditto for analyze messages.  So my idea that the addition of
last-vac-time was causing an increase in the statistics file size
compared to 8.1 seems wrong.

How large is your $PGDATA/global/pgstat.stat file, anyway?
		regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 6: explain analyze is your friend
 

Re: [PERFORM] unusual performance for vac following 8.2upgrade

From
"Simon Riggs"
Date:
On Thu, 2007-01-11 at 16:11 -0500, Tom Lane wrote:
> "Simon Riggs" <simon@2ndquadrant.com> writes:
> > It's not clear to me how this fix will alter the INSERT issue Kim
> > mentions.
>
> I didn't say that it would; we have no information on the INSERT issue,
> so I'm just concentrating on the problem that he did provide info on.

OK.

> I'm frankly not real surprised that there are performance issues with
> such a huge pg_class; it's not a regime that anyone's spent any time
> optimizing.

Yeh, I saw a pg_class that big once, but it just needed a VACUUM.

Temp relations still make pg_class entried don't they? Is that on the
TODO list to change?

--
  Simon Riggs
  EnterpriseDB   http://www.enterprisedb.com



Re: [PERFORM] unusual performance for vac following 8.2upgrade

From
Alvaro Herrera
Date:
Simon Riggs wrote:

> Temp relations still make pg_class entried don't they? Is that on the
> TODO list to change?

Yeah, and pg_attribute entries as well, which may be more problematic
because they are a lot.  Did we get rid of pg_attribute entries for
system attributes already?

Can we actually get rid of pg_class entries for temp tables.  Maybe
creating a "temp pg_class" which would be local to each session?  Heck,
it doesn't even have to be an actual table -- it just needs to be
somewhere from where we can load entries into the relcache.

--
Alvaro Herrera                                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

Re: [PERFORM] unusual performance for vac following 8.2upgrade

From
Tom Lane
Date:
Alvaro Herrera <alvherre@commandprompt.com> writes:
> Can we actually get rid of pg_class entries for temp tables.  Maybe
> creating a "temp pg_class" which would be local to each session?  Heck,
> it doesn't even have to be an actual table -- it just needs to be
> somewhere from where we can load entries into the relcache.

A few things to think about:

1. You'll break a whole lotta client-side code if temp tables disappear
from pg_class.  This is probably solvable --- one thought is to give
pg_class an inheritance child that is a view on a SRF that reads out the
stored-in-memory rows for temp pg_class entries.  Likewise for
pg_attribute and everything else related to a table definition.

2. How do you keep the OIDs for temp tables (and their associated
rowtypes) from conflicting with OIDs for real tables?  Given the way
that OID generation works, there wouldn't be any real problem unless a
temp table survived for as long as it takes the OID counter to wrap all
the way around --- but in a database that has WITH OIDS user tables,
that might not be impossibly long ...

3. What about dependencies on user-defined types, functions, etc?
How will you get things to behave sanely if one backend tries to drop a
type that some other backend is using in a column of a temp table?  Even
if you put entries into pg_depend, which would kind of defeat the point
of not having on-disk catalog entries for temp tables, I don't see how
the other backend figures out what the referencing object is.

I don't really see any solution to that last point :-(

            regards, tom lane

Re: [PERFORM] unusual performance for vac following 8.2upgrade

From
Richard Huxton
Date:
Tom Lane wrote:
> Alvaro Herrera <alvherre@commandprompt.com> writes:
>> Can we actually get rid of pg_class entries for temp tables.  Maybe
>> creating a "temp pg_class" which would be local to each session?  Heck,
>> it doesn't even have to be an actual table -- it just needs to be
>> somewhere from where we can load entries into the relcache.
>
> A few things to think about:
>
> 1. You'll break a whole lotta client-side code if temp tables disappear
> from pg_class.

> 2. How do you keep the OIDs for temp tables (and their associated
> rowtypes) from conflicting with OIDs for real tables?

> 3. What about dependencies on user-defined types, functions, etc?

Is there not some gain from just a "standard" partitioning of pg_class
into: (system-objects, user-objects, temp-objects)? I'd expect them to
form a hierarchy of change+vacuum rates (if you see what I mean).

--
   Richard Huxton
   Archonet Ltd

Re: [PERFORM] unusual performance for vac following 8.2upgrade

From
Gregory Stark
Date:
"Tom Lane" <tgl@sss.pgh.pa.us> writes:

> 3. What about dependencies on user-defined types, functions, etc?
> How will you get things to behave sanely if one backend tries to drop a
> type that some other backend is using in a column of a temp table?  Even
> if you put entries into pg_depend, which would kind of defeat the point
> of not having on-disk catalog entries for temp tables, I don't see how
> the other backend figures out what the referencing object is.

We could just lock the object it depends on. Only really makes sense for very
temporary tables though, not tables a session expects to use for a long series
of transactions.

Another direction to go to address the same problem would be to implement the
standard temporary table concept of a permanent table definition for which
each session gets a different actual set of data which is reset frequently.
Then the meta-data isn't changing frequently.

--  Gregory Stark EnterpriseDB          http://www.enterprisedb.com


Re: [PERFORM] unusual performance for vac following 8.2upgrade

From
"Jim C. Nasby"
Date:
On Thu, Jan 11, 2007 at 09:51:39PM -0500, Tom Lane wrote:
> Alvaro Herrera <alvherre@commandprompt.com> writes:
> > Can we actually get rid of pg_class entries for temp tables.  Maybe
> > creating a "temp pg_class" which would be local to each session?  Heck,
> > it doesn't even have to be an actual table -- it just needs to be
> > somewhere from where we can load entries into the relcache.
> 
> A few things to think about:
> 
> 1. You'll break a whole lotta client-side code if temp tables disappear
> from pg_class.  This is probably solvable --- one thought is to give
> pg_class an inheritance child that is a view on a SRF that reads out the
> stored-in-memory rows for temp pg_class entries.  Likewise for
> pg_attribute and everything else related to a table definition.
> 
> 2. How do you keep the OIDs for temp tables (and their associated
> rowtypes) from conflicting with OIDs for real tables?  Given the way
> that OID generation works, there wouldn't be any real problem unless a
> temp table survived for as long as it takes the OID counter to wrap all
> the way around --- but in a database that has WITH OIDS user tables,
> that might not be impossibly long ...
> 
> 3. What about dependencies on user-defined types, functions, etc?
> How will you get things to behave sanely if one backend tries to drop a
> type that some other backend is using in a column of a temp table?  Even
> if you put entries into pg_depend, which would kind of defeat the point
> of not having on-disk catalog entries for temp tables, I don't see how
> the other backend figures out what the referencing object is.
> 
> I don't really see any solution to that last point :-(

Perhaps it would be better to partition pg_class and _attributes based
on whether an object is temporary or not. Granted, that still means
vacuuming is a consideration, but at least it wouldn't be affecting
pg_class itself. Separating temp objects out would also make it more
reasonable to have the system automatically vacuum those tables after
every X number of dropped objects.

Unfortunately, that still wouldn't help with the OID issue. :( Unless
there was a SERIAL column in pg_class_temp and other parts of the system
could differentiate between temp and non-temp objects.
-- 
Jim Nasby                                            jim@nasby.net
EnterpriseDB      http://enterprisedb.com      512.569.9461 (cell)


Re: [PERFORM] unusual performance for vac following 8.2

From
Kim
Date:
Hello again Tom,

We have our upgrade to 8.2.1 scheduled for this weekend, and we noticed your message regarding the vacuum patch being applied to 8.2 and back-patched. I expect I know the answer to this next question :) but I was wondering if the patch referenced below has also been bundled into the normal source download of 8.2.1 or if we would still need to manually apply it?

- Fix a performance problem in databases with large numbers of tables
  (or other types of pg_class entry): the function
  pgstat_vacuum_tabstat, invoked during VACUUM startup, had runtime
  proportional to the number of stats table entries times the number
  of pg_class rows; in other words O(N^2) if the stats collector's
  information is reasonably complete.  Replace list searching with a
  hash table to bring it back to O(N) behavior.  Per report from kim
  at myemma.com.  Back-patch as far as 8.1; 8.0 and before use
  different coding here.

Thanks,
Kim


Tom Lane wrote:
I wrote: 
What I think we need to do about this is
(1) fix pgstat_vacuum_tabstats to have non-O(N^2) behavior; I'm thinking
of using a hash table for the OIDs instead of a linear list.  Should be
a pretty small change; I'll work on it today.   
I've applied the attached patch to 8.2 to do the above.  Please give it
a try and see how much it helps for you.  Some limited testing here
confirms a noticeable improvement in VACUUM startup time at 10000
tables, and of course it should be 100X worse with 100000 tables.

I am still confused why you didn't see the problem in 8.1, though.
This code is just about exactly the same in 8.1.  Maybe you changed
your stats collector settings when moving to 8.2?
		regards, tom lane
 

Index: pgstat.c =================================================================== RCS file: /cvsroot/pgsql/src/backend/postmaster/pgstat.c,v retrieving revision 1.140 diff -c -r1.140 pgstat.c *** pgstat.c 21 Nov 2006 20:59:52 -0000 1.140 --- pgstat.c 11 Jan 2007 22:32:30 -0000 *************** *** 159,164 **** --- 159,165 ---- static void pgstat_read_statsfile(HTAB **dbhash, Oid onlydb); static void backend_read_statsfile(void); static void pgstat_read_current_status(void); + static HTAB *pgstat_collect_oids(Oid catalogid); static void pgstat_setheader(PgStat_MsgHdr *hdr, StatMsgType mtype); static void pgstat_send(void *msg, int len); *************** *** 657,666 **** void pgstat_vacuum_tabstat(void) { ! List *oidlist; ! Relation rel; ! HeapScanDesc scan; ! HeapTuple tup; PgStat_MsgTabpurge msg; HASH_SEQ_STATUS hstat; PgStat_StatDBEntry *dbentry; --- 658,664 ---- void pgstat_vacuum_tabstat(void) { ! HTAB *htab; PgStat_MsgTabpurge msg; HASH_SEQ_STATUS hstat; PgStat_StatDBEntry *dbentry; *************** *** 679,693 **** /* * Read pg_database and make a list of OIDs of all existing databases */ ! oidlist = NIL; ! rel = heap_open(DatabaseRelationId, AccessShareLock); ! scan = heap_beginscan(rel, SnapshotNow, 0, NULL); ! while ((tup = heap_getnext(scan, ForwardScanDirection)) != NULL) ! { ! oidlist = lappend_oid(oidlist, HeapTupleGetOid(tup)); ! } ! heap_endscan(scan); ! heap_close(rel, AccessShareLock); /* * Search the database hash table for dead databases and tell the --- 677,683 ---- /* * Read pg_database and make a list of OIDs of all existing databases */ ! htab = pgstat_collect_oids(DatabaseRelationId); /* * Search the database hash table for dead databases and tell the *************** *** 698,709 **** { Oid dbid = dbentry->databaseid; ! if (!list_member_oid(oidlist, dbid)) pgstat_drop_database(dbid); } /* Clean up */ ! list_free(oidlist); /* * Lookup our own database entry; if not found, nothing more to do. --- 688,701 ---- { Oid dbid = dbentry->databaseid; ! CHECK_FOR_INTERRUPTS(); ! ! if (hash_search(htab, (void *) &dbid, HASH_FIND, NULL) == NULL) pgstat_drop_database(dbid); } /* Clean up */ ! hash_destroy(htab); /* * Lookup our own database entry; if not found, nothing more to do. *************** *** 717,731 **** /* * Similarly to above, make a list of all known relations in this DB. */ ! oidlist = NIL; ! rel = heap_open(RelationRelationId, AccessShareLock); ! scan = heap_beginscan(rel, SnapshotNow, 0, NULL); ! while ((tup = heap_getnext(scan, ForwardScanDirection)) != NULL) ! { ! oidlist = lappend_oid(oidlist, HeapTupleGetOid(tup)); ! } ! heap_endscan(scan); ! heap_close(rel, AccessShareLock); /* * Initialize our messages table counter to zero --- 709,715 ---- /* * Similarly to above, make a list of all known relations in this DB. */ ! htab = pgstat_collect_oids(RelationRelationId); /* * Initialize our messages table counter to zero *************** *** 738,750 **** hash_seq_init(&hstat, dbentry->tables); while ((tabentry = (PgStat_StatTabEntry *) hash_seq_search(&hstat)) != NULL) { ! if (list_member_oid(oidlist, tabentry->tableid)) continue; /* * Not there, so add this table's Oid to the message */ ! msg.m_tableid[msg.m_nentries++] = tabentry->tableid; /* * If the message is full, send it out and reinitialize to empty --- 722,738 ---- hash_seq_init(&hstat, dbentry->tables); while ((tabentry = (PgStat_StatTabEntry *) hash_seq_search(&hstat)) != NULL) { ! Oid tabid = tabentry->tableid; ! ! CHECK_FOR_INTERRUPTS(); ! ! if (hash_search(htab, (void *) &tabid, HASH_FIND, NULL) != NULL) continue; /* * Not there, so add this table's Oid to the message */ ! msg.m_tableid[msg.m_nentries++] = tabid; /* * If the message is full, send it out and reinitialize to empty *************** *** 776,782 **** } /* Clean up */ ! list_free(oidlist); } --- 764,813 ---- } /* Clean up */ ! hash_destroy(htab); ! } ! ! ! /* ---------- ! * pgstat_collect_oids() - ! * ! * Collect the OIDs of either all databases or all tables, according to ! * the parameter, into a temporary hash table. Caller should hash_destroy ! * the result when done with it. ! * ---------- ! */ ! static HTAB * ! pgstat_collect_oids(Oid catalogid) ! { ! HTAB *htab; ! HASHCTL hash_ctl; ! Relation rel; ! HeapScanDesc scan; ! HeapTuple tup; ! ! memset(&hash_ctl, 0, sizeof(hash_ctl)); ! hash_ctl.keysize = sizeof(Oid); ! hash_ctl.entrysize = sizeof(Oid); ! hash_ctl.hash = oid_hash; ! htab = hash_create("Temporary table of OIDs", ! PGSTAT_TAB_HASH_SIZE, ! &hash_ctl, ! HASH_ELEM | HASH_FUNCTION); ! ! rel = heap_open(catalogid, AccessShareLock); ! scan = heap_beginscan(rel, SnapshotNow, 0, NULL); ! while ((tup = heap_getnext(scan, ForwardScanDirection)) != NULL) ! { ! Oid thisoid = HeapTupleGetOid(tup); ! ! CHECK_FOR_INTERRUPTS(); ! ! (void) hash_search(htab, (void *) &thisoid, HASH_ENTER, NULL); ! } ! heap_endscan(scan); ! heap_close(rel, AccessShareLock); ! ! return htab; }