Thread: table/index fillfactor control
This is a patch for table/index fillfactor control discussed in http://archives.postgresql.org/pgsql-hackers/2006-06/msg00175.php Fillfactor works on CREATE, INSERT, UPDATE, COPY, VACUUM FULL, CLUSTER and REINDEX, but is not done on dump/restore and GIN access method; I'd like to ask those module developers to complete the patch, please. This patch might help the TODO-item: Allow CREATE INDEX to take an additional parameter for use with special index types but the patch rejects parameters except fillfactor currently. If we want to implement the feature, it is needed to consider how to store generic parameters passed at CREATE. The following syntax are added: - Table creation: - CREATE TABLE table (columns) WITH (...) - CREATE TABLE table WITH (...) AS SELECT/EXECUTE ... - Index creation: - CREATE INDEX index ON table USING btree (columns) WITH (...) - CREATE TABLE table (i integer PRIMARY KEY WITH (...)) - ALTER TABLE table ADD PRIMARY KEY (columns) WITH (...) - Alterating parameters for existing tables/indexes: - ALTER TABLE/INDEX name SET (...) The folloing is a test of the patch: # CREATE TABLE tbl (i int) WITH (fillfactor=50); # CREATE INDEX idx ON tbl USING btree (i) WITH (fillfactor=50); # INSERT INTO tbl SELECT generate_series(1, 100000); | relname | relfillfactor | relpages | +---------+---------------+----------+ | tbl | 50 | 1087 | | idx | 50 | 494 | # ALTER TABLE tbl SET (fillfactor = 100); # ALTER INDEX idx SET (fillfactor = 100); # CLUSTER idx ON tbl; # REINDEX INDEX idx; | relname | relfillfactor | relpages | +---------+---------------+----------+ | tbl | 100 | 541 | | idx | 100 | 249 | --- ITAGAKI Takahiro NTT OSS Center
Attachment
On Tue, 2006-06-06 at 18:02 +0900, ITAGAKI Takahiro wrote: > This is a patch for table/index fillfactor control This is a good new feature and I'll vote for it. I see what Tom was driving at with earlier comments. I think we need an non-index AM specific patch, so that each AM has its own parameters. So we have a new element of the RelationData struct: void *rd_amopts; Which each AM defines and interprets. -- Simon Riggs EnterpriseDB http://www.enterprisedb.com
On 6/6/06, ITAGAKI Takahiro <itagaki.takahiro@oss.ntt.co.jp> wrote: > This is a patch for table/index fillfactor control discussed in > http://archives.postgresql.org/pgsql-hackers/2006-06/msg00175.php There's 4 shift/reduce conflicts which I believe are caused by having used WITH... did you plan to fix this? -- Jonah H. Harris, Software Architect | phone: 732.331.1300 EnterpriseDB Corporation | fax: 732.331.1301 33 Wood Ave S, 2nd Floor | jharris@enterprisedb.com Iselin, New Jersey 08830 | http://www.enterprisedb.com/
On 6/6/06, Jonah H. Harris <jonah.harris@gmail.com> wrote: > On 6/6/06, ITAGAKI Takahiro <itagaki.takahiro@oss.ntt.co.jp> wrote: > > This is a patch for table/index fillfactor control discussed in > > http://archives.postgresql.org/pgsql-hackers/2006-06/msg00175.php > > There's 4 shift/reduce conflicts which I believe are caused by having > used WITH... did you plan to fix this? BTW, I think this is nice functionality and definitely second Simon & Tom's ideas. Thanks for picking it up again :) -- Jonah H. Harris, Software Architect | phone: 732.331.1300 EnterpriseDB Corporation | fax: 732.331.1301 33 Wood Ave S, 2nd Floor | jharris@enterprisedb.com Iselin, New Jersey 08830 | http://www.enterprisedb.com/
"Jonah H. Harris" <jonah.harris@gmail.com> wrote: > There's 4 shift/reduce conflicts which I believe are caused by having > used WITH... did you plan to fix this? Thanks for pointing out. I realized it is a confliction between WITH OIDS and WITH (options). I'll try to treat CreateStmt.hasoids as one of the options internally, with SQL-level backward compatibility. --- ITAGAKI Takahiro NTT OSS Center
ITAGAKI Takahiro <itagaki.takahiro@oss.ntt.co.jp> writes: > "Jonah H. Harris" <jonah.harris@gmail.com> wrote: >> There's 4 shift/reduce conflicts which I believe are caused by having >> used WITH... did you plan to fix this? > Thanks for pointing out. I realized it is a confliction between > WITH OIDS and WITH (options). > I'll try to treat CreateStmt.hasoids as one of the options internally, > with SQL-level backward compatibility. There was some discussion recently about how we'll have to make WITH a fully reserved keyword eventually anyway to handle SQL99 recursive queries. I'm not sure if that's really true, but reserving WITH is a fallback position we should consider, if avoiding the shift/reduce conflict seems unreasonably messy otherwise. regards, tom lane
This is a revised fillfactor patch. It uses WITH syntax and we can add new AM specific parameters easily. Simon Riggs <simon@2ndquadrant.com> wrote: > I see what Tom was driving at with earlier comments. I think we need an > non-index AM specific patch, so that each AM has its own parameters. I added amparseoption and amdumpoption to pg_am. The amparseoption parses options and convert them to an internal structure per AM. The amdumpoption converts the structure to a human-readable text to dump the definition. It might be better to make the result of amdumpoption to be a list of (name, value) pairs instead of a plain text. > So we have a new element of the RelationData struct: > void *rd_amopts; > Which each AM defines and interprets. The internal structure is stored in the pg_class.relamopaque column as bytea. I guess it is not the best and there is room for discussion. I examined the following ideas, but they had complexities and difficulties. 1. Add AM specific system tables (pg_heap, pg_btree, etc.) that may inherit pg_class. But it will impact the current source code terribly. 2. Store the structures in AM's meta page. But heaps don't have meta pages. 3. Store them into an array of text column that newly added to pg_class. But we hove to re-parse the array every time relations are loaded. 4. Add new system table, pg_class_option (relid, option name, value). But it has same problem as 3 and needs additional heap scannings. Therefore, I choose the as-is binary format to store the internal structures. Any comments or better ideas? --- ITAGAKI Takahiro NTT Open Source Software Center
Attachment
On Fri, 2006-06-16 at 13:33 +0900, ITAGAKI Takahiro wrote: > This is a revised fillfactor patch. It uses WITH syntax and > we can add new AM specific parameters easily. Cool. I'll look at that in more detail. > > So we have a new element of the RelationData struct: > > void *rd_amopts; > > Which each AM defines and interprets. > > The internal structure is stored in the pg_class.relamopaque column as bytea. > I guess it is not the best and there is room for discussion. I examined the > following ideas, but they had complexities and difficulties. > > 1. Add AM specific system tables (pg_heap, pg_btree, etc.) that may inherit > pg_class. But it will impact the current source code terribly. Hmmm, yep, not a good idea. > 2. Store the structures in AM's meta page. But heaps don't have meta pages. But perhaps they should? That sounds very similar to the idea of non-transactional pg_class data. It would make a lot of sense if heaps had meta pages too, and that the data within them was cached on the relcache just as index meta page data will be for 8.2 > 3. Store them into an array of text column that newly added to pg_class. > But we hove to re-parse the array every time relations are loaded. Not sure if thats a big overhead? Is it a big array? I hope not. We should be aiming for as few parameters as possible for an index/heap, otherwise we'll never be able to determine their correct settings. > 4. Add new system table, pg_class_option (relid, option name, value). > But it has same problem as 3 and needs additional heap scannings. No thanks. > Therefore, I choose the as-is binary format to store the internal structures. > Any comments or better ideas? Well, its either metapages or array-on-pg_class for me. The metagpages thought would require some consolidation from various other proposals, so we'll need to wait for wider discussion on that. -- Simon Riggs EnterpriseDB http://www.enterprisedb.com
Simon Riggs <simon@2ndquadrant.com> writes: > On Fri, 2006-06-16 at 13:33 +0900, ITAGAKI Takahiro wrote: >> 2. Store the structures in AM's meta page. But heaps don't have meta pages. > But perhaps they should? That sounds very similar to the idea of > non-transactional pg_class data. The disadvantage of putting this stuff into metapages is that then you need some entirely new protocol for letting clients get at it (and pg_dump, for one, needs to). I agree with putting it in a catalog. An opaque bytea won't do though. What I'd suggest is something real close to the format used for GUC parameters in ALTER DATABASE SET and ALTER USER SET, ie, pairs of keyword/value strings. This way pg_dump doesn't need very much smarts about what the values are that it's dumping. regards, tom lane
Tom Lane <tgl@sss.pgh.pa.us> wrote: > The disadvantage of putting this stuff into metapages is that then you > need some entirely new protocol for letting clients get at it (and > pg_dump, for one, needs to). > An opaque bytea won't do though. What I'd suggest is something real > close to the format used for GUC parameters in ALTER DATABASE SET and > ALTER USER SET, ie, pairs of keyword/value strings. Ok, I'll consult the ALTER DATABASE SET codes. Storing in arrays might make it difficult to retrieve relations that match conditions specified by clients. However, I think such queirs are not used so many times. And if necessary, we can provide a helper function to extract a value from an array, like "valueof(reloptions, 'fillfactor') > 90" --- ITAGAKI Takahiro NTT Open Source Software Center
Simon Riggs <simon@2ndquadrant.com> wrote: > > 2. Store the structures in AM's meta page. But heaps don't have meta pages. > > But perhaps they should? That sounds very similar to the idea of > non-transactional pg_class data. Presently, I suppose the parameters are not modified so many times. They are set only on build time or maintenance. I think we will need metapages in the future, but not right now. If we will provide an automatic configurator for fillfactors (or other parameters), parameters might be changed every time the configurator runs, for example, per autovacuum. > The metagpages thought would require some consolidation from various > other proposals, so we'll need to wait for wider discussion on that. I agree, but it is easy to move the metadata from catalog to metapages. So the metapages thought can be reconciled among proposals that must need it. (pg_class_nt and dead tuples bitmaps?) --- ITAGAKI Takahiro NTT Open Source Software Center
On Mon, 2006-06-19 at 14:36 +0900, ITAGAKI Takahiro wrote: > Simon Riggs <simon@2ndquadrant.com> wrote: > > > > 2. Store the structures in AM's meta page. But heaps don't have meta pages. > > > > But perhaps they should? That sounds very similar to the idea of > > non-transactional pg_class data. > > Presently, I suppose the parameters are not modified so many times. > They are set only on build time or maintenance. > > I think we will need metapages in the future, but not right now. If we will > provide an automatic configurator for fillfactors (or other parameters), > parameters might be changed every time the configurator runs, for example, > per autovacuum. Yes, I can see that future too. > > The metagpages thought would require some consolidation from various > > other proposals, so we'll need to wait for wider discussion on that. > > I agree, but it is easy to move the metadata from catalog to metapages. > So the metapages thought can be reconciled among proposals that must need it. > (pg_class_nt and dead tuples bitmaps?) I've copied in Alvaro to comment on possible cross-overs between these projects... Looks like we've got a class of data that is similar in its frequency of change to the pg_class_nt stuff. Also, the discussion about having some of this type of info cached in shared memory seems to have dried up. Should we now go for pg_class_nt plus shared memory cache? -- Simon Riggs EnterpriseDB http://www.enterprisedb.com
Simon Riggs <simon@2ndquadrant.com> writes: > Looks like we've got a class of data that is similar in its frequency of > change to the pg_class_nt stuff. Say what? These parameters wouldn't ever change after creation, unless we invent ALTER commands to change them. regards, tom lane
On Mon, 2006-06-19 at 09:15 -0400, Tom Lane wrote: > Simon Riggs <simon@2ndquadrant.com> writes: > > Looks like we've got a class of data that is similar in its frequency of > > change to the pg_class_nt stuff. > > Say what? These parameters wouldn't ever change after creation, unless > we invent ALTER commands to change them. This was a response to Itagaki's comment that there may be a class of parameter that changes more frequently than that. I can see that we might want that, so just trying to plan ahead to dynamically/automatically set parameters - I thought you'd be in favour of that rather than lots of hand-tuned static parameters for indexes. -- Simon Riggs EnterpriseDB http://www.enterprisedb.com
Simon Riggs <simon@2ndquadrant.com> writes: > This was a response to Itagaki's comment that there may be a class of > parameter that changes more frequently than that. I can see that we > might want that, so just trying to plan ahead to > dynamically/automatically set parameters - I thought you'd be in favour > of that rather than lots of hand-tuned static parameters for indexes. Anything that's automatically set can be completely hidden within the index AM, no? regards, tom lane
This is the 3rd revised fillfactor patch. Now, AM specific options are stored in pg_class.reloptions as text[]. Also, some bugs are fixed. It passed all regression tests. Tom Lane <tgl@sss.pgh.pa.us> wrote: > An opaque bytea won't do though. What I'd suggest is something real > close to the format used for GUC parameters in ALTER DATABASE SET and > ALTER USER SET, ie, pairs of keyword/value strings. This way pg_dump > doesn't need very much smarts about what the values are that it's > dumping. The column format of options is changed from bytea to an array of text, so re-parsing is needed every time a connection accesses a relation. I changed to write pre-parsed options into pg_internal.init, but AFAICS, only system relations are written in it. If we will find the parsing is slow, it might be good to store options for user relations, too. Regards, --- ITAGAKI Takahiro NTT Open Source Software Center
Attachment
Patch applied. Thanks. Catalog version updated. --------------------------------------------------------------------------- ITAGAKI Takahiro wrote: > This is the 3rd revised fillfactor patch. > Now, AM specific options are stored in pg_class.reloptions as text[]. > Also, some bugs are fixed. It passed all regression tests. > > > Tom Lane <tgl@sss.pgh.pa.us> wrote: > > > An opaque bytea won't do though. What I'd suggest is something real > > close to the format used for GUC parameters in ALTER DATABASE SET and > > ALTER USER SET, ie, pairs of keyword/value strings. This way pg_dump > > doesn't need very much smarts about what the values are that it's > > dumping. > > The column format of options is changed from bytea to an array of text, > so re-parsing is needed every time a connection accesses a relation. > I changed to write pre-parsed options into pg_internal.init, but AFAICS, > only system relations are written in it. If we will find the parsing > is slow, it might be good to store options for user relations, too. > > > Regards, > --- > ITAGAKI Takahiro > NTT Open Source Software Center [ Attachment, skipping... ] [ Attachment, skipping... ] > > ---------------------------(end of broadcast)--------------------------- > TIP 3: Have you checked our extensive FAQ? > > http://www.postgresql.org/docs/faq -- Bruce Momjian bruce@momjian.us EnterpriseDB http://www.enterprisedb.com + If your life is a hard drive, Christ can be your backup. +
Bruce Momjian <bruce@momjian.us> wrote: > Patch applied. Thanks. Thank you for applying, but sorry, my patch has some incompletions. 1. A debug code is left. Assert() and if-test are redundant. 2. Add a comment on the average FSM request size. Now, the size contains not only the size of tuples, but also freespace on pages. Especially, there may be a room to discuss on 2; it changed the meaning of 'average request size'. If it is enough only to add a comment, please apply the following fixes. diff -cpr pgsql-orig/src/backend/access/heap/hio.c pgsql/src/backend/access/heap/hio.c *** pgsql-orig/src/backend/access/heap/hio.c Mon Jul 3 09:22:49 2006 --- pgsql/src/backend/access/heap/hio.c Mon Jul 3 10:22:40 2006 *************** RelationGetBufferForTuple(Relation relat *** 108,115 **** otherBlock; bool needLock; - if (relation->rd_options == NULL) - elog(ERROR, "RelationGetBufferForTuple %s IS NULL", RelationGetRelationName(relation)); Assert(relation->rd_options != NULL); len = MAXALIGN(len); /* be conservative */ --- 108,113 ---- diff -cpr pgsql-orig/src/backend/storage/freespace/freespace.c pgsql/src/backend/storage/freespace/freespace.c *** pgsql-orig/src/backend/storage/freespace/freespace.c Mon Jul 3 09:22:50 2006 --- pgsql/src/backend/storage/freespace/freespace.c Mon Jul 3 10:30:26 2006 *************** RecordAndGetPageWithFreeSpace(RelFileNod *** 341,346 **** --- 341,348 ---- /* * GetAvgFSMRequestSize - get average FSM request size for a relation. * + * Retuened value is the average of item size plus freespace specified + * by fillfactor. * If the relation is not known to FSM, return a default value. */ Size Regards, --- ITAGAKI Takahiro NTT Open Source Software Center
ITAGAKI Takahiro <itagaki.takahiro@oss.ntt.co.jp> writes: > Thank you for applying, but sorry, my patch has some incompletions. I'm busy reviewing/reworking this patch, and will deal with these items. > 2. Add a comment on the average FSM request size. Now, the size > contains not only the size of tuples, but also freespace on pages. Yeah, I noticed this and thought it was probably a pretty bad idea: it plays hob with the notion of tracking a moving average of freespace requests. I'm not sure what else to do though. Do we want the FSM to explicitly account for fillfactor, and if so how exactly? There's certainly no point in returning a page that doesn't have space for the fillfactor padding. regards, tom lane