Thread: table/index fillfactor control

table/index fillfactor control

From
ITAGAKI Takahiro
Date:
This is a patch for table/index fillfactor control discussed in
    http://archives.postgresql.org/pgsql-hackers/2006-06/msg00175.php
Fillfactor works on CREATE, INSERT, UPDATE, COPY, VACUUM FULL, CLUSTER
and REINDEX, but is not done on dump/restore and GIN access method;
I'd like to ask those module developers to complete the patch, please.

This patch might help the TODO-item:
    Allow CREATE INDEX to take an additional parameter for use with
    special index types
but the patch rejects parameters except fillfactor currently.
If we want to implement the feature, it is needed to consider how to
store generic parameters passed at CREATE.



The following syntax are added:
 - Table creation:
    - CREATE TABLE table (columns) WITH (...)
    - CREATE TABLE table WITH (...) AS SELECT/EXECUTE ...
 - Index creation:
    - CREATE INDEX index ON table USING btree (columns) WITH (...)
    - CREATE TABLE table (i integer PRIMARY KEY WITH (...))
    - ALTER TABLE table ADD PRIMARY KEY (columns) WITH (...)
 - Alterating parameters for existing tables/indexes:
    - ALTER TABLE/INDEX name SET (...)

The folloing is a test of the patch:

# CREATE TABLE tbl (i int) WITH (fillfactor=50);
# CREATE INDEX idx ON tbl USING btree (i) WITH (fillfactor=50);
# INSERT INTO tbl SELECT generate_series(1, 100000);

| relname | relfillfactor | relpages |
+---------+---------------+----------+
| tbl     |            50 |     1087 |
| idx     |            50 |      494 |

# ALTER TABLE tbl SET (fillfactor = 100);
# ALTER INDEX idx SET (fillfactor = 100);
# CLUSTER idx ON tbl;
# REINDEX INDEX idx;

| relname | relfillfactor | relpages |
+---------+---------------+----------+
| tbl     |           100 |      541 |
| idx     |           100 |      249 |

---
ITAGAKI Takahiro
NTT OSS Center


Attachment

Re: table/index fillfactor control

From
Simon Riggs
Date:
On Tue, 2006-06-06 at 18:02 +0900, ITAGAKI Takahiro wrote:
> This is a patch for table/index fillfactor control

This is a good new feature and I'll vote for it.

I see what Tom was driving at with earlier comments. I think we need an
non-index AM specific patch, so that each AM has its own parameters.

So we have a new element of the RelationData struct:
    void    *rd_amopts;

Which each AM defines and interprets.

--
  Simon Riggs
  EnterpriseDB   http://www.enterprisedb.com


Re: table/index fillfactor control

From
"Jonah H. Harris"
Date:
On 6/6/06, ITAGAKI Takahiro <itagaki.takahiro@oss.ntt.co.jp> wrote:
> This is a patch for table/index fillfactor control discussed in
>     http://archives.postgresql.org/pgsql-hackers/2006-06/msg00175.php

There's 4 shift/reduce conflicts which I believe are caused by having
used WITH... did you plan to fix this?

--
Jonah H. Harris, Software Architect | phone: 732.331.1300
EnterpriseDB Corporation            | fax: 732.331.1301
33 Wood Ave S, 2nd Floor            | jharris@enterprisedb.com
Iselin, New Jersey 08830            | http://www.enterprisedb.com/

Re: table/index fillfactor control

From
"Jonah H. Harris"
Date:
On 6/6/06, Jonah H. Harris <jonah.harris@gmail.com> wrote:
> On 6/6/06, ITAGAKI Takahiro <itagaki.takahiro@oss.ntt.co.jp> wrote:
> > This is a patch for table/index fillfactor control discussed in
> >     http://archives.postgresql.org/pgsql-hackers/2006-06/msg00175.php
>
> There's 4 shift/reduce conflicts which I believe are caused by having
> used WITH... did you plan to fix this?

BTW, I think this is nice functionality and definitely second Simon &
Tom's ideas.  Thanks for picking it up again :)

--
Jonah H. Harris, Software Architect | phone: 732.331.1300
EnterpriseDB Corporation            | fax: 732.331.1301
33 Wood Ave S, 2nd Floor            | jharris@enterprisedb.com
Iselin, New Jersey 08830            | http://www.enterprisedb.com/

Re: table/index fillfactor control

From
ITAGAKI Takahiro
Date:
"Jonah H. Harris" <jonah.harris@gmail.com> wrote:
> There's 4 shift/reduce conflicts which I believe are caused by having
> used WITH... did you plan to fix this?

Thanks for pointing out. I realized it is a confliction between
WITH OIDS and WITH (options).

I'll try to treat CreateStmt.hasoids as one of the options internally,
with SQL-level backward compatibility.

---
ITAGAKI Takahiro
NTT OSS Center



Re: table/index fillfactor control

From
Tom Lane
Date:
ITAGAKI Takahiro <itagaki.takahiro@oss.ntt.co.jp> writes:
> "Jonah H. Harris" <jonah.harris@gmail.com> wrote:
>> There's 4 shift/reduce conflicts which I believe are caused by having
>> used WITH... did you plan to fix this?

> Thanks for pointing out. I realized it is a confliction between
> WITH OIDS and WITH (options).

> I'll try to treat CreateStmt.hasoids as one of the options internally,
> with SQL-level backward compatibility.

There was some discussion recently about how we'll have to make WITH
a fully reserved keyword eventually anyway to handle SQL99 recursive
queries.  I'm not sure if that's really true, but reserving WITH is
a fallback position we should consider, if avoiding the shift/reduce
conflict seems unreasonably messy otherwise.

            regards, tom lane

table/index fillfactor control, try 2

From
ITAGAKI Takahiro
Date:
This is a revised fillfactor patch. It uses WITH syntax and
we can add new AM specific parameters easily.


Simon Riggs <simon@2ndquadrant.com> wrote:

> I see what Tom was driving at with earlier comments. I think we need an
> non-index AM specific patch, so that each AM has its own parameters.

I added amparseoption and amdumpoption to pg_am. The amparseoption parses
options and convert them to an internal structure per AM. The amdumpoption
converts the structure to a human-readable text to dump the definition.
It might be better to make the result of amdumpoption to be a list of
(name, value) pairs instead of a plain text.


> So we have a new element of the RelationData struct:
>     void    *rd_amopts;
> Which each AM defines and interprets.

The internal structure is stored in the pg_class.relamopaque column as bytea.
I guess it is not the best and there is room for discussion. I examined the
following ideas, but they had complexities and difficulties.

 1. Add AM specific system tables (pg_heap, pg_btree, etc.) that may inherit
    pg_class. But it will impact the current source code terribly.
 2. Store the structures in AM's meta page. But heaps don't have meta pages.
 3. Store them into an array of text column that newly added to pg_class.
    But we hove to re-parse the array every time relations are loaded.
 4. Add new system table, pg_class_option (relid, option name, value).
    But it has same problem as 3 and needs additional heap scannings.

Therefore, I choose the as-is binary format to store the internal structures.
Any comments or better ideas?

---
ITAGAKI Takahiro
NTT Open Source Software Center


Attachment

Re: table/index fillfactor control, try 2

From
Simon Riggs
Date:
On Fri, 2006-06-16 at 13:33 +0900, ITAGAKI Takahiro wrote:
> This is a revised fillfactor patch. It uses WITH syntax and
> we can add new AM specific parameters easily.

Cool. I'll look at that in more detail.

> > So we have a new element of the RelationData struct:
> >     void    *rd_amopts;
> > Which each AM defines and interprets.
>
> The internal structure is stored in the pg_class.relamopaque column as bytea.
> I guess it is not the best and there is room for discussion. I examined the
> following ideas, but they had complexities and difficulties.
>
>  1. Add AM specific system tables (pg_heap, pg_btree, etc.) that may inherit
>     pg_class. But it will impact the current source code terribly.

Hmmm, yep, not a good idea.

>  2. Store the structures in AM's meta page. But heaps don't have meta pages.

But perhaps they should? That sounds very similar to the idea of
non-transactional pg_class data.

It would make a lot of sense if heaps had meta pages too, and that the
data within them was cached on the relcache just as index meta page data
will be for 8.2

>  3. Store them into an array of text column that newly added to pg_class.
>     But we hove to re-parse the array every time relations are loaded.

Not sure if thats a big overhead? Is it a big array? I hope not. We
should be aiming for as few parameters as possible for an index/heap,
otherwise we'll never be able to determine their correct settings.

>  4. Add new system table, pg_class_option (relid, option name, value).
>     But it has same problem as 3 and needs additional heap scannings.

No thanks.

> Therefore, I choose the as-is binary format to store the internal structures.
> Any comments or better ideas?

Well, its either metapages or array-on-pg_class for me.

The metagpages thought would require some consolidation from various
other proposals, so we'll need to wait for wider discussion on that.

--
  Simon Riggs
  EnterpriseDB          http://www.enterprisedb.com


Re: table/index fillfactor control, try 2

From
Tom Lane
Date:
Simon Riggs <simon@2ndquadrant.com> writes:
> On Fri, 2006-06-16 at 13:33 +0900, ITAGAKI Takahiro wrote:
>> 2. Store the structures in AM's meta page. But heaps don't have meta pages.

> But perhaps they should? That sounds very similar to the idea of
> non-transactional pg_class data.

The disadvantage of putting this stuff into metapages is that then you
need some entirely new protocol for letting clients get at it (and
pg_dump, for one, needs to).  I agree with putting it in a catalog.

An opaque bytea won't do though.  What I'd suggest is something real
close to the format used for GUC parameters in ALTER DATABASE SET and
ALTER USER SET, ie, pairs of keyword/value strings.  This way pg_dump
doesn't need very much smarts about what the values are that it's
dumping.

            regards, tom lane

Re: table/index fillfactor control, try 2

From
ITAGAKI Takahiro
Date:
Tom Lane <tgl@sss.pgh.pa.us> wrote:

> The disadvantage of putting this stuff into metapages is that then you
> need some entirely new protocol for letting clients get at it (and
> pg_dump, for one, needs to).

> An opaque bytea won't do though.  What I'd suggest is something real
> close to the format used for GUC parameters in ALTER DATABASE SET and
> ALTER USER SET, ie, pairs of keyword/value strings.

Ok, I'll consult the ALTER DATABASE SET codes.

Storing in arrays might make it difficult to retrieve relations that match
conditions specified by clients. However, I think such queirs are not used
so many times. And if necessary, we can provide a helper function to extract
a value from an array, like "valueof(reloptions, 'fillfactor') > 90"

---
ITAGAKI Takahiro
NTT Open Source Software Center



Re: table/index fillfactor control, try 2

From
ITAGAKI Takahiro
Date:
Simon Riggs <simon@2ndquadrant.com> wrote:

> >  2. Store the structures in AM's meta page. But heaps don't have meta pages.
>
> But perhaps they should? That sounds very similar to the idea of
> non-transactional pg_class data.

Presently, I suppose the parameters are not modified so many times.
They are set only on build time or maintenance.

I think we will need metapages in the future, but not right now. If we will
provide an automatic configurator for fillfactors (or other parameters),
parameters might be changed every time the configurator runs, for example,
per autovacuum.


> The metagpages thought would require some consolidation from various
> other proposals, so we'll need to wait for wider discussion on that.

I agree, but it is easy to move the metadata from catalog to metapages.
So the metapages thought can be reconciled among proposals that must need it.
(pg_class_nt and dead tuples bitmaps?)

---
ITAGAKI Takahiro
NTT Open Source Software Center



Re: table/index fillfactor control, try 2

From
Simon Riggs
Date:
On Mon, 2006-06-19 at 14:36 +0900, ITAGAKI Takahiro wrote:
> Simon Riggs <simon@2ndquadrant.com> wrote:
>
> > >  2. Store the structures in AM's meta page. But heaps don't have meta pages.
> >
> > But perhaps they should? That sounds very similar to the idea of
> > non-transactional pg_class data.
>
> Presently, I suppose the parameters are not modified so many times.
> They are set only on build time or maintenance.
>
> I think we will need metapages in the future, but not right now. If we will
> provide an automatic configurator for fillfactors (or other parameters),
> parameters might be changed every time the configurator runs, for example,
> per autovacuum.

Yes, I can see that future too.

> > The metagpages thought would require some consolidation from various
> > other proposals, so we'll need to wait for wider discussion on that.
>
> I agree, but it is easy to move the metadata from catalog to metapages.
> So the metapages thought can be reconciled among proposals that must need it.
> (pg_class_nt and dead tuples bitmaps?)

I've copied in Alvaro to comment on possible cross-overs between these
projects...

Looks like we've got a class of data that is similar in its frequency of
change to the pg_class_nt stuff.

Also, the discussion about having some of this type of info cached in
shared memory seems to have dried up. Should we now go for pg_class_nt
plus shared memory cache?

--
  Simon Riggs
  EnterpriseDB   http://www.enterprisedb.com


Re: table/index fillfactor control, try 2

From
Tom Lane
Date:
Simon Riggs <simon@2ndquadrant.com> writes:
> Looks like we've got a class of data that is similar in its frequency of
> change to the pg_class_nt stuff.

Say what?  These parameters wouldn't ever change after creation, unless
we invent ALTER commands to change them.

            regards, tom lane

Re: table/index fillfactor control, try 2

From
Simon Riggs
Date:
On Mon, 2006-06-19 at 09:15 -0400, Tom Lane wrote:
> Simon Riggs <simon@2ndquadrant.com> writes:
> > Looks like we've got a class of data that is similar in its frequency of
> > change to the pg_class_nt stuff.
>
> Say what?  These parameters wouldn't ever change after creation, unless
> we invent ALTER commands to change them.

This was a response to Itagaki's comment that there may be a class of
parameter that changes more frequently than that. I can see that we
might want that, so just trying to plan ahead to
dynamically/automatically set parameters - I thought you'd be in favour
of that rather than lots of hand-tuned static parameters for indexes.

--
  Simon Riggs
  EnterpriseDB   http://www.enterprisedb.com


Re: table/index fillfactor control, try 2

From
Tom Lane
Date:
Simon Riggs <simon@2ndquadrant.com> writes:
> This was a response to Itagaki's comment that there may be a class of
> parameter that changes more frequently than that. I can see that we
> might want that, so just trying to plan ahead to
> dynamically/automatically set parameters - I thought you'd be in favour
> of that rather than lots of hand-tuned static parameters for indexes.

Anything that's automatically set can be completely hidden within the
index AM, no?

            regards, tom lane

table/index fillfactor control, try 3

From
ITAGAKI Takahiro
Date:
This is the 3rd revised fillfactor patch.
Now, AM specific options are stored in pg_class.reloptions as text[].
Also, some bugs are fixed. It passed all regression tests.


Tom Lane <tgl@sss.pgh.pa.us> wrote:

> An opaque bytea won't do though.  What I'd suggest is something real
> close to the format used for GUC parameters in ALTER DATABASE SET and
> ALTER USER SET, ie, pairs of keyword/value strings.  This way pg_dump
> doesn't need very much smarts about what the values are that it's
> dumping.

The column format of options is changed from bytea to an array of text,
so re-parsing is needed every time a connection accesses a relation.
I changed to write pre-parsed options into pg_internal.init, but AFAICS,
only system relations are written in it. If we will find the parsing
is slow, it might be good to store options for user relations, too.


Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center

Attachment

Re: table/index fillfactor control, try 3

From
Bruce Momjian
Date:
Patch applied.  Thanks.

Catalog version updated.

---------------------------------------------------------------------------


ITAGAKI Takahiro wrote:
> This is the 3rd revised fillfactor patch.
> Now, AM specific options are stored in pg_class.reloptions as text[].
> Also, some bugs are fixed. It passed all regression tests.
>
>
> Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> > An opaque bytea won't do though.  What I'd suggest is something real
> > close to the format used for GUC parameters in ALTER DATABASE SET and
> > ALTER USER SET, ie, pairs of keyword/value strings.  This way pg_dump
> > doesn't need very much smarts about what the values are that it's
> > dumping.
>
> The column format of options is changed from bytea to an array of text,
> so re-parsing is needed every time a connection accesses a relation.
> I changed to write pre-parsed options into pg_internal.init, but AFAICS,
> only system relations are written in it. If we will find the parsing
> is slow, it might be good to store options for user relations, too.
>
>
> Regards,
> ---
> ITAGAKI Takahiro
> NTT Open Source Software Center

[ Attachment, skipping... ]

[ Attachment, skipping... ]

>
> ---------------------------(end of broadcast)---------------------------
> TIP 3: Have you checked our extensive FAQ?
>
>                http://www.postgresql.org/docs/faq

--
  Bruce Momjian   bruce@momjian.us
  EnterpriseDB    http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

Re: table/index fillfactor control, try 3

From
ITAGAKI Takahiro
Date:
Bruce Momjian <bruce@momjian.us> wrote:

> Patch applied.  Thanks.

Thank you for applying, but sorry, my patch has some incompletions.
  1. A debug code is left. Assert() and if-test are redundant.
  2. Add a comment on the average FSM request size. Now, the size
     contains not only the size of tuples, but also freespace on pages.

Especially, there may be a room to discuss on 2; it changed the meaning of
'average request size'. If it is enough only to add a comment, please apply
the following fixes.



diff -cpr pgsql-orig/src/backend/access/heap/hio.c pgsql/src/backend/access/heap/hio.c
*** pgsql-orig/src/backend/access/heap/hio.c    Mon Jul  3 09:22:49 2006
--- pgsql/src/backend/access/heap/hio.c    Mon Jul  3 10:22:40 2006
*************** RelationGetBufferForTuple(Relation relat
*** 108,115 ****
                  otherBlock;
      bool        needLock;

-     if (relation->rd_options == NULL)
-         elog(ERROR, "RelationGetBufferForTuple %s IS NULL", RelationGetRelationName(relation));
      Assert(relation->rd_options != NULL);

      len = MAXALIGN(len);        /* be conservative */
--- 108,113 ----
diff -cpr pgsql-orig/src/backend/storage/freespace/freespace.c pgsql/src/backend/storage/freespace/freespace.c
*** pgsql-orig/src/backend/storage/freespace/freespace.c    Mon Jul  3 09:22:50 2006
--- pgsql/src/backend/storage/freespace/freespace.c    Mon Jul  3 10:30:26 2006
*************** RecordAndGetPageWithFreeSpace(RelFileNod
*** 341,346 ****
--- 341,348 ----
  /*
   * GetAvgFSMRequestSize - get average FSM request size for a relation.
   *
+  * Retuened value is the average of item size plus freespace specified
+  * by fillfactor.
   * If the relation is not known to FSM, return a default value.
   */
  Size

Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center



Re: table/index fillfactor control, try 3

From
Tom Lane
Date:
ITAGAKI Takahiro <itagaki.takahiro@oss.ntt.co.jp> writes:
> Thank you for applying, but sorry, my patch has some incompletions.

I'm busy reviewing/reworking this patch, and will deal with these items.

>   2. Add a comment on the average FSM request size. Now, the size
>      contains not only the size of tuples, but also freespace on pages.

Yeah, I noticed this and thought it was probably a pretty bad idea:
it plays hob with the notion of tracking a moving average of freespace
requests.  I'm not sure what else to do though.  Do we want the FSM to
explicitly account for fillfactor, and if so how exactly?  There's
certainly no point in returning a page that doesn't have space for the
fillfactor padding.

            regards, tom lane