Re: pg_reorg in core? - Mailing list pgsql-hackers

From Michael Paquier
Subject Re: pg_reorg in core?
Msg-id CAB7nPqTFhL8_eHTG=XT5Tfju_+7bewASXY5ivdVJGXQ4yBJxjA@mail.gmail.com
In response to Re: pg_reorg in core?  (Josh Kupershmidt <schmiddy@gmail.com>)
Responses Re: pg_reorg in core?  (Josh Kupershmidt <schmiddy@gmail.com>)
List pgsql-hackers


On Fri, Sep 21, 2012 at 12:07 PM, Josh Kupershmidt <schmiddy@gmail.com> wrote:
On Thu, Sep 20, 2012 at 7:05 PM, Michael Paquier
<michael.paquier@gmail.com> wrote:
> Hi all,
>
> During the last PGCon, I heard that some community members would be
> interested in having pg_reorg directly in core.

I'm actually not crazy about this idea, at least not given the current
state of pg_reorg. Right now, there are quite a few fixes and
features which remain to be merged in to cvs head, but at least we can
develop pg_reorg on a schedule independent of Postgres itself, i.e. we
can release new features more often than once a year. Perhaps when
pg_reorg is more stable, and the known bugs and missing features have
been ironed out, we could think about integrating into core.
 
It would also be great to move the project directly to GitHub to make its maintenance and development easier.
My own copy is based on, and kept in sync with, what is in pgfoundry, as I don't have admin access on pgfoundry (and honestly I don't think I can get it either), even though I am from NTT. Hey, does anyone here have admin rights?
 
Granted, a nice thing about integrating with core is we'd probably
have more of an early warning when reshuffling of PG breaks pg_reorg
(e.g. the recent splitting of the htup headers), but such changes have
been quick and easy to fix so far. 
Yes, that is also why I am proposing to integrate it into core. Maintaining it there would be faster and easier than it is now on pgfoundry. However, if hackers do not think it is worth adding to core, separate development as done now would be fine, only slower.
Also, just from looking at the extension modules in contrib, I haven't seen one that uses both a library and a binary at the same time like pg_reorg does.

> - creation of indexes on the temporary table based on what the user wishes
> - Apply the logs registered during the index creation
> - Swap the names of freshly created table and old table
> - Drop the useless objects
>
> The code is hosted on pgfoundry here: http://pgfoundry.org/projects/reorg/.
> I am also maintaining a fork in github in sync with pgfoundry here:
> https://github.com/michaelpq/pg_reorg.
>
> Just, do you guys think it is worth adding a functionality like pg_reorg in
> core or not?
>
> If yes, well I think the code of pg_reorg is going to need some
> modifications to make it more compatible with contrib modules using only
> EXTENSION.
> For the time being pg_reorg is divided into 2 parts, binary and library.
> The library part is the SQL portion of pg_reorg, containing a set of C
> functions that are called by the binary part. This has been extended to
> support CREATE EXTENSION recently.
> The binary part creates a command pg_reorg in charge of calling the set of
> functions created by the lib part, being just a wrapper of the library part
> to control the creation and deletion of the objects.
> It is also in charge of deleting the temporary objects by callback if an
> error occurs.
>
> By using the binary command, it is possible to reorganize a single table or
> a database, in this case reorganizing a database launches only a loop on
> each table of this database.
>
> My idea is to remove the binary part and to rely only on the library part to
> make pg_reorg a single extension with only system functions like other
> contrib modules.

> In order to do that what is missing is a function that could be used as an
> entry point for table reorganization, a function of the type
> pg_reorg_table(tableoid) and pg_reorg_table(tableoid, text).
> All the functionalities of pg_reorg could be reproducible:
> - pg_reorg_table(tableoid) for a VACUUM FULL reorganization
> - pg_reorg_table(tableoid, NULL) for a CLUSTER reorganization if table has a
> CLUSTER key
> - pg_reorg_table(tableoid, columnname) for a CLUSTER reorganization based on
> a wanted column.
>
> Is it worth the shot?
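
The three proposed call forms could be told apart roughly as follows. This is only a Python sketch of the dispatch logic: `plan_reorg`, `_OMITTED`, and the returned mode strings are hypothetical names, not part of pg_reorg. In PostgreSQL itself the one- and two-argument forms would presumably just be separate overloaded SQL functions.

```python
# Hypothetical sketch: how the proposed pg_reorg_table() call forms map to modes.
_OMITTED = object()  # sentinel that distinguishes "no second argument" from NULL/None


def plan_reorg(tableoid, column=_OMITTED):
    """Return (mode, ordering column) for one proposed pg_reorg_table() call."""
    if column is _OMITTED:
        # pg_reorg_table(tableoid): VACUUM FULL-style rewrite, no ordering
        return ("vacuum_full", None)
    if column is None:
        # pg_reorg_table(tableoid, NULL): CLUSTER on the table's existing cluster key
        return ("cluster_existing_key", None)
    # pg_reorg_table(tableoid, columnname): CLUSTER ordered by the wanted column
    return ("cluster_by_column", column)
```

The sentinel matters because the proposal gives an explicit NULL a different meaning from leaving the second argument out.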

I haven't seen this documented as such, but AFAICT the reason that
pg_reorg is split into a binary and set of backend functions which are
called by the binary is that pg_reorg needs to be able to control its
steps in several transactions so as to avoid holding locks
excessively. The reorg_one_table() function uses four or five
transactions per table, in fact. If all the logic currently in the
pg_reorg binary were moved into backend functions, calling
pg_reorg_table() would have to be a single transaction, and there
would be no advantage to using such a function vs. CLUSTER or VACUUM
FULL.
Of course, but features like CREATE INDEX CONCURRENTLY use multiple transactions. Wouldn't it be possible to use something similar to make the modifications visible to other backends?
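
To make the point about several transactions concrete, here is a minimal sketch of a client-side driver that commits each step separately and cleans up temporary objects if a step fails. It is not pg_reorg's code: it uses Python's sqlite3 module as a stand-in for a PostgreSQL connection, the names `run_steps`, `demo`, `t`, `t_new` and `t_old` are made up for illustration, and the trigger-based change log is left out for brevity.

```python
import sqlite3


def run_steps(conn, steps, cleanup):
    """Run each step in its own transaction; on failure, run cleanup and re-raise."""
    done = []
    try:
        for name, statements in steps:
            conn.execute("BEGIN")
            try:
                for sql in statements:
                    conn.execute(sql)
                conn.execute("COMMIT")  # locks taken by this step end here
            except Exception:
                if conn.in_transaction:
                    conn.execute("ROLLBACK")
                raise
            done.append(name)
    except Exception:
        # Counterpart of the binary's error callback: drop temporary objects.
        for sql in cleanup:
            conn.execute(sql)
        raise
    return done


def demo():
    # isolation_level=None: the module opens no implicit transactions,
    # so only the explicit BEGIN/COMMIT pairs above delimit each step.
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.execute("CREATE TABLE t (id INTEGER, v TEXT)")
    conn.executemany("INSERT INTO t VALUES (?, ?)", [(1, "c"), (2, "a"), (3, "b")])
    steps = [
        ("copy", ["CREATE TABLE t_new AS SELECT * FROM t ORDER BY v"]),
        ("index", ["CREATE INDEX t_new_v_idx ON t_new (v)"]),
        ("swap", ["ALTER TABLE t RENAME TO t_old",
                  "ALTER TABLE t_new RENAME TO t"]),
        ("drop", ["DROP TABLE t_old"]),
    ]
    done = run_steps(conn, steps, cleanup=["DROP TABLE IF EXISTS t_new"])
    rows = conn.execute("SELECT v FROM t").fetchall()
    conn.close()
    return done, rows
```

A single backend function call could not place the COMMIT boundaries this way, which is the concern raised above.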
 

Also, having a separate binary we should be able to perform some neat
tricks such as parallel index builds using multiple connections (I'm
messing around with this idea now). AFAIK this would also not be
possible if pg_reorg were contained solely in the library functions.
Interesting idea, this could accelerate the whole process. I am just wondering about possible consistency issues, such as the logs being replayed before the swap.
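
As a rough illustration of the multiple-connection idea, the sketch below runs each index build on its own connection and returns only once all of them have finished, so that log replay and the swap would start afterwards. Everything here is an assumption for illustration: `build_indexes_in_parallel` is a hypothetical name, and `connect` stands for any caller-supplied function returning a connection object with `execute()` and `close()` methods.

```python
from concurrent.futures import ThreadPoolExecutor


def build_indexes_in_parallel(connect, index_statements, max_workers=4):
    """Run each CREATE INDEX statement on its own connection; wait for all."""

    def build(sql):
        conn = connect()  # one connection per index build
        try:
            conn.execute(sql)
        finally:
            conn.close()
        return sql

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # list() consumes every result, so a failure in any worker is
        # re-raised here instead of being silently dropped.
        return list(pool.map(build, index_statements))
```

Waiting for every build before continuing is the simplest way to keep the ordering obvious; whether that is enough to rule out the consistency issues mentioned above is a separate question.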
--
Michael Paquier
http://michael.otacoo.com