Thread: don't create storage when unnecessary

don't create storage when unnecessary

From
Alvaro Herrera
Date:
Some time ago, after partitioned indexes had been pushed, I realized
that even though I didn't want them to have relfilenodes, they did.  And
looking closer I noticed that *a lot* of relation kinds that didn't need
relfilenodes, had them anyway.

This patch fixes that; if no relfilenode is needed, it's not created.

I didn't verify pg_upgrade behavior across this commit.  Maybe something
needs tweaking there.


PS: I think it'd be worth following up with this ...
https://postgr.es/m/CAFjFpRcfzs+yst6YBCseD_orEcDNuAr9GUTraZ5GC=AvCYh55Q@mail.gmail.com

-- 
Álvaro Herrera

Attachment

Re: don't create storage when unnecessary

From
Andres Freund
Date:
Hi,

On 2018-12-06 18:55:52 -0300, Alvaro Herrera wrote:
> Some time ago, after partitioned indexes had been pushed, I realized
> that even though I didn't want them to have relfilenodes, they did.  And
> looking closer I noticed that *a lot* of relation kinds that didn't need
> relfilenodes, had them anyway.
> 
> This patch fixes that; if no relfilenode is needed, it's not created.
> 
> I didn't verify pg_upgrade behavior across this commit.  Maybe something
> needs tweaking there.

Hm, that generally sounds like a good plan. Could we back this up with
tests in misc_sanity.sql or such?

- Andres


Re: don't create storage when unnecessary

From
Michael Paquier
Date:
On Thu, Dec 06, 2018 at 06:55:52PM -0300, Alvaro Herrera wrote:
> Some time ago, after partitioned indexes had been pushed, I realized
> that even though I didn't want them to have relfilenodes, they did.  And
> looking closer I noticed that *a lot* of relation kinds that didn't need
> relfilenodes, had them anyway.
>
> This patch fixes that; if no relfilenode is needed, it's not created.
>
> I didn't verify pg_upgrade behavior across this commit.  Maybe something
> needs tweaking there.
>
> PS: I think it'd be worth following up with this ...
> https://postgr.es/m/CAFjFpRcfzs+yst6YBCseD_orEcDNuAr9GUTraZ5GC=AvCYh55Q@mail.gmail.com

A macro makes sense to control that.  Now I have to admit that I don't
like your solution.  Wouldn't it be cleaner to assign InvalidOid to
relfilenode in such cases?  The callers of heap_create would need to be
made smarter when they now pass down a relfilenode (looking at you,
DefineIndex!), but that seems way more consistent to me.  Some tests
would also be welcome.
--
Michael

Attachment

Re: don't create storage when unnecessary

From
Alvaro Herrera
Date:
On 2018-Dec-07, Michael Paquier wrote:

> A macro makes sense to control that.

I added Ashutosh's RELKIND_HAS_STORAGE, but renamed it to
RELKIND_CAN_HAVE_STORAGE, because some of the relkinds can be mapped and
thus would have relfilenode set to 0.  I think this is a bit misleading
either way.

> Now I have to admit that I don't
> like your solution.  Wouldn't it be cleaner to assign InvalidOid to
> relfilenode in such cases?  The callers of heap_create would need to be
> made smarter when they now pass down a relfilenode (looking at you,
> DefineIndex!), but that seems way more consistent to me.

I don't follow.  When a relfilenode is passed by callers, they indicate
that the storage has already been created.  Contrariwise, when a
relation kind that *does* have storage but is not yet created, they
pass InvalidOid as relfilenode, and heap_create is in charge of creating
storage.  Maybe I'm not quite seeing what problem you mean.  Or I could
add a separate boolean, but that seems pointless.

Another possible improvement is to remove the create_storage boolean 

> Some tests would also be welcome.

Added a test in sanity_check.sql that there's no relation with the
relkinds that aren't supposed to have storage.  Without the code fix it
fails in current regression database, but in the failure result set
there isn't any relation of kinds 'p' or 'I', so this isn't a terribly
comprehensive test -- the query runs too early in the regression
sequence.  I'm not sure it's worth bothering further.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment

Re: don't create storage when unnecessary

From
Alvaro Herrera
Date:
Rebased over today's conflicting commits.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment

Re: don't create storage when unnecessary

From
Kyotaro HORIGUCHI
Date:
Hello.

At Sun, 16 Dec 2018 17:47:16 -0300, Alvaro Herrera <alvherre@2ndquadrant.com> wrote in
<20181216204716.jddzijk3fb2dpdk4@alvherre.pgsql>
> On 2018-Dec-07, Michael Paquier wrote:
> 
> > A macro makes sense to control that.
> 
> I added Ashutosh's RELKIND_HAS_STORAGE, but renamed it to
> RELKIND_CAN_HAVE_STORAGE, because some of the relkinds can be mapped and
> thus would have relfilenode set to 0.  I think this is a bit misleading
> either way.

FWIW.. I RELKIND_HAS_STORAGE looks better to me.

# Since it's a bit too late, I don't insist on that.

Mapped relations have storage, which is signalled by relfilenode
= 0 and the real file node is given by relation mapper. And it is
actually created at the boostrap time. In this sense,
(RELKIND_HAS_STORAGE && relfilenode == 0) in heap_craete makes
sense.

Assertion (..->relNode != InvalidOid) at the end of
RelationInitPhysicalAddr doesn't fail. I think RELKIND_HAS_STORGE
is preferable also in this sense.

ATExecSetTableSpaceNoStorage assumes that the relation is
actually having storage.

> > Now I have to admit that I don't
> > like your solution.  Wouldn't it be cleaner to assign InvalidOid to
> > relfilenode in such cases?  The callers of heap_create would need to be
> > made smarter when they now pass down a relfilenode (looking at you,
> > DefineIndex!), but that seems way more consistent to me.
> 
> I don't follow.  When a relfilenode is passed by callers, they indicate
> that the storage has already been created.  Contrariwise, when a
> relation kind that *does* have storage but is not yet created, they
> pass InvalidOid as relfilenode, and heap_create is in charge of creating
> storage.  Maybe I'm not quite seeing what problem you mean.  Or I could
> add a separate boolean, but that seems pointless.
> 
> Another possible improvement is to remove the create_storage boolean 
> 
> > Some tests would also be welcome.
> 
> Added a test in sanity_check.sql that there's no relation with the
> relkinds that aren't supposed to have storage.  Without the code fix it
> fails in current regression database, but in the failure result set
> there isn't any relation of kinds 'p' or 'I', so this isn't a terribly
> comprehensive test -- the query runs too early in the regression
> sequence.  I'm not sure it's worth bothering further.

Actual files were not created even in the past, create_heap has
been just refactored, which looks fine. pg_class and relcache
entries no longer have false relfilenode, which looks fine. The
test should check only relfilenode won't be given to such
relation, which were given in the past, which looks fine.

The patch applies cleanly and passed the regression test.

regares.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



Re: don't create storage when unnecessary

From
Michael Paquier
Date:
On Sun, Dec 16, 2018 at 05:47:16PM -0300, Alvaro Herrera wrote:
> I don't follow.  When a relfilenode is passed by callers, they indicate
> that the storage has already been created.  Contrariwise, when a
> relation kind that *does* have storage but is not yet created, they
> pass InvalidOid as relfilenode, and heap_create is in charge of creating
> storage.  Maybe I'm not quite seeing what problem you mean.  Or I could
> add a separate boolean, but that seems pointless.

I have been double-checking my thoughts on the matter, and one take is
to allow the reuse of relfilenodes for indexes like in TryReuseIndex().
I did not recall that case.  Sorry for the noise.

> Another possible improvement is to remove the create_storage boolean

Yes, this could go away as this is linked with relfilenode.  I let it up
to you if you want to remove it or keep it.  I think that I would just
remove it.

> Added a test in sanity_check.sql that there's no relation with the
> relkinds that aren't supposed to have storage.  Without the code fix it
> fails in current regression database, but in the failure result set
> there isn't any relation of kinds 'p' or 'I', so this isn't a terribly
> comprehensive test -- the query runs too early in the regression
> sequence.  I'm not sure it's worth bothering further.

+-- check that relations without storage don't have relfilenode
It could be an idea to add a comment mentioning that the set of relkinds
in RELKIND_CAN_HAVE_STORAGE are linked with the relkinds of this query.
--
Michael

Attachment

Re: don't create storage when unnecessary

From
Amit Langote
Date:
Hi,

On 2018/12/18 14:56, Kyotaro HORIGUCHI wrote:
> Hello.
> 
> At Sun, 16 Dec 2018 17:47:16 -0300, Alvaro Herrera <alvherre@2ndquadrant.com> wrote in
<20181216204716.jddzijk3fb2dpdk4@alvherre.pgsql>
>> On 2018-Dec-07, Michael Paquier wrote:
>>
>>> A macro makes sense to control that.
>>
>> I added Ashutosh's RELKIND_HAS_STORAGE, but renamed it to
>> RELKIND_CAN_HAVE_STORAGE, because some of the relkinds can be mapped and
>> thus would have relfilenode set to 0.  I think this is a bit misleading
>> either way.
> 
> FWIW.. I RELKIND_HAS_STORAGE looks better to me.
> 
> # Since it's a bit too late, I don't insist on that.
> 
> Mapped relations have storage, which is signalled by relfilenode
> = 0 and the real file node is given by relation mapper. And it is
> actually created at the boostrap time. In this sense,
> (RELKIND_HAS_STORAGE && relfilenode == 0) in heap_craete makes
> sense.
> 
> Assertion (..->relNode != InvalidOid) at the end of
> RelationInitPhysicalAddr doesn't fail. I think RELKIND_HAS_STORGE
> is preferable also in this sense.

Sorry to be saying it late, but I have to agree with Horiguchi-san here
that RELKIND_HAS_STORAGE sounds better and is clear.  Looking at what's
committed:

/*
 * Relation kinds that have physical storage. These relations normally have
 * relfilenode set to non-zero, but it can also be zero if the relation is
 * mapped.
 */
#define RELKIND_CAN_HAVE_STORAGE(relkind) \
    ((relkind) == RELKIND_RELATION || \
     (relkind) == RELKIND_INDEX || \
     (relkind) == RELKIND_SEQUENCE || \
     (relkind) == RELKIND_TOASTVALUE || \
     (relkind) == RELKIND_MATVIEW)

So, they all *do have* storage, as the comment's first sentence also says.
 Mixing the bit about mapped relations in the naming of this macro will
make it hard to reason about using it in other parts of the backend code
(although, in admittedly not too far off places), where it shouldn't
matter if the relation's file is determined using relfilenode or via
relation to file mapper.

> ATExecSetTableSpaceNoStorage assumes that the relation is
> actually having storage.

Hmm, it has the following Assert:

  /*
   * Shouldn't be called on relations having storage; these are processed
   * in phase 3.
   */
  Assert(!RELKIND_CAN_HAVE_STORAGE(rel->rd_rel->relkind));

So, it actually assumes that it's called for relations that don't have
storage (...but do have a tablespace property that applies to its children.)

Thanks,
Amit



Re: don't create storage when unnecessary

From
Peter Eisentraut
Date:
On 18/12/2018 08:18, Amit Langote wrote:
> Sorry to be saying it late, but I have to agree with Horiguchi-san here
> that RELKIND_HAS_STORAGE sounds better and is clear.

I think I agree.  Does someone want to send a patch?

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: don't create storage when unnecessary

From
Alvaro Herrera
Date:
On 2019-Jan-04, Peter Eisentraut wrote:

> On 18/12/2018 08:18, Amit Langote wrote:
> > Sorry to be saying it late, but I have to agree with Horiguchi-san here
> > that RELKIND_HAS_STORAGE sounds better and is clear.
> 
> I think I agree.  Does someone want to send a patch?

I'll do it.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: don't create storage when unnecessary

From
Michael Paquier
Date:
On Fri, Jan 04, 2019 at 09:19:25AM -0300, Alvaro Herrera wrote:
> I'll do it.

Thanks for taking care of it.  Please note that commands/copy.c also
makes use of RELKIND_CAN_HAVE_STORAGE() as of bf491a9.  Could you
split the renaming with a commit independent on what is being
discussed on this thread?  These are separate issues in my opinion.
--
Michael

Attachment