Thread: generic LONG VARLENA

generic LONG VARLENA

From
wieck@debis.com (Jan Wieck)
Date:
Well,

    first  I  want  to  summarize  some details, to see if we all
    agree so far in the discussion.

    - The implementation should be generic for all variable  size
      types, but can be enabled/disabled per type.

    - Large values are moved out of the main tuple until it  fits
      a yet to be defined size.

    - The moved off values  are  kept  in  another  relation  per
      table,  using  regular tuples where the value is split into
      chunks. The new "expansion" relations get another  relkind,
      so  they  can  be  hidden  from the user and the system can
      easily identify them as such.

    - The type specific functions call a central support function
      to  get the usual VARLENA format, which is taken from a LRU
      cache or fetched from  the  extension  relation.  They  are
      responsible  for freeing the memory after they're done with
      the value. Some macros should make  it  fairly  simple  to
      handle.
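
A minimal sketch of the chunking described in the third point above, assuming a fixed chunk payload size. The names LONG_CHUNK_SIZE, long_chunk_count and long_chunk_bounds are hypothetical and only illustrate how a moved-off value could map onto regular tuples; the thread does not fix a chunk size or a tuple layout.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical chunk payload size; the thread leaves the real size open. */
#define LONG_CHUNK_SIZE 2000

/* Number of chunk tuples needed to hold a value of `len` bytes. */
static size_t
long_chunk_count(size_t len)
{
    return (len + LONG_CHUNK_SIZE - 1) / LONG_CHUNK_SIZE;
}

/*
 * Byte range covered by chunk number `seqno` (0-based) of a value of `len`
 * bytes.  Each chunk would be stored as one regular tuple, for example
 * (value id, seqno, data), in the per-table expansion relation.
 */
static void
long_chunk_bounds(size_t len, size_t seqno, size_t *offset, size_t *size)
{
    assert(seqno < long_chunk_count(len));
    *offset = seqno * LONG_CHUNK_SIZE;
    *size = (len - *offset < LONG_CHUNK_SIZE) ? len - *offset : LONG_CHUNK_SIZE;
}
```

Reassembling the usual VARLENA format would then mean reading the chunks in seqno order and copying each one to its offset.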

    I  don't  think  it  is  a  good idea to create the expansion
    relation all the time. Some keyword in CREATE  TABLE,  and/or
    another ALTER TABLE should do it instead, so the DB admin can
    activate the LONG feature on a per-table basis as needed.  In
    the   first  implementation  there  will  be  no  command  to
    deactivate it again. The workaround is to  rename  the  table
    and SELECT INTO, as usual.

    Also  I  would  like to say that system relations cannot have
    expansion relations.  At  least  not  until  we  have  enough
    experience with this stuff.

    Is  that  now  what we initially want to give a try? If so, I
    would like to start soon to get the generic part ready  ASAP.
    Others  could  then  join  in  and  contribute by adding LONG
    support for all the VARLENA data types we have.

    Would really be a big leap if we can get this finished for  a
    reasonable number of VARLENA types by February 1.


Jan

--

#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#========================================= wieck@debis.com (Jan Wieck) #

Re: [HACKERS] generic LONG VARLENA

From
Tom Lane
Date:
wieck@debis.com (Jan Wieck) writes:
>     first  I  want  to  summarize  some details, to see if we all
>     agree so far in the discussion.

I snipped everything I agreed with ;-)

>     - The implementation should be generic for all variable  size
>       types, but can be enabled/disabled per type.
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Per-type control doesn't strike me as interesting or useful.  If there
needs to be a control at all, which I doubt, per-table would be the
way to go.  But how many users will really say, "Oh yes, I *want* the
thing to fail if my tuple's too big!"?  I say: make it automatically
apply whenever needed, don't force users to think about it.

>     - The type specific functions call a central support function
>       to  get the usual VARLENA format, which is taken from a LRU
>       cache or fetched from  the  extension  relation.  They  are
>       responsible  for freeing the memory after they're done with
>       the value.

If we are going to do this, we ought also think about solving the
generic memory-leakage problem at the same time.  No point in having
to revisit all the same code later to deal with that issue.

>     I  don't  think  it  is  a  good idea to create the expansion
>     relation all the time. Some keyword in CREATE  TABLE,  and/or
>     another ALTER TABLE should do it instead, so the DB admin can
>     activate the LONG feature on a per table base as  needed.

I don't believe it.  See above: people will complain that it's a bug
that the system doesn't handle their long data values.  Saying "oh, you
have to turn it on" will not appease them.  My objection is really the
same as for the specialized LONG datatype: I do *not* want people to
have to put nonstandard junk into their database schema declarations
in order to activate this feature.  I think it should Just Work and
stay out of users' faces.

Creating the expansion relation isn't that big a deal, but if you
don't want to do it always, why not do it on first use?

>     Also  I  would  like to say that system relations cannot have
>     expansion relations.  At  least  not  until  we  have  enough
>     experience with this stuff.

I'd really, really, really like to have this work for rules, though.
Why shouldn't we allow it for system relations?  Most of the critical
ones have fixed-width tuples anyway, so it won't matter for them.

BTW, it strikes me we should drop the "lztext" special datatype, and
instead have compression automatically applied to any varlena that
we are contemplating putting out-of-line.  (If we're really lucky,
that saves us having to put the value out-of-line!)

>     Is  that  now  what we initially want to give a try? If so, I
>     would like to start soon to get the generic part ready  ASAP.
>     Others  could  then  join  in  and  contribute by adding LONG
>     support for all the VARLENA data types we have.

Yes, if we don't do it inside fastgetattr then there's a lot of code
that will have to change.

>     Would really be a big leap if we can get this finished for  a
>     reasonable number of VARLENA types by February 1.

The more I think about this the more I think that it's a bad, bad idea
to try to have it ready by Feb 1.  There's not really enough time to
get it right and test it.  I don't want to be putting out an unstable
release, and that's what I'm afraid we'll have if we try to rush in
such a major change as this.  Particularly when we have nontrivial
amounts of unfinished business elsewhere that we shouldn't neglect.
(Jan, do you really think you can make this happen *and* bring foreign
keys to a finished status before February?  If you are going to leave
stuff undone in foreign keys, I think you are making the wrong choice.)

Furthermore, we can save ourselves some time if we tackle this change
in combination with the fmgr revision and the memory-leak-elimination
issue.  We will be touching all the same per-data-type code for each
of these issues, so why not touch it once instead of several times?

In short, I like this design but I think we should plan it for 7.1.
        regards, tom lane


Re: [HACKERS] generic LONG VARLENA

From
Bruce Momjian
Date:
> wieck@debis.com (Jan Wieck) writes:
> >     first  I  want  to  summarize  some details, to see if we all
> >     agree so far in the discussion.
> 
> I snipped everything I agreed with ;-)
> 
> >     - The implementation should be generic for all variable  size
> >       types, but can be enabled/disabled per type.
>                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> 
> Per-type control doesn't strike me as interesting or useful.  If there
> needs to be a control at all, which I doubt, per-table would be the
> way to go.  But how many users will really say, "Oh yes, I *want* the
> thing to fail if my tuple's too big!"?  I say: make it automatically
> apply whenever needed, don't force users to think about it.

Agreed.  Who wouldn't want it?

> 
> >     - The type specific functions call a central support function
> >       to  get the usual VARLENA format, which is taken from a LRU
> >       cache or fetched from  the  extension  relation.  They  are
> >       responsible  for freeing the memory after they're done with
> >       the value.
> 
> If we are going to do this, we ought also think about solving the
> generic memory-leakage problem at the same time.  No point in having
> to revisit all the same code later to deal with that issue.

I have a good fix for this.  My patch suggested the varlena routine
pfree the pointer returned from expand_long().  No need for that.  With
an LRU cache, we can have the cache itself free the old values.  This
would be a nice optimization.  Just add the lines below:

+ 
+     if (VARISLONG(vlena)) /* checks long bit */
+         vlena = expand_long(vlena); /* returns palloc long */
+ 

There aren't any cases where the varlena access routines access more
than two varlena values at the same time.  If the expansion cache is at
least two values, you can just expand it and return memory.  When that
cache entry is expired, the memory is freed.  Wow, this makes the
varlena changes very compact.  All the action is in expand_long().

Basically, don't have the access routines free the memory, have the old
cache entries be pfreed.
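
A rough sketch of that idea, assuming a small fixed-size cache keyed by a hypothetical value id. It uses malloc/free in place of palloc/pfree so it stands alone, and expand_cache_get, demo_fetch and the slot count are illustrative names, not part of any patch. The point is only that the old entry is freed when its slot is recycled, so callers never free anything.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

#define EXPAND_CACHE_SLOTS 4    /* hypothetical size */

typedef struct
{
    unsigned long value_id;     /* hypothetical id of the long value */
    char       *data;           /* expanded copy, owned by the cache */
    unsigned long last_used;    /* logical clock for LRU ordering */
} ExpandCacheEntry;

static ExpandCacheEntry expand_cache[EXPAND_CACHE_SLOTS];
static unsigned long expand_clock = 0;

/*
 * Return the expanded value for value_id.  On a miss, the least recently
 * used slot is recycled and its old value is freed here.  `fetch` stands in
 * for reading the expansion relation and must return malloc'ed memory.
 */
static char *
expand_cache_get(unsigned long value_id, char *(*fetch) (unsigned long))
{
    int         victim = 0;
    int         i;

    assert(fetch != NULL);
    for (i = 0; i < EXPAND_CACHE_SLOTS; i++)
    {
        if (expand_cache[i].data != NULL && expand_cache[i].value_id == value_id)
        {
            expand_cache[i].last_used = ++expand_clock;
            return expand_cache[i].data;
        }
        if (expand_cache[i].last_used < expand_cache[victim].last_used)
            victim = i;
    }

    free(expand_cache[victim].data);    /* free(NULL) is a no-op */
    expand_cache[victim].value_id = value_id;
    expand_cache[victim].data = fetch(value_id);
    expand_cache[victim].last_used = ++expand_clock;
    return expand_cache[victim].data;
}

/* Stand-in fetch function for demonstration: builds a short string. */
static char *
demo_fetch(unsigned long value_id)
{
    char       *buf = malloc(32);

    if (buf != NULL)
        snprintf(buf, 32, "value-%lu", value_id);
    return buf;
}
```

A pointer returned by expand_cache_get() is only safe until enough other values have been fetched to recycle its slot, which is why the number of slots matters.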

> 
> >     I  don't  think  it  is  a  good idea to create the expansion
> >     relation all the time. Some keyword in CREATE  TABLE,  and/or
> >     another ALTER TABLE should do it instead, so the DB admin can
> >     activate the LONG feature on a per table base as  needed.
> 
> I don't believe it.  See above: people will complain that it's a bug
> that the system doesn't handle their long data values.  Saying "oh, you
> have to turn it on" will not appease them.  My objection is really the
> same as for the specialized LONG datatype: I do *not* want people to
> have to put nonstandard junk into their database schema declarations
> in order to activate this feature.  I think it should Just Work and
> stay out of users' faces.

> Creating the expansion relation isn't that big a deal, but if you
> don't want to do it always, why not do it on first use?
> 

Yes, why not just create it the first time it is needed.  Seems pretty
small performance-wise.

> >     Also  I  would  like to say that system relations cannot have
> >     expansion relations.  At  least  not  until  we  have  enough
> >     experience with this stuff.
> 
> I'd really, really, really like to have this work for rules, though.
> Why shouldn't we allow it for system relations?  Most of the critical
> ones have fixed-width tuples anyway, so it won't matter for them.

Oh, that's a good point.  Seems that is a big reason for expansion of
types.

> 
> BTW, it strikes me we should drop the "lztext" special datatype, and
> instead have compression automatically applied to any varlena that
> we are contemplating putting out-of-line.  (If we're really lucky,
> that saves us having to put the value out-of-line!)

Ooh, very smart.  You would need another bit to say whether the varlena
is compressed or not.  If you take it from 4-byte header, we are down to
a 1 GB length limit.  You could do all the compression/decompression in
the two expansion functions, though compressing and then not using the
long_ table would be a little tricky to code, but do-able.  You would
compress, then if still too large, move to long table.
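
To make the arithmetic concrete, here is a sketch assuming both flags are taken from the top of the 4-byte length word, which leaves 30 bits for the length (just under 1 GB). The flag names, the mask and choose_storage are hypothetical; the real bit layout is not decided anywhere in this thread.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layout: two high bits of the 4-byte length word are flags. */
#define VARFLAG_LONG        UINT32_C(0x80000000)    /* value lives out of line */
#define VARFLAG_COMPRESSED  UINT32_C(0x40000000)    /* payload is compressed */
#define VARLEN_MASK         UINT32_C(0x3FFFFFFF)    /* 30 bits: just under 1 GB */

typedef enum
{
    STORE_PLAIN,
    STORE_COMPRESSED,
    STORE_LONG_TABLE
} StoreChoice;

/*
 * Decide how to store a value: try compression first, and move the value to
 * the long table only if the compressed form still exceeds the in-tuple
 * limit.  compressed_len is what a compressor reported for the value.
 */
static StoreChoice
choose_storage(uint32_t raw_len, uint32_t compressed_len, uint32_t inline_limit)
{
    if (raw_len <= inline_limit)
        return STORE_PLAIN;
    if (compressed_len <= inline_limit)
        return STORE_COMPRESSED;
    return STORE_LONG_TABLE;
}

/* Build a header word from a length and flag bits. */
static uint32_t
make_header(uint32_t len, uint32_t flags)
{
    assert((len & ~VARLEN_MASK) == 0);
    return len | flags;
}
```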



> 
> >     Is  that  now  what we initially want to give a try? If so, I
> >     would like to start soon to get the generic part ready  ASAP.
> >     Others  could  then  join  in  and  contribute by adding LONG
> >     support for all the VARLENA data types we have.
> 
> Yes, if we don't do it inside fastgetattr then there's a lot of code
> that will have to change.

See above.  It looks like only a few lines per function now.  If we do
it in fastgetattr, is there less code to change?  How do we pfree()?

> 
> >     Would really be a big leap if we can get this finished for  a
> >     reasonable number of VARLENA types by February 1.
> 
> The more I think about this the more I think that it's a bad, bad idea
> to try to have it ready by Feb 1.  There's not really enough time to
> get it right and test it.  I don't want to be putting out an unstable
> release, and that's what I'm afraid we'll have if we try to rush in
> such a major change as this.  Particularly when we have nontrivial
> amounts of unfinished business elsewhere that we shouldn't neglect.
> (Jan, do you really think you can make this happen *and* bring foreign
> keys to a finished status before February?  If you are going to leave
> stuff undone in foreign keys, I think you are making the wrong choice.)
> 
> Furthermore, we can save ourselves some time if we tackle this change
> in combination with the fmgr revision and the memory-leak-elimination
> issue.  We will be touching all the same per-data-type code for each
> of these issues, so why not touch it once instead of several times?
> 
> In short, I like this design but I think we should plan it for 7.1.

Not sure on this one.  Jan will have to comment.

I am excited about the long data type.  This is _the_ way to do long
data types.  Have any of the commercial databases figured out this way
to do it.  I can't imagine a better system.

--
  Bruce Momjian                        |  http://www.op.net/~candle
  maillist@candle.pha.pa.us            |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
 


Re: [HACKERS] generic LONG VARLENA

From
wieck@debis.com (Jan Wieck)
Date:
Tom Lane wrote:

> wieck@debis.com (Jan Wieck) writes:
>
> Per-type control doesn't strike me as interesting or useful.  If there
> needs to be a control at all, which I doubt, per-table would be the

    Isn't   intended  to  be  a  runtime  configuration.  Just  a
    temporary feature to restrict  the  attributes  that  can  be
    moved  off  to  those  types,  where  WE  know  that  the adt
    functions are prepared for  them.  If  we  finally  have  all
    builtin types finished for LONG handling, it will be removed,
    making user defined types LONGable too.

> >     I  don't  think  it  is  a  good idea to create the expansion
> >     relation all the time. Some keyword in CREATE  TABLE,  and/or
> >     another ALTER TABLE should do it instead, so the DB admin can
> >     activate the LONG feature on a per table base as  needed.
>
> I don't believe it.  See above: people will complain that it's a bug
> that the system doesn't handle their long data values.  Saying "oh, you
> have to turn it on" will not appease them.  My objection is really the
> same as for the specialized LONG datatype: I do *not* want people to
> have to put nonstandard junk into their database schema declarations
> in order to activate this feature.  I think it should Just Work and
> stay out of users' faces.
>
> Creating the expansion relation isn't that big a deal, but if you
> don't want to do it always, why not do it on first use?

    So  you  want  to  do   a   heap_create_with_catalog()   plus
    index_create()'s    from    inside   the   heap_insert()   or
    heap_update(). Cannot be done  from  anywhere  else,  because
    that's  the point where we recognize the need.  I don't think
    that's a good idea. What would happen if

      Xact 1 needs expansion relation and creates it.
      Xact 2 needs expansion relation too and uses that one
      Xact 1 aborts
      Xact 2 commits

    Better to put out an explanative error message if  tuple  too
    big  and  no  expansion  relation  exists,  than dealing with
    trouble when autocreating it. If it later turns out  that  it
    can  safely  work  as an automated process, we can do it in a
    subsequent release.

> >     Also  I  would  like to say that system relations cannot have
> >     expansion relations.  At  least  not  until  we  have  enough
> >     experience with this stuff.
>
> I'd really, really, really like to have this work for rules, though.
> Why shouldn't we allow it for system relations?  Most of the critical
> ones have fixed-width tuples anyway, so it won't matter for them.

    Me too, and for function source text again.

    But this time, you  include  the  syscache  into  the  entire
    approach too.

> BTW, it strikes me we should drop the "lztext" special datatype, and
> instead have compression automatically applied to any varlena that
> we are contemplating putting out-of-line.  (If we're really lucky,
> that saves us having to put the value out-of-line!)

    Nice   idea,   and  should  be  technically  easy  since  the
    compressor itself is separated from the lztext type. OTOH the
    user  then  will  have no choice to prevent compression tries
    for performance reasons.

    So this feature again is something that IMHO should go into a
    configurable option.

> >     Is  that  now  what we initially want to give a try? If so, I
> >     would like to start soon to get the generic part ready  ASAP.
> >     Others  could  then  join  in  and  contribute by adding LONG
> >     support for all the VARLENA data types we have.
>
> Yes, if we don't do it inside fastgetattr then there's a lot of code
> that will have to change.

    That's  why  I'd  like  a  small  number of types involved at
    first.

    And there we're back  on  the  "release  what  we  have  now"
    discussion  again.  Some like to get new functionality out in
    a couple of smaller steps rather than doing the  big  all-in-one
    roll.  Some don't.  Seems we can never get a consensus on that
    :-(


Jan


Re: [HACKERS] generic LONG VARLENA

From
Tom Lane
Date:
Bruce Momjian <pgman@candle.pha.pa.us> writes:
>> If we are going to do this, we ought also think about solving the
>> generic memory-leakage problem at the same time.  No point in having
>> to revisit all the same code later to deal with that issue.

> I have a good fix for this.  My patch suggested the varlena routine
> pfree the pointer returned from expand_long().  No need for that.  With
> an LRU cache, we can have the cache itself free the old values.

Oooh, that's a thought.  Sort of like applying TupleTableSlot to
individual datum values.

> There aren't any cases where the varlena access routines access more
> than two varlena values at the same time.

Huh?  The standard operators on varlena types access at least three (two
inputs and a result), and multi-argument functions could access more.
Also think about functions written in PLs: they could invoke a large
amount of computation, and would still expect to be able to access their
original input arguments.

I'd feel more comfortable with explicit reference counting.  Perhaps
we could make an exception for function return values: the cache
guarantees to hold onto a function return value for a little while
even though no one is holding a refcount on it at the instant it's
returned.  Functions (including PL functions) that want to access
varlena values across any significant amount of computation would
have to bump the refcount on those values somehow.
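
A minimal sketch of what explicit reference counting on cached values might look like, under the assumption that each expanded value has a cache entry with a counter. DatumCacheEntry, datum_pin, datum_unpin and datum_try_evict are hypothetical names, malloc/free stand in for palloc/pfree, and the grace period for function return values is not modeled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical cache entry for one expanded varlena value. */
typedef struct
{
    void       *data;           /* expanded value, owned by the cache */
    int         refcount;       /* number of callers currently relying on it */
} DatumCacheEntry;

/* Caller intends to keep using the value across further computation. */
static void
datum_pin(DatumCacheEntry *entry)
{
    entry->refcount++;
}

/* Caller is done with the value. */
static void
datum_unpin(DatumCacheEntry *entry)
{
    assert(entry->refcount > 0);
    entry->refcount--;
}

/*
 * Try to evict an entry.  Pinned entries are left alone; only entries with
 * no outstanding references are freed.  Returns true if the entry was freed.
 */
static bool
datum_try_evict(DatumCacheEntry *entry)
{
    if (entry->refcount > 0)
        return false;
    free(entry->data);
    entry->data = NULL;
    return true;
}
```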

> I am excited about the long data type.  This is _the_ way to do long
> data types.  Have any of the commercial databases figured out this way
> to do it.  I can't imagine a better system.

I think we are working on some really cool ideas here.  But I *don't*
think we have a solid enough hold on all the details that we can expect
to implement it and ship it out one-two-three.  Thus my feeling that
this is for 7.1 not 7.0...
        regards, tom lane


Re: [HACKERS] generic LONG VARLENA

From
Tom Lane
Date:
wieck@debis.com (Jan Wieck) writes:
> Tom Lane wrote:
>> Per-type control doesn't strike me as interesting or useful.

>     Isn't   intended  to  be  a  runtime  configuration.  Just  a
>     temporary feature to restrict  the  attributes  that  can  be
>     moved  off  to  those  types,  where  WE  know  that  the adt
>     functions are prepared for  them.

Oh, I see.  Yeah, if we wanted to make an interim release where only
some datatypes were ready for long values, that would be a necessary
safety measure.  But I'd rather plan on just getting it done in one
release.

>> BTW, it strikes me we should drop the "lztext" special datatype, and
>> instead have compression automatically applied to any varlena that
>> we are contemplating putting out-of-line.  (If we're really lucky,
>> that saves us having to put the value out-of-line!)

>     Nice   idea,   and  should  be  technically  easy  since  the
>     compressor itself is separated from the lztext type. OTOH the
>     user  then  will  have no choice to prevent compression tries
>     for performance reasons.
>     So this feature again is something that IMHO should go into a
>     configurable option.

Good point.  You're right, there should be a per-datatype "don't
bother to try to compress this type" flag.  (Is per-datatype the
right granularity?)
        regards, tom lane


Re: [HACKERS] generic LONG VARLENA

From
wieck@debis.com (Jan Wieck)
Date:
Tom Lane wrote:

> wieck@debis.com (Jan Wieck) writes:
> >     Nice   idea,   and  should  be  technically  easy  since  the
> >     compressor itself is separated from the lztext type. OTOH the
> >     user  then  will  have no choice to prevent compression tries
> >     for performance reasons.
> >     So this feature again is something that IMHO should go into a
> >     configurable option.
>
> Good point.  You're right, there should be a per-datatype "don't
> bother to try to compress this type" flag.  (Is per-datatype the
> right granularity?)

    Per column!


Jan


Re: [HACKERS] generic LONG VARLENA

From
wieck@debis.com (Jan Wieck)
Date:
Tom Lane asked:

> (Jan, do you really think you can make this happen *and* bring foreign
> keys to a finished status before February?  If you are going to leave
> stuff undone in foreign keys, I think you are making the wrong choice.)

    Except  for  the  file  buffering of the trigger event queue,
    FOREIGN KEY is completely implemented as  I  proposed,  MATCH
    FULL.

    Thus I HAVE the time.


Jan


Re: [HACKERS] generic LONG VARLENA

From
Bruce Momjian
Date:
> Tom Lane asked:
> 
> > (Jan, do you really think you can make this happen *and* bring foreign
> > keys to a finished status before February?  If you are going to leave
> > stuff undone in foreign keys, I think you are making the wrong choice.)
> 
>     Except  for  the  file  buffering of the trigger event queue,
>     FOREIGN KEY is completely implemented as  I  proposed,  MATCH
>     FULL.
> 
>     Thus I HAVE the time.

Well, this is very good news.  Jan, aren't you going to bed?

 


Re: [HACKERS] generic LONG VARLENA

From
Bruce Momjian
Date:
> Bruce Momjian <pgman@candle.pha.pa.us> writes:
> >> If we are going to do this, we ought also think about solving the
> >> generic memory-leakage problem at the same time.  No point in having
> >> to revisit all the same code later to deal with that issue.
> 
> > I have a good fix for this.  My patch suggested the varlena routine
> > pfree the pointer returned from expand_long().  No need for that.  With
> > an LRU cache, we can have the cache itself free the old values.
> 
> Oooh, that's a thought.  Sort of like applying TupleTableSlot to
> individual datum values.
> 
> > There aren't any cases where the varlena access routines access more
> > than two varlena values at the same time.
> 
> Huh?  The standard operators on varlena types access at least three (two
> inputs and a result), and multi-argument functions could access more.
> Also think about functions written in PLs: they could invoke a large
> amount of computation, and would still expect to be able to access their
> original input arguments.
> 
> I'd feel more comfortable with explicit reference counting.  Perhaps
> we could make an exception for function return values: the cache
> guarantees to hold onto a function return value for a little while
> even though no one is holding a refcount on it at the instant it's
> returned.  Functions (including PL functions) that want to access
> varlena values across any significant amount of computation would
> have to bump the refcount on those values somehow.

I just checked the code, and I don't see any places where a varlena is
returned that isn't palloc'ed inside the function, so the cache memory
never makes it out of the routines.

However, I see that any reference to VARDATA could be a problem because it
assumes the data is there, and not in the long* relations.  I could
probably figure out which ones need expanding.  They are mostly system
table accesses.  The others go through adt or are output to the user.

> 
> > I am excited about the long data type.  This is _the_ way to do long
> > data types.  Have any of the commercial databases figured out this way
> > to do it.  I can't imagine a better system.
> 
> I think we are working on some really cool ideas here.  But I *don't*
> think we have a solid enough hold on all the details that we can expect
> to implement it and ship it out one-two-three.  Thus my feeling that
> this is for 7.1 not 7.0...

We have gotten pretty far in two days.  This long tuple stuff is not as
difficult as foreign key because I can actually figure out what is
happening with the long types, while foreign key is a complete mystery
to me.

 


Re: [HACKERS] generic LONG VARLENA

From
Bruce Momjian
Date:
> However, I see any reference to VARDATA could be a problem because it
> assumes the data is there, and not in the long* relations.  I could
> probably figure out which ones need expanding.  They are mostly system
> table accesses.  The others go through adt or are output to the user.

VARDATA looks tricky.  Seems I may need that cache of values.  In most
cases, VARDATA values are used within the next few lines of code, just
like system cache tuples.  If I need it for longer periods, I have to
palloc it.  Good thing most VARDATA values are used for brief periods.
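
The "palloc it if I need it longer" case could look roughly like this. It is a sketch only: keep_value is a hypothetical helper, and malloc stands in for palloc so the example is self-contained.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Copy `len` bytes of a cache-owned value into memory the caller owns.
 * The caller must free the result; the cache may recycle the original
 * at any time after this returns.
 */
static void *
keep_value(const void *cached, size_t len)
{
    void       *copy;

    assert(cached != NULL);
    copy = malloc(len > 0 ? len : 1);
    if (copy != NULL)
        memcpy(copy, cached, len);
    return copy;
}
```

Code that uses the value within the next few lines would skip the copy and read the cached pointer directly.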
