Thread: generic LONG VARLENA
Well, first I want to summarize some details, to see if we all
agree so far in the discussion.

- The implementation should be generic for all variable size
  types, but can be enabled/disabled per type.

- Large values are moved out of the main tuple until it fits
  a yet to be defined size.

- The moved off values are kept in another relation per table,
  using regular tuples where the value is split into chunks.
  The new "expansion" relations get another relkind, so they
  can be hidden from the user and the system can easily
  identify them as such.

- The type specific functions call a central support function
  to get the usual VARLENA format, which is taken from a LRU
  cache or fetched from the extension relation. They are
  responsible for freeing the memory after they're done with
  the value. Some macros should make it fairly simple to
  handle.

I don't think it is a good idea to create the expansion
relation all the time. Some keyword in CREATE TABLE, and/or
another ALTER TABLE should do it instead, so the DB admin can
activate the LONG feature on a per table base as needed.

In the first implementation there will be no command to
deactivate it again. Workaround is rename table and select
into as usual.

Also I would like to say that system relations cannot have
expansion relations. At least not until we have enough
experience with this stuff.

Is that now what we initially want to give a try? If so, I
would like to start soon to get the generic part ready ASAP.
Others could then join in and contribute by adding LONG
support for all the VARLENA data types we have.

Would really be a big leap if we can get this finished for a
reasonable number of VARLENA types by February 1.

Jan

--

#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#========================================= wieck@debis.com (Jan Wieck) #
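For readers following along, the "split into chunks" idea from the summary above can be sketched roughly as below. This is only an illustration under stated assumptions: the chunk size, the `LongChunk` layout and the function names are hypothetical and are not the proposed on-disk format or any existing backend API.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define CHUNK_SIZE 4    /* hypothetical; tiny so the example is easy to follow */

/* One row of a hypothetical expansion relation: (value id, chunk number, data). */
typedef struct LongChunk
{
    unsigned int value_id;
    unsigned int chunk_no;
    size_t       len;               /* bytes used in data[] */
    char         data[CHUNK_SIZE];
} LongChunk;

/* Number of chunks needed for a value of 'len' bytes. */
static size_t
chunk_count(size_t len)
{
    return (len + CHUNK_SIZE - 1) / CHUNK_SIZE;
}

/* Split 'value' into chunks; the caller frees the returned array. */
static LongChunk *
split_into_chunks(unsigned int value_id, const char *value, size_t len,
                  size_t *nchunks)
{
    size_t      n = chunk_count(len);
    LongChunk  *chunks = malloc((n ? n : 1) * sizeof(LongChunk));
    size_t      i;

    assert(chunks != NULL);
    for (i = 0; i < n; i++)
    {
        size_t off = i * CHUNK_SIZE;
        size_t clen = (len - off < CHUNK_SIZE) ? len - off : CHUNK_SIZE;

        chunks[i].value_id = value_id;
        chunks[i].chunk_no = (unsigned int) i;
        chunks[i].len = clen;
        memcpy(chunks[i].data, value + off, clen);
    }
    *nchunks = n;
    return chunks;
}

/* Reassemble chunks (in chunk_no order) into 'out'; returns the total length. */
static size_t
join_chunks(const LongChunk *chunks, size_t nchunks, char *out)
{
    size_t total = 0;
    size_t i;

    for (i = 0; i < nchunks; i++)
    {
        memcpy(out + total, chunks[i].data, chunks[i].len);
        total += chunks[i].len;
    }
    return total;
}
```

In a real implementation each chunk would be an ordinary heap tuple in the expansion relation, which is what lets the existing storage code be reused.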
wieck@debis.com (Jan Wieck) writes:
> first I want to summarize some details, to see if we all
> agree so far in the discussion.

I snipped everything I agreed with ;-)

> - The implementation should be generic for all variable size
>   types, but can be enabled/disabled per type.
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Per-type control doesn't strike me as interesting or useful. If there
needs to be a control at all, which I doubt, per-table would be the
way to go. But how many users will really say, "Oh yes, I *want* the
thing to fail if my tuple's too big!"? I say: make it automatically
apply whenever needed, don't force users to think about it.

> - The type specific functions call a central support function
>   to get the usual VARLENA format, which is taken from a LRU
>   cache or fetched from the extension relation. They are
>   responsible for freeing the memory after they're done with
>   the value.

If we are going to do this, we ought also think about solving the
generic memory-leakage problem at the same time. No point in having
to revisit all the same code later to deal with that issue.

> I don't think it is a good idea to create the expansion
> relation all the time. Some keyword in CREATE TABLE, and/or
> another ALTER TABLE should do it instead, so the DB admin can
> activate the LONG feature on a per table base as needed.

I don't believe it. See above: people will complain that it's a bug
that the system doesn't handle their long data values. Saying "oh, you
have to turn it on" will not appease them. My objection is really the
same as for the specialized LONG datatype: I do *not* want people to
have to put nonstandard junk into their database schema declarations
in order to activate this feature. I think it should Just Work and
stay out of users' faces.

Creating the expansion relation isn't that big a deal, but if you
don't want to do it always, why not do it on first use?

> Also I would like to say that system relations cannot have
> expansion relations. At least not until we have enough
> experience with this stuff.

I'd really, really, really like to have this work for rules, though.
Why shouldn't we allow it for system relations? Most of the critical
ones have fixed-width tuples anyway, so it won't matter for them.

BTW, it strikes me we should drop the "lztext" special datatype, and
instead have compression automatically applied to any varlena that
we are contemplating putting out-of-line. (If we're really lucky,
that saves us having to put the value out-of-line!)

> Is that now what we initially want to give a try? If so, I
> would like to start soon to get the generic part ready ASAP.
> Others could then join in and contribute by adding LONG
> support for all the VARLENA data types we have.

Yes, if we don't do it inside fastgetattr then there's a lot of code
that will have to change.

> Would really be a big leap if we can get this finished for a
> reasonable number of VARLENA types by February 1.

The more I think about this the more I think that it's a bad, bad idea
to try to have it ready by Feb 1. There's not really enough time to
get it right and test it. I don't want to be putting out an unstable
release, and that's what I'm afraid we'll have if we try to rush in
such a major change as this. Particularly when we have nontrivial
amounts of unfinished business elsewhere that we shouldn't neglect.
(Jan, do you really think you can make this happen *and* bring foreign
keys to a finished status before February? If you are going to leave
stuff undone in foreign keys, I think you are making the wrong choice.)

Furthermore, we can save ourselves some time if we tackle this change
in combination with the fmgr revision and the memory-leak-elimination
issue. We will be touching all the same per-data-type code for each
of these issues, so why not touch it once instead of several times?

In short, I like this design but I think we should plan it for 7.1.

			regards, tom lane
> wieck@debis.com (Jan Wieck) writes:
> > first I want to summarize some details, to see if we all
> > agree so far in the discussion.
>
> I snipped everything I agreed with ;-)
>
> > - The implementation should be generic for all variable size
> >   types, but can be enabled/disabled per type.
>            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>
> Per-type control doesn't strike me as interesting or useful. If there
> needs to be a control at all, which I doubt, per-table would be the
> way to go. But how many users will really say, "Oh yes, I *want* the
> thing to fail if my tuple's too big!"? I say: make it automatically
> apply whenever needed, don't force users to think about it.

Agreed. Who wouldn't want it.

>
> > - The type specific functions call a central support function
> >   to get the usual VARLENA format, which is taken from a LRU
> >   cache or fetched from the extension relation. They are
> >   responsible for freeing the memory after they're done with
> >   the value.
>
> If we are going to do this, we ought also think about solving the
> generic memory-leakage problem at the same time. No point in having
> to revisit all the same code later to deal with that issue.

I have a good fix for this. My patch suggested the varlena routine
pfree the pointer returned from expand_long(). No need for that. With
an LRU cache, we can have the cache itself free the old values. This
would be a nice optimization. Just add the lines below:

	+
	+	if (VARISLONG(vlena))		/* checks long bit */
	+		vlena = expand_long(vlena);	/* returns palloc long */
	+

There aren't any cases where the varlena access routines access more
than two varlena values at the same time. If the expansion cache is at
least two values, you can just expand it and return memory. When that
cache entry is expired, the memory is freed.

Wow, this makes the varlena changes very compact. All the action is in
expand_long(). Basically, don't have the access routines free the
memory, have the old cache entries be pfreed.

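To make the "cache frees the old values" idea concrete, here is a minimal standalone sketch. It assumes a fixed two-slot cache with least-recently-used eviction; `expand_long_cached()`, `fetch_long_value()` and the integer value ids are hypothetical stand-ins, and plain malloc/free are used in place of palloc/pfree so the example compiles on its own.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define CACHE_SLOTS 2               /* hypothetical; "at least two values" */

typedef struct CacheEntry
{
    int           valid;            /* 0 = empty slot */
    int           id;               /* identifies the long value */
    char         *data;             /* expanded value, owned by the cache */
    unsigned long last_used;        /* logical clock for LRU ordering */
} CacheEntry;

static CacheEntry    cache[CACHE_SLOTS];
static unsigned long clock_tick = 0;
static int           fetch_count = 0;   /* simulated reads of the long relation */

/* Stand-in for reading the chunks back from the expansion relation. */
static char *
fetch_long_value(int id)
{
    char *buf = malloc(32);

    assert(buf != NULL);
    memset(buf, 'a' + (id % 26), 31);
    buf[31] = '\0';
    fetch_count++;
    return buf;
}

/* Return the expanded value; the caller must NOT free it. */
static const char *
expand_long_cached(int id)
{
    int i;
    int victim = 0;

    /* Cache hit: refresh the entry's age and hand back the cached copy. */
    for (i = 0; i < CACHE_SLOTS; i++)
    {
        if (cache[i].valid && cache[i].id == id)
        {
            cache[i].last_used = ++clock_tick;
            return cache[i].data;
        }
    }

    /* Cache miss: evict the least recently used slot (empty slots go first). */
    for (i = 1; i < CACHE_SLOTS; i++)
    {
        if (cache[i].last_used < cache[victim].last_used)
            victim = i;
    }
    free(cache[victim].data);       /* free(NULL) is a no-op */
    cache[victim].valid = 1;
    cache[victim].id = id;
    cache[victim].data = fetch_long_value(id);
    cache[victim].last_used = ++clock_tick;
    return cache[victim].data;
}
```

Note the consequence of this design: a pointer returned by `expand_long_cached()` stays valid only until enough other values have been expanded to evict it, so the cache size bounds how many long values a routine may safely hold at once.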
>
> > I don't think it is a good idea to create the expansion
> > relation all the time. Some keyword in CREATE TABLE, and/or
> > another ALTER TABLE should do it instead, so the DB admin can
> > activate the LONG feature on a per table base as needed.
>
> I don't believe it. See above: people will complain that it's a bug
> that the system doesn't handle their long data values. Saying "oh, you
> have to turn it on" will not appease them. My objection is really the
> same as for the specialized LONG datatype: I do *not* want people to
> have to put nonstandard junk into their database schema declarations
> in order to activate this feature. I think it should Just Work and
> stay out of users' faces.
>
> Creating the expansion relation isn't that big a deal, but if you
> don't want to do it always, why not do it on first use?
>

Yes, why not just create it the first time it is needed. Seems pretty
small performance-wise.

> > Also I would like to say that system relations cannot have
> > expansion relations. At least not until we have enough
> > experience with this stuff.
>
> I'd really, really, really like to have this work for rules, though.
> Why shouldn't we allow it for system relations? Most of the critical
> ones have fixed-width tuples anyway, so it won't matter for them.

Oh, that's a good point. Seems that is a big reason for expansion of
types.

>
> BTW, it strikes me we should drop the "lztext" special datatype, and
> instead have compression automatically applied to any varlena that
> we are contemplating putting out-of-line. (If we're really lucky,
> that saves us having to put the value out-of-line!)

Ooh, very smart. You would need another bit to say whether the varlena
is compressed or not. If you take it from 4-byte header, we are down to
a 1 GB length limit. You could do all the compression/decompression in
the two expansion functions, though compressing and then not using the
long_ table would be a little tricky to code, but do-able.
You would compress, then if still too large, move to long table.

>
> > Is that now what we initially want to give a try? If so, I
> > would like to start soon to get the generic part ready ASAP.
> > Others could then join in and contribute by adding LONG
> > support for all the VARLENA data types we have.
>
> Yes, if we don't do it inside fastgetattr then there's a lot of code
> that will have to change.

See above. It looks like only a few lines per function now. If we do
it in fastgetattr, is there less code to change? How do we pfree()?

>
> > Would really be a big leap if we can get this finished for a
> > reasonable number of VARLENA types by February 1.
>
> The more I think about this the more I think that it's a bad, bad idea
> to try to have it ready by Feb 1. There's not really enough time to
> get it right and test it. I don't want to be putting out an unstable
> release, and that's what I'm afraid we'll have if we try to rush in
> such a major change as this. Particularly when we have nontrivial
> amounts of unfinished business elsewhere that we shouldn't neglect.
> (Jan, do you really think you can make this happen *and* bring foreign
> keys to a finished status before February? If you are going to leave
> stuff undone in foreign keys, I think you are making the wrong choice.)
>
> Furthermore, we can save ourselves some time if we tackle this change
> in combination with the fmgr revision and the memory-leak-elimination
> issue. We will be touching all the same per-data-type code for each
> of these issues, so why not touch it once instead of several times?
>
> In short, I like this design but I think we should plan it for 7.1.

Not sure on this one. Jan will have to comment.

I am excited about the long data type. This is _the_ way to do long
data types. Have any of the commercial databases figured out this way
to do it. I can't imagine a better system.

--
  Bruce Momjian                        |  http://www.op.net/~candle
  maillist@candle.pha.pa.us            |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
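The header arithmetic in the message above (a second flag bit in the 4-byte length word leaves 30 bits, hence the 1 GB limit) and the "compress first, then move out of line if still too large" rule can be sketched as follows. The flag values, macro names and `plan_storage()` are assumptions made for this illustration only; they are not the actual varlena header layout.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define VL_FLAG_LONG        0x80000000u  /* hypothetical: value stored out of line */
#define VL_FLAG_COMPRESSED  0x40000000u  /* hypothetical: value is compressed */
#define VL_LENGTH_MASK      0x3FFFFFFFu  /* remaining 30 bits: lengths below 1 GB */

/* Pack a length and the two flags into one 32-bit header word. */
static uint32_t
vl_make_header(uint32_t len, int is_long, int is_compressed)
{
    assert(len <= VL_LENGTH_MASK);
    return len
        | (is_long ? VL_FLAG_LONG : 0u)
        | (is_compressed ? VL_FLAG_COMPRESSED : 0u);
}

static uint32_t
vl_length(uint32_t header)
{
    return header & VL_LENGTH_MASK;
}

static int
vl_is_long(uint32_t header)
{
    return (header & VL_FLAG_LONG) != 0;
}

static int
vl_is_compressed(uint32_t header)
{
    return (header & VL_FLAG_COMPRESSED) != 0;
}

typedef enum
{
    STORE_INLINE,               /* fits as is */
    STORE_INLINE_COMPRESSED,    /* fits once compressed */
    STORE_OUT_OF_LINE           /* still too large: move to the long table */
} StorePlan;

/* Decide where a value goes, given its raw and compressed sizes and the limit. */
static StorePlan
plan_storage(size_t raw_len, size_t compressed_len, size_t limit)
{
    if (raw_len <= limit)
        return STORE_INLINE;
    if (compressed_len <= limit)
        return STORE_INLINE_COMPRESSED;
    return STORE_OUT_OF_LINE;
}
```

The middle case is the "really lucky" one from the discussion: compression alone brings the value under the limit and the long table is never touched.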
Tom Lane wrote:

> wieck@debis.com (Jan Wieck) writes:
>
> Per-type control doesn't strike me as interesting or useful. If there
> needs to be a control at all, which I doubt, per-table would be the

Isn't intended to be a runtime configuration. Just a
temporary feature to restrict the attributes that can be
moved off to those types, where WE know that the adt
functions are prepared for them. If we finally have all
builtin types finished for LONG handling, it will be removed,
making user defined types LONGable too.

> > I don't think it is a good idea to create the expansion
> > relation all the time. Some keyword in CREATE TABLE, and/or
> > another ALTER TABLE should do it instead, so the DB admin can
> > activate the LONG feature on a per table base as needed.
>
> I don't believe it. See above: people will complain that it's a bug
> that the system doesn't handle their long data values. Saying "oh, you
> have to turn it on" will not appease them. My objection is really the
> same as for the specialized LONG datatype: I do *not* want people to
> have to put nonstandard junk into their database schema declarations
> in order to activate this feature. I think it should Just Work and
> stay out of users' faces.
>
> Creating the expansion relation isn't that big a deal, but if you
> don't want to do it always, why not do it on first use?

So you want to do a heap_create_with_catalog() plus
index_create()'s from inside the heap_insert() or
heap_update(). Cannot be done from anywhere else, because
that's the point where we recognize the need.

I don't think that's a good idea. What would happen if

    Xact 1 needs expansion relation and creates it.
    Xact 2 needs expansion relation too and uses that one
    Xact 1 aborts
    Xact 2 commits

Better to put out an explanative error message if tuple too
big and no expansion relation exists, than dealing with
trouble when autocreating it.
If it later turns out that it can safely work as an automated
process, we can do it in a subsequent release.

> > Also I would like to say that system relations cannot have
> > expansion relations. At least not until we have enough
> > experience with this stuff.
>
> I'd really, really, really like to have this work for rules, though.
> Why shouldn't we allow it for system relations? Most of the critical
> ones have fixed-width tuples anyway, so it won't matter for them.

Me too, and for function source text again. But this time,
you include the syscache into the entire approach too.

> BTW, it strikes me we should drop the "lztext" special datatype, and
> instead have compression automatically applied to any varlena that
> we are contemplating putting out-of-line. (If we're really lucky,
> that saves us having to put the value out-of-line!)

Nice idea, and should be technically easy since the
compressor itself is separated from the lztext type. OTOH the
user then will have no choice to prevent compression tries
for performance reasons.

So this feature again is something that IMHO should go into a
configurable option.

> > Is that now what we initially want to give a try? If so, I
> > would like to start soon to get the generic part ready ASAP.
> > Others could then join in and contribute by adding LONG
> > support for all the VARLENA data types we have.
>
> Yes, if we don't do it inside fastgetattr then there's a lot of code
> that will have to change.

That's why I'd like a small number of types involved at
first.

And there we're back on the "release what we have now"
discussion again. Some like to get new functionality out in a
couple of smaller steps, rather than doing the big all-in-one
roll. Some not. Seems we can never get a consensus on that :-(

Jan
Bruce Momjian <pgman@candle.pha.pa.us> writes:
>> If we are going to do this, we ought also think about solving the
>> generic memory-leakage problem at the same time. No point in having
>> to revisit all the same code later to deal with that issue.

> I have a good fix for this. My patch suggested the varlena routine
> pfree the pointer returned from expand_long(). No need for that. With
> an LRU cache, we can have the cache itself free the old values.

Oooh, that's a thought. Sort of like applying TupleTableSlot to
individual datum values.

> There aren't any cases where the varlena access routines access more
> than two varlena values at the same time.

Huh? The standard operators on varlena types access at least three (two
inputs and a result), and multi-argument functions could access more.
Also think about functions written in PLs: they could invoke a large
amount of computation, and would still expect to be able to access their
original input arguments.

I'd feel more comfortable with explicit reference counting. Perhaps
we could make an exception for function return values: the cache
guarantees to hold onto a function return value for a little while
even though no one is holding a refcount on it at the instant it's
returned. Functions (including PL functions) that want to access
varlena values across any significant amount of computation would
have to bump the refcount on those values somehow.

> I am excited about the long data type. This is _the_ way to do long
> data types. Have any of the commercial databases figured out this way
> to do it. I can't imagine a better system.

I think we are working on some really cool ideas here. But I *don't*
think we have a solid enough hold on all the details that we can expect
to implement it and ship it out one-two-three. Thus my feeling that
this is for 7.1 not 7.0...

			regards, tom lane
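The explicit reference counting suggested above could look roughly like the sketch below. The `LongDatum` type and the pin/release function names are hypothetical, malloc/free stand in for palloc/pfree, and the proposed grace period for function return values is deliberately not modeled.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical expanded value with an explicit reference count. */
typedef struct LongDatum
{
    int   refcount;     /* number of holders that still need the value */
    char *data;         /* expanded, NUL-terminated value */
} LongDatum;

static int live_datums = 0;     /* for illustration: values not yet freed */

/* Create an expanded value; the creator holds the first reference. */
static LongDatum *
long_datum_create(const char *src)
{
    LongDatum *d = malloc(sizeof(LongDatum));
    size_t     n = strlen(src) + 1;

    assert(d != NULL);
    d->data = malloc(n);
    assert(d->data != NULL);
    memcpy(d->data, src, n);
    d->refcount = 1;
    live_datums++;
    return d;
}

/* A function that needs the value across a long computation bumps the count. */
static void
long_datum_pin(LongDatum *d)
{
    d->refcount++;
}

/* Drop one reference; the memory is freed only when nobody holds it anymore. */
static void
long_datum_release(LongDatum *d)
{
    assert(d->refcount > 0);
    if (--d->refcount == 0)
    {
        free(d->data);
        free(d);
        live_datums--;
    }
}
```

Compared with a small fixed-size cache, this keeps a value alive for as long as any caller has pinned it, at the cost of every caller having to release what it pinned.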
wieck@debis.com (Jan Wieck) writes:
> Tom Lane wrote:
>> Per-type control doesn't strike me as interesting or useful.

> Isn't intended to be a runtime configuration. Just a
> temporary feature to restrict the attributes that can be
> moved off to those types, where WE know that the adt
> functions are prepared for them.

Oh, I see. Yeah, if we wanted to make an interim release where only
some datatypes were ready for long values, that would be a necessary
safety measure. But I'd rather plan on just getting it done in one
release.

>> BTW, it strikes me we should drop the "lztext" special datatype, and
>> instead have compression automatically applied to any varlena that
>> we are contemplating putting out-of-line. (If we're really lucky,
>> that saves us having to put the value out-of-line!)

> Nice idea, and should be technically easy since the
> compressor itself is separated from the lztext type. OTOH the
> user then will have no choice to prevent compression tries
> for performance reasons.
> So this feature again is something that IMHO should go into a
> configurable option.

Good point. You're right, there should be a per-datatype "don't
bother to try to compress this type" flag. (Is per-datatype the
right granularity?)

			regards, tom lane
Tom Lane wrote:

> wieck@debis.com (Jan Wieck) writes:
> > Nice idea, and should be technically easy since the
> > compressor itself is separated from the lztext type. OTOH the
> > user then will have no choice to prevent compression tries
> > for performance reasons.
> > So this feature again is something that IMHO should go into a
> > configurable option.
>
> Good point. You're right, there should be a per-datatype "don't
> bother to try to compress this type" flag. (Is per-datatype the
> right granularity?)

Per column!

Jan
Tom Lane asked:

> (Jan, do you really think you can make this happen *and* bring foreign
> keys to a finished status before February? If you are going to leave
> stuff undone in foreign keys, I think you are making the wrong choice.)

Except for the file buffering of the trigger event queue,
FOREIGN KEY is completely implemented as I proposed, MATCH
FULL.

Thus I HAVE the time.

Jan
> Tom Lane asked:
>
> > (Jan, do you really think you can make this happen *and* bring foreign
> > keys to a finished status before February? If you are going to leave
> > stuff undone in foreign keys, I think you are making the wrong choice.)
>
> Except for the file buffering of the trigger event queue,
> FOREIGN KEY is completely implemented as I proposed, MATCH
> FULL.
>
> Thus I HAVE the time.

Well, this is very good news. Jan, aren't you going to bed?

--
  Bruce Momjian
> Bruce Momjian <pgman@candle.pha.pa.us> writes:
> >> If we are going to do this, we ought also think about solving the
> >> generic memory-leakage problem at the same time. No point in having
> >> to revisit all the same code later to deal with that issue.
>
> > I have a good fix for this. My patch suggested the varlena routine
> > pfree the pointer returned from expand_long(). No need for that. With
> > an LRU cache, we can have the cache itself free the old values.
>
> Oooh, that's a thought. Sort of like applying TupleTableSlot to
> individual datum values.
>
> > There aren't any cases where the varlena access routines access more
> > than two varlena values at the same time.
>
> Huh? The standard operators on varlena types access at least three (two
> inputs and a result), and multi-argument functions could access more.
> Also think about functions written in PLs: they could invoke a large
> amount of computation, and would still expect to be able to access their
> original input arguments.
>
> I'd feel more comfortable with explicit reference counting. Perhaps
> we could make an exception for function return values: the cache
> guarantees to hold onto a function return value for a little while
> even though no one is holding a refcount on it at the instant it's
> returned. Functions (including PL functions) that want to access
> varlena values across any significant amount of computation would
> have to bump the refcount on those values somehow.

I just checked the code, and I don't see any places where a varlena is
returned that isn't palloc'ed inside the function, so the cache memory
never makes it out of the routines.

However, I see any reference to VARDATA could be a problem because it
assume the data is there, and not in the long* relations. I could
probably figure out which ones need expanding. They are mostly system
table accesses. The others go through adt or are output to the user.

>
> > I am excited about the long data type. This is _the_ way to do long
> > data types. Have any of the commercial databases figured out this way
> > to do it. I can't imagine a better system.
>
> I think we are working on some really cool ideas here. But I *don't*
> think we have a solid enough hold on all the details that we can expect
> to implement it and ship it out one-two-three. Thus my feeling that
> this is for 7.1 not 7.0...

We have gotten pretty far in two days. This long tuple stuff is not as
difficult as foreign key because I can actually figure out what is
happening with the long types, while foreign key is a complete mystery
to me.

--
  Bruce Momjian
> However, I see any reference to VARDATA could be a problem because it
> assume the data is there, and not in the long* relations. I could
> probably figure out which ones need expanding. They are mostly system
> table accesses. The others go through adt or are output to the user.

VARDATA looks tricky. Seems I may need that cache of values. In most
cases, VARDATA values are used within the next few lines of code, just
like system cache tuples. If I need it for longer periods, I have to
palloc it. Good thing most VARDATA values are used for brief periods.

--
  Bruce Momjian