Thread: Shared access methods?
Hi,

Several features in various discussed access methods would benefit from being able to perform actions when writing out a buffer. As an example, because it doesn't require understanding any of the new proposed storage formats, it'd be good for performance if we could eagerly set hint bits / perform page level pruning when cleaning a dirty buffer either during checkpoint writeout or bgwriter / backend reclaim. That'd allow us to avoid the write amplification issues in several of the current and proposed cleanup schemes.

Unfortunately that's currently not really easy to do. Buffers don't currently know which AM they belong to, therefore we can't know how to treat them at writeout time. It's not that hard to make space in the buffer descriptor to additionally store the oid of the associated AM, e.g. we could just replace buf_id with a small bit of pointer math.

But even if we had an AM oid, it'd be unclear what to do with it as it'd be specific to a database. Which makes it pretty much useless for tasks happening on writeout of victim buffers / checkpoint.

Thus I think it'd be a better design to have pg_am be a shared relation. That'd imply two things:
a) amhandler couldn't be regproc but would need to be two fields, one pointing to internal or a shlib, the other to the function name.
b) extensions containing AMs would need to do something like INSERT ... ON CONFLICT DO NOTHING.

I don't think this is the most urgent feature for making pluggable AMs useful, but given that we're likely going to whack around pg_am, and that pg_am is fairly new in its current incarnation, it seems like a good idea to discuss this now.

Comments?

Greetings,

Andres Freund

PS: I could have written more on this, but people are urging me to come to dinner, so thank them ;)
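As a rough illustration of the "small bit of pointer math" remark above: if descriptors live in one contiguous array, a descriptor's index can be recovered from its address, which would free the stored buf_id slot for an AM oid. This is only a sketch under that assumption. DemoBufferDesc, demo_descriptors, demo_buf_id and the am_oid field are hypothetical names, not PostgreSQL's actual definitions, and any padding or alignment of the real descriptor array is ignored.

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned int Oid;

/*
 * Hypothetical, heavily simplified buffer descriptor. The am_oid field
 * occupies the space a stored buffer index would otherwise take.
 */
typedef struct DemoBufferDesc
{
    Oid am_oid;   /* assumed new field: AM that owns the page in this buffer */
    int flags;    /* stand-in for the real state fields */
} DemoBufferDesc;

#define DEMO_NBUFFERS 16

/* All descriptors live in one contiguous array. */
static DemoBufferDesc demo_descriptors[DEMO_NBUFFERS];

/* Recover the buffer index from the descriptor's address. */
static int
demo_buf_id(const DemoBufferDesc *desc)
{
    ptrdiff_t idx = desc - demo_descriptors;

    assert(idx >= 0 && idx < DEMO_NBUFFERS);
    return (int) idx;
}
```

Calling demo_buf_id(&demo_descriptors[7]) yields 7 without the index ever being stored in the descriptor.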
Hi!

On Thu, Jun 14, 2018 at 5:37 AM Andres Freund <andres@anarazel.de> wrote:
> Several features in various discussed access methods would benefit from
> being able to perform actions when writing out a buffer. As an example,
> because it doesn't require understanding any of the new proposed storage
> formats, it'd be good for performance if we could eagerly set hint bits
> / perform page level pruning when cleaning a dirty buffer either during
> checkpoint writeout or bgwriter / backend reclaim. That'd allow us to
> avoid the write amplification issues in several of the current and
> proposed cleanup schemes.

Yes, that could be useful.

> Unfortunately that's currently not really easy to do. Buffers don't
> currently know which AM they belong to, therefore we can't know how to
> treat them at writeout time. It's not that hard to make space in the
> buffer descriptor to additionally store the oid of the associated AM,
> e.g. we could just replace buf_id with a small bit of pointer math.
>
> But even if we had an AM oid, it'd be unclear what to do with it as it'd
> be specific to a database. Which makes it pretty much useless for tasks
> happening on writeout of victim buffers / checkpoint.
>
> Thus I think it'd be a better design to have pg_am be a shared
> relation. That'd imply two things:
> a) amhandler couldn't be regproc but would need to be two fields, one
> pointing to internal or a shlib, the other to the function name.

Makes sense to me.

> b) extensions containing AMs would need to do something like INSERT ... ON
> CONFLICT DO NOTHING.

We already have the CREATE ACCESS METHOD command. I think this command should handle that internally. And I don't understand why "ON CONFLICT DO NOTHING". If an AM with the given name already exists in pg_am, why should we ignore the error?

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
On 2018-06-14 15:59:22 +0300, Alexander Korotkov wrote:
> > b) extensions containing AMs would need to do something like INSERT ... ON
> > CONFLICT DO NOTHING.
>
> We already have the CREATE ACCESS METHOD command. I think this command
> should handle that internally. And I don't understand why "ON
> CONFLICT DO NOTHING". If an AM with the given name already exists in pg_am,
> why should we ignore the error?

Well, right now an AM-containing extension creates things in each database (i.e. same scope as extensions). But with shared AMs that wouldn't be the case - you might still want to create the extension in another database. So we'd need to have CREATE ACCESS METHOD check whether the same entry already exists, and only delete it on DROP ACCESS METHOD if there are no dependencies from other databases...

Greetings,

Andres Freund
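The create/drop behaviour described above could be sketched roughly as follows. This is an illustration only, not how CREATE ACCESS METHOD or DROP ACCESS METHOD are implemented: the in-memory table, the per-entry database counter and every demo_* name are assumptions made for the example. In particular, a plain counter stands in for "dependencies from other databases", and "same entry" is reduced to a name comparison.

```c
#include <assert.h>
#include <string.h>

#define DEMO_MAX_AMS 8

/* Hypothetical stand-in for one row of a shared pg_am. */
typedef struct DemoSharedAm
{
    char name[64];
    int  ndatabases;   /* assumed: how many databases reference this entry */
    int  in_use;       /* 0 = free slot, 1 = occupied */
} DemoSharedAm;

static DemoSharedAm demo_shared_am[DEMO_MAX_AMS];

static DemoSharedAm *
demo_find(const char *name)
{
    for (int i = 0; i < DEMO_MAX_AMS; i++)
        if (demo_shared_am[i].in_use &&
            strcmp(demo_shared_am[i].name, name) == 0)
            return &demo_shared_am[i];
    return NULL;
}

/* Returns 1 if a new entry was created, 0 if an existing one was reused,
 * -1 if the table is full. */
static int
demo_create_am(const char *name)
{
    DemoSharedAm *am = demo_find(name);

    assert(name != NULL);
    if (am != NULL)
    {
        am->ndatabases++;          /* another database now depends on it */
        return 0;
    }
    for (int i = 0; i < DEMO_MAX_AMS; i++)
    {
        if (!demo_shared_am[i].in_use)
        {
            am = &demo_shared_am[i];
            strncpy(am->name, name, sizeof(am->name) - 1);
            am->name[sizeof(am->name) - 1] = '\0';
            am->ndatabases = 1;
            am->in_use = 1;
            return 1;
        }
    }
    return -1;
}

/* Returns 1 if the entry was removed, 0 if it was kept because other
 * databases still reference it, -1 if no such entry exists. */
static int
demo_drop_am(const char *name)
{
    DemoSharedAm *am = demo_find(name);

    if (am == NULL)
        return -1;
    if (--am->ndatabases > 0)
        return 0;
    am->in_use = 0;
    return 1;
}
```

With this shape, creating the same AM from a second database reuses the shared row, and the row disappears only when the last referencing database drops it. Whether a name collision with a different handler should raise an error instead is exactly the open question in the thread and is not modeled here.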
Andres Freund <andres@anarazel.de> writes:
> On 2018-06-14 15:59:22 +0300, Alexander Korotkov wrote:
>> We already have the CREATE ACCESS METHOD command. I think this command
>> should handle that internally. And I don't understand why "ON
>> CONFLICT DO NOTHING". If an AM with the given name already exists in pg_am,
>> why should we ignore the error?

> Well, right now an AM-containing extension creates things in each
> database (i.e. same scope as extensions). But with shared AMs that
> wouldn't be the case - you might still want to create the extension in
> another database. So we'd need to have CREATE ACCESS METHOD check
> whether the same entry already exists, and only delete it on DROP ACCESS
> METHOD if there are no dependencies from other databases...

I'm not really buying this idea at all, at least not for index AMs, because you also need a pile of other database-local infrastructure --- opclasses, operators, functions, etc. Trying to make pieces of that be shared is not going to end well.

regards, tom lane
On 2018-06-14 16:10:42 -0400, Tom Lane wrote:
> Andres Freund <andres@anarazel.de> writes:
> > On 2018-06-14 15:59:22 +0300, Alexander Korotkov wrote:
> >> We already have the CREATE ACCESS METHOD command. I think this command
> >> should handle that internally. And I don't understand why "ON
> >> CONFLICT DO NOTHING". If an AM with the given name already exists in pg_am,
> >> why should we ignore the error?
> >
> > Well, right now an AM-containing extension creates things in each
> > database (i.e. same scope as extensions). But with shared AMs that
> > wouldn't be the case - you might still want to create the extension in
> > another database. So we'd need to have CREATE ACCESS METHOD check
> > whether the same entry already exists, and only delete it on DROP ACCESS
> > METHOD if there are no dependencies from other databases...
>
> I'm not really buying this idea at all, at least not for index AMs,
> because you also need a pile of other database-local infrastructure
> --- opclasses, operators, functions, etc. Trying to make pieces of
> that be shared is not going to end well.

Yea, I do think there's a number of issues around exactly that - in fact I raised them when Robert was talking about the issue before. But I do think there are a few things that are doable without actually needing to invoke any user defined code aside from the AM code itself. E.g. heap pruning / aggressively setting hint bits doesn't need to invoke operators, and I can think of some ways to implement index delete marking that does so without invoking any comparators either.

Thus it seems like this'd still allow us to implement quite a bit of new useful infrastructure, even though more would be needed.

Greetings,

Andres Freund
On 2018-Jun-14, Andres Freund wrote:

> But I do think there are a few things that are doable without actually
> needing to invoke any user defined code aside from the AM code
> itself. E.g. heap pruning / aggressively setting hint bits doesn't need
> to invoke operators, and I can think of some ways to implement index
> delete marking that does so without invoking any comparators either.

So what you want to do is have bgwriter/checkpointer able to scan some catalog and grab a function pointer that can "execute pruning on this shared buffer", right?

For that maybe we need to split out a part of AMs that is storage-level and another one that is data-level. So an access method would create two catalog entries, one of which is shared (pg_shared_am? ugh) and the other is the regular one we already have in pg_am. The handler function in pg_shared_am gives you functions that can only do storage-level stuff such as hint bit setting, page pruning, tuple freezing, CRC, etc., which do not require access to the data itself.

--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Hi,

On 2018-06-14 16:33:08 -0400, Alvaro Herrera wrote:
> On 2018-Jun-14, Andres Freund wrote:
>
> > But I do think there are a few things that are doable without actually
> > needing to invoke any user defined code aside from the AM code
> > itself. E.g. heap pruning / aggressively setting hint bits doesn't need
> > to invoke operators, and I can think of some ways to implement index
> > delete marking that does so without invoking any comparators either.
>
> So what you want to do is have bgwriter/checkpointer able to scan some
> catalog and grab a function pointer that can "execute pruning on this
> shared buffer", right?

Yes.

> For that maybe we need to split out a part of
> AMs that is storage-level and another one that is data-level. So an
> access method would create two catalog entries, one of which is shared
> (pg_shared_am? ugh) and the other is the regular one we already have in
> pg_am. The handler function in pg_shared_am gives you functions that
> can only do storage-level stuff such as hint bit setting, page pruning,
> tuple freezing, CRC, etc., which do not require access to the data
> itself.

I'm not sure I understand the need for this split? Why can't we have pg_am's amhandler - now a shlib/name combo - return its normal *AmRoutine struct, one member of which would be an optional 'amonwriteout' callback?

Greetings,

Andres Freund
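To make the optional 'amonwriteout' callback idea concrete, here is a minimal sketch of how a writeout path might dispatch through it. Only the callback name comes from the message above. DemoAmRoutine, DemoPage, demo_write_out and the two example AMs are hypothetical, and the real *AmRoutine structs and buffer manager look nothing like this toy.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a page sitting in a shared buffer. */
typedef struct DemoPage
{
    int hint_bits_set;
    int dirty;
} DemoPage;

typedef void (*demo_onwriteout_fn) (DemoPage *page);

/* Hypothetical routine struct returned by an AM handler. */
typedef struct DemoAmRoutine
{
    const char         *amname;
    demo_onwriteout_fn  amonwriteout;   /* optional: NULL means nothing to do */
} DemoAmRoutine;

/* Example callback: storage-level work only, no operators or comparators. */
static void
demo_heap_onwriteout(DemoPage *page)
{
    page->hint_bits_set = 1;
}

static const DemoAmRoutine demo_heap_am = {"demo_heap", demo_heap_onwriteout};
static const DemoAmRoutine demo_plain_am = {"demo_plain", NULL};

/* Run the AM's callback, if it has one, before "writing" the page.
 * Returns 1 if a callback ran, 0 otherwise. */
static int
demo_write_out(const DemoAmRoutine *am, DemoPage *page)
{
    int ran = 0;

    assert(am != NULL && page != NULL);
    if (am->amonwriteout != NULL)
    {
        am->amonwriteout(page);
        ran = 1;
    }
    page->dirty = 0;    /* stand-in for the actual write */
    return ran;
}
```

An AM that has no writeout-time work simply leaves the member NULL, so existing AMs would not need to change.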