Thread: System column support for partitioned tables using heap
I've run into an existing behavior where xmax(), and various other system columns, return an error when included in the RETURNING list on a partitioned table.
ERROR: cannot retrieve a system column in this context
This issue got a fair airing back in 2020:
AW: posgres 12 bug (partitioned table)
https://www.postgresql.org/message-id/flat/GVAP278MB006939B1D7DFDD650E383FBFEACE0%40GVAP278MB0069.CHEP278.PROD.OUTLOOK.COM#908f2604081699e7f41fa20d352e1b79
I'm using 14.4, and just ran into this behavior today. I'm wondering if there has been any new work on this subject, or anything to take into account moving forward?
I'm not a C coder, and do not know the Postgres internals, but here's what I gleaned from the thread:
* Available system columns depend on the underlying table access method, and may/will vary across AMs. For example, the columns implemented by heap are what the docs describe; an FDW could be anything, and Postgres has no control over what, if any, system-column-like attributes it supports; future and hypothetical AMs may have different sets.
* Rather than return garbage results, or a default of 0, etc., the system throws the error I ran into.
I'd be happier working with a NULL result than with garbage or ambiguous results, but an error is the current behavior, and I agree with that choice: I'd rather get an error than a bad/meaningless result. Postgres' consistent emphasis on correctness is easily one of its greatest qualities.
In my case, I'm upgrading a lot of existing code to try to capture a more complete profile of what an UPSERT did. Right now, I grab a count(*) of the rows and return that, which works fine. A revised snippet looks a bit like this:
------------------------------------------------------------
...UPSERT code
returning xmax as inserted_transaction_id),
status_data AS (
select count(*) FILTER (where inserted_transaction_id = 0) AS insert_count,
count(*) FILTER (where inserted_transaction_id != 0) AS estimated_update_count,
pg_current_xact_id_if_assigned()::text AS transaction_id
from inserted_rows),
...custom logging code
-- Final output/result.
select insert_count,
estimated_update_count,
transaction_id
from status_data;
------------------------------------------------------------
This fails on a partitioned table because xmax() may not exist. In fact, it does exist in all of those tables, but the system doesn't know how to guarantee that. I know which tables are partitioned, and can downgrade the result on partitioned tables to the count(*) I've been using to date. But now I'm wondering if working with xmax() like this is a poor idea going forward. I don't want to lean on a feature/behavior that's likely to change. For example, I noticed the other day that MERGE does not support RETURNING.
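For reference, the way I tell which tables are partitioned is just a catalog lookup; a minimal sketch (schema and table names here are placeholders):

```sql
-- Returns true when the named table is a partitioned parent (relkind 'p'),
-- i.e. the case where RETURNING xmax raises the "system column" error.
SELECT c.relkind = 'p' AS is_partitioned
FROM pg_class c
JOIN pg_namespace n ON n.oid = c.relnamespace
WHERE n.nspname = 'public'      -- assumed schema
  AND c.relname = 'item_type';  -- assumed table name
```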
I'd appreciate any insight or advice you can offer.
On Sun, Jul 17, 2022 at 9:04 PM Morris de Oryx <morrisdeoryx@gmail.com> wrote:
> This fails on a partitioned table because xmax() may not exist. In fact, it does exist in all of those tables, but the system doesn't know how to guarantee that. I know which tables are partitioned, and can downgrade the result on partitioned tables to the count(*) I've been using to date. But now I'm wondering if working with xmax() like this is a poor idea going forward. I don't want to lean on a feature/behavior that's likely to change. For example, I noticed the other day that MERGE does not support RETURNING.
>
> I'd appreciate any insight or advice you can offer.

What is motivating you to want to see the xmax value here? It's not an unreasonable thing to want to do, IMHO, but it's a little bit niche so I'm just curious what the motivation is.

I do agree with you that it would be nice if this worked better than it does, but I don't really know exactly how to make that happen. The column list for a partitioned table must be fixed at the time it is created, but we do not know what partitions might be added in the future, and thus we don't know whether they will have an xmax column. I guess we could have tried to work things out so that a 0 value would be passed up from children that lack an xmax column, and that would allow the parent to have such a column, but I don't feel too bad that we didn't do that ... should I?

--
Robert Haas
EDB: http://www.enterprisedb.com
> What is motivating you to want to see the xmax value here? It's not an
> unreasonable thing to want to do, IMHO, but it's a little bit niche so
> I'm just curious what the motivation is.
Yeah, I figured it was niche when I saw so little mention of the issue.
My reason for xmax() in the result is to break down the affected rows count into an insert count and an estimated update count. Not super critical, but helpful. I've built out some simple custom logging tables in our system for this kind of detail, and folks have been wanting to break down rows submitted, rows inserted, and rows updated a bit better. Rows submitted is easy, and rows inserted is too...updates are an estimate, as I'm not doing anything fancy with xmax() to sort out what exactly happened.
For clarification, we're not using an ORM, and may need to support straggling clients, so our push cycle works like this:
* Create a view with the fields expected in the insert. I figured I'd use CREATE VIEW instead of CREATE TYPE as then I can quickly check out the details against live data, and I still get a custom compound type.
* Write a function that accepts an array of view_name_type. I love Postgres' typing system; it has spoiled me forever. Clients can't submit badly formatted objects; they're rejected automatically.
* Write a client-side routine to package data as an array and push it into the insert handling function. The function unnests the array, and then the actual insert code draws from the unpacked values. If I need to extend the table, I can add a new function that knows about the revised fields, and revise (when necessary) earlier supported formats to map to new types/columns/defaults.
There are a few CTEs in there, including one that does the main insert and returns xmax. That lets me distinguish xmax = 0 (insert) from xmax <> 0 (not an insert).
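For anyone following along, here's a minimal sketch of that distinction on a plain, non-partitioned table. (Throwaway names; and as far as I know, the nonzero-xmax-on-conflict-update behavior is an implementation detail of heap and ON CONFLICT, not a documented guarantee.)

```sql
-- Sketch: xmax = 0 when ON CONFLICT took the insert path,
-- typically nonzero when it took the UPDATE path.
CREATE TEMP TABLE demo (id int PRIMARY KEY, val text);

INSERT INTO demo VALUES (1, 'first')
ON CONFLICT (id) DO UPDATE SET val = EXCLUDED.val
RETURNING xmax = 0 AS was_insert;  -- fresh row: insert path

INSERT INTO demo VALUES (1, 'second')
ON CONFLICT (id) DO UPDATE SET val = EXCLUDED.val
RETURNING xmax = 0 AS was_insert;  -- existing row: update path
```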
> I do agree with you that it would be nice if this worked better than
> it does, but I don't really know exactly how to make that happen. The
> column list for a partitioned table must be fixed at the time it is
> created, but we do not know what partitions might be added in the
> future, and thus we don't know whether they will have an xmax column.
> I guess we could have tried to work things out so that a 0 value would
> be passed up from children that lack an xmax column, and that would
> allow the parent to have such a column, but I don't feel too bad that
> we didn't do that ... should I?
You should never feel bad about anything ;-) You and others on that thread contribute so much, and I'm getting real value out of it.
I had it in mind that it would be nice to have some kind of catalog/abstraction that would make it possible to interrogate what system columns are available on a table/partition based on access method. In my vague notion, that might make some of the other ideas from that thread, such as index-oriented stores with quite different physical layouts, easier to implement. But, it's all free when you aren't the one who can write the code.
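No such catalog for system columns exists today, but the access method itself is already visible in the catalogs; something along these lines (parent table name assumed) would list each partition's AM:

```sql
-- Which table access method backs each partition of a hypothetical
-- partitioned table 'item_type'? Foreign-table partitions have no AM,
-- so they show up with a NULL access_method here.
SELECT c.relname AS partition_name,
       a.amname  AS access_method
FROM pg_inherits i
JOIN pg_class c ON c.oid = i.inhrelid
LEFT JOIN pg_am a ON a.oid = c.relam
WHERE i.inhparent = 'item_type'::regclass;
```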
I've switched the partition-based tables back to returning * on the insert CTE, and then aggregating that to add to a log table and the client result. It's fine. A rich result summary would be very nice. As in rows added/modified/deleted on whatever table(s). If anyone ever decides to implement such a structure for MERGE, it would be nice to see it retrofitted to the other data modification commands where RETURNING works.
> The column list for a partitioned table must be fixed at the time it is
> created, but we do not know what partitions might be added in the
> future, and thus we don't know whether they will have an xmax column.
Right, I see what you mean there. It's fantastic that a partition might be an FDW to a system that has no concept at all of anything like a "system column", or use an alternative AM to heap with a different set of system columns. That flexibility in partitions is super valuable. I'd love to be able to convert old partitions into column stores, for example. (I think that Citus offers that feature now.)
I guess if anyone ever felt it was worth the effort, maybe whatever checks are done at attach-partition time for the column list could also enforce meta/system columns. If missing, a shimming mechanism would be pretty necessary.
Sounds like a lot of work for not much gain, at least in this narrow case.
Thanks again for answering.
On Tue, Jul 19, 2022 at 4:44 AM Morris de Oryx <morrisdeoryx@gmail.com> wrote:
> My reason for xmax() in the result is to break down the affected rows count into an insert count, and a modified estimate. Not super critical, but helpful. I've built out some simple custom logging tables in our system for this kind of detail, and folks have been wanting to break down rows submitted, rows inserted, and rows updated a bit better. Rows submitted is easy and rows inserted is too...update is an estimate as I'm not using anything fancy with xmax() to sort out what exactly happened.

I wonder whether you could just have the CTEs bubble up 1 or 0 and then sum them at some stage, instead of relying on xmax. Presumably your UPSERT simulation knows which thing it did in each case.

For MERGE itself, I wonder if some information about this should be included in the command tag. It looks like MERGE already includes some sort of row count in the command tag, but I guess perhaps it doesn't distinguish between inserts and updates. I don't know why we couldn't expose multiple values this way, though.

--
Robert Haas
EDB: http://www.enterprisedb.com
On Tue, Jul 19, 2022 at 10:38 PM Robert Haas <robertmhaas@gmail.com> wrote:
> For MERGE itself, I wonder if some information about this should be
> included in the command tag. It looks like MERGE already includes some
> sort of row count in the command tag, but I guess perhaps it doesn't
> distinguish between inserts and updates. I don't know why we couldn't
> expose multiple values this way, though.
It would be great to get some sort of feedback from MERGE accessible through SQL results, even if that doesn't come in the form of a RETURNING list.
> I wonder whether you could just have the CTEs bubble up 1 or 0 and
> then sum them at some stage, instead of relying on xmax. Presumably
> your UPSERT simulation knows which thing it did in each case.
It might help if I show a sample insert handling function. The issue is with the line at the end of the top CTE, inserted_rows:
returning xmax as inserted_transaction_id),
That's what fails on partitions. Is there an alternative way to test what happened to the row(s)? Here's the full function. I wrote a code generator, so I don't have to hand-code all of these bits for each table+version:
-- Create a function to accept an array of rows formatted as item_type_v1 for UPSERT into item_type.
DROP FUNCTION IF EXISTS types_plus.insert_item_type_v1 (types_plus.item_type_v1[]);
CREATE OR REPLACE FUNCTION types_plus.insert_item_type_v1 (data_in types_plus.item_type_v1[])
RETURNS TABLE (
insert_count integer,
estimated_update_count integer,
transaction_id text)
LANGUAGE SQL
BEGIN ATOMIC
-- The CTE below is a roundabout way of returning an insertion count from a pure SQL function in Postgres.
WITH
inserted_rows as (
INSERT INTO item_type (
id,
marked_for_deletion,
name_)
SELECT
rows_in.id,
rows_in.marked_for_deletion,
rows_in.name_
FROM unnest(data_in) as rows_in
ON CONFLICT(id) DO UPDATE SET
marked_for_deletion = EXCLUDED.marked_for_deletion,
name_ = EXCLUDED.name_
returning xmax as inserted_transaction_id),
status_data AS (
select count(*) FILTER (where inserted_transaction_id = 0) AS insert_count,
count(*) FILTER (where inserted_transaction_id != 0) AS estimated_update_count,
pg_current_xact_id_if_assigned()::text AS transaction_id
from inserted_rows),
insert_log_entry AS (
INSERT INTO insert_log (
data_file_id,
ib_version,
job_run_id,
schema_name,
table_name,
records_submitted,
insert_count,
estimated_update_count)
SELECT
coalesce_session_variable(
'data_file_id',
'00000000000000000000000000000000')::uuid,
coalesce_session_variable('ib_version'), -- Default result is ''
coalesce_session_variable(
'job_run_id',
'00000000000000000000000000000000')::uuid,
'ascendco',
'item_type',
(select cardinality(data_in)),
insert_count,
estimated_update_count
FROM status_data
)
-- Final output/result.
select insert_count,
estimated_update_count,
transaction_id
from status_data;
END;
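For completeness, if the function weren't using ON CONFLICT, Robert's bubble-up idea might look roughly like the sketch below. This is only an illustration: separate UPDATE/INSERT CTEs are not race-safe the way ON CONFLICT is, and it assumes the same data_in parameter and item_type columns as the function above.

```sql
-- Sketch: each CTE knows which action it performed, so xmax isn't needed.
-- Would run inside a function body like the one above (data_in assumed).
WITH updated AS (
    UPDATE item_type t
    SET    marked_for_deletion = s.marked_for_deletion,
           name_               = s.name_
    FROM   unnest(data_in) AS s
    WHERE  t.id = s.id
    RETURNING 1 AS did_update),
inserted AS (
    INSERT INTO item_type (id, marked_for_deletion, name_)
    SELECT s.id, s.marked_for_deletion, s.name_
    FROM   unnest(data_in) AS s
    WHERE  NOT EXISTS (SELECT 1 FROM item_type t WHERE t.id = s.id)
    RETURNING 1 AS did_insert)
SELECT (SELECT count(*) FROM inserted) AS insert_count,
       (SELECT count(*) FROM updated)  AS update_count;
```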
On Tue, Jul 19, 2022 at 11:22 PM Morris de Oryx <morrisdeoryx@gmail.com> wrote:
> It might help if I show a sample insert handling function. The issue is with the line at the end of the top CTE, inserted_rows:
>
> returning xmax as inserted_transaction_id),
>
> That's what fails on partitions. Is there an alternative way to test what happened to the row(s)? Here's the full function. I wrote a code generator, so I don't have to hand-code all of these bits for each table+version:

Oh I see. I didn't realize you were using INSERT .. ON CONFLICT UPDATE, but that makes tons of sense, and I don't see an obvious alternative to the way you wrote this. Hmm.

--
Robert Haas
EDB: http://www.enterprisedb.com