Thread: Make attstattarget nullable
In [0] it was discussed that we could make attstattarget a nullable column, instead of always storing an explicit -1 default value for most columns. This patch implements this. This changes the pg_attribute field attstattarget into a nullable field in the variable-length part of the row. If no value is set by the user for attstattarget, it is now null instead of previously -1. This saves space in pg_attribute and tuple descriptors for most practical scenarios. (ATTRIBUTE_FIXED_PART_SIZE is reduced from 108 to 104.) Also, null is the semantically more correct value. The ANALYZE code internally continues to represent the default statistics target by -1, so that that code can avoid having to deal with null values. But that is now contained to ANALYZE code. The DDL code deals with attstattarget possibly null. For system columns, the field is now always null but the effective value 0 (don't analyze) is assumed. To set a column's statistics target to the default value, the new command form ALTER TABLE ... SET STATISTICS DEFAULT can be used. (SET STATISTICS -1 still works.) [0]: https://www.postgresql.org/message-id/flat/d07ffc2b-e0e8-77f7-38fb-be921dff71af%40enterprisedb.com
Attachment
Here is an updated patch rebased over 3e2e0d5ad7. The 0001 patch stands on its own, but I also tacked on two additional WIP patches that simplify some pg_attribute handling and make these kinds of refactorings simpler in the future. See description in the patches. On 05.12.23 13:52, Peter Eisentraut wrote: > In [0] it was discussed that we could make attstattarget a nullable > column, instead of always storing an explicit -1 default value for most > columns. This patch implements this. > > This changes the pg_attribute field attstattarget into a nullable field > in the variable-length part of the row. If no value is set by the user > for attstattarget, it is now null instead of previously -1. This saves > space in pg_attribute and tuple descriptors for most practical > scenarios. (ATTRIBUTE_FIXED_PART_SIZE is reduced from 108 to 104.) > Also, null is the semantically more correct value. > > The ANALYZE code internally continues to represent the default > statistics target by -1, so that that code can avoid having to deal with > null values. But that is now contained to ANALYZE code. The DDL code > deals with attstattarget possibly null. > > For system columns, the field is now always null but the effective value > 0 (don't analyze) is assumed. > > To set a column's statistics target to the default value, the new > command form ALTER TABLE ... SET STATISTICS DEFAULT can be used. (SET > STATISTICS -1 still works.) > > > [0]: > https://www.postgresql.org/message-id/flat/d07ffc2b-e0e8-77f7-38fb-be921dff71af%40enterprisedb.com
Attachment
On 2023-Dec-23, Peter Eisentraut wrote: > Here is an updated patch rebased over 3e2e0d5ad7. > > The 0001 patch stands on its own, but I also tacked on two additional WIP > patches that simplify some pg_attribute handling and make these kinds of > refactorings simpler in the future. See description in the patches. I didn't look at 0002 and 0003, since they're marked as WIP. (But I did like the removal that happens in 0003, so I hope these two also make it to 17). > On 05.12.23 13:52, Peter Eisentraut wrote: > > In [0] it was discussed that we could make attstattarget a nullable > > column, instead of always storing an explicit -1 default value for most > > columns. This patch implements this. Seems reasonable. Do we really need a catversion bump for this? I like that we now have SET STATISTICS DEFAULT rather than -1 to reset to default. Do we want to document that setting explicitly to -1 continues to have that behavior? (I would add something like "Setting to a value of -1 is an obsolete spelling to get the same behavior." after the phrase that explains DEFAULT in the ALTER TABLE manpage.) I noticed that equalTupleDescs no longer compares attstattarget, and this is because the field is not in TupleDesc anymore. I looked at the callers of equalTupleDescs and I think this is exactly what we want (precisely because attstattarget is no longer in TupleDesc.) > > This changes the pg_attribute field attstattarget into a nullable field > > in the variable-length part of the row. I don't think we use "the variable-length part of the row" as a term anywhere. We only have the variable-length columns, and we made a bit of a mistake in using CATALOG_VARLEN to differentiate the part of the catalogs that are not mapped to the structs (because at the time those were in effect only the variable length fields). I think this is largely not a problem, but let's be careful with how we word the related comments. So: I think the comment next to "#ifdef CATALOG_VARLEN" is now a bit misleading, because the field immediately below is effectively not varlena. Maybe make it #ifdef CATALOG_VARLEN /* nullable/varlena fields start here */ In RemoveAttributeById, a comment says "Clear the other variable-length fields." but this is no longer fully correct. Again maybe make it "... the other nullable or variable-length fields". In get_attstattarget() I think we should return 0 for dropped columns without reading attstattarget, which is useless anyway, and if it did happen to return non-null, it might cause us to do stuff, which would be a waste. It's annoying that the new code in index_concurrently_swap() is more verbose than the code being replaced, but it seems OK to me, since it allows us to distinguish a null value in attstattarget from actual 0 without complicating the get_attstattarget API (which I think we would have to do if we wanted to use it here.) I don't have anything else on this patch at this point. Thanks -- Álvaro Herrera Breisgau, Deutschland — https://www.EnterpriseDB.com/
On 10.01.24 14:16, Alvaro Herrera wrote: >> Here is an updated patch rebased over 3e2e0d5ad7. >> >> The 0001 patch stands on its own, but I also tacked on two additional WIP >> patches that simplify some pg_attribute handling and make these kinds of >> refactorings simpler in the future. See description in the patches. > > I didn't look at 0002 and 0003, since they're marked as WIP. (But I did > like the removal that happens in 0003, so I hope these two also make it > to 17). Here is an updated patch set. I have addressed your comments on 0001. I looked again at 0002 and 0003 and I was quite happy with them, so I just removed the WIP label and also added a few more code comments, but otherwise didn't change anything. > Seems reasonable. Do we really need a catversion bump for this? Yes, this changes the order of the fields in pg_attribute. > I like that we now have SET STATISTICS DEFAULT rather than -1 to reset > to default. Do we want to document that setting explicitly to -1 > continues to have that behavior? (I would add something like "Setting > to a value of -1 is an obsolete spelling to get the same behavior." > after the phrase that explains DEFAULT in the ALTER TABLE manpage.) done > I noticed that equalTupleDescs no longer compares attstattarget, and > this is because the field is not in TupleDesc anymore. I looked at the > callers of equalTupleDescs and I think this is exactly what we want > (precisely because attstattarget is no longer in TupleDesc.) Yes, I had investigated that in some detail, and I think it's ok. I think equalTupleDescs() is actually mostly useless and I plan to start a separate discussion on that. >>> This changes the pg_attribute field attstattarget into a nullable field >>> in the variable-length part of the row. > > I don't think we use "the variable-length part of the row" as a term > anywhere. We only have the variable-length columns, and we made a bit > of a mistake in using CATALOG_VARLEN to differentiate the part of the > catalogs that are not mapped to the structs (because at the time those > were in effect only the variable length fields). I think this is > largely not a problem, but let's be careful with how we word the related > comments. So: Yeah, there are multiple ways to interpret this. There are fields with varlena headers, but there are also fields that are not-fixed-length as far as struct access to catalog tuples is concerned, and the two not the same. > I think the comment next to "#ifdef CATALOG_VARLEN" is now a bit > misleading, because the field immediately below is effectively not > varlena. Maybe make it > #ifdef CATALOG_VARLEN /* nullable/varlena fields start here */ done > In RemoveAttributeById, a comment says > "Clear the other variable-length fields." > but this is no longer fully correct. Again maybe make it "... the other > nullable or variable-length fields". done > In get_attstattarget() I think we should return 0 for dropped columns > without reading attstattarget, which is useless anyway, and if it did > happen to return non-null, it might cause us to do stuff, which would be > a waste. I ended up deciding to get rid of get_attstattarget() altogether and just do the fetching inline in examine_attribute(). Because the previous API and what you are discussing here is over-designed, since the only caller doesn't call it with dropped columns or system columns anyway. This way these issues are contained in the ANALYZE code, not in a very general place like lsyscache.c. > It's annoying that the new code in index_concurrently_swap() is more > verbose than the code being replaced, but it seems OK to me, since it > allows us to distinguish a null value in attstattarget from actual 0 > without complicating the get_attstattarget API (which I think we would > have to do if we wanted to use it here.) Yeah, this was annoying. Originally, I had it even more complicated, because I was trying to check if the actual (non-null) values are the same. But then I realized the new value is never set at this point. I think what the code is actually about is clearer now. And of course the 0003 patch gets rid of it anyway.
Attachment
On 2024-Jan-11, Peter Eisentraut wrote: > On 10.01.24 14:16, Alvaro Herrera wrote: > > Seems reasonable. Do we really need a catversion bump for this? > > Yes, this changes the order of the fields in pg_attribute. Ah, right. > > In get_attstattarget() I think we should return 0 for dropped columns > > without reading attstattarget, which is useless anyway, and if it did > > happen to return non-null, it might cause us to do stuff, which would be > > a waste. > > I ended up deciding to get rid of get_attstattarget() altogether and just do > the fetching inline in examine_attribute(). Because the previous API and > what you are discussing here is over-designed, since the only caller doesn't > call it with dropped columns or system columns anyway. This way these > issues are contained in the ANALYZE code, not in a very general place like > lsyscache.c. Sounds good. Maybe instead of having examine_attribute hand a -1 target to the analyze functions, we could just put default_statistics_target there. Analyze functions would never receive negative values, and we could remove that from the analyze functions. Maybe make VacAttrStats->attstattarget unsigned while at it. (This could be a separate patch.) > > It's annoying that the new code in index_concurrently_swap() is more > > verbose than the code being replaced, but it seems OK to me, since it > > allows us to distinguish a null value in attstattarget from actual 0 > > without complicating the get_attstattarget API (which I think we would > > have to do if we wanted to use it here.) > > Yeah, this was annoying. Originally, I had it even more complicated, > because I was trying to check if the actual (non-null) values are the same. > But then I realized the new value is never set at this point. I think what > the code is actually about is clearer now. Yeah, it's neat and the comment is clear enough. > And of course the 0003 patch gets rid of it anyway. I again didn't look at 0002 and 0003 very closely, but from 10,000 feet it looks mostly reasonable -- but I think naming the struct FormData_pg_attribute_extra is not a great idea, as it looks like there would have to be a catalog named pg_attribute_extra -- and I don't think I would make the "non-Data" pointer-to-struct typedef either. -- Álvaro Herrera 48°01'N 7°57'E — https://www.EnterpriseDB.com/
BTW I wanted to but didn't say so explicitly, so here goes: 0001 looks ready to go in. -- Álvaro Herrera Breisgau, Deutschland — https://www.EnterpriseDB.com/ "Find a bug in a program, and fix it, and the program will work today. Show the program how to find and fix a bug, and the program will work forever" (Oliver Silfridge)
On 12.01.24 12:16, Alvaro Herrera wrote: >>> In get_attstattarget() I think we should return 0 for dropped columns >>> without reading attstattarget, which is useless anyway, and if it did >>> happen to return non-null, it might cause us to do stuff, which would be >>> a waste. >> >> I ended up deciding to get rid of get_attstattarget() altogether and just do >> the fetching inline in examine_attribute(). Because the previous API and >> what you are discussing here is over-designed, since the only caller doesn't >> call it with dropped columns or system columns anyway. This way these >> issues are contained in the ANALYZE code, not in a very general place like >> lsyscache.c. > > Sounds good. I have committed this first patch. > Maybe instead of having examine_attribute hand a -1 target to the > analyze functions, we could just put default_statistics_target there. > Analyze functions would never receive negative values, and we could > remove that from the analyze functions. Maybe make > VacAttrStats->attstattarget unsigned while at it. (This could be a > separate patch.) But I now remembered why I didn't do this. The extended statistics code needs to know whether the statistics target was set or left as default, because it will then apply its own sequence of logic to determine a final value. (Maybe there is a way to untangle this further, but it's not as obvious as it seems.) At which point I then realized that extended statistics have their own statistics target catalog field and command, and we really should change that to match the changes done to attstattarget. So here is another patch that does all that again for stxstattarget. It's meant to mirror the attstattarget changes exactly. >> And of course the 0003 patch gets rid of it anyway. > > I again didn't look at 0002 and 0003 very closely, but from 10,000 feet > it looks mostly reasonable -- but I think naming the struct > FormData_pg_attribute_extra is not a great idea, as it looks like there > would have to be a catalog named pg_attribute_extra -- and I don't think > I would make the "non-Data" pointer-to-struct typedef either. I agree that this naming was problematic. After some introverted bikeshedding, I changed it to FormExtraData_pg_attribute. Obviously, other solutions are possible. I also removed the typedef as you suggested.
Attachment
Hi Peter, I took a look at this patch today. I think it makes sense to do this, and I haven't found any major issues with any of the three patches. A couple minor comments: 0001 ---- 1) I think this bit in ALTER STATISTICS docs is wrong: - <term><replaceable class="parameter">new_target</replaceable></term> + <term><literal>SET STATISTICS { <replaceable class="parameter">integer</replaceable> | DEFAULT }</literal></term> because it means we now have list entries for name, ..., new_name, new_schema, and then suddenly "SET STATISTICS { integer | DEFAULT }". That's a bit weird. 2) The newtarget handling in AlterStatistics seems rather confusing. Why does it get set to -1 just to ignore the value later? For a while I was 99% sure ALTER STATISTICS ... SET STATISTICS DEFAULT will set the field to -1. Maybe ditching the first if block and directly checking stmt->stxstattarget before setting repl_val/repl_null would be better? 3) I find it a bit tedious that making the stxstattarget field nullable means we now have to translate NULL to -1 in so many places. I know why it's that way, but it's ... not very convenient :-( 0002 ---- 1) I think InsertPgAttributeTuples comment probably needs to document what the new tupdesc_extra parameter does. 0003 ---- no comment, seems fine regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 06.03.24 22:34, Tomas Vondra wrote: > 0001 > ---- > > 1) I think this bit in ALTER STATISTICS docs is wrong: > > - <term><replaceable class="parameter">new_target</replaceable></term> > + <term><literal>SET STATISTICS { <replaceable > class="parameter">integer</replaceable> | DEFAULT }</literal></term> > > because it means we now have list entries for name, ..., new_name, > new_schema, and then suddenly "SET STATISTICS { integer | DEFAULT }". > That's a bit weird. Ok, how would you change it? List out the full clauses of the other variants under Parameters as well? We have similar inconsistencies on other ALTER reference pages, so I'm not sure what the preferred approach is. > 2) The newtarget handling in AlterStatistics seems rather confusing. Why > does it get set to -1 just to ignore the value later? For a while I was > 99% sure ALTER STATISTICS ... SET STATISTICS DEFAULT will set the field > to -1. Maybe ditching the first if block and directly checking > stmt->stxstattarget before setting repl_val/repl_null would be better? But we also need to continue accepting -1 for default on input. The current code achieves that, the proposed variant would not. Note that this patch matches the equivalent patch for attstattarget (4f622503d6d), which uses the same logic. We could change it if we have a better idea, but then we should change both. > 0002 > ---- > > 1) I think InsertPgAttributeTuples comment probably needs to document > what the new tupdesc_extra parameter does. Yes, I'll update that comment.
On 3/12/24 13:47, Peter Eisentraut wrote: > On 06.03.24 22:34, Tomas Vondra wrote: >> 0001 >> ---- >> >> 1) I think this bit in ALTER STATISTICS docs is wrong: >> >> - <term><replaceable >> class="parameter">new_target</replaceable></term> >> + <term><literal>SET STATISTICS { <replaceable >> class="parameter">integer</replaceable> | DEFAULT }</literal></term> >> >> because it means we now have list entries for name, ..., new_name, >> new_schema, and then suddenly "SET STATISTICS { integer | DEFAULT }". >> That's a bit weird. > > Ok, how would you change it? List out the full clauses of the other > variants under Parameters as well? I'd go with a parameter, essentially exactly as it used to be, except for adding the DEFAULT option. So the list would define new_target, and mention DEFAULT as a special value. > We have similar inconsistencies on other ALTER reference pages, so I'm > not sure what the preferred approach is. > Yeah, the other reference pages may have some inconsistencies too, but let's not add more. >> 2) The newtarget handling in AlterStatistics seems rather confusing. Why >> does it get set to -1 just to ignore the value later? For a while I was >> 99% sure ALTER STATISTICS ... SET STATISTICS DEFAULT will set the field >> to -1. Maybe ditching the first if block and directly checking >> stmt->stxstattarget before setting repl_val/repl_null would be better? > > But we also need to continue accepting -1 for default on input. The > current code achieves that, the proposed variant would not. > OK, I did not realize that. But then maybe this should be explained in a comment before the new "if" block, because people won't realize why it needs to be this way. > Note that this patch matches the equivalent patch for attstattarget > (4f622503d6d), which uses the same logic. We could change it if we have > a better idea, but then we should change both. > >> 0002 >> ---- >> >> 1) I think InsertPgAttributeTuples comment probably needs to document >> what the new tupdesc_extra parameter does. > > Yes, I'll update that comment. > OK. regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 12.03.24 14:32, Tomas Vondra wrote: > On 3/12/24 13:47, Peter Eisentraut wrote: >> On 06.03.24 22:34, Tomas Vondra wrote: >>> 0001 >>> ---- >>> >>> 1) I think this bit in ALTER STATISTICS docs is wrong: >>> >>> - <term><replaceable >>> class="parameter">new_target</replaceable></term> >>> + <term><literal>SET STATISTICS { <replaceable >>> class="parameter">integer</replaceable> | DEFAULT }</literal></term> >>> >>> because it means we now have list entries for name, ..., new_name, >>> new_schema, and then suddenly "SET STATISTICS { integer | DEFAULT }". >>> That's a bit weird. >> >> Ok, how would you change it? List out the full clauses of the other >> variants under Parameters as well? > > I'd go with a parameter, essentially exactly as it used to be, except > for adding the DEFAULT option. So the list would define new_target, and > mention DEFAULT as a special value. Ok, done that way (I think). >>> 2) The newtarget handling in AlterStatistics seems rather confusing. Why >>> does it get set to -1 just to ignore the value later? For a while I was >>> 99% sure ALTER STATISTICS ... SET STATISTICS DEFAULT will set the field >>> to -1. Maybe ditching the first if block and directly checking >>> stmt->stxstattarget before setting repl_val/repl_null would be better? >> >> But we also need to continue accepting -1 for default on input. The >> current code achieves that, the proposed variant would not. > > OK, I did not realize that. But then maybe this should be explained in a > comment before the new "if" block, because people won't realize why it > needs to be this way. In the new version, I tried to write this more explicitly, and updated tablecmds.c to match.
Attachment
On 3/14/24 11:13, Peter Eisentraut wrote: > On 12.03.24 14:32, Tomas Vondra wrote: >> On 3/12/24 13:47, Peter Eisentraut wrote: >>> On 06.03.24 22:34, Tomas Vondra wrote: >>>> 0001 >>>> ---- >>>> >>>> 1) I think this bit in ALTER STATISTICS docs is wrong: >>>> >>>> - <term><replaceable >>>> class="parameter">new_target</replaceable></term> >>>> + <term><literal>SET STATISTICS { <replaceable >>>> class="parameter">integer</replaceable> | DEFAULT }</literal></term> >>>> >>>> because it means we now have list entries for name, ..., new_name, >>>> new_schema, and then suddenly "SET STATISTICS { integer | DEFAULT }". >>>> That's a bit weird. >>> >>> Ok, how would you change it? List out the full clauses of the other >>> variants under Parameters as well? >> >> I'd go with a parameter, essentially exactly as it used to be, except >> for adding the DEFAULT option. So the list would define new_target, and >> mention DEFAULT as a special value. > > Ok, done that way (I think). > Seems OK to me. >>>> 2) The newtarget handling in AlterStatistics seems rather confusing. >>>> Why >>>> does it get set to -1 just to ignore the value later? For a while I was >>>> 99% sure ALTER STATISTICS ... SET STATISTICS DEFAULT will set the field >>>> to -1. Maybe ditching the first if block and directly checking >>>> stmt->stxstattarget before setting repl_val/repl_null would be better? >>> >>> But we also need to continue accepting -1 for default on input. The >>> current code achieves that, the proposed variant would not. >> >> OK, I did not realize that. But then maybe this should be explained in a >> comment before the new "if" block, because people won't realize why it >> needs to be this way. > > In the new version, I tried to write this more explicitly, and updated > tablecmds.c to match. WFM. It still seems a bit hard to read, but I don't know how to do it better. I guess it's how it has to be to deal with multiple default values in a backwards-compatible way. Good thing is it's localized in two places. regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 14.03.24 15:46, Tomas Vondra wrote: >>>>> 2) The newtarget handling in AlterStatistics seems rather confusing. >>>>> Why >>>>> does it get set to -1 just to ignore the value later? For a while I was >>>>> 99% sure ALTER STATISTICS ... SET STATISTICS DEFAULT will set the field >>>>> to -1. Maybe ditching the first if block and directly checking >>>>> stmt->stxstattarget before setting repl_val/repl_null would be better? >>>> >>>> But we also need to continue accepting -1 for default on input. The >>>> current code achieves that, the proposed variant would not. >>> >>> OK, I did not realize that. But then maybe this should be explained in a >>> comment before the new "if" block, because people won't realize why it >>> needs to be this way. >> >> In the new version, I tried to write this more explicitly, and updated >> tablecmds.c to match. > > WFM. It still seems a bit hard to read, but I don't know how to do it > better. I guess it's how it has to be to deal with multiple default > values in a backwards-compatible way. Good thing is it's localized in > two places. I have committed this patch series. Thanks.
On Sun, Mar 17, 2024 at 01:51:39PM +0100, Peter Eisentraut wrote: > I have committed this patch series. Thanks. My compiler is worried that "newtarget" might be getting used uninitialized. AFAICT there's no actual risk here, so I think initializing it to 0 is sufficient. I'll commit the attached patch shortly. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com