Thread: some pg_dump query code simplification
There is a lot of version dependent code in pg_dump now, especially per-version queries. The way they were originally built was that we just have an entirely separate query per version. But as the number of versions and the size of the query grows, this becomes unwieldy. So I tried, as an example, to write the queries in getTableAttrs() differently. Instead of repeating the almost same large query in each version branch, use one query and add a few columns to the SELECT list depending on the version. This saves a lot of duplication and is easier to deal with. I think this patch is useful in its own right, but there is also the larger question of whether people think this is a better style going forward, for pg_dump and psql. (It's already in use in psql in some places.) It won't always work, if the query structure is very different between versions, but as this example shows, it can be useful. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachment
Peter Eisentraut <peter.eisentraut@2ndquadrant.com> writes: > There is a lot of version dependent code in pg_dump now, especially > per-version queries. The way they were originally built was that we > just have an entirely separate query per version. But as the number of > versions and the size of the query grows, this becomes unwieldy. Agreed, it's becoming a bit of a mess. > So I tried, as an example, to write the queries in getTableAttrs() > differently. Instead of repeating the almost same large query in each > version branch, use one query and add a few columns to the SELECT list > depending on the version. This saves a lot of duplication and is easier > to deal with. I think I had this discussion already with somebody, but ... I do not like this style at all: tbinfo->attidentity[j] = (i_attidentity >= 0 ? *(PQgetvalue(res, j, i_attidentity)) : '\0'); It's basically throwing away all the cross-checking that exists now to ensure that you didn't forget to deal with a column, misspell the column's AS name in one place or the other, etc etc. The code could be completely wrong and it'd still fail silently, producing a default result that might well be the right answer, making it hard to debug. I think the way to do this is to continue to require the query to produce the same set of columns regardless of server version. So the version-dependent bits would look like, eg, if (fout->remoteVersion >= 110000) appendPQExpBufferStr(q, "CASE WHEN a.atthasmissing AND NOT a.attisdropped " "THEN a.attmissingval ELSE null END AS attmissingval,\n"); else appendPQExpBufferStr(q, "null AS attmissingval,\n"); and the data extraction code would remain the same as now. Other than that nit, I agree that this shows a lot of promise. regards, tom lane
Greetings, * Tom Lane (tgl@sss.pgh.pa.us) wrote: > Peter Eisentraut <peter.eisentraut@2ndquadrant.com> writes: > > There is a lot of version dependent code in pg_dump now, especially > > per-version queries. The way they were originally built was that we > > just have an entirely separate query per version. But as the number of > > versions and the size of the query grows, this becomes unwieldy. > > Agreed, it's becoming a bit of a mess. > > > So I tried, as an example, to write the queries in getTableAttrs() > > differently. Instead of repeating the almost same large query in each > > version branch, use one query and add a few columns to the SELECT list > > depending on the version. This saves a lot of duplication and is easier > > to deal with. > > I think I had this discussion already with somebody, but ... I do not > like this style at all: > > tbinfo->attidentity[j] = (i_attidentity >= 0 ? *(PQgetvalue(res, j, i_attidentity)) : '\0'); > > It's basically throwing away all the cross-checking that exists now to > ensure that you didn't forget to deal with a column, misspell the column's > AS name in one place or the other, etc etc. The code could be completely > wrong and it'd still fail silently, producing a default result that might > well be the right answer, making it hard to debug. I think the way to do > this is to continue to require the query to produce the same set of > columns regardless of server version. So the version-dependent bits would > look like, eg, > > if (fout->remoteVersion >= 110000) > appendPQExpBufferStr(q, > "CASE WHEN a.atthasmissing AND NOT a.attisdropped " > "THEN a.attmissingval ELSE null END AS attmissingval,\n"); > else > appendPQExpBufferStr(q, > "null AS attmissingval,\n"); > > and the data extraction code would remain the same as now. > > Other than that nit, I agree that this shows a lot of promise. +1 to Tom's thoughts on this- seems like a much better approach. Overall, I definitely agree that it'd be nice to reduce the amount of duplication happening today. Working out some way to actually test all of those queries would be awful nice too. I wonder- what if we had an option to pg_dump to explicitly tell it what the server's version is and then have TAP tests to run with different versions? There may be some cases where that doesn't work, of course, and we'd need to address that, but having those tests might reduce the number of times we end up with something committed which doesn't work at all against some older version of PG because it wasn't tested... Thanks! Stephen
Attachment
Stephen Frost <sfrost@snowman.net> writes: > I wonder- what if we had an option to pg_dump to explicitly tell it what > the server's version is and then have TAP tests to run with different > versions? Uh ... telling it what the version is doesn't make that true, so I'd have no confidence in a test^H^H^H^Hkluge done that way. The way to test is to point it at an *actual* back-branch server. Andrew has a buildfarm module that does precisely that, although I'm not sure what its test dataset is --- probably the regression database from each branch. I also have a habit of doing such testing manually whenever I touch version-sensitive parts of pg_dump. Dunno about the idea of running the pg_dump TAP tests against back branches. I find that code sufficiently unreadable that maintaining several more copies of it doesn't sound like fun at all. regards, tom lane
Greetings, * Tom Lane (tgl@sss.pgh.pa.us) wrote: > Stephen Frost <sfrost@snowman.net> writes: > > I wonder- what if we had an option to pg_dump to explicitly tell it what > > the server's version is and then have TAP tests to run with different > > versions? > > Uh ... telling it what the version is doesn't make that true, so I'd > have no confidence in a test^H^H^H^Hkluge done that way. The way > to test is to point it at an *actual* back-branch server. I certainly agree that this would be ideal, but nonetheless, I've seen multiple cases where just trying to run the query, even against a current version, would have shown that it's malformed or has some issue which needs fixing and today we haven't even got that. > Andrew has a buildfarm module that does precisely that, although > I'm not sure what its test dataset is --- probably the regression > database from each branch. I also have a habit of doing such testing > manually whenever I touch version-sensitive parts of pg_dump. I've gotten better about doing that back-branch testing myself and certainly prefer it, but I think we should also have buildfarm coverage. I don't think we have the full matrix covered, or, really, anything anywhere near it, so I'm looking for other options to at least get that code exercised. > Dunno about the idea of running the pg_dump TAP tests against back > branches. I find that code sufficiently unreadable that maintaining > several more copies of it doesn't sound like fun at all. I hadn't been thinking we'd need more copies of it, simply a few more runs which have different version values specified, though even just doing a pg_dump of the regression suite for each combination would be something. Thanks! Stephen
Attachment
Stephen Frost <sfrost@snowman.net> writes: > * Tom Lane (tgl@sss.pgh.pa.us) wrote: >> Stephen Frost <sfrost@snowman.net> writes: >>> I wonder- what if we had an option to pg_dump to explicitly tell it what >>> the server's version is and then have TAP tests to run with different >>> versions? >> Uh ... telling it what the version is doesn't make that true, so I'd >> have no confidence in a test^H^H^H^Hkluge done that way. The way >> to test is to point it at an *actual* back-branch server. > I certainly agree that this would be ideal, but nonetheless, I've seen > multiple cases where just trying to run the query, even against a > current version, would have shown that it's malformed or has some issue > which needs fixing and today we haven't even got that. Yeah, but cases where we need to touch column C in one version and column D in another can't be made to pass when pg_dump is operating under a false assumption about the server version. So this seems like a nonstarter. regards, tom lane
Greetings, * Tom Lane (tgl@sss.pgh.pa.us) wrote: > Stephen Frost <sfrost@snowman.net> writes: > > * Tom Lane (tgl@sss.pgh.pa.us) wrote: > >> Stephen Frost <sfrost@snowman.net> writes: > >>> I wonder- what if we had an option to pg_dump to explicitly tell it what > >>> the server's version is and then have TAP tests to run with different > >>> versions? > > >> Uh ... telling it what the version is doesn't make that true, so I'd > >> have no confidence in a test^H^H^H^Hkluge done that way. The way > >> to test is to point it at an *actual* back-branch server. > > > I certainly agree that this would be ideal, but nonetheless, I've seen > > multiple cases where just trying to run the query, even against a > > current version, would have shown that it's malformed or has some issue > > which needs fixing and today we haven't even got that. > > Yeah, but cases where we need to touch column C in one version and column > D in another can't be made to pass when pg_dump is operating under a false > assumption about the server version. So this seems like a nonstarter. Yes, there are cases that wouldn't work. I'd be much happier with a complete solution, were that forthcoming. Thanks! Stephen
Attachment
On 08/28/2018 06:05 PM, Tom Lane wrote: > Stephen Frost <sfrost@snowman.net> writes: >> I wonder- what if we had an option to pg_dump to explicitly tell it what >> the server's version is and then have TAP tests to run with different >> versions? > Uh ... telling it what the version is doesn't make that true, so I'd > have no confidence in a test^H^H^H^Hkluge done that way. The way > to test is to point it at an *actual* back-branch server. > > Andrew has a buildfarm module that does precisely that, although > I'm not sure what its test dataset is --- probably the regression > database from each branch. I also have a habit of doing such testing > manually whenever I touch version-sensitive parts of pg_dump. It's all the databases from a buildfarm run apart from TAP tests. Since it uses USE_MODULE_DB=1 it covers most of the contrib modules plus the standard regression db, as well as isolation test and pl_regression (which should be taught to do separate DBs for each PL if requested). There is no reason it couldn't test more. > Dunno about the idea of running the pg_dump TAP tests against back > branches. I find that code sufficiently unreadable that maintaining > several more copies of it doesn't sound like fun at all. > > Agreed. The code could do with a lot of comments. I recently looked at adding something to it and decided I had more pressing things to do. cheers andrew -- Andrew Dunstan https://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 08/28/2018 06:10 PM, Stephen Frost wrote: > >> Andrew has a buildfarm module that does precisely that, although >> I'm not sure what its test dataset is --- probably the regression >> database from each branch. I also have a habit of doing such testing >> manually whenever I touch version-sensitive parts of pg_dump. > I've gotten better about doing that back-branch testing myself and > certainly prefer it, but I think we should also have buildfarm coverage. > I don't think we have the full matrix covered, or, really, anything > anywhere near it, so I'm looking for other options to at least get that > code exercised. > > If by matrix you mean the version matrix, then the module in question covers every live branch as the source and every same or later live branch as the destination. It could conceivably cover older branches as well - the branch names aren't actually hardcoded - I'd just have to be able to build them and do a standard buildfarm run once. cheers andrew -- Andrew Dunstan https://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 28/08/2018 23:43, Tom Lane wrote: > I think I had this discussion already with somebody, but ... I do not > like this style at all: > > tbinfo->attidentity[j] = (i_attidentity >= 0 ? *(PQgetvalue(res, j, i_attidentity)) : '\0'); OK, updated patch attached. If the updated style is acceptable, I'll start running more extensive tests against the older branches. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachment
Peter Eisentraut <peter.eisentraut@2ndquadrant.com> writes: > OK, updated patch attached. If the updated style is acceptable, I'll > start running more extensive tests against the older branches. Looks sane to me. regards, tom lane
Greetings, * Andrew Dunstan (andrew.dunstan@2ndquadrant.com) wrote: > On 08/28/2018 06:10 PM, Stephen Frost wrote: > >>Andrew has a buildfarm module that does precisely that, although > >>I'm not sure what its test dataset is --- probably the regression > >>database from each branch. I also have a habit of doing such testing > >>manually whenever I touch version-sensitive parts of pg_dump. > >I've gotten better about doing that back-branch testing myself and > >certainly prefer it, but I think we should also have buildfarm coverage. > >I don't think we have the full matrix covered, or, really, anything > >anywhere near it, so I'm looking for other options to at least get that > >code exercised. > > If by matrix you mean the version matrix, then the module in question covers > every live branch as the source and every same or later live branch as the > destination. That's certainly better than I had thought it did. > It could conceivably cover older branches as well - the branch names aren't > actually hardcoded - I'd just have to be able to build them and do a > standard buildfarm run once. Having it cover older branches really is pretty important given how much code there is for those older branches in pg_dump. I've had pretty good success compiling back down to about 7.2, as I recall, using patches that Greg Stark made available. Older is probably possible too. Getting everything we claim to support covered would be fantastic (which I believe is basically 9.6/9.5/9.4/9.3 against 7.0 and up, and master/11/10 against 8.0 and up). Thanks! Stephen
Attachment
Greetings, * Andrew Dunstan (andrew.dunstan@2ndquadrant.com) wrote: > On 08/28/2018 06:05 PM, Tom Lane wrote: > >Dunno about the idea of running the pg_dump TAP tests against back > >branches. I find that code sufficiently unreadable that maintaining > >several more copies of it doesn't sound like fun at all. > > Agreed. The code could do with a lot of comments. I recently looked at > adding something to it and decided I had more pressing things to do. I'm happy to add more comments to it.. There's a pretty good block that tries to describe how the tests work above where the tests are actually defined. What would help? Should I include an example in that code block? Or are you looking for more comments in the actual test definitions about what they're covering or why they're included? Or are you interested in comments about the actual code down at the bottom which runs the tests..? Thanks! Stephen