Thread: [HACKERS] [ patch ] pg_dump: new --custom-fetch-table and --custom-fetch-value parameters
[HACKERS] [ patch ] pg_dump: new --custom-fetch-table and --custom-fetch-value parameters
From
"Andrea Urbani"
Date:
Hello to everybody,
I had a problem with PostgreSQL 9.3.5 on 32-bit Linux with an old 2.6.26 kernel:
Program: pg_dump
Problem: if you have tables with big blob fields and you try to dump them with --inserts, you can get errors like
pg_dump: [archiver (db)] query failed: lost synchronization with server: got message type "D", length 847712348
pg_dump: [archiver (db)] query was: FETCH 100 FROM _pg_dump_cursor
or
pg_dump: [archiver (db)] query failed: ERROR: out of memory
DETAIL: Failed on request of size 1073741823.
pg_dump: [archiver (db)] query was: FETCH 100 FROM _pg_dump_cursor
I have solved it by adding two new parameters, --custom-fetch-table and --custom-fetch-value, to fetch fewer records for the specified table(s).
This does not completely solve the problem, but it gives you a better chance of being able to dump your database.
pg_dump --dbname=healthorganizer --username=hor --column-inserts --custom-fetch-table='"tDocumentsFiles"' --custom-fetch-value=25
I haven't tested the documentation: I had too many problems while building it (also with the original version, without my changes; probably I have bogus tools... and too little time to check/try...).
Attached are the patches for master and REL9_6_STABLE.
I hope it will help somebody else.
Bye
Andrea
matfanjol@users.sf.net
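For readers less familiar with pg_dump internals: in --inserts mode pg_dump reads each table through a cursor and issues repeated FETCH commands (the "FETCH 100 FROM _pg_dump_cursor" visible in the errors above). The proposal is simply to lower that row count for the named tables. The standalone C sketch below only illustrates that idea; the variable and helper names (custom_fetch_table, custom_fetch_value, table_fetch_size) are invented here and are not taken from the attached patch.

/*
 * Illustrative sketch only -- not the actual patch code.  It shows how a
 * per-table fetch size could be chosen inside the --inserts dump loop.
 * custom_fetch_table, custom_fetch_value and table_fetch_size() are
 * hypothetical names made up for this example.
 */
#include <stdio.h>
#include <string.h>

static const char *custom_fetch_table = "\"tDocumentsFiles\""; /* from --custom-fetch-table */
static int  custom_fetch_value = 25;                           /* from --custom-fetch-value */
static int  default_fetch_value = 100;                         /* pg_dump's usual FETCH 100 */

/* Pick the number of rows to request per FETCH for a given table. */
static int
table_fetch_size(const char *fqtn)
{
    if (custom_fetch_table != NULL && strcmp(fqtn, custom_fetch_table) == 0)
        return custom_fetch_value;
    return default_fetch_value;
}

int
main(void)
{
    char        query[256];

    /* The dump loop would issue this repeatedly until no rows come back. */
    snprintf(query, sizeof(query), "FETCH %d FROM _pg_dump_cursor",
             table_fetch_size("\"tDocumentsFiles\""));
    printf("%s\n", query);      /* prints: FETCH 25 FROM _pg_dump_cursor */
    return 0;
}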
Re: [HACKERS] [ patch ] pg_dump: new --custom-fetch-table and --custom-fetch-value parameters
From
Stephen Frost
Date:
Andrea,

* Andrea Urbani (matfanjol@mail.com) wrote:
> I had a problem with PostgreSQL 9.3.5 on 32-bit Linux with an old 2.6.26
> kernel:

Ok, though, to be clear, this is a feature request, so we wouldn't back-patch adding this to pg_dump.

> I have solved it by adding two new parameters, --custom-fetch-table and
> --custom-fetch-value, to fetch fewer records for the specified table(s).

Giving the user the ability to change the fetch size sounds interesting, though do we really need to specify it per table? What about just a --fetch-size=X option?

> This does not completely solve the problem, but it gives you a better
> chance of being able to dump your database.

That is certainly a worthwhile goal.

> pg_dump --dbname=healthorganizer --username=hor --column-inserts
> --custom-fetch-table='"tDocumentsFiles"' --custom-fetch-value=25

I don't particularly like the use of 'custom' in the name of the option; it seems like it's just a noise word and not really necessary.

> I haven't tested the documentation: I had too many problems while building
> it (also with the original version, without my changes; probably I have
> bogus tools... and too little time to check/try...).
> Attached are the patches for master and REL9_6_STABLE.

I agree the documentation can be a bit of a pain, but there are a lot of issues with the patch itself when it comes to the project style. The indentation doesn't look like it's all correct, and multi-line comments should be of the form:

/*
 * text here
 */

Lastly, it'd be good to have this patch added to https://commitfest.postgresql.org to have it formally reviewed in the commitfest cycle coming up in January.

Thanks!

Stephen
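For context on what a plain --fetch-size=X option would involve on the pg_dump side, here is a minimal, standalone sketch of parsing such a long option with getopt_long. The option code and variable names are assumptions for this example only; this is not pg_dump's actual option table.

/*
 * Minimal standalone sketch of parsing a hypothetical --fetch-size=X long
 * option with getopt_long.  Names and the option code are invented for
 * this example and do not come from pg_dump's source.
 */
#include <getopt.h>
#include <stdio.h>
#include <stdlib.h>

int
main(int argc, char **argv)
{
    int         fetch_size = 100;   /* default number of rows per FETCH */
    int         c;

    static struct option long_options[] = {
        {"fetch-size", required_argument, NULL, 100},
        {NULL, 0, NULL, 0}
    };

    while ((c = getopt_long(argc, argv, "", long_options, NULL)) != -1)
    {
        switch (c)
        {
            case 100:
                fetch_size = atoi(optarg);
                if (fetch_size <= 0)
                {
                    fprintf(stderr, "invalid --fetch-size value: %s\n", optarg);
                    exit(1);
                }
                break;
            default:
                exit(1);
        }
    }

    printf("would FETCH %d rows at a time\n", fetch_size);
    return 0;
}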
Re: [HACKERS] [ patch ] pg_dump: new --custom-fetch-table and --custom-fetch-value parameters
From
"Andrea Urbani"
Date:
>> I have solved it by adding two new parameters, --custom-fetch-table and
>> --custom-fetch-value, to fetch fewer records for the specified table(s).

> Giving the user the ability to change the fetch size sounds interesting,
> though do we really need to specify it per table? What about just a
> --fetch-size=X option?

...

> I don't particularly like the use of 'custom' in the name of the option;
> it seems like it's just a noise word and not really necessary.

I have used the "custom" parameters because I want to decrease the fetch size only on the tables with big blob fields. If we remove the "custom-fetch-table" parameter and provide only a "fetch-size" parameter, all the tables will use the new fetch size and the execution time will be slower (according to my few tests). But just "fetch-size" would be faster to use and maybe clearer.

Well, how do we go on? Do I add it to the commitfest, and somebody will decide and fix it?

Please, let me know.

Thank you
Andrea

Sent: Wednesday, December 21, 2016 at 6:44 PM
From: "Stephen Frost" <sfrost@snowman.net>
To: "Andrea Urbani" <matfanjol@mail.com>
Cc: pgsql-hackers@postgresql.org
Subject: Re: [HACKERS] [ patch ] pg_dump: new --custom-fetch-table and --custom-fetch-value parameters
Re: [HACKERS] [ patch ] pg_dump: new --custom-fetch-table and --custom-fetch-value parameters
From
Robert Haas
Date:
On Fri, Jan 20, 2017 at 12:52 AM, Andrea Urbani <matfanjol@mail.com> wrote:
> I have used the "custom" parameters because I want to decrease the fetch
> size only on the tables with big blob fields. If we remove the
> "custom-fetch-table" parameter and provide only a "fetch-size" parameter,
> all the tables will use the new fetch size and the execution time will be
> slower (according to my few tests). But just "fetch-size" would be faster
> to use and maybe clearer.
> Well, how do we go on? Do I add it to the commitfest, and somebody will
> decide and fix it?

OK, so I think the idea is that --custom-fetch-size affects only the tables mentioned in --custom-fetch-table. I understand why you want to do it that way, but it's kind of messy. Suppose somebody else comes along and wants to customize some other thing for some other set of tables. Then we'll have --custom2-otherthing and --custom2-tables? Blech.

Interestingly, this isn't the first attempt to solve a problem of this type. Kyotaro Horiguchi ran into a similar issue with postgres_fdw trying to fetch too much data at once from a remote server:

https://www.postgresql.org/message-id/20150122.192739.164180273.horiguchi.kyotaro%40lab.ntt.co.jp

In the end, all that got done there was a table-level-configurable fetch limit, and we could do the same thing here (e.g. by adding a dummy storage parameter that only pg_dump uses). But I think what we really ought to do is what Kyotaro Horiguchi proposed originally: have a way to limit the FETCH command to a certain number of bytes. If that number of bytes is exceeded, the FETCH stops after that row even if the number of rows the user requested isn't fulfilled yet. The user can FETCH again if they want more. Tom wasn't a big fan of this idea, but I thought it was clever and still do. And it's undeniable that it provides a much better solution to this problem than forcing the user to manually tweak the fetch size based on their installation-specific knowledge of which tables have blobs large enough that returning 100 rows at a time will exhaust the local server's memory.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
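To make the byte-limit idea more concrete: the proposal is for the server to stop a FETCH once a byte budget is exceeded. A client today can only approximate that by fetching very few rows at a time and tracking the accumulated size itself, roughly as in the libpq sketch below. This is only an illustration of the concept, not anything that exists in pg_dump or the backend; the connection string, cursor name and byte budget are assumptions made for this example.

/*
 * Client-side approximation of a byte-budgeted fetch loop, for illustration
 * only.  The server-side "stop FETCH after N bytes" behaviour discussed
 * above does not exist; here we fetch one row at a time and close out a
 * batch once an accumulated-size budget (of the text representation) is
 * crossed.  Connection string, cursor name and budget are made up.
 */
#include <stdio.h>
#include <libpq-fe.h>

int
main(void)
{
    PGconn     *conn = PQconnectdb("dbname=healthorganizer");
    size_t      budget = 64 * 1024 * 1024;      /* 64 MB per batch, arbitrary */

    if (PQstatus(conn) != CONNECTION_OK)
    {
        fprintf(stderr, "connection failed: %s", PQerrorMessage(conn));
        return 1;
    }

    PQclear(PQexec(conn, "BEGIN"));
    PQclear(PQexec(conn, "DECLARE big_cur CURSOR FOR "
                         "SELECT \"Content\" FROM \"tDocumentsFiles\""));

    for (;;)
    {
        size_t      batch_bytes = 0;
        int         rows_in_batch = 0;

        /* Fetch single rows until the byte budget for this batch is used up. */
        while (batch_bytes < budget)
        {
            PGresult   *res = PQexec(conn, "FETCH 1 FROM big_cur");

            if (PQresultStatus(res) != PGRES_TUPLES_OK || PQntuples(res) == 0)
            {
                PQclear(res);
                goto done;
            }
            batch_bytes += (size_t) PQgetlength(res, 0, 0);
            rows_in_batch++;
            /* ... write the row out here ... */
            PQclear(res);
        }
        fprintf(stderr, "batch done: %d rows, %zu bytes\n",
                rows_in_batch, batch_bytes);
    }

done:
    PQclear(PQexec(conn, "CLOSE big_cur"));
    PQclear(PQexec(conn, "COMMIT"));
    PQfinish(conn);
    return 0;
}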
Re: [HACKERS] [ patch ] pg_dump: new --custom-fetch-table and --custom-fetch-value parameters
From
"Andrea Urbani"
Date:
I'm a beginner here... anyway I try to share my ideas.

My situation has changed for the worse: I'm no longer able to make a pg_dump, neither with my custom fetch value (I have tried "1" as value = one row at a time) nor without "--column-inserts":

pg_dump: Dumping the contents of table "tDocumentsFiles" failed: PQgetResult() failed.
pg_dump: Error message from server: ERROR: out of memory
DETAIL: Failed on request of size 1073741823.
pg_dump: The command was: COPY public."tDocumentsFiles" ("ID_Document", "ID_File", "Name", "FileName", "Link", "Note", "Picture", "Content", "FileSize", "FileDateTime", "DrugBox", "DrugPicture", "DrugInstructions") TO stdout;

I don't know if the Kyotaro Horiguchi patch will solve this, because, again, I'm not able to get even one single row.

I have a similar problem trying to read and to write the blob fields with my program. Actually I'm working in pieces:

Read
r1) I get the length of the blob field
r2) I check the available free memory (on the client PC)
r3) I read pieces of the blob field, according to the free memory, appending them to a physical file

Write
w1) I check the length of the file to save inside the blob
w2) I check the available free memory (on the client PC)
w3) I create a temporary table on the server
w4) I add lines to this temporary table, writing pieces of the file according to the free memory
w5) I ask the server to write, inside the final blob field, the concatenation of the rows of the temporary data

The read and write are working now. Probably the free memory check should be done on both sides (client and server [does a function/view with the available free memory exist?]), taking the smallest one.

What do you think about using a similar approach in pg_dump?
a) go through the table getting the size of each row / field
b) when the size of the row or of the field is bigger than the value (provided or stored somewhere), read pieces of the field till the end

PS: I have seen there are "large objects" that can work via streams. My files are actually not bigger than 1 GB, but, ok, maybe in the future I will use them instead of the blobs.

Thank you
Andrea
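Andrea's chunked-read steps (r1-r3) can be expressed against an ordinary bytea column by asking the server for the field length first and then pulling fixed-size slices with substring(). The libpq sketch below is only an illustration of that approach under assumed names (table, column, key and chunk size); it is not code from the thread or from his program.

/*
 * Illustration of the chunked-read approach described above: get the total
 * length of a bytea field, then fetch it in fixed-size slices using
 * substring().  Table, column, key value and chunk size are assumptions
 * made for this example.
 */
#include <stdio.h>
#include <stdlib.h>
#include <libpq-fe.h>

#define CHUNK_BYTES (8 * 1024 * 1024)   /* 8 MB per slice, arbitrary */

int
main(void)
{
    PGconn     *conn = PQconnectdb("dbname=healthorganizer");
    PGresult   *res;
    long long   total,
                offset;

    if (PQstatus(conn) != CONNECTION_OK)
    {
        fprintf(stderr, "connection failed: %s", PQerrorMessage(conn));
        return 1;
    }

    /* r1) total length of the blob field for one row */
    res = PQexec(conn,
                 "SELECT octet_length(\"Content\") FROM \"tDocumentsFiles\" "
                 "WHERE \"ID_File\" = 1");
    if (PQresultStatus(res) != PGRES_TUPLES_OK || PQntuples(res) == 0)
    {
        fprintf(stderr, "length query failed: %s", PQerrorMessage(conn));
        PQclear(res);
        PQfinish(conn);
        return 1;
    }
    total = atoll(PQgetvalue(res, 0, 0));
    PQclear(res);

    /* r3) read the field slice by slice and append each piece to a file */
    for (offset = 0; offset < total; offset += CHUNK_BYTES)
    {
        char        query[256];

        snprintf(query, sizeof(query),
                 "SELECT substring(\"Content\" FROM %lld FOR %d) "
                 "FROM \"tDocumentsFiles\" WHERE \"ID_File\" = 1",
                 offset + 1, CHUNK_BYTES);      /* substring() is 1-based */
        res = PQexec(conn, query);
        if (PQresultStatus(res) != PGRES_TUPLES_OK)
        {
            fprintf(stderr, "fetch failed: %s", PQerrorMessage(conn));
            PQclear(res);
            break;
        }
        /* ... append PQgetvalue(res, 0, 0) (hex/escape encoded text) to a file ... */
        PQclear(res);
    }

    PQfinish(conn);
    return 0;
}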
Re: [HACKERS] [ patch ] pg_dump: new --custom-fetch-table and --custom-fetch-value parameters
From
Robert Haas
Date:
On Sat, Feb 11, 2017 at 9:56 AM, Andrea Urbani <matfanjol@mail.com> wrote:
> I'm a beginner here... anyway I try to share my ideas.
>
> My situation has changed for the worse: I'm no longer able to make a
> pg_dump, neither with my custom fetch value (I have tried "1" as value =
> one row at a time) nor without "--column-inserts":
>
> pg_dump: Dumping the contents of table "tDocumentsFiles" failed: PQgetResult() failed.
> pg_dump: Error message from server: ERROR: out of memory
> DETAIL: Failed on request of size 1073741823.
> pg_dump: The command was: COPY public."tDocumentsFiles" ("ID_Document", "ID_File", "Name", "FileName", "Link", "Note", "Picture", "Content", "FileSize", "FileDateTime", "DrugBox", "DrugPicture", "DrugInstructions") TO stdout;
>
> I don't know if the Kyotaro Horiguchi patch will solve this, because,
> again, I'm not able to get even one single row.

Yeah, if you can't fetch even one row, limiting the fetch size won't help. But why is that failing? A single 1GB allocation should be fine on most modern servers. I guess the fact that you're using a 32-bit build of PostgreSQL is probably a big part of the problem; there is probably only 2GB of available address space and you're trying to find a single, contiguous 1GB chunk. If you switch to a 64-bit PostgreSQL, things will probably get a lot better for you, unless the server's actual memory is also very small.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company