Thread: [HACKERS] [ patch ] pg_dump: new --custom-fetch-table and --custom-fetch-value parameters

Hello to everybody,
I had a problem with PostgreSQL 9.3.5 on 32-bit Linux, with an old 2.6.26 kernel:
Program: pg_dump
Problem: if you have tables with big blob fields and you try to dump them with --inserts, you can get errors like:
    pg_dump: [archiver (db)] query failed: lost synchronization with server: got message type "D", length 847712348
    pg_dump: [archiver (db)] query was: FETCH 100 FROM _pg_dump_cursor
    or
    pg_dump: [archiver (db)] query failed: ERROR:  out of memory
    DETAIL:  Failed on request of size 1073741823.
    pg_dump: [archiver (db)] query was: FETCH 100 FROM _pg_dump_cursor
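For what it's worth, the failing request size in the second error is suspicious on its own; a quick sanity check (my reading, not something stated in the error itself):

```python
# The failing request size is just under 1 GiB: 2^30 - 1 bytes, which,
# as far as I know, is PostgreSQL's single-allocation limit (MaxAllocSize).
# So it seems one whole field is being materialized in a single chunk.
assert 1073741823 == 2 ** 30 - 1
print("request size == MaxAllocSize (2^30 - 1 bytes)")
```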
    
I have solved it by adding two new parameters, --custom-fetch-table and --custom-fetch-value, to fetch fewer records for the specified table(s).
This does not completely solve the problem, but it improves your chances of being able to dump your database.
    pg_dump --dbname=healthorganizer --username=hor --column-inserts --custom-fetch-table='"tDocumentsFiles"' --custom-fetch-value=25
I haven't tested the documentation: I had too many problems while building it (also with the original version, without my changes; probably I have bogus tools... and too little time to check/try...).
Attached are the patches for master and REL9_6_STABLE.
I hope it will help somebody else.
 
Bye
Andrea
matfanjol@users.sf.net
 
Andrea,

* Andrea Urbani (matfanjol@mail.com) wrote:
>    I had a problem with a Postgresql 9.3.5 on 32 bit linux, old 2.6.26
>    kernel:

Ok, though, to be clear, this is a feature request, so we wouldn't
back-patch adding this to pg_dump.

>    I have solve it adding two new parameters, --custom-fetch-table and
>    --custom-fetch-value, to fetch less records for the specified table(s).

Giving the user the ability to change the fetch size sounds interesting,
though do we really need to specify it per table?  What about just a
--fetch-size=X option?

>    This does not completely solve the problem, but it helps you to get more
>    chance to be able to dump your database.

That is certainly a worthwhile goal.

>        pg_dump --dbname=healthorganizer --username=hor --column-inserts
>    --custom-fetch-table='"tDocumentsFiles"' --custom-fetch-value=25

I don't particularly like the use of 'custom' in the name of the option,
seems like it's just a noise word and not really necessary.

>    I haven't tested the documentation: too many problems while building it
>    (also the original version, without my changes; probably I have bogus
>    tools... and too less time to check/try...).
>    Attached the patches for the master and REL9_6_STABLE.

I agree the documentation can be a bit of a pain, but there's a lot of
issues with the patch itself when it comes to the project style.  The
indentation doesn't look like it's all correct, and multi-line comments
should be of the form:

/*
 * text here
 */

Lastly, it'd be good to have this patch added to
https://commitfest.postgresql.org to have it formally reviewed in the
commitfest cycle coming up in January.

Thanks!

Stephen

>> I have solve it adding two new parameters, --custom-fetch-table and
>> --custom-fetch-value, to fetch less records for the specified table(s).

> Giving the user the ability to change the fetch size sounds interesting,
> though do we really need to specify it per table? What about just a
> --fetch-size=X option?

...

> I don't particularly like the use of 'custom' in the name of the option,
> seems like it's just a noise word and not really necessary.

I have used "custom" parameters because I want to decrease the fetch size only on the tables with big blob fields. If
we remove the "custom-fetch-table" parameter and provide only a "fetch-size" parameter, all the tables will use the
new fetch size and the execution time will be slower (according to my few tests). But just "fetch-size" would be faster
to use and maybe clearer.
Well, how should we proceed? Shall I add it to the commitfest so that somebody can decide and fix it?
Please let me know.
Thank you
Andrea
 

Sent: Wednesday, December 21, 2016 at 6:44 PM
From: "Stephen Frost" <sfrost@snowman.net>
To: "Andrea Urbani" <matfanjol@mail.com>
Cc: pgsql-hackers@postgresql.org
Subject: Re: [HACKERS] [ patch ] pg_dump: new --custom-fetch-table and --custom-fetch-value parameters



On Fri, Jan 20, 2017 at 12:52 AM, Andrea Urbani <matfanjol@mail.com> wrote:
> I have used "custom" parameters because I want to decrease the fetch size only on the tables with big blob fields.
> If we remove the "custom-fetch-table" parameter and we provide only the "fetch-size" parameter all the tables will
> use the new fetch size and the execution time will be slower (according to my few tests). But just "fetch-size" will
> be faster to use and maybe more clear.
> Well, how to go on? I add it to the commitfest and somebody will decide and fix it?

OK, so I think the idea is that --custom-fetch-size affects only the
tables mentioned in --custom-fetch-table.  I understand why you want
to do it that way but it's kind of messy.  Suppose somebody else comes
along and wants to customize some other thing for some other set of
tables.  Then we'll have --custom2-otherthing and --custom2-tables?
Blech.

Interestingly, this isn't the first attempt to solve a problem of this
type.  Kyotaro Horiguchi ran into a similar issue with postgres_fdw
trying to fetch too much data at once from a remote server:

https://www.postgresql.org/message-id/20150122.192739.164180273.horiguchi.kyotaro%40lab.ntt.co.jp

In the end, all that got done there was a table-level-configurable
fetch limit, and we could do the same thing here (e.g. by adding a
dummy storage parameter that only pg_dump uses).  But I think what we
really ought to do is what Kyotaro Horiguchi proposed originally: have
a way to limit the FETCH command to a certain number of bytes.  If
that number of bytes is exceeded, the FETCH stops after that row even
if the number of rows the user requested isn't fulfilled yet.  The
user can FETCH again if they want more.
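The byte-limited FETCH behavior described above can be sketched in a few lines of Python. This is only a simulation of the idea, not actual server code: `cursor_rows` stands in for the server-side cursor, and row size is approximated by the raw byte length.

```python
def fetch_batch(cursor_rows, max_rows=100, max_bytes=10 * 1024 * 1024):
    """Return up to max_rows rows, stopping early once max_bytes is
    exceeded.  The row that crosses the budget is still returned, so the
    caller always makes progress and can simply FETCH again for more."""
    batch, total = [], 0
    for _ in range(max_rows):
        row = next(cursor_rows, None)  # None marks cursor exhaustion
        if row is None:
            break
        batch.append(row)
        total += len(row)
        if total >= max_bytes:
            break
    return batch

# Two 600-byte rows exceed a 1000-byte budget, so the batch stops at two
# rows even though up to 100 were requested; a second call fetches the rest.
rows = iter([b"a" * 600, b"b" * 600, b"c" * 100])
print(len(fetch_batch(rows, max_rows=100, max_bytes=1000)))  # 2
```

The key property is that a batch is never empty while rows remain, which is exactly why a giant single row still defeats this scheme, as discussed below in the thread.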

Tom wasn't a big fan of this idea, but I thought it was clever and
still do.  And it's undeniable that it provides a much better solution
to this problem than forcing the user to manually tweak the fetch size
based on their installation-specific knowledge of which tables have
blobs large enough that returning 100 rows at a time will exhaust the
local server's memory.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



I'm a beginner here... anyway I try to share my ideas.

My situation has changed for the worse: I'm no longer able to make a pg_dump, neither with my custom fetch value (I
have tried "1" as the value, i.e. one row at a time) nor without the "--column-inserts":

pg_dump: Dumping the contents of table "tDocumentsFiles" failed: PQgetResult() failed.
pg_dump: Error message from server: ERROR:  out of memory
DETAIL:  Failed on request of size 1073741823.
pg_dump: The command was: COPY public."tDocumentsFiles" ("ID_Document", "ID_File", "Name", "FileName", "Link", "Note",
"Picture", "Content", "FileSize", "FileDateTime", "DrugBox", "DrugPicture", "DrugInstructions") TO stdout;

I don't know if the Kyotaro Horiguchi patch will solve this, because, again, I'm not able to fetch even one single
row.
I have a similar problem trying to read and to write the blob fields with my program.
Actually I'm working in pieces:
Read:
  r1) I get the length of the blob field
  r2) I check the available free memory (on the client PC)
  r3) I read pieces of the blob field, according to the free memory, appending them to a physical file
Write:
  w1) I check the length of the file to save inside the blob
  w2) I check the available free memory (on the client PC)
  w3) I create a temporary table on the server
  w4) I add lines to this temporary table, writing pieces of the file according to the free memory
  w5) I ask the server to write, inside the final blob field, the concatenation of the rows of the temporary data
The read and write is working now.
Probably the free memory check should be done on both sides (client and server [does a function/view with the available
free memory exist?]), taking the smallest one.
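The write path (w1-w5) can be sketched as a small simulation. The piece-size heuristic here is an assumption of mine, and the SQL in the comment only illustrates what the server-side concatenation step might look like:

```python
def write_in_pieces(data, free_memory):
    """Split a large blob into memory-sized pieces (steps w1-w4),
    then concatenate them, as the server would do in step w5."""
    chunk = max(1, free_memory // 2)  # leave some headroom on the client
    pieces = [data[i:i + chunk] for i in range(0, len(data), chunk)]
    # On the server, step w5 could be something along the lines of:
    #   UPDATE t SET blob = (SELECT string_agg(piece, ''::bytea) FROM tmp)
    return b"".join(pieces)  # simulate the final server-side concatenation

# A 10-byte "file" with 4 bytes of free memory goes over in 2-byte pieces
# and is reassembled losslessly.
assert write_in_pieces(b"x" * 10, 4) == b"x" * 10
```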
What do you think about using a similar approach in pg_dump?
a) go through the table getting the size of each row / field
b) when the size of the row or of the field is bigger than the value (provided or stored somewhere), read pieces of the
field till the end
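Idea b), chunking only the oversized fields, could look roughly like this. `read_slice` is a hypothetical helper standing in for a server-side partial fetch (e.g. a `substring()` call on the column):

```python
def dump_field_in_pieces(read_slice, field_size, limit):
    """Yield one field in pieces no larger than `limit` bytes:
    fields under the limit come out whole, bigger ones are streamed."""
    offset = 0
    while offset < field_size:
        n = min(limit, field_size - offset)
        yield read_slice(offset, n)
        offset += n

# Simulated field: slicing a bytes object plays the role of the server.
data = b"abcdefgh"
pieces = list(dump_field_in_pieces(lambda o, n: data[o:o + n], len(data), 3))
print(pieces)  # [b'abc', b'def', b'gh']
```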

PS: I have seen there are "large objects" that can work via streams. My files are actually not bigger than 1 GB, but,
ok, maybe in the future I will use them instead of blobs.

Thank you 
Andrea



On Sat, Feb 11, 2017 at 9:56 AM, Andrea Urbani <matfanjol@mail.com> wrote:
> I'm a beginner here... anyway I try to share my ideas.
>
> My situation has changed for the worse: I'm no longer able to make a pg_dump, neither with my custom fetch value
> (I have tried "1" as the value, i.e. one row at a time) nor without the "--column-inserts":
>
> pg_dump: Dumping the contents of table "tDocumentsFiles" failed: PQgetResult() failed.
> pg_dump: Error message from server: ERROR:  out of memory
> DETAIL:  Failed on request of size 1073741823.
> pg_dump: The command was: COPY public."tDocumentsFiles" ("ID_Document", "ID_File", "Name", "FileName", "Link",
> "Note", "Picture", "Content", "FileSize", "FileDateTime", "DrugBox", "DrugPicture", "DrugInstructions") TO stdout;
>
> I don't know if the Kyotaro Horiguchi patch will solve this, because, again, I'm not able to fetch even one
> single row.

Yeah, if you can't fetch even one row, limiting the fetch size won't
help.  But why is that failing?  A single 1GB allocation should be
fine on most modern servers.  I guess the fact that you're using a
32-bit build of PostgreSQL is probably a big part of the problem;
there is probably only 2GB of available address space and you're
trying to find a single, contiguous 1GB chunk.  If you switch to using
a 64-bit PostgreSQL things will probably get a lot better for you,
unless the server's actual memory is also very small.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company