Thread: [Re] Re: [Re] Re: Unknown winsock error 10061while dumping a big database
De : mailto:dev@archonet.com > Cyril VELTER wrote: > > > > maintenance_work_mem and work_mem are set to their default value (16M / 1M). > > Does Copy use any of these values ? The doc only state sort operations for > > work_mem and vacuum / create index / alter table add foreing key for > > maintenance_work_mem. > > You'll probably want to increase both of those (assuming you have a few > gigs of RAM). The server have 4G of ram. But doing a search in the source code it does not seem that these values are used during a copy operation. I will try to increase these values. > > > BTW, just dumping this table fail with the same error. > > Hmm - what version of 8.2 are you running on Windows? > > It's just that 8.2.4 has a fix that says: > > Allow pg_dump to do binary backups larger than two gigabytes on Windows > (Magnus) > > Now, whether that will affect you I'm not sure, since you said you were > dumping from Linux, with the server on Windows. I don't think so. The dump stop at 75GB (the 2GB limit is not a problem), I'm running 8.2.5 on both the client and the server and the dump is done on a linux machine. Cyril VELTER
Re: [Re] Re: [Re] Re: Unknown winsock error 10061while dumping a big database
From
Richard Huxton
Date:
Cyril VELTER wrote: >> It's just that 8.2.4 has a fix that says: >> >> Allow pg_dump to do binary backups larger than two gigabytes on Windows >> (Magnus) >> >> Now, whether that will affect you I'm not sure, since you said you were >> dumping from Linux, with the server on Windows. > > I don't think so. The dump stop at 75GB (the 2GB limit is not a problem), I'm > running 8.2.5 on both the client and the server and the dump is done on a linux > machine. Can you select all the rows in that table, or does that cause an error too? -- Richard Huxton Archonet Ltd
"Cyril VELTER" <cyril.velter@metadys.com> writes: > The server have 4G of ram. But doing a search in the source code it does not > seem that these values are used during a copy operation. They aren't. The "out of memory" complaint sounds more like it might be due to corrupt data, ie the olde scenario where a variable-width field's length word contains a ridiculously large value. I don't know how that would lead to a winsock error, though. regards, tom lane
[Re] Re: [Re] Re: [Re] Re: Unknown winsock error 10061while dumping a big database
From
"Cyril VELTER"
Date:
De : mailto:tgl@sss.pgh.pa.us > "Cyril VELTER" <cyril.velter@metadys.com> writes: > > The server have 4G of ram. But doing a search in the source code it does not > > seem that these values are used during a copy operation. > > They aren't. The "out of memory" complaint sounds more like it might be > due to corrupt data, ie the olde scenario where a variable-width field's > length word contains a ridiculously large value. I don't know how that > would lead to a winsock error, though. Yes this is very strange. The corruption scenario is plausible as the dump stop at nearly the same place each time (it's a live database so the place is not exactly the same). Also, the database have been upgraded recently (2007-11-09) from 8.0.13 to 8.2.5 so I suppose the corruption have occured between this date and now ? I have run the following query to get all record with one field over 10MB : select p2,length(p18155),length(p18154),length(p4065),length(p4083),length(p4020),lengt h(p4074),length(p3857),length(p32402),length(p5512),length(p18175) from c2674 where length(p18155)>10000000 or length(p18154)>10000000 or length(p4065)>10000000 or length(p4083)>10000000 or length(p4020)>10000000 or length(p4074)>10000000 or length(p3857)>10000000 or length(p32402)>10000000 or length(p5512)>10000000 or length(p18175)>10000000; The biggest value is 28034379. Is length() supposed to return the very high length in case of corruption ? Is there anythning else i can do ? Thanks, Cyril VELTER Table definition : Column | Type | Modifiers --------+-----------------------------+----------- p2 | bigint | p9 | boolean | p3337 | integer | p18155 | text | p18154 | text | p17561 | bigint | p4065 | text | p689 | bigint | p43711 | integer | p4083 | text | p4020 | text | p4029 | text | p4218 | timestamp without time zone | p4074 | text | p45127 | bigint | p3857 | text | p7 | timestamp without time zone | p6 | bigint | p5 | timestamp without time zone | p32402 | text | p5512 | bytea | p18175 | bytea | p42542 | bigint | p17562 | integer | p4 | bigint | Indexes: "idx_2674_p2" UNIQUE, btree (p2) "idx_2674_p17562" btree (p17562) "idx_2674_p4" btree (p4) "idx_2674_p4029" btree (p4029) "idx_2674_p4218" btree (p4218) "idx_2674_p42542" btree (p42542) "idx_2674_p45127" btree (p45127) "idx_2674_p5" btree (p5) "idx_2674_p6" btree (p6) "idx_2674_p689" btree (p689) "idx_2674_p7" btree (p7) Row count=1320365
Re: [Re] Re: [Re] Re: [Re] Re: Unknown winsock error 10061while dumping a big database
From
Richard Huxton
Date:
Cyril VELTER wrote: > > I have run the following query to get all record with one field over 10MB : > > select > p2,length(p18155),length(p18154),length(p4065),length(p4083),length(p4020),lengt > h(p4074),length(p3857),length(p32402),length(p5512),length(p18175) from c2674 > where length(p18155)>10000000 or length(p18154)>10000000 or > length(p4065)>10000000 or length(p4083)>10000000 or length(p4020)>10000000 or > length(p4074)>10000000 or length(p3857)>10000000 or length(p32402)>10000000 or > length(p5512)>10000000 or length(p18175)>10000000; > > The biggest value is 28034379. > > Is length() supposed to return the very high length in case of corruption ? You'd have thought it would. The odd thing (if it is data corruption) is that you would expect to see something in the server logs about a failure to allocate 12345412234124 bytes of memory or some such. Whereas all you get is this winsock error. > Is there anythning else i can do ? Could you try CREATE some_table AS SELECT * FROM c2674 WHERE ... to copy the large rows within the database. If that fails, the table is corrupted but you can identify the problem rows and work around them while you dump the data. If it doesn't fail, that suggests (to me, anyway) you've found a bug somewhere in the communication between server and client (which would explain the winsock error). -- Richard Huxton Archonet Ltd
[Re] Re: [Re] Re: [Re] Re: [Re] Re: Unknown winsock error 10061while dumping a big database
From
"Cyril VELTER"
Date:
De : mailto:dev@archonet.com > Cyril VELTER wrote: > > > > Is length() supposed to return the very high length in case of corruption ? > > You'd have thought it would. The odd thing (if it is data corruption) is > that you would expect to see something in the server logs about a > failure to allocate 12345412234124 bytes of memory or some such. Whereas > all you get is this winsock error. I have another theory. The message printed by pg_dump : "pg_dump: Error message from server: out of memory" is printed in dumpTableData_copy (pg_dump.c) : write_msg(NULL, "Error message from server: %s", PQerrorMessage(g_conn)); There are serveral places in libpq where the conn error might be set to "out of memory". I've also discovered that the machine running pg_dump is pretty tight in ram (256M) and that no swap file is active (oversight after a disk upgrade). May be this error is simply pg_dump running out of memory and not the server. This would also explain that the server only report a socket error (though I would have expected WSAECONNRESET (10054) instead of WSAECONNREFUSED (10061)). The 8.0.13 pg_dump used to work just fine on the exact same machine. Maybe 8.2 pg_dump need more memory (or maybe a memory leak ?). I'm running the dump again after adding some swap space and will monitor memory usage. What do you think ? > > Is there anythning else i can do ? > > Could you try CREATE some_table AS SELECT * FROM c2674 WHERE ... to copy > the large rows within the database. If that fails, the table is > corrupted but you can identify the problem rows and work around them > while you dump the data. I will try that this week end if my theory prove wrong (I need to make disk space available on the server for that, the table is 60GB). Thanks, Cyril
Re: [Re] Re: [Re] Re: [Re] Re: [Re] Re: Unknown winsock error 10061while dumping a big database
From
Richard Huxton
Date:
Cyril VELTER wrote: > De : mailto:dev@archonet.com >> Cyril VELTER wrote: >>> Is length() supposed to return the very high length in case of corruption > ? >> You'd have thought it would. The odd thing (if it is data corruption) is >> that you would expect to see something in the server logs about a >> failure to allocate 12345412234124 bytes of memory or some such. Whereas >> all you get is this winsock error. > > I have another theory. The message printed by pg_dump : > > "pg_dump: Error message from server: out of memory" > > is printed in dumpTableData_copy (pg_dump.c) : write_msg(NULL, "Error message > from server: %s", PQerrorMessage(g_conn)); > > There are serveral places in libpq where the conn error might be set to "out > of memory". I've also discovered that the machine running pg_dump is pretty > tight in ram (256M) and that no swap file is active (oversight after a disk > upgrade). May be this error is simply pg_dump running out of memory and not the > server. This would also explain that the server only report a socket error > (though I would have expected WSAECONNRESET (10054) instead of WSAECONNREFUSED > (10061)). Aha - that sounds likely. If you're dumping multi-megabyte rows I can see how you'd get into problems on a client with a small amount of RAM. > The 8.0.13 pg_dump used to work just fine on the exact same machine. Maybe 8.2 > pg_dump need more memory (or maybe a memory leak ?). Might just be slightly higher RAM usage in your particular case. It could be there were some trade-offs between size and speed. > I'm running the dump again after adding some swap space and will monitor > memory usage. > > What do you think ? I think you've found the problem. If you're short of RAM though you might also have difficulty restoring the dump. You could run pg_dump on the Windows server and copy its output to the RAM-limited Linux box. You could even run pg_restore from the Windows box - if you don't have a direct channel to the database you can use the putty ssh-client to create a tunnelled connection to the Linux box. -- Richard Huxton Archonet Ltd