Thread: COPY FROM : out of memory
Hi list !

When trying to import a 20M rows csv file into PostgreSQL, I get :

ERROR: out of memory
État SQL : 53200
Détail : Failed on request of size 1073741823.
Contexte : COPY tmp, line 1

The table has no index, no trigger, ... :

CREATE TABLE tmp
(
  c1 bigint,
  c2 character varying,
  c3 character varying
) WITHOUT OIDS;
ALTER TABLE tmp OWNER TO postgres;

The COPY command is very basic :

SET client_encoding TO UTF8;
COPY tmp FROM 'E:\\Production\\Temp\\detailrechercheutf8.csv' CSV;

PostgreSQL version is :
"PostgreSQL 8.1.5 on i686-pc-mingw32, compiled by GCC gcc.exe (GCC) 3.4.2 (mingw-special)"

I have ~1.5GB of RAM available, and ~4GB of free pagefile space.
Something wrong in my postgresql.conf ? I didn't do much tweaking though...

Regards
--
Arnaud
Arnaud Lesauvage <thewild@freesurf.fr> writes:
> When trying to import a 20M rows csv file into PostgreSQL, I get :
>
> ERROR: out of memory
> État SQL : 53200
> Détail : Failed on request of size 1073741823.
> Contexte : COPY tmp, line 1

Can you put together a self-contained example?  The reference to "line 1"
suggests that you wouldn't need the whole 20M row file, just the first few
rows ...

			regards, tom lane
On Thu, Nov 23, 2006 at 11:27:06AM -0500, Tom Lane wrote:
> Arnaud Lesauvage <thewild@freesurf.fr> writes:
> > When trying to import a 20M rows csv file into PostgreSQL, I get :
> >
> > ERROR: out of memory
> > État SQL : 53200
> > Détail : Failed on request of size 1073741823.
> > Contexte : COPY tmp, line 1
>
> Can you put together a self-contained example?  The reference to "line
> 1" suggests that you wouldn't need the whole 20M row file, just the
> first few rows ...

Maybe it's a line termination problem?

Have a nice day,
--
Martijn van Oosterhout <kleptog@svana.org>   http://svana.org/kleptog/
> From each according to his ability. To each according to his ability to litigate.
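One quick way to check for that (a sketch, assuming a Unix-like shell such
as the one already used for iconv; the filename is the one from the
original post) is to dump the first bytes of the file and count its lines:

    # Show how the start of the file is actually encoded; look for \r and
    # \n, and for \0 bytes left over from a UTF-16/UCS-4 export:
    od -c detailrechercheutf8.csv | head -20

    # Count the newlines the shell can see; 0 or 1 for a 20M-row file
    # means the line terminators are missing or unusual:
    wc -l detailrechercheutf8.csv

If od shows a \0 byte after every character, the file is still in a
multi-byte encoding such as UTF-16/UCS-4 rather than UTF-8.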
Unless it's not seeing the end of the first record AS the end of the first
record, and hence seeing the whole file as one record.

Terry

Tom Lane wrote:
> Arnaud Lesauvage <thewild@freesurf.fr> writes:
> > When trying to import a 20M rows csv file into PostgreSQL, I get :
> > ERROR: out of memory
> > État SQL : 53200
> > Détail : Failed on request of size 1073741823.
> > Contexte : COPY tmp, line 1
>
> Can you put together a self-contained example?  The reference to "line 1"
> suggests that you wouldn't need the whole 20M row file, just the first
> few rows ...
>
> 			regards, tom lane
Martijn van Oosterhout wrote:
> On Thu, Nov 23, 2006 at 11:27:06AM -0500, Tom Lane wrote:
> > Can you put together a self-contained example?  The reference to "line
> > 1" suggests that you wouldn't need the whole 20M row file, just the
> > first few rows ...
>
> Maybe it's a line termination problem?

I think you are right !
Trying to see the first line with sed outputs the whole file!
All I did was export the file in UNICODE from MSSQL, then convert it with
iconv -f "UCS-4-INTERNAL" -t "UTF-8" myfile.cvs.
I guess I still don't have the right encoding... :(
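For what it's worth, the "Unicode" export format of MSSQL is usually
UCS-2/UTF-16LE rather than UCS-4, so a conversion along these lines may be
closer (file names are illustrative; iconv writes to stdout, hence the
redirection):

    # Check what the exported file actually contains first:
    file detail_export.csv

    # Then convert from UTF-16LE (little-endian, the usual MSSQL export)
    # to UTF-8 into a new file:
    iconv -f UTF-16LE -t UTF-8 detail_export.csv > detail_utf8.csv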
Arnaud Lesauvage wrote:
> Martijn van Oosterhout wrote:
> > Maybe it's a line termination problem?
>
> I think you are right !
> Trying to see the first line with sed outputs the whole file!
> All I did was export the file in UNICODE from MSSQL, then convert it with
> iconv -f "UCS-4-INTERNAL" -t "UTF-8" myfile.cvs.
>
> I guess I still don't have the right encoding... :(

Did you set the encoding with \encoding?  I think it's critical for
determining line and field separators.  If you only do SET client_encoding,
the backend will work but psql may not.

Or do you mean that the first line of the text file is the whole file?  In
that case I'd guess that the iconv procedure is borked somehow, or maybe
the input file is OK for everything except the linefeeds(*).

(*) is "linefeed" plural or do you need to add an "s"?  Is the singular
"linefood"???

--
Alvaro Herrera                         http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.
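A minimal sketch of that suggestion, reusing the table and file path from
the original post (run inside psql before the import):

    -- \encoding tells psql itself about the encoding, in addition to
    -- setting client_encoding on the backend:
    \encoding UTF8
    COPY tmp FROM 'E:\\Production\\Temp\\detailrechercheutf8.csv' CSV;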
Thread: Postgres scalability and performance on windows

Hi all,

I have a postgres installation that's running at 70-80% CPU usage, while
an MSSQL7 installation did 'roughly' the same thing with 1-2% CPU load.
Here’s the scenario,
300 queries/second
Server: Postgres 8.1.4 on win2k server
CPU: Dual Xeon 3.6 Ghz,
Memory: 4GB RAM
Disks: 3 x 36gb , 15K RPM SCSI
C# based web application calling postgres functions using npgsql 0.7.
It's an almost completely read-only db apart from fortnightly updates.
Table 1 - About 300,000 rows with simple rectangles
Table 2 – 1 million rows
Total size: 300MB
Functions : Simple coordinate reprojection and intersection query + inner join of table1 and table2.
I think I have all the right indexes defined and indeed the performance for queries under low loads is fast.
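In rough form (a sketch only; this assumes PostGIS-style geometry columns,
and the column names are hypothetical), the indexes and query are along
these lines:

    -- GiST indexes on the geometry columns:
    CREATE INDEX table1_geom_idx ON table1 USING GIST (geom);
    CREATE INDEX table2_geom_idx ON table2 USING GIST (geom);

    -- The && (bounding-box overlap) operator is what can use the GiST
    -- index; the intersects() test then refines the candidate pairs:
    SELECT t1.id, t2.id
    FROM table1 t1
    JOIN table2 t2
      ON t1.geom && t2.geom
     AND intersects(t1.geom, t2.geom);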
==================================================================================
postgresql.conf has following settings
max_connections = 150
shared_buffers = 20000 # min 16 or max_connections*2, 8KB each
temp_buffers = 2000 # min 100, 8KB each
max_prepared_transactions = 25 # can be 0 or more
# note: increasing max_prepared_transactions costs ~600 bytes of shared memory
# per transaction slot, plus lock space (see max_locks_per_transaction).
work_mem = 512 # min 64, size in KB
#maintenance_work_mem = 16384 # min 1024, size in KB
max_stack_depth = 2048
effective_cache_size = 82728 # typically 8KB each
random_page_cost = 4 # units are one sequential page fetch
==================================================================================
SQL server caches all the data in memory which is making it faster(uses about 1.2GB memory- which is fine).
But postgres has everything spread across 10-15 processes, with each process using about 10-30MB, which is not nearly enough to cache all the data, so it ends up doing a lot of disk reads.
I've read that postgres depends on the OS to cache the files; I wonder if this is not happening on windows.
In any case I cannot believe that having 15-20 processes running on windows helps. Why not spawn threads instead of processes, which might
be far less expensive and more efficient? Is there any way of doing this?
My question is, should I just accept the performance I am getting as the limit on windows or should I be looking at some other params that I might have missed?
Thanks,
Gopal
On Thu, 23 Nov 2006 22:31:40 -0000
"Gopal" <gopal@getmapping.com> wrote:

> I have a postgres installation that's running at 70-80% CPU usage, while
> an MSSQL7 installation did 'roughly' the same thing with 1-2% CPU load.
>
> Here's the scenario,
> 300 queries/second
> Server: Postgres 8.1.4 on win2k server
> CPU: Dual Xeon 3.6 Ghz
> Memory: 4GB RAM
> Disks: 3 x 36gb, 15K RPM SCSI
> C# based web application calling postgres functions using npgsql 0.7.
> It's an almost completely read-only db apart from fortnightly updates.
>
> Table 1 - About 300,000 rows with simple rectangles
> Table 2 - 1 million rows
> Total size: 300MB
>
> Functions : Simple coordinate reprojection and intersection query +
> inner join of table1 and table2.
> I think I have all the right indexes defined and indeed the performance
> for queries under low loads is fast.
>
> postgresql.conf has following settings
>
> max_connections = 150
> shared_buffers = 20000 # min 16 or max_connections*2, 8KB each

Considering you have 4GB of RAM, you might want to allocate more than
160MB to shared buffers.

> temp_buffers = 2000 # min 100, 8KB each
> max_prepared_transactions = 25 # can be 0 or more
> work_mem = 512 # min 64, size in KB

Again, with 4GB of RAM, you may get some benefit from more than 1/2MB of
work space.

> SQL server caches all the data in memory which is making it faster (uses
> about 1.2GB memory - which is fine).
> But postgres has everything spread across 10-15 processes, with each
> process using about 10-30MB, not nearly enough to cache all the data, and
> ends up doing a lot of disk reads.

Allocate more shared buffers and PG will use it.

> I've read that postgres depends on OS to cache the files, I wonder if
> this is not happening on windows.

Yes, but it can access data even faster if it's in the shared buffer
space.  There are numerous write-ups on the Internet about this sort of
tuning.

> In any case I cannot believe that having 15-20 processes running on
> windows helps. Why not spawn threads instead of processes, which might
> be far less expensive and more efficient. Is there any way of doing this?

Because every other OS (Linux, BSD, Solaris, etc) does very well with
multiple spawned processes.  I expect that future versions of PG will have
some improvements to allow better performance on Windows, but you'll be
surprised how well it runs under a POSIX OS.

> My question is, should I just accept the performance I am getting as the
> limit on windows or should I be looking at some other params that I
> might have missed?

I have a feeling that some tuning would improve things for you.
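To make that concrete, the sort of adjustment being suggested for a 4GB box
might look something like the values below (illustrative only, not a
recommendation; 8.1 takes these settings in 8KB pages or in KB as noted,
and anything here needs testing against the real workload):

    shared_buffers = 50000          # ~400MB of 8KB buffers
    work_mem = 8192                 # 8MB per sort/hash, per backend
    effective_cache_size = 262144   # ~2GB hint about the OS file cache (8KB pages)

Changing shared_buffers requires a server restart; work_mem can also be
raised per session with SET for testing.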
On 11/23/06, Gopal <gopal@getmapping.com> wrote:
> I have a postgres installation that's running at 70-80% CPU usage, while
> an MSSQL7 installation did 'roughly' the same thing with 1-2% CPU load.

I somehow doubt ms sql server is 35x faster than postgresql in production
environments, even on windows.

> work_mem = 512 # min 64

This is probably too low.

> SQL server caches all the data in memory which is making it faster (uses
> about 1.2GB memory - which is fine).
> But postgres has everything spread across 10-15 processes, with each
> process using about 10-30MB, not nearly enough to cache all the data, and
> ends up doing a lot of disk reads.

This is a misleading and unfortunate shortcoming of the windows process
manager.  postgresql uses a lot of shared memory, and if you have shared
memory set to 10MB, each process in the task manager can report up to 10MB
(at the same time) even though only 10MB is really in use.

> I've read that postgres depends on OS to cache the files, I wonder if
> this is not happening on windows.

Are you suggesting postgresql somehow turned off file caching in windows?

> In any case I cannot believe that having 15-20 processes running on
> windows helps. Why not spawn threads instead of processes, which might
> be far less expensive and more efficient.

This was an important argument in, oh, say, 1992 :-).  Seriously, even
though processes are slower in windows than threads for certain things,
it's not as much as you'd expect, and it's certainly not causing the
performance issues you are suffering.

> My question is, should I just accept the performance I am getting as the
> limit on windows or should I be looking at some other params that I might
> have missed?

I'd start by logging queries with execution times and looking for the
queries that are running the slowest.

merlin
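A simple way to get that logging, assuming access to postgresql.conf (the
threshold is arbitrary):

    # Log every statement that takes longer than 200 ms:
    log_min_duration_statement = 200

A configuration reload (e.g. pg_ctl reload) picks the setting up without a
restart, after which the slow statements and their durations appear in the
server log.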
Alvaro Herrera wrote:
> Arnaud Lesauvage wrote:
> > I think you are right !
> > Trying to see the first line with sed outputs the whole file!
> > All I did was export the file in UNICODE from MSSQL, then convert it
> > with iconv -f "UCS-4-INTERNAL" -t "UTF-8" myfile.cvs.
> >
> > I guess I still don't have the right encoding... :(
>
> Did you set the encoding with \encoding?  I think it's critical for
> determining line and field separators.  If you only do SET
> client_encoding, the backend will work but psql may not.
>
> Or do you mean that the first line of the text file is the whole file?
> In that case I'd guess that the iconv procedure is borked somehow, or
> maybe the input file is OK for everything except the linefeeds(*)

No, I used "SET client_encoding".  But I checked the file with sed, and
sed agrees with PostgreSQL : there is just one line in the file.
I have a last idea. I'll give it a try today; if it doesn't work I'll
forget about this COPY stuff and work through ODBC.

--
Arnaud
Gopal wrote:
> I have a postgres installation that's running at 70-80% CPU usage, while
> an MSSQL7 installation did 'roughly' the same thing with 1-2% CPU load.
>
> (cut)
>
> Functions : Simple coordinate reprojection and intersection query +
> inner join of table1 and table2.
> I think I have all the right indexes defined and indeed the performance
> for queries under low loads is fast.
>
> (cut)
>
> In any case I cannot believe that having 15-20 processes running on
> windows helps. Why not spawn threads instead of processes, which might
> be far less expensive and more efficient. Is there any way of doing this?

Hi Gopal,

It sounds as if you are using PostGIS to store your geometries, and yes,
it sounds as if something is not performing as it should.  Please post
your configuration (along with information about the versions of PostGIS
you are using) to the postgis-users list at http://postgis.refractions.net.
You will also need to supply the output of EXPLAIN ANALYZE for some of
your queries in order to help determine exactly where the bottleneck is
in your application.

Kind regards,

Mark.
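A sketch of what that EXPLAIN ANALYZE request would look like, with
hypothetical table and column names standing in for the real function body:

    EXPLAIN ANALYZE
    SELECT t1.id, t2.id
    FROM table1 t1
    JOIN table2 t2 ON t1.geom && t2.geom;

In the output, the things to compare are the estimated versus actual row
counts on each node, and whether the join uses an index scan on the GiST
index or falls back to a sequential scan over the million-row table.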