Thread: COPY FROM : out of memory

COPY FROM : out of memory

From
Arnaud Lesauvage
Date:
Hi list !

When trying to import a 20M rows csv file into PostgreSQL, I
get :

ERROR: out of memory
SQL state: 53200
Detail: Failed on request of size 1073741823.
Context: COPY tmp, line 1

The table has no index, no trigger, ... :

CREATE TABLE tmp
(
   c1 bigint,
   c2 character varying,
   c3 character varying
)
WITHOUT OIDS;
ALTER TABLE tmp OWNER TO postgres;

The COPY command is very basic :
SET client_encoding TO UTF8;
COPY tmp FROM
'E:\\Production\\Temp\\detailrechercheutf8.csv' CSV;

PostgreSQL version is :
"PostgreSQL 8.1.5 on i686-pc-mingw32, compiled by GCC
gcc.exe (GCC) 3.4.2 (mingw-special)"

I have ~1.5GB of RAM available, and ~4GB of free pagefile space.

Is something wrong in my postgresql.conf? I didn't do much
tweaking, though...

Regards
--
Arnaud

Re: COPY FROM : out of memory

From
Tom Lane
Date:
Arnaud Lesauvage <thewild@freesurf.fr> writes:
> When trying to import a 20M rows csv file into PostgreSQL, I
> get :

> ERROR: out of memory
> SQL state: 53200
> Detail: Failed on request of size 1073741823.
> Context: COPY tmp, line 1

Can you put together a self-contained example?  The reference to "line
1" suggests that you wouldn't need the whole 20M row file, just the
first few rows ...
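
For example, something along these lines should be enough (a rough sketch,
assuming you have head or similar available; the sample file name is made up):

head -n 10 detailrechercheutf8.csv > sample.csv

and then the same COPY against the sample:

SET client_encoding TO UTF8;
COPY tmp FROM 'E:\\Production\\Temp\\sample.csv' CSV;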

            regards, tom lane

Re: COPY FROM : out of memory

From
Martijn van Oosterhout
Date:
On Thu, Nov 23, 2006 at 11:27:06AM -0500, Tom Lane wrote:
> Arnaud Lesauvage <thewild@freesurf.fr> writes:
> > When trying to import a 20M rows csv file into PostgreSQL, I
> > get :
>
> > ERROR: out of memory
> > SQL state: 53200
> > Detail: Failed on request of size 1073741823.
> > Context: COPY tmp, line 1
>
> Can you put together a self-contained example?  The reference to "line
> 1" suggests that you wouldn't need the whole 20M row file, just the
> first few rows ...

Maybe it's a line termination problem?
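
A quick way to check, assuming a unix-ish shell is available, is to look at
the raw bytes of the first part of the file, e.g.:

od -c detailrechercheutf8.csv | head -n 20

and see whether the line ends show up as \n, \r\n, a bare \r, or nothing at
all (plus whether there are stray \0 bytes left over from the conversion).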

Have a nice day,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> From each according to his ability. To each according to his ability to litigate.


Re: COPY FROM : out of memory

From
Terry Fielder
Date:
Tom Lane wrote:
> Arnaud Lesauvage <thewild@freesurf.fr> writes:
>> When trying to import a 20M rows csv file into PostgreSQL, I
>> get :
>>
>> ERROR: out of memory
>> SQL state: 53200
>> Detail: Failed on request of size 1073741823.
>> Context: COPY tmp, line 1
>
> Can you put together a self-contained example?  The reference to "line
> 1" suggests that you wouldn't need the whole 20M row file, just the
> first few rows ...

Unless it's not seeing the end of the first record as the end of the first record, and hence is seeing the whole file as one record.
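
A quick sanity check would be something like (assuming a unix-style wc, e.g.
from Cygwin/MSYS, is available):

wc -l detailrechercheutf8.csv

If that reports 0 or 1 rather than ~20 million, COPY is indeed not seeing the
line ends.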

Terry

		regards, tom lane


Re: COPY FROM : out of memory

From
Arnaud Lesauvage
Date:
Martijn van Oosterhout wrote:
> On Thu, Nov 23, 2006 at 11:27:06AM -0500, Tom Lane wrote:
>> Arnaud Lesauvage <thewild@freesurf.fr> writes:
>> > When trying to import a 20M rows csv file into PostgreSQL, I
>> > get :
>>
>> > ERROR: out of memory
>> > SQL state: 53200
>> > Detail: Failed on request of size 1073741823.
>> > Context: COPY tmp, line 1
>>
>> Can you put together a self-contained example?  The reference to "line
>> 1" suggests that you wouldn't need the whole 20M row file, just the
>> first few rows ...
>
> Maybe it's a line termination problem?
>
> Have a nice day,

I think you are right!
Trying to see the first line with sed outputs the whole file!
All I did was export the file in UNICODE from MSSQL and convert
it with iconv -f "UCS-4-INTERNAL" -t "UTF-8" myfile.csv.

I guess I still don't have the right encoding... :(
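
(For reference, the check was just something along the lines of

sed -n '1p' detailrechercheutf8.csv

and it printed the entire file.)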

Re: COPY FROM : out of memory

From
Alvaro Herrera
Date:
Arnaud Lesauvage wrote:
> Martijn van Oosterhout wrote:
> >On Thu, Nov 23, 2006 at 11:27:06AM -0500, Tom Lane wrote:
> >>Arnaud Lesauvage <thewild@freesurf.fr> writes:
> >>> When trying to import a 20M rows csv file into PostgreSQL, I
> >>> get :
> >>
> >>> ERROR: out of memory
> >>> SQL state: 53200
> >>> Detail: Failed on request of size 1073741823.
> >>> Context: COPY tmp, line 1
> >>
> >>Can you put together a self-contained example?  The reference to "line
> >>1" suggests that you wouldn't need the whole 20M row file, just the
> >>first few rows ...
> >
> >Maybe it's a line termination problem?
>
> I think you are right !
> Trying to see the first line with sed outputs the whole file!
> All I did was export the file in UNICODE from MSSQL and convert
> it with iconv -f "UCS-4-INTERNAL" -t "UTF-8" myfile.csv.
>
> I guess I still don't have the right encoding... :(

Did you set the encoding with \encoding?  I think it's critical for
determining line and field separators.  If you only do SET
client_encoding, the backend will work but psql may not.
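
That is, instead of the SET, something along these lines (same COPY as
before, just telling psql about the encoding too):

\encoding UTF8
COPY tmp FROM 'E:\\Production\\Temp\\detailrechercheutf8.csv' CSV;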

Or do you mean that the first line of the text file is the whole file?  In
that case I'd guess that the iconv procedure is borked somehow, or maybe
the input file is OK for everything except the linefeed(*)
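
If the MSSQL export is really UTF-16 (which is what it usually labels
"Unicode") rather than UCS-4, then something along these lines might behave
better; just a guess, so check the raw export first (the input file name
here is made up):

iconv -f UTF-16LE -t UTF-8 detailrecherche-unicode.csv > detailrechercheutf8.csv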

(*) is "linefeed" plural or do you need to add an "s"?  Is the singular
"linefood"???

--
Alvaro Herrera                                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

Postgres scalability and performance on windows

From
"Gopal"
Date:

Hi all,

 

I have a postgres installation that's running at 70-80% CPU usage, while an MSSQL7 installation did 'roughly' the same thing with 1-2% CPU load.

 

Here’s the scenario,

300 queries/second

Server: Postgres 8.1.4 on win2k server

CPU: Dual Xeon 3.6 Ghz,

Memory: 4GB RAM

Disks: 3 x 36gb , 15K RPM SCSI

C# based web application calling postgres functions using npgsql 0.7.

It's an almost completely read-only db, apart from fortnightly updates.

 

Table 1 - About 300,000 rows with simple rectangles

Table 2 – 1 million rows

Total size: 300MB

 

Functions : Simple coordinate reprojection and intersection query + inner join of table1 and table2.

I think I have all the right indexes defined, and indeed the performance for queries under low load is fast.

 

 

==================================================================================

postgresql.conf has following settings

max_connections = 150

shared_buffers = 20000                            # min 16 or max_connections*2, 8KB each

temp_buffers = 2000                               # min 100, 8KB each

max_prepared_transactions = 25             # can be 0 or more

# note: increasing max_prepared_transactions costs ~600 bytes of shared memory

# per transaction slot, plus lock space (see max_locks_per_transaction).

work_mem = 512                                   # min 64, size in KB

#maintenance_work_mem = 16384                      # min 1024, size in KB

max_stack_depth = 2048

effective_cache_size = 82728                  # typically 8KB each

random_page_cost = 4                           # units are one sequential page fetch

==================================================================================

 

SQL Server caches all the data in memory, which is making it faster (it uses about 1.2GB of memory, which is fine).

But postgres has everything spread across 10-15 processes, with each process using about 10-30MB, which is not nearly enough to cache all the data, so it ends up doing a lot of disk reads.

I've read that postgres depends on the OS to cache the files; I wonder if this is not happening on windows.

 

In any case I cannot believe that having 15-20 processes running on windows helps. Why not spawn threads instead of processes, which might be far less expensive and more efficient? Is there any way of doing this?

 

My question is, should I just accept the performance I am getting as the limit on windows or should I be looking at some other params that I might have missed?

 

Thanks,

Gopal

Re: Postgres scalability and performance on windows

From
"Thomas H."
Date:
i'm seeing the opposite here on our win32 pgsql instance. while mssql server used ~50% cpu constantly (AND consumed lots of memory), pgsql runs at a low 1-5% and gives better performance even under relatively high load.
 
is the high cpu load coming from one particular postgres.exe process or is it distributed among all the processes?
 
try raising work_mem. we have set it to 30MB
 
- thomas
----- Original Message -----
From: Gopal
Sent: Thursday, November 23, 2006 11:31 PM
Subject: [GENERAL] Postgres scalability and performance on windows


Re: Postgres scalability and performance on windows

From
Bill Moran
Date:
On Thu, 23 Nov 2006 22:31:40 -0000
"Gopal" <gopal@getmapping.com> wrote:

> Hi all,
>
>
>
> I have a postgres installation that's running under 70-80% CPU usage
> while
>
> an MSSQL7 installation did 'roughly' the same thing with 1-2% CPU load.
>
>
>
> Here's the scenario,
>
> 300 queries/second
>
> Server: Postgres 8.1.4 on win2k server
>
> CPU: Dual Xeon 3.6 Ghz,
>
> Memory: 4GB RAM
>
> Disks: 3 x 36gb , 15K RPM SCSI
>
> C# based web application calling postgres functions using npgsql 0.7.
>
> It's an almost completely read-only db apart from fortnightly updates.
>
>
>
> Table 1 - About 300,000 rows with simple rectangles
>
> Table 2 - 1 million rows
>
> Total size: 300MB
>
>
>
> Functions : Simple coordinate reprojection and intersection query +
> inner join of table1 and table2.
>
> I think I have all the right indexes defined and indeed the performance
> for  queries under low loads is fast.
>
>
>
>
>
> ========================================================================
> ==========
>
> postgresql.conf has following settings
>
> max_connections = 150
>
> shared_buffers = 20000                            # min 16 or
> max_connections*2, 8KB each

Considering you have 4G of RAM, you might want to allocate more than 160M to
shared buffers.

> temp_buffers = 2000                               # min 100, 8KB each
>
> max_prepared_transactions = 25             # can be 0 or more
>
> # note: increasing max_prepared_transactions costs ~600 bytes of shared
> memory
>
> # per transaction slot, plus lock space (see max_locks_per_transaction).
>
> work_mem = 512                                   # min 64, size in KB

Again, with 4G of RAM, you may get some benefit from more than 1/2MB of
work space.
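
For instance (8.1 units: shared_buffers is in 8KB pages, work_mem in KB;
treat these as starting points to experiment with, not as a recommendation):

shared_buffers = 50000                  # ~400MB
work_mem = 8192                         # 8MB per sort/hash, per backend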

> SQL server caches all the data in memory which is making it faster(uses
> about 1.2GB memory- which is fine).
>
> But postgres has everything spread across 10-15 processes, with each
> process using about 10-30MB, not nearly enough to cache all the data and
> ends up doing a lot of disk reads.

Allocate more shared buffers and PG will use it.

> I've read that postgres depends on OS to cache the files, I wonder if
> this is not happening on windows.

Yes, but it can access data even faster if it's in the shared buffer
space.  There are numerous write-ups on the Internet about this sort
of tuning.

> In any case I cannot believe that having 15-20 processes running on
> windows helps. Why not spawn threads instead of processes, which
> might
>
> be far less expensive and more efficient. Is there any way of doing
> this?

Because every other OS (Linux, BSD, Solaris, etc) does very well with
multiple spawned processes.  I expect that future versions of PG will
have some improvements to allow better performance on Windows, but you'll
be surprised how well it runs under a POSIX OS.

> My question is, should I just accept the performance I am getting as the
> limit on windows or should I be looking at some other params that I
> might have missed?

I have a feeling that some tuning would improve things for you.

Re: Postgres scalability and performance on windows

From
"Merlin Moncure"
Date:
On 11/23/06, Gopal <gopal@getmapping.com> wrote:
> I have a postgres installation that's running under 70-80% CPU usage while
>
> an MSSQL7 installation did 'roughly' the same thing with 1-2% CPU load.

i somehow doubt ms sql server is 35x faster than postgresql in
production environments, even on windows.

> work_mem = 512                                   # min 64,
this is probably too low.

> SQL server caches all the data in memory which is making it faster(uses
> about 1.2GB memory- which is fine).
>
> But postgres has everything spread across 10-15 processes, with each process
> using about 10-30MB, not nearly enough to cache all the data and ends up
> doing a lot of disk reads.

this is a misleading and unfortunate shortcoming of the windows
process manager.  postgresql uses a lot of shared memory, and if you
have shared memory set to 10 mb, each process in the task manager can
report up to 10 mb (at the same time) even though only 10mb is really
in use.

> I've read that postgres depends on OS to cache the files, I wonder if this
> is not happening on windows.

Are you suggesting postgresql somehow turned off file caching in windows?

> In any case I cannot believe that having 15-20 processes running on windows
> helps. Why not spawn threads instead of processes, which might

this was an important argument in oh, say, 1992 :-).  Seriously, even
though processes are slower in windows than threads for certain
things, it's not as much as you'd expect, and it's certainly not the cause
of the performance issues you are suffering.

> My question is, should I just accept the performance I am getting as the
> limit on windows or should I be looking at some other params that I might
> have missed?

i'd start by logging queries with execution times and looking for
queries that are running the slowest.
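
e.g. in postgresql.conf (just a sketch; pick whatever threshold makes sense
for your load):

log_min_duration_statement = 250        # log any statement taking longer than 250ms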

merlin

Re: COPY FROM : out of memory

From
Arnaud Lesauvage
Date:
Alvaro Herrera wrote:
> Arnaud Lesauvage wrote:
>> Martijn van Oosterhout wrote:
>> >On Thu, Nov 23, 2006 at 11:27:06AM -0500, Tom Lane wrote:
>> >>Arnaud Lesauvage <thewild@freesurf.fr> writes:
>> >>> When trying to import a 20M rows csv file into PostgreSQL, I
>> >>> get :
>> >>
>> >>> ERROR: out of memory
>> >>> SQL state: 53200
>> >>> Detail: Failed on request of size 1073741823.
>> >>> Context: COPY tmp, line 1
>> >>
>> >>Can you put together a self-contained example?  The reference to "line
>> >>1" suggests that you wouldn't need the whole 20M row file, just the
>> >>first few rows ...
>> >
>> >Maybe it's a line termination problem?
>>
>> I think you are right !
>> Trying to see the first line with sed outputs the whole file!
>> All I did was export the file in UNICODE from MSSQL and convert
>> it with iconv -f "UCS-4-INTERNAL" -t "UTF-8" myfile.csv.
>>
>> I guess I still don't have the right encoding... :(
>
> Did you set the encoding with \encoding?  I think it's critical for
> determining line and field separators.  If you only do SET
> client_encoding, the backend will work but psql may not.
>
> Or do you mean that the first line of the text file is the whole file?  In
> that case I'd guess that the iconv procedure is borked somehow, or maybe
> the input file is OK for everything except the linefeed(*)

No, I used "SET client_encoding".
But I checked the file with sed, and sed agrees with
PostgreSQL: there is just one line in the file.
I have one last idea. I'll give it a try today; if it doesn't
work, I'll forget about this COPY stuff and work through ODBC.

--
Arnaud

Re: Postgres scalability and performance on windows

From
"Mark Cave-Ayland"
Date:
> Hi all,
>
>
>
> I have a postgres installation that's running under 70-80% CPU usage
> while
>
> an MSSQL7 installation did 'roughly' the same thing with 1-2% CPU load.
>
>
>
> Here's the scenario,
>
> 300 queries/second
>
> Server: Postgres 8.1.4 on win2k server
>
> CPU: Dual Xeon 3.6 Ghz,
>
> Memory: 4GB RAM
>
> Disks: 3 x 36gb , 15K RPM SCSI
>
> C# based web application calling postgres functions using npgsql 0.7.
>
> It's an almost completely read-only db apart from fortnightly updates.
>
>
>
> Table 1 - About 300,000 rows with simple rectangles
>
> Table 2 - 1 million rows
>
> Total size: 300MB
>
>
>
> Functions : Simple coordinate reprojection and intersection query +
> inner join of table1 and table2.
>
> I think I have all the right indexes defined and indeed the performance
> for  queries under low loads is fast.

(cut)

> SQL server caches all the data in memory which is making it faster(uses
> about 1.2GB memory- which is fine).
>
> But postgres has everything spread across 10-15 processes, with each
> process using about 10-30MB, not nearly enough to cache all the data and
> ends up doing a lot of disk reads.
>
> I've read that postgres depends on OS to cache the files, I wonder if
> this is not happening on windows.
>
> In any case I cannot believe that having 15-20 processes running on
> windows helps. Why not spawn threads instead of processes, which
> might
>
> be far less expensive and more efficient. Is there any way of doing
> this?


Hi Gopal,

It sounds as if you are using PostGIS to store your geometries, and yes it
sounds as if something is not performing as it should. Please post your
configuration (along with information about the versions of PostGIS you
are using) to the postgis-users list at http://postgis.refractions.net.
You will also need to supply the output of EXPLAIN ANALYZE for some of
your queries in order to help determine exactly where the bottleneck is in
your application.
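
For example (the table and column names below are made up; substitute the
actual query your function runs):

EXPLAIN ANALYZE
SELECT t1.id, t2.val
FROM table1 t1
INNER JOIN table2 t2 ON t2.table1_id = t1.id     -- made-up join condition
WHERE t1.geom && t2.geom;                        -- i.e. your bounding-box overlap test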


Kind regards,

Mark.