Thread: Create table from view, large data - out of memory (postgresql 8.2.0)

Create table from view, large data - out of memory (postgresql 8.2.0)

From: Peter Petrov
Date: Mon, 26 Mar 2007 13:02:46 +0300

Hi all,

PostgreSQL version: 8.2.0 on Linux xeonito 2.6.19.3 #1 SMP Mon Feb 12
18:57:16 EET 2007 i686 i686 i386 GNU/Linux
Slackware 11.0.0

Memory parameters from postgresql.conf:
shared_buffers = 512MB
work_mem = 128MB
maintenance_work_mem = 512MB
max_fsm_pages = 1638400
effective_cache_size = 2200MB

cat /proc/sys/kernel/shmmax - 4000000000

Memory parameters are kept on the smaller side because I want to avoid
"out of memory" errors when a few queries run in parallel.

I'm trying to create a table as a SELECT from a view (a prototype). The
prototype view uses data aggregated by another view, which combines
(UNION ALL) data from 3 tables: ~14 GB, ~15 million rows.

CREATE TABLE  aa_clusters_raw AS SELECT * FROM clusters_raw_proto;
ERROR:  out of memory
DETAIL:  Failed on request of size 194.

And explain output:
EXPLAIN SELECT * FROM clusters_raw_proto;
                                                              QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------
 HashAggregate  (cost=6522535.23..6523961.81 rows=19021 width=340)
   ->  Hash Join  (cost=6517053.40..6520202.32 rows=54892 width=340)
         Hash Cond: (b.bts_sub = clusters.bts)
         ->  HashAggregate  (cost=6517032.51..6518932.51 rows=40000 width=488)
               ->  Subquery Scan allbts_new_proto  (cost=2016810.08..6461438.07 rows=1389861 width=488)
                     ->  Append  (cost=2016810.08..6444064.81 rows=1389861 width=496)
                           ->  Subquery Scan "*SELECT* 1"  (cost=2016810.08..2421192.58 rows=431342 width=496)
                                 ->  GroupAggregate  (cost=2016810.08..2416879.16 rows=431342 width=496)
                                       ->  Sort  (cost=2016810.08..2027593.61 rows=4313411 width=496)
                                             Sort Key: cell, "timestamp", bsc
                                             ->  Seq Scan on nbts_raw  (cost=0.00..575789.11 rows=4313411 width=496)
                           ->  Subquery Scan "*SELECT* 2"  (cost=2211725.24..2689544.91 rows=888967 width=118)
                                 ->  GroupAggregate  (cost=2211725.24..2680655.24 rows=888967 width=118)
                                       ->  Sort  (cost=2211725.24..2233949.41 rows=8889668 width=118)
                                             Sort Key: cell, "timestamp", bsc
                                             ->  Seq Scan on bts  (cost=0.00..623575.68 rows=8889668 width=118)
                           ->  Subquery Scan "*SELECT* 3"  (cost=1260555.90..1333327.33 rows=69552 width=180)
                                 ->  GroupAggregate  (cost=1260555.90..1332631.81 rows=69552 width=180)
                                       ->  Merge Join  (cost=1260555.90..1322504.14 rows=175098 width=180)
                                             Merge Cond: ((a.cell = b.cell) AND (a."timestamp" = b."timestamp") AND (a.bsc = b.bsc))
                                             ->  Sort  (cost=127343.39..129082.16 rows=695510 width=151)
                                                   Sort Key: a.cell, a."timestamp", a.bsc
                                                   ->  Seq Scan on ebts_raw a  (cost=0.00..59852.10 rows=695510 width=151)
                                             ->  Sort  (cost=1133212.51..1146524.33 rows=5324730 width=59)
                                                   Sort Key: b.cell, b."timestamp", b.bsc
                                                   ->  Seq Scan on ebts b  (cost=0.00..338126.30 rows=5324730 width=59)
         ->  Hash  (cost=18.51..18.51 rows=951 width=22)
               ->  Seq Scan on clusters  (cost=0.00..18.51 rows=951 width=22)


The resulting table should be less than 100 MB because of the
aggregation.

Is there a way to tell PostgreSQL to start writing the new table before
all the "calculations" are finished (something like automatic use of
cursors)?

I know it's possible by splitting the data using WHERE (roughly like the
sketch below), but is there another method?
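
What I mean by the WHERE splitting is something like this (the column
name and the date boundaries are only illustrative, assuming the view
exposes the "timestamp" column):

CREATE TABLE aa_clusters_raw AS
    SELECT * FROM clusters_raw_proto
    WHERE "timestamp" < '2007-01-01';      -- first slice creates the table

INSERT INTO aa_clusters_raw
    SELECT * FROM clusters_raw_proto
    WHERE "timestamp" >= '2007-01-01'
      AND "timestamp" <  '2007-02-01';     -- further slices are appended

-- ... one more statement per interval, so each run aggregates only a slice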

If I split the 3 "big" tables into smaller pieces (for example, storing
the data on a monthly basis, roughly as sketched below) and then use an
aggregation VIEW with constraints on the timestamp, the operation will be
faster, but I'm not so sure whether it will still be possible to create
just one table combining all the results.
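
One way I could do the monthly split would be 8.2-style inheritance
partitioning, roughly like this (using the bts table as an example; the
table names and date ranges are only illustrative):

CREATE TABLE bts_parent (LIKE bts);   -- empty parent with the same columns

CREATE TABLE bts_2007_01 (
    CHECK ("timestamp" >= '2007-01-01' AND "timestamp" < '2007-02-01')
) INHERITS (bts_parent);

CREATE TABLE bts_2007_02 (
    CHECK ("timestamp" >= '2007-02-01' AND "timestamp" < '2007-03-01')
) INHERITS (bts_parent);

-- with constraint_exclusion enabled, a query on bts_parent that is
-- constrained on "timestamp" only touches the matching monthly children
SET constraint_exclusion = on;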

Any suggestions are welcome :)


Regards,
Peter Petrov



Re: Create table from view, large data - out of memory (postgresql 8.2.0)

From: Martijn van Oosterhout
Date:

On Mon, Mar 26, 2007 at 01:02:46PM +0300, Peter Petrov wrote:
> Hi all,
>
> PostgreSQL version: 8.2.0 on Linux xeonito 2.6.19.3 #1 SMP Mon Feb 12
> 18:57:16 EET 2007 i686 i686 i386 GNU/Linux
> Slackware 11.0.0
>
> Memory parameters from postgresql.conf:
> shared_buffers = 512MB
> work_mem = 128MB
> maintenance_work_mem = 512MB
> max_fsm_pages = 1638400
> effective_cache_size = 2200MB
>
> cat /proc/sys/kernel/shmmax - 4000000000
>
> Memory parameters are kept on the smaller side because I want to avoid
> "out of memory" errors when a few queries run in parallel.

You don't say how much memory you have, but in any case I find your
parameters on the high side. They may be appropriate in your case, but
they're worth checking, especially if it's a 32-bit machine.

Note that work_mem is counted per sort, and probably for the hash too,
so that's potentially at least four times 128MB. You may want to check
the maximum amount of memory allowed for one process, because you're
likely going well over 1GB of memory here.
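
One thing you could try is keeping work_mem low for the session that
runs the big statement, so the sorts spill to disk instead of all trying
to fit in memory at once (just a sketch, the value is arbitrary):

SET work_mem = '32MB';    -- affects only the current session
CREATE TABLE aa_clusters_raw AS SELECT * FROM clusters_raw_proto;
RESET work_mem;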

You don't specify the exact query, but it's possible that by creating
some indexes you can avoid some of the sorts, which also saves memory.
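
For example, the plan sorts the big tables on (cell, "timestamp", bsc),
so indexes along those lines might let the planner skip the explicit
sorts (table names taken from your EXPLAIN output, so treat this as a
guess):

CREATE INDEX nbts_raw_cell_ts_bsc ON nbts_raw (cell, "timestamp", bsc);
CREATE INDEX bts_cell_ts_bsc      ON bts      (cell, "timestamp", bsc);
CREATE INDEX ebts_raw_cell_ts_bsc ON ebts_raw (cell, "timestamp", bsc);
CREATE INDEX ebts_cell_ts_bsc     ON ebts     (cell, "timestamp", bsc);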

Hope this helps,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
From each according to his ability. To each according to his ability to litigate.
