Create table from view, large data - out of memory (postgresql 8.2.0) - Mailing list pgsql-general

From Peter Petrov
Subject Create table from view, large data - out of memory (postgresql 8.2.0)
Msg-id 46079A46.3040804@demabg.com
Responses Re: Create table from view, large data - out of memory (postgresql 8.2.0)  (Martijn van Oosterhout <kleptog@svana.org>)
List pgsql-general
Hi all,

PostgreSQL version: 8.2.0 on Linux xeonito 2.6.19.3 #1 SMP Mon Feb 12
18:57:16 EET 2007 i686 i686 i386 GNU/Linux
Slackware 11.0.0

Memory parameters from postgresql.conf:
shared_buffers = 512MB
work_mem = 128MB
maintenance_work_mem = 512MB
max_fsm_pages = 1638400
effective_cache_size = 2200MB

cat /proc/sys/kernel/shmmax - 4000000000

I keep the memory parameters on the small side because I want to avoid
"out of memory" errors when a few queries run in parallel.
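
If a single bulk load needs more memory, I suppose I could raise
work_mem in just that one session and keep the small default for
everything else. I'm not sure it would help in this case, but the
per-session override itself is straightforward:

SET work_mem = '512MB';   -- affects only the current session
-- ... run the big query here ...
RESET work_mem;           -- back to the value from postgresql.conf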

I'm trying to create a table by selecting data from a view (a
prototype). The prototype view uses data aggregated by another view,
which combines (UNION ALL) three tables: ~14 GB, ~15 million rows in
total.

CREATE TABLE  aa_clusters_raw AS SELECT * FROM clusters_raw_proto;
ERROR:  out of memory
DETAIL:  Failed on request of size 194.

And the EXPLAIN output:
EXPLAIN SELECT * FROM clusters_raw_proto;
                                                              QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------
 HashAggregate  (cost=6522535.23..6523961.81 rows=19021 width=340)
   ->  Hash Join  (cost=6517053.40..6520202.32 rows=54892 width=340)
         Hash Cond: (b.bts_sub = clusters.bts)
         ->  HashAggregate  (cost=6517032.51..6518932.51 rows=40000 width=488)
               ->  Subquery Scan allbts_new_proto  (cost=2016810.08..6461438.07 rows=1389861 width=488)
                     ->  Append  (cost=2016810.08..6444064.81 rows=1389861 width=496)
                           ->  Subquery Scan "*SELECT* 1"  (cost=2016810.08..2421192.58 rows=431342 width=496)
                                 ->  GroupAggregate  (cost=2016810.08..2416879.16 rows=431342 width=496)
                                       ->  Sort  (cost=2016810.08..2027593.61 rows=4313411 width=496)
                                             Sort Key: cell, "timestamp", bsc
                                             ->  Seq Scan on nbts_raw  (cost=0.00..575789.11 rows=4313411 width=496)
                           ->  Subquery Scan "*SELECT* 2"  (cost=2211725.24..2689544.91 rows=888967 width=118)
                                 ->  GroupAggregate  (cost=2211725.24..2680655.24 rows=888967 width=118)
                                       ->  Sort  (cost=2211725.24..2233949.41 rows=8889668 width=118)
                                             Sort Key: cell, "timestamp", bsc
                                             ->  Seq Scan on bts  (cost=0.00..623575.68 rows=8889668 width=118)
                           ->  Subquery Scan "*SELECT* 3"  (cost=1260555.90..1333327.33 rows=69552 width=180)
                                 ->  GroupAggregate  (cost=1260555.90..1332631.81 rows=69552 width=180)
                                       ->  Merge Join  (cost=1260555.90..1322504.14 rows=175098 width=180)
                                             Merge Cond: ((a.cell = b.cell) AND (a."timestamp" = b."timestamp") AND (a.bsc = b.bsc))
                                             ->  Sort  (cost=127343.39..129082.16 rows=695510 width=151)
                                                   Sort Key: a.cell, a."timestamp", a.bsc
                                                   ->  Seq Scan on ebts_raw a  (cost=0.00..59852.10 rows=695510 width=151)
                                             ->  Sort  (cost=1133212.51..1146524.33 rows=5324730 width=59)
                                                   Sort Key: b.cell, b."timestamp", b.bsc
                                                   ->  Seq Scan on ebts b  (cost=0.00..338126.30 rows=5324730 width=59)
         ->  Hash  (cost=18.51..18.51 rows=951 width=22)
               ->  Seq Scan on clusters  (cost=0.00..18.51 rows=951 width=22)


Because of the aggregation, the resulting table should be less than
100 MB.

Is there a way to tell PostgreSQL to start writing the new table before
all the "calculations" are finished (something like automatic use of
cursors)?
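
By cursors I mean something like the manual equivalent below (the
cursor name and batch size are just illustrative). With an explicit
cursor the rows come back to the client in batches, so I would still
have to load them into the new table myself, e.g. with COPY:

BEGIN;
DECLARE c NO SCROLL CURSOR FOR SELECT * FROM clusters_raw_proto;
FETCH FORWARD 50000 FROM c;   -- repeat until no rows are returned
CLOSE c;
COMMIT;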

I know it's possible by splitting the data with WHERE clauses (see the
example below), but is there another method?
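
Here is the kind of WHERE-based splitting I mean; the "timestamp"
column and the cut-off date are only examples:

CREATE TABLE aa_clusters_raw AS
    SELECT * FROM clusters_raw_proto WHERE "timestamp" < '2007-01-01';
INSERT INTO aa_clusters_raw
    SELECT * FROM clusters_raw_proto WHERE "timestamp" >= '2007-01-01';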

If I split the three "big" tables into smaller pieces, for example
storing the data month by month, and then use an aggregation VIEW with
timestamp constraints, the operation should be faster. But I'm not sure
whether it would then still be possible to create just one table
combining all the results. A sketch of what I have in mind follows.
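
The monthly layout could look like this, with one child table per month
inheriting from the base table (the table names, dates, and the choice
of "bts" as the example parent are made up). With constraint_exclusion
turned on, a query that filters on "timestamp" should only touch the
relevant months:

CREATE TABLE bts_2007_01 (
    CHECK ("timestamp" >= '2007-01-01' AND "timestamp" < '2007-02-01')
) INHERITS (bts);
CREATE TABLE bts_2007_02 (
    CHECK ("timestamp" >= '2007-02-01' AND "timestamp" < '2007-03-01')
) INHERITS (bts);
-- ... one table per month; then query the parent table (or a
-- UNION ALL view over the children) as before
SET constraint_exclusion = on;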

Any suggestions are welcome :)


Regards,
Peter Petrov


