Re: Out of swap space & memory - Mailing list pgsql-general
From:           Kevin Bartz
Subject:        Re: Out of swap space & memory
Date:
Msg-id:         20040809183859.774213FCA5@omta16.mta.everyone.net
In response to: Re: Out of swap space & memory (Lincoln Yeoh <lyeoh@pop.jaring.my>)
Responses:      Re: Out of swap space & memory
                Re: Out of swap space & memory
List:           pgsql-general
Hi Lincoln! Thanks for your reply. On the problematic SELECT INTO, EXPLAIN says:

test=#
                            QUERY PLAN
------------------------------------------------------------------
 HashAggregate  (cost=42.50..42.50 rows=1000 width=356)
   ->  Seq Scan on hp_raw  (cost=0.00..20.00 rows=1000 width=356)
(2 rows)

Correct me if I'm wrong, but I'm guessing that what I should infer from this is that Postgres is indeed using hash aggregation. I just tried set enable_hashagg = off, per your suggestion, and the SELECT INTO is grouping away without blowing up this time. Unfortunately, there's no end in sight: I've let it sit and crank for an hour now, and it's still going. MS SQL Server, with as much RAM and less clock speed, took six minutes for this de-duping!

For comparison, I also tried some other operations, like "select count(*) from octanenights" and creating indexes, and in all instances Postgres finished in about three-fourths of the time SQL Server took. So why is this SELECT INTO taking so much longer in comparison? Is there any other option I can set to make it move a little faster?

Thanks for all your help.

Kevin

-----Original Message-----
From: pgsql-general-owner@postgresql.org [mailto:pgsql-general-owner@postgresql.org] On Behalf Of Lincoln Yeoh
Sent: Friday, August 06, 2004 9:25 PM
To: Kevin Bartz; 'Manfred Koizar'; pgsql-general@postgresql.org
Subject: Re: [GENERAL] Out of swap space & memory

What does EXPLAIN <your query here> say?

I had an issue[1] with 7.4.1 where it was using "HashAggregate" because it thought everything could fit in memory, but it couldn't. So I downgraded to 7.3.4 and it worked.

In my case Tom Lane suggested a workaround for the 7.4.1 planner erroneously using HashAggregate:

set enable_hashagg = off

But without an EXPLAIN it's harder for people here to help you.

I wonder if 7.4.x could throw a warning and switch to a less memory-intensive query plan if it discovers it is running out of memory.

Hope this helps,

Link.
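[Editor's note: for readers hitting the same problem, Lincoln's workaround can be combined with a larger sort-memory allowance and a statistics refresh. The session below is a sketch, not from the original thread; the sort_mem value is illustrative, and sort_mem is the 7.4-era name for the setting later renamed work_mem. The default-looking rows=1000 estimates in the EXPLAIN above suggest the table may never have been ANALYZEd, which is exactly what misleads the planner into choosing HashAggregate.]

```sql
-- Session-local workaround: steer the 7.4.1 planner away from
-- HashAggregate, which it picks when it (wrongly) thinks the
-- grouped result fits in memory.
SET enable_hashagg = off;

-- Give the sort-based GROUP BY more room (value in KB; illustrative).
-- In 7.4 this setting is called sort_mem; later releases renamed it work_mem.
SET sort_mem = 65536;

-- Refresh planner statistics so row estimates reflect the real table size.
ANALYZE octanenights_raw;
```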
[1] see thread: "postmaster growing to consume all memory"

At 07:32 PM 8/6/2004 -0700, Kevin Bartz wrote:
>Mike, thanks so much for your reply. I'm sorry for not showing you my SQL. I
>didn't show it because I couldn't manage to boil it down to something
>reproducible that everyone could try. But here's what it was:
>
>drop table octanenights;
>CREATE TABLE octanenights (member_id varchar(100), campaign_id varchar(100),
>catalog_type varchar(100), pushed int, delivered int, clicks int, opened
>int, month varchar(100), type1 int, type2 int, type3 int, type4 int, type5
>int);
>
>copy octanenights from
>'/home/kevin/octanenights/proc/uberfile/uberfile1.txt' null as '';
>copy octanenights from
>'/home/kevin/octanenights/proc/uberfile/uberfile2.txt' null as '';
>copy octanenights from
>'/home/kevin/octanenights/proc/uberfile/uberfile3.txt' null as '';
>
>select * from octanenights limit 10;
>alter table octanenights rename to octanenights_raw;
>
>-- de-dup the table
>select member_id, campaign_id, catalog_type, pushed, delivered, clicks,
>opened, month, type1, type2, type3, type4, type5
>into octanenights
>from octanenights_raw
>group by member_id, campaign_id, catalog_type, pushed, delivered, clicks,
>         opened, month, type1, type2, type3, type4, type5;
>
>Let me tell you a little about octanenights. It's a file of about 36,000,000
>rows, each describing an e-mail sent. Unfortunately, there are duplicate
>records scattered throughout the table, which I do not care about. One might
>suggest that I could've used uniq from the command line for this, but the
>data were not sorted originally and the duplicate records may be scattered
>anywhere in the table. The objective in the final line is to de-dup the
>table and place it into octanenights, leaving the original in
>octanenights_raw in case I ever need to refer back to it.
>
>MS SQL Server, with as much RAM and less clock speed, de-dups the table in
>about six minutes. The de-duped version has about 26,000,000 rows. The final
>line is where Postgres gobbles up all my swap and RAM and then conks out
>completely.
>
>Am I doing something wrong? Maybe there was a better way to approach this
>problem? I'd be open to suggestions of any kind, since I'm still very, very
>new to the world of optimizing Postgres.
>
>Kevin

---------------------------(end of broadcast)---------------------------
TIP 4: Don't 'kill -9' the postmaster
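[Editor's note: since every output column appears in the GROUP BY list, the de-dup in the quoted SQL can equivalently be written with SELECT DISTINCT. In the 7.4 era, DISTINCT was always planned as a sort followed by a Unique step, which spills to disk instead of exhausting RAM the way a misestimated HashAggregate does. A sketch, not tested against Kevin's data:]

```sql
-- Equivalent de-dup: DISTINCT over all columns instead of GROUP BY.
-- PostgreSQL 7.4 implements DISTINCT via sort + Unique, so this
-- sidesteps hash aggregation entirely.
SELECT DISTINCT member_id, campaign_id, catalog_type, pushed, delivered,
       clicks, opened, month, type1, type2, type3, type4, type5
INTO octanenights
FROM octanenights_raw;
```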