Re: Speeding up query, Joining 55mil and 43mil records. - Mailing list pgsql-performance

From Sven Geisler
Subject Re: Speeding up query, Joining 55mil and 43mil records.
Date
Msg-id 449967A3.3030801@aeccom.com
Whole thread Raw
In response to Speeding up query, Joining 55mil and 43mil records.  (nicky <nicky@valuecare.nl>)
List pgsql-performance
Hi Nicky,

I guess, you should try to upgrade the memory setting of PostgreSQL first.

work_mem = 65536

Is a bit low for such large joins.

Did you get a change to watch the directory
<PGDATA>/base/<DBOID>/pgsql_tmp to see how large the temporary file is
during this query. I'm sure that there is large file.

Anyhow, you can upgrade 'work_mem' to 1000000 which is 1 GB. Please note
that the parameter work_mem is per backend process. You will get
problems with multiple large queries at the same time.
You may move (link) the directory 'pgsql_tmp' to a very fast file system
if you still get large files in this directory.

You also can try to increase this settings:

checkpoint_segments = 256
checkpoint_timeout = 3600  # range 30-3600, in seconds
checkpoint_warning = 0     # 0 is off

Please read the PostgreSQL documentation about the drawbacks of this
setting as well as your setting 'fsync=off'.

Cheers
Sven.

nicky schrieb:
> Hello People,
>
> I'm trying to solve a 'what i feel is a' performance/configuration/query
> error on my side. I'm fairly new to configuring PostgreSQL so, i might
> be completely wrong with my configuration.
>
> My database consists of 44 tables, about 20GB. Two of those tables are
> 'big/huge'. Table src.src_faktuur_verricht contains 43million records
> (9GB) and table src.src_faktuur_verrsec contains 55million records (6GB).
>
> Below is the 'slow' query.
>
> INSERT INTO rpt.rpt_verrichting
> (verrichting_id
> ,verrichting_secid
> ,fout_status
> ,patientnr
> ,verrichtingsdatum
> ,locatie_code
> ,afdeling_code
> ,uitvoerder_code
> ,aanvrager_code
> ,verrichting_code
> ,dbcnr
> ,aantal_uitgevoerd
> ,kostenplaats_code
> ,vc_patientnr
> ,vc_verrichting_code
> ,vc_dbcnr
> )
> SELECT  t1.id
> ,       t0.secid
> ,       t1.status
> ,       t1.patientnr
> ,       t1.datum
> ,       t1.locatie
> ,       t1.afdeling
> ,       t1.uitvoerder
> ,       t1.aanvrager
> ,       t0.code
> ,       t1.casenr
> ,       t0.aantal
> ,       t0.kostplaats
> ,       null
> ,       null
> ,       null
> FROM    src.src_faktuur_verrsec t0 JOIN
>         src.src_faktuur_verricht t1 ON
>         t0.id = t1.id
> WHERE   substr(t0.code,1,2) not in ('14','15','16','17')
> AND     (substr(t0.correctie,4,1) <> '1' OR t0.correctie is null)
> AND     EXTRACT(YEAR from t1.datum) > 2004;
>
>
> Output from explain
>
> Hash Join  (cost=1328360.12..6167462.76 rows=7197568 width=118)
>   Hash Cond: (("outer".id)::text = ("inner".id)::text)
>
>   ->  Seq Scan on src_faktuur_verrsec t0  (cost=0.00..2773789.90
> rows=40902852 width=52)
>         Filter: ((substr((code)::text, 1, 2) <> '14'::text) AND
> (substr((code)::text, 1, 2) <> '15'::text) AND (substr((code)::text, 1,
> 2) <> '16'::text) AND (substr((code)::text, 1, 2) <> '17'::text) AND
> ((substr((correctie)::text, 4, 1) <> '1'::text) OR (correctie IS NULL)))
>   ->  Hash  (cost=1188102.97..1188102.97 rows=8942863 width=80)
>         ->  Bitmap Heap Scan on src_faktuur_verricht t1
> (cost=62392.02..1188102.97 rows=8942863 width=80)
>               Recheck Cond: (date_part('year'::text, datum) >
> 2004::double precision)
>               ->  Bitmap Index Scan on src_faktuur_verricht_idx1
> (cost=0.00..62392.02 rows=8942863 width=0)
>                     Index Cond: (date_part('year'::text, datum) >
> 2004::double precision)
>
>
> The db server runs PostgreSQL 8.1.4 on FreeBSD 6.1-Stable. 2GB of RAM.
> It contains two SATA150 disks, one contains PostgreSQL and the rest of
> the operating system and the other disk holds the pg_xlog directory.
>
> Changed lines from my postgresql.conf file
>
> shared_buffers = 8192
> temp_buffers = 4096
> work_mem = 65536
> maintenance_work_mem = 1048576
> max_fsm_pages = 40000
> fsync = off
> wal_buffers = 64
> effective_cache_size = 174848
>
> The query above takes around 42 minutes.
>
> However, i also have a wimpy desktop machine with 1gb ram. Windows with
> MSSQL 2000 (default installation), same database structure, same
> indexes, same query, etc and it takes 17 minutes. The big difference
> makes me think that i've made an error with my PostgreSQL configuration.
> I just can't seem to figure it out.
>
> Could someone perhaps give me some pointers, advice?
>
> Thanks in advance.
>
> Nicky
>

pgsql-performance by date:

Previous
From: "jody brownell"
Date:
Subject: Re: Help tuning autovacuum - seeing lots of relationbloat
Next
From: Csaba Nagy
Date:
Subject: Re: Help tuning autovacuum - seeing lots of relationbloat