Re: Creating table with data from a join - Mailing list pgsql-general
From | Julien Rouhaud |
---|---|
Subject | Re: Creating table with data from a join |
Date | |
Msg-id | 55A53951.4080505@dalibo.com Whole thread Raw |
In response to | Re: Creating table with data from a join (Igor Stassiy <istassiy@gmail.com>) |
List | pgsql-general |
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On 14/07/2015 18:21, Igor Stassiy wrote: > Julien, I have the following setting for WAL level: #wal_level = > minimal (which defaults to minimal anyway) > > On Tue, Jul 14, 2015 at 6:19 PM Julien Rouhaud > <julien.rouhaud@dalibo.com <mailto:julien.rouhaud@dalibo.com>> > wrote: > > On 14/07/2015 11:12, Igor Stassiy wrote: >> Hello, >> >> I am benchmarking different ways of putting data into table on >> table creation: >> >> 1. INSERT INTO c SELECT * FROM a JOIN b on a.id <http://a.id> > <http://a.id> = b.id <http://b.id> >> <http://b.id>; 2. CREATE TABLE c AS SELECT * FROM a JOIN b on >> a.id <http://a.id> > <http://a.id> = b.id <http://b.id> >> <http://b.id>; 3. psql -c "COPY (SELECT * FROM a JOIN b on a.id >> <http://a.id> > <http://a.id> = b.id <http://b.id> >> <http://b.id>) TO STDOUT" | parallel --block 128M --jobs 4 --pipe >> psql -c "COPY c FROM STDIN"; >> >> (the parallel command is available as part of parallel deb >> package in Ubuntu for example, it splits the stdin by newline >> character and feeds it to the corresponding command) >> >> Both tables a and b have ~16M records and one of the columns in a >> is geometry (ranging from several KB in size to several MB). >> Columns in b are mostly integers. >> >> The machine that I am running these commands on has the >> following parameters: >> >> default_statistics_target = 50 # pgtune wizard 2012-06-06 >> maintenance_work_mem = 1GB # pgtune wizard 2012-06-06 >> constraint_exclusion = on # pgtune wizard 2012-06-06 >> checkpoint_completion_target = 0.9 # pgtune wizard 2012-06-06 >> effective_cache_size = 48GB # pgtune wizard 2012-06-06 work_mem = >> 80MB # pgtune wizard 2012-06-06 wal_buffers = 8MB # pgtune wizard >> 2012-06-06 checkpoint_segments = 16 # pgtune wizard 2012-06-06 >> shared_buffers = 16GB # pgtune wizard 2012-06-06 max_connections >> = 400 # pgtune wizard 2012-06-06 >> >> One would expect the 3rd option to be faster than 1 and 2, >> however 2 outperforms both by a large margin (sometimes x2). This >> is especially surprising taking into account that COPY doesn't >> acquire a global lock on the table, only a RowExclusiveLock >> (according to > http://www.postgresql.org/message-id/10611.1014867684@sss.pgh.pa.us) > > > > > What is wal_level value? I think this is because of an > optimisation happening with wal_level = minimal: > > "In minimal level, WAL-logging of some bulk operations can be > safely skipped, which can make those operations much faster" > > see > http://www.postgresql.org/docs/current/static/runtime-config-wal.html > > >> So is option 2 a winner by design? Could you please suggest >> other alternatives to try (if there are any)? And what might be >> the reason that 3 is not outperforming the other 2? >> >> Thank you, Igor >> >> > > > -- Julien Rouhaud http://dalibo.com - http://dalibo.org > - -- Julien Rouhaud http://dalibo.com - http://dalibo.org -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.17 (GNU/Linux) iQEcBAEBAgAGBQJVpTlRAAoJELGaJ8vfEpOqvI4H/RZygc5QXOuEZDWqmWRoZZ5N kNLWxPJbQ7cLpSNIUj3gJmq9bj0I3K071L09KbJWgxtwvQCzgiTsUIVURv7V83C6 nQ8CmrRr96+jKprx5Gw/uqSel8qnbi9LApl1IDqx9Hnd/HnyVOemND2gzHOQhsKN tvGuo4ac5yR+rsFA8FHuwXgSgVH2NEDL2n4Zv6jI2uwh5NRBeeGEn8MFKDZCSWN6 HXG9wZaelSrYbcSfumRg07RLnAmP6E/xbY1eB8dA17XmnFxE9AMTFy0YqJb8Kl5Z KvzQ6+VHnrW2zaoCUOGE56ra2La7TPeJxxeNA9U9Li+8GmvJIQHqIoQvLz7CzT8= =Ztkl -----END PGP SIGNATURE-----
pgsql-general by date: