Re: Creating table with data from a join - Mailing list pgsql-general

From Julien Rouhaud
Subject Re: Creating table with data from a join
Date
Msg-id 55A53951.4080505@dalibo.com
Whole thread Raw
In response to Re: Creating table with data from a join  (Igor Stassiy <istassiy@gmail.com>)
List pgsql-general
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 14/07/2015 18:21, Igor Stassiy wrote:
> Julien, I have the following setting for WAL level: #wal_level =
> minimal (which defaults to minimal anyway)
>
> On Tue, Jul 14, 2015 at 6:19 PM Julien Rouhaud
> <julien.rouhaud@dalibo.com <mailto:julien.rouhaud@dalibo.com>>
> wrote:
>
> On 14/07/2015 11:12, Igor Stassiy wrote:
>> Hello,
>>
>> I am benchmarking different ways of putting data into table on
>> table creation:
>>
>> 1. INSERT INTO c SELECT * FROM a JOIN b on a.id <http://a.id>
> <http://a.id> = b.id <http://b.id>
>> <http://b.id>; 2. CREATE TABLE c AS SELECT * FROM a JOIN b on
>> a.id <http://a.id>
> <http://a.id> = b.id <http://b.id>
>> <http://b.id>; 3. psql -c "COPY (SELECT * FROM a JOIN b on a.id
>> <http://a.id>
> <http://a.id> = b.id <http://b.id>
>> <http://b.id>) TO STDOUT" | parallel --block 128M --jobs 4 --pipe
>> psql -c "COPY c FROM STDIN";
>>
>> (the parallel command is available as part of parallel deb
>> package in Ubuntu for example, it splits the stdin by newline
>> character and feeds it to the corresponding command)
>>
>> Both tables a and b have ~16M records and one of the columns in a
>> is geometry (ranging from several KB in size to several MB).
>> Columns in b are mostly integers.
>>
>> The machine that I am running these commands on has the
>> following parameters:
>>
>> default_statistics_target = 50 # pgtune wizard 2012-06-06
>> maintenance_work_mem = 1GB # pgtune wizard 2012-06-06
>> constraint_exclusion = on # pgtune wizard 2012-06-06
>> checkpoint_completion_target = 0.9 # pgtune wizard 2012-06-06
>> effective_cache_size = 48GB # pgtune wizard 2012-06-06 work_mem =
>> 80MB # pgtune wizard 2012-06-06 wal_buffers = 8MB # pgtune wizard
>> 2012-06-06 checkpoint_segments = 16 # pgtune wizard 2012-06-06
>> shared_buffers = 16GB # pgtune wizard 2012-06-06 max_connections
>> = 400 # pgtune wizard 2012-06-06
>>
>> One would expect the 3rd option to be faster than 1 and 2,
>> however 2 outperforms both by a large margin (sometimes x2). This
>> is especially surprising taking into account that COPY doesn't
>> acquire a global lock on the table, only a RowExclusiveLock
>> (according to
> http://www.postgresql.org/message-id/10611.1014867684@sss.pgh.pa.us)
>
>
>
>
> What is wal_level value? I think this is because of an
> optimisation happening with wal_level = minimal:
>
> "In minimal level, WAL-logging of some bulk operations can be
> safely skipped, which can make those operations much faster"
>
> see
> http://www.postgresql.org/docs/current/static/runtime-config-wal.html
>
>
>> So is option 2 a winner by design? Could you please suggest
>> other alternatives to try (if there are any)? And what might be
>> the reason that 3 is not outperforming the other 2?
>>
>> Thank you, Igor
>>
>>
>
>
> -- Julien Rouhaud http://dalibo.com - http://dalibo.org
>


- --
Julien Rouhaud
http://dalibo.com - http://dalibo.org
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.17 (GNU/Linux)

iQEcBAEBAgAGBQJVpTlRAAoJELGaJ8vfEpOqvI4H/RZygc5QXOuEZDWqmWRoZZ5N
kNLWxPJbQ7cLpSNIUj3gJmq9bj0I3K071L09KbJWgxtwvQCzgiTsUIVURv7V83C6
nQ8CmrRr96+jKprx5Gw/uqSel8qnbi9LApl1IDqx9Hnd/HnyVOemND2gzHOQhsKN
tvGuo4ac5yR+rsFA8FHuwXgSgVH2NEDL2n4Zv6jI2uwh5NRBeeGEn8MFKDZCSWN6
HXG9wZaelSrYbcSfumRg07RLnAmP6E/xbY1eB8dA17XmnFxE9AMTFy0YqJb8Kl5Z
KvzQ6+VHnrW2zaoCUOGE56ra2La7TPeJxxeNA9U9Li+8GmvJIQHqIoQvLz7CzT8=
=Ztkl
-----END PGP SIGNATURE-----


pgsql-general by date:

Previous
From: Igor Stassiy
Date:
Subject: Re: Creating table with data from a join
Next
From: Julien Rouhaud
Date:
Subject: Re: Creating table with data from a join