Re: problem with large inserts - Mailing list pgsql-performance

From Filip Rembiałkowski
Subject Re: problem with large inserts
Date
Msg-id CAP_rwwmS-BxfX0v6Asq2AMCLyQN=0=_ackrcp-cZkKqLE6W+3A@mail.gmail.com
Whole thread Raw
In response to Re: problem with large inserts  (Lutz Fischer <lfischer@staffmail.ed.ac.uk>)
List pgsql-performance
Hmm, so it is some kind of file / table locking issue, not general IO
system malfunction.

It would be interesting and useful to run this use case on other
postgres instance (or several instances), including non-Windows ones.

OTOH Pg on Windows housekeeping was always "fun" - I advise all my
clients to avoid it for production purposes.




On Thu, Dec 13, 2012 at 5:33 PM, Lutz Fischer
<lfischer@staffmail.ed.ac.uk> wrote:
> Thanks a lot you saved my day
>
> create temp table foo AS SELECT DISTINCT ...
> did take a mere 77464.744 ms
> And an additional
> Insert into LPP select * from foo;
> Just 576.909 ms
>
> I don't really understand why it's working via a temp table but not
> directly (or in any reasonable amount of time) - but at least I have a
> solution I can work with.
>
>
> On 13/12/12 16:09, Filip Rembiałkowski wrote:
>> Just an idea - how long does it take to run _only_
>> CREATE TEMP TABLE foo AS <your SELECT here>
>>
>>
>>
>>
>> On Thu, Dec 13, 2012 at 4:37 PM, Lutz Fischer
>> <lfischer@staffmail.ed.ac.uk> wrote:
>>> Hi
>>>
>>> I have currently some trouble with inserts into a table
>>>
>>> INSERT INTO LPP (PPID, LID)
>>> SELECT DISTINCT PPid, LID FROM
>>>         (SELECT * FROM PP WHERE s_id = sid) pp
>>>             INNER JOIN
>>>         has_protein hp1
>>>             ON pp.p1id = hp1.pid
>>>             INNER JOIN
>>>         has_protein hp2
>>>             ON pp.p2_id = hp2.pid
>>>             INNER JOIN
>>>         (SELECT * FROM L WHERE s_id = sid) l
>>>             ON (hp1.pid = l.p1id AND hp2.pid = l.p2id AND hp1.ppos +
>>> pp.s1 = l.s1 AND hp2.ppos + pp.s2 = l.s2)
>>>             OR (hp1.pid = l.p2id AND hp2.pid = l.p1id AND hp1.ppos +
>>> pp.s1 = l.s2 AND hp2.ppos + pp.s2 = l.s1)
>>>     ;
>>>
>>> If I run only
>>>
>>> SELECT DISTINCT PPid, LID FROM
>>>         (SELECT * FROM PP WHERE s_id = 708) pp
>>>             INNER JOIN
>>>         has_protein hp1
>>>             ON pp.p1id = hp1.pid
>>>             INNER JOIN
>>>         has_protein hp2
>>>             ON pp.p2_id = hp2.pid
>>>             INNER JOIN
>>>         (SELECT * FROM L WHERE s_id = 708) l
>>>             ON (hp1.pid = l.p1id AND hp2.pid = l.p2id AND hp1.ppos +
>>> pp.s1 = l.s1 AND hp2.ppos + pp.s2 = l.s2)
>>>             OR (hp1.pid = l.p2id AND hp2.pid = l.p1id AND hp1.ppos +
>>> pp.s1 = l.s2 AND hp2.ppos + pp.s2 = l.s1)
>>>     ;
>>>
>>> it returns 200620 rows in 170649 ms  ( thats just under 3  minutes). I
>>> stopped the actual insert after about 8h.
>>>
>>> The table that the insert happens to, is following:
>>> CREATE TABLE LPP
>>> (
>>>   ppid bigint NOT NULL,
>>>   lid bigint NOT NULL,
>>>   CONSTRAINT pk_lpp PRIMARY KEY (ppid,lid)
>>> )
>>>
>>> I also tried without the primary key but that one is still running for
>>> more that a day.
>>>
>>> Currently the table LPP holds 471139 rows. Its linking the PP and the L
>>> table.
>>>
>>> There are no foreign keys referring to that table nor are there any
>>> other constraints on it.
>>> Previously I had foreign keys on lid and ppid refering to the L and PP
>>> table. But in a desperate try to get some speed up I deleted these. -
>>> But still...
>>>
>>> I am running postgresql 9.2 on a windows 2008  R2 server with 256 GB and
>>> the database is on something like a raid 1+0 (actually a raid1e)
>>> consisting of 3x4TB disks (limit of what could easily be fitted into the
>>> server).
>>>
>>> At the given time  there were no concurrent access to any of the
>>> involved tables.
>>>
>>> Has anybody some idea why the insert takes so long and/or how to speed
>>> things up a bit? I could live with something like half an hour - better
>>> would be in minutes.
>>>
>>>
>>> Thanks for any responds,
>>>
>>> Lutz Fischer
>>>
>>>
>>> --
>>> The University of Edinburgh is a charitable body, registered in
>>> Scotland, with registration number SC005336.
>>>
>>>
>>>
>>> --
>>> Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
>>> To make changes to your subscription:
>>> http://www.postgresql.org/mailpref/pgsql-performance
>
>
> --
> Lutz Fischer
> lfischer@staffmail.ed.ac.uk
> +44 131 6517057
>
>
> The University of Edinburgh is a charitable body, registered in
> Scotland, with registration number SC005336.
>


pgsql-performance by date:

Previous
From: Lutz Fischer
Date:
Subject: Re: problem with large inserts
Next
From: "Kevin Grittner"
Date:
Subject: Re: Limit & offset effect on query plans