Thread: problem with large inserts

problem with large inserts

From
Lutz Fischer
Date:
Hi

I have currently some trouble with inserts into a table

INSERT INTO LPP (PPID, LID)
SELECT DISTINCT PPid, LID FROM
        (SELECT * FROM PP WHERE s_id = sid) pp
            INNER JOIN
        has_protein hp1
            ON pp.p1id = hp1.pid
            INNER JOIN
        has_protein hp2
            ON pp.p2_id = hp2.pid
            INNER JOIN
        (SELECT * FROM L WHERE s_id = sid) l
            ON (hp1.pid = l.p1id AND hp2.pid = l.p2id AND hp1.ppos +
pp.s1 = l.s1 AND hp2.ppos + pp.s2 = l.s2)
            OR (hp1.pid = l.p2id AND hp2.pid = l.p1id AND hp1.ppos +
pp.s1 = l.s2 AND hp2.ppos + pp.s2 = l.s1)
    ;

If I run only

SELECT DISTINCT PPid, LID FROM
        (SELECT * FROM PP WHERE s_id = 708) pp
            INNER JOIN
        has_protein hp1
            ON pp.p1id = hp1.pid
            INNER JOIN
        has_protein hp2
            ON pp.p2_id = hp2.pid
            INNER JOIN
        (SELECT * FROM L WHERE s_id = 708) l
            ON (hp1.pid = l.p1id AND hp2.pid = l.p2id AND hp1.ppos +
pp.s1 = l.s1 AND hp2.ppos + pp.s2 = l.s2)
            OR (hp1.pid = l.p2id AND hp2.pid = l.p1id AND hp1.ppos +
pp.s1 = l.s2 AND hp2.ppos + pp.s2 = l.s1)
    ;

it returns 200620 rows in 170649 ms  ( thats just under 3  minutes). I
stopped the actual insert after about 8h.

The table that the insert happens to, is following:
CREATE TABLE LPP
(
  ppid bigint NOT NULL,
  lid bigint NOT NULL,
  CONSTRAINT pk_lpp PRIMARY KEY (ppid,lid)
)

I also tried without the primary key but that one is still running for
more that a hour.

Currently the table LPP holds 471139 rows. Its linking the PP and the L
table.

There are no foreign keys referring to that table nor are there any
other constraints on it.
Previously I had foreign keys on lid and ppid refering to the L and PP
table. But in a desperate try to get some speed up I deleted these. -
But still...

I am running postgresql 9.2 on a windows 2008  R2 server with 256 GB and
the database is on something like a raid 1+0 (actually a raid1e)
consisting of 3x4TB disks (limit of what could easily be fitted into the
server).

At the given time  there were no concurrent access to any of the
involved tables.

Has anybody some idea why the insert takes so long and/or how to speed
things up a bit? I could live with something like half an hour - better
would be in minutes.


Thanks for any responds,

Lutz Fischer


--
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.



Re: problem with large inserts

From
Albe Laurenz
Date:
Lutz Fischer wrote:
> I have currently some trouble with inserts into a table
>
> INSERT INTO LPP (PPID, LID)
> SELECT DISTINCT PPid, LID FROM
>         (SELECT * FROM PP WHERE s_id = sid) pp
>             INNER JOIN
>         has_protein hp1
>             ON pp.p1id = hp1.pid
>             INNER JOIN
>         has_protein hp2
>             ON pp.p2_id = hp2.pid
>             INNER JOIN
>         (SELECT * FROM L WHERE s_id = sid) l
>             ON (hp1.pid = l.p1id AND hp2.pid = l.p2id AND hp1.ppos +
> pp.s1 = l.s1 AND hp2.ppos + pp.s2 = l.s2)
>             OR (hp1.pid = l.p2id AND hp2.pid = l.p1id AND hp1.ppos +
> pp.s1 = l.s2 AND hp2.ppos + pp.s2 = l.s1)
>     ;
>
> If I run only
>
> SELECT DISTINCT PPid, LID FROM
>         (SELECT * FROM PP WHERE s_id = 708) pp
>             INNER JOIN
>         has_protein hp1
>             ON pp.p1id = hp1.pid
>             INNER JOIN
>         has_protein hp2
>             ON pp.p2_id = hp2.pid
>             INNER JOIN
>         (SELECT * FROM L WHERE s_id = 708) l
>             ON (hp1.pid = l.p1id AND hp2.pid = l.p2id AND hp1.ppos +
> pp.s1 = l.s1 AND hp2.ppos + pp.s2 = l.s2)
>             OR (hp1.pid = l.p2id AND hp2.pid = l.p1id AND hp1.ppos +
> pp.s1 = l.s2 AND hp2.ppos + pp.s2 = l.s1)
>     ;
>
> it returns 200620 rows in 170649 ms  ( thats just under 3  minutes). I
> stopped the actual insert after about 8h.
>
> The table that the insert happens to, is following:
> CREATE TABLE LPP
> (
>   ppid bigint NOT NULL,
>   lid bigint NOT NULL,
>   CONSTRAINT pk_lpp PRIMARY KEY (ppid,lid)
> )
>
> I also tried without the primary key but that one is still running for
> more that a hour.
>
> Currently the table LPP holds 471139 rows. Its linking the PP and the L
> table.
>
> There are no foreign keys referring to that table nor are there any
> other constraints on it.
> Previously I had foreign keys on lid and ppid refering to the L and PP
> table. But in a desperate try to get some speed up I deleted these. -
> But still...
>
> I am running postgresql 9.2 on a windows 2008  R2 server with 256 GB and
> the database is on something like a raid 1+0 (actually a raid1e)
> consisting of 3x4TB disks (limit of what could easily be fitted into the
> server).
>
> At the given time  there were no concurrent access to any of the
> involved tables.
>
> Has anybody some idea why the insert takes so long and/or how to speed
> things up a bit? I could live with something like half an hour - better
> would be in minutes.

Can you spot the bottleneck? Is it disk I/O? CPU?

Do you see complaints in the log file?

These are the usual suspects to tune to get better insert performance
(if commit is not the problem):

wal_buffers = 16MB
checkpoint_segments = 100
checkpoint_completion_target = 0.9

Yours,
Laurenz Albe


Re: problem with large inserts

From
aasat
Date:
First try put inserts to temporary table, a then insert from temporary to
your table



--
View this message in context: http://postgresql.1045698.n5.nabble.com/problem-with-large-inserts-tp5736844p5736867.html
Sent from the PostgreSQL - general mailing list archive at Nabble.com.


Re: problem with large inserts

From
"Kevin Grittner"
Date:
Lutz Fischer wrote:

> I am running postgresql 9.2 on a windows 2008 R2 server with 256 GB and
> the database is on something like a raid 1+0 (actually a raid1e)
> consisting of 3x4TB disks (limit of what could easily be fitted into the
> server).

I never heard of a raidle. What is that? What sort of raw
performance numbers do you get with bonnie++ (or if that's not
available on Windows, the Windows equivalent)?

-Kevin


Re: problem with large inserts

From
Chris Angelico
Date:
On Tue, Dec 18, 2012 at 12:22 AM, Kevin Grittner <kgrittn@mail.com> wrote:
> Lutz Fischer wrote:
>
>> I am running postgresql 9.2 on a windows 2008 R2 server with 256 GB and
>> the database is on something like a raid 1+0 (actually a raid1e)
>> consisting of 3x4TB disks (limit of what could easily be fitted into the
>> server).
>
> I never heard of a raidle. What is that? What sort of raw
> performance numbers do you get with bonnie++ (or if that's not
> available on Windows, the Windows equivalent)?

Google showed up this:

http://www.techrepublic.com/article/non-standard-raid-levels-primer-raid-1e/6181460

Seems it's "raid 10 for odd numbers of disks".

ChrisA


Re: problem with large inserts

From
"Kevin Grittner"
Date:
Chris Angelico wrote:

>> I never heard of a raidle. What is that?

> Google showed up this:
>
> http://www.techrepublic.com/article/non-standard-raid-levels-primer-raid-1e/6181460
>
> Seems it's "raid 10 for odd numbers of disks".

Ah, thanks! I missed it when I searched because I mis-scanned it
as raidle instead of raid1e.

RAID 1E would be less likely to be misinterpreted.

So, it should have write performance somewhere between a pair of
drives in RAID 1 and four drives in RAID 10, assuming the
controller hadles it well. I'm still curious to see raw I/O numbers
for random and sequential scan against the array.

-Kevin