Fwd: Insert performance (OT?) - Mailing list pgsql-performance

From Yves Vindevogel
Subject Fwd: Insert performance (OT?)
Date
Msg-id 940bb98041b7df475b53f98e35c31ce2@implements.be
Whole thread Raw
List pgsql-performance
BTW: thank you for the idea


Begin forwarded message:


<excerpt><bold><color><param>0000,0000,0000</param>From:
</color></bold>Yves Vindevogel <<yves.vindevogel@implements.be>

<bold><color><param>0000,0000,0000</param>Date: </color></bold>Tue 19
Jul 2005 12:20:34 CEST

<bold><color><param>0000,0000,0000</param>To: </color></bold>Richard
Huxton <<dev@archonet.com>

<bold><color><param>0000,0000,0000</param>Subject: </color>Re:
[PERFORM] Insert performance (OT?)

</bold>


On 19 Jul 2005, at 11:39, Richard Huxton wrote:


<excerpt>Yves Vindevogel wrote:

<excerpt>Hi,

Suppose I have a table with 4 fields (f1, f2, f3, f4)

I define 2 unique indexes u1 (f1, f2, f3) and u2 (f1, f2, f4)

I have 3 records

A, B, C, D (this will be inserted)

A, B, C, E (this will pass u2, but not u1, thus  not inserted)

A, B, F, D (this will pass u1, but not u2, thus not inserted)

</excerpt>

Are you saying you want to know whether they will be inserted before
you try to do so?


</excerpt>No, that is not an issue.  Problem is that when I use a big
query with   "insert into .. select" and one record is wrong (like
above) the complete insert query is abandonned.

Therefore, I must do it another way.  Or I must be able to say, insert
them and dump the rest.


<excerpt><excerpt>Now, for performance ...

I have tables like this with 500.000 records where there's a new
upload of approx. 20.000 records.

It is only now that we say index u2 to be necessary.  So, until now, I
did something like insert into ... select f1, f2, f2, max(f4) group by
f1, f2, f3

That is ok ... and also logically ok because of the data definition

</excerpt>

I'm confused here - assuming you meant "select f1,f2,f3", then I don't
see how you guarantee the row doesn't alredy exist.


</excerpt>No, I meant it with max(f4) because my table has 4 fields.
And no, I can't guarantee that, that is exactly my problem.

But with the unique indexes, I'm certain that it will not get into my
database


<excerpt><excerpt>I cannot do this with 2 group by's.  I tried this on
paper and I'm not succeeding.

</excerpt>

I don't see how you can have two group-by's, or what that would mean
if you did.


</excerpt>select from ( select from group by) as foo group by


<excerpt><excerpt>So, I must use a function that will check against u1
and u2, and then insert if it is ok.

I know that such a function is way slower that my insert query.

</excerpt>

So - you have a table, called something like "upload" with 20,000 rows
and you'd like to know whether it is safe to insert them. Well, it's
easy enough to identify which ones are duplicates.


SELECT * FROM upload JOIN main_table ON u1=f1 AND u2=f2 AND u3=f3;

SELECT * FROM upload JOIN main_table ON u1=f1 AND u2=f2 AND u3=f4;


</excerpt>That is a good idea.  I can delete the ones that would fail
my first unique index this way, and then delete the ones that would
fail my second unique index and then upload them.

Hmm, why did I not think of that myself.


<excerpt>Are you saying that deleting these rows and then inserting
takes too long?


</excerpt>This goes very fast, but not with a function that checks
each record one by one.


<excerpt>--

  Richard Huxton

  Archonet Ltd


---------------------------(end of
broadcast)---------------------------

TIP 6: explain analyze is your friend



</excerpt>Met vriendelijke groeten,

Bien à vous,

Kind regards,


<bold>Yves Vindevogel</bold>

<bold>Implements</bold>

<smaller>

</smaller></excerpt>BTW: thank you for the idea

Begin forwarded message:

> From: Yves Vindevogel <yves.vindevogel@implements.be>
> Date: Tue 19 Jul 2005 12:20:34 CEST
> To: Richard Huxton <dev@archonet.com>
> Subject: Re: [PERFORM] Insert performance (OT?)
>
>
> On 19 Jul 2005, at 11:39, Richard Huxton wrote:
>
>> Yves Vindevogel wrote:
>>> Hi,
>>> Suppose I have a table with 4 fields (f1, f2, f3, f4)
>>> I define 2 unique indexes u1 (f1, f2, f3) and u2 (f1, f2, f4)
>>> I have 3 records
>>> A, B, C, D (this will be inserted)
>>> A, B, C, E (this will pass u2, but not u1, thus  not inserted)
>>> A, B, F, D (this will pass u1, but not u2, thus not inserted)
>>
>> Are you saying you want to know whether they will be inserted before
>> you try to do so?
>>
> No, that is not an issue.  Problem is that when I use a big query with
>   "insert into .. select" and one record is wrong (like above) the
> complete insert query is abandonned.
> Therefore, I must do it another way.  Or I must be able to say, insert
> them and dump the rest.
>
>>> Now, for performance ...
>>> I have tables like this with 500.000 records where there's a new
>>> upload of approx. 20.000 records.
>>> It is only now that we say index u2 to be necessary.  So, until now,
>>> I did something like insert into ... select f1, f2, f2, max(f4)
>>> group by f1, f2, f3
>>> That is ok ... and also logically ok because of the data definition
>>
>> I'm confused here - assuming you meant "select f1,f2,f3", then I
>> don't see how you guarantee the row doesn't alredy exist.
>>
> No, I meant it with max(f4) because my table has 4 fields.  And no, I
> can't guarantee that, that is exactly my problem.
> But with the unique indexes, I'm certain that it will not get into my
> database
>
>>> I cannot do this with 2 group by's.  I tried this on paper and I'm
>>> not succeeding.
>>
>> I don't see how you can have two group-by's, or what that would mean
>> if you did.
>>
> select from ( select from group by) as foo group by
>
>>> So, I must use a function that will check against u1 and u2, and
>>> then insert if it is ok.
>>> I know that such a function is way slower that my insert query.
>>
>> So - you have a table, called something like "upload" with 20,000
>> rows and you'd like to know whether it is safe to insert them. Well,
>> it's easy enough to identify which ones are duplicates.
>>
>> SELECT * FROM upload JOIN main_table ON u1=f1 AND u2=f2 AND u3=f3;
>> SELECT * FROM upload JOIN main_table ON u1=f1 AND u2=f2 AND u3=f4;
>>
> That is a good idea.  I can delete the ones that would fail my first
> unique index this way, and then delete the ones that would fail my
> second unique index and then upload them.
> Hmm, why did I not think of that myself.
>
>> Are you saying that deleting these rows and then inserting takes too
>> long?
>>
> This goes very fast, but not with a function that checks each record
> one by one.
>
>> --
>>   Richard Huxton
>>   Archonet Ltd
>>
>> ---------------------------(end of
>> broadcast)---------------------------
>> TIP 6: explain analyze is your friend
>>
>>
> Met vriendelijke groeten,
> Bien à vous,
> Kind regards,
>
> Yves Vindevogel
> Implements
>
<excerpt><smaller>


Mail: yves.vindevogel@implements.be  - Mobile: +32 (478) 80 82 91


Kempische Steenweg 206 - 3500 Hasselt - Tel-Fax: +32 (11) 43 55 76


Web: http://www.implements.be

<italic><x-tad-smaller>

First they ignore you.  Then they laugh at you.  Then they fight you.
Then you win.

Mahatma Ghandi.</x-tad-smaller></italic></smaller></excerpt><excerpt>


</excerpt>Met vriendelijke groeten,

Bien à vous,

Kind regards,


<bold>Yves Vindevogel</bold>

<bold>Implements</bold>

<smaller>

</smaller>>
>
> Mail: yves.vindevogel@implements.be  - Mobile: +32 (478) 80 82 91
>
> Kempische Steenweg 206 - 3500 Hasselt - Tel-Fax: +32 (11) 43 55 76
>
> Web: http://www.implements.be
>
> First they ignore you.  Then they laugh at you.  Then they fight you.
> Then you win.
> Mahatma Ghandi.
>
Met vriendelijke groeten,
Bien à vous,
Kind regards,

Yves Vindevogel
Implements

<smaller>


Mail: yves.vindevogel@implements.be  - Mobile: +32 (478) 80 82 91


Kempische Steenweg 206 - 3500 Hasselt - Tel-Fax: +32 (11) 43 55 76


Web: http://www.implements.be

<italic><x-tad-smaller>

First they ignore you.  Then they laugh at you.  Then they fight you.
Then you win.

Mahatma Ghandi.</x-tad-smaller></italic></smaller>



Mail: yves.vindevogel@implements.be  - Mobile: +32 (478) 80 82 91

Kempische Steenweg 206 - 3500 Hasselt - Tel-Fax: +32 (11) 43 55 76

Web: http://www.implements.be

First they ignore you.  Then they laugh at you.  Then they fight you.
Then you win.
Mahatma Ghandi.

Attachment

pgsql-performance by date:

Previous
From: Richard Huxton
Date:
Subject: Re: Insert performance (OT?)
Next
From: Richard Huxton
Date:
Subject: Re: Insert performance (OT?)