Thread: Insert performance (OT?)

Insert performance (OT?)

From
Yves Vindevogel
Date:
Hi,


Suppose I have a table with 4 fields (f1, f2, f3, f4)

I define 2 unique indexes u1 (f1, f2, f3) and u2 (f1, f2, f4)


I have 3 records

A, B, C, D (this will be inserted)

A, B, C, E (this will pass u2, but not u1, thus  not inserted)

A, B, F, D (this will pass u1, but not u2, thus not inserted)


Now, for performance ...


I have tables like this with 500.000 records where there's a new
upload of approx. 20.000 records.

It is only now that we say index u2 to be necessary.  So, until now, I
did something like insert into ... select f1, f2, f2, max(f4) group by
f1, f2, f3

That is ok ... and also logically ok because of the data definition


I cannot do this with 2 group by's.  I tried this on paper and I'm not
succeeding.


So, I must use a function that will check against u1 and u2, and then
insert if it is ok.

I know that such a function is way slower that my insert query.


So, my question ...

How can I keep the same performance, but also with the new index in
mind ???



Met vriendelijke groeten,

Bien à vous,

Kind regards,


<bold>Yves Vindevogel</bold>

<bold>Implements</bold>

<smaller>

</smaller>Hi,

Suppose I have a table with 4 fields (f1, f2, f3, f4)
I define 2 unique indexes u1 (f1, f2, f3) and u2 (f1, f2, f4)

I have 3 records
A, B, C, D (this will be inserted)
A, B, C, E (this will pass u2, but not u1, thus  not inserted)
A, B, F, D (this will pass u1, but not u2, thus not inserted)

Now, for performance ...

I have tables like this with 500.000 records where there's a new upload
of approx. 20.000 records.
It is only now that we say index u2 to be necessary.  So, until now, I
did something like insert into ... select f1, f2, f2, max(f4) group by
f1, f2, f3
That is ok ... and also logically ok because of the data definition

I cannot do this with 2 group by's.  I tried this on paper and I'm not
succeeding.

So, I must use a function that will check against u1 and u2, and then
insert if it is ok.
I know that such a function is way slower that my insert query.

So, my question ...
How can I keep the same performance, but also with the new index in
mind ???


Met vriendelijke groeten,
Bien à vous,
Kind regards,

Yves Vindevogel
Implements

<smaller>


Mail: yves.vindevogel@implements.be  - Mobile: +32 (478) 80 82 91


Kempische Steenweg 206 - 3500 Hasselt - Tel-Fax: +32 (11) 43 55 76


Web: http://www.implements.be

<italic><x-tad-smaller>

First they ignore you.  Then they laugh at you.  Then they fight you.
Then you win.

Mahatma Ghandi.</x-tad-smaller></italic></smaller>


Mail: yves.vindevogel@implements.be  - Mobile: +32 (478) 80 82 91

Kempische Steenweg 206 - 3500 Hasselt - Tel-Fax: +32 (11) 43 55 76

Web: http://www.implements.be

First they ignore you.  Then they laugh at you.  Then they fight you.
Then you win.
Mahatma Ghandi.

Attachment

Re: Insert performance (OT?)

From
Yves Vindevogel
Date:
nobody ?


On 18 Jul 2005, at 21:29, Yves Vindevogel wrote:


<excerpt>Hi,


Suppose I have a table with 4 fields (f1, f2, f3, f4)

I define 2 unique indexes u1 (f1, f2, f3) and u2 (f1, f2, f4)


I have 3 records

A, B, C, D (this will be inserted)

A, B, C, E (this will pass u2, but not u1, thus  not inserted)

A, B, F, D (this will pass u1, but not u2, thus not inserted)


Now, for performance ...


I have tables like this with 500.000 records where there's a new
upload of approx. 20.000 records.

It is only now that we say index u2 to be necessary.  So, until now, I
did something like insert into ... select f1, f2, f2, max(f4) group by
f1, f2, f3

That is ok ... and also logically ok because of the data definition


I cannot do this with 2 group by's.  I tried this on paper and I'm not
succeeding.


So, I must use a function that will check against u1 and u2, and then
insert if it is ok.

I know that such a function is way slower that my insert query.


So, my question ...

How can I keep the same performance, but also with the new index in
mind ???



Met vriendelijke groeten,

Bien à vous,

Kind regards,


<bold>Yves Vindevogel</bold>

<bold>Implements</bold>

<smaller>

</smaller><<Pasted Graphic 2.tiff><smaller>


Mail: yves.vindevogel@implements.be  - Mobile: +32 (478) 80 82 91


Kempische Steenweg 206 - 3500 Hasselt - Tel-Fax: +32 (11) 43 55 76


Web: http://www.implements.be

<italic><x-tad-smaller>

First they ignore you.  Then they laugh at you.  Then they fight you.
Then you win.

Mahatma Ghandi.</x-tad-smaller></italic></smaller></excerpt><excerpt>

---------------------------(end of
broadcast)---------------------------

TIP 5: don't forget to increase your free space map settings


</excerpt>Met vriendelijke groeten,

Bien à vous,

Kind regards,


<bold>Yves Vindevogel</bold>

<bold>Implements</bold>

<smaller>

</smaller>nobody ?

On 18 Jul 2005, at 21:29, Yves Vindevogel wrote:

> Hi,
>
> Suppose I have a table with 4 fields (f1, f2, f3, f4)
> I define 2 unique indexes u1 (f1, f2, f3) and u2 (f1, f2, f4)
>
> I have 3 records
> A, B, C, D (this will be inserted)
> A, B, C, E (this will pass u2, but not u1, thus  not inserted)
> A, B, F, D (this will pass u1, but not u2, thus not inserted)
>
> Now, for performance ...
>
> I have tables like this with 500.000 records where there's a new
> upload of approx. 20.000 records.
> It is only now that we say index u2 to be necessary.  So, until now, I
> did something like insert into ... select f1, f2, f2, max(f4) group by
> f1, f2, f3
> That is ok ... and also logically ok because of the data definition
>
> I cannot do this with 2 group by's.  I tried this on paper and I'm not
> succeeding.
>
> So, I must use a function that will check against u1 and u2, and then
> insert if it is ok.
> I know that such a function is way slower that my insert query.
>
> So, my question ...
> How can I keep the same performance, but also with the new index in
> mind ???
>
>
> Met vriendelijke groeten,
> Bien à vous,
> Kind regards,
>
> Yves Vindevogel
> Implements
>
> <Pasted Graphic 2.tiff>
>
> Mail: yves.vindevogel@implements.be  - Mobile: +32 (478) 80 82 91
>
> Kempische Steenweg 206 - 3500 Hasselt - Tel-Fax: +32 (11) 43 55 76
>
> Web: http://www.implements.be
>
> First they ignore you.  Then they laugh at you.  Then they fight you.
> Then you win.
> Mahatma Ghandi.
> ---------------------------(end of
> broadcast)---------------------------
> TIP 5: don't forget to increase your free space map settings
>
Met vriendelijke groeten,
Bien à vous,
Kind regards,

Yves Vindevogel
Implements

<smaller>


Mail: yves.vindevogel@implements.be  - Mobile: +32 (478) 80 82 91


Kempische Steenweg 206 - 3500 Hasselt - Tel-Fax: +32 (11) 43 55 76


Web: http://www.implements.be

<italic><x-tad-smaller>

First they ignore you.  Then they laugh at you.  Then they fight you.
Then you win.

Mahatma Ghandi.</x-tad-smaller></italic></smaller>



Mail: yves.vindevogel@implements.be  - Mobile: +32 (478) 80 82 91

Kempische Steenweg 206 - 3500 Hasselt - Tel-Fax: +32 (11) 43 55 76

Web: http://www.implements.be

First they ignore you.  Then they laugh at you.  Then they fight you.
Then you win.
Mahatma Ghandi.

Attachment

Re: Insert performance (OT?)

From
Richard Huxton
Date:
Yves Vindevogel wrote:
> Hi,
>
> Suppose I have a table with 4 fields (f1, f2, f3, f4)
> I define 2 unique indexes u1 (f1, f2, f3) and u2 (f1, f2, f4)
>
> I have 3 records
> A, B, C, D (this will be inserted)
> A, B, C, E (this will pass u2, but not u1, thus  not inserted)
> A, B, F, D (this will pass u1, but not u2, thus not inserted)

Are you saying you want to know whether they will be inserted before you
try to do so?

> Now, for performance ...
>
> I have tables like this with 500.000 records where there's a new upload
> of approx. 20.000 records.
> It is only now that we say index u2 to be necessary.  So, until now, I
> did something like insert into ... select f1, f2, f2, max(f4) group by
> f1, f2, f3
> That is ok ... and also logically ok because of the data definition

I'm confused here - assuming you meant "select f1,f2,f3", then I don't
see how you guarantee the row doesn't alredy exist.

> I cannot do this with 2 group by's.  I tried this on paper and I'm not
> succeeding.

I don't see how you can have two group-by's, or what that would mean if
you did.

> So, I must use a function that will check against u1 and u2, and then
> insert if it is ok.
> I know that such a function is way slower that my insert query.

So - you have a table, called something like "upload" with 20,000 rows
and you'd like to know whether it is safe to insert them. Well, it's
easy enough to identify which ones are duplicates.

SELECT * FROM upload JOIN main_table ON u1=f1 AND u2=f2 AND u3=f3;
SELECT * FROM upload JOIN main_table ON u1=f1 AND u2=f2 AND u3=f4;

Are you saying that deleting these rows and then inserting takes too long?

--
   Richard Huxton
   Archonet Ltd

Re: Insert performance (OT?)

From
Richard Huxton
Date:
Yves Vindevogel wrote:
  >>> So, I must use a function that will check against u1 and u2, and then
>>> insert if it is ok.
>>> I know that such a function is way slower that my insert query.
>>
>> So - you have a table, called something like "upload" with 20,000 rows
>> and you'd like to know whether it is safe to insert them. Well, it's
>> easy enough to identify which ones are duplicates.
>>
>> SELECT * FROM upload JOIN main_table ON u1=f1 AND u2=f2 AND u3=f3;
>> SELECT * FROM upload JOIN main_table ON u1=f1 AND u2=f2 AND u3=f4;
>>
> That is a good idea.  I can delete the ones that would fail my first
> unique index this way, and then delete the ones that would fail my
> second unique index and then upload them.
> Hmm, why did I not think of that myself.

I've spent a lot of time moving data from one system to another, usually
having to clean it in the process. At 9pm on a Friday, you decide that
on the next job you'll find an efficient way to do it :-)

>> Are you saying that deleting these rows and then inserting takes too
>> long?
>>
> This goes very fast, but not with a function that checks each record one
> by one.

You could get away with one query if you converted them to left-joins:
INSERT INTO ...
SELECT * FROM upload LEFT JOIN ... WHERE f3 IS NULL
UNION
SELECT * FROM upload LEFT JOIN ... WHERE f4 IS NULL

The UNION will remove duplicates for you, but this might turn out to be
slower than two separate queries.

--
   Richard Huxton
   Archonet Ltd

Re: Insert performance (OT?)

From
Yves Vindevogel
Date:
I will use 2 queries.  They run within a function fnUpload(), so I'm
going to keep it simple.



On 19 Jul 2005, at 12:51, Richard Huxton wrote:


<excerpt>Yves Vindevogel wrote:

 >>> So, I must use a function that will check against u1 and u2, and
then

<excerpt><excerpt><excerpt>insert if it is ok.

I know that such a function is way slower that my insert query.

</excerpt>

So - you have a table, called something like "upload" with 20,000 rows
and you'd like to know whether it is safe to insert them. Well, it's
easy enough to identify which ones are duplicates.


SELECT * FROM upload JOIN main_table ON u1=f1 AND u2=f2 AND u3=f3;

SELECT * FROM upload JOIN main_table ON u1=f1 AND u2=f2 AND u3=f4;


</excerpt>That is a good idea.  I can delete the ones that would fail
my first unique index this way, and then delete the ones that would
fail my second unique index and then upload them.

Hmm, why did I not think of that myself.

</excerpt>

I've spent a lot of time moving data from one system to another,
usually having to clean it in the process. At 9pm on a Friday, you
decide that on the next job you'll find an efficient way to do it :-)


<excerpt><excerpt>Are you saying that deleting these rows and then
inserting takes too long?


</excerpt>This goes very fast, but not with a function that checks
each record one by one.

</excerpt>

You could get away with one query if you converted them to left-joins:

INSERT INTO ...

SELECT * FROM upload LEFT JOIN ... WHERE f3 IS NULL

UNION

SELECT * FROM upload LEFT JOIN ... WHERE f4 IS NULL


The UNION will remove duplicates for you, but this might turn out to
be slower than two separate queries.


--

  Richard Huxton

  Archonet Ltd



</excerpt>Met vriendelijke groeten,

Bien à vous,

Kind regards,


<bold>Yves Vindevogel</bold>

<bold>Implements</bold>

<smaller>

</smaller>I will use 2 queries.  They run within a function fnUpload(), so I'm
going to keep it simple.


On 19 Jul 2005, at 12:51, Richard Huxton wrote:

> Yves Vindevogel wrote:
>  >>> So, I must use a function that will check against u1 and u2, and
> then
>>>> insert if it is ok.
>>>> I know that such a function is way slower that my insert query.
>>>
>>> So - you have a table, called something like "upload" with 20,000
>>> rows and you'd like to know whether it is safe to insert them. Well,
>>> it's easy enough to identify which ones are duplicates.
>>>
>>> SELECT * FROM upload JOIN main_table ON u1=f1 AND u2=f2 AND u3=f3;
>>> SELECT * FROM upload JOIN main_table ON u1=f1 AND u2=f2 AND u3=f4;
>>>
>> That is a good idea.  I can delete the ones that would fail my first
>> unique index this way, and then delete the ones that would fail my
>> second unique index and then upload them.
>> Hmm, why did I not think of that myself.
>
> I've spent a lot of time moving data from one system to another,
> usually having to clean it in the process. At 9pm on a Friday, you
> decide that on the next job you'll find an efficient way to do it :-)
>
>>> Are you saying that deleting these rows and then inserting takes too
>>> long?
>>>
>> This goes very fast, but not with a function that checks each record
>> one by one.
>
> You could get away with one query if you converted them to left-joins:
> INSERT INTO ...
> SELECT * FROM upload LEFT JOIN ... WHERE f3 IS NULL
> UNION
> SELECT * FROM upload LEFT JOIN ... WHERE f4 IS NULL
>
> The UNION will remove duplicates for you, but this might turn out to
> be slower than two separate queries.
>
> --
>   Richard Huxton
>   Archonet Ltd
>
>
Met vriendelijke groeten,
Bien à vous,
Kind regards,

Yves Vindevogel
Implements

<smaller>


Mail: yves.vindevogel@implements.be  - Mobile: +32 (478) 80 82 91


Kempische Steenweg 206 - 3500 Hasselt - Tel-Fax: +32 (11) 43 55 76


Web: http://www.implements.be

<italic><x-tad-smaller>

First they ignore you.  Then they laugh at you.  Then they fight you.
Then you win.

Mahatma Ghandi.</x-tad-smaller></italic></smaller>



Mail: yves.vindevogel@implements.be  - Mobile: +32 (478) 80 82 91

Kempische Steenweg 206 - 3500 Hasselt - Tel-Fax: +32 (11) 43 55 76

Web: http://www.implements.be

First they ignore you.  Then they laugh at you.  Then they fight you.
Then you win.
Mahatma Ghandi.

Attachment

Re: Insert performance (OT?)

From
Manfred Koizar
Date:
On Tue, 19 Jul 2005 11:51:51 +0100, Richard Huxton <dev@archonet.com>
wrote:
>You could get away with one query if you converted them to left-joins:
>INSERT INTO ...
>SELECT * FROM upload LEFT JOIN ... WHERE f3 IS NULL
>UNION
>SELECT * FROM upload LEFT JOIN ... WHERE f4 IS NULL

For the archives:  This won't work.  Each of the two SELECTs
eliminates rows violating one of the two constraints but includes rows
violating the other constraint.  After the UNION you are back to
violating both constraints :-(

Servus
 Manfred


Re: Insert performance (OT?)

From
John A Meinel
Date:
Manfred Koizar wrote:
> On Tue, 19 Jul 2005 11:51:51 +0100, Richard Huxton <dev@archonet.com>
> wrote:
>
>>You could get away with one query if you converted them to left-joins:
>>INSERT INTO ...
>>SELECT * FROM upload LEFT JOIN ... WHERE f3 IS NULL
>>UNION
>>SELECT * FROM upload LEFT JOIN ... WHERE f4 IS NULL
>
>
> For the archives:  This won't work.  Each of the two SELECTs
> eliminates rows violating one of the two constraints but includes rows
> violating the other constraint.  After the UNION you are back to
> violating both constraints :-(

Couldn't you use "INTERSECT" then? To only get the rows that *both*
queries return?
John
=:->

>
> Servus
>  Manfred
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 5: don't forget to increase your free space map settings
>


Attachment