Thread: on delete cascade slowing down delete

on delete cascade slowing down delete

From
Ivan Sergio Borgonovo
Date:
I have a large table with a pk and several smaller tables with fks
referencing it.

deleting from the first table is very slow.

Not all the related fks have indexes, but those tables are VERY small
(0 to 100 records) while the main table contains 600-800K records.

an
explain delete from p;
doesn't give any clue.

Any hint to track down the problem?

thanks

--
Ivan Sergio Borgonovo
http://www.webthatworks.it


Re: on delete cascade slowing down delete

From
Alan Hodgson
Date:
On Thursday 21 August 2008, Ivan Sergio Borgonovo <mail@webthatworks.it>
wrote:
> I've a large table with a pk and several smaller tables with fk
> referencing to it.
>
> deleting from the first table is very slow.
>
> Not all the related fk have indexes but they are VERY small (0 to
> 100 records) while the main table contain 600-800K records.
>
> the
> explain delete p;
> doesn't give any clue.
>
> Any hint to track down the problem?

1) Define slow.

2) Tell us what your disk subsystem consists of.

3) Are there any delete triggers on the tables?

--
Alan

Re: on delete cascade slowing down delete

From
Joshua Drake
Date:
On Thu, 21 Aug 2008 10:16:21 -0700
Alan Hodgson <ahodgson@simkin.ca> wrote:


> > Any hint to track down the problem?
>
> 1) Define slow.
>
> 2) Tell us what your disk subsystem consists of.
>
> 3) Are there any delete triggers on the tables?
>

And the EXPLAIN ANALYZE

Joshua D. Drake


--
The PostgreSQL Company since 1997: http://www.commandprompt.com/
PostgreSQL Community Conference: http://www.postgresqlconference.org/
United States PostgreSQL Association: http://www.postgresql.us/
Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate



Re: on delete cascade slowing down delete

From
David Fetter
Date:
On Thu, Aug 21, 2008 at 07:06:32PM +0200, Ivan Sergio Borgonovo wrote:
> I've a large table with a pk and several smaller tables with fk
> referencing to it.
>
> deleting from the first table is very slow.
>
> Not all the related fk have indexes but they are VERY small (0 to
> 100 records) while the main table contain 600-800K records.
>
> the
> explain delete p;
> doesn't give any clue.
>
> Any hint to track down the problem?

BEGIN;
EXPLAIN ANALYZE DELETE ...
ROLLBACK;

Lack of indexes on the referencing tables might be an issue, as might
any triggers.
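
For the archives, a minimal sketch of the suggested diagnosis, using the hypothetical p/s schema that comes up later in this thread (the index name is made up):

```sql
-- Hypothetical schema: s.pid references p (pid) ON DELETE CASCADE.
-- Without an index on s.pid, each row deleted from p forces a
-- sequential scan of s to find the cascading rows.
CREATE INDEX s_pid_idx ON s (pid);

-- Time the delete without committing it:
BEGIN;
EXPLAIN ANALYZE DELETE FROM p;
ROLLBACK;
```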

Cheers,
David.
--
David Fetter <david@fetter.org> http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter      XMPP: david.fetter@gmail.com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

Re: on delete cascade slowing down delete

From
Ivan Sergio Borgonovo
Date:
On Thu, 21 Aug 2008 10:19:44 -0700
Joshua Drake <jd@commandprompt.com> wrote:

> On Thu, 21 Aug 2008 10:16:21 -0700
> Alan Hodgson <ahodgson@simkin.ca> wrote:
>
>
> > > Any hint to track down the problem?

> > 1) Define slow.

still undefined anyway: 20-30 min compared to over 2h and still
running.

> > 2) Tell us what your disk subsystem consists of.

RAID5 on SCSI, 4GB RAM, 2x Xeon (single core, HT, 3.2GHz)

> > 3) Are there any delete triggers on the tables?

yes, as in the subject... several, but on very small tables (no more
than a few hundred records, mostly in the order of a few tens).

BTW work_mem is 32MB

> And the EXPLAIN ANALYZE

isn't explain analyze going to take forever, since it actually runs
the statements?

btw if I have

create table p(
  pid int primary key
);

create table s(
  sid int primary key,
  pid int references p (pid) on delete cascade
);

Is it going to make things faster if I:

delete from s;
reindex table s;
delete from p;

or, once everything else is in place (e.g. correct indexes that will
be useful for normal operation anyway, a postgresql.conf suited to my
workload and box, whatever else I still have to discover, etc.) and
nothing requires extra bookkeeping on my part, will it be no
different from a simpler

delete from p;

?

thanks

--
Ivan Sergio Borgonovo
http://www.webthatworks.it


SERIAL datatype

From
Peter Billen
Date:
Hi all,

I would like to ask a question about the serial datatype. Say I have a
field of type serial, and say for the sake of example that the range of
a serial goes from 1 to 5 (inclusive). I insert 10 entries into the
table, so the table is 'full':

INSERT INTO my_table (my_serial) VALUES (DEFAULT);
INSERT INTO my_table (my_serial) VALUES (DEFAULT);
INSERT INTO my_table (my_serial) VALUES (DEFAULT);
INSERT INTO my_table (my_serial) VALUES (DEFAULT);

Next I delete a random entry, say the one with value 3:

DELETE FROM my_table WHERE my_serial = 3;

Is it possible to insert a new entry? Will the serial sequence somehow
be able to find the gap (3)?

The reason why I am asking is because I have a table in which entries
are constantly being deleted and inserted. What happens if the serial
sequence is exhausted? If it is not able to go to the next gap, how is
it possible to keep inserting and deleting entries once the serial
sequence has been exhausted? I can't find this anywhere in the docs.

To me, it is perfectly possible that there is only one entry in the
table, with a serial value equal to its upper limit.

Thanks in advance. Kind regards,

Peter

Re: SERIAL datatype

From
Peter Billen
Date:
Oops, my example was edited a bit incorrectly.

I meant to say that the range of a serial datatype goes from 1 to 5
(inclusive) and that I insert five entries (not 10).

Peter

Peter Billen schreef:
> Hi all,
>
> I would like to ask a question about the serial datatype. Say I have a
> field of type serial, and say for the sake of example that the range
> of a serial goes from 1 to 5 (inclusive). I insert 10 entries into the
> table, so the table is 'full':
>
> INSERT INTO my_table (my_serial) VALUES (DEFAULT);
> INSERT INTO my_table (my_serial) VALUES (DEFAULT);
> INSERT INTO my_table (my_serial) VALUES (DEFAULT);
> INSERT INTO my_table (my_serial) VALUES (DEFAULT);
>
> Next I delete a random entry, say the one with value 3:
>
> DELETE FROM my_table WHERE my_serial = 3;
>
> Is it possible to insert a new entry? Will the serial sequence somehow
> be able to find the gap (3)?
>
> The reason why I am asking is because I have a table in which
> constantly entries are being deleted and inserted. What happens if the
> serial sequence is exhausted? If it is not able to go the the next
> gap, how is it possible to keep inserting and deleting entries once
> the serial sequence has been exhausted? I can't find this anywhere in
> docs.
>
> To me, it is perfectly possible that there is only one entry in the
> table, with a serial value equal to its upper limit.
>
> Thanks in advance. Kind regards,
>
> Peter
>


Re: SERIAL datatype

From
ries van Twisk
Date:
On Aug 21, 2008, at 2:23 PM, Peter Billen wrote:

> Oops, my example was a bit incorrectly edited.
>
> I wanted to say that the range of a serial datatype goes from 1 to 5
> (incluse) and I insert five entries (not 10).
>
> Peter
>
> Peter Billen schreef:
>> Hi all,
>>
>> I would like to ask a question about the serial datatype. Say I
>> have a field of type serial, and say for the sake of example that
>> the range of a serial goes from 1 to 5 (inclusive). I insert 10
>> entries into the table, so the table is 'full':
>>
>> INSERT INTO my_table (my_serial) VALUES (DEFAULT);
>> INSERT INTO my_table (my_serial) VALUES (DEFAULT);
>> INSERT INTO my_table (my_serial) VALUES (DEFAULT);
>> INSERT INTO my_table (my_serial) VALUES (DEFAULT);
>>
>> Next I delete a random entry, say the one with value 3:
>>
>> DELETE FROM my_table WHERE my_serial = 3;
>>
>> Is it possible to insert a new entry? Will the serial sequence
>> somehow be able to find the gap (3)?
>>
>> The reason why I am asking is because I have a table in which
>> constantly entries are being deleted and inserted. What happens if
>> the serial sequence is exhausted? If it is not able to go the the
>> next gap, how is it possible to keep inserting and deleting entries
>> once the serial sequence has been exhausted? I can't find this
>> anywhere in docs.
>>
>> To me, it is perfectly possible that there is only one entry in the
>> table, with a serial value equal to its upper limit.
>>
>> Thanks in advance. Kind regards,
>>
>> Peter
>>

Maybe you want to use BIGSERIAL if you are worried?

Ries





A: Because it messes up the order in which people normally read text.
Q: Why is top-posting such a bad thing?
A: Top-posting.
Q: What is the most annoying thing in e-mail?


Re: SERIAL datatype

From
Peter Billen
Date:
My concern is not that the table will become full, but that the sequence
will be exhausted. Doing INSERT, DELETE, INSERT, DELETE ... will exhaust
the sequence. What will happen then? Do I have to manually re-order my
serial values and reset the start sequence ID to MAX() + 1?

Thanks in advance,

Peter

ries van Twisk schreef:
>
> On Aug 21, 2008, at 2:23 PM, Peter Billen wrote:
>
>> Oops, my example was a bit incorrectly edited.
>>
>> I wanted to say that the range of a serial datatype goes from 1 to 5
>> (incluse) and I insert five entries (not 10).
>>
>> Peter
>>
>> Peter Billen schreef:
>>> Hi all,
>>>
>>> I would like to ask a question about the serial datatype. Say I have
>>> a field of type serial, and say for the sake of example that the
>>> range of a serial goes from 1 to 5 (inclusive). I insert 10 entries
>>> into the table, so the table is 'full':
>>>
>>> INSERT INTO my_table (my_serial) VALUES (DEFAULT);
>>> INSERT INTO my_table (my_serial) VALUES (DEFAULT);
>>> INSERT INTO my_table (my_serial) VALUES (DEFAULT);
>>> INSERT INTO my_table (my_serial) VALUES (DEFAULT);
>>>
>>> Next I delete a random entry, say the one with value 3:
>>>
>>> DELETE FROM my_table WHERE my_serial = 3;
>>>
>>> Is it possible to insert a new entry? Will the serial sequence
>>> somehow be able to find the gap (3)?
>>>
>>> The reason why I am asking is because I have a table in which
>>> constantly entries are being deleted and inserted. What happens if
>>> the serial sequence is exhausted? If it is not able to go the the
>>> next gap, how is it possible to keep inserting and deleting entries
>>> once the serial sequence has been exhausted? I can't find this
>>> anywhere in docs.
>>>
>>> To me, it is perfectly possible that there is only one entry in the
>>> table, with a serial value equal to its upper limit.
>>>
>>> Thanks in advance. Kind regards,
>>>
>>> Peter
>>>
>
> May be you want to use BIGSERIAL if you are worried?
>
> Ries
>
>
>
>
>
> A: Because it messes up the order in which people normally read text.
> Q: Why is top-posting such a bad thing?
> A: Top-posting.
> Q: What is the most annoying thing in e-mail?
>
>


Re: SERIAL datatype

From
"Scott Marlowe"
Date:
On Thu, Aug 21, 2008 at 1:08 PM, Peter Billen <peter@clueless.be> wrote:
> Hi all,
>
> I would like to ask a question about the serial datatype. Say I have a field
> of type serial, and say for the sake of example that the range of a serial
> goes from 1 to 5 (inclusive). I insert 5 (ed) entries into the table, so the
> table is 'full':
>
> INSERT INTO my_table (my_serial) VALUES (DEFAULT);
> INSERT INTO my_table (my_serial) VALUES (DEFAULT);
> INSERT INTO my_table (my_serial) VALUES (DEFAULT);
> INSERT INTO my_table (my_serial) VALUES (DEFAULT);
>
> Next I delete a random entry, say the one with value 3:
>
> DELETE FROM my_table WHERE my_serial = 3;
>
> Is it possible to insert a new entry? Will the serial sequence somehow be
> able to find the gap (3)?

No, sequences do not fill in gaps.

> The reason why I am asking is because I have a table in which constantly
> entries are being deleted and inserted. What happens if the serial sequence
> is exhausted? If it is not able to go the the next gap, how is it possible
> to keep inserting and deleting entries once the serial sequence has been
> exhausted? I can't find this anywhere in docs.

Regular SERIAL type is limited to a 32 bit int.  BIGSERIAL uses a 64
bit int.  That will give you an upper limit of 2^63, assuming positive
values only in the sequence.  If you run out of that many values
you're running a seriously busy database over a very long time.

My rough guesstimate is that at 2000 inserts per second, it would take
approximately 145,865,043 years to exhaust a BIGSERIAL.  I might be
off by a factor of ten or so there.  But I don't think I am.  Note
that an insert rate of 2000 per second would exhaust a regular SERIAL
type (2^31 size) in 12 days.
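
[A quick back-of-the-envelope check of those figures, assuming a constant 2000 inserts per second:]

```python
# Verify the exhaustion estimates above at a constant insert rate.
RATE = 2000  # inserts per second (the rate assumed in the email)

serial_max = 2**31     # regular SERIAL: positive half of a 32-bit int
bigserial_max = 2**63  # BIGSERIAL: positive half of a 64-bit int

serial_days = serial_max / RATE / 86400              # ~12.4 days
bigserial_years = bigserial_max / RATE / (86400 * 365)  # ~1.46e8 years

print(f"SERIAL exhausted in ~{serial_days:.1f} days")
print(f"BIGSERIAL exhausted in ~{bigserial_years:.3e} years")
```

The ~146 million year figure matches the estimate in the email, so it is not off by a factor of ten after all.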

> To me, it is perfectly possible that there is only one entry in the table,
> with a serial value equal to its upper limit.

That's fine too.  If you need gapless sequences, be prepared to pay
more in terms of overhead costs.  If you don't need gapless sequences
(and usually you don't) then use either SERIAL or BIGSERIAL.

Re: SERIAL datatype

From
"Scott Marlowe"
Date:
On Thu, Aug 21, 2008 at 1:51 PM, Peter Billen <peter@clueless.be> wrote:
> My concern is not that the table will become full, but that the sequence
> will be exhausted. Doing INSERT, DELETE, INSERT, DELETE ... will exhaust the
> sequence. What will happen then? Do I have to manually re-order my serial
> values and reset the start sequence ID to MAX() + 1?

See my other reply.  short answer:  use bigserial

Re: SERIAL datatype

From
Peter Billen
Date:
Thanks. I thought it was a bit counter-intuitive to use a BIGSERIAL
when I will only ever have a few thousand entries, which are updated
(by DELETE and INSERT) constantly.

Thanks Scott,

Peter

Scott Marlowe schreef:
> On Thu, Aug 21, 2008 at 1:08 PM, Peter Billen <peter@clueless.be> wrote:
>
>> Hi all,
>>
>> I would like to ask a question about the serial datatype. Say I have a field
>> of type serial, and say for the sake of example that the range of a serial
>> goes from 1 to 5 (inclusive). I insert 5 (ed) entries into the table, so the
>> table is 'full':
>>
>> INSERT INTO my_table (my_serial) VALUES (DEFAULT);
>> INSERT INTO my_table (my_serial) VALUES (DEFAULT);
>> INSERT INTO my_table (my_serial) VALUES (DEFAULT);
>> INSERT INTO my_table (my_serial) VALUES (DEFAULT);
>>
>> Next I delete a random entry, say the one with value 3:
>>
>> DELETE FROM my_table WHERE my_serial = 3;
>>
>> Is it possible to insert a new entry? Will the serial sequence somehow be
>> able to find the gap (3)?
>>
>
> No, sequences do not fill in gaps.
>
>
>> The reason why I am asking is because I have a table in which constantly
>> entries are being deleted and inserted. What happens if the serial sequence
>> is exhausted? If it is not able to go the the next gap, how is it possible
>> to keep inserting and deleting entries once the serial sequence has been
>> exhausted? I can't find this anywhere in docs.
>>
>
> Regular SERIAL type is limited to a 32 bit int.  BIGSERIAL uses a 64
> bit int.  That will give you an upper limit of 2^63, assuming positive
> values only in the sequence.  If you run out of that many values
> you're running a seriously busy database over a very long time.
>
> My rough guesstimate is that at 2000 inserts per second, it would take
> approximately 145,865,043 years to exhaust a BIGSERIAL.  I might be
> off by a factor of ten or so there.  But I don't think I am.  Note
> that an insert rate of 2000 per second would exhaust a regular SERIAL
> type (2^31 size) in 12 days.
>
>
>> To me, it is perfectly possible that there is only one entry in the table,
>> with a serial value equal to its upper limit.
>>
>
> That's fine too.  If you need gapless sequences, be prepared to pay
> more in terms of overhead costs.  If you don't need gapless sequences
> (and usually you don't) then use either SERIAL or BIGSERIAL.
>
>


Re: on delete cascade slowing down delete

From
Alban Hertroys
Date:
> Is it going to make things faster if I:
>
> delete from s;
> reindex table s;

Why do you think this step would help you any? There's no index on p
to begin with. You'd just be reindexing the auto-generated unique
index on s (due to it being a PK).

> delete from p;

And no, this would most likely be slower.

Alban Hertroys

--
If you can't see the forest for the trees,
cut the trees and you'll see there is no forest.





Re: on delete cascade slowing down delete

From
Ivan Sergio Borgonovo
Date:
On Fri, 22 Aug 2008 08:48:30 +0200
Alban Hertroys <dalroi@solfertje.student.utwente.nl> wrote:

>
> > Is it going to make things faster if I:
> >
> > delete from s;
> > reindex table s;

> Why do you think this step would help you any? There's no index on
> p to begin with. You'd just be reindexing the auto-generated
> unique index on s (due to it being a PK).

Sorry, I forgot to add the index in the example.
What if there was an index on s.pid too?
But mostly... if I delete from s first, will the deletion from p be
faster?

> > delete from p;

> And no, this would most likely be slower.

Why?

Stopping a
delete from p;
I can see that postgresql is actually also executing a
delete from s where pid=$1;
If s is already empty, and there is no other cascading delete on s,
that lookup should be faster. I was wondering whether it makes a
difference in terms of performance if
a) I have an index on pid in both tables
or
b) s is already empty

and... should I reindex s after the "delete from s" if I want some
speed-up on the delete from p?

Anyway this looks more and more like a dead end once things get more
and more complicated, since it requires too much bookkeeping.

--
Ivan Sergio Borgonovo
http://www.webthatworks.it


Re: SERIAL datatype

From
Sam Mason
Date:
On Thu, Aug 21, 2008 at 09:08:24PM +0200, Peter Billen wrote:
> Is it possible to insert a new entry? Will the serial sequence somehow
> be able to find the gap (3)?

As others have said, no it's not going to.  Sequences will only return
values out of order when explicitly told to.  The main reason is to
help prevent ambiguities in the data; if it could automatically reset
it would be much more difficult to determine if 7 was "older" than 3
or, worse, which 7 is the correct one.  In a well designed system this
shouldn't ever occur, but if something does go horribly wrong it's much
easier to put the pieces back together this way.


  Sam

Re: on delete cascade slowing down delete

From
Alban Hertroys
Date:
On Aug 22, 2008, at 9:45 AM, Ivan Sergio Borgonovo wrote:

> On Fri, 22 Aug 2008 08:48:30 +0200
> Alban Hertroys <dalroi@solfertje.student.utwente.nl> wrote:
>
>>
>>> Is it going to make things faster if I:
>>>
>>> delete from s;
>>> reindex table s;
>
>> Why do you think this step would help you any? There's no index on
>> p to begin with. You'd just be reindexing the auto-generated
>> unique index on s (due to it being a PK).
>
> Sorry I forgot to add the index in the example.
> What if there was an index in s.pid too?
> But mostly... if I delete s will the deletion of p be faster?

Hard to tell without the results from explain analyse. It depends on
what the planner decides to do, but it's often faster than the things
we come up with to work around the planner. As a rule of thumb, if
you're trying to work around the planner it is likely your problem is
caused by something else.

Without an explain plan everything is just speculation really, the
planner is quite smart and it knows your data. It tends to outsmart
the devs.

>>> delete from p;
>
>> And no, this would most likely be slower.
>
> Why?

Because of the extra reindex step. If you'd replace that with an
analyse of p, then it may be faster. Or it may not.

You seem to misinterpret the use case for REINDEX. Read here:
http://www.postgresql.org/docs/8.3/interactive/sql-reindex.html

Especially note the usage scenarios ;)

> Stopping a
> delete from p;
> I can see that actually postgresql is also executing a
> delete from s where pid=$1;
> if s is already empty, and there are no other cascading delete on s,
> the lookup should be faster. I was wondering if that doesn't make a
> difference in terms of performance if
> a) I've an index on pid on both tables
> or
> b) s is already empty
>
> and... should I reindex s if I "delete from s" first if I want some
> speed up on delete from p;
>
> Anyway this looks more and more a dead end once things get more and
> more complicated since it requires too much bookkeeping.

Maybe you shouldn't try to speculate on solutions before you have
ascertained what the problem is? People asked for an EXPLAIN ANALYSE;
we can't really help you without that.
With it we can probably tell you exactly what's causing your problem,
if we still need to once you see the results.

Alban Hertroys

--
If you can't see the forest for the trees,
cut the trees and you'll see there is no forest.





Re: SERIAL datatype

From
Alban Hertroys
Date:
On Aug 21, 2008, at 9:51 PM, Peter Billen wrote:

> My concern is not that the table will become full, but that the
> sequence will be exhausted. Doing INSERT, DELETE, INSERT,
> DELETE ... will exhaust the sequence. What will happen then? Do I
> have to manually re-order my serial values and reset the start
> sequence ID to MAX() + 1?

DELETEs don't use your sequence, so they will not exhaust it. In
practice only INSERTs do. I saw you mention sequences in combination
with DELETEs a few times, just making sure you're not confused ;)

Alban Hertroys

--
If you can't see the forest for the trees,
cut the trees and you'll see there is no forest.





Re: on delete cascade slowing down delete

From
Ivan Sergio Borgonovo
Date:
On Sun, 24 Aug 2008 13:37:11 +0200
Alban Hertroys <dalroi@solfertje.student.utwente.nl> wrote:

> >>> Is it going to make things faster if I:

> >>> delete from s;
> >>> reindex table s;

> >> Why do you think this step would help you any? There's no index
> >> on p to begin with. You'd just be reindexing the auto-generated
> >> unique index on s (due to it being a PK).

> > Sorry I forgot to add the index in the example.
> > What if there was an index in s.pid too?
> > But mostly... if I delete s will the deletion of p be faster?

> Hard to tell without the results from explain analyse. It depends
> on what the planner decides to do, but it's often faster than the
> things we come up with to work around the planner. As a rule of
> thumb, if you're trying to work around the planner it is likely
> your problem is caused by something else.

> Without an explain plan everything is just speculation really,
> the planner is quite smart and it knows your data. It tends to
> outsmart the devs.

> >>> delete from p;

> >> And no, this would most likely be slower.

> > Why?

> Because of the extra reindex step. If you'd replace that with an
> analyse of p, then it may be faster. Or it may not.

> You seem to misinterpret the use case for REINDEX. Read here:
> http://www.postgresql.org/docs/8.3/interactive/sql-reindex.html

> Especially note the usage scenarios ;)

http://www.postgresql.org/docs/8.1/interactive/routine-reindex.html

So on versions later than 7.4... what's going to happen if I delete a
whole table?
It looks like it is not an issue, and at least the reindexing can be
avoided.

> Maybe you shouldn't try to speculate on solutions before you
> ascertained what the problem is? People asked for an EXPLAIN
> ANALYSE, we can't really help you without that.

As I understand it, EXPLAIN ANALYSE does actually run the query...
but it was so damn slow that I couldn't get a result in a useful time.
I re-engineered the tables and now the stuff works at a reasonable
speed.

Does the planner optimise multiple statements in a transaction or
just a statement at a time?

--
Ivan Sergio Borgonovo
http://www.webthatworks.it


Re: SERIAL datatype

From
Mark Roberts
Date:
On Thu, 2008-08-21 at 13:53 -0600, Scott Marlowe wrote:
> Regular SERIAL type is limited to a 32 bit int.  BIGSERIAL uses a 64
> bit int.

I think one of the things that would be offsetting is the size
difference between the two types (32 vs 64 bits; 5 foreign keys and a
billion rows or so makes for a lot of pain).

-Mark


Re: SERIAL datatype

From
"Scott Marlowe"
Date:
On Mon, Aug 25, 2008 at 10:23 AM, Mark Roberts
<mailing_lists@pandapocket.com> wrote:
>
> On Thu, 2008-08-21 at 13:53 -0600, Scott Marlowe wrote:
>> Regular SERIAL type is limited to a 32 bit int.  BIGSERIAL uses a 64
>> bit int.
>
> I think one of the things that would be offsetting is the size
> difference between the two types (32 vs 64 bits, 5 foreign keys, and a
> billion rows or so makes for alot of pain).

Well, of course a 64 bit int is gonna be bigger than a 32 bit, but
with alignment issues and on 64 bit hardware, I'm guessing the
difference isn't exactly twice as slow / twice as much storage.  And
it's way faster than a GUID which was what I think started this
thread.

Re: SERIAL datatype

From
Mark Roberts
Date:
On Mon, 2008-08-25 at 11:02 -0600, Scott Marlowe wrote:
> Well, of course a 64 bit int is gonna be bigger than a 32 bit, but
> with alignment issues and on 64 bit hardware, I'm guessing the
> difference isn't exactly twice as slow / twice as much storage.  And
> it's way faster than a GUID which was what I think started this
> thread.

I took a slice of data from our dev box and generated a table using
integers and bigints.  For reference, the schema is:
bigint table:
 Type   | Modifiers
--------+-----------
bigint  |
date    |
bigint  |
bigint  |
bigint  |
bigint  |
bigint  |
date    |
date    |
bytea   |
integer |
integer |
numeric |
numeric |
numeric |
integer |
integer |
integer |
integer |
integer |
integer |
integer |
bytea   |

int table:
 Type   | Modifiers
--------+-----------
bigint  |
date    |
integer |
integer |
integer |
integer |
integer |
date    |
date    |
bytea   |
integer |
integer |
numeric |
numeric |
numeric |
integer |
integer |
integer |
integer |
integer |
integer |
integer |
bytea   |

The integer version is 599752704 bytes, and the bigint version is
673120256 bytes (a ~12% size increase).  When joining each table to
itself (keys = 1 date, 5 (big)ints, no indexes), the bigint version
averages 44.1 sec and the integer version 29.6 sec (a ~49%
performance hit).
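
[Re-running the arithmetic from the measurements above:]

```python
# Re-derive the percentages from the byte counts and join timings.
int_bytes, big_bytes = 599_752_704, 673_120_256
int_secs, big_secs = 29.6, 44.1

size_increase = (big_bytes / int_bytes - 1) * 100  # ~12.2% bigger
slowdown = (big_secs / int_secs - 1) * 100         # ~49% slower

print(f"size increase: {size_increase:.1f}%")
print(f"join slowdown: {slowdown:.1f}%")
```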

While granted that it's not twice as big and twice as slow, I think it's
a fairly valid reason to want to stay within (small)int ranges.
Sometimes the initial performance hit on insert would really be worth
the continuing space/performance savings down the road.

Of course, this wasn't very scientific and the benchmarks aren't very
thorough (for instance I assumed that bigserial is implemented as a
bigint), but it should remain a valid point.

Of course, it probably has no bearing on the OP's problem.  So my advice
to the OP: have you considered not keying such a volatile table on a
serial value?

-Mark


Re: SERIAL datatype

From
Zoltan Boszormenyi
Date:
Mark Roberts wrote:
> On Mon, 2008-08-25 at 11:02 -0600, Scott Marlowe wrote:
>
>> Well, of course a 64 bit int is gonna be bigger than a 32 bit, but
>> with alignment issues and on 64 bit hardware, I'm guessing the
>> difference isn't exactly twice as slow / twice as much storage.  And
>> it's way faster than a GUID which was what I think started this
>> thread.
>>
> ...
> The integer version is 599752704 bytes, and the bigint version is
> 673120256 bytes (a ~12% size increase).  When joining the table to
> itself (keys = 1 date, 5 (big)ints, no indexes), the bigint version
> performs a join to itself with an average of 44.1 sec, and the integer
> version in 29.6 sec (a 48% performance hit).
>
> While granted that it's not twice as big and twice as slow, I think it's
> a fairly valid reason to want to stay within (small)int ranges.
> Sometimes the initial performance hit on insert would really be worth
> the continuing space/performance savings down the road.
>

The development version of PostgreSQL (to-be 8.4) was modified so that
on 64-bit hardware, 64-bit types (bigint, date, timestamp, etc.) are
compile-time configurable to be passed by value instead of by
reference. This way most of the performance hit disappears, because
there is no malloc() overhead in passing bigints back and forth. Of
course, the on-disk size difference will be the same.

> Of course, this wasn't very scientific and the benchmarks aren't very
> thorough (for instance I assumed that bigserial is implemented as a
> bigint), but it should remain a valid point.
>
> Of course, it probably has no bearing on the OP's problem.  So my advice
> to the OP: have you considered not keying such a volatile table on a
> serial value?
>
> -Mark
>


--
----------------------------------
Zoltán Böszörményi
Cybertec Schönig & Schönig GmbH
http://www.postgresql.at/