Thread: Is there a way to run heap_insert() AFTER ExecInsertIndexTuples() ?

Is there a way to run heap_insert() AFTER ExecInsertIndexTuples() ?

From
Zoltan Boszormenyi
Date:
Hi,

I would like to be able to harden the conditions
under which IDENTITY column values are generated, so that the
events below run in this order:

- assign values for regular columns (with or without DEFAULT)
- NOT NULL checks on regular columns
- CHECK constraints whose expression contains only regular columns

- assign values for GENERATED columns
- NOT NULL checks on GENERATED columns
- CHECK constraints whose expression may contain regular or GENERATED columns

- UNIQUE index checks that involve only regular columns
- UNIQUE index checks that may involve regular or GENERATED columns

- assign the value for the IDENTITY column
- NOT NULL check on IDENTITY
- CHECK constraints on IDENTITY
- UNIQUE index checks that reference the IDENTITY column

At this point the heap tuple and the index tuple can be inserted
without further checks.
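
As a concrete example of the kind of table I mean (made-up names, and the
GENERATED/IDENTITY syntax below is the SQL:2003 form the patch implements,
not something stock PostgreSQL accepts today):

create table invoice (
    customer_id integer not null,
    order_ref   text    not null,
    net_amount  numeric not null check (net_amount >= 0),
    -- GENERATED column computed from a regular column (proposed syntax)
    vat_amount  numeric generated always as (net_amount * 0.2),
    -- IDENTITY column (proposed syntax)
    invoice_no  integer generated always as identity,
    unique (customer_id, order_ref),  -- unique check on regular columns only
    unique (invoice_no)               -- unique check on the IDENTITY column
);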

Currently tuple->t_self is required by ExecInsertIndexTuples(),
and I don't see any way to make IDENTITY work the way it's
intended other than interleaving heap_insert()/heap_update() with
ExecInsertIndexTuples() and using the result in
ExecInsert() and ExecUpdate().

Would it be acceptable?

Best regards,
Zoltán Böszörményi



Zoltan Boszormenyi <zboszor@dunaweb.hu> writes:
> Would it be acceptable?

No, because you can't create index entries when you haven't yet got the
TID for the heap tuple.  What do you propose doing, insert a dummy index
entry and then go back to fill it in later?  Aside from approximately
doubling the work involved, this is fundamentally broken because no
other backend could know what to do upon encountering the dummy index
entry --- there's no way for it to check if the entry references a live
tuple or not.  Not to mention that a crash here will leave a permanently
dummy index entry that there's no way to vacuum.

The other rearrangements you suggest are not any more acceptable;
we are not going to restructure the entire handling of defaults and
check constraints around a single badly-designed SQL2003 feature.
        regards, tom lane


Re: Is there a way to run heap_insert() AFTER ExecInsertIndexTuples() ?

From
Zoltan Boszormenyi
Date:
Hi,

Tom Lane wrote:
> Zoltan Boszormenyi <zboszor@dunaweb.hu> writes:
>   
>> Would it be acceptable?
>>     
>
> No, because you can't create index entries when you haven't yet got the
> TID for the heap tuple.  What do you propose doing, insert a dummy index
> entry and then go back to fill it in later?  Aside from approximately
>   

No, I was thinking about breaking up e.g. heap_insert()
so it can be interleaved with ExecInsertIndexTuples(): I would
get a pinned buffer and have the heap tuple's t_self set first,
then do the uniqueness checks step by step.
BTW, I can use modify_tuple() after doing
RelationGetBufferForTuple() and RelationPutHeapTuple(),
right?

> doubling the work involved, this is fundamentally broken because no
>   

Well, the work wouldn't be doubled: all the unique indexes
have to be checked anyway in the current code, too, before
the tuple is accepted into the database.

> other backend could know what to do upon encountering the dummy index
> entry --- there's no way for it to check if the entry references a live
> tuple or not.  Not to mention that a crash here will leave a permanently
> dummy index entry that there's no way to vacuum.
>
> The other rearrangements you suggest are not any more acceptable;
> we are not going to restructure the entire handling of defaults and
> check constraints around a single badly-designed SQL2003 feature.
>   

My IDENTITY/GENERATED patch currently breaks up the
checks this way (CHECK constraints are prohibited
for these special-case columns):

- normal columns are assigned values (maybe using DEFAULT)
- check NOT NULLs and CHECKs for normal columns

( Up to this point this works the same way as before if you use neither IDENTITY nor GENERATED. )

- assign GENERATED columns their values
- check NOT NULLs for GENERATED
- assign the IDENTITY column its value
- check NOT NULL for IDENTITY

and

- check UNIQUE for everything

IDENTITY would be special so that it doesn't burn sequence
values if that's avoidable. Currently a value is lost whenever
UNIQUE fails for any index, which still very much makes it unusable.

What I would like to achieve is that IDENTITY skips
a sequence value and the INSERT fails only when the IDENTITY
column's unique check fails. That pretty much means
that there is already a record with that IDENTITY value,
regardless of whether the UNIQUE index is defined on only the IDENTITY
column or the IDENTITY column is part of a multi-column
UNIQUE index.
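
For example (made-up table, proposed IDENTITY syntax again):

create table payment (
    pay_no   integer generated always as identity,  -- proposed syntax
    acct     text    not null,
    pay_date date    not null,
    unique (pay_no),                  -- IDENTITY column alone
    unique (acct, pay_date, pay_no)   -- IDENTITY inside a multi-column index
);
-- a unique failure on either index pretty much means a row with that
-- pay_no already exists, so burning the sequence value is acceptable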

If I could break up the order of events the way I described
in my first mail, I could re-enable CHECK constraints
for both IDENTITY and GENERATED columns.

The point with GENERATED is that you have to have
all the other columns assigned values BEFORE
being able to compute a GENERATED column
that references other columns in its expression, so
you _have to_ break up the current order of computing
DEFAULTs. I know a computed column could be done
either in the application or with SELECTs, but compare
the amount of work: if you do it in the SELECT you have to
compute the expression every time the SELECT is run,
making it slower. Doing it on UPDATE or INSERT
makes it LESS work in a workload with few INSERTs/UPDATEs
and heavy SELECTs. Of course, for a heavy UPDATE
workload it means more work, but only if you actually
use GENERATED columns. It means exactly the same
amount of work if you use IDENTITY as with SERIAL,
it's just done in a different order.
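
As a quick sketch of the trade-off (made-up names, proposed GENERATED syntax
for the stored column):

-- computing the expression in every SELECT:
select net_amount, net_amount * 0.2 as vat_amount from invoice;

-- versus storing it once at INSERT/UPDATE time in a GENERATED column,
-- so the SELECT only reads the value back:
select net_amount, vat_amount from invoice;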

The GENERATED column is an ease-of-use feature
with possibly less work, whereas the IDENTITY
column is mandatory for some applications (e.g. accounting
and billing are stricter in some countries) where you simply
cannot skip a value in the sequence; strict monotonicity is
not enough.

Best regards,
Zoltán Böszörményi



Postgres Replication

From
Mageshwaran
Date:
Hi,

Can somebody help me with Postgres replication? Please give me some ideas.

Thanks in advance

Regards
J Mageshwaran


********** DISCLAIMER **********
Information contained and transmitted by this E-MAIL is proprietary to 
Sify Limited and is intended for use only by the individual or entity to 
which it is addressed, and may contain information that is privileged, 
confidential or exempt from disclosure under applicable law. If this is a 
forwarded message, the content of this E-MAIL may not have been sent with 
the authority of the Company. If you are not the intended recipient, an 
agent of the intended recipient or a  person responsible for delivering the 
information to the named recipient,  you are notified that any use, 
distribution, transmission, printing, copying or dissemination of this 
information in any way or in any manner is strictly prohibited. If you have 
received this communication in error, please delete this mail & notify us 
immediately at admin@sifycorp.com


Watch the latest updates on Mumbai, with video coverage of news, events,
Bollywood, live darshan from Siddhivinayak temple and more, only on
www.mumbailive.in

Watch the hottest videos from Bollywood, Fashion, News and more only on
www.sifymax.com



Re: Postgres Replication

From
Andrew Dunstan
Date:
Mageshwaran wrote:
> Hi,
>
> Can somebody help me with Postgres replication? Please give me some ideas.
>
> Thanks in advance
>
>

Here are some ideas:

lose the idiotic, pointless and inaccurate email addendum, especially 
the ads for bollywood etc.

do some research yourself, especially by reading the Postgres 
documentation - google is also your friend

ask questions in the correct forum, which for this type of question this
list is not.

do not create mail on a new subject by replying to an old email and 
changing the subject line


cheers

andrew



Re: Is there a way to run heap_insert() AFTER ExecInsertIndexTuples() ?

From
"Florian G. Pflug"
Date:
Zoltan Boszormenyi wrote:
> The GENERATED column is an ease-of-use feature
> with possibly less work, whereas the IDENTITY
> column is mandatory for some applications (e.g. accounting
> and billing are stricter in some countries) where you simply
> cannot skip a value in the sequence; strict monotonicity is
> not enough.

But just postponing nextval() until after the uniqueness checks
only decreases the *probability* of non-monotonic values, and
*does not* prevent them. Consider two transactions:

A: begin ;
B: Begin ;
A: insert ... -- IDENTITY generates value 1
B: insert .. -- IDENTITY generates value 2
A: rollback ;
B: commit ;

Now there is a record with IDENTITY 2, but not with 1. The *only*
way to fix this is to *not* use a sequence, but rather do
lock table t in exclusive mode ;
select max(identity)+1 from t ;
to generate the identity - but of course this prevents any concurrent
inserts, which will make this unusable for any larger database.
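
Spelled out (sketch only, made-up table and column names), that
gapless-but-serializing approach would be something like

begin ;
lock table t in exclusive mode ;
insert into t (identity, payload)
  select coalesce(max(identity), 0) + 1, 'some data' from t ;
commit ;

where every inserter has to wait for the exclusive lock.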

Note that this is not a deficiency of postgres sequences - there is no
way to guarantee strictly monotonic values while allowing concurrent
selects at the same time. (Other than lazily assigning the values, but
this needs to be done by the application)

I agree that it'd be nice to generate the identity columns as late as
possible to prevent needless gaps, but not if the price is a far more
intrusive patch, or much higher complexity.

greetings, Florian Pflug



Re: Is there a way to run heap_insert() AFTER ExecInsertIndexTuples() ?

From
Zoltan Boszormenyi
Date:
Florian G. Pflug wrote:
> Zoltan Boszormenyi wrote:
>> The GENERATED column is an ease-of-use feature
>> with possibly less work, whereas the IDENTITY
>> column is mandatory for some applications (e.g. accounting
>> and billing are stricter in some countries) where you simply
>> cannot skip a value in the sequence; strict monotonicity is
>> not enough.
>
> But just postponing nextval() until after the uniqueness checks
> only decreases the *probability* of non-monotonic values, and
> *does not* prevent them. Consider two transactions:
>
> A: begin ;
> B: Begin ;
> A: insert ... -- IDENTITY generates value 1
> B: insert .. -- IDENTITY generates value 2
> A: rollback ;
> B: commit ;

I can understand that. But your example is faulty,
you can't have a transaction inside a transaction.
Checkpoints are another story. 8-)

You can use some application tricks to get a
continuous sequence today with regular serials,
but only if you don't have a unique index that
excludes the serial column: you insert a record
into that table outside the transaction and make
a note of the serial value.

If subsequent processing fails (because of a unique
or check constraint, etc.) you have to go back to the main
table and modify the record, indicating that it
doesn't represent valid data. But you must keep it with
the serial value it was assigned. I have seen systems
requiring this. My point is that with the identity
column, you will be able to define a unique index
on the table that excludes the identity column.
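
Roughly the pattern (sketch only, made-up table and status values,
using 8.2's RETURNING):

-- reserve the serial value in its own, separate transaction
insert into invoice_head (status) values ('pending') returning invoice_no ;

-- ... do the real processing ...

-- if it fails, keep the row and its serial value but mark it invalid
update invoice_head set status = 'void' where invoice_no = 42 ;
-- (42 being the value noted from the first insert)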

> Now there is a record with IDENTITY 2, but not with 1. The *only*
> way to fix this is to *not* use a sequence, but rather do
> lock table t in exclusive mode ;
> select max(identity)+1 from t ;
> to generate the identity - but of course this prevents any concurrent
> inserts, which will make this unusable for any larger database.
>
> Note that this is not a deficiency of postgres sequences - there is no
> way to guarantee strictly monotonic values while allowing concurrent
> selects at the same time. (Other than lazily assigning the values, but
> this needs to be done by the application)

Agreed.

> I agree that it'd be nice to generate the identity columns as late as
> possible to prevent needless gaps, but not if the price is a far more
> intrusive patch, or much higher complexity.

Intrusive, hm? The catalog has to indicate that the column
is IDENTITY, otherwise you cannot know it.

The cost I am thinking now is an extra heap_update()
after heap_insert() without generating the identity value
and inserting index tuples into the indexes that don't
contain the identity column.

Best regards,
Zoltán Böszörményi



Re: Is there a way to run heap_insert() AFTER ExecInsertIndexTuples() ?

From
Zoltan Boszormenyi
Date:
Zoltan Boszormenyi wrote:
> Florian G. Pflug wrote:
>> Zoltan Boszormenyi wrote:
>>> The GENERATED column is an ease-of-use feature
>>> with possibly less work, whereas the IDENTITY
>>> column is mandatory for some applications (e.g. accounting
>>> and billing are stricter in some countries) where you simply
>>> cannot skip a value in the sequence; strict monotonicity is
>>> not enough.
>>
>> But just postponing nextval() until after the uniqueness checks
>> only decreases the *probability* of non-monotonic values, and
>> *does not* prevent them. Consider two transactions:
>>
>> A: begin ;
>> B: Begin ;
>> A: insert ... -- IDENTITY generates value 1
>> B: insert .. -- IDENTITY generates value 2
>> A: rollback ;
>> B: commit ;
>
> I can understand that. But your example is faulty,
> you can't have a transaction inside a transaction.
> Checkpoints are another story. 8-)
>
> You can use some application tricks to get a
> continuous sequence today with regular serials,
> but only if you don't have a unique index that
> excludes the serial column: you insert a record
> into that table outside the transaction and make
> a note of the serial value.
>
> If subsequent processing fails (because of a unique
> or check constraint, etc.) you have to go back to the main
> table and modify the record, indicating that it
> doesn't represent valid data. But you must keep it with
> the serial value it was assigned. I have seen systems
> requiring this. My point is that with the identity
> column, you will be able to define a unique index
> on the table that excludes the identity column.
>
>> Now there is a record with IDENTITY 2, but not with 1. The *only*
>> way to fix this is to *not* use a sequence, but rather do
>> lock table t in exclusive mode ;
>> select max(identity)+1 from t ;
>> to generate the identity - but of course this prevents any concurrent
>> inserts, which will make this unusable for any larger database.
>>
>> Note that this is not a deficiency of postgres sequences - there is no
>> way to guarantee strictly monotonic values while allowing concurrent
>> selects at the same time. (Other than lazily assigning the values, but
>> this needs to be done by the application)
>
> Agreed.
>
>> I agree that it'd be nice to generate the identity columns as late as
>> possible to prevent needless gaps, but not if the price is a far more
>> intrusive patch, or much higher complexity.
>
> Intrusive, hm? The catalog has to indicate that the column
> is IDENTITY, otherwise you cannot know it.
>
> The cost I am thinking now is an extra heap_update()
> after heap_insert() without generating the identity value
> and inserting index tuples into the indexes that don't
> contain the identity column.

And as far as I tested the current state, there is no cost
if you don't use GENERATED or IDENTITY.
The extra heap_update() would be performed only
if you have an IDENTITY column.

> Best regards,
> Zoltán Böszörményi
>
>



Re: Is there a way to run heap_insert() AFTER ExecInsertIndexTuples() ?

From
"Florian G. Pflug"
Date:
Zoltan Boszormenyi wrote:
> Florian G. Pflug wrote:
>> Zoltan Boszormenyi wrote:
>>> The GENERATED column is an ease-of-use feature
>>> with possibly less work, whereas the IDENTITY
>>> column is mandatory for some applications (e.g. accounting
>>> and billing are stricter in some countries) where you simply
>>> cannot skip a value in the sequence; strict monotonicity is
>>> not enough.
>>
>> But just postponing nextval() until after the uniqueness checks
>> only decreases the *probability* of non-monotonic values, and
>> *does not* prevent them. Consider two transactions:
>>
>> A: begin ;
>> B: Begin ;
>> A: insert ... -- IDENTITY generates value 1
>> B: insert .. -- IDENTITY generates value 2
>> A: rollback ;
>> B: commit ;
>
> I can understand that. But your example is faulty,
> you can't have a transaction inside a transaction.
> Checkpoints are another story. 8-)

A: and B: are meant to denote *two* *different*
transactions running concurrently.

> You can use some application tricks to get a
> continuous sequence today with regular serials,
> but only if you don't have a unique index that
> excludes the serial column: you insert a record
> into that table outside the transaction and make
> a note of the serial value.
>
> If subsequent processing fails (because of a unique
> or check constraint, etc.) you have to go back to the main
> table and modify the record, indicating that it
> doesn't represent valid data. But you must keep it with
> the serial value it was assigned. I have seen systems
> requiring this. My point is that with the identity
> column, you will be able to define a unique index
> on the table that excludes the identity column.

Yes, of course you can prevent gaps by just filling them
with garbage/invalid records or whatever. But I don't see
why this is useful - either you want, say, your invoice
numbers to be continuous because it's required by law - or
you don't. But if the law requires your invoice numbers to be
continuous, surely just filling the gaps with fake invoices
is just as illegal as having gaps in the first place.

>> I agree that it'd be nice to generate the identity columns as late as
>> possible to prevent needless gaps, but not if the price is a far more
>> intrusive patch, or much higher complexity.
>
> Intrusive, hm? The catalog has to indicate that the column
> is IDENTITY, otherwise you cannot know it.
>
> The cost I am thinking now is an extra heap_update()
> after heap_insert() without generating the identity value
> and inserting index tuples into the indexes that don't
> contain the identity column.

I'll have to admit that I haven't actually looked at your patch -
so sorry if I misunderstood things. I got the impression that
Tom's main complaint was that you are shuffling too much existing code
around in your patch, and I figured that this is partly because you
try to generate the IDENTITY value as late as possible. Since doing
this won't prevent gaps, but just reduces the probability of creating
them, I thought that a way around Tom's concerns might be to drop
that requirement.

I will shut up now, at least until I have read the patch ;-)

greetings, Florian Pflug




Re: Is there a way to run heap_insert() AFTER ExecInsertIndexTuples() ?

From
Zoltan Boszormenyi
Date:
Florian G. Pflug wrote:
> Yes, of course you can prevent gaps by just filling them
> with garbage/invalid records or whatever. But I don't see
> why this is useful - either you want, say, your invoice
> numbers to be continuous because it's required by law - or
> you don't. But if the law requires your invoice numbers to be
> continuous, surely just filling the gaps with fake invoices
> is just as illegal as having gaps in the first place.

Not fake invoices, but ones "stornoed" (cancelled) for whatever reason.
But you have to keep the record to show you didn't delete anything.

Best regards,
Zoltán Böszörményi



Re: Postgres Replication

From
"Joshua D. Drake"
Date:
Andrew Dunstan wrote:
> Mageshwaran wrote:
>> Hi,
>>
>> Can somebody help me with Postgres replication? Please give me some ideas.
>>
>> Thanks in advance
>>
>>
> 
> Here are some ideas:

Tsk, Andrew.

Replication in PostgreSQL is usually served via Point in Time recovery,
Slony (http://www.slony.info) or Mammoth Replication
(http://www.commandprompt.com/).

Joshua D. Drake


> 
> lose the idiotic, pointless and inaccurate email addendum, especially
> the ads for bollywood etc.
> 
> do some research yourself, especially by reading the Postgres
> documentation - google is also your friend
> 
> ask questions in the correct forum, which for this type of question this
> list is not.
> 
> do not create mail on a new subject by replying to an old email and
> changing the subject line
> 
> 
> cheers
> 
> andrew
> 
> 


-- 
     === The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
Providing the most comprehensive PostgreSQL solutions since 1997
http://www.commandprompt.com/

Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
PostgreSQL Replication: http://www.commandprompt.com/products/



Re: Is there a way to run heap_insert() AFTER ExecInsertIndexTuples() ?

From
Zoltan Boszormenyi
Date:
Zoltan Boszormenyi wrote:
>> The cost I am thinking now is an extra heap_update()
>> after heap_insert() without generating the identity value
>> and inserting index tuples into the indexes that don't
>> contain the identity column.
>
> And as far as I tested the current state, there is no cost
> if you don't use GENERATED or IDENTITY.
> The extra heap_update() would be performed only
> if you have an IDENTITY column.

The modification I imagined is actually working:
- skip the indexes that use the identity column
- do a simple_heap_update() after all other columns are assigned and the other index tuples are inserted
- do ExecInsertIndexTuples() on the indexes referencing the IDENTITY column

However, I get warning messages like:

WARNING:  detected write past chunk end in ExecutorState 0xaaff68

How can I prevent them?

Best regards,
Zoltán Böszörményi



Zoltan Boszormenyi <zboszor@dunaweb.hu> writes:
> However, I get warning messages like:
> WARNING:  detected write past chunk end in ExecutorState 0xaaff68
> How can I prevent them?

Find the memory-clobbering bug in your patch.
        regards, tom lane


Re: Is there a way to run heap_insert() AFTER ExecInsertIndexTuples() ?

From
Zoltan Boszormenyi
Date:
Tom Lane wrote:
> Zoltan Boszormenyi <zboszor@dunaweb.hu> writes:
>   
>> However, I get warning messages like:
>> WARNING:  detected write past chunk end in ExecutorState 0xaaff68
>> How can I prevent them?
>>     
>
> Find the memory-clobbering bug in your patch.
>
>             regards, tom lane
>   

Thanks, I found it.

Best regards,
Zoltán Böszörményi



Re: Is there a way to run heap_insert() AFTER ExecInsertIndexTuples() ?

From
Bruno Wolff III
Date:
On Thu, Mar 01, 2007 at 11:26:23 +0100, "Florian G. Pflug" <fgp@phlo.org> wrote:
> 
> But just postponing nextval() until after the uniqueness checks
> only decreases the *probability* of non-monotonic values, and
> *does not* prevent them. Consider two transactions:
> 
> A: begin ;
> B: Begin ;
> A: insert ... -- IDENTITY generates value 1
> B: insert .. -- IDENTITY generates value 2
> A: rollback ;
> B: commit ;
> 
> Now there is a record with IDENTITY 2, but not with 1. The *only*
> way to fix this is to *not* use a sequence, but rather do
> lock table t in exclusive mode ;
> select max(identity)+1 from t ;
> to generate the identity - but of course this prevents any concurrent
> inserts, which will make this unusable for any larger database.

While this demonstrates that you can get holes in the sequence, it doesn't
show an example that is not monotonic.

> Note that this is not a deficiency of postgres sequences - there is no
> way to guarantee strictly monotonic values while allowing concurrent
> selects at the same time. (Other than lazily assigning the values, but
> this needs to be done by the application)

Within a single session, and barring wrap-around, you will get monotonically
increasing values. You are correct that there is no such guarantee between
separate sessions that overlap in time.


Re: Is there a way to run heap_insert() AFTER ExecInsertIndexTuples() ?

From
"Florian G. Pflug"
Date:
Bruno Wolff III wrote:
> On Thu, Mar 01, 2007 at 11:26:23 +0100,
>   "Florian G. Pflug" <fgp@phlo.org> wrote:
>> But just postponing nextval() until after the uniqueness checks
>> only decreases the *probability* of non-monotonic values, and
>> *does not* prevent them. Consider two transactions:
>>
>> A: begin ;
>> B: Begin ;
>> A: insert ... -- IDENTITY generates value 1
>> B: insert .. -- IDENTITY generates value 2
>> A: rollback ;
>> B: commit ;
>>
>> Now there is a record with IDENTITY 2, but not with 1. The *only*
>> way to fix this is to *not* use a sequence, but rather do
>> lock table t in exclusive mode ;
>> select max(identity)+1 from t ;
>> to generate the identity - but of course this prevents any concurrent
>> inserts, which will make this unusable for any larger database.
> 
> While this demonstrates that you can get holes in the sequence, it doesn't
> show an example that is not monotonic.
Sorry, my formulation was sloppy. What I meant is that you can't
guarantee gaplessness.

greetings, Florian Pflug