Re: [HACKERS] TIME QUALIFICATION - Mailing list pgsql-hackers

From jwieck@debis.com (Jan Wieck)
Subject Re: [HACKERS] TIME QUALIFICATION
Date
Msg-id m10ACkF-000EBRC@orion.SAPserv.Hamburg.dsh.de
In response to Re: [HACKERS] TIME QUALIFICATION  (Vadim Mikheev <vadim@krs.ru>)
List pgsql-hackers
Vadim wrote:

> > > inside snapshot itself to prevent free-ing of snapshots.
> > > Benefits: no copying in Executor, no QueryId --> Snapshot
> > > lookup. Just add pointer to RTE. Parser will put NULL there:
> > > as flag that current snapshot has to be used. ExecutorStart
>                  ^^^^^^^^^^^^^^^^
> Note: "current" here is "when actual execution will start", not
> "when query was parsed/rewritten". ExecutorStart will substitute
> QuerySnapshot for NULL snapshot pointers.

    Yepp - snapshots are always built just before execution
    starts.
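
    A minimal sketch of the substitution Vadim describes, assuming
    simplified stand-in types: the Snapshot and RangeTblEntry
    structs and substitute_query_snapshot() below are hypothetical
    and are not the actual PostgreSQL definitions. It shows NULL
    snapshot pointers being replaced by the query snapshot at
    executor start, with the reference count kept inside the
    snapshot.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified types; not the real PostgreSQL structs. */
typedef struct Snapshot
{
    unsigned int xmax;      /* stand-in for the visibility data */
    int          refcount;  /* counter kept inside the snapshot itself */
} Snapshot;

typedef struct RangeTblEntry
{
    const char *relname;
    Snapshot   *snapshot;   /* NULL = use the snapshot current at execution start */
} RangeTblEntry;

/*
 * Replace every NULL snapshot pointer in the range table with the
 * query snapshot, taking one reference per substituted entry.
 * Entries that already point at a snapshot are left untouched.
 * Returns the number of entries that were substituted.
 */
static int
substitute_query_snapshot(RangeTblEntry *rtable, size_t nrte,
                          Snapshot *query_snapshot)
{
    int substituted = 0;

    assert(query_snapshot != NULL);

    for (size_t i = 0; i < nrte; i++)
    {
        if (rtable[i].snapshot == NULL)
        {
            rtable[i].snapshot = query_snapshot;
            query_snapshot->refcount++;
            substituted++;
        }
    }
    return substituted;
}
```

    Called once per query at executor start, this leaves any
    pointer set earlier by the rewriter as it is and only fills
    in the entries the parser marked with NULL.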

    The reason why I need the QueryId and its lookup is that the
    time of ExecutorStart() for one query has nothing to do with
    where the query came from or when it was parsed/rewritten.
    Due to the rewriting, RTEs in different queries are related
    to each other. Only the rewrite system knows these
    relationships, and the only place where this information
    could be stored is the RTE. All RTEs that are related to each
    other across queries must use the same snapshot when they get
    scanned.

> And so for deferred rules the rewrite system will:
>
> 1. set t2' RTE snapshot pointer to NULL - this will guarantee
>    that snapshot of execution time (commit or set immediate time)
>    will be used;
> 2. set t1' RTE snapshot pointer to current QuerySnapshot
>    (and increment its refcount).

    At parse/rewrite time there is no actual snapshot. And for an
    SPI prepared plan, the snapshot to use will be different for
    each execution. The RTE cannot hold the snapshot itself. It
    can only tell which of all the snapshots created during a
    transaction to use for it.
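
    A rough sketch of that idea, under stated assumptions: the
    RTE carries only a query identifier, and a table filled at
    execution time maps that identifier to the snapshot taken
    when the named query started. All names here (SnapshotMap,
    remember_snapshot, snapshot_for_rte, the convention that
    query_id 0 means "no name") are made up for illustration and
    are not existing PostgreSQL code.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_QUERIES 16      /* arbitrary limit for this sketch */

/* Hypothetical stand-in for a real snapshot. */
typedef struct Snapshot
{
    unsigned int xmax;
} Snapshot;

/* The RTE names a query instead of pointing at a snapshot. */
typedef struct RangeTblEntry
{
    const char *relname;
    int         query_id;   /* 0 = no name, use the current query's snapshot */
} RangeTblEntry;

/* Filled during execution: the snapshot taken when query N started. */
typedef struct SnapshotMap
{
    const Snapshot *by_query[MAX_QUERIES];
} SnapshotMap;

/* Record the snapshot that was built when the given query started. */
static void
remember_snapshot(SnapshotMap *map, int query_id, const Snapshot *snap)
{
    assert(query_id > 0 && query_id < MAX_QUERIES);
    map->by_query[query_id] = snap;
}

/*
 * Pick the snapshot for scanning one RTE. An RTE without a query name
 * uses the current snapshot; otherwise the named query's snapshot is
 * looked up (NULL if that query has not been executed yet).
 */
static const Snapshot *
snapshot_for_rte(const SnapshotMap *map, const RangeTblEntry *rte,
                 const Snapshot *current)
{
    assert(rte->query_id >= 0 && rte->query_id < MAX_QUERIES);

    if (rte->query_id == 0)
        return current;
    return map->by_query[rte->query_id];
}
```

    Because the range table stores only the identifier, a saved
    plan can be executed again with a fresh SnapshotMap, and
    copying the range table does not break the relationship
    between related RTEs.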

>
> >     And  there  could  also be rules fired on t2. This results in
> >     recursive rewriting and it's not that  easy  to  foresee  the
> >     order  in  which  all  these commands will then get executed.
> >     During recursion there is no  difference  between  a  command
> >     coming  from  the  user  and one that is already generated by
> >     another rule.
> >
> >     The problem here is, that the RTE's in a rule generated query
> >     resulting  from the former command (that fired them) must get
> >     scanned against the snapshot of  the  time  when  the  former
> >     command  gets   executed.  But the RTE's coming from the rule
>       ^^^^^^^^^^^^^^^^^^^^^^^^
> So - you use QuerySnapshot as it was in this time.
>
> >     action itself must get the snapshot when the rules command is
>
> Set RTE' snapshot pointer to NULL.
>
> >     executed.  Only this way the quals added to the rule from the
> >     former command will see what the former command saw.
> >
> >     The executor cannot know where all  the  RTE's  were   coming
> >     from. Except we have a QueryId and associate the QueryId with
> >     a snapshot at the time of execution. And I think we  must  do
> >     this lookup, because the order commands are executed will not
> >     be the same as they got created.  The executor  only  has  to
> >     override  the RTE's snapshot if the RTE's snapshot name isn't
> >     NULL.
>
> + set NULL snapshot pointers to QuerySnapshot.

    That way, the executor would have to set the snapshot
    pointers in all related RTEs of other queries (not yet
    executed) too, so that they point to the same snapshot. I can
    only think of an ordered set that links all the related RTEs
    to each other, but I would get into deep trouble keeping
    these sets alive when copying rangetables during rewrite or
    SPI_saveplan().

    Maybe I'm not able to explain precisely enough what I
    vaguely have in mind for how it could work. But after you
    reminded me not to forget prepared plans, I think I have all
    the odds and ends to build it.

    I'll hack around a little.   Then  let's  discuss  the  final
    details while having a prototype to look at.


Jan

--

#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#======================================== jwieck@debis.com (Jan Wieck) #
