... - Mailing list pgsql-hackers

From Kyotaro Horiguchi
Subject ...
Date
Msg-id 20190729.173120.160309061.horikyota.ntt@gmail.com
In response to Re: Index Skip Scan  (Dmitry Dolgov <9erthalion6@gmail.com>)
List pgsql-hackers
Hello.

On 2019/07/29 4:17, Dmitry Dolgov wrote:
>> On Thu, Jul 25, 2019 at 1:21 PM Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote:
> Yeah, will change both (hopefully soon)

Thanks. 

>> +          /*
>> +           * XXX: In case of index scan quals evaluation happens after
>> +           * ExecScanFetch, which means skip results could be fitered out
>> +           */
>>
>> Why can't we use skipscan path if having filter condition?  If
>> something bad happens, the reason must be written here instead of
>> what we do.
> 
> Sorry, looks like I've failed to express this more clear in the
> commentary. The point is that when an index scan (not for index
> only scan) has some conditions, their evaluation happens after
> skipping, and I don't see any not too much invasive way to
> apply skip correctly.

Yeah, your explanation was perfect for me. What I failed to
understand was what is expected to happen in that case. After
reconsidering, my understanding is as follows.

For example, the following query:

select distinct on (a, b) a, b, c from t where c < 100;

A skip scan returns one tuple for each distinct (a, b) pair, with
an arbitrary value of c. If the chosen c doesn't match the qual
but some other c in the same (a, b) group does, we miss that tuple.

If this is correct, an explanation like the above might help.
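
To make the missed-tuple case concrete, here is a minimal Python
simulation (not PostgreSQL code). The rows, the function names and the
assumption that the index returns (1, 1, 150) before (1, 1, 50) are all
made up for illustration.

```python
# Hypothetical contents of t as (a, b, c), in the order an index on
# (a, b) happens to return them. The order of c within a group is arbitrary.
rows = [(1, 1, 150), (1, 1, 50), (2, 1, 10)]


def qual(row):
    """The filter from the example query: c < 100."""
    return row[2] < 100


def skip_then_filter(index_rows, predicate):
    """Keep the first row of each (a, b) group, then apply the filter."""
    seen = set()
    picked = []
    for row in index_rows:
        key = row[:2]
        if key not in seen:
            seen.add(key)
            picked.append(row)
    return [row for row in picked if predicate(row)]


def filter_then_distinct(index_rows, predicate):
    """Apply the filter first, then keep the first surviving row per group."""
    seen = set()
    result = []
    for row in index_rows:
        if not predicate(row):
            continue
        key = row[:2]
        if key not in seen:
            seen.add(key)
            result.append(row)
    return result


wrong = skip_then_filter(rows, qual)       # group (1, 1) is lost
right = filter_then_distinct(rows, qual)   # group (1, 1) is kept via c = 50
```

With these rows, skipping first picks (1, 1, 150) for the group (1, 1),
the filter then rejects it, and the matching row (1, 1, 50) is never
seen.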


>> +     * If advancing direction is different from index direction, we must
>> +     * skip right away, but _bt_skip requires a starting point.
>>
>> It doesn't seem needed to me. Could you elaborate on the reason
>> for that?
> 
> This is needed for e.g. scan with a cursor backward without an index condition.
> E.g. if we have:
> 
>      1 1 2 2 3 3
>      1 2 3 4 5 6
> 
> and do
> 
>      DECLARE c SCROLL CURSOR FOR
>      SELECT DISTINCT ON (a) a,b FROM ab ORDER BY a, b;
> 
>      FETCH ALL FROM c;
> 
> we should get
> 
>      1 2 3
>      1 3 5
> 
> When afterwards we do
> 
>      FETCH BACKWARD ALL FROM c;
> 
> we should get
> 
>      3 2 1
>      5 3 1
> 
> 
> If we will use _bt_next first time without _bt_skip, the first pair would be
> 3 6 (the first from the end of the tuples, not from the end of the cursor).

Thanks for the explanation. Sorry, I somehow thought that that was
right. You're right.
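
The cursor example can be replayed with a small Python simulation (not
PostgreSQL code; the function names are hypothetical). It only models
which tuple of each group is returned, assuming the index delivers the
pairs in (a, b) order.

```python
from itertools import groupby

# The (a, b) pairs from the example, in index order.
rows = list(zip([1, 1, 2, 2, 3, 3], [1, 2, 3, 4, 5, 6]))


def group_key(row):
    return row[0]


def forward_distinct(index_rows):
    """FETCH ALL: the first tuple of each group in index order."""
    return [next(group) for _, group in groupby(index_rows, key=group_key)]


def naive_backward(index_rows):
    """Read backward and take the first tuple met in each group."""
    backward = reversed(index_rows)
    return [next(group) for _, group in groupby(backward, key=group_key)]


def skipping_backward(index_rows):
    """Read backward, but skip to the tuple a forward fetch would return."""
    backward = reversed(index_rows)
    return [list(group)[-1] for _, group in groupby(backward, key=group_key)]


forward = forward_distinct(rows)
naive = naive_backward(rows)
skipped = skipping_backward(rows)
```

Here `naive` starts with (3, 6), the first tuple from the end of the
index, while `skipped` starts with (3, 5) and is exactly the forward
result in reverse.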

>> +     * If advancing direction is different from index direction, we must
>> +     * skip right away, but _bt_skip requires a starting point.
>> +     */
>> +    if (direction * indexonlyscan->indexorderdir < 0 &&
>> +      !node->ioss_FirstTupleEmitted)
>>
>> I'm confused by this. "direction" there is the physical scan
>> direction (fwd/bwd) of index scan, which is already compensated
>> by indexorderdir. Thus the condition means we do that when
>> logical ordering (ASC/DESC) is DESC. (Though I'm not sure what
>> "index direction" exactly means...)
> 
> I'm not sure I follow, what do you mean by compensated? In general you're

I meant that the "direction" has already been converted to physical
order at that point.

> right, as David Rowley mentioned above, indexorderdir is a general scan
> direction, and direction is flipped estate->es_direction, which is a cursor
> direction. The goal of this condition is catch when those two are different,
> and we need to advance and read in different directions.

Mmm. Sorry, and thank you for the explanation. I was
stupid. You're right. Perhaps I misunderstood what indexorderdir
means. Maybe something like the following will work *for me*:p

| When we are fetching a cursor in the backward direction, return the
| tuples that forward fetching should have returned. In other
| words, we return the last scanned tuple in a DISTINCT set. Skip
| to that tuple before returning the first tuple.

# Of course, I need someone to correct this!
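
For reference, the product test in the quoted condition is true exactly
when the two values have opposite signs. A minimal Python sketch, where
the enum mirrors the values of PostgreSQL's ScanDirection (-1, 0, +1)
and the Python names and helper are hypothetical:

```python
from enum import IntEnum


class ScanDirection(IntEnum):
    """Python stand-in for PostgreSQL's ScanDirection values."""
    BACKWARD = -1
    NO_MOVEMENT = 0
    FORWARD = 1


def directions_differ(direction, indexorderdir):
    """Same test as `direction * indexorderdir < 0` in the quoted hunk."""
    return direction * indexorderdir < 0


# Every combination of the two values, with the outcome of the test.
table = {
    (d.name, o.name): directions_differ(d, o)
    for d in ScanDirection
    for o in ScanDirection
}
```

Only the two mixed FORWARD/BACKWARD combinations yield True; any pair
involving NO_MOVEMENT, or two equal directions, yields False.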

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center


