Re: sample scans and predicate locking - Mailing list pgsql-hackers

From Thomas Munro
Subject Re: sample scans and predicate locking
Date
Msg-id CA+hUKGJQMyFQan=H+bHNeC4ov+JEcO6v3Z6N0FebSMr6U9UT4Q@mail.gmail.com
Whole thread Raw
In response to sample scans and predicate locking  (Andres Freund <andres@anarazel.de>)
Responses Re: sample scans and predicate locking  (Andres Freund <andres@anarazel.de>)
List pgsql-hackers
On Sun, May 19, 2019 at 8:31 AM Andres Freund <andres@anarazel.de> wrote:
> While looking at fixing [1] on master, I noticed the following
> codeblock:
>
> static HeapScanDesc
> heap_beginscan_internal(Relation relation, Snapshot snapshot,
>                                                 int nkeys, ScanKey key,
>                                                 ParallelHeapScanDesc parallel_scan,
>                                                 bool allow_strat,
>                                                 bool allow_sync,
>                                                 bool allow_pagemode,
>                                                 bool is_bitmapscan,
>                                                 bool is_samplescan,
>                                                 bool temp_snap)
> ...
>         /*
>          * For a seqscan in a serializable transaction, acquire a predicate lock
>          * on the entire relation. This is required not only to lock all the
>          * matching tuples, but also to conflict with new insertions into the
>          * table. In an indexscan, we take page locks on the index pages covering
>          * the range specified in the scan qual, but in a heap scan there is
>          * nothing more fine-grained to lock. A bitmap scan is a different story,
>          * there we have already scanned the index and locked the index pages
>          * covering the predicate. But in that case we still have to lock any
>          * matching heap tuples.
>          */
>         if (!is_bitmapscan)
>                 PredicateLockRelation(relation, snapshot);
>
> As you can see this only tests for is_bitmapscan, *not* for
> is_samplescan. Which means we afaict currently every sample scan
> predicate locks the entire relation.

Right, I just tested that.  That's not wrong though, is it?  It's just
overly pessimistic.

> I think there's two possibilities here:
>
> 1) It's just the comment that's inaccurate, and it should really talk
>    about both seqscans and sample scans. It should not be necessary to
>    lock the whole relation, but I'm not sure the code otherwise takes
>    enough care.
>
> 2) We should really not predicate lock the entire relation. In which
>    case I think there might be missing PredicateLockTuple/Page calls.

Yeah, we could probably predicate-lock pages in
heapam_scan_sample_next_block() and tuples in
heapam_scan_sample_next_tuple(), instead of doing this.  Seems like a
reasonable improvement for 13.  But... hmm....  There *might* be a
theoretical argument about TABLESAMPLE(100) behaving differently if
done per page rather than if done at relation level, wrt new pages
added to the end later and therefore missed.  And then by logical
extension, TABLESAMPLE of any percentage.  I'm not sure.

I made a little list of small SERIALIZABLE projects for v13 and added
this, over here:

https://wiki.postgresql.org/wiki/SerializableToDo

-- 
Thomas Munro
https://enterprisedb.com



pgsql-hackers by date:

Previous
From: Stephen Frost
Date:
Subject: Re: Multivariate MCV stats can leak data to unprivileged users
Next
From: Sascha Kuhl
Date:
Subject: Organisational structure