Re: Sequential scans - Mailing list pgsql-hackers

From Jeff Davis
Subject Re: Sequential scans
Date
Msg-id 1178151858.28383.212.camel@dogma.v10.wvs
Whole thread Raw
In response to Re: Sequential scans  (Heikki Linnakangas <heikki@enterprisedb.com>)
Responses Re: Sequential scans
List pgsql-hackers
On Wed, 2007-05-02 at 23:59 +0100, Heikki Linnakangas wrote:
> Jeff Davis wrote:
> > On Wed, 2007-05-02 at 20:58 +0100, Heikki Linnakangas wrote:
> >> Jeff Davis wrote:
> >>> What should be the maximum size of this hash table? 
> >> Good question. And also, how do you remove entries from it?
> >>
> >> I guess the size should somehow be related to number of backends. Each 
> >> backend will realistically be doing just 1 or max 2 seq scan at a time. 
> >>   It also depends on the number of large tables in the databases, but we 
> >> don't have that information easily available. How about using just 
> >> NBackends? That should be plenty, but wasting a few hundred bytes of 
> >> memory won't hurt anyone.
> > 
> > One entry per relation, not per backend, is my current design.
> 
> Umm, you naturally have just entry per relation, but we were talking 
> about how many entries the table needs to hold.. You're patch had a 
> hard-coded value of 1000 which is quite arbitrary.

I think I'm missing something from your statement "I guess the size
should somehow be related to number of backends." If there is only one
entry per relation, why do more backends require a larger hash table?

> If you only have one seq scan in the system, there's no contention. If 
> you have more, they will all need to acquire the lock to update the page 
> number in the hash table, regardless of the table their scanning, so 
> there's potential for contention.
> 
> To put things into perspective, though, any time you need to evict a 
> buffer from the buffer cache to read in a new buffer, you need to 
> acquire the BufFreelistLock. If we only update the page number say every 
> 10 pages, the overhead and lock contention of that lock would be roughly 
> 1/10th of that arising from BufFreelistLock. And we can make it a 
> conditional acquire, in which case the backends won't need to slow down 
> their scans, but the updates of the entries in the hash table will get 
> less frequent instead. We could also take just a shared lock to update 
> the counter, and rely on the atomicity of the write like your patch did. 
> You'd only need to take an exclusive lock to move entries in the LRU 
> list or to add or remove an entry.
> 

If I just step back for a second to consider a simpler design:

We will most likely have no more than a few relations larger than the
minimum threshold for Sync Scanning in any database being scanned
concurrently.

What if I just make an LRU that occupies a fixed size? Any reads or
updates can start at the MRU item and work backward, and any evictions
can happen at the LRU item.

Does a hash table really save us cycles if we keep the list small, say,
100 items? Looking at all 100 items would only be required perhaps on a
scan startup. I don't have a good sense of scale here, is following 50
pointers while holding a lock a significant source of contention?

100 is of course arbitrary, and that could grow or shrink automatically
at runtime.

> Yes, see hash_estimate_size.
> 

Ok.

Regards,Jeff Davis



pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: Avoiding unnecessary reads in recovery
Next
From: Tom Lane
Date:
Subject: Re: Feature freeze progress report