From: Jeff Davis
Subject: Re: Sequential scans
Msg-id: 1178219087.28383.286.camel@dogma.v10.wvs
In response to: Re: Sequential scans (Heikki Linnakangas <heikki@enterprisedb.com>)
List: pgsql-hackers
On Thu, 2007-05-03 at 19:27 +0100, Heikki Linnakangas wrote:
> I understand that the data structure keeps track of relations being 
> scanned, with one entry per such relation. I think that's very good 
> design, and I'm not suggesting to change that.
> 
> But what's the right size for that? We don't know how many large tables 
> there are in the database, so we can't use that. A hard-coded constant 
> maybe? What would it be? I figured we could use NBackends, because there 
> can't realistically be more than one large seq scan per backend going on 
> at any point in time. That's an upper bound; in any real-world 
> application there will be far fewer than that.

Ok, I understand you now. We'll set the maximum size of the table/list
to max_connections at startup.
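
To make that concrete, here's a rough sketch of how such a fixed-size
shared array might be allocated at startup (the struct layout and
function names are hypothetical, not from the actual patch; the matching
shared-memory size request and locking are omitted):

    /* Hypothetical per-relation scan-position entry; layout illustrative. */
    typedef struct SyncScanEntry
    {
        Oid         relid;      /* relation being scanned */
        BlockNumber location;   /* last block reported for that relation */
    } SyncScanEntry;

    static SyncScanEntry *scan_entries;

    static void
    SyncScanShmemInit(void)
    {
        bool    found;

        /* One slot per backend: a backend runs at most one seq scan
         * at a time, so MaxBackends is the upper bound. */
        scan_entries = (SyncScanEntry *)
            ShmemInitStruct("Sync Scan Locations",
                            mul_size(MaxBackends, sizeof(SyncScanEntry)),
                            &found);
        if (!found)
            memset(scan_entries, 0, MaxBackends * sizeof(SyncScanEntry));
    }

Since the array lives in shared memory, MaxBackends slots is both the
upper bound you describe and the fixed allocation we're stuck with after
boot.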

> > Part of the reason I did this is because, with a Sync Scan, it seems
> > like there may be possible reasons to do multiple seq scans concurrently
> > in the same backend. This is completely contrived, but consider:
> > 
> > SELECT * FROM bigtable UNION ALL SELECT * FROM bigtable;
> > 
> > I'm not trying to argue for such a plan to exist right now, but if there
> > is a possibility that we'd want to create such plans in the future, I
> > think it's better to track by relation rather than by backend. 
> 
> Those two seqscans would be executed one after the other, not in 
> parallel, so you would still have just one seq scan at any given point 
> in time.

Right, I was just throwing out a hypothetical (if highly contrived)
situation in which it might help to do multiple scans at once. I think
this can be safely ignored for now, though, since the planner would never
generate such a plan and I can't think of a reasonable use case.

> >>> 100 is of course arbitrary, and that could grow or shrink automatically
> >>> at runtime.
> >> Except that it can't shrink, because we don't have any convenient moment 
> >> to remove an entry from the list, other than dropping the least recently 
> >> used entry out of the list when inserting a new one. And since it's in 
> >> shared mem, the allocated size is fixed at boot up, so we can't grow it 
> >> either.
> >>
> > 
> > It can shrink when a new scan looks through the list and passes by
> > entries that don't need to be in the list.
> 
> How do you identify an entry that doesn't need to be there?
> 

A clock-like algorithm might work: decrement an entry's counter each
time a scan passes it over, and increment it each time the entry is
chosen, so that entries nobody is using decay toward zero and become
candidates for removal. It's probably easier to just use a hash table
though.
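
Roughly, I'm picturing something like the sketch below (entirely
hypothetical; usage_count and the entry layout are invented for the
example, and locking is omitted):

    /*
     * Clock-like decay: each lookup decrements the counter of entries it
     * passes over and increments the one it matches, so stale entries
     * decay toward zero and become eviction candidates.
     */
    typedef struct ScanEntry
    {
        Oid         relid;
        BlockNumber location;
        int         usage_count;
    } ScanEntry;

    static int
    ScanListLookup(ScanEntry *entries, int nentries, Oid relid)
    {
        int     i;

        for (i = 0; i < nentries; i++)
        {
            if (entries[i].relid == relid)
            {
                entries[i].usage_count++;   /* chosen: reinforce */
                return i;
            }
            if (entries[i].usage_count > 0)
                entries[i].usage_count--;   /* passed over: decay */
        }
        return -1;                          /* not present */
    }

An insert would then prefer a slot whose usage_count has reached zero
before falling back to evicting the least recently used entry.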

> > I don't want to over-complicate things, so if you think having a hash
> > table is a better, simpler, or more readable design, I'll go with that
> > instead of the list-only approach. The only reason I suggested the list-
> > only approach was to see if the hash table was really gaining anything
> > for us. In practice, it will be quite rare that more than 10 different
> > big relations are being scanned at once, and I don't know how much of a
> > gain a hash table is when there are (in all but exceptional cases) only
> > a few entries.
> 
> How about implementing the list-only approach first, since you're going 
> to need the LRU list anyway, and we can consider adding the hash 
> table later?
> 

Sounds good. Will do.
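
For reference, the list-only version I have in mind looks roughly like
this (hypothetical names, no locking, and a plain static array standing
in for the shared-memory structure; assumes the usual PostgreSQL headers
for Oid and BlockNumber):

    #define SYNC_SCAN_NELEM 100     /* or MaxBackends, per the discussion */

    typedef struct ScanPos
    {
        Oid         relid;
        BlockNumber location;
    } ScanPos;

    /* Slot 0 is the most recently used; the tail is the LRU victim. */
    static ScanPos scan_list[SYNC_SCAN_NELEM];

    static BlockNumber
    ScanListLookupOrInsert(Oid relid, BlockNumber start)
    {
        int         i;
        ScanPos     entry;

        /* Linear search; with only a few hot relations this is cheap. */
        for (i = 0; i < SYNC_SCAN_NELEM - 1; i++)
            if (scan_list[i].relid == relid)
                break;

        entry = scan_list[i];
        if (entry.relid != relid)
        {
            /* Miss: reuse the least recently used slot. */
            entry.relid = relid;
            entry.location = start;
        }

        /* Move-to-front: shift slots 0..i-1 down by one. */
        memmove(&scan_list[1], &scan_list[0], i * sizeof(ScanPos));
        scan_list[0] = entry;

        return entry.location;
    }

If profiling ever shows the linear search to matter, a hash table keyed
by relid can be layered on top without changing the LRU list itself.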

Regards,
	Jeff Davis


