Re: Sequence Access Method WIP - Mailing list pgsql-hackers

From José Luis Tallón
Subject Re: Sequence Access Method WIP
Date
Msg-id 547F2318.6080207@adv-solutions.net
Whole thread Raw
In response to Re: Sequence Access Method WIP  (Andres Freund <andres@2ndquadrant.com>)
Responses Re: Sequence Access Method WIP  (Jim Nasby <Jim.Nasby@BlueTreble.com>)
Re: Sequence Access Method WIP  (Andres Freund <andres@2ndquadrant.com>)
List pgsql-hackers
On 12/03/2014 11:24 AM, Andres Freund wrote:
> On 2014-12-03 10:59:50 +0100, José Luis Tallón wrote:
>> snip]
> I don't think the WAL logging would need to change much in comparison to
> the current solution. We'd just add the page number to the WAL record.
>
> The biggest advantage would be to require fewer heavyweight locks,
> because the pg_sequence one could be a single fastpath lock. Currently
> we have to take the sequence's relation lock (heavyweight) and then the
> the page level lock (lwlock) for every single sequence used.

Got it, thank you for the explanation.

>> May I possibly suggest a file-per-schema model instead? This approach would
>> certainly solve the excessive i-node consumption problem that --I guess--
>> Andres is trying to address here.
> I don't think that really has any advantages.

Just spreading the I/O load, nothing more, it seems:

Just to elaborate a bit on the reasoning, for completeness' sake:
Given that a relation's segment maximum size is 1GB, we'd have 
(1048576/8)=128k sequences per relation segment.
Arguably, not many real use cases will have that many sequences.... save 
for *massively* multi-tenant databases.

The downside being that all that random I/O --- in general, it can't 
really be sequential unless there are very very few sequences--- can't 
be spread to other spindles. Create a "sequence_default_tablespace" GUC 
+ ALTER SEQUENCE SET TABLESPACE, to use an SSD for this purpose maybe? (I could take a shot at the patch, if deemed
worthwhile)

>> [snip]
> The overhead of a file is much more than wasting the remainder of a
> page. Alone the requirement of separate fsyncs and everything is pretty
> bothersome. The generated IO patterns are also much worse...

Yes, you are right. I stand corrected.

> [snip]
> I think there's pretty much no chance of accepting more than one page
> per sequence

Definitively.


Thanks,
    J.L.




pgsql-hackers by date:

Previous
From: Robert Haas
Date:
Subject: Re: B-Tree support function number 3 (strxfrm() optimization)
Next
From: Alvaro Herrera
Date:
Subject: Re: tracking commit timestamps