Thread: Performance implications of creating many, many sequences

Performance implications of creating many, many sequences

From

Michael Gardner

Date:

22 October 2010, 16:18:44

Consider the following table:

CREATE TABLE invoice (
    account_id integer NOT NULL REFERENCES account,
    invoice_number integer NOT NULL,
    UNIQUE (account_id,invoice_number));

I would like to do the equivalent of making invoice_number a serial type, but on a per-account basis. Would it be a
reasonableapproach to create a separate sequence for each individual account? Are there performance implications I
shouldknow about, given that there will be hundreds of thousands of accounts? Is there another approach I should be
lookingat instead?

Re: Performance implications of creating many, many sequences

From

Rob Sargent

Date:

22 October 2010, 16:50:41

Is this "invoice_number" just an id or what might appear an a bill (in
some pretty form etc)?

If the former, just get a unique id over all invoices.  At the very
least it will save time i) in writing where clauses ii) re-creating the
correct id once some one assigns an invoice to the wrong customer.

On 10/22/2010 01:18 PM, Michael Gardner wrote:
> Consider the following table:
>
> CREATE TABLE invoice (
>     account_id integer NOT NULL REFERENCES account,
>     invoice_number integer NOT NULL,
>     UNIQUE (account_id,invoice_number));
>
> I would like to do the equivalent of making invoice_number a serial type, but on a per-account basis. Would it be a
reasonableapproach to create a separate sequence for each individual account? Are there performance implications I
shouldknow about, given that there will be hundreds of thousands of accounts? Is there another approach I should be
lookingat instead?

Re: Performance implications of creating many, many sequences

From

Michael Gardner

Date:

22 October 2010, 17:02:04

On Oct 22, 2010, at 2:50 PM, Rob Sargent wrote:

> Is this "invoice_number" just an id or what might appear an a bill (in
> some pretty form etc)?

It will appear on actual invoices, as part of a compound invoice identifier (like ABCD-0042, where ABCD is an
identifierfor the account in question and 42 is the invoice number). 

> If the former, just get a unique id over all invoices.  At the very
> least it will save time i) in writing where clauses ii) re-creating the
> correct id once some one assigns an invoice to the wrong customer.

There will be such an ID, but I do not want to show it directly to users. I want an invoice number independent of
whateversurrogate key the database happens to use to uniquely identify rows. 

(Sorry if you receive this twice, Rob. I accidentally replied to you instead of the list the first time.)

Re: Performance implications of creating many, many sequences

From

Craig Ringer

Date:

23 October 2010, 01:09:24

On 23/10/2010 3:18 AM, Michael Gardner wrote:
> Consider the following table:
>
> CREATE TABLE invoice (
>     account_id integer NOT NULL REFERENCES account,
>     invoice_number integer NOT NULL,
>     UNIQUE (account_id,invoice_number));
>
> I would like to do the equivalent of making invoice_number a serial type, but on a per-account basis. Would it be a
reasonableapproach to create a separate sequence for each individual account? Are there performance implications I
shouldknow about, given that there will be hundreds of thousands of accounts? Is there another approach I should be
lookingat instead? 

I'd favour "another approach". The purpose of sequences is to improve
concurrency in the face of rapid inserts. You're probably not going to
have a bunch of transactions all wanting to grab new invoice numbers for
the same customer at the same time, so this isn't going to be a problem;
you're already partitioning the concurrency limitation out to be
per-customer, which should be good enough.

Also, sequences are not gapless. If a transaction grabs an entry then
rolls back instead of committing, that ID is never used. Your customers
would probably not like that for invoice numbers.

Instead, maintain a counter, either in the main customer record or in an
associated (customer_id, counter) side table if you want to reduce
potential lock contention. Write a simple SQL function that uses an
UPDATE ... RETURNING statement to grab a new ID from the counter and
increment it. Use that function instead of 'nextval(seqname)' when you
want an ID. The UPDATE will take a lock out on the customer row (or
side-table row if you did it that way) that'll prevent anyone else
updating it until the transaction commits or rolls back.

--
Craig Ringer

Tech-related writing at http://soapyfrogs.blogspot.com/

Re: Performance implications of creating many, many sequences

From

Michael Gardner

Date:

23 October 2010, 13:42:51

On Oct 22, 2010, at 11:03 PM, Craig Ringer wrote:

> Instead, maintain a counter, either in the main customer record or in an associated (customer_id, counter) side table
ifyou want to reduce potential lock contention. Write a simple SQL function that uses an UPDATE ... RETURNING statement
tograb a new ID from the counter and increment it. Use that function instead of 'nextval(seqname)' when you want an ID.
TheUPDATE will take a lock out on the customer row (or side-table row if you did it that way) that'll prevent anyone
elseupdating it until the transaction commits or rolls back. 

Thanks for the suggestion. It seems like there should be a safe way to use max() instead of a separate counter though,
aslong as I can guarantee that invoice numbers never change and invoices are never deleted. Right?

Re: Performance implications of creating many, many sequences

From

Craig Ringer

Date:

24 October 2010, 23:31:41

On 10/24/2010 12:42 AM, Michael Gardner wrote:
> On Oct 22, 2010, at 11:03 PM, Craig Ringer wrote:
>
>> Instead, maintain a counter, either in the main customer record or in an associated (customer_id, counter) side
tableif you want to reduce potential lock contention. Write a simple SQL function that uses an UPDATE ... RETURNING
statementto grab a new ID from the counter and increment it. Use that function instead of 'nextval(seqname)' when you
wantan ID. The UPDATE will take a lock out on the customer row (or side-table row if you did it that way) that'll
preventanyone else updating it until the transaction commits or rolls back. 
>
> Thanks for the suggestion. It seems like there should be a safe way to use max() instead of a separate counter
though,as long as I can guarantee that invoice numbers never change and invoices are never deleted. Right? 

True. You'll then have to provide your own locking (say, SELECT ... FOR
UPDATE on the customer record) to ensure that no two invoices are
allocated the same number, though. If you use UPDATE ... RETURNING on a
counter field the locking is done for you.

You'll have a UNIQUE(customer_id,order_id) constraint in place anyway,
of course, so you won't have the risk of genuinely duplicate IDs, just
the need to retry a transaction that fails due to a duplicate key error
if two invoice creations on a customer happen concurrently.

Perhaps it's too unlikely to care about, but I just dislike using
max(x)+1 on principle, as it's just a generally unsafe sql programming
idiom.

--
Craig Ringer