Re: How to get good performance for very large lists/sets? - Mailing list pgsql-general

From Alban Hertroys
Subject Re: How to get good performance for very large lists/sets?
Msg-id 397F595E-DC1D-4E5F-BEF5-476CA50F63FA@gmail.com
In response to How to get good performance for very large lists/sets?  (Richard Frith-Macdonald <richard.frith-macdonald@brainstorm.co.uk>)
List pgsql-general
On 06 Oct 2014, at 10:02, Richard Frith-Macdonald <richard.frith-macdonald@brainstorm.co.uk> wrote:

> I'm wondering if anyone can help with advice on how to manage large lists/sets of items in a postgresql database.
>
> I have a database which uses multiple lists of items, roughly like this:
>
> CREATE TABLE List (
>  ID SERIAL,
>  Name VARCHAR ....
> );
>
> and a table containing individual entries in the lists:
>
> CREATE TABLE ListEntry (
>  ListID INT, /* Reference the List table */
>  ItemID INT /* References an Item table */
> ) ;
> CREATE UNIQUE INDEX ListEntryIDX ON ListEntry(ListID, ItemID);

Don’t you have any PKs? A UNIQUE INDEX is not the same as a PK; a PK does not allow NULLs, for example.
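Something like this (untested; the constraint name is just an example) would promote the existing unique index to a proper PK:

```sql
-- Reuse the existing unique index as the primary key (PostgreSQL 9.1+).
-- This also marks ListID and ItemID NOT NULL.
ALTER TABLE ListEntry
  ADD CONSTRAINT listentry_pk PRIMARY KEY USING INDEX ListEntryIDX;
```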

For that matter, I’d ditch the serial column in List - it contributes to a larger index size, which decreases the chances
that the index will fit in memory, making it less attractive to the query planner. IMHO, natural keys are to be preferred
here over surrogate keys. That is assuming that List.Name is supposed to be unique.
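A natural-key version of the schema could look roughly like this (a sketch, assuming List.Name is unique and that the Item table has an integer ID):

```sql
CREATE TABLE List (
  Name VARCHAR PRIMARY KEY
  -- other columns ...
);

CREATE TABLE ListEntry (
  ListName VARCHAR REFERENCES List(Name),
  ItemID INT REFERENCES Item(ID),
  PRIMARY KEY (ListName, ItemID)
);
```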

> Now, there are thousands of lists, many with millions of entries, and items are added to and removed from lists in an
> unpredictable way (in response to our customers' actions, not under our control).  Lists are also created by customer
> actions.


> I think the server won't use index-only scans because, even in cases where a particular list has not had any recent
> changes, the ListEntry table will almost always have had some change (for one of the other lists) since its last vacuum.
> Perhaps creating multiple ListEntry tables (one for each list) would allow better performance; but that would be
> thousands (possibly tens of thousands) of tables, and allowing new tables to be created by our clients might conflict
> with things like nightly backups.
>
> Is there a better way to manage list/set membership for many thousands of sets and many millions of items?

Another benefit of using natural keys is that you don’t need to fetch the actual List entries - the Names are right
there in your ListEntry table. You only need to look records up in the List table when you want their details
(columns other than Name).
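For example (assuming a ListEntry that stores the list name directly in a ListName column; column names are illustrative):

```sql
-- Membership lookup needs no join against List:
SELECT ItemID FROM ListEntry WHERE ListName = 'some list';

-- Join only when you actually need the other List columns:
SELECT e.ItemID, l.SomeDetail
  FROM ListEntry e
  JOIN List l ON l.Name = e.ListName
 WHERE e.ListName = 'some list';
```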

A possible drawback in this case is that the PK index on ListEntry would probably be larger.

Alban Hertroys
--
If you can't see the forest for the trees,
cut the trees and you'll find there is no forest.


