Re: browsing table with 2 million records - Mailing list pgsql-performance

From PFC
Subject Re: browsing table with 2 million records
Msg-id op.sza6qpz4th1vuj@localhost
In response to Re: browsing table with 2 million records  ("Merlin Moncure" <merlin.moncure@rcsonline.com>)
List pgsql-performance

> I've done it...
> First of all I totally agree with PFC's rant regarding absolute
> positioning while browsing datasets.  Among other things, it has serious
> problems if you have multiple users updating your table.  Also it's kind of
> silly to be doing this in a set based data paradigm.

    Recently I was browsing a site that had this problem: while I paged
through the list, users kept adding new entries, so when I hit "next page" I
got half of what I had already seen on the previous page. Of course the
webmaster had set the "visited links" color the same as the "unvisited
links" color, so I couldn't tell them apart and had to rely on memory, which
was quite upsetting XDDD

    And bookmarking a page to resume browsing at some later time does not
work either, because if I bookmark page 15, then when I come back, users
have added 10 pages of content and what I bookmarked is now on page 25...
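
    Both problems go away with keyset (cursor) pagination: remember the sort
key of the last row shown instead of an absolute page offset. A minimal
sketch using an in-memory SQLite table in Python (the table and column names
are made up for illustration):

```python
import sqlite3

# Minimal sketch of keyset (cursor) pagination vs. OFFSET paging, using an
# in-memory SQLite table. Table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE entries (id INTEGER PRIMARY KEY, title TEXT)")
conn.executemany("INSERT INTO entries (title) VALUES (?)",
                 [("post %d" % i,) for i in range(1, 31)])

# Page 1, newest first; remember the key of the last row shown.
page1 = conn.execute(
    "SELECT id, title FROM entries ORDER BY id DESC LIMIT 10").fetchall()
last_id = page1[-1][0]

# New rows arrive while the user is reading page 1.
conn.executemany("INSERT INTO entries (title) VALUES (?)",
                 [("post %d" % i,) for i in range(31, 36)])

# OFFSET paging drifts: "page 2" now repeats rows the user already saw.
offset_page2 = conn.execute(
    "SELECT id FROM entries ORDER BY id DESC LIMIT 10 OFFSET 10").fetchall()

# Keyset paging resumes exactly where page 1 left off.
keyset_page2 = conn.execute(
    "SELECT id FROM entries WHERE id < ? ORDER BY id DESC LIMIT 10",
    (last_id,)).fetchall()
```

    The same idea works in PostgreSQL; the bookmark then records the last key
seen rather than a page number, so it stays valid as content is added.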

>> All very well and good, but now do it generically...

    Hehe. I like ranting...
    It is not possible to do it in a generic way that works in all cases. For
instance:

    Forum topic case:
    - posts are added at the bottom, not at the top
    - the page number is relevant and meaningful

    However, in most cases, you can use a multipart key and get it right.
    Suppose that, for instance, you have a base of several million records,
organized according to:

    - date (like the original poster)
    or :
    - country, region, city, customer last name, first name.

    You could ask for the first three, but then you'll get 50000 Smiths in
New York and 1 Van Bliezinsky.

    Or you could precalculate, once a week, a key interval distribution that
creates reasonably sized intervals (for instance, 100 values in each),
perhaps requiring that each interval contain only one city. So you would
get:

    Country  Region  City  LastName     FirstName
    USA      NY      NYC   Smith        ''
    USA      NY      NYC   Smith        Albert
    USA      NY      NYC   Smith        Bernard
    .....
    USA      NY      NYC   Smith        William
    ...
    USA      NY      NYC   Von Braun    ''
    ...

    So you'd predetermine your "page breaks" ahead of time, and recompute
them once in a while. You won't get identically sized pages, but if the
statistical distribution of the data plays nice, you should get evenly
sized pages.
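
    The recomputation itself can be sketched in a few lines: walk the keys in
sorted order and record the first and last key of every chunk of N rows as a
page boundary. A hedged sketch (the function name and sample data are made
up; a real version would pull the keys from the table with an ORDER BY over
the multipart key):

```python
# Hedged sketch of the periodic "page break" precomputation: take the keys
# in sorted order and record one (first_key, last_key) pair per page of
# page_size rows. The key list here is a made-up sample.
def page_breaks(sorted_keys, page_size):
    """Return one (first_key, last_key) pair per page."""
    pages = []
    for i in range(0, len(sorted_keys), page_size):
        chunk = sorted_keys[i:i + page_size]
        pages.append((chunk[0], chunk[-1]))
    return pages

keys = [("Smith", "Albert"), ("Smith", "Albus"), ("Smith", "Bernard"),
        ("Smith", "Jean"), ("Smith", "John"), ("Smith", "William"),
        ("Von Braun", "Anna"), ("Von Schwarts", "Karl")]
pages = page_breaks(keys, 3)
# A page is then fetched with a range query on the multipart key, e.g.
# WHERE (lastname, firstname) BETWEEN first_key AND last_key.
```

    Because each page is a contiguous key range, fetching it is an index
range scan, independent of how deep into the dataset the page sits.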

    The interesting part is that you can present the user with a selector
which presents meaningful and useful data, AND is fast to compute, AND is
fast to use.
    In this case, it would amount to "Select country, region, city", then
displaying a list like this:
    Smith, ...Albert
    Smith, Albus...Bernard
    ...
    Smith, William...
    ...
    Von Braun...Von Schwarts
    ...

    So Jeannette Smith would be easy to find, being in the link "Smith,
Jean...John" for instance.
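
    Rendering that selector is then just string formatting over the
precomputed (first, last) key pairs; a hypothetical helper:

```python
# Hypothetical label formatter for the selector: each precomputed page is
# shown as "LastName, First...Last", collapsing the last name when the
# whole page shares it.
def page_label(start, end):
    (s_last, s_first), (e_last, e_first) = start, end
    if s_last == e_last:
        return "%s, %s...%s" % (s_last, s_first, e_first)
    return "%s, %s...%s, %s" % (s_last, s_first, e_last, e_first)

label = page_label(("Smith", "Jean"), ("Smith", "John"))
# → "Smith, Jean...John"
```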

    If the aim is to quickly locate a particular record, I prefer
javascript-powered autocompletion; but for browsing, this pagination method
is cool.

    Regards !
