Re: Query question - Mailing list pgsql-general

From mailinglists@net-virtual.com
Subject Re: Query question
Date
Msg-id 57839.69.109.177.118.1232045766.squirrel@69.109.177.118
Whole thread Raw
In response to Query question  (mailinglists@net-virtual.com)
List pgsql-general
> On Wed, Jan 14, 2009 at 07:36:03PM -0800, mailinglists@net-virtual.com
> wrote:
>> CREATE TABLE listings (
>>   trans_id  SERIAL,
>>   mode CHAR(1),
>>   listing_id INT,
>>   region_id INT,
>>   category INT
>> );
>>
>> "SELECT * FROM listings ORDER BY region_id, category, listing_id,
>> trans_id"
>
>> [...] what I want to do is get only the last transaction for
>> a given listing_id, because the earlier ones don't matter.
>
> If you have an index on (region_id,category,listing_id,trans_id) you
> should be able to do:
>
>   SELECT region_id,category,listing_id,MAX(trans_id)
>   FROM listings
>   GROUP BY region_id,category,listing_id;
>
> And have PG answer this using the index (it'll only do this if it thinks
> there are many transactions for each group though).
>
>> On top of
>> that, each region_id and category_id has its own index.  I need to be
>> able
>> to process the indexes in-full, one-at-a-time because there are too many
>> to hold that many open filehandles/processes at one time.
>
> Not sure what you mean by "index" here; I'm assuming you're talking
> about something outside PG, or am I missing some context?

Yes, sorry I wasn't clear...  This is to keep track of incremental updates
to an index outside of Postgres

>> So, my question is, is there some way to return the rows in a
>> deterministic order, without actually having to do an explicit sort on
>> the
>> data?  What I mean is, I don't care if category_id 4 / region_id 10 /
>> listing_id 10000 comes before category_id 1 / region_id 1 / lisitng_id 1
>> -- I just need them returned to me in that sort of grouped order
>> (although
>> sorted by trans_id).
>
> If you want to know all the transaction ids then you need to do the
> sort, if you only want the largest/latest then you're probably better
> off telling PG that's what you want (i.e. using GROUP BY and MAX
> aggregate and letting it make an appropiate decision).
>

I only need to know the max id, but the problem is that for each region_id
has N categories and I need to process each one individually.  The query
you suggested does, indeed, sort them by category, but it does not sort
them by region_id.

Here's an example: (select region_id,category,listing_id,max(trans_id)
from listings_pending where region_id IN('3134000000', '2222000000')
group by region_id,category,listing_id;)

 region_id  |    category_id    | listing_id |  max
------------+----------------+------------+--------
 2222000000 | 1              |    2221473 | 640799
 2222000000 | 1              |    2426142 | 845468
 2222000000 | 1              |    2103599 | 522925
 3134000000 | 1              |    2146326 | 565652
 2222000000 | 1              |    2462112 | 881438
 2222000000 | 1              |    1947690 | 367016
 2222000000 | 1              |    2526731 | 946057
 2222000000 | 1              |    2217864 | 637190
 2222000000 | 1              |    2288420 | 707746

As you can see, at transaction 565652, I would close the index (which is
very expensive, because it actually has to do a "merge" of the newly
created index with old one), then reopen it at transaction 881438.

Thanks for your help!

- Greg




pgsql-general by date:

Previous
From: Alvaro Herrera
Date:
Subject: Re: MD5 password issue
Next
From: Andreas Wenk
Date:
Subject: Re: MD5 password issue