Home > mailing lists

Query question - Mailing list pgsql-general

From	mailinglists@net-virtual.com
Subject	Query question
Date	January 14, 2009 23:36:13
Msg-id	63550.69.109.177.118.1231990563.squirrel@69.109.177.118 Whole thread Raw
In response to	Question about updates and MVCC (mailinglists@net-virtual.com)
Responses	Re: Query question
List	pgsql-general

Tree view

Hello,

Thanks to the replier (Martijn -- thank you very much!) to an earlier
question I had about MVCC, I've decided to re-think entirely my use of the
status column in a table.   What I've decided to do is to put all of my
new or changed records into a "holding" table, then after they are
indexed, commit the changes to their final location.   This has worked
extremely well, except when I am querying the holding table.

This is not the actual table, but my problem can be demonstrated by this
(mode can be "U" for an update/insert or "D" for a delete):

CREATE TABLE listings (
  trans_id  SERIAL,
  mode CHAR(1),
  listing_id INT,
  region_id INT,
  category INT
);

.. so, my process goes along and inserts all these rows into the table,
about 2,000,000 a day.  Then it comes time to query the data, I do a query
like this:

"SELECT * FROM listings ORDER BY region_id, category, listing_id,
trans_id" -- this is *very* expensive obviously, but since multiple rows
can be inserted for the same listing_id I have to get the data into some
deterministic order.

There can be multiple writers adding to this listings table, when it comes
time to process it, what I want to do is get only the last transaction for
a given listing_id, because the earlier ones don't matter.  On top of
that, each region_id and category_id has its own index.  I need to be able
to process the indexes in-full, one-at-a-time because there are too many
to hold that many open filehandles/processes at one time.

So, my question is, is there some way to return the rows in a
deterministic order, without actually having to do an explicit sort on the
data?  What I mean is, I don't care if category_id 4 / region_id 10 /
listing_id 10000 comes before category_id 1 / region_id 1 / lisitng_id 1
-- I just need them returned to me in that sort of grouped order (although
sorted by trans_id).  And would this even be more efficient in the first
place or am I barking up the wrong tree?

I hope this makes sense, I've been up all night so not thinking too
clearly....

Thanks!

- Greg

pgsql-general by date:

From: Alvaro Herrera
Date: 14 January 2009, 22:45:10
Subject: Re: Odd duplicate database

From: "Justin Pasher"
Date: 15 January 2009, 00:21:52
Subject: Autovacuum daemon terminated by signal 11

Query question - Mailing list pgsql-general

Previous

Next