Re: Why does the query planner use two full indexes, when a dedicated partial index exists? - Mailing list pgsql-performance

From Richard Neill
Subject Re: Why does the query planner use two full indexes, when a dedicated partial index exists?
Date
Msg-id 50D5F1F6.7090704@richardneill.org
Whole thread Raw
In response to Re: Why does the query planner use two full indexes, when a dedicated partial index exists?  (Jeff Janes <jeff.janes@gmail.com>)
List pgsql-performance

On 21/12/12 05:15, Jeff Janes wrote:
>
>
>     - What I'm trying to do is trace the history of the books
>        through the system and assign each one a proper unique id.
>        So, if I see a book with "parcel_id_code = 37",
>        is it a new book (after pid wrap), or is it the same book I saw 1
>        minute ago, that hasn't exited the sorter?
>
> I'm not sure how you are implementing this goal, but I don't think it is
> best done by looping over all books (presumably from some other table?)
> and issuing an individual query for each one, if that is what you are
> doing.  Some kind of bulk join would probably be more efficient.

It would be nice to do a bulk join, but it's not possible: the query is
time sensitive. Consider:

id/pkey pid      timestamp   exit_state       destination

1       77    -24 hours    1        212
2    77    -18 hours    1        213
3    77    -12 hours    1        45
4    77    -6 hours    1        443
5    77    0 hours        null

[in future...]
5    77    0 hours        1        92
6    77    4 hours        null


At time +5 minutes, I receive a report that a book with parcel_id 77 has
successfully been delivered to destination 92.  So, what I have to do is:

* First, find the id of the most recent book which had pid=77 and where
the exit state is null. (hopefully, but not always, this yields exactly
one row, which in this case is id=5)

* Then update the table to set the destination to 92, where the id=5.


It's a rather cursed query, because:
  - the sorter machine doesn't give me full info in each message, only
    deltas, and I have to reconstruct the global state.
  - pids are reused within hours, but don't increase monotonically,
    (more like drawing repeatedly from a shuffled deck, where cards
    are only returned to the deck sporadically.
  - some pids get double-reported
  - 1% of books fall off the machine, or get stuck on it.
  - occasionally, messages are lost.
  - the sorter state isn't self-consistent (it can be restarted)


The tracker table is my attempt to consistently combine all the state we
know, and merge in the deltas as we receive messages from the sorter
machine. It ends up reflecting reality about 99% of the time.


Richard






pgsql-performance by date:

Previous
From: Richard Neill
Date:
Subject: Re: Why does the query planner use two full indexes, when a dedicated partial index exists?
Next
From: Jeff Janes
Date:
Subject: Re: Performance on Bulk Insert to Partitioned Table