Alternative to a temporary table - Mailing list pgsql-performance

From Steven Rosenstein
Subject Alternative to a temporary table
Date
Msg-id OF771CB730.CDE13F6F-ON8525708F.0056A621-8525708F.0056C7BC@us.ibm.com
Whole thread Raw
Responses Re: Alternative to a temporary table  (Steve Atkins <steve@blighty.com>)
List pgsql-performance



I have a PHP web-based application where a temporary list of servers and
their characteristics (each represented by a unique numeric server_id) is
extracted from a master server list based on a number of dynamic and
user-selected criteria (can the user view the server, is it on-line, is it
a member of a specific group, etc).  Once the user selects the set of
criteria (and servers), it rarely change during the rest of the browser
session.  The resulting temporary list of servers is then joined against
other tables with different sets of information about each of the servers,
based on the server_id.

I currently create a temporary table to hold the selected server_id's and
characteristics.  I then join this temp table with other data tables to
produce my reports.  My reason for using the temporary table method is that
the SQL for the initial server selection is dynamically created based on
the user's selections, and is complex enough that it does not lend itself
to being easily incorporated into any of the other data extraction queries
(which may also contain dynamic filtering).

Unfortunately, the PHP connection to the database does not survive from
webscreen to webscreen, so I have to re-generate the temporary server_id
table each time it is needed for a report screen.  An idea I had to make
this process more efficient was instead of re-creating the temporary table
over and over each time it is needed, do a one-time extraction of the list
of user-selected server_id's, store the list in a PHP global variable, and
then use the list in a dynamically-created WHERE clause in the rest of the
queries.  The resulting query would look something like

     SELECT *
     FROM some_data_table
     WHERE server_id IN (sid1,sid5,sid6,sid17,sid24...)

Simple enough, however in rare cases the list of server_id's can range
between 6,000 and 10,000.

My question to the group is, can having so many values in a WHERE/IN clause
effect query performance?  Am I being naive about this and is there a
different, better way?  The server_id field is of course indexed, but it is
possible that the list of selected sid's can contain almost all of the
values in the some_data_table server_id index (in the situation where _all_
of the records are requested I wouldn't use the WHERE clause in the query).
The some_data_table can contain millions of records for thousands of
servers, so every bit of efficiency helps.

If this is not the proper group for this kind of question, please point me
in the right direction.

Thanks!
--- Steve
___________________________________________________________________________________

Steven Rosenstein
IT Architect/Developer | IBM Virtual Server Administration
Voice/FAX: 845-689-2064 | Cell: 646-345-6978 | Tieline: 930-6001
Text Messaging: 6463456978 @ mobile.mycingular.com
Email: srosenst @ us.ibm.com

"Learn from the mistakes of others because you can't live long enough to
make them all yourself." -- Eleanor Roosevelt


pgsql-performance by date:

Previous
From: Tom Lane
Date:
Subject: Re: URGENT: pg_statistic_relid_att_index has gone
Next
From: Steve Atkins
Date:
Subject: Re: Alternative to a temporary table