Slow performance with join on many fields - Mailing list pgsql-performance
From | Alex Johnson |
---|---|
Subject | Slow performance with join on many fields |
Date | |
Msg-id | F8F237F6-4E04-11D7-85B6-000502FCE08D@aretesystems.com Whole thread Raw |
Responses |
Re: Slow performance with join on many fields
|
List | pgsql-performance |
Hello- I'm working with a MS Access database that a client wants to turn into a multi-user database. I'm evaluating PostgreSQL for that purpose (I'd _really_ like to be able to recommend and open-source solution to them). However, I'm running into a performance-related issue that I was hoping this list could help me with. I have three tables: tbl_samples (~2000 rows), tbl_tests (~4000 rows), and tbl_results (~20,000 rows), with one-to-many relationships between them (cascading, in the order given - table definitions are attached below). I'm looking at the following query that joins these three tables: SELECT tbl_samples.station_id, tbl_samples.samp_date, tbl_samples.matrix, tbl_samples.samp_type_code, tbl_samples.samp_no, tbl_samples.field_samp_id, tbl_tests.method, tbl_tests.lab, tbl_results.par_code, tbl_results.val_qualifier, tbl_results.value, tbl_results.units, tbl_results.mdl, tbl_results.date_anal FROM (tbl_samples INNER JOIN tbl_tests USING (station_id, samp_date, matrix, samp_type_code, samp_no, samp_bdepth, samp_edepth) ) INNER JOIN tbl_results USING (station_id, samp_date, matrix, samp_type_code, samp_no, samp_bdepth, samp_edepth, method); In Access, this query runs in about a second. In PostgreSQL on the same machine, it takes about 12-15 seconds for the initial rows to be returned, and about 45 seconds to returns all rows. (This is consistent whether I use psql, use the pgAdminII SQL window, or use Access with the ODBC driver.) This is the output from EXPLAIN: Nested Loop (cost=437.73..1216.02 rows=1 width=245) Join Filter: ("outer".method = "inner".method) -> Merge Join (cost=437.73..461.38 rows=125 width=131) Merge Cond: (("outer".matrix = "inner".matrix) AND ("outer".samp_edepth = "inner".samp_edepth) AND ("outer".samp_bdepth = "inner".samp_bdepth) AND ("outer".samp_no = "inner".samp_no) AND ("outer".samp_type_code = "inner".samp_type_code) AND ("outer".samp_date = "inner".samp_date) AND ("outer".station_id = "inner".station_id)) -> Sort (cost=117.51..120.77 rows=1304 width=63) Sort Key: tbl_samples.matrix, tbl_samples.samp_edepth, tbl_samples.samp_bdepth, tbl_samples.samp_no, tbl_samples.samp_type_code, tbl_samples.samp_date, tbl_samples.station_id -> Seq Scan on tbl_samples (cost=0.00..50.04 rows=1304 width=63) -> Sort (cost=320.22..328.68 rows=3384 width=68) Sort Key: tbl_tests.matrix, tbl_tests.samp_edepth, tbl_tests.samp_bdepth, tbl_tests.samp_no, tbl_tests.samp_type_code, tbl_tests.samp_date, tbl_tests.station_id -> Seq Scan on tbl_tests (cost=0.00..121.84 rows=3384 width=68) -> Index Scan using tbl_results_pkey on tbl_results (cost=0.00..5.99 rows=1 width=114) Index Cond: (("outer".station_id = tbl_results.station_id) AND ("outer".samp_date = tbl_results.samp_date) AND ("outer".matrix = tbl_results.matrix) AND ("outer".samp_type_code = tbl_results.samp_type_code) AND ("outer".samp_no = tbl_results.samp_no) AND ("outer".samp_bdepth = tbl_results.samp_bdepth) AND ("outer".samp_edepth = tbl_results.samp_edepth)) I've done the following to try to improve performance: -postgresql.conf: increased shared_buffers to 384 increased sort_mem to 2048 -clustered all tables on the pkey index -made sure the joined fields are indexed (they are through the pkeys) As a note, vm_stat shows no paging while the query is run. Also, I realize that these keys are large (as in the number of fields). I'll be condensing these down to sequential IDs (e.g. a SERIAL type) for a further test, but I'm curious why Access seems to outperform Postgres in this instance. My question is, am I missing anything? PostgreSQL will be a hard sell if they have to take a performance hit. Thanks for any suggestions you can provide. Sorry for the long e-mail, but I wanted to provide enough info to diagnose the issue. Alex Johnson ________________________________ Table defs: CREATE TABLE tbl_Samples ( Station_ID VARCHAR (25) NOT NULL, Samp_Date TIMESTAMP WITH TIME ZONE NOT NULL, Matrix VARCHAR (10) NOT NULL, Samp_Type_Code VARCHAR (5) NOT NULL, Samp_No INTEGER NOT NULL, Samp_BDepth DOUBLE PRECISION NOT NULL, Samp_EDepth DOUBLE PRECISION NOT NULL, Depth_units VARCHAR (3), Samp_start_time TIME, Samp_end_time TIME, Field_Samp_ID VARCHAR (20), Lab_Samp_ID VARCHAR (20), Samp_Meth VARCHAR (20), ...snip... PRIMARY KEY (Station_ID,Samp_Date,Matrix,Samp_Type_Code,Samp_No,Samp_BDepth,Samp_EDe pth) ); CREATE TABLE tbl_Tests ( Station_ID VARCHAR (25) NOT NULL, Samp_Date TIMESTAMP WITH TIME ZONE NOT NULL, Matrix VARCHAR (10) NOT NULL, Samp_Type_Code VARCHAR (5) NOT NULL, Samp_No INTEGER NOT NULL, Samp_BDepth DOUBLE PRECISION NOT NULL, Samp_EDepth DOUBLE PRECISION NOT NULL, Method VARCHAR (50) NOT NULL, Lab VARCHAR (10) NOT NULL, Date_Rec TIMESTAMP WITH TIME ZONE, ...snip... PRIMARY KEY (Station_ID,Samp_Date,Matrix,Samp_Type_Code,Samp_No,Samp_BDepth,Samp_EDe pth,Method) ); CREATE TABLE tbl_Results ( Station_ID VARCHAR (25) NOT NULL, Samp_Date TIMESTAMP WITH TIME ZONE NOT NULL, Matrix VARCHAR (10) NOT NULL, Samp_Type_Code VARCHAR (5) NOT NULL, Samp_No INTEGER NOT NULL, Samp_BDepth DOUBLE PRECISION NOT NULL, Samp_EDepth DOUBLE PRECISION NOT NULL, Method VARCHAR (50) NOT NULL, Par_code VARCHAR (50) NOT NULL, Val_Qualifier VARCHAR (50) NOT NULL, Value DECIMAL (20,9) NOT NULL, ...snip... PRIMARY KEY (Station_ID,Samp_Date,Matrix,Samp_Type_Code,Samp_No,Samp_BDepth,Samp_EDe pth,Method,Par_code) ); ALTER TABLE tbl_Tests ADD CONSTRAINT REL_1 FOREIGN KEY (Station_ID,Samp_Date,Matrix,Samp_Type_Code,Samp_No,Samp_BDepth,Samp_EDe pth) REFERENCES tbl_Samples ON DELETE CASCADE ON UPDATE CASCADE; ALTER TABLE tbl_Results ADD CONSTRAINT REL_2 FOREIGN KEY (Station_ID,Samp_Date,Matrix,Samp_Type_Code,Samp_No,Samp_BDepth,Samp_EDe pth,Method) REFERENCES tbl_Tests ON DELETE CASCADE ON UPDATE CASCADE; ________________________________________________________________________ ______ A r e t e S y s t e m s Alexander M. Johnson, P.E.
pgsql-performance by date: