Re: Performance of INSERT into temporary tables using psqlODBC driver - Mailing list pgsql-performance

From Tim Cross
Subject Re: Performance of INSERT into temporary tables using psqlODBC driver
Date
Msg-id 87k1nmgp58.fsf@gmail.com
In response to Re: Performance of INSERT into temporary tables using psqlODBC driver  (padusuma <venkata.adusumalli@gmail.com>)
Responses Re: Performance of INSERT into temporary tables using psqlODBC driver
List pgsql-performance
padusuma <venkata.adusumalli@gmail.com> writes:

> Hello Tim,
>
>>How are you gathering metrics to determine if performance has improved
>>or not?
> I am measuring the response times through timer for the execution of SQL
> statements through psqlODBC driver. The response times for INSERT INTO
> temp-table statements have not changed with the parameters I modified.
>
>>Have you seen any change in your explain (analyze, buffers) plans?
>
> There was no change in the EXPLAIN for INSERT INTO statement, but the
> performance of the queries improved by about 5%.
>
>>Make sure your table statistics are all up-to-date before performing
>>each benchmark test. I often turn off autovacuum when doing this sort of
>>testing so that I know exactly when tables get vacuumed and statistics
>>get updated (just ensure you remember to turn it back on when your
>>finished!).
> I ran the VACUUM ANALYZE statement manually before starting the tests. Even
> though autovacuum was turned on, it was not invoked due to the
> thresholds and because the bulk of the inserts are into temporary tables.
>
>>Are the wal checkpoints being triggered every 30 mins or more
>>frequently?
> The wal checkpoints are triggered every 30 mins.
>
>>Are you still seeing the system use lots of temp files?
> I do not see any files in pgsql_tmp folders in the tablespaces where the
> tables are created. Also, I do not see pgsql_tmp folder in base and global
> folders. Am I checking for these files in the correct location? Also, I ran
> the following query (taken from another forum) to check the temporary files
> generated for all the databases:
> SELECT temp_files AS "Temporary files",
>        temp_bytes AS "Size of temporary files"
> FROM pg_stat_database db;
>
> The result is 0 for both columns.
>
>>Do you have any indexes on the tables your inserting into?
> I have not created indexes on these temporary tables, but programmatically
> executed an ANALYZE <temp-table> statement after the data is inserted into
> these temp tables, to generate/update statistics for them. Indexes
> do exist for all regular tables.
>
>>As mentioned previously, there are no simple/quick fixes here - you
>>cannot just change a setting and see performance improve. It will be
>>necessary to do a lot of experimentation, gathering statistics and
>>investigate how postgres is using buffers, disk IO etc. All of these
>>parameters interact with each other, so it is critical you have good
>>metrics to see exactly what your changes do. It is complex and time
>>consuming. Highly recommend PostgreSQL High Performance (Ahmed & Smith)
>>and Mastering PostgreSQL (Schönig) for valuable background/tips - there
>>really is just far too much to communicate effectively via email.
>
> Thank you for the suggestions on the books. I will go through these.
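As an aside on the quoted pg_stat_database query: the counters it reports are cumulative per database since the last statistics reset, so a per-database breakdown can make it easier to see which database, if any, is spilling to disk. A sketch (this variant is not from the original thread; column names are as documented for the pg_stat_database view):

```sql
-- Per-database temp-file usage, largest spillers first.
SELECT datname                    AS "Database",
       temp_files                 AS "Temporary files",
       pg_size_pretty(temp_bytes) AS "Size of temporary files"
FROM pg_stat_database
ORDER BY temp_bytes DESC;
```

Note that zeros can also mean the statistics were recently reset with pg_stat_reset(), not only that no temp files were ever written.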

Based on your responses, it sounds like you have done the 'easy' stuff
which often results in improved performance. Now you are going to have
to dig much harder. It might be worth looking more closely at how
buffers/caching are working (the pg_buffercache extension might be
useful) and verifying where the performance bottlenecks actually are -
this can sometimes be surprising, as the bottleneck may not be where you
think it is. Don't forget to profile your client, network/driver
throughput, OS-level disk I/O etc. This is where books like PostgreSQL
High Performance will be useful.
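A minimal sketch of the kind of pg_buffercache inspection suggested above, assuming the extension has been installed with CREATE EXTENSION pg_buffercache (the join via pg_relation_filenode() follows the pattern in the extension's documentation; the 8192-byte page size is the default build setting):

```sql
-- Which relations in the current database occupy the most shared buffers?
SELECT c.relname,
       count(*)                        AS buffers,
       pg_size_pretty(count(*) * 8192) AS approx_size  -- assumes 8kB pages
FROM pg_buffercache b
JOIN pg_class c
  ON b.relfilenode = pg_relation_filenode(c.oid)
WHERE b.reldatabase = (SELECT oid FROM pg_database
                       WHERE datname = current_database())
GROUP BY c.relname
ORDER BY buffers DESC
LIMIT 10;
```

Run before and after a benchmark, this gives a rough picture of whether the working set is staying in shared_buffers or being evicted between runs.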

My only word of caution is that you are likely to now begin looking at
options which can improve throughput, but often come with other 'costs',
such as stability, data integrity or recovery time. These are things
which can only be assessed on a per case basis and largely depend on
business priorities. It will take time and you will need to make changes
slowly and do a lot of benchmarking.

It is really important to have a clear idea as to what would be
acceptable performance rather than just a vague concept of making things
faster. For example, one application I have worked on inserts 1.3+ billion
rows per day. This represents two 'sets' of data. Our minimum requirement was
the ability to process 1 set, but if possible, 2 sets would be
ideal. Initially, with the original technology being used, it took
between 23 and 26 hours to process 1 set. We were able to tune this to
get it always to be under 24 hours, but there was no way we were going
to get the level of improvement which would allow more than 1 set to be
processed per day - not with the technology and design that was in
place.

A decision was made to re-implement using a different technology and
design. This was where we gained the improvements in performance we
really required. While the technology did play a part, it was really the
re-design which gave us the performance improvement to reach our desired
goal of 2 sets per day. Even 3 sets per day is a possibility now.

We could have spent a lot of time tuning and re-spec'ing hardware etc. to
get to 1 set per day and we would have succeeded, but that would have
been the absolute upper limit. I suspect it would have cost about the
same as the re-implementation, but with a much lower upper limit.

Re-implementation of a solution is often a hard case to sell, but it
might be the only way to get the performance you want. The big positive
to a re-implementation is that you usually get a better solution because
you are implementing with more knowledge and experience about the
problem domain. Design is often cleaner and as a result, easier to
maintain. It usually takes a lot less time than the original
implementation as well and can be the more economical solution compared
to fighting a system which has fundamental design limitations that
restrict performance.

good luck,

Tim
--
Tim Cross

