Thread: performance of temporary vs. regular tables
Greetings,

in http://archives.postgresql.org/message-id/1056648218.7041.11.camel@jester, it is stated that the performance of temporary tables is "the same as a regular table but without WAL on the table contents" (WAL being the write-ahead log).

I have a datamining-type application which makes heavy use of temporary tables to stage (potentially large amounts of) data between different operations. To effectively multi-thread this application, I (think I) need to switch from temporary to regular tables, because:
- the concurrent threads need to use different connections, not cursors, to effectively operate concurrently
- temporary tables are not visible across connections (as they are across cursors of the same connection)

Thus, I wonder how much this will affect performance. Access on the temporary table is: insert (millions of) rows once in a single transaction, potentially update them all once within a single transaction, then select on them once or more. Of course, eventually losing the data in these tables is not a problem at all. The threads are synchronized above the SQL level.

Thanks for any input on how to maximize performance for this application.

Joachim
Temporary tables are handled pretty much like regular tables. The magic happens at the schema level: a new schema is set up for each connection, so that it can access its own temporary tables. Temporary tables are also not autovacuumed. And those are pretty much the only differences.
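A minimal sketch of that per-connection schema behavior (the table and column names here are invented for illustration):

```sql
-- Session 1: the temporary table lands in a per-session schema (pg_temp_N)
CREATE TEMPORARY TABLE staging (id integer, payload text);
INSERT INTO staging VALUES (1, 'foo');
SELECT * FROM staging;   -- works: pg_temp is searched first

-- Session 2 (a different connection):
SELECT * FROM staging;   -- ERROR: relation "staging" does not exist
```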
Am 25.05.2010 10:49, schrieb Grzegorz Jaśkiewicz:
> temporary tables are handled pretty much like the regular table. The
> magic happens on schema level, new schema is setup for connection, so
> that it can access its own temporary tables.
> Temporary tables also are not autovacuumed.
> And that's pretty much the most of the differences.

Thanks. So, the write-ahead logging (being used or not) does not matter?

And, is there anything like RAM-only tables? I really don't care whether the staging data is lost in the rare event of a machine crash, or whether the query crashes due to lack of memory (I make sure there's enough w/o paging) - I only care about performance here.

Joachim
2010/5/25 Joachim Worringen <joachim.worringen@iathh.de>:
> Am 25.05.2010 10:49, schrieb Grzegorz Jaśkiewicz:
>> temporary tables are handled pretty much like the regular table. The
>> magic happens on schema level, new schema is setup for connection, so
>> that it can access its own temporary tables.
>> Temporary tables also are not autovacuumed.
>> And that's pretty much the most of the differences.
>
> Thanks. So, the Write-Ahead-Logging (being used or not) does not matter?
>
> And, is there anything like RAM-only tables? I really don't care whether the
> staging data is lost on the rare event of a machine crash, or whether the
> query crashes due to lack of memory (I make sure there's enough w/o paging)
> - I only care about performance here.
>
> Joachim

I think you can create a tablespace on a RAM disk, and create a table there.

Thom
Am 25.05.2010 11:15, schrieb Thom Brown:
> 2010/5/25 Joachim Worringen <joachim.worringen@iathh.de>:
>> And, is there anything like RAM-only tables? I really don't care whether the
>> staging data is lost on the rare event of a machine crash, or whether the
>> query crashes due to lack of memory (I make sure there's enough w/o paging)
>> - I only care about performance here.
>
> I think you can create a tablespace on a RAM disk, and create a table there.

True, but I think this makes the database server configuration more complex (which is acceptable), and may add a dependency between the server configuration and the SQL statements via the chosen tablespace name (which would be a problem). But I am a tablespace novice and will look into this "workaround".

thanks, Joachim
WAL does for the DB what journaling does for the FS. Plus it allows point-in-time recovery (PITR).

As for RAM, the data will be in RAM as long as the OS decides to keep it in its cache, and/or it is in shared buffers. Unless you have a lot of doubt about those two, I don't think it makes much sense to set up a ramdisk tablespace yourself. But try it, and see for yourself. Make sure that you have logic in place that sets it up before PostgreSQL starts, in case you reboot, or something.
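A tablespace-on-tmpfs setup along these lines (all paths and names here are invented for illustration; as noted above, the mount must exist before PostgreSQL starts):

```sql
-- On the OS side, before PostgreSQL starts (e.g. from an init script):
--   mkdir -p /mnt/pgram && mount -t tmpfs -o size=8G tmpfs /mnt/pgram
--   chown postgres:postgres /mnt/pgram

-- Then, as a database superuser:
CREATE TABLESPACE ramspace LOCATION '/mnt/pgram';

-- And place the staging table there explicitly:
CREATE TABLE staging (id integer, payload text) TABLESPACE ramspace;
```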
Am 25.05.2010 11:38, schrieb Grzegorz Jaśkiewicz:
> WAL does the same thing to DB journaling does to the FS.
> Plus allows you to roll back (PITR).
>
> As for the RAM, it will be in ram as long as OS decides to keep it in
> RAM cache, and/or its in the shared buffers memory.

Or until I commit the transaction? I have not completely disabled sync-to-disk in my setup, as there are of course situations where new data comes into the database that needs to be stored in a safe manner.

> Unless you have a lot of doubt about the two, I don't think it makes
> too much sens to setup ramdisk table space yourself. But try it, and
> see yourself.
> Make sure that you have logic in place, that would set it up, before
> postgresql starts up, in case you'll reboot, or something.

That's what I thought about when mentioning "increased setup complexity". Simply adding a keyword like "NONPERSISTENT" to the table creation statement would be preferable...

Joachim
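For what it's worth, later PostgreSQL releases (9.1 and up) added almost exactly this keyword in the form of unlogged tables:

```sql
-- UNLOGGED: no WAL for the table contents, and the table is truncated
-- after a crash; unlike a temporary table, it is visible to all connections.
CREATE UNLOGGED TABLE staging (id integer, payload text);
```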
On Tuesday 25 May 2010 11:00:24 Joachim Worringen wrote:
> Am 25.05.2010 10:49, schrieb Grzegorz Jaśkiewicz:
>> temporary tables are handled pretty much like the regular table. The
>> magic happens on schema level, new schema is setup for connection, so
>> that it can access its own temporary tables.
>> Temporary tables also are not autovacuumed.
>> And that's pretty much the most of the differences.
>
> Thanks. So, the Write-Ahead-Logging (being used or not) does not matter?

It does matter quite significantly in my experience. Both from an I/O and a CPU overhead perspective.

Andres
Am 25.05.2010 12:41, schrieb Andres Freund:
> On Tuesday 25 May 2010 11:00:24 Joachim Worringen wrote:
>> Thanks. So, the Write-Ahead-Logging (being used or not) does not matter?
> It does matter quite significantly in my experience. Both from an io and
> a cpu overhead perspective.

O.k., looks as if I have to gather my own experience... I'll let you know if possible.

Joachim
WAL matters for performance. That's why it is advisable to have it on a separate drive :)
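Since the WAL flush at COMMIT is the expensive part, one related knob that already exists (since 8.3) is synchronous_commit, which trades durability of the last few transactions for commit speed without risking database consistency, and can be set per transaction. A sketch:

```sql
-- Per transaction: don't wait for the WAL flush at COMMIT.
-- A crash can lose this transaction, but never corrupts the database.
BEGIN;
SET LOCAL synchronous_commit = off;
-- ... bulk INSERTs into the staging table ...
COMMIT;
```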
On 05/26/2010 06:03 PM, Joachim Worringen wrote:
> Am 25.05.2010 12:41, schrieb Andres Freund:
>> On Tuesday 25 May 2010 11:00:24 Joachim Worringen wrote:
>>> Thanks. So, the Write-Ahead-Logging (being used or not) does not matter?
>> It does matter quite significantly in my experience. Both from an io
>> and a cpu overhead perspective.
>
> O.k., looks as if I have to make my own experience... I'll let you know
> if possible.

As promised, I did a tiny benchmark - basically, 8 empty tables are filled with 100k rows each within 8 transactions (somewhat typical for my application). The test machine has 4 cores, 64G RAM and RAID1 10k drives for data.

# INSERTs into a TEMPORARY table:
[joachim@testsrv scaling]$ time pb query -d scaling_qry_1.xml

real    3m18.242s
user    1m59.074s
sys     1m51.001s

# INSERTs into a standard table:
[joachim@testsrv scaling]$ time pb query -d scaling_qry_1.xml

real    3m35.090s
user    2m5.295s
sys     2m2.307s

Thus, there is a slight hit of about 10% (which may even be within measurement variations) - your mileage will vary.

Joachim
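Working out the relative overhead from the wall-clock times above (a quick back-of-the-envelope check, nothing more):

```python
# Wall-clock ("real") times from the two runs above, in seconds.
temp_real = 3 * 60 + 18.242      # temporary table: 3m18.242s
regular_real = 3 * 60 + 35.090   # regular table:   3m35.090s

overhead = (regular_real - temp_real) / temp_real
print(f"regular table is {overhead:.1%} slower")  # about 8.5%
```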
On Fri, May 28, 2010 at 4:04 AM, Joachim Worringen <joachim.worringen@iathh.de> wrote:
> As promised, I did a tiny benchmark - basically, 8 empty tables are filled
> with 100k rows each within 8 transactions (somewhat typically for my
> application). The test machine has 4 cores, 64G RAM and RAID1 10k drives
> for data.
>
> # INSERTs into a TEMPORARY table:
> [joachim@testsrv scaling]$ time pb query -d scaling_qry_1.xml
>
> real    3m18.242s
> user    1m59.074s
> sys     1m51.001s
>
> # INSERTs into a standard table:
> [joachim@testsrv scaling]$ time pb query -d scaling_qry_1.xml
>
> real    3m35.090s
> user    2m5.295s
> sys     2m2.307s
>
> Thus, there is a slight hit of about 10% (which may even be within
> meausrement variations) - your milage will vary.
>
> Joachim

I think it would be interesting to create a RAM disk and insert into it. In the MySQL community, even though MyISAM has fallen out of use, the MEMORY table (based on MyISAM) is still somewhat used.

--
Rob Wultsch
wultsch@gmail.com
> As promised, I did a tiny benchmark - basically, 8 empty tables are
> filled with 100k rows each within 8 transactions (somewhat typically for
> my application). The test machine has 4 cores, 64G RAM and RAID1 10k
> drives for data.
>
> # INSERTs into a TEMPORARY table:
> [joachim@testsrv scaling]$ time pb query -d scaling_qry_1.xml
>
> real    3m18.242s
> user    1m59.074s
> sys     1m51.001s
>
> # INSERTs into a standard table:
> [joachim@testsrv scaling]$ time pb query -d scaling_qry_1.xml
>
> real    3m35.090s
> user    2m5.295s
> sys     2m2.307s
>
> Thus, there is a slight hit of about 10% (which may even be within
> meausrement variations) - your milage will vary.

Usually WAL causes a much larger performance hit than this.

Since the following command:

CREATE TABLE tmp AS SELECT n FROM generate_series(1,1000000) AS n

which inserts 1M rows takes 1.6 seconds on my desktop, your 800k-row INSERT taking more than 3 minutes is a bit suspicious, unless:

- you got huge fields that need TOASTing; in this case TOAST compression will eat a lot of CPU and you're benchmarking TOAST, not the rest of the system
- you got some non-indexed foreign key
- some other reason?
Am 02.06.2010 12:03, schrieb Pierre C:
> Usually WAL causes a much larger performance hit than this.
>
> Since the following command :
>
> CREATE TABLE tmp AS SELECT n FROM generate_series(1,1000000) AS n
>
> which inserts 1M rows takes 1.6 seconds on my desktop, your 800k rows
> INSERT taking more than 3 minutes is a bit suspicious unless :
>
> - you got huge fields that need TOASTing ; in this case TOAST
> compression will eat a lot of CPU and you're benchmarking TOAST, not the
> rest of the system
> - you got some non-indexed foreign key
> - some other reason ?

Yes, the "other" reason is that I am not issuing a single SQL command, but importing data from plain ASCII files through the Python-based framework into the database. The difference between your measurement and my measurement marks the upper potential for improvement in my system (which has, on the other hand, the advantage of being a bit more powerful and flexible than a single SQL statement... ;-) )

Joachim
> Yes, the "other" reason is that I am not issuing a single SQL command,
> but importing data from plain ASCII files through the Python-based
> framework into the database.
>
> The difference between your measurement and my measurement marks the
> upper potential for improvement in my system (which has, on the other
> hand, the advantage of being a bit more powerful and flexible than a
> single SQL statement... ;-) )

Ah, in that case... ;)

You could give pypy a try; sometimes it's a lot slower, sometimes it's a lot faster.
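Since the bottleneck is the Python-side import of plain ASCII files, one common speedup is to stream the rows through COPY instead of individual INSERTs, e.g. via psycopg2's copy_from (the table and column names below are invented; the actual copy_from call needs a live connection and is only sketched in a comment):

```python
import io

# Rows as parsed from the ASCII input files (invented sample data).
rows = [(1, "foo"), (2, "bar"), (3, "baz")]

# COPY expects one tab-separated line per row.
buf = io.StringIO()
for row in rows:
    buf.write("\t".join(str(col) for col in row) + "\n")
buf.seek(0)

# With a live psycopg2 connection, this streams the buffer in one round trip:
#   cur.copy_from(buf, "staging", columns=("id", "payload"))
#   conn.commit()
print(buf.getvalue())
```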