Re: Inserting 8MB bytea: just 25% of disk perf used? - Mailing list pgsql-performance
From: fkater@googlemail.com
Subject: Re: Inserting 8MB bytea: just 25% of disk perf used?
Msg-id: 20100122204202.GA1843@comppasch2
In response to: Re: Inserting 8MB bytea: just 25% of disk perf used? (Scott Carey <scott@richrelevance.com>)
Responses: Re: Inserting 8MB bytea: just 25% of disk perf used?
List: pgsql-performance
Scott Carey:
> Well, something is causing the system to alternate between
> CPU and disk bound here. (see below).
> It would be useful to see what effect the index has.

Ok, I simply deleted the index and repeated the test: I did not
notice any difference. This is probably because in fact I am
doing just 10 INSERTs.

> > (B) perfmon34.png: Results/graphs (performance monitor):
>
> Great data!

BTW: I have some more screenshots, but as they do not arrive on
the mailing list I keep them back. The new graphs are basically
the same anyway.

> > (1) The upper dark/gray graph touching the 100% sometimes is
> > "disk write time %" of the data disk G:
> >
> > (2) The yellow graph is nearly completely overpainted by (1)
> > since it is "disk time %".
> >
> > (3) The brown graph below (1) is "Disk Write Bytes/s" divided
> > by 1,000,000, so around 40 MB/s average.
>
> Looks like it is writing everything twice, or close to it.
> Alternatively the index writes occupy half, but that is
> unlikely.

'Writing twice': That is the most interesting point, I believe.
Why is the data disk doing 40 MB/s *not* including WAL, when the
test in fact only writes 20 MB/s of payload? It seems like:
20 MB/s for data, 20 MB/s for X, 20 MB/s for WAL.

Although that question is still unanswered: I verified again
that I am disk bound by temporarily replacing the RAID-0 with a
slower setup: a singly attached SATA disk of the same type. This
*did* slow down the test a lot (approx. 20%). So, yes, I am disk
bound, but, again, why by that much...

About removing the index on OIDs: no impact (see above).

> > (4) The read graph is "Disk Time %" of the WAL drive E:,
> > average approx. 30%.
>
> WAL doesn't look like a bottleneck here, as other tests have
> shown. A larger wal_buffers setting might lower this more,
> since your record overflows the buffer for sure. You might
> want to change your test case to write records of similar
> size to what you expect (do you expect 80MB?) and then set
> wal_buffers up to the size of one checkpoint segment (16MB)
> if you expect larger data per transaction.

Ok, without knowing the exact effect of each change, I modified
some of the configuration values (starting from the 8.2.4
defaults) and ran some tests:

(1) First, the most important 8.2.4 defaults (for you to look
over):

#shared_buffers=32MB
#temp_buffers=8MB
#max_prepared_transactions=5
#work_mem=1MB
#maintenance_work_mem=16MB
#max_stack_depth=2MB
#max_fsm_pages=204800
#max_fsm_relations=1000
#max_files_per_process=1000
#shared_preload_libraries=''
#vacuum_cost_delay=0
#vacuum_cost_page_hit=1
#vacuum_cost_page_miss=10
#vacuum_cost_page_dirty=20
#vacuum_cost_limit=200
#bgwriter_delay=200ms
#bgwriter_lru_percent=1.0
#bgwriter_lru_maxpages=5
#bgwriter_all_percent=0.333
#bgwriter_all_maxpages=5
#fsync=on
#full_page_writes=on
#wal_buffers=64kB
#checkpoint_segments=3
#checkpoint_timeout=5min
#checkpoint_warning=30s
#seq_page_cost=1.0
#random_page_cost=4.0
#cpu_tuple_cost=0.01
#cpu_index_tuple_cost=0.005
#cpu_operator_cost=0.0025
#effective_cache_size=128MB
#default_statistics_target=10
#constraint_exclusion=off
#from_collapse_limit=8
#join_collapse_limit=8
#autovacuum=on
#autovacuum_naptime=1min
#autovacuum_vacuum_threshold=500
#autovacuum_analyze_threshold=250
#autovacuum_vacuum_scale_factor=0.2
#autovacuum_analyze_scale_factor=0.1
#autovacuum_freeze_max_age=200000000
#autovacuum_vacuum_cost_delay=-1
#autovacuum_vacuum_cost_limit=-1
#deadlock_timeout=1s
#max_locks_per_transaction=64

(2) The tests:

Note: The standard speed was about 800MB/40s, so 20MB/s.
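For context, a minimal sketch of the shape of my test (names are
simplified stand-ins, not my exact code; the run moves roughly
800MB in total):

  -- stand-in table; the real one stores the binary field as bytea
  CREATE TABLE bytea_test (data bytea);

  -- keep large values uncompressed in TOAST so the timing measures
  -- real disk I/O rather than compression of an artificial pattern
  ALTER TABLE bytea_test ALTER COLUMN data SET STORAGE EXTERNAL;

  -- write ~800MB, here as 10 rows of 80MB each (the thread started
  -- with 8MB values; adjust size and row count as needed);
  -- decode(repeat(...)) is just a convenient way to build a large
  -- bytea value server-side
  INSERT INTO bytea_test (data)
  SELECT decode(repeat('ff', 80 * 1024 * 1024), 'hex')
  FROM generate_series(1, 10);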
a) What I changed: fsync=off
   Result: 35s, so 5s faster.

b) like a), but additionally:
   checkpoint_segments=128 (was 3)
   autovacuum=off
   Result: 35s (no change...?!)

c) like b), but additionally:
   temp_buffers=200MB (was 8MB)
   wal_sync_method=open_datasync (was fsync)
   wal_buffers=1024kB (was 64kB)
   Result: The best ever: it took just 29s, so 800MB/29s = 27.5MB/s.
   However, having autovacuum=off probably means that deleted rows
   will keep occupying disk space? And I also fear that
   checkpoint_segments=128 means that at some point in the future
   there will be a huge delay (?).

d) also like b), but:
   temp_buffers=1000MB
   wal_buffers=4096kB
   checkpoint_segments=3
   autovacuum=on
   Result: Again slower: 36s.

I am not able to interpret that in depth.

> > (C) My interpretation
> >
> > (1) Although the data disk G: sometimes hits 100%: All in
> > all it seems that neither the CPUs nor the data disk
> > (approx. 65%) nor the WAL disk (approx. 30%) are at their
> > limits. See also 1000 writes/s, 40MB/s write throughput.
>
> I think it is alternating. Whatever is causing the 25% CPU
> jump during the 'slow' periods is a clue. Some process on the
> system must be increasing its time significantly in these
> bursts. I suspect it is postgres flushing data from its
> shared_buffers to the OS. 8.2 is not very efficient at its
> ability to write out to the OS in a constant stream, and
> tends to be very 'bursty' like this. I suspect that 8.3 or
> 8.4 would perform a lot better here, when tuned right.

Ok, I've managed to use 8.4 here. Unfortunately, there was
nearly no improvement in speed. For example, test 2d) performed
in 35s.

> > The write throughput is still disappointing to me: Even if
> > we would find a way to avoid those 'interrupts' of disk
> > inactivity (see E 1), we are too far from serial disk write
> > throughput (20 MB/s data disk + 20 MB/s other (!) WAL

... or better: 20MB/s data disk + 20MB/s unexplained writes to
the data disk + 20MB/s WAL disk...

> > disk: that is far below 100-200 MB/s resp. 40-70 MB/s).
>
> Although reaching 'full' sequential throughput is very hard
> because not all of the writes are sequential, there is a
> rather large gap here.

Yes, it's a pity.

Thank you again so much.

Felix
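P.S.: In case someone wants to reproduce test c), this is how I
understand the combined non-default settings, written out as one
postgresql.conf fragment (fsync=off is for benchmarking only, of
course -- never use it for data you care about):

  fsync = off                        # test only, unsafe
  checkpoint_segments = 128          # was 3
  autovacuum = off
  temp_buffers = 200MB               # was 8MB
  wal_sync_method = open_datasync    # was fsync
  wal_buffers = 1024kB               # was 64kB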