Discovery - when your IO Subsystem out performs processor Cores - how parallelisation really is our friend |
CPU Saturation – an over performing IO subsystem
It’s not often that you see the IO subsystem able to out drive the cores in the box. To demonstrate this behaviour the TestGuid_HeapInsert table (see http://sqlblogcasts.com/blogs/tonyrogerson/archive/2011/07/19/use-guid-s-uniqueidentifier-for-keys-when-storing-data-on-ssd-s-part-2-initial-database-schema.aspx for set up scripts), a 50 million row table (a heap) 54GBytes in size will be used, processor affinity will be used to lock SQL Server to a specific number of available cores.
Note: the equipment for this test is described here: http://sqlblogcasts.com/blogs/tonyrogerson/archive/2011/07/22/use-guid-s-uniqueidentifier-for-keys-when-storing-data-on-ssd-s-part-3-initial-base-line-with-iometer-and-first-test-group.aspx. In summary, Windows 2008 R2 x64, SQL 2008 R2 x64, 16GB of RAM but SQL Server is limited to 1GB, processor is a AMD Phenom II X6 Six Core 1090T Black Edition.
Interestingly this behaviour will not show up when using the COUNT(*) function without any WHERE clause, performance is consistent regardless of the number of cores used, however, once you have a Compute Scalar in the plan then the core scalability issue described kicks in.
select COUNT(*) from TestGuid_HeapINSERT with ( index = 0 )
Average Disk Bytes/read 404,396
Average Disk Sec/read 0.001
Average Disk Reads/sec 2,806
Average Read Bytes/sec 1,134,827,393
The SQL below has been used; the index hint used to simplify the experiment and make sure a table scan is being performed:
select MAX( cast( expandrow as bigint ) ) from TestGuid_HeapINSERT with
( index = 0 )
Cores |
Avg Disk Bytes/Read |
Avg Disk Sec/Read |
Avg Disk Reads/Sec |
Avg Read Bytes/Sec |
Logical Reads |
Read-Ahead Reads |
CPU Time (ms) |
Elapsed (ms) |
1 |
411,619 |
0.000 |
622 |
256,306,080 |
7,142,864 |
7,142,864 |
211,225 |
228,679 |
2 |
406,215 |
0.001 |
1,244 |
506,625,778 |
7,142,864 |
7,142,864 |
216,249 |
116,102 |
3 |
407,495 |
0.001 |
1,833 |
746,940,222 |
7,142,864 |
7,142,754 |
220,336 |
78,957 |
4 |
409,191 |
0.001 |
2,392 |
978,956,526 |
7,142,864 |
7,142,864 |
223,797 |
60,129 |
5 |
408,588 |
0.001 |
2,795 |
1,142,225,279 |
7,142,864 |
7,142,864 |
225,248 |
51,388 |
6 |
408,588 |
0.001 |
3,270 |
1,324,789,338 |
7,142,864 |
7,142,864 |
229,429 |
44,631 |
Reviewing the figures above Average Disk Bytes per read, Average Disk Sec/Read,
Logical Reads, Read-ahead reads and CPU Time (ms) remain fairly constant,
however Average Disk Reads/Sec and Average Read Bytes/Sec are stepped and
Elapsed (ms) reduces dramatically with a second core then savings diminish as
cores are added.
Cores |
Avg Disk Reads/Sec |
Avg Read Bytes/Sec |
Elapsed (ms) |
2 |
622 |
250,319,698 |
-112,577 |
3 |
589 |
240,314,444 |
-37,145 |
4 |
559 |
232,016,304 |
-18,828 |
5 |
403 |
163,268,753 |
-8,741 |
6 |
475 |
182,564,059 |
-6,757 |
It should be noted, this problem isn’t actually specific to the IO subsystem, and the problem occurs if you have enough of your table in the buffer pool too, essentially the core cannot get the data through quick enough.
Relating to the real world
The figures above show that given a query that has no parallelisation and therefore runs on a single core the duration will be 512% of the capability of the box should all cores be used and will use only 19% of the availability Read Bytes/sec maximum.
In past versions of SQL Server we have been used to turning parallelism off because it more often than not extended the duration of the query, which was fixed in SQL Server 2008 where the parallelism does now work well. However, legacy systems and code and a perpetual myth that needs breaking is that MAXDOP should be removed and just let SQL Server get on with it.
It is true that on a system with a number of concurrent connections that this problem will balance itself out, however, what about the sequential overnight batch jobs that so many of us have?
No direct advice here (yet), may be that will come when I finally do my conclusions around these SSD benchmarks, but you need to be aware that this problem is here and as IO subsystems perform better (which mine does) then these issues will need addressing.
Комментировать | « Пред. запись — К дневнику — След. запись » | Страницы: [1] [Новые] |