REF : http://sqlserverplanet.com/troubleshooting/sql-server-slowness
The first step in diagnosing SQL Server Slowness is to determine the physical bottleneck with the most contention. Contention in one of the following areas does not always mean that subsystem is performing poorly. It could just as well be improper utilization due to poor tuning. Nevertheless, identifying the bottleneck is always the first place to start. Luckily, Microsoft does provide some unsupported but very useful tools to help us find this. For these, see the video Using SQLDiag and SQL Nexus. I still recommend reading the article below, as it will help understand the interrelation between subsystems.
There are ten main bottlenecks that can cause SQL Server to slow down. They are all interrelated and often fixing one causes another bottleneck to be revealed.
1.CPU
2.Memory
3.Disk
4.Paging
5.Blocking
6.Network IO
7.Process Trimming
8.Old Statistics
9.Fragmentation
10.Recompilation
Blocking
Blocking is caused by contention of resources. To understand blocking you need to understand locking. Locking occurs in order to ensure users see up-to-date accurate information. If records being updated are also being shown to a user before the update finishes, then inaccurate information is being displayed.
The most common reason for blocking is the lack of indexing or queries not written to utilize existing indexes. They mostly occur when update or delete statements are performing a table scan, while an incompatible lock like a select statement tries to read the same records.
This script will Show the blocking process
This script will show the table involved in the blocking
And this will show the queries that are most blocked
Once you find the blocker, you may need to kill it. To do so use the kill command:
--Replace the number below with the blocking SPID
KILL 5000
You will then want to determine if it is missing an index. If so, adding the appropriate index will likely get rid of the problem.
Show the blocking process:
Finding the lead blocker in a chain of blocks is a difficult challenge when using sp_who2. There are also a lot of detect blocking scripts that I have run in production environments that seem to block things themselves.
Probably the most efficient way to detect blocks is to query sysprocesses. The following script displays the lead blocker in a chain of blocks:
SELECT
spid
,sp.status
,loginame = SUBSTRING(loginame, 1, 12)
,hostname = SUBSTRING(hostname, 1, 12)
,blk = CONVERT(char(3), blocked)
,open_tran
,dbname = SUBSTRING(DB_NAME(sp.dbid),1,10)
,cmd
,waittype
,waittime
,last_batch
,SQLStatement =
SUBSTRING
(
qt.text,
er.statement_start_offset/2,
(CASE WHEN er.statement_end_offset = -1
THEN LEN(CONVERT(nvarchar(MAX), qt.text)) * 2
ELSE er.statement_end_offset
END - er.statement_start_offset)/2
)
FROM master.dbo.sysprocesses sp
LEFT JOIN sys.dm_exec_requests er
ON er.session_id = sp.spid
OUTER APPLY sys.dm_exec_sql_text(er.sql_handle) as qt
WHERE spid IN (SELECT blocked FROM master.dbo.sysprocesses)
AND blocked = 0
This will also display the SQL Statement that is the offending blocker.
show the table involved in the blocking:
SELECT DISTINCT
objname = object_name(p.object_id)
FROM sys.partitions p
JOIN sys.dm_tran_locks t1
ON p.hobt_id = t1.resource_associated_entity_id
show the queries that are most blocked :
SELECT
OBJECT_NAME(objectid)
,BlockTime = total_elapsed_time - total_worker_time
,execution_count
,total_logical_reads
FROM sys.dm_exec_query_stats qs
cross apply sys.dm_exec_sql_text(qs.sql_handle)
ORDER BY total_elapsed_time - total_worker_time DESC
Old Statistics
If old statistics is your problem, you will likely experience this as a gradual decline of SQL Server slowing down over many days or weeks, or you may have just upgraded your platform (from 2000 to 2008) and forgot to update the statistics. Out of date statistics cause inaccurate execution plans. This problem is difficult to determine, so we should eliminate it first. Bad execution plans are many times a silent killer because the system can run severely slow, yet all other bottlenecks may be performing within an acceptable range.
First and foremost, turn on “auto update statistics” for the database. It is highly recommended to leave this on. Next, update all the statistics in the database. Keep in mind this may take time considering the sample rate for the statistics update. It is still highly recommended prior to troubleshooting.
It is also a good practice to rebuild or reorganize the indexes. Although this can likely be an intensive process to run during production hours.
update all the statistics in the database:
There are two ways to update statistics. The first way is the easy way. It is one line of code that will update all the statistics in the database using the default sample size of 20,000 rows per table.
EXEC sp_updatestats
The other way, is to use the UPDATE STATISTICS command. This command gives much better granularity of control:
-- Update all statistics on a table
UPDATE STATISTICS Sales.SalesOrderDetail
-- Update a specific index on a table
UPDATE STATISTICS Sales.SalesOrderDetail IX_SalesOrderDetail
-- Update one column on a table specifying sample size
UPDATE STATISTICS Production.Product(Products) WITH SAMPLE 50 PERCENT
Using update statistics can give you the granularity of control to only update the out of date statistics, thus having less impact on your production system.
The following script updates all out of date statistics. Set the @MaxDaysOld variable to the number of days you will allow the statistics to be out of date by. Setting the @SamplePercent variable to null will use the SQL Server default value of 20,000 rows. You can also change the sample type to specify rows or percent.
DECLARE @MaxDaysOld int
DECLARE @SamplePercent int
DECLARE @SampleType nvarchar(50)
SET @MaxDaysOld = 0
SET @SamplePercent = NULL --25
SET @SampleType = 'PERCENT' --'ROWS'
BEGIN TRY
DROP TABLE #OldStats
END TRY
BEGIN CATCH SELECT 1 END CATCH
SELECT
RowNum = ROW_NUMBER() OVER (ORDER BY ISNULL(STATS_DATE(object_id, st.stats_id),1))
,TableName = OBJECT_SCHEMA_NAME(st.object_id) + '.' + OBJECT_NAME(st.object_id)
,StatName = st.name
,StatDate = ISNULL(STATS_DATE(object_id, st.stats_id),1)
INTO #OldStats
FROM sys.stats st WITH (nolock)
WHERE DATEDIFF(day, ISNULL(STATS_DATE(object_id, st.stats_id),1), GETDATE()) > @MaxDaysOld
ORDER BY ROW_NUMBER() OVER (ORDER BY ISNULL(STATS_DATE(object_id, st.stats_id),1))
DECLARE @MaxRecord int
DECLARE @CurrentRecord int
DECLARE @TableName nvarchar(255)
DECLARE @StatName nvarchar(255)
DECLARE @SQL nvarchar(max)
DECLARE @SampleSize nvarchar(100)
SET @MaxRecord = (SELECT MAX(RowNum) FROM #OldStats)
SET @CurrentRecord = 1
SET @SQL = ''
SET @SampleSize = ISNULL(' WITH SAMPLE ' + CAST(@SamplePercent AS nvarchar(20)) + ' ' + @SampleType,N'')
WHILE @CurrentRecord <= @MaxRecord
BEGIN
SELECT
@TableName = os.TableName
,@StatName = os.StatName
FROM #OldStats os
WHERE RowNum = @CurrentRecord
SET @SQL = N'UPDATE STATISTICS ' + @TableName + ' ' + @StatName + @SampleSize
PRINT @SQL
EXEC sp_executesql @SQL
SET @CurrentRecord = @CurrentRecord + 1
END
After updating the statistics, the execution plans that use these statistics may become invalid. Ideally SQL Server should then create a new execution plan. Personally, I prefer to help SQL Server out by flushing the cache. I would recommend you do the same. Note, this clears the entire procedure cache for the server, not just the database.
-- Clears the procedure cache for the entire server
DBCC FREEPROCCACHE
You should then also update the usage stats. Usage stats are the row counts stored for each index:
-- Update all usage in the database
DBCC UPDATEUSAGE (0);
If you are not already doing so, it is highly recommended to leave the default settings of “Auto Update Statistics” and “Auto Create Statistics” ON.
rebuild or reorganize the indexes :
This script will automatically determine whether a rebuild or a reorganize should be used according to the fragmentation of the index. It will then execute the appropriate command. Note that performing index rebuilds online during production hours will cause contention.
SET NOCOUNT ON;
DECLARE @objectid int;
DECLARE @indexid int;
DECLARE @partitioncount bigint;
DECLARE @schemaname nvarchar(258);
DECLARE @objectname nvarchar(258);
DECLARE @indexname nvarchar(258);
DECLARE @partitionnum bigint;
DECLARE @partitions bigint;
DECLARE @frag float;
DECLARE @command varchar(8000);
-- ensure the temporary table does not exist
IF EXISTS (SELECT name FROM sys.objects WHERE name = 'work')
DROP TABLE work;
-- conditionally select from the function, converting object and index IDs
-- to names.
SELECT
object_id AS objectid,
index_id AS indexid,
partition_number AS partitionnum,
avg_fragmentation_in_percent AS frag
INTO work
FROM sys.dm_db_index_physical_stats (DB_ID(), NULL, NULL , NULL, 'LIMITED')
WHERE avg_fragmentation_in_percent > 10.0 AND index_id > 0;
-- Declare the cursor for the list of partitions to be processed.
DECLARE partitions CURSOR FOR SELECT * FROM work;
-- Open the cursor.
OPEN partitions;
-- Loop through the partitions.
FETCH NEXT
FROM partitions
INTO @objectid, @indexid, @partitionnum, @frag;
WHILE @@FETCH_STATUS = 0
BEGIN;
SELECT @objectname = QUOTENAME(o.name),
@schemaname = QUOTENAME(s.name)
FROM sys.objects AS o
JOIN sys.schemas as s ON s.schema_id = o.schema_id
WHERE o.object_id = @objectid;
SELECT @indexname = QUOTENAME(name)
FROM sys.indexes
WHERE object_id = @objectid AND index_id = @indexid;
SELECT @partitioncount = count (*)
FROM sys.partitions
WHERE object_id = @objectid AND index_id = @indexid;
-- 30 is the decision point at which to switch
-- between reorganizing and rebuilding
IF @frag < 30.0
BEGIN;
SELECT @command = 'ALTER INDEX ' + @indexname + ' ON '
+ @schemaname + '.' + @objectname + ' REORGANIZE';
IF @partitioncount > 1
SELECT @command = @command + ' PARTITION='
+ CONVERT (CHAR, @partitionnum);
EXEC (@command);
END;
IF @frag >= 30.0
BEGIN;
SELECT @command = 'ALTER INDEX ' + @indexname +' ON ' + @schemaname
+ '.' + @objectname + ' REBUILD';
IF @partitioncount > 1
SELECT @command = @command + ' PARTITION='
+ CONVERT (CHAR, @partitionnum);
EXEC (@command);
END;
PRINT 'Executed ' + @command;
FETCH NEXT FROM partitions INTO @objectid, @indexid, @partitionnum, @frag;
END;
-- Close and deallocate the cursor.
CLOSE partitions;
DEALLOCATE partitions;
-- drop the temporary table
IF EXISTS (SELECT name FROM sys.objects WHERE name = 'work')
DROP TABLE work;
GO
CPU – (Historically Low)
When the CPU on a box that has historically been low becomes pegged, there is a lot of hope because this is usually the easiest fix. It is likely you have a run away query. The first thing you want to do is run sp_who2 (or sp_who3) and look for connections that have a SPID > 50 with abnormally high CPU time. The CPU time shown on sp_who2 is cumulative for the entire connection, so keep in mind if the connection has been open for days, the high CPU time might be plausible. Once you find a suspect connection, run the following command to see what is executing (sp_who3 already provides this):
DECLARE @SPID int
SET @SPID = yourspidhere
DBCC INPUTBUFFER(@SPID)
The input buffer will show you the command that is being executed. If it is determined to be rougue, kill it:
DECLARE @SPID int
SET @SPID = yourspidhere
kill @SPID
Another slowdown that may occur is Random SQL Server Execution plan hanging. To determine this, when looking at the CPU utilization for each session, take note of the amount of Disk IO being used. If you notice high processor utilization and very low IO utilization, then it is likely due to a spinning execution plan.
High CPU, little or no IO
This is a problem that has cropped up since SQL Server 2005. It is a small price to be paid for a smarter optimizer. The easiest way I have found to fix this is to simply rewrite the query. You can find more information here (Random SQL Server Execution plan hanging).
Random SQL Server Execution plan hanging:
This post deals with a random hanging that sometimes happens with SQL Server 2005+. In order to troubleshoot SQL Server Slowness, go here.
The introduction of the new SQL Server 2005 Query Optimization engine has brought great things (including statement-level caching and smarter execution plan generation). There is however a little more overhead with the advent of this new technology. Aside from taking longer to generate an execution plan, I have noticed two separate instances where a query would appear to intermittently hang. From a database engine perspective however, the query is not hanging but generating an execution plan.
The two instances I’ve witnessed this in were in both SQL Server 2005 (sp2), and also now in SQL Server 2008. Both procedures were relatively small, however somewhat complex in their where clauses.
The physical indicators in both instances where very high CPU usage and very low IO usage for one particular SPID which we gathered by executing sp_who2.
High CPU, little or no IO
Here is an example of the query that hung on SQL Server 2008:
SET @IsTrue =
(
SELECT COUNT(1)
FROM dbo.table1 t1
JOIN dbo.table2 t2
ON t1.ColumnID = t2.ColumnID
WHERE (
t2.ID = @ID
AND t2.SomeDate < GETDATE()
)
OR EXISTS
(
SELECT 1
FROM dbo.Table3
WHERE ID = @ID
AND AlternateID IN
(1,2,3,4,6)
)
)
Once the query would hang in generating the execution plan, the effect seemed to snowball and cause other executing instances of the same procedure to hang. Also, no blocking was occuring in the database, and these SPIDs did not show as ‘Runnable’ in sp_who2.
The fix we found was to make the query easier to digest for the optimizer. First, the GETDATE() within the where clause possibly made the query too non-deterministic in order for the optimizer to cache it’s execution plan. So we replaced the GETDATE() with a variable and set the variable to GETDATE(). This likely allowed the optimizer to ‘sniff’ the variable and determine a possible range of values in which to base an execution plan upon. Second, we removed the OR in the where clause and UNION’ed the two statements together. These two changes fixed our issue.
DECLARE @Today datetime = GETDATE()
SET @IsTrue =
(
SELECT SUM(t.Flag)
FROM(
SELECT COUNT(1) AS Flag
FROM dbo.table1 t1
JOIN dbo.table1 t1
ON t1.ColumnID = t2.ColumnID
WHERE t2.SomeDate < @Today
UNION
SELECT 1 AS Flag
FROM dbo.Table3
WHERE ID = @ID
AND AlternateID IN
(1,2,3,4,6)
) t
)
CPU – (Historically Medium/High)
For a CPU that has historically been high, the most common issue is a lack of indexes. This is an easy fix with the new DMVs introduced in SQL Server 2005. Usually adding indexes for the top 5 offending queries will resolve this issue. Follow this to Find Missing Indexes. Lack of indexes causes table scans which in turn eats up memory IO, disk IO and processor.
Find Missing Indexes :
This query will show the indexes that are missing ordered according to those having the most impact. It will also provide the create index script needed in order to help you create the index.
SELECT
mid.statement
,migs.avg_total_user_cost * (migs.avg_user_impact / 100.0) * (migs.user_seeks + migs.user_scans) AS improvement_measure,OBJECT_NAME(mid.Object_id),
'CREATE INDEX [missing_index_' + CONVERT (varchar, mig.index_group_handle) + '_' + CONVERT (varchar, mid.index_handle)
+ '_' + LEFT (PARSENAME(mid.statement, 1), 32) + ']'
+ ' ON ' + mid.statement
+ ' (' + ISNULL (mid.equality_columns,'')
+ CASE WHEN mid.equality_columns IS NOT NULL AND mid.inequality_columns IS NOT NULL THEN ',' ELSE '' END
+ ISNULL (mid.inequality_columns, '')
+ ')'
+ ISNULL (' INCLUDE (' + mid.included_columns + ')', '') AS create_index_statement,
migs.*, mid.database_id, mid.[object_id]
FROM sys.dm_db_missing_index_groups mig
INNER JOIN sys.dm_db_missing_index_group_stats migs ON migs.group_handle = mig.index_group_handle
INNER JOIN sys.dm_db_missing_index_details mid ON mig.index_handle = mid.index_handle
WHERE migs.avg_total_user_cost * (migs.avg_user_impact / 100.0) * (migs.user_seeks + migs.user_scans) > 10
ORDER BY migs.avg_total_user_cost * migs.avg_user_impact * (migs.user_seeks + migs.user_scans) DESC
What you need to know about this script however is what it does not account for. It does not account for an index that should be clustered. One of the warning signs that an index should be clustered is when this query suggests to you an index that contains a lot of columns (or has a lot of include columns). It is suggesting that, because it does not want to do a bookmark lookup to get the columns it needs for the select list. In those cases, you want to compare the number of user_seeks this query tells you, with the number that is shown in the Index Usage DMV. If this query’s seeks are higher, then consider making this the clustered index.
Also, once the queries are created, you can check to see if your assumption was correct by using the Clustered Index Checker DMV Script.
Memory
Likely the most important component of SQL Server is enough memory. With enough RAM, you can mitigate the number of times your disk needs to be touched. Ideally, the data that is queried often should always be in RAM. In order to accomplish this, enough RAM should be provided to ensure the Page Life Expectancy (PLE) remains over 300.
To verify the PLE:
Open perfmon Select the Performance Object of MSSQLServer:Buffer Manager Highlight “Page Life Expectancy” and click Add.
The counter should remain over 300 seconds. This is the average amount of time SQL Server has estimated that each data page will be able to stay resident in memory until another process forces it out. If you see the PLE between zero and 100, then you definitely have a bottleneck and you need to Find the memory usage of the currently executing queries. If the query is run on a schedule and you weren’t able to catch it then you can Find the queries that have historically taken the most memory.
Additional information that may help is identifying the database taking the most memory
Find the memory usage of the currently executing queries :
To get the query memory usage of currently executing queries run the following:
SELECT
TEXT
,query_plan
,requested_memory_kb
,granted_memory_kb
,used_memory_kb
FROM sys.dm_exec_query_memory_grants emg
CROSS APPLY sys.dm_exec_sql_text(sql_handle)
CROSS APPLY sys.dm_exec_query_plan(emg.plan_handle)
ORDER BY emg.requested_memory_kb DESC
Find the queries that have historically taken the most memory :
This query returns back the queries that use the most IO. This can mean that either the query is reading from disk more than usual or occupying and utilizing a large amount of buffer cache. These are typical symptoms of queries that do not have the proper indexes or queries that simply read a lot of data.
/**********************************************************
* top procedures memory consumption per execution
* (this will show mostly reports & jobs)
***********************************************************/
SELECT TOP 100 *
FROM
(
SELECT
DatabaseName = DB_NAME(qt.dbid)
,ObjectName = OBJECT_SCHEMA_NAME(qt.objectid,dbid) + '.' + OBJECT_NAME(qt.objectid, qt.dbid)
,DiskReads = SUM(qs.total_physical_reads) -- The worst reads, disk reads
,MemoryReads = SUM(qs.total_logical_reads) --Logical Reads are memory reads
,Executions = SUM(qs.execution_count)
,IO_Per_Execution = SUM((qs.total_physical_reads + qs.total_logical_reads) / qs.execution_count)
,CPUTime = SUM(qs.total_worker_time)
,DiskWaitAndCPUTime = SUM(qs.total_elapsed_time)
,MemoryWrites = SUM(qs.max_logical_writes)
,DateLastExecuted = MAX(qs.last_execution_time)
FROM sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS qt
GROUP BY DB_NAME(qt.dbid), OBJECT_SCHEMA_NAME(qt.objectid,dbid) + '.' + OBJECT_NAME(qt.objectid, qt.dbid)
) T
ORDER BY IO_Per_Execution DESC
/**********************************************************
* top procedures memory consumption total
* (this will show more operational procedures)
***********************************************************/
SELECT TOP 100 *
FROM
(
SELECT
DatabaseName = DB_NAME(qt.dbid)
,ObjectName = OBJECT_SCHEMA_NAME(qt.objectid,dbid) + '.' + OBJECT_NAME(qt.objectid, qt.dbid)
,DiskReads = SUM(qs.total_physical_reads) -- The worst reads, disk reads
,MemoryReads = SUM(qs.total_logical_reads) --Logical Reads are memory reads
,Total_IO_Reads = SUM(qs.total_physical_reads + qs.total_logical_reads)
,Executions = SUM(qs.execution_count)
,IO_Per_Execution = SUM((qs.total_physical_reads + qs.total_logical_reads) / qs.execution_count)
,CPUTime = SUM(qs.total_worker_time)
,DiskWaitAndCPUTime = SUM(qs.total_elapsed_time)
,MemoryWrites = SUM(qs.max_logical_writes)
,DateLastExecuted = MAX(qs.last_execution_time)
FROM sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS qt
GROUP BY DB_NAME(qt.dbid), OBJECT_SCHEMA_NAME(qt.objectid,dbid) + '.' + OBJECT_NAME(qt.objectid, qt.dbid)
) T
ORDER BY Total_IO_Reads DESC
/**********************************************************
* top adhoc queries memory consumption total
***********************************************************/
SELECT TOP 100 *
FROM
(
SELECT
DatabaseName = DB_NAME(qt.dbid)
,QueryText = qt.text
,DiskReads = SUM(qs.total_physical_reads) -- The worst reads, disk reads
,MemoryReads = SUM(qs.total_logical_reads) --Logical Reads are memory reads
,Total_IO_Reads = SUM(qs.total_physical_reads + qs.total_logical_reads)
,Executions = SUM(qs.execution_count)
,IO_Per_Execution = SUM((qs.total_physical_reads + qs.total_logical_reads) / qs.execution_count)
,CPUTime = SUM(qs.total_worker_time)
,DiskWaitAndCPUTime = SUM(qs.total_elapsed_time)
,MemoryWrites = SUM(qs.max_logical_writes)
,DateLastExecuted = MAX(qs.last_execution_time)
FROM sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS qt
WHERE OBJECT_SCHEMA_NAME(qt.objectid,dbid) + '.' + OBJECT_NAME(qt.objectid, qt.dbid) IS NULL
GROUP BY DB_NAME(qt.dbid), qt.text, OBJECT_SCHEMA_NAME(qt.objectid,dbid) + '.' + OBJECT_NAME(qt.objectid, qt.dbid)
) T
ORDER BY Total_IO_Reads DESC
/**********************************************************
* top adhoc queries memory consumption per execution
***********************************************************/
SELECT TOP 100 *
FROM
(
SELECT
DatabaseName = DB_NAME(qt.dbid)
,QueryText = qt.text
,DiskReads = SUM(qs.total_physical_reads) -- The worst reads, disk reads
,MemoryReads = SUM(qs.total_logical_reads) --Logical Reads are memory reads
,Total_IO_Reads = SUM(qs.total_physical_reads + qs.total_logical_reads)
,Executions = SUM(qs.execution_count)
,IO_Per_Execution = SUM((qs.total_physical_reads + qs.total_logical_reads) / qs.execution_count)
,CPUTime = SUM(qs.total_worker_time)
,DiskWaitAndCPUTime = SUM(qs.total_elapsed_time)
,MemoryWrites = SUM(qs.max_logical_writes)
,DateLastExecuted = MAX(qs.last_execution_time)
FROM sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS qt
WHERE OBJECT_SCHEMA_NAME(qt.objectid,dbid) + '.' + OBJECT_NAME(qt.objectid, qt.dbid) IS NULL
GROUP BY DB_NAME(qt.dbid), qt.text, OBJECT_SCHEMA_NAME(qt.objectid,dbid) + '.' + OBJECT_NAME(qt.objectid, qt.dbid)
) T
ORDER BY IO_Per_Execution DESC
identifying the database taking the most memory :
This query shows how many pages each database has in the buffer Pool. This will show you the breakdown of memory allocation for each database.
SELECT
DB_NAME(database_id),
COUNT(page_id)as number_pages
FROM sys.dm_os_buffer_descriptors
WHERE database_id !=32767
GROUP BY database_id
ORDER BY database_id
Disk
The biggest scapegoat for slowness is always the disk. (Yeah I said it). Yes, it is true that more often than not disks do not perform according to the ideal specifications. But aside from checkpoints and large data loads, it shouldn’t matter that much. Most customer facing OLTP systems I have worked with have been mostly read intensive. In almost all instances where the disk counters were indicative of under performing, it was a lack of both memory and indexing that contributed disk high disk queue length, thus hosing the disk. I would like to initially steer you clear of looking at the disk as being the problem.
Now, with that being said, here are the conditions where you will see disk bottlenecking:
1.In almost all cases, the CPU % is very low because it is waiting for the disk.
2.The server is write intensive, usually a data warehouse or a subscriber to a large replication.
3.Import processes are written using full truncates and full inserts rather than delta inserts.
4.Delete is used where truncate could be an option
Look here to help determine expensive IO read queries and determine expensive IO write queries
The only plausible instance where I needed management to change the disk was when IT setup a RAID 6 for my data drive.
So, if your CPU is low (less than 30%), and you verified the Page Life Expectancy (> 300) and cleaned the indexes, go ahead and blame the disk. But first, verify in perfmon under Performance Object -> Physical Disk:
1.% Idle Time < 30%
2.Avg Disk/Sec Transfer > .100
3.Disk Queue Length < 20
Those are the only ones I typically use because most of the times Windows is assuming it is looking at a local disk, when most corporate SQL Servers run off a SAN. Also, to get accurate results, the interval for perfmon should be under 5 seconds.
determine expensive IO read queries :
SELECT TOP 10
'Procedure' = qt.text
,DiskReads = qs.total_physical_reads -- The worst reads, disk reads
,MemoryReads = qs.total_logical_reads --Logical Reads are memory reads
,Executions = qs.execution_count
,CPUTime = qs.total_worker_time
,DiskWaitAndCPUTime = qs.total_elapsed_time
,MemoryWrites = qs.max_logical_writes
,DateCached = qs.creation_time
,DatabaseName = DB_Name(qt.dbid)
,LastExecutionTime = qs.last_execution_time
FROM sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS qt
ORDER BY qs.max_logical_reads DESC
determine expensive IO write queries :
SELECT TOP 10
'Procedure' = qt.text
,DiskReads = qs.total_physical_reads -- The worst reads, disk reads
,MemoryReads = qs.total_logical_reads --Logical Reads are memory reads
,Executions = qs.execution_count
,CPUTime = qs.total_worker_time
,DiskWaitAndCPUTime = qs.total_elapsed_time
,MemoryWrites = qs.max_logical_writes
,DateCached = qs.creation_time
,DatabaseName = DB_Name(qt.dbid)
,LastExecutionTime = qs.last_execution_time
FROM sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS qt
ORDER BY qs.max_logical_writes DESC
Paging
This is an issue that has cropped up with the advent of 64bit OS’s. Windows Server 2003 changed it’s priority for non OS processes. This severely impacted SQL Server and created a heck of a time for SQL Server support. What happens, is when the OS feels it is experiencing external memory pressure, it begins choosing victims to steal their memory. In which cases non system level process will find their working set swapped to the paging file. The most obvious indication of this is the amount of page file usage. It should preferably be under 8%. I have seen some SQL Servers run ok on 10% however this is pushing it.
To determine the page file usage open perfmon and select the Paging File performance object. Select both % Usage Peak and %Usage. Usage Peak is the peak amount that has how much has been used since the server last rebooted.
Paging File
To fix a paging file gone bad, unfortunately either requires a reboot, or sometimes you can get away with restarting SQL Server. The most important issue is why did it happen in the first place. Ideally, you want to provide the OS with enough room to breathe. And with 64-bit OS’s, this usually means a lot more memory than you are use to providing that for 32-bit. I only feel comfortable if my dedicated SQL Server box has a full 8 GB Free. Yes, that’s correct, 8 GIGA-BYTES. I know it sounds like a lot, however that is a valid threshold we determined with a SQL Server 2005 box that had trimming issues. (years from now people will be laughing at this post)
As for setup, it is recommended that (even on 64-bit OS’s) that you turn on AWE. I went around and around with this until I ended up speaking with someone from the highest level support at SQL Server Support. This person had access to the developers who looked up source code in helping with our issue. I explained to him that it doesn’t seem like we need AWE turned on because 64-bit can map the memory itself. He replied, I know, but just turn it on. And that was enough for me.
Some other things to keep in mind, are large data transfers. DO NOT use your SQL Server box to copy or move files across the network. Instead have another box come and get the file. Do not waste your precious memory on file operations. Here is a Microsoft KB: How to reduce paging of buffer pool memory in the 64-bit version of SQL Server
Process Trimming
This is very much related to paging and some of the same symptoms can be seen. What we were referring to regarding paging in the section above was Buffer Pool paging. Process trimming refers to working set paging. The difference however is subtle. It requires an explanation between the difference of “Private Bytes” and “Working Set”. The working set is the amount of memory the OS has granted to a process for usage. According to the process, it has the entire working set as it’s playground. However, the actual amount of physical memory it has is the private bytes which is a lower amount. The difference between the working set and the private bytes can be found in the glorious paging file. So if you notice a large disparity between private bytes and the working set, you are experiencing something known as process trimming. I hope you never have this issue, and you should not have this issue if you give the OS enough room to breathe. There are plenty of articles that delve deep into this subject so I will leave the experts to explain the specifics. The point is, give the OS enough room to breathe. Here is a performance team article on the subject: SQL and the Working Set
this article is refered from the below link
REF : http://sqlserverplanet.com/troubleshooting/sql-server-slowness
The first step in diagnosing SQL Server Slowness is to determine the physical bottleneck with the most contention. Contention in one of the following areas does not always mean that subsystem is performing poorly. It could just as well be improper utilization due to poor tuning. Nevertheless, identifying the bottleneck is always the first place to start. Luckily, Microsoft does provide some unsupported but very useful tools to help us find this. For these, see the video Using SQLDiag and SQL Nexus. I still recommend reading the article below, as it will help understand the interrelation between subsystems.
There are ten main bottlenecks that can cause SQL Server to slow down. They are all interrelated and often fixing one causes another bottleneck to be revealed.
1.CPU
2.Memory
3.Disk
4.Paging
5.Blocking
6.Network IO
7.Process Trimming
8.Old Statistics
9.Fragmentation
10.Recompilation
Blocking
Blocking is caused by contention of resources. To understand blocking you need to understand locking. Locking occurs in order to ensure users see up-to-date accurate information. If records being updated are also being shown to a user before the update finishes, then inaccurate information is being displayed.
The most common reason for blocking is the lack of indexing or queries not written to utilize existing indexes. They mostly occur when update or delete statements are performing a table scan, while an incompatible lock like a select statement tries to read the same records.
This script will Show the blocking process
This script will show the table involved in the blocking
And this will show the queries that are most blocked
Once you find the blocker, you may need to kill it. To do so use the kill command:
--Replace the number below with the blocking SPID
KILL 5000
You will then want to determine if it is missing an index. If so, adding the appropriate index will likely get rid of the problem.
Show the blocking process:
Finding the lead blocker in a chain of blocks is a difficult challenge when using sp_who2. There are also a lot of detect blocking scripts that I have run in production environments that seem to block things themselves.
Probably the most efficient way to detect blocks is to query sysprocesses. The following script displays the lead blocker in a chain of blocks:
SELECT
spid
,sp.status
,loginame = SUBSTRING(loginame, 1, 12)
,hostname = SUBSTRING(hostname, 1, 12)
,blk = CONVERT(char(3), blocked)
,open_tran
,dbname = SUBSTRING(DB_NAME(sp.dbid),1,10)
,cmd
,waittype
,waittime
,last_batch
,SQLStatement =
SUBSTRING
(
qt.text,
er.statement_start_offset/2,
(CASE WHEN er.statement_end_offset = -1
THEN LEN(CONVERT(nvarchar(MAX), qt.text)) * 2
ELSE er.statement_end_offset
END - er.statement_start_offset)/2
)
FROM master.dbo.sysprocesses sp
LEFT JOIN sys.dm_exec_requests er
ON er.session_id = sp.spid
OUTER APPLY sys.dm_exec_sql_text(er.sql_handle) as qt
WHERE spid IN (SELECT blocked FROM master.dbo.sysprocesses)
AND blocked = 0
This will also display the SQL Statement that is the offending blocker.
show the table involved in the blocking:
SELECT DISTINCT
objname = object_name(p.object_id)
FROM sys.partitions p
JOIN sys.dm_tran_locks t1
ON p.hobt_id = t1.resource_associated_entity_id
show the queries that are most blocked :
SELECT
OBJECT_NAME(objectid)
,BlockTime = total_elapsed_time - total_worker_time
,execution_count
,total_logical_reads
FROM sys.dm_exec_query_stats qs
cross apply sys.dm_exec_sql_text(qs.sql_handle)
ORDER BY total_elapsed_time - total_worker_time DESC
Old Statistics
If old statistics is your problem, you will likely experience this as a gradual decline of SQL Server slowing down over many days or weeks, or you may have just upgraded your platform (from 2000 to 2008) and forgot to update the statistics. Out of date statistics cause inaccurate execution plans. This problem is difficult to determine, so we should eliminate it first. Bad execution plans are many times a silent killer because the system can run severely slow, yet all other bottlenecks may be performing within an acceptable range.
First and foremost, turn on “auto update statistics” for the database. It is highly recommended to leave this on. Next, update all the statistics in the database. Keep in mind this may take time considering the sample rate for the statistics update. It is still highly recommended prior to troubleshooting.
It is also a good practice to rebuild or reorganize the indexes. Although this can likely be an intensive process to run during production hours.
update all the statistics in the database:
There are two ways to update statistics. The first way is the easy way. It is one line of code that will update all the statistics in the database using the default sample size of 20,000 rows per table.
EXEC sp_updatestats
The other way, is to use the UPDATE STATISTICS command. This command gives much better granularity of control:
-- Update all statistics on a table
UPDATE STATISTICS Sales.SalesOrderDetail
-- Update a specific index on a table
UPDATE STATISTICS Sales.SalesOrderDetail IX_SalesOrderDetail
-- Update one column on a table specifying sample size
UPDATE STATISTICS Production.Product(Products) WITH SAMPLE 50 PERCENT
Using update statistics can give you the granularity of control to only update the out of date statistics, thus having less impact on your production system.
The following script updates all out of date statistics. Set the @MaxDaysOld variable to the number of days you will allow the statistics to be out of date by. Setting the @SamplePercent variable to null will use the SQL Server default value of 20,000 rows. You can also change the sample type to specify rows or percent.
DECLARE @MaxDaysOld int
DECLARE @SamplePercent int
DECLARE @SampleType nvarchar(50)
SET @MaxDaysOld = 0
SET @SamplePercent = NULL --25
SET @SampleType = 'PERCENT' --'ROWS'
BEGIN TRY
DROP TABLE #OldStats
END TRY
BEGIN CATCH SELECT 1 END CATCH
SELECT
RowNum = ROW_NUMBER() OVER (ORDER BY ISNULL(STATS_DATE(object_id, st.stats_id),1))
,TableName = OBJECT_SCHEMA_NAME(st.object_id) + '.' + OBJECT_NAME(st.object_id)
,StatName = st.name
,StatDate = ISNULL(STATS_DATE(object_id, st.stats_id),1)
INTO #OldStats
FROM sys.stats st WITH (nolock)
WHERE DATEDIFF(day, ISNULL(STATS_DATE(object_id, st.stats_id),1), GETDATE()) > @MaxDaysOld
ORDER BY ROW_NUMBER() OVER (ORDER BY ISNULL(STATS_DATE(object_id, st.stats_id),1))
DECLARE @MaxRecord int
DECLARE @CurrentRecord int
DECLARE @TableName nvarchar(255)
DECLARE @StatName nvarchar(255)
DECLARE @SQL nvarchar(max)
DECLARE @SampleSize nvarchar(100)
SET @MaxRecord = (SELECT MAX(RowNum) FROM #OldStats)
SET @CurrentRecord = 1
SET @SQL = ''
SET @SampleSize = ISNULL(' WITH SAMPLE ' + CAST(@SamplePercent AS nvarchar(20)) + ' ' + @SampleType,N'')
WHILE @CurrentRecord <= @MaxRecord
BEGIN
SELECT
@TableName = os.TableName
,@StatName = os.StatName
FROM #OldStats os
WHERE RowNum = @CurrentRecord
SET @SQL = N'UPDATE STATISTICS ' + @TableName + ' ' + @StatName + @SampleSize
PRINT @SQL
EXEC sp_executesql @SQL
SET @CurrentRecord = @CurrentRecord + 1
END
After updating the statistics, the execution plans that use these statistics may become invalid. Ideally SQL Server should then create a new execution plan. Personally, I prefer to help SQL Server out by flushing the cache. I would recommend you do the same. Note, this clears the entire procedure cache for the server, not just the database.
-- Clears the procedure cache for the entire server
DBCC FREEPROCCACHE
You should then also update the usage stats. Usage stats are the row counts stored for each index:
-- Update all usage in the database
DBCC UPDATEUSAGE (0);
If you are not already doing so, it is highly recommended to leave the default settings of “Auto Update Statistics” and “Auto Create Statistics” ON.
rebuild or reorganize the indexes :
This script will automatically determine whether a rebuild or a reorganize should be used according to the fragmentation of the index. It will then execute the appropriate command. Note that performing index rebuilds online during production hours will cause contention.
SET NOCOUNT ON;
DECLARE @objectid int;
DECLARE @indexid int;
DECLARE @partitioncount bigint;
DECLARE @schemaname nvarchar(258);
DECLARE @objectname nvarchar(258);
DECLARE @indexname nvarchar(258);
DECLARE @partitionnum bigint;
DECLARE @partitions bigint;
DECLARE @frag float;
DECLARE @command varchar(8000);
-- ensure the temporary table does not exist
IF EXISTS (SELECT name FROM sys.objects WHERE name = 'work')
DROP TABLE work;
-- conditionally select from the function, converting object and index IDs
-- to names.
SELECT
object_id AS objectid,
index_id AS indexid,
partition_number AS partitionnum,
avg_fragmentation_in_percent AS frag
INTO work
FROM sys.dm_db_index_physical_stats (DB_ID(), NULL, NULL , NULL, 'LIMITED')
WHERE avg_fragmentation_in_percent > 10.0 AND index_id > 0;
-- Declare the cursor for the list of partitions to be processed.
DECLARE partitions CURSOR FOR SELECT * FROM work;
-- Open the cursor.
OPEN partitions;
-- Loop through the partitions.
FETCH NEXT
FROM partitions
INTO @objectid, @indexid, @partitionnum, @frag;
WHILE @@FETCH_STATUS = 0
BEGIN;
SELECT @objectname = QUOTENAME(o.name),
@schemaname = QUOTENAME(s.name)
FROM sys.objects AS o
JOIN sys.schemas as s ON s.schema_id = o.schema_id
WHERE o.object_id = @objectid;
SELECT @indexname = QUOTENAME(name)
FROM sys.indexes
WHERE object_id = @objectid AND index_id = @indexid;
SELECT @partitioncount = count (*)
FROM sys.partitions
WHERE object_id = @objectid AND index_id = @indexid;
-- 30 is the decision point at which to switch
-- between reorganizing and rebuilding
IF @frag < 30.0
BEGIN;
SELECT @command = 'ALTER INDEX ' + @indexname + ' ON '
+ @schemaname + '.' + @objectname + ' REORGANIZE';
IF @partitioncount > 1
SELECT @command = @command + ' PARTITION='
+ CONVERT (CHAR, @partitionnum);
EXEC (@command);
END;
IF @frag >= 30.0
BEGIN;
SELECT @command = 'ALTER INDEX ' + @indexname +' ON ' + @schemaname
+ '.' + @objectname + ' REBUILD';
IF @partitioncount > 1
SELECT @command = @command + ' PARTITION='
+ CONVERT (CHAR, @partitionnum);
EXEC (@command);
END;
PRINT 'Executed ' + @command;
FETCH NEXT FROM partitions INTO @objectid, @indexid, @partitionnum, @frag;
END;
-- Close and deallocate the cursor.
CLOSE partitions;
DEALLOCATE partitions;
-- drop the temporary table
IF EXISTS (SELECT name FROM sys.objects WHERE name = 'work')
DROP TABLE work;
GO
CPU – (Historically Low)
When the CPU on a box that has historically been low becomes pegged, there is a lot of hope because this is usually the easiest fix. It is likely you have a run away query. The first thing you want to do is run sp_who2 (or sp_who3) and look for connections that have a SPID > 50 with abnormally high CPU time. The CPU time shown on sp_who2 is cumulative for the entire connection, so keep in mind if the connection has been open for days, the high CPU time might be plausible. Once you find a suspect connection, run the following command to see what is executing (sp_who3 already provides this):
DECLARE @SPID int
SET @SPID = yourspidhere
DBCC INPUTBUFFER(@SPID)
The input buffer will show you the command that is being executed. If it is determined to be rougue, kill it:
DECLARE @SPID int
SET @SPID = yourspidhere
kill @SPID
Another slowdown that may occur is Random SQL Server Execution plan hanging. To determine this, when looking at the CPU utilization for each session, take note of the amount of Disk IO being used. If you notice high processor utilization and very low IO utilization, then it is likely due to a spinning execution plan.
High CPU, little or no IO
This is a problem that has cropped up since SQL Server 2005. It is a small price to be paid for a smarter optimizer. The easiest way I have found to fix this is to simply rewrite the query. You can find more information here (Random SQL Server Execution plan hanging).
Random SQL Server Execution plan hanging:
This post deals with a random hanging that sometimes happens with SQL Server 2005+. In order to troubleshoot SQL Server Slowness, go here.
The introduction of the new SQL Server 2005 Query Optimization engine has brought great things (including statement-level caching and smarter execution plan generation). There is however a little more overhead with the advent of this new technology. Aside from taking longer to generate an execution plan, I have noticed two separate instances where a query would appear to intermittently hang. From a database engine perspective however, the query is not hanging but generating an execution plan.
The two instances I’ve witnessed this in were in both SQL Server 2005 (sp2), and also now in SQL Server 2008. Both procedures were relatively small, however somewhat complex in their where clauses.
The physical indicators in both instances where very high CPU usage and very low IO usage for one particular SPID which we gathered by executing sp_who2.
High CPU, little or no IO
Here is an example of the query that hung on SQL Server 2008:
SET @IsTrue =
(
SELECT COUNT(1)
FROM dbo.table1 t1
JOIN dbo.table2 t2
ON t1.ColumnID = t2.ColumnID
WHERE (
t2.ID = @ID
AND t2.SomeDate < GETDATE()
)
OR EXISTS
(
SELECT 1
FROM dbo.Table3
WHERE ID = @ID
AND AlternateID IN
(1,2,3,4,6)
)
)
Once the query would hang in generating the execution plan, the effect seemed to snowball and cause other executing instances of the same procedure to hang. Also, no blocking was occuring in the database, and these SPIDs did not show as ‘Runnable’ in sp_who2.
The fix we found was to make the query easier to digest for the optimizer. First, the GETDATE() within the where clause possibly made the query too non-deterministic in order for the optimizer to cache it’s execution plan. So we replaced the GETDATE() with a variable and set the variable to GETDATE(). This likely allowed the optimizer to ‘sniff’ the variable and determine a possible range of values in which to base an execution plan upon. Second, we removed the OR in the where clause and UNION’ed the two statements together. These two changes fixed our issue.
DECLARE @Today datetime = GETDATE()
SET @IsTrue =
(
SELECT SUM(t.Flag)
FROM(
SELECT COUNT(1) AS Flag
FROM dbo.table1 t1
JOIN dbo.table1 t1
ON t1.ColumnID = t2.ColumnID
WHERE t2.SomeDate < @Today
UNION
SELECT 1 AS Flag
FROM dbo.Table3
WHERE ID = @ID
AND AlternateID IN
(1,2,3,4,6)
) t
)
CPU – (Historically Medium/High)
For a CPU that has historically been high, the most common issue is a lack of indexes. This is an easy fix with the new DMVs introduced in SQL Server 2005. Usually adding indexes for the top 5 offending queries will resolve this issue. Follow this to Find Missing Indexes. Lack of indexes causes table scans which in turn eats up memory IO, disk IO and processor.
Find Missing Indexes :
This query will show the indexes that are missing ordered according to those having the most impact. It will also provide the create index script needed in order to help you create the index.
SELECT
mid.statement
,migs.avg_total_user_cost * (migs.avg_user_impact / 100.0) * (migs.user_seeks + migs.user_scans) AS improvement_measure,OBJECT_NAME(mid.Object_id),
'CREATE INDEX [missing_index_' + CONVERT (varchar, mig.index_group_handle) + '_' + CONVERT (varchar, mid.index_handle)
+ '_' + LEFT (PARSENAME(mid.statement, 1), 32) + ']'
+ ' ON ' + mid.statement
+ ' (' + ISNULL (mid.equality_columns,'')
+ CASE WHEN mid.equality_columns IS NOT NULL AND mid.inequality_columns IS NOT NULL THEN ',' ELSE '' END
+ ISNULL (mid.inequality_columns, '')
+ ')'
+ ISNULL (' INCLUDE (' + mid.included_columns + ')', '') AS create_index_statement,
migs.*, mid.database_id, mid.[object_id]
FROM sys.dm_db_missing_index_groups mig
INNER JOIN sys.dm_db_missing_index_group_stats migs ON migs.group_handle = mig.index_group_handle
INNER JOIN sys.dm_db_missing_index_details mid ON mig.index_handle = mid.index_handle
WHERE migs.avg_total_user_cost * (migs.avg_user_impact / 100.0) * (migs.user_seeks + migs.user_scans) > 10
ORDER BY migs.avg_total_user_cost * migs.avg_user_impact * (migs.user_seeks + migs.user_scans) DESC
What you need to know about this script however is what it does not account for. It does not account for an index that should be clustered. One of the warning signs that an index should be clustered is when this query suggests to you an index that contains a lot of columns (or has a lot of include columns). It is suggesting that, because it does not want to do a bookmark lookup to get the columns it needs for the select list. In those cases, you want to compare the number of user_seeks this query tells you, with the number that is shown in the Index Usage DMV. If this query’s seeks are higher, then consider making this the clustered index.
Also, once the queries are created, you can check to see if your assumption was correct by using the Clustered Index Checker DMV Script.
Memory
Likely the most important component of SQL Server is enough memory. With enough RAM, you can mitigate the number of times your disk needs to be touched. Ideally, the data that is queried often should always be in RAM. In order to accomplish this, enough RAM should be provided to ensure the Page Life Expectancy (PLE) remains over 300.
To verify the PLE:
Open perfmon Select the Performance Object of MSSQLServer:Buffer Manager Highlight “Page Life Expectancy” and click Add.
The counter should remain over 300 seconds. This is the average amount of time SQL Server has estimated that each data page will be able to stay resident in memory until another process forces it out. If you see the PLE between zero and 100, then you definitely have a bottleneck and you need to Find the memory usage of the currently executing queries. If the query is run on a schedule and you weren’t able to catch it then you can Find the queries that have historically taken the most memory.
Additional information that may help is identifying the database taking the most memory
Find the memory usage of the currently executing queries :
To get the query memory usage of currently executing queries run the following:
SELECT
TEXT
,query_plan
,requested_memory_kb
,granted_memory_kb
,used_memory_kb
FROM sys.dm_exec_query_memory_grants emg
CROSS APPLY sys.dm_exec_sql_text(sql_handle)
CROSS APPLY sys.dm_exec_query_plan(emg.plan_handle)
ORDER BY emg.requested_memory_kb DESC
Find the queries that have historically taken the most memory :
This query returns back the queries that use the most IO. This can mean that either the query is reading from disk more than usual or occupying and utilizing a large amount of buffer cache. These are typical symptoms of queries that do not have the proper indexes or queries that simply read a lot of data.
/**********************************************************
* top procedures memory consumption per execution
* (this will show mostly reports & jobs)
***********************************************************/
SELECT TOP 100 *
FROM
(
SELECT
DatabaseName = DB_NAME(qt.dbid)
,ObjectName = OBJECT_SCHEMA_NAME(qt.objectid,dbid) + '.' + OBJECT_NAME(qt.objectid, qt.dbid)
,DiskReads = SUM(qs.total_physical_reads) -- The worst reads, disk reads
,MemoryReads = SUM(qs.total_logical_reads) --Logical Reads are memory reads
,Executions = SUM(qs.execution_count)
,IO_Per_Execution = SUM((qs.total_physical_reads + qs.total_logical_reads) / qs.execution_count)
,CPUTime = SUM(qs.total_worker_time)
,DiskWaitAndCPUTime = SUM(qs.total_elapsed_time)
,MemoryWrites = SUM(qs.max_logical_writes)
,DateLastExecuted = MAX(qs.last_execution_time)
FROM sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS qt
GROUP BY DB_NAME(qt.dbid), OBJECT_SCHEMA_NAME(qt.objectid,dbid) + '.' + OBJECT_NAME(qt.objectid, qt.dbid)
) T
ORDER BY IO_Per_Execution DESC
/**********************************************************
* top procedures memory consumption total
* (this will show more operational procedures)
***********************************************************/
SELECT TOP 100 *
FROM
(
SELECT
DatabaseName = DB_NAME(qt.dbid)
,ObjectName = OBJECT_SCHEMA_NAME(qt.objectid,dbid) + '.' + OBJECT_NAME(qt.objectid, qt.dbid)
,DiskReads = SUM(qs.total_physical_reads) -- The worst reads, disk reads
,MemoryReads = SUM(qs.total_logical_reads) --Logical Reads are memory reads
,Total_IO_Reads = SUM(qs.total_physical_reads + qs.total_logical_reads)
,Executions = SUM(qs.execution_count)
,IO_Per_Execution = SUM((qs.total_physical_reads + qs.total_logical_reads) / qs.execution_count)
,CPUTime = SUM(qs.total_worker_time)
,DiskWaitAndCPUTime = SUM(qs.total_elapsed_time)
,MemoryWrites = SUM(qs.max_logical_writes)
,DateLastExecuted = MAX(qs.last_execution_time)
FROM sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS qt
GROUP BY DB_NAME(qt.dbid), OBJECT_SCHEMA_NAME(qt.objectid,dbid) + '.' + OBJECT_NAME(qt.objectid, qt.dbid)
) T
ORDER BY Total_IO_Reads DESC
/**********************************************************
* top adhoc queries memory consumption total
***********************************************************/
SELECT TOP 100 *
FROM
(
SELECT
DatabaseName = DB_NAME(qt.dbid)
,QueryText = qt.text
,DiskReads = SUM(qs.total_physical_reads) -- The worst reads, disk reads
,MemoryReads = SUM(qs.total_logical_reads) --Logical Reads are memory reads
,Total_IO_Reads = SUM(qs.total_physical_reads + qs.total_logical_reads)
,Executions = SUM(qs.execution_count)
,IO_Per_Execution = SUM((qs.total_physical_reads + qs.total_logical_reads) / qs.execution_count)
,CPUTime = SUM(qs.total_worker_time)
,DiskWaitAndCPUTime = SUM(qs.total_elapsed_time)
,MemoryWrites = SUM(qs.max_logical_writes)
,DateLastExecuted = MAX(qs.last_execution_time)
FROM sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS qt
WHERE OBJECT_SCHEMA_NAME(qt.objectid,dbid) + '.' + OBJECT_NAME(qt.objectid, qt.dbid) IS NULL
GROUP BY DB_NAME(qt.dbid), qt.text, OBJECT_SCHEMA_NAME(qt.objectid,dbid) + '.' + OBJECT_NAME(qt.objectid, qt.dbid)
) T
ORDER BY Total_IO_Reads DESC
/**********************************************************
* top adhoc queries memory consumption per execution
***********************************************************/
SELECT TOP 100 *
FROM
(
SELECT
DatabaseName = DB_NAME(qt.dbid)
,QueryText = qt.text
,DiskReads = SUM(qs.total_physical_reads) -- The worst reads, disk reads
,MemoryReads = SUM(qs.total_logical_reads) --Logical Reads are memory reads
,Total_IO_Reads = SUM(qs.total_physical_reads + qs.total_logical_reads)
,Executions = SUM(qs.execution_count)
,IO_Per_Execution = SUM((qs.total_physical_reads + qs.total_logical_reads) / qs.execution_count)
,CPUTime = SUM(qs.total_worker_time)
,DiskWaitAndCPUTime = SUM(qs.total_elapsed_time)
,MemoryWrites = SUM(qs.max_logical_writes)
,DateLastExecuted = MAX(qs.last_execution_time)
FROM sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS qt
WHERE OBJECT_SCHEMA_NAME(qt.objectid,dbid) + '.' + OBJECT_NAME(qt.objectid, qt.dbid) IS NULL
GROUP BY DB_NAME(qt.dbid), qt.text, OBJECT_SCHEMA_NAME(qt.objectid,dbid) + '.' + OBJECT_NAME(qt.objectid, qt.dbid)
) T
ORDER BY IO_Per_Execution DESC
identifying the database taking the most memory :
This query shows how many pages each database has in the buffer Pool. This will show you the breakdown of memory allocation for each database.
SELECT
DB_NAME(database_id),
COUNT(page_id)as number_pages
FROM sys.dm_os_buffer_descriptors
WHERE database_id !=32767
GROUP BY database_id
ORDER BY database_id
Disk
The biggest scapegoat for slowness is always the disk. (Yeah I said it). Yes, it is true that more often than not disks do not perform according to the ideal specifications. But aside from checkpoints and large data loads, it shouldn’t matter that much. Most customer facing OLTP systems I have worked with have been mostly read intensive. In almost all instances where the disk counters were indicative of under performing, it was a lack of both memory and indexing that contributed disk high disk queue length, thus hosing the disk. I would like to initially steer you clear of looking at the disk as being the problem.
Now, with that being said, here are the conditions where you will see disk bottlenecking:
1.In almost all cases, the CPU % is very low because it is waiting for the disk.
2.The server is write intensive, usually a data warehouse or a subscriber to a large replication.
3.Import processes are written using full truncates and full inserts rather than delta inserts.
4.Delete is used where truncate could be an option
Look here to help determine expensive IO read queries and determine expensive IO write queries
The only plausible instance where I needed management to change the disk was when IT setup a RAID 6 for my data drive.
So, if your CPU is low (less than 30%), and you verified the Page Life Expectancy (> 300) and cleaned the indexes, go ahead and blame the disk. But first, verify in perfmon under Performance Object -> Physical Disk:
1.% Idle Time < 30%
2.Avg Disk/Sec Transfer > .100
3.Disk Queue Length < 20
Those are the only ones I typically use because most of the times Windows is assuming it is looking at a local disk, when most corporate SQL Servers run off a SAN. Also, to get accurate results, the interval for perfmon should be under 5 seconds.
determine expensive IO read queries :
SELECT TOP 10
'Procedure' = qt.text
,DiskReads = qs.total_physical_reads -- The worst reads, disk reads
,MemoryReads = qs.total_logical_reads --Logical Reads are memory reads
,Executions = qs.execution_count
,CPUTime = qs.total_worker_time
,DiskWaitAndCPUTime = qs.total_elapsed_time
,MemoryWrites = qs.max_logical_writes
,DateCached = qs.creation_time
,DatabaseName = DB_Name(qt.dbid)
,LastExecutionTime = qs.last_execution_time
FROM sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS qt
ORDER BY qs.max_logical_reads DESC
determine expensive IO write queries :
SELECT TOP 10
'Procedure' = qt.text
,DiskReads = qs.total_physical_reads -- The worst reads, disk reads
,MemoryReads = qs.total_logical_reads --Logical Reads are memory reads
,Executions = qs.execution_count
,CPUTime = qs.total_worker_time
,DiskWaitAndCPUTime = qs.total_elapsed_time
,MemoryWrites = qs.max_logical_writes
,DateCached = qs.creation_time
,DatabaseName = DB_Name(qt.dbid)
,LastExecutionTime = qs.last_execution_time
FROM sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS qt
ORDER BY qs.max_logical_writes DESC
Paging
This is an issue that has cropped up with the advent of 64bit OS’s. Windows Server 2003 changed it’s priority for non OS processes. This severely impacted SQL Server and created a heck of a time for SQL Server support. What happens, is when the OS feels it is experiencing external memory pressure, it begins choosing victims to steal their memory. In which cases non system level process will find their working set swapped to the paging file. The most obvious indication of this is the amount of page file usage. It should preferably be under 8%. I have seen some SQL Servers run ok on 10% however this is pushing it.
To determine the page file usage open perfmon and select the Paging File performance object. Select both % Usage Peak and %Usage. Usage Peak is the peak amount that has how much has been used since the server last rebooted.
Paging File
To fix a paging file gone bad, unfortunately either requires a reboot, or sometimes you can get away with restarting SQL Server. The most important issue is why did it happen in the first place. Ideally, you want to provide the OS with enough room to breathe. And with 64-bit OS’s, this usually means a lot more memory than you are use to providing that for 32-bit. I only feel comfortable if my dedicated SQL Server box has a full 8 GB Free. Yes, that’s correct, 8 GIGA-BYTES. I know it sounds like a lot, however that is a valid threshold we determined with a SQL Server 2005 box that had trimming issues. (years from now people will be laughing at this post)
As for setup, it is recommended that (even on 64-bit OS’s) that you turn on AWE. I went around and around with this until I ended up speaking with someone from the highest level support at SQL Server Support. This person had access to the developers who looked up source code in helping with our issue. I explained to him that it doesn’t seem like we need AWE turned on because 64-bit can map the memory itself. He replied, I know, but just turn it on. And that was enough for me.
Some other things to keep in mind, are large data transfers. DO NOT use your SQL Server box to copy or move files across the network. Instead have another box come and get the file. Do not waste your precious memory on file operations. Here is a Microsoft KB: How to reduce paging of buffer pool memory in the 64-bit version of SQL Server
Process Trimming
This is very much related to paging and some of the same symptoms can be seen. What we were referring to regarding paging in the section above was Buffer Pool paging. Process trimming refers to working set paging. The difference however is subtle. It requires an explanation between the difference of “Private Bytes” and “Working Set”. The working set is the amount of memory the OS has granted to a process for usage. According to the process, it has the entire working set as it’s playground. However, the actual amount of physical memory it has is the private bytes which is a lower amount. The difference between the working set and the private bytes can be found in the glorious paging file. So if you notice a large disparity between private bytes and the working set, you are experiencing something known as process trimming. I hope you never have this issue, and you should not have this issue if you give the OS enough room to breathe. There are plenty of articles that delve deep into this subject so I will leave the experts to explain the specifics. The point is, give the OS enough room to breathe. Here is a performance team article on the subject: SQL and the Working Set
this article is refered from the below link
REF : http://sqlserverplanet.com/troubleshooting/sql-server-slowness
No comments:
Post a Comment