Event Type: Information
Event Source: MSSQLSERVER
Event Category: (2)
Event ID: 17883
Date: 12/17/2009
Time: 12:21:33 AM
User: N/A
Computer: XXXXXXXXX
Description:
Process 228:0:0 (0x8d0) Worker 0x00000003A92981C0 appears to be non-yielding on Scheduler 3. Thread creation time: 12905499755717. Approx Thread CPU Used: kernel 46 ms, user 1968 ms. Process Utilization 0%%. System Idle 99%%. Interval: 2719241 ms.
And:
Event Type: InformationAll with corresponding errors in the SQL log.
Event Source: MSSQLSERVER
Event Category: (2)
Event ID: 17884
Date: 12/17/2009
Time: 1:32:51 AM
User: N/A
Computer: XXXXXXXXX
Description:
New queries assigned to process on Node 0 have not been picked up by a worker thread in the last 6960 seconds. Blocking or long-running queries can contribute to this condition, and may degrade client response time. Use the "max worker threads" configuration option to increase number of allowable threads, or optimize current running queries. SQL Process Utilization: 0%%. System Idle: 99%%.
When this issue arises, most of the time the SQL service will not stop, it simply hangs and times out until the entire server is rebooted which is understandable given the issue with the spin locks and effect on schedulers; although if you are able to connect to the server (usually through DAC) and you recognize this issue soon enough, you can use the shutdown command to bounce the service.
After the normal research routines and opening a case with MS, we were informed that this is a bug initially identified in SQL 2005 pre SP1 (build 2039), however we did not see it until we applied CU 6 for SP3 (build 4226). The issue was identified as spin-lock contention on OPT_IDX_STATS as SQL tries to update the stats feeding sys.db_index_usage_stats for Tempdb and the fix/ workaround provided to us was to enable trace flag 2330 as a startup parameter and seeing as how this is an undocumented trace flag, I thought I would share what we found and subsequently confirmed with the support team.
Initially, we were told that TF2330 would only disable stats collection for objects within tempdb, however we noticed that our index usage stats for all databases were not being updated so we followed up with Support and they confirmed that this was the case; actual response is below:
This regards your case SR # 0000000. We checked the source code and found that TF 2330 just disables collection of index usage statistics to feed data for sys.db_index_usage_stats DMV. So after you enable this TF and restart the server, you should see empty rows in this DMV. I tested this and did not see any rows for this DMV. This data also feeds into the Missing Indexes DMV suggestions.According to MS Support, there are no plans to fix this in SQL Server 2005 as it was supposed to have been corrected in SQL 2008.
For example: sys.dm_db_missing_index_group_stats will also rely on this info and hence will not be available. I tested the sys.dm_db_missing_index_group_stats and confirmed as well. Can you trying creating the missing indexes and see if that helps without enabling the TF2330?
In summary these are the two dmvs that are being affected:
sys.db_index_usage_stats
sys.dm_db_missing_index_group_stats
The server info (identical for all for all affected servers) is:
Microsoft SQL Server 2005 - 9.00.4226.00 (X64)
Developer Edition (64-bit) on Windows NT 5.2 (Build 3790: Service Pack 2)
Update - 12/24/2009: Since I sat on this for a few months before posting, there has been documentation from MS and another recent blog posting here: http://www.pythian.com/news/5211/sql-server-more-light-shed-on-non-yielding-scheduler-and-indexes-stats/