Tuesday, March 20, 2012

Could someone please explain the following...

As previous threads have indicated I've been trying to figure out what's slowing down our 3rd party ETL tool.

My research has figured out the culprit, but now I need some advice on the best way to handle it.

Here's what's going on...
SPID 1 = dynamic sql passed from Crystal
SPID 2 = ETL tool
SPID 3 = report in stored proc

Until a few days ago we didn't even know SPID 1 existed. These are different guys, using old version of Crystal passing crappy queries.

Here's the scenario
SPID 1 is running (inefficient select)
SPID 2 is waiting
30 minutes pass and SPID 2 is still waiting PAGEIOLATCH_SH
SPID 3 fires off

Now SPID 2 is blocking SPID 3, but SPID 2 is still waiting for SPID 1

Finally when SPID 1 finishes, everything catches up. But it appears that there is a period of at least 45 minutes when all SPID's are just sleeping.

My questions:
1. If SPID 1 is just running an inefficient select query, why is it holding up SPID 2 (Which reads and inserts). SPID 1 should have a shared lock on table, this would prevent SPID 2 from insert/update?
2. Is there any steps I can take to give SPID 2 precendence? or is it first come first serve?

Thanks in advanceSince SPID 1 has a very inefficient query, it is probably doing a table scan. This will incur a shared lock on the entire table. With a shared lock on the table, SPID2 must wait to do any insert/update/delete. SPID 2 can read all day long, but any write activity must wait until his intent lock is upgraded to an exclusive lock. SPID 3 is waiting, because it will need to get its intent shared lock upgraded to a shared lock, and the intent exclusive lock (from SPID2) is preventing that upgrade.

Now, if SPID 3 wanted to cash in enough frequent flier miles, they could bring a companion along, but only in Economy class, unless they brought no luggage, in which case they could travel Business class. Alternatively, they could upgrade to first class, but only on Yak Shaving day.

Does that help?|||Yes. Given that SPID 1's are out of our control. How would you try to rectify? I don't think you can use WITH NOLOCK through older Crystal? But that would be perfect solution right?|||Well wait a minute.
I've seen where SPID 1 blocks SPID 3 as well. How could that be? (SPID 2 is not running at all) and both are just selects. Spid 3 has been indexed and optimized while 1 has not.

I'm using sp_blocker_pss80|||How can 2 selects block each other?|||What is the waittype and waitresource on SPID3, then? It is possible, that they could be waiting on an OAM page in tempdb (resource 2:1:2). In which case, there is a hot fix out for it.|||OK, when we say spid 1, it's just an example right, I mean it's gotta be 50+...

In any case...

KILL 1

That should solve the problem|||Ok Here's what I've got. I did a search on the 2:1:2 and didn't find anything in the output file.
Here's a small excerpt from the proc...
There's actually about 15 more of the Spid 70s.

What causes multiple instances of Spid 70 to show up? At this point in time it looks like nothing is going on. Just a bunch of waiting.
63 sleeping 70 1 TAB: 7:1177771253 []
70 sleeping 0 0 7:1:2233567
70 sleeping 0 0 7:1:1699288
70 sleeping 0 0 7:1:3235145
70 sleeping 0 0 7:1:2014717

DBCC SQLPERF(WAITSTATS)
Wait Type Requests Wait Time Signal Wait Time
---------- ----- ----- ------
MISCELLANEOUS 15791.0 0.0 0.0
LCK_M_SCH_S 0.0 0.0 0.0
LCK_M_SCH_M 6.0 2075406.0 0.0
LCK_M_S 24.0 72611.0 65095.0
LCK_M_U 1.0 60031.0 0.0
LCK_M_X 1163.0 445734.0 330.0
LCK_M_IS 24.0 9562923.0 0.0
LCK_M_IU 0.0 0.0 0.0
LCK_M_IX 3.0 31750.0 0.0
LCK_M_SIU 0.0 0.0 0.0
LCK_M_SIX 0.0 0.0 0.0
LCK_M_UIX 0.0 0.0 0.0
LCK_M_BU 0.0 0.0 0.0
LCK_M_RS_S 0.0 0.0 0.0
LCK_M_RS_U 0.0 0.0 0.0
LCK_M_RIn_NL 0.0 0.0 0.0
LCK_M_RIn_S 0.0 0.0 0.0
LCK_M_RIn_U 0.0 0.0 0.0
LCK_M_RIn_X 0.0 0.0 0.0
LCK_M_RX_S 0.0 0.0 0.0
LCK_M_RX_U 0.0 0.0 0.0
LCK_M_RX_X 0.0 0.0 0.0
SLEEP 2.3878072E+7 4.067754E+9 4.0649859E+9
IO_COMPLETION 2665064.0 2.8815246E+7 87694.0
ASYNC_IO_COMPLETION 38.0 2488516.0 0.0
RESOURCE_SEMAPHORE 52.0 747943.0 250123.0
DTC 0.0 0.0 0.0
OLEDB 1546014.0 1.5816713E+9 4.9219424E+7
FAILPOINT 0.0 0.0 0.0
RESOURCE_QUEUE 8648136.0 3.5166684E+9 4.0521631E+9
ASYNC_DISKPOOL_LOCK 471.0 0.0 0.0
UMS_THREAD 0.0 0.0 0.0
PIPELINE_INDEX_STAT 2.0 0.0 0.0
PIPELINE_LOG 0.0 0.0 0.0
PIPELINE_VLM 0.0 0.0 0.0
WRITELOG 709526.0 1.2246787E+7 26791.0
PSS_CHILD 0.0 0.0 0.0
EXCHANGE 7204.0 165437.0 1329.0
XCB 0.0 0.0 0.0
DBTABLE 0.0 0.0 0.0
EC 0.0 0.0 0.0
TEMPOBJ 0.0 0.0 0.0
XACTLOCKINFO 0.0 0.0 0.0
LOGMGR 0.0 0.0 0.0
CMEMTHREAD 754511.0 17897.0 11913.0
CXPACKET 4.5454468E+7 1.99537E+9 1.0079291E+7
PAGESUPP 59543.0 166747.0 1616.0
SHUTDOWN 0.0 0.0 0.0
WAITFOR 1821.0 5.4635088E+7 5.4635088E+7
CURSOR 0.0 0.0 0.0
EXECSYNC 20.0 0.0 0.0
LATCH_NL 0.0 0.0 0.0
LATCH_KP 0.0 0.0 0.0
LATCH_SH 1808.0 424271.0 46.0
LATCH_UP 26838.0 4159827.0 595.0
LATCH_EX 2.3463176E+7 1.0278586E+8 984947.0
LATCH_DT 0.0 0.0 0.0
PAGELATCH_NL 0.0 0.0 0.0
PAGELATCH_KP 340.0 0.0 0.0
PAGELATCH_SH 5.644284E+7 664660.0 274523.0
PAGELATCH_UP 2996847.0 2414701.0 66588.0
PAGELATCH_EX 4.1701784E+7 438060.0 174322.0
PAGELATCH_DT 0.0 0.0 0.0
PAGEIOLATCH_NL 0.0 0.0 0.0
PAGEIOLATCH_KP 0.0 0.0 0.0
PAGEIOLATCH_SH 4.9351192E+7 4.4433366E+8 792880.0
PAGEIOLATCH_UP 24098.0 3505965.0 632.0
PAGEIOLATCH_EX 6859974.0 3.7601116E+7 40328.0
PAGEIOLATCH_DT 0.0 0.0 0.0
TRAN_MARK_NL 0.0 0.0 0.0
TRAN_MARK_KP 0.0 0.0 0.0
TRAN_MARK_SH 0.0 0.0 0.0
TRAN_MARK_UP 0.0 0.0 0.0
TRAN_MARK_EX 0.0 0.0 0.0
TRAN_MARK_DT 0.0 0.0 0.0
NETWORKIO 6047450.0 1.2257828E+8 0.0
Total 2.706583E+8 1.1991903E+10 8.2338621E+9|||OK, when we say spid 1, it's just an example right, I mean it's gotta be 50+...

In any case...

KILL 1

That should solve the problem

Yes just an example. And I wish it could be that simple.|||to answer your question they are PAGEIOLATCH_SH wait types|||I think that only one person at a time can have a latch, so if one person is sitting on that latch, but stuck waiting for a mass update, then you could see this behaviour. I will have to go look that up in Kalen's book.

As for SPID 70, the most common cause is Parallelism. See if the waittype is CX_PACKET, or some such. The waittype should be 0x208.

No comments:

Post a Comment