Thursday, March 22, 2012

Couldn't establish trusted connections after SQL memory problems

Has anyone seen anything like this before? One of our production servers
lost the ability to establish any trusted connections last night from 5:47
until we rebooted it this morning. From that point on, all attempts by SQL
jobs running under domain ids on the system, and by any BizTalk processes
trying to connect, failed with the message "Login failed for user '(null)'.
Reason: Not associated with a trusted SQL Server connection.". The couple
of jobs that we have running under standard SQL logins (other than sa) had
no problems. None of our other servers had problems and the network looked
OK, so I don't think it was a domain or network connectivity problem. The
reboot resolved the problem, so it doesn't look like anything related to
the domain userids themselves had changed; SQL just suddenly stopped being
able to use them.
Looking in the SQL Server log I see the following messages logged (all from
the same spid) at the time all this started. I'm interpreting these as
meaning the server ran out of memory, but I cannot think of any reason why
this would have caused trusted connections to fail from that point on.
BTW - we have the latest service packs and patches (ver 8.00.818)
installed. Any ideas? Has anyone else ever seen anything like this before?
Query Memory Manager: Grants=0 Waiting=0 Maximum=150632 Available=150632
Global Memory Objects: Resource=2705 Locks=93
SQLCache=102 Replication=2
LockBytes=2 ServerGlobal=45
Xact=215
Dynamic Memory Manager: Stolen=9211 OS Reserved=16632
OS Committed=16593
OS In Use=12841
Query Plan=3349 Optimizer=0
General=14718
Utilities=13 Connection=3870
Procedure Cache: TotalProcs=922 TotalPages=3400 InUsePages=1759
Buffer Counts: Commited=208408 Target=208408 Hashed=198709
InternalReservation=201 ExternalReservation=0 Min Free=128
Buffer Distribution: Stolen=5811 Free=488 Procedures=3400
Inram=0 Dirty=46274 Kept=0
I/O=0, Latched=35, Other=152400
WARNING: Failed to reserve contiguous memory of Size= 65536.
|||
The message
WARNING: Failed to reserve contiguous memory of Size= 65536.
shows that you have some memory pressure in the MemToLeave area of SQL
Server's address space.
Here are a couple of things regarding this error. Try troubleshooting it
with the suggestions below; otherwise you might want to call into PSS and
open up a support case.
There are two main areas of memory within SQL Server's address space, the
buffer pool (BPool) and a second memory pool sometimes called the
"MemToLeave" area. The contents of the SQL Server buffer pool include
cached table data, workspace memory used during query execution for
in-memory sorts or hashes, most cached stored procedure and query plans,
memory for locks and other internal structures, and the majority of other
miscellaneous memory needs of the SQL Server. SQL Server 7.0 introduced
dynamic memory management to the buffer pool, which means that the amount
of memory under SQL Server's direct control may grow and shrink in response
to internal SQL Server needs and external memory pressure from other
applications. It is normal for the size of the SQL Server buffer pool to
increase over time until most memory on the server is consumed. This
design can give the false appearance of a memory leak in the SQL Server
buffer pool when operating under normal circumstances. For more detailed
information please reference the Books Online articles "Server Memory
Options", "Memory Architecture", and (SQL Server 2000 only) "Effects of min
and max server memory".
The other significant memory area is sometimes called MemToLeave, and it is
primarily used by non-SQL Server code that happens to be executing within
the SQL Server process. The MemToLeave area is memory that is left
unallocated and unreserved, primarily for code that is not part of the core
SQL Server and therefore does not know how to access memory in the SQL
Server buffer pool. Some examples of components that may use this memory
include extended stored procedures, OLE Automation/COM objects, linked
server OLEDB providers and ODBC drivers, MAPI components used by SQLMail,
and thread stacks (one-half MB per thread). This does not just include the
EXE and .DLL binary images for these components; any memory allocated at
runtime by the components listed above will also be allocated from the
MemToLeave area. Non-SQL Server code makes its memory allocation requests
directly from the OS, not from the SQL Server buffer pool. The entire SQL
Server buffer pool is reserved at server startup, so any requests for
memory made directly from the operating system must be satisfied from the
MemToLeave area, which is the only source of unreserved memory in the SQL
Server address space. SQL Server itself also uses the MemToLeave memory
area for certain allocations; for example, SQL Server 7.0 stores procedure
plans in the MemToLeave area if they are too large for a single 8KB buffer
pool page.
How to Determine Whether the Error Points to a Memory Shortage in Buffer
Pool or MemToLeave
If the 17803 error in the SQL Server errorlog is accompanied by one of the
following error messages, the memory pressure is most likely in the
MemToLeave area (see section "Troubleshooting MemToLeave Memory Pressure"
below). If the messages below do not appear with the 17803, start from
section "Troubleshooting Buffer Pool Memory Pressure".
Errors that imply insufficient contiguous free memory in the MemToLeave
area:
WARNING: Failed to reserve contiguous memory of Size= 65536.
WARNING: Clearing procedure cache to free contiguous memory.
Error: 17802, Severity: 18, State: 3 Could not create server event
thread.
SQL Server could not spawn process_loginread thread
Server: Msg 7399, Level 16, State 1, Line 1
OLE DB provider 'SQLOLEDB' reported an error. The provider ran out of
memory.
Troubleshooting MemToLeave Memory Pressure
If the 17803 and associated errors point to a shortage of memory in
MemToLeave, the items below list some of the most common causes of this
type of problem and suggest ways to alleviate the issue.
- If the server runs many concurrent linked server queries against SQL
Servers using SQLOLEDB or MSDASQL, upgrade to MDAC 2.5 or MDAC 2.6 on
the server. The MDAC 2.5 and 2.6 versions of SQLOLEDB are more
conservative with their initial memory allocations.
Perhaps the most common cause of out-of-memory conditions in the MemToLeave
area is a memory leak in a non-SQL Server component running inside the SQL
address space. Items to check:
- Check the errorlog for messages like "Using 'abc.dll' version '123' to
execute extended stored procedure 'xp_abc'." If the DLL referenced in this
message is not a Microsoft-provided DLL, you may want to consider moving it
out of the production server's address space.
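If you want to scan the errorlog for those messages from T-SQL instead of
opening the file, the undocumented xp_readerrorlog extended procedure can
be used; a minimal sketch, assuming the two-column result shape it returns
on SQL Server 2000:
-- capture the current errorlog (parameter 0 = current file)
CREATE TABLE #errlog (LogText nvarchar(4000), ContinuationRow int)
INSERT #errlog EXEC master..xp_readerrorlog 0
-- list every DLL loaded to run an extended stored procedure
SELECT LogText FROM #errlog
WHERE LogText LIKE '%Using ''%.dll'' version%'
DROP TABLE #errlog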
- Determine whether COM objects are being executed inside SQL Server's
address space with the sp_OA stored procedures. If sp_OA is in use you
will see ODSOLE70.DLL (which hosts sp_OACreate) being loaded, and the
following message in the errorlog:
Using 'odsole70.dll' version '2000.80.382' to execute extended stored
procedure 'sp_OACreate'.
If the sp_OA stored procedures are being used, ensure that the COM objects
are being loaded out of process by passing 4 as the optional third
parameter to sp_OACreate (e.g. "EXEC sp_OACreate 'SQLDMO.SQLServer', @obj
OUTPUT, 4").
- If linked servers using third-party OLEDB providers or ODBC drivers are
in use, these are also a possible cause of memory leaks. As a
troubleshooting step, evaluate whether the linked servers can be disabled
for a time to see whether this stops the leak, or check whether the
provider is still fully functional once it has been configured to run out
of process by setting the AllowInProcess registry value for the provider
to 0 (the AllowInProcess value can be found under
HKLM\Software\Microsoft\MSSQLServer\Providers\[ProviderName], or under the
MSSQL$instance key for a named instance).
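If you prefer to flip that value from T-SQL rather than regedit, the
undocumented xp_regwrite extended procedure can do it; a hedged sketch for
a default instance, using MSDASQL as a stand-in provider name:
-- undocumented; back up the key and test outside production first
EXEC master..xp_regwrite
    @rootkey    = 'HKEY_LOCAL_MACHINE',
    @key        = 'SOFTWARE\Microsoft\MSSQLServer\Providers\MSDASQL',
    @value_name = 'AllowInProcess',
    @type       = 'REG_DWORD',
    @value      = 0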
- If the server supports heavy linked server activity or must run
memory-hungry non-SQL Server code inside the SQL Server process, you
may simply need to adjust the size of the MemToLeave area to make more
memory available to non-SQL Server memory consumers. "-g" is an
optional SQL Server startup parameter that can be used to increase the
size of the MemToLeave area. The default -g memory size is 128MB in SQL
Server 7.0 and 256MB in SQL Server 2000. You can increase the size of
the MemToLeave area by an additional 128MB by adding -g256 (SQL 7.0) or
-g384 (SQL 2000) as a server startup parameter. This setting will take
effect the next time the SQL Server service is started. Startup
parameters are added in the "General" tab of the Server Properties
dialog in Enterprise Manager.
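To check which startup parameters are already configured without opening
Enterprise Manager, they can be read from the registry; a sketch using the
undocumented xp_regread, assuming a default instance (named instances live
under MSSQL$<instance>) and noting that the SQLArg index used for -g
depends on how many parameters are already defined:
-- SQLArg0..SQLArg2 are normally -d, -e and -l; extras follow from SQLArg3
EXEC master..xp_regread
    @rootkey    = 'HKEY_LOCAL_MACHINE',
    @key        = 'SOFTWARE\Microsoft\MSSQLServer\MSSQLServer\Parameters',
    @value_name = 'SQLArg3'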
Troubleshooting Buffer Pool Memory Pressure
Because of SQL Server's dynamic memory management, it is not unusual for a
significant leak in the MemToLeave area to initially manifest itself as a
shortage of buffers in the buffer pool, because SQL Server will dynamically
scale down the size of the buffer pool in response to the growing number of
bytes committed within the MemToLeave area. Similarly, a memory-hungry or
leaking application running on the same box can cause SQL Server to release
almost all of its BPool memory, leading to a 17803. To rule out these two
alternatives and confirm that the problem is confined to BPool, start by
looking at the Performance Monitor log you collected.
If the counter "Process(sqlservr):Working Set" was much lower than the
amount of physical RAM on the server while the insufficient memory errors
were occurring, look for another process that holds most of the memory and
pursue that process as the root of the memory pressure.
If "Process(sqlservr):Working Set" accounts for most or nearly all of the
physical RAM on the server but if "SQL Server:Memory Manager:Target Server
Memory(KB)" is only a fraction of this amount of memory, the root cause of
the problem is likely a leak in MemToLeave that had the side effect of
causing skrinkage of BPool. In this case follow the suggestions in the
previous section "Troubleshooting MemToLeave Memory Pressure".
If the root cause of the problem is a leak in or excessive demand for BPool
memory, the following should generally be true:
- Buffer pool should have already been grown to its maximum size. In
other words, "SQL Server:Memory Manager:Total Server Memory(KB)" should
be equal to "SQL Server:Memory Manager:Target Server Memory(KB)".
- Buffer pool should consume the majority of physical RAM allocated to
the SQL process ("SQL Server:Memory Manager:Target Server Memory(KB)"
should account for the majority of "Process(sqlservr):Working Set".)
- There should be high lazywriter activity ("SQLServer:Buffer Manager -
Lazy Writer Buffers/sec"). If the problem appears to be BPool memory
pressure and you see no lazywriter activity, something may be blocking
lazywriter. Consider getting one or more DBCC STACKDUMPs while in this
state.
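If no Performance Monitor log was collected, the same Memory Manager
counters can be sampled from T-SQL through the sysperfinfo system table; a
minimal sketch (counter-name spacing in sysperfinfo may differ slightly
from the perfmon names):
-- compare Total vs Target: if Total is well below Target, something
-- outside the buffer pool is holding memory back
SELECT counter_name, cntr_value
FROM master.dbo.sysperfinfo
WHERE object_name LIKE '%Memory Manager%'
  AND counter_name LIKE '%Server Memory%'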
If you have determined that the problem is a leak or excessive demand for
buffer pool memory, use perfmon to determine what is consuming the most
buffer pages. Counters to examine include:
- "SQLServer:Buffer Manager - Cache Size (pages)" (procedure cache)
- "SQLServer:Cache Manager - Cache Pages(Adhoc/Cursor/Stored Proc/etc)"
- "SQLServer:Memory Manager - Granted Workspace Memory (KB)" (query
memory)
- "SQLServer:Memory Manager - SQL Cache Memory (KB)" (cached data pages)
- "SQLServer:Memory Manager - Lock Memory (KB)"
- "SQLServer:Memory Manager - Optimizer Memory (KB)"
- "SQLServer:Buffer Manager - Stolen Pages"
Hope that helps!
Vikram Jayaram
Microsoft, SQL Server
This posting is provided "AS IS" with no warranties, and confers no rights.
Subscribe to MSDN & use http://msdn.microsoft.com/newsgroups.
|||
SQL Server caches connection information in the MemToLeave region. If
that was the process that was attempting to make a memory reservation in
this region, then that would account for your errors.
What is happening to cause this is too many external memory processes
executing on the server. Although SQL Server has wonderful dynamic memory
managers for its internal processes, it has to appeal to the OS to manage
the external reservation calls. Over time, the MemToLeave region gets
fragmented, and the only fix is to clear those segments. That means at
least a recycling of the SQL Server services, if not a reboot of the
system.
You can counterbalance this by reducing the number of external process
calls and/or adjusting the amount of memory left to the MemToLeave region
using the -g startup parameter. However, the sizing of this parameter
needs to be adjusted with the assistance of the PSS staff.
A good first step, however, would be to increase this parameter from its
default of 256 to 384; the value represents MB removed from the buffer
pool allocation, in addition to what is reserved for the UMS worker thread
stacks (about 128 MB with the default 255-thread configuration).
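As a rough sanity check of that arithmetic, the worker thread contribution
can be confirmed from the instance itself; a small sketch with the sizing
shown as comments:
-- MemToLeave total = -g value + (max worker threads * 0.5 MB stack)
-- SQL 2000 defaults: 256 MB + (255 * 0.5 MB) = ~384 MB
EXEC sp_configure 'max worker threads'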
Sincerely,
Anthony Thomas
"Jeff Turlington" <Jeff Turlington@.discussions.microsoft.com> wrote in
message news:D9783602-FAD2-4730-BB96-FB7CD2438605@.microsoft.com...
