SQL Server 2016 internal memory pressure leads to the plan cache clearing

Two weeks ago we migrated into SQL Server 2016 SP1 CU3 (previously our app was using SQL Server 2014 SP2). The application hasn't been upgraded or changed nor the workload has changed.

Since the migration, every few minutes up to more or less one hour, the proc cache is being flushed (not entirely, but still the majority of plans go away). If I run:

 SELECT count (*) FROM sys.dm_exec_cached_plans

...just after clearing happens, the number of plans drops to 1 up to 3 hundred and then gradually increase to more or less 2,000, then usually clearing happens again and so on. It worth to mention that Buffer Pool seems to stay intact and only caches are affected.

The server runs on VMWare, it has 128 GB of RAM (SQL Server max server memory is set to 102 GB, min. server memory is set to 72 GB). Based on the output from SentryOne, I can see that buffer pool consumes ~61 GB.

SQL Server memory usage

My SQL Server version is as follows:

Microsoft SQL Server 2016 (SP1-CU3) (KB4019916) - 13.0.4435.0 (X64)

It's a Standard Edition.

Finally, I came across an article by Jonathan Kehayias and I decided to check ring buffers and boom!, it turns out that I have notifications from resource monitors saying: 'low physical memory'. Occurrences of this notification fit perfectly to the proc cache clearing. Now the case is how to interpret these results and how to find the responsible process. As you can find in the query result:

     SELECT 
     EventTime,
     record.value('(/Record/ResourceMonitor/Notification)[1]', 'varchar(max)') as [Type],
     record.value('(/Record/ResourceMonitor/IndicatorsProcess)[1]', 'int') as [IndicatorsProcess],
     record.value('(/Record/ResourceMonitor/IndicatorsSystem)[1]', 'int') as [IndicatorsSystem],
     record.value('(/Record/MemoryRecord/AvailablePhysicalMemory)[1]', 'bigint') AS [Avail Phys Mem, Kb],
     record.value('(/Record/MemoryRecord/AvailableVirtualAddressSpace)[1]', 'bigint') AS [Avail VAS, Kb] FROM (
     SELECT
         DATEADD (ss, (-1 * ((cpu_ticks / CONVERT (float, ( cpu_ticks / ms_ticks ))) - [timestamp])/1000), GETDATE()) AS EventTime,
         CONVERT (xml, record) AS record
     FROM sys.dm_os_ring_buffers
     CROSS JOIN sys.dm_os_sys_info
     WHERE ring_buffer_type = 'RING_BUFFER_RESOURCE_MONITOR') AS tab ORDER BY EventTime DESC;

Ring buffer query results

we can observe 'low physical memory' flag although the value of Available Physical memory stays at the same level. Moreover, if I am not mistaken above results are indicating internal memory pressure (IndicatorsProcess = 2, which is weird to me as Sentry all the time shows that SQL Server doesn't fully utilize the allocated memory. This is memory usage captured by Sentry for a sample taken at 8 AM:

SQL Server memory usage

All lines are pretty flat. What is also weird to me is that group of events:

  • RESOURCE_MEMPHYSICAL_LOW

  • RESOURCE_MEM_STEADY

  • RESOURCE_MEMPHYSICAL_HIGH

happens at the same time. So this pressure takes milliseconds or less (perhaps this is also the reason why Sentry doesn't capture anything as it collects data a way less frequently).

I tried to find the reason behind this internal pressure and I checked top 10 memory clerks (in terms of memory consumption) to see if there are any heavy consumers there:

Memory clerks query results

but to be fair I don't see anything suspicious there.

Another thing worth to say is I haven't tried LPIM yet, as it requires SQL Server to be restarted, but even if it's the solution I would really like to understand why this issue happens. Moreover, please correct me if I am wrong, but as buffer pool seems not to be affected by trimming I don't really think LPIM is a solution here.

Now I am completely lost and I don't really know what else should I check in order to find the root cause of the issue. Perhaps I can utilize Sentry One in a better way to investigate this issue. I would really appreciate if some can help me to solve this puzzle.

sentrymemory.png (42.6 kB)
ringbuffer.png (173.9 kB)
sentrymemory-2.png (108.6 kB)
memory-clerks.png (39.1 kB)
avatar image By gopara 1 asked Aug 03 at 12:56 PM
more ▼
(comments are locked)
10|10000 characters needed characters left

1 answer: sort voted first

Eventually, LPIM made a trick and solved / masked the issue, but the root cause is still uncertain and it might be even a bug in SQL Server.

If anyone is interested in further findings please navigate to the discussion on MSDN forum: SQL Server 2016 memory pressure leads to the plan cache clearing

avatar image By gopara 1 answered Aug 10 at 06:39 AM
more ▼
(comments are locked)
10|10000 characters needed characters left
Your answer
toggle preview:

Up to 50 attachments (including images) can be used with a maximum of 209.7 MB each and 209.7 MB total.

Follow this question

Topics:

x20
x15
x1
x1

asked: Aug 03 at 12:56 PM

Seen: 63 times

Last Updated: Aug 10 at 06:39 AM