Worried about PLE

Jay 2014-04-28 03:08:43

We've been having some disk read latency issues here, which prompted me to investigate some other stats on my poor neglected servers. I opened 3 of them in Performance Advisor and all 3 had a PLE of between 10 and 12.
Thinking this very low, I googled and saw the "300 or not 300" debate.
I also found some rather heavy reading material and wondered what other specific stats I should be looking at, other than a "get more RAM" response?
The servers have 16GB RAM each, with max server memory capped at 13.5GB for SQL Server. None of the applications appear to be suffering any performance issues.
I've included a screen shot of PA. Let me know which other stats could help/hinder PLE. Could the current disk read latency issues (with our SAN) have any connection?
Justin 2014-04-28 12:18:14
Hi Jay,

Performance Advisor continues to show PLE in seconds. The screen shot you provided (sample mode) shows a server nearly at rest. PLE is 12,100, there are almost no waits, few page faults and no disk latency. A screen shot of the dashboard in history mode showing a 10 – 30 minute time frame when low PLE has occurred would be more instructive.

Are the low PLE values sustained or short-term? What does PLE look like with the PA dashboard in history mode covering a period of several hours, and is there a recurring pattern?

When you see a drop in PLE, check for concurrent high CPU, high waits and/or high disk latency. If none of those are occurring then a query or some other operation (CHECKDB, index maintenance, etc.) has caused memory pages to be flushed and new pages brought in, and the server is responding appropriately. If PLE returns to higher levels, there probably isn't a problem.
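If you want a quick spot-check outside of Performance Advisor, something along these lines against the standard DMVs will show the current PLE, the top cumulative waits, and average read latency per file. This is a rough sketch, not what PA itself does; the LIKE filter on object_name is just to cover named instances.

-- Current Page Life Expectancy (seconds)
SELECT cntr_value AS ple_seconds
FROM sys.dm_os_performance_counters
WHERE object_name LIKE '%Buffer Manager%'
  AND counter_name = 'Page life expectancy';

-- Top cumulative waits since the last service restart
SELECT TOP (10) wait_type, wait_time_ms, waiting_tasks_count
FROM sys.dm_os_wait_stats
ORDER BY wait_time_ms DESC;

-- Average read latency per database file (ms per read)
SELECT DB_NAME(vfs.database_id) AS database_name,
       vfs.file_id,
       vfs.io_stall_read_ms / NULLIF(vfs.num_of_reads, 0) AS avg_read_ms
FROM sys.dm_io_virtual_file_stats(NULL, NULL) AS vfs
ORDER BY avg_read_ms DESC;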

You can find out what is causing the drop in PLE by looking at the calendar and/or Top SQL tabs in SQL Sentry.

@SQLTrooper 2014-04-28 13:49:04
Good catch Justin – I didn't realize that was a comma. It looks like a decimal. 12,000+ (200 minutes) isn't bad.
Jay 2014-04-28 18:37:38
I too thought it was a decimal, hence my panic. Thanks! I can rest easy now.
@SQLTrooper 2014-04-28 03:13:52
If your PLE is that low, you have memory pressure. It has very little to do with your SAN at that point, as data is being read into and out of memory at a rate your server cannot support. 300 is an arbitrary number – Paul Randal talks about it quite often, and in this day and age 300 is way too low.

https://www.sqlskills.com/blogs/paul/page-life-expectancy-isnt-what-you-think/

You need more memory.

Jay 2014-04-28 04:32:02
That's what I thought (way too low) but the apps are responding fine.
One of the servers with a PLE of 5.5(!!) is a SharePoint farm DB which is heavily used and running perfectly, which is what made me wonder whether PA was using minutes rather than seconds in its new version 🙂
Hans 2014-04-28 06:18:48
Hi. Page life expectancy is how long the server expects a data page to stay in memory. So a PLE of 10 means SQL Server expects data pages to stay in memory for only 10 seconds before they are overwritten. Say you have 10GB of data buffer: that means the server expects to read 10,000MB of data every 10 seconds, which is 1,000MB per second. If this PLE is the norm for your server, you should investigate it. Find out which queries are reading all these data pages and tune them, and also investigate how much of the SQL Server memory is used for the data buffer. And if the server has a NUMA architecture, PLE is tracked per NUMA node's memory.
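A rough sketch against the standard counter DMV will show both of those: the PLE per NUMA node and how much memory the data buffer is actually holding (adjust the object_name filter if you are on a named instance).

-- PLE per NUMA node (the Buffer Manager value is an aggregate of these,
-- so one starved node can hide behind a healthy-looking overall number)
SELECT instance_name AS numa_node, cntr_value AS ple_seconds
FROM sys.dm_os_performance_counters
WHERE object_name LIKE '%Buffer Node%'
  AND counter_name = 'Page life expectancy';

-- Size of the data buffer in MB (the counter is in 8KB pages)
SELECT cntr_value / 128 AS database_pages_mb
FROM sys.dm_os_performance_counters
WHERE object_name LIKE '%Buffer Manager%'
  AND counter_name = 'Database pages';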
GregGonzalez 2014-04-30 20:04:20
Jay,
Sounds like you already figured this out, but in history mode the Y-axis is showing PLE in thousands of seconds, hence the decimal and "K" after.

Note that SQL Sentry v8 ships with a custom condition which uses Jonathan Kehayias' adaptable PLE formula, which is a much better approach than the old 300 seconds. I blogged about it here:

http://blogs.sqlsentry.com/GregGonzalez/sql-sentry-v8-intelligent-alerting-redefined/
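As a rough illustration only of the adaptive idea (the rule of thumb is often quoted as roughly 300 seconds per 4GB of data cache; see Jonathan's post and the shipped condition for the exact definition), it looks something like this:

-- Sketch of an adaptive PLE threshold: scale the old 300-second rule
-- by the size of the data cache (roughly 300s per 4GB of cache).
DECLARE @cache_mb bigint, @ple bigint, @threshold bigint;

SELECT @cache_mb = cntr_value / 128            -- 8KB pages -> MB
FROM sys.dm_os_performance_counters
WHERE object_name LIKE '%Buffer Manager%'
  AND counter_name = 'Database pages';

SELECT @ple = cntr_value
FROM sys.dm_os_performance_counters
WHERE object_name LIKE '%Buffer Manager%'
  AND counter_name = 'Page life expectancy';

SET @threshold = CAST((@cache_mb / 4096.0) * 300 AS bigint);

SELECT @ple AS ple_seconds,
       @threshold AS adaptive_threshold_seconds,
       CASE WHEN @ple < @threshold THEN 'investigate' ELSE 'ok' END AS verdict;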

Jay 2014-04-30 20:11:22
Hi Greg, it was sample mode that I was looking at. When I mouse over the PLE, the tooltip says page life expectancy (seconds).
Apparently it is a comma after the (currently) 30, but it is very small and I mistook it for a decimal point. So when I look at it now I see 30.744 (as did Justin), when apparently it reads 30,744.
GregGonzalez 2014-04-30 20:21:34
Ah, gotcha. Yes, that is an issue with the new gauge controls in v8… the comma is much smaller than before, and attempts to enlarge it have so far been unsuccessful. We'll keep at it.

Really the best way to look at PLE is in history mode so you can ascertain patterns. As has been mentioned, certain types of ops are always going to cause buffer churn and low PLE, but as long as it quickly returns to high levels it should not generally be a concern.

To expound on what Justin said, if you highlight a range a minute or two around a drop, then Jump To the calendar or Top SQL, you can usually find the culprit quickly. If it turns out to be a poorly designed user query and/or inadequate supporting indexes, there may indeed be an optimization opportunity.