Any idea why sp_MSdistribution_cleanup is so CPU intensive?

Kirby 2013-04-17 21:49:08

Specifically sp_MSsubscription_Cleanup is a dog and SQL Sentry clocked it at 1.6 billion page reads and over 3000 seconds of CPU time. This is Microsoft system code and running on a new Dell R820 with 24 cores.

Kirby 2013-04-17 22:30:44
Thank you. I will look at that link.
Kirby 2013-04-17 22:39:11
Good articles. We did make those changes earlier to the retention periodimmediate sync and our Distribution DB is fairly small at 3 GB. MSrepl_commands has 9 million records and MSrepl_transactions has 2.5 million. These are set to a cleanup of @min_distretention = 10, @max_distretention = 36.
SQLkiwi 2013-04-19 11:15:33
The primary cause is that Microsoft do not write good T-SQL or architect good databases (not by any means limited to replication stored procedures).

The query in question contains horribly nested correlated subqueries that the optimizer has no hope of unrolling from nested loops form. If you were to capture an execution plan with runtime statistics, you would find a very poor distribution of rows across threads in the parallel plan (quite likely all rows on one thread), and a simply enormous number of executions on the inner side of one or more nested loops joins.

The sheer number of inner-side executions is responsible for the high CPU usage, logical read count, and duration. If the code were better, the worst of the correlated loops joins would be replaced by a much more efficient (in this case) hash or merge joins and the thing would fly along. Sadly, we do not live in that particular universe.

Kirby 2013-04-19 15:25:38
Thank you SQLkiwi. Transactional replication is one of the things that keeps me up at night and if we continue with Microsofts current approach I don't think we can scale. Looking for alternatives to scaling out our application. Thanks again!