Really bad performing query with fulltext

Barry 2013-05-15 09:40:11

Joining multiple fulltext tables, each with a large volume of data. The query is dynamic SQL. Help!!

Aaron Bertrand 2013-05-15 11:53:34
Hi Barry,

There seem to be some statistics issues, shown by the discrepancies between estimated rows and actual rows. In some cases there isn't going to be much you can do about this; for example I suspect the statements where estimated rows = 1 are marked as such simply because full-text operators can act like UDFs/table variables (no valid stats). But for Statement2, Statement5 and Statement7 it is possible that these discrepancies are due to out-of-date statistics.
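If stale statistics are the cause for Statement2, Statement5 and Statement7, a manual update is a cheap first test. A minimal sketch, using a placeholder table name since the plan is anonymized:

```sql
-- Placeholder name: substitute the real table(s) behind Statement2/5/7.
-- FULLSCAN is the most thorough option; on very large tables a sampled
-- update (the default) is cheaper.
UPDATE STATISTICS dbo.Table2 WITH FULLSCAN;

-- Or refresh out-of-date statistics across the whole database (heavier, but simple):
EXEC sys.sp_updatestats;
```

After updating, re-run the query and compare estimated vs. actual rows again to see whether the discrepancies shrink.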

There is also a table scan in Statement4 (is this a heap? on purpose?), and if this is meant to be a table scan there appears to be a case to create column-level statistics on Column23, Column24 and Column25. There is also a table scan in Statement5, and another in Statement6 which could benefit from statistics on Column28. There are more table scans in Statements 7, 8 and 9. Whether focusing on any of this makes sense is hard to tell, since I don't know if these are real tables, temporary tables, staging tables, etc., and without more data I don't know how much the stats will actually help.
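Column-level statistics can be created explicitly (the optimizer will also auto-create them when AUTO_CREATE_STATISTICS is on, but explicit creation lets you control the sample). A sketch using the anonymized object and column names as placeholders:

```sql
-- Placeholder object/column names taken from the anonymized plan;
-- substitute the real tables behind Statement4 and Statement6.
CREATE STATISTICS st_Table4_Col23 ON dbo.Table4 (Column23) WITH FULLSCAN;
CREATE STATISTICS st_Table4_Col24 ON dbo.Table4 (Column24) WITH FULLSCAN;
CREATE STATISTICS st_Table4_Col25 ON dbo.Table4 (Column25) WITH FULLSCAN;
CREATE STATISTICS st_Table6_Col28 ON dbo.Table6 (Column28) WITH FULLSCAN;
```

If Statement4's table really is a heap and the scans aren't intentional, a clustered index is worth considering alongside (or instead of) these statistics.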

Statement2 really seems like the place where you want to focus. There is a sort which I'm not sure is necessary, and a lot of hash match / merge joins that are more significant as a whole than individually. I also see four references to a table-valued function ("Table8") and I'm not sure if this is necessary against four different operators in the same branch – especially since they seem to repeatedly be joined against the same table ("Table2").
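When the same table-valued function output feeds several operators, one common pattern is to materialize it once into a temp table and join against that instead. A hypothetical sketch (the function, parameter and key names are placeholders, since the plan is anonymized):

```sql
-- Materialize the TVF output once instead of invoking it in four branches.
SELECT f.*
INTO #Table8
FROM dbo.Table8(@param) AS f;   -- placeholder function / parameter

-- A temp table (unlike a table variable) gets real statistics, and an
-- index can help the downstream hash/merge joins against Table2.
CREATE CLUSTERED INDEX cx ON #Table8 (KeyColumn);  -- placeholder key column

-- ...then reference #Table8 wherever Table8(...) appeared in the query.
```

Whether this is a win depends on how expensive the function is and how large its output is, so test both forms.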

Since you've anonymized the statements it is going to be tough to offer specific guidance on the batch or query logic (it is certainly possible that there are some inefficiencies there). Also, since this is an estimated plan, it is difficult to tell where all the time is being spent (though I/O is pretty clearly focused on Statement2).

Ideally, if you provide a non-anonymized actual plan, we can draw a lot more from it; but I understand if you can only provide an anonymized actual plan (which will still be much better than an estimated plan). I see actual rows, but nothing for duration / CPU / reads.
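To capture duration / CPU / reads alongside the actual plan, you can wrap the batch with runtime statistics enabled, e.g.:

```sql
SET STATISTICS IO ON;
SET STATISTICS TIME ON;

-- Run the batch here. Enabling "Include Actual Execution Plan" in SSMS
-- at the same time records actual row counts per operator.

SET STATISTICS TIME OFF;
SET STATISTICS IO OFF;
```

The IO output shows logical/physical reads per table, and the TIME output shows CPU and elapsed time per statement, which together make it much easier to see where Statement2's cost really goes.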