Do I need an index for a table that is used by a trigger and monitor?

Peachy 2015-06-29 15:00:39

I have a query in the following format:
select id
from table1
group by id
having count(id) < 2; table1 is a log table that gets written to by a trigger. We run a monitor every 2 hours and the query above is what the monitor executes. In about a week, the table has grown to approximately 100,000 rows. I am debating whether an index is necessary on the table? On one hand, I am thinking since this table is written to a lot, I want the least overhead possible. On the other hand, i'm thinking if it is worth indexing on the ID column for a monitor that runs every 2 hours? For now, the monitor takes about 80 milliseconds to execute. Can you give me some advice on what my best steps should be? Thank you
Aaron Bertrand 2015-06-29 17:45:51
Heaps are generally problematic and only useful in a few scenarios, IMHO. Seems to me like a clustered index on id might make sense to support that specific query pattern, but you may want a second column in the index (like a datetime column) to prevent SQL Server from injecting its own uniqueifiers. You might reverse that if you have other query patterns (like measuring how many log entries per day, or only get the entries for a specific range). Really, though, we can't know for sure what the optimal approach is, since we don't really know your schema, or the write side of the workload and the impact you would see there. These are things you'll have to test and that we can't possibly determine from an anonymized query plan.
Peachy 2015-06-29 18:18:52
Hello Mr. Bertrand,

Thank you so much for your response, I appreciate it a lot. Your statement about heaps helps me a lot. I get questioned at times for wanting to create clustered indexes on heaps that do not really have any exceptions to be a heap.

To give you some more details on the log table, it has an ID column (IDENTITY, INT), MesssageID column (INT), InsertedDate column (datetime), ProcessedDate column (datetime). This log table is written to when INSERT and DELETE triggers are fired from the main message table. The InsertedDate column of the log table will contain the date/time the message was created in the main table and the ProcessedDate column of the log table will contain the date/time the message was inserted or deleted. What the monitor is doing is making sure that for each message, there is an insert/delete pair logged. This is what tells us if the message has been processed.

I really like the idea of only getting entries for a specific range since it will reduce the amount of data I'm looking at each time. I've been testing and observing the InsertedDate column and I think I can use that to narrow down my working set.

That being said, I am thinking towards a clustered index on the (MessageId, ID) column. MessageId is not unique while ID is so that should make my clustered index unique. And then, a non clustered index on my InsertedDate column.

Do you think 2 sets of indexes on a log table that get written to by INSERT and DELETE triggers is a lot of overhead?

I am sorry if the information I have provided is anonymized or very limited. I am trying to make sure that I do not unintentionally disclose any data that I'm not allowed to =)

I truly value your opinion so I really appreciate you commenting on my post!