Replacing string Primary Key with Int

SqlTeffy 2015-11-07 17:31:23

I am scaling an old application in which all old tables have string primary keys. For example: Order_Id which looks somewhat like this: ORD:IND:2013:2014-2015

I read a post on Stack Overflow which says DataType int as primary key is faster as compared to strings (like the one mentioned below). I don't have the link of that post so that I can share with you all.

I want to know whether performance really gets affected with the type of value and data type of the primary key?

Aaron Bertrand 2015-11-12 14:28:37
> I want to know whether performance really gets affected with the type of value and data type of the primary key?

Certainly, there are many different aspects of this.

One is that it looks like there are multiple pieces of data combined into your single primary key string. Do you ever have to perform searches for things like all orders with 'IND' or '2014' in the string? This will have to use pattern matching and will undoubtedly have to perform a scan on the whole table.

Another is size – ints are more compact, and since the clustering key (assuming your PK is the clustered index) is repeated in every non-clustered index, a wider key can have a real domino effect on space. And while disk space is cheap, many people stop there, but you have to consider the cost of storing all of that extra data in memory, too, as well as the additional reads that will be required for every operation. And since most workloads and hardware are still I/O-bound (lucky people with Pure / Fusion-io aside), more reads = worse performance.

I did quite a lot of research on this in a post earlier this year, and though I focused on GUID vs. INT, some of the concepts apply equally well to the type of string you are using (even if it sorts well).

Bad habits : Focusing only on disk space when choosing keys

Now, I am not going to say you should definitely use INTs as primary keys in your specific case – way too many variables for that. But I did want to confirm that, yes, performance is one very tangible factor you'll want to use when deciding what type of key you want to use.

SqlTeffy 2015-11-15 11:28:39
Thanks Aaron. I also found a post on MSDN forums which states string can have collation issues as well.
SDyckes2 2015-11-12 14:47:49
Yes performance will be affected by the size of the primary key (PK), and the data types will dictate the size of the PK. The performance impact will be minimized on smaller data sets, but as the table and PK grow in size, the performance impact becomes more noticeable.

Changing the PK to utilize INT data types instead of strings, you are in effect compressing the index. In either case, you are reducing the size of each row of the PK, which allows more rows to fit on each 8K page. When you need to access this information, you will then be able to read fewer pages into memory to retrieve the needed data. Apply this to scale, many millions or billions of records, you may read a large percentage fewer pages into memory to retrieve your record set, which will directly affect performance.

Kimberly Tripp has a great post, How much does that key cost?, I recommend you check it out.

SqlTeffy 2015-11-15 11:29:17