Filtering records not present with specific conditions in another table

Dmitry 2013-06-13 22:44:08

I have a table (let's call it extra_info) in MySQL that has various extra infomation/labels on user accounts:

account_id / data_key
11 / 'test'
11 / 'blah'
12 / 'coupon a'
12 / 'coupon b'
15 / 'whatever'
18 / 'test'

And so on.

I am building reports on other tables but need to filter out records that are present in the extra_info table as 'test'.

So in the example above, records 1-10, 13-14, 16-17 are fine because they are not in extra_info, 11 needs to be filtered out (it is 'test'), 12 is fine (but obviously just need to have it once even though is has 2 labels, 15 is fine (not 'test'), and 18 needs to be removed.

I have done this with subquiry in the WHERE clause:

WHERE
— filter out test labels
a.id NOT IN(
SELECT ei.account_id FROM extra_info AS ei
WHERE ei.data_key = 'test')

This does its job, however, I have a suspicion that a regular join would have worked faster. What's the right syntax for it?

If I just join and filter out the ones with data_key='test', #11 would still be there – so this is not a solution. Grouping by account_id will probably has performance implications and basically same impact as the current code, right?

Any ideas?

Aaron Bertrand 2013-06-13 22:58:01
Hi Dmitry, if this is a question about MySQL, you might be in the wrong place – we're a SQL Server shop, focusing on SQL Server performance issues. I could probably take stabs at your question but you'd be better served on a forum with MySQL experts, like dba.stackexchange.com.

Cheers,
Aaron

Dmitry 2013-06-13 23:04:15
Thanks Aaron! I cross posted there
Jonathan Kehayias 2013-06-13 23:00:41
Hey Dmitry,

While SQL as a language is fairly universal, being someone that uses MySQL, I can tell you that there are differences between how MySQL does things and how SQL Server does things. Since your question is for MySQL this is probably not the best place for your question and I'd suggest posting it over on StackOverflow.com or one of their other sites like DBA.StackExchange.com where you will be more likely to get a MySQL appropriate solution. Using NOT EXISTS instead of NOT IN would help you with removing #11, but your data model and requirement to return only one version of #12 would be implementation specific to MySQL for the best way to do it.

Cheers!