Constructing multiple two-way contingency tables currently used multiple scans. Can some of those be combined using windowing functions?

I'm accumulating results for multiple contingency tables, one for each Test. As an example:

 Contingency Table for TestA (one for each test, and for each word)
                 EventsWithWord  EventsWithoutWord
 TestA passed          n11                 n10         
 TestA failed          n01                 n00   

   

For analysis, I need the number of events matching each of the following categories (this is how R expects the data results):

  • TestA passed for each word: n11

  • TestA results available for each word: n_1 (sum of n11 and n01)

  • TestA passed: n1_ (sum of n11 and n10)

  • TestA results available: n__

Table "LeapFrogResults": columns: FacilityID nvarchar(20) NOT NULL, TestName nvarchar(255) NOT NULL, FacilityResults bit NULL

Table "Data" (one row for each event): Columns: EventID int NOT NULL, FacilityID nvarchar(20) NOT NULL

Table "WordEventMap": Columns: EventID int NOT NULL, Word nvarchar(255) NOT NULL

Is there a way of combining some of the queries (e.g., with windowing functions)? Some of these tables are very large--40 million rows currently in WordEventMap and 1.3 million rows in Data, and I expect the results table will have about 4 million rows (40 different tests, 100,000 different words).

Current query:

 WITH CTE1
 AS (SELECT
   COUNT(DISTINCT Data.Id) AS n__,
   LeapFrogResults.TestName
 FROM LeapFrogResults
 INNER JOIN Data
   ON LeapFrogResults.FacilityID = Data.FacilityPsoProviderId
 WHERE LeapFrogResults.FacilityResults IS NOT NULL
 GROUP BY LeapFrogResults.TestName),
 CTE2
 AS (SELECT
   COUNT(DISTINCT Data.[Id]) AS n11,
   LeapFrogResults.TestName,
   DataWordMap.Word
 FROM LeapFrogResults
 INNER JOIN Data
   ON LeapFrogResults.FacilityID = Data.FacilityPsoProviderId
 INNER JOIN DataWordMap
   ON Data.[Id] = DataWordMap.EventID
 WHERE LeapFrogResults.FacilityResults = 1
 GROUP BY LeapFrogResults.TestName,
          DataWordMap.Word),
 CTE3
 AS (SELECT
   COUNT(DISTINCT Data.[Id]) AS n1_,
   LeapFrogResults.TestName
 FROM LeapFrogResults
 INNER JOIN Data
   ON LeapFrogResults.FacilityID = Data.FacilityPsoProviderId
 WHERE LeapFrogResults.FacilityResults = 1
 GROUP BY LeapFrogResults.TestName),
 CTE4
 AS (SELECT
   COUNT(DISTINCT Data.[Id]) AS n_1,
   LeapFrogResults.TestName,
   DataWordMap.Word
 FROM LeapFrogResults
 INNER JOIN Data
   ON LeapFrogResults.FacilityID = Data.FacilityPsoProviderId
 INNER JOIN DataWordMap
   ON Data.[Id] = DataWordMap.EventID
 WHERE LeapFrogResults.FacilityResults IS NOT NULL
 GROUP BY LeapFrogResults.TestName,
          DataWordMap.Word)
 SELECT
   CTE2.TestName,
   CTE2.Word,
   n11,
   n1_,
   n_1,
   n__
 FROM CTE2
 INNER JOIN CTE4
   ON CTE2.Word = CTE4.Word
   AND CTE2.TestName = CTE4.TestName
 INNER JOIN CTE1
   ON CTE2.TestName = CTE1.TestName
 INNER JOIN CTE3
   ON CTE2.TestName = CTE3.TestName





tmp85FF.pesession (18.7 kB)
avatar image By rsjaffe 1 asked Jul 18 at 07:04 PM
more ▼
(comments are locked)
10|10000 characters needed characters left

0 answers: sort voted first
Be the first one to answer this question
toggle preview:

Up to 50 attachments (including images) can be used with a maximum of 209.7 MB each and 209.7 MB total.

Follow this question

Topics:

x650
x463
x109
x46

asked: Jul 18 at 07:04 PM

Seen: 259 times

Last Updated: Jul 18 at 07:09 PM