So far, we have followed the top-down, four step query writing methodology to arrive at the LINQ query. If you are able to complement this top-down approach with a bottom-up introspection of the query in Step 3, this will greatly improve your ability to write efficient queries. In this section, we will use this bottom-up approach to constructing a query to build up an Outer Join operation. First, let's understand how a StreamInsight query works.
StreamInsight enriches LINQ with temporal query capability through LINQ extension methods. On the client, the C# compiler runs HelloToll.cs through a StreamInsight LINQ Provider, which translates a LINQ query into query template, query binder, and query definition XML documents. The StreamInsight engine understands only this XML-based representations of the query– it generates the execution plan, loads and links the required assemblies that constitute a running query, and executes the query.
In its compiled form, a StreamInsight query is an event flow graph made up of operators and streams. Each Operator is a stateful or stateless machine that performs a specific operation on each event arriving via one or more input streams, generates new events as a result of this operation, and outputs the event via one or more of its output streams. Streams are efficient in-memory couriers of events from one operator to another.
For any given query, you can view this query plan using the StreamInsight Debugger. In the simplest use case, start a query trace, run the query, stop the trace, and then view the trace file in the debugger (we will not digress into Debugger setup here – please consult the documentation). Let's consider the query [TopK] for this exercise.
Figure 20 Query Graph of [TopK] as shown in the Event Flow Debugger
c:\Program Files\Microsoft StreamInsight 2.1\Tools>trace start c:\q9-trace.etl The command completed successfully.
Click here to view code as image
Now execute the query from Visual Studio. Once the query completes…
c:\Program Files\Microsoft StreamInsight 2.1\Tools>trace stop The command completed successfully.
Click here to view code as image
Next, start the debugger from the Start menu, and open the file c:\q9-trace.etl. The resulting query graph is shown in Figure 11. The plan shows events flowing from top to bottom through an operator chain IMPORT -> CLEANSEINPUT -> ALTERLIFETIME -> GROUPING
(which contains an AGGREGATE
and PROJECT
) -> PROJECT -> TOP-K -> PROJECT -> CLEANSE
. We have looked at all of the LINQ level constructs that correspond to these operators, except for CLEANSEINPUT
and CLEANSE
. Note that there are other powerful uses for the debugger – we will not digress into those capabilities now.
Now, imagine generating such a plan manually, using just the higher level constructs we have learned so far, to build either other operators or complete queries. This is the bottom-up approach to query building that we referred to earlier. The repertoire of primitive operators in StreamInsight is not very large, and you do not have to know all of them. What is important is the process of trying to compose a query by working bottom up to construct a query using a flow diagram substituting for the actual query plan.
In the earlier section on Joins, we left out one important join type for later: Outer Join. In this section, we will build this operator bottom up by combining other primitive operators such as inner join, multicast, and project - we can put them together like so many Lego pieces to compose the outer join query.
An Outer join derives from inner join, and has the following characteristics.
• The left outer join results in an output event that contains all the fields of the payload from the left stream, with matching values from the fields in the right stream, and NULL values for the fields where no matches are found.
• The right outer join contains all fields of the payload from the right stream, and matching values from the left stream, and NULL if no matches are found.
• The full outer join results in events that contain all the fields from payloads of both the streams, with NULL values for non-matching fields.
Let's consider a left outer join.
Step 1 Model input and output events from the application’s perspective.
We will use interval events for this example.
Step 2 Understand the desired query semantics by constructing sample input and output event tables.
We will first consider the payload values to illustrate the outer join. From a temporal perspective, the semantics are that of an inner join. Assume we have two streams with payloads shown side by side below.
Start Time |
End Time |
License Plate |
Make |
Model |
|
Start Time |
End Time |
License Plate |
Toll |
TagId |
12:01 |
12:03 |
XYZ 1003 |
Toyota |
Camry |
|
12:03 |
12:08 |
BAC 1005 |
5.50 |
567891234 |
12:02 |
12:04 |
XYZ 1001 |
Toyota |
Camry |
|
12:05 |
12:07 |
ABC 1004 |
5.00 |
456789123 |
12:03 |
12:07 |
ZYX 1002 |
Honda |
Accord |
|
|
|
|
|
|
12:03 |
12:08 |
BAC 1005 |
Toyota |
Camry |
|
|
|
|
|
|
12:05 |
12:07 |
ABC 1004 |
Ford |
Taurus |
|
|
|
|
|
|
12:07 |
12:09 |
NJB 1006 |
Ford |
Taurus |
|
|
|
|
|
|
Click here to view table as image
The result from a left outer join of the above two streams on equality of License Plates, and based on an intrinsic comparison of the timestamps done by the inner join semantics, is shown below.
Start Time |
End Time |
License Plate |
Make |
Model |
Toll |
TagId |
12:01 |
12:03 |
XYZ 1003 |
Toyota |
Camry |
NULL |
NULL |
12:02 |
12:04 |
YXZ 1001 |
Toyota |
Camry |
NULL |
NULL |
12:03 |
12:07 |
ZYX 1002 |
Honda |
Accord |
NULL |
NULL |
12:03 |
12:08 |
BAC 1005 |
Toyota |
Camry |
5.00 |
456789123 |
12:05 |
12:07 |
ABC 1004 |
Ford |
Taurus |
5.50 |
567891234 |
12:07 |
12:09 |
NJB 1006 |
Ford |
Taurus |
NULL |
NULL |
Click here to view table as image
Step 3 Consider the different elements of query logic required to compute the output above.
Given that this operator is not available out of the box, we will emulate left outer join behavior with the primitives at hand.
• An inner join between the two streams will give us the matching rows from the two streams
• A Left Anti Join between the two streams will give all the rows in left stream that do NOT have matching rows in the right stream. Since the left stream is the reference stream, this will give us the payload fields (License Plate, Make, Model). Now we need to add the NULL columns to this payload – this is done very easily using a Project operation on this output.
• Combine the above two streams with a Union operator to get the output we want.
This is shown graphically in Figure 20 below.
Figure 20. Outer Join composed from primitives
Step 4 Compose the query as a streaming transformation of input to output.
We will use the same input stream to generate the left and right streams using suitable projections. See how the query matches the graph provided above.
// Simulate the left stream input from inputStream var outerJoin_L = from e in inputStream select new { LicensePlate = e.LicensePlate, Make = e.Make, Model = e.Model, }; // Simulate the right stream input from inputStream – eliminate all events with Toyota as the vehicle // These should be the rows in the outer joined result with NULL values for Toll and LicensePlate var outerJoin_R = from e in inputStream where e.Make != "Toyota" select new { LicensePlate = e.LicensePlate, Toll = e.Toll, TollId = e.TollId, }; // Inner join the two simulated input streams var innerJoin = from left in outerJoin_L from right in outerJoin_R where left.LicensePlate == right.LicensePlate select new TollOuterJoin { LicensePlate = left.LicensePlate, Make = left.Make, Model = left.Model, Toll = right.Toll, TollId = right.TollId, }; // Left anti join the two input simulated streams, and add the Project var leftAntiJoin = outerJoin_L .LeftAntiJoin(outerJoin_R, left => left.LicensePlate, right => right.LicensePlate) .Select(left => new TollOuterJoin { LicensePlate = left.LicensePlate, Make = left.Make, Model = left.Model, Toll = null, TollId = null }; // Union the two streams to complete a Left Outer Join operation var query = innerJoin.Union(leftAntiJoin);
Click here to view code as image
Such graphing of event flows, followed by query composition in LINQ, is the bread and butter of StreamInsight query development.