7. Let’s Play Ball – Heavy Hitters

Recall the query [Partitioned Sliding Window] from Section 4.1.

[Partitioned Sliding Window] Find the most recent toll generated from vehicles being processed at each station over a one-minute window, reporting the result every time a change occurs in the input.

Now assume that a toll administrator wants aggregate statistics on the output of this query. Specifically, she wants to know the top 2 toll amounts every 3 minutes from the output of [Partitioned Sliding Window], as described in the following query:

[TopK] “Report the top 2 toll amounts over a 3 minute tumbling window over the results obtained from [Partitioned Sliding Window].”

Step 1 Model input and output events from the application’s perspective.

We will retain the same event models as [Partitioned Sliding Window].

Step 2 Understand the desired query semantics by constructing sample input and output event tables.

The input to this example is the output from [Partitioned Sliding Window], shown in Figure 18.

Figure 18. Top-K query with tumbling windows built using output from [Partitioned Sliding Window]

Event presentation follows the established convention: events for toll booth 1 are in blue, events for toll booth 2 are in red, and green line segments mark the 3-minute tumbling windows. Events are named with an "r" prefix because they are results of the [Partitioned Sliding Window] query. The total toll for each event is shown in parentheses.

The expected output is shown in the bottom track of the time plot. By convention, events ranked 1 appear in the top row and events ranked 2 in the row below.

Step 3 Consider the different elements of query logic required to compute the output above.

We need a mechanism to rank the TollAmount values in descending order within each bucket defined by the tumbling window. The orderby clause with the descending qualifier in the StreamInsight query achieves this.

We need to take the top 2 of these rank-ordered entries. The Take() operator allows us to do this, and the projection passed to Take() also gives us each event's rank (e.Rank) for the output.
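
The two elements above can be approximated outside StreamInsight with ordinary LINQ to Objects. The sketch below is only an in-memory analogy under stated assumptions, not StreamInsight code: the TollResult and RankedToll records and the TopTwoPerWindow helper are hypothetical names introduced for illustration, and the sketch buckets point events by timestamp, whereas StreamInsight windows operate on event lifetimes.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for one output event of [Partitioned Sliding Window].
public record TollResult(DateTime Time, string TollId, double TollAmount);

// Hypothetical shape of one ranked output row.
public record RankedToll(DateTime WindowStart, int Rank, string TollId, double TollAmount);

public static class TopKSketch
{
    // Groups events into 3-minute tumbling buckets, ranks each bucket by
    // TollAmount in descending order, and keeps the top 2 entries per bucket.
    public static List<RankedToll> TopTwoPerWindow(IEnumerable<TollResult> events)
    {
        long windowTicks = TimeSpan.FromMinutes(3).Ticks;

        return events
            // Truncate each timestamp down to the start of its 3-minute bucket.
            .GroupBy(e => new DateTime(e.Time.Ticks - (e.Time.Ticks % windowTicks)))
            .OrderBy(g => g.Key)
            .SelectMany(g => g
                .OrderByDescending(e => e.TollAmount)
                .Take(2)
                // The indexed Select yields a 1-based rank, comparable to e.Rank.
                .Select((e, i) => new RankedToll(g.Key, i + 1, e.TollId, e.TollAmount)))
            .ToList();
    }
}
```

For example, given tolls of 7.0, 5.0, and 9.0 in the window starting at 12:00 and a single toll of 4.0 in the window starting at 12:03, the helper returns 9.0 and 7.0 with ranks 1 and 2 for the first window and 4.0 with rank 1 for the second.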

Step 4 Compose the query as a streaming transformation of input to output.

The resulting query is shown below.

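// Divide the [Partitioned Sliding Window] output into 3-minute tumbling windows.
var query = from window in partitionedSlidingWindow.TumblingWindow(TimeSpan.FromMinutes(3))
            from top in
                // Rank the events in each window by TollAmount, highest first.
                (from e in window
                 orderby e.TollAmount descending
                 select e
                // Keep the top 2 and project each one, with its rank, into TopEvents.
                ).Take(2,
                    e => new TopEvents
                    {
                        TollRank = e.Rank,
                        TollAmount = e.Payload.TollAmount,
                        TollId = e.Payload.TollId,
                        VehicleCount = e.Payload.VehicleCount
                    })
            select top;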
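
The query projects into a TopEvents type that this section does not show. A minimal sketch of what such a payload class might look like follows; the member types are assumptions inferred from the names used in the projection, not definitions taken from this section, so they should be checked against the actual event models.

```csharp
// Hypothetical payload type for the [TopK] output. All member types are assumed.
public class TopEvents
{
    // Rank of the event within its 3-minute window (1 = highest toll); assumed int.
    public int TollRank { get; set; }

    // Toll amount carried over from [Partitioned Sliding Window]; assumed double.
    public double TollAmount { get; set; }

    // Identifier of the toll station; assumed string.
    public string TollId { get; set; }

    // Number of vehicles counted in the source event; assumed unsigned integer.
    public ulong VehicleCount { get; set; }
}
```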
