12. Notes on Events and Event Modeling

12.1 Event Shapes

A StreamInsight query supports three event models, also called shapes: Point, Interval, and Edge. You model your application events using one of these shapes. Note that a Project operator can change only the shape of the payload, not the shape of the event itself.

StreamInsight gives you the flexibility to model the shape of your events at three places:

In the input adapter: Using the Adapter API in the StreamInsight SDK, you can implement your input application event as a PointEvent, an IntervalEvent, or an EdgeEvent and enqueue it into the query. In a logical query graph, the input adapter instance is represented as an IMPORT operator.

In the query itself: Inside the query, it is still good practice to think of all events as interval events, irrespective of their shape. From a programming standpoint, however, once an event crosses this threshold into the query, constructs such as AlterEventDuration(), AlterEventLifetime(), and ToPointStream() transform the event shape explicitly. Operations such as JOIN can also change the shape of the events they output implicitly. A shape change from a point event to an interval event is reversible, because the start time is preserved, but other shape changes may be irreversible. For example, if you transform an interval event to a point event, it is impossible within the context of the query execution to “go back” and recover the validity interval of the event you just transformed. You can, of course, record the event flow and inspect what happened using the Event Flow Debugger. We have seen examples of these constructs earlier.

In the output adapter: Like the input adapter, the output adapter can model the output event as any one of the three shapes. In the query graph, an output adapter instance is represented as an EXPORT operator. Once the event crosses this threshold, the output adapter instance influences the shape of the output event through the binding of the query to that instance.
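The reversibility argument can be made concrete with a minimal sketch in plain C#. It does not use the StreamInsight API. Lifetimes are modeled as hypothetical (Start, End) tuples of whole minutes with a one-minute tick. The local functions ToInterval and ToPoint are illustrative stand-ins that only mimic what AlterEventDuration() and ToPointStream() do to an event's lifetime.

```csharp
using System;

// Hypothetical model: application time in whole minutes; a point event lasts one tick.
const int Tick = 1;

// Stand-in for AlterEventDuration(): keep the start time, set End = Start + duration.
(int Start, int End) ToInterval((int Start, int End) e, int duration) => (e.Start, e.Start + duration);

// Stand-in for ToPointStream(): keep the start time, shrink the lifetime to a single tick.
(int Start, int End) ToPoint((int Start, int End) e) => (e.Start, e.Start + Tick);

var point = (Start: 10, End: 10 + Tick);
var stretched = ToInterval(point, 5);      // (10, 15)
var pointAgain = ToPoint(stretched);       // (10, 11): identical to the original point

var interval = (Start: 10, End: 25);
var collapsed = ToPoint(interval);         // (10, 11): the end time 25 is gone
// Nothing in 'collapsed' records the original End = 25, so it cannot be recovered.

Console.WriteLine($"point -> interval -> point round-trips: {pointAgain == point}");
Console.WriteLine($"interval -> point: {collapsed}, end lost: {collapsed.End != interval.End}");
```

The point survives the round trip because both transformations keep the start time. The collapsed interval cannot be restored because the point event carries no trace of the original end time.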

Figure 21. Point Event Example

We will briefly look at the various event models in the context of query development, to set the stage for the discussion of stream behavior in the upcoming sections.

PointEvent is used to model application events that typically occur at a point in time. A simple example is a web click.

From a query development standpoint:

• Point events used in conjunction with hopping or tumbling windows convey the idea of events “falling into” or “belonging to” a particular, fixed time slice, as shown in Figure 21.

• Point events have a lifetime of a single tick. Suppose you want to report the impact of point events over a particular time period (as in “web clicks over the past week” or “over the past hour”) every time a change occurs in the stream, that is, in an event-driven manner using snapshot windows. The technique is to “stretch” the point events by altering their duration uniformly to that time period, and then define the snapshots. You can use the same trick when you have to correlate point events from two streams.

The figure below shows an example of correlating point events from two streams in a data-driven manner, looking back over a 2-minute window.


• Conversely, you can use point events to vertically slice an interval or an edge event through a JOIN to generate a point-in-time result.
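The point-event techniques above can be sketched in plain C# without StreamInsight. The sketch assumes hypothetical whole-minute timestamps and one-tick point events. TumblingWindowStart, Stretch, and Covers are illustrative helpers, not StreamInsight operators:

- TumblingWindowStart assigns each click to the fixed slice it falls into.
- Stretch mimics altering each click's duration to a 5-minute look-back period.
- Counting the stretched lifetimes that cover an instant gives what a snapshot containing that instant would aggregate.
- The same containment test describes a point event slicing an interval event.

```csharp
using System;
using System.Linq;

// Hypothetical point events: web clicks at these application times (minutes).
int[] clicks = { 1, 3, 4, 9, 12, 13, 14 };

// Tumbling windows: a point event falls into exactly one fixed slice of the given size.
int TumblingWindowStart(int t, int size) => t - (t % size);

// "Stretch" a point event so its lifetime covers the look-back period [t, t + period).
(int Start, int End) Stretch(int t, int period) => (t, t + period);

// Half-open containment: is the lifetime alive at instant t?
bool Covers((int Start, int End) e, int t) => e.Start <= t && t < e.End;

// Clicks per 5-minute tumbling window: {0: 3, 5: 1, 10: 3}.
var perWindow = clicks
    .GroupBy(t => TumblingWindowStart(t, 5))
    .ToDictionary(g => g.Key, g => g.Count());

// Clicks in the 5 minutes up to instant 14: the stretched lifetimes alive at 14.
var stretched = clicks.Select(t => Stretch(t, 5)).ToArray();
int activeAt14 = stretched.Count(e => Covers(e, 14));   // clicks at 12, 13, 14

// A point event at 14 slicing an interval event (10, 20): the interval is alive at 14.
bool slice = Covers((10, 20), 14);

foreach (var kv in perWindow)
    Console.WriteLine($"window [{kv.Key}, {kv.Key + 5}): {kv.Value} clicks");
Console.WriteLine($"clicks in the 5 minutes up to 14: {activeAt14}, slice hit: {slice}");
```

Because the lifetimes are half-open, the click at minute 9, stretched by 5 minutes, stops counting at minute 14.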

IntervalEvent is used to model application events that span a duration of time. We have built most of our examples in the previous chapter with interval events. Some key features from a query development standpoint:

• Events modeled as intervals are a quick means to correlate and detect co-occurrence of events across two different streams. The overlap of lifetimes is an indicator of this co-occurrence.

• Interval events influence CTI events to exhibit specific temporal behavior (discussed next).
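Detecting co-occurrence by overlap comes down to comparing lifetimes. This is a minimal plain C# sketch with hypothetical minute timestamps and helper names, not the StreamInsight API. Two half-open lifetimes overlap when each starts before the other ends. The period of co-occurrence runs from the later start to the earlier end, which matches the intersection of lifetimes that a temporal join gives its output event.

```csharp
using System;

// Two half-open lifetimes [Start, End) overlap if each starts before the other ends.
bool Overlaps((int Start, int End) a, (int Start, int End) b) => a.Start < b.End && b.Start < a.End;

// Co-occurrence period: later start to earlier end (meaningful only when they overlap).
(int Start, int End) Intersect((int Start, int End) a, (int Start, int End) b) =>
    (Math.Max(a.Start, b.Start), Math.Min(a.End, b.End));

// Hypothetical interval events from two different streams.
var highCpu = (Start: 10, End: 30);   // CPU above threshold from minute 10 to 30
var highIo  = (Start: 25, End: 40);   // disk I/O above threshold from minute 25 to 40
var lowMem  = (Start: 30, End: 35);   // starts exactly when highCpu ends

Console.WriteLine(Overlaps(highCpu, highIo));    // True
Console.WriteLine(Intersect(highCpu, highIo));   // (25, 30)
Console.WriteLine(Overlaps(highCpu, lowMem));    // False: touching is not overlapping
```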

Figure 22. Edge Event

EdgeEvent is used to model application events whose end time is unknown when the event starts and is provided at a later point in time. The input source must clearly mark each event to be enqueued as either a Start edge or an End edge. The input adapter then uses this marker to specify the EdgeType when it calls CreateInsertEvent(). The two main caveats in enqueuing edge events are:

• The End edge must have the same start time and payload field values as the Start edge.

• If an End edge event arrives without a corresponding Start edge event, it is ignored.
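The two caveats amount to a matching rule, sketched below in plain C#. This is a model, not the StreamInsight adapter API. Each hypothetical edge record is a tuple of kind, start time, optional end time, and payload. The model applies three rules:

- An End edge closes the open Start edge that has the same start time and payload.
- An End edge with no such partner is ignored.
- A Start edge that has not yet received its End edge stays open.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical edge records: Kind is "Start" or "End"; only End edges carry an end time.
var edges = new List<(string Kind, int Start, int? End, string Payload)>
{
    ("Start", 1, null, "up"),
    ("End",   1, 5,    "up"),     // matches the Start edge above: same start time and payload
    ("End",   2, 6,    "down"),   // no Start edge with start 2 and payload "down": ignored
    ("Start", 5, null, "down"),   // still open: its end time is not known yet
};

var open = new HashSet<(int Start, string Payload)>();
var closed = new List<(int Start, int End, string Payload)>();
var ignored = new List<(int Start, int End, string Payload)>();

foreach (var e in edges)
{
    if (e.Kind == "Start")
        open.Add((e.Start, e.Payload));                      // lifetime open until an End edge arrives
    else if (open.Remove((e.Start, e.Payload)))
        closed.Add((e.Start, e.End.Value, e.Payload));       // matched: the lifetime is now known
    else
        ignored.Add((e.Start, e.End.Value, e.Payload));      // orphan End edge is dropped
}

Console.WriteLine($"closed: {string.Join(", ", closed)}");
Console.WriteLine($"ignored: {string.Join(", ", ignored)}");
Console.WriteLine($"still open: {string.Join(", ", open)}");
```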

The following are some of the most common use cases and motivations for using edge events:

• Edge events are ideal for modeling “long-running” events, such as the uptime of a process or a machine, because you can decide on demand when the event terminates. Against this “anchoring” event, many small events, such as CPU cycles, I/O, memory usage, and similar short-term metrics, can be correlated and summarized.

• A popular application of edge events is modeling signals: implementing step functions, discretizing analog signals, or modeling event patterns with arbitrary event lifetimes.

• Following the same “anchoring” theme, edge events are used to build anomaly detection applications in conjunction with a Left Anti Join. The edge event can be the anchoring event in the left-hand (reference) stream. The right-hand stream is the one containing the “anomaly”, such as the non-occurrence of an event or gaps or mismatches in event payload values. The join produces output from the left stream whenever an event in the left (reference) stream does NOT find a matching event in the right (observed) stream. The output therefore covers the points in time when either (1) no event exists on the observed stream, or (2) the observed event’s payload does not satisfy the join predicate.
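The anti-join behavior can be modeled on a timeline. This minimal plain C# sketch is not StreamInsight's join operator, and it rests on several assumptions:

- AntiJoinGaps is a hypothetical helper, and times are whole minutes.
- The reference (left) event is an already-closed edge lifetime.
- The observed (right) events are lifetimes first filtered by an assumed predicate, Status == "ok".

The result is every part of the reference lifetime that no matching observed event covers, which captures both cases (1) and (2) above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Returns the parts of 'reference' not covered by any matching observed lifetime.
List<(int Start, int End)> AntiJoinGaps((int Start, int End) reference, IEnumerable<(int Start, int End)> matches)
{
    var gaps = new List<(int Start, int End)>();
    int cursor = reference.Start;                                   // earliest instant not yet known to be covered
    foreach (var m in matches.OrderBy(x => x.Start))
    {
        if (m.End <= cursor || m.Start >= reference.End) continue;  // already covered or outside the reference
        if (m.Start > cursor) gaps.Add((cursor, m.Start));          // uncovered stretch before this match
        cursor = Math.Max(cursor, m.End);
        if (cursor >= reference.End) break;
    }
    if (cursor < reference.End) gaps.Add((cursor, reference.End));  // uncovered tail
    return gaps;
}

var uptime = (Start: 0, End: 60);   // reference (left) stream: machine uptime edge event, closed at minute 60
var observed = new List<(int Start, int End, string Status)>
{
    (0, 10, "ok"), (10, 20, "ok"), (20, 30, "ok"),
    (30, 40, "error"),              // present, but fails the join predicate
    (45, 55, "ok"), (50, 60, "ok"),
};

var matches = observed.Where(o => o.Status == "ok").Select(o => (o.Start, o.End));
var gaps = AntiJoinGaps(uptime, matches);
Console.WriteLine(string.Join(", ", gaps));   // (30, 45)
```

The gap from 30 to 40 comes from a payload mismatch, case (2). The gap from 40 to 45 comes from no event at all, case (1). The model reports both as one anomalous stretch.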

12.2 Event Kinds

You can enqueue only two kinds of events into a StreamInsight query from the input adapter. You do not tag these events explicitly. The kind is determined by which Adapter API method you use.

INSERT event: This is the kind of event that is enqueued when you choose one of the above event shapes, populate the payload structure, and call the Enqueue() method in your input adapter program.

CTI event: First, know this much: if no CTI events are enqueued into a query, the query will NOT generate any output. A CTI is a special punctuation event. It is enqueued when you call the EnqueueCtiEvent() method in your input adapter program, or when you define CTI enqueuing declaratively. We will explore the significance and purpose of this event in the next section.
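A deliberately simplified model in plain C# illustrates why a query without CTIs stays silent. It is not StreamInsight's scheduler, and Insert, Cti, pending, and released are hypothetical names. It relies only on the CTI promise that no subsequently enqueued INSERT event will start before the CTI timestamp. For a pass-through query, events that start before the latest CTI are therefore final and can be handed downstream. An event that breaks the promise is rejected, as StreamInsight would treat it as a CTI violation.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var pending = new List<int>();    // start times of enqueued INSERT events not yet released
var released = new List<int>();   // what a downstream consumer would see
int? lastCti = null;

void Insert(int start)
{
    // A CTI at time t promises that no later INSERT starts before t.
    if (lastCti is int cti && start < cti)
        throw new InvalidOperationException($"CTI violation: start {start} < CTI {cti}");
    pending.Add(start);
}

void Cti(int t)
{
    lastCti = t;
    // Simplified release rule for a pass-through query: no event starting before t can
    // still arrive, so events that start before t are final and handed downstream in order.
    released.AddRange(pending.Where(s => s < t).OrderBy(s => s));
    pending.RemoveAll(s => s < t);
}

Insert(3); Insert(1); Insert(7);
Console.WriteLine($"released without any CTI: {released.Count}");                 // 0
Cti(5);
Console.WriteLine($"released after CTI at 5: {string.Join(", ", released)}");   // 1, 3
```

Without the call to Cti, nothing ever leaves the pending buffer, which mirrors the warning above in miniature.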

12.3 Event “CRUD”

Of the familiar CRUD (Create, Read, Update, Delete) operations, only Create will resonate with users familiar with persisted storage or messaging systems – the other operations are distinctly different in a streaming system.

Create: You use CreateInsertEvent() to create an INSERT event of a given shape, populate its application timestamp(s) and payload fields, and enqueue the event.

Read: Recall from our introduction that the streaming model is very different from the query–response model seen in persisted systems. There is no equivalent of a “row/tuple store” from which you can selectively read an event based on a tuple identifier. The only way you’d “read” the exact event that you enqueued into a query is at the output of the query, assuming the query is a straight pass-through, as in:

```csharp
var query = from e in inputStream select e;
```


Update: StreamInsight events are immutable. Once an event is enqueued, there is no way to reach into the system and modify or delete that specific event in place in the stream.

The only event model that supports a semblance of an update is the edge event. This is demonstrated in Figure 23 (Edge Event) and in the table below. You can enqueue a start edge event with an EndTime of ∞, and then submit an end edge event with the modified end time. You can then submit another start edge, with no gap in time, carrying the new payload value. A sequence of such start and end edge events mimics the behavior of a step function, or of an analog-to-digital (A/D) converter.

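| EventKind | Event | Edge  | StartTime | EndTime | Payload | Comment                                  |
|-----------|-------|-------|-----------|---------|---------|------------------------------------------|
| INSERT    | e1    | Start | 12:01     | ∞       | 1       |                                          |
| INSERT    | e1    | End   | 12:01     | 12:03   | 1       | Same start time and payload value as e1  |
| INSERT    | e2    | Start | 12:03     | ∞       | 2       |                                          |
| INSERT    | e2    | End   | 12:03     | 12:06   | 2       | Ditto for e2                             |
| INSERT    | e3    | Start | 12:06     | ∞       | 1       |                                          |
| INSERT    | e3    | End   | 12:06     | 12:08   | 1       |                                          |
| INSERT    | e4    | Start | 12:08     | ∞       | 3       |                                          |
| INSERT    | e4    | End   | 12:08     | 12:11   | 3       |                                          |
| INSERT    | e5    | Start | 12:11     | ∞       | 2       |                                          |
| INSERT    | e5    | End   | 12:11     | 12:13   | 2       |                                          |
| INSERT    | e6    | Start | 12:13     | ∞       | 4       |                                          |

and so on…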

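The edge sequence in the table can be derived mechanically from (time, value) samples of a signal. This is a plain C# sketch, not the StreamInsight adapter API, and the tuple layout and the At helper are hypothetical. When the value changes, the sketch closes the open edge with an End edge that repeats its start time and payload. It then immediately opens a Start edge for the new value at the same instant, so the step function has no gaps. A repeated value (the 12:04 sample below) produces no edges.

```csharp
using System;
using System.Collections.Generic;

TimeSpan At(int h, int m) => new TimeSpan(h, m, 0);

// Hypothetical samples of a signal; a value holds until a sample with a different value arrives.
var samples = new List<(TimeSpan Time, int Value)>
{
    (At(12, 1), 1), (At(12, 3), 2), (At(12, 4), 2), (At(12, 6), 1),
    (At(12, 8), 3), (At(12, 11), 2), (At(12, 13), 4),
};

var edges = new List<(string Edge, TimeSpan Start, TimeSpan? End, int Payload)>();
(TimeSpan Start, int Payload)? current = null;   // the edge event that is currently open

foreach (var (time, value) in samples)
{
    if (current.HasValue && current.Value.Payload == value)
        continue;                                               // unchanged value: keep the open edge
    if (current.HasValue)                                       // close it with the same start time and payload
        edges.Add(("End", current.Value.Start, time, current.Value.Payload));
    edges.Add(("Start", time, null, value));                    // end time not known yet
    current = (time, value);
}

foreach (var e in edges)
{
    string start = e.Start.ToString(@"hh\:mm");
    string end = e.End.HasValue ? e.End.Value.ToString(@"hh\:mm") : "(open)";
    Console.WriteLine($"{e.Edge,-5}  start {start}  end {end,-6}  payload {e.Payload}");
}
```

Run against these samples, the sketch emits the same eleven edges as the table, from the e1 Start edge through the still-open e6 Start edge.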

Inside the query processing engine, StreamInsight also supports RETRACT and EXPAND events. These events are not exposed to external users. The only time you will encounter them is while examining a query plan in the Event Flow Debugger. Event retraction and expansion, and the related operators responsible for cleaning up these events (CLEANSEINPUT and CLEANSE), are outside the scope of this document.

Delete: As with Update, there is no concept of deleting an enqueued event. The closest equivalent to a Delete operation is the ReleaseEvent() method. You invoke it in the output adapter after you dequeue an event and copy its results into the event structure in your local/managed memory.

This short digression on event models and structures is a prelude to the main topic of this chapter: understanding stream behavior and query advancement. This is also the final step to consider in completing the query development process.