1. Introduction

Enterprises have always been under pressure to reduce the lag between acquiring data and acting on it. Over the past decade, the expansion of free markets has created a massive user base for consumer and business technology – everyone from a billionaire in Seattle to a street hawker in Bangalore has a mobile phone. The Web owns our eyeballs – in six short years, online advertising and its adjoining business models have overtaken traditional media spend. The next generation of productivity and media products will be a seamless mash-up of a user’s office and personal life, with ubiquitous access to any content from any device. Competitive mandates in manufacturing and worldwide initiatives like Green IT have led to massive volumes of sensor-driven, machine-born data. Utility computing platforms like Windows Azure aim to eliminate the technology barriers of scale and economics. These irreversible trends portend a scenario where the success of an enterprise depends on its ability to respond efficiently to every stimulus from its customers and partners. Such a competitive enterprise – one that seeks a 24x7, 360° view of its opportunities – will face a torrent of high-volume, high-velocity business and consumer data.

The singular impact of these trends on traditional IT applications is to elevate the awareness of time. In the attention-deficit world described above, every arriving data point is perishable – it represents a customer touch-point, an abnormal sensor reading, or a trend point whose business value rapidly diminishes with time. For a growing number of applications, this necessitates a shift in mindset: we want to respond to an opportunity as soon as it presents itself, based on insight built incrementally over time, with a response of optimal – if not always the best – fidelity or returns. In other words, it necessitates a shift towards event-driven processing, where you treat each arriving data point as an event – something that happened at a point, or over a period, in time – and apply your business logic to this event to respond in a time-sensitive manner to the opportunity it represents.

This model suits a growing number of applications that demand low latency (sub-second response times) and high throughput (on the order of 100K events per second). Examples include financial services (risk analytics, algorithmic trading), industrial automation (process control and monitoring, historian-based process analysis), security (anomaly/fraud detection), web analytics (behavioral targeting, clickstream analytics, customer relationship management), and business intelligence (reporting, predictive analytics, data cleansing), among others.

StreamInsight is a .NET-based platform for the continuous and incremental processing of unending sequences of such events from multiple sources with near-zero latency. It is a temporal query processing engine that enables an application paradigm where a standing (i.e., potentially infinitely running) query processes these moving events over windows of time. Processing can range from simple aggregations, to correlation of events across multiple streams, to detection of event patterns, to building complex time series and analytical models over streaming data. The StreamInsight programming model enables you to define these queries in declarative LINQ, along with the ability to seamlessly integrate the results into your procedural logic written in C#. The goal of this paper is to help you learn, hands-on, how to write declarative queries in LINQ with a clear understanding of reasoning with windows of time.
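To make this concrete, the following is a minimal sketch of what such a standing query looks like in LINQ. The stream name `readings`, the payload field `Value`, and the 5-second window size are hypothetical, and the exact window-operator signatures vary slightly between StreamInsight versions:

```csharp
// Assume 'readings' is a StreamInsight stream of sensor events,
// each event carrying a numeric Value field in its payload.
// This standing query computes the average Value over consecutive,
// non-overlapping (tumbling) 5-second windows of application time.
var avgPerWindow = from win in readings.TumblingWindow(TimeSpan.FromSeconds(5))
                   select new { AvgValue = win.Avg(e => e.Value) };
```

Unlike a database query, this query never completes: each time a 5-second window closes, it emits one output event with that window's average, and the engine retains only the state needed for the windows currently in flight.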

In this paper, we provide a hands-on, developer’s introduction to Microsoft StreamInsight queries. The paper has two goals: (1) To help you think through stream processing in simple, layered levels of understanding, complementing the product documentation. (2) To reinforce this learning through examples of various use cases, so that you can design the query plan for a particular problem and compose the LINQ query. This ability for top-down composition of a LINQ query, combined with a bottom-up understanding of the query model, will help you build rich and powerful streaming applications. As you read this paper, please study the actual code examples provided to you in the form of a Visual Studio 2010 solution (HitchHiker.sln).

StreamInsight is an in-memory event processing engine that ships as part of SQL Server 2012 and is also available separately from the Microsoft Download Center. It requires the full .NET Framework 4.0, a C# 3.0 compiler, and Windows 7. The engine leverages the CLR for its type system and the .NET Framework for runtime, application packaging, and deployment. The choice of Visual Studio gives you the benefit of productivity features like IntelliSense, and an event flow debugger rounds out the developer experience. A working knowledge of C# and LINQ is assumed.