VisionHub is the system that lets GroupM analyze how its agencies' advertising campaigns are performing in the real world, and helps them provide ROI figures to their advertisers. By aggregating the information supplied by different AdServers in different formats, and building a data warehouse and a querying system on top of it, we give the business the insight it needs quickly and easily. Users can take action faster than ever and stay on top of the market at all times.
GroupM had a production system with an aging architecture and a convoluted processing pipeline. An error in a single file could halt file ingestion for the entire system, and any modification was cumbersome and risky. The only way to query the available information was static and did not provide the required level of detail.
VisionHub leverages the Azure platform: an Azure Worker role downloads files into Azure Blob Storage, where they are ingested into a Data Warehouse that also resides in Azure Blob Storage. All query operations are managed from an Azure Website UI, which relays those queries to one or more HDInsight clusters. This ensures the architecture can be easily scaled to suit the business needs at any time.
VisionHub uses HDInsight in two ways: permanent clusters ingest data into the system as it becomes available and execute simple data extraction queries, while an on-demand cluster system lets a user run costly data processing queries over vast amounts of data.
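The split between permanent and on-demand clusters can be pictured as a simple routing decision. The following is a minimal sketch, not VisionHub's actual logic: the `Query` type, the `estimated_scan_gb` field, and the 500 GB threshold are all hypothetical, since the source does not say how a query is classified as costly.

```python
from dataclasses import dataclass

# Hypothetical threshold: queries expected to scan more than this many
# gigabytes go to an on-demand cluster. The real criterion is not stated.
ON_DEMAND_THRESHOLD_GB = 500.0


@dataclass(frozen=True)
class Query:
    name: str
    estimated_scan_gb: float  # assumed to be supplied by the caller


def choose_cluster(query: Query, threshold_gb: float = ON_DEMAND_THRESHOLD_GB) -> str:
    """Return "permanent" for cheap queries and "on_demand" for costly ones."""
    if query.estimated_scan_gb < 0:
        raise ValueError("estimated_scan_gb must be non-negative")
    if query.estimated_scan_gb > threshold_gb:
        return "on_demand"
    return "permanent"


if __name__ == "__main__":
    for q in (Query("daily_extract", 12.0), Query("yearly_rollup", 4200.0)):
        print(q.name, "->", choose_cluster(q))
```

Keeping the decision in one small function means the threshold, or the whole classification rule, can change without touching the code that submits jobs.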
Storing all the data in Azure Blob Storage lets VisionHub keep years of data at a very low cost, and also provides replication and easy access for ingestion and processing with HDInsight.
The data ingestion system can be easily modified to suit the changing needs of the business: new rows, schema changes, etc.
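One way to make ingestion tolerant of schema changes, and to keep a single bad file from halting everything, is to map each file onto a canonical set of columns and record failures per file. This is an illustrative sketch only: the column names, the CSV input format, and the `normalize_rows` and `ingest` helpers are assumptions, not details taken from VisionHub.

```python
import csv
import io

# Hypothetical canonical schema; the real VisionHub columns are not given.
CANONICAL_COLUMNS = ("campaign_id", "date", "impressions", "clicks")


def normalize_rows(text, columns=CANONICAL_COLUMNS):
    """Parse CSV text and map each row onto the canonical columns.

    Columns missing from the file become None; unknown columns are ignored.
    """
    reader = csv.DictReader(io.StringIO(text))
    if not reader.fieldnames:
        raise ValueError("file has no header row")
    return [{col: row.get(col) for col in columns} for row in reader]


def ingest(files, columns=CANONICAL_COLUMNS):
    """Ingest a mapping of file name -> CSV text.

    A failure in one file is recorded and does not stop the others.
    """
    rows, failures = [], {}
    for name, text in files.items():
        try:
            rows.extend(normalize_rows(text, columns))
        except Exception as exc:  # broad on purpose: isolate any per-file failure
            failures[name] = str(exc)
    return rows, failures
```

With this shape, supporting a new column means extending `CANONICAL_COLUMNS`; older files that lack the column still load, with `None` in its place.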
MASSIVE AMOUNTS OF DATA
The current Data Warehouse holds more than 150 TB.
Users can run simple data extraction jobs, or massive queries that summarize the information from millions of rows into only a few.
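The idea of summarizing many rows into a few can be shown with a small in-memory aggregation. This is a conceptual sketch, not how VisionHub runs its queries (those execute on HDInsight); the `summarize` function and the `campaign_id`, `impressions`, and `clicks` field names are hypothetical.

```python
from collections import defaultdict


def summarize(rows, key="campaign_id", metrics=("impressions", "clicks")):
    """Collapse detail rows into one total per distinct value of `key`."""
    totals = defaultdict(lambda: dict.fromkeys(metrics, 0))
    for row in rows:
        bucket = totals[row[key]]
        for metric in metrics:
            bucket[metric] += row[metric]
    return dict(totals)


if __name__ == "__main__":
    detail = [
        {"campaign_id": "A", "impressions": 100, "clicks": 3},
        {"campaign_id": "A", "impressions": 250, "clicks": 9},
        {"campaign_id": "B", "impressions": 40, "clicks": 1},
    ]
    for campaign, total in summarize(detail).items():
        print(campaign, total)
```

However many detail rows go in, the output has one entry per campaign, which is the shape of result a summary query produces.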
EASY TO UNDERSTAND
The UI allows the user to execute complex queries by selecting a few key values.