To learn more about Datastream, visit docs.chartbeat.com/datastream. To request a call with our sales team, click the 'Get Datastream' button on this page and enter your details in the contact form.
What is Datastream?
When readers, customers, or prospects engage with your site online, they create user interaction data. User interaction data is data about site visits, content consumption, mobile app usage, and any other interactions related to your site coming from a connected device.
Datastream, Chartbeat’s raw data pipeline, offers real-time access to the user interaction data collected from all visitor interactions on a page.
With Datastream, you’ll have a detailed history of user-level interactions with your content, from the number of pages someone viewed in a session to how they engaged with a page and how far they scrolled through a particular article.
Unlike with an API, you do not need to pull the data to access it; instead, Chartbeat sets up the pipeline and transfers the data to your warehouse, so you can use it as you wish and own it in perpetuity.
Why do data-driven businesses need Datastream?
User interaction data is relevant and important to any business, since it allows you to connect user interactions to company-wide goals. However, this data usually only exists behind a proprietary analytics dashboard showing aggregated user data, like totals and averages.
For the most part, you can export these aggregated data sets to combine with other data sources via APIs, but you won’t get the user-level intel needed to transform company strategies, build next-generation products and features, and personalize customer experiences at scale.
Datastream frees you from the resources, time, and money needed to build an in-house data collection system. Because Chartbeat already runs an analytics service for thousands of high-traffic sites around the world, we can offer our data pipeline at an accessible, volume-based price.
Building in-house infrastructure to ingest user-level data from your sites and apps can take anywhere from 6 to 12 months. With Datastream, this becomes possible in a matter of minutes.
What makes Datastream unique?
The data is yours.
Datastream delivers all your raw, unsampled data straight to your Amazon S3 or Google Cloud Storage bucket. All interaction events from your users, sites, and apps are sent to you in real time, with second-by-second measurements.
Unique content metrics.
Datastream offers an exceptionally rich dataset of content consumption and engagement signals to combine with session-based metrics, including Engaged Time and Scroll Depth. Our distributed audience segments, including Apple News, Facebook Instant Articles, Google AMP, and your own native app, are unmatched anywhere in the market.
Raw data for complete versatility.
Only raw event data (with one event per user interaction) can enable the most granular level of analysis. With Datastream, you have access to every event, so you can analyze it, transform it, alert on it, combine it with other sources, or transfer it somewhere else.
The market leader in audience analytics.
Chartbeat’s data collection stack has provided real-time and historical content analytics for the world’s top publishers and content creators for over a decade.
Convenient pricing.
Datastream pricing is affordable and straightforward. Pay only for the volume you need, with no complicated pricing structures or hidden storage costs.
Use Cases
Datastream can be used by any team that finds value in real-time, user-level web and mobile engagement data, including:
- Data Analysts and Business Intelligence Teams
- Data Scientists and Personalization Teams
- Product, Marketing, and Revenue Teams
For Data Analysts and Business Intelligence Teams
To help guide other departments, like editorial, marketing, product, or audience development, many data analysis and business intelligence teams run internal analytics products to surface data from multiple data sources in ways that third-party analytics products do not.
To do so, customers often integrate our data pipeline with their own ETL process before loading the data into Amazon Redshift or BigQuery. Then, they use a tool like Tableau, Looker, or an internal solution to provide data exploration and dashboard interfaces for their teams.
Only raw, user-level data allows BI teams to understand their users at a granular level. By integrating site interaction data with first-party data, subscriptions data, CMS data, Google/Adobe data, and virtually any other data source, BI teams can create custom analyses that help inform company-wide goals. Here are just a few of the metrics included in Datastream:
- Rich time-based data: e.g., engagement data and scroll depth
- Multi-platform data: teams looking to visualize their traffic can use Datastream to pull in data from a variety of platforms:
  - Apple News
  - Google AMP
  - Facebook Instant Articles
  - Your own native app
For Data Science Teams
Datastream provides unique, high-quality engagement data that is easily linkable to other data used in user-level analysis.
Data science use cases:
- Understand engagement at a granular level
- Conduct user journey analysis
- Audience segmentation
- Behavioral modeling
- Funnel analysis
- Conversion and subscription attribution
With Datastream, you can use interactive data exploration environments such as:
- Python and pandas, with Jupyter Notebooks
- RStudio, for R users
- scikit-learn, for machine learning
Datastream’s data formats are optimized for bulk loading and streaming into Amazon Redshift and Google BigQuery. SQL experts on your team can use a tool like Periscope to query the raw data.
For Personalization Teams
For data science teams working on personalizing their site, or building recommendation engines to better drive engagement for loyal users, Datastream provides unique engagement data that is easily linkable to other data used in personalization algorithms and models.
For Data Engineering Teams
Datastream provides a real-time pipeline of clean, easy-to-use traffic data that can be imported into your data infrastructure without any additional extraction or transformation.
Data engineering teams commonly use Datastream data by loading it into a Google BigQuery or Amazon Redshift data warehouse for querying.
For Product, Marketing and Revenue Teams
Raw data can be a vital tool to improve your product and more accurately target users. Datastream extends your existing customer analytics with user-level attributes such as:
- Platform
- Device
- City/Region/Country
- Referrer
- Engaged time
- Articles touched
- Recency & Frequency
- Scroll depth
- Subscriber status
Because the Datastream feed is at the user level, you can analyze and act upon every interaction on your site, allowing your team to do things like:
Targeted Marketing
As a marketer, you might like to provide targeted promotions to guest readers to entice them to subscribe. For instance, readers who frequent cooking and dining-related articles and have higher-than-average engaged times on those articles may be offered a free cookbook with their subscription, delivered as a Kindle book if the reader most often reads on a tablet or smartphone.
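A minimal sketch of how such a segment might be selected, assuming Datastream rows have already been parsed into dictionaries keyed by the column names in the sample file (`cookie_id`, `section`, `device`, `engaged_time_on_page_seconds`). The section labels, the sample rows, and the "above the overall mean" threshold are illustrative assumptions, not a Chartbeat-defined method.

```python
from collections import Counter, defaultdict
from statistics import mean

# Hypothetical parsed rows; all values are invented for illustration.
rows = [
    {"cookie_id": "A", "section": "cooking,recipes", "device": "tablet",
     "engaged_time_on_page_seconds": "90"},
    {"cookie_id": "A", "section": "cooking", "device": "tablet",
     "engaged_time_on_page_seconds": "70"},
    {"cookie_id": "B", "section": "dining", "device": "desktop",
     "engaged_time_on_page_seconds": "20"},
    {"cookie_id": "C", "section": "news,local", "device": "mobile",
     "engaged_time_on_page_seconds": "300"},
]

FOOD_SECTIONS = {"cooking", "dining"}  # assumed section labels


def promotion_targets(rows):
    """Return {cookie_id: most-used device} for readers whose mean engaged
    time on food articles is above the mean over all food page views."""
    food = [
        r for r in rows
        if {s.strip() for s in r["section"].split(",")} & FOOD_SECTIONS
    ]
    if not food:
        return {}
    overall = mean(int(r["engaged_time_on_page_seconds"]) for r in food)
    times = defaultdict(list)
    devices = defaultdict(Counter)
    for r in food:
        times[r["cookie_id"]].append(int(r["engaged_time_on_page_seconds"]))
        devices[r["cookie_id"]][r["device"]] += 1
    return {
        cookie_id: devices[cookie_id].most_common(1)[0][0]
        for cookie_id, seconds in times.items()
        if mean(seconds) > overall
    }


targets = promotion_targets(rows)
print(targets)
```

The device returned for each reader could then drive the choice of offer format, such as an e-book for tablet and smartphone readers.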
Article Recommendations
As Head of Customer Engagement, you might like to show readers links to articles similar to those they have engaged with in the past, to entice them to continue reading and increase the time they spend on the site. To do this, you need to know which articles (by author, section, or content type) a user has engaged with most in a recent time period, and on what device. You may want to construct different recommendation sets depending on the device the reader is using and the geographic location they are coming from. Additionally, the recommendations may differ based on the initial page view source (e.g., Search vs. Social), as this indicates ‘intent’.
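One way the input to such a recommender might be prepared is sketched here, assuming parsed rows keyed by the sample file's column names. The `since` cutoff, the sample values, and ranking by summed engaged seconds are assumptions made for illustration only.

```python
from collections import defaultdict

# Hypothetical parsed rows; all values are invented for illustration.
rows = [
    {"cookie_id": "A", "device": "desktop", "section": "news,local",
     "engaged_time_on_page_seconds": "40", "last_ping_timestamp": "1000"},
    {"cookie_id": "A", "device": "desktop", "section": "sports",
     "engaged_time_on_page_seconds": "100", "last_ping_timestamp": "1100"},
    {"cookie_id": "A", "device": "desktop", "section": "news",
     "engaged_time_on_page_seconds": "30", "last_ping_timestamp": "1200"},
    {"cookie_id": "A", "device": "mobile", "section": "sports",
     "engaged_time_on_page_seconds": "10", "last_ping_timestamp": "1300"},
    {"cookie_id": "A", "device": "desktop", "section": "opinion",
     "engaged_time_on_page_seconds": "500", "last_ping_timestamp": "100"},
]


def top_sections(rows, since=0, n=2):
    """Map (cookie_id, device) to the n sections with the most engaged
    seconds among page views whose last ping is at or after `since`."""
    totals = defaultdict(lambda: defaultdict(int))
    for row in rows:
        if int(row["last_ping_timestamp"]) < since:
            continue  # outside the recent window
        key = (row["cookie_id"], row["device"])
        seconds = int(row["engaged_time_on_page_seconds"])
        for section in row["section"].split(","):
            if section.strip():
                totals[key][section.strip()] += seconds
    ranked = {}
    for key, by_section in totals.items():
        # Sort by engaged seconds (descending), then name for stable ties.
        ordered = sorted(by_section.items(), key=lambda item: (-item[1], item[0]))
        ranked[key] = [name for name, _ in ordered[:n]]
    return ranked


recommendation_inputs = top_sections(rows, since=500)
print(recommendation_inputs)
```

Keying the result by device as well as reader keeps separate recommendation sets per device, as the paragraph above suggests.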
Advertising Placement
As an Advertising Optimization analyst, you might be charged with determining which ads will be shown where within the context of an article. In order to do this, you’ll want to know the average scroll depth for different reader segments, where the segments might be based on engaged time per section, per author, recency, frequency, geo, device, subscriber status, etc.
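The segment averages described above could be computed along these lines. This is a sketch that assumes parsed rows keyed by the sample file's column names and assumes `scrolldepth` holds a depth in pixels, as the sample row suggests; the sample values are invented.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical parsed rows; all values are invented for illustration.
rows = [
    {"device": "desktop", "section": "news", "scrolldepth": "768"},
    {"device": "desktop", "section": "news", "scrolldepth": "1232"},
    {"device": "mobile", "section": "news", "scrolldepth": "600"},
    {"device": "mobile", "section": "sports", "scrolldepth": ""},
]


def mean_scroll_depth(rows, segment_fields):
    """Return the mean scroll depth for each combination of segment_fields."""
    groups = defaultdict(list)
    for row in rows:
        if not row["scrolldepth"]:
            continue  # skip rows with no scroll measurement
        key = tuple(row[field] for field in segment_fields)
        groups[key].append(float(row["scrolldepth"]))
    return {key: mean(values) for key, values in groups.items()}


by_device = mean_scroll_depth(rows, ("device",))
by_device_and_section = mean_scroll_depth(rows, ("device", "section"))
print(by_device)
```

Passing a different tuple of column names (for example recency or frequency) changes the segmentation without changing the function.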
How it Works
Datastream is a raw data pipeline that delivers real-time, user-level data from visitor interactions on a page, streamed to your Amazon S3 or Google Cloud Storage bucket.
Our feed contains over 50 fields of data. These can be categorized into four broad groups:
- Engagement: Chartbeat’s best-in-class engagement metrics, such as engaged time, time on page, scroll depth, and page and browser geometry.
- Data about the page: Data points related to the identity of the page, such as the path, title, section and author, content type, platform, and sponsor data associated with each page view.
- Data about the user: For user-level analysis, a unique ID, the browser’s user agent string, frequency, and recency.
- Timestamp: The time the visitor visited the page, the time they left the page, and the user's time zone.
Note: Unique IDs are unique to a given user, in a given browser, on a given website. Chartbeat only uses a first-party cookie, so our IDs cannot be used to track a user between sites. Customers who want to track users between sites can pass us an ID and perform user journey analysis on their backend systems.
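A sketch of what cross-site journey analysis on your own backend might look like, under the assumption that the ID you pass to Chartbeat arrives in the `login_id` column shown in the sample file. That mapping, and all sample values, are assumptions for illustration.

```python
from collections import defaultdict

# Hypothetical parsed rows from two sites; all values are invented.
rows = [
    {"login_id": "u-1", "domain": "site-a.com", "path": "site-a.com/news/1",
     "last_ping_timestamp": "300"},
    {"login_id": "u-1", "domain": "site-b.com", "path": "site-b.com/sports/9",
     "last_ping_timestamp": "200"},
    {"login_id": "", "domain": "site-a.com", "path": "site-a.com/",
     "last_ping_timestamp": "250"},
    {"login_id": "u-2", "domain": "site-b.com", "path": "site-b.com/",
     "last_ping_timestamp": "100"},
]


def cross_site_journeys(rows):
    """Group page views by login_id and order each group by last ping time."""
    views_by_login = defaultdict(list)
    for row in rows:
        if row["login_id"]:  # rows without an ID cannot be linked across sites
            views_by_login[row["login_id"]].append(row)
    return {
        login_id: [
            view["path"]
            for view in sorted(views, key=lambda v: int(v["last_ping_timestamp"]))
        ]
        for login_id, views in views_by_login.items()
    }


journeys = cross_site_journeys(rows)
print(journeys)
```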
Platforms supported in Datastream include:
- Web
- Google AMP
- Facebook Instant Articles
- Apple News
- Your own native app
Chartbeat’s Datastream Reporting supports exporting data to the following data storage platforms:
- Amazon Web Services
- Google Cloud Storage
Datastream specifications and formats
File Format: CSV, one row per page view logged by Chartbeat, written once the page session has expired
Compression Type: GZIP
Delimiters: pipe-separated
Character Encoding: UTF-8
Example File Naming Convention: rawdata/YYYY/MM/DD/h/[00|30]/[epoch timestamp].[file hash].csv.gz
Data Batch Interval: by minute
Delivery Frequency: by minute
Delivery Destination: Amazon S3 or GCS bucket with shared read/write permissions
Note: Files are created every minute, with each minute’s files representing the users whose page views ended in that minute.
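Given the specifications above (GZIP compression, pipe delimiters, UTF-8), a delivered file could be read with Python's standard library roughly as follows. This sketch assumes the first row of each file is a header; if your files have no header row, pass the column names through the hypothetical `fieldnames` argument. The demo file and its three columns are invented for illustration.

```python
import csv
import gzip
import os
import tempfile


def read_datastream_file(path, fieldnames=None):
    """Yield one dict per row from a gzip-compressed, pipe-delimited,
    UTF-8 encoded CSV file."""
    with gzip.open(path, mode="rt", encoding="utf-8", newline="") as handle:
        yield from csv.DictReader(handle, fieldnames=fieldnames, delimiter="|")


# Demo: write a tiny stand-in file, then read it back.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "demo.csv.gz")
    with gzip.open(demo_path, mode="wt", encoding="utf-8", newline="") as out:
        out.write("host|path|device\n")
        out.write("mysite.com|mysite.com/news/1|desktop\n")
    demo_rows = list(read_datastream_file(demo_path))

print(demo_rows)
```

Because the function is a generator, large files are processed one row at a time rather than loaded into memory at once.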
Download a sample data file
Click here to download a sample data CSV file attachment, or preview a row of pageview data below.
distribution|last_ping_timestamp|host|cookie_id|page_session_id|domain|path|new_user|device|engaged_time_on_page_seconds|page_width|page_height|max_scroll_position_top|window_height|external_referrer|no_client_storage|city_name|region_name|country_code|country_name|continent_name|dma_code|utc_offset_minutes|user_agent|recency|frequency|internal_referrer|author|section|content_type|sponsor|utm_campaign|utm_medium|utm_source|utm_content|utm_term|account_id|page_title|virtual_page|scrolldepth|total_time_on_page_seconds|ga_client_id|login_id|id_sync|subscriber_acct|page_load_time
SITE|1571314031|mysite.com|M_qCECKGCqIcP9a3|Ffe3bT84JvqrIDfBDS+w5FgVyRY=|mysite.com|mysite.com/news/3977611002|false|desktop|5|1366|768.0|0.0|768||false|Brooklyn|New York|US|United States|North America|501|-240|Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/77.0.3865.90 Safari/537.36|1|16|mysite.com/|no author|news,local|how-to|||||||REMOVED|How to become a writer for your local newspaper|false|768|74|1236546315.5527916466||"{""clientId"":""62d8fbb2-0060-1cfd-a004-a6f56c0dc7a4"",""anonymousId"":""46e7a61c3a0d3208cf504ff859008b70"",""userMeterState"":""3""}"||784
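As an illustration, the header and sample row above can be parsed with Python's standard `csv` module. The sketch also decodes the JSON stored in `id_sync` and converts `last_ping_timestamp` to local time, assuming the timestamp is in epoch seconds and `utc_offset_minutes` is the visitor's offset from UTC; both interpretations are inferred from the sample values rather than stated in this document.

```python
import csv
import io
import json
from datetime import datetime, timedelta, timezone

HEADER = (
    'distribution|last_ping_timestamp|host|cookie_id|page_session_id|domain|path|'
    'new_user|device|engaged_time_on_page_seconds|page_width|page_height|'
    'max_scroll_position_top|window_height|external_referrer|no_client_storage|'
    'city_name|region_name|country_code|country_name|continent_name|dma_code|'
    'utc_offset_minutes|user_agent|recency|frequency|internal_referrer|author|'
    'section|content_type|sponsor|utm_campaign|utm_medium|utm_source|utm_content|'
    'utm_term|account_id|page_title|virtual_page|scrolldepth|'
    'total_time_on_page_seconds|ga_client_id|login_id|id_sync|subscriber_acct|'
    'page_load_time'
)
ROW = (
    'SITE|1571314031|mysite.com|M_qCECKGCqIcP9a3|Ffe3bT84JvqrIDfBDS+w5FgVyRY=|'
    'mysite.com|mysite.com/news/3977611002|false|desktop|5|1366|768.0|0.0|768||false|'
    'Brooklyn|New York|US|United States|North America|501|-240|'
    'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_5) AppleWebKit/537.36 '
    '(KHTML, like Gecko) Chrome/77.0.3865.90 Safari/537.36|1|16|mysite.com/|'
    'no author|news,local|how-to|||||||REMOVED|'
    'How to become a writer for your local newspaper|false|768|74|'
    '1236546315.5527916466||'
    '"{""clientId"":""62d8fbb2-0060-1cfd-a004-a6f56c0dc7a4"",'
    '""anonymousId"":""46e7a61c3a0d3208cf504ff859008b70"",'
    '""userMeterState"":""3""}"||784'
)

# The csv module handles the quoted id_sync field and its doubled quotes.
reader = csv.DictReader(io.StringIO(HEADER + "\n" + ROW + "\n"), delimiter="|")
record = next(reader)

# id_sync carries a JSON object once the CSV quoting is removed.
id_sync = json.loads(record["id_sync"])

# Assumed: epoch seconds plus a UTC offset in minutes.
local_tz = timezone(timedelta(minutes=int(record["utc_offset_minutes"])))
last_ping = datetime.fromtimestamp(int(record["last_ping_timestamp"]), tz=local_tz)

print(record["page_title"])
print(id_sync["clientId"])
print(last_ping.isoformat())
```

Every value arrives as a string, so numeric columns such as `engaged_time_on_page_seconds` need explicit conversion before analysis.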
Map Chartbeat data with other data sources
Dimensions and Metrics included
Datastream exports all of the following dimensions and metrics to clients. Each row of data in a Datastream file represents a completed user session, where a session corresponds to the time a visitor spent on a single page before either going to a new page or leaving your website.
Unless otherwise mentioned, data should be considered “raw”, meaning it has not been altered by Chartbeat.
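Because each row covers a single page, counting the pages a visitor viewed in one visit means grouping rows yourself. The sketch below groups rows by `cookie_id` and starts a new visit after 30 minutes of inactivity. The 30-minute gap is an assumed convention chosen for illustration, not a Chartbeat rule, and the sample rows are invented.

```python
from itertools import groupby

SESSION_GAP_SECONDS = 30 * 60  # assumed inactivity threshold

# Hypothetical parsed rows; all values are invented for illustration.
rows = [
    {"cookie_id": "A", "last_ping_timestamp": "1600"},
    {"cookie_id": "B", "last_ping_timestamp": "2000"},
    {"cookie_id": "A", "last_ping_timestamp": "1000"},
    {"cookie_id": "A", "last_ping_timestamp": "5000"},
]


def pages_per_visit(rows):
    """Return {cookie_id: [page count of visit 1, page count of visit 2, ...]}."""
    ordered = sorted(
        rows, key=lambda r: (r["cookie_id"], int(r["last_ping_timestamp"]))
    )
    result = {}
    for cookie_id, group in groupby(ordered, key=lambda r: r["cookie_id"]):
        counts, previous = [], None
        for row in group:
            timestamp = int(row["last_ping_timestamp"])
            # Start a new visit on the first row or after a long gap.
            if previous is None or timestamp - previous > SESSION_GAP_SECONDS:
                counts.append(0)
            counts[-1] += 1
            previous = timestamp
        result[cookie_id] = counts
    return result


visits = pages_per_visit(rows)
print(visits)
```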
Note: All data of type STRING is UTF-8 encoded.