How does the new path classification methodology differ from the previous approach?
Chartbeat’s previous approach to labeling a page as a landing page or an article relied on a combination of the page’s traffic metrics and attributes of the path itself. Rules based on path constructs and traffic patterns commonly seen for landing pages versus articles were applied to decide which label to assign. Because a page’s traffic metrics tend to change over time, that methodology allowed a page’s classification to be updated during its first 14 days of traffic, depending on its specific traffic patterns.
The new approach labels a path as a landing page or an article based on a much richer set of path attributes, without incorporating traffic metrics. The importance of each attribute in determining a path’s label was learned by training a machine learning model on a set of over two thousand representative paths that were manually labeled by Chartbeat’s data science team. On a test set of paths, the new approach labeled many more landing pages correctly than the old approach did, while true article pages continued to be labeled correctly just as often as before. The new approach will still get some labels wrong, but this should happen much less frequently than with the old system.
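The comparison described above (more landing pages labeled correctly, articles labeled correctly just as often) amounts to measuring, for each true label, the share of paths that received that label. The following is a minimal sketch of that measurement. The function name and the tiny data set are made up for illustration and are not Chartbeat's evaluation code or results.

```python
from collections import Counter


def per_class_recall(true_labels, predicted_labels):
    """Return, for each true label, the fraction of paths predicted correctly."""
    totals = Counter(true_labels)
    correct = Counter(
        t for t, p in zip(true_labels, predicted_labels) if t == p
    )
    return {label: correct[label] / totals[label] for label in totals}


# Toy, invented data purely for illustration (not Chartbeat results).
truth = ["landing", "landing", "landing", "landing",
         "article", "article", "article", "article"]
old_labels = ["article", "article", "landing", "article",
              "article", "article", "article", "landing"]
new_labels = ["landing", "landing", "landing", "article",
              "article", "article", "article", "landing"]

old_recall = per_class_recall(truth, old_labels)
new_recall = per_class_recall(truth, new_labels)
```

In this toy data the share of correctly labeled landing pages rises under the new labels while the share for articles stays the same, which is the pattern the paragraph above describes.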
Some examples of the types of pages that we now expect to see labeled as landing pages using the new approach include search result pages, marketplace listings, author profiles, and scoreboards.
What differences should Chartbeat users expect to see?
Users may notice that some types of pages that used to appear in the Top Stories component of the Real-time Dashboard because they were labeled as articles no longer appear when Show Landing Pages is not selected. This lets individual high-traffic articles surface that might otherwise have been pushed to a later page by a high-traffic landing page that was mislabeled as an article and ranked above them. Additionally, users may notice that pages once mislabeled as articles are now omitted from analyses in the Historical Dashboard.
In our Heads Up Display, users may notice that landing pages previously mislabeled as article pages will no longer have HUD pins assigned to them.
In Advanced Queries, users may see slight differences in their reports if pages with traffic in the last 90 days are assigned a different label using the new system. Chartbeat users may also notice small differences in their traffic source metrics for Direct versus Social. This would happen only for pages with a missing referrer whose labels change under the new system.
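To illustrate why a label change can shift traffic between Direct and Social, here is a minimal sketch that assumes visits with no referrer are attributed according to the page's label. The specific mapping used below (article to Social, landing page to Direct) is an assumption made for illustration, as are the function name and the "referred" placeholder. The text above only states that such pages are affected, not how.

```python
def classify_source(referrer, page_label):
    """Attribute a visit to a traffic source (illustrative assumption only)."""
    if referrer:
        # Visits that carry a referrer are not affected by the page label.
        return "referred"
    # ASSUMPTION: referrer-less visits depend on the page's label.
    return "social" if page_label == "article" else "direct"


# The same referrer-less visit, before and after a page is relabeled.
before = classify_source(None, "article")
after = classify_source(None, "landing")
```

Under this assumption, only visits with a missing referrer change category when a page's label changes. Visits with a referrer are classified the same way regardless of the label.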
While most users should see the new system's labels only on newly published articles going forward, some sites (those for which the data science team preemptively writes regular expression patterns) will also see articles that have had reader engagement in the past 90 days relabeled.
What should I do if I notice misclassified paths in my top pages data?
Customers who notice pages that they think are mislabeled should contact Technical Support, as usual, at firstname.lastname@example.org.
If your site's paths are being mislabeled and these paths follow a distinctive pattern, we can now easily change the labels on all such paths, for existing pages and pages with those patterns that may appear in the future.
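As a rough sketch of how pattern-based relabeling could work, the example below applies a list of regular expression overrides before falling back to the model's label. The patterns, the `OVERRIDES` list, and the `apply_overrides` function are hypothetical and shown only to illustrate the idea. They are not Chartbeat's actual rules or implementation.

```python
import re

# Hypothetical override rules: (compiled pattern, label). First match wins.
OVERRIDES = [
    (re.compile(r"^/search(/|\?|$)"), "landing"),      # search result pages
    (re.compile(r"^/authors?/[^/]+/?$"), "landing"),   # author profile pages
]


def apply_overrides(path, model_label):
    """Return the override label for a matching path, else the model's label."""
    for pattern, label in OVERRIDES:
        if pattern.search(path):
            return label
    return model_label


relabeled = apply_overrides("/search?q=election", "article")
unchanged = apply_overrides("/2024/05/some-story", "article")
```

Because the rules match on the path's pattern and not on a fixed list of pages, the same override applies both to existing pages and to any future page whose path follows that pattern.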