Published April 1, 2024 | Version v1
Journal article

A Topic-based Segmentation Model for Identifying Segment-Level Drivers of Star Ratings from Unstructured Text Reviews

Description

Online reviews provide rich information on customer satisfaction, combining numeric ratings with detailed written explanations. However, analyzing this data is challenging due to the unstructured nature of text. This paper introduces a novel machine-learning method for identifying interpretable key drivers of star ratings from text reviews, where the drivers may vary across segments. By adopting an Ising model prior that accounts for dependence between words, the model simultaneously achieves segmentation, identifies segment-level key topics (i.e., groups of frequently co-occurring words), and estimates the impacts of the selected words on the ratings. We first demonstrate on illustrative simulated review data that the proposed model successfully identifies segment-specific key drivers of customer satisfaction. We then apply it to real-world reviews from Yelp. When applied to online reviews of 5,241 Arizona-based restaurants, the model identifies three distinct restaurant segments, each characterized by three to five important topics. The model’s performance is evaluated against six benchmark models, encompassing various topic models and latent class regression with variable selection. The comparison results highlight the proposed model’s advantages in prediction, interpretability, and handling heterogeneity. Additionally, we demonstrate the applicability of the model to customer segmentation for individual restaurants.

Details

Title A Topic-based Segmentation Model for Identifying Segment-Level Drivers of Star Ratings from Unstructured Text Reviews
Authors
  • Kim, Sunghoon
  • Lee, Sanghak
  • McCulloch, Robert
Year of publication 2024