This article has been translated with DeepL.

NEW RESEARCH | Text analysis – a new method to capture innovations

Mustafa Bulut looks for innovations with text analysis.
Photo: Canva/Stockholm School of Economics.

Innovation is usually measured by patents and R&D expenditure. But these measures are far from capturing every type of innovation that takes place in companies. Mustafa Bulut at the Stockholm School of Economics has developed a new measure that captures a wider range of business innovation.

In your thesis, you have developed a new measure of innovation that is text-based. Tell us about it!

– The measure is based on text data. By analyzing the text of the annual reports published by companies each year, we can measure the extent to which companies discuss innovation-related activities. With this measure, we assume that the more companies engage in innovation, the more the annual report will contain discussions on innovative activities.

What types of innovations can be measured with text analysis?

– Text-based innovation is a broader measure of innovative activity within firms. It captures patents and R&D expenditure, as companies usually discuss these in their annual reports. In addition, it can also capture innovations that are not patented or recorded as R&D expenditure.

– For example, an improvement in production processes that increases efficiency, or a reorganization of the company that enables better use of resources. In addition, it can capture improvements in the quality of management, which can be considered an important innovation.

How is text analysis used as a method?

– The text-based measure is based on an unsupervised machine learning method called a topic model. The underlying assumption in such models is that each document is generated from a common set of topics, or clusters of words.

– Using the Latent Dirichlet Allocation (LDA) algorithm, I estimate a model with 25 topics on the annual reports of Swedish listed companies. Of the estimated topics, I choose the one whose content is most similar to that of a standard innovation textbook. The loading of this topic in each document is then taken as the text-based innovation measure for the given business year.
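
The last two steps described above, choosing the innovation topic and reading off its loading, can be illustrated with a minimal Python sketch. It is not the thesis's implementation. The topic weights and loadings are made-up toy numbers standing in for the output of an LDA model (3 topics instead of 25), the firm names and the one-line "textbook" are hypothetical, and cosine similarity is an assumed choice, since the interview does not say how similarity to the textbook is measured.

```python
import math
from collections import Counter


def cosine(u, v):
    """Cosine similarity between two sparse word-weight dicts."""
    dot = sum(w * v.get(word, 0.0) for word, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def word_distribution(text):
    """Relative word frequencies of a reference text."""
    counts = Counter(text.lower().split())
    total = sum(counts.values())
    return {word: n / total for word, n in counts.items()}


def pick_innovation_topic(topic_word, reference):
    """Index of the topic whose word weights best match the reference."""
    return max(range(len(topic_word)),
               key=lambda k: cosine(topic_word[k], reference))


# Toy stand-ins for LDA output (hypothetical numbers, not real estimates).
topic_word = [
    {"dividend": 0.4, "share": 0.4, "board": 0.2},
    {"product": 0.3, "development": 0.3, "technology": 0.2, "patent": 0.2},
    {"employee": 0.5, "salary": 0.3, "pension": 0.2},
]

# Topic loadings per annual report, keyed by (firm, year); each row sums to 1.
doc_topic = {
    ("FirmA", 2019): [0.6, 0.1, 0.3],
    ("FirmB", 2019): [0.2, 0.7, 0.1],
}

# Hypothetical stand-in for the text of an innovation textbook.
textbook = "technology development product patent product process technology"

reference = word_distribution(textbook)
k = pick_innovation_topic(topic_word, reference)

# The loading of the chosen topic is the text-based innovation score.
innovation_score = {key: loadings[k] for key, loadings in doc_topic.items()}
print(k, innovation_score)
```

In this toy setup the second topic is selected because it is the only one that shares vocabulary with the reference text, so FirmB, whose report loads more heavily on that topic, receives the higher score. In practice the topic-word weights and document loadings would come from an LDA model fitted to the full set of annual reports.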

Why is it important to have multiple measures of innovation?

– Each innovation metric has its pros and cons. For example, R&D expenditure effectively captures firms’ investments and efforts to produce innovation. However, these investments may not produce the expected results in terms of innovations. Conversely, patents are a good indicator of innovation output, but many forms of innovation within firms are not patentable. The text-based innovation measure I have developed can provide a broader picture of innovation activities within firms and capture innovations that are not recorded in traditional accounting-based measures.

What do the practical results of your thesis show?

– First, I show that the text-based measure I developed is a qualitative indication of innovation. Second, it is strongly associated with the number of patents and, to a lesser extent, with R&D intensity. Finally, it is very informative when it comes to predicting the future operating performance of companies. This last result holds for both patenting and non-patenting firms.

What message do you have for companies that are not involved in patent production?

– They should not think that they cannot be innovative. There is room for various improvements in organizational structure, management practices and operational processes.

– As my research shows, managers who introduce such innovations will be rewarded with higher business performance.


More about the thesis
Mustafa Bulut recently defended his thesis, Going Public, Spillover Effects and Innovation, at the Stockholm School of Economics.