The emerging role of big data, machine learning, and artificial intelligence in the healthcare and pharmaceutical industries is indisputable.[1,2] While many larger pharmaceutical companies have pharmacovigilance or marketing teams that monitor, collect, and draw meaningful conclusions from big data (extremely large datasets), medical affairs and field medical professionals should not shy away from learning how to personalize, manage, and analyze big data as it pertains to their drug portfolio, disease state, or region. An emerging body of literature has used free, publicly available tools (e.g., Google Trends) to monitor regional and temporal interest in drug side effects, voluntary vaccination participation, scientific data dissemination, and off-label drug use.[3–6] As a follow-up to our previous article published in the July 2021 issue, which deployed a similar framework for monitoring trends in the medical science liaison (and related) careers, this article aims to help others in medical affairs access, properly analyze, and implement big data in their medical affairs strategy, including recruitment, KOL engagement, and trend identification.
Identifying which databases to scrape
Identifying robust databases (e.g., search engines or social media platforms) is vital for ensuring that enough data is readily available for analysis. The aim should be to use the tool that has the highest market share in a given region or to carry out a multi-database analysis (using data from more than one search engine or social media platform). Regional and temporal changes in the market share of such platforms are freely available at gs.statcounter.com. Temporal (2015-present) and average market share data for search engines and social media platforms are presented in Figure 1a and 1b. In the United States, Google has consistently been the go-to search engine for most internet users. Additionally, Google Trends (GT) is a free and useful tool to visualize, investigate, and gather social data. As a platform, GT gives users access to temporal and regional data for up to five search terms. When terms are searched in combination, their search volumes are given relative to one another. Regional data also has exceptional granularity, ranging from countries down to metropolitan areas. GT has an intuitive design that is highly customizable and easy to navigate. More in-depth training (e.g., differentiating search terms from topics, fine-tuning verbiage, etc.) is available free of charge through the Google News Initiative’s Google Trends course series (and even comes with a certification).
Figure 1. Market share of (a) internet search engines and (b) social media platforms from 2015 to present. Data Source = StatCounter.
Fine-tuning and extracting data
As medical affairs professionals, it is important to understand how this data can be used to generate new insights, create engagements, and help direct field medical plans. Tailoring searches to certain timelines (e.g., during key marketing campaigns or before/during/after the FDA approval process) can lend valuable insight into how HCPs and the broader public are searching for certain therapeutics. Because GT normalizes all search interest relative to a “maximum interest” within the selected time period, establishing a timeline is vital for extracting the most valuable information. It is also valuable to consider the region in which the search is carried out. In most cases, we recommend searching for terms within the desired time window (+/- several weeks) and at a country level. Then, as state, metropolitan, and city data permit, more granular searches can be carried out. An example (using the term “Keytruda”) is shown in Figures 2 (temporal) and 3 (regional).
Following a search, four data sets are immediately available for download: temporal data (“interest over time”), regional data (“interest by subregion”), related topics, and related queries. Temporal and regional interest are both downloaded as simple CSV files listing the search terms, the corresponding dates or regions, and the relative search volume (RSV). Related topics and queries are crucial for contextualizing the motivations behind search volume. GT assigns quantitative values for interest in related searches, which lends insight into how rapidly a related search has arisen. For example, when Keytruda® is searched over the last 12 months and within the United States, related topics include: “Scott Hamilton”, “fatigue”, “prognosis”, “hypothyroidism”, and “myasthenia”. Related queries include: “keytruda mechanism of action”, “keytruda commercial”, “lenvima”, and “keytruda cancer treatment”. Unlike search queries, topics include searches closely associated with the particular search phrase.
So how might an MSL interpret these results? The related queries “keytruda mechanism of action” and “lenvima” may suggest that a significant portion of search volume for these terms came from KOLs and HCPs, given their scientific terminology. The related results also offer insight into which applications (often off-label indications) searchers associate with Keytruda®. This can be particularly useful as a method to rapidly scan for off-label indications, side effects, or regional discrepancies in drug access.
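To make the extraction and normalization steps concrete, the following is a minimal Python sketch of reading an “interest over time” export and re-expressing values relative to the peak of a narrower window. The file layout shown (a preamble line, a blank line, a header row, then one row per period, with “<1” for very low interest) is an assumption about the export, the numbers are invented, and the function names are hypothetical. The rescaling only approximates what GT does when a narrower window is requested, since exported values are already rounded.

```python
import csv
import io

# Hypothetical sample mimicking a Google Trends "interest over time" export.
# The layout and the numbers are assumptions made for illustration only.
SAMPLE_EXPORT = """Category: All categories

Week,Keytruda: (United States)
2021-01-03,62
2021-01-10,<1
2021-01-17,100
2021-01-24,81
"""


def parse_interest_over_time(text):
    """Return (term_labels, rows), where rows is a list of (date, [rsv, ...])."""
    lines = text.splitlines()
    # Skip the preamble until the header row, which starts with the time unit.
    start = next(
        i for i, line in enumerate(lines)
        if line.split(",")[0] in ("Day", "Week", "Month")
    )
    reader = csv.reader(io.StringIO("\n".join(lines[start:])))
    header = next(reader)
    rows = []
    for record in reader:
        if not record:
            continue
        # Treat "<1" (assumed marker for very low interest) as 0.
        values = [0 if v.strip() == "<1" else int(v) for v in record[1:]]
        rows.append((record[0], values))
    return header[1:], rows


def rescale_to_window(values):
    """Re-express values relative to the maximum within the chosen window."""
    peak = max(values)
    if peak == 0:
        return [0.0 for _ in values]
    return [round(100 * v / peak, 1) for v in values]


terms, rows = parse_interest_over_time(SAMPLE_EXPORT)
series = [values[0] for _, values in rows]
# Restricting to the first two weeks changes the peak, so the same weeks
# receive different relative values than they had in the full window.
print(terms, series, rescale_to_window(series[:2]))
```

The last line illustrates why the time window matters: a week scored 62 against the four-week peak becomes 100 once it is the peak of its own two-week window.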
Figure 2. Temporal relative search volume for the search term “Keytruda”. Data Source = Google Trends.
Figure 3. Regional search volume for “Keytruda” at the global, country, regional, and metro levels. Data Source = Google Trends.
Analyzing & visualizing data
Taking a peek at GT data is almost always interesting, as it allows us to take a large-scale look at how people are searching for our companies, therapies, and institutions. Interesting as this is, it may not be particularly useful in and of itself. For MSLs to put meta social data to work, they must analyze and visualize the data to draw meaningful conclusions. In the previous section, we discussed how to extract social data. Here, we want to talk about functionalizing it for medical affairs.
Many of us still have access to statistical analysis software (GraphPad Prism, R, SPSS, etc.) that is excellent at generating graphs and carrying out analysis. As field medical professionals, though, we must put the data to use and tell it as a story. One of the best platforms for accomplishing this quickly and for free is Flourish. In addition to creating beautiful graphs, Flourish allows users to create interactive, transformative, and engaging stories with their data, making it ideal for presentations.
To create stories with GT-generated data, we find that smoothed temporal data and illustrative regional data are logical choices that quickly convey messages to HCPs and KOLs. Tables containing related topics and queries can also be created to help contextualize data sets. Temporal data can be averaged for statistical analysis (e.g., taking the average RSV before and after a marketing event), which offers insight into how certain events affected the relevance of a therapeutic. On their own, regional and temporal data can lend insights into trends in a therapeutic area or regarding a certain drug, though the true power of meta social data is in prediction and correlation.
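As an illustration of the smoothing and before/after averaging described above, here is a minimal Python sketch using only the standard library. The weekly RSV values, the event position, and the function names are hypothetical; a real analysis would also test whether the difference is statistically significant, which this sketch does not do.

```python
from statistics import mean


def moving_average(values, window=3):
    """Centered moving average; edges use whatever points are available."""
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        lo = max(0, i - half)
        hi = min(len(values), i + half + 1)
        smoothed.append(mean(values[lo:hi]))
    return smoothed


def before_after_means(values, event_index):
    """Mean RSV strictly before the event and from the event onward."""
    if not 0 < event_index < len(values):
        raise ValueError("event_index must leave data on both sides")
    return mean(values[:event_index]), mean(values[event_index:])


# Hypothetical weekly RSV values; the event (e.g., a campaign launch)
# is assumed to fall at index 4.
weekly_rsv = [20, 22, 18, 20, 40, 44, 38, 42]
smoothed_rsv = moving_average(weekly_rsv)
before, after = before_after_means(weekly_rsv, event_index=4)
print(smoothed_rsv, before, after)
```

The smoothed series is what would typically be plotted, while the two means give a single before/after comparison for a slide or table.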
Regression analysis with a second data set
The introduction of a second data set to RSV transforms social data from observational to predictive and actionable. Several studies have demonstrated that changes in temporal RSV can, with high fidelity, predict participation in vaccination and elective surgical procedures, and even help identify emerging diseases.[3,9,10] In each case, secondary data sets (vaccination numbers, claims data, and COVID-19 diagnoses) were used in a correlation analysis to investigate whether social interest was significantly correlated with the secondary data set. Further analysis (Augmented Dickey-Fuller and lag correlation) can provide insight into whether a trend is “noise” or a significant deviation from expected fluctuations, and into how long it takes for a change in social interest to be “realized” in the secondary data set. MSLs may consider pairing RSV data with FDA progression, marketing data, prescription data, claims data, sales and revenue data, or any other on-hand data that may be useful for medical affairs or KOLs. We present this use case in the following example, which illustrates how temporal changes in interest in Keytruda® and Opdivo® significantly correlate with their realized quarterly revenue.
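The lag-correlation idea can be sketched in a few lines of Python. This is a simplified stand-in under stated assumptions, not the analysis used in the cited studies or in our case study: it computes a plain Pearson correlation between an RSV series and a secondary series shifted by 0 to `max_lag` periods, it does not include the Augmented Dickey-Fuller test, and all numbers and function names are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length series with at least 2 points")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


def lag_correlations(leading, lagging, max_lag):
    """Correlate leading[t] with lagging[t + lag] for each lag in 0..max_lag."""
    if len(leading) != len(lagging):
        raise ValueError("series must cover the same periods")
    results = {}
    for lag in range(max_lag + 1):
        x = leading[: len(leading) - lag]
        y = lagging[lag:]
        results[lag] = pearson(x, y)
    return results


# Hypothetical data: the secondary series repeats the RSV pattern two periods later.
rsv = [10, 30, 20, 50, 40, 60, 30, 70]
secondary = [5, 5, 10, 30, 20, 50, 40, 60]
correlations = lag_correlations(rsv, secondary, max_lag=3)
best_lag = max(correlations, key=correlations.get)
print(correlations, best_lag)
```

The lag with the highest correlation is one rough estimate of how long a change in interest takes to appear in the secondary data; with short series, each additional lag leaves fewer points to correlate, so large lags should be read cautiously.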
Case Study – Keytruda® and Opdivo®
In this use case, the objective was to determine whether RSV for two key cancer drugs, Keytruda® and Opdivo®, could significantly predict their quarterly sales data. To this end, sales data for each drug was accessed through a third party, and GT data was extracted from Q1 of 2015 to Q1 of 2021. Using Flourish, we plotted the increasing sales data of both drugs by fiscal quarter (Figure 4). Though sales of both drugs are still growing, their rates of growth differ dramatically.
Figure 4. Quarterly sales of Keytruda® and Opdivo®. Grouping sales by financial quarter shows where Keytruda® sales overtook Opdivo® sales.
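Because sales are reported by quarter while GT returns weekly or monthly points, the two series need a common time base before they can be compared. A minimal sketch of one way to do this is shown below, averaging RSV within each calendar quarter. The dates and values are hypothetical, full ISO dates are assumed, and calendar quarters are used as a simplification since a company’s fiscal quarters may differ.

```python
from collections import defaultdict
from datetime import date
from statistics import mean


def quarter_label(day):
    """Calendar quarter label such as 2017-Q3 (fiscal calendars may differ)."""
    return f"{day.year}-Q{(day.month - 1) // 3 + 1}"


def quarterly_mean_rsv(rows):
    """rows: iterable of (ISO date string, RSV). Returns {quarter: mean RSV}."""
    buckets = defaultdict(list)
    for iso_day, rsv in rows:
        buckets[quarter_label(date.fromisoformat(iso_day))].append(rsv)
    return {quarter: mean(values) for quarter, values in sorted(buckets.items())}


# Hypothetical monthly RSV points (not the article's data).
monthly_rsv = [
    ("2017-04-01", 40), ("2017-05-01", 50), ("2017-06-01", 60),
    ("2017-07-01", 70), ("2017-08-01", 80), ("2017-09-01", 90),
]
quarterly = quarterly_mean_rsv(monthly_rsv)
print(quarterly)
```

Averaging is only one option; taking the quarterly maximum or the value at quarter end would be equally easy to swap in if that better fits the question being asked.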
Up to 2017, Opdivo® dominated the market. Notably, in Q3 of 2017, the sales leader switched. When RSV is overlaid with sales data, it becomes immediately clear that a change in social interest preceded the switch in sales (Figure 5). The optimized lag period of around 12 months suggests that this change in social interest was fiscally realized 12 months later.
Figure 5. Top: RSV (lines) overlaid on quarterly sales. Bottom: Zoomed region of RSV and subsequent sales leader switch.
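The “switch” described above can also be located programmatically. The sketch below finds the first period in which one series overtakes another and compares the switch point in RSV with the switch point in sales. The products, values, and function name are hypothetical and are not the data behind Figure 5.

```python
def first_crossover_index(series_a, series_b):
    """Index of the first period where series_a exceeds series_b after
    trailing or tying it in the previous period; None if that never happens."""
    for i in range(1, min(len(series_a), len(series_b))):
        if series_a[i - 1] <= series_b[i - 1] and series_a[i] > series_b[i]:
            return i
    return None


quarters = ["Q1", "Q2", "Q3", "Q4", "Q5", "Q6", "Q7", "Q8"]
# Hypothetical values for two products, A and B.
rsv_a = [30, 45, 50, 55, 60, 65, 70, 75]
rsv_b = [40, 42, 44, 46, 48, 50, 52, 54]
sales_a = [100, 150, 200, 260, 330, 420, 500, 600]
sales_b = [300, 320, 340, 360, 380, 400, 410, 420]

rsv_switch = first_crossover_index(rsv_a, rsv_b)
sales_switch = first_crossover_index(sales_a, sales_b)
if rsv_switch is not None and sales_switch is not None:
    print(
        f"RSV switch in {quarters[rsv_switch]}, "
        f"sales switch in {quarters[sales_switch]}, "
        f"lead of {sales_switch - rsv_switch} quarters"
    )
```

A crossover comparison like this is a quick visual check; it complements rather than replaces a lag-correlation analysis, which uses the whole series instead of a single switch point.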
Regardless of the reason behind the change, the significant correlation between the two data sets suggests a relationship between drug sales and social interest that is actionable for medical affairs and marketing. Questions that may prompt action following this analysis include: “Where are these changes in social interest occurring?” and “How is our engagement in these areas?” Regional analysis from Q2 2016-Q2 2017 demonstrates a stark regional contrast in the RSV switch (Figure 6). MSLs and field medical teams can use these tools to predict where a field presence may be needed, or where increased efforts at engagement may be warranted.
Figure 6. Regional interest in Keytruda and Opdivo from Q2 2016-Q2 2017. Color intensity reflects increasing social interest. Data Source = Google Trends.
Medical Affairs Use Cases
Meta social data analysis is by no means a new concept, though it has not been broadly adopted by most industries. In medical affairs specifically, social data analysis can bring a lot to the table. Arguably one of the most important and simplest use cases is monitoring how well scientific data is disseminated within a region and between regions. A simple look at how people are searching for a certain drug, therapy, or disease gives a realistic and unbiased look into how well scientific information has been conveyed. Off-label drug use can be monitored with social data analysis as well. When searching for a drug, checking the related topics or related queries often reveals a correlated increase in searches for off-label uses or associated side effects.
GT tools can be especially useful to a medical affairs organization for a number of applications. As a drug matures in its development and data readouts progress (such as key publications, congress abstracts, and regulatory approvals), GT can provide an immediate read on the social impact of such catalyst events. This may illustrate not only quantitative and regional search data, but also shed light on potential qualitative factors such as the perception of a drug’s safety profile and efficacy. Properly assessing these factors could help a medical affairs team anticipate KOL needs in the field and prepare MSLs for an impactful drug launch based on the specific needs of each region. In the post-approval setting, GT data could help identify knowledge gaps as well as off-label and real-world usage trends that would ultimately serve as strong guiding factors for future indications and for the company’s overall development strategy.
It is important to keep in mind possible limitations associated with using meta-social data in any type of analysis. GT provides little demographic information about searches and expresses all searches as relative, not absolute. While this prevents population-dense regions from dominating searches, it also prevents users from quantifying raw search values. Additionally, search terms, such as a specific drug name, that do not generate significant search volume do not yield meaningful data (a problem that increases with decreasing geographic size). Finally, to best utilize GT, a “syntax screen” is a necessary and timely step to understand how users are searching for your query of interest.
In this follow-up study, we introduce and describe a framework for accessing, extracting, analyzing, and contextualizing social metadata and demonstrate how such a tool can be used in medical affairs. Additionally, we demonstrate how a changing social trend in two widely used cancer drugs is realized in sales after approximately 12 months. As artificial intelligence and big data continue to improve, MSLs and medical affairs teams can use similar data to understand where and when trends occur and adjust their field medical teams accordingly. Trends in a disease state, off-label drug use, side effects, etc., can offer valuable insights that can be proactively addressed in KOL engagements. Geographic data can be used to assess scientific data dissemination within a region and to locate trending HCPs/KOLs, hospitals, universities, and clinics that may be of interest to MSLs and medical affairs teams. Overall, we hope this mini case study helps MSLs proactively collect and manage new insights and understand trends in their therapeutic space and within their regions. For any questions or ideas, please reach out to Alec McCarthy (firstname.lastname@example.org).
Timothy Bielecki is an employee of Sanofi Genzyme. His views are his own and do not necessarily represent those of his employer. Nicholas Wojtynek is an employee of Karyopharm Therapeutics. His views are his own and do not necessarily represent those of his employer. Alec McCarthy is a PhD Candidate at the University of Nebraska Medical Center. His views are his own and do not represent those of his employer. None of the authors are affiliated with Google, Merck, or Bristol Myers Squibb, and no compensation of any kind was received for the work.
 statcounter: gs.statcounter.com
 Google Trends: trends.google.com
 Google Trends Lessons: Google News Initiative Training Center
 Flourish Data Visualization: flourish.studio
 Cancer Drug Case Study: https://public.flourish.studio/story/970563/
[1] A. Pesqueira, M. J. Sousa, Á. Rocha, J Med Syst 2020, 44, 197.
[2] V. Vergetis, D. Skaltsas, V. G. Gorgoulis, A. Tsirigos, Cancer Res 2021, 81, 816.
[3] A. D. McCarthy, D. J. McGoldrick, P. A. Holubeck, C. Cohoes, L. D. Bilek, Cureus 2021, DOI 10.7759/cureus.16379.
[4] A. D. McCarthy, D. McGoldrick, Cureus 2021, DOI 10.7759/cureus.15715.
[5] J. Tay Wee Teck, M. McCann, Int J Drug Policy 2018, 51, 52.
[6] R. Ågren, Sci Rep 2021, 11, 13136.
[7] A. McCarthy, T. Bielecki, N. Wojtynek, The Journal of the Medical Science Liaison 2021.
[8] “Desktop Search Engine Market Share Worldwide,” can be found under https://gs.statcounter.com/search-engine-market-share/desktop/worldwide, n.d.
[9] A. Mavragani, K. Gkillas, Sci Rep 2020, 10, 20693.
[10] J. D. Tijerina, S. D. Morrison, I. T. Nolan, M. J. Parham, M. T. Richardson, R. Nazerali, Aesth Plast Surg 2019, 43, 1669.
Alec McCarthy
Alec McCarthy received his BS in Biological Systems Engineering at the University of Nebraska – Lincoln in 2018 and is a PhD Candidate in the Mary and Dick Holland Regenerative Medicine Program at the University of Nebraska Medical Center. His benchside research focuses on dermatology and orthopedics and his clinical research focuses on improving bone health in bariatric surgery patients. He hopes to move into an MSL role following the completion of his PhD.
Nicholas Wojtynek, PhD
Nick received his BA from Saint Mary’s University of Minnesota in 2015 and PhD in Cancer Research from the University of Nebraska Medical Center in 2020. Nick has previous medical affairs experience with Takeda Oncology and E4 Health Group. Currently, Nick is a Medical Science Liaison at Karyopharm Therapeutics.
Timothy Alan Bielecki, PhD
Tim Bielecki received his BA in Biology at Carleton College in 2011 and his PhD in Cancer Biology at the University of Nebraska Medical Center in 2017. Tim is currently a Solid Tumor Medical Science Liaison at Sanofi and strives to digitally innovate the profession by using the skills and knowledge he gained from his past experiences.