Predicting Tech Employee Job Satisfaction Using Machine Learning Techniques

High-tech industry employees are among the most talented groups of people in the workforce, and are therefore difficult to recruit and retain. We analyze employee reviews submitted by employees of five technology companies. Following the Cross-Industry Standard Process for Data Mining (CRISP-DM) and the data science life cycle process, we use machine learning techniques to analyze employees' reviews. Our goal is to predict an overall measure of whether employees are satisfied or not, using other information from the reviews, such as employee attitudes towards upper management. We also use predictive analysis to determine which features are most helpful in determining an employee's overall job satisfaction. Finally, we analyze which prediction algorithm provides the most accurate predictions. We find that the percentage of true positives we correctly identify in the holdout sample is 97.4%, while the percentage of true negatives correctly identified is 72.5%.

One of the major problems that organizations face is employee turnover, as finding replacements is costly. Thus, it is important to understand what makes employees satisfied or dissatisfied with their companies, both to prevent turnover and to keep current and prospective employees satisfied with their work.
There are many sources of data concerning employee opinions about their companies. These include internal data collections (within the companies) and external sources such as the data collections at Glassdoor.com. Datasets from open sources tend to be anonymous, which gives employees more freedom to share their real feelings about their places of work without the risk of losing their jobs. As a result, candid, in-depth information is available in large volumes, which allows researchers to conduct deep analyses of employee opinions using various techniques.
Most of the current data collections from online reviews include both star ratings and open-ended opinions in a textual format. Reviews are submitted by various groups of employees, which vary by type of job (e.g., programmers or managers), status (e.g., current or former employees), rank, and anonymous versus nonanonymous, etc. Such datasets allow more in-depth analysis of what employees think about the companies they work for, which can help companies improve the recruiting and retention of these employees.
Current data analysis tools and techniques allow researchers to analyze electronic data more effectively. To structure our business predictions, we use the Cross-Industry Standard Process for Data Mining (CRISP-DM) and the data science life cycle process. CRISP-DM is a widely accepted methodology for data mining and analytics (IBM Knowledge Center). It involves six phases: 1) Business Understanding, 2) Data Understanding, 3) Data Preparation, 4) Modeling, 5) Evaluation, and 6) Deployment. The data science life cycle process consists of problem definition, ETL (extract, transform, load) & feature extraction, learning, and model deployment & development. We use machine learning techniques to analyze tech industry employees' reviews to find out whether employees' overall job satisfaction (as measured by star ratings) can be predicted from other data available in their reviews, such as star ratings of senior management, as well as written comments. This paper complements previous research (Conlon, 2021), which looked at why data scientists consider changing jobs as a function of environmental variables (such as city development levels and company sizes and types). The current paper, by contrast, looks at high-tech employee job satisfaction (and so, presumably, willingness to stay at their current jobs) in terms of their attitudes towards various aspects of their employment situation. This paper then examines which major features contribute to accurately predicting this job satisfaction, and which algorithms perform best in predicting it.
Using CRISP-DM, we begin with Business Understanding. That is, we define our data mining objectives and the data mining problem in terms of a set of research questions. Thus, in this research, we ask: RQ 1: Can we use machine learning techniques to predict employee job satisfaction from employees' reviews? RQ 2: Using machine learning techniques, which algorithms perform best in predicting employee job satisfaction? RQ 3: Using machine learning techniques to analyze employee reviews, which features have the highest predictive power in helping the system to predict accurately?
The rest of the paper is organized as follows. First, we discuss related work in the analysis of employees' job satisfaction, text mining, sentiment analysis, and machine learning. Next, we describe the dataset used and compare the data across several firms. We then discuss our predictive analysis methodology and present our research findings. After that, we discuss the results of the analysis and the business implications of the techniques and findings. Finally, we conclude the paper and propose future research.

Related Work
This research is based on two major related research areas: analysis of employee job satisfaction and the techniques used in analyzing datasets to do predictive analysis (we use machine learning techniques in this research).

Employee Job Satisfaction
There has been a great deal of research on employee job satisfaction in the literature. In general, employee job satisfaction measures the various dimensions along which employees are satisfied with their jobs. Locke (1976) defines job satisfaction as "a pleasurable or positive emotional state resulting from the appraisal of one's job or job experiences" (p. 1304), while Spector (1997) defines it in terms of how content an individual is with his or her job, that is, whether he or she likes the job or not. The level of job satisfaction can be evaluated at the global level (overall satisfaction) or according to the particular aspects of the job that make employees satisfied or not satisfied. Spector (1997) lists common aspects of employee job satisfaction, including appreciation, communication, coworkers, fringe benefits, job conditions, nature of the work, organization, personal growth, policies and procedures, promotion opportunities, recognition, security, and supervision. Hulin and Judge (2003) argue that job satisfaction is influenced by several psychological factors: cognitive (evaluative), affective (emotional), and behavioral. Cognitive job satisfaction can be evaluated on one dimension, such as benefits or supervision, or on multiple dimensions if two or more facets of a job are evaluated simultaneously. Affective or emotional factors are a response to the job or to cognitive opinions about the job and reflect the degree of pleasure or happiness employees feel about their jobs. Behavioral factors are reflected in employees' actions toward their jobs.
Employee turnover costs businesses a great deal. Employee Benefit News (EBN) reported on August 11, 2017 that when an employee leaves a company, it costs the employer 33% of that worker's annual salary to hire a replacement. For example, at a median salary of $45,000 a year, the cost of finding a replacement is about $15,000 per person (Bolden-Barrett, 2017).
Thus, understanding what influences employee job satisfaction can benefit firms a great deal if it helps them prevent employee turnover. Hinkin and Tracey (2000) list the five major categories of employee turnover costs as:
• Predeparture (costs incurred once an employee has given notice),
• Recruitment (promotional materials, advertising, and recruiting sources),
• Selection (identifying the most suitable candidates: interviewing, background and reference checks, and travel expenses),
• Orientation and Training (almost everyone requires some sort of formal or informal training),
• Productivity Loss (the largest percentage of the total costs, up to 70 percent in some cases).
Other important related research on employee turnover can be found in Abraham (1999). Thus, being able to predict whether employees are satisfied or dissatisfied with the organization should be very helpful in assisting firms to reduce employee turnover.

Machine Learning Techniques and Their Applications in Human Resource Management
Machine learning (ML) is a subarea of artificial intelligence that aims to use algorithms and statistical models to train computer systems to perform tasks such as prediction, without explicitly programming the computer for that specific task. With the rapid growth of electronic data, ML has been used in many application areas such as medical diagnosis, speech recognition, image processing, and many business applications.
In business, ML has been used widely in areas such as sales, product recommendation, dynamic pricing, marketing, and finance. In the human resource management area, ML has been used, for example, in predicting employee turnover.
To analyze employee churn, Bendemra (2019) built an employee churn model in Python to support a strategic retention plan, and found that the stronger indicators of people leaving include:
• Monthly Income (employees with higher wages are less likely to leave),
• Overtime (people who work overtime are more likely to leave the company),
• Age (employees aged 25-35 are more likely to leave),
• Distance From Home (employees who live farther from work are more likely to leave the company),
• Total Working Years (more experienced employees are less likely to leave, so employees who have between 5 and 8 years of experience should be identified as potentially at higher risk of leaving),
• Years At Company (employees who hit their two-year anniversary should be identified as potentially at higher risk of leaving),
• Years With Current Manager (a large number of leavers leave six months after starting with their current managers).

Data Collection
In this section, we continue with phase two of the CRISP-DM methodology: data understanding. In this phase, we collect the data, describe it, and assess its quality. We used online data sources for our research and analysis.

Data Source
In this study, we used data consisting of employee reviews from technology companies. We downloaded the data from Kaggle.com (the dataset has since been removed). The data retrieved from Kaggle.com was originally scraped from the website Glassdoor.com by the original Kaggle.com collection author. The Kaggle.com page included over 67k reviews from employees of Google, Amazon, Facebook, Apple, Microsoft, and Netflix. Due to a disproportionately low number of Netflix reviews (810), we excluded Netflix from our study. Reviews are both from current and former employees, and both from anonymous employees and from employees whose identity was disclosed. An excerpt from the csv data file is shown in Figure 1.

Figure 1: Format of the Data
Similar to general product reviews that appear in many online sources, this data set consists of both textual and numerical information. The attributes provided by the Kaggle.com data collection page are described in Table 1. One attribute provides a direct link to the page that contains the review; however, such links are likely to become outdated.

Distribution of the data
The total number of reviews for each company is displayed in Figure 2.

Figure 2: Number of Reviews for Each Company
A count of the overall ratings by number of stars for all companies is shown in Figure 3. The star ratings are between 1 and 5 (1 is low and 5 is high). To make the predictions more accurate, we grouped the reviews into two broad categories: satisfied and dissatisfied. Reviews with an overall rating between 3 and 5 were classified as "satisfied," while reviews with an overall rating of 1 or 2 were classified as "dissatisfied." We grouped the ratings this way because we felt that only the 10%-20% most dissatisfied employees were likely to leave. However, as a robustness check, we also analyze the 123/45 classification below. Table 2 shows the proportion in percentages of the 12 vs. 345 groups for each company. A graphical representation of the star ratings is shown in Figure 4.
There is unstructured text in the summary column, as well as in the pros and cons columns of the reviews. Text comments are important because they include in-depth employee views about the company in question. One way to understand which terms are important in the corpus is a word cloud, a visual representation of textual data in which the importance and/or frequency of a word is indicated by its size. Figure 5 shows an example of a word cloud. It shows, for example, that the term "management" is an important word that is mentioned frequently.

Word Frequencies
Star ratings indicate the level of satisfaction but do not indicate the nature of the sentiment creating that satisfaction or dissatisfaction. To determine the specific reasons why employees feel satisfied or not, we use text analysis techniques to study the contents of the reviews. Based on the theory of information retrieval, the system first eliminates "stop words" (such as "is", "else", "between", "the"). The number of occurrences of the remaining terms then indicates the importance of those terms in the reviews. The following tables show some examples of term occurrences in the comments from positive and negative reviews (based on the two rating categories, 1-2 being negative and 3-5 positive). Sample term occurrences from positive reviews are listed in Table 3 and from negative reviews in Table 4.
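This filtering-and-counting step can be sketched in a few lines of Python. The stop-word list and sample review snippets below are illustrative only (real analyses use much larger stop-word lists, such as those shipped with NLTK or scikit-learn), not terms from the actual dataset:

```python
from collections import Counter
import re

# Small illustrative stop-word list; a real pipeline would use a larger one.
STOP_WORDS = {"is", "else", "between", "the", "a", "an", "and", "to", "of", "in", "are"}

def term_frequencies(reviews):
    """Count how often each non-stop-word term appears across the reviews."""
    counts = Counter()
    for text in reviews:
        tokens = re.findall(r"[a-z']+", text.lower())
        counts.update(t for t in tokens if t not in STOP_WORDS)
    return counts

# Hypothetical review snippets, for illustration only.
reviews = [
    "Great management and great benefits",
    "Management is between reorganizations; the benefits are good",
]
print(term_frequencies(reviews).most_common(3))
```

Tables 3 and 4 are, in effect, the `most_common` output of this kind of count, computed separately over the positive and negative review groups.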

Predictive Analysis
One of the major goals of this research is to create a system that can predict a target metric for an organization.
For example, an organization may wish to predict whether an employee is satisfied with his/her company or not.
We also want to examine which factors are most closely associated with overall satisfaction or dissatisfaction. The system is trained using data from the employee review dataset with several machine learning algorithms. After the training process is complete, the system ranks the algorithms by performance. The top-performing algorithm is then used for future predictions. The overall workflow for this process is shown in Figure 6.

Figure 6: Workflow Diagram
There are 15 features in the employee review data set, including short summary, pros, cons, work/life balance, culture and values, career opportunities, and senior management. "Overall rating" is our target feature that we want to predict. It indicates whether employees are satisfied with their company or not. The rating values are between 1 and 5 and indicate, from low to high, the degree to which the employee is satisfied with his or her company.
Data mining methods are categorized as either supervised (a specific target variable selected for analysis) or unsupervised (no specific target variable) (Larose and Larose, 2014). Since we know our target for prediction, we are interested in supervised learning. Supervised learning is the most common data mining method. Regression, decision trees, neural networks and support vector machines are all supervised learning algorithms.

Tools and Processes
Recall that there are six phases in CRISP-DM: 1) Business Understanding, 2) Data Understanding, 3) Data Preparation, 4) Modeling, 5) Evaluation, and 6) Deployment. We are now entering the third and fourth phases of this process. The data preparation phase is where we preprocess the data (data cleaning and preparation) and the modeling phase is where we train the model or algorithm.
(1) The preprocessing stage: The data preparation stage is critical in preparing the data for modeling and analysis. We need to ensure a clean and valid dataset. For example, we need to remove the data that will not be analyzed. In this dataset, we removed the reviews from Netflix due to a low number of employee responses in comparison to the other companies. Other data cleaning and preparation tasks are based on what we want the system to predict and thus which features should be included or removed. For the data cleaning stage, we mainly use spreadsheet software to prepare our data.
For the top algorithms presented, if the prediction performance measures are not high enough, these algorithms will not predict accurately in the holdout sample. If the performance is lower than our acceptable threshold, we make changes to the dataset, such as regrouping the data for clarity, and resubmit the data to the algorithm. For example, if the target feature is "overall ratings," which consists of 5 different values (1-5), it is more difficult to make good predictions than if we try to predict whether the employees are satisfied or not (yes or no). This is why we change the values in the target feature by grouping the reviews that have "overall ratings" of 1 and 2 as "unsatisfied" and 3 through 5 as "satisfied." The data with the original and modified rankings are shown in Figure 7a and Figure 7b, respectively (compare column J in the two figures). This modification of the data helps to improve the system performance a great deal. Intuitively, the algorithms we consider treat the number of stars as a categorical variable, rather than as an interval variable. That is, the algorithms essentially maximize exact-match accuracy, so a prediction of four when the true value is five is treated as if it were as bad as a prediction of one or two when the true value is five. Thus, if we do not transform our target feature, the algorithms penalize "near misses" too much. Aggregating categories largely solves this problem. However, future work will look at algorithms that treat our target variables as interval variables, so we will not have to aggregate categories. The system performance information is discussed in Section 5.1.
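The regrouping described above amounts to a single relabeling pass over the target column. A minimal sketch (the sample ratings below are hypothetical, not rows from the actual file):

```python
def regroup(overall_rating):
    """Collapse a 1-5 star overall rating into a binary satisfaction label:
    1-2 become "unsatisfied", 3-5 become "satisfied"."""
    if overall_rating not in (1, 2, 3, 4, 5):
        raise ValueError("rating must be an integer from 1 to 5")
    return "unsatisfied" if overall_rating <= 2 else "satisfied"

# Hypothetical column of overall ratings from the review file.
ratings = [5, 2, 3, 1, 4]
print([regroup(r) for r in ratings])
# -> ['satisfied', 'unsatisfied', 'satisfied', 'unsatisfied', 'satisfied']
```

The 123/45 robustness split analyzed later simply moves the cutoff from 2 to 3.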
(2) Training of the model: The modeling phase uses the cleaned data as the input file for the system to learn how to predict the overall "satisfied" and "dissatisfied" categories in a training data set. The resulting predictions are then evaluated on a holdout data set. Once a set of good models is obtained, they are used to determine which features are most useful in predicting satisfaction. It is also possible to use the models to predict the likely overall ratings for reviews that do not actually have an overall rating. During this stage, the machine learning tool, DataRobot, is used.
DataRobot (https://www.datarobot.com/) is an automated machine learning tool for supervised learning, built by experienced data scientists. It analyzes a data set for a target feature and runs the data through several algorithms. The algorithm that predicts the target most accurately with acceptable speed is recommended for future predictive analysis. The outputs of the other algorithms are also presented in case the user prefers to use them for some purpose.
After the data set is cleaned and ready for the training phase, the data set is uploaded to DataRobot. The target feature (i.e., overall rating) is then indicated. DataRobot then uses part of the data set to train the system and identify which algorithms are most promising (i.e., can best predict the target feature). The algorithms that perform well then use more data to train until they successfully fit an acceptable amount of the training data (such as 80%). The top algorithms are then presented to the user.
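DataRobot's internal pipeline is proprietary, but its "try several algorithms and rank them by a metric" step can be approximated conceptually with scikit-learn. In this sketch, synthetic data stands in for the cleaned review features, and the two candidate models are stand-ins for the larger set DataRobot evaluates:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import cross_val_score

# Synthetic binary-classification data standing in for the review features.
X, y = make_classification(n_samples=500, n_features=10, random_state=0)

candidates = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "gradient_boosted_trees": GradientBoostingClassifier(random_state=0),
}

# Rank candidates by cross-validated AUC, analogous to DataRobot's leaderboard.
scores = {
    name: cross_val_score(model, X, y, cv=5, scoring="roc_auc").mean()
    for name, model in candidates.items()
}
best = max(scores, key=scores.get)
print(best, round(scores[best], 3))
```

The best-scoring model would then be refit on the full training portion and evaluated on the holdout sample, as described above.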

Results and System Performance
The next phase of the CRISP-DM methodology is evaluation. Here we evaluate the performance of the algorithms. In our case, DataRobot ranked the three algorithms with the highest performance as: (1) the Nystroem Kernel SVM Classifier, (2) the eXtreme Gradient Boosted Trees Classifier with Early Stopping, and (3) the Auto-Tuned Word N-Gram Text Modeler using Token Occurrences - Cons.

Performance Evaluation
In general, prediction performance is measured by the AUC and log-loss measures. The AUC is the area under the ROC curve, where the ROC (receiver operating characteristic) curve illustrates the tradeoff between false positives (on the horizontal axis) and true positives (on the vertical axis) for different values of the cutoff threshold. For a more demanding threshold, the system will incorrectly classify fewer truly negative individuals as positive, but will also correctly identify fewer truly positive individuals as positive. The ROC curve is therefore upward sloping. A value of the AUC close to one means that one can identify a large percentage of the true positives while suffering only a small number of false positives. In this study, a "positive" indicates the individual is satisfied, overall, with the job, in the sense that their overall star rating is a 3, 4 or 5. The log-loss measure, on the other hand, is simply the negative of the log-likelihood function, so a small value of the log loss indicates a higher value of the likelihood function, and so, a better fit to the data. Table 5 shows these performance measures. The percentage of true positives correctly identified in the holdout sample is 97.4%; similarly, the percentage of true negatives correctly identified is 72.5%. The DataRobot software also indicates which features were most important in each algorithm. These rankings are shown in Table 6. According to the Nystroem Kernel SVM, the top two features were the number of stars for "career opportunities" and "culture values," followed by the text categories "summary" and "cons." The eXtreme Gradient Boosted Trees Classifier had a similar ranking, but with "culture values" and "career opportunities" switched. The Auto-Tuned Word N-Gram Text Modeler only used the "cons" feature, and so is not included in Table 6.
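Both measures can be computed directly from their definitions. The labels and predicted probabilities below are hypothetical, chosen only to illustrate the calculation (AUC here uses the equivalent rank-based formulation: the probability that a randomly chosen positive is scored above a randomly chosen negative):

```python
import math

def log_loss(y_true, p_pred):
    """Negative mean log-likelihood of the predicted probabilities."""
    return -sum(
        y * math.log(p) + (1 - y) * math.log(1 - p)
        for y, p in zip(y_true, p_pred)
    ) / len(y_true)

def auc(y_true, p_pred):
    """Probability that a random positive outranks a random negative,
    which equals the area under the ROC curve."""
    pos = [p for y, p in zip(y_true, p_pred) if y == 1]
    neg = [p for y, p in zip(y_true, p_pred) if y == 0]
    wins = sum(
        1.0 if pp > pn else 0.5 if pp == pn else 0.0
        for pp in pos for pn in neg
    )
    return wins / (len(pos) * len(neg))

# Hypothetical holdout labels (1 = satisfied) and model probabilities.
y = [1, 1, 1, 0, 0]
p = [0.9, 0.8, 0.6, 0.7, 0.2]
print(round(auc(y, p), 3), round(log_loss(y, p), 3))  # -> 0.833 0.453
```

A perfect ranking would give an AUC of 1.0, and confident correct probabilities would drive the log loss toward 0.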

Robustness
To check the robustness of our results, we reran our analysis with the target feature indicating that the employee was dissatisfied if the employee chose one, two or three stars, and satisfied only if they chose four or five stars. Thus, more reviews were included in the dissatisfied category. The results are presented in Table 7, where again, FN, TN, FP and TP are for the holdout sample. The Nystroem Kernel SVM was again the best-performing algorithm, but the Auto-Tuned Word N-Gram Text Modeler using Token Occurrences - Cons and the Auto-Tuned Word N-Gram Text Modeler using Token Occurrences - Summary were the next two best algorithms. Again, the Auto-Tuned Word N-Gram Text Modeler using Token Occurrences algorithms each use only one feature, i.e., the text categories "cons" and "summary," respectively, so Table 8 only shows feature rankings for the Nystroem Kernel SVM. The rankings of the first six features are identical to those above, though the last three features differ when predicting 1 through 3 stars versus 4 or 5 stars. Note that the features "Cons," "Summary," "Pros," and "Advice to Management" are in a textual format, "Helpful count" is the number of readers who voted the reviewer's comments helpful, and the "stars" categories consist of numerical scores (1-5 stars).
In summary, the Nystroem Kernel SVM Classifier best predicted overall satisfied or dissatisfied employees, and the features "Career opportunities" and "Culture values" are the most important features helping the system to predict the target feature correctly, and thus have high predictive power (Burkov, 2019).
It is also worth noting that system performance was better when predicting the 12/345 breakdown of job satisfaction than when predicting the 123/45. This suggests that the three-star reviews are more appropriately classified with the four and five-star reviews than with the one and two-star reviews. For example, employees may use three stars as indicating a satisfactory work environment, like four and five stars, while they may reserve one and two stars to express extreme dissatisfaction.

Discussion
By using machine learning techniques to perform predictive analysis on employees' reviews, we are able to predict with high accuracy whether employees are satisfied or dissatisfied with their jobs. Thus, we are able to answer our research questions as follows: RQ 1: Can we use machine learning techniques to predict employee job satisfaction from employees' reviews?
The AUC and Log loss as well as true versus false positives and negatives indicate that using machine learning techniques, the system is able to predict employee job satisfaction very well.
In addition to the system evaluation using AUC and log loss, we also evaluated the system by predicting a set of 80 reviews that were not part of the original training set. We removed the overall satisfaction values and let the system predict them. We then compared the system's predictions with the actual overall satisfaction values given by the reviewers. The system was able to predict whether those employees were satisfied with their companies with an accuracy rate of about 95%. We believe this high accuracy rate is largely because the dataset is very clean and many of the features are themselves numerical scores.
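This out-of-sample check is a straightforward comparison of predicted and actual labels. A minimal sketch, with hypothetical labels rather than the actual 80-review set:

```python
def accuracy(y_true, y_pred):
    """Fraction of held-out reviews whose label was predicted correctly."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)

# Hypothetical held-out labels and model predictions.
actual    = ["satisfied", "satisfied", "unsatisfied", "satisfied"]
predicted = ["satisfied", "unsatisfied", "unsatisfied", "satisfied"]
print(accuracy(actual, predicted))  # 3 of 4 correct -> 0.75
```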
Thus, we can conclude that it is possible to predict employee job satisfaction using machine learning techniques.
RQ 2: Using machine learning techniques, which algorithms perform best in predicting employee job satisfaction?
Using machine learning techniques, the system was able to predict the target feature with high accuracy rates. The most successful algorithms are: (1) the Nystroem Kernel SVM Classifier, (2) the eXtreme Gradient Boosted Trees Classifier with Early Stopping, and (3) the Auto-Tuned Word N-Gram Text Modeler using Token Occurrences -Cons.

RQ 3:
Using machine learning techniques to analyze employee reviews, which features have the highest predictive power in helping the system to predict accurately?
Most features in this dataset contribute highly to the predictive analysis, since the dataset is fairly clean. However, the top two features that provided the highest prediction impact (predictive power) are "career opportunities" and "culture values." This means that, in order to predict whether an employee is satisfied with his/her company, analyzing "career opportunities" and "culture values" provides more predictive power than the other features.
One interesting thing we found in the results is that the contents of the textual "summary" feature have more impact on the prediction accuracy than the contents of the "cons" and "pros" features. We believe that the terms used in the "summary" feature must be more distinctive than those used in the other two features (the words used must carry more information).

Business Implications
In order for businesses to perform well, they need employees who are satisfied with the company. Knowing whether employees are satisfied, and knowing the major indicators of satisfaction, will help businesses obtain the best performance from their employees. Thus, our research findings can contribute to several human resource management tasks, including:
• Enhancing recruitment of strong employees. If a company knows what makes employees satisfied, it can improve its culture to create a more satisfying workplace. As a result, the work environment can attract strong prospective employees.
• Predicting employee retention (staying with the company). Turnover is very costly. If employers can identify dissatisfaction in employees and rectify the situation, this may lead to greater employee retention.
• Improving the employer/employee relationship. Our study finds that "career opportunities" and "culture values" lead to more satisfied employees. More satisfied employees may be more prone to feel good about their position with the company. This positive relationship may encourage employees to perform well at work.
• Applying the algorithms to other areas of business. Machine learning tools such as DataRobot and predictive models can be applied to other areas, such as analyzing customer satisfaction to predict churn.

Conclusion and Future Research
In this paper, we demonstrated that it is possible to use supervised machine learning techniques to analyze and predict employee job satisfaction with high rates of accuracy. The best-performing algorithm is the Nystroem Kernel SVM Classifier, with an accuracy rate of more than 96% according to the AUC measure. In addition, the system can find the most important factors that have a high impact on predicting job satisfaction.
In our study, career opportunities and a company's work culture contribute greatly to predicting employee job satisfaction.
Building on what we have found in this study, we plan to extend this research in the following areas:
• Incorporate more semantic and linguistic analyses using terms and their relations (words/phrases, lexical semantic relations, etc.) to improve system performance. For example, how exactly does the text in the "summary" and "cons" features help to predict job satisfaction?
• Apply similar techniques to other application domains such as supply chains, health care, and security.
• Compare employee job satisfaction across similar businesses, such as Apple vs. Microsoft. From the current dataset, we find that the overall ratings for both companies are similar. However, by using semantic analysis to examine the textual comments in the employee reviews (pros, cons, and summaries) we can identify more specifically what employees think about their companies.