
Abstract

The growing number of Android applications on the Google Play Store, together with the benefits they offer developers, has attracted many Android application developers. One way for developers to benefit is to understand what characterizes highly rated apps on the Google Play Store. This research investigates the size, installs, reviews, type (free/paid), rating, category, content rating, and price of applications on the Google Play Store to determine the characteristics of high-rated applications. In the preprocessing stage, data cleaning and data reduction are performed with SQL Server. The Random Forest algorithm is then used, through its feature importance scores, to identify the attributes that most influence whether an Android app on the Google Play Store is highly rated. To classify high-rated applications, the authors apply 8-fold cross-validation with the Random Forest algorithm, which achieves an accuracy of 83% and outperforms the Gradient Boosting, K-NN, and Decision Tree algorithms. Random Forest also outperforms the algorithm reported in previous research, with a 0.8% increase in accuracy.
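
The abstract does not spell out the cleaning and reduction rules, and the authors carry them out in SQL Server. As a rough, non-authoritative illustration of that kind of preprocessing, the Python sketch below assumes the column layout of the public Kaggle "Google Play Store Apps" CSV (values such as `10,000+` installs, `19M` or `201k` sizes, `$4.99` prices, `Varies with device`). The specific rules shown (dropping rows with a missing rating or size, keeping only the first listing of each app name) and the 4.0 rating threshold for a "high-rated" label are assumptions, not details taken from the paper.

```python
import csv
import io
import math

# Hypothetical cut-off for the binary "high-rated" label; the abstract does not state one.
HIGH_RATING_THRESHOLD = 4.0


def parse_installs(value):
    """'10,000+' -> 10000 (assumed Kaggle install-count format)."""
    return int(value.replace(",", "").replace("+", ""))


def parse_size_mb(value):
    """'19M' -> 19.0, '201k' -> ~0.196 MB; None for 'Varies with device'."""
    if value.endswith("M"):
        return float(value[:-1])
    if value.endswith("k"):
        return float(value[:-1]) / 1024
    return None


def parse_price(value):
    """'$4.99' -> 4.99, '0' -> 0.0."""
    return float(value.lstrip("$"))


def clean_rows(rows, threshold=HIGH_RATING_THRESHOLD):
    """Convert raw CSV rows to numeric records with a 0/1 'high_rated' label.

    Assumed cleaning: rows that fail to parse or lack a rating or size are dropped.
    Assumed reduction: only the first listing of each app name is kept.
    """
    seen_apps = set()
    records = []
    for row in rows:
        try:
            app = row["App"]
            rating = float(row["Rating"])
            size_mb = parse_size_mb(row["Size"])
            record = {
                "category": row["Category"],
                "reviews": int(row["Reviews"]),
                "size_mb": size_mb,
                "installs": parse_installs(row["Installs"]),
                "is_paid": int(row["Type"] == "Paid"),
                "price": parse_price(row["Price"]),
                "content_rating": row["Content Rating"],
            }
        except (AttributeError, KeyError, TypeError, ValueError):
            continue  # malformed or incomplete row
        if math.isnan(rating) or size_mb is None or app in seen_apps:
            continue  # missing value or duplicate listing
        seen_apps.add(app)
        record["high_rated"] = int(rating >= threshold)
        records.append(record)
    return records


# Made-up rows in the assumed Kaggle column format (not data from the paper).
SAMPLE_CSV = """App,Category,Rating,Reviews,Size,Installs,Type,Price,Content Rating
Photo Editor,ART_AND_DESIGN,4.1,159,19M,"10,000+",Free,0,Everyone
Photo Editor,ART_AND_DESIGN,4.1,159,19M,"10,000+",Free,0,Everyone
Pixel Draw,ART_AND_DESIGN,3.9,967,14M,"500,000+",Free,0,Everyone
Tiny Tool,TOOLS,4.5,20,201k,"1,000+",Paid,$4.99,Everyone
Big Game,GAME,NaN,0,Varies with device,10+,Free,0,Teen
"""

records = clean_rows(csv.DictReader(io.StringIO(SAMPLE_CSV)))
```

On the sample, the duplicate "Photo Editor" row and the unrated "Big Game" row are dropped, leaving three records. Categorical columns such as `category` and `content_rating` would still need an encoding (for example one-hot) before being passed to most classifiers.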

Keywords

Random Forest Classifier, Feature Importance, Performance Evaluation, Root Mean Squared Error, 8-Fold Cross-Validation
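
The keywords name 8-fold cross-validation, performance evaluation, and root mean squared error (RMSE), and the abstract reports accuracy. The stdlib-only Python sketch below shows how these pieces fit together. It is not the authors' code: the fold assignment (shuffle, then stride), the hypothetical `fit_predict` callback interface, and the majority-class stand-in model are illustrative assumptions, whereas the paper's model is a Random Forest.

```python
import math
import random


def k_fold_indices(n_samples, k=8, seed=0):
    """Shuffle sample indices and split them into k folds of near-equal size."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Fold i takes every k-th shuffled index starting at position i,
    # so fold sizes differ by at most one.
    return [indices[i::k] for i in range(k)]


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error between true and predicted values."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def majority_baseline(X_train, y_train, X_test):
    """Hypothetical stand-in model: always predict the most common training label."""
    majority = max(set(y_train), key=y_train.count)
    return [majority] * len(X_test)


def cross_validate(X, y, fit_predict, k=8, seed=0):
    """Mean test-fold accuracy of fit_predict(X_train, y_train, X_test) over k folds."""
    scores = []
    for test_idx in k_fold_indices(len(y), k, seed):
        test_set = set(test_idx)
        train_idx = [i for i in range(len(y)) if i not in test_set]
        y_pred = fit_predict(
            [X[i] for i in train_idx],
            [y[i] for i in train_idx],
            [X[i] for i in test_idx],
        )
        scores.append(accuracy([y[i] for i in test_idx], y_pred))
    return sum(scores) / len(scores)
```

For 0/1 labels, RMSE reduces to the square root of the error rate, that is, the square root of (1 − accuracy), so the two measures rank classifiers identically. In practice the stand-in would be replaced by a real Random Forest; with scikit-learn, for instance, `cross_val_score(RandomForestClassifier(), X, y, cv=8)` performs a comparable evaluation, and a fitted `RandomForestClassifier` exposes impurity-based importances through its `feature_importances_` attribute.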

How to Cite
Maringka, R., Khoirunnita, A., Maringka, R., Utami, E., & Kusnawi. (2021). Android App Rating Classification on Google Play Store Using Random Forest Algorithm with SQL Server Preprocessing. TEPIAN, 2(2), 79-84. https://doi.org/10.51967/tepian.v2i2.404
