Sentiment Analysis of Indonesian Public Opinion on the “PPKM” Policy on Twitter Using K-Nearest Neighbor

Abstract: COVID-19, or coronavirus disease 2019, is currently a pandemic that is spreading very quickly throughout the world, including Indonesia. Various measures and policies have been carried out, one of which is the “PPKM” policy, also known as the Enforcement of Restrictions on Community Activities, issued by the Indonesian


I. INTRODUCTION
Since the beginning of 2020, the number of people worldwide infected with Covid-19 has been increasing every day. Based on KOMPAS.com data, the number of patients infected with the coronavirus worldwide had reached approximately 114 million cases by March 1, 2021 (Toharudin et al, 2021). Of these positive cases, hundreds of thousands of patients died and 64.4 million were declared recovered. The number of cases in Indonesia is still increasing. As a result of the increase in Covid-19 cases, the government implemented various policies in the form of "PSBB" (Mukhtadi et al, 2021), lockdown, restrictions on a limited scale, social distancing, the new normal, compliance with health protocols, and other policies (Napitu et al., 2021).
These policies worsened economic conditions and had other multidimensional impacts. Realizing that they could not overcome the Covid-19 pandemic, the government has, since mid-February 2021, implemented a policy called the Implementation of Micro Community Activity Restrictions ("PPKM") and has sought mass vaccination for all citizens in stages throughout Indonesia (Ministry of Home Affairs Regulation (Permendagri), 2020). This community service activity aims (Imam et al, 2021) to overcome the spread of Covid-19, which is still increasing, at the village and sub-district level; to optimize restrictions on community activities in various aspects of life down to the village/district level by implementing Micro "PPKM"; and to increase public awareness of and compliance with health protocols, social distancing, and Micro "PPKM" (Miharja et al, 2021).
Sentiment analysis has been used to determine the trend of public opinion on the impact of the coronavirus (Astari et al., 2020). Twitter is a social networking service that is in great demand among internet users as a medium for communicating and getting information (Nurjanah et al., 2017), and it is one of the platforms the public has used to express current conditions since the coronavirus spread. The purpose of that study was to analyze text documents to obtain positive, negative, or neutral public sentiment. The data used were tweets about the impact of the coronavirus, divided into training data and test data for the classification process, and the classification method was the Naive Bayes classifier.
Given the existing problems, a sentiment analysis is needed to find out public opinion about the PPKM policy by classifying opinions into two sentiments, namely positive or negative. Classification is done using the K-Nearest Neighbor (K-NN) algorithm (Sari, 2020). K-NN is a supervised learning method that classifies a set of data based on previously classified training data: a new query instance is assigned to the category held by the majority of its K nearest neighbors.

II. MATERIALS AND METHODS
This research used the K-Nearest Neighbor algorithm and proceeded in four stages: data collection by tweet crawling, tweet preprocessing, classification, and performance evaluation.

A. Crawling Tweets
Crawling is one of the methods used to collect or download data from a database (Gabielkov et al, 2016). Collecting tweet data from the Twitter database requires authentication through the Twitter Application Programming Interface (API) (Zheng et al, 2013). The Twitter API is a program or application provided by Twitter to make it easier for developers to access information on the Twitter website (Bošnjak et al, 2012).
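As an illustration of the authentication step, the following minimal sketch builds (but does not send) an authenticated search request. The endpoint shown is the Twitter API v2 recent-search endpoint, which is an assumption; the study does not state which endpoint, library, or query was used. The function name `build_search_request` and the placeholder token are hypothetical.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Twitter API v2 recent-search endpoint (an assumption; the study does not name one).
SEARCH_URL = "https://api.twitter.com/2/tweets/search/recent"


def build_search_request(keyword: str, bearer_token: str, max_results: int = 100) -> Request:
    """Build (but do not send) an authenticated tweet search request."""
    params = urlencode({"query": keyword, "max_results": max_results})
    # The bearer token in the Authorization header authenticates the caller.
    headers = {"Authorization": f"Bearer {bearer_token}"}
    return Request(f"{SEARCH_URL}?{params}", headers=headers)


# "YOUR_TOKEN" is a placeholder, not a working credential.
request = build_search_request("PPKM", bearer_token="YOUR_TOKEN")
print(request.full_url)
```

Sending the request (for example with `urllib.request.urlopen`) would require valid credentials, so the sketch stops at constructing it.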

B. Preprocessing Tweet
Preprocessing is carried out to eliminate incomplete, noisy, and inconsistent data, with the aim of normalizing the data (Alasadi et al, 2017; Nayak et al, 2016). The steps in preprocessing tweets are: case folding, namely changing all capital letters to lowercase; removing URLs; removing mentions/usernames, namely words starting with the "@" symbol; removing hashtags, namely words starting with the "#" symbol; removing symbols such as punctuation marks and emoticons/emojis; and removing retweets, namely tweets that are copied from other users and then reposted, usually starting with "RT" at the beginning of the tweet (Son et al, 2019; Abrar et al, 2019; Astari et al., 2020). In addition, tokenizing (also called parsing) is carried out, namely cutting the input string of a document into the words that compose it; in this case, each tweet is split into its individual words (Pratama et al, 2019). Tokenizing simplifies the removal of stop words. The last step is removing stop words, namely words that occur very often in sentences but carry little meaning; in English, stop words include pronouns, prepositions, conjunctions, and others (Salim, 2020).
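The preprocessing steps described above can be sketched as follows. This is a minimal illustration, not the study's implementation: the regular expressions, the function names `is_retweet` and `preprocess`, the sample tweets, and the tiny stop-word list are all assumptions made for the example.

```python
import re

# Tiny illustrative stop-word list (hypothetical; the study's list is not given).
STOP_WORDS = {"yang", "dan", "di", "ke", "dari", "ini", "itu"}


def is_retweet(tweet: str) -> bool:
    """Retweets usually start with "RT" at the beginning of the tweet."""
    return tweet.strip().startswith("RT ")


def preprocess(tweet: str) -> list[str]:
    """Return the cleaned tokens of one tweet."""
    text = tweet.lower()                        # case folding
    text = re.sub(r"https?://\S+", " ", text)   # remove URLs
    text = re.sub(r"[@#]\w+", " ", text)        # remove mentions and hashtags
    text = re.sub(r"[^a-z\s]", " ", text)       # remove symbols (here: any non-letter)
    tokens = text.split()                       # tokenizing on whitespace
    return [t for t in tokens if t not in STOP_WORDS]  # stop-word removal


tweets = [
    "RT @user: PPKM diperpanjang lagi",
    "PPKM ini sangat membantu! #PPKM https://example.com @someone 😊",
]
# Retweets are dropped before the remaining tweets are cleaned.
cleaned = [preprocess(t) for t in tweets if not is_retweet(t)]
print(cleaned)  # [['ppkm', 'sangat', 'membantu']]
```

Tokenizing before stop-word removal keeps the last step a simple membership check per word.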

C. K-Nearest Neighbors
K-NN is a classification algorithm that assigns a class to a data point based on its K closest neighboring data points (Ali et al, 2020). For classification, K-NN requires the dataset to be divided into training and testing data, so that the distance from each testing data point to the training data can be calculated. The closest distance is calculated with the Euclidean method from the testing data to the training data. The Euclidean distance is given by Equation (1).
$$d(x, y) = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2} \qquad (1)$$
where $d(x, y)$ is the distance between a testing data point $x$ and a training data point $y$, $x_i$ and $y_i$ are the values of their $i$-th feature, and $n$ is the number of features.
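A minimal sketch of K-NN classification with the Euclidean distance is given below. The two-feature vectors are hypothetical stand-ins for vectorized tweets (the text does not specify how tweets are turned into numeric features), and the function names are chosen for illustration only.

```python
import math
from collections import Counter


def euclidean(x: list[float], y: list[float]) -> float:
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))


def knn_predict(train_x, train_y, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    distances = sorted(
        (euclidean(x, query), label) for x, label in zip(train_x, train_y)
    )
    nearest_labels = [label for _, label in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]


# Hypothetical feature vectors and labels, for illustration only.
train_x = [[1.0, 1.0], [1.2, 0.8], [0.9, 1.1], [4.0, 4.2], [4.1, 3.9], [3.8, 4.0]]
train_y = ["negative", "negative", "negative", "positive", "positive", "positive"]

print(knn_predict(train_x, train_y, [1.1, 1.0], k=3))  # negative
print(knn_predict(train_x, train_y, [4.0, 4.0], k=3))  # positive
```

An odd K with two classes avoids tied votes, which is one reason odd values are commonly tested.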

D. Performance Evaluation
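The evaluation is carried out after the K-NN classification process has predicted the sentiment. Accuracy is obtained for each tested value of K by using a confusion matrix, so that the results of the K tests can be compared (Chikh, 2012). The following formulas calculate accuracy (2), precision (3), and recall (4) from the confusion matrix at the final stage of K-NN.
$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \qquad (2)$$
$$\text{Precision} = \frac{TP}{TP + FP} \qquad (3)$$
$$\text{Recall} = \frac{TP}{TP + FN} \qquad (4)$$
where $TP$, $TN$, $FP$, and $FN$ are the numbers of true positives, true negatives, false positives, and false negatives in the confusion matrix.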
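The accuracy, precision, and recall derived from a confusion matrix can be sketched as follows, using the standard two-class definitions. The label lists are made-up illustration data, not results from the study, and the function names are hypothetical.

```python
def confusion_counts(actual, predicted, positive="positive"):
    """Count true/false positives and negatives for a two-class problem."""
    tp = tn = fp = fn = 0
    for a, p in zip(actual, predicted):
        if p == positive:
            if a == positive:
                tp += 1
            else:
                fp += 1
        else:
            if a == positive:
                fn += 1
            else:
                tn += 1
    return tp, tn, fp, fn


def evaluate(actual, predicted, positive="positive"):
    """Return (accuracy, precision, recall) from the confusion-matrix counts."""
    tp, tn, fp, fn = confusion_counts(actual, predicted, positive)
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp) if tp + fp else 0.0  # guard against division by zero
    recall = tp / (tp + fn) if tp + fn else 0.0
    return accuracy, precision, recall


# Made-up labels for illustration: 3 positive and 5 negative tweets.
actual = ["positive"] * 3 + ["negative"] * 5
predicted = ["positive", "positive", "negative",
             "negative", "negative", "negative", "positive", "negative"]

accuracy, precision, recall = evaluate(actual, predicted)
print(accuracy)  # 0.75
```

Here two of the three positive tweets are found and one negative tweet is mislabeled as positive, so precision and recall are both 2/3.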

A. Data Collection
A total of 964 tweets related to PPKM were collected from Twitter between July 1, 2021 and July 14, 2021, as shown in Table 1.

B. Preprocessing Tweet
Preprocessing was carried out to eliminate incomplete, noisy, and inconsistent data, with the aim of normalizing the data. The steps applied to the tweets were: changing all capital letters to lowercase; removing URLs; removing mentions/usernames, namely words starting with the "@" symbol; removing hashtags, namely words starting with the "#" symbol; removing symbols such as punctuation marks and emoticons/emojis; and removing retweets, namely tweets that are copied from other users and then reposted, usually starting with "RT" at the beginning of the tweet. In addition, tokenizing (also called parsing) was carried out, namely cutting the input string of a document into the words that compose it; in this case, each tweet was split into its individual words.

C. K-Nearest Neighbor Results
Accuracy was tested with the K-NN algorithm by dividing the data into 80% training data and 20% testing data. K-NN was tested with K = 3, K = 5, and K = 7, and the resulting accuracies, obtained from the confusion matrix, are compared in Tables 2 and 3. The two tables show that the accuracy values differ between the first data and the second data. In both cases K = 3 gave the best accuracy among the tested K values: 85.49% on the first data and 89.66% on the second data.
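The testing procedure (an 80%/20% split followed by a comparison of K = 3, 5, and 7) can be sketched as below. The data are synthetic two-feature points generated for illustration, so the accuracies printed by this sketch are unrelated to the figures reported in Tables 2 and 3; the function names and the random seeds are likewise assumptions.

```python
import math
import random
from collections import Counter


def split_80_20(rows, seed=42):
    """Shuffle the rows and split them into 80% training and 20% testing data."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * 0.8)
    return shuffled[:cut], shuffled[cut:]


def knn_predict(train, query, k):
    """Majority label among the k training rows closest to `query`."""
    nearest = sorted(train, key=lambda row: math.dist(row[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def accuracy_for_k(train, test, k):
    """Fraction of test rows whose predicted label matches the true label."""
    hits = sum(knn_predict(train, features, k) == label for features, label in test)
    return hits / len(test)


# Synthetic stand-in data: two clusters of 50 points each (not the study's tweets).
rng = random.Random(0)
rows = [([rng.gauss(0, 1), rng.gauss(0, 1)], "negative") for _ in range(50)]
rows += [([rng.gauss(3, 1), rng.gauss(3, 1)], "positive") for _ in range(50)]

train, test = split_80_20(rows)
results = {k: accuracy_for_k(train, test, k) for k in (3, 5, 7)}
for k, acc in results.items():
    print(f"K = {k}: accuracy = {acc:.2%}")
```

Fixing the shuffle seed keeps the split identical across K values, so differences in accuracy come from K alone.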

V. CONCLUSION
By applying the K-Nearest Neighbor classification algorithm, conclusions about the sentiment of Indonesian public opinion on the application of PPKM can be drawn from the data preprocessing, the sentiment classification, and the accuracy of the K-NN algorithm. Based on data preprocessing and sentiment classification, in the first test positive sentiment was 37.6% (261 tweets), negative sentiment was 65.9% (636 tweets), and neutral sentiment was 9.6% (67 tweets). For the accuracy of the K-NN algorithm, K = 3 gave the best result among the tested K values, with 85.49% on the first data and 89.66% on the second data. Based on the results of the sentiment analysis of Indonesian public opinion on the implementation of PPKM, negative opinions outnumber positive opinions. Therefore, the PPKM system needs to be evaluated and improved. Further research is expected to apply different algorithms in order to obtain more accurate sentiment analysis results.