TWITTER SENTIMENT ANALYSIS TOWARDS QATAR AS HOST OF THE 2022 WORLD CUP USING TEXTBLOB

Twitter lets its users create status messages, usually called tweets. Through tweets, users can express opinions, views, or emotions about a topic. On 2 December 2010, Qatar was selected to host the 2022 World Cup, a decision that could elicit a variety of responses from various circles around the world. This research used TextBlob to determine the sentiment of Twitter users around the world toward Qatar as the host of the 2022 World Cup. The research was conducted in three stages. In the first stage, before Qatar was selected to host, sentiment was 88.46% positive and 11.54% negative. In the second stage, after Qatar was selected to host, it was 79.38% positive and 20.62% negative. In the third stage, while the 2022 World Cup took place in Qatar, it was 83.72% positive and 16.28% negative. The study obtained an accuracy score of 83%, meaning that the model correctly predicted 83% of the testing data. This approach can predict the sentiment of new data without the data having to be labeled first.


INTRODUCTION
Twitter is one of the most popular social communication platforms today. It connects people around the world through computers or mobile phones. Twitter lets its users create status messages, usually called tweets (Fikri et al., 2020). Through tweets, users can express opinions, views, or emotions about a topic; if a topic draws many responses and users attach hashtags to it, it becomes a trending topic.
One topic that has resonated on Twitter and become a trending topic is the 2022 FIFA World Cup. On 2 December 2010, Qatar surprisingly won the bid to host the 2022 FIFA World Cup, becoming the first Middle Eastern country to host the World Cup. Qatar was chosen after defeating Australia, Japan, South Korea, and the United States, which were also bidding to host the 2022 World Cup. Qatar's selection could elicit a variety of responses from various circles around the world, and social media such as Twitter is one of the places where people give their responses or opinions (Odd, 2016). These responses can be used as material for sentiment analysis toward Qatar as the host of the 2022 World Cup.
Sentiment analysis is an area of natural language processing (NLP): a process designed to identify the opinions or views (sentiments) expressed in textual data and determine whether they are positive, negative, or neutral toward a topic (Fauzi & Adinugroho, 2018). However, users face difficulties when reading tweets directly without their being marked as positive, negative, or neutral. Therefore, a classification is needed that lets users easily see which tweets are positive or negative (Primary, Ariesta, & Gata, 2022).
Research by Ravikumar Patel and Kalpdrum Passi, entitled "Sentiment Analysis on Twitter Data of World Cup Soccer Tournament Using Machine Learning," carried out the analysis by applying machine learning techniques. The data collected in that study were tweets with the hashtags "#brazil2014" and "#worldcup2014" as well as match hashtags. The results showed that Naïve Bayes provided the best accuracy, at 88.17%, while random forest provided the best area under the curve (AUC), at 0.9 (Patel & Passi, 2020).
Research by I Gede Susrama Mas Diyasa et al., entitled "Twitter Sentiment Analysis as an Evaluation and Service Base on Python Textblob," explained that Twitter can be a place for customers to express their feelings about service providers, one of which is PT Telkom Indonesia. Processing the tweet data with TextBlob yielded 34.4% positive tweets, 16.1% negative tweets, and 49.6% neutral tweets (Mas Diyasa et al., 2021).

METHOD
This study followed the methodology shown in the flowchart in Figure 1.

Figure 1. Research methodology flowchart
The collected data cannot be analyzed directly because they are not yet clean, so they must first go through several data preprocessing stages. Once the data are clean, they are analyzed using TextBlob to determine which tweets are positive and which are negative. The final step is visualizing the results of the analysis.

Data Collection
Tweet data were collected by web scraping with SNScrape, a Python library for scraping social media sites, including Twitter (Rivaldi et al., 2022). The researchers collected English-language tweets relating to the 2022 World Cup in Qatar. Data were collected for three periods: before Qatar was selected to host, after it was selected to host, and while the 2022 World Cup took place in Qatar.
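The collection step might look roughly like the sketch below. It assumes the snscrape package's `TwitterSearchScraper` interface and its tweet attributes (`date`, `id`, `rawContent` in newer releases or `content` in older ones, `user.username`). The helper names `to_row` and `scrape` are hypothetical, the 200,000 cap is the per-stage limit stated in the data-collection results, and snscrape's Twitter module may no longer work against the current X/Twitter site, so the usage example runs offline on a stand-in object.

```python
import itertools
from types import SimpleNamespace


def to_row(tweet):
    """Keep only the four attributes used in the study (hypothetical helper)."""
    # Newer snscrape releases expose the text as `rawContent`; older ones as `content`.
    text = getattr(tweet, "rawContent", None) or getattr(tweet, "content", "")
    return {
        "Datetime": tweet.date,
        "Tweet Id": tweet.id,
        "Text": text,
        "Username": tweet.user.username,
    }


def scrape(query, limit=200_000):
    """Collect at most `limit` tweets matching a Twitter search query (hypothetical helper)."""
    # Imported lazily so to_row can be used without snscrape installed.
    import snscrape.modules.twitter as sntwitter

    items = sntwitter.TwitterSearchScraper(query).get_items()
    return [to_row(tweet) for tweet in itertools.islice(items, limit)]


# Offline usage with a stand-in object shaped like a snscrape tweet:
example = SimpleNamespace(
    date="2010-12-02",
    id=1,
    rawContent="Qatar wins the 2022 bid",
    user=SimpleNamespace(username="example_user"),
)
print(to_row(example))
```

`itertools.islice` stops consuming the scraper's generator once the cap is reached, so no more requests are made than needed.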

Data Preprocessing
Data preparation must be performed before the dataset is fed into the model. Its goal is to reduce noise in the data so that the data are cleaner and the best possible results are obtained. The process is divided into several stages: (1) drop unnecessary columns; (2) rename columns to make them easier to understand; (3) add a tweet column that will hold the cleaned data; (4) remove hyperlinks from the tweets; (5) remove the retweet label from the text; (6) remove symbols from the tweets; (7) remove stopwords, that is, words that serve a grammatical function but carry little meaning; and (8) remove duplicate data (Giovani et al., 2020).
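Steps (4) to (8) can be sketched in plain Python as below; steps (1) and (2) would typically be pandas calls such as `DataFrame.drop(columns=...)` and `DataFrame.rename(columns=...)`. This is an illustrative sketch, not the study's code: the regular expressions, the small stopword list, the removal of @usernames, and the names `clean_tweet` and `preprocess` are all assumptions.

```python
import re

# Illustrative stopword list (an assumption); the study's list is not given.
STOPWORDS = {"a", "an", "and", "are", "at", "by", "for", "in", "is",
             "of", "on", "that", "the", "this", "to", "with"}

URL_RE = re.compile(r"https?://\S+|www\.\S+")   # hyperlinks
RT_RE = re.compile(r"^RT\s+@\w+:?\s*")          # leading "RT @user:" label
MENTION_RE = re.compile(r"@\w+")                # remaining @usernames
SYMBOL_RE = re.compile(r"[^A-Za-z\s]")          # symbols, punctuation, digits


def clean_tweet(text):
    text = URL_RE.sub(" ", text)        # (4) remove hyperlinks
    text = RT_RE.sub("", text.strip())  # (5) remove the retweet label
    text = MENTION_RE.sub(" ", text)    # also drop usernames
    text = SYMBOL_RE.sub(" ", text)     # (6) remove symbols
    words = [w for w in text.lower().split() if w not in STOPWORDS]  # (7)
    return " ".join(words)


def preprocess(rows):
    """rows: dicts with a 'Text' key; (3) adds a 'Tweets' key with the cleaned text."""
    seen, cleaned_rows = set(), []
    for row in rows:
        cleaned = clean_tweet(row["Text"])
        if cleaned and cleaned not in seen:  # (8) drop duplicates and empty tweets
            seen.add(cleaned)
            cleaned_rows.append({**row, "Tweets": cleaned})
    return cleaned_rows


print(clean_tweet("RT @fifa: Qatar wins the 2022 World Cup bid! https://t.co/abc"))
```

Duplicates are detected on the cleaned text, so tweets that differ only in links or punctuation collapse into one row.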

Data Classified
The sentiment analysis is performed with the TextBlob library. TextBlob is a Python library for processing textual data that provides a simple API for common natural language processing (NLP) tasks (Hazarika et al., 2020). TextBlob can classify text into three classes: positive, negative, and neutral. However, TextBlob's sentiment analysis works only on English text, so the researchers collected only English-language tweets. TextBlob computes a polarity value for each text: if the polarity is greater than 0, the text is positive; if it equals 0, it is neutral; and if it is less than 0, it is negative (Mas Diyasa et al., 2021).
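The polarity rule above can be sketched as follows. `TextBlob(text).sentiment.polarity` is TextBlob's default (pattern-based) polarity score in the range -1.0 to 1.0; the function names `polarity_label` and `tweet_sentiment` are hypothetical, and the sketch is not presented as the study's own code.

```python
def polarity_label(polarity):
    """Map a polarity score to the three classes described above."""
    if polarity > 0:
        return "positive"
    if polarity < 0:
        return "negative"
    return "neutral"


def tweet_sentiment(text):
    """Return (polarity, label) for one English tweet using TextBlob's default analyzer."""
    from textblob import TextBlob  # third-party: pip install textblob

    polarity = TextBlob(text).sentiment.polarity  # float in [-1.0, 1.0]
    return polarity, polarity_label(polarity)
```

Keeping the thresholding in a separate pure function makes the classification rule easy to check independently of TextBlob.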

Tokenization
Tokenization is the process of breaking a sentence into a set of stand-alone words. It also removes delimiters such as periods, commas, spaces, and digits from sentences (Hafidz, 2020).
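A minimal regex-based tokenizer in this spirit is sketched below. It is an assumption, not the study's implementation: it lower-cases the text and keeps runs of letters (with an optional apostrophe suffix), so periods, commas, spaces, and digits act as delimiters and are dropped. TextBlob's own `TextBlob(text).words` property provides a similar word list.

```python
import re

# Runs of letters, optionally followed by an apostrophe part (e.g. "qatar's").
TOKEN_RE = re.compile(r"[a-z]+(?:'[a-z]+)?")


def tokenize(sentence):
    """Split a sentence into lower-case words, discarding delimiters and digits."""
    return TOKEN_RE.findall(sentence.lower())


print(tokenize("Let Us Support Qatar World Cup Bid"))
```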

Data Visualization
The sentiment analysis results are visualized with matplotlib, a Python library for data visualization and plotting (Lemenkova, 2019). The visualization takes the form of a word cloud, a display of frequently occurring words in which the size of each word reflects how often it occurs in the data: words that occur more frequently are drawn larger, and words that occur less frequently are drawn smaller (Permadi, 2020).
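A word cloud of this kind could be produced roughly as follows: count word frequencies across the tokenized tweets, then hand the counts to a renderer. The sketch assumes the third-party `wordcloud` package (its `WordCloud.generate_from_frequencies` method) for layout and matplotlib for display; the helper names and figure settings are hypothetical.

```python
from collections import Counter


def word_frequencies(token_lists):
    """Count how often each word occurs across all tokenized tweets."""
    counts = Counter()
    for tokens in token_lists:
        counts.update(tokens)
    return counts


def show_wordcloud(frequencies, title):
    """Render a word cloud in which more frequent words get larger fonts."""
    # Assumes the third-party `wordcloud` package (pip install wordcloud) and matplotlib.
    from wordcloud import WordCloud
    import matplotlib.pyplot as plt

    cloud = WordCloud(width=800, height=400, background_color="white")
    cloud.generate_from_frequencies(frequencies)
    plt.figure(figsize=(10, 5))
    plt.imshow(cloud, interpolation="bilinear")
    plt.axis("off")
    plt.title(title)
    plt.show()


freqs = word_frequencies([["qatar", "world", "cup"], ["qatar", "bid"]])
print(freqs.most_common(3))
```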

Data Collection
The web scraping process was carried out in three stages. The first stage runs from Qatar's bid to host in March 2009 until just before the announcement of Qatar's selection, up to December 1, 2010; the second stage runs from the official announcement of Qatar's selection on December 2, 2010 until one year later, December 2, 2011; and the last stage covers the 2022 FIFA World Cup itself, from November 20, 2022 to December 18, 2022. Table 1 lists the dates on which tweets were retrieved and the number of tweets collected on each date. Web scraping was capped at a maximum of 200,000 tweets per stage to keep the amount of data bounded.
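The three stage windows can be expressed as Twitter search queries as sketched below. The keywords are hypothetical (the study's exact query is not given), the first day of March 2009 is an assumed start date because the text names only the month, and the sketch relies on Twitter's `since:` operator being inclusive and `until:` being exclusive, which is why one day is added to each end date.

```python
from datetime import date, timedelta

# Stage windows from the text (end dates inclusive); the March 2009 day is assumed.
STAGES = {
    "before_selection": (date(2009, 3, 1), date(2010, 12, 1)),
    "after_selection": (date(2010, 12, 2), date(2011, 12, 2)),
    "during_tournament": (date(2022, 11, 20), date(2022, 12, 18)),
}
MAX_TWEETS_PER_STAGE = 200_000  # per-stage cap stated in the text

KEYWORDS = "qatar world cup"  # hypothetical search terms


def build_query(start, end, keywords=KEYWORDS):
    """English-only search query covering start..end inclusive."""
    until = end + timedelta(days=1)  # until: excludes the given day
    return f"{keywords} lang:en since:{start.isoformat()} until:{until.isoformat()}"


for name, (start, end) in STAGES.items():
    print(name, build_query(start, end))
```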

In this study, the researchers used the attributes 'Datetime', 'Tweet Id', 'Text', and 'Username' obtained from the web scraping process. A sample of the web scraping results is shown in Table 2. After the data were collected, the tweet frequency for each stage was plotted in Figure 2, Figure 3, and Figure 4. Figure 4 shows the frequency of tweets during the 2022 FIFA World Cup in Qatar, from November 20, 2022 to December 18, 2022. Tweet volume peaked at the opening of the World Cup and declined afterward. It rose sharply again on December 9 and 10, 2022, before falling once more.

Data Preprocessing
Table 2 contains sample data; its 'Original Tweet' column shows tweets that still contain unnecessary URLs, usernames, and ASCII symbols that could interfere with the visualization, so these must be removed first. The uncleaned data go through the preprocessing steps described in the Method section, after which they are easier to turn into information. The cleaned tweets are shown in the 'Tweets' column of Table 3.

Data Classified
The data used in this process are the data that have already gone through preprocessing. The tweet text data consist of a predictor variable, namely the cleaned tweets, and a response variable containing the tweet sentiment produced by the classification (positive or negative). TextBlob is then applied to the cleaned tweets.
A sentiment value of '1' denotes positive sentiment and a value of '-1' denotes negative sentiment. If a tweet's negative score exceeds its positive score, the tweet is assigned to negative sentiment; otherwise, it is assigned to positive sentiment. This rule is implemented in the study's code. Table 4 shows a sample of the sentiment analysis results used in this study. The classification was carried out in three stages.
The first stage covers the period from Qatar's bid to host in March 2009 until just before the announcement of Qatar's selection, up to December 1, 2010. Figure 5 shows the sentiment analysis results for this stage. Before Qatar was announced as host, negative sentiment toward Qatar was low, while positive sentiment accounted for more than 88% of the tweets retrieved. The number of tweets in this stage was also very small compared with the period after the announcement and the period during the 2022 World Cup in Qatar.
The second stage covers the period from the official announcement of Qatar's selection on December 2, 2010 until one year later, December 2, 2011. Figure 7 shows the sentiment analysis results for this stage. After Qatar was announced as host, negative sentiment toward Qatar increased, while positive sentiment fell from more than 88% to 79.38%. The number of tweets also rose sharply compared with the period before the announcement, and this increase was accompanied by an increase in negative sentiment toward Qatar.
In Figure 10, the Y axis represents the number of tweets and the X axis represents each day; the line chart compares negative and positive sentiment per day. Increases in positive sentiment were accompanied by increases in negative sentiment.
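One reading of the binary rule and of the stage summaries is sketched below. It assumes the "score" is TextBlob's polarity and that a polarity of exactly 0 falls into the positive class, following the "otherwise positive" wording; the function names are hypothetical. `sentiment_percentages` gives the kind of split shown in Figures 5, 7 and 9, and `daily_counts` gives the per-day series behind the line charts.

```python
from collections import Counter, defaultdict


def binary_label(polarity):
    """-1 if the tweet leans negative, otherwise 1 (polarity 0 counts as positive)."""
    return -1 if polarity < 0 else 1


def sentiment_percentages(labels):
    """Share of positive (1) and negative (-1) labels, in percent."""
    if not labels:
        return {"positive": 0.0, "negative": 0.0}
    counts = Counter(labels)
    total = len(labels)
    return {
        "positive": round(100 * counts[1] / total, 2),
        "negative": round(100 * counts[-1] / total, 2),
    }


def daily_counts(rows):
    """rows: (day, label) pairs -> {day: Counter({1: n_pos, -1: n_neg})}, sorted by day."""
    per_day = defaultdict(Counter)
    for day, label in rows:
        per_day[day][label] += 1
    return dict(sorted(per_day.items()))


def plot_daily(per_day):
    """Line chart of positive vs negative tweets per day (assumes matplotlib)."""
    import matplotlib.pyplot as plt

    days = list(per_day)
    plt.plot(days, [per_day[d][1] for d in days], label="positive")
    plt.plot(days, [per_day[d][-1] for d in days], label="negative")
    plt.xlabel("Day")
    plt.ylabel("Number of tweets")
    plt.legend()
    plt.show()


labels = [binary_label(p) for p in [0.5, 0.0, -0.2, 0.1]]
print(sentiment_percentages(labels))
```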

Tokenization
The tokenization process recognizes words and breaks sentences into words based on spaces and punctuation. Table 5 shows a sample of the tokenization results.
Table 5. Tokenization Process
Tweets | Tokenized tweets
branding qatar world cup bid mt via | 'branding', 'qatar', 'world', 'cup', 'bid', 'mt', 'via'
qatar bidding nation fifa world cup | 'qatar', 'bidding', 'nation', 'fifa', 'world', 'cup'
Let Us Support Qatar World Cup Bid | 'let', 'us', 'support', 'qatar', 'world', 'cup', 'bid'

Data Visualization
The data that have gone through tokenization are then grouped to obtain the frequency of the most frequent words. These word frequencies are visualized as positive and negative word clouds, as shown in Figure 11, Figure 12, and Figure 13.
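Separate positive and negative word clouds need separate frequency tables. A minimal sketch of that grouping step, assuming tokenized tweets paired with labels of 1 (positive) and -1 (negative) and using a hypothetical helper name, is below; each resulting `Counter` could then be passed to a renderer such as `WordCloud.generate_from_frequencies` from the third-party `wordcloud` package. The example tokens are illustrative only.

```python
from collections import Counter


def frequencies_by_sentiment(token_lists, labels):
    """Count words separately for positive (1) and negative (-1) tweets."""
    positive, negative = Counter(), Counter()
    for tokens, label in zip(token_lists, labels):
        target = positive if label == 1 else negative
        target.update(tokens)
    return positive, negative


tokens = [["qatar", "world", "cup"], ["fifa", "bid", "odd"], ["qatar", "bid"]]
labels = [1, -1, 1]
pos, neg = frequencies_by_sentiment(tokens, labels)
print(pos.most_common(1))
print(neg.most_common(3))
```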

CONCLUSION
In this study, sentiment analysis of tweets about the 2022 FIFA World Cup in Qatar was carried out using TextBlob. Web scraping Twitter with search keywords for the 2022 FIFA World Cup in Qatar yielded 244,832 tweets, which were then analyzed to determine positive and negative sentiment. The research was conducted in three stages. In the first stage, before Qatar was selected to host, sentiment was 88.46% positive and 11.54% negative. In the second stage, after Qatar was selected to host, it was 79.38% positive and 20.62% negative. In the third stage, while the 2022 World Cup took place in Qatar, it was 83.72% positive and 16.28% negative. The study obtained an accuracy score of 83%, meaning that the model correctly predicted 83% of the testing data. This approach can predict the sentiment of new data without the data having to be labeled first.

Figure 2. Tweet Frequency Before Qatar's Vote to Host

Figure 3. Tweet Frequency After Qatar Was Selected to Host

Figure 4. Tweet Frequency during the 2022 FIFA World Cup

Figure 5. Percentage of Sentiment Before Qatar's Election to Host

Figure 6. Comparison of Positive and Negative Sentiment before the Announcement

Figure 7. Percentage of Sentiment After Qatar's Election to Host

Figure 8. Comparison of Positive and Negative Sentiment after the Announcement

Figure 9. Sentiment Percentages during the 2022 FIFA World Cup

Figure 10. Comparison of Positive and Negative Sentiment during the 2022 World Cup

Figure 11. Positive and Negative Wordcloud Before Qatar Was Selected to Host

Table 2. Sample Tweet Data
Spector scored those two goals in order to get back at Sir Alex Ferguson backing Qatar's 2022 World Cup Bid.#goUSAbid

Table 3. Sample Data Before Cleaning and After Cleaning

Table 4. Sample Sentiment Analysis Results
Mark it: Russia for World Cup 2018, Qatar for World Cup 2022. Nothing surprises me with FIFA anymore. Lack of FIFA talk for US bid is odd.