Text sentiment analysis of Twitter data in R (with code and data)

Original link: http://tecdat.cn/?p=4012

Recently we were asked by a client to write a research report on text sentiment analysis, including some graphics and statistical output.

Using Twitter data collected with R as an example, we mine the tweet text and then perform sentiment analysis on it, which reveals a number of interesting patterns.

First, we keep only tweets posted from an iPhone or an Android phone and discard tweets from all other sources.

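library(dplyr); library(tidyr)
# keep only tweets whose source is "Twitter for iPhone" or "Twitter for Android"
tweets <- tweets_df %>% select(id, statusSource, text, created) %>%
  extract(statusSource, "source", "Twitter for (.*?)<") %>% filter(source %in% c("iPhone", "Android"))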
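The sketch below shows what the extract() step does on a few hypothetical statusSource values. It assumes the field holds an HTML anchor such as <a href="...">Twitter for Android</a>. The toy rows are illustrative and do not come from the data set.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy statusSource values in the assumed HTML-anchor format
toy_sources <- data.frame(
  id = 1:4,
  statusSource = c(
    '<a href="http://twitter.com/download/android" rel="nofollow">Twitter for Android</a>',
    '<a href="http://twitter.com/download/iphone" rel="nofollow">Twitter for iPhone</a>',
    '<a href="http://twitter.com" rel="nofollow">Twitter Web Client</a>',
    '<a href="http://twitter.com/download/ipad" rel="nofollow">Twitter for iPad</a>'
  ),
  stringsAsFactors = FALSE
)

# The lazy group (.*?) captures the text between "Twitter for " and the next "<".
# Rows that do not match (the web client) get NA; filter() drops them and the iPad row.
kept <- toy_sources %>%
  extract(statusSource, "source", "Twitter for (.*?)<") %>%
  filter(source %in% c("iPhone", "Android"))
kept
```

Because NA %in% c("iPhone", "Android") is FALSE, sources that do not follow the "Twitter for ..." pattern need no separate cleanup step.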

Next, we visualize the data. We calculate the proportion of tweets posted in each hour of the day and compare the iPhone and Android distributions.
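The article does not show the code for this step. Here is a minimal sketch. It assumes the tweets data frame has a source column and a POSIXct created column, and it uses Eastern Time (America/New_York) as the reference time zone; both are assumptions. hourly_share is a hypothetical helper name.

```r
library(dplyr)

# Share of each platform's tweets posted in each hour of the day.
# The time zone used to read off the hour is an assumption.
hourly_share <- function(tweets, tz = "America/New_York") {
  tweets %>%
    mutate(hour = as.integer(format(created, "%H", tz = tz))) %>%
    count(source, hour) %>%
    group_by(source) %>%
    mutate(percent = n / sum(n)) %>%   # normalise within each platform
    ungroup()
}

# Hypothetical toy input, for illustration only
toy_times <- data.frame(
  source  = c("Android", "Android", "iPhone", "iPhone", "iPhone"),
  created = as.POSIXct(c("2016-08-01 06:10", "2016-08-01 07:45",
                         "2016-08-01 15:00", "2016-08-01 15:30",
                         "2016-08-01 18:20"), tz = "America/New_York")
)
hourly_share(toy_times)
```

Normalising within each platform makes the two curves comparable even when one platform posts more tweets overall. The result can be drawn with ggplot(aes(hour, percent, color = source)) + geom_line().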

The chart shows a clear difference in posting times between the two platforms. Android tweets tend to be posted between 5:00 and 10:00, while iPhone tweets are mostly posted between 10:00 and 20:00. We can also see that the proportion of tweets posted from Android is higher than the proportion posted from the iPhone.

Next, we check whether each tweet is a quote and compare the counts on the two platforms.

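library(dplyr); library(stringr); library(ggplot2)
# assumed definition: a tweet is "quoted" when its text starts with a double quote
tweets %>% count(source, quoted = ifelse(str_detect(text, '^"'), "Quoted", "Not quoted")) %>%
  ggplot(aes(source, n, fill = quoted)) +
  geom_bar(stat = "identity", position = "dodge") +
  labs(x = "", y = "Number of tweets", fill = "")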

The comparison shows that Android has a markedly lower share of unquoted tweets than the iPhone and markedly more quoted tweets. This suggests that most tweets sent from the iPhone are original content, while most tweets sent from Android are quotes.

Then we check whether each tweet contains a link or picture and compare the two platforms.
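The article does not show how tweet_picture_counts is built. One plausible construction is sketched below. It assumes a tweet has a picture or link when its text contains a t.co short URL, since Twitter wraps shared links and media in t.co URLs. count_pictures and the toy tweets are hypothetical.

```r
library(dplyr)
library(stringr)

# Hypothetical construction of tweet_picture_counts: a tweet is assumed to
# carry a picture or link when its text contains a t.co short URL.
count_pictures <- function(tweets) {
  tweets %>%
    count(source,
          picture = ifelse(str_detect(text, "t\\.co"),
                           "Picture/link", "No picture/link"))
}

# Hypothetical toy tweets, for illustration only
toy_tweets <- data.frame(
  source = c("Android", "Android", "iPhone", "iPhone"),
  text   = c("Big day today", "Thank you all",
             "Join us tonight https://t.co/abc123", "Thank you!"),
  stringsAsFactors = FALSE
)
toy_counts <- count_pictures(toy_tweets)
toy_counts
```

With the real data this would be tweet_picture_counts <- count_pictures(tweets). With these labels, "No picture/link" sorts before "Picture/link". After spread(), the picture/link category therefore lands in the second row, which is the row that spr$iPhone[2] and spr$Android[2] select.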

library(ggplot2)
ggplot(tweet_picture_counts, aes(source, n, fill = picture)) +
  geom_bar(stat = "identity", position = "dodge") +
  labs(x = "", y = "Number of tweets", fill = "")

The chart shows that far more Android tweets than iPhone tweets contain no picture or link. In other words, tweets sent from the iPhone usually include a photo or link, while Android tweets generally do not. To quantify the difference, we convert the counts into each platform's share of tweets in each category. We then take the iPhone-to-Android ratio for tweets with a picture or link.

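library(dplyr); library(tidyr)
spr <- tweet_picture_counts %>% spread(source, n) %>%
  mutate(across(c(Android, iPhone), ~ .x / sum(.x)))   # each platform's share of tweets per category
# row 2 is assumed to be the "picture/link" category; rr = iPhone share / Android share
rr <- spr$iPhone[2] / spr$Android[2]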

Next, we strip links and unwanted characters from the tweets and split the text into words. We then remove stop words and rank the remaining keywords by frequency.

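library(dplyr); library(stringr); library(tidytext)
# split on any character that cannot belong to a word, #hashtag or @mention
reg <- "([^A-Za-z\\d#@']|'(?![A-Za-z\\d#@]))"
tweet_words <- tweets %>% filter(!str_detect(text, '^"')) %>%   # drop quoted tweets
  mutate(text = str_replace_all(text, "https://t.co/[A-Za-z\\d]+|&amp;", "")) %>%   # remove t.co links and escaped "&"
  unnest_tokens(word, text, token = "regex", pattern = reg) %>%
  filter(!word %in% stop_words$word, str_detect(word, "[a-z]"))   # drop stop words and tokens without letters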


library(ggplot2)
tweet_words %>% count(word, sort = TRUE) %>% head(20) %>%   # 20 most frequent words
  mutate(word = reorder(word, n)) %>% ggplot(aes(word, n)) +
  geom_bar(stat = "identity")
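To check what the tokenizing regex keeps, the sketch below applies the same pattern to a single made-up tweet with stringr::str_split, outside the tidytext pipeline. tokenize_tweet is a hypothetical helper. It mirrors the link removal and the lower-casing that unnest_tokens performs by default. It also drops the empty strings that consecutive separators leave behind.

```r
library(stringr)

# Same splitting pattern as above: anything that is not a letter, digit, #, @
# or an apostrophe inside a word acts as a separator.
reg <- "([^A-Za-z\\d#@']|'(?![A-Za-z\\d#@]))"

tokenize_tweet <- function(text) {
  text <- str_replace_all(text, "https://t.co/[A-Za-z\\d]+|&amp;", "")  # strip t.co links
  tokens <- str_split(tolower(text), reg)[[1]]
  tokens[tokens != ""]   # consecutive separators produce empty strings
}

tokenize_tweet("Thanks @user! #MAGA can't wait https://t.co/abc123")
```

Hashtags and mentions survive as single tokens, and the apostrophe in "can't" is kept because it is followed by a letter.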

Next, we perform sentiment analysis to compare the two platforms. For each word we compute a log ratio (logratio) that measures how much more often the word appears in Android tweets than in iPhone tweets. We then use each word's sentiment category to compare the sentiment profiles of the two platforms.
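The article does not show how android_iphone_ratios is computed. Below is a minimal sketch, assuming tweet_words has one row per word with word and source columns. It proceeds in four steps:

1. Count each word per platform.
2. Add one to every count so that words seen on only one platform do not produce a zero.
3. Convert the smoothed counts to each platform's word share.
4. Take logratio = log2(Android share / iPhone share).

The add-one smoothing, the base-2 logarithm and the helper name word_log_ratios are all assumptions. Under them, a positive logratio means the word leans toward Android.

```r
library(dplyr)
library(tidyr)

# Hypothetical reconstruction of android_iphone_ratios (not shown in the article)
word_log_ratios <- function(tweet_words) {
  tweet_words %>%
    count(word, source) %>%
    pivot_wider(names_from = source, values_from = n, values_fill = 0) %>%
    # add-one smoothing, then each word's share of the platform's words
    mutate(across(c(Android, iPhone), ~ (.x + 1) / sum(.x + 1))) %>%
    mutate(logratio = log2(Android / iPhone)) %>%
    arrange(desc(logratio))
}

# Hypothetical toy words, for illustration only
toy_words <- data.frame(
  source = c("Android", "Android", "Android", "iPhone", "iPhone"),
  word   = c("awful", "awful", "thanks", "thanks", "join"),
  stringsAsFactors = FALSE
)
word_log_ratios(toy_words)
```

With the real data this would be android_iphone_ratios <- word_log_ratios(tweet_words).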

We then count the words in each sentiment category on each platform and plot a confidence interval for the Android-versus-iPhone difference in each category. The chart shows that, relative to the iPhone, Android tweets lean most strongly toward negative words, followed by disgust and then sadness. They show very little tendency toward positive emotions.
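The article does not say how the confidence intervals are obtained. One standard option, assumed here, is base R's poisson.test(). It takes each platform's count of words in a sentiment category and its total word count, and returns the Android-to-iPhone rate ratio with a confidence interval. sentiment_rate_ratio is a hypothetical helper, and the counts in the example are made up.

```r
# Compare how often a sentiment's words occur on Android vs iPhone,
# relative to each platform's total number of words.
sentiment_rate_ratio <- function(words, total_words) {
  test <- poisson.test(words, total_words)   # two-sample comparison of Poisson rates
  c(estimate  = unname(test$estimate),       # (x1 / T1) / (x2 / T2)
    conf.low  = test$conf.int[1],
    conf.high = test$conf.int[2])
}

# Hypothetical counts, for illustration only
sentiment_rate_ratio(words       = c(Android = 120, iPhone = 60),
                     total_words = c(Android = 4000, iPhone = 4000))
```

Applying the helper to every sentiment category and plotting each estimate with its interval gives the kind of comparison described above. An interval lying entirely above 1 indicates that the category is relatively more common in Android tweets.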

Next, we look at the keywords that fall into each sentiment category.

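nrc <- tidytext::get_sentiments("nrc")   # assumed source of the NRC word-emotion lexicon (needs the textdata package)
android_iphone_ratios %>% inner_join(nrc, by = "word") %>%
  filter(!sentiment %in% c("positive", "negative")) %>%   # keep the eight NRC emotion categories
  mutate(sentiment = reorder(sentiment, -logratio), word = reorder(word, -logratio))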

The results show that most negative words are concentrated in Android tweets, with far fewer appearing in iPhone tweets.
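The sentiment code above stops before its plotting step. The helper below sketches one way to narrow the per-sentiment word lists before plotting: within each sentiment, it keeps the words whose logratio is largest in absolute value. It assumes the joined table has word, sentiment and logratio columns, and that a positive logratio means the word leans toward Android. top_words_by_sentiment and the toy rows are hypothetical.

```r
library(dplyr)

# Keep the n words per sentiment that lean most strongly toward either platform
top_words_by_sentiment <- function(ratios_with_sentiment, n = 10) {
  ratios_with_sentiment %>%
    group_by(sentiment) %>%
    slice_max(abs(logratio), n = n, with_ties = FALSE) %>%
    ungroup() %>%
    mutate(platform = ifelse(logratio > 0, "Android", "iPhone"))  # assumes log2(Android / iPhone)
}

# Hypothetical toy rows, for illustration only
toy_sentiment <- data.frame(
  word      = c("awful", "nasty", "happy", "dreadful"),
  sentiment = c("disgust", "disgust", "joy", "disgust"),
  logratio  = c(2.1, 1.4, -1.0, -0.2),
  stringsAsFactors = FALSE
)
top_words_by_sentiment(toy_sentiment, n = 2)
```

The result could then be plotted with one facet per sentiment, for example with facet_wrap(~ sentiment, scales = "free").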


This article is selected from "Text sentiment analysis of twitter data in R language".


Tags: Python iOS Android iphone

Posted by grazzman on Thu, 08 Dec 2022 03:51:50 +0300