British Airways and Virgin Atlantic Twitter analysis



British Airways Plc is the flag carrier airline of the United Kingdom and the country's second-largest airline by passenger numbers. It was formed in 1974 through the merger of four airline companies, and it serves destinations worldwide, with extensive codeshare agreements with several carriers such as American Airlines and Qatar Airways. Virgin Atlantic, on the other hand, a trading name of Virgin Atlantic Airways Limited and Virgin Atlantic International Limited, is a British airline with its head office in Crawley, England. The carrier was set up in 1984 as British Atlantic Airways and was originally planned by its co-founders Randolph Fields and Alan Hellary to fly between London and the Falkland Islands. Soon after the name was changed to Virgin Atlantic Airways, Fields sold his shares in the company following disagreements with Sir Richard Branson over its management (Bifet & Frank, 2010). The maiden flight from Gatwick Airport to Newark Liberty International Airport took place on 22 June 1984. Covering the period from 1 to 31 December 2019, this paper compares and contrasts the tweets of these two airlines. By exploring these Twitter data, it also sets out the pros and cons of potentially replacing the customer satisfaction survey, and finally describes the analytics techniques and steps used to retrieve the data.

Exploratory data analysis

The use of data from social networks such as Twitter has increased over the last few years to improve political campaigns, the quality of products and services, sentiment analysis, and so on. Classifying tweets by user sentiment is an important task for many organizations. This paper proposes a voting classifier (VC) to support sentiment analysis for such organizations. The VC is based on logistic regression (LR) and a stochastic gradient descent classifier (SGDC) and uses a soft voting mechanism to make the final prediction. Tweets were classified into positive, negative and neutral classes according to the sentiment they contain.
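The soft voting step described above can be sketched in base R. This is a minimal illustration, not the classifier used in this paper: the probability matrices `p_lr` and `p_sgdc` are hypothetical stand-ins for the outputs of the LR and SGDC models, and the helper `soft_vote` is a name introduced here for illustration only.

```r
# Soft voting: average the class probabilities of several classifiers,
# then pick the class with the highest averaged probability.
classes <- c("negative", "neutral", "positive")

# Hypothetical per-tweet class probabilities (rows = tweets, columns = classes)
p_lr <- matrix(c(0.70, 0.20, 0.10,
                 0.20, 0.30, 0.50,
                 0.40, 0.35, 0.25),
               ncol = 3, byrow = TRUE, dimnames = list(NULL, classes))
p_sgdc <- matrix(c(0.60, 0.30, 0.10,
                   0.10, 0.30, 0.60,
                   0.20, 0.50, 0.30),
                 ncol = 3, byrow = TRUE, dimnames = list(NULL, classes))

soft_vote <- function(prob_list, weights = rep(1, length(prob_list))) {
  # Weighted average of the probability matrices (equal weights by default)
  weighted <- Map(function(p, w) p * w, prob_list, weights)
  avg <- Reduce(`+`, weighted) / sum(weights)
  # Index of the most probable class in each row
  labels <- colnames(avg)[max.col(avg, ties.method = "first")]
  list(probabilities = avg, labels = labels)
}

result <- soft_vote(list(p_lr, p_sgdc))
print(result$labels)
```

For the third tweet the two hypothetical models disagree (LR favours negative, SGDC favours neutral); averaging the probabilities resolves the disagreement in favour of the class with the larger combined support.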

In statistics, exploratory data analysis (EDA) is an approach to analysing data sets to summarize their main characteristics.

The summary output shows that, for British Airways, the average negative confidence is 0.674, while the average positive confidence is 0.911.

The summary output shows that, for Virgin Atlantic, the average negative confidence is 0.7027, while the average positive confidence is 0.9290.

The table of negative reasons at the end of this paper lists each reason in the left column. It shows 882 negative reasons for Virgin Atlantic and 745 for British Airways. Of the 882 negative reasons for Virgin Atlantic, 203 were Customer Service Issue, the most frequent reason, followed by Late Flight with 174. The least frequent reason was Damaged Luggage. Similarly, of the 745 negative reasons for British Airways, 214 were Customer Service Issue, again the most frequent reason, followed by Can't Tell with 112. The least frequent reason was again Damaged Luggage.

Percentage of the positive feedback:

Of the 1,000 British Airways tweets, 255 were positive. This is 25.5 percent of the total; the 34 percent shown in the proportion table corresponds to the 255 positive tweets relative to the 745 negative ones. Of the 1,000 Virgin Atlantic tweets, only 118 were positive. This is 11.8 percent of the total, or 13 percent relative to the 882 negative tweets. On either measure, British Airways received a higher share of positive feedback than Virgin Atlantic.
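The percentages can be checked directly from the counts reported here (255 and 118 positive tweets out of 1,000 per airline, against 745 and 882 negative ones). The following is a small sketch in R; the variable names are illustrative and not taken from the original analysis.

```r
# Counts reported in the text and in the negative-reason table
tweets_total <- c(British_Airways = 1000, Virgin_Atlantic = 1000)
positive     <- c(British_Airways = 255,  Virgin_Atlantic = 118)
negative     <- c(British_Airways = 745,  Virgin_Atlantic = 882)

# Positive tweets as a percentage of all tweets
share_of_total <- round(100 * positive / tweets_total, 1)

# Positive tweets as a percentage of negative tweets
ratio_to_negative <- round(100 * positive / negative, 1)

print(rbind(share_of_total, ratio_to_negative))
```

The 34 percent and 13 percent figures match the second measure (positive relative to negative), while the share of all 1,000 tweets is lower for both airlines.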
The average accuracy of all the classifiers with TF and TF-IDF is shown in the figure below. The experimental results show very little difference in accuracy when the feature extraction technique is changed from TF to TF-IDF; TF-IDF is nonetheless slightly better in terms of accuracy, precision, and other performance metrics.
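To make the difference between the two feature extraction techniques concrete, the sketch below computes TF and TF-IDF for a toy corpus in base R. The three example documents are hypothetical, and the IDF used is the plain textbook form log(N / df); text-mining libraries often apply smoothed variants, so their values may differ.

```r
# Hypothetical toy corpus (not taken from the airline data set)
docs <- c("late flight again", "great flight great crew", "lost luggage late")
tokens <- strsplit(tolower(docs), "\\s+")
vocab <- sort(unique(unlist(tokens)))

# Term frequency: raw count of each vocabulary term per document
tf <- t(sapply(tokens, function(tok) table(factor(tok, levels = vocab))))

# Inverse document frequency: log(N / number of documents containing the term)
doc_freq <- colSums(tf > 0)
idf <- log(length(docs) / doc_freq)

# TF-IDF: scale each term column by its IDF
tfidf <- sweep(tf, 2, idf, `*`)
print(round(tfidf, 3))
```

Under TF, "flight" and "again" both count once in the first document; under TF-IDF, "flight" is down-weighted because it also appears in a second document, which is the effect that distinguishes the two representations.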

To map the confidence scores into sentiment labels (positive, neutral and negative), we added the R code to an Execute R Script module.

This study uses a voting classifier based on logistic regression and a stochastic gradient descent classifier. Soft voting is used to combine the probabilities of LR and SGDC. In addition, various machine learning-based text classification methods were investigated for sentiment analysis. The experiments were carried out on a Twitter dataset containing travellers' reviews of Virgin Atlantic and British Airways.

Comparing and contrasting the tweets

On 12 December, for example, both airlines tweeted about opening new routes. British Airways was very clear in its tweet, stating that it would open 6 new European routes from London Heathrow in its summer 2020 programme. Both airlines were thus seeking to expand their operations with new routes around the world. Virgin Atlantic, however, was more reserved about naming its new routes. While British Airways was expanding across Europe, Virgin Atlantic was in Asia celebrating its new route to Mumbai, India. The British Airways tweet had 19 retweets and 109 likes, while the Virgin Atlantic tweet had 4 retweets and 35 likes.
On 16 December 2019 the two airlines made at least 2 tweets. British Airways tweeted about surprising customers arriving for Christmas with a performance at Heathrow Terminal 5. Bringing customers home has long been part of the airline's culture. It put together a choir in the arrivals hall at Terminal 5, brought in some British Airways staff, and used the event to celebrate that time of the year. Singing together with customers in this way is one of its approaches to exciting them. Virgin Atlantic, on the other hand, tweeted “If you can’t eat sweets for breakfast at Christmas, then when can you? We make sure you get all your festive treats on board, especially @LoveHearts_UK!.” For Virgin Atlantic, too, this was the time to give customers on board a distinctive holiday experience, offering festive treats to excite them and win their loyalty.
With these two tweets, therefore, both airlines aimed not only to share an exciting experience and performance involving their staff, but also to treat their customers on board in order to win their loyalty and convince them that they are the best. In fact, on 17 December Virgin Atlantic tweeted about staging a very similar performance for customers travelling home for Christmas. Below are the visual representations of the two tweets showing the performances.
Looking at the engagement statistics on Twitter, the British Airways tweet received 64 replies, 88 retweets and 446 likes, while the Virgin Atlantic tweet received 8 replies, 6 retweets and 96 likes. On the evidence of these tweets, British Airways attracted more replies, likes and retweets per tweet than Virgin Atlantic during the December 2019 holiday period. Because it was the festive season, one common feature of both airlines' tweets is that they focused on festive experiences, adventures, celebrations and performances staged with their customers.

Pros and cons of replacing the customer satisfaction survey by mining Twitter data

First, one advantage of using Twitter data to study consumer behaviour, compared with marketing surveys and focus groups, is that it does not require large amounts of time and resources. The Micro-blog Sentiment Analysis System applied to Twitter is built on sentiment analysis that automatically analyses customer sentiments or opinions from the Twitter micro-blog service (Chamlertwat et al., 2012). It comprises five components: it collects Twitter posts, filters for opinionated posts, detects the polarity of each post, categorizes product features, and summarizes and visualizes the overall results. This is less expensive and less time-consuming than the traditional approaches to studying consumer behaviour. Secondly, a micro-blog is a social networking application on which user opinions accumulate rapidly. In the tweets discussed above, the replies and comments are the customers' opinions.
This growth of micro-blogging is something that organizations such as the two airlines could take advantage of. Another advantage is that the analysis of large-scale social data, which informs the corporate perspective with consumer sentiment, is essential for supporting top-level management in solving real-world problems. These consumer voices can influence brand perception, brand loyalty and brand advocacy. With social media monitoring and sentiment analysis, enterprises can tap consumer insights to improve product quality, provide better service, identify new business opportunities, and support other activities accordingly.
On the other hand, disadvantages also abound. For example, although there has been some previous research on sentiment analysis of micro-blogs, none has emphasized how the end result is used. The output of Twitter sentiment analysis gives only an overview of consumer sentiment about a product; it cannot indicate the sentiment about any specific feature.
Finally, there is unsupervised text mining, or clustering. Text clustering is unsupervised learning, in which no label or target value is given for the data. It is a technique for grouping items (or documents) according to characteristics they share, and it classifies data items based solely on the similarity among them. Most clustering algorithms need to know the number of classes in advance. Some researchers use clustering instead of classification for topic detection because labelled data sets are hard to find for new topics.
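As an illustration of the point that the number of classes must be supplied in advance, the sketch below clusters a tiny document-term matrix with k-means from base R's stats package. The matrix `dtm` and its terms are hypothetical and are not drawn from the airline tweets.

```r
# Hypothetical document-term counts (rows = tweets, columns = terms)
dtm <- matrix(c(3, 2, 0, 0,
                2, 3, 0, 0,
                0, 0, 3, 2,
                0, 0, 2, 3),
              ncol = 4, byrow = TRUE,
              dimnames = list(paste0("tweet", 1:4),
                              c("delay", "late", "crew", "friendly")))

set.seed(42)  # k-means starts from randomly chosen centres
k <- 2        # the number of clusters has to be chosen beforehand

# No labels are passed in: tweets are grouped purely by similarity
fit <- kmeans(dtm, centers = k, nstart = 10)
print(fit$cluster)
```

The cluster numbers themselves are arbitrary; what matters is which tweets end up grouped together. Choosing a different `k` would force a different partition, which is why the choice of `k` is a practical limitation of this family of methods.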

Key findings and Discussions

Twitter, owned and operated by Twitter Inc., is the most popular micro-blogging service among existing equivalents such as FriendFeed and Tumblr. Twitter users can post short messages, called tweets, on their user profile and read others' messages in a single list aggregated in reverse chronological order, called the timeline. Tweets are text-based posts limited to 140 UTF-8 characters about any updates, down to the small details of the user's everyday life. The short nature of updates allows users to post quickly in real time, reaching their audience immediately. By default, tweets are publicly visible, but the owner can set privacy to show them only to friends (Kontopoulos et al., 2013). The relationships between users, known as following, are asymmetric. A user can follow others and see their tweets, but the other users need not reciprocate. The subscribers are known as followers. Two users are friends when they mutually follow each other.
In this paper, we reported an exploratory study using our Micro-blog Sentiment Analysis System (MSAS) to discover consumer insight. Our work reconfirms that sentiment analysis on micro-blogs, especially Twitter, can provide solid information for manufacturers in the mobile phone industry to make decisions about their next-generation products. Our system, the MSAS, can gather information on product feature reviews without disturbing consumers, and the result is acceptable to experts in the field (Diakopoulos & Shamma, 2010). Finally, we conclude that sentiment analysis on micro-blogs is a very useful tool for consumer research, especially in industries whose customers spend their time on social media. The less data we can collect from social media, the more error the analysed result will suffer from.


Bifet, A., & Frank, E. (2010, October). Sentiment knowledge discovery in Twitter streaming data. In International Conference on Discovery Science (pp. 1-15). Springer, Berlin, Heidelberg.
Chamlertwat, W., Bhattarakosol, P., Rungkasiri, T., & Haruechaiyasak, C. (2012). Discovering consumer insight from Twitter via sentiment analysis. J. UCS, 18(8), 973-992.
Diakopoulos, N. A., & Shamma, D. A. (2010, April). Characterizing debate performance via aggregated Twitter sentiment. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (pp. 1195-1198). ACM.
Kontopoulos, E., Berberidis, C., Dergiades, T., & Bassiliades, N. (2013). Ontology-based sentiment analysis of Twitter posts. Expert Systems with Applications, 40(10), 4065-4074.
Jacobson, R. 2.5 Quintillion Bytes of Data Created Every Day. How Does CPG & Retail Manage It; IBM: Indianapolis, 
Wang, Q.; Kealy, A.; Zhai, S. Introduction for the Special Issue on Beyond the Hypes of Geospatial Big Data: Theories, Methods, Analytics, and Applications. Comput. Model. Eng. Sci.  
library(readxl)

# British_Airway_BA_ is assumed to have been imported already;
# its read_excel() call is not shown in the original session.
View(British_Airway_BA_)
Virgin_Atlantic <- read_excel("C:/Users/Desktop/Virgin Atlantic.xlsx")
View(Virgin_Atlantic)
summary(British_Airway_BA_)
summary(Virgin_Atlantic)

# Code added to the Execute R Script module
# Get the scores returned from the model
dataset1 <- maml.mapInputPort(1) # class: data.frame
scores <- dataset1[["Scored Probabilities"]]

# Scores above threshold1 are positive, below threshold2 negative,
# and anything in between is neutral
threshold1 <- 0.60
threshold2 <- 0.45
positives <- which(scores > threshold1)
negatives <- which(scores < threshold2)
neutrals <- which(scores <= threshold1 & scores >= threshold2)

# One label per scored tweet
new.labels <- character(length(scores))
new.labels[positives] <- "positive"
new.labels[negatives] <- "negative"
new.labels[neutrals] <- "neutral"
data.set <- data.frame(Sentiment = new.labels, Score = scores)

# Select the data frame to be sent to the output port
maml.mapOutputPort("data.set")

Negative reason              Virgin Atlantic   British Airways
Lost Luggage                              96                70
Flight Attendant Complaints               56                47
Customer Service Issue                   203               214
Bad Flight                                88                56
Cancelled Flight                          79                59
Can't Tell                               120               112
Late Flight                              174               110
Damaged Luggage                            6                10
Flight Booking Problems                   38                57
Total                                    882               745

Positive feedback            British Airways   Virgin Atlantic
Proportion                               34%               13%