About a year ago, Nicole Ricora wrote an amazing article on Medium talking about the profound impact that big data is having on social media. She makes some excellent points about how social media and big data are very closely intertwined.
Many brands around the world are searching for ways to leverage big data that is generated through social media. Twitter, in particular, can be an excellent source for mining valuable data.
The Twitter API is Exceptional for Mining and Analyzing Big Data
Social media has become a major medium that influences modern society. It is a place where users express and share their opinions and keep in touch with the latest trends. How can you use the information and big data on Twitter to your advantage? This is one of the most important questions to answer if we hope to grasp the impact of big data on social media.
A post from a single user is information in itself, but it does not reflect a broader trend in social media content, since it has to be compared with some other reference point. When millions of posts are published every day, however, this information becomes big data that can be analyzed in different ways. Computer analysis enables brands to find non-obvious relationships between groups of tweets that a human would never find. This allows for a better understanding of users' attitudes to different phenomena, people, companies and products. Twitter has talked about its use of machine learning and other big data tools to make the most of its interface. Its users are following suit.
Twitter is one of the most popular social media platforms in the world, producing about 350-400 million tweets daily. For a researcher with a specific objective, Twitter can be a real gold mine of useful information about people's sentiment, attitudes and even behavior. The data collected from Twitter can be analyzed in economic, political, social and industrial dimensions.
Besides the themes of tweets, the analysis may account for different characteristics of users and tweets: demographic characteristics such as gender, age and native language; spatial characteristics such as the location of the tweet; and the type of the tweet, such as public or personal.
An API (Application Programming Interface) is a software interface that allows two programs to interact with each other by exchanging data and commands. Two types of APIs for obtaining Twitter data can be distinguished.
REST APIs
The first type is the REST APIs, which follow a currently popular architectural style for web applications. These APIs are used for retrieving data that has previously been collected on a web resource or in a database. To obtain data from them, you send a request describing what you need, and the service returns the data that matches the given parameters for the requested period of time.
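As a rough illustration of such a request, the sketch below only builds the URL for a search query and does not send it. The endpoint is the v1.1 search path and should be treated as an assumption, since Twitter's endpoints and access tiers have changed over time. `build_search_url` is a hypothetical helper.

```python
from urllib.parse import urlencode

# Assumed endpoint (v1.1 search); check the current documentation before use.
SEARCH_ENDPOINT = "https://api.twitter.com/1.1/search/tweets.json"


def build_search_url(query, count=100, lang="en", until=None):
    """Return a search URL for the given parameters (no request is sent)."""
    params = {"q": query, "count": count, "lang": lang}
    if until is not None:
        params["until"] = until  # upper date bound, formatted YYYY-MM-DD
    return SEARCH_ENDPOINT + "?" + urlencode(params)


print(build_search_url("#bigdata", count=50, until="2019-01-31"))
```

A real request built this way would still have to be authenticated before the service accepts it.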
Streaming APIs
The second type is the streaming APIs, which provide a flow of Twitter data in real time. Once a request for such data is made, the streaming APIs start delivering information that matches the given parameters from Twitter to your application. Streaming APIs are limited in capacity depending on the type of information and the number of users. They can be divided into three groups by type of endpoint:
- Public streams consisting of public tweets
- User streams comprising tweets of a particular user
- Site streams designed for applications aggregating tweets from different users by some parameters.
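To show how an application consumes a stream, here is a minimal sketch that uses a simulated public stream instead of a network connection. The generator, the sample tweets and the `track` helper are all hypothetical stand-ins. A real client would receive tweets from Twitter as they are published.

```python
def simulated_public_stream():
    """Stand-in for a real-time stream: yields tweets one at a time."""
    yield {"user": "alice", "text": "Big data is changing marketing"}
    yield {"user": "bob", "text": "Lunch was great today"}
    yield {"user": "carol", "text": "New post on big data and Twitter"}


def track(stream, keywords):
    """Yield only the tweets whose text mentions one of the keywords."""
    lowered = [keyword.lower() for keyword in keywords]
    for tweet in stream:
        text = tweet["text"].lower()
        if any(keyword in text for keyword in lowered):
            yield tweet


matched = list(track(simulated_public_stream(), ["big data"]))
print([tweet["user"] for tweet in matched])
```

Because both functions are generators, each tweet is processed as soon as it arrives, without waiting for the stream to end.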
Analysis of Twitter big data can be conducted using different programming languages. As an example, Python or R can be employed. The general sequence of steps will be the following.
1. Download and cache the dictionary of categorizing words
This will be a benchmark that your set of tweets will be compared with. You will need a base of words to further categorise the words in the sample.
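One possible way to cache the dictionary is to store it in a local JSON file after the first download. In the sketch below, `fetch_sample_dictionary` is a hypothetical stand-in for the real download, and the four words are illustrative only.

```python
import json
import os
import tempfile


def load_dictionary(cache_path, fetch):
    """Return the dictionary, calling fetch() only if no cache file exists."""
    if os.path.exists(cache_path):
        with open(cache_path, encoding="utf-8") as handle:
            return json.load(handle)
    words = fetch()
    with open(cache_path, "w", encoding="utf-8") as handle:
        json.dump(words, handle)
    return words


def fetch_sample_dictionary():
    # Hypothetical stand-in for downloading a real word list.
    return {"great": "positive", "awesome": "positive",
            "bad": "negative", "terrible": "negative"}


cache_file = os.path.join(tempfile.mkdtemp(), "dictionary.json")
first = load_dictionary(cache_file, fetch_sample_dictionary)   # "downloads"
second = load_dictionary(cache_file, fetch_sample_dictionary)  # read from cache
print(first == second)
```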
2. Download the set of tweets you are going to explore
To do this, two operations have to be completed. This example describes the work with Python and its ready-to-use tools.
Preparation
To use the Twitter API, you need to create a developer account on Twitter's app site. To do so, create or log in to an account at https://app.twitter.com and create a new app. After filling in the information about your project, request the access token and access token secret.
Accessing Twitter API
You can access the Twitter APIs only by means of authenticated requests. Twitter uses OAuth, so each request has to be signed with the relevant Twitter user credentials. Access to the Twitter APIs is also subject to rate limits, that is, a maximum number of requests over a period of time. These limits are applied at both the user and the application level. At the end of each rate limit interval, the quota of permitted API calls is refreshed.
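A client can keep track of its own request quota so that it does not exceed the limit. The sketch below uses a client-side sliding window, which is a conservative approximation and not Twitter's own accounting. The `RateLimiter` class is hypothetical, and the figure of 15 calls per 15 minutes is only an illustrative value.

```python
import time
from collections import deque


class RateLimiter:
    """Allow at most max_calls within any period of `window` seconds."""

    def __init__(self, max_calls, window, clock=time.monotonic):
        self.max_calls = max_calls
        self.window = window
        self.clock = clock
        self.calls = deque()  # timestamps of recent calls, oldest first

    def seconds_until_allowed(self):
        """Return how long to wait before the next call is permitted."""
        now = self.clock()
        # Forget calls that have left the current window.
        while self.calls and now - self.calls[0] >= self.window:
            self.calls.popleft()
        if len(self.calls) < self.max_calls:
            return 0.0
        return self.window - (now - self.calls[0])

    def record_call(self):
        """Register that a request has just been sent."""
        self.calls.append(self.clock())


limiter = RateLimiter(max_calls=15, window=15 * 60)  # illustrative quota
print(limiter.seconds_until_allowed())
```

Before each request, the application would sleep for `seconds_until_allowed()` seconds and then call `record_call()`.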
The Twitter API can be accessed using different tools, Tweepy being one of them. Before creating the API object, you have to authenticate yourself with your developer credentials. Once authentication is complete, you can create your API object.
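With Tweepy, these two steps might look like the sketch below. The environment variable names are hypothetical. `tweepy.OAuthHandler` is the class name used in older Tweepy releases, and newer releases call it `OAuth1UserHandler`, so check the version you have installed.

```python
import os

# Hypothetical environment variable names for the four credentials.
REQUIRED_KEYS = ("TWITTER_CONSUMER_KEY", "TWITTER_CONSUMER_SECRET",
                 "TWITTER_ACCESS_TOKEN", "TWITTER_ACCESS_TOKEN_SECRET")


def load_credentials(env=os.environ):
    """Read the credentials from the environment, failing early if any is missing."""
    missing = [key for key in REQUIRED_KEYS if not env.get(key)]
    if missing:
        raise KeyError("missing credentials: " + ", ".join(missing))
    return {key: env[key] for key in REQUIRED_KEYS}


def create_api(creds):
    """Authenticate with Tweepy and return the API object."""
    import tweepy  # third-party; imported here so the module loads without it

    auth = tweepy.OAuthHandler(creds["TWITTER_CONSUMER_KEY"],
                               creds["TWITTER_CONSUMER_SECRET"])
    auth.set_access_token(creds["TWITTER_ACCESS_TOKEN"],
                          creds["TWITTER_ACCESS_TOKEN_SECRET"])
    # wait_on_rate_limit makes Tweepy pause when the quota is exhausted.
    return tweepy.API(auth, wait_on_rate_limit=True)
```

A typical call would be `api = create_api(load_credentials())`. Keeping the credentials in the environment avoids hard-coding secrets in the script.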
3. Clean the tweets
In most cases, cleaning the tweets means one of two operations. First, some tweets contain a hashtag that you are interested in while the tweet itself says nothing about the object in question. Such tweets have to be removed from the analysed sample.
Second, many tweets include typos, abbreviations, figurative expressions and slang words that are relevant for the sample. You have to make sure that words used in altered forms but carrying a relevant meaning are also taken into account.
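The first cleaning operation can be sketched as follows. The brand name "acme", the sample tweets and both helper functions are hypothetical, and the relevance rule (the topic must appear outside the hashtags) is only one simple heuristic.

```python
import re

URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@\w+")
HASHTAG_RE = re.compile(r"#\w+")


def clean_text(text):
    """Lowercase a tweet and strip URLs, @mentions and extra whitespace."""
    text = URL_RE.sub(" ", text)
    text = MENTION_RE.sub(" ", text)
    return " ".join(text.lower().split())


def is_relevant(text, topic):
    """Keep a tweet only if the topic appears outside of its hashtags."""
    without_hashtags = HASHTAG_RE.sub(" ", text.lower())
    return topic.lower() in without_hashtags


tweets = [
    "Loving the new Acme phone! https://example.com @acme",
    "Monday again... #acme #followme",
]
cleaned = [clean_text(tweet) for tweet in tweets if is_relevant(tweet, "acme")]
print(cleaned)
```

The second tweet is dropped because the topic occurs only in a hashtag.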
4. Match each word from the tweets with the words in the dictionary
Keep in mind that words written in different variants may have the same meaning or be related to the topic. You therefore have to make sure that you also match these altered words and abbreviations to the words in the dictionary. Otherwise, you might miss a substantial share of the tweets relevant to your analysis.
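One way to match altered words is to combine a lookup table for known slang with fuzzy matching for typos. The sketch below uses `difflib.get_close_matches` from the Python standard library. The word lists and the 0.8 similarity cutoff are assumptions chosen for illustration.

```python
import difflib

DICTIONARY = ["great", "awesome", "bad", "terrible"]
# Hypothetical table of slang and abbreviations mapped to dictionary words.
VARIANTS = {"gr8": "great", "awsm": "awesome"}


def match_word(word, dictionary=DICTIONARY, variants=VARIANTS, cutoff=0.8):
    """Return the dictionary word that `word` corresponds to, or None."""
    word = word.lower()
    if word in dictionary:
        return word
    if word in variants:
        return variants[word]
    # Fall back to fuzzy matching to catch typos such as "awesom".
    close = difflib.get_close_matches(word, dictionary, n=1, cutoff=cutoff)
    return close[0] if close else None


print([match_word(word) for word in ["Great", "gr8", "awesom", "xyzzy"]])
```

A lower cutoff catches more typos but also produces more false matches, so the value is worth tuning on a sample of real tweets.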
5. Categorize the words
Once you have matched the words from the tweets with the dictionary, you need to categorise them, that is, split them into groups. If you are running a sentiment analysis aimed at indicating users' attitudes to an event, a person or a company, you will likely have categories such as "positive", "negative" and "neutral". You will have to determine which group each word of the sample belongs to.
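In its simplest form, categorisation is a dictionary lookup with a default group. The lexicon below is hypothetical, and treating every unknown word as "neutral" is an assumption made for this sketch.

```python
# Hypothetical lexicon: dictionary word -> sentiment category.
LEXICON = {"great": "positive", "awesome": "positive",
           "bad": "negative", "terrible": "negative"}


def categorize(words, lexicon=LEXICON, default="neutral"):
    """Pair each word with its category, using `default` for unknown words."""
    return [(word, lexicon.get(word, default)) for word in words]


print(categorize(["great", "phone", "terrible"]))
```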
6. Make conclusions
You can count the number of words in each category to draw a conclusion about the overall attitude to the explored object. Besides attitude, you may have several other categories, including region, gender, age and many others. By analyzing the relationships between these subcategories, you can determine what people with different characteristics think of the object.
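The counting itself can be done with `collections.Counter`. The records below are invented sample data, and taking the most frequent category as the overall attitude is a deliberately simple rule.

```python
from collections import Counter, defaultdict

# Hypothetical categorized records: (region, sentiment category).
records = [("north", "positive"), ("north", "positive"), ("north", "negative"),
           ("south", "negative"), ("south", "neutral"), ("south", "negative")]


def overall_sentiment(categories):
    """Return the most frequent category together with the full counts."""
    counts = Counter(categories)
    return counts.most_common(1)[0][0], counts


def sentiment_by_group(rows):
    """Count the categories separately for each group (here, each region)."""
    groups = defaultdict(Counter)
    for group, category in rows:
        groups[group][category] += 1
    return dict(groups)


label, counts = overall_sentiment(category for _, category in records)
print(label, dict(counts))
print(sentiment_by_group(records))
```

In this sample the overall attitude is negative, while the breakdown by region shows that the north is mostly positive.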