Scraping Twitter without API using Twint

Hashim Puthiyakath
3 min read · May 14, 2021

This guide demonstrates how to scrape tweets from Twitter using Twint. Twint is a free and open-source Python library that helps you scrape tweets without using the Twitter API.

Install Twint

Use the following code to install Twint and nest_asyncio on your machine. Twint is a Python library, so you need to run this code in a Python environment.

!pip3 install twint 
!pip3 install nest_asyncio

Installing nest_asyncio is not strictly necessary, but it fixes an event-loop error that occurs when using Twint in a Jupyter Notebook. So, keep it in the code.

Import Twint and other libraries

The following lines of code import the libraries you need to scrape the data.

import twint
import nest_asyncio
nest_asyncio.apply()

Apart from that, you can also import pandas and other useful libraries to work with the data as you go.

import pandas as pd
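
pandas comes in handy later for loading the scraped CSV file back into a DataFrame. Here is a quick sketch; test.csv is the output file name used later in this guide.

# After scraping tweets to a CSV file (see the Output option below),
# load them into a DataFrame for analysis
df = pd.read_csv("test.csv")
print(df.head())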

Configure Twint

Once you have imported the Twint library, you can create a new Twint configuration and set it up the way you want. Here is a simple example.

# Configure
c = twint.Config()
c.Username = "cutn_official"  # scrape tweets posted by this account
c.Search = "UGC"              # look for tweets containing this word
c.Limit = 10                  # cap the number of tweets scraped
c.Resume = 'resume_test.csv'  # file used to resume an interrupted search

The line c = twint.Config() creates a new Twint configuration object and assigns it to c.

c.Username

This tells Twint to scrape only the tweets posted by a particular username.

c.Search

This tells Twint to look for tweets containing the word UGC. If you want tweets that contain a group of words, you can use a list.

c.Search = ["CUTN", "NEP", "UGC"]

However, the above code will only return tweets that contain all of the given words.

If you want to get tweets that contain any of the given words, you can use the following code.

c.Search = "CUTN OR NEP"

c.Limit

This limits the number of tweets scraped. Sometimes there might be millions of tweets that match your criteria but you only want a few of them; in that case, use this option. It is also useful while testing Twint.

twint.run.Search(c)

This command asks Twint to run the search based on the parameters described above. Add this line at the end, after the configuration.
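
Putting the pieces together, a minimal end-to-end sketch looks like this; the handle, keyword, and limit are just the example values used above.

import twint
import nest_asyncio

nest_asyncio.apply()  # needed when running inside a Jupyter Notebook

# Configure the search
c = twint.Config()
c.Username = "cutn_official"  # tweets posted by this account
c.Search = "UGC"              # containing this keyword
c.Limit = 10                  # keep the test run small

# Run the search
twint.run.Search(c)

By default, Twint prints the scraped tweets to the console. The options below show how to save them to a file instead.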

c.Store_csv

This tells Twint whether or not to store the data as a CSV file. It accepts True or False.
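
For example, to enable CSV output (used together with the Output option described next):

c.Store_csv = True  # write the scraped tweets to a CSV file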

c.Output

This option tells Twint where to save the file.

c.Output = "test.csv"

# save inside a folder
# c.Output = "folder/test.csv"

Apart from CSV, you can also save the data as txt, JSON, SQLite, or Elasticsearch.
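
For instance, here is a sketch for JSON and SQLite output, assuming the Store_json and Database options work as described in the Twint wiki; check the option names against your Twint version.

# Save as JSON instead of CSV
c.Store_json = True
c.Output = "tweets.json"

# Or write to an SQLite database
# c.Database = "tweets.db"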

c.Resume

This helps you resume a search that was interrupted in the middle. Twint keeps track of its progress in the given file so that the next run can pick up where it left off.

c.Resume = 'resume_test.csv'

Additional options to filter the tweets

Here is a list of a few more options you can add to the Twint configuration to get the exact types of tweets you want.

You can check the GitHub wiki for Twint for the full list of options.

c.Geo

This takes latitude, longitude, and radius as a single string, with each item separated by a comma. The unit for the radius can be km or mi.

c.Geo = "10.1212, 21.12212, 25km"

Tweets since a date

c.Since = "2017-12-27"
# c.Since = "2017-12-27 15:55:00"

Tweets until a date

c.Until = "2019-12-27"
# c.Until = "2019-12-27 15:55:00"
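
Putting several of these filters together, here is a sketch of a more targeted search; the coordinates, dates, and file name are just illustrative values.

import twint
import pandas as pd

c = twint.Config()
c.Search = "UGC"                  # keyword to look for
c.Geo = "10.1212,21.12212,25km"   # latitude, longitude, radius
c.Since = "2017-12-27"            # earliest date
c.Until = "2019-12-27"            # latest date
c.Limit = 100                     # cap the number of tweets
c.Store_csv = True
c.Output = "filtered_tweets.csv"  # hypothetical file name
twint.run.Search(c)

# Load the results for analysis
df = pd.read_csv("filtered_tweets.csv")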


Hashim Puthiyakath

Hashim is a Mass Communication scholar and SEO Analyst with expertise in media production, digital marketing, data analytics, and programming.