In this tutorial you will learn how to create a web scraper project in Python using the requests and Beautiful Soup libraries.
Web scraping has become an essential technique for extracting data from websites, automating data collection tasks, and gathering insights for various purposes. Python, with its powerful libraries and tools, offers an efficient and flexible platform for developing web scrapers. In this article, we’ll delve into the process of building a web scraper using Python, covering key concepts, implementation steps, and best practices for creating an effective scraping project.
In this web scraping project, you will use the Python requests library and the BeautifulSoup library to scrape country data from a Wikipedia page, then store and print the result in JSON format.
Before jumping into the code, make sure you have installed the requests and beautifulsoup4 libraries by running the command below.
pip install requests beautifulsoup4
Source Code
import json

import requests
from bs4 import BeautifulSoup

# Wikipedia page listing countries and dependencies by population
url = "https://en.wikipedia.org/wiki/List_of_countries_and_dependencies_by_population"

# Fetch the page and parse the HTML
r = requests.get(url)
soup = BeautifulSoup(r.content, "html.parser")

# Locate the population table and iterate over its rows, skipping the header row
table = soup.find("table", {"class": "wikitable sortable"})
data = []
rows = table.find_all("tr")
for row in rows[1:]:
    cols = row.find_all("td")
    if len(cols) > 1:
        country = cols[1].text.strip()
        population = cols[2].text.strip()
        year = cols[3].text.strip()
        data.append({"Country": country, "Population": population, "Year": year})

# Print the scraped data in JSON format
print(json.dumps(data, indent=2))
The Output of the Python Web Scraper
The above code shows how to do web scraping using the requests library for making HTTP requests and the BeautifulSoup library for parsing HTML content. By fetching data from a Wikipedia page that lists countries and their populations, the script demonstrates how to extract specific information from a structured table on a web page.
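If you want to keep the scraped results for later use rather than only printing them, you can also write the list to a JSON file. The snippet below is a minimal sketch: it assumes the data list built by the scraper above, and the filename countries.json is just an example, not part of the original script.

import json

# Assumes `data` is the list of dictionaries produced by the scraper above.
# "countries.json" is an arbitrary example filename.
with open("countries.json", "w", encoding="utf-8") as f:
    json.dump(data, f, ensure_ascii=False, indent=2)

Passing ensure_ascii=False keeps country names containing non-ASCII characters readable in the output file.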