What is web scraping?
In simple words, it is just extracting the data you want from a web page and saving it to a Notepad (text) file or an Excel file.
(For example: data from tables, URL links, images, videos, PDFs, etc.)
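To make the "table data into Excel" idea concrete, here is a minimal sketch that uses only Python's standard library (`html.parser` and `csv`) instead of the requests-and-regex approach used in the script below. It assumes the page's HTML is already in a string. The `TableParser` class, the `table_to_csv` function and the sample table are hypothetical names for illustration only. CSV is used as the output because Excel can open it directly.

```python
import csv
import io
from html.parser import HTMLParser


class TableParser(HTMLParser):
    """Collect the text of every <td>/<th> cell, one list per <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows
        self._row = None   # row currently being read
        self._cell = None  # text pieces of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that sits inside a table cell
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def table_to_csv(html):
    """Return the table rows found in html as CSV text."""
    parser = TableParser()
    parser.feed(html)
    out = io.StringIO()
    csv.writer(out).writerows(parser.rows)
    return out.getvalue()


# Hypothetical sample page
sample = """
<table>
  <tr><th>Item</th><th>Price</th></tr>
  <tr><td>Book</td><td>10</td></tr>
</table>
"""
print(table_to_csv(sample))
```

To save the result for Excel, write the returned text to a file opened with `open("table.csv", "w", newline="")`.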
There are various ways to scrape websites, and there are many tools and software packages that can make your work easy, but most of them are not free. Python, on the other hand, is not just free: you can also create your own set of tools that do your work exactly the way you design them.
I am sure you can write better web scraping code than this.
But this simple code will give you an idea of what web scraping is like and what you can do with it, rather than just learning about it and doing nothing.
Believe me, web scraping is cool once you know what to do with the data.
Here is the code.
```python
# We will be using Python 3.7
# We will scrape an "Index of" page from any website
# Created by Joel Dcosta (that is me)
# First, search Google for directories of Python PDFs with this query:
# -inurl:(htm|html|php) intitle:"index of" + "last modified" +"parent directory" +description +size +(pdf) "python"
#################################################################################
import re
from urllib.parse import urljoin

import requests

url = ""  # paste the URL of the "Index of" page you found here

website = requests.get(url)
website.raise_for_status()  # stop early on errors such as 404 or 500
html = website.text

# PDF links: match href values ending in .pdf without running past the closing quote
files = re.findall(r'href="([^"]*\.pdf)"', html, re.IGNORECASE)

# Append each PDF link (made absolute with urljoin) to the text file, one per line
with open("INDEX_DATA.txt", "a") as f:
    for infile in sorted(set(files)):
        f.write(urljoin(url, infile) + "\n")
```
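The script above only looks for PDFs, using a regular expression. If you also want the images or videos mentioned at the start, a minimal sketch like the one below collects every `<a href>` with the standard-library `html.parser` (more forgiving of attribute order and spacing than a regex) and turns relative links into full URLs with `urljoin`. The `find_files` helper, its default extension list and the example URL are hypothetical; in practice you would pass it the `html` and `url` from the script above.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkParser(HTMLParser):
    """Collect the href value of every <a> tag."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def find_files(html, base_url, extensions=(".pdf", ".mp4", ".jpg", ".png")):
    """Return sorted, de-duplicated absolute links whose path ends in one of extensions."""
    parser = LinkParser()
    parser.feed(html)
    wanted = tuple(ext.lower() for ext in extensions)
    links = {
        urljoin(base_url, href)
        for href in parser.hrefs
        # ignore case and any ?query part when checking the extension
        if href.lower().split("?")[0].endswith(wanted)
    }
    return sorted(links)


# Hypothetical index page
page = '<a href="../">Parent</a><a href="intro.PDF">Intro</a><a href="talk.mp4">Talk</a>'
print(find_files(page, "http://example.com/books/"))
```

Each returned link can then be written to `INDEX_DATA.txt` exactly as in the script above.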
Please watch the video to see how to use the code.
I hope you like the tutorial.
Python is simple and fun to play with and yet very powerful. 🔥