Photo by Ahmet Yalçınkaya on Unsplash
Scraping the Top 100 Greatest Movies
Automatically Sort and Save Top 100 Greatest Movie Titles from Empire Online
Are you a movie buff looking for a way to sort and save a list of the best movies from a website? Well, look no further! In this article, we'll walk you through a Python project that automatically sorts and saves movie titles from the Empire Online website.
The project uses the requests and BeautifulSoup libraries along with Python's built-in re module (regular expressions). With just a few lines of code, you can extract movie titles, sort them by their rank number, and save the sorted list to a text file.
First, we specify the URL of the webpage we want to scrape. In this case, we're using an Internet Archive snapshot of Empire Online's "100 Greatest Movies" list.
url = "https://web.archive.org/web/20200518073855/https://www.empireonline.com/movies/features/best-movies-2/"
Next, we send a GET request to the URL and retrieve the webpage's HTML content.
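import requests
response = requests.get(url, timeout=10)
response.raise_for_status()  # Stop early if the server returned an HTTP error status
movies_webpage = response.text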
We then create a BeautifulSoup object, which allows us to parse the HTML and navigate its elements.
from bs4 import BeautifulSoup
soup = BeautifulSoup(movies_webpage, "html.parser")
Using BeautifulSoup's find_all method, we extract the movie titles from the webpage. In this case, the movie titles are contained within <h3> tags with a class of "title".
movie_titles = [movie.getText() for movie in soup.find_all("h3", class_="title")]
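To see what find_all returns without touching the network, the same call can be tried on a small inline HTML snippet. This is only a sketch: the markup below is a made-up stand-in for the archived page (the real page's structure may differ), and the titles are placeholder data.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the archived page; for illustration only.
sample_html = """
<div>
  <h3 class="title">3) Movie C</h3>
  <h3 class="other">Not a movie title</h3>
  <h3 class="title">2) Movie B</h3>
  <h3 class="title">1) Movie A</h3>
</div>
"""

soup = BeautifulSoup(sample_html, "html.parser")

# class_ has a trailing underscore because "class" is a reserved word in Python.
# Only <h3> tags whose class is "title" are matched; the "other" heading is skipped.
sample_titles = [tag.get_text(strip=True) for tag in soup.find_all("h3", class_="title")]
print(sample_titles)
```

The titles come back in document order, which is why a separate sorting step is needed.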
To sort the movie titles, we pass a lambda function as the key argument of sorted. The lambda uses a regular expression (re) to extract the first number in each movie title and converts it to an integer, so the titles are sorted numerically by rank rather than alphabetically.
import re
sorted_movie_titles = sorted(movie_titles, key=lambda x: int(re.search(r"\d+", x).group()))
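A key built on re.search(...).group() raises AttributeError for any title that contains no digits, because re.search returns None in that case. A minimal sketch of a more defensive key function follows. The name rank_of is a hypothetical helper, the sample titles are invented, and sending unnumbered titles to the end of the list is an assumed policy rather than something the page requires.

```python
import re


def rank_of(title: str) -> float:
    """Return the leading rank of a title such as '12) Some Movie'.

    Titles that do not start with a number sort last (an assumed policy).
    """
    # re.match anchors at the start, so digits later in a title are ignored.
    match = re.match(r"\s*(\d+)", title)
    return int(match.group(1)) if match else float("inf")


# Invented sample data, deliberately out of order.
sample_titles = ["10) Movie J", "2) Movie B", "Unnumbered Movie", "1) Movie A"]
sorted_sample = sorted(sample_titles, key=rank_of)
print(sorted_sample)
```

Converting the rank to an integer matters here: sorted as plain strings, "10) Movie J" would come before "2) Movie B".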
Finally, we save the sorted movie titles to a text file named "movie_list.txt."
with open("movie_list.txt", "w", encoding="utf-8") as file:
    file.write("\n".join(sorted_movie_titles))
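Movie titles can contain non-ASCII characters, and the default text encoding used by open can depend on the platform, so it may be worth naming the encoding explicitly. The sketch below writes a couple of invented sample titles to a temporary directory and reads them back to confirm the round trip. Writing to a temporary directory is just a choice for this example so that no file is left behind.

```python
import tempfile
from pathlib import Path

# Invented sample data, including a non-ASCII character.
sample_titles = ["1) Movie A", "2) Amélie"]

with tempfile.TemporaryDirectory() as tmp_dir:
    out_path = Path(tmp_dir) / "movie_list.txt"
    # One title per line, with an explicit encoding and a trailing newline.
    out_path.write_text("\n".join(sample_titles) + "\n", encoding="utf-8")
    lines_read_back = out_path.read_text(encoding="utf-8").splitlines()

print(lines_read_back)
```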
And there you have it! With this Python script, you can automatically sort and save the list of movie titles from Empire Online. The same approach works for other webpages once you adjust the URL and the tag and class passed to find_all. This project demonstrates the power of web scraping and basic data manipulation using Python.
Feel free to customize the script to suit your needs, such as sorting based on different criteria or extracting additional information from the webpage.
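As one example of a different sorting criterion, the titles could be listed alphabetically instead of by rank. The sketch below assumes each title starts with a rank followed by ")", ":" or "."; that prefix format, the helper name strip_rank, and the sample titles are all assumptions made for illustration.

```python
import re


def strip_rank(title: str) -> str:
    """Drop a leading rank such as '12) ' or '12: ' from a title."""
    return re.sub(r"^\s*\d+\s*[):.]\s*", "", title)


# Invented sample data with mixed separators and letter case.
sample_titles = ["3) Movie C", "1) movie b", "2: Movie A"]

# str.casefold makes the comparison case-insensitive.
alphabetical = sorted((strip_rank(t) for t in sample_titles), key=str.casefold)
print(alphabetical)
```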
Happy movie list sorting!