Course Information
Course Overview
Learn the fundamentals of asynchronous web scraping & data mining in Python to drastically improve extraction speeds.
Web scraping means automatically opening a website and grabbing the data you find important on it. It's fundamental to the internet, search engines, data science, automation, machine learning, and much more.
Opening websites and extracting data are only part of what makes web scraping great. The real value is in parsing that data.
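To make the parsing step concrete, here is a minimal sketch that pulls link targets out of an HTML string using only Python's standard library. The `LinkCollector` class, the `extract_links` helper, and the sample markup are illustrative assumptions, not material from the course, which may use a different parsing library.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Hypothetical helper: collects the href value of every <a> tag it sees."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs arrives as a list of (name, value) pairs
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_links(html):
    """Parse an HTML string and return the link targets in document order."""
    parser = LinkCollector()
    parser.feed(html)
    return parser.links


# Made-up markup standing in for a downloaded page
SAMPLE_HTML = """
<html><body>
  <a href="/products/1">First product</a>
  <a href="/products/2">Second product</a>
  <a>No link target here</a>
</body></html>
"""

if __name__ == "__main__":
    print(extract_links(SAMPLE_HTML))
```

In a real scraper the HTML string would come from an HTTP response or a browser session rather than a hard-coded sample.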
This project will cover:
- Basic web scraping with Python
- Web scraping with Selenium
- Sync vs Async
- Asynchronous Web scraping with Asyncio
But why asynchronous code? What is it? How does it benefit us?
Asynchronous code is a way to execute multiple functions at nearly the same time. They don't run at the exact same instant; they run concurrently, with one task making progress while another waits. This means we can do more in less time and, when it comes to mining or scraping data, the time savings are significant.
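The difference between running tasks one after another and running them concurrently can be sketched with `asyncio` alone. In this sketch, `fake_fetch` is a hypothetical stand-in for a network request: it only sleeps to simulate waiting on a server, so no real pages are downloaded. The helper names and the 0.1-second delay are assumptions chosen for illustration.

```python
import asyncio
import time


async def fake_fetch(page_id, delay=0.1):
    """Hypothetical stand-in for a request; the sleep simulates network wait."""
    await asyncio.sleep(delay)
    return f"page-{page_id}"


async def fetch_sequentially(page_ids):
    # Each simulated request finishes before the next one starts
    return [await fake_fetch(pid) for pid in page_ids]


async def fetch_concurrently(page_ids):
    # gather schedules every coroutine on the event loop, so the waits overlap
    return await asyncio.gather(*(fake_fetch(pid) for pid in page_ids))


def timed(coro_factory, page_ids):
    """Run one of the coroutines above and return (results, elapsed seconds)."""
    start = time.perf_counter()
    results = asyncio.run(coro_factory(page_ids))
    return results, time.perf_counter() - start


if __name__ == "__main__":
    ids = list(range(5))
    _, seq_time = timed(fetch_sequentially, ids)
    _, con_time = timed(fetch_concurrently, ids)
    print(f"sequential: {seq_time:.2f}s, concurrent: {con_time:.2f}s")
```

With five simulated pages, the sequential version waits for each delay in turn, while the concurrent version waits roughly as long as the single slowest task. Real scraping gains depend on the target site, network conditions, and any rate limits.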
Imagine for a moment you're recreating Google's search engine. You'd have to scrape trillions (if not more) of web pages at a regular interval to help with the search results. Of course, you're not going to scrape all of those pages at once, but the point is that scraping even 1,000 pages would take a very long time synchronously (for example, using Python requests and/or Selenium alone).
If you've done a lot of web scraping before but never used Python's asyncio, this course will help you better understand the fundamentals and bring your scraping game to another level.
Let's get started!
Course Content
- 6 section(s)
- 22 lecture(s)
- Section 1 Welcome
- Section 2 Fundamentals
- Section 3 Extraction & Formatting
- Section 4 Prepare for Re-usability
- Section 5 Storing Data
- Section 6 Thank you and next steps
What You’ll Learn
- Basic Web scraping with Python
- Web scraping with Selenium & Python
- JavaScript-Heavy Website Scraping
- Asynchronous Web scraping with Asyncio
Reviews
- Ryan Kelleher
Hard to follow, speaks too fast, code doesn't work, and the video is too busy to follow easily even as an experienced developer. Not to mention that using the Java interface is completely unnecessary and adds complexity where a simple API scrape would work.
- Rizal Junaedi
Useful knowledge
- Anonymized User
I never saw someone use pandas as a database manager before. And I hope to never see it again. Also, while I can understand wanting to demonstrate using Arsenic to scrape a website, it's worth noting that in this case it is probably entirely unnecessary. You can get all the product URLs from the sitemap. And you can get the product info via the API that the site's frontend uses. You'll need to spend a few minutes with the browser's web-dev tools, but in the end it is a lot faster, because you're not waiting 15 seconds for pages to load. Scraping with Selenium/Arsenic should usually be a last resort.
- Lorenzo Mateo
Talking too fast, not slowing down, code looks confusing, going back and forth, cannot pin down what function he is working on.