Download Instagram Images in 60 lines of Python code
- Python 3 installed
- A text editor installed (e.g., VS Code)
- Some basic knowledge of HTML
Today, we will use Selenium, a browser automation tool with Python bindings, to crawl images on Instagram and download them to the local drive.
I will use Chrome for the demonstration.
1. Open the ChromeDriver downloads page (chromium.org)
2. Download the ChromeDriver version that matches your installed version of Chrome
3. Create a new folder and open it with VS Code
4. Create a file called crawl.py
5. Remember to check your Python version to avoid errors
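The version check can also be done from inside a script. The snippet below is a minimal sketch: the helper name `is_supported_python` and the minimum version `(3, 7)` are assumptions made for illustration, so use the minimum version required by the Selenium release you install.

```python
import sys


def is_supported_python(minimum=(3, 7), version=None):
    """Return True if the interpreter version is at least `minimum`.

    `minimum` is an assumed threshold used only for illustration.
    """
    if version is None:
        version = sys.version_info
    # Compare only (major, minor) against the required minimum.
    return tuple(version[:2]) >= tuple(minimum)


if not is_supported_python():
    print("Please upgrade Python; found", sys.version.split()[0])
```

Running `python --version` in the terminal gives the same information without any code.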
6. Open the terminal in VS Code and install the required packages
pip install selenium
pip install wget
7. In your crawl.py, import the required Python packages
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.keys import Keys
import time
import os
import wget
8. Check where your webdriver is located (the ChromeDriver we downloaded earlier)
9. Create a variable called PATH to store your ChromeDriver location
PATH = "C:/Users/tinki/OneDrive/桌面/網頁製作/chromedriver.exe"
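A wrong path is a common cause of startup errors, so it can help to confirm that the file exists before starting the browser. This is a small sketch using only the standard library; the helper name `driver_file_exists` and the example path are hypothetical.

```python
from pathlib import Path


def driver_file_exists(driver_path):
    """Return True if `driver_path` points to an existing file."""
    return Path(driver_path).is_file()


# Hypothetical location; replace it with your own ChromeDriver path.
EXAMPLE_PATH = "C:/path/to/chromedriver.exe"
if not driver_file_exists(EXAMPLE_PATH):
    print("ChromeDriver not found at", EXAMPLE_PATH)
```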
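10. Tell Selenium to use the ChromeDriver. In Selenium 4, the driver location is passed through a Service object
from selenium.webdriver.chrome.service import Service
driver = webdriver.Chrome(service=Service(PATH))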
11. Tell Selenium to open Instagram
driver.get("https://www.instagram.com/")
12. We have to control the browser to log in to Instagram
13. We can inspect an element by pressing F12
The username input field has an attribute called name whose value is username
14. We can locate the username and password fields by their name attribute. Before that, note that fetching data takes time: it can take a few seconds for the page to load in your browser, so we wait for the elements explicitly.
username = WebDriverWait(driver, 10).until(EC.presence_of_element_located((By.NAME, "username")))
password = WebDriverWait(driver, 10).until(EC.presence_of_element_located((By.NAME, "password")))
The code snippet above tells Selenium to wait up to 10 seconds for each element, located by a name attribute of username or password, to be present on the page. If an element does not appear in time, a TimeoutException is raised.
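To make the idea of an explicit wait concrete, here is a simplified sketch of the polling pattern behind it. It is not Selenium's implementation: the function `wait_until`, its parameters, and the stand-in `element_is_present` check are all hypothetical names used for illustration.

```python
import time


def wait_until(condition, timeout=10.0, poll_interval=0.5):
    """Call `condition()` repeatedly until it returns a truthy value.

    Returns that value, or raises TimeoutError after `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        time.sleep(poll_interval)


# Stand-in for a page whose element only shows up on the third check.
attempts = {"count": 0}


def element_is_present():
    attempts["count"] += 1
    return "username field" if attempts["count"] >= 3 else None


found = wait_until(element_is_present, timeout=2.0, poll_interval=0.01)
print(found, "after", attempts["count"], "checks")
```

The point of the pattern is that the script continues as soon as the element is available instead of always sleeping for the full 10 seconds.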
15. Locate the login button with its XPath
login_Xpath = '//*[@id="loginForm"]/div/div[3]/button'
login = WebDriverWait(driver, 10).until(EC.presence_of_element_located((By.XPATH, login_Xpath)))
16. Clear the username and password fields
username.clear()
password.clear()
17. Enter your username and password
username.send_keys('Your username')
password.send_keys('Your password')
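Typing the real password into crawl.py makes it easy to leak when the file is shared. One possible alternative, sketched below, is to read the login details from environment variables. The variable names `IG_USERNAME` and `IG_PASSWORD` and the helper `load_credentials` are hypothetical.

```python
import os


def load_credentials(environ=None):
    """Read the login details from environment variables.

    IG_USERNAME and IG_PASSWORD are hypothetical variable names.
    """
    if environ is None:
        environ = os.environ
    user = environ.get("IG_USERNAME")
    secret = environ.get("IG_PASSWORD")
    if not user or not secret:
        raise RuntimeError("Set IG_USERNAME and IG_PASSWORD before running the script")
    return user, secret


# Possible use in crawl.py (left as comments because it needs a live browser):
# user, secret = load_credentials()
# username.send_keys(user)
# password.send_keys(secret)
```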
18. Click the login button
login.click()
19. Use the same method described above to locate the search box
searchBox = WebDriverWait(driver, 10).until(EC.presence_of_element_located((By.XPATH, '//*[@id="react-root"]/section/nav/div[2]/div/div/div[2]/input')))
20. Store your keyword in the keyword variable
keyword = "Your keyword"
21. Type your keyword in the search box and press Enter
searchBox.send_keys(keyword)
time.sleep(1)
# Enter is sent twice, with a one-second pause before each key press
searchBox.send_keys(Keys.RETURN)
time.sleep(1)
searchBox.send_keys(Keys.RETURN)
22. Tell the browser to wait until we locate the photos
WebDriverWait(driver, 10).until(EC.presence_of_element_located((By.CLASS_NAME, "FFVAD")))
23. Instagram does not load all the images at once. We need to scroll down the page to get more images. We can simulate this action with the following code, which scrolls to the bottom of the page 5 times and waits 5 seconds after each scroll so that new images can load
for i in range(5):
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
    time.sleep(5)
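A fixed number of scrolls can waste time when the page has nothing more to load. As a variation on the loop above, the sketch below stops early once the page height stops growing. The function name `scroll_until_stable` and its parameters are hypothetical; `driver` is assumed to be any object with an `execute_script` method, such as a Selenium WebDriver.

```python
import time


def scroll_until_stable(driver, max_scrolls=5, pause=5.0):
    """Scroll to the bottom until the page height stops growing.

    Returns the number of scrolls that were performed.
    """
    last_height = driver.execute_script("return document.body.scrollHeight")
    scrolls = 0
    while scrolls < max_scrolls:
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        scrolls += 1
        time.sleep(pause)  # give the page time to load more images
        new_height = driver.execute_script("return document.body.scrollHeight")
        if new_height == last_height:
            break  # nothing new was loaded, so stop early
        last_height = new_height
    return scrolls
```

In crawl.py this could be called as `scroll_until_stable(driver)` in place of the for loop.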
24. Find all the elements with the class name “FFVAD”
# find_elements_by_class_name is not available in recent Selenium 4 releases
imgs = driver.find_elements(By.CLASS_NAME, "FFVAD")
25. Create a folder and name it using the keyword
path = os.path.join(keyword)
os.mkdir(path)
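Note that os.mkdir raises FileExistsError if the folder is already there, for example when the script is run a second time, and some keywords contain characters that are not valid in folder names. The sketch below is one way to handle both cases. The helper names `safe_folder_name` and `make_output_folder` and the fallback name "images" are hypothetical.

```python
import os
import re


def safe_folder_name(keyword):
    """Replace characters that are not allowed in Windows folder names."""
    cleaned = re.sub(r'[<>:"/\\|?*]', "_", keyword).strip()
    return cleaned or "images"  # "images" is an assumed fallback name


def make_output_folder(keyword, base_dir="."):
    """Create (or reuse) a folder named after the keyword and return its path."""
    folder = os.path.join(base_dir, safe_folder_name(keyword))
    os.makedirs(folder, exist_ok=True)  # no error if the folder already exists
    return folder
```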
26. Use a for loop to download all the images
count = 0
for img in imgs:
    save_as = os.path.join(path, keyword + "_" + str(count) + '.jpg')
    wget.download(img.get_attribute("src"), save_as)
    count += 1
The save_as variable tells the computer where to save each image
The following expression returns the URL of an image
img.get_attribute("src")
We use the wget module to download each image and save it in the folder
wget.download(img.get_attribute("src"), save_as)
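The loop above assumes that every element has a usable src value. The sketch below is a slightly more defensive variation, not the original method: it skips missing and duplicate URLs, and it swaps wget for `urllib.request.urlretrieve` from the standard library. The helper names `build_download_plan` and `download_all` are hypothetical.

```python
import os
import urllib.request


def build_download_plan(urls, folder, keyword):
    """Pair each usable image URL with a numbered target file path."""
    plan = []
    seen = set()
    for url in urls:
        if not url or url in seen:
            continue  # skip missing src values and duplicates
        seen.add(url)
        filename = f"{keyword}_{len(plan)}.jpg"
        plan.append((url, os.path.join(folder, filename)))
    return plan


def download_all(plan):
    """Download every (url, path) pair and return the paths that were saved."""
    saved = []
    for url, save_as in plan:
        try:
            urllib.request.urlretrieve(url, save_as)
            saved.append(save_as)
        except OSError as error:  # urllib's URLError is a subclass of OSError
            print("Skipped", url, "because of", error)
    return saved
```

In crawl.py, the URL list could be collected with `[img.get_attribute("src") for img in imgs]` and passed to `build_download_plan` together with path and keyword.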
The full code can be found here