Download Instagram Images in 60 lines of Python code

Andy Fung
4 min read · Oct 13, 2021

Prerequisites:

  1. Python 3 installed
  2. A text editor installed (e.g., VS Code)
  3. Some basic knowledge of HTML

Today, we will use Selenium, a browser automation library for Python, to crawl images on Instagram and download them to a local drive.

I will use Chrome for the demonstration.

1. Open the ChromeDriver download page, ChromeDriver - WebDriver for Chrome - Downloads (chromium.org), and check which version of Chrome you have installed

2. Download the ChromeDriver version that matches your Chrome version

3. Create a new folder and open it with VS Code

4. Create a file called crawl.py

5. Check your Python version (run python --version in a terminal) to avoid errors

6. Open your terminal in VS Code and install the required packages
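The version check can also be done from Python itself. This is a minimal sketch: the function name python_is_supported and the minimum of 3.7 are assumptions made for illustration, not requirements stated in this tutorial.

```python
import sys

# Assumption: 3.7 is only an illustrative minimum version.
MINIMUM_VERSION = (3, 7)


def python_is_supported(version_info=sys.version_info, minimum=MINIMUM_VERSION):
    """Return True if the (major, minor) version is at least `minimum`."""
    return tuple(version_info[:2]) >= tuple(minimum)


if __name__ == "__main__":
    # Print the running interpreter's version and whether it passes the check.
    print("Python", sys.version.split()[0], "supported:", python_is_supported())
```

Running the file prints the interpreter version, which is a quick way to confirm that VS Code is using the Python you expect.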

pip install selenium
pip install wget

7. In your crawl.py, import the required Python packages

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.keys import Keys
import time
import os
import wget

8. Check where your webdriver is located (the ChromeDriver we downloaded earlier)

9. Create a variable called PATH to store your ChromeDriver location

PATH="C:/Users/tinki/OneDrive/桌面/網頁製作/chromedriver.exe"
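A wrong path is an easy mistake at this step, so it can help to check that the file exists before handing it to Selenium. The helper name driver_file_exists is made up for this sketch.

```python
import os


def driver_file_exists(driver_path):
    """Return True if `driver_path` points to an existing file."""
    return os.path.isfile(driver_path)


if __name__ == "__main__":
    # Hypothetical location; replace it with your own ChromeDriver path.
    example_path = "C:/path/to/chromedriver.exe"
    if not driver_file_exists(example_path):
        print("ChromeDriver not found at", example_path)
```

In crawl.py you could call driver_file_exists(PATH) and stop early with a clear message if it returns False.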

10. Tell Selenium to use the ChromeDriver

driver = webdriver.Chrome(PATH)
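The call above passes the driver path as a positional argument, which worked with the Selenium releases available when this was written. Newer Selenium 4 releases expect the path to be wrapped in a Service object instead. A minimal sketch of that form follows; make_chrome_driver is a hypothetical helper name, and the imports sit inside the function only so that the snippet can be loaded without Selenium installed.

```python
def make_chrome_driver(driver_path):
    """Start Chrome through Selenium using the Service-based constructor."""
    # Lazy imports: Selenium is only needed when the function is called.
    from selenium import webdriver
    from selenium.webdriver.chrome.service import Service

    # Service wraps the ChromeDriver executable that Selenium should launch.
    return webdriver.Chrome(service=Service(executable_path=driver_path))
```

If webdriver.Chrome(PATH) raises a TypeError on your machine, driver = make_chrome_driver(PATH) is the likely replacement.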

11. Tell Selenium to go to Instagram

driver.get("https://www.instagram.com/")

12. We have to control the browser to log in to Instagram

13. We can inspect the page elements by pressing F12 to open the developer tools

The username input field has a name attribute whose value is username

14. We can locate the username and password fields by their name attribute. Before that, note that fetching data takes time: a page can take a few seconds to load in the browser, so we wait for the fields to appear instead of looking them up immediately.

username = WebDriverWait(driver, 10).until(EC.presence_of_element_located((By.NAME, "username")))
password = WebDriverWait(driver, 10).until(EC.presence_of_element_located((By.NAME, "password")))

The code snippet above tells Selenium to wait up to 10 seconds for the elements whose name attributes are username and password to appear in the page
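To see what such a wait does, here is a simplified sketch of the idea in plain Python: call a condition repeatedly until it returns something truthy or the time runs out. This is not Selenium's actual implementation. The name wait_until is made up, and the 0.5 second pause only mirrors WebDriverWait's default polling interval.

```python
import time


def wait_until(condition, timeout=10.0, poll_interval=0.5):
    """Call `condition()` until it returns a truthy value or `timeout` seconds pass."""
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result  # the condition was met, hand back its value
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        time.sleep(poll_interval)  # pause before trying again
```

WebDriverWait(driver, 10).until(...) follows the same pattern, except that the condition receives the driver and a Selenium TimeoutException is raised when the time is up.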

15. Locate the login button with its XPath

login_Xpath = '//*[@id="loginForm"]/div/div[3]/button'
login = WebDriverWait(driver, 10).until(EC.presence_of_element_located((By.XPATH, login_Xpath)))

16. Clear the username and password fields

username.clear()
password.clear()

17. Send your username and password

username.send_keys('Your username')
password.send_keys('Your password')
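If you would rather not type your real credentials into crawl.py, one option is to read them from environment variables. This is only a sketch: the variable names IG_USERNAME and IG_PASSWORD and the helper read_credentials are made up for this example.

```python
import os


def read_credentials(env=os.environ):
    """Return (username, password) read from IG_USERNAME and IG_PASSWORD."""
    try:
        return env["IG_USERNAME"], env["IG_PASSWORD"]
    except KeyError as missing:
        # missing.args[0] is the name of the variable that was not set.
        raise RuntimeError(
            f"Set the {missing.args[0]} environment variable first"
        ) from None
```

With this in place, the two send_keys calls would receive the returned values instead of the hardcoded strings, for example ig_user, ig_pass = read_credentials() followed by username.send_keys(ig_user) and password.send_keys(ig_pass).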

18. Click the login button

login.click()

19. Use the same method described above to locate the search box

searchBox = WebDriverWait(driver, 10).until(EC.presence_of_element_located((By.XPATH, '//*[@id="react-root"]/section/nav/div[2]/div/div/div[2]/input')))

20. Store your keyword in the keyword variable

keyword = "Your keyword"

21. Type your keyword in the search box and press enter

searchBox.send_keys(keyword)
time.sleep(1)
searchBox.send_keys(Keys.RETURN)
time.sleep(1)
searchBox.send_keys(Keys.RETURN)

22. Tell the browser to wait until we locate the photos

WebDriverWait(driver, 10).until(EC.presence_of_element_located((By.CLASS_NAME, "FFVAD")))

23. Instagram does not load all the images at once, so we need to scroll down to get more of them. We can simulate this with the following code, which scrolls to the bottom of the page 5 times and waits 5 seconds after each scroll so the new images can load

for i in range(5):
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
    time.sleep(5)
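A fixed number of scrolls can waste time when the page has already reached its end. A possible variation, sketched here, stops early once document.body.scrollHeight stops growing. The function scroll_to_load and its parameters are names made up for this sketch; it only needs an object with an execute_script method, such as the driver created earlier.

```python
import time


def scroll_to_load(driver, max_scrolls=5, pause=5.0):
    """Scroll to the bottom up to `max_scrolls` times; return the number of scrolls made."""
    last_height = driver.execute_script("return document.body.scrollHeight")
    scrolls = 0
    for _ in range(max_scrolls):
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        scrolls += 1
        time.sleep(pause)  # give the page time to load more images
        new_height = driver.execute_script("return document.body.scrollHeight")
        if new_height == last_height:
            break  # the page did not grow, so nothing new was loaded
        last_height = new_height
    return scrolls
```

Calling scroll_to_load(driver) behaves like the loop above when the page keeps growing, and returns sooner when it does not.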

24. Find all the elements with the class name “FFVAD”

imgs = driver.find_elements(By.CLASS_NAME, "FFVAD")  # find_elements_by_class_name was removed in newer Selenium 4 releases
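Some of the matched elements may have no src yet, or may share a URL with another element. That is an assumption about how lazily loaded pages can behave, not something verified here. A small helper can filter the list before downloading; collect_image_urls is a hypothetical name, and it works with any objects that offer get_attribute, such as the elements in imgs.

```python
def collect_image_urls(elements):
    """Return the unique, non-empty src values of `elements`, in first-seen order."""
    urls = []
    seen = set()
    for element in elements:
        src = element.get_attribute("src")
        if src and src not in seen:  # skip missing and repeated URLs
            seen.add(src)
            urls.append(src)
    return urls
```

For example, urls = collect_image_urls(imgs) gives a plain list of strings that is easier to inspect or print than the element objects.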

25. Create a folder and name it using the keyword

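path = os.path.join(os.getcwd(), keyword)  # folder named after the keyword, inside the current directory
os.makedirs(path, exist_ok=True)  # unlike os.mkdir, this does not fail if the folder already exists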

26. Use a for loop to download all the images

count = 0
for img in imgs:
    save_as = os.path.join(path, keyword + "_" + str(count) + '.jpg')
    wget.download(img.get_attribute("src"), save_as)
    count += 1

The save_as variable holds the path where each image is saved

The code below gets the URL of each image
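To make the naming scheme concrete, here is the same path construction pulled out into a small function. The name build_save_path and the extension parameter are made up for this sketch.

```python
import os


def build_save_path(folder, keyword, index, extension=".jpg"):
    """Return the file path for image number `index`, e.g. <folder>/<keyword>_0.jpg."""
    return os.path.join(folder, f"{keyword}_{index}{extension}")
```

With the keyword cats, the first three images would be saved as cats_0.jpg, cats_1.jpg and cats_2.jpg inside the cats folder.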

img.get_attribute("src")

We use the wget module to download the images and save them in the folder

wget.download(img.get_attribute("src"), save_as)
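If you prefer not to install wget, the standard library can do the same job. The sketch below swaps in urllib.request.urlretrieve and skips any URL that fails instead of stopping the whole run. The function download_images and its fetch parameter are names made up for this example, and urls is assumed to be a list of image URL strings.

```python
import os
import urllib.request


def download_images(urls, folder, keyword, fetch=urllib.request.urlretrieve):
    """Save each URL as <keyword>_<n>.jpg inside `folder`; return the saved paths."""
    os.makedirs(folder, exist_ok=True)
    saved = []
    for count, url in enumerate(urls):
        save_as = os.path.join(folder, f"{keyword}_{count}.jpg")
        try:
            fetch(url, save_as)
        except OSError as error:  # urllib.error.URLError is a subclass of OSError
            print(f"Skipping {url}: {error}")
            continue
        saved.append(save_as)
    return saved
```

A call such as download_images([img.get_attribute("src") for img in imgs], path, keyword) would stand in for the loop above.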

The full code can be found here
