These days I have recurring tasks that consist of automatically downloading files from the Internet. Usually, requests, beautifulsoup and a few tricks do the job effectively. Sometimes though, I have to play hardball and ask Selenium and headless Chromium* to do the heavy lifting. Alas, how to make Chromium download files automatically is not obvious.
I found the solution in the Chrome issue tracker, and you can read it below, written in Python:
import tempfile

from selenium import webdriver

# Create options that make Chromium run headless
options = webdriver.ChromeOptions()
options.add_argument('headless')
options.add_argument('disable-gpu')

# Launch the browser (older Selenium releases took chrome_options= instead)
browser = webdriver.Chrome(options=options)

# Create a temporary directory to receive the downloaded files
download_dir = tempfile.mkdtemp()

# Register and send a command that tells Chrome to download files into
# download_dir without asking.
browser.command_executor._commands["send_command"] = (
    "POST",
    '/session/$sessionId/chromium/send_command'
)
params = {
    'cmd': 'Page.setDownloadBehavior',
    'params': {
        'behavior': 'allow',
        'downloadPath': download_dir
    }
}
browser.execute("send_command", params)
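One thing the snippet above leaves open is how to know that a download has actually finished before the script moves on or quits the browser. Here is a minimal sketch, assuming Chrome's usual behaviour of writing in-progress downloads with a `.crdownload` suffix and renaming them once they are complete. `wait_for_downloads` and its parameters are hypothetical names of my own, not part of Selenium:

```python
import os
import time


def wait_for_downloads(download_dir, timeout=30.0, poll_interval=0.5):
    """Block until download_dir holds at least one file and no partial ones.

    Assumption: Chrome names in-progress downloads with a ".crdownload"
    suffix and renames them when they complete.
    Returns the sorted list of file names, or raises TimeoutError.
    """
    deadline = time.monotonic() + timeout
    while True:
        names = sorted(os.listdir(download_dir))
        in_progress = [n for n in names if n.endswith(".crdownload")]
        # Done once something has arrived and nothing is still partial
        if names and not in_progress:
            return names
        if time.monotonic() >= deadline:
            raise TimeoutError(
                "downloads in %r did not finish within %s seconds"
                % (download_dir, timeout)
            )
        time.sleep(poll_interval)
```

After triggering a download, something like `files = wait_for_downloads(download_dir)` just before `browser.quit()` should be enough for simple cases. Polling the directory is crude, but it needs nothing beyond the standard library.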
There you go, happy scraping!
* Of course, it works with regular Chrome too!