Download large file in python with requests
With the following streaming code, Python memory usage stays bounded regardless of the size of the downloaded file:
import requests

def download_file(url):
    local_filename = url.split('/')[-1]
    # NOTE the stream=True parameter below
    with requests.get(url, stream=True) as r:
        r.raise_for_status()
        with open(local_filename, 'wb') as f:
            for chunk in r.iter_content(chunk_size=8192):
                # If you have a chunk-encoded response, uncomment the
                # if below and set the chunk_size parameter to None.
                #if chunk:
                f.write(chunk)
    return local_filename
Note that the number of bytes returned by iter_content is not exactly the chunk_size; it's often far bigger, and it is expected to differ in every iteration.
See body-content-workflow and Response.iter_content for further reference.
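As a sketch of the chunk-encoded case mentioned in the comments above: with chunk_size=None, iter_content yields data as it arrives, and the if chunk: filter skips empty keep-alive chunks. The function name and signature here are my own, not part of the original answer:

```python
import requests

def download_chunked(url, local_filename):
    # chunk_size=None yields chunks as they arrive rather than fixed-size blocks
    with requests.get(url, stream=True) as r:
        r.raise_for_status()
        with open(local_filename, 'wb') as f:
            for chunk in r.iter_content(chunk_size=None):
                # skip keep-alive chunks, which arrive as empty bytes
                if chunk:
                    f.write(chunk)
    return local_filename
```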
It's much easier if you use Response.raw and shutil.copyfileobj():
import requests
import shutil
def download_file(url):
    local_filename = url.split('/')[-1]
    with requests.get(url, stream=True) as r:
        with open(local_filename, 'wb') as f:
            shutil.copyfileobj(r.raw, f)
    return local_filename
This streams the file to disk without using excessive memory, and the code is simple.
Note: according to the documentation, Response.raw will not decode the gzip and deflate transfer-encodings, so you will need to do this manually.
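One way to handle that manually is to set urllib3's decode_content flag on the raw stream before copying; a sketch (the function name is mine, not from the answer):

```python
import shutil
import requests

def download_file_decoded(url, local_filename):
    with requests.get(url, stream=True) as r:
        r.raise_for_status()
        # ask urllib3 to decode gzip/deflate transfer-encodings as we read
        r.raw.decode_content = True
        with open(local_filename, 'wb') as f:
            shutil.copyfileobj(r.raw, f)
    return local_filename
```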
Not exactly what OP was asking, but... it's ridiculously easy to do that with urllib:
from urllib.request import urlretrieve
url = 'http://mirror.pnl.gov/releases/16.04.2/ubuntu-16.04.2-desktop-amd64.iso'
dst = 'ubuntu-16.04.2-desktop-amd64.iso'
urlretrieve(url, dst)
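urlretrieve also accepts a reporthook callback for progress reporting; here is a sketch (the percent_done helper and reporthook names are mine, not from the answer):

```python
from urllib.request import urlretrieve

def percent_done(block_num, block_size, total_size):
    # Percentage of the file received so far; capped at 100 because the
    # final block is usually only partially filled.
    if total_size <= 0:  # server did not send a Content-Length header
        return 0.0
    return min(100.0, block_num * block_size * 100.0 / total_size)

def reporthook(block_num, block_size, total_size):
    print('\r%5.1f%%' % percent_done(block_num, block_size, total_size), end='')

# urlretrieve(url, dst, reporthook)  # pass the hook as the third argument
```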
Or this way, if you want to save it to a temporary file:
from urllib.request import urlopen
from shutil import copyfileobj
from tempfile import NamedTemporaryFile
url = 'http://mirror.pnl.gov/releases/16.04.2/ubuntu-16.04.2-desktop-amd64.iso'
with urlopen(url) as fsrc, NamedTemporaryFile(delete=False) as fdst:
    copyfileobj(fsrc, fdst)
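Since delete=False leaves the temporary file behind, you would typically move it to its final destination afterwards. A small local demonstration of that pattern, with no network involved (the function name is mine):

```python
import os
import shutil
from tempfile import NamedTemporaryFile

def save_via_tempfile(data, dst):
    # Write to a temp file first, then move it into place; the move is
    # atomic when the temp file lives on the same filesystem as dst.
    with NamedTemporaryFile(delete=False, dir=os.path.dirname(dst) or '.') as fdst:
        fdst.write(data)
        tmp_name = fdst.name
    shutil.move(tmp_name, dst)
    return dst
```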
I watched the process:
watch 'ps -p 18647 -o pid,ppid,pmem,rsz,vsz,comm,args; ls -al *.iso'
And I saw the file growing while memory usage stayed at 17 MB. Am I missing something?