Download large file in python with requests

With the following streaming code, Python memory usage stays bounded regardless of the size of the downloaded file:

import requests

def download_file(url):
    local_filename = url.split('/')[-1]
    # NOTE the stream=True parameter below
    with requests.get(url, stream=True) as r:
        r.raise_for_status()
        with open(local_filename, 'wb') as f:
            for chunk in r.iter_content(chunk_size=8192):
                # If you have a chunk-encoded response, uncomment the
                # if below and set the chunk_size parameter to None.
                # if chunk:
                f.write(chunk)
    return local_filename

Note that the number of bytes returned by iter_content is not exactly the chunk_size; the actual chunk size can vary, is often far bigger, and may differ on every iteration.

See body-content-workflow and Response.iter_content for further reference.

It's much easier if you use Response.raw and shutil.copyfileobj():

import requests
import shutil

def download_file(url):
    local_filename = url.split('/')[-1]
    with requests.get(url, stream=True) as r:
        r.raise_for_status()
        with open(local_filename, 'wb') as f:
            shutil.copyfileobj(r.raw, f)

    return local_filename

This streams the file to disk without using excessive memory, and the code is simple.

Note: According to the documentation, Response.raw will not decode gzip and deflate transfer-encodings, so you will need to do this manually.
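If you do need the decoded bytes, the simplest route is to set `r.raw.decode_content = True` before copying; alternatively you can gunzip on the fly yourself. Below is a minimal sketch (`save_decoded` is a hypothetical helper name, and the `gzipped` flag would come from checking the response's `Content-Encoding` header):

```python
import gzip
import shutil

def save_decoded(raw, local_filename, gzipped=False):
    # raw is any file-like object, e.g. Response.raw; wrapping it
    # in GzipFile decompresses while still streaming to disk
    src = gzip.GzipFile(fileobj=raw) if gzipped else raw
    with open(local_filename, 'wb') as f:
        shutil.copyfileobj(src, f)
```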

Not exactly what OP was asking, but… it's ridiculously easy to do this with urllib:

from urllib.request import urlretrieve

url = 'http://mirror.pnl.gov/releases/16.04.2/ubuntu-16.04.2-desktop-amd64.iso'
dst = 'ubuntu-16.04.2-desktop-amd64.iso'
urlretrieve(url, dst)
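urlretrieve also takes a reporthook callback, so a progress indicator is only a few extra lines. A sketch (`progress` is my own name; the hook is called with the block number, block size, and total size):

```python
from urllib.request import urlretrieve

def progress(block_num, block_size, total_size):
    # total_size is -1 when the server sends no Content-Length
    downloaded = block_num * block_size
    if total_size > 0:
        percent = min(100, downloaded * 100 // total_size)
        print(f"\rdownloaded {percent}%", end="")

# usage: urlretrieve(url, dst, reporthook=progress)
```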

Or this way, if you want to save it to a temporary file:

from urllib.request import urlopen
from shutil import copyfileobj
from tempfile import NamedTemporaryFile

url = 'http://mirror.pnl.gov/releases/16.04.2/ubuntu-16.04.2-desktop-amd64.iso'
with urlopen(url) as fsrc, NamedTemporaryFile(delete=False) as fdst:
    copyfileobj(fsrc, fdst)
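Because delete=False leaves the temporary file behind, you would normally move it to its final name once the download completes. A minimal sketch (the `download_to` name and its `dst` parameter are my own):

```python
from shutil import copyfileobj, move
from tempfile import NamedTemporaryFile
from urllib.request import urlopen

def download_to(url, dst):
    # Stream into a temp file first, then move it into place,
    # so a failed download never leaves a partial file at dst.
    with urlopen(url) as fsrc, NamedTemporaryFile(delete=False) as fdst:
        copyfileobj(fsrc, fdst)
    move(fdst.name, dst)
    return dst
```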

I watched the process:

watch 'ps -p 18647 -o pid,ppid,pmem,rsz,vsz,comm,args; ls -al *.iso'

And I saw the file growing, but memory usage stayed at 17 MB. Am I missing something?
