amazon s3 – Python boto, list contents of specific dir in bucket

For boto3

import boto3

s3 = boto3.resource('s3')
my_bucket = s3.Bucket('my_bucket_name')

# Only objects whose key starts with the prefix are returned
for object_summary in my_bucket.objects.filter(Prefix='dir_name/'):
    print(object_summary.key)

By default, a get_bucket call in boto validates that you actually have access to the bucket by performing a HEAD request on the bucket URL. In this case, you don't want boto to do that, since you don't have access to the bucket itself. So, do this:

bucket = conn.get_bucket('my-bucket-url', validate=False)

and then you should be able to do something like this to list objects:

for key in bucket.list(prefix='dir-in-bucket'):
    print(key.name)  # or do something else with each key

If you still get a 403 error, try adding a slash at the end of the prefix.

for key in bucket.list(prefix='dir-in-bucket/'):
    print(key.name)  # or do something else with each key
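The trailing slash also changes which keys match. S3 treats the prefix as a plain string, so a prefix without the slash matches sibling "directories" whose names start the same way. The sketch below illustrates this with an in-memory list instead of a real bucket. The helper names as_dir_prefix and keys_under and the sample keys are made up for illustration, and it is not a claim about what causes the 403.

```python
def as_dir_prefix(name):
    """Return name with a trailing slash; an empty prefix stays empty."""
    if not name or name.endswith("/"):
        return name
    return name + "/"


def keys_under(keys, prefix):
    """Mimic S3 prefix filtering: a plain 'starts with' test on each key."""
    return [k for k in keys if k.startswith(prefix)]


# Hypothetical keys, for illustration only.
keys = [
    "dir-in-bucket/a.txt",
    "dir-in-bucket/sub/b.txt",
    "dir-in-bucket-old/c.txt",
]

# Without the slash, the sibling "dir-in-bucket-old/" is matched as well.
print(keys_under(keys, "dir-in-bucket"))

# With the slash, only keys inside "dir-in-bucket/" remain.
print(keys_under(keys, as_dir_prefix("dir-in-bucket")))
```

Passing the result of as_dir_prefix as the prefix argument keeps the listing limited to one "directory".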

Note: this answer was written for version 2 of the boto module, which is now obsolete. As of 2020, boto3 is the standard module for working with AWS. For more information, see the question "What is the difference between the AWS boto and boto3".

Boto3 client:

import boto3

_BUCKET_NAME = 'mybucket'
_PREFIX = 'subfolder/'

# ACCESS_KEY and SECRET_KEY are placeholders for your own credentials
client = boto3.client('s3', aws_access_key_id=ACCESS_KEY,
                      aws_secret_access_key=SECRET_KEY)

def ListFiles(client):
    """List files in specific S3 URL"""
    # A single list_objects call returns at most 1000 keys
    response = client.list_objects(Bucket=_BUCKET_NAME, Prefix=_PREFIX)
    for content in response.get('Contents', []):
        yield content.get('Key')

file_list = ListFiles(client)
for file in file_list:
    print('File found: %s' % file)

Using session

from boto3.session import Session

_BUCKET_NAME = 'mybucket'
_PREFIX = 'subfolder/'

# ACCESS_KEY and SECRET_KEY are placeholders for your own credentials
session = Session(aws_access_key_id=ACCESS_KEY,
                  aws_secret_access_key=SECRET_KEY)

client = session.client('s3')

def ListFilesV1(client, bucket, prefix=''):
    """List files in specific S3 URL"""
    # The paginator follows continuation markers, so all pages are visited
    paginator = client.get_paginator('list_objects')
    # Delimiter='/' limits Contents to keys directly under the prefix
    for result in paginator.paginate(Bucket=bucket, Prefix=prefix,
                                     Delimiter='/'):
        for content in result.get('Contents', []):
            yield content.get('Key')

file_list = ListFilesV1(client, _BUCKET_NAME, prefix=_PREFIX)
for file in file_list:
    print('File found: %s' % file)
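When a Delimiter is passed, S3 does not list keys in nested "subfolders" under Contents. It groups them and reports each group once under CommonPrefixes. The following is a minimal sketch of a listing that returns both, using the list_objects_v2 paginator. The function name list_dir and the "dir"/"file" labels are made up for illustration, and client is assumed to be a boto3 S3 client such as the one created above.

```python
def list_dir(client, bucket, prefix=""):
    """Yield (kind, name) pairs for the entries directly under prefix.

    client is assumed to be a boto3 S3 client, e.g. boto3.client("s3").
    """
    paginator = client.get_paginator("list_objects_v2")
    pages = paginator.paginate(Bucket=bucket, Prefix=prefix, Delimiter="/")
    for page in pages:
        # "Subfolders": key groups that share the next path segment.
        for sub in page.get("CommonPrefixes", []):
            yield "dir", sub["Prefix"]
        # Objects stored directly under the prefix.
        for obj in page.get("Contents", []):
            yield "file", obj["Key"]
```

Calling list_dir(client, _BUCKET_NAME, _PREFIX) gives a one-level view of the "directory", and calling it again with a returned "dir" name as the prefix descends one level.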
