amazon s3 – Python boto, list contents of specific dir in bucket
For boto3
import boto3

s3 = boto3.resource('s3')
my_bucket = s3.Bucket('my_bucket_name')
# Only keys that start with the prefix are returned.
for object_summary in my_bucket.objects.filter(Prefix='dir_name/'):
    print(object_summary.key)
By default, when you do a get_bucket call in boto, it tries to validate that you actually have access to that bucket by performing a HEAD request on the bucket URL. In this case, you don't want boto to do that, since you don't have access to the bucket itself. So do this:
bucket = conn.get_bucket('my-bucket-url', validate=False)
and then you should be able to do something like this to list objects:
for key in bucket.list(prefix='dir-in-bucket'):
    print(key.name)  # <do something> with each key
If you still get a 403 error, try adding a slash at the end of the prefix:
for key in bucket.list(prefix='dir-in-bucket/'):
    print(key.name)  # <do something> with each key
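The trailing slash also matters for correctness: a prefix of dir-in-bucket without the slash would match keys under dir-in-bucket2/ as well. Here is a minimal sketch of a helper that appends the slash when it is missing. The name as_dir_prefix is hypothetical and not part of boto; it only manipulates strings and never contacts S3.

```python
def as_dir_prefix(dir_name):
    """Return dir_name as an S3 key prefix that ends with a slash.

    as_dir_prefix is a hypothetical helper for illustration, not a boto API.
    """
    # The empty prefix means "the whole bucket", so leave it alone.
    if not dir_name:
        return ''
    # Append a slash only when one is not already there.
    return dir_name if dir_name.endswith('/') else dir_name + '/'
```

You could then call bucket.list(prefix=as_dir_prefix('dir-in-bucket')) without worrying about whether the caller remembered the slash.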
Note: this answer was written about the boto version 2 module, which is obsolete by now. At the moment (2020), boto3 is the standard module for working with AWS. See this question for more info: What is the difference between the AWS boto and boto3
Boto3 client:
import boto3

_BUCKET_NAME = 'mybucket'
_PREFIX = 'subfolder/'

# ACCESS_KEY and SECRET_KEY must be defined before this point.
client = boto3.client('s3', aws_access_key_id=ACCESS_KEY,
                      aws_secret_access_key=SECRET_KEY)

def ListFiles(client):
    """List files in specific S3 URL"""
    response = client.list_objects(Bucket=_BUCKET_NAME, Prefix=_PREFIX)
    for content in response.get('Contents', []):
        yield content.get('Key')

file_list = ListFiles(client)
for file in file_list:
    print('File found: %s' % file)
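Note that a single list_objects call returns at most 1,000 keys, so the function above silently stops early on larger directories. Below is a rough sketch of following the continuation token by hand with list_objects_v2. The name ListAllFiles is hypothetical, and the client is assumed to be a boto3 S3 client like the one created above.

```python
def ListAllFiles(client, bucket, prefix=''):
    """Yield every key under prefix, following continuation tokens.

    ListAllFiles is a hypothetical name used for illustration.
    """
    kwargs = {'Bucket': bucket, 'Prefix': prefix}
    while True:
        response = client.list_objects_v2(**kwargs)
        for content in response.get('Contents', []):
            yield content['Key']
        # IsTruncated is true while more pages remain.
        if not response.get('IsTruncated'):
            break
        # Ask for the next page on the following call.
        kwargs['ContinuationToken'] = response['NextContinuationToken']
```

Usage mirrors the original: iterate over ListAllFiles(client, _BUCKET_NAME, prefix=_PREFIX) and print each key.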
Using session
from boto3.session import Session

_BUCKET_NAME = 'mybucket'
_PREFIX = 'subfolder/'

# ACCESS_KEY and SECRET_KEY must be defined before this point.
session = Session(aws_access_key_id=ACCESS_KEY,
                  aws_secret_access_key=SECRET_KEY)
client = session.client('s3')

def ListFilesV1(client, bucket, prefix=''):
    """List files in specific S3 URL"""
    # The paginator fetches every page of results, not just the first one.
    paginator = client.get_paginator('list_objects')
    for result in paginator.paginate(Bucket=bucket, Prefix=prefix,
                                     Delimiter='/'):
        for content in result.get('Contents', []):
            yield content.get('Key')

file_list = ListFilesV1(client, _BUCKET_NAME, prefix=_PREFIX)
for file in file_list:
    print('File found: %s' % file)
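Because the paginator is called with a Delimiter of /, keys nested more deeply than the prefix do not show up in Contents; S3 rolls them up into CommonPrefixes instead. Here is a sketch of listing those immediate "subfolders". The name ListSubfolders is hypothetical, and the client is assumed to be the same boto3 S3 client as above.

```python
def ListSubfolders(client, bucket, prefix=''):
    """Yield the immediate "subfolder" prefixes directly under prefix.

    ListSubfolders is a hypothetical name used for illustration.
    """
    paginator = client.get_paginator('list_objects')
    for result in paginator.paginate(Bucket=bucket, Prefix=prefix,
                                     Delimiter='/'):
        # With a delimiter, deeper keys are grouped into CommonPrefixes.
        for common_prefix in result.get('CommonPrefixes', []):
            yield common_prefix.get('Prefix')
```

Calling ListSubfolders(client, _BUCKET_NAME, prefix=_PREFIX) would yield entries such as subfolder/x/ for each group of keys one level below the prefix.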