398

I'd like to graph the size (in bytes, and # of items) of an Amazon S3 bucket and am looking for an efficient way to get the data.

The s3cmd tools provide a way to get the total file size using s3cmd du s3://bucket_name, but I'm worried about its ability to scale since it looks like it fetches data about every file and calculates its own sum. Since Amazon charges users in GB-Months it seems odd that they don't expose this value directly.

Although Amazon's REST API returns the number of items in a bucket, s3cmd doesn't seem to expose it. I could do s3cmd ls -r s3://bucket_name | wc -l but that seems like a hack.

The Ruby AWS::S3 library looked promising, but only provides the # of bucket items, not the total bucket size.

Is anyone aware of any other command line tools or libraries (prefer Perl, PHP, Python, or Ruby) which provide ways of getting this data?

3

27 Answers

222

The AWS CLI now supports the --query parameter, which takes a JMESPath expression.

This means you can sum the size values given by list-objects using sum(Contents[].Size) and count like length(Contents[]).

This can be run using the official AWS CLI as below; support for --query was introduced in February 2014:

 aws s3api list-objects --bucket BUCKETNAME --output json --query "[sum(Contents[].Size), length(Contents[])]"
12
  • 41
    For large buckets (large #files), this is excruciatingly slow. The Python utility s4cmd "du" is lightning fast: s4cmd du s3://bucket-name Commented Mar 31, 2015 at 22:08
  • That's strange. What is the overall profile of your bucket (shallow and fat / deep and thin)? It looks like s3cmd should have the same overhead as the AWS CLI. The code shows that s3cmd makes a request for each directory in a bucket. Commented Apr 1, 2015 at 15:14
  • 34
    to get it in human readable format: aws s3api --profile PROFILE_NAME list-objects --bucket BUCKET_NAME --output json --query "[sum(Contents[].Size), length(Contents[])]" | awk 'NR!=2 {print $0;next} NR==2 {print $0/1024/1024/1024" GB"}'
    – Sandeep
    Commented Aug 8, 2015 at 23:22
  • 29
    Now that AWS Cloudwatch offers a "BucketSizeBytes" per-bucket metric this is no longer the right solution. See Toukakoukan's answer below.
    – cce
    Commented Sep 24, 2015 at 20:42
  • 5
    s4cmd du is wonderful, thank you @Brent Faust! small note (for those concerned) that you need to add -r to get the sizes of sub-directories as well. Commented Jun 30, 2018 at 21:06
502

This can now be done trivially with just the official AWS command line client:

aws s3 ls --summarize --human-readable --recursive s3://bucket-name/

Official Documentation: AWS CLI Command Reference (version 2)

This also accepts path prefixes if you don't want to count the entire bucket:

aws s3 ls --summarize --human-readable --recursive s3://bucket-name/directory
16
  • 32
    This is the best and most up-to-date answer
    – Tim
    Commented Apr 5, 2016 at 22:26
  • 4
    Agree, this is the best answer. Commented Jul 5, 2016 at 17:28
  • 48
    This is very slow for buckets with many files, as it basically lists all the objects in the bucket before showing the summary, and in that it is not significantly faster than @Christopher Hackett's answer - except this one is much noisier.
    – Guss
    Commented Jul 24, 2016 at 23:19
  • 5
    This will show the size of ALL the individual files in the directory tree. What if I just want the total size for the directory?
    – Chris F
    Commented Jul 16, 2018 at 19:05
  • 6
    For large buckets, check out Bahreini's answer. In the S3 console, go to your bucket > Management > Metrics. You can view total usage and filter by prefix, tag, etc. serverfault.com/a/981112/226067
    – colllin
    Commented Jan 30, 2020 at 17:42
187

AWS Console:

As of the 28th of July 2015, you can get this information via CloudWatch. If you want a GUI, go to the CloudWatch console: (Choose Region > ) Metrics > S3

AWS CLI Command:

This is much quicker than some of the other commands posted here, as it does not query the size of each file individually to calculate the sum.

 aws cloudwatch get-metric-statistics --namespace AWS/S3 --start-time 2015-07-15T10:00:00 --end-time 2015-07-31T01:00:00 --period 86400 --statistics Average --region eu-west-1 --metric-name BucketSizeBytes --dimensions Name=BucketName,Value=toukakoukan.com Name=StorageType,Value=StandardStorage

Important: You must specify both StorageType and BucketName in the dimensions argument, otherwise you will get no results. All you need to change is the --start-time, --end-time, and Value=toukakoukan.com.


Here's a bash script you can use to avoid having to specify --start-time and --end-time manually.

#!/bin/bash
bucket=$1
region=$2
now=$(date +%s)
aws cloudwatch get-metric-statistics --namespace AWS/S3 --start-time "$((now - 86400))" --end-time "$now" --period 86400 --statistics Average --region "$region" --metric-name BucketSizeBytes --dimensions Name=BucketName,Value="$bucket" Name=StorageType,Value=StandardStorage
13
  • 27
    Or in the CloudWatch console: (Choose Region > ) Metrics > S3 Commented Jan 7, 2016 at 17:57
  • 4
    This is by far the easiest and fastest solution. Unfortunately the answer is still only in fourth place.
    – luk2302
    Commented Oct 13, 2016 at 10:12
  • This worked for my bucket with 10 million+ objects, but the bash script didn't return anything; I had to go to the GUI.
    – Petah
    Commented Mar 6, 2017 at 19:36
  • 1
    It should also be noted that you'll have to change the region as well
    – chizou
    Commented Feb 5, 2018 at 21:21
  • 1
    Also be wary of Name=StorageType,Value=StandardStorage: if your bucket has some custom lifecycle (utilizing the Infrequent Access storage class, for example), replace StandardStorage with StandardIAStorage or some other value (see the docs for details). Commented Jul 17, 2019 at 14:56
109

s3cmd can do this:

s3cmd du s3://bucket-name

4
  • 1
    Thanks. Here's some timing. On a bucket that holds an s3ql deduplicated filesystem with about a million files using about 33 GB of unduplicated data, and about 93000 S3 objects, s3cmd du took about 4 minutes to compute the answer. I'm curious to know how that compares with other approaches like the PHP one described elsewhere here.
    – nealmcb
    Commented Jul 10, 2012 at 23:46
  • 2
    It is slow because the S3 ListObjects API call returns objects in pages of 1000 objects. As I/O is by far the limiting factor I think any solution will be relatively slow over 93000 objects. Commented Apr 20, 2013 at 13:54
  • 13
    s4cmd can also do the same thing, with the added benefit of multi-threading the requests to S3's API to compute the result faster. The tool hasn't been updated recently, but the Internet passer-by may find it useful. Commented Jul 7, 2014 at 17:34
  • s4cmd just returns 0 for me, and returns BotoClientError: Bucket names cannot contain upper-case characters when using either the sub-domain or virtual hosting calling format. for buckets with uppercase characters.
    – Lakitu
    Commented Oct 5, 2015 at 20:52
30

If you want to get the size from AWS Console:

  1. Go to S3 and select the bucket
  2. Click on "Metrics" tab

[Screenshot: the bucket's Metrics tab in the S3 console showing the Total bucket size graph]

By default, you should see the Total bucket size metric at the top.

1
  • In the graph, I saw nothing. Only when I hovered my mouse over the graph did I see dots appear that told me the daily total. Commented Mar 13, 2020 at 17:50
26

If you download a usage report, you can graph the daily values for the TimedStorage-ByteHrs field.

If you want that number in GiB, just divide by 1024 * 1024 * 1024 * 24 (that converts byte-hours over a 24-hour cycle to GiB). If you want the number in bytes, just divide by 24 and graph away.
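A minimal sketch of that arithmetic in shell (the byte-hour figure below is made up purely for illustration):

# Hypothetical TimedStorage-ByteHrs value taken from one day's usage report
byte_hrs=259845789081600

# Divide by 24 to get the average number of bytes stored over the day,
# then by 1024^3 to convert bytes to GiB.
bytes=$(( byte_hrs / 24 ))
echo "$bytes bytes"
echo "$(( bytes / 1024 / 1024 / 1024 )) GiB"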

25

Using the official AWS s3 command line tools:

aws s3 ls s3://bucket/folder --recursive | awk 'BEGIN {total=0}{total+=$3}END{print total/1024/1024" MB"}'

A better command is to just add the three parameters --summarize --human-readable --recursive after aws s3 ls. --summarize is not required, but it adds a nice total-size line.

aws s3 ls s3://bucket/folder --summarize --human-readable --recursive
5
16

s4cmd is the fastest way I've found (a command-line utility written in Python):

pip install s4cmd

Now to calculate the entire bucket size using multiple threads:

s4cmd du -r s3://bucket-name
3
  • 8
    No, s4cmd du s3://123123drink will not simply return the size of the bucket. To get the size of the bucket you do add the recursive -r, like this: s4cmd du -r s3://123123drink Commented Nov 9, 2015 at 16:12
  • 1
    Yes, good point @BukLau (added -r to example above to avoid confusion when people are using simulated folders on S3). Commented Apr 9, 2018 at 22:02
  • What if we want versions also to be considered in the calculation for versioned buckets? Commented Dec 19, 2020 at 15:42
9

You can use the s3cmd utility, e.g.:

s3cmd du -H s3://Mybucket
97G      s3://Mybucket/
1
  • how do we go about using this if we have to use something like aws --profile saml s3 xyz and so on? Commented Dec 19, 2020 at 18:22
6

After trawling through the API and playing with some sample queries: S3 will return the entire contents of a bucket in one request, and it doesn't need to descend into directories. The results then just require summing over the various XML elements, rather than repeated calls. I don't have a sample bucket that has thousands of items, so I don't know how well it will scale, but it seems reasonably simple.
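A rough sketch of the same idea with today's tooling (the AWS CLI did not exist when this was written, and the bucket name below is a placeholder): the CLI pages through the ListObjects results for you, so the 1000-keys-per-request limit is handled, and the sizes are summed client-side, much like summing the XML elements by hand.

# Sketch: print every object's size and add them up client-side.
# list-objects-v2 returns at most 1000 keys per request, but the AWS CLI
# follows the continuation tokens automatically.
aws s3api list-objects-v2 --bucket bucket-name \
    --query 'Contents[].Size' --output text |
  tr '\t' '\n' |
  awk '{ total += $1 } END { printf "%d objects, %d bytes\n", NR, total }'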

2
  • This does seem to be the best option. Will update this post in the future if it scales poorly and I need to do something else. The library that ended up providing easy access to the raw API results was this PHP one: undesigned.org.za/2007/10/22/amazon-s3-php-class
    – powdahound
    Commented Nov 16, 2009 at 15:20
  • Isn't that only limited to the first 1000 items? Commented Apr 13, 2015 at 18:30
6

I used the S3 REST/Curl API listed earlier in this thread and did this:

<?php
if (!class_exists('S3')) require_once 'S3.php';

// Instantiate the class
$s3 = new S3('accessKeyId', 'secretAccessKey');
S3::$useSSL = false;

// List your buckets:
echo "S3::listBuckets(): ";
echo '<pre>' . print_r($s3->listBuckets(), 1). '</pre>';

$totalSize = 0;
$objects = $s3->getBucket('name-of-your-bucket');
foreach ($objects as $name => $val) {
    // If you want to get the size of a particular directory, you can do
    // only that.
    // if (strpos($name, 'directory/sub-directory') !== false)
    $totalSize += $val['size'];
}

echo ($totalSize / 1024 / 1024 / 1024) . ' GB';
?>
4

... A bit late, but the best way I found is by using the reports in the AWS portal. I made a PHP class for downloading and parsing the reports. With it you can get the total number of objects for each bucket, the total size in GB or byte-hours, and more.

Check it out and let me know if it was helpful.

AmazonTools

3
  • This is an interesting solution, although a little hackish. Worried about it breaking if/when Amazon changes their site, but I may have to try this out once I have enough objects that the other way becomes too slow. Another benefit of this approach is that you don't get charged for any API calls.
    – powdahound
    Commented Dec 21, 2009 at 16:16
  • ... it's an assumption, but if Amazon do change the look of their site, I doubt they would change the back end much, meaning the current GET and POST queries should keep working. I will maintain the class in the event it does break anyway, as I use it often.
    – Corey Sewell
    Commented Dec 22, 2009 at 0:26
  • @Corey it's not working: undesigned.org.za/2007/10/22/amazon-s3-php-class
    – Maveňツ
    Commented Jan 21, 2022 at 14:18
4

I recommend using the S3 Usage Report for large buckets; see my How To on how to get it. Basically, you need to download the Usage Report for the S3 service for the last day with Timed Storage - Byte Hrs and parse it to get disk usage.

awk -F, '{printf "%.2f GB %s %s\n", $7/(1024**3)/24, $4, $2}' report.csv | sort -n
4

The AWS documentation tells you how to do it:

aws s3 ls s3://bucketname --recursive --human-readable --summarize

This is the output you get:

2016-05-17 00:28:14    0 Bytes folder/
2016-05-17 00:30:57    4.7 KiB folder/file.jpg
2016-05-17 00:31:00  108.9 KiB folder/file.png
2016-05-17 00:31:03   43.2 KiB folder/file.jpg
2016-05-17 00:31:08  158.6 KiB folder/file.jpg
2016-05-17 00:31:12   70.6 KiB folder/file.png
2016-05-17 00:43:50   64.1 KiB folder/folder/folder/folder/file.jpg

Total Objects: 7

   Total Size: 450.1 KiB
2

For a really low-tech approach: use an S3 client that can calculate the size for you. I'm using Panic's Transmit: click on a bucket, do "Get Info", and click the "Calculate" button. I'm not sure how fast or accurate it is in relation to other methods, but it seems to give back the size I had expected it to be.

2

Since there are so many answers, I figured I'd pitch in with my own. I wrote my implementation in C# using LINQPad. Copy, paste, and enter the access key, secret key, region endpoint, and bucket name you want to query. Also, make sure to add the AWSSDK NuGet package.

Testing against one of my buckets, it gave me a count of 128075 and a size of 70.6GB. I know that is 99.9999% accurate, so I'm good with the result.

void Main() {
    var s3Client = new AmazonS3Client("accessKey", "secretKey", RegionEndpoint.???);
    var stop = false;
    var objectsCount = 0;
    var objectsSize = 0L;
    var nextMarker = string.Empty;

    while (!stop) {
        var response = s3Client.ListObjects(new ListObjectsRequest {
            BucketName = "",
            Marker = nextMarker
        });

        objectsCount += response.S3Objects.Count;
        objectsSize += response.S3Objects.Sum(
            o =>
                o.Size);
        nextMarker = response.NextMarker;
        // Stop once the last page of results has been returned.
        stop = !response.IsTruncated;
    }

    new {
        Count = objectsCount,
        Size = objectsSize.BytesToString()
    }.Dump();
}

static class Int64Extensions {
    public static string BytesToString(
        this long byteCount) {
        if (byteCount == 0) {
            return "0B";
        }

        var suffix = new string[] { "B", "KB", "MB", "GB", "TB", "PB", "EB" };
        var longBytes = Math.Abs(byteCount);
        var place = Convert.ToInt32(Math.Floor(Math.Log(longBytes, 1024)));
        var number = Math.Round(longBytes / Math.Pow(1024, place), 1);

        return string.Format("{0}{1}", Math.Sign(byteCount) * number, suffix[place]);
    }
}
1

I know this is an older question but here is a PowerShell example:

Get-S3Object -BucketName <bucketname> | select key, size | foreach {$A += $_.size}

$A contains the size of the bucket, and there is a KeyPrefix parameter if you just want the size of a specific folder in a bucket.

1
  • First run the Get-S3Object... line and then run $A (for those not familiar with PowerShell).
    – Faiz
    Commented Sep 30, 2016 at 10:34
1

To check the size of all buckets, try this bash script:

s3list=$(aws s3 ls | awk '{print $3}')
for s3dir in $s3list
do
    echo "$s3dir"
    aws s3 ls "s3://$s3dir" --recursive --human-readable --summarize | grep "Total Size"
done
2
  • This worked great. Commented Sep 14, 2018 at 14:27
  • 1
    Capturing the output in a variable just so you can loop over it is a wasteful antipattern.
    – tripleee
    Commented Oct 8, 2018 at 6:52
1

You can use s3cmd:

s3cmd du s3://Mybucket -H

or

s3cmd du s3://Mybucket --human-readable

It gives the total objects and the size of the bucket in a very readable form.

1
  • 1
    Does du traverse/list all the objects, or does it retrieve the metadata? I would really like an API version of the reports, or of what is displayed in the AWS console...
    – user67327
    Commented Jul 2, 2019 at 22:52
0

Hey, there is a metadata search tool for AWS S3 at https://s3search.p3-labs.com/. This tool gives statistics about objects in a bucket, with search on metadata.

0

Also Hanzo S3 Tools does this. Once installed, you can do:

s3ls -s -H bucketname

But I believe this is also summed on the client side and not retrieved through the AWS API.

0

With the CloudBerry program it is also possible to list the size of the bucket, the number of folders, and the total number of files, by clicking "Properties" right on top of the bucket.

0

If you don't want to use the command line: on Windows and OS X, there's a general-purpose remote file management app called Cyberduck. Log into S3 with your access/secret key pair, right-click on the directory, and click Calculate.

0

I wrote a Bash script, s3-du.sh, that will list the files in a bucket with s3ls and print the count of files and their sizes, like:

s3-du.sh testbucket.jonzobrist.com
149 files in bucket testbucket.jonzobrist.com
11760850920 B
11485205 KB
11216 MB
10 GB

Full script:

#!/bin/bash

if [ "${1}" ]
then
    NUM=0
    COUNT=0
    for N in $(s3ls "${1}" | awk '{print $11}' | grep '[0-9]')
    do
        NUM=$(expr $NUM + $N)
        ((COUNT++))
    done
    KB=$(expr ${NUM} / 1024)
    MB=$(expr ${NUM} / 1048576)
    GB=$(expr ${NUM} / 1073741824)
    echo "${COUNT} files in bucket ${1}"
    echo "${NUM} B"
    echo "${KB} KB"
    echo "${MB} MB"
    echo "${GB} GB"
else
    echo "Usage : ${0} s3-bucket"
    exit 1
fi

It does include subdirectory sizes, as Amazon returns the directory name and the size of all of its contents.

0

This works for me:

aws s3 ls s3://bucket/folder/ --recursive | awk '{sz+=$3} END {print sz/1024/1024 "MB"}'
2
  • 3
    Can you add a few more details? Commented Apr 14, 2016 at 20:08
  • 1
    This is essentially the same solution as another answer posted about a year earlier.
    – Louis
    Commented May 2, 2016 at 11:41
0

CloudWatch now has a default S3 service dashboard which lists the size in a graph called "Bucket Size Bytes Average". I think the link will work for anyone already logged into the AWS Console.
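If you prefer the command line, a quick way (a sketch, not part of the original answer) to see which of these per-bucket storage metrics CloudWatch is already collecting:

# List the S3 storage metrics CloudWatch has; each result names a bucket and a
# StorageType dimension that can be fed into get-metric-statistics (see the
# CloudWatch answer above).
aws cloudwatch list-metrics --namespace AWS/S3 --metric-name BucketSizeBytes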

-1

The following uses the AWS SDK for PHP to get the total size of the bucket.

// make sure that you are using the correct region (where the bucket is) to get a new Amazon S3 client
$client = \Aws\S3\S3Client::factory(array('region' => $region));

// check if the bucket exists
if (!$client->doesBucketExist($bucket, $accept403 = true)) {
    return false;
}

// get the bucket's objects (note: a single ListObjects call returns at most
// 1000 objects; use $client->getIterator('ListObjects', ...) to walk larger buckets)
$objects = $client->listObjects(array('Bucket' => $bucket));

$total_size_bytes = 0;
$contents = $objects['Contents'];

// iterate through all contents to get the total size
foreach ($contents as $key => $value) {
    $total_size_bytes += $value['Size'];
}

$total_size_gb = $total_size_bytes / 1024 / 1024 / 1024;
