AWS Simple Storage Service (S3)

Object storage offering 99.999999999% (11 nines) data durability.

Internals

Based on Amazon Dynamo (not the same as DynamoDB) - a technology developed internally at Amazon as an incrementally scalable, highly available key-value storage system.

Scope

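Regional: each bucket is created in a specific Region, although AWS does call S3 global in places (bucket names, for instance, share a global namespace).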

Can you create a bucket without specifying a region?

Consistency

  • Strong read-after-write consistency for PUT and DELETE requests
  • Strongly consistent reads: a read always returns the most recent data. Note that a read initiated before a write has finished can return either the old or the new data.
  • No out-of-the-box object locking for concurrent writes (manage it in app code), so be mindful of write conflicts
  • Last-writer-wins conflict resolution for concurrent writes
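
The last two points can be illustrated with a small in-memory simulation. This is only a sketch: `TinyStore` and its methods are hypothetical stand-ins written for this note, not the S3 API, and the version check is one possible way of handling conflicts in app code.

```python
import itertools


class TinyStore:
    """In-memory stand-in for a bucket (hypothetical, NOT the S3 API)."""

    def __init__(self):
        self._objects = {}                 # key -> (version, body)
        self._versions = itertools.count(1)

    def get(self, key):
        # Returns (version, body), or (None, None) if the key is absent.
        return self._objects.get(key, (None, None))

    def put(self, key, body):
        # Unconditional write: whoever writes last wins.
        version = next(self._versions)
        self._objects[key] = (version, body)
        return version

    def put_if_version(self, key, body, expected_version):
        # Optimistic write: refuse if someone else wrote since we read.
        current_version, _ = self.get(key)
        if current_version != expected_version:
            return None
        return self.put(key, body)


store = TinyStore()
store.put("report.csv", "v0")

# Two unconditional writers: the second silently replaces the first.
store.put("report.csv", "from A")
store.put("report.csv", "from B")
last_writer_body = store.get("report.csv")[1]

# Both writers read the same version, then try a checked write.
seen_by_a, _ = store.get("report.csv")
seen_by_b, _ = store.get("report.csv")
a_result = store.put_if_version("report.csv", "from A again", seen_by_a)
b_result = store.put_if_version("report.csv", "from B again", seen_by_b)
# A succeeds; B holds a stale version, gets None, and can re-read and retry.
```

The simulation is single-threaded on purpose; it only shows the ordering problem, not a thread-safe implementation.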

More here

Storage class

  1. Standard
  2. Intelligent-Tiering
  3. Express One Zone (High Performance)
  4. Standard-Infrequent Access
  5. One Zone-Infrequent Access
  6. Glacier Instant Retrieval
  7. Glacier Flexible Retrieval
  8. Glacier Deep Archive
  9. S3 on Outposts

More https://aws.amazon.com/s3/storage-classes/

Lifecycle rules

Use to transition objects between storage classes or to expire (delete) them. Avoid transitioning large numbers of small objects: transitions are billed per request, so the transition cost can exceed any savings from the cheaper storage class.
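
A rough way to check whether a transition pays off is to compare the one-off transition request cost with the monthly storage saving. The sketch below assumes made-up prices (the three `HYPOTHETICAL_*` constants are placeholders, not current AWS pricing) and ignores minimum billable object sizes, minimum storage durations and per-object metadata overhead that some storage classes apply.

```python
# All prices below are hypothetical placeholders; look up real pricing first.
HYPOTHETICAL_FROM_PRICE = 0.023      # USD per GB-month in the current class
HYPOTHETICAL_TO_PRICE = 0.0125       # USD per GB-month in the target class
HYPOTHETICAL_TRANSITION = 0.01       # USD per 1,000 transition requests


def transition_break_even_months(object_size_bytes,
                                 price_from_gb_month,
                                 price_to_gb_month,
                                 transition_price_per_1000):
    """Months of storage needed before one object's transition pays off."""
    size_gb = object_size_bytes / 1024 ** 3
    monthly_saving = size_gb * (price_from_gb_month - price_to_gb_month)
    if monthly_saving <= 0:
        return float("inf")          # target class is not cheaper at all
    transition_cost = transition_price_per_1000 / 1000   # one request
    return transition_cost / monthly_saving


small_months = transition_break_even_months(
    16 * 1024, HYPOTHETICAL_FROM_PRICE, HYPOTHETICAL_TO_PRICE,
    HYPOTHETICAL_TRANSITION)
large_months = transition_break_even_months(
    100 * 1024 ** 2, HYPOTHETICAL_FROM_PRICE, HYPOTHETICAL_TO_PRICE,
    HYPOTHETICAL_TRANSITION)
print(f"16 KB object: {small_months:.1f} months, "
      f"100 MB object: {large_months:.4f} months")
```

With these placeholder prices the small object takes years to break even while the large one pays off almost immediately, which is the reason to filter lifecycle rules by object size.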

Object lock

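Use to retain particular objects for compliance reasons. Object Lock stores objects in a write-once-read-many (WORM) model, preventing them from being deleted or overwritten for a fixed period or indefinitely.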

Encryption

AWS encrypts all objects uploaded to S3 using SSE-S3 by default. This can be changed in bucket settings.

Other encryption options (SSE = server-side encryption):

  • SSE-C (customer-provided keys)
  • SSE-KMS (KMS keys - customer or AWS managed) #
  • DSSE-KMS (dual-layer SSE using KMS keys)

# SSE-KMS can incur significant KMS request charges when many encrypted objects are uploaded or downloaded. Use a bucket key with SSE-KMS to reduce KMS costs.

Bucket Key

  • S3 requests a short-lived bucket-level key from KMS and uses it to create unique data keys for objects, instead of calling KMS for every object; fewer KMS calls means lower cost
  • With a bucket key, the encryption context is the bucket ARN instead of the object ARN
  • The bucket key is used for a time-limited period within S3
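
As a sketch, default SSE-KMS encryption with a bucket key can be expressed as the configuration document below, in the shape accepted by the PutBucketEncryption API (for example as the `ServerSideEncryptionConfiguration` argument in an SDK). The helper name and the KMS key ARN are hypothetical placeholders.

```python
import json


def sse_kms_with_bucket_key(kms_key_arn):
    """Build a default-encryption config: SSE-KMS with the bucket key on."""
    return {
        "Rules": [
            {
                "ApplyServerSideEncryptionByDefault": {
                    "SSEAlgorithm": "aws:kms",
                    "KMSMasterKeyID": kms_key_arn,
                },
                # This flag is what enables the bucket key.
                "BucketKeyEnabled": True,
            }
        ]
    }


# Placeholder ARN for illustration only.
encryption_config = sse_kms_with_bucket_key(
    "arn:aws:kms:us-east-1:111122223333:key/EXAMPLE-KEY-ID")
print(json.dumps(encryption_config, indent=2))
```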

Encrypt existing objects?

  • S3 Batch Operations - copy the objects to the same bucket so they are rewritten with the new encryption settings.
  • Use the AWS API (SDK/CLI) to copy the objects to the same bucket.
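
A minimal sketch of the SDK route, assuming a boto3-style client is passed in: the `copy_object` keyword arguments follow the boto3 S3 client, while `reencrypt_in_place`, `RecordingClient` and all bucket, key and KMS names are hypothetical. A single `copy_object` call only handles objects up to 5 GB; larger objects need a multipart copy, which is not shown.

```python
def reencrypt_in_place(s3_client, bucket, keys, kms_key_id):
    """Copy each object onto itself so it is rewritten with SSE-KMS."""
    copied = []
    for key in keys:
        s3_client.copy_object(
            Bucket=bucket,
            Key=key,
            CopySource={"Bucket": bucket, "Key": key},  # same bucket and key
            ServerSideEncryption="aws:kms",
            SSEKMSKeyId=kms_key_id,
            BucketKeyEnabled=True,
        )
        copied.append(key)
    return copied


class RecordingClient:
    """Test double that records calls instead of talking to AWS."""

    def __init__(self):
        self.calls = []

    def copy_object(self, **kwargs):
        self.calls.append(kwargs)
        return {}


fake_client = RecordingClient()
done = reencrypt_in_place(
    fake_client, "example-bucket", ["a.txt", "dir/b.txt"], "EXAMPLE-KEY-ID")
```

In real use the first argument would be something like `boto3.client("s3")`; in a versioned bucket each copy creates a new object version.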

Replication

Access Logs

Useful for audit and security purposes. Deliver the logs to a different S3 bucket. A cheaper alternative to switching on CloudTrail data events.

Logs can be queried via Athena.

Endpoints

Use endpoints to keep data transfer within the AWS backbone network and reduce data transfer charges. S3 supports both gateway endpoints and interface endpoints.

Gateway

Think of it as the bare minimum for enabling private access to S3 from a VPC. Traffic does not travel via NAT or internet gateways, which avoids the data transfer costs associated with them.

Interface

This is a more heavyweight solution. Use it to access S3 from on-prem, from peered VPCs in other AWS Regions, or through a transit gateway.

Access Points

Network endpoints attached to a SINGLE bucket. A single bucket can have multiple access points, preferably one for each service/app that needs access to the bucket. This design keeps permissions modular and scoped to a particular application without impacting any other app, unlike bucket policies, which tend to become a monolith managing access for ALL apps together.
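
To make the per-app scoping concrete, here is a sketch of an access point policy that lets one role read and write under one prefix. The helper name, account ID, role, access point name and prefix are all hypothetical; the bucket policy still has to allow or delegate access to the access point, which is not shown.

```python
import json


def access_point_policy(region, account_id, access_point_name,
                        role_arn, prefix):
    """Policy granting one principal object access under a single prefix."""
    ap_arn = (f"arn:aws:s3:{region}:{account_id}"
              f":accesspoint/{access_point_name}")
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"AWS": role_arn},
                "Action": ["s3:GetObject", "s3:PutObject"],
                # Objects behind an access point are addressed as
                # <access-point-arn>/object/<key>.
                "Resource": f"{ap_arn}/object/{prefix}/*",
            }
        ],
    }


# Placeholder identifiers for illustration only.
billing_policy = access_point_policy(
    "eu-west-1", "111122223333", "billing-app-ap",
    "arn:aws:iam::111122223333:role/billing-app", "billing")
print(json.dumps(billing_policy, indent=2))
```

Each app gets its own access point and its own small policy like this one, so changing the billing app's permissions never touches another app's policy.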

Multi-Region Access Points

Uses Global Accelerator under the hood to optimize S3 traffic. Ensures low latency access to data from anywhere in the world irrespective of the home region of the bucket and data.

It can centrally configure replication rules between buckets in different regions, and use the nearest bucket to fulfil a request.

Transfer Acceleration (S3TA)

Speeds up S3 transfers for apps whose users are geographically far from the bucket’s Region. It routes traffic through CloudFront edge locations and the AWS backbone network, which can speed up transfers by 50 to 500%. Only accelerated transfers are billed. Enable it in the S3 bucket properties.
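
Once acceleration is enabled, clients opt in by using the bucket's accelerate endpoint instead of the regional one. The sketch below only builds that URL; the helper name, bucket and key are hypothetical, and a real request still needs to be signed or otherwise authorized.

```python
from urllib.parse import quote


def accelerate_url(bucket, key):
    """Build the Transfer Acceleration URL for an object."""
    if "." in bucket:
        # Acceleration requires DNS-compliant bucket names without periods.
        raise ValueError("bucket names containing '.' cannot be accelerated")
    # quote() percent-encodes unsafe characters but keeps '/' separators.
    return f"https://{bucket}.s3-accelerate.amazonaws.com/{quote(key)}"


example_url = accelerate_url("my-bucket", "videos/clip 1.mp4")
print(example_url)
```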

Performance

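Dependent on keys (path): request rates scale per key prefix. S3 supports at least 3,500 PUT/COPY/POST/DELETE and 5,500 GET/HEAD requests per second per prefix, so spreading objects across more prefixes increases the achievable throughput.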
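
Because S3 request-rate limits apply per key prefix, one common way to raise throughput for a hot dataset is to spread keys over several prefixes. The sketch below derives a stable shard prefix from a hash of the key; `sharded_key`, the shard count and the sample keys are illustrative assumptions, and the trade-off is that listing objects by their original path becomes harder.

```python
import hashlib


def sharded_key(key, shards=16):
    """Prepend a stable, hash-derived shard prefix to an object key."""
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    shard = int(digest, 16) % shards
    return f"{shard:02d}/{key}"


sample_keys = [f"logs/2024/01/01/event-{i}.json" for i in range(200)]
prefixes = {sharded_key(k).split("/", 1)[0] for k in sample_keys}
print(f"{len(sample_keys)} keys spread over {len(prefixes)} prefixes")
```

The same key always maps to the same prefix, so readers can recompute the full key without a lookup table.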

Events

Object operations (create, delete, etc.) can directly trigger event notifications to SNS, SQS, Lambda and EventBridge.

If the target service isn’t supported directly, create a CloudTrail trail that records data events only; this pushes the events to EventBridge. Add a rule with detail-type:AWS API Call via CloudTrail and target the required service.
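
For the CloudTrail route, the EventBridge rule needs an event pattern. A sketch of such a pattern, built as a Python dict and printed as JSON, is below; the helper name, bucket name and list of API calls are placeholders to adapt.

```python
import json


def s3_api_call_pattern(bucket_name, event_names):
    """EventBridge event pattern for S3 API calls recorded by CloudTrail."""
    return {
        "source": ["aws.s3"],
        "detail-type": ["AWS API Call via CloudTrail"],
        "detail": {
            "eventSource": ["s3.amazonaws.com"],
            "eventName": list(event_names),
            "requestParameters": {"bucketName": [bucket_name]},
        },
    }


# Placeholder bucket and operations for illustration only.
pattern = s3_api_call_pattern("example-bucket", ["PutObject", "DeleteObject"])
print(json.dumps(pattern, indent=2))
```

The printed JSON can be pasted into the rule's event pattern; the rule's target is then whichever service needs to react.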

Security

A public bucket can be tracked by IAM Access Analyzer findings.

Enable server access logging and send the logs to a different bucket. Use these logs for security audits, understanding usage patterns, etc. Query them via Athena.

SELECT requestdatetime, remoteip, requester, key
FROM s3_access_logs_db.mybucket_logs
WHERE key = 'images/picture.jpg' AND operation LIKE '%DELETE%';