AWS Simple Storage Service (S3) incubating
Object storage offering 99.999999999% (11 nines) data durability.
Internals
Based on Amazon Dynamo (not the same as DynamoDB), a technology developed internally at Amazon as an incrementally scalable, highly available key-value storage system.
Scope
Regional, although AWS calls it global in places: each bucket lives in a single region, while bucket names share a global namespace.
Can you create a bucket without specifying a region?
Consistency
- read-after-write consistency for PUT and DELETE requests
- Strongly consistent for reads, always returns the most recent data. Note, a read op initiated before a write op is finished can return either old or new data.
- No out of the box object locking for concurrent writes (manage in app code), so be mindful of write conflicts
- last-writer-wins for conflict resolution of concurrent writes
More here
Storage class
- Standard
- Intelligent tiering
- Express One-Zone (High Performance)
- Standard Infrequent Access
- One-Zone Infrequent Access
- Glacier Instant Retrieval
- Glacier Flexible Retrieval
- Glacier Deep Archive
- S3 on Outposts
More https://aws.amazon.com/s3/storage-classes/
Lifecycle rules
Use to transition objects between storage classes, or to delete/expire them. Avoid transitioning large numbers of small objects: transitions are charged per request, so the transition cost can exceed any savings from the cheaper storage class.
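A minimal sketch of a lifecycle rule that skips small objects, assuming boto3 is available. The prefix, the 128 KB size threshold, the day counts and the function names are illustrative assumptions, not recommendations.

```python
def build_lifecycle_rule(prefix, min_size_bytes=128 * 1024,
                         transition_days=30, expire_days=365):
    """Build one lifecycle rule dict that ignores objects below min_size_bytes."""
    return {
        "ID": f"tier-and-expire-{prefix.strip('/') or 'all'}",
        "Status": "Enabled",
        # "And" is needed when a filter combines a prefix with a size condition.
        "Filter": {"And": {"Prefix": prefix,
                           "ObjectSizeGreaterThan": min_size_bytes}},
        "Transitions": [{"Days": transition_days, "StorageClass": "STANDARD_IA"}],
        "Expiration": {"Days": expire_days},
    }


def apply_lifecycle(bucket, rules):
    """Send the rules to S3. This call replaces the bucket's existing lifecycle config."""
    import boto3  # imported lazily so the builder above works without AWS access

    s3 = boto3.client("s3")
    s3.put_bucket_lifecycle_configuration(
        Bucket=bucket, LifecycleConfiguration={"Rules": rules}
    )
```

Usage would look like apply_lifecycle("my-bucket", [build_lifecycle_rule("logs/")]), where "my-bucket" is a placeholder name.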
Object lock
Use to retain particular objects for compliance reasons.
Encryption
AWS encrypts all objects uploaded to S3 using SSE-S3 by default. This can be changed in bucket settings.
Other encryption options (SSE = server side encryption) -
- SSE-C (customer-provided keys)
- SSE-KMS (KMS keys - customer or AWS managed) #
- DSSE-KMS (dual-layer SSE using KMS keys)
# With SSE-KMS there can be cost implications (KMS charges) if too many encrypted files are uploaded/downloaded. Always use a bucket key with SSE-KMS to reduce KMS costs.
Bucket Key
- A bucket-level key that S3 obtains from KMS and uses for a time-limited period within S3
- Creates unique data keys for objects, so S3 does not need to call KMS for every object, reducing KMS requests and therefore cost
- With bucket keys, the encryption context is the bucket ARN instead of the object ARN
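A minimal sketch of turning on SSE-KMS with a bucket key as the bucket default, assuming boto3. The function names and the key ARN passed in are placeholders.

```python
def build_default_encryption(kms_key_arn):
    """Default-encryption config: SSE-KMS with the S3 Bucket Key enabled."""
    return {
        "Rules": [
            {
                "ApplyServerSideEncryptionByDefault": {
                    "SSEAlgorithm": "aws:kms",
                    "KMSMasterKeyID": kms_key_arn,
                },
                # Without this flag S3 calls KMS for each object.
                "BucketKeyEnabled": True,
            }
        ]
    }


def apply_default_encryption(bucket, kms_key_arn):
    """Apply the config to a bucket. Only objects written afterwards are affected."""
    import boto3  # lazy import keeps the builder usable offline

    boto3.client("s3").put_bucket_encryption(
        Bucket=bucket,
        ServerSideEncryptionConfiguration=build_default_encryption(kms_key_arn),
    )
```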
Encrypt existing objects?
- S3 Batch Operations - copy objects to same bucket.
- Use AWS API (SDK/CLI) to copy objects to same bucket.
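The SDK route can be sketched as a copy-in-place loop, assuming boto3. The function names and the key ARN are hypothetical. Note that copy_object only handles objects up to 5 GB, and on a versioned bucket each copy creates a new version.

```python
def build_copy_in_place_args(bucket, key, kms_key_arn):
    """Arguments for copying an object onto itself with new encryption settings."""
    return {
        "Bucket": bucket,
        "Key": key,
        "CopySource": {"Bucket": bucket, "Key": key},
        # Changing the encryption settings is what makes a self-copy valid.
        "ServerSideEncryption": "aws:kms",
        "SSEKMSKeyId": kms_key_arn,
        "BucketKeyEnabled": True,
    }


def reencrypt_prefix(bucket, prefix, kms_key_arn):
    """Re-encrypt every object under a prefix; returns the number of copies made."""
    import boto3  # lazy import keeps the builder usable offline

    s3 = boto3.client("s3")
    paginator = s3.get_paginator("list_objects_v2")
    copied = 0
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        for obj in page.get("Contents", []):
            s3.copy_object(**build_copy_in_place_args(bucket, obj["Key"], kms_key_arn))
            copied += 1
    return copied
```

For very large buckets S3 Batch Operations is likely the better fit, since this loop issues one request per object from a single client.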
Replication
- If source object uses S3 Bucket Keys & destination bucket uses default encryption, replica object maintains its S3 Bucket Key encryption settings in the destination bucket.
- If source object is not encrypted & destination bucket uses S3 Bucket Key with SSE-KMS, replica object is encrypted with an S3 Bucket Key using SSE-KMS. This results in the ETag of the source object being different from the ETag of the replica object.
- Options for data replication - https://aws.amazon.com/blogs/storage/considering-four-different-replication-options-for-data-in-amazon-s3/
- 2-way sync for S3 objects https://aws.amazon.com/about-aws/whats-new/2020/12/amazon-s3-replication-adds-support-two-way-replication/
Access Logs
Useful for audit and security purposes. Deliver the logs to a different S3 bucket. A cheaper alternative to switching on CloudTrail data events.
Logs can be queried via Athena.
Endpoints
Use endpoints to keep data transfer within the AWS backbone network and reduce data transfer charges. S3 supports both gateway endpoints and interface endpoints.
Gateway
Think of it as the bare minimum for enabling private access to S3 from a VPC. Traffic does not travel via NAT or internet gateways, avoiding the data transfer costs associated with them.
Interface
This is a more heavyweight solution. Use it to access S3 from on-prem, from peered VPCs in other AWS Regions, or through a transit gateway.
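A rough sketch of the two endpoint types as EC2 create_vpc_endpoint arguments, assuming boto3. The helper names and all resource IDs are placeholders. A gateway endpoint attaches to route tables, while an interface endpoint is placed in subnets behind security groups.

```python
def build_s3_endpoint_args(vpc_id, region, *, route_table_ids=None,
                           subnet_ids=None, security_group_ids=None):
    """Return create_vpc_endpoint kwargs for a gateway or interface S3 endpoint."""
    base = {"VpcId": vpc_id, "ServiceName": f"com.amazonaws.{region}.s3"}
    if route_table_ids:
        # Gateway: S3 traffic is routed via entries added to these route tables.
        return {**base, "VpcEndpointType": "Gateway",
                "RouteTableIds": list(route_table_ids)}
    if subnet_ids:
        # Interface: network interfaces with private IPs in these subnets.
        return {**base, "VpcEndpointType": "Interface",
                "SubnetIds": list(subnet_ids),
                "SecurityGroupIds": list(security_group_ids or [])}
    raise ValueError("pass route_table_ids (gateway) or subnet_ids (interface)")


def create_s3_endpoint(**kwargs):
    """Create the endpoint in the region named by the caller."""
    import boto3  # lazy import keeps the builder usable offline

    region = kwargs["region"]
    ec2 = boto3.client("ec2", region_name=region)
    return ec2.create_vpc_endpoint(**build_s3_endpoint_args(**kwargs))
```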
Access Points
Network endpoints connected to a SINGLE bucket. A single bucket can have multiple access points, preferably one for each service/app that needs access to the bucket. This design keeps permissions modular and scoped to a particular application without impacting any other app, unlike bucket policies, which tend to become a monolith managing access for ALL apps together.
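A minimal sketch of a per-app access point policy, assuming boto3 for the final call. The account ID, access point name, role ARN and helper names are placeholders. The bucket policy still has to delegate access control to access points for this to take effect.

```python
import json


def access_point_arn(region, account_id, name):
    """ARN of an access point; it can be passed as Bucket= in boto3 S3 calls."""
    return f"arn:aws:s3:{region}:{account_id}:accesspoint/{name}"


def build_access_point_policy(ap_arn, role_arn, prefix=""):
    """Policy letting one app role read/write objects under one prefix only."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"AWS": role_arn},
                "Action": ["s3:GetObject", "s3:PutObject"],
                # Objects behind an access point are addressed as <arn>/object/<key>.
                "Resource": f"{ap_arn}/object/{prefix}*",
            }
        ],
    }


def apply_access_point_policy(account_id, name, policy):
    """Attach the policy to an existing access point."""
    import boto3  # lazy import keeps the builders usable offline

    boto3.client("s3control").put_access_point_policy(
        AccountId=account_id, Name=name, Policy=json.dumps(policy)
    )
```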
Multi-Region Access Points
Uses Global Accelerator under the hood to optimize S3 traffic. Ensures low latency access to data from anywhere in the world irrespective of the home region of the bucket and data.
It can centrally configure replication rules between buckets in different regions, and use the nearest bucket to fulfil a request.
Transfer Acceleration (S3TA)
Speeds up S3 transfers for apps where users are geographically far away from the bucket’s region. It uses CloudFront edge locations and the AWS backbone network to speed up data transfer by as much as 50 to 500%. Only accelerated transfers are billed. Enable in S3 bucket properties.
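A small sketch of enabling acceleration and using the accelerate endpoint, assuming boto3. The helper names are hypothetical. Bucket names containing dots cannot be used with Transfer Acceleration, which the first helper checks.

```python
def accelerate_endpoint(bucket):
    """Return the accelerate endpoint URL for a bucket."""
    if "." in bucket:
        raise ValueError("Transfer Acceleration does not support dots in bucket names")
    return f"https://{bucket}.s3-accelerate.amazonaws.com"


def enable_acceleration(bucket):
    """Enable acceleration on the bucket and return a client that uses it."""
    import boto3  # lazy imports keep accelerate_endpoint usable offline
    from botocore.config import Config

    boto3.client("s3").put_bucket_accelerate_configuration(
        Bucket=bucket, AccelerateConfiguration={"Status": "Enabled"}
    )
    # Requests made through this client go to the accelerate endpoint.
    return boto3.client("s3", config=Config(s3={"use_accelerate_endpoint": True}))
```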
Performance
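Dependent on keys (path): request rates scale per prefix, at least 3,500 PUT/COPY/POST/DELETE and 5,500 GET/HEAD requests per second per partitioned prefix, so spreading keys across prefixes raises aggregate throughput.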
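Since performance depends on the key (path), one common approach is to spread hot keys across several prefixes. A minimal sketch, assuming a hash-based shard prefix; the function name, shard count and key layout are illustrative choices, not something S3 requires.

```python
import hashlib


def sharded_key(key, shards=16):
    """Prepend a stable two-hex-digit shard prefix derived from the key itself."""
    if not 1 <= shards <= 256:
        raise ValueError("shards must be between 1 and 256")
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    shard = int(digest, 16) % shards
    # The same key always maps to the same prefix, so reads can recompute it.
    return f"{shard:02x}/{key}"
```

The trade-off is that listing objects in their original order now needs one listing per shard prefix.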
Events
Object operations can directly trigger event notifications for supported services (e.g. SNS, SQS, Lambda).
If the service isn’t supported directly, create a CloudTrail trail that records only data events; this pushes the events to EventBridge. Add a rule with detail-type: AWS API Call via CloudTrail and target the required service.
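The EventBridge rule for the CloudTrail route might look like the sketch below, assuming boto3. The rule name, bucket name, target ARN and helper names are placeholders, and some targets also need a RoleArn or a resource policy that this sketch leaves out.

```python
import json


def build_s3_api_call_pattern(bucket, event_names=("PutObject",)):
    """Event pattern matching S3 data events that CloudTrail records for one bucket."""
    return {
        "source": ["aws.s3"],
        "detail-type": ["AWS API Call via CloudTrail"],
        "detail": {
            "eventSource": ["s3.amazonaws.com"],
            "eventName": list(event_names),
            "requestParameters": {"bucketName": [bucket]},
        },
    }


def create_rule(rule_name, bucket, target_arn):
    """Create the rule and point it at a single target."""
    import boto3  # lazy import keeps the pattern builder usable offline

    events = boto3.client("events")
    events.put_rule(
        Name=rule_name,
        EventPattern=json.dumps(build_s3_api_call_pattern(bucket)),
        State="ENABLED",
    )
    events.put_targets(Rule=rule_name, Targets=[{"Id": "1", "Arn": target_arn}])
```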
Security
A public bucket can be tracked by IAM Access Analyzer findings.
Enable server access logging and send the logs to a different bucket. Use these logs for security audits, understanding usage patterns, etc. Query via Athena, for example:
SELECT requestdatetime, remoteip, requester, key
FROM s3_access_logs_db.mybucket_logs
WHERE key = 'images/picture.jpg' AND operation like '%DELETE%';