S3 Object Overview

Authors
  • Aaron Lee

Amazon S3 (Simple Storage Service) is a widely used object storage service for storing and retrieving any amount of data over the web. It is designed for durability, availability, and scalability, which makes it a good fit for backup and restore, archiving, big data analytics, and content distribution. S3 stores data as objects within buckets. Each object is described by the components below. The last three items (lifecycle policies, replication, and event notifications) are configured on the bucket and apply to the objects in it:

  • Object Key: A unique identifier for the object within a bucket. It is used to retrieve the object.
  • Object Value: The actual data being stored, which can be any type of file, such as images, videos, documents, or backups.
  • Metadata: Key-value pairs that provide additional information about the object, such as content type, size, and custom metadata defined by the user.
  • Version ID: A unique identifier for the object version, which is used when versioning is enabled for the bucket. This allows users to keep multiple versions of an object and retrieve or restore previous versions as needed.
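  • Access Control List (ACL): A set of grants that define which AWS accounts or predefined groups can access the object and what they can do (e.g., read the object, or read and write its ACL). ACLs are disabled by default on newly created buckets, and access is usually managed with bucket policies and IAM policies instead.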
  • Storage Class: The storage tier for the object, which affects cost, retrieval latency, and availability. S3 offers several storage classes, including Standard, Intelligent-Tiering, Standard-IA (Infrequent Access), One Zone-IA, Glacier Instant Retrieval, Glacier Flexible Retrieval (formerly Glacier), and Glacier Deep Archive.
  • Encryption: S3 supports server-side encryption (SSE) and client-side encryption to protect data at rest. With server-side encryption, objects can be encrypted with Amazon S3 managed keys (SSE-S3), keys stored in AWS Key Management Service (SSE-KMS), or customer-provided keys (SSE-C).
  • Lifecycle Policies: Users can define lifecycle policies to automatically transition objects between storage classes or delete them after a specified period. This helps manage costs and optimize storage usage.
  • Replication: S3 supports cross-region replication (CRR) and same-region replication (SRR) to automatically replicate objects across different AWS regions or within the same region. This enhances data durability and availability.
  • Event Notifications: S3 can trigger notifications for specific events, such as object creation, deletion, or restoration. This allows users to integrate S3 with other AWS services or custom applications for real-time processing.
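To make the key, value, metadata, version ID, and storage class fields concrete, here is a minimal in-memory sketch of a versioned bucket. It is a toy model, not an S3 client: `ObjectVersion` and `VersionedBucket` are hypothetical names invented for illustration, and the random hex version IDs only stand in for whatever identifiers a real service generates.

```python
import uuid
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class ObjectVersion:
    """One stored version of an object: its value plus descriptive fields."""
    key: str
    value: bytes
    metadata: Dict[str, str]
    storage_class: str
    version_id: str


class VersionedBucket:
    """Toy bucket with versioning enabled (hypothetical, for illustration only)."""

    def __init__(self, name: str) -> None:
        self.name = name
        # Maps each key to its versions, oldest first.
        self._versions: Dict[str, List[ObjectVersion]] = {}

    def put(
        self,
        key: str,
        value: bytes,
        metadata: Optional[Dict[str, str]] = None,
        storage_class: str = "STANDARD",
    ) -> ObjectVersion:
        """Store a new version under `key` without overwriting older ones."""
        version = ObjectVersion(
            key=key,
            value=value,
            metadata=dict(metadata or {}),
            storage_class=storage_class,
            version_id=uuid.uuid4().hex,
        )
        self._versions.setdefault(key, []).append(version)
        return version

    def get(self, key: str, version_id: Optional[str] = None) -> ObjectVersion:
        """Return the latest version, or a specific one if `version_id` is given."""
        versions = self._versions[key]  # raises KeyError for an unknown key
        if version_id is None:
            return versions[-1]
        for version in versions:
            if version.version_id == version_id:
                return version
        raise KeyError(f"{key!r} has no version {version_id!r}")


bucket = VersionedBucket("example-bucket")
first = bucket.put("reports/q1.txt", b"draft", {"content-type": "text/plain"})
bucket.put("reports/q1.txt", b"final", {"content-type": "text/plain"})
print(bucket.get("reports/q1.txt").value)                    # latest version
print(bucket.get("reports/q1.txt", first.version_id).value)  # earlier version
```

Writing to the same key twice keeps both versions, so a plain `get` returns the newest one while an explicit version ID retrieves the earlier data.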

At DigitalOcean, we provide an S3-compatible object storage service called Spaces. Spaces is designed to be easy to use and integrates seamlessly with our other products, such as App Platform and Kubernetes. It offers a simple API, a user-friendly web interface, and built-in CDN support for fast content delivery. Spaces is ideal for developers and businesses looking for a cost-effective and scalable solution for storing and serving large amounts of data, such as images, videos, backups, and static website content. Spaces also supports features like versioning, access control, and lifecycle management, making it a powerful tool for managing your data in the cloud.
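Because Spaces is S3-compatible, objects are addressed by bucket and key in the same way as in S3. The sketch below builds object URLs under the assumption that a Space is reachable at `<bucket>.<region>.digitaloceanspaces.com`, with a `cdn.` label added for the CDN endpoint. The helper name, the Space name, and the region are hypothetical, and the hostname pattern should be checked against the Spaces documentation before relying on it.

```python
from urllib.parse import quote


def object_url(bucket: str, region: str, key: str, cdn: bool = False) -> str:
    """Build the HTTPS URL for an object in a Space (assumed hostname pattern)."""
    suffix = "cdn.digitaloceanspaces.com" if cdn else "digitaloceanspaces.com"
    # quote() leaves "/" unescaped by default, so key prefixes stay path-like.
    return f"https://{bucket}.{region}.{suffix}/{quote(key)}"


# "example-space" and "nyc3" are placeholder values.
print(object_url("example-space", "nyc3", "images/my photo.jpg"))
print(object_url("example-space", "nyc3", "images/my photo.jpg", cdn=True))
```

The key is percent-encoded because object keys may contain spaces and other characters that are not valid in a URL path.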

Service Architecture

Our service architecture is a distributed system of services designed for availability, durability, and scalability. To keep bucket names unique, we run a centralized metadata service that tracks all bucket names and their associated metadata. This service enforces bucket name uniqueness and provides a consistent view of the bucket namespace across all regions and availability zones. At a high level, load balancers at each datacenter spread requests so that no single server is overloaded, and multiple backend nodes handle the traffic so that we can scale horizontally and accommodate varying workloads. I will go into a further scalability issue and how we are attempting to solve it.
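The core job of the centralized metadata service, enforcing bucket name uniqueness, can be sketched as an atomic create-if-absent operation. This is a single-process illustration under stated assumptions, not our actual implementation: `BucketRegistry` and `BucketNameTaken` are hypothetical names, and a lock around an in-memory dictionary stands in for whatever consistent store a real service would use.

```python
import threading
from typing import Dict, Optional


class BucketNameTaken(Exception):
    """Raised when a bucket name is already registered."""


class BucketRegistry:
    """Toy stand-in for a metadata service that owns the bucket namespace."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._buckets: Dict[str, Dict[str, str]] = {}

    def create(self, name: str, owner: str, region: str) -> None:
        """Register `name`, or raise BucketNameTaken if it already exists."""
        # The check and the insert happen under one lock, so two concurrent
        # requests for the same name cannot both succeed.
        with self._lock:
            if name in self._buckets:
                raise BucketNameTaken(name)
            self._buckets[name] = {"owner": owner, "region": region}

    def lookup(self, name: str) -> Optional[Dict[str, str]]:
        """Return a copy of the bucket's metadata, or None if it is unknown."""
        with self._lock:
            record = self._buckets.get(name)
            return dict(record) if record is not None else None


registry = BucketRegistry()
registry.create("assets", owner="team-a", region="nyc3")
try:
    registry.create("assets", owner="team-b", region="sfo3")
except BucketNameTaken:
    print("name already in use")
```

Keeping the existence check and the insert in one critical section is the important part. If they were separate steps, two requests arriving at different datacenters could both see the name as free.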

Ceph

Ceph is an open-source distributed storage system that provides object, block, and file storage in a unified platform. It is designed to be highly scalable, fault-tolerant, and self-healing, which makes it well suited to large-scale storage deployments. We primarily use it to power our volumes and object storage products.