Data Storage & Encryption

ContentGrid separates structured metadata from binary content and stores each in the system best suited to it. Metadata lives in PostgreSQL, while content (documents, images, videos) is stored in S3-compatible object storage.

Storage Architecture

PostgreSQL for Metadata

Each ContentGrid application has its own PostgreSQL database. The schema is generated automatically from the application model:

  • Entities → Tables: Each entity in your model becomes a table
  • Attributes → Columns: Entity attributes map to columns with appropriate types
  • One-to-x Relations → Foreign Keys: Relations between entities use foreign key constraints
  • Many-to-many Relations → Join Tables: Many-to-many relations between entities use join tables
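
The mapping rules above can be sketched in a few lines of Python. This is only an illustration: the `Entity` type, the `id uuid` primary key, and the join-table naming are assumptions made for the example, not ContentGrid's actual model format or the SQL that Scribe generates.

```python
from dataclasses import dataclass, field


# Hypothetical, simplified model type; ContentGrid's real model format is not shown here.
@dataclass
class Entity:
    name: str
    attributes: dict[str, str]                            # attribute name -> SQL column type
    to_one: dict[str, str] = field(default_factory=dict)  # relation name -> target entity


def table_ddl(entity: Entity) -> str:
    """Entity -> table, attributes -> columns, to-one relations -> foreign keys."""
    columns = ["id uuid PRIMARY KEY"]  # assumed primary key convention
    columns += [f"{name} {sql_type}" for name, sql_type in entity.attributes.items()]
    columns += [f"{rel} uuid REFERENCES {target} (id)" for rel, target in entity.to_one.items()]
    return f"CREATE TABLE {entity.name} (\n  " + ",\n  ".join(columns) + "\n);"


def join_table_ddl(left: str, right: str) -> str:
    """Many-to-many relation -> join table with one foreign key per side."""
    return (
        f"CREATE TABLE {left}__{right} (\n"
        f"  {left}_id uuid REFERENCES {left} (id),\n"
        f"  {right}_id uuid REFERENCES {right} (id),\n"
        f"  PRIMARY KEY ({left}_id, {right}_id)\n"
        f");"
    )


invoice = Entity("invoice", {"number": "text", "amount": "numeric"}, {"supplier": "supplier"})
invoice_sql = table_ddl(invoice)
tags_sql = join_table_ddl("invoice", "tag")
print(invoice_sql)
print(tags_sql)
```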

This direct mapping makes PostgreSQL’s full feature set available:

  • Indexes: Standard B-tree and other indexes for efficient queries
  • Constraints: Check constraints, unique constraints, and foreign keys enforce data integrity
  • Transactions: Full ACID guarantees for all operations

Migration Management: Flyway manages schema migrations. When the model changes, Scribe generates SQL migration scripts that execute automatically on deployment.

S3-Compatible Storage for Content

Binary content is stored in S3-compatible object storage (AWS S3, MinIO, Ceph, etc.). Each application has dedicated buckets. This ensures:

  • Isolation: Applications cannot access each other’s buckets
  • Scalability: Object storage scales independently of compute and database

Content References: The database stores only references (unique identifiers) to content, not the content itself. When the application needs content, it retrieves it from S3 using the reference.

Immutability: Content objects are never overwritten. Updating content creates a new object with a new reference. The old content remains until explicitly deleted, enabling:

  • Safe Backups: Back up S3 buckets without worrying about in-flight modifications
  • Recoverability: Old content versions can be retained for recovery or audit
  • Atomic Updates: Database transactions can commit content reference changes without coordinating with S3
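
A minimal sketch of the reference-plus-immutability pattern, assuming in-memory dictionaries as stand-ins for the S3 bucket and the database column that holds the content reference. The function names and the use of UUIDs as references are illustrative assumptions.

```python
import uuid
from typing import Optional

object_store: dict[str, bytes] = {}  # stand-in for an S3 bucket: object key -> bytes
metadata: dict[str, str] = {}        # stand-in for the database: entity id -> content reference


def put_content(data: bytes) -> str:
    """Write data under a fresh key, so an existing object is never overwritten."""
    reference = str(uuid.uuid4())
    object_store[reference] = data
    return reference


def update_content(entity_id: str, data: bytes) -> Optional[str]:
    """Store a new object, then switch the reference. Returns the previous reference, if any."""
    new_reference = put_content(data)
    old_reference = metadata.get(entity_id)
    metadata[entity_id] = new_reference  # in a real database this change commits in one transaction
    return old_reference


update_content("doc-1", b"version 1")
previous = update_content("doc-1", b"version 2")
current = object_store[metadata["doc-1"]]
```

After the second call, the old object is still present under `previous`, which is what makes retention and recovery possible until it is explicitly deleted.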

Content Encryption

ContentGrid provides transparent encryption at rest for content stored in S3. Encryption and decryption happen automatically—applications and users don’t need to manage keys or modify their workflows.

Content is encrypted with AES-128 in CTR mode.

Encryption Goals

The encryption architecture is designed to meet several requirements:

  1. Strong Security: Content encrypted using standard cryptographic primitives
  2. Key Protection: Encryption keys managed securely with database access controls
  3. Enable After Deployment: Encryption can be enabled for applications with existing unencrypted content
  4. Key Rotation: Encryption keys can be rotated on an individual basis without re-encrypting all content
  5. Application Isolation: Each application uses different encryption keys
  6. Range Request Support: Clients can request parts of files (HTTP Range) without decrypting the entire file

Data Encryption Keys

ContentGrid encrypts content at rest using Data Encryption Keys (DEKs). Each content object gets its own unique symmetric key, ensuring strong isolation between content objects.

flowchart LR
    DEK[Data Encryption Key<br/>Per Content Object]
    Content[Content<br/>Binary Data]
    DEK -->|Encrypts| Content

How It Works:

  • Unique Keys: Each content object has its own 128-bit AES symmetric key (DEK)
  • Strong Isolation: Compromising one DEK does not affect other content objects
  • Local Encryption: Content encryption and decryption happen in the application using the DEK (no external service calls)
  • Database Storage: DEKs are stored in the database alongside content metadata

Note: Future enhancements will add Key Encryption Keys (KEKs) stored in Hardware Security Modules (HSMs) or cloud Key Management Services (KMS) to provide an additional layer of protection for DEKs. See the Future Enhancements section below.

Encryption Process

When content is uploaded:

flowchart TD
    Content[Content Upload]
    GenDEK[Generate DEK]
    EncryptContent[Encrypt Content with DEK]
    StoreContent[Store Encrypted Content in S3]
    StoreKey[Store DEK in Database]
    Content --> GenDEK
    GenDEK --> EncryptContent
    EncryptContent --> StoreContent
    GenDEK --> StoreKey

  1. Generate DEK: Application generates a random symmetric key (AES-128)
  2. Encrypt Content: Application encrypts content using the DEK
  3. Store Content: Encrypted content is stored in S3
  4. Store DEK: DEK is stored in the database alongside content metadata

Decryption Process

When content is downloaded:

flowchart TD
    FetchKey[Fetch DEK from Database]
    FetchContent[Fetch Encrypted Content from S3]
    DecryptContent[Decrypt Content using DEK]
    Return[Return Content to Client]
    FetchKey --> DecryptContent
    FetchContent --> DecryptContent
    DecryptContent --> Return

  1. Fetch DEK: Application retrieves DEK from database
  2. Fetch Encrypted Content: Application retrieves encrypted content from S3
  3. Decrypt Content: Application decrypts content locally using the DEK
  4. Return to Client: Decrypted content is sent to the client
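
The upload and download steps can be sketched as a round trip. To keep the example runnable on the Python standard library alone, truncated SHA-256 stands in for the AES block function inside a counter-mode construction; it is not AES and not the real implementation. The dictionaries, function names, and the choice to start the counter at zero are assumptions for illustration.

```python
import hashlib
import secrets
import uuid

BLOCK = 16  # AES block size in bytes


def ctr_xor(key: bytes, data: bytes) -> bytes:
    """Counter-mode XOR. Truncated SHA-256 stands in for the AES block function."""
    out = bytearray()
    for i in range(0, len(data), BLOCK):
        counter = (i // BLOCK).to_bytes(8, "big")
        keystream = hashlib.sha256(key + counter).digest()[:BLOCK]
        out += bytes(a ^ b for a, b in zip(data[i:i + BLOCK], keystream))
    return bytes(out)


s3: dict[str, bytes] = {}         # stand-in for the object store
dek_table: dict[str, bytes] = {}  # stand-in for the DEK table in the database


def upload(content: bytes) -> str:
    dek = secrets.token_bytes(16)       # 1. generate a random 128-bit DEK
    ciphertext = ctr_xor(dek, content)  # 2. encrypt the content with the DEK
    reference = str(uuid.uuid4())
    s3[reference] = ciphertext          # 3. store the encrypted content
    dek_table[reference] = dek          # 4. store the DEK alongside the metadata
    return reference


def download(reference: str) -> bytes:
    dek = dek_table[reference]          # 1. fetch the DEK from the database
    ciphertext = s3[reference]          # 2. fetch the encrypted content
    return ctr_xor(dek, ciphertext)     # 3. decrypt locally, 4. return to the client


document = b"contract text " * 10
ref = upload(document)
restored = download(ref)
```

In counter mode, encryption and decryption are the same XOR operation, which is why one helper serves both directions.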

Key Storage and Management

Data Encryption Keys (DEKs):

  • Stored in the database in a dedicated table
  • Each DEK is a 128-bit AES symmetric key
  • Each content object has its own unique DEK
  • Access controlled through database permissions and connection authentication
  • DEKs are associated with their corresponding content references
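
As a rough illustration of a dedicated DEK table, the sketch below uses SQLite from the Python standard library as a stand-in for PostgreSQL. The table name, column names, and the `algorithm` column are hypothetical; the actual schema is not described in this document.

```python
import secrets
import sqlite3

db = sqlite3.connect(":memory:")  # SQLite stands in for PostgreSQL in this sketch
db.execute(
    """
    CREATE TABLE content_dek (         -- hypothetical table and column names
        content_ref TEXT PRIMARY KEY,  -- reference to the encrypted object in S3
        algorithm   TEXT NOT NULL,     -- recorded so that algorithms can be upgraded later
        dek         BLOB NOT NULL      -- the 128-bit key, 16 bytes
    )
    """
)


def store_dek(content_ref: str, dek: bytes) -> None:
    """Associate one DEK with one content reference."""
    if len(dek) != 16:
        raise ValueError("a 128-bit DEK must be exactly 16 bytes")
    db.execute("INSERT INTO content_dek VALUES (?, ?, ?)", (content_ref, "AES128-CTR", dek))


def load_dek(content_ref: str) -> bytes:
    """Look up the DEK for a content reference."""
    row = db.execute(
        "SELECT dek FROM content_dek WHERE content_ref = ?", (content_ref,)
    ).fetchone()
    if row is None:
        raise KeyError(content_ref)
    return row[0]


key = secrets.token_bytes(16)
store_dek("obj-1", key)
loaded = load_dek("obj-1")
```

Making the content reference the primary key enforces the one-DEK-per-object rule at the database level.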

Key Rotation

To rotate a DEK (for example, when upgrading the encryption algorithm or after a DEK has been compromised):

  1. Fetch and decrypt the old content using the old DEK
  2. Generate a new DEK
  3. Encrypt the content with the new DEK
  4. Store the new encrypted content in S3
  5. Update the database with the new DEK and content reference

DEK rotation is performed on a per-object basis and is typically only needed when upgrading cryptographic algorithms or responding to a security incident.
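
The rotation steps might look like the following sketch. As an assumption made so the example runs on the standard library, truncated SHA-256 stands in for the AES block function in a counter-mode construction, and dictionaries stand in for S3 and the database row. All names are hypothetical.

```python
import hashlib
import secrets
import uuid

BLOCK = 16  # AES block size in bytes


def ctr_xor(key: bytes, data: bytes) -> bytes:
    """Counter-mode XOR. Truncated SHA-256 stands in for the AES block function."""
    out = bytearray()
    for i in range(0, len(data), BLOCK):
        keystream = hashlib.sha256(key + (i // BLOCK).to_bytes(8, "big")).digest()[:BLOCK]
        out += bytes(a ^ b for a, b in zip(data[i:i + BLOCK], keystream))
    return bytes(out)


s3: dict[str, bytes] = {}                       # object key -> ciphertext
content_row: dict[str, tuple[str, bytes]] = {}  # entity id -> (content reference, DEK)


def rotate_dek(entity_id: str) -> str:
    """Re-encrypt one object under a fresh DEK. Returns the retired content reference."""
    old_ref, old_dek = content_row[entity_id]
    plaintext = ctr_xor(old_dek, s3[old_ref])    # 1. fetch and decrypt with the old DEK
    new_dek = secrets.token_bytes(16)            # 2. generate a new DEK
    new_ref = str(uuid.uuid4())
    s3[new_ref] = ctr_xor(new_dek, plaintext)    # 3-4. encrypt and store as a new object
    content_row[entity_id] = (new_ref, new_dek)  # 5. update DEK and reference together
    return old_ref                               # the old object remains until explicitly deleted


dek = secrets.token_bytes(16)
s3["obj-old"] = ctr_xor(dek, b"quarterly report")
content_row["doc-1"] = ("obj-old", dek)
retired = rotate_dek("doc-1")
new_ref, new_dek = content_row["doc-1"]
```

Because content objects are immutable, rotation writes a new object rather than overwriting the old one, so a reader holding the old reference and old DEK is not disrupted mid-rotation.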

Range Request Support

HTTP Range requests allow clients to request specific byte ranges of a file (e.g., “bytes 1000-2000”). This is essential for:

  • Video Seeking: Jump to a timestamp without downloading the entire video
  • Large PDFs: Load only visible pages
  • Parallel Downloads: Split large files across multiple connections

Encryption Challenge: Not all encryption modes support decrypting arbitrary byte ranges—some require decrypting from the beginning.

Solution: ContentGrid uses block cipher modes that support random access (e.g., AES-CTR). The encryption implementation:

  1. Calculates which encrypted blocks contain the requested byte range and adjusts the counter to the first of those blocks
  2. Fetches only the requested bytes from S3 (using S3 range requests)
  3. Pads the downloaded data to align to the correct block size
  4. Decrypts the blocks
  5. Trims to the exact requested range
  6. Returns to the client

No additional data needs to be fetched, and the file is never decrypted in full. Only a small amount of extra decryption is performed to align with block boundaries.
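
The range-decryption steps above can be sketched as follows. To stay within the Python standard library, truncated SHA-256 stands in for the AES block function; the counter arithmetic is the part that carries over to AES-CTR. The function names and the slice that imitates an S3 range request are assumptions for illustration.

```python
import hashlib

BLOCK = 16  # AES block size in bytes


def ctr_xor(key: bytes, data: bytes, first_block: int = 0) -> bytes:
    """Counter-mode XOR starting at block `first_block`. SHA-256 stands in for AES."""
    out = bytearray()
    for i in range(0, len(data), BLOCK):
        counter = (first_block + i // BLOCK).to_bytes(8, "big")
        keystream = hashlib.sha256(key + counter).digest()[:BLOCK]
        out += bytes(a ^ b for a, b in zip(data[i:i + BLOCK], keystream))
    return bytes(out)


def read_range(key: bytes, stored: bytes, start: int, end: int) -> bytes:
    """Decrypt bytes start..end (inclusive, as in an HTTP Range header)."""
    first_block = start // BLOCK               # 1. block containing the first requested byte
    lead = start % BLOCK                       #    offset of that byte inside its block
    fetched = stored[start:end + 1]            # 2. ranged fetch: exactly the requested bytes
    padded = bytes(lead) + fetched             # 3. pad the front so the data is block-aligned
    plain = ctr_xor(key, padded, first_block)  # 4. decrypt with the adjusted counter
    return plain[lead:]                        # 5. trim the padding before returning


key = bytes(range(16))
plaintext = bytes(i % 251 for i in range(5000))
ciphertext = ctr_xor(key, plaintext)
part = read_range(key, ciphertext, 1000, 2000)
```

The padding is `start % BLOCK` bytes, so it never exceeds 15 bytes regardless of file size or range length.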

Security Considerations

Data Confidentiality:

  • Content is encrypted using strong symmetric algorithms (AES-128)
  • Each content object has a unique encryption key (DEK)
  • DEKs are stored in the database with access controlled through database permissions
  • Encrypted content in S3 is protected from unauthorized access at the storage layer
  • Database connections are authenticated and encrypted

Data Integrity:

  • Immutability prevents accidental overwrites

Access Control:

  • Database access controls restrict which services and users can access DEKs
  • Application servers authenticate to the database using service credentials

Defense in Depth:

  • Content encrypted at rest (this architecture)
  • Data encrypted in transit (TLS)
  • Access control enforced at query level (ABAC)
  • Database connections authenticated and encrypted
  • Database encryption at rest can provide an additional protection layer

Performance Impact

Encryption Overhead:

Modern CPUs have AES hardware acceleration (AES-NI), making symmetric encryption very fast. The overhead for encrypting/decrypting content is minimal—typically less than 100 MB/s of throughput impact.

Range Requests:

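Range requests with encryption decrypt slightly more data than was requested, to align with block boundaries, but the overhead is small. AES uses 16-byte blocks, so with the padding approach described under Range Request Support, a range needs at most 15 bytes of alignment padding at either end. A 1 KB range request therefore decrypts only marginally more than 1 KB, which is negligible compared to fetching and decrypting the entire file.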

Future Enhancements

Envelope Encryption with Key Encryption Keys

ContentGrid’s encryption architecture is designed to support envelope encryption (also called two-level encryption), a standard technique used by AWS KMS, Google Cloud KMS, HashiCorp Vault, and other enterprise systems.

In envelope encryption, a Key Encryption Key (KEK) stored in a Hardware Security Module (HSM) or cloud Key Management Service (KMS) is used to encrypt the DEKs before storing them in the database. This provides additional security benefits:

  • Enhanced Key Protection: DEKs are encrypted before storage, with KEKs never leaving the HSM/KMS
  • Audit Logging: All key operations logged in the KMS for compliance and security monitoring
  • Efficient Key Rotation: Rotating the KEK only requires re-encrypting small DEKs, not the entire content

The implementation will be backward compatible with existing encrypted content. When enabled, new content will use envelope encryption, and existing DEKs can be migrated in the background without service interruption.

Summary

ContentGrid’s storage architecture separates structured metadata (PostgreSQL) from binary content (S3), optimizing each for its purpose:

  • PostgreSQL: Provides ACID transactions, relational integrity, and efficient querying for metadata
  • S3: Provides scalable, durable storage for large binary content
  • Content Encryption: Protects content at rest using unique Data Encryption Keys (DEKs) for each object
  • Range Request Support: Enables efficient access to large files without sacrificing encryption

Encryption is transparent to applications and users—no code changes or workflow modifications required. The architecture balances strong security with operational simplicity and performance.