Data Storage & Encryption

ContentGrid separates structured metadata from binary content, storing each optimally. Metadata lives in PostgreSQL, while content (documents, images, videos) is stored in S3-compatible object storage.

Storage Architecture

PostgreSQL for Metadata

Each ContentGrid application has its own PostgreSQL database. The schema is generated automatically from the application model:

Entities → Tables: Each entity in your model becomes a table
Attributes → Columns: Entity attributes map to columns with appropriate types
One-to-x Relations → Foreign Keys: Relations between entities use foreign key constraints
Many-to-many Relations → Join Tables: Many-to-many relations between entities use join tables

This direct mapping makes PostgreSQL’s full capabilities available:

Indexes: Standard B-tree and other indexes for efficient queries
Constraints: Check constraints, unique constraints, and foreign keys enforce data integrity
Transactions: Full ACID guarantees for all operations
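
The model-to-schema mapping described above can be sketched roughly as follows. This is only an illustration: the model format, the `TYPE_MAP` type names, the `uuid` primary key called `id`, and the `left__right` join-table naming are all assumptions, not ContentGrid's actual conventions.

```python
# Hypothetical model types mapped to PostgreSQL column types (assumed, not ContentGrid's real list).
TYPE_MAP = {"text": "text", "long": "bigint", "boolean": "boolean", "datetime": "timestamptz"}


def create_table_sql(entity: str, attributes: dict[str, str], to_one: dict[str, str]) -> str:
    """Build a CREATE TABLE statement for one entity.

    attributes maps attribute name -> model type.
    to_one maps relation name -> target entity (becomes a foreign key column).
    """
    columns = ["id uuid PRIMARY KEY"]  # assumed primary key convention
    columns += [f"{name} {TYPE_MAP[kind]}" for name, kind in attributes.items()]
    columns += [f"{rel} uuid REFERENCES {target} (id)" for rel, target in to_one.items()]
    return f"CREATE TABLE {entity} (\n  " + ",\n  ".join(columns) + "\n);"


def join_table_sql(left: str, right: str) -> str:
    """Build the join table for a many-to-many relation between two entities."""
    return (
        f"CREATE TABLE {left}__{right} (\n"
        f"  {left}_id uuid NOT NULL REFERENCES {left} (id),\n"
        f"  {right}_id uuid NOT NULL REFERENCES {right} (id),\n"
        f"  PRIMARY KEY ({left}_id, {right}_id)\n"
        ");"
    )
```

For example, `create_table_sql("invoice", {"number": "text"}, {"customer": "customer"})` yields a table with a `number text` column and a `customer` foreign key column.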

Migration Management: Flyway manages schema migrations. When the model changes, Scribe generates SQL migration scripts that execute automatically on deployment.
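
To make the migration step concrete, here is a small sketch of how a model change could turn into a migration script. It is not how Scribe actually works: the function names are hypothetical and only added columns are handled. The file naming follows Flyway's versioned migration convention (`V<version>__<description>.sql`).

```python
def migration_for_new_attributes(table: str, old: dict[str, str], new: dict[str, str]) -> str:
    """Return ALTER TABLE statements for columns present in `new` but not in `old`.

    Both arguments map column name -> PostgreSQL type. Only additions are handled;
    renames, drops, and type changes are out of scope for this sketch.
    """
    added = [(name, kind) for name, kind in new.items() if name not in old]
    return "\n".join(f"ALTER TABLE {table} ADD COLUMN {name} {kind};" for name, kind in added)


def flyway_filename(version: int, description: str) -> str:
    """Name a versioned migration the way Flyway expects: V<version>__<description>.sql."""
    return f"V{version}__{description.replace(' ', '_')}.sql"
```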

S3-Compatible Storage for Content

Binary content is stored in S3-compatible object storage (AWS S3, MinIO, Ceph, etc.). Each application has dedicated buckets. This ensures:

Isolation: Applications cannot access each other’s buckets
Scalability: Object storage scales independently of compute and database

Content References: The database stores only references (unique identifiers) to content, not the content itself. When the application needs content, it retrieves it from S3 using the reference.

Immutability: Content objects are never overwritten. Updating content creates a new object with a new reference. The old content remains until explicitly deleted, enabling:

Safe Backups: Backup S3 buckets without worrying about in-flight modifications
Recoverability: Old content versions can be retained for recovery or audit
Atomic Updates: Database transactions can commit content reference changes without coordinating with S3
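
The reference and immutability rules above can be illustrated with a small in-memory sketch. `ImmutableContentStore` is a hypothetical stand-in for an S3 bucket, and a plain dict stands in for the metadata row; neither reflects ContentGrid's real classes.

```python
import uuid


class ImmutableContentStore:
    """In-memory stand-in for an S3 bucket where objects are never overwritten."""

    def __init__(self) -> None:
        self._objects: dict[str, bytes] = {}

    def put(self, data: bytes) -> str:
        """Store data under a fresh reference and return that reference."""
        reference = str(uuid.uuid4())
        self._objects[reference] = data
        return reference

    def get(self, reference: str) -> bytes:
        return self._objects[reference]

    def delete(self, reference: str) -> None:
        """Explicit deletion is the only way an object disappears."""
        del self._objects[reference]


store = ImmutableContentStore()

# The metadata row holds only the reference, never the bytes themselves.
document_row = {"title": "Q3 report", "content_ref": store.put(b"draft v1")}

# Updating content: write a new object, then switch the reference in the row.
old_ref = document_row["content_ref"]
document_row["content_ref"] = store.put(b"final v2")
```

After the update, `old_ref` still resolves to the original bytes, which is what makes backups and recovery straightforward.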

Content Encryption

ContentGrid provides transparent encryption at rest for content stored in S3. Encryption and decryption happen automatically—applications and users don’t need to manage keys or modify their workflows.

Content is encrypted with AES-128 in CTR mode.

Encryption Goals

The encryption architecture is designed to meet several requirements:

Strong Security: Content encrypted using standard cryptographic primitives
Key Protection: Encryption keys managed securely with database access controls
Enable After Deployment: Encryption can be enabled for applications with existing unencrypted content
Key Rotation: Encryption keys can be rotated on an individual basis without re-encrypting all content
Application Isolation: Each application uses different encryption keys
Range Request Support: Clients can request parts of files (HTTP Range) without decrypting the entire file

Data Encryption Keys

ContentGrid encrypts content at rest using Data Encryption Keys (DEKs). Each content object gets its own unique symmetric key, ensuring strong isolation between content objects.

flowchart LR
    DEK[Data Encryption Key<br/>Per Content Object]
    Content[Content<br/>Binary Data]
    DEK -->|Encrypts| Content

How It Works:

Unique Keys: Each content object has its own 128-bit AES symmetric key (DEK)
Strong Isolation: Compromising one DEK does not affect other content objects
Local Encryption: Content encryption and decryption happen in the application using the DEK (no external service calls)
Database Storage: DEKs are stored in the database alongside content metadata

Note: Future enhancements will add Key Encryption Keys (KEKs) stored in Hardware Security Modules (HSMs) or cloud Key Management Services (KMS) to provide an additional layer of protection for DEKs. See the Future Enhancements section below.

Encryption Process

When content is uploaded:

flowchart TD
    Content[Content Upload]
    GenDEK[Generate DEK]
    EncryptContent[Encrypt Content with DEK]
    StoreContent[Store Encrypted Content in S3]
    StoreKey[Store DEK in Database]
    Content --> GenDEK
    GenDEK --> EncryptContent
    EncryptContent --> StoreContent
    GenDEK --> StoreKey

Generate DEK: Application generates a random symmetric key (AES-128)
Encrypt Content: Application encrypts content using the DEK
Store Content: Encrypted content is stored in S3
Store DEK: DEK is stored in the database alongside content metadata

Decryption Process

When content is downloaded:

flowchart TD
    FetchKey[Fetch DEK from Database]
    FetchContent[Fetch Encrypted Content from S3]
    DecryptContent[Decrypt Content using DEK]
    Return[Return Content to Client]
    FetchKey --> DecryptContent
    FetchContent --> DecryptContent
    DecryptContent --> Return

Fetch DEK: Application retrieves DEK from database
Fetch Encrypted Content: Application retrieves encrypted content from S3
Decrypt Content: Application decrypts content locally using the DEK
Return to Client: Decrypted content is sent to the client
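
The upload and download flows can be sketched end to end as below. Two loud assumptions: Python's standard library has no AES, so `keystream_block` derives a counter-mode keystream from SHA-256 as a stand-in for AES-128-CTR (it is not real AES and must not be used in production), and two dicts stand in for the S3 bucket and the DEK table. All names are hypothetical.

```python
import hashlib
import secrets

BLOCK = 16  # AES block size in bytes; the stand-in keystream uses the same size


def keystream_block(dek: bytes, counter: int) -> bytes:
    """Stand-in for the AES-CTR keystream block at `counter`. NOT real AES."""
    return hashlib.sha256(dek + counter.to_bytes(16, "big")).digest()[:BLOCK]


def ctr_xor(dek: bytes, data: bytes) -> bytes:
    """Counter-mode transform; the same call encrypts and decrypts."""
    out = bytearray()
    for offset in range(0, len(data), BLOCK):
        pad = keystream_block(dek, offset // BLOCK)
        chunk = data[offset:offset + BLOCK]
        out += bytes(a ^ b for a, b in zip(chunk, pad))
    return bytes(out)


object_store: dict[str, bytes] = {}  # stands in for the S3 bucket
dek_table: dict[str, bytes] = {}     # stands in for the DEK table in the database


def upload(content: bytes) -> str:
    dek = secrets.token_bytes(16)           # generate a random 128-bit DEK
    ciphertext = ctr_xor(dek, content)      # encrypt content with the DEK
    content_ref = secrets.token_hex(16)     # hypothetical reference format
    object_store[content_ref] = ciphertext  # store encrypted content
    dek_table[content_ref] = dek            # store DEK next to the metadata
    return content_ref


def download(content_ref: str) -> bytes:
    dek = dek_table[content_ref]            # fetch DEK from the database
    ciphertext = object_store[content_ref]  # fetch encrypted content
    return ctr_xor(dek, ciphertext)         # decrypt locally and return
```

Because counter mode is symmetric, a single `ctr_xor` function covers both directions; a real implementation would call an AES library in its place.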

Key Storage and Management

Data Encryption Keys (DEKs):

Stored in the database in a dedicated table
Each DEK is a 128-bit AES symmetric key
Each content object has its own unique DEK
Access controlled through database permissions and connection authentication
DEKs are associated with their corresponding content references
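
A dedicated DEK table could look roughly like the sketch below. `sqlite3` is used only so the example runs without a server; ContentGrid uses PostgreSQL, and the table name, column names, and the `algorithm` column are assumptions made for illustration.

```python
import secrets
import sqlite3

# In-memory SQLite stands in for the application's PostgreSQL database.
db = sqlite3.connect(":memory:")
db.execute(
    """
    CREATE TABLE content_dek (
        content_ref TEXT PRIMARY KEY,  -- reference of the encrypted object in S3
        algorithm   TEXT NOT NULL,     -- assumed column, handy when algorithms change
        dek         BLOB NOT NULL      -- the 128-bit key itself
    )
    """
)


def store_dek(content_ref: str, dek: bytes) -> None:
    """Associate a DEK with the content reference it protects."""
    db.execute(
        "INSERT INTO content_dek (content_ref, algorithm, dek) VALUES (?, ?, ?)",
        (content_ref, "AES128-CTR", dek),
    )


def load_dek(content_ref: str) -> bytes:
    """Look up the DEK for a content reference, or raise KeyError if none exists."""
    row = db.execute(
        "SELECT dek FROM content_dek WHERE content_ref = ?", (content_ref,)
    ).fetchone()
    if row is None:
        raise KeyError(content_ref)
    return row[0]
```

Keeping the keys in their own table makes it possible to grant access to it separately from the rest of the metadata through ordinary database permissions.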

Key Rotation

To rotate a DEK (e.g., when upgrading the encryption algorithm or after a DEK is compromised):

Fetch and decrypt the old content using the old DEK
Generate a new DEK
Encrypt the content with the new DEK
Store the new encrypted content in S3
Update the database with the new DEK and content reference

DEK rotation is performed on a per-object basis and is typically only needed when upgrading cryptographic algorithms or responding to a security incident.
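
The rotation steps can be sketched as a single function. As an explicit assumption, `stand_in_cipher` is a SHA-256-based XOR keystream that only takes the place of AES-128-CTR so the example runs with the standard library; the dict-based stores and all names are hypothetical.

```python
import hashlib
import secrets


def stand_in_cipher(dek: bytes, data: bytes) -> bytes:
    """XOR with a SHA-256-derived keystream. NOT AES; placeholder for AES-128-CTR."""
    stream = b"".join(
        hashlib.sha256(dek + i.to_bytes(8, "big")).digest()
        for i in range(len(data) // 32 + 1)
    )
    return bytes(a ^ b for a, b in zip(data, stream))


def rotate_dek(content_ref: str, object_store: dict, dek_table: dict) -> str:
    """Re-encrypt one object under a new DEK and return its new reference."""
    # Fetch and decrypt the old content using the old DEK.
    plaintext = stand_in_cipher(dek_table[content_ref], object_store[content_ref])
    # Generate a new DEK and encrypt the content with it.
    new_dek = secrets.token_bytes(16)
    new_ref = secrets.token_hex(16)
    # Store the result as a new immutable object, then record the new DEK.
    object_store[new_ref] = stand_in_cipher(new_dek, plaintext)
    dek_table[new_ref] = new_dek
    # The old object and old DEK stay in place until explicitly deleted.
    return new_ref
```

The caller would then point the metadata row at the returned reference inside a database transaction, and remove the old object and key once they are no longer needed.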

Range Request Support

HTTP Range requests allow clients to request specific byte ranges of a file (e.g., “bytes 1000-2000”). This is essential for:

Video Seeking: Jump to a timestamp without downloading the entire video
Large PDFs: Load only visible pages
Parallel Downloads: Split large files across multiple connections

Encryption Challenge: Not all encryption modes support decrypting arbitrary byte ranges—some require decrypting from the beginning.

Solution: ContentGrid uses block cipher modes that support random access (e.g., AES-CTR). The encryption implementation:

Calculates which encrypted blocks contain the requested byte range and sets the counter to the first of those blocks
Fetches only the requested bytes from S3 (using S3 range requests)
Pads the downloaded data so that it aligns with the cipher block size
Decrypts the blocks
Trims to the exact requested range
Returns to the client

No additional data needs to be fetched, and the file never has to be decrypted in full. Only a small amount of extra decryption is performed to align with block boundaries.
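
The range-decryption steps can be sketched as follows. As a clearly labeled assumption, `keystream_block` uses SHA-256 as a stand-in for the AES block function so the example needs only the standard library; the counter arithmetic is the part that carries over to real AES-CTR. `fetch_range` is a hypothetical callable representing an S3 range request.

```python
import hashlib

BLOCK = 16  # AES block size in bytes


def keystream_block(dek: bytes, counter: int) -> bytes:
    """Stand-in for the AES-CTR keystream block at `counter`. NOT real AES."""
    return hashlib.sha256(dek + counter.to_bytes(16, "big")).digest()[:BLOCK]


def ctr_xor(dek: bytes, data: bytes, first_counter: int = 0) -> bytes:
    """Counter-mode transform of data whose first byte sits on a block boundary."""
    out = bytearray()
    for i in range(0, len(data), BLOCK):
        pad = keystream_block(dek, first_counter + i // BLOCK)
        out += bytes(a ^ b for a, b in zip(data[i:i + BLOCK], pad))
    return bytes(out)


def decrypt_range(dek: bytes, fetch_range, start: int, end: int) -> bytes:
    """Decrypt bytes start..end (inclusive, as in an HTTP Range header)."""
    # Block that holds `start`, and the offset of `start` inside that block.
    first_block, lead = divmod(start, BLOCK)
    # Fetch exactly the requested bytes from object storage.
    ciphertext = fetch_range(start, end)
    # Pad the front so the data begins on a block boundary.
    aligned = bytes(lead) + ciphertext
    # Decrypt with the counter set to the first covered block.
    plaintext = ctr_xor(dek, aligned, first_block)
    # Trim the padding to get the exact requested range.
    return plaintext[lead:]
```

Only the leading edge needs padding here: counter mode works as a stream cipher, so a trailing partial block is decrypted by using just the first few keystream bytes of that block.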

Security Considerations

Data Confidentiality:

Content is encrypted using strong symmetric algorithms (AES-128)
Each content object has a unique encryption key (DEK)
DEKs are stored in the database with access controlled through database permissions
Encrypted content in S3 is protected from unauthorized access at the storage layer
Database connections are authenticated and encrypted

Data Integrity:

Immutability prevents accidental overwrites

Access Control:

Database access controls restrict which services and users can access DEKs
Application servers authenticate to the database using service credentials

Defense in Depth:

Content encrypted at rest (this architecture)
Data encrypted in transit (TLS)
Access control enforced at query level (ABAC)
Database connections authenticated and encrypted
Database encryption at rest can provide an additional protection layer

Performance Impact

Encryption Overhead:

Modern CPUs have AES hardware acceleration (AES-NI), making symmetric encryption very fast. The overhead for encrypting and decrypting content is minimal, typically reducing throughput by less than 100 MB/s.

Range Requests:

Range requests with encryption process slightly more data (to align with block boundaries), but the overhead is small. Because AES blocks are 16 bytes, aligning a 1 KB range adds fewer than 16 bytes at each end. This is negligible compared to fetching and decrypting the entire file.

Future Enhancements

Envelope Encryption with Key Encryption Keys

ContentGrid’s encryption architecture is designed to support envelope encryption (also called two-level encryption), a standard technique used by AWS KMS, Google Cloud KMS, HashiCorp Vault, and other enterprise systems.

In envelope encryption, a Key Encryption Key (KEK) stored in a Hardware Security Module (HSM) or cloud Key Management Service (KMS) is used to encrypt the DEKs before storing them in the database. This provides additional security benefits:

Enhanced Key Protection: DEKs are encrypted before storage, with KEKs never leaving the HSM/KMS
Audit Logging: All key operations logged in the KMS for compliance and security monitoring
Efficient Key Rotation: Rotating the KEK only requires re-encrypting small DEKs, not the entire content

The implementation will be backward compatible with existing encrypted content. When enabled, new content will use envelope encryption, and existing DEKs can be migrated in the background without service interruption.
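
As a rough illustration of the planned design, the sketch below wraps DEKs with a KEK held by a fake key service. Everything here is hypothetical: `FakeKms` is an in-memory stand-in for an HSM or cloud KMS, and the SHA-256-based XOR wrapping is a placeholder for the authenticated key-wrapping a real service would provide.

```python
import hashlib
import secrets


class FakeKms:
    """In-memory stand-in for an HSM/KMS; KEKs never leave this object."""

    def __init__(self) -> None:
        self._keks = {1: secrets.token_bytes(32)}
        self.current_version = 1

    def _pad(self, version: int, nonce: bytes) -> bytes:
        # Placeholder for real key wrapping; not a secure construction.
        return hashlib.sha256(self._keks[version] + nonce).digest()[:16]

    def wrap(self, dek: bytes) -> tuple[int, bytes, bytes]:
        """Encrypt a 16-byte DEK under the current KEK."""
        nonce = secrets.token_bytes(16)
        pad = self._pad(self.current_version, nonce)
        return self.current_version, nonce, bytes(a ^ b for a, b in zip(dek, pad))

    def unwrap(self, wrapped: tuple[int, bytes, bytes]) -> bytes:
        """Decrypt a wrapped DEK using the KEK version recorded with it."""
        version, nonce, body = wrapped
        return bytes(a ^ b for a, b in zip(body, self._pad(version, nonce)))

    def rotate_kek(self) -> None:
        """Create a new KEK version; older versions remain available for unwrapping."""
        self.current_version += 1
        self._keks[self.current_version] = secrets.token_bytes(32)


def rewrap(kms: FakeKms, wrapped: tuple[int, bytes, bytes]) -> tuple[int, bytes, bytes]:
    """Re-encrypt a small DEK under the current KEK; content in S3 is untouched."""
    return kms.wrap(kms.unwrap(wrapped))
```

Only the wrapped form would be stored in the database. After `rotate_kek()`, a background job could call `rewrap` on each stored DEK, which handles 16 bytes per object instead of the full content.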

Summary

ContentGrid’s storage architecture separates structured metadata (PostgreSQL) from binary content (S3), optimizing each for its purpose:

PostgreSQL: Provides ACID transactions, relational integrity, and efficient querying for metadata
S3: Provides scalable, durable storage for large binary content
Content Encryption: Protects content at rest using unique Data Encryption Keys (DEKs) for each object
Range Request Support: Enables efficient access to large files without sacrificing encryption

Encryption is transparent to applications and users—no code changes or workflow modifications required. The architecture balances strong security with operational simplicity and performance.