Caching

How to configure caching for blob archives to minimize network requests and improve performance.

For most users, a single option enables all cache layers:

c, err := blob.NewClient(
	blob.WithDockerConfig(),
	blob.WithCacheDir("/var/cache/blob"),
)

This creates a complete caching hierarchy under the specified directory with sensible defaults. The cache directory structure is:

/var/cache/blob/
├── refs/ # Tag → digest mappings (TTL-based)
├── manifests/ # Digest → manifest (content-addressed)
├── indexes/ # Digest → index blob (content-addressed)
├── content/ # Hash → file content (deduplication)
└── blocks/ # Block-level cache (HTTP optimization)

Cache Architecture

Blob supports five cache layers that work together:

┌──────────────────────────────────────────────────────────────┐
│                         Cache Layers                         │
├──────────────────────────────────────────────────────────────┤
│                                                              │
│  OCI Metadata Caches (3 layers)                              │
│  ┌──────────┐    ┌───────────────┐    ┌──────────────────┐   │
│  │ RefCache │ →  │ ManifestCache │ →  │ IndexCache       │   │
│  │ tag→dgst │    │ dgst→manifest │    │ dgst→index       │   │
│  └──────────┘    └───────────────┘    └──────────────────┘   │
│                                                              │
│  Data Caches (2 layers)                                      │
│  ┌─────────────────────┐    ┌────────────────────────────┐   │
│  │ ContentCache        │    │ BlockCache                 │   │
│  │ hash→file content   │    │ range→data blocks          │   │
│  └─────────────────────┘    └────────────────────────────┘   │
│                                                              │
└──────────────────────────────────────────────────────────────┘
| Cache         | Purpose                      | Key                   | When to Use                 |
|---------------|------------------------------|-----------------------|-----------------------------|
| RefCache      | Avoid tag resolution requests | tag → digest         | Always (default 15 min TTL) |
| ManifestCache | Avoid manifest fetches       | digest → manifest     | Always (content-addressed)  |
| IndexCache    | Avoid index blob downloads   | digest → bytes        | Always (content-addressed)  |
| ContentCache  | Deduplicate file content     | SHA256 → content      | Repeated file access        |
| BlockCache    | Optimize HTTP range requests | source+offset → block | Random access patterns      |

Configuration Options

RefCache TTL

Control how long tag→digest mappings are cached:

c, _ := blob.NewClient(
	blob.WithDockerConfig(),
	blob.WithCacheDir("/var/cache/blob"),
	blob.WithRefCacheTTL(5 * time.Minute), // Refresh tags every 5 minutes
)

For mutable tags like latest that change frequently, use shorter TTLs. For immutable tags (semver releases), use longer TTLs or 0 to disable expiration.

Individual Cache Directories

Place caches on different storage:

c, _ := blob.NewClient(
	blob.WithDockerConfig(),
	blob.WithContentCacheDir("/fast-ssd/content"), // SSD for content
	blob.WithBlockCacheDir("/fast-ssd/blocks"),    // SSD for blocks
	blob.WithRefCacheDir("/hdd/refs"),             // HDD for metadata
	blob.WithManifestCacheDir("/hdd/manifests"),
	blob.WithIndexCacheDir("/hdd/indexes"),
)

Disabling Specific Caches

Omit specific cache directories to disable them:

// Only enable metadata caches, no content caching
c, _ := blob.NewClient(
	blob.WithDockerConfig(),
	blob.WithRefCacheDir("/var/cache/blob/refs"),
	blob.WithManifestCacheDir("/var/cache/blob/manifests"),
	blob.WithIndexCacheDir("/var/cache/blob/indexes"),
)

How Caching Works

Pull Operation Flow

Pull("ghcr.io/org/repo:v1")
        │
        ▼
RefCache hit? ──Yes──▶ Use cached digest
        │ No
        ▼
HEAD request → Get digest → Cache in RefCache
        │
        ▼
ManifestCache hit? ──Yes──▶ Use cached manifest
        │ No
        ▼
GET manifest → Parse → Cache in ManifestCache
        │
        ▼
IndexCache hit? ──Yes──▶ Use cached index
        │ No
        ▼
GET index blob → Cache in IndexCache
        │
        ▼
Return *Blob (data loaded lazily)

File Read Flow

archive.ReadFile("config.json")
        │
        ▼
ContentCache hit? ──Yes──▶ Return cached content
        │ No
        ▼
BlockCache hit for range? ──Yes──▶ Use cached blocks
        │ No
        ▼
HTTP Range Request → Cache in BlockCache
        │
        ▼
Decompress → Verify hash → Cache in ContentCache
        │
        ▼
Return content

Cache Sizing

Sizing Guidelines

| Use Case           | Recommended Size | Notes                         |
|--------------------|------------------|-------------------------------|
| Development        | 256 MB - 1 GB    | Balance performance with disk |
| CI/CD (ephemeral)  | Unlimited        | Disk reclaimed after job      |
| Production server  | 2-10 GB          | Based on working set          |
| Memory-constrained | 64-128 MB        | Minimum useful size           |

Sizing by Cache Type

| Cache         | Typical Entry Size | Sizing Notes                  |
|---------------|--------------------|-------------------------------|
| RefCache      | ~100 bytes         | Small; 10 MB holds 100K+ refs |
| ManifestCache | 1-5 KB             | 50 MB holds 10K-50K manifests |
| IndexCache    | 100 KB - 5 MB      | Varies by file count          |
| ContentCache  | File sizes         | Most disk usage               |
| BlockCache    | 64 KB blocks       | Temporary; auto-prunes        |

Cache Integrity

All caches validate entries on read:

| Cache         | Validation          | On Failure          |
|---------------|---------------------|---------------------|
| RefCache      | Format check        | Delete, return miss |
| ManifestCache | Digest + JSON parse | Delete, return miss |
| IndexCache    | Digest verification | Delete, return miss |
| ContentCache  | SHA256 verification | Delete, return miss |
| BlockCache    | No verification     | Re-fetch            |

This prevents cache poisoning and handles filesystem corruption gracefully.

Bypassing Cache

Force fresh fetches when needed:

// Pull without using any caches
archive, err := c.Pull(ctx, ref,
	blob.PullWithSkipCache(),
)

// Fetch manifest without cache
manifest, err := c.Fetch(ctx, ref,
	blob.FetchWithSkipCache(),
)

Cache Lifecycle

Automatic Pruning

When caches exceed their limits, old entries are automatically removed using LRU-style eviction (oldest entries removed first based on modification time).

Sharing Across Processes

All disk caches are safe for concurrent access from multiple processes. They use atomic file operations and handle race conditions correctly.

Persistence

Caches persist across program restarts. Content-addressed caches (ManifestCache, IndexCache, ContentCache) never become stale. RefCache entries expire based on TTL.

Complete Example

A production setup with all caches and custom TTL:

package main

import (
	"context"
	"fmt"
	"log"
	"time"

	"github.com/meigma/blob"
)

func main() {
	ctx := context.Background()

	// Create client with full caching
	c, err := blob.NewClient(
		blob.WithDockerConfig(),
		blob.WithCacheDir("/var/cache/blob"),
		blob.WithRefCacheTTL(5 * time.Minute),
	)
	if err != nil {
		log.Fatal(err)
	}

	// First pull: fetches from registry, populates caches
	archive, err := c.Pull(ctx, "ghcr.io/myorg/myarchive:v1")
	if err != nil {
		log.Fatal(err)
	}
	fmt.Printf("Pulled: %d files\n", archive.Len())

	// Second pull: uses all caches, minimal network
	archive2, err := c.Pull(ctx, "ghcr.io/myorg/myarchive:v1")
	if err != nil {
		log.Fatal(err)
	}
	fmt.Printf("Cached pull: %d files\n", archive2.Len())

	// File reads use content cache
	content, _ := archive.ReadFile("config.json")
	fmt.Printf("config.json: %s\n", content)
}

See Also