Extracting Files
How to extract archive contents to the local filesystem.
Extracting from OCI Registry
Pull an archive from an OCI registry and extract its contents:
import "github.com/meigma/blob"
func extractFromRegistry(ref, destDir string) error {
c, err := blob.NewClient(blob.WithDockerConfig())
if err != nil {
return err
}
// Pull the archive
archive, err := c.Pull(ctx, ref)
if err != nil {
return err
}
// Extract all files
return archive.CopyDir(destDir, ".")
}
The pulled archive fetches file data lazily: during extraction, file contents are streamed via range requests as they are needed.
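Because reads are lazy, extracting a subset transfers only the bytes for those files. As a sketch (using CopyTo, covered below), copying a single file out of a pulled archive fetches just that file's ranges:

// Only config.json's data is fetched from the registry; the rest of
// the archive body is never downloaded.
err := archive.CopyTo(destDir, "config.json")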
Working with BlobFile
When using OpenFile or CreateBlob from the core package, you receive a *BlobFile which embeds *Blob. All extraction methods work identically:
import blobcore "github.com/meigma/blob/core"
blobFile, err := blobcore.OpenFile("index.blob", "data.blob")
if err != nil {
    return err
}
defer blobFile.Close()
// All Blob methods work on BlobFile
err = blobFile.CopyDir("/dest", ".")
The examples below use *Blob but apply equally to *BlobFile.
Extracting Specific Files
To extract specific files by path, use CopyTo:
err := archive.CopyTo("/dest/dir", "config.json", "lib/utils.go", "main.go")
The files are extracted to the destination directory, preserving their relative paths:
- config.json -> /dest/dir/config.json
- lib/utils.go -> /dest/dir/lib/utils.go
Parent directories are created automatically.
To pass options, use CopyToWithOptions:
paths := []string{"config.json", "lib/utils.go"}
err := archive.CopyToWithOptions("/dest/dir", paths,
    blob.CopyWithOverwrite(true),
    blob.CopyWithPreserveMode(true),
)
Extracting Directories
To extract all files under a directory prefix, use CopyDir:
// Extract everything under src/
err := archive.CopyDir("/dest/dir", "src")
// Extract the entire archive
err = archive.CopyDir("/dest/dir", ".")
Files matching the prefix are extracted to the destination directory with their full archive paths:
- Archive path src/main.go with prefix src -> /dest/dir/src/main.go
- Archive path lib/utils.go with prefix . -> /dest/dir/lib/utils.go
Overwrite Behavior
By default, existing files are skipped. To overwrite:
err := archive.CopyDir("/dest/dir", ".",
    blob.CopyWithOverwrite(true),
)
This is useful when you want to ensure files match the archive contents, even if the destination already exists.
Preserving Metadata
File Modes
To preserve file permission modes from the archive:
err := archive.CopyDir("/dest/dir", ".",
    blob.CopyWithPreserveMode(true),
)
Without this option, extracted files are created with default permissions, subject to the process umask.
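To confirm modes survived extraction, a quick stat of an extracted file works (a sketch; the path is hypothetical):

import (
    "fmt"
    "os"
)

// Inspect the permission bits of an extracted file.
info, err := os.Stat("/dest/dir/bin/tool") // hypothetical extracted path
if err != nil {
    return err
}
fmt.Printf("extracted mode: %v\n", info.Mode().Perm())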
Modification Times
To preserve file modification times:
err := archive.CopyDir("/dest/dir", ".",
    blob.CopyWithPreserveTimes(true),
)
Both Mode and Times
To preserve both at once:
err := archive.CopyDir("/dest/dir", ".",
    blob.CopyWithPreserveMode(true),
    blob.CopyWithPreserveTimes(true),
)
Clean Destination Mode
For faster extraction when starting fresh, use CopyWithCleanDest:
err := archive.CopyDir("/dest/dir", ".",
    blob.CopyWithCleanDest(true),
)
This option:
- Removes the destination directory (or subdirectory if prefix is specified)
- Writes files directly to their final paths (no temp files)
- Automatically enables overwrite mode
Warning: This deletes existing content in the destination. Use with caution.
CopyWithCleanDest is only supported by CopyDir, not CopyTo.
Safety Checks
The clean destination option includes safety checks:
- Refuses to delete filesystem roots (/, C:\)
- Refuses to delete the volume root on Windows
- Requires a non-empty destination path
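Even with these built-in checks, a caller-side guard is cheap insurance. A minimal sketch (plain path logic, not part of the blob API) that refuses to clean a destination outside an allowed root:

import (
    "fmt"
    "path/filepath"
    "strings"
)

// guardDest rejects destinations outside allowedRoot before a
// clean-destination extraction is attempted.
func guardDest(allowedRoot, dest string) error {
    absRoot, err := filepath.Abs(allowedRoot)
    if err != nil {
        return err
    }
    absDest, err := filepath.Abs(dest)
    if err != nil {
        return err
    }
    if absDest != absRoot && !strings.HasPrefix(absDest, absRoot+string(filepath.Separator)) {
        return fmt.Errorf("refusing to clean %s: outside %s", absDest, absRoot)
    }
    return nil
}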
Parallel Extraction
Worker Count
To control the number of parallel file writers:
err := archive.CopyDir("/dest/dir", ".",
    blob.CopyWithWorkers(8), // Use 8 parallel workers
)
Values:
- 0 - Use automatic heuristics (default)
- Negative - Force serial processing
- Positive - Use exactly N workers
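One explicit starting point is to match the machine's core count (a sketch; the automatic default may already choose something similar):

import "runtime"

// Pin the worker count to the number of CPUs instead of relying on
// the automatic heuristic.
err := archive.CopyDir("/dest/dir", ".",
    blob.CopyWithWorkers(runtime.NumCPU()),
)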
Read Concurrency
For remote archives, control the number of concurrent range reads:
err := archive.CopyDir("/dest/dir", ".",
    blob.CopyWithReadConcurrency(8), // 8 concurrent HTTP requests
)
The default is 4 concurrent reads. Higher values reduce latency but increase memory usage and server load.
Read-Ahead Budget
To limit memory usage during parallel reads:
err := archive.CopyDir("/dest/dir", ".",
    blob.CopyWithReadAheadBytes(64 * 1024 * 1024), // 64MB budget
)
This caps the total size of buffered read-ahead data. Use this when extracting large files to prevent memory exhaustion.
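The two remote-read options compose. A sketch tuning both for a high-latency link (the values are illustrative, not recommendations):

err := archive.CopyDir("/dest/dir", ".",
    blob.CopyWithReadConcurrency(16),           // more requests in flight to hide latency
    blob.CopyWithReadAheadBytes(128*1024*1024), // but cap buffered read-ahead at 128MB
)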
Error Handling
Extraction errors include the file path and underlying cause:
err := archive.CopyDir("/dest/dir", ".")
if err != nil {
    // Errors are wrapped with path context
    log.Printf("extraction failed: %v", err)
}
Common error scenarios:
- Destination directory not writable
- Disk full
- Hash mismatch (corrupted archive data)
- Network error (for remote archives)
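If the wrapped errors preserve the underlying OS sentinels (an assumption; confirm against the package's error documentation), errors.Is can distinguish some of these cases:

import (
    "errors"
    "io/fs"
    "log"
)

err := archive.CopyDir("/dest/dir", ".")
switch {
case err == nil:
    // Success.
case errors.Is(err, fs.ErrPermission):
    log.Printf("destination not writable: %v", err)
default:
    log.Printf("extraction failed: %v", err)
}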
Complete Example
A production-style extraction helper that wires together the options covered above:
func extractArchive(archive *blob.Blob, destDir string, opts ExtractOptions) error {
    copyOpts := []blob.CopyOption{
        blob.CopyWithOverwrite(opts.Overwrite),
    }

    if opts.PreserveMetadata {
        copyOpts = append(copyOpts,
            blob.CopyWithPreserveMode(true),
            blob.CopyWithPreserveTimes(true),
        )
    }
    if opts.Clean {
        copyOpts = append(copyOpts, blob.CopyWithCleanDest(true))
    }
    if opts.Workers > 0 {
        copyOpts = append(copyOpts, blob.CopyWithWorkers(opts.Workers))
    }
    if opts.ReadConcurrency > 0 {
        copyOpts = append(copyOpts, blob.CopyWithReadConcurrency(opts.ReadConcurrency))
    }

    prefix := opts.Prefix
    if prefix == "" {
        prefix = "."
    }

    return archive.CopyDir(destDir, prefix, copyOpts...)
}

type ExtractOptions struct {
    Prefix           string
    Overwrite        bool
    PreserveMetadata bool
    Clean            bool
    Workers          int
    ReadConcurrency  int
}
Usage:
err := extractArchive(archive, "/app/deploy", ExtractOptions{
    Prefix:           "dist",
    Overwrite:        true,
    PreserveMetadata: true,
    Clean:            true,
    Workers:          4,
    ReadConcurrency:  8,
})
See Also
- OCI Client - Pull archives from registries
- Performance Tuning - Advanced extraction optimization
- Caching - Cache content for faster repeated extraction