
Zarr Python
Install when your agent must read, write, or tune chunked compressed N-D arrays on local disk, S3, or GCS inside scientific or ML data pipelines.
Install
npx skills add https://github.com/k-dense-ai/scientific-agent-skills --skill zarr-python
What is this skill?
- Create, read, and slice chunked N-D Zarr arrays with NumPy-style indexing
- Parallel I/O patterns for large-scale scientific computing workloads
- S3 and GCS integration via s3fs and gcsfs optional installs
- Interoperates with NumPy, Dask, and Xarray in the same pipelines
- Quick start covers zeros, ones, full, and configurable chunk shapes plus dtypes
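Chunk shape and dtype interact. A chunk's uncompressed size is its element count times the dtype's itemsize, so the same chunk shape can be much too small or too large depending on dtype. Below is a minimal sketch, assuming the ~1 MB chunk guideline from the SKILL.md is read as 1 MiB and that equal-sided chunks are acceptable. `balanced_chunk_shape` and `chunks_per_row_read` are hypothetical helpers written for illustration; they are not part of Zarr's API.

```python
import math

import numpy as np


def balanced_chunk_shape(dtype, ndim=2, target_bytes=1024 * 1024):
    """Hypothetical helper: largest equal-sided chunk shape whose
    uncompressed size does not exceed target_bytes for this dtype."""
    itemsize = np.dtype(dtype).itemsize        # bytes per element, e.g. 4 for "f4"
    max_elements = target_bytes // itemsize    # element budget per chunk
    side = max(1, round(max_elements ** (1.0 / ndim)))
    # Floating-point roots can land slightly high; step down until the chunk fits.
    while side > 1 and side ** ndim > max_elements:
        side -= 1
    return (side,) * ndim


def chunks_per_row_read(shape, chunks):
    """Hypothetical helper: how many chunks a full-row read z[i, :] touches
    on a 2-D array, i.e. how many chunks tile the second dimension."""
    return math.ceil(shape[1] / chunks[1])


# Chunk shape and uncompressed chunk size for a few dtypes
for dt in ("u1", "f4", "f8"):
    chunks = balanced_chunk_shape(dt)
    print(dt, chunks, math.prod(chunks) * np.dtype(dt).itemsize, "bytes")

# Chunks touched by one full-row read of a 10000 x 10000 array
for chunks in [(10, 10000), (1000, 1000), (10000, 10)]:
    print(chunks, chunks_per_row_read((10000, 10000), chunks))
```

With the defaults, `"f4"` gives `(512, 512)`, which matches the 512×512 float32 figure in the chunking guidelines. A full-row read touches 1 chunk with `chunks=(10, 10000)` but 1000 chunks with `chunks=(10000, 10)`. The returned tuple can be passed as `chunks=` to `zarr.zeros` or `zarr.create_array`.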
Adoption & trust: 517 installs on skills.sh; 27.6k GitHub stars; 3/3 security scanners passed (skills.sh audits).
Common Questions / FAQ
Is Zarr Python safe to install?
skills.sh reports 3 of 3 security scanners passed. Review the Security Audits panel on this page before installing in production.
SKILL.md
# Zarr Python

## Overview

Zarr is a Python library for storing large N-dimensional arrays with chunking and compression. Apply this skill for efficient parallel I/O, cloud-native workflows, and integration with NumPy, Dask, and Xarray.

## Quick Start

### Installation

```bash
uv pip install zarr
```

Requires Python 3.11+. For cloud storage support, install additional packages:

```bash
uv pip install s3fs   # For S3
uv pip install gcsfs  # For Google Cloud Storage
```

### Basic Array Creation

```python
import zarr
import numpy as np

# Create a 2D array with chunking and compression
z = zarr.create_array(
    store="data/my_array.zarr",
    shape=(10000, 10000),
    chunks=(1000, 1000),
    dtype="f4"
)

# Write data using NumPy-style indexing
z[:, :] = np.random.random((10000, 10000))

# Read data
data = z[0:100, 0:100]  # Returns NumPy array
```

## Core Operations

### Creating Arrays

Zarr provides multiple convenience functions for array creation:

```python
import numpy as np
import zarr

# Create a zero-filled array stored on disk
z = zarr.zeros(shape=(10000, 10000), chunks=(1000, 1000), dtype='f4', store='zeros.zarr')

# Create filled arrays (held in memory when no store is given)
z = zarr.ones((5000, 5000), chunks=(500, 500))
z = zarr.full((1000, 1000), fill_value=42, chunks=(100, 100))

# Create from existing data (use a store path not already holding an array)
data = np.arange(10000).reshape(100, 100)
z = zarr.array(data, chunks=(10, 10), store='data.zarr')

# Create like another array
z2 = zarr.zeros_like(z)  # Matches shape, chunks, dtype of z
```

### Opening Existing Arrays

```python
# Open an existing array for reading and writing (mode='r+' requires it to exist)
z = zarr.open_array('data.zarr', mode='r+')

# Read-only mode
z = zarr.open_array('data.zarr', mode='r')

# The open() function auto-detects arrays vs groups
z = zarr.open('data.zarr')  # Returns Array or Group
```

### Reading and Writing Data

Zarr arrays support NumPy-like indexing:

```python
# Write entire array
z[:] = 42

# Write slices
z[0, :] = np.arange(100)
z[10:20, 50:60] = np.random.random((10, 10))

# Read data (returns NumPy array)
data = z[0:100, 0:100]
row = z[5, :]

# Advanced indexing
z.vindex[[0, 5, 10], [2, 8, 15]]  # Coordinate indexing
z.oindex[0:10, [5, 10, 15]]       # Orthogonal indexing
z.blocks[0, 0]                    # Block/chunk indexing
```

### Resizing and Appending

```python
# Resize array (pass the new shape as a tuple)
z.resize((15000, 15000))  # Expands or shrinks dimensions

# Append data along an axis; the other dimensions must match the array's shape
z.append(np.random.random((1000, z.shape[1])), axis=0)  # Adds 1000 rows
```

## Chunking Strategies

Chunking is critical for performance. Choose chunk sizes and shapes based on access patterns.

### Chunk Size Guidelines

- **Minimum chunk size**: at least 1 MB of uncompressed data per chunk is recommended for good performance
- **Balance**: larger chunks mean fewer objects and store requests; smaller chunks allow finer-grained parallel access
- **Memory consideration**: each chunk is compressed and decompressed as a whole, so an entire chunk must fit in memory

```python
# Configure chunk size (aim for ~1 MB per chunk)
# For float32 (4 bytes/element): 1 MiB = 1,048,576 bytes = 262,144 elements = a 512×512 chunk
z = zarr.zeros(
    shape=(10000, 10000),
    chunks=(512, 512),  # ~1 MiB chunks
    dtype='f4'
)
```

### Aligning Chunks with Access Patterns

**Critical**: chunk shape strongly affects performance because every read or write operates on whole chunks, so the best shape depends on how data is accessed.

```python
# If reading whole rows frequently (z[i, :]): each chunk holds 10 complete rows
z = zarr.zeros((10000, 10000), chunks=(10, 10000))

# If reading whole columns frequently (z[:, j]): each chunk holds 10 complete columns
z = zarr.zeros((10000, 10000), chunks=(10000, 10))

# For mixed access patterns (balanced approach)
z = zarr.zeros((10000, 10000), chunks=(1000, 1000))  # Square chunks
```

**Performance example**: For a (200, 200, 200) array, reading along the first dimension:

- Using chunks (1, 200, 200): ~107ms
- Using chunks (200, 200, 1): ~1.