
Dask
Scale NumPy-style array code to datasets larger than RAM with chunked, parallel Dask arrays.
Overview
Dask Array is an agent skill for the Build phase that explains chunked, parallel NumPy-style arrays for out-of-core numerical computation.
Install
npx skills add https://github.com/k-dense-ai/scientific-agent-skills --skill dask
What is this skill?
- NumPy ndarray API via blocked chunks for out-of-core computation
- Parallel reductions, linear algebra, slicing, reshape, and ufuncs across chunks
- Clear guidance on when to use Dask vs stay on in-memory NumPy
- Supports sum/mean/std reductions, dot products, and SVD/QR decompositions (subject to chunk-layout requirements)
- Chunk grid model so operations apply per block and combine automatically
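Here is a minimal sketch of the chunk grid model described above. It assumes a recent `dask[array]` install, and the array sizes are arbitrary choices for illustration. The sketch shows three things:

- how a shape splits into blocks;
- how an element-wise expression plus a reduction stays lazy until `.compute()`;
- an SVD on a tall-and-skinny array chunked along rows only, which is the layout `da.linalg.svd` is built for.

```python
import numpy as np
import dask.array as da

# A 4000 x 4000 array stored as a 4 x 4 grid of 1000 x 1000 NumPy blocks
x = da.ones((4000, 4000), chunks=(1000, 1000))
print(x.numblocks)  # (4, 4)
print(x.chunks[0])  # (1000, 1000, 1000, 1000)

# Element-wise math and the reduction only build a task graph; each block is
# processed independently and the partial column sums are then combined
y = (x * 2 + 1).sum(axis=0)
col_sums = y.compute()  # NumPy array of shape (4000,), every entry 3 * 4000 = 12000.0

# SVD on a tall-and-skinny matrix: rows are chunked, columns form a single block
a = da.random.random((10_000, 50), chunks=(1_000, 50))
u, s, v = da.linalg.svd(a)

# Sanity check: (u * s) @ v reconstructs a up to floating-point error
print(np.allclose(((u * s) @ v).compute(), a.compute()))  # True
```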
Adoption & trust: 552 installs on skills.sh; 27.6k GitHub stars; 3/3 security scanners passed (skills.sh audits).
What problem does it solve?
Your NumPy code works on samples but crashes or crawls when full arrays exceed RAM or you need parallel chunk execution.
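Code written against the ndarray API often runs unchanged on a Dask array, which is what makes the "works on samples, fails at scale" problem fixable without a rewrite. The sketch below uses a hypothetical `standardize` helper (not from the skill). It runs on a small in-memory sample so the NumPy and Dask paths can be compared directly. For data larger than RAM, the Dask array would instead come from a chunked on-disk source such as HDF5 or Zarr.

```python
import numpy as np
import dask.array as da

def standardize(arr):
    """Hypothetical example: column-wise z-scores, written once against the ndarray API."""
    return (arr - arr.mean(axis=0)) / arr.std(axis=0)

rng = np.random.default_rng(0)
sample = rng.normal(size=(2_000, 10))

# In-memory NumPy path: evaluated eagerly, whole array in RAM
expected = standardize(sample)

# Dask path: same function on a row-chunked array; nothing runs until .compute()
lazy = standardize(da.from_array(sample, chunks=(500, 10)))
result = lazy.compute()

print(np.allclose(result, expected))  # True
```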
Who is it for?
Python builders adding analytics, ML prep, or scientific numerics who need larger-than-memory arrays with familiar ndarray APIs.
Skip if: Tiny in-memory arrays, non-Python stacks, or production cluster tuning without any NumPy-style code path.
When should I use this skill?
Arrays exceed available RAM, computation can be parallelized across chunks, or you need NumPy-style ops on larger datasets.
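When the per-chunk work is plain NumPy code, `map_blocks` is the usual way to run it across chunks in parallel. This is a sketch under assumed sizes. The function `clip_and_log` is hypothetical and chosen only for illustration. The threaded scheduler is requested explicitly just to make the parallel execution visible in the code.

```python
import numpy as np
import dask.array as da

def clip_and_log(block):
    # Hypothetical per-block function: runs on one NumPy block at a time,
    # so independent blocks can be processed concurrently
    return np.log1p(np.clip(block, 0, None))

x = da.random.normal(0, 1, size=(2_000, 2_000), chunks=(500, 500))  # 16 blocks of 500 x 500

# map_blocks applies the NumPy function to every chunk; output chunks match input chunks
y = x.map_blocks(clip_and_log, dtype=x.dtype)

# Run the independent block tasks (and the final reduction) on a thread pool
total = y.sum().compute(scheduler="threads")
```

This pattern works because each output block depends on a single input block. Functions that need neighboring values, such as filters or stencils, are a better fit for `map_overlap`. Operations that need a global view, such as a full sort, do not decompose this way.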
What do I get? / Deliverables
You get a clear Dask Array mental model—chunks, supported ops, and when to stay on NumPy—so pipelines can scale without a full rewrite.
- Dask Array architecture and chunking recommendations
- Decision guidance for Dask vs NumPy for a given workload
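The Dask-vs-NumPy decision can be approximated with a memory rule of thumb. The `choose_backend` helper below is hypothetical and not part of Dask. The 16 GiB budget and the 25% safety fraction (which leaves room for NumPy temporaries) are assumptions to tune for your machine. When Dask is chosen, `chunks="auto"` asks Dask to pick block sizes near its configured target chunk size.

```python
import math
import numpy as np
import dask.array as da

def choose_backend(shape, dtype, memory_budget_bytes, safety_fraction=0.25):
    """Hypothetical rule of thumb, not part of Dask: stay on NumPy when the array
    fits within a fraction of the memory budget, leaving room for temporaries."""
    nbytes = math.prod(shape) * np.dtype(dtype).itemsize
    return "numpy" if nbytes <= safety_fraction * memory_budget_bytes else "dask"

budget = 16 * 2**30  # assumed 16 GiB of RAM for this example

print(choose_backend((10_000, 10_000), "float64", budget))   # 800 MB  -> numpy
print(choose_backend((200_000, 50_000), "float64", budget))  # 80 GB   -> dask

# For the Dask case, let Dask choose a chunk layout; this stays lazy and allocates nothing
x = da.zeros((200_000, 50_000), dtype="float64", chunks="auto")
print(x.chunksize, x.numblocks)
```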
How it compares
Procedural Dask Array knowledge for agents—not a hosted notebook runtime or a one-click ETL SaaS.
Common Questions / FAQ
Who is dask for?
Solo and indie Python developers implementing data-heavy features, research prototypes, or batch pipelines who already use NumPy.
When should I use dask?
During Build backend work when arrays exceed RAM, chunk-parallel math is possible, or you are scaling existing NumPy-style code to larger datasets.
Is dask safe to install?
Review the Security Audits panel on this Prism page and treat third-party scientific skill repos like any dependency before enabling shell or package installs.
SKILL.md
# Dask Arrays

## Overview

Dask Array implements NumPy's ndarray interface using blocked algorithms. It coordinates many NumPy arrays arranged into a grid. This enables computation on datasets larger than available memory and uses parallelism across multiple cores.

## Core Concept

A Dask Array is divided into chunks (blocks):
- Each chunk is a regular NumPy array
- Operations are applied to each chunk in parallel
- Results are combined automatically
- Enables out-of-core computation (data larger than RAM)

## Key Capabilities

### What Dask Arrays Support

**Mathematical Operations**:
- Arithmetic operations (+, -, *, /)
- Scalar functions (exponentials, logarithms, trigonometric)
- Element-wise operations

**Reductions**:
- `sum()`, `mean()`, `std()`, `var()`
- Reductions along specified axes
- `min()`, `max()`, `argmin()`, `argmax()`

**Linear Algebra**:
- Tensor contractions
- Dot products and matrix multiplication
- Some decompositions (SVD, QR)

**Data Manipulation**:
- Transposition
- Slicing (standard and fancy indexing)
- Reshaping
- Concatenation and stacking

**Array Protocols**:
- Universal functions (ufuncs)
- NumPy protocols for interoperability

## When to Use Dask Arrays

**Use Dask Arrays When**:
- Arrays exceed available RAM
- Computation can be parallelized across chunks
- Working with NumPy-style numerical operations
- Need to scale NumPy code to larger datasets

**Stick with NumPy When**:
- Arrays fit comfortably in memory
- Operations require global views of data
- Using specialized functions not available in Dask
- Performance is adequate with NumPy alone

## Important Limitations

Dask Arrays intentionally don't implement certain NumPy features:

**Not Implemented**:
- Much of `np.linalg` (only a subset is available in `da.linalg`)
- Operations that are difficult to parallelize (like full sorting)
- Memory-inefficient operations (converting to lists, iterating via loops)
- Many specialized functions (added as community needs arise)

**Workarounds**: For unsupported operations, consider using `map_blocks` with custom NumPy code.

## Creating Dask Arrays

### From NumPy Arrays

```python
import dask.array as da
import numpy as np

# Create from NumPy array with specified chunks
x = np.arange(10000)
dx = da.from_array(x, chunks=1000)  # Creates 10 chunks of 1000 elements each
```

### Random Arrays

```python
# Create random array with specified chunks
x = da.random.random((10000, 10000), chunks=(1000, 1000))

# Other random functions: normal distribution with mean 10 and standard deviation 0.1
x = da.random.normal(10, 0.1, size=(10000, 10000), chunks=(1000, 1000))
```

### Zeros, Ones, and Empty

```python
# Create arrays filled with constants
zeros = da.zeros((10000, 10000), chunks=(1000, 1000))
ones = da.ones((10000, 10000), chunks=(1000, 1000))
empty = da.empty((10000, 10000), chunks=(1000, 1000))  # Values are uninitialized
```

### From Functions

```python
import dask
import dask.array as da
import numpy as np

# Build each 1000 x 1000 block lazily from a function
def create_block(block_id):
    return np.random.random((1000, 1000)) * block_id[0]

# da.from_delayed wraps a single delayed value, so wrap each block separately...
blocks = [
    [da.from_delayed(dask.delayed(create_block)((i, j)), shape=(1000, 1000), dtype=float)
     for j in range(10)]
    for i in range(10)
]
# ...then assemble the 10 x 10 grid of blocks into one (10000, 10000) array
x = da.block(blocks)
```

### From Disk

```python
# Load from HDF5 (keep the file open while the Dask array is in use)
import h5py
f = h5py.File('myfile.hdf5', mode='r')
x = da.from_array(f['/data'], chunks=(1000, 1000))

# Load from Zarr
import zarr
z = zarr.open('myfile.zarr', mode='r')
x = da.from_array(z, chunks=(1000, 1000))
```

## Common Operations

### Arithmetic Operations

```python
import dask.array as da

x = da.random.random((10000, 10000), chunks=(1000, 1000))
y = da.random.random((10000, 10000), chunks=(1000, 1000))

# Element-wise operations (lazy)
z = x + y
z = x * y
z = da.exp(x)
z = da.log(y)

# Compute result (materializes the full 800 MB float64 array in memory)
result = z.compute()
```

### Reductions

```python
# Reductions over the whole array
total = x.sum().compute()
mean = x.mean().compute()
std = x.std().compute()

# Reduction along specific axis
row_means = x.mean(axis=1).compute()
col_sums = x.sum(axis=0).compute()
```

### Slicing and Indexing

```python
# Standard slicing (returns Dask Array)
subset = x[10