Fancy slicing with lists by mrocklin · Pull Request #26 · dask/dask

mrocklin · 2015-02-03T03:04:06Z

OK, so we do a dual approach to achieve fancy indexing.

Given an index, like

(5, slice(5, 10, 2), [1, 2, 3])

We first do the normal dask_slice solution on the array, with the list replaced by a full slice

(5, slice(5, 10, 2), slice(None, None, None))

We then follow with the final list. I suspect that we could repeat this for multiple lists and achieve Matlab-style orthogonal indexing.

It mostly works
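A minimal NumPy-only sketch of the two-stage idea described above, for readers who want to see it outside of dask. The helper name `two_stage_index` is hypothetical and is not part of this PR; it handles only ints, slices, and at most one list.

```python
import numpy as np


def two_stage_index(x, index):
    """Index x with a tuple of ints, slices, and at most one list.

    Hypothetical helper for illustration only; not dask's implementation.
    """
    list_positions = [i for i, ind in enumerate(index) if isinstance(ind, list)]
    if not list_positions:
        return x[tuple(index)]
    if len(list_positions) > 1:
        raise NotImplementedError("this sketch handles a single list only")
    pos = list_positions[0]

    # Stage 1: ordinary slicing, with the list swapped for a full slice.
    stage1 = tuple(slice(None) if i == pos else ind for i, ind in enumerate(index))
    sliced = x[stage1]

    # Every int before the list drops an axis, so the list's axis shifts left.
    axis = pos - sum(isinstance(ind, int) for ind in index[:pos])

    # Stage 2: apply the list along the axis it ended up on.
    return np.take(sliced, index[pos], axis=axis)


x = np.arange(100).reshape((10, 10))
result = two_stage_index(x, (slice(0, 3), [1, 3, 8, 3]))
# result has rows [1, 3, 8, 3], [11, 13, 18, 13], [21, 23, 28, 23]
```

Applying the list in a separate step sidesteps NumPy's rule that moves advanced-index axes to the front when an int and a list are separated by a slice.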

Example

In [1]: from blaze import Data, compute, into

In [2]: import dask.array as da

In [3]: import numpy as np

In [4]: x = np.arange(100).reshape((10, 10))

In [5]: d = Data(into(da.Array, x, blockshape=(3, 3)))

In [6]: np.array(d[5:9, 1:9:2])  # could do this before
Out[6]: 
array([[51, 53, 55, 57],
       [61, 63, 65, 67],
       [71, 73, 75, 77],
       [81, 83, 85, 87]])

In [7]: np.array(d[0:3, [1, 3, 8, 3]])  # Now can do this
Out[7]: 
array([[ 1,  3,  8,  3],
       [11, 13, 18, 13],
       [21, 23, 28, 23]])

The actual dask graph looks like the following

In [8]: y = compute(d[0:3, [1, 3, 8, 3]])

In [14]: cull(y.dask, y.keys())
Out[14]: 
{'x_1': array([[ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9],
        [10, 11, 12, 13, 14, 15, 16, 17, 18, 19],
        [20, 21, 22, 23, 24, 25, 26, 27, 28, 29],
        [30, 31, 32, 33, 34, 35, 36, 37, 38, 39],
        [40, 41, 42, 43, 44, 45, 46, 47, 48, 49],
        [50, 51, 52, 53, 54, 55, 56, 57, 58, 59],
        [60, 61, 62, 63, 64, 65, 66, 67, 68, 69],
        [70, 71, 72, 73, 74, 75, 76, 77, 78, 79],
        [80, 81, 82, 83, 84, 85, 86, 87, 88, 89],
        [90, 91, 92, 93, 94, 95, 96, 97, 98, 99]]),
 ('slice-2', 0, 0): (<function operator.getitem>,
  ('x_1', 0, 0),
  (slice(0, 3, 1), slice(0, 3, 1))),
 ('slice-2', 0, 1): (<function operator.getitem>,
  ('x_1', 0, 1),
  (slice(0, 3, 1), slice(0, 3, 1))),
 ('slice-2', 0, 2): (<function operator.getitem>,
  ('x_1', 0, 2),
  (slice(0, 3, 1), slice(0, 3, 1))),
 ('x_1', 0, 0): (<function operator.getitem>,
  'x_1',
  (slice(0, 3, None), slice(0, 3, None))),
 ('x_1', 0, 1): (<function operator.getitem>,
  'x_1',
  (slice(0, 3, None), slice(3, 6, None))),
 ('x_1', 0, 2): (<function operator.getitem>,
  'x_1',
  (slice(0, 3, None), slice(6, 9, None))),
 ('x_4', 0, 0): (<function operator.getitem>,
  (<function numpy.core.multiarray.concatenate>,
   (list,
    [(<function operator.getitem>,
      ('slice-2', 0, 0),
      (slice(None, None, None), [1])),
     (<function operator.getitem>,
      ('slice-2', 0, 1),
      (slice(None, None, None), [0, 0])),
     (<function operator.getitem>,
      ('slice-2', 0, 2),
      (slice(None, None, None), [2]))]),
   1),
  (slice(None, None, None), (0, 1, 3, 1)))}
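To make the last task in that graph easier to read, here is a rough sketch of how the list `[1, 3, 8, 3]` could be split into per-block local indices plus a final reordering. The function name `split_list_by_block` and the use of `np.searchsorted` are assumptions made for illustration; the PR's own code may compute these differently. The numbers it produces match the graph: `[1]`, `[0, 0]`, `[2]`, then `(0, 1, 3, 1)`.

```python
import numpy as np


def split_list_by_block(index, blocksizes):
    """Split a list index into sorted per-block local indices and a reorder.

    Hypothetical helper mirroring the shape of the graph above.
    """
    index = np.asarray(index)
    sorted_index = np.sort(index)
    bounds = np.cumsum([0] + list(blocksizes))

    per_block = {}
    for b, (start, stop) in enumerate(zip(bounds[:-1], bounds[1:])):
        in_block = (sorted_index >= start) & (sorted_index < stop)
        local = sorted_index[in_block] - start  # positions within block b
        if len(local):
            per_block[b] = local.tolist()

    # Position of each requested index within the sorted, concatenated result.
    reorder = np.searchsorted(sorted_index, index).tolist()
    return per_block, reorder


blocksizes = (3, 3, 3, 1)  # a length-10 axis cut into blocks of 3
per_block, reorder = split_list_by_block([1, 3, 8, 3], blocksizes)

# Reassemble with plain NumPy to check the idea against direct indexing.
x = np.arange(100).reshape((10, 10))
rows = x[0:3]
bounds = np.cumsum((0,) + blocksizes)
pieces = [rows[:, bounds[b]:bounds[b + 1]][:, local]
          for b, local in sorted(per_block.items())]
result = np.concatenate(pieces, axis=1)[:, reorder]
```

Sorting first means each block is touched once and the pieces concatenate in block order; the reorder step then restores the order and duplicates the caller asked for.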

Some known problems

d[0] fails
Multiple lists fail (though I think that this is probably easy to fix in the Matlab style)
edge cases may fail
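
For reference on the multiple-lists item, a small NumPy sketch of the difference between NumPy's default pointwise behaviour and Matlab-style orthogonal indexing. It uses only `np.ix_` and plain indexing, and shows that applying one list at a time gives the orthogonal result. This is illustration only, not what the PR implements.

```python
import numpy as np

x = np.arange(100).reshape((10, 10))
rows, cols = [0, 2], [1, 3]

# NumPy pairs the lists up pointwise: elements (0, 1) and (2, 3).
pointwise = x[rows, cols]

# Orthogonal (Matlab-style) indexing takes every row/column combination.
orthogonal = x[np.ix_(rows, cols)]

# Applying one list per step, one axis at a time, gives the orthogonal result.
repeated = x[rows, :][:, cols]
```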

cc @nevermindewe @shoyer

mrocklin · 2015-02-03T03:05:15Z

Fixes #22

mrocklin · 2015-02-03T03:07:16Z

Also @shoyer, this brings us to the point where dask.array can successfully perform the following

In [19]: np.array(d[[5, 3, 0]].sum(axis=0))
Out[19]: array([ 80,  83,  86,  89,  92,  95,  98, 101, 104])

Which, I think, is likely sufficient for your common use cases.

shoyer · 2015-02-03T03:45:01Z

dask/array/core.py

Do you really want to explicitly restrict array indexing to lists?

Assuming numpy is a hard dep of dask (which I think it is?), I would rather cast anything that isn't an integer or slice to ndarray and then allow only 1d arrays of integers. For large arrays, using lists is going to be a bottleneck.

mrocklin · Feb 3, 2015

We can do both. I was just about at my limit for complexity while I was building this and didn't want to think about other cases. Both of those sound good though.

shoyer · 2015-02-03T03:46:35Z

Handling 1D boolean arrays is also pretty easy -- you can just convert them into integer arrays with np.nonzero.
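
A hedged sketch of what the two suggestions above might look like together: cast anything that is not an int or slice to an ndarray, accept only 1d arrays, and turn boolean masks into integer positions with `np.nonzero`. The function name `normalize_index` is hypothetical and nothing like it is claimed to exist in `dask/array/core.py`.

```python
import numpy as np


def normalize_index(ind):
    """Normalize one element of an index tuple.

    Hypothetical helper: ints and slices pass through, everything else
    becomes a 1d integer ndarray.
    """
    if isinstance(ind, (int, np.integer, slice)):
        return ind

    arr = np.asarray(ind)
    if arr.ndim != 1:
        raise IndexError("only 1d index arrays are supported in this sketch")

    if arr.dtype == bool:
        # np.nonzero returns one array per dimension; a 1d mask has one.
        return np.nonzero(arr)[0]

    if not np.issubdtype(arr.dtype, np.integer):
        raise IndexError("index arrays must hold integers or booleans")
    return arr


from_list = normalize_index([1, 3, 8, 3])
from_mask = normalize_index([True, False, True, False])
```

Working with ndarrays rather than lists keeps later steps such as sorting and splitting by block vectorized, which is the bottleneck concern raised above.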

mrocklin · 2015-02-04T00:34:58Z

I've handled the edge cases (I think). Merging.

This doesn't yet handle multi-list nor things like slicing with arrays.

mrocklin added 9 commits February 2, 2015 15:12

remove comments

fe67b34

remove more comments

17cbf02

first draft of take

b4a2c4a

s/operator.getitem/getitem

980da44

rewrite inline_functions to use inline

162f91e

(and also support nested tasks)

fancy slicing with lists

ad64731

inline_functions doesn't inline if no keys

2455e52

cull supports nested keys

0cb74de

py26 compatibility

a5ad81d

shoyer reviewed Feb 3, 2015

mrocklin added 6 commits February 3, 2015 14:58

fix doctests

7a52686

Array.blockdims set correctly in uneven slices

8306cf1

slicing with lost axes and lists

0073893

Merge branch 'master' into more-slicing

b5351ce

Conflicts: dask/tests/test_core.py

inline_functions doesn't inline constants

7f9a5a7

py26

106ef5d

mrocklin added a commit that referenced this pull request Feb 4, 2015

Merge pull request #26 from mrocklin/more-slicing

7107fc7

Fancy slicing with lists

mrocklin merged commit 7107fc7 into dask:master Feb 4, 2015

mrocklin deleted the more-slicing branch February 4, 2015 00:49

mrocklin pushed a commit to mrocklin/dask that referenced this pull request Mar 28, 2019

Merge pull request dask#26 from TomAugspurger/xgboost-num-boost-round

6d41590

Set num_boost_round

rgommers mentioned this pull request Jul 7, 2022

Add take specification for returning elements of an array along a specified axis data-apis/array-api#416

Merged

phofl pushed a commit to phofl/dask that referenced this pull request Dec 23, 2024

Always optimize (dask#26)

d47f0c7

mrocklin merged 15 commits into dask:master from mrocklin:more-slicing
