Skip to content

MAINT: Work with unicode strings in dtype('i8,i8')#15261

Merged
mattip merged 1 commit intonumpy:masterfrom
eric-wieser:unicode-commastring
Jan 11, 2020
Merged

MAINT: Work with unicode strings in dtype('i8,i8')#15261
mattip merged 1 commit intonumpy:masterfrom
eric-wieser:unicode-commastring

Conversation

@eric-wieser
Copy link
Member

Right now, we convert str objects to bytes, and then work with those.

Since this is a human convenience API, the input really ought to be a string.
A future patch will suggest deprecating dtype(b'i8,i8'), but for now it will continue to work.

Should fix the CI failures in #15249

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is what motivated #15254

Copy link
Member

@seberg seberg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, nice. I think in conversion_utils.py there is another bunch of these were the logic should be reversed like here. I think this can change the error type in very weird, but it seems Unicode*Error inherits from ValueError so nothing really changes, and quite honestly it would be very weird cases in any case.

@charris
Copy link
Member

charris commented Jan 8, 2020

Needs a rebase.

@eric-wieser
Copy link
Member Author

Will rebase on #15310 once that goes in

Right now, we convert `str` objects to `bytes`, and then work with those.

Since this is a human convenience API, the input really ought to be a string.
A future patch will suggest deprecating `dtype(b'i8,i8')`, but for now it will continue to work.
@mattip mattip merged commit c9fd0e7 into numpy:master Jan 11, 2020
@mattip
Copy link
Member

mattip commented Jan 11, 2020

Thanks @eric-wieser

@eric-wieser
Copy link
Member Author

eric-wieser commented Jan 11, 2020

@seberg: Mind elaborating on that with a link to a line?

@seberg
Copy link
Member

seberg commented Jan 12, 2020

I think I thought of invalid unicode input. But the change of error type doesn't happen of course. The error message just improved a bit for np.dtype("ä").

@eric-wieser
Copy link
Member Author

I think in conversion_utils.py there is another bunch of thes

I meant regarding this, sorry for being unclear

@seberg
Copy link
Member

seberg commented Jan 12, 2020

Ah, I meant this type of thing:
https://github.com/numpy/numpy/blob/master/numpy/core/src/multiarray/conversion_utils.c#L604-L611

I think most converters that accept strings will convert unicode to ascii and then make a recursive call.

@eric-wieser
Copy link
Member Author

@seberg: Looks like I ended up addressing that in #16008.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants