Open
Description
Many of the C functions in numpy have code that is something like:
    PyObject *foo(PyObject *obj) {
        if (PyUnicode_Check(obj)) {
            /* accept unicode input by encoding it and recursing on the bytes */
            PyObject *obj_bytes = PyUnicode_AsASCIIString(obj);
            if (obj_bytes == NULL) {
                return NULL;
            }
            PyObject *ret = foo(obj_bytes);
            Py_DECREF(obj_bytes);
            return ret;
        }
        char *str = NULL;
        Py_ssize_t length = 0;
        if (PyBytes_AsStringAndSize(obj, &str, &length) < 0) {
            return NULL;
        }
        /* work with bytes */
    }

With this construction, some illegal inputs (any unicode string containing non-ASCII characters) raise UnicodeEncodeError, which is often less useful than the error message the function would normally produce.
It would be better to work with the unicode objects directly, by copying the approach taken in the C code changes in #15261, notably these lines and these lines.
Spawned from discussion in #15261 (comment)