
Clean up internal callers of PyUnicode_AsASCIIString #15317

@eric-wieser

Description


Many of the C functions in numpy have code that is something like:

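```c
#define PY_SSIZE_T_CLEAN
#include <Python.h>

PyObject *foo(PyObject *obj) {
    if (PyUnicode_Check(obj)) {
        /* accept unicode input by encoding it to ASCII bytes */
        PyObject *obj_bytes = PyUnicode_AsASCIIString(obj);
        if (obj_bytes == NULL) {
            /* non-ASCII input: UnicodeEncodeError is already set */
            return NULL;
        }
        PyObject *ret = foo(obj_bytes);
        Py_DECREF(obj_bytes);
        return ret;
    }

    char *str = NULL;
    Py_ssize_t length = 0;
    if (PyBytes_AsStringAndSize(obj, &str, &length) < 0) {
        return NULL;
    }

    /* work with the bytes in str[0..length), then return a new reference */
    Py_RETURN_NONE;  /* placeholder result */
}
```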

This construction means they raise UnicodeEncodeError for illegal inputs that contain non-ASCII characters, which is often less useful than the error message the function would otherwise produce.

It would be better to work with the unicode objects directly, by copying the approach taken in the C code changes in #15261, notably these lines and these lines.


Spawned from discussion in #15261 (comment)
