improve/fix glue job logs printing #30886
ferruzzi
left a comment
Glue logging is a mess, I like this solution. Left a couple comments.
Even if this is closed, I would like to make a note on Glue logging.

import logging
import sys
from typing import Any, Optional

DEFAULT_LOG_FORMAT = "%(levelname)s:%(name)s:%(module)s:%(message)s"


def get_logger(name: Optional[str] = None, level: Any = logging.INFO, log_format: str = DEFAULT_LOG_FORMAT) -> logging.Logger:
    """Returns a logger configured for Glue jobs"""
    formatter = logging.Formatter(fmt=log_format)
    # glue sets its own handlers by default, but they suck.
    # this handler redirects INFO, WARNING and DEBUG to sys.stdout
    stdout_handler = logging.StreamHandler(sys.stdout)
    stdout_handler.setLevel(logging.DEBUG)
    stdout_handler.addFilter(lambda record: record.levelno < logging.ERROR)
    stdout_handler.setFormatter(formatter)
    # this handler redirects ERROR to sys.stderr
    stderr_handler = logging.StreamHandler(sys.stderr)
    stderr_handler.setLevel(logging.ERROR)
    stderr_handler.setFormatter(formatter)
    logger = logging.getLogger(name=name)
    logger.handlers.clear()
    logger.setLevel(level)
    logger.addHandler(stdout_handler)
    logger.addHandler(stderr_handler)
    return logger

In effect, this will log all INFO, WARNING and DEBUG to /output and all ERROR to /error.
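For anyone who wants to check the routing locally without a Glue job, here is a minimal sketch of the same level-split idea using in-memory streams instead of sys.stdout and sys.stderr. The name `build_split_logger`, the shortened format string, and the `propagate = False` line are assumptions made for this demo; they are not part of the helper above.

```python
import io
import logging


def build_split_logger(name, out_stream, err_stream, level=logging.DEBUG):
    """Hypothetical variant of the helper that takes explicit streams."""
    formatter = logging.Formatter("%(levelname)s:%(name)s:%(message)s")

    # Records below ERROR go to the "stdout" stream.
    out_handler = logging.StreamHandler(out_stream)
    out_handler.setLevel(logging.DEBUG)
    out_handler.addFilter(lambda record: record.levelno < logging.ERROR)
    out_handler.setFormatter(formatter)

    # ERROR and above go to the "stderr" stream.
    err_handler = logging.StreamHandler(err_stream)
    err_handler.setLevel(logging.ERROR)
    err_handler.setFormatter(formatter)

    logger = logging.getLogger(name)
    logger.handlers.clear()
    logger.setLevel(level)
    # Assumption for the demo: do not forward records to root handlers.
    logger.propagate = False
    logger.addHandler(out_handler)
    logger.addHandler(err_handler)
    return logger


out, err = io.StringIO(), io.StringIO()
log = build_split_logger("glue_demo", out, err)
log.info("job started")
log.warning("slow partition")
log.error("write failed")

stdout_text = out.getvalue()  # INFO and WARNING lines only
stderr_text = err.getvalue()  # ERROR line only
```

Each record lands in exactly one stream, because the filter on the first handler rejects everything at ERROR or above and the level on the second handler rejects everything below it.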
@IAL32 Neat. Where does that helper live? Is it added to the script that the job executes?
Exactly. As far as I know, this is the only way to get logging to work on Glue Jobs. Also note: when grabbing the root logger ( |
There were several problems with the current implementation:
Regarding the fact that we now display both streams, I hesitated between interleaving messages from both, sorted by timestamp, and leaving them separated. I ended up keeping them separated so the experience stays consistent with what the user would see in CloudWatch, but I'd be happy to change that to chronological order if people think it's better.
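To make the trade-off concrete, here is a rough sketch of the two display options. It is not the code from this PR: the `(timestamp, message)` tuple shape, the function names, and the `[output]` / `[error]` section labels are all assumptions chosen for illustration.

```python
import heapq
from typing import List, Tuple

# Hypothetical event shape: (timestamp in ms, message).
LogEvent = Tuple[int, str]


def separated(output_events: List[LogEvent], error_events: List[LogEvent]) -> List[str]:
    """Print one stream after the other, each under its own label."""
    lines = ["[output]"]
    lines += [msg for _, msg in output_events]
    lines.append("[error]")
    lines += [msg for _, msg in error_events]
    return lines


def interleaved(output_events: List[LogEvent], error_events: List[LogEvent]) -> List[str]:
    """Merge both streams into a single chronological listing."""
    # heapq.merge assumes each input is already sorted by timestamp.
    merged = heapq.merge(output_events, error_events, key=lambda event: event[0])
    return [msg for _, msg in merged]


output_events = [(1, "INFO job started"), (3, "INFO wrote 10 rows")]
error_events = [(2, "ERROR retrying write")]

separated_view = separated(output_events, error_events)
interleaved_view = interleaved(output_events, error_events)
```

The separated view keeps each stream contiguous, as a user would see when opening the two log groups one at a time. The interleaved view puts the error between the two INFO lines, which is easier to read as a timeline but no longer shows which stream a message came from unless it is labeled.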
Also: added it to the system test (for better visibility for users) and added a unit test.
cc @ferruzzi