Description
Using a Python gunicorn/Flask service to pull streaming data and publish it to Pub/Sub, running in a pod on GKE, I receive errors regardless of batch size settings (including the default).
With both the pubsub and pubsub_v1 PublisherClient, certain sets of data and default or larger batch sizes (>1000) produce:
"Retrying due to 504 Deadline Exceeded, sleeping 0.0s ..."
until, ultimately, I receive the following error in the done callback I added:
Deadline of 60.0s exceeded while calling functools.partial(<function _wrap_unary_errors.<locals>.error_remapped_callable at 0x7f869c25ae50>
In other cases there are no logs at all; the gunicorn worker just restarts silently. With smaller batch sizes (in the low hundreds), I typically see no indication of an error, but the worker still fails and restarts silently, and does so in under 60s.
This occurs regardless of whether I'm running many publish workers or just one.
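To show where that error surfaces: publisher.publish() returns a future, and the callback sees the failure only when it calls .result(). This is a minimal sketch of that pattern using a stdlib Future in place of the real publish future (the actual client needs credentials and a topic); the message ID and error text are made up for illustration.

```python
# Illustration of the done-callback pattern: a stdlib Future stands in
# for the future returned by publisher.publish().
from concurrent.futures import Future

results = []

def on_done(fut):
    # Publish futures resolve to a message ID on success; calling
    # .result() after a failure re-raises the exception (e.g. the
    # DeadlineExceeded seen in the logs above).
    try:
        results.append(("ok", fut.result()))
    except Exception as exc:
        results.append(("error", type(exc).__name__))

ok = Future()
ok.add_done_callback(on_done)
ok.set_result("msg-id-123")  # simulated successful publish

failed = Future()
failed.add_done_callback(on_done)
failed.set_exception(TimeoutError("Deadline of 60.0s exceeded"))  # simulated timeout

print(results)
```

With this wiring, a silent worker restart before the callback fires would explain the cases where no error is ever logged.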
The GKE cluster is on a shared private VPC and is using workload identity.
It does seem to occur only with specific datasets, but there is no clear reason why those datasets would cause an error like this.
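For reference, this is roughly how the publisher is configured (a configuration sketch, not the exact service code; the batch values shown are placeholders for the sizes mentioned above):

```python
from google.cloud import pubsub_v1

# Batch settings control when a batch is flushed; the errors occur
# with the defaults as well as with larger max_messages values.
batch_settings = pubsub_v1.types.BatchSettings(
    max_messages=1000,      # flush after this many messages
    max_bytes=1024 * 1024,  # ... or this many bytes
    max_latency=0.05,       # ... or this many seconds
)
publisher = pubsub_v1.PublisherClient(batch_settings=batch_settings)
topic_path = publisher.topic_path("my-project", "my-topic")  # placeholder names

future = publisher.publish(topic_path, data=b"payload")
future.add_done_callback(lambda f: print(f.exception() or f.result()))
```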
This is the output of pip3 freeze:
cachetools==4.1.0
certifi==2020.4.5.2
chardet==3.0.4
click==7.1.2
Flask==1.1.2
gevent==20.6.2
google-api-core==1.20.1
google-auth==1.17.2
google-cloud-bigquery==1.25.0
google-cloud-core==1.3.0
google-cloud-pubsub==1.6.0
google-resumable-media==0.5.1
googleapis-common-protos==1.52.0
greenlet==0.4.16
grpc-google-iam-v1==0.12.3
grpcio==1.29.0
gunicorn==20.0.4
idna==2.9
itsdangerous==1.1.0
Jinja2==2.11.2
MarkupSafe==1.1.1
protobuf==3.12.2
pyasn1==0.4.8
pyasn1-modules==0.2.8
pytz==2020.1
requests==2.23.0
rsa==4.6
six==1.15.0
urllib3==1.25.9
Werkzeug==1.0.1
zope.event==4.4
zope.interface==5.1.0