Conversation
96cecb2 to
72d9ac9
Compare
YifeiZhuang
left a comment
There was a problem hiding this comment.
This is good overall. I finally start to understand the spirit of it.
709026a to
0c1eef8
Compare
af33b25 to
d8f7094
Compare
|
Fallback is ready for review again. |
YifeiZhuang
left a comment
There was a problem hiding this comment.
Sending what I have, mostly minor, I haven't looked deep enough.
This comment was marked as resolved.
This comment was marked as resolved.
larry-safran
left a comment
There was a problem hiding this comment.
Things done in commit 003348b
f186005 to
f304264
Compare
| subscriber.controlPlaneClient.adjustResourceSubscription(type); | ||
|
|
||
| CpcWithFallbackState cpcToUse = manageControlPlaneClient(subscriber); | ||
| if (cpcToUse.cpc != null) { |
There was a problem hiding this comment.
If cpc is null here because all cpcs are failing, then the subscriber won't be notified with an error. Although, is that preexisting? If it is preexisting, we can resolve that as a separate PR.
There was a problem hiding this comment.
When the retry timer goes off and the CPC tries to reconnect, if it fails, then handleStreamClosed will call onError (the same as previously done). It seems like we shouldn't continue sending errors for every backoff, but that should probably be addressed in a separate PR.
There was a problem hiding this comment.
Yes, I had considered that, but retries can take 2 minutes. It is known to be failing at this very moment, so we shouldn't delay an error.
I think it is proper to update the error each backoff; the error can change over time.
There was a problem hiding this comment.
If we don't care about an extra call to onError, then it trivially becomes adding a call to subscriber.onError in the catch (IOException) of manageControlPlaneClient
There was a problem hiding this comment.
Why manageControlPlaneClient(), which I've already said is broken to pass a subscriber, when we could just call onError() directly here? Nothing actually knows the error more than "no working cpcs." I was most interested in having an actionable error message, but having a call to onError() here would be fine except...
... we'd have to be careful about it to avoid #11672 . Let's not worry about it in this PR, because it is pre-existing, broken in multiple ways, and fallback is enough trouble by itself.
| subscriber.controlPlaneClient.adjustResourceSubscription(type); | ||
|
|
||
| CpcWithFallbackState cpcToUse = manageControlPlaneClient(subscriber); | ||
| if (cpcToUse.cpc != null) { |
There was a problem hiding this comment.
Yes, I had considered that, but retries can take 2 minutes. It is known to be failing at this very moment, so we shouldn't delay an error.
I think it is proper to update the error each backoff; the error can change over time.
|
The `FailingXdsTransport` seems like the cleanest approach, though I think
we need a `FailingXdsStreamingCall` to go with it.
…On Wed, Nov 27, 2024 at 2:43 PM Eric Anderson ***@***.***> wrote:
***@***.**** commented on this pull request.
------------------------------
In xds/src/main/java/io/grpc/xds/client/XdsClientImpl.java
<#11254 (comment)>:
> + cleanUpResourceTimers(cpcClosed);
+
+ if (status.isOk()) {
+ return; // Not considered an error
+ }
+
+ Collection<String> authoritiesForClosedCpc = getActiveAuthorities(cpcClosed);
+ for (Map<String, ResourceSubscriber<? extends ResourceUpdate>> subscriberMap :
+ resourceSubscribers.values()) {
+ for (ResourceSubscriber<? extends ResourceUpdate> subscriber : subscriberMap.values()) {
+ if (subscriber.hasResult() || !authoritiesForClosedCpc.contains(subscriber.authority)) {
+ continue;
+ }
+
+ // try to fallback to lower priority control plane client
+ if (shouldTryFallback && manageControlPlaneClient(subscriber).didFallback) {
In getOrCreateControlPlaneClient(String) we need to catch
IllegalArgumentException from getOrCreateControlPlaneClient(ServerInfo),
because we wouldn't want failure to create one transport to prevent us from
trying other servers.
But handling that error is tricky. If we log and keep going then
addCpcToAuthority() will add null to the activateCpClients authority
list. We'd need to put protections everywhere it is used. I think we'd be
better off making FailingXdsTransport (I know, I know. Hide your surprise).
The main issue I see is we probably shouldn't call eventHandler until
StreamingCall.start() returns, which requires a separate thread. If we
move the IllegalStateException handling into GrpcXdsTransport then we could
use GrpcUtil.SHARED_CHANNEL_EXECUTOR.
Ideas?
—
Reply to this email directly, view it on GitHub
<#11254 (comment)>, or
unsubscribe
<https://github.com/notifications/unsubscribe-auth/AZQMCXTJJTCVPTBJHCVVC2L2CZDKJAVCNFSM6AAAAABIXKY5WKVHI2DSMVQWIX3LMV43YUDVNRWFEZLROVSXG5CSMV3GSZLXHMZDINRWGA2TAMJYGQ>
.
You are receiving this because you authored the thread.Message ID:
***@***.***>
|
Yeah, it implies a |
ejona86
left a comment
There was a problem hiding this comment.
I agree we don't need an executor in the FailingXdsTransport because the synchronization context will prevent reentrancy.
I'm still not all that comfortable with manageControlPlaneClient() being passed a Subscriber. But it looks like it will behave okay, even if it is misleading, so this can go in without that changed.
f13ebe9 to
3ee25db
Compare
Implementation of https://github.com/grpc/proposal/blob/master/A71-xds-fallback.md in Java