Skip to content

Fix for cloud fetch#362

Closed
andrefurlan-db wants to merge 2 commits intodatabricks:mainfrom
andrefurlan-db:3.1.1
Closed

Fix for cloud fetch#362
andrefurlan-db wants to merge 2 commits intodatabricks:mainfrom
andrefurlan-db:3.1.1

Conversation

@andrefurlan-db
Copy link
Contributor

@andrefurlan-db andrefurlan-db commented Feb 21, 2024

  • Throw when failed to download file

  • Retry properly while downloading file

  • Add a bunch of debug logs

  • Prevent thread issues

TODO: http connection pools for cloud storage, proxies, etc.

Also backported to version 2

* fixes for cloud fetch

Signed-off-by: Andre Furlan <andre.furlan@databricks.com>
---------

Signed-off-by: Andre Furlan <andre.furlan@databricks.com>
Co-authored-by: Raymond Cypher <raymond.cypher@databricks.com>
# Download was not successful for next download item. Fail
self._shutdown_manager()
return None
raise ResultSetDownloadError(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Per the change in the comment above, there is no retry attempted?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or is it just handled by raising the exception?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the retry is done outside this function, closer to the actual http request

Signed-off-by: Andre Furlan <andre.furlan@databricks.com>
return uncompressed_data


def http_get_with_retry(url, max_retries=5, backoff_factor=2, download_timeout=60):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why are we implementing retry behavior here rather than using a Retry passed to the session?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agreed. It is in the TODO to also have connection pools

Copy link
Collaborator

@benc-db benc-db left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approve, but consider if we can implement with urllib3 Retry.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants

Comments