@shlomi-noach
Contributor

Storyline: #30

This is the first, and potentially the last, change needed to solve the out-of-order issue.
The change is to no longer run the DML event funcs asynchronously. They were originally made asynchronous back when the heartbeat mechanism also relied on binlog events. That turned out to be the wrong implementation, and heartbeat now uses plain old table queries.

There is a buffer of 100 event entries that allows a queue of DML events to build up. See https://github.com/github/gh-osc/blob/master/go/logic/migrator.go#L34 and https://github.com/github/gh-osc/blob/master/go/logic/migrator.go#L69

So up to 100 unhandled DML events can still build up before they start blocking other events, such as status entries. At this point I'm not at all sure we should care about DML events blocking other events. Running with this small change.
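To illustrate the idea, here is a minimal sketch of a buffered queue of event funcs drained by a single consumer, so that each func runs to completion before the next one starts. It is not the code in `migrator.go`: the names `eventQueueSize`, `applyEventsInOrder` and `recordOrder` are hypothetical, and only the buffer size of 100 is taken from the description above.

```go
package main

import "fmt"

// eventQueueSize mirrors the buffer of 100 entries mentioned above.
// The name is hypothetical and used only for this sketch.
const eventQueueSize = 100

// applyEventsInOrder enqueues every event func on a buffered channel and runs
// them from a single consumer goroutine, one at a time, in enqueue order.
// It returns the first error encountered, if any.
func applyEventsInOrder(events []func() error) error {
	queue := make(chan func() error, eventQueueSize)
	done := make(chan error, 1)

	go func() {
		var firstErr error
		for f := range queue {
			if firstErr != nil {
				continue // keep draining so the producer is never blocked forever
			}
			firstErr = f() // runs synchronously: no per-event goroutine
		}
		done <- firstErr
	}()

	for _, f := range events {
		queue <- f // blocks only once eventQueueSize funcs are pending
	}
	close(queue)
	return <-done
}

// recordOrder builds n event funcs that each record their own index and
// returns the order in which they were actually applied.
func recordOrder(n int) []int {
	var applied []int
	events := make([]func() error, 0, n)
	for i := 0; i < n; i++ {
		i := i
		events = append(events, func() error {
			applied = append(applied, i)
			return nil
		})
	}
	if err := applyEventsInOrder(events); err != nil {
		return nil
	}
	return applied
}

func main() {
	fmt.Println(recordOrder(5))
}
```

Because only one goroutine ever executes the funcs, their relative order is preserved even when more than 100 of them are produced; the producer simply waits for room in the queue.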

@shlomi-noach
Contributor Author

Solved various race conditions:

- Operation would terminate after the events lock was noticed but before all events were applied: a race condition where the event would be captured asynchronously. The event is now handled sequentially with the DML events, hence now safe.
- Multiple rowcopy operations would still write to the `rowCopyComplete` channel. This is still the case, but now we only wait for the first and then just flush (read and discard) any others, to avoid blocking.
- Events DML listener is only added after table creation: the problem was that with very busy tables, the events func buffer would fill up, and the "tables-created" event would be blocked.
- `waitForEventsUpToLock()` unifies the waiting on all variants of complete-migration.
- With `--test-on-replica`, now stopping replication "nicely", using `master_pos_wait()`.
- With `--test-on-replica`, not throttling on replication after replication is stopped (duh).
- More debug output.
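The "wait for the first, then flush the rest" handling of `rowCopyComplete` can be sketched roughly as follows. This is an illustration under assumptions rather than the migrator's actual code: the helper names `waitForFirstThenDrain` and `simulateRowCopy` are hypothetical, and the channel is assumed to be unbuffered.

```go
package main

import (
	"fmt"
	"sync"
)

// waitForFirstThenDrain blocks until the first completion signal arrives, then
// keeps reading and discarding later signals in the background so that
// additional writers are never left blocked on their send.
func waitForFirstThenDrain(rowCopyComplete <-chan bool) {
	<-rowCopyComplete
	go func() {
		for range rowCopyComplete {
			// discard
		}
	}()
}

// simulateRowCopy starts several writers that all report completion on the
// same unbuffered channel and returns how many of them managed to finish.
// It requires writers >= 1, otherwise the initial wait never returns.
func simulateRowCopy(writers int) int {
	rowCopyComplete := make(chan bool) // unbuffered: a send blocks until read
	var wg sync.WaitGroup
	var mu sync.Mutex
	finished := 0

	for i := 0; i < writers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			rowCopyComplete <- true
			mu.Lock()
			finished++
			mu.Unlock()
		}()
	}

	waitForFirstThenDrain(rowCopyComplete)
	wg.Wait()              // would hang here if the extra signals were not drained
	close(rowCopyComplete) // safe: every writer has finished; stops the drain loop
	return finished
}

func main() {
	fmt.Println(simulateRowCopy(4))
}
```

Without the background drain, every writer after the first would stay blocked on its send, which is the blocking the list item above describes avoiding.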

@shlomi-noach shlomi-noach merged commit 92d09db into master May 16, 2016
@shlomi-noach shlomi-noach deleted the fix-out-of-order-dml-apply branch May 16, 2016 09:04
timvaillancourt pushed a commit to timvaillancourt/gh-ost that referenced this pull request Aug 4, 2022
Support zero date and zero in date, via dedicated command line flag
timvaillancourt added a commit that referenced this pull request Aug 10, 2022
* Merge pull request #31 from openark/zero-date

Support zero date and zero in date, via dedicated command line flag

* Merge pull request #32 from openark/existing-date-with-zero

Support tables with existing zero dates

* Remove un-needed ignore_versions file

* Fix new lint errors from golang-ci update

Co-authored-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>