Audio repetition detection in audio blogs

My goal is to detect repeated parts in a private audio blog, as in

I recently talked about … strike that … I recently discovered… no… I … umm … er … Yes, I take that … I recently discovered that a population increase due to a decrease of environmental pollution…

to remove

I recently talked about … strike that … I recently discovered… no… I … umm … er … Yes, I take that …

from it automatically.

I have tried several machine learning approaches, following a pipeline of the form audio file -> … -> machine learning -> … -> cutlist, and ran into some problems.

  1. Where do I cut the audio file? My NVIDIA GPU has only 2 GB of memory, so I cannot load much audio onto it at once. To find repetitions I could look at an arbitrary window that still fits into memory. But what if the repetition lies just outside this window? And what if the repetition is intentional, as in “How can we accomplish X, how can we accomplish Y?” Should I cut at areas below a volume threshold?
  2. What platform should it run on? I use Audacity to cut audio files, so a Nyquist plug-in seems most suitable, but as far as I know there is no TensorFlow <-> Audacity bridge based on Nyquist plug-ins. I did find mod-script-pipe, which seems to provide such a bridge.
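
To make the volume-threshold idea from point 1 concrete, here is a minimal sketch of how candidate cut positions could be found in quiet stretches. It is only an illustration under assumptions: the samples are taken to be mono floats in [-1, 1], and the function names as well as the `frame_len`, `threshold` and `min_frames` values are hypothetical and would need tuning for real recordings. It uses plain Python for clarity; for long files an array library would be much faster.

```python
import math

def frame_rms(samples, frame_len):
    """Return the RMS level of each consecutive, non-overlapping frame."""
    levels = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        levels.append(math.sqrt(sum(x * x for x in frame) / frame_len))
    return levels

def find_quiet_regions(samples, frame_len, threshold, min_frames):
    """Return (start_sample, end_sample) spans whose RMS stays below
    `threshold` for at least `min_frames` consecutive frames."""
    regions = []
    run_start = None
    levels = frame_rms(samples, frame_len)
    # The infinite sentinel closes a quiet run that reaches the end of the signal.
    for i, level in enumerate(levels + [float("inf")]):
        if level < threshold:
            if run_start is None:
                run_start = i
        elif run_start is not None:
            if i - run_start >= min_frames:
                regions.append((run_start * frame_len, i * frame_len))
            run_start = None
    return regions

def split_points(regions):
    """Use the midpoint of each quiet region as a candidate cut position."""
    return [(start + end) // 2 for start, end in regions]

# Synthetic demo (assumed 8 kHz sample rate): 1 s tone, 0.5 s silence, 1 s tone.
tone = [math.sin(2 * math.pi * 440 * n / 8000) for n in range(8000)]
silence = [0.0] * 4000
signal = tone + silence + tone

quiet = find_quiet_regions(signal, frame_len=400, threshold=0.05, min_frames=5)
cuts = split_points(quiet)
print(quiet, cuts)
```

Cutting at such pauses keeps each chunk small enough for limited GPU memory, but it does not by itself solve the case where a repetition spans two chunks; overlapping neighbouring chunks would be one possible way to reduce that risk.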