tl;dr: No, it's not just you. If you switch over to the Raw edition and seek to those points you'll notice the audio is clean. The audio artifacting is probably due to lazy encoding.
The "I love to hear myself type" explanation:
The type of switching I attempted is a bit different from what other fansubbers are doing with ordered chapters. I've read about people using ordered chapters with segment linking to make a continuous "playlist", and for OP-/ED-less playback, but I think this experiment is unique: I tried to provide some of the benefits of hardsubbing and softsubbing in the same file. Somewhere along the line I realized this was a Bad Idea (tm) and still decided to take it about 3 steps further.
A little background: the pieces stitched together by ordered chapters in Matroska all have to contain the same number of tracks, and those tracks all have to use the same codecs. You can't, as far as I know, create a virtual timeline with some video segments containing no audio and overlay an audio track with the real timeline at playback.
To create the real timeline I simply trimmed the sections that needed to be hardsubbed and spliced them onto the end of the video, after the raw frames. In most cases, I tried to cut at points where there is either almost no sound (op/ed lead-in/-out) or a lot of noise (metamorphose, attacks). It doesn't work out in practice, probably because of the way Haali's splitter buffers audio (or something more complex than I want to look into). In other words, I have no idea, and I didn't research it further because I was hoping either nobody would notice or nobody would be downloading these eps.
The artifacting might be reduced by trimming more frames before the required switch point, but then again that might just shift the moment when you hear that annoying click. Maybe future updates in splitter technology will alleviate or eliminate the problem. In the meantime, I just finished a program to generate chapter timecodes directly from frame numbers, which might make audio switching more precise. Alternatively, it could make the problems worse.
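For anyone curious what a frame-number-to-timecode conversion might look like, here is a minimal Python sketch. It is not the actual program mentioned above. It assumes a constant frame rate of 24000/1001 fps (not stated anywhere in this post), rounds down to the nanosecond, and the function names are made up for illustration. The output uses the HH:MM:SS.nnnnnnnnn form that Matroska chapter files take for start and end times.

```python
from fractions import Fraction

# Assumption: constant 24000/1001 fps source. Swap in the real rate if it differs.
FPS = Fraction(24000, 1001)
NS_PER_SECOND = 1_000_000_000


def frame_to_timecode(frame, fps=FPS):
    """Return the start time of a 0-based frame as HH:MM:SS.nnnnnnnnn."""
    if frame < 0:
        raise ValueError("frame must be non-negative")
    # Exact integer arithmetic: time = frame / fps, rounded down to the nanosecond.
    total_ns = frame * NS_PER_SECOND * fps.denominator // fps.numerator
    seconds, ns = divmod(total_ns, NS_PER_SECOND)
    minutes, seconds = divmod(seconds, 60)
    hours, minutes = divmod(minutes, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}.{ns:09d}"


def chapter_range(first_frame, last_frame, fps=FPS):
    """Return (start, end) timecodes covering first_frame..last_frame inclusive.

    The end time is taken as the start of the frame after last_frame.
    """
    if last_frame < first_frame:
        raise ValueError("last_frame must not be before first_frame")
    return frame_to_timecode(first_frame, fps), frame_to_timecode(last_frame + 1, fps)
```

Calling `chapter_range(0, 2399)` would give the start and end times for a chapter covering the first 2400 frames. Whether nanosecond-exact chapter boundaries actually help with the click is a separate question, since the splitter still has to cut the audio somewhere.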
Interesting note: the audio bitrate is higher simply because I didn't want to mess around with audio settings and 128k was the default. This means that to achieve the same filesize as the official release, the video is actually lower bitrate (even lower than you'd think, because there's an extra ~3.5 minutes of video/audio to encode). Thanks to the magic of x264, there's little to no quality loss overall compared to the Xvid encode. I could probably get away with lowering the audio bitrate too, since AAC is also a better codec than MP3.
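To put rough numbers on that trade-off, here is a small Python sketch of the bitrate budget. The file size and episode length in the usage note are invented for illustration and are not the real figures for this release. Only the 128k audio and the extra ~3.5 minutes come from the post, and container overhead is ignored.

```python
def video_bitrate_kbps(target_bytes, duration_s, audio_kbps):
    """Average video bitrate (kbit/s) left after audio for a target file size.

    Ignores container overhead, so treat the result as an upper bound.
    """
    if duration_s <= 0:
        raise ValueError("duration_s must be positive")
    total_kbps = target_bytes * 8 / 1000 / duration_s
    return total_kbps - audio_kbps
```

With a hypothetical 180,000,000-byte target and a 24-minute episode, `video_bitrate_kbps(180_000_000, 24 * 60, 128)` leaves 872 kbit/s for video. Adding the extra 3.5 minutes of spliced footage (`24 * 60 + 210` seconds) at the same target size drops that to roughly 745 kbit/s, which is the "even lower than you'd think" part.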