We'll have to dig into this sometime. For the right-click menu to work, each paragraph needs to be uniquely identified by the TeX code in a way that the viewer can read from the .parlocs file and understand. ESBs and figure captions probably add several layers of confusion, both to the bit that does the identifying and to the bit that tries to interpret it.
I was going to write that 'automatic sidebars' (cat:headingsbox|esb / cat:titlebox|esb) are handled by the same code, so there's not much point in trying them. However, if I remember correctly, there are some subtle differences in the sequencing. I think they generate a normal heading block and then put that inside a sidebar. That means you'll never get something like cat:headingsbox|s1 to work, whereas in your example cat:middle|s1 would be valid styling.
So it might be worth trying if you're feeling adventurous. All this is from months-old memory, though, without re-reading the code, so I could be sending you on a wild goose chase.