Lockless Queue Second Draft

Last week I wrote a first draft of a lockless queue for my event system. This week I wrote a second draft, and I'm pretty happy with it. I'll likely write one more version, but it will be similar to this one. Since Bold will have a lot of events and asynchronous code, it's important that I get this right. Some features are

I didn't implement work stealing. I want to write real code first so I can figure out whether the best strategy is to steal one message (from the end) or a group of messages, and whether the message type affects the optimal way of stealing work.
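
To make the two options concrete, here is a minimal Python sketch of the stealing policies only. It is single-threaded and not lockless, so it says nothing about how the real queue would synchronize a steal. The names steal_one and steal_batch and the fraction parameter are hypothetical, picked for illustration, and are not part of the queue described here.

```python
from collections import deque


def steal_one(victim: deque) -> list:
    """Policy A: take a single message from the end of the victim's queue."""
    return [victim.pop()] if victim else []


def steal_batch(victim: deque, fraction: float = 0.5) -> list:
    """Policy B: take a group of messages from the end of the victim's queue.

    The group size is an assumption: a fraction of the current length,
    but at least one message when the queue is not empty.
    """
    if not victim:
        return []
    count = max(1, int(len(victim) * fraction))
    stolen = [victim.pop() for _ in range(count)]
    stolen.reverse()  # restore the order the messages had in the victim's queue
    return stolen


victim = deque(range(8))      # messages 0..7, newest at the end
one = steal_one(victim)       # removes only the newest message
batch = steal_batch(victim)   # removes roughly half of what is left
print(one, batch, list(victim))
```

Stealing one message keeps each steal cheap but means an idle worker comes back often. Stealing a group amortizes the cost of a steal but can leave the victim short of work, which is the trade-off that real workloads would have to settle.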