We've been seeing a random failure in one of the unit tests: AbstractRunImplTest.replayRunTestMB. For some very strange reason, I'm seeing it a lot more often here in San Jose.
I tracked the immediate problem down to the fact that PipelineRunImpl.replay assumes the task will remain in the queue for the duration of the function's execution. In my case, that's not what's happening: the task goes to a running state fast enough that PipelineRunImpl.replay never sees it in the queue, concludes that the run was not added to the queue, and throws an exception.
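To make the race concrete, here is a minimal, self-contained sketch. It does not use the real Jenkins or Blue Ocean classes: Scheduler, State, fragileReplay and tolerantReplay are hypothetical names invented for illustration, and the interleaving hook only simulates the task leaving the queue between scheduling and the lookup. It is not the change in the commit referenced in this report, just one way the check could tolerate a run that has already started.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.Queue;

// Hypothetical model of the race; none of these types exist in Jenkins or Blue Ocean.
final class ReplayRaceSketch {

    enum State { QUEUED, RUNNING }

    /** Stand-in scheduler: items wait in a queue, then move to the running list. */
    static final class Scheduler {
        private final Queue<Long> queued = new ArrayDeque<>();
        private final List<Long> running = new ArrayList<>();
        private long nextId = 1;

        synchronized long schedule() {
            long id = nextId++;
            queued.add(id);
            return id;
        }

        /** Moves the oldest queued item to running, as an idle executor would. */
        synchronized void startNext() {
            Long id = queued.poll();
            if (id != null) {
                running.add(id);
            }
        }

        /** Looks in both places under one lock, so an item cannot be missed mid-move. */
        synchronized Optional<State> stateOf(long id) {
            if (queued.contains(id)) {
                return Optional.of(State.QUEUED);
            }
            if (running.contains(id)) {
                return Optional.of(State.RUNNING);
            }
            return Optional.empty();
        }
    }

    /** Mirrors the assumption described above: the item must still be queued. */
    static State fragileReplay(Scheduler s, Runnable interleaving) {
        long id = s.schedule();
        interleaving.run(); // simulates whatever happens before the lookup
        Optional<State> state = s.stateOf(id);
        if (!state.isPresent() || state.get() != State.QUEUED) {
            throw new IllegalStateException("run " + id + " not found in queue");
        }
        return State.QUEUED;
    }

    /** Accepts either outcome and fails only if the item is in neither place. */
    static State tolerantReplay(Scheduler s, Runnable interleaving) {
        long id = s.schedule();
        interleaving.run();
        return s.stateOf(id).orElseThrow(
                () -> new IllegalStateException("run " + id + " was never scheduled"));
    }

    public static void main(String[] args) {
        Scheduler fast = new Scheduler();
        try {
            fragileReplay(fast, fast::startNext);
        } catch (IllegalStateException e) {
            System.out.println("fragile check failed: " + e.getMessage());
        }
        Scheduler fast2 = new Scheduler();
        System.out.println("tolerant check saw: " + tolerantReplay(fast2, fast2::startNext));
    }
}
```

In this sketch the tolerant version has to return something other than a queue item, here a plain State, because the run may already be executing by the time the caller looks.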
I suppose the harder part to solve here is the API: it seems like it can't return a BlueQueueItem, seeing as the run might not be in the queue anymore by the time the call returns.
I tested this by hacking PipelineRunImpl.replay as follows: b8639f8.
Assigning to Vivek for now ... he can decide if someone else needs to handle it.