Before asking how accurate speech to text transcription is, you first have to ask which language and dialect are involved and how many speakers there are. Then consider whether you are asking about human-based transcription services or speech engine solutions.
Our testing so far shows that most speech engines, even those with libraries already built for the language and dialect, still achieve only around 50% accuracy (on average, five out of ten words correct). The best we have come across so far is 65%, on the set of language and dialect audio recordings we use for testing, and with noisy backgrounds that percentage drops by more than 20%. In both cases the test audio was clear speech from a single, untrained speaker, so these figures do not apply to software that is being trained.
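To make "five out of ten words correct" concrete, here is a rough sketch of one common way to score a transcript at the word level: count the word substitutions, insertions and deletions needed to turn the engine output into a trusted reference transcript, then divide by the reference length. This is not necessarily how our own tests were scored; the function names and the sample sentences are made up for illustration, and a real comparison would also need to agree on how punctuation and numbers are normalised.

```python
def word_errors(reference: str, hypothesis: str) -> int:
    """Minimum number of word substitutions, insertions and deletions
    needed to turn the hypothesis into the reference (word-level edit distance)."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # reference word missing from hypothesis
                curr[j - 1] + 1,     # extra word in hypothesis
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1]


def word_accuracy(reference: str, hypothesis: str) -> float:
    """Word accuracy as 1 minus the word error rate, floored at 0."""
    ref_len = len(reference.split())
    if ref_len == 0:
        raise ValueError("reference transcript is empty")
    return max(0.0, 1.0 - word_errors(reference, hypothesis) / ref_len)


# Hypothetical example: a 10-word reference and an engine output with
# two wrong words and one missing word.
reference = "the quick brown fox jumps over the lazy dog today"
hypothesis = "the quick brown box jumps over a lazy dog"
print(f"{word_accuracy(reference, hypothesis):.0%}")  # 70%
```

Scored this way, an engine at 50% gets about half of the reference words wrong, missing or padded with extras, which is why the language, dialect and number of speakers matter so much before any figure is quoted.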
Human-based services obviously vary with the transcribers they employ (and their vetted ability to work on the subject matter, understand nuances, etc.), the context of the recording, and the transcript format required (smart or verbatim, for example). Interestingly, many transcription services claim incredible accuracy of up to 100% (!), but the reality is around 85%+. In our case, the targets for contracting to Way With Words (http://waywithwords.net) require a minimum of around 95%+.
Hope this helps!