When discussing the rise of deep learning, the accuracy of automated approaches is typically compared to
the gold standard of flawless human output. In reality, human
performance is often quite poor at the kinds of tasks typically being considered
for AI automation. Cataloging imagery, reviewing videos and transcribing audio are
all tasks where humans have the potential for very high accuracy, but the
reality of long, repetitive, mind-numbing hours sitting in front of a
screen means human accuracy fades rapidly and can vary dramatically from day to
day and even hour to hour. For all their accuracy issues, automated systems
promise far more consistent results.
Speech recognition is an area
where humans at their best still typically outperform machines. In real-world,
real-time transcription tasks like generating closed captioning for television
news, however, it turns out that commercially available systems like Google’s Speech-to-Text API
are almost as accurate as their human counterparts and are
far more faithful in their renditions of what was said.
Look more closely at the
captioning of some stations and an interesting pattern emerges: the quality of
the human captioning can vary from day to day and even over the course of a
single day.
Real-time transcription is
typically outsourced to third-party companies that employ contractors to type up
what they hear. Quality can vary dramatically between contractors, and even the same individual might perform better in the morning when they are more rested,
or simply have a bad day.
Some stations tape their morning
shows and rebroadcast them as-is from tape in the afternoon, but may
choose to retranscribe them during the rebroadcast on the off chance that breaking
news forces the station to interrupt the taped show. This means that the exact same show may have different typographical errors in the afternoon than it did
in the morning.