Watch, Attend and Spell (WAS) is a new artificial intelligence (AI) software system developed by Oxford in collaboration with the company DeepMind. The system uses computer vision and machine learning methods to learn how to lip-read from a dataset made up of more than 5,000 hours of TV footage, gathered from six different programmes including Newsnight, BBC Breakfast and Question Time. The videos contained more than 118,000 sentences in total, and a vocabulary of 17,500 words. The research team then compared the ability of the machine and a human expert to work out what was being said in silent video by focusing solely on each speaker's lip movements, and found that the software system was more accurate than the professional.
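The article does not describe WAS's internals, but the name "Watch, Attend and Spell" suggests an attention-based encoder-decoder: the model "watches" the video frames, "attends" to the most relevant ones at each step, and "spells" out characters. As a rough illustration only (the function names, feature dimensions, and dot-product scoring below are assumptions, not details from the article), the attention step can be sketched like this:

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(x - x.max())
    return e / e.sum()

def attend(frame_features, decoder_state):
    """Toy dot-product attention: score each encoded video frame
    against the current decoder state, normalise the scores into
    weights, and return a weighted 'context' summary of the frames."""
    scores = frame_features @ decoder_state   # one score per frame, shape (T,)
    weights = softmax(scores)                 # attention weights, sum to 1
    context = weights @ frame_features        # weighted average, shape (d,)
    return weights, context

# Toy example: 5 video frames encoded as 4-dimensional feature vectors.
rng = np.random.default_rng(0)
frames = rng.normal(size=(5, 4))
state = rng.normal(size=4)
weights, context = attend(frames, state)
```

In a full system the context vector would feed a decoder that emits one character at a time; this sketch only shows how attention lets the model focus on particular lip-movement frames.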