Last Saturday I attended Speakerthon, a collaborative web-enhancement event organised by BBC R&D and Wikimedia UK. The aim of the day was to interrogate BBC Radio 4’s permanently available archive (e.g. The Woman’s Hour Collection), select clips of notable people speaking and add them to Wikipedia. Wikimedia UK’s Andy Mabbett thought up the idea and has spent the past two to three years convincing BBC decision makers of the value of opening up their archive. The project applies open licences to BBC content, provides a rich layer of information to Wikipedia entries, and adds good quality linked data to the Web. It also greatly enhances the visibility of the archive, and the tagged clips will be used to teach applications to automatically identify voices in the archive (e.g. The World Service Radio Archive Project), thereby making BBC researchers’ jobs a great deal easier.
The day started with a briefing session. We were shown how to use the BBC ‘Snippets’ software (sadly only made available to us on the day), and what types of clip to listen out for. The goal was to find 20 to 40 second clips of individuals talking, preferably about themselves or their field of work, without interruption or background music. On some programmes this was frustrated by over-enthusiastic interviewers who would insist on butting in, whereas others (like Desert Island Discs) proved to be a goldmine of useful clips.
Once a clip was identified and selected, ‘Snippets’ created a URL, which we manually added to a Google Docs spreadsheet along with the person’s name and gender, Wikipedia URL, and programme archive URL. This was then picked up by the BBC editorial team, who checked ‘compliance’ (i.e. the suitability of the clip and any outstanding copyright issues), trimmed and edited the clip (using Audacity, a free audio editor), encoded it in the open FLAC format (.flac), and uploaded it to Wikimedia.
At the time of writing, about 100 of the 300 clips created on the day have been uploaded. I added eleven clips to the Google Docs spreadsheet, three of which have been uploaded to Wikimedia. So far I’ve embedded voice clips and metadata for Owen Hatherley and Claire Skinner. Three of my clips (Guglielmo Marconi, his second wife Maria Cristina Bezzi-Scali, and John Scott-Taggart, the first person to receive a radio message from a ship in distress) are awaiting confirmation of their copyright status.
It was a real joy to take part in this collaborative online project, and to be in at the start of something with the potential to have an effect considerably greater than the sum of its parts.
See also: Speakerthon: Sharing Voice Samples – Marieke Guy, Open Education Working Group
Please note: While capturing audio from the BBC’s web archive and uploading it to Wikipedia (or anywhere else) is relatively straightforward, doing so without the express permission of the BBC infringes their copyright.