IAradio4

Written by

To create a script that scrapes audio files and metadata from BBC Radio 4 schedules and uploads them to the Internet Archive, you need to perform web scraping and then upload the collected data. Below is an example of how you can achieve this using Python:

Step 1: Set Up Required Libraries

Install the necessary libraries:

pip install requests beautifulsoup4 lxml

Step 2: Scrape Audio Files and Metadata

First, create a script to scrape the audio URLs and metadata from the BBC Radio 4 schedules.

import requests
from bs4 import BeautifulSoup
import json

# URL of the BBC Radio 4 schedule
schedule_url = 'https://www.bbc.co.uk/sounds/schedules/bbc_radio_fourfm'

def scrape_bbc_radio_schedule(url):
    response = requests.get(url)
    soup = BeautifulSoup(response.content, 'lxml')

    listings = []

    # Scrape schedule details
    for item in soup.find_all('div', class_='programme'):
        title = item.find('h3').text.strip()
        time = item.find('span', class_='broadcast__time').text.strip()
        description = item.find('p', class_='programme__synopsis').text.strip() if item.find('p', class_='programme__synopsis') else 'No description available'
        audio_url = item.find('a', class_='programme__link')['href']
        
        listings.append({
            'title': title,
            'time': time,
            'description': description,
            'audio_url': audio_url
        })

    return listings

listings = scrape_bbc_radio_schedule(schedule_url)
print(json.dumps(listings, indent=4))

Step 3: Download Audio Files

Modify the script to download the audio files from the retrieved URLs.

import os

def download_audio(listings):
    for listing in listings:
        audio_response = requests.get(listing['audio_url'])
        filename = f"{listing['title']}.mp3"

        with open(filename, 'wb') as f:
            f.write(audio_response.content)
        
        listing['filename'] = filename

download_audio(listings)

Step 4: Upload to Internet Archive

Use the Internet Archive API to upload the audio files and metadata.

def upload_to_internet_archive(listing, username, password):
    metadata = {
        'collection': 'opensource_audio',
        'title': listing['title'],
        'description': listing['description'],
        'creator': 'BBC Radio 4',
        'subject': 'Radio',
        'licenseurl': 'http://creativecommons.org/licenses/by/4.0/'
    }

    session = requests.Session()
    session.auth = (username, password)

    files = {
        'file': open(listing['filename'], 'rb')
    }
    response = session.put(f'https://s3.us.archive.org/{listing["title"]}/{listing["filename"]}', files=files)
    print(response.status_code, response.text)

    metadata_url = f'https://archive.org/metadata/{listing["title"]}'
    response = session.post(metadata_url, data=metadata)
    print(response.status_code, response.text)

username = 'YOUR_ARCHIVE_USERNAME'
password = 'YOUR_ARCHIVE_PASSWORD'

for listing in listings:
    upload_to_internet_archive(listing, username, password)

Summary

This script scrapes audio files and metadata from BBC Radio 4 schedules, downloads the audio files, and uploads them to the Internet Archive. Replace the placeholders with your actual Internet Archive credentials. You may need to adjust the scraping logic based on the structure of the BBC website.

Referenences

API reference

FILES, FORMATS, AND DERIVATIVES – FILE DEFINITIONS

Example of radio page listings with transcript

IAradio4

Step 1: Set Up Required Libraries

Step 2: Scrape Audio Files and Metadata

Step 3: Download Audio Files

Step 4: Upload to Internet Archive

Summary

Referenences

Comments

Leave a Reply Cancel reply

More posts

Eat These 5 Foods to Activate Autophagy Without Fasting

Enabling live subtitles on IPTV

The #1 Best Meal to Protect Your Heart and Support Healthier Arteries (According to an A&E Doctor)

Rebellious Bots

New Study: A Ketogenic Diet May Reverse Biological Aging in People with Obesity

Chios Mastic Gum: A Traditional Resin with Emerging Evidence for Gut Health and Selective Anticancer Activity