Radio Text RS

To create a WordPress taxonomy for program listings and integrate the Google Cloud Speech-to-Text API for transcription, follow these steps:

Step 1: Create a Custom Taxonomy

  1. Add Taxonomy in WordPress:
    • Go to Appearance > Theme Editor (named Theme File Editor in recent WordPress versions), open functions.php, and add:
function create_program_listing_taxonomy() {
    $labels = array(
        'name' => _x( 'Program Listings', 'taxonomy general name' ),
        'singular_name' => _x( 'Program Listing', 'taxonomy singular name' ),
        'search_items' =>  __( 'Search Program Listings' ),
        'all_items' => __( 'All Program Listings' ),
        'parent_item' => __( 'Parent Program Listing' ),
        'parent_item_colon' => __( 'Parent Program Listing:' ),
        'edit_item' => __( 'Edit Program Listing' ),
        'update_item' => __( 'Update Program Listing' ),
        'add_new_item' => __( 'Add New Program Listing' ),
        'new_item_name' => __( 'New Program Listing Name' ),
        'menu_name' => __( 'Program Listings' ),
    );

    register_taxonomy('program-listings', array('post'), array(
        'hierarchical' => true,
        'labels' => $labels,
        'show_ui' => true,
        'show_admin_column' => true,
        'query_var' => true,
        'rewrite' => array( 'slug' => 'program-listings' ),
    ));
}

add_action( 'init', 'create_program_listing_taxonomy', 0 );
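
Once the taxonomy is registered, posts filed under one program can be fetched with a tax_query. The following is a minimal sketch rather than part of the original setup: the helper name program_listing_query_args() and the term slug 'the-archers' are made up for illustration, and only the argument array is built so the snippet can be tried outside WordPress.

```php
<?php
// Hypothetical helper: build WP_Query arguments for posts in one program listing term.
function program_listing_query_args(string $termSlug, int $perPage = 10): array {
    return [
        'post_type'      => 'post',
        'posts_per_page' => $perPage,
        'tax_query'      => [
            [
                'taxonomy' => 'program-listings', // Taxonomy registered in Step 1
                'field'    => 'slug',
                'terms'    => $termSlug,
            ],
        ],
    ];
}

// 'the-archers' is an illustrative slug, not one the setup creates.
$args = program_listing_query_args('the-archers');
// Inside WordPress: $query = new WP_Query($args);
```

Keeping the arguments in one helper means templates and scripts query the taxonomy the same way.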

Step 2: Integrate Google Cloud Speech-to-Text API

  1. Set Up Google Cloud:

    • Create a project in the Google Cloud Console.
    • Enable the Speech-to-Text API.
    • Create and download a service account key in JSON format.
  2. Install Google Cloud Client Library:

    • Ensure you have Composer installed.
    • Run composer require google/cloud-speech.
  3. Create a PHP Script for Transcription:

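    • Create a PHP script to handle transcription using Google Cloud Speech-to-Text. The synchronous recognize() call used here only accepts short audio (about one minute); longer recordings need asynchronous recognition (longRunningRecognize).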
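<?php
require 'vendor/autoload.php';

use Google\Cloud\Speech\V1\SpeechClient;
use Google\Cloud\Speech\V1\RecognitionConfig;
use Google\Cloud\Speech\V1\RecognitionAudio;

// Transcribe a short LINEAR16 (16-bit PCM), 16 kHz audio file and return the text.
function transcribe_audio($audioFilePath) {
    $speechClient = new SpeechClient([
        'credentials' => json_decode(file_get_contents('/path/to/your-service-account-key.json'), true)
    ]);

    try {
        $audioContent = file_get_contents($audioFilePath);
        $audio = (new RecognitionAudio())
            ->setContent($audioContent);

        // These settings must match the actual encoding of the audio file.
        $config = (new RecognitionConfig())
            ->setEncoding(RecognitionConfig\AudioEncoding::LINEAR16)
            ->setSampleRateHertz(16000)
            ->setLanguageCode('en-US');

        $response = $speechClient->recognize($config, $audio);
        $transcriptions = [];

        foreach ($response->getResults() as $result) {
            $alternatives = $result->getAlternatives();
            // Skip results with no alternatives; the first alternative is the most likely one.
            if (count($alternatives) > 0) {
                $transcriptions[] = $alternatives[0]->getTranscript();
            }
        }

        // Join the result segments with spaces so words do not run together.
        return implode(' ', $transcriptions);
    } finally {
        // Release the connection even if the request throws.
        $speechClient->close();
    }
}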
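
Because the recognition config is fixed to LINEAR16 at 16000 Hz, it can help to check a file's header before sending it. The sketch below is one possible check, not something the Speech-to-Text library provides: read_wav_format() and matches_linear16_config() are hypothetical helper names, and the check only understands plain RIFF/WAVE files.

```php
<?php
// Hypothetical helper: read the "fmt " chunk of a RIFF/WAVE byte string.
// Returns null when the data does not look like a WAV file.
function read_wav_format(string $bytes): ?array {
    $length = strlen($bytes);
    if ($length < 12 || substr($bytes, 0, 4) !== 'RIFF' || substr($bytes, 8, 4) !== 'WAVE') {
        return null;
    }

    $offset = 12;
    // Walk the chunk list: each chunk is a 4-byte id, a 4-byte little-endian size, then data.
    while ($offset + 8 <= $length) {
        $id = substr($bytes, $offset, 4);
        $size = unpack('V', substr($bytes, $offset + 4, 4))[1];
        if ($id === 'fmt ') {
            if ($size < 16 || $offset + 24 > $length) {
                return null;
            }
            return unpack(
                'vformat/vchannels/VsampleRate/VbyteRate/vblockAlign/vbitsPerSample',
                substr($bytes, $offset + 8, 16)
            );
        }
        $offset += 8 + $size + ($size % 2); // Chunks are padded to an even length
    }

    return null;
}

// True when the audio is 16-bit PCM at the expected sample rate.
function matches_linear16_config(string $bytes, int $sampleRateHertz = 16000): bool {
    $format = read_wav_format($bytes);
    return $format !== null
        && $format['format'] === 1 // 1 = uncompressed PCM
        && $format['bitsPerSample'] === 16
        && $format['sampleRate'] === $sampleRateHertz;
}
```

A caller could run matches_linear16_config(file_get_contents($audioFilePath)) and skip or convert files that fail the check before calling transcribe_audio().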

Step 3: Automate the Process

  1. Daily Cron Job for Scanning Listings:
    • Add a cron job to run a script that scrapes listings and calls the transcription function.
    • Example cron job setup in Unix-based systems:
0 6 * * * /usr/bin/php /path/to/your-script.php
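  2. PHP Script to Combine Everything:
    • Combine scraping, transcribing, and updating WordPress in a single script. The scraper uses Guzzle, so install it with composer require guzzlehttp/guzzle. The script also calls transcribe_audio() from Step 2, so that file must be loaded as well.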
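<?php
require_once 'vendor/autoload.php';
require_once 'wp-load.php'; // Adjust the path to your wp-load.php
require_once 'transcribe.php'; // Hypothetical file name: the Step 2 script that defines transcribe_audio()

use GuzzleHttp\Client;

// Scrape the schedule page and return a list of ['time' => ..., 'program' => ...] rows.
// The XPath class names depend on the page's current markup and may need updating.
function scrape_listings() {
    $client = new Client();
    $res = $client->request('GET', 'https://www.bbc.co.uk/sounds/schedules/bbc_radio_fourfm');
    $html = $res->getBody()->getContents();
    $dom = new DOMDocument();
    @$dom->loadHTML($html); // Suppress warnings from imperfect real-world HTML

    $xpath = new DOMXPath($dom);
    $times = $xpath->query('//span[@class="broadcast__time"]');
    $programs = $xpath->query('//div[@class="programme"]//h3');

    $listings = [];
    // Pair entries by position, stopping at the shorter list to avoid null items.
    $count = min($times->length, $programs->length);
    for ($i = 0; $i < $count; $i++) {
        $listings[] = [
            'time' => trim($times->item($i)->textContent),
            'program' => trim($programs->item($i)->textContent)
        ];
    }

    return $listings;
}

$listings = scrape_listings();

// Process each listing
foreach ($listings as $listing) {
    // The audio must match the RecognitionConfig in transcribe_audio() (LINEAR16, 16 kHz),
    // so the file is expected to be WAV rather than MP3.
    $audioFile = '/path/to/audio/files/' . $listing['program'] . '.wav';
    if (!is_readable($audioFile)) {
        continue; // No recording available for this program
    }

    $transcript = transcribe_audio($audioFile);
    // Passing true makes wp_insert_post() return a WP_Error on failure instead of 0.
    $post_id = wp_insert_post([
        'post_title' => $listing['program'],
        'post_content' => $transcript,
        'post_status' => 'publish',
        'post_type' => 'post'
    ], true);

    if (!is_wp_error($post_id)) {
        wp_set_object_terms($post_id, $listing['program'], 'program-listings');
    }
}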
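
The scraper above pairs times and titles by position, which can drift out of step if one schedule entry has no title. A possible alternative, sketched here against made-up markup, is to walk each entry's container and query inside it. The container class "broadcast", the helper names has_class() and parse_schedule_html(), and the sample HTML are all assumptions for illustration, not the real page structure.

```php
<?php
// XPath predicate matching an element whose class attribute contains the given token.
function has_class(string $name): string {
    return 'contains(concat(" ", normalize-space(@class), " "), " ' . $name . ' ")';
}

// Parse schedule HTML into ['time' => ..., 'program' => ...] rows.
// Assumes each broadcast sits in its own <li class="broadcast"> element (hypothetical markup).
function parse_schedule_html(string $html): array {
    $previous = libxml_use_internal_errors(true); // Collect parse warnings instead of emitting them
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    $listings = [];
    foreach ($xpath->query('//li[' . has_class('broadcast') . ']') as $item) {
        // Query relative to this item so a time is never paired with another item's title.
        $time = $xpath->query('.//span[' . has_class('broadcast__time') . ']', $item)->item(0);
        $title = $xpath->query('.//h3', $item)->item(0);
        if ($time === null || $title === null) {
            continue; // Skip incomplete entries
        }
        $listings[] = [
            'time' => trim($time->textContent),
            'program' => trim($title->textContent),
        ];
    }

    return $listings;
}

// Illustrative sample: the second entry has no title and is skipped.
$sample = '<ul>'
    . '<li class="broadcast featured"><span class="broadcast__time">06:00</span><h3>Today</h3></li>'
    . '<li class="broadcast"><span class="broadcast__time">09:00</span></li>'
    . '<li class="broadcast"><span class="broadcast__time">09:45</span><h3> Book of the Week </h3></li>'
    . '</ul>';
$rows = parse_schedule_html($sample);
```

Matching class tokens rather than the exact attribute string also keeps the query working when an element carries more than one class.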

Together, these steps set up a custom taxonomy in WordPress for program listings, scrape the daily radio schedule, transcribe audio files with the Google Cloud Speech-to-Text API, and publish the transcripts as posts on your WordPress site.
