Source code for scraping the website of Too Jewish Radio program to create two RSS feeds (audio podcast)

Many of you have asked me for the source code of the Perl application I wrote to create the two RSS feeds (All Episodes & Last 10 Episodes) for the Too Jewish Radio program. Well, here ya go! Keep in mind that it was written quickly and isn’t very robust (there is no real error checking, etc.).

#!/usr/bin/perl

use strict;
use warnings;

use utf8;

use Config::Simple;
use Date::Manip qw( ParseDate UnixDate );
use Getopt::Std;
use LWP::Simple;
use Net::FTP;
use XML::LibXML;
use XML::RSS;
use URI;

use vars qw( $opt_h $opt_n $opt_o $opt_u $opt_c );

sub print_usage {
  print "-"x40 . "\n";
  print " "x10 . "too_jewish.pl\n";
  print "-"x40 . "\n";
  print " "x4 . "-n [# episodes] : Number of episodes to list in the RSS feed\n";
  print " "x4 . "-o [filename] : Save RSS feed to file\n";
  print " "x4 . "-u     : Upload RSS feed to ftp server\n";
  print " "x4 . "-c [cfg filename]: Read configuration from filename\n";
  print " "x4 . "-h     : Print help\n";
  print "-"x40 . "\n";
}

sub format_W3CDTF_date {
  my $orig_date = shift;

  my $w3cdtf_format = "%Y-%m-%d";
  my $tmp_date = ParseDate($orig_date);
  $tmp_date = UnixDate($tmp_date, $w3cdtf_format);

  return $tmp_date;
}

sub get_episode_data {
  my $num_episodes = shift;

  my $episode_data_ref;

  # Set up the parser, and set it to recover
  # from errors so that it can handle broken
  # HTML
  my $parser = XML::LibXML->new();
  $parser->recover(1);

  # Parse the page into a DOM tree structure
  my $url = 'http://www.toojewishradio.com/too_jewish_shows.htm';
  # LWP::Simple's get() returns undef on failure and does not set $!
  my $data = get($url) or die "Unable to fetch $url\n";
  my $doc = $parser->parse_html_string($data);

  # Extract the table rows (as an
  # array of references to DOM nodes)
  my @table_rows = $doc->findnodes( q{ /html/body/table/tr } );

  # Skip the first row (presumably the table header)
  @table_rows = @table_rows[1..$#table_rows];

  foreach my $row (@table_rows) {
    my $row_date = $row->find('string(td[1]//font/font)')->value();
    next if $row_date =~ /^\s*$/;

    my $mp3_file = $row->find('string(td[2]/font//a[1]/@href)')->value();
    next if $mp3_file =~ /^\s*$/;

    my $description = $row->find('string(td[2]/font)')->value();
    next if $description =~ /^\s*$/;
    $description =~ s/\s+/ /g;
    $description =~ s/^\s+//;
    $description =~ s/\s+$//;

    my $abs_url_mp3 = URI->new($mp3_file)->abs($url)->as_string;

    push @$episode_data_ref, {
        'pubDate' => $row_date,
        'description' => $description,
        'url' => $abs_url_mp3
      };
  }

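  # Keep only the first $num_episodes entries. Clamp the upper bound so
  # that asking for more episodes than exist does not pad the list with
  # undef entries.
  if ($num_episodes && $episode_data_ref) {
    my $last = $num_episodes - 1;
    $last = $#$episode_data_ref if $last > $#$episode_data_ref;
    @$episode_data_ref = @$episode_data_ref[0..$last];
  }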

  return $episode_data_ref;
}

sub create_rss_feed {
  my ($episode_data_ref, $rss_filename) = @_;

  my $rss = XML::RSS->new(version => '2.0');

  $rss->channel(
    title        => 'Too Jewish with Rabbi Cohon',
    link       => 'http://www.toojewishradio.com',
    description  => '"Too Jewish" with Rabbi Sam Cohon and Friends plays every Sunday morning at 7:00 am on radio station KAPR 930 AM in Douglas, Bisbee, and Sierra Vista at 9:00 am, on KJAA 1240 AM in Globe at 9:00 am, and at 9:00 am on radio station KVOI AM 690 in Tucson.

"Too Jewish" is a lively and fast-paced show that highlights everything interesting in contemporary Jewish life and features music, arts, culture, comedy, and inspiration. "Too Jewish" is a blend of information, irreverence, and exploration of all things Jewish in the 21st century. "Too Jewish" makes Judaism accessible, interesting, and fun for listeners of all ages and backgrounds, and brings the best of Jewish experience vividly to life. But on "Too Jewish", Rabbi Cohon also challenges accepted pieties and has fun with anything boring or inauthentic in the way Jews live today in the United States, Israel, and everywhere else.

Since its Tucson debut August 4, 2002, "Too Jewish" has featured such prominent guests as legendary singer and recording artist Neil Sedaka, Kinky Friedman, Elie Wiesel, comedian Lily Tomlin, folksinger Peter Yarrow, NPR Supreme Court Expert Nina Totenberg, Eve Ensler, U.S. Senator Russ Feingold, and many more!

Regular expert commentators of the "Too Jewish" maven section include Tom Price, an educator and former diplomat who offers unique insights into Jewish life around the world, and Amy Hirshberg Lederman, nationally syndicated columnist, who shares stories which speak to the heart of Jewish listeners. Comedy and musical selections drawn by Rabbi Cohon from the remarkable range of great Jewish material help make listening to "Too Jewish" an exciting and fun experience.',
    dc => {
        date     => format_W3CDTF_date('now'),
        subject   => 'Jewish Radio',
        creator   => 'toojewishradio@yahoo.com',
        publisher   => 'toojewishradio@yahoo.com',
        rights   => 'All Rights Reserved, toojewishradio@yahoo.com',
        language  => 'en-us',
      },
    sync => {
        updatePeriod   => 'weekly',
        updateFrequency => '1',
        updateBase      => format_W3CDTF_date('01/01/1901'),
      },
    taxo => [
        'http://www.templeemanueltucson.org',
        'http://www.toojewishradio.com/about_the_rabbi.htm'
      ],
    );

  $rss->image(
    title => 'Too Jewish with Rabbi Cohon',
    url    => 'http://www.toojewishradio.com/Too%20Jewish%20logo_color.jpg',
    link => 'http://www.templeemanueltucson.org',
    dc => {
      creator => 'toojewishradio@yahoo.com'
      },
    );

  foreach my $episode (@$episode_data_ref) {
    $rss->add_item(
      title        => $episode->{description},
      enclosure     => {
        url    => $episode->{url},
        type => "audio/mpeg"
        },
      description => $episode->{description},
      # RSS 2.0 expects pubDate in RFC 822 form rather than W3CDTF
      pubDate     => UnixDate( ParseDate( $episode->{pubDate} ), '%a, %d %b %Y %H:%M:%S %z' )
      );
  }

  if ($rss_filename) {
    $rss->save($rss_filename);
  }

  return $rss->as_string;
}

sub upload_rss_file {
  my ($ftp_server, $rss_filename) = @_;

  my $ftp = Net::FTP->new( $ftp_server->{server}, Debug => 0)
    or die "Cannot connect to $ftp_server->{server}: $@";
  $ftp->login( $ftp_server->{user}, $ftp_server->{password})
    or die "Cannot login: ", $ftp->message;
  $ftp->cwd($ftp_server->{dir})
    or die "Unable to change directory to $ftp_server->{dir}: ", $ftp->message;
  $ftp->put($rss_filename)
    or die "Cannot upload $rss_filename: ", $ftp->message;
  $ftp->quit;
}

my $num_episodes;
my $ftp_server;
my $rss_filename;
my $upload;

if (getopts('hc:n:o:u')) {
  if ($opt_h) {
    print_usage;
    exit;
  }

  if ($opt_n) {
    if ($opt_n =~ /^[[:digit:]]+$/) {
      $num_episodes = $opt_n;
    }
  }

  if ($opt_o) {
    $rss_filename = $opt_o;
  }

  if ($opt_u) {
    $upload = 1;

    my $cfg_filename;

    if ($opt_c && -f $opt_c) {
      $cfg_filename = $opt_c;
    } else {
      $cfg_filename = '.too_jewish.ini';
    }

    my $cfg = Config::Simple->new($cfg_filename);

    if ($cfg &&
        $cfg->param('RSS.ftp_server') && $cfg->param('RSS.ftp_user') &&
        $cfg->param('RSS.ftp_password') && $cfg->param('RSS.ftp_dir')
      ) {
      $ftp_server = {
        server   => $cfg->param('RSS.ftp_server'),
        user     => $cfg->param('RSS.ftp_user'),
        password  => $cfg->param('RSS.ftp_password'),
        dir        => $cfg->param('RSS.ftp_dir')
        };
    } else {
      # Report the problem on STDERR and exit with a non-zero status
      die "Configuration file \"$cfg_filename\" does not contain a valid configuration!\n";
    }
  }
}

my $rss_string = create_rss_feed( get_episode_data($num_episodes), $rss_filename );

if ($rss_filename && $upload && -f $rss_filename) {
  upload_rss_file($ftp_server, $rss_filename);
} else {
  print "$rss_string\n" unless $rss_filename;
}
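
The -u option expects an INI file (by default .too_jewish.ini in the current directory, or whatever you pass with -c) containing an [RSS] block with the four keys the script reads. A sketch of what that file might look like; every value below is a placeholder, not a real server or account:

```ini
; .too_jewish.ini (placeholder values only)
[RSS]
ftp_server=ftp.example.com
ftp_user=your_ftp_username
ftp_password=your_ftp_password
ftp_dir=/path/on/server
```

With that file in place, something like `perl too_jewish.pl -n 10 -o last10.xml -u` should write the feed to last10.xml and then upload it (the file name is only an example). Note that the upload only happens when -o is also given, since the script uploads the saved file.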

Can’t find a working RSS feed for the “Too Jewish with Rabbi Cohon” podcast? Here is a working RSS feed that is updated every Monday morning!

A long time ago, when I used iTunes on my WinXP laptop, I was able to subscribe to the Too Jewish with Rabbi Sam Cohon & Friends podcast through the iTunes Store.  Since my main desktop runs Ubuntu Linux 9.04 and the Rhythmbox Music Player is unable to retrieve podcasts from the iTunes Store, I was at a loss as to how to listen to the podcast.

I sent numerous emails to the webmasters at toojewishradio@yahoo.com requesting an RSS feed of the podcast, to no avail.  So… I wrote a small Perl application to create two RSS feeds:

  • All Episodes
  • Last 10 Episodes
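
Both feeds come from the same scraped list; the shorter one just keeps the first N rows (the script's -n option). Here is a minimal sketch of that truncation using a hypothetical helper named first_n, which is not part of the script itself. It clamps N so that asking for more episodes than exist does not pad the list with undef entries:

```perl
use strict;
use warnings;

# Hypothetical helper (not in the original script): return at most
# $n leading elements of a list. An undefined or zero $n means "no limit".
sub first_n {
    my ($n, @items) = @_;
    return @items unless defined $n && $n > 0;
    $n = @items if $n > @items;    # clamp to the number of items available
    return @items[0 .. $n - 1];
}

# Stand-in data; the real script builds a list of hashes per episode.
my @episodes = map { "episode $_" } 1 .. 3;

my @short_feed = first_n(2,  @episodes);   # fewer than available
my @full_feed  = first_n(10, @episodes);   # more than available: clamped

print scalar(@short_feed), " ", scalar(@full_feed), "\n";   # prints "2 3"
```
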

Please note that I’m only generating and hosting the RSS feeds and NOT the podcast files.  The podcast is a product of Temple Emanu-El.
Update: The source code is available at “Source code for scraping the website of Too Jewish Radio program to create two RSS feeds (audio podcast)”.


Want to upgrade your iTunes DRM’d music (m4p,aac) to non-DRM legally? Check out iTunes 8

In iTunes 8, Apple has mentioned in their “What’s New in iTunes 8” that it is possible to upgrade your iTunes music to non-DRM’d for a small fee.  Great!

I purchased quite a bit of music from the iTunes store when I ran WinXP as my primary desktop.  Now I can legally convert them to something that Linux (and my Blackberry Storm) can read!

Let me just pull up iTunes and do that.  Can’t find how to upgrade my iTunes music from within iTunes.  Tried help, but that didn’t give me any clue.

After much hunting on the apple.com website, I figured out that while iTunes supports the upgrade, Apple apparently doesn’t really want you to use it; otherwise they would have made it far less obscure.  Apple’s TechDoc HT1711 directs you to a special link that activates the upgrade process in iTunes 8.

Excerpt from the Apple TechDoc:

Can I upgrade previously purchased music to iTunes Plus?

Yes. Any available upgrades will be shown on the Upgrade to iTunes Plus page. You can upgrade all of your items at once by using the Buy All button. This replaces all eligible previous purchases with iTunes Plus versions of the same items. You can also choose to make individual upgrades by clicking the Buy button to the right of each item. Song upgrades are available for 0.30 USD, video upgrades for 0.60 USD, and albums for 30 percent of the album price. The counter to the right of the “Upgrade to iTunes Plus” link in the Quick Links box will indicate when additional eligible content become available.

You can view your eligible iTunes Plus upgrade items by clicking here.

After you re-purchase *cough* upgrade your music, iTunes will download the music files (with the .m4a extension) and replace your old DRM-encumbered music (with the .m4p extension).  Note that the non-DRM music files contain enough information for you to be easily identified if you share your music with your friends.  The music files also contain watermarks that will survive even if you convert the non-DRM music files into mp3, ogg or flac files.  So, share your music at your own risk.
