brent timothy saner on 27 Nov 2018 18:59:27 -0800



Re: [PLUG] OT: MontCo Dispatch system web & RSS


On 11/27/18 7:13 PM, Keith C. Perry wrote:
> That's pretty slick.
> 
> Since you mentioned hams though...
> 
> I've been practicing my HF mobile work.  The bands have been a bit challenging the last couple of days.  I worked a more local station 205 miles west of Philly on 80m last night and we all agreed the band was doing some very very weird things.
> 
> I've been enjoying the challenge of listening for and getting better at working those stations buried a bit further down in the noise.  Never know when such a skill will be needed due to a more serious event when all the "slick" services are down.

because you can never have too much python fun, i wrote a parser (see
attached). it currently just converts it to JSON, but if you know a
little python yourself i'm sure you can extend it into some fun. it's
more important that everyone learn how awesome XML is. ;)

it's object-oriented - the parsed RSS XML is converted to a list of python
dicts (one per channel) in the instance attribute "events", so you can import
it and instantiate it all you want, etc.
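
to give a rough idea of the shape that ends up in the events attribute, here's a small offline sketch. the sample data is made up (it is not real feed output), and description_lines is a hypothetical helper, not part of the attached script. the key names follow what the script produces: the tag name run through .title(), with pubDate stored as 'Published'.

```python
# Made-up sample mimicking the shape of IncidentParser.events:
# a list of channel dicts, each with an 'Events' list of event dicts.
sample_events = [
    {
        'Title': 'Example feed',
        'Events': [
            {
                'Title': 'EXAMPLE INCIDENT',
                'Description': 'Some Township; 100 Example St; Station 99',
                'Published': '2018-11-27 23:59:27',
            },
        ],
    },
]


def description_lines(event):
    """Split an event's Description on ';' into stripped, non-empty parts."""
    parts = event.get('Description', '').split(';')
    return [part.strip() for part in parts if part.strip()]


for channel in sample_events:
    for event in channel['Events']:
        print(event['Title'])
        for line in description_lines(event):
            print('    ' + line)
```

with the real class you would loop over the events attribute of an instance the same way instead of over sample_events.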

#!/usr/bin/env python3

# http://lists.netisland.net/archives/plug/plug-2018-11/msg00065.html
# Note that you could take it even further, i.e. collating into MySQL or Sqlite3 DB, feed it into a TTS,
# break the rss/channel/item/description into linebreaks based on ; and turn it into mailing labels, all sorts of fun
# stuff.

import datetime
try:
    from lxml import etree
    has_lxml = True
except ImportError:
    # p.s. you definitely should install python-lxml:
    # - it's safer
    # - it's faster
    # - using it won't make you want to slap your grandma
    from xml.etree import ElementTree as etree
    has_lxml = False
from urllib.request import urlopen

class IncidentParser(object):
    def __init__(self, incident_url = 'https://webapp02.montcopa.org/eoc/cadinfo/livecadrss.asp'):
        self.url = incident_url
        self.events = []
        self.__getRaw__()
        self.__parseXML__()

    def __getRaw__(self):
        # Laziest error catching ever, but it's gonna be the problem 9/10 times.
        try:
            with urlopen(self.url) as webdata:
                self.xmlobj = etree.fromstring(webdata.read())
        except Exception as e:
            self.xmlobj = None
            # Chain the original exception so the real cause (network or XML parsing) stays visible.
            raise RuntimeError('Can\'t connect to {0}. Network error?'.format(self.url)) from e

    def __parseXML__(self):
        if self.xmlobj is None:
            return
        self.xml = self.xmlobj.findall('channel')
        for channel in self.xml:
            _channel = {'Events': []}
            # Get "channel" info, i.e. metadata of the feed.
            # Technically we can expand this to the full spec of RSS, found at https://cyber.harvard.edu/rss/rss.html
            # but since we're PoCing it specifically for the Montco events feed, we can trim it down to just the
            # information they populate.
            for x in ('title', 'link', 'description', 'ttl'):
                # They should only have at most ONE of these, I think? I haven't checked the XSD lately.
                e = channel.find(x)
                if e is not None and e.text:
                    _channel[e.tag.title()] = e.text
            for e in channel.findall('item'):  # Each item is an "event" notification container.
                _event = {}
                for x in ('title', 'link', 'description', 'pubDate'):
                    sub = e.find(x)
                    if sub is not None and sub.text:
                        # See https://cyber.harvard.edu/rss/rss.html#hrelementsOfLtitemgt
                        # to understand why we serialize these as list items instead of dicts
                        # with titles keying.
                        # if x == 'title':
                        #     _event_name = sub.text
                        #     continue
                        data = (sub.text if x != 'pubDate'
                                    # Yes, I'm a showoff.
                                    # This will FAIL on anything but UTC/GMT, and it isn't conversion-aware.
                                    # https://bugs.python.org/issue22377
                                    else datetime.datetime.strptime(sub.text,
                                                                    '%a, %d %b %Y %H:%M:%S %Z'))
                        name = (sub.tag if x != 'pubDate' else 'Published')
                        _event[name.title()] = data
                _channel['Events'].append(_event)
            self.events.append(_channel)

def main():
    import json
    ip = IncidentParser()
    print(json.dumps(ip.events, indent = 4, default = str))

if __name__ == '__main__':
    main()
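
the strptime() call with %Z in __parseXML__ is, per its own comment, fragile on anything that isn't UTC/GMT. one possible alternative, offered as a sketch and not as a tested drop-in for the script: the stdlib's email.utils.parsedate_to_datetime() understands RFC 822 style dates (the format RSS uses for pubDate) and returns a timezone-aware datetime when an offset is present. parse_pubdate is a hypothetical helper name, and the sample string below is made up.

```python
import datetime
from email.utils import parsedate_to_datetime


def parse_pubdate(text):
    """Parse an RFC 822 style date string; return a datetime, or None if unparseable."""
    try:
        # Unparseable input raises TypeError on older Pythons and ValueError on 3.10+.
        return parsedate_to_datetime(text)
    except (TypeError, ValueError):
        return None


# Made-up sample value, not taken from the real feed.
stamp = parse_pubdate('Tue, 27 Nov 2018 18:59:27 -0800')
# Normalize to UTC so events from different offsets compare cleanly.
print(stamp.astimezone(datetime.timezone.utc).isoformat())
```

returning None on bad input is just one choice; re-raising would work too if you'd rather know about malformed dates right away.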

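the comment at the top of the script mentions collating into a Sqlite3 DB. a minimal sketch of what that might look like, assuming the same list-of-channel-dicts shape the script builds: the table layout, the store_events name, and the sample data are all made up for illustration.

```python
import sqlite3


def store_events(conn, channels):
    """Insert every event from every channel; return the total row count."""
    conn.execute(
        'CREATE TABLE IF NOT EXISTS events ('
        'title TEXT, link TEXT, description TEXT, published TEXT, '
        'UNIQUE (title, published))'
    )
    rows = [
        (ev.get('Title'), ev.get('Link'), ev.get('Description'),
         str(ev.get('Published', '')))  # store the timestamp as text
        for channel in channels
        for ev in channel.get('Events', [])
    ]
    with conn:  # commits on success, rolls back on error
        # OR IGNORE skips rows already stored by an earlier poll of the feed.
        conn.executemany('INSERT OR IGNORE INTO events VALUES (?, ?, ?, ?)', rows)
    return conn.execute('SELECT COUNT(*) FROM events').fetchone()[0]


# Made-up sample in the same shape as the parser's events list.
sample_channels = [{'Events': [{'Title': 'EXAMPLE INCIDENT',
                                'Description': 'Some Township; 100 Example St',
                                'Published': '2018-11-27 23:59:27'}]}]
db = sqlite3.connect(':memory:')
print(store_events(db, sample_channels))
```

the UNIQUE constraint on (title, published) is a guess at what identifies an event; the real feed might need a different key.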

___________________________________________________________________________
Philadelphia Linux Users Group         --        http://www.phillylinux.org
Announcements - http://lists.phillylinux.org/mailman/listinfo/plug-announce
General Discussion  --   http://lists.phillylinux.org/mailman/listinfo/plug