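sub parse_file {
    my $code = shift;      # the XML data, as one string
    my $r_info = {};       # tag name => value
    for (split /\n/, $code) {
        # Expect one <tag>value</tag> pair per line.
        if (/<(\w+)>(.+)<\/(\w+)>/) {
            my ($stag, $val, $ctag) = ($1, $2, $3);
            # Keep the value only when the closing tag matches the opening tag.
            $r_info->{$stag} = $val if $stag eq $ctag;
        }
    }
    return $r_info;
}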
There are more XML-centric ways to do this. For example, some other people have written XSL transformations for the feeds. Greg B. wrote the original version of this transformation, which I needed to kludge into oblivion in order to get Netscape to display it properly (this isn't Greg's fault, it's Netscape's). Thanks, Greg!
Is there a mailing list associated with these weather feeds?
Yes, there is. To subscribe to the list, send a message to weathernews-subscribe@lists.boygenius.com. You will receive a confirmation message to which you must reply in order to activate your subscription.
The list is set up as an unmoderated, "any member can post" list, but the traffic should be fairly light. If you have questions regarding the use of these feeds, please subscribe and send your question there.
The list is archived, but I don't have a display and search function set up for it yet. Eventually.
What are all the possible condition values, and their corresponding image names?
'Few Clouds'=>'few.jpg',
'Mostly Cloudy'=>'mcloudy.jpg',
'Cloudy'=>'cloudy.jpg',
'Partly Cloudy'=>'pcloudy.jpg',
'Rain'=>'shra.jpg',
'Light Rain'=>'rain.jpg',
'Clear'=>'hi_clr.jpg',
'Fair'=>'fair.jpg',
'Fog'=>'fg.jpg',
'Freezing Drizzle'=>'fdrizzle.gif',
'Freezing Rain'=>'fzra.jpg',
'Haze'=>'haze.jpg',
'Drizzle'=>'drizzle.jpg',
'Drizzle and Fog'=>'fog.jpg',
'Mostly Sunny'=>'few.jpg',
'Partly Sunny'=>'pcloudy.jpg',
'Snow'=>'snowshowers.jpg',
'Sunny'=>'sunny.jpg',
'Windy'=>'wind.jpg',
'Mist'=>'mist.jpg',
'Misty'=>'mist.jpg',
'Drizzle'=>'drizzle.gif',
'Dust'=>'du.jpg',
'Not Available'=>'na.gif',
'Heavy Rain'=>'showers.jpg',
'Thunderstorms'=>'tstorm.jpg',
These are just the conditions for which I've seen images. If you need more complete information, see this page regarding TAF codes.
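Because the list covers only conditions for which images have been seen, a lookup probably wants a fallback for anything else. Here is a minimal sketch in shell, abbreviated to a handful of entries from the list above. The function name condition_image, the sample condition 'Volcanic Ash', and the choice of na.gif as the fallback are assumptions made for illustration; nothing in the feeds requires them.

```shell
#!/bin/sh
# condition_image CONDITION: print the image name for a condition string.
# Abbreviated: only a few entries from the list above are shown.
# Falling back to na.gif for unlisted conditions is an assumption.
condition_image() {
    case "$1" in
        'Few Clouds'|'Mostly Sunny')    echo 'few.jpg' ;;
        'Partly Cloudy'|'Partly Sunny') echo 'pcloudy.jpg' ;;
        'Cloudy')                       echo 'cloudy.jpg' ;;
        'Rain')                         echo 'shra.jpg' ;;
        'Light Rain')                   echo 'rain.jpg' ;;
        'Mist'|'Misty')                 echo 'mist.jpg' ;;
        *)                              echo 'na.gif' ;;
    esac
}

condition_image 'Partly Sunny'    # prints pcloudy.jpg
condition_image 'Volcanic Ash'    # hypothetical unlisted value, prints na.gif
```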
What's an epoch?
In the world of *nix systems, the epoch is the "beginning of time": January 1, 1970, at 00:00:00 UTC. Timestamps are counted in seconds from that moment, and there are lots and lots of libraries out there that will convert such a timestamp into local time for you.
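If all you have is a shell, the date utility can do the conversion. This is a small sketch that assumes GNU date (the -d "@N" form); BSD and macOS date use "date -r N" instead. The function names are made up for this example.

```shell
#!/bin/sh
# epoch_to_local SECONDS: print an epoch timestamp in the local time zone.
# Assumes GNU date; "@N" means N seconds since the epoch.
epoch_to_local() {
    date -d "@$1" '+%Y-%m-%d %H:%M:%S %Z'
}

# epoch_to_utc SECONDS: same conversion, but reported in UTC.
epoch_to_utc() {
    date -u -d "@$1" '+%Y-%m-%d %H:%M:%S UTC'
}

epoch_to_utc 0      # prints 1970-01-01 00:00:00 UTC
epoch_to_local 0    # the same instant, shown in your own time zone
```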
How do you decide who to ban for abuse?
I have log reports that tell me how many requests each remote computer has made to my server within the reporting period. They give me a list that looks like this:
reqs: %bytes: host
----: ------: ----
7058: 7.07%: 192.168.1.59
5494: 5.24%: 10.0.12.5
5270: 5.04%: 172.16.35.199
and so on...
That tells me that 192.168.1.59 has made 7058 requests during the reporting period.
At one request per hour, the expected number of requests is the length of the reporting period in hours: 13.9 days * 24 hours = 333.6 requests.
Finally, for every IP address that is generating more than the expected number of requests, I go to the server logs to see what's going on:
$ grep 192.168.1.59 logfile
192.168.1.59 - - [13/Feb/2004:01:49:28 -0500] "GET /feeds/maine-york.xml HTTP/1.1" 200
192.168.1.59 - - [13/Feb/2004:01:49:32 -0500] "GET /feeds/maine-york.xml HTTP/1.1" 200
192.168.1.59 - - [13/Feb/2004:01:49:41 -0500] "GET /feeds/maine-york.xml HTTP/1.1" 200
192.168.1.59 - - [13/Feb/2004:01:49:42 -0500] "GET /feeds/maine-york.xml HTTP/1.1" 200
192.168.1.59 - - [13/Feb/2004:01:49:44 -0500] "GET /feeds/maine-york.xml HTTP/1.1" 200
This machine is requesting the York, Maine file entirely too often. Into the ban list it goes.
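The arithmetic above is easy to script. The following is only a sketch of the idea, not the actual tooling behind the ban list: it assumes a report in the three-column format shown earlier, and the function name over_quota is invented for the example. It computes days * 24 as the expected count and prints every host that exceeds it.

```shell
#!/bin/sh
# over_quota DAYS [REPORT_FILE]: list hosts that made more than one
# request per hour over a reporting period of DAYS days.
# Reads the report from REPORT_FILE, or from standard input if omitted.
over_quota() {
    days=$1
    shift
    awk -F': *' -v days="$days" '
        BEGIN { expected = days * 24 }     # one request per hour
        { sub(/^[ \t]+/, "") }             # tolerate leading whitespace
        # Skip the header and rule lines: only rows starting with a count.
        $1 ~ /^[0-9]+$/ && $1 + 0 > expected { printf "%s %d\n", $3, $1 }
    ' "$@"
}

# With the sample report above, all three hosts exceed 333.6 requests.
over_quota 13.9 <<'EOF'
reqs: %bytes: host
----: ------: ----
7058: 7.07%: 192.168.1.59
5494: 5.24%: 10.0.12.5
5270: 5.04%: 172.16.35.199
EOF
```

Each host printed this way is still only a candidate; the server logs decide whether it is actually misbehaving.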
Another thing that will get a machine put into the ban list is erratic requests. If the requests aren't coming in on a regular schedule, I'll assume that you've inexplicably configured your web page to generate a request to my server for every request to yours. Don't do that. See "How do I parse these XML feeds?", above.
How do I download a new copy of the feed once per hour?
...on a *nix machine:
- Write a little shell script that grabs the feed using whatever utility you're comfortable using. For example, wget:
#!/bin/sh
/usr/local/bin/wget -q -O /some/directory/weatherfile.xml \
  http://weather.boygenius.com/feeds/state-city.xml
or lynx:
#!/bin/sh
/usr/bin/lynx \
  --dump http://weather.boygenius.com/feeds/state-city.xml \
  >/some/directory/weatherfile.xml
Call it get_weather.sh. Set its execute bit. Execute it, and see if the xml file shows up in /some/directory/. If it doesn't work, ask someone other than me to help you.
- Pick a number between 10 and 59
- Set up a cron job that looks like this:
[your number here] * * * * [your username] /path/to/script/get_weather.sh
If you don't know what a lynx is, or a cron, or you're actually planning on typing [your username] in the command rather than belvedere or edgar or whatever your actual username is, or you have no idea what a username is, you probably need help with this. Once again, please ask someone other than me.
For those of you who are paying attention, yes, you can put the wget/lynx/etc call directly into crontab. Go to town.
...on a Windows machine:
Kevin B. recommends the iOpus File and Web Page Downloader for this task. I've never used it, but it looks straightforward.
Some of the cities in the feeds list are associated with the wrong states.
This is a consequence of the NOAA's arrangement of their weather stations and reports. The data that I'm parsing is organized by state, but sometimes the NOAA sneaks a station from a nearby state into another state's report. When this happens, you wind up with things like "Cleveland, Pennsylvania".
This isn't really a big deal, unless Cleveland isn't also showing up in Ohio. If that's the case, please let me know and I'll fix it. Otherwise, just use the ohio-cleveland file, and pretend that the pennsylvania-cleveland file doesn't exist.
Where can I find the NOAA's XML feed, the one that you're using to generate your feeds?
I'm not using an XML feed to generate my feeds. I'm using raw state weather round up data, which you can view here. I'm using the files that contain "rwr" in their filenames.
One of three things is happening:
The feed that I'm using is very old.
See the next question.
The feed that I was using has disappeared.
For some reason, every now and again the NOAA will remove a city from their data. I don't know why this happens. When it does, the feed generation script will leave the most recent data file in place, in case the condition is temporary. Older files are eventually removed when I run a second script, which cleans out files that haven't been updated recently.
In a production environment, you may want to build a bit of redundancy into your feed retrieval process. Have your retrieval program look for the first-choice file. If it returns a 404, have it look for the second file, and so on. Please do not download all of the files at every retrieval to implement this redundancy. The missing cities problem does not occur frequently enough to warrant a threefold increase in the bandwidth usage.
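One way to sketch that fallback in shell is shown below. It relies on wget exiting with a non-zero status when the server answers with an error such as a 404. The function names are invented for the example, and the second and third feed names in the sample call are placeholders to replace with your own choices.

```shell
#!/bin/sh
# fetch OUTFILE URL: download one URL; non-zero exit status on failure.
# wget exits non-zero when the server answers with an error such as 404.
fetch() {
    wget -q -O "$1" "$2"
}

# fetch_first_available OUTFILE URL...
# Try each URL in order and stop at the first one that downloads, so
# only one file is transferred in the normal case.
fetch_first_available() {
    out=$1
    shift
    for url in "$@"; do
        # Download to a temporary name so a failed attempt never
        # overwrites the last good copy.
        if fetch "$out.tmp" "$url"; then
            mv "$out.tmp" "$out"
            return 0
        fi
        rm -f "$out.tmp"
    done
    return 1
}

# Example call (the second and third feed names are placeholders):
# fetch_first_available /some/directory/weatherfile.xml \
#     http://weather.boygenius.com/feeds/maine-york.xml \
#     http://weather.boygenius.com/feeds/state-city2.xml \
#     http://weather.boygenius.com/feeds/state-city3.xml
```

Because the loop returns as soon as one download succeeds, the second and third files are requested only when the earlier choices fail.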