Category Archives: Computing

Data Scraping

Time to get some automation in place.  Rather than having to enter the draw twice a week, it would be so much better to have the application grab the draw the morning after it’s made.  This is interesting stuff.  My application is going to access another server, grab the HTML, parse it and store the results in my database.  This kind of scripting has many uses, and automating and creating my own data feeds is something I’ve done in the past.  I once purchased a subscription weather service, extracted key values from it, and stored them in a database for use in a DSL line-quality metrics application.

A little poking around in Google searches yielded a gem called “Hpricot”.  Hpricot looked pretty cool.  It allowed you to parse HTML and pull out divs and spans based on HTML class and id.  Even better, they had a website where you could try it all out interactively.  I quickly realized that this would suit my needs exactly.

The MegaMillions site looked like it had been put together by people who understood what they were doing.  I say this because a great many sites aren’t.  This site had well-structured CSS and naming conventions.  Extracting the draw data was as simple as:

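require 'open-uri'   # lets open() fetch a URL
require 'hpricot'
# Fetch the page, then select the divs for each part of the draw by CSS class
doc = Hpricot(open("http://megamillions.com/includes/numberData_home.asp"))
draw_dates = doc/"div.num_date"
draw_picks = doc/"div.num_num"
draw_mega = doc/"div.num_mb"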

No more of that old string parsing code; this method just returned exactly what I needed.  Brilliant!

I came up with a little date calculation algorithm to determine whether the application should go and get the draw.  This was based on how many days had passed since the last draw recorded in the database.  For this I needed to remember to add require 'date'.  And, just for fun, I decided to add a group of nine buttons, which would allow you to review the past nine draws.
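The post doesn't show the date calculation itself, so here is only a minimal sketch of one way such a check might look.  It assumes draws fall on Tuesdays and Fridays and that a draw becomes available the morning after it is made; the constant and method names are hypothetical, not taken from the application.

```ruby
require 'date'

# Assumed draw days, as Date#wday values: Tuesday (2) and Friday (5).
DRAW_WDAYS = [2, 5].freeze

# Walk backwards from yesterday until we land on a draw day.
# "Yesterday" because the draw is only fetched the morning after it is made.
def latest_draw_before(today)
  day = today - 1
  day -= 1 until DRAW_WDAYS.include?(day.wday)
  day
end

# True when a draw has taken place since the last one stored in the database.
def fetch_needed?(last_recorded, today = Date.today)
  latest_draw_before(today) > last_recorded
end
```

In use, last_recorded would be the date of the newest draw row in the database, and the application would only go out to the remote site when fetch_needed? returns true.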

Random Rules of Programming – #2 in an occasional series

Once a process has been started, it should be possible to interrupt and terminate it before completion.

Ideally, any changes completed should be “rolled back” to the state that they were before the action was initiated.  It is recognized that this latter part can be an issue in some environments, which lack transaction-style processing.  Therefore, suitable warnings should be given when an irreversible action is about to be performed.

Bonus Picks!

Every once in a while we’d win.  Now let’s not get carried away, it would only be a few dollars every once in while.  Occasionally, as much as $10.  We had decided that any “modest” winnings would be re-invested as extra tickets for the next draw.

So I needed to provide a mechanism for adding “bonus” picks.  I’d already made allowance for this in my model, by providing a boolean “perm” column, which indicated whether or not a row of picks was part of the permanent collection.  I already had the mechanism for input from the view, using the Rails form_tag helper; all I needed to do was extend that to provide a multi-line input field.  Easily done.  Then I added the code in the controller to iterate through each line and add the temporary picks to the database.  Again, this was easy to do with text parsing and splitting on a carriage return line-feed, before reloading the page, all from within the controller.  One small thing to look out for was that the “params” were passed as an array, so I needed to access the zero-index element in the action.
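As a rough illustration of the line-splitting step, here is a small plain-Ruby sketch.  The method name is hypothetical, and it assumes each line holds one row of picks separated by spaces or commas; the real controller code isn't shown in this post and may well differ.

```ruby
# Hypothetical helper: turn the text from a multi-line input field into
# rows of integers, one row per non-blank line.
def parse_bonus_picks(raw_text)
  raw_text
    .split(/\r?\n/)            # browsers usually submit CRLF line endings
    .map(&:strip)
    .reject(&:empty?)          # ignore blank lines
    .map { |line| line.split(/[\s,]+/).map { |n| Integer(n, 10) } }
end
```

Each returned row could then be saved as a model row with its “perm” column set to false.  Integer(n, 10) raises an ArgumentError on anything that isn't a number, which seemed a safer choice for a sketch than silently turning bad input into zero.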

Not a huge increase in functionality, but satisfyingly easy to implement and deploy.  The complement to this new functionality was to add another button to erase any temporary picks from the database, which was literally a single call to destroy_all for any non-“perm” rows.

Great!  Now how about the annoying task of having to enter the twice-weekly draw?  Time to research some URI and HTML parsing.

Deploy, Deploy, Deploy!

So now I have a working application. It’s pretty simple: an ActiveRecord model stores the lottery picks, the user enters the current draw, and the resulting display shows the results, as well as any winning lines. I even added a helper to handle the formatting of the SPAN elements, which takes that logic out of the view.  All the layout and formatting is handled through CSS.
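To give a flavour of what such a helper might look like, here is a minimal sketch in plain Ruby.  The method name and the CSS class names (“pick” and “hit”) are assumptions made for illustration; the actual helper isn't shown here.

```ruby
# Hypothetical view helper: wrap each pick in a SPAN, adding an extra CSS
# class when the number also appears in the draw, so that the stylesheet
# alone decides how a match is displayed.
def pick_spans(picks, draw)
  picks.map do |n|
    css = draw.include?(n) ? "pick hit" : "pick"
    %(<span class="#{css}">#{n}</span>)
  end.join(" ")
end
```

The view then only has to call the helper for each row, and all decisions about color and format stay in the CSS.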

Now it’s time to try and deploy it to my hosting service.  At this time I was still with Network Solutions, which had a nifty ZIP deployment procedure for uploading Rails apps.  I create a database in the MySQL admin.  I migrate the data, such as it is, with an SQL export/import.  Upload the Rails ZIP, restart the server and…. Failure, 500 error.  I check the logs, they’re empty.  I try to check versions, but can’t because I don’t have shell access!  I find that the versions are not the same, so I go back and recreate the app with the version running on Network Solutions, including the gems.  Still getting the 500 error, and nothing in the logs!  I change permissions on the log folder, still nothing in the logs.  I determine that the 500 page is the one from my application, so something is working, I’m just not sure where to look.  I manhandle the environment.rb configuration to force everything into development, and suddenly I have log entries.  Aha!  It was running in production all along, and that, coupled with a permissions issue, was the reason I had no logs.  Now I see that I never updated the database.yml config for production; I had assumed the app would run in development until I explicitly switched it over.  Wrong!  So I update the database.yml file and, after a couple of iterations while I determine the correct connection details for the database, I finally get my application running from Network Solutions, accessible over the internet.  “Hello World”, indeed!

Intermission: I hate Network Solutions

I’ve been a customer of Network Solutions for some years now. My needs were simple: I wanted some domain hosting, email, and a place to store mostly HTML / JavaScript websites. More recently, I’ve been hosting some Ruby on Rails (RoR) apps, as I tinker with the technology. Up until about two weeks ago, there had never been a problem.

Network Solutions got hacked. They’ll tell you that it’s a WordPress issue, but it’s their issue, since it’s their script that sets up WordPress for you. To cut a long story short, Network Solutions seems to have a massive security breach of its file servers. A few months ago, many of their hosted sites were hacked. This wasn’t someone guessing a few hundred passwords. This was someone getting close to root access, and then running a few scripts to redirect some sites. As far as I can figure, this has happened several times in the past six months, as evidenced by Network Solutions resetting everyone’s passwords, FTP and otherwise.

Two weeks ago, email started going missing, sometimes delayed for eight hours or more. This was annoying. Then all my Rails apps started breaking. I couldn’t get the logger to switch to DEBUG in production, so it was hard to analyse. Then I did manage to see the error message, and could see that the app was not able to find folders or files reliably. I submitted several tickets. I got responses forty-eight hours later (not the guaranteed twenty-four!) saying that the issue had been resolved, which was odd since we’d barely begun a detailed exchange of information and, more to the point, the issue was definitely NOT resolved. I figured out a sort-of solution, which was to “Reset all file permissions”. After doing this, everything worked for a few hours before it all went wrong again. I submitted another ticket explaining my solution. I got another “Everything’s resolved” response. A day later, MY solution appeared on their front page, except it wasn’t a solution, since it only worked for a few hours at a time.

I’m no longer with Network Solutions.

Hey Network Solutions! Next time you go hacking through your file system, have the good grace to tell your customers what you’re doing. Stop responding to all tickets that the issue is resolved, and if the ticket is something to do with Rails, or another add-on, don’t give a stock response saying “we don’t offer support for these add-ons”, when the problem is ALL to do with you screwing with the file systems.

Stateless Development

So I had a functional Model, Controller, and View. My application loaded some rows from the database, performed some string manipulation / parsing, and handed them off to the View, which using CSS, formatted them for the screen. These rows from the database represented the lottery picks from the office pool. Now, I wanted to be able to enter the lottery draw for that day and check it against the picks.

I did some research and found out about “form helpers” and was able to create a single text field form, which in turn was able to pass its contents to the Controller. It seemed logical to me to want to re-use the same view and objects created when the form was initially displayed, and this is where I ran into some conceptual stumbling blocks. You see, I thought that once I’d populated a Model instance with data, and I was operating within the context of the same Controller, the data would still be available. But in reality all of the data that had been retrieved from the database vanished when I submitted the form and called a method from the same Controller. I couldn’t see where I was going wrong. Was it a scope issue? It was time to “phone a friend”.

My friend explained to me that this being a web application, it ran under different rules to the ones I was used to. It was stateless. Whatever had happened in previous executions was not carried over to the subsequent ones. Now, I’d done some ASP and JSP, and even PHP programming, and I knew that you could create variables that persisted, so why couldn’t I do that here? He explained that it was possible, but it wasn’t good practice, and it went against many of the design paradigms. So to fix my “vanishing data” problem, I’d have to re-query the database every time the page was displayed. This seemed like a terrible inefficiency: I’d need to make an SQL query all the time. Actually, it wouldn’t be as bad as I was making out, since cached queries would make it all quite efficient, but I still had a hard time reconciling the fact that I would be involving a database call when I already had those returned rows available. Oh well.

It was quite easy to re-jig things in the Controller to repeat the database query, and once I did that the application started working as intended. I could go to the page, which would display the lottery picks. I could enter the draw for that day and the display would alter color and format to show the matching numbers.
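The “vanishing data” effect can be imitated in plain Ruby without Rails at all.  The sketch below is only an illustration under stated assumptions: the class, its methods, and the sample picks are all hypothetical, and the dispatch method merely stands in for the way Rails builds a fresh controller object for every request.

```ruby
# Stand-in for the picks table (made-up sample data).
PICKS_TABLE = [[4, 8, 15, 16, 23, 42], [1, 2, 3, 4, 5, 6]].freeze

class PicksController
  # Stand-in for a model query against the database.
  def load_picks
    PICKS_TABLE.map(&:dup)
  end

  # First request: display the picks.
  def index
    @picks = load_picks
    @picks.length
  end

  # Second request, naive version: expects @picks to still be around.
  def check_naive(_draw)
    @picks # nil, because this is a brand new controller object
  end

  # Second request, fixed version: query again, then compare with the draw.
  def check(draw)
    @picks = load_picks
    @picks.map { |row| row & draw } # numbers in each row that match the draw
  end
end

# Every request gets its own controller instance, so instance variables
# set while handling one request are gone by the next.
def dispatch(action, *args)
  PicksController.new.public_send(action, *args)
end
```

Calling dispatch(:index) and then dispatch(:check_naive, [4, 15]) returns nil for the second call, which is exactly the stumbling block described above; dispatch(:check, [4, 15]) works because it repeats the query.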

First Steps

The initial spec. for this project is to write a Rails application that stores the office lottery pool picks in a simple database, allowing for multiple users (a future capability), and allows the user to enter additional picks.  The user will then be able to enter the current draw and the results will be displayed.  Not the most complex application in the world, but covering some basic functionality.

First off, I created a MySQL database, and added a table for my picks.  This required me to edit the database.yml config file to change it from sqlite3, which is how the rails command left it, to mysql.  I subsequently found out that there’s a command line switch that allows you to generate a config with mysql, but…   I used a tinyint for one of the boolean columns.  Now some of you may be thinking, why didn’t I use the generator for all of this?  The answer is I didn’t know about that stuff.  I generated the application with the rails command.  I generated a controller, and a model.  I popped a simple action in the controller to test everything, being careful to match the case-sensitive model name (I hate case-sensitive languages!):

 def list_picks
   @list=Megapick.find_by_sql("select * from mega;")
 end

and created a view with <%= debug(@list) %>.  What could be simpler?  It all failed, so I tried rake db:migrate, and restarted the server.  This now worked, so I started messing around with the query.  Perhaps I should use find(:all)?  So I tried that, and it didn’t work, which was worrying.  I dug around a bit and found something that said I should add a reference to the database table in the model: set_table_name "mega".  That fixed it right up, but I was obviously doing something the wrong way.

I poked around some more and found out about “scaffold”.  Scaffold was the method for building the framework connecting the models to the database.  You used to use it inline, but the recommended way was to use the generator.  I scrubbed the existing model and table, and ran the scaffold generator with a new model.  That failed.  I had used a model name that was plural.  That’s a big no-no.  Fixed that, remembering to check the routes.rb file, and lo!  I had a simple Model, Controller, View.  All model searches were working, and I could display the search results in the view.  I could control how they were formatted.  I had joined up all the dots.

Now I realize that I had gone about all of this in the wrong way, but there is something about having a “hello world” application that covers some important aspects of web application programming.  I started planning in my head the way I was actually going to structure this application.  Little did I realize that my “fat application” developer background was going to be thrown for a loop.  Coming up next: developing in a stateless environment.

Scheduling Tasks with Network Solutions

Of course my brain is now twitching with the possibilities even with a service provider like Network Solutions. No shell access! How can I schedule jobs? Can I run Ruby as CGI? (no). What if I need to run some regular maintenance? (I know, getting a little ahead of myself, but these are the kind of things that pop into my head, while I’m putting the initial design of something together).

Rails on Network Solutions is a well-guarded affair. You get to deploy, but you’re not getting access to a command line, so make sure it’s all fine and dandy before you deploy. That is fine most of the time, but you’ll definitely trip up somewhere, sometime, and then it’s a right royal pain. Setting up your Rails application is all done through their browser-based interface; after that you either upload a ZIP or handle it through FTP.

So, how about my scheduled tasks? Well, a little poking around reveals a nice little command called “runner”. Runner lives in the script directory of each application, and allows you to run Ruby files from the command line. Now, Network Solutions provides me with the capability of scheduling scripts/commands to run (again through the browser-based interface, no cron here!). So, a little test Ruby script and the creation of a scheduled task later, and “ta-da!”, I have a way of running Ruby scripts on a schedule. This is my little command:
/htdocs/rails/test_app1/script/runner /htdocs/rails/test_app1/hello.rb

Where “test_app1” is the Rails application name.
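The test script itself isn't reproduced in this post. As a minimal sketch of the sort of thing a file like hello.rb could contain, the hypothetical methods below append a timestamped line to a file, which is one way to confirm that a scheduled task really fired when there is no shell to check from. The method names and the message text are assumptions.

```ruby
require 'time'

# Build the line to record; the clock is passed in so it can be tested.
def heartbeat_line(now = Time.now)
  "hello.rb ran at #{now.utc.iso8601}"
end

# Append the line to the given file, creating the file if needed.
def record_heartbeat(path, now = Time.now)
  File.open(path, 'a') { |f| f.puts(heartbeat_line(now)) }
end
```

The script would end with a single call to record_heartbeat, passing the path of a file the application is allowed to write to, and that file can then be fetched over FTP to see whether the schedule is working.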

As I said, I’m not sure why I’m going to need this, but I bet I will.

Interlude – Think Different!

A few years ago Apple ran a campaign picturing some famous and renowned “free thinkers” under the concept “Think Different!”. The idea being that people who used Apple products were creative and able to put their stamp on the technology being used.

My wife asked to put one of my sounds on her iPhone for use as a ringtone. Simple, right?

Turns out that getting your own content onto an iPhone is much harder than it should be. Even once you’ve achieved that, assigning your file to be a ringtone exceeded my patience. I understand that Apple makes systems that are “simple” to use, removing a lot of the sometimes unnecessary clutter, but time and time again, I find that they leave out key features, or actively prevent you from being able to perform the simplest of actions. The default Mail program in OS X has no ability to download “headers only”, which seems to me to be a fundamental option.

It’s not so much “Think Different” as “Think the same as us”, which seems to be reflected in the new platforms such as the iPhone and iPad. These devices are “consumer” interfaces, portals into functionality over which you, the user, have little or no control.