Description: Twitter data crawler, replies archiver, and statistics generator

Installing Twitalytic on DreamHost

NOTE: Every instance of “yourusername” and “yourdomain.com” you’ll have to replace with the values appropriate to you. “yourusername” always refers to your DreamHost username, not your Twitter username. I’m making certain assumptions about what your DreamHost environment looks like and your mileage may vary. If you want to access Twitalytic at a URL other than yourdomain.com/tweets, or on a subdomain, you’ll have to adjust accordingly.

I picked “tweets” and not “twitalytic” for the path to reduce confusion between directory names on the server and the path in the URL. I’m sure there are errors to be found and clarifications to be made—please help me clean this up!

Contents

  1. The short version
  2. The long version
  3. Cron enhancement

The short version:

  1. In the DreamHost Panel set your domain to use PHP5.
  2. Get shell access and SSH in.
  3. git clone git://github.com/ginatrapani/twitalytic.git
  4. Make a database, then:
    cd twitalytic
    mysql -u john_twitalytic -p -h mysql.yourdomain.com john_twitalytic_db < build-db.sql
  5. wget -O Smarty-2.6.26.tar.gz "http://www.smarty.net/do_download.php?download_file=Smarty-2.6.26.tar.gz"
    gunzip -c Smarty-2.6.26.tar.gz | tar -xf -
  6. ln -s ~/twitalytic/webapp ~/yourdomain.com/tweets
  7. Register the app with Twitter:
    Application Website: http://yourdomain.com/tweets
    Application Type: Browser
    Callback URL: http://yourdomain.com/tweets/account/oauth.php
    Copy the Consumer key and Consumer secret
  8. Rename common/config.sample.inc.php to common/config.inc.php, and enter the oauth_consumer_key and oauth_consumer_secret as above, plus:
    log_location: /home/yourusername/twitalytic/logs/
    site_root_path: /tweets/
    smarty_path: /home/yourusername/twitalytic/Smarty-2.6.26/
    And the db_host, db_user, db_password, and db_name from Step 4.
  9. Browse to http://yourdomain.com/tweets/session/register.php and create an account.
  10. Wait for the activation email, click on the link it contains to activate your account and log in, then jump through the hoops to authorize Twitalytic to access your Twitter account.
  11. Back in SSH, crawl your tweets: /usr/local/php5/bin/php ~/twitalytic/crawler/crawl.php
  12. Set up a cron job to run this command every hour:
    cd /home/yourusername/twitalytic && /usr/local/php5/bin/php /home/yourusername/twitalytic/crawler/crawl.php
  13. Enjoy!

The long version:

1. Set your domain to use PHP 5

Go to Manage Domains on the DreamHost Panel. Click on Edit next to the domain you want to use (we’ll call it yourdomain.com from now on) and scroll down to PHP mode under Web Options. If it doesn’t say PHP 5 CGI or PHP 5 FastCGI, change it so it does (either will work; DreamHost recommends FastCGI). If you’re afraid this will break some of your existing scripts, you could create a new subdomain for Twitalytic instead and set that to PHP 5. Scroll down and click on Change settings.

2. Make sure you have shell access

On the DreamHost Panel’s Manage Users page, find the username that you usually use to upload files to your site and click Edit next to it. Next to User Account Type check the Shell account radio button. Scroll to the bottom and click Save changes.

3. SSH to your shell account

On Linux, Unix or OS X you can just type ssh yourusername@yourdomain.com in a terminal window. In Windows you’ll need to download an SSH client like PuTTY (free). Once you’re connected you should be in your home directory (if you type pwd (“print working directory”) it should tell you something like /home/yourusername).

4. Clone the GitHub repository

Inside your home directory you will have a subdirectory for each of your domains and subdomains (type ls if you want to see them listed). Your first impulse may be to switch to one of those subdirectories (domains) to install Twitalytic there, but I recommend putting it in its own subdirectory so that not all of its files will be accessible from the web (later I’ll show you how to make the “webapp” directory publicly accessible). So, to download the Twitalytic source, type git clone git://github.com/ginatrapani/twitalytic.git. Git will work its magic and put its files in a subdirectory called twitalytic, e.g. /home/yourusername/twitalytic.

(If you want the subdirectory to be called something different, you can instead type git clone git://github.com/ginatrapani/twitalytic.git somedirectory.)

5. Set up your database

On the DreamHost Panel’s MySQL Databases page scroll down to Create a new MySQL database. For Database Name choose something unique like john_twitalytic_db. In the Use Hostname drop-down either choose an existing hostname, if you have one, or choose Create a new hostname now… If you create a new hostname, choose your domain (e.g. “yourdomain.com”) from the drop-down on the right and enter “mysql” in the text box to its left, which will create a hostname called “mysql.yourdomain.com”. Next to First User you can either choose an existing user or create a new one. If you create a new one you’ll have to pick a unique username (e.g. john_twitalytic), come up with a new, strong password, and enter it in both password boxes. Make note of all of these values (you’ll need them later) and click on Add new database now!

Back in SSH, change to the Twitalytic directory with cd twitalytic. Then import the database structure from build-db.sql into the database with the following command:

mysql -u john_twitalytic -p -h mysql.yourdomain.com john_twitalytic_db < build-db.sql

Make sure you fill in the correct values for the database you just created, and enter the password when prompted.

6. Install Smarty

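DreamHost doesn’t provide Smarty, the template system Twitalytic needs, so you have to download it and install it. First, from the Twitalytic directory, download Smarty using wget: wget -O Smarty-2.6.26.tar.gz "http://www.smarty.net/do_download.php?download_file=Smarty-2.6.26.tar.gz". The -O option saves the download as Smarty-2.6.26.tar.gz (otherwise wget typically names the file after the last part of the URL, query string included), and the quotes keep the shell from interpreting the ? in the URL.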

Now unzip Smarty by typing gunzip -c Smarty-2.6.26.tar.gz | tar -xf -. This will unpack Smarty into a subdirectory, e.g. /home/yourusername/twitalytic/Smarty-2.6.26.

7. Set up a symbolic link to webapp

Now it’s time to make the webapp directory publicly accessible on the web. Let’s say you want to access Twitalytic at http://yourdomain.com/tweets, so you’ll make a symbolic link (a kind of shortcut) at /home/yourusername/yourdomain.com/tweets that points to /home/yourusername/twitalytic/webapp. To do this, first switch to your home directory by typing cd ~ (“~” is shorthand for your home directory). Now make the link by typing ln -s ~/twitalytic/webapp ~/yourdomain.com/tweets. The existing directory comes first and the name of the new link comes second.

You can check to make sure the link has been created by typing cd ~/yourdomain.com/tweets. If you don’t get a “No such file or directory” error, it worked.
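
If you’d like to try the argument order somewhere harmless first, here is a minimal sketch that builds the same layout in a scratch directory. The scratch directory is my own stand-in for /home/yourusername (an assumption for practice only); nothing in your real home directory is touched.

```shell
# Scratch directory standing in for /home/yourusername (practice only)
scratch=$(mktemp -d)
mkdir -p "$scratch/twitalytic/webapp" "$scratch/yourdomain.com"

# ln -s TARGET LINK_NAME: the existing directory first, the new link second
ln -s "$scratch/twitalytic/webapp" "$scratch/yourdomain.com/tweets"

# readlink prints where the link points; test -d follows the link
readlink "$scratch/yourdomain.com/tweets"
test -d "$scratch/yourdomain.com/tweets" && echo "link resolves to a directory"
```

Running readlink on your real link (readlink ~/yourdomain.com/tweets) should likewise print the path of the webapp directory.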

8. Register your app with Twitter

Since you’re hosting Twitalytic yourself, you have to register it as an app with Twitter. Head over to Twitter’s Applications page and click on Register a new application. For Application Name put something unique like “John’s Twitalytic.” Enter anything in Description (it just can’t be blank). For Application Website put the URL you set up in the previous step, e.g. http://yourdomain.com/tweets. For Application Type choose Browser. For Callback URL put e.g. http://yourdomain.com/tweets/account/oauth.php. Finally, click on Save.

On the next page Twitter will give you some information. Copy down the Consumer key and the Consumer secret for later.

9. Edit Twitalytic configuration files

9a. Rename common/config.sample.inc.php to common/config.inc.php and edit it.

From the Twitalytic directory, rename the sample file by typing mv common/config.sample.inc.php common/config.inc.php. Then type nano -w common/config.inc.php. This will give you a nice Notepad/TextEdit-like editor showing the configuration file. An important thing to remember is that the values you enter here must be enclosed in quotation marks (single or double, but they must match), and each line must end with a semicolon (;). Find the following lines:

$TWITALYTIC_CFG['oauth_consumer_key'] = 'yourconsumerkey';
$TWITALYTIC_CFG['oauth_consumer_secret'] = 'yourconsumersecret';

Replace yourconsumerkey with the Consumer key you copied down in the last step. Replace yourconsumersecret with the Consumer secret you got.

Now scroll down to this line:

$TWITALYTIC_CFG['log_location'] = '/your-path-to/twitalytic/crawler/logs/';

And replace /your-path-to/twitalytic/crawler/logs/ with e.g. /home/yourusername/twitalytic/logs/.

On this line:

$TWITALYTIC_CFG['site_root_path'] = '/';

Replace / with e.g. /tweets/ (i.e. the part of the URL after “yourdomain.com”).

Next we have to tell it where we installed Smarty in Step 6. Find this line:

$TWITALYTIC_CFG['smarty_path'] = '/usr/local/php5/lib/php/smarty/libs/';

And replace /usr/local/php5/lib/php/smarty/libs/ with e.g. /home/yourusername/twitalytic/Smarty-2.6.26/.

Now, remember the database we set up? Have those values ready. Scroll down to these lines:

$TWITALYTIC_CFG['db_host'] = "localhost";
$TWITALYTIC_CFG['db_user'] = "user";
$TWITALYTIC_CFG['db_password'] = "s3cret";
$TWITALYTIC_CFG['db_name'] ="twitalytic";

Replace localhost with the hostname you chose, e.g. mysql.yourdomain.com. Replace user with the username you chose, e.g. john_twitalytic, and replace s3cret with the corresponding password. Replace the database name, twitalytic, with the database you chose, e.g. john_twitalytic_db.

Finally, save the file by pressing Ctrl+X, then pressing Y when it asks you if you want to “Save modified buffer”, and pressing enter when it asks for the “File Name to Write” (so it will save over the already-existing file).
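
If you’d rather not edit by hand, the same kind of change can be sketched with cp and sed. This is only an illustration under my own assumptions: it works on a scratch copy containing two of the sample lines quoted above (the real sample file has more lines and may be laid out differently), and the replacement values are the placeholders used throughout this page.

```shell
# Scratch copy holding two of the sample lines quoted above
work=$(mktemp -d)
cat > "$work/config.sample.inc.php" <<'EOF'
$TWITALYTIC_CFG['site_root_path'] = '/';
$TWITALYTIC_CFG['db_host'] = "localhost";
EOF

# Copy instead of renaming, so the sample stays around for reference
cp "$work/config.sample.inc.php" "$work/config.inc.php"

# Swap the default values; -i.bak edits in place and keeps a .bak backup
sed -i.bak \
  -e "s|\['site_root_path'\] = '/';|['site_root_path'] = '/tweets/';|" \
  -e "s|\['db_host'\] = \"localhost\";|['db_host'] = \"mysql.yourdomain.com\";|" \
  "$work/config.inc.php"

cat "$work/config.inc.php"
```

Whichever way you edit the file, it’s worth reading it over afterwards to confirm that every value is still quoted and every line still ends with a semicolon.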

9b. Rename crawler/config.crawler.sample.inc.php to crawler/config.crawler.inc.php and edit it

Type nano -w crawler/config.crawler.inc.php. On this line:

$INCLUDE_PATH = "/Users/gina/Sites/twitalytic/common";

Replace /Users/gina/Sites/twitalytic/common with e.g. /home/yourusername/twitalytic/common

Save the file by pressing Ctrl+X then Y then enter.

9c. Rename webapp/config.webapp.sample.inc.php to webapp/config.webapp.inc.php and edit it

Type nano -w webapp/config.webapp.inc.php and then do exactly what you did in 9b.
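
Since 9b and 9c make the same change to two files, here is a rough sketch of doing both in one loop. It runs against scratch files that contain only the $INCLUDE_PATH line quoted above (an assumption; the real sample files may contain more), and writes the edited copy under the non-sample name so the sample itself is left untouched.

```shell
# Scratch stand-ins for the two sample files, each with just the quoted line
work=$(mktemp -d)
mkdir -p "$work/crawler" "$work/webapp"
for part in crawler webapp; do
  echo '$INCLUDE_PATH = "/Users/gina/Sites/twitalytic/common";' \
    > "$work/$part/config.$part.sample.inc.php"
done

# Write an edited copy of each sample under the name Twitalytic expects
for part in crawler webapp; do
  sed 's|/Users/gina/Sites/twitalytic/common|/home/yourusername/twitalytic/common|' \
    "$work/$part/config.$part.sample.inc.php" > "$work/$part/config.$part.inc.php"
done

cat "$work/crawler/config.crawler.inc.php" "$work/webapp/config.webapp.inc.php"
```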

10. Create a Twitalytic account

Open your web browser and navigate to e.g. http://yourdomain.com/tweets/session/register.php. Fill out the form and submit it, whereupon you’ll be sent an email with an activation link. Click on the link to activate your account and log in, then jump through the hoops to authorize Twitalytic to access your Twitter account.

11. Crawl your tweets

Twitalytic won’t do anything until the crawler has run. To run it manually, go back to your SSH window (Step 3) and from the Twitalytic directory (e.g. cd ~/twitalytic) run the crawler like this:

/usr/local/php5/bin/php crawler/crawl.php

(We have to give the full path to PHP 5 because otherwise DreamHost defaults to PHP 4 and falls over.)

Nothing will happen for a few minutes, and then you’ll be returned to the command prompt. When you go back to Twitalytic in your web browser you should see some of your recent tweets. That means it’s working!

12. Set up a cron job to crawl your tweets periodically

Because Twitter limits the number of data requests an app makes each hour, it probably won’t be able to crawl all of your tweets and replies in one go. This means you’ll have to crawl periodically to get all of your tweets, and to get new tweets. Instead of having to enter the crawl command every hour, you can tell DreamHost to do it automatically for you.

Go to the DreamHost Panel and on the Cron Jobs page click Add New Cron Job. For User choose the same shell user you set up in Step 2. Give it a meaningful Title like “Twitalytic Crawler.” For Email output to, put in your email address. You’ll want to remove this later, but for now it’s useful for making sure everything is running smoothly. Make sure Status is “Enabled.” For Command to run enter the following:

cd /home/yourusername/twitalytic &&
/usr/local/php5/bin/php /home/yourusername/twitalytic/crawler/crawl.php 

Check the Use locking box and for When to run choose Hourly. Then click on Edit to save your changes.

13. Enjoy!

Cron enhancement

If your cron job is successful there will be no output, so you won’t get an email. If you want it to email you the log of the most recent crawl, you can instead paste the following in the Command to run box (replacing yourusername, of course):

lytic=/home/yourusername/twitalytic &&
log=$lytic/logs/crawler.log &&
set -- $(wc -l $log) && lines=$1 &&
cd $lytic &&
/usr/local/php5/bin/php $lytic/crawler/crawl.php &&
set -- $(wc -l $log) &&
tail -n $(( $1 - lines )) $log

This is my first-ever attempt at a bash script, so be gentle. Basically what it does is count the lines in the log file, run the crawler, count the lines again and calculate the difference, then show that number of lines from the end of the log.
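
To see the count, run, count again, tail idea in isolation, here is a small sketch you can run anywhere. The fake_crawl function and the scratch log file are hypothetical stand-ins of mine for the real crawler and crawler.log; only the line-counting logic mirrors the command above.

```shell
# Scratch log plus a stand-in "crawler" that appends two lines to it
log=$(mktemp)
printf 'old line 1\nold line 2\n' > "$log"
fake_crawl() { printf 'new line A\nnew line B\n' >> "$log"; }

before=$(wc -l < "$log")   # line count before the run
fake_crawl
after=$(wc -l < "$log")    # line count after the run

# Keep only the lines this run added (after minus before)
new_lines=$(tail -n "$((after - before))" "$log")
echo "$new_lines"
```

Reading the count with wc -l < "$log" gives just the number, which avoids the set -- trick needed to split the file name off the output of wc -l $log.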

Last edited by ginatrapani, Sun Aug 30 11:40:14 -0700 2009