2tap.com

Throttling uploads on Linux

I have recently been developing some fancy AJAX upload progress meters for a project I’m working on. This is using the new(ish) hooks in PHP which, when coupled with an extension such as APC, allow for polling of the upload progress as a file uploads in a standard HTML form.

Developing on a local server, however, means that file uploads are near instantaneous which makes testing… problematic. How best to simulate a real user’s experience?

My first instinct was to see if there were any suitable modules for Apache to enable bandwidth throttling. Apache 1.3 has mod_throttle which seems to be up to the task but I’m using Apache 2 and I don’t believe mod_throttle has been ported yet.

There also seem to be some extensions for Firefox which enable bandwidth limiting but these, unfortunately, are written for Windows environments.

The solution? trickle. Trickle is a portable lightweight userspace bandwidth shaper. It allows bandwidth limiting on a per-program basis and can be simply called with the executable as one of its parameters. Even better, it’s available in the Ubuntu repositories:

apt-get install trickle

So, to restrict Firefox’s upload bandwidth we can run the following command:

trickle -s -d 1000 -u 10 firefox

This limits the upload rate of Firefox to 10 KB/s (trickle takes its rates in kilobytes per second). Perfect for testing form uploads.

Note: The -d flag shouldn’t be necessary (according to the docs) but without this arbitrarily high setting, download bandwidth seems to be hampered. The -s flag merely instructs trickle to run in standalone mode (as opposed to running through the trickle daemon).
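
To get a feel for how long a test upload will take at a given limit, a quick back-of-the-envelope calculation helps. The sketch below is in Python and is not part of the original setup. The function names are made up for illustration, and it assumes a kilobyte of 1024 bytes and ignores protocol overhead. It also builds the same argument list as the command above, so the limit can be varied from a test script.

```python
def trickle_command(program, upload_kbs, download_kbs=1000):
    """Build the argument list for a standalone trickle invocation.

    Uses only the -s, -d and -u flags shown in the command above.
    """
    return ["trickle", "-s", "-d", str(download_kbs), "-u", str(upload_kbs), program]


def upload_seconds(size_bytes, upload_kbs):
    """Rough time for an upload of size_bytes at upload_kbs KB/s.

    Assumption: 1 KB is 1024 bytes; protocol overhead is ignored.
    """
    return size_bytes / (upload_kbs * 1024)


if __name__ == "__main__":
    print(" ".join(trickle_command("firefox", 10)))
    print(round(upload_seconds(5 * 1024 * 1024, 10)), "seconds for a 5 MB file")
```

At 10 KB/s a 5 MB file takes roughly eight and a half minutes, which is plenty of time to watch a progress meter move.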

Efficient caching of versioned JavaScript, CSS and image assets for fun and profit

“The new image is showing but I think it’s using the old stylesheet!”

Sound familiar?

Caching?

Caching of a web page’s assets such as CSS and image files can be a double-edged sword. On the one hand, if done right, it can lead to much faster load times with less strain on the server. If done incorrectly, or worse not even considered, developers are opening themselves up to all kinds of synchronisation issues whenever files are modified.

In a typical web application, certain assets rarely change. Common theme images and JavaScript libraries are a good example of this. On the other hand, CSS files and the site’s core JavaScript functionality are prime candidates for frequent change but it is not an exact science and generally impossible to predict.

Caching of assets is the browser’s default behaviour. If an expiry time is not specifically set, it is up to the browser to decide how long to wait before checking the server for a new version. Once a file is in a browser’s cache, you’re at the mercy of that browser as to when the user will see the new version. Minutes? Hours? Days? Who knows. Your only option is to rename the asset in order to force the new version to be fetched.

So caching is evil, right? Well, no. With a little forethought, caching is your friend. And the user’s friend. And the web server’s friend. Treated right, it’s the life of the party.

Imagine your site is deployed once and nothing changes for eternity. The optimal caching strategy here is to instruct the browser to cache everything indefinitely. This means that, after the first visit, a user may never have to contact the server again. Load times are speedy. Your server’s relaxed. All is well. The problem, of course, is that any changes you do inevitably make will never be shown to users who have the site in their cache. At least, not without renaming the changed asset so the browser considers it a new file.

So the problem is that we want the browser to cache everything forever. Unless we change something. And we want the browser to know when we do this. Without asking us. And it’d be nice if this was automated. Ideas?

Option One – Set an expiry date in the past for all assets

Never cache anything!

Not really an option, but it does solve half of the problem. The browser will never cache anything and so the user will always see the latest version of all site assets. It works, but we’re completely missing out on one of the main benefits of caching – faster loading for the user and less stress on the server. Next.

Option Two – Include a site version string in every URL

One commonly used strategy is to include a unique identifier in every URL which is changed whenever the site is deployed. For example, an image at the following URL:

/images/logo.png

Would become:

/images/logo.82.png

Here, 82 is a unique identifier. With some Apache mod_rewrite trickery, we can transparently map this to the original URL. As far as the browser is concerned, this is a different file to the previous logo.81.png image and so any existing cache of this file is ignored.

Generally, this technique is employed in a semi-automated way. The version number can either be set manually in a configuration file (for example) or pulled from the repository version number. With this technique, all assets can be set to cache indefinitely.
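
As a rough illustration of the URL transformation, here is a sketch in Python (rather than the PHP used later in this post). The function name, and the idea of passing the site version in as an argument, are assumptions made for the example.

```python
import posixpath


def add_site_version(url_path, site_version):
    """Insert a site-wide version before the file extension of a URL path."""
    root, ext = posixpath.splitext(url_path)
    if not ext:
        # no extension to hang the version on; leave the URL untouched
        return url_path
    return "{}.{}{}".format(root, site_version, ext)


if __name__ == "__main__":
    print(add_site_version("/images/logo.png", 82))
```

Every asset URL gets the same number, which is exactly why a single deployment invalidates the whole cache.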

The above is a pretty good solution. I’ve used it myself. But it’s not optimal. Every time a new version of the site is deployed, every asset in the user’s cache is invalidated, whether it changed or not. The whole site needs to be downloaded again. If site updates are infrequent, this isn’t too much of a problem. It sure as hell beats never caching anything or, worse, leaving the browser to decide how long to cache each item.

Option Three – Fine grained caching + Automated!

Clearly, the solution is to include a unique version string per file. This means that every file is considered independently and will only be re-downloaded if it has actually changed. One technique for doing this is to use the file’s last-modified timestamp. This gives a unique ID for the file which will change every time the file’s contents change. If the file is under version control (your projects are versioned, right?) we can’t use the modified timestamp as-is since it will change whenever the file is checked out. But we can find out what revision the file was last changed in (under SVN at least) so we’re still good to go.
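
To make the SVN part concrete, here is a small Python sketch that pulls the last-changed revision out of the text printed by svn info. The function name is hypothetical and the sample output is made up for illustration; only the "Last Changed Rev" label is taken from real svn info output.

```python
import re


def last_changed_rev(svn_info_output):
    """Return the 'Last Changed Rev' from `svn info` output, or None if absent."""
    match = re.search(r"^Last Changed Rev: (\d+)", svn_info_output, re.MULTILINE)
    return int(match.group(1)) if match else None


# Made-up sample of what `svn info` prints for a versioned file.
SAMPLE = """Path: www/images/logo.png
Name: logo.png
URL: http://svn.example.com/project/trunk/www/images/logo.png
Revision: 120
Node Kind: file
Last Changed Author: russ
Last Changed Rev: 82
Last Changed Date: 2008-09-01 12:00:00 +0100 (Mon, 01 Sep 2008)
"""

if __name__ == "__main__":
    print(last_changed_rev(SAMPLE))
```

Note the difference between the two numbers in the sample: Revision is the revision of the checkout as a whole, while Last Changed Rev only moves when the file itself is committed, which is what makes it usable as a per-file version.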

The goal is as follows: To instruct the browser to cache all assets (in this case, JavaScript, CSS and all image files) indefinitely. Whenever an asset changes, we want the URL to also change. The result of this is that whenever we deploy a new version of the site, only assets that have actually changed will be given a new URL. So if you’ve only changed one CSS file and a couple of images, repeat visits to the site will only need to re-download these files. We’d also like it to be automated. Only a masochist would attempt to manually change URLs whenever something changes on any sufficiently complex site.

Presented here is an automated solution for efficient caching using a bit of PHP and based on a site in an SVN repository. It’s also based around Linux. It could easily be adapted to other scripting languages, operating systems and/or version control systems – these technologies are merely presented here as an example.

To achieve the automated part, we need to run a script on the checked out version of the site prior to its deployment. The script will search the project for URLs (for a specific set of assets) and will rewrite the URL for any that it finds including a unique identifier. In our case, we’ll use the svn info command to find out the last revision the file actually changed in. Another approach would be to simply take a hash of the file contents (md5 would be a good candidate) and use this as its last-changed-identifier.
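
The hash-based alternative might look something like the following Python sketch. The function name and the choice to keep only the first eight hex characters are assumptions for the example, not something the deployment script in this post does.

```python
import hashlib


def content_version(path, length=8):
    """Return a short hex digest of the file's contents to use as a version."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # read in chunks so large images do not need to fit in memory at once
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()[:length]
```

One thing to bear in mind with this approach: the identifier contains letters as well as digits, so any URL rewriting that expects a purely numeric version would need its pattern widened accordingly.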

Rather than renaming each file to match the included identifier we set in the URL, we’ll use mod_rewrite within Apache to match a given format of URL back to its original. So myasset.123.png will be transparently mapped back to its original myasset.png filename.
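
The mapping itself is just a regular expression substitution. Here is a Python sketch of the same idea, handy for checking which URLs a rule of this kind would touch. The function name is hypothetical and the extension list is an assumption; it should match whatever your deployment script versions.

```python
import re

# Assumed extension list; keep it in step with the assets being versioned.
VERSIONED = re.compile(r"^(.+)\.[0-9]+\.(js|css|jpg|jpeg|gif|png|ico|htc)$")


def strip_version(url_path):
    """Map asset.VERSION.ext back to asset.ext; other paths pass through unchanged."""
    return VERSIONED.sub(r"\1.\2", url_path)


if __name__ == "__main__":
    for url in ("myasset.123.png", "/images/logo.png", "jquery-1.2.6.js"):
        print(url, "->", strip_version(url))
```

The last example shows a pitfall: a file that is not versioned but whose name happens to end in a number before the extension (jquery-1.2.6.js) loses that number too. If you have assets named like that, either make sure they are always versioned or tighten the pattern.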

Here’s a quick script I knocked up in PHP to facilitate this process. It should be run on a checked out working copy. It scans a given directory for files of a given type (in my case, .tpl (HTML templates) and .css files). Within each file it finds, it looks for any assets of a given type referenced in applicable areas (href and src attributes in HTML, url() in CSS). It then converts each URL to a filesystem path and checks the working copy for its existence. If it finds it, the URL is rewritten to include the last modified version number (pulled from svn info). Once this is done we just need to include an Apache mod_rewrite rule as discussed above.
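
Before the full script, the extraction step on its own may be easier to follow. This is a simplified Python sketch of that one step, not a port of the PHP below: the names are made up, and stripping quotes from url() values is an addition of the sketch.

```python
import re

# Patterns for src/href attributes in HTML and url() values in CSS.
ASSET_PATTERNS = [
    re.compile(r'(?:src|href)="(.*?)"', re.IGNORECASE),
    re.compile(r"url\((.*?)\)", re.IGNORECASE),
]
ASSET_EXTENSIONS = ("jpg", "jpeg", "png", "gif", "css", "ico", "js", "htc")


def find_assets(text):
    """Return the URL paths in text that end in one of the asset extensions."""
    suffixes = tuple("." + ext for ext in ASSET_EXTENSIONS)
    found = []
    for pattern in ASSET_PATTERNS:
        for url in pattern.findall(text):
            # url() values may be quoted; drop quotes, query string and fragment
            path = url.strip("'\" ").split("?", 1)[0].split("#", 1)[0]
            if path.lower().endswith(suffixes):
                found.append(path)
    return found


if __name__ == "__main__":
    print(find_assets('<img src="/images/logo.png"><a href="/about.html">About</a>'))
    print(find_assets("body { background: url('../img/bg.gif'); }"))
```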

The PHP

<?php
 
//
// config
//
$arr_config = array(
 
    // file types to check within for assets to version
    'file_extensions' => array('tpl', 'css'),
 
    // asset extensions to version
    'asset_extensions' => array('jpg', 'jpeg', 'png', 'gif', 'css', 'ico', 'js', 'htc'),
 
    // filesystem path to the webroot of the application (so we can translate
    // relative urls to the actual path on the filesystem)
    'webroot' => dirname(__FILE__) . '/../www',
 
    // regular expressions to match assets
    'regex' => array(
        '/(?:src|href)="(.*)"/iU', // match assets in src and href attributes
        '/url\((.*)\)/iU'          // match assets in CSS url() properties
    )
);
 
//
// arguments
//
 
// we require just one argument, the root path to search for files
if(!isset($_SERVER['argv'][1])) {
    die("Error: first argument must be the path to your working copy\n");
}
 
//
// execute
//
version_assets($_SERVER['argv'][1], $arr_config);
 
 
 
 
/**
 * Checks each file in the passed path recursively to see if there are any assets
 * to version.
 *
 * Only file extensions defined in the config are checked and then only assets matching
 * a particular filetype are versioned.
 *
 * If an asset referenced is not found on the filesystem or is not under version control
 * within the working copy, the asset is ignored and nothing is changed.
 *
 * @param str $str_search_path    Path to begin scanning of files
 * @param arr $arr_config         Configuration params determining which files to check, which
 *                                asset extensions to check etc.
 * @return void
 */
function version_assets($str_search_path, $arr_config) {
 
    // pull in filenames to check
    $arr_files = get_files_recursive($str_search_path, $arr_config['file_extensions']);
 
    foreach($arr_files as $str_file) {
 
        // load the file into memory
        $str_file_content = file_get_contents($str_file);
 
        // look for any matching assets in the regex list defined in the config
        $arr_matches = array();
 
        foreach($arr_config['regex'] as $str_regex) {
 
            if(preg_match_all($str_regex, $str_file_content, $arr_m)) {
                $arr_matches = array_merge($arr_matches, $arr_m[1]);
            }
        }
 
        // filter out any matches that do not have an extension defined in the asset list
        $arr_matches_filtered = array();
 
        foreach($arr_matches as $str_match) {
 
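            $arr_url = parse_url($str_match);

            // skip malformed urls and urls without a path component
            if(!is_array($arr_url) || !isset($arr_url['path'])) {
                continue;
            }

            $str_asset = $arr_url['path'];
 
            // the $ anchor sits outside the group so it applies to every extension
            if(preg_match('/\.(' . implode('|', $arr_config['asset_extensions']) . ')$/i', $str_asset)) {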
                $arr_matches_filtered[] = $str_asset;
            }
        }
 
        // if we've found any matches, process them
        if(count($arr_matches_filtered)) {
 
            // flag to determine if we need to write any changes back once we've processed
            // each match
            $boo_modified_file = false;
 
            foreach($arr_matches_filtered as $str_url_asset) {
 
                // use parse_url to extract just the path
                $arr_parsed = parse_url($str_url_asset);
                $str_url_path = $arr_parsed['path'] . @$arr_parsed['query'] . @$arr_parsed['fragment'];
 
                // if this is a relative url (e.g. beginning ../) then work out the filesystem path
                // based on the location of the file containing the asset ($str_file is already
                // a filesystem path, so the webroot must not be prepended here)
                if(strpos($str_url_path, '../') === 0) {
                    $str_fs_path = dirname($str_file) . '/' . $str_url_path;
                }
                else {
                    $str_fs_path = $arr_config['webroot'] . '/' . $str_url_path;
                }
 
                // normalise path with realpath
                $str_fs_path = realpath($str_fs_path);
 
                // only proceed if the file exists
                if($str_fs_path) {
 
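                    // execute the svn info command to retrieve the change information
                    // (the path is escaped so spaces or shell metacharacters cannot break the command)
                    $str_svn_result = @shell_exec('svn info ' . escapeshellarg($str_fs_path));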
                    $arr_svn_matches = array();
 
                    // extract the last changed revision to use as the version
                    preg_match('/Last Changed Rev: ([0-9]+)/i', $str_svn_result, $arr_svn_matches);
 
                    // only proceed if this file is in version control (e.g. we retrieved a valid match
                    // from the regex above)
                    if(count($arr_svn_matches)) {
 
                        $str_version = $arr_svn_matches[1];
 
                        // add version number into the file url (in the form asset.name.VERSION.ext)
                        $str_versioned_url = preg_replace('/(.*)(\.[a-zA-Z0-9]+)$/', '$1.' . $str_version . '$2', $str_url_asset);
                        $str_file_content = str_replace($str_url_asset, $str_versioned_url, $str_file_content);
 
                        // flag as modified so the changes get written back
                        $boo_modified_file = true;
 
                        echo 'Versioned: [' . $str_url_asset . '] referenced in file: [' . $str_file . ']' . "\n";
                    }
                    else {
                        echo 'Ignored: [' . $str_url_asset . '] referenced in file: [' . $str_file . '] (not versioned)' . "\n";
                    }
                }
                else {
                    echo 'Ignored: [' . $str_url_asset . '] referenced in file: [' . $str_file . '] (not on filesystem)' . "\n";
                }
            }
 
            if($boo_modified_file) {
                echo '-> WRITING: ' . $str_file . "\n";
 
                // write changes to this file back to the file system
                file_put_contents($str_file, $str_file_content);
            }
        }
    }
}
 
/**
 * Utility method to recursively retrieve all files under a given directory. If
 * an optional array of extensions is passed, only these filetypes will be returned.
 *
 * Ignores any svn directories.
 *
 * @param str $str_path_start  Path to begin searching
 * @param mix $mix_extensions  Array of extensions to match or null to match any
 * @return array
 */
function get_files_recursive($str_path_start, $mix_extensions = null) {
 
    $arr_files = array();
 
    if($obj_handle = opendir($str_path_start)) {
 
        // compare strictly against false so an entry named "0" does not end the loop early
        while(($str_file = readdir($obj_handle)) !== false) {
 
            // ignore meta files and svn directories
            if(!in_array($str_file, array('.', '..', '.svn'))) {
 
                // construct full path
                $str_path = $str_path_start . '/' . $str_file;
 
                // if this is a directory, recursively retrieve its children
                if(is_dir($str_path)) {
 
                    $arr_files = array_merge($arr_files, get_files_recursive($str_path, $mix_extensions));
                }
 
                // otherwise add to the list
                else {
 
                    // only add if it's included in the extension list (if applicable)
                    if($mix_extensions == null || preg_match('/.*\.(' . implode('|', $mix_extensions) .')$/Ui', $str_file)) {
                        $arr_files[] = str_replace('//', '/', $str_path);
                    }
                }
            }
        }
 
        closedir($obj_handle);
    }
 
    return $arr_files;
}

This is then executed like so:

php version_assets.php "/path/to/project/checkout"

The Apache config

#
# Rewrite versioned asset urls
#
RewriteEngine on
RewriteRule ^(.+)(\.[0-9]+)\.(js|css|jpg|jpeg|gif|png|ico|htc)$ $1.$3 [L]
 
#
# Set near indefinite expiry for certain assets
#
<FilesMatch "\.(css|js|jpg|jpeg|png|gif|ico|htc)$">
    ExpiresActive On
    ExpiresDefault "access plus 5 years"
</FilesMatch>

Note: You’ll need the rewrite and expires modules enabled in Apache. This is for Apache 2. The syntax above may be somewhat different for Apache 1.3. To enable the modules in Apache 2 you can simply use:

a2enmod rewrite
a2enmod expires
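
For reference, this is roughly what the "access plus 5 years" setting translates to in the response headers. The Python sketch below is only an approximation for illustration: the function name is made up, and it assumes (as is my understanding of mod_expires) that a year is counted as 365 days and that both an Expires and a Cache-Control: max-age header are sent.

```python
import time
from email.utils import formatdate

# Assumption: a "year" is 365 days, as I understand mod_expires to count it.
FIVE_YEARS = 5 * 365 * 24 * 60 * 60


def far_future_headers(now=None):
    """Approximate the caching headers produced by 'access plus 5 years'."""
    if now is None:
        now = time.time()
    return {
        "Expires": formatdate(now + FIVE_YEARS, usegmt=True),
        "Cache-Control": "max-age={}".format(FIVE_YEARS),
    }


if __name__ == "__main__":
    for name, value in far_future_headers().items():
        print("{}: {}".format(name, value))
```

You can compare this against the real thing by requesting one of your assets and looking at the response headers in your browser’s developer tools.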

Done! Now, whenever the site is deployed, only changed assets will be downloaded. Fast, efficient and headache free. Well, unless…

Caveats

The above script is purely to illustrate the process. Your specific needs may well call for a slightly different approach. For example, there may be other areas it needs to look for URLs. If you do a lot of dynamic construction of URLs or funky script includes with JavaScript, you may need a secondary deployment script or procedure in order to accommodate such features. Using this technique, you must be careful to add the unique version to all the file types looked for in the deployment script, otherwise you’re telling the browser to cache a file indefinitely without its URL changing when a new version is deployed.

Another area to watch out for would be if you serve assets from different domains. Again, this technique will work in principle but will need some modification. It’s an exercise left to you, dear reader.

So, there you have it. A reasonably hassle free, efficient and optimised caching policy for your web applications. I hope you find this helpful – good luck.

Ubuntu on the Asus Eeepc 901/1000/1000h

Useful custom kernel (including working wireless drivers) available from “adamm”’s repository here: http://www.array.org/ubuntu/

At the time of writing not everything’s fully worked out (issues with the headphone socket etc.) but it’s a good way to get the stock Hardy 8.04 install functional and on the net.

For the most basic install (to get wireless working at least) you can just copy a couple of debs onto a usb stick and “dpkg -i *” install them before getting the rest of the updates via the net repository.

Up to date discussion of progress currently available in this thread.

Sharing files between a Windows guest and Ubuntu host using VMware and Samba

VMware Workstation (and presumably the other enterprise-grade products in the VMware family) comes with the handy “shared folders” feature which makes sharing files between a host and a virtual appliance nice and simple. The free products (VMware Player and Server) do not, unfortunately, have this ability and so we must find another way.

This quick guide shows how to use Samba to achieve the same aim. It is aimed at Ubuntu users but (the general concepts at least) should work on any modern Linux distribution. It is also written with a Windows XP guest in mind but a similar process should work in Windows Vista, Windows 2000 and other operating systems.

The goal is to set up a network share which both operating systems can transparently read and write to.

For reference, I am using Ubuntu 7.04 (Feisty).

Which VMware?

I’ll presume you have VMware already installed with a Windows XP guest virtual appliance already set up. This guide is aimed at users of VMware Player and Server editions (I am using VMware Player).

VMware Player is a simple:

sudo apt-get install vmware-player

away. For the Server edition, you’ll probably want to consult the wiki.

Install Samba

If you don’t already have Samba installed, now would be a good time to do it:

sudo apt-get install samba

In order to keep things clean and easy to manage, we’ll set up a new user account to own the share. This account name will be used when connecting to the share from within Windows. For the purposes of illustration, I will be creating a share called sandbox with the username and group also being sandbox.

Create the new group and user account with no login privileges:

sudo groupadd sandbox
sudo useradd --gid sandbox --shell /bin/false sandbox

To avoid creating a redundant home directory, you can add:

--home /nonexistent

to the end of the previous command.

Now you need to add a matching Samba account. You’ll be prompted to set a password – make note of this as this is what you will use to connect to the share from within Windows.

sudo smbpasswd -a sandbox

Next you’ll need to create a directory to be used as the share (assuming you don’t already have one). Create a directory, setting the username to your usual login and group to sandbox. Then chmod the directory 775 (assuming you wish both yourself and the virtual appliance to have read/write access). Here is what I entered:

cd $HOME
mkdir sandbox
sudo chown russ:sandbox sandbox
sudo chmod 775 sandbox

When you write to the share from within Ubuntu, new files will be created with the default permissions 644 with the username and group being your own user account. When your Windows client connects to the share, it will access it as if it were the local system user sandbox and so the group permissions will apply and you won’t be able to write to any files created from within Ubuntu.

To get around this problem, we can set the setgid bit on the sandbox directory. This means all new files created inside it will inherit the group of the parent directory (sandbox) rather than your own, so, as long as those files are group-writable, the sandbox user from within Windows will be able to read and write them as desired.

sudo chmod g+s sandbox

If you don’t understand the above, don’t worry, just chmod the directory with the command above and all should be well.
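
For the curious, the effect of the two chmod calls on the directory’s mode can be shown with a couple of lines of Python. This is purely illustrative (the function name is made up) and touches nothing on disk; it just renders the permission bits the way ls -ld would.

```python
import stat


def describe_dir_mode(mode_bits):
    """Render permission bits for a directory the way `ls -ld` would."""
    return stat.filemode(stat.S_IFDIR | mode_bits)


if __name__ == "__main__":
    print(describe_dir_mode(0o775))   # after chmod 775
    print(describe_dir_mode(0o2775))  # after chmod g+s
```

The s in the group column of the second line is the setgid bit. You should see the same thing if you run ls -ld sandbox after the commands above.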

Setting up the Samba share

Now all that’s left to do is to tell Samba about our share. Open up /etc/samba/smb.conf in your favourite text editor.

sudo gedit /etc/samba/smb.conf

Firstly, we need to set the security mechanism to user. Look for the line:

security = user

and make sure it is uncommented (remove the preceding semicolon if there is one).

Now, scroll down to the Share Definitions section and add the following:

[sandbox]
path = /home/russ/sandbox
valid users = sandbox
read only = No
create mask = 0777
directory mask = 0777

Be sure to set the correct path to your share. Save the file and restart the Samba daemon:

sudo /etc/init.d/samba restart

That should be it. You should now be able to connect to your share from within the Windows guest. At this point you need to know what IP address to connect to from within Windows. This depends on what networking mode you are using for your virtual appliance.

Bridged Networking

In this mode, your guest OS has its own IP address and so the address it needs to connect to is your usual host machine’s address. In this case your address is probably the top line from the output of this command:

ifconfig | grep "inet addr:"

NAT networking

In this mode, your guest OS shares your host’s address (in terms of other machines on the LAN) and communicates with the host via a private network. In this case, the IP address you need to connect to is most likely the bottom one from the output of this command:

ifconfig | grep "inet addr:"
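
If you would rather pick the address out programmatically, the same filtering can be sketched in Python. The function name is hypothetical and the sample output below is made up, using the older ifconfig format that prints "inet addr:". Your interfaces and addresses will differ.

```python
import re


def inet_addresses(ifconfig_output):
    """Return the addresses from 'inet addr:' lines, in the order they appear."""
    return re.findall(r"inet addr:(\S+)", ifconfig_output)


# Made-up sample output; real interface names and addresses will differ.
SAMPLE = """eth0      Link encap:Ethernet  HWaddr 00:11:22:33:44:55
          inet addr:192.168.1.10  Bcast:192.168.1.255  Mask:255.255.255.0
lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
vmnet8    Link encap:Ethernet  HWaddr 00:50:56:C0:00:08
          inet addr:172.16.50.1  Bcast:172.16.50.255  Mask:255.255.255.0
"""

if __name__ == "__main__":
    addresses = inet_addresses(SAMPLE)
    print("bridged (top):", addresses[0])
    print("NAT (bottom):", addresses[-1])
```

The "top" and "bottom" rule of thumb depends on the order your interfaces are listed in, so treat the result as a first guess and confirm it with a ping from the guest.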

Connecting to the share from within Windows

If you are unsure as to your host’s IP address, try and ping it first from within the Windows guest to confirm you have the right one.

Having worked out what IP address to connect to, you should now be able to connect to your share from within Windows.

The easiest way to do this is:

  1. Open up My Computer
  2. Go to the Tools menu and then Map Network Drive
  3. Choose a drive letter to map the network share to
  4. In Folder, enter: \\HOSTIP\sandbox (replacing HOSTIP)
  5. Click “Connect using a different user name” and enter:
    • username: sandbox
    • password: yourpassword
  6. Click OK and then the Finish button to connect

Hopefully, congratulations are in order. If not, make sure that any firewalls you have running (host or guest) have the correct rules set to allow communication between the two systems.

A note on security

At this point, assuming you have a successful connection, it is worth noting that any other machine on your local network (and potentially the internet if you are not behind a NAT or firewall) can connect to your share (assuming they have the correct credentials).

If you are only using Samba for sharing with VMware (as I am), you may wish to restrict access to VMware only. This is quite easy to do since VMware creates virtual network interfaces for communication between hosts and guests. This means we can set Samba up to ignore any communications that do not originate from these interfaces.

To do this, open up your Samba configuration file again:

sudo gedit /etc/samba/smb.conf

Make sure you have a:

bind interfaces only = true

line and that it is uncommented (remove any preceding semicolons). Just above this should be an interfaces line (most likely commented out). Add the following just below this:

interfaces = vmnet0 vmnet1 vmnet8

These are the virtual interfaces VMware uses for each type of virtual networking: bridged, host only and NAT respectively.

After making the changes, you will need to restart Samba again:

sudo /etc/init.d/samba restart

and possibly shut down your VMware session and restart the VMware service:

VMware Player:

sudo /etc/init.d/vmware-player restart

VMware Server:

sudo /etc/init.d/vmware restart

Finished

You should now have a Samba share configured which is only accessible from your VMware guest appliances. Good luck!

Copyright © 2020 2tap.com
