Wednesday, May 23, 2007

Do You Know Bassa?

I have been working on an open source project for the past few months at the University of Colombo School of Computing (UCSC). It addresses the bandwidth problem experienced by UCSC staff and students. You can find more information on how and why I initiated this project on the Bassa site.

The project revolves around several existing concepts, such as content distribution, content caching, content sharing and policy-based networking. Bassa relies on several other open source products to perform its tasks:

1. Squid
2. Apache
3. MySQL
4. PHP

The Bassa system's front line of defence is the Squid proxy. Squid is armed with an ACL that detects HTTP/FTP requests for files larger than a given threshold, which is defined by the network policy. If the requested file is larger than the threshold, Squid redirects the user to a different site called GADisk (Globally Accessible Disk), which works as the front end of Bassa and runs on a LAMP stack.
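The decision Squid makes here can be sketched in a few lines. This is a Python illustration, not the actual Squid ACL or configuration; the 10 MiB threshold, the `gadisk.example` address and the `route_request` function are hypothetical placeholders, and the size is assumed to come from the object's reported `Content-Length`.

```python
from typing import Optional, Tuple
from urllib.parse import quote

# Hypothetical values: the real threshold is set by the network policy,
# and the real GADisk address is site specific.
THRESHOLD_BYTES = 10 * 1024 * 1024            # assumed 10 MiB threshold
GADISK_URL = "http://gadisk.example/request"  # placeholder address


def route_request(url: str, content_length: Optional[int],
                  threshold: int = THRESHOLD_BYTES) -> Tuple[str, str]:
    """Decide whether a request passes through or is redirected to GADisk.

    `content_length` is the object's reported size in bytes (for example
    from a Content-Length header); None means the size is unknown.
    """
    if content_length is None or content_length <= threshold:
        # Small objects, and objects of unknown size, are proxied normally.
        return ("pass", url)
    # Large objects go to GADisk, which receives the original URL.
    return ("redirect", f"{GADISK_URL}?url={quote(url, safe='')}")
```

For example, `route_request("http://example.com/a.iso", 700 * 1024 * 1024)` returns `("redirect", "http://gadisk.example/request?url=http%3A%2F%2Fexample.com%2Fa.iso")`. Letting requests of unknown size through is a choice of this sketch; a stricter policy could send them to GADisk as well.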

If the file has already been downloaded by another user, it is streamed back to the user from the cache. This process is fully transparent from the user's perspective; the user will not even realise that he is getting the file from a local content server (edge server). The only thing he will notice is the increased transfer speed.
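The "already downloaded?" check can be pictured as a lookup keyed by the requested URL. The sketch below is a minimal illustration assuming an in-memory index keyed by a SHA-1 digest of the URL; the `EdgeCache` class and its `add` and `lookup` methods are hypothetical and are not Bassa's actual C implementation.

```python
import hashlib
from typing import Dict, Optional


class EdgeCache:
    """Hypothetical in-memory index of files already fetched by Bassa.

    Keys are SHA-1 digests of the requested URL; values are local paths
    on the edge server.
    """

    def __init__(self) -> None:
        self._index: Dict[str, str] = {}

    @staticmethod
    def key(url: str) -> str:
        return hashlib.sha1(url.encode("utf-8")).hexdigest()

    def add(self, url: str, local_path: str) -> None:
        # Called once a queued download has completed.
        self._index[self.key(url)] = local_path

    def lookup(self, url: str) -> Optional[str]:
        # A hit means some user already downloaded this file, so it can be
        # streamed from the local disk instead of the remote server.
        return self._index.get(self.key(url))
```

A real deployment would persist this index (Bassa already uses MySQL) rather than keep it in memory; the dictionary only shows the shape of the lookup.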

If no other user has downloaded the file yet, GADisk immediately prompts the user for a user name and password. After a successful login, the user's file request is queued in the Bassa server (written in C). The Bassa server downloads the whole request queue in batch mode when bandwidth is available.
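The queue-then-batch behaviour can be sketched as follows. This is a Python illustration of the idea, not the Bassa server's C code; the off-peak window of 22:00 to 06:00, the `RequestQueue` class and the `fetch` callback are assumptions made for the example. Requests for the same URL from several users are merged, so each file is downloaded only once.

```python
from datetime import time
from typing import Callable, Dict, List, Set

# Assumed off-peak window; the real schedule would come from the network policy.
OFF_PEAK_START = time(22, 0)
OFF_PEAK_END = time(6, 0)


def in_off_peak(now: time, start: time = OFF_PEAK_START,
                end: time = OFF_PEAK_END) -> bool:
    """Return True if `now` lies in [start, end), allowing a wrap past midnight."""
    if start <= end:
        return start <= now < end
    return now >= start or now < end


class RequestQueue:
    """Hypothetical queue of large-file requests, de-duplicated by URL."""

    def __init__(self) -> None:
        # URL -> users waiting for it; dicts keep first-request order.
        self._pending: Dict[str, Set[str]] = {}

    def __len__(self) -> int:
        return len(self._pending)

    def enqueue(self, user: str, url: str) -> None:
        # A second request for the same URL only adds a waiting user.
        self._pending.setdefault(url, set()).add(user)

    def run_batch(self, now: time, fetch: Callable[[str], None]) -> List[str]:
        """Fetch every queued URL if `now` is off-peak; return the URLs fetched."""
        if not in_off_peak(now):
            return []
        done: List[str] = []
        while self._pending:
            url = next(iter(self._pending))
            fetch(url)              # one download serves every waiting user
            del self._pending[url]  # dequeue only after fetch returns
            done.append(url)
        return done
```

Removing a URL from the queue only after `fetch` returns means a failed download stays queued for the next batch.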

Another advantage is that users can log in to the system and browse or search the cache through their web browser, something traditional proxy caches cannot offer. GADisk also describes the content in detail, so that users can get an idea of what it looks like before downloading it to their local computer. For instance, after Bassa downloads a video or audio file, it generates a 10-second trailer of it in Flash Video format. Videos also get thumbnails, and other information such as genre, artist and album comes along with these items. Bassa can do all this even when the file is compressed. For ISO images and compressed files, Bassa generates a list of the files within the archive. Similarly, Bassa generates PNG images of PDF and OpenOffice document pages for on-site viewing.
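Generating a file list for an archive can be sketched with the Python standard library. This is an illustration only, not Bassa's implementation; the function name `archive_listing` is hypothetical, and it handles only ZIP and tar archives (plain or gzip/bzip2/xz compressed). ISO images would need an external tool, which is not shown here.

```python
import io
import tarfile
import zipfile
from typing import List


def archive_listing(data: bytes) -> List[str]:
    """Return the sorted member names of a ZIP or tar archive held in memory.

    Raises ValueError for formats this sketch does not understand.
    """
    buf = io.BytesIO(data)
    if zipfile.is_zipfile(buf):
        with zipfile.ZipFile(buf) as zf:
            return sorted(zf.namelist())
    buf.seek(0)
    try:
        # The default read mode detects gzip, bzip2 and xz compression itself.
        with tarfile.open(fileobj=buf) as tf:
            return sorted(tf.getnames())
    except tarfile.TarError:
        raise ValueError("unsupported archive format") from None
```

The resulting list can be stored alongside the archive so that users can see what is inside before downloading it.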

Content description becomes really useful when users perform searches. All content is tagged according to its MIME type, so users can narrow their searches. Users can also preview audio, video or documents (PDF, OpenOffice, etc.) on site before downloading. After viewing the content, users can comment on it; in other words, Bassa can also work as a content-oriented blog.
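Tagging by MIME type and narrowing a search can be sketched as below. This is an assumption-laden illustration rather than GADisk's PHP code: it guesses the type from the file name using Python's built-in `mimetypes` table (via a fresh `MimeTypes()` instance, so results do not depend on system files), and the `CacheItem` class and `search` function are hypothetical.

```python
import mimetypes
from dataclasses import dataclass, field
from typing import List, Optional

# Built-in defaults only, so results do not depend on /etc/mime.types.
_MIME = mimetypes.MimeTypes()


@dataclass
class CacheItem:
    name: str
    mime: str = field(init=False)

    def __post_init__(self) -> None:
        # Tag the item by guessing its MIME type from the file extension.
        guessed, _encoding = _MIME.guess_type(self.name)
        self.mime = guessed or "application/octet-stream"


def search(items: List[CacheItem], keyword: str,
           mime_prefix: Optional[str] = None) -> List[str]:
    """Case-insensitive name search, optionally narrowed by MIME type,
    e.g. mime_prefix="audio/" or "application/pdf"."""
    kw = keyword.lower()
    return [it.name for it in items
            if kw in it.name.lower()
            and (mime_prefix is None or it.mime.startswith(mime_prefix))]
```

Matching on a prefix such as `"audio/"` lets one filter cover a whole family of types (MP3, Ogg, WAV and so on) without listing each one.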

Given below is a diagram of the high-level architecture of Bassa.





Finally, Bassa has delivered the following benefits to end users:

1. They have more bandwidth for browsing the web, since large, bandwidth-consuming downloads are done during off-peak hours.
2. Users can effortlessly share the content they have downloaded.
3. Users can save LAN bandwidth and disk space by identifying what they really need before downloading a file.
4. Users can communicate through content based blogging.
5. Traffic (bandwidth usage) at remote servers is reduced by caching content in the Bassa cache.

Since this tool was deployed on the UCSC network, the web has become more responsive, and users are quite satisfied with it.