Archive for the ‘Projects’ Category

epi_sort.py: Filename comparison

As every Joe Six-pack would do, I usually write a lot of scripts to automate my tasks as much as I can. Most of them aren’t even worth mentioning, but I have been meaning to start posting some of them anyway. I’ve stumbled upon lots of jewels on the net that seemed worthless to their authors, so if anyone gets some use out of one of mine, I’ll be happy. You never know.

The problem:

I have a directory full of unclassified media files, some are duplicates, some aren’t, and each one follows a different naming convention.

I do try to classify them from time to time, so there are some directories in the mix as well. Sometimes I even create two or three directories for the same group-series-category-whatever before I realize there is an existing one with a slightly different name. And frequently a lot of files remain unclassified, many of which could fit into one of those directories.

Of course … whenever a new file arrives at my home server, it gets thrown into that very same directory, so Chaos keeps spreading, as it always does.

To clarify things, let’s look at an example:

drwxrwxrwx 1 user user      4096 2010-07-03 11:50 01_Battlestar_Enterprise
drwxrwxrwx 1 user user      4096 2010-07-03 11:42 02_Startrek_Galactica
drwxrwxrwx 1 user user      4096 2010-07-03 11:50 03_battlestar.enterprise-season.1
-rwxrwxrwx 1 user user 220393472 2010-07-03 02:49 battlestar.enterprise.s1e01.avi
-rwxrwxrwx 1 user user 221227008 2010-07-03 02:50 Battlestar_Enterprise_1_22.mp4
-rwxrwxrwx 1 user user 195393472 2010-07-03 02:49 startrek.galactica.4x15.[ripper_22].mkv

As you can imagine, sorting things out can get really tedious, and there is no automatic way of doing it that I know of.

I had some time this morning and got fed up with it. Every little piece of help is more than welcome, and here is where Python comes to the rescue.

The solution:

There are dozens of ways to do this, but I ended up coding a quick hack to help me sort things out.
It just compares the names of files and directories and estimates how similar they are. Matches scoring above 50% are usually correct.

#!/usr/bin/env python3
# -*- coding: utf-8 -*-

# (C) 2010, Taher Shihadeh
# Licensed: GPL v2

"""
The script works based only on names of files and directories in a
non-recursive manner.

It takes a path as parameter and tries to determine if the names of
the contents look alike.

It removes separator characters, numbers and file extensions prior to
the comparison.
"""

import os
import sys
from operator import itemgetter

FAST = False # Set to True to skip file-to-file comparisons
SEP  = '_-+~.·:;·()[]¡!¿?<>'

def main (path):
    lst1    = os.listdir (path)
    lst2    = list(lst1)   # a copy: removing from an alias of lst1 would skip entries
    len_lst = len(lst1)
    count   = 0.0
    results = []

    for x in lst1:
        for y in lst2:
            if x==y:
                continue
            # Resolve names against 'path', not the current working directory
            x_dir = os.path.isdir(os.path.join(path, x))
            y_dir = os.path.isdir(os.path.join(path, y))

            if FAST and not (x_dir or y_dir):
                continue

            result = {'A': (x, x_dir), 'B': (y, y_dir)}

            # Strip extensions from files only; dots in directory names are kept
            str1, str2 = x, y
            if not x_dir:
                str1,_ = os.path.splitext (x)
            if not y_dir:
                str2,_ = os.path.splitext (y)

            result['factor'] = compare (str1,str2)
            results.append(result)

        # Every pair involving x has been scored, so drop it from the inner list
        lst2.remove(x)
        count += 1
        print('%.2f%% done' % ((count / len_lst)*100), file=sys.stderr)

    show(results)

def split (str1):
    """Turn every separator character into a space and split into words"""
    trans = str.maketrans(SEP, ' '*len(SEP))
    return str1.translate(trans).split()

def clean (lst):
    """Drop purely numeric tokens (season and episode numbers)"""
    return [x for x in lst if not x.isdigit()]

def compare (str1, str2):
    """Return similarity factor as percentage"""
    aux1 = set(clean (split (str1.lower())))
    aux2 = set(clean (split (str2.lower())))

    set_or  = aux1 | aux2
    set_and = aux1 & aux2

    if not set_or:   # both names were only numbers and separators
        return 0.0
    return (float(len(set_and)) / float(len(set_or)))*100

def show (results):
    """Show most similar last"""
    for x in sorted(results, key=itemgetter('factor')):
        a,b = x['A'],x['B']
        # When one entry is a directory, list it on the right (file --> directory)
        if not b[1] and a[1]:
            a,b = b,a
        print('%.2f \t %s \t --> %s' % (x['factor'], a[0], b[0]))

if __name__=='__main__':
    try:
        path = sys.argv[1]
    except IndexError:
        path = os.getcwd()

    main (path)
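
The heart of the script is compare(): it splits each name into lowercase words, throws away purely numeric tokens, and divides the number of shared words by the number of distinct words in both names (the Jaccard index, as a percentage). Here is a compact Python 3 restatement run against the example listing above. The names tokens, similarity and best_directory are made up for this sketch and are not part of epi_sort.py, and picking one directory per file with a 50% cut-off is only an assumption based on the rule of thumb mentioned earlier; the script itself just prints every pair.

```python
import os

# Separator characters, copied from epi_sort.py's SEP
SEP = '_-+~.·:;·()[]¡!¿?<>'
TO_SPACES = str.maketrans(SEP, ' ' * len(SEP))

# The directories and files from the example listing
DIRS = ['01_Battlestar_Enterprise', '02_Startrek_Galactica',
        '03_battlestar.enterprise-season.1']
FILES = ['battlestar.enterprise.s1e01.avi', 'Battlestar_Enterprise_1_22.mp4',
         'startrek.galactica.4x15.[ripper_22].mkv']


def tokens(name, is_dir=False):
    """Lowercase words of a name, without the file extension or pure numbers."""
    if not is_dir:
        name, _ = os.path.splitext(name)
    return {w for w in name.lower().translate(TO_SPACES).split() if not w.isdigit()}


def similarity(a, b):
    """Jaccard index of two token sets, as a percentage (0 if both are empty)."""
    union = a | b
    return 100.0 * len(a & b) / len(union) if union else 0.0


def best_directory(filename, dirs, threshold=50.0):
    """Hypothetical helper: best-matching directory, or None below the threshold."""
    file_tokens = tokens(filename)
    score, best = max(((similarity(file_tokens, tokens(d, is_dir=True)), d)
                       for d in dirs), default=(0.0, None))
    return best if score >= threshold else None


if __name__ == '__main__':
    for f in FILES:
        # Print the suggested directory for each file (None if nothing is close)
        print(f, '-->', best_directory(f, DIRS))
```

On this listing the Battlestar files land in 01_Battlestar_Enterprise (66.67% and 100%), while the Startrek file reaches 02_Startrek_Galactica with exactly 50%: the release tags "4x15" and "ripper" count against it, which is why borderline scores are worth a second look by hand.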

I don’t think anyone is going to use it, but what the hell. It’s a big Internet ;-)

Cherokee Summit 2010: Mission accomplished

We’ve been working in a frenzy since last week. Not that we usually don’t, but this was something more. The Cherokee Summit took place last weekend, and among other things we released our latest and greatest Cherokee v1.0, defined the roadmap for v2.0, shared knowledge with some of the most impressive High Availability experts I’ve ever met, and, above all, had the chance to meet face to face. Our Community is, without a doubt, stronger than ever. The summit was a great success. We had attendees from all over the world, with all levels of expertise and of all ages. In this photo you can see Alvaro and the youngest attendee.

Everything was recorded, so we will upload the slides and videos of all our sessions really soon. For now, only the photo gallery is available. Take a look at the mugshots.

I’m really glad we could make this Summit happen. It surpassed all my expectations. By far. It was an unbelievable experience, and we had lots of fun. Take a look at our family photo. If you want to know which of the guys above is me, here’s a clue: “In brightest day…”.

I’m really looking forward to the next summit. Cherokee Summit 2010 was awesome. I’m sure the next one will be even better.

Countdown to Cherokee Summit 2010

Only one more week to go!
I just want to remind you all about the first Cherokee Summit. It will be held next week in Madrid (7-8 May), and I’m really excited about it. We will release Cherokee 1.0, rub shoulders with many members of our community, and define the road-map for Cherokee 2.0. I’ll be giving a tech-talk alongside Jonathan Hernandez, so you know when and where to find me.

I’m sure that meeting many of the developers of Cherokee in person will be the highlight for me.

If by any chance you’ll be in Madrid that weekend, don’t forget to register in time and join us.

It’s gonna be legen… wait for it… dary!

It’s official: Cherokee Summit 2010 is on its way!

It is no secret that our Cherokee-Project Community has been growing steadily and relentlessly over the last couple of years. In fact, it has been doing so well that we’ve reached a point where holding a conference about the project actually makes a lot of sense. A lot of people have been asking about this, and after a lot of work we are ready to announce our first Summit, to be held on May 7th-8th.

You can read Alvaro’s announcement, or you can check out the Summit web-site.

Cherokee will be an important topic, but it won’t be the only one. The two days will be fully dedicated to High Performance and Scalable Web topics, so there’s room for everyone to join in. We are committed to reaching the 1.0 milestone of Cherokee by then, so we will also have a party to celebrate it.

It’s going to be fun. I’ll be a speaker at the summit and I’m really looking forward to personally meeting many of the members of the project. Thanks to our sponsors we’ve managed to make the event completely free, so don’t forget to register while we still have free spots!

UPDATE: We’ve written a little brochure (~100KB) that can be used to let your colleagues know about the summit. Do not hesitate to send it to any coworker or friend who might be interested in attending a High Performance and Scalable Web event.

Cherokee screencast season kicks off

In a previous post I introduced our first Cherokee Project screencast. We were going to wait for a new and improved website before we made them public, but what the hell! Why wait? I’m sure the new Cherokee-Project Screencast Collection will come in handy for many of you.

From here I’d like to thank P.V. Anthony for his invaluable advice on audio production and my old friend Sara Genge for lending her voice to the project (and for her awesome fiction writing, but that is another story).

Our first Cherokee screencast

Alvaro and I have been putting together a screencast that gives an overview of Cherokee-Admin’s capabilities. It is just an introduction, but I think this kind of thing really helps to spread the word about Cherokee’s many merits.

We wanted to brag about our little baby. After all, not every serious web server out there has a killer interface to configure it. Take a look at our Cherokee Web Server introductory screen-cast.

You might want to see it at full screen for readability.

It’s just one of many to come. We’ve got some more planned, so I’ll let you know when they’re ready.

Cherokee 0.99.25 party kit!

No, don’t worry. I’m not going to play games with you and expect you to work for free as my personal advertising agency. I have to admit I’m astonished that Microsoft got away with exactly that during its whole Windows 7 party craze, which once again proves that there are lots of people out there who outsmart me by far. I’m talking about their PR guys, mind you.

If you expect a party kit from us, you’ve come to the wrong place. We actually believe our software is so good that it is its own reward. It has been a while since I last announced one of our releases, mainly because I didn’t have much to add to the official announcements. As always, a lot of development effort is being invested in our flagship product, and this is something that doesn’t go unnoticed. This weekend we decided to release Cherokee 0.99.25. As you can tell by the .25 part, lots of fixes and enhancements have been added steadily, release after release.

I hope you enjoy it. We’ve tried to update all the documentation for this release, and we’ve automated most of the recipes in our cookbook by adding lots and lots of configuration Wizards, so hopefully you’ll be able to set up anything in a matter of seconds. As always feedback and feature requests are more than welcome at the mailing lists. Here are links to download the tarball and the online documentation.

Jaunty Server on Compact Flash: running Ubuntu 9.04 on a Thin Client

Previously on UnixWars …
I said in another post that I would be using a Thin Client as my home server. The machine is fanless and diskless, and it makes no noise. The system boots from a low-power Compact Flash card. After I set things up, some friends decided to buy the exact same machine, so I’m going to write down the steps I followed to configure the server. Please note that Ubuntu Jaunty is still at the Alpha 5 stage, so you might want to rethink which release you are going to install. I’ve had no problems whatsoever, so for a setup such as mine you should be safe. Alpha 6 is due to be released shortly and the first beta will be out on March 21st. After that, just one month will separate us from the official release day, so by now I’d say Jaunty has done most of its homework.

The easiest thing to do would be to install the system normally through PXE (look for the link at the end of this post if you’re interested), but there is a problem. Flash devices have a limited number of write cycles, and wear levelling is not used on consumer-grade cards. There are file systems optimized for flash devices, such as JFFS2, but their benefits are not obvious here. As I understand it, these are designed for industrial devices with direct access to the memory cells. Consumer devices have an abstraction layer that makes them transparent and lets us use them like any other storage device. That layer should provide everything we need, such as wear levelling, but as I said before, it normally doesn’t on consumer-type devices.
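
A back-of-the-envelope calculation shows why wear levelling matters so much. Without it, a block that is rewritten constantly (a log file, say) takes every single write; with ideal wear levelling the same writes are spread over the whole card. The endurance (10,000 erase cycles), the 128KB erase-block size and the once-a-minute rewrite rate below are assumptions picked for illustration, not specs of any real card; only the 1GB capacity comes from this post. Real controllers and file systems are messier, so the point is the ratio, not the absolute numbers.

```python
# Back-of-the-envelope flash endurance estimate.
# Every figure except the 1GB capacity is an illustrative assumption.

ERASE_CYCLES = 10_000          # assumed endurance of each erase block
CARD_BYTES = 1 * 1024**3       # a 1GB card
BLOCK_BYTES = 128 * 1024       # assumed erase-block size
REWRITES_PER_DAY = 24 * 60     # e.g. a log flushed once a minute


def days_to_wear_out(blocks_sharing_the_load):
    """Days until the given number of erase blocks use up all their cycles."""
    return ERASE_CYCLES * blocks_sharing_the_load / REWRITES_PER_DAY


# No wear levelling: the same block absorbs every rewrite.
hot_block_days = days_to_wear_out(1)
# Ideal wear levelling: rewrites are spread across every block on the card.
levelled_days = days_to_wear_out(CARD_BYTES // BLOCK_BYTES)

if __name__ == '__main__':
    print(f'without wear levelling: {hot_block_days:.1f} days')
    print(f'with ideal wear levelling: {levelled_days / 365:.0f} years')
```

Under these assumptions the hot block lasts about a week, while ideal levelling stretches the same workload over 8192 blocks, i.e. 8192 times longer. That gap is what the alternatives below try to close without help from the card itself.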

Alternatives

Having discarded the optimal solution because of both technological and budget limitations, we are left with three alternatives.

  1. Use the CF as an ordinary device, with total disregard for the premature death of the drive. This is the simplest, and provided you have a lifetime warranty for your media it might not be such a bad idea.
  2. Use the CF as if it were a LiveCD, maybe even adding persistence for our changes. This option should prove perfect if we want to use the Thin Client as a desktop system. Temporary changes are written to RAM, permanent changes will be stored on disk and even software updates will remain between reboots. It is a bit more complicated, so I’ll leave this in my TO DO list for now.
  3. Install everything by hand on a local copy, configure the system to run with as little disk access as possible (ideally, from a read-only root file system), and dump it to the CF. Since space is not a problem for me (I have both 1GB and 8GB CF cards), I prefer this approach. If everything were mounted read-only, we would only have to remount the file systems as writable while updating. As for SquashFS, though it seems an ideal choice for an embedded system that needs no upgrades, I’ll discuss it in other posts. It is simpler to deal with a “standard” system and not have to recreate binary boot images every time you update the box.
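
A quick way to check how close a running box is to that goal is to list what is still mounted read-write from the card. The sketch below is a minimal example, assuming the Linux /proc/mounts layout (device, mount point, type, comma-separated options, ...) and assuming the CF shows up as /dev/sda; the function name writable_mounts and the default device_prefix are hypothetical, so adjust them to your setup. noatime is a standard Linux mount option that stops plain file reads from triggering access-time writes.

```python
import os


def writable_mounts(mounts_text, device_prefix='/dev/sda'):
    """List (mount point, options) for read-write mounts of the given device.

    `mounts_text` follows the /proc/mounts layout:
    device, mount point, fs type, comma-separated options, dump, pass.
    """
    found = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        device, mount_point, _fs_type, options = fields[:4]
        opts = options.split(',')
        if device.startswith(device_prefix) and 'rw' in opts:
            found.append((mount_point, opts))
    return found


if __name__ == '__main__' and os.path.exists('/proc/mounts'):
    with open('/proc/mounts') as f:
        for mount_point, opts in writable_mounts(f.read()):
            note = '' if 'noatime' in opts else ' (atime updates may still write on reads)'
            print(f'{mount_point} is mounted read-write{note}')
```

On a fully read-only setup the list should be empty except while you have deliberately remounted something for an update.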

My home server is used to share media and printers on my local network, manage downloads, and above all, provide permanent access to my home network from wherever I might be. This is, by far, the handiest thing ever. At least it is for me. You never know when you are going to need to bypass Internet filters when you are roaming, for example. The wonders of SSH truly never cease to amaze me :)

I’ve opted for letting some writing be done to the CF, although the write-intensive tasks have been dealt with, because it makes updating the system easier and because some of the services I’ll be using refuse to work in read-only mode. However, my next entry will be about using aufs/unionfs to replicate this setup on a totally read-only, compressed system that will run from the standard 256MB CF that comes with the computer.


Fattening up a Thin Client: silent cheap home server

Do you have a server at home running 24/7? Having permanent access to your home network can be very useful at times, as can sharing media and printers or managing your downloads. In computing terms, my last server was actually a downgrade from my previous box. It was no powerhouse, but being a fanless Epia with minimal power consumption and very low noise, it was a huge upgrade for me. I just connected the printer and some external USB drives, installed Debian, and it has been sitting in a corner for ages, working flawlessly.

A while ago I was looking for a similar noiseless solution for my brother-in-law and a friend, and the itch started all over again. I decided it was a good moment to upgrade my system. Low power consumption and a fanless design were a must, but I also wanted integrated Gigabit Ethernet. So I thought a Thin Client would be a good solution. These are normally fanless and have very low power needs, and some even have decent processors and Gigabit Ethernet. After looking for a while, I settled on a Fujitsu-Siemens Futro S400 that I found dirt cheap on eBay.


Cherokee v0.98: Jailbreak

It has been a while since my last post. Some rough month this has been, oh boy. Anyway, I wanted to let you know that we released the new and improved Cherokee a couple of days ago.

Since the project has advanced so much over the last months, we have decided to bump the version number closer to the 1.0 milestone. Many things have been improved in stability, features and performance. The Windows build has received some attention, and though it still has a lot of issues, Stefan de Konik has built a beta Windows package for people to try out and help us sort out the remaining problems. Great job as always, Stefan! The admin interface still does not run under Windows, but you can always create the necessary config files in another environment and try out the Windows binaries.

In this release the caching mechanisms have been fixed, the web server can now be bound to multiple IPs and ports at the same time, a new balancing strategy has been added (so sticky sessions can now be implemented, for instance) …
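
For anyone unfamiliar with the term: a sticky-session balancer sends every request from the same client to the same back-end, so per-user state kept on that back-end keeps working. One common way to get this is to hash the client’s IP address. The sketch below illustrates that generic technique only; it is not Cherokee’s implementation, and BACKENDS and pick_backend are hypothetical names with made-up addresses.

```python
import zlib

# Hypothetical back-end addresses, for illustration only
BACKENDS = ['10.0.0.11:8080', '10.0.0.12:8080', '10.0.0.13:8080']


def pick_backend(client_ip, backends=BACKENDS):
    """Send a given client IP to the same back-end every time (IP-hash stickiness).

    zlib.crc32 is used instead of hash() because Python randomizes str hashes
    per process, so hash() would not give stable results across restarts.
    """
    return backends[zlib.crc32(client_ip.encode('ascii')) % len(backends)]


if __name__ == '__main__':
    for ip in ['192.168.1.20', '192.168.1.21', '192.168.1.20']:
        print(ip, '->', pick_backend(ip))  # the repeated IP gets the same back-end
```

Plain modulo remaps most clients whenever a back-end is added or removed; consistent hashing avoids that, at the cost of some extra code.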

To find out more about it, read the official release note for 0.98

Try it out. Cherokee is the fastest web server there is right now.

Here are some links:
