Archive for the ‘Linux’ Category

epi_sort.py: Filename comparison

As every Joe Six-pack would do, I usually write a lot of scripts to automate my tasks as much as I can. Most of them aren’t even worth mentioning, but I have been meaning to start posting some of them anyway. I’ve stumbled upon lots of jewels on the net that seemed worthless to their authors, so if anyone gets to use one of mine I’ll be happy. You never know.

The problem:

I have a directory full of unclassified media files. Some are duplicates, some aren’t, and each one follows a different naming convention.

I even try to classify them from time to time, so you can throw some directories into the pack. Sometimes, I even create two or three directories for the same group-series-category-whatever before I realize there is an existing one with a slightly different name. And frequently a lot of files remain unclassified, many of which could fit into one of the directories I mentioned.

Of course … whenever a new file arrives at my home server, it gets thrown into that very same directory, so Chaos keeps spreading, as it always does.

To clarify things, let’s look at an example:

drwxrwxrwx 1 user user      4096 2010-07-03 11:50 01_Battlestar_Enterprise
drwxrwxrwx 1 user user      4096 2010-07-03 11:42 02_Startrek_Galactica
drwxrwxrwx 1 user user      4096 2010-07-03 11:50 03_battlestar.enterprise-season.1
-rwxrwxrwx 1 user user 220393472 2010-07-03 02:49 battlestar.enterprise.s1e01.avi
-rwxrwxrwx 1 user user 221227008 2010-07-03 02:50 Battlestar_Enterprise_1_22.mp4
-rwxrwxrwx 1 user user 195393472 2010-07-03 02:49 startrek.galactica.4x15.[ripper_22].mkv

As you can imagine, sorting things out can get really tedious, and there is no automatic way of doing it that I know of.

I had some time this morning and got fed up with it. Every little piece of help is more than welcome, and here is where Python comes to the rescue.

The solution:

There are dozens of ways to do this, but I ended up coding a quick hack to help me sort things out.
It just compares the names of files and directories and estimates how similar they are, based on the words they share. Pairs that score above 50% are usually genuine matches.

#!/usr/bin/env python3
# -*- coding: utf-8 -*-

# (C) 2010, Taher Shihadeh
# Licensed: GPL v2

"""
The script works based only on names of files and directories in a
non-recursive manner.

It takes a path as parameter and tries to determine if the names of
the contents look alike.

It removes separator characters, numbers and file extensions prior to
the comparison.
"""

import os
import sys
from operator import itemgetter

FAST = False # Set to True to skip file-to-file comparisons
SEP  = '_-+~.·:;()[]¡!¿?<>'

def main (path):
    lst1    = os.listdir (path)
    lst2    = list(lst1) # A copy, so removing from it leaves lst1 intact
    len_lst = len(lst1)
    count   = 0.0
    results = []

    for x in lst1:
        for y in lst2:
            if x==y:
                continue
            # Resolve names against path, not the current working directory
            x_dir = os.path.isdir(os.path.join(path, x))
            y_dir = os.path.isdir(os.path.join(path, y))

            if FAST and not (x_dir or y_dir):
                continue

            result = {'A': (x, x_dir), 'B': (y, y_dir)}

            # Only files carry an extension worth stripping
            str1, str2 = x, y
            if not x_dir:
                str1,_ = os.path.splitext (x)
            if not y_dir:
                str2,_ = os.path.splitext (y)

            result['factor'] = compare (str1,str2)
            results.append(result)

        # x has met every remaining entry, so later rounds can skip it
        lst2.remove(x)
        count += 1
        print('%.2f%% done' % ((count / len_lst)*100), file=sys.stderr)

    show(results)

def split (str1):
    """Replace every separator with a space and split into words"""
    trans = str.maketrans(SEP, ' '*len(SEP))
    return str1.translate(trans).split()

def clean (lst):
    """Drop the words made only of digits"""
    assert isinstance(lst, list)
    return [x for x in lst if not x.isdigit()]

def compare (str1, str2):
    """Return similarity factor as percentage"""
    aux1 = clean (split (str1.lower()))
    aux2 = clean (split (str2.lower()))

    set_or  = set(aux1) | set(aux2)
    set_and = set(aux1) & set(aux2)

    # Names made only of digits and separators leave nothing to compare
    if not set_or:
        return 0.0

    return (float(len(set_and)) / float(len(set_or)))*100

def show (results):
    """Show most similar last"""
    for x in sorted(results, key=itemgetter('factor')):
        a,b = x['A'],x['B']
        # When one entry is a directory, print it on the right-hand side
        if not b[1] and a[1]:
            a,b = b,a
        print('%.2f \t %s \t --> %s' % (x['factor'], a[0], b[0]))

if __name__=='__main__':
    try:
        path = sys.argv[1]
    except IndexError:
        path = os.getcwd()

    main (path)
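
To make the scoring concrete, here is a minimal Python 3 sketch of the same idea applied to names from the listing above. The tokens and similarity helpers are illustrative names of my own, not part of the script; they condense split, clean and compare into two functions and assume the file extension has already been stripped.

```python
# Separator characters, mirroring the script's SEP constant
SEP = '_-+~.·:;()[]¡!¿?<>'
TRANS = str.maketrans(SEP, ' ' * len(SEP))

def tokens(name):
    """Lower-case a name, split it on separators and drop pure numbers."""
    return {w for w in name.lower().translate(TRANS).split() if not w.isdigit()}

def similarity(name1, name2):
    """Shared words divided by all distinct words, as a percentage."""
    a, b = tokens(name1), tokens(name2)
    if not (a | b):
        return 0.0   # nothing left to compare on either side
    return 100.0 * len(a & b) / len(a | b)

episode = 'battlestar.enterprise.s1e01'   # extension already stripped
for directory in ('01_Battlestar_Enterprise',
                  '02_Startrek_Galactica',
                  '03_battlestar.enterprise-season.1'):
    print('%.2f \t %s \t --> %s' % (similarity(episode, directory), episode, directory))
```

The episode scores 66.67 against the first directory (two shared words out of three distinct ones), 0.00 against the second and 50.00 against the third, where the extra word "season" widens the union. Note that "s1e01" is not a pure number, so it survives the cleaning and drags the score down a little.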

I don’t think anyone is going to use it, but what the hell. It’s a big Internet ;-)


Cherokee Summit 2010: Mission accomplished

We’ve been working in a frenzy since last week. Not that we usually don’t, but this was something more. The Cherokee Summit just took place last weekend, and among other things we released our latest and greatest Cherokee v1.0, we defined the roadmap for v2.0, we shared knowledge with some of the most impressive experts in High Availability I’ve ever met, and above all, we had the chance to meet face to face. Our Community is, without a doubt, stronger than ever. The summit has been a great success. We had people attending from all over the World, with all levels of expertise, and of all ages. In this photo you can see Alvaro and the youngest attendee.

Everything was recorded, so we will upload the slides and videos of all our sessions really soon. For now, only the photo gallery is available. Take a look at the mugshots.

I’m really glad we could make this Summit. It surpassed all my expectations. By far. It was an unbelievable experience, and we had lots of fun. Take a look at our family photo. If you want to know which of the guys above is me, here’s a clue: “In brightest day…”.

I’m really looking forward to the next summit. Cherokee Summit 2010 was awesome. I’m sure the next one will be even better.


Org-mode to the rescue

It’s been a while since I started using Org-mode. Like four months or so. When I discovered it I knew I would blog about it sooner or later, but I didn’t want to rush things.

Before writing about it, I wanted to give it a run to see if it could be of any help to a rather absentminded guy. I’m sure many long-time Emacs users out there are forgetful at times. I know I am. It seems to fit the profile somehow ;-)

Since I couldn’t rely too much on my memory for these things, I had to find a task management solution. That’s where Org-mode comes in.

If you are like me, maybe Org-mode can save the day. I seem to be able to organize my time a lot better since I started using it.

Org-mode is a mode for keeping notes, ToDo lists, and project planning in Emacs, with a fast and effective plain-text system. It seems awfully spartan and simplistic at first, but it is nothing less than magnificent in features. Being a part of Emacs is also a plus for me, since it is the first thing I install on any platform I happen to be working on. Besides the OS independence, not being tied at all to a particular application does get extra points. Formats may vary over time, but plain text files are here to stay.

These days I’m using it as an outliner, as a note-taking application, to manage my accounting and, most importantly, as a Getting Things Done (GTD) tool. I don’t quite yet use it for Web and PDF Authoring, but it never hurts to know I could if I wanted.

And for now the deal is working pretty well for me. It is very flexible, has lots of other uses, and also a very rich and knowledgeable community, so I totally recommend you take a look at some of the links in this post. It will be worth your while.


It’s official: Cherokee Summit 2010 is on its way!

It is no secret that our Cherokee-Project Community has been growing steadily and relentlessly over the last couple of years. In fact, it has been doing so well that we’ve reached a point where holding a conference about the project actually makes a lot of sense. A lot of people have been asking about this, and after a lot of work we are ready to announce our first Summit, to be held on May 7th-8th.

You can read Alvaro’s announcement, or you can check out the Summit web-site.

Cherokee will be an important topic, but it won’t be the only one. Those will be a couple of days fully dedicated to High Performance and Scalable Web topics, so there’s room for everyone to join in. We are committed to reaching the 1.0 milestone of Cherokee by then, so we will also have a party to celebrate it.

It’s going to be fun. I’ll be a speaker at the summit and I’m really looking forward to personally meeting many of the members of the project. Thanks to our sponsors we’ve managed to make the event completely free, so don’t forget to register while we still have free spots!

UPDATE: We’ve written a little brochure (~100KB) that can be used to let your colleagues know about the summit. Do not hesitate to send it to any coworker or friend who would be interested in attending a High Performance and Scalable Web event.


Our first Cherokee screencast

Alvaro and I have been putting together a screen-cast to show an overview of Cherokee-Admin’s capabilities. It is just an introduction, but I think this kind of thing is really helpful to spread the word about Cherokee’s multiple merits.

We wanted to brag about our little baby. After all, not every serious web server out there has a killer interface to configure it. Take a look at our Cherokee Web Server introductory screen-cast.

You might want to see it at full screen for readability.

It’s just one of many to come. We’ve got some more planned, so I’ll let you know when they’re ready.


Cherokee 0.99.25 party kit!

No, don’t worry. I’m not going to play with you and expect you to work for free as my personal advertisement company. I have to admit that I’m astonished that Microsoft got away with it with all its Windows 7 craze, which once again proves that there are lots of guys out there that outsmart me by far. I’m talking about their PR guys, mind you.

If you expect a party kit from us you’ve come to the wrong place. We actually believe our software is so good that it is a prize on its own merits. It has been a while since I last announced one of our releases, mainly because I didn’t have much to add besides what was said in the official announcements. As always, a lot of development effort is being invested in our flagship product, and this is something that doesn’t go unnoticed. This weekend we decided to release Cherokee 0.99.25. As you can tell by the .25 part, lots of fixes and enhancements have been added steadily release after release.


I hope you enjoy it. We’ve tried to update all the documentation for this release, and we’ve automated most of the recipes in our cookbook by adding lots and lots of configuration Wizards, so hopefully you’ll be able to set up anything in a matter of seconds. As always, feedback and feature requests are more than welcome on the mailing lists. Here are links to download the tarball and the online documentation.


Linux turns 18. Happy birthday!

I’m sure plenty of sites will talk about it, so I’ll keep it short. Precisely 18 years ago, Linux was born. I’m told Linus -nicknamed Linux back then- wanted to call it Freax, but it didn’t stick.

From: torvalds@klaava.Helsinki.FI (Linus Benedict Torvalds)
Newsgroups: comp.os.minix
Subject: What would you like to see most in minix?
Summary: small poll for my new operating system
Message-ID: <1991Aug25.205708.9541@klaava.Helsinki.FI>
Date: 25 Aug 91 20:57:08 GMT
Organization: University of Helsinki

Hello everybody out there using minix -

I’m doing a (free) operating system (just a hobby, won’t be big and professional like gnu) for 386(486) AT clones. This has been brewing since april, and is starting to get ready. I’d like any feedback on things people like/dislike in minix, as my OS resembles it somewhat (same physical layout of the file-system (due to practical reasons) among other things).

I’ve currently ported bash(1.08) and gcc(1.40), and things seem to work. This implies that I’ll get something practical within a few months, and I’d like to know what features most people would want. Any suggestions are welcome, but I won’t promise I’ll implement them :-)

Linus (torvalds@kruuna.helsinki.fi)

PS. Yes – it’s free of any minix code, and it has a multi-threaded fs. It is NOT protable (uses 386 task switching etc), and it probably never will support anything other than AT-harddisks, as that’s all I have :-(.

Regarding Linux portability, one could easily lose track. Some hobbies can change the World.


Ubuntu 9.04 problems: Jaunty fixes for HP DV6 1120es

A friend of mine just asked me for help with his new laptop. He wanted to try out Jaunty, but got stuck with a couple of show stoppers: no WIFI and no sound. The hardware is already supported in newer releases of ALSA and the Linux kernel, so 9.10 “Karmic” will probably run flawlessly with this HP out of the box. Here’s how to fix it:

  • Wifi: it ships with an Atheros AR9285 wireless card. From the Official Linux Wireless wiki we can see that it is supported on kernels >= 2.6.29. Jaunty comes with 2.6.28, but that is not a problem, because the backported modules are packaged:
sudo apt-get install linux-backports-modules-jaunty
  • Sound: Update ALSA. This is for the latest snapshot:
sudo apt-get install build-essential
wget http://ftp.kernel.org/pub/linux/kernel/people/tiwai/snapshot/\
alsa-driver-unstable-snapshot.tar.bz2 -O -| tar xvj
cd alsa-driver-unstable
./configure --enable-dynamic-minors
make
sudo make install-modules
echo "options snd_hda_intel model=hp-dv5" | \
sudo tee -a /etc/modprobe.d/alsa-base

Problem solved. Reboot and enjoy.
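
As a side note, the kernel check behind the Wifi step can be scripted. This is only a sketch in Python 3: the 2.6.29 threshold comes from the text above, while the function names and the sample release strings are made up for illustration.

```python
import re

MIN_KERNEL = (2, 6, 29)   # first kernel with AR9285 support, per the text above

def kernel_tuple(release):
    """Turn a release string such as '2.6.28-11-generic' into (2, 6, 28)."""
    match = re.match(r'(\d+)\.(\d+)(?:\.(\d+))?', release)
    if match is None:
        raise ValueError('unrecognised kernel release: %r' % release)
    # A missing third component counts as 0
    return tuple(int(part or 0) for part in match.groups())

def needs_backports(release):
    """True when the kernel is older than MIN_KERNEL."""
    return kernel_tuple(release) < MIN_KERNEL

# Sample strings; on a live system pass platform.release() instead
for release in ('2.6.28-11-generic', '2.6.31-14-generic'):
    verdict = 'needs backports' if needs_backports(release) else 'is recent enough'
    print(release, verdict)
```

Comparing tuples of integers avoids the trap of comparing version strings alphabetically, where "2.6.9" would sort after "2.6.29".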


Something BIG is about to happen

Lately I’ve been wanting to buy a netbook. I’m not an impulsive guy when it comes to buying new gadgets, so I have been postponing the moment. But having an inexpensive, semi-disposable system lying around can come in quite handy.

I mention this because the time to buy is getting closer and closer, and yet I’m astonished that so few people out there are making a big fuss over the next big thing: ARM netbooks and Linux. Sure, we already have some of these out there from Skytone and Elonex -and probably others-, but those are fairly limited machines performance-wise.

I’m talking about cheap mini-laptops that can be used to surf the web, write reports and even play 720p video, all with an extremely low power consumption, 10+ hour battery life and very little heat generation. These machines are already on their way, will use newer and more powerful ARM processors and will hit the market in the following months.

I believe this is, in fact, a silent revolution that doesn’t even ripple the surface. This will change everything. And why is that? Well… for starters there is a huge market for something like this, as the sales of anything Netbook-related have been steadily showing lately. And it seems hardly possible that Microsoft will release an ARM-enabled Windows XP. This means Linux will get yet another boost in market share when these machines become mainstream, although I’m pretty sure Microsoft will still claim a 90%+ share in the netbook segment. Despite this alleged 90%, the rules of the game have changed, and netbooks are not playing by Microsoft’s rules any more.

Things are changing. And with this, cheap, ubiquitous, multimedia network-enabled machines will become a reality. And those will be powered by free software, at last. Debian has been supporting ARM chips for a long time, Ubuntu has since release 9.04, and other mobile and embedded devices have a long history with Linux. And let’s not forget about Android! Oh man, I can hardly wait to get my hands on one of these jewels!

It’s a revolution. It’s quiet, but it’s happening.


Jaunty Server on Compact Flash: running Ubuntu 9.04 on a Thin Client

Previously on UnixWars …
I said in another post that I would be using a Thin Client as my home server. The machine is fanless, diskless and it makes no noise. The system boots from a low power Compact Flash. After setting things up, some friends decided to buy the exact same machine, so I’m going to write down the steps I followed to configure the server. Please note that Ubuntu Jaunty is still in Alpha 5 stage, so you might want to rethink which release you are going to install. I’ve had no problems whatsoever, so for a setup such as mine you should be safe. Alpha 6 is due to be released shortly and the first beta will be out on March 21st. Then, just one month will separate us from the official release day, so by now I’d say Jaunty has done most of its homework.

The easiest thing to do would be installing the system normally through PXE (look for the link at the end of this post if you’re interested), but there is a problem. Flash devices have a limited number of write-cycles, and wear levelling is not used on consumer grade cards. There are file systems optimized for flash devices -such as JFFS2- but the benefits are not obvious here. As I understand it, these are designed for industrial devices with direct access to the memory cells. Consumer devices have an abstraction layer that makes them transparent and enables us to use them as any other normal storage. That layer should provide everything we need, such as wear levelling. But as I said before, it normally doesn’t for consumer-type devices.

Alternatives

Having discarded the optimal solution for both technological and budget limitations, we are left with three alternatives.

  1. Use the CF as an ordinary device, with total disregard for the premature death of the drive. This is the simplest, and provided you have a lifetime warranty for your media it might not be such a bad idea.
  2. Use the CF as if it were a LiveCD, maybe even adding persistence for our changes. This option should prove perfect if we want to use the Thin Client as a desktop system. Temporary changes are written to RAM, permanent changes will be stored on disk and even software updates will remain between reboots. It is a bit more complicated, so I’ll leave this in my TO DO list for now.
  3. Install everything by hand on a local copy, configure the system to be able to run with as little disk access as possible (ideally, from a read-only root file-system), and dump it to the CF. Since space is not a problem for me (I have both 1GB and 8GB CF cards), I prefer this approach. If everything were set as read-only, we would only have to remount the file systems as writable while updating the system. As for SquashFS, though it seems an ideal choice for an embedded system that needs no upgrades, I’ll discuss this in other posts. It is simpler to deal with a “standard” system, not having to recreate binary boot images every time you update the box.
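
To get a feel for how close a setup is to the read-only ideal of the third option, one could list which file systems are still writable on the card. Here is a rough Python 3 sketch under my own assumptions: the function name and the sample table are made up, the input follows the /proc/mounts layout, and anything whose device starts with /dev/ is treated as living on the CF, which is a simplification.

```python
def card_backed_writable(mounts_text):
    """Return mount points that are read-write and backed by a /dev/ device.

    Expects text in the /proc/mounts layout:
    device mountpoint fstype options dump pass
    """
    found = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue   # skip blank or malformed lines
        device, mount_point, options = fields[0], fields[1], fields[3].split(',')
        if device.startswith('/dev/') and 'rw' in options:
            found.append(mount_point)
    return found

# Made-up sample: read-only root, writable /home, RAM-backed scratch space
SAMPLE = """\
/dev/sda1 / ext2 ro,noatime 0 0
/dev/sda2 /home ext2 rw,noatime 0 0
tmpfs /tmp tmpfs rw,nosuid,nodev 0 0
tmpfs /var/log tmpfs rw,nosuid,nodev 0 0
"""

print(card_backed_writable(SAMPLE))
```

On a live box the text would come from reading /proc/mounts. Mounts backed by tmpfs are ignored on purpose, since writes to them land in RAM and never reach the flash.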

My home server is used to share media and printers on my local network, manage downloads, and above all, provide a permanent access to my home network from wherever I might be. This is, by far, the handiest thing ever. At least it is for me. You never know when you are going to need to bypass Internet filters when you are roaming, for example. The wonders of SSH truly never cease to amaze me :)

I’ve opted for letting some writing be done to the CF, although the write-intensive tasks have been dealt with, because it is easier for the system to be updated, and because some of the services I’ll be using refuse to work in read-only mode. However, my next entry will be about using aufs/unionfs, and replicating this setup on a totally read-only and compressed system that will run from the 256MB standard CF that comes with the computer.

