Latest Publications

So… AI-class and ML-class are over

Some of you know that I’ve been taking the online Artificial Intelligence and Machine Learning courses offered by Stanford this semester. I decided to enroll in both out of curiosity, and to enjoy the privilege of being taught by world-renowned researchers. Also, I wanted to take part in an experiment in teaching a massive number of people (tens of thousands of actively involved students, and then some). To be frank, I didn’t enjoy AI much at my university, and I never had a formal introduction to ML (not at this level, at least). AI-class has been taught by Sebastian Thrun and Peter Norvig, and ML-class by Andrew Ng.

Today both courses will be officially over. In fact, both the final exam of AI and the final review questions of ML are due tonight. It’s been 10 weeks of thrill, and I’ve enjoyed both courses tremendously. The dedication, generosity, and enthusiasm displayed by all three professors have been inspiring, and from here I want to give my most heartfelt thanks to everyone involved in making the courses a reality.

Concretely (-grin-), I want to express the utmost respect for the professors involved. I tip my hat to you, Sirs. To anyone interested in knowing about the subject, I strongly recommend taking a look at the links provided for the courses. In fact, the Machine Learning course is scheduled to be repeated in January, and I have no doubt that the Artificial Intelligence one will also have some form of continuation eventually. In my opinion, this Stanford School of Engineering initiative makes the world a better place. In particular, the final session of Office Hours has made me realize that I’ve come to feel fully involved with the courses and instructors. Really, quite moving. I did have some good professors back in the old days, both in Med School and Computer Science, but rarely have I grown so fond of an instructor. I’d really love the chance to meet them face to face simply to thank them in person.

I’ve learnt a lot of stuff, I’ve had a lot of fun, and I’m already enrolled in several of the new courses that will start in 2012. There’s a lot to choose from: Natural Language Processing, Probabilistic Graphical Models, and a long list of endless instructional joy. Additionally, a bunch of earlier (and equally interesting) courses are available through Stanford Engineering Everywhere and MIT OpenCourseWare. If you feel like learning new stuff, I wholeheartedly recommend them. If the experience is anything like the one I’ve had, you’ll be amazed!

UPDATE: I sent a thank-you email to each of the professors involved, and it turns out Professor Norvig actually took the time to reply. Nothing short of astonishing given the huge number of such emails they must have received. As I said, amazing experience, amazing courses, and amazing professors. Totally recommended.

Spanish Democracy: WIP

This is a non-tech entry. Sorry, but every now and then I feel the need to spit out one of these rants. You have been warned.

By now everyone knows what has been happening in Spain for the last couple of weeks, the so-called #spanishrevolution. Basically, everyone is fed up with politicians governing according to the markets, and against the needs of the people. An electoral reform is needed to prevent power from simply switching back and forth between two parties, neither of which is seen as actually caring for the well-being of the citizens. Personally I’m amazed that only about one hundred candidates in our latest elections were under court investigation on political corruption charges, given the situation we currently face. By the way, most of those candidates belong to the party that has been the clear winner of these elections, which should speak for itself about how adequate our electoral system is/isn’t.

Add to it official rulings against civilized protests, violent police behavior against peaceful demonstrators, and the absolute certainty of not being taken into account, and you can imagine why everybody has been so pissed off.

Here’s a video to show how much work remains to be done. This was shot in Barcelona on May 27, when peaceful protesters were violently dislodged. I saw it through Meneame. You can skip to 2:05 to see Spanish Law Enforcement at its finest. Sensitive people should skip it altogether.

Server migration

Some of you might have noticed the site has been down lately.
The old server was running a bunch of things, some of them much more critical than a personal blog, so to prevent further problems I simply decided to shut it down for a couple of days.

As was to be expected, I didn’t have the time to migrate things immediately, so it took a little while longer. I’ve been making some time now and then, and I think everything should be ready by now.
Thanks for your patience.

I’ve migrated the blog to a new server and will be moving some more domains in the coming days. It’s not a really powerful machine, but Cherokee isn’t resource hungry anyway and everything seems to be running smoothly.

If you happen to find anything broken, please let me know.

Cherokee 1.2: batteries included

Today we’ve released Cherokee 1.2. There’s been a big bump in everything, and the shiny new version number is by far the least improved feature of all.

With this version we’ve also launched the Cherokee Market, which is totally integrated with Cherokee-Admin. The change is much more than superficial. Besides having a new section in Cherokee-admin, this is going to change the way that people interact with their webserver infrastructures.

Remote services such as saving and restoring the webserver configuration are now enabled by default, and getting real work done has never been simpler. Through Cherokee Market, downloading and deploying Web applications on your Cherokee servers has never been easier. In fact, with only a couple of clicks and a few seconds, you’ll have a completely functional and highly optimized environment at your disposal.

It was the natural way to go for the configuration wizards we’ve been enjoying for the last couple of years, and I’m sure that in the near future everyone will wonder how they could ever have lived without this.

Take a look at the quickstart video. It shows the installation of the Cherokee Web Server on MacOS X, and deployment of Drupal 7, all in a matter of seconds. I don’t really think it can get much easier than that.

The long wait is over: Cherokee 1.0.9 is out!

As one of my all-time favorite fiction characters would say: Once again we meet at last.

It has been a while since I last announced one of our releases, but then it has also been a long time since the last release, so this one deserves some credit. We’ve been really busy implementing a lot of stuff that is really going to make life easier for all the guys out there who are using web servers. We are not done yet, and this one is a maintenance release that doesn’t really show most of what we’ve been doing. It doesn’t make sense to do so until we’re ready. Nevertheless, it adds lots and lots of improvements, and some bug fixes.

As always, a lot of development effort is being invested in our flagship product, and this is something that doesn’t go unnoticed. And just in case you haven’t noticed, take a look at Cherokee 1.0.9 (and be amazed) ;-)

I hope you enjoy it. Feedback and feature requests are more than welcome at the mailing lists. Here are links to download the tarball and the online documentation.

dirsort: Directory sorter

This one is a grown-up version of epi_sort.py, a simple script I wrote a while ago to help me sort a huge bunch of media files I had lying around. I recently suffered some major data loss on an external hard drive full of videos. After recovering most of the contents, the directory structure was completely lost. I’ve managed to organize things pretty decently thanks to the script.

I have named it episort. Not very original, but it has been a lifesaver so far.
It is much faster and more accurate than before, by the way.

epi_sort.py: Filename comparison

Like every Joe Six-pack would do, I usually write a lot of scripts to automate my tasks as much as I can. Most of them aren’t even worth mentioning, but nevertheless I have been meaning to start posting some of those. I’ve stumbled upon lots of jewels on the net that seemed worthless to their authors, so if anyone gets to use one of mine I’ll be happy. You never know.

The problem:

I have a directory full of unclassified media files: some are duplicates, some aren’t, and each one follows a different naming convention.

I even try to classify them from time to time, so you can throw some directories into the pack. Sometimes, I even create two or three directories for the same group-series-category-whatever before I realize there is an existing one with a slightly different name. And frequently a lot of files remain unclassified, many of which could fit into one of the directories I mentioned.

Of course … whenever a new file arrives at my home server, it gets thrown into that very same directory, so Chaos keeps spreading, as it always does.

To clarify things, let’s look at an example:

drwxrwxrwx 1 user user      4096 2010-07-03 11:50 01_Battlestar_Enterprise
drwxrwxrwx 1 user user      4096 2010-07-03 11:42 02_Startrek_Galactica
drwxrwxrwx 1 user user      4096 2010-07-03 11:50 03_battlestar.enterprise-season.1
-rwxrwxrwx 1 user user 220393472 2010-07-03 02:49 battlestar.enterprise.s1e01.avi
-rwxrwxrwx 1 user user 221227008 2010-07-03 02:50 Battlestar_Enterprise_1_22.mp4
-rwxrwxrwx 1 user user 195393472 2010-07-03 02:49 startrek.galactica.4x15.[ripper_22].mkv

As you can imagine, sorting things out can get really tedious, and there is no automatic way of doing it that I know of.

I had some time this morning and got fed up with it. Every little piece of help is more than welcome, and here is where Python comes to the rescue.

The solution:

There are dozens of ways to do this, but I ended up coding a quick hack to help me sort things out.
It just compares the names of files and directories, and estimates the similarities. Anything above a 50% match is usually correctly estimated.
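
To make the estimate concrete, here is a minimal sketch of that word-overlap comparison applied to names from the listing above. It is written for Python 3, and the helper names words and similarity are only illustrative; the separator list is a shortened, ASCII-only version of the one used in the script.

```python
import os

SEP = '_-+~.:;()[]!?<>'  # separator characters treated as word breaks

def words(name):
    """Lowercase a name, turn separators into spaces, drop purely numeric words."""
    table = str.maketrans(SEP, ' ' * len(SEP))
    return {w for w in name.lower().translate(table).split() if not w.isdigit()}

def similarity(name_a, name_b):
    """Shared words divided by all distinct words, as a percentage."""
    a, b = words(name_a), words(name_b)
    union = a | b
    if not union:  # both names were nothing but digits and separators
        return 0.0
    return 100.0 * len(a & b) / len(union)

# The extension is stripped from file names before comparing
stem, _ = os.path.splitext('battlestar.enterprise.s1e01.avi')
print('%.2f' % similarity(stem, '01_Battlestar_Enterprise'))
print('%.2f' % similarity(stem, '02_Startrek_Galactica'))
```

The file shares two of three distinct words with the first directory (battlestar and enterprise, with s1e01 left over), so it prints 66.67, which is above the 50% mark. It shares nothing with the second directory, so it prints 0.00.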

#!/usr/bin/env python3
# -*- coding: utf-8 -*-

# (C) 2010, Taher Shihadeh
# Licensed: GPL v2

"""
The script works based only on names of files and directories in a
non-recursive manner.

It takes a path as parameter and tries to determine if the names of
the contents look alike.

It removes separator characters, numbers and file extensions prior to
the comparison.
"""

import os
import sys
from operator import itemgetter

FAST = False # Change this to skip file-to-file comparisons
SEP  = '_-+~.·:;·()[]¡!¿?<>'

def main (path):
    names   = os.listdir (path)
    len_lst = len(names)
    results = []

    for i, x in enumerate(names):
        # Pair x only with the entries after it, so each pair is
        # compared exactly once and the list is never modified
        for y in names[i+1:]:
            # Resolve against path: bare names only work from inside it
            x_dir = os.path.isdir(os.path.join(path, x))
            y_dir = os.path.isdir(os.path.join(path, y))

            if FAST and not (x_dir or y_dir):
                continue

            result = {'A': (x, x_dir), 'B': (y, y_dir)}

            # Only files carry an extension worth stripping
            str1, str2 = x, y
            if not x_dir:
                str1,_ = os.path.splitext (x)
            if not y_dir:
                str2,_ = os.path.splitext (y)

            result['factor'] = compare (str1,str2)
            results.append(result)

        print('%.2f%% done' %((i + 1) * 100.0 / len_lst), file=sys.stderr)

    show(results)

def split (str1):
    """Replace separator characters with spaces and split into words"""
    trans = str.maketrans(SEP, ' '*len(SEP))
    return str1.translate(trans).split()

def clean (lst):
    """Drop purely numeric words"""
    return [x for x in lst if not x.isdigit()]

def compare (str1, str2):
    """Return similarity factor as percentage"""
    aux1 = set(clean (split (str1.lower())))
    aux2 = set(clean (split (str2.lower())))

    set_or  = aux1 | aux2
    set_and = aux1 & aux2

    if not set_or: # Both names were only digits and separators
        return 0.0
    return (float(len(set_and)) / len(set_or))*100

def show (results):
    """Show most similar last"""
    for x in sorted(results, key=itemgetter('factor')):
        a,b = x['A'],x['B']
        # Put the file first and the directory second when they are mixed
        if not b[1] and a[1]:
            a,b = b,a
        print('%.2f \t %s \t --> %s' %(x['factor'], a[0], b[0]))

if __name__=='__main__':
    try:
        path = sys.argv[1]
    except IndexError:
        path = os.getcwd()

    main (path)

I don’t think anyone is going to use it, but what the hell. It’s a big Internet ;-)

Marketing budgets

This entry is by no means technical, but it shows perfectly the vast difference in budget available to two very well known companies: Google and Opera.

I just stumbled upon these two videos. One of them is almost three months old. Although it doesn’t prove much, it is quite spectacular with its fancy high-speed camera at 2700 frames per second.

Chrome versus Potato

The other one … well… it isn’t as spectacular as Chrome’s. Seriously, it isn’t. But …. OMG!!! This is genius. Exactly as scientific as the first one. Not as visually appealing. But tomorrow morning I’ll still be laughing.

Opera versus Potato (Parody)

Cherokee Summit 2010: Mission accomplished

We’ve been working in a frenzy since last week. Not that we usually don’t, but this was something more. The Cherokee Summit just took place last weekend, and among other things we released our latest and greatest Cherokee v1.0, we defined the roadmap for v2.0, we shared knowledge with some of the most impressive experts in High Availability I’ve ever met, and above all, we had the chance to meet face to face. Our Community is, without a doubt, stronger than ever. The summit has been a great success. We had people attending from all over the world, of all levels of expertise, and even of all ages. In this photo you can see Alvaro and the youngest attendee.

Everything was recorded, so we will upload the slides and videos of all our sessions really soon. For now, only the photo gallery is available. Take a look at the mugshots.

I’m really glad we could make this Summit. It surpassed all my expectations. By far. It was an unbelievable experience, and we had lots of fun. Take a look at our family photo. If you want to know which of the guys above is me, here’s a clue: “In brightest day…”.

I’m really looking forward to the next summit. Cherokee Summit 2010 was awesome. I’m sure the next one will be even better.

Countdown to Cherokee Summit 2010

Only one more week to go!
I’m going to remind you all about the first Cherokee Summit. It will be held next week in Madrid (7-8 May), and I’m really excited about it. We will release Cherokee 1.0, we’ll rub shoulders with many members of our community, and we’ll define the road-map for Cherokee 2.0. I’ll be giving a tech-talk alongside Jonathan Hernandez, so you know when and where to find me.

I’m sure that meeting many of the developers of Cherokee in person will be the highlight for me.

If by any chance you’ll be in Madrid that weekend, don’t forget to register in time and join us.

It’s gonna be legen… wait for it… dary!