Archive for the ‘Python’ Category

epi_sort.py: Filename comparison

Like any Joe Six-pack would, I write a lot of scripts to automate my tasks as much as I can. Most of them aren’t even worth mentioning, but I have been meaning to start posting some of them anyway. I’ve stumbled upon lots of jewels on the net that seemed worthless to their authors, so if anyone gets to use one of mine I’ll be happy. You never know.

The problem:

I have a directory full of unclassified media files. Some are duplicates, some aren’t, and each one follows a different naming convention.

I even try to classify them from time to time, so you can throw some directories into the pack. Sometimes, I even create two or three directories for the same group-series-category-whatever before I realize there is an existing one with a slightly different name. And frequently a lot of files remain unclassified, many of which could fit into one of the directories I mentioned.

Of course … whenever a new file arrives at my home server, it gets thrown into that very same directory, so Chaos keeps spreading, as it always does.

To clarify things, here is an example:

drwxrwxrwx 1 user user      4096 2010-07-03 11:50 01_Battlestar_Enterprise
drwxrwxrwx 1 user user      4096 2010-07-03 11:42 02_Startrek_Galactica
drwxrwxrwx 1 user user      4096 2010-07-03 11:50 03_battlestar.enterprise-season.1
-rwxrwxrwx 1 user user 220393472 2010-07-03 02:49 battlestar.enterprise.s1e01.avi
-rwxrwxrwx 1 user user 221227008 2010-07-03 02:50 Battlestar_Enterprise_1_22.mp4
-rwxrwxrwx 1 user user 195393472 2010-07-03 02:49 startrek.galactica.4x15.[ripper_22].mkv

As you can imagine, sorting things out can get really tedious, and there is no automatic way of doing it that I know of.

I had some time this morning and got fed up with it. Every little piece of help is more than welcome, and here is where Python comes to the rescue.

The solution:

There are dozens of ways to do this, but I ended up coding a quick hack to help me sort things out.
It just compares the names of files and directories and estimates how similar they are, as the percentage of words the two names share. Anything scoring above a 50% match is usually a correct pairing.

#!/usr/bin/env python
# -*- coding: utf-8 -*-

# (C) 2010, Taher Shihadeh
# Licensed: GPL v2

"""
The script works based only on names of files and directories in a
non-recursive manner.

It takes a path as parameter and tries to determine if the names of
the contents look alike.

It removes separator characters, numbers and file extensions prior to
the comparison.
"""

import os
import sys
import string
from operator import itemgetter

FAST = False # Change this to skip file-to-file comparisons
SEP  = '_-+~.·:;·()[]¡!¿?<>'

def main (path):
    lst     = os.listdir (path)
    len_lst = len(lst)
    results = []

    for i, x in enumerate(lst):
        # Join with path so that isdir also works outside the current dir
        x_dir = os.path.isdir(os.path.join(path, x))

        # Compare each entry only with the ones after it, so that every
        # pair is evaluated exactly once
        for y in lst[i+1:]:
            y_dir = os.path.isdir(os.path.join(path, y))

            if FAST and not (x_dir or y_dir):
                continue

            result = {'A': (x, x_dir), 'B': (y, y_dir)}

            # Only files have an extension to strip
            str1, str2 = x, y
            if not x_dir:
                str1,_ = os.path.splitext (x)
            if not y_dir:
                str2,_ = os.path.splitext (y)

            result['factor'] = compare (str1,str2)
            results.append(result)

        print >> sys.stderr, '%.2f%% done' %((i + 1) * 100.0 / len_lst)

    show(results)

def split (str1):
    trans = string.maketrans(SEP, ' '*len(SEP))
    return str1.translate(trans).split()

def clean (lst):
    assert type(lst) == list
    return filter(lambda x: not x.isdigit(), lst)

def compare (str1, str2):
    """Return similarity factor as percentage"""
    aux1 = clean (split (str1.lower()))
    aux2 = clean (split (str2.lower()))

    set_or  = set(aux1) | set(aux2)
    set_and = set(aux1) & set(aux2)

    # Names made only of digits and separators leave no words to compare
    if not set_or:
        return 0.0

    return (float(len(set_and)) / float(len(set_or)))*100

def show (results):
    """Show most similar last"""
    for x in sorted(results, key=itemgetter('factor')):
        a,b = x['A'],x['B']
        if not b[1] and a[1]:
            a,b = b,a
        print '%.2f \t %s \t --> %s' %(x['factor'], a[0], b[0])

if __name__=='__main__':
    try:
        path = sys.argv[1]
    except IndexError:
        path = os.getcwd()

    main (path)
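
For the curious: the factor that compare() returns is what is usually called the Jaccard index, that is, the number of words both names share divided by the number of distinct words across the two names. Below is a minimal Python 3 sketch of just that measure, not the script itself. The names tokens and similarity are made up for the illustration, and the inputs are names from the listing above with the file extensions already stripped.

```python
# Illustrative Python 3 sketch of the word-set comparison. tokens() and
# similarity() are hypothetical names, not part of epi_sort.py.
SEP = '_-+~.·:;()[]¡!¿?<>'
TRANS = str.maketrans(SEP, ' ' * len(SEP))

def tokens(name):
    """Lowercase the name, split it on separators, drop all-digit words."""
    words = name.lower().translate(TRANS).split()
    return {w for w in words if not w.isdigit()}

def similarity(name1, name2):
    """Jaccard index of the two word sets, as a percentage."""
    a, b = tokens(name1), tokens(name2)
    union = a | b
    if not union:  # nothing but digits and separators in both names
        return 0.0
    return 100.0 * len(a & b) / len(union)

print('%.2f' % similarity('battlestar.enterprise.s1e01', '01_Battlestar_Enterprise'))
print('%.2f' % similarity('Battlestar_Enterprise_1_22', '01_Battlestar_Enterprise'))
print('%.2f' % similarity('startrek.galactica.4x15.[ripper_22]', '02_Startrek_Galactica'))
```

This prints 66.67, 100.00 and 50.00. The first pair loses points because s1e01 survives as a word of its own, and the third because 4x15 and ripper count against it, so a match sitting right at the 50% mark is still worth a look.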

I don’t think anyone is going to use it, but what the hell. It’s a big Internet ;-)

Cherokee v0.98: Jailbreak

It has been a while since my last post. Some rough month this has been, oh boy. Anyway, I wanted to let you know that we released the new and improved Cherokee a couple of days ago.

Since the project has advanced so much over the last months, we have decided to bump the version number closer to the 1.0 milestone. Many things have been improved in stability, features and performance. The Windows build has received some attention, and though it still has a lot of issues, Stefan de Konik has built a beta Windows package for people to try it out and help us sort out the rest of the problems. Great job as always, Stefan! The admin part is still not running under Windows, but you can always create the necessary config files in another environment and try out the Windows binaries.

In this release the caching mechanisms have been fixed, the web server can now be bound to multiple IPs and ports at the same time, a new balancing strategy has been added (so sticky sessions can now be implemented, for instance) …

To find out more about it, read the official release note for 0.98

Try it out. Cherokee is the fastest web server there is right now.

Here are some links:

Cherokee on steroids: v0.11 now with reverse proxy

Yesterday we reached yet another milestone in Cherokee’s development towards World Domination. After almost a month of hard work, our newest creation hit the streets. The official announcement hasn’t even been made yet, but Cherokee 0.11.1 is out in the wild.

Besides our regular bug fixes and performance enhancements, it is shipped with some new features. SSI support was being requested every now and then, the SSL infrastructure has been reworked and the new reverse proxy is working flawlessly. The Windows build is not a reality yet, but great advances have been made towards that end. Cherokee is currently working under Windows, but the admin is not. And it has to be cross compiled, for now. Anyway, we’re one step closer to releasing a binary Windows build ;)

We have a lot of fresh ideas, and as always feedback and feature requests are more than welcome at the mailing lists. Here are links to download and read the online documentation with tons of new information and recipes. Enjoy! ;)

UPDATE: Link to the official announcement.

Python can do a lot of things

But it is not omnipotent. Leave that to Q. I stumbled upon this funny answer from back in 2005 and I had to share it. That guy is a genius! ;-)

I’m curious how I can make Python print text in color.

That depends strictly on your printer. With my hp LaserJet 1200, no way — not even Python’s power can overcome the hardware’s limitations in this regard… it’s a black-and-white printer and that’s all there is to it! If I did have a color printer, then I would have Python produce the appropriate postscript code, or “escape-sequences” in whatever printer-specific language a given printer requires to have it output color text (or, depending on my operating system, printer driver, filters, etc, I might have to send appropriate “escape-sequences” or whatever to the DRIVER in order to convince it to drive the printer appropriately).

The family keeps growing

I posted a note about this at Cherokee’s main site, but I totally forgot to tell you over here. My bad.

However, it’s never too late to share good news. A couple of days ago Cherokee’s family welcomed a Polish Cherokee Community as a new member!

And they took a huge leap forward by creating Cherokee Polska, which isn’t just another site about Cherokee. So far they’ve translated everything, documentation included. And the effort has been worth it. They received thousands of unique visitors on their first day, and that is only the beginning.

From here, I wish to extend my most sincere congratulations to these guys. Great work!

Since Cherokee has been steadily getting better and better, I expect more localization initiatives will keep popping up. We’ll see it in time. That’s a given.

New day, new release: Cherokee 0.10

We’ve been really busy lately. After my adventures in Venezuela at the Infociencias and the Open Source World Conference 2008 in Málaga (I know, I know… I still have to talk about that and post some photos, but the days are not long enough!), we’ve finally made the time to polish some last details.

Today, Cherokee 0.10 has been born! Even if you’re not into this FOSS World thingie, you should know that this is the fastest web server out there!

As always, stability and performance have improved, some bugs have been fixed and new features are available. Lately our MySQL load balancer module has been attracting a lot of attention. Download Cherokee and follow the cookbook to give it a try.

As always, here is the list of relevant links:

Handle with care: This baby is a heavy hitter by its own merits! ;)

Cherokee 0.8.0 “Hard as a rock” released!

The day has finally arrived. After a lot of hard work, we are finally releasing 0.8. It has improved quite a lot in this time. It is faster, much more stable and has been thoroughly tested and documented, at last!

Unfortunately, it is not all good news. After putting a lot of effort into fixing the Windows build, we finally decided to postpone it until 0.8.1, the next major release. It has been too long since the last release, and with so many improvements it doesn’t make much sense to hold the release back just to offer it simultaneously on all platforms. This was the only thing holding us back besides some bugs that had to be fixed, so now this is our one big remaining task for the next release ;)

This is our best release ever. By far. Improved performance, interface and documentation enhancements and lots of new features: much faster I/O cache, huge FastCGI performance improvement, updates (and binary upgrades) are now handled gracefully with no downtime, the load balancing is better and a lot more. Alvaro just sent the official release note minutes ago.

We have a lot of fresh ideas, and as always feedback and feature requests are more than welcome at the mailing lists. Here is the download link. Enjoy it! ;)

UPDATE: I’ve just updated the documentation available at the site.

UPDATE: A quick update to fix some minor bugs has been released: Cherokee 0.8.1.

Cherokee Quickstart

Many users have told us that they would love to have some more documentation about Cherokee. One of the tasks before releasing 0.8 (which is almost ready by now) is documenting.

Yesterday I wrote a small tutorial that will be part of the documentation. It is a simple walkthrough to set up a couple of virtual servers, basic authentication (PAM and flat) and some redirections.

It will be available at the official site as soon as we make the release, at http://cherokee-project.com/doc

Here it is for now. No screenshots and not much styling in my blog, sorry. It’s just a quick cut&paste. There’s a lot of other stuff I should be documenting instead of blogging ;)

Configuration Quickstart

This section briefly describes the whole administration web interface provided by cherokee-admin. This is the only recommended way of configuring Cherokee. If you are looking for development information, you should refer to the appropriate section, especially the cherokee.conf file specification.

We will first show a quick overview of the available options, followed by a simple walkthrough. You can learn more about the options in their specific documentation entries.

Cherokee on Windows: improving the building environment

As was announced some time ago, Cherokee 0.8 will once again have a native Windows binary. We’ve been having a lot of requests because our Windows users haven’t had the chance to taste Cherokee-Admin since it was born.

Beware that the Windows build has to be taken with a grain of salt. A lot of work is still needed, since some major changes (like a totally rewritten I/O cache, a lot more efficient and stable) will be coming by the time 0.8 is released.

These are the necessary steps to set up a suitable building environment.

As Alvaro said in his blog, installing the whole bundle of needed tools is not trivial. In fact, there was a strange problem with the provided autotools (automake 1.8.2 and autoconf 2.59) of the previous environment that forced us to manually tweak things in order to successfully finish the compilation of Cherokee. This has been tested on a Windows XP virtual machine.

This is what you need to install.

Cherokee on Windows

Our next release of Cherokee, 0.8.0, will once again have a native Windows package.
A few moments ago it was officially announced on the mailing list.

We had plans to finally fix it in the very near future, but Alvaro decided to speed things up a bit. This guy is amazing! ;)

I’ve been rather busy these days and haven’t been paying all the attention to the project I would have wanted, but everything started moving on Windows’ side of things just a couple of days ago. I really didn’t expect it to be ready this soon, but here it is: Cherokee Windows Build. The development branch already compiles and works on Microsoft’s OS. Just check out the latest SVN version and give it a try.

Alternatively, if you can live without Windows and want something more stable, you can just download the latest official release, Cherokee 0.7.2 (as of June 12th).
