Perl Survey 2007

3 08 2007

Perl Survey 2007 is taking place. Please take the time to take part in it if you have any relation with Perl. The survey form is very short and won’t take much of your time.

Funny thing is, they listed Bangladesh and Bengali in the country and spoken-language fields. That is really interesting, because I am the only Perl programmer I know in Bangladesh.

Here is the promotional notice:

Take part in the 2007 Perl Survey!

The Perl Survey is an attempt to capture a picture of the Perl community
in all its diversity.  No matter what sort of Perl programmer you are,
we’d love to hear from you.

The survey can be found at: http://perlsurvey.org/

It only takes about 5 minutes to complete.

The survey will be open until September 30th, 2007.  After that, we’ll be
reporting on the results and making the data freely available.

Please feel free to forward this email to any other Perl programmers
you know.

Thanks for your help!

Yours,

Kirrily “Skud” Robert
The Perl Survey
info@perlsurvey.org





Do not run after misleading benchmarks

10 07 2007

I just had to find some time to write about this. This post is aimed at anyone who feels elevated and impressed by the trendy benchmarks between programming languages and/or frameworks. Benchmarks are tools to help us judge; they have a purpose to serve. But using them at their lowest significance and publishing those results to the community does not help anybody. I’d rather say it misleads the rookies. And let’s face it, there are more rookies in the industry right now than at any other time. High-level MVC frameworks are starting to reach good, stable shape, enabling developers to do more with less. But doing more while knowing less is neither good nor sustainable.

Let’s have a look at a post that proves the point. Someone benchmarked CodeIgniter, CakePHP and Symfony in this blog. All the fuss is about these three frameworks printing “Hello World”. This benchmark uses artillery to kill mosquitoes, then honours and ranks the million-dollar artillery pieces on which killed them better. And then (the catch) it gets great appreciation too. What is the point of printing Hello World with RAD tools? Do you benchmark a sniper rifle, a railgun and an AK-47 in an indoor fight and rank which one is better? Aren’t they built for long range?

Some people did mention in the reactions that a benchmark like this should involve practical use: DB operations, ORM use, measuring the frameworks doing what they are built to do and comparing them there. My intention is not to blame the effort. My point is: how many rookies get distracted by this type of benchmark? I say, many. Just read the comments. There are lots more examples. This one at least stays amongst PHP frameworks; people are comparing cross-language and cross-purpose tools like that.

If you are either impressed or repelled by the arguments so far, I suggest you listen to this talk by brian d foy about benchmarking, given at the Nordic Perl Workshop 2007 a few months ago. It is one of the most interesting talks I have ever heard. He starts by saying why you should never use benchmarking ever in your life. Well, not never, but not while your brain is off, as he later clarifies. He also advises using profiling, which will help you more than benchmarking.
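His point is easy to demonstrate with Perl’s core Benchmark module: a micro-benchmark is trivial to write, which is exactly why so many meaningless ones get published. Here is a minimal sketch (the two string-building subs are just illustrative toys, not anything from the posts above):

```perl
#!/usr/bin/perl
# A minimal micro-benchmark with Perl's core Benchmark module.
# It compares two toy ways of building a string -- exactly the
# "Hello World"-scale measurement that tells you very little about a
# real application. Profiling a real workload (e.g. with a profiler
# such as Devel::NYTProf) is usually far more informative.
use strict;
use warnings;
use Benchmark qw(timethese cmpthese);

my $results = timethese(50_000, {
    concat => sub { my $s = ''; $s .= $_ for 'a' .. 'j'; $s },
    join   => sub { my $s = join '', 'a' .. 'j'; $s },
});

cmpthese($results);    # prints a rate-comparison table
```

Numbers like these say which toy is faster at being a toy; they say nothing about where a real application actually spends its time.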

And about speed: speed is a very relative metric for judgement. It is sensible that RAD tools give away a little execution speed for development speed. The Symfony guys posted a good explanation on their blog to clear up what does what and why. My final thoughts: there are lots of good frameworks to choose from; choose depending on your particular needs. But above all, choose with realistic expectations.

Blogged with Flock






A research on Web 2.0 webserver demographics

8 06 2007

Background: I had an interesting chat with one of my friends about what technology Web 2.0 uses and promotes. Does it break any trend, or has it at least started to break one, in the field of web servers? So, for the welfare of science, technology and humanity, I decided to do some research. I could knock on all the Web 2.0 web sites to learn their web server info and analyze it to find out whether there is really any significant, visible trend. Alexa pays its researchers a lot of money to do this type of research. Anyway, I am doing it for free this time.

Plan: So, I had to write a web crawler that somehow makes a list of all the Web 2.0 sites and knocks each of them at the head (pulls the HTTP header only) to learn what web server it uses. I chose eConsultants.com to create the list. They keep a list of all Web 2.0 sites, like quite a few sites do, as you may know. And they are easier to crawl, as they are less bloated than the others (and I cannot pull data from websites made with Flash).
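The “knock at the head” step boils down to reading the Server: line out of the HTTP response headers. Here is a sketch of just the parsing half (the function name and the sample headers are mine; in the real crawler the raw headers would come from an HTTP HEAD request, e.g. via LWP::UserAgent):

```perl
#!/usr/bin/perl
# Sketch of the header-parsing half of the "knock at the head" step.
# server_from_headers() is an illustrative name; the sample response
# below stands in for what an HTTP HEAD request would return.
use strict;
use warnings;

sub server_from_headers {
    my ($raw_headers) = @_;
    for my $line (split /\r?\n/, $raw_headers) {
        # Header names are case-insensitive per the HTTP spec
        if ($line =~ /^Server:\s*(.+?)\s*$/i) {
            return $1;
        }
    }
    return 'unknown';    # sites that hide their server info
}

my $sample = "HTTP/1.1 200 OK\r\n"
           . "Date: Fri, 08 Jun 2007 10:00:00 GMT\r\n"
           . "Server: Apache/2.2.4\r\n"
           . "Content-Type: text/html\r\n";
print server_from_headers($sample), "\n";    # Apache/2.2.4
```

Sites that send no Server header at all end up in the “unknown” bucket, which is exactly the third-place entry in the results below.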

Research: So crawl I did, and found a list of 1269 sites in total. So now you know how many Web 2.0 web sites there are. Hang on a minute! What makes them qualify as Web 2.0 sites? Let’s leave that responsibility to eConsultants.com. But if you want to know what I think about Web 2.0, here you go; I found these comments in a Digg story:

Digger X: What exactly is Web 2.0? What kind of features can I expect? I keep hearing about this buzz, but I’m not sure exactly what it is.
Digger Y: web 2.0 is a new buzz word that will allow startups to get funding again if they can tag themselves as web 2.0 If your website has gradient colors and uses ajax you’re already web 2.0 baby!!!

Okay, so our minds are clear again. Here is the list of Web 2.0 websites. Another web crawler retrieved web server info from all these sites and created a Web 2.0 list with web server info. Then a text-analyzer program I wrote made me a sorted list of all the web servers.
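That text-analyzer step is essentially a tally-and-sort. A minimal sketch of it (the input list here is made up for illustration; the real program read the crawler’s output file):

```perl
#!/usr/bin/perl
# Sketch of the tally-and-sort step that produced the list below.
# The server names here are made-up sample input.
use strict;
use warnings;

my @servers = ('Apache', 'Microsoft-IIS', 'Apache', 'Lighttpd', 'Apache');

# Count how many sites reported each server name
my %count;
$count{$_}++ for @servers;

# Sort by count descending, breaking ties alphabetically,
# and print in the same "count (percent) ==> name" format as below
my $total = scalar @servers;
for my $name (sort { $count{$b} <=> $count{$a} || $a cmp $b } keys %count) {
    printf "%d (%.2f%%) ==> %s\n",
        $count{$name}, 100 * $count{$name} / $total, $name;
}
```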

Result: The final list is not that big, so I can post it here:

725 (57.13%) ==> Apache
176 (13.87%) ==> Microsoft-IIS
173 (13.63%) ==> unknown
52 (4.10%) ==> Lighttpd
37 (2.92%) ==> Apache-Coyote
25 (1.97%) ==> Mongrel
10 (0.79%) ==> nginx
7 (0.55%) ==> Zope
7 (0.55%) ==> Jetty
6 (0.47%) ==> GFE/1.3
6 (0.47%) ==> LiteSpeed
6 (0.47%) ==> Resin
4 (0.32%) ==> Oversee Webserver v1.3.18
3 (0.24%) ==> GWS/2.1
2 (0.16%) ==> AOLserver/4.0.10
2 (0.16%) ==> Apache-AdvancedExtranetServer
2 (0.16%) ==> SWS
2 (0.16%) ==> Zeus
1 (0.08%) ==> Web Crossing(r)
1 (0.08%) ==> Juniper Networks NitroCache/v1.0
1 (0.08%) ==> Japache/2.2.4
1 (0.08%) ==> AZTK – dido
1 (0.08%) ==> Web Server
1 (0.08%) ==> TwistedWeb/2.2.0
1 (0.08%) ==> JoyWeb 1.0b1
1 (0.08%) ==> Server
1 (0.08%) ==> LuMriX
1 (0.08%) ==> JWS 1.2
1 (0.08%) ==> Lotus-Domino
1 (0.08%) ==> mfe
1 (0.08%) ==> netvibes.com
1 (0.08%) ==> Concealed by Juniper Networks DX
1 (0.08%) ==> Sparky
1 (0.08%) ==> Sun Java System Application Server Platform Edition
1 (0.08%) ==> Mittwald HTTPD
1 (0.08%) ==> Yaws/1.65 Yet Another Web Server
1 (0.08%) ==> igfe
1 (0.08%) ==> Phillips Data v1
1 (0.08%) ==> bsfe
1 (0.08%) ==> SimplyServer 1.0
1 (0.08%) ==> DMS/1.0.42
1 (0.08%) ==> Sun-ONE-Web-Server/6.1

You can compare it with Netcraft’s research results for all web servers. I won’t claim that mine is a very accurate piece of research (it does have some rubbish data), but it shows the picture more or less. You can see that Mongrel gets a greater share here than in the general list. At least those are Rails sites (does anyone use Ruby without Rails as a web platform?). I am happy to see Lighttpd (lighty) getting a spot in the top four. If any web server goes up that list significantly in the near future, it will be lighty. The “unknowns” listed in 3rd place did not give any web server info in their HTTP headers. So let us assume that they have the same demographics as the visible ones.

And Apache is the leader by far, with IIS the second biggest web server in the market.

I would be happier if more info could be retrieved this way. I will definitely try to learn more about the web servers here that I had not heard of before. I would also love to take part in any research project of a similar sort in the future. Finally, if you have any suggestions or critiques to make this research better, or would just like to show your appreciation, please drop me a line.

Update: I just found that the primary list had some duplicates. Too bad. I have updated the post with the new data and replaced the bad list files with new ones too.







Perl commandline tool for reading google groups posts

3 06 2007

This is a little command-line tool I wrote in Perl to help myself read the posts in Google Groups. It fetches the thread URLs with the latest posts and opens them in Firefox tabs. I use it from the Cygwin command prompt on my Windows machine. You can easily customize it to suit your needs. Long live open source.

Why did I create it: I use a feed reader to read the posts of the groups I subscribed to on Google Groups. To read a post, I click the link in the summary/description of each post in my reader. That takes me to a page with only one post; then I click another link to reach the thread. Doing that a few times every day is really painful. So this one will keep my sanity intact for now.

#!/usr/bin/perl
#browse.pl

# example usage:
# browse.pl
# browse.pl perl.beginners
# browse.pl comp.unix.shell
# (comp.lang.perl.misc is the default group) 

use strict;
use warnings;
use WWW::Mechanize;

my $browser_path = '/cygdrive/c/Program\\ Files/Mozilla\\ Firefox/firefox.exe ';
my $group_name = 'comp.lang.perl.misc'; # default group. it will be used if you don't provide one as parameter
$group_name = $ARGV[0] if @ARGV; # user specified group, from commandline parameter
my $url = 'http://groups.google.com/group/'.$group_name.'/topics?gvc=2'; 
my $limit = 10; # limit number of posts to open

print "Group: [$group_name]\n";
my $m = WWW::Mechanize->new();
print "Getting $limit links of threads...\n";
$m->get($url);
die "oooops! could not load main page\n" unless $m->success;
my $html = $m->content();
#print $html;

my @links = ($html =~ m{<td><a href="(/group/$group_name/browse_thread/thread/[^/]+/[^/]+)#[^/]+">}igs);
if (@links)	{
	@links = map {'http://groups.google.com' . $_} @links;
	my @links_limited = splice @links, 0, $limit;
	my $url_string = sprintf (qq/"%s"/, join q/" "/, @links_limited);
	#print $url_string."\n";
	print "Opening browser...\\n";
	system ($browser_path . $url_string . ' &');
} else {
	print "oooops! could not find any thread links\n";
}

exit 0;

Note: There are better ways to do this. Using HTML::TreeBuilder or HTML::Parser would be more standard. But I like to write regexes by hand ( 😉 actually that’s the main reason I bothered writing this script).

Click here to download the script.







Language war: real stuff

23 05 2007

One of the bright sides of flame wars between languages is that they give the community some real stuff in the end. We love it when people who really know the ins and outs of the languages/platforms speak up, and one can actually get to know the gotchas and strengths of the languages and the tools.

One of my friends had been researching for quite some time which development medium to choose for his new top-secret startup. He is a Python guy, but initially was very interested in Ruby on Rails (RoR), with speedy development and easier maintainability in mind. Then the Twitter debate came up and people spoke about scalability and other issues with RoR. Now he is thinking of going with Django or TurboGears.

And now there is another controversy (if I may call it that; or “friendly discussion”, perhaps?) going on about Perl. Bugzilla, the largest open source project that uses Perl, has announced that it might move to some medium other than Perl, because they think Perl is less maintainable! Then chromatic from ONLamp followed up. The debate and related articles are really worth following. The comments and replies to these posts matter even more to me. IMO the community is helping itself find the right tools for the right trades. I read for a long time. I think I will follow up this post with another one later to write down my perceptions and findings. Readers, keep exploring!







Cheat sheets: Perl 6 and Perl 5

20 05 2007

To make your life easier and your wall geekier, here are the cheat sheets for Perl 6 and Perl 5:

For Perl 6 (courtesy of Damian Conway):

 CONTEXTS  SIGILS             ARRAYS        HASHES
 void      $scalar   whole:   @array        %hash
 scalar    @array    slice:   @array[0, 2]  %hash{'a', 'b'}
 list      %hash     element: @array[0]     %hash{'a'}
           &sub
                    SCALAR VALUES
                    number, string, reference, undef
 REFERENCES
 \     references      @{$foo}[1]       aka $foo.[1]
 $@%&  dereference     %{$foo}{bar}     aka $foo.{bar} 
 []    anon. arrayref  @{@{$foo}[1]}[2] aka $foo.[1].[2]

 {}    anon. hashref   @{@{$foo}[1]}[2] aka $foo[1][2]
 \()   list of refs
                         NUMBERS vs STRINGS  LINKS
 OPERATOR PRECEDENCE     =          =        perl.plover.com
                         +          ~        search.cpan.org
 ++ --                   == !=      eq ne         cpan.org
 **                      < > <= >=  lt gt le ge   pm.org
 ! u^ \ u+ u- ? u~       <=>        cmp           tpj.com
 ~~ !~                                            perldoc.com
 * / % x xx              SYNTAX
 + - ~                   for    LIST { }, loop (a;b;c) { }
 .<< .>> +<< ~<< etc.    while  EXPR { }, until ( ) { }
 named uops              if     EXPR { } elsif EXPR { } else { }
 &                       unless EXPR { } elsif EXPR { } else { }
 |^                      
 < > <= >= lt gt le ge   
 == != <=> eq ne cmp     
 &&               REGEX METACHARS        REGEX MODIFIERS
 || ^^ //         ^     string begin     :i  case insens.
                  $     string end       :w  skip w/space
 ..               +     one or more      :e  each
 ?? ::            *     zero or more     
 = += -= *= etc.  ?     zero or one      
 ,                ()    capture          
 list ops         []    no capture       REGEX CHARCLASSES
 not              <[]>  character class  .  == any char
 and              |     alternation      \s == [\x20\f\t\r\n]

 or xor err       <1,2> repeat in range  \w == [A-Za-z0-9_]
                  \b    word boundary    \d == [0-9] 
                                         \S, \W and \D negate
 DO
 use strict;        DON'T            LINKS
 use warnings;      "$foo"           perl.com       
 my $var;           $$variable_name  perlmonks.org  
 open() err die $!; `$userinput`     use.perl.org   
 use Modules;       /$userinput/     perl.apache.org
                                     parrotcode.org 
 FUNCTION RETURN OBJECT ELEMENTS
 stat      localtime    caller         SPECIAL VARIABLES
  0 dev    0 second     0 package      $_    current topic
  1 ino    1 minute     1 filename     $0    regex result
  2 mode   2 hour       2 line         
  3 nlink  3 day        3 subroutine   
  4 uid    4 month-1    4 hasargs      
  5 gid    5 year-1900  5 want         $!    error object
  6 rdev   6 weekday    6 evaltext     
  7 size   7 yearday    7 is_require   
  8 atime  8 is_dst     8 hints        
  9 mtime               9 bitmask      @ARGS command line args
 10 ctime                              @INC  include paths
 11 blksz                              @_    subroutine args
 12 blcks                              %ENV  environment

This one is for Perl 5 (courtesy of Juerd Waalboer):

 CONTEXTS  SIGILS             ARRAYS        HASHES
 void      $scalar   whole:   @array        %hash
 scalar    @array    slice:   @array[0, 2]  @hash{'a', 'b'}
 list      %hash     element: $array[0]     $hash{'a'}
           &sub
           *glob    SCALAR VALUES
                    number, string, reference, glob, undef
 REFERENCES
 \     references      $$foo[1]       aka $foo->[1]
 $@%&* dereference     $$foo{bar}     aka $foo->{bar}
 []    anon. arrayref  ${$$foo[1]}[2] aka $foo->[1]->[2]
 {}    anon. hashref   ${$$foo[1]}[2] aka $foo->[1][2]
 \()   list of refs
                         NUMBERS vs STRINGS  LINKS
 OPERATOR PRECEDENCE     =          =        perl.plover.com
 ->                      +          .        search.cpan.org
 ++ --                   == !=      eq ne         cpan.org
 **                      < > <= >=  lt gt le ge   pm.org
 ! ~ \ u+ u-             <=>        cmp           tpj.com
 =~ !~                                            perldoc.com
 * / % x                 SYNTAX
 + - .                   for    (LIST) { }, for (a;b;c) { }
 << >>                   while  ( ) { }, until ( ) { }
 named uops              if     ( ) { } elsif ( ) { } else { }
 < > <= >= lt gt le ge   unless ( ) { } elsif ( ) { } else { }
 == != <=> eq ne cmp     for equals foreach (ALWAYS)
 &

 | ^              REGEX METACHARS            REGEX MODIFIERS
 &&               ^     string begin         /i case insens.
 ||               $     str. end (before \n) /m line based ^$
 .. ...           +     one or more          /s . includes \n
 ?:               *     zero or more         /x ign. wh.space
 = += -= *= etc.  ?     zero or one          /g global
 , =>             {3,7} repeat in range
 list ops         ()    capture          REGEX CHARCLASSES
 not              (?:)  no capture       .  == [^\n]
 and              []    character class  \s == [\x20\f\t\r\n]
 or xor           |     alternation      \w == [A-Za-z0-9_]
                  \b    word boundary    \d == [0-9] 
                  \z    string end       \S, \W and \D negate
 DO
 use strict;        DON'T            LINKS
 use warnings;      "$foo"           perl.com       
 my $var;           $$variable_name  perlmonks.org  
 open() or die $!;  `$userinput`     use.perl.org   
 use Modules;       /$userinput/     perl.apache.org
                                     parrotcode.org 
 FUNCTION RETURN LISTS
 stat      localtime    caller         SPECIAL VARIABLES
  0 dev    0 second     0 package      $_    default variable
  1 ino    1 minute     1 filename     $0    program name
  2 mode   2 hour       2 line         $/    input separator
  3 nlink  3 day        3 subroutine   $\    output separator
  4 uid    4 month-1    4 hasargs      $|    autoflush
  5 gid    5 year-1900  5 wantarray    $!    sys/libcall error
  6 rdev   6 weekday    6 evaltext     $@    eval error
  7 size   7 yearday    7 is_require   $$    process ID
  8 atime  8 is_dst     8 hints        $.    line number
  9 mtime               9 bitmask      @ARGV command line args
 10 ctime  just use                    @INC  include paths
 11 blksz  POSIX::      3..9 only      @_    subroutine args
 12 blcks  strftime!    with EXPR      %ENV  environment





Perl one line web crawler/scraper

20 05 2007

Someone posted a Perl code snippet on snippets.dzone.com:

Extract the body of an HTML document
For example, print out just the body of Google’s home page:

use strict;
use warnings;
use LWP::UserAgent;
use HTML::TreeBuilder;

my $ua  = LWP::UserAgent->new;
my $req = HTTP::Request->new(GET => 'http://www.google.com/');
my $res = $ua->request($req);
if ($res->is_success) {
    my $tree = HTML::TreeBuilder->new_from_content($res->content);
    $tree->elementify();
    my $body = $tree->find('body');
    foreach my $e ($body->content_list()) {
        # content_list() can return plain text nodes as well as elements
        print ref($e) ? $e->as_HTML() : $e;
    }
}

My shorter, one-liner, command-line version:

perl -MLWP::Simple -e ' $html = get "http://www.google.com"; $html =~ s{.*?(<body.*</body>).*}{$1}is; print $html;'

This is an example of the diversity of Perl. You can solve the same problem in multiple ways, whichever suits your need. It also shows why Perl is regarded as the number one tool for writing crawlers, text processing, prototyping, etc.
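For instance, the one-liner’s substitution can also be wrapped in a small function and reused on any HTML string (the function name is mine; like the one-liner itself, this is a quick regex hack, not a real HTML parser):

```perl
#!/usr/bin/perl
# The one-liner's substitution, wrapped as a reusable function.
# extract_body() is an illustrative name; the regex is the same quick
# hack as the one-liner and will not handle every HTML edge case.
use strict;
use warnings;

sub extract_body {
    my ($html) = @_;
    # Keep only the first <body>...</body> span, case-insensitively,
    # letting . match newlines too (/s)
    $html =~ s{.*?(<body.*</body>).*}{$1}is;
    return $html;
}

my $page = "<html><head><title>t</title></head>"
         . "<body><p>hello</p></body></html>";
print extract_body($page), "\n";    # <body><p>hello</p></body>
```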