July 11, 2008

autobox and ref()

The Perl builtin function ref() “returns a non-empty string if EXPR is a reference, the empty string otherwise”.

ref() is implemented in autobox::Core, but only for ARRAY, HASH and CODE data types. This means that the method call will fail when invoked on a scalar or undef:

use autobox::Core;

my($foo) = ['foo'];
$foo->ref->concat("\n")->print;
#prints 'ARRAY'
my($x) = undef;
$x->ref->concat("\n")->print;
# Fails with 'Can't call method "ref" on an undefined value'

There are two ways we can fix this: either we can use base autobox behaviour and define our own classes to handle SCALAR and UNDEF types, or we can hack the autobox::Core source with a view to submitting a patch to the author.

Let’s look at option number one first:

We define packages to hold our SCALAR and UNDEF implementations:

package MyScalarRef;
use strict;

sub ref ($) { CORE::ref( shift ); }

1;

package MyUndefRef;
use strict;

sub ref ($) { CORE::ref( shift ); }

1;

package main;
use strict;
use autobox SCALAR => 'MyScalarRef'
               , UNDEF   => 'MyUndefRef' ;
use autobox::Core;

my($foo) = ['foo'];
$foo->ref->concat("\n")->print;
#prints 'ARRAY as before
my($x) = undef;
$x->ref->concat("\n")->print;
# Now prints empty string.

(Note that, since the two packages are implemented identically, we could have pointed both data types at the same namespace. I’ve left them separate for clarity of explanation).

This is fine, and works as advertised, but it’s cumbersome to do this on the off-chance that we might want to call ref() on something other than an array, hash or coderef. A more elegant solution is to patch the autobox::Core source. A diff with autobox-Core-0.6 looks like:

@@ -498,7 +498,17 @@

 =cut

+#
+# UNDEF
+#
+
+package autobox::Core::UNDEF;

+#      Currently only "ref"
+#
+sub ref ($) { CORE::ref($_[0]); }
+#
+#
 #
 # SCALAR
 #
@@ -508,7 +518,7 @@
 #       Functions for SCALARs or strings
 #          "chomp", "chop", "chr", "crypt", "hex", "index", "lc",
 #           "lcfirst", "length", "oct", "ord", "pack",
-#           "q/STRING/", "qq/STRING/", "reverse", "rindex",
+#           "q/STRING/", "qq/STRING/", "ref", "reverse", "rindex",
 #           "sprintf", "substr", "tr///", "uc", "ucfirst", "y///"

 # current doesn't handle scalar references - get can't call method chomp on unblessed reference etc when i try to support it
@@ -523,6 +533,7 @@
 sub length  ($)   { CORE::length($_[0]); }
 sub ord     ($)   { CORE::ord($_[0]); }
 sub pack    ($;@) { CORE::pack(@_); }
+sub ref     ($)   { CORE::ref($_[0]); }
 sub reverse ($)   { CORE::reverse($_[0]); }
 sub rindex  ($@)  { CORE::rindex($_[0], $_[1], @_[2.. $#_]); }
 sub sprintf ($@)  { CORE::sprintf($_[0], $_[1], @_[2.. $#_]); }

i.e. we have added a call to CORE::ref in the existing SCALAR package, and added a new UNDEF package which currently contains only our ref() implementation.

How does this perform?

package main;
use strict;
use autobox::Core;

my($foo) = ['foo'];
$foo->ref->concat("\n")->print;
#prints 'ARRAY'
my($x) = undef;
$x->ref->concat("\n")->print;
## Fails with 'Can't call method "ref" on an undefined value'

So what has gone wrong here?

autobox::Core over-rides the import() method to associate data types with a corresponding namespace that provides the implementation. As of version 0.6, it looks like this:

sub import {
    shift->SUPER::import(DEFAULT => 'autobox::Core::', @_);
}

This means that the autobox DEFAULT data types are associated with autobox:Core: these are SCALAR, ARRAY, HASH and CODE. Therefore, by default, our UNDEF definition is never actually associated with autobox::Core and hence never actually called. We need a further modification to import():

sub import {
    shift->SUPER::import(DEFAULT => 'autobox::Core::'
                                ,UNDEF=>'autobox::Core::'
                                ,@_);
}

This now works as expected:

package main;
use strict;
use autobox::Core;

my($foo) = ['foo'];
$foo->ref->concat("\n")->print;
#prints 'ARRAY as before
my($x) = undef;
$x->ref->concat("\n")->print;
# Now prints empty string.

July 9, 2008

Frustrated by autobox

I am becoming increasingly fond of autobox, and particularly Scott Walters’ extension for the core Perl built-in functions. It will probably appear in the vites-3.0 core if and when we adopt Moose as our standard OO framework.

It brings to Perl syntax the sort of everything-is-an-object convenience and expressiveness that Rubyists and Smalltalkers get out of the box, e.g. you can do:

print [10, 20, 30, 40, 50]->pop(), "\n";
# prints 50

rather than the orthodox:

my(@list) = (10, 20, 30, 40, 50);
my($val) = pop(@list);
print $val, "\n";

You can also represent complex expressions in a more natural order (which also gives you a certain amount of self-documentation). As the autobox::Core documentation explains:

# Perl 5 - a somewhat complex expression
print join("\n", map { CGI::param($_) } @cgi_vars), "\n";

becomes

@cgi_vars->map(sub { CGI::param($_[0]) })  # turn CGI arg names into values
           ->join("\n")                      # join with newlines
           ->concat("\n")                    # give it a trailing newline
           ->print;                          # print them all out

It’s obviously longer than the orthodox idiom, but there is greater clarity about the order in which the operations take place.

Alas, not everything in the garden is rosy. The autobox implementation of push() tickles an obscure Perl bug, with the consequence that you cannot pass an array to push(), e.g.:

my(@s) = ();
my(@aa) = (1,2);
@s->push( @aa );
# Bizarre copy of ARRAY in push at /usr/lib/perl5/site_perl/5.8.8/autobox/Core.pm line 710.
#
# It works with scalars:
for my $i ( 1,2,3 ) {
    @s->push( $i );
}
# @s is (1,2,3)

You can workaround this with map():

my(@aa) = (1,2);
@s->push( map { $_ } @aa );
# @s is now ( 1, 2 )

but this is an ugly mixture of old and new syntax. Can we use autoboxed map()?

Well, by default, map() in autobox returns an arrayref; we can’t use the @{[ .. ]} ref-deref trick (because this triggers the bug above), so we are snookered. However, we can use the flatten() method exported by autobox::Core (which does the equivalent of @{ $array_ref }).

(Note that the elements() method is identical to flatten. This isn’t mentioned in the docs anywhere, though elements() is used in some examples).

my(@aa) = (1,2);
@s->push( @aa->flatten );
# @s is now ( 1, 2 )

Is this better than push( @s, @aa )? Not really.

An alternative workaround is to use autoboxed foreach() method and turn the expression inside-out:

my(@aa) = (1 ..3 );
@aa->foreach( sub { @s->push($_[0]) } );
# @s is now (1,2,3);

but the need for the anonymous subroutine and the use of the $_[0] notation means that this is no easier on the eye.

As an aside, the autoboxed map() supports two invocation styles:

Called on a subroutine reference, it takes the list to be mapped as an argument:

my($mapped_arrayref) = sub { return somefunc($_[0]) }->map( @somelist );

and called on an array, it takes a subroutine reference as an argument:

my($mapped_arrayref) = @list->map( sub { return somefunc($_[0]) } );


	

July 3, 2008

Code, or comment?

While writing regression tests for the VITES core modules, I came upon the following comment, referring to a data structure used internally as part of the process which creates indexes and key-word searches for profile templates:

# will return a hash of sub dirs in profile and default profile
          'profilea3' => [
                           'profilea2',
                           'profilea1',
                           'profilea',
                           'default'
                         ],
          'profileb1' => [
                           'profileb',
                           'default'
                         ],
          'default' => [
                         'default'
                       ]
         'default'

Now, that looks wrong to me; for one thing, I don’t see where that trailing ‘default’ fits into a supposed hash-based data structure. The only thing to do is to have a peek at the data while we run the code and see what it actually looks like.

This is harder than it sounds; the data structure is contained within a package-scoped lexical variable to which no external interface is provided. I’m reluctant to modify the source code with debugging statements - we’re supposed to be writing regression test, after all - so I resort to one of the scarier Perl modules available from the CPAN: PadWalker.

According to its description, PadWalker allows you to “inspect (and even change!) lexical variables in any subroutine which called you”.

We’ll use the peek_sub function to capture the variables in scope when we execute our profile structuring routine:

my($s) = Site::Search::SearchUtils->new();
# .. some set-up happens here ...
my($pro) = \&Site::Search::SearchUtils::get_profile_structure;
$s->get_profile_structure;
my($h) = peek_sub( $pro );

require Data::Dumper;
print( Data::Dumper::Dumper( $h->{'$h_all_base'} ) );

The output from this shows that the data structure actually ends up looking like this:


$VAR1 = \{
'subtest' => [
'default',
'test',
'subtest'
],
'default' => [
'default',
'default'
]
};

Since this data structure lists the places where we will be looking in order to index templates for the associated profiles, this output makes more sense.

However, that repeated ‘default’ still looks wrong. A bit of work with the code reveals that this is caused in the part that sorts the profile names: it pushes the name of the current profile onto the sorted array, and takes no special action for the ‘default’ profile.

I’ll be addressing that, and the broken comment, when I’ve completed the regression tests.

June 12, 2008

Writing Tests For Legacy Code

The VITES codebase has come a long way in a relatively short time. It’s a remarkably sound body of work, but, like most “organically grown” production code I’ve encountered in my career, there are no meaningful tests in existence. This makes changing the codebase somewhat fraught.

So, after a bit of thought, and several more cups of coffee, I came up with a script that gives me a running start at a test suite. I called it test_skeleton.pl.

It’s very simple: you give it a base directory containing a load of Perl modules that need tests, and it creates a test script that tests:

  1. use_ok: can I import the module?
  2. can_ok: can I call the expected functions/methods?
  3. isa_ok: can I instantiate the class (if this is an OO module)?

For example, to generate a test script, e.g. for the vites-2.0 codebase:

[stephenc@localhost ~] cvs co vites-2.0
....
[stephenc@localhost ~] cd vites-2.0
[stephenc@localhost vites-2.0] find . -name \*\.pm -print | perl test_skeleton.pl > t/001.t
[stephenc@localhost vites-2.0] prove -v -Icgi-lib -Icgi-lib/ext t/001.t
... copious test output generated ..

Assuming your tests pass, you would at this point commit the test script to CVS and use it as a basis for more detailed testing, e.g. actual function outputs on given inputs.

The test skeleton generator uses a load of regex matching to extract the subroutines from the modules, but it’s easily defeated by anything that’s not totally vanilla, e.g prototypes, attributes, closures, anything dynamically-generated, etc. There’s no real need for this with the current VITES codebase, but it could easily be hacked in as required. On the other hand, it does understand multiple packages in a single file. Which is nice.

Since I was experimenting with autobox and Readonly at the time, the script uses both of these. It would be fairly easy to hack these out if they’re a bit too bleading-edge for your liking, however. Also, because this was essentially a quick hack, there’s very little in the way of comments. Sorry about that.

Have fun.

test_skeleton.pl