My Diversions

March 27, 2008

A CAL webapp with persistent data using GWT, STM and BDB

Filed under: CAL and Open Quark, Computer Science, Java — Tom Davies @ 10:24 pm

aka, attack of the TLAs.

This webapp’s architecture is depicted below:

[Figure: Webapp Architecture]

Any data structure can be built on top of TVars, but because each TVar is a mutable reference, these are not functional data structures.

In this application a simple hashmap is used. Skip lists and relaxed-balance B-trees are other data structures which might also allow reasonable concurrency, while providing features like in-order traversal.

This illustrates the relationship between the CAL objects in the CAL ExecutionContext and key-value pairs in the BDB:

[Figure: cal-gwt-stm-persistence.png]

The root of the persistent data structure is a TVar with a ‘well-known’ id (1 in the example), which is created by a constant applicative form function. This TVar retrieves its value from the BDB when it is created. If no value exists for its id, it is initialised with a default value, which in the case of a hashmap is an array of TVars, each containing an empty list of key-value pairs. A value stored in a TVar is persisted to the BDB by serialising it using CAL’s default output function and XStream, which can serialise and deserialise instances which are not Serializable and do not have accessible constructors.
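The default hashmap layout described above (an array of references, each holding a list of key-value pairs) can be sketched in Java. This is only an illustration under assumptions: AtomicReference stands in for a TVar, so it gives atomic per-bucket updates rather than real STM transactions, and the class and method names are invented for this sketch rather than taken from the CAL source.

```java
import java.util.AbstractMap.SimpleImmutableEntry;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical bucketed map: one mutable reference per bucket. */
final class BucketMapSketch<K, V> {
    // Each AtomicReference plays the role of one TVar holding an immutable bucket list.
    private final List<AtomicReference<List<Map.Entry<K, V>>>> buckets;

    BucketMapSketch(int bucketCount) {
        buckets = new ArrayList<>(bucketCount);
        for (int i = 0; i < bucketCount; i++) {
            List<Map.Entry<K, V>> empty = Collections.emptyList();
            buckets.add(new AtomicReference<>(empty)); // default value: empty list
        }
    }

    private AtomicReference<List<Map.Entry<K, V>>> bucketFor(K key) {
        return buckets.get(Math.floorMod(key.hashCode(), buckets.size()));
    }

    /** Returns the value stored for key, or null if there is none. */
    V get(K key) {
        for (Map.Entry<K, V> e : bucketFor(key).get()) {
            if (e.getKey().equals(key)) {
                return e.getValue();
            }
        }
        return null;
    }

    /** Replaces the bucket list with a copy containing the new pair; retries on contention. */
    void put(K key, V value) {
        AtomicReference<List<Map.Entry<K, V>>> bucket = bucketFor(key);
        while (true) {
            List<Map.Entry<K, V>> old = bucket.get();
            List<Map.Entry<K, V>> updated = new ArrayList<>(old.size() + 1);
            for (Map.Entry<K, V> e : old) {
                if (!e.getKey().equals(key)) {
                    updated.add(e);
                }
            }
            updated.add(new SimpleImmutableEntry<>(key, value));
            if (bucket.compareAndSet(old, Collections.unmodifiableList(updated))) {
                return;
            }
        }
    }
}
```

Writers that hash to different buckets never touch the same reference, which is where the concurrency of this layout comes from.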

TVars themselves have transient values, so only the id is persisted; the value is lazily loaded when required, using the id. So even though a TVar may persist a complicated tree of CAL algebraic values, persistence of that tree stops at the first nested TVar it reaches. (The root TVar is never persisted itself; only its value is stored.)
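The lazy-loading behaviour can be illustrated with a small Java sketch. Everything here is an assumption made for illustration: the in-memory map stands in for the BDB, the names TVar, STORE and loadCount are invented, and the real implementation lives in CAL rather than Java.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

final class LazyTVarSketch {
    /** Hypothetical stand-in for the BDB: TVar id -> stored value. */
    static final Map<Long, Object> STORE = new HashMap<>();

    static final class TVar<T> {
        final long id;                        // the only part that would be persisted
        private final Supplier<T> defaultValue;
        private transient T value;            // never written out with the TVar itself
        private transient boolean loaded;
        int loadCount;                        // demonstration only: counts store lookups

        TVar(long id, Supplier<T> defaultValue) {
            this.id = id;
            this.defaultValue = defaultValue;
        }

        /** Loads the value on first use, falling back to the default if nothing is stored. */
        @SuppressWarnings("unchecked")
        T get() {
            if (!loaded) {
                loadCount++;
                Object stored = STORE.get(id);
                value = (stored != null) ? (T) stored : defaultValue.get();
                loaded = true;
            }
            return value;
        }

        /** Updates the in-memory value and writes it through to the store. */
        void set(T newValue) {
            value = newValue;
            loaded = true;
            STORE.put(id, newValue);
        }
    }

    public static void main(String[] args) {
        STORE.put(2L, "stored earlier");
        TVar<String> existing = new TVar<>(2L, () -> "default");
        TVar<String> fresh = new TVar<>(3L, () -> "default");
        System.out.println(existing.get());
        System.out.println(fresh.get());
    }
}
```

Because a TVar that appears inside another value carries only its id, writing out the outer value does not drag the inner value along with it.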

You can get a snapshot here — just unpack it and run ant run, then point your browser at http://localhost:8080/caltest.html. Or browse the source.

The ant build script includes a target run-tests which runs some Selenium tests. Stop the server before running that target.

Note that the source code includes various bits of half-baked rubbish, in addition to that described above!

March 15, 2008

GWT as a CAL client

Filed under: CAL and Open Quark, Java — Tom Davies @ 7:12 pm

I’ve been interested in GWT as a way of building rich Internet applications since it appeared, and I’m very pleased to see it getting better and better.

So it’s natural that I’d want to try using it with CAL, a functional language quite similar to Haskell which runs on the JVM.

I used an approach to marshalling JavaBeans to CAL algebraic types similar to the one I used before, but this time I haven’t used any bytecode manipulation. As the Java classes are needed at compile time for the GWT client, there isn’t any point in generating them at runtime (although generating them as a separate build step might be useful). I’ve also extended the previous work to include mapping a Java 5 enum to a CAL algebraic type whose constructors all take zero parameters.

So in our GWT client we can write:

CaltestServiceAsync service = GWT.create(CaltestService.class);
((ServiceDefTarget) service).setServiceEntryPoint(
    GWT.getModuleBaseURL() + "CaltestService");
service.processPerson(
    new Person(Salutation.MR, "Jim", "Earl", "Jones"), new MyAsyncCallback(...));

and call the CAL function:

public processPerson p =
    let Person s f m l = p; 
    in Person s (toUpperCase f) (lift toUpperCase m) (toUpperCase l);

where the types are:

data Salutation = MR | MRS deriving Inputable, Outputable;
data Person =
    Person salutation :: Salutation 
              firstName :: String 
              middleName :: (Maybe String) 
              lastName :: String 
    deriving Inputable, Outputable;

All this is done via three different annotations, and a special subclass of the GWT RemoteServiceServlet.

The first annotation, @Cal, is applied to the GWT service interface and indicates the CAL module whose functions the interface’s methods map to:

@Cal(workspace = "myworkspace.cws", module = "TDavies.GwtTest")
public interface CaltestService extends RemoteService {
    @Cal
    Person processPerson(Person p);
...

The CAL types Person and Salutation need to be mapped to Java classes:

Person is a simple JavaBean with getters and setters for each attribute:

@CalBean(workspace = "myworkspace.cws", module = "TDavies.GwtTest",
    constructorName = "Person")
public class Person implements IsSerializable {
    private Salutation salutation;
    private String firstName;
    private String lastName;
    private String middleName;
...

Note that middleName has the type Maybe String in the CAL type. A value of null maps to Nothing while a value of "x" maps to Just "x".
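As a rough illustration of the null/Maybe rule, here is a Java sketch in which java.util.Optional stands in for CAL’s Maybe. The class and method names are invented for this example; the actual conversion is done by the servlet described further on, whose code is not shown here.

```java
import java.util.Optional;

/** Hypothetical helper: Optional plays the role of CAL's Maybe String. */
final class MaybeMappingSketch {
    private MaybeMappingSketch() {}

    /** Bean to CAL direction: null becomes Nothing (empty), anything else becomes Just x. */
    static Optional<String> toMaybe(String beanValue) {
        return Optional.ofNullable(beanValue);
    }

    /** CAL to bean direction: Nothing becomes null, Just x becomes x. */
    static String fromMaybe(Optional<String> maybe) {
        return maybe.orElse(null);
    }
}
```

The two directions are inverses of each other, so a Person that makes a round trip through CAL keeps a null middleName as null.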

Salutation is an enum:

@CalEnum(workspace = "myworkspace.cws", module = "TDavies.GwtTest",
    type = "Salutation")
public enum Salutation {
    MR, MRS
}

The names of the enum’s values must be identical to the names of the CAL constructors.

A subclass of RemoteServiceServlet checks for the annotations and transforms the values in both directions.
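To give an idea of how such a check might look, the sketch below reads an annotation reflectively and maps enum values to constructor names and back. It is an assumption-laden illustration: the CalEnum annotation declared here is a local stand-in with the same attribute names as in the post (the real definition is not shown), and the helper methods and the module-qualified name format are invented for this example.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

final class CalEnumSketch {
    /** Local stand-in for the post's @CalEnum; must be retained at runtime to be visible. */
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.TYPE)
    @interface CalEnum {
        String workspace();
        String module();
        String type();
    }

    @CalEnum(workspace = "myworkspace.cws", module = "TDavies.GwtTest", type = "Salutation")
    enum Salutation { MR, MRS }

    private CalEnumSketch() {}

    /** Java to CAL direction: the enum constant's name is used as the constructor name. */
    static String qualifiedConstructor(Enum<?> value) {
        CalEnum meta = value.getDeclaringClass().getAnnotation(CalEnum.class);
        if (meta == null) {
            throw new IllegalArgumentException("not annotated with @CalEnum: " + value);
        }
        // Assumed format for this sketch: module name, a dot, then the constructor name.
        return meta.module() + "." + value.name();
    }

    /** CAL to Java direction: fails if no enum constant has the constructor's name. */
    static <E extends Enum<E>> E fromConstructor(Class<E> type, String constructorName) {
        return Enum.valueOf(type, constructorName);
    }
}
```

Enum.valueOf throws IllegalArgumentException when the names differ, which is one way the requirement that enum values and CAL constructors share names would surface at runtime.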

The source code for this experiment is available via anonymous svn from http://tgdavies.beanstalkapp.com/eddy/browse/trunk/cal. Please note that this repository contains various other half-baked and half-finished experiments! Look at the build.xml file to see how to set up an environment — you’ll need to supply OpenQuark, GWT and Jetty.

In my next post I’ll describe how to persist information on the server.

January 1, 2008

Confluence, Spotlight and Hessian

Filed under: Computer Science, Java — Tom Davies @ 9:42 pm

My choice for the highlight of OS X 10.5, Leopard, is that Spotlight is worth using. It’s fast enough that it’s replaced Quicksilver (I was never a power Quicksilver user) as my application launcher.

I’ve also found it useful for general searching for emails, contacts, PDF documents and so on.

This inspired my second current spare-time project, a Spotlight importer and Quick Look generator for Confluence.

There are two parts to this project, one which is easy and fun and another which is difficult and (relatively) dull. Of course I’ve only done the first part!

The fun part is getting Spotlight (and Quick Look) working with Confluence data. Spotlight provides search results at the file level, so each distinct entity you want to be able to find must be represented by a file on your disk. To index a Confluence instance, all you do is crawl a space via the Confluence remote API (OS X has a good XML-RPC library) and create a file for each page or blog post. The file contains enough information for the Spotlight indexer (which comes along after the file has been created and extracts metadata for the index) to do its job, plus what Quick Look needs to create thumbnails and previews in the Finder:

  • Page metadata such as creator, editor, creation date, modification date, labels, title and so on.
  • The page markup.
  • A pre-generated thumbnail: this is just a ‘screenshot’ of the page being rendered. This is shown in the Finder’s Coverflow mode. (In fact it is shown in list mode too, but I think that may be a bug in the Finder.)
  • The HTML Confluence renders for the page, plus all the images shown on the page, so that Quick Look can produce an HTML preview without making any requests to Confluence.
  • The original URL the page is at, so that opening the file opens the original page in your default browser.

The Confluence demonstration space looks like this in the Finder’s Coverflow mode:

[Image: Coverflow view of the Confluence demo space]

And previews of pages look like this:

[Image: Preview of a Confluence page]

The previews are accessible off-line, but the links all point to the original Confluence instance. A useful enhancement would be to send them through a helper application which determined whether the server was available, and if it was not, opened the locally stored preview HTML.

The duller part is making the process efficient enough that many users can keep their Spotlight indexes up to date without imposing too much load on the Confluence server.

I plan to achieve this using a plugin which provides a specialised RPC service which returns only the data required, in the minimum number of requests, and shares some of the work done between clients by caching. As I have an irrational dislike of XML, I’m writing the plugin using the Hessian protocol, which uses a binary encoding and is available for Objective-C. At present the plugin does not do any incremental updates and doesn’t share data between clients; it just dumps the contents of a space. It also assembles the entire response in memory, rather than streaming it back, so I won’t be installing the plugin on confluence.atlassian.com any time soon :-)

How should the plugin work? I plan to host the data on Amazon S3, with base data produced each night and incremental data produced every few minutes. Each set of data will comprise a file for each ‘level’ of Confluence authorisation. That is, there will be a file containing the data visible to an anonymous user, then a file for each group containing data visible to a member of that group but not to an anonymous user, and finally a file for each user containing the data they can see by virtue of their identity rather than their membership of a group. The user files are likely to be mostly empty.
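A client would then need to work out which files apply to it. The Java sketch below shows one way that selection might look; the file naming scheme and the class and method names are entirely hypothetical, since the post does not specify them.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical selection of data files for one client, one file per authorisation level. */
final class DataFileNamesSketch {
    private DataFileNamesSketch() {}

    /**
     * Lists the files a client should fetch, most general first.
     * A null user means an anonymous client, who only gets the anonymous file.
     */
    static List<String> filesFor(String user, List<String> groups) {
        List<String> files = new ArrayList<>();
        files.add("anonymous.dat");                 // visible to everyone
        if (user != null) {
            for (String group : groups) {
                files.add("group-" + group + ".dat"); // extra data visible via group membership
            }
            files.add("user-" + user + ".dat");     // data visible only by identity
        }
        return files;
    }
}
```

Since each level holds only what the more general levels lack, the client merges the files rather than picking a single one.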

Files stored in S3 are either private to the account owner or publicly visible, so the data files will be publicly readable but encrypted with a symmetric cipher. The client requests the keys to which it is entitled from the Confluence instance before downloading the data files from S3.

The advantage of using S3 is that the load on the IT infrastructure hosting Confluence is reduced. If this is not an issue, the files could be placed on any available HTTP server.
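The post does not say which cipher would be used. As one possible choice, the sketch below uses AES in GCM mode from the standard javax.crypto API, with a random IV stored in front of the ciphertext. The class name, method names and file layout are assumptions made for this illustration.

```java
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.util.Arrays;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;

/** Hypothetical encryption of one data file: layout is IV (12 bytes) followed by ciphertext. */
final class DataFileCryptoSketch {
    private static final int IV_BYTES = 12;
    private static final int TAG_BITS = 128;

    private DataFileCryptoSketch() {}

    /** Server side: one key per authorisation level, handed out only to entitled clients. */
    static SecretKey newKey() {
        try {
            KeyGenerator generator = KeyGenerator.getInstance("AES");
            generator.init(128);
            return generator.generateKey();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Server side: encrypts the file contents before uploading them. */
    static byte[] encrypt(SecretKey key, byte[] plain) {
        try {
            byte[] iv = new byte[IV_BYTES];
            new SecureRandom().nextBytes(iv);   // fresh IV for every file
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, iv));
            byte[] body = cipher.doFinal(plain);
            byte[] out = Arrays.copyOf(iv, iv.length + body.length);
            System.arraycopy(body, 0, out, iv.length, body.length);
            return out;
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Client side: reads the IV from the front of the file and decrypts the rest. */
    static byte[] decrypt(SecretKey key, byte[] fileBytes) {
        try {
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.DECRYPT_MODE, key,
                    new GCMParameterSpec(TAG_BITS, fileBytes, 0, IV_BYTES));
            return cipher.doFinal(fileBytes, IV_BYTES, fileBytes.length - IV_BYTES);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

GCM also authenticates the data, so a client holding the wrong key gets an error instead of silently reading garbage.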

October 3, 2007

CAL and Tapestry 5, Part 2: Algebraic Types and Forms

Filed under: CAL and Open Quark, Computer Science, Java, Tapestry — Tom Davies @ 5:29 am

In my previous post I described how to use a CAL function as part of the implementation of a Java class.

This post looks at interfacing CAL to Tapestry 5 using the ‘Java Bean’ conventions of getter and setter methods for the fields in an object.

Tapestry 5 provides a BeanEditForm component which simplifies providing CRUD operations for Beans. This is described in the second part of the Tapestry 5 tutorial.

By creating a Java class which provides a Bean with fields equivalent to the constructor parameters of a CAL algebraic data type we can use CAL to provide the data model for a web UI created with Tapestry. (more…)

September 28, 2007

Simplifying Code

Filed under: Computer Science, Java — Tom Davies @ 2:50 am

In the course of trying to adapt an ANTLR lexer to be used in an IntelliJ IDEA language plugin, I found that I’d written this:

token = lexer.nextToken();
if (token != null)
{
    tokenStart = findCharPos(token.getLine(), token.getColumn());

    if (token.getText() == null)
    {
        tokenEnd = tokenStart;
    }
    else
    {
        tokenEnd = tokenStart + token.getText().length() - 1;
        state = tokenEnd;
    }
    if (token.getType() == 1)
    {
        state = -1;
    }
}
else
{
    tokenStart = 0;
    tokenEnd = 0;
    state = -1;
}

What a mess! Which parts of the code influence the final value of tokenEnd? Should state be modified in the token.getText() == null case?

The problem with the code above is that it is structured as cases of the state of token, rather than as the functions which determine the values of the three variables.

I changed it to:

token = lexer.nextToken();
tokenStart = calcTokenStart(token);
tokenEnd = calcTokenEnd(tokenStart, token);
state = calcState(tokenEnd, token);
...
private int calcTokenStart(Token token)
{
    return token == null ? 0 : findCharPos(token.getLine(), token.getColumn());
}

private int calcTokenEnd(int tokenStart, Token token)
{
    if (token == null)
    {
        return 0;
    }
    else
    {
        String tokenText = token.getText();
        return tokenText == null ? tokenStart : tokenStart + tokenText.length();
    }
}

private int calcState(int tokenEnd, Token token)
{
    if (token == null || token.getType() == 1)
    {
        return -1;
    }
    else
    {
        return tokenEnd;
    }
}

This shows us each calculation in isolation, and allows us to see its dependencies. Note that this isn’t an ‘extract method’ refactoring: it comes from changing your view of the function from ‘processing a new token’ to ‘calculating the new values of the three state variables’.

Note that the functions above don’t use the state of the object, even though, for instance, this.tokenStart has already been assigned when calcTokenEnd is called. Reading it from the object instead of taking it as a parameter would hide the dependency and reduce the clarity of the solution.

In fact, unless this class were specifically designed to be extensible, making these functions static would perhaps be clearer and safer.
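A sketch of the static variant, for the two functions that need nothing but their arguments. The Token interface and the token factory below are minimal stand-ins invented so the example compiles on its own; they are not the lexer library’s types. calcTokenStart is left out because it depends on findCharPos, whose implementation isn’t shown.

```java
final class TokenMathSketch {
    /** Hypothetical stand-in for the lexer's token type; only what this sketch needs. */
    interface Token {
        String getText();
        int getType();
    }

    private TokenMathSketch() {}

    /** Same logic as calcTokenEnd above, but static: it cannot read object state. */
    static int calcTokenEnd(int tokenStart, Token token) {
        if (token == null) {
            return 0;
        }
        String tokenText = token.getText();
        return tokenText == null ? tokenStart : tokenStart + tokenText.length();
    }

    /** Same logic as calcState above, but static. */
    static int calcState(int tokenEnd, Token token) {
        return (token == null || token.getType() == 1) ? -1 : tokenEnd;
    }

    /** Test helper (hypothetical): builds a token with the given text and type. */
    static Token token(final String text, final int type) {
        return new Token() {
            @Override public String getText() { return text; }
            @Override public int getType() { return type; }
        };
    }
}
```

With static functions the compiler enforces what the paragraph above asks for: every input has to arrive as a parameter, and each function can be exercised without constructing a lexer.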

September 24, 2007

CAL and the Tapestry 5 Tutorial

Filed under: CAL and Open Quark, Computer Science, Tapestry — Tom Davies @ 10:13 pm

The technique described in my previous post can be used to create Tapestry 5 pages which call CAL functions. Tapestry also uses Javassist to enhance pages, so adding CAL integration requires that Tapestry is reconfigured to apply the CAL transformations in addition to its own — I wasn’t able to find a way to transparently modify the classes before Tapestry sees them.

I’ve modified the Hi/Lo Guessing Game from the Tapestry Tutorial to use CAL implementations for some functions:

(more…)
