My Diversions

January 23, 2008

The Tyranny of Distance

Filed under: Computer Science, General Interest — Tom Davies @ 10:56 pm

Apple ships good developer documentation with OS X, and Xcode provides a good UI for searching it, but as I’m a beginner I like to read the conceptual documentation all the way through.

I don’t like reading long documents on a screen, so I print out the documents I want to read 2-up on single-sided A4.

This means that I end up with an unordered pile of dog-eared pages.

I have always been interested in the publish-on-demand idea, so I decided to get hard copies of some of the Apple documentation from http://www.lulu.com. This is permitted by Apple’s terms and conditions, as I read them.

So I concatenated a few documents, converted to PostScript and back to PDF to embed the fonts, and uploaded the result to Lulu. I photoshopped a cover with a Leopard desktop wallpaper image, and it was all ready to go. I had a 712-page book with a spiffy looking colour cover!

The best part was that the indicated cost was $18.77.

Of course once I got to the checkout I was unpleasantly unsurprised to see that shipping was more than double the cost of the book itself.


I don’t want to spend that much money, so I put this experiment on hold for now.

There’s certainly a cost to not living in the US. Perhaps I’ll just wait for the AUD to appreciate a little more against the USD…

The Language of The Year for 2008 is Scala

Filed under: Computer Science — Tom Davies @ 8:00 am

Last week a colleague pointed me at this post by Steve Yegge. It discusses ‘code’s worst enemy’ (size) and asks which new language, by being more expressive, can reduce line count.

Dijkstra said (via Overcoming Bias):

“If we wish to count lines of code, we should not regard them as ‘lines produced’ but as ‘lines spent’: the current conventional wisdom is so foolish as to book that count on the wrong side of the ledger.”

Less expressive languages mean that algorithms need to be duplicated to be applied to different data structures. Yegge doesn’t give any examples, but he’s presumably thinking of things like the paradise benchmark which requires that you can write a function which increases salaries for all employees of a company, without any knowledge of the complicated data structure which represents the company other than the type which represents a Salary.
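As a rough sketch of what that benchmark asks for (in Python rather than Haskell or Scala, and with type names such as `Company`, `Dept` and `Employee` made up for illustration), a single generic traversal can find every `Salary` without the salary-raising code knowing how the company is laid out:

```python
from dataclasses import dataclass, fields, is_dataclass, replace
from typing import Any, Callable


@dataclass(frozen=True)
class Salary:
    amount: float


@dataclass(frozen=True)
class Person:
    name: str
    address: str


@dataclass(frozen=True)
class Employee:
    person: Person
    salary: Salary


@dataclass(frozen=True)
class Dept:
    name: str
    manager: Employee
    units: tuple  # Employees or nested Depts


@dataclass(frozen=True)
class Company:
    depts: tuple


def everywhere(transform: Callable[[Any], Any], value: Any) -> Any:
    """Rebuild `value` bottom-up, applying `transform` at every node."""
    if is_dataclass(value) and not isinstance(value, type):
        rebuilt = replace(value, **{
            f.name: everywhere(transform, getattr(value, f.name))
            for f in fields(value)
        })
        return transform(rebuilt)
    if isinstance(value, (list, tuple)):
        return transform(type(value)(everywhere(transform, v) for v in value))
    return transform(value)


def increase(percent: float, company: Company) -> Company:
    """Raise every salary; the only type this function knows about is Salary."""
    def bump(node: Any) -> Any:
        if isinstance(node, Salary):
            return Salary(node.amount * (1 + percent / 100))
        return node
    return everywhere(bump, company)
```

Adding a new layer to the company structure would not require touching `increase`, which is the point of the exercise. In a statically typed language the hard part is giving `everywhere` a type, which this dynamically typed sketch sidesteps.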

Steve wants a language which runs on the JVM, which is entirely reasonable. It gives you immediate cross-platform consistency, and the potential to use the thousands of libraries written for Java, if the language has been written to support it.

He rejects JRuby because his colleagues don’t like it and chooses the next generation of JavaScript, ECMAScript 4. This has an optional type system which looks as though it is a little less expressive than Java 5’s, and of course has the usual JavaScript dynamism.

Many of the comments on Steve’s post suggest Scala, a functional/object-oriented language with an advanced type system. I have looked at Scala in the past, but have never been able to ‘get into it’. I think this is because it is too much like Java in syntax, and supports the OO paradigm. In Haskell, by contrast, all you have are functions, algebraic types, and type classes, so you just have to get on with it. Scala left me not knowing where to start.

To remedy that I’ve forked out hard cash for the PDF+printed version of Odersky et al.’s “Programming in Scala”. I hope that the combination of having spent money and having a physical book will motivate me to learn the language.

January 1, 2008

Confluence, Spotlight and Hessian

Filed under: Computer Science, Java — Tom Davies @ 9:42 pm

My choice for the highlight of OS X 10.5, Leopard, is that Spotlight is worth using. It’s fast enough that it’s replaced Quicksilver (I was never a power Quicksilver user) as my application launcher.

I’ve also found it useful for general searching for emails, contacts, PDF documents and so on.

This inspired my second current spare-time project, a Spotlight importer and Quick Look generator for Confluence.

There are two parts to this project, one which is easy and fun and another which is difficult and (relatively) dull. Of course I’ve only done the first part!

The fun part is getting Spotlight (and Quick Look) working with Confluence data. Spotlight provides search results at the file level, so each distinct entity you want to be able to find must be represented by a file on your disk. To index a Confluence instance, all you do is crawl a space via the Confluence remote API (OS X has a good XML-RPC library), and create a file for each page/blog-post. The file contains enough information for the Spotlight indexer (which comes along after the file has been created and extracts metadata for the index) to do its job, plus what Quick Look needs to create thumbnails and previews in the Finder:

  • Page metadata such as creator, editor, creation date, modification date, labels, title and so on.
  • The page markup.
  • A pre-generated thumbnail: this is just a ‘screenshot’ of the page being rendered. This is shown in the Finder’s Cover Flow mode. (In fact it is shown in list mode too, but I think that may be a bug in the Finder.)
  • The HTML Confluence renders for the page, plus all the images shown on the page, so that Quick Look can produce an HTML preview without making any requests to Confluence.
  • The original URL the page is at, so that opening the file opens the original page in your default browser.
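To make the list above concrete, here is a minimal sketch of packing those items into one file per page. It is in Python rather than the Objective-C the real importer would need, and the key names, the binary property list container and the `.confluencepage` extension are all assumptions for illustration, not the format the project actually uses:

```python
import plistlib
from datetime import datetime
from pathlib import Path


def encode_page_record(page: dict) -> bytes:
    """Pack everything the importer and preview generator need into one blob."""
    record = {
        # Metadata for the Spotlight importer to index.
        "title": page["title"],
        "creator": page["creator"],
        "lastModifier": page["last_modifier"],
        "created": page["created"],        # datetime
        "modified": page["modified"],      # datetime
        "labels": list(page["labels"]),
        # Raw wiki markup, so the page text itself is searchable.
        "markup": page["markup"],
        # Pre-rendered assets, so previews need no request to the server.
        "thumbnail": page["thumbnail_png"],  # bytes
        "html": page["html"],
        "images": dict(page["images"]),      # file name -> bytes
        # Where to send the browser when the file is opened.
        "url": page["url"],
    }
    return plistlib.dumps(record, fmt=plistlib.FMT_BINARY)


def write_page_file(directory: Path, page_id: str, page: dict) -> Path:
    """Write one file per page; the extension is hypothetical."""
    path = directory / f"{page_id}.confluencepage"
    path.write_bytes(encode_page_record(page))
    return path
```

The crawler would call `write_page_file` once per page or blog post, and the Spotlight indexer would then pick each file up after it has been written.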

The Confluence demonstration space looks like this in the Finder’s Cover Flow mode: [image: Coverflow view of the Confluence demo space]

And previews of pages look like this: [image: Preview of a Confluence page]

The previews are accessible off-line, but the links all point to the original Confluence instance. A useful enhancement would be to send them through a helper application which determined whether the server was available, and if it was not, opened the locally stored preview HTML.
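A sketch of what that helper might do, in Python for brevity. The function names are made up, and a bare TCP connection attempt is only one assumed way of deciding that the server is "available":

```python
import socket
from urllib.parse import urlsplit


def server_reachable(url: str, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to the URL's host and port succeeds."""
    parts = urlsplit(url)
    port = parts.port or (443 if parts.scheme == "https" else 80)
    try:
        with socket.create_connection((parts.hostname, port), timeout=timeout):
            return True
    except OSError:
        return False


def resolve_link(url: str, local_preview: str, reachable=server_reachable) -> str:
    """Prefer the live page; fall back to the locally stored preview HTML."""
    return url if reachable(url) else local_preview
```

The `reachable` parameter is there so the decision can be swapped out (or tested) without touching the network.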

The duller part is making the process efficient enough that many users can keep their Spotlight indexes up to date without imposing too much load on the Confluence server.

I plan to achieve this using a plugin which provides a specialised RPC service which returns only the data required, in the minimum number of requests, and shares some of the work done between clients by caching. As I have an irrational dislike of XML I’m writing the plugin using the Hessian protocol, which uses a binary encoding, and is available for Objective-C. At present the plugin does not do any incremental updates, and doesn’t share data between clients; it just dumps the contents of a space. It also assembles the entire response in memory, rather than streaming it back, so I won’t be installing the plugin on confluence.atlassian.com any time soon :-)

How should the plugin work? I plan to host the data on Amazon S3, with base data produced each night and incremental data produced every few minutes. Each set of data will comprise a file for each ‘level’ of Confluence authorisation. That is, there will be a file containing the data visible to an anonymous user, then a file for each group containing data visible to a member of that group but not visible to an anonymous user, and finally a file for each user containing the data they can see by virtue of their identity, rather than their membership of a group. The user files are likely to be mostly empty.
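The partitioning could be sketched like this. The `Page` type and the `anonymous`, `group/<name>` and `user/<name>` file names are invented for the example, and it simplifies by putting a page in a user’s file whenever that user is granted access directly, without checking whether one of their groups already covers it:

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Page:
    title: str
    anonymous: bool = False              # visible without logging in
    groups: frozenset = frozenset()      # groups granted view access
    users: frozenset = frozenset()       # users granted view access directly


def partition_by_access(pages) -> dict:
    """Map each data file name to the titles of the pages it should contain."""
    files = defaultdict(list)
    for page in pages:
        if page.anonymous:
            # Everyone can see it, so it lives only in the anonymous file.
            files["anonymous"].append(page.title)
            continue
        for group in sorted(page.groups):
            files[f"group/{group}"].append(page.title)
        for user in sorted(page.users):
            files[f"user/{user}"].append(page.title)
    return dict(files)
```

A page visible to several groups appears in each of their files, so a client only ever needs the files it is entitled to.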

Files stored in S3 are either private to the account owner, or publicly visible, so data will be encrypted with symmetric encryption. The client requests the keys to which it is entitled from the Confluence instance before downloading the data files from S3.
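The key hand-out might look something like the sketch below. It assumes one random 256-bit key per data file and file names of the form `anonymous`, `group/<name>` and `user/<name>`; both are illustrative choices, and the encryption itself (which needs a cipher library) is left out:

```python
import secrets
from typing import Iterable, Optional


def make_keys(file_names: Iterable[str]) -> dict:
    """Generate one random symmetric key per data file."""
    return {name: secrets.token_bytes(32) for name in file_names}


def entitled_keys(keys: dict, user: Optional[str], groups: Iterable[str]) -> dict:
    """Return only the keys this client may have, based on who they are."""
    wanted = {"anonymous"} | {f"group/{g}" for g in groups}
    if user is not None:
        wanted.add(f"user/{user}")
    return {name: key for name, key in keys.items() if name in wanted}
```

An anonymous client would call `entitled_keys(keys, None, [])` and get back only the key for the anonymous file.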

The advantage of using S3 is that the load on the IT infrastructure hosting Confluence is reduced. If this is not an issue, the files could be placed on any available HTTP server.
