December 28, 2009

EtherPad and co-ment

I was sitting in a train for a few hours, with the source code of EtherPad (Apache License 2.0) and co-ment (GNU Affero GPL 3) on my MacBook. I had done the initial checkout just before the trip, and I was very excited to see how these two open source editors work. EtherPad had just been acquired by Google, apparently to get more experienced people to work on Wave. Real-time collaborative editing seems to be the new paradigm for web user interfaces.

Both projects, naturally, use JavaScript to interact with the DOM. EtherPad handles page updates with Comet techniques. Neither comes with a detailed description of its inner workings, but the EtherPad authors have written a nice post (in the source code repository) that reveals the essential magic ingredient:

"The crazy idea here, which seems to have originated with Dutch programmer Marijn Haverbeke, is to take advantage of a browser feature called "design mode" (or "content editable"), a mode which allows the user to directly edit an HTML document. This feature has quietly been added to all major browsers over time. In fact, it's what GMail uses to let users compose rich-text e-mail. The advantages of basing an editor on a design-mode buffer are that such a buffer has a full DOM (document model), which allows arbitrary styling and swapping of parts of the document, and that native editing operations (selection, copy/paste) are mapped by the browser onto operations on the DOM.
...
The key is to treat the DOM as a hugely complicated I/O device between you and the browser, and carefully make rules to constrain it. The plus side is that once you've systematically beat design mode into submission, you can have an unmatched degree of scalability and nativity."
The server side is handled by Scala, Java and JavaScript. Huh?
"This enabled us to be more productive by writing all of EtherPad in the same language, and shuttle data between the client, server, and database all using JavaScript objects."
The data persistence model is not very conventional either:
"EtherPad stores all its data in the AppJet Database, which automatically scales and caches itself in memory as necessary. This makes it fast to implement EtherPad features, fast to change storage models, and fast to serve requests in production."
The code looks very professional and clean, although it is hard to grok well enough to develop new features. As an interesting aside, the included documents describe a changeset that looks something like this:
Z:5g>1|5=2p=v*4*5+1$x
"This changeset, together with the pool, represents inserting a bold letter "x" into the middle of a line."
On the client, the changeset is prepared and sent to the server in JavaScript:
function handleUserChanges() {
  ...
  var userChangesData = editor.prepareUserChangeset();
  if (userChangesData.changeset) {
    lastCommitTime = t;
    state = "COMMITTING";
    stateMessage = {type:"USER_CHANGES", baseRev:rev,
                    changeset:userChangesData.changeset,
                    apool: userChangesData.apool };
    stateMessageSocketId = socketId;
    sendMessage(stateMessage);
    sentMessage = true;
    callbacks.onInternalAction("commitPerformed");
  }
  ...
}
...and the changeset can be traced back to function "compose()":
Changeset.compose = function(cs1, cs2, pool) {
  var unpacked1 = Changeset.unpack(cs1);
  var unpacked2 = Changeset.unpack(cs2);
  ...
  var bankAssem = Changeset.stringAssembler();

  var newOps = Changeset.applyZip(unpacked1.ops, 0, unpacked2.ops, 0, function(op1, op2, opOut) {

    var op1code = op1.opcode;
    var op2code = op2.opcode;
    if (op1code == '+' && op2code == '-') {
      bankIter1.skip(Math.min(op1.chars, op2.chars));
    }
    Changeset._slicerZipperFunc(op1, op2, opOut, pool);
    if (opOut.opcode == '+') {
      if (op2code == '+') {
        bankAssem.append(bankIter2.take(opOut.chars));
      }
      else {
        bankAssem.append(bankIter1.take(opOut.chars));
      }
    }

  });

  return Changeset.pack(len1, len3, newOps, bankAssem.toString());
};
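
To make the changeset string quoted earlier a bit less cryptic, here is a rough sketch of what unpacking and applying one might involve. It is my own illustration, not EtherPad's Changeset.unpack(). It assumes my reading of the format: all numbers are base 36, the header is "Z:" followed by the old length, ">" or "<", and the length change, then a run of operations ("=" keep, "-" delete, "+" insert, optionally prefixed by "*n" attribute numbers and "|n" line counts), then "$" and the inserted characters. The names parseChangeset and applyChangeset are hypothetical, and attributes (such as bold) are parsed but ignored when applying.

```javascript
// Hypothetical helpers for illustration only; not part of EtherPad.
// Assumes the changeset grammar described in the paragraph above.
function parseChangeset(cs) {
  const header = /^Z:([0-9a-z]+)([><])([0-9a-z]+)/.exec(cs);
  if (!header) throw new Error("not a changeset: " + cs);
  const oldLen = parseInt(header[1], 36);
  const sign = header[2] === ">" ? 1 : -1;
  const newLen = oldLen + sign * parseInt(header[3], 36);

  // Everything after "$" is the bank of characters to insert.
  const bankStart = cs.indexOf("$");
  const opsEnd = bankStart < 0 ? cs.length : bankStart;
  const opsText = cs.slice(header[0].length, opsEnd);
  const charBank = bankStart < 0 ? "" : cs.slice(bankStart + 1);

  // One operation: optional "*attr" markers, optional "|lines", opcode, count.
  const opRe = /((?:\*[0-9a-z]+)*)(?:\|([0-9a-z]+))?([-+=])([0-9a-z]+)/g;
  const ops = [];
  let m;
  while ((m = opRe.exec(opsText)) !== null) {
    ops.push({
      attribs: m[1].split("*").filter(Boolean).map((a) => parseInt(a, 36)),
      lines: m[2] ? parseInt(m[2], 36) : 0,
      opcode: m[3],
      chars: parseInt(m[4], 36),
    });
  }
  return { oldLen, newLen, ops, charBank };
}

// Applies a changeset to plain text, ignoring attributes.
function applyChangeset(cs, text) {
  const { oldLen, ops, charBank } = parseChangeset(cs);
  if (text.length !== oldLen) throw new Error("text length does not match changeset");
  let out = "";
  let textPos = 0;
  let bankPos = 0;
  for (const op of ops) {
    if (op.opcode === "=") {
      out += text.slice(textPos, textPos + op.chars); // keep characters
      textPos += op.chars;
    } else if (op.opcode === "-") {
      textPos += op.chars; // delete: skip characters of the old text
    } else {
      out += charBank.slice(bankPos, bankPos + op.chars); // insert from bank
      bankPos += op.chars;
    }
  }
  return out + text.slice(textPos); // the untouched tail is kept implicitly
}

const parsed = parseChangeset("Z:5g>1|5=2p=v*4*5+1$x");
console.log(parsed.oldLen, parsed.newLen); // 196 197
console.log(JSON.stringify(applyChangeset("Z:5>1=2+1$x", "abcd\n"))); // "abxcd\n"
```

Read this way, the example from the documentation says: the document was 196 characters long and grows by one; keep 97 characters spanning five lines, keep 31 more, then insert one character carrying attributes 4 and 5 from the pool, namely "x".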

So overall, EtherPad looks very promising, but difficult to tailor to specific needs.

Co-ment, on the other hand, is a simpler solution. "co-ment® makes it possible for you to write or upload your own texts, submit them for comments and process the comments."

It also uses JavaScript on the client side, with Python and Django on the server. The magic is not as advanced: it uses an HTML parsing library (Beautiful Soup) to manipulate the document tree, and happily ignores standard design patterns by mixing views and request/response control structures with inline SQL. In other words, it is written with more traditional technology, but in a way that makes it difficult to maintain.

The source code of both projects is definitely worth reading and learning from. Unfortunately, the lack of architectural documentation makes this a tedious task.

UPDATE: LWN comments on EtherPad