cyrusharmon.org

Cyrus Harmon's new completely useless blog

 

whoops

posted by cyrus in Lisp

Elias Martenson kindly pointed out that I forgot a progn around ,@body:

(defmacro with-java-stack-trace (&body body)  
  `(handler-case   
    (progn  
      ,@body)  
    (java:java-exception (e)  
     (print (#"getMessage" e))))) 
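To see why the PROGN matters, here's a portable sketch that runs in any Common Lisp. It swaps in the standard ERROR condition type for java:java-exception and PRINC-TO-STRING for #"getMessage". Both substitutions, and the macro names with-handler-no-progn and with-handler-progn, are hypothetical stand-ins so the example doesn't need ABCL. The sketch then compares what the two versions of the macro expand into.

```lisp
;; Hypothetical portable stand-ins for the two versions of the macro:
;; ERROR replaces java:java-exception, PRINC-TO-STRING replaces #"getMessage".
(defmacro with-handler-no-progn (&body body)
  `(handler-case
       ,@body
     (error (e)
       (princ-to-string e))))

(defmacro with-handler-progn (&body body)
  `(handler-case
       (progn ,@body)
     (error (e)
       (princ-to-string e))))

;; With two body forms, the broken version splices the second form in
;; where HANDLER-CASE expects a handler clause:
(macroexpand-1 '(with-handler-no-progn (setup) (run)))
;; => (HANDLER-CASE (SETUP) (RUN) (ERROR (E) (PRINC-TO-STRING E)))

;; The fixed version wraps every body form in a single PROGN expression:
(macroexpand-1 '(with-handler-progn (setup) (run)))
;; => (HANDLER-CASE (PROGN (SETUP) (RUN)) (ERROR (E) (PRINC-TO-STRING E)))

;; So all forms run, and an error from any of them is handled:
(with-handler-progn (+ 1 2) (error "second form fails"))
;; => "second form fails"
```

HANDLER-CASE takes exactly one expression followed by handler clauses, so without the PROGN every body form after the first ends up in clause position instead of being evaluated.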

with-java-stack-trace

posted by cyrus in Lisp

Following up on the java stack trace post, here's a little macro that can be used to wrap the lisp code that (eventually) triggers the java-side error:

(defmacro with-java-stack-trace (&body body)  
  `(handler-case   
    ,@body  
    (java:java-exception (e)  
     (print (#"getMessage" e))))) 

Note the use of the java:java-exception in the handler-case clause so that we don't inadvertently trap lisp errors.
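The effect of naming a specific condition type can be shown portably. In this sketch, java-exception-stand-in is a hypothetical condition type playing the role of java:java-exception (it is not part of ABCL), and with-java-errors-only is a hypothetical macro shaped like the one above. Only the stand-in is handled; an ordinary Lisp error keeps propagating to whatever outer handler (or debugger) is in effect.

```lisp
;; Hypothetical stand-in for java:java-exception so this runs in any Lisp.
(define-condition java-exception-stand-in (error)
  ((message :initarg :message :reader stand-in-message))
  (:report (lambda (condition stream)
             (format stream "Java says: ~A" (stand-in-message condition)))))

(defmacro with-java-errors-only (&body body)
  ;; Only the stand-in type is caught; other conditions are not trapped.
  `(handler-case (progn ,@body)
     (java-exception-stand-in (e)
       (princ-to-string e))))

;; The "Java" error is caught and turned into its report string:
(with-java-errors-only
  (error 'java-exception-stand-in :message "NullPointerException"))
;; => "Java says: NullPointerException"

;; A plain Lisp error (PARSE-INTEGER signals PARSE-ERROR) escapes the
;; macro, so an outer HANDLER-CASE sees it instead:
(handler-case (with-java-errors-only (parse-integer "not a number"))
  (parse-error () :lisp-error-escaped))
;; => :LISP-ERROR-ESCAPED
```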


ABCL Error Handling

posted by cyrus in Lisp

There's probably a better way to do this, but I have been having a difficult time trying to, from the lisp side of things, track down the cause of errors signaled from java code.

It turns out that we can use lisp's normal error handling facilities to work with java errors. The following snippet triggers a java NullPointerException and if we just evaluate this in SLIME we don't actually see the java backtrace (or at least I don't see it -- of course it would be nice if there were a way to do so).

(handler-case  
    ;; this will throw an NPE  
    (java:jstatic-raw "getenv" "java.lang.System" nil)  
  (error (e)  
    ;; this prints the stack trace to the jvm's standard out, which  
    ;; when running under slime, is our *inferior-lisp* buffer.  
    (print (#"printStackTrace" (java:java-exception-cause e))))) 

But this isn't so great, as the stack trace is printed to the *inferior-lisp* buffer. To see it in SLIME's output buffers, we can use ABCL's getMessage routine as follows:

(handler-case  
    ;; this will throw an NPE  
    (java:jstatic-raw "getenv" "java.lang.System" nil)  
  (error (e)  
    ;; this prints the exception type and the stack trace to SLIME's STDOUT  
    (print (#"getMessage" e)))) 

Having this certainly makes it easier to find the source of errors in java code called from ABCL.


hunchentoot-cgi update

posted by cyrus in Lisp

Well, it had been a while since hunchentoot-cgi had seen any attention. It turns out that the initial releases of hunchentoot-cgi had a pretty major limitation -- it didn't work at all with POST request methods, or at least it didn't send along any of the request's data to the CGI process. This has now been fixed, along with a bunch of other bugs in setting up the CGI script's environment variables.

The new code can be found at https://github.com/slyrus/hunchentoot-cgi.


CDK Debugging

posted by cyrus in Computational Biology

[Well, this is more computational chemistry than computational biology, but I didn't want this to show up on planet.lisp.org, so I'm using this category]

If you want to turn on logging with CDK, the magic JVM incantation is:

-Dcdk.debugging=true -Dcdk.debug.stdout=true 

I'm not sure how to get logging for only a single class yet. Perhaps that will be next. In the meantime, at least basic logging works for me in Eclipse now.


Common Lisp and Java

posted by cyrus in Lisp

Tales of Woe

So... in an attempt to use preexisting wheels, rather than reinvent my own at every turn, I've been trying to get a decent Common Lisp environment working with the CDK (Chemistry Development Kit). My abcl-cdk adventures actually went reasonably well and I was able, eventually, to get ABCL talking nicely to CDK. Of course I wanted more than just that: I wanted interoperability between the CDK and my half-round wheel, chemicl, a cheminformatics package I started writing in Common Lisp. This is where the train began to fall off the tracks.

ABCL and cxml-stp

A while back, in an earlier, aborted attempt to get some of my chem/bioinformatics (https://github.com/slyrus/cl-bio) stuff working with ABCL, I noticed that plexippus-xpath couldn't be loaded into ABCL. This was fixed, so I was encouraged that things might work with ABCL. (While I'm on a rant, the ABCL trac issue tracker is really slow...) However, cxml-stp seems to break ABCL.

Hopefully this is a fixable bug and some future version of ABCL will work with cxml-stp.

In the meantime...

SBCL and Java

So, I figured I'd try some other approaches to getting Java and a Common Lisp implementation to play nice. I know, you're thinking "why doesn't the dude just use clojure? After all, that's what clojure was designed for!" Well, that's a good question. I did use clojure for some earlier explorations with CDK and, while the java integration generally works well, I have a bunch of existing Common Lisp code I'd like to use and, at the time at least, it seemed like all of the clojure wrappers were thin wrappers around ugly Java libraries. I've grown to know and love many Common Lisp libraries, many of which are nicely available in Quicklisp, and I'd like to be able to use those (things like cxml-stp, plexippus-xpath, opticl, etc...).

Anyway, I tried to get some sort of SBCL Java interoperability working. Three possibilities appeared: 1) jfli, 2) foil and 3) cl+j. Turns out jfli is (was?) Rich Hickey's pre-clojure Java interface for Common Lisp. I'm guessing that the challenges in getting jfli to work with a reasonable range of Common Lisp implementations were part of the motivation behind clojure. In any event, it doesn't seem that jfli works under SBCL.

Next, I looked at foil, which appears to use sockets to communicate to another process running a JVM. This sounded suboptimal but, presumably, workable. Turns out foil looks like some sort of windows-only beast with a bunch of C# files. Not for me.

Finally, I looked at cl+j and it turns out there are some scary warning messages about how cl+j can't possibly work with SBCL's foreign thread handling mechanism. Bummer. This seems somewhat unreasonable on SBCL's part. Surely some amount of engineering should make it possible to have both a JVM and SBCL's runtime running in the same process. Unfortunately, I'm too out of practice with SBCL internals to give this much of a go at this point. Bummer again.

CCL and Java

Ok, next approach. How about cl+j and Clozure Common Lisp (CCL)? Seemed reasonable, but, unfortunately, hung just like SBCL did. Presumably this is more of a MacOS issue than a CCL issue, as cl+j is supposed to work with CCL, but maybe just on other non-mac platforms.

Now what?

So, it seems I'm stuck without a viable approach to using the common lisp libraries I want and the java libraries I want in the same process. Perhaps the ABCL bug will get fixed. Perhaps JVM integration would make a good summer project for the next SBCL Summer of Code.


More fun with CDK and ABCL

posted by cyrus in Lisp

ticagrelor

The drug ticagrelor (marketed as Brilinta by AstraZeneca) is an inhibitor of platelet activation and aggregation that has been shown to reduce the frequency of cardiovascular events in patients with acute coronary syndrome.

The ChEBI page for ticagrelor tells us that the SMILES for ticagrelor is:

CCCSc1nc(N[C@@H]2C[C@H]2c2ccc(F)c(F)c2)c2nnn([C@@H]3C[C@H](OCCO)[C@@H](O)[C@H]3O)c2n1 

So we can read that in as follows:

(eval-when (:compile-toplevel :load-toplevel :execute)  
  (asdf:load-system 'abcl-cdk))  
 
(cl:defpackage :ticagrelor  
  (:use :common-lisp :abcl-cdk))  
 
(cl:in-package :ticagrelor)  
 
(defparameter *ticagrelor*  
  (read-smiles-string  
   "CCCSC1=NC2=C(C(=N1)N[C@@H]3C[C@H]3C4=CC(=C(C=C4)F)F)N=NN2[C@@H]5C[C@@H]([C@H]([C@H]5O)O)OCCO")) 
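SMILES records tetrahedral stereochemistry as an @ or @@ mark inside a bracket atom, so as a quick sanity check on the string itself (independent of CDK) we can count how many bracket atoms carry such a mark. This is a minimal sketch: count-tetrahedral-stereo-marks is a hypothetical helper, it assumes a well-formed SMILES string, and it trusts the marks rather than checking whether each atom really is a stereocenter.

```lisp
(defun count-tetrahedral-stereo-marks (smiles)
  "Count bracket atoms in SMILES that carry an @ or @@ chirality mark.
Each marked atom is counted once, whether it uses @ or @@."
  (let ((count 0)
        (in-bracket nil)
        (marked nil))
    (loop for c across smiles
          do (case c
               (#\[ (setf in-bracket t
                          marked nil))       ; start of a bracket atom
               (#\] (when marked (incf count)) ; end of a bracket atom
                    (setf in-bracket nil))
               (#\@ (when in-bracket (setf marked t)))))
    count))
```

Both the ChEBI string and the one passed to read-smiles-string give 6, one per stereocenter (two on the cyclopropane ring and four on the cyclopentane ring).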

And we can render a 2-d depiction as follows:

(mol-to-svg *ticagrelor* "ticagrelor.svg")  
 
CL-USER> (in-package :ticagrelor)  
#<PACKAGE TICAGRELOR>  
TICAGRELOR> (mol-to-svg *ticagrelor* "ticagrelor.svg")  
"ticagrelor.svg" 

ticagrelor SVG

Notice the 6 nice chiral bonds. This is all well and good, but let's jazz things up a bit by rendering the molecule on a black background with white bonds:

(let ((*background-color* (java:jfield "java.awt.Color" "black"))  
      (*default-bond-color* (java:jfield "java.awt.Color" "white")))  
  (mol-to-svg *ticagrelor* "ticagrelor-inverted.svg")) 
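The LET above works because *background-color* and *default-bond-color* are (presumably) special variables defined with DEFVAR or DEFPARAMETER in abcl-cdk. The new bindings are therefore visible inside mol-to-svg for the duration of the call and revert afterwards. A portable sketch of the same mechanism, where *sketch-background* and render-sketch are hypothetical stand-ins and not part of abcl-cdk:

```lisp
;; Hypothetical stand-ins illustrating dynamic binding.
(defvar *sketch-background* :white
  "Background color used by RENDER-SKETCH unless rebound.")

(defun render-sketch (name)
  ;; Reads the special variable at call time, so it sees whatever
  ;; binding is dynamically in effect; ~( ~) lowercases the color name.
  (format nil "~A on ~(~A~) background" name *sketch-background*))

(render-sketch "ticagrelor")      ; => "ticagrelor on white background"

(let ((*sketch-background* :black))
  (render-sketch "ticagrelor"))   ; => "ticagrelor on black background"

(render-sketch "ticagrelor")      ; => "ticagrelor on white background"
```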

ticagrelor inverted SVG

There, now we have a nice pretty picture of ticagrelor. Thanks CDK!


ABCL-CDK update part 2

posted by cyrus in Lisp

An update on using the Chemistry Development Kit (CDK) with ABCL, Part 2

Rendering Stereochemical Molecules

You may recall that in my original blog post on using CDK with ABCL I had an example for reading a description of a molecule (a SMILES string) and rendering a picture of the 2-d structure of the molecule. Let's take another look at this process and see where things went awry and how they have gotten better.

The following line reads in a description of the amino acid valine and returns a new AtomContainer object:

(defparameter *valine* (abcl-cdk:parse-smiles-string "CC(C)C(C(=O)O)N")) 

Evaluating this gives:

CL-USER> (defparameter *valine* (abcl-cdk:parse-smiles-string "CC(C)C(C(=O)O)N"))  
*VALINE*  
CL-USER> *valine*  
#<org.openscience.cdk.AtomContainer AtomContainer(1954296239, #A:8, .... {50F523C0}> 

We can write this molecule to an SVG file with the following:

(abcl-cdk:mol-to-svg *valine* "valine.svg") 

valine SVG

So far so good. But the problem is that valine actually comes in two forms that are mirror images of each other. Think of a left-handed version, l-valine, and a right-handed version, d-valine. The central carbon atom in valine has four neighbors, two carbons (which are functionally distinct as they themselves have distinct neighbors), a nitrogen, and a hydrogen. These four neighbors are arranged in a tetrahedral configuration and can be arranged in two distinct non-superimposable configurations, giving rise to a tetrahedral chiral center. A given chiral molecule and its mirror image are known as enantiomers.
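A quick count makes the "two configurations" claim concrete. Placing four distinct neighbors on the four vertices of a tetrahedron can be done in 4! ways, and two placements describe the same molecule exactly when a rotation maps one onto the other. The rotation group of the tetrahedron, call it T, has 12 elements, and no rotation other than the identity fixes a placement of four distinct labels, so the placements fall into

```latex
\frac{4!}{|T|} = \frac{24}{12} = 2
```

distinct arrangements. The two are mirror images of each other: getting from one to the other takes a reflection, which is not a rotation. These are the two enantiomers.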

Let's assume that we're really interested in the biologically important enantiomer, l-valine. Fortunately the SMILES spec has support for representing this information and we can write (and read) l-valine as:

(defparameter *l-valine* (abcl-cdk:parse-smiles-string "CC(C)[C@@H](C(=O)O)N")) 

The problem with the 2012-era CDK was that it just ignored this information and, until recently, didn't draw the 2-d structure in such a way as to show the stereochemistry.

Luckily, recent changes to the CDK add support for precisely this.

Reading and writing a chiral SMILES string

CL-USER> (defparameter *l-valine* (abcl-cdk:read-smiles-string "CC(C)[C@@H](C(=O)O)N"))  
*L-VALINE*  
CL-USER> *l-valine*  
#<org.openscience.cdk.AtomContainer AtomContainer(31488044, #A:8, At.... {2B0F0F71}>  
CL-USER> (abcl-cdk:write-chiral-smiles-string *l-valine*)  
"CC(C)[C@@H](C(=O)O)N" 

So we can now do a round-trip to and from a chiral smiles string with CDK without losing the stereochemistry information. Hooray for CDK 1.5.4!

Render a 2-d depiction of a chiral molecule to an SVG file:

(abcl-cdk:mol-to-svg *l-valine* "l-valine.svg") 

l-valine SVG

Double hooray for CDK 1.5.4!

Just for good measure, let's render the other enantiomer of valine, d-valine:

CL-USER> (defparameter *d-valine* (abcl-cdk:read-smiles-string "CC(C)[C@H](C(=O)O)N"))  
*D-VALINE*  
CL-USER> (abcl-cdk:mol-to-svg *d-valine* "d-valine.svg")  
"d-valine.svg" 

d-valine SVG

Notice that the bond connecting the carbon in the middle of the molecule and the nitrogen is now a solid wedged bond (indicating that the bond is going up and that the nitrogen should be considered as being above the plane created by the carbon-carbon bonds).
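The l- and d-valine strings differ only in @@ versus @ at the stereocenter. As a hedged sketch (mirror-smiles is a hypothetical helper, not part of abcl-cdk), swapping every @ and @@ in a SMILES string whose only stereochemistry is plain tetrahedral @/@@ gives the SMILES for the mirror-image molecule. It does not handle extended marks such as @TH1 or @SP1. It leaves double-bond / and \ marks alone, which is fine because reflection doesn't change a cis/trans configuration.

```lisp
(defun mirror-smiles (smiles)
  "Return SMILES with every @@ replaced by @ and every lone @ by @@.
Assumes the only stereo marks are plain @ and @@."
  (with-output-to-string (out)
    (let ((i 0)
          (n (length smiles)))
      (loop while (< i n)
            do (let ((c (char smiles i)))
                 (cond ((and (char= c #\@)
                             (< (1+ i) n)
                             (char= (char smiles (1+ i)) #\@))
                        ;; @@ (clockwise) becomes @ (anticlockwise)
                        (write-char #\@ out)
                        (incf i 2))
                       ((char= c #\@)
                        ;; lone @ becomes @@
                        (write-string "@@" out)
                        (incf i))
                       (t
                        (write-char c out)
                        (incf i))))))))
```

Applied to the l-valine string "CC(C)[C@@H](C(=O)O)N", this returns "CC(C)[C@H](C(=O)O)N", the d-valine string used above.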

Explicit configurations around double bonds

In addition to the tetrahedral chiral centers mentioned, another important class of stereochemistry is the configurations around double bonds. For a simple example, let's consider the molecule 2-butene, or as it is known by its IUPAC name, but-2-ene.

CL-USER> (defparameter *but-2-ene* (abcl-cdk:read-smiles-string "CC=CC"))  
*BUT-2-ENE*  
CL-USER> (abcl-cdk:mol-to-svg *but-2-ene* "but-2-ene.svg" :height 128 :width 128)  
"but-2-ene.svg" 

but-2-ene SVG

Notice that the two single bonds are shown as going in opposite directions from the atoms involved in the double bond in the middle. But this is really just an accident. We didn't explicitly specify the stereochemical configuration. The convention for describing configurations around double bonds is known as the E/Z notation. If we want to ensure that the two terminal carbons are on the same side of the double bond (represented by Z (short for zusammen, which supposedly means together in German)), we can read an appropriate so-called chiral SMILES string (I say so-called because we're actually describing the stereochemistry of explicit configuration around a double bond, not a chiral center, but the SMILES folks play fast and loose with the nomenclature):

CL-USER> (defparameter *z-but-2-ene* (abcl-cdk:read-smiles-string "[H]/C(C)=C(\\[H])C"))  
*Z-BUT-2-ENE*  
CL-USER> (abcl-cdk:mol-to-svg *z-but-2-ene* "z-but-2-ene.svg" :width 128 :height 128)  
"z-but-2-ene.svg" 

z-but-2-ene SVG

Now we see that the two terminal carbons are indeed on the same side of the double bond between the two internal carbons, and that when we draw an explicit configuration around a double bond the otherwise implicit hydrogens are shown in their proper position. Another hooray for CDK 1.5.4!
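The / and \ marks encode relative direction rather than an absolute E or Z label. Take the common layout where one substituent's bond symbol is written just before the first double-bond atom and the other's just after the second, as in F/C=C/F or F/C=C\F. There, identical symbols mean the two substituents are trans and different symbols mean they are cis. A minimal sketch of that rule follows; substituent-relation is a hypothetical helper, and other layouts (for example both marks written inside branches) need more bookkeeping.

```lisp
(defun substituent-relation (first-bond second-bond)
  "FIRST-BOND is the bond-direction character (slash or backslash)
written just before the first double-bond atom, SECOND-BOND the one
written just after the second. Returns :TRANS or :CIS for the two
substituents those bonds attach."
  (check-type first-bond (member #\/ #\\))
  (check-type second-bond (member #\/ #\\))
  (if (char= first-bond second-bond) :trans :cis))
```

In [H]/C(C)=C(\[H])C the hydrogens' bonds are / (before the first carbon) and \ (after the second), so (substituent-relation #\/ #\\) returns :CIS. With both hydrogens cis, the two methyl groups are cis as well, which is the Z isomer.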

While we're at it, notice that we have explicitly provided width and height arguments to abcl-cdk:mol-to-svg in the previous two examples. The CDK 2-d rendering code requires some dimension arguments that seem to affect the size of things like bonds and atom symbols. It's not entirely clear how best to choose these parameters to display a given molecule at a given size, so we'll use some combination of (hopefully) lucky guesses and trial and error. 128x128 seems to look good for small molecules like the various flavors of butene.

Support for tetrahedral chiral centers and explicit stereochemical configuration around double bonds is a big win for CDK. Many thanks to John May and the rest of the CDK team for including this in the latest release. We'll look at some more complicated examples and additional features of abcl-cdk in the next installment.


CDK/ABCL Update

posted by cyrus in Lisp

An update on using the Chemistry Development Kit (CDK) with ABCL

Last year I explored using the CDK with ABCL. It was nice to see that ABCL could call out to the CDK and that I could use a Common Lisp environment for dealing with various kinds of chemistry data, molecules, atoms, bonds, etc...

The seemingly straightforward use-case I had in mind was to be able to read and write descriptions of molecules and to render these as 2-d drawings in various ways. This sort of worked, but when I tried to work with more complex molecules, particularly molecules with explicit stereochemistry such as tetrahedral chiral centers or explicit configurations around double bonds, things broke down. I'm pleased to report that things have gotten much better in the past year or so!

First, the preliminaries. The canonical home for the cdk source code has for some time been somewhat difficult to track down, or, rather, I should say it's hard to know which particular version of the source code is the canonical version at any given time. But it does seem like https://github.com/cdk/cdk is the current canonical location. Unfortunately, the good folks at Cloudera seem to have grabbed the top-ranking google spot for CDK with the Cloudera Development Kit. As awesome as the Cloudera folks are, that's not what we're after. And the second hit on google is for Egon Willighagen's personal CDK repository, which is pretty damn close to the canonical repository these days, but I think https://github.com/cdk/cdk is actually the preferred place to grab the source at any given point in time.

Until quite recently, I needed a branch of CDK from John May that can be found at https://github.com/johnmay/cdk/tree/master+. But fortunately these changes were rolled into the recent CDK 1.5.4 and John May's blog post describes many of the changes that went into 1.5.4.

So, now we're good to go with either the 1.5.4 release or, at least for the moment, the current HEAD of the master branch which will presumably one day become CDK 1.5.5.

Getting started with CDK

git clone http://github.com/cdk/cdk.git  
 
cd cdk 

If we want to use version 1.5.4 we can either hunt it down from some maven repository, which I generally hate doing, or build our own:

git checkout cdk-1.5.4  
 
ant dist-large 

Note that we need to make sure that ant builds the dist-large target as we want all of the CDK files to be rolled into one jar. We could use the individual jars but that would be a lot more work.

Now that we have the jar, I'm going to hold my nose and suggest that we use maven for the installation of the jar and then rely on ABCL's ASDF extensions that interact with maven to access the required jar files. Certainly other approaches could work too, but this one seems simple enough. In order to install the CDK jar using maven we can do the following:

CDK_VERSION=1.5.4  
CDK_BUILD_VERSION=1.5.4  
mvn install:install-file -DgroupId=org.openscience.cdk -DartifactId=cdk \  
    -Dversion=${CDK_VERSION} -Dpackaging=jar \  
    -Dfile=dist/jar/cdk-${CDK_BUILD_VERSION}.jar 
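If the install succeeds, the jar lands in Maven's local repository (by default ~/.m2/repository) following Maven's standard layout. The groupId's dots become directories, followed by the artifactId and the version, and the file is named artifactId-version.jar. A small sketch that computes that relative path, handy for checking the install with PROBE-FILE; maven-local-jar-path is a hypothetical helper, not part of abcl-asdf:

```lisp
(defun maven-local-jar-path (group-id artifact-id version)
  "Path of the jar for GROUP-ID:ARTIFACT-ID:VERSION relative to the root
of a Maven local repository (by default ~/.m2/repository)."
  (format nil "~A/~A/~A/~A-~A.jar"
          (substitute #\/ #\. group-id) ; org.openscience.cdk -> org/openscience/cdk
          artifact-id version artifact-id version))

;; Example check against the default local repository location
;; (returns NIL if the jar isn't there):
;; (probe-file (merge-pathnames
;;              (maven-local-jar-path "org.openscience.cdk" "cdk" "1.5.4")
;;              (merge-pathnames ".m2/repository/" (user-homedir-pathname))))
```

For the release install above this gives "org/openscience/cdk/cdk/1.5.4/cdk-1.5.4.jar".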

If we look at the abcl-cdk ASDF system definition we see:

(asdf:defsystem :abcl-cdk  
  :name "abcl-cdk"  
  :author "Cyrus Harmon"  
  :serial t  
  :default-component-class asdf:cl-source-file  
  :components  
  ((:mvn "org.freehep/freehep-graphics2d" :version "2.2.1")  
   (:mvn "org.freehep/freehep-graphicsio-pdf" :version "2.2.1")  
   (:mvn "org.freehep/freehep-graphicsio-svg" :version "2.2.1")  
   (:mvn "org.openscience.cdk/cdk" :version "1.5.4")  
   (:file "package")  
   (:file "utilities")  
   (:file "smiles")  
   (:file "geometry")  
   (:file "render")  
   (:file "inchi"))) 

If we want to build from the current HEAD of the master branch and install this into maven we would do:

CDK_VERSION=1.5.5  
CDK_BUILD_VERSION=1.5.5.git  
mvn install:install-file -DgroupId=org.openscience.cdk -DartifactId=cdk-git \  
    -Dversion=${CDK_VERSION} -Dpackaging=jar \  
    -Dfile=dist/jar/cdk-${CDK_BUILD_VERSION}.jar 

Note that we change the name of the artifact to cdk-git here. We do this because (recent versions of) ASDF only accept dotted integers for versions, so we can't request :version "1.5.5-git". Therefore we change the name of the artifact and use cdk-git for development versions and cdk for release versions.
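The rule described here, one or more non-negative integers separated by single dots, is easy to state as a predicate. This is a hedged sketch of the rule as described, not ASDF's own version parser, and dotted-integer-version-p is a hypothetical name:

```lisp
(defun dotted-integer-version-p (string)
  "True if STRING is one or more runs of decimal digits separated by
single dots, e.g. \"1.5.5\"; false for \"1.5.5-git\", \"1..5\" or \"\"."
  (and (plusp (length string))
       (loop with start = 0
             for dot = (position #\. string :start start)
             for part = (subseq string start (or dot (length string)))
             ;; every dot-separated part must be a non-empty run of digits
             always (and (plusp (length part))
                         (every #'digit-char-p part))
             while dot
             do (setf start (1+ dot)))))
```

Under this rule "1.5.5" passes and "1.5.5-git" does not, which is why the development build gets its own artifact name rather than a suffixed version.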

So now if we want to use the work-in-progress 1.5.5 git HEAD version we have to change the line in the ASDF system definition to:

   (:mvn "org.openscience.cdk/cdk-git" :version "1.5.5") 

Both versions should suffice for the following examples. I'm going to assume we're using the 1.5.5-git version from here on out.

ABCL and abcl-cdk

So, of course we need ABCL, and we'll need abcl-cdk:

git clone https://github.com/slyrus/abcl-cdk 

To load abcl-cdk, do:

;; assumes *default-pathname-defaults* is the abcl-cdk checkout directory
(pushnew *default-pathname-defaults* asdf:*central-registry* :test #'equal)
(asdf:load-system 'abcl-cdk) 

To load the examples do:

(asdf:load-system 'abcl-cdk-examples) 

We'll walk through some examples in the next installment.


Olivia and Tucker

posted by cyrus in General

He's gotten much larger since then, but here's a picture of Tucker and Olivia.
