Today I uploaded the initial version of Clojure bindings for the Apache Tika project. Tika performs MIME type detection and content extraction for many file formats, and it can also detect the language of a given text. The Tika bindings are part of a set of libraries that I plan to write. A description of the bindings and the source code are available on GitHub.
P.S. Next tasks: bindings for Apache Mahout and Lucene.