November 28, 2005

about programming knowledge

While rewriting old code written by other programmers, I can't stop wondering why programmers use low-level code for things like files, directories, strings, and so on.
There are many libraries that hide low-level details from the programmer, including Boost, ACE and others. With these libraries I don't need to know which system I'm working on or which filenames are valid. I can simply create a directory iterator and walk through a directory without thinking about the members of the DIR structure and similar details.
But many programmers simply don't bother to use these libraries; they write buggy code, use ugly printf calls instead of logging libraries, and so on. A good programmer MUST select the appropriate tools and libraries for each project!

November 24, 2005

type detection again

How can you get data past most content filtering systems? In some cases you can just put the two characters 'MZ' at the start of the data, and it will be detected as a DOS/Windows executable. There are many similar tricks that make data be detected as a different type.
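To make the trick concrete, here is a sketch of the kind of naive check it defeats, next to a slightly stricter one. The function names are hypothetical, and this is not how any particular filtering product works. The stricter check relies on the documented PE layout: a 32-bit little-endian offset at 0x3C of the DOS header (`e_lfanew`) points to the `PE\0\0` signature.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical naive detector: anything starting with "MZ" is an executable.
// Prepending two bytes to arbitrary data is enough to fool it.
bool naive_is_executable(const std::string& data) {
    return data.size() >= 2 && data[0] == 'M' && data[1] == 'Z';
}

// Read a 32-bit little-endian value at `off` (caller checks bounds).
static std::uint32_t read_le32(const std::string& d, std::size_t off) {
    std::uint32_t v = 0;
    for (int i = 3; i >= 0; --i)
        v = (v << 8) | static_cast<unsigned char>(d[off + i]);
    return v;
}

// Hypothetical stricter detector for Windows PE files: "MZ" must come with an
// e_lfanew field (offset 0x3C) that points at the "PE\0\0" signature.
// Plain DOS executables without a PE header are not covered.
bool looks_like_pe(const std::string& data) {
    if (!naive_is_executable(data) || data.size() < 0x40)
        return false;
    const std::uint32_t pe_off = read_le32(data, 0x3C);
    if (pe_off > data.size() - 4)
        return false;
    return data.compare(pe_off, 4, std::string("PE\0\0", 4)) == 0;
}
```

Even the stricter test only checks two structures; a determined sender can forge those too, which is why real detection has to look deeper into the content.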

November 22, 2005

black PR in action

The latest issue of BYTE Magazine (Russian edition) published an article comparing content-filtering systems. The article was based on materials from InfoWatch (a subsidiary of Kaspersky Lab). But the article also deals closely with our mainstream product, Dozor-Jet, aka MailBoss.
According to this comparison, InfoWatch is the best content filtering product ;-) Other products, such as ISS Proventia WebFilter and Clearswift WebSweeper, are not mentioned, only their e-mail counterparts. Our product is described with many errors: the article states that we don't support PDF conversion, don't offer various support programs for our customers, etc.
But we have supported PDF for at least the last 3 years, we do offer various support programs for customers, and so on. We have a bigger customer base than InfoWatch. We also have major technical advantages over InfoWatch's products, such as content-type detection, more supported formats and so on.
So I decided to fight such PR through my blogs rather than by writing to the editors. Nowadays blogging is a more powerful way to fight black PR than other methods.

November 19, 2005

About programming style

I always think that a good programmer must be lazy. A lazy programmer produces very short (and in most cases elegant) solutions. Bad programmers are not lazy, so they duplicate code, write lots of useless code, etc.
Almost a year ago I sent some improvements to the DansGuardian code; one of them was moving code like:

if (!is_daemon)    // only a foreground process has a terminal to print to
    std::cerr << "Bla-bla" << std::endl;
syslog(...,"Bla-bla");

into a single function that takes the message as its only argument and runs the code shown above.
But looking at the CVS today, I again see this code all over DansGuardian. With this programming style, it is easier to rewrite DansGuardian from scratch than to try to improve the code. After a first quick look at DansGuardian at the start of 2004, I decided to use it to protect our corporate users, but after studying the code and trying to improve it, I abandoned DansGuardian and switched to another solution.

November 15, 2005

about file type detection

As I mentioned in earlier posts, file type detection is a hard task, especially for complex data formats. In the past, almost all files had signatures that helped programs distinguish one format from another.
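As a reminder of how simple signature-based detection is, here is a sketch that matches a few well-known magic numbers at the start of the data. The table and the `detect_by_signature` name are illustrative assumptions; real detectors use far larger tables and more than one offset.

```cpp
#include <string>
#include <vector>

struct Signature {
    std::string magic;  // bytes expected at offset 0
    std::string type;   // human-readable format name
};

// A few classic signatures (illustrative, far from complete).
const std::vector<Signature>& signatures() {
    static const std::vector<Signature> table = {
        {"%PDF-", "PDF document"},
        {"PK\x03\x04", "ZIP archive"},
        {"\x89PNG\r\n\x1a\n", "PNG image"},
        {"GIF87a", "GIF image"},
        {"GIF89a", "GIF image"},
        {"\xD0\xCF\x11\xE0\xA1\xB1\x1A\xE1", "OLE2 compound file"},
    };
    return table;
}

// Hypothetical helper: return the type of the first matching signature.
std::string detect_by_signature(const std::string& data) {
    for (const Signature& s : signatures())
        if (data.compare(0, s.magic.size(), s.magic) == 0)
            return s.type;
    return "unknown";
}
```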
Today we have many complex formats, so we need complex code to handle them. One example is the Microsoft OLE-based formats: they store data for MS Office files and also for Microsoft Installer. For MS Office files we can detect the right type by reading the file's main directory and analyzing its root records. But .msi files don't have fixed root records that we could use to detect the file type. Another annoying thing about detecting OLE-based file types is that the root directory is usually located at the end of the file, so we need to read the whole file to analyze it. This is especially annoying when we try to detect file types in a web filtering application.
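A sketch of the first steps for OLE files: parse the 512-byte compound file header to find where the directory starts, decode directory entry names, and guess the Office application from the stream names. It follows the published compound file layout (sector shift at offset 0x1E, first directory sector at 0x30, 128-byte directory entries with a UTF-16LE name and its byte length at 0x40). The stream-name mapping covers only a few common cases, and all function names are hypothetical, not our library's API. Note that the directory offset comes from the header, which is why it can point anywhere, including near the end of the file.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

// Little-endian readers; callers guarantee the bytes are in range.
static std::uint32_t le16(const std::string& d, std::size_t off) {
    return static_cast<unsigned char>(d[off]) |
           (static_cast<unsigned char>(d[off + 1]) << 8);
}
static std::uint32_t le32(const std::string& d, std::size_t off) {
    return le16(d, off) | (le16(d, off + 2) << 16);
}

// Byte offset of the first directory sector, or nullopt if `header`
// is not an OLE2 compound file header.
std::optional<std::uint64_t> ole_directory_offset(const std::string& header) {
    static const std::string magic("\xD0\xCF\x11\xE0\xA1\xB1\x1A\xE1", 8);
    if (header.size() < 512 || header.compare(0, 8, magic) != 0)
        return std::nullopt;
    const std::uint32_t sector_shift = le16(header, 0x1E);  // 9 => 512-byte sectors
    const std::uint32_t first_dir = le32(header, 0x30);     // a sector number
    if (sector_shift != 9 && sector_shift != 12)
        return std::nullopt;
    // Sector N starts after the header, which occupies one sector.
    return (static_cast<std::uint64_t>(first_dir) + 1) << sector_shift;
}

// Decode the name of a 128-byte directory entry.
std::string entry_name(const std::string& entry) {
    if (entry.size() < 128)
        return {};
    const std::uint32_t len = le16(entry, 0x40);  // bytes, including the UTF-16 NUL
    std::string name;
    for (std::uint32_t i = 0; i + 2 < len && i + 1 < 64; i += 2)
        name += entry[i];  // keep the low byte: correct for ASCII names only
    return name;
}

// Guess the application from stream names found in the directory.
std::string classify_ole(const std::vector<std::string>& stream_names) {
    for (const std::string& n : stream_names) {
        if (n == "WordDocument") return "MS Word document";
        if (n == "Workbook" || n == "Book") return "MS Excel workbook";
        if (n == "PowerPoint Document") return "MS PowerPoint presentation";
    }
    return "other OLE-based file";  // e.g. .msi needs different tests
}
```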
But the hardest tasks in content type detection are detecting text files and detecting old .com files. In our content type detection library we use a statistical approach to solve these complex tasks.
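The statistical models of our library are not described here, but the general idea can be shown with a much cruder heuristic: count how many bytes look like text. This sketch assumes that NUL bytes never occur in text and treats bytes >= 0x80 as possible text (they could be UTF-8 or an 8-bit encoding such as CP1251); the 0.95 threshold is an arbitrary assumption, and the name `looks_like_text` is hypothetical.

```cpp
#include <cstddef>
#include <string>

// Crude statistical test: the share of bytes that are printable ASCII,
// common whitespace, or high (>= 0x80) bytes must reach `threshold`.
bool looks_like_text(const std::string& data, double threshold = 0.95) {
    if (data.empty())
        return false;
    std::size_t texty = 0;
    for (unsigned char c : data) {
        if (c == 0)
            return false;  // assumed: NUL never appears in text
        const bool plausible = (c >= 0x20 && c < 0x7F) || c == '\t' ||
                               c == '\n' || c == '\r' || c >= 0x80;
        if (plausible)
            ++texty;
    }
    return static_cast<double>(texty) / data.size() >= threshold;
}
```

A test like this misfires on binary formats that happen to be mostly printable, which is one reason a real detector needs richer statistics than a single ratio.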
The widely used libmagic library (from the file package) is very unreliable and gives false results in many cases. The main problem with libmagic is the very simple language used to describe tests against content.
To avoid this, I (idea and architecture) and Alexey Voinov (implementation) developed a new content type detection library that resolves the problems that arise when using libmagic. The library implements a complex Lisp-like DSL (domain-specific language) that we use to describe tests (simple and complex) against given content. This library is now integrated into our mail and web filtering products (see http://mailboss.com), and we have also developed an add-on for the widely used Cerberus software for Lotus Notes (running under Windows).
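The post does not show the library's Lisp-like syntax, so the following is only an illustration of the idea, written in C++: tests are small predicates that combine into larger ones, instead of flat one-rule-per-line tests. All names (`bytes_at`, `and_`, `or_`, `not_`, `gif_rule`) are hypothetical, and the comments show how each piece might read in a Lisp-like form.

```cpp
#include <cstddef>
#include <functional>
#include <initializer_list>
#include <string>
#include <vector>

// A test takes the content and answers "does it match?".
using Test = std::function<bool(const std::string&)>;

// (bytes-at OFFSET "LITERAL")
Test bytes_at(std::size_t offset, std::string literal) {
    return [offset, literal](const std::string& data) {
        return data.size() >= offset + literal.size() &&
               data.compare(offset, literal.size(), literal) == 0;
    };
}

// (and T1 T2 ...): all sub-tests must match.
Test and_(std::initializer_list<Test> tests) {
    std::vector<Test> ts(tests);
    return [ts](const std::string& data) {
        for (const Test& t : ts)
            if (!t(data)) return false;
        return true;
    };
}

// (or T1 T2 ...): at least one sub-test must match.
Test or_(std::initializer_list<Test> tests) {
    std::vector<Test> ts(tests);
    return [ts](const std::string& data) {
        for (const Test& t : ts)
            if (t(data)) return true;
        return false;
    };
}

// (not T)
Test not_(Test t) {
    return [t](const std::string& data) { return !t(data); };
}

// (and (bytes-at 0 "GIF8") (or (bytes-at 4 "7a") (bytes-at 4 "9a")))
Test gif_rule() {
    return and_({bytes_at(0, "GIF8"),
                 or_({bytes_at(4, "7a"), bytes_at(4, "9a")})});
}
```

Because every rule is just another `Test`, complex rules are built from simpler ones without extending the language itself.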

November 14, 2005

blogs and data mining/extraction

Currently I read about 300-400 blogs. I need to do this to stay up to date with new information from different fields. But only a few posts per day contain important information for me. I read posts with my finger on the Del button, skimming across different areas. I think it would be simpler to narrow down the number of blogs, but important information turns up even in blogs that are not directly related to my interests.
I want a tool that could filter blog posts, but the main problem is that my interests fluctuate a lot, depending on various factors: current work, emotions, etc.
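One simple shape such a tool could take, sketched under obvious assumptions: each interest is a keyword with a weight that the reader adjusts as interests shift, and a post is kept when its score reaches a threshold. The names, the naive tokenizer and the scoring rule are all hypothetical.

```cpp
#include <cctype>
#include <map>
#include <sstream>
#include <string>

// Interests with weights (keys in lowercase); edit them as work or mood changes.
using Interests = std::map<std::string, double>;

// Lowercase ASCII letters and turn ASCII punctuation into spaces (in the
// default "C" locale), so "Emacs," matches "emacs". Other bytes are kept.
static std::string normalize(std::string s) {
    for (char& c : s) {
        const unsigned char uc = static_cast<unsigned char>(c);
        if (std::ispunct(uc))
            c = ' ';
        else
            c = static_cast<char>(std::tolower(uc));
    }
    return s;
}

// Sum the weights of interest keywords that appear in the post.
double score_post(const std::string& text, const Interests& interests) {
    std::istringstream words(normalize(text));
    std::string w;
    double score = 0.0;
    while (words >> w) {
        const auto it = interests.find(w);
        if (it != interests.end())
            score += it->second;
    }
    return score;
}

bool keep_post(const std::string& text, const Interests& interests,
               double threshold) {
    return score_post(text, interests) >= threshold;
}
```

Fluctuating interests then become a matter of changing a few weights rather than rewriting filter rules.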
Mozilla Thunderbird is not ideal: it eats a lot of memory, and after updating the blog list it takes about 150 MB of RAM. Maybe I'll switch back to Emacs + Gnus for mail, news and RSS, but I'd need to set up sending mail from home using digital certificates.
But without any question, blogs are one of the important things on today's Internet, and I like them.

November 12, 2005

My projects and Mind Mapping

After finishing the MBA that I've been doing for the last 2 years, I now have some time for my personal projects, especially my book about Emacs.
To speed up work on the book, I moved it into MindManager 6, which I also use heavily in my professional work. Now I can see the book's structure as one big picture, quickly access any part of it, and rearrange its parts.
MindManager is a very handy tool for modeling different things. I use it to plan and track the development of the software my group makes. I can set time limits and priorities for parts of a map, etc., and the software helps me track progress, since I can export data to MS Project, MS Word and so on. I can also publish maps for our software as HTML, so anyone in our company can view our progress as a very good-looking page. I have been working with MindManager for the last six months, but I still discover new features and extensions.

November 10, 2005

Content filtering

I have been working in content filtering since 2001. When I started, we had a very early release of an e-mail filtering and archiving product called Dozor-Jet aka MailBoss. The product had many troublesome limitations. Over 4 years we released 4 versions, and now we are close to releasing a new version of the e-mail filtering product that fixes many annoying shortcomings. We also have a new web-filtering product, also called Dozor. I also have detailed requirements for new IM and P2P filtering products (but right now we have no resources for these tasks).
Content filtering is a very interesting area: we need to extract information from different sources and analyze every piece of mail or web traffic to find information leaks. Users often try to hide information by encrypting it, setting passwords, or just changing the file extension, and we try to reconstruct the original information.
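One common heuristic for spotting encrypted (or compressed) content, offered as an illustration rather than as what our products do: compute the Shannon entropy of the byte distribution, H = -sum(p_i * log2(p_i)). Plain text usually stays well below the 8 bits/byte maximum, while encrypted data of reasonable size comes close to it. The function name is hypothetical, and any cut-off value would be an assumption to tune.

```cpp
#include <array>
#include <cmath>
#include <cstddef>
#include <string>

// Shannon entropy of the byte distribution, in bits per byte (0.0 .. 8.0).
double byte_entropy(const std::string& data) {
    if (data.empty())
        return 0.0;
    std::array<std::size_t, 256> counts{};
    for (unsigned char c : data)
        ++counts[c];
    double h = 0.0;
    for (std::size_t n : counts) {
        if (n == 0)
            continue;
        const double p = static_cast<double>(n) / data.size();
        h -= p * std::log2(p);
    }
    return h;
}
```

High entropy alone cannot tell encryption from ordinary compression (ZIP, JPEG), so it is only useful together with type detection.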
Detecting the type of data (files) and the language/encoding of texts are very difficult tasks, especially in a multi-language environment. I'll write about these topics in future posts ;-)
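Encoding detection is a large topic, but one small building block is easy to show: checking whether bytes form valid UTF-8. In a multi-language environment this only rules candidates in or out; for example, Russian text in CP1251 or KOI8-R is almost never valid UTF-8, but telling those two apart typically needs statistics over letter frequencies. The function name is hypothetical.

```cpp
#include <cstddef>
#include <string>

// Returns true if `s` is well-formed UTF-8 (RFC 3629): no stray continuation
// bytes, no overlong forms, no UTF-16 surrogates, nothing above U+10FFFF.
bool is_valid_utf8(const std::string& s) {
    std::size_t i = 0;
    const std::size_t n = s.size();
    while (i < n) {
        const unsigned char c = static_cast<unsigned char>(s[i]);
        std::size_t len;
        if (c < 0x80)                    len = 1;  // ASCII
        else if (c >= 0xC2 && c <= 0xDF) len = 2;
        else if (c >= 0xE0 && c <= 0xEF) len = 3;
        else if (c >= 0xF0 && c <= 0xF4) len = 4;
        else return false;  // 0x80-0xC1 or 0xF5-0xFF cannot start a sequence
        if (n - i < len)
            return false;   // truncated sequence
        for (std::size_t k = 1; k < len; ++k)
            if ((static_cast<unsigned char>(s[i + k]) & 0xC0) != 0x80)
                return false;  // expected a 10xxxxxx continuation byte
        if (len > 2) {
            const unsigned char c1 = static_cast<unsigned char>(s[i + 1]);
            if ((c == 0xE0 && c1 < 0xA0) ||  // overlong 3-byte form
                (c == 0xED && c1 > 0x9F) ||  // surrogates U+D800..U+DFFF
                (c == 0xF0 && c1 < 0x90) ||  // overlong 4-byte form
                (c == 0xF4 && c1 > 0x8F))    // above U+10FFFF
                return false;
        }
        i += len;
    }
    return true;
}
```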

November 9, 2005

About delegating authority

Many books talk about delegating authority to improve employee motivation. But recently I found that delegation is very problem-prone: when I try to delegate some parts of the development process to a team member, I have to check all their solutions, documents, etc., instead of just assigning tasks and checking that they were completed properly.
Right now I can delegate tasks to only one person in my group (which consists of 6 people).
My conclusion: "we can delegate tasks only to a small, trusted group of people"; all the others lack the motivation to make good task-related decisions :-(