February 25, 2008

Description of MS Excel binary file format

I compared the MS Excel file format description from Microsoft with the format description published on OpenOffice.org.
After analyzing an existing corpus of files in MS Excel format, I found a number of records that are not described in the documentation provided by Microsoft. Some records are not mentioned in any existing specification either (they are marked with a ? sign). For known records, the names are taken from the description on the OpenOffice.org site.
The description from Microsoft covers BIFF (Binary Interchange File Format) versions 5, 7 and 8 (MS Excel 5.0, Excel 95, Excel 97 and later), so in this review I don't mention BIFF records from version 2, although files in this format are still used by customers.
  • 0x0006 - FORMULA, exists in BIFF v.2, 5 & 8. The description from Microsoft gives a different number: 0x406.
  • 0x0018 - NAME, exists in BIFF v.2, 5 & 8. The description from Microsoft gives a different number: 0x218.
  • 0x0023 - EXTERNNAME, exists in BIFF v.2, 5 & 8. The description from Microsoft gives a different number: 0x223.
  • 0x0031 - FONT, exists in BIFF v.2, 5 & 8. The description from Microsoft gives a different number: 0x231.
  • 0x0033 - ?
  • 0x00a4 - ?
  • 0x00bf - ?
  • 0x00c0 - ?
  • 0x00ef - PHONETIC, exists in BIFF v.8. Missing from the description from Microsoft.
  • 0x015f - LABELRANGES, exists in BIFF v.8. Missing from the description from Microsoft.
  • 0x01ba - ?
  • 0x01bd - ?
  • 0x01c2 - ?
  • 0x027e - RK, exists in BIFF v.3, 4, 5 & 8. The description from Microsoft gives a different number: 0x7e (there is no record with this number in the description from OpenOffice.org).
  • 0x0400 - ?
  • 0x04bc - SHRFMLA, exists in BIFF v.5 & 8. The description from Microsoft gives a different number: 0xbc.
  • 0x0850 - ?
  • 0x0851 - ?
  • 0x0852 - ?
  • 0x0853 - ?
  • 0x0854 - ?
  • 0x0855 - ?
  • 0x085a - ?
Besides these records, I found a number of records with numbers greater than 0x1000, but I can't find any information about records in this range.
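An inventory like the one above can be reproduced with a short script. The sketch below is only an illustration under a few assumptions: the workbook stream has already been extracted from the compound file, and every record starts with a 2-byte identifier followed by a 2-byte data length, both little-endian. The function names and the KNOWN table are hypothetical and cover only the named records from the list, not a complete specification.

```python
import struct
from collections import Counter

# Named records from the list above (assumption: a partial table for
# illustration only; a real check would load the full documented set).
KNOWN = {
    0x0006: "FORMULA",
    0x0018: "NAME",
    0x0023: "EXTERNNAME",
    0x0031: "FONT",
    0x00EF: "PHONETIC",
    0x015F: "LABELRANGES",
    0x027E: "RK",
    0x04BC: "SHRFMLA",
}


def iter_records(stream: bytes):
    """Yield (record_id, data) pairs from a raw BIFF record stream."""
    offset = 0
    while offset + 4 <= len(stream):
        # Record header: 16-bit identifier, then 16-bit length of the data.
        rec_id, size = struct.unpack_from("<HH", stream, offset)
        offset += 4
        data = stream[offset:offset + size]
        if len(data) < size:
            break  # truncated record at the end of the stream, stop here
        yield rec_id, data
        offset += size


def tally_records(stream: bytes) -> Counter:
    """Count how many times each record identifier occurs."""
    return Counter(rec_id for rec_id, _ in iter_records(stream))


def unknown_ids(stream: bytes, known=KNOWN):
    """Return the sorted identifiers that are absent from the known table."""
    return sorted(rec_id for rec_id in tally_records(stream) if rec_id not in known)
```

Running unknown_ids over every file in a corpus and merging the results gives a list of candidates to look up in both descriptions.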

Windows Compound file format

The format description published by Microsoft looks less detailed than the description of this format published by OpenOffice.org: only 7 pages from Microsoft compared with 22 pages from OpenOffice.org. In principle, this is enough to write code that works with this file format, but the documentation from OpenOffice.org has more diagrams and more detailed examples.
Besides this, the description from OpenOffice.org provides the format of the date/time records used in OLE2 directory entries, while Microsoft's description only mentions that date/time records are a structure of two DWORDs.
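To show why the exact format of that structure matters, here is a minimal sketch of decoding such a time stamp. It assumes the two DWORDs form a Windows FILETIME value: a 64-bit little-endian count of 100-nanosecond intervals since 1601-01-01 UTC, stored low DWORD first. Treating an all-zero value as "not set" is also an assumption, and the function name is my own.

```python
import struct
from datetime import datetime, timedelta, timezone

# Start of the FILETIME epoch (assumption: time stamps are in UTC).
EPOCH_1601 = datetime(1601, 1, 1, tzinfo=timezone.utc)


def filetime_to_datetime(raw: bytes):
    """Convert 8 bytes (two little-endian DWORDs) to a datetime, or None."""
    low, high = struct.unpack("<II", raw)
    ticks = (high << 32) | low  # number of 100 ns intervals since 1601
    if ticks == 0:
        return None  # assumption: zero means the time stamp is not set
    # datetime has microsecond resolution, so the last decimal digit is dropped.
    return EPOCH_1601 + timedelta(microseconds=ticks // 10)
```

Without the epoch and the tick size, the two DWORDs alone are not enough to recover a date, which is exactly the detail the shorter description leaves out.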

Analysis of descriptions of MS Office binary file formats

With this post I start a series of postings dedicated to the analysis of the descriptions of MS Office binary file formats that were published by Microsoft last week.
All posts will have the label msoffice.
P.S. I want to mention that these posts (and all posts in this blog) state only my own opinion and are not related to my main work.