POI-HMEF - Java API To Access Microsoft Transport Neutral Encoding Files (TNEF)
Overview
HMEF is the POI Project's pure Java implementation of Microsoft's TNEF (Transport Neutral Encoding Format), aka winmail.dat, which is used by Outlook and Exchange in some situations.
Currently, HMEF provides a read-only API for accessing common message and attachment attributes, including the message body and attachment files. In addition, it provides read-only access to all of the underlying TNEF and MAPI attributes of the message and attachments.
HMEF also provides a command line tool for extracting the message body and attachment files from a TNEF (winmail.dat) file.
Write support, both for saving changes and for creating new files, is currently unavailable. Anyone interested in working on these areas is advised to read the Contribution Guidelines and then join the dev list!
Using HMEF to access TNEF (winmail.dat) files
Easy extraction of message body and attachment files
The class org.apache.poi.hmef.extractor.HMEFContentsExtractor provides both command line and Java extraction. It saves the message body (as an RTF file) and all of the attachment files into a single directory of your choosing.
From the command line, simply call the class, specifying the TNEF file to extract and the directory to place the extracted files into (with the POI and POI scratchpad jars on the classpath), eg:
java org.apache.poi.hmef.extractor.HMEFContentsExtractor winmail.dat /tmp/extracted/
From Java, the class offers two method calls: extractMessageBody(), which saves the message body RTF to a file, and extractAttachments(), which saves all the attachments to a directory.
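A typical use from Java might look like the sketch below. It assumes the HMEFContentsExtractor constructor taking a File, plus extractMessageBody(File) and extractAttachments(File), as found in recent POI releases; the class name ExtractWinmail and the output name message.rtf are our own choices, not part of HMEF.

```java
import java.io.File;
import java.io.FileNotFoundException;
import java.io.IOException;

import org.apache.poi.hmef.extractor.HMEFContentsExtractor;

// Hypothetical helper class, not part of POI
public class ExtractWinmail {
    /**
     * Saves the RTF message body as "message.rtf", plus every attachment
     * file, into outDir. The output directory must already exist.
     */
    public static void extractAll(File tnefFile, File outDir) throws IOException {
        // Check the destination first, so we fail before parsing anything
        if (!outDir.isDirectory()) {
            throw new FileNotFoundException("Output directory " + outDir + " not found");
        }
        HMEFContentsExtractor ext = new HMEFContentsExtractor(tnefFile);
        ext.extractMessageBody(new File(outDir, "message.rtf"));
        ext.extractAttachments(outDir);
    }

    public static void main(String[] args) throws IOException {
        // Usage: ExtractWinmail <winmail.dat> <existing output dir>
        extractAll(new File(args[0]), new File(args[1]));
    }
}
```

Checking the directory up front mirrors what the command line tool expects: the output directory is not created for you.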
Attachment attributes and contents
To get at your attachments, simply call the getAttachments() method on an HMEFMessage instance, and you'll receive a list of all the attachments.
Once you have an org.apache.poi.hmef.Attachment object, several helper methods are available. Each returns the value of the corresponding underlying attachment attribute, or null if that attribute isn't present in your file.
- getFilename() - returns the name of the attachment file, possibly in 8.3 format
- getLongFilename() - returns the full name of the attachment file
- getExtension() - returns the extension of the attachment file, including the "."
- getModifiedDate() - returns the date the attachment file was last modified
- getContents() - returns a byte array of the contents of the attached file
- getRenderedMetaFile() - returns a byte array of a Windows metafile representation of the attached file
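As a sketch of the helpers above in use, the following lists each attachment and saves its contents. It assumes the Attachment getters named in the list and the HMEFMessage(InputStream) constructor; the class ListAttachments and the fallback name "attachment-N.bin" are hypothetical. Because every helper may return null, the name falls back from the long filename to the possibly 8.3 filename, then to a numbered name.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.List;

import org.apache.poi.hmef.Attachment;
import org.apache.poi.hmef.HMEFMessage;

// Hypothetical example class, not part of POI
public class ListAttachments {
    /** Prefer the long filename, then the (possibly 8.3) short one, then a numbered name. */
    static String chooseName(String longName, String shortName, int index) {
        if (longName != null && !longName.isEmpty()) return longName;
        if (shortName != null && !shortName.isEmpty()) return shortName;
        return "attachment-" + index + ".bin";
    }

    public static void main(String[] args) throws IOException {
        HMEFMessage msg;
        try (InputStream in = new FileInputStream(args[0])) {
            msg = new HMEFMessage(in);
        }
        File outDir = new File(args[1]);

        List<Attachment> attachments = msg.getAttachments();
        for (int i = 0; i < attachments.size(); i++) {
            Attachment att = attachments.get(i);
            String name = chooseName(att.getLongFilename(), att.getFilename(), i);
            // Any of these getters may return null if the attribute is missing
            System.out.println(name + " (ext=" + att.getExtension()
                    + ", modified=" + att.getModifiedDate() + ")");

            byte[] contents = att.getContents();
            if (contents == null) continue;
            // The name comes from the TNEF file; keep only its last path component
            File target = new File(outDir, new File(name).getName());
            try (OutputStream out = new FileOutputStream(target)) {
                out.write(contents);
            }
        }
    }
}
```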
Message attributes and message body
An org.apache.poi.hmef.HMEFMessage instance is created from an InputStream of the underlying TNEF (winmail.dat) file.
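Since HMEFMessage expects a TNEF stream, it can be handy to check a file before parsing it. TNEF files begin with the 32-bit signature 0x223E9F78, stored little-endian. The sketch below is a plain Java helper of our own (TnefSniffer is hypothetical, not part of HMEF) that tests those first four bytes.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper, independent of POI
public class TnefSniffer {
    // TNEF streams start with this 32-bit signature, stored little-endian
    static final int TNEF_SIGNATURE = 0x223E9F78;

    /** True if the first four bytes of header hold the TNEF signature. */
    static boolean looksLikeTnef(byte[] header) {
        if (header == null || header.length < 4) return false;
        int sig = (header[0] & 0xFF)
                | (header[1] & 0xFF) << 8
                | (header[2] & 0xFF) << 16
                | (header[3] & 0xFF) << 24;
        return sig == TNEF_SIGNATURE;
    }

    public static void main(String[] args) throws IOException {
        try (InputStream in = new FileInputStream(args[0])) {
            byte[] header = in.readNBytes(4); // Java 11+
            System.out.println(looksLikeTnef(header) ? "TNEF" : "not TNEF");
        }
    }
}
```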
From an HMEFMessage, there are three main methods of interest to call:
- getBody() - returns a String containing the RTF contents of the message body
- getSubject() - returns the message subject
- getAttachments() - returns the list of Attachment objects for the message
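A minimal sketch of these three methods, assuming the HMEFMessage(InputStream) constructor and the getters listed above; ShowMessage and its looksLikeRtf() check are hypothetical. As RTF documents begin with "{\rtf", that prefix gives a cheap sanity check on what getBody() returned.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

import org.apache.poi.hmef.HMEFMessage;

// Hypothetical example class, not part of POI
public class ShowMessage {
    /** RTF documents begin with "{\rtf"; a cheap check on getBody(). */
    static boolean looksLikeRtf(String body) {
        return body != null && body.startsWith("{\\rtf");
    }

    public static void main(String[] args) throws IOException {
        HMEFMessage msg;
        try (InputStream in = new FileInputStream(args[0])) {
            msg = new HMEFMessage(in);
        }

        // Subject and body may be null if the underlying attribute isn't present
        System.out.println("Subject: " + msg.getSubject());
        String body = msg.getBody();
        System.out.println("Body is RTF: " + looksLikeRtf(body)
                + " (" + (body == null ? 0 : body.length()) + " chars)");
        System.out.println("Attachments: " + msg.getAttachments().size());
    }
}
```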
Low level attribute access
Both messages and attachments contain two kinds of attributes: TNEFAttribute and MAPIAttribute.
TNEFAttribute is specific to TNEF files in terms of the available types and properties. In general, attachments have a few more useful ones of these than messages.
MAPIAttributes hold standard MAPI properties and values, and work in a similar way to those in HSMF (Outlook). There are typically many of these on both messages and attachments. Note: see the Limitations section.
Both HMEFMessage and Attachment support two different ways of getting to attributes of interest. Firstly, they support list getters, which return all attributes (either TNEF or MAPI). Secondly, they support specific getters by TNEF or MAPI property.
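The two access styles can be sketched as follows. This assumes the getter names getMessageAttributes(), getMessageMAPIAttributes(), getMessageMAPIAttribute(MAPIProperty) on HMEFMessage and getAttribute(TNEFProperty) on Attachment, plus the constants MAPIProperty.CONVERSATION_TOPIC and TNEFProperty.ID_ATTACHTITLE, as we understand them from recent POI releases; please check the javadocs for your version. AttributeLister and preview() are hypothetical.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

import org.apache.poi.hmef.Attachment;
import org.apache.poi.hmef.HMEFMessage;
import org.apache.poi.hmef.attribute.MAPIAttribute;
import org.apache.poi.hmef.attribute.TNEFAttribute;
import org.apache.poi.hmef.attribute.TNEFProperty;
import org.apache.poi.hsmf.datatypes.MAPIProperty;

// Hypothetical example class, not part of POI
public class AttributeLister {
    /** Hex preview of at most max bytes, e.g. "01 ab". */
    static String preview(byte[] data, int max) {
        if (data == null) return "(null)";
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < Math.min(data.length, max); i++) {
            if (i > 0) sb.append(' ');
            sb.append(String.format("%02x", data[i] & 0xFF));
        }
        if (data.length > max) sb.append(" ...");
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        HMEFMessage msg;
        try (InputStream in = new FileInputStream(args[0])) {
            msg = new HMEFMessage(in);
        }

        // 1. List getters: every TNEF and MAPI attribute on the message
        for (TNEFAttribute attr : msg.getMessageAttributes()) {
            System.out.println("TNEF " + attr.getProperty() + ": " + preview(attr.getData(), 16));
        }
        for (MAPIAttribute attr : msg.getMessageMAPIAttributes()) {
            System.out.println("MAPI " + attr.getProperty() + ": " + preview(attr.getData(), 16));
        }

        // 2. Specific getters: look up one property, null if it is absent
        MAPIAttribute topic = msg.getMessageMAPIAttribute(MAPIProperty.CONVERSATION_TOPIC);
        System.out.println("Has conversation topic: " + (topic != null));
        for (Attachment att : msg.getAttachments()) {
            TNEFAttribute title = att.getAttribute(TNEFProperty.ID_ATTACHTITLE);
            System.out.println("Attachment title: "
                    + (title == null ? "none" : preview(title.getData(), 16)));
        }
    }
}
```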
Investigating a TNEF file
To get a feel for the contents of a file, and to track down where data of interest is stored, HMEF comes with HMEFDumper to print out the contents of the file.
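HMEFDumper is normally run from the command line. The sketch below calls it from Java instead, assuming it lives in org.apache.poi.hmef.dev and that its main() accepts one or more filenames plus an optional "--full" flag to stop property data being truncated, as in the POI sources we have seen; verify both against your release. DumpTnef and dumperArgs() are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

import org.apache.poi.hmef.dev.HMEFDumper;

// Hypothetical wrapper, not part of POI
public class DumpTnef {
    /** Builds HMEFDumper's arguments; "--full" (assumed flag) disables truncation. */
    static String[] dumperArgs(boolean full, String... files) {
        List<String> args = new ArrayList<>();
        if (full) args.add("--full");
        for (String f : files) args.add(f);
        return args.toArray(new String[0]);
    }

    public static void main(String[] args) throws Exception {
        // Roughly: java org.apache.poi.hmef.dev.HMEFDumper --full <files...>
        HMEFDumper.main(dumperArgs(true, args));
    }
}
```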
Limitations
HMEF is currently a work-in-progress, and not everything works yet. The current limitations are:
- Non-standard MAPI properties from the range 0x8000 to 0x8fff may not always be correctly turned into attributes. The values show up, but the name and type may not always be correct.
- All testing so far has been performed on a small number of English documents. We think we're correctly turning bytes into Java Unicode strings, but we need a few non-English sample files in the test suite to verify this!
- There is no support for saving changes, nor for creating new files.
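Given the first limitation, it may help to flag MAPI attributes whose property id falls in the 0x8000 to 0x8fff range, so their reported name and type can be double-checked by hand. This sketch assumes MAPIAttribute.getProperty() and getType(), and the public id and name fields on MAPIProperty, as in recent POI releases; FlagCustomProperties is hypothetical.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

import org.apache.poi.hmef.HMEFMessage;
import org.apache.poi.hmef.attribute.MAPIAttribute;

// Hypothetical example class, not part of POI
public class FlagCustomProperties {
    /** True for ids in the range where HMEF's name/type decoding may be off. */
    static boolean inUncertainRange(int propertyId) {
        return propertyId >= 0x8000 && propertyId <= 0x8fff;
    }

    public static void main(String[] args) throws IOException {
        HMEFMessage msg;
        try (InputStream in = new FileInputStream(args[0])) {
            msg = new HMEFMessage(in);
        }
        for (MAPIAttribute attr : msg.getMessageMAPIAttributes()) {
            int id = attr.getProperty().id;
            if (inUncertainRange(id)) {
                // The value is present, but treat the reported name and type with caution
                System.out.printf("0x%04x %s (type %s) - verify by hand%n",
                        id, attr.getProperty().name, attr.getType());
            }
        }
    }
}
```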
by Nick Burch