Busy Developers' Guide to HSSF and XSSF Features
Want to use HSSF and XSSF to read and write spreadsheets in a hurry? This guide is for you. If you're after more in-depth coverage of the HSSF and XSSF user APIs, please consult the HOWTO guide, as it contains actual descriptions of how to use this stuff.
Index of Features
- How to create a new workbook
- How to create a sheet
- How to create cells
- How to create date cells
- Working with different types of cells
- Iterate over rows and cells
- Getting the cell contents
- Text Extraction
- Files vs InputStreams
- Aligning cells
- Working with borders
- Fills and color
- Merging cells
- Working with fonts
- Custom colors
- Reading and writing
- Use newlines in cells
- Create user defined data formats
- Fit Sheet to One Page
- Set print area for a sheet
- Set page numbers on the footer of a sheet
- Shift rows
- Set a sheet as selected
- Set the zoom magnification for a sheet
- Create split and freeze panes
- Repeating rows and columns
- Headers and Footers
- XSSF enhancement for Headers and Footers
- Drawing Shapes
- Styling Shapes
- Shapes and Graphics2d
- Outlining
- Images
- Named Ranges and Named Cells
- How to set cell comments
- How to adjust column width to fit the contents
- Hyperlinks
- Data Validations
- Embedded Objects
- Autofilters
- Conditional Formatting
- Hiding and Un-Hiding Rows
- Setting Cell Properties
- Drawing Borders
- Create a Pivot Table
- Cells with multiple styles
Features
New Workbook
New Sheet
Creating Cells
Creating Date Cells
Working with different types of cells
Files vs InputStreams
When opening a workbook, either a .xls HSSFWorkbook, or a .xlsx XSSFWorkbook, the Workbook can be loaded from either a File or an InputStream. Using a File object allows for lower memory consumption, while an InputStream requires more memory as it has to buffer the whole file.
If using WorkbookFactory, it's very easy to use one or the other:
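A minimal sketch of both routes, assuming POI 4.x or later (where WorkbookFactory.create declares only IOException as a checked exception). The class and method names here are illustrative and not part of POI.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

import org.apache.poi.ss.usermodel.Workbook;
import org.apache.poi.ss.usermodel.WorkbookFactory;

class OpenWorkbookSketch {

    // From a File: avoids buffering the whole file in memory.
    // The third argument opens the file read-only, so close() writes nothing back.
    static int sheetCountFromFile(File file) throws IOException {
        try (Workbook wb = WorkbookFactory.create(file, null, true)) {
            return wb.getNumberOfSheets();
        }
    }

    // From an InputStream: the stream content is buffered before parsing.
    static int sheetCountFromStream(File file) throws IOException {
        try (InputStream in = new FileInputStream(file);
             Workbook wb = WorkbookFactory.create(in)) {
            return wb.getNumberOfSheets();
        }
    }
}
```

Both methods work for .xls and .xlsx files, provided the matching POI jars are on the classpath.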
If using HSSFWorkbook or XSSFWorkbook directly, you should generally go through POIFSFileSystem or OPCPackage, to have full control of the lifecycle (including closing the file when done):
Demonstrates various alignment options
Working with borders
Iterate over rows and cells
Sometimes, you'd like to just iterate over all the sheets in a workbook, all the rows in a sheet, or all the cells in a row. This is possible with a simple for loop.
These iterators are available by calling workbook.sheetIterator(), sheet.rowIterator(), and row.cellIterator(), or implicitly using a for-each loop. Note that a rowIterator and cellIterator iterate over rows or cells that have been created, skipping empty rows and cells.
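As a rough illustration of the for-each form, the sketch below counts the cells that actually exist in a workbook. The class and method names are made up for this example.

```java
import org.apache.poi.ss.usermodel.Cell;
import org.apache.poi.ss.usermodel.Row;
import org.apache.poi.ss.usermodel.Sheet;
import org.apache.poi.ss.usermodel.Workbook;

class IterateSketch {

    // Visits every sheet, every created row, and every created cell.
    // Rows and cells that were never created are skipped by the iterators.
    static int countDefinedCells(Workbook wb) {
        int count = 0;
        for (Sheet sheet : wb) {
            for (Row row : sheet) {
                for (Cell cell : row) {
                    count++;
                }
            }
        }
        return count;
    }
}
```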
Iterate over cells, with control of missing / blank cells
In some cases, when iterating, you need full control over how missing or blank rows and cells are treated, and you need to ensure you visit every cell and not just those defined in the file. (The CellIterator will only return the cells defined in the file, which is largely those with values or stylings, but it depends on Excel).
In cases such as these, you should fetch the first and last column information for a row, then call getCell(int, MissingCellPolicy) to fetch the cell. Use a MissingCellPolicy to control how blank or null cells are handled.
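A minimal sketch of this approach, assuming a POI release where MissingCellPolicy is an enum nested in Row. The helper name readRow and the minColumns parameter are hypothetical; they are only there to show how a fixed column range can be visited.

```java
import java.util.ArrayList;
import java.util.List;

import org.apache.poi.ss.usermodel.Cell;
import org.apache.poi.ss.usermodel.DataFormatter;
import org.apache.poi.ss.usermodel.Row;
import org.apache.poi.ss.usermodel.Row.MissingCellPolicy;

class MissingCellSketch {

    // Returns one entry per column, using "" for missing or blank cells.
    static List<String> readRow(Row row, int minColumns) {
        // getLastCellNum() is one past the last defined cell, or -1 for an empty row
        int lastColumn = Math.max(row.getLastCellNum(), minColumns);
        DataFormatter formatter = new DataFormatter();
        List<String> values = new ArrayList<>();
        for (int cn = 0; cn < lastColumn; cn++) {
            // Blank and missing cells both come back as null with this policy
            Cell cell = row.getCell(cn, MissingCellPolicy.RETURN_BLANK_AS_NULL);
            values.add(cell == null ? "" : formatter.formatCellValue(cell));
        }
        return values;
    }
}
```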
Getting the cell contents
To get the contents of a cell, you first need to know what kind of cell it is (asking a string cell for its numeric contents will throw an exception, for example; in current POI releases this is typically an IllegalStateException). So, you will want to switch on the cell's type, and then call the appropriate getter for that cell.
In the code below, we loop over every cell in one sheet, print out the cell's reference (eg A3), and then the cell's contents.
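The per-cell part of that loop might look like the sketch below. It assumes POI 4.x or later, where getCellType() returns the CellType enum; the class name and the output format are illustrative only.

```java
import org.apache.poi.ss.usermodel.Cell;
import org.apache.poi.ss.usermodel.DateUtil;
import org.apache.poi.ss.util.CellReference;

class CellContentsSketch {

    // Builds "<reference> - <value>" for one cell, choosing the getter by cell type.
    static String describe(Cell cell) {
        CellReference ref = new CellReference(cell.getRowIndex(), cell.getColumnIndex());
        String value;
        switch (cell.getCellType()) {
            case STRING:
                value = cell.getRichStringCellValue().getString();
                break;
            case NUMERIC:
                // Dates are stored as numbers; the cell style tells them apart
                value = DateUtil.isCellDateFormatted(cell)
                        ? String.valueOf(cell.getDateCellValue())
                        : String.valueOf(cell.getNumericCellValue());
                break;
            case BOOLEAN:
                value = String.valueOf(cell.getBooleanCellValue());
                break;
            case FORMULA:
                value = cell.getCellFormula();
                break;
            default:
                value = "";
        }
        return ref.formatAsString() + " - " + value;
    }
}
```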
Text Extraction
For most text extraction requirements, the standard ExcelExtractor class should provide all you need.
For very fancy text extraction, XLS to CSV etc, take a look at /src/examples/src/org/apache/poi/examples/hssf/eventusermodel/XLS2CSVmra.java
Fills and colors
Merging cells
Working with fonts
Note, the maximum number of unique fonts in a workbook is limited to 32767. You should re-use fonts in your applications instead of creating a font for each cell. Examples:
Wrong:
Correct:
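A sketch of the correct pattern, with the wrong one noted in a comment. The class and method names are made up for this example.

```java
import org.apache.poi.ss.usermodel.Cell;
import org.apache.poi.ss.usermodel.CellStyle;
import org.apache.poi.ss.usermodel.Font;
import org.apache.poi.ss.usermodel.Sheet;
import org.apache.poi.ss.usermodel.Workbook;

class FontReuseSketch {

    static void boldFirstColumn(Workbook wb, Sheet sheet, int rowCount) {
        // Correct: create the font and the style once, outside the loop.
        // (The wrong variant calls wb.createFont() inside the loop and
        // ends up with one font per cell.)
        Font font = wb.createFont();
        font.setBold(true);
        CellStyle style = wb.createCellStyle();
        style.setFont(font);

        for (int i = 0; i < rowCount; i++) {
            Cell cell = sheet.createRow(i).createCell(0);
            cell.setCellValue("row " + i);
            cell.setCellStyle(style); // every cell shares the same style
        }
    }
}
```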
Custom colors
HSSF:
XSSF:
Reading and Rewriting Workbooks
Using newlines in cells
Data Formats
Fit Sheet to One Page
Set Print Area
Set Page Numbers on Footer
Using the Convenience Functions
The convenience functions provide utility features such as setting borders around merged regions and changing style attributes without explicitly creating new styles.
Shift rows up or down on a sheet
Set a sheet as selected
Set the zoom magnification
The zoom is expressed as a fraction. For example to express a zoom of 75% use 3 for the numerator and 4 for the denominator.
Splits and freeze panes
There are two types of panes you can create: freeze panes and split panes.
A freeze pane is split by columns and rows. You create a freeze pane using the following mechanism:
sheet1.createFreezePane( 3, 2, 3, 2 );
The first two parameters are the columns and rows you wish to split by. The second two parameters indicate the cells that are visible in the bottom right quadrant.
Split panes appear differently. The split area is divided into four separate work areas. The split occurs at the pixel level and the user is able to adjust the split by dragging it to a new position.
Split panes are created with the following call:
sheet2.createSplitPane( 2000, 2000, 0, 0, Sheet.PANE_LOWER_LEFT );
The first parameter is the x position of the split. This is in 1/20th of a point. A point in this case seems to equate to a pixel. The second parameter is the y position of the split. Again in 1/20th of a point.
The last parameter indicates which pane currently has the focus. This will be one of Sheet.PANE_LOWER_LEFT, PANE_LOWER_RIGHT, PANE_UPPER_RIGHT or PANE_UPPER_LEFT.
Repeating rows and columns
It's possible to set up repeating rows and columns in your printouts by using the setRepeatingRows() and setRepeatingColumns() methods in the Sheet class.
These methods expect a CellRangeAddress parameter which specifies the range for the rows or columns to repeat. For setRepeatingRows(), it should specify a range of rows to repeat, with the column part spanning all columns. For setRepeatingColumns(), it should specify a range of columns to repeat, with the row part spanning all rows. If the parameter is null, the repeating rows or columns will be removed.
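A minimal sketch of both calls. The ranges "1:4" and "A:C" are arbitrary examples, and the class and method names are illustrative.

```java
import org.apache.poi.ss.usermodel.Sheet;
import org.apache.poi.ss.util.CellRangeAddress;

class RepeatingSketch {

    // Repeat rows 1 to 4 at the top of every printed page.
    static void repeatTopRows(Sheet sheet) {
        sheet.setRepeatingRows(CellRangeAddress.valueOf("1:4"));
    }

    // Repeat columns A to C at the left of every printed page.
    static void repeatLeftColumns(Sheet sheet) {
        sheet.setRepeatingColumns(CellRangeAddress.valueOf("A:C"));
    }
}
```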
Headers and Footers
Example is for headers but applies directly to footers.
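As a rough sketch, a header and a page-numbered footer can be set through the Header and Footer interfaces. HeaderFooter.page() and HeaderFooter.numPages() return the field codes for the current page and the page count; the class name and the texts are illustrative.

```java
import org.apache.poi.hssf.usermodel.HeaderFooter;
import org.apache.poi.ss.usermodel.Footer;
import org.apache.poi.ss.usermodel.Header;
import org.apache.poi.ss.usermodel.Sheet;

class HeaderFooterSketch {

    static void decorate(Sheet sheet) {
        // Each header has a left, center and right section
        Header header = sheet.getHeader();
        header.setLeft("Left Header");
        header.setCenter("Center Header");

        // The footer works the same way; here it carries "Page x of y"
        Footer footer = sheet.getFooter();
        footer.setRight("Page " + HeaderFooter.page() + " of " + HeaderFooter.numPages());
    }
}
```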
XSSF Enhancement for Headers and Footers
Example is for headers but applies directly to footers. Note, the above example for basic headers and footers applies to XSSF Workbooks as well as HSSF Workbooks. The HSSFHeader stuff does not work for XSSF Workbooks.
XSSF has the ability to handle First page headers and footers, as well as Even/Odd headers and footers. All Header/Footer Property flags can be handled in XSSF as well. The odd header and footer is the default header and footer. It is displayed on all pages that do not display either a first page header or an even page header. That is, if the Even header/footer does not exist, then the odd header/footer is displayed on even pages. If the first page header/footer does not exist, then the odd header/footer is displayed on the first page. If the even/odd property is not set, that is the same as the even header/footer not existing. If the first page property does not exist, that is the same as the first page header/footer not existing.
Drawing Shapes
POI supports drawing shapes using the Microsoft Office drawing tools. Shapes on a sheet are organized in a hierarchy of groups and shapes. The top-most shape is the patriarch. This is not visible on the sheet at all. To start drawing you need to call createDrawingPatriarch() on the HSSFSheet class. This has the effect of erasing any other shape information stored in that sheet. By default POI will leave shape records alone in the sheet unless you make a call to this method.
To create a shape you have to go through the following steps:
- Create the patriarch.
- Create an anchor to position the shape on the sheet.
- Ask the patriarch to create the shape.
- Set the shape type (line, oval, rectangle etc...)
- Set any other style details concerning the shape. (eg: line thickness, etc...)
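The steps above can be sketched as follows for a simple line. The anchor coordinates, color and width are arbitrary, and the class and method names are made up for this example.

```java
import org.apache.poi.hssf.usermodel.HSSFClientAnchor;
import org.apache.poi.hssf.usermodel.HSSFPatriarch;
import org.apache.poi.hssf.usermodel.HSSFShape;
import org.apache.poi.hssf.usermodel.HSSFSheet;
import org.apache.poi.hssf.usermodel.HSSFSimpleShape;

class ShapeSketch {

    static HSSFSimpleShape drawLine(HSSFSheet sheet) {
        // 1. Create the patriarch (this discards existing shape data on the sheet)
        HSSFPatriarch patriarch = sheet.createDrawingPatriarch();

        // 2. Anchor the shape inside cell B1 (column 1, row 0)
        HSSFClientAnchor anchor =
                new HSSFClientAnchor(0, 0, 1023, 255, (short) 1, 0, (short) 1, 0);

        // 3. Ask the patriarch to create the shape
        HSSFSimpleShape shape = patriarch.createSimpleShape(anchor);

        // 4. Set the shape type
        shape.setShapeType(HSSFSimpleShape.OBJECT_TYPE_LINE);

        // 5. Set other style details
        shape.setLineStyleColor(10, 10, 10);
        shape.setLineWidth(HSSFShape.LINEWIDTH_ONE_PT * 4);
        return shape;
    }
}
```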
Text boxes are created using a different call:
It's possible to use different fonts to style parts of the text in the textbox. Here's how:
Just as can be done manually using Excel, it is possible to group shapes together. This is done by calling createGroup() and then creating the shapes using those groups.
It's also possible to create groups within groups.
Here's how to create a shape group:
If you're being observant you'll have noticed that the shapes that are added to the group use a new type of anchor: the HSSFChildAnchor. What happens is that the created group has its own coordinate space for shapes that are placed into it. POI defaults this to (0,0,1023,255) but you are able to change it as desired. Here's how:
If you create a group within a group it's also going to have its own coordinate space.
Styling Shapes
By default shapes can look a little plain. It's possible to apply different styles to the shapes however. The sorts of things that can currently be done are:
- Change the fill color.
- Make a shape with no fill color.
- Change the thickness of the lines.
- Change the style of the lines. Eg: dashed, dotted.
- Change the line color.
Here's an example of how this is done:
Shapes and Graphics2d
While the native POI shape drawing commands are the recommended way to draw shapes in a sheet, it's sometimes desirable to use a standard API for compatibility with external libraries. With this in mind we created some wrappers for Graphics and Graphics2d.
All Graphics commands are issued into an HSSFShapeGroup. Here's how it's done:
The first thing we do is create the group and set its coordinates to match what we plan to draw. Next we calculate a reasonable fontSizeMultiplier, then create the EscherGraphics object. Since what we really want is a Graphics2d object, we create an EscherGraphics2d object and pass in the graphics object we created. Finally we call a routine that draws into the EscherGraphics2d object.
The vertical points per pixel deserves some more explanation. One of the difficulties in converting Graphics calls into escher drawing calls is that Excel does not have the concept of absolute pixel positions. It measures its cell widths in 'characters' and the cell heights in points. Unfortunately it's not defined exactly what type of character it's measuring. Presumably this is due to the fact that Excel will be using different fonts on different platforms or even within the same platform.
Because of this constraint we've had to implement the concept of a verticalPointsPerPixel. This is the amount the font should be scaled by when you issue commands such as drawString(). To calculate this value use the following formula:
The height of the group in its own coordinates is calculated fairly simply as the difference between the y coordinates of the bounding box of the shape. The height of the group in points can be calculated by using a convenience method called HSSFClientAnchor.getAnchorHeightInPoints().
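The formula itself does not appear in this copy of the guide. One reading that is consistent with the two quantities just described is the group's height in points divided by its height in group coordinates. The sketch below works under that assumption; the class, method and parameter names are hypothetical.

```java
class FontScaleSketch {

    // groupHeightInPoints: the anchor height in points, for example from
    //                      HSSFClientAnchor.getAnchorHeightInPoints()
    // y1, y2:              top and bottom y coordinates of the group's bounding box
    static float verticalPointsPerPixel(float groupHeightInPoints, int y1, int y2) {
        int heightOfGroup = Math.abs(y2 - y1);
        return groupHeightInPoints / heightOfGroup;
    }
}
```

For instance, a group that is 100 points tall and spans y coordinates 0 to 200 gives a multiplier of 0.5.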
Many of the functions supported by the graphics classes are not complete. Here are some of the functions that are known to work.
- fillRect()
- fillOval()
- drawString()
- drawOval()
- drawLine()
- clearRect()
Functions that are not supported will return and log a message using the POI logging infrastructure (disabled by default).
Outlining
Outlines are great for grouping sections of information together and can be added easily to columns and rows using the POI API. Here's how:
To collapse (or expand) an outline use the following calls:
The row/column you choose should contain an already created group. It can be anywhere within the group.
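A minimal sketch of grouping and collapsing. The row and column ranges are arbitrary, the rows are created up front only so that they can be inspected afterwards, and the class and method names are illustrative.

```java
import org.apache.poi.ss.usermodel.Sheet;

class OutlineSketch {

    static void groupAndCollapse(Sheet sheet) {
        // Create the rows so they exist as Row objects in the sheet
        for (int r = 5; r <= 14; r++) {
            sheet.createRow(r);
        }

        // Group rows 6 to 15 and columns E to H (indexes are 0-based)
        sheet.groupRow(5, 14);
        sheet.groupColumn(4, 7);

        // Collapse both groups; any row or column inside the group will do
        sheet.setRowGroupCollapsed(7, true);
        sheet.setColumnGroupCollapsed(4, true);
    }
}
```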
Images
Images are part of the drawing support. To add an image just call createPicture() on the drawing patriarch. At the time of writing the following types are supported:
- PNG
- JPG
- DIB
It should be noted that any existing drawings may be erased once you add an image to a sheet.
Reading images from a workbook:
Named Ranges and Named Cells
Named Range is a way to refer to a group of cells by a name. Named Cell is a degenerate case of Named Range in that the 'group of cells' contains exactly one cell. You can create as well as refer to cells in a workbook by their named range. When working with Named Ranges, the classes org.apache.poi.ss.util.CellReference and org.apache.poi.ss.util.AreaReference are used.
Note: Using relative references like 'A1:B1' can cause the cell that the name points to to move unexpectedly when working with the workbook in Microsoft Excel. Using absolute references like '$A$1:$B$1' usually avoids this; see also this discussion.
Creating Named Range / Named Cell
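A minimal sketch of creating a named range with an absolute reference. The sheet name "Data", the range name "my_range" and the class name are illustrative.

```java
import org.apache.poi.ss.usermodel.Name;
import org.apache.poi.ss.usermodel.Workbook;

class NamedRangeSketch {

    static Name createNamedRange(Workbook wb) {
        wb.createSheet("Data");

        Name name = wb.createName();
        name.setNameName("my_range");
        // Absolute references keep the range from shifting in Excel
        name.setRefersToFormula("Data!$A$1:$B$3");
        return name;
    }
}
```

A named cell is created the same way, with a single-cell reference such as Data!$A$1.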
Reading from Named Range / Named Cell
Reading from non-contiguous Named Ranges
Note, when a cell is deleted, Excel does not delete the attached named range. As a result, a workbook can contain named ranges that point to cells that no longer exist. You should check the validity of a reference before constructing an AreaReference.
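One way to do that check is Name.isDeleted(), as in the sketch below. It assumes a POI release where AreaReference takes a SpreadsheetVersion argument; the class and method names are made up for this example.

```java
import java.util.ArrayList;
import java.util.List;

import org.apache.poi.ss.usermodel.Name;
import org.apache.poi.ss.usermodel.Workbook;
import org.apache.poi.ss.util.AreaReference;
import org.apache.poi.ss.util.CellReference;

class NamedRangeReadSketch {

    // Lists the cell references covered by a named range,
    // or an empty list if the name is missing or points to deleted cells.
    static List<String> cellsOf(Workbook wb, String rangeName) {
        List<String> refs = new ArrayList<>();
        Name name = wb.getName(rangeName);
        if (name == null || name.isDeleted()) {
            return refs;
        }
        AreaReference area =
                new AreaReference(name.getRefersToFormula(), wb.getSpreadsheetVersion());
        for (CellReference ref : area.getAllReferencedCells()) {
            refs.add(ref.formatAsString());
        }
        return refs;
    }
}
```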
Cell Comments - HSSF and XSSF
A comment is a rich text note that is attached to and associated with a cell, separate from other cell content. Comment content is stored separately from the cell and is displayed in a drawing object (like a text box) that is separate from, but associated with, the cell.
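A minimal sketch of attaching a comment to a cell through the ss.usermodel interfaces. The anchor size (two columns by three rows) is arbitrary and the class and method names are illustrative.

```java
import org.apache.poi.ss.usermodel.Cell;
import org.apache.poi.ss.usermodel.ClientAnchor;
import org.apache.poi.ss.usermodel.Comment;
import org.apache.poi.ss.usermodel.CreationHelper;
import org.apache.poi.ss.usermodel.Drawing;
import org.apache.poi.ss.usermodel.Sheet;

class CommentSketch {

    static Comment annotate(Cell cell, String text, String author) {
        Sheet sheet = cell.getSheet();
        CreationHelper factory = sheet.getWorkbook().getCreationHelper();
        Drawing<?> drawing = sheet.createDrawingPatriarch();

        // The anchor decides where the comment box is drawn when it is shown
        ClientAnchor anchor = factory.createClientAnchor();
        anchor.setCol1(cell.getColumnIndex());
        anchor.setCol2(cell.getColumnIndex() + 2);
        anchor.setRow1(cell.getRowIndex());
        anchor.setRow2(cell.getRowIndex() + 3);

        Comment comment = drawing.createCellComment(anchor);
        comment.setString(factory.createRichTextString(text));
        comment.setAuthor(author);

        // Associate the comment with the cell
        cell.setCellComment(comment);
        return comment;
    }
}
```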
Reading cell comments
To get all the comments on a sheet:
Adjust column width to fit the contents
For SXSSFWorkbooks only, because the random access window is likely to exclude most of the rows in the worksheet, which are needed for computing the best-fit width of a column, the columns must be tracked for auto-sizing prior to flushing any rows.
Note that Sheet#autoSizeColumn() does not evaluate formula cells; the width of formula cells is calculated based on the cached formula result. If your workbook has many formulas then it is a good idea to evaluate them before auto-sizing.
How to read hyperlinks
How to create hyperlinks
Data Validations
As of version 3.8, POI has slightly different syntax to work with data validations with .xls and .xlsx formats.
hssf.usermodel (binary .xls format)
Check the value a user enters into a cell against one or more predefined value(s).
The following code will limit the value the user can enter into cell A1 to one of three integer values, 10, 20 or 30.
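A sketch of what such code can look like with the hssf.usermodel classes. The class and method names are made up for this example.

```java
import org.apache.poi.hssf.usermodel.DVConstraint;
import org.apache.poi.hssf.usermodel.HSSFDataValidation;
import org.apache.poi.hssf.usermodel.HSSFSheet;
import org.apache.poi.ss.usermodel.DataValidation;
import org.apache.poi.ss.util.CellRangeAddressList;

class HssfValidationSketch {

    static DataValidation restrictA1(HSSFSheet sheet) {
        // firstRow, lastRow, firstCol, lastCol: just cell A1
        CellRangeAddressList addressList = new CellRangeAddressList(0, 0, 0, 0);
        DVConstraint constraint =
                DVConstraint.createExplicitListConstraint(new String[] {"10", "20", "30"});

        DataValidation validation = new HSSFDataValidation(addressList, constraint);
        // true: validate the value without offering a drop down list
        validation.setSuppressDropDownArrow(true);
        sheet.addValidationData(validation);
        return validation;
    }
}
```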
Drop Down Lists:
This code will do the same but offer the user a drop down list to select a value from.
Messages On Error:
To create a message box that will be shown to the user if the value they enter is invalid.
Replace 'Box Title' with the text you wish to display in the message box's title bar and 'Message Text' with the text of your error message.
Prompts:
To create a prompt that the user will see when the cell containing the data validation receives focus
The text encapsulated in the first parameter passed to the createPromptBox() method will appear emboldened and as a title to the prompt whilst the second will be displayed as the text of the message. The createExplicitListConstraint() method can be passed an array of String(s) containing integer, floating point, date or text values.
Further Data Validations:
To obtain a validation that would check the value entered was, for example, an integer between 10 and 100, use the DVConstraint.createNumericConstraint(int, int, String, String) factory method.
Look at the javadoc for the other validation and operator types; also note that not all validation types are supported for this method. The values passed to the two String parameters can be formulas; the '=' symbol is used to denote a formula.
It is not possible to create a drop down list if the createNumericConstraint() method is called; the setSuppressDropDownArrow(false) method call will simply be ignored.
Date and time constraints can be created by calling the createDateConstraint(int, String, String, String) or the createTimeConstraint(int, String, String) method. Both are very similar to the above and are explained in the javadoc.
Creating Data Validations From Spreadsheet Cells:
The contents of specific cells can be used to provide the values for the data validation and the DVConstraint.createFormulaListConstraint(String) method supports this. To specify that the values come from a contiguous range of cells do either of the following:
or
and in both cases the user will be able to select from a drop down list containing the values from cells A1, A2 and A3.
The data does not have to be on the same sheet as the data validation. To select the data from a different sheet, however, the sheet must be given a name when created and that name should be used in the formula. So assuming the existence of a sheet named 'Data Sheet' this will work:
as will this:
whilst this will not:
and nor will this:
xssf.usermodel (.xlsx format)
Data validations work similarly when you are creating an xml based, SpreadsheetML, workbook file; but there are differences. Explicit casts are required, for example, in a few places as much of the support for data validations in the xssf stream was built into the unifying ss stream, of which more later. Other differences are noted with comments in the code.
Check the value the user enters into a cell against one or more predefined value(s).
Drop Down Lists:
This code will do the same but offer the user a drop down list to select a value from.
Note that the call to the setSuppressDropDownArrow() method can either be simply excluded or replaced with:
Prompts and Error Messages:
These both exactly mirror the hssf.usermodel so please refer to the 'Messages On Error:' and 'Prompts:' sections above.
Further Data Validations:
To obtain a validation that would check the value entered was, for example, an integer between 10 and 100, use the XSSFDataValidationHelper's createNumericConstraint(int, int, String, String) factory method.
The values passed to the final two String parameters can be formulas; the '=' symbol is used to denote a formula. Thus, the following would create a validation that allows values only if they fall between the results of summing two cell ranges.
It is not possible to create a drop down list if the createNumericConstraint() method is called; the setSuppressDropDownArrow(true) method call will simply be ignored.
Please check the javadoc for other constraint types as examples for those will not be included here. There are, for example, methods defined on the XSSFDataValidationHelper class allowing you to create the following types of constraint; date, time, decimal, integer, numeric, formula, text length and custom constraints.
Creating Data Validations From Spreadsheet Cells:
One other type of constraint not mentioned above is the formula list constraint. It allows you to create a validation that takes its value(s) from a range of cells. This code
would create a validation that took its values from cells in the range A1 to F1.
The usefulness of this technique can be extended if you use named ranges like this:
OpenOffice Calc has slightly different rules with regard to the scope of names. Excel supports both Workbook and Sheet scope for a name but Calc does not; it seems only to support Sheet scope for a name. Thus it is often best to fully qualify the name for the region or area, something like this:
This does open a further, interesting opportunity however and that is to place all of the data for the validation(s) into named ranges of cells on a hidden sheet within the workbook. These ranges can then be explicitly identified in the setRefersToFormula() method argument.
ss.usermodel
The classes within the ss.usermodel package allow developers to create code that can be used to generate both binary (.xls) and SpreadsheetML (.xlsx) workbooks.
The techniques used to create data validations share much in common with the xssf.usermodel examples above. As a result just one or two examples will be presented here.
Check the value the user enters into a cell against one or more predefined value(s).
Drop Down Lists:
This code will do the same but offer the user a drop down list to select a value from.
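A minimal sketch using only ss.usermodel types, so the same code should serve both .xls and .xlsx workbooks. The call to setSuppressDropDownArrow() is deliberately left out here, and the class and method names are illustrative.

```java
import org.apache.poi.ss.usermodel.DataValidation;
import org.apache.poi.ss.usermodel.DataValidationConstraint;
import org.apache.poi.ss.usermodel.DataValidationHelper;
import org.apache.poi.ss.usermodel.Sheet;
import org.apache.poi.ss.util.CellRangeAddressList;

class SsValidationSketch {

    static DataValidation restrictA1(Sheet sheet) {
        // The helper is obtained from the sheet, so no format-specific class is named
        DataValidationHelper helper = sheet.getDataValidationHelper();
        DataValidationConstraint constraint =
                helper.createExplicitListConstraint(new String[] {"10", "20", "30"});

        // firstRow, lastRow, firstCol, lastCol: just cell A1
        CellRangeAddressList addressList = new CellRangeAddressList(0, 0, 0, 0);
        DataValidation validation = helper.createValidation(constraint, addressList);
        sheet.addValidationData(validation);
        return validation;
    }
}
```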
Prompts and Error Messages:
These both exactly mirror the hssf.usermodel so please refer to the 'Messages On Error:' and 'Prompts:' sections above.
As the differences between the ss.usermodel and xssf.usermodel examples are small - restricted largely to the way the DataValidationHelper is obtained, the lack of any need to explicitly cast data types and the small difference in behaviour between the hssf and xssf interpretation of the setSuppressDropDownArrow() method - no further examples will be included in this section.
Advanced Data Validations.
Dependent Drop Down Lists.
In some cases, it may be necessary to present to the user a sheet which contains more than one drop down list. Further, the choice the user makes in one drop down list may affect the options that are presented to them in the second or subsequent drop down lists. One technique that may be used to implement this behaviour will now be explained.
There are two keys to the technique: one is to use named areas or regions of cells to hold the data for the drop down lists, the second is to use the INDIRECT() function to convert between the name and the actual addresses of the cells. In the example section there is a complete working example, called LinkedDropDownLists.java, that demonstrates how to create linked or dependent drop down lists. Only the more relevant points are explained here.
To create two drop down lists where the options shown in the second depend upon the selection made in the first, begin by creating a named region of cells to hold all of the data for populating the first drop down list. Next, create a data validation that will look to this named area for its data, something like this:
Note that the name of the area - in the example above it is 'CHOICES' - is simply passed to the createFormulaListConstraint() method. This is sufficient to cause Excel to populate the drop down list with data from that named region.
Next, for each of the options the user could select in the first drop down list, create a matching named region of cells. The name of that region should match the text the user could select in the first drop down list. Note, in the example, all upper case letters are used in the names of the regions of cells.
Now, very similar code can be used to create a second, linked, drop down list;
The key here is in the following Excel function - INDIRECT(UPPER($A$1)) - which is used to populate the second, linked, drop down list. Working from the inner-most pair of brackets, it instructs Excel to look at the contents of cell A1, to convert what it reads there into upper case – as upper case letters are used in the names of each region - and then convert this name into the addresses of those cells that contain the data to populate another drop down list.
Embedded Objects
It is possible to perform more detailed processing of an embedded Excel, Word or PowerPoint document, or to work with any other type of embedded object.
HSSF:
XSSF:
(Since POI-3.7)
Autofilters
Conditional Formatting
See more examples on Excel conditional formatting in ConditionalFormats.java
Hiding and Un-Hiding Rows
Using Excel, it is possible to hide a row on a worksheet by selecting that row (or rows), right clicking once on the right hand mouse button and selecting 'Hide' from the pop-up menu that appears.
To emulate this using POI, simply call the setZeroHeight() method on an instance of either XSSFRow or HSSFRow (the method is defined on the ss.usermodel.Row interface that both classes implement), like this:
If the file were saved away to disc now, then the first row on the first sheet would not be visible.
Using Excel, it is possible to unhide previously hidden rows by selecting the row above and the row below the one that is hidden and then pressing and holding down the Ctrl key and the Shift key and pressing the number 9 before releasing them all.
To emulate this behaviour using POI do something like this:
If the file were saved away to disc now, any previously hidden rows on the first sheet of the workbook would now be visible.
The example illustrates two features. Firstly, that it is possible to unhide a row simply by calling the setZeroHeight() method and passing the boolean value 'false'. Secondly, it illustrates how to test whether a row is hidden or not. Simply call the getZeroHeight() method and it will return 'true' if the row is hidden, 'false' otherwise.
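Both operations can be sketched as below. The class and method names are made up for this example.

```java
import org.apache.poi.ss.usermodel.Row;
import org.apache.poi.ss.usermodel.Sheet;

class HideRowsSketch {

    // Hides a single row.
    static void hide(Row row) {
        row.setZeroHeight(true);
    }

    // Unhides every hidden row on the sheet and reports how many were changed.
    static int unhideAll(Sheet sheet) {
        int changed = 0;
        for (Row row : sheet) {
            if (row.getZeroHeight()) {
                row.setZeroHeight(false);
                changed++;
            }
        }
        return changed;
    }
}
```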
Setting Cell Properties
Sometimes it is easier or more efficient to create a spreadsheet with basic styles and then apply special styles to certain cells such as drawing borders around a range of cells or setting fills for a region. CellUtil.setCellStyleProperties lets you do that without creating a bunch of unnecessary intermediate styles in your spreadsheet.
Properties are created as a Map and applied to a cell in the following manner.
NOTE: This does not replace the properties of the cell, it merges the properties you have put into the Map with the cell's existing style properties. If a property already exists, it is replaced with the new property. If a property does not exist, it is added. This method will not remove CellStyle properties.
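A minimal sketch, assuming a POI 4.x or 5.x release in which CellUtil accepts a Map keyed by its String property constants (newer releases also offer an enum-keyed variant). The class and method names are illustrative.

```java
import java.util.HashMap;
import java.util.Map;

import org.apache.poi.ss.usermodel.BorderStyle;
import org.apache.poi.ss.usermodel.Cell;
import org.apache.poi.ss.usermodel.IndexedColors;
import org.apache.poi.ss.util.CellUtil;

class CellPropertiesSketch {

    static void redTopBorder(Cell cell) {
        Map<String, Object> properties = new HashMap<>();
        properties.put(CellUtil.BORDER_TOP, BorderStyle.MEDIUM);
        properties.put(CellUtil.BORDER_BOTTOM, BorderStyle.MEDIUM);
        properties.put(CellUtil.TOP_BORDER_COLOR, IndexedColors.RED.getIndex());

        // Merged into the cell's existing style; other properties are kept
        CellUtil.setCellStyleProperties(cell, properties);
    }
}
```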
Drawing Borders
In Excel, you can apply a set of borders on an entire workbook region at the press of a button. The PropertyTemplate object simulates this with methods and constants defined to allow drawing top, bottom, left, right, horizontal, vertical, inside, outside, or all borders around a range of cells. Additional methods allow for applying colors to the borders.
It works like this: you create a PropertyTemplate object which is a container for the borders you wish to apply to a sheet. Then you add borders and colors to the PropertyTemplate, and finally apply it to whichever sheets you need that set of borders on. You can create multiple PropertyTemplate objects and apply them to a single sheet, or you can apply the same PropertyTemplate object to multiple sheets. It is just like a preprinted form.
Enums:
- BorderStyle
- Defines the look of the border, is it thick or thin, solid or dashed, single or double. This enum replaces the CellStyle.BORDER_XXXXX constants which have been deprecated. The PropertyTemplate will not support the older style BORDER_XXXXX constants. A special value of BorderStyle.NONE will remove the border from a Cell once it is applied.
- BorderExtent
- Describes the portion of the region that the BorderStyle will apply to. For example, TOP, BOTTOM, INSIDE, or OUTSIDE. A special value of BorderExtent.NONE will remove the border from the PropertyTemplate. When the template is applied, no change will be made to a cell border where no border properties exist in the PropertyTemplate.
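A minimal sketch of the draw-then-apply workflow. The ranges are arbitrary and the class and method names are made up for this example.

```java
import org.apache.poi.ss.usermodel.BorderExtent;
import org.apache.poi.ss.usermodel.BorderStyle;
import org.apache.poi.ss.usermodel.Sheet;
import org.apache.poi.ss.util.CellRangeAddress;
import org.apache.poi.ss.util.PropertyTemplate;

class BorderTemplateSketch {

    static PropertyTemplate drawAndApply(Sheet sheet) {
        PropertyTemplate pt = new PropertyTemplate();

        // Medium borders on every edge of every cell in B2:D4
        pt.drawBorders(new CellRangeAddress(1, 3, 1, 3),
                BorderStyle.MEDIUM, BorderExtent.ALL);

        // Remove the borders again from the single cell D4
        pt.drawBorders(new CellRangeAddress(3, 3, 3, 3),
                BorderStyle.NONE, BorderExtent.ALL);

        // Nothing changes on the sheet until the template is applied
        pt.applyBorders(sheet);
        return pt;
    }
}
```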
NOTE: The last pt.drawBorders() call removes the borders from the range by using BorderStyle.NONE. Like setCellStyleProperties, the applyBorders method merges the properties of a cell style, so existing borders are changed only if they are replaced by something else, or removed only if they are replaced by BorderStyle.NONE. To remove a color from a border, use IndexedColors.AUTOMATIC.getIndex().
Additionally, to remove a border or color from the PropertyTemplate object, use BorderExtent.NONE.
This does not work with diagonal borders yet.
Creating a Pivot Table
Pivot Tables are a powerful feature of spreadsheet files. You can create a pivot table with the following piece of code.
Cells with multiple styles (Rich Text Strings)
To apply a single set of text formatting (colour, style, font etc) to a cell, you should create a CellStyle for the workbook, then apply it to the cells.
To apply different formatting to different parts of a cell, you need to use RichTextString, which permits styling of parts of the text within the cell.
There are some slight differences between HSSF and XSSF, especially around font colours (the two formats store colours quite differently internally). Refer to the HSSF Rich Text String and XSSF Rich Text String javadocs for more details.
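A minimal sketch of styling part of a cell's text through the ss.usermodel interfaces. The text and the class and method names are illustrative.

```java
import org.apache.poi.ss.usermodel.Cell;
import org.apache.poi.ss.usermodel.Font;
import org.apache.poi.ss.usermodel.RichTextString;
import org.apache.poi.ss.usermodel.Workbook;

class RichTextSketch {

    static void writeGreeting(Workbook wb, Cell cell) {
        Font bold = wb.createFont();
        bold.setBold(true);

        RichTextString text = wb.getCreationHelper().createRichTextString("Hello, World!");
        // Characters 0 to 4 ("Hello") become bold; the rest keeps the cell's font
        text.applyFont(0, 5, bold);

        cell.setCellValue(text);
    }
}
```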