最近在利用poi往excel中写入大量数据时,发现excel2003最多只支持65535条,大量数据时容易造成oom,上网查了一下api,发现目前对于2003,每个sheet最多支持65535条,若数据量远超65535,建议分sheet处理,而poi3.8之后,出现了SXSSFWorkbook,可以支持大数据量的写入excel操作,但是目前只支持excel2007

HSSF是POI工程对Excel 97(-2007)文件操作的纯Java实现
XSSF是POI工程对Excel 2007 OOXML (.xlsx)文件操作的纯Java实现

从POI 3.8版本开始,提供了一种基于XSSF的低内存占用的API----SXSSF

SXSSF通过一个滑动窗口来限制访问Row的数量从而达到低内存占用的目录,XSSF可以访问所有行。旧的行数据不再出现在滑动窗口中并变得无法访问,与此同时写到磁盘上。
在自动刷新的模式下,可以指定窗口中访问Row的数量,从而在内存中保持一定数量的Row。当达到这一数量时,在窗口中产生新的Row数据,并将低索引的数据从窗口中移动到磁盘中。
或者,滑动窗口的行数可以设定成自动增长的。它可以根据需要周期的根据一次明确的flushRow(int keepRows)调用来进行修改。

SXSSF (Streaming Usermodel API)

SXSSF (package: org.apache.poi.xssf.streaming) is an API-compatible streaming extension of XSSF to be used when very large spreadsheets have to be produced, and heap space is limited. SXSSF achieves its low memory footprint by limiting access to the rows that are within a sliding window, while XSSF gives access to all rows in the document. Older rows that are no longer in the window become inaccessible, as they are written to the disk.

You can specify the window size at workbook construction time via new SXSSFWorkbook(int windowSize) or you can set it per-sheet via SXSSFSheet#setRandomAccessWindowSize(int windowSize)

When a new row is created via createRow() and the total number of unflushed records would exceed the specified window size, then the row with the lowest index value is flushed and cannot be accessed via getRow() anymore.

The default window size is 100 and defined by SXSSFWorkbook.DEFAULT_WINDOW_SIZE.

A windowSize of -1 indicates unlimited access. In this case all records that have not been flushed by a call to flushRows() are available for random access.

Note that SXSSF allocates temporary files that you must always clean up explicitly, by calling the dispose method.

SXSSFWorkbook defaults to using inline strings instead of a shared strings table. This is very efficient, since no document content needs to be kept in memory, but is also known to produce documents that are incompatible with some clients. With shared strings enabled all unique strings in the document has to be kept in memory. Depending on your document content this could use a lot more resources than with shared strings disabled.

Carefully review your memory budget and compatibility needs before deciding whether to enable shared strings or not.

The example below writes a sheet with a window of 100 rows. When the row count reaches 101, the row with rownum=0 is flushed to disk and removed from memory, when rownum reaches 102 then the row with rownum=1 is flushed, etc.

import junit.framework.Assert;
import org.apache.poi.ss.usermodel.Cell;
import org.apache.poi.ss.usermodel.Row;
import org.apache.poi.ss.usermodel.Sheet;
import org.apache.poi.ss.usermodel.Workbook;
import org.apache.poi.ss.util.CellReference;
import org.apache.poi.xssf.streaming.SXSSFWorkbook;

    public static void main(String[] args) throws Throwable {
        SXSSFWorkbook wb = new SXSSFWorkbook(100); // keep 100 rows in memory, exceeding rows will be flushed to disk
        Sheet sh = wb.createSheet();
        for(int rownum = 0; rownum < 1000; rownum++){
            Row row = sh.createRow(rownum);
            for(int cellnum = 0; cellnum < 10; cellnum++){
                Cell cell = row.createCell(cellnum);
                String address = new CellReference(cell).formatAsString();
                cell.setCellValue(address);
            }

        }

        // Rows with rownum < 900 are flushed and not accessible
        for(int rownum = 0; rownum < 900; rownum++){
          Assert.assertNull(sh.getRow(rownum));//调用了getRow方法,写入到磁盘中,释放了内存
        }

        // ther last 100 rows are still in memory
        for(int rownum = 900; rownum < 1000; rownum++){
           // Assert.assertNotNull(sh.getRow(rownum));//未调用,保留在内存中
        }

        FileOutputStream out = new FileOutputStream("D:\\sxssf.xlsx");
        wb.write(out);
        out.close();

        // dispose of temporary files backing this workbook on disk
        wb.dispose();
    }

The next example turns off auto-flushing (windowSize=-1) and the code manually controls how portions of data are written to disk

import org.apache.poi.ss.usermodel.Cell;
import org.apache.poi.ss.usermodel.Row;
import org.apache.poi.ss.usermodel.Sheet;
import org.apache.poi.ss.usermodel.Workbook;
import org.apache.poi.ss.util.CellReference;
import org.apache.poi.xssf.streaming.SXSSFWorkbook;

    public static void main(String[] args) throws Throwable {
        SXSSFWorkbook wb = new SXSSFWorkbook(-1); // turn off auto-flushing and accumulate all rows in memory
        Sheet sh = wb.createSheet();
        for(int rownum = 0; rownum < 1000; rownum++){
            Row row = sh.createRow(rownum);
            for(int cellnum = 0; cellnum < 10; cellnum++){
                Cell cell = row.createCell(cellnum);
                String address = new CellReference(cell).formatAsString();
                cell.setCellValue(address);
            }

           // manually control how rows are flushed to disk
           if(rownum % 100 == 0) {
                ((SXSSFSheet)sh).flushRows(100); // retain 100 last rows and flush all others

                // ((SXSSFSheet)sh).flushRows() is a shortcut for ((SXSSFSheet)sh).flushRows(0),
                // this method flushes all rows
           }

        }

        FileOutputStream out = new FileOutputStream("/temp/sxssf.xlsx");
        wb.write(out);
        out.close();

        // dispose of temporary files backing this workbook on disk
        wb.dispose();
   }

SXSSF flushes sheet data in temporary files (a temp file per sheet) and the size of these temporary files can grow to a very large value. For example, for a 20 MB csv data the size of the temp xml becomes more than a gigabyte. If the size of the temp files is an issue, you can tell SXSSF to use gzip compression:

  SXSSFWorkbook wb = new SXSSFWorkbook();
  wb.setCompressTempFiles(true); // temp files will be gzipped

以上内容来自API及个人总结,详见源API    http://poi.apache.org/spreadsheet/how-to.html

最新文章

  1. ZeroMQ接口函数之 :zmq_unbind - 停止连接外来的请求
  2. AFNetworking3.0 Https P12证书
  3. Windows 10 下ASP.NET4.0尚未在Web服务器上注册
  4. Nginx 缓存参数
  5. hadoop多次格式化后,导致datanode启动不了
  6. VS2010升级VS2012必备(MVC4 WebPage2.0 Razor2.0资料汇集)
  7. array_column php 函数
  8. wince下GetManifestResourceStream得到的Stream是null的解决
  9. style、currentStyle、getComputeStylel的使用
  10. 使用python之环境管理
  11. OpenStack(企业私有云)万里长征第二步——使用Fuel部署
  12. Firefox插件推荐
  13. socket模型处理多个客户端
  14. json_encode处理json数据中文乱码
  15. 常用的phpdoc标签
  16. Linq中left join之多表查询
  17. 将任何GitHub内的代码转为外部CDN网址
  18. jquery给按钮绑定事件
  19. JavaScript 数组复制的方法
  20. js实现上传图片本地预览功能以及限制图片的文件大小和尺寸大小

热门文章

  1. 转:java项目经验面试总结
  2. NOI 7614 最低通行费(多段图最短路)
  3. spring boot 捕获filter异常 统一返回处理结果
  4. MVC---- DataSet 页面遍历
  5. Mysql tinyint长度为1时在java中被转化成boolean型
  6. Javascript 面向对象-继承
  7. linux之CentOS7在线安装Mysql
  8. 20170706wdVBA保存图片到本地API
  9. Html之a标签的使用
  10. Confluence 6 教程:在 Confluence 中导航