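Recently I had a task that involved processing a large XML file containing one huge node that stores Base64 data (over 90 MB, at a known path).

A job like this is XmlReader territory, and even so, the big node gave me a headache for a while…

When I first searched MSDN, I found ReadChars(), which can be used to handle large nodes.

Method documentation: https://msdn.microsoft.com/zh-cn/library/system.xml.xmltextreader.readchars(v=vs.110).aspx

The documentation shows the following usage: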

while (0 != reader.ReadChars(buffer, 0, 1))
{
    // Do something.
    // Attribute values are not available at this point.
}
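As a runnable illustration of the loop above, here is a minimal sketch that wraps it in a helper reading from an in-memory string and collecting the chunks in a StringBuilder. The class ChunkedXmlReading, the method ReadNaive and the bufferSize parameter are hypothetical names introduced here, and the element is located with XmlReader.ReadToFollowing instead of checking NodeType and Name by hand.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;

// Hypothetical helper: reads the text of the first element named
// elementName chunk by chunk, following the documented ReadChars loop.
public static class ChunkedXmlReading
{
    public static string ReadNaive(string xml, string elementName, int bufferSize)
    {
        using (var reader = new XmlTextReader(new StringReader(xml)))
        {
            // Move to the start tag of the target element.
            if (!reader.ReadToFollowing(elementName))
            {
                throw new ArgumentException("Element not found: " + elementName);
            }

            var buffer = new char[bufferSize];
            var result = new StringBuilder();
            int len;
            // ReadChars returns 0 once there is no more content to return.
            while (0 != (len = reader.ReadChars(buffer, 0, buffer.Length)))
            {
                // Only the first len characters of the buffer are valid.
                result.Append(buffer, 0, len);
            }
            return result.ToString();
        }
    }
}
```

On neatly formatted input such as "<Root>\n  <Big>abcdef</Big>\n</Root>" this returns abcdef for any positive buffer size, since the node after </Big> is whitespace rather than another element.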

This works fine on neatly formatted XML, for example:

<Root>
    <LeafNode>Value</LeafNode>
    <ParentNode>
        <LeafNode>Value</LeafNode>
    </ParentNode>
</Root>

But (nobody likes that word, yet here it is…), feed it some oddly formatted XML and…

<Root><LeafNode>Value</LeafNode><ParentNode>
<LeafNode>Value</LeafNode></ParentNode>
</Root>

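With XML in this style, reading the content of the first LeafNode with the sample code will probably give you "ValueValue"…

And of course, the input XML was in exactly this style… (*sigh*)

After stepping through it in the debugger for a while, I found that in this case XmlTextReader.Name changes to the name of the next node (and so does XmlTextReader.LocalName). This fits the documentation, which says that once ReadChars has read to the end of an element, the reader is left positioned after its end tag. The name change can therefore be used to tell that the end of the node has been reached.

The improved version: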

string currentName = reader.LocalName;
while (currentName == reader.LocalName && 0 != reader.ReadChars(buffer, 0, 1))
{
    // Do something.
    // Attribute values are not available at this point.
}
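The improved loop can be wrapped the same way. The sketch below uses hypothetical names (class GuardedXmlReading, method ReadGuarded); it checks reader.LocalName before every ReadChars call, so it stops as soon as the reader has moved on to the next node, and it should return Value for the first LeafNode of the compact XML shown earlier.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;

// Hypothetical helper: same chunked read, but it also stops as soon as
// reader.LocalName no longer matches the element it started on.
public static class GuardedXmlReading
{
    public static string ReadGuarded(string xml, string elementName, int bufferSize)
    {
        using (var reader = new XmlTextReader(new StringReader(xml)))
        {
            if (!reader.ReadToFollowing(elementName))
            {
                throw new ArgumentException("Element not found: " + elementName);
            }

            string currentName = reader.LocalName;
            var buffer = new char[bufferSize];
            var result = new StringBuilder();
            int len;
            // The name is checked before each call: once ReadChars has moved
            // past the end tag, LocalName belongs to the next node.
            while (currentName == reader.LocalName &&
                   (len = reader.ReadChars(buffer, 0, buffer.Length)) > 0)
            {
                result.Append(buffer, 0, len);
            }
            return result.ToString();
        }
    }
}
```

A 1-character buffer mirrors the documentation sample; for a node of tens of megabytes a buffer of a few thousand characters avoids tens of millions of ReadChars calls.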

For reference, here is the code that copies the file while applying special handling to specific nodes:

using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Text;
using System.Xml;

List<string> processNodePathList = new List<string> { "/Root/Path/to/Target" };
List<string> bigNodePathList = new List<string> { "/Root/Path/to/Big/Node" };

private static void ProcessBigXmlFile(string sourcePath, string targetPath, IList<string> processNodePathList, IList<string> bigNodePathList)
{
    // Last segment (element name) of each path, used as a cheap pre-check
    // before the full path is compared with CompareNodePath.
    var processNodeNameList =
        processNodePathList.Select(
            processNodePath => processNodePath.Split(new[] { '/' }, StringSplitOptions.RemoveEmptyEntries))
        .Select(nodePathParts => nodePathParts[nodePathParts.Length - 1])
        .ToList();
    var bigNodeNameList = bigNodePathList.Select(
            bigNodePath => bigNodePath.Split(new[] { '/' }, StringSplitOptions.RemoveEmptyEntries))
        .Select(nodePathParts => nodePathParts[nodePathParts.Length - 1])
        .ToList();

    var sourceStream = new FileStream(sourcePath, FileMode.Open, FileAccess.Read);
    var reader = new XmlTextReader(sourceStream);
    var targetStream = new FileStream(targetPath, FileMode.Create, FileAccess.Write);
    var writer = new XmlTextWriter(targetStream, Encoding.UTF8);

    try
    {
        // Names of the currently open elements; the top is the current element.
        var pathStack = new Stack<string>();
        var readResult = reader.Read();
        while (readResult)
        {
            // 0: advance with Read() as usual
            // 1: node content was replaced, skip the rest of the node
            // 2: big node was consumed by ReadChars, reader is already on the next node
            int skipMode = 0;
            switch (reader.NodeType)
            {
                case XmlNodeType.Element:
                {
                    pathStack.Push(reader.Name);
                    writer.WriteStartElement(reader.LocalName);
                    if (reader.HasAttributes)
                    {
                        while (reader.MoveToNextAttribute())
                        {
                            writer.WriteAttributeString(reader.LocalName, reader.Value);
                        }
                        reader.MoveToElement();
                    }

                    if (processNodeNameList.Contains(reader.LocalName))
                    {
                        var index = processNodeNameList.IndexOf(reader.LocalName);
                        if (CompareNodePath(pathStack, processNodePathList[index]))
                        {
                            // Replace node content (new content would be written here)
                            writer.WriteFullEndElement();
                            skipMode = 1;
                        }
                    }
                    else if (bigNodeNameList.Contains(reader.LocalName))
                    {
                        var index = bigNodeNameList.IndexOf(reader.LocalName);
                        if (CompareNodePath(pathStack, bigNodePathList[index]))
                        {
                            reader.MoveToContent();
                            var buffer = new char[4096]; // any positive size works
                            int len;
                            // Stop once LocalName changes: ReadChars has moved past the end tag.
                            while (reader.LocalName == bigNodeNameList[index] &&
                                   (len = reader.ReadChars(buffer, 0, buffer.Length)) > 0)
                            {
                                writer.WriteRaw(buffer, 0, len);
                            }
                            writer.WriteFullEndElement();
                            skipMode = 2;
                        }
                    }

                    // Empty elements (<a/>) produce no EndElement node, so close them here,
                    // unless one of the branches above has already closed the element.
                    if (skipMode == 0 && reader.IsEmptyElement)
                    {
                        pathStack.Pop();
                        writer.WriteEndElement();
                    }
                    break;
                }
                //case XmlNodeType.Attribute:
                //{
                //    writer.WriteAttributeString(reader.LocalName, reader.Value);
                //    break;
                //}
                case XmlNodeType.Text:
                {
                    writer.WriteValue(reader.Value);
                    break;
                }
                case XmlNodeType.CDATA:
                {
                    writer.WriteCData(reader.Value);
                    break;
                }
                //case XmlNodeType.EntityReference:
                //{
                //    writer.WriteEntityRef(reader.Name);
                //    break;
                //}
                //case XmlNodeType.Entity:
                //{
                //    break;
                //}
                case XmlNodeType.ProcessingInstruction:
                {
                    writer.WriteProcessingInstruction(reader.Name, reader.Value);
                    break;
                }
                case XmlNodeType.Comment:
                {
                    writer.WriteComment(reader.Value);
                    break;
                }
                //case XmlNodeType.Document:
                //{
                //    break;
                //}
                case XmlNodeType.DocumentType:
                {
                    writer.WriteRaw(string.Format("<!DOCTYPE {0} [{1}]>", reader.Name, reader.Value));
                    break;
                }
                //case XmlNodeType.DocumentFragment:
                //{
                //    break;
                //}
                //case XmlNodeType.Notation:
                //{
                //    break;
                //}
                case XmlNodeType.Whitespace:
                {
                    writer.WriteWhitespace(reader.Value);
                    break;
                }
                //case XmlNodeType.SignificantWhitespace:
                //{
                //    break;
                //}
                case XmlNodeType.EndElement:
                {
                    pathStack.Pop();
                    writer.WriteFullEndElement();
                    break;
                }
                case XmlNodeType.XmlDeclaration:
                {
                    writer.WriteStartDocument();
                    break;
                }
            }

            switch (skipMode)
            {
                case 1:
                {
                    // Skip() moves past the rest of the replaced node.
                    reader.Skip();
                    pathStack.Pop();
                    readResult = !reader.EOF;
                    break;
                }
                case 2:
                {
                    // ReadChars has already positioned the reader on the next node.
                    pathStack.Pop();
                    readResult = !reader.EOF;
                    break;
                }
                default:
                {
                    readResult = reader.Read();
                    break;
                }
            }
        }
    }
    finally
    {
        writer.Close();
        targetStream.Close();
        targetStream.Dispose();
        reader.Close();
        sourceStream.Close();
        sourceStream.Dispose();
    }
}

private static bool CompareNodePath(Stack<string> currentNodePathStack, string compareNodePathString)
{
    // A Stack enumerates from the top, so reverse it to get root-first order.
    var currentArray = currentNodePathStack.Reverse().ToArray();
    var compareArray = compareNodePathString.Split(new[] { '/' }, StringSplitOptions.RemoveEmptyEntries);
    if (compareArray.Length != currentArray.Length)
    {
        return false;
    }
    bool isDifferent = false;
    for (int i = 0; i < currentArray.Length; i++)
    {
        if (compareArray[i] != currentArray[i])
        {
            isDifferent = true;
            break;
        }
    }
    return !isDifferent;
}
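CompareNodePath relies on the fact that enumerating a Stack<T> yields the most recently pushed item first, so the stack has to be reversed before it can be compared with the root-first path string. A more compact sketch of the same check using LINQ's SequenceEqual follows; the class NodePath and method Matches are hypothetical, and it assumes the same "/A/B/C" path syntax used above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper equivalent to CompareNodePath.
public static class NodePath
{
    // True when the element names on the stack, read from the root (bottom)
    // to the current element (top), are exactly the segments of path.
    public static bool Matches(Stack<string> currentNodePathStack, string path)
    {
        var expected = path.Split(new[] { '/' }, StringSplitOptions.RemoveEmptyEntries);
        // Stack<T> enumerates top-first, so reverse to get root-first order.
        return currentNodePathStack.Reverse().SequenceEqual(expected);
    }
}
```

SequenceEqual also returns false when the lengths differ, which replaces the explicit length check and loop.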
