Although you can now natively parse HTML using DOMParser and XMLHttpRequest, this is a new feature that is not yet supported by all browsers in use in the wild. The code snippets on this page will let your site work until these new features are more widely available.
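Before falling back to the snippets on this page, you can check whether the running engine's DOMParser accepts "text/html". Below is a minimal sketch. The helper name supportsHTMLParsing() is hypothetical, and the constructor is passed in as a parameter so the check can be exercised outside a browser. It relies on the observation that engines without HTML support either threw or returned null for an unsupported MIME type.

```javascript
// Hypothetical helper: reports whether a DOMParser implementation can parse "text/html".
function supportsHTMLParsing(DOMParserImpl) {
  // No DOMParser at all (very old engines, or non-browser runtimes).
  if (typeof DOMParserImpl !== "function") return false;
  try {
    // Engines lacking "text/html" support were known to either throw
    // or return null for an unsupported MIME type.
    return new DOMParserImpl().parseFromString("", "text/html") != null;
  } catch (e) {
    return false;
  }
}
```

In page or extension code you would call it as supportsHTMLParsing(typeof DOMParser !== "undefined" ? DOMParser : undefined) and use the native parser only when it returns true.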

Safely parsing simple HTML to DOM

When using XMLHttpRequest to get the HTML of a remote webpage, it is often advantageous to turn that HTML string into DOM for easier manipulation. However, there are potential dangers involved in injecting remote content in a privileged context in your extension, so it can be desirable to parse the HTML safely.

The function below will safely parse simple HTML and return a DOM object which can be manipulated like web page elements. This will remove tags like <script>, <style>, <head>, <body>, <title>, and <iframe>. It will also remove all JavaScript, including element attributes that contain JavaScript.
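To make the filtering described above concrete, here is a toy model of that policy. It is not how nsIScriptableUnescapeHTML works internally. It operates on a hypothetical plain-object tree ({ tag, attrs, children }, with strings as text nodes). It drops script, style, title, iframe and head together with their content, unwraps html and body so their children survive, and removes on* event-handler attributes and javascript: URLs. Which tags are dropped and which are unwrapped is an assumption made for illustration.

```javascript
// Toy model only: a hypothetical plain-object tree, not a real DOM.
// Element nodes look like { tag: "p", attrs: { class: "x" }, children: [...] };
// text nodes are plain strings.
const DROPPED_TAGS = new Set(["script", "style", "title", "iframe", "head"]); // removed with their content
const UNWRAPPED_TAGS = new Set(["html", "body"]); // wrapper removed, children kept

// True for event-handler attributes (onclick, onload, ...) and javascript: URLs.
function isScriptAttribute(name, value) {
  return /^on/i.test(name) || /^\s*javascript:/i.test(String(value));
}

// Returns a new, filtered array of nodes; the input is left unchanged.
function sanitize(nodes) {
  const out = [];
  for (const node of nodes) {
    if (typeof node === "string") {
      out.push(node); // text is kept as-is
      continue;
    }
    const tag = node.tag.toLowerCase();
    if (DROPPED_TAGS.has(tag)) continue;
    const children = sanitize(node.children || []);
    if (UNWRAPPED_TAGS.has(tag)) {
      out.push(...children);
      continue;
    }
    const attrs = {};
    for (const [name, value] of Object.entries(node.attrs || {})) {
      if (!isScriptAttribute(name, value)) attrs[name] = value;
    }
    out.push({ tag, attrs, children });
  }
  return out;
}
```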

function HTMLParser(aHTMLString) {
  var html = document.implementation.createDocument("http://www.w3.org/1999/xhtml", "html", null),
      body = document.createElementNS("http://www.w3.org/1999/xhtml", "body");
  html.documentElement.appendChild(body);
  // parse the string into a sanitized fragment and attach it to body
  body.appendChild(Components.classes["@mozilla.org/feed-unescapehtml;1"]
    .getService(Components.interfaces.nsIScriptableUnescapeHTML)
    .parseFragment(aHTMLString, false, null, body));
  return body;
}

It works by creating a separate, content-level XHTML document (this is safer than working at chrome level) with a <body> element, then parsing the HTML fragment and attaching the result to that <body>. The <body> is returned and is never actually inserted into the current page. The returned <body> object is of type Element.

Here is a sample that counts the number of paragraphs in a string:

var DOMPars = HTMLParser('<p>foo</p><p>bar</p>');
alert(DOMPars.getElementsByTagName('p').length);

If you change HTMLParser() to return the html variable (the document) instead of body, you get a full Document object with its complete set of methods, so you can, for example, retrieve the contents of a <div> by its id:

var DOMPars = HTMLParser("<div id='userInfo'>John was a mediocre programmer, but people liked him <strong>anyway</strong>.</div>");
alert(DOMPars.getElementById('userInfo').innerHTML);

To parse a complete HTML page, load it into an iframe whose type is content (not chrome). See Using a hidden iframe element to parse HTML to a window's DOM below.

Parsing Complete HTML to DOM

Loading an HTML document is much simpler if you use the XMLHttpRequest object. So we first load our HTML document (the false argument to open() makes the request synchronous):

var request = new XMLHttpRequest();
request.open("GET", "http://example.org/file.html", false);
request.send(null);

Our next step is to create a DOM document to hold the HTML we just retrieved:

var doc = document.implementation.createHTMLDocument("example");
doc.documentElement.innerHTML = request.responseText;

After this, any manipulation we want to do is as simple as the following:

doc.body.textContent = "This is inside the body!";

Using a hidden iframe element to parse HTML to a window's DOM

This sample code may need more work. Wrap it in your own function and use a unique name, ID, and so forth.

var frame = document.getElementById("sample-frame");
if (!frame) {
  // create frame
  frame = document.createElement("iframe"); // iframe (or browser on older Firefox)
  frame.setAttribute("id", "sample-frame");
  frame.setAttribute("name", "sample-frame");
  frame.setAttribute("type", "content");
  frame.setAttribute("collapsed", "true");
  document.getElementById("main-window").appendChild(frame);
  // or
  // document.documentElement.appendChild(frame);
  // set restrictions as needed
  frame.webNavigation.allowAuth = false;
  frame.webNavigation.allowImages = false;
  frame.webNavigation.allowJavascript = false;
  frame.webNavigation.allowMetaRedirects = true;
  frame.webNavigation.allowPlugins = false;
  frame.webNavigation.allowSubframes = false;
  // listen for load
  frame.addEventListener("load", function (event) {
    // the document of the HTML in the DOM
    var doc = event.originalTarget;
    // skip blank page or frame
    if (doc.location.href == "about:blank" || doc.defaultView.frameElement) return;
    // do something with the DOM of doc
    alert(doc.location.href);
    // when done, remove the frame or set its location to "about:blank"
    setTimeout(function () {
      var frame = document.getElementById("sample-frame");
      // remove frame
      // frame.destroy(); // if using browser element instead of iframe
      frame.parentNode.removeChild(frame);
      // or set location "about:blank"
      // frame.contentDocument.location.href = "about:blank";
    }, 10);
  }, true);
}
// load a page
frame.contentDocument.location.href = "http://www.mozilla.org/";
// or
// frame.webNavigation.loadURI("http://www.mozilla.org/", Components.interfaces.nsIWebNavigation.LOAD_FLAGS_NONE, null, null, null);
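The guard at the top of the "load" listener above can be pulled out into a predicate so it can be tested on its own. Below is a minimal sketch. isTargetDocument() is a hypothetical name, and it checks only the same two conditions as the original code: an about:blank location, and a non-null defaultView.frameElement.

```javascript
// Hypothetical helper mirroring the guard in the "load" listener:
// true only for the page we asked for, not about:blank or a nested sub-frame.
function isTargetDocument(doc) {
  // The initial about:blank load also fires "load"; ignore it.
  if (!doc || !doc.location || doc.location.href === "about:blank") return false;
  // A non-null frameElement means the document belongs to a nested frame.
  const view = doc.defaultView;
  return !(view && view.frameElement);
}
```

Inside the listener, if (!isTargetDocument(event.originalTarget)) return; would replace the original condition.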

If you are starting with an HTML string, you can convert it to a data URI and load that in the frame instead of a remote URL.
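A minimal sketch of that conversion, assuming UTF-8 text; htmlToDataURI() is a hypothetical helper name. encodeURIComponent() percent-encodes characters such as <, >, # and spaces, which would otherwise break or truncate the URI.

```javascript
// Hypothetical helper: wraps an HTML string in a data: URI.
function htmlToDataURI(html) {
  // Percent-encode everything outside the unreserved set so "#" and
  // friends cannot end the URI early.
  return "data:text/html;charset=utf-8," + encodeURIComponent(html);
}
```

The result can then be loaded with frame.contentDocument.location.href = htmlToDataURI("<p>Hello</p>");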

Using a hidden XUL iframe (alternate example)

Sometimes a browser element is overkill, does not meet your needs, or has requirements you can't fulfill. While working on Donkeyfire, I discovered the XUL iframe element, and it is very easy to use.

As an example, I will show a browser overlay .xul file, and some JavaScript code to access it.

Here is some XUL code you can add to your browser overlay .xul file. Don't forget to modify the id and name!

<vbox hidden="false" height="0">
<iframe type="content" src="" name="donkey-browser" hidden="false" id="donkey-browser" height="0"/>
</vbox>

Then, in your extension's "load" event handler:

onLoad: function() {
  donkeybrowser = document.getElementById("donkey-browser");
  if (donkeybrowser) {
    donkeybrowser.style.height = "0px";
    donkeybrowser.webNavigation.allowAuth = true;
    donkeybrowser.webNavigation.allowImages = false;
    donkeybrowser.webNavigation.allowJavascript = false;
    donkeybrowser.webNavigation.allowMetaRedirects = true;
    donkeybrowser.webNavigation.allowPlugins = false;
    donkeybrowser.webNavigation.allowSubframes = false;
    donkeybrowser.addEventListener("DOMContentLoaded", function (e) { donkeyfire.donkeybrowser_onPageLoad(e); }, true);
  }
},
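Both examples on this page set the same six webNavigation flags and differ only in allowAuth. A small helper can apply them from one table. applyRestrictions() and DEFAULT_RESTRICTIONS are hypothetical names, and the defaults simply mirror the values used in the hidden-iframe example earlier.

```javascript
// Hypothetical defaults, copied from the hidden-iframe example in this article.
const DEFAULT_RESTRICTIONS = Object.freeze({
  allowAuth: false,
  allowImages: false,
  allowJavascript: false,
  allowMetaRedirects: true,
  allowPlugins: false,
  allowSubframes: false,
});

// Copies the defaults, overlays any overrides, and assigns each flag
// onto the given webNavigation object. Returns the settings applied.
function applyRestrictions(webNavigation, overrides) {
  const settings = Object.assign({}, DEFAULT_RESTRICTIONS, overrides);
  for (const name of Object.keys(settings)) {
    webNavigation[name] = settings[name];
  }
  return settings;
}
```

In the onLoad handler above, applyRestrictions(donkeybrowser.webNavigation, { allowAuth: true }); would reproduce the same settings.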

With that code, we obtain a reference to the iframe element we declared in the .xul file. The most interesting piece of code here is the DOMContentLoaded event listener we define for the element. Let's take a look at the donkeyfire.donkeybrowser_onPageLoad() handler:

donkeybrowser_onPageLoad: function(aEvent) {
  var doc = aEvent.originalTarget;
  var url = doc.location.href;
  if (aEvent.originalTarget.nodeName == "#document") {
    // ok, it's a real page, let's do our magic
    dump("[DF] URL = " + url + "\n");
    var text = doc.evaluate("/html/body/h1", doc, null, XPathResult.STRING_TYPE, null).stringValue;
    dump("[DF] TEXT in /html/body/h1 = " + text + "\n");
  }
},

As you can see, we obtain full access to the DOM of the page we loaded in background, and we can even evaluate XPath expressions. In the example, we dump() to the console the page's URL and the text contained in the first h1 tag of the page's <body>.

Finally, here is how to call the loadURI() method on our iframe:

donkeybrowser.webNavigation.loadURI("http://developer.mozilla.org", Components.interfaces.nsIWebNavigation.LOAD_FLAGS_NONE, null, null, null);

Also, I recommend you take a look at the nsIWebNavigation interface.

From MDN (https://developer.mozilla.org/en-US/Add-ons/Code_snippets/HTML_to_DOM); it should be fairly complete.
