The major web browsers load web pages in basically the same way. This process is known as parsing and is described by the HTML5 specification. A high-level understanding of this process is critical to writing web pages that load efficiently.

Parsing overview

As chunks of the HTML source become available from the network (or cache, filesystem, etc.), they are streamed to the HTML parser. Next, in a process known as tokenization, the parser iterates through the source, generating a token for (most notably) each start tag, each end tag, and each character outside of a tag.

For example, the input source <b>hello</b> yields 7 tokens:

start-tag { name: b }
character { data: h }
character { data: e }
character { data: l }
character { data: l }
character { data: o }
end-tag { name: b }
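
The tokenization step can be sketched in a few lines of Python. This is a minimal illustration, not the tokenizer defined by the HTML5 specification: the function name tokenize, the tuple token format, and the regular expression are assumptions made for this example, and attributes, comments, and entities are ignored.

```python
import re

# Matches a bare start tag such as <b> or an end tag such as </b>.
# Attributes are deliberately not handled in this sketch.
TAG_PATTERN = re.compile(r"<(/?)([a-zA-Z][a-zA-Z0-9]*)\s*>")


def tokenize(source):
    """Return a list of simplified (kind, value) tokens for the source."""
    tokens = []
    pos = 0
    while pos < len(source):
        match = TAG_PATTERN.match(source, pos)
        if match:
            kind = "end-tag" if match.group(1) else "start-tag"
            tokens.append((kind, match.group(2).lower()))
            pos = match.end()
        else:
            # Anything outside a tag becomes one character token.
            tokens.append(("character", source[pos]))
            pos += 1
    return tokens


for kind, value in tokenize("<b>hello</b>"):
    print(kind, value)
```

Running it on <b>hello</b> prints one start-tag token, five character tokens, and one end-tag token, matching the list above.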

After each token is generated it is serially passed to the next major subsystem: the tree builder. The tree builder dynamically modifies the Document's DOM tree to reflect the new token.

The 7 input tokens above yield the following DOM tree:

<html>
  <head>
  <body>
    <b>
      "hello"
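
As a rough sketch of how a tree builder consumes tokens one at a time, the Python below builds that tree from the 7 tokens. The Node class, build_tree, and dump are hypothetical names for this example. The real algorithm has many insertion modes; this sketch simply assumes the implied html, head, and body elements exist and places everything inside body.

```python
class Node:
    """A minimal element node with a name and ordered children."""

    def __init__(self, name):
        self.name = name
        self.children = []

    def append(self, child):
        self.children.append(child)
        return child


def build_tree(tokens):
    # Assumption: the implied <html>, <head> and <body> are created up front.
    html = Node("html")
    html.append(Node("head"))
    body = html.append(Node("body"))
    open_elements = [body]  # stack of elements still awaiting an end tag
    for kind, value in tokens:
        current = open_elements[-1]
        if kind == "start-tag":
            open_elements.append(current.append(Node(value)))
        elif kind == "end-tag":
            if len(open_elements) > 1 and current.name == value:
                open_elements.pop()
        elif kind == "character":
            # Adjacent characters are merged into a single text node.
            if current.children and isinstance(current.children[-1], str):
                current.children[-1] += value
            else:
                current.children.append(value)
    return html


def dump(node, depth=0):
    """Return the tree as indented lines, one per node."""
    if isinstance(node, str):
        return ["  " * depth + '"' + node + '"']
    lines = ["  " * depth + "<" + node.name + ">"]
    for child in node.children:
        lines.extend(dump(child, depth + 1))
    return lines


tokens = [("start-tag", "b")]
tokens += [("character", c) for c in "hello"]
tokens += [("end-tag", "b")]
print("\n".join(dump(build_tree(tokens))))
```

Note that the tree is modified after every token, so the DOM is usable (and partially renderable) long before the whole source has arrived.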

Fetching subresources

A frequent operation performed by the tree builder is creating a new HTML element and inserting it into the Document. It is at the point of insertion that HTML elements which load subresources begin fetching the subresource.

Running scripts

This parsing algorithm seems to translate HTML source into a DOM tree as efficiently as possible. That is, except for one wrinkle: scripts. When the tree builder encounters an end-tag token for a script, it must serially execute the script before parsing can continue (unless the associated script start-tag has the defer or async attribute).

There are two significant preconditions which must be fulfilled before a script can execute:

  1. If the script is external its source must be fully downloaded.
  2. For any script, all stylesheets inserted into the document so far must be fully downloaded.

This often means the parser must wait idly while scripts and stylesheets are downloaded.
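
The two preconditions amount to taking a maximum over several completion times. The Python sketch below makes that concrete; the function name, its parameters, and the millisecond figures are all invented for illustration and do not come from any browser.

```python
def earliest_execution_time(reached_at, script_ready_at=None,
                            stylesheets_ready_at=()):
    """Return the earliest time (ms) a parser-blocking script can run.

    reached_at: when the tree builder sees the script's end tag.
    script_ready_at: when an external script finishes downloading,
        or None for an inline script.
    stylesheets_ready_at: finish times of the pending stylesheet downloads.
    """
    waits = [reached_at]
    if script_ready_at is not None:
        waits.append(script_ready_at)
    waits.extend(stylesheets_ready_at)
    return max(waits)


# Hypothetical page: an inline script reached at 50 ms, while two
# stylesheets finish downloading at 120 ms and 300 ms.
start = earliest_execution_time(50, stylesheets_ready_at=(120, 300))
print("script runs at", start, "ms; parser idle for", start - 50, "ms")
```

Even an inline script, which needs no download of its own, leaves the parser idle here until the slowest stylesheet arrives.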

Why must parsing halt?

Well, a script may call document.write to emit markup that affects further parsing, or it may query something about the DOM that would yield incorrect results if parsing had continued (for instance, the number of image elements in the DOM).

Why wait for stylesheets?

A script may expect to access the CSSOM directly, or it may query a property of a DOM node that depends on the stylesheet (for example, the width of a certain <table>).

Is it inefficient to block parsing?

Yes. Subresource download times often include a large fixed cost that is bounded below by the round trip time. This means it is faster to download two resources in parallel than to download the same two serially, because the round trips overlap. More obviously, the browser is also free to do CPU work while waiting on the network. For these reasons it is critical to the efficient loading of a web page that subresource fetches are initiated as soon as possible. When parsing is blocked, the tree builder is not able to insert subsequent elements into the DOM, and thus subsequent subresource downloads are not initiated even if the HTML source which includes them is already available to the parser.
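
A back-of-the-envelope model shows where the saving comes from. The Python sketch below assumes each request pays one round trip of fixed latency, that parallel requests share the available bandwidth perfectly, and that there is no connection setup cost. The numbers (100 ms round trip, 50 KB resources, 1 KB per ms) are made up for the example.

```python
def serial_time(rtt_ms, sizes_kb, bandwidth_kb_per_ms):
    """Each download pays its own round trip, one after the other."""
    return sum(rtt_ms + size / bandwidth_kb_per_ms for size in sizes_kb)


def parallel_time(rtt_ms, sizes_kb, bandwidth_kb_per_ms):
    """Round trips overlap; the shared link still carries every byte."""
    return rtt_ms + sum(sizes_kb) / bandwidth_kb_per_ms


sizes = [50.0, 50.0]  # two hypothetical 50 KB resources
print("serial:  ", serial_time(100.0, sizes, 1.0), "ms")
print("parallel:", parallel_time(100.0, sizes, 1.0), "ms")
```

Under these assumptions the serial case takes 300 ms and the parallel case 200 ms. The bytes transferred are identical; the difference is exactly the one round trip that no longer has to be waited for separately.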

Mitigating blocking

As I've blogged previously, when the parser becomes blocked WebKit will run a lightweight parser known as the preload scanner. It mitigates the blocking problem by scanning ahead and fetching certain subresources that may be required. Other browsers employ similar techniques.
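
To give a feel for the idea, here is a minimal Python sketch of a scan-ahead pass built on the standard library's html.parser module. It is not WebKit's preload scanner: the class name, the scan_ahead helper, and the choice of which elements to look at (script, img, and stylesheet links) are assumptions for this example. The key property is that it only collects URLs and never builds any DOM nodes.

```python
from html.parser import HTMLParser


class PreloadScanner(HTMLParser):
    """Collects subresource URLs without building a tree."""

    def __init__(self):
        super().__init__()
        self.urls = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag in ("script", "img") and attributes.get("src"):
            self.urls.append(attributes["src"])
        elif tag == "link" and attributes.get("href"):
            rel = (attributes.get("rel") or "").lower().split()
            if "stylesheet" in rel:
                self.urls.append(attributes["href"])


def scan_ahead(remaining_source):
    """Return the URLs worth fetching early from not-yet-parsed source."""
    scanner = PreloadScanner()
    scanner.feed(remaining_source)
    return scanner.urls


# Hypothetical source that sits behind a blocked script.
remaining = ('<p>text</p><img src="a.png">'
             '<link rel="stylesheet" href="b.css">'
             '<script src="c.js"></script>')
print(scan_ahead(remaining))
```

A browser would hand each collected URL to its network stack so the downloads are already in flight by the time the real parser reaches those elements.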

It is important to note that even with preload scanning, parsing is still blocked. Nodes cannot be added to the DOM tree. Although I haven't covered how a DOM tree becomes a render tree, layout or painting, it should be obvious that before a node is in the DOM tree it cannot be painted to the screen.

Finishing parsing

After the entire source has been parsed, first all deferred scripts will be executed (waiting for their source and all pending stylesheets to download). Their completion triggers the DOMContentLoaded event to be fired. Next, the parser will wait for any pending async scripts to finish loading and executing. Finally, once all subresources have finished downloading, the window's load event will be fired and parsing is complete.
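
The ordering described above can be written down as a short Python sketch. The function and the step strings are hypothetical, and the model only covers ordering: it leaves out the waiting on downloads, and it treats the async scripts as the ones still pending when the source ends.

```python
def finish_parsing(deferred_scripts, pending_async_scripts):
    """Return, in order, the steps that follow the end of the source."""
    steps = []
    for name in deferred_scripts:
        # Deferred scripts run first, in document order.
        steps.append("execute deferred " + name)
    steps.append("fire DOMContentLoaded")
    for name in pending_async_scripts:
        steps.append("finish async " + name)
    # Fired only once every subresource has finished downloading.
    steps.append("fire load")
    return steps


for step in finish_parsing(["menu.js", "analytics.js"], ["ads.js"]):
    print(step)
```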

Takeaway

With this understanding, it becomes clear how important it is to carefully consider where and how stylesheets and scripts are included in the document. Those decisions can have a significant impact on the efficiency of the page load.
