Debugging TensorFlow models

The symbolic nature of TensorFlow makes it harder to debug TensorFlow code than regular Python code. Here we introduce a number of tools included with TensorFlow that make debugging much easier.

Probably the most common mistake when using TensorFlow is passing Tensors of the wrong shape to ops. Many TensorFlow ops can operate on tensors of different ranks and shapes. This is convenient when using the API, but can cause extra headaches when things go wrong.

For example, consider the tf.matmul op, which can multiply two matrices:

a = tf.random_uniform([2, 3])
b = tf.random_uniform([3, 4])
c = tf.matmul(a, b) # c is a tensor of shape [2, 4]

But the same function also does batch matrix multiplication:

a = tf.random_uniform([10, 2, 3])
b = tf.random_uniform([10, 3, 4])
c = tf.matmul(a, b)  # c is a tensor of shape [10, 2, 4]

Another example, which we discussed before in the broadcasting section, is the add operation, which supports broadcasting:

a = tf.constant([[1.], [2.]])
b = tf.constant([1., 2.])
c = a + b # c is a tensor of shape [2, 2]

Validating your tensors with tf.assert* ops

One way to reduce the chance of unwanted behavior is to explicitly verify the rank or shape of intermediate tensors with tf.assert* ops.

a = tf.constant([[1.], [2.]])
b = tf.constant([1., 2.])
check_a = tf.assert_rank(a, 1)  # This will raise an exception since a has rank 2
check_b = tf.assert_rank(b, 1)
with tf.control_dependencies([check_a, check_b]):
    c = a + b  # c is a tensor of shape [2, 2]

Remember that assertion nodes, like other operations, are part of the graph and get pruned during Session.run() if nothing depends on them. So make sure to create explicit dependencies on assertion ops to force TensorFlow to execute them.

You can also use assertions to validate the value of tensors at runtime:

check_pos = tf.assert_positive(a)
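
As before, the assertion is only enforced if something actually depends on it. Here is a minimal sketch of wiring it in (the tensor values and message string are illustrative):

a = tf.constant([1., 2., 3.])
check_pos = tf.assert_positive(a, message='a must be positive')
with tf.control_dependencies([check_pos]):
    b = tf.log(a)

with tf.Session() as sess:
    sess.run(b)  # raises an InvalidArgumentError at runtime if any element of a is <= 0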

See the official docs for a full list of assertion ops.

Logging tensor values with tf.Print

Another useful built-in function for debugging is tf.Print, which logs the given tensors to standard error:

input_copy = tf.Print(input, tensors_to_print_list)

Note that tf.Print returns a copy of its first argument as output. One way to force tf.Print to run is to pass its output to another op that gets executed. For example, if we want to print the values of tensors a and b before adding them, we could do something like this:

a = ...
b = ...
a = tf.Print(a, [a, b])
c = a + b

Alternatively we could manually define a control dependency.
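
A minimal sketch of that approach (the tensor values and message string are illustrative):

a = tf.constant([1., 2.])
b = tf.constant([3., 4.])
print_op = tf.Print(a, [a, b], message='a, b = ')
with tf.control_dependencies([print_op]):
    c = a + b

with tf.Session() as sess:
    sess.run(c)  # evaluating c forces the print to run, logging a and b to standard error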

Check your gradients with tf.test.compute_gradient_error

Not all operations in TensorFlow come with gradients, and it's easy to unintentionally build graphs for which TensorFlow cannot compute the gradients.

Let's look at an example:

import tensorflow as tf

def non_differentiable_entropy(logits):
    probs = tf.nn.softmax(logits)
    return tf.nn.softmax_cross_entropy_with_logits(labels=probs, logits=logits)

w = tf.get_variable('w', shape=[5])
y = -non_differentiable_entropy(w)

opt = tf.train.AdamOptimizer()
train_op = opt.minimize(y)

sess = tf.Session()
sess.run(tf.global_variables_initializer())
for i in range(10000):
    sess.run(train_op)

print(sess.run(tf.nn.softmax(w)))

We are using tf.nn.softmax_cross_entropy_with_logits to define the entropy of a categorical distribution. We then use the Adam optimizer to find the weights with maximum entropy. If you have taken a course on information theory, you would know that the uniform distribution has maximum entropy. So you would expect the result to be [0.2, 0.2, 0.2, 0.2, 0.2]. But if you run this you may get unexpected results like this:

[ 0.34081486  0.24287023  0.23465775  0.08935683  0.09230034]

It turns out tf.nn.softmax_cross_entropy_with_logits has undefined gradients with respect to labels! But how could we have spotted this if we didn't know?

Fortunately for us, TensorFlow comes with a numerical differentiator that can be used to find errors in symbolic gradients. Let's see how we can use it:

with tf.Session():
    diff = tf.test.compute_gradient_error(w, [5], y, [])
    print(diff)

If you run this, you will see that the difference between the numerical and symbolic gradients is pretty high (0.06 to 0.1 in my tries).

Now let's fix our function with a differentiable version of the entropy and check again:

import tensorflow as tf
import numpy as np

def entropy(logits, dim=-1):
    probs = tf.nn.softmax(logits, dim)
    nplogp = probs * (tf.reduce_logsumexp(logits, dim, keep_dims=True) - logits)
    return tf.reduce_sum(nplogp, dim)

w = tf.get_variable('w', shape=[5])
y = -entropy(w)

print(w.get_shape())
print(y.get_shape())

with tf.Session() as sess:
    diff = tf.test.compute_gradient_error(w, [5], y, [])
    print(diff)

The difference should be ~0.0001, which looks much better.

Now if you run the optimizer again with the correct version, you will see that the final weights are:

[ 0.2  0.2  0.2  0.2  0.2]

which are exactly what we wanted.

TensorFlow summaries and tfdbg (TensorFlow Debugger) are other tools that can be used for debugging. Please refer to the official docs to learn more.
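
For example, tfdbg can wrap an existing session so that every run() call drops into an interactive debugger where you can inspect intermediate tensors. A minimal sketch (assuming a TensorFlow 1.x session-based setup like the examples above):

from tensorflow.python import debug as tf_debug

sess = tf.Session()
sess = tf_debug.LocalCLIDebugWrapperSession(sess)  # every sess.run() now opens the tfdbg CLI
sess.run(train_op)  # inspect intermediate tensors, e.g. to hunt for NaNs or Infs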
