The Core Issues and Ideas of This Paper

Problem

  • Baseline Searchable Symmetric Encryption (SSE) could not avoid access-pattern leakage.
  • ORAM algorithm performance is extremely low and cannot be applied in practice.

Idea

Solve the Access-pattern Leakage of current SSE by introducing differential privacy.

Important knowledge

Searchable Symmetric Encryption (SSE)

An SSE scheme is a tuple (KeyGen, BuildIndex, Token, Search, SKE) and asymmetric key encryption scheme.

  • (\(`K_I`\), \(`K_D`\) ) ← KeyGen(\(`1^\kappa`\) ): Probabilistic key generation.

    • Security parameter \(`\kappa`\): input.
    • Secret key \(`K_I`\): For the secure index,
    • Secret key \(`K_D`\) ← SKE.Gen(\(`1^\kappa`\)): For the document collection.
  • \(`I`\) ← BuildIndex(\(`K_I`\), \(`(D, W)`\)): Probabilistic algorithm for the client to build a secure index.
    • \(`K_I`\): input.
    • \(`D`\): Document collection.
    • \(`W`\): keyword lists W.
    • \(`I`\): Secure index.
  • \(`\tau`\) ← Token(\(`K_I`\), \(`w`\)): (Probabilistic) algorithm for the client to generate search tokens.
  • \(`R`\) ← Search(\(`I`\), \(`\tau`\)): Deterministic algorithm for the server.
    • \(`R`\): Document identifications.
  • \(`c`\) ← SKE.Enc(\(`K_D`\), \(`D`\)): Probabilistic algorithm for the client to encrypt the document collection.
  • \(`D`\) ← SKE.Dec(\(`K_D`\), \(`c`\)): Deterministic algorithm for the client to decrypt a ciphertext of a document.

Access-pattern Leakage

In the practical application of SSE, there is Access-pattern Leakage. The main reasons are list as flow:

  • The cloud server is able to observe which files are accessed in the encrypted database by the client.
  • To be used in practice, most existing SSE schemes allow it.
  • With some a priori knowledge of the outsourced documents, the adversary could recover the content of the queries with high accuracy.

Query Recovery Attack (IKK Attack)

IKK attack is a typical attack method for SSE with Access-pattern Leakage.

Assumption

The adversary has the knowledge of a (\(`r\times r`\) matrix \(`M`\) that depicts the probability of keyword co-occurrence (r is the number of keywords).

Method

  1. Compute \(`l\times l`\) co-occurrence matrix \(`\hat{M}`\) by the observed access patterns(a sub-matrix of \(`M`\)).
  2. The best match of \(`\hat{M}`\) to \(`M`\) can be generated by optimization methods (e.g. Simulated Annealing).

ORAM Algorithm

This algorithm allows SSE to defend against Access-pattern Leakage (with IKK attack method), but has serious performance problems and is of low practical value.

  • Allows a client to hide its access pattern from the remote server by continuously shuffling and re-encrypting data as they are accessed.
  • Access one of n documents in the storage, at least o(log n) documents need to be accessed. [Too much overhead for SSE]

Differential Privacy

Differential Privacy introduction: The Differential Privacy Frontier (Extended Abstract)

Assuming a positive real number \(`\epsilon`\), \(`A`\) is a random algorithm that takes a data set as input (representing the data owned by the relying party). \(`imA`\) represents the mapping of \(`A`\). For all data sets \(`D_1`\) and \(`D_2`\) of non-single elements (i.e., one person's data) and all subsets \(`S`\) of \(`imA`\), algorithm \(`A`\) is \(`\epsilon - differential \quad privacy`\), where the probability depends on the randomness of the algorithm.

Pr[A(D_1)\in S]\leqslant e^\epsilon \times Pr[A(D_2) \in S]

If an attacker is required to receive a \(`Q_i`\) (\(`i_{th}`\) query) value through a \(`\epsilon - differential \quad privacy`\) algorithm, he will not be able to distinguish between the two data sets if \(`\epsilon`\) is small enough.

Erasure Coding

The erasure code is the main method for adding redundancy to the Secure index.

Notes on erasure codes that I have posted on cnblogs

Key points

Assumption

  • Adversary has complete knowledge of the document collection.
  • Server simply passively monitors the storage access patterns and infers the content of the corresponding queries.

Why Introduce Differential Privacy for SSE

Differential privacy is a strong privacy guarantee for an individual’s input to a (randomized) function or sequence of functions.

Differential Privacy rules imply that the adversary cannot distinguish between queries using distinct search terms that induce access patterns that are within the specified distance of one another.

d-privacy

Here, \(`d`\) represents the Hamming distance in the access-pattern vector. By the parameter \(`d`\), the generalized \(`\epsilon - differential \quad privacy`\) definition is designed (add \(`d`\) as a parameter of \(`e^{\epsilon d}`\)).

d-private Access-pattern Obfuscation Mechanism

Add the two following part to SSE:

  • Obfuscate the access patterns: Add false positives and false negatives to the search results.
  • To handle the correctness issue: Introduce redundancy to the document collection using erasure codes.

The way to establish d-privacy APO

Define an access-pattern obfuscation mechanism \(`K`\) : \(`X \rightarrow Y`\) gives \(`\epsilon d_{h}-privacy`\), iff \(`\forall x,x' \in X`\) and \(`\forall S \subseteq Y`\) (using the Hamming distance \(`d_h`\))

Pr[K(x)\in S]\leqslant e^{\epsilon d_h(x,x')} \times Pr[K(x') \in S]

Define an obfuscation mechanism \(`K_f`\) such that, given an access pattern \(`x \in X`\), it outputs any \(`y \in Y`\) with probability

Pr[K_f(x)=y]=Pr[x|y]=\prod^n_{i=1}Pr[y_i|x_i]

Where

Pr[y_i=1|x_i=1]=p \qquad Pr[y_i=1|x_i=0]=q
Pr[y_i=0|x_i=1]=1-p \quad Pr[y_i=0|x_i=0]=1-q

Enforce two constraints on p and q to make the mechanism practical:

  • \(`Pr[y_i = 1|x_i = 0] < Pr[y_i = 1|x_i = 1]`\): non-matching shard should have a lower probability to be retrieved than a matching shard;
  • \(`Pr[y_i = 1|x_i = 0] < Pr[y_i = 0|x_i = 1]`\): non-matching shard should have a lower probability to be flipped than a matching shard.

Means that \(`q < p`\) and \(`q < 1-p`\). And find out that \(`\epsilon = ln(\frac{p}{q})`\).

By using the (m,k) erasure code, six parameter optimization conditions are established, and the values of all the variables required are obtained.

Workloads

  • Defined d-privacy for access patterns of general SSE schemes.
  • Proposed a d-private access-pattern obfuscation mechanism that is compatible with existing SSE schemes.
  • Implemented a prototype of the proposed obfuscation mechanism.

Evaluation

Based on the Enron Email Dataset.

Security

  • Baseline IKK attack on SSE with and without access-pattern obfuscation method.
  • Improved IKK attack (Adversary can successfully figure out which shards belong to the same documents) on SSE with and without access-pattern obfuscation method.

Performance

  • Storage and Communication Overhead
  • Precision
  • Runtime Overhead (build SSE local)

最新文章

  1. MailKit---获取邮件
  2. 从0到1打造直播 App
  3. Git简单应用(1)
  4. Solr JAVA客户端SolrJ 4.9使用示例教程
  5. 《利用python进行数据分析》读书笔记--第十一章 金融和经济数据应用(一)
  6. sql中写标量函数生成大写拼音首字母
  7. IIFE-js中(function(){…})()立即执行函数写法理解
  8. 解决github提交commit,contributions不统计显示绿色的问题
  9. 新一代分布式任务调度框架:当当elastic-job开源项目的10项特性
  10. System services not available to Activities before onCreate()
  11. 【2017-02-21】分支语句if...else...、分支嵌套、变量的作用域
  12. thinkPHP数据库操作
  13. 求第k小的数 O(n)复杂度
  14. React Native组件、生命周期及属性传值props详解
  15. 用Jmeter进行接口测试
  16. TCPlayer web切换播放问题
  17. linux修改文件为可执行文件
  18. mac 打印机无法打印
  19. JavaScript 检查IP
  20. 术语-软件-软件开发:SDK(软件开发工具包)

热门文章

  1. 使用virtualenv构建python虚拟环境
  2. poj 1080 Human Gene Functions(lcs,较难)
  3. MySQL--开发技巧(一)
  4. org.apache.catalina.core.StandardWrapperValve invoke报错
  5. stl_set.h
  6. Nginx-rtmp模块实现流媒体play、push、pull功能
  7. 51Nod1766 树上的最远点对
  8. ACM学习历程—HDU4746 Mophues(莫比乌斯)
  9. Hough变换原理
  10. 我的日志app企划书1.0版本