Paper Title

Real-time Attention Based Look-alike Model for Recommender System

Basic algorithm and main steps

Basic ideas

RALM is a similarity-based look-alike model that consists of user representation learning and look-alike learning. Novel points: the attention-merge layer, local and global attention, and online asynchronous seed clustering.

1. Offline Training

1. User Representation Learning

Treat it as a multi-class classification problem that chooses an interest item from millions of candidates.

(1) Calculate the probability of picking the $ i$-th item as a negative example

$ p(x_i) = \frac{\log(k+2)-\log(k+1)}{\log(D+1)} $

$ D $: the maximum rank over all items (items are ranked by their frequency of appearance).

$ k $: the rank of the $ i$-th item.

(2) Negative sampling: sample at a positive/negative ratio of 1:10
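Steps (1)–(2) can be sketched in a few lines of numpy. This is a minimal sketch, not the paper's implementation; `log_uniform_probs` and `sample_negatives` are hypothetical helper names, and the 1:10 ratio follows the note. Note that the per-rank probabilities telescope to a valid distribution: summing $ \log(k+2)-\log(k+1) $ over $ k = 0, \dots, D-1 $ gives exactly $ \log(D+1) $.

```python
import numpy as np

# Log-uniform (Zipfian) negative sampling over item ranks.
# Rank k (0-based, ranked by frequency) is drawn with probability
# p(k) = (log(k + 2) - log(k + 1)) / log(D + 1), D = number of items.
def log_uniform_probs(D):
    ranks = np.arange(D)
    return (np.log(ranks + 2) - np.log(ranks + 1)) / np.log(D + 1)

def sample_negatives(D, num_pos, ratio=10, rng=None):
    """Draw `ratio` negatives per positive from the log-uniform distribution."""
    rng = rng or np.random.default_rng(0)
    return rng.choice(D, size=num_pos * ratio, p=log_uniform_probs(D))

probs = log_uniform_probs(1000)   # frequent (low-rank) items are picked more often
negs = sample_negatives(1000, num_pos=3)
```

Because the probabilities decay with rank, popular items are over-represented among negatives, which counteracts the popularity bias of the positives.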

(3) Embedding layer

$ P(c=i|U,X_i) = \frac{e^{x_i u}}{\sum \limits_{j \in X}e^{x_j u}} $

the cross-entropy loss: $ L = -\sum \limits_{i \in X} y_i \log P(c=i|U,X_i) $

$ u $: a high-dimensional embedding of the user

$ x_j $: embeddings of item $ j $

$ y_i \in \{0, 1\} $: the label

After convergence, the output is the representation of user interests.
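The softmax and cross-entropy equations above can be checked with a small numpy sketch. This is illustrative only: `softmax_cross_entropy` is a hypothetical helper, and the candidate set here is just 1 positive plus 10 sampled negatives, matching the 1:10 ratio.

```python
import numpy as np

# Sampled-softmax objective from the note: u is the user embedding,
# X stacks the embeddings of the sampled candidates (rows), y is one-hot.
def softmax_cross_entropy(u, X, y):
    logits = X @ u                               # x_j . u for each candidate j
    logits -= logits.max()                       # numerical stability
    p = np.exp(logits) / np.exp(logits).sum()    # P(c = i | U, X_i)
    return -np.sum(y * np.log(p + 1e-12))        # L = -sum_i y_i log P

rng = np.random.default_rng(0)
u = rng.normal(size=8)                # user embedding
X = rng.normal(size=(11, 8))          # 1 positive + 10 negative item embeddings
y = np.eye(11)[0]                     # the first candidate is the positive
loss = softmax_cross_entropy(u, X, y)
```

With all-zero embeddings the predicted distribution is uniform over the 11 candidates, so the loss equals $ \log 11 $, a quick sanity check on the formula.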

(4) Attention merge layer

Learn user-related weights for multiple fields.

The \(n\) fields are each embedded as a vector \(h \in R^m\) of the same length \(m\), and then concatenated along the second dimension, resulting in a matrix \(H \in R^{n×m}\). Next, compute the weights:

$ u = tanh(W_1H) $

$ a_i = \frac{e^{W_2 u_i^T}}{\sum_{j=1}^{n} e^{W_2 u_j^T}} $

\(W_1 \in R^{k×n}\) and \(W_2 \in R^k\): weight matrices, \(k\): the size of the attention unit,

$ u \in R^n $: the activation unit for fields, \(a \in R^n\): the weights of fields.

Merged vector $ M \in R^m $: $ M = aH $

Then take it as the input of the MLP layer to get the universal user embedding.
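The attention-merge computation can be sketched as below. The shapes are one consistent reading of the note (the stated shapes of \(W_1\) and \(W_2\) don't quite line up with \(H \in R^{n×m}\), so here \(W_1 \in R^{m×k}\) and \(W_2 \in R^k\) are assumed so that one weight is produced per field); `attention_merge` is a hypothetical helper name.

```python
import numpy as np

def softmax(z):
    z = z - z.max()               # numerical stability
    e = np.exp(z)
    return e / e.sum()

# Attention-merge layer sketch: H stacks n field embeddings of size m
# as rows; attention produces one user-specific weight per field, and
# the merged vector M is the weighted sum of the field embeddings.
def attention_merge(H, W1, W2):
    u = np.tanh(H @ W1)           # (n, k) activation unit per field
    a = softmax(u @ W2)           # (n,)  field weights, sum to 1
    return a @ H                  # (m,)  merged vector M = aH

n, m, k = 4, 16, 8                # 4 fields, 16-dim embeddings, attention size 8
rng = np.random.default_rng(0)
H = rng.normal(size=(n, m))
M = attention_merge(H, rng.normal(size=(m, k)), rng.normal(size=k))
```

Because the weights depend on the field contents themselves, strongly and weakly relevant fields receive user-specific weights rather than fixed ones, which is the point of replacing plain concatenation.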

2. Look-alike Learning

(1) Transforming matrix

A transforming (fully connected) matrix maps the user embedding matrix from $ n \times m $ to $ n \times h $.

(2) Local attention

Activates the seeds' local interests with respect to the target user, mining personalized information.

$ E_{local_s} = E_s softmax(tanh(E_s^T W_l E_u)) $

\(W_l \in R^{h \times h}\) : the attention matrix,

\(E_s\): the seed users matrix, $ E_u $: the target user embedding

Note: first cluster the seed users into k clusters with the K-means algorithm, and for each cluster compute the mean of the seed vectors as its centroid.
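The clustering note above can be sketched with plain Lloyd iterations in numpy. This is a simplified stand-in for K-means (a real system would use an optimized library); `kmeans_centroids` is a hypothetical helper name. The payoff is that local/global attention then runs over k centroids instead of millions of seeds.

```python
import numpy as np

# K-means over seed embeddings: keep one centroid per cluster so the
# attention layers operate on k vectors rather than all seeds.
def kmeans_centroids(seeds, k, iters=20, rng=None):
    rng = rng or np.random.default_rng(0)
    centroids = seeds[rng.choice(len(seeds), size=k, replace=False)]
    for _ in range(iters):
        # assign each seed to its nearest centroid (Euclidean distance)
        d = np.linalg.norm(seeds[:, None] - centroids[None], axis=-1)
        labels = d.argmin(axis=1)
        # move each centroid to the mean of its assigned seeds
        for c in range(k):
            if (labels == c).any():
                centroids[c] = seeds[labels == c].mean(axis=0)
    return centroids

rng = np.random.default_rng(0)
seeds = rng.normal(size=(200, 16))        # 200 seed users, 16-dim embeddings
centroids = kmeans_centroids(seeds, k=5)  # 5 cluster means replace 200 seeds
```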

(3) Global attention

$ E_{global_s} = E_s softmax(E_s^T tanh(W_g E_s)) $

(4) Calculate the similarity between seeds and target user

$ score_{u,s} = \alpha \cdot cosine(E_u,E_{global_s}) + \beta \cdot cosine(E_u, E_{local_s}) $
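Steps (2)–(4) can be put together in a small numpy sketch. Assumptions to flag: `lookalike_score`, `cosine`, and the default $ \alpha $/$ \beta $ values are all hypothetical; and since the global-attention product $ E_s^T tanh(W_g E_s) $ is a k×k matrix, it is reduced to one score per seed by summing each row before the softmax, a simplification the note does not specify.

```python
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Look-alike scoring sketch. E_s holds one h-dim embedding per seed
# cluster centroid as a column; E_u is the target user's h-dim embedding.
def lookalike_score(E_s, E_u, W_l, W_g, alpha=0.3, beta=0.7):
    # local attention: weight seeds by their interaction with the target user
    w_local = softmax(np.tanh(E_s.T @ W_l @ E_u))   # (k,)
    E_local = E_s @ w_local                         # personalized seed vector
    # global attention: weight seeds by self-interaction (row-sum reduction)
    G = E_s.T @ np.tanh(W_g @ E_s)                  # (k, k)
    E_global = E_s @ softmax(G.sum(axis=1))         # shared-interest seed vector
    return alpha * cosine(E_u, E_global) + beta * cosine(E_u, E_local)

h, k = 16, 5
rng = np.random.default_rng(0)
E_s = rng.normal(size=(h, k))
E_u = rng.normal(size=h)
score = lookalike_score(E_s, E_u, rng.normal(size=(h, h)), rng.normal(size=(h, h)))
```

With $ \alpha + \beta = 1 $ the score stays in $ [-1, 1] $, since it is a convex combination of two cosine similarities.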

(5) Iterative training

2. Online Asynchronous Processing

Updates the seed embedding database in real time. It includes a user feedback monitor and seeds clustering.

3. Online Serving

$ score_{u,s} = \alpha \cdot cosine(E_u,E_{global_s}) + \beta \cdot cosine(E_u, E_{local_s}) $

Motivation

  • The "Matthew effect" becomes increasingly evident in recent recommender systems: many competitive long-tail contents struggle to achieve timely exposure because they lack behavior features.
  • Traditional look-alike models, which are widely used in online advertising, are not suitable for recommender systems because of the strict requirements on both real-time performance and effectiveness.

Contribution

  • Improves the effectiveness of user representation learning: attention is used to capture the various fields of user interests.
  • Improves the robustness and adaptivity of seed representation learning via local and global attention.
  • Realizes a real-time, high-performance look-alike model.

My own idea

Relations to what I had read

  • Method of concatenating feature fields. In other papers on CTR that I have read, different feature fields are concatenated directly. This causes overfitting on strongly relevant fields (such as interested tags) and underfitting on weakly relevant fields (such as shopping interests), so the recommended results end up determined by the few strongly relevant fields. Such models cannot learn comprehensively from multi-field features and lack diversity in their recommendations. This paper instead uses the attention-merge layer to learn effective relations among different fields of user features.
  • Besides, it uses high-order continuous features instead of categorical features. In my opinion, if we use low-order categorical features to express a user group, we can only construct the features with statistical methods, which loses most of the group's information. The high-order continuous features produced by representation learning, however, contain the crossings of users' various low-order features and can express user information more comprehensively. Moreover, the high-order features generalize better, avoiding an expression of memory trapped in historical data.

Shortcomings and potential change I assume

  • In this paper, it seems that only a few features are used to learn the representation, which may limit its effectiveness to some extent.
