Reposted from: https://blog.abevoelker.com/2017-01-03/generating-youtube-like-ids-in-postgres-using-plv8-and-hashids/

Recently on a Rails project, I ran into an issue where I wanted to expose a resource (let’s say it was a product) in a RESTful route, but I also didn’t want the URLs to be easily guessable. In other words, following Rails conventions my standard “show” actions would be URLs like https://example.com/products/1, https://example.com/products/2, and https://example.com/products/3, which are trivially guessable since we’re exposing the database’s auto-incrementing integer primary key as the resource ID. To prevent people from writing a super simple script that could scrape my whole product catalog, it would be nice if we could make the URLs not trivially guessable while still remaining publicly accessible for people who know them.

One approach that some people advocate is simply using UUIDs, but I think URLs like https://example.com/products/3bc95fb9-f0c1-4af8-989e-6ea8467879d3 simply look nasty, particularly when you get into nested sub-resources with their own UUIDs tacked on. It’s something I don’t want to subject my users’ eyes to or have potentially affect SEO / page rank due to the extraneous length. [1]

Hashids

A nice compromise here is using a library called Hashids, which can take an integer input (e.g. our primary keys) and a salt, and obfuscate [2] them into YouTube-like, short, non-guessable IDs like these: https://example.com/products/NV, https://example.com/products/6m, and https://example.com/products/yD.
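To give a feel for the "integer plus salt in, short string out" idea, here is a toy sketch loosely modeled on the `_shuffle`, `_toAlphabet`, and `_fromAlphabet` helpers that are visible in the minified library source reproduced later in this post. It is not the Hashids algorithm: it leaves out separators, guards, the leading "lottery" character, and minimum-length padding, so its output will not match real Hashids. The function names and the example salt are only for illustration.

```javascript
// Toy sketch only: this is NOT the full Hashids algorithm.
var ALPHABET = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ1234567890";

// Deterministically reorder the alphabet using the salt's character codes.
function shuffle(alphabet, salt) {
  if (!salt.length) return alphabet;
  var chars = alphabet.split("");
  for (var i = chars.length - 1, s = 0, sum = 0; i > 0; i--, s++) {
    s %= salt.length;
    var code = salt.charCodeAt(s);
    sum += code;
    var j = (code + s + sum) % i; // always in the range 0..i-1
    var tmp = chars[j];
    chars[j] = chars[i];
    chars[i] = tmp;
  }
  return chars.join("");
}

// Write a non-negative integer in base alphabet.length using the given digits.
function toAlphabet(n, alphabet) {
  var out = "";
  do {
    out = alphabet.charAt(n % alphabet.length) + out;
    n = Math.floor(n / alphabet.length);
  } while (n > 0);
  return out;
}

// Inverse of toAlphabet: read the digits back into an integer.
function fromAlphabet(str, alphabet) {
  var n = 0;
  for (var i = 0; i < str.length; i++) {
    n = n * alphabet.length + alphabet.indexOf(str.charAt(i));
  }
  return n;
}

var salted = shuffle(ALPHABET, "products_secret_salt_here");
var shortId = toAlphabet(123, salted);
console.log(shortId, fromAlphabet(shortId, salted));
```

The point of the sketch is that the salt only changes the order of the digits used for a base conversion, which is why the result stays short and is reversible by anyone who knows the salt.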

The Hashids project links to many implementations and documentation in various languages, including Ruby. Since my project is using Rails, a simple solution would be to add an after_create callback to my model to set an attribute using the Ruby library:

# == Schema Information
#
# Table name: products
#
#  id     :integer  not null, primary key
#  title  :string
#  hashid :string
#
# Indexes
#
#  index_products_on_hashid  (hashid)
#
class Product < ActiveRecord::Base
  after_create :save_hashid

  private

  # Runs after the INSERT, once the integer id is known.
  def save_hashid
    unless self.hashid
      h = Hashids.new(ENV["HASHID_SALT"], ENV["HASHID_MIN_LENGTH"].to_i)
      self.update!(hashid: h.encode(self.id))
    end
  end
end

This works! However there are at least two drawbacks:

  1. Creating a Product requires two round-trips to the database: an INSERT to create the record with a NULL value in the hashid column, then an UPDATE once Rails has the value of the integer id column and can calculate the Hashid value. This should be safe in terms of not leaving half-baked product records with NULL hashid values out there, since Rails runs after_create callbacks in the same transaction that creates the record, but it’s not good performance-wise.
  2. Somewhat related to the first drawback, the schema for this table is not optimal: the hashid column should really have a NOT NULL constraint and a UNIQUE index, but the callback approach forces the column to be nullable. It would be much preferable to lean on the database to enforce data integrity; at my job we’ve seen plenty of instances of bad data getting into loose schemas that Should Never Happen™ from the application’s point of view.

If only there were a way for Postgres to populate that column instead…

Executing JavaScript in Postgres using PL/V8

Luckily there is a way to do this: PL/V8, a Postgres extension that embeds the V8 JavaScript engine in the database! [3]

On Ubuntu, installing PL/V8 is as easy as running sudo apt-get install postgresql-9.6-plv8 (substitute 9.6 with whatever Postgres version you have installed) and restarting the database cluster with sudo service postgresql restart. Then, open a SQL prompt on the database you want to enable it for, and execute CREATE EXTENSION plv8;. Now you can write JavaScript functions in the database!

The first step is writing a function to load the Hashids library:

  CREATE OR REPLACE FUNCTION load_hashids() RETURNS VOID AS $$
  (function() {
  !function(t,e){if("function"==typeof define&&define.amd)define(["module","exports"],e);else if("undefined"!=typeof exports)e(module,exports);else{var s={exports:{}};e(s,s.exports),t.Hashids=s.exports}}(this,function(t,e){"use strict";function s(t,e){if(!(t instanceof e))throw new TypeError("Cannot call a class as a function")}Object.defineProperty(e,"__esModule",{value:!0});var h=function(){function t(t,e){for(var s=0;s<e.length;s++){var h=e[s];h.enumerable=h.enumerable||!1,h.configurable=!0,"value"in h&&(h.writable=!0),Object.defineProperty(t,h.key,h)}}return function(e,s,h){return s&&t(e.prototype,s),h&&t(e,h),e}}(),r=function(){function t(){var e=arguments.length<=0||void 0===arguments[0]?"":arguments[0],h=arguments.length<=1||void 0===arguments[1]?0:arguments[1],r=arguments.length<=2||void 0===arguments[2]?"abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ1234567890":arguments[2];s(this,t);var a=16,n=3.5,i=12,l="error: alphabet must contain at least X unique characters",u="error: alphabet cannot contain spaces",p="",o=void 0,f=void 0;this.escapeRegExp=function(t){return t.replace(/[-[\]{}()*+?.,\\^$|#\s]/g,"\\$&")},this.parseInt=function(t,e){return/^(\-|\+)?([0-9]+|Infinity)$/.test(t)?parseInt(t,e):NaN},this.seps="cfhistuCFHISTU",this.minLength=parseInt(h,10)>0?h:0,this.salt="string"==typeof e?e:"","string"==typeof r&&(this.alphabet=r);for(var g=0;g!==this.alphabet.length;g++)p.indexOf(this.alphabet.charAt(g))===-1&&(p+=this.alphabet.charAt(g));if(this.alphabet=p,this.alphabet.length<a)throw l.replace("X",a);if(this.alphabet.search(" ")!==-1)throw u;for(var c=0;c!==this.seps.length;c++){var b=this.alphabet.indexOf(this.seps.charAt(c));b===-1?this.seps=this.seps.substr(0,c)+" "+this.seps.substr(c+1):this.alphabet=this.alphabet.substr(0,b)+" "+this.alphabet.substr(b+1)}this.alphabet=this.alphabet.replace(/ /g,""),this.seps=this.seps.replace(/ /g,""),this.seps=this._shuffle(this.seps,this.salt),(!this.seps.length||this.alphabet.length/this.seps.length>n)&&(o=Math.ceil(this.alphabet.length/n),o>this.seps.length&&(f=o-this.seps.length,this.seps+=this.alphabet.substr(0,f),this.alphabet=this.alphabet.substr(f))),this.alphabet=this._shuffle(this.alphabet,this.salt);var d=Math.ceil(this.alphabet.length/i);this.alphabet.length<3?(this.guards=this.seps.substr(0,d),this.seps=this.seps.substr(d)):(this.guards=this.alphabet.substr(0,d),this.alphabet=this.alphabet.substr(d))}return h(t,[{key:"encode",value:function(){for(var t=arguments.length,e=Array(t),s=0;s<t;s++)e[s]=arguments[s];var h="";if(!e.length)return h;if(e[0]&&e[0].constructor===Array&&(e=e[0],!e.length))return h;for(var r=0;r!==e.length;r++)if(e[r]=this.parseInt(e[r],10),!(e[r]>=0))return h;return this._encode(e)}},{key:"decode",value:function(t){var e=[];return t&&t.length&&"string"==typeof t?this._decode(t,this.alphabet):e}},{key:"encodeHex",value:function(t){if(t=t.toString(),!/^[0-9a-fA-F]+$/.test(t))return"";for(var e=t.match(/[\w\W]{1,12}/g),s=0;s!==e.length;s++)e[s]=parseInt("1"+e[s],16);return this.encode.apply(this,e)}},{key:"decodeHex",value:function(t){for(var e=[],s=this.decode(t),h=0;h!==s.length;h++)e+=s[h].toString(16).substr(1);return e}},{key:"_encode",value:function(t){for(var e=void 0,s=this.alphabet,h=0,r=0;r!==t.length;r++)h+=t[r]%(r+100);e=s.charAt(h%s.length);for(var a=e,n=0;n!==t.length;n++){var i=t[n],l=a+this.salt+s;s=this._shuffle(s,l.substr(0,s.length));var u=this._toAlphabet(i,s);if(e+=u,n+1<t.length){i%=u.charCodeAt(0)+n;var p=i%this.seps.length;e+=this.seps.charAt(p)}}if(e.length<this.minLength){var o=(h+e[0].charCodeAt(0))%this.guards.length,f=this.guards[o];e=f+e,e.length<this.minLength&&(o=(h+e[2].charCodeAt(0))%this.guards.length,f=this.guards[o],e+=f)}for(var g=parseInt(s.length/2,10);e.length<this.minLength;){s=this._shuffle(s,s),e=s.substr(g)+e+s.substr(0,g);var c=e.length-this.minLength;c>0&&(e=e.substr(c/2,this.minLength))}return e}},{key:"_decode",value:function(t,e){var s=[],h=0,r=new RegExp("["+this.escapeRegExp(this.guards)+"]","g"),a=t.replace(r," "),n=a.split(" ");if(3!==n.length&&2!==n.length||(h=1),a=n[h],"undefined"!=typeof a[0]){var i=a[0];a=a.substr(1),r=new RegExp("["+this.escapeRegExp(this.seps)+"]","g"),a=a.replace(r," "),n=a.split(" ");for(var l=0;l!==n.length;l++){var u=n[l],p=i+this.salt+e;e=this._shuffle(e,p.substr(0,e.length)),s.push(this._fromAlphabet(u,e))}this._encode(s)!==t&&(s=[])}return s}},{key:"_shuffle",value:function(t,e){var s=void 0;if(!e.length)return t;for(var h=t.length-1,r=0,a=0,n=0;h>0;h--,r++){r%=e.length,a+=s=e.charAt(r).charCodeAt(0),n=(s+r+a)%h;var i=t[n];t=t.substr(0,n)+t.charAt(h)+t.substr(n+1),t=t.substr(0,h)+i+t.substr(h+1)}return t}},{key:"_toAlphabet",value:function(t,e){var s="";do s=e.charAt(t%e.length)+s,t=parseInt(t/e.length,10);while(t);return s}},{key:"_fromAlphabet",value:function(t,e){for(var s=0,h=0;h<t.length;h++){var r=e.indexOf(t[h]);s+=r*Math.pow(e.length,t.length-h-1)}return s}}]),t}();e.default=r,t.exports=e.default});
  })()
  $$ LANGUAGE plv8 IMMUTABLE STRICT;

(The above is simply the source for hashids.min.js wrapped in an immediately-executed anonymous function).

After executing that DDL to create the function, execute this SQL to run it:

SELECT load_hashids();

And now, the Hashids constant is ready for use in any JavaScript code inside PL/V8 functions for the remainder of the SQL session (each session gets its own global JS runtime context). We can now do a quick test of the Hashids library inside Postgres:

DO LANGUAGE PLV8 $$
var h = new Hashids('foo');
plv8.elog(NOTICE,h.encode(123));
$$;

You should see NOTICE: 1yR in the output, confirming it works!

As mentioned, this constant will only live as long as the SQL session. A new connection will require rerunning SELECT load_hashids(); to make it available again. Luckily, PL/V8 comes with support for a postgresql.conf configuration value we can use to load a custom PL/V8 function when the runtime is initialized. Simply add this to postgresql.conf:

plv8.start_proc = 'load_hashids'

And now that is all handled for us!

An example usage

Now let’s put it all together with an example that fixes my issue with products. First, let’s make a helper SQL function to generate Hashids that we’ll be able to call from other SQL functions (like triggers):

CREATE FUNCTION gen_hashid(salt TEXT, min_length BIGINT, key BIGINT) RETURNS TEXT AS $$
var h = new Hashids(salt, min_length);
return h.encode(key);
$$ LANGUAGE PLV8 IMMUTABLE STRICT;

This can be tested like so:

SELECT gen_hashid('foo', 5, 123);

Which should output 61yR6.

Next, here’s a little mockup of a products schema that uses a pre-insert trigger to automatically generate Hashids:

CREATE TABLE products (
  id BIGSERIAL,
  title TEXT NOT NULL,
  hashid TEXT NOT NULL
);

CREATE FUNCTION products_pre_insert() RETURNS trigger AS $$
BEGIN
  NEW.hashid := gen_hashid('products_secret_salt_here', 3, NEW.id);
  RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER products_pre_insert
  BEFORE INSERT ON products
  FOR EACH ROW EXECUTE PROCEDURE products_pre_insert();

Now let’s test it out by inserting some test records:

INSERT INTO products (title) VALUES ('foo');
INSERT INTO products (title) VALUES ('foo');
INSERT INTO products (title) VALUES ('bar');
INSERT INTO products (title) VALUES ('baz');

And now let’s see what SELECT * FROM products returns:

 id | title | hashid
----+-------+--------
  1 | foo   | WmX
  2 | foo   | 4zq
  3 | bar   | eJk
  4 | baz   | eEp
(4 rows)

Works beautifully! My problem is solved.

Note that in this example I hardcoded the salt and minimum length values in the products_pre_insert() function definition. In reality one would probably want to store salts in a table, since each table that uses Hashids should have its own salt, and salts should not be reused between test environments and production.
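As a rough idea of how a salt lookup could work from JavaScript inside PL/V8, here is a minimal sketch. It assumes a hypothetical table named hashid_settings with table_name, salt, and min_length columns, none of which exist in the example above. The `db` parameter stands in for the `plv8` object, whose `execute(sql, args)` call returns an array of row objects for a SELECT; a stub replaces it here so the sketch can run outside Postgres.

```javascript
// `db` stands in for the global `plv8` object available inside PL/V8 functions.
// `hashid_settings` and its columns are hypothetical names for this sketch.
function lookupHashidSettings(db, tableName) {
  var rows = db.execute(
    "SELECT salt, min_length FROM hashid_settings WHERE table_name = $1",
    [tableName]
  );
  // Fail loudly instead of silently encoding with an empty salt.
  if (rows.length !== 1) {
    throw new Error("expected exactly one hashid_settings row for " + tableName);
  }
  return { salt: rows[0].salt, minLength: Number(rows[0].min_length) };
}

// Stub used only so the sketch runs outside Postgres; it ignores the SQL text
// and answers from an in-memory object keyed by the first query argument.
var stubDb = {
  execute: function (sql, args) {
    var fakeRows = { products: { salt: "products-dev-salt", min_length: 3 } };
    var row = fakeRows[args[0]];
    return row ? [row] : [];
  }
};

var settings = lookupHashidSettings(stubDb, "products");
console.log(settings.salt, settings.minLength);
```

Throwing when no row is found is a deliberate choice in this sketch: a table without a configured salt would otherwise get IDs that are easy to reproduce.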

Footnotes

[1] I’m not saying it would necessarily affect SEO today, but SEO tends to trickle down from what Google et al. consider to be human-friendly, which I don’t think excessively long machine-readable IDs are. I certainly think URLs that scroll way past the address bar with seemingly-random gibberish discourage people who share URLs via address bar copy and paste.

[2] Although hash is in the name, the project makes clear it’s not a true cryptographic hash function (and thus not secure). But for my purposes, it’s exactly what I needed to discourage casual scraping while maintaining a certain level of user-friendliness that a very secure solution (UUIDs, real crypto hash functions) wouldn’t allow.

[3] There are other Postgres extensions that add support for other languages, like PL/Python, but PL/V8 is a “trusted” Postgres language, while PL/Python is “untrusted.” Trusted languages are safer as they come with certain protections on what actions they can perform, whereas untrusted languages can do anything that the database administrator can do! This is probably why AWS RDS supports PL/V8 but doesn’t support PL/Python.
