Massive, multidimensional data driving you crazy? Efficient search methods are here

Published: 2019-09-16 23:02:50 | Category: Programming | Source: 读芯术

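Introduction: People's interactions with everything around them generate large volumes of spatiotemporal data. What should we do when we need to retrieve that historical data on demand? Faced with massive, multidimensional databases and no efficient search method, we can only look on helplessly. Don't worry: this article uses detailed code to walk through efficient search techniques step by step.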

2. 95,147 records

postgres=# explain (analyze,verbose,timing,costs,buffers) select * from tbl where c2<10;  
 QUERY PLAN  
---------------------------------------------------------------------------------------------------------------------------------  
 Bitmap Heap Scan on postgres.tbl (cost=835.73..112379.10 rows=99785 width=73) (actual time=69.243..179.388 rows=95147 loops=1)  
 Output: id, info, crt_time, pos, c1, c2, c3  
 Recheck Cond: (tbl.c2 < 10)  
 Heap Blocks: exact=88681  
 Buffers: shared hit=88734  
 -> Bitmap Index Scan on idx_tbl_1 (cost=0.00..810.79 rows=99785 width=0) (actual time=53.612..53.612 rows=95147 loops=1)  
 Index Cond: (tbl.c2 < 10)  
 Buffers: shared hit=53  
 Planning time: 0.094 ms  
 Execution time: 186.201 ms  
(10 rows)

3. 149,930 records (to return results quickly, PostgreSQL merges the bitmaps from several index scans before scanning the heap)

postgres=# explain (analyze,verbose,timing,costs,buffers) select * from tbl where c1 in (1,2,3,4,100,200,99,88,77,66,55) or c2 <10;  
 QUERY PLAN  
------------------------------------------------------------------------------------------------------------------------------------  
 Bitmap Heap Scan on postgres.tbl (cost=1694.23..166303.58 rows=153828 width=73) (actual time=98.988..266.852 rows=149930 loops=1)  
 Output: id, info, crt_time, pos, c1, c2, c3  
 Recheck Cond: ((tbl.c1 = ANY ('{1,2,3,4,100,200,99,88,77,66,55}'::integer[])) OR (tbl.c2 < 10))  
 Heap Blocks: exact=134424  
 Buffers: shared hit=134565  
 -> BitmapOr (cost=1694.23..1694.23 rows=153936 width=0) (actual time=73.763..73.763 rows=0 loops=1)  
 Buffers: shared hit=141  
 -> Bitmap Index Scan on idx_tbl_1 (cost=0.00..806.54 rows=54151 width=0) (actual time=16.733..16.733 rows=54907 loops=1)  
 Index Cond: (tbl.c1 = ANY ('{1,2,3,4,100,200,99,88,77,66,55}'::integer[]))  
 Buffers: shared hit=88  
 -> Bitmap Index Scan on idx_tbl_1 (cost=0.00..810.79 rows=99785 width=0) (actual time=57.029..57.029 rows=95147 loops=1)  
 Index Cond: (tbl.c2 < 10)  
 Buffers: shared hit=53  
 Planning time: 0.149 ms  
 Execution time: 274.548 ms  
(15 rows)

4. 60,687 records (even with the KNN performance optimization, the query still takes 195 ms).

postgres=# explain (analyze,verbose,timing,costs,buffers) select * from ff(point (0,0) ,5,1000000);  
 QUERY PLAN  
----------------------------------------------------------------------------------------------------------------------  
 Function Scan on postgres.ff (cost=0.25..10.25 rows=1000 width=6) (actual time=188.563..192.114 rows=60687 loops=1)  
 Output: ff  
 Function Call: ff('(0,0)'::point, '5'::double precision, 1000000)  
 Buffers: shared hit=61296  
 Planning time: 0.029 ms  
 Execution time: 195.097 ms  
(6 rows)

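Let's see how long the same query takes without the KNN optimization.

The result is striking: the fully optimized version is an order of magnitude faster.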

5. 2,640,751 records

Scan each condition separately through its own index, collect the matching ctids, and then run a ctid scan to fetch the rows.
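The idea can be sketched with a toy model in Python. This is not PostgreSQL code: the in-memory "heap", the `build_index` and `ctid_scan` helpers, and the sample rows are all hypothetical, and the sketch assumes the conditions are combined with OR (use set intersection instead of union for AND). Each index is scanned on its own, the resulting ctids (block, offset) are merged, and the rows are then fetched by ctid in physical order.

```python
from collections import defaultdict


def build_index(rows, column):
    """Map each value of `column` to the ctids of the rows holding it."""
    index = defaultdict(list)
    for ctid, row in rows.items():
        index[row[column]].append(ctid)
    return index


def ctid_scan(rows, ctids):
    """Fetch rows by ctid in physical (block, offset) order."""
    return [rows[ctid] for ctid in sorted(ctids)]


# Hypothetical heap: ctid (block, offset) -> row
rows = {
    (0, 1): {"id": 1, "c1": 1, "c2": 50},
    (0, 2): {"id": 2, "c1": 7, "c2": 3},
    (1, 1): {"id": 3, "c1": 9, "c2": 80},
    (2, 1): {"id": 4, "c1": 2, "c2": 5},
}
idx_c1 = build_index(rows, "c1")
idx_c2 = build_index(rows, "c2")

# Scan each condition through its own index and keep only the ctids.
ctids_c1 = {ctid for v in (1, 2) for ctid in idx_c1.get(v, [])}           # c1 in (1, 2)
ctids_c2 = {ctid for v, hits in idx_c2.items() if v < 10 for ctid in hits}  # c2 < 10

# Merge the ctid sets (OR), then fetch each matching row exactly once.
matched = ctid_scan(rows, ctids_c1 | ctids_c2)
matched_ids = [row["id"] for row in matched]
print(matched_ids)
```

Because the ctid sets are merged before any row is fetched, a row that satisfies several conditions is read only once.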

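Now, let's break the process down:

First, let's look at the combined query on time and object attributes. The results are impressive: with BitmapOr, the query can skip most data blocks, and the scan takes less time than a single-index scan.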
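Why a merged bitmap lets the scan skip blocks can be shown with a small Python sketch. It is a simplified model under stated assumptions, not PostgreSQL's implementation: the bitmap here records only block numbers (similar to a lossy bitmap, whereas PostgreSQL tracks individual tuples when memory allows), and the heap contents and helper names are hypothetical.

```python
def bitmap_index_scan(index_entries, predicate):
    """Return the set of heap blocks containing at least one matching entry."""
    return {block for value, (block, _offset) in index_entries if predicate(value)}


def bitmap_heap_scan(heap, blocks, recheck):
    """Visit only the flagged blocks, in physical order, rechecking each row."""
    result = []
    for block in sorted(blocks):
        result.extend(row for row in heap[block] if recheck(row))
    return result


# Hypothetical heap: block number -> rows stored as (c1, c2)
heap = {
    0: [(1, 50), (7, 60)],
    1: [(8, 70), (9, 80)],
    2: [(5, 3), (6, 90)],
    3: [(4, 40), (3, 30)],
    4: [(9, 99), (8, 88)],
}

# One index per column: (value, (block, offset)) entries.
idx_c1 = [(row[0], (b, i)) for b, rows in heap.items() for i, row in enumerate(rows)]
idx_c2 = [(row[1], (b, i)) for b, rows in heap.items() for i, row in enumerate(rows)]

bitmap_c1 = bitmap_index_scan(idx_c1, lambda v: v in {1, 2})  # c1 in (1, 2)
bitmap_c2 = bitmap_index_scan(idx_c2, lambda v: v < 10)       # c2 < 10

# BitmapOr: union the two bitmaps before touching the heap.
blocks_to_visit = bitmap_c1 | bitmap_c2

found = bitmap_heap_scan(
    heap, blocks_to_visit, lambda r: r[0] in {1, 2} or r[1] < 10
)
skipped_blocks = len(heap) - len(blocks_to_visit)
print(found, skipped_blocks)
```

In this toy heap only two of the five blocks are read. The recheck step corresponds to the `Recheck Cond` line in the plans above.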

