海量、多维数据让人抓狂？高效搜索方法看这里

发布时间：2019-09-16 23:02:50 所属栏目：编程来源：读芯术

导读：人与世界万物的互动会产生大量的时空数据。那么，当我们需要随时调用过去的数据时，改怎么办?尤其是面对各种海量、多维度的数据库，如果没有高效的搜索方法，我们只能望洋兴叹、束手无策。别担心，本文将用详细的代码，手把手来传授高效搜索法的绝技! 对

4. 创建时间索引

postgres=# create index idx_tbl_2 on tbl using btree (crt_time);  
CREATE INDEX

5. 创建空间索引

postgres=# create index idx_tbl_3 on tbl using gist (pos);  
CREATE INDEX

6. 生成数据布局以方便后续查询

postgres=# select min(crt_time),max(crt_time),count(*) from tbl;  
 min | max | count  
----------------------------+----------------------------+----------  
 2017-07-22 17:59:34.136497 | 2017-07-22 18:01:27.233688 | 50000000  
(1 row)

7. 创建一个极限KNN查询函数

create or replace function ff(point, float8, int) returns setof tid as  
$ 
declare  
 v_rec record;  
 v_limit int := $3;  
begin  
 set local enable_seqscan=off; -- Force index that exits when scanned rows reach a specific number  
 for v_rec in  
 select *,  
 (pos <-> $1) as dist,  
 ctid  
 from tbl  
 order by pos <-> $1  
 loop  
 if v_limit <=0 then  
 -- raise notice "Sufficient data obtained"  
 return;  
 end if;  
 if v_rec.dist > $2 then  
 -- raise notice "All matching points returned"  
 return;  
 else  
 return next v_rec.ctid;  
 end if;  
 v_limit := v_limit -1;  
 end loop;  
end;  
$ 
 language plpgsql strict volatile;  
postgres=# select * from ff(point (100,100) ,100,100) ;  
 ff  
-------------  
 (407383,11)  
 (640740,9)  
 (26073,51)  
 (642750,34)  
...  
(100 rows)  
Time: 1.061 ms

8. CTID合并检索

显示符合以下条件的记录

(  
c1 in (1,2,3,4,100,200,99,88,77,66,55)  
 or  
c2 < 10  
)  
 and  
pos <-> point (0,0) < 5  
 and  
crt_time between 2017-07-22 17:59:34 and 2017-07-22 17:59:40 ;

首先，分别查看每个条件，找匹配一个条件的记录数量，以及在索引扫描上所花时长。

1. 54,907条记录

postgres=# explain (analyze,verbose,timing,costs,buffers) select * from tbl where c1 in (1,2,3,4,100,200,99,88,77,66,55);  
 QUERY PLAN  
-------------------------------------------------------------------------------------------------------------------------------  
 Bitmap Heap Scan on postgres.tbl (cost=820.07..65393.94 rows=54151 width=73) (actual time=23.842..91.911 rows=54907 loops=1)  
 Output: id, info, crt_time, pos, c1, c2, c3  
 Recheck Cond: (tbl.c1 = ANY ( {1,2,3,4,100,200,99,88,77,66,55} ::integer[]))  
 Heap Blocks: exact=52778  
 Buffers: shared hit=52866  
 -> Bitmap Index Scan on idx_tbl_1 (cost=0.00..806.54 rows=54151 width=0) (actual time=14.264..14.264 rows=54907 loops=1)  
 Index Cond: (tbl.c1 = ANY ( {1,2,3,4,100,200,99,88,77,66,55} ::integer[]))  
 Buffers: shared hit=88  
 Planning time: 0.105 ms  
 Execution time: 94.606 ms  
(10 rows)

（编辑：核心网）

【声明】本站内容均来自网络，其相关言论仅代表作者个人观点，不代表本站立场。若无意侵犯到您的权利，请及时与联系站长删除相关内容!

7/21

首页

尾页

把not in 更换成not e	mydumper工具运用介绍
别花冤枉钱买专栏了！	Mysql索引类型创建错误