MongoDB Indexes

xiaoxiao, 2021-02-28

Introduction to indexes

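An index is essentially a tree (a B-tree in MongoDB): the smallest value sits in the leftmost leaf and the largest value in the rightmost leaf. An index speeds up queries, because they no longer need a full collection scan. It can also keep bad data out of the collection; for example, a unique index rejects duplicate values.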

Indexes can be built on ordinary fields, on a key inside an embedded document, and on the elements of an array.

How an index works:

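An index is created on a key in ascending or descending order. A query then runs in three steps:

1. MongoDB uses the query condition to locate the matching index entries.
2. It reads the document pointer stored in each entry, which is the location of the document on disk.
3. It follows that pointer to fetch the document.

The whole process never scans the entire collection, so it is fast.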

["along"] --> 0x0c965148 (document pointer) … ["zhangsan"] --> 0x0c965148 (document pointer)
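To make the lookup concrete, here is a minimal JavaScript sketch (plain Node.js, no MongoDB needed). It models an index as key/pointer pairs sorted by key and finds a key with a binary search instead of scanning every document. The "pointer" is just an array position standing in for a disk location, and the sample data is made up. Real MongoDB indexes are B-trees managed by the storage engine, so treat this only as an illustration.

```javascript
// Toy index: [key, pointer] pairs sorted by key. The pointer is an array
// position standing in for "where the document lives on disk".

// The "collection": documents in insertion order (hypothetical sample data).
const collection = [
  { name: "zhangsan", age: 25 },
  { name: "along", age: 30 },
  { name: "lisi", age: 18 },
  { name: "wangwu", age: 28 },
];

// Build an ascending index on `field`.
function buildIndex(docs, field) {
  return docs
    .map((doc, pointer) => [doc[field], pointer])
    .sort((a, b) => (a[0] < b[0] ? -1 : a[0] > b[0] ? 1 : 0));
}

// Binary search over the sorted entries: O(log n) comparisons instead of
// scanning every document. Assumes keys are unique, for simplicity.
function lookup(index, docs, key) {
  let lo = 0;
  let hi = index.length - 1;
  while (lo <= hi) {
    const mid = (lo + hi) >> 1;
    const [k, pointer] = index[mid];
    if (k === key) return docs[pointer]; // follow the pointer to the document
    if (k < key) lo = mid + 1;
    else hi = mid - 1;
  }
  return null; // no index entry, so no matching document
}

const nameIndex = buildIndex(collection, "name");
// Smallest key is leftmost, largest key is rightmost.
console.log(nameIndex[0][0], nameIndex[nameIndex.length - 1][0]); // along zhangsan
console.log(lookup(nameIndex, collection, "wangwu")); // { name: 'wangwu', age: 28 }
```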

Index types

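- Unique index (unique): guarantees that values are unique, with no duplicates
- Sparse index (sparse): indexes only the documents that contain the indexed key
- TTL index: gives documents a lifetime; they are deleted automatically when it runs out
- Text (full-text) index: makes it easier to search long text such as summaries and articles
- Compound index: an index on several fields, used to speed up queries
- 2d index: for queries on a two-dimensional plane
- Geospatial index: for geographic queries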

Index management

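// Create an index (ensureIndex is deprecated since MongoDB 3.0; createIndex(keys, options) is the current equivalent)
db.collection.ensureIndex(keys, options)
// List the indexes on a collection
db.collection.getIndexes()
// Drop an index by name
db.collection.dropIndex("indexName")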

1. Unique indexes

> db.foo.ensureIndex({"username": 1}, {"unique": true})
{ "createdCollectionAutomatically" : true, "numIndexesBefore" : 1, "numIndexesAfter" : 2, "ok" : 1 }
> db.foo.insert({"username": "mengday", "email": "mengday@163.com", "age": 26})
WriteResult({ "nInserted" : 1 })
// A duplicate username is rejected
> db.foo.insert({"username": "mengday", "email": "mengday2@163.com"})
WriteResult({ "nInserted" : 0, "writeError" : { "code" : 11000, "errmsg" : "E11000 duplicate key error collection: test.foo index: username_1 dup key: { : \"mengday\" }" } })
// The first document without the indexed key is inserted; its missing key is indexed as null
> db.foo.insert({"email": "mengday3@163.com"})
WriteResult({ "nInserted" : 1 })
// A second document without the key fails: its missing key is also null, and null is already in the index
> db.foo.insert({"email": "mengday4@163.com"})
WriteResult({ "nInserted" : 0, "writeError" : { "code" : 11000, "errmsg" : "E11000 duplicate key error collection: test.foo index: username_1 dup key: { : null }" } })
// A unique index on several fields (like a composite primary key in a relational database)
> db.user.ensureIndex({"username": 1, "nickname": 1}, {"unique": true})

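MongoDB is a schemaless NoSQL database, so each document in a collection may or may not contain a given key. Sometimes you want this rule: if a document contains the indexed key, its value must be unique; if it does not, skip the uniqueness check. To get that behavior, pass sparse: true when creating the index, which makes it a sparse index.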

> db.foo.drop()
true
> db.foo.ensureIndex({"username": 1}, {"unique": true, "sparse": true})
{ "createdCollectionAutomatically" : true, "numIndexesBefore" : 1, "numIndexesAfter" : 2, "ok" : 1 }
> db.foo.insert({"email": "mengday1@163.com"})
WriteResult({ "nInserted" : 1 })
> db.foo.insert({"email": "mengday2@163.com"})
WriteResult({ "nInserted" : 1 })
> db.foo.insert({"username": "mengday3", "email": "mengday3@163.com"})
WriteResult({ "nInserted" : 1 })
> db.foo.insert({"username": "mengday3", "email": "mengday3@163.com"})
WriteResult({ "nInserted" : 0, "writeError" : { "code" : 11000, "errmsg" : "E11000 duplicate key error collection: test.foo index: username_1 dup key: { : \"mengday3\" }" } })

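Sparse index: a document that lacks the key is not indexed at all, so the index holds no null entry for it. That is why inserting another document without the indexed key does not raise an error. Note: a sparse index can be combined with a unique index, but it can also be used on its own.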
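The two behaviors above can be modeled in a few lines of JavaScript. A plain unique index counts a missing key as null; a sparse unique index skips the document entirely. `ToyUniqueIndex` is a hypothetical class for illustration only: it tracks scalar keys in a `Set` and ignores everything else MongoDB does.

```javascript
// Toy unique index (illustrates the semantics only, not MongoDB's implementation):
// - non-sparse: a missing field is indexed as null, so a second document
//   without the field collides with the first one;
// - sparse: a document without the field gets no index entry at all.
class ToyUniqueIndex {
  constructor(field, { sparse = false } = {}) {
    this.field = field;
    this.sparse = sparse;
    this.keys = new Set(); // scalar keys already in the index
  }

  // Returns true if the insert succeeds, false on a duplicate key (like E11000).
  insert(doc) {
    const hasField = Object.prototype.hasOwnProperty.call(doc, this.field);
    if (!hasField && this.sparse) return true; // sparse: skip, no entry created
    const key = hasField ? doc[this.field] : null; // missing field counts as null
    if (this.keys.has(key)) return false;
    this.keys.add(key);
    return true;
  }
}

const plain = new ToyUniqueIndex("username");
console.log(plain.insert({ username: "mengday" }));   // true
console.log(plain.insert({ username: "mengday" }));   // false: duplicate value
console.log(plain.insert({ email: "a@163.com" }));    // true: indexed as null
console.log(plain.insert({ email: "b@163.com" }));    // false: null already present

const sparse = new ToyUniqueIndex("username", { sparse: true });
console.log(sparse.insert({ email: "a@163.com" }));   // true
console.log(sparse.insert({ email: "b@163.com" }));   // true: no entry, no conflict
console.log(sparse.insert({ username: "mengday3" })); // true
console.log(sparse.insert({ username: "mengday3" })); // false: duplicate value
```

In this model, a document that explicitly stores `username: null` still gets an entry in the sparse index, because the field exists. That matches how MongoDB's sparse indexes treat explicit nulls.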

My view on unique indexes:

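A unique index keeps a field's values unique so that the data stays valid. However, it checks every insert and throws an error on a duplicate, which is relatively costly. A unique index is only the last line of defense for data uniqueness: it is neither the best way nor the only way. For better performance, guarantee uniqueness and validity by other means where you can, and keep the load on the database as low as possible.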

2. TTL indexes

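A TTL index automatically deletes a document once a date/time field in it meets the expiry condition. It is a special index whose purpose is not to speed up queries. It works like a cache: once the cache time has passed, the document has expired and is deleted.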

// expireAfterSeconds: how long a document lives, in seconds; the indexed key must hold a date
// A document is deleted once the current time is later than the indexed time plus expireAfterSeconds
> db.foo.ensureIndex({"create_at": 1}, {"expireAfterSeconds": 60})
{ "createdCollectionAutomatically" : false, "numIndexesBefore" : 2, "numIndexesAfter" : 3, "ok" : 1 }
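The expiry rule in the comments above can be written as a small predicate. This is a sketch of the condition, not MongoDB's TTL code, and `isExpired` is a hypothetical name. Two details come from MongoDB's documentation rather than from this article:

- When the field holds an array of dates, the earliest one counts.
- A document whose field does not hold a date never expires.

```javascript
// Sketch of the TTL rule: expired once now > indexedDate + expireAfterSeconds.
function isExpired(doc, field, expireAfterSeconds, now = new Date()) {
  const value = doc[field];
  // Keep only real Date values (a single date or an array of dates).
  const dates = (Array.isArray(value) ? value : [value]).filter(
    (v) => v instanceof Date
  );
  if (dates.length === 0) return false; // not a date: never expires
  const earliest = Math.min(...dates.map((d) => d.getTime()));
  return now.getTime() > earliest + expireAfterSeconds * 1000;
}

const noon = new Date("2021-02-28T12:00:00Z");
console.log(isExpired({ create_at: new Date("2021-02-28T11:58:30Z") }, "create_at", 60, noon)); // true
console.log(isExpired({ create_at: new Date("2021-02-28T11:59:30Z") }, "create_at", 60, noon)); // false
console.log(isExpired({ create_at: "2021-02-28" }, "create_at", 60, noon)); // false: a string, not a date
```

Deletion is not instantaneous: MongoDB removes expired documents with a background task that runs every 60 seconds. A document can therefore remain for a short while after it has passed its threshold.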

3. Text indexes

// Create a text index on the content field; the key type "text" means a full-text index
> db.blog.ensureIndex({"content": "text"})
> db.blog.insert({ "title": "MongoDB: The Definitive Guide", "summary": "MongoDB Atlas Database as a Service", "content": "The best way to deploy, operate, and scale MongoDB in the cloud. Available on AWS, Azure, and GoogleCloud Platform. Easily migrate your data to MongoDB Atlas with zero downtime." })
// Query through the text index
> db.blog.find({"$text": {"$search": "best"}})
{ "_id" : ObjectId("5986c5c94fbaf781302810e2"), "title" : "MongoDB: The Definitive Guide", "summary" : "MongoDB Atlas Database as a Service", "content" : "The best way to deploy, operate, and scale MongoDB in the cloud. Available on AWS, Azure, and GoogleCloud Platform. Easily migrate your data to MongoDB Atlas with zero downtime." }

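A text index is meant for searching long text, whereas a regular expression can only match against string values. Note that building any index takes considerable time and resources, and a text index is even more expensive. Long indexed content must first be tokenized into words, which causes even more serious performance problems.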

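Create text indexes when MongoDB is not busy. MongoDB's tokenizer does not seem to support Chinese at present. For short content, say under 100 Chinese characters, MongoDB's text index is worth a try. Using it on long articles is a poor fit and will overload MongoDB. For large amounts of text, use other technologies such as Lucene, Solr, or Elasticsearch.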
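At its core, a text index is an inverted index that maps each token to the documents that contain it. The sketch below is a deliberate simplification built on assumptions:

- It splits text on anything that is not a letter or digit and lowercases the result.
- MongoDB additionally applies language-specific stemming and stop words, which the sketch omits.

It also shows why this style of tokenization does not work for Chinese: there are no separators between words, so a whole sentence becomes a single token.

```javascript
// Minimal inverted index: token -> Set of document ids.
// Assumption: tokens are runs of letters/digits, lowercased; no stemming, no stop words.
function tokenize(text) {
  return text.toLowerCase().split(/[^\p{L}\p{N}]+/u).filter(Boolean);
}

function buildTextIndex(docs, field) {
  const index = new Map();
  for (const doc of docs) {
    for (const token of tokenize(doc[field] ?? "")) {
      if (!index.has(token)) index.set(token, new Set());
      index.get(token).add(doc._id);
    }
  }
  return index;
}

// Look up one word and return the ids of the documents containing it.
function search(index, word) {
  return [...(index.get(word.toLowerCase()) ?? [])];
}

const blog = [
  { _id: 1, content: "The best way to deploy, operate, and scale MongoDB in the cloud." },
  { _id: 2, content: "我喜欢数据库" }, // Chinese for "I like databases"
];
const textIndex = buildTextIndex(blog, "content");
console.log(search(textIndex, "best"));   // [ 1 ]
console.log(search(textIndex, "数据库")); // []: the whole Chinese sentence became one token
```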

4. Compound indexes

An index can be created on a single field or on several fields. An index on several fields is called a compound (or composite) index.

> db.user.find()
{ "_id" : 1, "username" : "zhangsan", "age" : 25 }
{ "_id" : 2, "username" : "lisi", "age" : 18 }
{ "_id" : 3, "username" : "wangwu", "age" : 28 }
{ "_id" : 4, "username" : "fengwu", "age" : 27 }
// 1: build the index entries in ascending order of the key; -1: descending order
> db.user.ensureIndex({"username": 1})
{ "createdCollectionAutomatically" : false, "numIndexesBefore" : 1, "numIndexesAfter" : 2, "ok" : 1 }
// explain() shows details about how a query runs, such as whether it used an index
> db.user.find({"username": "wangwu"}).explain()
// Create a compound index (in background mode)
> db.user.ensureIndex({"username": 1, "age": 1}, {"background": true})
{ "createdCollectionAutomatically" : false, "numIndexesBefore" : 2, "numIndexesAfter" : 3, "ok" : 1 }
> db.user.find({"username": "wangwu", "age": 28})
// If a query does not use the index, hint() forces it to use the given index
> db.user.find().hint({"username": 1, "age": 1})
> db.user.update({"username": "zhangsan"}, {"$set": {"address": {"road": "yijiang", "code": 666}}})
WriteResult({ "nMatched" : 1, "nUpserted" : 0, "nModified" : 1 })
// Index a field inside an embedded document
> db.user.ensureIndex({"address.road": 1}, {"background": true})
{ "createdCollectionAutomatically" : false, "numIndexesBefore" : 3, "numIndexesAfter" : 4, "ok" : 1 }
> db.user.find({"address.road": "yijiang"}).explain()
// An index on an array indexes each element separately, not the array as a whole.
// Because every element gets its own entry, it costs more to maintain than an index on a plain value.
> db.user.update({"username": "zhangsan"}, {"$set": {"hobby": ["eat", "drink", "mm", "money"] }})
WriteResult({ "nMatched" : 1, "nUpserted" : 0, "nModified" : 1 })
> db.user.ensureIndex({"hobby": 1}, {"background": true})
{ "createdCollectionAutomatically" : false, "numIndexesBefore" : 4, "numIndexesAfter" : 5, "ok" : 1 }
> db.user.find({"hobby": "mm"}).explain()
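The sketch below shows, in simplified form, which index keys one document would contribute for a given field path:

- A dotted path such as "address.road" descends into the embedded document.
- An array value contributes one entry per element, which is why an array index costs more to maintain.

`indexKeys` is a hypothetical helper, not a MongoDB API, and it ignores edge cases such as empty arrays.

```javascript
// Simplified derivation of index keys for a field path (hypothetical helper).
function indexKeys(doc, path) {
  let values = [doc];
  for (const part of path.split(".")) {
    values = values
      .flatMap((v) => (Array.isArray(v) ? v : [v])) // look inside arrays of sub-documents
      .filter((v) => v !== null && typeof v === "object")
      .map((v) => v[part])
      .filter((v) => v !== undefined);
  }
  // An array value contributes one entry per element; a missing field is indexed as null.
  const keys = values.flatMap((v) => (Array.isArray(v) ? v : [v]));
  return keys.length > 0 ? keys : [null];
}

const zhangsan = {
  _id: 1,
  username: "zhangsan",
  age: 25,
  address: { road: "yijiang", code: 666 },
  hobby: ["eat", "drink", "mm", "money"],
};
console.log(indexKeys(zhangsan, "username"));     // [ 'zhangsan' ]
console.log(indexKeys(zhangsan, "address.road")); // [ 'yijiang' ]
console.log(indexKeys(zhangsan, "hobby"));        // four entries, one per element
console.log(indexKeys({ _id: 2 }, "hobby"));      // [ null ]
```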

Index options:

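- name: "": a custom name for the index; if omitted, the system generates a default name
- background: true: builds the index in background mode; the default is foreground mode. Building an index is time-consuming and resource-intensive. A foreground build is much faster but blocks incoming reads and writes. A background build does not block them; instead, it pauses while they run, so it is much slower. (Since MongoDB 4.2 this option is ignored and all index builds use an optimized process.)
- unique: true: creates a unique index
- dropDups: true: forcibly deletes the other documents that duplicate a key. By default nothing is deleted, and the index build fails if duplicate key values exist. (This option was removed in MongoDB 3.0.)
- sparse: true: creates a sparse index, which has entries only for documents that contain the indexed key and skips documents that lack it. It can be combined with a unique index or used on its own.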
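The default name mentioned under `name` follows a documented convention: each field name and its direction (or index type), all joined with underscores. That convention is where `username_1` in the E11000 error messages comes from. Here is a JavaScript sketch of it; `defaultIndexName` is a hypothetical helper.

```javascript
// Default index name: "<field>_<direction or type>" pairs joined with underscores.
function defaultIndexName(keys) {
  return Object.entries(keys)
    .map(([field, type]) => `${field}_${type}`)
    .join("_");
}

console.log(defaultIndexName({ username: 1 }));          // username_1
console.log(defaultIndexName({ username: 1, age: -1 })); // username_1_age_-1
console.log(defaultIndexName({ content: "text" }));      // content_text
```

JavaScript preserves the insertion order of ordinary string keys, so the field order of the key pattern survives. Integer-like field names would be reordered by the engine, and this sketch does not handle that case.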

Notes:

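- For a compound index on the same keys, a different key order makes a different index: {"username": 1, "age": 1} and {"age": 1, "username": 1} are different indexes.
- For a compound index on the same keys, a different sort direction on any key also makes a different index: {"username": 1, "age": 1} and {"username": 1, "age": -1} are different indexes.
- For the same keys in the same order, multiplying every key's direction by -1 gives an index that is equivalent for sorting, because an index can be traversed in either direction. For example, {"username": 1, "age": -1} and {"username": -1, "age": 1} are equivalent in this sense.
- Compound indexes provide implicit indexes: an index on several fields also acts as an index on each of its prefixes. For example, the compound index {"field1": 1, "field2": -1, "field3": -1, "field4": -1} can also serve as {"field1": 1}, {"field1": 1, "field2": -1}, and {"field1": 1, "field2": -1, "field3": -1}.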
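The last two notes can be expressed as small JavaScript functions. Key patterns are written as arrays of [field, direction] pairs so that the order is explicit. This representation and both helper names are assumptions made for the illustration, not MongoDB APIs.

```javascript
// All proper prefixes of a compound index: the "implicit" indexes it can serve.
function prefixes(pattern) {
  const result = [];
  for (let n = 1; n < pattern.length; n++) result.push(pattern.slice(0, n));
  return result;
}

// Two patterns support the same sort orders if they are identical, or if
// every direction is flipped (the index can then be walked backwards).
function sameSortSupport(a, b) {
  if (a.length !== b.length) return false;
  const same = a.every(([f, d], i) => f === b[i][0] && d === b[i][1]);
  const reversed = a.every(([f, d], i) => f === b[i][0] && d === -b[i][1]);
  return same || reversed;
}

const idx = [["field1", 1], ["field2", -1], ["field3", -1], ["field4", -1]];
console.log(prefixes(idx).map((p) => p.map(([f]) => f).join(",")));
// [ 'field1', 'field1,field2', 'field1,field2,field3' ]

console.log(sameSortSupport([["username", 1], ["age", -1]], [["username", -1], ["age", 1]])); // true
console.log(sameSortSupport([["username", 1], ["age", 1]], [["username", 1], ["age", -1]]));  // false
console.log(sameSortSupport([["username", 1], ["age", 1]], [["age", 1], ["username", 1]]));   // false
```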

Index efficiency

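- The higher the cardinality of the indexed key, the more efficient the index. Cardinality is the number of distinct values a field has, like COUNT(DISTINCT key) in SQL. Gender has only 2 values, while usernames and email addresses are almost all different. The higher the cardinality, the faster the index filters out non-matching documents; with low cardinality it cannot narrow down the candidates quickly.
- Some special operators cannot use indexes effectively, such as $where and $exists.
- Negation operators generally make poor use of indexes, such as $not, $nin, and $ne.
- Prefer $in to $or where possible. $or runs a separate query for each clause and then merges the results, similar to a SQL UNION, whereas $in is a single query.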
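The $in advice can be applied mechanically when every $or clause is a plain equality on the same field. MongoDB's own documentation makes the same recommendation for that case. `orToIn` is a hypothetical helper: it rewrites such a filter and leaves every other filter untouched. It deliberately skips operator expressions, dates, and other object values.

```javascript
// Rewrite { $or: [{ f: a }, { f: b }] } as { f: { $in: [a, b] } } when safe.
function orToIn(filter) {
  const clauses = filter.$or;
  // Only handle a filter whose sole top-level key is a non-empty $or array.
  if (!Array.isArray(clauses) || clauses.length === 0 || Object.keys(filter).length !== 1) {
    return filter;
  }
  const field = Object.keys(clauses[0])[0];
  const isPlainEquality = (c) =>
    Object.keys(c).length === 1 &&
    Object.prototype.hasOwnProperty.call(c, field) &&
    (c[field] === null || typeof c[field] !== "object"); // skip {$gt: ...}, dates, etc.
  if (field === undefined || field.startsWith("$") || !clauses.every(isPlainEquality)) {
    return filter;
  }
  return { [field]: { $in: clauses.map((c) => c[field]) } };
}

console.log(JSON.stringify(orToIn({ $or: [{ username: "lisi" }, { username: "wangwu" }] })));
// {"username":{"$in":["lisi","wangwu"]}}
console.log(JSON.stringify(orToIn({ $or: [{ username: "lisi" }, { age: 18 }] })));
// {"$or":[{"username":"lisi"},{"age":18}]}  (different fields: left unchanged)
```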

When should you create an index?

Create an index when you need to optimize a query, or when a particular kind of query runs frequently.

Which fields should be indexed?

In general, index keys with high cardinality, or at least place the higher-cardinality keys near the front of a compound index.
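The cardinality rule can be turned into a rough helper. It counts the distinct values of each candidate field in sample documents, then orders the compound index keys from highest cardinality to lowest. This is only this article's heuristic written as JavaScript: `cardinality` and `suggestKeyOrder` are hypothetical names, and the sample data is made up. In practice the query shape also matters, such as which fields are matched by equality, which are sorted on, and which are filtered by range.

```javascript
// Cardinality: number of distinct values of a field, like COUNT(DISTINCT key).
function cardinality(docs, field) {
  return new Set(docs.map((d) => d[field])).size;
}

// Order candidate fields by descending cardinality and build an ascending key pattern.
function suggestKeyOrder(docs, fields) {
  return [...fields]
    .sort((a, b) => cardinality(docs, b) - cardinality(docs, a))
    .reduce((keys, f) => ({ ...keys, [f]: 1 }), {});
}

const users = [
  { username: "zhangsan", gender: "m", age: 25 },
  { username: "lisi", gender: "m", age: 18 },
  { username: "wangwu", gender: "f", age: 28 },
  { username: "fengwu", gender: "f", age: 25 },
];
console.log(cardinality(users, "gender"));   // 2
console.log(cardinality(users, "username")); // 4
console.log(suggestKeyOrder(users, ["gender", "age", "username"]));
// { username: 1, age: 1, gender: 1 }
```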

When reposting, please cite the original URL: https://www.6miu.com/read-46823.html
