Saturday, August 10, 2013

Riak $key index is slower than just get_keys

This is on Riak 1.4, using LevelDB as the backend so that secondary indexing is enabled, via:

{riak_kv, [{storage_backend, riak_kv_eleveldb_backend},...]}
{eleveldb, [
    {data_root, "/var/lib/riak/leveldb"}
]}


Data:

from riak import RiakClient, RiakPbcTransport
ds = RiakClient('127.0.0.1', port=1030, transport_class=RiakPbcTransport)

# store 99,999 small objects under keys bob_1 .. bob_99999
[ipython 11]: for x in xrange(1, 100000):
        ds.bucket('stuff').new('bob_'+str(x), data={'test_obj':x}).store()


Setup:

from riak.mapreduce import RiakMapReduce, RiakKeyFilter
# MapReduce over the 'stuff' bucket, filtered to keys starting with 'bob_'
mr = RiakMapReduce(ds)
mr.add('stuff')
mr.add_key_filters(RiakKeyFilter().starts_with('bob_'))

b = ds.bucket('stuff')

 

Results:

[ipython 21]: %time max([int(_id.get_key().split('bob_')[1]) for _id in mr.run()])
CPU times: user 4.09 s, sys: 0.31 s, total: 4.40 s
Wall time: 40.94 s
      Out[21]: 99999

[ipython 22]: %time max([int(_id.get_key().split('bob_')[1]) for _id in ds.index('stuff', "$key", ' ', '~').run()])
CPU times: user 4.16 s, sys: 0.32 s, total: 4.48 s
Wall time: 41.80 s
      Out[22]: 99999

[ipython 23]: %time max([int(_id.split('bob_')[1]) for _id in ds.bucket('stuff').get_keys()])
CPU times: user 0.48 s, sys: 0.01 s, total: 0.49 s
Wall time: 1.39 s
      Out[23]: 99999
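
For anyone wanting to repeat the comparison outside of IPython, here is a minimal sketch of the same measurement: pull the integer suffix off each key, take the max, and record wall and CPU time separately. The names `max_suffix`, `timed` and `fake_get_keys` are made up for this example, and `fake_get_keys` is only a stand-in for a real call such as `ds.bucket('stuff').get_keys()`. It is written for Python 3, unlike the session above.

```python
import time


def max_suffix(keys, prefix="bob_"):
    """Return the largest integer suffix among keys shaped like '<prefix><n>'."""
    return max(int(key.split(prefix)[1]) for key in keys)


def timed(fetch_keys):
    """Call fetch_keys(), reduce with max_suffix, return (result, wall_s, cpu_s)."""
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    result = max_suffix(fetch_keys())
    cpu_s = time.process_time() - cpu_start
    wall_s = time.perf_counter() - wall_start
    return result, wall_s, cpu_s


def fake_get_keys():
    """Stand-in for a real key listing; hypothetical, no Riak involved."""
    return ["bob_%d" % x for x in range(1, 100000)]


result, wall_s, cpu_s = timed(fake_get_keys)
print("max=%d wall=%.2fs cpu=%.2fs" % (result, wall_s, cpu_s))
```

To time a real query, pass a function that returns a list of key strings in place of `fake_get_keys`. The MapReduce and index queries above return link objects, so they would need `get_key()` called on each item first.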


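Granted, it's a super simple test, but the results are still confusing: the key-filter MapReduce and the $key index query each take about 41 s of wall time, while a plain get_keys() takes 1.39 s. Could it be a misconfigured LevelDB? Turning it on seems pretty basic.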
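
One way to read the timings in the results is to subtract total CPU time from wall time, which gives roughly how long the client spent not computing. A small sketch of that arithmetic, using only the numbers reported above; `waiting_seconds` is a made-up helper name, and treating the difference as time spent waiting on Riak is an assumption, since it says nothing about where the server spends it.

```python
# (total CPU seconds, wall seconds) copied from the %time output above
timings = {
    "key filter MapReduce": (4.40, 40.94),
    "$key index": (4.48, 41.80),
    "get_keys": (0.49, 1.39),
}


def waiting_seconds(cpu_total, wall):
    """Wall time not accounted for by client CPU, rounded to 2 decimals."""
    return round(wall - cpu_total, 2)


for name, (cpu_total, wall) in timings.items():
    print("%-22s waited ~%.2f s of %.2f s" % (name, waiting_seconds(cpu_total, wall), wall))
```

By this measure the two slow queries spend about 36.5 s and 37.3 s outside client CPU, against about 0.9 s for get_keys(), so the gap does not look like a client-side parsing cost.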
