PlayNicely - Beautiful collaboration for software developers

Redis multi-field searching and filtering

Posted by adam



_This post follows on from A fast, fuzzy, full-text index using Redis. Also, there are still a few places left for the first London Redis Meetup on Wednesday._

Searching across multiple fields is a required feature of any bug tracking system. How else would you find that bug which you know Jeff updated a couple of weeks ago which mentioned Nginx?

In a relational database such as MySQL/PostgreSQL etc, the solution is pretty straight forward. You take your parameters, concoct an SQL statement, and then execute it against the database. Of course, this may be very complex depending on your schema.

But PlayNice.ly uses Redis, which is a rather different creature. We have no SQL to create, and no tables so search through (it is known as NoSQL for a reason). So what do we do?

Working with sets

One thing Redis is great at handling is sets. A set is a list of unique, unordered, values. This list can be combined with other sets in a number of ways. Imagine that you have two sets:

# Our example sets
set_a = set([0, 2, 4, 6, 8])
set_b = set([0, 3, 6, 9])

We can combine these sets in the following ways:

1. UNION – A union:http://code.google.com/p/redis/wiki/SunionCommand just means “Give me a new set with all the values in set_a and set_b”. Remembering that sets contain unique values, the result would be:

# The result of a union on set_a and set_b
set_union = set([0, 2, 3, 4, 6, 8, 9])

2. INTERSECT – A intersection:http://code.google.com/p/redis/wiki/SinterCommand means “Give me a new set with the common to both set_a and set_b”. For example:

# The result of an intersection on set_a and set_b
set_union = set([0, 6])

3. DIFFERENCE – A difference:http://code.google.com/p/redis/wiki/SdiffCommand means “Give me a new set of all the values in set_a which are not in set_b”. For example:

# The difference between set_a and set_b
set_union = set([2, 4, 8])

How is this useful?

Some of you may have noticed that the three examples above are actually doing OR/AND/NOT commands respectively. If we have a set of items for each search criteria, when we can produce results for any query we like. For example:

# Get all the open items that Jeff is responsible for
 
# Some example data, you would need to store these sets
# in Redis and maintain them in your model
open_items = set([1, 2, 4, 6])
jeffs_items = set([2, 5, 6, 7])
 
results = redis.sinter(open_items, jeffs_items)
 
print results
# > set([2, 6])

Ok, that’s reasonably straight-forward. However, it is the job of the application to maintain these lookup sets (@open_items@ and jeffs_items, in the previous example). Redis will not maintain these lookups for you (and nor should it). This means that you really need to wrap your data-access logic up in some kind of model, if you are not doing so already. The idea is that the model will maintain all these lookup sets automatically so that the rest of your application can get on with doing its job.

Now that is out of the way, how about getting the new or open items that Jeff is responsible for?

# Get all the open and new items that Jeff is responsible for
open_items = set([1, 2, 4, 6])
new_items = set([5, 10])
jeffs_items = set([2, 5, 6, 7])
 
new_or_open_items = redis.sunion(open_items, new_items)
results = redis.sinter(jeffs_items, new_or_open_items)
 
print results
# > set([2, 5, 6])

Does that make sense to you? It is getting a bit unwieldy now, but hopefully you can see how it works. Let’s do one last example to show how you can perform a ‘NOT’:

# Get all items that Jeff is NOT responsible for
all_items = set([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
jeffs_items = set([2, 5, 6, 7])
 
results = redis.sdiff(all_items, jeffs_items)
 
print results
# > set([1, 3, 4, 8, 9, 10])

Making it more manageable (the easy way)

This is all well and good, but even these simple examples get pretty cumbersome after a while. Clearly we need to wrap this logic up into something more manageable.

A simple example could be to create a ItemFilter class (in PlayNice.ly, items are things like ‘bugs’ and ‘tasks’) which could be used like this:

# How about wrapping the logic in a ItemFilter class? It could work like this...
myFilter = ItemFilter()
myFilter.statuses = ("closed", "qa")
myFilter.owners = (5) # 5 is Jeff's user ID
myFilter.type_name = "bug"
 
matching_item_ids = myFilter.execute()

Making it more manageable (the hard way)

Alternatively, you may choose to go a step further and add the ability to parse a search query such as:

(status:closed OR status:qa) AND owner:5 AND type_name:bug

You can then parse that query into the a tree structure where each node is a object capable of returning a Redis key for its own section of the query. For example:

The tree structure of the above query

You can then query the root node for its Redis key, at which point the following happens:

  1. You query the root node for its Redis key. To provide this, the root node needs the Redis keys of its children, so it queries each of those…
  2. The OR node also needs the Redis keys for its children, so it queries those…
  3. The child of the OR node returns items_by_status:closed and items_by_status:qa respectively
  4. The OR node can then perform an SUNIONSTORE into a temporary key, tmp:1
  5. The root node then queries the right-hand AND node for its Redis key…
  6. The right-hand AND node also needs the Redis keys for its children, so it queries those…
  7. The children of the right-hand AND node returns items_by_owner:5 and items_by_type_name_:bug respectively
  8. The right-hand AND node can then perform an SINTERSTORE into a temporary key, tmp:2
  9. The root node can the perform an SINTERSTORE into temporary key, tmp:3
  10. The Redis key tmp:3 is then returned, containing a SET with the results of the query!

Using with the full-text search

Readers of the previous Redis post may also notice that we can incorporate the full-text fuzzy search here. All we have to do is perform SUNION or SINTER commands against the sets in our full-text index. For example, we could perform an SINTER on word:AKSPXN and items_by_type_name_:bug to get all the bugs that contain the word ‘exception’.

Summing up

Creating a versatile multi-field filtering system in Redis is not a trivial task by any stretch of the imagination. However, PlayNice.ly benefits from the speed of this system. We are able to refresh results within the app while the user is still typing their query, and queries generally take tens of milliseconds to run. All of this gives the user a very smooth experience, and that is what counts. :)

This is our first attempt at creating such a system, and we would love to hear your thoughts on how we could improve it (either in the comments, or at the London Redis Meetup this week).



5 Responses to “Redis multi-field searching and filtering”

  1. finnw says:

    Someone’s reinvented SETL (again.)

  2. Cameron says:

    Great post. Really look forward to seeing more about how you are using redis. I am a ruby developer, but really like the look of redis for my next project. Would be interested to hear about how you are overcoming any memory issues by using redis for your main persistence store.

  3. Nick says:

    Great stuff. Thanks for sharing.

  4. Josh says:

    This is great but what do you do about the weighting aspect or subsets? For example, say you wanted to get everything that had a status of closed OR was a bug OR is Jeff’s. Obviously something that has a status of closed AND is a bug AND is Jeff’s should be weighted highest, followed by (Closed + bug or Closed + Jeff or Jeff + bug) and then anything that is just closed, a bug or jeffs but not the other.

    Seems you can hit Redis with every factorial, which might be fine with a factorial of 3 (so just 6 searches) but what happens if you have TEN! That’s 3,628,800 queries!

    Any thoughts on how you would use Redis to efficiently get weighted subsets?

  5. [...] have been covering some fairly complex Redis topics here so far, so I thought it would be nice to do a good ol' getting started guide. [...]

Leave a Reply