The procedure was simple - I grabbed the benchmark repo (https://gitorious.org/openismus-playground/fts-benchmark), used it to built two sets of the databases with 17251 and 121587 movies and then just ran valgrind's massif while only performing queries on the already built databases. Here are the values of peak memory usage:
|17251||1.4 MiB||2.5 MiB||1.2 MiB|
|121587||3.1 MiB||2.6 MiB||5.2 MiB|
Of course, the peak memory usage by itself isn't terribly interesting value, what matters is also how does the engine work with memory over time, so let's look at that as well (images are courtesy of Milian's fantastic massif-visualizer, note that their scale is not relative to each other):
We can see that both Lucene and SQLite seem to build a cache on the first query and then use it, Xapian on the other hand doesn't seem to be keeping a cache, as the rapid drops in mem usage suggest, but maybe there's different explanation to that.
So that's it for memory usage while performing standard queries, but what I was particularly interested in was memory usage when performing wildcard queries (as I saw some strange behaviour here and there with Xapian). Therefore I added one simple wildcard query "T*" to the list of executed queries and ran it on the largest DBs (ones with 121587 movies). As you can imagine the "T*" query is really generic and it matches around 110 thousand documents from the dataset, that's why I also added a limit of 10k results per query to each backend (although that shouldn't make much difference).
Let's look at the results:
Now we can clearly see that Xapian uses huge amount of memory during expansion of the wildcard query (can this be considered a bug report? :)), SQLite has a couple of peaks there, but nothing to be worried too much about, and Lucene++ shines with its fairly constant (and really low) mem usage.
You saw the data, so I'll leave any conclusions up to you. ;)
The small number of changes I had to do to the original benchmark repository is available as a simple diff here.