Like many services on the web, Box's search technology started out by indexing every file into a single large database. After experiencing massive growth over the past few years (we now store over 360 million files, and that number increases by nearly one million every day), our team realized that Box search needed a speed boost to keep up with our users. We re-evaluated our technology and decided to switch to a fully distributed and scalable search platform built on the open source project Solr, powered by Lucene. If Solr sounds familiar, that's because odds are good you're already using it. Companies like Salesforce, LinkedIn and Twitter are successfully using Solr on a massive scale, and we’re excited to join them in using and developing on top of this mature, robust technology.
Great, you say, but what does that mean for me?
First, you should immediately notice the blazing-fast speed of Solr. Quick search results are available in less than half a second, and full search results don't take much longer. Second, full-text indexing for all your newly uploaded files now happens in under 20 minutes, so new documents become searchable sooner. We also switched to the Apache Tika project for text extraction, which lets us index file contents with high fidelity. As time goes on, expect these speeds to improve even further as we iterate on the architecture.
And most importantly, the new search platform is scalable not only in the sheer quantity of data it indexes, but also in the sophisticated features we can build on top of it. We’re excited to be developing and rolling out more advanced search options over the next several months.
Thanks to the cloud, you’ll be receiving all these improvements completely seamlessly, with no patches or updates to install. Happy searching!