PodParley

Episode 117: Full Text Search


An episode of the Faceoff Show podcast, hosted by Faceoff Staff, titled "Episode 117: Full Text Search" was published on April 19, 2011 and runs 34 minutes.

April 19, 2011 · 34m · Faceoff Show


Add enterprise level search into your site.

News and Follow/Ups - 01:00
  Square now being sold in Apple’s store
  Check-ins dying out?
  Dropbox: 25 million users

Geek Tools - 14:13
  Yikerz! - Super fun magnet game

Webapps - 16:12
  Surfboard - Flipboard as a web app
  InstaLyrics - Find lyrics quickly

Full Text Search - 22:11

Options

Google Custom Search
  Commercial
  Benefits
    Super fast to set up
    Easy to implement
    Ability to add AdSense to search results
  Downsides
    Unable to adjust content ranking or do custom integration
    Mainly for indexing HTML pages, not search queries and other text

Sphinx
  “Searching via SphinxAPI is as simple as 3 lines of code, and querying via SphinxQL is even simpler, with search queries expressed in good old SQL.”
  Open source with commercial support
  Results are ranked by relevance by default. You can set up your own sorting should you wish, and give specific fields higher weightings.
  The search service daemon (searchd) is pretty low on memory usage, and you can set limits on how much memory the indexer process uses too.
  APIs for Java, PHP, Python, Ruby, Perl, C, and other languages
  Written in C++
  Stats
    60+ MB/sec per server
    500+ queries/sec
    Biggest known Sphinx cluster indexes 5 billion documents, resulting in over 6 TB of data.
    Busiest known one is, unsurprisingly, Craigslist, which serves 50+ million search queries/day.
  Companies using Sphinx
    Craigslist
    Slashdot
    Mozilla
    WordPress.org

Lucene
  Developed by the Apache Software Foundation
  Open source
  Written in Java
  Search types
    ranked searching -- best results returned first
    many powerful query types: phrase queries, wildcard queries, proximity queries, range queries and more
    fielded searching (e.g., title, author, contents)
    date-range searching
    sorting by any field
    multiple-index searching with merged results
    allows simultaneous update and searching
  Stats
    over 95GB/hour on modern hardware
    small RAM requirements -- only 1MB heap
    index size roughly 20-30% the size of text indexed

Solr
  Lucene is a library, whereas Solr is a server that supports XML and REST.
  Benefits over Sphinx
    Solr is easily embeddable in Java applications.
    Solr can be integrated with Hadoop to build distributed applications.
    Solr can index proprietary formats like Microsoft Word, PDF, etc. Sphinx can't.
  Companies using Solr
    eHarmony
    Ticketmaster
    Digg
    AOL
    Zappos
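
To make "ranked searching" and "give specific fields higher weightings" a bit more concrete, here is a minimal sketch of the idea in plain Python. It is not how Sphinx, Lucene, or Solr are implemented; it only assumes a tiny inverted index with a simple TF-IDF style score. The sample documents, the FIELD_WEIGHTS values, and the function names are all made up for illustration.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical weights: a hit in the title counts three times as much as one in the body.
FIELD_WEIGHTS = {"title": 3.0, "body": 1.0}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """Build an inverted index: term -> {doc_id: weighted term frequency}."""
    index = defaultdict(Counter)
    for doc_id, fields in docs.items():
        for field, text in fields.items():
            weight = FIELD_WEIGHTS.get(field, 1.0)
            for term in tokenize(text):
                index[term][doc_id] += weight
    return index


def search(index, num_docs, query):
    """Return doc ids ranked by a simple TF-IDF style score, best first."""
    scores = Counter()
    for term in tokenize(query):
        postings = index.get(term, {})
        if not postings:
            continue
        # Terms that appear in fewer documents contribute more to the score.
        idf = math.log(1 + num_docs / len(postings))
        for doc_id, tf in postings.items():
            scores[doc_id] += tf * idf
    return [doc_id for doc_id, _ in scores.most_common()]


# Made-up sample documents, only for illustration.
DOCS = {
    1: {"title": "Sphinx search server",
        "body": "Fast full text search with SQL style queries."},
    2: {"title": "Dropbox passes 25 million users",
        "body": "People search their files less than you might think."},
    3: {"title": "Lucene and Solr",
        "body": "Solr wraps the Lucene search library in a server."},
}
INDEX = build_index(DOCS)

# Document 1 matches both terms, including a title hit, so it ranks first.
print(search(INDEX, len(DOCS), "sphinx search"))
```

A real engine adds a lot on top of this (stemming, phrase and proximity matching, on-disk indexes, incremental updates), which is the reason to reach for one of the options above rather than rolling your own.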
