By Mark Bennett, New Idea Engineering, Inc.

Read Part 2 and Part 3 of this series.

Introduction

The perennial question of what separates Enterprise Search from the more familiar search engines that power the public Internet recently came up again.

We were planning to do a blog entry, but the list mushroomed, and we now present the first in a three-part series on the dozens of things that make Enterprise Search surprisingly difficult, and that sometimes flummox the engines that were created to power the public web. As we hinted above, the public Internet was the inspiration and proving ground for the majority of the commercial and open-source search engines out there. When vendors talk about their products, features and patents, they are usually talking about technology that was not specifically designed for the enterprise. This isn’t just academic theory: as you’ll see, the assumptions built into that technology can actually break enterprise search if they are not adjusted properly.

To be clear, when we say “enterprise” search, we are referring to both the search engines that power private Intranets and Extranets and, to a lesser extent, the engines that companies have purchased to power their commerce and customer-facing web sites. Broadly, “enterprise” search could be thought of as “all search engines EXCEPT the public Yahoo, Google and MSN”, since you DO own and control the search engine that powers your public web site or online store. With all that said, let’s get started!

Intranet Mismatches

These are some differences viewed from the broadest, 10,000-foot level. We’ll revisit some in more detail later.

The Enterprise is not just “a small Internet”

Imagine if you powered the Internet, and had a brand name that rivaled Coca-Cola. And then, imagine if you took all of that wonderful technological goodness with the wonderful brand name, and stuffed it into a brightly colored rack-mounted box. You would assume that, if you could handle the Internet, then of course you could handle a relatively puny private network. It just makes sense! These seem like perfectly sane and compelling arguments, and this model has worked at some companies. If your content is mostly HTML and PDF documents, this would possibly work for you.

Or, suppose you had a portal that powered all of the Internet back in the 1990s. Slap that software on a CD, give it a nice Web-based admin GUI, and ship it! This was actually the start of several well-known search vendors. These products have also been iterated on to add enterprise functionality. Ultraseek had been a great choice for more generic enterprise environments, and included some customization. Lately the Google Appliance is filling that segment, and can scale to reasonable sizes. In contrast, some engines were not created for the Internet, but were always targeted at more specific business applications.

It can also spider and search HTML and other document formats, but that was not its genesis.

In the enterprise, however, content comes from many other sources, such as Content Management Systems, databases and archival storage appliances. When you have all the pages indexed and basic search up and running, you have achieved “Search Dial-Tone”: nothing fancy, but basic search functionality is online.

Modern search engines employ “fulltext” search, looking for specific search terms in relatively unstructured text.
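To make “fulltext” search concrete, here is a minimal sketch of the idea, assuming a tiny in-memory inverted index (a map from each term to the documents that contain it). The document ids, sample text and function names are invented for illustration and are not taken from any particular engine; real products add ranking, stemming, phrase matching and much more.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """Map each term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def search(index, query):
    """Return the sorted ids of documents containing every query term."""
    terms = tokenize(query)
    if not terms:
        return []
    # index.get() avoids creating empty entries for unknown terms.
    hits = set.intersection(*(index.get(t, set()) for t in terms))
    return sorted(hits)


# Hypothetical sample documents, invented for illustration.
docs = {
    "hr-policy": "Vacation Policy: employees accrue vacation days monthly",
    "it-faq": "How to reset your VPN password",
    "sales-memo": "Quarterly sales policy update for the field team",
}
index = build_index(docs)
print(search(index, "policy"))
print(search(index, "vacation policy"))
```

Note that the index knows nothing about the documents beyond the words they contain, which is exactly the “relatively unstructured” view of content described above.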

About the only assumption made about a document’s “structure” was that it would probably have a Title of some sort. Fulltext searches are also much more free-form. Think about how much easier it is to type a search into Google, versus creating an old-school SQL SELECT statement. But Enterprises DO have data, lots of it, and it is often structured.
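The contrast between a free-form query and a structured SELECT can be sketched as follows, assuming a small hypothetical `documents` table held in an in-memory SQLite database. The table layout, column names and rows are invented for illustration, and the keyword match uses a naive LIKE filter rather than a real fulltext engine.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE documents ("
    "id INTEGER PRIMARY KEY, title TEXT, department TEXT, year INTEGER, body TEXT)"
)
# Hypothetical rows, invented for illustration.
conn.executemany(
    "INSERT INTO documents (title, department, year, body) VALUES (?, ?, ?, ?)",
    [
        ("Travel Expense Policy", "Finance", 2007, "How to file travel expense reports"),
        ("Expense Audit Results", "Finance", 2006, "Summary of the annual expense audit"),
        ("VPN Setup Guide", "IT", 2007, "Steps to configure remote access"),
    ],
)


def structured_query(conn, department, year):
    """Old-school SELECT: the caller must know the schema and exact values."""
    cur = conn.execute(
        "SELECT title FROM documents WHERE department = ? AND year = ? ORDER BY title",
        (department, year),
    )
    return [row[0] for row in cur]


def freeform_query(conn, text):
    """Naive keyword search: every word must appear in the title or the body."""
    words = text.lower().split()
    if not words:
        return []
    clause = " AND ".join(
        "(lower(title) LIKE ? OR lower(body) LIKE ?)" for _ in words
    )
    params = [pattern for w in words for pattern in (f"%{w}%", f"%{w}%")]
    cur = conn.execute(
        f"SELECT title FROM documents WHERE {clause} ORDER BY title", params
    )
    return [row[0] for row in cur]


print(structured_query(conn, "Finance", 2007))
print(freeform_query(conn, "expense policy"))
```

The free-form query is easier to type, but only the structured one can use the department and year fields directly, which is the kind of structured data enterprises tend to have in abundance.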