
The numbers again...

Jul 8, 2008 at 1:20 PM
Hi all,

First of all, looking forward to the 2.5 release. Faceted search is a great feature to bring into search.

So, my numbering problem:
I have set up a facet to handle content types (ows_ContentTypes(Text)), which seems to be working fine. However, when I do a search, the facet "Report" (one of our content types for documents) shows up with 4 hits. If I then select that facet, the new SharePoint search shows a total of 5 hits, while Faceted Search still shows "Report (4)". So I investigated the results and found that one of them was in fact not a "Report" content type but a "Budget and Forecast Report" content type. I had a chat with a coworker here, and he said this is because SharePoint uses "contains", not "equals", in its search.

So now I'm stuck and looking for options to solve this. Getting the results for both content types when selecting "Report" is not exactly a good thing. The only solution I can see at the moment is to use totally unique content type names.

Has anybody had the same problem? Any suggestions on how it can be solved?

Regards,
Knut
Sep 2, 2008 at 6:18 PM
Edited Sep 2, 2008 at 6:19 PM
Hi Knut,


From the testing I have done, what you describe appears to be one of the main problems with faceted search. The FreeText search in MOSS uses the "contains" condition. If I do a FreeText search for "Project Server", the search query is: Contains "Project" AND Contains "Server". So my initial query "Project Server" would return a result from the text "The current Project requires a very big Server", which of course has nothing to do with "Project Server".
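To make the difference concrete, here is a small Python model (a simplification, not the MOSS query engine) that turns each query word into its own Contains condition and compares that with a stricter phrase match. The tokenizer and function names are hypothetical.

```python
import re


def tokenize(text):
    # Treat anything that is not a letter or digit as a word delimiter
    return re.findall(r"[a-z0-9]+", text.lower())


def freetext_matches(query, document):
    """Simplified model of the behaviour described above: each query word
    becomes its own Contains condition, and all of them are ANDed."""
    doc_words = set(tokenize(document))
    return all(word in doc_words for word in tokenize(query))


def phrase_matches(query, document):
    """Stricter alternative: the query words must appear next to each other."""
    q, d = tokenize(query), tokenize(document)
    return any(d[i:i + len(q)] == q for i in range(len(d) - len(q) + 1))


doc = "The current Project requires a very big Server"
print(freetext_matches("Project Server", doc))  # True
print(phrase_matches("Project Server", doc))    # False
```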

The faceted search drill-down query also uses the mechanism described above. Say you have an attribute called “Priority” with four possible values (number of hits): “Very High” (8), “High” (12), “Medium” (20), “Low” (4). When you drill down on "High", you get not just "High" (12) but also "Very High" (8) in the Priority category, and 20 records in the results page. It would of course be far more logical to display only “High” (12), with a total of 12 in the results page.
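The same mismatch can be reproduced with a toy model, assuming the drill-down matches the selected value word by word while the facet totals count exact values. The record list simply mirrors the Priority example above; the same logic would explain why "Report" also picked up "Budget and Forecast Report" in Knut's case.

```python
from collections import Counter

# Hypothetical records mirroring the example: 8 + 12 + 20 + 4 = 44 hits
records = ["Very High"] * 8 + ["High"] * 12 + ["Medium"] * 20 + ["Low"] * 4


def words(value):
    return value.lower().split()


def contains_filter(values, selected):
    # Drill-down as described: every word of the selected value must appear
    wanted = words(selected)
    return [v for v in values if all(w in words(v) for w in wanted)]


def equals_filter(values, selected):
    # What the facet totals imply: an exact value match
    return [v for v in values if v == selected]


facet_totals = Counter(records)
print(facet_totals["High"])                    # 12
print(len(contains_filter(records, "High")))   # 20
print(len(equals_filter(records, "High")))     # 12
```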

The problem is that FreeText search only allows two operators, "+" and "-". Spaces, commas, etc. are treated as word delimiters. The drill-down uses the managed search properties via the FreeText search dialog, using the <property name>:"<search value>" syntax.
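A rough sketch of how a drill-down restriction in the <property name>:"<search value>" form could be assembled, and of how delimiters break a multi-word value into separate words. The helper names, the "ContentType" property name and the exact delimiter set are assumptions for illustration, not taken from the Faceted Search source.

```python
import re

# Assumed delimiter set: whitespace, commas and semicolons
DELIMITERS = re.compile(r"[\s,;]+")


def drilldown_query(base_query, prop, value):
    # Append a property restriction in <property name>:"<search value>" form
    return f'{base_query} {prop}:"{value}"'


def words_in(value):
    # Delimiters split the value into words, so "Budget and Forecast Report"
    # still contains the single word "Report"
    return [w for w in DELIMITERS.split(value) if w]


print(drilldown_query("budget", "ContentType", "Report"))
# budget ContentType:"Report"
print(words_in("Budget and Forecast Report"))
# ['Budget', 'and', 'Forecast', 'Report']
```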

These are the issues I am aware of; I have tried to outline possible solutions:

1. The faceted search drill-down uses the "contains" condition, although the totals shown for each facet use "equals". As described above, I do not know of a workaround.

2. A managed property can be mapped to one or many indexed (crawled) properties. When it is mapped to many, the search query may find two or more hits in the same record. To avoid this, reconfigure the managed property to count only the primary mapped property, or split each mapped property out into its own unique managed property.
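A toy illustration of point 2, showing one way such a mismatch can arise: assuming the facet counts one hit per mapped value while the results page counts each document once. Both crawled property names are hypothetical.

```python
from collections import Counter

# Hypothetical crawled properties; ows_LegacyPriority is an invented second
# crawled property mapped to the same "Priority" managed property
documents = [
    {"ows_Priority": "High", "ows_LegacyPriority": "High"},
    {"ows_Priority": "Low", "ows_LegacyPriority": "High"},
    {"ows_Priority": "High"},
]


def facet_counts(docs, mapped_props):
    # Count every value found in any of the mapped crawled properties
    counts = Counter()
    for doc in docs:
        for prop in mapped_props:
            if prop in doc:
                counts[doc[prop]] += 1
    return counts


def matching_docs(docs, mapped_props, value):
    # A search hit counts each document once, however many properties match
    return sum(any(doc.get(p) == value for p in mapped_props) for doc in docs)


both = ["ows_Priority", "ows_LegacyPriority"]
primary_only = ["ows_Priority"]
print(facet_counts(documents, both)["High"], matching_docs(documents, both, "High"))
# 4 3
print(facet_counts(documents, primary_only)["High"],
      matching_docs(documents, primary_only, "High"))
# 2 2
```

Restricting the mapping to the primary property makes the two counts agree in this model, which is the effect the suggested reconfiguration aims for.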

3. The word-stemming algorithm can return results that differ from the category totals in the Facet Webpart. You can switch off word stemming by changing a property in the results Webpart.
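A toy example of point 3: the facet counts exact values, while a stemmed search also matches inflected forms. The suffix-stripping stemmer below is a deliberately crude stand-in, not the algorithm SharePoint uses.

```python
def toy_stem(word):
    # Deliberately crude suffix stripping; real stemmers are far more careful
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word


def matches(value, term, stemming):
    value, term = value.lower(), term.lower()
    if stemming:
        return toy_stem(value) == toy_stem(term)
    return value == term


values = ["Report", "Reports", "Reporting", "Budget"]
print(sum(matches(v, "Report", stemming=True) for v in values))   # 3
print(sum(matches(v, "Report", stemming=False) for v in values))  # 1
```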

I would be very interested to hear from others about any other behaviours they have noticed, since the search facet totals clearly do not always match the search results totals. There is no thread that documents how the totals are calculated or how to mitigate all of the problems encountered.

I don’t want to sound too negative: the search facet idea is a really brilliant one, and it is very clear that a lot of hard work has gone into developing it.

Regards,

Barry

Sep 3, 2008 at 6:31 AM
The count differences due to word stemming look like an easy fix. There is code for getting the StemmingEnabled, DuplicatesRemoved and NoiseIgnored properties from the Core Results web part; it is just not being executed. The GenericQuery class has code in the setters of its public properties that sets the corresponding properties on the query. You just need to add some code to the GenericQuery constructor to make sure these values get set. This seems to fix the problem when stemming is enabled, anyway.
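The fix can be sketched as a Python analogue of the C# class. Only the GenericQuery class name and the three property names (here in snake_case) come from the post; InnerQuery, its attribute names and the constructor signature are hypothetical. The point is that the constructor must route values through the property setters, because the setters are where the inner query gets updated.

```python
class InnerQuery:
    """Stand-in for the underlying search query object (hypothetical)."""

    def __init__(self):
        self.enable_stemming = False
        self.trim_duplicates = False
        self.ignore_noise = False


class GenericQuery:
    def __init__(self, stemming_enabled, duplicates_removed, noise_ignored):
        self.query = InnerQuery()
        # The fix: assign through the public properties rather than the
        # private fields, so each setter runs and updates the inner query
        self.stemming_enabled = stemming_enabled
        self.duplicates_removed = duplicates_removed
        self.noise_ignored = noise_ignored

    @property
    def stemming_enabled(self):
        return self._stemming_enabled

    @stemming_enabled.setter
    def stemming_enabled(self, value):
        self._stemming_enabled = value
        self.query.enable_stemming = value

    @property
    def duplicates_removed(self):
        return self._duplicates_removed

    @duplicates_removed.setter
    def duplicates_removed(self, value):
        self._duplicates_removed = value
        self.query.trim_duplicates = value

    @property
    def noise_ignored(self):
        return self._noise_ignored

    @noise_ignored.setter
    def noise_ignored(self, value):
        self._noise_ignored = value
        self.query.ignore_noise = value


q = GenericQuery(stemming_enabled=True, duplicates_removed=True, noise_ignored=False)
print(q.query.enable_stemming, q.query.trim_duplicates, q.query.ignore_noise)
# True True False
```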
Another thing that could throw the counts off: if you have column mappings, make sure you map every possible value. That is, if you only include mappings for the values you want different display names or icons for, you will also need a wildcard match for everything else. 2.5 is certainly a great effort, anyway.
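A sketch of why the wildcard matters, assuming (as the post implies) that a value with no matching mapping rule is left out of the facet. The rule format and the shell-style patterns via fnmatchcase are assumptions, not the Faceted Search configuration syntax.

```python
from collections import Counter
from fnmatch import fnmatchcase

# Hypothetical mapping rules, checked in order: (pattern, display name)
MAPPINGS = [
    ("Report", "Reports"),
    ("Budget*", "Budgets"),
    ("*", None),  # wildcard catch-all: keep the original value as its name
]


def display_name(value, mappings):
    for pattern, name in mappings:
        if fnmatchcase(value, pattern):
            return value if name is None else name
    return None  # no rule matched


def facet(values, mappings):
    # Values with no matching rule are left out of the counts
    counts = Counter()
    for value in values:
        name = display_name(value, mappings)
        if name is not None:
            counts[name] += 1
    return counts


values = ["Report", "Report", "Budget and Forecast Report", "Memo"]
print(sum(facet(values, MAPPINGS).values()))      # 4
print(sum(facet(values, MAPPINGS[:2]).values()))  # 3, "Memo" is dropped
```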
Sep 4, 2008 at 12:32 AM
Great feedback, thanks! The issue will be looked at.