
How is the number in the facet calculated?

Nov 5, 2007 at 1:47 PM
Hi

In order to understand the number and the filter, could you please explain in detail what the number represents and how it is calculated? Is it calculated based on only part of the result set?


I tried the following:
When I search for Ian Morrish as author: http://www.wssdemo.com/Search/Pages/results.aspx?k=Author%3A%22Ian%20Morrish%22
There are 11008 documents but only 1012 hits on author Ian Morrish.

When I search free text on Ian Morrish: http://www.wssdemo.com/Search/Pages/results.aspx?k=ian%20morrish
There are 3584 documents, but the count for Ian Morrish under author is only 940.

When I search on microsoft as free text: http://www.wssdemo.com/Search/Pages/results.aspx?k=microsoft
The number of documents is 512, but the count for author Ian Morrish is 991.

When I search on author:"ian morrish" there are 11008 documents. If I then click on the filter ianmorr(2), 3 documents are returned, but the filter now shows ianmorr with (5).

Thanks very much :)

Dan



Nov 5, 2007 at 8:29 PM
The SharePoint OOTB search result count is an estimate only. You will notice as you start paging through the results that the count changes. The Facet count is probably more accurate :-)

Regards,
Ian
Nov 6, 2007 at 8:10 AM
Would somebody please explain, in plain English, what the number after the facets means and how it is calculated? Secondly, how are the facets themselves established?

Thanks :)

Dan
Nov 6, 2007 at 8:08 PM
I'm with Dan. I really have no idea what is going on with this example. Maybe it's outdated code? FWIW: I have an example running internally at MS which behaves very differently than this example.

Here's what is confusing about the usability in this example:
1. Do a free-text search on "MOSS." You get 512 results.
2. Then click on "Administration" on the "Topic" Facet - you are told that there will be 29 hits, but instead 40 results come back. (You should never get more results than the number indicated on the filter).
3. In addition to this, some "Topics" disappear, but not all: there are still several left, including "Administration," which is confusing. Filtering on "Administration" should exclude the other "Topic" filters. If you want to undo "Administration" and replace it with another, that's what the breadcrumb trail should be used for.
4. There is something very strange with the general fact that you could click on a filter like Author = Ian Morrish (40), and get 80 results. Why even list the number of hits if it has no bearing on the number of results you'll get back? This defeats the power of Faceted Search.

Why does this happen? Probably because the facet number shown after the tag counts the author multiple times: once in the doc properties, and again if the author's name appears in the text. That's what I think.

For a very intuitive example of faceted searching that is available on the open internet, see: http://www.acornweb.org. In this example, never do you get more results than the number indicated in the filter. In the example here, sometimes you get less, and sometimes more, which baffles me.
Nov 8, 2007 at 6:38 PM
A bit of background.
A MOSS search Managed Property may contain multiple mappings to crawled properties. There is an option of:

1. Include values from all crawled properties mapped
or
2. Include values from a single crawled property based on the order specified

There is a bug with option 2, especially with the Author property, which causes it not to work as expected. This is fixed in SP1 (which is probably running on the MS internal MOSS servers and would explain why the problem is not seen there).
MSDN doc: http://msdn2.microsoft.com/en-us/library/ms497276.aspx





Nov 9, 2007 at 12:42 AM
Looks like the bug may have been fixed with the latest security patch (you could consider that patch a pre-SP1 fix given the amount of stuff it updates).
I set the Author property to option 2 above on wssdemo.com and the facet count looks much better now for Author filtering.

The problem you see with the Topic facet is that this metadata is only used on a limited number of content types in this site. So:
The initial search for MOSS returns all items in the index.
Adding any of the Topic facets to the search (which does an AND with the free-text string) limits the result set to only those indexed items that are based on a content type that includes the Topic column.

There are still some consistency issues with the numbers.
I have another tab which limits the results to the resources that use the Topic metadata column
http://www.wssdemo.com/search/Pages/Resourceesults.aspx?k=MOSS
It seems that any scope or pre-property filtering of the results web part is ignored by the facet item count.

Regards,
Ian
Nov 9, 2007 at 9:05 AM
Hi everybody

Sorry for asking exactly the same question again:

"Would somebody please explain, in plain English, what the number after the facets means and how it is calculated? Secondly, how are the facets themselves established?"

- When I search for Ian Morrish as author: http://www.wssdemo.com/search/Pages/Resourceesults.aspx?k=Author%3A%22Ian%20Morrish%22
There are 1792 documents but only 500 hits on author Ian Morrish, and 6 on Topic "MOSS 2007"

- When clicking on "MOSS 2007" in the faceted navigation in this search, there are now suddenly 465 hits: http://www.wssdemo.com/search/Pages/Resourceesults.aspx?k=Author%3a%22IanMorrish%22Platform%3a%22MOSS+2007%22

This is highly illogical.

A quick answer on how the numbers are calculated would help a lot :)

Best
Dan





Nov 9, 2007 at 9:08 AM
Maybe it is related to this issue (http://www.codeplex.com/FacetedSearch/WorkItem/View.aspx?WorkItemId=1724)? Currently, the SQL used in the faceted search uses a CONTAINS condition for each facet. According to this article (http://msdn2.microsoft.com/en-us/library/ms544086.aspx), CONTAINS uses linguistic matching, which may produce extra hits when the facet value is clicked.
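
To make the suspected effect concrete, here is a toy Python sketch. It is not the Faceted Search code and not how SharePoint's CONTAINS is implemented: the `toy_stem` function is a made-up stand-in for "linguistic matching", and the topic values are invented. It only shows how a filter that matches word variants can return more items than an exact count of the facet value would suggest.

```python
def toy_stem(word):
    """Crude, hypothetical stand-in for linguistic matching: lowercase and strip one suffix."""
    word = word.lower()
    for suffix in ("ation", "ator", "ing", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word


# Invented Topic values, one per indexed item.
topics = ["Administration", "Administrator", "Administration", "Deployment"]


def exact_matches(values, wanted):
    """Items whose value equals the facet value exactly (what a facet count would suggest)."""
    return [v for v in values if v == wanted]


def linguistic_matches(values, wanted):
    """Items whose value shares a stem with the facet value (looser, returns more)."""
    return [v for v in values if toy_stem(v) == toy_stem(wanted)]


exact = exact_matches(topics, "Administration")
loose = linguistic_matches(topics, "Administration")
print(len(exact), len(loose))
```

If the counter is computed one way and the click-through filter the other way, the two numbers can disagree in exactly the manner described in this thread.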
Nov 26, 2007 at 12:41 PM
Edited Nov 26, 2007 at 12:41 PM
Hi Everybody

Sorry for asking the same question again and again:

"Would somebody please explain, in plain English, what the number after the facets means and how it is calculated? Secondly, how are the facets themselves established?"

Is there a reason for not sharing this information?

Best

Dan

Nov 26, 2007 at 3:57 PM
Hi Dan,
The number after the facets should indicate the number of matches with that facet value, but there seems to be something wrong with the code. If you have a compiler set up, you may want to try this:

In the SearchFacets.cs file, GetWHERE method, change the

where.Add(" CONTAINS(\"" + propName + "\", '\"" + propValue + "\"') ");

statement to

where.Add(" \"" + propName + "\" LIKE '" + propValue + "' ");
Nov 26, 2007 at 4:26 PM
Hi Knut :)

I don't know where you are based, it sounds Scandinavian :), I'm based in Copenhagen. I'm not a developer, but I'm trying to understand how the numbers are calculated. Is the number of matches computed for all documents in the search result or only for a limited number of documents, e.g. the first 512 documents?

You are more than welcome to mail me directly also :) dth@interse.com

Best

Dan
Nov 27, 2007 at 11:30 AM
Hi Dan, I'm working from Oslo ;)

The numbers behind the facets should be equal to the number of search results with that distinct facet value. For instance, if you search for "microsoft" and there are 6 documents and 3 spreadsheets matching this, then the content type facet should show something like "documents (6)" and "spreadsheets (3)", a total of 9 results. You should then be able to click on a facet value ("documents (6)") and see the corresponding 6 documents.
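
As a minimal illustration of that expected behaviour (a Python sketch with an invented result set, not code from the Faceted Search project), the counts and the click-through filter can both be derived from the same list of hits:

```python
from collections import Counter

# Hypothetical result set for the query "microsoft": 6 documents and 3 spreadsheets.
results = (
    [{"title": f"doc-{i}", "contenttype": "documents"} for i in range(6)]
    + [{"title": f"sheet-{i}", "contenttype": "spreadsheets"} for i in range(3)]
)


def facet_counts(hits, prop):
    """Count how many hits carry each distinct value of the property `prop`."""
    return Counter(hit[prop] for hit in hits if prop in hit)


def apply_facet(hits, prop, value):
    """Return only the hits whose `prop` equals `value` exactly."""
    return [hit for hit in hits if hit.get(prop) == value]


counts = facet_counts(results, "contenttype")
filtered = apply_facet(results, "contenttype", "documents")
print(counts["documents"], counts["spreadsheets"], len(filtered))
```

When both numbers come from one result set like this, clicking "documents (6)" can only ever give 6 results.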

The problem is that the searchfacets webpart executes its own separate search, which is not necessarily equal to the search executed by the searchresults webpart. I suspect this is why the numbers don't match. Unfortunately, the searchresults webpart is not open source, so there's no way for us to compare the code.
Dec 8, 2007 at 4:10 AM
To add to this discussion, it is really important how your SSP is configured to count crawled properties. Often the same managed property is the result of several different crawled properties, e.g. Author is mapped to at least 2. By default, the 1st of these options is selected:
* Include values from all crawled properties mapped
* Include values from a single crawled property based on the order specified

The Faceted Search will thus add each hit of each assigned crawled property to the counter. Try the same with the 2nd option chosen (some reindexing might be required), and you'll get different counts.
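
A rough Python sketch of the difference between the two options, under the assumption that the counter behaves as described above. The crawled property names `crawled_author_a` and `crawled_author_b` and the three documents are hypothetical; this is not the actual Faceted Search or SSP logic.

```python
from collections import Counter

# Hypothetical crawled properties mapped to the managed property Author, in priority order.
AUTHOR_MAPPINGS = ["crawled_author_a", "crawled_author_b"]

# Invented documents; some carry the author in both crawled properties.
docs = [
    {"crawled_author_a": "Ian Morrish", "crawled_author_b": "Ian Morrish"},
    {"crawled_author_a": "Ian Morrish", "crawled_author_b": "ianmorr"},
    {"crawled_author_b": "Ian Morrish"},
]


def count_all_values(documents, mappings):
    """Option 1: every mapped crawled property value adds to the counter."""
    counter = Counter()
    for doc in documents:
        for name in mappings:
            if name in doc:
                counter[doc[name]] += 1
    return counter


def count_first_value(documents, mappings):
    """Option 2: only the first mapped crawled property present on a document is counted."""
    counter = Counter()
    for doc in documents:
        for name in mappings:
            if name in doc:
                counter[doc[name]] += 1
                break
    return counter


all_values = count_all_values(docs, AUTHOR_MAPPINGS)
first_value = count_first_value(docs, AUTHOR_MAPPINGS)
print(all_values["Ian Morrish"], first_value["Ian Morrish"])
```

With option 1 the counter for "Ian Morrish" ends up higher than the number of documents, which is the kind of inflated facet count reported in this thread.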

Apr 19, 2008 at 1:30 PM
Edited Apr 19, 2008 at 1:35 PM
Should we expect a fix to this problem in a future version of Faceted Search? (I'm using Faceted Search v2, MOSS SP1)

Update: I just found the answer to my question. It seems to be on the to-do list for version 2.5: http://www.codeplex.com/FacetedSearch/WorkItem/View.aspx?WorkItemId=3909
Apr 23, 2008 at 2:54 PM




Hi Tourpe,

Yes, as you correctly point out we're aware of this issue and we're working towards a resolution.

Thanks,

Shaun
Jun 2, 2008 at 2:47 PM
Hi
Can I please know how to hide/remove the facet count?
I have downloaded the source code. If you could please let me know which files to update, it would be much appreciated.
Thanks
Aug 24, 2008 at 11:24 PM
Wow, long thread without answers... I have recently found the numbers just don't add up. Just installed the newest of everything, but the counts of the facets are all wrong. How so? Why, why, why? How could this have been overlooked? I see some blaming how you configure the SSP. Well, tell me how to configure the SSP, or better yet, put it in your install directions...
Aug 25, 2008 at 4:00 AM
The easiest way to remove the counters is to update the FacetLight template.
Oct 17, 2008 at 8:23 AM
Edited Oct 17, 2008 at 10:03 AM
How do you change the template? I must get rid of the counters since the counting is all wrong. This might at least fool the users until this is fixed.

--edit
I fixed it by going into the code and changing the template. I guessed that was how it should be done.
Oct 22, 2008 at 7:25 PM
Ludvig,

Please can you post your experiences of Facet counts here and the magnitude of differences between Core Search and Faceted Search results?

Thanks,

Shaun O'Callaghan
Oct 23, 2008 at 8:13 AM
Sure!

First some background:
I have three different servers I'm running it on.
(M1) MOSS 2007 Enterprise x64 (with infrastructure update for both WSS and MOSS 2007) running the 27 Sep installation package from this site.
(M2) MOSS 2007 Enterprise x64 running an older version of facet.
(MSS1) Microsoft Search Server 2008 with compiled version from svn (25448)
M1 and MSS1 have almost the same content

(MSS2) Microsoft Search Server 2008 with 27 sep installation package from this site
This one has totally different content with some

Sadly, they all behave the same: the results always differ.

Example on (MSS1):
1: I search on application support in the normal search box. I get (1024 core and 2190 facet)(M1: 1024 core and 2348 facet)
2: I apply on the search a facet content source Network Storage (1934). I get (768 core and 1934 facet)(M1: 768 core and 2081 facet)
3: I apply the navigator content type html (805). I get (187 core and 805 facet)(M1: 118 core and 893 facet)
4. I apply the navigator author Microsoft User (1). I get (0 core and 1 facet)(M1: 0 core and 1 facet) (For me this is the worst one, since it really confuses the user)

One more example (MSS1):
1. I search on my name Ludvig Johansson. I get (512 core and 724 facet)(M1: 256 core and 642 facet)
2. I apply navigator author Ludvig Johansson. I get (15 core and 15 facet)(M1: 17 core and 17 facet)
3. I apply navigator language Swedish. I get (14 core and 13 facet)(M1: 15 and 15)

It's almost never synchronized.
Maybe the best approach would be to create a core results search web part that is actually fully synchronized with the facets. That way you can always be sure that it's perfect. Then you could also make this a full-text results search part with wildcard support.

Please just ask if you want more examples or want me to test something. I have a lot of test environments available.
Hope we can get this working. It's a really needed function.
Nov 10, 2008 at 8:15 AM
Have you had any time to work on this?
Nov 10, 2008 at 5:31 PM
Just to emphasize: the SVN source is a work in progress and is deliberately not included in any release.

The discrepancy in search counters is very content dependent. It would be helpful if you described what your content sources are built of. No promises, though.

Nov 10, 2008 at 6:07 PM
Hi!
Of course I get that. But thanks for the note. I only use SVN for internal testing. And use the release for production.

Okay, so on to the content.
One system that we have at a customer (MSSX): SharePoint 2003 sites with normal documents only. Around 140 000 documents.
Internal system (MOSS 2007 SP1 with MSS infrastructure update): MOSS 2007 sites with documents, network storage with files (all kinds of file types), 2 different web page systems.  Around 20000 documents.
Internal system (MSSX): The same as above.

Sadly, they all have the same flaws. If you want more information, feel free to ask. I can give you even more detail if you want.
Nov 18, 2008 at 12:05 PM
Ludvig,

Sorry for the delay in responding.

The problem with the facet counts differing from core search results is that the SharePoint search functionality is actually encapsulated inside an internal class, which obviously cannot be derived from.  The synchronisation problem has always been that two separate calls are made, one from Core Search and one from Faceted Search, which makes this harder to resolve.  There are ways to resolve this through synchronisation, the principal one being reflection and a lot of research.

As soon as there is an update I'll let you know.

Thanks,

Shaun
Nov 19, 2008 at 8:27 AM
Thanks for your reply.
How much work would it be to create a new core results web part that uses free-text search instead? You wouldn't have to derive it, just create it from scratch. This would greatly enhance the search, because that way you could make them perfectly synced, and features like wildcards could also be used.  I think this solution would benefit a lot from this.
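
To sketch what "perfectly synced" could mean, here is a small Python illustration of the idea: run one query and derive both the result list and the facet counts from the same hits. Everything here is hypothetical (the `INDEX` entries, the `run_query` substring match, the function names); it does not use the SharePoint object model and says nothing about how the other search web parts would be handled.

```python
from collections import Counter

# Hypothetical index entries; titles, authors and languages are invented.
INDEX = [
    {"title": "Application support handbook", "author": "Ludvig Johansson", "language": "Swedish"},
    {"title": "Application support FAQ", "author": "Ludvig Johansson", "language": "English"},
    {"title": "Network storage overview", "author": "Microsoft User", "language": "English"},
]


def run_query(index, term):
    """Hypothetical search call: case-insensitive substring match on the title."""
    return [item for item in index if term.lower() in item["title"].lower()]


def search_with_facets(index, term, facet_props):
    """Run a single query and build both the hit list and the facet counts from it."""
    hits = run_query(index, term)
    facets = {prop: Counter(hit[prop] for hit in hits) for prop in facet_props}
    return hits, facets


hits, facets = search_with_facets(INDEX, "application support", ["author", "language"])
print(len(hits), dict(facets["author"]), dict(facets["language"]))
```

Because the counts are built from the very hits that are displayed, a facet can never show more or fewer items than the result list contains.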

//Ludvig
Nov 19, 2008 at 7:38 PM
Edited Nov 19, 2008 at 7:39 PM
Hi Ludvig,

Thank you for your comment.  Although that sounds like a simple fix unfortunately it will only solve half of the problem.  Yes in principle we could very quickly create a new search core results web part but what about the other enterprise search web parts?  Unfortunately it's not just a case of recreating search core results there are also others like Search Statistics web part and paging etc.  The reason for this is because these web parts, from what I understand, all use a query generated by an internal class.

So yes, you can create a new search core results, and sync between Core and Faceted... but then there's the other web parts.  The solution to this problem has to be solved via synchronisation with core search unfortunately and this isn't something that's particularly easy to do.  I just don't see any other way, however if you do we'd be very happy indeed to hear your thoughts.

Meanwhile, we are progressing towards a solution.

Thanks,

Shaun O'Callaghan
Nov 20, 2008 at 9:14 AM
Thanks Shaun,

I thought it would be something like that. And recreating everything would be a bummer, of course. But still, how can they be synchronized when one uses keyword search and the other free-text search? Can these two really work the same?

Sadly, I can't say I have any other way of solving this. I'm working at a company specialized in search and search engines, and I can only say that we have stumbled on this problem before with other search engines that don't support navigators from the core of the engine. It's always hard, since you have to make assumptions because you can't really get the full result set without serious performance issues.

In this case I think the synchronization would be really hard for you to do. As far as my knowledge goes, the only thing you could get from finding the query in the internal class would be which parameters it uses. But we would still have the same problem with different query types. If one were to create the search web parts and have a core part that performs the search in exactly the same way for all the parts, it would raise the quality significantly. We would get the same behavior in all the web parts, we could make the search box look better without having to have the facet names inside it, etc.

But of course, if you have a solution coming, I look forward to testing it.

Regards
Ludvig Johansson
Nov 20, 2008 at 10:52 AM
Ludvig,

Thanks again for your response.

To my understanding, SearchCoreResults handles both Keyword and FreeText queries: KeywordQuery coming from the search text box and FullTextSqlQuery coming from Advanced Search.  This is based upon .NET reflection and conjecture, however.

If you have any more thoughts, please let me know.

Thanks,

Shaun O'Callaghan
Dec 10, 2008 at 12:42 PM
This is a project that you guys should definitely look at:
http://www.codeplex.com/WildcardSearch

He has found ways to extract information from the core results part, I think.
These two projects should be used together and everything would be awesome :-)


Dec 10, 2008 at 11:11 PM
Ludvig,

Please check out the latest release.  There have been major improvements in these areas.  Please could you test and report back some preliminary results?

Thanks,

Shaun O'Callaghan
Jan 21, 2009 at 2:38 PM
I have now tested this in three different environments with no real improvement. We still get very unsynchronized results.
Most recently I saw a result list with 7 hits and a navigator for html showing 64 hits. Not that good. I have also seen the worst case, where you have zero hits in the results and still have navigators.
Jan 24, 2009 at 6:27 PM
Edited Jan 25, 2009 at 5:06 PM
Could someone please document the template that needs to be changed, its location, and the code that must be modified to remove the hit count?
Jan 26, 2009 at 8:29 AM
Hikmer:

With svn version 28032 (latest).
Open Templates.cs and go to line 211:

container.Controls.Add(hits);

Comment out this line and you're good to go. Remember that, sadly, this will not solve the problem of navigators being shown with no hits.


Feb 5, 2009 at 7:37 AM
Edited Feb 5, 2009 at 7:37 AM
Check this thread out:
http://www.codeplex.com/FacetedSearch/Thread/View.aspx?ThreadId=46176