Integrating Sitecore with Algolia

Out of the box, Sitecore ships with Solr as its search provider. As a Sitecore developer, the amount of Solr knowledge you need is relatively low because you access it through Sitecore's own APIs. This makes it simple to get going without a huge amount of effort. However, this is where all that's good about using Solr seems to end.

It's not that Solr is bad. It's actually very powerful and has a load of config options for boosting fields, result items and so on. If there's something you want to do with your search results then it can probably do it. For admin users, though, it's a bit of a black box. Results come out in an order and sometimes you're not sure why. If you asked a content editor to change the order of search results they would look at you blankly and not have a clue where to start, other than to ask their dev to do it for them.

Algolia, on the other hand, has been designed for the end user. They can try searches through the admin interface, drag and drop results into a different order, run campaigns and affect results in numerous other ways. It also offers analytics, so they can see which searches return no results and which return results but get no click-throughs.

For devs it's also easy to see what's actually in the search index, and front end devs can integrate through the APIs directly rather than needing a .NET dev to write something against Sitecore's search provider for them.

Creating a search with Algolia and Sitecore

In this article I'm going to show you how to populate an Algolia index with data from Sitecore. What I'm not doing is creating a new Sitecore search provider for Algolia. Other people have attempted that before, but it requires a lot of maintenance. You also have to implement a lot of functionality that you're unlikely to use!

My aim also isn't to replace Solr. Sitecore uses it for some of its functionality and it is doing that job perfectly fine. My aim is to add a search to the front end of a Sitecore site, powered by Algolia, so that a content editor can make use of Algolia's features. For that I just need relevant content from Sitecore to be added to and removed from Algolia's index when a publish happens.

Populating Algolia's Index from Sitecore

We want our Algolia index to contain data for published items and to update as future publishes occur, publishing being when content is copied from the Master DB to the Web DB. A good way to do this is to hook into the Sitecore publishing pipeline.

In my solution I am creating a new pipeline processor that calls Algolia directly. In my case the amount of content is relatively small, and for usability I want the content editors to be confident that the index has updated when the publish dialog completes. A more scalable, less blocking solution would be to first post the data to a service bus and then have the integration to Algolia subscribe to the bus. This way any delay caused by Algolia won't affect the publishing experience.
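
The decoupling idea can be sketched without committing to a particular service bus. What follows is a minimal in-process sketch, assuming a BlockingCollection as a stand-in for the bus. IndexChange and IndexChangeQueue are hypothetical names that do not appear in my solution, and a real setup would more likely use a durable queue running outside the Sitecore process.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading.Tasks;

// Hypothetical message describing one change to push to the search index.
public sealed class IndexChange
{
    public IndexChange(string objectId, bool isDelete)
    {
        ObjectId = objectId;
        IsDelete = isDelete;
    }

    public string ObjectId { get; }
    public bool IsDelete { get; }
}

// In-memory stand-in for a service bus: the publisher only enqueues,
// and a single background consumer forwards each change to the index.
public sealed class IndexChangeQueue : IDisposable
{
    private readonly BlockingCollection<IndexChange> _queue = new BlockingCollection<IndexChange>();
    private readonly Task _consumer;

    public IndexChangeQueue(Action<IndexChange> pushToIndex)
    {
        _consumer = Task.Run(() =>
        {
            // Blocks until items arrive; exits once CompleteAdding has been
            // called and the queue has been drained.
            foreach (var change in _queue.GetConsumingEnumerable())
                pushToIndex(change);
        });
    }

    // Called from the publish processor; returns without waiting for the index.
    public void Enqueue(IndexChange change) => _queue.Add(change);

    // Stops accepting new changes and waits for the queued ones to be processed.
    public void Dispose()
    {
        _queue.CompleteAdding();
        _consumer.Wait();
        _queue.Dispose();
    }
}
```

The publish processor would call Enqueue and return straight away, while the delegate passed to the constructor is where the call to Algolia would live. The trade-off is that the publish dialog can complete before the index has caught up.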

The default PublishItem pipeline in Sitecore is as follows:

<publishItem help="Processors should derive from Sitecore.Publishing.Pipelines.PublishItem.PublishItemProcessor">
  <processor type="Sitecore.Publishing.Pipelines.PublishItem.RaiseProcessingEvent, Sitecore.Kernel"/>
  <processor type="Sitecore.Publishing.Pipelines.PublishItem.CheckVirtualItem, Sitecore.Kernel"/>
  <processor type="Sitecore.Publishing.Pipelines.PublishItem.CheckSecurity, Sitecore.Kernel"/>
  <processor type="Sitecore.Publishing.Pipelines.PublishItem.DetermineAction, Sitecore.Kernel"/>
  <processor type="Sitecore.Buckets.Pipelines.PublishItem.ProcessActionForBucketStructure, Sitecore.Buckets" patch:source="Sitecore.Buckets.config"/>
  <processor type="Sitecore.Publishing.Pipelines.PublishItem.MoveItems, Sitecore.Kernel"/>
  <processor type="Sitecore.Publishing.Pipelines.PublishItem.PerformAction, Sitecore.Kernel"/>
  <processor type="Sitecore.Publishing.Pipelines.PublishItem.AddItemReferences, Sitecore.Kernel"/>
  <processor type="Sitecore.Publishing.Pipelines.PublishItem.RemoveUnknownChildren, Sitecore.Kernel"/>
  <processor type="Sitecore.Publishing.Pipelines.PublishItem.RaiseProcessedEvent, Sitecore.Kernel" runIfAborted="true"/>
  <processor type="Sitecore.Publishing.Pipelines.PublishItem.UpdateStatistics, Sitecore.Kernel" runIfAborted="true">
    <traceToLog>false</traceToLog>
  </processor>
</publishItem>

The Sitecore.Publishing.Pipelines.PublishItem.PerformAction step in the pipeline is the one which does the actual work of updating the web db.

To capture deletes as well as inserts / updates we need a step to happen both before and after this action. The step before will capture the deletes and the step after will push the update to Algolia.

My code for capturing the deletes is as follows. This is needed as once the PerformAction step has finished, the item no longer exists so we need to grab it first.

using Sitecore.Diagnostics;
using Sitecore.Publishing;
using Sitecore.Publishing.Pipelines.PublishItem;

namespace SitecoreAlgolia
{
    public class DelateAlgoliaItemsAction : PublishItemProcessor
    {
        public override void Process(PublishItemContext context)
        {
            Assert.ArgumentNotNull(context, "context");

            // We only want to process deletes here, because this is the last point at which the item being deleted may still exist.
            if (context.Action != PublishAction.DeleteTargetItem && context.Action != PublishAction.PublishSharedFields)
                return;

            // Attempt to find the item. If not found, the item has already been deleted.
            // This can occur when more than one language is published: the first language will delete the item.
            var item = context.PublishHelper.GetTargetItem(context.ItemId) ??
                       context.PublishHelper.GetSourceItem(context.ItemId);

            if (item == null)
                return;

            // Hold onto the item for the PublishChangesToAlgoliaAction PublishItemProcessor.
            context.CustomData.Add("Item", item);
        }
    }
}

With the deletes captured the next pipeline action will push each change to Algolia.

The first part of my function is going to ignore anything we're not interested in. This includes:

  • Publishes where the result of the operation was skipped or none (as nothing's changed)
  • If we don't have an item
  • If the template of the item isn't one we're interested in pushing to Algolia
  • If the item is a standard values item

// Skip if the publish operation was skipped or none.
if ((context.Action != PublishAction.DeleteTargetItem || context.PublishOptions.CompareRevisions) &&
    (context.Result.Operation == PublishOperation.Skipped || context.Result.Operation == PublishOperation.None))
    return;

// For deletes the VersionToPublish is the parent, we need to get the item from the previous step
var item = (Item)context.CustomData["Item"] ?? context.VersionToPublish;
if (item == null)
    return;

// Restrict items to certain templates
var template = TemplateManager.GetTemplate(item);
// SearchableTemplates is a List<ID>
if (!SearchableTemplates.Any(x => template.ID == x))
    return;

// Don't publish messages for standard values
if (item.ParentID == item.TemplateID)
    return;

Next I convert the Sitecore item into a simple POCO. This is what the Algolia client requires for updating the index.

Notice the first property is called ObjectID. This is a required property for Algolia and is used to identify the record for updates and deletes. I'm using the Sitecore item ID for this.

// Convert item to the model for Algolia
var searchItem = new SearchResultsItem()
{
    ObjectID = item.ID.ToString(),
    Title = item.Fields[FieldNames.Standard.MenuTitle].Value,
    Content = item.Fields[FieldNames.Base.Content].Value,
    Description = item.Fields[FieldNames.Standard.ShortDescription].Value,
};
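
The SearchResultsItem model itself isn't listed in this article. A minimal sketch of what it could look like follows, assuming it carries nothing more than the four string properties set above. The namespace is taken from the using statement in the complete class; the property types are an assumption.

```csharp
namespace SitecoreAlgolia.Models
{
    // Assumed shape of the model: only the property names come from the article.
    public class SearchResultsItem
    {
        // Algolia uses this value to identify the record for updates and deletes.
        public string ObjectID { get; set; }

        public string Title { get; set; }

        public string Content { get; set; }

        public string Description { get; set; }
    }
}
```

Keeping the model this flat means any extra field you want searchable in Algolia is just one more property on the class and one more assignment in the processor.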

One thing I've not included here is link fields, which need some care. If you want to add a URL into Algolia, you may find that the site context the publishing pipeline runs in is not the same as your final website, so you need to set some additional UrlOptions on the LinkManager call to get the correct URLs.

Finally to push to Algolia it's a case of creating the SearchClient, initializing the index and picking the relevant operation on the index. Just make sure you install the Algolia.Search NuGet package.

// Init Algolia Client
SearchClient client = new SearchClient("<Application ID>", "<API Key>");
SearchIndex index = client.InitIndex("<Index Name>");

// Decide what type of update is going to Algolia
var operation = (context.Action == PublishAction.DeleteTargetItem && !context.PublishOptions.CompareRevisions) ? PublishOperation.Deleted : context.Result.Operation;
switch (operation)
{
    case PublishOperation.Deleted:
        // Delete
        index.DeleteObject(item.ID.ToString());
        break;
    case PublishOperation.Skipped:
        // Skipped
        break;
    default:
        // Created / Update
        index.SaveObject(searchItem);
        break;
}

My complete class looks like this. To keep the article simple I've built this quite crudely, with everything in one giant function. For production you would want to split it up as per good coding standards.

using Algolia.Search.Clients;
using SitecoreAlgolia.Models;
using Sitecore.Data;
using Sitecore.Data.Items;
using Sitecore.Data.Managers;
using Sitecore.Diagnostics;
using Sitecore.Links;
using Sitecore.Publishing;
using Sitecore.Publishing.Pipelines.PublishItem;
using System.Collections.Generic;
using System.Linq;

namespace SitecoreAlgolia
{
    public class PublishChangesToAlgoliaAction : PublishItemProcessor
    {
        private static readonly List<ID> SearchableTemplates = new[] {
                ItemIds.Templates.PageTemplates.EventItem,
                ItemIds.Templates.PageTemplates.Content,
            }.Select(x => new ID(x))
            .ToList();

        public override void Process(PublishItemContext context)
        {
            Assert.ArgumentNotNull(context, "context");

            // Skip if the publish operation was skipped or none.
            if ((context.Action != PublishAction.DeleteTargetItem || context.PublishOptions.CompareRevisions) &&
                (context.Result.Operation == PublishOperation.Skipped ||
                 context.Result.Operation == PublishOperation.None))
                return;

            // For deletes the VersionToPublish is the parent, we need to get the item from the previous step
            var item = (Item)context.CustomData["Item"] ?? context.VersionToPublish;
            if (item == null)
                return;

            // Restrict items to certain templates
            var template = TemplateManager.GetTemplate(item);
            // SearchableTemplates is a List<ID>
            if (!SearchableTemplates.Any(x => template.ID == x))
                return;

            // Don't publish messages for standard values
            if (item.ParentID == item.TemplateID)
                return;

            // Convert item to the model for Algolia
            var searchItem = new SearchResultsItem()
            {
                ObjectID = item.ID.ToString(),
                Title = item.Fields[FieldNames.Standard.MenuTitle].Value,
                Content = item.Fields[FieldNames.Base.Content].Value,
                Description = item.Fields[FieldNames.Standard.ShortDescription].Value,
            };

            // Init Algolia Client
            SearchClient client = new SearchClient("<Application ID>", "<API Key>");
            SearchIndex index = client.InitIndex("<Index Name>");

            // Decide what type of update is going to Algolia
            var operation = (context.Action == PublishAction.DeleteTargetItem && !context.PublishOptions.CompareRevisions) ? PublishOperation.Deleted : context.Result.Operation;
            switch (operation)
            {
                case PublishOperation.Deleted:
                    // Delete
                    index.DeleteObject(item.ID.ToString());
                    break;
                case PublishOperation.Skipped:
                    // Skipped
                    break;
                default:
                    // Created / Update
                    index.SaveObject(searchItem);
                    break;
            }
        }
    }
}

To get our processors to run we now need to patch them into the pipeline using a config file.

<configuration xmlns:patch="http://www.sitecore.net/xmlconfig/">
  <sitecore>
    <pipelines>
      <publishItem>
        <processor patch:before="*[@type='Sitecore.Publishing.Pipelines.PublishItem.PerformAction, Sitecore.Kernel']" type="SitecoreAlgolia.DelateAlgoliaItemsAction, SitecoreAlgolia"/>
        <processor patch:after="*[@type='Sitecore.Publishing.Pipelines.PublishItem.PerformAction, Sitecore.Kernel']" type="SitecoreAlgolia.PublishChangesToAlgoliaAction, SitecoreAlgolia"/>
      </publishItem>
    </pipelines>
  </sitecore>
</configuration>

And that's it. As you publish changes the Algolia index will get updated, and a front end can be implemented against Algolia's APIs as it would be on any other site.

Special thanks to Mike Scutta for his blog post on Data Integrations with Sitecore which served as a basis for the logic here.

Populating the internal search report in Sitecore

Out of the box, Sitecore ships with a number of reports pre-configured. Some of these will show data without you doing anything. For example, the pages report will automatically start showing the top entry and exit pages, as a page view is something Sitecore can track.

Others, like the internal search report, will just show a "no data to display" message, which can be confusing and frustrating for your users, particularly when they've just spent money on a license fee to get great analytics data only to see a blank report.

The reason it doesn't show any information is relatively straightforward. Sitecore doesn't know how your site search is going to work and therefore it can't do the data capture part of the process. That part of the process, however, is actually quite simple to do.

Sitecore has a set of page events that can be registered in the analytics tracker. Some of these, like Page Visited, will be handled by Sitecore. In this instance the one we are interested in is Search, and we will have to register it manually.

To register the search event use some code like this (note, there is a constant that references the item id of the search event). The query parameter should be populated with the search term the user entered.

using Sitecore.Analytics;
using Sitecore.Analytics.Data;
using Sitecore.Data.Items;
using Sitecore.Diagnostics;
using SitecoreItemIds;

namespace SitecoreServices
{
    public class SiteSearch
    {
        public static void TrackSiteSearch(Item pageEventItem, string query)
        {
            Assert.ArgumentNotNull(pageEventItem, nameof(pageEventItem));

            // Only register the event when the analytics tracker is running.
            if (Tracker.IsActive)
            {
                // ContentItemIds.Search is the constant holding the item id of the Search page event.
                var pageEventData = new PageEventData("Search", ContentItemIds.Search)
                {
                    ItemId = pageEventItem.ID.ToGuid(),
                    Data = query,
                    DataKey = query,
                    Text = query
                };
                var interaction = Tracker.Current.Session.Interaction;
                if (interaction != null)
                {
                    interaction.CurrentPage.Register(pageEventData);
                }
            }
        }
    }
}

Now, after triggering the code a few times, your internal search report should start to be populated.

Sitecore Search and Indexing: Creating a simple search

With Sitecore 7, Sitecore introduced the new Sitecore.ContentSearch API, which out of the box can query Lucene and Solr based indexes.

Searching the indexes has been made easier through Linq to Sitecore, which allows you to construct a query using Linq, the same as you would with things like Entity Framework or Linq to SQL.

To do a query you first need a search context. Here I'm getting a context on one of the default indexes:

using (var context = ContentSearchManager.GetIndex("sitecore_web_index").CreateSearchContext()) { ... }

Next, a simple query would look like this. Here I'm using a where clause on the "body" field:

using (var context = ContentSearchManager.GetIndex("sitecore_web_index").CreateSearchContext())
{
    IQueryable<SearchResultItem> searchQuery = context.GetQueryable<SearchResultItem>().Where(item => item["body"] == "Sitecore");
}

But what if you want to add a search to your site? Typically you would want to filter on more than one field, what the user enters may be a collection of words rather than an exact phrase, and you'd also like some intelligent ordering to your results.

Here I am splitting the search term on spaces and then building a predicate that has an "or" between each of its conditions. For each condition, rather than doing a .Contains on a specific field, I'm doing it on the content field, which contains data for all fields in the item.

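using System;
using System.Linq;
using Sitecore.ContentSearch;
using Sitecore.ContentSearch.Linq;
using Sitecore.ContentSearch.Linq.Utilities;
using Sitecore.ContentSearch.SearchTypes;

// criteria is the text the user entered; results is a List<SearchResultItem> declared by the caller.
using (var context = ContentSearchManager.GetIndex("sitecore_web_index").CreateSearchContext())
{
    IQueryable<SearchResultItem> query = context.GetQueryable<SearchResultItem>();

    // Start from False: or-ing conditions onto True would match every item.
    var predicate = PredicateBuilder.False<SearchResultItem>();

    foreach (string term in criteria.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries))
    {
        // Copy to a local so each condition captures its own term.
        string searchTerm = term.Trim();
        predicate = predicate.Or(p => p.Content.Contains(searchTerm));
    }

    SearchResults<SearchResultItem> searchResults = query.Where(predicate).GetResults();

    results = (from hit in searchResults.Hits
               select hit.Document).ToList();
}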

The intelligent ordering of results you will get for free, based on what was searched for.
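
The "or" logic is easier to reason about away from the index. The following is a minimal sketch over plain strings rather than Sitecore items, assuming a case-insensitive substring match; TermMatcher and MatchAny are hypothetical names used only for illustration. It shows why an "or" chain has to start from a condition that is false.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class TermMatcher
{
    // Returns the documents that contain at least one of the space-separated terms.
    public static List<string> MatchAny(IEnumerable<string> documents, string criteria)
    {
        var terms = criteria.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);

        // Seed with "false": an "or" chain seeded with "true" would match every document.
        Func<string, bool> predicate = doc => false;

        foreach (var term in terms)
        {
            var previous = predicate; // the chain built so far
            predicate = doc => previous(doc) ||
                               doc.IndexOf(term, StringComparison.OrdinalIgnoreCase) >= 0;
        }

        return documents.Where(predicate).ToList();
    }
}
```

With no terms at all the seed is the whole predicate, so an empty search box returns nothing rather than everything.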