CSV Collection

A CSV collection indexes records from one or more CSV files.

Creating CSV Collection

You can create a CSV collection by using the following steps.

  • After logging in to the Admin Console, select the Collections tab and click Create a New
    Collection or the "+" icon.
  • Choose CSV Collection as Collection Type.
  • Enter a unique name for your collection (for example, CSV).
  • Choose Private/Public Collection Access and Collection Encryption as per the requirements.
  • Choose the language of the content (if the language is other than English).
  • Click Save to create the collection.
  • Once the CSV collection is created you will be taken to the CSV Setting tab.

CSV Collection Settings

  • CSV setting values must be set explicitly for CSV collections.
  • The mandatory settings for a CSV collection are:
    • Folder Path
    • Unique Field
  • A unique field from the CSV file must be mapped in the CSV collection settings. All records in the CSV file are indexed only if the values in the mapped field are unique.
  • When a new collection is created, SearchBlox also pre-configures a few additional parameters, which can be modified as required.
  • The following table lists the settings available in a CSV collection:
Field | Description
Folder Path | The folder path where the CSV file(s) are available. Set it by uploading the file or by giving the CSV file path directly.
Field Separator | The character that separates fields in the CSV file. The default value is a comma (“,”).
Escape Character | The escape character. The default value is “;”.
Quote Character | The quote character. The default value is a single quote (“’”).
Use first record as header | Check this box if the first record in the CSV file should be treated as the header.
Unique Field | The name of the CSV column that has a unique value in each row. This value is very important for indexing and searching the records from the indexed CSV file.
Relevance - Remove Duplicate | Avoids indexing duplicate documents, i.e., documents that have exactly the same content. The default is NO.
Relevance - Stemming | Stemming matches inflected forms of a root word in search. For example, "running", "runs", and "ran" are all inflected forms of "run". The default is YES.
Relevance - Spelling Suggestions | Provides spelling suggestions for the collection. The default is YES.
Keyword-in-Context Display | Returns search results with the description taken from the content areas where the search term occurs.
Enable Detailed Log Settings | When debug mode is enabled, indexing activity is logged in detail in index.log. Log details include the indexing status of each URL with its timestamp, the status code, and the time taken for indexing. The default is NO.
Enable Content API | Provides the ability to crawl the document content with special characters included.
  • After you click Save, you can index or preview the uploaded CSV file.
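
Before indexing, it can help to check locally how a file parses under the separator, escape, and quote settings described above. The following is a minimal sketch using Python's standard csv module. The SETTINGS dictionary and the preview_records function are hypothetical names for illustration, not SearchBlox identifiers, and Python's parser may not behave identically to the SearchBlox indexer in every edge case.

```python
import csv
import io

# Hypothetical local mirror of the collection settings; the key names are
# illustrative only and are not SearchBlox identifiers.
SETTINGS = {
    "field_separator": ",",
    "escape_character": ";",
    "quote_character": "'",
    "first_record_is_header": True,
}


def preview_records(text, settings=SETTINGS, limit=5):
    """Parse CSV text with the given settings and return up to `limit` records as dicts."""
    reader = csv.reader(
        io.StringIO(text),
        delimiter=settings["field_separator"],
        escapechar=settings["escape_character"],
        quotechar=settings["quote_character"],
    )
    rows = [row for row in reader if row]  # skip blank lines
    if not rows:
        return []
    if settings["first_record_is_header"]:
        header, data = rows[0], rows[1:]
    else:
        # No header row: generate placeholder column names.
        header = [f"column{i + 1}" for i in range(len(rows[0]))]
        data = rows
    return [dict(zip(header, row)) for row in data[:limit]]


sample = "id,title\n1,'Hello, world'\n2,Plain\n"
for record in preview_records(sample):
    print(record)
```

If a quoted value containing the separator ends up split across two columns in this preview, the quote character in the settings probably does not match the one used in the file.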

❗️

Important Note:

If the Unique Field values are not unique, the number of results in the CSV collection will not match the number of records in the CSV file.
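
One way to catch this before indexing is to look for repeated values in the column that will be mapped as the Unique Field. The sketch below is a local check written in Python, assuming a comma-separated file whose first record is the header. The function name find_duplicate_keys and the sample data are illustrative only.

```python
import csv
import io
from collections import Counter


def find_duplicate_keys(text, unique_field, delimiter=","):
    """Return {value: count} for unique-field values that occur more than once."""
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    if reader.fieldnames is None or unique_field not in reader.fieldnames:
        raise ValueError(f"column {unique_field!r} not found in header")
    counts = Counter(row[unique_field] for row in reader)
    return {value: n for value, n in counts.items() if n > 1}


# Illustrative data: the value "2" appears twice in the "id" column.
sample = "id,name\n1,alpha\n2,beta\n2,gamma\n"
print(find_duplicate_keys(sample, "id"))
```

An empty result means every value in the chosen column is distinct. To check a real file, pass its contents as the text argument.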

Schedule and Index

Schedule and Index sets the frequency and the start date/time for indexing a collection from the given folder path. SearchBlox supports the following schedule frequencies:

  • Once
  • Hourly
  • Daily
  • Every 48 Hours
  • Every 96 Hours
  • Weekly
  • Monthly

The following operation can be performed in a CSV collection.

Operation | Description
Schedule | For each collection, indexing can be scheduled based on the above options.
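
To reason about when a scheduled collection would next run, the fixed-length frequencies can be modelled as plain time intervals counted from the start date/time. This is only an illustrative sketch in Python: the INTERVALS table and next_run function are hypothetical, and how SearchBlox computes run times internally is not documented here, so treat the arithmetic as an assumption.

```python
from datetime import datetime, timedelta

# Fixed-length frequencies from the schedule list. "Once" has no repeat and
# "Monthly" varies in length, so both are left out of this sketch.
INTERVALS = {
    "Hourly": timedelta(hours=1),
    "Daily": timedelta(days=1),
    "Every 48 Hours": timedelta(hours=48),
    "Every 96 Hours": timedelta(hours=96),
    "Weekly": timedelta(weeks=1),
}


def next_run(start, frequency, now):
    """Return the first scheduled time at or after `now`, counting from `start`."""
    interval = INTERVALS[frequency]
    if now <= start:
        return start
    # Ceiling division: number of whole intervals needed to reach `now`.
    steps = -((start - now) // interval)
    return start + steps * interval


start = datetime(2024, 1, 1, 2, 0)
now = datetime(2024, 1, 3, 12, 0)
for name in INTERVALS:
    print(name, next_run(start, name, now))
```

Comparing the computed times across collections makes it easier to spot schedules that would overlap.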

Viewing Search Results for CSV Collections

Data Fields Tab

On the Data Fields tab, you can create custom fields for search and view the default data fields of a non-encrypted collection. SearchBlox supports four types of data fields:

  • Keyword
  • Number
  • Date
  • Text

  • Once the data fields are configured, the collection must be cleared and re-indexed for the changes to take effect.

To learn more about data fields, refer to Data Fields Tab.

👍

Best Practices

  • Make sure the CSV file has a field with unique values so that it can be mapped as the Unique Field in the collection settings and all records are indexed successfully.
  • Keep the same column schema across all files if the CSV folder contains multiple CSV files.
  • Make sure every opening quote has a matching closing quote, and that the quote character is specified correctly in the settings.
  • If there is an opening quote with no closing quote, remove the quote character.
  • If you have multiple collections, schedule indexing so that no more than 2-3 collections are indexed at the same time.
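
The advice about keeping one column schema across files can be checked with a short script before pointing a collection at a folder. The sketch below, in Python, compares the header of every .csv file in a folder against the first file's header. It assumes comma-separated, UTF-8 files whose first record is the header. The function names are illustrative, and the demonstration uses throwaway temporary files rather than real data.

```python
import csv
import tempfile
from pathlib import Path


def read_header(path, delimiter=","):
    """Return the first record of a CSV file, or an empty list if the file is empty."""
    with open(path, newline="", encoding="utf-8") as handle:
        return next(csv.reader(handle, delimiter=delimiter), [])


def find_schema_mismatches(folder, delimiter=","):
    """Return {file name: header} for files whose header differs from the first file's."""
    files = sorted(Path(folder).glob("*.csv"))
    if not files:
        return {}
    expected = read_header(files[0], delimiter)
    mismatches = {}
    for path in files[1:]:
        header = read_header(path, delimiter)
        if header != expected:
            mismatches[path.name] = header
    return mismatches


# Demonstration with temporary files; c.csv uses a different second column.
with tempfile.TemporaryDirectory() as folder:
    Path(folder, "a.csv").write_text("id,title\n1,x\n", encoding="utf-8")
    Path(folder, "b.csv").write_text("id,title\n2,y\n", encoding="utf-8")
    Path(folder, "c.csv").write_text("id,name\n3,z\n", encoding="utf-8")
    print(find_schema_mismatches(folder))
```

An empty result means all files share the header of the first file in sorted order.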