A CSV collection is used to index records from CSV files.
You can create a CSV collection with the following steps.
- After logging in to the Admin Console, select the Collections tab and click Create a New Collection or the "+" icon.
- Choose CSV Collection as Collection Type.
- Enter a unique name for your collection (for example, CSV).
- Choose Private/Public Collection Access and Collection Encryption as per the requirements.
- Choose the language of the content (if the language is other than English).
- Click Save to create the collection.
- Once the CSV collection is created, you will be taken to the CSV Setting tab.
- CSV setting values must be set explicitly for CSV collections.
- The mandatory settings for a CSV collection are:
  - Folder Path
  - Unique field
- You must map a unique field from the CSV file in the CSV collection settings. All records in the CSV file are indexed only if the values in the mapped field are unique.
- When a new collection is created, SearchBlox also pre-configures a few additional parameters, which can be modified as required.
- The following table lists the settings available in a CSV collection:
The folder path where the CSV file(s) are available. It can be set by uploading the files or by entering the CSV file path directly.
The delimiter that separates fields in the CSV file. The default value is a comma (“,”).
The escape character. The default value is “;”.
The quote character. The default value is a single quote (“’”).
Use first record as header
Check this box if the first record in the CSV file should be treated as the header.
The unique field must contain the name of the CSV column that has a unique value in each row.
Relevance - Remove Duplicate
Avoids indexing duplicate documents, i.e., documents that have exactly the same content. The default is NO.
Relevance - Stemming
Stemming matches the inflected forms of a root word when searching. For example, "running", "runs", and "ran" are all inflected forms of "run". The default is YES.
Relevance - Spelling Suggestions
Provides spelling suggestions for the collection. The default is YES.
The keyword-in-context returns search results with the description displayed from content areas where the search term occurs.
Enable Detailed Log Settings
When debug mode is enabled, indexing activity is logged in detail in index.log. Log details include the indexing status of each URL along with its timestamp, the status code, and the time taken for indexing. The default is NO.
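The delimiter, escape, quote, and header settings in the table above correspond to options found in most CSV parsers. The following is a minimal sketch, using Python's built-in `csv` module rather than SearchBlox itself, of how those default values affect the way a record is split into fields. The sample data, the `read_records` helper, and the `columnN` fallback names are hypothetical and only serve as an illustration; SearchBlox may name or handle fields differently.

```python
import csv
import io

# Hypothetical sample data. The first record is the header, and the
# single quote protects the comma inside 'Widget, large'.
SAMPLE = "id,title,notes\n1,'Widget, large',in stock\n2,Gadget,'back order'\n"


def read_records(text, delimiter=",", escapechar=";", quotechar="'",
                 first_record_is_header=True):
    """Parse CSV text into a list of dicts using the given settings.

    The default arguments mirror the default values listed in the
    settings table (comma delimiter, ";" escape, single quote).
    """
    reader = csv.reader(
        io.StringIO(text),
        delimiter=delimiter,
        escapechar=escapechar,
        quotechar=quotechar,
    )
    rows = list(reader)
    if not rows:
        return []
    if first_record_is_header:
        header, data = rows[0], rows[1:]
    else:
        # Assumption for this sketch only: generate placeholder names.
        header = [f"column{i + 1}" for i in range(len(rows[0]))]
        data = rows
    return [dict(zip(header, row)) for row in data]


records = read_records(SAMPLE)
```

With these settings, the quoted value `'Widget, large'` stays in a single field. If the quote character in the settings did not match the one used in the file, the same record would be split into an extra column.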
- After you click Save, you can index or preview the uploaded CSV file.
Note: If the Unique Field values are not unique, the CSV collection results will not match the number of records in the CSV file.
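Because duplicate values in the unique field lead to a record count that differs from the CSV file, it can help to check the column before indexing. The sketch below is one possible pre-check written with Python's standard library; it is not a SearchBlox tool. The `duplicate_values` helper and the sample data are hypothetical, and the dialect defaults are taken from the settings described above.

```python
import csv
import io
from collections import Counter


def duplicate_values(text, unique_field, delimiter=",", escapechar=";",
                     quotechar="'"):
    """Return {value: count} for values that occur more than once
    in the column named `unique_field`. The first record is the header."""
    reader = csv.DictReader(
        io.StringIO(text),
        delimiter=delimiter,
        escapechar=escapechar,
        quotechar=quotechar,
    )
    if reader.fieldnames is None or unique_field not in reader.fieldnames:
        raise KeyError(f"column {unique_field!r} not found in the CSV header")
    counts = Counter(row[unique_field] for row in reader)
    return {value: n for value, n in counts.items() if n > 1}


# Hypothetical sample: the id "2" appears twice.
SAMPLE = "id,title\n1,Widget\n2,Gadget\n2,Gizmo\n"
duplicates = duplicate_values(SAMPLE, "id")  # {'2': 2}
```

An empty result means every row has a distinct value in that column, so it is a reasonable candidate for the unique field.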
The schedule sets the frequency and the start date/time for indexing a collection from the given folder path. SearchBlox supports the following schedule frequencies:
- Every 48 Hours
- Every 96 Hours
The following operations can be performed in CSV collection.
For each collection, indexing can be scheduled based on the above options.
- Users can view the search results by searching for the records here: https://localhost:8443/search/index.html.
- After clicking the search results, the data will appear in a grid format as shown:
Using the Data Fields tab, you can create custom fields for search and view the Default Data Fields for a non-encrypted collection. SearchBlox supports 4 types of Data Fields as listed below:
- Once the Data Fields are configured, the collection must be cleared and re-indexed for the changes to take effect.
To learn more about Data Fields, please refer to Data Fields Tab.
- Ensure that the CSV file has a unique field so that it can be mapped as the unique ID in the collection settings and all records are indexed successfully.
- Maintain the same column schema across all files if the CSV folder contains multiple CSV files.
- Ensure that every opening quote has a matching closing quote, and that the quote character is correctly specified in the settings.
- If there is an opening quote with no closing quote, remove the quote character.
- If you have multiple collections, schedule indexing so that no more than 2-3 collections are indexed at the same time.
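The recommendation to keep the same column schema across files can be verified before indexing. The following is a minimal sketch in Python, not a SearchBlox feature: it compares the header record of every `.csv` file in a folder against the first file. The helper names `read_header` and `mismatched_headers` are hypothetical, and the sketch assumes UTF-8 files whose first record is the header.

```python
import csv
from pathlib import Path


def read_header(path, delimiter=",", quotechar="'"):
    """Return the first record of a CSV file as a list of column names."""
    with open(path, newline="", encoding="utf-8") as handle:
        reader = csv.reader(handle, delimiter=delimiter, quotechar=quotechar)
        return next(reader, [])


def mismatched_headers(folder, **dialect):
    """Map file name -> header for files whose header differs from
    the header of the first .csv file (in sorted order) in `folder`."""
    files = sorted(Path(folder).glob("*.csv"))
    if not files:
        return {}
    reference = read_header(files[0], **dialect)
    mismatches = {}
    for path in files[1:]:
        header = read_header(path, **dialect)
        if header != reference:
            mismatches[path.name] = header
    return mismatches
```

Calling `mismatched_headers` with the folder used as the Folder Path returns an empty dictionary when all files share the same columns in the same order; any file listed in the result is worth correcting before the collection is indexed.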