CSV Collection
A CSV collection is used to index records from CSV files.
Creating CSV Collection
You can create a CSV collection by using the following steps.
- After logging in to the Admin Console, select the Collections tab and click on Create a New Collection or the "+" icon.
- Choose CSV Collection as Collection Type.
- Enter a unique name for your collection (for example, CSV).
- Choose Private/Public Collection Access and Collection Encryption as per the requirements.
- Choose the language of the content (if the language is other than English).
- Click Save to create the collection.
- Once the CSV collection is created you will be taken to the CSV Setting tab.
CSV Collection Settings
- CSV setting values must be set explicitly for CSV collections.
- The mandatory settings for a CSV collection are:
- Folder Path
- Unique Field
- A unique field in the CSV file must be mapped in the CSV collection settings. All records in the CSV file are indexed only if the values in the mapped field are unique.
- SearchBlox also comes pre-configured with a few additional parameters when a new collection is created; these can be modified as required.
- The following table lists the settings available in a CSV collection:
Field | Description |
---|---|
Folder Path | The folder path where the CSV file(s) are located. Set it either by uploading the file(s) or by entering the CSV file path directly. |
Field Separator | The character that separates fields in the CSV file. The default value is a comma “,”. |
Escape Character | The escape character. The default value is “;”. |
Quote Character | The character used to quote field values. The default value is a single quote “'”. |
Use first record as header | Check this box if the first record in the CSV file should be treated as the header. |
Unique Field | The name of the CSV column that has a unique value in each row. This setting is essential for indexing and searching the records of the indexed CSV file. |
Relevance - Remove Duplicate | Avoids indexing duplicate documents, i.e., documents that have exactly the same content. The default is NO. |
Relevance - Stemming | Stemming matches the inflected forms of a root word during search. For example, "running", "runs", and "ran" are all inflected forms of "run". The default is YES. |
Relevance - Spelling Suggestions | Provides spelling suggestions for the collection. The default is YES. |
Keyword-in-Context Display | The keyword-in-context returns search results with the description displayed from content areas where the search term occurs. |
Enable Detailed Log Settings | When debug mode is enabled, indexing activity is logged in detail in index.log. Log details include the indexing status of each URL with its timestamp, the status code, and the time taken for indexing. The default is NO. |
Enable Content API | Provides the ability to crawl the document content with special characters included. |
- After you click Save, you can index or preview the uploaded CSV file.
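To illustrate how Field Separator, Quote Character, Escape Character, and Use first record as header interact when a file is read, here is a minimal sketch using Python's standard `csv` module. It is not SearchBlox's own parser. The function name `parse_csv_text`, the generated `columnN` names, and the sample data are hypothetical; the defaults only mirror the values listed in the table above.

```python
import csv
import io

def parse_csv_text(text, separator=",", quote="'", escape=";",
                   first_record_is_header=True):
    """Parse CSV text into a list of dicts, one per record (illustrative only)."""
    reader = csv.reader(
        io.StringIO(text),
        delimiter=separator,   # Field Separator
        quotechar=quote,       # Quote Character
        escapechar=escape,     # Escape Character
    )
    rows = list(reader)
    if not rows:
        return []
    if first_record_is_header:
        header, records = rows[0], rows[1:]
    else:
        # Hypothetical fallback names when the file has no header record.
        header = [f"column{i + 1}" for i in range(len(rows[0]))]
        records = rows
    return [dict(zip(header, record)) for record in records]

# The quoted value keeps its embedded comma as part of a single field.
sample = "id,title\n1,'Hello, world'\n2,Plain\n"
records = parse_csv_text(sample)
print(records[0]["title"])  # Hello, world
```

If the quote character in the settings does not match the one used in the file, the embedded comma in the first record would split the value into two fields, which is why these three settings should be checked against the actual file.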
Important Note:
If the Unique Field values are not unique, the number of results in the CSV collection will not match the number of records in the CSV file.
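Before indexing, it can help to confirm that the column chosen as Unique Field really is unique. The following is a minimal sketch, assuming the first record is the header; the function name `find_duplicate_keys` and the sample data are hypothetical, and the check runs outside SearchBlox with Python's standard library.

```python
import csv
import io
from collections import Counter

def find_duplicate_keys(stream, unique_field, separator=",", quote="'"):
    """Return {value: count} for unique_field values that occur more than once."""
    reader = csv.DictReader(stream, delimiter=separator, quotechar=quote)
    if unique_field not in (reader.fieldnames or []):
        raise KeyError(f"column {unique_field!r} not found in header")
    counts = Counter(row[unique_field] for row in reader)
    return {value: n for value, n in counts.items() if n > 1}

# Two records share the id "2", so the column is not safe as a Unique Field.
sample = io.StringIO("id,name\n1,alpha\n2,beta\n2,gamma\n")
duplicates = find_duplicate_keys(sample, "id")
print(duplicates)  # {'2': 2}
```

For a real file, pass `open(path, newline="", encoding="utf-8")` as `stream`. An empty result means every value in the column occurs exactly once.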
Schedule and Index
Sets the frequency and the start date/time for indexing a collection for the given folder path. The schedule frequencies supported in SearchBlox are as follows:
- Once
- Hourly
- Daily
- Every 48 Hours
- Every 96 Hours
- Weekly
- Monthly
The following operation can be performed in a CSV collection:
Operation | Description |
---|---|
Schedule | For each collection, indexing can be scheduled based on the above options. |
Viewing Search Results for CSV Collections
- Users can view the search results by searching for the records here: https://localhost:8443/search/index.html.
- After you click a search result, the data appears in a grid format.
Data Fields Tab
On the Data Fields tab, you can create custom fields for search and view the Default Data Fields of a non-encrypted collection. SearchBlox supports four types of Data Fields, as listed below:
Keyword
Number
Date
Text
- Once the Data Fields are configured, the collection must be cleared and re-indexed for the changes to take effect.
To learn more about Data Fields, refer to Data Fields Tab.
Best Practices
- Ensure that the CSV file has a unique field so that it can be mapped to the unique ID in the collection settings; this is required for all records to be indexed successfully.
- Keep the same column schema across all files if the CSV folder contains multiple CSV files.
- Ensure that every opening quote has a matching closing quote, and that the quote character is specified correctly in the settings.
- If there is an opening quote with no closing quote, remove the quote character.
- If you have multiple collections, schedule indexing so that no more than 2-3 collections are indexed at the same time.
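The schema and quote checks above can be run on a folder before it is indexed. The sketch below is one possible pre-flight check using Python's standard library; it is not part of SearchBlox, and the function names (`read_header`, `find_schema_mismatches`, `lines_with_unbalanced_quotes`) are hypothetical. It assumes UTF-8 files whose first record is the header. The quote check is a simple heuristic: it flags any line with an odd number of quote characters, so apostrophes in text or quoted values that span several lines can produce false positives.

```python
import csv
from pathlib import Path

def read_header(path, separator=",", quote="'"):
    """Return the first record of a CSV file as a list of column names."""
    with open(path, newline="", encoding="utf-8") as handle:
        return next(csv.reader(handle, delimiter=separator, quotechar=quote), [])

def find_schema_mismatches(folder, separator=",", quote="'"):
    """Return {file name: header} for files whose header differs from the first file's."""
    files = sorted(Path(folder).glob("*.csv"))
    if not files:
        return {}
    expected = read_header(files[0], separator, quote)
    mismatches = {}
    for path in files[1:]:
        header = read_header(path, separator, quote)
        if header != expected:
            mismatches[path.name] = header
    return mismatches

def lines_with_unbalanced_quotes(path, quote="'"):
    """Return 1-based line numbers that contain an odd number of quote characters."""
    with open(path, encoding="utf-8") as handle:
        return [number for number, line in enumerate(handle, start=1)
                if line.count(quote) % 2 == 1]
```

Calling `find_schema_mismatches` with the collection's folder path returns an empty dictionary when all files share the header of the first file (in alphabetical order); any flagged line from `lines_with_unbalanced_quotes` is worth inspecting by hand before indexing.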