A CSV collection is used to index records from CSV files.
To create a CSV collection, follow these steps:
- After logging in to the Admin Console, select the Collections tab and click Create a New Collection or the "+" icon.
- Choose CSV Collection as Collection Type.
- Enter a unique name for your collection (for example, CSV).
- Choose Private/Public Collection Access and Collection Encryption as per the requirements.
- Choose the language of the content (if the language is other than English).
- Click Save to create the collection.
- Once the CSV collection is created, you will be taken to the CSV Setting tab.
- CSV setting values must be set explicitly for CSV collections.
- The mandatory settings for a CSV collection are:
  - Folder Path
  - Unique Field
- A unique field in the CSV file must be mapped in the CSV collection settings. All records in the CSV file are indexed only if the values in the mapped field are unique.
- SearchBlox also comes pre-configured with a few additional parameters when a new collection is created; these can be modified as required.
- The following table lists the settings available in a CSV collection:
|Folder Path||The folder path where the CSV file(s) are located. Set it either by uploading the file or by entering the CSV file path directly.|
|Field Separator||The character that separates fields in the CSV file. The default value is a comma (“,”).|
|Escape Character||The escape character. The default value is “;”.|
|Quote Character||The quote character. The default value is a single quote (“’”).|
|Use first record as header||Check this box if the first record in the CSV file should be treated as the header.|
|Unique Field||The name of the CSV column that has a unique value in each row. This value is very important for indexing and searching the records from the indexed CSV file.|
|Relevance - Remove Duplicate||Avoids indexing duplicate documents, i.e., documents that have exactly the same content. The default is NO.|
|Relevance - Stemming||Stemming matches the inflected forms of a root word during search. For example, "running", "runs", and "ran" are all inflected forms of "run". The default is YES.|
|Relevance - Spelling Suggestions||Provide spelling suggestions for the collection. The default is YES.|
|Keyword-in-Context Display||The keyword-in-context returns search results with the description displayed from content areas where the search term occurs.|
|Enable Detailed Log Settings||When debug mode is enabled, indexing activity is logged in detail in index.log. Log details include the indexing status of each URL along with the timestamp, status code, and time taken for indexing. By default this is set to NO.|
|Enable Content API||Provides the ability to crawl the document content with special characters included.|
- After you click the Save button, you can index or preview the uploaded CSV file.
Note: If the Unique Field values are not unique, the CSV collection results will not match the number of records in the CSV file.
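Because all records are indexed only when the mapped Unique Field really is unique, it can help to check the column before indexing. The following is a minimal sketch in Python and is not part of SearchBlox: the function name `find_duplicate_keys` and the sample data are hypothetical, the first record is assumed to be the header, and the `delimiter` and `quotechar` defaults simply mirror the default Field Separator and Quote Character described above.

```python
import csv
import io
from collections import Counter


def find_duplicate_keys(csv_text, unique_field, delimiter=",", quotechar="'"):
    """Return {value: count} for unique_field values that occur more than once.

    Hypothetical helper for illustration; pass the separator and quote
    character that are configured in the collection settings.
    """
    reader = csv.DictReader(
        io.StringIO(csv_text), delimiter=delimiter, quotechar=quotechar
    )
    # The first record is treated as the header row.
    if unique_field not in (reader.fieldnames or []):
        raise KeyError(f"column {unique_field!r} not found in header")
    counts = Counter(row[unique_field] for row in reader)
    return {value: n for value, n in counts.items() if n > 1}


# Made-up sample data: the value 2 appears twice in the "id" column.
sample = "id,name\n1,alpha\n2,beta\n2,gamma\n"
print(find_duplicate_keys(sample, "id"))  # {'2': 2}
```

To check a real file, read its contents first (for example with `open(path, newline="").read()`) and pass the text to the function. An empty result suggests the column is suitable as the Unique Field.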
The schedule sets the frequency and the start date/time for indexing a collection for the given folder path. SearchBlox supports the following schedule frequencies:
- Every 48 Hours
- Every 96 Hours
The following operations can be performed on a CSV collection:
|Schedule||For each collection, indexing can be scheduled based on the above options.|
- Users can view the search results by searching for the records here: https://localhost:8443/search/index.html.
- After you click a search result, the data appears in a grid format.
Using the Data Fields tab, you can create custom fields for search and view the Default Data Fields for a non-encrypted collection. SearchBlox supports 4 types of Data Fields.
- Once the Data Fields are configured, the collection must be cleared and re-indexed for the changes to take effect.
For more information about Data Fields, refer to the Data Fields Tab.
- Ensure that the CSV file has a unique field so that it can be mapped to a unique ID in the collection settings and all records are indexed successfully.
- Ensure that all files use the same column schema if the CSV folder contains multiple CSV files.
- Ensure that quoted values have both an opening and a closing quote, and that the quote character is correctly specified in the settings.
- If there is an opening quote with no closing quote, remove the quote character.
- If you have multiple collections, always schedule indexing so that no more than 2-3 collections are indexed at the same time.
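The schema and quote recommendations above can be checked before indexing. The following is a rough sketch in Python, independent of SearchBlox: the helper names and sample data are hypothetical, the first record of each file is assumed to be the header, and the quote check is only a heuristic that assumes a quoted value never spans multiple lines and that the quote character does not otherwise appear inside the data.

```python
import csv
import io


def read_header(csv_text, delimiter=","):
    """Return the first record of a CSV document as a list of column names."""
    return next(csv.reader(io.StringIO(csv_text), delimiter=delimiter), [])


def mismatched_schemas(files, delimiter=","):
    """Given {file_name: csv_text}, list files whose header differs from the first file's."""
    headers = {name: read_header(text, delimiter) for name, text in files.items()}
    names = list(headers)
    if not names:
        return []
    reference = headers[names[0]]
    return [name for name in names[1:] if headers[name] != reference]


def lines_with_unbalanced_quotes(csv_text, quotechar="'"):
    """Return 1-based line numbers that contain an odd number of quote characters.

    Heuristic only: an odd count suggests an opening quote without a closing one.
    """
    return [
        number
        for number, line in enumerate(csv_text.splitlines(), start=1)
        if line.count(quotechar) % 2 == 1
    ]


# Made-up sample data: c.csv uses "title" where the other files use "name".
files = {
    "a.csv": "id,name\n1,x\n",
    "b.csv": "id,name\n2,y\n",
    "c.csv": "id,title\n3,z\n",
}
print(mismatched_schemas(files))  # ['c.csv']
print(lines_with_unbalanced_quotes('id,name\n1,"ok"\n2,"broken\n', quotechar='"'))  # [3]
```

Any file reported by `mismatched_schemas`, or any line reported by `lines_with_unbalanced_quotes`, is worth reviewing by hand before the collection is indexed.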