CSV Collection
A CSV collection indexes records from CSV files.
Creating CSV Collection
You can create a CSV collection by using the following steps.
- After logging in to the Admin Console, click the Add Collection button. The Add Collection screen will be displayed.
- Enter a unique name for your collection (for example, CSV).
- Select the CSV collection radio button.
- Click Add to create the collection.
CSV Collection Settings
- The Settings sub-tab holds the CSV settings and the tunable parameters for search.
- CSV setting values must be set explicitly for CSV collections.
- The mandatory settings for a CSV collection are:
  - Folder
  - Unique Field
- A unique field in the CSV file must be mapped in the CSV collection settings. All records in the CSV file are indexed only if the mapped field is unique.
- SearchBlox also comes pre-configured with a few additional parameters when a new collection is created; these can be modified as required.
- The following table lists the settings available in a CSV collection:
Field | Description |
---|---|
Folder | The path of the folder that contains the CSV file(s). |
Field Separator | The character that separates fields. CSV files are usually comma-separated, so the default value is “,”. |
Escape Character | The escape character. The default value is “;”. |
Quote Character | The quote character. The default value is a single quote “'”. |
Use first record as header | Check this box if the first record in the CSV file should be treated as the header. |
Unique Field | The name of the CSV column that has a unique value in each row. This setting is essential for indexing and searching the records from the CSV file. |
Keyword-in-Context Display | The keyword-in-context returns search results with the description displayed from content areas where the search term occurs. |
Boosting | Boost search terms for the collection by setting a value greater than 1 (maximum value 9999). |
Stemming | When stemming is enabled, inflected words are reduced to a root form. For example, "running", "runs", and "ran" are inflected forms of "run". |
Spelling Suggestions | When enabled, a spelling index is created at the end of the indexing process. |
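To illustrate how the separator, quote, and escape settings interact, here is a minimal sketch that parses one line with Python's standard `csv` module, configured with the default values from the table. It only demonstrates the general CSV conventions; SearchBlox's own parser may handle edge cases differently, and the sample line is made up for the example.

```python
import csv
import io

# Default values taken from the settings table.
FIELD_SEPARATOR = ","
QUOTE_CHARACTER = "'"
ESCAPE_CHARACTER = ";"


def parse_line(line):
    """Split one CSV line using the separator, quote, and escape characters above."""
    reader = csv.reader(
        io.StringIO(line),
        delimiter=FIELD_SEPARATOR,
        quotechar=QUOTE_CHARACTER,
        escapechar=ESCAPE_CHARACTER,
    )
    return next(reader)


# Hypothetical sample: the second field contains a separator inside quotes,
# and the third field contains an escaped quote character.
sample = "1,'Smith, John','O;'Brien'"
fields = parse_line(sample)
print(fields)
```

Quoting keeps the comma in the second field from being read as a separator, and the escape character lets a literal quote character appear inside a quoted value.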
Important Note:
If the Unique Field values are not unique, the number of results in the CSV collection will not match the number of records in the CSV file.
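Because duplicate values in the unique field lead to missing records, it can help to check a file before indexing it. The following is a minimal sketch of such a check, not part of SearchBlox; it assumes the first record is the header, the file is UTF-8 encoded, and the default separator and quote character are in use. The function name is hypothetical.

```python
import csv
from collections import Counter


def find_duplicate_values(csv_path, unique_field, delimiter=",", quotechar="'"):
    """Return the values of `unique_field` that occur in more than one row."""
    with open(csv_path, newline="", encoding="utf-8") as handle:
        # Assumption: the first record of the file holds the column names.
        reader = csv.DictReader(handle, delimiter=delimiter, quotechar=quotechar)
        if reader.fieldnames is None or unique_field not in reader.fieldnames:
            raise ValueError(f"column {unique_field!r} not found in header")
        # Missing or empty values are counted together as "" since they would collide.
        counts = Counter((row[unique_field] or "") for row in reader)
    return sorted(value for value, count in counts.items() if count > 1)
```

An empty list means every row has a distinct value in that column; any returned values identify rows that would collide on the same unique ID.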
Indexing and Other Operations
The following operations can be performed in a CSV collection.
Operation | Description |
---|---|
Index | Starts the indexer for the selected collection. |
Clear | Clears the current index for the selected collection. |
Scheduled Activity | For each collection, any of the following scheduled indexer activities can be set: Index - Set the frequency and the start date/time for indexing a collection. Clear - Set the frequency and the start date/time for clearing a collection. |
- Indexer activity is controlled from the Index sub-tab in the collection. The current status of an indexer for a particular collection is indicated.
- The Index operation starts the indexer for the CSV collection.
- On reindexing, that is, clicking Index again after the initial index operation, all records are reindexed. Records that have been deleted from the CSV file(s) since the first index operation are deleted from the index, and new records are indexed.
- Indexing can be controlled from the Index sub-tab for a collection or through the API. The current status of a collection is always shown on the Collection Dashboard and the Index page.
- The Index operation can also be initiated from the Collection Dashboard.
- Scheduling can be performed only from the Index sub-tab.
Viewing Search Results for CSV Collections
- Users can view the search results by searching for the records here: https://localhost:8443/searchblox/plugin/index.html.
- After clicking a search result, the data appears in a grid format.
- Customized facets can be added to the index.html page. The facet can be any value in the table that can be used to filter results.
- The results can also be viewed in JSON format by clicking a CSV search result in the plugin search: https://localhost:8443/searchblox/plugin/index.html?query=*&public=true&debug=true. For example:
[
{"keywords":" 10 11 20 0 0 1 0 61 8 SFN 1 2 0 0 5 5 32 0 1 NL 0 0 8 2004 0.41 6.75 aardsda01",
"description":" 10 11 20 0 0 1 0 61 8 SFN 1 2 0 0 5 5 32 0 1 NL 0 0 8 2004 0.41 6.75 aardsda01",
"created_at":"2015-09-14T05:04:03.345Z",
"_autocomplete":" 10 11 20 0 0 1 0 61 8 SFN 1 2 0 0 5 5 32 0 1 NL 0 0 8 2004 0.41 6.75 aardsda01",
"source":"
{\"BB\":\"10\",\"G\":\"11\",\"H\":\"20\",\"IBB\":\"0\",\"BK\":\"0\",\"HR\":\"1\",\"L\":\"0\",\"BFP\":\"61\",
\"GIDP\":\"\",\"R\":\"8\",\"SF\":\"\",\"SH\":\"\",\"teamID\":\"SFN\",\"W\":\"1\",\"HBP\":\"2\",\"WP\":\"0\",
\"SHO\":\"0\",\"SO\":\"5\",\"GF\":\"5\",\"IPouts\":\"32\",\"SV\":\"0\",\"stint\":\"1\",\"lgID\":\"NL\",\"CG\":\"0\",
\"GS\":\"0\",\"ER\":\"8\",\"yearID\":\"2004\",\"BAOpp\":\"0.41\",\"ERA\":\"6.75\",\"playerID\":\"aardsda01\"}",
"title":"1",
"content":" 10 11 20 0 0 1 0 61 8 SFN 1 2 0 0 5 5 32 0 1 NL 0 0 8 2004 0.41 6.75 aardsda01",
"contenttype":"csv",
"uid":"1"
}
]
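In the sample, the `source` field is itself a JSON-encoded string that holds the original CSV columns keyed by header name, so it has to be decoded a second time to get at individual column values. The following is a minimal sketch of that step, assuming the response is a list of objects shaped like the sample and that `source` arrives as a single JSON-encoded string. The function name and the abbreviated sample data are illustrative only.

```python
import json


def extract_records(response_text):
    """Decode a result list and expand each hit's JSON-encoded `source` string."""
    hits = json.loads(response_text)
    records = []
    for hit in hits:
        # `source` is a string containing JSON, so it needs its own decode step.
        row = json.loads(hit["source"])
        records.append({"uid": hit["uid"], "contenttype": hit["contenttype"], "row": row})
    return records


# Abbreviated stand-in for the sample result, keeping only a few columns.
sample = json.dumps([
    {
        "source": json.dumps({"teamID": "SFN", "yearID": "2004", "playerID": "aardsda01"}),
        "title": "1",
        "contenttype": "csv",
        "uid": "1",
    }
])
records = extract_records(sample)
print(records[0]["row"]["playerID"])
```

All column values come back as strings, as in the sample, so numeric columns such as `ERA` would need explicit conversion before any arithmetic.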
Best Practices
- Ensure the CSV file has a unique field that can be mapped as the unique ID in the collection settings, so that all records are indexed successfully.
- Maintain the same column schema across all files if the CSV folder contains multiple CSV files.
- Ensure that quoted values have both an opening and a closing quote, and that the quote character is specified correctly in the settings.
- If a value has an opening quote but no closing quote, remove the quote character.
- Do not schedule two operations (Index, Clear) for the same time.
- If you have multiple collections, stagger the schedules so that no more than 2-3 collections are indexing at the same time.
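The schema recommendation above can be checked before indexing with a short script. The following is a minimal sketch, not a SearchBlox tool; it assumes every file in the folder has a `.csv` extension, starts with a header record, is UTF-8 encoded, and uses the default separator and quote character. The function names are hypothetical.

```python
import csv
from pathlib import Path


def read_header(csv_path, delimiter=",", quotechar="'"):
    """Return the first record of a CSV file as a list of column names."""
    with open(csv_path, newline="", encoding="utf-8") as handle:
        return next(csv.reader(handle, delimiter=delimiter, quotechar=quotechar), [])


def find_schema_mismatches(folder, delimiter=",", quotechar="'"):
    """Return {file name: header} for files whose header differs from the first file's."""
    files = sorted(Path(folder).glob("*.csv"))
    if not files:
        return {}
    # The alphabetically first file serves as the reference schema.
    reference = read_header(files[0], delimiter, quotechar)
    mismatches = {}
    for path in files[1:]:
        header = read_header(path, delimiter, quotechar)
        if header != reference:
            mismatches[path.name] = header
    return mismatches
```

An empty dictionary means all files share the header of the first file; otherwise each entry shows a file and the header that deviates from it.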