Before using the Amazon S3 Connector, install SearchBlox successfully, then create a Custom Collection.
- All files related to the connector must be extracted into the same folder.
- Create a data folder on your drive where files will be stored temporarily, and specify its path in the yml file.
Please contact [email protected] to request the download link for Amazon S3 connector.
On Windows, the connector files can be installed on the C:\ drive; on Linux, they can be installed in /opt.
- Download the SearchBlox Amazon S3 connector and extract the zip archive to a folder under C:\ (Windows) or /opt (Linux).
- The extracted files consist of an executable for your OS, a yml file, and an encrypter.
- Using the encrypter you can encrypt your access key and secret key from Amazon S3.
- Configure the amazonS3.yml file, which contains the following Amazon S3 and SearchBlox properties:
  - encrypted-ak: Encrypted access key from Amazon S3.
  - encrypted-sk: Encrypted secret key from Amazon S3.
  - region: Region of the Amazon S3 instance.
  - data-directory: Data folder where the data is stored. Make sure it has write permission.
  - api-key: SearchBlox API key.
  - colname: Name of the custom collection in SearchBlox.
  - queuename: Amazon S3 SQS parameter (the SQS queue names).
  - private-buckets: If set to true, content from private buckets is indexed.
  - public-buckets: If set to true, content from public buckets is indexed.
  - includebucket: S3 buckets to be included.
  - include-formats: File formats to be included.
  - expiring-url: Use expiring URLs in the search results. The expiring URL is the default.
  - expire-time: Expiry time in minutes. The default is 300.
  - permanent-url: Use permanent URLs in the search results. If both expiring and permanent URLs are enabled, the expiring URL takes precedence.
  - max-folder-size: Maximum size of the static folder, in MB, after which it is swept.
  - servlet-url and delete-api-url: Make sure the port number is correct. If SearchBlox runs on port 8080, the URLs should use port 8080.
- The contents of amazonS3.yml are shown here:

    #User credentials
    encrypted-ak: oSMMs-K1nkdjukjkk003kdkdW004k9GB0wUXjJoCkZ
    encrypted-sk: 8kZf1ZPV_WcU9LBEJFdddIskdFzg5i6kNe-vk6ahffIvc8=
    region: us-east-1
    #Data folder where the data needs to be stored. Make sure it has write permission
    data-directory: D:\GoWorkspace\searchblox\src\sbgoclient\examples\amazonS3
    #SearchBlox API key
    api-key: 83F9C9AF71B1D0B334A7DDE36C99BF6A
    #The name of the collection
    colname: amazon
    queuename: [testabcde]
    #private-buckets: if true, files in private buckets are indexed; if false, they are not
    private-buckets: true
    #public-buckets: if true, files in public buckets are indexed; if false, they are not
    public-buckets: false
    #SearchBlox URL
    url: http://localhost:8080/searchblox/rest/v2/api/
    #Buckets to be included
    includebucket: [tests3]
    #File formats to be included
    include-formats: [.pdf,.docx,.xml,.json]
    #servlet-url
    servlet-url: http://localhost:8080/searchblox/servlet/SearchServlet
    #Expiring URLs in the search results. The expiring URL is the default, and the default expire time is 300 mins
    expiring-url: true
    #Expire time in minutes
    expire-time: 300
    #Permanent URLs in the search results. If both are enabled, the expiring URL takes precedence
    permanent-url: true
    #Maximum size of the static folder, in MB, after which it is swept
    max-folder-size: 2
    #delete-api-url: http://localhost:8080/searchblox/api/rest/docdelete
    delete-api-url: http://localhost:8080/searchblox/rest/v1/api/docdelete
- Make sure to provide the API key, collection name, data folder path, encrypted Amazon S3 keys, and the other related properties.
- Start the connector by running amazonS3.exe on Windows, or ./amazonS3Linux32 or ./amazonS3Linux64 on Linux.
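Before starting the connector, it can help to sanity-check the settings described above. The following is a minimal sketch, not part of the SearchBlox distribution. It assumes amazonS3.yml contains only flat `key: value` lines, as in the sample. The helper names `read_flat_yml` and `check_config` and the list of required keys are assumptions made for illustration.

```python
import os
from urllib.parse import urlparse

# Assumption: these are the keys the sample file always sets.
REQUIRED_KEYS = ["encrypted-ak", "encrypted-sk", "region", "data-directory",
                 "api-key", "colname", "url", "servlet-url", "delete-api-url"]
URL_KEYS = ["url", "servlet-url", "delete-api-url"]


def read_flat_yml(text):
    """Parse flat 'key: value' lines, skipping blank lines and # comments.

    This is not a general YAML parser; it only handles the sample's layout.
    """
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or ":" not in line:
            continue
        # Split on the first colon only, so URLs and Windows paths stay intact.
        key, _, value = line.partition(":")
        settings[key.strip()] = value.strip()
    return settings


def check_config(settings):
    """Return a list of problems found; an empty list means none were found."""
    problems = []
    for key in REQUIRED_KEYS:
        if not settings.get(key):
            problems.append(f"missing value for {key}")
    data_dir = settings.get("data-directory")
    if data_dir and not (os.path.isdir(data_dir) and os.access(data_dir, os.W_OK)):
        problems.append(f"data-directory is not a writable folder: {data_dir}")
    # All SearchBlox URLs should point at the same host and port.
    endpoints = {urlparse(settings[k]).netloc for k in URL_KEYS if settings.get(k)}
    if len(endpoints) > 1:
        problems.append(
            f"SearchBlox URLs use different host:port values: {sorted(endpoints)}")
    return problems
```

To use it, read amazonS3.yml as text, pass the text to `read_flat_yml`, and print whatever `check_config` returns for the resulting settings.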
- In the AWS console, under Services, click Simple Queue Service (SQS), which is listed under Application Integration.
- Select Create New Queue, give the queue an appropriate name, and configure it.
- Set the message visibility timeout, the retention period in days, and the receive message wait time.
- Set the receive message wait time to between 1 and 20 seconds so that the queue uses long polling.
- In the S3 console, select the bucket for which you want to set up SQS notifications and click “Properties”. Then, under “Advanced settings”, click “Events”.
- Select “Add Notification”, give the notification a name, and check the events for which you need notifications (screenshot: https://files.readme.io/e00dec7-4.png).
- In “Send To”, choose SQS Queue; under SQS, enter the name of the queue created earlier, and click Save.
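The console steps above correspond to a set of SQS queue attributes and an S3 bucket notification configuration. The sketch below builds both as plain dictionaries. The attribute names and the `QueueConfigurations` layout follow the AWS API, while the helper names, the default values, and the notification name `searchblox-index` are illustrative assumptions, not settings required by SearchBlox.

```python
def queue_attributes(visibility_timeout_s=30, retention_days=4, wait_time_s=20):
    """Build SQS queue attributes; a wait time of 1-20 seconds enables long polling."""
    if not 1 <= wait_time_s <= 20:
        raise ValueError("receive message wait time must be 1-20 seconds for long polling")
    return {
        # SQS attribute values are passed as strings.
        "VisibilityTimeout": str(visibility_timeout_s),
        # SQS expects the retention period in seconds, not days.
        "MessageRetentionPeriod": str(retention_days * 24 * 60 * 60),
        "ReceiveMessageWaitTimeSeconds": str(wait_time_s),
    }


def bucket_notification(queue_arn, name="searchblox-index",
                        events=("s3:ObjectCreated:*", "s3:ObjectRemoved:*")):
    """Build an S3 notification configuration that sends the given events to one queue."""
    return {
        "QueueConfigurations": [
            {"Id": name, "QueueArn": queue_arn, "Events": list(events)}
        ]
    }
```

If you script the setup with boto3 rather than the console, these dictionaries could be passed as `Attributes` to the SQS client's `create_queue` and as `NotificationConfiguration` to the S3 client's `put_bucket_notification_configuration`.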
From now on, whenever a document in this bucket changes, a notification is sent to SQS.
The queue names must be listed in the S3 connector yml file (the queuename property) so that indexing is triggered whenever a document is updated.
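To illustrate what such a notification carries, here is a minimal sketch that extracts the event name, bucket, and object key from the JSON body of an S3 event message. How the connector itself processes these messages is not documented here, so treat this only as an illustration of the message format; the function name `changed_objects` is hypothetical.

```python
import json
from urllib.parse import unquote_plus


def changed_objects(message_body):
    """Yield (event_name, bucket, key) for each record in an S3 event notification."""
    payload = json.loads(message_body)
    # The test message S3 sends when a notification is first configured has no
    # "Records" entry, so fall back to an empty list.
    for record in payload.get("Records", []):
        s3 = record["s3"]
        # Object keys arrive URL-encoded in event notifications.
        yield (record["eventName"], s3["bucket"]["name"],
               unquote_plus(s3["object"]["key"]))
```

For example, a message with one `ObjectCreated:Put` record for the key `docs/annual+report.pdf` in bucket `tests3` yields the tuple `("ObjectCreated:Put", "tests3", "docs/annual report.pdf")`.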