Google Drive Collection

SearchBlox includes a crawler that indexes documents from a Google Drive account. Follow the steps below to create a Google Drive Collection.

Prerequisites

  • Create a project in your Google account to obtain the KEY File, ServiceAccount ID, Application Name, and ServiceAccount User.
  • Grant the ServiceAccount User the permissions required to access the Google Drive files and folders.
  • Share the files and folders with the ServiceAccount User.
  • Guidelines to create a ServiceAccount

Creating Google Drive Collection

You can create a Google Drive Collection using these steps:

  1. Log in to the Admin Console, go to the Collections tab, and click Create a New Collection or the "+" icon.
  2. Select Google Drive Collection as the Collection Type.
  3. Enter a unique collection name (example: Drive).
  4. Enable or disable RAG (enable if using ChatBot or Hybrid RAG search).
  5. Choose Private/Public access and set Collection Encryption as needed.
  6. Select the content language (if not English).
  7. Click Save to create the collection.
  8. After creating the Google Drive collection, you will be taken to the Authentication tab.

Settings Tab

Field | Description
KEY File | The file generated from the Google Drive Service Account when creating a key. It can be in JSON or PKCS12 format.
ServiceAccount ID | The email address of the created Service Account.
Application Name | The name of the application linked to the Service Account.
ServiceAccount User | The email of the user who has permission to access Google Drive files (default is the ServiceAccount ID).

Upload the KEY File and enter the ServiceAccount ID, Application Name, and ServiceAccount User. You can find all these details in the ServiceAccount section of the Google Cloud Console.
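
Before uploading, it can help to confirm that a JSON KEY File is a service account key and to read the ServiceAccount ID from it. The following is a minimal sketch, assuming a JSON key downloaded from the Google Cloud Console, which carries `type` and `client_email` fields. The function name `read_service_account_id` is hypothetical and not part of SearchBlox, and PKCS12 keys are not handled.

```python
import json


def read_service_account_id(key_path):
    """Return the service account email stored in a JSON KEY File.

    Hypothetical helper for illustration; not part of SearchBlox.
    """
    with open(key_path, encoding="utf-8") as handle:
        key_data = json.load(handle)

    # Google-issued JSON service account keys carry type == "service_account".
    if key_data.get("type") != "service_account":
        raise ValueError("not a service account key file")

    # client_email is the address entered as the ServiceAccount ID.
    email = key_data.get("client_email")
    if not email:
        raise ValueError("key file has no client_email field")
    return email
```

Calling `read_service_account_id("key.json")` on a valid key returns the address to enter as the ServiceAccount ID. The file name here is only a placeholder.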

  • Choose the settings for Generate Using LLM and Hybrid Search.
Setting | Description
Title | Generates concise and relevant titles for the indexed documents using LLM.
Description | Generates the description for indexed documents using LLM.
Topic | Generates relevant topics for indexed documents using LLM, based on the document's content.
Auto Relevance | Enables or disables Hybrid Search for automatic relevance ranking.
  • Click Save, then click Test Connection.

Schedule and Index

A Google Drive collection indexes only the files and folders that are shared with the ServiceAccount User. You can set how often and when the collection is indexed. SearchBlox supports the following schedule options:

  • Once
  • Hourly
  • Daily
  • Every 48 Hours
  • Every 96 Hours
  • Weekly
  • Monthly
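
To make the schedule options concrete, the sketch below maps each option to a repeat interval and computes the next indexing time from the previous one. It is an illustration only, not how SearchBlox schedules jobs internally. The names `SCHEDULE_INTERVALS` and `next_run` are hypothetical, and treating "Monthly" as 30 days is an assumption.

```python
from datetime import datetime, timedelta

# Interval for each schedule option listed above. "Once" does not repeat.
# "Monthly" is approximated as 30 days here, which is an assumption.
SCHEDULE_INTERVALS = {
    "Once": None,
    "Hourly": timedelta(hours=1),
    "Daily": timedelta(days=1),
    "Every 48 Hours": timedelta(hours=48),
    "Every 96 Hours": timedelta(hours=96),
    "Weekly": timedelta(weeks=1),
    "Monthly": timedelta(days=30),
}


def next_run(last_run, frequency):
    """Return the next indexing time, or None when the schedule does not repeat."""
    interval = SCHEDULE_INTERVALS[frequency]
    if interval is None:
        return None
    return last_run + interval


start = datetime(2024, 1, 1, 9, 0)
print(next_run(start, "Every 48 Hours"))  # 2024-01-03 09:00:00
```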

The following operations can be performed in Google Drive collections:

Activity | Description
Enable Scheduler for Indexing | Turn this on to choose when indexing should start and how often it should repeat.
Schedule | Lets you set an indexing schedule using the available frequency options.
View all Collection Schedules | Takes you to the Schedules page where you can see all indexing schedules in one place.

Manage Documents Tab

  • In the Manage Documents tab, you can do these actions:

    1. Filter – Find specific documents.
    2. View content – See the document’s text.
    3. View metadata – Check details about the document.
    4. Refresh – Update the document list.
    5. Delete – Remove documents.
  • To delete a file, type the file path and click Delete.

  • To check the status of an indexed file, click View Metadata.

Data Fields Tab


In the Data Fields tab, you can create custom fields for search and view the default fields in a non-encrypted collection. SearchBlox supports 4 types of Data Fields:

  1. Keyword
  2. Number
  3. Date
  4. Text

After configuring Data Fields, the collection must be cleared and re-indexed to apply the changes.

To learn more about Data Fields, refer to Data Fields Tab.


Models

Embedding

  • Provider specifies the embedding provider used to generate vector representations of documents.
  • Model defines the embedding model used to convert document content into vectors for semantic search.
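
As a rough illustration of what "vectors for semantic search" means, the sketch below ranks two documents by cosine similarity to a query vector. The three-dimensional vectors and document names are made up for the example and are not output from any embedding model, and this is not a description of SearchBlox internals.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Made-up vectors standing in for embedding model output.
documents = {
    "drive-sharing-guide": [0.9, 0.1, 0.0],
    "quarterly-budget": [0.0, 0.2, 0.9],
}
query_vector = [0.8, 0.2, 0.1]

# Documents whose vectors point in a direction closer to the query rank first.
ranked = sorted(
    documents,
    key=lambda name: cosine_similarity(query_vector, documents[name]),
    reverse=True,
)
print(ranked)  # ['drive-sharing-guide', 'quarterly-budget']
```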

Reranker

  • Provider specifies the reranker provider used for improving search result relevance.
  • Model defines the reranker model used to re-score and reorder search results based on relevance.

LLM

  • Provider specifies the Large Language Model provider used for AI-powered features.

  • Model defines the LLM used for tasks such as document enrichment, summaries, and SmartFAQs.

  • These settings override global configurations and apply only to the current collection.
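
The override behavior can be pictured as a merge in which collection-level values take precedence and anything left unset falls back to the global value. This is a minimal sketch of that idea only. The dictionary layout, provider names, model names, and the function `effective_models` are all hypothetical and do not reflect how SearchBlox stores its configuration.

```python
# Hypothetical setting names and values, for illustration only.
global_models = {
    "embedding": {"provider": "provider-a", "model": "embed-small"},
    "reranker": {"provider": "provider-a", "model": "rerank-base"},
    "llm": {"provider": "provider-a", "model": "chat-base"},
}

# This collection overrides only the LLM.
collection_models = {
    "llm": {"provider": "provider-b", "model": "chat-large"},
}


def effective_models(global_cfg, collection_cfg):
    """Collection entries win; anything not overridden falls back to global."""
    merged = {}
    for kind, settings in global_cfg.items():
        merged[kind] = {**settings, **collection_cfg.get(kind, {})}
    return merged


resolved = effective_models(global_models, collection_models)
print(resolved["llm"]["model"])        # chat-large
print(resolved["embedding"]["model"])  # embed-small
```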