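Elasticsearch Scroll: Scrolling Queries
The Elasticsearch Scroll API is a mechanism for working through large data sets, in particular when a large number of documents has to be retrieved from an index. An ordinary search request is limited in how deep it can page: by default, from + size may not exceed 10,000 records (the index.max_result_window setting). The Scroll API gets around this limit by letting you fetch the results incrementally, one batch at a time.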
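To make the from + size limit concrete, here is a small arithmetic sketch. It only reproduces the window check locally and assumes the default window of 10,000; the class and method names are illustrative and not part of any Elasticsearch client.

```java
final class ResultWindowDemo {
    // Assumed default of the index.max_result_window setting.
    static final int MAX_RESULT_WINDOW = 10_000;

    /** Returns true if a from+size page still fits inside the result window. */
    static boolean fitsInWindow(int from, int size) {
        return (long) from + size <= MAX_RESULT_WINDOW;
    }

    public static void main(String[] args) {
        int size = 500;
        // Pages are 1-based here; page 20 is the last one that fits.
        for (int page : new int[] {1, 20, 21}) {
            int from = (page - 1) * size;
            System.out.printf("page %d: from=%d size=%d allowed=%b%n",
                    page, from, size, fitsInWindow(from, size));
        }
    }
}
```

With a page size of 500, page 20 (from = 9,500) is the last page an ordinary search can return; page 21 would need from + size = 10,500 and is rejected, which is the point at which scrolling becomes useful.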
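Key concepts
- Scroll context
  - When you issue the first scroll request, Elasticsearch creates a scroll context. It stores the state and position of the search so that subsequent requests can continue where the previous one stopped.
  - The scroll context is stateful: it is kept on the server for a period of time determined by the timeout you specify.
- Scroll ID
  - Every scroll request returns a scrollId, a unique identifier used to identify and manage the scroll context.
  - You must supply this scrollId in subsequent scroll requests so that Elasticsearch knows where to continue retrieving data.
- Timeout
  - You specify a timeout (keep-alive) for the scroll context, which determines how long it stays alive on the server.
  - If no new scroll request arrives within the timeout, the scroll context is cleared automatically.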
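The relationship between the three concepts can be illustrated with a toy in-memory model. This is only a sketch for intuition and is not how Elasticsearch implements scroll contexts: ToyScrollRegistry and all of its methods are hypothetical, the "index" is a plain list of strings, and the current time is passed in explicitly so the expiry behaviour is deterministic.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.UUID;

/** Toy model of server-side scroll contexts (hypothetical, for illustration only). */
final class ToyScrollRegistry {
    /** State kept per scroll: a frozen snapshot, a cursor, and an expiry time. */
    private static final class Context {
        final List<String> snapshot;
        int position;
        long expiresAtMillis;
        Context(List<String> snapshot) { this.snapshot = snapshot; }
    }

    private final Map<String, Context> contexts = new HashMap<>();

    /** Opens a context over a copy of the data and returns its scroll ID. */
    String open(List<String> data, long keepAliveMillis, long nowMillis) {
        String scrollId = UUID.randomUUID().toString();
        Context ctx = new Context(List.copyOf(data));
        ctx.expiresAtMillis = nowMillis + keepAliveMillis;
        contexts.put(scrollId, ctx);
        return scrollId;
    }

    /** Returns the next batch and renews the keep-alive; fails if the context is gone. */
    List<String> next(String scrollId, int batchSize, long keepAliveMillis, long nowMillis) {
        Context ctx = contexts.get(scrollId);
        if (ctx == null || nowMillis > ctx.expiresAtMillis) {
            contexts.remove(scrollId); // expired contexts are dropped
            throw new IllegalStateException("no scroll context for id " + scrollId);
        }
        int end = Math.min(ctx.position + batchSize, ctx.snapshot.size());
        List<String> batch = ctx.snapshot.subList(ctx.position, end);
        ctx.position = end;
        ctx.expiresAtMillis = nowMillis + keepAliveMillis;
        return batch;
    }

    /** Explicitly releases a context, the analogue of clearing a scroll. */
    boolean clear(String scrollId) {
        return contexts.remove(scrollId) != null;
    }

    public static void main(String[] args) {
        ToyScrollRegistry registry = new ToyScrollRegistry();
        String id = registry.open(List.of("a", "b", "c"), 1_000, 0);
        System.out.println(registry.next(id, 2, 1_000, 500));   // first batch
        System.out.println(registry.next(id, 2, 1_000, 1_400)); // keep-alive was renewed at t=500
        System.out.println(registry.clear(id));
    }
}
```

One simplification to keep in mind: in this model open returns only the ID, whereas the real initial search request already returns the first batch of hits together with the scrollId.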
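How it works
- Initial request
  - You start with a search request that carries the scroll parameter (the timeout). The response contains the first batch of results and a scrollId.
- Subsequent requests
  - Use the returned scrollId to issue follow-up scroll requests. Each one returns the next batch of results together with a scrollId, which may differ from the previous one, so always use the most recent.
  - Keep issuing requests with the latest scrollId until no more results come back.
- Clearing the scroll context
  - Once you have finished retrieving data, clear the scroll context explicitly to release server resources. In the Java client this is done with a ClearScrollRequest.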
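At the REST level, the three steps above correspond to three HTTP requests. The sketch below only builds those requests with the JDK's java.net.http types and sends nothing over the network. The base URL http://localhost:9200, the index name jobs, and the ScrollRestRequests class are assumptions for illustration, and the JSON bodies are assembled by plain string concatenation, which is only acceptable here because scroll IDs contain no characters that need escaping.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.net.http.HttpRequest.BodyPublishers;

/** Builds the three REST requests of a scroll session (illustrative helper). */
final class ScrollRestRequests {
    private final String baseUrl;

    ScrollRestRequests(String baseUrl) {
        this.baseUrl = baseUrl;
    }

    /** Step 1: initial search; the scroll parameter opens a context with this keep-alive. */
    HttpRequest initial(String index, String keepAlive, String searchBodyJson) {
        URI uri = URI.create(baseUrl + "/" + index + "/_search?scroll=" + keepAlive);
        return json(uri, "POST", searchBodyJson);
    }

    /** Step 2: fetch the next batch and renew the keep-alive. */
    HttpRequest next(String scrollId, String keepAlive) {
        String body = "{\"scroll\":\"" + keepAlive + "\",\"scroll_id\":\"" + scrollId + "\"}";
        return json(URI.create(baseUrl + "/_search/scroll"), "POST", body);
    }

    /** Step 3: release the scroll context once done. */
    HttpRequest clear(String scrollId) {
        String body = "{\"scroll_id\":\"" + scrollId + "\"}";
        return json(URI.create(baseUrl + "/_search/scroll"), "DELETE", body);
    }

    private static HttpRequest json(URI uri, String method, String body) {
        return HttpRequest.newBuilder(uri)
                .header("Content-Type", "application/json")
                .method(method, BodyPublishers.ofString(body))
                .build();
    }

    public static void main(String[] args) {
        ScrollRestRequests requests = new ScrollRestRequests("http://localhost:9200");
        HttpRequest first = requests.initial("jobs", "1m", "{\"size\":500,\"query\":{\"match_all\":{}}}");
        HttpRequest following = requests.next("SCROLL_ID_FROM_RESPONSE", "1m");
        HttpRequest cleanup = requests.clear("SCROLL_ID_FROM_RESPONSE");
        for (HttpRequest r : new HttpRequest[] {first, following, cleanup}) {
            System.out.println(r.method() + " " + r.uri());
        }
    }
}
```

To actually run a scroll session you would send these with java.net.http.HttpClient and read the scroll ID out of each response's _scroll_id field before building the next request.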
Java implementation
package com.xxx;

import lombok.extern.slf4j.Slf4j;
import org.elasticsearch.action.search.ClearScrollRequest;
import org.elasticsearch.action.search.SearchRequest;
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.action.search.SearchScrollRequest;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.common.unit.TimeValue;
import org.elasticsearch.index.query.BoolQueryBuilder;
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.search.SearchHit;
import org.elasticsearch.search.builder.SearchSourceBuilder;
import org.enthusa.avatar.core.utils.DateUtil;
import org.enthusa.avatar.utils.task.TaskModel;
import org.springframework.stereotype.Component;

import javax.annotation.Resource;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

// AbstractTask, PlatformJob, Constants, IdMapping, IdType and eduLevelMap
// are project-specific and are not shown in this article.
@Slf4j
@Component
public class ESScrollTask extends AbstractTask {

    private static final String[] INCLUDE_FIELDS = {"entity_id", "job_name", "job_city", "edu_level",
            "locations", "company_id", "career_job_id2", "salary"};
    private static final String[] EXCLUDE_FIELDS = {};
    public static final int ES_EACH_SIZE = 500;    // documents per scroll batch
    public static final int ES_TOTAL_SIZE = 10000; // stop once this many documents are collected

    @Resource
    protected RestHighLevelClient utEsClient;

    private List<PlatformJob> termSearchWithScroll(Integer recruitType) {
        final long scrollTimeout = 60000; // keep-alive in milliseconds
        List<PlatformJob> platformJobs = new ArrayList<>();
        String scrollId = null;
        try {
            SearchRequest searchRequest = new SearchRequest(Constants.ES_JOB_ITEM);
            SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
            // Custom query
            searchSourceBuilder.query(buildQuery(recruitType));
            searchSourceBuilder.fetchSource(INCLUDE_FIELDS, EXCLUDE_FIELDS);
            searchSourceBuilder.size(ES_EACH_SIZE);
            searchRequest.source(searchSourceBuilder);
            // Enable scrolling and set the keep-alive of the scroll context
            searchRequest.scroll(TimeValue.timeValueMillis(scrollTimeout));

            SearchResponse searchResponse = utEsClient.search(searchRequest, RequestOptions.DEFAULT);
            scrollId = searchResponse.getScrollId();
            SearchHit[] searchHits = searchResponse.getHits().getHits();

            while (searchHits != null && searchHits.length > 0 && platformJobs.size() < ES_TOTAL_SIZE) {
                for (SearchHit hit : searchHits) {
                    // Business-specific mapping from the hit source to PlatformJob
                    Map<String, Object> source = hit.getSourceAsMap();
                    PlatformJob platformJob = new PlatformJob();
                    // Small numeric values are parsed as Integer, so go through Number
                    // instead of casting directly to Long
                    platformJob.setJobId(IdMapping.toId(((Number) source.get("entity_id")).longValue()));
                    platformJob.setJobName((String) source.get("job_name"));
                    platformJob.setJobCity((String) source.get("job_city"));
                    platformJob.setEducation(eduLevelMap.get((Integer) source.get("edu_level")));
                    platformJob.setLocations((String) source.get("locations"));
                    platformJob.setCompanyId((Integer) source.get("company_id"));
                    platformJob.setCareerJobId2((Integer) source.get("career_job_id2"));
                    platformJob.setSalary((String) source.get("salary"));
                    platformJobs.add(platformJob);
                }
                // Fetch the next batch with the most recent scroll ID and renew the keep-alive
                SearchScrollRequest scrollRequest = new SearchScrollRequest(scrollId);
                scrollRequest.scroll(TimeValue.timeValueMillis(scrollTimeout));
                searchResponse = utEsClient.scroll(scrollRequest, RequestOptions.DEFAULT);
                scrollId = searchResponse.getScrollId();
                searchHits = searchResponse.getHits().getHits();
            }
        } catch (Exception e) {
            log.error("es search error", e);
            return Collections.emptyList();
        } finally {
            // Clear the scroll context even when the search failed part-way
            clearScroll(scrollId);
        }
        return platformJobs;
    }

    private void clearScroll(String scrollId) {
        if (scrollId == null) {
            return;
        }
        try {
            ClearScrollRequest clearScrollRequest = new ClearScrollRequest();
            clearScrollRequest.addScrollId(scrollId);
            utEsClient.clearScroll(clearScrollRequest, RequestOptions.DEFAULT);
        } catch (Exception e) {
            log.warn("es clear scroll error", e);
        }
    }

    public BoolQueryBuilder buildQuery(Integer recruitType) {
        BoolQueryBuilder queryBuilder = QueryBuilders.boolQuery();
        queryBuilder.filter(QueryBuilders.termQuery("content_type", IdType.PLATFORM_JOB.toString()));
        queryBuilder.filter(QueryBuilders.termQuery("status", 0));
        queryBuilder.filter(QueryBuilders.termQuery("recruit_type", recruitType));
        return queryBuilder;
    }
}