Test environment: Hive on Spark
Spark version: 3.3.1

  • I. Execution timing
  • II. Effect on the result set
  • III. Efficiency comparison
    • 1. Inner join
      • 1) on
      • 2) where
    • 2. Outer join
      • 1) on
      • 2) where
  • IV. Summary
  • PS

I. Execution timing

In a SQL join, where is a filter condition: it is applied to the result set produced by the join, so its theoretical execution point is after the join. on is a join condition: it determines which rows are allowed to match each other, so its theoretical execution point is during the join.

In practice, however, most database systems apply optimizations. The common idea is to move the filter conditions in where, and the join conditions in on, as close to the data source as possible, so that less data participates in the join. As a result, the actual execution timing usually differs from the theoretical one.
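The physical plans shown below can be reproduced by prefixing a query with EXPLAIN FORMATTED (standard Spark SQL 3.x syntax, which produces the numbered-operator output used throughout this post). A minimal sketch:

EXPLAIN FORMATTED
select t1.bid, t2.pv
from poi_data.poi_res t1
join (select bid, pv from bi_report.mon_ronghe_pv where event_day='20240522') t2
on t1.bid = t2.bid
where t2.pv > 100;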

II. Effect on the result set

In an inner join, the result set is the same whether a condition is placed in where or in on.

In an outer join (take the left outer join as an example), all records of the left table are preserved. Conditions in on take effect at join time, so the final result still keeps every left-table record, with NULLs filled in for unmatched right-table columns. where, by contrast, operates on the result set after the join, so it filters rows out and the two result sets differ.
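A toy example makes the difference concrete (a and b are hypothetical tables, not part of this test):

-- assume a(id) holds {1, 2} and b(id, v) holds {(1, 10)}
select a.id, b.v from a left join b on a.id = b.id and b.v > 5;
-- condition in on: both rows of a survive -> (1, 10) and (2, NULL)
select a.id, b.v from a left join b on a.id = b.id where b.v > 5;
-- condition in where: the NULL row is dropped -> only (1, 10)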

III. Efficiency comparison

Test data volumes:

poi_data.poi_res: 83M+ rows.
bi_report.mon_ronghe_pv: a partitioned table with 12B+ rows in total; this test joins against partition 20240522, which holds 59M+ rows, of which 1.2M+ satisfy bid like '1%' and pv > 100.

The join key has no duplicate values in either table (a quick uniqueness check is sketched below).
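A way to check key uniqueness, shown only as a sketch:

select bid
from poi_data.poi_res
group by bid
having count(*) > 1;
-- an empty result means bid is unique in poi_res; the analogous query,
-- restricted to event_day='20240522', applies to bi_report.mon_ronghe_pv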

1. Inner join

1) on

select
    t1.bid, t1.name, t1.point_x, t1.point_y, t2.pv
from poi_data.poi_res t1
join (select bid, pv from bi_report.mon_ronghe_pv where event_day='20240522') t2
on t1.bid = t2.bid
and t2.bid like '1%' and t2.pv > 100;
== Physical Plan ==
AdaptiveSparkPlan (28)
+- == Final Plan ==
   CollectLimit (17)
   +- * Project (16)
      +- * SortMergeJoin Inner (15)
         :- * Sort (6)
         :  +- AQEShuffleRead (5)
         :     +- ShuffleQueryStage (4), Statistics(sizeInBytes=5.3 GiB, rowCount=4.57E+7)
         :        +- Exchange (3)
         :           +- * Filter (2)
         :              +- Scan hive poi_data.poi_res (1)
         +- * Sort (14)
            +- AQEShuffleRead (13)
               +- ShuffleQueryStage (12), Statistics(sizeInBytes=58.5 MiB, rowCount=1.28E+6)
                  +- Exchange (11)
                     +- * Project (10)
                        +- * Filter (9)
                           +- * ColumnarToRow (8)
                              +- Scan parquet bi_report.mon_ronghe_pv (7)
+- == Initial Plan ==
   CollectLimit (27)
   +- Project (26)
      +- SortMergeJoin Inner (25)
         :- Sort (20)
         :  +- Exchange (19)
         :     +- Filter (18)
         :        +- Scan hive poi_data.poi_res (1)
         +- Sort (24)
            +- Exchange (23)
               +- Project (22)
                  +- Filter (21)
                     +- Scan parquet bi_report.mon_ronghe_pv (7)

(1) Scan hive poi_data.poi_res
Output [4]: [bid#297, name#299, point_x#316, point_y#317]
Arguments: [bid#297, name#299, point_x#316, point_y#317], HiveTableRelation [`poi_data`.`poi_res`, org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, Data Cols: [bid#297, type#298, name#299, address#300, phone#301, alias#302, post_code#303, catalog_id#304, c..., Partition Cols: []]

(2) Filter [codegen id : 1]
Input [4]: [bid#297, name#299, point_x#316, point_y#317]
Condition : (StartsWith(bid#297, 1) AND isnotnull(bid#297))

(3) Exchange
Input [4]: [bid#297, name#299, point_x#316, point_y#317]
Arguments: hashpartitioning(bid#297, 600), ENSURE_REQUIREMENTS, [plan_id=774]

(4) ShuffleQueryStage
Output [4]: [bid#297, name#299, point_x#316, point_y#317]
Arguments: 0

(5) AQEShuffleRead
Input [4]: [bid#297, name#299, point_x#316, point_y#317]
Arguments: coalesced

(6) Sort [codegen id : 3]
Input [4]: [bid#297, name#299, point_x#316, point_y#317]
Arguments: [bid#297 ASC NULLS FIRST], false, 0

(7) Scan parquet bi_report.mon_ronghe_pv
Output [3]: [bid#334, pv#335, event_day#338]
Batched: true
Location: InMemoryFileIndex [afs://kunpeng.afs.baidu.com:9902/user/g_spark_rdw/rdw/poi_engine/warehouse/bi_report.db/mon_ronghe_pv/event_day=20240522]
PartitionFilters: [isnotnull(event_day#338), (event_day#338 = 20240522)]
PushedFilters: [IsNotNull(bid), IsNotNull(pv), StringStartsWith(bid,1), GreaterThan(pv,100)]
ReadSchema: struct<bid:string,pv:int>

(8) ColumnarToRow [codegen id : 2]
Input [3]: [bid#334, pv#335, event_day#338]

(9) Filter [codegen id : 2]
Input [3]: [bid#334, pv#335, event_day#338]
Condition : (((isnotnull(bid#334) AND isnotnull(pv#335)) AND StartsWith(bid#334, 1)) AND (pv#335 > 100))

(10) Project [codegen id : 2]
Output [2]: [bid#334, pv#335]
Input [3]: [bid#334, pv#335, event_day#338]

(11) Exchange
Input [2]: [bid#334, pv#335]
Arguments: hashpartitioning(bid#334, 600), ENSURE_REQUIREMENTS, [plan_id=799]

(12) ShuffleQueryStage
Output [2]: [bid#334, pv#335]
Arguments: 1

(13) AQEShuffleRead
Input [2]: [bid#334, pv#335]
Arguments: coalesced

(14) Sort [codegen id : 4]
Input [2]: [bid#334, pv#335]
Arguments: [bid#334 ASC NULLS FIRST], false, 0

(15) SortMergeJoin [codegen id : 5]
Left keys [1]: [bid#297]
Right keys [1]: [bid#334]
Join condition: None

(16) Project [codegen id : 5]
Output [5]: [bid#297, name#299, point_x#316, point_y#317, pv#335]
Input [6]: [bid#297, name#299, point_x#316, point_y#317, bid#334, pv#335]

(17) CollectLimit
Input [5]: [bid#297, name#299, point_x#316, point_y#317, pv#335]
Arguments: 1000

(18) Filter
Input [4]: [bid#297, name#299, point_x#316, point_y#317]
Condition : (StartsWith(bid#297, 1) AND isnotnull(bid#297))

(19) Exchange
Input [4]: [bid#297, name#299, point_x#316, point_y#317]
Arguments: hashpartitioning(bid#297, 600), ENSURE_REQUIREMENTS, [plan_id=759]

(20) Sort
Input [4]: [bid#297, name#299, point_x#316, point_y#317]
Arguments: [bid#297 ASC NULLS FIRST], false, 0

(21) Filter
Input [3]: [bid#334, pv#335, event_day#338]
Condition : (((isnotnull(bid#334) AND isnotnull(pv#335)) AND StartsWith(bid#334, 1)) AND (pv#335 > 100))

(22) Project
Output [2]: [bid#334, pv#335]
Input [3]: [bid#334, pv#335, event_day#338]

(23) Exchange
Input [2]: [bid#334, pv#335]
Arguments: hashpartitioning(bid#334, 600), ENSURE_REQUIREMENTS, [plan_id=760]

(24) Sort
Input [2]: [bid#334, pv#335]
Arguments: [bid#334 ASC NULLS FIRST], false, 0

(25) SortMergeJoin
Left keys [1]: [bid#297]
Right keys [1]: [bid#334]
Join condition: None

(26) Project
Output [5]: [bid#297, name#299, point_x#316, point_y#317, pv#335]
Input [6]: [bid#297, name#299, point_x#316, point_y#317, bid#334, pv#335]

(27) CollectLimit
Input [5]: [bid#297, name#299, point_x#316, point_y#317, pv#335]
Arguments: 1000

(28) AdaptiveSparkPlan
Output [5]: [bid#297, name#299, point_x#316, point_y#317, pv#335]
Arguments: isFinalPlan=true

The physical plan shows that in step (2) a Filter with Condition : (StartsWith(bid#297, 1) AND isnotnull(bid#297)) filters t1 as its source data is read, and in step (7) predicate pushdown filters t2 at scan time with PushedFilters: [IsNotNull(bid), IsNotNull(pv), StringStartsWith(bid,1), GreaterThan(pv,100)]. Both tables are filtered on the data-source side, which reduces both the shuffle volume and the amount of data participating in the join.
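In effect the optimizer behaves as if both filters had been written into the subqueries by hand. A sketch of the logically equivalent query (an illustration of the rewrite, not a query from the original test):

select t1.bid, t1.name, t1.point_x, t1.point_y, t2.pv
from (select bid, name, point_x, point_y
      from poi_data.poi_res
      where bid like '1%') t1
join (select bid, pv
      from bi_report.mon_ronghe_pv
      where event_day='20240522' and bid like '1%' and pv > 100) t2
on t1.bid = t2.bid;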

2) where

select
    t1.bid, t1.name, t1.point_x, t1.point_y, t2.pv
from poi_data.poi_res t1
join (select bid, pv from bi_report.mon_ronghe_pv where event_day='20240522') t2
on t1.bid = t2.bid
where t2.bid like '1%' and t2.pv > 100;
== Physical Plan ==
AdaptiveSparkPlan (28)
+- == Final Plan ==
   CollectLimit (17)
   +- * Project (16)
      +- * SortMergeJoin Inner (15)
         :- * Sort (6)
         :  +- AQEShuffleRead (5)
         :     +- ShuffleQueryStage (4), Statistics(sizeInBytes=5.3 GiB, rowCount=4.57E+7)
         :        +- Exchange (3)
         :           +- * Filter (2)
         :              +- Scan hive poi_data.poi_res (1)
         +- * Sort (14)
            +- AQEShuffleRead (13)
               +- ShuffleQueryStage (12), Statistics(sizeInBytes=58.5 MiB, rowCount=1.28E+6)
                  +- Exchange (11)
                     +- * Project (10)
                        +- * Filter (9)
                           +- * ColumnarToRow (8)
                              +- Scan parquet bi_report.mon_ronghe_pv (7)
+- == Initial Plan ==
   CollectLimit (27)
   +- Project (26)
      +- SortMergeJoin Inner (25)
         :- Sort (20)
         :  +- Exchange (19)
         :     +- Filter (18)
         :        +- Scan hive poi_data.poi_res (1)
         +- Sort (24)
            +- Exchange (23)
               +- Project (22)
                  +- Filter (21)
                     +- Scan parquet bi_report.mon_ronghe_pv (7)

(1) Scan hive poi_data.poi_res
Output [4]: [bid#350, name#352, point_x#369, point_y#370]
Arguments: [bid#350, name#352, point_x#369, point_y#370], HiveTableRelation [`poi_data`.`poi_res`, org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, Data Cols: [bid#350, type#351, name#352, address#353, phone#354, alias#355, post_code#356, catalog_id#357, c..., Partition Cols: []]

(2) Filter [codegen id : 1]
Input [4]: [bid#350, name#352, point_x#369, point_y#370]
Condition : (StartsWith(bid#350, 1) AND isnotnull(bid#350))

(3) Exchange
Input [4]: [bid#350, name#352, point_x#369, point_y#370]
Arguments: hashpartitioning(bid#350, 600), ENSURE_REQUIREMENTS, [plan_id=908]

(4) ShuffleQueryStage
Output [4]: [bid#350, name#352, point_x#369, point_y#370]
Arguments: 0

(5) AQEShuffleRead
Input [4]: [bid#350, name#352, point_x#369, point_y#370]
Arguments: coalesced

(6) Sort [codegen id : 3]
Input [4]: [bid#350, name#352, point_x#369, point_y#370]
Arguments: [bid#350 ASC NULLS FIRST], false, 0

(7) Scan parquet bi_report.mon_ronghe_pv
Output [3]: [bid#387, pv#388, event_day#391]
Batched: true
Location: InMemoryFileIndex [afs://kunpeng.afs.baidu.com:9902/user/g_spark_rdw/rdw/poi_engine/warehouse/bi_report.db/mon_ronghe_pv/event_day=20240522]
PartitionFilters: [isnotnull(event_day#391), (event_day#391 = 20240522)]
PushedFilters: [IsNotNull(bid), IsNotNull(pv), StringStartsWith(bid,1), GreaterThan(pv,100)]
ReadSchema: struct<bid:string,pv:int>

(8) ColumnarToRow [codegen id : 2]
Input [3]: [bid#387, pv#388, event_day#391]

(9) Filter [codegen id : 2]
Input [3]: [bid#387, pv#388, event_day#391]
Condition : (((isnotnull(bid#387) AND isnotnull(pv#388)) AND StartsWith(bid#387, 1)) AND (pv#388 > 100))

(10) Project [codegen id : 2]
Output [2]: [bid#387, pv#388]
Input [3]: [bid#387, pv#388, event_day#391]

(11) Exchange
Input [2]: [bid#387, pv#388]
Arguments: hashpartitioning(bid#387, 600), ENSURE_REQUIREMENTS, [plan_id=933]

(12) ShuffleQueryStage
Output [2]: [bid#387, pv#388]
Arguments: 1

(13) AQEShuffleRead
Input [2]: [bid#387, pv#388]
Arguments: coalesced

(14) Sort [codegen id : 4]
Input [2]: [bid#387, pv#388]
Arguments: [bid#387 ASC NULLS FIRST], false, 0

(15) SortMergeJoin [codegen id : 5]
Left keys [1]: [bid#350]
Right keys [1]: [bid#387]
Join condition: None

(16) Project [codegen id : 5]
Output [5]: [bid#350, name#352, point_x#369, point_y#370, pv#388]
Input [6]: [bid#350, name#352, point_x#369, point_y#370, bid#387, pv#388]

(17) CollectLimit
Input [5]: [bid#350, name#352, point_x#369, point_y#370, pv#388]
Arguments: 1000

(18) Filter
Input [4]: [bid#350, name#352, point_x#369, point_y#370]
Condition : (StartsWith(bid#350, 1) AND isnotnull(bid#350))

(19) Exchange
Input [4]: [bid#350, name#352, point_x#369, point_y#370]
Arguments: hashpartitioning(bid#350, 600), ENSURE_REQUIREMENTS, [plan_id=893]

(20) Sort
Input [4]: [bid#350, name#352, point_x#369, point_y#370]
Arguments: [bid#350 ASC NULLS FIRST], false, 0

(21) Filter
Input [3]: [bid#387, pv#388, event_day#391]
Condition : (((isnotnull(bid#387) AND isnotnull(pv#388)) AND StartsWith(bid#387, 1)) AND (pv#388 > 100))

(22) Project
Output [2]: [bid#387, pv#388]
Input [3]: [bid#387, pv#388, event_day#391]

(23) Exchange
Input [2]: [bid#387, pv#388]
Arguments: hashpartitioning(bid#387, 600), ENSURE_REQUIREMENTS, [plan_id=894]

(24) Sort
Input [2]: [bid#387, pv#388]
Arguments: [bid#387 ASC NULLS FIRST], false, 0

(25) SortMergeJoin
Left keys [1]: [bid#350]
Right keys [1]: [bid#387]
Join condition: None

(26) Project
Output [5]: [bid#350, name#352, point_x#369, point_y#370, pv#388]
Input [6]: [bid#350, name#352, point_x#369, point_y#370, bid#387, pv#388]

(27) CollectLimit
Input [5]: [bid#350, name#352, point_x#369, point_y#370, pv#388]
Arguments: 1000

(28) AdaptiveSparkPlan
Output [5]: [bid#350, name#352, point_x#369, point_y#370, pv#388]
Arguments: isFinalPlan=true

The physical plan is unchanged. So when the engine supports predicate pushdown, it makes no difference whether the filter conditions of an inner join are written in where or in on: either way the data is filtered on the data-source side, reducing the amount of data that participates in the join.

2. Outer join

1) on

select
    t1.bid, t1.name, t1.point_x, t1.point_y, t2.pv
from poi_data.poi_res t1
left join (select bid, pv from bi_report.mon_ronghe_pv where event_day='20240522') t2
on t1.bid = t2.bid
and t2.bid like '1%' and t2.pv > 100;
== Physical Plan ==
AdaptiveSparkPlan (28)
+- == Final Plan ==
   CollectLimit (17)
   +- * Project (16)
      +- * SortMergeJoin LeftOuter (15)
         :- * Sort (6)
         :  +- AQEShuffleRead (5)
         :     +- ShuffleQueryStage (4), Statistics(sizeInBytes=36.5 MiB, rowCount=3.07E+5)
         :        +- Exchange (3)
         :           +- * LocalLimit (2)
         :              +- Scan hive poi_data.poi_res (1)
         +- * Sort (14)
            +- AQEShuffleRead (13)
               +- ShuffleQueryStage (12), Statistics(sizeInBytes=58.5 MiB, rowCount=1.28E+6)
                  +- Exchange (11)
                     +- * Project (10)
                        +- * Filter (9)
                           +- * ColumnarToRow (8)
                              +- Scan parquet bi_report.mon_ronghe_pv (7)
+- == Initial Plan ==
   CollectLimit (27)
   +- Project (26)
      +- SortMergeJoin LeftOuter (25)
         :- Sort (20)
         :  +- Exchange (19)
         :     +- LocalLimit (18)
         :        +- Scan hive poi_data.poi_res (1)
         +- Sort (24)
            +- Exchange (23)
               +- Project (22)
                  +- Filter (21)
                     +- Scan parquet bi_report.mon_ronghe_pv (7)

(1) Scan hive poi_data.poi_res
Output [4]: [bid#403, name#405, point_x#422, point_y#423]
Arguments: [bid#403, name#405, point_x#422, point_y#423], HiveTableRelation [`poi_data`.`poi_res`, org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, Data Cols: [bid#403, type#404, name#405, address#406, phone#407, alias#408, post_code#409, catalog_id#410, c..., Partition Cols: []]

(2) LocalLimit [codegen id : 1]
Input [4]: [bid#403, name#405, point_x#422, point_y#423]
Arguments: 1000

(3) Exchange
Input [4]: [bid#403, name#405, point_x#422, point_y#423]
Arguments: hashpartitioning(bid#403, 600), ENSURE_REQUIREMENTS, [plan_id=1043]

(4) ShuffleQueryStage
Output [4]: [bid#403, name#405, point_x#422, point_y#423]
Arguments: 0

(5) AQEShuffleRead
Input [4]: [bid#403, name#405, point_x#422, point_y#423]
Arguments: coalesced

(6) Sort [codegen id : 3]
Input [4]: [bid#403, name#405, point_x#422, point_y#423]
Arguments: [bid#403 ASC NULLS FIRST], false, 0

(7) Scan parquet bi_report.mon_ronghe_pv
Output [3]: [bid#440, pv#441, event_day#444]
Batched: true
Location: InMemoryFileIndex [afs://kunpeng.afs.baidu.com:9902/user/g_spark_rdw/rdw/poi_engine/warehouse/bi_report.db/mon_ronghe_pv/event_day=20240522]
PartitionFilters: [isnotnull(event_day#444), (event_day#444 = 20240522)]
PushedFilters: [IsNotNull(bid), IsNotNull(pv), StringStartsWith(bid,1), GreaterThan(pv,100)]
ReadSchema: struct<bid:string,pv:int>

(8) ColumnarToRow [codegen id : 2]
Input [3]: [bid#440, pv#441, event_day#444]

(9) Filter [codegen id : 2]
Input [3]: [bid#440, pv#441, event_day#444]
Condition : (((isnotnull(bid#440) AND isnotnull(pv#441)) AND StartsWith(bid#440, 1)) AND (pv#441 > 100))

(10) Project [codegen id : 2]
Output [2]: [bid#440, pv#441]
Input [3]: [bid#440, pv#441, event_day#444]

(11) Exchange
Input [2]: [bid#440, pv#441]
Arguments: hashpartitioning(bid#440, 600), ENSURE_REQUIREMENTS, [plan_id=1067]

(12) ShuffleQueryStage
Output [2]: [bid#440, pv#441]
Arguments: 1

(13) AQEShuffleRead
Input [2]: [bid#440, pv#441]
Arguments: coalesced

(14) Sort [codegen id : 4]
Input [2]: [bid#440, pv#441]
Arguments: [bid#440 ASC NULLS FIRST], false, 0

(15) SortMergeJoin [codegen id : 5]
Left keys [1]: [bid#403]
Right keys [1]: [bid#440]
Join condition: None

(16) Project [codegen id : 5]
Output [5]: [bid#403, name#405, point_x#422, point_y#423, pv#441]
Input [6]: [bid#403, name#405, point_x#422, point_y#423, bid#440, pv#441]

(17) CollectLimit
Input [5]: [bid#403, name#405, point_x#422, point_y#423, pv#441]
Arguments: 1000

(18) LocalLimit
Input [4]: [bid#403, name#405, point_x#422, point_y#423]
Arguments: 1000

(19) Exchange
Input [4]: [bid#403, name#405, point_x#422, point_y#423]
Arguments: hashpartitioning(bid#403, 600), ENSURE_REQUIREMENTS, [plan_id=1029]

(20) Sort
Input [4]: [bid#403, name#405, point_x#422, point_y#423]
Arguments: [bid#403 ASC NULLS FIRST], false, 0

(21) Filter
Input [3]: [bid#440, pv#441, event_day#444]
Condition : (((isnotnull(bid#440) AND isnotnull(pv#441)) AND StartsWith(bid#440, 1)) AND (pv#441 > 100))

(22) Project
Output [2]: [bid#440, pv#441]
Input [3]: [bid#440, pv#441, event_day#444]

(23) Exchange
Input [2]: [bid#440, pv#441]
Arguments: hashpartitioning(bid#440, 600), ENSURE_REQUIREMENTS, [plan_id=1030]

(24) Sort
Input [2]: [bid#440, pv#441]
Arguments: [bid#440 ASC NULLS FIRST], false, 0

(25) SortMergeJoin
Left keys [1]: [bid#403]
Right keys [1]: [bid#440]
Join condition: None

(26) Project
Output [5]: [bid#403, name#405, point_x#422, point_y#423, pv#441]
Input [6]: [bid#403, name#405, point_x#422, point_y#423, bid#440, pv#441]

(27) CollectLimit
Input [5]: [bid#403, name#405, point_x#422, point_y#423, pv#441]
Arguments: 1000

(28) AdaptiveSparkPlan
Output [5]: [bid#403, name#405, point_x#422, point_y#423, pv#441]
Arguments: isFinalPlan=true

Because this is a left join, the conditions in on are join conditions and the result must keep every record of the left table, so no filter is applied to t1 and it is read in full (the LocalLimit (2) in the plan appears to come from the client's implicit LIMIT 1000, which can safely be pushed to the preserved side of a left outer join), while t2 is still filtered via predicate pushdown.
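Semantically, on-conditions that reference only the right table of a left join amount to pre-filtering the right side, which is why the pushdown is safe there. A sketch of the equivalent query:

select t1.bid, t1.name, t1.point_x, t1.point_y, t2.pv
from poi_data.poi_res t1
left join (select bid, pv
           from bi_report.mon_ronghe_pv
           where event_day='20240522' and bid like '1%' and pv > 100) t2
on t1.bid = t2.bid;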

2) where

select
    t1.bid, t1.name, t1.point_x, t1.point_y, t2.pv
from poi_data.poi_res t1
left join (select bid, pv from bi_report.mon_ronghe_pv where event_day='20240522') t2
on t1.bid = t2.bid
where t2.bid like '1%' and t2.pv > 100;
== Physical Plan ==
AdaptiveSparkPlan (28)
+- == Final Plan ==
   CollectLimit (17)
   +- * Project (16)
      +- * SortMergeJoin Inner (15)
         :- * Sort (6)
         :  +- AQEShuffleRead (5)
         :     +- ShuffleQueryStage (4), Statistics(sizeInBytes=5.3 GiB, rowCount=4.57E+7)
         :        +- Exchange (3)
         :           +- * Filter (2)
         :              +- Scan hive poi_data.poi_res (1)
         +- * Sort (14)
            +- AQEShuffleRead (13)
               +- ShuffleQueryStage (12), Statistics(sizeInBytes=58.5 MiB, rowCount=1.28E+6)
                  +- Exchange (11)
                     +- * Project (10)
                        +- * Filter (9)
                           +- * ColumnarToRow (8)
                              +- Scan parquet bi_report.mon_ronghe_pv (7)
+- == Initial Plan ==
   CollectLimit (27)
   +- Project (26)
      +- SortMergeJoin Inner (25)
         :- Sort (20)
         :  +- Exchange (19)
         :     +- Filter (18)
         :        +- Scan hive poi_data.poi_res (1)
         +- Sort (24)
            +- Exchange (23)
               +- Project (22)
                  +- Filter (21)
                     +- Scan parquet bi_report.mon_ronghe_pv (7)

(1) Scan hive poi_data.poi_res
Output [4]: [bid#456, name#458, point_x#475, point_y#476]
Arguments: [bid#456, name#458, point_x#475, point_y#476], HiveTableRelation [`poi_data`.`poi_res`, org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, Data Cols: [bid#456, type#457, name#458, address#459, phone#460, alias#461, post_code#462, catalog_id#463, c..., Partition Cols: []]

(2) Filter [codegen id : 1]
Input [4]: [bid#456, name#458, point_x#475, point_y#476]
Condition : (StartsWith(bid#456, 1) AND isnotnull(bid#456))

(3) Exchange
Input [4]: [bid#456, name#458, point_x#475, point_y#476]
Arguments: hashpartitioning(bid#456, 600), ENSURE_REQUIREMENTS, [plan_id=1176]

(4) ShuffleQueryStage
Output [4]: [bid#456, name#458, point_x#475, point_y#476]
Arguments: 0

(5) AQEShuffleRead
Input [4]: [bid#456, name#458, point_x#475, point_y#476]
Arguments: coalesced

(6) Sort [codegen id : 3]
Input [4]: [bid#456, name#458, point_x#475, point_y#476]
Arguments: [bid#456 ASC NULLS FIRST], false, 0

(7) Scan parquet bi_report.mon_ronghe_pv
Output [3]: [bid#493, pv#494, event_day#497]
Batched: true
Location: InMemoryFileIndex [afs://kunpeng.afs.baidu.com:9902/user/g_spark_rdw/rdw/poi_engine/warehouse/bi_report.db/mon_ronghe_pv/event_day=20240522]
PartitionFilters: [isnotnull(event_day#497), (event_day#497 = 20240522)]
PushedFilters: [IsNotNull(bid), IsNotNull(pv), StringStartsWith(bid,1), GreaterThan(pv,100)]
ReadSchema: struct<bid:string,pv:int>

(8) ColumnarToRow [codegen id : 2]
Input [3]: [bid#493, pv#494, event_day#497]

(9) Filter [codegen id : 2]
Input [3]: [bid#493, pv#494, event_day#497]
Condition : (((isnotnull(bid#493) AND isnotnull(pv#494)) AND StartsWith(bid#493, 1)) AND (pv#494 > 100))

(10) Project [codegen id : 2]
Output [2]: [bid#493, pv#494]
Input [3]: [bid#493, pv#494, event_day#497]

(11) Exchange
Input [2]: [bid#493, pv#494]
Arguments: hashpartitioning(bid#493, 600), ENSURE_REQUIREMENTS, [plan_id=1201]

(12) ShuffleQueryStage
Output [2]: [bid#493, pv#494]
Arguments: 1

(13) AQEShuffleRead
Input [2]: [bid#493, pv#494]
Arguments: coalesced

(14) Sort [codegen id : 4]
Input [2]: [bid#493, pv#494]
Arguments: [bid#493 ASC NULLS FIRST], false, 0

(15) SortMergeJoin [codegen id : 5]
Left keys [1]: [bid#456]
Right keys [1]: [bid#493]
Join condition: None

(16) Project [codegen id : 5]
Output [5]: [bid#456, name#458, point_x#475, point_y#476, pv#494]
Input [6]: [bid#456, name#458, point_x#475, point_y#476, bid#493, pv#494]

(17) CollectLimit
Input [5]: [bid#456, name#458, point_x#475, point_y#476, pv#494]
Arguments: 1000

(18) Filter
Input [4]: [bid#456, name#458, point_x#475, point_y#476]
Condition : (StartsWith(bid#456, 1) AND isnotnull(bid#456))

(19) Exchange
Input [4]: [bid#456, name#458, point_x#475, point_y#476]
Arguments: hashpartitioning(bid#456, 600), ENSURE_REQUIREMENTS, [plan_id=1161]

(20) Sort
Input [4]: [bid#456, name#458, point_x#475, point_y#476]
Arguments: [bid#456 ASC NULLS FIRST], false, 0

(21) Filter
Input [3]: [bid#493, pv#494, event_day#497]
Condition : (((isnotnull(bid#493) AND isnotnull(pv#494)) AND StartsWith(bid#493, 1)) AND (pv#494 > 100))

(22) Project
Output [2]: [bid#493, pv#494]
Input [3]: [bid#493, pv#494, event_day#497]

(23) Exchange
Input [2]: [bid#493, pv#494]
Arguments: hashpartitioning(bid#493, 600), ENSURE_REQUIREMENTS, [plan_id=1162]

(24) Sort
Input [2]: [bid#493, pv#494]
Arguments: [bid#493 ASC NULLS FIRST], false, 0

(25) SortMergeJoin
Left keys [1]: [bid#456]
Right keys [1]: [bid#493]
Join condition: None

(26) Project
Output [5]: [bid#456, name#458, point_x#475, point_y#476, pv#494]
Input [6]: [bid#456, name#458, point_x#475, point_y#476, bid#493, pv#494]

(27) CollectLimit
Input [5]: [bid#456, name#458, point_x#475, point_y#476, pv#494]
Arguments: 1000

(28) AdaptiveSparkPlan
Output [5]: [bid#456, name#458, point_x#475, point_y#476, pv#494]
Arguments: isFinalPlan=true

where is a filter condition and shapes the final result of the left join. Because t2.bid like '1%' and t2.pv > 100 reject NULLs, the optimizer rewrites the left outer join as an inner join (the plan now shows SortMergeJoin Inner), and through the equi-join condition the bid predicate is also propagated to the left side: in step (2) of the plan, t1 is filtered by bid before the join, just as in the inner-join case.

IV. Summary

Assuming the database system supports predicate pushdown:

  • Inner join: the two inner-join plans are identical; both filter t2 with PushedFilters: [IsNotNull(bid), IsNotNull(pv), StringStartsWith(bid,1), GreaterThan(pv,100)] and t1 with Condition : (StartsWith(bid#297, 1) AND isnotnull(bid#297)). For an inner join, then, where and on are equally efficient.
  • Outer join: taking the left outer join again, conditions on the right table are pushed down in either case; whether the left table is filtered early depends on where versus on, and on whether the predicates constrain the left table. 1) With on, the left table must be read in full, so the efficiency gap mainly depends on the left table's size. 2) With where, if the predicates constrain the left table (here, indirectly through the equi-join key), its data is filtered early; otherwise the left table is still read in full. The two patterns are sketched after this list.
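The two left-join patterns side by side (a sketch; the behavior described was verified here only on Spark 3.3.1 with these tables):

-- keep every poi_res row and filter only the right side: condition in on
select t1.bid, t2.pv
from poi_data.poi_res t1
left join bi_report.mon_ronghe_pv t2
on t1.bid = t2.bid and t2.event_day = '20240522' and t2.pv > 100;

-- keep only matched rows that pass the filter: condition in where
-- (the null-rejecting predicate lets the optimizer turn this into an inner join)
select t1.bid, t2.pv
from poi_data.poi_res t1
left join bi_report.mon_ronghe_pv t2
on t1.bid = t2.bid and t2.event_day = '20240522'
where t2.pv > 100;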

PS

In the inner-join physical plans, the filter on poi_res appears as a standalone Filter step (2) with Condition : (StartsWith(bid#297, 1) AND isnotnull(bid#297)), while the filter on mon_ronghe_pv lives inside the scan in step (7) as PushedFilters: [IsNotNull(bid), IsNotNull(pv), StringStartsWith(bid,1), GreaterThan(pv,100)]. What is the difference? From the material I found, PushedFilters can be understood as filtering at read time: rows that fail the condition are simply never read. A Filter first reads the data and only then decides, row by row, whether it participates in the downstream computation.

Since both happen on the data-source side, my initial question was why the Filter cannot behave like PushedFilters and discard rows while reading, which would also cut the input volume and improve efficiency. From the material I found, whether data can be filtered inside the scan depends mainly on the data source. Big-data storage formats are broadly row-oriented or column-oriented, and columnar formats, thanks to their layout and rich metadata, support predicate pushdown much better. In this test, mon_ronghe_pv is stored as parquet, while poi_res is a text table.
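If scan-time filtering on poi_res mattered, one option would be to keep a copy of it in a columnar format so that simple predicates can surface as PushedFilters. A sketch (poi_res_parquet is a hypothetical table name, not part of this test):

create table poi_data.poi_res_parquet stored as parquet as
select * from poi_data.poi_res;
-- a later scan of poi_res_parquet with a predicate such as bid like '1%'
-- can then be evaluated inside the parquet reader rather than in a separate Filter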
