
news 2025/7/6 11:54:47


This note was collected from the internet and is intended for study and reference only; its contents are not guaranteed to be correct.

Contents

- 1. Installing and using Presto
- 2. Event analysis
- 3. Funnel analysis
- 4. Developing the funnel-analysis UDAF
  - Developing a UDF plugin
  - Developing a UDAF plugin
- 5. Funnel testing

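1. Installing and using Presto

See the official documentation: https://prestodb.io/docs/current/

Presto is an efficient query and analysis engine that supports many data sources (for example Hive, MySQL, MD, and Kafka). Queries run in memory, which this note credits for better performance than Spark. A notable feature is that memory use is configurable: you can set limits on how much memory a query may use across the cluster and on each node.

Installation and deployment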

# Create the installation directory
mkdir -p /opt1/soft/presto
# Download presto-server
wget -P /opt1/soft/presto http://doc.yihongyeyan.com/qf/project/soft/presto/presto-server-0.236.tar.gz
# Extract it into the installation directory
tar -zxvf /opt1/soft/presto/presto-server-0.236.tar.gz -C /opt1/soft/presto
# Create a symlink
ln -s /opt1/soft/presto/presto-server-0.236 /opt1/soft/presto/presto-server
# Create the etc directory under the installation directory
cd /opt1/soft/presto/presto-server/ && mkdir etc
# Create the node data directory
mkdir -p /data1/presto/data
# Next, create the configuration files
cd /opt1/soft/presto/presto-server/etc/
# config.properties: Presto server configuration
cat << EOF > config.properties
coordinator=true
node-scheduler.include-coordinator=true
http-server.http.port=8080
# Maximum user memory a single query may use across the whole cluster
query.max-memory=3GB
# Maximum user memory a single query may use on any one node
query.max-memory-per-node=1GB
# Maximum user + system memory a single query may use on any one node (user memory: hash join, aggregation, etc.; system memory: input/output/exchange buffers, etc.)
query.max-total-memory-per-node=2GB
discovery-server.enabled=true
discovery.uri=http://0.0.0.0:8080
EOF
# node.properties: node configuration
cat << EOF > node.properties
node.environment=production
node.id=node01
node.data-dir=/data1/presto/data
EOF
# jvm.config: note the -DHADOOP_USER_NAME setting; replace it with the user you need for HDFS access
cat << EOF > jvm.config
-server
-Xmx3G
-XX:+UseG1GC
-XX:G1HeapRegionSize=32M
-XX:+UseGCOverheadLimit
-XX:+ExplicitGCInvokesConcurrent
-XX:+HeapDumpOnOutOfMemoryError
-XX:+ExitOnOutOfMemoryError
-DHADOOP_USER_NAME=root
EOF
# log.properties: the default level is INFO; the other levels are ERROR, WARN, and DEBUG
cat << EOF > log.properties
com.facebook.presto=INFO
EOF
# Catalog configuration, i.e. the configuration of each data source. We use Hive; replace the thrift address with your own
mkdir -p /opt1/soft/presto/presto-server/etc/catalog
cat << EOF > catalog/hive.properties
connector.name=hive-hadoop2
hive.metastore.uri=thrift://192.168.10.99:9083
hive.parquet.use-column-names=true
hive.allow-rename-column=true
hive.allow-rename-table=true
hive.allow-drop-table=true
EOF
# Add Hudi support
wget -P /opt1/soft/presto/presto-server/plugin/hive-hadoop2 http://doc.yihongyeyan.com/qf/project/soft/hudi/hudi-presto-bundle-0.5.2-incubating.jar
# Install the client (CLI)
wget -P /opt1/soft/presto/ http://doc.yihongyeyan.com/qf/project/soft/presto/presto-cli-0.236-executable.jar
cd /opt1/soft/presto/
mv presto-cli-0.236-executable.jar presto
chmod u+x presto
ln -s /opt1/soft/presto/presto /usr/bin/presto
# Presto is now installed


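Testing

# Start presto-server. The command below starts it in the background; the log files are under /data1/presto/data/var/log/, i.e. var/log inside the node.data-dir set in node.properties
/opt1/soft/presto/presto-server/bin/launcher start
# Connect the Presto CLI to the Hive catalog
presto --server 192.168.10.99:8080 --catalog hive --schema ods_news1
# Run this query to list the tables in our Hive schema
show tables;

Inside the client, when a query returns many rows, press End to jump to the bottom of the output, and press q to leave the result viewer.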

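2. Event analysis

First we settle on an implementation plan, that is, how the models developed below will be consumed. There are three options: the first is the visualization tool Superset, the second is Hue, and the third is a self-developed web platform. We choose the third. It requires writing JDBC code that connects to Presto and then exposing a separate interface for the result set of each model, so that clients can fetch each model's data by calling those interfaces over HTTP.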
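The JDBC step of the third option can be sketched as follows. This is a minimal sketch rather than the platform's actual code: the class and method names are hypothetical, the host, catalog, and schema reuse the values from the CLI command above, and `showTables` only works with the Presto JDBC driver on the classpath and a running server.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

// Hypothetical helper class; the names are illustrative only.
final class PrestoJdbcSketch {

    // Builds a Presto JDBC URL of the form jdbc:presto://host:port/catalog/schema.
    static String jdbcUrl(String host, int port, String catalog, String schema) {
        return "jdbc:presto://" + host + ":" + port + "/" + catalog + "/" + schema;
    }

    // Runs SHOW TABLES and returns the table names.
    // Assumption: the Presto JDBC driver is on the classpath and the server is reachable.
    static List<String> showTables(String url, String user) throws SQLException {
        Properties props = new Properties();
        props.setProperty("user", user);
        List<String> tables = new ArrayList<>();
        try (Connection conn = DriverManager.getConnection(url, props);
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery("SHOW TABLES")) {
            while (rs.next()) {
                tables.add(rs.getString(1));
            }
        }
        return tables;
    }
}
```

A web endpoint for one model would then call a method like `showTables` with that model's query and serialize the rows into the HTTP response.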

-- 2. Top 3 app pages by page views (PV) for each app version [near-real-time data for the current day, about 5 minutes behind]
with t1 as (
    select logday, app_version, element_page, count(1) as pv
    from ods_news1.event
    where logday = '20201227' and app_version != ''
    group by 1, 2, 3
),
t2 as (
    select logday, app_version, element_page, pv,
           row_number() over (partition by app_version order by pv desc) as rank
    from t1
)
select * from t2 where t2.rank <= 3 order by app_version desc;
/* Sample output (similar to):
  logday  | app_version | element_page | pv | rank
----------+-------------+--------------+----+------
 20200619 | 2.3         | 我的         | 48 |    1
 20200619 | 2.3         | 活動頁       | 40 |    2
 20200619 | 2.3         | 新聞列表頁   | 39 |    3
 20200619 | 2.2         | 搜索頁       | 40 |    1
 20200619 | 2.2         | 新聞列表頁   | 38 |    2
 20200619 | 2.2         | 活動頁       | 37 |    3
 20200619 | 2.1         | 首頁         | 41 |    1
 20200619 | 2.1         | 活動頁       | 37 |    2
 20200619 | 2.1         | 注冊登錄頁   | 35 |    3
*/

-- 3. UV (unique visitors) of app page clicks at day, hour, and minute granularity, with every column sorted in descending order [use ROLLUP; near-real-time data for the current day, about 5 minutes behind]
-- Roll-up (summarized data)
-- Rolling up is like riding an elevator upward while watching the people below: data is aggregated from fine to coarse granularity, and some dimensions are ignored.
-- For example, rolling up population data aggregated by city gives population by country, and the change in the figures can be observed at each level.
-- Drill-down (detail data)
-- The reverse of rolling up: data goes from coarse to fine granularity, and some dimensions are refined.
-- For example, drilling down from population aggregated by city gives population aggregated by town.
-- Example
-- select A, count(*) from t group by A;
-- select A, B, count(*) from t group by A, B;
-- select A, B, C, count(*) from t group by A, B, C;
-- Reading top to bottom the granularity becomes finer: that is drilling down.
-- Reading bottom to top the granularity becomes coarser: that is rolling up.
with t1 as (
    select
        format_datetime(from_unixtime(ctime/1000), 'yyyy-MM-dd') as log_day,
        format_datetime(from_unixtime(ctime/1000), 'yyyy-MM-dd HH') as log_hour,
        format_datetime(from_unixtime(ctime/1000), 'yyyy-MM-dd HH:mm') as log_minute,
        distinct_id
    from ods_news1.event
    where logday = '20201227' and event = 'AppClick'
)
select
    log_day, log_hour, log_minute,
    count(distinct distinct_id) as uv,
    grouping(log_day, log_hour, log_minute) as group_id
from t1
group by
    rollup(log_day, log_hour, log_minute)
order by group_id desc, log_day desc, log_hour desc, log_minute desc;
/* Sample output (similar to):
  log_day   |   log_hour    |    log_minute    |  uv  | group_id
------------+---------------+------------------+------+----------
 NULL       | NULL          | NULL             | 2341 |        7
 2020-06-19 | NULL          | NULL             | 2341 |        3
 2020-06-19 | 2020-06-19 18 | NULL             |  584 |        1
 2020-06-19 | 2020-06-19 17 | NULL             |  585 |        1
 2020-06-19 | 2020-06-19 16 | NULL             |  562 |        1
 2020-06-19 | 2020-06-19 15 | NULL             |  571 |        1
 2020-06-19 | 2020-06-19 14 | NULL             |  298 |        1
 2020-06-19 | 2020-06-19 18 | 2020-06-19 18:59 |    7 |        0
 2020-06-19 | 2020-06-19 18 | 2020-06-19 18:58 |   13 |        0
 2020-06-19 | 2020-06-19 18 | 2020-06-19 18:57 |   11 |        0
 2020-06-19 | 2020-06-19 18 | 2020-06-19 18:56 |    8 |        0
 2020-06-19 | 2020-06-19 18 | 2020-06-19 18:55 |   14 |        0
 2020-06-19 | 2020-06-19 18 | 2020-06-19 18:54 |   12 |        0
 2020-06-19 | 2020-06-19 18 | 2020-06-19 18:53 |   10 |        0
*/
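The group_id values 7, 3, 1, and 0 in the sample output follow from how Presto's grouping() builds a bitmask: each listed column gets one bit, the rightmost column is the least significant bit, and a bit is 1 when the column is not part of the current grouping set. The following is a small model of that rule in plain Java, not Presto's implementation; the class and method names are hypothetical.

```java
import java.util.List;
import java.util.Set;

// Hypothetical model of the grouping() bitmask; not Presto code.
final class GroupingIdSketch {

    // columns: the arguments of grouping(...), in order.
    // groupedBy: the columns present in the current grouping set.
    static long groupingId(List<String> columns, Set<String> groupedBy) {
        long id = 0;
        for (String column : columns) {
            id <<= 1;                       // make room for the next column's bit
            if (!groupedBy.contains(column)) {
                id |= 1;                    // 1 means "rolled up", i.e. not grouped by
            }
        }
        return id;
    }
}
```

With the columns (log_day, log_hour, log_minute), the grand-total row groups by nothing and gets 7 (binary 111), the per-day rows get 3 (011), the per-hour rows get 1 (001), and the per-minute rows get 0.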

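3. Funnel analysis

SQL implementation

# The funnel we define for this analysis is:
Sign up -> click a news item -> open the detail page -> post a comment
# Expressed as events:
SignUp -> AppClick[element_page='新聞列表頁'] -> AppClick[element_page='內容詳情頁'] -> NewsAction[action_type='評論']
# Next we implement this requirement in SQL
# We query the funnel over events from 20201227 to 20201230, with a window of 3 days
Note: our data only covers three days, so the window check is not strictly needed here. Later we may receive N days of data, so the window check is included anyway.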

-- Approach: first select the rows for each event under its own conditions, then join the events per user and compare their timestamps.
-- Each event must fall inside the window measured from the first event, and must occur later than the previous event.
-- Fetch the rows of each event for the three-day range (adjust the dates to your own data)
with t1 as (
    select distinct_id, ctime, event
    from ods_news1.event
    where event = 'SignUp'
      and format_datetime(from_unixtime(ctime/1000), 'yyyyMMdd') >= '20200923'
      and format_datetime(from_unixtime(ctime/1000), 'yyyyMMdd') <= '20200925'
),
t2 as (
    select distinct_id, ctime, event
    from ods_news1.event
    where event = 'AppClick' and element_page = '新聞列表頁'
      and format_datetime(from_unixtime(ctime/1000), 'yyyyMMdd') >= '20200923'
      and format_datetime(from_unixtime(ctime/1000), 'yyyyMMdd') <= '20200925'
),
t3 as (
    select distinct_id, ctime, event
    from ods_news1.event
    -- the comment step is identified by action_type, as in the funnel definition
    where event = 'NewsAction' and action_type = '評論'
      and format_datetime(from_unixtime(ctime/1000), 'yyyyMMdd') >= '20200923'
      and format_datetime(from_unixtime(ctime/1000), 'yyyyMMdd') <= '20200925'
),
t4 as (
    select distinct_id, ctime, event
    from ods_news1.event
    where event = 'SignIn'
      and format_datetime(from_unixtime(ctime/1000), 'yyyyMMdd') >= '20200923'
      and format_datetime(from_unixtime(ctime/1000), 'yyyyMMdd') <= '20200925'
)
-- Count distinct users at every step, since the joins can produce several rows per user
select
    count(distinct t1.distinct_id) as step1,
    count(distinct t2.distinct_id) as step2,
    count(distinct t3.distinct_id) as step3,
    count(distinct t4.distinct_id) as step4
from t1
left join t2
  on t1.distinct_id = t2.distinct_id
 and t1.ctime < t2.ctime and t2.ctime - t1.ctime < 86400*3*1000
left join t3
  on t2.distinct_id = t3.distinct_id
 and t2.ctime < t3.ctime and t3.ctime - t1.ctime < 86400*3*1000
left join t4
  on t3.distinct_id = t4.distinct_id
 and t3.ctime < t4.ctime and t4.ctime - t1.ctime < 86400*3*1000;

# Running the query above gives a result similar to the following
 step1 | step2 | step3 | step4
-------+-------+-------+-------
  3154 |    79 |     2 |     1
# These are the numbers of users at each step of the funnel

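4. Developing the funnel-analysis UDAF

Analysis: the UDAF work is split into two steps. The first step processes the events and computes each user's funnel depth. The second step converts each user's depth into an array, lines the arrays up by index, and sums the values at each index.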
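The two steps can be modeled in plain Java as follows. This is a sketch under stated assumptions, not the UDAF's actual code: it has no Presto dependency, the class and method names are hypothetical, steps are matched by event name only, and each later step must be strictly later than the previous one and less than the window length after the first step.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical plain-Java model of the two funnel steps; not the plugin code.
final class FunnelSketch {

    static final class Event {
        final long ctime;   // event time in milliseconds
        final String name;  // event name, e.g. "SignUp"
        Event(long ctime, String name) {
            this.ctime = ctime;
            this.name = name;
        }
    }

    // Step 1: the deepest funnel step one user reaches inside the window.
    static int depth(List<Event> events, long windowMs, List<String> steps) {
        List<Event> sorted = new ArrayList<>(events);
        sorted.sort((a, b) -> Long.compare(a.ctime, b.ctime));
        int best = 0;
        for (int i = 0; i < sorted.size(); i++) {
            if (!sorted.get(i).name.equals(steps.get(0))) {
                continue;                       // a funnel can only start at step 1
            }
            long start = sorted.get(i).ctime;
            long last = start;
            int reached = 1;
            for (int j = i + 1; j < sorted.size() && reached < steps.size(); j++) {
                Event e = sorted.get(j);
                if (e.ctime - start >= windowMs) {
                    break;                      // outside the window
                }
                if (e.ctime > last && e.name.equals(steps.get(reached))) {
                    reached++;
                    last = e.ctime;
                }
            }
            best = Math.max(best, reached);
        }
        return best;
    }

    // Step 2: a depth d counts toward the first d slots; sum the slots over all users.
    static long[] merge(List<Integer> depths, int stepCount) {
        long[] funnel = new long[stepCount];
        for (int d : depths) {
            for (int i = 0; i < d && i < stepCount; i++) {
                funnel[i]++;
            }
        }
        return funnel;
    }
}
```

For example, three users with depths 3, 0, and 1 in a four-step funnel merge to the array [2, 1, 1, 0]: two users reached step 1, and one user reached steps 2 and 3.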

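Working with Presto:

What you need to master:

1. How much memory to allocate

2. Sizing the stored data sensibly, so that writes stay in bounds and do not exceed the allocated memory

3. Working with memory addresses in combination with the two points above

Developing a UDF plugin

Once the code is written, deploy the plugin to Presto: build it into a JAR, upload the JAR to Presto, and restart Presto. The function is then available.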


@ScalarFunction("my_upper") // the function name, i.e. the name used when calling the function in Presto
@Description("My case-conversion function") // the function's description
@SqlType(StandardTypes.VARCHAR) // the SQL data type

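Developing a UDAF plugin

@AggregationFunction("sumDouble") // the function name
@Description("this is a sum double") // the function's description
@InputFunction // marks the input method
@CombineFunction // marks the combine (merge) method
@OutputFunction(StandardTypes.DOUBLE) // marks the output method; the argument is the SQL return type

As before, build the JAR, upload it, and restart Presto; the function is then available.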
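The roles of the three annotated methods can be illustrated with a plain-Java model of a sum aggregation. This sketch deliberately avoids the Presto SPI so that it runs on its own; the class, the State type, and the method names are hypothetical, and in a real plugin the state would be a Presto accumulator state and the methods would carry the annotations listed above.

```java
// Hypothetical plain-Java model of the input/combine/output lifecycle; not plugin code.
final class SumDoubleSketch {

    // Holds the partial result for one group on one worker.
    static final class State {
        double sum;
    }

    // Role of the @InputFunction method: fold one input row into the state.
    static void input(State state, double value) {
        state.sum += value;
    }

    // Role of the @CombineFunction method: merge a partial state from another worker.
    static void combine(State state, State other) {
        state.sum += other.sum;
    }

    // Role of the @OutputFunction method: produce the final value from the state.
    static double output(State state) {
        return state.sum;
    }
}
```

Rows are fed to `input` on each worker, partial states are merged pairwise with `combine`, and `output` is called once per group at the end.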

5. Funnel testing

User depth

select funnel(ctime, 86400*1000*3, event, 'SignUp,AppClick,AppClick,NewsAction') as user_depth
from ods_news1.event
where  (
event in ('SignUp') 
or (event='AppClick' and element_page='新聞列表頁' )
or (event='AppClick' and element_page='內容詳情頁' )
or (event='NewsAction' and action_type='評論' )
)
and logday>='20201227' and logday<'20201230'
group by distinct_id

Complete SQL

select funnel_merger(user_depth, 4) as funnel_array from(
select funnel(ctime, 86400*1000*3, event, 'SignUp,AppClick,NewsAction,SignIn') as user_depth
from ods_news1.event
where  (
event in ('SignUp') 
or (event='AppClick' and element_page='新聞列表頁' )
or (event='NewsAction' and action_type='評論' )
or (event='SignIn')
)
and logday>='20200923' and logday<'20200925'
group by distinct_id
);

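Note: my data contains no AppPageView events, so I left that step out when running the query. Listing AppClick twice also gave wrong results. The UDAF stores the steps in a Map keyed by event name, and Map keys are unique, so a repeated event name overwrites the earlier entry and the data becomes inconsistent. I therefore used SignIn as the fourth step instead. That event does not exist in my data either; it is only a placeholder. Before running the query, check that your data actually contains these events, otherwise the result may be wrong.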
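The overwrite described in the note can be reproduced with a small sketch. It assumes, as the note describes, that the step list is split on commas and stored in a Map keyed by event name; the class and method names are hypothetical and this is not the UDAF's actual code.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration of the key collision; not the plugin code.
final class StepIndexSketch {

    // Maps each event name to its position in the comma-separated step list.
    static Map<String, Integer> indexSteps(String stepsCsv) {
        Map<String, Integer> index = new HashMap<>();
        String[] steps = stepsCsv.split(",");
        for (int i = 0; i < steps.length; i++) {
            index.put(steps[i], i); // a repeated name replaces the earlier position
        }
        return index;
    }
}
```

For 'SignUp,AppClick,AppClick,NewsAction' the map ends up with only three entries and AppClick points at index 2, so the second funnel step is lost. For 'SignUp,AppClick,NewsAction,SignIn' all four steps keep their own entry.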
