1、传统方式获得总条数例如以下:
select count(*) from ods.tracklog;
讯享网
讯享网不含分区 select FORMAT(sum(tb.PARAM_VALUE),0) from TBLS t left join DBS d on t.DB_ID = d.DB_ID left join TABLE_PARAMS tb on t.TBL_ID = tb.TBL_ID where d.NAME='hive_ods' and tb.PARAM_KEY='numRows'; 查询含有分区表的总条数 select FORMAT(sum(a.PARAM_VALUE),0) from TBLS t left join DBS d on t.DB_ID = d.DB_ID left join PARTITIONS p on t.TBL_ID = p.TBL_ID left join PARTITION_PARAMS a on p.PART_ID=a.PART_ID where d.NAME='hive_ods_partition' and a.PARAM_KEY='numRows';
B、查指定表
不含分区 select d.NAME,t.TBL_NAME,t.TBL_ID,p.PART_ID,p.PART_NAME,a.PARAM_VALUE from TBLS t left join DBS d on t.DB_ID = d.DB_ID left join PARTITIONS p on t.TBL_ID = p.TBL_ID left join PARTITION_PARAMS a on p.PART_ID=a.PART_ID where t.TBL_NAME='tracklog' and d.NAME='ods' and a.PARAM_KEY='numRows'; 含分区 select FORMAT(sum(a.PARAM_VALUE),0) from TBLS t left join DBS d on t.DB_ID = d.DB_ID left join PARTITIONS p on t.TBL_ID = p.TBL_ID left join PARTITION_PARAMS a on p.PART_ID=a.PART_ID where t.TBL_NAME='tracklog' and d.NAME='ods' and a.PARAM_KEY='numRows';
讯享网SELECT a.TBL_ID, d.`NAME` dbName, a.TBL_NAME, b.PARAM_VALUE numRows, c.PARAM_VALUE totalSize FROM TBLS AS a left JOIN TABLE_PARAMS AS b on a.TBL_ID = b.TBL_ID left JOIN TABLE_PARAMS AS c on a.TBL_ID = c.TBL_ID left JOIN DBS as d on d.DB_ID = a.DB_ID where b.PARAM_KEY = "numRows" and c.PARAM_KEY = "totalSize"
B、获取所有分区的总条数和总大小
SELECT main.*,b.numRows,b.totalSize FROM ( SELECT main.TBL_ID AS metaObjectId ,main.TBL_NAME AS tabName ,b.PART_ID AS partId ,b.PART_NAME AS partitionName ,b.CREATE_TIME AS createTime ,b.LAST_ACCESS_TIME AS updateTime FROM TBLS main INNER JOIN PARTITIONS b ON main.TBL_ID = b.TBL_ID WHERE b.PART_NAME IS NOT NULL ) main LEFT JOIN ( SELECT PART_ID AS partId, MAX(CASE PARAM_KEY WHEN 'numRows' THEN PARAM_VALUE ELSE 0 END) AS numRows, MAX(CASE PARAM_KEY WHEN 'totalSize' THEN PARAM_VALUE ELSE 0 END) AS totalSize FROM PARTITION_PARAMS GROUP BY PART_ID ) b ON main.partId=b.partId;
重点:对应hive分区的几个比较重要的元数据表
TBLS:hive所有表的基础信息,包括id,表名,等等
PARTITIONS:跟分区有关的表, 包括分区id,创建时间,分区表名,以及TBLS主表关联id等等
PARTITION_PARAMS:分区表比较重要的一些数据
主要字段:
numfiles:该分区下的文件数,
numRows:该分区下记录数,
rawDataSize是指原始数据的大小,
totalSize是指占用HDFS存储空间大小,
transient_lastDdlTime最后一次ddl时间

版权声明:本文内容由互联网用户自发贡献,该文观点仅代表作者本人。本站仅提供信息存储空间服务,不拥有所有权,不承担相关法律责任。如发现本站有涉嫌侵权/违法违规的内容,请联系我们,一经查实,本站将立刻删除。
如需转载请保留出处:https://51itzy.com/kjqy/30003.html