Subject:[jira] [Comment Edited] (SPARK-21725) spark thriftserver insert overwrite table partition select
From:xinzhang (JIRA) (ji@apache.org)
Date:Oct 31, 2017 12:08:00 am
List:org.apache.spark.issues

[
https://issues.apache.org/jira/browse/SPARK-21725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16226337#comment-16226337
]

xinzhang edited comment on SPARK-21725 at 10/31/17 7:07 AM:

------------------------------------------------------------

[~mgaido] [~srowen] I have now tried the master branch, and the problem is still there. (Important: TextFile is the default value of hive.default.fileformat. If I set hive.default.fileformat=Parquet; the problem goes away. {color:red}Do not miss the last picture: it shows the core of the problem!{color})

Steps:

1. Download, install, and run Hive SQL (hive-1.2.1; this proves my Hive installation is OK). !https://user-images.githubusercontent.com/8244097/32210043-7554300e-be46-11e7-8ce0-f61bc0bfa998.png!

2. Download, install, and run spark-sql (spark-master, which I built from the latest master commit 44c4003155c1d243ffe0f73d5537b4c8b3f3b564).
First time, spark-sql result: GOOD !https://user-images.githubusercontent.com/8244097/32210200-5b02de20-be47-11e7-8eac-e0228a7cf7f5.png!

Second time, spark-sql result: GOOD !https://user-images.githubusercontent.com/8244097/32210320-f518aa12-be47-11e7-9a86-a16819583748.png!

3. Use the spark-sql thriftserver.
First time, spark-sql result: *{color:red}GOOD{color}*
Second time, spark-sql result: *{color:red}BAD{color}* !https://user-images.githubusercontent.com/8244097/32210560-47d431da-be49-11e7-8279-7dd88dda42a6.png!

{color:red}-------------------------------------------------------------------------------------------------------------------------------------------{color}
1. set hive.default.fileformat=Parquet;
2. Create a partitioned table: the problem appears again!

!https://user-images.githubusercontent.com/8244097/32211152-3a4fe52e-be4c-11e7-9a8e-7a2b8f52ac6b.png!


spark thriftserver insert overwrite table partition select

-----------------------------------------------------------

Key: SPARK-21725
URL: https://issues.apache.org/jira/browse/SPARK-21725
Project: Spark
Issue Type: Bug
Components: Spark Core
Affects Versions: 2.1.0
Environment: centos 6.7 spark 2.1 jdk8
Reporter: xinzhang
Labels: spark-sql

Use the thriftserver to create tables with partitions.

session 1:
SET hive.default.fileformat=Parquet;
create table tmp_10(count bigint) partitioned by (pt string) stored as parquet; --ok
!exit

session 2:
SET hive.default.fileformat=Parquet;
create table tmp_11(count bigint) partitioned by (pt string) stored as parquet; --ok
!exit

session 3: --connect the thriftserver
SET hive.default.fileformat=Parquet;
insert overwrite table tmp_10 partition(pt='1') select count(1) count from tmp_11; --ok
!exit

session 4 (do it again): --connect the thriftserver
SET hive.default.fileformat=Parquet;
insert overwrite table tmp_10 partition(pt='1') select count(1) count from tmp_11; --error
!exit
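
The four sessions above could be scripted so the failure is easier to re-run. The following is only a sketch under assumptions that are not in the report: it uses beeline's -u and -e options, the JDBC URL is a placeholder (localhost, with the thriftserver's default port 10000), and the helper names are made up for illustration. Passing several semicolon-separated statements to a single -e is assumed to work with the beeline bundled with Spark. The script only builds and prints the commands; it does not run them.

```python
import shlex

# Placeholder URL (assumption): adjust host and port to your thriftserver.
JDBC_URL = "jdbc:hive2://localhost:10000"

SET_FORMAT = "SET hive.default.fileformat=Parquet"
INSERT = ("insert overwrite table tmp_10 partition(pt='1') "
          "select count(1) count from tmp_11")

# One entry per session from the report; each session is a fresh connection.
SESSIONS = [
    [SET_FORMAT, "create table tmp_10(count bigint) "
                 "partitioned by (pt string) stored as parquet"],
    [SET_FORMAT, "create table tmp_11(count bigint) "
                 "partitioned by (pt string) stored as parquet"],
    [SET_FORMAT, INSERT],  # reported result: ok
    [SET_FORMAT, INSERT],  # reported result: error
]


def beeline_command(statements, url=JDBC_URL):
    """Build the argument list for one beeline invocation (one session)."""
    sql = "; ".join(statements) + ";"
    return ["beeline", "-u", url, "-e", sql]


def repro_commands(url=JDBC_URL):
    """Return one beeline command per session, in the reported order."""
    return [beeline_command(statements, url) for statements in SESSIONS]


# Print shell-quoted commands so they can be reviewed before being run.
for command in repro_commands():
    print(shlex.join(command))
```

Each printed line corresponds to one connect/run/exit cycle, which matches the way the report opens a new session for every step.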

-------------------------------------------------------------------------------------
17/08/14 18:13:42 ERROR SparkExecuteStatementOperation: Error executing query, currentState RUNNING,
java.lang.reflect.InvocationTargetException
......
......
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Unable to move source hdfs://dc-hadoop54:50001/group/user/user1/meta/hive-temp-table/user1.db/tmp_11/.hive-staging_hive_2017-08-14_18-13-39_035_6303339779053512282-2/-ext-10000/part-00000 to destination hdfs://dc-hadoop54:50001/group/user/user1/meta/hive-temp-table/user1.db/tmp_11/pt=1/part-00000
	at org.apache.hadoop.hive.ql.metadata.Hive.moveFile(Hive.java:2644)
	at org.apache.hadoop.hive.ql.metadata.Hive.copyFiles(Hive.java:2711)
	at org.apache.hadoop.hive.ql.metadata.Hive.loadPartition(Hive.java:1403)
	at org.apache.hadoop.hive.ql.metadata.Hive.loadPartition(Hive.java:1324)
	... 45 more
Caused by: java.io.IOException: Filesystem closed
....

-------------------------------------------------------------------------------------
The documentation for Parquet tables is here:
http://spark.apache.org/docs/latest/sql-programming-guide.html#parquet-files

Hive metastore Parquet table conversion
When reading from and writing to Hive metastore Parquet tables, Spark SQL will try to use its own Parquet support instead of Hive SerDe for better performance. This behavior is controlled by the spark.sql.hive.convertMetastoreParquet configuration, and is turned on by default.

I am confused: the problem appears with partitioned tables, but tables without partitions work fine. Does this mean Spark does not use its own Parquet support in this case? Could someone suggest how I can avoid the issue?
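
One way to investigate the question above is to print, and optionally override, spark.sql.hive.convertMetastoreParquet in the same session as the failing insert. The sketch below only generates the SQL text for such a session. The function name is hypothetical, and whether changing the setting affects the failure is not established anywhere in this report; it is a diagnostic experiment, not a known workaround. It relies on SET with a key and no value reporting the current setting in Spark SQL.

```python
# Configuration key named in the Spark SQL guide excerpt quoted above.
CONVERT_KEY = "spark.sql.hive.convertMetastoreParquet"


def diagnostic_statements(convert=None):
    """Return the SQL statements for one diagnostic thriftserver session.

    convert=None only reports the current value of the key;
    True or False sets the key before reporting it.
    """
    statements = []
    if convert is not None:
        statements.append(f"SET {CONVERT_KEY}={str(convert).lower()}")
    # SET with a key and no value reports the current setting.
    statements.append(f"SET {CONVERT_KEY}")
    statements.append("SET hive.default.fileformat=Parquet")
    statements.append(
        "insert overwrite table tmp_10 partition(pt='1') "
        "select count(1) count from tmp_11"
    )
    return statements


# Print the session that disables the conversion before the insert.
for statement in diagnostic_statements(convert=False):
    print(statement + ";")
```

Running the generated session twice, once with convert=True and once with convert=False, would show whether the second insert fails in both cases.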