Spark Error: Failed to Send RPC to Datanode
This past week we had quite a few issues with users not being able to run Spark jobs in YARN cluster mode. In particular, a team that was on a tight schedule kept getting errors like this:
java.io.IOException: Failed to send RPC 8277242275361198650 to datanode-055: java.nio.channels.ClosedChannelException
These were mostly accompanied by error messages like:
org.apache.spark.SparkException: Error …
Author: Saeed Barghi
YARN Capacity Scheduler: Queue Priority
Capacity Scheduler is designed to run Hadoop jobs on a shared, multi-tenant cluster in an operator-friendly manner. Its main strength is that it guarantees a specific capacity to each group of users by supporting multiple queues and letting users submit their jobs to their dedicated queues. Each queue is given a fraction of total …
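To make "a fraction of total" concrete, here is a minimal Python sketch of how guaranteed capacity compounds down a queue hierarchy. It assumes, as with the `yarn.scheduler.capacity.<queue-path>.capacity` setting, that each queue's capacity is a percentage of its parent and that sibling queues sum to 100. The queue names and cluster size are hypothetical, and the model ignores elasticity (maximum capacity), user limits, and priority.

```python
# Toy model of YARN Capacity Scheduler guaranteed capacity (assumption:
# each queue's capacity is a percentage of its parent queue, and the
# children of any parent must sum to 100).

def absolute_capacities(tree, parent_fraction=1.0, prefix="root"):
    """Return {queue_path: fraction of the whole cluster} for a nested queue tree.

    `tree` maps a child queue name to (capacity_percent, subtree_dict).
    """
    total = sum(pct for pct, _ in tree.values())
    if tree and abs(total - 100.0) > 1e-9:
        raise ValueError(f"children of {prefix} sum to {total}, not 100")
    result = {}
    for name, (pct, subtree) in tree.items():
        path = f"{prefix}.{name}"
        fraction = parent_fraction * pct / 100.0  # share of the whole cluster
        result[path] = fraction
        result.update(absolute_capacities(subtree, fraction, path))
    return result


# Hypothetical queue layout and cluster size, for illustration only.
queues = {
    "analytics": (60, {"adhoc": (30, {}), "reports": (70, {})}),
    "etl": (40, {}),
}
cluster_memory_mb = 1_000_000

caps = absolute_capacities(queues)
for path, fraction in sorted(caps.items()):
    print(f"{path}: {fraction:.0%} of cluster = {fraction * cluster_memory_mb:,.0f} MB guaranteed")
```

With this layout, `root.analytics.adhoc` is guaranteed 30% of 60%, i.e. 18% of the cluster, even though its own setting says 30.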
Hive Performance Tuning
If you have been working in Big Data, you have almost certainly heard of Hive. Apache Hive is a data warehouse infrastructure built on top of Hadoop. Last week I gave one of our clients a presentation on how best to use Apache Hive, along with a few tuning tips, that I …
Hadoop Error org.apache.hadoop.hdfs.BlockMissingException: Could not obtain block
Like almost every Monday, today was a very challenging one. The first thing I noticed was that our primary namenode had run into some issues over the weekend and gone down, which meant the secondary (standby) namenode, namenode-02, was active. I checked namenode-01 and made sure it was healthy before making it active again. After that, I was made …
How to import org.apache.spark.sql.SQLContext.implicits in Spark 1.6: error “value toDF is not a member of org.apache.spark.rdd.RDD”
Note: If you're using Spark 2.2, please read this post. I am doing a mini project for my company using Spark/Scala and had been stuck on the error mentioned in the title for a couple of days. Googling the error suggested importing org.apache.spark.sql.SQLContext.implicits, and that's what I did:
import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._
import org.apache.spark.sql._
import …
OBIEE RPD Design: Convert Snowflake to Star schema from multiple sources (Combine dimensions)
The more I play with OBIEE, the more I learn about what it is capable of and where its real power lies. An OBIEE repository (RPD) has three layers: Physical, Business Model and Mapping (BMM), and Presentation. The middle layer, BMM, is what makes OBIEE special: it is where we define how data from different sources and tables come …
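Combining dimensions in the BMM layer can be pictured as flattening a snowflake into one denormalized star dimension. The Python sketch below only models the shape of the resulting logical table; it assumes a hypothetical product, subcategory, and category snowflake where the category lookup stands in for a second source. OBIEE itself achieves this through logical table sources in BMM rather than by copying rows, and every table and column name here is made up.

```python
# Hypothetical snowflaked product dimension spread over two "sources".
products = [  # source 1: product table
    {"product_id": 1, "product_name": "Road Bike", "subcategory_id": 10},
    {"product_id": 2, "product_name": "Helmet", "subcategory_id": 20},
]
subcategories = {  # source 1: subcategory lookup
    10: {"subcategory_name": "Bikes", "category_id": 100},
    20: {"subcategory_name": "Accessories", "category_id": 100},
}
categories = {100: {"category_name": "Cycling"}}  # source 2: category lookup


def flatten_product_dimension(products, subcategories, categories):
    """Collapse the snowflake into one star-style row per product.

    Lookups behave like inner joins: a missing key raises KeyError.
    """
    rows = []
    for p in products:
        sub = subcategories[p["subcategory_id"]]
        cat = categories[sub["category_id"]]
        rows.append({
            "product_id": p["product_id"],
            "product_name": p["product_name"],
            "subcategory_name": sub["subcategory_name"],
            "category_name": cat["category_name"],
        })
    return rows


for row in flatten_product_dimension(products, subcategories, categories):
    print(row)
```

A fact table then needs to join to this single logical dimension only, which is what makes the business model a star even though the physical layer is a snowflake.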
OBIEE: Multiple joins between same tables (Fact to Dim)
Hi all. I am finally writing a new post after more than a year, and surprisingly it is not about SQL Server! I must confess that I am not a front-end kinda person and do not particularly enjoy building dashboards and reports. But I recently started a new job and my first project is going …
An alternative for Index with Include: CLUSTERED (UNIQUE) Index
Indexes help reads: you can add indexes tailored to each kind of query issued against a table so that those queries are always served from an index rather than the base table. But that strategy has a drawback: it slows down inserts into the table because …
SQL Server 2014 and SSDT (AKA BIDS)
Hey folks. This is going to be a short post; I just wanted to mention something that may come in handy for those interested in playing with SQL Server 2014. I downloaded SQL Server 2014 a couple of weeks ago and started exploring its new features to see what has changed or improved. After going through very …
Drop failed for DatabaseRole: The database principal owns a schema in the database and cannot be dropped
This error is raised when the role you are trying to drop owns a schema. The most straightforward and quickest fix is to transfer ownership of that schema back to the appropriate role in the database, so that the role being dropped no longer owns any schema in the database …
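The fix boils down to two T-SQL statements per schema: `ALTER AUTHORIZATION ON SCHEMA::...` to hand the schema to another principal, then `DROP ROLE`. The Python sketch below just generates that script so the order is explicit. It assumes `dbo` as the new owner (pick whatever principal is appropriate in your database), and the role and schema names in the usage are hypothetical.

```python
def quote_ident(name: str) -> str:
    """Bracket-quote a SQL Server identifier, escaping any closing bracket."""
    return "[" + name.replace("]", "]]") + "]"


def find_owned_schemas_sql(role: str) -> str:
    """T-SQL that lists the schemas currently owned by `role`."""
    literal = role.replace("'", "''")  # escape single quotes in the string literal
    return ("SELECT s.name FROM sys.schemas AS s "
            f"WHERE s.principal_id = DATABASE_PRINCIPAL_ID(N'{literal}');")


def drop_role_script(role: str, owned_schemas: list, new_owner: str = "dbo") -> str:
    """Transfer each owned schema to `new_owner` first, then drop the role."""
    lines = [
        f"ALTER AUTHORIZATION ON SCHEMA::{quote_ident(s)} TO {quote_ident(new_owner)};"
        for s in owned_schemas
    ]
    lines.append(f"DROP ROLE {quote_ident(role)};")
    return "\n".join(lines)


# Hypothetical role and schema names.
print(find_owned_schemas_sql("db_reporting"))
print(drop_role_script("db_reporting", ["reporting"]))
```

Run the first query to find the schemas, feed the names into the second function, and execute its output; the `DROP ROLE` only succeeds once no schema is left owned by the role.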
SSIS, Tempdb Database, and SQL Server Log Files
The tempdb system database is a global resource that is available to all users connected to the instance of SQL Server. Tempdb is re-created every time SQL Server is started, which means the system always starts with a clean copy of the database and there is never anything in tempdb that is saved from one …
Get tables and their data without backing up the database: SQL Server 2008 R2
There may be situations when you want to get all of the tables in a SQL Server database and the data they currently hold, but you can't (because of the permissions assigned to your account, for example) or don't want to back up the whole database. To do this, follow these steps: 1- Open SSMS …
Improve your SSIS package’s performance
Hello everyone. I spent almost all of last week and the first two days of this week trying to improve the performance of my BI solutions. While learning the tricks to make my packages faster, I came across the SSIS Performance Design Patterns video by Matt Masson. It's a comprehensive discussion indeed, and I'm gonna list …
Making SSRS reports faster: get rid of Parameter Sniffing
Is your SSRS report running slowly? Are you using a stored procedure to pull the data and pass it to the report? If your answer to both questions is yes, then you may well be a victim of SQL Server's Parameter Sniffing. The first question is, what is Parameter Sniffing? It refers to SQL Server's effort to reduce …
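Parameter sniffing is easiest to see as a plan cache keyed by procedure: the plan is compiled for whatever parameter value arrives first and then reused for every later call. The Python sketch below is a toy model of that behaviour, not SQL Server's optimizer; the row counts, the seek/scan threshold, and the procedure name are all hypothetical. The `recompile` flag mimics, in spirit, the T-SQL `OPTION (RECOMPILE)` hint, one commonly used mitigation.

```python
# Toy model of parameter sniffing (hypothetical data and threshold).
ROW_COUNTS = {"Luxembourg": 50, "USA": 2_000_000}  # rows per parameter value
SEEK_THRESHOLD = 10_000  # below this, the toy "optimizer" prefers a seek

plan_cache = {}  # procedure name -> cached plan


def choose_plan(estimated_rows):
    """Pick a plan purely from the estimated row count."""
    if estimated_rows < SEEK_THRESHOLD:
        return "index seek + key lookup"
    return "clustered index scan"


def execute(proc_name, country, recompile=False):
    """Return the plan a call would use under this cached-plan model."""
    if recompile:
        # Compile for this exact value and do not cache the result.
        return choose_plan(ROW_COUNTS[country])
    if proc_name not in plan_cache:
        # First call: the plan is built for the "sniffed" parameter value.
        plan_cache[proc_name] = choose_plan(ROW_COUNTS[country])
    return plan_cache[proc_name]


first = execute("usp_SalesByCountry", "Luxembourg")  # compiles a seek plan for 50 rows
second = execute("usp_SalesByCountry", "USA")  # reuses the seek plan for 2,000,000 rows
fresh = execute("usp_SalesByCountry", "USA", recompile=True)  # compiles a scan plan
print(first, "|", second, "|", fresh)
```

The second call is the problem case: a plan that was ideal for 50 rows gets replayed against two million, which is exactly the kind of slowdown a report user sees.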
Splitting records into multiple rows using SSIS Script Component
Hi folks. This post is a response to the "Add Interim Rows Depending on Max and Min Values" question asked in the MSDN SSIS Forum, but it can be applied to any similar problem. Here I am gonna describe the steps needed to split the records coming from a database table …
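The core of the split is a loop that emits one output row for every value between a record's min and max. The Python sketch below shows just that logic, assuming integer bounds and an inclusive range; the forum question's exact columns are not given here, so the (key, min, max) shape is hypothetical. In an SSIS Script Component this loop would typically sit in `Input0_ProcessInputRow` on an asynchronous output, calling `Output0Buffer.AddRow()` once per emitted row.

```python
from typing import Iterable, Iterator, Tuple


def expand_interim_rows(
    records: Iterable[Tuple[str, int, int]],
) -> Iterator[Tuple[str, int]]:
    """Yield one (key, value) row per integer from min to max, inclusive."""
    for key, low, high in records:
        if low > high:
            raise ValueError(f"min {low} > max {high} for {key!r}")
        for value in range(low, high + 1):  # +1 makes the max inclusive
            yield key, value


# Hypothetical input: each record carries its own min and max.
source_rows = [("A", 1, 3), ("B", 5, 5)]
for row in expand_interim_rows(source_rows):
    print(row)
```

A record whose min equals its max still produces exactly one row, so no input record is lost in the split.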