YARN Capacity Scheduler: Queue Priority

Capacity Scheduler is designed to run Hadoop jobs in a shared, multi-tenant cluster in a friendly manner. Its main strength is that it guarantees specific capacity for a certain group of users by supporting multiple queues and allowing users to submit their queries into their dedicated queues. Each queue is given a fraction of total … Continue reading YARN Capacity Scheduler: Queue Priority

Hive Performance Tuning

If you have been working in Big Data, you have definitely heard of Hive. Apache Hive is the data warehouse infrastructure build on top of Hadoop. I did a presentation on how to best use Apache Hive and few tips on how to best use it for one of our clients last week that I … Continue reading Hive Performance Tuning

Hadoop Error org.apache.hadoop.hdfs.BlockMissingException: Could not obtain block

Like almost all Mondays, today was a very challenging one. The first thing I noticed was that our primary namenode had faced some issues over the weekend and went down. Which means secondary namenode, namenode-02, was active. I checked namenode-01 and made sure it is okay before making it active again. After that, I was made … Continue reading Hadoop Error org.apache.hadoop.hdfs.BlockMissingException: Could not obtain block

How to import org.apache.spark.sql.SQLContext.implicits in Spark 1.6: error “value toDF is not a member of org.apache.spark.rdd.RDD”

Note: If you're using Spark 2.2, please read this post I am doing a mini project for my company using Spark/Scala and have been stuck with the error mentioned in the title for a couple of days. Googling that error suggested to import org.apache.spark.sql.SQLContext.implicits, and that's what I did: import org.apache.spark.SparkContext import org.apache.spark.SparkContext._ import org.apache.spark.sql._ import … Continue reading How to import org.apache.spark.sql.SQLContext.implicits in Spark 1.6: error “value toDF is not a member of org.apache.spark.rdd.RDD”