Generate Big Datasets with Hive in HDInsight
Update 2020-05-29: Generate Big Datasets with Databricks

I authored a post on the BlueGranite site on how to generate big datasets with Databricks. This is a more modern approach to generating the same TPC-DS benchmark datasets, and I recommend it over the HDInsight solution described below.

The TPC (Transaction Processing Performance Council) provides tools for generating the benchmarking data, but using them to generate big data is not trivial, and would take a very long time on modest hardware....