

I have a bunch of questions about Hadoop cluster hardware configuration, mostly about storage configuration. According to public documents, the storage requirement depends on the workload: if the workload needs performance, fast disks (SAS) are feasible; if the workload needs capacity, SATA disks can be used. Today, capacities as large as 8 TB are possible in a single disk. What do you think about using these disks? Many documents say that smaller capacities are better, but many of those documents are two or more years old. It seems that if this type of disk fails, the healing time would be longer, so does it affect cluster performance?

What are the storage considerations for Apache Spark? It is documented that Spark can use disks when tasks don't fit in memory, and for intermediate outputs between stages. Do the speed and capacity of the disks matter? How much of a typical Spark task's work consists of these disk operations?

Another issue about Spark: according to my reading, a Java VM with more than 200 GB of memory may not behave well, so serialization is recommended. Does that mean Spark is also CPU intensive? Roughly, setting the JVM issue aside, can we say Spark is CPU intensive?

Is there any way to calculate the NameNode storage requirement? For example, how much metadata space is required for 100 TB of Hadoop data?

According to the Hadoop documentation, storage tiering is possible. Does it support using heterogeneous disk types in different racks, or in the same rack for different data types? Have you ever tried that?

My last question is about edge nodes and master nodes. As far as I know, an edge node is a gateway between the Hadoop cluster and the outside network. So if I use an edge node, the slave and master nodes wouldn't need to connect to the outside network (except for administration work), and data transfer can be done through the edge node. Is that true? Also, are there any considerations about the number of master nodes? How can I decide whether it should be more than 2?

Yes, you are correct! With larger disks there would be a longer healing time in case of disk failures, and extra overhead for the NN (NameNode) to re-replicate a bunch of blocks.
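The healing-time concern can be made concrete with a rough estimate. The time to restore full replication is roughly the amount of data that was on the failed disk divided by the aggregate bandwidth the cluster puts into re-replication. The sketch below is a back-of-envelope model only. The fill ratio, the number of participating nodes, and the per-node re-replication throughput are assumed inputs, not HDFS defaults. Real HDFS also throttles re-replication work (for example via `dfs.namenode.replication.max-streams`), so measured times are likely to be longer than this estimate.

```python
"""Back-of-envelope estimate of HDFS re-replication ("healing") time after a disk failure.

All inputs are illustrative assumptions, not HDFS defaults or measurements.
"""

TB = 10**12  # decimal terabyte, as used by disk vendors
MB = 10**6


def healing_time_hours(disk_capacity_tb: float,
                       fill_ratio: float,
                       participating_nodes: int,
                       per_node_mb_per_s: float) -> float:
    """Estimate hours needed to re-replicate the blocks that lived on one failed disk.

    Assumes the work is spread evenly across `participating_nodes`, each sustaining
    `per_node_mb_per_s` of re-replication traffic (a hypothetical figure). HDFS
    throttles this work, so treat the result as an optimistic lower bound.
    """
    if not 0.0 <= fill_ratio <= 1.0:
        raise ValueError("fill_ratio must be between 0 and 1")
    if participating_nodes <= 0 or per_node_mb_per_s <= 0:
        raise ValueError("node count and throughput must be positive")
    bytes_to_copy = disk_capacity_tb * TB * fill_ratio
    aggregate_bytes_per_s = participating_nodes * per_node_mb_per_s * MB
    return bytes_to_copy / aggregate_bytes_per_s / 3600


if __name__ == "__main__":
    # Same hypothetical cluster, two disk sizes: healing time grows linearly with capacity.
    for capacity_tb in (2, 8):
        hours = healing_time_hours(capacity_tb, fill_ratio=0.7,
                                   participating_nodes=20, per_node_mb_per_s=50)
        print(f"{capacity_tb} TB disk, 70% full: ~{hours:.2f} h to restore replication")
```

Under these assumptions, an 8 TB disk takes four times as long to heal as a 2 TB disk. During that window, the re-replication traffic competes with regular jobs for disk and network bandwidth.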

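Two rules of thumb help with the NameNode and master-node questions.

1. Every file, directory, and block is commonly said to occupy roughly 150 bytes of NameNode memory. NameNode metadata therefore scales with the number of files and blocks, not with raw terabytes.
2. The HDFS HA (Quorum Journal Manager) documentation calls for an odd number of JournalNodes, at least 3, because N of them tolerate (N - 1) / 2 failures. ZooKeeper ensembles follow the same majority rule.

The sketch below encodes both ideas. The 150-byte figure, the 128 MiB block size, and the two average file sizes are assumptions to replace with your own numbers. 100 TiB stands in for the question's 100 TB. Real heap use is higher once directories, replica locations, and JVM overhead are counted.

```python
"""Rough master-node sizing helpers (illustrative assumptions, not official formulas)."""

import math

BYTES_PER_NAMESPACE_OBJECT = 150  # widely quoted rule of thumb per file/directory/block
DEFAULT_BLOCK_SIZE = 128 * 2**20  # 128 MiB, the usual dfs.blocksize default
TIB = 2**40


def namenode_metadata_bytes(logical_data_bytes: int,
                            avg_file_bytes: int,
                            block_size: int = DEFAULT_BLOCK_SIZE) -> int:
    """Estimate NameNode heap used by file and block objects.

    Directories and per-replica bookkeeping are ignored, so this is a lower bound.
    `logical_data_bytes` is data before replication: the NameNode keeps one
    block object per block, not per replica.
    """
    files = math.ceil(logical_data_bytes / avg_file_bytes)
    blocks_per_file = math.ceil(avg_file_bytes / block_size)
    objects = files + files * blocks_per_file
    return objects * BYTES_PER_NAMESPACE_OBJECT


def quorum_tolerance(members: int) -> tuple[int, int]:
    """Return (majority needed, failures tolerated) for a JournalNode or ZooKeeper quorum."""
    if members < 1:
        raise ValueError("a quorum needs at least one member")
    majority = members // 2 + 1
    return majority, members - majority


if __name__ == "__main__":
    # Same 100 TiB of data, very different metadata footprints depending on file size.
    for avg_file in (2**30, 2**20):  # hypothetical 1 GiB vs 1 MiB average file
        mib = namenode_metadata_bytes(100 * TIB, avg_file) / 2**20
        print(f"avg file {avg_file // 2**20} MiB -> ~{mib:,.0f} MiB of NameNode heap")
    for n in (2, 3, 4, 5):
        majority, tolerated = quorum_tolerance(n)
        print(f"{n} members: majority {majority}, tolerates {tolerated} failure(s)")
```

The quorum loop shows why two masters are not enough to host a quorum, since two members tolerate zero failures. It also shows that four members tolerate no more failures than three. This is why three master nodes are a common starting point for the JournalNode and ZooKeeper quorums.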