Hadoop MapReduce applications are Java programs written to process large amounts of data, and they must be tested to confirm they function correctly. The testing process includes manually verifying the business logic on each node: the accuracy of the MapReduce process, the data aggregation/segregation rules, and the generation of key-value pairs.
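The key-value checks above can be exercised without a cluster by factoring the mapper and reducer logic into pure methods. The sketch below is a minimal illustration under assumed rules (a word-count-style mapper and a summing reducer); the class and method names are hypothetical, not the application under test in the talk.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Hypothetical sketch: mapper/reducer business logic factored into pure
// methods so key/value generation and aggregation can be asserted directly.
public class MapperLogicTest {

    // Assumed mapper rule: tokenize a record and emit (word, 1) pairs.
    static List<Map.Entry<String, Integer>> mapRecord(String record) {
        return Arrays.stream(record.trim().split("\\s+"))
                .map(w -> Map.entry(w.toLowerCase(), 1))
                .collect(Collectors.toList());
    }

    // Assumed aggregation rule: sum values per key, as the reducer would.
    static Map<String, Integer> reduce(List<Map.Entry<String, Integer>> pairs) {
        Map<String, Integer> totals = new TreeMap<>();
        for (Map.Entry<String, Integer> p : pairs) {
            totals.merge(p.getKey(), p.getValue(), Integer::sum);
        }
        return totals;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> pairs = mapRecord("Big data big TEST");
        // Check key-value pair generation.
        if (pairs.size() != 4) throw new AssertionError("expected 4 pairs");
        // Check the aggregation rule.
        Map<String, Integer> totals = reduce(pairs);
        if (totals.get("big") != 2) throw new AssertionError("aggregation failed");
        System.out.println(totals);
    }
}
```

The same pattern scales to the real rules: each business rule becomes a pure method with assertions over representative records.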
At the same time, the output data files are verified for transformation rules, successful load, data integrity, and data accuracy. Given the enormous volume of data and the many business rules, manual testing is time-consuming and validations can slip through. Automating the testing process with Selenium and Java adapters ensures the data complies with all business/transformation rules and that data integrity is checked.
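One piece of such an automated suite is a validator over the job's output files. The sketch below assumes TextOutputFormat-style "key&lt;TAB&gt;value" lines; the rule (well-formed records, and output totals reconciling with the source record count) and all names are illustrative assumptions, not the talk's actual checks.

```java
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

// Hypothetical sketch: automated transformation-rule and data-integrity
// checks over MapReduce output records ("key<TAB>value" lines assumed).
public class OutputValidator {

    // Assumed transformation rule: every line is "key<TAB>integerValue".
    static final Predicate<String> WELL_FORMED =
            line -> line.matches("\\S+\\t-?\\d+");

    // Assumed integrity rule: output totals must equal the source record count.
    static boolean totalsMatch(List<String> outputLines, long sourceCount) {
        long outputTotal = outputLines.stream()
                .mapToLong(l -> Long.parseLong(l.split("\t")[1]))
                .sum();
        return outputTotal == sourceCount;
    }

    public static void main(String[] args) {
        List<String> output = List.of("alpha\t3", "beta\t2");
        List<String> malformed = output.stream()
                .filter(WELL_FORMED.negate())
                .collect(Collectors.toList());
        if (!malformed.isEmpty()) throw new AssertionError("malformed: " + malformed);
        if (!totalsMatch(output, 5)) throw new AssertionError("count mismatch");
        System.out.println("output validation passed");
    }
}
```

In a full suite these checks would run against files pulled from HDFS after each job, with Selenium covering any browser-facing reports built on the same data.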
Video producer: http://seleniumconf.org/
Further reading: Big Data: How to Test the Elephant?