Configuring Hadoop security

Configuring Hadoop to operate in secure mode can be a daunting task with a number of external dependencies. Detailed knowledge of Linux, Kerberos, SSL/TLS, and JVM security constructs is required. At the time of this writing, there are also some known gotchas in certain Linux distributions and versions of the JVM that can cause you grief. Some of those are described below.

The high-level process for enabling security is as follows.

1. Audit all services to ensure enabling security will not break anything.
   Hadoop security is all or nothing; enabling it will prevent all non-Kerberos authenticated communication. It is absolutely critical that you first take an inventory of all existing processes, both automated and otherwise, and decide how each will work once security is enabled. Don't forget about administrative scripts and tools!

2. Configure a working non-security enabled Hadoop cluster.
   Before embarking on enabling Hadoop's security features, get a simple mode cluster up and running. You'll want to iron out any kinks in DNS resolution, network connectivity, and simple misconfiguration early. Debugging network connectivity issues and supported encryption algorithms within the Kerberos KDC at the same time is not a position that you want to find yourself in.

3. Configure a working Kerberos environment.
   Basic Kerberos operations such as authenticating and receiving a ticket-granting ticket from the KDC should work before you continue. You are strongly encouraged to use MIT Kerberos with Hadoop; it is, by far, the most widely tested. If you have existing Kerberos infrastructure (such as that provided by Microsoft Active Directory) that you wish to authenticate against, it is recommended that you configure a local MIT KDC with a one-way cross-realm trust, so that Hadoop daemon principals exist in the MIT KDC and user authentication requests are forwarded to Active Directory. This is usually far safer, as large Hadoop clusters can accidentally create distributed denial of service attacks against shared infrastructure when they become active.

4. Ensure host name resolution is sane.
   As discussed earlier, each Hadoop daemon has its own principal that it must know in order to authenticate. Since the hostname of the machine is part of the principal, all hostnames must be consistent and known at the time the principals are created. Once the principals are created, the hostnames may not be changed without recreating all of the principals! It is common for administrators to run dedicated, caching-only DNS name servers for large clusters.

5. Create Hadoop Kerberos principals.
   Each daemon on each host of the cluster requires a distinct Kerberos principal when enabling security. Additionally, the Web user interfaces must also be given principals before they will function correctly. Just as the first point says, security is all or nothing.

6. Export principal keys to keytabs and distribute them to the proper cluster nodes.
   With principals generated in the KDC, each key must be exported to a keytab and copied securely to the proper host. Doing this by hand is incredibly laborious for even small clusters and, as a result, should be scripted.

7. Update Hadoop configuration files.
   With all the principals generated and in their proper places, the Hadoop configuration files are then updated to enable security. The full list of configuration properties related to security is described later.

8. Restart all services.
   To activate the configuration changes, all daemons must be restarted. The first time security is configured, it usually makes sense to start the first few daemons to make sure they authenticate correctly and are using the proper credentials before firing up the rest of the cluster.

9. Test!
   It's probably clear by now that enabling security is complex and requires a fair bit of effort. The truly difficult part of configuring a secure environment is testing that everything is working correctly. It can be particularly difficult on a large production cluster with existing jobs to verify that everything is functioning properly, but no assumptions should be made. Kerberos does not, by definition, afford leniency to misconfigured clients.

Creating principals for each of the Hadoop daemons and distributing their respective keytabs is the most tedious part of enabling Hadoop security. Doing this for each daemon by hand would be rather error prone, so instead, we'll create a file of host names and use a script to execute the proper commands. These examples assume MIT Kerberos 1.9 on CentOS 6.2.

First, build a list of fully qualified host names, either by exporting them from an inventory system or generating them based on a well-known naming convention.
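
As a rough illustration of scripting the principal and keytab steps, the following Python sketch turns a list of fully qualified host names into MIT Kerberos kadmin queries. It is only a sketch under stated assumptions, not the script this text goes on to use: the service names (hdfs, mapred, HTTP), the realm EXAMPLE.COM, the keytab file naming, and the function names are all hypothetical. It only builds the command strings and does not contact a KDC.

```python
def read_hosts(lines):
    """Return host names from an iterable of lines, skipping blanks and # comments."""
    hosts = []
    for line in lines:
        name = line.strip()
        if name and not name.startswith("#"):
            hosts.append(name)
    return hosts


def kadmin_commands(hosts, realm, services=("hdfs", "mapred", "HTTP")):
    """Build kadmin queries that create and export one principal per service per host.

    The service names and the keytab naming scheme are assumptions made
    for illustration; adjust them to match the daemons in your cluster.
    """
    commands = []
    for host in hosts:
        for service in services:
            principal = f"{service}/{host}@{realm}"
            # Create the principal with a random key (no password to manage).
            commands.append(f"addprinc -randkey {principal}")
            # Export the key to a keytab file named after the host and service.
            commands.append(f"xst -k {host}-{service}.keytab {principal}")
    return commands


if __name__ == "__main__":
    # Hypothetical host names following a simple naming convention.
    example_hosts = read_hosts(["hadoop01.example.com\n", "hadoop02.example.com\n"])
    for command in kadmin_commands(example_hosts, "EXAMPLE.COM"):
        print(command)
```

Each generated string could then be passed to kadmin through its -q option on the KDC host, after which the keytabs still have to be copied securely to their hosts. Keeping command generation separate from execution makes it easy to review the full list before anything is changed in the KDC.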