## *Querying, Organizing and Visualizing Materials Data*


**Why?** Access to data associated with materials in electronic form enables engineers, scientists and
students to explore this data, display it graphically, find trends and develop models.

**What?** In this tutorial, we will learn how to query, organize and plot data from the databases associated with the Python libraries [Pymatgen](http://pymatgen.org/) and [Mendeleev](https://mendeleev.readthedocs.io/en/stable/). 

**How to use this?** This tutorial uses Python; some familiarity with programming is helpful but not required. Run each code cell in order by pressing "Shift + Enter". Feel free to modify the code or change the queries to familiarize yourself with how the code works.


Suggested modifications and exercises are included in <font color=blue> blue</font>.

**Outline:**

1. Query from Pymatgen
2. Processing and Organizing Data
3. Plotting
4. Query from Mendeleev

**Get started:** Press "Shift-Enter" on the code cells to run them!

In [None]:
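# These lines import both libraries and then define a list of element symbols to be used below

import pymatgen.core as pymat  # Element lives in pymatgen.core; recent pymatgen versions no longer expose it as pymatgen.Element
import mendeleev as mendel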

elements = ['H', 'He', 'Li', 'Be', 'B', 'C', 'N', 'O', 'F', 'Ne', 'Na', 'Mg',
            'Al', 'Si', 'P', 'S', 'Cl', 'Ar', 'K', 'Ca', 'Sc', 'Ti', 'V', 'Cr',
            'Mn', 'Fe', 'Co', 'Ni', 'Cu', 'Zn', 'Ga', 'Ge', 'As', 'Se', 'Br',
            'Kr', 'Rb', 'Sr', 'Y', 'Zr', 'Nb', 'Mo', 'Tc', 'Ru', 'Rh', 'Pd', 'Ag',
            'Cd', 'In', 'Sn', 'Sb', 'Te', 'I', 'Xe', 'Cs', 'Ba', 'Hf', 'Ta', 'W',
            'Re', 'Os', 'Ir', 'Pt', 'Au', 'Hg', 'Tl', 'Pb', 'Bi', 'La', 'Ce', 'Pr',
            'Nd', 'Pm', 'Sm', 'Eu', 'Gd', 'Tb', 'Dy', 'Ho', 'Er', 'Tm', 'Yb', 'Lu',
            'Ac', 'Th', 'Pa', 'U', 'Np', 'Pu']

### 1. Query from Pymatgen

Pymatgen (Python Materials Genomics) is a popular open-source Python library for materials analysis. Among other things, it can be used to access data in the [Materials Project](https://materialsproject.org/) and the [Crystallography Open Database](http://www.crystallography.net/cod/), and it makes querying these resources user-friendly. In this tutorial, we will query a database built into the library itself by using the **Element** class.

Making a query in Pymatgen requires the chemical symbol of the element; all the symbols we use are listed in the cell above. Each property is then accessible as an attribute of that Element object. For a list of all the properties that you can query, click [here](http://pymatgen.org/pymatgen.core.periodic_table.html?pymatgen.core.periodic_table.Specie.element) to look at the documentation for the Element class.

In this example we will query the Young's modulus for the elements in the list "sample". The values are printed together with their units. You can use the commented-out code to query all the properties in the "querable_pymatgen" list for the "sample" elements.
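The commented-out loop in the next cell can be generalized into a small helper. The sketch below is one way to do it, assuming you want the results as a nested dictionary; the function name `collect_properties` is hypothetical and not part of Pymatgen. It takes the element symbols, the property names and a "factory" that builds an element object, and it uses `getattr` with a default of `None`, so a misspelled or unavailable property yields `None` instead of an error. To keep the block runnable without Pymatgen, a stand-in object from `types.SimpleNamespace` replaces `pymat.Element` in the demo.

```python
from types import SimpleNamespace
from typing import Any, Callable, Dict, Iterable


def collect_properties(
    symbols: Iterable[str],
    properties: Iterable[str],
    factory: Callable[[str], Any],
) -> Dict[str, Dict[str, Any]]:
    """Return {symbol: {property: value}} by calling getattr on factory(symbol).

    Hypothetical helper: a property the object does not have is stored as None.
    """
    properties = list(properties)  # allow any iterable to be reused for every symbol
    table = {}
    for symbol in symbols:
        obj = factory(symbol)  # build each element object only once
        table[symbol] = {prop: getattr(obj, prop, None) for prop in properties}
    return table


# Demo with a stand-in object (hypothetical; no Pymatgen needed to run this)
demo = collect_properties(["Fe", "Zn"], ["symbol", "not_a_property"],
                          lambda s: SimpleNamespace(symbol=s))
print(demo)
# {'Fe': {'symbol': 'Fe', 'not_a_property': None}, 'Zn': {'symbol': 'Zn', 'not_a_property': None}}
```

In the notebook you would presumably call `collect_properties(sample, querable_pymatgen, pymat.Element)`, and the same call should work with `mendel.element` as the factory. Because the `None` default also hides typos in property names, it is worth checking any unexpected `None` values.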

In [None]:
querable_pymatgen = ["atomic_mass", "poissons_ratio","atomic_radius", "electrical_resistivity","molar_volume","thermal_conductivity", "bulk_modulus", "youngs_modulus", 
                    "brinell_hardness", "average_ionic_radius", "melting_point", "rigidity_modulus", "density_of_solid","coefficient_of_linear_thermal_expansion"]

sample = ['Fe', 'Co', 'Ni', 'Cu', 'Zn']

for item in sample:
    element_object = pymat.Element(item)
    print(item, element_object.youngs_modulus) # You can change "youngs_modulus" to any of the properties in the querable_pymatgen list
    
#for item in sample:
#    element_object = pymat.Element(item)
#    for i in querable_pymatgen:
#        print(item, i, getattr(element_object, i))

 * <font color=blue> **Exercise 1.** Modify the query above to extract Brinell hardness. </font>
 * <font color=blue> **Exercise 2.** Uncomment the last four lines of the cell above to see all the properties of the selected elements. </font>

Remember: press "Shift-Enter" to re-run the cell.
 

### 2. Processing and Organizing Data

After going through the basics of a query, we will now learn how to organize data in Python lists and dictionaries.

A dictionary stores entries as key-value pairs: each key is a name (in our case, the name of a property) and each value is the data associated with it. Dictionaries are useful for storing a collection of data values for a particular element. In this example, we will create one to store some of the properties of iron, using queries from both of the libraries introduced above. Note that the atomic number and the specific heat are obtained from Mendeleev, another database of element properties, which we will explore in Section 4.
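`Fe_data` holds the properties of a single element. To keep several elements together, one common pattern (a sketch, not something either library requires) is a dictionary of dictionaries: the outer keys are element symbols and each value is an inner dictionary like `Fe_data`. The name `materials` is hypothetical; only atomic numbers, which are fixed, and the bcc structure of Fe (taken from the crystal-structure lists used later in this tutorial) are filled in.

```python
# Outer dictionary: element symbol -> inner dictionary of properties
materials = {}
for z, symbol in enumerate(["Fe", "Co", "Ni", "Cu", "Zn"], start=26):
    materials[symbol] = {"atomic_number": z}  # atomic numbers 26 to 30

materials["Fe"]["crystal_structure"] = "bcc"  # add a new key to one inner dictionary

print(materials["Cu"]["atomic_number"])          # 29
print(materials["Cu"].get("crystal_structure"))  # None: the key is absent, but .get avoids a KeyError

# Dictionaries keep insertion order, so this loops Fe, Co, Ni, Cu, Zn
for symbol, props in materials.items():
    print(symbol, props)
```

Indexing with `materials["Cu"]["crystal_structure"]` would raise a `KeyError`, which is why `.get` is used for properties that may be missing.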

In [None]:
Fe_data = {} # Initializing an empty dictionary

# Create each element object once and reuse it below
fe_pymatgen = pymat.Element("Fe")
fe_mendeleev = mendel.element("Fe")

# Each of the following lines adds a single entry (key: property name, value: property value)

Fe_data["atomic_number"] = fe_mendeleev.atomic_number
Fe_data["coefficient_of_linear_thermal_expansion"] = fe_pymatgen.coefficient_of_linear_thermal_expansion
Fe_data["youngs_modulus"] = fe_pymatgen.youngs_modulus
Fe_data["specific_heat"] = fe_mendeleev.specific_heat

# Print the entire dictionary for Fe
print(Fe_data)

# Print a specific attribute:
print(Fe_data["specific_heat"])

# Uncomment the following line to delete an entry
# del Fe_data["atomic_number"]

Another way to organize data is in lists, which are very convenient for creating plots. Following the examples above, we will now query three properties (coefficient of linear thermal expansion, Young's modulus and melting point) for all elements. Each resulting list follows the order of the "elements" list in the first cell of the tutorial, so the value at a given position belongs to the element at the same position in that list.
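Because the lists are aligned by position, Python's `zip` can walk through them together. In the Pymatgen versions we are aware of, a property with no available data is returned as `None`, so before analyzing the values you may want to keep only the positions where every list has a value. The helper `complete_rows` below is a hypothetical sketch of that filter, demonstrated on toy numbers (not material data).

```python
def complete_rows(labels, *columns):
    """Yield (label, v1, v2, ...) tuples for positions where no column value is None."""
    for row in zip(labels, *columns):  # zip pairs up items that share an index
        if all(value is not None for value in row[1:]):
            yield row


# Toy values for illustration only (not real material data)
labels = ["A", "B", "C"]
x = [1.0, None, 3.0]
y = [10.0, 20.0, None]
print(list(complete_rows(labels, x, y)))  # [('A', 1.0, 10.0)]
```

For example, `list(complete_rows(sample, melting_temperature, youngs_modulus))` would give (symbol, melting point, Young's modulus) tuples for the elements that have both values. Keep in mind that `zip` silently stops at the shortest input, so the lists must stay the same length.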

In [None]:
sample = elements.copy()

CTE = [] # In this list we will store the Coefficients of Thermal Expansion
youngs_modulus = [] # In this list we will store the Young's Moduli
melting_temperature = [] # In this list we will store the Melting Temperatures

for item in sample:
    element_object = pymat.Element(item)  # create the Element object once per symbol
    CTE.append(element_object.coefficient_of_linear_thermal_expansion)
    youngs_modulus.append(element_object.youngs_modulus)
    melting_temperature.append(element_object.melting_point)

# You can visualize the lists by uncommenting these print statements
#print(CTE)
#print(youngs_modulus)
#print(melting_temperature)


# We will use the following lists to group elements by their crystal structure at room temperature (RT).
# Elements that are gases or liquids at RT have been left out.

fcc_elements = ["Ac", "Ag", "Al", "Au", "Ca", "Cu", "Ir", "Ni", "Pb", "Pd", "Pt", "Rh", "Sr", "Th", "Yb"]
bcc_elements = ["Ba", "Cr", "Cs", "Eu", "Fe", "K", "Li", "Mn", "Mo", "Na", "Nb", "P", "Rb", "Ta", "V", "W" ]
hcp_elements = ["Be", "Cd", "Co", "Dy", "Er", "Gd", "Hf", "Ho", "Lu", "Mg", "Os", "Re", "Ru", "Sc", "Tb", "Tc","Ti", "Tl", "Tm", "Y", "Zn", "Zr"]

# Other solids: "B", "Sb", "Sm", "Bi" and "As" are rhombohedral; "C", "Ce" and "Sn" are allotropic; "Si" and "Ge" are diamond cubic; "Pu" is monoclinic;
#               "S", "I", "U", "Np" and "Ga" are orthorhombic; "Se" and "Te" are hexagonal; "In" and "Pa" are tetragonal; "La", "Pr", "Nd" and "Pm" are double hexagonal close-packed.
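Checking whether a symbol is in one of three lists works, but a single dictionary that maps each symbol to its structure is often handier, and building it is a good opportunity to verify that no element appears in two groups. The function `invert_groups` below is a hypothetical sketch of that idea; it uses short excerpts of the lists above so that it runs on its own.

```python
def invert_groups(groups):
    """Map each member to its group name; raise ValueError if a member is in two groups."""
    lookup = {}
    for name, members in groups.items():
        for member in members:
            if member in lookup:
                raise ValueError(f"{member} is in both {lookup[member]} and {name}")
            lookup[member] = name
    return lookup


# Short excerpts of the lists above, to keep the example self-contained
structure_of = invert_groups({
    "fcc": ["Al", "Cu", "Ni"],
    "bcc": ["Fe", "Cr", "W"],
    "hcp": ["Mg", "Ti", "Zn"],
})
print(structure_of["Fe"])               # bcc
print(structure_of.get("Si", "other"))  # other
```

With the full lists, you could call `invert_groups({"fcc": fcc_elements, "bcc": bcc_elements, "hcp": hcp_elements})`; an accidental duplicate would then be reported immediately instead of silently taking the color of whichever group is checked first.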

### 3. Plotting

Finally, we are going to plot the values for the properties in the lists we just created. For this tutorial we will make two scatter plots:

-  Young's Modulus vs Melting Temperature
-  Coefficient of Linear Thermal Expansion vs Melting Temperature

We will use a Python library called [Plotly](https://plot.ly/python/) to create these plots. Plotly produces interactive plots (you can hover over points, zoom and pan) that are highly customizable.
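Under the hood, a Plotly figure is a JSON-like structure: a list of traces under `"data"` and a dictionary under `"layout"`. The `go.Scatter` and `go.Layout` objects used below are validated wrappers around that structure. The sketch below builds the same kind of structure from plain dictionaries to make its shape visible; `scatter_figure` is a hypothetical helper and the data are toy values.

```python
import json


def scatter_figure(x, y, title, x_title, y_title):
    """Return a Plotly-style figure specification built from plain dicts and lists."""
    trace = {"type": "scatter", "mode": "markers", "x": list(x), "y": list(y)}
    layout = {
        "title": {"text": title},
        "xaxis": {"title": {"text": x_title}},
        "yaxis": {"title": {"text": y_title}},
    }
    return {"data": [trace], "layout": layout}


# Toy data for illustration only
fig_spec = scatter_figure([1, 2, 3], [2, 4, 6], "Toy data", "x", "y")
print(json.dumps(fig_spec["layout"]["xaxis"]))  # {"title": {"text": "x"}}
```

Assuming Plotly is installed, `go.Figure(fig_spec)` should turn this dictionary into a figure that `iplot` can display. Plotly validates the keys when the figure is built, so a misspelled key raises an error at that point rather than being silently ignored.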

#### Simple Plot

In this first cell we will import the library components we will use and create a simple plot.

In [None]:
import plotly #This is the library import
import plotly.graph_objects as go # This is the graphical object module (think "plt" in Matplotlib if you have used that before)

from plotly.offline import iplot # These lines are necessary to run Plotly in Jupyter Notebooks, but not in a dedicated environment
plotly.offline.init_notebook_mode(connected=True)

# To create a plot, you need a layout and a trace

# The layout gives Plotly the instructions for the background grid, titles in the plot,
# axis names, axis ticks, legends, labels, colors on the figure and general formatting.

layout = go.Layout(title = "Young's Moduli vs Melting Temperature", xaxis = dict(title = 'Melting Temperature (K)'), 
                   yaxis = dict(title = "Young's Modulus (GPa)"))

# The trace contains the type of plot (in this case go.Scatter; others include go.Bar and go.Pie),
# the data we want to visualize and the way ("mode") we want to represent it, e.g. 'markers' or 'lines'.

trace = go.Scatter(x = melting_temperature, y = youngs_modulus, mode = 'markers')

# To plot, we create a figure and implement our components in the following way:

data = [trace] # We could include more than just one trace here

fig= go.Figure(data, layout=layout)
iplot(fig)

#### Custom Plots

Now that we know how to make a basic plot, we can add more details to make it look better. All modifications are explained in the comments, and you can find more information in the [Plotly axes documentation](https://plot.ly/python/axes/).

Before we start our new plot: wouldn't it look better if we could label the points with the element symbols and color them according to their crystal structures?

In [None]:
# Here we create a function that takes a value x (the symbol of an element)
# and returns a color depending on which of the crystal-structure lists defined earlier contains it.
# We need this because we want to color the data points by crystal structure, so this information has to be passed to the plot.

def SetColor_CrystalStr(x):
    if x in fcc_elements:
        return "red" # These are standard CSS color names, but you can also use hexadecimal colors ("#009900") or RGB strings ("rgb(0, 128, 0)")
    elif x in bcc_elements:
        return "blue"
    elif x in hcp_elements:
        return "yellow"
    else:
        return "lightgray"
    
# We then create a list by passing every element symbol through this function, using Python's built-in "map".
# map applies a function to each item of a list (or any other iterable).

colors = list(map(SetColor_CrystalStr, sample))

# You can see what this list of generated colors looks like by uncommenting this line

#print(colors)
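The if/elif chain in `SetColor_CrystalStr` can also be written as a dictionary lookup, and `map` with a function is equivalent to a list comprehension. The sketch below shows both on short excerpts of the structure lists; `color_of` and the other names are hypothetical. Here `dict.get(symbol, "lightgray")` plays the role of the final `else` branch.

```python
# Short excerpts of the crystal-structure lists, to keep this self-contained
fcc = ["Al", "Cu"]
bcc = ["Fe", "W"]
hcp = ["Mg", "Zn"]

# Build the lookup table once: symbol -> color
color_of = {}
for members, color in ((fcc, "red"), (bcc, "blue"), (hcp, "yellow")):
    for symbol in members:
        color_of.setdefault(symbol, color)  # keep the first match, like the if/elif order

symbols = ["Fe", "Cu", "Si", "Zn"]
colors_via_map = list(map(lambda s: color_of.get(s, "lightgray"), symbols))
colors_via_comprehension = [color_of.get(s, "lightgray") for s in symbols]  # same result
print(colors_via_comprehension)  # ['blue', 'red', 'lightgray', 'yellow']
```

For about 90 elements both approaches are fast; the dictionary mainly keeps the color scheme in one place, so changing a color means editing a single tuple.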

In [None]:
# Back to creating a new plot

# Layout
layout0= go.Layout(title= "Young's Moduli vs Melting Temperature", hovermode= 'closest',  # 'closest' shows the hover label of the point nearest the cursor
    xaxis= dict(title= 'Melting Temperature (K)', showgrid=True, zeroline= False, gridwidth= 1), # Axis title; hide the zero line; show grid lines of width 1
    yaxis= dict(title= "Young's Modulus (GPa)", showgrid=True, zeroline= False, gridwidth= 1) # Axis title; hide the zero line; show grid lines of width 1
)
)

#Trace
trace0 = go.Scatter(x = melting_temperature,y = youngs_modulus, mode = 'markers',
    marker= dict(size= 14, line= dict(width=1), color=colors), # We add a size, a border and our custom colors to the markers
    text= sample # Hover label for each point: sample[i] labels the i-th point, since our property lists share the same order
)

data = [trace0]
fig= go.Figure(data, layout=layout0)
iplot(fig)
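Coloring a single trace by crystal structure works, but the plot has no legend explaining the colors. One alternative, sketched below under the assumption that you want one legend entry per structure, is to split the aligned lists into one trace per group; Plotly then lists each trace by its `name` in the legend. The function `traces_by_group` is hypothetical and returns plain dictionaries (with `"type": "scatter"`) so that the block runs without Plotly; the coordinates are toy values.

```python
def traces_by_group(symbols, x, y, group_of, default="other"):
    """Split aligned lists into one scatter-trace dict per group (hypothetical helper)."""
    buckets = {}
    for symbol, xv, yv in zip(symbols, x, y):
        group = group_of.get(symbol, default)
        bucket = buckets.setdefault(group, {"x": [], "y": [], "text": []})
        bucket["x"].append(xv)
        bucket["y"].append(yv)
        bucket["text"].append(symbol)  # hover label, as with text=sample above
    return [
        {"type": "scatter", "mode": "markers", "name": group, **points}
        for group, points in buckets.items()
    ]


# Toy coordinates, for illustration only
traces = traces_by_group(["Al", "Fe", "Cu"], [1, 2, 3], [4, 5, 6],
                         {"Al": "fcc", "Cu": "fcc", "Fe": "bcc"})
for t in traces:
    print(t["name"], t["text"])
# fcc ['Al', 'Cu']
# bcc ['Fe']
```

In the notebook, something like `go.Figure(data=traces_by_group(sample, melting_temperature, youngs_modulus, group_of), layout=layout0)` should work, where `group_of` is a hypothetical symbol-to-structure dictionary built from `fcc_elements`, `bcc_elements` and `hcp_elements`. Plotly gives each trace its own default color unless you set the `marker` colors yourself.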

<font color=blue> **Exercise 3.**  a) Find the three metals with the highest Young's moduli. b) What are the Young's moduli of Al, Fe and Pb? </font>

In [None]:
# Layout
layout1= go.Layout(title= 'CTE vs Melting Temperature', hovermode= 'closest',  # 'closest' shows the hover label of the point nearest the cursor
    xaxis= dict(title= 'Melting Temperature (K)', showgrid=True, zeroline= False, gridwidth= 1), # Axis title; hide the zero line; show grid lines of width 1
    yaxis= dict(title= 'Coefficient of Linear Thermal Expansion (K^-1)', showgrid=True, zeroline= False, gridwidth= 1) # Axis title; hide the zero line; show grid lines of width 1
)

#Trace
trace1 = go.Scatter(x = melting_temperature,y = CTE, mode = 'markers',
    marker= dict(size= 14, line= dict(width=1), color=colors), # We add a size, a border and our custom colors to the markers
    text= sample # Hover label for each point: sample[i] labels the i-th point, since our property lists share the same order
)

data = [trace1]
fig= go.Figure(data, layout=layout1)
iplot(fig)

 * <font color=blue> **Exercise 4.** Do you find correlations between the properties plotted? If so, what are the underlying reasons for them? </font>
 * <font color=blue> **Exercise 5.** Select a different pair of properties and create a similar plot. You can insert new cells below from the top menu (Insert -> Cell below) and copy and paste the code to create new plots.  </font>
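Exercise 4 asks about correlations. Besides inspecting the scatter plots, you can summarize a linear trend with the Pearson correlation coefficient r, which ranges from -1 to 1. The sketch below computes it with the standard library only and skips pairs with a missing (`None`) value; `pearson_r` is a hypothetical name. It assumes the values are plain numbers or float subclasses (as we understand Pymatgen's values with units to be). Keep in mind that r only measures linear association, and a correlation by itself does not explain its underlying physical reason.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of paired values; pairs containing None are skipped."""
    # float() turns float subclasses (such as values carrying units) into plain floats
    pairs = [(float(x), float(y)) for x, y in zip(xs, ys) if x is not None and y is not None]
    n = len(pairs)
    if n < 2:
        raise ValueError("need at least two complete (x, y) pairs")
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    var_x = sum((x - mean_x) ** 2 for x, _ in pairs)
    var_y = sum((y - mean_y) ** 2 for _, y in pairs)
    if var_x == 0 or var_y == 0:
        raise ValueError("r is undefined when one variable is constant")
    return cov / math.sqrt(var_x * var_y)


# Toy values: the pair containing None is dropped, and the rest lie on y = 2x
print(round(pearson_r([1, 2, None, 4], [2, 4, 7, 8]), 6))  # 1.0
```

In the notebook you might call `pearson_r(melting_temperature, youngs_modulus)`. On Python 3.10 or newer, `statistics.correlation` computes the same coefficient, but it does not skip missing values.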

### 4. Query from Mendeleev

Another database we can query in a similar way is Mendeleev, a Python library that provides an API (application programming interface) for accessing the properties of the elements in the periodic table. Just like Pymatgen, Mendeleev represents an element as an object and exposes its properties as attributes. In Mendeleev, this object is created with the **element** function (note that it is all lowercase).

A query in Mendeleev can use either the chemical symbol, as in Pymatgen, or the atomic number of the element. As before, each property is accessed as an attribute of the object. The list in the cell below does not include every property you can query; you can find all of them [here](https://mendeleev.readthedocs.io/en/stable/data.html). Note that Mendeleev does not return units with its values, but the units are also listed at that link.
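Because `mendel.element` accepts either a symbol or an atomic number, the two `sample` lists in the next cell are interchangeable: Fe, Co, Ni, Cu and Zn have atomic numbers 26 to 30. If you need to convert between the two yourself, a pair of dictionaries built from a list of symbols in atomic-number order is enough. Note that the `elements` list in the first cell is not in strict atomic-number order (Hf follows Ba there, and La to Lu come after Bi), so its positions should not be used as atomic numbers. The sketch below covers only the first 30 elements; `ORDERED`, `z_of` and `symbol_of` are hypothetical names.

```python
# Symbols for Z = 1..30 in atomic-number order (a short excerpt of the periodic table)
ORDERED = ["H", "He", "Li", "Be", "B", "C", "N", "O", "F", "Ne",
           "Na", "Mg", "Al", "Si", "P", "S", "Cl", "Ar", "K", "Ca",
           "Sc", "Ti", "V", "Cr", "Mn", "Fe", "Co", "Ni", "Cu", "Zn"]

z_of = {symbol: z for z, symbol in enumerate(ORDERED, start=1)}  # symbol -> atomic number
symbol_of = {z: symbol for symbol, z in z_of.items()}            # atomic number -> symbol

print(z_of["Fe"], symbol_of[29])                           # 26 Cu
print([z_of[s] for s in ["Fe", "Co", "Ni", "Cu", "Zn"]])   # [26, 27, 28, 29, 30]
```

Inside the notebook you could also read the atomic number from the object itself, as the Fe example in Section 2 does with `mendel.element("Fe").atomic_number`.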

In this example we will query the thermal conductivity for the elements in the list "sample".

With a little Python experience, you can again use the commented-out code to query all the properties listed for the "sample" elements.

In [None]:
querable_mendeleev = ["atomic_number", "atomic_volume", "boiling_point", "electron_affinity", "en_allen", "en_pauling", "econf", "evaporation_heat", "fusion_heat", "heat_of_formation",
                     "lattice_constant", "melting_point", "specific_heat", "thermal_conductivity"]
    
# You can get the same results using either of these two lists (Numbers correspond to the element's atomic number)
sample = ['Fe', 'Co', 'Ni', 'Cu', 'Zn']
#sample = [26,27,28,29,30]
    
for item in sample:    
    element_object = mendel.element(item)
    print(item, element_object.thermal_conductivity) # You can replace thermal_conductivity with any property in the querable_mendeleev list
    
#for item in sample:
#    element_object = mendel.element(item)
#    for i in querable_mendeleev:
#        print(item, i, getattr(element_object, i))