molmass is a Python package designed for chemists and researchers. It provides tools for calculating the molar mass, mass distribution, and isotopic patterns of chemical substances. Developed by Christoph Gohlke, the package is particularly useful in analytical chemistry, molecular modeling, and teaching.
Key Features of the molmass Python package
- Molar Mass Calculation: Quickly compute the molar mass of a chemical compound from its formula.
- Formula Analysis: Break down chemical formulas to understand the composition of molecules, including atom count and percentage by mass.
- Isotopic Pattern Simulation: Generate isotopic distribution patterns, which are crucial for interpreting mass spectrometry data.
- Interactive web application: For users less comfortable with coding, molmass provides a graphical web interface for easy access to its features.
Benefits for Chemists and Data Analysts
- Efficiency in Data Analysis: Automate the calculation of molar masses and isotopic patterns, saving time in data processing.
- Accuracy: Reduces the risk of human error in manual calculations.
- Integration with Other Python Tools: Can be combined with other Python libraries for more comprehensive data analysis and visualization tasks.
- Educational Resource: A great tool for teaching molecular mass concepts and isotopic distribution.
Getting Started with Molmass
Installation
molmass can be installed via pip:
pip install molmass
However, the above command installs only the core package. To also install the interactive web application and pandas, which is needed for tabular output, use the following command instead:
pip install -U "molmass[all]"
This command installs the molmass package together with all its optional dependencies. The quotes keep shells such as zsh from interpreting the square brackets.
It is recommended to install molmass in a virtual environment. For more information on how to create and manage virtual environments, see Python’s official documentation.
Dictionary of elements
The ELEMENTS dictionary gives quick access to physicochemical and descriptive properties of the chemical elements.
from molmass import ELEMENTS
hydrogen = ELEMENTS["H"]
hydrogen
Element(
    1, 'H', 'Hydrogen',
    group=1, period=1, block='s', series=1,
    mass=1.007941, eleneg=2.2, eleaffin=0.75420375,
    covrad=0.32, atmrad=0.79, vdwrad=1.2,
    tboil=20.28, tmelt=13.81, density=0.084,
    eleconfig='1s',
    oxistates='1*, -1',
    ionenergy=(13.5984,),
    isotopes={
        1: Isotope(1.00782503223, 0.999885, 1),
        2: Isotope(2.01410177812, 0.000115, 2),
    },
)
Each element is an instance of the Element class, so each property can be accessed as an attribute of the element:
hydrogen.protons
1
hydrogen.isotopes
{1: Isotope(mass=1.00782503223, abundance=0.999885, massnumber=1, charge=0), 2: Isotope(mass=2.01410177812, abundance=0.000115, massnumber=2, charge=0)}
For heavier elements, the package is a convenient source of electron configuration data, in both condensed and orbital notation:
silver = ELEMENTS["Ag"]
silver
Element(
    47, 'Ag', 'Silver',
    group=11, period=5, block='d', series=8,
    mass=107.8682, eleneg=1.93, eleaffin=1.30447,
    covrad=1.34, atmrad=1.75, vdwrad=1.72,
    tboil=2436.0, tmelt=1235.1, density=10.49,
    eleconfig='[Kr] 4d10 5s',
    oxistates='2, 1*',
    ionenergy=(7.5762, 21.49, 34.83),
    isotopes={
        107: Isotope(106.9050916, 0.51839, 107),
        109: Isotope(108.9047553, 0.48161, 109),
    },
)
silver.eleconfig
'[Kr] 4d10 5s'
silver.eleconfig_dict
{(1, 's'): 2, (2, 's'): 2, (2, 'p'): 6, (3, 's'): 2, (3, 'p'): 6, (3, 'd'): 10, (4, 's'): 2, (4, 'p'): 6, (4, 'd'): 10, (5, 's'): 1}
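The orbital notation is convenient for quick checks. As a minimal sketch that does not need molmass installed, the dictionary printed above can be copied into a literal (named `ELECONFIG_AG` here purely for illustration) to count electrons in total and per shell:

```python
from collections import defaultdict

# Copy of the silver.eleconfig_dict output shown above:
# keys are (shell number, subshell letter), values are electron counts.
ELECONFIG_AG = {
    (1, "s"): 2,
    (2, "s"): 2, (2, "p"): 6,
    (3, "s"): 2, (3, "p"): 6, (3, "d"): 10,
    (4, "s"): 2, (4, "p"): 6, (4, "d"): 10,
    (5, "s"): 1,
}

# For a neutral atom, the total must equal the atomic number (47 for silver).
total_electrons = sum(ELECONFIG_AG.values())

# Group the subshell occupancies by principal shell.
per_shell = defaultdict(int)
for (shell, _subshell), electrons in ELECONFIG_AG.items():
    per_shell[shell] += electrons
per_shell = dict(per_shell)

print(total_electrons)
print(per_shell)
```

The total matches the atomic number of silver, which is a simple consistency check on the configuration data.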
Class Formula
One of the most impressive features of molmass is the Formula class. It allows you to analyze chemical formulas and calculate their molar mass, mass distribution, and isotopic patterns. Let’s see how it works.
from molmass import Formula
water = Formula("H2O")
water
Formula('H2O')
Checking the number of atoms in a formula is easy:
water.atoms
3
So is checking the charge:
water.charge
0
For mass, there are three options:
- mass: the average relative molecular mass, considering the natural abundance of isotopes. It equals the molar mass in g/mol.
- monoisotopic_mass: the mass of the molecule considering only the most abundant isotope of each element.
- nominal_mass: the total number of protons and neutrons in the molecule built from the most abundant isotope of each element.
water.mass
18.015287
water.monoisotopic_mass
18.01056468403
water.nominal_mass
18
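To make the three definitions concrete, here is a minimal sketch of the arithmetic, not the way molmass computes these values internally. The hydrogen isotope data are copied from the ELEMENTS output above. The oxygen-16 mass is an assumption inferred from the monoisotopic mass of water shown above, and the helper names `average_mass` and `most_abundant` are hypothetical.

```python
# Hydrogen isotopes as printed by ELEMENTS["H"]: mass number -> (mass, abundance)
HYDROGEN = {
    1: (1.00782503223, 0.999885),
    2: (2.01410177812, 0.000115),
}
# Assumption: mass of oxygen-16, inferred from water.monoisotopic_mass above.
O16_MASS = 15.99491461957
O16_MASS_NUMBER = 16


def average_mass(isotopes):
    """Abundance-weighted mean of the isotope masses."""
    return sum(mass * abundance for mass, abundance in isotopes.values())


def most_abundant(isotopes):
    """Return (mass number, mass) of the most abundant isotope."""
    number = max(isotopes, key=lambda n: isotopes[n][1])
    return number, isotopes[number][0]


h_average = average_mass(HYDROGEN)
h_number, h_mono = most_abundant(HYDROGEN)

water_monoisotopic = 2 * h_mono + O16_MASS
water_nominal = 2 * h_number + O16_MASS_NUMBER

print(round(h_average, 6))
print(round(water_monoisotopic, 8))
print(water_nominal)
```

The weighted average for hydrogen rounds to the `mass` value listed for the element, while the monoisotopic and nominal masses use only the lightest, most abundant isotopes.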
The elemental composition of a formula is available through the composition method:
water.composition()
print(water.composition())
Element  Count  Relative mass  Fraction %
H            2       2.015882     11.1898
O            1      15.999405     88.8102
The composition is more useful in table form, which can be obtained as a pandas DataFrame:
water.composition().dataframe()
Element | Count | Relative mass | Fraction |
---|---|---|---|
H | 2 | 2.015882 | 0.111898 |
O | 1 | 15.999405 | 0.888102 |
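The numbers in the composition follow from simple arithmetic: each element's relative mass is its count times its average atomic mass, and the fraction is that mass divided by the total. The sketch below reproduces the values for water without molmass; the `ATOMIC_MASS` table and the `composition` function are illustrative names, with masses taken from the outputs above.

```python
# Average atomic masses taken from the outputs above.
ATOMIC_MASS = {"H": 1.007941, "O": 15.999405}


def composition(counts):
    """Map each element to (count, relative mass, mass fraction)."""
    masses = {el: n * ATOMIC_MASS[el] for el, n in counts.items()}
    total = sum(masses.values())
    return {el: (counts[el], mass, mass / total) for el, mass in masses.items()}


water_composition = composition({"H": 2, "O": 1})
for element, (count, mass, fraction) in water_composition.items():
    print(f"{element:2s} {count:3d} {mass:12.6f} {fraction:10.6f}")
```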
Isotopes
One thing that makes molmass stand out is its ability to calculate isotopic patterns. This is particularly useful for mass spectrometry data analysis. First, let’s see how the package deals with isotopes, even recognizing the most common isotopes of each element by their symbols:
# deuterated water (heavy water)
d_water = Formula("D2O")
d_water
Formula('[2H]2O')
d_water.mass
20.02760855624
d_water.composition().dataframe()
Element | Count | Relative mass | Fraction |
---|---|---|---|
2H | 2 | 4.028204 | 0.201133 |
O | 1 | 15.999405 | 0.798867 |
# C13 CO2
carbon13_co2 = Formula("[13C]O2")
carbon13_co2
Formula('[13C]O2')
carbon13_co2.mass
45.00216483507
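The masses above suggest that labelled atoms contribute their exact isotope mass, while unlabelled atoms contribute the average atomic mass. The following is a minimal sketch of that arithmetic for heavy water, not molmass's implementation. `labelled_mass` and both lookup tables are hypothetical names, with values copied from earlier outputs.

```python
# Average atomic mass of oxygen, from the composition tables above.
AVERAGE_MASS = {"O": 15.999405}
# Exact mass of deuterium, from hydrogen.isotopes above.
ISOTOPE_MASS = {("H", 2): 2.01410177812}


def labelled_mass(atoms):
    """Sum masses for (symbol, mass number or None, count) entries.

    A mass number of None means "natural isotope mixture".
    """
    total = 0.0
    for symbol, mass_number, count in atoms:
        if mass_number is None:
            total += count * AVERAGE_MASS[symbol]
        else:
            total += count * ISOTOPE_MASS[(symbol, mass_number)]
    return total


# D2O: two deuterium atoms plus one oxygen of natural composition.
heavy_water = labelled_mass([("H", 2, 2), ("O", None, 1)])
print(heavy_water)
```

The result agrees with `d_water.mass` above to within floating-point rounding.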
Isotopic Distribution
The mass distribution spectrum of a molecule is an essential concept in chemistry, particularly in mass spectrometry. It provides insights into the isotopic composition of a molecule and how these isotopes contribute to the molecule’s overall mass.
Each element consists of isotopes, which are atoms with the same number of protons but different numbers of neutrons. This difference in neutrons results in isotopes of the same element having slightly different masses. When you analyze a molecule, these isotopic variations lead to a distribution of possible mass values rather than a single, definitive mass.
Consider a water molecule, which is composed of two hydrogen atoms and one oxygen atom. The hydrogen atom can exist as either protium (1H) or deuterium (2H), and oxygen can have isotopes like 16O, 17O, and 18O. The mass spectrum of water would show peaks corresponding to these isotopic combinations, such as H2[16O], H2[17O], and so on. Each peak represents a different isotopic version of water, contributing to the overall mass spectrum of the molecule.
For a given formula, we can calculate the isotopic distribution using the spectrum method. Let’s see how it works for the previously defined molecular formula of water:
water.spectrum().dataframe()
Mass number | Relative mass | Fraction | Intensity % | m/z |
---|---|---|---|---|
18 | 18.010565 | 9.973406e-01 | 1.000000e+02 | 18.010565 |
19 | 19.015557 | 6.093273e-04 | 6.109521e-02 | 19.015557 |
20 | 20.014810 | 2.049629e-03 | 2.055094e-01 | 20.014810 |
21 | 21.021086 | 4.714508e-07 | 4.727079e-05 | 21.021086 |
22 | 22.027363 | 2.711125e-11 | 2.718354e-09 | 22.027363 |
We see that the most abundant isotopic combination is the one with the lowest mass, which is the one with the most abundant isotopes of each element. The other peaks are due to the presence of heavier isotopes of hydrogen and oxygen.
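The fractions for water can be reproduced by combining the isotope abundances of each atom, one atom at a time. The sketch below does this by convolving abundance tables over mass numbers. It is an illustration of the idea under stated assumptions, not molmass's algorithm: the hydrogen abundances come from the ELEMENTS output above, the oxygen abundances are assumed values that are not printed in this article, and `convolve` and `distribution` are hypothetical helper names.

```python
from collections import defaultdict

# Natural abundances by mass number.
H = {1: 0.999885, 2: 0.000115}               # from ELEMENTS["H"] above
O = {16: 0.99757, 17: 0.00038, 18: 0.00205}  # assumption: not shown in this article


def convolve(a, b):
    """Combine two distributions: mass numbers add, probabilities multiply."""
    out = defaultdict(float)
    for number_a, prob_a in a.items():
        for number_b, prob_b in b.items():
            out[number_a + number_b] += prob_a * prob_b
    return dict(out)


def distribution(atoms):
    """Isotopic distribution for a list of (abundance table, atom count)."""
    dist = {0: 1.0}
    for abundances, count in atoms:
        for _ in range(count):
            dist = convolve(dist, abundances)
    return dist


water_dist = distribution([(H, 2), (O, 1)])
strongest = max(water_dist.values())
for mass_number in sorted(water_dist):
    fraction = water_dist[mass_number]
    print(mass_number, f"{fraction:.6e}", f"{100 * fraction / strongest:.6e}")
```

The intensity column is each fraction scaled so that the strongest peak is 100 %. This sketch groups peaks by nominal mass number only and does not track the exact relative mass of each peak.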
Sometimes we don’t want to see all the peaks, only the most abundant ones. We can do this using the min_intensity parameter, which in the output below acts as a lower limit on the Intensity % column:
water.spectrum(min_intensity=0.01).dataframe()
Mass number | Relative mass | Fraction | Intensity % | m/z |
---|---|---|---|---|
18 | 18.010565 | 0.997341 | 100.000000 | 18.010565 |
19 | 19.015557 | 0.000609 | 0.061095 | 19.015557 |
20 | 20.014810 | 0.002050 | 0.205509 | 20.014810 |
A more interesting example is chlorobenzene, because chlorine has two abundant natural isotopes, 35Cl and 37Cl, which produce a pronounced peak two mass units above the main one:
chlorobenzene = Formula("C6H5Cl")
chlorobenzene
Formula('C6H5Cl')
chlorobenzene.mass
112.557045
chlorobenzene.composition().dataframe()
Element | Count | Relative mass | Fraction |
---|---|---|---|
C | 6 | 72.064440 | 0.640248 |
H | 5 | 5.039705 | 0.044775 |
Cl | 1 | 35.452900 | 0.314977 |
df_chlorobenzene = chlorobenzene.spectrum(min_intensity=0.01).dataframe()
df_chlorobenzene
Mass number | Relative mass | Fraction | Intensity % | m/z |
---|---|---|---|---|
112 | 112.007978 | 0.709836 | 100.000000 | 112.007978 |
113 | 113.011358 | 0.046473 | 6.546944 | 113.011358 |
114 | 114.005082 | 0.228390 | 32.174991 | 114.005082 |
115 | 115.008420 | 0.014888 | 2.097378 | 115.008420 |
116 | 116.011802 | 0.000407 | 0.057363 | 116.011802 |
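import matplotlib.pyplot as plt
fig, ax = plt.subplots()
# Draw each isotopic peak as a vertical line from zero up to its relative intensity
ax.vlines(x="m/z", ymin=0, ymax="Intensity %", data=df_chlorobenzene)
# Leave a margin of one mass unit on each side of the peaks
ax.set_xlim((min(df_chlorobenzene["m/z"]) - 1, max(df_chlorobenzene["m/z"]) + 1))
ax.set_xlabel("m/z")
ax.set_ylabel("Intensity %")
plt.show()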
The m/z ratio (mass-to-charge ratio) is the fundamental quantity in mass spectrometry, used to analyze and identify substances based on their mass and charge.
- “m” refers to mass: In the m/z ratio, “m” stands for the mass of the ion, expressed in unified atomic mass units (u, also written amu). Note that this is the mass of the ion, which is not necessarily the molecular mass of the original compound, because molecules are often fragmented into ions during mass spectrometry.
- “z” refers to charge: “z” is the charge number of the ion, that is, the number of elementary charges it carries. It equals the difference between the ion’s numbers of protons and electrons, which makes the ion positive or negative. In most mass spectrometry applications the ions are singly charged, so z is usually ±1. However, multiply charged ions can also occur.
- Separation of ions: In mass spectrometry, ions are separated based on their m/z ratio. Since the technique involves ionizing the sample, different fragments or ions have different m/z ratios, allowing for their separation and detection.
- Identification of compounds: By analyzing the m/z ratios of the ions produced, chemists can deduce the molecular structure of the original compound. Each m/z value can correspond to a specific fragment of the molecule, providing clues to its structure.
- Quantitative analysis: The intensity of the signal at each m/z ratio can be used to determine the concentration of specific ions in the sample, making mass spectrometry a valuable tool for quantitative analysis.
We can define a new chlorobenzene formula, now with a charge of 2+:
chlorobenzene_ion = Formula("[C6H5Cl]2+")
chlorobenzene_ion
Formula('[C6H5Cl]2+')
chlorobenzene_ion.spectrum(min_intensity=0.01).dataframe()
Mass number | Relative mass | Fraction | Intensity % | m/z |
---|---|---|---|---|
112 | 112.006881 | 0.709836 | 100.000000 | 56.003440 |
113 | 113.010261 | 0.046473 | 6.546944 | 56.505131 |
114 | 114.003985 | 0.228390 | 32.174991 | 57.001992 |
115 | 115.007323 | 0.014888 | 2.097378 | 57.503662 |
116 | 116.010705 | 0.000407 | 0.057363 | 58.005353 |
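Note that the m/z column now reflects the charge of the ion: each value is the relative mass divided by 2. The relative masses are also slightly lower than those of the neutral molecule, because two electrons have been removed.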
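The relation between the neutral mass, the charge, and m/z can be sketched in a few lines. This is an illustration consistent with the numbers in the two chlorobenzene tables, not a description of molmass internals. The function name `mass_to_charge` is hypothetical, and the electron mass is the value molmass reports for its ELECTRON particle.

```python
# Relative mass of the electron in atomic mass units (u), as reported by molmass.
ELECTRON_MASS = 0.000548579909065


def mass_to_charge(neutral_mass, charge):
    """Return m/z for an ion formed by removing (+) or adding (-) electrons.

    For a neutral species the mass itself is returned, matching the m/z
    column shown for the neutral molecules above.
    """
    if charge == 0:
        return neutral_mass
    ion_mass = neutral_mass - charge * ELECTRON_MASS
    return ion_mass / abs(charge)


# Main peak of chlorobenzene, neutral and as a 2+ ion.
neutral_peak = 112.007978
print(round(mass_to_charge(neutral_peak, 0), 6))
print(round(mass_to_charge(neutral_peak, 2), 6))
```

For a negative charge the electron mass is added instead of subtracted, since the ion carries extra electrons.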
Abbreviations
We have already seen that molmass recognizes D as the symbol for deuterium. It also recognizes common abbreviations for chemical groups, such as Me for methyl and Et for ethyl. Let’s see how molmass deals with these abbreviations.
We can define ethanol by its usual formula:
ethanol = Formula("C2H5OH")
ethanol
Formula('C2H5OH')
And also by using the abbreviation for ethyl:
ethanol = Formula("EtOH")
ethanol
Formula('(C2H5)OH')
Empirical Formula
The empirical formula of a compound is a simple expression of the relative number of each type of atom in it. It’s the simplest integer ratio of the elements present in the compound, reflecting the composition in terms of the smallest whole numbers. The empirical formula doesn’t necessarily represent the exact numbers of atoms found in a molecule of the compound (that’s the molecular formula), but rather the simplest whole-number ratio between the elements.
- Representation of Composition: The empirical formula shows the simplest ratio of the different atoms in a compound. For example, in glucose (C6H12O6), the empirical formula is CH2O, representing a 1:2:1 ratio of carbon to hydrogen to oxygen.
- Derivation from Mass Percentages: It can be derived from the mass percentages of each element in a compound. This is a common way to determine empirical formulas in a laboratory setting.
- Use in Stoichiometry: Empirical formulas are useful in stoichiometry for calculating reactant and product quantities in chemical reactions.
Examples of Empirical Formulas
- Water (H2O): The empirical formula is H2O, indicating a 2:1 ratio of hydrogen to oxygen atoms.
- Ethylene (C2H4): The empirical formula is CH2, which represents the simplest whole-number ratio of carbon to hydrogen atoms in the molecule.
- Benzene (C6H6): Despite having six carbon and six hydrogen atoms, the empirical formula is CH, as it reflects the 1:1 ratio of carbon to hydrogen atoms.
- Acetic Acid (C2H4O2): Although the molecular formula is C2H4O2, the empirical formula is CH2O, showing the simplest ratio of 1 carbon atom to 2 hydrogen atoms to 1 oxygen atom.
In chemistry, the empirical formula is a fundamental concept used to understand the basic composition of a compound, especially useful when the molecular formula is complex or when only compositional data is available.
molmass can derive the empirical formula of a compound through the empirical property:
acetic_acid = Formula("CH3COOH")
acetic_acid
Formula('CH3COOH')
acetic_acid.empirical
'CH2O'
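Reducing a molecular formula to its empirical formula amounts to dividing all atom counts by their greatest common divisor. The sketch below shows that step on plain dictionaries of atom counts. It is an illustration of the concept, and `empirical_counts` is a hypothetical helper rather than part of molmass.

```python
from functools import reduce
from math import gcd


def empirical_counts(counts):
    """Divide all atom counts by their greatest common divisor."""
    divisor = reduce(gcd, counts.values())
    return {element: n // divisor for element, n in counts.items()}


acetic_acid_counts = {"C": 2, "H": 4, "O": 2}
glucose_counts = {"C": 6, "H": 12, "O": 6}
benzene_counts = {"C": 6, "H": 6}

print(empirical_counts(acetic_acid_counts))
print(empirical_counts(glucose_counts))
print(empirical_counts(benzene_counts))
```

Acetic acid and glucose reduce to the same 1:2:1 ratio, which is why an empirical formula alone cannot distinguish between them.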
Ions
Besides neutral molecules, molmass can also handle ions. Let’s see how it works.
sulphate = Formula("[SO4]2-")
sulphate
Formula('[SO4]2-')
sulphate.charge
-2
Hydrates
Hydrates can be parsed with the Formula class.
copper_sulphate_hydrated = Formula("CuSO4.5H2O")
copper_sulphate_hydrated
Formula('CuSO4(H2O)5')
copper_sulphate_hydrated.composition().dataframe()
Element | Count | Relative mass | Fraction |
---|---|---|---|
Cu | 1 | 63.546000 | 0.254505 |
H | 10 | 10.079410 | 0.040369 |
O | 9 | 143.994645 | 0.576706 |
S | 1 | 32.064800 | 0.128421 |
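The atom counts for the hydrate are those of the anhydrous salt plus five times those of water. As a minimal sketch of that bookkeeping, without molmass, the function below merges the counts and estimates the mass fraction of water of crystallization. `hydrate_counts` is a hypothetical helper, and the masses are copied from the outputs above.

```python
from collections import Counter

WATER_COUNTS = {"H": 2, "O": 1}
WATER_MASS = 18.015287  # water.mass from above
# Relative masses per element for CuSO4.5H2O, from the composition table.
HYDRATE_MASSES = {"Cu": 63.546, "H": 10.07941, "O": 143.994645, "S": 32.0648}


def hydrate_counts(salt_counts, n_water):
    """Atom counts of a salt with n_water molecules of water of crystallization."""
    total = Counter(salt_counts)
    for element, n in WATER_COUNTS.items():
        total[element] += n * n_water
    return dict(total)


counts = hydrate_counts({"Cu": 1, "S": 1, "O": 4}, 5)
water_fraction = 5 * WATER_MASS / sum(HYDRATE_MASSES.values())

print(counts)
print(round(water_fraction, 4))
```

The water fraction is the relative mass that would be lost on complete dehydration of the salt.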
Nucleotides
The common abbreviations for nucleotides are recognized by molmass:
nucleotides = Formula("ATCG")
nucleotides
Formula('((C10H12N5O5P)(C9H12N3O6P)(C10H12N5O6P)(C10H13N2O7P)H2O)')
nucleotides.mass
1253.804568992
nucleotides.composition().dataframe()
Element | Count | Relative mass | Fraction |
---|---|---|---|
C | 39 | 468.418860 | 0.373598 |
H | 51 | 51.404991 | 0.040999 |
N | 15 | 210.100545 | 0.167570 |
O | 25 | 399.985125 | 0.319017 |
P | 4 | 123.895048 | 0.098815 |
Other features
molmass also exposes the electron, proton, and neutron as Particle objects, with mass and charge attributes:
from molmass import ELECTRON, PROTON, NEUTRON
ELECTRON
Particle(name='Electron', mass=0.000548579909065, charge=-1.602176634e-19)
ELECTRON.mass # relative mass of electron in atomic mass units (u)
0.000548579909065
PROTON
Particle(name='Proton', mass=1.007276466621, charge=1.602176634e-19)
NEUTRON
Particle(name='Neutron', mass=1.00866491595, charge=0.0)
Flask web application
You can launch the web application using the following command:
python -m molmass.web
A web browser will open with the application running. You can also access it by typing http://127.0.0.1:5001 in your browser. Enter a formula in the input field and click the “Submit” button to see the results.
Conclusion
molmass is a versatile tool that bridges chemistry and data analysis. Its ease of use, combined with powerful features, makes it useful for chemists, educators, and data analysts.
Further Resources
- Molmass GitHub Repository: Molmass on GitHub