Module polarity_index
=============================================================================== SFPPy: Approximate Correlation Between Polarity Index (P') and logP =============================================================================== Objective: Build a qualitative correlation between an approximate "polarity index" (P') and logP (octanol-water partition coefficient) values. This script demonstrates how to:
1) Fit a *quadratic* model using a small, "tuned" dataset that covers a
range of polarities from n-Hexane to Water.
2) Validate it (roughly) with an extended dataset of ~35 solvents.
3) Provide a function to invert the model and estimate P' from logP values,
for classification or for ranking in subsequent Henry-like or partition
coefficient models in SFPPy.
Disclaimer: - This correlation is qualitative and based on an intentionally small dataset, hand-tuned for a few representative solvents (non-polar to very polar). - Do not expect high accuracy. The goal is to produce a rough scale (P') that loosely tracks how "polar" or "non-polar" a compound might be. - The script demonstrates the approach rather than guaranteeing a universal model.
References & Data: - Data for Polarity Index (P') adapted from various solvent polarity tables: LSU Macromolecular Resources, HPLC Solvent Guides, and additional web resources. - logP values drawn from typical reference tables (PubChem, various compiled datasheets).
Motivation: We wish to provide a "fast" way to guess an ordering of polarity from logP, which is widely available for many chemicals (e.g., in PubChem). P' is then used within SFPPy to set or guess Henry-like coefficients or Flory–Huggins parameters in patankar.layer and patankar.food modules.
Created on Fri Feb 28 12:01:56 2025 @author: olivi (community)
Expand source code
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""
===============================================================================
SFPPy: Approximate Correlation Between Polarity Index (P') and logP
===============================================================================
**Objective**:
Build a *qualitative* correlation between an approximate "polarity index"
(P') and logP (octanol-water partition coefficient) values. This script
demonstrates how to:
1) Fit a *quadratic* model using a small, "tuned" dataset that covers a
range of polarities from n-Hexane to Water.
2) Validate it (roughly) with an extended dataset of ~35 solvents.
3) Provide a function to invert the model and estimate P' from logP values,
for classification or for ranking in subsequent Henry-like or partition
coefficient models in SFPPy.
**Disclaimer**:
- This correlation is *qualitative* and based on an intentionally *small*
dataset, hand-tuned for a few representative solvents (non-polar to very
polar).
- Do **not** expect high accuracy. The goal is to produce a rough scale (P')
that loosely tracks how "polar" or "non-polar" a compound might be.
- The script demonstrates the approach rather than guaranteeing a universal
model.
**References & Data**:
- Data for Polarity Index (P') adapted from various solvent polarity tables:
LSU Macromolecular Resources, HPLC Solvent Guides, and additional web
resources.
- logP values drawn from typical reference tables (PubChem, various
compiled datasheets).
**Motivation**:
We wish to provide a "fast" way to guess an ordering of polarity from logP,
which is widely available for many chemicals (e.g., in PubChem). P' is then
used within SFPPy to set or guess Henry-like coefficients or Flory–Huggins
parameters in patankar.layer and patankar.food modules.
----------------------------------------------------------------------------
Created on Fri Feb 28 12:01:56 2025
@author: olivi (community)
----------------------------------------------------------------------------
"""
# %% Import libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
# %% Tuned Reference Dataset
"""
This small set of 8 solvents is carefully chosen to span from n-Hexane (non-polar)
to Water (very polar). Polarity Index (P') values are lightly 'tweaked' to get a
smooth progression. The logP values come from known literature data.
"""
solvents = [
"Water", "Methanol", "Ethanol", "Acetone", "Acetonitrile",
"Dichloromethane", "Toluene", "n-Hexane"
]
# Polarity Index (P') with minor manual adjustments (denoted +)
polarity_index = [10.2, # Water
5.1+3, # Methanol (5.1 + 3 = 8.1)
4.3+0.7, # Ethanol (4.3 + 0.7 = 5.0)
5.1+0.5, # Acetone (5.1 + 0.5 = 5.6)
5.8+1, # Acetonitrile (5.8 + 1 = 6.8)
3.1, # Dichloromethane
2.4, # Toluene
0.0 # n-Hexane
]
# logP reference data
logP_values = [-1.38, # Water
-0.77, # Methanol
-0.24, # Ethanol
-0.21, # Acetone
-0.22, # Acetonitrile
1.25, # Dichloromethane
2.73, # Toluene
3.90 # n-Hexane
]
# %% Extended Dataset for Validation
"""
This larger set (~35 solvents) is used to see if the small dataset's fit
extends (qualitatively) to other solvents. We exclude the ones already in
the small set to avoid double-counting.
"""
ext_solvents = [
"Pentane", "1,1,2-Trichlorotrifluoroethane", "Cyclopentane", "Heptane", "Hexane",
"Iso-Octane", "Petroleum Ether", "Cyclohexane", "n-Butyl Chloride", "Toluene",
"Methyl t-Butyl Ether", "o-Xylene", "Chlorobenzene", "o-Dichlorobenzene", "Ethyl Ether",
"Dichloromethane", "Ethylene Dichloride", "n-Butyl Alcohol", "Isopropyl Alcohol",
"n-Butyl Acetate", "Isobutyl Alcohol", "Methyl Isoamyl Ketone", "n-Propyl Alcohol",
"Tetrahydrofuran", "Chloroform", "Methyl Isobutyl Ketone", "Ethyl Acetate",
"Methyl n-Propyl Ketone", "Methyl Ethyl Ketone", "1,4-Dioxane", "Acetone", "Methanol",
"Pyridine", "2-Methoxyethanol", "Acetonitrile", "Propylene Carbonate", "N,N-Dimethylformamide",
"Dimethyl Acetamide", "N-Methylpyrrolidone", "Dimethyl Sulfoxide", "Water"
]
ext_polarity_index = [
0.0, 0.0, 0.1, 0.1, 0.1, 0.1, 0.1, 0.2, 1.0, 2.4, 2.5, 2.5, 2.7, 2.7, 2.8,
3.1, 3.5, 3.9, 3.9, 4.0, 4.0, 4.0, 4.0, 4.0, 4.1, 4.2, 4.4, 4.5, 4.7, 4.8,
5.1, 5.1, 5.3, 5.5, 5.8, 6.1, 6.4, 6.5, 6.7, 7.2, 10.2
]
ext_logP_values = [
3.39, 4.30, 3.20, 4.66, 3.90, 4.50, 3.50, 3.44, 2.70, 2.73, 1.20, 3.12, 2.84, 3.38, 0.83,
1.25, 1.48, 0.88, 0.05, 1.82, 0.79, 1.98, 0.25, 0.46, 1.97, 1.31, 0.73, 1.50, 0.29, -0.27,
-0.24, -0.77, 0.65, -0.77, -0.22, -0.41, -1.01, -0.77, -0.38, -1.35, -1.38
]
# Filter out solvents that appear in the tuned dataset
validation_solvents = []
validation_polarity_index = []
validation_logP_values = []
for i, solvent in enumerate(ext_solvents):
if solvent not in solvents:
validation_solvents.append(solvent)
validation_polarity_index.append(ext_polarity_index[i])
validation_logP_values.append(ext_logP_values[i])
# Create DataFrames for easy inspection
df = pd.DataFrame({
"Solvent": solvents,
"Polarity Index (P')": polarity_index,
"logP": logP_values
}).sort_values(by="Polarity Index (P')", ascending=True)
df_validation = pd.DataFrame({
"Solvent": validation_solvents,
"Polarity Index (P')": validation_polarity_index,
"logP": validation_logP_values
}).sort_values(by="Polarity Index (P')", ascending=True)
# %% Quick Visualization of the Tuned Data
plt.figure(figsize=(8, 5))
plt.scatter(polarity_index, logP_values, color='blue', label="Reference Data (Tuned)")
for i, solvent in enumerate(solvents):
plt.annotate(solvent, (polarity_index[i], logP_values[i]),
fontsize=9, xytext=(5,5), textcoords='offset points')
plt.xlabel("Polarity Index (P')")
plt.ylabel("logP")
plt.title("Polarity Index (P') vs. logP (Tuned Dataset)")
plt.axhline(0, color='gray', linestyle='--', linewidth=0.8) # Zero line for logP
plt.grid(True, linestyle='--', alpha=0.6)
plt.legend()
plt.show()
# %% Fit a Quadratic Model
"""
We assume a model:
logP = a * (P')^2 + b * P' + c
We'll use np.polyfit() with degree=2. Then we compare it with
the extended validation set.
"""
coefficients = np.polyfit(polarity_index, logP_values, 2)
quadratic_model = np.poly1d(coefficients)
# For plotting a smooth curve:
x_range = np.linspace(min(polarity_index), max(polarity_index), 100)
y_fitted = quadratic_model(x_range)
quad_eq_str = (f"logP ≈ {coefficients[0]:.4f} * (P')² "
f"+ {coefficients[1]:.4f} * (P') "
f"+ {coefficients[2]:.4f}")
print("Fitted coefficients (a, b, c):", coefficients)
print("Quadratic equation =>", quad_eq_str)
# %% Compare with Extended Validation Dataset
plt.figure(figsize=(8, 5))
# Plot the tuned set
plt.scatter(polarity_index, logP_values, color='Crimson', label="Tuned (8 solvents)")
for i, solvent in enumerate(solvents):
plt.annotate(solvent, (polarity_index[i], logP_values[i]),
fontsize=8, xytext=(5,5), textcoords='offset points')
# Plot the validation set
plt.scatter(validation_polarity_index, validation_logP_values,
color='DeepSkyBlue', label="Validation (~35 solvents)")
for i, solvent in enumerate(validation_solvents):
plt.annotate(solvent, (validation_polarity_index[i], validation_logP_values[i]),
fontsize=6, xytext=(5,5), textcoords='offset points')
# Plot the fitted curve
plt.plot(x_range, y_fitted, color='Crimson', linestyle='--', label="Quadratic Fit")
plt.xlabel("Polarity Index (P')")
plt.ylabel("logP")
plt.title("Polarity Index (P') vs. logP\nQuadratic Model Fit & Validation")
plt.axhline(0, color='gray', linestyle='--', linewidth=0.8)
plt.grid(True, linestyle='--', alpha=0.6)
plt.legend()
plt.show()
# %% Final implementation
import numpy as np
def polarity_index_from_logP(logP,
A=0.04426556625879231,
B=-0.9796466111259537,
C=4.086222276379432):
"""
Computes the polarity index (P') from a given logP value
as the positive root of the quadratic fitted equation:
logP = A * (P')² + B * P' + C
P' = (-B - sqrt(B² - 4A(C - logP))) / (2A)
Parameters:
----------
logP : float, list, or np.ndarray
The logP value(s) for which to compute the polarity index P'.
Returns:
-------
float or np.ndarray
The calculated polarity index P' corresponding to the input logP.
If logP is out of the valid range, returns:
- 10.2 for very polar solvents (beyond water)
- 0 for extremely hydrophobic solvents (beyond n-Hexane)
Example Usage:
-------------
>>> polarity_index_from_logP(-0.5)
8.34 # Example output
>>> polarity_index_from_logP([-1.0, 0.5, 2.0])
array([9.2, 4.5, 1.8]) # Example outputs
"""
# Define valid logP range based on quadratic model limits
logPmin = C - B**2 / (4*A) # ≈ -1.334 (theoretical minimum logP)
logPmax = C # 4.086 (theoretical maximum logP)
Pmax = 10.2 # value for water
def compute_P(logP_value):
"""Computes P' for a single logP value after input validation."""
if logP_value < logPmin:
return Pmax # Most polar (beyond water)
if logP_value > logPmax:
return 0.0 # Most hydrophobic (beyond n-Hexane)
discriminant = B**2 - 4*A*(C - logP_value)
sqrt_discriminant = np.sqrt(discriminant)
P2root = (-B - sqrt_discriminant) / (2*A) # Always select P2
return P2root if P2root<=Pmax else Pmax
# Handle both single and multiple values efficiently
if isinstance(logP, (list, tuple, np.ndarray)):
return np.vectorize(compute_P)(logP) # Vectorized for multiple inputs
else:
return compute_P(logP)
# %% Demonstration
print("\n=== Demonstration of polarity_index_from_logP ===")
demo_values = [-1.5, -0.5, 0.5, 5.0]
results = polarity_index_from_logP(demo_values)
for lv, r in zip(demo_values, results):
print(f"logP={lv:>5.2f} => P'={r:>5.2f}")
# %% Final Plot - Extended
plt.figure(figsize=(8, 5))
plt.scatter(ext_polarity_index, ext_logP_values, color='Teal', label="Extended Solvents")
for i, solvent in enumerate(ext_solvents):
plt.annotate(solvent, (ext_polarity_index[i], ext_logP_values[i]),
fontsize=7, xytext=(5,5), textcoords='offset points')
logP_range = np.linspace(-4, 6, 1000)
p_estimated = polarity_index_from_logP(logP_range)
plt.plot(p_estimated, logP_range, color='Crimson', linestyle='-.',
label="Inverse Quadratic Model (P'(logP))")
plt.xlabel("Polarity Index (P')")
plt.ylabel("logP")
plt.title("Approximate Mapping: P' <-> logP (Extended View)")
plt.axhline(0, color='gray', linestyle='--', linewidth=0.8)
plt.grid(True, linestyle='--', alpha=0.6)
plt.legend()
plt.show()
"""
===============================================================================
Summary & Notes:
----------------
1) We fitted a simple quadratic (logP ~ a(P')² + bP' + c) to a small set
of 8 solvents. The shape is roughly correct for a wide range of polarities.
2) The extended dataset (~35 solvents) is used for a rough *visual* validation.
There's no guarantee of accuracy, especially for more exotic or borderline
solvents.
3) The final function `polarity_index_from_logP()` is the main deliverable.
- It clamps P' between 0.0 (extremely non-polar) and 10.2 (extremely polar).
- Perfect for a quick rank-based approach or a first-guess at how "polar"
an unknown solvent might be relative to water, methanol, or hexane.
4) Additional disclaimers:
- The model is purely *empirical*.
- Use at your own risk for approximate classification or quick bounding.
===============================================================================
"""
Global variables
var logP_values
-
This larger set (~35 solvents) is used to see if the small dataset's fit extends (qualitatively) to other solvents. We exclude the ones already in the small set to avoid double-counting.
Functions
def polarity_index_from_logP(logP, A=0.04426556625879231, B=-0.9796466111259537, C=4.086222276379432)
-
Computes the polarity index (P') from a given logP value as the positive root of the quadratic fitted equation:
logP = A * (P')² + B * P' + C P' = (-B - sqrt(B² - 4A(C - logP))) / (2A)
Parameters:
logP : float, list, or np.ndarray The logP value(s) for which to compute the polarity index P'.
Returns:
float or np.ndarray The calculated polarity index P' corresponding to the input logP. If logP is out of the valid range, returns: - 10.2 for very polar solvents (beyond water) - 0 for extremely hydrophobic solvents (beyond n-Hexane)
Example Usage:
>>> polarity_index_from_logP(-0.5) 8.34 # Example output
>>> polarity_index_from_logP([-1.0, 0.5, 2.0]) array([9.2, 4.5, 1.8]) # Example outputs
Expand source code
def polarity_index_from_logP(logP, A=0.04426556625879231, B=-0.9796466111259537, C=4.086222276379432): """ Computes the polarity index (P') from a given logP value as the positive root of the quadratic fitted equation: logP = A * (P')² + B * P' + C P' = (-B - sqrt(B² - 4A(C - logP))) / (2A) Parameters: ---------- logP : float, list, or np.ndarray The logP value(s) for which to compute the polarity index P'. Returns: ------- float or np.ndarray The calculated polarity index P' corresponding to the input logP. If logP is out of the valid range, returns: - 10.2 for very polar solvents (beyond water) - 0 for extremely hydrophobic solvents (beyond n-Hexane) Example Usage: ------------- >>> polarity_index_from_logP(-0.5) 8.34 # Example output >>> polarity_index_from_logP([-1.0, 0.5, 2.0]) array([9.2, 4.5, 1.8]) # Example outputs """ # Define valid logP range based on quadratic model limits logPmin = C - B**2 / (4*A) # ≈ -1.334 (theoretical minimum logP) logPmax = C # 4.086 (theoretical maximum logP) Pmax = 10.2 # value for water def compute_P(logP_value): """Computes P' for a single logP value after input validation.""" if logP_value < logPmin: return Pmax # Most polar (beyond water) if logP_value > logPmax: return 0.0 # Most hydrophobic (beyond n-Hexane) discriminant = B**2 - 4*A*(C - logP_value) sqrt_discriminant = np.sqrt(discriminant) P2root = (-B - sqrt_discriminant) / (2*A) # Always select P2 return P2root if P2root<=Pmax else Pmax # Handle both single and multiple values efficiently if isinstance(logP, (list, tuple, np.ndarray)): return np.vectorize(compute_P)(logP) # Vectorized for multiple inputs else: return compute_P(logP)