7.2
CiteScore
3.7
Impact Factor
Full Length Article
12/2024; 36: 103541
doi: 10.1016/j.jksus.2024.103541

Optimizing structure-property models of three general graphical indices for thermodynamic properties of benzenoid hydrocarbons

Department of Mathematics, Science Faculty, King Abdulaziz University, Jeddah 21589, Saudi Arabia
Mathematical Sciences, Faculty of Science, Universiti Brunei Darussalam, Jln Tungku Link, Gadong BE1410, Brunei Darussalam

⁎Corresponding author. sakander1566@gmail.com (Sakander Hayat) sakander.hayat@ubd.edu.bn (Sakander Hayat)

Disclaimer:
This article was originally published by Elsevier and was migrated to Scientific Scholar after the change of Publisher.
Financial Disclosure: This project was funded by the KAU Endowment (WAQF) at King Abdulaziz University, Jeddah, under grant no. (WAQF: 56-247-2024). The authors acknowledge with thanks WAQF and the Deanship of Scientific Research (DSR) for technical and financial support. The authors acknowledge Prof. Nuha Wazzan from the Chemistry department at King Abdulaziz University for her contribution to the DFT calculations, and King Abdulaziz University's High-Performance Computing Centre (Aziz Supercomputer) (http://hpc.kau.edu.sa) for supporting the computation for the work described in this paper.

Abstract

Cheminformatics is an interdisciplinary field that combines principles of chemistry, computer science, and information technology to process, store, analyze, and interpret chemical data. One area of cheminformatics is quantitative structure–property relationship (QSPR) modeling, a computational approach that correlates the structural attributes of chemical compounds with their physical, chemical, or biological properties in order to predict the behavior and characteristics of new or untested compounds. Structure descriptors deliver the contemporary mathematical tools required for QSPR modeling. One significant class of such descriptors is the graph-based descriptors, known as graphical descriptors. A degree-based graphical descriptor/invariant of a $\upsilon$-vertex graph $\Omega=(V_\Omega,E_\Omega)$ has the general structure $GD_d=\sum_{ij\in E_\Omega}\pi(\deg x_i,\deg x_j)$, where $\pi$ is a bivariate symmetric map and $\deg x_i$ is the degree of the vertex $x_i\in V_\Omega$. For $\alpha\in\mathbb{R}\setminus\{0\}$, if $\pi=(\deg x_i\times\deg x_j)^{\alpha}$ (resp. $\pi=(\deg x_i+\deg x_j)^{\alpha}$), then $GD_d$ is called the general product-connectivity index $PC_\alpha$ (resp. general sum-connectivity index $SC_\alpha$) of $\Omega$. Moreover, the general Sombor index $SO_\alpha$ has the structure $\pi=\left((\deg x_i)^2+(\deg x_j)^2\right)^{\alpha}$. By choosing the heat capacity $\Delta H$ and the entropy $E$ as representatives of thermodynamic properties, we find in this paper the optimal value(s) of $\alpha$ that deliver the strongest potential of the predictors $GD_d\in\{PC_\alpha,SC_\alpha,SO_\alpha\}$ for predicting $\Delta H$ and $E$ of benzenoid hydrocarbons. In order to achieve this, we employ tools such as discrete optimization and multivariate regression analysis. This study, in turn, completely solves two open problems proposed in the literature.

Keywords

05C92
05C90
05C09
Mathematical chemistry
QSPR modeling
Discrete optimization
Multivariate regression analysis
Benzenoid hydrocarbon
Thermodynamic property
PubMed

Data availability

The data generated related to this study is available in a public repository on GitHub https://github.com/Sakander/Predictive_Potential_General_Indices.git.

1 Introduction

Cheminformatics employs quantitative structure–property relationship (QSPR) studies (Katritzky et al., 2001) in order to estimate various thermodynamical and physicochemical characteristics of molecular compounds, especially organic structures. QSPR modeling utilizes contemporary mathematical and computational tools (Basak and Mills, 2001) in order to predict these properties. The historical root of this chemical modeling dates back to the pioneering work of Wiener (1947), which introduced the notion of a path number (the sum of pairwise distances) for estimating the boiling point of alkanes. Later, researchers named this invariant the Wiener index of graphs. Structure-based molecular descriptors (Gutman and Furtula, 2010) provide the contemporary mathematical tools required for QSPR modeling. Graph-related molecular descriptors, also known as graphical invariants or topological indices (Balaban et al., 1983), form one of the most extensively studied families of descriptors. Graphical invariants take a hydrogen-disregarded chemical structure (also known as a molecular/chemical graph) as input and transform it into a non-zero real number. These molecular graphs (Gutman and Polansky, 1986) are generated by constructing a correspondence between edges (resp. vertices) and bonds (resp. atoms). In order to effectively estimate a given physicochemical property such as heat of formation (Allison and Burgess, 2015) or boiling point, graphical invariants are used in a regression equation (Diudea et al., 2001) that incorporates the underlying chemical information of a compound by characterizing its structure. Wazzan and Ahmed (2024b) employed eccentric neighborhood forgotten indices for the prediction of boiling point. Moreover, domination-based (Wazzan and Ahmed, 2024a) (resp. symmetry-adapted domination-based (Wazzan and Ahmed, 2023)) topological indices were employed in QSPR studies of isomeric octanes.

A graphical invariant could be degree-related (Gutman, 2013) (based on vertices’ degrees), distance-based (Xu et al., 2014) (defined on distances), spectral (Consonni and Todeschini, 2008) (based on eigenvalues of graphical matrices) or counting-related (Hosoya, 1988) (polynomials and invariants obtained by counting certain substructures). New graphical invariants are being introduced (Todeschini and Consonni, 2009) every passing day, sometimes without delivering significant chemical applicability (Gutman and Furtula, 2010). To counter the proliferation of these invariants, a firm criterion must be adopted when putting forward new descriptors. It is unfortunate that these insignificant molecular descriptors are frequently graphical. Gutman and Tošović (2013) mildly remarked that not following a firm criterion results in the proliferation of these invariants, and that there are currently far more graphical invariants than there should be. These facts deliver a strong motivation for testing the quality of newly emerging families of graphical descriptors in structure–property modeling, so as to put forward efficient descriptors while singling out inefficient ones.

One of the contemporary research topics in mathematical chemistry is to consider a family of graphical invariants and conduct comparative testing for predicting physicochemical/thermodynamical properties. The study was initiated by Gutman and Tošović (2013), who considered commonly occurring degree-related graphical invariants for estimating physicochemical characteristics of octane isomers and showed that the augmented Zagreb index (AZI) is the only degree-based invariant which qualifies to be considered for QSPR modeling. The study was extended to the thermodynamic properties (taking the heat capacity $\Delta H$ and the entropy $E$ as their representatives) of benzenoid hydrocarbons (BHs) by Hayat et al. (2023). Note that they selected the 30 lower BHs as the test molecules of the study. Hayat et al. (2024) further extended a similar study to temperature-based graphical descriptors. For structure–property modeling of lead sulphide, we refer to Lal et al. (2024b). Computational results on graph entropies and degree-based graphical indices are reported in Lal et al. (2024a). Other topics, such as vertex-edge resolvability and the face index of chemical structures, are investigated in Negi and Bhat (2024) and Sharma et al. (2024). Applications of degree-based indices in fuzzy graphs are studied in Islam et al. (2024), Islam and Pal (2021, 2024a) and Islam and Pal (2024b).

In the existing studies by Gutman and Tošović (2013) and later by Hayat et al. (2023), the general sum-connectivity index $SC_\alpha$ and the general product-connectivity index $PC_\alpha$ were considered only for the test values $\alpha\in\{\pm1,\pm\tfrac{1}{2},\pm2,\pm3\}$. Since Gutman and Tošović (2013) conducted their comparative testing for physicochemical properties, their results are not directly relevant to the current study. However, Hayat et al. (2023) conducted their testing for thermodynamic properties of BHs and concluded that $SC_\alpha$ with $\alpha=3$ and $PC_\alpha$ with $\alpha=1,\tfrac{1}{2}$ are the best three degree-based invariants for predicting thermodynamic properties of BHs. Thus, they concluded their study by naturally asking the following two questions:

Problem 1.1

Find the optimal value(s) of $\alpha\in\mathbb{R}\setminus\{0\}$ for which the correlation value between $\Delta H$, $E$ and $SC_\alpha$ for the lower 30 BHs is the strongest.

Problem 1.2

Find the optimal value(s) of $\alpha\in\mathbb{R}\setminus\{0\}$ for which the correlation value between $\Delta H$, $E$ and $PC_\alpha$ for the lower 30 BHs is the strongest.

This paper intends to employ discrete optimization and multivariate regression analysis to answer both of the above problems. In addition, we also study the above two problems for the general Sombor index.

2 Preliminaries

A graph $\Omega$ is a pair $(V_\Omega,E_\Omega)$ in which $V_\Omega$ is the vertex set and $E_\Omega\subseteq\binom{V_\Omega}{2}$ is the edge set. The valency/degree $\deg x$ of a vertex $x\in V_\Omega$ is defined as $\deg x=|\{z\in V_\Omega : xz\in E_\Omega\}|$. A degree-based graphical descriptor/invariant of a $\upsilon$-vertex graph $\Omega=(V_\Omega,E_\Omega)$ has the general structure:

$$GD_d=\sum_{ij\in E_\Omega}\pi(\deg x_i,\deg x_j),\tag{2.1}$$
where $\pi$ is a bivariate symmetric map (i.e. $\pi(x,y)=\pi(y,x)$), and $\deg x_i$ is the degree of the vertex $x_i\in V_\Omega$.

With $\pi(\deg x_i,\deg x_j)=\dfrac{1}{\sqrt{\deg x_i\times\deg x_j}}$, the product-connectivity index was proposed by Randić (1975). It is known as one of the earliest degree-based indices and is defined as:

$$PC(\Omega)=\sum_{ij\in E_\Omega}\frac{1}{\sqrt{\deg x_i\times\deg x_j}}.\tag{2.2}$$
Independent of its connection to the product-connectivity index, Bollobás and Erdös (1998) delivered the generalized version of the $PC$ index:
$$PC_\alpha(\Omega)=\sum_{ij\in E_\Omega}\left(\deg x_i\times\deg x_j\right)^{\alpha},\tag{2.3}$$
where $\alpha\in\mathbb{R}\setminus\{0\}$. One can observe that $PC_{-\frac{1}{2}}(\Omega)=PC(\Omega)$ for an arbitrary graph $\Omega$.

The additive version of the $PC$ index, called the sum-connectivity index $SC$, was proposed by Zhou and Trinajstić (2009). Mathematically, it has $\pi(\deg x_i,\deg x_j)=\dfrac{1}{\sqrt{\deg x_i+\deg x_j}}$:

$$SC(\Omega)=\sum_{ij\in E_\Omega}\frac{1}{\sqrt{\deg x_i+\deg x_j}}.\tag{2.4}$$
The diverse applicability of the $SC$ index across different disciplines motivated Zhou and Trinajstić (2010) to introduce the generalized version of the sum-connectivity index, symbolized as $SC_\alpha$, where $\alpha\in\mathbb{R}\setminus\{0\}$. Thus, we have $GD_d=SC_\alpha$ if $\pi(\deg x_i,\deg x_j)=\left(\deg x_i+\deg x_j\right)^{\alpha}$:
$$SC_\alpha(\Omega)=\sum_{ij\in E_\Omega}\left(\deg x_i+\deg x_j\right)^{\alpha}.\tag{2.5}$$
Notice that $SC_{-\frac{1}{2}}(\Omega)=SC(\Omega)$ for an arbitrary graph $\Omega$.

By considering $\pi(\deg x_i,\deg x_j)=\sqrt{(\deg x_i)^2+(\deg x_j)^2}$, Gutman (2021) recently put forward another degree-based graphical descriptor known as the Sombor index $SO$:

$$SO(\Omega)=\sum_{ij\in E_\Omega}\sqrt{(\deg x_i)^2+(\deg x_j)^2}.\tag{2.6}$$
Numerous papers have been published on the mathematical properties as well as the chemical applicability of the Sombor index. This motivated Phanjoubam et al. (2023) to consider the generalized version of the Sombor index by taking $\pi(\deg x_i,\deg x_j)=\left((\deg x_i)^2+(\deg x_j)^2\right)^{\alpha}$ in the standard formula of $GD_d$:
$$SO_\alpha(\Omega)=\sum_{ij\in E_\Omega}\left((\deg x_i)^2+(\deg x_j)^2\right)^{\alpha},\tag{2.7}$$
where $\alpha\in\mathbb{R}\setminus\{0\}$. One can notice that $SO_{\frac{1}{2}}(\Omega)=SO(\Omega)$, giving the name “general” to this version of the Sombor index.
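The three general indices share the edge-sum template of Eq. (2.1) and differ only in the map $\pi$. The following Python sketch illustrates this on an edge list; the function names (`degrees`, `general_index`, `pc`, `sc`, `so`) are illustrative choices and do not come from the paper. Benzene's hydrogen-suppressed graph, the 6-cycle, serves as a small check.

```python
from collections import Counter

def degrees(edges):
    """Map each vertex to its degree, given an undirected edge list."""
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return deg

def general_index(edges, alpha, pi):
    """Sum pi(deg u, deg v) ** alpha over all edges uv (template of Eq. (2.1))."""
    deg = degrees(edges)
    return sum(pi(deg[u], deg[v]) ** alpha for u, v in edges)

def pc(edges, alpha):
    """General product-connectivity index PC_alpha, Eq. (2.3)."""
    return general_index(edges, alpha, lambda a, b: a * b)

def sc(edges, alpha):
    """General sum-connectivity index SC_alpha, Eq. (2.5)."""
    return general_index(edges, alpha, lambda a, b: a + b)

def so(edges, alpha):
    """General Sombor index SO_alpha, Eq. (2.7)."""
    return general_index(edges, alpha, lambda a, b: a * a + b * b)

# Benzene: a 6-cycle, every vertex has degree 2, so each edge contributes
# 4**alpha to PC and SC, and 8**alpha to SO.
benzene = [(i, (i + 1) % 6) for i in range(6)]
print(pc(benzene, -0.5))  # 6 * 4**-0.5 = 3.0, the classical Randic value
print(so(benzene, 0.5))   # 6 * sqrt(8)
```

Passing $\pi$ as a function keeps the three indices as one-line specializations, which mirrors how the general structure $GD_d$ is stated in the text.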

Discrete optimization is a branch of optimization in applied mathematics and operations research that deals with finding the best solution from a finite or countable set of possible solutions. Unlike continuous optimization, where variables can take any value within a range, discrete optimization restricts variables to discrete values, often integers or elements from a specific set.

A general discrete optimization problem can be formulated as follows: $\max/\min\ f(x)$ subject to $x\in S\subseteq\mathbb{Z}^n$ or $x\in S\subseteq\{0,1\}^n$, where:

  • $f(x): S\to\mathbb{R}$ is the objective function, which we aim to maximize or minimize.

  • $x=(x_1,x_2,\ldots,x_n)$ is a vector of decision variables.

  • $S\subseteq\mathbb{Z}^n$ (or, sometimes, $S\subseteq\{0,1\}^n$) represents the feasible set defined by constraints, which limits $x$ to discrete values, such as integers or binary values.

Multivariate regression analysis is a statistical technique used to model the relationship between multiple independent (predictor) variables and multiple dependent (response) variables. Unlike simple or multiple regression, which typically models a single dependent variable, multivariate regression allows for multiple outcomes to be analyzed simultaneously, capturing any correlations among them.

Let:

  • $Y\in\mathbb{R}^{n\times m}$: the matrix of dependent variables, where $n$ is the number of observations (samples) and $m$ is the number of dependent variables.

  • $X\in\mathbb{R}^{n\times p}$: the matrix of independent variables, where $p$ is the number of independent variables.

  • $B\in\mathbb{R}^{p\times m}$: the matrix of regression coefficients, with each column $B_j$ representing the coefficients for the $j$th dependent variable.

  • $E\in\mathbb{R}^{n\times m}$: the matrix of error terms or residuals.

The model for multivariate regression can be written as $Y=XB+E$, where:

  • $Y_{i,j}$ represents the $i$th observation of the $j$th dependent variable.

  • $X_{i,k}$ represents the $i$th observation of the $k$th independent variable.

  • $B_{k,j}$ represents the effect of the $k$th independent variable on the $j$th dependent variable.

  • $E_{i,j}$ captures the residuals for each observation and each dependent variable.
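The model $Y=XB+E$ can be fitted by ordinary least squares, one column of $B$ per dependent variable. The sketch below is a minimal illustration with NumPy on synthetic, noise-free data; the function name `fit_multivariate` and the toy matrices are assumptions made for the example, and this is not the R Studio workflow used in the paper.

```python
import numpy as np

def fit_multivariate(X, Y):
    """Least-squares estimate of B in Y = X B + E, and the residual matrix E."""
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    # lstsq solves all m response columns at once and returns a (p x m) matrix.
    B, *_ = np.linalg.lstsq(X, Y, rcond=None)
    residuals = Y - X @ B
    return B, residuals

# Toy data: n = 5 observations, p = 2 predictors (intercept column plus one
# variable), m = 2 responses. The responses are built without noise.
X = np.column_stack([np.ones(5), np.arange(5.0)])
B_true = np.array([[1.0, -2.0],
                   [0.5, 3.0]])
Y = X @ B_true

B_hat, resid = fit_multivariate(X, Y)
print(B_hat)   # recovers B_true up to floating-point error
print(resid)   # residuals are numerically zero for noise-free data
```

The intercept is handled by including a column of ones in $X$, so $p$ counts the intercept as one of the predictors in this sketch.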

3 Materials and methods

Note that a BH generally belongs to the class of benzenoid systems (BSs). A BS (having BHs as a subclass) is a connected finite graph with no cut-vertices whose internal faces are bounded by regular hexagons with unit sides. Fig. 1 depicts a benzenoid system $L$.

The degree sequence in a graph $\Omega$ is $(\deg x_1,\deg x_2,\ldots,\deg x_\upsilon)$ for the vertex ordering $x_1,\ldots,x_\upsilon$, $x_i\in V_\Omega$. On the perimeter of $L$ in Fig. 1, there exist different paths with degree sequences (2,3,3,2), (2,3,2), (2,3,3,3,2), and (2,3,3,3,3,2), called bay, fissure, cove and fjord, respectively. Collectively, they are called inlets. Let $\upsilon_{ab}=|\{xz\in E_\Omega : \deg x=a,\ \deg z=b\}|$.

Assume a BS comprises υ vertices, η hexagons and τ inlets. Cruz et al. (2013) proved:

Fig. 1. Instances of a fjord, cove, fissure and a bay in a BS.

Lemma 3.1

Suppose $L$ is a BS with $\upsilon$ vertices, $\tau$ inlets and $\eta$ hexagons. Then, $\upsilon_{33}=3\eta-\tau-3$, $\upsilon_{23}=2\tau$, and $\upsilon_{22}=\upsilon-2\eta-\tau+2$.

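Employing Lemma 3.1 on an arbitrary BS comprising $\upsilon$ vertices, $\tau$ inlets and $\eta$ hexagons, one can calculate $SC_\alpha$, $PC_\alpha$ and $SO_\alpha$ as follows:

$$\begin{aligned}
SC_\alpha &= \sum_{ij\in E_\Omega}\left(\deg x_i+\deg x_j\right)^{\alpha}\\
&= \upsilon_{33}(3+3)^{\alpha}+\upsilon_{23}(2+3)^{\alpha}+\upsilon_{22}(2+2)^{\alpha}\\
&= 6^{\alpha}(3\eta-\tau-3)+5^{\alpha}(2\tau)+4^{\alpha}(\upsilon-2\eta-\tau+2).
\end{aligned}\tag{3.8}$$
$$\begin{aligned}
PC_\alpha &= \sum_{ij\in E_\Omega}\left(\deg x_i\times\deg x_j\right)^{\alpha}\\
&= \upsilon_{33}(3\times3)^{\alpha}+\upsilon_{23}(2\times3)^{\alpha}+\upsilon_{22}(2\times2)^{\alpha}\\
&= 9^{\alpha}(3\eta-\tau-3)+6^{\alpha}(2\tau)+4^{\alpha}(\upsilon-2\eta-\tau+2).
\end{aligned}\tag{3.9}$$
Similarly, for the general Sombor index, we have:
$$\begin{aligned}
SO_\alpha &= \sum_{ij\in E_\Omega}\left((\deg x_i)^2+(\deg x_j)^2\right)^{\alpha}\\
&= \upsilon_{33}(3^2+3^2)^{\alpha}+\upsilon_{23}(2^2+3^2)^{\alpha}+\upsilon_{22}(2^2+2^2)^{\alpha}\\
&= 18^{\alpha}(3\eta-\tau-3)+13^{\alpha}(2\tau)+8^{\alpha}(\upsilon-2\eta-\tau+2).
\end{aligned}\tag{3.10}$$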

In sections that immediately follow, we evaluate S C α , P C α and S O α for the 30 BHs (chosen test molecules) by utilizing Eqs. (3.8), (3.9), and (3.10) respectively.
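As a quick check of Eqs. (3.8) to (3.10), the closed forms can be evaluated directly from the triple (vertices, hexagons, inlets). The Python sketch below uses hypothetical function names; the parameters for benzene (6 vertices, 1 hexagon, 0 inlets) and naphthalene (10 vertices, 2 hexagons, 2 inlets) are inferred from the edge counts listed for these molecules in Table 1.

```python
def edge_counts(v, h, t):
    """Return (v22, v23, v33) from Lemma 3.1 for v vertices, h hexagons, t inlets."""
    return v - 2 * h - t + 2, 2 * t, 3 * h - t - 3

def sc_alpha(v, h, t, alpha):
    """General sum-connectivity index of a benzenoid system, Eq. (3.8)."""
    v22, v23, v33 = edge_counts(v, h, t)
    return v33 * 6 ** alpha + v23 * 5 ** alpha + v22 * 4 ** alpha

def pc_alpha(v, h, t, alpha):
    """General product-connectivity index of a benzenoid system, Eq. (3.9)."""
    v22, v23, v33 = edge_counts(v, h, t)
    return v33 * 9 ** alpha + v23 * 6 ** alpha + v22 * 4 ** alpha

def so_alpha(v, h, t, alpha):
    """General Sombor index of a benzenoid system, Eq. (3.10)."""
    v22, v23, v33 = edge_counts(v, h, t)
    return v33 * 18 ** alpha + v23 * 13 ** alpha + v22 * 8 ** alpha

# Naphthalene: expected edge partition (6, 4, 1), matching its row in Table 1.
print(edge_counts(10, 2, 2))
# Benzene: only 2-2 edges, so PC_alpha reduces to 6 * 4**alpha.
print(pc_alpha(6, 1, 0, -0.5))
```

Working from the three edge counts rather than from the full graph is what makes the optimization over $\alpha$ cheap: each evaluation is a three-term sum per molecule.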

4 Optimization problem and algorithm

Following Hayat et al. (2023), we consider the heat capacity $\Delta H$ and the entropy $E$ to be the representatives of the thermodynamic properties of a chemical compound. Moreover, we choose the 30 lower benzenoid hydrocarbons (BHs) as our test molecules. Fig. 2 shows the lower 30 BHs. Table 1 presents the heat capacity $\Delta H$, the entropy $E$, the general Randić index $R_\alpha$ (that is, the general product-connectivity index $PC_\alpha$ of Section 2), the general sum-connectivity index $SCI_\alpha$ (that is, $SC_\alpha$), and the general Sombor index $SO_\alpha$ of the 30 lower BHs.

Let $R(\alpha)=R_\alpha(Y,X)$ be the correlation function between $Y\in\{\Delta H,E\}$ and $X\in\{R_\alpha,SCI_\alpha,SO_\alpha\}$. Then, we formulate the following optimization problem:

$$\min_{\alpha}\ -\left|R_\alpha(Y,X)\right|\quad\text{s.t.}\quad 0\le|R(\alpha)|\le1,\quad \alpha_{\min}<\alpha<\alpha_{\max}.\tag{4.11}$$

Fig. 2. The 30 lower BHs.
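Table 1. The molecular structure, heat capacity $\Delta H$, entropy $E$, general Randić index $R_\alpha$, general sum-connectivity index $SCI_\alpha$, and general Sombor index $SO_\alpha$ of 30 lower benzenoid hydrocarbons.

| Molecule | $\Delta H$ | $E$ | $R_\alpha$ | $SCI_\alpha$ | $SO_\alpha$ |
|---|---|---|---|---|---|
| Benzene | 83.019 | 269.722 | $6\cdot4^\alpha$ | $6\cdot4^\alpha$ | $6\cdot8^\alpha$ |
| Naphthalene | 133.325 | 334.155 | $6\cdot4^\alpha+4\cdot6^\alpha+9^\alpha$ | $6\cdot4^\alpha+4\cdot5^\alpha+6^\alpha$ | $6\cdot8^\alpha+4\cdot13^\alpha+18^\alpha$ |
| Anthracene | 184.194 | 389.475 | $6\cdot4^\alpha+8\cdot6^\alpha+2\cdot9^\alpha$ | $6\cdot4^\alpha+8\cdot5^\alpha+2\cdot6^\alpha$ | $6\cdot8^\alpha+8\cdot13^\alpha+2\cdot18^\alpha$ |
| Phenanthrene | 183.654 | 395.882 | $7\cdot4^\alpha+6\cdot6^\alpha+3\cdot9^\alpha$ | $7\cdot4^\alpha+6\cdot5^\alpha+3\cdot6^\alpha$ | $7\cdot8^\alpha+6\cdot13^\alpha+3\cdot18^\alpha$ |
| Tetracene | 235.165 | 444.724 | $6\cdot4^\alpha+12\cdot6^\alpha+3\cdot9^\alpha$ | $6\cdot4^\alpha+12\cdot5^\alpha+3\cdot6^\alpha$ | $6\cdot8^\alpha+12\cdot13^\alpha+3\cdot18^\alpha$ |
| Benzo[c]phenanthrene | 233.497 | 447.437 | $8\cdot4^\alpha+8\cdot6^\alpha+5\cdot9^\alpha$ | $8\cdot4^\alpha+8\cdot5^\alpha+5\cdot6^\alpha$ | $8\cdot8^\alpha+8\cdot13^\alpha+5\cdot18^\alpha$ |
| Benzo[a]phenanthrene | 234.568 | 457.958 | $7\cdot4^\alpha+10\cdot6^\alpha+4\cdot9^\alpha$ | $7\cdot4^\alpha+10\cdot5^\alpha+4\cdot6^\alpha$ | $7\cdot8^\alpha+10\cdot13^\alpha+4\cdot18^\alpha$ |
| Chrysene | 234.638 | 455.839 | $8\cdot4^\alpha+8\cdot6^\alpha+5\cdot9^\alpha$ | $8\cdot4^\alpha+8\cdot5^\alpha+5\cdot6^\alpha$ | $8\cdot8^\alpha+8\cdot13^\alpha+5\cdot18^\alpha$ |
| Triphenylene | 233.558 | 450.418 | $9\cdot4^\alpha+6\cdot6^\alpha+6\cdot9^\alpha$ | $9\cdot4^\alpha+6\cdot5^\alpha+6\cdot6^\alpha$ | $9\cdot8^\alpha+6\cdot13^\alpha+6\cdot18^\alpha$ |
| Pyrene | 200.815 | 399.491 | $6\cdot4^\alpha+8\cdot6^\alpha+5\cdot9^\alpha$ | $6\cdot4^\alpha+8\cdot5^\alpha+5\cdot6^\alpha$ | $6\cdot8^\alpha+8\cdot13^\alpha+5\cdot18^\alpha$ |
| Pentacene | 286.182 | 499.831 | $6\cdot4^\alpha+16\cdot6^\alpha+4\cdot9^\alpha$ | $6\cdot4^\alpha+16\cdot5^\alpha+4\cdot6^\alpha$ | $6\cdot8^\alpha+16\cdot13^\alpha+4\cdot18^\alpha$ |
| Benzo[a]tetracene | 285.056 | 513.857 | $7\cdot4^\alpha+14\cdot6^\alpha+5\cdot9^\alpha$ | $7\cdot4^\alpha+14\cdot5^\alpha+5\cdot6^\alpha$ | $7\cdot8^\alpha+14\cdot13^\alpha+5\cdot18^\alpha$ |
| Dibenzo[a,h]anthracene | 284.037 | 508.537 | $8\cdot4^\alpha+12\cdot6^\alpha+6\cdot9^\alpha$ | $8\cdot4^\alpha+12\cdot5^\alpha+6\cdot6^\alpha$ | $8\cdot8^\alpha+12\cdot13^\alpha+6\cdot18^\alpha$ |
| Dibenzo[a,j]anthracene | 284.088 | 507.395 | $8\cdot4^\alpha+12\cdot6^\alpha+6\cdot9^\alpha$ | $8\cdot4^\alpha+12\cdot5^\alpha+6\cdot6^\alpha$ | $8\cdot8^\alpha+12\cdot13^\alpha+6\cdot18^\alpha$ |
| Pentaphene | 285.148 | 506.076 | $7\cdot4^\alpha+14\cdot6^\alpha+5\cdot9^\alpha$ | $7\cdot4^\alpha+14\cdot5^\alpha+5\cdot6^\alpha$ | $7\cdot8^\alpha+14\cdot13^\alpha+5\cdot18^\alpha$ |
| Benzo[g]chrysene | 284.595 | 512.523 | $10\cdot4^\alpha+8\cdot6^\alpha+8\cdot9^\alpha$ | $10\cdot4^\alpha+8\cdot5^\alpha+8\cdot6^\alpha$ | $10\cdot8^\alpha+8\cdot13^\alpha+8\cdot18^\alpha$ |
| Pentahelicene | 284.870 | 500.734 | $9\cdot4^\alpha+10\cdot6^\alpha+7\cdot9^\alpha$ | $9\cdot4^\alpha+10\cdot5^\alpha+7\cdot6^\alpha$ | $9\cdot8^\alpha+10\cdot13^\alpha+7\cdot18^\alpha$ |
| Benzo[c]chrysene | 284.503 | 510.307 | $9\cdot4^\alpha+10\cdot6^\alpha+7\cdot9^\alpha$ | $9\cdot4^\alpha+10\cdot5^\alpha+7\cdot6^\alpha$ | $9\cdot8^\alpha+10\cdot13^\alpha+7\cdot18^\alpha$ |
| Picene | 284.785 | 509.210 | $9\cdot4^\alpha+10\cdot6^\alpha+7\cdot9^\alpha$ | $9\cdot4^\alpha+10\cdot5^\alpha+7\cdot6^\alpha$ | $9\cdot8^\alpha+10\cdot13^\alpha+7\cdot18^\alpha$ |
| Benzo[b]chrysene | 284.740 | 513.879 | $8\cdot4^\alpha+12\cdot6^\alpha+6\cdot9^\alpha$ | $8\cdot4^\alpha+12\cdot5^\alpha+6\cdot6^\alpha$ | $8\cdot8^\alpha+12\cdot13^\alpha+6\cdot18^\alpha$ |
| Dibenzo[a,c]anthracene | 284.233 | 511.770 | $9\cdot4^\alpha+10\cdot6^\alpha+7\cdot9^\alpha$ | $9\cdot4^\alpha+10\cdot5^\alpha+7\cdot6^\alpha$ | $9\cdot8^\alpha+10\cdot13^\alpha+7\cdot18^\alpha$ |
| Dibenzo[b,g]phenanthrene | 284.552 | 509.611 | $8\cdot4^\alpha+12\cdot6^\alpha+6\cdot9^\alpha$ | $8\cdot4^\alpha+12\cdot5^\alpha+6\cdot6^\alpha$ | $8\cdot8^\alpha+12\cdot13^\alpha+6\cdot18^\alpha$ |
| Perylene | 251.175 | 461.545 | $8\cdot4^\alpha+8\cdot6^\alpha+8\cdot9^\alpha$ | $8\cdot4^\alpha+8\cdot5^\alpha+8\cdot6^\alpha$ | $8\cdot8^\alpha+8\cdot13^\alpha+8\cdot18^\alpha$ |
| Benzo[e]pyrene | 250.568 | 463.738 | $8\cdot4^\alpha+8\cdot6^\alpha+8\cdot9^\alpha$ | $8\cdot4^\alpha+8\cdot5^\alpha+8\cdot6^\alpha$ | $8\cdot8^\alpha+8\cdot13^\alpha+8\cdot18^\alpha$ |
| Benzo[a]pyrene | 251.973 | 468.712 | $7\cdot4^\alpha+10\cdot6^\alpha+7\cdot9^\alpha$ | $7\cdot4^\alpha+10\cdot5^\alpha+7\cdot6^\alpha$ | $7\cdot8^\alpha+10\cdot13^\alpha+7\cdot18^\alpha$ |
| Hexahelicene | 336.098 | 555.409 | $10\cdot4^\alpha+12\cdot6^\alpha+9\cdot9^\alpha$ | $10\cdot4^\alpha+12\cdot5^\alpha+9\cdot6^\alpha$ | $10\cdot8^\alpha+12\cdot13^\alpha+9\cdot18^\alpha$ |
| Benzo[ghi]perylene | 267.543 | 472.295 | $7\cdot4^\alpha+10\cdot6^\alpha+10\cdot9^\alpha$ | $7\cdot4^\alpha+10\cdot5^\alpha+10\cdot6^\alpha$ | $7\cdot8^\alpha+10\cdot13^\alpha+10\cdot18^\alpha$ |
| Hexacene | 337.204 | 554.784 | $6\cdot4^\alpha+20\cdot6^\alpha+5\cdot9^\alpha$ | $6\cdot4^\alpha+20\cdot5^\alpha+5\cdot6^\alpha$ | $6\cdot8^\alpha+20\cdot13^\alpha+5\cdot18^\alpha$ |
| Coronene | 285.041 | 468.796 | $6\cdot4^\alpha+12\cdot6^\alpha+12\cdot9^\alpha$ | $6\cdot4^\alpha+12\cdot5^\alpha+12\cdot6^\alpha$ | $6\cdot8^\alpha+12\cdot13^\alpha+12\cdot18^\alpha$ |
| Ovalene | 368.518 | 551.708 | $6\cdot4^\alpha+16\cdot6^\alpha+19\cdot9^\alpha$ | $6\cdot4^\alpha+16\cdot5^\alpha+19\cdot6^\alpha$ | $6\cdot8^\alpha+16\cdot13^\alpha+19\cdot18^\alpha$ |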

Next, we present the pseudocode corresponding to the above optimization formulation in Algorithm 1.

Note that Algorithm 1 optimizes a correlation function by determining the best value of the parameter $\alpha$. It constructs a data vector $y$ based on given coefficients and $\alpha$, then fits a linear model between $y$ and the input data $x$, calculating the coefficient of determination $R^2$ as a measure of fit. The objective function is defined as $-\log(1+R^2)$ and is minimized, so that the optimal $\alpha$ maximizes the correlation. Finally, the algorithm returns the optimal $\alpha$ and the corresponding $R^2$ value.
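The description above can be sketched in Python as a search over a finite grid of $\alpha$ values. This is an illustrative reading of Algorithm 1, not the authors' Octave code: the grid bounds $[-5, 5]$, the grid step, the use of a plain grid search, and all names are assumptions. The edge counts and $\Delta H$ values are transcribed from Table 1, and the index used is $R_\alpha$ (bases 4, 6, 9).

```python
import numpy as np

# (v22, v23, v33, Delta H) for the 30 lower BHs, transcribed from Table 1.
ROWS = [
    (6, 0, 0, 83.019), (6, 4, 1, 133.325), (6, 8, 2, 184.194),
    (7, 6, 3, 183.654), (6, 12, 3, 235.165), (8, 8, 5, 233.497),
    (7, 10, 4, 234.568), (8, 8, 5, 234.638), (9, 6, 6, 233.558),
    (6, 8, 5, 200.815), (6, 16, 4, 286.182), (7, 14, 5, 285.056),
    (8, 12, 6, 284.037), (8, 12, 6, 284.088), (7, 14, 5, 285.148),
    (10, 8, 8, 284.595), (9, 10, 7, 284.870), (9, 10, 7, 284.503),
    (9, 10, 7, 284.785), (8, 12, 6, 284.740), (9, 10, 7, 284.233),
    (8, 12, 6, 284.552), (8, 8, 8, 251.175), (8, 8, 8, 250.568),
    (7, 10, 7, 251.973), (10, 12, 9, 336.098), (7, 10, 10, 267.543),
    (6, 20, 5, 337.204), (6, 12, 12, 285.041), (6, 16, 19, 368.518),
]
COUNTS = np.array([r[:3] for r in ROWS], dtype=float)
DELTA_H = np.array([r[3] for r in ROWS])
PC_BASES = np.array([4.0, 6.0, 9.0])  # edge weights 2*2, 2*3 and 3*3

def r_squared(alpha, bases=PC_BASES, counts=COUNTS, prop=DELTA_H):
    """R^2 of the simple linear fit between the index at alpha and the property."""
    index = counts @ bases ** alpha      # one index value per molecule
    r = np.corrcoef(index, prop)[0, 1]   # Pearson correlation
    return r * r

def best_alpha(alphas):
    """Grid search minimizing -log(1 + R^2); alpha = 0 is excluded."""
    alphas = np.asarray(alphas, dtype=float)
    alphas = alphas[np.abs(alphas) > 1e-9]
    objective = np.array([-np.log1p(r_squared(a)) for a in alphas])
    k = int(np.argmin(objective))
    return float(alphas[k]), float(r_squared(alphas[k]))

# Assumed search window and resolution; the paper's bounds are not stated here.
alpha_opt, r2_opt = best_alpha(np.linspace(-5.0, 5.0, 1001))
print(alpha_opt, r2_opt)
```

Swapping `PC_BASES` for `(4, 5, 6)` or `(8, 13, 18)` gives the corresponding search for $SCI_\alpha$ or $SO_\alpha$, and replacing `DELTA_H` with the entropy column handles $E$. A finer grid or a continuous optimizer could refine the result, at the cost of more evaluations.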

5 Computational results

In this section, a robust linear correlation is established between key molecular attributes (the number of atoms, molecular weight, and molecular surface area) and the thermodynamic properties, specifically heat capacity ($\Delta H$) and entropy ($E$), for the 30 lower benzenoid hydrocarbons. The study demonstrates that as these molecular features increase, there is a corresponding rise in both $\Delta H$ and $E$, underscoring the predictability of thermodynamic properties from molecular structure. This foundational insight sets the stage for further exploration of how molecular characteristics influence thermodynamic behavior.

Entropy is the thermodynamic function used for predicting the spontaneity of a reaction, whereas the heat capacity of a substance is defined as the amount of heat required to raise the temperature of a given quantity of the substance by one degree Celsius. Several factors can affect the entropy and heat capacity of a substance, including the number of atoms, molecular weight, volume, molecular surface area, boiling point and melting point (Latimer, 1921; Origlia et al., 2001). As the number of atoms in the system increases, regardless of their masses, its entropy and heat capacity values increase. The higher the boiling point and melting point, the larger the entropy and heat capacity of the system. In addition, as the volume or the molecular surface area of the compound increases, the entropy and the heat capacity also increase. The entropies ($E$), heat capacities ($\Delta H$), molecular formulas ($MF$), numbers of atoms ($N_{atoms}$), molecular weights ($MW$) and molecular surface areas ($MSA$) of the 30 lower benzenoids are listed in Table 2.

The results show that there are adequate linear correlations between the number of atoms in the molecule ($N_{atoms}$) and the heat capacity (Fig. 3(a)) and the entropy (Fig. 3(b)), with $R^2$ of 0.9994 and 0.9774, respectively. For the entropy, deviation from the linear correlation is found for $N_{atoms}>36$. A closer examination of Table 2 reveals that substances with the same molecular formula or number of atoms have nearly the same $E$ and $\Delta H$ values. For example, for systems with $N_{atoms}=30$, the $\Delta H$ value (measured in cal/mol.K) lies in the range from 50.938 to 52.516 with an average of 52.125, while the $E$ value (measured in cal/mol.K) ranges from 105.261 to 110.037 with an average of 108.055. Additionally, systems with $N_{atoms}=36$ have $\Delta H$ values ranging from 63.813 to 64.388 with an average of 64.089, while their $E$ values lie in the range from 115.262 to 123.493 with an average of 120.773.
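The linear correlation between $N_{atoms}$ and $\Delta H$ can be re-examined from the values in Table 2. The Python sketch below fits a straight line by least squares and computes $R^2$; the helper name `linear_r2` is an assumption made for the example, and the sketch is not guaranteed to reproduce the reported 0.9994 to the last digit, since the fitting details used for Fig. 3 are not given in the text.

```python
import numpy as np

# N_atoms and Delta H (cal/mol.K) for molecules 1 to 30, transcribed from Table 2.
n_atoms = np.array([12, 18, 24, 24, 30, 30, 30, 30, 30, 26,
                    36, 36, 36, 36, 36, 36, 36, 36, 36, 36,
                    36, 36, 32, 32, 32, 42, 34, 42, 36, 46], dtype=float)
delta_h = np.array([17.151, 28.816, 40.652, 40.561, 52.516,
                    50.938, 52.327, 52.402, 52.44, 44.539,
                    64.388, 64.185, 64.079, 64.087, 64.155,
                    63.834, 63.858, 63.813, 64.178, 64.156,
                    64.21, 63.858, 56.484, 56.41, 56.381,
                    75.654, 60.38, 76.264, 64.354, 84.491])

def linear_r2(x, y):
    """Fit y = slope * x + intercept and return (slope, intercept, R^2)."""
    slope, intercept = np.polyfit(x, y, 1)
    fitted = slope * x + intercept
    ss_res = np.sum((y - fitted) ** 2)     # residual sum of squares
    ss_tot = np.sum((y - y.mean()) ** 2)   # total sum of squares
    return slope, intercept, 1.0 - ss_res / ss_tot

slope, intercept, r2 = linear_r2(n_atoms, delta_h)
print(slope, intercept, r2)
```

The same helper applies unchanged to the entropy column or to $MW$ and $MSA$ as predictors.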

A closer examination of Table 2 reveals that smaller molecules with lower molecular weights generally have lower $E$ and $\Delta H$ values, and larger molecules higher ones. For example, Table 2 shows that the smallest $E$ of 69.028 and $\Delta H$ of 17.151 belong to the benzene molecule, with a molecular weight of 78.048 g/mol. The picture is less uniform for larger molecules: the maximum $\Delta H$ of 84.491 belongs to ovalene, with a molecular weight of 398.112 g/mol, whereas the maximum $E$ of 133.95 corresponds to hexacene, with a molecular weight of 328.128 g/mol. This deviation for the larger molecules can be clearly seen by plotting $MW$ against the $E$ and $\Delta H$ values, as shown in Fig. 3.

Fig. 3. Correlation curves between $N_{atoms}$, $MW$ and $MSA$ with the chosen properties.

Fig. 3 shows good linear correlations between the molecular weight ($MW$) of the investigated benzenoids and their $\Delta H$ and $E$ values. The linear correlation coefficients $R^2$ of 0.9862 and 0.9298 belong to the $\Delta H$ (Fig. 3(a)) and $E$ (Fig. 3(b)) properties, respectively. As can be seen in Fig. 3, deviation from the linear correlation is clearly observed for larger molecules. In addition, the degree of deviation is greater for $E$ than for $\Delta H$.

The entropy and heat capacity of a substance are also correlated with its molecular surface area ($MSA$); see Fig. 3 for details. Examination of Table 2 illustrates that the smaller the $MSA$, the smaller the $E$ and $\Delta H$ values. The benzene molecule, with the smallest $MSA$ of 135.58 Å², has the smallest $E$ and $\Delta H$ values of 69.028 and 17.151, respectively. On the other hand, ovalene, with the maximum $MSA$ of 467.46 Å², has $E$ and $\Delta H$ values of 133.467 and 84.491, respectively. Notice that the maximum $E$ value of 133.95 belongs to the hexacene molecule, with an $MSA$ of 442.77 Å². An excellent linear correlation is found between the $MSA$ and both the $E$ and $\Delta H$ properties, with $R^2$ of 0.9744 and 0.9929, respectively. Again, the entropies of the larger systems deviate from these linear relationships.

The geometries of the thirty investigated aromatic hydrocarbon molecules were optimized at the B3LYP/6-31G(d) level of theory in the gas phase, computed with Gaussian 09 (Frisch, 2009) and visualized with GaussView 05 (Dennington et al., 2007), as implemented on the Aziz supercomputer (http://hpc.kau.edu.sa) at King Abdulaziz University’s High-Performance Computing Centre.

In general, it can be concluded that the entropies and heat capacities of the 30 lower benzenoids are well correlated with the number of atoms, molecular surface area and molecular weight; however, some deviation from the linear correlation is observed for larger systems.

In conclusion, this section and the next two subsections offer complementary approaches to predicting thermodynamic properties. This section establishes strong linear correlations based on physical molecular attributes, such as the number of atoms, molecular weight, and surface area. In contrast, the next two subsections refine this understanding by introducing and optimizing mathematical indices ($R_\alpha$, $SCI_\alpha$, and $SO_\alpha$), which capture these relationships more precisely. Together, these studies enhance our understanding of the influence of molecular structure on thermodynamic properties, bridging the gap between direct physical observations and mathematical modeling through graphical indices.

Table 2. Substance, heat capacity, entropy, molecular formula, number of atoms, molecular weights and molecular surface area of the 30 lower benzenoids.

| Number | Name | $\Delta H$ (cal/mol.K) | $E$ (cal/mol.K) | $MF$ | $N_{atoms}$ | $MW$ (g/mol) | $MSA$ (Å²) |
|---|---|---|---|---|---|---|---|
| 1 | Benzene | 17.151 | 69.028 | C6H6 | 12 | 78.048 | 135.58 |
| 2 | Naphthalene | 28.816 | 81.955 | C10H8 | 18 | 128.064 | 198.46 |
| 3 | Anthracene | 40.652 | 94.94 | C14H10 | 24 | 176.064 | 261.3 |
| 4 | Phenanthrene | 40.561 | 95.176 | C14H10 | 24 | 176.064 | 259.85 |
| 5 | Tetracene | 52.516 | 107.952 | C18H12 | 30 | 228.096 | 320.22 |
| 6 | Benzo[c]phenanthrene | 50.938 | 105.261 | C18H12 | 30 | 228.096 | 318.01 |
| 7 | Benzo[a]phenanthrene | 52.327 | 108.154 | C18H12 | 30 | 228.096 | 325.3 |
| 8 | Chrysene | 52.402 | 108.871 | C18H12 | 30 | 228.096 | 320.69 |
| 9 | Triphenylene | 52.44 | 110.037 | C18H12 | 30 | 228.096 | 313.26 |
| 10 | Pyrene | 44.539 | 97.259 | C16H10 | 26 | 202.080 | 286.42 |
| 11 | Pentacene | 64.388 | 120.96 | C22H14 | 36 | 278.112 | 381.68 |
| 12 | Benzo[a]tetracene | 64.185 | 121.281 | C22H14 | 36 | 278.112 | 379.3 |
| 13 | Dibenzo[a,h]anthracene | 64.079 | 121.741 | C22H14 | 36 | 278.112 | 376.8 |
| 14 | Dibenzo[a,j]anthracene | 64.087 | 121.595 | C22H14 | 36 | 278.112 | 377.3 |
| 15 | Pentaphene | 64.155 | 121.385 | C22H14 | 36 | 278.112 | 379.59 |
| 16 | Benzo[g]chrysene | 63.834 | 120.473 | C22H14 | 36 | 278.112 | 368.36 |
| 17 | Pentahelicene | 63.858 | 120.109 | C22H14 | 36 | 278.112 | 392.09 |
| 18 | Benzo[c]chrysene | 63.813 | 120.199 | C22H14 | 36 | 278.112 | 370.06 |
| 19 | Picene | 64.178 | 121.984 | C22H14 | 36 | 278.112 | 374.67 |
| 20 | Benzo[b]chrysene | 64.156 | 121.463 | C22H14 | 36 | 278.112 | 377.2 |
| 21 | Dibenzo[a,c]anthracene | 64.21 | 123.493 | C22H14 | 36 | 278.112 | 374.76 |
| 22 | Dibenzo[b,g]phenanthrene | 63.858 | 120.108 | C22H14 | 36 | 278.112 | 372.71 |
| 23 | Perylene | 56.484 | 112.373 | C20H12 | 32 | 252.096 | 334.51 |
| 24 | Benzo[e]pyrene | 56.41 | 111.423 | C20H12 | 32 | 252.096 | 332.24 |
| 25 | Benzo[a]pyrene | 56.381 | 110.484 | C20H12 | 32 | 252.096 | 336.06 |
| 26 | Hexahelicene | 75.654 | 131.693 | C26H16 | 42 | 328.128 | 446.78 |
| 27 | Benzo[ghi]perylene | 60.38 | 113.229 | C22H12 | 34 | 276.096 | 353.2 |
| 28 | Hexacene | 76.264 | 133.95 | C26H16 | 42 | 328.128 | 442.77 |
| 29 | Coronene | 64.354 | 115.262 | C24H12 | 36 | 300.096 | 378.24 |
| 30 | Ovalene | 84.491 | 133.467 | C32H14 | 46 | 398.112 | 467.46 |

In this section, we present our computational results by employing Algorithm 1 on different computational platforms such as Octave and R Studio. For results from the next subsection, we employed the computational platform Octave.

Note that, in order to be suited for a linear regression analysis, the data should be tested for normality. There are several tests for normality, including Shapiro–Wilk, Lilliefors, and the tests reported in Jäntschi (2019). Other tests, such as Anderson–Darling, Cramér–von Mises and Kolmogorov–Smirnov, are reported in Jäntschi (2020). Moreover, it is very important to have no outliers or extreme values, since both may exert undue leverage on the regression.

5.1 Linear correlation analysis of general graphical indices

The statistical analysis of Figs. 4 and 5 demonstrates the correlation between the three general indices ($R_\alpha$, $SCI_\alpha$, and $SO_\alpha$) and the thermodynamic properties of lower benzenoid hydrocarbons, specifically heat capacity ($\Delta H$) and entropy ($E$). The curves in these figures show how the correlation coefficients vary with the parameter $\alpha$. These curves have been generated using the software Octave. Notably, the optimal $\alpha$ values for $R_\alpha$, $SCI_\alpha$, and $SO_\alpha$ provide strong correlations with both $\Delta H$ and $E$. For $R_\alpha$, the optimal value is $\alpha=1.845$, achieving a correlation coefficient of $\rho=0.997$ with both $\Delta H$ and $E$. The $SCI_\alpha$ index is optimal at $\alpha=0.319$, with a corresponding correlation coefficient of $\rho=0.997$. The $SO_\alpha$ index shows the highest correlation across both properties at $\alpha=1.067$, with a correlation coefficient of $\rho=0.998$, indicating its superior predictive potential. These results emphasize the effectiveness of these indices, particularly $SO_\alpha$, when they are properly optimized for the prediction of thermodynamic properties of benzenoid hydrocarbons.

Figs. 4 and 5 provide a magnified view of the regions around the best values of the parameter α for the general indices R α , S C I α , and S O α when predicting the thermodynamic properties of lower benzenoid hydrocarbons. These figures emphasize the intervals of α where the correlation with the properties is at its peak. Specifically, for predicting heat capacity ( Δ H ) in Fig. 4, the optimal intervals are approximately α = [ 2 , 1 . 5 ] for R α , [ 0 . 5 , 0 . 1 ] for S C I α , and [ 1 . 5 , 0 . 5 ] for S O α . Similarly, for predicting entropy ( E ) in Fig. 5, the best intervals for these indices are also within these ranges: R α = [ 2 , 1 . 5 ] , S C I α = [ 0 . 5 , 0 . 1 ] , and S O α = [ 1 . 5 , 0 . 5 ] .

These intervals represent the ranges of $\alpha$ where each index reaches its highest predictive potential, with the $SO_\alpha$ index standing out for its consistent performance across both thermodynamic properties. By focusing on these specific intervals, Figs. 4 and 5 underscore the importance of fine-tuning the parameter $\alpha$ to maximize the predictive accuracy of the $R_\alpha$, $SCI_\alpha$, and $SO_\alpha$ indices for estimating the thermodynamic properties of benzenoid hydrocarbons.
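The tuning of $\alpha$ described above can be illustrated with a minimal Python sketch. It is not the authors' Octave code. It assumes the standard edge-sum definitions of the three indices ($R_\alpha = \sum_{uv} (d_u d_v)^\alpha$, $SCI_\alpha = \sum_{uv} (d_u + d_v)^\alpha$, $SO_\alpha = \sum_{uv} (d_u^2 + d_v^2)^\alpha$) and a plain grid scan over $\alpha$. The names `general_index`, `best_alpha`, and `EDGE_COUNTS` are hypothetical. No thermodynamic data are used: the target vector is synthetic, built from $SO_\alpha$ at a known exponent, purely to check that the scan recovers that exponent.

```python
import math

# Edge counts (m22, m23, m33): numbers of C-C bonds whose end vertices have
# degrees (2,2), (2,3) and (3,3) in the hydrogen-suppressed molecular graph.
EDGE_COUNTS = {
    "benzene": (6, 0, 0),
    "naphthalene": (6, 4, 1),
    "anthracene": (6, 8, 2),
    "phenanthrene": (7, 6, 3),
    "pyrene": (6, 8, 5),
    "tetracene": (6, 12, 3),
}
DEGREE_PAIRS = ((2, 2), (2, 3), (3, 3))

# Edge weight before exponentiation (assumed standard definitions).
EDGE_BASE = {
    "R": lambda du, dv: du * dv,             # general Randic / product-connectivity
    "SCI": lambda du, dv: du + dv,           # general sum-connectivity
    "SO": lambda du, dv: du * du + dv * dv,  # general Sombor
}


def general_index(counts, alpha, kind):
    """Evaluate the chosen general index from degree-pair edge counts."""
    base = EDGE_BASE[kind]
    return sum(m * base(du, dv) ** alpha for m, (du, dv) in zip(counts, DEGREE_PAIRS))


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def best_alpha(values, kind, lo=-3.0, hi=3.0, step=0.01):
    """Grid-scan alpha in [lo, hi] and return (alpha, |rho|) with the largest |rho|."""
    n_steps = int(round((hi - lo) / step))
    best_a, best_rho = lo, -1.0
    for k in range(n_steps + 1):
        alpha = round(lo + k * step, 10)
        xs = [general_index(c, alpha, kind) for c in EDGE_COUNTS.values()]
        rho = abs(pearson(xs, values))
        if rho > best_rho:
            best_a, best_rho = alpha, rho
    return best_a, best_rho


# Synthetic target (NOT measured data): SO_alpha at the known exponent -1.0.
target = [general_index(c, -1.0, "SO") for c in EDGE_COUNTS.values()]
alpha_hat, rho_hat = best_alpha(target, "SO")
print(f"recovered alpha = {alpha_hat}, |rho| = {rho_hat:.6f}")
```

To use the sketch with real data, `target` would be replaced by the tabulated property values (for example $\Delta H$ or $E$) listed in the same molecule order as `EDGE_COUNTS`. A finer `step` or a one-dimensional optimizer could then refine the estimate around the best grid point.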

Far and new views of the correlation curves between general indices and $\Delta H$ of lower benzenoids.
Fig. 4

Figs. 4 and 5 also mark the optimal $\alpha$ intervals for the three general indices, namely the regions above the horizontal dashed lines, where the correlation coefficient ($\rho$) is strong. For the general Randić index ($R_\alpha$), the optimal interval for predicting heat capacity ($\Delta H$) is approximately $[1.8384, 0.5499]$, while for entropy ($E$) it is $[2.5110, 1.3334]$. The general sum-connectivity index ($SCI_\alpha$) shows strong correlation within the interval $[3.3914, 1.1480]$ for $\Delta H$ and $[4.4900, 2.7642]$ for $E$. The general Sombor index ($SO_\alpha$) has the broadest and most stable interval of strong correlation, approximately $[1.5559, 1.4797]$ for both $\Delta H$ and $E$, which makes it a reliable predictor over a wider range of $\alpha$ values than the other two indices. This analysis underscores the importance of selecting the correct $\alpha$ interval for each index to achieve optimal predictive accuracy for thermodynamic properties.

Far and new views of the correlation curves between general indices and $E$ of lower benzenoids.
Fig. 5

5.2 Multiple prediction potential of general graphical indices

Note that Section 5 considers the two chosen thermodynamic properties, $\Delta H$ and $E$, individually to investigate how well they are predicted by $GDd \in \{SC_\alpha, PC_\alpha, SO_\alpha\}$. This section investigates the same problem for $GDd \in \{SC_\alpha, PC_\alpha, SO_\alpha\}$ with $\Delta H$ and $E$ taken simultaneously. Since there is now more than one independent variable, we employ multivariate regression analysis. The analysis has been performed in the statistical environment RStudio.

Let $x_1 = \Delta H$ and $x_2 = E$ be the two independent variables, and let $y = PC_\alpha$ be the dependent variable. Since more than one independent variable is involved, we employ the multivariate correlation coefficient to investigate how well the general product-connectivity index relates to $\Delta H$ and $E$. Let $R(\alpha) = \rho(PC_\alpha; \Delta H, E)$ be the multivariate correlation function between $PC_\alpha$ and the two chosen properties $\Delta H$ and $E$. Optimizing $R(\alpha)$ then delivers the optimal value(s) of $\alpha$ (denoted by $\hat{\alpha}$) for which the relationship between $PC_\alpha$ and the two test properties $\Delta H$ and $E$ is strongest. We apply the same algorithm as in Section 5, with the correlation function replaced by the multivariate correlation function.
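For two predictors and a linear model with an intercept, the multiple correlation can be written in closed form in terms of pairwise Pearson correlations. The identity below is a standard result rather than a formula stated in this paper, and the shorthand $r_{y1}$, $r_{y2}$, $r_{12}$ is introduced here only for illustration:

$$R(\alpha)^2 = \frac{r_{y1}^2 + r_{y2}^2 - 2\, r_{y1}\, r_{y2}\, r_{12}}{1 - r_{12}^2}, \qquad r_{y1} = \rho(PC_\alpha, \Delta H), \quad r_{y2} = \rho(PC_\alpha, E), \quad r_{12} = \rho(\Delta H, E).$$

Only $r_{y1}$ and $r_{y2}$ depend on $\alpha$, while $r_{12}$ is fixed by the data. When $r_{12} = 0$ the expression reduces to $r_{y1}^2 + r_{y2}^2$, and in general $R(\alpha)^2 \geq \max(r_{y1}^2, r_{y2}^2)$, so the two-property fit is never weaker than either single-property fit.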

The main difference between Algorithm 2 and the previous algorithm (Algorithm 1) is the use of multiple independent variables ($x_1$ and $x_2$) in the linear model. In Algorithm 1, the linear model is a simple regression with a single predictor variable $x$, whereas in Algorithm 2 it is a multiple regression with two predictor variables, $x_1$ and $x_2$. This modification requires adjusting the calculation of the correlation (specifically $R^2$), as Algorithm 2 considers the combined effect of both predictors on $y$. The objective function and the optimization process remain similar, aiming to find the optimal $\alpha$ that maximizes the correlation in this multivariate context.
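The two-predictor variant described above can be sketched as follows. This is a hedged illustration in Python, not a transcription of Algorithm 2 or of the authors' R code. It assumes $PC_\alpha = \sum_{uv} (d_u d_v)^\alpha$, computes the multiple correlation from pairwise Pearson correlations, and replaces the built-in optimizer with a simple grid scan. The names `pc_index`, `multiple_r`, and `scan` are hypothetical. The predictors are stand-ins, not $\Delta H$ and $E$: $x_1$ is the number of hexagons and $x_2$ is $PC_\alpha$ at a known exponent, chosen so that the value $R(\alpha_0) = 1$ is known in advance.

```python
import math

# (name, (m22, m23, m33) edge counts by end-vertex degrees, number of hexagons)
MOLECULES = [
    ("benzene", (6, 0, 0), 1),
    ("naphthalene", (6, 4, 1), 2),
    ("anthracene", (6, 8, 2), 3),
    ("phenanthrene", (7, 6, 3), 3),
    ("pyrene", (6, 8, 5), 4),
    ("tetracene", (6, 12, 3), 4),
]
DEGREE_PAIRS = ((2, 2), (2, 3), (3, 3))


def pc_index(counts, alpha):
    """General product-connectivity index: sum of (d_u * d_v)**alpha over edges."""
    return sum(m * (du * dv) ** alpha for m, (du, dv) in zip(counts, DEGREE_PAIRS))


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def multiple_r(y, x1, x2):
    """Multiple correlation of y on (x1, x2) for a linear model with intercept."""
    r1, r2, r12 = pearson(y, x1), pearson(y, x2), pearson(x1, x2)
    r_sq = (r1 * r1 + r2 * r2 - 2.0 * r1 * r2 * r12) / (1.0 - r12 * r12)
    return math.sqrt(min(max(r_sq, 0.0), 1.0))  # clamp rounding noise


def scan(x1, x2, lo=-3.0, hi=3.0, step=0.01):
    """Return the curve [(alpha, R(alpha)), ...] on a regular grid."""
    curve = []
    for k in range(int(round((hi - lo) / step)) + 1):
        alpha = round(lo + k * step, 10)
        y = [pc_index(c, alpha) for _, c, _ in MOLECULES]
        curve.append((alpha, multiple_r(y, x1, x2)))
    return curve


# Stand-in predictors (NOT measured properties).
ALPHA0 = -0.5
x1 = [float(h) for _, _, h in MOLECULES]
x2 = [pc_index(c, ALPHA0) for _, c, _ in MOLECULES]

curve = scan(x1, x2)
alpha_hat, r_hat = max(curve, key=lambda point: point[1])
print(f"max R(alpha) = {r_hat:.6f} at alpha = {alpha_hat}")
```

With real data, `x1` and `x2` would hold the tabulated $\Delta H$ and $E$ values in the same molecule order. The closed-form expression becomes numerically fragile when the two predictors are almost perfectly correlated, in which case a least-squares fit of the full linear model is the safer way to obtain $R^2$.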

Algorithm 2 is applied with a built-in optimization tool in RStudio to generate the required curves of $R(\alpha)$ against $\alpha$. Fig. 6 depicts such a curve for the general product-connectivity index $PC_\alpha$, delivering $\hat{\alpha} = -0.319$ and the corresponding correlation value $\rho = 0.997$.

Applying the same computational process and Algorithm 2, we obtain Fig. 7 for the general sum-connectivity index $SC_\alpha$. The multiple correlation curve of $R(\alpha)$ against $\alpha$ shows that the optimal value of $\alpha$ is $\alpha_{\max} = -1.845$, with the corresponding correlation value $\rho = 0.997$.

Multiple correlation curve between $R(\alpha)$ and $\alpha$ delivering $\hat{\alpha} = -0.319$ and the corresponding correlation value of $\rho = 0.997$.
Fig. 6

A similar computational process employing Algorithm 2 delivers Fig. 8 for the general Sombor index $SO_\alpha$. The multiple correlation curve of $R(\alpha)$ against $\alpha$ shows that the optimal value of $\alpha$ is $\alpha_{\max} = -1.067$, with the corresponding correlation value $\rho = 0.998$.

Multiple correlation curve between $R(\alpha)$ and $\alpha$ delivering $\alpha_{\max} = -1.845$ and the corresponding correlation value of $\rho = 0.997$.
Fig. 7

Multiple correlation curve between $R(\alpha)$ and $\alpha$ delivering $\alpha_{\max} = -1.067$ and the corresponding correlation value of $\rho = 0.998$.
Fig. 8

6 Conclusion

Contributions

In this work, we:

  • Developed optimal predictive models using three general degree-related indices (the general sum-connectivity, product-connectivity, and Sombor indices), offering high predictive accuracy for thermodynamic properties of benzenoid hydrocarbons.

  • Addressed open problems by determining optimal parameter values of α that maximized correlations between graphical indices and properties such as heat capacity and entropy.

  • Validated the effectiveness of each index through discrete optimization and multivariate regression analysis, highlighting the superior performance of the general product-connectivity index over other degree-based indices.

Study implications

As potential study implications, this work:

  • Provides a mathematical framework that enhances the use of cheminformatics for predicting thermodynamic properties, supporting the integration of graphical indices in QSPR modeling.

  • Establishes that molecular features like the number of atoms, molecular weight, and surface area significantly influence entropy and heat capacity, guiding future molecular property predictions.

  • Highlights the general product-connectivity index as particularly effective, suggesting broader applications in chemical graph theory for structure–property modeling.

Limitations

Here we highlight the limitations of this study.

  • The study is limited to benzenoid hydrocarbons, restricting generalizability to other classes of chemical compounds.

  • Only two thermodynamic properties (heat capacity and entropy) were evaluated, which limits what can be concluded about the indices' predictive potential for other properties, such as physicochemical properties.

Future study

Based on the limitations above, here are some possible research directions:

  • Extend the application of these indices to a broader range of physicochemical and quantum-theoretic properties.

  • Investigate the use of these indices for non-benzenoid and more complex molecular structures to test generalizability.

  • Further explore temperature-based graphical indices to enhance predictive capability across various chemical environments and applications.

CRediT authorship contribution statement

Suha Wazzan: Writing – review & editing, Validation, Supervision, Resources, Project administration, Methodology. Sakander Hayat: Writing – original draft, Methodology, Formal analysis, Conceptualization. Wafi Ismail: Writing – original draft, Visualization, Validation, Software, Investigation, Formal analysis.

Acknowledgments

The authors acknowledge Prof. Nuha Wazzan from the Chemistry Department at King Abdulaziz University for her contribution to the DFT calculations, and King Abdulaziz University's High Performance Computing Centre (Aziz Supercomputer) (http://hpc.kau.edu.sa) for supporting the computations for the work described in this paper. The authors are grateful to the reviewers and editors for their helpful comments and suggestions, which have improved the submitted version of this paper.

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

References

  1. First-principles prediction of enthalpies of formation for polycyclic aromatic hydrocarbons and derivatives. J. Phys. Chem. A. 2015;119:11329-11365.
  2. Topological indices for structure–activity correlations. Top. Curr. Chem. 1983;114:21-55.
  3. Quantitative structure–property relationships (QSPRs) for the estimation of vapor pressure: A hierarchical approach using mathematical structural descriptors. J. Chem. Inf. Comput. Sci. 2001;41(3):692-701.
  4. Graphs of extremal weights. Ars Combin. 1998;50:225.
  5. New spectral indices for molecular description. MATCH Commun. Math. Comput. Chem. 2008;60:3-14.
  6. On benzenoid systems with minimal number of inlets. J. Serb. Chem. Soc. 2013;78(9):1351-1357.
  7. GaussView, Version 4.1.2. Shawnee Mission, KS: Semichem Inc.
  8. Molecular Topology. Nova, Huntington.
  9. Frisch, A., 2009. Gaussian 09 W Reference. Vol. 25, Wallingford, USA, p. 470.
  10. Degree-based topological indices. Croat. Chem. Acta. 2013;86:351-361.
  11. Geometric approach to degree-based topological indices: Sombor indices. MATCH Commun. Math. Comput. Chem. 2021;86(1):11-16.
  12. Novel Molecular Structure Descriptors – Theory and Applications, Vols. 1 & 2. Kragujevac: Univ. Kragujevac.
  13. Mathematical Concepts in Organic Chemistry. New York: Springer-Verlag.
  14. Testing the quality of molecular structure descriptors, vertex-degree-based topological indices. J. Serb. Chem. Soc. 2013;78:805-810.
  15. Structure-property modeling for thermodynamic properties of benzenoid hydrocarbons by temperature-based topological indices. Ain Shams Eng. J. 2024;15(3):102586.
  16. Statistical significance of valency-based topological descriptors for correlating thermodynamic properties of benzenoid hydrocarbons with applications. Comput. Theor. Chem. 2023;1227:114259.
  17. On some counting polynomials in chemistry. Discrete Appl. Math. 1988;19:239-257.
  18. Hyper-Zagreb index in fuzzy environment and its application. Heliyon. 2024;10(16):e36110.
  19. Hyper-Wiener index for fuzzy graph and its application in share market. J. Intell. Fuzzy Systems. 2021;41(1):2073-2083.
  20. Multiplicative version of first Zagreb index in fuzzy graph and its application in crime analysis. Proc. Natl. Acad. Sci. India A. 2024;94(1):127-141.
  21. Neighbourhood and competition graphs under fuzzy incidence graph and its application. Comput. Appl. Math. 2024;43(7):411.
  22. Jäntschi, L. A test detecting the outliers for continuous distributions based on the cumulative distribution function of the data being tested. Symmetry. 2019;11(6):835.
  23. Jäntschi, L. Detecting extreme values with order statistics in samples from continuous distributions. Mathematics. 2020;8(2):216.
  24. Interpretation of quantitative structure–property and –activity relationships. J. Chem. Inf. Comput. Sci. 2001;41(3):679-685.
  25. Topological indices and graph entropies for carbon nanotube Y-junctions. J. Math. Chem. 2024;62(1):73-108.
  26. Topological indices of lead sulphide using polynomial technique. Mol. Phys. 2024;122(3):e2249131.
  27. The mass effect in the entropy of solids and gases. J. Am. Chem. Soc. 1921;43(4):818-826.
  28. Face index of silicon carbide structures: An alternative approach. Silicon. 2024;16:5865-5876.
  29. Apparent molar volumes and apparent molar heat capacities of aqueous solutions of N,N-dimethylformamide and N,N-dimethylacetamide at temperatures from 278.15 to 393.15 K and at the pressure 0.35 MPa. J. Chem. Thermodyn. 2001;33(8):917-927.
  30. On general Sombor index of graphs. Asian-Eur. J. Math. 2023;16(3):2350052.
  31. Characterization of molecular branching. J. Am. Chem. Soc. 1975;97:6609-6615.
  32. Vertex-edge partition resolvability for certain carbon nanocones. Polycycl. Aromat. Compd. 2024;44(3):1745-1759.
  33. Molecular Descriptors for Chemoinformatics, Vols. 1 & 2. Weinheim, Germany: Wiley-VCH.
  34. Symmetry-adapted domination indices: The enhanced domination sigma index and its applications in QSPR studies of octane and its isomers. Symmetry. 2023;15(6):1202.
  35. Advancing computational insights: Domination topological indices of polysaccharides using special polynomials and QSPR analysis. Contemp. Math. 2024;5(1):26-49.
  36. Unveiling novel eccentric neighborhood forgotten indices for graphs and graph operations: A comprehensive exploration of boiling point prediction. AIMS Math. 2024;9(1):1128-1165.
  37. Structural determination of the paraffin boiling points. J. Am. Chem. Soc. 1947;69:17-20.
  38. A survey on graphs extremal with respect to distance-based topological indices. MATCH Commun. Math. Comput. Chem. 2014;71:461-508.
  39. On a novel connectivity index. J. Math. Chem. 2009;46:1252-1270.
  40. On general sum-connectivity index. J. Math. Chem. 2010;47:210-218.