JKQTPlotter trunk/v5.0.0
an extensive Qt5+Qt6 Plotter framework (including a feature-rich plotter widget, a speed-optimized, but limited variant and a LaTeX equation renderer!), written fully in C/C++ and without external dependencies
Tutorial (JKQTPDatastore): 1-Dimensional Group Statistics with JKQTPDatastore

This tutorial project (see ./examples/datastore_groupedstat/) explains several advanced functions of JKQTPDatastore in combination with the [JKQTPlotter Statistics Library] contained in JKQTPlotter.

Note that there are additional tutorials explaining other aspects of data management in JKQTPDatastore.

The source code of the main application can be found in datastore_groupedstat.cpp. This tutorial cites only parts of this code to demonstrate different ways of working with data for the graphs.

Barcharts & Boxplots from categorized data

Generating a Dataset for Grouped Barcharts

To demonstrate the grouped statistics, we first have to generate a dataset. The datapoints consist of pairs <group,value>, where the groups are encoded by the numbers 1,2,3 and in each group, several measurements are taken:

size_t colBarRawGroup=datastore1->addColumn("barchart, rawdata, group");
size_t colBarRawValue=datastore1->addColumn("barchart, rawdata, value");
// data for group 1
datastore1->appendToColumns(colBarRawGroup, colBarRawValue, 1, 1.1);
datastore1->appendToColumns(colBarRawGroup, colBarRawValue, 1, 1.5);
datastore1->appendToColumns(colBarRawGroup, colBarRawValue, 1, 0.8);
// ...
// data for group 2
datastore1->appendToColumns(colBarRawGroup, colBarRawValue, 2, 2.2);
// ...
// data for group 3
datastore1->appendToColumns(colBarRawGroup, colBarRawValue, 3, 4.1);
// ...

Note that the data does not have to be sorted. You can add the dataset in any order!

This dataset can be visualized with a simple scatter plot:

JKQTPXYLineGraph* gScatterForBar;
plotbarchart->addGraph(gScatterForBar=new JKQTPXYLineGraph(plotbarchart));
gScatterForBar->setXYColumns(colBarRawGroup, colBarRawValue);
gScatterForBar->setDrawLine(false);
gScatterForBar->setSymbolType(JKQTPCross);
gScatterForBar->setSymbolSize(5);
gScatterForBar->setSymbolColor(QColorWithAlphaF(QColor("red"), 0.5));

The resulting plot looks like this:

datastore_groupedstat_barchartrawdata

Calculating Grouped Statistics for a Barchart

Now we want to draw a barchart with one bar per group, which indicates the average in each group. This is done using methods from the statistics library. First we need to group the data using jkqtpstatGroupData(), which assembles the data points of each group in the map groupeddataBar:

std::map<double, std::vector<double> > groupeddataBar;
jkqtpstatGroupData(datastore1->begin(colBarRawGroup), datastore1->end(colBarRawGroup),
                   datastore1->begin(colBarRawValue), datastore1->end(colBarRawValue),
                   groupeddataBar);
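
To make the grouping step more tangible, here is a minimal standalone sketch of the idea in plain C++, without JKQTPlotter. The function name groupPairs is hypothetical and not part of the library, and the sketch only mimics the identity grouping (every distinct category value forms its own group). It is not the library's implementation.

```cpp
#include <map>
#include <vector>

// Hypothetical helper (not JKQTPlotter API): walks two ranges in lockstep and
// collects each value under its category key. Stops when either range ends.
template <class CatIt, class ValIt>
std::map<double, std::vector<double>> groupPairs(CatIt firstCat, CatIt lastCat,
                                                 ValIt firstVal, ValIt lastVal)
{
    std::map<double, std::vector<double>> grouped;
    for (; firstCat != lastCat && firstVal != lastVal; ++firstCat, ++firstVal) {
        // operator[] creates an empty vector the first time a category is seen
        grouped[static_cast<double>(*firstCat)].push_back(static_cast<double>(*firstVal));
    }
    return grouped;
}
```

Since std::map keeps its keys ordered, iterating over the result visits the groups in ascending order, no matter in which order the pairs were added.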

Now we can calculate the statistics for each group separately: Data is collected in the new columns colBarGroup, colBarAverage and colBarStdDev. The statistics are then calculated by simply iterating over groupeddataBar and calling functions like jkqtpstatAverage() for each group:

size_t colBarGroup=datastore1->addColumn("barchart, group");
size_t colBarAverage=datastore1->addColumn("barchart, group-average");
size_t colBarStdDev=datastore1->addColumn("barchart, group-stddev");
for (auto it=groupeddataBar.begin(); it!=groupeddataBar.end(); ++it) {
    datastore1->appendToColumn(colBarGroup, it->first);
    datastore1->appendToColumn(colBarAverage, jkqtpstatAverage(it->second.begin(), it->second.end()));
    datastore1->appendToColumn(colBarStdDev, jkqtpstatStdDev(it->second.begin(), it->second.end()));
}
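
As a rough illustration of what is computed per group, the following standalone sketch calculates an average and a standard deviation for one group's values. The names meanOf and sampleStdDevOf are hypothetical. The sketch assumes the sample definition of the standard deviation (division by N-1) and returns 0 for fewer than two values. Whether jkqtpstatStdDev() uses the same conventions should be checked in the library documentation.

```cpp
#include <cmath>
#include <limits>
#include <vector>

// Hypothetical helper (not JKQTPlotter API): arithmetic mean, NaN for empty input.
double meanOf(const std::vector<double>& v)
{
    if (v.empty()) return std::numeric_limits<double>::quiet_NaN();
    double sum = 0.0;
    for (double x : v) sum += x;
    return sum / static_cast<double>(v.size());
}

// Hypothetical helper: sample standard deviation (divides by N-1).
// Assumption of this sketch: fewer than two values yield 0.
double sampleStdDevOf(const std::vector<double>& v)
{
    if (v.size() < 2) return 0.0;
    const double m = meanOf(v);
    double sumSq = 0.0;
    for (double x : v) sumSq += (x - m) * (x - m);
    return std::sqrt(sumSq / static_cast<double>(v.size() - 1));
}
```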

Finally the calculated groups are drawn:

JKQTPBarVerticalErrorGraph* gBar;
plotbarchart->addGraph(gBar=new JKQTPBarVerticalErrorGraph(plotbarchart));
gBar->setXYColumns(colBarGroup, colBarAverage);
gBar->setYErrorColumn(static_cast<int>(colBarStdDev));

The resulting plot looks like this:

datastore_groupedstat_barchart

To save you from typing the code above, shortcuts in the form of adaptors exist:

jkqtpstatAddYErrorBarGraph(plotbarchart->getPlotter(),
                           datastore1->begin(colBarRawGroup), datastore1->end(colBarRawGroup),
                           datastore1->begin(colBarRawValue), datastore1->end(colBarRawValue));

Other flavors of this adaptor exist that generate different graphs (see the JKQTPlotter documentation).

Calculating Grouped Statistics for a Boxplot

With the methods above we can also calculate more advanced statistics, for example the values needed for a boxplot:

size_t colBarMedian=datastore1->addColumn("barchart, group-median");
size_t colBarMin=datastore1->addColumn("barchart, group-min");
size_t colBarMax=datastore1->addColumn("barchart, group-max");
size_t colBarQ25=datastore1->addColumn("barchart, group-Q25");
size_t colBarQ75=datastore1->addColumn("barchart, group-Q75");
for (auto it=groupeddataBar.begin(); it!=groupeddataBar.end(); ++it) {
    datastore1->appendToColumn(colBarMedian, jkqtpstatMedian(it->second.begin(), it->second.end()));
    datastore1->appendToColumn(colBarMin, jkqtpstatMinimum(it->second.begin(), it->second.end()));
    datastore1->appendToColumn(colBarMax, jkqtpstatMaximum(it->second.begin(), it->second.end()));
    datastore1->appendToColumn(colBarQ25, jkqtpstatQuantile(it->second.begin(), it->second.end(), 0.25));
    datastore1->appendToColumn(colBarQ75, jkqtpstatQuantile(it->second.begin(), it->second.end(), 0.75));
}
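
The five values collected per group form the so-called five-number summary. The following standalone sketch shows one way to compute it for a single non-empty group. The names FiveNumberSummary, quantileSorted and summarize are hypothetical. Quantile conventions differ between implementations: this sketch interpolates linearly between neighboring sorted values, which may not match what jkqtpstatQuantile() does.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical container for the values a boxplot needs.
struct FiveNumberSummary {
    double min, q25, median, q75, max;
};

// Hypothetical helper: quantile q (0..1) of a sorted, non-empty vector, using
// linear interpolation between the two neighboring order statistics.
double quantileSorted(const std::vector<double>& sorted, double q)
{
    const double pos = q * static_cast<double>(sorted.size() - 1);
    const std::size_t lo = static_cast<std::size_t>(std::floor(pos));
    const std::size_t hi = std::min(lo + 1, sorted.size() - 1);
    const double frac = pos - static_cast<double>(lo);
    return sorted[lo] + frac * (sorted[hi] - sorted[lo]);
}

// Takes the data by value so the caller's vector stays unsorted.
// Precondition of this sketch: data is not empty.
FiveNumberSummary summarize(std::vector<double> data)
{
    std::sort(data.begin(), data.end());
    return { data.front(),
             quantileSorted(data, 0.25),
             quantileSorted(data, 0.5),
             quantileSorted(data, 0.75),
             data.back() };
}
```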

The result can be plotted using JKQTPBoxplotVerticalGraph, which receives a column for each value class of the final plot:

JKQTPBoxplotVerticalGraph* gBoxplot;
plotboxplot->addGraph(gBoxplot=new JKQTPBoxplotVerticalGraph(plotboxplot));
gBoxplot->setPositionColumn(colBarGroup);
gBoxplot->setMinColumn(colBarMin);
gBoxplot->setMaxColumn(colBarMax);
gBoxplot->setMedianColumn(colBarMedian);
gBoxplot->setPercentile25Column(colBarQ25);
gBoxplot->setPercentile75Column(colBarQ75);

The resulting plot looks like this:

datastore_groupedstat_boxplot

To save you from typing the code above, shortcuts in the form of adaptors exist:

jkqtpstatAddHBoxplotsAndOutliers(plotboxplot->getPlotter(),
                                 datastore1->begin(colBarRawGroup), datastore1->end(colBarRawGroup),
                                 datastore1->begin(colBarRawValue), datastore1->end(colBarRawValue));

Other flavors of this adaptor exist that generate different graphs (see the JKQTPlotter documentation).

(Scatter-)Graphs with X/Y-errors from Categorized Data

Dataset for XY Scatter Graphs

First we generate a second dataset, which is going to be used for a scatter plot. The datapoints consist of pairs <x,y> that are based on a parabola with random deviations in both x- and y-direction:

size_t colScatterRawX=datastore1->addColumn("scatterplot, rawdata, x");
size_t colScatterRawY=datastore1->addColumn("scatterplot, rawdata, y");
std::random_device rd; // random number generators:
std::mt19937 gen{rd()};
std::normal_distribution<> d1{0,0.5};
const size_t N=100;
const double xmax=3.5;
for (size_t i=0; i<N; i++) {
    const double x=(static_cast<double>(i)-static_cast<double>(N)/2.0)*xmax/(static_cast<double>(N)/2.0);
    const double y=jkqtp_sqr(x)+2.0;
    datastore1->appendToColumns(colScatterRawX, colScatterRawY, x+d1(gen), y+d1(gen));
}

This dataset can be visualized:

JKQTPXYParametrizedScatterGraph* gScatterRaw;
plotscattererrors->addGraph(gScatterRaw=new JKQTPXYParametrizedScatterGraph(plotscattererrors));
gScatterRaw->setXYColumns(colScatterRawX, colScatterRawY);
gScatterRaw->setDrawLine(false);
gScatterRaw->setSymbolType(JKQTPCross);
gScatterRaw->setSymbolSize(5);

The resulting plot looks like this:

datastore_groupedstat_scatterrawdata

Calculating x- and y-Errors from Categorized Data

Now we want to draw a scatter chart of the data in which the data points are grouped into x-intervals of width 0.5. From all the points in each interval, we calculate the average and standard deviation in both x- and y-direction. First we need to group the data using jkqtpstatGroupData(), which assembles the data points of each group in groupeddataScatter. For the custom grouping of the data points we use the optional functor parameter of jkqtpstatGroupData(): we use jkqtpstatGroupingCustomRound1D() with the parameters 0.25 for the (center) location of the first bin and 0.5 for the bin width. The functor is not built by hand (which would be possible using std::bind), but with the generator function jkqtpstatMakeGroupingCustomRound1D(). In addition we use a variant of jkqtpstatGroupData() that outputs a column with the category assigned to every data pair in the input data range:

std::map<double, std::pair<std::vector<double>,std::vector<double> > > groupeddataScatter;
size_t colScatterRawGroup=datastore1->addColumn("scatterplot, rawdata, assigned-group");
jkqtpstatGroupData(datastore1->begin(colScatterRawX), datastore1->end(colScatterRawX),
                   datastore1->begin(colScatterRawY), datastore1->end(colScatterRawY),
                   datastore1->backInserter(colScatterRawGroup),
                   groupeddataScatter,
                   jkqtpstatMakeGroupingCustomRound1D(0.25, 0.5));
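
To illustrate how such a rounding-based grouping functor can work, here is a standalone sketch that maps a value to the center of the bin it falls into, together with a generator that fixes the two parameters. The names groupCustomRound, makeGroupCustomRound and makeGroupCustomRoundBind are hypothetical. The rounding formula and the use of std::function<double(double)> as functor type are assumptions derived from the description above (first bin center and bin width), not taken from the library source.

```cpp
#include <cmath>
#include <functional>

// Hypothetical stand-in for a rounding-based grouping function: returns the
// center of the bin (of width groupWidth) that contains v, where one bin is
// centered at firstGroupCenter.
double groupCustomRound(double v, double firstGroupCenter, double groupWidth)
{
    return firstGroupCenter + std::round((v - firstGroupCenter) / groupWidth) * groupWidth;
}

// Generator variant using a lambda: binds the two parameters and returns a
// functor that only takes the value to be grouped.
std::function<double(double)> makeGroupCustomRound(double firstGroupCenter, double groupWidth)
{
    return [=](double v) { return groupCustomRound(v, firstGroupCenter, groupWidth); };
}

// The same functor, built "by hand" with std::bind.
std::function<double(double)> makeGroupCustomRoundBind(double firstGroupCenter, double groupWidth)
{
    using namespace std::placeholders;
    return std::bind(&groupCustomRound, _1, firstGroupCenter, groupWidth);
}
```

With a first center of 0.25 and a width of 0.5, all x-values between 0 and 0.5 are assigned to the group 0.25, all values between 0.5 and 1 to the group 0.75, and so on.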

The column colScatterRawGroup can now be used to color the scatter graph:

gScatterRaw->setColorColumn(colScatterRawGroup);

Now we can calculate the statistics for each group separately: Data is collected in four new columns. Then the statistics are calculated by simply iterating over groupeddataScatter and calling functions like jkqtpstatAverage() for each group:

size_t colScatterXAvg=datastore1->addColumn("scatter, x, average");
size_t colScatterXStd=datastore1->addColumn("scatter, x, stddev");
size_t colScatterYAvg=datastore1->addColumn("scatter, y, average");
size_t colScatterYStd=datastore1->addColumn("scatter, y, stddev");
for (auto it=groupeddataScatter.begin(); it!=groupeddataScatter.end(); ++it) {
    datastore1->appendToColumn(colScatterXAvg, jkqtpstatAverage(it->second.first.begin(), it->second.first.end()));
    datastore1->appendToColumn(colScatterXStd, jkqtpstatStdDev(it->second.first.begin(), it->second.first.end()));
    datastore1->appendToColumn(colScatterYAvg, jkqtpstatAverage(it->second.second.begin(), it->second.second.end()));
    datastore1->appendToColumn(colScatterYStd, jkqtpstatStdDev(it->second.second.begin(), it->second.second.end()));
}

Finally the calculated groups are drawn:

JKQTPXYLineErrorGraph* gScatterErr;
plotscattererrors->addGraph(gScatterErr=new JKQTPXYLineErrorGraph(plotscattererrors));
gScatterErr->setXYColumns(colScatterXAvg, colScatterYAvg);
gScatterErr->setXErrorColumn(static_cast<int>(colScatterXStd));
gScatterErr->setYErrorColumn(static_cast<int>(colScatterYStd));
gScatterErr->setDrawLine(false);

The resulting plot looks like this:

datastore_groupedstat_scatter

To save you from typing the code above, shortcuts in the form of adaptors exist:

jkqtpstatAddXYErrorLineGraph(plotscattererrors->getPlotter(),
                             datastore1->begin(colScatterRawX), datastore1->end(colScatterRawX),
                             datastore1->begin(colScatterRawY), datastore1->end(colScatterRawY),
                             jkqtpstatMakeGroupingCustomRound1D(0.25, 0.5));

Other flavors of this adaptor exist that generate different graphs (see the JKQTPlotter documentation).

Screenshot of the full Program

The output of the full test program datastore_groupedstat.cpp looks like this:

datastore_groupedstat