matplotlib中plt.hist()参数解释及应用实例_Python

一、plt.hist()参数详解

简介：
plt.hist()：直方图，一种特殊的柱状图。
将统计值的范围分段，即将整个值的范围分成一系列间隔，然后计算每个间隔中有多少值。
直方图也可以被归一化以显示“相对”频率。然后，它显示了属于几个类别中的每个类别的占比，其高度总和等于1。

				?

									import matplotlib as mpl

									import matplotlib.pyplot as plt

									from matplotlib.pyplot import MultipleLocator

									from matplotlib import ticker

									%matplotlib inline

									plt.hist(x, bins=None, range=None, density=None, weights=None, cumulative=False, bottom=None, histtype='bar', align='mid', orientation='vertical', rwidth=None, log=False, color=None, label=None, stacked=False, normed=None, *, data=None, **kwargs)

常用参数解释：
x: 作直方图所要用的数据，必须是一维数组；多维数组可以先进行扁平化再作图；必选参数；
bins: 直方图的柱数，即要分的组数，默认为10；
range：元组(tuple)或None；剔除较大和较小的离群值，给出全局范围；如果为None，则默认为(x.min(), x.max())；即x轴的范围；
density：布尔值。如果为true，则返回的元组的第一个参数n将为频率而非默认的频数；
weights：与x形状相同的权重数组；将x中的每个元素乘以对应权重值再计数；如果normed或density取值为True，则会对权重进行归一化处理。这个参数可用于绘制已合并的数据的直方图；
cumulative：布尔值；如果为True，则计算累计频数；如果normed或density取值为True，则计算累计频率；
bottom：数组，标量值或None；每个柱子底部相对于y=0的位置。如果是标量值，则每个柱子相对于y=0向上/向下的偏移量相同。如果是数组，则根据数组元素取值移动对应的柱子；即直方图上下便宜距离；
histtype：{‘bar’, ‘barstacked’, ‘step’, ‘stepfilled’}；'bar’是传统的条形直方图；'barstacked’是堆叠的条形直方图；'step’是未填充的条形直方图，只有外边框；‘stepfilled’是有填充的直方图；当histtype取值为’step’或’stepfilled’，rwidth设置失效，即不能指定柱子之间的间隔，默认连接在一起；
align：{‘left’, ‘mid’, ‘right’}；‘left’：柱子的中心位于bins的左边缘；‘mid’：柱子位于bins左右边缘之间；‘right’：柱子的中心位于bins的右边缘；
orientation：{‘horizontal’, ‘vertical’}：如果取值为horizontal，则条形图将以y轴为基线，水平排列；简单理解为类似bar()转换成barh()，旋转90°；
rwidth：标量值或None。柱子的宽度占bins宽的比例；
log：布尔值。如果取值为True，则坐标轴的刻度为对数刻度；如果log为True且x是一维数组，则计数为0的取值将被剔除，仅返回非空的(frequency, bins, patches）；
color：具体颜色，数组（元素为颜色）或None。
label：字符串（序列）或None；有多个数据集时，用label参数做标注区分；
stacked：布尔值。如果取值为True，则输出的图为多个数据集堆叠累计的结果；如果取值为False且histtype=‘bar’或’step’，则多个数据集的柱子并排排列；
normed: 是否将得到的直方图向量归一化，即显示占比，默认为0，不归一化；不推荐使用，建议改用density参数；
edgecolor: 直方图边框颜色；
alpha: 透明度；

返回值（用参数接收返回值，便于设置数据标签）：
n：直方图向量，即每个分组下的统计值，是否归一化由参数normed设定。当normed取默认值时，n即为直方图各组内元素的数量（各组频数）；
bins: 返回各个bin的区间范围；
patches：返回每个bin里面包含的数据，是一个list。
其他参数与plt.bar()类似。

二、plt.hist()简单应用

				?

									import matplotlib.pyplot as plt

									%matplotlib inline

									# 最简单，只传递x，组数，宽度，范围

									plt.hist(data13['carrier_no'], bins=11, rwidth=0.8, range=(1,12), align='left')

									plt.show()

matplotlib中plt.hist()参数解释及应用实例

三、plt.bar()综合应用

				?

									import matplotlib as mpl

									import matplotlib.pyplot as plt

									from matplotlib.pyplot import MultipleLocator

									from matplotlib import ticker

									%matplotlib inline

									plt.figure(figsize=(8,5), dpi=80)

									# 拿参数接收hist返回值，主要用于记录分组返回的值，标记数据标签

									n, bins, patches = plt.hist(data13['carrier_no'], bins=11, rwidth=0.8, range=(1,12), align='left', label='xx直方图')

									for i in range(len(n)):

									    plt.text(bins[i], n[i]*1.02, int(n[i]), fontsize=12, horizontalalignment="center") #打标签，在合适的位置标注每个直方图上面样本数

									plt.ylim(0,16000)

									plt.title('直方图')

									plt.legend()

									# plt.savefig('直方图'+'.png')

									plt.show()

matplotlib中plt.hist()参数解释及应用实例

附官方参数解释

				?

									Parameters

									----------

									x : (n,) array or sequence of (n,) arrays

									    Input values, this takes either a single array or a sequence of

									    arrays which are not required to be of the same length.

									bins : int or sequence or str, optional

									    If an integer is given, ``bins + 1`` bin edges are calculated and

									    returned, consistent with `numpy.histogram`.

									    If `bins` is a sequence, gives bin edges, including left edge of

									    first bin and right edge of last bin.  In this case, `bins` is

									    returned unmodified.

									    All but the last (righthand-most) bin is half-open.  In other

									    words, if `bins` is::

									        [1, 2, 3, 4]

									    then the first bin is ``[1, 2)`` (including 1, but excluding 2) and

									    the second ``[2, 3)``.  The last bin, however, is ``[3, 4]``, which

									    *includes* 4.

									    Unequally spaced bins are supported if *bins* is a sequence.

									    With Numpy 1.11 or newer, you can alternatively provide a string

									    describing a binning strategy, such as 'auto', 'sturges', 'fd',

									    'doane', 'scott', 'rice' or 'sqrt', see

									    `numpy.histogram`.

									    The default is taken from :rc:`hist.bins`.

									range : tuple or None, optional

									    The lower and upper range of the bins. Lower and upper outliers

									    are ignored. If not provided, *range* is ``(x.min(), x.max())``.

									    Range has no effect if *bins* is a sequence.

									    If *bins* is a sequence or *range* is specified, autoscaling

									    is based on the specified bin range instead of the

									    range of x.

									    Default is ``None``

									density : bool, optional

									    If ``True``, the first element of the return tuple will

									    be the counts normalized to form a probability density, i.e.,

									    the area (or integral) under the histogram will sum to 1.

									    This is achieved by dividing the count by the number of

									    observations times the bin width and not dividing by the total

									    number of observations. If *stacked* is also ``True``, the sum of

									    the histograms is normalized to 1.

									    Default is ``None`` for both *normed* and *density*. If either is

									    set, then that value will be used. If neither are set, then the

									    args will be treated as ``False``.

									    If both *density* and *normed* are set an error is raised.

									weights : (n, ) array_like or None, optional

									    An array of weights, of the same shape as *x*.  Each value in *x*

									    only contributes its associated weight towards the bin count

									    (instead of 1).  If *normed* or *density* is ``True``,

									    the weights are normalized, so that the integral of the density

									    over the range remains 1.

									    Default is ``None``.

									    This parameter can be used to draw a histogram of data that has

									    already been binned, e.g. using `np.histogram` (by treating each

									    bin as a single point with a weight equal to its count) ::

									        counts, bins = np.histogram(data)

									        plt.hist(bins[:-1], bins, weights=counts)

									    (or you may alternatively use `~.bar()`).

									cumulative : bool, optional

									    If ``True``, then a histogram is computed where each bin gives the

									    counts in that bin plus all bins for smaller values. The last bin

									    gives the total number of datapoints. If *normed* or *density*

									    is also ``True`` then the histogram is normalized such that the

									    last bin equals 1. If *cumulative* evaluates to less than 0

									    (e.g., -1), the direction of accumulation is reversed.

									    In this case, if *normed* and/or *density* is also ``True``, then

									    the histogram is normalized such that the first bin equals 1.

									    Default is ``False``

									bottom : array_like, scalar, or None

									    Location of the bottom baseline of each bin.  If a scalar,

									    the base line for each bin is shifted by the same amount.

									    If an array, each bin is shifted independently and the length

									    of bottom must match the number of bins.  If None, defaults to 0.

									    Default is ``None``

									histtype : {'bar', 'barstacked', 'step',  'stepfilled'}, optional

									    The type of histogram to draw.

									    - 'bar' is a traditional bar-type histogram.  If multiple data

									      are given the bars are arranged side by side.

									    - 'barstacked' is a bar-type histogram where multiple

									      data are stacked on top of each other.

									    - 'step' generates a lineplot that is by default

									      unfilled.

									    - 'stepfilled' generates a lineplot that is by default

									      filled.

									    Default is 'bar'

									align : {'left', 'mid', 'right'}, optional

									    Controls how the histogram is plotted.

									        - 'left': bars are centered on the left bin edges.

									        - 'mid': bars are centered between the bin edges.

									        - 'right': bars are centered on the right bin edges.

									    Default is 'mid'

									orientation : {'horizontal', 'vertical'}, optional

									    If 'horizontal', `~matplotlib.pyplot.barh` will be used for

									    bar-type histograms and the *bottom* kwarg will be the left edges.

									rwidth : scalar or None, optional

									    The relative width of the bars as a fraction of the bin width.  If

									    ``None``, automatically compute the width.

									    Ignored if *histtype* is 'step' or 'stepfilled'.

									    Default is ``None``

									log : bool, optional

									    If ``True``, the histogram axis will be set to a log scale. If

									    *log* is ``True`` and *x* is a 1D array, empty bins will be

									    filtered out and only the non-empty ``(n, bins, patches)``

									    will be returned.

									    Default is ``False``

									color : color or array_like of colors or None, optional

									    Color spec or sequence of color specs, one per dataset.  Default

									    (``None``) uses the standard line color sequence.

									    Default is ``None``

									label : str or None, optional

									    String, or sequence of strings to match multiple datasets.  Bar

									    charts yield multiple patches per dataset, but only the first gets

									    the label, so that the legend command will work as expected.

									    default is ``None``

									stacked : bool, optional

									    If ``True``, multiple data are stacked on top of each other If

									    ``False`` multiple data are arranged side by side if histtype is

									    'bar' or on top of each other if histtype is 'step'

									    Default is ``False``

									normed : bool, optional

									    Deprecated; use the density keyword argument instead.

									Returns

									-------

									n : array or list of arrays

									    The values of the histogram bins. See *density* and *weights* for a

									    description of the possible semantics.  If input *x* is an array,

									    then this is an array of length *nbins*. If input is a sequence of

									    arrays ``[data1, data2,..]``, then this is a list of arrays with

									    the values of the histograms for each of the arrays in the same

									    order.  The dtype of the array *n* (or of its element arrays) will

									    always be float even if no weighting or normalization is used.

									bins : array

									    The edges of the bins. Length nbins + 1 (nbins left edges and right

									    edge of last bin).  Always a single array even when multiple data

									    sets are passed in.

									patches : list or list of lists

									    Silent list of individual patches used to create the histogram

									    or list of such list if multiple input datasets.

									Other Parameters

									----------------

									**kwargs : `~matplotlib.patches.Patch` properties

									See also

									--------

									hist2d : 2D histograms

									Notes

									-----

									.. note::

									    In addition to the above described arguments, this function can take a

									    **data** keyword argument. If such a **data** argument is given, the

									    following arguments are replaced by **data[<arg>]**:

									    * All arguments with the following names: 'weights', 'x'.

									    Objects passed as **data** must support item access (``data[<arg>]``) and

									    membership test (``<arg> in data``).