DataFrames from Python Structures
A MultiIndex can be created from a list of arrays using MultiIndex.from_arrays, from an array of tuples using MultiIndex.from_tuples, or from a crossed set of iterables using MultiIndex.from_product.
The Index constructor will attempt to return a MultiIndex when it is passed a list of tuples. The following examples demonstrate different ways to initialize MultiIndexes. When you want every pairing of the elements in two iterables, it can be easier to use the MultiIndex.from_product method. As a convenience, you can pass a list of arrays directly into Series or DataFrame to construct a MultiIndex automatically. All of the MultiIndex constructors accept a names argument, which stores string names for the levels themselves.
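The constructors described above can be sketched as follows. The labels ("bar", "baz", "one", "two") and the level names ("first", "second") are illustrative assumptions, not values taken from the text.

```python
import pandas as pd

# Hypothetical labels: one array per level.
arrays = [["bar", "bar", "baz", "baz"], ["one", "two", "one", "two"]]

# From a list of arrays.
mi_arrays = pd.MultiIndex.from_arrays(arrays, names=["first", "second"])

# From a list of tuples, one tuple per complete key.
tuples = list(zip(*arrays))
mi_tuples = pd.MultiIndex.from_tuples(tuples, names=["first", "second"])

# From every pairing of the elements in two iterables.
mi_product = pd.MultiIndex.from_product(
    [["bar", "baz"], ["one", "two"]], names=["first", "second"]
)

# The plain Index constructor returns a MultiIndex for a list of tuples.
mi_from_index = pd.Index(tuples)

# Passing a list of arrays as the index builds a MultiIndex automatically.
s = pd.Series(range(4), index=arrays)
```

All three explicit constructors produce the same index here; which one is most convenient depends on the shape of the data you start from.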
If no names are provided, None will be assigned. This index can back any axis of a pandas object, and the number of levels of the index is up to you. The MultiIndex matters because it lets you perform grouping, selection, and reshaping operations, as described below and in subsequent areas of the documentation.
As you will see in later sections, you can find yourself working with hierarchically-indexed data without creating a MultiIndex explicitly yourself. However, when loading data from a file, you may wish to generate your own MultiIndex when preparing the data set. See Cross-section with hierarchical index for how to select on a deeper level.
The repr of a MultiIndex shows all the defined levels of an index, even if they are not actually used. When slicing an index, you may notice this. This is done to avoid a recomputation of the levels in order to make slicing highly performant. If you want to see only the used levels, you can use the MultiIndex.get_level_values method, or rebuild the index with MultiIndex.remove_unused_levels. Operations between differently-indexed objects having MultiIndex on the axes will work as you expect; data alignment will work the same as an Index of tuples. Syntactically integrating MultiIndex in advanced indexing with .loc is a bit challenging.
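A minimal sketch of the unused-levels behaviour, assuming a small hypothetical frame with two-level columns ("A"/"B" over "x"/"y"):

```python
import numpy as np
import pandas as pd

# Hypothetical two-level columns.
columns = pd.MultiIndex.from_product([["A", "B"], ["x", "y"]])
df = pd.DataFrame(np.arange(8).reshape(2, 4), columns=columns)

# Keep only the "A" columns; the result still has a MultiIndex.
sub = df[["A"]]

# The stored levels still list "B", although no column uses it.
defined_levels = list(sub.columns.levels[0])

# The labels actually present on the first level.
used_values = list(sub.columns.get_level_values(0))

# Rebuild the index so that only used levels remain.
trimmed_levels = list(sub.columns.remove_unused_levels().levels[0])
```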
In general, MultiIndex keys take the form of tuples. If you also want to index a specific column with .loc, pass the row key as a tuple and the column label as a separate argument. Selecting with only the first level's label (partial indexing) is a shortcut for the slightly more verbose notation in which that label is wrapped in a one-element tuple. It is important to note that tuples and lists are not treated identically in pandas when it comes to indexing. Whereas a tuple is interpreted as one multi-level key, a list is used to specify several keys.
In other words, tuples go horizontally (traversing levels), while lists go vertically (scanning levels). Importantly, a list of tuples indexes several complete MultiIndex keys, whereas a tuple of lists refers to several values within a level.
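The tuple-versus-list distinction can be illustrated with a small sketch. The Series, its labels, and the level names below are assumptions made for the example.

```python
import pandas as pd

# Hypothetical two-level index: ("A", "c"), ("A", "d"), ("B", "c"), ("B", "d").
index = pd.MultiIndex.from_product(
    [["A", "B"], ["c", "d"]], names=["upper", "lower"]
)
s = pd.Series([1, 2, 3, 4], index=index)

# A tuple is one multi-level key: it traverses the levels.
one_key = s.loc[("A", "c")]

# A list of tuples selects several complete keys.
several_keys = s.loc[[("A", "d"), ("B", "d")]]

# A tuple of lists selects several values within each level.
within_levels = s.loc[(["A", "B"], ["c", "d"])]
```

Here `several_keys` holds two rows, while `within_levels` holds every combination of the listed values, which is all four rows.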
You can slice a MultiIndex by providing multiple indexers. You can provide any of the selectors as if you are indexing by label (see Selection by Label), including slices, lists of labels, labels, and boolean indexers. You can use slice(None) to select all the contents of that level. You do not need to specify all the deeper levels; they will be implied as slice(None). You should specify all axes in the .loc specifier, meaning the indexer for the index and for the columns.
There are some ambiguous cases where the passed indexer could be misinterpreted as indexing both axes, rather than into, say, the MultiIndex for the rows. You can use pandas.IndexSlice to facilitate a more natural syntax using : rather than slice(None). It is possible to perform quite complicated selections using this method on multiple axes at the same time.
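A short sketch of slicing with multiple indexers, assuming a hypothetical frame with a two-level index ("outer", "inner") and two-level columns. Both axes are built already sorted, which label slicing requires.

```python
import numpy as np
import pandas as pd

# Hypothetical, already-sorted MultiIndexes on both axes.
index = pd.MultiIndex.from_product(
    [["a", "b"], [1, 2, 3]], names=["outer", "inner"]
)
columns = pd.MultiIndex.from_product([["x", "y"], ["hi", "lo"]])
df = pd.DataFrame(np.arange(24).reshape(6, 4), index=index, columns=columns)

# slice(None) selects everything on the first level; both axes are specified.
with_slice = df.loc[(slice(None), [1, 3]), :]

# IndexSlice allows the more natural ":" syntax for the same selection.
idx = pd.IndexSlice
with_idx = df.loc[idx[:, [1, 3]], :]

# Selecting on both axes at once: rows ("a", 2..3), columns ending in "lo".
both_axes = df.loc[idx["a", 2:3], idx[:, "lo"]]
```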
You can also specify the axis argument to .loc to interpret the passed slicers on a single axis. The xs method of DataFrame additionally takes a level argument to make selecting data at a particular level of a MultiIndex easier.
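As an illustration, the sketch below applies both approaches to a hypothetical single-column frame; the column name "value" and the level names are assumptions.

```python
import numpy as np
import pandas as pd

index = pd.MultiIndex.from_product(
    [["a", "b"], [1, 2, 3]], names=["outer", "inner"]
)
df = pd.DataFrame({"value": np.arange(6)}, index=index)

# With axis=0, every slicer is interpreted against the row MultiIndex.
rows = df.loc(axis=0)[:, [1, 3]]

# xs selects a cross-section at a named level and drops that level.
inner_2 = df.xs(2, level="inner")
```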
You can also select on the columns with xs by providing the axis argument. The parameter level has been added to the reindex and align methods of pandas objects. This is useful to broadcast values across a level. The swaplevel function can switch the order of two levels. For MultiIndex-ed objects to be indexed and sliced effectively, they need to be sorted. On higher dimensional objects, you can sort any of the other axes by level if they have a MultiIndex.
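A rough sketch of column cross-sections, level swapping, and sorting. The data and the level names ("outer", "inner", "group", "kind") are assumptions made for the example.

```python
import pandas as pd

# Hypothetical, deliberately unsorted two-level index.
index = pd.MultiIndex.from_tuples(
    [("b", 2), ("a", 1), ("b", 1), ("a", 2)], names=["outer", "inner"]
)
s = pd.Series([10, 20, 30, 40], index=index)

# The index is not lexsorted yet.
was_sorted = s.index.is_monotonic_increasing

# Sort by all levels, or by one level first.
sorted_s = s.sort_index()
by_inner = s.sort_index(level="inner")

# Swap the order of the two levels; the data order is unchanged.
swapped = s.swaplevel(0, 1)

# Cross-section on the columns by passing axis=1.
cols = pd.MultiIndex.from_product(
    [["x", "y"], ["hi", "lo"]], names=["group", "kind"]
)
df = pd.DataFrame([[1, 2, 3, 4]], columns=cols)
lo_only = df.xs("lo", level="kind", axis=1)
```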
Indexing will work even if the data are not sorted, but will be rather inefficient and show a PerformanceWarning. It will also return a copy of the data rather than a view. Furthermore, if you try to index something that is not fully lexsorted, this can raise an UnsortedIndexError.
Similar to NumPy ndarrays, pandas Index, Series, and DataFrame also provide the take method, which retrieves elements along a given axis at the given indices.
The given indices must be either a list or an ndarray of integer index positions. For DataFrames, the given indices should be a 1d list or ndarray that specifies row or column positions.
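A minimal sketch of take on an Index, a Series, and a DataFrame. The values and column names are assumptions chosen for illustration.

```python
import numpy as np
import pandas as pd

index = pd.Index([10, 20, 30, 40, 50])
positions = [0, 3, 4]  # integer positions, not labels

# Index.take returns the elements at the given positions.
taken_index = index.take(positions)

# Series.take is positional, like iloc with a list of integers.
s = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0], index=index)
taken_series = s.take(positions)

# For a DataFrame, positions refer to rows by default, or columns with axis=1.
df = pd.DataFrame(np.arange(10).reshape(5, 2), columns=["A", "B"])
taken_rows = df.take([1, 4])
taken_cols = df.take([1], axis=1)
```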
It is important to note that the take method on pandas objects is not intended to work on boolean indices and may return unexpected results. Finally, as a small note on performance, because the take method handles a narrower range of inputs, it can be a good deal faster than fancy indexing.
We have discussed MultiIndex pretty extensively in the previous sections. CategoricalIndex is a type of index that is useful for supporting indexing with duplicates.
This is a container around a Categorical and allows efficient indexing and storage of an index with a large number of duplicated elements. Setting the index will create a CategoricalIndex.
The indexers must be in the category or the operation will raise a KeyError. The CategoricalIndex is preserved after indexing. Sorting the index will sort by the order of the categories (recall that we created the index with CategoricalDtype(list('cab')), so the sorted order is cab). Reindexing operations will return a resulting index based on the type of the passed indexer. Passing a list will return a plain-old Index; indexing with a Categorical will return a CategoricalIndex, indexed according to the categories of the passed Categorical dtype.
This allows one to arbitrarily index these even with values not in the categories, similarly to how you can reindex any pandas index.
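The CategoricalIndex behaviour described above can be sketched as follows. The frames, the column names "A" and "B", and the missing label "e" are assumptions for illustration; only the CategoricalDtype(list("cab")) ordering comes from the text.

```python
import pandas as pd
from pandas.api.types import CategoricalDtype

# Hypothetical frame; "B" uses the category order c, a, b.
df = pd.DataFrame({"A": range(6), "B": list("aabbca")})
df["B"] = df["B"].astype(CategoricalDtype(list("cab")))

# Setting the index creates a CategoricalIndex.
df2 = df.set_index("B")

# Indexing with a label in the categories keeps the CategoricalIndex.
rows_a = df2.loc["a"]

# A label outside the categories raises a KeyError.
try:
    df2.loc["e"]
    missing_raised = False
except KeyError:
    missing_raised = True

# Sorting follows the category order, not alphabetical order.
sorted_labels = list(df2.sort_index().index)

# Reindexing: the result's index type follows the type of the indexer.
df3 = pd.DataFrame(
    {"A": [0, 1, 2], "B": pd.Series(list("abc")).astype("category")}
).set_index("B")
plain = df3.reindex(["a", "e"])
categorical = df3.reindex(pd.Categorical(["a", "e"], categories=list("abe")))
```

Note that the reindex example uses a frame with unique index labels, since reindexing an index with duplicates is not supported.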
Reshaping and comparison operations on CategoricalIndex objects require that they have the same categories; otherwise a TypeError will be raised. Indexing on an integer-based Index with floats has been clarified in 0. Int64Index is a fundamental basic index in pandas. This is an immutable array implementing an ordered, sliceable set. RangeIndex is a sub-class of Int64Index added in version 0. RangeIndex is an optimized version of Int64Index that can represent a monotonic ordered set.
These are analogous to Python range types. By default a Float64Index will be automatically created when passing floating, or mixed-integer-floating values in index creation. This enables a pure label-based slicing paradigm that makes [], ix, and loc work exactly the same for scalar indexing and slicing.
Scalar selection for [] and .loc will always be label based. An integer will match an equal float index (e.g. 3 is equivalent to 3.0). The only positional indexing is via iloc. A scalar index that is not found will raise a KeyError.
Slicing is primarily on the values of the index when using [], ix, or loc, and is always positional when using iloc.
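A small sketch of label-based versus positional access on a float index. The index values are assumptions, and the example uses only [], loc, and iloc, since ix has been removed from recent pandas releases.

```python
import pandas as pd

# Hypothetical Series with a monotonic float index.
s = pd.Series(range(5), index=[1.5, 2.0, 3.0, 4.5, 5.0])

# Scalar selection is label based; an integer matches the equal float label.
by_int_label = s.loc[3]
by_float_label = s[3.0]

# The only positional access is via iloc.
by_position = s.iloc[3]

# Label slices work on index values; the bounds need not be present.
label_slice = s.loc[2:4]
float_slice = s.loc[2.1:4.6]

# A scalar label that is not in the index raises a KeyError.
try:
    s.loc[4.0]
    missing_raised = False
except KeyError:
    missing_raised = True
```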