Typed-Dfs
Guide
We're going to wrangle and analyze data from a bird-watching group.
Let's start by reading a CSV. It looks like this:
species,person,date,notes
Blue Jay,Kerri Johnson,2021-05-14,perched in a tree
We'd like to declare what this data should look like.
import typeddfs as tdf
Sightings = (
tdf.typed("Sightings")
.require("species", "person", "date")
.reserve("notes")
.strict()
.build()
)
Let's try reading a malformed CSV that is missing the "date" column.
Sightings.read_csv("missing_col.csv")
This will raise a typeddfs.errors.MissingColumnError.
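The check behind that error doesn't require anything typeddfs-specific. Here is a minimal, stdlib-only sketch of the idea of validating required CSV columns; the function name is hypothetical and this is not the library's implementation:

```python
import csv
import io


def check_required_columns(text: str, required: set) -> None:
    """Raise KeyError if the CSV header is missing any required column."""
    header = next(csv.reader(io.StringIO(text)))
    missing = required - set(header)
    if missing:
        raise KeyError(f"Missing columns: {sorted(missing)}")


good = "species,person,date,notes\nBlue Jay,Kerri Johnson,2021-05-14,\n"
bad = "species,person,notes\nBlue Jay,Kerri Johnson,\n"  # no "date" column

check_required_columns(good, {"species", "person", "date"})  # passes silently
```

Calling it on `bad` raises, which is the behavior `Sightings.read_csv` wraps into `MissingColumnError`.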
Much more to come…
Serialization
Typing rules
Construction and customization
New functions
Natural sorting.
Matrix types
Imperative declaration
Data types and freezing
Checksums and caching
Advanced serialization
Generating CLI-style help
Utilities
Misc examples
Simple example
from typeddfs import TypedDfs
MyDfType = (
TypedDfs.typed("MyDfType")
.require("name", index=True) # always keep in index
.require("value", dtype=float) # require a column and type
.drop("_temp") # auto-drop a column
.verify(lambda ddf: len(ddf) == 12) # require exactly 12 rows
).build()
df = MyDfType.read_file(input("filename? [.feather/.csv.gz/.tsv.xz/etc.]"))
df = df.sort_natural()
df.write_file("myfile.feather", mkdirs=True)
# want to write to a sha1sum-like (.sha256) file?
df.write_file("myfile.feather", file_hash=True)
# verify it?
MyDfType.read_file("myfile.feather", check_hash="file")
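The sort_natural call above orders strings the way a person would ("x2" before "x10"). The idea behind natural sorting can be sketched without natsort or pandas; this is an illustration of the concept, not the library's code:

```python
import re


def natural_key(value: str):
    """Split a string into text and integer chunks, so "x10" sorts after "x2"."""
    return [int(part) if part.isdigit() else part.lower()
            for part in re.split(r"(\d+)", value)]


names = ["x10", "x2", "x1"]
print(sorted(names, key=natural_key))  # ['x1', 'x2', 'x10']
```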
A matrix-style DataFrame
import numpy as np
from typeddfs import TypedDfs
Symmetric64 = (
TypedDfs.matrix("Symmetric64", doc="A symmetric float64 matrix")
.dtype(np.float64)
.verify(lambda df: df.values.sum().sum() == 1.0)
.add_methods(product=lambda df: df.flatten().product())
).build()
mx = Symmetric64.read_file("input.tab")
print(mx.product()) # defined above
if mx.is_symmetric():
mx = mx.triangle() # it's symmetric, so we only need half
long = mx.drop_na().long_form() # columns: "row", "column", and "value"
long.write_file("long-form.xml")
Example in terms of CSV
For a CSV like this:
key,value,note
abc,123,?
from typeddfs import TypedDfs
# Build me a Key-Value-Note class!
KeyValue = (
TypedDfs.typed("KeyValue") # With enforced reqs / typing
.require("key", dtype=str, index=True) # automagically add to index
.require("value") # required
.reserve("note") # permitted but not required
.strict() # disallow other columns
).build()
# This will self-organize and use "key" as the index:
df = KeyValue.read_csv("example.csv")
# For fun, let's write it and read it back:
df.to_csv("remake.csv")
df = KeyValue.read_csv("remake.csv")
print(df.index_names(), df.column_names()) # ["key"], ["value", "note"]
# And now, we can type a function to require a KeyValue,
# and let it raise an `InvalidDfError` (here, a `MissingColumnError`):
def my_special_function(df: KeyValue) -> float:
return KeyValue(df)["value"].sum()
Q & A
What are the different types of typed DataFrames?
You should generally use two: typeddfs.typed_dfs.TypedDf and typeddfs.matrix_dfs.MatrixDf.
There is also a specialized matrix type, typeddfs.matrix_dfs.AffinityMatrixDf.
You can construct these easily with typeddfs._entries.TypedDfs.typed(), typeddfs._entries.TypedDfs.matrix(), and typeddfs._entries.TypedDfs.affinity_matrix().
There is a final type, defined to have no typing rules, that can be constructed with typeddfs._entries.TypedDfs.untyped(). You can convert a vanilla Pandas DataFrame to an "untyped" variant via typeddfs._entries.TypedDfs.wrap() to give it the additional methods.
from typeddfs import TypedDfs
MyDf = TypedDfs.typed("MyDf").build()
What is the hierarchy of DataFrames?
It's confusing. In general, you won't need to know the difference.
typeddfs.typed_dfs.TypedDf and typeddfs.matrix_dfs.MatrixDf inherit from typeddfs.base_dfs.BaseDf, which inherits from typeddfs.abs_dfs.AbsDf, which inherits from typeddfs._core_dfs.CoreDf.
(Technically, CoreDf inherits from typeddfs._pretty_dfs.PrettyDf.)
The difference is:
- typeddfs.base_dfs.BaseDf has methods convert and of (generally overridden).
- typeddfs.abs_dfs.AbsDf contains typeddfs.abs_dfs.AbsDf.get_typing(), overrides IO methods from DataFrame, and adds typeddfs.abs_dfs.AbsDf.read_file() and typeddfs.abs_dfs.AbsDf.write_file().
- typeddfs._core_dfs.CoreDf wraps DataFrame methods to retain the same type for returned DataFrames and adds a few extra methods.
What is the difference between __init__, convert, and of?
These three methods in typeddfs.typed_dfs.TypedDf (and its superclasses) are a bit different.
typeddfs.typed_dfs.TypedDf.__init__() does NOT attempt to reorganize or validate your DataFrame, while typeddfs.typed_dfs.TypedDf.convert() and typeddfs.typed_dfs.TypedDf.of() do.
of is simply more flexible than convert: convert only accepts a DataFrame, while of will take anything that DataFrame.__init__ will.
When do typed DFs "detype" during chained invocations?
Most DataFrame-level functions that ordinarily return DataFrames themselves try to keep the same type. This includes typeddfs.abs_dfs.AbsDf.reindex(), typeddfs.abs_dfs.AbsDf.drop_duplicates(), typeddfs.abs_dfs.AbsDf.sort_values(), and typeddfs.abs_dfs.AbsDf.set_index().
This is to allow for easy chained invocation, but it's important to note that the returned DataFrame might not conform to your requirements. Call typeddfs.abs_dfs.AbsDf.retype() at the end to reorganize and verify.
from typeddfs import TypedDfs
MyDf = TypedDfs.typed("MyDf").require("valid").build()
my_df = MyDf.read_csv("x.csv")
my_df_2 = my_df.drop_duplicates().rename_cols(valid="ok")
print(type(my_df_2)) # MyDf
# but this fails!
my_df_3 = my_df.drop_duplicates().rename_cols(valid="ok").retype()
# MissingColumnError "valid"
You can call typeddfs.abs_dfs.AbsDf.detype() to remove any typing rules, and typeddfs.abs_dfs.AbsDf.vanilla() if you need a plain DataFrame, though this should rarely be needed.
How does one get the typing info?
Call typeddfs.base_dfs.BaseDf.get_typing():
from typeddfs import TypedDfs
MyDf = TypedDfs.typed("MyDf").require("valid").build()
MyDf.get_typing().required_columns # ["valid"]
How are TOML documents read and written?
These are limited to a single array of tables (AOT). The AOT is named row by default (set with aot=).
On read, you can pass aot=None to have it use the unique outermost key.
How are INI files read and written?
These require exactly 2 columns after reset_index().
Parsing is purposefully minimal because these formats are flexible. Trailing whitespace and whitespace surrounding = are ignored. Values are not escaped, and keys may not contain =. Line continuation with \ is not allowed. Quotation marks surrounding values are not dropped unless strip_quotes=True is passed. Comments begin with ;, along with # if hash_sign=True is passed.
On read, section names are prepended to the keys. For example, the key name will be section.key in this example:
[section]
key = value
On write, the inverse happens.
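The section.key flattening described above can be sketched with the standard library's configparser; this illustrates the naming scheme only, not typeddfs's actual parser:

```python
import configparser

ini_text = """
[section]
key = value
"""

parser = configparser.ConfigParser()
parser.read_string(ini_text)

# Flatten [section] + key into a single dotted "section.key" name
flat = {
    f"{section}.{key}": value
    for section in parser.sections()
    for key, value in parser[section].items()
}
print(flat)  # {'section.key': 'value'}
```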
What about .properties?
These are similar to INI files. Only hash signs are allowed for comments, and reserved chars are escaped in keys. This includes \\, \=, and \:. These are not escaped in values.
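As an illustration of that escaping rule (not the library's actual implementation), escaping the characters reserved in .properties keys could look like:

```python
def escape_properties_key(key: str) -> str:
    """Escape backslashes first, then the chars reserved in .properties keys."""
    key = key.replace("\\", "\\\\")
    for ch in ("=", ":", " "):
        key = key.replace(ch, "\\" + ch)
    return key


print(escape_properties_key("my key=1"))  # my\ key\=1
```

Escaping the backslash first matters; otherwise the backslashes added for = and : would themselves get doubled.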
What is "flex-width format"?
This is a format that shows up a lot in the wild but doesn't seem to have a name. It's just a text format like TSV or CSV, but where columns are preferred to line up in a fixed-width font. Whitespace is ignored on read, but on write the columns are made to line up neatly. These files are easy to view.
By default, the delimiter is three vertical bars (|||).
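A sketch of what a flex-width writer does: pad each column to its widest value, then join with the delimiter. This shows the idea only, not to_flexwf itself (which handles quoting, headers, and more):

```python
rows = [["key", "value"], ["abc", "123"], ["de", "4"]]

# Pad each cell to the width of its column's longest value, then join with "|||"
widths = [max(len(row[i]) for row in rows) for i in range(len(rows[0]))]
lines = [
    " ||| ".join(cell.ljust(w) for cell, w in zip(row, widths)).rstrip()
    for row in rows
]
print("\n".join(lines))
```

In a fixed-width font, each `|||` falls in the same character position on every line.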
When are read and write guaranteed to be inverses?
In principle, this invariant holds when you call .strict() to disallow additional columns and specify dtype= in all calls to .require and .reserve.
In practice, this might break down for certain combinations of DataFrame structure, dtypes, and serialization format. It seems pretty solid for Feather, Parquet, and CSV/TSV-like variants, especially if the dtypes are limited to bools, real values, int values, and strings. There may be corner cases for XML, TOML, INI, Excel, OpenDocument, and HDF5, as well as for categorical and miscellaneous object dtypes.
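The invariant itself is easy to test for any format. Here is a stdlib-only sketch of a write-then-read equality check for CSV, i.e. the property typeddfs verifies, not its implementation:

```python
import csv
import io

rows = [{"key": "abc", "value": "123"}, {"key": "xyz", "value": "4"}]

# Write to CSV...
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=["key", "value"])
writer.writeheader()
writer.writerows(rows)

# ...read it back, and confirm read-after-write is the identity
buf.seek(0)
back = list(csv.DictReader(buf))
print(back == rows)  # True
```

Note that the check only works here because every value is already a string; with int or float dtypes, the read side must restore the dtypes before comparing, which is exactly why specifying dtype= matters.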
How do I include another filename suffix?
Use .suffix() to register a suffix or remap it to another format.
from typeddfs import TypedDfs, FileFormat
MyDf = TypedDfs.typed("MyDf").suffix(tabbed="tsv").build()
# or:
MyDf = TypedDfs.typed("MyDf").suffix(**{".tabbed": FileFormat.tsv}).build()
How do the checksums work?
There are simple convenience flags to write sha1sum-like files when writing, and to verify them when reading.
from pathlib import Path
from typeddfs import TypedDfs
MyDf = TypedDfs.typed("MyDf").build()
df = MyDf()
df.write_file("here.csv", file_hash=True)
# a hex-encoded hash and filename
Path("here.csv.sha256").read_text(encoding="utf8")
MyDf.read_file("here.csv", file_hash=True) # verifies that it matches
You can change the hash algorithm with .hash(). The second variant is dir_hash.
from pathlib import Path
from typeddfs import TypedDfs, Checksums
MyDf = TypedDfs.typed("MyDf").build()
df = MyDf()
path = Path("dir", "here.csv")
df.write_file(path, dir_hash=True, mkdirs=True)
# potentially many hex-encoded hashes and filenames; always appended to
MyDf.read_file(path, dir_hash=True) # verifies that it matches
# read it
sums = Checksums.parse_hash_file_resolved(Path("dir", "dir.sha256"))
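Under the hood, a .sha256 sidecar is just a hex digest plus a filename, in the same format sha256sum uses. A minimal stdlib sketch of writing and verifying one (illustrative only, not typeddfs's code):

```python
import hashlib
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    data_path = Path(tmp, "here.csv")
    data_path.write_bytes(b"key,value\nabc,123\n")

    # Write a sha256sum-style sidecar: "<hex digest> *<filename>"
    digest = hashlib.sha256(data_path.read_bytes()).hexdigest()
    hash_path = data_path.with_name(data_path.name + ".sha256")
    hash_path.write_text(f"{digest} *{data_path.name}\n", encoding="utf8")

    # Verify: recompute the digest and compare against the sidecar
    expected = hash_path.read_text(encoding="utf8").split()[0]
    assert hashlib.sha256(data_path.read_bytes()).hexdigest() == expected
```

Files written this way can also be checked from the command line with `sha256sum --check here.csv.sha256`.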
Nuts and bolts
API Reference
This page contains auto-generated API reference documentation.
typeddfs
Metadata and top-level declarations for typeddfs.
Subpackages
typeddfs._mixins
Submodules
typeddfs._mixins._csv_like_mixin
Mixin for CSV and TSV.
- class typeddfs._mixins._csv_like_mixin._CsvLikeMixin
- classmethod read_csv(*args, **kwargs) → __qualname__
Reads from CSV, converting to this type. Using to_csv() and read_csv() from BaseFrame, this property holds:
df.to_csv(path)
df.__class__.read_csv(path) == df
Passing index to to_csv or index_col to read_csv explicitly will break this invariant.
- Parameters
args – Passed to pd.read_csv; should start with a path or buffer
kwargs – Passed to pd.read_csv
- classmethod read_tsv(*args, **kwargs) → __qualname__
Reads tab-separated data. See read_csv() for more info.
- to_csv(*args, **kwargs) → Optional[str]
typeddfs._mixins._dataclass_mixin
Dataclass mixin.
- class typeddfs._mixins._dataclass_mixin.TypedDfDataclass
Just a dataclass for TypedDfs. Contains get_df_type() to point to the original DataFrame.
- get_as_dict() → Mapping[str, Any]
Returns a mapping from the dataclass field name to the value.
- classmethod get_fields() → Sequence[dataclasses.Field]
Returns the fields of this dataclass.
- class typeddfs._mixins._dataclass_mixin._DataclassMixin
- classmethod _create_dataclass(fields: Sequence[Tuple[str, Type[Any]]]) → Type[TypedDfDataclass]
- classmethod create_dataclass(reserved: bool = True) → Type[TypedDfDataclass]
Creates a best-effort immutable dataclass for this type. The fields will depend on the columns and index levels present in get_typing(). The type of each field will correspond to the specified dtype (typeddfs.df_typing.DfTyping.auto_dtypes()), falling back to Any if none is specified.
Note: If this type can support additional columns (typeddfs.df_typing.DfTyping.is_strict() is the default, False), the dataclass will not be able to support extra fields. For most cases, typeddfs.abs_dfs.AbsDf.to_dataclass_instances() is better.
- Parameters
reserved – Include reserved columns and index levels
- Returns
A subclass of typeddfs.abs_dfs.TypedDfDataclass
- classmethod from_dataclass_instances(instances: Sequence[TypedDfDataclass]) → __qualname__
Creates a new instance of this DataFrame type from dataclass instances. This mostly delegates to pd.DataFrame.__init__, calling cls.of(instances). It is provided for consistency with to_dataclass_instances().
- Parameters
instances – A sequence of dataclass instances. Although typed as typeddfs.abs_dfs.TypedDfDataclass, any type created by Python's dataclass module should work.
- Returns
A new instance of this type
- to_dataclass_instances() → Sequence[TypedDfDataclass]
Creates a dataclass from this DataFrame and returns instances. Also see from_dataclass_instances().
Note: Dataclass elements are equal if fields and values match, even if they are of different types. This was done by overriding __eq__ to enable comparing results from separate calls to this method. Specifically, equality requires that typeddfs.abs_dfs.TypedDfDataclass.get_as_dict() return equal mappings.
Caution: Fields cannot be included if columns are not present. If self.get_typing().is_strict is False, then the dataclass created by two different DataFrames of type self.__class__ may have different fields.
Caution: A new dataclass is created per call, so df.to_dataclass_instances()[0] is not df.to_dataclass_instances()[0].
typeddfs._mixins._excel_mixins
Mixin for Excel/ODF IO.
- class typeddfs._mixins._excel_mixins._ExcelMixin
- classmethod read_excel(io, sheet_name: _SheetNamesOrIndices = 0, *args, **kwargs) → __qualname__
- classmethod read_ods(io, sheet_name: _SheetNamesOrIndices = 0, **kwargs) → __qualname__
Reads OpenDocument ODS/ODT files. Prefer this method over read_excel().
- classmethod read_xls(io, sheet_name: _SheetNamesOrIndices = 0, **kwargs) → __qualname__
Reads legacy XLS Excel files. Prefer this method over read_excel().
- classmethod read_xlsb(io, sheet_name: _SheetNamesOrIndices = 0, **kwargs) → __qualname__
Reads XLSB Excel files. This is a relatively uncommon format. Prefer this method over read_excel().
- classmethod read_xlsx(io, sheet_name: _SheetNamesOrIndices = 0, **kwargs) → __qualname__
Reads XLSX Excel files. Prefer this method over read_excel().
- to_excel(excel_writer, *args, **kwargs) → Optional[str]
- to_ods(ods_writer, *args, **kwargs) → Optional[str]
Writes OpenDocument ODS/ODT files. Prefer this method over write_excel().
- to_xls(excel_writer, *args, **kwargs) → Optional[str]
Writes legacy XLS Excel files. Prefer this method over write_excel().
- to_xlsb(excel_writer, *args, **kwargs) → Optional[str]
Writes XLSB Excel files. This is a relatively uncommon format. Prefer this method over write_excel().
- to_xlsx(excel_writer, *args, **kwargs) → Optional[str]
Writes XLSX Excel files. Prefer this method over write_excel().
typeddfs._mixins._feather_parquet_hdf_mixin
Mixin for Feather, Parquet, and HDF5.
- class typeddfs._mixins._feather_parquet_hdf_mixin._FeatherParquetHdfMixin
- classmethod read_feather(*args, **kwargs) → __qualname__
- classmethod read_hdf(*args, key: Optional[str] = None, **kwargs) → __qualname__
- classmethod read_parquet(*args, **kwargs) → __qualname__
- to_feather(path_or_buf, *args, **kwargs) → Optional[str]
- to_hdf(path: typeddfs.utils._utils.PathLike, key: Optional[str] = None, **kwargs) → None
- to_parquet(path_or_buf, *args, **kwargs) → Optional[str]
typeddfs._mixins._flexwf_mixin
Mixin for flex-wf.
- class typeddfs._mixins._flexwf_mixin._FlexwfMixin
- classmethod read_flexwf(path_or_buff, sep: str = '\\|\\|\\|', **kwargs) → __qualname__
Reads a "flexible-width format". The delimiter (sep) is important. Note that sep is a regex pattern if it contains more than 1 char.
These are designed to read and write (to_flexwf) as though they were fixed-width. Specifically, all of the columns line up but are separated by a possibly multi-character delimiter.
The files ignore blank lines, strip whitespace, always have a header, never quote values, and have no default index column unless given by required_columns(), etc.
- Parameters
path_or_buff – Path or buffer
sep – The delimiter, a regex pattern
kwargs – Passed to read_csv; may include 'comment' and 'skip_blank_lines'
- to_flexwf(path_or_buff=None, sep: str = '|||', mode: str = 'w', **kwargs) → Optional[str]
Writes a fixed-width format, optionally with a delimiter, which can be multiple characters.
See read_flexwf for more info.
- Parameters
path_or_buff – Path or buffer
sep – The delimiter, 0 or more characters
mode – write or append (w/a)
kwargs – Passed to Utils.write; may include 'encoding'
- Returns
The string data if path_or_buff is a buffer; None if it is a file
typeddfs._mixins._formatted_mixin
Mixin for formats like HTML and RST.
- class typeddfs._mixins._formatted_mixin._FormattedMixin
- _tabulate(fmt: Union[str, tabulate.TableFormat], **kwargs) → str
- classmethod read_html(path: typeddfs.utils._utils.PathLike, *args, **kwargs) → __qualname__
Similar to pd.read_html, but requires exactly 1 table and returns it.
- Raises
lxml.etree.XMLSyntaxError – If the HTML could not be parsed
NoValueError – If no tables are found
ValueNotUniqueError – If multiple tables are found
- to_html(*args, **kwargs) → Optional[str]
- to_markdown(*args, **kwargs) → Optional[str]
- to_rst(path_or_none: Optional[typeddfs.utils._utils.PathLike] = None, style: str = 'simple', mode: str = 'w') → Optional[str]
Writes a reStructuredText table.
- Parameters
path_or_none – Either a file path or None to return the string
style – The type of table; currently only 'simple' is supported
mode – Write mode
typeddfs._mixins._full_io_mixin
Combines various IO mixins.
- class typeddfs._mixins._full_io_mixin._FullIoMixin
- classmethod _call_read(clazz, path: Union[pathlib.Path, str], storage_options: Optional[pandas._typing.StorageOptions] = None) → pandas.DataFrame
- _call_write(path: Union[pathlib.Path, str], storage_options: Optional[pandas._typing.StorageOptions] = None, atomic: bool = False) → Optional[str]
- classmethod _check_io_ok(path: pathlib.Path, fmt: Optional[typeddfs.file_formats.FileFormat])
- classmethod _get_fmt(path: pathlib.Path) → Optional[typeddfs.file_formats.FileFormat]
- classmethod _get_io(on, path: pathlib.Path, fmt: typeddfs.file_formats.FileFormat, custom, prefix: str)
- classmethod _get_read_kwargs(fmt: Optional[typeddfs.file_formats.FileFormat], path: pathlib.Path, storage_options: Optional[pandas._typing.StorageOptions]) → Mapping[str, Any]
- classmethod _get_write_kwargs(fmt: Optional[typeddfs.file_formats.FileFormat], path: pathlib.Path, storage_options: Optional[pandas._typing.StorageOptions]) → Mapping[str, Any]
- pretty_print(fmt: Union[None, str, tabulate.TableFormat] = None, *, to: Optional[typeddfs.utils._utils.PathLike] = None, mode: str = 'w', **kwargs) → str
Outputs a pretty table using the tabulate package.
- Parameters
fmt – A tabulate format; if None, chooses according to to, falling back to "plain"
to – Write to this path (.gz, .zip, etc. is inferred)
mode – Write mode: 'w', 'a', or 'x'
kwargs – Passed to tabulate
- Returns
The formatted string
typeddfs._mixins._fwf_mixin
Mixin for fixed-width format.
- class typeddfs._mixins._fwf_mixin._FwfMixin
- classmethod read_fwf(*args, **kwargs) → __qualname__
- to_fwf(path_or_buff=None, mode: str = 'w', colspecs: Optional[Sequence[Tuple[int, int]]] = None, widths: Optional[Sequence[int]] = None, na_rep: Optional[str] = None, float_format: Optional[str] = None, date_format: Optional[str] = None, decimal: str = '.', **kwargs) → Optional[str]
Writes a fixed-width text format. See read_fwf and to_flexwf for more info.
- Parameters
path_or_buff – Path or buffer
mode – write or append (w/a)
colspecs – A list of tuples giving the extents of the fixed-width fields of each line as half-open intervals (i.e., [from, to))
widths – A list of field widths which can be used instead of colspecs if the intervals are contiguous
na_rep – Missing data representation
float_format – Format string for floating-point numbers
date_format – Format string for datetime objects
decimal – Character recognized as decimal separator. E.g. use ',' for European data.
kwargs – Passed to typeddfs.utils.Utils.write()
- Returns
The string data if path_or_buff is a buffer; None if it is a file
typeddfs._mixins._ini_like_mixin
Mixin for INI, .properties, and TOML.
- class typeddfs._mixins._ini_like_mixin._IniLikeMixin
- classmethod _assert_can_write_properties_class() → None
- _assert_can_write_properties_instance() → None
- classmethod _properties_files_apply() → bool
- classmethod _read_properties_like(unescape_keys, unescape_values, comment_chars: Set[str], strip_quotes: bool, path_or_buff, **kwargs) → __qualname__
Reads a .properties-like file.
- _to_properties_like(escape_keys, escape_values, sep: str, comment_char: str, path_or_buff=None, mode: str = 'w', comment: Union[None, str, Sequence[str]] = None, **kwargs) → Optional[str]
Writes a .properties-like file.
- classmethod read_ini(path_or_buff, hash_sign: bool = False, strip_quotes: bool = False, **kwargs) → __qualname__
Reads an INI file.
Caution: This is provided as a preview. It may have issues and may change.
- Parameters
path_or_buff – Path or buffer
hash_sign – Allow # to denote a comment (as well as ;)
strip_quotes – Remove quotation marks ("" or '') surrounding the values
kwargs – Passed to typeddfs.utils.Utils.read()
- classmethod read_properties(path_or_buff, strip_quotes: bool = False, **kwargs) → __qualname__
Reads a .properties file. Backslashes, colons, spaces, and equal signs are escaped in keys and values.
Caution: This is provided as a preview. It may have issues and may change. It currently does not support continued lines (ending with an odd number of backslashes).
- Parameters
path_or_buff – Path or buffer
strip_quotes – Remove quotation marks ("") surrounding values
kwargs – Passed to read_csv; avoid setting
- classmethod read_toml(path_or_buff, aot: Optional[str] = 'row', aot_only: bool = True, **kwargs) → __qualname__
Reads a TOML file.
Caution: This is provided as a preview. It may have issues and may change.
- Parameters
path_or_buff – Path or buffer
aot – The name of the array of tables (i.e. [[ table ]]). If None, finds the unique outermost TOML key, implying aot_only.
aot_only – Fail if any outermost keys other than the AOT are found
kwargs – Passed to Utils.read
- to_ini(path_or_buff=None, comment: Union[None, str, Sequence[str]] = None, mode: str = 'w', **kwargs) → __qualname__
Writes an INI file.
Caution: This is provided as a preview. It may have issues and may change.
- Parameters
path_or_buff – Path or buffer
comment – Comment line(s) to add at the top of the document
mode – 'w' (write) or 'a' (append)
kwargs – Passed to typeddfs.utils.Utils.write()
- to_properties(path_or_buff=None, mode: str = 'w', *, comment: Union[None, str, Sequence[str]] = None, **kwargs) → Optional[str]
Writes a .properties file. Backslashes, colons, spaces, and equal signs are escaped in keys. Backslashes are escaped in values. The separator is always =.
Caution: This is provided as a preview. It may have issues and may change.
- Parameters
path_or_buff – Path or buffer
comment – Comment line(s) to add at the top of the document
mode – Write ('w') or append ('a')
kwargs – Passed to typeddfs.utils.Utils.write()
- Returns
The string data if path_or_buff is a buffer; None if it is a file
- to_toml(path_or_buff=None, aot: str = 'row', comment: Union[None, str, Sequence[str]] = None, mode: str = 'w', **kwargs) → __qualname__
Writes a TOML file.
Caution: This is provided as a preview. It may have issues and may change.
- Parameters
path_or_buff – Path or buffer
aot – The name of the array of tables (i.e. [[ table ]])
comment – Comment line(s) to add at the top of the document
mode – 'w' (write) or 'a' (append)
kwargs – Passed to typeddfs.utils.Utils.write()
typeddfs._mixins._json_xml_mixin
Mixin for JSON and XML.
typeddfs._mixins._lines_mixin
Mixin for line-by-line text files.
- class typeddfs._mixins._lines_mixin._LinesMixin
- classmethod _lines_files_apply() → bool
- _tabulate(fmt: Union[str, tabulate.TableFormat], **kwargs) → str
- classmethod read_lines(path_or_buff, **kwargs) → __qualname__
Reads a file that contains 1 row and 1 column per line. Skips lines that are blank after trimming whitespace. Also skips comments if comment is set.
Caution: For technical reasons, values cannot contain a 6-em space (U+2008). Their presence will result in undefined behavior.
- Parameters
path_or_buff – Path or buffer
kwargs – Passed to pd.DataFrame.read_csv; e.g. 'comment', 'encoding', 'skip_blank_lines', and 'line_terminator'
- to_lines(path_or_buff=None, mode: str = 'w', **kwargs) → Optional[str]
Writes a file that contains 1 row and 1 column per line. Associated with .lines or .txt.
Caution: For technical reasons, values cannot contain a 6-em space (U+2008). Their presence will result in undefined behavior.
- Parameters
path_or_buff – Path or buffer
mode – Write ('w') or append ('a')
kwargs – Passed to pd.DataFrame.to_csv
- Returns
The string data if path_or_buff is a buffer; None if it is a file
typeddfs._mixins._new_methods_mixin
Mixin with misc new DataFrame methods.
- class typeddfs._mixins._new_methods_mixin._NewMethodsMixin
- cfirst(cols: Union[str, int, Sequence[str]]) → __qualname__
Returns a new DataFrame with the specified columns appearing first.
- Parameters
cols – A list of columns, or a single column or column index
- drop_cols(*cols: Union[str, Iterable[str]]) → __qualname__
Drops columns, ignoring those that are not present.
- Parameters
cols – A single column name or a list of column names
- iter_row_col() → Generator[Tuple[Tuple[int, int], Any], None, None]
Iterates over ((row, col), value) tuples. The row and column are the row and column numbers, 1-indexed.
- only(column: str, exclude_na: bool = False) → Any
Returns the single unique value in a column. Raises an error if zero or more than one value is in the column.
- Parameters
column – The name of the column
exclude_na – Exclude None/pd.NA values
- rename_cols(**cols) → __qualname__
Shorthand for .rename(columns=).
- set_attrs(**attrs) → __qualname__
Sets pd.DataFrame.attrs, returning a copy.
- sort_natural(column: str, *, alg: Union[None, int, Set[str]] = None, reverse: bool = False) → __qualname__
Calls natsorted on a single column.
- Parameters
column – The name of the (single) column to sort by
alg – Input as the alg argument to natsorted. If None, the "best" algorithm is chosen from the dtype of column via typeddfs.utils.Utils.guess_natsort_alg(). Otherwise, typeddfs.utils.Utils.exact_natsort_alg() is called with Utils.exact_natsort_alg(alg).
reverse – Reverse the sort order (e.g. 'z' before 'a')
- sort_natural_index(*, alg: int = None, reverse: bool = False) → __qualname__
Calls natsorted on this index. Works for multi-indexes too.
- Parameters
alg – Input as the alg argument to natsorted. If None, the "best" algorithm is chosen from the dtype of the index via typeddfs.utils.Utils.guess_natsort_alg(). Otherwise, typeddfs.utils.Utils.exact_natsort_alg() is called with Utils.exact_natsort_alg(alg).
reverse – Reverse the sort order (e.g. 'z' before 'a')
- st(*array_conditions: Sequence[bool], **dict_conditions: Mapping[str, Any]) → __qualname__
Short for "such that" – an alternative to slicing with .loc.
- Parameters
array_conditions – Conditions like df["score"] < 2
dict_conditions – Equality conditions, mapping column names to their values (e.g. score=2)
- Returns
A new DataFrame of the same type
- strip_control_chars() → __qualname__
Removes all control characters (Unicode group "C") from all string-typed columns.
typeddfs._mixins._pickle_mixin
Mixin for pickle.
typeddfs._mixins._pretty_print_mixin
Mixin that just overrides _repr_html_.
- class typeddfs._mixins._pretty_print_mixin._PrettyPrintMixin
A DataFrame with an overridden _repr_html_ and some simple additional methods.
- _dims() → str
Returns a string describing the dimensionality.
- Returns
A text description of the dimensions of this DataFrame
- _repr_html_() → str
Renders HTML for display() in Jupyter notebooks. Jupyter automatically uses this function.
- Returns
Just a string containing HTML, which will be wrapped in an HTML object
typeddfs._mixins._retype_mixin
Mixin that overrides Pandas functions to retype.
- class typeddfs._mixins._retype_mixin._RetypeMixin
- __add__(other)
- __divmod__(other)
- __mod__(other)
- __mul__(other)
- __pow__(other)
- __radd__(other)
- __rdivmod__(other)
- __rmod__(other)
- __rmul__(other)
- __rpow__(other)
- __rsub__(other)
- __rtruediv__(other)
- __sub__(other)
- __truediv__(other)
- classmethod _change(df) → __qualname__
- classmethod _change_if_df(df)
- classmethod _convert_typed(df: pandas.DataFrame)
- _no_inplace(kwargs)
- abs() → __qualname__
- append(*args, **kwargs) → __qualname__
- applymap(*args, **kwargs) → __qualname__
- asfreq(*args, **kwargs) → __qualname__
- assign(**kwargs) → __qualname__
- astype(*args, **kwargs) → __qualname__
- bfill(**kwargs) → __qualname__
- convert_dtypes(*args, **kwargs) → __qualname__
- copy(deep: bool = False) → __qualname__
- drop(*args, **kwargs) → __qualname__
- drop_duplicates(**kwargs) → __qualname__
- dropna(*args, **kwargs) → __qualname__
- ffill(**kwargs) → __qualname__
- fillna(*args, **kwargs) → __qualname__
- infer_objects(*args, **kwargs) → __qualname__
- reindex(*args, **kwargs) → __qualname__
- rename(*args, **kwargs) → __qualname__
- replace(*args, **kwargs) → __qualname__
- reset_index(*args, **kwargs) → __qualname__
- set_index(keys, drop=True, append=False, inplace=False, verify_integrity=False) → __qualname__
- shift(*args, **kwargs) → __qualname__
- sort_values(*args, **kwargs) → __qualname__
- to_period(*args, **kwargs) → __qualname__
- to_timestamp(*args, **kwargs) → __qualname__
- transpose(*args, **kwargs) → __qualname__
- truncate(*args, **kwargs) → __qualname__
- tz_convert(*args, **kwargs) → __qualname__
- tz_localize(*args, **kwargs) → __qualname__
typeddfs.utils
Tools that could possibly be used outside of typed-dfs.
Submodules
typeddfs.utils._format_support
Handles optional packages required for formats.
- typeddfs.utils._format_support.DfFormatSupport
- typeddfs.utils._format_support.fastparquet
- typeddfs.utils._format_support.openpyxl
- typeddfs.utils._format_support.pyarrow
- typeddfs.utils._format_support.pyxlsb
- typeddfs.utils._format_support.tables
- typeddfs.utils._format_support.tomlkit
typeddfs.utils._utils
Internal utilities for typeddfs.
- typeddfs.utils._utils.PathLike
- typeddfs.utils._utils._AUTO_DROPPED_NAMES
- typeddfs.utils._utils._DEFAULT_ATTRS_SUFFIX = .attrs.json
- typeddfs.utils._utils._DEFAULT_HASH_ALG = sha256
- typeddfs.utils._utils._FAKE_SEP =
- typeddfs.utils._utils._FORBIDDEN_NAMES
- typeddfs.utils._utils._SENTINEL
typeddfs.utils.checksum_models
Models for shasum-like files.
- class typeddfs.utils.checksum_models.ChecksumFile
- delete() → None
Deletes the hash file by calling pathlib.Path.unlink(self.hash_path).
- Raises
OSError – Accordingly
- property file_path → pathlib.Path
- property hash_value → str
- load() → __qualname__
Returns a copy of self read from hash_path.
- classmethod new(hash_path: typeddfs.utils._utils.PathLike, file_path: typeddfs.utils._utils.PathLike, hash_value: str) → ChecksumFile
Use this as a constructor.
- classmethod parse(path: pathlib.Path, *, lines: Optional[Sequence[str]] = None) → __qualname__
Reads hash file contents.
- Parameters
path – The path of the checksum file; required to resolve paths relative to its parent
lines – The lines in the checksum file; reads path if None
- Returns
A ChecksumFile
- rename(path: pathlib.Path) → __qualname__
Replaces self.file_path with path. This will affect the filename written in a .shasum-like file. No OS operations are performed.
- update(value: str, overwrite: Optional[bool] = True) → __qualname__
Modifies the hash.
- Parameters
value – The new hex-encoded hash
overwrite – If None, requires that the value is the same as before (no operation is performed). If False, this method will always raise an error.
- verify(computed: str) → None
Verifies the checksum.
- Parameters
computed – A pre-computed hex-encoded hash
- Raises
HashDidNotValidateError – If the hashes are not equal
- write() → None
Writes the hash file.
- Raises
OSError – Accordingly
- class typeddfs.utils.checksum_models.ChecksumMapping
- __add__(other: Union[ChecksumMapping, Mapping[typeddfs.utils._utils.PathLike, str], __qualname__]) → __qualname__
Performs a symmetric addition.
- Raises
ValueError – If other intersects (shares keys) with self
See also
- __contains__(path: pathlib.Path) → bool
- __getitem__(path: pathlib.Path) → str
- __len__() → int
- __sub__(other: Union[typeddfs.utils._utils.PathLike, Iterable[typeddfs.utils._utils.PathLike], ChecksumMapping]) → __qualname__
Removes entries.
See also
- append(append: Mapping[typeddfs.utils._utils.PathLike, str], *, overwrite: Optional[bool] = False) → __qualname__
Appends paths to a dir hash file. Like update() but less flexible and only for adding paths.
- property entries → Mapping[pathlib.Path, str]
- get(key: pathlib.Path, default: Optional[str] = None) → Optional[str]
- items() → AbstractSet[Tuple[pathlib.Path, str]]
- keys() → AbstractSet[pathlib.Path]
- load(missing_ok: bool = False) → __qualname__
Replaces this map with one read from the hash file.
- Parameters
missing_ok – If the hash path does not exist, treat it as having no items
- classmethod new(hash_path: typeddfs.utils._utils.PathLike, dct: Mapping[typeddfs.utils._utils.PathLike, str]) → ChecksumMapping
Use this as the constructor.
- classmethod parse(path: pathlib.Path, *, lines: Optional[Sequence[str]] = None, missing_ok: bool = False, subdirs: bool = False) → __qualname__
Reads hash file contents.
- Parameters
path – The path of the checksum file; required to resolve paths relative to its parent
lines – The lines in the checksum file; reads path if None
missing_ok – If path does not exist, assume it contains no items
subdirs – Permit files within subdirectories, specified with /. Most tools do not support these.
- Returns
A mapping from raw string filenames to their hex hashes. Any node called ./ in the path is stripped.
- remove(remove: Union[typeddfs.utils._utils.PathLike, Iterable[typeddfs.utils._utils.PathLike]], *, missing_ok: bool = False) → __qualname__
Strips paths from this hash collection. Like update() but less flexible and only for removing paths.
- Raises
typeddfs.df_errors.PathNotRelativeError – To avoid, try calling resolve first
- update(update: Union[Callable[[pathlib.Path], Optional[typeddfs.utils._utils.PathLike]], Mapping[typeddfs.utils._utils.PathLike, Optional[typeddfs.utils._utils.PathLike]]], *, missing_ok: bool = True, overwrite: Optional[bool] = True) → __qualname__
Returns updated hashes from a dir hash file.
- Parameters
update – Values to overwrite. May be a function or a dictionary from paths to values. If None is returned, the entry will be removed; otherwise, updates with the returned hex hash.
missing_ok – Require that the path is already listed
overwrite – Allow overwriting an existing value. If None, only allow if the hash is the same.
- values() → ValuesView[str]
- verify(path: typeddfs.utils._utils.PathLike, computed: str, *, resolve: bool = False, exist: bool = False) → None
Verifies a checksum. The file path must be listed.
- Parameters
path – The file to look for
computed – A pre-computed hex-encoded hash; if set, do not calculate from path
resolve – Resolve paths before comparison
exist – Require that path exists
- Raises
FileNotFoundError – If path does not exist
HashFileMissingError – If the hash file does not exist
HashDidNotValidateError – If the hashes are not equal
HashVerificationError – Superclass of HashDidNotValidateError; raised if the filename is not listed, etc.
- write(*, sort: Union[bool, Callable[[Sequence[pathlib.Path]], Sequence[pathlib.Path]]] = False, rm_if_empty: bool = False) → None
Writes to the hash (.shasum-like) file.
- Parameters
sort – Sort with this function, or sorted if True
rm_if_empty – Delete with pathlib.Path.unlink if this contains no items
- Raises
OSError – Accordingly
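A per-directory hash file is simply many such lines, one per file. A minimal stdlib sketch of parsing one into a mapping (conceptual only, not the library's parser):

```python
from typing import Iterable, Mapping

def parse_shasum_lines(lines: Iterable[str]) -> Mapping[str, str]:
    """Parse sha256sum-style lines ('<hex> *<name>' or '<hex>  <name>')
    into a {filename: hex_digest} dict, skipping blank lines."""
    entries = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        digest, _, name = line.partition(" ")
        # A leading '*' marks binary mode in shasum output; strip it
        # along with the extra space used for text mode.
        entries[name.lstrip(" *")] = digest
    return entries
```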
typeddfs.utils.checksums
Tools for shasum-like files.
- class typeddfs.utils.checksums.Checksums
- alg: str
- calc_hash(path: typeddfs.utils._utils.PathLike) → str
Calculates the hash of a file and returns it, hex-encoded.
- classmethod default_algorithm() → str
- delete_any(path: typeddfs.utils._utils.PathLike, *, rm_if_empty: bool = False) → None
Deletes the filesum and removes path from the dirsum. Ignores missing files.
- generate_dirsum(directory: typeddfs.utils._utils.PathLike, glob: str = '*') → typeddfs.utils.checksum_models.ChecksumMapping
Generates a new hash mapping, calculating hashes for extant files.
- Parameters
directory – Base directory
glob – Glob pattern under directory (cannot be recursive)
- Returns
A ChecksumMapping; use .write to write it
- get_dirsum_of_dir(path: typeddfs.utils._utils.PathLike) → pathlib.Path
Returns the path required for the per-directory hash of path.
Example
Utils.get_hash_file("my_dir")  # Path("my_dir", "my_dir.sha256")
- get_dirsum_of_file(path: typeddfs.utils._utils.PathLike) → pathlib.Path
Returns the path required for the per-directory hash of path.
Example
Utils.get_hash_file(Path("my_dir", "my_file.txt.gz"))  # Path("my_dir", "my_dir.sha256")
- get_filesum_of_file(path: typeddfs.utils._utils.PathLike) → pathlib.Path
Returns the path required for the per-file hash of path.
Example
Utils.get_hash_file("my_file.txt.gz")  # Path("my_file.txt.gz.sha256")
- classmethod guess_algorithm(path: typeddfs.utils._utils.PathLike) → str
Guesses the hashlib algorithm used from a hash file.
- Parameters
path – The hash file (e.g. my-file.sha256)
Example
Utils.guess_algorithm("my_file.sha1")  # "sha1"
- load_dirsum_exact(path: typeddfs.utils._utils.PathLike, *, missing_ok: bool = True) → typeddfs.utils.checksum_models.ChecksumMapping
- load_dirsum_of_dir(path: typeddfs.utils._utils.PathLike, *, missing_ok: bool = True) → typeddfs.utils.checksum_models.ChecksumMapping
- load_dirsum_of_file(path: typeddfs.utils._utils.PathLike, *, missing_ok: bool = True) → typeddfs.utils.checksum_models.ChecksumMapping
- load_filesum_exact(path: typeddfs.utils._utils.PathLike) → typeddfs.utils.checksum_models.ChecksumFile
- load_filesum_of_file(path: typeddfs.utils._utils.PathLike) → typeddfs.utils.checksum_models.ChecksumFile
- classmethod resolve_algorithm(alg: str) → str
Finds a hash algorithm by name in hashlib. Converts to lowercase and removes hyphens.
- Raises
HashAlgorithmMissingError – If not found
- verify_any(path: typeddfs.utils._utils.PathLike, *, file_hash: bool, dir_hash: bool, computed: Optional[str]) → Optional[str]
- verify_hex(path: typeddfs.utils._utils.PathLike, expected: str) → Optional[str]
Verifies a hash directly from a hex string.
- write_any(path: typeddfs.utils._utils.PathLike, *, to_file: bool, to_dir: bool, overwrite: Optional[bool] = True) → Optional[str]
Adds and/or appends the hex hash of path.
- Parameters
path – Path to the file to hash
to_file – Whether to save a per-file hash
to_dir – Whether to save a per-dir hash
overwrite – If True, overwrite the file hash and any entry in the dir hash. If False, never overwrite either. If None, never overwrite, but ignore if equal to any existing entries.
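The examples above imply two naming conventions: a per-file hash lives next to the file at file + ".sha256", while a per-directory hash lives inside the directory and is named after it. A stdlib sketch of these two conventions (hypothetical helper names, inferred from the documented examples):

```python
from pathlib import Path

def filesum_path(file: Path, alg: str = "sha256") -> Path:
    """Per-file hash path: my_file.txt.gz -> my_file.txt.gz.sha256"""
    return file.with_name(file.name + "." + alg)

def dirsum_path(directory: Path, alg: str = "sha256") -> Path:
    """Per-directory hash path: my_dir -> my_dir/my_dir.sha256"""
    return directory / (directory.name + "." + alg)
```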
typeddfs.utils.cli_help
Utils for getting nice CLI help on DataFrame inputs.
Attention
The exact text used in this module is subject to change.
Note
Two consecutive newlines (\n\n) are used to separate sections.
This is consistent with a number of formats, including Markdown,
reStructuredText, and Typer.
- class typeddfs.utils.cli_help.DfCliHelp
- classmethod help(clazz: Type[typeddfs.abs_dfs.AbsDf]) → DfHelp
Returns info suitable for CLI help.
Display this info as the help description for an argument that's a path to a table file that will be read with typeddfs.abs_dfs.AbsDf.read_file() for clazz.
- Parameters
clazz – The typeddfs.typed_dfs.AbsDf subclass
- classmethod list_formats(*, flexwf_sep: str = _FLEXWF_SEP, hdf_key: str = _HDF_KEY, toml_aot: str = _TOML_AOT) → DfFormatsHelp
Lists all file formats with descriptions.
For example, typeddfs.file_formats.FileFormat.ods is "OpenDocument Spreadsheet".
- class typeddfs.utils.cli_help.DfFormatHelp
Help text on a specific file format.
- desc: str
- fmt: typeddfs.file_formats.FileFormat
- property all_suffixes → Sequence[str]
Returns all suffixes, naturally sorted.
- property bare_suffixes → Sequence[str]
Returns all suffixes, excluding compressed variants (e.g. .gz), naturally sorted.
- get_text() → str
Returns a 1-line string of the suffixes and format description.
- class typeddfs.utils.cli_help.DfFormatsHelp
Help on file formats only.
- get_long_text(*, recommended_only: bool = False, nl: str = '\n', bullet: str = '- ', indent: str = ' ') → str
Returns a multi-line text listing of allowed file formats.
- Parameters
recommended_only – Skip non-recommended file formats
nl – Newline characters; use "\n", "\n\n", or " "
bullet – Prepended to each item
indent – Spaces for nested indent
- Returns
Something like:
[[ Supported formats ]]:
.csv[.bz2/.gz/.xz/.zip]: comma-delimited
.parquet/.snappy: Parquet
.h5/.hdf/.hdf5: HDF5 (key "df") [discouraged]
.pickle/.pkl: Python Pickle [discouraged]
- get_short_text(*, recommended_only: bool = False) → str
Returns a single-line text listing of allowed file formats.
- Parameters
recommended_only – Skip non-recommended file formats
- Returns
Something like:
.csv, .tsv/.tab, or .flexwf [.gz/.xz/.zip/.bz2]; .feather, .pickle, or .snappy …
- class typeddfs.utils.cli_help.DfHelp
Info on a TypedDf suitable for CLI help.
- clazz: Type[typeddfs.abs_dfs.AbsDf]
- formats: DfFormatsHelp
- get_header_text(*, use_doc: bool = True, nl: str = '\n') → str
Returns a multi-line header of the DataFrame name and docstring.
- Parameters
use_doc – Include the docstring, as long as it is not None
nl – Newline characters; use "\n", "\n\n", or " "
- Returns
Something like:
Path to a Big Table file.
This is a big table for big things.
- get_long_text(*, use_doc: bool = True, recommended_only: bool = False, nl: str = '\n', bullet: str = '- ', indent: str = ' ') → str
Returns a multi-line text description of the DataFrame, including its required and optional columns and supported file formats.
- Parameters
use_doc – Include the docstring of the DataFrame type
recommended_only – Only include recommended formats
nl – Newline characters; use "\n", "\n\n", or " "
bullet – Prepended to each item
indent – Spaces for nested indent
- abstract get_long_typing_text() → str
Returns multi-line text on only the required columns / structure.
- get_short_text(*, use_doc: bool = True, recommended_only: bool = False, nl: str = '\n') → str
Returns a multi-line description with compressed text.
- Parameters
use_doc – Include the docstring of the DataFrame type
recommended_only – Only include recommended formats
nl – Newline characters; use "\n", "\n\n", or " "
- abstract get_short_typing_text() → str
Returns 1-line text on only the required columns / structure.
- property typing → typeddfs.df_typing.DfTyping
typeddfs.utils.dtype_utils
Data type tools for typed-dfs.
- class typeddfs.utils.dtype_utils.DtypeUtils
- is_bool
- is_bool_dtype
- is_categorical
- is_categorical_dtype
- is_complex
- is_complex_dtype
- is_datetime64_any_dtype
- is_datetime64tz_dtype
- is_extension_type
- is_float
- is_float_dtype
- is_integer
- is_integer_dtype
- is_interval
- is_interval_dtype
- is_number
- is_numeric_dtype
- is_object_dtype
- is_period_dtype
- is_scalar
- is_string_dtype
- classmethod describe_dtype(t: Type[Any], *, short: bool = False) → Optional[str]
Returns a string name for a Pandas-supported dtype.
- Parameters
t – Any Python type
short – Use shorter strings (e.g. "int" instead of "integer")
- Returns
A string like "floating-point" or "zoned datetime". Returns None if no good name is found or if t is None.
typeddfs.utils.io_utils
Tools for IO.
- class typeddfs.utils.io_utils.IoUtils
- classmethod get_encoding(encoding: str = 'utf-8') → str
Returns a text encoding from a more flexible string. Ignores hyphens and lowercases the string. Permits these nonstandard shorthands:
"platform": use sys.getdefaultencoding() on the fly
"utf8(bom)": use "utf-8-sig" on Windows; "utf-8" otherwise
"utf16(bom)": use "utf-16-sig" on Windows; "utf-16" otherwise
"utf32(bom)": use "utf-32-sig" on Windows; "utf-32" otherwise
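A stdlib sketch of this kind of encoding normalization (a hypothetical re-implementation of the documented behavior, not the library's code):

```python
import sys

def get_encoding(encoding: str = "utf-8") -> str:
    """Normalize an encoding name, supporting a few nonstandard shorthands.

    'platform' resolves to the interpreter default; 'utfN(bom)' picks the
    BOM-writing '-sig' codec on Windows and plain UTF elsewhere.
    """
    enc = encoding.lower().replace("-", "")
    if enc == "platform":
        return sys.getdefaultencoding()
    for n in ("8", "16", "32"):
        if enc == f"utf{n}(bom)":
            return f"utf-{n}-sig" if sys.platform == "win32" else f"utf-{n}"
    return encoding
```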
- classmethod get_encoding_errors(errors: Optional[str]) → Optional[str]
Returns the value to pass as errors= in open.
- Raises
ValueError – If invalid
- classmethod is_binary(path: typeddfs.utils._utils.PathLike) → bool
- classmethod path_or_buff_compression(path_or_buff, kwargs) → typeddfs.file_formats.CompressionFormat
- classmethod read(path_or_buff, *, mode: str = 'r', **kwargs) → str
Reads using Pandas's get_handle. By default (unless compression= is set), infers the compression type from the filename suffix (e.g. .csv.gz).
- classmethod tmp_path(path: typeddfs.utils._utils.PathLike, extra: str = 'tmp') → pathlib.Path
- classmethod verify_can_read_files(*paths: Union[str, pathlib.Path], missing_ok: bool = False, attempt: bool = False) → None
Checks that all files can be read, to help ensure atomicity before operations.
- Parameters
*paths – The files
missing_ok – Don't raise an error if a path doesn't exist
attempt – Actually try opening
- Raises
If a path is not a file (modulo existence) or lacks the needed permission
- classmethod verify_can_write_dirs(*paths: Union[str, pathlib.Path], missing_ok: bool = False) → None
Checks that all directories can be written to, to help ensure atomicity before operations.
- Parameters
*paths – The directories
missing_ok – Don't raise an error if a path doesn't exist
- Raises
If a path is not a directory (modulo existence) or doesn't have "W" set
- classmethod verify_can_write_files(*paths: Union[str, pathlib.Path], missing_ok: bool = False, attempt: bool = False) → None
Checks that all files can be written to, to help ensure atomicity before operations.
- Parameters
*paths – The files
missing_ok – Don't raise an error if a path doesn't exist
attempt – Actually try opening
- Raises
If a path is not a file (modulo existence) or doesn't have "W" set
- classmethod write(path_or_buff, content, *, mode: str = 'w', atomic: bool = False, **kwargs) → Optional[str]
Writes using Pandas's get_handle. By default (unless compression= is set), infers the compression type from the filename suffix (e.g. .csv.gz).
typeddfs.utils.json_utils
Tools that could possibly be used outside of typed-dfs.
- class typeddfs.utils.json_utils.JsonDecoder
- from_bytes(data: ByteString) → Any
- from_str(data: str) → Any
- class typeddfs.utils.json_utils.JsonEncoder
- bytes_options: int
- default: Callable[[Any], Any]
- prep: Optional[Callable[[Any], Any]]
- str_options: int
- as_bytes(data: Any) → ByteString
- as_str(data: Any) → str
- class typeddfs.utils.json_utils.JsonUtils
- classmethod decoder() → JsonDecoder
- classmethod encoder(*fallbacks: Optional[Callable[[Any], Any]], indent: bool = True, sort: bool = False, preserve_inf: bool = True, last: Optional[Callable[[Any], Any]] = str) → JsonEncoder
Serializes to string with orjson, indenting and adding a trailing newline. Uses orjson_default() to encode more types than orjson can.
- Parameters
indent – Indent by 2 spaces
preserve_inf – Preserve infinite values with orjson_preserve_inf()
sort – Sort keys with orjson.OPT_SORT_KEYS; only for typeddfs.json_utils.JsonEncoder.as_str()
last – Last-resort option to encode a value
- classmethod misc_types_default() → Callable[[Any], Any]
- classmethod new_default(*fallbacks: Optional[Callable[[Any], Any]], first: Optional[Callable[[Any], Any]] = _misc_types_default, last: Optional[Callable[[Any], Any]] = str) → Callable[[Any], Any]
Creates a new method to be passed as default= to orjson.dumps. Tries, in order: orjson_default(), fallbacks, then str.
- Parameters
first – Try this first
fallbacks – Tries these, in order, after first, skipping any None
last – Use this as the last resort; consider str or repr
- classmethod preserve_inf(data: Any) → Any
Recursively replaces infinite float and numpy values with strings. Orjson encodes NaN, inf, and -inf as JSON null. This function converts to string as needed to preserve infinite values. Any float scalar (np.floating and float) will be replaced with a string. Any np.ndarray, whether it contains an infinite value or not, will be converted to an ndarray of strings. The returned result may still not be serializable with orjson or orjson_bytes(). Trying those methods is the best way to test for serializability.
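Because JSON has no representation for infinite values, a common workaround is to stringify them before encoding. A simplified sketch of the idea for plain Python floats (the library's version also handles numpy scalars and arrays):

```python
import math

def preserve_inf(data):
    """Recursively replace infinite floats with strings, so an encoder
    that maps inf to JSON null does not lose them."""
    if isinstance(data, float) and math.isinf(data):
        return "inf" if data > 0 else "-inf"
    if isinstance(data, dict):
        return {k: preserve_inf(v) for k, v in data.items()}
    if isinstance(data, (list, tuple)):
        return [preserve_inf(v) for v in data]
    return data
```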
typeddfs.utils.misc_utils
Misc tools for typed-dfs.
- class typeddfs.utils.misc_utils.MiscUtils
- classmethod choose_table_format(*, path: typeddfs.utils._utils.PathLike, fmt: Union[None, tabulate.TableFormat, str] = None, default: str = 'plain') → Union[str, tabulate.TableFormat]
Makes a best-effort guess of a good tabulate format from a path name.
- classmethod delete_file(path: typeddfs.utils._utils.PathLike, *, missing_ok: bool = False, alg: str = _DEFAULT_HASH_ALG, attrs_suffix: str = _DEFAULT_ATTRS_SUFFIX, rm_if_empty: bool = True) → None
Deletes a file, plus the checksum file and/or directory entry, and .attrs.json.
- Parameters
path – The path to delete
missing_ok – OK if the path does not exist (will still delete any associated paths)
alg – The checksum algorithm
attrs_suffix – The suffix for the attrs file (normally .attrs.json)
rm_if_empty – Remove the dir checksum file if it contains no additional paths
- Raises
typeddfs.df_errors.PathNotRelativeError – To avoid, try calling resolve first
- classmethod freeze(v: Any) → Any
Returns v or a hashable view of it. Note that the returned types must be hashable but might not be ordered. You can generally add these values as DataFrame elements, but you might not be able to sort on those columns.
- Parameters
v – Any value
- Returns
Either v itself, a typeddfs.utils.FrozeSet (subclass of typing.AbstractSet), a typeddfs.utils.FrozeList (subclass of typing.Sequence), or a typeddfs.utils.FrozeDict (subclass of typing.Mapping). int, float, str, np.generic, and tuple are always returned as-is.
- Raises
AttributeError – If v is not hashable and could not be converted to a FrozeSet, FrozeList, or FrozeDict, or if one of the elements of one of the above types is not hashable.
TypeError – If v is an Iterator or collections.deque
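The idea behind freeze is to wrap unhashable containers in hashable, read-only views. A toy stdlib sketch of the concept (these classes are simplified stand-ins; the real FrozeList/FrozeDict/FrozeSet live in typeddfs.utils):

```python
class FrozeList:
    """A hashable, read-only view of a list."""

    def __init__(self, items):
        self._items = tuple(items)

    def __getitem__(self, i):
        return self._items[i]

    def __len__(self):
        return len(self._items)

    def __eq__(self, other):
        return isinstance(other, FrozeList) and self._items == other._items

    def __hash__(self):
        return hash(self._items)

def freeze(v):
    """Return v, or a hashable view of it (lists and sets only, in this sketch)."""
    if isinstance(v, list):
        return FrozeList(v)
    if isinstance(v, set):
        return frozenset(v)
    return v
```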
- classmethod join_to_str(*items: Any, last: str, sep: str = ', ') → str
Joins items to something like "cat, dog, and pigeon" or "cat, dog, or pigeon".
- Parameters
items – Items to join; str(item) for item in items will be used
last – Probably "and", "or", "and/or", or "". Spaces are added/removed as needed if last is alphanumeric or "and/or", after stripping whitespace off the ends.
sep – Used to separate all words; include spaces as desired
Examples
join_to_str(["cat", "dog", "elephant"], last="and")  # cat, dog, and elephant
join_to_str(["cat", "dog"], last="and")  # cat and dog
join_to_str(["cat", "dog", "elephant"], last="", sep="/")  # cat/dog/elephant
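A stdlib sketch of this kind of serial-comma joining (a hypothetical re-implementation that reproduces the examples above, not the library's code):

```python
def join_to_str(*items, last: str, sep: str = ", ") -> str:
    """Join items like 'cat, dog, and pigeon'; with two items, 'cat and dog'."""
    words = [str(x) for x in items]
    if len(words) <= 1:
        return "".join(words)
    conjunction = last.strip()
    if len(words) == 2 and conjunction:
        return f"{words[0]} {conjunction} {words[1]}"
    head = sep.join(words[:-1])
    tail = f"{conjunction} " if conjunction else ""
    return f"{head}{sep}{tail}{words[-1]}"
```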
- classmethod plain_table_format(*, sep: str = ' ', **kwargs) → tabulate.TableFormat
Creates a simple tabulate style using a column delimiter sep.
- Returns
A tabulate TableFormat, which can be passed as a style
- classmethod table_format(fmt: str) → tabulate.TableFormat
Gets a tabulate style by name.
- Returns
A TableFormat, which can be passed as a style
typeddfs.utils.parse_utils
Misc tools for typed-dfs.
- class typeddfs.utils.parse_utils.ParseUtils
- classmethod _re_leaf(at: str, items: Mapping[str, Any]) → Generator[Tuple[str, Any], None, None]
- classmethod _un_leaf(to: MutableMapping[str, Any], items: Mapping[str, Any]) → None
- classmethod dict_to_dots(items: Mapping[str, Any]) → Mapping[str, Any]
Performs the inverse of dots_to_dict().
Example
Utils.dict_to_dots({"genus": {"species": "fruit bat"}}) == {"genus.species": "fruit bat"}
- classmethod dicts_to_toml_aot(dicts: Sequence[Mapping[str, Any]])
Makes a tomlkit Document consisting of an array of tables ("AOT").
- Parameters
dicts – A sequence of dictionaries
- Return type
A tomlkit AoT (https://github.com/sdispater/tomlkit/blob/master/tomlkit/items.py), i.e. [[array]]
- classmethod dots_to_dict(items: Mapping[str, Any]) → Mapping[str, Any]
Makes sub-dictionaries from substrings in items delimited by ".". Used for TOML.
Example
Utils.dots_to_dict({"genus.species": "fruit bat"}) == {"genus": {"species": "fruit bat"}}
See also
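A stdlib sketch of the dots_to_dict / dict_to_dots round trip (conceptual only, not the library's implementation):

```python
def dots_to_dict(items):
    """{'genus.species': 'bat'} -> {'genus': {'species': 'bat'}}"""
    out = {}
    for key, value in items.items():
        node = out
        *parents, leaf = key.split(".")
        for part in parents:
            node = node.setdefault(part, {})
        node[leaf] = value
    return out

def dict_to_dots(items, prefix=""):
    """Inverse: flatten nested dicts into dotted keys."""
    out = {}
    for key, value in items.items():
        dotted = f"{prefix}{key}"
        if isinstance(value, dict):
            out.update(dict_to_dots(value, prefix=dotted + "."))
        else:
            out[dotted] = value
    return out
```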
- classmethod property_key_escape(s: str) → str
Escapes a key in a .properties file.
- classmethod property_key_unescape(s: str) → str
Un-escapes a key in a .properties file.
- classmethod property_value_escape(s: str) → str
Escapes a value in a .properties file.
- classmethod property_value_unescape(s: str) → str
Un-escapes a value in a .properties file.
- classmethod strip_control_chars(s: str) → str
Strips all characters under the Unicode "Cc" category.
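Stripping the "Cc" category can be done directly with the standard library's unicodedata; a minimal sketch:

```python
import unicodedata

def strip_control_chars(s: str) -> str:
    """Remove all characters in the Unicode 'Cc' (control) category,
    which includes tabs, newlines, and NUL."""
    return "".join(c for c in s if unicodedata.category(c) != "Cc")
```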
typeddfs.utils.sort_utils
Tools for sorting.
- class typeddfs.utils.sort_utils.SortUtils
- classmethod _ns_info_from_int_flag(val: int) → NatsortFlagsAndValue
- classmethod all_natsort_flags() → Mapping[str, int]
Returns all flags defined by natsort, including combined and default flags. Combined flags are, e.g., ns_enum.ns.REAL = ns_enum.ns.FLOAT | ns_enum.ns.SIGNED. Default flags are, e.g., ns_enum.ns.UNSIGNED.
See also
std_natsort_flags()
- Returns
A mapping from flag name to int value
- classmethod core_natsort_flags() → Mapping[str, int]
Returns natsort flags that are not combinations or defaults.
See also
- Returns
A mapping from flag name to int value
- classmethod exact_natsort_alg(flags: Union[None, int, Collection[Union[int, str]]]) → NatsortFlagsAndValue
Gets the flag names and combined alg= argument for natsort.
Examples
exact_natsort_alg({"REAL"}) == ({"FLOAT", "SIGNED"}, ns.FLOAT | ns.SIGNED)
exact_natsort_alg({}) == ({}, 0)
exact_natsort_alg(ns.LOWERCASEFIRST) == ({"LOWERCASEFIRST"}, ns.LOWERCASEFIRST)
exact_natsort_alg({"localenum", "numafter"}) == ({"LOCALENUM", "NUMAFTER"}, ns.LOCALENUM | ns.NUMAFTER)
- Parameters
flags – Can be either a single integer alg argument, or a set of flag ints and/or names in natsort.ns
- Returns
A tuple of the set of flag names and the corresponding input to natsorted. Only uses standard flag names, never the "combined" ones. (E.g. exact_natsort_alg({"REAL"}) will return ({"FLOAT", "SIGNED"}, ns.FLOAT | ns.SIGNED).)
- classmethod guess_natsort_alg(dtype: Type[Any]) → NatsortFlagsAndValue
Guesses a good natsort flag for the dtype.
Here are some specifics:
integers – INT and SIGNED
floating-point – FLOAT and SIGNED
strings – COMPATIBILITYNORMALIZE and GROUPLETTERS
datetime – GROUPLETTERS (only affects "Z" vs. "z"; shouldn't matter)
- Parameters
dtype – Probably from pd.Series.dtype
- Returns
A tuple of (set of flags, int) – see exact_natsort_alg()
- classmethod natsort(lst: Iterable[T], dtype: Type[T], *, alg: Union[None, int, Set[str]] = None, reverse: bool = False) → Sequence[T]
Performs a natural sort consistent with the type dtype. Uses natsort.
See also
- Parameters
lst – A sequence of things to sort
dtype – The type; must be a subclass of each element in lst
alg – A specific natsort algorithm or set of flags
reverse – Sort in reverse (e.g. Z to A or 9 to 1)
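Natural sorting orders embedded numbers numerically rather than lexically, so "item2" comes before "item10". A minimal stdlib sketch of a natural-sort key, independent of the natsort package (which handles many more cases, such as signs, floats, and locales):

```python
import re

def natsort_key(s: str):
    """Split into digit and non-digit runs; compare digit runs numerically."""
    return [int(part) if part.isdigit() else part.lower()
            for part in re.split(r"(\d+)", s)]

def natsorted_simple(items, reverse: bool = False):
    return sorted(items, key=natsort_key, reverse=reverse)
```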
Package Contents
Submodules
typeddfs._core_dfs
Module Contents
- class typeddfs._core_dfs.CoreDf(data=None, index=None, columns=None, dtype=None, copy=False)
An abstract Pandas DataFrame subclass with additional methods.
- abstract classmethod new_df(**kwargs) → __qualname__
Creates a new, somewhat arbitrary DataFrame of this type. Calling this with no arguments should always be supported.
- Parameters
**kwargs – These should be narrowed by the overriding method as needed.
- Raises
UnsupportedOperationError – Can be raised if a valid DataFrame is too difficult to create.
InvalidDfError – May be raised if the type requires specific constraints and did not overload this method to account for them. While programmers using the type should be aware of this possibility, consuming code, in general, should assume that new_df will always work.
- vanilla() → pandas.DataFrame
Makes a copy that's a normal Pandas DataFrame.
- Returns
A shallow copy with its __class__ set to pd.DataFrame
typeddfs._entries
Convenient code for import.
Module Contents
- typeddfs._entries.affinity_matrix
- typeddfs._entries.example
- typeddfs._entries.matrix
- typeddfs._entries.typed
- typeddfs._entries.untyped
- typeddfs._entries.wrap
- class typeddfs._entries.FinalDf(data=None, index=None, columns=None, dtype=None, copy=False)
An untyped DataFrame meant for general use.
- class typeddfs._entries.TypedDfs
The only thing you need to import from typeddfs. Contains static factory methods to build new DataFrame subclasses. In particular, see: typed(), untyped(), matrix(), and affinity_matrix().
- Checksums
- ClashError
- CompressionFormat
- FileFormat
- FilenameSuffixError
- FinalDf
- FrozeDict
- FrozeList
- FrozeSet
- InvalidDfError
- MissingColumnError
- NoValueError
- NonStrColumnError
- NotSingleColumnError
- UnexpectedColumnError
- UnexpectedIndexNameError
- UnsupportedOperationError
- Utils
- ValueNotUniqueError
- VerificationFailedError
- _logger
- classmethod affinity_matrix(name: str, doc: Optional[str] = None) → typeddfs.builders.AffinityMatrixDfBuilder
Creates a new subclass of typeddfs.matrix_dfs.AffinityMatrixDf.
- Parameters
name – The name that will be used for the new class
doc – The docstring for the new class
- Returns
A builder instance (builder pattern) to be used with chained calls
- classmethod example() → Type[typeddfs.typed_dfs.TypedDf]
Creates a new example TypedDf subclass. The class has:
a required index "key"
a required column "value"
a reserved column "note"
no other columns
- Returns
The created class
- classmethod matrix(name: str, doc: Optional[str] = None) → typeddfs.builders.MatrixDfBuilder
Creates a new subclass of typeddfs.matrix_dfs.MatrixDf.
- Parameters
name – The name that will be used for the new class
doc – The docstring for the new class
- Returns
A builder instance (builder pattern) to be used with chained calls
- classmethod typed(name: str, doc: Optional[str] = None) → typeddfs.builders.TypedDfBuilder
Creates a new type with flexible requirements. The class will enforce constraints and subclass typeddfs.typed_dfs.TypedDf.
- Parameters
name – The name that will be used for the new class
doc – The docstring for the new class
- Returns
A builder instance (builder pattern) to be used with chained calls
Example
TypedDfs.typed("MyClass").require("name", index=True).build()
- classmethod untyped(name: str, doc: Optional[str] = None) → Type[typeddfs.untyped_dfs.UntypedDf]
Creates a new subclass of UntypedDf. The returned class will not enforce constraints but will have some extra methods. In general, typed() should be preferred because it has more consistent behavior, especially for IO.
- Parameters
name – The name that will be used for the new class
doc – The docstring for the new class
- Returns
A class instance
Example
MyClass = TypedDfs.untyped("MyClass")
typeddfs._pretty_dfs
Defines a DataFrame with simple extra functions like column_names.
Module Contents
- class typeddfs._pretty_dfs.PrettyDf(data=None, index: Axes | None = None, columns: Axes | None = None, dtype: Dtype | None = None, copy: bool | None = None)
A DataFrame with an overridden _repr_html_ and some simple additional methods.
- property _constructor_expanddim
- column_names() → List[str]
Returns the list of columns.
- Returns
A Python list
- index_names() → List[str]
Returns the list of index names. Unlike self.index.names, returns [] instead of [None] if there is no index.
- Returns
A Python list
- is_multindex() → bool
Returns whether the index is a pd.MultiIndex.
- n_columns() → int
Returns the number of columns.
- n_indices() → int
Returns the number of index names.
- n_rows() → int
Returns the number of rows.
typeddfs.abs_dfs
Defines a low-level DataFrame subclass.
It overrides a lot of methods to auto-change the type back to cls.
Module Contents
- class typeddfs.abs_dfs.AbsDf(data=None, index=None, columns=None, dtype=None, copy=False)
An abstract Pandas DataFrame subclass with additional methods.
- classmethod _check(df) → None
Should raise a typeddfs.df_errors.InvalidDfError or subclass for issues.
- classmethod can_read() → Set[typeddfs.file_formats.FileFormat]
Returns all formats that can be read using read_file. Some depend on the availability of optional packages. The lines format (.txt, .lines, etc.) is only included if this DataFrame can support only 1 column+index. See typeddfs.file_formats.FileFormat.can_read().
- classmethod can_write() → Set[typeddfs.file_formats.FileFormat]
Returns all formats that can be written using write_file. Some depend on the availability of optional packages. The lines format (.txt, .lines, etc.) is only included if this DataFrame type can support only 1 column+index. See typeddfs.file_formats.FileFormat.can_write().
- classmethod from_records(*args, **kwargs) → __qualname__
Converts a structured or record ndarray to a DataFrame.
Creates a DataFrame object from a structured ndarray, sequence of tuples or dicts, or DataFrame.
- Parameters
data (structured ndarray, sequence of tuples or dicts, or DataFrame) – Structured input data.
index (str, list of fields, array-like) – Field of array to use as the index, alternately a specific set of input labels to use.
exclude (sequence, default None) – Columns or fields to exclude.
columns (sequence, default None) – Column names to use. If the passed data do not have names associated with them, this argument provides names for the columns. Otherwise this argument indicates the order of the columns in the result (any names not found in the data will become all-NA columns).
coerce_float (bool, default False) – Attempt to convert values of non-string, non-numeric objects (like decimal.Decimal) to floating point; useful for SQL result sets.
nrows (int, default None) – Number of rows to read if data is an iterator.
- Return type
DataFrame
See also
DataFrame.from_dict – DataFrame from dict of array-like or dicts.
DataFrame – DataFrame object creation using constructor.
Examples
Data can be provided as a structured ndarray:

>>> data = np.array([(3, 'a'), (2, 'b'), (1, 'c'), (0, 'd')],
...                 dtype=[('col_1', 'i4'), ('col_2', 'U1')])
>>> pd.DataFrame.from_records(data)
   col_1 col_2
0      3     a
1      2     b
2      1     c
3      0     d

Data can be provided as a list of dicts:

>>> data = [{'col_1': 3, 'col_2': 'a'},
...         {'col_1': 2, 'col_2': 'b'},
...         {'col_1': 1, 'col_2': 'c'},
...         {'col_1': 0, 'col_2': 'd'}]
>>> pd.DataFrame.from_records(data)
   col_1 col_2
0      3     a
1      2     b
2      1     c
3      0     d

Data can be provided as a list of tuples with corresponding columns:

>>> data = [(3, 'a'), (2, 'b'), (1, 'c'), (0, 'd')]
>>> pd.DataFrame.from_records(data, columns=['col_1', 'col_2'])
   col_1 col_2
0      3     a
1      2     b
2      1     c
3      0     d
- classmethod read_file(path: Union[pathlib.Path, str], *, file_hash: Optional[bool] = None, dir_hash: Optional[bool] = None, hex_hash: Optional[str] = None, attrs: Optional[bool] = None, storage_options: Optional[pandas._typing.StorageOptions] = None) __qualname__ ο
Reads from a file (or possibly URL), guessing the format from the filename extension. Delegates to the
read_*
functions of this class.You can always write and then read back to get the same dataframe. .. code-block:
# df is any DataFrame from typeddfs # path can use any suffix df.write_file(path)) df.read_file(path)
Text files always allow encoding with .gz, .zip, .bz2, or .xz.
- Supports:
.csv, .tsv, or .tab
.json
.xml
.feather
.parquet or .snappy
.h5 or .hdf
.xlsx, .xls, .odf, etc.
.toml
.properties
.ini
.fxf (fixed-width)
.flexwf (fixed-but-unspecified-width with an optional delimiter)
.txt, .lines, or .list
See also
- Parameters
path – Only path-like strings or pathlib objects are supported, not buffers (because we need a filename).
file_hash – Check against a hash file specific to this file (e.g. <path>.sha1)
dir_hash – Check against a per-directory hash file
hex_hash – Check against this hex-encoded hash
attrs – Set dataset attributes/metadata (pd.DataFrame.attrs) from a JSON file. If True, uses typeddfs.df_typing.DfTyping.attrs_suffix. If a str or Path, uses that file. If None or False, does not set.
storage_options – Passed to Pandas
- Returns
An instance of this class
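The file_hash and hex_hash checks follow the familiar sha256sum sidecar convention: a small text file holding the hex digest next to the data file. As a rough stdlib illustration of the idea (the helper names are invented; this is not typeddfs' implementation):

```python
import hashlib
import tempfile
from pathlib import Path

def write_hash_file(path: Path, algorithm: str = "sha256") -> Path:
    """Write a sha256sum-style sidecar: '<hex digest> *<filename>'."""
    digest = hashlib.new(algorithm, path.read_bytes()).hexdigest()
    hash_path = Path(str(path) + "." + algorithm)  # e.g. myfile.csv.sha256
    hash_path.write_text(f"{digest} *{path.name}\n", encoding="utf-8")
    return hash_path

def verify_hash_file(path: Path, algorithm: str = "sha256") -> bool:
    """Re-hash the file and compare against the sidecar's hex digest."""
    expected = Path(str(path) + "." + algorithm).read_text(encoding="utf-8").split()[0]
    actual = hashlib.new(algorithm, path.read_bytes()).hexdigest()
    return expected == actual

data_file = Path(tempfile.mkdtemp()) / "myfile.csv"
data_file.write_text("species,person\nBlue Jay,Kerri Johnson\n", encoding="utf-8")
write_hash_file(data_file)
```

Reading then amounts to running the verification before parsing and raising if it fails, which is the behavior the file_hash flag enables.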
- classmethod read_url(url: str)
Reads from a URL, guessing the format from the filename extension. Delegates to the read_* functions of this class.
See also
- Returns
An instance of this class
- write_file(path: Union[pathlib.Path, str], *, overwrite: bool = True, mkdirs: bool = False, file_hash: Optional[bool] = None, dir_hash: Optional[bool] = None, attrs: Optional[bool] = None, storage_options: Optional[pandas._typing.StorageOptions] = None, atomic: bool = False) Optional[str]
Writes to a file, guessing the format from the filename extension. Delegates to the to_* functions of this class (e.g. to_csv). Only includes file formats that can be read back in with corresponding read_* methods.
- Supports, where text formats permit optional .gz, .zip, .bz2, or .xz:
.csv, .tsv, or .tab
.json
.feather
.fwf (fixed-width)
.flexwf (columns aligned but using a delimiter)
.parquet or .snappy
.h5, .hdf, or .hdf5
.xlsx, .xls, and other variants for Excel
.odt and .ods (OpenOffice)
.xml
.toml
.ini
.properties
.pkl and .pickle
.txt, .lines, or .list; see
to_lines()
andread_lines()
See also
- Parameters
path – Only path-like strings or pathlib objects are supported, not buffers (because we need a filename).
overwrite – If False, complain if the file already exists
mkdirs – Make the directory and parents if they do not exist
file_hash – Write a hash for this file. The filename will be path+"."+algorithm. If None, chooses according to self.get_typing().io.hash_file.
dir_hash – Append a hash for this file into a list. The filename will be the directory name suffixed by the algorithm (i.e. path.parent/(path.parent.name+"."+algorithm)). If None, chooses according to self.get_typing().io.hash_dir.
attrs – Write dataset attributes/metadata (pd.DataFrame.attrs) to a JSON file. Uses typeddfs.df_typing.DfTyping.attrs_suffix. If None, chooses according to self.get_typing().io.use_attrs.
storage_options – Passed to Pandas
atomic – Write to a temporary file, then rename
- Returns
Whatever the corresponding pd.to_* method returns. This is usually either str or None.
- Raises
InvalidDfError – If the DataFrame is not valid for this type
ValueError – If the type of a column or index name is non-str
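The atomic flag means a crash mid-write cannot leave a half-written file: the data goes to a temporary file first, then a rename (atomic on POSIX within one filesystem) puts it in place. A minimal stdlib sketch of that pattern, not typeddfs internals:

```python
import os
import tempfile
from pathlib import Path

def atomic_write_text(path: Path, text: str) -> None:
    # Write to a temp file in the same directory (same filesystem),
    # then atomically rename over the destination.
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            f.write(text)
            f.flush()
            os.fsync(f.fileno())  # make sure bytes hit disk before the rename
        os.replace(tmp_name, path)  # atomic on POSIX; overwrites if present
    except BaseException:
        os.unlink(tmp_name)
        raise

target = Path(tempfile.mkdtemp()) / "out.csv"
atomic_write_text(target, "a,b\n1,2\n")
```

Readers racing with the writer therefore see either the old file or the new one, never a partial write.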
typeddfs.base_dfs
Defines the superclasses of the types TypedDf
and UntypedDf
.
Module Contents
- class typeddfs.base_dfs.BaseDf(data=None, index=None, columns=None, dtype=None, copy=False)
An abstract DataFrame type that has a way to convert and de-convert. A subclass of typeddfs.abs_dfs.AbsDf, it has methods convert() and vanilla(), but no implementation or enforcement of typing.
- __getitem__(item)
Finds an index level or column, returning the Series, DataFrame, or value. Note that typeddfs forbids duplicate column names, as well as column names and index levels sharing names.
- classmethod convert(df: pandas.DataFrame)
Converts a vanilla Pandas DataFrame to cls.
Note
The argument df will have its __class__ changed to cls but will otherwise be unaffected.
- Returns
A copy
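The note above, retyping via __class__ rather than re-running a constructor, can be mimicked with a plain pandas subclass. This is a hedged sketch of the idea, not the library's actual code; MyDf is a hypothetical stand-in for a typeddfs subclass:

```python
import pandas as pd

class MyDf(pd.DataFrame):
    """A toy stand-in for a typeddfs DataFrame subclass."""

    @classmethod
    def convert(cls, df: pd.DataFrame) -> "MyDf":
        df = df.copy()      # work on a copy so the caller's frame is untouched
        df.__class__ = cls  # retype without re-running the constructor
        return df

plain = pd.DataFrame({"species": ["Blue Jay"], "person": ["Kerri Johnson"]})
typed = MyDf.convert(plain)
```

Because only __class__ changes, the data, index, and columns carry over exactly.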
- classmethod of(df, *args, keys: Optional[Iterable[str]] = None, **kwargs)
Construct or convert a DataFrame, returning this type. Delegates to convert() for DataFrames, or tries first constructing a DataFrame by calling pd.DataFrame(df). If df is a list (Iterable) of DataFrames, will call pd.concat on them; for this, ignore_index=True is passed. If the list is empty, will return new_df().
May be overridden to accept more types, such as a string for database lookup. For example, Customers.of("john") could return a DataFrame for a database customer, or return the result of Customers.convert(...) if a DataFrame instance is provided. You may add and process keyword arguments, but keyword args for pd.DataFrame.__init__ should be passed along to that constructor.
- Parameters
df – A DataFrame, list of DataFrames, or something to be passed to pd.DataFrame.
keys – Labels for the DataFrames (if passed a sequence of them) to use as attr keys; if None, attrs will be empty ({}) if concatenating
kwargs – Passed to pd.DataFrame.__init__; can be handled directly by this method for specialized construction, database lookup, etc.
- Returns
A new DataFrame; see convert() for more info.
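The list-of-DataFrames branch amounts to pd.concat with ignore_index=True. In plain pandas (the example data is illustrative):

```python
import pandas as pd

parts = [
    pd.DataFrame({"species": ["Blue Jay"], "person": ["Kerri Johnson"]}),
    pd.DataFrame({"species": ["Mourning Dove"], "person": ["Kerri Johnson"]}),
]
# What of() does when handed a list of DataFrames: concatenate
# and discard the per-part indices in favor of a fresh 0..n-1 index.
combined = pd.concat(parts, ignore_index=True)
```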
- retype()
Calls self.__class__.convert on this DataFrame. This is useful to call at the end of a chain of DataFrame functions, where the type is preserved but the DataFrame may no longer be valid under this type's rules. This can occur because, for performance, typeddfs does not call convert on most calls.
Examples
df = MyDf(data).apply(my_fn, axis=1).retype() # make sure it's still valid
df = MyDf(data).groupby(...).retype() # we maybe changed the index; fix it
- Returns
A copy
typeddfs.builders
Defines a builder pattern for TypedDf
.
Module Contents
- class typeddfs.builders.AffinityMatrixDfBuilder(name: str, doc: Optional[str] = None)ο
A builder pattern for typeddfs.matrix_dfs.AffinityMatrixDf.
Constructs a new builder.
- Parameters
name – The name of the resulting class
doc – The docstring of the resulting class
- Raises
TypeError – If name or doc is not a string
- build() Type[typeddfs.matrix_dfs.AffinityMatrixDf] ο
Builds this type.
- Returns
A newly created subclass of typeddfs.matrix_dfs.AffinityMatrixDf.
- Raises
typeddfs.df_errors.ClashError – If there is a contradiction in the specification
typeddfs.df_errors.FormatInsecureError – If hash() set an insecure hash format and secure() was set.
Note
Copies, so this builder can be used to create more types without interference.
- class typeddfs.builders.MatrixDfBuilder(name: str, doc: Optional[str] = None)ο
A builder pattern for typeddfs.matrix_dfs.MatrixDf.
Constructs a new builder.
- Parameters
name – The name of the resulting class
doc – The docstring of the resulting class
- Raises
TypeError – If name or doc is not a string
- _check_final() None ο
- build() Type[typeddfs.matrix_dfs.MatrixDf] ο
Builds this type.
- Returns
A newly created subclass of typeddfs.matrix_dfs.MatrixDf.
- Raises
ClashError – If there is a contradiction in the specification
FormatInsecureError – If hash() set an insecure hash format and secure() was set.
Note
Copies, so this builder can be used to create more types without interference.
- Raises
DfTypeConstructionError – For some errors
- dtype(dt: Type[Any]) __qualname__ ο
Sets the type of value for all matrix elements. This should almost certainly be a numeric type, and it must be ordered.
- Returns
This builder for chaining
- class typeddfs.builders.TypedDfBuilder(name: str, doc: Optional[str] = None)ο
A builder pattern for
typeddfs.typed_dfs.TypedDf.
Example
TypedDfBuilder("MyDf").require("name").build()
Constructs a new builder.
- Parameters
name – The name of the resulting class
doc – The docstring of the resulting class
- Raises
TypeError – If name or doc is not a string
- _check(names: Sequence[str]) None ο
- _check_final() None ο
Final method in the chain. Creates a new subclass of TypedDf.
- Returns
The new class
- Raises
typeddfs.df_errors.ClashError β If there is a contradiction in the specification
- build() Type[typeddfs.typed_dfs.TypedDf] ο
Builds this type.
- Returns
A newly created subclass of typeddfs.typed_dfs.TypedDf.
- Raises
DfTypeConstructionError β If there is a contradiction in the specification
Note
Copies, so this builder can be used to create more types without interference.
- drop(*names: str) __qualname__ ο
Adds columns (and index names) that should be automatically dropped.
- Parameters
names – Varargs list of names
- Returns
This builder for chaining
- require(*names: str, dtype: Optional[Type] = None, index: bool = False) __qualname__ ο
Requires column(s) or index name(s). DataFrames will fail if they are missing any of these.
- Parameters
names – A varargs list of columns or index names
dtype – An automatically applied transformation of the column values using .astype
index – If True, put these in the index
- Returns
This builder for chaining
- Raises
typeddfs.df_errors.ClashError – If a name was already added or is forbidden
- reserve(*names: str, dtype: Optional[Type] = None, index: bool = False) __qualname__ ο
Reserves column(s) or index name(s) for optional inclusion. A reserved column will be accepted even if strict is set. A reserved index will be accepted even if strict is set; additionally, it will be automatically moved from the list of columns to the list of index names.
- Parameters
names – A varargs list of columns or index names
dtype – An automatically applied transformation of the column values using .astype
index – If True, put these in the index
- Returns
This builder for chaining
- Raises
typeddfs.df_errors.ClashError – If a name was already added or is forbidden
- series_names(index: Union[None, bool, str] = False, columns: Union[None, bool, str] = False)
Sets pd.DataFrame.index.name and/or pd.DataFrame.columns.name. Valid values are False to not set (default), None to set to None, or a string to set to.
- Returns
This builder for chaining
- strict(index: bool = True, cols: bool = True) __qualname__ ο
Disallows any columns or index names not in the lists of reserved/required.
- Parameters
index – Disallow additional names in the index
cols – Disallow additional columns
- Returns
This builder for chaining
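What require/reserve/strict accumulate before build() creates the class can be imitated in a few lines of plain Python. This is a toy sketch of the builder pattern, not typeddfs internals; MiniBuilder and its check() method are invented for illustration:

```python
class MiniBuilder:
    """Toy builder: collects column rules, then builds a checker class."""

    def __init__(self, name: str):
        self._name = name
        self._required: list[str] = []
        self._reserved: list[str] = []
        self._strict = False

    def require(self, *names: str) -> "MiniBuilder":
        self._required += names
        return self  # return self so calls chain

    def reserve(self, *names: str) -> "MiniBuilder":
        self._reserved += names
        return self

    def strict(self) -> "MiniBuilder":
        self._strict = True
        return self

    def build(self) -> type:
        required = tuple(self._required)
        allowed = set(self._required) | set(self._reserved)
        strict = self._strict

        def check(columns):
            # Required names must be present; under strict, nothing else may be.
            missing = [c for c in required if c not in columns]
            if missing:
                raise ValueError(f"missing column(s): {missing}")
            if strict:
                extra = [c for c in columns if c not in allowed]
                if extra:
                    raise ValueError(f"unexpected column(s): {extra}")

        return type(self._name, (), {"check": staticmethod(check)})

Sightings = (
    MiniBuilder("Sightings")
    .require("species", "person", "date")
    .reserve("notes")
    .strict()
    .build()
)
```

Each rule method returns the builder itself, which is what makes the chained declaration style in the examples above work.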
typeddfs.datasets
Near-replica of example from the readme.
Module Contents
- class typeddfs.datasets.ExampleDfsο
DataFrames derived from Seaborn and other sources.
- anagrams
- anscombe
- attention
- brain_networks
- car_crashes
- diamonds
- dots
- exercise
- flights
- fmri
- gammas
- geyser
- iris
- mpg
- penguins
- planets
- taxis
- tips
- titanic
- class typeddfs.datasets.LazyDf(name: str, source: str, clazz: Type[T], _df: Optional[T])
A typeddfs.abs_dfs.AbsDf that is lazily loaded from a source. Create normally via from_source(). Create with from_df() to wrap an extant DataFrame into a LazyDf.
Example
lazy = LazyDf.from_source("https://google.com/dataframe.csv")
- property clazz Type[T] ο
- property df T ο
- classmethod from_source(source: str, clazz: Type[S] = PlainTypedDf, name: Optional[str] = None) LazyDf[S] ο
- property name str ο
typeddfs.df_errors
Exceptions used by typeddfs.
Module Contents
- exception typeddfs.df_errors.ClashError(*args, keys: Optional[AbstractSet[str]] = None)ο
Duplicate columns or other keys were added.
- keysο
The clashing name(s)
- exception typeddfs.df_errors.DfTypeConstructionErrorο
An inconsistency prevents creating the DataFrame type.
- exception typeddfs.df_errors.FilenameSuffixError(*args, key: Optional[str] = None, filename: Optional[str] = None)ο
A filename extension was not recognized.
- keyο
The unrecognized suffix
- filenameο
The bad filename
- exception typeddfs.df_errors.FormatDiscouragedError(*args, key: Optional[str] = None)ο
A requested format is not recommended.
- keyο
The problematic format name
- exception typeddfs.df_errors.FormatInsecureError(*args, key: Optional[str] = None)ο
A requested format is less secure than required or requested.
- keyο
The problematic format name
- exception typeddfs.df_errors.HashAlgorithmMissingError(*args, key: Optional[str] = None)ο
The hash algorithm was not found in
hashlib
.- keyο
The missing hash algorithm
- exception typeddfs.df_errors.HashContradictsExistingError(*args, key: Optional[str] = None, original: Optional[str] = None, new: Optional[str] = None)ο
A hash for the filename already exists in the directory hash list, but they differ.
- keyο
The filename (excluding parents)
- originalο
Hex hash found listed for the file
- newο
Hex hash that was to be written
- filenameο
The filename of the listed file
- exception typeddfs.df_errors.HashDidNotValidateError(*args, actual: Optional[str] = None, expected: Optional[str] = None)ο
The hashes did not validate (expected != actual).
- actualο
The actual hex-encoded hash
- expectedο
The expected hex-encoded hash
- exception typeddfs.df_errors.HashEntryExistsError(*args, key: Optional[str] = None)ο
The file is already listed in the hash dir, and it cannot be overwritten.
- keyο
The existing hash dir path
- exception typeddfs.df_errors.HashErrorο
Something went wrong with hash file writing or reading.
- exception typeddfs.df_errors.HashExistsError(*args, key: Optional[str] = None, original: Optional[str] = None, new: Optional[str] = None)ο
A hash for the filename already exists in the directory hash list.
- keyο
The filename (excluding parents)
- originalο
Hex hash found listed for the file
- newο
Hex hash that was to be written
- filenameο
The filename of the listed file
- exception typeddfs.df_errors.HashFileExistsError(*args, key: Optional[str] = None)ο
The hash file already exists and cannot be overwritten.
- keyο
The existing hash file path or filename
- exception typeddfs.df_errors.HashFileInvalidError(*args, key: Union[None, pathlib.PurePath, str] = None)ο
The hash file could not be parsed.
- keyο
The path to the hash file
- exception typeddfs.df_errors.HashFileMissingError(*args, key: Optional[str] = None)ο
The hash file does not exist.
- keyο
The path or filename of the file corresponding to the expected hash file(s)
- exception typeddfs.df_errors.HashFilenameMissingError(*args, key: Optional[str] = None)ο
The filename was not found listed in the hash file.
- keyο
The filename
- exception typeddfs.df_errors.HashVerificationErrorο
Something went wrong when validating a hash.
- exception typeddfs.df_errors.HashWriteErrorο
Something went wrong when writing a hash file.
- exception typeddfs.df_errors.InvalidDfErrorο
A general typing failure of typeddfs.
- exception typeddfs.df_errors.LengthMismatchError(*args, key: Optional[str] = None, lengths: AbstractSet[int])ο
The lengths of at least two lists do not match.
- keyο
The key used for lookup
- lengthsο
The lengths
- exception typeddfs.df_errors.MissingColumnError(*args, key: Optional[str] = None)ο
A required column is missing.
- keyο
The name of the missing column
- exception typeddfs.df_errors.MultipleHashFilenamesError(*args, key: Optional[str] = None)ο
There are multiple filenames listed in the hash file where only 1 was expected.
- keyο
The filename with duplicate entries
- exception typeddfs.df_errors.NoValueError(*args, key: Optional[str] = None)ο
No value because the collection is empty.
- keyο
The key used for lookup
- exception typeddfs.df_errors.NonStrColumnErrorο
A column name is not a string.
- exception typeddfs.df_errors.NotSingleColumnErrorο
A DataFrame needs to contain exactly 1 column.
- exception typeddfs.df_errors.PathNotRelativeError(*args, key: Optional[str] = None)ο
The filename is not relative to the hash dir.
- keyο
The filename
- exception typeddfs.df_errors.ReadPermissionsError(*args, key: Optional[str] = None)ο
Couldn't read from a file.
- exception typeddfs.df_errors.RowColumnMismatchError(*args, rows: Optional[Sequence[str]] = None, columns: Optional[Sequence[str]] = None)ο
The row and column names differ.
- rowsο
The row names, in order
- columnsο
The column names, in order
- exception typeddfs.df_errors.UnexpectedColumnError(*args, key: Optional[str] = None)ο
An extra/unrecognized column is present.
- keyο
The name of the unexpected column
- exception typeddfs.df_errors.UnexpectedIndexNameError(*args, key: Optional[str] = None)ο
An extra/unrecognized index level is present.
- keyο
The name of the unexpected index level
- exception typeddfs.df_errors.UnsupportedOperationErrorο
Something could not be performed, in general.
- exception typeddfs.df_errors.ValueNotUniqueError(*args, key: Optional[str] = None, values: Optional[AbstractSet[str]] = None)ο
There is more than 1 unique value.
- keyο
The key used for lookup
- valuesο
The set of values
- exception typeddfs.df_errors.VerificationFailedError(*args, key: Optional[str] = None)ο
A custom typing verification failed.
- keyο
The key name of the verification that failed
- exception typeddfs.df_errors.WritePermissionsError(*args, key: Optional[str] = None)ο
Couldn't write to a file.
typeddfs.df_typing
Information about how DataFrame subclasses should be handled.
Module Contents
- typeddfs.df_typing.FINAL_DF_TYPINGο
- typeddfs.df_typing.FINAL_IO_TYPINGο
- class typeddfs.df_typing.DfTypingο
Contains all information about how to type a DataFrame subclass.
- _auto_dtypes : Optional[Mapping[str, Type[Any]]]
- _column_series_name : Union[bool, None, str]
- _columns_to_drop : Optional[Set[str]]
- _index_series_name : Union[bool, None, str]
- _io_typing : IoTyping
- _more_columns_allowed : bool = True
- _more_index_names_allowed : bool = True
- _order_dclass : bool = True
- _post_processing : Optional[Callable[[T], Optional[T]]]
- _required_columns : Optional[Sequence[str]]
- _required_index_names : Optional[Sequence[str]]
- _reserved_columns : Optional[Sequence[str]]
- _reserved_index_names : Optional[Sequence[str]]
- _value_dtype : Optional[Type[Any]]
- _verifications : Optional[Sequence[Callable[[T], Union[None, bool, str]]]]
- property auto_dtypes Mapping[str, Type[Any]]
A mapping from column/index names to the expected dtype. These are used via pd.Series.astype for automatic conversion. An error will be raised if an astype call fails. Note that Pandas frequently just does not perform the conversion, rather than raising an error. The keys should be contained in known_names, but this is not strictly required.
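The astype-based coercion described here is plain pandas. For instance, a string column declared with dtype=float would be converted roughly like this (a minimal pandas illustration with invented data, not library internals):

```python
import pandas as pd

df = pd.DataFrame({"species": ["Blue Jay"], "value": ["3.5"]})
auto_dtypes = {"value": float}  # like require("value", dtype=float)

for column, dtype in auto_dtypes.items():
    # astype raises if a value cannot be converted at all
    df[column] = df[column].astype(dtype)
```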
- property column_series_name Union[bool, None, str] ο
Intelligently returns
df.columns.name
. Returns a value that will be forced intodf.columns.name
on callingconvert
. IfNone
, will setdf.columns.name = None
. IfFalse
, will not set. (True
is treated the same asNone
.)
- property columns_to_drop Set[str]
Returns the list of columns that are automatically dropped by convert. This does NOT include "level_0" and "index", which are ALWAYS dropped.
- property index_series_name Union[bool, None, str] ο
Intelligently returns
df.index.name
. Returns a value that will be forced intodf.index.name
on callingconvert
, only if the DataFrame is multi-index. IfNone
, will setdf.index.name = None
ifdf.index.names != [None]
. IfFalse
, will not set. (True
is treated the same asNone
.)
- property is_strict bool
Returns True if this type disallows unspecified index levels or columns.
- property known_column_names Sequence[str] ο
Returns all columns that are required or reserved. The sort order positions required columns first.
- property known_index_names Sequence[str]
Returns all index levels that are required or reserved. The sort order positions required levels first.
- property known_names Sequence[str] ο
Returns all index and column names that are required or reserved. The sort order is: required index, reserved index, required columns, reserved columns.
- property more_columns_allowed bool ο
Returns whether the DataFrame allows columns that are not reserved or required.
- property more_indices_allowed bool ο
Returns whether the DataFrame allows index levels that are neither reserved nor required.
- property order_dataclass bool ο
Whether the corresponding dataclass can be sorted (has
__lt__
).
- property post_processing Optional[Callable[[T], Optional[T]]] ο
A function to be called at the final stage of
convert
. It is called immediately beforeverifications
are checked. The function takes a copy of the inputBaseDf
and returns a new copy.Note
Although a copy is passed as input, the function should not modify it. Technically, doing so will cause problems only if the DataFrameβs internal values are modified. The value passed is a shallow copy (see
pd.DataFrame.copy
).
- property required_columns Sequence[str] ο
Returns the list of required column names.
- property required_index_names Sequence[str] ο
Returns the list of required column names.
- property required_names Sequence[str] ο
Returns all index and column names that are required. The sort order is: required index, required columns.
- property reserved_columns Sequence[str] ο
Returns the list of reserved (optional) column names.
- property reserved_index_names Sequence[str] ο
Returns the list of reserved (optional) index levels.
- property reserved_names Sequence[str] ο
Returns all index and column names that are not required. The sort order is: reserved index, reserved columns.
- property value_dtype Optional[Type[Any]] ο
A type for βvaluesβ in a simple DataFrame. Typically numeric.
- property verifications Sequence[Callable[[T], Union[None, bool, str]]] ο
Additional requirements for the DataFrame to be conformant.
- Returns
A sequence of conditions that map the DF to None or True if the condition passes, or False or the string of an error message if it fails
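The None/True = pass, False/str = fail contract can be applied with a small helper. This is a sketch of how such verifications might be interpreted, not typeddfs' code; run_verifications is a hypothetical name:

```python
from typing import Callable, Sequence, Union

Verification = Callable[[object], Union[None, bool, str]]

def run_verifications(df: object, verifications: Sequence[Verification]) -> list[str]:
    """Collect error messages: None/True mean pass; False/str mean fail."""
    errors = []
    for i, check in enumerate(verifications):
        result = check(df)
        if result is None or result is True:
            continue  # passed
        errors.append(result if isinstance(result, str) else f"verification #{i} failed")
    return errors

rows = [{"value": 1.0}, {"value": 2.0}]
checks = [
    lambda d: len(d) == 2,  # True -> pass
    lambda d: None,         # None -> pass
    lambda d: "sum must be 3" if sum(r["value"] for r in d) != 3 else None,
]
errors = run_verifications(rows, checks)
```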
- class typeddfs.df_typing.IoTypingο
Contains all information about how to read and write a DataFrame subclass.
- _attrs_json_kwargs : Optional[Mapping[str, Any]]
- _attrs_suffix : str = .attrs.json
- _custom_readers : Optional[Mapping[str, Callable[[pathlib.Path], pandas.DataFrame]]]
- _custom_writers : Optional[Mapping[str, Callable[[pandas.DataFrame, pathlib.Path], None]]]
- _hash_alg : Optional[str] = sha256
- _hdf_key : str = df
- _read_kwargs : Optional[Mapping[typeddfs.file_formats.FileFormat, Mapping[str, Any]]]
- _recommended : bool = False
- _remap_suffixes : Optional[Mapping[str, typeddfs.file_formats.FileFormat]]
- _remapped_read_kwargs : Optional[Mapping[str, Any]]
- _remapped_write_kwargs : Optional[Mapping[str, Any]]
- _save_hash_dir : bool = False
- _save_hash_file : bool = False
- _secure : bool = False
- _text_encoding : str = utf-8
- _use_attrs : bool = False
- _write_kwargs : Optional[Mapping[typeddfs.file_formats.FileFormat, Mapping[str, Any]]]
- property attrs_json_kwargs Mapping[str, Any] ο
Keyword arguments for
typeddfs.json_utils.JsonUtils.encoder
. Used when writing attrs.
- property attrs_suffix str
Filename suffix detailing where to save/load per-DataFrame "attrs" (metadata). Will be appended to the DataFrame filename.
- property custom_readers Mapping[str, Callable[[pathlib.Path], pandas.DataFrame]]
Mapping from filename suffixes (modulo compression) to custom reading methods.
- property custom_writers Mapping[str, Callable[[pandas.DataFrame, pathlib.Path], None]]
Mapping from filename suffixes (modulo compression) to custom writing methods.
- property dir_hash bool ο
Whether to save (append) to per-directory hash files by default. Specifically, in
typeddfs.abs_df.AbsDf.write_file()
.
- property file_hash bool ο
Whether to save per-file hash files by default. Specifically, in
typeddfs.abs_df.AbsDf.write_file()
.
- property flexwf_sep str ο
The delimiter used when reading βflex-widthβ format.
Caution
Only checks the read keyword arguments, not write
- property hash_algorithm Optional[str] ο
The hash algorithm used for checksums.
- property hdf_key str
The default key used in typeddfs.abs_df.AbsDf.to_hdf(). The key is also used in typeddfs.abs_df.AbsDf.read_hdf().
- property is_text_encoding_utf bool ο
- property read_kwargs Mapping[typeddfs.file_formats.FileFormat, Mapping[str, Any]] ο
Passes kwargs into read functions from
read_file
. These are keyword arguments that are automatically added into specificread_
methods when called byread_file
.Note
This should rarely be needed
- property read_suffix_kwargs Mapping[str, Mapping[str, Any]] ο
Per-suffix kwargs into read functions from
read_file
. Modulo compression (e.g. .tsv is equivalent to .tsv.gz).
- property recommended bool ο
Whether to forbid discouraged formats like fixed-width and HDF5. Excludes all insecure formats.
- property remap_suffixes Mapping[str, typeddfs.file_formats.FileFormat] ο
Returns filename formats that have been re-mapped to file formats. These are used in
read_file
andwrite_file
.Note
This should rarely be needed. An exception might be
.txt
to tsv rather than lines; Excel uses this.
- property secure bool ο
Whether to forbid insecure operations and formats.
- property text_encoding str
Can be an exact encoding like utf-8, "platform", "utf8(bom)", or "utf16(bom)". See the docs in TypedDfs.typed().encoding for details.
- property toml_aot str ο
The name of the Array of Tables (AoT) used when reading TOML.
Caution
Only checks the read keyword arguments, not write
- property use_attrs bool ο
Whether to read and write
pd.DataFrame.attrs
when passingattrs=None
.
- property write_kwargs Mapping[typeddfs.file_formats.FileFormat, Mapping[str, Any]] ο
Passes kwargs into write functions from
to_file
. These are keyword arguments that are automatically added into specificto_
methods when called bywrite_file
.Note
This should rarely be needed
- property write_suffix_kwargs Mapping[str, Mapping[str, Any]] ο
Per-suffix kwargs into read functions from
write_file
. Modulo compression (e.g. .tsv is equivalent to .tsv.gz).
typeddfs.example
Near-replica of example from the readme.
Module Contents
- typeddfs.example.run() None ο
Runs an example usage of typeddfs.
typeddfs.file_formats
File formats for reading/writing to/from DFs.
Module Contents
- class typeddfs.file_formats.BaseCompressionο
- base :pathlib.Pathο
- compression :CompressionFormatο
- class typeddfs.file_formats.BaseFormatCompressionο
- base :pathlib.Pathο
- compression :CompressionFormatο
- format :Optional[FileFormat]ο
- class typeddfs.file_formats.CompressionFormatο
A compression scheme or no compression: gzip, zip, bz2, xz, and none. These are the formats supported by Pandas for read and write. Provides a few useful functions for calling code.
Examples
CompressionFormat.strip_suffix("my_file.csv.gz")  # Path("my_file.csv")
CompressionFormat.from_path("myfile.csv")  # CompressionFormat.none
- bz2
- gz
- none
- xz
- zip
- zstd
- classmethod all_suffixes() Set[str] ο
Returns all suffixes for all compression formats.
- classmethod from_path(path: typeddfs.utils._utils.PathLike) CompressionFormat ο
Returns the compression scheme from a path suffix.
- classmethod from_suffix(suffix: str) CompressionFormat ο
Returns the recognized compression scheme from a suffix.
- property full_name str
Returns a more complete name of this format: for example, "gzip", "bzip2", "xz", and "none".
- property is_compressed bool ο
Shorthand for
fmt is not CompressionFormat.none
.
- classmethod list() Set[CompressionFormat] ο
Returns the set of CompressionFormats. Works with static type analysis.
- classmethod list_non_empty() Set[CompressionFormat] ο
Returns the set of CompressionFormats, except for
none
. Works with static type analysis.
- property name_or_none Optional[str] ο
Returns the name, or
None
if it is not compressed.
- classmethod of(t: Union[str, CompressionFormat]) CompressionFormat
Returns a CompressionFormat from a name (e.g. "gz" or "gzip"). Case-insensitive.
Example
CompressionFormat.of("gzip").suffix # ".gz"
- property pandas_value Optional[str] ο
Returns the value that should be passed to Pandas as
compression
.
- classmethod split(path: typeddfs.utils._utils.PathLike) BaseCompression ο
- classmethod strip_suffix(path: typeddfs.utils._utils.PathLike) pathlib.Path ο
Returns a path with any recognized compression suffix (e.g. β.gzβ) stripped.
- property suffix str
Returns the single Pandas-recognized suffix for this format. This is just "" (the empty string) for CompressionFormat.none.
- class typeddfs.file_formats.FileFormat
A computer-readable format for reading and writing of DataFrames in typeddfs. This includes CSV, Parquet, ODT, etc. Some formats also include compressed variants; e.g. ".csv.gz" maps to FileFormat.csv. This is used internally by typeddfs.abs_df.read_file() and typeddfs.abs_df.write_file(), but it may be useful to calling code directly.
Examples
FileFormat.from_path("my_file.csv.gz").is_text() # True
FileFormat.from_path("my_file.csv.gz").can_read() # always True
FileFormat.from_path("my_file.xlsx").can_read() # true if required package is installed
- csv
- feather
- flexwf
- fwf
- hdf
- ini
- json
- lines
- ods
- parquet
- pickle
- properties
- toml
- tsv
- xls
- xlsb
- xlsx
- xml
- classmethod all_readable() Set[FileFormat] ο
Returns all formats that can be read on this system. Note that the result may depend on whether supporting packages are installed. Includes insecure and discouraged formats.
- classmethod all_writable() Set[FileFormat] ο
Returns all formats that can be written to on this system. Note that the result may depend on whether supporting packages are installed. Includes insecure and discouraged formats.
- property can_always_read bool ο
Returns whether this format can be read as long as typeddfs is installed. In other words, regardless of any optional packages.
- property can_always_write bool ο
Returns whether this format can be written to as long as typeddfs is installed. In other words, regardless of any optional packages.
- property can_read bool ο
Returns whether this format can be read. Note that the result may depend on whether supporting packages are installed.
- property can_write bool ο
Returns whether this format can be written. Note that the result may depend on whether supporting packages are installed.
- compressed_variants(suffix: str) Set[str]
Returns all allowed suffixes.
Example
FileFormat.json.compressed_variants(".json")  # {".json", ".json.gz", ".json.zip", ...}
- classmethod from_path(path: typeddfs.utils._utils.PathLike, *, format_map: Optional[Mapping[str, Union[FileFormat, str]]] = None) FileFormat ο
Guesses a FileFormat from a filename.
See also
- Parameters
path – A string or pathlib.Path to a file.
format_map – A mapping from suffixes to formats; if None, uses suffix_map().
- Raises
typeddfs.df_errors.FilenameSuffixError – If not found
- classmethod from_path_or_none(path: typeddfs.utils._utils.PathLike, *, format_map: Optional[Mapping[str, Union[FileFormat, str]]] = None) Optional[FileFormat] ο
Same as
from_path()
, but returns None if not found.
- classmethod from_suffix(suffix: str, *, format_map: Optional[Mapping[str, Union[FileFormat, str]]] = None) FileFormat ο
Returns the FileFormat corresponding to a filename suffix.
See also
- Parameters
suffix – E.g. ".csv.gz" or ".feather"
format_map – A mapping from suffixes to formats; if None, uses suffix_map().
- Raises
typeddfs.df_errors.FilenameSuffixError – If not found
- classmethod from_suffix_or_none(suffix: str, *, format_map: Optional[Mapping[str, Union[FileFormat, str]]] = None) Optional[FileFormat] ο
Same as
from_suffix()
, but returns None if not found.
- property is_binary bool ο
Returns whether this format is binary-encoded. Note that this does not consider whether the file is compressed.
- property is_recommended bool ο
Returns whether the format is good. Includes CSV, TSV, Parquet, etc. Excludes all insecure formats along with fixed-width, INI, properties, TOML, and HDF5.
- property is_secure bool ο
Returns whether the format does NOT have serious security issues. These issues only apply to reading files, not writing. Excel formats that support Macros are not considered secure. This includes .xlsm, .xltm, and .xls. These can simply be replaced with xlsx. Note that .xml is treated as secure: Although some parsers are subject to entity expansion attacks, good ones are not.
- property is_text bool ο
Returns whether this format is text-encoded. Note that this does not consider whether the file is compressed.
- classmethod list() Set[FileFormat] ο
Returns the set of FileFormats. Works with static type analysis.
- matches(*, supported: bool, secure: bool, recommended: bool) bool ο
Returns whether this format meets some requirements.
- Parameters
secure – is_secure is True
recommended – is_recommended is True
- classmethod of(t: Union[str, FileFormat]) FileFormat ο
Returns a FileFormat from an exact name (e.g. "csv").
See also
- classmethod split(path: typeddfs.utils._utils.PathLike, *, format_map: Optional[Mapping[str, Union[FileFormat, str]]] = None) BaseFormatCompression ο
Splits a path into the base path, format, and compression.
See also
- Raises
FilenameSuffixError – If the suffix is not found
- Returns
A 3-tuple of (base path excluding suffixes, file format, compression format)
- classmethod split_or_none(path: typeddfs.utils._utils.PathLike, *, format_map: Optional[Mapping[str, Union[FileFormat, str]]] = None) BaseFormatCompression ο
Splits a path into the base path, format, and compression. Unlike split(), does not raise an error if the suffix is not recognized.
See also
- Returns
A 3-tuple of (base path excluding suffixes, file format, compression format)
- classmethod strip(path: typeddfs.utils._utils.PathLike, *, format_map: Optional[Mapping[str, Union[FileFormat, str]]] = None) pathlib.Path ο
Strips a recognized, optionally compressed, suffix from path.
See also
Example
FileFormat.strip("abc/xyz.csv.gz") # Path("abc") / "xyz"
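The stripping behavior can be sketched with pathlib alone. The suffix sets below are illustrative stand-ins, not the library's actual suffix_map():

```python
from pathlib import Path

# Sketch of the strip() behavior described above. KNOWN_SUFFIXES and
# COMPRESSIONS are illustrative stand-ins, not the library's actual maps.
KNOWN_SUFFIXES = {".csv", ".tsv", ".feather"}
COMPRESSIONS = {".gz", ".zip", ".xz"}

def strip(path: str) -> Path:
    p = Path(path)
    if p.suffix in COMPRESSIONS:    # drop an optional compression suffix first
        p = p.with_suffix("")
    if p.suffix in KNOWN_SUFFIXES:  # then drop the recognized format suffix
        p = p.with_suffix("")
    return p

strip("abc/xyz.csv.gz")  # Path("abc/xyz")
```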
- classmethod suffix_map() MutableMapping[str, FileFormat] ο
Returns a mapping from all suffixes to their respective formats. See
suffixes()
.
- property suffixes Set[str] ο
Returns the suffixes that are tied to this format. These will not overlap with the suffixes for any other format. For example, .txt is for FileFormat.lines, although it could be treated as tab- or space-separated.
- property supports_encoding bool ο
Returns whether this format supports a text encoding of some sort. This may not correspond to an
encoding=
parameter, and the format may be binary. For example, XLS and XML support encodings.
typeddfs.frozen_types
ο
Hashable and ordered collections.
Module Contentsο
- class typeddfs.frozen_types.FrozeDict(dct: Mapping[K, V])ο
An immutable dictionary/mapping. Hashable and ordered.
- EMPTY :FrozeDictο
- __contains__(item: K) bool ο
- __getitem__(item: K) V ο
- __hash__() int ο
Return hash(self).
- __iter__()ο
- __len__() int ο
- __lt__(other: Mapping[K, V])ο
Compares this dict to another, with partial ordering.
- The algorithm is:
1. Sort self and other by keys.
2. If sorted_self < sorted_other, return False.
3. If the reverse is true (sorted_other < sorted_self), return True.
4. (The keys are now known to be the same.) For each key, in order: if self[key] < other[key], return True.
5. Return False.
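The steps above can be sketched as a plain function on ordinary dicts. The function name and details are illustrative, not the library's code:

```python
# Plain-Python sketch of the FrozeDict partial ordering described above;
# the function name and details are illustrative, not the library's code.
def dict_lt(self_d: dict, other_d: dict) -> bool:
    self_keys, other_keys = sorted(self_d), sorted(other_d)
    if self_keys < other_keys:     # differing keys, self's sort first -> False
        return False
    if other_keys < self_keys:     # the reverse -> True
        return True
    for k in self_keys:            # keys identical: compare values in key order
        if self_d[k] < other_d[k]:
            return True
    return False

dict_lt({"a": 1}, {"a": 2})  # True: same keys, smaller value
```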
- __repr__() str ο
Return repr(self).
- __str__() str ο
Return str(self).
- get(key: K, default: Optional[V] = None) Optional[V] ο
D.get(k[,d]) -> D[k] if k in D, else d. d defaults to None.
- property is_empty bool ο
- items() AbstractSet[tuple[K, V]] ο
D.items() -> a set-like object providing a view on D's items
- keys() AbstractSet[K] ο
D.keys() -> a set-like object providing a view on D's keys
- property length int ο
- req(key: K, default: Optional[V] = None) V ο
Returns the value corresponding to key. Short for "require". Falls back to default if default is not None and key is not in this dict.
- Raises
KeyError – If key is not in this dict and default is None
- to_dict() MutableMapping[K, V] ο
- values() ValuesView[V] ο
D.values() -> an object providing a view on D's values
- class typeddfs.frozen_types.FrozeList(lst: Sequence[T])ο
An immutable list. Hashable and ordered.
- EMPTY :FrozeListο
- __getitem__(item: int)ο
- __hash__() int ο
Return hash(self).
- __iter__() Iterator[T] ο
- __len__() int ο
- __repr__() str ο
Return repr(self).
- __str__() str ο
Return str(self).
- get(item: T, default: Optional[T] = None) Optional[T] ο
- property is_empty bool ο
- property length int ο
- req(item: T, default: Optional[T] = None) T ο
Returns the requested list item, falling back to a default. Short for "require".
- Raises
KeyError – If item is not in this list and default is None
- to_list() List[T] ο
- class typeddfs.frozen_types.FrozeSet(lst: AbstractSet[T])ο
An immutable set. Hashable and ordered. This is almost identical to typing.FrozenSet, but its behavior was made equivalent to that of FrozeDict and FrozeList.
- EMPTY :FrozeSetο
- __contains__(x: T) bool ο
- __getitem__(item: T) T ο
- __hash__() int ο
Return hash(self).
- __iter__() Iterator[T] ο
- __len__() int ο
- __lt__(other: Union[FrozeSet[T], AbstractSet[T]])ο
Compares self and other for partial ordering. Sorts self and other, then compares the two sorted sets.
- Approximately::
return list(sorted(self)) < list(sorted(other))
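On plain sets, the comparison amounts to (a minimal sketch, not the library's code):

```python
# Sketch of the FrozeSet ordering described above: sort both sets, then
# compare the sorted lists lexicographically.
def set_lt(a: set, b: set) -> bool:
    return sorted(a) < sorted(b)

set_lt({1, 2}, {1, 3})  # True: [1, 2] < [1, 3]
```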
- __repr__() str ο
Return repr(self).
- __str__() str ο
Return str(self).
- get(item: T, default: Optional[T] = None) Optional[T] ο
- property is_empty bool ο
- property length int ο
- req(item: T, default: Optional[T] = None) T ο
Returns item if it is in this set. Short for "require". Falls back to default if default is not None.
- Raises
KeyError – If item is not in this set and default is None
- to_frozenset() AbstractSet[T] ο
- to_set() AbstractSet[T] ο
typeddfs.matrix_dfs
ο
DataFrames that are essentially n-by-m matrices.
Module Contentsο
- class typeddfs.matrix_dfs.AffinityMatrixDf(data=None, index=None, columns=None, dtype=None, copy=False)ο
A similarity or distance matrix. The rows and columns must match, and only 1 index is allowed.
- __repr__() str ο
Return repr(self).
- __str__() str ο
Return str(self).
- classmethod _check(df: typeddfs.base_dfs.BaseDf)ο
Should raise a typeddfs.df_errors.InvalidDfError or subclass for issues.
- classmethod get_typing() typeddfs.df_typing.DfTyping ο
- classmethod new_df(n: Union[int, Sequence[str]] = 0, fill: Union[int, float, complex] = 0) __qualname__ ο
Returns a DataFrame that is empty but valid.
- Parameters
n – Either a number of rows/columns or a sequence of labels. If a number is given, will choose (str-type) labels "0", "1", …
fill – A value to fill in every cell. Should match self.required_dtype.
- Raises
InvalidDfError – If a function in verifications fails (returns False or a string).
IntCastingNaNError – If fill is NaN or inf and self.required_dtype does not support it.
- symmetrize() __qualname__ ο
Averages with its transpose, forcing it to be symmetric.
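The averaging step can be sketched on a plain nested-list matrix (illustrative only; the real method operates on the DataFrame itself):

```python
# Sketch of symmetrize(): average a square matrix with its transpose.
# Shown with nested lists rather than an AffinityMatrixDf.
def symmetrize(m):
    n = len(m)
    return [[(m[i][j] + m[j][i]) / 2 for j in range(n)] for i in range(n)]

symmetrize([[0, 2], [4, 0]])  # [[0.0, 3.0], [3.0, 0.0]]
```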
- class typeddfs.matrix_dfs.LongFormMatrixDf(data=None, index=None, columns=None, dtype=None, copy=False)ο
A long-form matrix with columns "row", "column", and "value".
- classmethod get_typing() typeddfs.df_typing.DfTyping ο
- class typeddfs.matrix_dfs.MatrixDf(data=None, index=None, columns=None, dtype=None, copy=False)ο
A dataframe that is best thought of as a simple matrix. Contains a single index level and a list of columns, with numerical values of a single dtype.
- classmethod get_typing() typeddfs.df_typing.DfTyping ο
- classmethod new_df(rows: Union[int, Sequence[str]] = 0, cols: Union[int, Sequence[str]] = 0, fill: Union[int, float, complex] = 0) __qualname__ ο
Returns a DataFrame that is empty but valid.
- Parameters
rows – Either a number of rows or a sequence of labels. If a number is given, will choose (str-type) labels "0", "1", …
cols – Either a number of columns or a sequence of labels. If a number is given, will choose (str-type) labels "0", "1", …
fill – A value to fill in every cell. Should match self.required_dtype.
- Raises
InvalidDfError – If a function in verifications fails (returns False or a string).
IntCastingNaNError – If fill is NaN or inf and self.required_dtype does not support it.
typeddfs.typed_dfs
ο
Defines DataFrames with convenience methods and that enforce invariants.
Module Contentsο
- class typeddfs.typed_dfs.PlainTypedDf(data=None, index=None, columns=None, dtype=None, copy=False)ο
A trivial TypedDf that behaves like an untyped one.
- class typeddfs.typed_dfs.TypedDf(data=None, index=None, columns=None, dtype=None, copy=False)ο
A concrete BaseFrame that enforces conditions. Each subclass has required and reserved (optional) columns and index names. They may or may not permit additional columns or index names.
The constructor will require the conditions to pass but will not rearrange columns and indices. To do that, call convert.
Overrides a number of DataFrame methods that preserve the subclass. For example, calling df.reset_index() will return a TypedDf of the same type as df. If a condition would then fail, call untyped() first.
For example, suppose MyTypedDf has a required index name called "xyz". Then this will be fine as long as df has a column or index name called xyz: MyTypedDf.convert(df). But calling MyTypedDf.convert(df).reset_index() will fail. You can put the column "xyz" back into the index using convert: MyTypedDf.convert(df.reset_index()). Or, you can get a plain DataFrame (UntypedDf) back: MyTypedDf.convert(df).untyped().reset_index().
To summarize: call untyped() before calling something that would result in anything invalid.
- classmethod _check(df) None ο
Should raise a typeddfs.df_errors.InvalidDfError or subclass for issues.
- classmethod _check_has_required(df: pandas.DataFrame) None ο
- classmethod _check_has_unexpected(df: pandas.DataFrame) None ο
- classmethod convert(df: pandas.DataFrame) __qualname__ ο
Converts a vanilla Pandas DataFrame (or any subclass) to cls. Explicitly sets the new copy's __class__ to cls. Rearranges the columns and index names. For example, if a column in df is in self.reserved_index_names(), it will be moved to the index.
- The new index names will be, in order:
1. required_index_names(), in order
2. reserved_index_names(), in order
3. any extras in df, if more_indices_allowed is True
- Similarly, the new columns will be, in order:
1. required_columns(), in order
2. reserved_columns(), in order
3. any extras in the original df, if more_columns_allowed is True
Note
Any column called index or level_0 will be dropped automatically.
- Parameters
df – The Pandas DataFrame or member of cls; will have its __class__ changed but will otherwise not be affected
- Returns
A copy
- Raises
InvalidDfError – If a condition such as a required column or symmetry fails (specific subclasses)
TypeError – If df is not a DataFrame
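The column-reordering rule above can be sketched as follows; the function and names are illustrative, not the library's implementation:

```python
# Illustrative sketch of convert()'s column ordering: required columns first,
# then reserved columns, then any extras in their original order.
def order_columns(existing, required, reserved, more_allowed=True):
    ordered = [c for c in required if c in existing]
    ordered += [c for c in reserved if c in existing]
    if more_allowed:
        ordered += [c for c in existing if c not in ordered]
    return ordered

order_columns(["notes", "species", "person"],
              required=["species", "person"], reserved=["notes"])
# ['species', 'person', 'notes']
```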
- classmethod get_typing() typeddfs.df_typing.DfTyping ο
- meta() __qualname__ ο
Drops the columns, returning only the index but as the same type.
- Returns
A copy
- Raises
InvalidDfError β If the result does not pass the typing of this class
- classmethod new_df(reserved: Union[bool, Sequence[str]] = False) __qualname__ ο
Returns a DataFrame that is empty but has the correct columns and indices.
- Parameters
reserved – Include reserved index/column names as well as required. If True, adds all reserved index levels and columns; you can also specify the exact list of columns and index names.
- Raises
InvalidDfError – If a function in verifications fails (returns False or a string).
- untyped() typeddfs.untyped_dfs.UntypedDf ο
Makes a copy that's an UntypedDf. It won't have enforced requirements but will still have the convenience functions.
- Returns
A shallow copy with its __class__ set to an UntypedDf
- See:
vanilla()
typeddfs.untyped_dfs
ο
Defines DataFrames with convenience methods but that do not enforce invariants.
Module Contentsο
- class typeddfs.untyped_dfs.UntypedDf(data=None, index=None, columns=None, dtype=None, copy=False)ο
A concrete DataFrame that does not require columns or enforce conditions. Overrides a number of DataFrame methods that preserve the subclass. For example, calling df.reset_index() will return an UntypedDf of the same type as df.
- classmethod get_typing() typeddfs.df_typing.DfTyping ο
- classmethod new_df(rows: int = 0, cols: Union[int, Sequence[str]] = 0, fill: Any = 0) __qualname__ ο
Creates a new, semi-arbitrary DataFrame of the specified rows and columns. The DataFrame will have no index.
- Parameters
rows – Number of rows
cols – Number of columns or a sequence of column labels
fill – Fill every cell with this value
Package Contentsο
- typeddfs.__pkgο
- typeddfs.loggerο
- typeddfs.metadataο
Created with sphinx-autoapi