PBQ
-
class
PBQ(query: pbq.query.Query, project=None)
BigQuery driver using the official Google API
- query : str
- the query
- query_obj : Query
- pbq.Query object
- client : Client
- the BigQuery client object
- bqstorage_client : BigQueryStorageClient
- the BigQuery Storage client object
- to_dataframe(save_query=False, **params)
- return the query results as a dataframe
- to_csv(filename, sep=',', save_query=False, **params)
- save the query results to a csv file
- save_to_table(table, dataset, project=None, replace=True, partition=None)
- save query to table
- run_query()
- simply execute your query
- table_details(table, dataset, project)
- get the information about the table
- save_file_to_table(filename, table, dataset, project, file_format=bigquery.SourceFormat.CSV, max_bad_records=0, replace=True, partition=None)
- save a file to a table; the table can be partitioned, and the file can be appended to an existing table. The supported formats are CSV and PARQUET
- save_dataframe_to_table(df: pd.DataFrame, table, dataset, project, max_bad_records=0, replace=True, partition=None)
- same as save_file_to_table, but takes a pandas dataframe instead of a file
- table_exists(client: bigquery.Client, table_ref: bigquery.table.TableReference)
- check whether the table exists: returns True if it exists, False otherwise
getting query to dataframe
>>> from pbq import Query, PBQ
>>> query = Query("select * from table")
>>> print("the query price:", query.price)
>>> if not query.validate():
...     raise RuntimeError("table not valid")
>>> pbq = PBQ(query)
>>> pbq.to_dataframe()
saving query to csv
>>> from pbq import Query, PBQ
>>> query = Query("select * from table")
>>> pbq = PBQ(query)
>>> pbq.to_csv("results.csv")  # filename is required; "results.csv" is a placeholder path
saving dataframe to table
>>> import pandas as pd
>>> from pbq import Query, PBQ
>>> df = pd.DataFrame()
>>> PBQ.save_dataframe_to_table(df, 'table', 'dataset', 'project_id', partition='20191013', replace=False)
-
static
save_dataframe_to_table(df: pandas.core.frame.DataFrame, table, dataset, project, max_bad_records=0, replace=True, partition=None, validate_params=False)
save a pd.DataFrame object to a table
Parameters: - df – pd.DataFrame the dataframe you want to save
- table – str table name
- dataset – str data set name
- project – str project name
- max_bad_records – int number of bad records allowed in file (default: 0)
- replace – boolean if True, replace the table; otherwise append to it (default: True)
- partition – str partition in YYYYMMDD format, e.g. '20191013' (default: None)
- validate_params – boolean validate the dataframe against the schema of the table (default: False)
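The usage example for save_dataframe_to_table passes partition='20191013', which reads as a date written as YYYYMMDD. A minimal sketch of building such a string from a date, assuming that format is what the method expects; partition_for is a hypothetical helper and not part of pbq:

```python
from datetime import date


def partition_for(day: date) -> str:
    """Format a date as a YYYYMMDD string (the assumed partition format)."""
    return day.strftime("%Y%m%d")


# Hypothetical helper in use; the date matches the usage example above.
partition = partition_for(date(2019, 10, 13))
print(partition)  # 20191013
```

The resulting string could then be passed as the partition argument.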
-
static
save_file_to_table(filename, table, dataset, project, file_format='CSV', max_bad_records=0, replace=True, partition=None)
save a file to a table; the table can be partitioned, and the file can be appended to an existing table. The supported formats are CSV and PARQUET
Parameters: - filename – str path of the file to save to the table
- table – str table name
- dataset – str data set name
- project – str project name
- file_format – str possible file format (CSV, PARQUET) (default: CSV)
- max_bad_records – int number of bad records allowed in file (default: 0)
- replace – boolean if True, replace the table; otherwise append to it (default: True)
- partition – str partition in YYYYMMDD format, e.g. '20191013' (default: None)
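Because file_format accepts either CSV or PARQUET, the value can be derived from the file extension. The following is a sketch only, assuming pbq is installed and Google credentials are configured; guess_file_format and load_file are hypothetical helpers, not part of pbq:

```python
from pathlib import Path

# Extension-to-format map covering the two formats listed as supported.
_FORMATS = {".csv": "CSV", ".parquet": "PARQUET"}


def guess_file_format(filename: str) -> str:
    """Return 'CSV' or 'PARQUET' based on the file extension (hypothetical helper)."""
    suffix = Path(filename).suffix.lower()
    if suffix not in _FORMATS:
        raise ValueError(f"unsupported file type: {suffix!r}")
    return _FORMATS[suffix]


def load_file(filename: str, table: str, dataset: str, project: str) -> None:
    """Append a local file to a table (hypothetical wrapper around pbq)."""
    from pbq import PBQ  # imported lazily so guess_file_format works without pbq

    PBQ.save_file_to_table(
        filename,
        table,
        dataset,
        project,
        file_format=guess_file_format(filename),
        replace=False,  # append to the table instead of replacing it
    )
```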
-
save_to_table(table, dataset, project=None, replace=True, partition=None)
save the query results to a table
Parameters: - table – str table name
- dataset – str data set name
- project – str project name
- replace – boolean if True, replace the table; otherwise append to it (default: True)
- partition – str partition in YYYYMMDD format, e.g. '20191013' (default: None)
-
static
table_details(table, dataset, project)
return a dict object with some details about the table
Parameters: - table – str table name
- dataset – str data set name
- project – str project name
Returns: dict with table information such as last_modified_time, num_bytes, num_rows and creation_time
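A short sketch of consuming the returned dict. It assumes that num_bytes and num_rows are plain integers, which the documentation does not state; summarize is a hypothetical helper and the sample values are made up:

```python
def summarize(details: dict) -> str:
    """Build a one-line summary from a table_details-style dict."""
    size_mb = details["num_bytes"] / (1024 * 1024)
    return f"{details['num_rows']} rows, {size_mb:.1f} MB"


# Made-up values standing in for a real table_details() result.
sample = {"num_rows": 1200, "num_bytes": 5 * 1024 * 1024}
print(summarize(sample))  # 1200 rows, 5.0 MB
```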
-
static
table_exists(client: google.cloud.bigquery.client.Client, table_ref: google.cloud.bigquery.table.TableReference)
check whether the table exists
Parameters: - client – bigquery.Client object
- table_ref – bigquery.table.TableReference object with the table name and dataset
Returns: boolean True if the table exists, False otherwise
-
to_csv(filename, sep=',', save_query=False, **params)
save the query results to a csv file
To save the query results to a table as well as writing the csv file, set save_query=True and pass the following in params:
- table
- dataset
The table is saved to the same project.
Parameters: - filename – str with the path to save the file
- sep – str separator for the csv file
- save_query – boolean whether to also save the query results to a table
- params – dict required when save_query is True: the table and dataset to save to
-
to_dataframe(save_query=False, **params)
return the query results as a dataframe
To save the query results to a table as well as getting the dataframe, set save_query=True and pass the following in params:
- table
- dataset
The table is saved to the same project.
Parameters: - save_query – boolean whether to also save the query results to a table
- params – dict required when save_query is True: the table and dataset to save to
Returns: pd.DataFrame the query results
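A sketch of how the save_query flag and params might be combined. It assumes table and dataset are passed as keyword arguments, as the **params signature suggests, and that pbq is installed with credentials configured; save_params and fetch_and_save are hypothetical helpers:

```python
def save_params(table: str, dataset: str) -> dict:
    """Keyword arguments asking to_dataframe to also save the results."""
    return {"save_query": True, "table": table, "dataset": dataset}


def fetch_and_save(sql: str, table: str, dataset: str):
    """Run a query, save its results to dataset.table, and return the dataframe."""
    from pbq import PBQ, Query  # lazy import: requires pbq and credentials

    return PBQ(Query(sql)).to_dataframe(**save_params(table, dataset))
```

Keeping the keyword arguments in one small function makes it easy to reuse the same table and dataset for several queries.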