Compare commits


35 Commits

Author SHA1 Message Date
ValueRaider
ab1042b4c9 Dev version 0.2.19b1 2023-05-04 22:14:34 +01:00
ValueRaider
8172fc02d2 Merge pull request #1514 from ranaroussi/feature/optimise-history
Optimise Ticker.history() - up to 2x faster
2023-05-04 22:08:40 +01:00
ValueRaider
836082280b Merge branch 'dev' into feature/optimise-history 2023-05-04 22:08:28 +01:00
ValueRaider
6a98c2eda6 Merge pull request #1493 from ranaroussi/feature/error-reporting
Deprecate 'debug' arg, improve 'logging' use
2023-05-04 22:06:54 +01:00
ValueRaider
46f55c8983 Add debug logging to 'history()' ; Improve logger fmt 2023-05-04 22:04:39 +01:00
ValueRaider
b025fef22c Optimise Ticker.history() - up to 2x faster
format_history_metadata() is expensive. Improvements:
- only perform full formatting if user requests metadata
- when pruning prepost data, only format 'tradingPeriods' entry of metadata

Other small optimisations to several internal prices processing methods.

Speedups:
dat.history(period='1wk', interval='1h', prepost=True)  # 2x
dat.history(period='1mo', interval='1h', prepost=True)  # 1.46x
dat.history(period='1wk', interval='1h')  # 1.15x
dat.history(period='1mo', interval='1h')  # 1.13x
dat.history(period='1y', interval='1d')  # 1.36x
dat.history(period='5y', interval='1d')  # 1.13x
2023-04-30 00:35:08 +01:00
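The deferred-formatting idea this commit message describes can be sketched as follows. Names below are illustrative stand-ins, not the real yfinance internals: the point is that the expensive full pass runs at most once, and only when metadata is actually requested.

```python
# Sketch of the commit's idea: defer expensive full metadata formatting
# until the caller asks for it, and format only the 'tradingPeriods'
# entry when pruning pre/post rows. Illustrative names only.

def format_history_metadata(md, trading_periods_only=False):
    md = dict(md)
    md["tradingPeriods"] = sorted(md.get("tradingPeriods", []))  # cheap part
    if not trading_periods_only:
        md["formatted"] = True  # stand-in for the expensive full pass
    return md


class History:
    def __init__(self, raw_metadata):
        self._md = raw_metadata
        self._md_formatted = False

    def prune_prepost(self):
        # Hot path: format only what pruning needs.
        return format_history_metadata(self._md, trading_periods_only=True)

    @property
    def metadata(self):
        # Cold path: full formatting, done at most once.
        if not self._md_formatted:
            self._md = format_history_metadata(self._md)
            self._md_formatted = True
        return self._md
```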
ValueRaider
e3778465d8 Merge branch 'dev' into feature/error-reporting 2023-04-22 16:02:56 +01:00
ValueRaider
f82177ea2e Improve download() logging - group errors & tracebacks for cleaner STDOUT 2023-04-16 21:57:04 +01:00
ValueRaider
142b1f3eb4 Merge pull request #1499 from ranaroussi/main
sync main -> dev
2023-04-16 19:08:50 +01:00
ValueRaider
d3e2e71a6e Improve logging behaviour, particularly download()
- Use same logger across all files
- download():
  - write tracebacks to DEBUG
  - deprecate 'show_errors' argument
2023-04-15 17:29:07 +01:00
ValueRaider
4937c933a2 Deprecate 'debug' arg, improve 'logging' use 2023-04-15 16:47:39 +01:00
ValueRaider
1e941fc86a Merge branch 'main' into dev 2023-04-09 23:45:37 +01:00
ValueRaider
e7a3848f69 Merge pull request #1477 from ranaroussi/feature/price-repair-tweaks
Price repair: add 'Repaired?' column, and a bugfix
2023-04-09 21:01:49 +01:00
ValueRaider
3d29ced428 Merge pull request #1474 from garrettladley/leverage-dict-and-list-comps
Leverage dict & list comprehensions in yfinance/tickers.py
2023-04-06 13:26:08 +01:00
garrettladley
2fe5a0a361 leveraged dict & list comps in yfinance/tickers.py 2023-04-05 18:55:47 -04:00
Value Raider
a649b40dc9 Price repair: add 'Repaired?' column, and a bugfix
Price repair changes:
- if user requests price repair, add 'Repaired?' bool column showing what rows were repaired.
- fix price repair requesting <1d data beyond Yahoo's limit.
- fix logger messages
2023-04-03 21:27:04 +01:00
ValueRaider
a01edee4fa Merge pull request #1476 from ranaroussi/main
main -> dev
2023-04-03 21:20:50 +01:00
Value Raider
e89e190d11 Merge branch 'main' into dev 2023-03-21 19:05:56 +00:00
ValueRaider
a236270389 Merge pull request #1457 from ranaroussi/fix/price-fixes-various
Various fixes to price data processing
2023-03-21 18:59:13 +00:00
Value Raider
b5dca4941a Order history_metadata['tradingPeriods'] DF sensibly 2023-03-20 21:18:53 +00:00
Value Raider
6b71ba977c Various fixes to price data processing
- move drop-duplicates to before repair
- fix 'format_history_metadata()' processing 'regular' column
- fix Pandas & Numpy warnings
2023-03-20 21:10:45 +00:00
ValueRaider
6c70b866c7 Merge pull request #1423 from flaviovs/no-print
No print
2023-02-20 20:07:23 +00:00
Value Raider
bd696fb4db Beta version 0.2.13b1 2023-02-17 17:04:39 +00:00
Value Raider
d13aafa633 Replace more prints with logging, mostly in 'price repair' 2023-02-17 12:01:11 +00:00
Flávio Veloso Soares
00823f6fa6 Remove redundant logging text 2023-02-16 16:53:33 -08:00
Flávio Veloso Soares
21fdba9021 Replace warnings print() with warnings.warn(...) calls 2023-02-16 16:53:33 -08:00
Flávio Veloso Soares
972547ca8c Replace prints with logging module 2023-02-16 16:53:33 -08:00
ValueRaider
23b400f0fb Merge pull request #1421 from ranaroussi/fix/missing-price-history-errors
Improve handling missing price history
2023-02-16 14:22:10 +00:00
Value Raider
a1a385196b Improve handling missing price history
Fix fast_info[] dying if metadata incomplete/missing ; Price repair fix when no fine data available ; Fix _fix_unit_mixups() report
2023-02-14 17:31:14 +00:00
ValueRaider
a0046439d1 Merge pull request #1400 from ranaroussi/feature/improve-performance
Optimise recent new features in `history`
2023-02-12 14:58:36 +00:00
ValueRaider
63a8476575 Merge pull request #1417 from ranaroussi/main
main -> dev
2023-02-12 14:56:19 +00:00
ValueRaider
0f5db35b6e Optimise Ticker._reconstruct_intervals_batch() (slightly) 2023-02-05 18:16:08 +00:00
ValueRaider
7c6742a60a Optimise Ticker._fix_unit_mixups() 2023-02-05 15:15:56 +00:00
ValueRaider
36ace8017d Optimise Ticker._fix_zeroes() 2023-02-05 13:46:57 +00:00
ValueRaider
ead0bce96e Optimise format_history_metadata() 2023-02-04 22:56:49 +00:00
15 changed files with 368 additions and 254 deletions

@@ -1,6 +1,11 @@
Change Log
===========
0.2.19b1 - beta
-------
Optimise Ticker.history #1514
Logging module #1493
0.2.18
------
Fix 'fast_info' error '_np not found' #1496

@@ -186,6 +186,17 @@ yf.download(tickers = "SPY AAPL", # list of tickers
Review the [Wiki](https://github.com/ranaroussi/yfinance/wiki) for more options and detail.
### Logging
`yfinance` now uses the `logging` module. To control the detail of printed messages you simply change the level:
```python
import logging
logger = logging.getLogger('yfinance')
logger.setLevel(logging.ERROR) # default: only print errors
logger.setLevel(logging.CRITICAL) # disable printing
logger.setLevel(logging.DEBUG) # verbose: print errors & debug info
```
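Since `'yfinance'` is a standard `logging` logger, the usual handler machinery also applies. A minimal sketch (standard-library calls only) that captures yfinance output in an in-memory buffer with timestamps:

```python
import io
import logging

logger = logging.getLogger('yfinance')
logger.setLevel(logging.DEBUG)

# Send yfinance messages to an in-memory buffer (could equally be a
# logging.FileHandler) with a timestamped format.
buffer = io.StringIO()
handler = logging.StreamHandler(buffer)
handler.setFormatter(logging.Formatter('%(asctime)s %(levelname)s %(message)s'))
logger.addHandler(handler)
```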
### Smarter scraping
To use a custom `requests` session (for example to cache calls to the

@@ -1,5 +1,5 @@
{% set name = "yfinance" %}
{% set version = "0.2.18" %}
{% set version = "0.2.19b1" %}
package:
name: "{{ name|lower }}"

@@ -15,6 +15,9 @@ Sanity check for most common library uses all working
import yfinance as yf
import unittest
import logging
logging.basicConfig(level=logging.DEBUG)
symbols = ['MSFT', 'IWO', 'VFINX', '^GSPC', 'BTC-USD']
tickers = [yf.Ticker(symbol) for symbol in symbols]

@@ -479,6 +479,9 @@ class TestPriceRepair(unittest.TestCase):
f_1 = ratio == 1
self.assertTrue((f_100 | f_1).all())
self.assertTrue("Repaired?" in df_repaired.columns)
self.assertFalse(df_repaired["Repaired?"].isna().any())
def test_repair_100x_weekly_preSplit(self):
# PNL.L has a stock-split in 2022. Sometimes requesting data before 2022 is not split-adjusted.
@@ -536,6 +539,9 @@ class TestPriceRepair(unittest.TestCase):
f_1 = ratio == 1
self.assertTrue((f_100 | f_1).all())
self.assertTrue("Repaired?" in df_repaired.columns)
self.assertFalse(df_repaired["Repaired?"].isna().any())
def test_repair_100x_daily(self):
tkr = "PNL.L"
dat = yf.Ticker(tkr, session=self.session)
@@ -578,6 +584,9 @@ class TestPriceRepair(unittest.TestCase):
f_1 = ratio == 1
self.assertTrue((f_100 | f_1).all())
self.assertTrue("Repaired?" in df_repaired.columns)
self.assertFalse(df_repaired["Repaired?"].isna().any())
def test_repair_zeroes_daily(self):
tkr = "BBIL.L"
dat = yf.Ticker(tkr, session=self.session)
@@ -605,6 +614,9 @@ class TestPriceRepair(unittest.TestCase):
for c in ["Open", "Low", "High", "Close"]:
self.assertTrue(_np.isclose(repaired_df[c], correct_df[c], rtol=1e-8).all())
self.assertTrue("Repaired?" in repaired_df.columns)
self.assertFalse(repaired_df["Repaired?"].isna().any())
def test_repair_zeroes_hourly(self):
tkr = "INTC"
dat = yf.Ticker(tkr, session=self.session)
@@ -636,6 +648,9 @@ class TestPriceRepair(unittest.TestCase):
print(repaired_df[c] - correct_df[c])
raise
self.assertTrue("Repaired?" in repaired_df.columns)
self.assertFalse(repaired_df["Repaired?"].isna().any())
if __name__ == '__main__':
unittest.main()

@@ -21,6 +21,7 @@
from __future__ import print_function
import warnings
import time as _time
import datetime as _datetime
import dateutil as _dateutil
@@ -47,6 +48,7 @@ _BASE_URL_ = 'https://query2.finance.yahoo.com'
_SCRAPE_URL_ = 'https://finance.yahoo.com/quote'
_ROOT_URL_ = 'https://finance.yahoo.com'
logger = utils.get_yf_logger()
class TickerBase:
def __init__(self, ticker, session=None):
@@ -54,6 +56,7 @@ class TickerBase:
self.session = session
self._history = None
self._history_metadata = None
self._history_metadata_formatted = False
self._base_url = _BASE_URL_
self._scrape_url = _SCRAPE_URL_
self._tz = None
@@ -91,7 +94,8 @@ class TickerBase:
start=None, end=None, prepost=False, actions=True,
auto_adjust=True, back_adjust=False, repair=False, keepna=False,
proxy=None, rounding=False, timeout=10,
debug=True, raise_errors=False) -> pd.DataFrame:
debug=None, # deprecated
raise_errors=False) -> pd.DataFrame:
"""
:Parameters:
period : str
@@ -132,26 +136,32 @@ class TickerBase:
seconds. (Can also be a fraction of a second e.g. 0.01)
Default is 10 seconds.
debug: bool
If passed as False, will suppress
error message printing to console.
If passed as False, will suppress message printing to console.
DEPRECATED, will be removed in future version
raise_errors: bool
If True, then raise errors as
exceptions instead of printing to console.
If True, then raise errors as Exceptions instead of logging.
"""
if debug is not None:
if debug:
utils.print_once(f"yfinance: Ticker.history(debug={debug}) argument is deprecated and will be removed in future version. Do this instead: logging.getLogger('yfinance').setLevel(logging.ERROR)")
logger.setLevel(logging.ERROR)
else:
utils.print_once(f"yfinance: Ticker.history(debug={debug}) argument is deprecated and will be removed in future version. Do this instead to suppress error messages: logging.getLogger('yfinance').setLevel(logging.CRITICAL)")
logger.setLevel(logging.CRITICAL)
if start or period is None or period.lower() == "max":
# Check can get TZ. Fail => probably delisted
tz = self._get_ticker_tz(debug, proxy, timeout)
tz = self._get_ticker_tz(proxy, timeout)
if tz is None:
# Every valid ticker has a timezone. Missing = problem
err_msg = "No timezone found, symbol may be delisted"
shared._DFS[self.ticker] = utils.empty_df()
shared._ERRORS[self.ticker] = err_msg
if debug:
if raise_errors:
raise Exception('%s: %s' % (self.ticker, err_msg))
else:
print('- %s: %s' % (self.ticker, err_msg))
if raise_errors:
raise Exception('%s: %s' % (self.ticker, err_msg))
else:
logger.error('%s: %s' % (self.ticker, err_msg))
return utils.empty_df()
if end is None:
@@ -187,20 +197,25 @@ class TickerBase:
#if the ticker is MUTUALFUND or ETF, then get capitalGains events
params["events"] = "div,splits,capitalGains"
params_pretty = dict(params)
tz = self._get_ticker_tz(proxy, timeout)
for k in ["period1", "period2"]:
if k in params_pretty:
params_pretty[k] = str(_pd.Timestamp(params[k], unit='s').tz_localize("UTC").tz_convert(tz))
logger.debug('%s: %s' % (self.ticker, "Yahoo GET parameters: " + str(params_pretty)))
# Getting data from json
url = "{}/v8/finance/chart/{}".format(self._base_url, self.ticker)
data = None
get_fn = self._data.get
if end is not None:
end_dt = _pd.Timestamp(end, unit='s').tz_localize("UTC")
dt_now = _pd.Timestamp.utcnow()
data_delay = _datetime.timedelta(minutes=30)
if end_dt+data_delay <= dt_now:
# Date range in past so safe to fetch through cache:
get_fn = self._data.cache_get
try:
get_fn = self._data.get
if end is not None:
end_dt = _pd.Timestamp(end, unit='s').tz_localize("UTC")
dt_now = _pd.Timestamp.utcnow()
data_delay = _datetime.timedelta(minutes=30)
if end_dt+data_delay <= dt_now:
# Date range in past so safe to fetch through cache:
get_fn = self._data.cache_get
data = get_fn(
url=url,
params=params,
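The cache-routing decision in this hunk (fetch through the cache only when the requested range ends safely in the past) reduces to a single predicate. A sketch with the 30-minute data delay assumed in the diff:

```python
import datetime

import pandas as pd

# If the end of the requested range is at least DATA_DELAY in the past,
# Yahoo's answer can no longer change, so a cached response is safe.
DATA_DELAY = datetime.timedelta(minutes=30)

def safe_to_cache(end_epoch_s, now=None):
    end_dt = pd.Timestamp(end_epoch_s, unit='s').tz_localize("UTC")
    if now is None:
        now = pd.Timestamp.utcnow()
    return end_dt + DATA_DELAY <= now
```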
@@ -220,7 +235,6 @@ class TickerBase:
self._history_metadata = data["chart"]["result"][0]["meta"]
except Exception:
self._history_metadata = {}
self._history_metadata = utils.format_history_metadata(self._history_metadata)
err_msg = "No data found for this date range, symbol may be delisted"
fail = False
@@ -243,11 +257,10 @@ class TickerBase:
if fail:
shared._DFS[self.ticker] = utils.empty_df()
shared._ERRORS[self.ticker] = err_msg
if debug:
if raise_errors:
raise Exception('%s: %s' % (self.ticker, err_msg))
else:
print('%s: %s' % (self.ticker, err_msg))
if raise_errors:
raise Exception('%s: %s' % (self.ticker, err_msg))
else:
logger.error('%s: %s' % (self.ticker, err_msg))
return utils.empty_df()
# parse quotes
@@ -261,15 +274,16 @@ class TickerBase:
except Exception:
shared._DFS[self.ticker] = utils.empty_df()
shared._ERRORS[self.ticker] = err_msg
if debug:
if raise_errors:
raise Exception('%s: %s' % (self.ticker, err_msg))
else:
print('%s: %s' % (self.ticker, err_msg))
if raise_errors:
raise Exception('%s: %s' % (self.ticker, err_msg))
else:
logger.error('%s: %s' % (self.ticker, err_msg))
return shared._DFS[self.ticker]
logger.debug(f'{self.ticker}: yfinance received OHLC data: {quotes.index[0]} -> {quotes.index[-1]}')
# 2) fix weird bug with Yahoo! - returning 60m for 30m bars
if interval.lower() == "30m":
logger.debug(f'{self.ticker}: resampling 30m OHLC from 15m')
quotes2 = quotes.resample('30T')
quotes = _pd.DataFrame(index=quotes2.last().index, data={
'Open': quotes2['Open'].first(),
@@ -299,7 +313,12 @@ class TickerBase:
quotes = utils.fix_Yahoo_returning_live_separate(quotes, params["interval"], tz_exchange)
intraday = params["interval"][-1] in ("m", 'h')
if not prepost and intraday and "tradingPeriods" in self._history_metadata:
quotes = utils.fix_Yahoo_returning_prepost_unrequested(quotes, params["interval"], self._history_metadata)
tps = self._history_metadata["tradingPeriods"]
if not isinstance(tps, pd.DataFrame):
self._history_metadata = utils.format_history_metadata(self._history_metadata, tradingPeriodsOnly=True)
tps = self._history_metadata["tradingPeriods"]
quotes = utils.fix_Yahoo_returning_prepost_unrequested(quotes, params["interval"], tps)
logger.debug(f'{self.ticker}: OHLC after cleaning: {quotes.index[0]} -> {quotes.index[-1]}')
# actions
dividends, splits, capital_gains = utils.parse_actions(data["chart"]["result"][0])
@@ -361,9 +380,13 @@ class TickerBase:
df.loc[df["Capital Gains"].isna(),"Capital Gains"] = 0
else:
df["Capital Gains"] = 0.0
logger.debug(f'{self.ticker}: OHLC after combining events: {quotes.index[0]} -> {quotes.index[-1]}')
df = df[~df.index.duplicated(keep='first')] # must do before repair
if repair==True or repair=="silent":
# Do this before auto/back adjust
logger.debug(f'{self.ticker}: checking OHLC for repairs ...')
df = self._fix_zeroes(df, interval, tz_exchange, prepost, silent=(repair=="silent"))
df = self._fix_unit_mixups(df, interval, tz_exchange, prepost, silent=(repair=="silent"))
@@ -380,11 +403,10 @@ class TickerBase:
err_msg = "back_adjust failed with %s" % e
shared._DFS[self.ticker] = utils.empty_df()
shared._ERRORS[self.ticker] = err_msg
if debug:
if raise_errors:
raise Exception('%s: %s' % (self.ticker, err_msg))
else:
print('%s: %s' % (self.ticker, err_msg))
if raise_errors:
raise Exception('%s: %s' % (self.ticker, err_msg))
else:
logger.error('%s: %s' % (self.ticker, err_msg))
if rounding:
df = _np.round(df, data[
@@ -396,15 +418,17 @@ class TickerBase:
else:
df.index.name = "Date"
# duplicates and missing rows cleanup
df = df[~df.index.duplicated(keep='first')]
self._history = df.copy()
# missing rows cleanup
if not actions:
df = df.drop(columns=["Dividends", "Stock Splits", "Capital Gains"], errors='ignore')
if not keepna:
mask_nan_or_zero = (df.isna() | (df == 0)).all(axis=1)
df = df.drop(mask_nan_or_zero.index[mask_nan_or_zero])
logger.debug(f'{self.ticker}: yfinance returning OHLC: {df.index[0]} -> {df.index[-1]}')
return df
# ------------------------
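The `keepna` pruning above drops rows where every column is NaN or zero; a minimal, self-contained pandas illustration of that mask:

```python
import numpy as np
import pandas as pd

# A row where every column is NaN or 0 carries no price information,
# so history() drops it unless keepna=True.
df = pd.DataFrame({
    "Open":  [1.0, np.nan, 0.0],
    "Close": [2.0, np.nan, 0.0],
})
mask_nan_or_zero = (df.isna() | (df == 0)).all(axis=1)
pruned = df.drop(mask_nan_or_zero.index[mask_nan_or_zero])
```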
@@ -418,9 +442,6 @@ class TickerBase:
# Reconstruct values in df using finer-grained price data. Delimiter marks what to reconstruct
debug = False
# debug = True
if interval[1:] in ['d', 'wk', 'mo']:
# Interday data always includes pre & post
prepost = True
@@ -444,8 +465,7 @@ class TickerBase:
sub_interval = nexts[interval]
td_range = itds[interval]
else:
print("WARNING: Have not implemented repair for '{}' interval. Contact developers".format(interval))
raise Exception("why here")
logger.warning("Have not implemented price repair for '%s' interval. Contact developers", interval)
return df
df = df.sort_index()
@@ -461,25 +481,23 @@ class TickerBase:
m -= _datetime.timedelta(days=1) # allow space for 1-day padding
min_dt = _pd.Timestamp.utcnow() - m
min_dt = min_dt.tz_convert(df.index.tz).ceil("D")
if debug:
print(f"- min_dt={min_dt} interval={interval} sub_interval={sub_interval}")
logger.debug(f"min_dt={min_dt} interval={interval} sub_interval={sub_interval}")
if min_dt is not None:
f_recent = df.index >= min_dt
f_repair_rows = f_repair_rows & f_recent
if not f_repair_rows.any():
if debug:
print("data too old to repair")
logger.info("Data too old to repair")
return df
dts_to_repair = df.index[f_repair_rows]
indices_to_repair = _np.where(f_repair_rows)[0]
if len(dts_to_repair) == 0:
if debug:
print("dts_to_repair[] is empty")
logger.info("Nothing needs repairing (dts_to_repair[] empty)")
return df
df_v2 = df.copy()
df_v2["Repaired?"] = False
f_good = ~(df[price_cols].isna().any(axis=1))
f_good = f_good & (df[price_cols].to_numpy()!=tag).all(axis=1)
df_good = df[f_good]
@@ -502,8 +520,7 @@ class TickerBase:
grp_max_size = _datetime.timedelta(days=5) # allow 2 days for buffer below
else:
grp_max_size = _datetime.timedelta(days=30)
if debug:
print("- grp_max_size =", grp_max_size)
logger.debug(f"grp_max_size = {grp_max_size}")
for i in range(1, len(dts_to_repair)):
ind = indices_to_repair[i]
dt = dts_to_repair[i]
@@ -514,12 +531,11 @@ class TickerBase:
last_dt = dt
last_ind = ind
if debug:
print("Repair groups:")
for g in dts_groups:
print(f"- {g[0]} -> {g[-1]}")
logger.debug("Repair groups:")
for g in dts_groups:
logger.debug(f"- {g[0]} -> {g[-1]}")
# Add some good data to each group, so can calibrate later:
# Add some good data to each group, so can calibrate prices later:
for i in range(len(dts_groups)):
g = dts_groups[i]
g0 = g[0]
@@ -540,21 +556,18 @@ class TickerBase:
n_fixed = 0
for g in dts_groups:
df_block = df[df.index.isin(g)]
if debug:
print("- df_block:")
print(df_block)
logger.debug("df_block:")
logger.debug(df_block)
start_dt = g[0]
start_d = start_dt.date()
if sub_interval == "1h" and (_datetime.date.today() - start_d) > _datetime.timedelta(days=729):
# Don't bother requesting more price data, Yahoo will reject
if debug:
print(f"- Don't bother requesting {sub_interval} price data, Yahoo will reject")
logger.warning(f"Cannot reconstruct {interval} block starting {start_d}, too old, Yahoo will reject request for finer-grain data")
continue
elif sub_interval in ["30m", "15m"] and (_datetime.date.today() - start_d) > _datetime.timedelta(days=59):
# Don't bother requesting more price data, Yahoo will reject
if debug:
print(f"- Don't bother requesting {sub_interval} price data, Yahoo will reject")
logger.warning(f"Cannot reconstruct {interval} block starting {start_d}, too old, Yahoo will reject request for finer-grain data")
continue
td_1d = _datetime.timedelta(days=1)
@@ -574,16 +587,21 @@ class TickerBase:
if intraday:
fetch_start = fetch_start.date()
fetch_end = fetch_end.date()+td_1d
if debug:
print(f"- fetching {sub_interval} prepost={prepost} {fetch_start}->{fetch_end}")
if min_dt is not None:
fetch_start = max(min_dt.date(), fetch_start)
logger.debug(f"Fetching {sub_interval} prepost={prepost} {fetch_start}->{fetch_end}")
r = "silent" if silent else True
df_fine = self.history(start=fetch_start, end=fetch_end, interval=sub_interval, auto_adjust=False, actions=False, prepost=prepost, repair=r, keepna=True)
if df_fine is None or df_fine.empty:
if not silent:
print("YF: WARNING: Cannot reconstruct because Yahoo not returning data in interval")
logger.warning(f"Cannot reconstruct {interval} block starting {start_d}, too old, Yahoo is rejecting request for finer-grain data")
continue
# Discard the buffer
df_fine = df_fine.loc[g[0] : g[-1]+itds[sub_interval]-_datetime.timedelta(milliseconds=1)]
df_fine = df_fine.loc[g[0] : g[-1]+itds[sub_interval]-_datetime.timedelta(milliseconds=1)].copy()
if df_fine.empty:
if not silent:
print("YF: WARNING: Cannot reconstruct because Yahoo not returning data in interval")
continue
df_fine["ctr"] = 0
if interval == "1wk":
@@ -616,25 +634,22 @@ class TickerBase:
new_index = _np.append([df_fine.index[0]], df_fine.index[df_fine["intervalID"].diff()>0])
df_new.index = new_index
if debug:
print("- df_new:")
print(df_new)
logger.debug("df_new:")
logger.debug(df_new)
# Calibrate! Check whether 'df_fine' has different split-adjustment.
# If different, then adjust to match 'df'
common_index = _np.intersect1d(df_block.index, df_new.index)
if len(common_index) == 0:
# Can't calibrate so don't attempt repair
if debug:
print("Can't calibrate so don't attempt repair")
logger.warning(f"Can't calibrate {interval} block starting {start_d} so aborting repair")
continue
df_new_calib = df_new[df_new.index.isin(common_index)][price_cols].to_numpy()
df_block_calib = df_block[df_block.index.isin(common_index)][price_cols].to_numpy()
calib_filter = (df_block_calib != tag)
if not calib_filter.any():
# Can't calibrate so don't attempt repair
if debug:
print("Can't calibrate so don't attempt repair")
logger.warning(f"Can't calibrate {interval} block starting {start_d} so aborting repair")
continue
# Avoid divide-by-zero warnings:
for j in range(len(price_cols)):
@@ -650,8 +665,7 @@ class TickerBase:
weights = _np.tile(weights, len(price_cols)) # 1D -> 2D
weights = weights[calib_filter] # flatten
ratio = _np.average(ratios, weights=weights)
if debug:
print(f"- price calibration ratio (raw) = {ratio}")
logger.debug(f"Price calibration ratio (raw) = {ratio}")
ratio_rcp = round(1.0 / ratio, 1)
ratio = round(ratio, 1)
if ratio == 1 and ratio_rcp == 1:
@@ -670,18 +684,17 @@ class TickerBase:
df_new["Volume"] *= ratio_rcp
# Repair!
bad_dts = df_block.index[(df_block[price_cols+["Volume"]]==tag).any(axis=1)]
bad_dts = df_block.index[(df_block[price_cols+["Volume"]]==tag).to_numpy().any(axis=1)]
if debug:
no_fine_data_dts = []
for idx in bad_dts:
if not idx in df_new.index:
# Yahoo didn't return finer-grain data for this interval,
# so probably no trading happened.
no_fine_data_dts.append(idx)
if len(no_fine_data_dts) > 0:
print(f"Yahoo didn't return finer-grain data for these intervals:")
print(no_fine_data_dts)
no_fine_data_dts = []
for idx in bad_dts:
if not idx in df_new.index:
# Yahoo didn't return finer-grain data for this interval,
# so probably no trading happened.
no_fine_data_dts.append(idx)
if len(no_fine_data_dts) > 0:
logger.debug(f"Yahoo didn't return finer-grain data for these intervals:")
logger.debug(no_fine_data_dts)
for idx in bad_dts:
if not idx in df_new.index:
# Yahoo didn't return finer-grain data for this interval,
@@ -694,7 +707,7 @@ class TickerBase:
df_fine = df_fine.loc[idx:]
df_bad_row = df.loc[idx]
bad_fields = df_bad_row.index[df_bad_row==tag].values
bad_fields = df_bad_row.index[df_bad_row==tag].to_numpy()
if "High" in bad_fields:
df_v2.loc[idx, "High"] = df_new_row["High"]
if "Low" in bad_fields:
@@ -712,10 +725,11 @@ class TickerBase:
df_v2.loc[idx, "Adj Close"] = df_new_row["Adj Close"]
if "Volume" in bad_fields:
df_v2.loc[idx, "Volume"] = df_new_row["Volume"]
df_v2.loc[idx, "Repaired?"] = True
n_fixed += 1
if debug:
print("df_v2:") ; print(df_v2)
logger.debug("df_v2:")
logger.debug(df_v2)
return df_v2
@@ -728,13 +742,14 @@ class TickerBase:
return df
if df.shape[0] == 1:
# Need multiple rows to confidently identify outliers
logger.warning("Cannot check single-row table for 100x price errors")
return df
df2 = df.copy()
if df.index.tz is None:
df2.index = df2.index.tz_localize(tz_exchange)
else:
elif df2.index.tz != tz_exchange:
df2.index = df2.index.tz_convert(tz_exchange)
# Only import scipy if users actually want function. To avoid
@@ -743,19 +758,22 @@ class TickerBase:
data_cols = ["High", "Open", "Low", "Close", "Adj Close"] # Order important, separate High from Low
data_cols = [c for c in data_cols if c in df2.columns]
f_zeroes = (df2[data_cols]==0).any(axis=1)
f_zeroes = (df2[data_cols]==0).any(axis=1).to_numpy()
if f_zeroes.any():
df2_zeroes = df2[f_zeroes]
df2 = df2[~f_zeroes]
else:
df2_zeroes = None
if df2.shape[0] <= 1:
logger.warning("Insufficient good data for detecting 100x price errors")
return df
median = _ndimage.median_filter(df2[data_cols].values, size=(3, 3), mode="wrap")
ratio = df2[data_cols].values / median
df2_data = df2[data_cols].to_numpy()
median = _ndimage.median_filter(df2_data, size=(3, 3), mode="wrap")
ratio = df2_data / median
ratio_rounded = (ratio / 20).round() * 20 # round ratio to nearest 20
f = ratio_rounded == 100
if not f.any():
logger.info("No bad data (100x wrong) to repair")
return df
# Mark values to send for repair
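The detection step above flags prices roughly 100x their neighbours by comparing against a 3×3 median filter. A self-contained sketch of that test, using hand-made data and the same `scipy.ndimage.median_filter` call as the diff:

```python
import numpy as np
from scipy import ndimage

# Two price columns; one value is a 100x error relative to its neighbours.
prices = np.array([
    [10.0,   10.2],
    [1000.0, 10.1],   # first column here is ~100x too large
    [10.3,   10.4],
])

median = ndimage.median_filter(prices, size=(3, 3), mode="wrap")
ratio = prices / median
ratio_rounded = (ratio / 20).round() * 20   # snap ratios to nearest 20
bad = ratio_rounded == 100                  # True only for ~100x errors
```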
@@ -765,14 +783,15 @@ class TickerBase:
c = data_cols[i]
df2.loc[fi, c] = tag
n_before = (df2[data_cols].to_numpy()==tag).sum()
n_before = (df2_data==tag).sum()
df2 = self._reconstruct_intervals_batch(df2, interval, prepost, tag, silent)
df2_tagged = df2[data_cols].to_numpy()==tag
n_after = (df2[data_cols].to_numpy()==tag).sum()
if n_after > 0:
# This second pass will *crudely* "fix" any remaining errors in High/Low
# simply by ensuring they don't contradict e.g. Low = 100x High.
f = df2[data_cols].to_numpy()==tag
f = df2_tagged
for i in range(f.shape[0]):
fi = f[i,:]
if not fi.any():
@@ -804,7 +823,10 @@ class TickerBase:
if fi[j]:
df2.loc[idx, c] = df2.loc[idx, ["Open", "Close"]].min()
n_after_crude = (df2[data_cols].to_numpy()==tag).sum()
df2_tagged = df2[data_cols].to_numpy()==tag
n_after_crude = df2_tagged.sum()
else:
n_after_crude = n_after
n_fixed = n_before - n_after_crude
n_fixed_crudely = n_after - n_after_crude
@@ -813,16 +835,17 @@ class TickerBase:
if n_fixed_crudely > 0:
report_msg += f"({n_fixed_crudely} crudely) "
report_msg += f"in {interval} price data"
print(report_msg)
logger.info('%s', report_msg)
# Restore original values where repair failed
f = df2[data_cols].values==tag
f = df2_tagged
for j in range(len(data_cols)):
fj = f[:,j]
if fj.any():
c = data_cols[j]
df2.loc[fj, c] = df.loc[fj, c]
if df2_zeroes is not None:
df2_zeroes["Repaired?"] = False
df2 = _pd.concat([df2, df2_zeroes]).sort_index()
df2.index = _pd.to_datetime(df2.index)
@@ -836,9 +859,6 @@ class TickerBase:
if df.shape[0] == 0:
return df
debug = False
# debug = True
intraday = interval[-1] in ("m", 'h')
df = df.sort_index() # important!
@@ -846,7 +866,7 @@ class TickerBase:
if df2.index.tz is None:
df2.index = df2.index.tz_localize(tz_exchange)
else:
elif df2.index.tz != tz_exchange:
df2.index = df2.index.tz_convert(tz_exchange)
price_cols = [c for c in ["Open", "High", "Low", "Close", "Adj Close"] if c in df2.columns]
@@ -854,30 +874,27 @@ class TickerBase:
df2_reserve = None
if intraday:
# Ignore days with >50% intervals containing NaNs
df_nans = pd.DataFrame(f_prices_bad.any(axis=1), columns=["nan"])
df_nans["_date"] = df_nans.index.date
grp = df_nans.groupby("_date")
grp = pd.Series(f_prices_bad.any(axis=1), name="nan").groupby(f_prices_bad.index.date)
nan_pct = grp.sum() / grp.count()
dts = nan_pct.index[nan_pct["nan"]>0.5]
dts = nan_pct.index[nan_pct>0.5]
f_zero_or_nan_ignore = _np.isin(f_prices_bad.index.date, dts)
df2_reserve = df2[f_zero_or_nan_ignore]
df2 = df2[~f_zero_or_nan_ignore]
f_prices_bad = (df2[price_cols] == 0.0) | df2[price_cols].isna()
f_high_low_good = (~df2["High"].isna()) & (~df2["Low"].isna())
f_vol_bad = (df2["Volume"]==0).to_numpy() & f_high_low_good & (df2["High"]!=df2["Low"]).to_numpy()
f_high_low_good = (~df2["High"].isna().to_numpy()) & (~df2["Low"].isna().to_numpy())
f_change = df2["High"].to_numpy() != df2["Low"].to_numpy()
f_vol_bad = (df2["Volume"]==0).to_numpy() & f_high_low_good & f_change
# Check whether worth attempting repair
f_prices_bad = f_prices_bad.to_numpy()
f_bad_rows = f_prices_bad.any(axis=1) | f_vol_bad
if not f_bad_rows.any():
if debug:
print("no bad data to repair")
logger.info("No bad data (price=0) to repair")
return df
if f_prices_bad.sum() == len(price_cols)*len(df2):
# Need some good data to calibrate
if debug:
print("no good data to calibrate")
logger.warning("No good data for calibration so cannot fix price=0 bad data")
return df
data_cols = price_cols + ["Volume"]
@@ -892,37 +909,38 @@ class TickerBase:
f_vol_zero_or_nan = (df2["Volume"].to_numpy()==0) | (df2["Volume"].isna().to_numpy())
df2.loc[f_prices_bad.any(axis=1) & f_vol_zero_or_nan, "Volume"] = tag
# If volume=0 or NaN but price moved in interval, then tag volume for repair
f_change = df2["High"].to_numpy() != df2["Low"].to_numpy()
df2.loc[f_change & f_vol_zero_or_nan, "Volume"] = tag
n_before = (df2[data_cols].to_numpy()==tag).sum()
dts_tagged = df2.index[(df2[data_cols].to_numpy()==tag).any(axis=1)]
df2 = self._reconstruct_intervals_batch(df2, interval, prepost, tag, silent)
n_after = (df2[data_cols].to_numpy()==tag).sum()
dts_not_repaired = df2.index[(df2[data_cols].to_numpy()==tag).any(axis=1)]
df2_tagged = df2[data_cols].to_numpy()==tag
n_before = df2_tagged.sum()
dts_tagged = df2.index[df2_tagged.any(axis=1)]
df3 = self._reconstruct_intervals_batch(df2, interval, prepost, tag, silent)
df3_tagged = df3[data_cols].to_numpy()==tag
n_after = df3_tagged.sum()
dts_not_repaired = df3.index[df3_tagged.any(axis=1)]
n_fixed = n_before - n_after
if not silent and n_fixed > 0:
msg = f"{self.ticker}: fixed {n_fixed}/{n_before} value=0 errors in {interval} price data"
if n_fixed < 4:
dts_repaired = sorted(list(set(dts_tagged).difference(dts_not_repaired)))
msg += f": {dts_repaired}"
print(msg)
logger.info('%s', msg)
if df2_reserve is not None:
df2 = _pd.concat([df2, df2_reserve])
df2 = df2.sort_index()
df2_reserve["Repaired?"] = False
df3 = _pd.concat([df3, df2_reserve]).sort_index()
# Restore original values where repair failed (i.e. remove tag values)
f = df2[data_cols].values==tag
f = df3[data_cols].to_numpy()==tag
for j in range(len(data_cols)):
fj = f[:,j]
if fj.any():
c = data_cols[j]
df2.loc[fj, c] = df.loc[fj, c]
df3.loc[fj, c] = df.loc[fj, c]
return df2
return df3
def _get_ticker_tz(self, debug_mode, proxy, timeout):
def _get_ticker_tz(self, proxy, timeout):
if self._tz is not None:
return self._tz
cache = utils.get_tz_cache()
@@ -934,7 +952,7 @@ class TickerBase:
tz = None
if tz is None:
tz = self._fetch_ticker_tz(debug_mode, proxy, timeout)
tz = self._fetch_ticker_tz(proxy, timeout)
if utils.is_valid_timezone(tz):
# info fetch is relatively slow so cache timezone
@@ -945,7 +963,7 @@ class TickerBase:
self._tz = tz
return tz
def _fetch_ticker_tz(self, debug_mode, proxy, timeout):
def _fetch_ticker_tz(self, proxy, timeout):
# Query Yahoo for fast price data just to get returned timezone
params = {"range": "1d", "interval": "1d"}
@@ -957,25 +975,22 @@ class TickerBase:
data = self._data.cache_get(url=url, params=params, proxy=proxy, timeout=timeout)
data = data.json()
except Exception as e:
if debug_mode:
print("Failed to get ticker '{}' reason: {}".format(self.ticker, e))
logger.error("Failed to get ticker '{}' reason: {}".format(self.ticker, e))
return None
else:
error = data.get('chart', {}).get('error', None)
if error:
# explicit error from yahoo API
if debug_mode:
print("Got error from yahoo api for ticker {}, Error: {}".format(self.ticker, error))
logger.debug("Got error from yahoo api for ticker {}, Error: {}".format(self.ticker, error))
else:
try:
return data["chart"]["result"][0]["meta"]["exchangeTimezoneName"]
except Exception as err:
if debug_mode:
print("Could not get exchangeTimezoneName for ticker '{}' reason: {}".format(self.ticker, err))
print("Got response: ")
print("-------------")
print(" {}".format(data))
print("-------------")
logger.error("Could not get exchangeTimezoneName for ticker '{}' reason: {}".format(self.ticker, err))
logger.debug("Got response: ")
logger.debug("-------------")
logger.debug(" {}".format(data))
logger.debug("-------------")
return None
def get_recommendations(self, proxy=None, as_dict=False):
@@ -1028,7 +1043,7 @@ class TickerBase:
@property
def basic_info(self):
print("WARNING: 'Ticker.basic_info' is renamed to 'Ticker.fast_info', hopefully purpose is clearer")
warnings.warn("'Ticker.basic_info' is renamed to 'Ticker.fast_info', hopefully purpose is clearer", DeprecationWarning)
return self.fast_info
def get_sustainability(self, proxy=None, as_dict=False):
@@ -1257,7 +1272,7 @@ class TickerBase:
def get_shares_full(self, start=None, end=None, proxy=None):
# Process dates
tz = self._get_ticker_tz(debug_mode=False, proxy=None, timeout=10)
tz = self._get_ticker_tz(proxy=None, timeout=10)
dt_now = _pd.Timestamp.utcnow().tz_convert(tz)
if start is not None:
start_ts = utils._parse_user_dt(start, tz)
@@ -1272,7 +1287,7 @@ class TickerBase:
if start is None:
start = end - _pd.Timedelta(days=548) # 18 months
if start >= end:
print("ERROR: start date must be before end")
logger.error("Start date must be before end")
return None
start = start.floor("D")
end = end.ceil("D")
@@ -1284,14 +1299,14 @@ class TickerBase:
json_str = self._data.cache_get(shares_url).text
json_data = _json.loads(json_str)
except:
print(f"{self.ticker}: Yahoo web request for share count failed")
logger.error("%s: Yahoo web request for share count failed", self.ticker)
return None
try:
fail = json_data["finance"]["error"]["code"] == "Bad Request"
except:
fail = False
if fail:
print(f"{self.ticker}: Yahoo web request for share count failed")
logger.error("%s: Yahoo web request for share count failed", self.ticker)
return None
shares_data = json_data["timeseries"]["result"]
@@ -1300,7 +1315,7 @@ class TickerBase:
try:
df = _pd.Series(shares_data[0]["shares_out"], index=_pd.to_datetime(shares_data[0]["timestamp"], unit="s"))
except Exception as e:
print(f"{self.ticker}: Failed to parse shares count data: "+str(e))
logger.error("%s: Failed to parse shares count data: %s", self.ticker, e)
return None
df.index = df.index.tz_localize(tz)
@@ -1415,7 +1430,7 @@ class TickerBase:
if dates is None or dates.shape[0] == 0:
err_msg = "No earnings dates found, symbol may be delisted"
print('- %s: %s' % (self.ticker, err_msg))
logger.error('%s: %s', self.ticker, err_msg)
return None
dates = dates.reset_index(drop=True)
@@ -1443,7 +1458,7 @@ class TickerBase:
dates[cn] = _pd.to_datetime(dates[cn], format="%b %d, %Y, %I %p")
# - instead of attempting decoding of ambiguous timezone abbreviation, just use 'info':
self._quote.proxy = proxy
tz = self._get_ticker_tz(debug_mode=False, proxy=proxy, timeout=30)
tz = self._get_ticker_tz(proxy=proxy, timeout=30)
dates[cn] = dates[cn].dt.tz_localize(tz)
dates = dates.set_index("Earnings Date")
@@ -1456,4 +1471,9 @@ class TickerBase:
if self._history_metadata is None:
# Request intraday data, because then Yahoo returns exchange schedule.
self.history(period="1wk", interval="1h", prepost=True)
if self._history_metadata_formatted is False:
self._history_metadata = utils.format_history_metadata(self._history_metadata)
self._history_metadata_formatted = True
return self._history_metadata


@@ -1,6 +1,7 @@
import functools
from functools import lru_cache
import logging
import hashlib
from base64 import b64decode
usePycryptodome = False # slightly faster
@@ -25,8 +26,12 @@ try:
except ImportError:
import json as json
from . import utils
cache_maxsize = 64
logger = utils.get_yf_logger()
def lru_cache_freezeargs(func):
"""
@@ -297,11 +302,11 @@ class TickerData:
# Gather decryption keys:
soup = BeautifulSoup(response.content, "html.parser")
keys = self._get_decryption_keys_from_yahoo_js(soup)
# if len(keys) == 0:
# msg = "No decryption keys could be extracted from JS file."
# if "requests_cache" in str(type(response)):
# msg += " Try flushing your 'requests_cache', probably parsing old JS."
# print("WARNING: " + msg + " Falling back to backup decrypt methods.")
if len(keys) == 0:
msg = "No decryption keys could be extracted from JS file."
if "requests_cache" in str(type(response)):
msg += " Try flushing your 'requests_cache', probably parsing old JS."
logger.warning("%s Falling back to backup decrypt methods.", msg)
if len(keys) == 0:
keys = []
try:


@@ -21,6 +21,8 @@
from __future__ import print_function
import logging
import traceback
import time as _time
import multitasking as _multitasking
import pandas as _pd
@@ -28,10 +30,9 @@ import pandas as _pd
from . import Ticker, utils
from . import shared
def download(tickers, start=None, end=None, actions=False, threads=True, ignore_tz=None,
group_by='column', auto_adjust=False, back_adjust=False, repair=False, keepna=False,
progress=True, period="max", show_errors=True, interval="1d", prepost=False,
progress=True, period="max", show_errors=None, interval="1d", prepost=False,
proxy=None, rounding=False, timeout=10):
"""Download yahoo tickers
:Parameters:
@@ -77,11 +78,20 @@ def download(tickers, start=None, end=None, actions=False, threads=True, ignore_
Optional. Round values to 2 decimal places?
show_errors: bool
Optional. Doesn't print errors if False
DEPRECATED, will be removed in future version
timeout: None or float
If not None stops waiting for a response after given number of
seconds. (Can also be a fraction of a second e.g. 0.01)
"""
if show_errors is not None:
if show_errors:
utils.print_once(f"yfinance: download(show_errors={show_errors}) argument is deprecated and will be removed in future version. Do this instead: logging.getLogger('yfinance').setLevel(logging.ERROR)")
logging.getLogger('yfinance').setLevel(logging.ERROR)
else:
utils.print_once(f"yfinance: download(show_errors={show_errors}) argument is deprecated and will be removed in future version. Do this instead to suppress error messages: logging.getLogger('yfinance').setLevel(logging.CRITICAL)")
logging.getLogger('yfinance').setLevel(logging.CRITICAL)
if ignore_tz is None:
# Set default value depending on interval
if interval[1:] in ['m', 'h']:
@@ -114,6 +124,7 @@ def download(tickers, start=None, end=None, actions=False, threads=True, ignore_
# reset shared._DFS
shared._DFS = {}
shared._ERRORS = {}
shared._TRACEBACKS = {}
# download using threads
if threads:
@@ -146,12 +157,31 @@ def download(tickers, start=None, end=None, actions=False, threads=True, ignore_
if progress:
shared._PROGRESS_BAR.completed()
if shared._ERRORS and show_errors:
print('\n%.f Failed download%s:' % (
if shared._ERRORS:
logger = utils.get_yf_logger()
logger.error('\n%.f Failed download%s:' % (
len(shared._ERRORS), 's' if len(shared._ERRORS) > 1 else ''))
# print(shared._ERRORS)
print("\n".join(['- %s: %s' %
v for v in list(shared._ERRORS.items())]))
# Print each distinct error once, with list of symbols affected
errors = {}
for ticker in shared._ERRORS:
err = shared._ERRORS[ticker]
if err not in errors:
errors[err] = [ticker]
else:
errors[err].append(ticker)
for err in errors.keys():
logger.error(f'{errors[err]}: ' + err)
# Print each distinct traceback once, with list of symbols affected
tbs = {}
for ticker in shared._ERRORS:
tb = shared._TRACEBACKS[ticker]
if tb not in tbs:
tbs[tb] = [ticker]
else:
tbs[tb].append(ticker)
for tb in tbs.keys():
logger.debug(f'{tbs[tb]}: ' + tb)
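The two loops above share one idea: invert a per-ticker mapping so each distinct message is reported once with the list of symbols it affected. A minimal sketch with hypothetical stand-in data for `shared._ERRORS`:

```python
# Hypothetical per-ticker error messages, as _download_one_threaded records them.
errors_by_ticker = {
    "AAA": "HTTPError('404')",
    "BBB": "HTTPError('404')",
    "CCC": "ValueError('no data')",
}

# Invert: distinct message -> list of affected tickers.
grouped = {}
for ticker, err in errors_by_ticker.items():
    grouped.setdefault(err, []).append(ticker)

for err, tickers in grouped.items():
    print(f"{tickers}: {err}")
```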
if ignore_tz:
for tkr in shared._DFS.keys():
@@ -215,6 +245,7 @@ def _download_one_threaded(ticker, start=None, end=None,
keepna, timeout)
except Exception as e:
# global try/except needed as current thread implementation breaks if exception is raised.
shared._TRACEBACKS[ticker] = traceback.format_exc()
shared._DFS[ticker] = utils.empty_df()
shared._ERRORS[ticker] = repr(e)
else:
@@ -234,5 +265,5 @@ def _download_one(ticker, start=None, end=None,
actions=actions, auto_adjust=auto_adjust,
back_adjust=back_adjust, repair=repair, proxy=proxy,
rounding=rounding, keepna=keepna, timeout=timeout,
debug=False, raise_errors=False # debug and raise_errors false to not log and raise errors in threads
raise_errors=False # stop individual threads raising errors
)
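With `show_errors` and `debug` deprecated, verbosity is controlled through the standard `logging` module via the shared `'yfinance'` logger, as the deprecation messages above suggest:

```python
import logging

# The shared logger used across all yfinance modules.
logger = logging.getLogger('yfinance')

logger.setLevel(logging.CRITICAL)  # suppress error messages (old show_errors=False)
logger.setLevel(logging.ERROR)     # report errors only (old show_errors=True)
logger.setLevel(logging.DEBUG)     # verbose, includes the grouped tracebacks
```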


@@ -58,7 +58,7 @@ class Analysis:
analysis_data = analysis_data['QuoteSummaryStore']
except KeyError as e:
err_msg = "No analysis data found, symbol may be delisted"
print('- %s: %s' % (self._data.ticker, err_msg))
logger.error('%s: %s', self._data.ticker, err_msg)
return
if isinstance(analysis_data.get('earningsTrend'), dict):


@@ -1,4 +1,5 @@
import datetime
import logging
import json
import pandas as pd
@@ -8,6 +9,7 @@ from yfinance import utils
from yfinance.data import TickerData
from yfinance.exceptions import YFinanceDataException, YFinanceException
logger = utils.get_yf_logger()
class Fundamentals:
@@ -50,7 +52,7 @@ class Fundamentals:
self._fin_data_quote = self._financials_data['QuoteSummaryStore']
except KeyError:
err_msg = "No financials data found, symbol may be delisted"
print('- %s: %s' % (self._data.ticker, err_msg))
logger.error('%s: %s', self._data.ticker, err_msg)
return None
def _scrape_earnings(self, proxy):
@@ -144,7 +146,7 @@ class Financials:
if statement is not None:
return statement
except YFinanceException as e:
print(f"- {self._data.ticker}: Failed to create {name} financials table for reason: {repr(e)}")
logger.error("%s: Failed to create %s financials table for reason: %r", self._data.ticker, name, e)
return pd.DataFrame()
def _create_financials_table(self, name, timescale, proxy):
@@ -267,7 +269,7 @@ class Financials:
if statement is not None:
return statement
except YFinanceException as e:
print(f"- {self._data.ticker}: Failed to create financials table for {name} reason: {repr(e)}")
logger.error("%s: Failed to create financials table for %s reason: %r", self._data.ticker, name, e)
return pd.DataFrame()
def _create_financials_table_old(self, name, timescale, proxy):


@@ -1,5 +1,7 @@
import datetime
import logging
import json
import warnings
import pandas as pd
import numpy as _np
@@ -7,6 +9,7 @@ import numpy as _np
from yfinance import utils
from yfinance.data import TickerData
logger = utils.get_yf_logger()
info_retired_keys_price = {"currentPrice", "dayHigh", "dayLow", "open", "previousClose", "volume", "volume24Hr"}
info_retired_keys_price.update({"regularMarket"+s for s in ["DayHigh", "DayLow", "Open", "PreviousClose", "Price", "Volume"]})
@@ -46,16 +49,16 @@ class InfoDictWrapper(MutableMapping):
def __getitem__(self, k):
if k in info_retired_keys_price:
print(f"Price data removed from info (key='{k}'). Use Ticker.fast_info or history() instead")
warnings.warn(f"Price data removed from info (key='{k}'). Use Ticker.fast_info or history() instead", DeprecationWarning)
return None
elif k in info_retired_keys_exchange:
print(f"Exchange data removed from info (key='{k}'). Use Ticker.fast_info or Ticker.get_history_metadata() instead")
warnings.warn(f"Exchange data removed from info (key='{k}'). Use Ticker.fast_info or Ticker.get_history_metadata() instead", DeprecationWarning)
return None
elif k in info_retired_keys_marketCap:
print(f"Market cap removed from info (key='{k}'). Use Ticker.fast_info instead")
warnings.warn(f"Market cap removed from info (key='{k}'). Use Ticker.fast_info instead", DeprecationWarning)
return None
elif k in info_retired_keys_symbol:
print(f"Symbol removed from info (key='{k}'). You know this already")
warnings.warn(f"Symbol removed from info (key='{k}'). You know this already", DeprecationWarning)
return None
return self.info[self._keytransform(k)]
@@ -587,7 +590,7 @@ class Quote:
quote_summary_store = json_data['QuoteSummaryStore']
except KeyError:
err_msg = "No summary info found, symbol may be delisted"
print('- %s: %s' % (self._data.ticker, err_msg))
logger.error('%s: %s', self._data.ticker, err_msg)
return None
# sustainability


@@ -22,4 +22,5 @@
_DFS = {}
_PROGRESS_BAR = None
_ERRORS = {}
_TRACEBACKS = {}
_ISINS = {}


@@ -87,10 +87,4 @@ class Tickers:
return data
def news(self):
collection = {}
for ticker in self.symbols:
collection[ticker] = []
items = Ticker(ticker).news
for item in items:
collection[ticker].append(item)
return collection
return {ticker: [item for item in Ticker(ticker).news] for ticker in self.symbols}


@@ -36,6 +36,7 @@ import appdirs as _ad
import sqlite3 as _sqlite3
import atexit as _atexit
from functools import lru_cache
import logging
from threading import Lock
@@ -69,6 +70,20 @@ def print_once(msg):
print(msg)
yf_logger = None
def get_yf_logger():
global yf_logger
if yf_logger is None:
yf_logger = logging.getLogger("yfinance")
if not yf_logger.handlers:
# Add stream handler if user not already added one
h = logging.StreamHandler()
formatter = logging.Formatter(fmt='%(levelname)s %(message)s')
h.setFormatter(formatter)
yf_logger.addHandler(h)
return yf_logger
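A self-contained sketch of the lazy-logger pattern above: reuse one named logger, and only attach a `StreamHandler` if the user has not already configured their own.

```python
import logging

def get_logger(name="yfinance"):
    # Same object on every call for a given name.
    logger = logging.getLogger(name)
    if not logger.handlers:
        # Add stream handler only if user hasn't already added one.
        h = logging.StreamHandler()
        h.setFormatter(logging.Formatter(fmt='%(levelname)s %(message)s'))
        logger.addHandler(h)
    return logger

a = get_logger()
b = get_logger()
print(a is b, len(a.handlers))  # same logger object, handler added once
```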
def is_isin(string):
return bool(_re.match("^([A-Z]{2})([A-Z0-9]{9})([0-9]{1})$", string))
@@ -346,10 +361,10 @@ def _interval_to_timedelta(interval):
def auto_adjust(data):
col_order = data.columns
df = data.copy()
ratio = df["Close"] / df["Adj Close"]
df["Adj Open"] = df["Open"] / ratio
df["Adj High"] = df["High"] / ratio
df["Adj Low"] = df["Low"] / ratio
ratio = (df["Adj Close"] / df["Close"]).to_numpy()
df["Adj Open"] = df["Open"] * ratio
df["Adj High"] = df["High"] * ratio
df["Adj Low"] = df["Low"] * ratio
df.drop(
["Open", "High", "Low", "Close"],
@@ -412,12 +427,9 @@ def parse_quotes(data):
def parse_actions(data):
dividends = _pd.DataFrame(
columns=["Dividends"], index=_pd.DatetimeIndex([]))
capital_gains = _pd.DataFrame(
columns=["Capital Gains"], index=_pd.DatetimeIndex([]))
splits = _pd.DataFrame(
columns=["Stock Splits"], index=_pd.DatetimeIndex([]))
dividends = None
capital_gains = None
splits = None
if "events" in data:
if "dividends" in data["events"]:
@@ -446,6 +458,16 @@ def parse_actions(data):
splits["denominator"]
splits = splits[["Stock Splits"]]
if dividends is None:
dividends = _pd.DataFrame(
columns=["Dividends"], index=_pd.DatetimeIndex([]))
if capital_gains is None:
capital_gains = _pd.DataFrame(
columns=["Capital Gains"], index=_pd.DatetimeIndex([]))
if splits is None:
splits = _pd.DataFrame(
columns=["Stock Splits"], index=_pd.DatetimeIndex([]))
return dividends, splits, capital_gains
@@ -456,31 +478,30 @@ def set_df_tz(df, interval, tz):
return df
def fix_Yahoo_returning_prepost_unrequested(quotes, interval, metadata):
def fix_Yahoo_returning_prepost_unrequested(quotes, interval, tradingPeriods):
# Sometimes Yahoo returns post-market data despite not requesting it.
# Normally happens on half-day early closes.
#
# And sometimes returns pre-market data despite not requesting it.
# E.g. some London tickers.
tps_df = metadata["tradingPeriods"]
tps_df = tradingPeriods.copy()
tps_df["_date"] = tps_df.index.date
quotes["_date"] = quotes.index.date
idx = quotes.index.copy()
quotes = quotes.merge(tps_df, how="left", validate="many_to_one")
quotes = quotes.merge(tps_df, how="left")
quotes.index = idx
# "end" = end of regular trading hours (including any auction)
f_drop = quotes.index >= quotes["end"]
f_drop = f_drop | (quotes.index < quotes["start"])
if f_drop.any():
# When printing report, ignore rows that were already NaNs:
f_na = quotes[["Open","Close"]].isna().all(axis=1)
n_nna = quotes.shape[0] - _np.sum(f_na)
n_drop_nna = _np.sum(f_drop & ~f_na)
quotes_dropped = quotes[f_drop]
# f_na = quotes[["Open","Close"]].isna().all(axis=1)
# n_nna = quotes.shape[0] - _np.sum(f_na)
# n_drop_nna = _np.sum(f_drop & ~f_na)
# quotes_dropped = quotes[f_drop]
# if debug and n_drop_nna > 0:
# print(f"Dropping {n_drop_nna}/{n_nna} intervals for falling outside regular trading hours")
quotes = quotes[~f_drop]
metadata["tradingPeriods"] = tps_df.drop(["_date"], axis=1)
quotes = quotes.drop(["_date", "start", "end"], axis=1)
return quotes
@@ -519,16 +540,24 @@ def fix_Yahoo_returning_live_separate(quotes, interval, tz_exchange):
# Last two rows are within same interval
idx1 = quotes.index[n - 1]
idx2 = quotes.index[n - 2]
if idx1 == idx2:
# Yahoo returning last interval duplicated, which means
# Yahoo is not returning live data (phew!)
return quotes
if _np.isnan(quotes.loc[idx2, "Open"]):
quotes.loc[idx2, "Open"] = quotes["Open"][n - 1]
# Note: nanmax() & nanmin() ignores NaNs
quotes.loc[idx2, "High"] = _np.nanmax([quotes["High"][n - 1], quotes["High"][n - 2]])
quotes.loc[idx2, "Low"] = _np.nanmin([quotes["Low"][n - 1], quotes["Low"][n - 2]])
# Note: nanmax() & nanmin() ignore NaNs, but still need to check not all are NaN to avoid warnings
if not _np.isnan(quotes["High"][n - 1]):
quotes.loc[idx2, "High"] = _np.nanmax([quotes["High"][n - 1], quotes["High"][n - 2]])
if "Adj High" in quotes.columns:
quotes.loc[idx2, "Adj High"] = _np.nanmax([quotes["Adj High"][n - 1], quotes["Adj High"][n - 2]])
if not _np.isnan(quotes["Low"][n - 1]):
quotes.loc[idx2, "Low"] = _np.nanmin([quotes["Low"][n - 1], quotes["Low"][n - 2]])
if "Adj Low" in quotes.columns:
quotes.loc[idx2, "Adj Low"] = _np.nanmin([quotes["Adj Low"][n - 1], quotes["Adj Low"][n - 2]])
quotes.loc[idx2, "Close"] = quotes["Close"][n - 1]
if "Adj High" in quotes.columns:
quotes.loc[idx2, "Adj High"] = _np.nanmax([quotes["Adj High"][n - 1], quotes["Adj High"][n - 2]])
if "Adj Low" in quotes.columns:
quotes.loc[idx2, "Adj Low"] = _np.nanmin([quotes["Adj Low"][n - 1], quotes["Adj Low"][n - 2]])
if "Adj Close" in quotes.columns:
quotes.loc[idx2, "Adj Close"] = quotes["Adj Close"][n - 1]
quotes.loc[idx2, "Volume"] += quotes["Volume"][n - 1]
@@ -698,7 +727,7 @@ def is_valid_timezone(tz: str) -> bool:
return True
def format_history_metadata(md):
def format_history_metadata(md, tradingPeriodsOnly=True):
if not isinstance(md, dict):
return md
if len(md) == 0:
@@ -706,60 +735,54 @@ def format_history_metadata(md):
tz = md["exchangeTimezoneName"]
for k in ["firstTradeDate", "regularMarketTime"]:
if k in md and md[k] is not None:
md[k] = _pd.to_datetime(md[k], unit='s', utc=True).tz_convert(tz)
if not tradingPeriodsOnly:
for k in ["firstTradeDate", "regularMarketTime"]:
if k in md and md[k] is not None:
if isinstance(md[k], int):
md[k] = _pd.to_datetime(md[k], unit='s', utc=True).tz_convert(tz)
if "currentTradingPeriod" in md:
for m in ["regular", "pre", "post"]:
if m in md["currentTradingPeriod"]:
for t in ["start", "end"]:
md["currentTradingPeriod"][m][t] = \
_pd.to_datetime(md["currentTradingPeriod"][m][t], unit='s', utc=True).tz_convert(tz)
del md["currentTradingPeriod"][m]["gmtoffset"]
del md["currentTradingPeriod"][m]["timezone"]
if "tradingPeriods" in md:
if md["tradingPeriods"] == {"pre":[], "post":[]}:
del md["tradingPeriods"]
if "currentTradingPeriod" in md:
for m in ["regular", "pre", "post"]:
if m in md["currentTradingPeriod"] and isinstance(md["currentTradingPeriod"][m]["start"], int):
for t in ["start", "end"]:
md["currentTradingPeriod"][m][t] = \
_pd.to_datetime(md["currentTradingPeriod"][m][t], unit='s', utc=True).tz_convert(tz)
del md["currentTradingPeriod"][m]["gmtoffset"]
del md["currentTradingPeriod"][m]["timezone"]
if "tradingPeriods" in md:
tps = md["tradingPeriods"]
if isinstance(tps, list):
# Only regular times
regs_dict = [tps[i][0] for i in range(len(tps))]
pres_dict = None
posts_dict = None
elif isinstance(tps, dict):
# Includes pre- and post-market
pres_dict = [tps["pre"][i][0] for i in range(len(tps["pre"]))]
posts_dict = [tps["post"][i][0] for i in range(len(tps["post"]))]
regs_dict = [tps["regular"][i][0] for i in range(len(tps["regular"]))]
else:
raise Exception()
if tps == {"pre":[], "post":[]}:
# Ignore
pass
elif isinstance(tps, (list, dict)):
if isinstance(tps, list):
# Only regular times
df = _pd.DataFrame.from_records(_np.hstack(tps))
df = df.drop(["timezone", "gmtoffset"], axis=1)
df["start"] = _pd.to_datetime(df["start"], unit='s', utc=True).dt.tz_convert(tz)
df["end"] = _pd.to_datetime(df["end"], unit='s', utc=True).dt.tz_convert(tz)
elif isinstance(tps, dict):
# Includes pre- and post-market
pre_df = _pd.DataFrame.from_records(_np.hstack(tps["pre"]))
post_df = _pd.DataFrame.from_records(_np.hstack(tps["post"]))
regular_df = _pd.DataFrame.from_records(_np.hstack(tps["regular"]))
pre_df = pre_df.rename(columns={"start":"pre_start", "end":"pre_end"}).drop(["timezone", "gmtoffset"], axis=1)
post_df = post_df.rename(columns={"start":"post_start", "end":"post_end"}).drop(["timezone", "gmtoffset"], axis=1)
regular_df = regular_df.drop(["timezone", "gmtoffset"], axis=1)
cols = ["pre_start", "pre_end", "start", "end", "post_start", "post_end"]
df = regular_df.join(pre_df).join(post_df)
for c in cols:
df[c] = _pd.to_datetime(df[c], unit='s', utc=True).dt.tz_convert(tz)
df = df[cols]
def _dict_to_table(d):
df = _pd.DataFrame.from_dict(d).drop(["timezone", "gmtoffset"], axis=1)
df["end"] = _pd.to_datetime(df["end"], unit='s', utc=True).dt.tz_convert(tz)
df["start"] = _pd.to_datetime(df["start"], unit='s', utc=True).dt.tz_convert(tz)
df.index = _pd.to_datetime(df["start"].dt.date)
df.index = df.index.tz_localize(tz)
return df
df.index.name = "Date"
df = _dict_to_table(regs_dict)
df_cols = ["start", "end"]
if pres_dict is not None:
pre_df = _dict_to_table(pres_dict)
df = df.merge(pre_df.rename(columns={"start":"pre_start", "end":"pre_end"}), left_index=True, right_index=True)
df_cols = ["pre_start", "pre_end"]+df_cols
if posts_dict is not None:
post_df = _dict_to_table(posts_dict)
df = df.merge(post_df.rename(columns={"start":"post_start", "end":"post_end"}), left_index=True, right_index=True)
df_cols = df_cols+["post_start", "post_end"]
df = df[df_cols]
df.index.name = "Date"
md["tradingPeriods"] = df
md["tradingPeriods"] = df
return md
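A minimal sketch of the regular-hours branch above, with hypothetical epoch-second data: Yahoo returns `tradingPeriods` as nested lists of dicts, which are flattened into one tz-aware table.

```python
import numpy as np
import pandas as pd

tz = "America/New_York"
# Hypothetical payload shape: one single-element list of dicts per trading day.
tps = [
    [{"timezone": "EST", "start": 1681738200, "end": 1681761600, "gmtoffset": -18000}],
    [{"timezone": "EST", "start": 1681824600, "end": 1681848000, "gmtoffset": -18000}],
]

# Flatten the nested lists, drop redundant tz columns, convert epochs.
df = pd.DataFrame.from_records(list(np.hstack(tps)))
df = df.drop(["timezone", "gmtoffset"], axis=1)
df["start"] = pd.to_datetime(df["start"], unit="s", utc=True).dt.tz_convert(tz)
df["end"] = pd.to_datetime(df["end"], unit="s", utc=True).dt.tz_convert(tz)

print(df.shape)
```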
@@ -954,9 +977,10 @@ def get_tz_cache():
try:
_tz_cache = _TzCache()
except _TzCacheException as err:
print("Failed to create TzCache, reason: {}".format(err))
print("TzCache will not be used.")
print("Tip: You can direct cache to use a different location with 'set_tz_cache_location(mylocation)'")
logger.error("Failed to create TzCache, reason: %s. "
"TzCache will not be used. "
"Tip: You can direct cache to use a different location with 'set_tz_cache_location(mylocation)'",
err)
_tz_cache = _TzCacheDummy()
return _tz_cache


@@ -1 +1 @@
version = "0.2.18"
version = "0.2.19b1"