Should I be happy to pay rather than receiving money when submitting annual tax returns?¶

Disclaimer: this is not a financial advise of any kind

Here in Spain, one must declare its annual income once a year. By law, a fraction of the salary gets retained to pay those taxes (IRPF). However, it is often adjusted by employers (and by law) so that when declaring taxes the amount of money to be paid/returned is close to zero. When not adjusted properly, one may receive a considerable amount of money back or the other way around (considerable debt in taxes). Latter scenario might feel annoying due to the punctual expenditure but probably it the best considering liquidity.
Here I'll explore whether it is worth having this "debt" and the amount of gains we can generate out of it paying them as late as possible and using a relatively low-risk financial product (ETF)

In [1]:

import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
import yfinance as yf
import mplfinance as mpf
from scipy import stats
import seaborn as sns

In [2]:

# get data from yahoo finance
df = yf.download(
    'VWRL.AS', # Vandguard global ETF
    start='2012-06-01', 
    end='2021-06-15', 
    progress=False,
    interval='1mo'
)
# calc "revenue" between months. not sure why close and next open are not equal,
# unless close is from the same day (1st of the month, quite unlikely though) 
df['rev'] = df.Open.shift(-1) / df.Open
df.dropna(inplace=True)
df.tail() # why is High and Low data missing from the first rows?

Out[2]:

	Open	High	Low	Close	Adj Close	Volume	rev
Date
2021-01-01	86.489998	91.220001	85.949997	87.440002	87.440002	827466	1.017112
2021-02-01	87.970001	93.570000	87.889999	89.900002	89.900002	900040	1.033079
2021-03-01	90.879997	94.860001	89.269997	94.739998	94.739998	822635	1.045335
2021-04-01	95.000000	97.050003	94.949997	96.239998	96.239998	707753	1.014000
2021-05-01	96.330002	96.900002	92.709999	96.099998	96.099998	745363	1.001869

In [3]:

# "succinct" viz. about how it is performing
f, ax = plt.subplots(ncols=2, nrows=2,figsize=(15,15))

# 1st fig, historical price # candles broken due to data integrity
mpf.plot(df, ax=ax[0,0], type='candle', style='yahoo')

# 2nd fig, monthly revenue
df.rev.plot(ax=ax[1,0])
ax[1,0].set_ylabel('fraction revenue')
ax[1,0].axhline(df.rev.mean(), ls='--', c='g', 
                label=f'mean {round(df.rev.mean(),4)}')
ax[1,0].axhline(1, ls=':', c='gray');
ax[1,0].legend()

# 3rd, viz distribution
df.rev.hist(ax=ax[0,1], bins=40)
ax[0,1].axvline(1, c='k', ls=':')

# 4th, month superposition
(df.groupby(df.index.month)['rev'].agg(mean='mean', sem=stats.sem).
plot(ax=ax[1,1],y='mean', yerr='sem', legend=False, marker='o', 
    ls='none', xlabel='month', ylabel='profit (monthly)',
    capsize=4, color='b'))
ax[1,1].axhline(1, ls=':', c='gray')
ax[1,1].scatter(
    df.index.month+np.random.normal(scale=0.1, size=len(df)), 
    df.rev, color=(0,0,0,0), edgecolor='gray' )

# set titles
ax = ax.flatten()
for i, title in enumerate(
    ['time candles', 'revenue distribution', 'frac. rev vs time', 'rev. by month']
    ):
    ax[i].set_title(title)

In [4]:

# one last viz
plt.figure(figsize=(8,7))
sns.heatmap(
    df.pivot_table(index=df.index.year, columns=df.index.month, values="rev")*100-100,
    cmap='coolwarm', center=0, annot=True, fmt='.1f', linewidths=.5, 
    annot_kws={"fontsize":8}, vmin=-15, vmax=15
).set(xlabel='month', ylabel='year');

Well, obviously there's something wrong with high&low values of the first candles. Apart from that, there's not much to say, an ETF with av. 0.87% monthly returns influenced by global market.
To avoid making extra assumptions we will simply check how much money would have yielded investing some amount of money each month instead of being retained/paid instantaneously. Taxes are paid by the end of June (60%) and early November (remaining 40%) in the next year.
Let's calculate revenue assuming we invested 100€ each month

In [5]:

# simul year 2019, declared on 2020 Jul (1st) & Nov (2nd payment)
ex = df.loc[('2018-12-31'< df.index) & (df.index < '2020-11-01')] 
ex.tail()

Out[5]:

	Open	High	Low	Close	Adj Close	Volume	rev
Date
2020-06-01	76.419998	79.970001	73.400002	76.629997	76.629997	640732	1.007590
2020-07-01	77.000000	80.639999	76.209999	76.540001	76.540001	590949	1.005065
2020-08-01	77.389999	81.970001	77.029999	80.680000	80.680000	420163	1.046905
2020-09-01	81.019997	83.260002	77.029999	79.570000	79.570000	515303	0.987164
2020-10-01	79.980003	82.889999	77.150002	77.889999	77.889999	459042	0.978244

In [6]:

# get revenue per each month (accu prod/ cum prod)
revs = ex.rev.values[:12][::-1].cumprod()[::-1] 
# we flip it to cumprod and then flip again so that december's 
# returns are perceived by all of the (100€) investments. 
# November's all but 1, etc.
nyear = (revs * 100).sum() # 100€ each month * each revenue
nyear # 1316 rather than 1200, not bad

Out[6]:

1316.8238085151052

In [7]:

# now add further interest till Jul (tax campaign ends at the end of June)
money = nyear * ex.rev.values[12:18].prod() # oh shit dang global pandemics
# not adding more money because they fall in next(current) year taxes
# we pay 60% of the taxes (.6*1200)
money -= (1200 * .6)
# remaining interest [Jul to oct, early nov we need to finish tax payment]
money *= ex.rev.values[18:].prod()
money -= (1200 * .4)
# remove from 19 to 21 taxes in those earnings
money * 0.81 # taxes to capital gains (from 19 to 21%)
# not entirely sure it works the same way with losses

Out[7]:

26.250300756737758

26€ does not sound like enough for the amount of headaches
I'll write a function and test other years presenting revenue as percentage of investment

In [8]:

def IRPF_debt_rev(total_money, year, df=df):
    """
    total_money: total IRPF debt, assumes it is distributed uniformly
    year: year to test
    df: dataframe with column called 'rev' with monthly revenue, 
        indexed with datetime index
    """
    ex = df.loc[(f'{year-1}-12-31'< df.index) & (df.index < f'{year+1}-11-01')]
    assert len(ex)==22, f'is df complete? filtered df len must be 22'
    monthly_debt = total_money / 12
    revs = ex.rev.values[:12][::-1].cumprod()[::-1]
    money = (revs * monthly_debt).sum() 
    money *= ex.rev.values[12:18].prod()
    money -= (total_money * .6) # 1st payment
    money *= ex.rev.values[18:].prod()
    money -= (total_money * .4) # 2nd payment
    # remove from 19 to 21 taxes in those earnings
    return money * 0.81

returns = []
for y in range(2013,2020): # test from 2013 to 2020
    returns += [IRPF_debt_rev(1200,y)]
    print(f'In {y} it would have returned {round(returns[-1],2)}€'\
        f' ({round(100*returns[-1]/1200,2)}%)')
    
print(f'\non average it represents {round(np.mean(returns),2)}€/year')
print(f'which is {round(100 * np.mean(returns)/1200,2)}% '\
    'of the "borrowed" money')

In 2013 it would have returned 13.25€ (1.1%)
In 2014 it would have returned 174.64€ (14.55%)
In 2015 it would have returned -28.47€ (-2.37%)
In 2016 it would have returned 143.52€ (11.96%)
In 2017 it would have returned 26.09€ (2.17%)
In 2018 it would have returned 85.91€ (7.16%)
In 2019 it would have returned 26.25€ (2.19%)

on average it represents 63.03€/year
which is 5.25% of the "borrowed" money from taxes

More than 5%, not bad! Worth trying at least.

TBD: explore geometric mean of daily returns