Using Git to deploy changes to web sites has been around for a while - year and years. I just thought I'd share a small twist that I haven't seen others doing, yet. I started out the normal way by adding a post-receive hook on the git server, but the problem with that is it's not under version control itself. So when I want to change how I deploy I need to login to the server and make the changes, and I have to track those changes somewhere else - I guess another repo perhaps. So I came up with a bit better way.

I have a post-receive hook I put up on all my deployable web site repositories. It does the usual checkout but instead of taking further action it actually moves to the checked out directory and looks for a deploy script to call. So now the actual deployment is within the repo, and can be modified along with other code. And it can handle deployment differently for each repo/site.

Here's the scripts I use - first, the post-receive hook. This is real simple and goes in your git server repo hooks directory:


unset GIT_DIR

while read from to branch
    mkdir -p "${DEPLOY_WORK}"
    GIT_WORK_TREE="${DEPLOY_WORK}" git checkout -f "${branch}" 
    cd "${DEPLOY_WORK}"
    if [ -f deploy ]; then
        ./deploy "${branch##*/}"
    rm -rf "${DEPLOY_WORK}"

Then I make a deploy script that sits inside the repo. And a nice thing here is it can be in python or whatever you like, as long as you have that on your server. In my case I put static web sites in an Amazon S3 bucket because it's fast, scales well, and basically free for low traffic web sites. So I use the nice s3cmd tool to take care of uploading.

# for s3 deploy of git repo
# script to upload src directory to a bucket selected by branch script argument $1
# include in repo and git server post-receive hook can call to deploy
# depends on s3cmd - pip install s3cmd


declare -A branch

if [[ "${branch[$1]}" ]]; then
    echo "Deploying "$1" to "$bucket
    touch .gzs .gitignore .s3ignore
    gzs=$(find $src -name '*.gz')
    for f in $gzs; do 
      echo $fx >> .gzs
      echo ${fx%.gz} >> .gzs
      s3cmd sync --guess-mime-type --no-mime-magic --acl-public --add-header="Content-Encoding:gzip" --no-preserve --add-header="Cache-Control:public, max-age=86400" "$f" "s3://$bucket/${fx%.gz}";
    s3cmd sync -r --exclude-from '.s3ignore' --exclude-from '.gzs' --exclude-from '.gitignore' --delete-removed --acl-public --no-preserve --guess-mime-type --no-mime-magic --add-header="Cache-Control:public, max-age=86400" $src/ s3://$bucket
    rm .gzs
    echo "Branch "$1" has no bucket - not deployed."

The cool thing here is that this deploy script looks at the branch being deployed and chooses what bucket to push to. It could make other choices like what web root directory to copy to on the server. Mine also checks for .gz and renames and sets content encoding. It can even make other changes - nasty ones too, so be aware someone with access to your local git repo can run as the git user on your server - you have limited the privelages for your git user, right?

When I work on a web site I do it in the test branch. And a simple git push sends any detected changes to the server where the deploy script is invoked to push the right content up to the right bucket on S3. When I'm happy with changes, I git checkout master and git merge test, then git push. And auto-magically it ends up in the production bucket. Here's what see when I push my test branch - output from my deploy script:

neocogent$ git push
Counting objects: 42, done.
Delta compression using up to 2 threads.
Compressing objects: 100% (14/14), done.
Writing objects: 100% (16/16), 1.26 KiB | 0 bytes/s, done.
Total 16 (delta 8), reused 0 (delta 0)
remote: Previous HEAD position was c80a8ab... how i deploy
remote: HEAD is now at 7359b74... tweaks
remote: Deploying test to
remote: upload: 'output/author/neocogent.html' -> 's3://'  [1 of 4]
remote:  47281 of 47281   100% in    0s   392.00 kB/s  done
remote: upload: 'output/blog/2017/01/how-i-do-deploy.html' -> 's3://'  [2 of 4]
remote:  21006 of 21006   100% in    0s   186.52 kB/s  done
remote: upload: 'output/index.html' -> 's3://'  [3 of 4]
remote:  47162 of 47162   100% in    0s   403.86 kB/s  done
remote: upload: 'output/sitemap.xml' -> 's3://'  [4 of 4]
remote:  8756 of 8756   100% in    0s    95.35 kB/s  done
remote: Done. Uploaded 124205 bytes in 1.0 seconds, 121.29 kB/s.
   c80a8ab..7359b74  test -> test

Typically this is pretty fast as Git only sends the changes to the server and it compresses data. Manually uploading to S3 is quite slow from my location so having the server expand files and send from there on a "big pipe" is super quick. Notice above - I cannot get 400 KB/s upload from home.

I also have a few aliases that reduce command fatigue - put these in your ~/.gitconfig (all repos) or .git/config (local repo). repush allows me to re-deploy even without changing files; useful for testing. The other two are handy too.

    repush = "!f() { git commit --allow-empty --amend --no-edit; git push; }; f"
    golive = "!f() { git checkout master; git merge test; git push; }; f"
    test = "!f() { git checkout test; }; f"

I've spent the last week or so working on getting a demo server running for sqlChain. This is just a first step on the way to a more functional and useful explorer site.

I started out on Amazon EC2 with an m4.large instance. It's familiar and easy to start a server and get to it. I created a 50GB EBS data volume and attached that. Trying to save money I mistakenly chose the "Magnetic" type. First time I tried that, and little did I know that it isn't the same as "Cold HDD". I thought it cost 2.5 cents/GB/mo. but it was actually 5 cents. The real kicker is this EBS type charges for I/O and 42 million DB accesses later I realized this. Ehh, nuked that volume and thought more about using a VPS where resources aren't pay-as-you-go.

After a brief scan of I found Hudson Valley had a 4-vCPU-4GB-300GB offering for $5/mo. and takes Bitcoin. That sounds great, and even though it's not SSD storage, I thought it would do for testing. I sync'd up to block 389,000 before the IOPS of hard disk storage became painful and started looking for a temporary high end solution. Next up - because they also take Bitcoin. They require you to initially fund with Paypal or credit card. I guess that's to link your real world identity to the account. No worries. I added $5 from Paypal and got a $20 bonus credit to play with. Later I found out with some Google-fu I could have got $50 free credit.

Vultr Usage Charts

The Hudson Valley VPS is your traditional pay-per-month but Vultr is more a cross between a VPS and Amazon cloud computing. You pay-per-use but not as granular and not as flexible with many niceties missing. I needed something with SSD and more speed. I started a quick test on the 1-vCPU-1GB VPS and noticed the storage speed is not even close to real SSD specs. I guess they use the SSD speed to cram more users in for a cheap price. So the next step was a "hybrid" server - they have a 2-CPU-8GB-120GB-SSD for 8 cents/hour. I later tried the 4-CPU offering as well but it gave little more oompf. It did prompt me to rework some of the threading in sqlChain and try to get more connections open to MySQL. So the 2-CPU "dedicated" option was the best so far. No extra charges for bandwidth and IOPS but with more access to raw CPU power. I have to say being able to test out different specs for a few hours, without bothering support or much hassle, is very nice.

I installed sqlChain as per my install guide using the PyPi package and was up and syncing within minutes. Well, almost. Turns out they mess around with the normal Ubuntu install, missing some things and opening up root logins. So first I had to do a bit of server admin to get a customary environment and then zip-zap, up and sqlChaining.

I rsync'd about 75 GB of data from Hudson Valley over to this new system, and continued on. Until my next lesson in hard knocks: don't test new code on a half sync'd database because you may have to start again. I had written the new threading code and local testing showed it should be more than double the speed. The hitch? It actually wasn't working correctly and threw away most of the outputs from about 4 million txs before I noticed. Fix, fix, fix. Tedious sql maneuvering to figure out what txs are toast and delete part of the database, then restart. On it goes... but it does run faster. Tomorrow some numbers...

It's been so long but I am finally getting some more work done on sqlChain. I don't think anyone noticed that it was stalled, or even existed. I saw a few others following it on GitHub but that's probably just onlookers wondering if this would go anywhere. It hasn't yet. I had to work on actual paying projects over the last year. Now I have some time and really want to get this thing flying.

I'm working on a new Bitcoin / Blockchain explorer type site with some hopefully useful and interesting new features. I've already got the domain name: - currently parked right here. I've been fixing up some issues with sqlChain over the last week and readying it for some real testing on an EC2 server.

Setup Screen

I've added the sqlchain-init script to ask a series of setup questions and then create directories, config files, system boot init scripts and do MySQL DB creation. It's working here on my laptop and in the next couple days I'll test it out on a fresh server along with the newly updated PyPi package. I'm expecting users can just pip install it and run the sqlchain-init script to have a working server, though it will assume bitcoind installed.

The other main update is support for manual pruning on upcoming Bitcoin releases. Currently a custom build with PR #7871 is needed for manual pruning. However, this new feature allows safely running a pruning node under sqlChain. It lets sqlChain notify bitcoind which blocks have been processed into MySQL so it won't prune any still required. Stay tuned.

There are four basic ways to run sqlChain. Choosing which model to follow at the outset is best. It is possible to change between models but it will incur a time cost to rework data. You can run sqlChain either with or without the --no-sigs option. This controls whether input sigScript data is maintained and significantly affects storage size. And you can run over a full node or pruning node. These two choices give us the following four combinations, in order of disk usage:

  • --no-sigs, pruning - this case requires the least disk space and discards sigScript data completely. If you want to run an Electrum server or as a backend for some application that doesn't need this data then this minimizes disk cost. It cannot provide raw transaction data in standard form and cannot be used to validate transaction data. Validation was done when bitcoind downloaded the blocks. As of block 370,000 sqlChain will require ~27GB of data and bitcoind as low as ~1GB, for a total disk size of ~28GB.

  • default, pruning - sqlChain keeps sigScript data but the underlying blockchain has been pruned. sqlChain can provide complete raw transaction data in standard form from it's native api interface. As of block 370,000 sqlChain uses ~52GB and bitcoind as low as ~1GB, giving a total disk use of ~53GB.

  • --no-sigs, full - sqlChain does not have sigScript data but the underlying blockchain data is still intact. Raw transaction data can be returned from the rpc interface only. As of block 370,000 sqlChain uses ~27GB and the blockchain (with --txindex) about 51GB, for a total of ~78GB.

  • default, full - both layers have full data, so sigScript data can be queried from either the sqlChain api or rpc interfaces. This uses the maximal disk space of ~103GB. This seems excessive considering once transactions have been validated there is no compelling further use for the sigScript data.

As to changing models after building the sql data the time costs are as follows:

  • removing sigScript data from sqlChain is possible with the stripsigs utility. Depending on system speed it can take several to ~16 hours (as gauged by my aging laptop; it coulld be slower still on a arm based board, without SSD) to scan transaction data and rewrite the external blob data file, currently wiping out ~25GB.

  • adding sigScript data afterwards would require re-building sqlChain from genesis block again; usually quite time consuming (full sync about 160 hours on my ol' laptop). It would be possible to have a utility for rebuilding just this data but I have not bothered to write one.

  • changing a pruning node to full node requires re-downloading the full blockchain from the beginning.

  • going from a full node to pruning node is pretty easy, also quick as it discards blk*.dat files but is non-reversible. If you copy the bitcoin directory and start with the copy as -data-dir option then it will prune the copy, and you can revert to the full one if need be.

I'm personally interested in running a no-sigs, pruning node for a personal Electrum server and will be exploring that over the next few days. There are some gotchas in trying to sync from a pruning node. It is possible, even likely, for the node to prune away data before it gets pulled into the sqlChain database (which would force beginning again). I have code in place now to manage this but as of today it's untested.

With the new blkdat module sqlChain can now read block data directly and by monitoring blk*.dat file presence, along with a nifty btcGate utility, it can pause/resume the pruning node when it can't keep up. My experience over the last few days has been that sqlChain can build sql data at the same time, and as quickly as bitcoind can sync the blockchain. If you have a slower system or low-end VPS then that's pretty sweet.

Coming soon - a full tutorial on installing and running sqlChain.

Any time you store data you have trade offs to consider. I wanted to use sqlChain on low-cost a VPS where every GB pushes up the monthly fee. At a volunteer level I really couldn't see paying for 300GB of SSD disk space as several APIs are reportedly demanding. On Amazon EC2 that would be $30/month for data. No big deal when you have a business model; but kind of costly if you are paying from spare cash to "support the community". I'm also developing on an old laptop with very modest resources and I wanted it to be usable without requiring high-end hardware.

In building sqlChain I chose to reduce data size by eliminating redundancies, and using a few tricks, at the cost of being able to query all values directly in SQL. Some data requires external scripting to access. For the purpose of providing an API layer I think this has worked well, but some analysis cannot be done with SQL alone. C'est la vie. This article discusses these trade offs.

First thing is not storing textual or hex versions of data. That's obvious, and comes at a slight cost in query simplicity because you have to do various conversions on both inserting and selecting data. In sqlChain addresses are stored as 20 byte binary pubkey hashes, and tx/block hashes are stored as 32 byte binary values.

To get to results in a time useful for an API you need to index the data. This is probably the biggest cost, so choosing what and how you index greatly affects both size and speed. I looked for ways to reduce indexed column size. Instead of indexing on 32 byte hash values I used a scheme that truncates hashes as 5 byte IDs, converted to decimal values, requiring ony 6 or 8 bytes. This means that IDs may not directly be unique, requiring a small trade off in speed as you need to check an ID and increment it until an unused value is found. It makes queries only slightly more complex. Instead of a query like: select * from trxs where hash=%s; you need to use select * from trxs where id>=%s and hash=%s limit 1;, with an index built only on the id column. I use this for transaction ids, address ids, and output ids, with a further constraint of output ids being derived from transaction ids, cutting down again on indexing.

To see how well this works I did some testing on collision rates for the IDs. Then I revised the method until collisions were rare enough that you almost always got the first record indexed. In the last test I ran on transaction ids only 1 in 41000 records required even one increment. For address IDs I use a slightly different conversion that encodes the address type (1- or 3-) in the lowest ID bit (odd/even). Both address and transaction ids take 6 bytes and output ids take 7 bytes. In all cases my index files are actually smaller than the data files; usually much smaller, which I consider a big win.

The blockchain has tremedous data redundancies. Almost all input/output records contain standard values. Weird transactions that deviate are a very small percent of the whole, so I looked for ways to encode these redundancies. The biggest space saver is the encoding of output script PK data, which is very often just the pubkey hash with a few extra script bytes. We're already storing the pubkey hash for each output as an ID value, so I can truncate all standard outputs to a single byte. If it's zero, it indicates no more data, and we look at the pubkey hash and ID (low bit) to derive back the orignal scriptPK. There are very few scriptPKs that have a non-zero length byte, and store actual script data.

The locktime and sequence values in each transaction and input record are likewise encoded as single bits, along with a variable input/output count, in a 1 byte header. This reduces data and these values have no common need to be indexed, so not much is given up. If you wanted to count how many transactions have >1000 outputs, or a given locktime you'd have to use a script instead of a SQL query.

The biggest space saver of all is the input script data (sig,pubkey). I still carry this along for now but will add a pruning option that can drop this data. Obviously it's critical for the verification of transactions but after that it is rarely or never used. It accounts for over 25GB of data that in most cases can be simply dropped with no loss of function other than spewing out interesting hex digits on API requests. Once verified by bitcoind they never get used. Rather surprisingly, considering the size of the blockchain, the only data you actually need for spending, other than your private key, is the address, or more specifically the pubkey hash (pkh) bytes, linked to a specific output id (tx,n). These are all maintained in the outputs table as a pair (out_id,addr_id). It's deliberate that the table with most records (outputs) has the smallest row size.

The full SQL schema is in the GitHub repo, but only tells part of the story. The code to handle bit flags and ID values is in the sqlchain support module, lib/

I'm finally back up to block 336,892... Debug Screen

Two weeks ago I started working on my SQL converter again. I've had a new (faster, larger) SSD for months but I was busy and, frankly, lacked the desire to work more on it. During July, on another paid project, I happened to set up an Electrum server and was a little miffed at how long it takes to sync with bitcoind. On this particular VPS it was often taking several minutes per block. Very often when checking my wallet I'd find it was lagging by 2-3 blocks. This then became my new impetus for working again on the project - it is to become much more than just a converter - Introducing sqlChain, a full API layer over the blockchain with Electrum support.

There's nothing special about block 336,892. It's just that in the process of developing sqlChain as a new API platform I discovered flaws in my data schema and conversion that forced me several times to restart the process. I think this is my third time in two weeks. And my fingers are crossed, I'm confident, that this time is it. I've been madly coding while the sqlchain is plodding along in the background (on my ancient laptop, someone please send me a new one).

sqlChain is my attempt to build an API platform over the blockchain using MySQL as backend. Not really something new, except that my priority has been on storage size - the goal is having it take less space than the blockchain. I think I'm succeeding at this, though I'm not yet fully caught up (this moment we're all at 369,927). One of my design parameters was to be able to run a fully capable sqlChain on a pruning node. In this scenario bitcoind becomes a verifying front end for sqlChain, which provides a more fully usable API layer over this. And yes, that includes an Electrum server running on sqlChain, over a pruning node.

Here is some current sqlchain row counts at block 337,037 from the /status API call:

  • trxs 55,543,931
  • addresses 58,949,447
  • outputs 153,007,757

The current total sqlChain data size is 35.8 GB but includes pruneable sigscript data of approx. 17 GB. This data is not required for an Electrum server, nor most typical applications that may run over sqlChain, and can be easily pruned. In theory, we should be able to run an Electrum server over a pruning node using < 1 GB in bitcoind and ~20 GB in sqlchain, with sync times of 5-10 seconds (see screen shot, on my 7 year old Core2Duo laptop, probably much faster on any decent server instance). I'll be very happy if this works out.

sqlChain currently consists of three components, all basically working at this time (with some small bits being incomplete). Being under active development, it needs far more testing. The components are:

  • sqlchaind - the blockchain monitor daemon. Polls bitcoind to sync the blockchain to MySQL, monitors the mempool for unconfirmed txs, and detects orphans. It builds and revises the SQL data. Dependent on bitcoind responding to rpc, it can build sql data as bitcoind syncs.

  • sqlchain-api - a multi-API layer that queries SQL data created above to service applications. At this time it offers Insight, and RPC compatible API interfaces, with some nice extensions. Oh, it also provides a WebSocket interface ( compatible), and includes a basic web server. It's designed to be easy to add new API modules.

  • sqlchain-electrum - a fully functional Electrum server that runs on top of both modules above. At this time this only supports private use. No support for the IRC peer server discovery - which is fine as it needs far more testing before it should be used publicly. This is a suprisingly thin layer over the Insight compatible (extended) API that it actually is more like a proxy, managing subscribed addresses, and otherwise mostly relaying requests down to the underlying API.

The two API daemons are using gevent co-operative threading with the expectation this will allow them to support many concurrent connections. How well this works still needs to be fully tested but given there are three interfaces (web socket, long polling sync, Electrum) that can hold connections open it seems like a good move. Other reports from highly concurrent web servers based on gevent seem to indicate it should do well. sqlchain-api is an WSGI server built with gevent-websocket; which is a layer over pywsgi, providing Web, API and Websocket support.

So, what next? More development work. I have a DEMO WEB/API server here, used for my local testing. I expect to soon move it to an AWS instance for public beta testing. It offers real-time blockchain/sqlchain statistics as part of it's "/status" API. I will add some blog posts describing some of the API extensions, design decisions and performace measures, and a tutorial on running Electrum over a pruning node. The open source code is available on my GitHub. First, I have to fully sync the chain again.

I'd very much welcome feedback from the community. And if so inclined donations will help offset diversions on paid work. Everyone needs to eat.

Main Screen Today I found myself wanting to produce a quick slideshow video combining some photos and some music. A pretty awesome tool for this is PhotoFilmStrip. I tried a few others but this was the the easiest I found that also produced excellent output quality. It did have a couple limitations which I resolved with a very modest bit of hacking that I'm going to share here.

Adding H264 Support

The first thing I wanted was output in h264 MP4 format for viewing in XBMC. After digging through a bit of python code I found that a few small mods would achieve this. I've posted these changes as a Gist so others can use them. Note that you should back up the two relevant files in case of problems, and also need to edit them as root.

I just add a new class similar to the MPEG4-XVID one that has workable Mencoder options, and then make it available in the list shown by the render dialog. This worked quite well for me and I tested the output files in XBMC (after a bit of tooling around with test values). There is some warnings in the err log file but these did not seem to cause any problems.

Disabling Ken Burns

The next thing I found a bit limiting was not being able to bypass the pan/zoom "Ken Burns" effect easily. You can manually stop it by clicking the lock icon (between start/end images), and then adjusting the image scaling (with scroll wheel/mouse movements) but it's not accurate and must be done for every image in the slideshow. So I went looking for a way to edit the slideshow control settings and found it's just a sqlite3 database file. That's cool. So with a bit more fiddling I found I could write a one line sql statement that would instantly set all images to center stage and have no pan/zoom - like a simpler slideshow program might do. The nice thing about this method is you could potentially get very fancy by generating the slideshow control pan/zooms with a small script. I don't need that now but it's nice to know it could be done quite easily.

So here's the very simple sql for centering in fixed position (in HD resolution 1280x720, change accordingly if desired):

update picture set start_left=-(1280-width)/2,start_top=-(720-height)/2,start_width=1280,start_height=720, target_left=-(1280-width)/2,target_top=-(720-height)/2,target_width=1280,target_height=720;

You can put this in a file and pipe it into sqlite3 on the command line or use echo, like this:

echo "update picture set start_left=-(1280-width)/2,start_top=-(720-height)/2,start_width=1280,start_height=720, target_left=-(1280-width)/2,target_top=-(720-height)/2,target_width=1280,target_height=720;" | sqlite3 path/to/slideshow.pfs

If you want to avoid the command line then any sqlite3 editor can be used. I initially tested this with the SQLite Manager Firefox plugin by selecting the pfs file and then executing the sql statement from there.

As a side note, something that's not immediately clear with PhotoFilmStrip is when you add an audio file to the project it forces the slideshow length to match the audio and then adjusts the timing of each image (proportionately according to actual time setting on each) so the sum of all image times match the audio. This is handy as long as that's what you want.

It would be nice to be able to set a background image for the slideshow. I haven't looked into this yet. Maybe I will some day.

Main Screen Last summer I wrote an Android app. It was my first Java in more than a decade and mostly my intent was to figure out what this platform was all about. I needed something fairly easy but it still had to touch on many aspects of the user interface and use enough API calls to be a good learning vehicle. A dice bias/fairness testing app seemed a good fit.

The idea for this sprung from my recent use of dice for Bitcoin wallet seed generation. I had done a bit of research into whether dice were good for this or had noticeable bias. I came across an exellent pair of blogs posts on Delta's D&D Hotspot from several years earlier, where he laid out the details of using Pearson's Chi-Square Test for evaluating dice. A later post also goes into more detail about the "power" of this test. Both are well worth reading if you care about evaluating dice bias and the math behind it.

I'm not much of a mathematician. I could never really handle the theorems and proofs side but I was not bad at actually using calculus, differential equations and linear algebra. I gathered enough from the above articles to move ahead and work out the code for the basic math for this app. To the best of my knowledge, and from testing, it appears to work correctly. Regardless, I have had a couple responses from users where they are greatly disappointed in the fairness of common (usually cheap round edged) dice.

The overall idea was that using the Chi-Square test meant jotting down dice values from a large set of rolls, and then doing a few calculations to give a statistical indicator of fairness. Stats Screen It seemed like an app could save the manual work, and also allow for keeping a dice log so that you could keep doing an ongoing evaluation (while using the dice in a game of some sort). This is what apps are good for - removing the grunt work.

Having coded it up over a few days I tested a few dice and some coins. Then I promptly put it aside and forgot about it. Come around to December and I thought why not put this up in the app store for others to use? That's when I discovered you actually have to pay to put an app on the Google App Store, and I wasn't too much interested in that. I did check a few other options and found I could put it on Amazon for free. So that's what I did. Over the next month I think I had one single download. Obviously there isn't much interest in Dice Testing, or maybe Amazon App Store isn't popular, or both possibilties combined.

After coming across some dice questions on /r/Bitcoin last month I also put up on Github the same signed APK for reddit users to test out. I kind of hinted at eventually releasing the source code. Today I finally added an MIT open source license and put up everything (with signed APK for those not wanting to build it). All part of my new efforts to be more social, build up a web presence, or in other words "actually do something" for a change. I think I'll also look at submitting it to F-Droid as well - but they make it sound somewhat arduous.

Maybe some netizens will actually build it, use it, test'em dice. Let me know if you do. Cheers.

Linux, Electronics, Open Source Programming, Bitcoin, and more

© Copyright 2018 neoCogent. All rights reserved.

About Me - Hire Me